Introduction
Amazon EMR is a managed cluster platform that simplifies running big data frameworks (such as Apache Hadoop and Apache Spark) on AWS to process and analyze vast amounts of data.
By using these frameworks and related open-source projects (such as Apache Hive and Apache Pig), you can:
- Process data for analytics purposes and business intelligence workloads.
- Use Amazon EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
Note
Use the OpsRamp AWS public cloud integration to discover and collect metrics against the AWS service.
Setup
To set up the OpsRamp AWS integration and discover the AWS service, go to AWS Integration Discovery Profile and select EMR.
Metrics
OpsRamp Metric | Metric Display Name | Unit | Aggregation Type | Description |
---|---|---|---|---|
aws_elasticmapreduce_IsIdle | IsIdle | Count | Average | Indicates that a cluster is no longer performing work, but is still alive and accruing charges. Set to 1 if no tasks and jobs are running; set to 0 otherwise. |
aws_elasticmapreduce_ContainerAllocated | ContainerAllocated | Count | Average | Number of resource containers allocated by the ResourceManager. |
aws_elasticmapreduce_ContainerReserved | ContainerReserved | Count | Average | Number of containers reserved. |
aws_elasticmapreduce_ContainerPending | ContainerPending | Count | Average | Number of containers in the queue that have not yet been allocated. |
aws_elasticmapreduce_AppsCompleted | AppsCompleted | Count | Average | Number of applications submitted to YARN (Hadoop generation) that have completed. |
aws_elasticmapreduce_AppsKilled | AppsKilled | Count | Average | Number of applications submitted to YARN (Hadoop generation) that have been killed. |
aws_elasticmapreduce_AppsPending | AppsPending | Count | Average | Number of applications submitted to YARN (Hadoop generation) that are in a pending state. |
aws_elasticmapreduce_AppsRunning | AppsRunning | Count | Average | Number of applications submitted to YARN (Hadoop generation) that are running. |
aws_elasticmapreduce_AppsSubmitted | AppsSubmitted | Count | Average | Number of applications submitted to YARN (Hadoop generation). |
aws_elasticmapreduce_CapacityRemainingGB | CapacityRemainingGB | Bytes | Average | Amount of remaining HDFS disk capacity. |
aws_elasticmapreduce_CoreNodesRunning | CoreNodesRunning | Count | Average | Number of core nodes working. Data points for this metric are reported only when a corresponding instance group exists. |
aws_elasticmapreduce_CoreNodesPending | CoreNodesPending | Count | Average | Number of core nodes waiting to be assigned. All of the core nodes requested may not be immediately available; this metric reports the pending requests. |
aws_elasticmapreduce_CorruptBlocks | CorruptBlocks | Count | Average | Number of blocks that HDFS reports as corrupted. |
aws_elasticmapreduce_HDFSUtilization | HDFSUtilization | Percent | Average | Percentage of HDFS storage currently used. |
aws_elasticmapreduce_HDFSBytesRead | HDFSBytesRead | Bytes Read | Average | Number of bytes read from HDFS. |
aws_elasticmapreduce_HDFSBytesWritten | HDFSBytesWritten | Bytes Written | Average | Number of bytes written to HDFS. |
aws_elasticmapreduce_LiveDataNodes | LiveDataNodes | Percent | Average | Percentage of data nodes that are receiving work from Hadoop. |
aws_elasticmapreduce_MRTotalNodes | MRTotalNodes | Count | Average | Number of nodes presently available to MapReduce jobs. |
aws_elasticmapreduce_MRActiveNodes | MRActiveNodes | Count | Average | Number of nodes presently running MapReduce tasks or jobs. |
aws_elasticmapreduce_MRLostNodes | MRLostNodes | Count | Average | Number of nodes allocated to MapReduce that have been marked in a LOST state. |
aws_elasticmapreduce_MRUnhealthyNodes | MRUnhealthyNodes | Count | Average | Number of nodes available to MapReduce jobs marked in an UNHEALTHY state. |
aws_elasticmapreduce_MRDecommissionedNodes | MRDecommissionedNodes | Count | Average | Number of nodes allocated to MapReduce applications that have been marked in a DECOMMISSIONED state. |
aws_elasticmapreduce_MRRebootedNodes | MRRebootedNodes | Count | Average | Number of nodes available to MapReduce that have been rebooted and marked in a REBOOTED state. |
aws_elasticmapreduce_S3BytesWritten | S3BytesWritten | Bytes Written | Average | Number of bytes written to Amazon S3. |
aws_elasticmapreduce_S3BytesRead | S3BytesRead | Bytes Read | Average | Number of bytes read from Amazon S3. |
aws_elasticmapreduce_MissingBlocks | MissingBlocks | Count | Average | Number of blocks in which HDFS has no replicas. These might be corrupt blocks. |
aws_elasticmapreduce_TotalLoad | TotalLoad | Count | Average | Total number of concurrent data transfers. |
aws_elasticmapreduce_MemoryTotalMB | MemoryTotalMB | Bytes | Average | Total amount of memory in the cluster. |
aws_elasticmapreduce_MemoryReservedMB | MemoryReservedMB | Bytes | Average | Amount of memory reserved. |
aws_elasticmapreduce_MemoryAvailableMB | MemoryAvailableMB | Bytes | Average | Amount of memory available to be allocated. |
aws_elasticmapreduce_MemoryAllocatedMB | MemoryAllocatedMB | Bytes | Average | Amount of memory allocated to the cluster. |
aws_elasticmapreduce_PendingDeletionBlocks | PendingDeletionBlocks | Count | Average | Number of blocks marked for deletion. |
aws_elasticmapreduce_UnderReplicatedBlocks | UnderReplicatedBlocks | Count | Average | Number of blocks that need to be replicated one or more times. |
aws_elasticmapreduce_dfs_FSNamesystem_PendingReplicationBlocks | dfs.FSNamesystem.PendingReplicationBlocks | Count | Average | Status of block replication: blocks being replicated, age of replication requests, and unsuccessful replication requests. |
aws_elasticmapreduce_ContainerPendingRatio | Container Pending Ratio | Count | Average | Ratio of pending containers to containers allocated (ContainerPendingRatio = ContainerPending / ContainerAllocated). If ContainerAllocated = 0, then ContainerPendingRatio = ContainerPending. The value of ContainerPendingRatio represents a number, not a percentage. This value is useful for scaling cluster resources based on container allocation behavior. |
aws_elasticmapreduce_AppsFailed | Apps Failed | Count | Average | Number of applications submitted to YARN that have failed to complete. |
aws_elasticmapreduce_YARNMemoryAvailablePercentage | YARN Memory Available Percentage | Percent | Average | Percentage of remaining memory available to YARN (YARNMemoryAvailablePercentage = MemoryAvailableMB / MemoryTotalMB). This value is useful for scaling cluster resources based on YARN memory usage. |
cloud.instance.state | Status/State | n/a | n/a | n/a |
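
The two derived metrics in the table, ContainerPendingRatio and YARNMemoryAvailablePercentage, are defined by simple formulas. The following is a minimal Python sketch of those formulas for illustration only. The function names are hypothetical and are not part of OpsRamp or AWS. The sketch also makes two assumptions that the table does not state: the memory ratio is multiplied by 100 to match the Percent unit, and a total memory of 0 yields 0.0.

```python
def container_pending_ratio(container_pending: float,
                            container_allocated: float) -> float:
    """Ratio of pending containers to allocated containers.

    Per the metric description: if ContainerAllocated is 0, the ratio
    equals ContainerPending. The result is a plain number, not a percentage.
    """
    if container_allocated == 0:
        return container_pending
    return container_pending / container_allocated


def yarn_memory_available_percentage(memory_available_mb: float,
                                     memory_total_mb: float) -> float:
    """Share of YARN memory still available, expressed as a percentage.

    Assumption: the ratio MemoryAvailableMB / MemoryTotalMB is scaled
    by 100 to match the Percent unit listed in the table.
    Assumption: a total of 0 returns 0.0 to avoid division by zero.
    """
    if memory_total_mb == 0:
        return 0.0
    return 100.0 * memory_available_mb / memory_total_mb


if __name__ == "__main__":
    # Illustrative values only, not real cluster data.
    print(container_pending_ratio(4, 8))
    print(yarn_memory_available_percentage(256, 1024))
```

A high pending ratio or a low available-memory percentage suggests the cluster may need more capacity, which is why both values are described as useful for scaling decisions.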
Event support
CloudTrail event support
- Supported
- Configurable in OpsRamp AWS Integration Discovery Profile.
CloudWatch alarm support
- Supported
- Configurable in OpsRamp AWS Integration Discovery Profile.