AWS Elastic MapReduce

Introduction

Amazon EMR is a managed cluster platform that simplifies running big data frameworks (such as Apache Hadoop and Apache Spark) on AWS to process and analyze vast amounts of data.

By using these frameworks and related open-source projects (such as Apache Hive and Apache Pig), you can:

Process data for analytics and business intelligence workloads.
Transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

Note

Use the OpsRamp AWS public cloud integration to discover and collect metrics against the AWS service.

Setup

To set up the OpsRamp AWS integration and discover the AWS service, go to AWS Integration Discovery Profile and select EMR.

Metrics


OpsRamp Metric	Metric Display Name	Unit	Aggregation Type	Description
aws_elasticmapreduce_IsIdle	IsIdle	Count	Average	Indicates that a cluster is no longer performing work, but is still alive and accruing charges. Set to 1 if no tasks and jobs are running; set to 0 otherwise.
aws_elasticmapreduce_ContainerAllocated	ContainerAllocated	Count	Average	Number of resource containers allocated by the ResourceManager.
aws_elasticmapreduce_ContainerReserved	ContainerReserved	Count	Average	Number of containers reserved.
aws_elasticmapreduce_ContainerPending	ContainerPending	Count	Average	Number of containers in the queue that have not yet been allocated.
aws_elasticmapreduce_AppsCompleted	AppsCompleted	Count	Average	Number of applications submitted to YARN (Hadoop generation) that have completed.
aws_elasticmapreduce_AppsKilled	AppsKilled	Count	Average	Number of applications submitted to YARN (Hadoop generation) that have been killed.
aws_elasticmapreduce_AppsPending	AppsPending	Count	Average	Number of applications submitted to YARN (Hadoop generation) that are in a pending state.
aws_elasticmapreduce_AppsRunning	AppsRunning	Count	Average	Number of applications submitted to YARN (Hadoop generation) that are running.
aws_elasticmapreduce_AppsSubmitted	AppsSubmitted	Count	Average	Number of applications submitted to YARN (Hadoop generation).
aws_elasticmapreduce_CapacityRemainingGB	CapacityRemainingGB	Bytes	Average	Amount of remaining HDFS disk capacity.
aws_elasticmapreduce_CoreNodesRunning	CoreNodesRunning	Count	Average	Number of core nodes working. Data points for this metric are reported only when a corresponding instance group exists.
aws_elasticmapreduce_CoreNodesPending	CoreNodesPending	Count	Average	Number of core nodes waiting to be assigned. Not all of the requested core nodes may be immediately available; this metric reports the pending requests.
aws_elasticmapreduce_CorruptBlocks	CorruptBlocks	Count	Average	Number of blocks that HDFS reports as corrupted.
aws_elasticmapreduce_HDFSUtilization	HDFSUtilization	Percent	Average	Percentage of HDFS storage currently used.
aws_elasticmapreduce_HDFSBytesRead	HDFSBytesRead	Bytes Read	Average	Number of bytes read from HDFS.
aws_elasticmapreduce_HDFSBytesWritten	HDFSBytesWritten	Bytes Written	Average	Number of bytes written to HDFS.
aws_elasticmapreduce_LiveDataNodes	LiveDataNodes	Percent	Average	Percentage of data nodes that are receiving work from Hadoop.
aws_elasticmapreduce_MRTotalNodes	MRTotalNodes	Count	Average	Number of nodes presently available to MapReduce jobs.
aws_elasticmapreduce_MRActiveNodes	MRActiveNodes	Count	Average	Number of nodes presently running MapReduce tasks or jobs.
aws_elasticmapreduce_MRLostNodes	MRLostNodes	Count	Average	Number of nodes allocated to MapReduce that have been marked in a LOST state.
aws_elasticmapreduce_MRUnhealthyNodes	MRUnhealthyNodes	Count	Average	Number of nodes available to MapReduce jobs marked in an UNHEALTHY state.
aws_elasticmapreduce_MRDecommissionedNodes	MRDecommissionedNodes	Count	Average	Number of nodes allocated to MapReduce applications that have been marked in a DECOMMISSIONED state.
aws_elasticmapreduce_MRRebootedNodes	MRRebootedNodes	Count	Average	Number of nodes available to MapReduce that have been rebooted and marked in a REBOOTED state.
aws_elasticmapreduce_S3BytesWritten	S3BytesWritten	Bytes Written	Average	Number of bytes written to Amazon S3.
aws_elasticmapreduce_S3BytesRead	S3BytesRead	Bytes Read	Average	Number of bytes read from Amazon S3.
aws_elasticmapreduce_MissingBlocks	MissingBlocks	Count	Average	Number of blocks in which HDFS has no replicas. These might be corrupt blocks.
aws_elasticmapreduce_TotalLoad	TotalLoad	Count	Average	Total number of concurrent data transfers.
aws_elasticmapreduce_MemoryTotalMB	MemoryTotalMB	Bytes	Average	Total amount of memory in the cluster.
aws_elasticmapreduce_MemoryReservedMB	MemoryReservedMB	Bytes	Average	Amount of memory reserved.
aws_elasticmapreduce_MemoryAvailableMB	MemoryAvailableMB	Bytes	Average	Amount of memory available to be allocated.
aws_elasticmapreduce_MemoryAllocatedMB	MemoryAllocatedMB	Bytes	Average	Amount of memory allocated to the cluster.
aws_elasticmapreduce_PendingDeletionBlocks	PendingDeletionBlocks	Count	Average	Number of blocks marked for deletion.
aws_elasticmapreduce_UnderReplicatedBlocks	UnderReplicatedBlocks	Count	Average	Number of blocks that need to be replicated one or more times.
aws_elasticmapreduce_dfs_FSNamesystem_PendingReplicationBlocks	dfs.FSNamesystem.PendingReplicationBlocks	Count	Average	Status of block replication: blocks being replicated, age of replication requests, and unsuccessful replication requests.
aws_elasticmapreduce_ContainerPendingRatio	Container Pending Ratio	Count	Average	Ratio of pending containers to containers allocated (ContainerPendingRatio = ContainerPending / ContainerAllocated). If ContainerAllocated = 0, then ContainerPendingRatio = ContainerPending. The value of ContainerPendingRatio represents a number, not a percentage. This value is useful for scaling cluster resources based on container allocation behavior.
aws_elasticmapreduce_AppsFailed	Apps Failed	Count	Average	Number of applications submitted to YARN that have failed to complete.
aws_elasticmapreduce_YARNMemoryAvailablePercentage	YARN Memory Available Percentage	Percent	Average	Percentage of remaining memory available to YARN (YARNMemoryAvailablePercentage = MemoryAvailableMB / MemoryTotalMB). This value is useful for scaling cluster resources based on YARN memory usage.
cloud.instance.state	Status/State	n/a	n/a	n/a
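
The two derived metrics in the table, ContainerPendingRatio and YARNMemoryAvailablePercentage, are defined by simple formulas, so a small worked example may help when checking reported values by hand. The sketch below is illustrative only and is not part of OpsRamp or AWS tooling. The function names are hypothetical, multiplying by 100 is an assumption based on the Percent unit listed in the table, and the behavior when MemoryTotalMB is 0 is not defined in the table, so returning 0.0 is an arbitrary choice.

```python
def container_pending_ratio(container_pending, container_allocated):
    """Return ContainerPending / ContainerAllocated.

    Per the table, when ContainerAllocated is 0 the ratio is defined
    as ContainerPending itself. The result is a plain number, not a
    percentage.
    """
    if container_allocated == 0:
        return container_pending
    return container_pending / container_allocated


def yarn_memory_available_percentage(memory_available_mb, memory_total_mb):
    """Return MemoryAvailableMB / MemoryTotalMB as a percentage.

    Assumption: the quotient is scaled by 100 because the table lists
    the unit as Percent.
    """
    if memory_total_mb == 0:
        # Assumption: this case is not defined in the table; 0.0 is
        # an illustrative placeholder.
        return 0.0
    return 100.0 * memory_available_mb / memory_total_mb


if __name__ == "__main__":
    # 6 pending containers against 24 allocated.
    print(container_pending_ratio(6, 24))
    # Nothing allocated yet: the ratio falls back to the pending count.
    print(container_pending_ratio(6, 0))
    # 4096 MB available out of 16384 MB total.
    print(yarn_memory_available_percentage(4096, 16384))
```

With these inputs, the first call gives 0.25, the second gives 6, and the third gives 25.0. A rising ContainerPendingRatio together with a falling YARNMemoryAvailablePercentage suggests that the cluster is short of capacity, which is why both values are described as useful for scaling decisions.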

Event support

CloudTrail event support

Supported
Configurable in OpsRamp AWS Integration Discovery Profile.

CloudWatch alarm support

Supported
Configurable in OpsRamp AWS Integration Discovery Profile.

External reference

What Is Amazon EMR?