Introduction
Amazon SageMaker is a fully managed machine learning service. With Amazon SageMaker, data scientists and developers can quickly and easily build and train machine learning models, and then directly deploy them into a production-ready hosted environment.
SageMaker provides:
- An integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so you do not have to manage servers.
- Common machine learning algorithms that are optimized to run efficiently against extremely large data in a distributed environment.
With native support for bring-your-own-algorithms and frameworks, Amazon SageMaker offers flexible distributed training options that adjust to your specific workflows. Deploy a model into a secure and scalable environment by launching it with a single click from the Amazon SageMaker console. Training and hosting are billed by minutes of usage, with no minimum fees and no upfront commitments.
Amazon SageMaker Ground Truth
Amazon SageMaker Ground Truth helps you build high-quality training datasets by using workers along with machine learning to create labeled datasets.
Amazon SageMaker Training
An Amazon SageMaker training job is an iterative process that teaches a model to make predictions by presenting examples from a training dataset. Typically, a training algorithm computes several metrics, such as training error and prediction accuracy. These metrics help diagnose whether the model is learning well and will generalize well when making predictions on unseen data. The training algorithm writes the values of these metrics to logs, which Amazon SageMaker monitors and sends to Amazon CloudWatch in real time.
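For custom algorithms, the metrics written to the logs are typically picked up through metric definitions, each pairing a metric name with a regular expression. The following is a minimal local sketch of that idea, not SageMaker's own implementation: the metric names, the regular expressions, and the log line format are all hypothetical.

```python
import re

# Hypothetical metric definitions: each pairs a metric name with a regex
# whose first capture group holds the numeric value.
METRIC_DEFINITIONS = [
    {"Name": "train:error", "Regex": r"train_error=([0-9.]+)"},
    {"Name": "validation:accuracy", "Regex": r"val_accuracy=([0-9.]+)"},
]


def extract_metrics(log_line, definitions=METRIC_DEFINITIONS):
    """Return {metric name: value} for every definition that matches the line."""
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], log_line)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found


# Hypothetical log line emitted by a training script.
line = "epoch=3 train_error=0.125 val_accuracy=0.91"
print(extract_metrics(line))
```

Testing the regular expressions locally against sample log lines like this can help confirm that the intended values are captured before a training job is started.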
Amazon SageMaker Endpoint
An endpoint is created from the endpoint configuration specified in the request. Amazon SageMaker uses the endpoint to provision resources and deploy models.
Amazon SageMaker Transform Job
Use batch transform when you need to do the following:
- Preprocess datasets to remove noise or bias that interferes with training or inference from your dataset.
- Get inferences from large datasets.
- Run inference when you do not need a persistent endpoint.
- Associate input records with inferences to assist the interpretation of results.
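The use cases above can be sketched as a request for the boto3 `create_transform_job` operation. This is only an illustration under stated assumptions: the helper name, the job and model names, the S3 URIs, the CSV content type, and the instance type are placeholders. Setting `JoinSource` to `Input` asks SageMaker to join each input record with its inference in the output.

```python
def build_transform_request(job_name, model_name, input_s3_uri, output_s3_uri):
    """Build keyword arguments for create_transform_job (values are placeholders)."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",
            "SplitType": "Line",  # treat each line as one record
        },
        "TransformOutput": {
            "S3OutputPath": output_s3_uri,
            "Accept": "text/csv",
            "AssembleWith": "Line",
        },
        # No persistent endpoint: instances exist only for the duration of the job.
        "TransformResources": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
        # Associate each input record with its inference in the output.
        "DataProcessing": {"JoinSource": "Input"},
    }


request = build_transform_request(
    "demo-transform-job",
    "demo-model",
    "s3://example-bucket/input/",
    "s3://example-bucket/output/",
)
# With real credentials:
#   import boto3
#   boto3.client("sagemaker").create_transform_job(**request)
print(request["DataProcessing"])
```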
Setup
To set up the integration:
- Select SageMaker GroundTruth in the AWS Integration Discovery Profile to discover AWS SageMaker GroundTruth.
- Select SageMaker Training in the AWS Integration Discovery Profile to discover AWS SageMaker Training Job.
- Select SageMaker EndPoint in the AWS Integration Discovery Profile to discover AWS SageMaker Endpoint.
- Select SageMaker Transform Job in the AWS Integration Discovery Profile to discover AWS SageMaker Transform Job.
Metrics
GroundTruth metrics
OpsRamp Metric | Metric Display Name | Unit | Aggregation Type | Description |
---|---|---|---|---|
aws_sagemaker_labelingjobs_ActiveWorkers | ActiveWorkers | Count | Sum | Number of workers on a private work team performing a labeling job. |
aws_sagemaker_labelingjobs_JobsSucceeded | JobsSucceeded | None | Sum | Number of labeling jobs that succeeded. To get the total number of labeling jobs that succeeded, use the Sum statistic. |
aws_sagemaker_labelingjobs_DatasetObjectsAutoAnnotated | DatasetObjectsAutoAnnotated | Count | Sum | Number of dataset objects auto-annotated in a labeling job. |
aws_sagemaker_labelingjobs_DatasetObjectsHumanAnnotated | DatasetObjectsHumanAnnotated | Count | Sum | Number of dataset objects annotated by a human in a labeling job. |
aws_sagemaker_labelingjobs_DatasetObjectsLabelingFailed | DatasetObjectsLabelingFailed | Count | Sum | Number of dataset objects that failed labeling in a labeling job. |
aws_sagemaker_labelingjobs_TotalDatasetObjectsLabeled | TotalDatasetObjectsLabeled | Count | Sum | Number of dataset objects labeled successfully in a labeling job. |
aws_sagemaker_labelingjobs_JobsStopped | JobsStopped | Count | Sum | Number of labeling jobs that were stopped. |
Training metrics
OpsRamp Metric | Metric Display Name | Unit | Aggregation Type | Description |
---|---|---|---|---|
aws_sagemaker_trainingjobs_CPUUtilization | CPUUtilization | Percent | Average | Percentage of CPU units used by the containers on an instance. |
aws_sagemaker_trainingjobs_MemoryUtilization | MemoryUtilization | Percent | Average | Percentage of memory used by the containers on an instance. |
aws_sagemaker_trainingjobs_GPUUtilization | GPUUtilization | Percent | Average | Percentage of GPU units used by the containers on an instance. |
aws_sagemaker_trainingjobs_GPUMemoryUtilization | GPUMemoryUtilization | Percent | Average | Percentage of GPU memory used by the containers on an instance. |
aws_sagemaker_trainingjobs_DiskUtilization | DiskUtilization | Percent | Average | Percentage of disk space used by the containers on an instance. |
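
To cross-check one of these values directly in CloudWatch, the OpsRamp metric name can be mapped back to its display name and queried with the boto3 `get_metric_statistics` operation. The sketch below only builds the request. The `/aws/sagemaker/TrainingJobs` namespace and the `Host` dimension follow the AWS documentation as best understood here and should be treated as assumptions; the helper name and the host value are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CloudWatch namespace used for training job instance metrics.
TRAINING_NAMESPACE = "/aws/sagemaker/TrainingJobs"


def build_metric_query(opsramp_metric, host, minutes=60, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": TRAINING_NAMESPACE,
        # The display name is the part after the last underscore,
        # e.g. aws_sagemaker_trainingjobs_CPUUtilization -> CPUUtilization.
        "MetricName": opsramp_metric.rsplit("_", 1)[-1],
        # Assumption: the Host dimension looks like "<training-job-name>/algo-<n>".
        "Dimensions": [{"Name": "Host", "Value": host}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # seconds per data point
        "Statistics": ["Average"],  # matches the Aggregation Type column
    }


query = build_metric_query(
    "aws_sagemaker_trainingjobs_CPUUtilization", "demo-training-job/algo-1"
)
# With real credentials:
#   import boto3
#   boto3.client("cloudwatch").get_metric_statistics(**query)
print(query["MetricName"])
```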
Event support
CloudTrail event support
- Supported (SageMaker GroundTruth, Training, Endpoint, Transform Job)
- Configurable in the OpsRamp AWS Integration Discovery Profile.
CloudWatch alarm support
- Not Supported