Introduction
The Azure Batch AI service has been retired.
The at-scale training capabilities of Batch AI are now available in the Azure Machine Learning service. Along with many other machine learning capabilities, Azure Machine Learning includes a cloud-based managed compute target for training and batch scoring machine learning models. Azure Machine Learning is generally available, which means it comes with a committed SLA and a choice of support plans. Pricing for Azure infrastructure should not differ between the Batch AI service and the Azure Machine Learning service, because in both cases only the underlying compute is charged.
Note
Use the OpsRamp Azure Public Cloud Integration to discover and collect metrics against Azure Batch AI Workspaces.
Setup
To set up the OpsRamp Azure integration and discover the Azure service, go to Azure Integration Discovery Profile and select Machine Learning Services Workspaces.
Metrics
OpsRamp Metric | Metric Display Name | Unit | Aggregation Type | Description |
---|---|---|---|---|
azure_ml_services_workspaces_Active_Cores | Active Cores | Count | Average | Number of active cores. |
azure_ml_services_workspaces_Active_Nodes | Active Nodes | Count | Average | Number of active nodes. |
azure_ml_services_workspaces_Cancel_Requested_Runs | Cancel Requested Runs | Count | Total | Number of runs where cancel was requested for this workspace. |
azure_ml_services_workspaces_Cancelled_Runs | Cancelled Runs | Count | Total | Number of runs cancelled for this workspace. |
azure_ml_services_workspaces_Completed_Runs | Completed Runs | Count | Total | Number of runs completed successfully for this workspace. |
azure_ml_services_workspaces_CpuUtilization | CpuUtilization | Count | Average | Percentage of memory utilization on a CPU node. |
azure_ml_services_workspaces_Errors | Errors | Count | Total | Number of run errors in this workspace. |
azure_ml_services_workspaces_Failed_Runs | Failed Runs | Count | Total | Number of runs failed for this workspace. |
azure_ml_services_workspaces_Finalizing_Runs | Finalizing Runs | Count | Total | Number of runs entered finalizing state for this workspace. |
azure_ml_services_workspaces_GpuUtilization | GpuUtilization | Count | Average | Percentage of memory utilization on a GPU node. |
azure_ml_services_workspaces_Idle_Cores | Idle Cores | Count | Average | Number of idle cores. |
azure_ml_services_workspaces_Idle_Nodes | Idle Nodes | Count | Average | Number of idle nodes. |
azure_ml_services_workspaces_Leaving_Cores | Leaving Cores | Count | Average | Number of leaving cores. |
azure_ml_services_workspaces_Model_Deploy_Failed | Model Deploy Failed | Count | Total | Number of model deployments that failed in this workspace. |
azure_ml_services_workspaces_Model_Deploy_Started | Model Deploy Started | Count | Total | Number of model deployments started in this workspace. |
azure_ml_services_workspaces_Model_Deploy_Succeeded | Model Deploy Succeeded | Count | Total | Number of model deployments that succeeded in this workspace. |
azure_ml_services_workspaces_Model_Register_Failed | Model Register Failed | Count | Total | Number of model registrations that failed in this workspace. |
azure_ml_services_workspaces_Model_Register_Succeeded | Model Register Succeeded | Count | Total | Number of model registrations that succeeded in this workspace. |
azure_ml_services_workspaces_Not_Responding_Runs | Not Responding Runs | Count | Total | Number of runs not responding for this workspace. |
azure_ml_services_workspaces_Not_Started_Runs | Not Started Runs | Count | Total | Number of runs in Not Started state for this workspace. |
azure_ml_services_workspaces_Preempted_Cores | Preempted Cores | Count | Average | Number of preempted cores. |
azure_ml_services_workspaces_Preempted_Nodes | Preempted Nodes | Count | Average | Number of preempted nodes. |
azure_ml_services_workspaces_Preparing_Runs | Preparing Runs | Count | Total | Number of runs that are preparing for this workspace. |
azure_ml_services_workspaces_Provisioning_Runs | Provisioning Runs | Count | Total | Number of runs that are provisioning for this workspace. |
azure_ml_services_workspaces_Queued_Runs | Queued Runs | Count | Total | Number of runs that are queued for this workspace. |
azure_ml_services_workspaces_Quota_Utilization_Percentage | Quota Utilization Percentage | Count | Average | Percent of quota utilized. |
azure_ml_services_workspaces_Started_Runs | Started Runs | Count | Total | Number of runs running for this workspace. |
azure_ml_services_workspaces_Starting_Runs | Starting Runs | Count | Total | Number of runs started for this workspace. |
azure_ml_services_workspaces_Total_Cores | Total Cores | Count | Average | Number of total cores. |
azure_ml_services_workspaces_Total_Nodes | Total Nodes | Count | Average | Number of total nodes. |
azure_ml_services_workspaces_Unusable_Cores | Unusable Cores | Count | Average | Number of unusable cores. |
azure_ml_services_workspaces_Unusable_Nodes | Unusable Nodes | Count | Average | Number of unusable nodes. |
azure_ml_services_workspaces_Warnings | Warnings | Count | Total | Number of run warnings in this workspace. |
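
The metric names in the table above appear to follow one pattern: the prefix `azure_ml_services_workspaces_` followed by the metric display name with spaces replaced by underscores. The following is a minimal Python sketch of that mapping, assuming the pattern holds for every row listed here. The function names `opsramp_metric_name` and `display_name` are hypothetical helpers for illustration and are not part of any OpsRamp or Azure API.

```python
# Prefix shared by every OpsRamp metric name in the table above.
PREFIX = "azure_ml_services_workspaces_"


def opsramp_metric_name(display: str) -> str:
    """Build an OpsRamp metric name from a metric display name.

    Assumption: spaces in the display name become underscores and
    the casing is kept as-is, as observed in the table.
    """
    return PREFIX + "_".join(display.split())


def display_name(metric: str) -> str:
    """Recover the display name from an OpsRamp metric name."""
    if not metric.startswith(PREFIX):
        raise ValueError(f"not a Machine Learning workspace metric: {metric!r}")
    return metric[len(PREFIX):].replace("_", " ")


if __name__ == "__main__":
    print(opsramp_metric_name("Active Cores"))
    print(display_name("azure_ml_services_workspaces_Quota_Utilization_Percentage"))
```

A helper like this can be useful when building dashboards or alert filters from a list of display names, but verify the generated names against the table before relying on them.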
Event support
- Supported: Azure events for Azure Machine Learning Services Workspaces
- Configure Azure Events in the OpsRamp Azure Integration Discovery Profile.