Introduction

The Azure Batch AI service has been retired.

The at-scale training capabilities of Batch AI are now available in the Azure Machine Learning service. Along with many other machine learning capabilities, it includes a cloud-based managed compute target for training and batch scoring machine learning models. The Azure Machine Learning service is generally available, which means that it comes with a committed SLA and a choice of support plans. Pricing for Azure infrastructure should not differ between the Batch AI service and the Azure Machine Learning service, because both charge only for the underlying compute.

Setup

To set up the OpsRamp Azure integration and discover this Azure service, go to the Azure Integration Discovery Profile and select Machine Learning Services Workspaces.

Metrics

| OpsRamp Metric | Metric Display Name | Unit | Aggregation Type | Description |
|---|---|---|---|---|
| azure_ml_services_workspaces_Active_Cores | Active Cores | Count | Average | Number of active cores. |
| azure_ml_services_workspaces_Active_Nodes | Active Nodes | Count | Average | Number of active nodes. |
| azure_ml_services_workspaces_Cancel_Requested_Runs | Cancel Requested Runs | Count | Total | Number of runs where cancel was requested for this workspace. |
| azure_ml_services_workspaces_Cancelled_Runs | Cancelled Runs | Count | Total | Number of runs cancelled for this workspace. |
| azure_ml_services_workspaces_Completed_Runs | Completed Runs | Count | Total | Number of runs completed successfully for this workspace. |
| azure_ml_services_workspaces_CpuUtilization | CpuUtilization | Count | Average | Percentage of memory utilization on a CPU node. |
| azure_ml_services_workspaces_Errors | Errors | Count | Total | Number of run errors in this workspace. |
| azure_ml_services_workspaces_Failed_Runs | Failed Runs | Count | Total | Number of runs failed for this workspace. |
| azure_ml_services_workspaces_Finalizing_Runs | Finalizing Runs | Count | Total | Number of runs entered finalizing state for this workspace. |
| azure_ml_services_workspaces_GpuUtilization | GpuUtilization | Count | Average | Percentage of memory utilization on a GPU node. |
| azure_ml_services_workspaces_Idle_Cores | Idle Cores | Count | Average | Number of idle cores. |
| azure_ml_services_workspaces_Idle_Nodes | Idle Nodes | Count | Average | Number of idle nodes. |
| azure_ml_services_workspaces_Leaving_Cores | Leaving Cores | Count | Average | Number of leaving cores. |
| azure_ml_services_workspaces_Model_Deploy_Failed | Model Deploy Failed | Count | Total | Number of model deployments that failed in this workspace. |
| azure_ml_services_workspaces_Model_Deploy_Started | Model Deploy Started | Count | Total | Number of model deployments started in this workspace. |
| azure_ml_services_workspaces_Model_Deploy_Succeeded | Model Deploy Succeeded | Count | Total | Number of model deployments that succeeded in this workspace. |
| azure_ml_services_workspaces_Model_Register_Failed | Model Register Failed | Count | Total | Number of model registrations that failed in this workspace. |
| azure_ml_services_workspaces_Model_Register_Succeeded | Model Register Succeeded | Count | Total | Number of model registrations that succeeded in this workspace. |
| azure_ml_services_workspaces_Not_Responding_Runs | Not Responding Runs | Count | Total | Number of runs not responding for this workspace. |
| azure_ml_services_workspaces_Not_Started_Runs | Not Started Runs | Count | Total | Number of runs in Not Started state for this workspace. |
| azure_ml_services_workspaces_Preempted_Cores | Preempted Cores | Count | Average | Number of preempted cores. |
| azure_ml_services_workspaces_Preempted_Nodes | Preempted Nodes | Count | Average | Number of preempted nodes. |
| azure_ml_services_workspaces_Preparing_Runs | Preparing Runs | Count | Total | Number of runs that are preparing for this workspace. |
| azure_ml_services_workspaces_Provisioning_Runs | Provisioning Runs | Count | Total | Number of runs that are provisioning for this workspace. |
| azure_ml_services_workspaces_Queued_Runs | Queued Runs | Count | Total | Number of runs that are queued for this workspace. |
| azure_ml_services_workspaces_Quota_Utilization_Percentage | Quota Utilization Percentage | Count | Average | Percent of quota utilized. |
| azure_ml_services_workspaces_Started_Runs | Started Runs | Count | Total | Number of runs running for this workspace. |
| azure_ml_services_workspaces_Starting_Runs | Starting Runs | Count | Total | Number of runs started for this workspace. |
| azure_ml_services_workspaces_Total_Cores | Total Cores | Count | Average | Number of total cores. |
| azure_ml_services_workspaces_Total_Nodes | Total Nodes | Count | Average | Number of total nodes. |
| azure_ml_services_workspaces_Unusable_Cores | Unusable Cores | Count | Average | Number of unusable cores. |
| azure_ml_services_workspaces_Unusable_Nodes | Unusable Nodes | Count | Average | Number of unusable nodes. |
| azure_ml_services_workspaces_Warnings | Warnings | Count | Total | Number of run warnings in this workspace. |
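
The metric names listed above appear to follow a simple pattern: the prefix azure_ml_services_workspaces_ followed by the metric display name with spaces replaced by underscores. The Python sketch below illustrates that pattern and the difference between the Average and Total aggregation types. It is only an illustration inferred from the list, not an OpsRamp or Azure API; the function names opsramp_metric_name and aggregate, and the sample values, are hypothetical.

```python
# Illustrative sketch only: the naming rule is inferred from the metric list,
# and these helper names are hypothetical (not part of any OpsRamp or Azure SDK).

PREFIX = "azure_ml_services_workspaces_"


def opsramp_metric_name(display_name: str) -> str:
    """Build the OpsRamp metric name from a metric display name."""
    return PREFIX + display_name.strip().replace(" ", "_")


def aggregate(samples, aggregation_type: str) -> float:
    """Combine raw samples from one interval according to the aggregation type."""
    if not samples:
        raise ValueError("at least one sample is required")
    if aggregation_type == "Total":
        # Total: sum of all samples, e.g. run counts.
        return float(sum(samples))
    if aggregation_type == "Average":
        # Average: arithmetic mean, e.g. core or node counts.
        return sum(samples) / len(samples)
    raise ValueError(f"unsupported aggregation type: {aggregation_type}")


if __name__ == "__main__":
    print(opsramp_metric_name("Active Cores"))
    print(aggregate([2, 3, 5], "Total"))    # made-up Failed Runs samples
    print(aggregate([4, 8], "Average"))     # made-up Active Cores samples
```

For example, the display name Queued Runs maps to azure_ml_services_workspaces_Queued_Runs, and a Total metric sums the samples in an interval while an Average metric takes their mean.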

Event support

  • Supported: Azure events for Azure Machine Learning Services Workspaces
  • Configure Azure events in the OpsRamp Azure Integration Discovery Profile.

External reference