Managing Alert Correlation

Introduction

Alert correlation groups similar alerts into a single Inference to reduce the noise created by individual alerts. You manage the Inference instead of addressing each alert separately, which reduces the effort of sifting through multiple alerts.

Prerequisites

To access alert correlation policies, OpsQ View and OpsQ Manage permissions are required.
To create an alert correlation policy, Partner Administrator or Client Administrator roles are required.

Creating alert correlation policies

To create an alert correlation policy:

From All Clients, select a client.
Go to Setup > Alerts > Alert Correlation and click the + Add button.
From CREATE ALERT CORRELATION POLICY, provide the policy Name, select the Client from the drop-down list and then select the required Mode from the drop-down list.
Configure the Filter Criteria to select resources whose alerts will match the policy.
1. Turn ON the toggle button.
2. From the displayed alert conditions, choose whether ANY or ALL of the defined conditions must match for the filter to apply to the alerts.
3. Add conditions for an alert based on the alert properties listed:
  1. Select Native Attributes to filter resources based on predefined attributes.
  2. Select the required attribute, an operator from the drop-down list, and then provide the value.
    Note: Click + to add additional filter criteria.
From the Policy Definition section, configure the following:
1. Provide the Inference subject.
- You can use alert and resource tokens to configure the Inference subject.
- If the subject is not provided, OpsRamp uses the subject of the first alert as the Inference subject.
2. From the Correlate using Time section, select either Alert sequence recommended by machine learning model or Within Time window. If you select the Within Time window option, select the time from the drop-down list.
3. To further reinforce the correlation, upload a CSV file or configure Topology.
4. Click +Alert Similarity Rule, then select the attribute and the appropriate matching condition from the drop-down lists.
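As an illustration of how the ANY and ALL options in the filter criteria could be evaluated, the following minimal Python sketch matches a resource against a list of attribute conditions. The `Condition` class, the operator names, and the attribute keys are hypothetical and are not part of any OpsRamp API; the operators offered in the product may differ.

```python
import operator
from dataclasses import dataclass

# Hypothetical operator names for illustration; the drop-down list in the
# product may offer different operators.
OPERATORS = {
    "equals": operator.eq,
    "not_equals": operator.ne,
    "contains": lambda actual, expected: expected in actual,
}


@dataclass
class Condition:
    attribute: str  # for example, a native attribute of the resource
    op: str         # key into OPERATORS
    value: str

    def matches(self, resource: dict) -> bool:
        actual = resource.get(self.attribute)
        # In this sketch, a resource without the attribute never matches.
        if actual is None:
            return False
        return OPERATORS[self.op](actual, self.value)


def resource_matches(resource: dict, conditions: list, mode: str = "ALL") -> bool:
    """Apply the ANY/ALL choice to a list of conditions."""
    results = [condition.matches(resource) for condition in conditions]
    if mode == "ALL":
        return all(results)
    if mode == "ANY":
        return any(results)
    raise ValueError(f"unknown mode: {mode}")


conditions = [
    Condition("type", "equals", "VMware"),
    Condition("name", "contains", "prod"),
]
resource = {"type": "VMware", "name": "web-test-01"}

print(resource_matches(resource, conditions, "ANY"))  # True: the type matches
print(resource_matches(resource, conditions, "ALL"))  # False: the name lacks "prod"
```

With ANY, one matching condition is enough to select the resource; with ALL, every condition must hold.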

Editing alert correlation policy

To edit the alert correlation policy:

From All Clients, select a client.
Go to Setup > Alerts > Alert Correlation. The ALERT CORRELATION POLICIES page is displayed.
From the ALERT CORRELATION POLICIES page, click the required alert correlation policy name.
Click Edit and configure the policy details.
Click Save.

Changing alert correlation policy state

To change the mode of an alert correlation policy:

From All Clients, select a client.
Go to Setup > Alert Management > Alert Correlation. The Alert Correlation Policy page is displayed with the list of all Alert Correlation Policies created.
From the Alert Correlation Policy page, select the desired mode from the Mode drop-down menu. The selected mode is displayed in the Mode column.

Deleting alert correlation policy

You can delete an alert correlation policy to remove it from the system. After a policy is deleted, newly ingested alerts that would have matched it are no longer correlated. Alert correlation policies are typically deleted in the following situations:

The device/resource generating the alerts is unavailable.
You do not want to correlate the alerts.

To delete the alert correlation policy:

From All Clients, select a client.
Go to Setup > Alert Management > Alert Correlation.
From ALERT CORRELATION POLICIES LIST, select the checkbox next to the desired policy name and click Delete.
From the confirmation popup, click Yes to delete. The selected alert correlation policy is deleted.

Defining precedence

Precedence determines the order in which alert correlation policies are executed. For example, if a VMware resource is covered by both an agent status alert correlation policy and a network outage alert correlation policy, you can determine which policy executes first to correlate alerts from that resource.

To determine the precedence:

From All Clients, select a client.
Go to Setup > Alerts > Alert Correlation.
Drag the alert correlation policy to the appropriate row to adjust the order.
The numbers in the Precedence column change accordingly.
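The effect of precedence can be sketched in a few lines of Python. This is a simplified model under two stated assumptions that the text above does not confirm: that a lower number in the Precedence column means earlier execution, and that the first matching policy is the one that correlates the alert. The `Policy` class and the `resource_type` key are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Policy:
    name: str
    precedence: int                     # assumption: 1 is evaluated first
    applies_to: Callable[[dict], bool]  # stand-in for the filter criteria


def first_matching_policy(alert: dict, policies: list) -> Optional[Policy]:
    """Return the matching policy with the lowest precedence number."""
    for policy in sorted(policies, key=lambda p: p.precedence):
        if policy.applies_to(alert):
            return policy
    return None


def move_policy(policies: list, name: str, new_row: int) -> list:
    """Mimic dragging a policy to a new row and renumbering the column."""
    ordered = sorted(policies, key=lambda p: p.precedence)
    moved = next(p for p in ordered if p.name == name)
    ordered.remove(moved)
    ordered.insert(new_row - 1, moved)
    for row, policy in enumerate(ordered, start=1):
        policy.precedence = row
    return ordered


def is_vmware(alert: dict) -> bool:
    return alert.get("resource_type") == "VMware"


policies = [
    Policy("Agent status", 1, is_vmware),
    Policy("Network outage", 2, is_vmware),
]
alert = {"resource_type": "VMware", "metric": "Agent Status"}

print(first_matching_policy(alert, policies).name)  # Agent status
policies = move_policy(policies, "Network outage", 1)
print(first_matching_policy(alert, policies).name)  # Network outage
```

Moving a policy renumbers every row, which mirrors how the Precedence column updates after a drag.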

Viewing alert sequences

The Alert Sequence Clusters window helps you to visualize the detected alert sequences in your environment. You can view the alert sequences detected from the existing alert data and sequences related to an Inference.

These sequences are unmodified alert sequences fetched from the existing alert data.

OpsRamp groups similar alert sequences together and provides a count for each sequence, so you can see both the sequences and the number of times alerts were triggered in each sequence.

The Alert Sequence Clusters window serves as a verification of ML correlation. For example, if ML correlates the alerts cpu.utilization and system.ping, you can use the Alert Sequence Clusters window to find the sequences that contain both cpu.utilization and system.ping.
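The idea of grouping sequences, counting them, and then looking for sequences that contain a given pair of metrics can be sketched as follows. This is only an illustration: it treats sequences as "similar" when they are identical, which is an assumption, and the metric names `disk.usage` and `memory.usage` are made up for the sample data.

```python
from collections import Counter


def cluster_sequences(sequences: list) -> Counter:
    """Count identical alert sequences; order within a sequence matters."""
    return Counter(tuple(sequence) for sequence in sequences)


def sequences_containing(clusters: Counter, *metrics: str) -> dict:
    """Return the clustered sequences that include every given metric."""
    wanted = set(metrics)
    return {seq: count for seq, count in clusters.items() if wanted <= set(seq)}


# Sample data; only cpu.utilization and system.ping come from the text above.
sequences = [
    ["cpu.utilization", "system.ping"],
    ["cpu.utilization", "system.ping"],
    ["disk.usage", "system.ping"],
    ["cpu.utilization", "memory.usage", "system.ping"],
]

clusters = cluster_sequences(sequences)
matches = sequences_containing(clusters, "cpu.utilization", "system.ping")
for sequence, count in matches.items():
    print(count, " -> ".join(sequence))
```

A high count for a sequence that contains both metrics supports the correlation the ML model proposed; a pair that never appears together would be a reason to look closer.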

To view the alert sequences detected from existing alert data:

From All Clients, select a client.
Go to Setup > Alert Management > Alert Correlation.
Click an ML-based alert correlation policy.
Note: You can identify an ML-based alert correlation policy by its ML Status, which shows a status such as Training Started or Ready.
From the Policy Definition field, click Detected alert sequence patterns in alert data.

To view alert sequences related to an Inference:

From All Clients, select a client.
Go to Alerts.
Click the required Inference name. The Alert Details page is displayed.
Click the Correlated Alerts tab. The list of correlated alerts appears.
Click Show detected alert sequence patterns.

Removing alerts from an inference

You can remove alerts from an Inference. The alerts can be removed from either the Quick view window or the Alert Details page.

For example, if you do not want an alert to be correlated, you can remove an alert from the Inference. The removed alert then appears on the alerts browser as an individual alert.

Important

If an Inference has two correlated alerts, removing one of them turns both alerts into individual alerts, and the Inference is automatically uncorrelated.

To remove alerts from the quick view:

On the Alerts Browser page, provide the alert ID in the search box.
The alert is displayed on the Browser page along with the number of correlated alerts.
Click on the number adjacent to the alert subject.
Select the required alert and then click Remove.

The alert is removed from the Inference. A comment appears in the Details tab, as shown in the screenshot below.
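The removal behavior described in this section, including the two-alert case noted under Important, can be modeled with a short Python sketch. It is a simplified reading of the text, not the product's implementation: the function name and the alert IDs are hypothetical, and the rule that fewer than two remaining alerts releases everything is inferred from the note.

```python
def remove_alert(correlated: list, alert_id: str) -> tuple:
    """Return (alerts still correlated, alerts released as individual alerts)."""
    if alert_id not in correlated:
        raise ValueError(f"{alert_id} is not part of this Inference")
    remaining = [a for a in correlated if a != alert_id]
    if len(remaining) < 2:
        # Inferred from the note: with only one alert left, both alerts
        # become individual alerts and nothing stays correlated.
        return [], [alert_id] + remaining
    return remaining, [alert_id]


print(remove_alert(["A1", "A2", "A3"], "A2"))  # (['A1', 'A3'], ['A2'])
print(remove_alert(["A1", "A2"], "A1"))        # ([], ['A1', 'A2'])
```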

Viewing inference statistics

The Inference Stats widget displays the statistics of Inferences generated within a partner or client.

The widget has the following information:

Total Events: Refers to the total number of events generated.
Total Alerts: Refers to the total number of alerts created after ingestion in OpsRamp.
Total Inferences: Refers to the total number of Inferences generated.
Total Correlated Alerts: Refers to the total number of alerts correlated.
Volume Optimized: Refers to the percentage of reduction in alerts volume due to alert correlation.
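To make the Volume Optimized figure concrete, here is one plausible way such a percentage could be derived from the other counters. The formula is an assumption made for illustration, not the documented calculation: it supposes that each group of correlated alerts is replaced by a single Inference, so the number of items left to handle is the total alerts minus the correlated alerts plus the Inferences.

```python
def volume_optimized(total_alerts: int, correlated_alerts: int, inferences: int) -> float:
    """Percentage reduction in alert volume (assumed formula, see lead-in)."""
    if total_alerts <= 0:
        return 0.0
    # Items an operator still sees: uncorrelated alerts plus one per Inference.
    visible = total_alerts - correlated_alerts + inferences
    return 100.0 * (total_alerts - visible) / total_alerts


# 1000 alerts, 600 of them correlated into 50 Inferences:
# 450 items remain visible, a 55% reduction.
print(volume_optimized(1000, 600, 50))  # 55.0
```

Under this assumption, the figure rises when many alerts collapse into few Inferences and is zero when nothing is correlated.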

Creating Inference Stats widgets

To create an Inference Stats widget:

From All Clients, select a client.
Go to Dashboard > +Add Widget.
From OTHER PREDEFINED WIDGET, click Inference Stats.
Configure the following parameters:
- Time Range: Filter for Inferences triggered within a certain time span. The default time span is Last 4 hours.
- Refresh every: Refers to the frequency at which the widget refreshes and displays recent data. The default refresh time is 5 minutes.
- Inference Stats: Refers to the mode of Inferences that must be included in the widget.
  - Select Enabled policies only to view the statistics of enabled (ON mode) inferences.
    - If this mode is selected, then the total number of inferences and the total number of correlated alerts created from the enabled correlation policies appear on the widget.
    - In this widget, the volume optimization is based on inferences and correlated alerts created from the enabled correlation policies.
  - Select Enabled and Observed policies to view statistics of enabled and observed inferences.
    - If this mode is selected, then the total number of inferences and the total number of correlated alerts created from both the enabled and observed correlation policies appear on the widget.
    - In this widget, the volume optimization is based on the inferences and correlated alerts created from both the enabled and observed correlation policies.
- Widget Title: Refers to the name of the widget.
Select the Chart Style and click Save.

The Inference Stats widget is created and appears on the dashboard.

Scenarios

Correlate alerts due to an unexpected cause

The DevOps team just rolled out a new code update to an app running on multiple servers. The update has a bug that causes high memory utilization on each app instance, generating multiple Critical and Warning alerts. These alerts are causing multiple issues across the infrastructure. The DevOps team is receiving the alerts one after another, which makes it difficult for the team to diagnose the problem.

Solution:

Define an Alert Correlation Policy to correlate alerts that have similar content.
Configure an alert condition on the Alert Source attribute to filter alerts generated by the same app name.
Alerts that carry the app name and are generated within the specified time span are correlated to form an Inference.

A customer restarts the agent on VMware resources. As a result, multiple agent status alerts are generated, causing a high amount of alert noise. The customer wants the agent status alerts generated within 1 hour to appear as a single alert to reduce the noise.

Solution:
Define an alert correlation policy to correlate the agent status alerts generated on VMware resources within a span of 1 hour. Provide the metric that monitors the agent status of the resources.

Filter for VMware resources on which the Alert Correlation Policy should be applied using Native/Custom attributes.
From the Policy Definition section, select time span from the Within Time window drop-down.
Click +Add Alert Similarity, select Alert Metric from the attribute drop-down list, select the operator, and provide Agent Status as the value.
Alerts generated within the specified time span are correlated to form an Inference.
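The combination of a time window and a similarity rule on the alert metric, as used in this scenario, can be sketched in Python. This is a minimal model under assumptions not stated in the text: the window is anchored at the first alert of each group rather than sliding, and the `Alert` class and function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    alert_id: str
    metric: str
    created: datetime


def group_by_window(alerts: list, metric: str, window: timedelta) -> list:
    """Group alerts with the given metric that fall within one time window."""
    groups = []
    window_start = None
    for alert in sorted(alerts, key=lambda a: a.created):
        if alert.metric != metric:
            continue  # the similarity rule excludes other metrics
        # Assumption: a window opens at the first alert of each group.
        if window_start is None or alert.created - window_start > window:
            window_start = alert.created
            groups.append([])
        groups[-1].append(alert)
    return groups


base = datetime(2024, 1, 1, 9, 0)
alerts = [
    Alert("a1", "Agent Status", base),
    Alert("a2", "Agent Status", base + timedelta(minutes=20)),
    Alert("a3", "cpu.utilization", base + timedelta(minutes=30)),
    Alert("a4", "Agent Status", base + timedelta(minutes=59)),
    Alert("a5", "Agent Status", base + timedelta(minutes=125)),
]

for group in group_by_window(alerts, "Agent Status", timedelta(hours=1)):
    print([alert.alert_id for alert in group])
# ['a1', 'a2', 'a4']
# ['a5']
```

In this model the three agent status alerts raised within the first hour would surface as one Inference, while the later alert starts a new window.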

What to do next