Alerting Service Condition Types
This document describes the implementation of all condition types in the alerting service. The alerting service provides a unified interface for generating alerts based on telemetry data, enabling proactive monitoring and issue detection.
Overview
The alerting service supports several types of conditions for triggering alerts:
- Threshold Condition: Triggers when a metric exceeds a threshold
- Frequency Condition: Triggers when events occur at a certain frequency
- Absence Condition: Triggers when no events are received for a period of time
- Change Condition: Triggers when a metric changes by a certain amount
- Custom Condition: Triggers based on custom logic
Condition Types
Threshold Condition
The threshold condition triggers an alert when a metric exceeds a threshold. It supports various comparison operators and aggregation functions.
Properties
- type:
AlertRuleConditionType.THRESHOLD
- metric: The metric to monitor (required)
- threshold: The threshold value (required)
- operator: The comparison operator (required)
gt
: Greater thanlt
: Less thaneq
: Equal tone
: Not equal toge
: Greater than or equal tole
: Less than or equal to
- timeWindow: The time window in seconds (optional)
- properties: Additional properties (optional)
- aggregation: The aggregation function (optional)
avg
: Average (default)max
: Maximummin
: Minimumsum
: Sumcount
: Countlast
: Last value
- aggregation: The aggregation function (optional)
Example
const thresholdCondition: AlertRuleCondition = {
type: AlertRuleConditionType.THRESHOLD,
metric: 'response_time',
threshold: 1000,
operator: 'gt',
timeWindow: 300, // 5 minutes
properties: {
aggregation: 'avg'
}
};
Implementation
The threshold condition is implemented as follows:
- Filter events within the time window if specified
- Extract metric values from events (from properties or measurements)
- Calculate the aggregate value based on the aggregation function
- Compare the aggregate value to the threshold using the specified operator
Frequency Condition
The frequency condition triggers an alert when events occur at a certain frequency. It counts the number of events within a time window and triggers if the count exceeds a threshold.
Properties
- type:
AlertRuleConditionType.FREQUENCY
- timeWindow: The time window in seconds (required)
- minCount: The minimum count of events (required)
Example
const frequencyCondition: AlertRuleCondition = {
type: AlertRuleConditionType.FREQUENCY,
timeWindow: 300, // 5 minutes
minCount: 5
};
Implementation
The frequency condition is implemented as follows:
- Filter events within the time window
- Count the number of events
- Compare the count to the minimum count
Absence Condition
The absence condition triggers an alert when no events are received for a period of time. It's useful for detecting when a service or component is down.
Properties
- type:
AlertRuleConditionType.ABSENCE
- timeWindow: The time window in seconds (required)
Example
const absenceCondition: AlertRuleCondition = {
type: AlertRuleConditionType.ABSENCE,
timeWindow: 300 // 5 minutes
};
Implementation
The absence condition is implemented as follows:
- Filter events within the time window
- Check if there are no events in the time window
Change Condition
The change condition triggers an alert when a metric changes by a certain amount. It supports both absolute and percentage changes.
Properties
- type:
AlertRuleConditionType.CHANGE
- metric: The metric to monitor (required)
- threshold: The threshold value (required)
- operator: The comparison operator (required)
- timeWindow: The time window in seconds (optional)
- properties: Additional properties (optional)
- usePercentage: Whether to use percentage change (optional, default: false)
Example
const changeCondition: AlertRuleCondition = {
type: AlertRuleConditionType.CHANGE,
metric: 'response_time',
threshold: 50,
operator: 'gt',
timeWindow: 300, // 5 minutes
properties: {
usePercentage: true
}
};
Implementation
The change condition is implemented as follows:
- Filter events within the time window if specified
- Extract metric values from events (from properties or measurements)
- Calculate the change between the first and last values
- Calculate the percentage change if usePercentage is true
- Compare the change to the threshold using the specified operator
Custom Condition
The custom condition allows for custom logic to evaluate events. It's useful for complex conditions that can't be expressed using the other condition types.
Properties
- type:
AlertRuleConditionType.CUSTOM
- evaluate: A function that evaluates events and returns a boolean (required)
Example
const customCondition: AlertRuleCondition = {
type: AlertRuleConditionType.CUSTOM,
evaluate: (events) => {
// Custom logic to evaluate events
return events.some(event =>
event.properties?.statusCode === 500 &&
event.properties?.endpoint === '/api/users'
);
}
};
Implementation
The custom condition is implemented by calling the evaluate function with the events.
Usage
Alert conditions are used in alert rules to define when an alert should be triggered. Multiple conditions can be combined in a single rule, and all conditions must be met for the alert to be triggered.
const rule: AlertRule = {
id: 'api-error-alert',
name: 'API Error Alert',
description: 'Alert on API error events',
severity: AlertSeverity.ERROR,
eventTypes: ['error'],
eventNames: ['api_error'],
conditions: [
{
type: AlertRuleConditionType.FREQUENCY,
timeWindow: 300, // 5 minutes
minCount: 3
},
{
type: AlertRuleConditionType.THRESHOLD,
metric: 'response_time',
threshold: 1000,
operator: 'gt',
timeWindow: 300, // 5 minutes
properties: {
aggregation: 'avg'
}
}
],
enabled: true
};
alerting.addRule(rule);
Benefits
The implementation of all condition types provides several benefits:
- Flexibility: Support for different types of conditions allows for flexible alert rules
- Customization: Custom conditions allow for complex logic that can't be expressed using the other condition types
- Aggregation: Support for different aggregation functions allows for more precise alerting
- Time Windows: Support for time windows allows for alerting based on recent events
- Comparison Operators: Support for different comparison operators allows for more precise alerting
Next Steps
The following steps are recommended to further improve the alerting service:
- Add More Condition Types: Add support for more condition types (trend, anomaly, etc.)
- Add More Aggregation Functions: Add support for more aggregation functions (median, percentile, etc.)
- Add Support for Multiple Metrics: Add support for conditions that involve multiple metrics
- Add Support for Composite Conditions: Add support for conditions that combine multiple conditions with logical operators (AND, OR, NOT)
- Add Support for Dynamic Thresholds: Add support for thresholds that are calculated dynamically based on historical data