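Purpose
After an execution is created from a template that contains an alarm trigger, the execution starts in the Waiting state. When the metric configured in the alarm trigger reaches the alarm threshold, the execution switches to the Running state and immediately runs the subsequent tasks defined in the template. These tasks typically perform operations that resolve the alarm automatically. For example, when the CPU utilization of an ECS instance exceeds 90%, an alarm is triggered and the instance is restarted automatically.
Note
An alarm trigger can monitor two types of metrics: metrics collected by a pre-installed plug-in and metrics that ECS provides natively. To tell them apart, see the metric descriptions. To monitor a metric collected by the CloudMonitor plug-in, first install the plug-in on the instance to be monitored; otherwise the alarm cannot be triggered. To install the plug-in, select the instance under Host Monitoring in the CloudMonitor console and click "Click to Install".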
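Limits
Triggers have the following limits:
A template can contain only one trigger action.
The trigger action must be defined as the first task in the Tasks section of the template.
A nested template (child template) cannot contain a trigger action.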
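The three limits above can be checked mechanically before a template is submitted. The following is a minimal sketch, not part of OOS itself: `is_trigger` and `check_trigger_limits` are hypothetical helper names, and the rule that any `ACS::` action ending in `Trigger` counts as a trigger action is an assumption made for illustration.

```python
def is_trigger(task):
    # Assumption: an action such as ACS::AlarmTrigger is recognized by its
    # "ACS::" prefix and "Trigger" suffix. This is an illustrative rule only.
    action = task.get("Action", "")
    return action.startswith("ACS::") and action.endswith("Trigger")


def check_trigger_limits(template, is_child=False):
    """Return a list of violations of the three trigger limits."""
    tasks = template.get("Tasks", [])
    positions = [i for i, task in enumerate(tasks) if is_trigger(task)]
    problems = []
    if is_child and positions:
        problems.append("child template must not contain a trigger action")
    if len(positions) > 1:
        problems.append("only one trigger action is allowed per template")
    if positions and positions[0] != 0:
        problems.append("the trigger action must be the first task")
    return problems


good = {"Tasks": [
    {"Name": "alarmTrigger", "Action": "ACS::AlarmTrigger"},
    {"Name": "RebootInstance", "Action": "ACS::ECS::RebootInstance"},
]}
bad = {"Tasks": [
    {"Name": "RebootInstance", "Action": "ACS::ECS::RebootInstance"},
    {"Name": "alarmTrigger", "Action": "ACS::AlarmTrigger"},
]}

print(check_trigger_limits(good))
print(check_trigger_limits(bad))
```

A template that passes returns an empty list; each violated limit adds one message, so a caller can report all problems at once.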
Syntax
YAML format
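Tasks:
  - Name: taskName1 # Task name.
    Action: 'ACS::AlarmTrigger'
    Properties:
      Namespace: 'acs_ecs_dashboard' # Required. The data namespace of the product, for example ECS. Query valid values with the DescribeMetricMetaList API.
      MetricName: 'cpu_total' # Required. The metric name, for example the total CPU percentage currently consumed. Query valid values with the DescribeMetricMetaList API.
      Statistics: 'Average' # The alarm statistics method. For example, Average takes the average over a time period. Query valid values with the DescribeMetricMetaList API.
      # ComparisonOperator is required. It is the threshold comparison operator. Valid values:
      #   GreaterThanOrEqualToThreshold: greater than or equal to
      #   GreaterThanThreshold: greater than
      #   LessThanOrEqualToThreshold: less than or equal to
      #   LessThanThreshold: less than
      #   NotEqualToThreshold: not equal to
      #   GreaterThanYesterday: increase compared with the same time yesterday
      #   LessThanYesterday: decrease compared with the same time yesterday
      #   GreaterThanLastWeek: increase compared with the same time last week
      #   LessThanLastWeek: decrease compared with the same time last week
      #   GreaterThanLastPeriod: increase compared with the previous period
      #   LessThanLastPeriod: decrease compared with the previous period
      ComparisonOperator: 'GreaterThanThreshold'
      Threshold: '90' # The alarm threshold, for example 90% total CPU utilization.
      # Resources is required. It specifies the resources to alert on.
      #   All resources in the account: [{"resource":"_ALL"}]
      #   A specific instance: [{"instanceId":"i-bp123467zxcvb"}]
      #   A disk partition on an instance: [{"instanceId":"i-bp123467zxcvb","device":"/dev/vda1"}]
      #   Multiple disk partitions on an instance: [{"instanceId":"i-bp123467zxcvb","device":"/dev/vda1"},{"instanceId":"i-bp123467zxcvb","device":"/dev/vdb1"}]
      Resources: '[{"resource":"_ALL"}]'
      Times: 1 # Number of times the alarm repeats.
      Interval: 60 # The detection period of the alarm rule, in seconds. Defaults to 60s, the minimum frequency of the metric.
      SilenceTime: 3600 # The channel silence period, in seconds. Defaults to 86400 seconds (1 day). While monitoring data keeps exceeding the alarm threshold, only one alarm notification is sent per silence period.
    Outputs:
      paraName1:
        Type: String
        ValueSelector: .key # .key returns the value of the given key in the JSON message body. For example, .instanceId returns "i-abc12345zxcv".
# JSON message body of the event triggered by the alarm:
# {
#   "curLevel": "INFO",
#   "Minimum": "34.00",
#   "Maximum": "95.00",
#   "instanceId": "i-abc12345zxcv",
#   "Average": "85.00",
#   "ruleName": "alarmtrigger-1390000****-exec-2130c0c073fa487098d3",
#   "userId": "1390000****",
#   "timestamp": "1598349720000",
#   "executionId": "exec-2130c0c073fa487098d3",
#   "sourceAliUid": "1390000****"
# }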
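The Resources property is a JSON array serialized into a string, which is easy to get wrong when written by hand. The following sketch shows one way to generate the forms listed in the comments above. `build_resources` is a hypothetical helper introduced only for illustration; it is not an OOS or CloudMonitor function.

```python
import json


def build_resources(instance_id=None, devices=()):
    """Build the string value for the Resources property.

    No arguments     -> all resources in the account.
    instance_id only -> one specific instance.
    with devices     -> one entry per disk partition on that instance.
    """
    if instance_id is None:
        entries = [{"resource": "_ALL"}]
    elif not devices:
        entries = [{"instanceId": instance_id}]
    else:
        entries = [{"instanceId": instance_id, "device": d} for d in devices]
    # Compact separators reproduce the whitespace-free form used in the syntax.
    return json.dumps(entries, separators=(",", ":"))


print(build_resources())
print(build_resources("i-bp123467zxcvb"))
print(build_resources("i-bp123467zxcvb", ["/dev/vda1", "/dev/vdb1"]))
```

Generating the string from a list of dictionaries keeps the quoting and escaping consistent, which matters most in the JSON template format where the inner quotes must be escaped.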
JSON format (see the comments in the YAML example for explanations)
{
  "Tasks": [
    {
      "Name": "taskName1",
      "Action": "ACS::AlarmTrigger",
      "Properties": {
        "Namespace": "acs_ecs_dashboard",
        "MetricName": "cpu_total",
        "Statistics": "Average",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": "90",
        "Resources": "[{\"resource\":\"_ALL\"}]",
        "Times": 1,
        "Interval": 60,
        "SilenceTime": 3600
      },
      "Outputs": {
        "paraName1": {
          "Type": "String",
          "ValueSelector": ".key"
        }
      }
    }
  ]
}
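To make the ValueSelector behavior concrete, the sketch below applies a `.key` selector to the sample alarm message body from the syntax comments. `select_key` is a hypothetical helper that handles only the single top-level `.key` form described above; the selector syntax that OOS actually accepts may be richer.

```python
import json

# Sample message body of the event triggered by the alarm (from the syntax section).
MESSAGE = json.loads("""
{
  "curLevel": "INFO",
  "Minimum": "34.00",
  "Maximum": "95.00",
  "instanceId": "i-abc12345zxcv",
  "Average": "85.00",
  "ruleName": "alarmtrigger-1390000****-exec-2130c0c073fa487098d3",
  "userId": "1390000****",
  "timestamp": "1598349720000",
  "executionId": "exec-2130c0c073fa487098d3",
  "sourceAliUid": "1390000****"
}
""")


def select_key(message, selector):
    """Return the value of one top-level key, given a selector such as '.instanceId'."""
    if not selector.startswith(".") or len(selector) < 2:
        raise ValueError("only selectors of the form '.key' are supported in this sketch")
    return message[selector[1:]]


print(select_key(MESSAGE, ".instanceId"))  # i-abc12345zxcv
print(select_key(MESSAGE, ".Average"))     # 85.00
```

Note that every value in the sample body is a string, including numeric fields such as Average and timestamp, so a task that needs a number would have to convert it.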
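Example
If the total CPU utilization of a monitored ECS instance exceeds the threshold within a 1-minute period, the instance is restarted automatically.
YAML format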
FormatVersion: OOS-2019-06-01
Description:
  en: Reboot ECS instance with specified tag when its CPU utilization exceeded threshold. The selected instance must already have the Cloud Monitor agent installed.
  zh-cn: 按tag在ECS实例CPU利用率超过阈值时执行实例重启。所选实例必须已安装云监控Agent。
  name-en: ACS-ECS-RebootInstanceAtHighCpuByTags
  name-zh-cn: 按tag在ECS实例CPU利用率超过阈值时执行实例重启
  categories:
    - alarm-trigger
Parameters:
  tags:
    Type: Json
    Description:
      en: The tags to select ECS instances.
      zh-cn: 实例的标签。
    AssociationProperty: Tags
  threshold:
    Type: Number
    Description:
      en: The CPU utilization threshold.
      zh-cn: CPU利用率阈值。
  silenceTime:
    Type: Number
    Description:
      en: The silence time of alarm (seconds).
      zh-cn: 告警通道沉默周期(秒)。
    Default: 60
  OOSAssumeRole:
    Description:
      en: The RAM role to be assumed by OOS.
      zh-cn: OOS扮演的RAM角色。
    Type: String
    Default: OOSServiceRole
RamRole: '{{ OOSAssumeRole }}'
Tasks:
  - Name: alarmTrigger
    Action: 'ACS::AlarmTrigger'
    Description:
      en: Set the CPU utilization alarm for ECS instance.
      zh-cn: 对ECS实例的CPU使用率进行监控。
    Properties:
      Namespace: acs_ecs_dashboard
      MetricName: cpu_total
      Statistics: Average
      ComparisonOperator: GreaterThanThreshold
      Threshold: '{{threshold}}'
      Times: 1
      SilenceTime: '{{ silenceTime }}'
      Period: 60
      Interval: 60
    Outputs:
      InstanceId:
        Type: String
        ValueSelector: .instanceId
  - Name: CheckForInstances
    Action: 'ACS::CheckFor'
    Description:
      en: Check ECS instance has specified tag.
      zh-cn: 检查ECS实例有指定的tag。
    OnError: 'ACS::END'
    Properties:
      Service: ECS
      API: DescribeInstances
      Parameters:
        Tags: '{{ tags }}'
        InstanceIds: '["{{ alarmTrigger.instanceId }}"]'
      PropertySelector: TotalCount
      DesiredValues:
        - 1
  - Name: RebootInstance
    Action: 'ACS::ECS::RebootInstance'
    Description:
      en: Restarts the ECS instances.
      zh-cn: 重启实例。
    Properties:
      instanceId: '{{ alarmTrigger.instanceId }}'
JSON format
{
  "FormatVersion": "OOS-2019-06-01",
  "Description": {
    "en": "Reboot ECS instance with specified tag when its CPU utilization exceeded threshold. The selected instance must already have the Cloud Monitor agent installed.",
    "zh-cn": "按tag在ECS实例CPU利用率超过阈值时执行实例重启。所选实例必须已安装云监控Agent。",
    "name-en": "ACS-ECS-RebootInstanceAtHighCpuByTags",
    "name-zh-cn": "按tag在ECS实例CPU利用率超过阈值时执行实例重启",
    "categories": [
      "alarm-trigger"
    ]
  },
  "Parameters": {
    "tags": {
      "Type": "Json",
      "Description": {
        "en": "The tags to select ECS instances.",
        "zh-cn": "实例的标签。"
      },
      "AssociationProperty": "Tags"
    },
    "threshold": {
      "Type": "Number",
      "Description": {
        "en": "The CPU utilization threshold.",
        "zh-cn": "CPU利用率阈值。"
      }
    },
    "silenceTime": {
      "Type": "Number",
      "Description": {
        "en": "The silence time of alarm (seconds).",
        "zh-cn": "告警通道沉默周期(秒)。"
      },
      "Default": 60
    },
    "OOSAssumeRole": {
      "Description": {
        "en": "The RAM role to be assumed by OOS.",
        "zh-cn": "OOS扮演的RAM角色。"
      },
      "Type": "String",
      "Default": "OOSServiceRole"
    }
  },
  "RamRole": "{{ OOSAssumeRole }}",
  "Tasks": [
    {
      "Name": "alarmTrigger",
      "Action": "ACS::AlarmTrigger",
      "Description": {
        "en": "Set the CPU utilization alarm for ECS instance.",
        "zh-cn": "对ECS实例的CPU使用率进行监控。"
      },
      "Properties": {
        "Namespace": "acs_ecs_dashboard",
        "MetricName": "cpu_total",
        "Statistics": "Average",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": "{{threshold}}",
        "Times": 1,
        "SilenceTime": "{{ silenceTime }}",
        "Period": 60,
        "Interval": 60
      },
      "Outputs": {
        "InstanceId": {
          "Type": "String",
          "ValueSelector": ".instanceId"
        }
      }
    },
    {
      "Name": "CheckForInstances",
      "Action": "ACS::CheckFor",
      "Description": {
        "en": "Check ECS instance has specified tag.",
        "zh-cn": "检查ECS实例有指定的tag。"
      },
      "OnError": "ACS::END",
      "Properties": {
        "Service": "ECS",
        "API": "DescribeInstances",
        "Parameters": {
          "Tags": "{{ tags }}",
          "InstanceIds": "[\"{{ alarmTrigger.instanceId }}\"]"
        },
        "PropertySelector": "TotalCount",
        "DesiredValues": [
          1
        ]
      }
    },
    {
      "Name": "RebootInstance",
      "Action": "ACS::ECS::RebootInstance",
      "Description": {
        "en": "Restarts the ECS instances.",
        "zh-cn": "重启实例。"
      },
      "Properties": {
        "instanceId": "{{ alarmTrigger.instanceId }}"
      }
    }
  ]
}