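Alarm trigger ACS::AlarmTrigger

Purpose

When an execution is created from a template that contains an alarm trigger, the execution starts in the Waiting state. When the metric configured in the alarm trigger reaches its alarm threshold, the execution switches to the Running state and immediately runs the subsequent tasks defined in the template. These tasks are typically operations that resolve the alarm automatically. For example, when the CPU utilization of an ECS instance exceeds 90%, the alarm fires and the instance is restarted automatically.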

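Note

An alarm trigger can monitor two categories of metrics: metrics collected by the pre-installed plug-in and metrics that ECS provides natively. To tell them apart, see the metric descriptions. To monitor metrics collected by the Cloud Monitor plug-in, first install the plug-in on the instance to be monitored; otherwise the alarm cannot be triggered. To install the plug-in, select the instance under Host Monitoring in the Cloud Monitor console and click "Click to install".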

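Limits

Triggers are subject to the following limits:

  • A template can contain only one trigger action.

  • The task that uses the trigger action must be the first task defined in the template's Tasks.

  • A nested template (child template) must not contain a trigger action.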
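
The three limits above can be checked mechanically before a template is submitted. The following is a minimal sketch, not part of OOS: it assumes the template has already been parsed into a Python dict, and the function name `check_trigger_limits`, the `is_child` flag, and the `TRIGGER_ACTIONS` set are hypothetical names introduced for illustration. Only `ACS::AlarmTrigger` is listed as a trigger action here; extend the set if your templates use other triggers.

```python
from typing import Any, Dict, List

# Assumption: the set of actions treated as triggers. Only the action
# described in this document is listed.
TRIGGER_ACTIONS = frozenset({"ACS::AlarmTrigger"})


def check_trigger_limits(template: Dict[str, Any], is_child: bool = False) -> List[str]:
    """Return a list of violated limits; an empty list means the template passes."""
    tasks = template.get("Tasks") or []
    # Indexes of all tasks whose Action is a trigger.
    positions = [i for i, task in enumerate(tasks)
                 if task.get("Action") in TRIGGER_ACTIONS]

    problems: List[str] = []
    if is_child and positions:
        problems.append("a nested (child) template must not contain a trigger action")
    if len(positions) > 1:
        problems.append("a template may contain only one trigger action")
    if positions and positions[0] != 0:
        problems.append("the trigger action must be the first task in Tasks")
    return problems
```

Calling `check_trigger_limits(template)` on a top-level template, or `check_trigger_limits(template, is_child=True)` on a template that is nested by another one, returns the messages for whichever limits are violated.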

Syntax

  • YAML format

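    Tasks:
    - Name: taskName1 # Task name.
      Action: 'ACS::AlarmTrigger'
      Properties:
          # Required. Data namespace of the product, for example the ECS product. Valid values can be obtained by calling DescribeMetricMetaList.
          Namespace: 'acs_ecs_dashboard'
          # Required. Metric name, for example the total CPU percentage currently consumed. Valid values can be obtained by calling DescribeMetricMetaList.
          MetricName: 'cpu_total'
          # Statistical method for the alarm. For example, Average takes the average over a time period. Valid values can be obtained by calling DescribeMetricMetaList.
          Statistics: 'Average'
          # Required. Threshold comparison operator. Valid values:
          #   GreaterThanOrEqualToThreshold (>=), GreaterThanThreshold (>), LessThanOrEqualToThreshold (<=), LessThanThreshold (<), NotEqualToThreshold (!=)
          #   GreaterThanYesterday / LessThanYesterday: increase / decrease compared with the same time yesterday
          #   GreaterThanLastWeek / LessThanLastWeek: increase / decrease compared with the same time last week
          #   GreaterThanLastPeriod / LessThanLastPeriod: increase / decrease compared with the previous period
          ComparisonOperator: 'GreaterThanThreshold'
          Threshold: '90' # Alarm threshold, for example 90 for a total CPU utilization of 90%.
          # Required. Resources to alarm on.
          #   All resources under the account: [{"resource":"_ALL"}]
          #   A specific instance: [{"instanceId":"i-bp123467zxcvb"}]
          #   A disk partition on an instance: [{"instanceId":"i-bp123467zxcvb","device":"/dev/vda1"}]
          #   Multiple disk partitions on an instance: [{"instanceId":"i-bp123467zxcvb","device":"/dev/vda1"},{"instanceId":"i-bp123467zxcvb","device":"/dev/vdb1"}]
          Resources: '[{"resource":"_ALL"}]'
          Times: 1 # Number of times the alarm is repeated.
          Interval: 60 # Detection interval of the alarm rule, in seconds. Defaults to the metric's minimum frequency, 60 s.
          # Mute period of the notification channel, in seconds. Default: 86400 (1 day). While monitoring data keeps exceeding the alarm threshold, only one alarm notification is sent per mute period.
          SilenceTime: 3600
      Outputs:
          paraName1:
              Type: String
              # .key selects the value of a key in the JSON message body of the alarm event. For example, .instanceId yields "i-abc12345zxcv" from the sample body below.
              # { "curLevel": "INFO", "Minimum": "34.00", "Maximum": "95.00", "instanceId": "i-abc12345zxcv", "Average": "85.00", "ruleName": "alarmtrigger-1390000****-exec-2130c0c073fa487098d3", "userId": "1390000****", "timestamp": "1598349720000", "executionId": "exec-2130c0c073fa487098d3", "sourceAliUid": "1390000****" }
              ValueSelector: .key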
  • JSON format (see the comments in the YAML example)

    {
      "Tasks": [
        {
          "Name": "taskName1",
          "Action": "ACS::AlarmTrigger",
          "Properties": {
            "Namespace": "acs_ecs_dashboard",
            "MetricName": "cpu_total",
            "Statistics": "Average",
            "ComparisonOperator": "GreaterThanThreshold",
            "Threshold": "90",
            "Resources": "[{\"resource\":\"_ALL\"}]",
            "Times": 1,
            "Interval": 60,
            "SilenceTime": 3600
          },
          "Outputs": {
            "paraName1": {
              "Type": "String",
              "ValueSelector": ".key"
            }
          }
        }
      ]
    }
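
Two details of the syntax are easy to get wrong by hand: Resources is a JSON array serialized into a string, and ValueSelector picks a value out of the alarm event's JSON message body. The sketch below illustrates both under stated assumptions. `build_resources` and `select_key` are hypothetical helper names, not OOS APIs, and `select_key` is a simplified stand-in that handles only a single top-level `.key` selector rather than the full selector syntax OOS supports.

```python
import json
from typing import Any, List, Optional


def build_resources(instance_id: Optional[str] = None,
                    devices: Optional[List[str]] = None) -> str:
    """Serialize the Resources property as a compact JSON string."""
    if instance_id is None:
        # No instance given: target all resources under the account.
        entries = [{"resource": "_ALL"}]
    elif devices:
        # One entry per disk partition on the given instance.
        entries = [{"instanceId": instance_id, "device": d} for d in devices]
    else:
        entries = [{"instanceId": instance_id}]
    return json.dumps(entries, separators=(",", ":"))


def select_key(message_body: str, selector: str) -> Any:
    """Simplified stand-in for a '.key' ValueSelector (top-level keys only)."""
    if not selector.startswith(".") or len(selector) < 2:
        raise ValueError("selector must look like '.key'")
    return json.loads(message_body)[selector[1:]]


# Usage with a shortened version of the sample message body shown above.
body = '{"curLevel": "INFO", "instanceId": "i-abc12345zxcv", "Average": "85.00"}'
print(build_resources("i-bp123467zxcvb", ["/dev/vda1", "/dev/vdb1"]))
print(select_key(body, ".instanceId"))
```

Generating the Resources string with `json.dumps` avoids the quoting mistakes that come with escaping the inner double quotes manually in the JSON form of the template.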

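Example

Within a 1-minute period, if the total CPU utilization of a monitored ECS instance exceeds the threshold, the instance is restarted automatically.

  • YAML format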

    FormatVersion: OOS-2019-06-01
    Description:
      en: Reboot ECS instances with the specified tags when their CPU utilization exceeds the threshold. The selected instances must already have the Cloud Monitor agent installed.
      zh-cn: 按tag在ECS实例CPU利用率超过阈值时执行实例重启。所选实例必须已安装云监控Agent。
      name-en: ACS-ECS-RebootInstanceAtHighCpuByTags
      name-zh-cn: 按tag在ECS实例CPU利用率超过阈值时执行实例重启
      categories:
        - alarm-trigger
    Parameters:
      tags:
        Type: Json
        Description:
          en: The tags to select ECS instances.
          zh-cn: 实例的标签。
        AssociationProperty: Tags
      threshold:
        Type: Number
        Description:
          en: The CPU utilization threshold.
          zh-cn: CPU利用率阈值。
      silenceTime:
        Type: Number
        Description:
          en: The silence time of alarm (seconds).
          zh-cn: 告警通道沉默周期(秒)。
        Default: 60
      OOSAssumeRole:
        Description:
          en: The RAM role to be assumed by OOS.
          zh-cn: OOS扮演的RAM角色。
        Type: String
        Default: OOSServiceRole
    RamRole: '{{ OOSAssumeRole }}'
    Tasks:
      - Name: alarmTrigger
        Action: 'ACS::AlarmTrigger'
        Description:
          en: Set the CPU utilization alarm for ECS instance.
          zh-cn: 对ECS实例的CPU使用率进行监控。
        Properties:
          Namespace: acs_ecs_dashboard
          MetricName: cpu_total
          Statistics: Average
          ComparisonOperator: GreaterThanThreshold
          Threshold: '{{threshold}}'
          Times: 1
          SilenceTime: '{{ silenceTime }}'
          Period: 60
          Interval: 60
        Outputs:
          instanceId:
            Type: String
            ValueSelector: .instanceId
      - Name: CheckForInstances
        Action: 'ACS::CheckFor'
        Description:
          en: Check ECS instance has specified tag.
          zh-cn: 检查ECS实例有指定的tag。
        OnError: 'ACS::END'
        Properties:
          Service: ECS
          API: DescribeInstances
          Parameters:
            Tags: '{{ tags }}'
            InstanceIds: '["{{ alarmTrigger.instanceId }}"]'
          PropertySelector: TotalCount
          DesiredValues:
            - 1
      - Name: RebootInstance
        Action: 'ACS::ECS::RebootInstance'
        Description:
          en: Restarts the ECS instances.
          zh-cn: 重启实例。
        Properties:
          instanceId: '{{ alarmTrigger.instanceId }}'
                                            

  • JSON format

{
  "FormatVersion": "OOS-2019-06-01",
  "Description": {
    "en": "Reboot ECS instances with the specified tags when their CPU utilization exceeds the threshold. The selected instances must already have the Cloud Monitor agent installed.",
    "zh-cn": "按tag在ECS实例CPU利用率超过阈值时执行实例重启。所选实例必须已安装云监控Agent。",
    "name-en": "ACS-ECS-RebootInstanceAtHighCpuByTags",
    "name-zh-cn": "按tag在ECS实例CPU利用率超过阈值时执行实例重启",
    "categories": [
      "alarm-trigger"
    ]
  },
  "Parameters": {
    "tags": {
      "Type": "Json",
      "Description": {
        "en": "The tags to select ECS instances.",
        "zh-cn": "实例的标签。"
      },
      "AssociationProperty": "Tags"
    },
    "threshold": {
      "Type": "Number",
      "Description": {
        "en": "The CPU utilization threshold.",
        "zh-cn": "CPU利用率阈值。"
      }
    },
    "silenceTime": {
      "Type": "Number",
      "Description": {
        "en": "The silence time of alarm (seconds).",
        "zh-cn": "告警通道沉默周期(秒)。"
      },
      "Default": 60
    },
    "OOSAssumeRole": {
      "Description": {
        "en": "The RAM role to be assumed by OOS.",
        "zh-cn": "OOS扮演的RAM角色。"
      },
      "Type": "String",
      "Default": "OOSServiceRole"
    }
  },
  "RamRole": "{{ OOSAssumeRole }}",
  "Tasks": [
    {
      "Name": "alarmTrigger",
      "Action": "ACS::AlarmTrigger",
      "Description": {
        "en": "Set the CPU utilization alarm for ECS instance.",
        "zh-cn": "对ECS实例的CPU使用率进行监控。"
      },
      "Properties": {
        "Namespace": "acs_ecs_dashboard",
        "MetricName": "cpu_total",
        "Statistics": "Average",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": "{{threshold}}",
        "Times": 1,
        "SilenceTime": "{{ silenceTime }}",
        "Period": 60,
        "Interval": 60
      },
      "Outputs": {
        "instanceId": {
          "Type": "String",
          "ValueSelector": ".instanceId"
        }
      }
    },
    {
      "Name": "CheckForInstances",
      "Action": "ACS::CheckFor",
      "Description": {
        "en": "Check ECS instance has specified tag.",
        "zh-cn": "检查ECS实例有指定的tag。"
      },
      "OnError": "ACS::END",
      "Properties": {
        "Service": "ECS",
        "API": "DescribeInstances",
        "Parameters": {
          "Tags": "{{ tags }}",
          "InstanceIds": "[\"{{ alarmTrigger.instanceId }}\"]"
        },
        "PropertySelector": "TotalCount",
        "DesiredValues": [
          1
        ]
      }
    },
    {
      "Name": "RebootInstance",
      "Action": "ACS::ECS::RebootInstance",
      "Description": {
        "en": "Restarts the ECS instances.",
        "zh-cn": "重启实例。"
      },
      "Properties": {
        "instanceId": "{{ alarmTrigger.instanceId }}"
      }
    }
  ]
}
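
Later tasks in the example consume the trigger's output through references of the form `{{ taskName.outputName }}`, so each reference has to match an output name declared under that task's Outputs. The following sketch shows one way to catch a mismatch before running the template. It assumes the template is already loaded as a Python dict and that names are compared case-sensitively; `find_unresolved_refs` is a hypothetical helper, not an OOS API.

```python
import re
from typing import Any, Dict, List

# Matches {{ task.output }} with optional spaces inside the braces.
# Plain parameter references such as {{ tags }} contain no dot and are ignored.
REF_PATTERN = re.compile(r"\{\{\s*(\w+)\.(\w+)\s*\}\}")


def find_unresolved_refs(template: Dict[str, Any]) -> List[str]:
    """Return 'task.output' references that match no declared task output."""
    tasks = template.get("Tasks") or []
    declared = {(task.get("Name"), name)
                for task in tasks
                for name in (task.get("Outputs") or {})}
    unresolved: List[str] = []

    def walk(value: Any) -> None:
        # Recursively scan strings nested inside dicts and lists.
        if isinstance(value, str):
            for task_name, output_name in REF_PATTERN.findall(value):
                if (task_name, output_name) not in declared:
                    unresolved.append(f"{task_name}.{output_name}")
        elif isinstance(value, dict):
            for item in value.values():
                walk(item)
        elif isinstance(value, list):
            for item in value:
                walk(item)

    for task in tasks:
        walk(task.get("Properties"))
    return unresolved
```

An empty result means every task-output reference found in the Properties sections resolves to a declared output.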