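Datadog is a monitoring and analytics platform for cloud applications. It automatically collects and analyzes logs, metrics, and traces, and monitors infrastructure events and cloud service events. Datadog provides observability across servers, applications, and the data it collects. To send Datadog alert messages to Log Service, configure the URL of the Log Service open alert endpoint in the Datadog Webhooks integration.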

Prerequisites

An open alert application that uses the Datadog protocol has been created. For details, see Configure the open alert endpoint.

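Datadog configuration

  1. Log on to the Datadog console.
  2. Configure a webhook.
    1. In the top navigation bar, choose the Integrations icon > Integrations.
    2. On the Integrations tab, find webhooks, hover over the webhooks tile, and click Install.
    3. After the installation is complete, hover over the webhooks tile and click Configure.
    4. In the Webhooks section, click New.
    5. In the New Webhook section, configure the following parameters, and then click Save.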
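      Parameter  Description
      Name  The name of the webhook.
      URL  The receiver of alert messages. Set this to the complete URL that is generated after you create an open alert service and application in Log Service. To obtain it, see Obtain the endpoint information.
      Payload  Defines the content of alert messages. Datadog generates the alert message content based on this configuration. For more information about the alert message variables that Datadog provides, see the official Datadog documentation.

      Note the following when you configure Payload.

      • The labels field must contain the tags field.
      • The annotations field must contain the title, event_msg, and text_only_msg fields.
      • Any other Datadog-provided variables that are not already used can be added to either the labels field or the annotations field, as you choose.
      • Fields other than labels and annotations must be configured exactly as shown in the following example.

      You can configure Payload as follows: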

      {
          "alert_instance_id": "$ID",
          "alert_id": "$ALERT_ID",
          "alert_name": "$ALERT_TITLE",
          "alert_time": "$LAST_UPDATED",
          "fire_time": "$DATE",
          "resolve_time": "$DATE",
          "status": "$ALERT_TRANSITION",
          "labels": {
              "tags": "$TAGS"
          },
          "annotations": {
              "title": "$EVENT_TITLE",
              "event_msg": "$EVENT_MSG",
              "text_only_msg": "$TEXT_ONLY_MSG",
              "alert_metric": "$ALERT_METRIC",
              "alert_query": "$ALERT_QUERY",
              "alert_scope": "$ALERT_SCOPE",
              "alert_status": "$ALERT_STATUS",
              "alert_type": "$ALERT_TYPE",
              "email": "$EMAIL",
              "event_type": "$EVENT_TYPE",
              "hostname": "$HOSTNAME",
              "logs_sample": "$LOGS_SAMPLE",
              "metric_namespace": "$METRIC_NAMESPACE",
              "priority": "$PRIORITY",
              "user": "$USER",
              "username": "$USERNAME",
              "__aggreg_key__": "$AGGREG_KEY",
              "__alert_cycle_key__": "$ALERT_CYCLE_KEY",
              "__incident_attachments__": "$INCIDENT_ATTACHMENTS",
              "__incident_commander__": "$INCIDENT_COMMANDER",
              "__incident_customer_impact__": "$INCIDENT_CUSTOMER_IMPACT",
              "__incident_fildes__": "$INCIDENT_FIELDS",
              "__incident_public_id__": "$INCIDENT_PUBLIC_ID",
              "__incident_title": "$INCIDENT_TITLE",
              "__incident_url__": "$INCIDENT_URL",
              "__org_id__": "$ORG_ID",
              "__org_name__": "$ORG_NAME",
              "__security_rule_name__": "$SECURITY_RULE_NAME",
              "__security_signal_id__": "$SECURITY_SIGNAL_ID",
              "__security_signal_severity__": "$SECURITY_SIGNAL_SEVERITY",
              "__security_signal_title__": "$SECURITY_SIGNAL_TITLE",
              "__security_signal_msg__": "$SECURITY_SIGNAL_MSG",
              "__security_signal_attributes__": "$SECURITY_SIGNAL_ATTRIBUTES",
              "__security_rule_id__": "$SECURITY_RULE_ID",
              "__security_rule_query__": "$SECURITY_RULE_QUERY",
              "__security_rule_group_by_fields__": "$SECURITY_RULE_GROUP_BY_FIELDS",
              "__security_rule_type__": "$SECURITY_RULE_TYPE",
              "__link_snapshot_url__": "$SNAPSHOT",
              "__synthetics_test_name__": "$SYNTHETICS_TEST_NAME",
              "__synthetics_first_failing_step_name__": "$SYNTHETICS_FIRST_FAILING_STEP_NAME"   
          },
          "severity": "$ALERT_PRIORITY",
          "drill_down_query": "$LINK"     
      }
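  3. Configure the notification channel.
    1. In the top navigation bar, choose the notification channel icon > Manage Monitors.
    2. Click the edit icon of the target monitor.
    3. Set Notify your team to the webhook that you created in step 2.
    4. Click Save.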
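
The Payload requirements listed in step 2 can be checked mechanically before the template is pasted into Datadog. The following Python sketch is a hypothetical helper, not part of Datadog or Log Service. The function name missing_payload_fields and the sample template are illustrative assumptions, and the check covers only the required labels and annotations keys named above.

```python
import json

# Required keys, taken from the Payload notes in step 2.
REQUIRED_LABELS = ("tags",)
REQUIRED_ANNOTATIONS = ("title", "event_msg", "text_only_msg")


def missing_payload_fields(payload_text):
    """Return the required keys that are absent from a Payload template.

    Hypothetical helper: it only inspects labels and annotations and does
    not validate the Datadog $VARIABLE names themselves.
    """
    payload = json.loads(payload_text)
    labels = payload.get("labels", {})
    annotations = payload.get("annotations", {})
    missing = []
    for key in REQUIRED_LABELS:
        if key not in labels:
            missing.append("labels." + key)
    for key in REQUIRED_ANNOTATIONS:
        if key not in annotations:
            missing.append("annotations." + key)
    return missing


# Sample template that deliberately omits annotations.text_only_msg.
example = """{
    "alert_name": "$ALERT_TITLE",
    "labels": {"tags": "$TAGS"},
    "annotations": {"title": "$EVENT_TITLE", "event_msg": "$EVENT_MSG"}
}"""

print(missing_payload_fields(example))  # ['annotations.text_only_msg']
```

An empty list means the template contains every key that the notes mark as mandatory.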

Datadog alert messages

If you add all unused Datadog-provided variables to the annotations field, Log Service receives a Datadog alert message like the following.
{
    "alert_instance_id": "123456",
    "alert_id": "123456",
    "alert_name": "STOP on host:abcdefgh",
    "alert_time": "1628647425000",
    "fire_time": "1628647425000",
    "resolve_time": "1627561306000",
    "status": "Triggered",
    "labels": {
        "tags": "ali,host:abcdefgh,monitor"
    },
    "annotations": {
        "title": "[P1] [Triggered on {host:abcdefgh}] STOP",
        "event_msg": "%%%\nwarning\nhost stop\n @webhook-webhook-test-all\n\nThe monitor was last triggered at Thu Jul 29 2021 12:21:45 UTC.\n\n- - -\n\n[[Monitor Status](https://app.datadoghq.com/monitors/1234?to_ts=1234&group=host%3Aabcdefgh&from_ts=1627560405000)] \u00b7 [[Edit Monitor](https://app.datadoghq.com/monitors#1234/edit)] \u00b7 [[View abcdefgh](https://app.datadoghq.com/infrastructure?filter=abcdefgh)] \u00b7 [[Show Processes](https://app.datadoghq.com/process?sort=memory%2CASC&to_ts=1234&tags=host%abcdefgh&from_ts=1627560405000&live=false&showSummaryGraphs=true)]\n%%%",
        "text_only_msg": "\nwarning\nhost stop\n @webhook-webhook-test-all\n\nMetric Graph: https://app.datadoghq.com/monitors/1234?to_ts=1627561365000&group=host%abcdefgh&from_ts=1627557705000 \u00b7 Monitor Status: https://app.datadoghq.com/monitors/1234?group=host%abcdefgh \u00b7 Edit Monitor: https://app.datadoghq.com/monitors#42655965/edit \u00b7 Event URL: https://app.datadoghq.com/event/event?id=1234 \u00b7 View abcdefgh: https://app.datadoghq.com/infrastructure?filter=abcdefgh \u00b7 Show Processes: https://app.datadoghq.com/process?sort=memory%2CASC&to_ts=None&tags=host%abcdefgh&from_ts=None&live=false&showSummaryGraphs=true",
        "alert_metric": "null",
        "alert_query": "\"datadog.agent.up\".over(\"host:abcdefgh\").by(\"host\").last(2).count_by_status()",
        "alert_scope": "host:abcdefgh",
        "alert_status": "",
        "alert_type": "error",
        "email": "",
        "event_type": "service_check",
        "hostname": "abcdefgh",
        "logs_sample": "null",
        "metric_namespace": "",
        "priority": "normal",
        "user": "null",
        "username": "",
        "__aggreg_key__": "a1b2c3",
        "__alert_cycle_key__": "123456789",
        "__incident_attachments__": "null",
        "__incident_commander__": "null",
        "__incident_customer_impact__": "null",
        "__incident_fildes__": "null",
        "__incident_public_id__": "null",
        "__incident_title": "null",
        "__incident_url__": "null",
        "__org_id__": "123",
        "__org_name__": "ali",
        "__security_rule_name__": "null",
        "__security_signal_id__": "null",
        "__security_signal_severity__": "null",
        "__security_signal_title__": "null",
        "__security_signal_msg__": "null",
        "__security_signal_attributes__": "null",
        "__security_rule_id__": "null",
        "__security_rule_query__": "$SECURITY_RULE_QUERY",
        "__security_rule_group_by_fields__": "null",
        "__security_rule_type__": "null",
        "__link_snapshot_url__": "null",
        "__synthetics_test_name__": "null",
        "__synthetics_first_failing_step_name__": "null"   
    },
    "severity": "P1",
    "drill_down_query": "https://app.datadoghq.com/event/event?id=123456"     
}

Field mapping

After a Datadog alert message is ingested into Log Service, it is mapped to a Log Service alert. Example:
{
    "aliuid": "aliuid1",
    "alert_instance_id": "123456",
    "alert_id": "123456",
    "alert_type": "sls_pub",
    "alert_name": "STOP on host:abcdefgh",
    "region": "",
    "project": "",
    "project_id": 0,
    "next_eval_interval": 0,
    "alert_time": 1628647425,
    "fire_time": 1628647425,
    "fire_results": null,
    "fire_results_count": 0,
    "resolve_time": 0,
    "status": "firing",
    "results": null,
    "labels":{
        "__ali__": "ali",
        "__host__": "abcdefgh",
        "__monitor__": "monitor"
    },
    "annotations":{
        "__aggreg_key__": "1a2b3c4d",
        "__alert_cycle_key__": "123456",
        "__config_app__": "sls_pub_alert",
        "__link_edit_monitor__": "https://app.datadoghq.com/monitors#1234/edit",
        "__link_metric_graph__": "https://app.datadoghq.com/monitors/1234?to_ts=1628647485000&group=host%abcdefgh&from_ts=1628643825000",
        "__link_monitor_status__": "https://app.datadoghq.com/monitors/123?group=host%abcdefgh",
        "__link_show_processes__": "https://app.datadoghq.com/process?sort=memory%2CASC&to_ts=None&tags=host%abcdefgh&from_ts=None&live=false&showSummaryGraphs=true",
        "__link_view_abcdefgh__": "https://app.datadoghq.com/infrastructure?filter=abcdefgh",
        "__org_id__": "579186",
        "__org_name__": "ali",
        "__pub_alert_app__": "",
        "__pub_alert_protocol__": "datadog",
        "__pub_alert_region__": "",
        "__pub_alert_service__": "",
        "alert_query": "\"datadog.agent.up\".over(\"host:abcdefgh\").by(\"host\").last(2).count_by_status()",
        "alert_scope": "host:abcdefgh",
        "alert_type": "error",
        "desc": "warning\nhost stop\n@webhook-test\nThe monitor was last triggered at Wed Aug 11 2021 02:03:45 UTC.\n- - -\n",
        "event_type": "service_check",
        "hostname": "abcdefgh",
        "priority": "normal",
        "title": "[P1] [Triggered on {host:abcdefgh}] STOP"
    },
    "severity": 10,
    "policy":{
        "alert_policy_id": "",
        "action_policy_id": "",
        "use_default": false,
        "repeat_interval": "0s"
    },
    "template": null,
    "drill_down_query": "https://app.datadoghq.com/event/event?id=123456"
}
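Log Service field  Datadog field  Description
aliuid  -  The ID of the Alibaba Cloud account that owns the open alert application used to ingest the alert.
alert_id  alert_id  The ID of the alert monitoring rule.
alert_instance_id  alert_instance_id  The ID of the alert message.
alert_type  -  The alert type. Fixed to sls_pub.
alert_name  alert_name  The name of the alert monitoring rule.
status  status  The alert status.
  • If the status field in the Datadog alert message is Triggered, Re-Triggered, No Data, Re-No Data, Warn, Re-Warn, or Renotify, the value of status is firing.
  • If the status field in the Datadog alert message is Recovered, the value of status is resolved.
next_eval_interval  -  The alert evaluation interval. Fixed to 0.
alert_time  alert_time  The time when the alert was triggered.
fire_time  fire_time  The time when the alert was first triggered.
resolve_time  resolve_time  The time when the alert was resolved.
  • If the value of status is resolved, resolve_time takes the value of the resolve_time field in the Datadog alert message.
  • If the value of status is firing, resolve_time is 0.
labels  labels  The alert labels.
The value of the tags field inside the labels field of the Datadog alert message is split on commas (,) into multiple strings.
  • If a string is in Key:Value format, two underscores (__) are added before and after the Key.
  • If a string is not in Key:Value format, the system automatically builds a Key:Value pair from it: the Key is __string__ and the Value is the string.
For example, "ali,host:1a2b3c4d" is parsed into the following format.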
{
    "__ali__": "ali",
    "__host__": "1a2b3c4d"
}
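
The tag-splitting rule above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not the Log Service implementation. The function name tags_to_labels is hypothetical, and trimming whitespace and skipping empty items are assumptions that the rule does not state.

```python
def tags_to_labels(tags):
    """Split a Datadog tags string into a label dictionary.

    Key:Value items become {"__Key__": "Value"}; bare items become
    {"__item__": "item"}.
    """
    labels = {}
    for item in tags.split(","):
        item = item.strip()  # assumption: surrounding whitespace is ignored
        if not item:
            continue  # assumption: empty items are dropped
        if ":" in item:
            # Split on the first colon only, so values may contain colons.
            key, value = item.split(":", 1)
        else:
            key = value = item
        labels["__" + key + "__"] = value
    return labels


print(tags_to_labels("ali,host:1a2b3c4d"))
# {'__ali__': 'ali', '__host__': '1a2b3c4d'}
```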

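In addition, any other field in the labels field of the Datadog alert message that is not otherwise used and has a non-empty value is added, together with its value, to the labels field of the Log Service alert message.

annotations  annotations  After a Datadog alert is ingested into Log Service, the following additional fields are added to the annotations field of the Log Service alert.
  • desc: The description of the alert content. Parsed from the event_msg field in the Datadog alert message.
  • title: The title of the alert message. Corresponds to the value of the title field in the Datadog alert message, which the example Payload fills from the $EVENT_TITLE variable.

The following fields are parsed from the text_only_msg field in the Datadog alert message.

  • __link_metric_graph__: The URL of the metric graph.
  • __link_monitor_status__: The URL for viewing the status of the alert rule.
  • __link_edit_monitor__: The URL for editing the alert rule.
  • __link_view_{$hostname}__: The URL for viewing the status of the monitored host, where {$hostname} is the name of the monitored host.
  • __link_show_processes__: The URL for viewing the processes running on the monitored host in real time.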
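
One way to picture how these link fields could be derived is to scan text_only_msg for "Label: URL" pairs, as in the sample alert message earlier in this topic. The Python sketch below is an assumption about the parsing, not the actual Log Service logic. The names LINK_PATTERN, KNOWN_LABELS, and extract_links are hypothetical, and the sample text is shortened. Labels outside the documented set, such as Event URL, are skipped because the mapped example does not show a corresponding field.

```python
import re

# "Label: URL" pairs; labels consist of word characters and spaces.
LINK_PATTERN = re.compile(r"([A-Za-z][\w ]*?): (https?://\S+)")

# Labels that correspond to the documented __link_*__ fields.
KNOWN_LABELS = {"Metric Graph", "Monitor Status", "Edit Monitor", "Show Processes"}


def extract_links(text_only_msg):
    """Build __link_*__ annotations from a Datadog text_only_msg value."""
    links = {}
    for label, url in LINK_PATTERN.findall(text_only_msg):
        if label in KNOWN_LABELS:
            key = "__link_" + label.lower().replace(" ", "_") + "__"
        elif label.startswith("View "):
            # "View <hostname>" maps to __link_view_<hostname>__.
            key = "__link_view_" + label[len("View "):] + "__"
        else:
            continue  # assumption: undocumented labels are ignored
        links[key] = url
    return links


sample = (
    "\nwarning\nhost stop\n\n"
    "Metric Graph: https://app.datadoghq.com/monitors/1234 \u00b7 "
    "Edit Monitor: https://app.datadoghq.com/monitors#1234/edit \u00b7 "
    "Event URL: https://app.datadoghq.com/event/event?id=1234 \u00b7 "
    "View abcdefgh: https://app.datadoghq.com/infrastructure?filter=abcdefgh"
)

for name, link in extract_links(sample).items():
    print(name, link)
```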

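In addition, any other field in the annotations field of the Datadog alert message that is not otherwise used and has a non-empty value is added, together with its value, to the annotations field of the Log Service alert message.

severity  severity  The alert severity. Datadog alert severities map to Log Service alert severities as follows:
  • P1: Critical
  • P2: High
  • P3: Medium
  • P4: Low
  • P5: Report
Note: If no severity is defined in the Datadog alert, the Log Service alert severity is mapped to Medium.
policy  -  The alert policy configured in the open alert application. For more information, see Policy structure.
project  -  The project to which the alert center belongs. For more information, see Project.
drill_down_query  drill_down_query  Click the link in the field value to go to the management page of the Datadog alert event.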
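
The status, resolve_time, and severity rules in the table can be summarized as a small Python sketch. It is an illustration under stated assumptions, not the Log Service implementation. All function names are hypothetical. The conversion from milliseconds to seconds is inferred from the alert_time values in the two examples above, raising an error for an unlisted status is an assumption, and severities are shown as names because the table gives only P1 a numeric value (10, in the mapped example).

```python
# Datadog status values that the table maps to "firing".
FIRING_STATUSES = {
    "Triggered", "Re-Triggered", "No Data", "Re-No Data",
    "Warn", "Re-Warn", "Renotify",
}

# Datadog priority to Log Service severity name, as listed in the table.
SEVERITY_NAMES = {
    "P1": "critical", "P2": "high", "P3": "medium", "P4": "low", "P5": "report",
}


def map_status(datadog_status):
    """Map a Datadog transition to a Log Service alert status."""
    if datadog_status in FIRING_STATUSES:
        return "firing"
    if datadog_status == "Recovered":
        return "resolved"
    # Assumption: the table does not say what happens to other values.
    raise ValueError("unmapped Datadog status: " + datadog_status)


def map_resolve_time(status, datadog_resolve_time_ms):
    """Keep resolve_time only for resolved alerts; firing alerts get 0."""
    if status == "resolved":
        # Assumption: milliseconds are converted to seconds.
        return int(datadog_resolve_time_ms) // 1000
    return 0


def map_severity(datadog_priority):
    """Map P1..P5 to a severity name; anything else falls back to medium."""
    return SEVERITY_NAMES.get(datadog_priority, "medium")


message = {"status": "Triggered", "resolve_time": "1627561306000", "severity": "P1"}
status = map_status(message["status"])
print(status, map_resolve_time(status, message["resolve_time"]), map_severity(message["severity"]))
# firing 0 critical
```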