This topic describes how to use Lindorm ML for time series anomaly detection.

Background information

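Time series anomaly detection is an effective way to analyze time series data and is widely used in important fields such as network security detection and routine maintenance of large industrial equipment.

It is also commonly used in AIOps scenarios. For example, when multiple systems must be monitored at the same time, each system behaves differently and has different requirements, such as how high its resource usage is or whether it carries load during particular time periods. Traditional monitoring and alerting based on static thresholds requires separate rules for each business, which leads to low O&M efficiency and to problems such as false positives, missed alerts, and alert storms.

Time series anomaly detection, by contrast, uses AI algorithms to automatically find anomalous points in continuous time series metrics. In-database machine learning algorithms learn the characteristics of each time line automatically and then detect anomalous values. The rules are flexible, detection is timely, and the results are accurate, which helps O&M personnel work more efficiently.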

Prerequisites

Data preparation

The sample table service_monitor has the following structure:
+--------------+-----------+------------+------------+
| columnName   | typeName  | columnKind | primaryKey |
+--------------+-----------+------------+------------+
| time         | TIMESTAMP | TIMESTAMP  | false      |
| service_name | VARCHAR   | TAG        | false      |
| op_name      | VARCHAR   | TAG        | false      |
| host_ip      | VARCHAR   | TAG        | false      |
| qps          | DOUBLE    | FIELD      | false      |
| rt           | DOUBLE    | FIELD      | false      |
+--------------+-----------+------------+------------+
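
For reference, a table with this structure could be created with a statement along the following lines. This is a minimal sketch: it assumes that the Lindorm time series engine's CREATE TABLE syntax marks tag columns with the TAG keyword and treats the remaining non-timestamp columns as fields. Verify it against your Lindorm version before use.

```sql
-- Sketch only: column names and types are copied from the structure above.
-- The placement of the TAG keyword is an assumption about the CREATE TABLE syntax.
CREATE TABLE service_monitor (
    `time`       TIMESTAMP,     -- timestamp column
    service_name VARCHAR TAG,   -- tag column
    op_name      VARCHAR TAG,   -- tag column
    host_ip      VARCHAR TAG,   -- tag column
    qps          DOUBLE,        -- field column (presumably queries per second)
    rt           DOUBLE         -- field column (presumably response time)
);
```
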
The sample data is as follows:
+---------------------------+---------------+-----------+----------+-----+----+
|           time            |  service_name |  op_name  | host_ip  | qps | rt |
+---------------------------+---------------+-----------+----------+-----+----+
| 2021-01-01T00:00:00+08:00 |   service_1   |    put    | 10.0.0.1 | 500 | 10 |
| 2021-01-01T00:00:05+08:00 |   service_1   |    put    | 10.0.0.1 | 600 | 8  |
| 2021-01-01T00:00:10+08:00 |   service_1   |    put    | 10.0.0.1 | 400 | 12 |
| 2021-01-01T00:00:15+08:00 |   service_1   |    put    | 10.0.0.1 | 700 | 7  |
| 2021-01-01T00:00:20+08:00 |   service_1   |    put    | 10.0.0.1 | 900 | 5  |
+---------------------------+---------------+-----------+----------+-----+----+
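
The sample rows above could be written with INSERT statements such as the following. This is a sketch under two assumptions that are not confirmed here: that the standard INSERT INTO ... VALUES form is accepted, and that timestamps can be supplied as string literals in the same format that appears in the sample data.

```sql
-- Sketch only: values are the first three rows of the sample data above.
-- Assumes timestamps are accepted as ISO 8601 string literals.
INSERT INTO service_monitor (`time`, service_name, op_name, host_ip, qps, rt)
VALUES ('2021-01-01T00:00:00+08:00', 'service_1', 'put', '10.0.0.1', 500, 10);

INSERT INTO service_monitor (`time`, service_name, op_name, host_ip, qps, rt)
VALUES ('2021-01-01T00:00:05+08:00', 'service_1', 'put', '10.0.0.1', 600, 8);

INSERT INTO service_monitor (`time`, service_name, op_name, host_ip, qps, rt)
VALUES ('2021-01-01T00:00:10+08:00', 'service_1', 'put', '10.0.0.1', 400, 12);
```
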

Procedure

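The following steps use anomaly detection for a business monitoring system as an example to show how to use Lindorm ML for time series anomaly detection.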

  1. Train the model. Use the CREATE MODEL statement, for example:
    CREATE MODEL esd_model
    FROM (SELECT * FROM service_monitor)
    TARGET qps
    TASK time_series_anomaly_detection
    ALGORITHM esd
    SETTINGS
    (
    );
    Note: When TASK is set to TIME_SERIES_ANOMALY_DETECTION, only the anomaly_detect function can be used for model inference.
  2. Manage the model. Use the SHOW MODEL model_name statement to view model information, for example:
    SHOW MODEL esd_model;
    The following result is returned:
    +-----------+--------+----------------+-------------------------------+-----------+---------------------------------+---------------+-----------------------+---------+-------------------------------+-------------------------------+
    |   name    | status |  sql_function  |           task_type           | algorithm |             query               | preprocessors |       settings        | metrics |         created_time          |          update_time          |
    +-----------+--------+----------------+-------------------------------+-----------+---------------------------------+---------------+-----------------------+---------+-------------------------------+-------------------------------+
    | esd_model | Ready  | anomaly_detect | TIME_SERIES_ANOMALY_DETECTION | ESD       | SELECT * FROM `service_monitor` | []            | {train_mode=INENGINE} | {}      | 2022-11-02T18:48:28.717+08:00 | 2022-11-02T18:48:35.085+08:00 |
    +-----------+--------+----------------+-------------------------------+-----------+---------------------------------+---------------+-----------------------+---------+-------------------------------+-------------------------------+
  3. Run model inference. Use the anomaly_detect function to perform ad hoc time series anomaly detection, for example:
    SELECT `time`, service_name, op_name, host_ip, anomaly_detect(qps, 'esd_model') AS qps_detect_result FROM service_monitor WHERE `time` >= '2022-01-01T01:00:00+08:00' SAMPLE BY 0;
    The following result is returned:
    +---------------------------+--------------+---------+-----------+--------------------+
    |           time            | service_name | op_name |  host_ip  | qps_detect_result  |
    +---------------------------+--------------+---------+-----------+--------------------+
    | 2022-01-01T01:00:00+08:00 |  service_1   |   put   | 10.0.0.1  |    false           |
    | 2022-01-01T01:00:05+08:00 |  service_1   |   put   | 10.0.0.1  |    true            |
    | 2022-01-01T01:00:10+08:00 |  service_1   |   put   | 10.0.0.1  |    false           |
    | 2022-01-01T01:00:15+08:00 |  service_1   |   put   | 10.0.0.1  |    false           |
    | 2022-01-01T01:00:20+08:00 |  service_1   |   put   | 10.0.0.1  |    false           |
    | 2022-01-01T01:00:25+08:00 |  service_1   |   put   | 10.0.0.1  |    false           |
    +---------------------------+--------------+---------+-----------+--------------------+
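    Note: To run anomaly detection continuously, combine it with the continuous query feature. For details, see the topics on continuous queries and on how to perform time series anomaly detection without interruption.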
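
To narrow ad hoc detection to a single time line, the inference query from step 3 can be combined with tag filters. The sketch below reuses the same anomaly_detect call and assumes that equality filters on tag columns can be combined with the time filter in one WHERE clause. The filter values come from the sample data.

```sql
-- Sketch only: same inference call as in step 3, restricted to one time line.
-- Combining tag filters with anomaly_detect is an assumption, not confirmed above.
SELECT `time`, service_name, op_name, host_ip,
       anomaly_detect(qps, 'esd_model') AS qps_detect_result
FROM service_monitor
WHERE `time` >= '2022-01-01T01:00:00+08:00'
  AND service_name = 'service_1'
  AND op_name = 'put'
  AND host_ip = '10.0.0.1'
SAMPLE BY 0;
```

Restricting the query this way keeps the result set small when only one service, operation, and host are being investigated.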