This topic describes how to use Lindorm ML for time series forecasting.

Background information

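Accurate time series forecasting is a core requirement and a foundational capability for businesses such as e-commerce, logistics, transportation, and tourism. Different scenarios generally involve different forecast targets and place different requirements on the time granularity of the forecast. Time series forecasting can cover these scenarios while providing the granularity each one needs, such as days, hours, or finer intervals. It is also essential to many business decisions and plays an important role in downstream decisions about warehousing and distribution, fulfillment, inventory, and resource allocation. For example, per-item sales forecasts are one of the main inputs for ordering, inventory planning, and replenishment and stock transfer on online shopping platforms.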

Prerequisites

Data preparation

The sample table fresh_sales contains a single time series. Its tags are id_code=bf502edc7025604a51c96d21e09de0e8, brand_id=11077664934, cate1_id=104, and cate2_id=237, and its field is sales.

The table schema is as follows:

+------------+-----------+------------+------------+
| columnName | typeName  | columnKind | primaryKey |
+------------+-----------+------------+------------+
| id_code    | VARCHAR   | TAG        | true       |
| time       | TIMESTAMP | TIMESTAMP  | false      |
| sales      | DOUBLE    | FIELD      | false      |
| brand_id   | VARCHAR   | TAG        | false      |
| cate1_id   | VARCHAR   | TAG        | false      |
| cate2_id   | VARCHAR   | TAG        | false      |
+------------+-----------+------------+------------+
Sample data:
+---------------------------+----------------------------------+-------------+----------+----------+-------+
|           time            |             id_code              |  brand_id   | cate1_id | cate2_id | sales |
+---------------------------+----------------------------------+-------------+----------+----------+-------+
| 2021-01-01T00:00:00+08:00 | bf502edc7025604a51c96d21e09de0e8 | 11114152323 | 104      | 237      | 117   |
| 2021-01-02T00:00:00+08:00 | bf502edc7025604a51c96d21e09de0e8 | 11114152323 | 104      | 237      | 118   |
| 2021-01-03T00:00:00+08:00 | bf502edc7025604a51c96d21e09de0e8 | 11114152323 | 104      | 237      | 144   |
| 2021-01-04T00:00:00+08:00 | bf502edc7025604a51c96d21e09de0e8 | 11114152323 | 104      | 237      | 133   |
| 2021-01-05T00:00:00+08:00 | bf502edc7025604a51c96d21e09de0e8 | 11114152323 | 104      | 237      | 126   |
+---------------------------+----------------------------------+-------------+----------+----------+-------+

Time series forecasting

The following steps use an item sales forecasting scenario to show how to perform time series forecasting with Lindorm ML.

  1. Train a model with the CREATE MODEL statement, for example:
    CREATE MODEL tft_model
    FROM (SELECT * FROM fresh_sales WHERE `time` > '2021-02-08T00:00:00+08:00')
    TARGET sales
    TASK time_series_forecast
    ALGORITHM tft
    SETTINGS
    (
      time_column 'time',
      group_columns 'id_code',
      feat_static_columns 'cate1_id,cate2_id,brand_id',
      context_length '28',
      prediction_length '6',
      epochs '5',
      freq '1D'
    );
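    Note: When TASK is set to TIME_SERIES_FORECAST, only the forecast function can be used for model inference.
  2. Model management. Use the SHOW MODEL model_name statement to view model information, for example: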
    SHOW MODEL tft_model;

    Result:

    +-----------+--------+--------------+----------------------+-----------+------------------------------------+---------------+-------------------------------------------------+--------------------------------+-------------------------------+-------------------------------+
    |   name    | status | sql_function |       task_type      | algorithm |               query                | preprocessors |                    settings                     |            metrics             |         created_time          |          update_time          |
    +-----------+--------+--------------+----------------------+-----------+------------------------------------+---------------+-------------------------------------------------+--------------------------------+-------------------------------+-------------------------------+
    | tft_model | Ready  | forecast     | TIME_SERIES_FORECAST | TFT       | SELECT `time`, FIRST(`sales`) AS   | []            | {time_column=time, group_columns=id_code,       | {MAPE=0.35002756118774414,     | 2022-11-04T11:38:05.873+08:00 | 2022-11-04T11:39:14.046+08:00 |
    |           |        |              |                      |           | `sales`, `id_code`, `cate1_id`,    |               | feat_static_columns=cate1_id,cate2_id,brand_id, | MASE=0.41281554008773325,      |                               |                               |
    |           |        |              |                      |           | `cate2_id`, `brand_id` FROM        |               | context_length=28, prediction_length=6,         | MSE=456.3769938151042}         |                               |                               |
    |           |        |              |                      |           | `fresh_sales` WHERE `time` >       |               | epochs=5, freq=1D,train_mode=LOCAL,             |                                |                               |                               |
    |           |        |              |                      |           | '2021-02-08T00:00:00+08:00'        |               | past_length=28,                                 |                                |                               |                               |
    |           |        |              |                      |           | sample by 1D fill zero             |               | forecast_start=2022-07-31 08:00:00}             |                                |                               |                               |
    +-----------+--------+--------------+----------------------+-----------+------------------------------------+---------------+-------------------------------------------------+--------------------------------+-------------------------------+-------------------------------+
  3. Model inference. Use the forecast function to perform time series forecasting, for example:
    SELECT `time`, id_code, forecast(sales, 'tft_model') AS sales_forecast
    FROM fresh_sales
    WHERE `time` >= '2022-07-18T00:00:00+08:00' AND id_code = 'bf502edc7025604a51c96d21e09de0e8'
    SAMPLE BY 0;

    Result:

    +---------------------------+----------------------------------+--------------------+
    |           time            |             id_code              |   sales_forecast   |
    +---------------------------+----------------------------------+--------------------+
    | 2022-07-18T00:00:00+08:00 | bf502edc7025604a51c96d21e09de0e8 | 57.61831283569336  |
    | 2022-07-19T00:00:00+08:00 | bf502edc7025604a51c96d21e09de0e8 | 57.64776611328125  |
    | 2022-07-20T00:00:00+08:00 | bf502edc7025604a51c96d21e09de0e8 | 58.00449752807617  |
    | 2022-07-21T00:00:00+08:00 | bf502edc7025604a51c96d21e09de0e8 | 59.41561508178711  |
    | 2022-07-22T00:00:00+08:00 | bf502edc7025604a51c96d21e09de0e8 | 58.925498962402344 |
    | 2022-07-23T00:00:00+08:00 | bf502edc7025604a51c96d21e09de0e8 | 58.494712829589844 |
    +---------------------------+----------------------------------+--------------------+

Accuracy calculation

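To evaluate the accuracy of a time series forecast, you can calculate it as described below. The item sales forecasting scenario is used as the example.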

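Item sales forecasting involves forecasting many time series at once, so the overall accuracy of the algorithm is usually measured by the aggregated 1-wMAPE accuracy. Let R_{it} be the actual sales of item i on day t, and F_{it} the corresponding forecast value.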

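  1. Use the following formula to calculate the daily 1-MAPE accuracy of an item.
    $$\mathrm{Acc}_{it} = 1 - \frac{\mathrm{abs}(R_{it} - F_{it})}{R_{it}}$$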

    Here abs is the absolute value function. For example, abs(-3.7) = 3.7.

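    Suppose the actual sales R_{it} of an item on a given day is 63 and the forecast F_{it} is 58. Substituting into the formula above gives 1 - abs(63 - 58) / 63 ≈ 0.921, so the forecast accuracy after rounding is 92.1%.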

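  2. Use the actual sales R_{it} as weights to compute the weighted average accuracy, aggregated over each dimension of interest. The result is the 1-wMAPE accuracy.
    $$\text{1-wMAPE} = \frac{\sum_{i,t} R_{it} \left(1 - \frac{\mathrm{abs}(R_{it} - F_{it})}{R_{it}}\right)}{\sum_{i,t} R_{it}}$$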

    Here ∑ is the summation operator, which adds up all of the computed terms.

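    Suppose the forecast accuracies Acc of an item over three days are 92.1%, 92.6%, and 100%, with actual sales R_{it} of 63, 54, and 58 and forecasts F_{it} of 58, 58, and 58. Substituting into the formula above gives (63 × 0.921 + 54 × 0.926 + 58 × 1) / (63 + 54 + 58) ≈ 0.949, so the forecast accuracy over the three days after rounding is 94.9%.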
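
The two accuracy steps above can be checked with a few lines of Python. This is a minimal sketch run outside Lindorm, not part of Lindorm ML. The function names accuracy and weighted_accuracy are hypothetical, and the numbers are the ones from the worked example. The weighted form relies on the identity R × (1 - abs(R - F) / R) = R - abs(R - F), which avoids dividing by a single day's sales when that value is 0.

```python
def accuracy(actual, forecast):
    """Per-point 1-MAPE accuracy: 1 - abs(R - F) / R."""
    if actual == 0:
        # The per-point ratio is undefined when the actual value is 0.
        raise ValueError("actual sales must be non-zero")
    return 1 - abs(actual - forecast) / actual


def weighted_accuracy(actuals, forecasts):
    """1-wMAPE accuracy: per-point accuracy weighted by actual sales.

    sum(R * Acc) / sum(R) simplifies to 1 - sum(abs(R - F)) / sum(R).
    """
    total = sum(actuals)
    if total == 0:
        raise ValueError("total actual sales must be non-zero")
    abs_error = sum(abs(r - f) for r, f in zip(actuals, forecasts))
    return 1 - abs_error / total


# Values from the worked example: three days for one item.
actuals = [63, 54, 58]
forecasts = [58, 58, 58]

daily = [accuracy(r, f) for r, f in zip(actuals, forecasts)]
print([f"{a:.1%}" for a in daily])                       # ['92.1%', '92.6%', '100.0%']
print(f"{weighted_accuracy(actuals, forecasts):.1%}")    # 94.9%
```

To score several items at once, pass the concatenated actual and forecast values of all items and days in the dimension you want to aggregate.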