x13_arima


x13_arima is a seasonal time series forecasting algorithm built on the open-source X-13ARIMA-SEATS package. It works like a filter: it separates the underlying signal in your data from noise and seasonal patterns, then projects that signal forward to generate forecasts.

Use x13_arima when your data has a known seasonal structure — monthly sales, quarterly revenue, weekly traffic — and you want full control over the model order. If you are not sure which orders to use, use x13_auto_arima instead. You provide only the upper bounds for each order, and x13_auto_arima searches for the optimal values automatically.

How it works

x13_arima fits a Seasonal ARIMA model, expressed as ARIMA(p,d,q)(P,D,Q)m:

| Part | Parameters | Meaning |
| --- | --- | --- |
| Non-seasonal | (p,d,q) | Autoregressive order, differencing order, moving average order |
| Seasonal | (P,D,Q) | Seasonal autoregressive order, seasonal differencing order, seasonal moving average order |
| Period | m | Number of observations per season (e.g., 12 for monthly data in a yearly cycle) |

ARIMA (Autoregressive Integrated Moving Average) was introduced by Box and Jenkins in the early 1970s and is also known as the Box-Jenkins model.
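
For reference, the ARIMA(p,d,q)(P,D,Q)m notation is conventionally written with the backshift operator B, where B y_t = y_{t-1}. The form below is the standard textbook statement of a seasonal ARIMA model, not something taken from the component's implementation, and sign conventions for the polynomials vary between references.

```latex
\phi_p(B)\,\Phi_P(B^m)\,(1-B)^d\,(1-B^m)^D\,y_t
  = \theta_q(B)\,\Theta_Q(B^m)\,\varepsilon_t

% Polynomial definitions (one common sign convention):
\phi_p(B)   = 1 - \phi_1 B - \dots - \phi_p B^p
\Phi_P(B^m) = 1 - \Phi_1 B^m - \dots - \Phi_P B^{Pm}
\theta_q(B)   = 1 + \theta_1 B + \dots + \theta_q B^q
\Theta_Q(B^m) = 1 + \Theta_1 B^m + \dots + \Theta_Q B^{Qm}
```

Here (1-B)^d applies ordinary differencing d times, (1-B^m)^D applies seasonal differencing D times at lag m, and ε_t is white noise.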

Limits

| Limit | Value |
| --- | --- |
| Rows per group | Maximum 1,200 records |
| Columns | One numeric column per run |

Configure the component

Method 1: configure in the PAI console (recommended)

On the pipeline page of Machine Learning Designer, add the x13_arima component and configure the following parameters.

Fields Setting tab

| Parameter | Required | Description |
| --- | --- | --- |
| Time Series Column | Yes | Column used to sort the values in Value Column. It determines order only and does not affect the values themselves. |
| Value Column | Yes | The numeric column to forecast. |
| Stratification Column | No | Comma-separated column names (e.g., col0,col1). A separate model is fitted for each group. |

Parameters Setting tab

| Parameter | Required | Range | Default | Description |
| --- | --- | --- | --- | --- |
| Format (p,d,q) | Yes | Non-negative integers in [0, 36] | | Non-seasonal ARIMA order. p: autoregressive order. d: differencing order. q: moving average order. |
| Start Date | No | year.season (e.g., 1986.1) | 1.1 | Start position of the time series. See Time series format. |
| Series Frequency | No | Positive integer in [1, 12] | 12 | Number of observations per unit period. 12 means monthly data in a yearly cycle. |
| Format (sp,sd,sq) | No | Non-negative integers in [0, 36] | Non-seasonal | Seasonal ARIMA order. sp: seasonal autoregressive order. sd: seasonal differencing order. sq: seasonal moving average order. |
| Seasonal Cycle | No | (0, 12] | 12 | The seasonal period length. |
| Prediction Entries | No | Positive integer in (0, 120] | 12 | Number of future steps to forecast. |
| Prediction Confidence Level | No | (0, 1) | 0.95 | Confidence level used to compute the lower and upper forecast bounds. |

Tuning tab

| Parameter | Description |
| --- | --- |
| Cores | Number of cores. Determined by the system by default. |
| Memory | Memory per core, in MB. Determined by the system by default. |

Method 2: run PAI commands

Use a SQL Script or ODPS SQL component to call x13_arima directly.

PAI -name x13_arima
    -project algo_public
    -DinputTableName=pai_ft_x13_arima_input
    -DseqColName=id
    -DvalueColName=number
    -Dorder=3,1,1
    -Dstart=1949.1
    -Dfrequency=12
    -Dseasonal=0,1,1
    -Dperiod=12
    -DpredictStep=12
    -DoutputPredictTableName=pai_ft_x13_arima_out_predict
    -DoutputDetailTableName=pai_ft_x13_arima_out_detail

Parameters

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| inputTableName | Yes | | Name of the input table. |
| inputTablePartitions | No | Full table | Partitions to read from the input table. |
| seqColName | Yes | | Time series column. Used only to sort the valueColName column. |
| valueColName | Yes | | Numeric column to forecast. |
| groupColNames | No | | Grouping columns, comma-separated (e.g., col0,col1). A separate model is fitted per group. |
| order | Yes | | Non-seasonal order as p,d,q. Non-negative integers in [0, 36]. |
| start | No | 1.1 | Start position of the time series in n1.n2 format. See Time series format. |
| frequency | No | 12 | Number of observations per unit period. Range: (0, 12]. A value of 12 means 12 months per year. |
| seasonal | No | Non-seasonal | Seasonal order as sp,sd,sq. Non-negative integers in [0, 36]. |
| period | No | Same as frequency | Seasonal period length. Range: (0, 100]. |
| maxiter | No | 1500 | Maximum number of training iterations. |
| tol | No | 1e-5 | Convergence tolerance (DOUBLE). |
| predictStep | No | 12 | Number of future steps to forecast. Range: (0, 365]. |
| confidenceLevel | No | 0.95 | Confidence level for forecast bounds. Range: (0, 1). |
| outputPredictTableName | Yes | | Output table for forecast results. |
| outputDetailTableName | Yes | | Output table for model details. |
| outputTablePartition | No | No partition | Partition spec for the output tables. |
| coreNum | No | System default | Number of cores. Must be a positive integer. Used together with memSizePerCore. |
| memSizePerCore | No | System default | Memory per core, in MB. Range: [1024, 65536]. |
| lifecycle | No | | Lifecycle of the output tables in MaxCompute. |
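
When the same command is submitted with varying parameters, it can help to assemble the command text programmatically. The following is a minimal sketch, not part of PAI: `build_pai_command` is a hypothetical helper that only renders the `-D` parameters listed above into a string, which you would then paste into a SQL Script or ODPS SQL component.

```python
def build_pai_command(name: str, project: str, params: dict) -> str:
    """Render a PAI command string from -D parameters (hypothetical helper)."""
    lines = [f"PAI -name {name}", f"    -project {project}"]
    for key, value in params.items():
        if value is None:
            # Leave optional parameters out so the component default applies.
            continue
        lines.append(f"    -D{key}={value}")
    return "\n".join(lines)


command = build_pai_command(
    "x13_arima",
    "algo_public",
    {
        "inputTableName": "pai_ft_x13_arima_input",
        "seqColName": "id",
        "valueColName": "number",
        "order": "3,1,1",
        "seasonal": "0,1,1",
        "start": "1949.1",
        "frequency": 12,
        "period": 12,
        "predictStep": 12,
        "confidenceLevel": None,  # omitted, so the default 0.95 is used
        "outputPredictTableName": "pai_ft_x13_arima_out_predict",
        "outputDetailTableName": "pai_ft_x13_arima_out_detail",
    },
)
print(command)
```

The helper performs no validation; range checks from the table above would still need to be applied to the values before the command is run.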

Default resource allocation

| Condition | coreNum | memSizePerCore |
| --- | --- | --- |
| groupColNames not set | 1 | 4096 MB |
| groupColNames set | floor(total rows / 120,000) | 4096 MB |

Time series format

The start and frequency parameters jointly define how your data maps to calendar time. Think of the data as a sequence of values placed into a two-level calendar: a major unit ts1 (e.g., year) and a sub-unit ts2 (e.g., month).

  • frequency — number of ts2 sub-units per ts1 major unit

  • start — written as n1.n2, meaning the n2th ts2 in the n1th ts1

| Granularity | ts1 | ts2 | frequency | start example |
| --- | --- | --- | --- | --- |
| Monthly | Year | Month | 12 | 1949.2 = February 1949 |
| Quarterly | Year | Quarter | 4 | 1949.2 = Q2 1949 |
| Daily (weekly cycle) | Week | Day | 7 | 1949.2 = 2nd day of week 1949 |
| Custom single unit | Any | | 1 | 1949.1 = the 1949th unit |
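
The mapping from a row's position in the sorted input to its (ts1, ts2) slot is plain integer arithmetic. The sketch below illustrates that arithmetic under the description above; `position` is a hypothetical helper, not part of the component, and `start` is passed as a string so that a value such as 1949.10 is not misread as a decimal number.

```python
def position(start: str, frequency: int, index: int) -> tuple[int, int]:
    """Return (ts1, ts2) for the value at a zero-based index in the series.

    start     -- "n1.n2", the n2-th sub-unit of the n1-th major unit
    frequency -- number of sub-units (ts2) per major unit (ts1)
    """
    major, minor = (int(part) for part in start.split("."))
    # Count sub-units elapsed since the first sub-unit of the starting major unit.
    offset = (minor - 1) + index
    return major + offset // frequency, offset % frequency + 1


# Monthly data starting March 1949: the first value falls in 1949, month 3.
print(position("1949.3", 12, 0))
# The 15th value (index 14) falls in 1950, month 5.
print(position("1949.3", 12, 14))
# With 15 input values, index 15 is the first forecast step.
print(position("1949.3", 4, 15))
```

For a series of n values, calling `position(start, frequency, n)` gives the slot where forecasting begins.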

Example

This example uses the AirPassengers dataset, which records the monthly count of international airline passengers from 1949 to 1960. For dataset details, see AirPassengers (R datasets).

Prepare the input table

The input table has two columns: id (sequence number) and number (passenger count).

| id | number |
| --- | --- |
| 1 | 112 |
| 2 | 118 |
| 3 | 132 |
| 4 | 129 |
| 5 | 121 |
| ... | ... |

Create the table and upload the data using the MaxCompute client. For installation instructions, see MaxCompute client (odpscmd). For Tunnel command usage, see Tunnel commands.

create table pai_ft_x13_arima_input(id bigint, number bigint);
tunnel upload xxxx/airpassengers.csv pai_ft_x13_arima_input -h true;

This dataset is monthly and starts in January 1949, so start=1949.1 and frequency=12. To illustrate how start shifts the mapping, the table below shows how a sequence of 15 values [1, 2, 3, ..., 15] would map if start=1949.3 and frequency=12 (monthly, starting March 1949):

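| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1949 | | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 1950 | 11 | 12 | 13 | 14 | 15 | | | | | | | |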

The following examples show additional time series mappings for quarterly, weekly, and single-unit granularities using the same value sequence [1, 2, 3, ..., 15]:

Quarterly (frequency=4, start=1949.3)

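| Year | Qtr1 | Qtr2 | Qtr3 | Qtr4 |
| --- | --- | --- | --- | --- |
| 1949 | | | 1 | 2 |
| 1950 | 3 | 4 | 5 | 6 |
| 1951 | 7 | 8 | 9 | 10 |
| 1952 | 11 | 12 | 13 | 14 |
| 1953 | 15 | | | |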

Forecasting begins from the second quarter of 1953.

Daily with weekly cycle (frequency=7, start=1949.3)

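| Week | Sun | Mon | Tue | Wed | Thu | Fri | Sat |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1949 | | | 1 | 2 | 3 | 4 | 5 |
| 1950 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| 1951 | 13 | 14 | 15 | | | | |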

Forecasting begins from the fourth day of the 1951st week.

Single unit (frequency=1, start=1949.1)

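| Cycle | p1 |
| --- | --- |
| 1949 | 1 |
| 1950 | 2 |
| 1951 | 3 |
| 1952 | 4 |
| 1953 | 5 |
| 1954 | 6 |
| 1955 | 7 |
| 1956 | 8 |
| 1957 | 9 |
| 1958 | 10 |
| 1959 | 11 |
| 1960 | 12 |
| 1961 | 13 |
| 1962 | 14 |
| 1963 | 15 |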

Forecasting begins in 1964, the first cycle after the last observed value.

Run the PAI command

Run the following command using a SQL Script or ODPS SQL component:

PAI -name x13_arima
    -project algo_public
    -DinputTableName=pai_ft_x13_arima_input
    -DseqColName=id
    -DvalueColName=number
    -Dorder=3,1,1
    -Dseasonal=0,1,1
    -Dstart=1949.1
    -Dfrequency=12
    -Dperiod=12
    -DpredictStep=12
    -DoutputPredictTableName=pai_ft_x13_arima_out_predict
    -DoutputDetailTableName=pai_ft_x13_arima_out_detail

Output tables

outputPredictTableName — forecast results

| Column | Description |
| --- | --- |
| pdate | Forecast date. |
| forecast | Predicted value. |
| lower | Lower bound of the confidence interval (default: 95%). |
| upper | Upper bound of the confidence interval (default: 95%). |
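
To build intuition for how a confidence level relates to the lower and upper columns, the sketch below computes a textbook normal-approximation interval around a point forecast. This is an illustration only: the document does not state how x13_arima derives its bounds, and `normal_interval` and its `std_error` argument are hypothetical names.

```python
from statistics import NormalDist


def normal_interval(forecast: float, std_error: float,
                    confidence_level: float = 0.95) -> tuple[float, float]:
    """Two-sided interval assuming normally distributed forecast errors."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must be in (0, 1)")
    # Quantile that leaves (1 - confidence_level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return forecast - z * std_error, forecast + z * std_error


lower, upper = normal_interval(100.0, 10.0, 0.95)
print(round(lower, 2), round(upper, 2))
```

A higher confidence level yields a larger quantile and therefore wider bounds around the same forecast.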

outputDetailTableName — model details

| Column | Description |
| --- | --- |
| key | Record type: model (model specification), evaluation (evaluation metrics), parameters (training parameters), or log (training logs). |
| summary | Detail content for the corresponding key. |

FAQ

Why are all prediction results the same value?

The model fell back to a mean model, which outputs the mean of the training data for every forecast step. This happens when training fails, for example because the series is still unstable after differencing, the fit does not converge, or the data has zero variance. To see the exact error, open the Logview for the job and check the stderr file of the individual nodes.

How do I choose p, d, q, sp, sd, and sq values?

If you are not confident in the parameter settings, use x13_auto_arima instead. Set only the upper limits for each order, and x13_auto_arima searches for the optimal values automatically.

Error: `Number of observations after differencing and/or conditional AR estimation is 9, which is less than the minimum series length required for the model estimated, 24`

The training data is too short for the specified model. Modify the frequency parameter or add more historical data.

Error: `Order of the MA operator is too large`

In most cases, this error occurs because the training data is insufficient. Add more training data.

Error: `Series to be modelled and/or seasonally adjusted must have at least 3 complete years of data`

When you specify seasonal parameters (seasonal), the input must contain at least three complete years of data.
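
Some of the conditions described in this FAQ and in the Limits section can be checked on the client side before a job is submitted. The following is a rough sketch of such a pre-check, not part of PAI: `check_series` is a hypothetical helper, and it assumes that "three complete years" means at least 3 × frequency observations.

```python
MAX_ROWS_PER_GROUP = 1200  # limit stated in the Limits section


def check_series(values, frequency=12, seasonal=False):
    """Return a list of likely problems for one group's values (hypothetical)."""
    problems = []
    if len(values) > MAX_ROWS_PER_GROUP:
        problems.append(f"{len(values)} rows exceeds the {MAX_ROWS_PER_GROUP}-row limit")
    # Assumption: three complete years corresponds to 3 * frequency observations.
    if seasonal and len(values) < 3 * frequency:
        problems.append(
            f"seasonal model needs at least {3 * frequency} rows, got {len(values)}"
        )
    # A constant series has zero variance, which leads to the mean-model fallback.
    if len(set(values)) <= 1:
        problems.append("series is empty or constant (zero variance)")
    return problems


print(check_series(list(range(30)), frequency=12, seasonal=True))
```

An empty list means none of these checks fired; it does not guarantee that training will converge.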

What's next