The Anomaly Detection component identifies data points or patterns in a dataset that deviate significantly from normal behavior. It supports two detection methods: Box Plot for continuous (numeric) features and Attribute Value Frequency (AVF) for categorical (enumeration) features. This makes it suitable for detecting errors, fraud, and outliers across a wide range of datasets.
Choose a detection method
Select the method that matches your column types before configuring the component.
| Method | Feature type | Use when |
|---|---|---|
| Box Plot | Continuous (numeric) | Columns contain numeric measurements, such as prices, rates, or counts |
| Attribute Value Frequency (AVF) | Categorical (enumeration) | Columns contain discrete categories, such as status codes, labels, or identifiers |
Each method applies only to columns that match its feature type.
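To make the two methods concrete, the following Python sketch shows the textbook form of each technique. It is an illustration only, not the component's implementation: the 1.5 × IQR fence, the quartile interpolation method, and the use of a mean frequency as the AVF score are assumptions, and the function names `box_plot_outliers` and `avf_scores` are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def box_plot_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional box plot fence. The threshold used by the
    component is not documented here, so treat it as an assumption.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def avf_scores(rows):
    """Score each row by the mean frequency of its attribute values.

    rows is a list of equal-length tuples of categorical values.
    A lower score means the row is built from rarer values, so it is
    more likely to be an anomaly.
    """
    columns = list(zip(*rows))
    counts = [Counter(column) for column in columns]
    return [
        sum(counts[i][value] for i, value in enumerate(row)) / len(row)
        for row in rows
    ]


# Continuous feature: one value sits far from the rest.
rates = [1.1, 1.2, 1.3, 1.4, 1.1, 1.2, 9.9]
print(box_plot_outliers(rates))  # [9.9]

# Categorical features: the last record uses values seen only once.
records = [("ok", "A"), ("ok", "A"), ("ok", "B"), ("failed", "Z")]
print(avf_scores(records))  # [2.5, 2.5, 2.0, 1.0]
```

In this sketch, the Box Plot rule needs ordered numeric values and the AVF score needs countable categories, which is why each method is limited to its own feature type.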
Configure the component
Method 1: Configure on the pipeline page (recommended)
In Machine Learning Designer, add the Anomaly Detection component to your pipeline, then set the following parameters in the component panel.
| Parameter | Description |
|---|---|
| Feature Columns | The columns to include in anomaly detection. All selected columns must belong to the same feature type (all continuous or all categorical). |
| Anomaly Detection Method | The detection method. Select Box Plot for continuous features or Attribute Value Frequency (AVF) for categorical features. |
Method 2: Use PAI commands
Run the Anomaly Detection component through Platform for AI (PAI) commands. Use the SQL Script component to call PAI commands. For more information, see SQL Script (Scenario 4: Execute PAI commands within the SQL script component).
PAI -name fe_detect_runner -project algo_public
    -DselectedCols="emp_var_rate,cons_price_rate,cons_conf_idx,euribor3m,nr_employed"
    -Dlifecycle="28"
    -DdetectStrategy="boxPlot"
    -DmodelTable="pai_temp_2458_23565_2"
    -DinputTable="pai_bank_data"
    -DoutputTable="pai_temp_2458_23565_1";
Parameters
| Parameter | Required | Type | Default | Valid values | Description |
|---|---|---|---|---|---|
| inputTable | Yes | String | - | - | The name of the input table. |
| inputTablePartitions | No | String | All partitions | - | The partitions in the input table. If not specified, all partitions are used. See Partition format for syntax. |
| selectedCols | Yes | String | - | - | The feature columns to analyze. Accepts any data type. |
| detectStrategy | Yes | String | - | boxPlot, avf | The detection method. Use boxPlot for continuous features or avf for categorical features. |
| outputTable | Yes | String | - | - | The name of the output table. Contains data with anomalous features. |
| modelTable | Yes | String | - | - | The name of the table used to store the anomaly detection model. |
| lifecycle | No | Integer | 7 | - | The lifecycle of the output table. |
| coreNum | No | Integer | - | 1–9999 | The number of cores. Must be set together with memSizePerCore. |
| memSizePerCore | No | Integer (MB) | - | 2048–65536 | The memory allocated per core. Must be set together with coreNum. |
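The constraints in the parameter table can be checked before a command is submitted. The following Python sketch assembles a `fe_detect_runner` command string and enforces the documented valid values. The helper `build_pai_command` is hypothetical and only builds text; it does not call any PAI API. The output table, model table, and column names in the example are placeholders chosen for illustration.

```python
def build_pai_command(input_table, output_table, model_table, selected_cols,
                      detect_strategy, partitions=None, lifecycle=None,
                      core_num=None, mem_size_per_core=None):
    """Return a fe_detect_runner command built from the documented parameters."""
    if detect_strategy not in ("boxPlot", "avf"):
        raise ValueError("detectStrategy must be 'boxPlot' or 'avf'")
    # coreNum and memSizePerCore are documented as a pair.
    if (core_num is None) != (mem_size_per_core is None):
        raise ValueError("set coreNum and memSizePerCore together")
    if core_num is not None and not 1 <= core_num <= 9999:
        raise ValueError("coreNum must be in the range 1-9999")
    if mem_size_per_core is not None and not 2048 <= mem_size_per_core <= 65536:
        raise ValueError("memSizePerCore must be in the range 2048-65536 (MB)")

    params = {
        "inputTable": input_table,
        "inputTablePartitions": partitions,
        "selectedCols": ",".join(selected_cols),
        "detectStrategy": detect_strategy,
        "outputTable": output_table,
        "modelTable": model_table,
        "lifecycle": lifecycle,
        "coreNum": core_num,
        "memSizePerCore": mem_size_per_core,
    }
    lines = ["PAI -name fe_detect_runner -project algo_public"]
    # Optional parameters left as None are omitted from the command.
    lines += [f'    -D{name}="{value}"'
              for name, value in params.items() if value is not None]
    return "\n".join(lines) + ";"


command = build_pai_command(
    input_table="pai_bank_data",
    output_table="pai_temp_avf_output",  # hypothetical table name
    model_table="pai_temp_avf_model",    # hypothetical table name
    selected_cols=["job", "marital"],    # hypothetical categorical columns
    detect_strategy="avf",
    lifecycle=28,
)
print(command)
```

The resulting string can be pasted into the SQL Script component. Validating locally catches a mistyped `detectStrategy` or an unpaired `coreNum` before the job is submitted.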
Partition format
Use the inputTablePartitions parameter to filter input data by partition.
| Scenario | Format | Example |
|---|---|---|
| Single partition | partition_name=value | ds=20240101 |
| Multiple partitions | name1=value1,name2=value2 | ds=20240101,region=cn |
| Multi-level partitions | name1=value1/name2=value2 | year=2024/month=01 |
Separate multiple partitions with commas (,).
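The formats in the table above can be produced programmatically. The following Python sketch joins partition levels with `/` and separate partitions with `,`, following the table. The helper name `format_partitions` and its input shape are hypothetical.

```python
def format_partitions(partitions):
    """Build an inputTablePartitions value.

    partitions is a list of partition specs. Each spec is a list of
    (name, value) pairs, ordered from the outermost level inward.
    """
    return ",".join(
        "/".join(f"{name}={value}" for name, value in spec)
        for spec in partitions
    )


# Single partition
print(format_partitions([[("ds", "20240101")]]))
# ds=20240101

# Multiple partitions, separated by commas
print(format_partitions([[("ds", "20240101")], [("region", "cn")]]))
# ds=20240101,region=cn

# Multi-level partition, levels separated by slashes
print(format_partitions([[("year", "2024"), ("month", "01")]]))
# year=2024/month=01
```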
Output
The outputTable contains data with anomalous features from the input table. After the component runs, query the output table to review the flagged records and determine which data points deviate from normal patterns.
Related topics
SQL Script: Learn how to execute PAI commands within the SQL Script component.