Anomaly detection

The Anomaly Detection component identifies data points or patterns in a dataset that deviate significantly from normal behavior. It supports two detection methods: Box Plot for continuous (numeric) features and Attribute Value Frequency (AVF) for categorical (enumeration) features. This makes it suitable for detecting errors, fraud, and outliers across a wide range of datasets.

Choose a detection method

Select the method that matches your column types before configuring the component.

| Method | Feature type | Use when |
| --- | --- | --- |
| Box Plot | Continuous (numeric) | Columns contain numeric measurements, such as prices, rates, or counts |
| Attribute Value Frequency (AVF) | Categorical (enumeration) | Columns contain discrete categories, such as status codes, labels, or identifiers |

Each method applies only to columns that match its feature type.
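
This page does not document the component's exact thresholds or scoring. As a rough illustration of the two ideas, the sketch below assumes the common Tukey rule for box plots (flag values more than 1.5 × IQR beyond the quartiles) and the textbook AVF score (the mean frequency of a row's attribute values, where a low score means the row is built from rare values). The function names, the factor `k`, and the sample data are all assumptions made for illustration and may not match what the component does internally.

```python
from collections import Counter
from statistics import quantiles


def box_plot_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).

    k=1.5 is the conventional choice, assumed here for illustration.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def avf_scores(rows):
    """Score each row by the mean frequency of its attribute values.

    Rows made of rare category values get low scores and are the
    candidates for anomalies.
    """
    n_cols = len(rows[0])
    # One frequency table per column.
    counts = [Counter(row[i] for row in rows) for i in range(n_cols)]
    return [
        sum(counts[i][row[i]] for i in range(n_cols)) / n_cols
        for row in rows
    ]


# Hypothetical sample data.
rates = [10, 11, 12, 11, 10, 12, 11, 95]
labels = [("A", "x"), ("A", "x"), ("A", "y"), ("B", "z")]

print(box_plot_outliers(rates))
print(avf_scores(labels))
```

In this sample, the numeric value 95 falls outside the fences, and the row `("B", "z")` receives the lowest AVF score because both of its values occur only once.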

Configure the component

Method 1: Configure on the pipeline page (recommended)

In Machine Learning Designer, add the Anomaly Detection component to your pipeline, then set the following parameters in the component panel.

| Parameter | Description |
| --- | --- |
| Feature Columns | The columns to include in anomaly detection. Select all columns that belong to the same feature type. |
| Anomaly Detection Method | The detection method. Select Box Plot for continuous features or Attribute Value Frequency (AVF) for categorical features. |

Method 2: Use PAI commands

Run the Anomaly Detection component through Platform for AI (PAI) commands. Use the SQL Script component to call PAI commands. For more information, see SQL Script (Scenario 4: Execute PAI commands within the SQL script component).

PAI -name fe_detect_runner -project algo_public
     -DselectedCols="emp_var_rate,cons_price_rate,cons_conf_idx,euribor3m,nr_employed"
     -Dlifecycle="28"
     -DdetectStrategy="boxPlot"
     -DmodelTable="pai_temp_2458_23565_2"
     -DinputTable="pai_bank_data"
     -DoutputTable="pai_temp_2458_23565_1";

Parameters

| Parameter | Required | Type | Default | Valid values | Description |
| --- | --- | --- | --- | --- | --- |
| inputTable | Yes | String | | | The name of the input table. |
| inputTablePartitions | No | String | All partitions | | The partitions in the input table. If not specified, all partitions are used. See Partition format for syntax. |
| selectedCols | Yes | String | | | The feature columns to analyze. Accepts any data type. |
| detectStrategy | Yes | String | | boxPlot, avf | The detection method. Use boxPlot for continuous features or avf for categorical features. |
| outputTable | Yes | String | | | The name of the output table. Contains data with anomalous features. |
| modelTable | Yes | String | | | The name of the table used to store the anomaly detection model. |
| lifecycle | No | Integer | 7 | | The lifecycle of the output table. |
| coreNum | No | Integer | | 1–9999 | The number of cores. Must be set together with memSizePerCore. |
| memSizePerCore | No | Integer (MB) | | 2048–65536 | The memory allocated per core. Must be set together with coreNum. |
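
The constraints in this table (required parameters, the two allowed strategies, and the pairing of coreNum with memSizePerCore) can be checked before a command is submitted. The sketch below is one possible pre-flight check. `validate_params` is a hypothetical helper, not part of PAI, and it encodes only the rules stated in the table.

```python
REQUIRED = ("inputTable", "selectedCols", "detectStrategy", "outputTable", "modelTable")
STRATEGIES = ("boxPlot", "avf")


def validate_params(params):
    """Return a list of problems found in a parameter dict (empty means OK)."""
    problems = []
    for key in REQUIRED:
        if not params.get(key):
            problems.append(f"missing required parameter: {key}")

    strategy = params.get("detectStrategy")
    if strategy and strategy not in STRATEGIES:
        problems.append(f"detectStrategy must be one of {STRATEGIES}")

    # coreNum and memSizePerCore are optional but must appear together.
    has_core = "coreNum" in params
    has_mem = "memSizePerCore" in params
    if has_core != has_mem:
        problems.append("coreNum and memSizePerCore must be set together")
    if has_core and not 1 <= int(params["coreNum"]) <= 9999:
        problems.append("coreNum must be between 1 and 9999")
    if has_mem and not 2048 <= int(params["memSizePerCore"]) <= 65536:
        problems.append("memSizePerCore must be between 2048 and 65536 (MB)")
    return problems


# Example: coreNum is given without memSizePerCore, so one problem is reported.
print(validate_params({
    "inputTable": "pai_bank_data",
    "selectedCols": "euribor3m,nr_employed",
    "detectStrategy": "boxPlot",
    "outputTable": "pai_temp_2458_23565_1",
    "modelTable": "pai_temp_2458_23565_2",
    "coreNum": 4,
}))
```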

Partition format

Use the inputTablePartitions parameter to filter input data by partition.

| Scenario | Format | Example |
| --- | --- | --- |
| Single partition | partition_name=value | ds=20240101 |
| Multiple partitions | name1=value1,name2=value2 | ds=20240101,region=cn |
| Multi-level partitions | name1=value1/name2=value2 | year=2024/month=01 |

Separate multiple partitions with commas (,).
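
The formats above can be produced from structured data with a few lines of code. The following sketch assumes each partition is given as a list of (name, value) levels. `format_partitions` is a hypothetical helper that joins levels with a slash and partitions with a comma, following the table.

```python
def format_partitions(partitions):
    """Build an inputTablePartitions value.

    partitions: a list of partitions, each a list of (name, value) levels.
    Levels are joined with "/", partitions with ",".
    """
    return ",".join(
        "/".join(f"{name}={value}" for name, value in levels)
        for levels in partitions
    )


# The three examples from the table above.
print(format_partitions([[("ds", "20240101")]]))
print(format_partitions([[("ds", "20240101")], [("region", "cn")]]))
print(format_partitions([[("year", "2024"), ("month", "01")]]))
```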

Output

The outputTable contains data with anomalous features from the input table. After the component runs, query the output table to review the flagged records and determine which data points deviate from normal patterns.

Related topics

  • SQL Script: learn how to execute PAI commands within the SQL Script component