Feature importance filtering

After a Linear Model Feature Importance, GBDT Feature Importance, or Random Forest Feature Importance component produces a scored weight table, Feature Importance Filtering selects the top N features from that table and writes the filtered dataset to an output table. This lets you reduce input dimensionality before training without manually inspecting or ranking feature scores.

Prerequisites

Before you begin, ensure that you have:

A completed run of one of the upstream feature importance components: Linear Model Feature Importance, GBDT Feature Importance, or Random Forest Feature Importance
The output weight table from that component (used as weightTable)
An input data table whose features you want to filter (used as inputTable)

How it works

Feature Importance Filtering reads the feature scores from weightTable and keeps the top N features. The filtered feature set is written to outputTable. A model file capturing the filter configuration is saved to modelTable.
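
The selection step can be pictured with a short Python sketch. It is illustrative only and does not describe the component's internals. The function name `select_top_n`, the mapping of feature name to score, and the example score values are all assumptions made for this example. Ranking by raw score in descending order is also an assumption, because this page does not specify how ties or negative weights are handled.

```python
from typing import Dict, List, Optional, Sequence


def select_top_n(
    weights: Dict[str, float],
    top_n: int = 10,
    selected_cols: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return the names of the top_n highest-scoring candidate features.

    Assumption: a higher score means a more important feature.
    """
    if top_n <= 0:
        raise ValueError("top_n must be a positive integer")
    # Restrict candidates to selected_cols when given; otherwise use every scored feature.
    candidates = weights if selected_cols is None else {
        name: weights[name] for name in selected_cols if name in weights
    }
    # Sort by score, highest first; sorted() is stable, so ties keep input order.
    ranked = sorted(candidates, key=lambda name: candidates[name], reverse=True)
    return ranked[:top_n]


# Hypothetical scores, made up for illustration.
scores = {"pdays": 0.02, "euribor3m": 0.31, "age": 0.12, "campaign": 0.07, "nr_employed": 0.25}
kept = select_top_n(scores, top_n=3)
print(kept)  # ['euribor3m', 'nr_employed', 'age']
```

Passing `selected_cols` narrows the candidate set before ranking, which mirrors the role the `selectedCols` parameter plays for the component.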

Configure the component

PAI -name fe_filter_runner -project algo_public
    -DselectedCols=pdays,previous,emp_var_rate,cons_price_idx,cons_conf_idx,euribor3m,nr_employed,age,campaign,poutcome
    -DinputTable=pai_dense_10_10
    -DweightTable=pai_temp_2252_20319_1
    -DtopN=5
    -DmodelTable=pai_temp_2252_20320_2
    -DoutputTable=pai_temp_2252_20320_1;

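This example keeps the 5 highest-scoring features among the 10 columns of pai_dense_10_10 listed in selectedCols, ranked by the scores in the weight table pai_temp_2252_20319_1 from an upstream feature importance component. The filtered data is written to pai_temp_2252_20320_1, and the model is written to pai_temp_2252_20320_2.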

Parameters

Parameter	Description	Required	Default
`inputTable`	Name of the input table	Yes	—
`inputTablePartitions`	Partitions to read from the input table. By default, all partitions are read. Specify a single partition as `partition_name=value`, multiple partitions as `name1=value1,name2=value2` (comma-separated), or multi-level partitions as `name1=value1/name2=value2`.	No	All partitions
`weightTable`	The feature importance weight table. Must be an output table from the Linear Model Feature Importance, GBDT Feature Importance, or Random Forest Feature Importance component.	Yes	—
`outputTable`	Name of the output table that receives the data restricted to the top N features.	Yes	—
`modelTable`	Name of the table that stores the model generated by feature filtering.	Yes	—
`selectedCols`	Columns from `inputTable` to consider as candidates for filtering. By default, all columns are considered.	No	All columns
`topN`	Number of top-ranked features to keep. Must be a positive integer.	No	10
`lifecycle`	Retention period of the output table, in days. Must be a positive integer.	No	7
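
As a rough aid for scripting, the parameters above can be assembled into a command string with a small Python helper. This is a sketch under stated assumptions: `partition_spec` and `build_fe_filter_command` are hypothetical helpers and not part of PAI, the partition names `ds` and `hh` are invented for illustration, and combining "," and "/" in a single partition value is an assumption beyond what the table states. Only the parameter names, `fe_filter_runner`, `algo_public`, and the table names come from this page.

```python
from typing import Mapping, Sequence


def partition_spec(partitions: Sequence[Mapping[str, str]]) -> str:
    """Format a value for inputTablePartitions.

    Each mapping is one partition; its key/value pairs are the partition
    levels, joined with "/". Separate partitions are joined with ",".
    """
    return ",".join(
        "/".join(f"{name}={value}" for name, value in part.items())
        for part in partitions
    )


def build_fe_filter_command(params: Mapping[str, object]) -> str:
    """Build a PAI command string after basic checks on the parameters."""
    required = ("inputTable", "weightTable", "outputTable", "modelTable")
    missing = [name for name in required if name not in params]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    # topN and lifecycle must be positive integers when supplied.
    for name in ("topN", "lifecycle"):
        if name in params:
            value = params[name]
            if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
                raise ValueError(f"{name} must be a positive integer")
    lines = ["PAI -name fe_filter_runner -project algo_public"]
    lines += [f"    -D{name}={value}" for name, value in params.items()]
    return "\n".join(lines) + ";"


cmd = build_fe_filter_command({
    "inputTable": "pai_dense_10_10",
    # Hypothetical two-level partition; the real table may not be partitioned.
    "inputTablePartitions": partition_spec([{"ds": "20240101", "hh": "00"}]),
    "weightTable": "pai_temp_2252_20319_1",
    "topN": 5,
    "modelTable": "pai_temp_2252_20320_2",
    "outputTable": "pai_temp_2252_20320_1",
})
print(cmd)
```

Checking the required parameters and the positive-integer constraints before submission surfaces configuration mistakes locally instead of at job run time.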

What's next

Connect outputTable to a training component to build a model using the selected features.
To understand how each upstream component calculates feature scores, see the Linear Model Feature Importance, GBDT Feature Importance, and Random Forest Feature Importance documentation.