After a Linear Model Feature Importance, GBDT Feature Importance, or Random Forest Feature Importance component produces a scored weight table, Feature Importance Filtering selects the top N features from that table and writes the filtered dataset to an output table. This lets you reduce input dimensionality before training without manually inspecting or ranking feature scores.
Prerequisites
Before you begin, ensure that you have:
A completed run of one of the upstream feature importance components: Linear Model Feature Importance, GBDT Feature Importance, or Random Forest Feature Importance
The output weight table from that component (used as weightTable)
An input data table whose features you want to filter (used as inputTable)
How it works
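Feature Importance Filtering reads the feature scores from weightTable and ranks the candidate features. The candidates are the columns in selectedCols, or all columns of inputTable by default. It then keeps the N highest-scoring features, where N is the value of topN. The filtered feature set is written to outputTable, and a model file capturing the filter configuration is saved to modelTable.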
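The ranking step can be pictured with the following Python sketch. It is not the component's implementation. It assumes weightTable has already been read into a dict that maps each feature name to a single numeric importance score; the actual weight table schema depends on the upstream component. The helper names select_top_n and filter_rows are hypothetical. The sketch also assumes that only the kept feature columns reach the output. Whether the real component carries over other columns, such as a label, is not covered here.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical sketch: weightTable is ASSUMED to be loaded as
# {feature_name: importance_score}, with one numeric score per feature.


def select_top_n(
    weights: Dict[str, float],
    top_n: int = 10,
    selected_cols: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return the names of the top_n highest-scoring candidate features."""
    if top_n <= 0:
        raise ValueError("top_n must be a positive integer")
    # Candidates: the selected_cols that have a score, else every scored feature.
    if selected_cols is None:
        candidates = dict(weights)
    else:
        candidates = {c: weights[c] for c in selected_cols if c in weights}
    # Highest score first; ties broken by name (an assumption) for determinism.
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


def filter_rows(
    rows: List[Dict[str, object]], keep: Sequence[str]
) -> List[Dict[str, object]]:
    """Project each input row onto the kept feature columns only."""
    return [{col: row[col] for col in keep if col in row} for row in rows]
```

For example, suppose the scores are made up as {"age": 0.12, "campaign": 0.05, "pdays": 0.30, "euribor3m": 0.25, "nr_employed": 0.18}. These are not output from any component. With those scores, select_top_n(weights, top_n=3) returns ["pdays", "euribor3m", "nr_employed"].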
Configure the component
```
PAI -name fe_filter_runner -project algo_public
-DselectedCols=pdays,previous,emp_var_rate,cons_price_idx,cons_conf_idx,euribor3m,nr_employed,age,campaign,poutcome
-DinputTable=pai_dense_10_10
-DweightTable=pai_temp_2252_20319_1
-DtopN=5
-DmodelTable=pai_temp_2252_20320_2
-DoutputTable=pai_temp_2252_20320_1;
```

This example ranks the 10 columns listed in selectedCols using pai_temp_2252_20319_1, the weight table from an upstream feature importance component. It keeps the top 5 features of pai_dense_10_10 and writes the filtered data to pai_temp_2252_20320_1. The model is saved to pai_temp_2252_20320_2.
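If you generate this command from a script, each parameter becomes a -D<name>=<value> flag. The Python sketch below shows that mapping. build_fe_filter_command is a hypothetical helper, not part of any PAI SDK. It only formats the command text and checks that the parameters marked as required in the Parameters table are present. It does not submit anything.

```python
from typing import Dict, Sequence, Union

# Parameters that the Parameters table marks as required.
REQUIRED = ("inputTable", "weightTable", "outputTable", "modelTable")

ParamValue = Union[str, int, Sequence[str]]


def build_fe_filter_command(params: Dict[str, ParamValue]) -> str:
    """Format a fe_filter_runner PAI command from parameter name/value pairs.

    Hypothetical helper: it only builds the command text and does not run it.
    """
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    lines = ["PAI -name fe_filter_runner -project algo_public"]
    for name, value in params.items():
        # List values such as selectedCols are joined with commas.
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        lines.append(f"-D{name}={value}")
    # The statement ends with a semicolon, as in the example command.
    return "\n".join(lines) + ";"
```

Passing the parameter values from the example, in the same order, reproduces that command's text line for line.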
Parameters
| Parameter | Description | Required | Default |
|---|---|---|---|
| inputTable | Name of the input table. | Yes | N/A |
| inputTablePartitions | Partitions to read from the input table. Specify a single partition as partition_name=value, multiple partitions as name1=value1,name2=value2 (comma-separated), or multi-level partitions as name1=value1/name2=value2. | No | All partitions |
| weightTable | Name of the feature importance weight table. Must be an output table of the Linear Model Feature Importance, GBDT Feature Importance, or Random Forest Feature Importance component. | Yes | N/A |
| outputTable | Name of the output table that stores the data after the top N features are filtered. | Yes | N/A |
| modelTable | Name of the table that stores the model file generated by feature filtering. | Yes | N/A |
| selectedCols | Columns from inputTable to consider as candidates for filtering. | No | All columns |
| topN | Number of top-ranked features to keep. Must be a positive integer. | No | 10 |
| lifecycle | Retention period of the output table, in days. Must be a positive integer. | No | 7 |
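A small parser can check the inputTablePartitions syntax before a run. The following is a hypothetical Python helper, not something the component provides. It assumes one reading of the documented syntax: commas separate distinct partitions, and slashes separate the levels of a single multi-level partition. The partition names and values used below (ds, region) are made up.

```python
from typing import Dict, List


def parse_partitions(spec: str) -> List[Dict[str, str]]:
    """Parse an inputTablePartitions string into one dict per partition.

    ASSUMED syntax: commas separate partitions; slashes separate the levels
    of a multi-level partition; every level is written as name=value.
    """
    partitions: List[Dict[str, str]] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing comma
        levels: Dict[str, str] = {}
        for level in part.split("/"):
            name, sep, value = level.partition("=")
            if not sep or not name.strip() or not value.strip():
                raise ValueError(f"expected name=value, got {level!r}")
            levels[name.strip()] = value.strip()
        partitions.append(levels)
    return partitions
```

Under that reading, parse_partitions("ds=20240101/region=cn,ds=20240102/region=cn") returns two partitions: [{"ds": "20240101", "region": "cn"}, {"ds": "20240102", "region": "cn"}].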
What's next
Connect outputTable to a training component to build a model using the selected features.
To understand how each upstream component calculates feature scores, see the Linear Model Feature Importance, GBDT Feature Importance, and Random Forest Feature Importance documentation.