Linear model feature importance

The Linear Model Feature Importance component calculates the feature importance for linear models, including linear regression and logistic regression for binary classification. It supports both sparse data and dense data formats. This topic describes how to configure the component.

Limitations

The supported computing engine is MaxCompute.

Component configuration

You can configure the parameters for the Linear Model Feature Importance component in one of the following ways.

Method 1: Visual interface

Configure the component parameters on the pipeline page in Machine Learning Designer.

| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Feature Columns | The feature columns from the input table to use for training. This parameter is optional. By default, all columns are used except for the label column. |
| Fields Setting | Target Column | Required. Click Select Fields to select the label column. |
| Fields Setting | Input table data is in sparse format | Optional. |
| Tuning | Number of Cores | The number of cores for computation. Optional. |
| Tuning | Memory per Core (MB) | The amount of memory per core, in MB. Optional. |

Method 2: PAI command

You can configure the component parameters by running a PAI command in the SQL Script component. For more information, see SQL Script.

PAI -name regression_feature_importance -project algo_public
    -DmodelName=xlab_m_logisticregressi_20317_v0
    -DoutputTableName=pai_temp_2252_20321_1
    -DlabelColName=y
    -DfeatureColNames=pdays,previous,emp_var_rate,cons_price_idx,cons_conf_idx,euribor3m,nr_employed,age,campaign
    -DenableSparse=false -DinputTableName=pai_dense_10_9;

| Parameter | Required | Description | Default |
| --- | --- | --- | --- |
| inputTableName | Yes | The name of the input table. | None |
| outputTableName | Yes | The name of the output table. | None |
| labelColName | Yes | The name of the label column in the input table. | None |
| modelName | Yes | The name of the input model. | None |
| featureColNames | No | The feature columns to select from the input table. | All columns except for the label column. |
| inputTablePartitions | No | The partitions to use from the input table. | The entire table. |
| enableSparse | No | Specifies whether the input data is in a sparse format. | false |
| itemDelimiter | No | The delimiter between key-value pairs for input data in a sparse format. | Space |
| kvDelimiter | No | The delimiter between a key and its value for input data in a sparse format. | Colon (:) |
| lifecycle | No | The lifecycle of the output table, in days. | Not specified |
| coreNum | No | The number of cores. | Auto |
| memSizePerCore | No | The amount of memory per core, in MB. | Auto |
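
The itemDelimiter and kvDelimiter parameters describe how a sparse row is split into key-value pairs. The following Python sketch illustrates that layout with the default delimiters (a space between pairs, a colon between a key and its value). It is not the component's own parser: the function name `parse_sparse_row` and the sample strings are hypothetical, and keys are kept as strings because the source does not state whether they are indexes or column names.

```python
def parse_sparse_row(text, item_delimiter=" ", kv_delimiter=":"):
    """Split one sparse-format row into a {key: value} dictionary.

    item_delimiter separates key-value pairs (default: space).
    kv_delimiter separates a key from its value (default: colon).
    """
    features = {}
    for item in text.split(item_delimiter):
        if not item:
            # Skip empty pieces produced by repeated delimiters.
            continue
        key, value = item.split(kv_delimiter, 1)
        features[key] = float(value)
    return features


# Illustrative rows only; they are not taken from a real table.
default_row = parse_sparse_row("1:0.5 3:2 7:-1.25")
custom_row = parse_sparse_row("a=1;b=2", item_delimiter=";", kv_delimiter="=")
```

Any key that is absent from a row is simply missing from the resulting dictionary, which is what distinguishes the sparse layout from the dense one.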

Example

  1. Create a table named bank_data and import data into it. For more information, see Create a table and Import data.

  2. Run the following SQL statement to generate training data.

    create table if not exists pai_dense_10_9 as
    select
        age, campaign, pdays, previous, emp_var_rate, cons_price_idx,
        cons_conf_idx, euribor3m, nr_employed, fixed_deposit
    from bank_data
    limit 10;
  3. Build and run a pipeline. For more information, see Algorithm modeling. To connect the components, connect the output of Read Table-1 to the inputs of both Logistic Regression for Multiclass Classification and Linear Model Feature Importance. Then, connect the output of Logistic Regression for Multiclass Classification to the input of Linear Model Feature Importance.

    1. Drag the Read Table, Logistic Regression for Multiclass Classification, and Linear Model Feature Importance components from the component list in Machine Learning Designer to the canvas.

    2. Connect the components to build the pipeline.

    3. Configure the parameters for each component.

      • On the canvas, click the Read Table-1 component. In the right-side pane, on the Select Table tab, set Table Name to bank_data.

      • On the canvas, click the Logistic Regression for Multiclass Classification-1 component. In the right-side pane, on the Fields Setting tab, set Feature Columns to age, campaign, pdays, previous, emp_var_rate, cons_price_idx, cons_conf_idx, euribor3m, and nr_employed. Set Target Column to fixed_deposit. Use the default values for the other parameters.

      • On the canvas, click the Linear Model Feature Importance-1 component. In the right-side pane, on the Fields Setting tab, set Target Column to fixed_deposit. Use the default values for the other parameters.

    4. After you configure the parameters, click the Run button.

  4. After the pipeline runs successfully, right-click the Linear Model Feature Importance-1 component and choose View Data > Model Importance Table. The Linear Model Feature Importance component outputs a table that shows the colname, weight, and importance for each feature. Values are presented in scientific notation. For example, the weight of the age feature is 9.61816270808075E-5.

    The metric formulas are as follows.

    | Column name | Formula |
    | --- | --- |
    | weight | abs(w_j) |
    | importance | abs(w_j) * STD(f_j) |

    Note

    abs(w_j) is the absolute value of the coefficient of feature j, and STD(f_j) is the standard deviation of feature j in the training data.

  5. Right-click the Linear Model Feature Importance-1 component and select View Analytics Report.
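
The weight and importance formulas from step 4 can be sketched in Python as follows. This is a minimal illustration, not the component's implementation: the function name `feature_importance`, the coefficients, and the sample values are hypothetical, and the use of the population standard deviation (dividing by n) is an assumption, because the source does not say which form the component uses.

```python
import math


def feature_importance(weights, columns):
    """Compute weight = abs(w_j) and importance = abs(w_j) * STD(f_j).

    weights: dict mapping a feature name to its model coefficient.
    columns: dict mapping a feature name to its training values.
    """
    rows = []
    for name, coefficient in weights.items():
        values = columns[name]
        mean = sum(values) / len(values)
        # Assumption: population standard deviation (divide by n).
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        rows.append({
            "colname": name,
            "weight": abs(coefficient),
            "importance": abs(coefficient) * std,
        })
    return rows


# Illustrative inputs only; they are not values from bank_data.
example_rows = feature_importance(
    {"age": -0.5, "campaign": 2.0},
    {"age": [2, 4, 4, 4, 5, 5, 7, 9], "campaign": [1, 1, 1]},
)
```

Scaling the absolute coefficient by the feature's standard deviation makes features measured on different scales comparable. A feature that never varies, such as campaign in this sample, gets an importance of zero regardless of its coefficient.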

Related documents