Logistic regression for multiclass classification

Standard logistic regression handles binary classification. The Logistic Regression for Multiclass Classification component in Platform for AI (PAI) extends it to multiclass classification and trains the model with the L-BFGS optimization algorithm. The component accepts input data in both sparse and dense formats.

Configure the component

Two configuration methods are available. Use Machine Learning Designer for visual, no-code setup. Use PAI commands for scripted or pipeline-automated workflows.

Method 1: Configure in Machine Learning Designer

Open the Logistic Regression for Multiclass Classification component in Machine Learning Designer (formerly Machine Learning Studio) and set the following parameters.

Fields Setting tab

Parameter	Description
Training feature columns	Feature columns selected from the input table for training. Supports the DOUBLE and BIGINT data types. A maximum of 20 million features are supported.
Target columns	Label columns in the input table.
Sparse format	Whether the input data is in sparse format.

Parameters Setting tab

Parameter	Description
Regularization type	The penalty applied to the model during training. Valid values: L1, L2, and None.
Maximum number of iterations	The maximum number of L-BFGS iterations. Default: `100`.
Regularization coefficient	The strength of the regularization penalty. Not applicable when Regularization type is set to None.
Minimum convergence deviance	The convergence threshold for the L-BFGS algorithm. Training stops when the difference in log-likelihood between consecutive iterations falls below this value. Default: `0.000001`.
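
The following Python sketch illustrates how the four settings above relate to one another. It is not PAI's implementation: the function names `penalty` and `should_stop` are hypothetical, and the exact scaling of the penalty terms (for example, whether the L2 term is halved) is an assumption that the component may handle differently.

```python
def penalty(weights, regularization_type="l1", coefficient=1.0):
    """Return an illustrative regularization penalty for a flat list of weights."""
    kind = regularization_type.lower()
    if kind == "none":
        # The coefficient is not applicable when no regularization is used.
        return 0.0
    if kind == "l1":
        return coefficient * sum(abs(w) for w in weights)
    if kind == "l2":
        # Assumption: plain sum of squares; some implementations scale by 1/2.
        return coefficient * sum(w * w for w in weights)
    raise ValueError(f"unknown regularization type: {regularization_type}")


def should_stop(previous_ll, current_ll, iterations_done,
                max_iterations=100, min_deviance=1e-6):
    """Stop when the log-likelihood change is below the threshold
    or the iteration budget is used up."""
    converged = abs(current_ll - previous_ll) < min_deviance
    return converged or iterations_done >= max_iterations


print(penalty([0.5, -2.0], "L1", 1.0))    # 2.5
print(penalty([0.5, -2.0], "L2", 1.0))    # 4.25
print(should_stop(-10.0, -9.9999995, 5))  # True: change is below 0.000001
print(should_stop(-10.0, -9.0, 100))      # True: iteration limit reached
```

A larger coefficient increases the penalty on large weights, while a smaller convergence threshold or a higher iteration limit lets training run longer.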

Method 2: Use PAI commands

Pass parameters directly to the logisticregression_multi algorithm using PAI commands. Run PAI commands through the SQL Script component. For more information, see SQL Script.

The following example shows the command syntax:

PAI -name logisticregression_multi
    -project algo_public
    -DmodelName="xlab_m_logistic_regression_6096"
    -DregularizedLevel="1"
    -DmaxIter="100"
    -DregularizedType="l1"
    -Depsilon="0.000001"
    -DlabelColName="y"
    -DfeatureColNames="pdays,emp_var_rate"
    -DgoodValue="1"
    -DinputTableName="bank_data"
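
For pipeline-automated workflows, the command text can be generated from a parameter dictionary. The sketch below is one hedged way to do that in Python. The helper `build_pai_command` is hypothetical and only produces the command string in the layout shown above; the string still has to be submitted through the SQL Script component or another client that accepts PAI commands.

```python
def build_pai_command(algorithm, params, project="algo_public"):
    """Render a PAI command as text, with one -D flag per parameter."""
    lines = [f"PAI -name {algorithm}", f"    -project {project}"]
    for key, value in params.items():
        lines.append(f'    -D{key}="{value}"')
    return "\n".join(lines) + ";"


command = build_pai_command(
    "logisticregression_multi",
    {
        "modelName": "xlab_m_logistic_regression_6096",
        "regularizedType": "l1",
        "regularizedLevel": "1",
        "maxIter": "100",
        "epsilon": "0.000001",
        "labelColName": "y",
        "featureColNames": "pdays,emp_var_rate",
        "inputTableName": "bank_data",
    },
)
print(command)
```

Keeping the parameters in a dictionary makes it straightforward to vary a single value, such as the regularization coefficient, across several training runs.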

Parameters

Parameter	Required	Default	Description
`inputTableName`	Yes	—	Name of the input table.
`featureColNames`	No	All numeric columns	Feature columns selected from the input table for training. A maximum of 20 million features are supported.
`labelColName`	Yes	—	Name of the label column.
`inputTablePartitions`	No	Full table	Partitions selected from the input table. Use `partition_name=value` for single partitions and `name1=value1/name2=value2` for multi-level partitions. Separate multiple partitions with commas (`,`).
`modelName`	Yes	—	Name of the output model.
`regularizedType`	No	`l1`	Regularization type. Valid values: `l1`, `l2`, and `None`.
`regularizedLevel`	No	`1.0`	Regularization coefficient. Not applicable when `regularizedType` is `None`.
`maxIter`	No	`100`	Maximum number of L-BFGS iterations.
`epsilon`	No	`1.0e-06`	Convergence threshold for the L-BFGS algorithm. Training stops when the difference in log-likelihood between consecutive iterations is less than this value.
`enableSparse`	No	`false`	Whether the input data is in sparse format. Valid values: `true` and `false`.
`itemDelimiter`	No	`,`	Delimiter between key-value pairs in sparse-format input.
`kvDelimiter`	No	`:`	Delimiter between keys and values in sparse-format input.
`coreNum`	No	System default	Number of cores.
`memSizePerCore`	No	System default	Memory allocated per core, in MB.
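
The string formats used by `itemDelimiter`, `kvDelimiter`, and `inputTablePartitions` can be illustrated with a short Python sketch. Both helpers are hypothetical, the partition column names `dt` and `hh` are invented for the example, and the sketch assumes that sparse keys are integer feature indexes with numeric values, which the table above does not state.

```python
def parse_sparse_row(text, item_delimiter=",", kv_delimiter=":"):
    """Split one sparse-format string into a {key: value} dictionary."""
    features = {}
    for item in text.split(item_delimiter):
        item = item.strip()
        if not item:
            continue
        key, value = item.split(kv_delimiter, 1)
        # Assumption: keys are integer feature indexes, values are numeric.
        features[int(key)] = float(value)
    return features


def format_partitions(partitions):
    """Render partition dictionaries as an inputTablePartitions value.

    Levels of one partition are joined with "/", partitions with ",".
    """
    return ",".join(
        "/".join(f"{name}={value}" for name, value in partition.items())
        for partition in partitions
    )


print(parse_sparse_row("1:0.5,3:1.2"))
# {1: 0.5, 3: 1.2}
print(format_partitions([{"dt": "20240101"}, {"dt": "20240102", "hh": "08"}]))
# dt=20240101,dt=20240102/hh=08
```

If the input table uses other separators, pass the same characters to `itemDelimiter` and `kvDelimiter` that were used when the sparse column was written.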

Example

This example trains a multiclass logistic regression model on a four-feature dataset and runs predictions. All commands are run through the SQL Script component.

Step 1: Create training data

Run the following SQL statements to create the multi_lr_test_input table:

drop table if exists multi_lr_test_input;
create table multi_lr_test_input
as
select
    *
from
(
    select
        cast(1 as double) as f0,
        cast(0 as double) as f1,
        cast(0 as double) as f2,
        cast(0 as double) as f3,
        cast(0 as bigint) as label
    union all
        select
            cast(0 as double) as f0,
            cast(1 as double) as f1,
            cast(0 as double) as f2,
            cast(0 as double) as f3,
            cast(0 as bigint) as label
    union all
        select
            cast(0 as double) as f0,
            cast(0 as double) as f1,
            cast(1 as double) as f2,
            cast(0 as double) as f3,
            cast(2 as bigint) as label
    union all
        select
            cast(0 as double) as f0,
            cast(0 as double) as f1,
            cast(0 as double) as f2,
            cast(1 as double) as f3,
            cast(1 as bigint) as label
) a;

The table contains four DOUBLE feature columns (f0–f3) and one BIGINT label column:

f0	f1	f2	f3	label
1.0	0.0	0.0	0.0	0
0.0	0.0	1.0	0.0	2
0.0	0.0	0.0	1.0	1
0.0	1.0	0.0	0.0	0

Step 2: Train the model

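Run the following statements to drop any existing model with the same name, train the model, and save it as multi_lr_test_model. Because regularizedType is set to None, the regularizedLevel value has no effect in this run: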

drop offlinemodel if exists multi_lr_test_model;
PAI -name logisticregression_multi
    -project algo_public
    -DmodelName="multi_lr_test_model"
    -DitemDelimiter=","
    -DregularizedLevel="1"
    -DmaxIter="100"
    -DregularizedType="None"
    -Depsilon="0.000001"
    -DkvDelimiter=":"
    -DlabelColName="label"
    -DfeatureColNames="f0,f1,f2,f3"
    -DenableSparse="false"
    -DinputTableName="multi_lr_test_input";

Step 3: Run predictions

Run the following statements to drop any existing result table, generate predictions with the trained model, and write the results to multi_lr_test_prediction_result:

drop table if exists multi_lr_test_prediction_result;
PAI -name prediction
    -project algo_public
    -DdetailColName="prediction_detail"
    -DmodelName="multi_lr_test_model"
    -DitemDelimiter=","
    -DresultColName="prediction_result"
    -Dlifecycle="28"
    -DoutputTableName="multi_lr_test_prediction_result"
    -DscoreColName="prediction_score"
    -DkvDelimiter=":"
    -DinputTableName="multi_lr_test_input"
    -DenableSparse="false"
    -DappendColNames="label";

Step 4: View results

Query the multi_lr_test_prediction_result table to review the prediction output:

label	prediction_result	prediction_score	prediction_detail
0	0	0.9999997274902165	{"0": 0.9999997274902165, "1": 2.324679066261573e-07, "2": 2.324679066261569e-07}
0	0	0.9999997274902165	{"0": 0.9999997274902165, "1": 2.324679066261573e-07, "2": 2.324679066261569e-07}
2	2	0.9999999155958832	{"0": 2.018833979850994e-07, "1": 2.324679066261573e-07, "2": 0.9999999155958832}
1	1	0.9999999155958832	{"0": 2.018833979850994e-07, "1": 0.9999999155958832, "2": 2.324679066261569e-07}

The output columns contain the following information:

label: The original label from the input table, carried over by the appendColNames parameter.
prediction_result: The predicted class label.
prediction_score: The probability assigned to the predicted class.
prediction_detail: A JSON object that maps each class label to its predicted probability. For example, {"0": 0.999..., "1": 2.32e-07, "2": 2.32e-07} indicates that the model assigns a probability close to 1 to class 0 and a probability close to 0 to classes 1 and 2.
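
The relationship between the three prediction columns can be checked with a short Python sketch. It assumes the rows have already been exported from the result table into Python tuples, and the helper name `top_class` is hypothetical. The values are copied from the output shown above.

```python
import json

# (label, prediction_result, prediction_score, prediction_detail)
rows = [
    (0, 0, 0.9999997274902165,
     '{"0": 0.9999997274902165, "1": 2.324679066261573e-07, "2": 2.324679066261569e-07}'),
    (2, 2, 0.9999999155958832,
     '{"0": 2.018833979850994e-07, "1": 2.324679066261573e-07, "2": 0.9999999155958832}'),
    (1, 1, 0.9999999155958832,
     '{"0": 2.018833979850994e-07, "1": 0.9999999155958832, "2": 2.324679066261569e-07}'),
]


def top_class(detail_json):
    """Return (class label, probability) for the most probable class."""
    detail = json.loads(detail_json)
    best = max(detail, key=detail.get)
    return int(best), detail[best]


for label, predicted, score, detail in rows:
    best_label, best_score = top_class(detail)
    # prediction_result and prediction_score should match the top entry.
    print(label, predicted, best_label == predicted, best_score == score)
```

For each of these rows, the class with the highest value in prediction_detail equals prediction_result, and that value equals prediction_score.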