Standard logistic regression handles binary classification. The Logistic Regression for Multiclass Classification component in Platform for AI (PAI) extends it to support multiclass classification using the L-BFGS optimization algorithm. The component accepts both sparse and dense input data formats.
Configure the component
Two configuration methods are available. Use Machine Learning Designer for visual, no-code setup. Use PAI commands for scripted or pipeline-automated workflows.
Method 1: Configure in Machine Learning Designer
Open the Logistic Regression for Multiclass Classification component in Machine Learning Designer (formerly Machine Learning Studio) and set the following parameters.
Fields Setting tab
| Parameter | Description |
|---|---|
| Training feature columns | Feature columns selected from the input table for training. Supports the DOUBLE and BIGINT data types. A maximum of 20 million features are supported. |
| Target columns | Label columns in the input table. |
| Sparse format | Whether the input data is in sparse format. |
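To make the sparse format concrete, here is a minimal Python sketch of how one sparse cell could be read as key-value pairs. It assumes the default delimiters listed for the PAI command (`,` between pairs, `:` between key and value). The helper name `parse_sparse_row` and the sample string are hypothetical and are not part of PAI.

```python
def parse_sparse_row(text, item_delimiter=",", kv_delimiter=":"):
    """Parse one sparse-format cell into {feature_key: value}.

    Hypothetical helper for illustration; not a PAI API.
    """
    features = {}
    for item in text.split(item_delimiter):
        item = item.strip()
        if not item:
            continue  # tolerate empty cells and trailing delimiters
        # Split on the first key-value delimiter only.
        key, value = item.split(kv_delimiter, 1)
        features[key.strip()] = float(value)
    return features


# Illustrative input: features 1, 3, and 7 are present; all others are absent.
row = parse_sparse_row("1:0.5,3:2,7:1.25")
print(row)
```

Features that do not appear in the cell are simply missing from the resulting dictionary, which is what makes the format compact for wide, mostly empty feature spaces.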
Parameters Setting tab
| Parameter | Description |
|---|---|
| Regularization type | The penalty applied to the model during training. Valid values: L1, L2, and None. |
| Maximum number of iterations | The maximum number of L-BFGS iterations. Default: 100. |
| Regularization coefficient | The strength of the regularization penalty. Not applicable when Regularization type is set to None. |
| Minimum convergence deviance | The convergence threshold for the L-BFGS algorithm. Training stops when the difference in log-likelihood between consecutive iterations falls below this value. Default: 0.000001. |
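The way the iteration limit and the convergence threshold interact can be sketched as a simple stopping check. The Python snippet below is an illustration under stated assumptions, not the component's implementation: the function name `should_stop` is hypothetical, and comparing the absolute difference in log-likelihood is an assumed reading of the description above.

```python
def should_stop(prev_loglik, curr_loglik, iteration, max_iter=100, epsilon=1e-6):
    """Return True when training should halt (hypothetical helper).

    max_iter and epsilon default to the documented values.
    """
    # Stop once the iteration budget is used up.
    if iteration >= max_iter:
        return True
    # Stop once the log-likelihood changes by less than the threshold.
    return abs(curr_loglik - prev_loglik) < epsilon


print(should_stop(-0.7000000, -0.6999995, iteration=12))  # tiny change
print(should_stop(-0.70, -0.65, iteration=12))            # still improving
print(should_stop(-0.70, -0.65, iteration=100))           # budget exhausted
```

Under this reading, a smaller threshold makes training run longer before it is treated as converged, and the iteration limit acts as a hard upper bound either way.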
Method 2: Use PAI commands
Pass parameters directly to the logisticregression_multi algorithm using PAI commands. Run PAI commands through the SQL Script component. For more information, see SQL Script.
The following example shows the command syntax:
PAI -name logisticregression_multi
-project algo_public
-DmodelName="xlab_m_logistic_regression_6096"
-DregularizedLevel="1"
-DmaxIter="100"
-DregularizedType="l1"
-Depsilon="0.000001"
-DlabelColName="y"
-DfeatureColNames="pdays,emp_var_rate"
-DgoodValue="1"
-DinputTableName="bank_data"
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| inputTableName | Yes | — | Name of the input table. |
| featureColNames | No | All numeric columns | Feature columns selected from the input table for training. A maximum of 20 million features are supported. |
| labelColName | Yes | — | Name of the label column. |
| inputTablePartitions | No | Full table | Partitions selected from the input table. Use partition_name=value for single partitions and name1=value1/name2=value2 for multi-level partitions. Separate multiple partitions with commas (,). |
| modelName | Yes | — | Name of the output model. |
| regularizedType | No | l1 | Regularization type. Valid values: l1, l2, and None. |
| regularizedLevel | No | 1.0 | Regularization coefficient. Not applicable when regularizedType is None. |
| maxIter | No | 100 | Maximum number of L-BFGS iterations. |
| epsilon | No | 1.0e-06 | Convergence threshold for the L-BFGS algorithm. Training stops when the difference in log-likelihood between consecutive iterations is less than this value. |
| enableSparse | No | false | Whether the input data is in sparse format. Valid values: true and false. |
| itemDelimiter | No | , | Delimiter between key-value pairs in sparse-format input. |
| kvDelimiter | No | : | Delimiter between keys and values in sparse-format input. |
| coreNum | No | System default | Number of cores. |
| memSizePerCore | No | System default | Memory allocated per core, in MB. |
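For scripted workflows, the command text can be assembled from a dictionary of parameters. The Python sketch below is one possible approach, not an official client: `build_pai_command` and `REQUIRED_PARAMS` are hypothetical names, and the helper only produces the command string, which would still need to be run through the SQL Script component.

```python
# Parameters marked as required in the table above.
REQUIRED_PARAMS = ("inputTableName", "labelColName", "modelName")


def build_pai_command(algorithm, params, project="algo_public"):
    """Assemble a PAI command string (hypothetical helper, not a PAI API)."""
    missing = [name for name in REQUIRED_PARAMS if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    lines = [f"PAI -name {algorithm}", f"-project {project}"]
    # Each parameter becomes one -D<name>="<value>" line.
    for name, value in params.items():
        lines.append(f'-D{name}="{value}"')
    return "\n".join(lines) + ";"


command = build_pai_command(
    "logisticregression_multi",
    {
        "inputTableName": "multi_lr_test_input",
        "featureColNames": "f0,f1,f2,f3",
        "labelColName": "label",
        "modelName": "multi_lr_test_model",
        "regularizedType": "None",
        "maxIter": 100,
        "epsilon": "0.000001",
    },
)
print(command)
```

Optional parameters that are left out of the dictionary are not emitted, so the component falls back to the defaults listed in the table.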
Example
This example trains a multiclass logistic regression model on a four-feature dataset and runs predictions. All commands are run through the SQL Script component.
Step 1: Create training data
Run the following SQL statements to create the multi_lr_test_input table:
drop table if exists multi_lr_test_input;
create table multi_lr_test_input
as
select
*
from
(
select
cast(1 as double) as f0,
cast(0 as double) as f1,
cast(0 as double) as f2,
cast(0 as double) as f3,
cast(0 as bigint) as label
union all
select
cast(0 as double) as f0,
cast(1 as double) as f1,
cast(0 as double) as f2,
cast(0 as double) as f3,
cast(0 as bigint) as label
union all
select
cast(0 as double) as f0,
cast(0 as double) as f1,
cast(1 as double) as f2,
cast(0 as double) as f3,
cast(2 as bigint) as label
union all
select
cast(0 as double) as f0,
cast(0 as double) as f1,
cast(0 as double) as f2,
cast(1 as double) as f3,
cast(1 as bigint) as label
) a;
The table contains four DOUBLE feature columns (f0–f3) and one BIGINT label column:
| f0 | f1 | f2 | f3 | label |
|---|---|---|---|---|
| 1.0 | 0.0 | 0.0 | 0.0 | 0 |
| 0.0 | 0.0 | 1.0 | 0.0 | 2 |
| 0.0 | 0.0 | 0.0 | 1.0 | 1 |
| 0.0 | 1.0 | 0.0 | 0.0 | 0 |
Step 2: Train the model
Run the following PAI command to train the model and save it as multi_lr_test_model:
drop offlinemodel if exists multi_lr_test_model;
PAI -name logisticregression_multi
-project algo_public
-DmodelName="multi_lr_test_model"
-DitemDelimiter=","
-DregularizedLevel="1"
-DmaxIter="100"
-DregularizedType="None"
-Depsilon="0.000001"
-DkvDelimiter=":"
-DlabelColName="label"
-DfeatureColNames="f0,f1,f2,f3"
-DenableSparse="false"
-DinputTableName="multi_lr_test_input";
Step 3: Run predictions
Run the following PAI command to generate predictions and write results to multi_lr_test_prediction_result:
drop table if exists multi_lr_test_prediction_result;
PAI -name prediction
-project algo_public
-DdetailColName="prediction_detail"
-DmodelName="multi_lr_test_model"
-DitemDelimiter=","
-DresultColName="prediction_result"
-Dlifecycle="28"
-DoutputTableName="multi_lr_test_prediction_result"
-DscoreColName="prediction_score"
-DkvDelimiter=":"
-DinputTableName="multi_lr_test_input"
-DenableSparse="false"
-DappendColNames="label";
Step 4: View results
Query the multi_lr_test_prediction_result table to review the prediction output:
| label | prediction_result | prediction_score | prediction_detail |
|---|---|---|---|
| 0 | 0 | 0.9999997274902165 | {"0": 0.9999997274902165, "1": 2.324679066261573e-07, "2": 2.324679066261569e-07} |
| 0 | 0 | 0.9999997274902165 | {"0": 0.9999997274902165, "1": 2.324679066261573e-07, "2": 2.324679066261569e-07} |
| 2 | 2 | 0.9999999155958832 | {"0": 2.018833979850994e-07, "1": 2.324679066261573e-07, "2": 0.9999999155958832} |
| 1 | 1 | 0.9999999155958832 | {"0": 2.018833979850994e-07, "1": 0.9999999155958832, "2": 2.324679066261569e-07} |
The output columns contain the following information:
prediction_result: The predicted class label.
prediction_score: The probability assigned to the predicted class.
prediction_detail: A JSON object mapping each class label to its predicted probability. Each key is a class label and each value is the model's confidence for that class. For example, {"0": 0.999..., "1": 2.32e-07, "2": 2.32e-07} indicates that the model assigns near-certainty to class 0 and near-zero probability to classes 1 and 2.
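The relationship between the three output columns can be checked with a short script. The following Python sketch parses a prediction_detail value copied from the first result row and picks the class with the highest probability. The function name `summarize_detail` is hypothetical, and the sketch assumes the detail column is exported as a JSON string.

```python
import json


def summarize_detail(detail_json):
    """Return (predicted_label, score) from a prediction_detail JSON string.

    Hypothetical helper for illustration; not part of PAI.
    """
    probs = json.loads(detail_json)
    # The predicted class is the key with the largest probability.
    label = max(probs, key=probs.get)
    return label, probs[label]


# prediction_detail from the first row of the result table.
detail = (
    '{"0": 0.9999997274902165, '
    '"1": 2.324679066261573e-07, '
    '"2": 2.324679066261569e-07}'
)
label, score = summarize_detail(detail)
print(label, score)
```

For this row, the selected label and its probability correspond to the prediction_result and prediction_score columns. Note that JSON keys are strings, so the label comes back as "0" and may need a cast before it is compared with the BIGINT label column.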