Confusion Matrix

A confusion matrix is used to evaluate models in supervised learning; its counterpart in unsupervised learning is the matching matrix. It compares the classification results with the actual values and shows, in matrix form, how many samples of each class were classified correctly or incorrectly. This topic describes how to configure the Confusion Matrix component.

Limitations

The only supported computing engine is MaxCompute.

Component configuration

You can configure the Confusion Matrix component using one of the following methods.

Method 1: Use the UI

Configure the component parameters on the pipeline page in Machine Learning Designer.

| Parameter | Description |
| --- | --- |
| Original Label Column | Supports numeric data types. |
| Prediction Result Label Column | This parameter is required if Threshold is not specified. |
| Threshold | A sample is considered positive if its value is greater than this threshold. |
| Prediction Result Detail Column | This parameter cannot be used with the Prediction Result Label Column parameter. This parameter is required if Threshold is specified. |
| Positive Sample Label | This parameter is required if Threshold is specified. |
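
The threshold rule described for the Threshold parameter can be sketched as follows. This is a minimal illustration, not the component's implementation: it assumes that each value in the prediction result detail column is a JSON string that maps labels to scores, and the helper name `is_positive`, the labels `N` and `R`, and the sample rows are hypothetical.

```python
import json


def is_positive(prediction_detail: str, good_value: str, threshold: float = 0.5) -> bool:
    """Return True when the score of the positive label is greater than the threshold."""
    # Assumption: the detail value is JSON such as {"N": 0.91, "R": 0.09}.
    scores = json.loads(prediction_detail)
    # A missing positive label is treated as score 0.0 (also an assumption).
    return float(scores.get(good_value, 0.0)) > threshold


# Hypothetical rows from a prediction result detail column.
rows = [
    '{"N": 0.91, "R": 0.09}',
    '{"N": 0.80, "R": 0.20}',
    '{"N": 0.35, "R": 0.65}',
]

flags = [is_positive(row, good_value="N", threshold=0.8) for row in rows]
print(flags)  # [True, False, False]
```

The second row is not positive because the comparison is strictly "greater than", so a score equal to the threshold does not qualify.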

Method 2: Use a PAI command

You can use a PAI command to configure the component parameters. You can run PAI commands by using the SQL Script component. For more information, see SQL script.

  • Threshold not specified

    pai -name confusionmatrix -project algo_public
        -DinputTableName=wpbc_pred
        -DoutputTableName=wpbc_confu
        -DlabelColName=label
        -DpredictionColName=prediction_result;

  • Threshold specified

    pai -name confusionmatrix -project algo_public
        -DinputTableName=wpbc_pred
        -DoutputTableName=wpbc_confu
        -DlabelColName=label
        -DpredictionDetailColName=prediction_detail
        -Dthreshold=0.8
        -DgoodValue=N;

| Parameter | Required | Description | Default |
| --- | --- | --- | --- |
| inputTableName | Yes | The name of the input table, which is the output table from a prediction component. | N/A |
| inputTablePartition | No | The partition of the input table. | The entire table |
| outputTableName | Yes | The name of the output table to store the confusion matrix. | N/A |
| labelColName | Yes | The name of the original label column. | N/A |
| predictionColName | No | The name of the prediction result column. This parameter is required if threshold is not specified. | N/A |
| predictionDetailColName | No | The name of the prediction result detail column. This parameter is required if threshold is specified. | N/A |
| threshold | No | The threshold for classifying positive samples. | 0.5 |
| goodValue | No | The label value that corresponds to a positive outcome in binary classification. This parameter is required if threshold is specified. | N/A |
| coreNum | No | The number of cores for computing. | Automatically allocated |
| memSizePerCore | No | The amount of memory for each core, in MB. | Automatically allocated |
| lifecycle | No | The lifecycle of the output table. | N/A |
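
The dependencies between these parameters can be checked before a command is submitted. The following Python sketch assembles the command text and rejects invalid combinations. The function name `build_confusion_matrix_command` is hypothetical, and the rule that predictionColName and predictionDetailColName cannot be combined is assumed to carry over from the UI parameters. The sketch only produces a string; it does not run anything on MaxCompute.

```python
def build_confusion_matrix_command(input_table, output_table, label_col,
                                   prediction_col=None, prediction_detail_col=None,
                                   threshold=None, good_value=None):
    """Build a confusionmatrix PAI command after checking parameter combinations."""
    # Assumption: the UI rule (detail column excludes label column) also applies here.
    if prediction_col is not None and prediction_detail_col is not None:
        raise ValueError("predictionColName cannot be combined with predictionDetailColName")

    params = {
        "inputTableName": input_table,
        "outputTableName": output_table,
        "labelColName": label_col,
    }
    if threshold is None:
        # Without a threshold, the predicted label column is required.
        if prediction_col is None:
            raise ValueError("predictionColName is required if threshold is not specified")
        params["predictionColName"] = prediction_col
    else:
        # With a threshold, the detail column and the positive label are required.
        if prediction_detail_col is None or good_value is None:
            raise ValueError("predictionDetailColName and goodValue are required if threshold is specified")
        params["predictionDetailColName"] = prediction_detail_col
        params["threshold"] = threshold
        params["goodValue"] = good_value

    lines = ["pai -name confusionmatrix -project algo_public"]
    lines += [f"    -D{name}={value}" for name, value in params.items()]
    return "\n".join(lines) + ";"


command = build_confusion_matrix_command(
    "wpbc_pred", "wpbc_confu", "label",
    prediction_detail_col="prediction_detail", threshold=0.8, good_value="N",
)
print(command)
```

The printed text can be pasted into the SQL Script component. Optional parameters such as coreNum or lifecycle are left out to keep the sketch short.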

Example

  1. Use a MaxCompute client to create a table named test_data with the following columns: id bigint, label string, prediction_result string. To learn how to install and configure a MaxCompute client, see Connect by using a local client (odpscmd). To create a table, see Create a table.

  2. Import the following sample data into the test_data table. To learn how to import data, see Import data.

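    | id | label | prediction_result |
    | --- | --- | --- |
    | 0 | A | A |
    | 1 | A | B |
    | 2 | A | A |
    | 3 | A | A |
    | 4 | B | B |
    | 5 | B | B |
    | 6 | B | A |
    | 7 | B | B |
    | 8 | B | A |
    | 9 | A | A |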

  3. Build a pipeline and run the components. For more information, see Algorithm modeling.

    1. In the component list on the left side of Machine Learning Designer, search for the Read Table and Confusion Matrix components, and drag them to the canvas.

    2. Connect the components to build a pipeline.

    3. Configure the component parameters.

      • Click the Read Table-1 component on the canvas. On the Select Table tab in the right-side pane, set Table Name to test_data.

      • Click the Confusion Matrix-1 component on the canvas. In the right pane, configure the parameters as shown in the following table. Leave the other parameters at their default values.

        | Parameter | Description |
        | --- | --- |
        | Original Label Column | Select the label column. |
        | Prediction Result Label Column | Enter prediction_result. |

    4. After you configure the parameters, click the Run button to run the pipeline.

  4. When the pipeline finishes, right-click the Confusion Matrix-1 component and select Visual Analysis from the shortcut menu to view the output.

    • Click the Confusion Matrix tab to view the resulting confusion matrix.

    • Click the Statistics tab to view the model statistics.

      The statistics include TruePositive, FalsePositive, Accuracy, Precision, Recall, and F1 score, reported for each label. For label A, the respective values are 4, 2, 0.7, 0.6667, 0.8, and 0.7273. For label B, the respective values are 3, 1, 0.7, 0.75, 0.6, and 0.6667.
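
The reported statistics can be cross-checked offline against the sample data. The following Python sketch recomputes them from the ten rows of test_data. It assumes the usual one-vs-rest definitions for per-label metrics and overall accuracy across all rows, which is consistent with the values reported above. The helper name `label_statistics` is hypothetical and this is not the component's own code.

```python
from collections import Counter

# label and prediction_result columns of test_data, ordered by id 0..9.
labels = list("AAAABBBBBA")
predictions = list("ABAABBABAA")

# Confusion matrix as counts keyed by (actual label, predicted label).
matrix = Counter(zip(labels, predictions))
classes = sorted(set(labels) | set(predictions))


def label_statistics(cls):
    """Return (TP, FP, accuracy, precision, recall, F1) for one label, one-vs-rest."""
    tp = matrix[(cls, cls)]
    fp = sum(n for (actual, pred), n in matrix.items() if pred == cls and actual != cls)
    fn = sum(n for (actual, pred), n in matrix.items() if actual == cls and pred != cls)
    accuracy = sum(matrix[(c, c)] for c in classes) / len(labels)  # overall accuracy
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return tp, fp, round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4)


for cls in classes:
    print(cls, label_statistics(cls))
# A (4, 2, 0.7, 0.6667, 0.8, 0.7273)
# B (3, 1, 0.7, 0.75, 0.6, 0.6667)
```

The matrix itself holds four counts: 4 rows labeled A and predicted A, 1 labeled A and predicted B, 2 labeled B and predicted A, and 3 labeled B and predicted B.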

Related Topics

  • For more information about components in Machine Learning Designer, see Overview of Machine Learning Designer.

  • Machine Learning Designer provides a variety of algorithm components. You can select the appropriate components for data processing based on your use case. For more information, see Component Reference.