Prediction

The Prediction component applies a trained model to new data and writes the results to an output table. Use it when your model was trained with a traditional data mining component that does not have a paired prediction component.

Prerequisites

Before you begin, ensure that you have:

  • A trained model in Machine Learning Designer

  • An input table with the feature columns the model expects

Configure the component

Method 1: Configure in Machine Learning Designer

On the pipeline canvas in the Machine Learning Platform for AI (PAI) console, select the Prediction component and configure the following parameters.

Fields Setting tab

| Parameter | Description |
| --- | --- |
| Feature columns | Feature columns selected from the input table. By default, all columns are selected. |
| Reserved columns | Columns to carry through to the output table. Include the label column to make downstream evaluation easier. |
| Output result column | Output column that contains the top prediction result. |
| Output score column | Output column that contains the probability of the top prediction result. |
| Output detail column | Output column that contains all possible results and their probabilities. |
| Sparse matrix | Enable if the input data is in sparse format (key-value pairs). |
| KV delimiter | Delimiter between keys and values in sparse data. Default: colon (:). |
| KV pair delimiter | Delimiter between key-value pairs in sparse data. Default: comma (,). |
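
To make the two sparse-format delimiters concrete, the following minimal Python sketch splits one sparse string into keys and values using the default delimiters. The function name `parse_sparse_row` and the sample string are hypothetical, and this is not how the component itself parses input; it only illustrates what the KV delimiter and KV pair delimiter separate.

```python
from typing import Dict


def parse_sparse_row(row: str, kv_delimiter: str = ":",
                     item_delimiter: str = ",") -> Dict[str, float]:
    """Split one sparse-format string into a {key: value} mapping.

    Hypothetical helper for illustration only; the defaults mirror the
    documented defaults of the KV delimiter and KV pair delimiter.
    """
    features: Dict[str, float] = {}
    for pair in row.split(item_delimiter):
        pair = pair.strip()
        if not pair:
            continue  # skip empty fragments such as a trailing delimiter
        key, _, value = pair.partition(kv_delimiter)
        features[key.strip()] = float(value)
    return features


# Made-up sample: three key-value pairs in the default format.
print(parse_sparse_row("1:0.5,3:1.2,7:4"))
# prints {'1': 0.5, '3': 1.2, '7': 4.0}
```

If your data uses other separators, pass them explicitly, for example `parse_sparse_row("a=1;b=2", kv_delimiter="=", item_delimiter=";")`, and set the matching values in the component.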

Tuning tab

| Parameter | Description |
| --- | --- |
| Cores | Number of cores. Must be a positive integer. Use together with Memory size per core. |
| Memory size per core | Memory per core, in MB. Use together with Cores. |

Method 2: Run a PAI command

Run the following command using the SQL Script component:

pai -name prediction
    -DmodelName=nb_model
    -DinputTableName=wpbc
    -DoutputTableName=wpbc_pred
    -DappendColNames=label;

Parameters

| Parameter | Required | Description | Default |
| --- | --- | --- | --- |
| inputTableName | Yes | Name of the input table. | |
| modelName | Yes | Name of the trained model. | |
| outputTableName | Yes | Name of the output table. | |
| featureColNames | No | Feature columns from the input table, separated by commas. | All columns |
| appendColNames | No | Input columns to append to the output table. | None |
| inputTablePartitions | No | Partitions to read from the input table. Supported formats: partition_name=value for a single partition, name1=value1/name2=value2 for multi-level partitions. Separate multiple partitions with commas. | Full table |
| outputTablePartition | No | Partition to write results to in the output table. | None |
| resultColName | No | Output column for the top prediction result. | prediction_result |
| scoreColName | No | Output column for the probability of the top prediction result. | prediction_score |
| detailColName | No | Output column for all possible results and their probabilities. | prediction_detail |
| enableSparse | No | Whether the input data is sparse. Valid values: true, false. | false |
| itemDelimiter | No | Delimiter between sparse key-value pairs. | , |
| kvDelimiter | No | Delimiter between sparse keys and values. | : |
| lifecycle | No | Lifecycle of the output table. | None |
| coreNum | No | Number of cores. | Automatically allocated |
| memSizePerCore | No | Memory per core, in MB. | Automatically allocated |
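
If you generate the command from a script, a small helper can check the three required parameters before assembling the text. The following Python sketch only builds the command string; it does not submit anything to PAI. The function name `build_prediction_command` is hypothetical. Wrapping every value in double quotes is a choice made for this sketch, following the quoted style of the training command in the Example section.

```python
from typing import Mapping

# The three parameters marked "Yes" in the Required column.
REQUIRED = ("inputTableName", "modelName", "outputTableName")


def build_prediction_command(params: Mapping[str, str]) -> str:
    """Assemble the text of a `pai -name prediction` command.

    Hypothetical helper for illustration only: it formats a string and
    does not validate parameter values or contact any service.
    """
    missing = [name for name in REQUIRED if not params.get(name)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    lines = ["pai -name prediction"]
    for name, value in params.items():
        # Each parameter becomes one -Dname="value" line.
        lines.append(f'    -D{name}="{value}"')
    return "\n".join(lines) + ";"


print(build_prediction_command({
    "modelName": "nb_model",
    "inputTableName": "wpbc",
    "outputTableName": "wpbc_pred",
    "appendColNames": "label",
}))
```

You could then paste the printed text into the SQL Script component.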

Example

This example builds a random forest classifier and runs Prediction on the same data.

  1. Create the test input table:

    create table pai_rf_test_input as
    select * from
    (
    select 1 as f0,2 as f1, "good" as class
    union all
    select 1 as f0,3 as f1, "good" as class
    union all
    select 1 as f0,4 as f1, "bad" as class
    union all
    select 0 as f0,3 as f1, "good" as class
    union all
    select 0 as f0,4 as f1, "bad" as class
    )tmp;
  2. Train the model using the random forest algorithm:

    PAI -name randomforests
       -project algo_public
       -DinputTableName="pai_rf_test_input"
       -DmodelName="pai_rf_test_model"
       -DforceCategorical="f1"
       -DlabelColName="class"
       -DfeatureColNames="f0,f1"
       -DmaxRecordSize="100000"
       -DminNumPer="0"
       -DminNumObj="2"
       -DtreeNum="3";
  3. Run Prediction against the trained model:

    PAI -name prediction
        -project algo_public
        -DinputTableName=pai_rf_test_input
        -DmodelName=pai_rf_test_model
        -DresultColName=prediction_result
        -DscoreColName=prediction_score
        -DdetailColName=prediction_detail
        -DoutputTableName=pai_temp_2283_76333_1;
  4. View the output table pai_temp_2283_76333_1:

    [Figure: Prediction results]

    The output table contains three columns:

    • prediction_result: the top prediction result (the class with the highest probability). In this example, the value is good or bad.

    • prediction_score: the probability of the top prediction result. In this example, each row is predicted as good or bad, whichever has the higher probability, and prediction_score holds that higher probability.

    • prediction_detail: all possible results and their probabilities.
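
The relationship between the three columns can be sketched in Python. This sketch assumes that prediction_detail holds a JSON object mapping each class to its probability; that format is an assumption here, so check it against your actual output table. The function name `top_prediction` is hypothetical, and the probabilities are made up for illustration, not taken from a real run.

```python
import json
from typing import Tuple


def top_prediction(detail: str) -> Tuple[str, float]:
    """Return the (class, probability) pair with the highest probability.

    Assumes `detail` is a JSON object of class -> probability. The pair
    returned corresponds to prediction_result and prediction_score.
    """
    probabilities = json.loads(detail)
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


# Made-up detail value for one row.
detail = '{"bad": 0.25, "good": 0.75}'
label, score = top_prediction(detail)
print(label, score)  # prints: good 0.75
```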