Regression evaluation - Platform For AI (PAI) - Alibaba Cloud Help Center

Regression Model Evaluation measures how well a regression model's predictions match the actual outcomes. It computes standard regression metrics, such as mean squared error (MSE), mean absolute error (MAE), and R-squared (R²). It also generates a residual histogram that helps you visualize prediction errors and identify where the model can be improved.

Prerequisites

Before you begin, ensure that you have:

A trained regression model with prediction output
An input table with two numeric columns: one for actual (observed) values and one for predicted values

Important

Both the actual-value column and the predicted-value column must use numeric data types. Non-numeric columns are not supported.

Configure the component

Method 1: Configure on the pipeline page

Add a Regression Model Evaluation component to your pipeline and configure the following parameters:

Category	Parameter	Description
Fields Setting	Original Regression Value	The actual observed values of the target variable. Used as the ground truth for evaluating prediction accuracy.
Fields Setting	Predicted Regression Value	The values predicted by your regression model based on the input features.
Tuning	Worker number	Number of workers for distributed computation. For sizing guidance, see Appendix: How to estimate resource usage.
Tuning	Memory Size per Node	Memory allocated to each worker node. See the same appendix for sizing guidance.

Method 2: Use PAI commands

Run the component by passing parameters to the regression_evaluation algorithm through a SQL Script component:

PAI -name regression_evaluation -project algo_public
    -DinputTableName=input_table
    -DyColName=y_col
    -DpredictionColName=prediction_col
    -DindexOutputTableName=index_output_table
    -DresidualOutputTableName=residual_output_table;

Parameter	Required	Default	Description
`inputTableName`	Yes	—	Name of the input table.
`inputTablePartitions`	No	Full table	Partitions to read from the input table. Omit to use the full table.
`yColName`	Yes	—	Column name containing the actual (observed) values. Must be numeric.
`predictionColName`	Yes	—	Column name containing the predicted values. Must be numeric.
`indexOutputTableName`	Yes	—	Name of the output table that stores regression metrics.
`residualOutputTableName`	Yes	—	Name of the output table that stores the residual histogram data.
`intervalNum`	No	100	Number of intervals (bins) in the residual histogram.
`lifecycle`	No	—	Retention period of the output tables, in days. Must be a positive integer.
`coreNum`	No	System default	Number of CPU cores. Valid values: 1–9,999.
`memSizePerCore`	No	System default	Memory per core, in MB. Valid values: 1,024–65,536.
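If you generate these commands from scripts, a small wrapper can assemble the `-D` flags and check the documented value ranges before you paste the command into a SQL Script component. The following Python sketch is illustrative only: the function `build_regression_eval_command` and its argument names are hypothetical, and the function only formats a command string. It does not call any PAI API.

```python
from typing import List, Optional, Tuple, Union


def build_regression_eval_command(
    input_table: str,
    y_col: str,
    prediction_col: str,
    index_output_table: str,
    residual_output_table: str,
    input_table_partitions: Optional[str] = None,
    interval_num: Optional[int] = None,
    lifecycle: Optional[int] = None,
    core_num: Optional[int] = None,
    mem_size_per_core: Optional[int] = None,
) -> str:
    """Format a regression_evaluation PAI command (hypothetical helper)."""
    # Required parameters, in the same order as the documented example.
    params: List[Tuple[str, Union[str, int]]] = [
        ("inputTableName", input_table),
        ("yColName", y_col),
        ("predictionColName", prediction_col),
        ("indexOutputTableName", index_output_table),
        ("residualOutputTableName", residual_output_table),
    ]
    # Optional parameters are emitted only when set, so PAI defaults apply otherwise.
    if input_table_partitions is not None:
        params.append(("inputTablePartitions", input_table_partitions))
    if interval_num is not None:
        if interval_num < 1:  # assumption: at least one bin is required
            raise ValueError("intervalNum must be at least 1")
        params.append(("intervalNum", interval_num))
    if lifecycle is not None:
        if lifecycle < 1:
            raise ValueError("lifecycle must be a positive integer")
        params.append(("lifecycle", lifecycle))
    if core_num is not None:
        if not 1 <= core_num <= 9999:
            raise ValueError("coreNum must be in [1, 9999]")
        params.append(("coreNum", core_num))
    if mem_size_per_core is not None:
        if not 1024 <= mem_size_per_core <= 65536:
            raise ValueError("memSizePerCore must be in [1024, 65536] MB")
        params.append(("memSizePerCore", mem_size_per_core))
    flags = " ".join(f"-D{name}={value}" for name, value in params)
    return f"PAI -name regression_evaluation -project algo_public {flags};"
```

For example, calling the function with the table and column names from the example command and `lifecycle=7` returns the same command on a single line, with `-Dlifecycle=7` appended before the semicolon. Values such as partition specifications are passed through verbatim.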

Output

The regression metrics output table (indexOutputTableName) is generated in JSON format and contains the following fields.

Regression metrics table

Metric	Description	Interpretation
`MSE`	Mean squared error	Lower is better. Penalizes large errors more than MAE due to squaring.
`RMSE`	Root mean square error	Lower is better. Expressed in the same unit as your target variable, making it easier to interpret than MSE.
`MAE`	Mean absolute error	Lower is better. The average magnitude of prediction errors, less sensitive to outliers than MSE.
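`MAPE`	Mean absolute percentage error	Lower is better. Expresses error as a percentage of the actual values, which is useful when comparing models across different scales. Undefined for rows whose actual value is zero, and unstable when actual values are close to zero.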
`MAD`	Mean absolute deviation	Lower is better.
`R2`	R-squared (coefficient of determination)	Higher is better. Measures the proportion of variance in the actual values that the model explains; a value of 1 indicates a perfect fit.
`R`	Multiple correlation coefficient	Measures the linear correlation between the actual and predicted values. Values closer to 1 indicate that the predictions track the actual values more closely.
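`SST`	Total sum of squares	Sum of squared deviations of the actual values from their mean; the total variation in the actual values.
`SSE`	Sum of squared errors	Sum of squared residuals; the variation that the model leaves unexplained.
`SSR`	Sum of squares due to regression	The variation explained by the model. The identity SST = SSE + SSR holds for an ordinary least squares model with an intercept evaluated on its training data, but not in general for other models or for held-out data.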
`count`	Row count	Number of rows used in the evaluation.
`yMean`	Mean of actual values	The arithmetic mean of the actual (observed) target values.
`predictionMean`	Mean of predicted values	The arithmetic mean of the model's predicted values.
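To see how these metrics relate to each other, you can reproduce them locally on a small sample. The following Python sketch uses textbook formulas; it is an assumption that PAI computes each metric this way (for example, R2 as 1 − SSE/SST, SSR around the mean of the actual values, and MAPE reported as a percentage with zero actual values skipped). MAD is omitted because this page does not define which deviation PAI averages. The function name `regression_metrics` is hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """Textbook regression metrics (a sketch, not PAI's implementation)."""
    if len(y) != len(pred) or len(y) == 0:
        raise ValueError("y and pred must be non-empty and of equal length")
    n = len(y)
    y_mean = sum(y) / n
    pred_mean = sum(pred) / n
    residuals = [a - p for a, p in zip(y, pred)]  # actual - predicted

    sse = sum(r * r for r in residuals)             # unexplained variation
    sst = sum((a - y_mean) ** 2 for a in y)         # total variation around yMean
    ssr = sum((p - y_mean) ** 2 for p in pred)      # predictions' variation around yMean
    mse = sse / n
    mae = sum(abs(r) for r in residuals) / n
    # MAPE skips rows whose actual value is 0, where the ratio is undefined (assumption).
    ratios = [abs(r / a) for r, a in zip(residuals, y) if a != 0]
    mape = 100.0 * sum(ratios) / len(ratios) if ratios else float("nan")
    # R2 as 1 - SSE/SST; R as the Pearson correlation of actual and predicted values.
    r2 = 1.0 - sse / sst if sst > 0 else float("nan")
    cov = sum((a - y_mean) * (p - pred_mean) for a, p in zip(y, pred))
    pred_spread = math.sqrt(sum((p - pred_mean) ** 2 for p in pred))
    if sst > 0 and pred_spread > 0:
        r = cov / (math.sqrt(sst) * pred_spread)
    else:
        r = float("nan")
    return {
        "MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape,
        "R2": r2, "R": r, "SST": sst, "SSE": sse, "SSR": ssr,
        "count": n, "yMean": y_mean, "predictionMean": pred_mean,
    }
```

For example, with actual values [3, 5, 7, 9] and predictions [2.5, 5.5, 6.5, 9.5], the sketch returns MSE = 0.25, MAE = 0.5, SSE = 1, SST = 20, and SSR = 25. Because these predictions do not come from a least squares fit on this data, SSE + SSR = 26 differs from SST.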

Residual histogram table

The residual histogram table (residualOutputTableName) stores the distribution of prediction errors (residuals = actual − predicted), grouped into the number of intervals set by intervalNum (100 by default).
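The binning can be sketched as follows, assuming equal-width intervals that span the minimum to the maximum residual. PAI's exact bin boundaries and the column layout of the residual table are not described on this page, so treat the hypothetical `residual_histogram` function as an illustration of the concept only.

```python
from typing import List, Sequence, Tuple


def residual_histogram(
    y: Sequence[float], pred: Sequence[float], interval_num: int = 100
) -> List[Tuple[float, float, int]]:
    """Bin residuals (actual - predicted) into interval_num equal-width bins.

    Returns (lower_edge, upper_edge, count) for each bin. Equal-width binning
    over [min, max] is an assumption; PAI's bins and schema may differ.
    """
    if interval_num < 1:
        raise ValueError("interval_num must be at least 1")
    residuals = [a - p for a, p in zip(y, pred)]
    if not residuals:
        return []
    lo, hi = min(residuals), max(residuals)
    width = (hi - lo) / interval_num
    counts = [0] * interval_num
    for r in residuals:
        # Map each residual to a bin; the maximum residual falls into the last bin.
        idx = int((r - lo) / width) if width > 0 else 0
        counts[min(idx, interval_num - 1)] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]
```

With residuals [0, 1, 2, 3] and three intervals, the bins [0, 1), [1, 2), and [2, 3] hold 1, 1, and 2 rows, because the maximum value is placed in the last, closed bin.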

How to read the residual histogram:

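Symmetric, bell-shaped distribution centered near zero: The model's errors show no systematic bias and behave like random noise, which is a sign of a well-fitted model.
Skewed or off-center distribution: The model systematically over-predicts or under-predicts for some inputs. Because residuals are actual − predicted, a long tail of positive residuals indicates under-prediction, and a long tail of negative residuals indicates over-prediction. Investigate whether feature engineering or a different model could reduce the bias.
Wide spread: Large residuals mean that many predictions are far from the actual values. If errors on the evaluation data are much larger than on the training data, the model may be overfitting; consider regularization or collecting more training data. If errors are large on both, the model may be underfitting; consider adding informative features or using a more expressive model.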
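Visual inspection of the histogram can be complemented with a few summary statistics. The following Python sketch runs outside PAI, and the `residual_summary` function is hypothetical. It computes the mean, standard deviation, and skewness of the residuals: a mean far from zero suggests overall bias, a skewness far from zero suggests an asymmetric error distribution, and a large standard deviation corresponds to a wide spread.

```python
import math
from typing import Dict, Sequence


def residual_summary(y: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """Summarize residuals (actual - predicted) to complement the histogram."""
    residuals = [a - p for a, p in zip(y, pred)]
    n = len(residuals)
    if n == 0:
        raise ValueError("need at least one row")
    # Mean residual: > 0 means the model under-predicts on average, < 0 over-predicts.
    mean = sum(residuals) / n
    # Population standard deviation: the spread of the histogram.
    std = math.sqrt(sum((r - mean) ** 2 for r in residuals) / n)
    # Fisher-Pearson skewness: > 0 means a long tail of positive residuals
    # (under-predictions), < 0 a long tail of negative residuals (over-predictions).
    if std > 0:
        skew = (sum((r - mean) ** 3 for r in residuals) / n) / std ** 3
    else:
        skew = 0.0
    return {"mean": mean, "std": std, "skewness": skew}
```

How far from zero the mean or skewness must be before it matters depends on the scale of your target variable and your tolerance for error; the sketch does not prescribe thresholds.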