Regression Model Evaluation measures how well a regression model's predictions match actual outcomes. It computes a set of standard regression metrics—including mean squared error (MSE), mean absolute error (MAE), and R-squared (R²)—and generates a residual histogram to help you visualize prediction errors and identify areas for model improvement.
Prerequisites
Before you begin, ensure that you have:
- A trained regression model with prediction output
- An input table with two numeric columns: one for actual (observed) values and one for predicted values
Both the actual-value column and the predicted-value column must use numeric data types. Non-numeric columns are not supported.
Configure the component
Method 1: Configure on the pipeline page
Add a Regression Model Evaluation component to your pipeline and configure the following parameters:
| Category | Parameter | Description |
|---|---|---|
| Fields Setting | Original Regression Value | The actual observed values of the target variable. Used as the ground truth for evaluating prediction accuracy. |
| Fields Setting | Predicted Regression Value | The values predicted by your regression model based on the input features. |
| Tuning | Worker number | Number of workers for distributed computation. For sizing guidance, see Appendix: How to estimate resource usage. |
| Tuning | Memory Size per Node | Memory allocated to each worker node. See the same appendix for sizing guidance. |
Method 2: Use PAI commands
Run the component by passing parameters to the regression_evaluation algorithm through a SQL Script component:
PAI -name regression_evaluation -project algo_public
-DinputTableName=input_table
-DyColName=y_col
-DpredictionColName=prediction_col
-DindexOutputTableName=index_output_table
-DresidualOutputTableName=residual_output_table;
| Parameter | Required | Default | Description |
|---|---|---|---|
| inputTableName | Yes | None | Name of the input table. |
| inputTablePartitions | No | Full table | Partitions to read from the input table. Omit to use the full table. |
| yColName | Yes | None | Column name containing the actual (observed) values. Must be numeric. |
| predictionColName | Yes | None | Column name containing the predicted values. Must be numeric. |
| indexOutputTableName | Yes | None | Name of the output table that stores regression metrics. |
| residualOutputTableName | Yes | None | Name of the output table that stores the residual histogram data. |
| intervalNum | No | 100 | Number of intervals (bins) in the residual histogram. |
| lifecycle | No | None | Retention period for the output tables. Must be a positive integer. |
| coreNum | No | System default | Number of CPU cores. Valid values: 1 to 9,999. |
| memSizePerCore | No | System default | Memory per core, in MB. Valid values: 1,024 to 65,536. |
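As a rough illustration of how the required and optional parameters combine, the following Python sketch assembles the command text. The helper name build_pai_command is hypothetical and not part of PAI. It only formats the parameter names listed in the table and applies the documented checks for coreNum, memSizePerCore, and lifecycle. It does not submit the command.

```python
# Hypothetical helper (not part of PAI): formats a regression_evaluation command string.
OPTIONAL = {"inputTablePartitions", "intervalNum", "lifecycle", "coreNum", "memSizePerCore"}
RANGES = {"coreNum": (1, 9999), "memSizePerCore": (1024, 65536)}


def build_pai_command(input_table, y_col, prediction_col,
                      index_output_table, residual_output_table, **optional):
    """Return the PAI command text for the given required and optional parameters."""
    unknown = set(optional) - OPTIONAL
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    # Range checks taken from the parameter table.
    for name, (low, high) in RANGES.items():
        if name in optional and not low <= optional[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    if "lifecycle" in optional:
        value = optional["lifecycle"]
        if not isinstance(value, int) or value <= 0:
            raise ValueError("lifecycle must be a positive integer")
    params = {
        "inputTableName": input_table,
        "yColName": y_col,
        "predictionColName": prediction_col,
        "indexOutputTableName": index_output_table,
        "residualOutputTableName": residual_output_table,
    }
    params.update(optional)  # optional parameters follow the required ones
    lines = ["PAI -name regression_evaluation -project algo_public"]
    lines += [f"-D{key}={value}" for key, value in params.items()]
    return "\n".join(lines) + ";"


command = build_pai_command(
    "input_table", "y_col", "prediction_col",
    "index_output_table", "residual_output_table",
    intervalNum=50, lifecycle=7,
)
print(command)
```

The printed text can then be pasted into a SQL Script component. Omitted optional parameters fall back to the defaults in the table.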
Output
Regression metrics table
The regression metrics output table (indexOutputTableName) is generated in JSON format and contains the following fields.
| Metric | Description | Interpretation |
|---|---|---|
| MSE | Mean squared error | Lower is better. Penalizes large errors more heavily than MAE because errors are squared. |
| RMSE | Root mean squared error | Lower is better. Expressed in the same unit as your target variable, which makes it easier to interpret than MSE. |
| MAE | Mean absolute error | Lower is better. The average magnitude of prediction errors; less sensitive to outliers than MSE. |
| MAPE | Mean absolute percentage error | Lower is better. Expresses error as a percentage of actual values; useful when comparing models across different scales. |
| MAD | Mean absolute deviation | Lower is better. |
| R2 | R-squared (coefficient of determination) | Higher is better. Measures the proportion of variance in the actual values explained by the model; 1 indicates a perfect fit. |
| R | Multiple correlation coefficient | Measures the correlation between actual and predicted values. |
| SST | Total sum of squares | The total variation of the actual values around their mean. |
| SSE | Sum of squared errors | The variation left unexplained by the model. |
| SSR | Sum of squares due to regression | The variation explained by the model. For an ordinary least-squares fit with an intercept, SST = SSE + SSR. |
| count | Row count | Number of rows used in the evaluation. |
| yMean | Mean of actual values | The arithmetic mean of the actual (observed) target values. |
| predictionMean | Mean of predicted values | The arithmetic mean of the model's predicted values. |
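To make the metric definitions concrete, the following Python sketch recomputes most of them from a small list of actual and predicted values using the standard textbook formulas. It is not the component's implementation, and several details are assumptions: MAPE is scaled to a percentage and returned as NaN when any actual value is zero, R is taken as the Pearson correlation between actual and predicted values, and MAD is omitted because this page does not give its formula. The function name regression_metrics is illustrative only.

```python
import math


def regression_metrics(actual, predicted):
    """Textbook regression metrics for paired actual/predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    y_mean = sum(actual) / n
    pred_mean = sum(predicted) / n
    residuals = [a - p for a, p in zip(actual, predicted)]

    sse = sum(r * r for r in residuals)               # unexplained variation
    sst = sum((a - y_mean) ** 2 for a in actual)      # total variation
    ssr = sum((p - y_mean) ** 2 for p in predicted)   # explained variation
    mse = sse / n

    # Assumption: MAPE as a percentage; undefined if any actual value is zero.
    if all(a != 0 for a in actual):
        mape = 100 * sum(abs(r / a) for r, a in zip(residuals, actual)) / n
    else:
        mape = math.nan

    # Assumption: R as the Pearson correlation of actual vs. predicted.
    cov = sum((a - y_mean) * (p - pred_mean) for a, p in zip(actual, predicted))
    pred_var = sum((p - pred_mean) ** 2 for p in predicted)
    r = cov / math.sqrt(sst * pred_var) if sst > 0 and pred_var > 0 else math.nan

    return {
        "count": n, "yMean": y_mean, "predictionMean": pred_mean,
        "SST": sst, "SSE": sse, "SSR": ssr,
        "MSE": mse, "RMSE": math.sqrt(mse),
        "MAE": sum(abs(x) for x in residuals) / n,
        "MAPE": mape,
        "R2": 1 - sse / sst if sst > 0 else math.nan,
        "R": r,
    }


metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 9.0])
for name, value in metrics.items():
    print(name, value)
```

In this toy example SST is 20 while SSE + SSR is 23. The decomposition is only guaranteed for least-squares fits with an intercept, so a mismatch on arbitrary predictions is not by itself an error.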
Residual histogram table
The residual histogram table (residualOutputTableName) stores the distribution of prediction errors (residuals = actual − predicted) across the number of intervals specified with intervalNum.
How to read the residual histogram:
- Symmetric, bell-shaped distribution centered near zero: The model's errors are random and well-distributed, a sign of a well-fitted model.
- Skewed distribution: The model systematically over-predicts or under-predicts for a subset of inputs. Investigate whether feature engineering or model selection could reduce the bias.
- Wide spread: Large residuals indicate high variance. Consider regularization or collecting more training data.
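The binning behind such a histogram can be sketched in Python as follows. This is a minimal illustration, assuming equal-width bins between the smallest and largest residual; the component's actual binning rule and the schema of the residual output table are not described here, so the function name residual_histogram and the (left edge, right edge, count) layout are hypothetical.

```python
def residual_histogram(actual, predicted, interval_num=100):
    """Count residuals (actual - predicted) in equal-width bins.

    Returns a list of (left_edge, right_edge, count) tuples.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if interval_num <= 0:
        raise ValueError("interval_num must be positive")
    residuals = [a - p for a, p in zip(actual, predicted)]
    low, high = min(residuals), max(residuals)
    # Fall back to a width of 1.0 when all residuals are identical.
    width = (high - low) / interval_num or 1.0
    counts = [0] * interval_num
    for r in residuals:
        # The largest residual lands exactly on the upper edge; keep it in the last bin.
        index = min(int((r - low) / width), interval_num - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, c) for i, c in enumerate(counts)]


bins = residual_histogram([1, 2, 3, 4, 5], [1.5, 1.5, 3, 5, 3], interval_num=3)
for left, right, count in bins:
    print(f"[{left:.2f}, {right:.2f}): {count}")
```

Comparing the counts on either side of zero gives a quick check for the skew described above, and the distance between the outermost non-empty bins indicates the spread.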