Gradient Boosting Decision Trees (GBDT) Binary Classification V2 trains a binary classification model by combining multiple weak decision trees into a single strong learner. It incorporates XGBoost's second-order optimization and LightGBM's histogram approximation, making it fast, accurate, and interpretable.
This component runs on MaxCompute only.
How it works
The model is an ensemble of CART decision trees, where each new tree corrects the residual errors of the ensemble built so far. The process follows the recursive gradient boosting formula:

$$F_m(x) = F_{m-1}(x) + \eta \, T(x; \Theta_m)$$

In this formula, $T(x; \Theta_m)$ is a CART decision tree, $\Theta_m$ are the tree's parameters, and $\eta$ is the step size. Each tree optimizes the objective function given the ensemble built by the previous trees, and the final model contains multiple decision trees.
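The boosting recursion can be illustrated with a small Python sketch. This is a simplified illustration under stated assumptions, not the component's implementation. Each tree $T$ is a one-split regression stump on a single numeric feature. Leaf values are mean negative gradients of the log loss, which is a first-order rule, not the second-order leaf weights the component uses. No sampling is applied, and $F_0$ is the log-odds of the positive-class rate. The names `fit_stump` and `train_gbdt` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Fit a one-split regression tree (a stand-in for the CART tree T) to residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = sum((r - left_value) ** 2 for r in left) + sum((r - right_value) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    if best is None:  # all x values identical: fall back to a constant tree
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def train_gbdt(xs, ys, num_trees=50, eta=0.3):
    """Return a function mapping x to P(y = 1) after num_trees boosting rounds."""
    p = sum(ys) / len(ys)
    if p in (0.0, 1.0):
        raise ValueError("both classes (0 and 1) must be present")
    f0 = math.log(p / (1 - p))  # F_0: log-odds of the positive-class prior
    trees = []

    def predict_raw(x):
        # F_m(x) = F_0 + sum over trees of eta * T(x; Theta)
        return f0 + sum(eta * tree(x) for tree in trees)

    for _ in range(num_trees):
        # Negative gradient of the log loss w.r.t. F(x) is y - sigmoid(F(x)).
        residuals = [y - sigmoid(predict_raw(x)) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return lambda x: sigmoid(predict_raw(x))
```

For example, `train_gbdt([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])` returns a model whose predicted probability is low for `x = 1` and high for `x = 6`. Each round nudges $F$ by $\eta$ times the new tree's output, which is the recursion in the formula.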
Label requirement: Label values must be 0 or 1.
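If your source data encodes the two classes differently, for example as `yes`/`no` or `-1`/`1`, you need to remap the labels before training. A minimal Python sketch, assuming you know which raw value is the positive class; `to_binary_labels` is a hypothetical helper, not part of PAI:

```python
def to_binary_labels(raw_labels, positive_value):
    """Encode each label as 1 if it equals positive_value, else 0."""
    distinct = set(raw_labels)
    if len(distinct) > 2:
        # More than two classes cannot be expressed as a 0/1 label.
        raise ValueError(f"expected at most 2 distinct labels, found {len(distinct)}")
    if positive_value not in distinct:
        raise ValueError("positive_value does not occur in the labels")
    return [1 if label == positive_value else 0 for label in raw_labels]
```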
Supported input formats
The component accepts two input formats:
| Format | Column selection | Data format |
|---|---|---|
| Multiple feature columns (default) | Multiple columns of double, bigint, or string type | Numerical features are binned; categorical features use a many-vs-many splitting strategy (no one-hot encoding needed) |
| Sparse vector format | One string column | Key-value pairs separated by spaces; key and value separated by a colon. Example: 1:0.3 3:0.9 |
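The sparse vector format can be produced and checked with a short Python sketch. It assumes integer feature IDs, consistent with the example `1:0.3 3:0.9`. The function names are hypothetical, and how the component treats absent keys is not covered here.

```python
def parse_sparse_vector(text):
    """Parse 'key:value key:value ...' into {feature_id: value}."""
    features = {}
    for token in text.split():  # pairs are separated by spaces
        key, value = token.split(":", 1)  # key and value are separated by a colon
        features[int(key)] = float(value)
    return features


def format_sparse_vector(features):
    """Serialize {feature_id: value} into the space-separated key:value string."""
    return " ".join(f"{key}:{value}" for key, value in sorted(features.items()))
```

For example, `parse_sparse_vector("1:0.3 3:0.9")` yields `{1: 0.3, 3: 0.9}`, and `format_sparse_vector` turns that dictionary back into the same string.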
Configure the component
Pair this component with GBDT Binary Classification Prediction V2 to score new data. After training, deploy the model as an online service. For details, see Deploy a pipeline as an online service.
Input ports
| Port | Required | Recommended upstream component |
|---|---|---|
| Input Data | Yes | Read Table |
Fields setting
| Parameter | Required | Default | Description |
|---|---|---|---|
| Use Sparse Vector Format | No | No | Enable this if your feature data is in sparse vector format (key:value pairs). When enabled, select exactly one string column as the feature column. |
| Select Feature Columns | Yes | — | Feature columns used for training. In non-sparse mode, select columns of double, bigint, or string type. In sparse mode, select one string column. |
| Select Categorical Feature Columns | No | — | Columns to treat as categorical features. All other selected feature columns are treated as numerical. Only applies in non-sparse mode. |
| Select Label Column | Yes | — | The label column. Values must be 0 or 1. |
| Select Weight Column | No | — | An optional column of sample weights for training. |
Parameter setting
| Parameter | Default | Valid values | Description |
|---|---|---|---|
| Number of Trees | 1 | Positive integer | The number of trees in the ensemble. More trees improve accuracy but increase training time. Use alongside Learning Rate: smaller learning rates generally require more trees. |
| Maximum Number of Leaf Nodes | 32 | Positive integer | Maximum leaf nodes per tree. Larger values let each tree capture more complex patterns but increase the risk of overfitting. |
| Learning Rate | 0.05 | Float | Shrinks each tree's contribution. Lower values make training more conservative and reduce overfitting risk, but require more trees to reach the same accuracy. |
| Ratio of Samples | 0.6 | (0, 1] | Fraction of training samples used per tree. Values below 1.0 introduce randomness (stochastic gradient boosting), which reduces variance and helps prevent overfitting. |
| Ratio of Features | 0.6 | (0, 1] | Fraction of features considered per tree. Values below 1.0 increase diversity among trees and reduce overfitting, at the cost of some accuracy. |
| Minimum Number of Samples in a Leaf Node | 500 | Positive integer | Minimum samples required in a leaf node. Higher values prevent the model from fitting to very small data subsets, helping to control overfitting. |
| Maximum Number of Bins | 32 | Positive integer | Maximum bins when discretizing continuous features. More bins produce more precise splits but increase training cost. Equivalent to 1 / Sketch-based Approximate Precision in PS-SMART. |
| Maximum Number of Distinct Categories | 1024 | Positive integer | Maximum distinct categories kept for each categorical feature. Categories are sorted by frequency, and those ranked beyond this limit are merged into one bucket. More categories allow finer splits but increase overfitting risk and training cost. |
| Number of features | Auto-calculated | Positive integer | For sparse vector format only. Set to max feature ID + 1. Leave blank to let the system scan the data automatically. |
| Initial Prediction | Auto-calculated | Float | The prior probability of positive samples. Leave blank to let the system estimate from the data. |
| Random Seed | 0 | Integer | Seed for random sampling. Set a fixed value for reproducible runs. |
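Three of the settings above are simple functions of the data, so you may want to compute them yourself before a run. The following Python sketch rests on three assumptions. Sparse mode uses integer feature IDs. Initial Prediction is the unweighted rate of positive labels. Maximum Number of Distinct Categories keeps the most frequent categories, with ties broken by first occurrence. The system's own estimation may differ, and the function names and the `__OTHER__` bucket label are hypothetical.

```python
from collections import Counter


def number_of_features(sparse_rows):
    """Number of features = max feature ID + 1 across all sparse key:value rows."""
    max_id = -1
    for row in sparse_rows:
        for token in row.split():
            max_id = max(max_id, int(token.split(":", 1)[0]))
    return max_id + 1


def initial_prediction(labels):
    """Prior probability of positive samples, assuming unweighted 0/1 labels."""
    return sum(labels) / len(labels)


def cap_categories(values, max_categories, other="__OTHER__"):
    """Keep the max_categories most frequent values; merge the rest into one bucket."""
    keep = {value for value, _ in Counter(values).most_common(max_categories)}
    return [value if value in keep else other for value in values]
```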
Tuning
These parameters control compute resources and do not affect model accuracy.
| Parameter | Default | Description |
|---|---|---|
| Choose Running Mode | MaxCompute | Running environment. Valid values: MaxCompute, Flink. |
| Number of Instances | Auto-calculated | Number of compute instances. Adjust from the auto-generated value if jobs fail or are slow. |
| Memory Per Instance | Auto-calculated | Memory per instance, in MB. Adjust from the auto-generated value if jobs run out of memory. |
| Num of Threads | 1 | Threads per instance. Multi-threading increases resource use; performance gains are non-linear and may decrease if you exceed the optimal thread count. |
Output ports
| Port | Data type | Content | Recommended downstream component |
|---|---|---|---|
| Output Model | MaxCompute table | Trained GBDT model, ready for prediction or online deployment | GBDT Binary Classification Prediction V2 |
| Output Feature Importance | MaxCompute table | Feature importance scores using the gain metric by default. View directly; cannot connect to PAI command-based components such as GBDT Feature Importance V2. | — |
The area under curve (AUC) is the default evaluation metric. After the job completes, view AUC metrics in the worker log.
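To cross-check the AUC from the worker log against scores produced by GBDT Binary Classification Prediction V2, you can compute it offline. A minimal Python sketch using the rank-based (Mann-Whitney U) formulation, in which tied scores receive averaged ranks; the `auc` function is hypothetical and not part of PAI:

```python
def auc(labels, scores):
    """ROC AUC from 0/1 labels and predicted scores via the Mann-Whitney U statistic."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of equal scores starting at i and give it the average rank.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both positive and negative samples")
    positive_rank_sum = sum(rank for rank, (_, label) in zip(ranks, pairs) if label == 1)
    return (positive_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For example, labels `[0, 0, 1, 1]` with scores `[0.1, 0.4, 0.35, 0.8]` give an AUC of 0.75, because three of the four positive/negative pairs are ranked correctly.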
Migrate from PS-SMART Binary Classification Training
If you previously used PS-SMART Binary Classification Training, use the following table to map parameters to their GBDT Binary Classification V2 equivalents.
| PS-SMART parameter | GBDT V2 equivalent | Notes |
|---|---|---|
| Use Sparse Format | Use Sparse Vector Format | — |
| Feature Columns | Select Feature Columns | — |
| Label Column | Select Label Column | — |
| Weight Column | Select Weight Column | — |
| Evaluation Indicator Type | Not supported | AUC is used by default. View metrics in the worker log. |
| Trees | Number of Trees | — |
| Maximum Tree Depth | Maximum Number of Leaf Nodes | Maximum Number of Leaf Nodes = 2 ^ (Maximum Tree Depth - 1) |
| Data Sampling Fraction | Ratio of Samples | — |
| Feature Sampling Fraction | Ratio of Features | — |
| L1 Penalty Coefficient | Not supported | — |
| L2 Penalty Coefficient | Not supported | — |
| Learning Rate | Learning Rate | — |
| Sketch-based Approximate Precision | Maximum Number of Bins | Maximum Number of Bins = 1 / Sketch-based Approximate Precision |
| Minimum Split Loss Change | Minimum Number of Samples in a Leaf Node | Cannot be converted directly. Both parameters help prevent overfitting. |
| Features | Number of features | For sparse vector format only. |
| Global Offset | Initial Prediction | Leave blank to let the system estimate from the data. |
| Random Seed | Random Seed | — |
| Feature Importance Type | Not supported | Defaults to gain. |
| Cores | Number of Instances | Values are not equivalent. Start from the system-generated value and adjust. |
| Memory Size per Core | Memory Per Instance | Values are not equivalent. Start from the system-generated value and adjust. |
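The direct renames and the two formula-based mappings in the table can be applied mechanically. The following Python sketch converts a dictionary of PS-SMART settings keyed by their UI labels. It rounds 1 / Sketch-based Approximate Precision to the nearest integer, which is an assumption, because Maximum Number of Bins must be a positive integer. Parameters it cannot convert are returned as a list so that you can handle them by hand. The names are hypothetical.

```python
# Direct one-to-one renames from the migration table.
RENAMED = {
    "Trees": "Number of Trees",
    "Data Sampling Fraction": "Ratio of Samples",
    "Feature Sampling Fraction": "Ratio of Features",
    "Learning Rate": "Learning Rate",
    "Random Seed": "Random Seed",
}


def migrate_ps_smart_params(ps_params):
    """Return (converted V2 parameters, names that need manual handling)."""
    v2_params, unconverted = {}, []
    for name, value in ps_params.items():
        if name in RENAMED:
            v2_params[RENAMED[name]] = value
        elif name == "Maximum Tree Depth":
            # Maximum Number of Leaf Nodes = 2 ^ (Maximum Tree Depth - 1)
            v2_params["Maximum Number of Leaf Nodes"] = 2 ** (int(value) - 1)
        elif name == "Sketch-based Approximate Precision":
            # Maximum Number of Bins = 1 / Sketch-based Approximate Precision (rounded)
            v2_params["Maximum Number of Bins"] = round(1 / value)
        else:
            unconverted.append(name)  # unsupported, or not directly convertible
    return v2_params, unconverted
```

Unconverted entries such as Minimum Split Loss Change, Cores, and Memory Size per Core correspond to the rows above whose values are not equivalent, so start from the system-generated values for them.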
What's next
GBDT Binary Classification Prediction V2 — score new data using the trained model
Deploy a pipeline as an online service — serve the model for real-time inference
PS-SMART Binary Classification Training — the predecessor component, for reference