To predict repurchases, you must first configure a model. You can use the model for prediction only after it runs successfully. After the model is trained, you can view the top 10 most important features and use model validation to understand the model's expected accuracy and recall rates.
Prerequisites
The algorithm model requires a behavioral dataset for training. The algorithm engine uses this data to generate a model. The quality of the model depends on the training data. Higher data quality and a larger data volume result in better model performance.
The data requirements for the behavioral dataset differ from those of a general dataset. You must prepare the data as described in Behavioral dataset example.
The data must be stored in an ADB 3.0 data source, and the data source must be connected to Quick Audience. For more information, see Create a data source or Grant permissions on a data source table.
You must create a dataset from the prepared data. For more information, see Create a behavioral dataset.
Create an algorithm model
Only one model can be in a running state in a workspace at a time. If a model with the Training Succeeded status already exists in the workspace, you must unpublish it before you can create a new one.
The prompt above the list shows the number of used model tasks out of the total number of purchased model tasks. This number is the sum for all workspaces in the organization. Creating and updating models consume available model tasks. Failed executions are not counted.
Procedure:
Go to Workspace > User Insights > Model Hub > Repurchase Prediction > Model Configuration.

In the upper-right corner, click Create Model. The configuration page appears.

Select the order details table as the training data.
Select the mapped value for "Order Purchase Time" in the behavior type field of the order details table.
Select the mapped value for positive behaviors such as "Purchase" in the behavior object attribute field of the order details table.
Enter the repurchase period in days. The value must be an integer from 15 to 90. This configures the model to predict user repurchases within the next N days.
Note: "Next N days" refers to the N days following the most recent behavior time in the behavioral dataset.
Select the checkbox to confirm that creating the task consumes one available model task. Click Save and Execute. The model training starts.
Clicking Save only saves the configuration.
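The repurchase period rule and the note on "next N days" can be illustrated with a short sketch. This is an illustration only, not Quick Audience code: the function name `prediction_window` is hypothetical, and starting the window on the day after the most recent behavior time is an assumption.

```python
from datetime import date, timedelta


def prediction_window(latest_behavior: date, period_days: int) -> tuple[date, date]:
    """Return the (first_day, last_day) covered by the "next N days".

    Hypothetical helper. Assumption: the window starts on the day after
    the most recent behavior time in the behavioral dataset.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(period_days, bool) or not isinstance(period_days, int):
        raise TypeError("repurchase period must be an integer number of days")
    # The documented range for the repurchase period is 15 to 90 days.
    if not 15 <= period_days <= 90:
        raise ValueError("repurchase period must be from 15 to 90 days")
    first_day = latest_behavior + timedelta(days=1)
    last_day = latest_behavior + timedelta(days=period_days)
    return first_day, last_day
```

For example, under these assumptions, `prediction_window(date(2024, 3, 1), 30)` covers March 2, 2024 through March 31, 2024.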
Manage algorithm models
The algorithm model list appears.

Only one model can be in a running state in a workspace. You must unpublish the existing model before creating a new one.
Running states include the following: Not Started, Pending Training, Training, and Training Succeeded.
Not Started: The model is saved but has not started training.
Pending Training: If more than five model training and audience prediction tasks are running in the organization, additional models are queued for training.
Training: The model is being trained.
Training Succeeded: After the model is successfully trained, audience prediction tasks in this workspace use this model by default.
Training Failed: The model training fails if it runs for 24 hours without producing a result, is manually stopped, or encounters other issues. You can hover over the icon to view the reason for the failure.
Unpublished
You can perform operations on the model, such as editing, viewing training details, manually updating, stopping training, and unpublishing.
Edit
For models with a status of Not Started or Training Failed, you can click the icon to modify the model configuration. The configuration process is the same as when you create a model.
View training details
For models with a status of Training Succeeded, you can click the icon to view detailed training information. For more information, see View training details.
Manually update
For models that are not in the Pending Training or Training state, you can click the icon to retrain the model. A new model is generated to replace the original one.
To ensure prediction accuracy, you should update the model when the volume of training data changes significantly. When the system detects that the data volume of the behavioral dataset has increased by 20%, an icon appears next to the dataset name to prompt you to update the model.
Before you start retraining the model, a dialog box appears. It informs you that if the model training is successful, one available model task will be consumed, and the original model will be unpublished after the training starts. Click Confirm to start the training.
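The 20% growth rule can be sketched as a simple check. The function name `should_prompt_update` is hypothetical, and two details are assumptions because the text does not specify them: the baseline is taken to be the row count at the last training, and "increased by 20%" is read as "at least 20%".

```python
def should_prompt_update(rows_at_training: int, rows_now: int,
                         threshold_percent: int = 20) -> bool:
    """Return True when the dataset grew by at least `threshold_percent`.

    Hypothetical helper. Assumption: growth is measured against the row
    count of the behavioral dataset when the model was last trained.
    """
    if rows_at_training <= 0:
        raise ValueError("rows_at_training must be positive")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    growth_times_100 = (rows_now - rows_at_training) * 100
    return growth_times_100 >= threshold_percent * rows_at_training
```

With these assumptions, a dataset that grows from 1,000,000 to 1,200,000 rows triggers the prompt, while growth to 1,199,999 rows does not.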
Unpublish
For models that are not in the Pending Training or Training state, you can click the icon and then select Unpublish to unpublish the model.
After a model is unpublished, its data is deleted immediately if it is not associated with any prediction tasks.
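The status rules described in this section can be summarized in a small sketch. It is a literal reading of the text rather than the console's actual logic: the names `Status`, `allowed_operations`, and `can_create_model` are hypothetical, the stop-training operation is omitted because its condition is not stated, and the console may apply further restrictions (for example, for Unpublished models).

```python
from enum import Enum


class Status(Enum):
    NOT_STARTED = "Not Started"
    PENDING_TRAINING = "Pending Training"
    TRAINING = "Training"
    TRAINING_SUCCEEDED = "Training Succeeded"
    TRAINING_FAILED = "Training Failed"
    UNPUBLISHED = "Unpublished"


# Running states as listed in the text.
RUNNING_STATES = {
    Status.NOT_STARTED,
    Status.PENDING_TRAINING,
    Status.TRAINING,
    Status.TRAINING_SUCCEEDED,
}
IN_PROGRESS = {Status.PENDING_TRAINING, Status.TRAINING}


def allowed_operations(status: Status) -> set[str]:
    """Operations the text permits for a model in the given status."""
    ops: set[str] = set()
    if status in {Status.NOT_STARTED, Status.TRAINING_FAILED}:
        ops.add("edit")
    if status is Status.TRAINING_SUCCEEDED:
        ops.add("view_training_details")
    if status not in IN_PROGRESS:
        ops.update({"manually_update", "unpublish"})
    return ops


def can_create_model(existing_statuses: list[Status]) -> bool:
    """Only one model per workspace may be in a running state."""
    return not any(status in RUNNING_STATES for status in existing_statuses)
```

For instance, `can_create_model([Status.TRAINING_SUCCEEDED])` is `False` under this reading, which matches the requirement to unpublish the existing model first.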
View training details
For a successfully trained model, you can click Training Details to go to the details page. On this page, you can view model information, the top 10 training features, and model validation results.

Top 10 training features
The top 10 training features show which behavioral features of the audience carry the most weight in the prediction results.
The top 10 training features are the 10 user metrics with the highest importance in the algorithm model. For information about how to compare the top 10 training features between the predicted audience and a random audience, see Model validation.
The top 10 training features are shown below.

All training label features are derived from the original behavioral data during model training. The following table describes their meanings.
Training label feature | Meaning |
Historical average purchase amount | Total purchase amount of the user / Number of purchases by the user |
Historical maximum purchase amount | Maximum purchase amount of the user |
Historical minimum purchase amount | Minimum purchase amount of the user |
Historical total purchase amount | Total purchase amount of the user |
Historical number of purchases | Number of purchases by the user |
Number of purchases in the last 7 days | Number of user purchases in the last 7 days |
Number of purchases in the last 30 days | Number of user purchases in the last 30 days |
Number of purchases in the last 90 days | Number of user purchases in the last 90 days |
Days since first purchase | Number of days from the user's first purchase to today |
Days since last purchase | Number of days from the user's last purchase to today |
Historical purchase days | The number of days on which the user made a purchase. Multiple purchases on the same day are counted as one. |
Number of behavior channels | Number of channels for purchase behaviors |
Average purchase interval | (Time interval between the user's first and last purchase) / (Number of purchases - 1) |
Repurchase ratio | Average purchase interval / Days since last purchase |
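Several of the formulas in the table can be worked through in a short sketch. This is not the engine's implementation: the function name `purchase_features`, the input shape (a list of date and amount pairs for one user), the use of whole days as the time unit, and returning `None` where a formula would divide by zero are all assumptions.

```python
from datetime import date
from typing import Optional


def purchase_features(purchases: list[tuple[date, float]],
                      today: date) -> dict[str, Optional[float]]:
    """Compute a subset of the table's features for one user.

    `purchases` is a hypothetical list of (purchase_date, amount) pairs.
    """
    if not purchases:
        raise ValueError("at least one purchase is required")
    dates = sorted(day for day, _ in purchases)
    amounts = [amount for _, amount in purchases]
    count = len(purchases)
    days_since_last = (today - dates[-1]).days

    # The interval formula divides by (count - 1), so it is undefined for
    # a single purchase; returning None is an assumption of this sketch.
    if count > 1:
        avg_interval = (dates[-1] - dates[0]).days / (count - 1)
    else:
        avg_interval = None

    # The ratio is also left as None when the last purchase was today.
    if avg_interval is None or days_since_last == 0:
        repurchase_ratio = None
    else:
        repurchase_ratio = avg_interval / days_since_last

    return {
        "historical_average_purchase_amount": sum(amounts) / count,
        "historical_maximum_purchase_amount": max(amounts),
        "historical_minimum_purchase_amount": min(amounts),
        "historical_total_purchase_amount": sum(amounts),
        "historical_number_of_purchases": count,
        "days_since_first_purchase": (today - dates[0]).days,
        "days_since_last_purchase": days_since_last,
        # Multiple purchases on the same day count as one day.
        "historical_purchase_days": len(set(dates)),
        "average_purchase_interval": avg_interval,
        "repurchase_ratio": repurchase_ratio,
    }
```

As a worked example, take a user with four purchases (January 1, two on January 11, and January 31, 2024) evaluated on February 10, 2024. The time between the first and last purchase is 30 days, so the average purchase interval is 30 / (4 - 1) = 10 days. The last purchase was 10 days ago, so the repurchase ratio is 10 / 10 = 1.0, and the historical purchase days value is 3.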
Model validation
Model validation helps you understand the expected prediction performance based on accuracy and recall rates. This information helps you select an appropriate number of predicted users in subsequent audience prediction tasks to achieve better performance.
Model validation compares the accuracy and recall rates of a random audience and a high-potential validation audience of the same size. It also compares the value distribution of their top 10 training features.
First, the system selects a random audience and a high-potential validation audience of the same size:
High-potential validation audience: The model is used to predict repurchases for a portion of the historical audience. The top N% of this audience by predicted purchase probability, M people in total, form the high-potential validation audience.
N% is set to several values, such as 5%, 25%, and 50%, and the total number of people, M, varies accordingly. This process corresponds to selecting different numbers of predicted users in the audience prediction task results.
Random audience: M people are randomly selected from the historical audience to serve as a control group. Its size is equal to that of the high-potential validation audience.
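The selection of the two audiences can be sketched as follows. This is a minimal illustration, not the system's procedure: the function name `select_audiences`, the input (a mapping from user ID to predicted purchase probability), rounding M up, and the fixed random seed are assumptions.

```python
import math
import random


def select_audiences(scores: dict[str, float], top_percent: float,
                     seed: int = 0) -> tuple[list[str], list[str]]:
    """Return (high_potential, random_audience), both of size M.

    `scores` is a hypothetical mapping of user ID to predicted purchase
    probability for the historical audience used in validation.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    # Assumption: M is the top N% of the audience, rounded up.
    m = max(1, math.ceil(len(scores) * top_percent / 100))
    # Highest predicted probability first.
    ranked = sorted(scores, key=lambda user: scores[user], reverse=True)
    high_potential = ranked[:m]
    # Control group of the same size, drawn without replacement.
    random_audience = random.Random(seed).sample(sorted(scores), m)
    return high_potential, random_audience
```

Calling the function with `top_percent` set to 5, 25, and 50 in turn yields audience pairs of increasing size M, mirroring the different N% settings described above.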
Then, the system calculates the accuracy and recall rates for the high-potential validation audience and the random audience based on their purchases within the repurchase period. These rates serve as quantitative indicators of prediction success:
Accuracy: Number of purchasers in the predicted audience (high-potential validation audience or random audience) / Total number of people in the predicted audience
Recall rate: Number of purchasers in the predicted audience (high-potential validation audience or random audience) / Total number of purchasers in the entire historical audience
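The two formulas above translate directly into a short sketch. The function name `accuracy_and_recall` and the set-based inputs are assumptions for illustration. Note that "accuracy" as defined here (purchasers in the predicted audience divided by the audience size) is the quantity often called precision elsewhere.

```python
def accuracy_and_recall(audience: set[str],
                        purchasers: set[str]) -> tuple[float, float]:
    """Return (accuracy, recall) for one predicted audience.

    `audience` is the high-potential validation audience or the random
    audience; `purchasers` is everyone in the entire historical audience
    who purchased within the repurchase period. Both are hypothetical
    sets of user IDs.
    """
    if not audience or not purchasers:
        raise ValueError("audience and purchasers must not be empty")
    hits = len(audience & purchasers)  # purchasers inside the audience
    accuracy = hits / len(audience)
    recall = hits / len(purchasers)
    return accuracy, recall
```

For example, if four users in the historical audience purchased (`u1` to `u4`), a high-potential audience of `{"u1", "u2", "u5"}` has an accuracy of 2/3 and a recall rate of 2/4, while a random audience of `{"u3", "u8", "u9"}` has an accuracy of 1/3 and a recall rate of 1/4.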
The comparison of accuracy and recall rates between the random audience and the high-potential validation audience is shown below.

In the results:
The accuracy and recall rates of the high-potential validation audience are generally higher than those of a random audience of the same size. This indicates that the algorithm model successfully predicted the high-potential audience.
A smaller high-potential validation audience generally has higher accuracy than a larger one. This is because only a portion of the historical audience has prominent training features, while the training feature data for the rest of the audience shows little variation. Its recall rate is lower, however: by the definition above, the recall rate can only grow as more users are included, because the total number of purchasers in the entire historical audience is fixed.
The accuracy of the random audience generally does not fluctuate significantly with size, while its recall rate grows roughly in proportion to its size. This is a result of selecting a random audience.
Therefore, to achieve higher accuracy in subsequent audience prediction tasks, you should select a smaller number of users with a high purchase probability. When you need to select a larger number of predicted users, for example to reach a larger share of likely purchasers, you can determine the number by referring to the accuracy and recall rates in the model validation results. For more information about the specific method, see Audience prediction result details.
Finally, the system compares the value distribution of the top 10 training features for the random audience and the high-potential validation audience.
As shown in the figure, after you select a label (training feature), a comparison chart appears. The statistical period for the data is the previous year. By default, the high-potential validation audience that is displayed consists of the top 25% of users with the highest purchase probability.


