Model Configuration


To predict repurchases, you must first configure a model. You can use the model for prediction only after it has been trained successfully. After training, you can view the top 10 most important features and use model validation to understand the model's expected accuracy and recall rates.

Prerequisites

The algorithm engine trains the model on a behavioral dataset, so the quality of the model depends on the training data. Higher data quality and a larger data volume result in better model performance.

  1. The data requirements for the behavioral dataset differ from those of a general dataset. You must prepare the data as described in Behavioral dataset example.

  2. The data must be stored in an ADB 3.0 data source, and the data source must be connected to Quick Audience. For more information, see Create a data source or Grant permissions on a data source table.

  3. You must create a dataset from the prepared data. For more information, see Create a behavioral dataset.

Create an algorithm model

Note
  • Only one model can be in a running state in a workspace at a time. If a model with the Training Succeeded status already exists in the workspace, you must unpublish it before you can create a new one.

  • The prompt above the list shows the number of used model tasks out of the total number of purchased model tasks. This number is the sum for all workspaces in the organization. Creating and updating models consume available model tasks. Failed executions are not counted.

Procedure:

  1. Go to Workspace > User Insights > Model Hub > Repurchase Prediction > Model Configuration.

    image

  2. In the upper-right corner, click Create Model. The configuration page appears.

    image

  3. Select the order details table as the training data.

  4. Select the mapped value for "Order Purchase Time" in the behavior type field of the order details table.

  5. Select the mapped value for positive behaviors such as "Purchase" in the behavior object attribute field of the order details table.

  6. Enter the repurchase period in days. The value must be an integer from 15 to 90. This configures the model to predict user repurchases within the next N days.

    Note

    "Next N days" refers to the N days following the most recent behavior time in the behavioral dataset.

  7. Select the checkbox to confirm that creating the task consumes one available model task. Click Save and Execute. The model training starts.

    Clicking Save saves the configuration without starting the training.
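The repurchase period rule in step 6 can be illustrated with a minimal Python sketch. The function name `prediction_window` is hypothetical, and the sketch assumes that the "next N days" window starts on the day after the most recent behavior time; the product may count the window differently.

```python
from datetime import date, timedelta

MIN_PERIOD_DAYS = 15  # lower bound stated in step 6
MAX_PERIOD_DAYS = 90  # upper bound stated in step 6


def prediction_window(latest_behavior_date: date, period_days: int) -> tuple[date, date]:
    """Return the first and last day of the "next N days" window.

    Assumption: the window starts the day after the most recent
    behavior time in the behavioral dataset.
    """
    if isinstance(period_days, bool) or not isinstance(period_days, int):
        raise TypeError("period_days must be an integer")
    if not MIN_PERIOD_DAYS <= period_days <= MAX_PERIOD_DAYS:
        raise ValueError(
            f"period_days must be from {MIN_PERIOD_DAYS} to {MAX_PERIOD_DAYS}"
        )
    start = latest_behavior_date + timedelta(days=1)
    end = latest_behavior_date + timedelta(days=period_days)
    return start, end


# Example: most recent behavior on 2024-03-01, repurchase period of 30 days.
print(prediction_window(date(2024, 3, 1), 30))
```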

Manage algorithm models

The algorithm model list appears.

image

  • Only one model can be in a running state in a workspace. You must unpublish the existing model before creating a new one.

The running states are Not Started, Pending Training, Training, and Training Succeeded. A model can have one of the following statuses:

  • Not Started: The model is saved but has not started training.

  • Pending Training: If more than five model training and audience prediction tasks are running in the organization, additional models are queued for training.

  • Training: The model is being trained.

  • Training Succeeded: After the model is successfully trained, audience prediction tasks in this workspace use this model by default.

  • Training Failed: The model training fails if it runs for 24 hours without producing a result, is manually stopped, or encounters other issues. You can hover over the icon to view the reason for the failure.

  • Unpublished

You can perform operations on the model, such as editing, viewing training details, manually updating, stopping training, and unpublishing.

Edit

For models with a status of Not Started or Training Failed, you can click the edit icon to modify the model configuration. The configuration process is the same as when you create a model.

View training details

For models with a status of Training Succeeded, you can click the training details icon to view detailed training information. For more information, see View training details.

Manually update

For models that are not in the Pending Training or Training state, you can click the update icon to retrain the model. A new model is generated to replace the original one.

Note
  • To ensure prediction accuracy, you should update the model when the volume of training data changes significantly. When the system detects that the data volume of the behavioral dataset has increased by 20%, an icon appears next to the dataset name to prompt you to update the model.

  • Before you start retraining the model, a dialog box appears. It informs you that if the model training is successful, one available model task will be consumed, and the original model will be unpublished after the training starts. Click Confirm to start the training.
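The 20% update prompt described in the note above can be sketched as a simple growth check. This is only an illustration: the function name `needs_model_update` is hypothetical, and the sketch assumes that growth is measured against the data volume at the time of the last training and that exactly 20% already triggers the prompt.

```python
def needs_model_update(rows_at_training: int, rows_now: int,
                       threshold: float = 0.20) -> bool:
    """Return True if the dataset grew by at least `threshold` since training.

    Assumption: the baseline is the row count at the last training run.
    """
    if rows_at_training <= 0:
        raise ValueError("rows_at_training must be positive")
    growth = (rows_now - rows_at_training) / rows_at_training
    return growth >= threshold


print(needs_model_update(1_000_000, 1_100_000))  # 10% growth
print(needs_model_update(1_000_000, 1_500_000))  # 50% growth
```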

Unpublish

For models that are not in the Pending Training or Training state, you can click the more icon > Unpublish to unpublish the model.

Note

After a model is unpublished, its data is deleted immediately if it is not associated with any prediction tasks.
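The status rules in this section can be summarized in a small Python sketch. The names `ModelStatus`, `allowed_operations`, and `can_create_model` are hypothetical and are not part of any Quick Audience API. The stop-training operation is omitted because this section does not state which statuses allow it.

```python
from enum import Enum


class ModelStatus(Enum):
    NOT_STARTED = "Not Started"
    PENDING_TRAINING = "Pending Training"
    TRAINING = "Training"
    TRAINING_SUCCEEDED = "Training Succeeded"
    TRAINING_FAILED = "Training Failed"
    UNPUBLISHED = "Unpublished"


# Statuses that count as "running" for the one-model-per-workspace rule.
RUNNING_STATES = {
    ModelStatus.NOT_STARTED,
    ModelStatus.PENDING_TRAINING,
    ModelStatus.TRAINING,
    ModelStatus.TRAINING_SUCCEEDED,
}

# Statuses in which manual update and unpublish are not available.
IN_PROGRESS = {ModelStatus.PENDING_TRAINING, ModelStatus.TRAINING}


def allowed_operations(status: ModelStatus) -> set[str]:
    """Return the operations this section describes for a given status."""
    ops: set[str] = set()
    if status in {ModelStatus.NOT_STARTED, ModelStatus.TRAINING_FAILED}:
        ops.add("edit")
    if status is ModelStatus.TRAINING_SUCCEEDED:
        ops.add("view_training_details")
    if status not in IN_PROGRESS:
        ops.update({"manual_update", "unpublish"})
    return ops


def can_create_model(existing_statuses: list[ModelStatus]) -> bool:
    """A new model can be created only if no model in the workspace is running."""
    return not any(s in RUNNING_STATES for s in existing_statuses)


print(sorted(allowed_operations(ModelStatus.TRAINING_SUCCEEDED)))
print(can_create_model([ModelStatus.TRAINING_FAILED, ModelStatus.UNPUBLISHED]))
```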

View training details

For a successfully trained model, you can click Training Details to go to the details page. On this page, you can view model information, the top 10 training features, and model validation results.

image

Top 10 training features

Reviewing the top 10 training features helps you identify the most significant behavioral features of the audience in the prediction results.

The top 10 training features are the 10 user metrics with the highest importance in the algorithm model. For information about how to compare the top 10 training features between the predicted audience and a random audience, see Model validation.

The top 10 training features are shown below.

image

All training label features are derived from the original behavioral data during model training. Their meanings are described in the following table.

| Training label feature | Meaning |
| --- | --- |
| Historical average purchase amount | Total purchase amount of the user / Number of purchases by the user |
| Historical maximum purchase amount | Maximum purchase amount of the user |
| Historical minimum purchase amount | Minimum purchase amount of the user |
| Historical total purchase amount | Total purchase amount of the user |
| Historical number of purchases | Number of purchases by the user |
| Number of purchases in the last 7 days | Number of user purchases in the last 7 days |
| Number of purchases in the last 30 days | Number of user purchases in the last 30 days |
| Number of purchases in the last 90 days | Number of user purchases in the last 90 days |
| Days since first purchase | Number of days from the user's first purchase to today |
| Days since last purchase | Number of days from the user's last purchase to today |
| Historical purchase days | The number of days on which the user made a purchase. Multiple purchases on the same day are counted as one. |
| Number of behavior channels | Number of channels for purchase behaviors |
| Average purchase interval | (Time interval between the last and first purchase) / (Number of purchases - 1) |
| Repurchase ratio | Average purchase interval / Days since last purchase |
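As an illustration of the definitions in the table, the following Python sketch computes the features for a single user from a list of purchase records. It is not the algorithm engine's implementation. The `Purchase` record, the function name, and the dictionary keys are hypothetical, and the sketch assumes that "last N days" means a purchase made fewer than N days before today.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Purchase:
    day: date       # date of the purchase
    amount: float   # purchase amount
    channel: str    # channel on which the purchase happened


def purchase_features(purchases: list[Purchase], today: date) -> dict:
    """Compute the table's features for one user (illustrative only)."""
    if not purchases:
        raise ValueError("at least one purchase is required")

    amounts = [p.amount for p in purchases]
    days = sorted(p.day for p in purchases)
    count = len(purchases)
    first, last = days[0], days[-1]
    days_since_last = (today - last).days

    def purchases_in_last(n_days: int) -> int:
        # Assumption: "last N days" means 0 <= age in days < N.
        return sum(1 for d in days if 0 <= (today - d).days < n_days)

    # The interval formula divides by (count - 1), so it is undefined
    # for a single purchase; None marks that case in this sketch.
    avg_interval = (last - first).days / (count - 1) if count > 1 else None
    ratio = (
        avg_interval / days_since_last
        if avg_interval is not None and days_since_last > 0
        else None
    )
    return {
        "historical_average_purchase_amount": sum(amounts) / count,
        "historical_maximum_purchase_amount": max(amounts),
        "historical_minimum_purchase_amount": min(amounts),
        "historical_total_purchase_amount": sum(amounts),
        "historical_number_of_purchases": count,
        "purchases_last_7_days": purchases_in_last(7),
        "purchases_last_30_days": purchases_in_last(30),
        "purchases_last_90_days": purchases_in_last(90),
        "days_since_first_purchase": (today - first).days,
        "days_since_last_purchase": days_since_last,
        "historical_purchase_days": len(set(days)),
        "number_of_behavior_channels": len({p.channel for p in purchases}),
        "average_purchase_interval": avg_interval,
        "repurchase_ratio": ratio,
    }


history = [
    Purchase(date(2024, 1, 1), 100.0, "app"),
    Purchase(date(2024, 1, 11), 50.0, "web"),
    Purchase(date(2024, 1, 21), 150.0, "app"),
]
features = purchase_features(history, today=date(2024, 1, 31))
for name, value in features.items():
    print(f"{name}: {value}")
```

In this example the user bought three times at 10-day intervals and last bought 10 days ago, so the average purchase interval is 10 days and the repurchase ratio is 1.0.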

Model validation

Model validation helps you understand the expected prediction performance based on accuracy and recall rates. This information helps you select an appropriate number of predicted users in subsequent audience prediction tasks to achieve better performance.

Model validation compares the accuracy and recall rates of a random audience and a high-potential validation audience of the same size. It also compares the value distribution of their top 10 training features.

  1. First, the system selects a random audience and a high-potential validation audience of the same size:

    • High-potential validation audience: The model is used to predict repurchases for a portion of the historical audience. The top N% of this audience by predicted purchase probability, M people in total, is used as the high-potential validation audience.

      N is set to several values, such as 5%, 25%, and 50%, and M varies accordingly. This corresponds to selecting different numbers of predicted users in the audience prediction task results.

    • Random audience: M people are randomly selected from the historical audience to serve as a control group. Its size is equal to that of the high-potential validation audience.

  2. Then, the system calculates the accuracy and recall rates for the high-potential validation audience and the random audience based on their purchases within the repurchase period. These rates serve as quantitative indicators of prediction success:

    • Accuracy: Number of purchasers in the predicted audience (high-potential validation audience or random audience) / Total number of people in the predicted audience

    • Recall rate: Number of purchasers in the predicted audience (high-potential validation audience or random audience) / Total number of purchasers in the entire historical audience

    The comparison of accuracy and recall rates between the random audience and the high-potential validation audience is shown below.

    image

    In the results:

    • The accuracy and recall rates of the high-potential validation audience are generally higher than those of a random audience of the same size. This indicates that the algorithm model successfully identified the high-potential audience.

    • A smaller high-potential validation audience generally has a higher accuracy than a larger one. This is because only a portion of the historical audience has prominent training features, while the training feature data for the rest of the audience shows little variation. The recall rate moves in the opposite direction: by the definition above, it can only stay the same or rise as the audience grows, because a larger audience covers more of the purchasers.

    • The accuracy of the random audience generally does not fluctuate significantly with size, while its recall rate grows roughly in proportion to its size. This is a result of selecting the audience at random.

    Therefore, to achieve higher accuracy in subsequent audience prediction tasks, you should select a smaller number of users with a high purchase probability. To reach more of the likely purchasers, that is, to achieve a higher recall rate, you should select a larger number. In either case, you can determine the number by referring to the accuracy and recall rates in the model validation results. For more information about the specific method, see Audience prediction result details.

  3. Finally, the system compares the value distribution of the top 10 training features for the random audience and the high-potential validation audience.

    As shown in the figure, after you select a label (training feature), a comparison chart appears. The statistical period for the data is the previous year. By default, the high-potential validation audience that is displayed consists of the top 25% of users with the highest purchase probability.

    image
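The accuracy and recall definitions from step 2 can be checked with a small Python sketch on toy data. The function names, user IDs, scores, and purchaser set are all made up for illustration; this is not how Quick Audience computes its validation results internally.

```python
import random


def accuracy_and_recall(audience: list[str], purchasers: set[str]) -> tuple[float, float]:
    """Apply the two formulas from step 2 to one audience."""
    if not audience or not purchasers:
        raise ValueError("audience and purchasers must be non-empty")
    hits = len(set(audience) & purchasers)
    accuracy = hits / len(audience)   # purchasers in audience / audience size
    recall = hits / len(purchasers)   # purchasers in audience / all purchasers
    return accuracy, recall


def top_fraction(scores: dict[str, float], fraction: float) -> list[str]:
    """Return the top `fraction` of users by predicted purchase probability."""
    m = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:m]


# Toy historical audience of 20 users; u0 has the highest predicted probability.
scores = {f"u{i}": 1 - i / 20 for i in range(20)}
# Users who actually purchased within the repurchase period (made-up data).
purchasers = {"u0", "u1", "u2", "u4", "u9"}

rng = random.Random(0)  # fixed seed so the random audience is reproducible
for fraction in (0.05, 0.25, 0.50):
    high_potential = top_fraction(scores, fraction)
    random_audience = rng.sample(sorted(scores), len(high_potential))
    print(
        f"top {fraction:.0%}:",
        "high-potential", accuracy_and_recall(high_potential, purchasers),
        "random", accuracy_and_recall(random_audience, purchasers),
    )
```

With this toy data, the high-potential audience scores an accuracy of 1.0 and a recall rate of 0.2 at the top 5%, 0.8 and 0.8 at the top 25%, and 0.5 and 1.0 at the top 50%.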