Best practices for PAI-Rec modular algorithm customization

This guide uses a public dataset to walk you through configuring feature engineering, recall, and fine-grained ranking in PAI-Rec, then deploying the generated code to DataWorks. The modular approach lets you publish or unpublish individual functions without affecting the rest of the pipeline.

Why modularize recommendation solution customization?

  • Feature groups simplify model iteration.

  • One-click publishing deploys code and starts data backfill automatically.

  • Only the ranking module fully backfills all required components (feature groups, samples, and feature generation modules), reducing rework from feature engineering errors.

Prerequisites

Complete the following preparations before you begin:

1. Create a PAI-Rec instance and initialize the service

  1. Log in to the Recommendation System Development Platform homepage and click Buy Now.

  2. On the PAI-Rec instance purchase page, configure the following key parameters, and then click Buy Now.

    | Parameter | Description |
    | --- | --- |
    | Region and zone | The region where your cloud service is deployed. |
    | Service Type | Select Standard Edition and enable Recommendation Solution Customization. |

  3. Log in to the PAI-Rec Management Console. In the upper-left corner of the top menu bar, select a region.

  4. In the left navigation pane, choose Instances, and then click the instance name to open the instance details page.

  5. In the Procedure section, click Cloud Service Configuration to open the System Configuration > Cloud Service Configuration page. Click Edit, configure the parameters as shown in the Resource configuration table, and then click Exit.

    Resource configuration

    | Parameter | Description |
    | --- | --- |
    | Modeling | |
    | Machine Learning Platform for AI Workspace | Enter the default PAI workspace you created. |
    | DataWorks Workspace | Enter the automatically generated DataWorks workspace. |
    | MaxCompute Workspace | Enter the MaxCompute project you created. |
    | OSS Bucket | Select the OSS bucket you created. |
    | Engine | |
    | Real-time Recall Engine | For the Whether to use PAI-FeatureStore option, select Yes. |
    | Real-time Feature Query | For the Whether to use PAI-FeatureStore option, select Yes. |

  6. In the left navigation pane, choose System Configuration > Permissions. On the Services tab, verify the authorization status of each cloud service.

2. Clone the public dataset

1. Synchronize data table

This solution offers two data input options:

  1. Clone data from the pai_online_project within a fixed time window. This option does not support scheduled tasks.

  2. Use a Python script to generate data. You can run the task in DataWorks to generate data for a specified time range.

To schedule daily data generation and model training, use the second option, which deploys Python code to generate data. See the Code generation tab.

Fixed time window

PAI-Rec prepares three tables in the public project pai_online_project:

  • User table: pai_online_project.rec_sln_demo_user_table

  • Item table: pai_online_project.rec_sln_demo_item_table

  • Behavior table: pai_online_project.rec_sln_demo_behavior_table

The operations in this topic are based on these three tables. The data is randomly generated for simulation with no real-world business meaning, which may result in low metrics (such as AUC) during training. Run SQL commands in DataWorks to synchronize the table data from pai_online_project to your DataWorks project (for example, DataWorks_a):

  1. Log in to the DataWorks console and select a region from the upper-left corner of the top menu bar.

  2. In the left-side navigation pane, click Data Development and O&M > Data Development.

  3. Select the DataWorks workspace that you created and click Enter Data Development.

  4. Hover over Create, and then choose New node > MaxCompute > ODPS SQL. Configure the parameters as described in the following table, and then click Confirm.

    Resource configuration

    | Parameter | Description |
    | --- | --- |
    | Engine instance | Select your bound MaxCompute data source. |
    | Node type | Select the ODPS SQL node type. |
    | Path | Select the path for the current node. For example, Business Flow/Workflow/MaxCompute. |
    | Name | Enter a custom name, such as Data. |

  5. In the new node editor, copy and run the following code to synchronize the user, item, and behavior tables from the pai_online_project project to your MaxCompute project (for example, project_mc). The code copies a 100-day data window that ends on bizdate, which you typically set to the previous day. Before you run the code, go to the Scheduling parameters area, click Add Parameter, and add two parameters: bizdate with the value $[yyyymmdd-1], and bizdate_100 with the value $[yyyymmdd-100].

CREATE TABLE IF NOT EXISTS rec_sln_demo_user_table_v1(
 user_id BIGINT COMMENT 'Unique user ID',
 gender STRING COMMENT 'Gender',
 age BIGINT COMMENT 'Age',
 city STRING COMMENT 'City',
 item_cnt BIGINT COMMENT 'Number of published items',
 follow_cnt BIGINT COMMENT 'Number of followed accounts',
 follower_cnt BIGINT COMMENT 'Number of followers',
 register_time BIGINT COMMENT 'Registration time',
 tags STRING COMMENT 'User tags'
) PARTITIONED BY (ds STRING) STORED AS ALIORC;
INSERT OVERWRITE TABLE rec_sln_demo_user_table_v1 PARTITION(ds)
SELECT *
FROM pai_online_project.rec_sln_demo_user_table
WHERE ds >= "${bizdate_100}" and ds <= "${bizdate}";
CREATE TABLE IF NOT EXISTS rec_sln_demo_item_table_v1(
 item_id BIGINT COMMENT 'Item ID',
 duration DOUBLE COMMENT 'Video duration',
 title STRING COMMENT 'Title',
 category STRING COMMENT 'Primary category',
 author BIGINT COMMENT 'Author',
 click_count BIGINT COMMENT 'Total clicks',
 praise_count BIGINT COMMENT 'Total likes',
 pub_time BIGINT COMMENT 'Publish time'
) PARTITIONED BY (ds STRING) STORED AS ALIORC;
INSERT OVERWRITE TABLE rec_sln_demo_item_table_v1 PARTITION(ds)
SELECT *
FROM pai_online_project.rec_sln_demo_item_table
WHERE ds >= "${bizdate_100}" and ds <= "${bizdate}";
CREATE TABLE IF NOT EXISTS rec_sln_demo_behavior_table_v1(
 request_id STRING COMMENT 'Tracking ID/Request ID',
 user_id STRING COMMENT 'Unique user ID',
 exp_id STRING COMMENT 'Experiment ID',
 page STRING COMMENT 'Page',
 net_type STRING COMMENT 'Network type',
 event_time BIGINT COMMENT 'Event time',
 item_id STRING COMMENT 'Item ID',
 event STRING COMMENT 'Event type',
 playtime DOUBLE COMMENT 'Playback/Reading duration'
) PARTITIONED BY (ds STRING) STORED AS ALIORC;
INSERT OVERWRITE TABLE rec_sln_demo_behavior_table_v1 PARTITION(ds)
SELECT *
FROM pai_online_project.rec_sln_demo_behavior_table
WHERE ds >= "${bizdate_100}" and ds <= "${bizdate}";
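
To make the 100-day window concrete, the following Python sketch reproduces the date arithmetic behind the two scheduling parameters. It is only an illustration: the function name sync_window is hypothetical, and it assumes that $[yyyymmdd-N] resolves to the scheduled run date minus N days.

```python
from datetime import date, timedelta
from typing import Tuple


def sync_window(run_date: date) -> Tuple[str, str]:
    """Return (bizdate_100, bizdate) as yyyymmdd strings for a scheduled run date.

    Assumption: $[yyyymmdd-N] means "scheduled run date minus N days".
    """
    bizdate = run_date - timedelta(days=1)        # mirrors $[yyyymmdd-1]
    bizdate_100 = run_date - timedelta(days=100)  # mirrors $[yyyymmdd-100]
    return bizdate_100.strftime("%Y%m%d"), bizdate.strftime("%Y%m%d")


# Example: a task scheduled for 2024-05-01 copies partitions
# ds >= '20240122' and ds <= '20240430'.
start, end = sync_window(date(2024, 5, 1))
print(start, end)
```

Because both bounds of the WHERE clause are inclusive, this window covers 100 daily partitions.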

Code generation

Data from a fixed time window cannot be used for scheduled tasks. To run tasks on a schedule, deploy a Python script to generate data:

  1. In the DataWorks console, create a PyODPS 3 node. For details, see Create and manage MaxCompute nodes.

  2. Download create_data.py, and then paste the contents of the file into the PyODPS 3 node editor.

  3. On the right side of the editor, click Configure Scheduling and configure the parameters. Then, click the Save and Submit icons in the upper-right corner.

    • Configure scheduling parameters:

      • Note on variable substitution:

        In the scheduling parameters, set the $user_table_name argument to rec_sln_demo_user_table.

        Set the $item_table_name argument to rec_sln_demo_item_table.

        Set the $behavior_table_name argument to rec_sln_demo_behavior_table.

        In addition to the three table name arguments, the script also uses a bizdate scheduling parameter, which is passed as the $bizdate variable.

        Parameter configuration:

        Add a parameter named bizdate and set its value to $[yyyymmdd-1].

    • Configure scheduling dependencies.

  4. Click Operation Center, and select Auto Triggered Task O&M > Recurring Job.

  5. Click Data backfill > Current and Descendant Nodes Retroactively in the Actions column of the target task.

  6. In the Data Backfill panel, set the data timestamp, and then click Submit and Navigate.

    We recommend a 60-day backfill period: set the data timestamp to Scheduled Task Date - 60 to ensure data integrity.

2. Configure dependency nodes

To ensure smooth code generation and deployment, add three SQL code nodes to your DataWorks project with scheduling dependencies set to the root node. After configuring the nodes, publish them:

  1. Hover over Create and choose New node > General-purpose > Virtual node. Create three virtual nodes using the following resource configuration, then click Confirm.

    Resource configuration

    | Parameter | Description | Default example |
    | --- | --- | --- |
    | Node type | Select the node type. | Virtual node |
    | Path | Select the path for the current node. | Business Flow/Workflow/General |
    | Name | Enter the name of the corresponding synchronized data table. | rec_sln_demo_user_table_v1, rec_sln_demo_item_table_v1, rec_sln_demo_behavior_table_v1 |

  2. Select each node and set its content to select 1;. Then, click Configure Scheduling on the right and complete the following settings:

    • In the Time attributes section, set the Rerun attribute to Rerun node after success or failure.

    • In the Scheduling dependencies > Ancestor Nodes section, enter the DataWorks workspace name, select the node with the _root suffix, and click Add.

      Apply this dependency configuration to all three virtual nodes.

  3. Click the Submit icon for each virtual node to submit the node.

3. Register data

Before configuring feature engineering, recall, and ranking, register the three synchronized tables in PAI-Rec:

  1. Log in to the PAI-Rec Management Console and select a region from the top-left corner.

  2. In the left-side navigation pane, click Instances, and then click the name of an instance to open its details page.

  3. In the left-side navigation pane, navigate to Recommendation Solution Customization > Data Registration. On the MaxCompute Table tab, click Create Data Table. Add a user table, an item table, and a behavior table using the configuration in the table below. Then, click Import.

    | Parameter | Description | Example |
    | --- | --- | --- |
    | MaxCompute project | Select the MaxCompute project you created. | project_mc |
    | MaxCompute table | Select the data tables synchronized to your DataWorks workspace. | User table: rec_sln_demo_user_table_v1; Item table: rec_sln_demo_item_table_v1; Behavior table: rec_sln_demo_behavior_table_v1 |
    | Data table name | Enter a custom name. | User table; Item table; Behavior table |

4. Create a recommendation scenario

Create a recommendation scenario before configuring recommendation tasks. For basic concepts and traffic encoding, see Basic Concepts.

In the left navigation pane, click Recommendation Scenario and then Create Scenario. Configure a recommendation scenario with the following resource configuration and click Determine.

Resource configuration

| Parameter | Description | Example |
| --- | --- | --- |
| Scenario name | A custom name for the scenario. | Homepage |
| Scenario description | An optional description of the scenario. | None |

5. Set up an algorithm

For a full production configuration, use the following recall and fine-grained ranking settings:

  • Global hot recall: Retrieves the top-K most popular items based on log data.

  • Grouped hot recall: Retrieves popular item candidates from specific groups, such as city or gender, to improve relevance.

  • Etrec u2i recall: Uses the etrec collaborative filtering algorithm.

  • Swing u2i recall (optional): Uses the Swing algorithm.

  • Vector recall (optional): Generates candidates using the Deep Structured Semantic Model (DSSM).

  • Fine-grained ranking: Uses the MultiTower model for single-objective ranking and the DBMTL model for multi-objective ranking.

This guide configures two recall methods, global hot recall and etrec u2i recall (etrec is a collaborative filtering implementation), followed by fine-grained ranking. Follow these steps:

  1. In the left-side navigation pane, choose Recommendation Solution Customization. Select a scenario you have created, and then click Create Modular Recommendation Solution. Create a solution with the following resource configurations, and then click Save and Go to Algorithm Solution Configuration.

    Leave unspecified parameters at their default values. For details, see Data Table Configuration.

    Resource configuration

    | Parameter | Description |
    | --- | --- |
    | Solution Name | Enter a custom name. |
    | Scenario Name | Select the recommendation scenario that you created. |
    | Offline Data Source | Select the MaxCompute project associated with the recommendation scenario. |
    | Algorithm Framework | Select PyTorch. You can choose a different algorithm framework based on your preference. |
    | DataWorks Workspace | Select the DataWorks workspace associated with the recommendation scenario. |
    | Workflow Name | The name of the business flow created in DataWorks when the recommendation solution script is deployed. You can enter a custom name, such as Flow. |
    | StorageAPI configuration | Chinese mainland: For regions such as China (Beijing) and China (Shanghai), you can select "StorageAPI", which is a pay-as-you-go data transmission service. Other regions: For regions such as China (Hong Kong), Singapore, and Germany (Frankfurt), you must first purchase and use a dedicated Data Transmission Service resource group. If a pay-as-you-go option is not available, you must purchase a subscription-based data transmission service. After purchase, refresh the page and select the name of your subscription-based service. In the PAI-DLC TorchEasyRec training job in DataWorks, add a parameter in a format similar to the following: -odps_data_quota_name ot_xxxx_p#ot_yyyy. |
    | slim_mode | Enable this option if your DataWorks version limits the size of imported code packages. You must then manually upload any packages that exceed the limit. For this solution, select No. |
    | OSS bucket | Select the OSS bucket associated with the recommendation scenario. |
    | Project | Select the FeatureStore project you created, and select FeatureDB as the online data source. |
    | User Entity | Select the user feature entity from your FeatureStore project. |
    | Item Entity | Select the item feature entity from your FeatureStore project. |

  2. On the Configure Table tab, click Add next to a data table. Configure the behavior log table, user table, and item table by setting their corresponding partition, event, feature, and timestamp fields, and then click Next.

    Leave unspecified parameters at their default values. For details, see Data Table Configuration.

    Behavior log table resource configuration

    When you configure the behavior log table, adjust the settings based on your actual data. In this tutorial, the behavior log contains key information such as request ID, unique user identifier, page, event timestamp, and event type. If your table includes richer data dimensions, we recommend categorizing this information as user or item information to simplify subsequent feature engineering.

    | Parameter | Description | Default example |
    | --- | --- | --- |
    | Behavior table name | Select the registered behavior table. | rec_sln_demo_behavior_table_v1 |
    | Time partition | The partition field of the behavior table. | ds (format: yyyymmdd) |
    | Behavior information configuration | | |
    | Request ID | The ID that identifies each recommendation request in the logs, typically a program-generated UUID. This parameter is optional. | request_id |
    | Behavior event | The field that records behavior events in the logs. | event |
    | Behavior event enum values | The enumerated values for behavior events, such as exposure, click, add to cart, or purchase. | expr,click,praise |
    | Behavior value | A field that represents the depth of a behavior, such as transaction price or watch duration. | playtime |
    | Behavior timestamp | The time the log event occurred, as a timestamp accurate to the second. | event_time |
    | Timestamp format | Used in conjunction with the behavior timestamp. | unixtime |
    | Behavior scenario | The field indicating the scenario where the log was generated, such as a homepage, search page, or product details page. | page |
    | Scenario enum values | Specifies which scenarios' data to use. This enables per-scenario feature statistics in subsequent feature engineering. | home,detail |
    | User information configuration | | |
    | User ID | The field for the user ID in the behavior table. | user_id |
    | User categorical features | Categorical user features in the behavior table, such as network type, operating platform, or gender. | net_type |
    | Item information configuration | | |
    | Item ID | The field for the item ID in the behavior table. | item_id |

    User table resource configuration

    | Parameter | Description | Default example |
    | --- | --- | --- |
    | User table name | Select the registered user table. | rec_sln_demo_user_table_v1 |
    | Time partition | The time partition field of the user table. | ds (format: yyyymmdd) |
    | User information configuration | | |
    | User ID | The user ID field in the user table. | user_id |
    | Registration timestamp | The time the user registered. | register_time |
    | Timestamp format | Used in conjunction with the registration timestamp. | unixtime |
    | Categorical feature | Categorical fields in the user table, such as gender, age group, or city. | gender,city |
    | Numerical Features | Numerical fields in the user table, such as the number of created items or points. | age,item_cnt,follow_cnt,follower_cnt |
    | Tag feature | The name of the tag feature field. | tags |

    Item table resource configuration

    | Parameter | Description | Default example |
    | --- | --- | --- |
    | Item table name | Select the registered item table. | rec_sln_demo_item_table_v1 |
    | Time partition | The time partition field of the item table. | ds (format: yyyymmdd) |
    | Item information configuration | | |
    | Item ID | The item ID field in the item table. | item_id |
    | Author ID | The author of the item. | author |
    | Listing timestamp | The name of the item listing timestamp field. | pub_time |
    | Timestamp format | Used in conjunction with the listing timestamp. | unixtime |
    | Categorical feature | Categorical fields in the item table, such as category. | category |
    | Numerical Features | Numerical fields in the item table, such as price, cumulative sales, or number of likes. | click_count,praise_count |

  3. Under Feature Group Configuration, click Add. Configure the parameters as shown in the following table. Set the feature module name and version, and select the user table, item table, and behavior log table that you configured on the Configure Table tab.

    Resource configuration

    | Parameter | Description | Default example |
    | --- | --- | --- |
    | Statistical Period | This setting is used for batch feature generation. To avoid creating too many features, this solution sets the statistical periods to 3, 7, and 15 days. These periods are used to calculate statistical features for users and items over the last 3, 7, and 15 days, respectively. If user behavior is sparse, consider using a 21-day period. | 3,7,15 |
    | Behavior | Select the behavior events you configured. We recommend adding them in the following order: expr (exposure), click, and praise. | expr,click,praise |

    Click Determine. This action generates various statistical features for both users and items.

    Click Configure Feature for this feature group to view the validation features. For this solution, keep the default settings and do not edit the derived features. You can edit derived features to suit your business needs (see Feature Configuration).

    Click Publish for this feature group, select the latest data partition date as the task run date, set the maximum task parallelism to 10, and keep other settings at their defaults. Then, click Determine. This action generates a node, deploys it to DataWorks, and backfills data by date. Wait for the module's status to change to Published before you proceed to the next step.

    Click Online Details to view more information about the module. In the Data Refill Task List, you can check the run status of each task. If a task fails, click View Task Node to go to DataWorks and view detailed error information. After fixing the issue, find the node in the Data Refill Task List and click Rerun. After the task succeeds, you can proceed.

  4. Under Label Table Configuration, this module builds sample targets from the behavior table. Click Add, choose Label Module as the module type, enter a name for the label module, and select the feature module that you configured in the previous step. Click Create next to Fine-grained ranking target settings (labels) and add the following two labels:

    • Target 1: Set Fine-grained ranking target name to is_click, set Fine-grained ranking target expression to max(if(event='click',1,0)), and select CLASSIFICATION for Target type.

    • Target 2 (note that the 'l' in 'ln' is lowercase): Set Fine-grained ranking target name to ln_playtime, set Fine-grained ranking target expression to ln(sum(playtime)+1), set Fine-grained ranking target dependency to is_click, and select REGRESSION for Target type. Then, click OK.

    • After you click Determine, publish the module in the same way you published the feature module. Wait for the status to change to Published before proceeding to the next step.

  5. Under Sample Configuration, this module associates sample target tables with features and generates model features in FeatureStore. Click Add, choose Sample Module as the module type, name the model feature, and select the feature and label modules from the previous steps. Publish the module and wait for the status to change to Published before proceeding to the next step.

  6. Under Feature Generation Configuration, this module derives additional features from all features in the sample table. Custom configuration is not currently supported. Click Add, choose Feature Generation as the module type, choose Ranking as the feature generation type, and select the model feature module from the previous sample configuration step. Publish the module and wait for the status to change to Published before proceeding to the next step.

  7. Under Configure Sorting Method, click Add in the fine-grained ranking section. For the feature module output, select the feature generation module from the previous step. Leave the other options at their default values and click Determine. Then, publish the module. After the status changes to Published, the model service is deployed to PAI-EAS.

  8. Under Retrieval Configuration, click Add next to the target category, configure the parameters, click Confirm, and then Publish the configuration. This document includes several recall configuration methods. To complete the deployment quickly, you only need to configure Global Hot Recall and etrec u2i Recall. The other methods, such as vector recall and collaborative metric recall, are for reference only.
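
As a rough illustration of the two ranking labels defined under Label Table Configuration, the following Python sketch applies the same expressions, max(if(event='click',1,0)) and ln(sum(playtime)+1), to a handful of in-memory behavior rows. The grouping key (request_id, user_id, item_id) and the function name build_labels are assumptions made for this example. The generated SQL may group differently, and the dependency of ln_playtime on is_click is not modeled here.

```python
import math
from collections import defaultdict


def build_labels(rows):
    """Aggregate behavior rows into one label record per (request, user, item).

    rows: iterable of dicts with request_id, user_id, item_id, event, playtime.
    The grouping key is an assumption for illustration.
    """
    groups = defaultdict(lambda: {"is_click": 0, "playtime_sum": 0.0})
    for r in rows:
        g = groups[(r["request_id"], r["user_id"], r["item_id"])]
        # is_click = max(if(event='click', 1, 0))
        g["is_click"] = max(g["is_click"], 1 if r["event"] == "click" else 0)
        g["playtime_sum"] += r.get("playtime") or 0.0
    # ln_playtime = ln(sum(playtime) + 1)
    return {
        key: {"is_click": g["is_click"],
              "ln_playtime": math.log(g["playtime_sum"] + 1)}
        for key, g in groups.items()
    }


rows = [
    {"request_id": "r1", "user_id": "u1", "item_id": "i1", "event": "expr", "playtime": 0.0},
    {"request_id": "r1", "user_id": "u1", "item_id": "i1", "event": "click", "playtime": math.e - 1},
    {"request_id": "r1", "user_id": "u1", "item_id": "i2", "event": "expr", "playtime": 0.0},
]
labels = build_labels(rows)
print(labels)
```

The +1 inside the logarithm keeps the regression target at 0 for samples with no playtime.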

Resource configuration

Global hot recall

Global hot recall ranks popular items based on click events, where top_n represents the number of items in the ranking. To modify the scoring formula or the events used for scoring, you can generate the relevant code, deploy it to DataWorks, and then make your changes.

The scoring formula is click_uv*click_uv/(expr+adj_factor)*exp(-item_publish_days/fresh_decay_denom), where:

  • click_uv: For the same click-through rate (CTR), a higher number of unique user clicks indicates that an item is more popular.

  • click_uv/(expr+adj_factor): This is the smoothed click-through rate (CTR), where click_uv is the number of unique users who clicked and expr is the number of exposures. The adjustment factor adj_factor prevents division by zero. It also prevents the CTR from approaching 1 when the exposure count is low, which ensures the calculated CTR is closer to the true CTR.

  • exp(-item_publish_days/fresh_decay_denom): This penalizes older items. item_publish_days is the number of days between the publication date and the current date.
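
The scoring formula above can be sketched in a few lines of Python. The default values for adj_factor and fresh_decay_denom below are placeholders chosen for the example, not the values used by PAI-Rec, and the function names are hypothetical.

```python
import math


def hot_score(click_uv, expr, item_publish_days,
              adj_factor=10.0, fresh_decay_denom=30.0):
    """click_uv * click_uv / (expr + adj_factor) * exp(-days / fresh_decay_denom).

    adj_factor and fresh_decay_denom defaults are illustrative assumptions.
    """
    smoothed_ctr = click_uv / (expr + adj_factor)
    freshness = math.exp(-item_publish_days / fresh_decay_denom)
    return click_uv * smoothed_ctr * freshness


def top_n_hot(items, top_n):
    """items: dict item_id -> (click_uv, expr, item_publish_days)."""
    scored = {item_id: hot_score(*stats) for item_id, stats in items.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]


items = {
    "new_popular": (10, 90, 0),    # 10 unique clickers, 90 exposures, published today
    "old_popular": (10, 90, 60),   # same CTR, published 60 days ago
    "new_niche": (1, 0, 0),        # one click, no exposures: smoothing keeps its score low
}
print(top_n_hot(items, top_n=2))
```

With these inputs, the fresh item outranks the older one with identical click statistics, and the single-click item stays at the bottom because adj_factor dampens its CTR.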

On the edit configuration page, set the parameters as follows: set Recall model name to global_hot and Version to 1. For Behavior log table, select rec_sln_demo_behavior_table_v1. For Item table, select rec_sln_demo_item_table_v1. Set Recall time window to 15 days and Recall count to 500. For Exposure behavior event, select expr, and for Click behavior event, select click. Keep the Hot score formula switch off. For Recall engine, select FeatureStore.

Etrec u2i recall

etrec is an item-based collaborative filtering algorithm. For details, see etrec collaborative filtering.

Configure the etrec u2i recall form: enter a Recall model name (such as etrec) and Version (such as 1). Select a Behavior log table (such as rec_sln_demo_behavior_table_v1). Set Whether to use real-time u2i to false and for Recall engine, select FeatureStore. In the u2i behavior weight section, you can add events and weights, for example, a weight of 1 for click and 3 for praise.

| Parameter | Description |
| --- | --- |
| Training days | The number of days of behavior logs to use for training. The default is 30 days. You can adjust this value based on your log volume. |
| Recall count | The number of user-to-item recommendations to generate offline. |
| U2ITrigger | The items with which a user has interacted, such as items that were clicked, added to favorites, or purchased. This typically excludes items that were only exposed. |
| Behavior time window | The number of recent days of behavior data to collect. The default is 15. |
| Behavior time decay coefficient | This value is typically between 0 and 1. A higher value means that older behaviors decay faster and have less weight when constructing the trigger_item. |
| Trigger selection count | The number of item IDs to retrieve for each user to perform a Cartesian product with the i2i data generated by etrec. A value between 10 and 50 is recommended. If this value is too large, it may result in too many recall candidates. |
| u2i behavior weight | For exposure events, either do not configure them or set their weight to 0. We recommend not configuring exposure events, as this skips processing user exposure data. |
| I2I model settings | Parameter settings for etrec (see etrec collaborative filtering). The number of related items should not be too large. |

After you enable the I2I model settings switch, configure the following parameters, and then click Confirm:

  • Similarity calculation strategy (sim_type): Select an option such as wbcosine, asymcosine, or jaccard. The example uses wbcosine.

  • Number of related items (top_n): 500

  • Max behaviors per user (max_bhv): 500

  • Min behaviors per user (min_bhv): 2

  • Calculation policy (operator): Select an option such as add, mul, min, or max. The example uses add.

  • Similarity calculation weight coefficient (weight): 1

  • Decay coefficient (alpha): 0.5
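
The following Python sketch shows one way the U2I trigger parameters could interact: behaviors are weighted by event type, discounted by age, reduced to the top triggers, and then joined with an i2i similarity table. It rests on several assumptions that this guide does not confirm: the exponential decay form exp(-decay * days_ago), the multiplication of trigger score by similarity, and the summation across triggers. The i2i table is taken as given, so the etrec similarity computation itself is not shown, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def select_triggers(behaviors, weights, decay, top_k):
    """behaviors: list of (item_id, event, days_ago) for one user.

    Events without a configured weight (for example, exposures) are skipped.
    The exponential decay form is an assumption.
    """
    scores = defaultdict(float)
    for item_id, event, days_ago in behaviors:
        w = weights.get(event, 0.0)
        if w <= 0:
            continue
        scores[item_id] += w * math.exp(-decay * days_ago)
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]


def u2i_candidates(triggers, i2i, recall_count):
    """Join trigger items with their similar items and keep the best candidates."""
    candidates = defaultdict(float)
    for item_id, trigger_score in triggers:
        for similar_item, sim in i2i.get(item_id, []):
            candidates[similar_item] += trigger_score * sim
    return sorted(candidates.items(), key=lambda kv: -kv[1])[:recall_count]


behaviors = [("a", "click", 0), ("b", "praise", 0), ("c", "expr", 0)]
weights = {"click": 1.0, "praise": 3.0}           # matches the example weights
i2i = {"a": [("x", 0.5)], "b": [("x", 0.5), ("y", 0.1)]}

triggers = select_triggers(behaviors, weights, decay=0.5, top_k=10)
result = u2i_candidates(triggers, i2i, recall_count=500)
print(triggers, result)
```

The sketch also shows why a large Trigger selection count inflates the candidate set: every additional trigger contributes its own list of related items to the join.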

Grouped hot recall

You can configure rankings based on attributes such as city and gender to provide basic personalized recall. In the following example, the grouping key is a combination of gender and the bucket number of a numeric feature.

Configure the parameters: set Recall type to User grouped hot recall, Recall model name to group_hot, Version to 1, Behavior log table to rec_sln_demo_behavior_table_v1, Item table to rec_sln_demo_item_table_v1, User table to rec_sln_demo_user_table_v1, Feature module to ft, Recall time window to 15, Recall count to 500, Exposure behavior event to expr, and Click behavior event to click.

In the User group trigger section, add the feature gender (no bucket boundaries) and follow_cnt (bucket boundaries: 1,5,10,20). In the Behavior group trigger section, add the feature net_type (no bucket boundaries). For Recall engine, select FeatureStore.
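
To illustrate how a grouping key might be built from gender and the bucket number of follow_cnt, here is a minimal Python sketch. The bucket convention (a value equal to a boundary falls into the higher bucket) and the key format gender_bucket are assumptions for this example. The code generated by PAI-Rec may number buckets or format keys differently.

```python
from bisect import bisect_right


def bucket_id(value, boundaries):
    """Return the index of the bucket that value falls into.

    With boundaries (1, 5, 10, 20): value < 1 -> 0, 1 <= value < 5 -> 1, ...,
    value >= 20 -> 4. This convention is an assumption.
    """
    return bisect_right(boundaries, value)


def group_key(gender, follow_cnt, boundaries=(1, 5, 10, 20)):
    """Combine a categorical feature with a bucketed numeric feature."""
    return f"{gender}_{bucket_id(follow_cnt, boundaries)}"


print(group_key("F", 7))    # F_2
print(group_key("M", 0))    # M_0
print(group_key("M", 25))   # M_4
```

Each distinct key then gets its own hot ranking, so users who share a gender and a follow_cnt bucket receive the same candidate list.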

Swing u2i recall

Swing is an item correlation method that measures item similarity based on the User-Item-User pattern.

Configure the Swing u2i recall model parameters: set Recall type to Swing u2i recall, Recall model name to swing, Version to 1, Behavior log table to rec_sln_demo_behavior_table_v1, Use real-time u2i to false, Training days to 30, and Recall count to 500. For Recall engine, select FeatureStore, and enable the U2I Trigger(weighted_behavior_u2i) switch.

Configure the U2I Trigger parameters: set Behavior time window to 15, Decay coefficient to 0.2, and Trigger selection count to 10. In the u2i behavior weight section, click Add to add behavior events: an event named click with a weight of 1, and an event named praise with a weight of 3.

Enable the I2I model settings switch, then configure the parameters: set Number of related items (top_n) to 500, Max clicks per user (max_click_per_user) to 600, Max users per item (max_user_per_item) to 700, Max time span (max_time_span) to 1, Adjustment coefficient (alpha1) to 5, Adjustment coefficient (alpha2) to 1, and Adjustment coefficient (beta) to 0.3. For Item weight calculation method (norm_method), select COUNT.
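
For intuition about the User-Item-User pattern, the sketch below implements a commonly described weighted form of Swing: each pair of users who both interacted with items i and j contributes w_u * w_v / (alpha2 + |I_u ∩ I_v|), where w_u = 1 / (alpha1 + |I_u|)^beta. That this is how alpha1, alpha2, and beta are used in PAI-Rec is an assumption, and the caps max_click_per_user, max_user_per_item, and max_time_span are not modeled.

```python
from collections import defaultdict
from itertools import combinations


def swing_similarity(user_items, alpha1=5.0, alpha2=1.0, beta=0.3):
    """user_items: dict user_id -> set of item_ids the user interacted with.

    Returns a dict (item_i, item_j) -> similarity, filled symmetrically.
    The parameter roles are an assumption based on the common Swing formulation.
    """
    # Users with many interactions get a smaller weight.
    weight = {u: 1.0 / (alpha1 + len(items)) ** beta
              for u, items in user_items.items()}
    sim = defaultdict(float)
    for u, v in combinations(sorted(user_items), 2):
        common = user_items[u] & user_items[v]
        if len(common) < 2:
            continue  # a user pair must share both items of an item pair
        # The more two users overlap, the less each shared pair counts.
        pair_score = weight[u] * weight[v] / (alpha2 + len(common))
        for i, j in combinations(sorted(common), 2):
            sim[(i, j)] += pair_score
            sim[(j, i)] += pair_score
    return dict(sim)


user_items = {"u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"a"}}
sim = swing_similarity(user_items)
print(sim)
```

In this toy input only items a and b are co-clicked by a pair of users, so they are the only pair that receives a score.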

Vector recall

This solution uses the DSSM vector recall method. The configuration is as follows:

  • Recall target name: This typically indicates whether a click occurred. Set this to is_click.

  • Recall target selection: Set this to max(if(event='click', 1, 0)).

    The following code provides an example:

    select max(if(event='click',1,0)) is_click ,...
    from ${behavior_table}
    where dt between ${bizdate_start} and ${bizdate_end}
    group by req_id,user_id,item

    In this code:

    • ${behavior_table}: The behavior table.

    • ${bizdate_start}: The start date of the behavior time window.

    • ${bizdate_end}: The end date of the behavior time window.

    • event: The event field in the ${behavior_table} table. Use the actual name of the event field from your table.

    • is_click: The target name.

    The formulas for dimension calculation are as follows:

    EMB_SQRT4_STEP8: ((8 + Pow(count, 0.25)) / 8) * 8
    EMB_SQRT4_STEP4: ((4 + Pow(count, 0.25)) / 4) * 4
    EMB_LN_STEP8:    ((8 + Log(count + 1)) / 8) * 8
    EMB_LN_STEP4:    ((4 + Log(count + 1)) / 4) * 4

    Here, count is the number of enumerated feature values. Use the Log function for features with a large number of values.
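
The dimension formulas can be tried out with the short Python sketch below. Two points are assumptions rather than statements from this guide: the division is truncated to an integer so that the result is a multiple of the step (as the STEP8 and STEP4 names suggest), and Log is the natural logarithm. The function name emb_dim is hypothetical.

```python
import math


def emb_dim(count, method="EMB_SQRT4_STEP8"):
    """Embedding dimension for a feature with `count` distinct values.

    Assumptions: integer truncation of the division, natural logarithm for Log.
    """
    step = 8 if method.endswith("STEP8") else 4
    if method.startswith("EMB_SQRT4"):
        raw = count ** 0.25
    elif method.startswith("EMB_LN"):
        raw = math.log(count + 1)
    else:
        raise ValueError(f"unknown method: {method}")
    return int((step + raw) / step) * step


for method in ("EMB_SQRT4_STEP8", "EMB_SQRT4_STEP4", "EMB_LN_STEP8", "EMB_LN_STEP4"):
    print(method, emb_dim(10_000, method), emb_dim(100_000_000, method))
```

Under these assumptions, the logarithmic variants grow more slowly than the fourth-root variants as count increases, which is why they suit features with very many values.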

Configure the recall model parameters: set Recall type to Vector recall, Recall model name to dssm, Version to 1, Feature module to ft, and Model type to dssm. Enable the Negative sampling strategy switch, set Sampling type to negative_sampler, and set Number of negative samples to 1024.

Set Target type to CLASSIFICATION and Training days to 30. Set Share embedding to true. Keep the Sample weight switch off. Set Incremental training to true and Incremental training days to 1.