Customizing PAI-Rec recommendation algorithms


This topic explains how to use a public dataset to quickly get started with PAI-Rec. You will configure key features for a custom recommendation algorithm, such as feature engineering, recall, and fine-grained ranking. You will then generate and deploy the code to the corresponding workflow in DataWorks.

Prerequisites

Before you begin, complete the following:

1. Create a PAI-Rec instance

  1. Log in to the PAI-Rec homepage and click Buy Now.

  2. On the PAI-Rec instance purchase page, configure the following parameters and click Buy Now.

    | Parameter | Description |
    | --- | --- |
    | Region and availability zone | The region where your service is deployed. |
    | Service Type | For this solution, select Standard Edition and enable the Recommendation Solution Customization feature. |

  3. Log in to the PAI-Rec console. In the upper-left corner of the top menu bar, select a region.

  4. In the navigation pane on the left, choose Instance list. Click the instance name to go to the instance details page.

  5. In the Procedure section, click Cloud Service Configuration. This takes you to the System Configuration > Cloud Service Configuration page. Click Edit, configure the following parameters, and then click Exit.

    Resource configuration

    Modeling

    | Parameter | Description |
    | --- | --- |
    | Machine Learning Platform for AI Workspace | Enter your default PAI workspace. |
    | DataWorks Workspace | Enter the automatically generated DataWorks workspace. |
    | MaxCompute Workspace | Enter your MaxCompute project. |
    | OSS Bucket | Select your OSS bucket. |

    Engine

    | Parameter | Description |
    | --- | --- |
    | Real-time Recall Engine | For Whether to use PAI-FeatureStore, select Yes. |
    | Real-time Feature Query | For Whether to use PAI-FeatureStore, select Yes. |

  6. In the navigation pane on the left, choose System Configuration > Permissions. On the Services tab, verify that each cloud product is correctly authorized.

2. Clone the public dataset

1. Synchronize tables

This solution accepts input data in two ways:

  1. Clone data for a fixed time window from the pai_online_project project. This method does not support routine scheduling.

  2. Use a Python script to generate data. You can run a task in DataWorks to generate this data for a specific period.

To schedule daily data generation and model training, use the second method: deploy the provided Python script to generate the data. For more information, see the Code generation tab.

Fixed time window

PAI-Rec provides three common tables for recommendation algorithms in the publicly accessible pai_online_project project:

  • user table: pai_online_project.rec_sln_demo_user_table

  • item table: pai_online_project.rec_sln_demo_item_table

  • behavior table: pai_online_project.rec_sln_demo_behavior_table

The operations in this solution use these three tables. The data is randomly generated for demonstration and is not real business data. Therefore, metrics such as Area Under the Curve (AUC) obtained from training will be low. You must run SQL commands in DataWorks to synchronize the table data from the pai_online_project project to your DataWorks project, for example, DataWorks_a. Follow these steps:

  1. Log on to the DataWorks console. In the top menu bar, select a region.

  2. In the left-side navigation pane, click Data Development and O&M > Data Development.

  3. Select the DataWorks workspace that you created and click Go To Data Development.

  4. Hover over Create and choose Create Node > MaxCompute > ODPS SQL. Configure the parameters as described in the following table and click Confirm.

    Resource configuration

    | Parameter | Description |
    | --- | --- |
    | Engine instance | Select the bound MaxCompute data source. |
    | Node type | Select ODPS SQL. |
    | Path | Select a storage path for the node. For example, Business Flow/Workflow/MaxCompute. |
    | Name | Enter a name, for example, Data. |

  5. In the new node's editor, paste and run the following code to synchronize the user, item, and behavior tables from the pai_online_project project to your MaxCompute project, for example, project_mc. The code requires a 100-day data window. To define this window to end on the previous day, go to the Scheduling Parameters section, click Add Parameter, and add two parameters: bizdate with the value $[yyyymmdd-1] and bizdate_100 with the value $[yyyymmdd-100].

CREATE TABLE IF NOT EXISTS rec_sln_demo_user_table_v1(
 user_id BIGINT COMMENT 'Unique user ID',
 gender STRING COMMENT 'Gender',
 age BIGINT COMMENT 'Age',
 city STRING COMMENT 'City',
 item_cnt BIGINT COMMENT 'Number of created items',
 follow_cnt BIGINT COMMENT 'Number of follows',
 follower_cnt BIGINT COMMENT 'Number of followers',
 register_time BIGINT COMMENT 'Registration time',
 tags STRING COMMENT 'User tags'
) PARTITIONED BY (ds STRING) STORED AS ALIORC;
INSERT OVERWRITE TABLE rec_sln_demo_user_table_v1 PARTITION(ds)
SELECT *
FROM pai_online_project.rec_sln_demo_user_table
WHERE ds >= "${bizdate_100}" and ds <= "${bizdate}";
CREATE TABLE IF NOT EXISTS rec_sln_demo_item_table_v1(
 item_id BIGINT COMMENT 'Item ID',
 duration DOUBLE COMMENT 'Video duration',
 title STRING COMMENT 'Title',
 category STRING COMMENT 'Primary tag',
 author BIGINT COMMENT 'Author',
 click_count BIGINT COMMENT 'Total clicks',
 praise_count BIGINT COMMENT 'Total likes',
 pub_time BIGINT COMMENT 'Publication time'
) PARTITIONED BY (ds STRING) STORED AS ALIORC;
INSERT OVERWRITE TABLE rec_sln_demo_item_table_v1 PARTITION(ds)
SELECT *
FROM pai_online_project.rec_sln_demo_item_table
WHERE ds >= "${bizdate_100}" and ds <= "${bizdate}";
CREATE TABLE IF NOT EXISTS rec_sln_demo_behavior_table_v1(
 request_id STRING COMMENT 'Instrumentation ID/Request ID',
 user_id STRING COMMENT 'Unique user ID',
 exp_id STRING COMMENT 'Experiment ID',
 page STRING COMMENT 'Page',
 net_type STRING COMMENT 'Network type',
 event_time BIGINT COMMENT 'Behavior time',
 item_id STRING COMMENT 'Item ID',
 event STRING COMMENT 'Behavior type',
 playtime DOUBLE COMMENT 'Playback/Read duration'
) PARTITIONED BY (ds STRING) STORED AS ALIORC;
INSERT OVERWRITE TABLE rec_sln_demo_behavior_table_v1 PARTITION(ds)
SELECT *
FROM pai_online_project.rec_sln_demo_behavior_table
WHERE ds >= "${bizdate_100}" and ds <= "${bizdate}";
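The date window that the two scheduling parameters define can be checked with a short Python sketch. It is only an illustration: the function name sync_window and the run_date argument (standing in for the task's scheduled date) are hypothetical, and the sketch assumes that $[yyyymmdd-1] and $[yyyymmdd-100] resolve to the scheduled date minus 1 and minus 100 days.

```python
from datetime import date, timedelta


def sync_window(run_date: date) -> tuple[str, str]:
    """Return (bizdate_100, bizdate) as yyyymmdd strings for a scheduled date.

    Assumption: bizdate = scheduled date - 1 day and
    bizdate_100 = scheduled date - 100 days, as configured above.
    """
    bizdate = run_date - timedelta(days=1)
    bizdate_100 = run_date - timedelta(days=100)
    return bizdate_100.strftime("%Y%m%d"), bizdate.strftime("%Y%m%d")


# Example: a task scheduled for 2024-05-10 copies partitions
# from the returned start value through the returned end value.
start, end = sync_window(date(2024, 5, 10))
```

Because both bounds are inclusive in the WHERE clause, the window covers 100 daily partitions.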

Code generation

Periodic task scheduling is not supported for fixed time-window data. To run scheduled tasks, you must deploy a Python script to generate data. Follow these steps:

  1. In the DataWorks console, create a PyODPS 3 node. For more information, see Create and manage MaxCompute nodes.

  2. Download create_data.py and paste its content into the PyODPS 3 node.

  3. Click Configure Scheduling on the right, configure the following parameters, and then click the Save and Submit icons in the upper-right corner.

    • Configure scheduling parameters:

      • Set the $user_table_name parameter to rec_sln_demo_user_table.

      • Set the $item_table_name parameter to rec_sln_demo_item_table.

      • Set the $behavior_table_name parameter to rec_sln_demo_behavior_table.

      • Add a business date parameter named bizdate and set its value to $[yyyymmdd-1].

    • Configure a scheduling dependency.

  4. Go to the Operation Center and choose Auto Triggered Task O&M > Recurring Job.

  5. In the Actions column of the target task, choose Backfill Data > Current and Descendant Nodes Retroactively.

  6. In the Backfill Data panel, set a business date and click Submit and Go.

    Backfilling about 60 days of data works best. To ensure data integrity, we recommend that you set the business date to the task's scheduled date minus 60 days.

2. Configure dependency nodes

To ensure smooth code generation and deployment, add three virtual nodes to your DataWorks project, one for each synchronized table. Set the scheduling dependencies of these nodes to the root node of the workspace, and then publish the nodes. The procedure is as follows:

  1. Hover over Create and choose Create Node > General-purpose > Virtual Node. Create three virtual nodes with the following resource configuration, and click Confirm.

    Resource configuration

    | Parameter | Description | Example |
    | --- | --- | --- |
    | Node type | Select the node type. | Virtual Node |
    | Path | The storage path for the node. | Workflow/Workflow/General |
    | Name | A unique name for the node. | rec_sln_demo_user_table_v1, rec_sln_demo_item_table_v1, rec_sln_demo_behavior_table_v1 |

  2. For each virtual node, set its content to select 1;. Then, click Configure Scheduling on the right and configure the following settings:

    • In the Time property section, set Rerun property to Rerun When Succeeded Or Failed.

    • In the Scheduling dependencies > Upstream dependencies section, enter your DataWorks workspace name, select the node with the _root suffix, and click Add.

      Apply these settings to all three virtual nodes.

  3. Click the submit icon next to each virtual node to submit it.

3. Register data

To configure feature engineering, recall, and ranking algorithms in the custom recommendation solution, you must first register the three tables that have been synchronized to your DataWorks project. The procedure is as follows:

  1. Log in to the PAI-Rec console. In the top menu bar, select a region.

  2. In the left-side navigation pane, choose Instance List. Click the instance name to open the instance details page.

  3. In the left-side navigation pane, choose Recommendation Solution Customization > Data Registration. On the MaxCompute Table tab, click Create Data Table. Add one user table, one item table, and one behavior table as described in the following table, and then click Import.

    | Parameter | Description | Example |
    | --- | --- | --- |
    | MaxCompute project | Select your MaxCompute project. | project_mc |
    | MaxCompute table | Select the data tables synchronized to your DataWorks workspace. | user table: rec_sln_demo_user_table_v1; item table: rec_sln_demo_item_table_v1; behavior table: rec_sln_demo_behavior_table_v1 |
    | Data table name | Enter a custom name. | user table; item table; behavior table |

4. Create recommendation scenario

Before you can configure a recommendation task, you must first create a recommendation scenario. For an explanation of recommendation scenarios and traffic IDs, see Basic Concepts.

In the left navigation pane, click Recommendation Scenario. Click Create Scenario, configure the new scenario according to the following table, and click OK.

Resource configuration

| Parameter | Description | Example |
| --- | --- | --- |
| Scenario name | Enter a custom name. | HomePage |
| Scenario description | Enter a detailed description. | None |

5. Configure an algorithm solution

To fully configure a real-world scenario, use the following recall and fine-grained ranking configurations:

  • global hot recall: Selects the top-k most popular items from log data.

  • global hot fallback recall: Uses Redis as a fallback to prevent empty responses from the recommendation API.

  • grouped hot recall: Recalls items by groups (e.g., city, gender) to provide more accurate recommendations of popular items.

  • etrec u2i recall: Uses the etrec collaborative filtering algorithm.

  • Swing u2i recall (optional): Uses the Swing algorithm.

  • cold-start recall (optional): Handles cold-start scenarios using the DropoutNet algorithm.

  • fine-grained ranking: Uses MultiTower for single-objective ranking and DBMTL for multi-objective ranking.

You should enable algorithms such as vector recall or PDN recall only after the initial recall covers most of your candidates. Vector recall requires a vector recall engine. FeatureDB does not support vector recall, so this example does not configure it.

This topic walks through a sample configuration and deployment. To keep the process simple, the recall configuration uses only global hot recall and etrec u2i recall (etrec is an implementation of collaborative filtering), and the ranking configuration uses fine-grained ranking. The steps are as follows:

  1. In the left-side navigation pane, choose Recommendation Solution Customization > Solution Configurations. Select the scenario that you created, click Create Recommendation Solution, provide the following resource configuration, and then click Save and Go to Algorithm Solution Configuration.

    Keep the default values for unspecified parameters. For more information, see Data table configuration.

    Resource configuration

    | Parameter | Description |
    | --- | --- |
    | Solution Name | Enter a custom name. |
    | Scenario Name | Select the recommendation scenario that you created. |
    | Offline Data Source | Select the MaxCompute project associated with the recommendation scenario. |
    | DataWorks Workspace | Select the DataWorks workspace associated with the recommendation scenario. |
    | Workflow Name | The name of the workflow that DataWorks creates when you deploy the solution script. You can enter a custom name, such as Flow. |
    | StorageAPI configuration | For regions in the Chinese mainland, such as Beijing and Shanghai, you can select "StorageAPI", which is a pay-as-you-go Data Transmission Service. For regions outside the Chinese mainland, such as China (Hong Kong), Singapore, and Frankfurt, you must first purchase and use a dedicated resource group for Data Transmission Service. If a pay-as-you-go option is not available, you must purchase a subscription for Data Transmission Service. Then, refresh the page and select the name of your Data Transmission Service subscription. Add a parameter to the PAI-DLC TorchEasyRec training task in DataWorks, for example: -odps_data_quota_name ot_xxxx_p#ot_yyyy. |
    | slim_mode | If the DataWorks edition you purchased has a size limit on the code packages imported by Migration Assistant, you can use this feature and manually upload the code packages that exceed the size limit. For this solution, select No. |
    | OSS Bucket | Select the OSS Bucket associated with the recommendation scenario. |
    | Project | Select the FeatureStore project that you created. For the online data source, select FeatureDB. |
    | User Entity | Select the user entity for the FeatureStore project, which is named 'user'. |
    | Item Entity | Select the item entity for the FeatureStore project, which is named 'item'. |

  2. In the Configure Table node, click Add next to the target data table. Configure the partitions, events, features, and timestamps for the behavior log table, user table, and item table as detailed in the following sections. Click Next.

    You can keep the default values for any parameters that are not described. For more information, see Data Table Configuration.

    Behavior log table

    Configure the behavior log table settings to match your data. In this example, the behavior log contains core information such as the request ID, user ID, behavior page, behavior timestamp, and behavior category. If your table has richer data dimensions, classify them by user and item to simplify feature engineering.

    | Parameter | Description | Example |
    | --- | --- | --- |
    | Behavior Table Name | Select the registered behavior table. | rec_sln_demo_behavior_table_v1 |
    | Time Partition | The partition field of the behavior table. | ds (format: yyyymmdd) |

    Behavior information

    | Parameter | Description | Example |
    | --- | --- | --- |
    | Request ID | The ID that identifies each recommendation request in the log. This is typically a UUID generated by the application. This field is optional. | request_id |
    | Behavior Event | The field in the log that records the behavior event. | event |
    | Behavior Event Enumeration Values | The enumerated values for the behavior event, such as impression, click, add-to-cart, or purchase. | expr, click, praise |
    | Behavior Value | The value that indicates the depth of a behavior, such as a transaction price or viewing duration. | playtime |
    | Behavior Timestamp | The event time, as a Unix timestamp in seconds. | event_time |
    | Timestamp Format | Specifies the format of the behavior timestamp. | unixtime |
    | Behavior Scenario | The field specifying the event scenario, such as a home page, search page, or product details page. | page |
    | Scenario Enumeration Values | The scenarios to use for data processing. This enables scenario-specific feature engineering. | home, detail |

    User information

    | Parameter | Description | Example |
    | --- | --- | --- |
    | User ID | The user ID field in the behavior table. | user_id |
    | User Categorical Features | User categorical features from the behavior table, such as network type, operating platform, or gender. | net_type |

    Item information

    | Parameter | Description | Example |
    | --- | --- | --- |
    | Item ID | The item ID field in the behavior table. | item_id |

    User table

    | Parameter | Description | Example |
    | --- | --- | --- |
    | User Table Name | Select the registered user table. | rec_sln_demo_user_table_v1 |
    | Time Partition | The time partition field of the user table. | ds (format: yyyymmdd) |

    User information

    | Parameter | Description | Example |
    | --- | --- | --- |
    | User ID | The user ID field in the user table. | user_id |
    | Registration Timestamp | The time the user registered. | register_time |
    | Timestamp Format | Specifies the format of the registration timestamp. | unixtime |
    | Categorical Features | Categorical fields from the user table, such as gender, age group, or city. | gender, city |
    | Numerical Features | Numerical fields from the user table, such as the number of created items or points. | age, item_cnt, follow_cnt, follower_cnt |
    | Tag Feature | The field name for the tag feature. | tags |

    Item table

    | Parameter | Description | Example |
    | --- | --- | --- |
    | Item Table Name | Select the registered item table. | rec_sln_demo_item_table_v1 |
    | Time Partition | The time partition field of the item table. | ds (format: yyyymmdd) |

    Item information

    | Parameter | Description | Example |
    | --- | --- | --- |
    | Item ID | The item ID field in the item table. | item_id |
    | Author ID | The author of the item. | author |
    | Listing Timestamp | The field for the item's listing timestamp. | pub_time |
    | Timestamp Format | Specifies the format of the listing timestamp. | unixtime |
    | Categorical Features | Categorical fields from the item table, such as category. | category |
    | Numerical Features | Numerical fields from the item table, such as price, total sales, or number of likes. | click_count, praise_count |

  3. In the Configure Feature node, configure the parameters as described in the following table, click Generate Feature, set a feature version, and then click Next.

    After you click Generate Feature, the system derives various statistical features for users and items. This solution uses the default derived features, but you can edit them based on your business requirements. For more information, see Feature Configuration.

    Resource configuration

    | Parameter | Description | Example |
    | --- | --- | --- |
    | Statistical Period | Use this parameter for batch feature generation. To avoid generating too many features, set the statistics period to 3, 7, and 15 days. This setting calculates statistics for users and items over the specified periods. If you have very few user behavior events, try 21 days. | 3,7,15 |
    | Behavior | Select the configured behavior events. The recommended order is expr (impression), click, and praise. | expr, click, praise |

  4. In the Retrieval Configuration node, click Add next to the target category, configure the parameters, click Confirm, and then click Next.

    This section describes several recall configuration methods. To get started quickly, you only need to configure global hot recall and etrec u2i recall. The other methods are provided for reference.

    Resource configuration

    Global hot recall

    Global hot recall generates a ranking of popular items based on click event statistics, where top_n represents the number of items in the ranking. To modify the popularity scoring formula or access event, generate the relevant code and deploy it to DataWorks.

    The scoring formula is click_uv*click_uv/(expr+adj_factor)*exp(-item_publish_days/fresh_decay_denom), where:

    • click_uv: For the same click-through rate (CTR), a higher number of clicks indicates greater popularity.

    • click_uv/(expr+adj_factor): The smoothed CTR, where click_uv is the number of unique users who clicked and expr is the number of impressions. The adjustment factor adj_factor prevents a zero denominator and corrects the CTR for items with few impressions: with only a handful of impressions, the raw CTR can be close to 1, and adding adj_factor pulls it down toward a more realistic value.

    • exp(-item_publish_days/fresh_decay_denom): This term penalizes older items. item_publish_days is the number of days since the item was published.
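The scoring formula above can be tried out with a minimal Python sketch. This is an illustration of the formula only, not the code that PAI-Rec generates. The function name hot_score is hypothetical, and the default values 100 and 180 are borrowed from the grouped hot recall example in this topic on the assumption that they correspond to adj_factor and fresh_decay_denom.

```python
import math


def hot_score(click_uv: float, expr: float, item_publish_days: float,
              adj_factor: float = 100.0, fresh_decay_denom: float = 180.0) -> float:
    """click_uv * click_uv / (expr + adj_factor) * exp(-item_publish_days / fresh_decay_denom)."""
    # Smoothed CTR: adj_factor keeps the denominator non-zero and damps
    # items that have very few impressions.
    smoothed_ctr = click_uv / (expr + adj_factor)
    # Freshness term: older items receive an exponentially smaller score.
    freshness = math.exp(-item_publish_days / fresh_decay_denom)
    return click_uv * smoothed_ctr * freshness


# An item with 50 unique clicking users and 400 impressions, published today.
fresh_item = hot_score(click_uv=50, expr=400, item_publish_days=0)
# The same statistics for an item published 30 days ago score lower.
older_item = hot_score(click_uv=50, expr=400, item_publish_days=30)
```

With these defaults, an item that has one click on one impression scores about 0.01 instead of being treated as having a CTR of 1, which is the effect of the adjustment factor described above.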

    In the configuration editor, set parameters such as Recall Model Name (for example, global_hot), recall time window (for example, 15 days), recall count (for example, 500), Exposure Behavior Event, Click Behavior Event, the Hot Score Formula switch, recall engine (for example, FeatureStore), and Version.

    Etrec u2i recall

    etrec is an item-based collaborative filtering algorithm. For details, see Collaborative filtering etrec.

    In the Edit Configuration dialog box, set the Recall Model Name (for example, etrec) and recall engine (for example, Hologres). In the U2I Behavior Weight section, you can add events and their corresponding weights, such as a weight of 0 for expr, 1 for click, and 3 for praise.

    | Parameter | Description |
    | --- | --- |
    | Training days | The number of days of behavior logs used for training. The default is 30. You can adjust this value based on the log volume. |
    | Recall count | The number of user-to-item recommendations to generate offline. |
    | U2I trigger | Items the user has interacted with (for example, by clicking, favoriting, or purchasing). This typically excludes items the user only viewed (impressions). |
    | Behavior time window | The number of days of behavior data to collect. The default is 15, which includes data from the most recent 15 days. |
    | Behavior time attenuation coefficient | A value between 0 and 1. A larger value indicates that past behaviors decay more rapidly and therefore have less weight in constructing the trigger_item. |
    | Trigger selection count | The number of item IDs per user to use for the Cartesian product with the I2I data generated by etrec. A value between 10 and 50 is recommended. A large value can generate an excessive number of recall candidates. |
    | U2I behavior weight | For exposure events, either do not set a weight or set the weight to 0. We recommend not setting exposure events, thereby skipping user exposure data. |
    | I2I model settings | etrec parameter settings. For details, see Collaborative filtering etrec. We recommend not setting a high value for the number of related items. After enabling the I2I model, configure the following parameters: set similarity calculation strategy (sim_type) to wbcosine (options: asymcosine, jaccard); set related item selection count (top_n) to 500; set maximum behaviors per user (max_bhv) to 500; set minimum behaviors per user (min_bhv) to 2; set calculation strategy (operator) to add (options: mul, min, max); set similarity calculation weight coefficient (weight) to 1; and set decay coefficient (alpha) to 0.5. After completing the configuration, click Confirm. |
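To make the interaction between the behavior weights, the trigger selection count, and the recall count more concrete, here is a rough Python sketch of a u2i step. It is not the generated DataWorks code. The function names, the score aggregation (trigger weight multiplied by i2i similarity, summed per candidate), and the exclusion of trigger items from the result are all assumptions made for illustration, and time decay is left out because the source does not give its formula.

```python
from collections import defaultdict

# Example weights from this topic: impressions are ignored.
EVENT_WEIGHT = {"expr": 0.0, "click": 1.0, "praise": 3.0}


def select_triggers(behaviors, trigger_count=10):
    """behaviors: list of (item_id, event). Returns the top weighted trigger items."""
    scores = defaultdict(float)
    for item_id, event in behaviors:
        weight = EVENT_WEIGHT.get(event, 0.0)
        if weight > 0:  # skip exposure-only items
            scores[item_id] += weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:trigger_count]


def u2i_candidates(triggers, i2i, recall_count=500):
    """Join trigger items with an i2i table {item: [(related_item, sim), ...]}."""
    trigger_ids = {item_id for item_id, _ in triggers}
    scores = defaultdict(float)
    for trigger_id, trigger_weight in triggers:
        for related_id, sim in i2i.get(trigger_id, []):
            if related_id not in trigger_ids:  # assumption: do not recall triggers
                scores[related_id] += trigger_weight * sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item_id for item_id, _ in ranked[:recall_count]]


behaviors = [(1, "expr"), (2, "click"), (3, "praise"), (2, "click")]
i2i = {2: [(10, 0.5), (3, 0.9)], 3: [(10, 0.2), (11, 0.4)]}
triggers = select_triggers(behaviors)
candidates = u2i_candidates(triggers, i2i)
```

The sketch also shows why a large trigger selection count inflates the candidate set: every extra trigger pulls in up to top_n related items before the list is cut to the recall count.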

    Grouped hot recall

    You can set up rankings based on attributes such as city and gender to provide initial personalized recall. In the following example, a combination of gender and the bucket ID of a numerical feature is used for grouping.

    On the Edit Configuration page, set recall type to grouped hot recall, Recall Model Name to group_hot, recall time window to 15, and recall count to 500. Set exposure behavior event to expr and click behavior event to click. Enable the Hot Score Formula Settings switch, then set hot score exponent adjustment factor to 100, hot score freshness decay to true, and hot score freshness decay factor to 180. In the User Group Trigger section, click Add to add features: gender (with no bucket boundaries) and follow_cnt (with bucket boundaries of 1,5,10,20). In the Behavior Group Trigger section, add the feature net_type (with no bucket boundaries). Set recall engine to Hologres and Version to 1, then click Confirm.
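The grouping in this example can be pictured with a small Python sketch. It is illustrative only: the way PAI-Rec numbers buckets and concatenates the group key is not stated in this topic, so the bucket convention (bisect_right over the boundaries) and the underscore-joined key format are assumptions, as are the sample values "male" and "wifi".

```python
from bisect import bisect_right

FOLLOW_CNT_BOUNDARIES = (1, 5, 10, 20)  # bucket boundaries from the example above


def bucket_id(value, boundaries=FOLLOW_CNT_BOUNDARIES):
    """Map a numerical feature to a bucket index (assumed convention).

    Values below the first boundary fall in bucket 0; values at or above
    the last boundary fall in the last bucket.
    """
    return bisect_right(boundaries, value)


def group_key(gender, follow_cnt, net_type):
    """Combine the user and behavior group triggers into one hot-list key."""
    return f"{gender}_{bucket_id(follow_cnt)}_{net_type}"


key = group_key("male", 7, "wifi")
```

Each distinct key gets its own hot ranking, so adding more features or more bucket boundaries multiplies the number of lists and thins out the data behind each one.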

    Swing u2i recall

    Swing is a method for calculating item relevance that measures item similarity based on the User-Item-User pattern. In the Edit Configuration dialog box, configure the following parameters: set recall type to swing u2i recall, Recall Model Name to swing, training days to 30, and recall count to 500. Set recall engine to Hologres, disable U2I trigger, set behavior time window to 15, decay coefficient to 0.2, and trigger selection count to 10. In the U2I Behavior Weight section, click Add to add behavior weights: set the weight for expr to 0, click to 1, and praise to 3.

    The parameters for the Swing I2I model include: related item selection count (top_n, example: 500), maximum clicks per user (max_click_per_user, example: 600), maximum users per item (max_user_per_item, example: 700), maximum time span (max_time_span, example: 1), adjustment coefficients (alpha1=5, alpha2=1, beta=0.3), item weight calculation method (norm_method, option: COUNT), and Version (version). When you are finished, click Confirm.
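For intuition about the adjustment coefficients, the following Python sketch implements one commonly cited formulation of Swing. It is not necessarily what PAI-Rec runs: the mapping of alpha1, alpha2, and beta onto the formula is an assumption, and limits such as max_click_per_user, max_user_per_item, and max_time_span are omitted. In this formulation each user gets a weight of (alpha2 + number of clicked items) to the power of -beta, and every pair of users who both clicked items i and j contributes the product of their weights divided by (alpha1 + size of their shared click set).

```python
from collections import defaultdict
from itertools import combinations


def swing_similarity(user_items, alpha1=5.0, alpha2=1.0, beta=0.3):
    """user_items: {user: set(items)}. Returns {(item_i, item_j): score} with i < j."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    # Assumed user weight: active users contribute less.
    weight = {u: (alpha2 + len(items)) ** -beta for u, items in user_items.items()}

    sim = defaultdict(float)
    for i, j in combinations(sorted(item_users), 2):
        common_users = sorted(item_users[i] & item_users[j])
        for u, v in combinations(common_users, 2):
            # Users who overlap on many items are weaker evidence for i~j.
            overlap = len(user_items[u] & user_items[v])
            sim[(i, j)] += weight[u] * weight[v] / (alpha1 + overlap)
    return dict(sim)


clicks = {"u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"a", "c"}}
scores = swing_similarity(clicks)
```

In the toy data, items a and b are linked by a pair of users (u1 and u2) and receive a positive score, while a and c share only one user and therefore get no score at all.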

    Vector recall

    Two vector recall methods are provided: DSSM and MIND. Configure the recall target as follows:

    • Recall Target Name: Typically indicates whether an item was clicked. Set this to is_click.

    • Recall Target Selection: Set to max(if(event='click', 1, 0)).

      You can refer to the following code:

      select max(if(event='click',1,0)) is_click ,...
      from ${behavior_table}
      where dt between '${bizdate_start}' and '${bizdate_end}'
      group by req_id,user_id,item

      Where:

      • ${behavior_table}: The behavior table.

      • ${bizdate_start}: The start date of the behavior time window.

      • ${bizdate_end}: The end date of the behavior time window.

      • event: The event field in the ${behavior_table} table. Select a value based on your specific schema.

      • is_click: The target name.

      The formulas for dimension calculation are as follows:

      EMB_SQRT4_STEP8: ((8 + Pow(count, 0.25)) / 8) * 8
      EMB_SQRT4_STEP4: ((4 + Pow(count, 0.25)) / 4) * 4
      EMB_LN_STEP8:    ((8 + Log(count + 1)) / 8) * 8
      EMB_LN_STEP4:    ((4 + Log(count + 1)) / 4) * 4

      Here, count is the number of feature enumeration values. Use the Log function when a feature has a large number of unique values.
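The dimension formulas can be evaluated with a short Python sketch. One point is an assumption rather than something this topic states: the quotient is truncated to an integer before it is multiplied by the step, so that the result is a multiple of 4 or 8 (without truncation the expression would simply reduce to step plus the growth term). The function name embed_dim is hypothetical, and Log is taken to be the natural logarithm.

```python
import math


def embed_dim(count: int, strategy: str = "EMB_SQRT4_STEP4") -> int:
    """Embedding dimension for a feature with `count` enumeration values."""
    base, step_text = strategy.rsplit("_STEP", 1)
    step = int(step_text)
    if base == "EMB_SQRT4":
        growth = count ** 0.25          # fourth root of the cardinality
    elif base == "EMB_LN":
        growth = math.log(count + 1)    # natural log, grows more slowly
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # ASSUMPTION: integer truncation of the quotient, giving multiples of `step`.
    return int((step + growth) / step) * step


# Dimensions for a feature with 10,000 distinct values under each strategy.
dims = {name: embed_dim(10_000, name)
        for name in ("EMB_SQRT4_STEP8", "EMB_SQRT4_STEP4", "EMB_LN_STEP8", "EMB_LN_STEP4")}
```

Under this reading, the logarithmic strategies keep the dimension small even for features with millions of values, which matches the advice to use the Log function for high-cardinality features.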

    In the Edit Configuration panel, set Recall Model Name to dssm and Model Type to dssm. Disable the negative sampling strategy. Enable recall target settings and set target type to CLASSIFICATION. Set training days to 30 and embed_dim strategy to EMB_SQRT4_STEP4. Set recall engine to Hologres. For incremental training, select true and set the incremental training days to 1. For both Asynchronous Training and online inference mode, select false. Set interest count to 1 and enable Sample Weight.

    Set sample weight name (name) to sample_weight and sample weight expression (selection) to ln(sum(playtime) + 1). In Scene data filtering (scene_values), enter the scene values to filter by, separated by commas. Set Version (version) to 1.

    Cold-start recall

    Cold-start recall uses DropoutNet, a two-tower recall model similar to DSSM that consists of a user tower and an item tower. DropoutNet is suitable for head, long-tail, and even brand-new users and items.

    On the Create Configuration page, set the following parameters: set recall type to cold-start recall and Model Type (model_type) to dropoutnet. Enable Recall Target Settings (label) and configure the recall target name as is_click, the recall target selection as max(if(event='click', 1, 0)), and the target type as CLASSIFICATION. Set training days (train_days) to 30 and embed_dim strategy to EMB_SQRT4_STEP4. Set recall engine to Hologres. Set incremental training to true and set the incremental training days to 1. Set both Asynchronous Training and online inference mode to false. Set Version to 1 and click Confirm.

    Global hot fallback

    Global hot fallback recall is similar to global hot recall. Its main purpose is to provide a fallback with a sufficient candidate set if the primary global hot recall engine fails. Therefore, its output is stored in Redis. This output contains only a single row of data. On the Edit Configuration page, set recall type to global hot fallback recall. Set fallback model name (model_name) to global_hot_supplement, recall time window (day_interval) to 15, and fallback recall count (supplement_top_n) to 500. Set exposure behavior event (exposure_event) to expr and click behavior event (click_event) to click. Disable the Hot Score Formula Settings (hot_score) toggle. Set redis data source name (redis_datasource) to redis and Version (version) to 1.

    Collaborative metric i2i recall

    The collaborative metric learning i2i recall model uses session click data to calculate item-to-item similarity. Configure the model parameters: set the recall model name to CoMetricLearningI2I. After enabling Recall Target Settings, set the recall target name to is_click, the recall target selection to max(if(event='click', 1, 0)), and the target type to CLASSIFICATION. For the remaining settings, set training days to 30, set embed_dim strategy to EMB_SQRT4_STEP4, set recall engine to Hologres, set incremental training to false, set incremental training days to 1, set Asynchronous Training to false, and set version to 1. After completing the configuration, click Confirm.

  5. In the Configure Sorting Method node, click Add next to Fine-grained Ranking, configure the parameters as described below, click Confirm, and then click Next.

    Resource configuration

    The platform provides multiple ranking models. For more information, see ranking model. This section explains how to configure the ranking parameters for the DBMTL multi-objective ranking model.

    On the ranking parameter configuration page, set ranking type to Fine-grained Ranking, enter dbmtl for fine-grained ranking model name (model_name), enter playtime for filter field (exclude_field), set training days (train_days) to 30, select dbmtl for model type selection (model_type), select false for incremental training (is_incremental), set incremental training days (incremental_train_days) to 1, select false for asynchronous training (is_async), and set version (version) to 1.

    Click Create next to Refined Ranking Target Settings (labels) to add the following two labels:

    • Target 1: Set target name to is_click, set target expression to max(if(event='click',1,0)), and select classification for target type.

    • Target 2 (Note: The 'l' in 'ln' is a lowercase L): Set target name to ln_playtime, set target expression to ln(sum(playtime)+1), set target dependency to is_click, select regression for target type, and then click OK.
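The two target expressions can be read as aggregations over the behavior log. The Python sketch below mirrors them on in-memory rows. It is only an illustration: the grouping key (request_id, user_id, item_id) is an assumption modeled on the group by clause shown for vector recall, and build_labels is a hypothetical name, not part of the generated code.

```python
import math
from collections import defaultdict


def build_labels(rows):
    """rows: iterable of dicts with request_id, user_id, item_id, event, playtime."""
    is_click = defaultdict(int)
    total_playtime = defaultdict(float)
    for row in rows:
        key = (row["request_id"], row["user_id"], row["item_id"])
        # max(if(event='click', 1, 0))
        is_click[key] = max(is_click[key], 1 if row["event"] == "click" else 0)
        # sum(playtime), treating missing values as 0
        total_playtime[key] += row["playtime"] or 0.0
    return {
        key: {
            "is_click": is_click[key],
            # ln(sum(playtime) + 1): the +1 keeps the log defined at zero playtime
            "ln_playtime": math.log(total_playtime[key] + 1),
        }
        for key in is_click
    }


rows = [
    {"request_id": "r1", "user_id": "u1", "item_id": "i1", "event": "expr", "playtime": None},
    {"request_id": "r1", "user_id": "u1", "item_id": "i1", "event": "click", "playtime": math.e - 1},
    {"request_id": "r1", "user_id": "u1", "item_id": "i2", "event": "expr", "playtime": 0.0},
]
labels = build_labels(rows)
```

The dependency of ln_playtime on is_click reflects that playtime is only meaningful for clicked samples: in the toy data, the unclicked item i2 has is_click 0 and ln_playtime 0.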

  6. In the Generate Script node, click Generate Deployment Script.

    Important

    After the deployment script is generated, the system provides an OSS address that contains all the deployment files. You can save this address to run the deployment script manually later.

  7. Once the script is generated, click OK in the dialog box to go to the Recommendation Solution Customization > Deployment History page.

    If script generation fails, check the run logs to identify and fix the error, then regenerate the script.

6. Deploy the recommendation

Once the script is generated, you can deploy it to DataWorks in one of two ways.

Method 1: Integrated deployment

  1. Click Go to Deploy to the right of the target solution. On the Deployment Records page, you can filter records by scenario and Deployment Status. Find the target recommendation solution, such as pai_rec_testdemo_v1, and confirm that its Deployment Status is Ready.

  2. On the Deployment Preview page, in the Diff File section, select the files to deploy. For a first-time deployment, click Select All and then click Deploy to DataWorks.

    You are returned to the Deployment History page, where the Deployment Status of the pai_rec_testdemo_v1 recommendation solution in the HomePage scenario is shown as Running.

  3. Wait a moment, then click the refresh icon to refresh the list and check the deployment status.

    • If the deployment fails, click View Log in the Actions column. Analyze and resolve the specific error, and then regenerate and deploy the script.

    • When the Deployment Status changes to Successful, you can go to DataStudio in the DataWorks workspace configured for this solution to view the deployed code. For more information, see Data Development: developer. In the left-side navigation pane of DataStudio, expand Business Flow > Workflow > MaxCompute > Data Development to view the deployed node folders, including feature, feature_v726, rank_v1, recall_v1 (with etrec_u2i_recall and global_hot_recall sub-nodes), and test.

  4. View the task data backfill process.

    1. On the Recommendation Solution Customization > Deployment History page, click Details in the Actions column for the successfully deployed solution.

    2. On the Deployment Preview page, click View Task Data Backfill Process to review the process and its instructions.

    3. Ensure that the partitions for the user table, item table, and user behavior table contain data for the most recent n days, where n is the sum of the training time window and the maximum feature time window. If you are using the demo data from this topic, synchronize the latest data partitions. If you are generating data with a Python script, perform a data backfill in the DataWorks Operation Center to generate the latest partitions.

    4. Click Create Data Refill Task in the upper-left corner, and then click Start Task by Order under the Data Refill Task List. Ensure that all tasks run successfully. If a task fails, click Details to view the log, resolve the error, and then rerun the task. After the rerun succeeds, click Continue Run in the upper-left corner of the page, and repeat until all tasks succeed.

      The data backfill DAG shows the complete feature engineering pipeline. At the top are the three data source tables (pairec_demo.rec_sln_demo_item_table_v1, pairec_demo.rec_sln_demo_behavior_table_v1, and pairec_demo.rec_sln_demo_user_table_v1). Their data passes in sequence through preprocessing, wide table construction (such as behavior_table_v1_wide), aggregate statistics (such as item_id_30d_agg and user_id_30d_agg), and static feature extraction to produce the full feature tables (such as item_table_v1_all_feat and user_table_v1_all_feat). Finally, a PyODPS3 node named create_sync_onlinestore synchronizes the data to the online feature store. Each node card shows the start and end dates and the node's position in the backfill order.
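The partition requirement in the backfill steps above (data for the most recent n days, where n is the training time window plus the maximum feature time window) can be checked with a short sketch before you start the refill task. The values below are assumptions for illustration: a 30-day training window (the train_days value used in this topic), a 30-day maximum feature window, and partition values formatted as yyyymmdd. The function names are hypothetical, and the list of existing partitions would come from your own MaxCompute tables.

```python
from datetime import date, timedelta


def required_partitions(latest, train_days, max_feature_days):
    """Return the n most recent daily partitions, newest first.

    n = train_days + max_feature_days, as described in the backfill steps.
    Assumption: partitions are daily and formatted as yyyymmdd.
    """
    n = train_days + max_feature_days
    return [(latest - timedelta(days=i)).strftime("%Y%m%d") for i in range(n)]


def missing_partitions(existing, latest, train_days, max_feature_days):
    """List required partitions that are absent from `existing`."""
    have = set(existing)
    return [
        p
        for p in required_partitions(latest, train_days, max_feature_days)
        if p not in have
    ]


if __name__ == "__main__":
    # Hypothetical example: 30 training days plus a 30-day feature window.
    needed = required_partitions(date(2024, 8, 15), 30, 30)
    print(len(needed), "partitions needed, oldest:", needed[-1])
```

Run the check once per source table (user, item, and user behavior); any partition it reports as missing needs to be synchronized or backfilled first.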

Method 2: Migration Assistant

After the script is generated, you can also manually deploy it using Migration Assistant in the DataWorks console. The key parameters are described below. For other operations, see Create and view a DataWorks import task.

  • Import Name: Set this according to the console prompt.

  • Upload Method: Select OSS File, enter the OSS Link, and click Verify.

    The deployment file is stored at the OSS address generated in Step 5, for example, oss://examplebucket/algoconfig/plan/1723717372/package.zip. To get the file's URL, log on to the OSS console and follow these steps:

    1. In the left-side navigation pane of the OSS console, click Bucket List.

    2. Go to the target bucket and click File List.

    3. Find and click the target file, such as package.zip.

    4. In the details panel that appears on the right, enable the HTTPS toggle, and then click Copy File URL to get a signed URL. Use this URL as the OSS Link.
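To locate the deployment file in the OSS console, it helps to split the OSS address into its bucket name and object path. The following is a minimal sketch using only the Python standard library; the function name split_oss_uri is hypothetical, and the sketch does not generate the signed URL, which still comes from the console as described above.

```python
from urllib.parse import urlparse


def split_oss_uri(uri):
    """Split an oss://bucket/key address into (bucket, object_key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "oss" or not parsed.netloc:
        raise ValueError(f"not an OSS address: {uri!r}")
    key = parsed.path.lstrip("/")
    if not key:
        raise ValueError(f"OSS address has no object path: {uri!r}")
    return parsed.netloc, key


if __name__ == "__main__":
    bucket, key = split_oss_uri(
        "oss://examplebucket/algoconfig/plan/1723717372/package.zip"
    )
    print("Bucket:", bucket)
    print("Object:", key)
```

The bucket name is the entry to open under Bucket List, and the object path is the folder hierarchy to follow under File List.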

7. Freeze nodes

After the backfill completes, freeze the tasks in the Operation Center (the three nodes in step 2.2) to prevent them from running daily.

In the DataWorks Operation Center, navigate to Auto Triggered Task O&M > Recurring Job. Search for the node (for example, rec_sln_demo_user_table_v1), select the matching entry (shown in the format Workspace.node name), and choose Suspend (Freeze).