Feature production best practices

更新时间:
复制 MD 格式

FeatureStore, a feature platform in Platform for AI (PAI), standardizes common feature engineering operations so you can generate hundreds of features from raw source tables with just a few lines of code. This guide walks you through a recommendation scenario: starting from three source tables, you'll generate user and item feature tables, then merge everything into a complete training dataset ready for model training.

What you'll build

By the end of this guide, you'll have:

  • Three preprocessed source tables (user, item, and behavior) synced into your MaxCompute project

  • A wide behavior table and a label table derived from those sources

  • Two user feature tables generated using AggregationTransform and WindowTransform

  • Two item feature tables generated the same way

  • A final training dataset (fs_demo_fs_engineering_v1_training_set) that joins all seven tables

Prerequisites

Before you begin, make sure the following are in place:

PAI

MaxCompute

DataWorks

How it works

This guide follows four stages:

  1. Prepare — Sync the three demo source tables from pai_online_project into your MaxCompute project, and install the FeatureStore Python SDK.

  2. Transform — Merge and preprocess the source tables to produce a wide behavior table and a label table.

  3. Generate features — Run FeatureStore pipelines to produce aggregation and window feature tables for both users and items.

  4. Build the training dataset — Join all feature tables into a single sample table.

Step 1: Prepare source tables and SDK

Sync source tables

The three source tables for this tutorial are stored in the shared pai_online_project workspace. They contain simulation data only. Use DataWorks to copy them into your own MaxCompute project.

TableContents
rec_sln_demo_user_table_preprocess_v1Basic user features: gender, age, city, number of followers
rec_sln_demo_item_table_preprocess_v1Basic item features: category, author, cumulative clicks, cumulative praises
rec_sln_demo_behavior_table_preprocess_v1Behavior events: which user clicked which item and when

To sync the tables:

  1. Log on to the DataWorks console.

  2. In the left navigation pane, click Data Development and O&M > Data Development.

  3. Select your DataWorks workspace and click Go to Data Studio.

  4. Hover over Create, then choose Create Node > MaxCompute > ODPS SQL. Configure the node parameters:

    ParameterValue
    Node TypeODPS SQL
    PathBusiness Flow/Workflow/MaxCompute
    NameEnter a custom name
  5. Click Confirm.

  6. In the SQL editor, run the following statements. Each block creates the table structure and loads data from the date range 20240530 to 20240605. User table:

    CREATE TABLE IF NOT EXISTS rec_sln_demo_user_table_preprocess_v1
    LIKE pai_online_project.rec_sln_demo_user_table_preprocess_v1
    STORED AS ALIORC
    LIFECYCLE 90;
    
    INSERT OVERWRITE TABLE rec_sln_demo_user_table_preprocess_v1 PARTITION(ds)
    SELECT *
    FROM pai_online_project.rec_sln_demo_user_table_preprocess_v1
    WHERE ds >= '20240530' AND ds <= '20240605';

    Item table:

    CREATE TABLE IF NOT EXISTS rec_sln_demo_item_table_preprocess_v1
    LIKE pai_online_project.rec_sln_demo_item_table_preprocess_v1
    STORED AS ALIORC
    LIFECYCLE 90;
    
    INSERT OVERWRITE TABLE rec_sln_demo_item_table_preprocess_v1 PARTITION(ds)
    SELECT *
    FROM pai_online_project.rec_sln_demo_item_table_preprocess_v1
    WHERE ds >= '20240530' AND ds <= '20240605';

    Behavior table:

    CREATE TABLE IF NOT EXISTS rec_sln_demo_behavior_table_preprocess_v1
    LIKE pai_online_project.rec_sln_demo_behavior_table_preprocess_v1
    STORED AS ALIORC
    LIFECYCLE 90;
    
    INSERT OVERWRITE TABLE rec_sln_demo_behavior_table_preprocess_v1 PARTITION(ds)
    SELECT *
    FROM pai_online_project.rec_sln_demo_behavior_table_preprocess_v1
    WHERE ds >= '20240530' AND ds <= '20240605';

Install the FeatureStore SDK

Run the following in a Jupyter Notebook (Python 3):

%pip install https://feature-store-py.oss-cn-beijing.aliyuncs.com/package/feature_store_py-2.0.2-py3-none-any.whl

Then import the modules you'll need throughout this guide:

import os
from feature_store_py import FeatureStoreClient
from feature_store_py.fs_datasource import MaxComputeDataSource
from feature_store_py.feature_engineering import (
    TableTransform, Condition, DayOf, ComboTransform, Feature,
    AggregationTransform, auto_count_feature_transform,
    WindowTransform, auto_window_feature_transform
)

Step 2: Transform source tables

Before feature generation can begin, the three source tables need to be reshaped into two working tables: a wide behavior table that merges all three sources, and a label table that extracts training labels from the behavior data.

Use the ODPS SQL node you created in step 1 to run both SQL blocks below.

Create the wide behavior table

This statement joins user attributes, item attributes, and behavior events into a single denormalized table that the FeatureStore SDK uses for feature computation.

CREATE TABLE IF NOT EXISTS rec_sln_demo_behavior_table_preprocess_wide_v1
(
    request_id bigint
    ,user_id string
    ,page string
    ,net_type string
    ,day_h bigint COMMENT 'The hour at which the behavior occurred.'
    ,week_day bigint COMMENT 'The day of the week on which the behavior occurred.'
    ,event_unix_time bigint
    ,item_id string
    ,event string
    ,playtime double
    ,gender string
    ,age bigint
    ,city string
    ,item_cnt bigint
    ,follow_cnt bigint
    ,follower_cnt bigint
    ,is_new_user bigint
    ,tags string
    ,duration double
    ,category string
    ,author bigint
    ,click_count bigint
    ,praise_count bigint
    ,is_new_item bigint
)
PARTITIONED BY
(
    ds string
)
LIFECYCLE 90;

INSERT OVERWRITE TABLE rec_sln_demo_behavior_table_preprocess_wide_v1 PARTITION(ds='${bdp.system.bizdate}')
SELECT  sq0.request_id
        ,sq0.user_id
        ,sq0.page
        ,sq0.net_type
        ,sq0.day_h
        ,sq0.week_day
        ,sq0.event_unix_time
        ,sq0.item_id
        ,sq0.event
        ,sq0.playtime
        ,sq1.gender
        ,sq1.age
        ,sq1.city
        ,sq1.item_cnt
        ,sq1.follow_cnt
        ,sq1.follower_cnt
        ,sq1.is_new_user
        ,sq1.tags
        ,sq2.duration
        ,sq2.category
        ,sq2.author
        ,sq2.click_count
        ,sq2.praise_count
        ,sq2.is_new_item
FROM    (
            SELECT  *
            FROM    rec_sln_demo_behavior_table_preprocess_v1
            WHERE   ds = '${bdp.system.bizdate}'
        ) sq0
LEFT JOIN (
              SELECT  *
              FROM    rec_sln_demo_user_table_preprocess_v1
              WHERE   ds = '${bdp.system.bizdate}'
          ) sq1
ON      sq0.user_id = sq1.user_id
LEFT JOIN (
              SELECT  *
              FROM    rec_sln_demo_item_table_preprocess_v1
              WHERE   ds = '${bdp.system.bizdate}'
          ) sq2
ON      sq0.item_id = sq2.item_id;

Create the label table

This statement derives training labels from the behavior table. The labels are is_click (whether the user clicked), ln_playtime (log-transformed total playtime), and is_praise (whether the user praised the item).

CREATE TABLE IF NOT EXISTS rec_sln_demo_fs_label_table_v1
(
    request_id bigint
    ,user_id string
    ,page string
    ,net_type string
    ,day_h bigint COMMENT 'The hour at which the behavior occurred.'
    ,week_day bigint COMMENT 'The day of the week on which the behavior occurred.'
    ,day_min string
    ,event_unix_time bigint
    ,item_id string
    ,playtime double
    ,is_click BIGINT
    ,ln_playtime DOUBLE
    ,is_praise BIGINT
)
PARTITIONED BY
(
    ds string
)
LIFECYCLE 90;

INSERT OVERWRITE TABLE rec_sln_demo_fs_label_table_v1 PARTITION(ds='${bdp.system.bizdate}')
SELECT  request_id
        ,user_id
        ,MAX(page) page
        ,MAX(net_type) net_type
        ,MAX(day_h) day_h
        ,MAX(week_day) week_day
        ,TO_CHAR(FROM_UNIXTIME(MIN(event_unix_time)),'yyyymmddhhmi') day_min
        ,MAX(event_unix_time) event_unix_time
        ,item_id
        ,MAX(playtime) playtime
        ,MAX(IF(event='click', 1, 0)) is_click
        ,LN(SUM(playtime) + 1) ln_playtime
        ,MAX(IF(event='praise', 1, 0)) is_praise
FROM    rec_sln_demo_behavior_table_preprocess_v1
WHERE   ds = '${bdp.system.bizdate}'
GROUP BY request_id, user_id, item_id;

After both statements complete, you'll have:

  • rec_sln_demo_behavior_table_preprocess_wide_v1 — the input for feature generation

  • rec_sln_demo_fs_label_table_v1 — the input for the final sample table

Step 3: Generate features

FeatureStore's automatic extension functions let you generate hundreds of statistical and window features from a single function call, rather than writing aggregation SQL manually. The steps below generate features separately for users and items.

All examples use the FeatureStore SDK pipeline API: define the data sources and transforms, then call execute() to run the computation in MaxCompute.

Initialize the client

# Load credentials from environment variables — never hardcode keys in source code
access_key_id = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
access_key_secret = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")

project = 'project_name'   # Your MaxCompute project name
region = 'cn-hangzhou'     # The region where your project resides

fs_client = FeatureStoreClient(
    access_key_id=access_key_id,
    access_key_secret=access_key_secret,
    region=region
)

Bind source tables to data sources

input_bhv_table_name = "rec_sln_demo_behavior_table_preprocess_wide_v1"
ds_bhv = MaxComputeDataSource(table=input_bhv_table_name, project=project)

input_user_table_name = "rec_sln_demo_user_table_preprocess_v1"
ds_user = MaxComputeDataSource(table=input_user_table_name, project=project)

input_item_table_name = "rec_sln_demo_item_table_preprocess_v1"
ds_item = MaxComputeDataSource(table=input_item_table_name, project=project)

Generate user features

AggregationTransform

auto_count_feature_transform generates aggregation features (sum, max, min, average) for each metric across multiple time windows — in this case, 3-, 7-, and 15-day windows grouped by user_id.

Define features:

agg_user_table_v1 = 'rec_sln_demo_user_table_preprocess_agg_v1'

name_prefix = "user"
input_list = ["playtime", "duration", "click_count", "praise_count"]
event_name = 'event'
event_type = 'expr'
group_by_key = "user_id"
window_size = [3, 7, 15]

user_count_feature_list = auto_count_feature_transform(
    name_prefix, input_list, event_name, event_type, group_by_key, window_size
)
print("len_count_feature_list =", len(user_count_feature_list))
print("count_feature_list =", user_count_feature_list)

Create the pipeline:

agg_user_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(user_count_feature_list)
agg_user_pipeline = fs_client.create_pipeline(ds_user, agg_user_table_v1).merge(
    agg_user_bhv_pipeline, keep_input_columns=False
)

Run and verify:

execute_date = '20240605'
output_agg_user_table = agg_user_pipeline.execute(execute_date, drop_table=True)

# Preview the first 20 rows
agg_user_ret = output_agg_user_table.to_pandas(execute_date, limit=20)
agg_user_ret

WindowTransform

auto_window_feature_transform generates cross-dimensional window features — for example, how a user's click count varies by hour of day or content category across different time windows.

Define features:

win_user_table_v1 = 'rec_sln_demo_user_table_preprocess_win_v1'

name_prefix = 'user'
input_list = ['day_h', 'category']
agg_field = ['duration', 'click_count']
event_name = 'event'
event_type = 'expr'
group_by_key = 'user_id'
window_size = [7, 15, 30, 45]

user_win_feature_list = auto_window_feature_transform(
    name_prefix, input_list, agg_field, event_name, event_type, group_by_key, window_size
)
print("len_user_win_feature_list =", len(user_win_feature_list))
print("user_win_feature_list =", user_win_feature_list)

Create the pipeline:

win_user_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(user_win_feature_list)
win_user_pipeline = fs_client.create_pipeline(ds_user, win_user_table_v1).merge(
    win_user_bhv_pipeline, keep_input_columns=False
)

Run and verify:

execute_date = '20240605'
output_win_user_table = win_user_pipeline.execute(execute_date, drop_table=True)

# Note: WindowTransform uses intermediate tables. On the first run, backfill all
# historical partitions by adding backfill_partitions=True — this may take a while:
# output_win_user_table = win_user_pipeline.execute(execute_date, drop_table=True, backfill_partitions=True)

win_user_ret = output_win_user_table.to_pandas(execute_date, limit=20)
win_user_ret

Generate item features

The item feature generation follows the same pattern as user features, with item_id as the grouping key.

AggregationTransform

Define features:

agg_item_table_v1 = 'rec_sln_demo_item_table_preprocess_agg_v1'

name_prefix = "item"
input_list = ["item_cnt", "follow_cnt", "follower_cnt"]
event_name = 'event'
event_type = 'expr'
group_by_key = "item_id"
window_size = [3, 7, 15]

item_count_feature_list = auto_count_feature_transform(
    name_prefix, input_list, event_name, event_type, group_by_key, window_size
)
print("len_count_feature_list =", len(item_count_feature_list))
print("count_feature_list =", item_count_feature_list)

Create the pipeline:

agg_item_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(item_count_feature_list)
agg_item_pipeline = fs_client.create_pipeline(ds_item, agg_item_table_v1).merge(
    agg_item_bhv_pipeline, keep_input_columns=False
)

Run and verify:

execute_date = '20240605'
output_agg_item_table = agg_item_pipeline.execute(execute_date, drop_table=True)

agg_item_ret = output_agg_item_table.to_pandas(execute_date, limit=20)
agg_item_ret

WindowTransform

Define features:

win_item_table_v1 = 'rec_sln_demo_item_table_preprocess_win_v1'

name_prefix = 'item'
input_list = ['day_h', 'category']
agg_field = ['click_count', 'praise_count']
event_name = 'event'
event_type = 'expr'
group_by_key = 'item_id'
window_size = [7, 15, 30, 45]

item_win_feature_list = auto_window_feature_transform(
    name_prefix, input_list, agg_field, event_name, event_type, group_by_key, window_size
)
print("len_item_win_feature_list =", len(item_win_feature_list))
print("item_win_feature_list =", item_win_feature_list)

Create the pipeline:

win_item_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(item_win_feature_list)
win_item_pipeline = fs_client.create_pipeline(ds_item, win_item_table_v1).merge(
    win_item_bhv_pipeline, keep_input_columns=False
)

Run and verify:

execute_date = '20240605'
output_win_item_table = win_item_pipeline.execute(execute_date, drop_table=True)

# On the first run, add backfill_partitions=True if intermediate partitions are missing:
# output_win_item_table = win_item_pipeline.execute(execute_date, drop_table=True, backfill_partitions=True)

win_item_ret = output_win_item_table.to_pandas(execute_date, limit=20)
win_item_ret

Step 4: Build the training dataset

At this point you have seven tables: one label table, three user-side tables (source, aggregation, window), and three item-side tables (source, aggregation, window). The SQL below joins all seven into a single training dataset using LEFT JOINs keyed on user_id and item_id.

Replace <project_name> with your actual MaxCompute project name before running.

5. Train the model

After you obtain the sample table, you can directly use it for model training. For more information, see Model training.

CREATE TABLE IF NOT EXISTS fs_demo_fs_engineering_v1_training_set(
     request_id BIGINT,
     user_id STRING,
     page STRING,
     net_type STRING,
     day_h BIGINT,
     week_day BIGINT,
     day_min STRING,
     event_unix_time BIGINT,
     item_id STRING,
     playtime DOUBLE,
     is_click BIGINT,
     ln_playtime DOUBLE,
     is_praise BIGINT,
     duration DOUBLE,
     category STRING,
     author BIGINT,
     click_count BIGINT,
     praise_count BIGINT,
     is_new_item BIGINT,
     item__sum_item_cnt_3d BIGINT,
     item__sum_follow_cnt_3d BIGINT,
     item__sum_follower_cnt_3d BIGINT,
     item__max_item_cnt_3d BIGINT,
     item__max_follow_cnt_3d BIGINT,
     item__max_follower_cnt_3d BIGINT,
     item__min_item_cnt_3d BIGINT,
     item__min_follow_cnt_3d BIGINT,
     item__min_follower_cnt_3d BIGINT,
     item__avg_item_cnt_3d DOUBLE,
     item__avg_follow_cnt_3d DOUBLE,
     item__avg_follower_cnt_3d DOUBLE,
     item__sum_item_cnt_7d BIGINT,
     item__sum_follow_cnt_7d BIGINT,
     item__sum_follower_cnt_7d BIGINT,
     item__max_item_cnt_7d BIGINT,
     item__max_follow_cnt_7d BIGINT,
     item__max_follower_cnt_7d BIGINT,
     item__min_item_cnt_7d BIGINT,
     item__min_follow_cnt_7d BIGINT,
     item__min_follower_cnt_7d BIGINT,
     item__avg_item_cnt_7d DOUBLE,
     item__avg_follow_cnt_7d DOUBLE,
     item__avg_follower_cnt_7d DOUBLE,
     item__sum_item_cnt_15d BIGINT,
     item__sum_follow_cnt_15d BIGINT,
     item__sum_follower_cnt_15d BIGINT,
     item__max_item_cnt_15d BIGINT,
     item__max_follow_cnt_15d BIGINT,
     item__max_follower_cnt_15d BIGINT,
     item__min_item_cnt_15d BIGINT,
     item__min_follow_cnt_15d BIGINT,
     item__min_follower_cnt_15d BIGINT,
     item__avg_item_cnt_15d DOUBLE,
     item__avg_follow_cnt_15d DOUBLE,
     item__avg_follower_cnt_15d DOUBLE,
     item__kv_day_h_click_count_sum_7d STRING,
     item__kv_category_click_count_sum_7d STRING,
     item__kv_day_h_praise_count_sum_7d STRING,
     item__kv_category_praise_count_sum_7d STRING,
     item__kv_day_h_click_count_max_7d STRING,
     item__kv_category_click_count_max_7d STRING,
     item__kv_day_h_praise_count_max_7d STRING,
     item__kv_category_praise_count_max_7d STRING,
     item__kv_day_h_click_count_min_7d STRING,
     item__kv_category_click_count_min_7d STRING,
     item__kv_day_h_praise_count_min_7d STRING,
     item__kv_category_praise_count_min_7d STRING,
     item__kv_day_h_click_count_avg_7d STRING,
     item__kv_category_click_count_avg_7d STRING,
     item__kv_day_h_praise_count_avg_7d STRING,
     item__kv_category_praise_count_avg_7d STRING,
     item__kv_day_h_click_count_sum_15d STRING,
     item__kv_category_click_count_sum_15d STRING,
     item__kv_day_h_praise_count_sum_15d STRING,
     item__kv_category_praise_count_sum_15d STRING,
     item__kv_day_h_click_count_max_15d STRING,
     item__kv_category_click_count_max_15d STRING,
     item__kv_day_h_praise_count_max_15d STRING,
     item__kv_category_praise_count_max_15d STRING,
     item__kv_day_h_click_count_min_15d STRING,
     item__kv_category_click_count_min_15d STRING,
     item__kv_day_h_praise_count_min_15d STRING,
     item__kv_category_praise_count_min_15d STRING,
     item__kv_day_h_click_count_avg_15d STRING,
     item__kv_category_click_count_avg_15d STRING,
     item__kv_day_h_praise_count_avg_15d STRING,
     item__kv_category_praise_count_avg_15d STRING,
     item__kv_day_h_click_count_sum_30d STRING,
     item__kv_category_click_count_sum_30d STRING,
     item__kv_day_h_praise_count_sum_30d STRING,
     item__kv_category_praise_count_sum_30d STRING,
     item__kv_day_h_click_count_max_30d STRING,
     item__kv_category_click_count_max_30d STRING,
     item__kv_day_h_praise_count_max_30d STRING,
     item__kv_category_praise_count_max_30d STRING,
     item__kv_day_h_click_count_min_30d STRING,
     item__kv_category_click_count_min_30d STRING,
     item__kv_day_h_praise_count_min_30d STRING,
     item__kv_category_praise_count_min_30d STRING,
     item__kv_day_h_click_count_avg_30d STRING,
     item__kv_category_click_count_avg_30d STRING,
     item__kv_day_h_praise_count_avg_30d STRING,
     item__kv_category_praise_count_avg_30d STRING,
     item__kv_day_h_click_count_sum_45d STRING,
     item__kv_category_click_count_sum_45d STRING,
     item__kv_day_h_praise_count_sum_45d STRING,
     item__kv_category_praise_count_sum_45d STRING,
     item__kv_day_h_click_count_max_45d STRING,
     item__kv_category_click_count_max_45d STRING,
     item__kv_day_h_praise_count_max_45d STRING,
     item__kv_category_praise_count_max_45d STRING,
     item__kv_day_h_click_count_min_45d STRING,
     item__kv_category_click_count_min_45d STRING,
     item__kv_day_h_praise_count_min_45d STRING,
     item__kv_category_praise_count_min_45d STRING,
     item__kv_day_h_click_count_avg_45d STRING,
     item__kv_category_click_count_avg_45d STRING,
     item__kv_day_h_praise_count_avg_45d STRING,
     item__kv_category_praise_count_avg_45d STRING,
     gender STRING,
     age BIGINT,
     city STRING,
     item_cnt BIGINT,
     follow_cnt BIGINT,
     follower_cnt BIGINT,
     is_new_user BIGINT,
     tags STRING,
     user__sum_playtime_3d DOUBLE,
     user__sum_duration_3d DOUBLE,
     user__sum_click_count_3d BIGINT,
     user__sum_praise_count_3d BIGINT,
     user__max_playtime_3d DOUBLE,
     user__max_duration_3d DOUBLE,
     user__max_click_count_3d BIGINT,
     user__max_praise_count_3d BIGINT,
     user__min_playtime_3d DOUBLE,
     user__min_duration_3d DOUBLE,
     user__min_click_count_3d BIGINT,
     user__min_praise_count_3d BIGINT,
     user__avg_playtime_3d DOUBLE,
     user__avg_duration_3d DOUBLE,
     user__avg_click_count_3d DOUBLE,
     user__avg_praise_count_3d DOUBLE,
     user__sum_playtime_7d DOUBLE,
     user__sum_duration_7d DOUBLE,
     user__sum_click_count_7d BIGINT,
     user__sum_praise_count_7d BIGINT,
     user__max_playtime_7d DOUBLE,
     user__max_duration_7d DOUBLE,
     user__max_click_count_7d BIGINT,
     user__max_praise_count_7d BIGINT,
     user__min_playtime_7d DOUBLE,
     user__min_duration_7d DOUBLE,
     user__min_click_count_7d BIGINT,
     user__min_praise_count_7d BIGINT,
     user__avg_playtime_7d DOUBLE,
     user__avg_duration_7d DOUBLE,
     user__avg_click_count_7d DOUBLE,
     user__avg_praise_count_7d DOUBLE,
     user__sum_playtime_15d DOUBLE,
     user__sum_duration_15d DOUBLE,
     user__sum_click_count_15d BIGINT,
     user__sum_praise_count_15d BIGINT,
     user__max_playtime_15d DOUBLE,
     user__max_duration_15d DOUBLE,
     user__max_click_count_15d BIGINT,
     user__max_praise_count_15d BIGINT,
     user__min_playtime_15d DOUBLE,
     user__min_duration_15d DOUBLE,
     user__min_click_count_15d BIGINT,
     user__min_praise_count_15d BIGINT,
     user__avg_playtime_15d DOUBLE,
     user__avg_duration_15d DOUBLE,
     user__avg_click_count_15d DOUBLE,
     user__avg_praise_count_15d DOUBLE,
     user__kv_day_h_duration_sum_7d STRING,
     user__kv_category_duration_sum_7d STRING,
     user__kv_day_h_click_count_sum_7d STRING,
     user__kv_category_click_count_sum_7d STRING,
     user__kv_day_h_duration_max_7d STRING,
     user__kv_category_duration_max_7d STRING,
     user__kv_day_h_click_count_max_7d STRING,
     user__kv_category_click_count_max_7d STRING,
     user__kv_day_h_duration_min_7d STRING,
     user__kv_category_duration_min_7d STRING,
     user__kv_day_h_click_count_min_7d STRING,
     user__kv_category_click_count_min_7d STRING,
     user__kv_day_h_duration_avg_7d STRING,
     user__kv_category_duration_avg_7d STRING,
     user__kv_day_h_click_count_avg_7d STRING,
     user__kv_category_click_count_avg_7d STRING,
     user__kv_day_h_duration_sum_15d STRING,
     user__kv_category_duration_sum_15d STRING,
     user__kv_day_h_click_count_sum_15d STRING,
     user__kv_category_click_count_sum_15d STRING,
     user__kv_day_h_duration_max_15d STRING,
     user__kv_category_duration_max_15d STRING,
     user__kv_day_h_click_count_max_15d STRING,
     user__kv_category_click_count_max_15d STRING,
     user__kv_day_h_duration_min_15d STRING,
     user__kv_category_duration_min_15d STRING,
     user__kv_day_h_click_count_min_15d STRING,
     user__kv_category_click_count_min_15d STRING,
     user__kv_day_h_duration_avg_15d STRING,
     user__kv_category_duration_avg_15d STRING,
     user__kv_day_h_click_count_avg_15d STRING,
     user__kv_category_click_count_avg_15d STRING,
     user__kv_day_h_duration_sum_30d STRING,
     user__kv_category_duration_sum_30d STRING,
     user__kv_day_h_click_count_sum_30d STRING,
     user__kv_category_click_count_sum_30d STRING,
     user__kv_day_h_duration_max_30d STRING,
     user__kv_category_duration_max_30d STRING,
     user__kv_day_h_click_count_max_30d STRING,
     user__kv_category_click_count_max_30d STRING,
     user__kv_day_h_duration_min_30d STRING,
     user__kv_category_duration_min_30d STRING,
     user__kv_day_h_click_count_min_30d STRING,
     user__kv_category_click_count_min_30d STRING,
     user__kv_day_h_duration_avg_30d STRING,
     user__kv_category_duration_avg_30d STRING,
     user__kv_day_h_click_count_avg_30d STRING,
     user__kv_category_click_count_avg_30d STRING,
     user__kv_day_h_duration_sum_45d STRING,
     user__kv_category_duration_sum_45d STRING,
     user__kv_day_h_click_count_sum_45d STRING,
     user__kv_category_click_count_sum_45d STRING,
     user__kv_day_h_duration_max_45d STRING,
     user__kv_category_duration_max_45d STRING,
     user__kv_day_h_click_count_max_45d STRING,
     user__kv_category_click_count_max_45d STRING,
     user__kv_day_h_duration_min_45d STRING,
     user__kv_category_duration_min_45d STRING,
     user__kv_day_h_click_count_min_45d STRING,
     user__kv_category_click_count_min_45d STRING,
     user__kv_day_h_duration_avg_45d STRING,
     user__kv_category_duration_avg_45d STRING,
     user__kv_day_h_click_count_avg_45d STRING,
     user__kv_category_click_count_avg_45d STRING
)
PARTITIONED BY (ds STRING)
LIFECYCLE 90;

INSERT OVERWRITE TABLE fs_demo_fs_engineering_v1_training_set PARTITION (ds = '${bdp.system.bizdate}')
SELECT
sq0.request_id,
sq0.user_id,
sq0.page,
sq0.net_type,
sq0.day_h,
sq0.week_day,
sq0.day_min,
sq0.event_unix_time,
sq0.item_id,
sq0.playtime,
sq0.is_click,
sq0.ln_playtime,
sq0.is_praise,
sq2.duration,
sq2.category,
sq2.author,
sq2.click_count,
sq2.praise_count,
sq2.is_new_item,
sq5.item__sum_item_cnt_3d,
sq5.item__sum_follow_cnt_3d,
sq5.item__sum_follower_cnt_3d,
sq5.item__max_item_cnt_3d,
sq5.item__max_follow_cnt_3d,
sq5.item__max_follower_cnt_3d,
sq5.item__min_item_cnt_3d,
sq5.item__min_follow_cnt_3d,
sq5.item__min_follower_cnt_3d,
sq5.item__avg_item_cnt_3d,
sq5.item__avg_follow_cnt_3d,
sq5.item__avg_follower_cnt_3d,
sq5.item__sum_item_cnt_7d,
sq5.item__sum_follow_cnt_7d,
sq5.item__sum_follower_cnt_7d,
sq5.item__max_item_cnt_7d,
sq5.item__max_follow_cnt_7d,
sq5.item__max_follower_cnt_7d,
sq5.item__min_item_cnt_7d,
sq5.item__min_follow_cnt_7d,
sq5.item__min_follower_cnt_7d,
sq5.item__avg_item_cnt_7d,
sq5.item__avg_follow_cnt_7d,
sq5.item__avg_follower_cnt_7d,
sq5.item__sum_item_cnt_15d,
sq5.item__sum_follow_cnt_15d,
sq5.item__sum_follower_cnt_15d,
sq5.item__max_item_cnt_15d,
sq5.item__max_follow_cnt_15d,
sq5.item__max_follower_cnt_15d,
sq5.item__min_item_cnt_15d,
sq5.item__min_follow_cnt_15d,
sq5.item__min_follower_cnt_15d,
sq5.item__avg_item_cnt_15d,
sq5.item__avg_follow_cnt_15d,
sq5.item__avg_follower_cnt_15d,
sq6.item__kv_day_h_click_count_sum_7d,
sq6.item__kv_category_click_count_sum_7d,
sq6.item__kv_day_h_praise_count_sum_7d,
sq6.item__kv_category_praise_count_sum_7d,
sq6.item__kv_day_h_click_count_max_7d,
sq6.item__kv_category_click_count_max_7d,
sq6.item__kv_day_h_praise_count_max_7d,
sq6.item__kv_category_praise_count_max_7d,
sq6.item__kv_day_h_click_count_min_7d,
sq6.item__kv_category_click_count_min_7d,
sq6.item__kv_day_h_praise_count_min_7d,
sq6.item__kv_category_praise_count_min_7d,
sq6.item__kv_day_h_click_count_avg_7d,
sq6.item__kv_category_click_count_avg_7d,
sq6.item__kv_day_h_praise_count_avg_7d,
sq6.item__kv_category_praise_count_avg_7d,
sq6.item__kv_day_h_click_count_sum_15d,
sq6.item__kv_category_click_count_sum_15d,
sq6.item__kv_day_h_praise_count_sum_15d,
sq6.item__kv_category_praise_count_sum_15d,
sq6.item__kv_day_h_click_count_max_15d,
sq6.item__kv_category_click_count_max_15d,
sq6.item__kv_day_h_praise_count_max_15d,
sq6.item__kv_category_praise_count_max_15d,
sq6.item__kv_day_h_click_count_min_15d,
sq6.item__kv_category_click_count_min_15d,
sq6.item__kv_day_h_praise_count_min_15d,
sq6.item__kv_category_praise_count_min_15d,
sq6.item__kv_day_h_click_count_avg_15d,
sq6.item__kv_category_click_count_avg_15d,
sq6.item__kv_day_h_praise_count_avg_15d,
sq6.item__kv_category_praise_count_avg_15d,
sq6.item__kv_day_h_click_count_sum_30d,
sq6.item__kv_category_click_count_sum_30d,
sq6.item__kv_day_h_praise_count_sum_30d,
sq6.item__kv_category_praise_count_sum_30d,
sq6.item__kv_day_h_click_count_max_30d,
sq6.item__kv_category_click_count_max_30d,
sq6.item__kv_day_h_praise_count_max_30d,
sq6.item__kv_category_praise_count_max_30d,
sq6.item__kv_day_h_click_count_min_30d,
sq6.item__kv_category_click_count_min_30d,
sq6.item__kv_day_h_praise_count_min_30d,
sq6.item__kv_category_praise_count_min_30d,
sq6.item__kv_day_h_click_count_avg_30d,
sq6.item__kv_category_click_count_avg_30d,
sq6.item__kv_day_h_praise_count_avg_30d,
sq6.item__kv_category_praise_count_avg_30d,
sq6.item__kv_day_h_click_count_sum_45d,
sq6.item__kv_category_click_count_sum_45d,
sq6.item__kv_day_h_praise_count_sum_45d,
sq6.item__kv_category_praise_count_sum_45d,
sq6.item__kv_day_h_click_count_max_45d,
sq6.item__kv_category_click_count_max_45d,
sq6.item__kv_day_h_praise_count_max_45d,
sq6.item__kv_category_praise_count_max_45d,
sq6.item__kv_day_h_click_count_min_45d,
sq6.item__kv_category_click_count_min_45d,
sq6.item__kv_day_h_praise_count_min_45d,
sq6.item__kv_category_praise_count_min_45d,
sq6.item__kv_day_h_click_count_avg_45d,
sq6.item__kv_category_click_count_avg_45d,
sq6.item__kv_day_h_praise_count_avg_45d,
sq6.item__kv_category_praise_count_avg_45d,
sq1.gender,
sq1.age,
sq1.city,
sq1.item_cnt,
sq1.follow_cnt,
sq1.follower_cnt,
sq1.is_new_user,
sq1.tags,
sq3.user__sum_playtime_3d,
sq3.user__sum_duration_3d,
sq3.user__sum_click_count_3d,
sq3.user__sum_praise_count_3d,
sq3.user__max_playtime_3d,
sq3.user__max_duration_3d,
sq3.user__max_click_count_3d,
sq3.user__max_praise_count_3d,
sq3.user__min_playtime_3d,
sq3.user__min_duration_3d,
sq3.user__min_click_count_3d,
sq3.user__min_praise_count_3d,
sq3.user__avg_playtime_3d,
sq3.user__avg_duration_3d,
sq3.user__avg_click_count_3d,
sq3.user__avg_praise_count_3d,
sq3.user__sum_playtime_7d,
sq3.user__sum_duration_7d,
sq3.user__sum_click_count_7d,
sq3.user__sum_praise_count_7d,
sq3.user__max_playtime_7d,
sq3.user__max_duration_7d,
sq3.user__max_click_count_7d,
sq3.user__max_praise_count_7d,
sq3.user__min_playtime_7d,
sq3.user__min_duration_7d,
sq3.user__min_click_count_7d,
sq3.user__min_praise_count_7d,
sq3.user__avg_playtime_7d,
sq3.user__avg_duration_7d,
sq3.user__avg_click_count_7d,
sq3.user__avg_praise_count_7d,
sq3.user__sum_playtime_15d,
sq3.user__sum_duration_15d,
sq3.user__sum_click_count_15d,
sq3.user__sum_praise_count_15d,
sq3.user__max_playtime_15d,
sq3.user__max_duration_15d,
sq3.user__max_click_count_15d,
sq3.user__max_praise_count_15d,
sq3.user__min_playtime_15d,
sq3.user__min_duration_15d,
sq3.user__min_click_count_15d,
sq3.user__min_praise_count_15d,
sq3.user__avg_playtime_15d,
sq3.user__avg_duration_15d,
sq3.user__avg_click_count_15d,
sq3.user__avg_praise_count_15d,
sq4.user__kv_day_h_duration_sum_7d,
sq4.user__kv_category_duration_sum_7d,
sq4.user__kv_day_h_click_count_sum_7d,
sq4.user__kv_category_click_count_sum_7d,
sq4.user__kv_day_h_duration_max_7d,
sq4.user__kv_category_duration_max_7d,
sq4.user__kv_day_h_click_count_max_7d,
sq4.user__kv_category_click_count_max_7d,
sq4.user__kv_day_h_duration_min_7d,
sq4.user__kv_category_duration_min_7d,
sq4.user__kv_day_h_click_count_min_7d,
sq4.user__kv_category_click_count_min_7d,
sq4.user__kv_day_h_duration_avg_7d,
sq4.user__kv_category_duration_avg_7d,
sq4.user__kv_day_h_click_count_avg_7d,
sq4.user__kv_category_click_count_avg_7d,
sq4.user__kv_day_h_duration_sum_15d,
sq4.user__kv_category_duration_sum_15d,
sq4.user__kv_day_h_click_count_sum_15d,
sq4.user__kv_category_click_count_sum_15d,
sq4.user__kv_day_h_duration_max_15d,
sq4.user__kv_category_duration_max_15d,
sq4.user__kv_day_h_click_count_max_15d,
sq4.user__kv_category_click_count_max_15d,
sq4.user__kv_day_h_duration_min_15d,
sq4.user__kv_category_duration_min_15d,
sq4.user__kv_day_h_click_count_min_15d,
sq4.user__kv_category_click_count_min_15d,
sq4.user__kv_day_h_duration_avg_15d,
sq4.user__kv_category_duration_avg_15d,
sq4.user__kv_day_h_click_count_avg_15d,
sq4.user__kv_category_click_count_avg_15d,
sq4.user__kv_day_h_duration_sum_30d,
sq4.user__kv_category_duration_sum_30d,
sq4.user__kv_day_h_click_count_sum_30d,
sq4.user__kv_category_click_count_sum_30d,
sq4.user__kv_day_h_duration_max_30d,
sq4.user__kv_category_duration_max_30d,
sq4.user__kv_day_h_click_count_max_30d,
sq4.user__kv_category_click_count_max_30d,
sq4.user__kv_day_h_duration_min_30d,
sq4.user__kv_category_duration_min_30d,
sq4.user__kv_day_h_click_count_min_30d,
sq4.user__kv_category_click_count_min_30d,
sq4.user__kv_day_h_duration_avg_30d,
sq4.user__kv_category_duration_avg_30d,
sq4.user__kv_day_h_click_count_avg_30d,
sq4.user__kv_category_click_count_avg_30d,
sq4.user__kv_day_h_duration_sum_45d,
sq4.user__kv_category_duration_sum_45d,
sq4.user__kv_day_h_click_count_sum_45d,
sq4.user__kv_category_click_count_sum_45d,
sq4.user__kv_day_h_duration_max_45d,
sq4.user__kv_category_duration_max_45d,
sq4.user__kv_day_h_click_count_max_45d,
sq4.user__kv_category_click_count_max_45d,
sq4.user__kv_day_h_duration_min_45d,
sq4.user__kv_category_duration_min_45d,
sq4.user__kv_day_h_click_count_min_45d,
sq4.user__kv_category_click_count_min_45d,
sq4.user__kv_day_h_duration_avg_45d,
sq4.user__kv_category_duration_avg_45d,
sq4.user__kv_day_h_click_count_avg_45d,
sq4.user__kv_category_click_count_avg_45d
FROM
(
    SELECT *
    FROM rec_sln_demo_fs_label_table_v1
    WHERE ds = '${bdp.system.bizdate}'
) sq0
LEFT JOIN (
    SELECT *
    FROM <project_name>.rec_sln_demo_user_table_preprocess_v1
    WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq1 ON sq0.user_id = sq1.user_id
LEFT JOIN (
    SELECT *
    FROM <project_name>.rec_sln_demo_item_table_preprocess_v1
    WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq2 ON sq0.item_id = sq2.item_id
LEFT JOIN (
    SELECT *
    FROM <project_name>.rec_sln_demo_user_table_preprocess_agg_v1
    WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq3 ON sq0.user_id = sq3.user_id
LEFT JOIN (
    SELECT *
    FROM <project_name>.rec_sln_demo_user_table_preprocess_win_v1
    WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq4 ON sq0.user_id = sq4.user_id
LEFT JOIN (
    SELECT *
    FROM <project_name>.rec_sln_demo_item_table_preprocess_agg_v1
    WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq5 ON sq0.item_id = sq5.item_id
LEFT JOIN (
    SELECT *
    FROM <project_name>.rec_sln_demo_item_table_preprocess_win_v1
    WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq6 ON sq0.item_id = sq6.item_id;

The result is fs_demo_fs_engineering_v1_training_set, a partitioned table with a 90-day lifecycle that combines all user features, item features, and training labels in a single row per request.

What's next