FeatureStore, a feature platform in Platform for AI (PAI), standardizes common feature engineering operations so you can generate hundreds of features from raw source tables with just a few lines of code. This guide walks you through a recommendation scenario: starting from three source tables, you'll generate user and item feature tables, then merge everything into a complete training dataset ready for model training.
What you'll build
By the end of this guide, you'll have:
Three preprocessed source tables (user, item, and behavior) synced into your MaxCompute project
A wide behavior table and a label table derived from those sources
Two user feature tables generated using
AggregationTransformandWindowTransformTwo item feature tables generated the same way
A final training dataset (
fs_demo_fs_engineering_v1_training_set) that joins all seven tables
Prerequisites
Before you begin, make sure the following are in place:
PAI
Prepare your AccessKey ID and AccessKey secret. Store them as environment variables rather than hardcoding them in your code. For setup instructions, see Create an AccessKey pair and Configure the authentication AccessKey in the environment variable
MaxCompute
DataWorks
Activate DataWorks
Purchase an exclusive resource group for scheduling and attach it to your DataWorks workspace. See Use an exclusive resource group for scheduling
Add a MaxCompute data source to your DataWorks workspace. See Bind a MaxCompute computing resource
How it works
This guide follows four stages:
Prepare — Sync the three demo source tables from
pai_online_projectinto your MaxCompute project, and install the FeatureStore Python SDK.Transform — Merge and preprocess the source tables to produce a wide behavior table and a label table.
Generate features — Run FeatureStore pipelines to produce aggregation and window feature tables for both users and items.
Build the training dataset — Join all feature tables into a single sample table.
Step 1: Prepare source tables and SDK
Sync source tables
The three source tables for this tutorial are stored in the shared pai_online_project workspace. They contain simulation data only. Use DataWorks to copy them into your own MaxCompute project.
| Table | Contents |
|---|---|
rec_sln_demo_user_table_preprocess_v1 | Basic user features: gender, age, city, number of followers |
rec_sln_demo_item_table_preprocess_v1 | Basic item features: category, author, cumulative clicks, cumulative praises |
rec_sln_demo_behavior_table_preprocess_v1 | Behavior events: which user clicked which item and when |
To sync the tables:
Log on to the DataWorks console.
In the left navigation pane, click Data Development and O&M > Data Development.
Select your DataWorks workspace and click Go to Data Studio.
Hover over Create, then choose Create Node > MaxCompute > ODPS SQL. Configure the node parameters:
Parameter Value Node Type ODPS SQL Path Business Flow/Workflow/MaxCompute Name Enter a custom name Click Confirm.
In the SQL editor, run the following statements. Each block creates the table structure and loads data from the date range
20240530to20240605. User table:CREATE TABLE IF NOT EXISTS rec_sln_demo_user_table_preprocess_v1 LIKE pai_online_project.rec_sln_demo_user_table_preprocess_v1 STORED AS ALIORC LIFECYCLE 90; INSERT OVERWRITE TABLE rec_sln_demo_user_table_preprocess_v1 PARTITION(ds) SELECT * FROM pai_online_project.rec_sln_demo_user_table_preprocess_v1 WHERE ds >= '20240530' AND ds <= '20240605';Item table:
CREATE TABLE IF NOT EXISTS rec_sln_demo_item_table_preprocess_v1 LIKE pai_online_project.rec_sln_demo_item_table_preprocess_v1 STORED AS ALIORC LIFECYCLE 90; INSERT OVERWRITE TABLE rec_sln_demo_item_table_preprocess_v1 PARTITION(ds) SELECT * FROM pai_online_project.rec_sln_demo_item_table_preprocess_v1 WHERE ds >= '20240530' AND ds <= '20240605';Behavior table:
CREATE TABLE IF NOT EXISTS rec_sln_demo_behavior_table_preprocess_v1 LIKE pai_online_project.rec_sln_demo_behavior_table_preprocess_v1 STORED AS ALIORC LIFECYCLE 90; INSERT OVERWRITE TABLE rec_sln_demo_behavior_table_preprocess_v1 PARTITION(ds) SELECT * FROM pai_online_project.rec_sln_demo_behavior_table_preprocess_v1 WHERE ds >= '20240530' AND ds <= '20240605';
Install the FeatureStore SDK
Run the following in a Jupyter Notebook (Python 3):
%pip install https://feature-store-py.oss-cn-beijing.aliyuncs.com/package/feature_store_py-2.0.2-py3-none-any.whlThen import the modules you'll need throughout this guide:
import os
from feature_store_py import FeatureStoreClient
from feature_store_py.fs_datasource import MaxComputeDataSource
from feature_store_py.feature_engineering import (
TableTransform, Condition, DayOf, ComboTransform, Feature,
AggregationTransform, auto_count_feature_transform,
WindowTransform, auto_window_feature_transform
)Step 2: Transform source tables
Before feature generation can begin, the three source tables need to be reshaped into two working tables: a wide behavior table that merges all three sources, and a label table that extracts training labels from the behavior data.
Use the ODPS SQL node you created in step 1 to run both SQL blocks below.
Create the wide behavior table
This statement joins user attributes, item attributes, and behavior events into a single denormalized table that the FeatureStore SDK uses for feature computation.
CREATE TABLE IF NOT EXISTS rec_sln_demo_behavior_table_preprocess_wide_v1
(
request_id bigint
,user_id string
,page string
,net_type string
,day_h bigint COMMENT 'The hour at which the behavior occurred.'
,week_day bigint COMMENT 'The day of the week on which the behavior occurred.'
,event_unix_time bigint
,item_id string
,event string
,playtime double
,gender string
,age bigint
,city string
,item_cnt bigint
,follow_cnt bigint
,follower_cnt bigint
,is_new_user bigint
,tags string
,duration double
,category string
,author bigint
,click_count bigint
,praise_count bigint
,is_new_item bigint
)
PARTITIONED BY
(
ds string
)
LIFECYCLE 90;
INSERT OVERWRITE TABLE rec_sln_demo_behavior_table_preprocess_wide_v1 PARTITION(ds='${bdp.system.bizdate}')
SELECT sq0.request_id
,sq0.user_id
,sq0.page
,sq0.net_type
,sq0.day_h
,sq0.week_day
,sq0.event_unix_time
,sq0.item_id
,sq0.event
,sq0.playtime
,sq1.gender
,sq1.age
,sq1.city
,sq1.item_cnt
,sq1.follow_cnt
,sq1.follower_cnt
,sq1.is_new_user
,sq1.tags
,sq2.duration
,sq2.category
,sq2.author
,sq2.click_count
,sq2.praise_count
,sq2.is_new_item
FROM (
SELECT *
FROM rec_sln_demo_behavior_table_preprocess_v1
WHERE ds = '${bdp.system.bizdate}'
) sq0
LEFT JOIN (
SELECT *
FROM rec_sln_demo_user_table_preprocess_v1
WHERE ds = '${bdp.system.bizdate}'
) sq1
ON sq0.user_id = sq1.user_id
LEFT JOIN (
SELECT *
FROM rec_sln_demo_item_table_preprocess_v1
WHERE ds = '${bdp.system.bizdate}'
) sq2
ON sq0.item_id = sq2.item_id;Create the label table
This statement derives training labels from the behavior table. The labels are is_click (whether the user clicked), ln_playtime (log-transformed total playtime), and is_praise (whether the user praised the item).
CREATE TABLE IF NOT EXISTS rec_sln_demo_fs_label_table_v1
(
request_id bigint
,user_id string
,page string
,net_type string
,day_h bigint COMMENT 'The hour at which the behavior occurred.'
,week_day bigint COMMENT 'The day of the week on which the behavior occurred.'
,day_min string
,event_unix_time bigint
,item_id string
,playtime double
,is_click BIGINT
,ln_playtime DOUBLE
,is_praise BIGINT
)
PARTITIONED BY
(
ds string
)
LIFECYCLE 90;
INSERT OVERWRITE TABLE rec_sln_demo_fs_label_table_v1 PARTITION(ds='${bdp.system.bizdate}')
SELECT request_id
,user_id
,MAX(page) page
,MAX(net_type) net_type
,MAX(day_h) day_h
,MAX(week_day) week_day
,TO_CHAR(FROM_UNIXTIME(MIN(event_unix_time)),'yyyymmddhhmi') day_min
,MAX(event_unix_time) event_unix_time
,item_id
,MAX(playtime) playtime
,MAX(IF(event='click', 1, 0)) is_click
,LN(SUM(playtime) + 1) ln_playtime
,MAX(IF(event='praise', 1, 0)) is_praise
FROM rec_sln_demo_behavior_table_preprocess_v1
WHERE ds = '${bdp.system.bizdate}'
GROUP BY request_id, user_id, item_id;After both statements complete, you'll have:
rec_sln_demo_behavior_table_preprocess_wide_v1— the input for feature generationrec_sln_demo_fs_label_table_v1— the input for the final sample table
Step 3: Generate features
FeatureStore's automatic extension functions let you generate hundreds of statistical and window features from a single function call, rather than writing aggregation SQL manually. The steps below generate features separately for users and items.
All examples use the FeatureStore SDK pipeline API: define the data sources and transforms, then call execute() to run the computation in MaxCompute.
Initialize the client
# Load credentials from environment variables — never hardcode keys in source code
access_key_id = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
access_key_secret = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")
project = 'project_name' # Your MaxCompute project name
region = 'cn-hangzhou' # The region where your project resides
fs_client = FeatureStoreClient(
access_key_id=access_key_id,
access_key_secret=access_key_secret,
region=region
)Bind source tables to data sources
input_bhv_table_name = "rec_sln_demo_behavior_table_preprocess_wide_v1"
ds_bhv = MaxComputeDataSource(table=input_bhv_table_name, project=project)
input_user_table_name = "rec_sln_demo_user_table_preprocess_v1"
ds_user = MaxComputeDataSource(table=input_user_table_name, project=project)
input_item_table_name = "rec_sln_demo_item_table_preprocess_v1"
ds_item = MaxComputeDataSource(table=input_item_table_name, project=project)Generate user features
AggregationTransform
auto_count_feature_transform generates aggregation features (sum, max, min, average) for each metric across multiple time windows — in this case, 3-, 7-, and 15-day windows grouped by user_id.
Define features:
agg_user_table_v1 = 'rec_sln_demo_user_table_preprocess_agg_v1'
name_prefix = "user"
input_list = ["playtime", "duration", "click_count", "praise_count"]
event_name = 'event'
event_type = 'expr'
group_by_key = "user_id"
window_size = [3, 7, 15]
user_count_feature_list = auto_count_feature_transform(
name_prefix, input_list, event_name, event_type, group_by_key, window_size
)
print("len_count_feature_list =", len(user_count_feature_list))
print("count_feature_list =", user_count_feature_list)Create the pipeline:
agg_user_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(user_count_feature_list)
agg_user_pipeline = fs_client.create_pipeline(ds_user, agg_user_table_v1).merge(
agg_user_bhv_pipeline, keep_input_columns=False
)Run and verify:
execute_date = '20240605'
output_agg_user_table = agg_user_pipeline.execute(execute_date, drop_table=True)
# Preview the first 20 rows
agg_user_ret = output_agg_user_table.to_pandas(execute_date, limit=20)
agg_user_retWindowTransform
auto_window_feature_transform generates cross-dimensional window features — for example, how a user's click count varies by hour of day or content category across different time windows.
Define features:
win_user_table_v1 = 'rec_sln_demo_user_table_preprocess_win_v1'
name_prefix = 'user'
input_list = ['day_h', 'category']
agg_field = ['duration', 'click_count']
event_name = 'event'
event_type = 'expr'
group_by_key = 'user_id'
window_size = [7, 15, 30, 45]
user_win_feature_list = auto_window_feature_transform(
name_prefix, input_list, agg_field, event_name, event_type, group_by_key, window_size
)
print("len_user_win_feature_list =", len(user_win_feature_list))
print("user_win_feature_list =", user_win_feature_list)Create the pipeline:
win_user_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(user_win_feature_list)
win_user_pipeline = fs_client.create_pipeline(ds_user, win_user_table_v1).merge(
win_user_bhv_pipeline, keep_input_columns=False
)Run and verify:
execute_date = '20240605'
output_win_user_table = win_user_pipeline.execute(execute_date, drop_table=True)
# Note: WindowTransform uses intermediate tables. On the first run, backfill all
# historical partitions by adding backfill_partitions=True — this may take a while:
# output_win_user_table = win_user_pipeline.execute(execute_date, drop_table=True, backfill_partitions=True)
win_user_ret = output_win_user_table.to_pandas(execute_date, limit=20)
win_user_retGenerate item features
The item feature generation follows the same pattern as user features, with item_id as the grouping key.
AggregationTransform
Define features:
agg_item_table_v1 = 'rec_sln_demo_item_table_preprocess_agg_v1'
name_prefix = "item"
input_list = ["item_cnt", "follow_cnt", "follower_cnt"]
event_name = 'event'
event_type = 'expr'
group_by_key = "item_id"
window_size = [3, 7, 15]
item_count_feature_list = auto_count_feature_transform(
name_prefix, input_list, event_name, event_type, group_by_key, window_size
)
print("len_count_feature_list =", len(item_count_feature_list))
print("count_feature_list =", item_count_feature_list)Create the pipeline:
agg_item_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(item_count_feature_list)
agg_item_pipeline = fs_client.create_pipeline(ds_item, agg_item_table_v1).merge(
agg_item_bhv_pipeline, keep_input_columns=False
)Run and verify:
execute_date = '20240605'
output_agg_item_table = agg_item_pipeline.execute(execute_date, drop_table=True)
agg_item_ret = output_agg_item_table.to_pandas(execute_date, limit=20)
agg_item_retWindowTransform
Define features:
win_item_table_v1 = 'rec_sln_demo_item_table_preprocess_win_v1'
name_prefix = 'item'
input_list = ['day_h', 'category']
agg_field = ['click_count', 'praise_count']
event_name = 'event'
event_type = 'expr'
group_by_key = 'item_id'
window_size = [7, 15, 30, 45]
item_win_feature_list = auto_window_feature_transform(
name_prefix, input_list, agg_field, event_name, event_type, group_by_key, window_size
)
print("len_item_win_feature_list =", len(item_win_feature_list))
print("item_win_feature_list =", item_win_feature_list)Create the pipeline:
win_item_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(item_win_feature_list)
win_item_pipeline = fs_client.create_pipeline(ds_item, win_item_table_v1).merge(
win_item_bhv_pipeline, keep_input_columns=False
)Run and verify:
execute_date = '20240605'
output_win_item_table = win_item_pipeline.execute(execute_date, drop_table=True)
# On the first run, add backfill_partitions=True if intermediate partitions are missing:
# output_win_item_table = win_item_pipeline.execute(execute_date, drop_table=True, backfill_partitions=True)
win_item_ret = output_win_item_table.to_pandas(execute_date, limit=20)
win_item_retStep 4: Build the training dataset
At this point you have seven tables: one label table, three user-side tables (source, aggregation, window), and three item-side tables (source, aggregation, window). The SQL below joins all seven into a single training dataset using LEFT JOINs keyed on user_id and item_id.
Replace <project_name> with your actual MaxCompute project name before running.
5. Train the model
After you obtain the sample table, you can directly use it for model training. For more information, see Model training.
CREATE TABLE IF NOT EXISTS fs_demo_fs_engineering_v1_training_set(
request_id BIGINT,
user_id STRING,
page STRING,
net_type STRING,
day_h BIGINT,
week_day BIGINT,
day_min STRING,
event_unix_time BIGINT,
item_id STRING,
playtime DOUBLE,
is_click BIGINT,
ln_playtime DOUBLE,
is_praise BIGINT,
duration DOUBLE,
category STRING,
author BIGINT,
click_count BIGINT,
praise_count BIGINT,
is_new_item BIGINT,
item__sum_item_cnt_3d BIGINT,
item__sum_follow_cnt_3d BIGINT,
item__sum_follower_cnt_3d BIGINT,
item__max_item_cnt_3d BIGINT,
item__max_follow_cnt_3d BIGINT,
item__max_follower_cnt_3d BIGINT,
item__min_item_cnt_3d BIGINT,
item__min_follow_cnt_3d BIGINT,
item__min_follower_cnt_3d BIGINT,
item__avg_item_cnt_3d DOUBLE,
item__avg_follow_cnt_3d DOUBLE,
item__avg_follower_cnt_3d DOUBLE,
item__sum_item_cnt_7d BIGINT,
item__sum_follow_cnt_7d BIGINT,
item__sum_follower_cnt_7d BIGINT,
item__max_item_cnt_7d BIGINT,
item__max_follow_cnt_7d BIGINT,
item__max_follower_cnt_7d BIGINT,
item__min_item_cnt_7d BIGINT,
item__min_follow_cnt_7d BIGINT,
item__min_follower_cnt_7d BIGINT,
item__avg_item_cnt_7d DOUBLE,
item__avg_follow_cnt_7d DOUBLE,
item__avg_follower_cnt_7d DOUBLE,
item__sum_item_cnt_15d BIGINT,
item__sum_follow_cnt_15d BIGINT,
item__sum_follower_cnt_15d BIGINT,
item__max_item_cnt_15d BIGINT,
item__max_follow_cnt_15d BIGINT,
item__max_follower_cnt_15d BIGINT,
item__min_item_cnt_15d BIGINT,
item__min_follow_cnt_15d BIGINT,
item__min_follower_cnt_15d BIGINT,
item__avg_item_cnt_15d DOUBLE,
item__avg_follow_cnt_15d DOUBLE,
item__avg_follower_cnt_15d DOUBLE,
item__kv_day_h_click_count_sum_7d STRING,
item__kv_category_click_count_sum_7d STRING,
item__kv_day_h_praise_count_sum_7d STRING,
item__kv_category_praise_count_sum_7d STRING,
item__kv_day_h_click_count_max_7d STRING,
item__kv_category_click_count_max_7d STRING,
item__kv_day_h_praise_count_max_7d STRING,
item__kv_category_praise_count_max_7d STRING,
item__kv_day_h_click_count_min_7d STRING,
item__kv_category_click_count_min_7d STRING,
item__kv_day_h_praise_count_min_7d STRING,
item__kv_category_praise_count_min_7d STRING,
item__kv_day_h_click_count_avg_7d STRING,
item__kv_category_click_count_avg_7d STRING,
item__kv_day_h_praise_count_avg_7d STRING,
item__kv_category_praise_count_avg_7d STRING,
item__kv_day_h_click_count_sum_15d STRING,
item__kv_category_click_count_sum_15d STRING,
item__kv_day_h_praise_count_sum_15d STRING,
item__kv_category_praise_count_sum_15d STRING,
item__kv_day_h_click_count_max_15d STRING,
item__kv_category_click_count_max_15d STRING,
item__kv_day_h_praise_count_max_15d STRING,
item__kv_category_praise_count_max_15d STRING,
item__kv_day_h_click_count_min_15d STRING,
item__kv_category_click_count_min_15d STRING,
item__kv_day_h_praise_count_min_15d STRING,
item__kv_category_praise_count_min_15d STRING,
item__kv_day_h_click_count_avg_15d STRING,
item__kv_category_click_count_avg_15d STRING,
item__kv_day_h_praise_count_avg_15d STRING,
item__kv_category_praise_count_avg_15d STRING,
item__kv_day_h_click_count_sum_30d STRING,
item__kv_category_click_count_sum_30d STRING,
item__kv_day_h_praise_count_sum_30d STRING,
item__kv_category_praise_count_sum_30d STRING,
item__kv_day_h_click_count_max_30d STRING,
item__kv_category_click_count_max_30d STRING,
item__kv_day_h_praise_count_max_30d STRING,
item__kv_category_praise_count_max_30d STRING,
item__kv_day_h_click_count_min_30d STRING,
item__kv_category_click_count_min_30d STRING,
item__kv_day_h_praise_count_min_30d STRING,
item__kv_category_praise_count_min_30d STRING,
item__kv_day_h_click_count_avg_30d STRING,
item__kv_category_click_count_avg_30d STRING,
item__kv_day_h_praise_count_avg_30d STRING,
item__kv_category_praise_count_avg_30d STRING,
item__kv_day_h_click_count_sum_45d STRING,
item__kv_category_click_count_sum_45d STRING,
item__kv_day_h_praise_count_sum_45d STRING,
item__kv_category_praise_count_sum_45d STRING,
item__kv_day_h_click_count_max_45d STRING,
item__kv_category_click_count_max_45d STRING,
item__kv_day_h_praise_count_max_45d STRING,
item__kv_category_praise_count_max_45d STRING,
item__kv_day_h_click_count_min_45d STRING,
item__kv_category_click_count_min_45d STRING,
item__kv_day_h_praise_count_min_45d STRING,
item__kv_category_praise_count_min_45d STRING,
item__kv_day_h_click_count_avg_45d STRING,
item__kv_category_click_count_avg_45d STRING,
item__kv_day_h_praise_count_avg_45d STRING,
item__kv_category_praise_count_avg_45d STRING,
gender STRING,
age BIGINT,
city STRING,
item_cnt BIGINT,
follow_cnt BIGINT,
follower_cnt BIGINT,
is_new_user BIGINT,
tags STRING,
user__sum_playtime_3d DOUBLE,
user__sum_duration_3d DOUBLE,
user__sum_click_count_3d BIGINT,
user__sum_praise_count_3d BIGINT,
user__max_playtime_3d DOUBLE,
user__max_duration_3d DOUBLE,
user__max_click_count_3d BIGINT,
user__max_praise_count_3d BIGINT,
user__min_playtime_3d DOUBLE,
user__min_duration_3d DOUBLE,
user__min_click_count_3d BIGINT,
user__min_praise_count_3d BIGINT,
user__avg_playtime_3d DOUBLE,
user__avg_duration_3d DOUBLE,
user__avg_click_count_3d DOUBLE,
user__avg_praise_count_3d DOUBLE,
user__sum_playtime_7d DOUBLE,
user__sum_duration_7d DOUBLE,
user__sum_click_count_7d BIGINT,
user__sum_praise_count_7d BIGINT,
user__max_playtime_7d DOUBLE,
user__max_duration_7d DOUBLE,
user__max_click_count_7d BIGINT,
user__max_praise_count_7d BIGINT,
user__min_playtime_7d DOUBLE,
user__min_duration_7d DOUBLE,
user__min_click_count_7d BIGINT,
user__min_praise_count_7d BIGINT,
user__avg_playtime_7d DOUBLE,
user__avg_duration_7d DOUBLE,
user__avg_click_count_7d DOUBLE,
user__avg_praise_count_7d DOUBLE,
user__sum_playtime_15d DOUBLE,
user__sum_duration_15d DOUBLE,
user__sum_click_count_15d BIGINT,
user__sum_praise_count_15d BIGINT,
user__max_playtime_15d DOUBLE,
user__max_duration_15d DOUBLE,
user__max_click_count_15d BIGINT,
user__max_praise_count_15d BIGINT,
user__min_playtime_15d DOUBLE,
user__min_duration_15d DOUBLE,
user__min_click_count_15d BIGINT,
user__min_praise_count_15d BIGINT,
user__avg_playtime_15d DOUBLE,
user__avg_duration_15d DOUBLE,
user__avg_click_count_15d DOUBLE,
user__avg_praise_count_15d DOUBLE,
user__kv_day_h_duration_sum_7d STRING,
user__kv_category_duration_sum_7d STRING,
user__kv_day_h_click_count_sum_7d STRING,
user__kv_category_click_count_sum_7d STRING,
user__kv_day_h_duration_max_7d STRING,
user__kv_category_duration_max_7d STRING,
user__kv_day_h_click_count_max_7d STRING,
user__kv_category_click_count_max_7d STRING,
user__kv_day_h_duration_min_7d STRING,
user__kv_category_duration_min_7d STRING,
user__kv_day_h_click_count_min_7d STRING,
user__kv_category_click_count_min_7d STRING,
user__kv_day_h_duration_avg_7d STRING,
user__kv_category_duration_avg_7d STRING,
user__kv_day_h_click_count_avg_7d STRING,
user__kv_category_click_count_avg_7d STRING,
user__kv_day_h_duration_sum_15d STRING,
user__kv_category_duration_sum_15d STRING,
user__kv_day_h_click_count_sum_15d STRING,
user__kv_category_click_count_sum_15d STRING,
user__kv_day_h_duration_max_15d STRING,
user__kv_category_duration_max_15d STRING,
user__kv_day_h_click_count_max_15d STRING,
user__kv_category_click_count_max_15d STRING,
user__kv_day_h_duration_min_15d STRING,
user__kv_category_duration_min_15d STRING,
user__kv_day_h_click_count_min_15d STRING,
user__kv_category_click_count_min_15d STRING,
user__kv_day_h_duration_avg_15d STRING,
user__kv_category_duration_avg_15d STRING,
user__kv_day_h_click_count_avg_15d STRING,
user__kv_category_click_count_avg_15d STRING,
user__kv_day_h_duration_sum_30d STRING,
user__kv_category_duration_sum_30d STRING,
user__kv_day_h_click_count_sum_30d STRING,
user__kv_category_click_count_sum_30d STRING,
user__kv_day_h_duration_max_30d STRING,
user__kv_category_duration_max_30d STRING,
user__kv_day_h_click_count_max_30d STRING,
user__kv_category_click_count_max_30d STRING,
user__kv_day_h_duration_min_30d STRING,
user__kv_category_duration_min_30d STRING,
user__kv_day_h_click_count_min_30d STRING,
user__kv_category_click_count_min_30d STRING,
user__kv_day_h_duration_avg_30d STRING,
user__kv_category_duration_avg_30d STRING,
user__kv_day_h_click_count_avg_30d STRING,
user__kv_category_click_count_avg_30d STRING,
user__kv_day_h_duration_sum_45d STRING,
user__kv_category_duration_sum_45d STRING,
user__kv_day_h_click_count_sum_45d STRING,
user__kv_category_click_count_sum_45d STRING,
user__kv_day_h_duration_max_45d STRING,
user__kv_category_duration_max_45d STRING,
user__kv_day_h_click_count_max_45d STRING,
user__kv_category_click_count_max_45d STRING,
user__kv_day_h_duration_min_45d STRING,
user__kv_category_duration_min_45d STRING,
user__kv_day_h_click_count_min_45d STRING,
user__kv_category_click_count_min_45d STRING,
user__kv_day_h_duration_avg_45d STRING,
user__kv_category_duration_avg_45d STRING,
user__kv_day_h_click_count_avg_45d STRING,
user__kv_category_click_count_avg_45d STRING
)
PARTITIONED BY (ds STRING)
LIFECYCLE 90;
INSERT OVERWRITE TABLE fs_demo_fs_engineering_v1_training_set PARTITION (ds = '${bdp.system.bizdate}')
SELECT
sq0.request_id,
sq0.user_id,
sq0.page,
sq0.net_type,
sq0.day_h,
sq0.week_day,
sq0.day_min,
sq0.event_unix_time,
sq0.item_id,
sq0.playtime,
sq0.is_click,
sq0.ln_playtime,
sq0.is_praise,
sq2.duration,
sq2.category,
sq2.author,
sq2.click_count,
sq2.praise_count,
sq2.is_new_item,
sq5.item__sum_item_cnt_3d,
sq5.item__sum_follow_cnt_3d,
sq5.item__sum_follower_cnt_3d,
sq5.item__max_item_cnt_3d,
sq5.item__max_follow_cnt_3d,
sq5.item__max_follower_cnt_3d,
sq5.item__min_item_cnt_3d,
sq5.item__min_follow_cnt_3d,
sq5.item__min_follower_cnt_3d,
sq5.item__avg_item_cnt_3d,
sq5.item__avg_follow_cnt_3d,
sq5.item__avg_follower_cnt_3d,
sq5.item__sum_item_cnt_7d,
sq5.item__sum_follow_cnt_7d,
sq5.item__sum_follower_cnt_7d,
sq5.item__max_item_cnt_7d,
sq5.item__max_follow_cnt_7d,
sq5.item__max_follower_cnt_7d,
sq5.item__min_item_cnt_7d,
sq5.item__min_follow_cnt_7d,
sq5.item__min_follower_cnt_7d,
sq5.item__avg_item_cnt_7d,
sq5.item__avg_follow_cnt_7d,
sq5.item__avg_follower_cnt_7d,
sq5.item__sum_item_cnt_15d,
sq5.item__sum_follow_cnt_15d,
sq5.item__sum_follower_cnt_15d,
sq5.item__max_item_cnt_15d,
sq5.item__max_follow_cnt_15d,
sq5.item__max_follower_cnt_15d,
sq5.item__min_item_cnt_15d,
sq5.item__min_follow_cnt_15d,
sq5.item__min_follower_cnt_15d,
sq5.item__avg_item_cnt_15d,
sq5.item__avg_follow_cnt_15d,
sq5.item__avg_follower_cnt_15d,
sq6.item__kv_day_h_click_count_sum_7d,
sq6.item__kv_category_click_count_sum_7d,
sq6.item__kv_day_h_praise_count_sum_7d,
sq6.item__kv_category_praise_count_sum_7d,
sq6.item__kv_day_h_click_count_max_7d,
sq6.item__kv_category_click_count_max_7d,
sq6.item__kv_day_h_praise_count_max_7d,
sq6.item__kv_category_praise_count_max_7d,
sq6.item__kv_day_h_click_count_min_7d,
sq6.item__kv_category_click_count_min_7d,
sq6.item__kv_day_h_praise_count_min_7d,
sq6.item__kv_category_praise_count_min_7d,
sq6.item__kv_day_h_click_count_avg_7d,
sq6.item__kv_category_click_count_avg_7d,
sq6.item__kv_day_h_praise_count_avg_7d,
sq6.item__kv_category_praise_count_avg_7d,
sq6.item__kv_day_h_click_count_sum_15d,
sq6.item__kv_category_click_count_sum_15d,
sq6.item__kv_day_h_praise_count_sum_15d,
sq6.item__kv_category_praise_count_sum_15d,
sq6.item__kv_day_h_click_count_max_15d,
sq6.item__kv_category_click_count_max_15d,
sq6.item__kv_day_h_praise_count_max_15d,
sq6.item__kv_category_praise_count_max_15d,
sq6.item__kv_day_h_click_count_min_15d,
sq6.item__kv_category_click_count_min_15d,
sq6.item__kv_day_h_praise_count_min_15d,
sq6.item__kv_category_praise_count_min_15d,
sq6.item__kv_day_h_click_count_avg_15d,
sq6.item__kv_category_click_count_avg_15d,
sq6.item__kv_day_h_praise_count_avg_15d,
sq6.item__kv_category_praise_count_avg_15d,
sq6.item__kv_day_h_click_count_sum_30d,
sq6.item__kv_category_click_count_sum_30d,
sq6.item__kv_day_h_praise_count_sum_30d,
sq6.item__kv_category_praise_count_sum_30d,
sq6.item__kv_day_h_click_count_max_30d,
sq6.item__kv_category_click_count_max_30d,
sq6.item__kv_day_h_praise_count_max_30d,
sq6.item__kv_category_praise_count_max_30d,
sq6.item__kv_day_h_click_count_min_30d,
sq6.item__kv_category_click_count_min_30d,
sq6.item__kv_day_h_praise_count_min_30d,
sq6.item__kv_category_praise_count_min_30d,
sq6.item__kv_day_h_click_count_avg_30d,
sq6.item__kv_category_click_count_avg_30d,
sq6.item__kv_day_h_praise_count_avg_30d,
sq6.item__kv_category_praise_count_avg_30d,
sq6.item__kv_day_h_click_count_sum_45d,
sq6.item__kv_category_click_count_sum_45d,
sq6.item__kv_day_h_praise_count_sum_45d,
sq6.item__kv_category_praise_count_sum_45d,
sq6.item__kv_day_h_click_count_max_45d,
sq6.item__kv_category_click_count_max_45d,
sq6.item__kv_day_h_praise_count_max_45d,
sq6.item__kv_category_praise_count_max_45d,
sq6.item__kv_day_h_click_count_min_45d,
sq6.item__kv_category_click_count_min_45d,
sq6.item__kv_day_h_praise_count_min_45d,
sq6.item__kv_category_praise_count_min_45d,
sq6.item__kv_day_h_click_count_avg_45d,
sq6.item__kv_category_click_count_avg_45d,
sq6.item__kv_day_h_praise_count_avg_45d,
sq6.item__kv_category_praise_count_avg_45d,
sq1.gender,
sq1.age,
sq1.city,
sq1.item_cnt,
sq1.follow_cnt,
sq1.follower_cnt,
sq1.is_new_user,
sq1.tags,
sq3.user__sum_playtime_3d,
sq3.user__sum_duration_3d,
sq3.user__sum_click_count_3d,
sq3.user__sum_praise_count_3d,
sq3.user__max_playtime_3d,
sq3.user__max_duration_3d,
sq3.user__max_click_count_3d,
sq3.user__max_praise_count_3d,
sq3.user__min_playtime_3d,
sq3.user__min_duration_3d,
sq3.user__min_click_count_3d,
sq3.user__min_praise_count_3d,
sq3.user__avg_playtime_3d,
sq3.user__avg_duration_3d,
sq3.user__avg_click_count_3d,
sq3.user__avg_praise_count_3d,
sq3.user__sum_playtime_7d,
sq3.user__sum_duration_7d,
sq3.user__sum_click_count_7d,
sq3.user__sum_praise_count_7d,
sq3.user__max_playtime_7d,
sq3.user__max_duration_7d,
sq3.user__max_click_count_7d,
sq3.user__max_praise_count_7d,
sq3.user__min_playtime_7d,
sq3.user__min_duration_7d,
sq3.user__min_click_count_7d,
sq3.user__min_praise_count_7d,
sq3.user__avg_playtime_7d,
sq3.user__avg_duration_7d,
sq3.user__avg_click_count_7d,
sq3.user__avg_praise_count_7d,
sq3.user__sum_playtime_15d,
sq3.user__sum_duration_15d,
sq3.user__sum_click_count_15d,
sq3.user__sum_praise_count_15d,
sq3.user__max_playtime_15d,
sq3.user__max_duration_15d,
sq3.user__max_click_count_15d,
sq3.user__max_praise_count_15d,
sq3.user__min_playtime_15d,
sq3.user__min_duration_15d,
sq3.user__min_click_count_15d,
sq3.user__min_praise_count_15d,
sq3.user__avg_playtime_15d,
sq3.user__avg_duration_15d,
sq3.user__avg_click_count_15d,
sq3.user__avg_praise_count_15d,
sq4.user__kv_day_h_duration_sum_7d,
sq4.user__kv_category_duration_sum_7d,
sq4.user__kv_day_h_click_count_sum_7d,
sq4.user__kv_category_click_count_sum_7d,
sq4.user__kv_day_h_duration_max_7d,
sq4.user__kv_category_duration_max_7d,
sq4.user__kv_day_h_click_count_max_7d,
sq4.user__kv_category_click_count_max_7d,
sq4.user__kv_day_h_duration_min_7d,
sq4.user__kv_category_duration_min_7d,
sq4.user__kv_day_h_click_count_min_7d,
sq4.user__kv_category_click_count_min_7d,
sq4.user__kv_day_h_duration_avg_7d,
sq4.user__kv_category_duration_avg_7d,
sq4.user__kv_day_h_click_count_avg_7d,
sq4.user__kv_category_click_count_avg_7d,
sq4.user__kv_day_h_duration_sum_15d,
sq4.user__kv_category_duration_sum_15d,
sq4.user__kv_day_h_click_count_sum_15d,
sq4.user__kv_category_click_count_sum_15d,
sq4.user__kv_day_h_duration_max_15d,
sq4.user__kv_category_duration_max_15d,
sq4.user__kv_day_h_click_count_max_15d,
sq4.user__kv_category_click_count_max_15d,
sq4.user__kv_day_h_duration_min_15d,
sq4.user__kv_category_duration_min_15d,
sq4.user__kv_day_h_click_count_min_15d,
sq4.user__kv_category_click_count_min_15d,
sq4.user__kv_day_h_duration_avg_15d,
sq4.user__kv_category_duration_avg_15d,
sq4.user__kv_day_h_click_count_avg_15d,
sq4.user__kv_category_click_count_avg_15d,
sq4.user__kv_day_h_duration_sum_30d,
sq4.user__kv_category_duration_sum_30d,
sq4.user__kv_day_h_click_count_sum_30d,
sq4.user__kv_category_click_count_sum_30d,
sq4.user__kv_day_h_duration_max_30d,
sq4.user__kv_category_duration_max_30d,
sq4.user__kv_day_h_click_count_max_30d,
sq4.user__kv_category_click_count_max_30d,
sq4.user__kv_day_h_duration_min_30d,
sq4.user__kv_category_duration_min_30d,
sq4.user__kv_day_h_click_count_min_30d,
sq4.user__kv_category_click_count_min_30d,
sq4.user__kv_day_h_duration_avg_30d,
sq4.user__kv_category_duration_avg_30d,
sq4.user__kv_day_h_click_count_avg_30d,
sq4.user__kv_category_click_count_avg_30d,
sq4.user__kv_day_h_duration_sum_45d,
sq4.user__kv_category_duration_sum_45d,
sq4.user__kv_day_h_click_count_sum_45d,
sq4.user__kv_category_click_count_sum_45d,
sq4.user__kv_day_h_duration_max_45d,
sq4.user__kv_category_duration_max_45d,
sq4.user__kv_day_h_click_count_max_45d,
sq4.user__kv_category_click_count_max_45d,
sq4.user__kv_day_h_duration_min_45d,
sq4.user__kv_category_duration_min_45d,
sq4.user__kv_day_h_click_count_min_45d,
sq4.user__kv_category_click_count_min_45d,
sq4.user__kv_day_h_duration_avg_45d,
sq4.user__kv_category_duration_avg_45d,
sq4.user__kv_day_h_click_count_avg_45d,
sq4.user__kv_category_click_count_avg_45d
FROM
(
SELECT *
FROM rec_sln_demo_fs_label_table_v1
WHERE ds = '${bdp.system.bizdate}'
) sq0
LEFT JOIN (
SELECT *
FROM <project_name>.rec_sln_demo_user_table_preprocess_v1
WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq1 ON sq0.user_id = sq1.user_id
LEFT JOIN (
SELECT *
FROM <project_name>.rec_sln_demo_item_table_preprocess_v1
WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq2 ON sq0.item_id = sq2.item_id
LEFT JOIN (
SELECT *
FROM <project_name>.rec_sln_demo_user_table_preprocess_agg_v1
WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq3 ON sq0.user_id = sq3.user_id
LEFT JOIN (
SELECT *
FROM <project_name>.rec_sln_demo_user_table_preprocess_win_v1
WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq4 ON sq0.user_id = sq4.user_id
LEFT JOIN (
SELECT *
FROM <project_name>.rec_sln_demo_item_table_preprocess_agg_v1
WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq5 ON sq0.item_id = sq5.item_id
LEFT JOIN (
SELECT *
FROM <project_name>.rec_sln_demo_item_table_preprocess_win_v1
WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq6 ON sq0.item_id = sq6.item_id;The result is fs_demo_fs_engineering_v1_training_set, a partitioned table with a 90-day lifecycle that combines all user features, item features, and training labels in a single row per request.
What's next
To understand the full FeatureStore feature platform, including online serving and feature registration, see FeatureStore overview.
For the complete API reference for feature engineering transforms, see Feature platform and feature engineering.
For technical support, join the DingTalk group 34415007523.