To ensure smooth and fast data analysis, store data in specific formats across different data tables. This topic provides data table storage examples for V3 user tag datasets, Recency, Frequency, Monetary (RFM) datasets, Awareness, Interest, Purchase, and Loyalty (AIPL) datasets, behavioral datasets, and product tag datasets.
User tag dataset example
ADS large wide table example:
user_id | user_id_2 | user_id_3 | Gender | Chocolate preference in the last 90 days | Registration time | Age | Consumption amount in the last 30 days | ... |
|---|---|---|---|---|---|---|---|---|
a0001 | b0001 | c0001 | Male | 0.8 | 2018/10/21 | 23 | 30 | ... |
a0002 | b0002 | c0002 | Female | 0.5 | 2018/10/21 | 52 | 50 | ... |
All fields in the table are examples. You can add custom fields and define custom field names as needed.
`user_id` is the primary key of the table and is a required field. It represents the user identity in the user tag dataset and serves as the unique user identifier for operations such as User Insights and audience selection.
Fields such as `user_id` and `user_id_2` are user ID fields that you can use for marketing and pushes. Other fields are user property fields that you can use for Perspective Analytics.
The supported ID types are OneID, UnionID, mobile phone number, email, Taobao ID, Taobao nickname, Taobao OUID, Alipay ID, Weibo ID, IMEI, IDFA, IMSI, OAID, MAC address, and OpenID.
If you use a mobile phone number as an ID field, ensure that the number is an 11-digit number without the +86 prefix. Otherwise, you cannot use it for SMS Marketing.
Tag processing requirements:
Field type | Processing method | Data format | Scheduling requirement |
|---|---|---|---|
Text (single enumeration value) | Keep the initial value. | string | Update the large wide table daily. |
Text (multiple enumeration values) | Keep the initial value. | string | |
Date | Keep the initial value. | datetime, date | |
Numeric | Keep the initial value. | double, bigint |
Product tag dataset example
ADS large wide table example:
item_id | Product name | Product specification | Purchase threshold | Listing time | Season | Price | Profit margin | ... |
|---|---|---|---|---|---|---|---|---|
A0001 | XXX Jacket | One size | Non-member | 2021/01/20 | Spring/Summer | 1000 | 0.50 | ... |
A0002 | XXX Coat | S/M/L | Member | 2021/01/20 | Winter | 2000 | 0.30 | ... |
All fields in the table are examples. You can add custom fields and define custom field names as needed.
`item_id` is the primary key of the table and is a required field. It represents the product identity in the product tag dataset and serves as the unique product identifier for operations such as product recommendation. We recommend that you use a product ID at the Standard Product Unit (SPU) granularity.
The product name is a required field.
Additional requirements for product recommendation:
Use lowercase for all English letters in table names and field names.
Store the data table in an AnalyticDB (ADB) for MySQL 3.0 data source.
The number of products must be 10 or more.
Audience size × Product pool size ≤ 50 billion. The audience is the group of users who will receive recommendations. The product pool is the range of candidate products filtered from the product tag dataset.
Include only static property tags for products, such as specifications and series. Statistical tags, such as sales in the last 90 days, can affect model training for product recommendation. If you include statistical tags, you must specify them when you create an algorithm model so they can be excluded during model training.
To train an algorithm model that finds associations between product categories, you can use categories as the data content for the product tag dataset. In this case, the category ID is the primary key, and the dataset includes fields such as the category name. You can then use the categories as the behavior objects for the behavioral dataset to create an algorithm model.
Tag processing requirements:
Field type | Processing method | Data format | Scheduling requirement |
|---|---|---|---|
Text (single enumeration value) | Keep the initial value. | string | Update the large wide table daily. |
Date | Keep the initial value. | datetime, date | |
Numeric | Keep the initial value. | double, bigint |
RFM Dataset Example
The RFM dataset supports two types of data tables: customer data and transaction data.
Customer data
You can aggregate the raw data from the last N days into tag data at the customer granularity. The aggregated data volume must be within 100 million rows. Each customer can have only one record in the table.
Customer data table example:
user_id | Last purchase date | Total purchase amount | Total number of purchases | user_id_2 | user_id_3 | user_id_... |
|---|---|---|---|---|---|---|
a0001 | 2019/8/23 12:36:54 | 345.55 | 2 | b0001 | c0001 | ... |
a0002 | 2018/7/12 15:36:54 | 356.1 | 1 | b0002 | c0001 | ... |
Field format requirements:
Field name | Data format | Description |
|---|---|---|
user_id | string | The primary key of the table. This is a required field. It is the user identity for the RFM dataset and serves as the unique user identifier for operations such as User Insights and audience selection. |
Last purchase date | datetime, date | Required field. |
Total purchase amount | double, bigint | Required field. |
Total number of purchases | double, bigint | Required field. |
user_id_2, etc. | string | Optional field. Other user IDs besides the main user identity. |
You can define custom names for all fields.
Fields such as `user_id` and `user_id_2` are user ID fields. You can use them for marketing and pushes.
The supported ID types are OneID, UnionID, mobile phone number, email, Taobao ID, Taobao nickname, Taobao OUID, Alipay ID, Weibo ID, IMEI, IDFA, IMSI, OAID, MAC address, and OpenID.
If you use a mobile phone number as an ID field, ensure that the number is an 11-digit number without the +86 prefix. Otherwise, you cannot use it for SMS Marketing.
Transaction data
You can use this format to analyze small-scale transaction data, with a recommended limit of 100 million rows. Each row represents a user transaction record and must contain the customer identity, transaction date, and transaction amount.
Transaction data table example:
order_id | user_id | Transaction date | Transaction amount | user_id_2 | user_id_3 | user_id_... |
|---|---|---|---|---|---|---|
x1 | a0001 | 2019/6/24 12:56:54 | 124 | b0001 | ... | |
x2 | a0002 | 2019/8/23 12:36:54 | 222.55 | b0002 | c0002 | ... |
x3 | a0003 | 2018/7/12 16:36:54 | 356.1 | b0003 | c0003 | ... |
Field format requirements:
Field name | Data format | Description |
|---|---|---|
order_id | string | The primary key of the table. This is a required field. It is the order ID. |
user_id | string | Required field. It is the user identity for the RFM dataset and serves as the unique user identifier for operations such as User Insights and audience selection. |
Transaction date | datetime, date | Required field. |
Transaction amount | double, bigint | Required field. |
user_id_2, etc. | string | Optional field. Other user IDs besides the main user identity. |
You can define custom names for all fields.
Fields such as `user_id` and `user_id_2` are user ID fields. You can use them for marketing and pushes.
The supported ID types are OneID, UnionID, mobile phone number, email, Taobao ID, Taobao nickname, Taobao OUID, Alipay ID, Weibo ID, IMEI, IDFA, IMSI, OAID, MAC address, and OpenID.
If you use a mobile phone number as an ID field, ensure that the number is an 11-digit number without the +86 prefix. Otherwise, you cannot use it for SMS Marketing.
AIPL Dataset example
To create an AIPL dataset from customer data, the customer data table must contain fields related to the AIPL rules.
For example, assume the AIPL rules are as follows:
A: The user logged in to the official website in the last 90 days, or the registration date is within the last six months.
I: The user added items to favorites in the last 90 days.
P: The user made a purchase in the last 90 days.
L: The user made multiple purchases in the last 90 days, and the total purchase amount is greater than 1,000.
The corresponding ADS large wide table example is:
user_id | user_id_2 | Channel | Registration time | Consumption amount in the last 90 days | Logons in the last 90 days | Times added to favorites in the last 90 days | Purchases in the last 90 days | ... |
|---|---|---|---|---|---|---|---|---|
a0001 | b0001 | Miniapp | 2020/10/10 | 500 | 1 | 2 | 1 | ... |
a0002 | b0002 | app | 2020/10/20 | 3000 | 3 | 5 | 2 | ... |
a0003 | b0003 | Official website | 2021/10/30 | 0 | 1 | 0 | 0 | ... |
All fields in the table are examples. You can add custom fields and define custom field names based on your AIPL rules and business needs.
`user_id` is a required field. It represents the user identity in the AIPL dataset and serves as the unique user identifier for operations such as User Insights and audience selection.
Fields such as `user_id` and `user_id_2` are user ID fields. You can use them for marketing and pushes.
The supported ID types are OneID, UnionID, mobile phone number, email, Taobao ID, Taobao nickname, Taobao OUID, Alipay ID, Weibo ID, IMEI, IDFA, IMSI, OAID, MAC address, and OpenID.
If you use a mobile phone number as an ID field, ensure that the number is an 11-digit number without the +86 prefix. Otherwise, you cannot use it for SMS Marketing.
A single user can have multiple rows of data. For example, one row for each channel for the same user.
Tag processing requirements:
Field type | Processing method | Data format | Scheduling requirement |
|---|---|---|---|
Text (single enumeration value) | Keep the initial value. | string | Update the large wide table daily. |
Text (multiple enumeration values) | Keep the initial value. | string | |
Date | Keep the initial value. | datetime, date | |
Numeric | Keep the initial value. | double, bigint |
To allow the AIPL dataset to distinguish between channels, you can add a dimension table for the channel field. For the table schema, see the following table:
id | Channel |
|---|---|
1 | Miniapp |
2 | app |
3 | Official website |
4 | |
... | ... |
Behavioral dataset example
The data requirements for a behavioral dataset vary based on its use case:
For Behavior Filtering and AIPL model building in the User Insights module.
The following sections provide behavioral dataset examples and field requirements for both use cases. Special requirements for specific use cases are also noted. Prepare your data according to your use case. If your requirements change, you can adjust the original data accordingly.
For User Insights, the ADS behavioral data table example is:
user_id
user_id_2
user_id_...
Behavior type
Behavior time
Behavior count
Behavior amount
Behavior channel
Behavior object property
Behavior object property value
a0001
b0006
c0006
Add to cart
20211023
5
500
Official website
Level-1 category
Women's clothing
a0002
b0002
c0002
Browse
20201225
10
1000
Official website
Level-2 category
Loungewear
a0003
b0003
c0003
Add to favorites
20201023
2
2000
app
Brand
Brand A
a0004
b0004
c0004
Purchase
20200818
1
3000
Miniapp
Product
Product A
For repurchase prediction and product recommendation, the ADS behavioral data table example is:
user_id
user_id_2
user_id_...
Behavior type
Behavior time
Behavior count
Behavior amount
Behavior channel
Behavior object property
Behavior object property value
Behavior object property value ID
a0001
b0001
c0001
Purchase
2021-01-01 10:00:01
1
1000
Outlet 1
Product
Product A
A0001
a0002
b0002
c0002
Purchase
2021-01-01 10:01:01
1
2000
Miniapp
Product
Product B
A0002
a0003
b0003
c0003
Purchase
2021-01-01 10:02:01
1
2000
Official website
Product
Product B
A0002
a0004
b0004
c0004
Purchase
2021-01-01 10:03:01
1
3000
app
Product
Product C
A0003
Field format requirements:
Field name | Data format | Description |
|---|---|---|
user_id | string | Required field. The user identity for the behavioral dataset. It serves as the unique user identifier for operations such as User Insights and audience selection.
|
user_id_2, etc. | string | Optional field. Other user IDs besides the main user identity. |
Behavior type | string | Required field.
|
Behavior time |
| Required field. Note
|
Behavior count | double, bigint | |
Behavior amount | double, bigint | |
Behavior channel | string | |
Behavior object property | string | Required fields. The behavior object property and behavior object property value form a key-value pair. They can record information such as products, brands, and categories.
|
Behavior object property value | string | |
Behavior object property value ID | string |
|
You can define custom names for all fields.
Fields such as `user_id` and `user_id_2` are user ID fields. You can use them for marketing and pushes.
The supported ID types are OneID, UnionID, mobile phone number, email, Taobao ID, Taobao nickname, Taobao OUID, Alipay ID, Weibo ID, IMEI, IDFA, IMSI, OAID, MAC address, and OpenID.
If you use a mobile phone number as an ID field, ensure that the number is an 11-digit number without the +86 prefix. Otherwise, you cannot use it for SMS Marketing.
A single user can have multiple rows of data. A new row is required if the behavior time, behavior type, behavior channel, or behavior object is different.
Additional requirements for repurchase prediction and product recommendation:
Use lowercase for all English letters in table names and field names.
Store the data table in an ADB 3.0 data source.
The number of purchase behavior records must be 1 million or more. The time span must be 730 days or more. The corresponding number of deduplicated users must be between 500,000 and 3 million.
Zero-repurchase scenarios are not supported. This means that among users who made a purchase in the past 365 days from any start date, at least one user must make a repurchase within the next prediction epoch.
For product recommendation, to train an algorithm model that finds associations between product categories, you can use categories as the behavior objects for the behavioral dataset. In this case, the `behavior_object_property`, `behavior_object_property_value`, and `behavior_object_property_value_id` fields record 'category', the category name, and the category ID, respectively. You can then use the categories as the data content for the product tag dataset to create an algorithm model.
To allow the behavioral dataset to distinguish between behavior types, behavior channels, and behavior objects during Behavior Filtering, you can add corresponding dimension tables.
For the table schema of dimension tables for behavior types and behavior channels, see the following table:
id
Behavior type
1
Purchase
2
Add to cart
3
Browse
4
Add to favorites
...
...
There are two types of table schemas for behavior object property dimension tables:
One dimension table for each property: You can save the property values of the same property in one column. For the table schema, see the preceding table. If the property is a product, you can use the product tag table as the dimension table.
One dimension table for all properties: You can save the property values of all properties in the same table. Therefore, the behavior object property and behavior object property value are each saved in a separate column. For the table schema, see the following table:
id
Behavior object property
Behavior object property value
1
Level-1 category
Women's clothing
2
Level-1 category
Men's clothing
3
Level-2 category
Coat
4
Level-2 category
Pants
5
Level-2 category
Dress
...
...
...