Quick Audience dataset examples

更新时间:
复制 MD 格式

To ensure smooth and fast data analysis, store data in specific formats across different data tables. This topic provides data table storage examples for V3 user tag datasets, Recency, Frequency, Monetary (RFM) datasets, Awareness, Interest, Purchase, and Loyalty (AIPL) datasets, behavioral datasets, and product tag datasets.

User tag dataset example

ADS large wide table example:

user_id

user_id_2

user_id_3

Gender

Chocolate preference in the last 90 days

Registration time

Age

Consumption amount in the last 30 days

...

a0001

b0001

c0001

Male

0.8

2018/10/21

23

30

...

a0002

b0002

c0002

Female

0.5

2018/10/21

52

50

...

Note
  • All fields in the table are examples. You can add custom fields and define custom field names as needed.

  • `user_id` is the primary key of the table and is a required field. It represents the user identity in the user tag dataset and serves as the unique user identifier for operations such as User Insights and audience selection.

  • Fields such as `user_id` and `user_id_2` are user ID fields that you can use for marketing and pushes. Other fields are user property fields that you can use for Perspective Analytics.

  • The supported ID types are OneID, UnionID, mobile phone number, email, Taobao ID, Taobao nickname, Taobao OUID, Alipay ID, Weibo ID, IMEI, IDFA, IMSI, OAID, MAC address, and OpenID.

  • If you use a mobile phone number as an ID field, ensure that the number is an 11-digit number without the +86 prefix. Otherwise, you cannot use it for SMS Marketing.

Tag processing requirements:

Field type

Processing method

Data format

Scheduling requirement

Text (single enumeration value)

Keep the initial value.

string

Update the large wide table daily.

Text (multiple enumeration values)

Keep the initial value.

string

Date

Keep the initial value.

datetime, date

Numeric

Keep the initial value.

double, bigint

Product tag dataset example

ADS large wide table example:

item_id

Product name

Product specification

Purchase threshold

Listing time

Season

Price

Profit margin

...

A0001

XXX Jacket

One size

Non-member

2021/01/20

Spring/Summer

1000

0.50

...

A0002

XXX Coat

S/M/L

Member

2021/01/20

Winter

2000

0.30

...

Note

  • All fields in the table are examples. You can add custom fields and define custom field names as needed.

  • `item_id` is the primary key of the table and is a required field. It represents the product identity in the product tag dataset and serves as the unique product identifier for operations such as product recommendation. We recommend that you use a product ID at the Standard Product Unit (SPU) granularity.

  • The product name is a required field.

  • Additional requirements for product recommendation:

    • Use lowercase for all English letters in table names and field names.

    • Store the data table in an AnalyticDB (ADB) for MySQL 3.0 data source.

    • The number of products must be 10 or more.

    • Audience size × Product pool size ≤ 50 billion. The audience is the group of users who will receive recommendations. The product pool is the range of candidate products filtered from the product tag dataset.

    • Include only static property tags for products, such as specifications and series. Statistical tags, such as sales in the last 90 days, can affect model training for product recommendation. If you include statistical tags, you must specify them when you create an algorithm model so they can be excluded during model training.

    • To train an algorithm model that finds associations between product categories, you can use categories as the data content for the product tag dataset. In this case, the category ID is the primary key, and the dataset includes fields such as the category name. You can then use the categories as the behavior objects for the behavioral dataset to create an algorithm model.

Tag processing requirements:

Field type

Processing method

Data format

Scheduling requirement

Text (single enumeration value)

Keep the initial value.

string

Update the large wide table daily.

Date

Keep the initial value.

datetime, date

Numeric

Keep the initial value.

double, bigint

RFM Dataset Example

The RFM dataset supports two types of data tables: customer data and transaction data.

Customer data

You can aggregate the raw data from the last N days into tag data at the customer granularity. The aggregated data volume must be within 100 million rows. Each customer can have only one record in the table.

Customer data table example:

user_id

Last purchase date

Total purchase amount

Total number of purchases

user_id_2

user_id_3

user_id_...

a0001

2019/8/23 12:36:54

345.55

2

b0001

c0001

...

a0002

2018/7/12 15:36:54

356.1

1

b0002

c0001

...

Field format requirements:

Field name

Data format

Description

user_id

string

The primary key of the table. This is a required field. It is the user identity for the RFM dataset and serves as the unique user identifier for operations such as User Insights and audience selection.

Last purchase date

datetime, date

Required field.

Total purchase amount

double, bigint

Required field.

Total number of purchases

double, bigint

Required field.

user_id_2, etc.

string

Optional field. Other user IDs besides the main user identity.

Note
  • You can define custom names for all fields.

  • Fields such as `user_id` and `user_id_2` are user ID fields. You can use them for marketing and pushes.

  • The supported ID types are OneID, UnionID, mobile phone number, email, Taobao ID, Taobao nickname, Taobao OUID, Alipay ID, Weibo ID, IMEI, IDFA, IMSI, OAID, MAC address, and OpenID.

  • If you use a mobile phone number as an ID field, ensure that the number is an 11-digit number without the +86 prefix. Otherwise, you cannot use it for SMS Marketing.

Transaction data

You can use this format to analyze small-scale transaction data, with a recommended limit of 100 million rows. Each row represents a user transaction record and must contain the customer identity, transaction date, and transaction amount.

Transaction data table example:

order_id

user_id

Transaction date

Transaction amount

user_id_2

user_id_3

user_id_...

x1

a0001

2019/6/24 12:56:54

124

b0001

c0001

...

x2

a0002

2019/8/23 12:36:54

222.55

b0002

c0002

...

x3

a0003

2018/7/12 16:36:54

356.1

b0003

c0003

...

Field format requirements:

Field name

Data format

Description

order_id

string

The primary key of the table. This is a required field. It is the order ID.

user_id

string

Required field. It is the user identity for the RFM dataset and serves as the unique user identifier for operations such as User Insights and audience selection.

Transaction date

datetime, date

Required field.

Transaction amount

double, bigint

Required field.

user_id_2, etc.

string

Optional field. Other user IDs besides the main user identity.

Note
  • You can define custom names for all fields.

  • Fields such as `user_id` and `user_id_2` are user ID fields. You can use them for marketing and pushes.

  • The supported ID types are OneID, UnionID, mobile phone number, email, Taobao ID, Taobao nickname, Taobao OUID, Alipay ID, Weibo ID, IMEI, IDFA, IMSI, OAID, MAC address, and OpenID.

  • If you use a mobile phone number as an ID field, ensure that the number is an 11-digit number without the +86 prefix. Otherwise, you cannot use it for SMS Marketing.

AIPL Dataset example

To create an AIPL dataset from customer data, the customer data table must contain fields related to the AIPL rules.

For example, assume the AIPL rules are as follows:

  • A: The user logged in to the official website in the last 90 days, or the registration date is within the last six months.

  • I: The user added items to favorites in the last 90 days.

  • P: The user made a purchase in the last 90 days.

  • L: The user made multiple purchases in the last 90 days, and the total purchase amount is greater than 1,000.

The corresponding ADS large wide table example is:

user_id

user_id_2

Channel

Registration time

Consumption amount in the last 90 days

Logons in the last 90 days

Times added to favorites in the last 90 days

Purchases in the last 90 days

...

a0001

b0001

Miniapp

2020/10/10

500

1

2

1

...

a0002

b0002

app

2020/10/20

3000

3

5

2

...

a0003

b0003

Official website

2021/10/30

0

1

0

0

...

Note
  • All fields in the table are examples. You can add custom fields and define custom field names based on your AIPL rules and business needs.

  • `user_id` is a required field. It represents the user identity in the AIPL dataset and serves as the unique user identifier for operations such as User Insights and audience selection.

  • Fields such as `user_id` and `user_id_2` are user ID fields. You can use them for marketing and pushes.

  • The supported ID types are OneID, UnionID, mobile phone number, email, Taobao ID, Taobao nickname, Taobao OUID, Alipay ID, Weibo ID, IMEI, IDFA, IMSI, OAID, MAC address, and OpenID.

  • If you use a mobile phone number as an ID field, ensure that the number is an 11-digit number without the +86 prefix. Otherwise, you cannot use it for SMS Marketing.

  • A single user can have multiple rows of data. For example, one row for each channel for the same user.

Tag processing requirements:

Field type

Processing method

Data format

Scheduling requirement

Text (single enumeration value)

Keep the initial value.

string

Update the large wide table daily.

Text (multiple enumeration values)

Keep the initial value.

string

Date

Keep the initial value.

datetime, date

Numeric

Keep the initial value.

double, bigint

To allow the AIPL dataset to distinguish between channels, you can add a dimension table for the channel field. For the table schema, see the following table:

id

Channel

1

Miniapp

2

app

3

Official website

4

WeChat

...

...

Behavioral dataset example

The data requirements for a behavioral dataset vary based on its use case:

The following sections provide behavioral dataset examples and field requirements for both use cases. Special requirements for specific use cases are also noted. Prepare your data according to your use case. If your requirements change, you can adjust the original data accordingly.

  • For User Insights, the ADS behavioral data table example is:

    user_id

    user_id_2

    user_id_...

    Behavior type

    Behavior time

    Behavior count

    Behavior amount

    Behavior channel

    Behavior object property

    Behavior object property value

    a0001

    b0006

    c0006

    Add to cart

    20211023

    5

    500

    Official website

    Level-1 category

    Women's clothing

    a0002

    b0002

    c0002

    Browse

    20201225

    10

    1000

    Official website

    Level-2 category

    Loungewear

    a0003

    b0003

    c0003

    Add to favorites

    20201023

    2

    2000

    app

    Brand

    Brand A

    a0004

    b0004

    c0004

    Purchase

    20200818

    1

    3000

    Miniapp

    Product

    Product A

  • For repurchase prediction and product recommendation, the ADS behavioral data table example is:

    user_id

    user_id_2

    user_id_...

    Behavior type

    Behavior time

    Behavior count

    Behavior amount

    Behavior channel

    Behavior object property

    Behavior object property value

    Behavior object property value ID

    a0001

    b0001

    c0001

    Purchase

    2021-01-01 10:00:01

    1

    1000

    Outlet 1

    Product

    Product A

    A0001

    a0002

    b0002

    c0002

    Purchase

    2021-01-01 10:01:01

    1

    2000

    Miniapp

    Product

    Product B

    A0002

    a0003

    b0003

    c0003

    Purchase

    2021-01-01 10:02:01

    1

    2000

    Official website

    Product

    Product B

    A0002

    a0004

    b0004

    c0004

    Purchase

    2021-01-01 10:03:01

    1

    3000

    app

    Product

    Product C

    A0003

Field format requirements:

Field name

Data format

Description

user_id

string

Required field. The user identity for the behavioral dataset. It serves as the unique user identifier for operations such as User Insights and audience selection.

  • For User Insights: No restrictions.

  • For repurchase prediction and product recommendation: To send marketing messages to the audience generated from the prediction results, use an ID type such as a mobile phone number.

user_id_2, etc.

string

Optional field. Other user IDs besides the main user identity.

Behavior type

string

Required field.

  • For User Insights: No restrictions.

  • For repurchase prediction and product recommendation: The algorithm only supports predictions based on purchase behaviors. Data from other behaviors does not affect the prediction results.

    Note

    If other behaviors are included, the algorithm configuration interface provides a field value filtering feature to exclude behaviors other than purchases.

Behavior time

  • For User Insights: datetime, date

  • For repurchase prediction and product recommendation: timestamp, datetime, date

Required field.

Note
  • The data format for the behavior time field is different for repurchase prediction and product recommendation than it is for User Insights. Ensure that you convert the data to the correct format.

  • The behavior time and behavior amount fields cannot be empty when used for repurchase prediction and product recommendation.

Behavior count

double, bigint

Behavior amount

double, bigint

Behavior channel

string

Behavior object property

string

Required fields. The behavior object property and behavior object property value form a key-value pair. They can record information such as products, brands, and categories.

  • For User Insights: No restrictions.

  • For repurchase prediction and product recommendation: The algorithm only supports predictions based on product properties. Data from other properties does not affect the prediction results.

    Note

    If other properties are included, the algorithm configuration interface provides a field value filtering feature to exclude properties other than products.

Behavior object property value

string

Behavior object property value ID

string

  • For User Insights and repurchase prediction: This field is not required.

  • For product recommendation: This is a required field. The field value must match the value of the item_id primary key in the product tag table. This associates the behavior with product information. We recommend using the Standard Product Unit (SPU) granularity.

Note
  • You can define custom names for all fields.

  • Fields such as `user_id` and `user_id_2` are user ID fields. You can use them for marketing and pushes.

  • The supported ID types are OneID, UnionID, mobile phone number, email, Taobao ID, Taobao nickname, Taobao OUID, Alipay ID, Weibo ID, IMEI, IDFA, IMSI, OAID, MAC address, and OpenID.

  • If you use a mobile phone number as an ID field, ensure that the number is an 11-digit number without the +86 prefix. Otherwise, you cannot use it for SMS Marketing.

  • A single user can have multiple rows of data. A new row is required if the behavior time, behavior type, behavior channel, or behavior object is different.

  • Additional requirements for repurchase prediction and product recommendation:

    • Use lowercase for all English letters in table names and field names.

    • Store the data table in an ADB 3.0 data source.

    • The number of purchase behavior records must be 1 million or more. The time span must be 730 days or more. The corresponding number of deduplicated users must be between 500,000 and 3 million.

    • Zero-repurchase scenarios are not supported. This means that among users who made a purchase in the past 365 days from any start date, at least one user must make a repurchase within the next prediction epoch.

    • For product recommendation, to train an algorithm model that finds associations between product categories, you can use categories as the behavior objects for the behavioral dataset. In this case, the `behavior_object_property`, `behavior_object_property_value`, and `behavior_object_property_value_id` fields record 'category', the category name, and the category ID, respectively. You can then use the categories as the data content for the product tag dataset to create an algorithm model.

To allow the behavioral dataset to distinguish between behavior types, behavior channels, and behavior objects during Behavior Filtering, you can add corresponding dimension tables.

  • For the table schema of dimension tables for behavior types and behavior channels, see the following table:

    id

    Behavior type

    1

    Purchase

    2

    Add to cart

    3

    Browse

    4

    Add to favorites

    ...

    ...

  • There are two types of table schemas for behavior object property dimension tables:

    • One dimension table for each property: You can save the property values of the same property in one column. For the table schema, see the preceding table. If the property is a product, you can use the product tag table as the dimension table.

    • One dimension table for all properties: You can save the property values of all properties in the same table. Therefore, the behavior object property and behavior object property value are each saved in a separate column. For the table schema, see the following table:

      id

      Behavior object property

      Behavior object property value

      1

      Level-1 category

      Women's clothing

      2

      Level-1 category

      Men's clothing

      3

      Level-2 category

      Coat

      4

      Level-2 category

      Pants

      5

      Level-2 category

      Dress

      ...

      ...

      ...