The Operational Data Store (ODS) layer, based on the best practices of Alibaba's OneData methodology, stores raw data from your business systems. This data serves as the source for all higher-level data layers. Data from business systems is typically detailed, application-oriented, accumulates over a long period, and is frequently accessed.
Table design for the ODS layer
In this tutorial, the ODS layer primarily contains raw, unprocessed data, such as transaction order details, user information, and product details. This data is logically stored in two-dimensional tables. Although the ODS layer is not strictly part of data warehouse modeling, proper planning and data synchronization for this layer are crucial. This tutorial uses the following six ODS tables:
Records product information for auctions: s_auction.
Records product information for regular sales: s_sale.
Records detailed user information: s_users_extra.
Records new product transaction order information: s_biz_order_delta.
Records new logistics order information: s_logistics_order_delta.
Records new payment order information: s_pay_order_delta.
Keep table and field names consistent with the business system. However, use additional identifiers to distinguish between incremental and full tables. In Dataphin, fact-based models with the `di` suffix represent incremental (transaction) tables, and models with the `df` suffix represent full (periodic snapshot) tables.
Be aware of potential naming conflicts. For example, tables from different business systems might have the same name. To distinguish between them, you can add the source database name as a prefix or suffix. If a field name is the same as a keyword, you can resolve the conflict by adding a suffix, such as
_col1, as defined by your naming convention.
Table creation examples
You can use the ad hoc query feature to write SQL statements and create the required ODS tables. The following code provides examples of table creation statements.
CREATE TABLE IF NOT EXISTS s_auction
(
id STRING COMMENT 'Product ID',
title STRING COMMENT 'Product name',
gmt_modified STRING COMMENT 'Last modified date of the product',
price DOUBLE COMMENT 'Product transaction price, in CNY',
starts STRING COMMENT 'Product listing time',
minimum_bid DOUBLE COMMENT 'Starting bid for the auction item, in CNY',
duration STRING COMMENT 'Validity period, sales cycle, in days',
incrementnum DOUBLE COMMENT 'Bid increment for the auction',
city STRING COMMENT 'City where the product is located',
prov STRING COMMENT 'Province where the product is located',
ends STRING COMMENT 'Sales end time',
quantity BIGINT COMMENT 'Quantity',
stuff_status BIGINT COMMENT 'Product condition: 0 for New, 1 for Unused, 2 for Used',
auction_status BIGINT COMMENT 'Product status: 0 for Normal, 1 for Deleted by user, 2 for Delisted, 3 for Never listed',
cate_id BIGINT COMMENT 'Product category ID',
cate_name STRING COMMENT 'Product category name',
commodity_id BIGINT COMMENT 'Category ID',
commodity_name STRING COMMENT 'Category name',
umid STRING COMMENT 'Buyer umID'
)
COMMENT 'Product auction ODS'
PARTITIONED BY (ds STRING COMMENT 'Format: YYYYMMDD')
LIFECYCLE 400;
CREATE TABLE IF NOT EXISTS s_sale
(
id STRING COMMENT 'Product ID',
title STRING COMMENT 'Product name',
gmt_modified STRING COMMENT 'Last modified date of the product',
starts STRING COMMENT 'Product listing time',
price DOUBLE COMMENT 'Product price, in CNY',
city STRING COMMENT 'City where the product is located',
prov STRING COMMENT 'Province where the product is located',
quantity BIGINT COMMENT 'Quantity',
stuff_status BIGINT COMMENT 'Product condition: 0 for New, 1 for Unused, 2 for Used',
auction_status BIGINT COMMENT 'Product status: 0 for Normal, 1 for Deleted by user, 2 for Delisted, 3 for Never listed',
cate_id BIGINT COMMENT 'Product category ID',
cate_name STRING COMMENT 'Product category name',
commodity_id BIGINT COMMENT 'Category ID',
commodity_name STRING COMMENT 'Category name',
umid STRING COMMENT 'Buyer umID'
)
COMMENT 'Product regular purchase ODS'
PARTITIONED BY (ds STRING COMMENT 'Format: YYYYMMDD')
LIFECYCLE 400;
CREATE TABLE IF NOT EXISTS s_users_extra
(
id STRING COMMENT 'User ID',
logincount BIGINT COMMENT 'Number of logons',
buyer_goodnum BIGINT COMMENT 'Number of positive ratings as a buyer',
seller_goodnum BIGINT COMMENT 'Number of positive ratings as a seller',
level_type BIGINT COMMENT '1 for Level 1 store, 2 for Level 2 store, 3 for Level 3 store',
promoted_num BIGINT COMMENT '1 for Grade A service, 2 for Grade B service, 3 for Grade C service',
gmt_create STRING COMMENT 'Creation time',
order_id BIGINT COMMENT 'Order ID',
buyer_id BIGINT COMMENT 'Buyer ID',
buyer_nick STRING COMMENT 'Buyer nickname',
buyer_star_id BIGINT COMMENT 'Buyer star level ID',
seller_id BIGINT COMMENT 'Seller ID',
seller_nick STRING COMMENT 'Seller nickname',
seller_star_id BIGINT COMMENT 'Seller star level ID',
shop_id BIGINT COMMENT 'Shop ID',
shop_name STRING COMMENT 'Shop name'
)
COMMENT 'User extension table'
PARTITIONED BY (ds STRING COMMENT 'yyyymmdd')
LIFECYCLE 400;
CREATE TABLE IF NOT EXISTS s_biz_order_delta
(
biz_order_id STRING COMMENT 'Order ID',
pay_order_id STRING COMMENT 'Payment order ID',
logistics_order_id STRING COMMENT 'Logistics order ID',
buyer_nick STRING COMMENT 'Buyer nickname',
buyer_id STRING COMMENT 'Buyer ID',
seller_nick STRING COMMENT 'Seller nickname',
seller_id STRING COMMENT 'Seller ID',
auction_id STRING COMMENT 'Product ID',
auction_title STRING COMMENT 'Product title',
auction_price DOUBLE COMMENT 'Product price',
buy_amount BIGINT COMMENT 'Quantity',
buy_fee BIGINT COMMENT 'Purchase amount',
pay_status BIGINT COMMENT 'Payment status: 1 for Unpaid, 2 for Paid, 3 for Refunded',
logistics_id BIGINT COMMENT 'Logistics order ID',
mord_cod_status BIGINT COMMENT 'Logistics status: 0 for Initial, 1 for Order accepted, 2 for Order acceptance timeout, 3 for Pickup successful, 4 for Pickup failed, 5 for Delivery successful, 6 for Delivery failed, 7 for Logistics order canceled by user',
status BIGINT COMMENT 'Status: 0 for Order normal, 1 for Order invisible',
sub_biz_type BIGINT COMMENT 'Business type: 1 for Auction, 2 for Purchase',
end_time STRING COMMENT 'Transaction end time',
shop_id BIGINT COMMENT 'Shop ID'
)
COMMENT 'Daily incremental table for successful transaction orders'
PARTITIONED BY (ds STRING COMMENT 'yyyymmdd')
LIFECYCLE 7200;
CREATE TABLE IF NOT EXISTS s_logistics_order_delta
(
logistics_order_id STRING COMMENT 'Logistics order ID',
post_fee DOUBLE COMMENT 'Logistics fee',
address STRING COMMENT 'Shipping address',
full_name STRING COMMENT 'Recipient full name',
mobile_phone STRING COMMENT 'Mobile phone',
prov STRING COMMENT 'Province',
prov_code STRING COMMENT 'Province ID',
city STRING COMMENT 'City',
city_code STRING COMMENT 'City ID',
logistics_status BIGINT COMMENT 'Logistics status: 1 for Not shipped, 2 for Shipped, 3 for Received, 4 for Returned, 5 for Picking',
consign_time STRING COMMENT 'Shipping time',
gmt_create STRING COMMENT 'Order creation time',
shipping BIGINT COMMENT 'Shipping method: 1 for Surface mail, 2 for Express, 3 for EMS',
seller_id STRING COMMENT 'Seller ID',
buyer_id STRING COMMENT 'Buyer ID'
)
COMMENT 'Daily incremental table for transaction logistics orders'
PARTITIONED BY (ds STRING COMMENT 'Date')
LIFECYCLE 7200;
CREATE TABLE IF NOT EXISTS s_pay_order_delta
(
pay_order_id STRING COMMENT 'Payment order ID',
total_fee DOUBLE COMMENT 'Total amount payable (quantity * unit price)',
seller_id STRING COMMENT 'Seller ID',
buyer_id STRING COMMENT 'Buyer ID',
pay_status BIGINT COMMENT 'Payment status: 1 for Awaiting buyer payment, 2 for Awaiting seller shipment, 3 for Transaction successful',
pay_time STRING COMMENT 'Payment time',
gmt_create STRING COMMENT 'Order creation time',
refund_fee DOUBLE COMMENT 'Refund amount (including shipping fee)',
confirm_paid_fee DOUBLE COMMENT 'Amount for which receipt has been confirmed'
)
COMMENT 'Incremental table for transaction payment orders'
PARTITIONED BY (ds STRING COMMENT 'Date')
LIFECYCLE 7200;Data synchronization, loading, and processing
Data in the ODS layer is synchronized from various data source systems and stored in MaxCompute for data development. You can use the data integration feature of Dataphin to perform data synchronization. Follow these guidelines when you use the data import feature:
To maintain a consistent table schema, synchronize each source table from a system to MaxCompute using a single data synchronization task.
Data import supports two synchronization methods: full data synchronization and real-time incremental data synchronization. You can achieve real-time synchronization using minute-level or hour-level scheduling.
Store tables in the ODS layer as partitioned tables based on date and time. This method simplifies the management of storage costs and control policies. In Dataphin, the default name for a time-based partition is `ds`.
The data import feature lets you manually adjust the fields to be synchronized between the source and destination tables.
If a source field does not exist in the destination table, you can either add the field to the destination table or remove the source field from the synchronization task.
If a source field does not match a destination field, you can modify the mapping by removing the mismatched destination field and adding a new, matching field.