Operational Data Store (ODS)

更新时间:
复制 MD 格式

The Operational Data Store (ODS) layer, based on the best practices of Alibaba's OneData methodology, stores raw data from your business systems. This data serves as the source for all higher-level data layers. Data from business systems is typically detailed, application-oriented, accumulates over a long period, and is frequently accessed.

Table design for the ODS layer

In this tutorial, the ODS layer primarily contains raw, unprocessed data, such as transaction order details, user information, and product details. This data is logically stored in two-dimensional tables. Although the ODS layer is not strictly part of data warehouse modeling, proper planning and data synchronization for this layer are crucial. This tutorial uses the following six ODS tables:

  • Records product information for auctions: s_auction.

  • Records product information for regular sales: s_sale.

  • Records detailed user information: s_users_extra.

  • Records new product transaction order information: s_biz_order_delta.

  • Records new logistics order information: s_logistics_order_delta.

  • Records new payment order information: s_pay_order_delta.

Note
  • Keep table and field names consistent with the business system. However, use additional identifiers to distinguish between incremental and full tables. In Dataphin, fact-based models with the `di` suffix represent incremental (transaction) tables, and models with the `df` suffix represent full (periodic snapshot) tables.

  • Be aware of potential naming conflicts. For example, tables from different business systems might have the same name. To distinguish between them, you can add the source database name as a prefix or suffix. If a field name is the same as a keyword, you can resolve the conflict by adding a suffix, such as _col1, as defined by your naming convention.

Table creation examples

You can use the ad hoc query feature to write SQL statements and create the required ODS tables. The following code provides examples of table creation statements.

CREATE TABLE IF NOT EXISTS s_auction
(
    id                             STRING COMMENT 'Product ID',
    title                          STRING COMMENT 'Product name',
    gmt_modified                   STRING COMMENT 'Last modified date of the product',
    price                          DOUBLE COMMENT 'Product transaction price, in CNY',
    starts                         STRING COMMENT 'Product listing time',
    minimum_bid                    DOUBLE COMMENT 'Starting bid for the auction item, in CNY',
    duration                       STRING COMMENT 'Validity period, sales cycle, in days',
    incrementnum                   DOUBLE COMMENT 'Bid increment for the auction',
    city                           STRING COMMENT 'City where the product is located',
    prov                           STRING COMMENT 'Province where the product is located',
    ends                           STRING COMMENT 'Sales end time',
    quantity                       BIGINT COMMENT 'Quantity',
    stuff_status                   BIGINT COMMENT 'Product condition: 0 for New, 1 for Unused, 2 for Used',
    auction_status                 BIGINT COMMENT 'Product status: 0 for Normal, 1 for Deleted by user, 2 for Delisted, 3 for Never listed',
    cate_id                        BIGINT COMMENT 'Product category ID',
    cate_name                      STRING COMMENT 'Product category name',
    commodity_id                   BIGINT COMMENT 'Category ID',
    commodity_name                 STRING COMMENT 'Category name',
    umid                           STRING COMMENT 'Buyer umID'
)
COMMENT 'Product auction ODS'
PARTITIONED BY (ds         STRING COMMENT 'Format: YYYYMMDD')
LIFECYCLE 400;


CREATE TABLE IF NOT EXISTS s_sale
(
    id                           STRING COMMENT 'Product ID',
    title                        STRING COMMENT 'Product name',
    gmt_modified                 STRING COMMENT 'Last modified date of the product',
    starts                       STRING COMMENT 'Product listing time',
    price                        DOUBLE COMMENT 'Product price, in CNY',
    city                         STRING COMMENT 'City where the product is located',
    prov                         STRING COMMENT 'Province where the product is located',
    quantity                     BIGINT COMMENT 'Quantity',
    stuff_status                 BIGINT COMMENT 'Product condition: 0 for New, 1 for Unused, 2 for Used',
    auction_status               BIGINT COMMENT 'Product status: 0 for Normal, 1 for Deleted by user, 2 for Delisted, 3 for Never listed',
    cate_id                      BIGINT COMMENT 'Product category ID',
    cate_name                    STRING COMMENT 'Product category name',
    commodity_id                 BIGINT COMMENT 'Category ID',
    commodity_name               STRING COMMENT 'Category name',
    umid                         STRING COMMENT 'Buyer umID'
)
COMMENT 'Product regular purchase ODS'
PARTITIONED BY (ds      STRING COMMENT 'Format: YYYYMMDD')
LIFECYCLE 400;


CREATE TABLE IF NOT EXISTS s_users_extra
(
    id                  STRING COMMENT 'User ID',
    logincount          BIGINT COMMENT 'Number of logons',
    buyer_goodnum       BIGINT COMMENT 'Number of positive ratings as a buyer',
    seller_goodnum      BIGINT COMMENT 'Number of positive ratings as a seller',
    level_type          BIGINT COMMENT '1 for Level 1 store, 2 for Level 2 store, 3 for Level 3 store',
    promoted_num        BIGINT COMMENT '1 for Grade A service, 2 for Grade B service, 3 for Grade C service',
    gmt_create          STRING COMMENT 'Creation time',
    order_id            BIGINT COMMENT 'Order ID',
    buyer_id            BIGINT COMMENT 'Buyer ID',
    buyer_nick          STRING COMMENT 'Buyer nickname',
    buyer_star_id       BIGINT COMMENT 'Buyer star level ID',
    seller_id           BIGINT COMMENT 'Seller ID',
    seller_nick         STRING COMMENT 'Seller nickname',
    seller_star_id      BIGINT COMMENT 'Seller star level ID',
    shop_id             BIGINT COMMENT 'Shop ID',
    shop_name           STRING COMMENT 'Shop name'
)
COMMENT 'User extension table'
PARTITIONED BY (ds       STRING COMMENT 'yyyymmdd')
LIFECYCLE 400;


CREATE TABLE IF NOT EXISTS s_biz_order_delta
(
    biz_order_id           STRING COMMENT 'Order ID',
    pay_order_id           STRING COMMENT 'Payment order ID',
    logistics_order_id     STRING COMMENT 'Logistics order ID',
    buyer_nick             STRING COMMENT 'Buyer nickname',
    buyer_id               STRING COMMENT 'Buyer ID',
    seller_nick            STRING COMMENT 'Seller nickname',
    seller_id              STRING COMMENT 'Seller ID',
    auction_id             STRING COMMENT 'Product ID',
    auction_title          STRING COMMENT 'Product title',
    auction_price          DOUBLE COMMENT 'Product price',
    buy_amount             BIGINT COMMENT 'Quantity',
    buy_fee                BIGINT COMMENT 'Purchase amount',
    pay_status             BIGINT COMMENT 'Payment status: 1 for Unpaid, 2 for Paid, 3 for Refunded',
    logistics_id           BIGINT COMMENT 'Logistics order ID',
    mord_cod_status        BIGINT COMMENT 'Logistics status: 0 for Initial, 1 for Order accepted, 2 for Order acceptance timeout, 3 for Pickup successful, 4 for Pickup failed, 5 for Delivery successful, 6 for Delivery failed, 7 for Logistics order canceled by user',
    status                 BIGINT COMMENT 'Status: 0 for Order normal, 1 for Order invisible',
    sub_biz_type           BIGINT COMMENT 'Business type: 1 for Auction, 2 for Purchase',
    end_time               STRING COMMENT 'Transaction end time',
    shop_id                BIGINT COMMENT 'Shop ID'
)
COMMENT 'Daily incremental table for successful transaction orders'
PARTITIONED BY (ds       STRING COMMENT 'yyyymmdd')
LIFECYCLE 7200;


CREATE TABLE IF NOT EXISTS s_logistics_order_delta
(
    logistics_order_id    STRING COMMENT 'Logistics order ID',
    post_fee              DOUBLE COMMENT 'Logistics fee',
    address               STRING COMMENT 'Shipping address',
    full_name             STRING COMMENT 'Recipient full name',
    mobile_phone          STRING COMMENT 'Mobile phone',
    prov                  STRING COMMENT 'Province',
    prov_code             STRING COMMENT 'Province ID',
    city                  STRING COMMENT 'City',
    city_code             STRING COMMENT 'City ID',
    logistics_status      BIGINT COMMENT 'Logistics status: 1 for Not shipped, 2 for Shipped, 3 for Received, 4 for Returned, 5 for Picking',
    consign_time          STRING COMMENT 'Shipping time',
    gmt_create            STRING COMMENT 'Order creation time',
    shipping              BIGINT COMMENT 'Shipping method: 1 for Surface mail, 2 for Express, 3 for EMS',
    seller_id             STRING COMMENT 'Seller ID',
    buyer_id              STRING COMMENT 'Buyer ID'
)
COMMENT 'Daily incremental table for transaction logistics orders'
PARTITIONED BY (ds                 STRING COMMENT 'Date')
LIFECYCLE 7200;


CREATE TABLE IF NOT EXISTS s_pay_order_delta
(
    pay_order_id          STRING COMMENT 'Payment order ID',
    total_fee             DOUBLE COMMENT 'Total amount payable (quantity * unit price)',
    seller_id             STRING COMMENT 'Seller ID',
    buyer_id              STRING COMMENT 'Buyer ID',
    pay_status            BIGINT COMMENT 'Payment status: 1 for Awaiting buyer payment, 2 for Awaiting seller shipment, 3 for Transaction successful',
    pay_time              STRING COMMENT 'Payment time',
    gmt_create            STRING COMMENT 'Order creation time',
    refund_fee            DOUBLE COMMENT 'Refund amount (including shipping fee)',
    confirm_paid_fee      DOUBLE COMMENT 'Amount for which receipt has been confirmed'
)
COMMENT 'Incremental table for transaction payment orders'
PARTITIONED BY (ds        STRING COMMENT 'Date')
LIFECYCLE 7200;

Data synchronization, loading, and processing

Data in the ODS layer is synchronized from various data source systems and stored in MaxCompute for data development. You can use the data integration feature of Dataphin to perform data synchronization. Follow these guidelines when you use the data import feature:

  • To maintain a consistent table schema, synchronize each source table from a system to MaxCompute using a single data synchronization task.

  • Data import supports two synchronization methods: full data synchronization and real-time incremental data synchronization. You can achieve real-time synchronization using minute-level or hour-level scheduling.

  • Store tables in the ODS layer as partitioned tables based on date and time. This method simplifies the management of storage costs and control policies. In Dataphin, the default name for a time-based partition is `ds`.

  • The data import feature lets you manually adjust the fields to be synchronized between the source and destination tables.

    • If a source field does not exist in the destination table, you can either add the field to the destination table or remove the source field from the synchronization task.

    • If a source field does not match a destination field, you can modify the mapping by removing the mismatched destination field and adding a new, matching field.