Reference TPC-DS test instructions

更新时间:
复制 MD 格式

This topic describes how to use TPC-DS, a decision support benchmark, to run performance tests for OLAP query scenarios in Hologres.

Introduction to TPC-DS

TPC-DS is a decision support benchmark developed by the Transaction Processing Performance Council (TPC). It simulates complex decision support system scenarios that involve multi-dimensional analytical queries.

The TPC-DS benchmark has the following characteristics:

  • Includes 24 tables (7 fact tables and 17 dimension tables) that cover multiple analytical dimensions of the retail business.

  • Includes 99 complex OLAP queries that exercise advanced SQL features such as multi-table joins, subqueries, and window functions.

  • Uses the response time of each query as the primary evaluation metric, which reflects the system's processing capability for complex analytical workloads.

For more information, see TPC Benchmark DS (TPC-DS).

Note

The TPC-DS implementation in this topic is based on TPC-DS benchmarking. The results cannot be compared with published TPC-DS benchmark results. The tests in this topic do not comply with all the requirements of TPC-DS benchmarking.

Dataset introduction

The TPC-DS dataset contains 24 tables, which are divided into fact tables and dimension tables.

Fact tables (7)

Table Description
STORE_SALES Store sales fact table.
STORE_RETURNS Store returns fact table.
CATALOG_SALES Catalog sales fact table.
CATALOG_RETURNS Catalog returns fact table.
WEB_SALES Web sales fact table.
WEB_RETURNS Web returns fact table.
INVENTORY Inventory fact table.

Dimension tables (17)

Table Description
CUSTOMER_ADDRESS Customer address dimension table.
CUSTOMER_DEMOGRAPHICS Customer demographics dimension table.
DATE_DIM Date dimension table.
WAREHOUSE Warehouse dimension table.
SHIP_MODE Shipping mode dimension table.
TIME_DIM Time dimension table.
REASON Return reason dimension table.
INCOME_BAND Income band dimension table.
ITEM Item dimension table.
STORE Store dimension table.
CALL_CENTER Call center dimension table.
CUSTOMER Customer dimension table.
WEB_SITE Website dimension table.
HOUSEHOLD_DEMOGRAPHICS Household demographics dimension table.
WEB_PAGE Web page dimension table.
PROMOTION Promotion dimension table.
CATALOG_PAGE Catalog page dimension table.

Table definitions (DDL)

The following sections provide the complete CREATE TABLE statements for all 24 TPC-DS tables, including the schema definitions and Hologres-specific table properties.

CUSTOMER_ADDRESS

CREATE TABLE CUSTOMER_ADDRESS
(
    CA_ADDRESS_SK             INT                NOT NULL,
    ca_address_id             TEXT              NOT NULL,
    ca_street_number          TEXT                      ,
    CA_STREET_NAME            TEXT                   ,
    CA_STREET_TYPE            TEXT                      ,
    CA_SUITE_NUMBER           TEXT                      ,
    CA_CITY                   TEXT                   ,
    CA_COUNTY                 TEXT                   ,
    CA_STATE                  TEXT                       ,
    CA_ZIP                    TEXT                      ,
    CA_COUNTRY                TEXT                   ,
    CA_GMT_OFFSET             DECIMAL(5,2)                  ,
    CA_LOCATION_TYPE          TEXT                      ,
    PRIMARY KEY (CA_ADDRESS_SK)
)
WITH (
    shard_count = '64',
    distribution_key = 'CA_ADDRESS_SK',
    clustering_key = 'CA_ADDRESS_SK'
);

CUSTOMER_DEMOGRAPHICS

CREATE TABLE CUSTOMER_DEMOGRAPHICS
(
    CD_DEMO_SK                INT                NOT NULL,
    CD_GENDER                 TEXT                       ,
    CD_MARITAL_STATUS         TEXT                       ,
    CD_EDUCATION_STATUS       TEXT                      ,
    CD_PURCHASE_ESTIMATE      INTEGER                       ,
    CD_CREDIT_RATING          TEXT                      ,
    CD_DEP_COUNT              INTEGER                       ,
    CD_DEP_EMPLOYED_COUNT     INTEGER                       ,
    CD_DEP_COLLEGE_COUNT      INTEGER                       ,
    PRIMARY KEY (CD_DEMO_SK)
)
WITH (
    distribution_key = 'CD_DEMO_SK',
    clustering_key = 'CD_DEMO_SK',
    colocate_with = 'CUSTOMER_ADDRESS'
);

DATE_DIM

CREATE TABLE DATE_DIM
(
    D_DATE_SK                 INT                NOT NULL,
    D_DATE_ID                 TEXT              NOT NULL,
    D_DATE                    DATE              NOT NULL,
    D_MONTH_SEQ               INTEGER                       ,
    D_WEEK_SEQ                INTEGER                       ,
    D_QUARTER_SEQ             INTEGER                       ,
    D_YEAR                    INTEGER                       ,
    D_DOW                     INTEGER                       ,
    D_MOY                     INTEGER                       ,
    D_DOM                     INTEGER                       ,
    D_QOY                     INTEGER                       ,
    D_FY_YEAR                 INTEGER                       ,
    D_FY_QUARTER_SEQ          INTEGER                       ,
    D_FY_WEEK_SEQ             INTEGER                       ,
    D_DAY_NAME                TEXT                       ,
    D_QUARTER_NAME            TEXT                       ,
    D_HOLIDAY                 TEXT                       ,
    D_WEEKEND                 TEXT                       ,
    D_FOLLOWING_HOLIDAY       TEXT                       ,
    D_FIRST_DOM                INTEGER                       ,
    D_LAST_DOM                INTEGER                       ,
    D_SAME_DAY_LY             INTEGER                       ,
    D_SAME_DAY_LQ             INTEGER                       ,
    D_CURRENT_DAY             TEXT                       ,
    D_CURRENT_WEEK            TEXT                       ,
    D_CURRENT_MONTH           TEXT                       ,
    D_CURRENT_QUARTER         TEXT                       ,
    D_CURRENT_YEAR            TEXT                       ,
    PRIMARY KEY (D_DATE_SK)
)
WITH (
    distribution_key = 'D_DATE_SK',
    clustering_key = 'D_DATE_SK',
    colocate_with = 'CUSTOMER_ADDRESS'
);

WAREHOUSE

CREATE TABLE WAREHOUSE
(
    W_WAREHOUSE_SK            INT                NOT NULL,
    W_WAREHOUSE_ID            TEXT              NOT NULL,
    W_WAREHOUSE_NAME          TEXT                   ,
    W_WAREHOUSE_SQ_FT         INTEGER                       ,
    W_STREET_NUMBER           TEXT                      ,
    W_STREET_NAME             TEXT                   ,
    W_STREET_TYPE             TEXT                      ,
    W_SUITE_NUMBER            TEXT                      ,
    W_CITY                    TEXT                   ,
    W_COUNTY                  TEXT                   ,
    W_STATE                   TEXT                       ,
    W_ZIP                     TEXT                      ,
    W_COUNTRY                 TEXT                   ,
    W_GMT_OFFSET              DECIMAL(5,2)                  ,
    PRIMARY KEY (W_WAREHOUSE_SK)
)
WITH (
    distribution_key = 'W_WAREHOUSE_SK',
    clustering_key = 'W_WAREHOUSE_SK',
    colocate_with = 'CUSTOMER_ADDRESS'
);

SHIP_MODE

CREATE TABLE SHIP_MODE
(
    SM_SHIP_MODE_SK           INT                NOT NULL,
    SM_SHIP_MODE_ID           TEXT              NOT NULL,
    SM_TYPE                   TEXT                      ,
    SM_CODE                   TEXT                      ,
    SM_CARRIER                TEXT                      ,
    SM_CONTRACT               TEXT                      ,
    PRIMARY KEY (SM_SHIP_MODE_SK)
)
WITH (
    distribution_key = 'SM_SHIP_MODE_SK',
    clustering_key = 'SM_SHIP_MODE_SK',
    colocate_with = 'CUSTOMER_ADDRESS'
);

TIME_DIM

CREATE TABLE TIME_DIM
(
    T_TIME_SK                 INT                NOT NULL,
    T_TIME_ID                 TEXT              NOT NULL,
    T_TIME                    INTEGER           NOT NULL,
    T_HOUR                    INTEGER                       ,
    T_MINUTE                  INTEGER                       ,
    T_SECOND                  INTEGER                       ,
    T_AM_PM                   TEXT                       ,
    T_SHIFT                   TEXT                      ,
    T_SUB_SHIFT               TEXT                      ,
    T_MEAL_TIME               TEXT                      ,
    PRIMARY KEY (T_TIME_SK)
)
WITH (
    distribution_key = 'T_TIME_SK',
    clustering_key = 'T_TIME_SK',
    colocate_with = 'CUSTOMER_ADDRESS'
);

REASON

CREATE TABLE REASON
(
    R_REASON_SK               INT                NOT NULL,
    R_REASON_ID               TEXT              NOT NULL,
    R_REASON_DESC             TEXT                     ,
    PRIMARY KEY (R_REASON_SK)
)
WITH (
    distribution_key = 'R_REASON_SK',
    clustering_key = 'R_REASON_SK',
    colocate_with = 'CUSTOMER_ADDRESS'
);

INCOME_BAND

CREATE TABLE INCOME_BAND
(
    IB_INCOME_BAND_SK         INT                NOT NULL,
    IB_LOWER_BOUND            INTEGER                       ,
    IB_UPPER_BOUND            INTEGER                       ,
    PRIMARY KEY (IB_INCOME_BAND_SK)
)
WITH (
    distribution_key = 'IB_INCOME_BAND_SK',
    clustering_key = 'IB_INCOME_BAND_SK',
    colocate_with = 'CUSTOMER_ADDRESS'
);

ITEM

CREATE TABLE ITEM
(
    I_ITEM_SK                 INT                NOT NULL,
    I_ITEM_ID                 TEXT              NOT NULL,
    I_REC_START_DATE          DATE                          ,
    I_REC_END_DATE            DATE                          ,
    I_ITEM_DESC               TEXT                  ,
    I_CURRENT_PRICE           DECIMAL(7,2)                  ,
    I_WHOLESALE_COST          DECIMAL(7,2)                  ,
    I_BRAND_ID                INTEGER                       ,
    I_BRAND                   TEXT                      ,
    I_CLASS_ID                INTEGER                       ,
    I_CLASS                   TEXT                      ,
    I_CATEGORY_ID             INTEGER                       ,
    I_CATEGORY                TEXT                      ,
    I_MANUFACT_ID             INTEGER                       ,
    I_MANUFACT                TEXT                      ,
    I_SIZE                    TEXT                      ,
    I_FORMULATION             TEXT                      ,
    I_COLOR                   TEXT                      ,
    I_UNITS                   TEXT                      ,
    I_CONTAINER               TEXT                      ,
    I_MANAGER_ID              INTEGER                       ,
    I_PRODUCT_NAME            TEXT                      ,
    PRIMARY KEY (I_ITEM_SK)
)
WITH (
    distribution_key = 'I_ITEM_SK',
    clustering_key = 'I_ITEM_SK',
    colocate_with = 'CUSTOMER_ADDRESS'
);

STORE

CREATE TABLE STORE
(
    S_STORE_SK                INT                NOT NULL,
    S_STORE_ID                TEXT              NOT NULL,
    S_REC_START_DATE          DATE                          ,
    S_REC_END_DATE            DATE                          ,
    S_CLOSED_DATE_SK          INT                       ,
    S_STORE_NAME              TEXT                   ,
    S_NUMBER_EMPLOYEES        INTEGER                       ,
    S_FLOOR_SPACE             INTEGER                       ,
    S_HOURS                   TEXT                      ,
    S_MANAGER                 TEXT                   ,
    S_MARKET_ID               INTEGER                       ,
    S_GEOGRAPHY_CLASS         TEXT                  ,
    S_MARKET_DESC             TEXT                  ,
    S_MARKET_MANAGER          TEXT                   ,
    S_DIVISION_ID             INTEGER                       ,
    S_DIVISION_NAME           TEXT                   ,
    S_COMPANY_ID              INTEGER                       ,
    S_COMPANY_NAME            TEXT                   ,
    S_STREET_NUMBER           TEXT                   ,
    S_STREET_NAME             TEXT                   ,
    S_STREET_TYPE             TEXT                      ,
    S_SUITE_NUMBER            TEXT                      ,
    S_CITY                    TEXT                   ,
    S_COUNTY                  TEXT                   ,
    S_STATE                   TEXT                       ,
    S_ZIP                     TEXT                      ,
    S_COUNTRY                 TEXT                   ,
    S_GMT_OFFSET              DECIMAL(5,2)                  ,
    S_TAX_PERCENTAGE          DECIMAL(5,2)                  ,
    PRIMARY KEY (S_STORE_SK)
)
WITH (
    distribution_key = 'S_STORE_SK',
    clustering_key = 'S_STORE_SK',
    colocate_with = 'CUSTOMER_ADDRESS'
);

CALL_CENTER

CREATE TABLE CALL_CENTER
(
    CC_CALL_CENTER_SK         INT                NOT NULL,
    CC_CALL_CENTER_ID         TEXT              NOT NULL,
    CC_REC_START_DATE         DATE                          ,
    CC_REC_END_DATE           DATE                          ,
    CC_CLOSED_DATE_SK         INT                        ,
    CC_OPEN_DATE_SK           INT                        ,
    CC_NAME                   TEXT                   ,
    CC_CLASS                  TEXT                   ,
    CC_EMPLOYEES              INTEGER                       ,
    CC_SQ_FT                  INTEGER                       ,
    CC_HOURS                  TEXT                      ,
    CC_MANAGER                TEXT                   ,
    CC_MKT_ID                 INTEGER                       ,
    CC_MKT_CLASS              TEXT                      ,
    CC_MKT_DESC               TEXT                  ,
    CC_MARKET_MANAGER         TEXT                   ,
    CC_DIVISION               INTEGER                       ,
    CC_DIVISION_NAME          TEXT                   ,
    CC_COMPANY                INTEGER                       ,
    CC_COMPANY_NAME           TEXT                      ,
    CC_STREET_NUMBER          TEXT                      ,
    CC_STREET_NAME            TEXT                   ,
    CC_STREET_TYPE            TEXT                      ,
    CC_SUITE_NUMBER           TEXT                      ,
    CC_CITY                   TEXT                   ,
    CC_COUNTY                 TEXT                   ,
    CC_STATE                  TEXT                       ,
    CC_ZIP                    TEXT                      ,
    CC_COUNTRY                TEXT                   ,
    CC_GMT_OFFSET             DECIMAL(5,2)                  ,
    CC_TAX_PERCENTAGE         DECIMAL(5,2)                  ,
    PRIMARY KEY (CC_CALL_CENTER_SK)
)
WITH (
    distribution_key = 'CC_CALL_CENTER_SK',
    clustering_key = 'CC_CALL_CENTER_SK',
    colocate_with = 'CUSTOMER_ADDRESS'
);

CUSTOMER

CREATE TABLE CUSTOMER
(
    C_CUSTOMER_SK             INT                NOT NULL,
    C_CUSTOMER_ID             TEXT              NOT NULL,
    C_CURRENT_CDEMO_SK        INT                        ,
    C_CURRENT_HDEMO_SK        INT                        ,
    C_CURRENT_ADDR_SK         INT                        ,
    C_FIRST_SHIPTO_DATE_SK     INT                        ,
    C_FIRST_SALES_DATE_SK      INT                        ,
    C_SALUTATION              TEXT                      ,
    C_FIRST_NAME               TEXT                      ,
    C_LAST_NAME               TEXT                      ,
    C_PREFERRED_CUST_FLAG      TEXT                       ,
    C_BIRTH_DAY               INTEGER                       ,
    C_BIRTH_MONTH             INTEGER                       ,
    C_BIRTH_YEAR              INTEGER                       ,
    C_BIRTH_COUNTRY           TEXT                   ,
    C_LOGIN                   TEXT                      ,
    C_EMAIL_ADDRESS           TEXT                      ,
    C_LAST_REVIEW_DATE_SK     INT                        ,
    PRIMARY KEY (C_CUSTOMER_SK)
)
WITH (
    distribution_key = 'C_CUSTOMER_SK',
    clustering_key = 'C_CUSTOMER_SK',
    colocate_with = 'CUSTOMER_ADDRESS'
);

WEB_SITE

CREATE TABLE WEB_SITE
(
    WEB_SITE_SK               INT                NOT NULL,
    WEB_SITE_ID               TEXT              NOT NULL,
    WEB_REC_START_DATE        DATE                          ,
    WEB_REC_END_DATE          DATE                          ,
    WEB_NAME                  TEXT                   ,
    WEB_OPEN_DATE_SK          INT                        ,
    WEB_CLOSE_DATE_SK         INT                        ,
    WEB_CLASS                 TEXT                   ,
    WEB_MANAGER               TEXT                   ,
    WEB_MKT_ID                INTEGER                       ,
    WEB_MKT_CLASS             TEXT                   ,
    WEB_MKT_DESC              TEXT                  ,
    WEB_MARKET_MANAGER        TEXT                   ,
    WEB_COMPANY_ID            INTEGER                       ,
    WEB_COMPANY_NAME          TEXT                      ,
    WEB_STREET_NUMBER         TEXT                      ,
    WEB_STREET_NAME           TEXT                   ,
    WEB_STREET_TYPE           TEXT                      ,
    WEB_SUITE_NUMBER          TEXT                      ,
    WEB_CITY                  TEXT                   ,
    WEB_COUNTY                TEXT                   ,
    WEB_STATE                 TEXT                       ,
    WEB_ZIP                   TEXT                      ,
    WEB_COUNTRY               TEXT                   ,
    WEB_GMT_OFFSET            DECIMAL(5,2)                  ,
    WEB_TAX_PERCENTAGE        DECIMAL(5,2)                  ,
    PRIMARY KEY (WEB_SITE_SK)
)
WITH (
    distribution_key = 'WEB_SITE_SK',
    clustering_key = 'WEB_SITE_SK',
    colocate_with = 'CUSTOMER_ADDRESS'
);

STORE_RETURNS

CREATE TABLE STORE_RETURNS
(
    SR_RETURNED_DATE_SK       INT                       ,
    SR_RETURN_TIME_SK         INT                       ,
    SR_ITEM_SK                INT               NOT NULL,
    SR_CUSTOMER_SK            INT                       ,
    SR_CDEMO_SK               INT                       ,
    SR_HDEMO_SK               INT                       ,
    SR_ADDR_SK                INT                       ,
    SR_STORE_SK               INT                       ,
    SR_REASON_SK              INT                       ,
    SR_TICKET_NUMBER          INT               NOT NULL,
    SR_RETURN_QUANTITY        INTEGER                      ,
    SR_RETURN_AMT             DECIMAL(7,2)                 ,
    SR_RETURN_TAX             DECIMAL(7,2)                 ,
    SR_RETURN_AMT_INC_TAX     DECIMAL(7,2)                 ,
    SR_FEE                    DECIMAL(7,2)                 ,
    SR_RETURN_SHIP_COST       DECIMAL(7,2)                 ,
    SR_REFUNDED_CASH          DECIMAL(7,2)                 ,
    SR_REVERSED_CHARGE        DECIMAL(7,2)                 ,
    SR_STORE_CREDIT           DECIMAL(7,2)                 ,
    SR_NET_LOSS               DECIMAL(7,2)
)
WITH (
    distribution_key = 'SR_ITEM_SK',
    clustering_key = 'sr_returned_date_sk,SR_ITEM_SK,SR_TICKET_NUMBER',
    colocate_with = 'CUSTOMER_ADDRESS'
);

HOUSEHOLD_DEMOGRAPHICS

CREATE TABLE HOUSEHOLD_DEMOGRAPHICS
(
    HD_DEMO_SK                INT               NOT NULL,
    HD_INCOME_BAND_SK         INT                       ,
    HD_BUY_POTENTIAL          TEXT                     ,
    HD_DEP_COUNT              INTEGER                      ,
    HD_VEHICLE_COUNT          INTEGER                      ,
    PRIMARY KEY (HD_DEMO_SK)
)
WITH (
    distribution_key = 'HD_DEMO_SK',
    clustering_key = 'HD_DEMO_SK',
    colocate_with = 'CUSTOMER_ADDRESS'
);

WEB_PAGE

CREATE TABLE WEB_PAGE
(
    WP_WEB_PAGE_SK            INT               NOT NULL,
    WP_WEB_PAGE_ID            TEXT              NOT NULL,
    WP_REC_START_DATE         DATE                          ,
    WP_REC_END_DATE           DATE                          ,
    WP_CREATION_DATE_SK       INT                       ,
    WP_ACCESS_DATE_SK         INT                       ,
    WP_AUTOGEN_FLAG           TEXT                       ,
    WP_CUSTOMER_SK            INT                       ,
    WP_URL                    TEXT                  ,
    WP_TYPE                   TEXT                      ,
    WP_CHAR_COUNT             INTEGER                       ,
    WP_LINK_COUNT             INTEGER                       ,
    WP_IMAGE_COUNT            INTEGER                       ,
    WP_MAX_AD_COUNT           INTEGER                       ,
    PRIMARY KEY (WP_WEB_PAGE_SK)
)
WITH (
    distribution_key = 'WP_WEB_PAGE_SK',
    clustering_key = 'WP_WEB_PAGE_SK',
    colocate_with = 'CUSTOMER_ADDRESS'
);

PROMOTION

CREATE TABLE PROMOTION
(
    P_PROMO_SK                INT               NOT NULL,
    P_PROMO_ID                TEXT              NOT NULL,
    P_START_DATE_SK           INT                       ,
    P_END_DATE_SK             INT                       ,
    P_ITEM_SK                 INT                       ,
    P_COST                    DECIMAL(15,2)                 ,
    P_RESPONSE_TARGET         INTEGER                       ,
    P_PROMO_NAME              TEXT                      ,
    P_CHANNEL_DMAIL           TEXT                       ,
    P_CHANNEL_EMAIL           TEXT                       ,
    P_CHANNEL_CATALOG         TEXT                       ,
    P_CHANNEL_TV              TEXT                       ,
    P_CHANNEL_RADIO           TEXT                       ,
    P_CHANNEL_PRESS           TEXT                       ,
    P_CHANNEL_EVENT           TEXT                       ,
    P_CHANNEL_DEMO            TEXT                       ,
    P_CHANNEL_DETAILS         TEXT                  ,
    P_PURPOSE                 TEXT                      ,
    P_DISCOUNT_ACTIVE         TEXT                       ,
    PRIMARY KEY (P_PROMO_SK)
)
WITH (
    distribution_key = 'P_PROMO_SK',
    clustering_key = 'P_PROMO_SK',
    colocate_with = 'CUSTOMER_ADDRESS'
);

CATALOG_PAGE

CREATE TABLE CATALOG_PAGE
(
   CP_CATALOG_PAGE_SK        INT               NOT NULL,
    CP_CATALOG_PAGE_ID        TEXT              NOT NULL,
    CP_START_DATE_SK          INT                       ,
    CP_END_DATE_SK            INT                       ,
    CP_DEPARTMENT             TEXT                   ,
    CP_CATALOG_NUMBER         INTEGER                       ,
    CP_CATALOG_PAGE_NUMBER    INTEGER                       ,
    CP_DESCRIPTION            TEXT                  ,
    CP_TYPE                   TEXT                  ,
    PRIMARY KEY (CP_CATALOG_PAGE_SK)
)
WITH (
    distribution_key = 'CP_CATALOG_PAGE_SK',
    clustering_key = 'CP_CATALOG_PAGE_SK',
    colocate_with = 'CUSTOMER_ADDRESS'
);

INVENTORY

CREATE TABLE INVENTORY
(
    INV_DATE_SK               INT               NOT NULL,
    INV_ITEM_SK               INT               NOT NULL,
    INV_WAREHOUSE_SK          INT               NOT NULL,
    INV_QUANTITY_ON_HAND      INTEGER
)
WITH (
    distribution_key = 'inv_item_sk',
    clustering_key = 'inv_date_sk, INV_ITEM_SK, INV_WAREHOUSE_SK',
    colocate_with = 'CUSTOMER_ADDRESS'
);

CATALOG_RETURNS

CREATE TABLE CATALOG_RETURNS
(
    CR_RETURNED_DATE_SK       INT                       ,
    CR_RETURNED_TIME_SK       INT                       ,
    CR_ITEM_SK                INT               NOT NULL,
    CR_REFUNDED_CUSTOMER_SK   INT                       ,
    CR_REFUNDED_CDEMO_SK      INT                       ,
    CR_REFUNDED_HDEMO_SK      INT                       ,
    CR_REFUNDED_ADDR_SK       INT                       ,
    CR_RETURNING_CUSTOMER_SK  INT                       ,
    CR_RETURNING_CDEMO_SK     INT                       ,
    CR_RETURNING_HDEMO_SK     INT                       ,
    CR_RETURNING_ADDR_SK      INT                       ,
    CR_CALL_CENTER_SK         INT                       ,
    CR_CATALOG_PAGE_SK        INT                       ,
    CR_SHIP_MODE_SK           INT                       ,
    CR_WAREHOUSE_SK           INT                       ,
    CR_REASON_SK              INT                       ,
    CR_ORDER_NUMBER           INT               NOT NULL,
    CR_RETURN_QUANTITY        INTEGER                       ,
    CR_RETURN_AMOUNT          DECIMAL(7,2)                  ,
    CR_RETURN_TAX             DECIMAL(7,2)                  ,
    CR_RETURN_AMT_INC_TAX     DECIMAL(7,2)                  ,
    CR_FEE                    DECIMAL(7,2)                  ,
    CR_RETURN_SHIP_COST       DECIMAL(7,2)                  ,
    CR_REFUNDED_CASH          DECIMAL(7,2)                  ,
    CR_REVERSED_CHARGE        DECIMAL(7,2)                  ,
    CR_STORE_CREDIT           DECIMAL(7,2)                  ,
    CR_NET_LOSS               DECIMAL(7,2)
)
WITH (
    distribution_key = 'CR_ITEM_SK',
    clustering_key = 'cr_returned_date_sk,CR_ITEM_SK,CR_ORDER_NUMBER',
    colocate_with = 'CUSTOMER_ADDRESS'
);

WEB_RETURNS

CREATE TABLE WEB_RETURNS
(
    WR_RETURNED_DATE_SK       INT                       ,
    WR_RETURNED_TIME_SK       INT                       ,
    WR_ITEM_SK                INT               NOT NULL,
    WR_REFUNDED_CUSTOMER_SK   INT                       ,
    WR_REFUNDED_CDEMO_SK      INT                       ,
    WR_REFUNDED_HDEMO_SK      INT                       ,
    WR_REFUNDED_ADDR_SK       INT                       ,
    WR_RETURNING_CUSTOMER_SK  INT                       ,
    WR_RETURNING_CDEMO_SK     INT                       ,
    WR_RETURNING_HDEMO_SK     INT                       ,
    WR_RETURNING_ADDR_SK      INT                       ,
    WR_WEB_PAGE_SK            INT                       ,
    WR_REASON_SK              INT                       ,
    WR_ORDER_NUMBER           INT               NOT NULL,
    WR_RETURN_QUANTITY        INTEGER                       ,
    WR_RETURN_AMT             DECIMAL(7,2)                  ,
    WR_RETURN_TAX             DECIMAL(7,2)                  ,
    WR_RETURN_AMT_INC_TAX     DECIMAL(7,2)                  ,
    WR_FEE                    DECIMAL(7,2)                  ,
    WR_RETURN_SHIP_COST       DECIMAL(7,2)                  ,
    WR_REFUNDED_CASH          DECIMAL(7,2)                  ,
    WR_REVERSED_CHARGE        DECIMAL(7,2)                  ,
    WR_ACCOUNT_CREDIT         DECIMAL(7,2)                  ,
    WR_NET_LOSS               DECIMAL(7,2)
)
WITH (
    distribution_key = 'WR_ITEM_SK',
    clustering_key = 'wr_returned_date_sk, WR_ITEM_SK,WR_ORDER_NUMBER',
    colocate_with = 'CUSTOMER_ADDRESS'
);

WEB_SALES

CREATE TABLE WEB_SALES
(
    WS_SOLD_DATE_SK           INT                       ,
    WS_SOLD_TIME_SK           INT                       ,
    WS_SHIP_DATE_SK           INT                       ,
    WS_ITEM_SK                INT               NOT NULL,
    WS_BILL_CUSTOMER_SK       INT                       ,
    WS_BILL_CDEMO_SK          INT                       ,
    WS_BILL_HDEMO_SK          INT                       ,
    WS_BILL_ADDR_SK           INT                       ,
    WS_SHIP_CUSTOMER_SK       INT                       ,
    WS_SHIP_CDEMO_SK          INT                       ,
    WS_SHIP_HDEMO_SK          INT                       ,
    WS_SHIP_ADDR_SK           INT                       ,
    WS_WEB_PAGE_SK            INT                       ,
    WS_WEB_SITE_SK            INT                       ,
    WS_SHIP_MODE_SK           INT                       ,
    WS_WAREHOUSE_SK           INT                       ,
    WS_PROMO_SK               INT                       ,
    WS_ORDER_NUMBER           INT               NOT NULL,
    WS_QUANTITY               INTEGER                       ,
    WS_WHOLESALE_COST         DECIMAL(7,2)                  ,
    WS_LIST_PRICE             DECIMAL(7,2)                  ,
    WS_SALES_PRICE            DECIMAL(7,2)                  ,
    WS_EXT_DISCOUNT_AMT       DECIMAL(7,2)                  ,
    WS_EXT_SALES_PRICE        DECIMAL(7,2)                  ,
    WS_EXT_WHOLESALE_COST     DECIMAL(7,2)                  ,
    WS_EXT_LIST_PRICE         DECIMAL(7,2)                  ,
    WS_EXT_TAX                DECIMAL(7,2)                  ,
    WS_COUPON_AMT             DECIMAL(7,2)                  ,
    WS_EXT_SHIP_COST          DECIMAL(7,2)                  ,
    WS_NET_PAID               DECIMAL(7,2)                  ,
    WS_NET_PAID_INC_TAX       DECIMAL(7,2)                  ,
    WS_NET_PAID_INC_SHIP      DECIMAL(7,2)                  ,
    WS_NET_PAID_INC_SHIP_TAX  DECIMAL(7,2)                  ,
    WS_NET_PROFIT             DECIMAL(7,2)
)
WITH (
    distribution_key = 'WS_ITEM_SK',
    clustering_key = 'ws_sold_date_sk,WS_ITEM_SK,WS_ORDER_NUMBER',
    colocate_with = 'CUSTOMER_ADDRESS'
);

CATALOG_SALES

CREATE TABLE CATALOG_SALES
(
    CS_SOLD_DATE_SK           INT                       ,
    CS_SOLD_TIME_SK           INT                       ,
    CS_SHIP_DATE_SK           INT                       ,
    CS_BILL_CUSTOMER_SK       INT                       ,
    CS_BILL_CDEMO_SK          INT                       ,
    CS_BILL_HDEMO_SK          INT                       ,
    CS_BILL_ADDR_SK           INT                       ,
    CS_SHIP_CUSTOMER_SK       INT                       ,
    CS_SHIP_CDEMO_SK          INT                       ,
    CS_SHIP_HDEMO_SK          INT                       ,
    CS_SHIP_ADDR_SK           INT                       ,
    CS_CALL_CENTER_SK         INT                       ,
    CS_CATALOG_PAGE_SK        INT                       ,
    CS_SHIP_MODE_SK           INT                       ,
    CS_WAREHOUSE_SK           INT                       ,
    CS_ITEM_SK                INT               NOT NULL,
    CS_PROMO_SK               INT                       ,
    CS_ORDER_NUMBER           INT               NOT NULL,
    CS_QUANTITY               INTEGER                       ,
    CS_WHOLESALE_COST         DECIMAL(7,2)                  ,
    CS_LIST_PRICE             DECIMAL(7,2)                  ,
    CS_SALES_PRICE            DECIMAL(7,2)                  ,
    CS_EXT_DISCOUNT_AMT       DECIMAL(7,2)                  ,
    CS_EXT_SALES_PRICE        DECIMAL(7,2)                  ,
    CS_EXT_WHOLESALE_COST     DECIMAL(7,2)                  ,
    CS_EXT_LIST_PRICE         DECIMAL(7,2)                  ,
    CS_EXT_TAX                DECIMAL(7,2)                  ,
    CS_COUPON_AMT             DECIMAL(7,2)                  ,
    CS_EXT_SHIP_COST          DECIMAL(7,2)                  ,
    CS_NET_PAID               DECIMAL(7,2)                  ,
    CS_NET_PAID_INC_TAX       DECIMAL(7,2)                  ,
    CS_NET_PAID_INC_SHIP      DECIMAL(7,2)                  ,
    CS_NET_PAID_INC_SHIP_TAX  DECIMAL(7,2)                  ,
    CS_NET_PROFIT             DECIMAL(7,2)
)
WITH (
    distribution_key = 'CS_ITEM_SK',
    clustering_key = 'cs_sold_date_sk,CS_ITEM_SK,CS_ORDER_NUMBER',
    colocate_with = 'CUSTOMER_ADDRESS'
);

STORE_SALES

CREATE TABLE STORE_SALES
(
    SS_SOLD_DATE_SK           INT                       ,
    SS_SOLD_TIME_SK           INT                       ,
    SS_ITEM_SK                INT               NOT NULL,
    SS_CUSTOMER_SK            INT                       ,
    SS_CDEMO_SK               INT                       ,
    SS_HDEMO_SK               INT                       ,
    SS_ADDR_SK                INT                       ,
    SS_STORE_SK               INT                       ,
    SS_PROMO_SK               INT                       ,
    SS_TICKET_NUMBER          INT               NOT NULL,
    SS_QUANTITY               INTEGER                       ,
    SS_WHOLESALE_COST         DECIMAL(7,2)                  ,
    SS_LIST_PRICE             DECIMAL(7,2)                  ,
    SS_SALES_PRICE            DECIMAL(7,2)                  ,
    SS_EXT_DISCOUNT_AMT       DECIMAL(7,2)                  ,
    SS_EXT_SALES_PRICE        DECIMAL(7,2)                  ,
    SS_EXT_WHOLESALE_COST     DECIMAL(7,2)                  ,
    SS_EXT_LIST_PRICE         DECIMAL(7,2)                  ,
    SS_EXT_TAX                DECIMAL(7,2)                  ,
    SS_COUPON_AMT             DECIMAL(7,2)                  ,
    SS_NET_PAID               DECIMAL(7,2)                  ,
    SS_NET_PAID_INC_TAX       DECIMAL(7,2)                  ,
    SS_NET_PROFIT             DECIMAL(7,2)
)
WITH (
    distribution_key = 'SS_ITEM_SK',
    clustering_key = 'ss_sold_date_sk, SS_ITEM_SK,SS_TICKET_NUMBER',
    colocate_with = 'CUSTOMER_ADDRESS'
);

Data scale

The data volume is controlled by the scale_factor parameter. 1 SF corresponds to 1 GB of raw data. The default value is 1000, which generates 1 TB of raw data.

Note

The data volume mentioned applies only to the raw data. It does not include space for indexes. Reserve additional space when you prepare the environment.

Prepare the test environment

Hologres instance requirements

Parameter Requirement Description
Version Hologres V4.2 or later Earlier versions do not support some of the syntax and features used by this test toolkit.
Compute resources 64 cores or more 96 CU is recommended.
Billing method Dedicated pay-as-you-go instance Ensures exclusive resources and stable test results.

ECS instance requirements

Parameter Specification Description
Instance type ecs.g6.4xlarge or higher Ensures adequate performance for data generation and loading.
Image Alibaba Cloud Linux 3.2104 LTS 64-bit Provides the best compatibility.
Data disk ESSD cloud disk The capacity must be larger than the data volume that corresponds to scale_factor. For example, reserve at least 1.5 TB of disk space for a 1000 GB dataset.

Network requirements

  • The ECS instance and the Hologres instance must reside in the same VPC.

  • The ECS instance must be able to access the Hologres instance through its VPC endpoint.

Download and configure the test tool

Step 1: Log on to the ECS instance

Connect to the ECS instance over SSH. For more information, see Connect to a Linux instance by using a password.

Step 2: Install the PSQL client

Run the following command to install the PostgreSQL client:

sudo yum install postgresql

Step 3: Download and extract the test toolkit

Download the benchmark_online_tpcds.tar.gz test toolkit. Upload it to the ECS instance, then run the following commands to extract the toolkit and switch to the working directory:

tar xvf benchmark_online_tpcds.tar.gz
cd holo_tpcds

Step 4: Configure the Hologres connection

Edit the holo.conf file and fill in the connection information for your Hologres instance:

vim holo.conf

The following table describes the configuration parameters:

Parameter Description Example
host The VPC endpoint of the Hologres instance (without the port). hgxxx-xx-xxx-vpc-st.hologres.aliyuncs.com
port The VPC port of the Hologres instance. 80
user The AccessKey ID of your account. LTAI5txxx
password The AccessKey secret of your account. -
db The name of the database. If left empty, the database is automatically named tpcds_{scale_factor}g. -
scale_factor The scale factor of the dataset, which controls the size of the generated data. The default value is 1000, in GB. 1000

Run the test

Method 2: Run only the query test (when data is already loaded)

When the data has already been loaded into Hologres and you only need to rerun the queries, use the following command:

nohup ./tpcds_benchmark/run.sh > test.log 2>&1 &

End-to-end workflow steps

Step Description Script
1. Generate data Generates the TPC-DS standard dataset by using the dsdgen tool. bin/gen_data.sh
2. Create tables Creates the 24 tables in Hologres with the corresponding table properties. bin/create_tables.sh
3. Load data Loads the generated data into Hologres in parallel by using COPY. bin/load_data.sh
4. Run queries Executes the 99 TPC-DS queries and records the elapsed time of each query. bin/run_queries.sh

Query execution mode

  • Queries run in Hot Run mode (over a persistent connection) to eliminate connection-setup overhead.

  • Each query runs four times. The first run is treated as a warm-up and is not timed. The next three runs are timed, and the average is reported.

  • The cache is cleared (freecache) before each timed run to ensure fair measurements.

View test results

Test result overview

After run.sh completes, the elapsed time for each step is printed to the output:

[INFO] Step Summary:
  gen_data: 1537619 ms
  create_tables: 6457 ms
  load_data: 4013581 ms
  run_queries: 1102638 ms
  all_steps: 6660303 ms
  benchmark_total: 243168 ms

The following table describes each metric:

Metric Description
gen_data The elapsed time of data generation, in milliseconds.
create_tables The elapsed time of table creation, in milliseconds.
load_data The elapsed time of data loading, in milliseconds.
run_queries The total query execution time, including the warm-up runs, in milliseconds.
all_steps The total elapsed time of all steps, in milliseconds.
benchmark_total The effective execution time of the 99 queries (excluding warm-up), in milliseconds. This value is the performance benchmark.

Per-query results

The average execution time of each query is written to the log in the following format:

q1    45.32
q2    67.89
...
q99   123.45
total 5432.10

Notes

  • To reduce variables that might affect the test results, create a new instance for each test. Do not use instances that have been upgraded or downgraded.

  • Make sure that the ECS instance has sufficient disk space. Generating 1 TB of data requires at least 1.5 TB of free disk space.

  • Do not run other business workloads on the same Hologres instance during the test, because they affect the accuracy of the results.

  • Use nohup to run the test script in the background so that an SSH disconnection does not interrupt the test.

  • When comparing multiple test runs, use the same instance specification and data scale for each run.

Known limitations

  • The Hologres instance must be on V4.2 or later. Earlier versions do not support some of the syntax and features used by this test toolkit.

  • The instance must have at least 64 cores of compute resources. Smaller instances may experience query timeouts or out-of-memory errors.

  • The default value of scale_factor is 1000 (which corresponds to 1 TB of data). To adjust the data scale, modify the scale_factor parameter in holo.conf.

  • The data generation step depends on local disk space. The generated dataset is approximately the number of GB specified by scale_factor. Make sure that sufficient disk space is available.

  • When you specify the -f parameter, the existing database is dropped and recreated. All data in any existing database with the same name is removed. Use this option with care.

  • The query results produced by this toolkit are for performance reference only. They do not constitute official TPC-DS benchmark results.