This topic describes how to use TPC-DS, a decision support benchmark, to run performance tests for OLAP query scenarios in Hologres.
Introduction to TPC-DS
TPC-DS is a decision support benchmark developed by the Transaction Processing Performance Council (TPC). It simulates complex decision support system scenarios that involve multi-dimensional analytical queries.
The TPC-DS benchmark has the following characteristics:
-
Includes 24 tables (7 fact tables and 17 dimension tables) that cover multiple analytical dimensions of the retail business.
-
Includes 99 complex OLAP queries that exercise advanced SQL features such as multi-table joins, subqueries, and window functions.
-
Uses the response time of each query as the primary evaluation metric, which reflects the system's processing capability for complex analytical workloads.
For more information, see TPC Benchmark DS (TPC-DS).
The TPC-DS implementation in this topic is based on TPC-DS benchmarking. The results cannot be compared with published TPC-DS benchmark results. The tests in this topic do not comply with all the requirements of TPC-DS benchmarking.
Dataset introduction
The TPC-DS dataset contains 24 tables, which are divided into fact tables and dimension tables.
Fact tables (7)
| Table | Description |
|---|---|
| STORE_SALES | Store sales fact table. |
| STORE_RETURNS | Store returns fact table. |
| CATALOG_SALES | Catalog sales fact table. |
| CATALOG_RETURNS | Catalog returns fact table. |
| WEB_SALES | Web sales fact table. |
| WEB_RETURNS | Web returns fact table. |
| INVENTORY | Inventory fact table. |
Dimension tables (17)
| Table | Description |
|---|---|
| CUSTOMER_ADDRESS | Customer address dimension table. |
| CUSTOMER_DEMOGRAPHICS | Customer demographics dimension table. |
| DATE_DIM | Date dimension table. |
| WAREHOUSE | Warehouse dimension table. |
| SHIP_MODE | Shipping mode dimension table. |
| TIME_DIM | Time dimension table. |
| REASON | Return reason dimension table. |
| INCOME_BAND | Income band dimension table. |
| ITEM | Item dimension table. |
| STORE | Store dimension table. |
| CALL_CENTER | Call center dimension table. |
| CUSTOMER | Customer dimension table. |
| WEB_SITE | Website dimension table. |
| HOUSEHOLD_DEMOGRAPHICS | Household demographics dimension table. |
| WEB_PAGE | Web page dimension table. |
| PROMOTION | Promotion dimension table. |
| CATALOG_PAGE | Catalog page dimension table. |
Table definitions (DDL)
The following sections provide the complete CREATE TABLE statements for all 24 TPC-DS tables, including the schema definitions and Hologres-specific table properties.
CUSTOMER_ADDRESS
CREATE TABLE CUSTOMER_ADDRESS
(
CA_ADDRESS_SK INT NOT NULL,
ca_address_id TEXT NOT NULL,
ca_street_number TEXT ,
CA_STREET_NAME TEXT ,
CA_STREET_TYPE TEXT ,
CA_SUITE_NUMBER TEXT ,
CA_CITY TEXT ,
CA_COUNTY TEXT ,
CA_STATE TEXT ,
CA_ZIP TEXT ,
CA_COUNTRY TEXT ,
CA_GMT_OFFSET DECIMAL(5,2) ,
CA_LOCATION_TYPE TEXT ,
PRIMARY KEY (CA_ADDRESS_SK)
)
WITH (
shard_count = '64',
distribution_key = 'CA_ADDRESS_SK',
clustering_key = 'CA_ADDRESS_SK'
);
CUSTOMER_DEMOGRAPHICS
CREATE TABLE CUSTOMER_DEMOGRAPHICS
(
CD_DEMO_SK INT NOT NULL,
CD_GENDER TEXT ,
CD_MARITAL_STATUS TEXT ,
CD_EDUCATION_STATUS TEXT ,
CD_PURCHASE_ESTIMATE INTEGER ,
CD_CREDIT_RATING TEXT ,
CD_DEP_COUNT INTEGER ,
CD_DEP_EMPLOYED_COUNT INTEGER ,
CD_DEP_COLLEGE_COUNT INTEGER ,
PRIMARY KEY (CD_DEMO_SK)
)
WITH (
distribution_key = 'CD_DEMO_SK',
clustering_key = 'CD_DEMO_SK',
colocate_with = 'CUSTOMER_ADDRESS'
);
DATE_DIM
CREATE TABLE DATE_DIM
(
D_DATE_SK INT NOT NULL,
D_DATE_ID TEXT NOT NULL,
D_DATE DATE NOT NULL,
D_MONTH_SEQ INTEGER ,
D_WEEK_SEQ INTEGER ,
D_QUARTER_SEQ INTEGER ,
D_YEAR INTEGER ,
D_DOW INTEGER ,
D_MOY INTEGER ,
D_DOM INTEGER ,
D_QOY INTEGER ,
D_FY_YEAR INTEGER ,
D_FY_QUARTER_SEQ INTEGER ,
D_FY_WEEK_SEQ INTEGER ,
D_DAY_NAME TEXT ,
D_QUARTER_NAME TEXT ,
D_HOLIDAY TEXT ,
D_WEEKEND TEXT ,
D_FOLLOWING_HOLIDAY TEXT ,
D_FIRST_DOM INTEGER ,
D_LAST_DOM INTEGER ,
D_SAME_DAY_LY INTEGER ,
D_SAME_DAY_LQ INTEGER ,
D_CURRENT_DAY TEXT ,
D_CURRENT_WEEK TEXT ,
D_CURRENT_MONTH TEXT ,
D_CURRENT_QUARTER TEXT ,
D_CURRENT_YEAR TEXT ,
PRIMARY KEY (D_DATE_SK)
)
WITH (
distribution_key = 'D_DATE_SK',
clustering_key = 'D_DATE_SK',
colocate_with = 'CUSTOMER_ADDRESS'
);
WAREHOUSE
CREATE TABLE WAREHOUSE
(
W_WAREHOUSE_SK INT NOT NULL,
W_WAREHOUSE_ID TEXT NOT NULL,
W_WAREHOUSE_NAME TEXT ,
W_WAREHOUSE_SQ_FT INTEGER ,
W_STREET_NUMBER TEXT ,
W_STREET_NAME TEXT ,
W_STREET_TYPE TEXT ,
W_SUITE_NUMBER TEXT ,
W_CITY TEXT ,
W_COUNTY TEXT ,
W_STATE TEXT ,
W_ZIP TEXT ,
W_COUNTRY TEXT ,
W_GMT_OFFSET DECIMAL(5,2) ,
PRIMARY KEY (W_WAREHOUSE_SK)
)
WITH (
distribution_key = 'W_WAREHOUSE_SK',
clustering_key = 'W_WAREHOUSE_SK',
colocate_with = 'CUSTOMER_ADDRESS'
);
SHIP_MODE
CREATE TABLE SHIP_MODE
(
SM_SHIP_MODE_SK INT NOT NULL,
SM_SHIP_MODE_ID TEXT NOT NULL,
SM_TYPE TEXT ,
SM_CODE TEXT ,
SM_CARRIER TEXT ,
SM_CONTRACT TEXT ,
PRIMARY KEY (SM_SHIP_MODE_SK)
)
WITH (
distribution_key = 'SM_SHIP_MODE_SK',
clustering_key = 'SM_SHIP_MODE_SK',
colocate_with = 'CUSTOMER_ADDRESS'
);
TIME_DIM
CREATE TABLE TIME_DIM
(
T_TIME_SK INT NOT NULL,
T_TIME_ID TEXT NOT NULL,
T_TIME INTEGER NOT NULL,
T_HOUR INTEGER ,
T_MINUTE INTEGER ,
T_SECOND INTEGER ,
T_AM_PM TEXT ,
T_SHIFT TEXT ,
T_SUB_SHIFT TEXT ,
T_MEAL_TIME TEXT ,
PRIMARY KEY (T_TIME_SK)
)
WITH (
distribution_key = 'T_TIME_SK',
clustering_key = 'T_TIME_SK',
colocate_with = 'CUSTOMER_ADDRESS'
);
REASON
CREATE TABLE REASON
(
R_REASON_SK INT NOT NULL,
R_REASON_ID TEXT NOT NULL,
R_REASON_DESC TEXT ,
PRIMARY KEY (R_REASON_SK)
)
WITH (
distribution_key = 'R_REASON_SK',
clustering_key = 'R_REASON_SK',
colocate_with = 'CUSTOMER_ADDRESS'
);
INCOME_BAND
CREATE TABLE INCOME_BAND
(
IB_INCOME_BAND_SK INT NOT NULL,
IB_LOWER_BOUND INTEGER ,
IB_UPPER_BOUND INTEGER ,
PRIMARY KEY (IB_INCOME_BAND_SK)
)
WITH (
distribution_key = 'IB_INCOME_BAND_SK',
clustering_key = 'IB_INCOME_BAND_SK',
colocate_with = 'CUSTOMER_ADDRESS'
);
ITEM
CREATE TABLE ITEM
(
I_ITEM_SK INT NOT NULL,
I_ITEM_ID TEXT NOT NULL,
I_REC_START_DATE DATE ,
I_REC_END_DATE DATE ,
I_ITEM_DESC TEXT ,
I_CURRENT_PRICE DECIMAL(7,2) ,
I_WHOLESALE_COST DECIMAL(7,2) ,
I_BRAND_ID INTEGER ,
I_BRAND TEXT ,
I_CLASS_ID INTEGER ,
I_CLASS TEXT ,
I_CATEGORY_ID INTEGER ,
I_CATEGORY TEXT ,
I_MANUFACT_ID INTEGER ,
I_MANUFACT TEXT ,
I_SIZE TEXT ,
I_FORMULATION TEXT ,
I_COLOR TEXT ,
I_UNITS TEXT ,
I_CONTAINER TEXT ,
I_MANAGER_ID INTEGER ,
I_PRODUCT_NAME TEXT ,
PRIMARY KEY (I_ITEM_SK)
)
WITH (
distribution_key = 'I_ITEM_SK',
clustering_key = 'I_ITEM_SK',
colocate_with = 'CUSTOMER_ADDRESS'
);
STORE
CREATE TABLE STORE
(
S_STORE_SK INT NOT NULL,
S_STORE_ID TEXT NOT NULL,
S_REC_START_DATE DATE ,
S_REC_END_DATE DATE ,
S_CLOSED_DATE_SK INT ,
S_STORE_NAME TEXT ,
S_NUMBER_EMPLOYEES INTEGER ,
S_FLOOR_SPACE INTEGER ,
S_HOURS TEXT ,
S_MANAGER TEXT ,
S_MARKET_ID INTEGER ,
S_GEOGRAPHY_CLASS TEXT ,
S_MARKET_DESC TEXT ,
S_MARKET_MANAGER TEXT ,
S_DIVISION_ID INTEGER ,
S_DIVISION_NAME TEXT ,
S_COMPANY_ID INTEGER ,
S_COMPANY_NAME TEXT ,
S_STREET_NUMBER TEXT ,
S_STREET_NAME TEXT ,
S_STREET_TYPE TEXT ,
S_SUITE_NUMBER TEXT ,
S_CITY TEXT ,
S_COUNTY TEXT ,
S_STATE TEXT ,
S_ZIP TEXT ,
S_COUNTRY TEXT ,
S_GMT_OFFSET DECIMAL(5,2) ,
S_TAX_PERCENTAGE DECIMAL(5,2) ,
PRIMARY KEY (S_STORE_SK)
)
WITH (
distribution_key = 'S_STORE_SK',
clustering_key = 'S_STORE_SK',
colocate_with = 'CUSTOMER_ADDRESS'
);
CALL_CENTER
CREATE TABLE CALL_CENTER
(
CC_CALL_CENTER_SK INT NOT NULL,
CC_CALL_CENTER_ID TEXT NOT NULL,
CC_REC_START_DATE DATE ,
CC_REC_END_DATE DATE ,
CC_CLOSED_DATE_SK INT ,
CC_OPEN_DATE_SK INT ,
CC_NAME TEXT ,
CC_CLASS TEXT ,
CC_EMPLOYEES INTEGER ,
CC_SQ_FT INTEGER ,
CC_HOURS TEXT ,
CC_MANAGER TEXT ,
CC_MKT_ID INTEGER ,
CC_MKT_CLASS TEXT ,
CC_MKT_DESC TEXT ,
CC_MARKET_MANAGER TEXT ,
CC_DIVISION INTEGER ,
CC_DIVISION_NAME TEXT ,
CC_COMPANY INTEGER ,
CC_COMPANY_NAME TEXT ,
CC_STREET_NUMBER TEXT ,
CC_STREET_NAME TEXT ,
CC_STREET_TYPE TEXT ,
CC_SUITE_NUMBER TEXT ,
CC_CITY TEXT ,
CC_COUNTY TEXT ,
CC_STATE TEXT ,
CC_ZIP TEXT ,
CC_COUNTRY TEXT ,
CC_GMT_OFFSET DECIMAL(5,2) ,
CC_TAX_PERCENTAGE DECIMAL(5,2) ,
PRIMARY KEY (CC_CALL_CENTER_SK)
)
WITH (
distribution_key = 'CC_CALL_CENTER_SK',
clustering_key = 'CC_CALL_CENTER_SK',
colocate_with = 'CUSTOMER_ADDRESS'
);
CUSTOMER
CREATE TABLE CUSTOMER
(
C_CUSTOMER_SK INT NOT NULL,
C_CUSTOMER_ID TEXT NOT NULL,
C_CURRENT_CDEMO_SK INT ,
C_CURRENT_HDEMO_SK INT ,
C_CURRENT_ADDR_SK INT ,
C_FIRST_SHIPTO_DATE_SK INT ,
C_FIRST_SALES_DATE_SK INT ,
C_SALUTATION TEXT ,
C_FIRST_NAME TEXT ,
C_LAST_NAME TEXT ,
C_PREFERRED_CUST_FLAG TEXT ,
C_BIRTH_DAY INTEGER ,
C_BIRTH_MONTH INTEGER ,
C_BIRTH_YEAR INTEGER ,
C_BIRTH_COUNTRY TEXT ,
C_LOGIN TEXT ,
C_EMAIL_ADDRESS TEXT ,
C_LAST_REVIEW_DATE_SK INT ,
PRIMARY KEY (C_CUSTOMER_SK)
)
WITH (
distribution_key = 'C_CUSTOMER_SK',
clustering_key = 'C_CUSTOMER_SK',
colocate_with = 'CUSTOMER_ADDRESS'
);
WEB_SITE
CREATE TABLE WEB_SITE
(
WEB_SITE_SK INT NOT NULL,
WEB_SITE_ID TEXT NOT NULL,
WEB_REC_START_DATE DATE ,
WEB_REC_END_DATE DATE ,
WEB_NAME TEXT ,
WEB_OPEN_DATE_SK INT ,
WEB_CLOSE_DATE_SK INT ,
WEB_CLASS TEXT ,
WEB_MANAGER TEXT ,
WEB_MKT_ID INTEGER ,
WEB_MKT_CLASS TEXT ,
WEB_MKT_DESC TEXT ,
WEB_MARKET_MANAGER TEXT ,
WEB_COMPANY_ID INTEGER ,
WEB_COMPANY_NAME TEXT ,
WEB_STREET_NUMBER TEXT ,
WEB_STREET_NAME TEXT ,
WEB_STREET_TYPE TEXT ,
WEB_SUITE_NUMBER TEXT ,
WEB_CITY TEXT ,
WEB_COUNTY TEXT ,
WEB_STATE TEXT ,
WEB_ZIP TEXT ,
WEB_COUNTRY TEXT ,
WEB_GMT_OFFSET DECIMAL(5,2) ,
WEB_TAX_PERCENTAGE DECIMAL(5,2) ,
PRIMARY KEY (WEB_SITE_SK)
)
WITH (
distribution_key = 'WEB_SITE_SK',
clustering_key = 'WEB_SITE_SK',
colocate_with = 'CUSTOMER_ADDRESS'
);
STORE_RETURNS
CREATE TABLE STORE_RETURNS
(
SR_RETURNED_DATE_SK INT ,
SR_RETURN_TIME_SK INT ,
SR_ITEM_SK INT NOT NULL,
SR_CUSTOMER_SK INT ,
SR_CDEMO_SK INT ,
SR_HDEMO_SK INT ,
SR_ADDR_SK INT ,
SR_STORE_SK INT ,
SR_REASON_SK INT ,
SR_TICKET_NUMBER INT NOT NULL,
SR_RETURN_QUANTITY INTEGER ,
SR_RETURN_AMT DECIMAL(7,2) ,
SR_RETURN_TAX DECIMAL(7,2) ,
SR_RETURN_AMT_INC_TAX DECIMAL(7,2) ,
SR_FEE DECIMAL(7,2) ,
SR_RETURN_SHIP_COST DECIMAL(7,2) ,
SR_REFUNDED_CASH DECIMAL(7,2) ,
SR_REVERSED_CHARGE DECIMAL(7,2) ,
SR_STORE_CREDIT DECIMAL(7,2) ,
SR_NET_LOSS DECIMAL(7,2)
)
WITH (
distribution_key = 'SR_ITEM_SK',
clustering_key = 'sr_returned_date_sk,SR_ITEM_SK,SR_TICKET_NUMBER',
colocate_with = 'CUSTOMER_ADDRESS'
);
HOUSEHOLD_DEMOGRAPHICS
CREATE TABLE HOUSEHOLD_DEMOGRAPHICS
(
HD_DEMO_SK INT NOT NULL,
HD_INCOME_BAND_SK INT ,
HD_BUY_POTENTIAL TEXT ,
HD_DEP_COUNT INTEGER ,
HD_VEHICLE_COUNT INTEGER ,
PRIMARY KEY (HD_DEMO_SK)
)
WITH (
distribution_key = 'HD_DEMO_SK',
clustering_key = 'HD_DEMO_SK',
colocate_with = 'CUSTOMER_ADDRESS'
);
WEB_PAGE
CREATE TABLE WEB_PAGE
(
WP_WEB_PAGE_SK INT NOT NULL,
WP_WEB_PAGE_ID TEXT NOT NULL,
WP_REC_START_DATE DATE ,
WP_REC_END_DATE DATE ,
WP_CREATION_DATE_SK INT ,
WP_ACCESS_DATE_SK INT ,
WP_AUTOGEN_FLAG TEXT ,
WP_CUSTOMER_SK INT ,
WP_URL TEXT ,
WP_TYPE TEXT ,
WP_CHAR_COUNT INTEGER ,
WP_LINK_COUNT INTEGER ,
WP_IMAGE_COUNT INTEGER ,
WP_MAX_AD_COUNT INTEGER ,
PRIMARY KEY (WP_WEB_PAGE_SK)
)
WITH (
distribution_key = 'WP_WEB_PAGE_SK',
clustering_key = 'WP_WEB_PAGE_SK',
colocate_with = 'CUSTOMER_ADDRESS'
);
PROMOTION
CREATE TABLE PROMOTION
(
P_PROMO_SK INT NOT NULL,
P_PROMO_ID TEXT NOT NULL,
P_START_DATE_SK INT ,
P_END_DATE_SK INT ,
P_ITEM_SK INT ,
P_COST DECIMAL(15,2) ,
P_RESPONSE_TARGET INTEGER ,
P_PROMO_NAME TEXT ,
P_CHANNEL_DMAIL TEXT ,
P_CHANNEL_EMAIL TEXT ,
P_CHANNEL_CATALOG TEXT ,
P_CHANNEL_TV TEXT ,
P_CHANNEL_RADIO TEXT ,
P_CHANNEL_PRESS TEXT ,
P_CHANNEL_EVENT TEXT ,
P_CHANNEL_DEMO TEXT ,
P_CHANNEL_DETAILS TEXT ,
P_PURPOSE TEXT ,
P_DISCOUNT_ACTIVE TEXT ,
PRIMARY KEY (P_PROMO_SK)
)
WITH (
distribution_key = 'P_PROMO_SK',
clustering_key = 'P_PROMO_SK',
colocate_with = 'CUSTOMER_ADDRESS'
);
CATALOG_PAGE
CREATE TABLE CATALOG_PAGE
(
CP_CATALOG_PAGE_SK INT NOT NULL,
CP_CATALOG_PAGE_ID TEXT NOT NULL,
CP_START_DATE_SK INT ,
CP_END_DATE_SK INT ,
CP_DEPARTMENT TEXT ,
CP_CATALOG_NUMBER INTEGER ,
CP_CATALOG_PAGE_NUMBER INTEGER ,
CP_DESCRIPTION TEXT ,
CP_TYPE TEXT ,
PRIMARY KEY (CP_CATALOG_PAGE_SK)
)
WITH (
distribution_key = 'CP_CATALOG_PAGE_SK',
clustering_key = 'CP_CATALOG_PAGE_SK',
colocate_with = 'CUSTOMER_ADDRESS'
);
INVENTORY
CREATE TABLE INVENTORY
(
INV_DATE_SK INT NOT NULL,
INV_ITEM_SK INT NOT NULL,
INV_WAREHOUSE_SK INT NOT NULL,
INV_QUANTITY_ON_HAND INTEGER
)
WITH (
distribution_key = 'inv_item_sk',
clustering_key = 'inv_date_sk, INV_ITEM_SK, INV_WAREHOUSE_SK',
colocate_with = 'CUSTOMER_ADDRESS'
);
CATALOG_RETURNS
CREATE TABLE CATALOG_RETURNS
(
CR_RETURNED_DATE_SK INT ,
CR_RETURNED_TIME_SK INT ,
CR_ITEM_SK INT NOT NULL,
CR_REFUNDED_CUSTOMER_SK INT ,
CR_REFUNDED_CDEMO_SK INT ,
CR_REFUNDED_HDEMO_SK INT ,
CR_REFUNDED_ADDR_SK INT ,
CR_RETURNING_CUSTOMER_SK INT ,
CR_RETURNING_CDEMO_SK INT ,
CR_RETURNING_HDEMO_SK INT ,
CR_RETURNING_ADDR_SK INT ,
CR_CALL_CENTER_SK INT ,
CR_CATALOG_PAGE_SK INT ,
CR_SHIP_MODE_SK INT ,
CR_WAREHOUSE_SK INT ,
CR_REASON_SK INT ,
CR_ORDER_NUMBER INT NOT NULL,
CR_RETURN_QUANTITY INTEGER ,
CR_RETURN_AMOUNT DECIMAL(7,2) ,
CR_RETURN_TAX DECIMAL(7,2) ,
CR_RETURN_AMT_INC_TAX DECIMAL(7,2) ,
CR_FEE DECIMAL(7,2) ,
CR_RETURN_SHIP_COST DECIMAL(7,2) ,
CR_REFUNDED_CASH DECIMAL(7,2) ,
CR_REVERSED_CHARGE DECIMAL(7,2) ,
CR_STORE_CREDIT DECIMAL(7,2) ,
CR_NET_LOSS DECIMAL(7,2)
)
WITH (
distribution_key = 'CR_ITEM_SK',
clustering_key = 'cr_returned_date_sk,CR_ITEM_SK,CR_ORDER_NUMBER',
colocate_with = 'CUSTOMER_ADDRESS'
);
WEB_RETURNS
CREATE TABLE WEB_RETURNS
(
WR_RETURNED_DATE_SK INT ,
WR_RETURNED_TIME_SK INT ,
WR_ITEM_SK INT NOT NULL,
WR_REFUNDED_CUSTOMER_SK INT ,
WR_REFUNDED_CDEMO_SK INT ,
WR_REFUNDED_HDEMO_SK INT ,
WR_REFUNDED_ADDR_SK INT ,
WR_RETURNING_CUSTOMER_SK INT ,
WR_RETURNING_CDEMO_SK INT ,
WR_RETURNING_HDEMO_SK INT ,
WR_RETURNING_ADDR_SK INT ,
WR_WEB_PAGE_SK INT ,
WR_REASON_SK INT ,
WR_ORDER_NUMBER INT NOT NULL,
WR_RETURN_QUANTITY INTEGER ,
WR_RETURN_AMT DECIMAL(7,2) ,
WR_RETURN_TAX DECIMAL(7,2) ,
WR_RETURN_AMT_INC_TAX DECIMAL(7,2) ,
WR_FEE DECIMAL(7,2) ,
WR_RETURN_SHIP_COST DECIMAL(7,2) ,
WR_REFUNDED_CASH DECIMAL(7,2) ,
WR_REVERSED_CHARGE DECIMAL(7,2) ,
WR_ACCOUNT_CREDIT DECIMAL(7,2) ,
WR_NET_LOSS DECIMAL(7,2)
)
WITH (
distribution_key = 'WR_ITEM_SK',
clustering_key = 'wr_returned_date_sk, WR_ITEM_SK,WR_ORDER_NUMBER',
colocate_with = 'CUSTOMER_ADDRESS'
);
WEB_SALES
CREATE TABLE WEB_SALES
(
WS_SOLD_DATE_SK INT ,
WS_SOLD_TIME_SK INT ,
WS_SHIP_DATE_SK INT ,
WS_ITEM_SK INT NOT NULL,
WS_BILL_CUSTOMER_SK INT ,
WS_BILL_CDEMO_SK INT ,
WS_BILL_HDEMO_SK INT ,
WS_BILL_ADDR_SK INT ,
WS_SHIP_CUSTOMER_SK INT ,
WS_SHIP_CDEMO_SK INT ,
WS_SHIP_HDEMO_SK INT ,
WS_SHIP_ADDR_SK INT ,
WS_WEB_PAGE_SK INT ,
WS_WEB_SITE_SK INT ,
WS_SHIP_MODE_SK INT ,
WS_WAREHOUSE_SK INT ,
WS_PROMO_SK INT ,
WS_ORDER_NUMBER INT NOT NULL,
WS_QUANTITY INTEGER ,
WS_WHOLESALE_COST DECIMAL(7,2) ,
WS_LIST_PRICE DECIMAL(7,2) ,
WS_SALES_PRICE DECIMAL(7,2) ,
WS_EXT_DISCOUNT_AMT DECIMAL(7,2) ,
WS_EXT_SALES_PRICE DECIMAL(7,2) ,
WS_EXT_WHOLESALE_COST DECIMAL(7,2) ,
WS_EXT_LIST_PRICE DECIMAL(7,2) ,
WS_EXT_TAX DECIMAL(7,2) ,
WS_COUPON_AMT DECIMAL(7,2) ,
WS_EXT_SHIP_COST DECIMAL(7,2) ,
WS_NET_PAID DECIMAL(7,2) ,
WS_NET_PAID_INC_TAX DECIMAL(7,2) ,
WS_NET_PAID_INC_SHIP DECIMAL(7,2) ,
WS_NET_PAID_INC_SHIP_TAX DECIMAL(7,2) ,
WS_NET_PROFIT DECIMAL(7,2)
)
WITH (
distribution_key = 'WS_ITEM_SK',
clustering_key = 'ws_sold_date_sk,WS_ITEM_SK,WS_ORDER_NUMBER',
colocate_with = 'CUSTOMER_ADDRESS'
);
CATALOG_SALES
CREATE TABLE CATALOG_SALES
(
CS_SOLD_DATE_SK INT ,
CS_SOLD_TIME_SK INT ,
CS_SHIP_DATE_SK INT ,
CS_BILL_CUSTOMER_SK INT ,
CS_BILL_CDEMO_SK INT ,
CS_BILL_HDEMO_SK INT ,
CS_BILL_ADDR_SK INT ,
CS_SHIP_CUSTOMER_SK INT ,
CS_SHIP_CDEMO_SK INT ,
CS_SHIP_HDEMO_SK INT ,
CS_SHIP_ADDR_SK INT ,
CS_CALL_CENTER_SK INT ,
CS_CATALOG_PAGE_SK INT ,
CS_SHIP_MODE_SK INT ,
CS_WAREHOUSE_SK INT ,
CS_ITEM_SK INT NOT NULL,
CS_PROMO_SK INT ,
CS_ORDER_NUMBER INT NOT NULL,
CS_QUANTITY INTEGER ,
CS_WHOLESALE_COST DECIMAL(7,2) ,
CS_LIST_PRICE DECIMAL(7,2) ,
CS_SALES_PRICE DECIMAL(7,2) ,
CS_EXT_DISCOUNT_AMT DECIMAL(7,2) ,
CS_EXT_SALES_PRICE DECIMAL(7,2) ,
CS_EXT_WHOLESALE_COST DECIMAL(7,2) ,
CS_EXT_LIST_PRICE DECIMAL(7,2) ,
CS_EXT_TAX DECIMAL(7,2) ,
CS_COUPON_AMT DECIMAL(7,2) ,
CS_EXT_SHIP_COST DECIMAL(7,2) ,
CS_NET_PAID DECIMAL(7,2) ,
CS_NET_PAID_INC_TAX DECIMAL(7,2) ,
CS_NET_PAID_INC_SHIP DECIMAL(7,2) ,
CS_NET_PAID_INC_SHIP_TAX DECIMAL(7,2) ,
CS_NET_PROFIT DECIMAL(7,2)
)
WITH (
distribution_key = 'CS_ITEM_SK',
clustering_key = 'cs_sold_date_sk,CS_ITEM_SK,CS_ORDER_NUMBER',
colocate_with = 'CUSTOMER_ADDRESS'
);
STORE_SALES
CREATE TABLE STORE_SALES
(
SS_SOLD_DATE_SK INT ,
SS_SOLD_TIME_SK INT ,
SS_ITEM_SK INT NOT NULL,
SS_CUSTOMER_SK INT ,
SS_CDEMO_SK INT ,
SS_HDEMO_SK INT ,
SS_ADDR_SK INT ,
SS_STORE_SK INT ,
SS_PROMO_SK INT ,
SS_TICKET_NUMBER INT NOT NULL,
SS_QUANTITY INTEGER ,
SS_WHOLESALE_COST DECIMAL(7,2) ,
SS_LIST_PRICE DECIMAL(7,2) ,
SS_SALES_PRICE DECIMAL(7,2) ,
SS_EXT_DISCOUNT_AMT DECIMAL(7,2) ,
SS_EXT_SALES_PRICE DECIMAL(7,2) ,
SS_EXT_WHOLESALE_COST DECIMAL(7,2) ,
SS_EXT_LIST_PRICE DECIMAL(7,2) ,
SS_EXT_TAX DECIMAL(7,2) ,
SS_COUPON_AMT DECIMAL(7,2) ,
SS_NET_PAID DECIMAL(7,2) ,
SS_NET_PAID_INC_TAX DECIMAL(7,2) ,
SS_NET_PROFIT DECIMAL(7,2)
)
WITH (
distribution_key = 'SS_ITEM_SK',
clustering_key = 'ss_sold_date_sk, SS_ITEM_SK,SS_TICKET_NUMBER',
colocate_with = 'CUSTOMER_ADDRESS'
);
Data scale
The data volume is controlled by the scale_factor parameter. 1 SF corresponds to 1 GB of raw data. The default value is 1000, which generates 1 TB of raw data.
The data volume mentioned applies only to the raw data. It does not include space for indexes. Reserve additional space when you prepare the environment.
Prepare the test environment
Hologres instance requirements
| Parameter | Requirement | Description |
|---|---|---|
| Version | Hologres V4.2 or later | Earlier versions do not support some of the syntax and features used by this test toolkit. |
| Compute resources | 64 cores or more | 96 CU is recommended. |
| Billing method | Dedicated pay-as-you-go instance | Ensures exclusive resources and stable test results. |
ECS instance requirements
| Parameter | Specification | Description |
|---|---|---|
| Instance type | ecs.g6.4xlarge or higher | Ensures adequate performance for data generation and loading. |
| Image | Alibaba Cloud Linux 3.2104 LTS 64-bit | Provides the best compatibility. |
| Data disk | ESSD cloud disk | The capacity must be larger than the data volume that corresponds to scale_factor. For example, reserve at least 1.5 TB of disk space for a 1000 GB dataset. |
Network requirements
-
The ECS instance and the Hologres instance must reside in the same VPC.
-
The ECS instance must be able to access the Hologres instance through its VPC endpoint.
Download and configure the test tool
Step 1: Log on to the ECS instance
Connect to the ECS instance over SSH. For more information, see Connect to a Linux instance by using a password.
Step 2: Install the PSQL client
Run the following command to install the PostgreSQL client:
sudo yum install postgresql
Step 3: Download and extract the test toolkit
Download the benchmark_online_tpcds.tar.gz test toolkit. Upload it to the ECS instance, then run the following commands to extract the toolkit and switch to the working directory:
tar xvf benchmark_online_tpcds.tar.gz
cd holo_tpcds
Step 4: Configure the Hologres connection
Edit the holo.conf file and fill in the connection information for your Hologres instance:
vim holo.conf
The following table describes the configuration parameters:
| Parameter | Description | Example |
|---|---|---|
| host | The VPC endpoint of the Hologres instance (without the port). | hgxxx-xx-xxx-vpc-st.hologres.aliyuncs.com |
| port | The VPC port of the Hologres instance. | 80 |
| user | The AccessKey ID of your account. | LTAI5txxx |
| password | The AccessKey secret of your account. | - |
| db | The name of the database. If left empty, the database is automatically named tpcds_{scale_factor}g. | - |
| scale_factor | The scale factor of the dataset, which controls the size of the generated data. The default value is 1000, in GB. | 1000 |
Run the test
Method 1: Run the end-to-end workflow (recommended for first-time testing)
The end-to-end workflow generates data, creates tables, loads data, and executes queries in a single run:
nohup ./tpcds_benchmark/run.sh -f > test.log 2>&1 &
Method 2: Run only the query test (when data is already loaded)
When the data has already been loaded into Hologres and you only need to rerun the queries, use the following command:
nohup ./tpcds_benchmark/run.sh > test.log 2>&1 &
End-to-end workflow steps
| Step | Description | Script |
|---|---|---|
| 1. Generate data | Generates the TPC-DS standard dataset by using the dsdgen tool. | bin/gen_data.sh |
| 2. Create tables | Creates the 24 tables in Hologres with the corresponding table properties. | bin/create_tables.sh |
| 3. Load data | Loads the generated data into Hologres in parallel by using COPY. | bin/load_data.sh |
| 4. Run queries | Executes the 99 TPC-DS queries and records the elapsed time of each query. | bin/run_queries.sh |
Query execution mode
-
Queries run in Hot Run mode (over a persistent connection) to eliminate connection-setup overhead.
-
Each query runs four times. The first run is treated as a warm-up and is not timed. The next three runs are timed, and the average is reported.
-
The cache is cleared (freecache) before each timed run to ensure fair measurements.
View test results
Test result overview
After run.sh completes, the elapsed time for each step is printed to the output:
[INFO] Step Summary:
gen_data: 1537619 ms
create_tables: 6457 ms
load_data: 4013581 ms
run_queries: 1102638 ms
all_steps: 6660303 ms
benchmark_total: 243168 ms
The following table describes each metric:
| Metric | Description |
|---|---|
| gen_data | The elapsed time of data generation, in milliseconds. |
| create_tables | The elapsed time of table creation, in milliseconds. |
| load_data | The elapsed time of data loading, in milliseconds. |
| run_queries | The total query execution time, including the warm-up runs, in milliseconds. |
| all_steps | The total elapsed time of all steps, in milliseconds. |
| benchmark_total | The effective execution time of the 99 queries (excluding warm-up), in milliseconds. This value is the performance benchmark. |
Per-query results
The average execution time of each query is written to the log in the following format:
q1 45.32
q2 67.89
...
q99 123.45
total 5432.10
Notes
-
To reduce variables that might affect the test results, create a new instance for each test. Do not use instances that have been upgraded or downgraded.
-
Make sure that the ECS instance has sufficient disk space. Generating 1 TB of data requires at least 1.5 TB of free disk space.
-
Do not run other business workloads on the same Hologres instance during the test, because they affect the accuracy of the results.
-
Use nohup to run the test script in the background so that an SSH disconnection does not interrupt the test.
-
When comparing multiple test runs, use the same instance specification and data scale for each run.
Known limitations
-
The Hologres instance must be on V4.2 or later. Earlier versions do not support some of the syntax and features used by this test toolkit.
-
The instance must have at least 64 cores of compute resources. Smaller instances may experience query timeouts or out-of-memory errors.
-
The default value of
scale_factoris 1000 (which corresponds to 1 TB of data). To adjust the data scale, modify thescale_factorparameter inholo.conf. -
The data generation step depends on local disk space. The generated dataset is approximately the number of GB specified by
scale_factor. Make sure that sufficient disk space is available. -
When you specify the
-fparameter, the existing database is dropped and recreated. All data in any existing database with the same name is removed. Use this option with care. -
The query results produced by this toolkit are for performance reference only. They do not constitute official TPC-DS benchmark results.