TPC-DS data

MaxCompute provides public TPC-DS datasets in four sizes (10 GB, 100 GB, 1 TB, and 10 TB) for product testing.

Dataset and tables

TPC-DS is a standard benchmark from the Transaction Processing Performance Council (TPC) for evaluating data management systems. MaxCompute generates the datasets with the official TPC-DS tool and stores each dataset size in its own schema in the BIGDATA_PUBLIC_DATASET project.

Dataset size    Project name              Schema name
10 GB           BIGDATA_PUBLIC_DATASET    TPCDS_10G
100 GB          BIGDATA_PUBLIC_DATASET    TPCDS_100G
1 TB            BIGDATA_PUBLIC_DATASET    TPCDS_1T
10 TB           BIGDATA_PUBLIC_DATASET    TPCDS_10T

Each schema contains the following tables:

call_center, catalog_page, catalog_returns, catalog_sales, customer, customer_address, customer_demographics, date_dim, household_demographics, income_band, inventory, item, promotion, reason, ship_mode, store, store_returns, store_sales, tab_reducenum, tab_reducenum_100, time_dim, warehouse, web_page, web_returns, web_sales, web_site

Note

Table schemas and content are documented in TPC Benchmark DS (TPC-DS) v2.5.0.

Available regions

Region                                                  Region ID
China (Hangzhou)                                        cn-hangzhou
China (Shanghai)                                        cn-shanghai
China (Beijing)                                         cn-beijing
China (Zhangjiakou)                                     cn-zhangjiakou
China (Ulanqab)                                         cn-wulanchabu
China (Shenzhen)                                        cn-shenzhen
China (Chengdu)                                         cn-chengdu
China (Hong Kong)                                       cn-hongkong
Singapore                                               ap-southeast-1
Japan (Tokyo)                                           ap-northeast-1
Malaysia (Kuala Lumpur)                                 ap-southeast-3
Indonesia (Jakarta)                                     ap-southeast-5
US (Silicon Valley)                                     us-west-1
US (Virginia)                                           us-east-1
UK (London)                                             eu-west-1
Germany (Frankfurt)                                     eu-central-1
UAE (Dubai)                                             me-east-1
China (Shanghai) Finance Cloud                          cn-shanghai-finance-1
China (Beijing) Finance Cloud (Invitational Preview)    cn-beijing-finance-1
South China 1 Finance Cloud                             cn-shenzhen-finance-1
China (Beijing) Alibaba Gov Cloud 1                     cn-north-2-gov-1

Prerequisites

Before you begin, make sure that you have:

Query the data

TPC-DS tables require cross-project access because you are not a member of the BIGDATA_PUBLIC_DATASET project. Use the full project.schema.table path in queries.

Supported tools

Note

The Data Map feature in DataWorks cannot discover tables in this public dataset because the data requires cross-project access.

Required session flags

The TPC-DS dataset uses schemas and data types such as DECIMAL and INT. Set these flags before running queries:

-- Enable session-level schema syntax.
SET odps.namespace.schema=true;

-- Enable data type compatibility.
SET odps.sql.hive.compatible=true;
SET odps.sql.type.system.odps2=true;
SET odps.sql.decimal.odps2=true;

-- Allow ORDER BY without LIMIT clause.
-- New projects use this setting by default. Existing projects may need it set explicitly
-- to avoid errors or suboptimal join order for the Q72 query.
SET odps.sql.validate.orderby.limit=false;

-- Allow Cartesian products (required for Q77).
SET odps.sql.allow.cartesian=true;

Note

If tenant-level schema syntax is not enabled, the public dataset does not appear in DataWorks Data Analysis. SQL queries still work.

Query example

The following query retrieves 100 rows from the store_sales table in the 10 GB dataset. To query another dataset, replace the schema name with the one for that size (for example, tpcds_100g).

-- Enable session-level schema syntax.
SET odps.namespace.schema=true;

-- Query the tpcds_10g dataset. Replace the schema name to query other datasets.
SELECT * FROM bigdata_public_dataset.tpcds_10g.store_sales LIMIT 100;

Sample output:

+-----------------+-----------------+------------+----------------+-------------+-------------+------------+-------------+-------------+------------------+-------------+-------------------+---------------+----------------+---------------------+--------------------+-----------------------+-------------------+------------+---------------+-------------+---------------------+---------------+
| ss_sold_date_sk | ss_sold_time_sk | ss_item_sk | ss_customer_sk | ss_cdemo_sk | ss_hdemo_sk | ss_addr_sk | ss_store_sk | ss_promo_sk | ss_ticket_number | ss_quantity | ss_wholesale_cost | ss_list_price | ss_sales_price | ss_ext_discount_amt | ss_ext_sales_price | ss_ext_wholesale_cost | ss_ext_list_price | ss_ext_tax | ss_coupon_amt | ss_net_paid | ss_net_paid_inc_tax | ss_net_profit |
+-----------------+-----------------+------------+----------------+-------------+-------------+------------+-------------+-------------+------------------+-------------+-------------------+---------------+----------------+---------------------+--------------------+-----------------------+-------------------+------------+---------------+-------------+---------------------+---------------+
| NULL            | NULL            | 39073      | NULL           | 1420876     | 1738        | 56600      | NULL        | NULL        | 41171            | 90          | 53.3              | NULL          | 72.87          | 0                   | NULL               | 4797                  | 7626.6            | 459.08     | 0             | NULL        | NULL                | NULL          |
| NULL            | NULL            | 22434      | 98163          | NULL        | NULL        | NULL       | 1           | NULL        | 8909             | NULL        | 15.22             | NULL          | 9.2            | NULL                | 690                | NULL                  | 1380.75           | NULL       | NULL          | NULL        | NULL                | -451.5        |
| NULL            | NULL            | 82219      | NULL           | NULL        | 1572        | 209531     | 38          | 285         | 14907            | 48          | 84.64             | 132.03        | NULL           | 0                   | NULL               | NULL                  | NULL              | 51.96      | 0             | NULL        | 2650.2              | -1464.48      |
| NULL            | NULL            | 97573      | 214533         | 1298744     | NULL        | NULL       | NULL        | 77          | 26167            | NULL        | 92.55             | 143.45        | 91.8           | 0                   | 8353.8             | NULL                  | NULL              | NULL       | 0             | NULL        | NULL                | -68.25        |
| NULL            | NULL            | 60120      | 376494         | NULL        | 1678        | 13917      | NULL        | NULL        | 35953            | 9           | 46.97             | NULL          | NULL           | NULL                | NULL               | NULL                  | 714.33            | NULL       | NULL          | NULL        | NULL                | 34.38         |
+-----------------+-----------------+------------+----------------+-------------+-------------+------------+-------------+-------------+------------------+-------------+-------------------+---------------+----------------+---------------------+--------------------+-----------------------+-------------------+------------+---------------+-------------+---------------------+---------------+

... (100 rows returned)
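
As a further illustration, the sketch below joins two tables from the same schema and aggregates the result. It is not one of the 99 benchmark queries. It assumes the standard TPC-DS column names d_date_sk and d_year in date_dim, which are not shown on this page, and it reuses the session flags listed under "Required session flags". Every table is referenced by its full project.schema.table path.

```sql
-- Enable session-level schema syntax.
SET odps.namespace.schema=true;

-- Enable the data types used by the dataset.
SET odps.sql.type.system.odps2=true;
SET odps.sql.decimal.odps2=true;

-- Illustrative query: number of sales rows and total extended sales price
-- per calendar year in the 10 GB dataset.
-- Assumption: date_dim exposes d_date_sk and d_year as defined in the TPC-DS specification.
SELECT d.d_year,
       COUNT(*) AS sales_rows,
       SUM(ss.ss_ext_sales_price) AS total_ext_sales_price
FROM bigdata_public_dataset.tpcds_10g.store_sales ss
JOIN bigdata_public_dataset.tpcds_10g.date_dim d
  ON ss.ss_sold_date_sk = d.d_date_sk
GROUP BY d.d_year
ORDER BY d.d_year
LIMIT 100;
```

Rows whose ss_sold_date_sk is NULL, such as those in the sample output above, do not match any date_dim row and are excluded by the inner join.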

Sample query files

MaxCompute provides sample query files for each dataset size, each containing 99 queries of varying complexity and data scan volume.

Note

Select queries carefully to avoid high computing costs, especially for larger datasets.
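
One way to check a query before running it is to ask MaxCompute for an estimate. The sketch below assumes that you are working in the MaxCompute client, where the COST SQL command is available, and that the command accepts tables in the public dataset. Neither assumption is confirmed by this page. The filter on ss_quantity is an arbitrary example.

```sql
-- Enable session-level schema syntax.
SET odps.namespace.schema=true;

-- Ask for a cost estimate instead of executing the statement.
-- Selecting only the needed columns, rather than SELECT *, keeps the scanned input smaller.
COST SQL
SELECT ss_item_sk, ss_net_profit
FROM bigdata_public_dataset.tpcds_10g.store_sales
WHERE ss_quantity > 50;
```

Starting with the 10 GB schema and moving to a larger schema only after the estimate looks acceptable keeps test runs inexpensive.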

Dataset size    Query file
10 GB           MaxCompute-TPCDS_10G-99-query
100 GB          MaxCompute-TPCDS_100G-99-query
1 TB            MaxCompute-TPCDS_1T-99-query
10 TB           MaxCompute-TPCDS_10T-99-query

To generate query variants with the benchmark suite, follow the official TPC-DS documentation.

Billing

Storage of this public dataset is free. Running queries incurs computing charges. For details, see Subscription computing pricing or Pay-as-you-go computing pricing, depending on the billing method of your project.

Disclaimer

  • The TPC-DS data generation and analysis are based on the TPC-DS benchmark. Results cannot be compared with officially published TPC-DS benchmark results because the test environment does not meet all benchmark requirements.

  • This TPC-DS dataset is for product testing and evaluation only. The data is not updated regularly and must not be used in a production environment.

  • The TPC-DS data originates from TPC. You can also generate TPC-DS data independently using the official TPC-DS documentation.