MaxCompute provides public TPC-DS datasets in four sizes (10 GB, 100 GB, 1 TB, and 10 TB) for product testing.
Dataset and tables
TPC-DS is a standard benchmark from the Transaction Processing Performance Council (TPC) for evaluating data management systems. MaxCompute generates the datasets with the official TPC-DS tool and stores each dataset size in its own schema in the BIGDATA_PUBLIC_DATASET project.
| Dataset size | Project name | Schema name |
|---|---|---|
| 10 GB | BIGDATA_PUBLIC_DATASET | TPCDS_10G |
| 100 GB | BIGDATA_PUBLIC_DATASET | TPCDS_100G |
| 1 TB | BIGDATA_PUBLIC_DATASET | TPCDS_1T |
| 10 TB | BIGDATA_PUBLIC_DATASET | TPCDS_10T |
Each schema contains the following tables:
call_center, catalog_page, catalog_returns, catalog_sales, customer, customer_address, customer_demographics, date_dim, household_demographics, income_band, inventory, item, promotion, reason, ship_mode, store, store_returns, store_sales, tab_reducenum, tab_reducenum_100, time_dim, warehouse, web_page, web_returns, web_sales, web_site
Table schemas and content are documented in TPC Benchmark DS (TPC-DS) v2.5.0.
Available regions
| Region | Region ID |
|---|---|
| China (Hangzhou) | cn-hangzhou |
| China (Shanghai) | cn-shanghai |
| China (Beijing) | cn-beijing |
| China (Zhangjiakou) | cn-zhangjiakou |
| China (Ulanqab) | cn-wulanchabu |
| China (Shenzhen) | cn-shenzhen |
| China (Chengdu) | cn-chengdu |
| China (Hong Kong) | cn-hongkong |
| Singapore | ap-southeast-1 |
| Japan (Tokyo) | ap-northeast-1 |
| Malaysia (Kuala Lumpur) | ap-southeast-3 |
| Indonesia (Jakarta) | ap-southeast-5 |
| US (Silicon Valley) | us-west-1 |
| US (Virginia) | us-east-1 |
| UK (London) | eu-west-1 |
| Germany (Frankfurt) | eu-central-1 |
| UAE (Dubai) | me-east-1 |
| China (Shanghai) Finance Cloud | cn-shanghai-finance-1 |
| China (Beijing) Finance Cloud (Invitational Preview) | cn-beijing-finance-1 |
| South China 1 Finance Cloud | cn-shenzhen-finance-1 |
| China (Beijing) Alibaba Gov Cloud 1 | cn-north-2-gov-1 |
Prerequisites
Before you begin, make sure that you have:
- A MaxCompute project. For details, see Create a MaxCompute project.
Query the data
TPC-DS tables require cross-project access because you are not a member of the BIGDATA_PUBLIC_DATASET project. Use the full project.schema.table path in queries.
Supported tools
The Data Map feature in DataWorks cannot discover tables in this public dataset because the data requires cross-project access.
Required session flags
The TPC-DS dataset uses schemas and data types such as DECIMAL and INT. Set these flags before running queries:
-- Enable session-level schema syntax.
SET odps.namespace.schema=true;
-- Enable data type compatibility.
SET odps.sql.hive.compatible=true;
SET odps.sql.type.system.odps2=true;
SET odps.sql.decimal.odps2=true;
-- Allow ORDER BY without LIMIT clause.
-- New projects use this setting by default. Existing projects may need it set explicitly
-- to avoid errors or suboptimal join order for the Q72 query.
SET odps.sql.validate.orderby.limit=false;
-- Allow Cartesian products (required for Q77).
SET odps.sql.allow.cartesian=true;
If tenant-level schema syntax is not enabled, the public dataset does not appear in DataWorks Data Analysis. SQL queries still work.
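As an illustration of the flags above, the following sketch runs an ORDER BY query without a LIMIT clause against the small item dimension table. It assumes the flags are submitted in the same session as the query. The i_category column is taken from the TPC-DS specification, not from this page, and the column alias is illustrative.

```sql
-- Enable session-level schema syntax and the data type flags.
SET odps.namespace.schema=true;
SET odps.sql.hive.compatible=true;
SET odps.sql.type.system.odps2=true;
SET odps.sql.decimal.odps2=true;
-- Allow ORDER BY without LIMIT; some existing projects reject the query below without it.
SET odps.sql.validate.orderby.limit=false;

-- Count items per category in the 10 GB dataset.
-- item is a dimension table, so the scan is small.
SELECT
    i_category,
    COUNT(*) AS item_count
FROM bigdata_public_dataset.tpcds_10g.item
GROUP BY i_category
ORDER BY i_category;
```

If the project still enforces the ORDER BY check, adding a LIMIT clause to the query is an alternative to setting the flag.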
Query example
The following query retrieves 100 rows from the store_sales table in the 10 GB dataset. To query another dataset, replace the schema name tpcds_10g with that dataset's schema name, for example tpcds_100g.
-- Enable session-level schema syntax.
SET odps.namespace.schema=true;
-- Query the tpcds_10g dataset. Replace the schema name to query other datasets.
SELECT * FROM bigdata_public_dataset.tpcds_10g.store_sales limit 100;
Sample output:
+-----------------+-----------------+------------+----------------+-------------+-------------+------------+-------------+-------------+------------------+-------------+-------------------+---------------+----------------+---------------------+--------------------+-----------------------+-------------------+------------+---------------+-------------+---------------------+---------------+
| ss_sold_date_sk | ss_sold_time_sk | ss_item_sk | ss_customer_sk | ss_cdemo_sk | ss_hdemo_sk | ss_addr_sk | ss_store_sk | ss_promo_sk | ss_ticket_number | ss_quantity | ss_wholesale_cost | ss_list_price | ss_sales_price | ss_ext_discount_amt | ss_ext_sales_price | ss_ext_wholesale_cost | ss_ext_list_price | ss_ext_tax | ss_coupon_amt | ss_net_paid | ss_net_paid_inc_tax | ss_net_profit |
+-----------------+-----------------+------------+----------------+-------------+-------------+------------+-------------+-------------+------------------+-------------+-------------------+---------------+----------------+---------------------+--------------------+-----------------------+-------------------+------------+---------------+-------------+---------------------+---------------+
| NULL | NULL | 39073 | NULL | 1420876 | 1738 | 56600 | NULL | NULL | 41171 | 90 | 53.3 | NULL | 72.87 | 0 | NULL | 4797 | 7626.6 | 459.08 | 0 | NULL | NULL | NULL |
| NULL | NULL | 22434 | 98163 | NULL | NULL | NULL | 1 | NULL | 8909 | NULL | 15.22 | NULL | 9.2 | NULL | 690 | NULL | 1380.75 | NULL | NULL | NULL | NULL | -451.5 |
| NULL | NULL | 82219 | NULL | NULL | 1572 | 209531 | 38 | 285 | 14907 | 48 | 84.64 | 132.03 | NULL | 0 | NULL | NULL | NULL | 51.96 | 0 | NULL | 2650.2 | -1464.48 |
| NULL | NULL | 97573 | 214533 | 1298744 | NULL | NULL | NULL | 77 | 26167 | NULL | 92.55 | 143.45 | 91.8 | 0 | 8353.8 | NULL | NULL | NULL | 0 | NULL | NULL | -68.25 |
| NULL | NULL | 60120 | 376494 | NULL | 1678 | 13917 | NULL | NULL | 35953 | 9 | 46.97 | NULL | NULL | NULL | NULL | NULL | 714.33 | NULL | NULL | NULL | NULL | 34.38 |
+-----------------+-----------------+------------+----------------+-------------+-------------+------------+-------------+-------------+------------------+-------------+-------------------+---------------+----------------+---------------------+--------------------+-----------------------+-------------------+------------+---------------+-------------+---------------------+---------------+
... (100 rows returned)
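Building on the query above, the following sketch joins store_sales with date_dim to aggregate net profit per calendar year. Both tables are referenced by their full project.schema.table path. The date_dim columns d_date_sk and d_year come from the TPC-DS specification, not from this page, and the aliases are illustrative.

```sql
-- Enable session-level schema syntax and the data type flags.
SET odps.namespace.schema=true;
SET odps.sql.hive.compatible=true;
SET odps.sql.type.system.odps2=true;
SET odps.sql.decimal.odps2=true;

-- Net profit per calendar year in the 10 GB dataset.
-- Rows whose ss_sold_date_sk is NULL do not match any date_dim row
-- and are therefore excluded by the inner join.
SELECT
    d.d_year,
    COUNT(*) AS sales_rows,
    SUM(ss.ss_net_profit) AS total_net_profit
FROM bigdata_public_dataset.tpcds_10g.store_sales ss
JOIN bigdata_public_dataset.tpcds_10g.date_dim d
    ON ss.ss_sold_date_sk = d.d_date_sk
GROUP BY d.d_year
ORDER BY d.d_year
LIMIT 100;
```

Unlike the SELECT * example, this query scans the full store_sales table for the two referenced columns, so it incurs more computing cost, especially on the larger datasets.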
Sample query files
MaxCompute provides sample query files for each dataset size, each containing 99 queries of varying complexity and data scan volume.
Select queries carefully to avoid high computing costs, especially for larger datasets.
| Dataset size | Query file |
|---|---|
| 10 GB | |
| 100 GB | |
| 1 TB | |
| 10 TB | |
Use the official TPC-DS documentation to generate query variants with the benchmark suite.
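To gauge the cost of a query before running it on a large dataset, one option is the MaxCompute COST SQL command, which estimates a statement's resource usage without executing it. The sketch below assumes the command is available in your client and accepts session flags in the same submission. The query itself is illustrative and is not one of the 99 sample queries.

```sql
-- Enable session-level schema syntax so the three-part table name resolves.
SET odps.namespace.schema=true;

-- Estimate the cost of an aggregation over the 1 TB dataset without executing it.
COST SQL
SELECT
    ss_item_sk,
    SUM(ss_net_paid) AS total_net_paid
FROM bigdata_public_dataset.tpcds_1t.store_sales
GROUP BY ss_item_sk;
```

Trying the same statement against tpcds_10g first is a cheap way to validate the query logic before moving to a larger schema.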
Billing
Storage of this public dataset is free. Running queries incurs computing charges. For details, see Subscription computing pricing or Pay-as-you-go computing pricing.
Disclaimer
- The TPC-DS data generation and analysis are based on the TPC-DS benchmark. Results cannot be compared with officially published TPC-DS benchmark results because the test environment does not meet all benchmark requirements.
- This TPC-DS dataset is for product testing and evaluation only. The data is not updated regularly and must not be used in a production environment.
- The TPC-DS data originates from TPC. You can also generate TPC-DS data independently using the official TPC-DS documentation.