Shared sample datasets enable you to quickly validate data processing performance, optimize query efficiency, or verify features. They are ideal resources for development, testing, and learning.
Create a dataset
Log on to the Data Lake Formation console.
In the left navigation pane, click Data catalog.
In the list of received data shares, locate the data share named dlf_samples, and then click Create Catalog.
The catalog created from a received share is read-only.
Click the Catalogs tab to view the newly created catalog.
In the Catalogs list, the status of the new catalog appears as Running.
Query the dataset
The shared catalog contains the TPC-DS standard sample database at several data sizes, so you can run data tests, analysis, and baseline performance evaluations at different scales. The following sample databases are available:
| Sample database name | Sample data description |
| --- | --- |
| tpcds_paimon_sf1 | TPC-DS 1 GB Paimon table |
| tpcds_paimon_sf2 | TPC-DS 2 GB Paimon table |
| tpcds_paimon_sf10 | TPC-DS 10 GB Paimon table |
| tpcds_paimon_sf100 | TPC-DS 100 GB Paimon table |
| tpcds_iceberg_sf1 | TPC-DS 1 GB Iceberg table |
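When a test script needs to choose one of the sample databases listed above, a small lookup can keep the choice explicit. The following is a minimal sketch, not part of the product: the helper name `pick_sample_database` and the `SAMPLE_DATABASES` mapping are hypothetical, and the sizes are taken directly from the descriptions above.

```python
# Sample databases from the list above: name -> (table format, size in GB).
SAMPLE_DATABASES = {
    "tpcds_paimon_sf1": ("paimon", 1),
    "tpcds_paimon_sf2": ("paimon", 2),
    "tpcds_paimon_sf10": ("paimon", 10),
    "tpcds_paimon_sf100": ("paimon", 100),
    "tpcds_iceberg_sf1": ("iceberg", 1),
}


def pick_sample_database(table_format: str, max_size_gb: int) -> str:
    """Return the largest listed database of the given format within max_size_gb.

    Hypothetical helper for illustration only.
    """
    candidates = [
        (size_gb, name)
        for name, (fmt, size_gb) in SAMPLE_DATABASES.items()
        if fmt == table_format.lower() and size_gb <= max_size_gb
    ]
    if not candidates:
        raise ValueError(
            f"no {table_format} sample database of at most {max_size_gb} GB"
        )
    # Tuples sort by size first, so max() picks the largest database that fits.
    return max(candidates)[1]
```

For example, `pick_sample_database("paimon", 50)` selects tpcds_paimon_sf10, which suits a quick benchmark that should not scan the 100 GB dataset.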
You can associate this catalog with other platforms such as EMR and Flink to query data. For more information, see Engine integration.
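After the catalog is associated with an engine, a simple row count is a reasonable first check that the shared data is readable. The sketch below only builds the SQL text and makes several assumptions: `my_dlf_samples` is a hypothetical catalog name (use the name you chose when you created the catalog), `store_sales` is a standard TPC-DS table that is assumed to exist in the sample database, and the engine is assumed to accept backtick-quoted three-part names (catalog.database.table), as Spark SQL and Flink SQL do. How the statement is submitted depends on the engine.

```python
def build_row_count_query(catalog: str, database: str, table: str) -> str:
    """Build a row-count query using a three-part catalog.database.table name."""
    parts = (catalog, database, table)
    for part in parts:
        # Accept only letters, digits, and underscores so a name cannot alter the SQL.
        if not part.replace("_", "").isalnum():
            raise ValueError(f"invalid identifier: {part!r}")
    qualified = ".".join(f"`{part}`" for part in parts)
    return f"SELECT COUNT(*) AS row_count FROM {qualified}"


# Hypothetical catalog name; tpcds_paimon_sf1 is one of the sample databases.
print(build_row_count_query("my_dlf_samples", "tpcds_paimon_sf1", "store_sales"))
```

Starting with the smallest database (tpcds_paimon_sf1) keeps this check cheap. Because the catalog is read-only, only read statements such as SELECT apply.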