You can optimize data warehouse performance by optimizing tables and data distribution.
For more information about best practices for table design, see Table design best practices.
Hash clustering
Hash clustering tables provide bucket pruning, aggregation, and storage optimizations. When you create a table, use the clustered by clause to specify a Hash Key. MaxCompute then hashes the specified columns and distributes the data into buckets based on the hash values. For the hash key, choose columns that have few repeat key values. For more information about how to use hash clustering tables, see Table operations.
To convert a table into a hash clustering table:
ALTER TABLE table_name [CLUSTERED BY (col_name [, col_name, ...]) [SORTED BY (col_name [ASC | DESC] [, col_name [ASC | DESC] ...])] INTO number_of_buckets BUCKETS]The ALTER TABLE statement applies to existing tables. After you add a new clustering property, new partitions are stored using hash clustering.
After you create a hash clustering table, use the INSERT OVERWRITE statement to convert the source table into a hash clustering table.
Hash clustering tables have the following limitations:
The
INSERT INTOstatement is not supported. To add data, use theINSERT OVERWRITEstatement.You cannot use the tunnel upload command to import data directly into a range cluster table because data uploaded using Tunnel is not sorted.
Other table optimization techniques
In addition to the table design specifications, you can use the following techniques to optimize your tables:
Use intermediate tables for tables that have a large data volume and many downstream tasks.
Split tables. If individual fields are generated very slowly, split those fields into separate tables.
Merge tables. As your data warehouse evolves, merge the tasks and data for tables with overlapping or duplicate business logic.
Zipper table: You can leverage zipper tables to reduce your storage consumption. For more information, see zipper storage.
Take advantage of the special features of MaxCompute tables. For more information, see Advanced features of MaxCompute tables.