Table design best practices

更新时间:
复制 MD 格式

Tablestore automatically splits data into multiple partitions based on partition key ranges. Poor partition key choices lead to hotspots, data skew, or scaling bottlenecks. Design high-performance, scalable data tables with best practices for primary key design, attribute column design, and table splitting.

Primary key design

Tablestore automatically splits data into multiple partitions based on the range of the partition key (the first primary key column) and distributes them across service nodes. Design primary keys to distribute data and access load evenly and avoid hotspots.

Partition key design principles

The partition key is the first primary key column in a data table. Tablestore automatically splits data into multiple partitions based on the partition key range and schedules each partition to a different service node. Follow these principles when you design a partition key:

  • Keep the data volume under a single partition key value within 10 GB.

  • Keep data under different partition key values logically independent.

  • Avoid concentrating access load on a small range of consecutive partition key values.

Note

If workloads are low (TPS/QPS below 1,000, data volume within 10 GB, and not expected to grow significantly), hotspot impact is minimal and a single partition can handle the load. However, do not design your architecture to rely on single-partition capacity.

Data distribution

In distributed data systems, uneven data distribution causes the following problems:

  • Read and write throughput is limited by a single partition, creating a clear bottleneck.

  • Uneven hotspot distribution causes a long-tail effect that slows down overall processing.

  • Hotspot partitions become the bottleneck for the entire business pipeline and affect upstream and downstream systems.

When data is evenly distributed across partitions, read and write load spreads across all partitions. Each request covers only local data, and you can scale horizontally by adding more resources.

Common hotspot problems and solutions

Consider a monitoring scenario where a data table stores metric values for each machine at each point in time, with the following primary key design:

Timestamp (partition key)

MachineIp

Problem

1718000001

10.10.0.1

Every write appends to the end of the table (the last partition), creating a tail hotspot.

1718000002

10.10.0.2

1718000003

10.10.0.1

Because data is partitioned by partition key range, all writes concentrate on the last partition, creating a tail hotspot that cannot be resolved by partition splitting.

Approach 1: Reorder the partition key

Move MachineIp to the first primary key column and Timestamp to the second. The adjusted primary key becomes MachineIp = "10.10.0.1", Timestamp = 1718000001. Write load then spreads across partitions by machine. Other naturally distributed fields such as UserId, DeviceId, and OrderId work equally well as partition keys.

Note

If writes from a specific IP range concentrate on a single partition, Tablestore automatically splits that partition to distribute the load across multiple partitions.

Approach 2: Prepend an MD5 prefix

Compute the MD5 hash of MachineIp and prepend the first 4 hexadecimal characters to the IP as the partition key. For example, the first 4 MD5 characters of 10.10.0.1 are a1b2, so the partition key becomes a1b2,10.10.0.1. This approach breaks the sequential ordering of IP ranges. The hexadecimal prefix also helps the system perform pre-splitting.

Approach 3: Alternatives for global ordering requirements

If your business requires globally ordered queries by time but you want to avoid tail hotspots, consider the following alternatives:

  • Local ordering: Place a distributed field in the first column. Data with the same distributed field value remains ordered by time. For example, data under a specific MachineIp stays ordered by Timestamp.

  • Bucket-based writes: Use the time value modulo N as the first primary key column (0 to N-1) and time as the second column. For example, with 16 buckets, the partition key for a write is timestamp % 16 = 3. To read, query all 16 buckets in parallel and merge the results. Increasing the number of buckets distributes load more evenly but increases the number of parallel queries during reads.

  • Search indexes: Write data to the data table with even distribution, and use Search index to perform ordered queries by time or other fields. Search indexes automatically distribute data across multiple shards and merge results during queries.

Primary key column length and data volume constraints

  • Keep the data volume under a single partition key value within 10 GB (no hard limit). Rows with the same partition key value cannot be split further.

  • The primary key column length limit is 1 KB. Shorter primary keys improve query performance.

Concatenate columns into a partition key

If the total data volume for all rows under a single partition key value may exceed 10 GB, concatenate multiple primary key columns into a new partition key to distribute the data. Follow these rules when concatenating:

  • The concatenated partition key must effectively split records that originally shared the same partition key value into different partition key values.

  • Pad integer primary key columns with leading zeros to maintain sort consistency. For example, OrderNumber = 123 becomes 00000123 after padding, which keeps lexicographic order consistent with numeric order.

  • Choose a delimiter character with an ASCII value lower than all characters in the data to preserve the lexicographic order of the new partition key. The specific choice depends on the character set of your business data. For example, if SellerID values contain digits and letters (such as a100 and a1001), different delimiters affect sorting as follows:

    Delimiter

    Sort order after concatenation

    Reason

    Result

    :

    000054:a1001 sorts before 000054:a100:

    The ASCII value of : is greater than that of digits. At the 5th character, a1001 compares 1 < :

    Incorrect sort

    ,

    000054,a100, sorts before 000054,a1001

    The ASCII value of , is lower than all digits and letters, so lexicographic order matches the original data order

    Correct sort

Add a hash prefix to the partition key

If you must use a sequentially increasing column as the partition key, prepend a hash prefix to randomize the distribution of adjacent data in the table and distribute access load evenly.

Note

After you add a hash prefix, previously contiguous data is scattered. You can no longer use range read operations to read logically contiguous data. Use search indexes as an alternative for range queries.

Attribute column design

  • Control row width: Tablestore supports wide rows (up to hundreds of thousands of attribute columns), but reading an extremely wide row in one request may time out. Keep attribute columns under ten thousand per row. Use column name specification or pagination to read wide rows.

  • Attribute column size limit: A single attribute column value cannot exceed 2 MB. Split data that exceeds this limit across multiple columns, or store it in Object Storage Service (OSS).

  • Split tables by access frequency: When a row has many attribute columns with significantly different access frequencies, store high-frequency and low-frequency columns in separate tables. For example, in a product management system, product quantity and price are accessed frequently while product descriptions (large text) are accessed infrequently. Split them into two tables.

  • Compress large text: Compress large attribute column text and store it as binary to save storage space and reduce read/write overhead.

  • Use the Long type for monetary values: Tablestore does not support the BigDecimal type. For fields that require precise calculations such as monetary amounts, use the Long type to store the smallest unit value (for example, store 5.32 USD as 532 cents) to avoid precision loss with the Double type.

Table splitting and capacity planning

Table splitting

Keep the data volume of a single search index within 20 billion rows. If the data volume exceeds 20 billion rows, contact Tablestore technical support for table splitting evaluation and design.

For example, if the largest log table currently has 6.1 billion rows and grows by 2.1 billion rows per year, it will not exceed 20 billion rows within 3 to 5 years and does not require splitting. Plan a table splitting strategy in advance when existing data is large and growing rapidly.

Hot/cold data separation

Data access patterns are typically time-sensitive: recent data is accessed frequently, while older data gradually becomes cold. Storing hot and cold data in the same table causes uneven access load and underutilized reserved read/write throughput.

Store hot and cold data in separate tables with different reserved read/write throughput. Build search indexes on the hot data table for multi-condition queries, and use secondary indexes with fixed query patterns on the cold data table to reduce costs. Search indexes support data TTL (time-to-live) to automatically separate hot and cold data.

Bulk data import

When you import data into Tablestore in bulk, writing in primary key order concentrates write load on a single partition and slows down the import. Take the following measures:

  • Split the dataset into smaller subsets and use multiple worker threads to randomly select and import subsets in parallel.

  • Contact Tablestore technical support for pre-splitting before you write data, so that data spreads across multiple partitions from the start.

  • Use TableStoreWriter for high-concurrency asynchronous writes, which automatically handles bucket distribution and batch submission.

Design case study: Student card transaction records

The following example uses a university student card transaction system to illustrate the table design decision process. The primary key columns of the data table include CardID (student card ID), SellerID (merchant ID), DeviceID (terminal ID), and OrderNumber. The business rules are as follows:

  • Each student card maps to one CardID, and each merchant maps to one SellerID.

  • Each terminal maps to a globally unique DeviceID.

  • OrderNumber is unique within a single terminal and increases over time, but is not globally unique.

  • Transaction records are written in real time.

Primary key and partition key selection

First, determine the partition key. Compare the effects of different partitioning approaches based on the characteristics of each primary key column:

Partitioning approach

Analysis

CardID as partition key

Each student makes a limited number of transactions per day. Write load spreads across tens of thousands of cards, providing good data distribution. Recommended.

SellerID as partition key

The number of merchants is small, and most transactions concentrate on a few merchants, which creates hotspots easily. Not recommended.

DeviceID as partition key

Terminal devices are globally unique, providing good distribution. However, a few high-traffic terminals (such as card readers in popular cafeterias) may still create localized hotspots.

OrderNumber as partition key

OrderNumber increases over time within a terminal. Using it directly as the partition key causes a tail hotspot. A hash prefix is required.

Design conclusion

Based on the analysis, CardID provides the best data distribution (tens of thousands of cards, a few transactions per student per day) and is the recommended partition key. The primary key order is CardID, DeviceID, SellerID, OrderNumber. For most business scenarios, selecting the primary key and partition key is sufficient.

Advanced optimization

Basic partition key selection may not meet performance requirements. The following are two common advanced optimization approaches.

Scenario 1: Concatenate multiple columns into a partition key when a single partition key has too much data

If the business requires queries by DeviceID and you select it as the partition key, the data volume under a single DeviceID may exceed 10 GB. In this case, concatenate DeviceID, SellerID, and CardID into a new partition key to split records from the same terminal by merchant and student card.

Before concatenation

DeviceID (partition key)

SellerID

CardID

OrderNumber

Problem

54

10

1001

000001

All records for DeviceID=54 are in the same partition. Over time, the accumulated data volume may exceed 10 GB.

54

20

1002

000002

78

20

1001

000003

After concatenation

DeviceID,SellerID,CardID (new partition key)

OrderNumber

Effect

54,10,1001

000001

Transaction records from the same terminal are split into different partition key values by merchant and student card.

54,20,1002

000002

78,20,1001

000003

Scenario 2: Add a hash prefix when partition key values increase sequentially

If the business requires queries by OrderNumber and you select it as the partition key, the sequential increase of OrderNumber within a terminal causes a tail hotspot. Compute an MD5 hash prefix for OrderNumber to distribute data evenly:

Original OrderNumber

First 4 MD5 characters

HashOrderNumber (partition key)

000001

e2c8

e2c8,000001

000002

3f79

3f79,000002

000003

a5b1

a5b1,000003

After adding the hash prefix, you can compute the hash prefix for any OrderNumber with the same algorithm to get its HashOrderNumber. However, the hash prefix scatters the original order, and you can no longer use range read operations to read logically contiguous records.

Hot/cold data separation

Student card transaction records have a clear time-sensitive pattern: recent records are queried frequently, while records of graduated students are rarely accessed. Plan hot/cold data separation as follows:

  • Store recent active data (such as the past year) and historical data in separate tables with different reserved read/write throughput.

  • Configure higher reserved throughput for the active table and lower throughput for the historical table to reduce storage costs.

  • Build search indexes on the active table for multi-dimensional queries (such as by merchant or time range). The historical table only needs primary key queries.

  • Periodically migrate data that exceeds the retention period from the active table to the historical table.