The primary key hash discretization feature in Lindorm uses a hash function to distribute data across different shards. This enables distributed storage and queries and prevents issues such as data skew and uneven loads. This topic describes how to use the primary key hash discretization feature.
Background information
LindormTable is a distributed storage engine that uses range partitioning. Typically, data written to LindormTable is evenly distributed across different shards.
In practice, uneven traffic distribution can cause problems. For example, some data might be accessed much more frequently than other data. This can lead to uneven data distribution across shards and create hot spots. These hot spots prevent you from taking full advantage of the distributed system. You can resolve this data distribution issue by adjusting the data access method. However, if the issue is caused by primary keys that cannot be easily discretized, you can use the primary key hash discretization feature. Examples of such keys include auto-incrementing primary keys or time-based columns in the HH:mm:ss format.
The primary key hash discretization feature evenly distributes this data across different shards. This improves system scalability. It also supports distributed queries on top of range partitions, which improves query efficiency.
If your business scenario matches these cases, you can apply a hash algorithm to a specified primary key column when you create a table or a secondary index to discretize the data.
Prerequisites
LindormTable must be version 2.5.3 or later. For more information about how to view or upgrade the current version, see LindormTable version guide and Perform a minor version update.
We recommend that you use the latest version of LindormTable.
Usage notes
When you apply a hash algorithm to a specified column in a primary key or an index, the hash function expression must be placed first. For example,
PRIMARY KEY(p1, hash32(p1), p2)is incorrect. The correct usage isPRIMARY KEY(hash32(p1), p1, p2).When you apply a hash algorithm to a column in a primary key or an index, you must also specify that column as a primary key column or an index column.
Primary key columns with a specified hash algorithm cannot be modified.
After you enable the primary key hash discretization feature, you cannot use the bulk load method to import data.
DDL
Create a table
When you create a table, you can set the primary key columns in the `PRIMARY KEY` clause and apply a hash algorithm to the specified columns. The 8-bit, 32-bit, and 64-bit hash algorithms are supported. Based on the characteristics of your primary keys, you can apply different hash algorithms to one or more primary key columns.
The more bits the hash algorithm uses, the more memory the underlying storage layer consumes, and the higher the storage cost. For example, if you use a 32-bit hash algorithm, the underlying storage layer consumes an additional 4 bytes for each key-value pair.
Examples:
-- Apply a hash algorithm to one primary key column
CREATE TABLE t1 (
p1 bigint,
p2 integer,
c1 integer,
c2 varchar,
PRIMARY KEY(hash32(p1), p1, p2)
);
-- Apply a hash algorithm to multiple primary key columns
CREATE TABLE t2 (
p1 bigint,
p2 integer,
c1 integer,
c2 varchar,
PRIMARY KEY(hash8(p1, p2), p1, p2)
);hash32(p1) indicates that a 32-bit hash algorithm is applied to the p1 column. hash8(p1, p2) indicates that an 8-bit hash algorithm is applied to both the p1 and p2 columns.
Create a secondary index
The primary key hash discretization feature is supported for index tables.
If you apply a hash algorithm to a column when you create a primary table but do not apply a hash algorithm to that column when you create a secondary index, the data in the index table is not discretized by default.
CREATE INDEX idx ON t1 (hash64(c1, c2), c1, c2);View hash column details
You can view the table schema to see details about the hash columns.
DESCRIBE table t1;Result:
+--------------+------------+-------------+---------+----------------+------------+
| TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | TYPE | IS_PRIMARY_KEY | SORT_ORDER |
+--------------+------------+-------------+---------+----------------+------------+
| test1 | t1 | hash32[p1] | INT | true | ASC |
| test1 | t1 | p1 | BIGINT | true | ASC |
| test1 | t1 | p2 | INT | true | ASC |
| test1 | t1 | c1 | INT | false | none |
| test1 | t1 | c2 | VARCHAR | false | none |
+--------------+------------+-------------+---------+----------------+------------+DML
Write data
When you write data, you do not need to add hash-related parameters to the SQL statement. The system automatically generates and populates the hash value based on the data that is written.
UPSERT INTO t1(p1, p2, c1, c2) VALUES(1, 1, 1, 'a');Query data
When you query data, you do not need to add hash-related parameters to the SQL statement. The system automatically calculates the hash value based on the query conditions. However, note the following points:
You must specify values for all primary key columns that use a hash algorithm. If a column value is missing, the system cannot calculate the hash value. This prevents the system from locating the correct data shard and results in a full table scan.
The following examples use table t1:
-- Recommended SELECT * FROM t1 WHERE p1=1 AND p2=1; -- The system queries the shard where hash(p1)=hash32(1). -- Not recommended SELECT * FROM t1 WHERE p2=1; -- The value of the primary key column p1 is not specified. The system cannot locate the shard that corresponds to the hash value and performs a full table scan.For primary key columns that use a hash algorithm, the query condition must be an equality query, such as
p1=2. Range queries on hash columns, such asp1>1, are not supported.The following examples use table t1:
-- Correct SELECT * FROM t1 WHERE p1=2 AND p2>1; -- Incorrect SELECT * FROM t1 WHERE p2=1 AND p1>2 AND p1<8; -- This query is not supported because a range query is set for the primary key column p1.
For more information about how to view the execution details of a SELECT statement, see EXPLAIN.
FAQ
Why does the query result display a hash column with a null value?
This issue is fixed in LindormTable 2.6.0 and the SQL engine 2.6.2.1. You can perform a minor version update to version 2.6.0 or later.
A minor version update may restart the instance. We recommend that you perform this operation during off-peak hours.