OLAP resource groups support various query patterns for wide table data, including primary key lookups, non-primary key filtering, analytics, full-text search, and vector retrieval. They can also leverage various indexes in the wide table engine to automatically accelerate queries.
Prerequisites
-
You have activated Lindorm (Activate Service) and enabled an OLAP resource group.
-
You have created a Lindorm instance (Create Instance) and it contains wide table data.
Manage wide tables
OLAP resource groups support full lifecycle management for databases and tables in the wide table engine.
Create a database
CREATE DATABASE <db_name>;
Example:
CREATE DATABASE mydb;
Drop a database
DROP DATABASE <db_name>;
Example:
DROP DATABASE mydb;
Create a wide table
For detailed SQL syntax, see CREATE TABLE. OLAP resource groups support a subset of common data types.
CREATE TABLE <table_name> (
<col_name> <data_type> [NOT NULL],
...
PRIMARY KEY(<pk_col1> [, <pk_col2>, ...])
);
-
Primary key columns must be declared as
NOT NULL. A primary key consists of one or more columns that uniquely identify a row. -
OLAP resource groups support the following data types:
|
Type |
Description |
|
|
Boolean value |
|
|
8-bit integer |
|
|
16-bit integer |
|
|
32-bit integer |
|
|
64-bit integer |
|
|
Single-precision floating-point number |
|
|
Double-precision floating-point number |
|
|
Variable-length string |
|
|
Binary data |
Example:
USE mydb;
-- User event log table (composite primary key)
CREATE TABLE user_event (
user_id VARCHAR NOT NULL,
event_id BIGINT NOT NULL,
event_type TINYINT,
amount DOUBLE,
region VARCHAR,
PRIMARY KEY(user_id, event_id)
);
-- Orders table (single-column primary key)
CREATE TABLE orders (
order_id VARCHAR NOT NULL,
user_id VARCHAR,
status TINYINT,
total DOUBLE,
PRIMARY KEY(order_id)
);
Drop a wide table
DROP TABLE <table_name>;
Example:
DROP TABLE orders;
Write data
OLAP resource groups support writing data to the wide table engine using the INSERT INTO statement. The wide table engine does not support UPDATE or DELETE operations. To update data, you must use the native APIs of the wide table engine, such as the HBase or Phoenix APIs.
Write a single row
INSERT INTO <table_name> VALUES (<val1>, <val2>, ...);
Write multiple rows in a batch
INSERT INTO <table_name> VALUES
(<val1>, <val2>, ...),
(<val1>, <val2>, ...),
...;
Example:
USE mydb;
-- Write a single row
INSERT INTO orders VALUES ('u001', 'o10001', 100.0, 'SHIPPED', 'Beijing', '2024-01-01');
-- Write multiple rows in a batch
INSERT INTO orders VALUES
('u001', 'o10002', 200.0, 'SHIPPED', 'Shanghai', '2024-01-02'),
('u002', 'o20001', 50.0, 'PENDING', 'Beijing', '2024-01-03'),
('u002', 'o20002', 300.0, 'DELIVERED', 'Shenzhen', '2024-01-04');
Manage indexes
The wide table engine supports multiple index types. You can create and manage these indexes with SQL through an OLAP resource group to accelerate queries.
Columnar index
A columnar index synchronizes wide table data to columnar storage for efficient scanning by the OLAP resource group. It is suited for large-scale aggregation and analytics scenarios.
CREATE INDEX <index_name>
USING COLUMNAR
ON <table_name>(*)
PARTITION BY ENUMERABLE (
[<pk_col> [, ...],]
bucket(<bucket_num>, <pk_col>)
)
WITH (
`lindorm_columnar.user.index.database` = '<db_name>',
`lindorm_columnar.user.index.table` = '<table_name>'
);
Syntax description:
|
Parameter |
Description |
|
|
The name of the index. |
|
|
The name of the target wide table. |
|
|
Specifies the partitioning strategy for the index data. It consists of standard partition expressions and bucket partition expressions. All columns in these expressions must be primary key columns of the wide table. |
|
Standard partition expression |
Optional. Specifies zero or more low-cardinality primary key columns, such as dates or cities, separated by commas. Index data is built for each partition value, which allows queries to use partition predicates to efficiently locate data. Avoid using high-cardinality columns, such as user IDs, as standard partition keys to prevent creating a large number of small partitions. |
|
|
Required. You must specify at least one bucket expression. |
|
|
The destination database for the columnar index data. |
|
|
The destination table for the columnar index data. |
Note The standard partition and bucket partition expressions together determine the number of index partitions. We recommend that you keep the data size of each partition between 50 MB and 512 MB. For example, if a wide table ingests 50 GB of data per day and its primary key is(id, dt), you can configure the partition asPARTITION BY ENUMERABLE (dt, bucket(200, id)).
Example:
-- Create a columnar index for the 'test' table.
-- The wide table primary key is (p2, createtime).
-- The index uses 'createtime' for standard partitioning and hash-buckets 'p2' into 128 buckets.
CREATE INDEX columnar_idx
USING COLUMNAR
ON test(*)
PARTITION BY ENUMERABLE (
createtime,
bucket(128, p2)
)
WITH (
`lindorm_columnar.user.index.database` = 'mydb',
`lindorm_columnar.user.index.table` = 'test'
);
Secondary index
A secondary index builds an independent index structure on specified columns to accelerate point lookups and range queries on non-primary key columns. It supports a covering column to avoid additional table access.
CREATE INDEX <index_name>
ON <table_name>(<col_name> [ASC|DESC] [, ...])
[INCLUDE(<col_name> [, ...])];
Syntax description:
|
Parameter |
Description |
|
|
The name of the index. |
|
|
The name of the target wide table. |
|
|
The indexed columns and their sort order. You can specify multiple columns. Query predicates must match the prefix of the indexed columns. |
|
|
Specifies the covering columns. These additional columns are stored in the index, which allows queries to return their values without table access. |
Example:
-- Create a secondary index on the c1 column of the 'test' table, covering columns c3 and c4 to avoid table access.
CREATE INDEX secondary_idx ON test(c1 DESC) INCLUDE(c3, c4);
Search index
A search index is based on an inverted index and supports full-text search (MATCH) and exact queries on structured fields. The text field type is used for full-text search with tokenization, while the string field type is used for exact matches and supports comparison operators such as = and LIKE.
CREATE INDEX <index_name>
USING SEARCH
ON <table_name>(<col_spec> [, ...]);
Syntax description:
|
Parameter |
Description |
|
|
The name of the index. |
|
|
The name of the target wide table. |
|
|
The index configuration for a column, in the format |
|
|
Use |
|
|
Analyzer, which is valid only when |
|
|
Specifies whether to create a full-text index on the column. If set to |
Example:
-- Create a search index for the 'test' table. All columns support structured queries,
-- and column c3 is also enabled for Chinese tokenization and full-text search.
CREATE INDEX search_idx USING SEARCH ON test('*', c3(type=text, analyzer=ik, indexed=true));
View indexes
SHOW INDEX FROM <table_name>;
Example:
SHOW INDEX FROM test;
Drop an index
DROP INDEX <index_name> ON <table_name>;
Example:
DROP INDEX search_idx ON test;
Query examples
OLAP resource groups integrate the wide table engine with a massively parallel processing (MPP) query engine, supporting primary key lookups, non-primary key filtering, aggregate analytics, full-text search, and vector retrieval. You can build a columnar index, secondary index, or search index on a wide table. During query planning, the OLAP resource group automatically selects the appropriate index to accelerate queries by several to dozens of times, without changing your SQL.
Primary key queries
Primary key queries filter data by primary key columns, supporting exact-match lookups on the full primary key and range scans on a primary key prefix. The wide table engine is natively optimized for these queries, enabling millisecond-level lookups.
-- Create a table.
CREATE TABLE orders (
user_id VARCHAR NOT NULL,
order_id VARCHAR NOT NULL,
amount DOUBLE,
status VARCHAR,
region VARCHAR,
create_time VARCHAR,
PRIMARY KEY(user_id, order_id)
);
-- Exact-match lookup on the full primary key.
SELECT * FROM orders WHERE user_id = 'u001' AND order_id = 'o10001';
-- Range scan on a primary key prefix.
SELECT * FROM orders WHERE user_id = 'u001' AND order_id >= 'o10001';
-- Partial primary key scan.
SELECT * FROM orders WHERE user_id >= 'u001';
Non-primary key queries
Non-primary key queries filter data on non-primary key columns and perform a full table scan by default. You can create a secondary index or a search index based on the predicate type. The OLAP resource group automatically uses the index to accelerate the query without any changes to your SQL.
-- Create a table.
CREATE TABLE orders (
user_id VARCHAR NOT NULL,
order_id VARCHAR NOT NULL,
amount DOUBLE,
status VARCHAR,
region VARCHAR,
create_time VARCHAR,
PRIMARY KEY(user_id, order_id)
);
Non-primary key query with a secondary index
A secondary index is suited for structured filtering scenarios such as equality, range, and pattern matching queries.
-- Create a secondary index on the 'status' column.
-- Cover the 'amount' and 'region' columns to avoid table access.
CREATE INDEX idx_status ON orders(status) INCLUDE(amount, region);
-- Equality filter (hits the secondary index).
SELECT user_id, order_id, amount, region FROM orders WHERE status = 'SHIPPED';
-- Range filter (hits the secondary index).
SELECT user_id, order_id, amount FROM orders WHERE status >= 'S' AND status < 'T';
-- Pattern matching (hits the secondary index).
SELECT user_id, order_id, amount, region FROM orders WHERE status LIKE 'SH%';
Non-primary key query with a search index
A search index is suited for exact matches on string type fields, as well as for full-text search (MATCH) on text type fields that use tokenization.
-- Create a search index on the 'region' column (string type, supports equality and LIKE).
CREATE INDEX idx_search USING SEARCH ON orders(region(type=string, indexed=true));
-- Equality filter (hits the search index).
SELECT * FROM orders WHERE region = 'Beijing';
-- LIKE filter (hits the search index, supported for string type).
SELECT * FROM orders WHERE region LIKE 'Bei%';
-- MATCH full-text search (hits the search index).
SELECT * FROM orders WHERE region MATCH 'Beijing Shanghai';
Full-text search
After you create a search index on text fields in a wide table, you can use the MATCH syntax for full-text search on both regular and JSON fields. The OLAP resource group automatically pushes search predicates down to the inverted index for efficient retrieval.
The MATCH expression supports the following matching rules:
|
Syntax |
Description |
|
|
Contains the word. |
|
|
Must contain the word. |
|
|
Must not contain the word. |
|
|
Contains the exact phrase (no tokenization). |
|
|
Must contain at least one of the words in parentheses. |
-- Create a table with text and JSON fields.
CREATE TABLE orders (
user_id VARCHAR NOT NULL,
order_id VARCHAR NOT NULL,
amount DOUBLE,
status VARCHAR,
region VARCHAR,
create_time VARCHAR,
user_json VARCHAR,
PRIMARY KEY(user_id, order_id)
);
-- Create a search index on the 'region' column (string type for exact matching and LIKE).
-- Enable Chinese tokenization and full-text search on the 'user_json' column.
CREATE INDEX search_idx USING SEARCH ON orders(
region(type=string, indexed=true),
user_json(type=text, analyzer=ik, indexed=true)
);
-- Contains 'Beijing' or 'Shanghai'.
SELECT * FROM orders WHERE region MATCH 'Beijing Shanghai';
-- Must contain 'Beijing' and must not contain 'Shanghai'.
SELECT * FROM orders WHERE region MATCH '+Beijing -Shanghai';
-- Contains the exact phrase 'Beijing Chaoyang'.
SELECT * FROM orders WHERE region MATCH '"Beijing Chaoyang"';
-- Must contain 'Beijing', and must also contain either 'Chaoyang' or 'Haidian'.
SELECT * FROM orders WHERE region MATCH '+Beijing +(Chaoyang Haidian)';
-- The JSON field contains 'Beijing' (use MATCH directly on the column; the search index covers the entire column content).
SELECT * FROM orders WHERE user_json MATCH 'Beijing';
Complex analytics
OLAP resource groups support complex analytical operations on wide table data, such as large-scale aggregations, multi-table joins, and window functions. After you create a columnar index for a wide table, analytical queries read from columnar storage, which significantly reduces I/O. The OLAP resource group automatically routes queries to the index without any changes to your SQL.
-- Create a table with 'region' as a primary key prefix for easy partitioning.
CREATE TABLE orders (
region VARCHAR NOT NULL,
user_id VARCHAR NOT NULL,
order_id VARCHAR NOT NULL,
amount DOUBLE,
status VARCHAR,
create_time VARCHAR,
PRIMARY KEY(region, user_id, order_id)
);
-- Create a columnar index, using 'region' for standard partitioning and hash-bucketing 'user_id' (128 buckets).
-- Note: Columns in PARTITION BY ENUMERABLE must be primary key columns.
CREATE INDEX columnar_idx
USING COLUMNAR
ON orders(*)
PARTITION BY ENUMERABLE (
region,
bucket(128, user_id)
)
WITH (
`lindorm_columnar.user.index.database` = 'mydb',
`lindorm_columnar.user.index.table` = 'orders'
);
-- Large-scale aggregate analysis (hits the columnar index).
SELECT
region,
COUNT(*) AS order_cnt,
SUM(amount) AS gmv,
AVG(amount) AS aov
FROM orders
WHERE region IN ('Shanghai', 'Beijing')
GROUP BY region
ORDER BY gmv DESC;
-- UNION query
(SELECT user_id, order_id, amount, status, region FROM orders WHERE region = 'Beijing')
UNION ALL
(SELECT user_id, order_id, amount, status, region FROM orders WHERE status LIKE 'SH%');
Vector retrieval
If a vector index has been created for a wide table, you can use an OLAP resource group to retrieve data based on vector similarity.
Query format
SELECT /*+ _l_force_vector_index_(${PROPERTIES}) */ ${SELECT_LIST}[,_vector_score_(${EMBEDDING_COLUMN}, ${TARGET_VECTOR})]
FROM ${TABLE_NAME}
WHERE ${CONDITION}
ORDER BY ${DISTANCE_FUNCTION}(${EMBEDDING_COLUMN}, ${TARGET_VECTOR})
LIMIT ${K}
Syntax notes
-
Variables:
Parameter
Description
${SELECT_LIST}
The columns to return from the query.
_vector_score_(${EMBEDDING_COLUMN}, ${TARGET_VECTOR})
Returns the similarity score. The
EMBEDDING_COLUMNandTARGET_VECTORparameters must match the parameters used by the distance function.${TABLE_NAME}
The target table.
${CONDITION}
The scalar filter conditions.
${DISTANCE_FUNCTION}
The similarity distance function. Supported functions are
l2_distance,cosine_similarity, andinnerproduct_distance.${EMBEDDING_COLUMN}
The column on which the vector index is built.
${TARGET_VECTOR}
The target vector, expressed as a string representation of a floating-point array.
${PROPERTIES}
Parameters for vector retrieval, such as
k=10andlvector.min_score=0. -
Distance functions:
l2_distance,cosine_similarity, andinnerproduct_distance. -
Hint: Use
/*+ _l_force_vector_index_(...) */to query the vector index and configure vector query parameters. The following parameters are supported:Parameter
Required
Description
k
Yes
The number of most similar results to return. Ensure that the
kparameter is consistent with theLIMITclause.lvector.min_score
No
The similarity threshold. The scores of returned vectors must be greater than this value. The scores of returned vectors are in the range of [0, 1]. The value range is [0, +inf]. The default value is
0.lvector.ef_search
No
Specifies the length of the dynamic list during an index query for the HNSW algorithm. Value range: [1, 1000]. The default value is
100.lvector.nprobe
No
The number of cluster units to probe. Adjust this value to meet your recall requirements. A larger value improves recall but reduces performance. Valid values: [1, method.parameters.nlist]. No default value. Important: This parameter applies only to the IVFPQ algorithm.
lvector.reorder_factor
No
Performs reordering by using the original vectors. The distances calculated by the ivfpq algorithm are quantized, which results in some precision loss. Reordering with the original vectors is used to compensate for this. The number of vectors to reorder is
k * reorder_factor. This process is typically used to improve recall precision but increases performance overhead. Value range: [1, 200]. The default value is10. Important - This parameter applies only to the ivfpq algorithm. - If the value of k is small, set this parameter to5. If k is greater than100, set this parameter to1.lvector.client_refactor
No
Specifies whether to skip re-ranking within each shard and instead re-rank at an upper layer to improve performance. Valid values are
true(yes) andfalse(no, default).lvector.filter_type
No
The hybrid search mode. Valid values: -
pre_filter: Filters structured data and then searches vector data. -post_filter: Searches vector data and then filters structured data.
Example wide table schema
-- The following wide table already exists, and a vector index has been created on the 'item_embedding' column.
+----------------+---------------------+------+-------+---------+---------------+---------+
| Field | Type | Null | Key | Default | Extra | Comment |
+----------------+---------------------+------+-------+---------+---------------+---------+
| user_id | VARCHAR(1073741824) | No | true | NULL | partition key | |
| order_id | VARCHAR(1073741824) | No | true | NULL | partition key | |
| amount | DOUBLE | Yes | false | NULL | | |
| status | VARCHAR(1073741824) | Yes | false | NULL | | |
| region | VARCHAR(1073741824) | Yes | false | NULL | | |
| create_time | VARCHAR(1073741824) | Yes | false | NULL | | |
| item_embedding | VARCHAR(1073741824) | Yes | false | NULL | | |
+----------------+---------------------+------+-------+---------+---------------+---------+
Query data
-- Retrieve the top 10 most similar product orders based on the input feature vector.
SELECT /*+ _l_force_vector_index_(k=10, lvector.min_score=0) */
user_id,
order_id
FROM orders
WHERE region='Beijing'
ORDER BY l2_distance(item_embedding, '[0.077, 0.941, 0.917, 0.347, 0.331, 0.309, 0.127, 0.990]')
LIMIT 10;
-- Retrieve the top 10 most similar product orders based on the input feature vector and return their similarity scores.
SELECT /*+ _l_force_vector_index_(k=10, lvector.min_score=0) */
user_id,
order_id,
_vector_score_(item_embedding, '[0.077, 0.941, 0.917, 0.347, 0.331, 0.309, 0.127, 0.990]')
FROM orders
WHERE region='Beijing'
ORDER BY l2_distance(item_embedding, '[0.077, 0.941, 0.917, 0.347, 0.331, 0.309, 0.127, 0.990]')
LIMIT 10;
Parameters
The OLAP resource group automatically selects the appropriate index to accelerate queries. To force or disable the use of a specific index, add a hint to your query.
|
Index type |
Hint |
Description |
|
|
Forces the query to use the secondary index. |
|
|
|
Disables the secondary index. |
|
|
|
Forces the query to use the search index. |
|
|
|
Disables the search index. |
|
|
|
Forces the query to use the columnar index. |
|
|
|
Disables the columnar index. |
|
|
|
Forces the query to use the vector index for similarity retrieval. |
|
|
|
Disables the vector index. |