columnar table design, ORDER KEY, storage policy, PolarDB MySQL-PolarDB(PolarDB)-阿里云帮助中心

This topic describes how to create and manage columnar table, including setting order keys, converting table formats, and analyzing query execution plans to optimize performance.

Create columnar table

This section describes how to create a columnar table, set an order key to optimize query performance, and specify a storage policy.

Create a basic table

To create a basic columnar table, specify ENGINE=XEngine and TABLE_FORMAT=COLUMN in your CREATE TABLE statement:

-- Create a columnar table to store sales records
CREATE TABLE sales_records (
  sale_id INT PRIMARY KEY AUTO_INCREMENT,
  product_id INT NOT NULL,
  region VARCHAR(50),
  sale_date DATE,
  amount DECIMAL(10, 2)
) ENGINE=XEngine, TABLE_FORMAT=COLUMN;

Use an ORDER KEY to optimize query performance

An order key (ORDER KEY) determines the physical sort order of data in a columnar table. A well-designed order key can significantly improve the performance of filter, aggregation, and sort operations by pre-sorting data.

Design guidelines

Frequently filtered columns: Use columns that appear most often in WHERE clauses for range or equality lookups, especially date columns such as sale_date.
Frequently grouped columns: Include dimension columns that are often used in GROUP BY clauses.
High-cardinality columns: Place columns with many distinct values at the beginning of the order key to improve data filtering efficiency.

Procedure: Specify the order key by adding an ORDER KEY clause in your CREATE TABLE statement.

Important

The total length of all columns in the order_key must not exceed 3,072 bytes.

-- Create a columnar table with an order key
-- Queries frequently filter by date range and region, so use them as the order key
CREATE TABLE sales_records_sorted (
  sale_id INT PRIMARY KEY AUTO_INCREMENT,
  product_id INT NOT NULL,
  region VARCHAR(50),
  sale_date DATE,
  amount DECIMAL(10, 2),
  -- Define an order key named idx_date_region
  ORDER KEY idx_date_region(sale_date, region)
) ENGINE=XEngine, TABLE_FORMAT=COLUMN;

Specify a storage policy for hot-cold data tiering

The storage_policy parameter specifies the storage medium for table data. Use it to separate hot and cold data by storing infrequently accessed data on low-cost Object Storage Service (OSS).

HOT: Data is stored on high-performance PolarStore. This is the default setting.
COLD: Data is stored on low-cost OSS.

Note

The storage_policy=cold syntax is supported. However, in the current version, all data is stored on PolarStore regardless of whether you specify HOT or COLD.

-- Syntax example: Create a columnar table with COLD storage policy
CREATE TABLE archive_logs (
  log_id BIGINT PRIMARY KEY,
  log_time TIMESTAMP,
  message TEXT
) ENGINE=XEngine, TABLE_FORMAT=COLUMN, STORAGE_POLICY=COLD;

Create partitioned table

You create a columnar partitioned table the same way you create a non-partitioned table. Specify ENGINE=XEngine and TABLE_FORMAT=COLUMN in your CREATE TABLE statement.

Note

Creating a columnar partitioned table converts the table to a hybrid partitioned table. Hybrid partitioned tables are currently in limited preview. To use this feature, go to the Quota Center, find the quota by Quota ID: polardb_mysql_hybrid_partition, and click Apply to activate the feature.

-- Create a columnar partitioned table
CREATE TABLE t1(a1 INT, a2 INT, a3 INT, a4 INT) ENGINE=XEngine TABLE_FORMAT=COLUMN
PARTITION BY RANGE(a1)
(
  PARTITION p1 VALUES LESS THAN (20),
  PARTITION p2 VALUES LESS THAN (40),
  PARTITION p3 VALUES LESS THAN (60),
  PARTITION p4 VALUES LESS THAN MAXVALUE
);

-- Create a columnar partitioned table with COLD storage on OSS
CREATE TABLE t2(a1 INT, a2 INT, a3 INT, a4 INT) ENGINE=XEngine TABLE_FORMAT=COLUMN STORAGE_POLICY=COLD
PARTITION BY RANGE(a1)
(
  PARTITION p1 VALUES LESS THAN (20),
  PARTITION p2 VALUES LESS THAN (40),
  PARTITION p3 VALUES LESS THAN (60),
  PARTITION p4 VALUES LESS THAN MAXVALUE
);

Manage columnar table

This section describes how to manage existing tables, including converting between row-based table and columnar formats, modifying table structures, and changing partition storage formats.

Convert table storage formats

You can convert an existing row-based table table to a columnar table for analytics acceleration, or convert a columnar table back to a row-based table table:

// Convert a row-based table table to an X-Engine columnar table
ALTER TABLE sales_records ENGINE=XEngine, TABLE_FORMAT=COLUMN;

// Convert a columnar table to an X-Engine row-based table table
ALTER TABLE sales_records ENGINE=XEngine TABLE_FORMAT=ROW;

// Convert a columnar table to an InnoDB row-based table table
ALTER TABLE sales_records ENGINE=innodb;

Modify table schema (columns and indexes)

Columnar tables support Online DDL. You can add or drop columns and indexes without blocking DML operations. The syntax is the same as for row-based table tables:

-- Add a column online
ALTER TABLE sales_records ADD COLUMN salesperson_id INT;

-- Drop a column online
ALTER TABLE sales_records DROP COLUMN salesperson_id;

-- Add an index online
ALTER TABLE sales_records ADD INDEX idx_product(product_id);

-- Drop an index online
ALTER TABLE sales_records DROP INDEX idx_product;

Note

In the current version, the ADD COLUMN operation does not support Instant DDL. However, it supports Online DDL, which means adding a column requires a full table data copy but does not block DML operations during execution.

Modify order keys

You can add, change, or drop the order key of a columnar table to match your evolving query patterns:

-- Add an order key to an existing columnar table
ALTER TABLE sales_records ADD ORDER KEY idx_region(region), ALGORITHM=INPLACE;

-- Change the order key definition (drop and add)
ALTER TABLE sales_records_sorted DROP ORDER KEY idx_date_region, ADD ORDER KEY idx_product(product_id), ALGORITHM=INPLACE;

-- Drop an order key
ALTER TABLE sales_records_sorted DROP ORDER KEY idx_date_region, ALGORITHM=INPLACE;

Change partition storage formats

In a partitioned table, you can convert specific historical partitions to the columnar format for hybrid row-column storage. Keep recent active data partitions in the row-based table format for OLTP performance, and convert historical archive data partitions to the columnar format for analytics acceleration.

-- Step 1: Create an InnoDB partitioned table
CREATE TABLE historical_orders (
  order_id INT NOT NULL,
  order_date DATE NOT NULL,
  amount DECIMAL(10, 2)
) ENGINE=InnoDB
PARTITION BY RANGE(TO_DAYS(order_date)) (
  PARTITION p2023 VALUES LESS THAN (TO_DAYS('2024-01-01')),
  PARTITION p2024 VALUES LESS THAN (TO_DAYS('2025-01-01'))
);

-- Step 2: Convert the 2023 historical data partition to columnar format
ALTER TABLE historical_orders CHANGE PARTITION p2023 ENGINE=XEngine TABLE_FORMAT=COLUMN;

Analyze and optimize query performance

Use the EXPLAIN command to verify that a query uses the columnar execution engine and to understand its execution plan. Columnar execution plans use a horizontal tree format that differs significantly from traditional row-based plans.

Example

The following example uses the TPC-H Q5 benchmark query to compare columnar and row-based execution plans.

Columnar execution plan example

+----+-------------------------------------------+----------+----------------------------------------------------------------------------------------------------+
| ID | Operator                                  | Name     | Extra Info                                                                                         |
+----+-------------------------------------------+----------+----------------------------------------------------------------------------------------------------+
|  1 | Select Statement                          |          | IMCI Execution Plan (max_dop = 32, max_query_mem = unlimited)                                      |
|  2 | └─Sort                                    |          | Sort Key: revenue DESC                                                                             |
|  3 |   └─Hash Groupby                          |          | Group Key: nation.n_name                                                                           |
|  4 |     └─Hash Join                           |          | Join Cond: (lineitem.l_suppkey, customer.c_nationkey) = (supplier.s_suppkey, supplier.s_nationkey) |
|  5 |       ├─Hash Join                         |          | Join Cond: orders.o_orderkey = lineitem.l_orderkey                                                 |
|  6 |       │ ├─Hash Join                       |          | Join Cond: (customer.c_nationkey, region.r_regionkey) = (nation.n_nationkey, nation.n_regionkey)   |
|  7 |       │ │ ├─Hash Join                     |          | Join Cond: orders.o_custkey = customer.c_custkey                                                   |
|  8 |       │ │ │ ├─Cartesian Product           |          |                                                                                                    |
|  9 |       │ │ │ │ ├─Table Scan                | region   | Cond: (r_name = "ASIA")                                                                            |
| 10 |       │ │ │ │ └─Table Scan                | orders   | Cond: ((o_orderdate >= 01/01/1994) AND (o_orderdate < 01/01/1995))                                 |
| 11 |       │ │ │ └─Table Scan                  | customer |                                                                                                    |
| 12 |       │ │ └─Table Scan                    | nation   |                                                                                                    |
| 13 |       │ └─Table Scan                      | lineitem |                                                                                                    |
| 14 |       └─Table Scan                        | supplier |                                                                                                    |
+----+-------------------------------------------+----------+----------------------------------------------------------------------------------------------------+

How to read the plan:

Confirm columnar execution: The Extra Info column displays IMCI Execution Plan, which indicates that the query is processed by the columnar execution engine.
Check parallelism: max_dop = 32 indicates that the maximum degree of parallelism for this query is 32 threads within the columnar engine.
Understand core operators:
- Table Scan: Represents a scan of a columnar table. The Cond section shows that WHERE conditions are pushed down to the storage layer for execution, which is a key performance optimization.
- Hash Join/Hash Groupby: High-performance join and aggregation algorithms used in the columnar engine, suitable for processing large-scale data.
- Sort: Sort operation. Performance is high when the order key matches the ORDER BY columns.

Row-based execution plan comparison

+----+-------------+----------+------------+--------+---------------+---------+---------+---------------------------+---------+----------+----------------------------------------------+
| id | select_type | table    | partitions | type   | possible_keys | key     | key_len | ref                       | rows    | filtered | Extra                                        |
+----+-------------+----------+------------+--------+---------------+---------+---------+---------------------------+---------+----------+----------------------------------------------+
|  1 | SIMPLE      | region   | NULL       | ALL    | PRIMARY       | NULL    | NULL    | NULL                      |       5 |    20.00 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | orders   | NULL       | ALL    | PRIMARY       | NULL    | NULL    | NULL                      | 1500000 |    11.11 | Using where; Using join buffer (hash join)   |
|  1 | SIMPLE      | customer | NULL       | eq_ref | PRIMARY       | PRIMARY | 8       | tpch.orders.o_custkey     |       1 |   100.00 | NULL                                         |
|  1 | SIMPLE      | nation   | NULL       | eq_ref | PRIMARY       | PRIMARY | 8       | tpch.customer.c_nationkey |       1 |    10.00 | Using where                                  |
|  1 | SIMPLE      | lineitem | NULL       | ref    | PRIMARY       | PRIMARY | 8       | tpch.orders.o_orderkey    |       2 |   100.00 | NULL                                         |
|  1 | SIMPLE      | supplier | NULL       | eq_ref | PRIMARY       | PRIMARY | 8       | tpch.lineitem.l_suppkey   |       1 |    10.00 | Using where                                  |
+----+-------------+----------+------------+--------+---------------+---------+---------+---------------------------+---------+----------+----------------------------------------------+

Row-based execution plans typically contain markers such as Using temporary; Using filesort, which indicate that the engine creates temporary tables and performs on-disk sorting. These are the main performance bottlenecks in analytical queries. Columnar plans avoid these inefficiencies through in-memory parallel computation.