本文介绍了如何免费体验PolarDB-X的列存索引(CCI)功能。
背景
传统的OLTP和OLAP解决方案基于简单的读写分离或ETL模型,将在线库的数据以T+1的方式抽取到数据仓库中进行计算,这种方案存在存储成本高、实时性差、链路和维护成本高等缺陷。
为应对数据爆炸式增长的挑战,PolarDB分布式版基于对象存储OSS设计了一套列存索引(Clustered Columnar Index,CCI)功能,支持将行存数据实时同步到列存存储上,并支持以下功能:
在线事务处理和实时数据分析的一体化能力,满足OLTP和OLAP混合场景的需求。
结合PolarDB分布式架构,列存索引支持智能路由和MPP查询加速技术。计算层会精确识别出TP和AP的流量,并智能地将TP和AP流量分别路由到不同的存储介质上,同时确保在AP链路上默认开启MPP并行查询技术扫描列存索引,从而提升查询分析的能力。
采用Delta+Main模型,满足秒级的实时更新,结合MVCC多版本技术,能确保在任何时刻都可以读取到一致性的快照数据。
阿里云提供了数据库解决方案功能体验馆,提供真实免费的PolarDB分布式版实例环境和开箱即用的测试方法,您可以在线快捷体验PolarDB分布式版的列存索引(CCI)功能。
影响
本功能体验不涉及生产环境的部署,因此不会影响业务。
费用
本次体验中,由于体验涉及到的资源不归属于您,因此不会产生任何费用,您可以放心体验。
体验内容
体验环境
在本免费体验中,阿里云提供了预置环境供您操作体验,预置环境的详情如下:
集群:提供了一个PolarDB-X实例。具体如下:
实例版本:polardb-2.5.0_5.4.20-20250210_xcluster5.4.20-20241213
说明实例版本的规则说明,请参见版本说明。
计算节点规格:4核16 GB
计算节点数量:2个
存储节点规格:4核16 GB
存储节点数量:2个
兼容性:MySQL 5.7
规格系列:通用规格
测试数据集:实例中预置了标准测试集TPC-H的10 GB数据集。
说明本文的TPC-H的实现基于TPC-H的基准测试,并不能与已发布的TPC-H基准测试结果相比较,本文中的测试并不符合TPC-H基准测试的所有要求。
观测指标
CPU占用率(单位:%):实例中正常查询(未使用列存索引加速)与列存索引加速查询的CPU使用率。
查询耗时(单位:秒):执行特定SQL所耗费的时间。
操作步骤
进入瑶池解决方案体验馆中的列存索引/表-加速TP复杂SQL查询体验方案页面中:
单击页面下方创建免费体验任务按钮。
稍等片刻后,单击
刷新任务列表,可以看到您创建的体验任务已开始。
单击查看详情,进入列存索引功能的体验页面,单击开启任务。
说明请根据页面按钮提示,手动点击按钮执行每一步操作。若在倒计时结束时没有手动点击执行,则会自动执行对应操作。
首先,我们来体验一下PolarDB-X列存索引功能在单表查询中的加速效果。
关闭列存加速引擎时,单表查询语句的执行计划以及执行时间。
单击选择数据库,自动执行如下命令,切换到测试数据库
tpch10g
中。USE tpch10g;
单击关闭列存加速引擎,自动执行如下命令,关闭列存索引加速。
SET ENABLE_COLUMNAR_OPTIMIZER = false;
单击查看SQL执行计划,自动执行如下命令,查看单表查询语句的执行计划。
EXPLAIN SELECT l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order FROM lineitem WHERE l_shipdate <= date '1998-12-01' - interval '60' day GROUP BY l_returnflag, l_linestatus ORDER BY l_returnflag, l_linestatus;
返回结果如下,此时查询计划中并未使用列存索引加速。
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | LOGICAL EXECUTIONPLAN | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Project(l_returnflag="l_returnflag", l_linestatus="l_linestatus", sum_qty="sum_qty", sum_base_price="sum_base_price", sum_disc_price="sum_disc_price", sum_charge="sum_charge", avg_qty="sum_qty / $f6", avg_price="sum_base_price / $f6", avg_disc="$f7 / $f6", count_order="$f6") | | Exchange(distribution=single, collation=[0 ASC-nulls-first, 1 ASC-nulls-first]) | | SortAgg(group="l_returnflag,l_linestatus", sum_qty="SUM(sum_qty)", sum_base_price="SUM(sum_base_price)", sum_disc_price="SUM(sum_disc_price)", sum_charge="SUM(sum_charge)", $f6="SUM($f6)", $f7="SUM($f7)") | | Exchange(distribution=hash[0, 1], collation=[0 ASC-nulls-first, 1 ASC-nulls-first]) | | LogicalView(tables="lineitem[p1,p2,p3,...p64]", shardCount=64, sql="SELECT `l_returnflag`, `l_linestatus`, SUM(`l_quantity`) AS `sum_qty`, SUM(`l_extendedprice`) AS `sum_base_price`, SUM((`l_extendedprice` * (? - `l_discount`))) AS `sum_disc_price`, SUM(((`l_extendedprice` * (? - `l_discount`)) * (? + `l_tax`))) AS `sum_charge`, COUNT(*) AS `$f6`, SUM(`l_discount`) AS `$f7` FROM `lineitem` AS `lineitem` WHERE (`l_shipdate` <= DATE_SUB(DATE('1998-12-01'), INTERVAL '60' DAY)) GROUP BY `l_returnflag`, `l_linestatus` ORDER BY `l_returnflag`, `l_linestatus`") | | HitCache:true | | Source:PLAN_CACHE | | TemplateId: 7d898900 | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 8 rows in set (0.01 sec)
单击执行SQL,查看单表查询语句的执行时间。返回结果如下,在未使用列存索引加速查询时,该语句执行时间较长。
+--------------+--------------+--------------+------------------+--------------------+----------------------+---------+------------+----------+-------------+ | l_returnflag | l_linestatus | sum_qty | sum_base_price | sum_disc_price | sum_charge | avg_qty | avg_price | avg_disc | count_order | +--------------+--------------+--------------+------------------+--------------------+----------------------+---------+------------+----------+-------------+ | A | F | 377518399.00 | 566065727797.25 | 537759104278.0656 | 559276670892.116819 | 25.5010 | 38237.1510 | 0.0500 | 14804077 | | N | F | 9851614.00 | 14767438399.17 | 14028805792.2114 | 14590490998.366737 | 25.5224 | 38257.8107 | 0.0500 | 385998 | | N | O | 755019122.00 | 1132129731453.74 | 1075518177478.8293 | 1118551353535.822147 | 25.4984 | 38234.1666 | 0.0500 | 29610420 | | R | F | 377732830.00 | 566431054976.00 | 538110922664.7677 | 559634780885.086257 | 25.5084 | 38251.2193 | 0.0500 | 14808183 | +--------------+--------------+--------------+------------------+--------------------+----------------------+---------+------------+----------+-------------+ 4 rows in set (11.40 sec)
开启列存加速引擎时,单表查询语句的执行计划以及执行时间。
单击开启列存加速引擎,自动执行如下命令,开启列存索引加速。
SET ENABLE_COLUMNAR_OPTIMIZER = true;
单击查看SQL执行计划,自动执行如下命令,查看单表查询语句的执行计划。
EXPLAIN SELECT l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order FROM lineitem WHERE l_shipdate <= date '1998-12-01' - interval '60' day GROUP BY l_returnflag, l_linestatus ORDER BY l_returnflag, l_linestatus;
返回结果如下,此时查询计划中使用了列存索引加速,即查询计划中包含了
OSSTableScan
关键字。+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | LOGICAL EXECUTIONPLAN | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Project(l_returnflag="l_returnflag", l_linestatus="l_linestatus", sum_qty="sum_qty", sum_base_price="sum_base_price", sum_disc_price="sum_disc_price", sum_charge="sum_charge", avg_qty="sum_qty / $f6", avg_price="sum_base_price / $f6", avg_disc="$f7 / $f6", count_order="$f6") | | Exchange(distribution=single, collation=[0 ASC-nulls-first, 1 ASC-nulls-first]) | | MemSort(sort="l_returnflag ASC,l_linestatus ASC") | | HashAgg(group="l_returnflag,l_linestatus", sum_qty="SUM(sum_qty)", sum_base_price="SUM(sum_base_price)", sum_disc_price="SUM(sum_disc_price)", sum_charge="SUM(sum_charge)", $f6="SUM($f6)", $f7="SUM($f7)") | | Exchange(distribution=hash[0, 1], collation=[]) | | PartialHashAgg(group="l_returnflag,l_linestatus", sum_qty="SUM(l_quantity)", sum_base_price="SUM(l_extendedprice)", sum_disc_price="SUM($f4)", sum_charge="SUM($f5)", $f6="COUNT()", $f7="SUM(l_discount)") | | Project(l_returnflag="l_returnflag", l_linestatus="l_linestatus", l_quantity="l_quantity", l_extendedprice="l_extendedprice", $f4="l_extendedprice * ?0 - l_discount", $f5="l_extendedprice * ?1 - l_discount * ?2 + l_tax", l_discount="l_discount") | | OSSTableScan(tables="lineitem_col_index_$8473[p1,p2,p3,...p24]", shardCount=24, sql="SELECT `l_quantity`, `l_extendedprice`, `l_discount`, `l_tax`, `l_returnflag`, `l_linestatus` FROM `lineitem_col_index_$8473` AS `lineitem_col_index_$8473` WHERE (`l_shipdate` <= '1998-10-02')") | | HitCache:false | | Source:null | | TemplateId: NULL | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 11 rows in set (0.02 sec)
单击执行SQL,查看单表查询语句的执行时间。返回结果如下,开启列存索引加速后,该语句执行时间大幅降低。
+--------------+--------------+--------------+------------------+--------------------+----------------------+---------+------------+----------+-------------+ | l_returnflag | l_linestatus | sum_qty | sum_base_price | sum_disc_price | sum_charge | avg_qty | avg_price | avg_disc | count_order | +--------------+--------------+--------------+------------------+--------------------+----------------------+---------+------------+----------+-------------+ | A | F | 377518399.00 | 566065727797.25 | 537759104278.0656 | 559276670892.116819 | 25.5010 | 38237.1510 | 0.0500 | 14804077 | | N | F | 9851614.00 | 14767438399.17 | 14028805792.2114 | 14590490998.366737 | 25.5224 | 38257.8107 | 0.0500 | 385998 | | N | O | 755019122.00 | 1132129731453.74 | 1075518177478.8293 | 1118551353535.822147 | 25.4984 | 38234.1666 | 0.0500 | 29610420 | | R | F | 377732830.00 | 566431054976.00 | 538110922664.7677 | 559634780885.086257 | 25.5084 | 38251.2193 | 0.0500 | 14808183 | +--------------+--------------+--------------+------------------+--------------------+----------------------+---------+------------+----------+-------------+ 4 rows in set (0.69 sec)
下一步,我们来体验一下PolarDB-X列存索引功能在多表Join查询中的加速效果。
关闭列存加速引擎时,多表Join查询语句的执行计划以及执行时间。
单击关闭列存加速引擎,自动执行如下命令,关闭列存索引加速。
SET ENABLE_COLUMNAR_OPTIMIZER = false;
单击查看SQL执行计划,自动执行如下命令,查看多表Join查询语句的执行计划。
EXPLAIN SELECT l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority FROM customer, orders, lineitem WHERE c_mktsegment = 'BUILDING' AND c_custkey = o_custkey AND l_orderkey = o_orderkey AND o_orderdate < date '1995-03-15' AND l_shipdate > date '1989-03-15' GROUP BY l_orderkey, o_orderdate, o_shippriority ORDER BY revenue desc, o_orderdate LIMIT 20;
返回结果如下,此时查询计划中并未使用列存索引加速。
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | LOGICAL EXECUTIONPLAN | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Project(l_orderkey="l_orderkey", revenue="revenue", o_orderdate="o_orderdate", o_shippriority="o_shippriority") | | TopN(sort="revenue DESC,o_orderdate ASC", offset=0, fetch=?2) | | Exchange(distribution=single, collation=[3 DESC-nulls-last, 1 ASC-nulls-first]) | | TopN(sort="revenue DESC,o_orderdate ASC", fetch=+(?2, 0)) | | Project(l_orderkey="l_orderkey", o_orderdate="o_orderdate", o_shippriority="o_shippriority", revenue="revenue") | | HashAgg(group="o_orderdate,o_shippriority,l_orderkey", revenue="SUM(__*__)") | | Exchange(distribution=hash[2, 3, 4], collation=[]) | | Project(c_custkey="c_custkey", o_custkey="o_custkey", o_orderdate="o_orderdate", o_shippriority="o_shippriority", l_orderkey="l_orderkey", __*__="__*__") | | HashJoin(condition="o_custkey = c_custkey", type="inner") | | LogicalView(tables="orders[p1,p2,p3,...p64],lineitem[p1,p2,p3,...p64]", shardCount=64, sql="SELECT `orders`.`o_custkey`, `orders`.`o_orderdate`, `orders`.`o_shippriority`, `lineitem`.`l_orderkey`, (`lineitem`.`l_extendedprice` * (? - `lineitem`.`l_discount`)) AS `__*__` FROM `orders` AS `orders` INNER JOIN `lineitem` AS `lineitem` ON ((`orders`.`o_orderkey` = `lineitem`.`l_orderkey`) AND (`orders`.`o_orderdate` < DATE('1995-03-15')) AND (`lineitem`.`l_shipdate` > DATE('1989-03-15'))) WHERE (BLOOMFILTER(?, ?, ?, `orders`.`o_custkey`, 1))") | | Exchange(distribution=broadcast, collation=[]) | | RuntimeFilterBuilder(condition="RUNTIME_FILTER_BUILDER_[0](c_custkey)") | | LogicalView(tables="customer[p1,p2,p3,...p64]", shardCount=64, sql="SELECT `c_custkey` FROM `customer` AS `customer` WHERE (`c_mktsegment` = ?)") | | HitCache:false | | Source:null | | TemplateId: NULL | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 16 rows in set (0.03 sec)
单击执行SQL,查看单表查询语句的执行时间。返回结果如下,在未使用列存索引加速查询时,该语句执行时间较长。
+------------+-------------+-------------+----------------+ | l_orderkey | revenue | o_orderdate | o_shippriority | +------------+-------------+-------------+----------------+ | 39394405 | 536454.4296 | 1992-01-04 | 0 | | ... | ... | ... | 0 | | 52974151 | 480148.7595 | 1995-02-05 | 0 | +------------+-------------+-------------+----------------+ 20 rows in set (2.05 sec)
开启列存加速引擎时,多表Join查询语句的执行计划以及执行时间。
单击开启列存加速引擎,自动执行如下命令,开启列存索引加速。
SET ENABLE_COLUMNAR_OPTIMIZER = true;
单击查看SQL执行计划,自动执行如下命令,查看单表查询语句的执行计划。
EXPLAIN SELECT l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority FROM customer, orders, lineitem WHERE c_mktsegment = 'BUILDING' AND c_custkey = o_custkey AND l_orderkey = o_orderkey AND o_orderdate < date '1995-03-15' AND l_shipdate > date '1989-03-15' GROUP BY l_orderkey, o_orderdate, o_shippriority ORDER BY revenue desc, o_orderdate LIMIT 20;
返回结果如下,此时查询计划中使用了列存索引加速,即查询计划中包含了
OSSTableScan
关键字。+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | LOGICAL EXECUTIONPLAN | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Project(l_orderkey="l_orderkey", revenue="revenue", o_orderdate="o_orderdate", o_shippriority="o_shippriority") | | TopN(sort="revenue DESC,o_orderdate ASC", offset=0, fetch=?2) | | Exchange(distribution=single, collation=[3 DESC-nulls-last, 0 ASC-nulls-first]) | | TopN(sort="revenue DESC,o_orderdate ASC", fetch=+(?2, 0)) | | HashAgg(group="o_orderdate,o_shippriority,l_orderkey", revenue="SUM(__*__)", partition=[local]) | | Project(c_custkey="c_custkey", o_orderkey="o_orderkey", o_custkey="o_custkey", o_orderdate="o_orderdate", o_shippriority="o_shippriority", l_orderkey="l_orderkey", __*__="l_extendedprice * ?0 - l_discount") | | HashJoin(condition="o_orderkey = l_orderkey", type="inner") | | OSSTableScan(tables="lineitem_col_index_$8473[p1,p2,p3,...p24]", shardCount=24, partition=[local, remote], sql="SELECT `l_orderkey`, `l_extendedprice`, `l_discount` FROM `lineitem_col_index_$8473` AS `lineitem_col_index_$8473` WHERE (`l_shipdate` > '1989-03-15')") | | Exchange(distribution=hash[1]24, collation=[]) | | Project(c_custkey="c_custkey", o_orderkey="o_orderkey", o_custkey="o_custkey", o_orderdate="o_orderdate", o_shippriority="o_shippriority") | | HashJoin(condition="o_custkey = c_custkey", type="inner") | | Exchange(distribution=hash[1]24, collation=[]) | | OSSTableScan(tables="orders_col_index_$b0a3[p1,p2,p3,...p24]", shardCount=24, sql="SELECT `o_orderkey`, `o_custkey`, `o_orderdate`, `o_shippriority` FROM `orders_col_index_$b0a3` AS `orders_col_index_$b0a3` WHERE (`o_orderdate` < '1995-03-15')") | | OSSTableScan(tables="customer_col_index_$570d[p1,p2,p3,...p24]", shardCount=24, partition=[remote], sql="SELECT `c_custkey` FROM `customer_col_index_$570d` AS `customer_col_index_$570d` WHERE (`c_mktsegment` = ?)") | | HitCache:false | | Source:null | | TemplateId: NULL | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 17 rows in set (0.03 sec)
单击执行SQL,查看单表查询语句的执行时间。返回结果如下,开启列存索引加速后,该语句执行时间大幅降低。
+------------+-------------+-------------+----------------+ | l_orderkey | revenue | o_orderdate | o_shippriority | +------------+-------------+-------------+----------------+ | 39394405 | 536454.4296 | 1992-01-04 | 0 | | ... | ... | ... | 0 | | 52974151 | 480148.7595 | 1995-02-05 | 0 | +------------+-------------+-------------+----------------+ 20 rows in set (0.25 sec)
结果分析
使用方式简单,无需更改查询SQL即可实现加速效果。
针对单表查询和多表Join查询的场景,列存索引均有近10倍以上的性能提升。
场景
执行时间(单位:秒)
关闭列存加速引擎
开启列存加速引擎
单表查询
11.40
0.69
多表Join查询
2.05
0.25