PolarDB MySQL版新增列存索引查询性能分析功能。该功能通过采集执行SQL语句时的耗时等信息,并结合EXPLAIN获得的查询计划一并返回给用户。能够帮助用户了解执行SQL语句时的耗时细节,并且能够辅助分析慢SQL。
版本要求
集群版本需为PolarDB MySQL版8.0.1.1.42及以上,您可以通过查询版本号来确认集群版本。
使用说明
您需要将参数imci_analyze_query
的值设置为ON,来开启列存索引查询性能分析功能。
参数名称 | 级别 | 说明 |
imci_analyze_query | Session | 列存索引查询性能分析功能控制开关。取值范围如下:
|
示例
简单查询示例
以在TPC-H Schema上执行的一个简单查询语句为例,SQL语句如下:
SELECT l_shipmode, COUNT(*) FROM lineitem, orders WHERE l_orderkey = o_orderkey GROUP BY l_shipmode;
首先,您可以通过EXPLAIN得到该查询优化器估计的各个算子的执行代价以及返回行数信息。
EXPLAIN SELECT l_shipmode, COUNT(*) FROM lineitem, orders WHERE l_orderkey = o_orderkey GROUP BY l_shipmode;
查询结果如下:
+----+------------------------+----------+--------+----------+---------------------------------------------------------------+ | ID | Operator | Name | E-Rows | E-Cost | Extra Info | +----+------------------------+----------+--------+----------+---------------------------------------------------------------+ | 1 | Select Statement | | | | IMCI Execution Plan (max_dop = 1, max_query_mem = unlimited) | | 2 | └─Hash Groupby | | 6 | 51218.50 | Group Key: lineitem.l_shipmode | | 3 | └─Compute Scalar | | 1869 | 50191.00 | | | 4 | └─Hash Join | | 1869 | 31000.00 | Join Cond: lineitem.l_orderkey = orders.o_orderkey | | 5 | ├─Table Scan | lineitem | 2000 | 0.00 | | | 6 | └─Table Scan | orders | 2000 | 0.00 | | +----+------------------------+----------+--------+----------+---------------------------------------------------------------+
开启性能分析功能。
SET imci_analyze_query = ON;
执行SQL语句。
SELECT l_shipmode, COUNT(*) FROM lineitem, orders WHERE l_orderkey = o_orderkey GROUP BY l_shipmode;
执行结果如下:
+------------+----------+ | l_shipmode | COUNT(*) | +------------+----------+ | REG AIR | 283 | | SHIP | 269 | | FOB | 299 | | RAIL | 289 | | TRUCK | 314 | | MAIL | 274 | | AIR | 272 | +------------+----------+ 7 rows in set (0.05 sec)
性能分析结果会存放在
information_schema.imci_sql_profiling
表中。在将参数imci_analyze_query
的值设置为ON时,该表将存放最近一条使用列存索引执行的查询语句的性能分析信息,通过查询这张表,就可以获得执行的查询语句的性能信息。查询SQL语句的性能信息。
/*ROUTE_TO_LAST_USED*/SELECT * FROM information_schema.imci_sql_profiling;
说明/*ROUTE_TO_LAST_USED*/
用于指示智能代理将查询路由到上一条语句执行的节点上。否则可能会因为路由不到查询执行的节点,导致查询不到结果或结果不正确。查询结果如下:
+----+------------------------+----------+--------+----------+-------------------+--------------------------------------------------------------------------------------------------------+ | ID | Operator | Name | A-Rows | A-Cost | Execution Time(s) | Extra Info | +----+------------------------+----------+--------+----------+-------------------+--------------------------------------------------------------------------------------------------------+ | 1 | Select Statement | | 0 | 52609.51 | 0 | IMCI Execution Plan (max_dop = 1, real_dop = 1, max_query_mem = unlimited, real_query_mem = unlimited) | | 2 | └─Hash Groupby | | 7 | 52609.5 | 0.002 | Group Key: lineitem.l_shipmode | | 3 | └─Compute Scalar | | 2000 | 51501 | 0 | | | 4 | └─Hash Join | | 2000 | 31000 | 0.007 | Join Cond: lineitem.l_orderkey = orders.o_orderkey | | 5 | ├─Table Scan | lineitem | 2000 | 0 | 0.001 | | | 6 | └─Table Scan | orders | 2000 | 0 | 0 | | +----+------------------------+----------+--------+----------+-------------------+--------------------------------------------------------------------------------------------------------+
可以看到,
Hash Join
与Hash Groupby
对应的Extra Info
信息中标明了对应的连接条件以及分组列,而第1行的Extra Info
则展示了执行查询操作时使用的CPU和内存信息。(可选)您可以在
information_schema.imci_sql_profiling
表上执行聚合等操作。如查看该查询语句的执行时间等。/*ROUTE_TO_LAST_USED*/SELECT SUM(`Execution Time(s)`) AS TOTAL_TIME FROM information_schema.imci_sql_profiling;
执行结果如下:
+----------------------+ | TOTAL_TIME | +----------------------+ | 0.010000000000000002 | +----------------------+ 1 row in set (0.00 sec)
可以看到,该查询语句的执行耗时为10ms。
复杂查询示例
以在TPC-H SF100数据集上执行一个复杂查询为例,通过EXPLAIN和查询分析功能分析查询耗时分布以及性能。SQL语句如下:
SELECT
c_name,
sum(l_quantity)
FROM
customer,
orders,
lineitem
WHERE
o_orderkey IN (
SELECT
l_orderkey
FROM
lineitem
WHERE
l_partkey > 18000000
)
AND c_custkey = o_custkey
AND o_orderkey = l_orderkey
GROUP BY
c_name
ORDER BY
c_name
LIMIT 10;
通过EXPLAIN查看该查询语句的执行计划。
EXPLAIN SELECT c_name, sum(l_quantity) FROM customer, orders, lineitem WHERE o_orderkey IN ( SELECT l_orderkey FROM lineitem WHERE l_partkey > 18000000 ) AND c_custkey = o_custkey AND o_orderkey = l_orderkey GROUP BY c_name ORDER BY c_name LIMIT 10;
执行结果如下:
+----+----------------------------------+----------+-----------+------------+---------------------------------------------------------------+ | ID | Operator | Name | E-Rows | E-Cost | Extra Info | +----+----------------------------------+----------+-----------+------------+---------------------------------------------------------------+ | 1 | Select Statement | | | | IMCI Execution Plan (max_dop = 32, max_query_mem = unlimited) | | 2 | └─Limit | | 10 | 7935739.96 | Offset=0 Limit=10 | | 3 | └─Sort | | 10 | 7935739.96 | Sort Key: c_name ASC | | 4 | └─Hash Groupby | | 1503700 | 7933273.99 | Group Key: customer.C_NAME | | 5 | └─Hash Right Semi Join | | 54010545 | 7865930.26 | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY | | 6 | ├─Table Scan | lineitem | 59785766 | 24001.52 | Cond: (L_PARTKEY > 18000000) | | 7 | └─Hash Join | | 538776190 | 5488090.33 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY | | 8 | ├─Hash Join | | 181006430 | 668535.99 | Join Cond: customer.C_CUSTKEY = orders.O_CUSTKEY | | 9 | │ ├─Table Scan | customer | 15000000 | 600.00 | | | 10 | │ └─Table Scan | orders | 150000000 | 6000.00 | | | 11 | └─Table Scan | lineitem | 600037902 | 24001.52 | | +----+----------------------------------+----------+-----------+------------+---------------------------------------------------------------+
该SQL语句的执行代价为7935739。其中,ID为7的Hash Join会消耗大量CPU(其代价为5488090,占总代价约70%)。若要验证这条查询语句的性能问题,您可以开启性能分析功能,开启后再执行该SQL语句,并通过查看SQL语句的查询性能信息来分析具体原因。
开启性能分析功能。
SET imci_analyze_query = ON;
执行SQL语句。
SELECT c_name, sum(l_quantity) FROM customer, orders, lineitem WHERE o_orderkey IN ( SELECT l_orderkey FROM lineitem WHERE l_partkey > 18000000 ) AND c_custkey = o_custkey AND o_orderkey = l_orderkey GROUP BY c_name ORDER BY c_name LIMIT 10;
执行结果如下:
+--------------------+-----------------+ | c_name | sum(l_quantity) | +--------------------+-----------------+ | Customer#000000001 | 172.00 | | Customer#000000002 | 663.00 | | Customer#000000004 | 174.00 | | Customer#000000005 | 488.00 | | Customer#000000007 | 1135.00 | | Customer#000000008 | 440.00 | | Customer#000000010 | 625.00 | | Customer#000000011 | 143.00 | | Customer#000000013 | 1032.00 | | Customer#000000014 | 564.00 | +--------------------+-----------------+ 10 rows in set (21.37 sec)
查询SQL语句的性能信息。
/*ROUTE_TO_LAST_USED*/SELECT * FROM information_schema.imci_sql_profiling;
说明/*ROUTE_TO_LAST_USED*/
用于指示智能代理将查询路由到上一条语句执行的节点上。否则可能会因为路由到错误节点而查询不到结果。查询结果如下:
+----+----------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+ | ID | Operator | Name | A-Rows | A-Cost | Execution Time(s) | Extra Info | +----+----------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+ | 1 | Select Statement | | 0 | 8336856.67 | 0 | | | 2 | └─Limit | | 10 | 8336856.67 | 0.002 | Offset=0 Limit=10 | | 3 | └─Sort | | 0 | 8336856.67 | 2.275 | Sort Key: c_name ASC | | 4 | └─Hash Groupby | | 9813586 | 8320763.22 | 160.083 | Group Key: customer.C_NAME | | 5 | └─Hash Right Semi Join | | 239598134 | 7994854.23 | 98.174 | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY | | 6 | ├─Table Scan | lineitem | 60013756 | 24001.52 | 3.28 | Cond: (L_PARTKEY > 18000000) | | 7 | └─Hash Join | | 600037902 | 5156677.35 | 301.503 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY | | 8 | ├─Hash Join | | 150000000 | 629777.96 | 97.201 | Join Cond: customer.C_CUSTKEY = orders.O_CUSTKEY | | 9 | │ ├─Table Scan | customer | 15000000 | 600 | 3.321 | | | 10 | │ └─Table Scan | orders | 150000000 | 6000 | 0.241 | | | 11 | └─Table Scan | lineitem | 600037902 | 24001.52 | 0.661 | | +----+----------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+
根据查询的性能信息可以看出:Hash Join(7)耗费了约一半的时间。这是因为
lineitem
表与Hash Join(8)表的结果都很大,从而导致join的数据量很大,因此耗时很长。在Hash Join中,构建哈希表的开销通常远远大于哈希查找,因此只需要减小Join(8)的结果大小,就能有效减少查询的耗时。查看SQL语句和执行计划,若优化器将
o_orderkey in (...)
转换为半连接子查询,且能够将该条件下推,则能提高该SQL语句的查询性能。开启基于代价的半连接下推功能。
SET imci_optimizer_switch = 'semijoin_pushdown=on';
通过EXPLAIN查看该查询语句的执行计划。
EXPLAIN SELECT c_name, sum(l_quantity) FROM customer, orders, lineitem WHERE o_orderkey IN ( SELECT l_orderkey FROM lineitem WHERE l_partkey > 18000000 ) AND c_custkey = o_custkey AND o_orderkey = l_orderkey GROUP BY c_name ORDER BY c_name LIMIT 10;
执行结果如下:
+----+----------------------------------------+----------+-----------+------------+---------------------------------------------------------------+ | ID | Operator | Name | E-Rows | E-Cost | Extra Info | +----+----------------------------------------+----------+-----------+------------+---------------------------------------------------------------+ | 1 | Select Statement | | | | IMCI Execution Plan (max_dop = 32, max_query_mem = unlimited) | | 2 | └─Limit | | 10 | 2800433.74 | Offset=0 Limit=10 | | 3 | └─Sort | | 10 | 2800433.74 | Sort Key: c_name ASC | | 4 | └─Hash Groupby | | 14567321 | 2776544.58 | Group Key: customer.C_NAME | | 5 | └─Hash Join | | 57918330 | 2631846.75 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY | | 6 | ├─Hash Join | | 14567321 | 1014041.92 | Join Cond: orders.O_CUSTKEY = customer.C_CUSTKEY | | 7 | │ ├─Hash Right Semi Join | | 12071937 | 906226.67 | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY | | 8 | │ │ ├─Table Scan | lineitem | 59785766 | 24001.52 | Cond: (L_PARTKEY > 18000000) | | 9 | │ │ └─Table Scan | orders | 150000000 | 6000.00 | | | 10 | │ └─Table Scan | customer | 15000000 | 600.00 | | | 11 | └─Table Scan | lineitem | 600037902 | 24001.52 | | +----+----------------------------------------+----------+-----------+------------+---------------------------------------------------------------+
可以看出,开启半连接下推功能后,查询总代价从7935739下降至2800433。
重新执行该SQL语句。
SELECT c_name, sum(l_quantity) FROM customer, orders, lineitem WHERE o_orderkey IN ( SELECT l_orderkey FROM lineitem WHERE l_partkey > 18000000 ) AND c_custkey = o_custkey AND o_orderkey = l_orderkey GROUP BY c_name ORDER BY c_name LIMIT 10;
执行结果如下:
+--------------------+-----------------+ | c_name | sum(l_quantity) | +--------------------+-----------------+ | Customer#000000001 | 172.00 | | Customer#000000002 | 663.00 | | Customer#000000004 | 174.00 | | Customer#000000005 | 488.00 | | Customer#000000007 | 1135.00 | | Customer#000000008 | 440.00 | | Customer#000000010 | 625.00 | | Customer#000000011 | 143.00 | | Customer#000000013 | 1032.00 | | Customer#000000014 | 564.00 | +--------------------+-----------------+ 10 rows in set (13.74 sec)
可以看出,执行耗时降低了约40%。
查询SQL语句的性能信息。
/*ROUTE_TO_LAST_USED*/SELECT * FROM information_schema.imci_sql_profiling;
执行结果如下:
+----+----------------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+ | ID | Operator | Name | A-Rows | A-Cost | Execution Time(s) | Extra Info | +----+----------------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+ | 1 | Select Statement | | 0 | 4318488.35 | 0 | | | 2 | └─Limit | | 10 | 4318488.35 | 0.002 | Offset=0 Limit=10 | | 3 | └─Sort | | 0 | 4318488.34 | 3.076 | Sort Key: c_name ASC | | 4 | └─Hash Groupby | | 9813586 | 4302394.89 | 163.149 | Group Key: customer.C_NAME | | 5 | └─Hash Join | | 239598134 | 3976485.91 | 151.253 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY | | 6 | ├─Hash Join | | 49393149 | 1321335.54 | 55.392 | Join Cond: orders.O_CUSTKEY = customer.C_CUSTKEY | | 7 | │ ├─Hash Right Semi Join | | 49393149 | 954805.16 | 52.552 | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY | | 8 | │ │ ├─Table Scan | lineitem | 60013756 | 24001.52 | 2.791 | Cond: (L_PARTKEY > 18000000) | | 9 | │ │ └─Table Scan | orders | 150000000 | 6000 | 0.152 | | | 10 | │ └─Table Scan | customer | 15000000 | 600 | 0.028 | | | 11 | └─Table Scan | lineitem | 600037902 | 24001.52 | 0.642 | | +----+----------------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+
从性能分析结果可以看出,半连接被下推,之前的大表Join被消除,查询耗时显著减少。