分析列存索引查询性能

PolarDB MySQL版新增列存索引查询性能分析功能。该功能通过采集执行SQL语句时的耗时等信息,并结合EXPLAIN获得的查询计划一并返回给用户。能够帮助用户了解执行SQL语句时的耗时细节,并且能够辅助分析慢SQL。

版本要求

集群版本需为PolarDB MySQL版8.0.1.1.42及以上,您可以通过查询版本号来确认集群版本。

使用说明

您需要将参数imci_analyze_query的值设置为ON,来开启列存索引查询性能分析功能。

参数名称

级别

说明

imci_analyze_query

Session

列存索引查询性能分析功能控制开关。取值范围如下:

  • OFF(默认):关闭列存索引查询性能分析功能。

  • ON:开启列存索引查询性能分析功能。

示例

简单查询示例

以在TPC-H Schema上执行的一个简单查询语句为例,SQL语句如下:

SELECT l_shipmode, COUNT(*) FROM lineitem, orders WHERE l_orderkey = o_orderkey GROUP BY l_shipmode;
  1. 首先,您可以通过EXPLAIN得到该查询优化器估计的各个算子的执行代价以及返回行数信息。

    EXPLAIN SELECT l_shipmode, COUNT(*) FROM lineitem, orders WHERE l_orderkey = o_orderkey GROUP BY l_shipmode;

    查询结果如下:

    +----+------------------------+----------+--------+----------+---------------------------------------------------------------+
    | ID | Operator               | Name     | E-Rows | E-Cost   | Extra Info                                                    |
    +----+------------------------+----------+--------+----------+---------------------------------------------------------------+
    |  1 | Select Statement       |          |        |          | IMCI Execution Plan (max_dop = 1, max_query_mem = unlimited)  |
    |  2 | └─Hash Groupby         |          | 6      | 51218.50 | Group Key: lineitem.l_shipmode                                |
    |  3 |   └─Compute Scalar     |          | 1869   | 50191.00 |                                                               |
    |  4 |     └─Hash Join        |          | 1869   | 31000.00 | Join Cond: lineitem.l_orderkey = orders.o_orderkey            |
    |  5 |       ├─Table Scan     | lineitem | 2000   | 0.00     |                                                               |
    |  6 |       └─Table Scan     | orders   | 2000   | 0.00     |                                                               |
    +----+------------------------+----------+--------+----------+---------------------------------------------------------------+
  2. 开启性能分析功能。

    SET imci_analyze_query = ON;
  3. 执行SQL语句。

    SELECT l_shipmode, COUNT(*) FROM lineitem, orders WHERE l_orderkey = o_orderkey GROUP BY l_shipmode;

    执行结果如下:

    +------------+----------+
    | l_shipmode | COUNT(*) |
    +------------+----------+
    | REG AIR    |      283 |
    | SHIP       |      269 |
    | FOB        |      299 |
    | RAIL       |      289 |
    | TRUCK      |      314 |
    | MAIL       |      274 |
    | AIR        |      272 |
    +------------+----------+
    7 rows in set (0.05 sec)

    性能分析结果会存放在information_schema.imci_sql_profiling表中。在将参数imci_analyze_query的值设置为ON时,该表将存放最近一条使用列存索引执行的查询语句的性能分析信息,通过查询这张表,就可以获得执行的查询语句的性能信息。

  4. 查询SQL语句的性能信息。

    /*ROUTE_TO_LAST_USED*/SELECT * FROM information_schema.imci_sql_profiling;    
    说明

    /*ROUTE_TO_LAST_USED*/用于指示智能代理将查询路由到上一条语句执行的节点上。否则可能会因为路由不到查询执行的节点,导致查询不到结果或结果不正确。

    查询结果如下:

    +----+------------------------+----------+--------+----------+-------------------+--------------------------------------------------------------------------------------------------------+
    | ID | Operator               | Name     | A-Rows | A-Cost   | Execution Time(s) | Extra Info                                                                                             |
    +----+------------------------+----------+--------+----------+-------------------+--------------------------------------------------------------------------------------------------------+
    |  1 | Select Statement       |          |      0 | 52609.51 |                 0 | IMCI Execution Plan (max_dop = 1, real_dop = 1, max_query_mem = unlimited, real_query_mem = unlimited) |
    |  2 | └─Hash Groupby         |          |      7 |  52609.5 |             0.002 | Group Key: lineitem.l_shipmode                                                                         |
    |  3 |   └─Compute Scalar     |          |   2000 |    51501 |                 0 |                                                                                                        |
    |  4 |     └─Hash Join        |          |   2000 |    31000 |             0.007 | Join Cond: lineitem.l_orderkey = orders.o_orderkey                                                     |
    |  5 |       ├─Table Scan     | lineitem |   2000 |        0 |             0.001 |                                                                                                        |
    |  6 |       └─Table Scan     | orders   |   2000 |        0 |                 0 |                                                                                                        |
    +----+------------------------+----------+--------+----------+-------------------+--------------------------------------------------------------------------------------------------------+

    可以看到,Hash JoinHash Groupby对应的Extra Info信息中标明了对应的连接条件以及分组列,而第1行的Extra Info则展示了执行查询操作时使用的CPU和内存信息。

  5. (可选)您可以在information_schema.imci_sql_profiling表上执行聚合等操作。如查看该查询语句的执行时间等。

    /*ROUTE_TO_LAST_USED*/SELECT SUM(`Execution Time(s)`) AS TOTAL_TIME FROM information_schema.imci_sql_profiling;

    执行结果如下:

    +----------------------+
    | TOTAL_TIME           |
    +----------------------+
    | 0.010000000000000002 |
    +----------------------+
    1 row in set (0.00 sec)

    可以看到,该查询语句的执行耗时为10ms。

复杂查询示例

以在TPC-H SF100数据集上执行一个复杂查询为例,通过EXPLAIN和查询分析功能分析查询耗时分布以及性能。SQL语句如下:

SELECT
    c_name,
    sum(l_quantity)
FROM
    customer,
    orders,
    lineitem
WHERE
    o_orderkey IN (
        SELECT
            l_orderkey
        FROM
            lineitem
        WHERE 
            l_partkey > 18000000
    )
    AND c_custkey = o_custkey
    AND o_orderkey = l_orderkey
GROUP BY
    c_name
ORDER BY
    c_name
LIMIT 10;
  1. 通过EXPLAIN查看该查询语句的执行计划。

    EXPLAIN SELECT
        c_name,
        sum(l_quantity)
    FROM
        customer,
        orders,
        lineitem
    WHERE
        o_orderkey IN (
            SELECT
                l_orderkey
            FROM
                lineitem
            WHERE 
                l_partkey > 18000000
        )
        AND c_custkey = o_custkey
        AND o_orderkey = l_orderkey
    GROUP BY
        c_name
    ORDER BY
        c_name
    LIMIT 10;

    执行结果如下:

    +----+----------------------------------+----------+-----------+------------+---------------------------------------------------------------+
    | ID | Operator                         | Name     | E-Rows    | E-Cost     | Extra Info                                                    |
    +----+----------------------------------+----------+-----------+------------+---------------------------------------------------------------+
    |  1 | Select Statement                 |          |           |            | IMCI Execution Plan (max_dop = 32, max_query_mem = unlimited) |
    |  2 | └─Limit                          |          | 10        | 7935739.96 | Offset=0 Limit=10                                             |
    |  3 |   └─Sort                         |          | 10        | 7935739.96 | Sort Key: c_name ASC                                          |
    |  4 |     └─Hash Groupby               |          | 1503700   | 7933273.99 | Group Key: customer.C_NAME                                    |
    |  5 |       └─Hash Right Semi Join     |          | 54010545  | 7865930.26 | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY            |
    |  6 |         ├─Table Scan             | lineitem | 59785766  | 24001.52   | Cond: (L_PARTKEY > 18000000)                                  |
    |  7 |         └─Hash Join              |          | 538776190 | 5488090.33 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY            |
    |  8 |           ├─Hash Join            |          | 181006430 | 668535.99  | Join Cond: customer.C_CUSTKEY = orders.O_CUSTKEY              |
    |  9 |           │ ├─Table Scan         | customer | 15000000  | 600.00     |                                                               |
    | 10 |           │ └─Table Scan         | orders   | 150000000 | 6000.00    |                                                               |
    | 11 |           └─Table Scan           | lineitem | 600037902 | 24001.52   |                                                               |
    +----+----------------------------------+----------+-----------+------------+---------------------------------------------------------------+

    该SQL语句的执行代价为7935739。其中,ID为7的Hash Join会消耗大量CPU(其代价为5488090,占总代价约70%)。若要验证这条查询语句的性能问题,您可以开启性能分析功能,开启后再执行该SQL语句,并通过查看SQL语句的查询性能信息来分析具体原因。

  2. 开启性能分析功能。

    SET imci_analyze_query = ON;
  3. 执行SQL语句。

    SELECT
        c_name,
        sum(l_quantity)
    FROM
        customer,
        orders,
        lineitem
    WHERE
        o_orderkey IN (
            SELECT
                l_orderkey
            FROM
                lineitem
            WHERE 
                l_partkey > 18000000
        )
        AND c_custkey = o_custkey
        AND o_orderkey = l_orderkey
    GROUP BY
        c_name
    ORDER BY
        c_name
    LIMIT 10;

    执行结果如下:

    +--------------------+-----------------+
    | c_name             | sum(l_quantity) |
    +--------------------+-----------------+
    | Customer#000000001 |          172.00 |
    | Customer#000000002 |          663.00 |
    | Customer#000000004 |          174.00 |
    | Customer#000000005 |          488.00 |
    | Customer#000000007 |         1135.00 |
    | Customer#000000008 |          440.00 |
    | Customer#000000010 |          625.00 |
    | Customer#000000011 |          143.00 |
    | Customer#000000013 |         1032.00 |
    | Customer#000000014 |          564.00 |
    +--------------------+-----------------+
    10 rows in set (21.37 sec)
  4. 查询SQL语句的性能信息。

    /*ROUTE_TO_LAST_USED*/SELECT * FROM information_schema.imci_sql_profiling;    
    说明

    /*ROUTE_TO_LAST_USED*/用于指示智能代理将查询路由到上一条语句执行的节点上。否则可能会因为路由到错误节点而查询不到结果。

    查询结果如下:

    +----+----------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+
    | ID | Operator                         | Name     | A-Rows    | A-Cost     | Execution Time(s) | Extra Info                                         |
    +----+----------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+
    |  1 | Select Statement                 |          |         0 | 8336856.67 |                 0 |                                                    |
    |  2 | └─Limit                          |          |        10 | 8336856.67 |             0.002 | Offset=0 Limit=10                                  |
    |  3 |   └─Sort                         |          |         0 | 8336856.67 |             2.275 | Sort Key: c_name ASC                               |
    |  4 |     └─Hash Groupby               |          |   9813586 | 8320763.22 |           160.083 | Group Key: customer.C_NAME                         |
    |  5 |       └─Hash Right Semi Join     |          | 239598134 | 7994854.23 |            98.174 | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY |
    |  6 |         ├─Table Scan             | lineitem |  60013756 |   24001.52 |              3.28 | Cond: (L_PARTKEY > 18000000)                       |
    |  7 |         └─Hash Join              |          | 600037902 | 5156677.35 |           301.503 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY |
    |  8 |           ├─Hash Join            |          | 150000000 |  629777.96 |            97.201 | Join Cond: customer.C_CUSTKEY = orders.O_CUSTKEY   |
    |  9 |           │ ├─Table Scan         | customer |  15000000 |        600 |             3.321 |                                                    |
    | 10 |           │ └─Table Scan         | orders   | 150000000 |       6000 |             0.241 |                                                    |
    | 11 |           └─Table Scan           | lineitem | 600037902 |   24001.52 |             0.661 |                                                    |
    +----+----------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+

    根据查询的性能信息可以看出:Hash Join(7)耗费了约一半的时间。这是因为lineitem表与Hash Join(8)表的结果都很大,从而导致join的数据量很大,因此耗时很长。在Hash Join中,构建哈希表的开销通常远远大于哈希查找,因此只需要减小Join(8)的结果大小,就能有效减少查询的耗时。

    查看SQL语句和执行计划,若优化器将o_orderkey in (...)转换为半连接子查询,且能够将该条件下推,则能提高该SQL语句的查询性能。

  5. 开启基于代价的半连接下推功能。

    SET imci_optimizer_switch = 'semijoin_pushdown=on';
  6. 通过EXPLAIN查看该查询语句的执行计划。

    EXPLAIN SELECT
        c_name,
        sum(l_quantity)
    FROM
        customer,
        orders,
        lineitem
    WHERE
        o_orderkey IN (
            SELECT
                l_orderkey
            FROM
                lineitem
            WHERE 
                l_partkey > 18000000
        )
        AND c_custkey = o_custkey
        AND o_orderkey = l_orderkey
    GROUP BY
        c_name
    ORDER BY
        c_name
    LIMIT 10;

    执行结果如下:

    +----+----------------------------------------+----------+-----------+------------+---------------------------------------------------------------+
    | ID | Operator                               | Name     | E-Rows    | E-Cost     | Extra Info                                                    |
    +----+----------------------------------------+----------+-----------+------------+---------------------------------------------------------------+
    |  1 | Select Statement                       |          |           |            | IMCI Execution Plan (max_dop = 32, max_query_mem = unlimited) |
    |  2 | └─Limit                                |          | 10        | 2800433.74 | Offset=0 Limit=10                                             |
    |  3 |   └─Sort                               |          | 10        | 2800433.74 | Sort Key: c_name ASC                                          |
    |  4 |     └─Hash Groupby                     |          | 14567321  | 2776544.58 | Group Key: customer.C_NAME                                    |
    |  5 |       └─Hash Join                      |          | 57918330  | 2631846.75 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY            |
    |  6 |         ├─Hash Join                    |          | 14567321  | 1014041.92 | Join Cond: orders.O_CUSTKEY = customer.C_CUSTKEY              |
    |  7 |         │ ├─Hash Right Semi Join       |          | 12071937  | 906226.67  | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY            |
    |  8 |         │ │ ├─Table Scan               | lineitem | 59785766  | 24001.52   | Cond: (L_PARTKEY > 18000000)                                  |
    |  9 |         │ │ └─Table Scan               | orders   | 150000000 | 6000.00    |                                                               |
    | 10 |         │ └─Table Scan                 | customer | 15000000  | 600.00     |                                                               |
    | 11 |         └─Table Scan                   | lineitem | 600037902 | 24001.52   |                                                               |
    +----+----------------------------------------+----------+-----------+------------+---------------------------------------------------------------+

    可以看出,开启半连接下推功能后,查询总代价从7935739下降至2800433。

  7. 重新执行该SQL语句。

    SELECT
        c_name,
        sum(l_quantity)
    FROM
        customer,
        orders,
        lineitem
    WHERE
        o_orderkey IN (
            SELECT
                l_orderkey
            FROM
                lineitem
            WHERE 
                l_partkey > 18000000
        )
        AND c_custkey = o_custkey
        AND o_orderkey = l_orderkey
    GROUP BY
        c_name
    ORDER BY
        c_name
    LIMIT 10;

    执行结果如下:

    +--------------------+-----------------+
    | c_name             | sum(l_quantity) |
    +--------------------+-----------------+
    | Customer#000000001 |          172.00 |
    | Customer#000000002 |          663.00 |
    | Customer#000000004 |          174.00 |
    | Customer#000000005 |          488.00 |
    | Customer#000000007 |         1135.00 |
    | Customer#000000008 |          440.00 |
    | Customer#000000010 |          625.00 |
    | Customer#000000011 |          143.00 |
    | Customer#000000013 |         1032.00 |
    | Customer#000000014 |          564.00 |
    +--------------------+-----------------+
    10 rows in set (13.74 sec)

    可以看出,执行耗时降低了约40%。

  8. 查询SQL语句的性能信息。

    /*ROUTE_TO_LAST_USED*/SELECT * FROM information_schema.imci_sql_profiling;    

    执行结果如下:

    +----+----------------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+
    | ID | Operator                               | Name     | A-Rows    | A-Cost     | Execution Time(s) | Extra Info                                         |
    +----+----------------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+
    |  1 | Select Statement                       |          |         0 | 4318488.35 |                 0 |                                                    |
    |  2 | └─Limit                                |          |        10 | 4318488.35 |             0.002 | Offset=0 Limit=10                                  |
    |  3 |   └─Sort                               |          |         0 | 4318488.34 |             3.076 | Sort Key: c_name ASC                               |
    |  4 |     └─Hash Groupby                     |          |   9813586 | 4302394.89 |           163.149 | Group Key: customer.C_NAME                         |
    |  5 |       └─Hash Join                      |          | 239598134 | 3976485.91 |           151.253 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY |
    |  6 |         ├─Hash Join                    |          |  49393149 | 1321335.54 |            55.392 | Join Cond: orders.O_CUSTKEY = customer.C_CUSTKEY   |
    |  7 |         │ ├─Hash Right Semi Join       |          |  49393149 |  954805.16 |            52.552 | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY |
    |  8 |         │ │ ├─Table Scan               | lineitem |  60013756 |   24001.52 |             2.791 | Cond: (L_PARTKEY > 18000000)                       |
    |  9 |         │ │ └─Table Scan               | orders   | 150000000 |       6000 |             0.152 |                                                    |
    | 10 |         │ └─Table Scan                 | customer |  15000000 |        600 |             0.028 |                                                    |
    | 11 |         └─Table Scan                   | lineitem | 600037902 |   24001.52 |             0.642 |                                                    |
    +----+----------------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+

    从性能分析结果可以看出,半连接被下推,之前的大表Join被消除,查询耗时显著减少。