Manually adjust the join order

更新时间:
复制 MD 格式

This topic describes how to manually adjust the join order using hints.

Function introduction

AnalyticDB for MySQL supports complex join queries and automatically adjusts the join order by default. However, query statements and table filter conditions can change. If data characteristics are complex, the automatic adjustment may not select the optimal join order. A suboptimal join order can lead to large intermediate result sets and high memory usage, which degrades query performance.

To address these issues, AnalyticDB for MySQL lets you use the /*+ reorder_joins*/ hint to enable or disable the join order adjustment feature:

  • /*+ reorder_joins=true*/: Enables automatic join order adjustment. AnalyticDB for MySQL enables this feature by default, so you do not need to specify this hint when you execute SQL queries.

  • /*+ reorder_joins=false*/: Disables automatic join order adjustment. This lets you manually adjust the join order based on the data characteristics of the query. The query is then executed using the join order specified in the SQL statement.

Note

/*+ reorder_joins*/ is a session-level hint. It affects only the SQL query statement in which it is specified.

Adjustment method

  • Before adjustment

    • Query statement

      The original Query10 statement is as follows:

      Note
      • This topic uses Query10 from TPC-H as an example to demonstrate the method and effect of manually adjusting the join order. For more information about TPC-H, see TPC-H.

      • The automatic join order adjustment feature of AnalyticDB for MySQL is enabled by default. The /*+ reorder_joins=false*/ hint is used in the following query statement to simulate a scenario with a suboptimal join order.

      SELECT   c_custkey,
               c_name,
               Sum(l_extendedprice * (1 - l_discount)) AS revenue,
               c_acctbal,
               n_name,
               c_address,
               c_phone,
               c_comment
      FROM     customer c,
               orders o,
               lineitem l,
               nation n
      WHERE    c_custkey = o_custkey
      AND      l_orderkey = o_orderkey
      AND      o_orderdate >= date '1993-10-01'
      AND      o_orderdate <  date '1993-10-01' + INTERVAL '3' month
      AND      l_returnflag = 'R'
      AND      c_nationkey = n_nationkey
      GROUP BY c_custkey,
               c_name,
               c_acctbal,
               c_phone,
               n_name,
               c_address,
               c_comment
      ORDER BY revenue DESC
      LIMIT    20;
    • Join order

      Based on the preceding SQL statement, the join order is:

      customer JOIN orders JOIN lineitem JOIN nation;
    • Query results

      The intermediate results of each join in the execution plan are as follows:

      Note

      For more information about how to view an execution plan, see Analyze a query using an execution plan.

      1. The customer table is joined with the orders table. The join outputs 57,069 rows to the intermediate result set tmp1.

      2. The intermediate result set tmp1 is joined with the lineitem table using an INNER (Hash) Join on the condition o_orderkey = l_orderkey. This operation takes 4.88 ms (21.67%) and outputs 114,705 rows to the new intermediate result set tmp2.

      3. The intermediate result set tmp2 is joined with the nation table. The join outputs 114,705 rows as the final result. The join node is of the INNER (Hash) type. The join condition is left.c_nationkey = right.n_nationkey. The operation takes 5.71 ms (12.07%).

      The cumulative number of rows output by the three joins is: 57,069 + 114,705 + 114,705 = 286,479.

  • After adjustment

    • Query statement

      You can add the /*+ reorder_joins=false*/ hint to the SQL statement to disable the automatic join order adjustment feature of AnalyticDB for MySQL and then manually adjust the join order. The SQL statement with the adjusted order is as follows:

      /*reorder_joins=false*/
      SELECT   c_custkey,
               c_name,
               Sum(l_extendedprice * (1 - l_discount)) AS revenue,
               c_acctbal,
               n_name,
               c_address,
               c_phone,
               c_comment
      FROM     customer c,
               orders o,
               nation n,
               lineitem l
      WHERE    c_custkey = o_custkey
      AND      c_nationkey = n_nationkey
      AND      l_orderkey = o_orderkey
      AND      o_orderdate >= date '1993-10-01'
      AND      o_orderdate <  date '1993-10-01' + INTERVAL '3' month
      AND      l_returnflag = 'R'
      GROUP BY c_custkey,
               c_name,
               c_acctbal,
               c_phone,
               n_name,
               c_address,
               c_comment
      ORDER BY revenue DESC
      LIMIT    20;
    • Join order

      Based on the preceding SQL statement, the join order is:

      customer JOIN orders JOIN nation JOIN lineitem
    • Query results

      The intermediate results of each join in the execution plan are as follows:

      Note

      For more information about how to view an execution plan, see Analyze a query using an execution plan.

      1. The customer table is joined with the orders table. The join outputs 57,069 rows to the intermediate result set tmp1. The join node is of the INNER (Hash) type. The join condition is c_custkey = o_custkey. The operation takes 4.75 ms (36.7%).

      2. The intermediate result set tmp1 is joined with the nation table. The join outputs 57,069 rows to the intermediate result set tmp2. The join node is of the INNER (Hash) type. The join condition is left.c_nationkey = right.n_nationkey. The operation takes 5.63 ms (25.61%).

      3. The intermediate result set tmp2 is joined with the lineitem table using an INNER (Hash) join on the condition left.o_orderkey = right.l_orderkey. The join outputs 114,705 rows as the final result and takes 5.29 ms (9.81% of the total time).

      The cumulative number of rows output by the three joins is: 57,069 + 57,069 + 114,705 = 228,843.

      This is a 20% reduction in the number of output rows compared with the 286,479 rows from before the adjustment. This comparison shows that different join orders can affect the size of intermediate result sets. If you find that the join order in AnalyticDB for MySQL is not optimal, you can manually adjust it to improve query performance.