通过将Left join改写为Right join解决查询慢与内存消耗问题-云原生数据仓库AnalyticDB-阿里云

Left join是实践中常用的一种表关联方式，由于Hash Join实现会以右表做Build，且left Join不会做左右表的重新排序，在右表数据量很大时会造成执行慢、消耗过多内存资源等多个问题。本文以具体示例介绍哪些场景下可以用right join替代left join。

背景信息

AnalyticDB MySQL版默认使用Hash Join进行表关联。Hash Join在实现时会用右表构建哈希表，该过程会消耗大量资源，由于outer join（包括left join，right join）不同于inner join，从语义上不能交换左右表顺序，因此在右表数据量大的场景下，会出现执行慢、内存资源消耗大的情况，在极端场景下（右表数据量很大）还会影响集群的性能，或执行时直接报错Out of Memory Pool size pre cal。此时，可以使用本章节提供的优化方法来减少资源消耗。

使用场景

通过修改SQL语句或者加Hint的方式，可以将left join调整为right join，原left join中的左表会变为右表来构建哈希表。这时如果右表过大也会对性能有影响，因此，建议在left join左表较小，右表较大的场景下进行优化。

较小、很大的概念是相对的，和关联列、集群资源等都有关系。在实践中，我们可以通过Explain analyze查看执行计划的相关参数，通过关注PeakMemory、WallTime等参数的变化来判断是否应该使用right join。

使用方法

通常有以下两种方法可以把left join调整为right join：

直接修改SQL，例如将a left join b on a.col1 = b.col2改为b right join a on a.col1 = b.col2。
通过加hint指定优化器根据资源损耗把left join转为right join。这种用法中，优化器会根据左右表的估算大小来决定是否把left join转为right join。使用方法如下：
- 3.1.8及以上内核版本的集群默认开启该特性。如关闭了该特性，可在SQL最前面加上hint：/*+O_CBO_RULE_SWAP_OUTER_JOIN=true*/手动开启该特性。
- 3.1.8以下内核版本的集群默认关闭该特性。可在SQL最前面加上hint：/*+LEFT_TO_RIGHT_ENABLED=true*/开启该特性。

说明如何查看集群内核版本，请参见如何查看实例版本信息。如需升级内核版本，请联系技术支持。

示例

如下示例中，nation是一个25行的小表，customer是一个15000000行的大表，通过explain analyze查看一条包含left join的SQL的执行计划。

explain analyze
SELECT
  COUNT(*)
FROM
  nation t1
  left JOIN customer t2 ON t1.n_nationkey = t2.c_nationkey

可以看到，进行join计算的stage2的计划如下。其中，Left Join这个算子中包含如下信息：

PeakMemory: 515MB (93.68%), WallTime: 4.34s (43.05%)：PeakMemory的占比高达93.68%，可以判断left join为整个SQL的性能瓶颈。
Left (probe) Input avg.: 0.52 rows；Right (build) Input avg.: 312500.00 rows：即右表为大表，左表为小表。

这种场景下，我们可以将left join转为right join，来优化这条SQL语句。

Fragment 2 [HASH]
    Output: 48 rows (432B), PeakMemory: 516MB, WallTime: 6.52us, Input: 15000025 rows (200.27MB); per task: avg.: 2500004.17 std.dev.: 2410891.74
    Output layout: [count_0_2]
    Output partitioning: SINGLE []
    Aggregate(PARTIAL)
    │   Outputs: [count_0_2:bigint]
    │   Estimates: {rows: ? (?)}
    │   Output: 96 rows (864B), PeakMemory: 96B (0.00%), WallTime: 88.21ms (0.88%)
    │   count_2 := count(*)
    └─ LEFT Join[(`n_nationkey` = `c_nationkey`)][$hashvalue, $hashvalue_0_4]
       │   Outputs: []
       │   Estimates: {rows: 15000000 (0B)}
       │   Output: 30000000 rows (200.27MB), PeakMemory: 515MB (93.68%), WallTime: 4.34s (43.05%)
       │   Left (probe) Input avg.: 0.52 rows, Input std.dev.: 379.96%
       │   Right (build) Input avg.: 312500.00 rows, Input std.dev.: 380.00%
       │   Distribution: PARTITIONED
       ├─ RemoteSource[3]
       │      Outputs: [n_nationkey:integer, $hashvalue:bigint]
       │      Estimates: 
       │      Output: 25 rows (350B), PeakMemory: 64KB (0.01%), WallTime: 63.63us (0.00%)
       │      Input avg.: 0.52 rows, Input std.dev.: 379.96%
       └─ LocalExchange[HASH][$hashvalue_0_4] ("c_nationkey")
          │   Outputs: [c_nationkey:integer, $hashvalue_0_4:bigint]
          │   Estimates: {rows: 15000000 (57.22MB)}
          │   Output: 30000000 rows (400.54MB), PeakMemory: 10MB (1.84%), WallTime: 1.81s (17.93%)
          └─ RemoteSource[4]
                 Outputs: [c_nationkey:integer, $hashvalue_0_5:bigint]
                 Estimates: 
                 Output: 15000000 rows (200.27MB), PeakMemory: 3MB (0.67%), WallTime: 191.32ms (1.90%)
                 Input avg.: 312500.00 rows, Input std.dev.: 380.00%

通过修改SQL的方式实现left join to right join：

SELECT
  COUNT(*)
FROM
  customer t2
  right JOIN nation t1 ON t1.n_nationkey = t2.c_nationkey

通过加Hint的方式实现left join to right join：

3.1.8及以上内核版本的集群执行以下语句开启该特性：

/*+O_CBO_RULE_SWAP_OUTER_JOIN=true*/
SELECT
  COUNT(*)
FROM
  nation t1
  left JOIN customer t2 ON t1.n_nationkey = t2.c_nationkey

3.1.8以下内核版本的集群执行以下语句开启该特性：

/*+LEFT_TO_RIGHT_ENABLED=true*/
SELECT
  COUNT(*)
FROM
  nation t1
  left JOIN customer t2 ON t1.n_nationkey = t2.c_nationkey

上述任意一种SQL，执行explain analyze后可以看到，在执行计划中，left Join变为了right Join，可以判断Hint是生效的。并且调整后PeakMemory的值为889 KB (3.31%)，从515 MB下降到889 KB，已经不是计算热点。

Fragment 2 [HASH]
    Output: 96 rows (864B), PeakMemory: 12MB, WallTime: 4.27us, Input: 15000025 rows (200.27MB); per task: avg.: 2500004.17 std.dev.: 2410891.74
    Output layout: [count_0_2]
    Output partitioning: SINGLE []
    Aggregate(PARTIAL)
    │   Outputs: [count_0_2:bigint]
    │   Estimates: {rows: ? (?)}
    │   Output: 192 rows (1.69kB), PeakMemory: 456B (0.00%), WallTime: 5.31ms (0.08%)
    │   count_2 := count(*)
    └─ RIGHT Join[(`c_nationkey` = `n_nationkey`)][$hashvalue, $hashvalue_0_4]
       │   Outputs: []
       │   Estimates: {rows: 15000000 (0B)}
       │   Output: 15000025 rows (350B), PeakMemory: 889KB (3.31%), WallTime: 3.15s (48.66%)
       │   Left (probe) Input avg.: 312500.00 rows, Input std.dev.: 380.00%
       │   Right (build) Input avg.: 0.52 rows, Input std.dev.: 379.96%
       │   Distribution: PARTITIONED
       ├─ RemoteSource[3]
       │      Outputs: [c_nationkey:integer, $hashvalue:bigint]
       │      Estimates: 
       │      Output: 15000000 rows (200.27MB), PeakMemory: 3MB (15.07%), WallTime: 634.81ms (9.81%)
       │      Input avg.: 312500.00 rows, Input std.dev.: 380.00%
       └─ LocalExchange[HASH][$hashvalue_0_4] ("n_nationkey")
          │   Outputs: [n_nationkey:integer, $hashvalue_0_4:bigint]
          │   Estimates: {rows: 25 (100B)}
          │   Output: 50 rows (700B), PeakMemory: 461KB (1.71%), WallTime: 942.37us (0.01%)
          └─ RemoteSource[4]
                 Outputs: [n_nationkey:integer, $hashvalue_0_5:bigint]
                 Estimates: 
                 Output: 25 rows (350B), PeakMemory: 64KB (0.24%), WallTime: 76.34us (0.00%)
                 Input avg.: 0.52 rows, Input std.dev.: 379.96%