在实际开发过程中,通常需要分析查询语句或表结构来分析性能瓶颈,MaxCompute SQL为您提供explain
语句实现此功能。本文为您介绍explain
的功能、命令格式及使用示例。
功能介绍
explain
语句可以显示MaxCompute SQL对应的DML语句执行计划(执行SQL语义的程序)的结构,帮助您了解SQL语句的处理过程,为优化SQL语句提供帮助。一个查询语句作业会对应多个Job,一个Job对应多个Task。
说明 如果查询语句足够复杂,
explain
的结果较多,则会触发API的限制,无法得到完整的EXPLAIN结果。此时您可以拆分查询语句,对各部分分别执行explain
语句,以了解Job的结构。命令格式
explain <dml query>;
dml query:必填。select
语句,更多信息请参见SELECT语法。
返回说明
explain
的执行结果包含如下信息: - Job间的依赖关系
例如
job0 is root job
。如果查询只需要一个Job(job0
),只会显示一行信息。 - Task间的依赖关系
In Job job0: root Tasks: M1_Stg1, M2_Stg1 J3_1_2_Stg1 depends on: M1_Stg1, M2_Stg1
job0
包含三个Task,M1_Stg1
、M2_Stg1
和J3_1_2_Stg1
。系统会先执行M1_Stg1
和M2_Stg1
两个Task,执行完成后,再执行J3_1_2_Stg1
。Task的命名规则如下:- 在MaxCompute中,共有四种Task类型:MapTask、ReduceTask、JoinTask和LocalWork。Task名称的第一个字母表示了当前Task的类型,例如
M2Stg1
就是一个MapTask。 - 紧跟着第一个字母后的数字,代表了当前Task的ID。这个ID在当前查询对应的所有Task中是唯一的。
- 用下划线(_)分隔的数字代表当前Task的直接依赖,例如
J3_1_2_Stg1
表示当前Task ID为3,依赖ID为1(M1_Stg1)和ID为2(M2_Stg1)的两个Task。
- 在MaxCompute中,共有四种Task类型:MapTask、ReduceTask、JoinTask和LocalWork。Task名称的第一个字母表示了当前Task的类型,例如
- Task中所有Operator的依赖结构。Operator串描述了一个Task的执行语义。结构示例如下:
In Task M2: Data source: mf_mc_bj.sale_detail_jt/sale_date=2013/region=china # "Data source"描述了当前Task的输入内容。 TS: mf_mc_bj.sale_detail_jt/sale_date=2013/region=china # TableScanOperator FIL: ISNOTNULL(customer_id) # FilterOperator RS: order: + # ReduceSinkOperator nullDirection: * optimizeOrderBy: False valueDestLimit: 0 dist: HASH keys: customer_id values: customer_id (string) total_price (double) partitions: customer_id In Task J3_1_2: JOIN: # JoinOperator StreamLineRead1 INNERJOIN StreamLineRead2 keys: 0:customer_id 1:customer_id AGGREGATE: group by:customer_id # GroupByOperator UDAF: SUM(total_price) (__agg_0_sum)[Complete],SUM(total_price) (__agg_1_sum)[Complete] RS: order: + nullDirection: * optimizeOrderBy: True valueDestLimit: 10 dist: HASH keys: customer_id values: customer_id (string) __agg_0 (double) __agg_1 (double) partitions: In Task R4_3: SEL: customer_id,__agg_0,__agg_1 # SelectOperator LIM:limit 10 # LimitOperator FS: output: Screen # FileSinkOperator schema: customer_id (string) AS ashop __agg_0 (double) AS ap __agg_1 (double) AS bp
各Operator的含义如下:- TableScanOperator(TS):描述查询语句中的
from
语句块的逻辑。explain
结果中会显示输入表的名称(Alias)。 - SelectOperator(SEL):描述查询语句中的
select
语句块的逻辑。explain
结果中会显示向下一个Operator传递的列,多个列由逗号分隔。- 如果是列的引用,则显示为
<alias>.<column_name>
。 - 如果是表达式的结果,则显示为函数形式,例如
func1(arg1_1, arg1_2, func2(arg2_1, arg2_2))
。 - 如果是常量,则直接显示常量值。
- 如果是列的引用,则显示为
- FilterOperator(FIL):描述查询语句中的
where
语句块的逻辑。explain
结果中会显示一个where
条件表达式,形式类似SelectOperator的显示规则。 - JoinOperator(JOIN):描述查询语句中的
join
语句块的逻辑。explain
结果中会显示哪些表以哪种方式Join在一起。 - GroupByOperator(例如AGGREGATE):描述聚合操作的逻辑。如果查询中使用了聚合函数,就会出现该结构,
explain
结果中会显示聚合函数的内容。 - ReduceSinkOperator(RS):描述Task间数据分发操作的逻辑。如果当前Task的结果会传递给另一个Task,则必然需要在当前Task的最后,使用ReduceSinkOperator执行数据分发操作。
explain
的结果中会显示输出结果的排序方式、分发的Key、Value以及用来求Hash值的列。 - FileSinkOperator(FS):描述最终数据的存储操作。如果查询中有
insert
语句块,explain
结果中会显示目标表名称。 - LimitOperator(LIM):描述查询语句中的
limit
语句块的逻辑。explain
结果中会显示limit
数。 - MapjoinOperator(HASHJOIN):类似JoinOperator,描述大表的
join
操作。
- TableScanOperator(TS):描述查询语句中的
示例数据
为便于理解,本文为您提供源数据,基于源数据提供相关示例。创建表sale_detail和sale_detail_jt,并添加数据,命令示例如下:
--创建分区表sale_detail和sale_detail_jt。
create table if not exists sale_detail
(
shop_name string,
customer_id string,
total_price double
)
partitioned by (sale_date string, region string);
create table if not exists sale_detail_jt
(
shop_name string,
customer_id string,
total_price double
)
partitioned by (sale_date string, region string);
--向源表增加分区。
alter table sale_detail add partition (sale_date='2013', region='china') partition (sale_date='2014', region='shanghai');
alter table sale_detail_jt add partition (sale_date='2013', region='china');
--向源表追加数据。
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
insert into sale_detail_jt partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s5','c2',100.2);
查询表sale_detail和sale_detail_jt中的数据,命令示例如下:
select * from sale_detail;
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s3 | c3 | 100.3 | 2013 | china |
| null | c5 | NULL | 2014 | shanghai |
| s6 | c6 | 100.4 | 2014 | shanghai |
| s7 | c7 | 100.5 | 2014 | shanghai |
+------------+-------------+-------------+------------+------------+
select * from sale_detail_jt;
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s5 | c2 | 100.2 | 2013 | china |
+------------+-------------+-------------+------------+------------+
--创建做关联的表。
create table shop as select shop_name, customer_id, total_price from sale_detail;
使用示例
基于示例数据,执行命令如下。
--查询语句。
select a.customer_id as ashop, sum(a.total_price) as ap,count(b.total_price) as bp
from (select * from sale_detail_jt where sale_date='2013' and region='china') a
inner join (select * from sale_detail where sale_date='2013' and region='china') b
on a.customer_id=b.customer_id
group by a.customer_id
order by a.customer_id
limit 10;
--获取查询语句语义。
explain
select a.customer_id as ashop, sum(a.total_price) as ap,count(b.total_price) as bp
from (select * from sale_detail_jt where sale_date='2013' and region='china') a
inner join (select * from sale_detail where sale_date='2013' and region='china') b
on a.customer_id=b.customer_id
group by a.customer_id
order by a.customer_id
limit 10;
返回结果如下。job0 is root job
In Job job0:
root Tasks: M1
In Task M1_U0:
TS: doc_test_dev.sale_detail_jt/sale_date=2013/region=china
FIL: ISNOTNULL(customer_id)
HASHJOIN:
Filter1 INNERJOIN Filter2
keys:
0:customer_id
1:customer_id
non-equals:
0:
1:
bigTable: Filter1
LocalSortBy: order: +
nullDirection: *
keys:customer_id
AGGREGATE: group by:customer_id
UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
LIM:limit 10
FS: output: Screen
schema:
customer_id (string) AS ashop
__agg_0 (double) AS ap
__agg_1 (bigint) AS bp
In Task M1_U1:
TS: doc_test_dev.sale_detail/sale_date=2013/region=china
FIL: ISNOTNULL(customer_id)
HASHJOIN:
Filter1 INNERJOIN Filter2
keys:
0:customer_id
1:customer_id
non-equals:
0:
1:
bigTable: Filter1
LocalSortBy: order: +
nullDirection: *
keys:customer_id
AGGREGATE: group by:customer_id
UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
LIM:limit 10
FS: output: Screen
schema:
customer_id (string) AS ashop
__agg_0 (double) AS ap
__agg_1 (bigint) AS bp
基于示例数据,执行命令如下。
--查询语句。
select /*+ mapjoin(a) */
a.customer_id as ashop, sum(a.total_price) as ap,count(b.total_price) as bp
from (select * from sale_detail_jt
where sale_date='2013' and region='china') a
inner join (select * from sale_detail where sale_date='2013' and region='china') b
on a.total_price<b.total_price
group by a.customer_id
order by a.customer_id
limit 10;
--获取查询语句语义。
explain
select /*+ mapjoin(a) */
a.customer_id as ashop, sum(a.total_price) as ap,count(b.total_price) as bp
from (select * from sale_detail_jt
where sale_date='2013' and region='china') a
inner join (select * from sale_detail where sale_date='2013' and region='china') b
on a.total_price<b.total_price
group by a.customer_id
order by a.customer_id
limit 10;
返回结果如下。job0 is root job
In Job job0:
root Tasks: M1
In Task M1_U0:
TS: doc_test_dev.sale_detail_jt/sale_date=2013/region=china
HASHJOIN:
TableScan1 INNERJOIN TableScan2
keys:
0:
1:
non-equals:
0:
1:
bigTable: TableScan2
FIL: LT(total_price,total_price)
LocalSortBy: order: +
nullDirection: *
keys:customer_id
AGGREGATE: group by:customer_id
UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
LIM:limit 10
FS: output: Screen
schema:
customer_id (string) AS ashop
__agg_0 (double) AS ap
__agg_1 (bigint) AS bp
In Task M1_U1:
TS: doc_test_dev.sale_detail/sale_date=2013/region=china
HASHJOIN:
TableScan1 INNERJOIN TableScan2
keys:
0:
1:
non-equals:
0:
1:
bigTable: TableScan2
FIL: LT(total_price,total_price)
LocalSortBy: order: +
nullDirection: *
keys:customer_id
AGGREGATE: group by:customer_id
UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
LIM:limit 10
FS: output: Screen
schema:
customer_id (string) AS ashop
__agg_0 (double) AS ap
__agg_1 (bigint) AS bp