对于按照select
语法格式书写的select
语句,它的逻辑执行顺序与标准的书写语序并不相同。本文为您介绍select
语句中操作的执行语序并提供使用示例。
执行语序
在SELECT语法中,涉及的操作主要包括:
select
from
where
group by
having
order by
distribute by
sort by
limit
基于order by
不和distribute by
、sort by
同时使用,group by
也不和distribute by
、sort by
同时使用的限制,常见select
语句的执行顺序如下:
场景1:
from
->where
->group by
->having
->select
->order by
->limit
场景2:
from
->where
->select
->distribute by
->sort by
为避免混淆,MaxCompute支持以执行顺序书写查询语句,语法结构可改为如下形式:
from <table_reference>
[where <where_condition>]
[group by <col_list>]
[having <having_condition>]
select [all | distinct] <select_expr>, <select_expr>, ...
[order by <order_condition>]
[distribute by <distribute_condition> [sort by <sort_condition>] ]
[limit <number>]
示例数据
为便于理解,本文为您提供源数据,基于源数据提供相关示例。创建表sale_detail,并添加数据,命令示例如下:
--创建一张分区表sale_detail。
create table if not exists sale_detail
(
shop_name string,
customer_id string,
total_price double
)
partitioned by (sale_date string, region string);
--向源表增加分区。
alter table sale_detail add partition (sale_date='2013', region='china') partition (sale_date='2014', region='shanghai');
--向源表追加数据。
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
查询分区表sale_detail中的数据,命令示例如下:
set odps.sql.allow.fullscan=true;
select * from sale_detail;
--返回结果。
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s3 | c3 | 100.3 | 2013 | china |
| null | c5 | NULL | 2014 | shanghai |
| s6 | c6 | 100.4 | 2014 | shanghai |
| s7 | c7 | 100.5 | 2014 | shanghai |
+------------+-------------+-------------+------------+------------+
使用示例
示例1,符合场景1的命令示例如下:
说明使用以下命令查询分区表时,您需要在命令前添加
set odps.sql.allow.fullscan=true;
打开全表扫描或者在命令语句中指定分区。--按照select语法书写。 set odps.sql.allow.fullscan=true; select region,max(total_price) from sale_detail where total_price > 100 group by region having sum(total_price)>300.5 order by region limit 5; --按照执行顺序书写。与上一写法等效。 from sale_detail where total_price > 100 group by region having sum(total_price)>300.5 select region,max(total_price) order by region limit 5;
返回结果如下:
+------------+------------+ | region | _c1 | +------------+------------+ | china | 100.3 | +------------+------------+
该命令的执行逻辑如下:
从sale_detail表(
from sale_detail
)中取出满足where total_price > 100
条件的数据。对于a中得到的结果按照region进行分组(
group by region
)。对于b中得到的结果筛选分组中满足total_price之和大于305的数据(
having sum(total_price)>305
)。对于c中得到的结果
select region,max(total_price)
。对于d中得到的结果按照region进行排序(
order by region
)。对于e中得到的结果仅显示前5条数据(
limit 5
)。
示例2,符合场景2的命令示例如下:
--按照select语法书写。 set odps.sql.allow.fullscan=true; select shop_name ,total_price ,region from sale_detail where total_price > 100.2 distribute by region sort by total_price; --按照执行顺序书写。与上一写法等效。 from sale_detail where total_price > 100.2 select shop_name ,total_price ,region distribute by region sort by total_price;
返回结果如下:
+------------+-------------+------------+ | shop_name | total_price | region | +------------+-------------+------------+ | s3 | 100.3 | china | | s6 | 100.4 | shanghai | | s7 | 100.5 | shanghai | +------------+-------------+------------+
该命令的执行逻辑如下:
从sale_detail表(
from sale_detail
)中取出满足where total_price > 100.2
条件的数据。对于a中得到的结果
select shop_name, total_price, region
。对于b中得到的结果按照region进行哈希分片(
distribute by region
)。对于c中得到的结果按照total_price进行升序排列(
sort by total_price
)。