MaxCompute SQL示例解析

本文为您介绍MaxCompute SQL常见使用场景,让您快速掌握SQL的写法。

准备数据集

本文以emp表和dept表为示例数据集。您可以自行在MaxCompute项目上创建表并上传数据。具体如下:

  1. 创建表。

    • 创建emp表

      CREATE TABLE IF NOT EXISTS emp (
        EMPNO STRING,
        ENAME STRING,
        JOB STRING,
        MGR BIGINT,
        HIREDATE DATETIME,
        SAL DOUBLE,
        COMM DOUBLE,
        DEPTNO BIGINT);
    • 创建dept表

      CREATE TABLE IF NOT EXISTS dept (
        DEPTNO BIGINT,
        DNAME STRING,
        LOC STRING);
  2. 在本地新建emp.txtdept.txt文件,文件内容如下。

    emp.txt

    7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
    7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
    7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
    7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
    7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
    7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
    7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
    7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
    7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
    7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
    7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
    7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
    7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
    7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10

    dept.txt

    10,ACCOUNTING,NEW YORK
    20,RESEARCH,DALLAS
    30,SALES,CHICAGO
    40,OPERATIONS,BOSTON
  3. 进行数据导入,详情请参见数据传输服务(上传)场景与工具

SQL示例

  • 示例1:查询员工人数大于零的所有部门。

    为了避免数据量太大,此场景下建议您使用JOIN子句。

    SELECT d.* 
    FROM dept d 
    JOIN (
        SELECT DISTINCT deptno AS no
        FROM emp
    ) e 
    ON d.deptno = e.no;

    返回结果:

    +------------+------------+------------+
    | deptno     | dname      | loc        |
    +------------+------------+------------+
    | 10         | ACCOUNTING | NEW YORK   |
    | 20         | RESEARCH   | DALLAS     |
    | 30         | SALES      | CHICAGO    |
    +------------+------------+------------+
  • 示例2:查询薪金比SMITH高的所有员工。

    此场景为MAPJOIN的典型场景。

    SELECT /*+ MapJoin(a) */ e.empno
        , e.ename
        , e.sal 
    FROM emp e 
    JOIN (
        SELECT MAX(sal) AS sal
        FROM `emp`
        WHERE `ENAME` = 'SMITH'
    ) a 
    ON e.sal > a.sal;

    返回结果:

    +------------+------------+------------+
    | empno      | ename      | sal        |
    +------------+------------+------------+
    | 7499       | ALLEN      | 1600.0     |
    | 7521       | WARD       | 1250.0     |
    | 7566       | JONES      | 2975.0     |
    | 7654       | MARTIN     | 1250.0     |
    | 7698       | BLAKE      | 2850.0     |
    | 7782       | CLARK      | 2450.0     |
    | 7788       | SCOTT      | 3000.0     |
    | 7839       | KING       | 5000.0     |
    | 7844       | TURNER     | 1500.0     |
    | 7876       | ADAMS      | 1100.0     |
    | 7900       | JAMES      | 950.0      |
    | 7902       | FORD       | 3000.0     |
    | 7934       | MILLER     | 1300.0     |
    +------------+------------+------------+
  • 示例3:查询所有员工的姓名及其直接上级的姓名。

    此场景为等值连接。

    SELECT a.ename
        , b.ename 
    FROM emp a 
    LEFT OUTER JOIN emp b 
    ON b.empno = a.mgr;

    返回结果:

    +------------+------------+
    | ename      | ename2     |
    +------------+------------+
    | SMITH      | FORD       |
    | ALLEN      | BLAKE      |
    | WARD       | BLAKE      |
    | JONES      | KING       |
    | MARTIN     | BLAKE      |
    | BLAKE      | KING       |
    | CLARK      | KING       |
    | SCOTT      | JONES      |
    | KING       | NULL       |
    | TURNER     | BLAKE      |
    | ADAMS      | SCOTT      |
    | JAMES      | BLAKE      |
    | FORD       | JONES      |
    | MILLER     | CLARK      |
    +------------+------------+
  • 示例4:查询基本薪金大于1500的所有工作。

    此场景下需要使用HAVING子句。

    SELECT emp.`JOB`
        , MIN(emp.sal) AS sal 
    FROM `emp`
    GROUP BY emp.`JOB`
    HAVING MIN(emp.sal) > 1500;

    返回结果:

    +------------+------------+
    | job        | sal        |
    +------------+------------+
    | MANAGER    | 2450.0     |
    | ANALYST    | 3000.0     |
    | PRESIDENT  | 5000.0     |
    +------------+------------+
  • 示例5:查询在每个部门工作的员工数量、平均工资和平均服务期限。

    此场景为使用内建函数的典型场景。

    SELECT COUNT(empno) AS cnt_emp
        , ROUND(AVG(sal), 2) AS avg_sal
        , ROUND(AVG(datediff(getdate(), hiredate, 'dd')), 2) AS avg_hire 
    FROM `emp`
    GROUP BY `DEPTNO`;

    返回结果:

    +------------+------------+------------+
    | cnt_emp    | avg_sal    | avg_hire   |
    +------------+------------+------------+
    | 5          | 2175.0     | 14886.2    |
    | 6          | 1566.67    | 15715.33   |
    | 3          | 2916.67    | 15606.33   |
    +------------+------------+------------+
  • 示例6:查询每个部门的薪水前3名的人员的姓名以及其排序。

    此场景为典型的Top N场景。

    SELECT * 
    FROM (
      SELECT deptno
        , ename
        , sal
        , ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC) AS nums 
      FROM emp 
    ) emp1 
    WHERE emp1.nums < 4;

    返回结果:

    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | nums       |
    +------------+------------+------------+------------+
    | 10         | KING       | 5000.0     | 1          |
    | 10         | CLARK      | 2450.0     | 2          |
    | 10         | MILLER     | 1300.0     | 3          |
    | 20         | SCOTT      | 3000.0     | 1          |
    | 20         | FORD       | 3000.0     | 2          |
    | 20         | JONES      | 2975.0     | 3          |
    | 30         | BLAKE      | 2850.0     | 1          |
    | 30         | ALLEN      | 1600.0     | 2          |
    | 30         | TURNER     | 1500.0     | 3          |
    +------------+------------+------------+------------+
  • 示例7:查询每个部门的人数以及该部门中办事员(CLERK)人数的占比。

    SELECT deptno
        , COUNT(empno) AS cnt
        , ROUND(SUM(CASE 
          WHEN job = 'CLERK' THEN 1
          ELSE 0
        END) / COUNT(empno), 2) AS rate 
    FROM `EMP`
    GROUP BY deptno;

    返回结果:

    +------------+------------+------------+
    | deptno     | cnt        | rate       |
    +------------+------------+------------+
    | 20         | 5          | 0.4        |
    | 30         | 6          | 0.17       |
    | 10         | 3          | 0.33       |
    +------------+------------+------------+

注意事项

  • 使用GROUP BY时,SELECT部分必须是分组项或聚合函数。

  • ORDER BY后面必须加LIMIT N。

  • SELECT表达式中不能用子查询,可以改写为JOIN。

  • JOIN不支持笛卡尔积,可以使用MAPJOIN替代。

  • UNION All需要改成子查询的格式。

  • IN/NOT IN语句对应的子查询只能有一列,而且返回的行数不能超过1000,否则也需要改成JOIN操作执行。