全部产品

APPROX_COUNT_DISTINCT

更新时间:2020-03-23 20:04:51

APPROX_COUNT_DISTINCT 函数是聚合函数,它对某一列去重后的行数进行计算,结果只能返回一个值,且该值是近似值,该函数可以进一步用于计算被引用的列的选择性。

与函数 COUNT(DISTINCT x) 相比,APPROX_COUNT_DISTINCT 返回的是近似值,所以计算速度极快。在处理大量级的数据时 COUNT(DISTINCT x) 经常要花费很长的时间,使用 APPROX_COUNT_DISTINCT 牺牲了少量的精确度,却换来了计算效率的极大提升。

语法

  1. APPROX_COUNT_DISTINCT(expr)

参数

参数 说明
expr 是数值类型(NUMBERFLOATBINARY_FLOATBINARY_DOUBLE)或者可以转换成数值类型的表达式。

返回类型

返回类型与参数 expr 的数据类型相同。

示例

以下语句创建了表 employees,并向里面插入数据:

  1. CREATE TABLE employees (manager_id INT,last_name varchar(50),hiredate varchar(50),SALARY INT);
  2. INSERT INTO employees VALUES(100, 'Raphaely', '2017-07-01', 1700);
  3. INSERT INTO employees VALUES(100, 'De Haan', '2018-05-01',11000);
  4. INSERT INTO employees VALUES(100, 'Errazuriz', '2017-07-21', 1400);
  5. INSERT INTO employees VALUES(100, 'Hartstein', '2019-05-01',14000);
  6. INSERT INTO employees VALUES(100, 'Raphaely', '2017-07-22', 1700);
  7. INSERT INTO employees VALUES(100, 'Weiss', '2019-07-11',13500);
  8. INSERT INTO employees VALUES(100, 'Russell', '2019-10-05', 13000);
  9. INSERT INTO employees VALUES(100, 'Partners', '2018-12-01',14000);
  10. INSERT INTO employees VALUES(200, 'Ross', '2019-06-11',13500);
  11. INSERT INTO employees VALUES(200, 'Bell', '2019-05-25', 13000);
  12. INSERT INTO employees VALUES(200, 'Part', '2018-08-11',14000);
  13. COMMIT;

执行以下语句:

  1. SELECT last_name, salary, APPROX_COUNT_DISTINCT(salary) OVER (ORDER BY hiredate) "Variance"
  2. FROM employees WHERE manager_id = 100 ORDER BY last_name, salary, "Variance";

查询结果如下:

  1. +-----------+--------+----------+
  2. | LAST_NAME | SALARY | Variance |
  3. +-----------+--------+----------+
  4. | De Haan | 11000 | 3 |
  5. | Errazuriz | 1400 | 2 |
  6. | Hartstein | 14000 | 4 |
  7. | Partners | 14000 | 4 |
  8. | Raphaely | 1700 | 1 |
  9. | Raphaely | 1700 | 2 |
  10. | Russell | 13000 | 6 |
  11. | Weiss | 13500 | 5 |
  12. +-----------+--------+----------+