Collect and use Histogram statistics for table columns - MaxCompute (cloud-native big data computing service) - Alibaba Cloud Help Center

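The MaxCompute optimizer supports Histogram statistics for table columns. A Histogram describes how the values of a column are distributed across value ranges. It enables finer-grained statistical estimates and can help optimize query performance.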

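Introduction to Histogram

A Histogram is a column-level statistic that describes how the values in a column are distributed across value ranges. Compared with other column-level statistics, a Histogram supports finer-grained estimates.

A Histogram consists of many disjoint buckets. Each bucket holds the statistics for one value range: the minimum value (min), the maximum value (max), the number of distinct values (NDV), the number of records (cnt), and other metrics.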
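To make the bucket structure concrete, the following Python sketch models a bucket with the four metrics named above and shows one common way such buckets can be used to estimate row counts. It is an illustration only: the class and function names are hypothetical, and the uniform-distribution assumption inside each bucket is a textbook simplification, not a description of how the MaxCompute optimizer computes its estimates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    """One histogram bucket (hypothetical model of the metrics listed above)."""
    min_val: float  # smallest value in the bucket's range
    max_val: float  # largest value in the bucket's range
    ndv: int        # number of distinct values in the bucket
    cnt: int        # number of records in the bucket


def estimate_rows_le(buckets: List[Bucket], x: float) -> float:
    """Estimate rows with value <= x, assuming values are spread
    uniformly between min_val and max_val inside each bucket."""
    total = 0.0
    for b in buckets:
        if x >= b.max_val:
            total += b.cnt                      # whole bucket qualifies
        elif x >= b.min_val:
            width = b.max_val - b.min_val
            frac = 1.0 if width == 0 else (x - b.min_val) / width
            total += b.cnt * frac               # partial bucket
    return total


def estimate_rows_eq(buckets: List[Bucket], x: float) -> float:
    """Estimate rows with value == x, assuming each distinct value
    in a bucket occurs equally often."""
    for b in buckets:
        if b.min_val <= x <= b.max_val:
            return b.cnt / b.ndv if b.ndv else 0.0
    return 0.0                                  # x falls outside every bucket


# Three disjoint buckets with made-up numbers.
buckets = [
    Bucket(0, 10, 10, 100),
    Bucket(11, 50, 20, 100),
    Bucket(51, 1000, 5, 100),
]
print(estimate_rows_le(buckets, 30))
print(estimate_rows_eq(buckets, 5))
```

With only table-level min, max, and NDV, a skewed column like the one above would be estimated as if its values were spread evenly over 0 to 1000. Per-bucket metrics let the estimate follow the actual distribution.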

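Usage notes

Histogram collection is supported only for columns of integer and floating-point numeric types.
A Histogram can be collected only by running the ANALYZE command. Automatic collection at runtime is not supported.
A single run can collect Histograms for at most 30 columns.
The ANALYZE command currently cannot collect statistics (including Histograms) for empty partitions or empty tables.
After a Histogram is collected, it becomes invalid when new data is written.
Important
Collecting a Histogram incurs compute cost, so after a Histogram becomes invalid, you can decide whether to recollect it based on your needs.
Currently, Histogram-based cardinality estimation is supported only for a single partition or a single table. Merging buckets across multiple partitions is not yet supported.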

Collect a Histogram

Command syntax

ANALYZE TABLE <tablename> compute statistics FOR columns [(...)] [[WITH histogram [256 buckets]] [columns (...)]];

tablename: the name of the table for which to collect the Histogram.

Scenarios

Automatically collect Histograms for all columns:
```
ANALYZE TABLE <tablename> compute statistics FOR columns WITH histogram;
```
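- To prevent unintended use of compute resources (collecting a Histogram costs more than collecting other statistics), an error similar to the following is returned when the number of collectable columns exceeds the system default (10):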
```
Analyze histogram column number exceeds auto columns limit, please specify column names for analyze(xxx) or set unlimited auto columns(xxx)
```
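  You can set the following flag to remove the column limit for automatic Histogram collection. After the limit is removed, the system collects Histograms for the columns in the table that are actually collectable, but still for at most 30 columns in one run.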
```
set odps.sql.analyze.histogram.auto.column.num = -1;
```
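- Histograms are collected only for columns of integer and floating-point numeric types. Columns of unsupported types are ignored.
Automatically collect Histograms for all columns and specify the number of buckets: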
```
ANALYZE TABLE <tablename> compute statistics FOR columns WITH histogram 256 buckets;
```
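The current default number of buckets is 256. It can be raised to a maximum of 1024.
Manually specify the columns for which to collect statistics, and collect Histograms for those same columns. Example: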
```
ANALYZE TABLE <tablename> compute statistics FOR columns(col1, col2) WITH histogram;
```
Manually specify the columns for which to collect statistics, and separately specify the columns for which to collect Histograms. Example:
```
ANALYZE TABLE <tablename> compute statistics FOR columns(col1, col2) WITH histogram columns (col1);
```
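Important
When you manually specify the columns for which to collect Histograms, those columns must be a subset of the preceding columns list. For example, in the command above, col1 in WITH histogram columns (col1) is a subset of the preceding (col1, col2).
Manually specify the columns for which to collect statistics, separately specify the columns for which to collect Histograms, and specify the number of buckets. Example: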
```
ANALYZE TABLE <tablename> compute statistics FOR columns(col1, col2) WITH histogram 256 buckets columns (col1);
```
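When ANALYZE statements are generated by a script, the constraints stated in this topic can be checked before a job is submitted. The following Python sketch assembles a statement in the forms shown above and rejects inputs that break the subset rule, the 30-column limit, or the 1024-bucket maximum. The function `build_analyze` and its parameters are hypothetical helpers, not part of any MaxCompute SDK, and the lower bound of 1 bucket is an assumption because this topic does not state a minimum.

```python
from typing import List, Optional

MAX_HISTOGRAM_COLUMNS = 30  # per-run column limit stated in the usage notes
MAX_BUCKETS = 1024          # maximum bucket count stated in this topic


def build_analyze(table: str,
                  columns: Optional[List[str]] = None,
                  hist_columns: Optional[List[str]] = None,
                  buckets: Optional[int] = None) -> str:
    """Build an ANALYZE ... WITH histogram statement (hypothetical helper)."""
    if hist_columns:
        if not columns:
            raise ValueError("hist_columns requires an explicit columns list")
        missing = [c for c in hist_columns if c not in columns]
        if missing:
            # Histogram columns must be a subset of the columns list.
            raise ValueError(f"not in columns list: {missing}")

    # Columns that will actually get a Histogram, when named explicitly.
    effective = hist_columns or columns or []
    if len(effective) > MAX_HISTOGRAM_COLUMNS:
        raise ValueError("at most 30 Histogram columns per run")

    # Lower bound of 1 is an assumption; only the maximum is documented.
    if buckets is not None and not 1 <= buckets <= MAX_BUCKETS:
        raise ValueError("bucket count must be between 1 and 1024")

    sql = f"ANALYZE TABLE {table} compute statistics FOR columns"
    if columns:
        sql += "(" + ", ".join(columns) + ")"
    sql += " WITH histogram"
    if buckets is not None:
        sql += f" {buckets} buckets"
    if hist_columns:
        sql += " columns (" + ", ".join(hist_columns) + ")"
    return sql + ";"


print(build_analyze("my_table", ["col1", "col2"], ["col1"], 256))
```

The checks only mirror the rules written in this topic. They do not verify column types, so a non-numeric column named in `hist_columns` would still pass this helper.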

Use a Histogram

Display Histogram statistics:
```
show statistic <tablename> columns;
```
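Enable Histograms for query optimization with the following command. After the flag is set, the optimizer performs cardinality estimation based on Histograms.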
```
set odps.sql.optimizer.histogram.enable=true;
```