统计聚合函数类型与使用方法-表格存储-阿里云

通过统计聚合接口可以实现求最小值、求最大值、求和、求平均值、统计行数、去重统计行数、按字段值分组、按范围分组、按地理位置分组、按过滤条件分组、直方图统计、日期直方图统计、嵌套功能；同时支持多个统计聚合功能组合使用，满足复杂的查询需求。

背景信息

统计聚合的详细功能请参见下表。

功能	说明
最小值	返回一个字段中的最小值，类似于SQL中的min。
最大值	返回一个字段中的最大值，类似于SQL中的max。
和	返回数值字段的总数，类似于SQL中的sum。
平均值	返回数值字段的平均值，类似于SQL中的avg。
统计行数	返回指定字段值的数量或者多元索引数据总行数，类似于SQL中的count。
去重统计行数	返回指定字段不同值的数量，类似于SQL中的count（distinct）。
字段值分组	根据一个字段的值对查询结果进行分组，相同的字段值放到同一分组内，返回每个分组的值和该值对应的个数。说明当分组较大时，按字段值分组可能会存在误差。
多字段分组	根据多个字段对查询结果进行分组，支持使用token进行翻页。
范围分组	根据一个字段的范围对查询结果进行分组，字段值在某范围内放到同一分组内，返回每个范围中相应的item个数。
地理位置分组	根据距离某一个中心点的范围对查询结果进行分组，距离差值在某范围内放到同一分组内，返回每个范围中相应的item个数。
过滤条件分组	按照过滤条件对查询结果进行分组，获取每个过滤条件匹配到的数量，返回结果的顺序和添加过滤条件的顺序一致。
直方图统计	按照指定数据间隔对查询结果进行分组，字段值在相同范围内放到同一分组内，返回每个分组的值和该值对应的个数。
日期直方图统计	对日期字段类型的数据按照指定间隔对查询结果进行分组，字段值在相同范围内放到同一分组内，返回每个分组的值和该值对应的个数。
嵌套	分组类型的统计聚合功能支持嵌套，其内部可以添加子统计聚合。
多个统计聚合	多个统计聚合功能可以组合使用。说明当多个统计聚合的复杂度较高时可能会影响响应速度。

前提条件

已初始化Client。具体操作，请参见初始化OTSClient。
已创建数据表并写入数据。具体操作，请参见创建数据表和写入数据。
已在数据表上创建多元索引。具体操作，请参见创建多元索引。

最小值

返回一个字段中的最小值，类似于SQL中的min。

参数

参数	说明
Name	自定义的统计聚合名称，用于区分不同的统计聚合，可根据此名称获取本次统计聚合结果。
FieldName	用于统计聚合的字段，仅支持Long、Double和Date类型。
Missing	当某行数据中的字段为空时，字段值的默认值。如果未设置Missing值，则在统计聚合时会忽略该行。如果设置了Missing值，则使用Missing值作为字段值的默认值参与统计聚合。

示例

/**
 *  商品库中有每一种商品的价格，求产地为浙江省的商品中，价格最低的商品价格是多少。
 *  等效的SQL语句是SELECT min(column_price) FROM product where place_of_production="浙江省";
 */
func min(client *tablestore.TableStoreClient, tableName string, indexName string) {
    searchRequest := &tablestore.SearchRequest{}

    searchRequest.
        SetTableName(tableName).    //设置数据表名称。
        SetIndexName(indexName).    //设置多元索引名称。
        SetSearchQuery(search.NewSearchQuery().
            SetQuery(&search.TermQuery{"place_of_production", "浙江省"}).
            SetLimit(0).    //如果只关心统计聚合结果，不关心具体数据，您可以将limit设置为0来提高性能。
            Aggregation(search.NewMinAggregation("min_agg_1", "column_price").Missing(0.00)))

    searchResponse, err := client.Search(searchRequest)    //执行查询。
    aggResults := searchResponse.AggregationResults        //获取统计聚合结果。
    agg1, err := aggResults.Min("min_agg_1")        //获取名称为"min_agg_1"的统计聚合结果。
    if err != nil {
        panic(err)
    }
    if agg1.HasValue() {        //名称为"min_agg_1"的统计聚合结果是否有Value值。
        fmt.Println(agg1.Value)    //打印统计聚合结果。
    }
}

最大值

返回一个字段中的最大值，类似于SQL中的max。

参数

参数	说明
Name	自定义的统计聚合名称，用于区分不同的统计聚合，可根据此名称获取本次统计聚合结果。
FieldName	用于统计聚合的字段，仅支持Long、Double和Date类型。
Missing	当某行数据中的字段为空时，字段值的默认值。如果未设置Missing值，则在统计聚合时会忽略该行。如果设置了Missing值，则使用Missing值作为字段值的默认值参与统计聚合。

示例

/**
 * 商品库中有每一种商品的价格，求产地为浙江省的商品中，价格最高的商品价格是多少。
 * 等效的SQL语句是SELECT max(column_price) FROM product where place_of_production="浙江省"。
 */
func max(client *tablestore.TableStoreClient, tableName string, indexName string) {
    searchRequest := &tablestore.SearchRequest{}

    searchRequest.
        SetTableName(tableName).    //设置数据表名称。
        SetIndexName(indexName).    //设置多元索引名称。
        SetSearchQuery(search.NewSearchQuery().
            SetQuery(&search.TermQuery{"place_of_production", "浙江省"}).
            SetLimit(0).    //如果只关心统计聚合结果，不关心具体数据，您可以将limit设置为0来提高性能。
            Aggregation(search.NewMaxAggregation("max_agg_1", "column_price").Missing(0.00)))

    searchResponse, err := client.Search(searchRequest)    //执行查询。
    aggResults := searchResponse.AggregationResults        //获取统计聚合结果。
    agg1, err := aggResults.Max("max_agg_1")        //获取名称为"max_agg_1"的统计聚合结果。
    if err != nil {
        panic(err)
    }
    if agg1.HasValue() {        //名称为"max_agg_1"的统计聚合结果是否有Value值。
        fmt.Println(agg1.Value)    //打印统计聚合结果。
    }
}

和

返回数值字段的总数，类似于SQL中的sum。

参数

参数	说明
Name	自定义的统计聚合名称，用于区分不同的统计聚合，可根据此名称获取本次统计聚合结果。
FieldName	用于统计聚合的字段，仅支持Long和Double类型。
Missing	当某行数据中的字段为空时，字段值的默认值。如果未设置Missing值，则在统计聚合时会忽略该行。如果设置了Missing值，则使用Missing值作为字段值的默认值参与统计聚合。

示例

/**
 * 商品库中有每一种商品的售出数量，求产地为浙江省的商品中，一共售出了多少件商品。如果某一件商品没有该值，默认售出了10件。
 * 等效的SQL语句是SELECT sum(column_price) FROM product where place_of_production="浙江省"。
 */
func sum(client *tablestore.TableStoreClient, tableName string, indexName string) {
    searchRequest := &tablestore.SearchRequest{}

    searchRequest.
        SetTableName(tableName).    //设置数据表名称。
        SetIndexName(indexName).    //设置多元索引名称。
        SetSearchQuery(search.NewSearchQuery().
            SetQuery(&search.TermQuery{"place_of_production", "浙江省"}).
            SetLimit(0).    //如果只关心统计聚合结果，不关心具体数据，您可以将limit设置为0来提高性能。
            Aggregation(search.NewSumAggregation("sum_agg_1", "column_price").Missing(0.00)))

    searchResponse, err := client.Search(searchRequest)    //执行查询。
    aggResults := searchResponse.AggregationResults        //获取统计聚合结果。
    agg1, err := aggResults.Sum("sum_agg_1")        //获取名称为"sum_agg_1"的统计聚合结果。
    if err != nil {
        panic(err)
    }
    fmt.Println(agg1.Value)    //打印统计聚合结果。
}

平均值

返回数值字段的平均值，类似于SQL中的avg。

参数

参数	说明
Name	自定义的统计聚合名称，用于区分不同的统计聚合，可根据此名称获取本次统计聚合结果。
FieldName	用于统计聚合的字段，仅支持Long、Double和Date类型。
Missing	当某行数据中的字段为空时，字段值的默认值。如果未设置Missing值，则在统计聚合时会忽略该行。如果设置了Missing值，则使用Missing值作为字段值的默认值参与统计聚合。

示例

/**
 * 商品库中有每一种商品的售出数量，求产地为浙江省的商品中，平均价格是多少。
 * 等效的SQL语句是SELECT avg(column_price) FROM product where place_of_production="浙江省"。
 */
func avg(client *tablestore.TableStoreClient, tableName string, indexName string) {
    searchRequest := &tablestore.SearchRequest{}

    searchRequest.
        SetTableName(tableName).    //设置数据表名称。
        SetIndexName(indexName).    //设置多元索引名称。
        SetSearchQuery(search.NewSearchQuery().
            SetQuery(&search.TermQuery{"place_of_production", "浙江省"}).
            SetLimit(0).    //如果只关心统计聚合结果，不关心具体数据，您可以将limit设置为0来提高性能。
            Aggregation(search.NewAvgAggregation("avg_agg_1", "column_price").Missing(0.00)))

    searchResponse, err := client.Search(searchRequest)    //执行查询。
    aggResults := searchResponse.AggregationResults        //获取统计聚合结果。
    agg1, err := aggResults.Avg("avg_agg_1")        //获取名称为"avg_agg_1"的统计聚合结果。
    if err != nil {
        panic(err)
    }
    if agg1.HasValue() {        //名称为"agg1"的统计聚合结果是否有Value值。
        fmt.Println(agg1.Value)    //打印统计聚合结果。
    }
}

统计行数

返回指定字段值的数量或者多元索引数据总行数，类似于SQL中的count。

说明

通过如下方式可以统计多元索引数据总行数或者某个query匹配的行数。

使用统计聚合的count功能，在请求中设置count(*)。
使用query功能的匹配行数，在query中设置setGetTotalCount(true)；如果需要统计多元索引数据总行数，则使用MatchAllQuery。

如果需要获取多元索引数据某列出现的次数，则使用count（列名），可应用于稀疏列的场景。

参数

参数	说明
Name	自定义的统计聚合名称，用于区分不同的统计聚合，可根据此名称获取本次统计聚合结果。
FieldName	用于统计聚合的字段，仅支持Long、Double、Boolean、Keyword、Geo_point和Date类型。

示例

/**
 * 商家库中有每一种商家的惩罚记录，求浙江省的商家中，有惩罚记录的一共有多少个商家。如果商家没有惩罚记录，则商家信息中不存在该字段。
 * 等效的SQL语句是SELECT count(column_history) FROM product where place_of_production="浙江省"。
 */
func count(client *tablestore.TableStoreClient, tableName string, indexName string) {
    searchRequest := &tablestore.SearchRequest{}

    searchRequest.
        SetTableName(tableName).    //设置数据表名称。
        SetIndexName(indexName).    //设置多元索引名称。
        SetSearchQuery(search.NewSearchQuery().
            SetQuery(&search.TermQuery{"place_of_production", "浙江省"}).
            SetLimit(0).    //如果只关心统计聚合结果，不关心具体数据，您可以将limit设置为0来提高性能。
            Aggregation(search.NewCountAggregation("count_agg_1", "column_price")))

    searchResponse, err := client.Search(searchRequest)    //执行查询。
    aggResults := searchResponse.AggregationResults        //获取统计聚合结果。
    agg1, err := aggResults.Count("count_agg_1")        //获取名称为"count_agg_1"的统计聚合结果。
    if err != nil {
        panic(err)
    }
    fmt.Println(agg1.Value)    //打印统计聚合结果。
}

去重统计行数

返回指定字段不同值的数量，类似于SQL中的count（distinct）。

说明

去重统计行数的计算结果是个近似值。

当去重统计行数小于1万时，计算结果接近精确值。
当去重统计行数达到1亿时，计算结果的误差为2%左右。

参数

参数	说明
Name	自定义的统计聚合名称，用于区分不同的统计聚合，可根据此名称获取本次统计聚合结果。
FieldName	用于统计聚合的字段，仅支持Long、Double、Boolean、Keyword、Geo_point和Date类型。
Missing	当某行数据中的字段为空时，字段值的默认值。如果未设置Missing值，则在统计聚合时会忽略该行。如果设置了Missing值，则使用Missing值作为字段值的默认值参与统计聚合。

示例

/**
 * 求所有的商品，产地一共来自多少个省份。
 * 等效的SQL语句是SELECT count(distinct column_place) FROM product。
 */
func distinctCount(client *tablestore.TableStoreClient, tableName string, indexName string) {
    searchRequest := &tablestore.SearchRequest{}

    searchRequest.
        SetTableName(tableName).    //设置数据表名称。
        SetIndexName(indexName).    //设置多元索引名称。
        SetSearchQuery(search.NewSearchQuery().
            SetQuery(&search.TermQuery{"place_of_production", "浙江省"}).
            SetLimit(0).    //如果只关心统计聚合结果，不关心具体数据，您可以将limit设置为0来提高性能。
            Aggregation(search.NewDistinctCountAggregation("distinct_count_agg_1", "column_price").Missing(0.00)))

    searchResponse, err := client.Search(searchRequest)    //执行查询。
    aggResults := searchResponse.AggregationResults        //获取统计聚合结果。
    agg1, err := aggResults.DistinctCount("distinct_count_agg_1")        //获取名称为"distinct_count_agg_1"的统计聚合结果。
    if err != nil {
        panic(err)
    }
    fmt.Println(agg1.Value)    //打印统计聚合结果。
}

字段值分组

根据一个字段的值对查询结果进行分组，相同的字段值放到同一分组内，返回每个分组的值和该值对应的个数。

说明

当分组较大时，按字段值分组可能会存在误差。

参数

参数	说明
Name	自定义的统计聚合名称，用于区分不同的统计聚合，可根据此名称获取本次统计聚合结果。
FieldName	用于统计聚合的字段，仅支持Long、Double、Boolean、Keyword和Date类型。
Size	返回的分组数量，默认值为10。最大值为2000。当分组数量超过2000时，只会返回前2000个分组。
GroupBySorters	分组中的item排序规则，默认按照分组中item的数量降序排序，多个排序则按照添加的顺序进行排列。支持的排序类型如下：按照值的字典序升序排列按照值的字典序降序排列按照行数升序排列按照行数降序排列按照子统计聚合结果中值升序排列按照子统计聚合结果中值降序排列
SubAggregation和SubGroupBy	子统计聚合，子统计聚合会根据分组内容再进行一次统计聚合分析。场景统计每个类别的商品数量，且统计每个类别价格的最大值和最小值。方法最外层的统计聚合是根据类别进行分组，再添加两个子统计聚合求价格的最大值和最小值。结果示例水果：5个（其中价格的最大值为15，最小值为3）洗漱用品：10个（其中价格的最大值为98，最小值为1）电子设备：3个（其中价格的最大值为8699，最小值为2300）其它：15个（其中价格的最大值为1000，最小值为80）

示例

/**
 * 所有商品中每一个类别各有多少个，且统计每一个类别的价格最大值和最小值。
 * 返回结果举例："水果：5个（其中价格的最大值为15，最小值为3），洗漱用品：10个（其中价格的最大值为98，最小值为1），电子设备：3个（其中价格的最大值为8699，最小值为2300），
 * 其它：15个（其中价格的最大值为1000，最小值为80）"。
 */
func GroupByField(client *tablestore.TableStoreClient, tableName string, indexName string) {
    searchRequest := &tablestore.SearchRequest{}

    searchRequest.
        SetTableName(tableName).    //设置数据表名称。
        SetIndexName(indexName).    //设置多元索引名称。
        SetSearchQuery(search.NewSearchQuery().
            SetQuery(&search.MatchAllQuery{}).    //匹配所有行。
            SetLimit(0).
            GroupBy(search.NewGroupByField("group1", "column_type").    //按商品类别的字段值进行分组。
                SubAggregation(search.NewMinAggregation("min_price", "column_price")).    //分组中最便宜的商品。
                SubAggregation(search.NewMaxAggregation("max_price", "column_price"))))    //分组中最贵的商品。

    searchResponse, err := client.Search(searchRequest)
    if err != nil {
        fmt.Printf("%#v", err)
        return
    }
    groupByResults := searchResponse.GroupByResults    //获取统计聚合结果。
    group, err := groupByResults.GroupByField("group1")
    if err != nil {
        fmt.Printf("%#v", err)
        return
    }

    for _, item := range group.Items {    //遍历返回的所有分组。
        //打印分组的值和分组中的记录行数。
        fmt.Println("\tkey: ", item.Key, ", rowCount: ", item.RowCount) 

        //打印价格的最小值。
        minPrice, _ := item.SubAggregations.Min("min_price")
        if minPrice.HasValue() {
            fmt.Println("\t\tmin_price: ", minPrice.Value)
        }

        //打印价格的最大值。
        maxPrice, _ := item.SubAggregations.Max("max_price")
        if maxPrice.HasValue() {
            fmt.Println("\t\tmax_price: ", maxPrice.Value)
        }
    }
}

多字段分组

根据多个字段对查询结果进行分组，支持使用token进行翻页。

说明

要实现多字段分组，您可以通过字段分组嵌套或者多字段分组方式实现。关于两种实现方式的功能对比，请参见附录：多字段分组不同实现方式对比。

参数

参数	说明
GroupByName	自定义的统计聚合名称，用于区分不同的统计聚合，可根据此名称获取本次统计聚合结果。
SourceGroupByList	多列分组聚合字段对象，最多支持设置32列。支持的分组类型如下：重要 Sources内分组类型的GroupBySorter仅支持groupKeySort，不支持subAggSort和rowCountSort。默认降序排序。当某一列字段值不存在时，系统会返回NULL。字段值分组GroupByField：仅支持设置GroupByName、FieldName和GroupBySorter参数。直方图统计GroupByHistogram：仅支持设置GroupByName、FieldName、Interval和GroupBySorter参数日期直方图统计GroupByDateHistogram：仅支持设置GroupByName、FieldName、Interval、TimeZone和GroupBySorter参数。
NextToken	进行翻页时用于继续获取分组的Token值。多列的字段值分组请求结果GroupByCompositeResult中会返回NextToken值，通过NextToken可翻页获取全量分组结果。
Size	返回分组的数量。当满足条件的分组数量超过Size限制时，您可以通过NextToken继续翻页获取分组。重要当要限制返回的分组数量时，不支持同时配置Size和SuggestedSize参数。一般情况下只需配置Size参数即可。如果是用于对接例如Spark、Presto等计算引擎的高吞吐场景，则只需配置SuggestedSize参数。
SuggestedSize	可设置为大于服务端最大限制的值或-1，服务端按实际能力返回实际行数。适用于对接例如Spark、Presto等计算引擎的高吞吐场景。当该值超过服务端最大值限制后会被修正为最大值。实际返回的分组结果数量为`min(suggestedSize, 服务端分组数量限制, 总分组数量)`。
SubAggList和SubGroupByList	子统计聚合，子统计聚合会根据分组内容再进行一次统计聚合分析。重要 GroupByComposite不支持作为SubGroupByList。

示例

/**
 * 组合类型分组聚合：根据传入的多个SourceGroupBy（支持groupbyField、groupByHistogram和groupByDataHistogram）进行分组聚合
 * 多列的聚合返回结果以扁平化结构返回。
 */
func GroupByComposite(client *tablestore.TableStoreClient, tableName string, indexName string) {
    searchRequest := &tablestore.SearchRequest{}
    searchRequest.
        SetTableName(tableName).
        SetIndexName(indexName).
        SetSearchQuery(search.NewSearchQuery().
            SetQuery(&search.MatchAllQuery{}).
            SetLimit(0).
            GroupBy(search.NewGroupByComposite("groupByComposite").
            SourceGroupBys(search.NewGroupByField("groupByColLong", "Col_Long"),
                search.NewGroupByField("groupByColKeyword", "Col_Keyword"),
                search.NewGroupByField("groupByColDouble", "Col_Double")).
            SetSize(100)))
    searchResponse, err := client.Search(searchRequest)
    if err != nil {
        fmt.Printf("%#v\n", err)
        return
    }

    for {
        if searchResponse.GroupByResults.Empty() {
            return
        }

        groupByResults := searchResponse.GroupByResults
        group, err := groupByResults.GroupByComposite("groupByComposite")
        if err != nil {
            fmt.Printf("%#v\n", err)
            return
        }

        if group == nil {
            return
        }

        for _, item := range group.Items { //遍历返回的所有分组。
            //打印分组的值和分组中的记录行数。
            if item.Keys != nil {
                for i := 0; i < len(item.Keys); i++ {
                    if item.Keys[i] == nil {
                        fmt.Printf(" null ")
                    } else {
                        fmt.Printf(" %v ", *item.Keys[i])
                    }

                }
                fmt.Printf("\trowCount:%v\n", item.RowCount)
            }
        }

        if group.NextToken != nil {
            searchRequest.
                SetTableName(tableName).
                SetIndexName(indexName).
                SetSearchQuery(search.NewSearchQuery().
                    SetQuery(&search.MatchAllQuery{}).
                    SetLimit(0).
                    GroupBy(search.NewGroupByComposite("groupByComposite").
                        SourceGroupBys(search.NewGroupByField("groupByColLong", "Col_Long"),
                            search.NewGroupByField("groupByColKeyword", "Col_Keyword"),
                            search.NewGroupByField("groupByColDouble", "Col_Double")).
                        SetSize(100).
                        SetNextToken(group.NextToken)))

            searchResponse, err = client.Search(searchRequest)
            if err != nil {
                fmt.Printf("%#v\n", err)
                return
            }
        } else {
            break
        }
    }
}

范围分组

根据一个字段的范围对查询结果进行分组，字段值在某范围内放到同一分组内，返回每个范围中相应的item个数。

参数

参数	说明
Name	自定义的统计聚合名称，用于区分不同的统计聚合，可根据此名称获取本次统计聚合结果。
FieldName	用于统计聚合的字段，仅支持Long和Double类型。
Range(fromInclusive float64, toExclusive float64)	分组的范围。起始值fromInclusive可以使用最小值NegInf，结束值toExclusive可以使用最大值Inf。
SubAggregation和SubGroupBy	子统计聚合，子统计聚合会根据分组内容再进行一次统计聚合分析。例如按销量分组后再按省份分组，即可获得某个销量范围内哪个省比重比较大，实现方法是GroupByRange下添加一个GroupByField。

示例

/**
 * 求商品销量时按[NegInf，1000）、[1000，5000）、[5000，Inf）这些分组计算每个范围的销量。
 */
func GroupByRange(client *tablestore.TableStoreClient, tableName string, indexName string) {
    searchRequest := &tablestore.SearchRequest{}

    searchRequest.
        SetTableName(tableName).    //设置数据表名称。
        SetIndexName(indexName).    //设置多元索引名称。
        SetSearchQuery(search.NewSearchQuery().
            SetQuery(&search.MatchAllQuery{}).    //匹配所有行。
            SetLimit(0).
            GroupBy(search.NewGroupByRange("group1", "column_number").
                Range(search.NegInf, 1000).
                Range(1000, 5000).
                Range(5000, search.Inf)))

    searchResponse, err := client.Search(searchRequest)
    if err != nil {
        fmt.Printf("%#v", err)
        return
    }
    groupByResults := searchResponse.GroupByResults    //获取统计聚合结果。
    group, err := groupByResults.GroupByRange("group1")
    if err != nil {
        fmt.Printf("%#v", err)
        return
    }
    for _, item := range group.Items {    //遍历返回的所有分组。
        fmt.Println("\t[", item.From, ", ", item.To, "), rowCount: ", item.RowCount)    //打印本次分组的行数。
    }
}

地理位置分组

根据距离某一个中心点的范围对查询结果进行分组，距离差值在某范围内放到同一分组内，返回每个范围中相应的item个数。

参数

参数	说明
Name	自定义的统计聚合名称，用于区分不同的统计聚合，可根据此名称获取本次统计聚合结果。
FieldName	用于统计聚合的字段，仅支持Geo_point类型。
CenterPoint(latitude float64, longitude float64)	起始中心点的经纬度。 latitude是起始中心点坐标纬度，longitude是起始中心点坐标经度。
Range(fromInclusive float64, toExclusive float64)	分组的范围，单位为米。起始值fromInclusive可以使用最小值NegInf，结束值toExclusive可以使用最大值Inf。
SubAggregation和SubGroupBy	子统计聚合，子统计聚合会根据分组内容再进行一次统计聚合分析。

示例


/**
 * 求距离万达广场[NegInf，1000）、[1000，5000）、[5000，Inf）这些范围内的人数，单位为米。
 */
func GroupByGeoDistance(client *tablestore.TableStoreClient, tableName string, indexName string) {
    searchRequest := &tablestore.SearchRequest{}

    searchRequest.
        SetTableName(tableName).    //设置数据表名称。
        SetIndexName(indexName).    //设置多元索引名称。
        SetSearchQuery(search.NewSearchQuery().
            SetQuery(&search.MatchAllQuery{}).    //匹配所有行。
            SetLimit(0).
            GroupBy(search.NewGroupByGeoDistance("group1", "Col_GeoPoint", search.GeoPoint{Lat: 30.137817, Lon:120.08681}).
                Range(search.NegInf, 1000).
                Range(1000, 5000).
                Range(5000, search.Inf)))

    searchResponse, err := client.Search(searchRequest)
    if err != nil {
        fmt.Printf("%#v", err)
        return
    }
    groupByResults := searchResponse.GroupByResults    //获取统计聚合结果。
    group, err := groupByResults.GroupByGeoDistance("group1")
    if err != nil {
        fmt.Printf("%#v", err)
        return
    }
    for _, item := range group.Items {    //遍历返回的所有分组。
        fmt.Println("\t[", item.From, ", ", item.To, "), rowCount: ", item.RowCount)    //打印本次分组的行数。
    }
}

过滤条件分组

按照过滤条件对查询结果进行分组，获取每个过滤条件匹配到的数量，返回结果的顺序和添加过滤条件的顺序一致。

参数

参数	说明
Name	自定义的统计聚合名称，用于区分不同的统计聚合，可根据此名称获取本次统计聚合结果。
Query	过滤条件，返回结果的顺序和添加过滤条件的顺序一致。
SubAggregation和SubGroupBy	子统计聚合，子统计聚合会根据分组内容再进行一次统计聚合分析。

示例

/**
 * 按照过滤条件进行分组，例如添加三个过滤条件（销量大于100、产地是浙江省、描述中包含杭州关键词），然后获取每个过滤条件匹配到的数量。
 */
func GroupByFilter(client *tablestore.TableStoreClient, tableName string, indexName string) {
    searchRequest := &tablestore.SearchRequest{}

    searchRequest.
        SetTableName(tableName).    //设置数据表名称。
        SetIndexName(indexName).    //设置多元索引名称。
        SetSearchQuery(search.NewSearchQuery().
            SetQuery(&search.MatchAllQuery{}).    //匹配所有行。
            SetLimit(0).
            GroupBy(search.NewGroupByFilter("group1").
                Query(&search.RangeQuery{
                    FieldName: "number",
                    From: 100,
                    IncludeLower: true}).
                Query(&search.TermQuery{
                    FieldName: "place",
                    Term:      "浙江省",
                }).
                Query(&search.MatchQuery{
                    FieldName: "description",
                    Text: "杭州",
                })))

    searchResponse, err := client.Search(searchRequest)
    if err != nil {
        fmt.Printf("%#v", err)
        return
    }
    groupByResults := searchResponse.GroupByResults    //获取统计聚合结果。
    group, err := groupByResults.GroupByFilter("group1")
    if err != nil {
        fmt.Printf("%#v", err)
        return
    }
    for _, item := range group.Items {    //遍历返回的所有分组。
        fmt.Println("\trowCount: ", item.RowCount)    //打印本次分组的行数。
    }
}

直方图统计

按照指定数据间隔对查询结果进行分组，字段值在相同范围内放到同一分组内，返回每个分组的值和该值对应的个数。

参数

参数	说明
GroupByName	自定义的统计聚合名称，用于区分不同的统计聚合，可根据此名称获取本次统计聚合结果。
Field	用于统计聚合的字段，仅支持Long和Double类型。
Interval	统计间隔。
FieldRange[min,max]	统计范围，与Interval参数配合使用限制分组的数量。`(FieldRange.max-FieldRange.min)/Interval`的值不能超过2000。
MinDocCount	最小行数。当分组中的行数小于最小行数时，不会返回此分组的统计结果。
Missing	当某行数据中的字段为空时，字段值的默认值。如果未设置missing值，则在统计聚合时会忽略该行。如果设置了missing值，则使用missing值作为字段值的默认值参与统计聚合。

示例

func GroupByHistogram(client *tablestore.TableStoreClient, tableName string, indexName string) {
	searchRequest := &tablestore.SearchRequest{}
	searchRequest.
		SetTableName(tableName). //设置数据表名称。
		SetIndexName(indexName). //设置多元索引名称。
		SetSearchQuery(search.NewSearchQuery().
			SetQuery(&search.MatchAllQuery{}). //匹配所有行。
			SetLimit(0).
			GroupBy(search.NewGroupByHistogram("group1", "field_name").
				SetMinDocCount(1).
				SetFiledRange(1, 100).
				SetMissing(3).
				SetInterval(10)))

	searchResponse, err := client.Search(searchRequest)
	if err != nil {
		fmt.Printf("%#v", err)
		return
	}
	groupByResults := searchResponse.GroupByResults //获取统计聚合结果。
	group, err := groupByResults.GroupByHistogram("group1")
	if err != nil {
		fmt.Printf("%#v", err)
		return
	}
	for _, item := range group.Items { //遍历返回的所有分组。
		fmt.Println("key:", item.Key.Value, ", rowCount:", item.Value)
	}
}

日期直方图统计

对日期字段类型的数据按照指定间隔对查询结果进行分组，字段值在相同范围内放到同一分组内，返回每个分组的值和该值对应的个数。

参数

参数	说明
GroupByName	自定义的统计聚合名称，用于区分不同的统计聚合，可根据此名称获取本次统计聚合结果。
Field	用于统计聚合的字段，仅支持Date类型。
Interval	统计间隔。
FieldRange[min,max]	统计范围，与Interval参数配合使用限制分组的数量。`(FieldRange.max-FieldRange.min)/Interval`的值不能超过2000。
MinDocCount	最小行数。当分组中的行数小于最小行数时，不会返回此分组的统计结果。
Missing	当某行数据中的字段为空时，字段值的默认值。如果未设置missing值，则在统计聚合时会忽略该行。如果设置了missing值，则使用missing值作为字段值的默认值参与统计聚合。
TimeZone	时区。格式为`+hh:mm`或者`-hh:mm`，例如`+08:00`、`-09:00`。只有当字段数据类型为Date时才需要配置。当Date类型字段的Format未设置时区信息时，可能会导致聚合结果存在N小时的偏移，此时请设置TimeZone来解决该问题。

示例

func GroupByDateHistogram(client *tablestore.TableStoreClient, tableName string, indexName string) {
	searchRequest := &tablestore.SearchRequest{}
	searchRequest.
		SetTableName(tableName). //设置数据表名称。
		SetIndexName(indexName). //设置多元索引名称。
		SetSearchQuery(search.NewSearchQuery().
			SetQuery(&search.MatchAllQuery{}). //匹配所有行。
			SetLimit(0).
			GroupBy(search.NewGroupByDateHistogram("date_group", "date_field_name").
				SetMinDocCount(1).
				SetFiledRange("2017-05-01 10:00", "2017-05-21 13:00:00").
				SetMissing("2017-05-01 13:01:00").
				SetInterval(model.DateTimeValue{
					Value: proto.Int32(1),
					Unit:  model.DateTimeUnit_DAY.Enum(),
				})))

	searchResponse, err := client.Search(searchRequest)
	if err != nil {
		fmt.Printf("%#v", err)
		return
	}
	groupByResults := searchResponse.GroupByResults //获取统计聚合结果。
	group, err := groupByResults.GroupByDateHistogram("date_group")
	if err != nil {
		fmt.Printf("%#v", err)
		return
	}

	for _, item := range group.Items { //遍历返回的所有分组。
		fmt.Printf("millisecondTimestamp:%d , rowCount:%d \n", item.Timestamp, item.RowCount)
	}
}

嵌套

分组类型的统计聚合功能支持嵌套，其内部可以添加子统计聚合。

主要用于在分组内再次进行统计聚合，以两层的嵌套为例：

GroupBy+SubGroupBy：按省份分组后再按照城市分组，获取每个省份下每个城市的数据。
GroupBy+SubAggregation：按照省份分组后再求某个指标的最大值，获取每个省的某个指标最大值。

说明

为了性能、复杂度等综合考虑，嵌套的层级只开放了一定的层数。更多信息，请参见多元索引限制。

示例

/**
 * 嵌套的统计聚合示例。
 * 外层GroupByField中添加了2个Aggregation和1个GroupByRange。
 */
func NestedSample(client *tablestore.TableStoreClient, tableName string, indexName string) {
    searchRequest := &tablestore.SearchRequest{}

    searchRequest.
        SetTableName(tableName).    //设置数据表名称。
        SetIndexName(indexName).    //设置多元索引名称。
        SetSearchQuery(search.NewSearchQuery().
            SetQuery(&search.MatchAllQuery{}).    //匹配所有行。
            SetLimit(0).
            GroupBy(search.NewGroupByField("group1", "field1").
                SubAggregation(search.NewMinAggregation("sub_agg1", "sub_field1")).
                SubAggregation(search.NewMaxAggregation("sub_agg2", "sub_field2")).
                SubGroupBy(search.NewGroupByRange("sub_group1", "sub_field3").
                    Range(search.NegInf, 3).
                    Range(3, 5).
                    Range(5, search.Inf))))

    searchResponse, err := client.Search(searchRequest)
    if err != nil {
        fmt.Printf("%#v", err)
        return
    }
    groupByResults := searchResponse.GroupByResults    //获取统计聚合结果。
    group, err := groupByResults.GroupByField("group1")
    if err != nil {
        fmt.Printf("%#v", err)
        return
    }

    for _, item := range group.Items {    //遍历返回的所有分组。
        //打印分组的值和分组中的记录行数。
        fmt.Println("\tkey: ", item.Key, ", rowCount: ", item.RowCount) 

        //获取名称为"sub_agg1"的统计聚合结果。
        subAgg1, _ := item.SubAggregations.Min("sub_agg1")
        if subAgg1.HasValue() {
            fmt.Println("\t\tsub_agg1: ", subAgg1.Value)
        }

        //获取名称为"sub_agg2"的统计聚合结果。
        subAgg2, _ := item.SubAggregations.Max("sub_agg2")
        if subAgg2.HasValue() {
            fmt.Println("\t\tsub_agg2: ", subAgg2.Value)
        }

        //获取名称为"sub_group1"的统计聚合结果。
        subGroup, _ := item.SubGroupBys.GroupByRange("sub_group1")
        for _, item := range subGroup.Items {
            fmt.Println("\t\t[", item.From, ", ", item.To, "), rowCount: ", item.RowCount)
        }
    }
}

多个统计聚合

多个统计聚合功能可以组合使用。

说明

当多个统计聚合的复杂度较高时可能会影响响应速度。

示例1

func MultipleAggregations(client *tablestore.TableStoreClient, tableName string, indexName string) {
    searchRequest := &tablestore.SearchRequest{}

    searchRequest.
        SetTableName(tableName).    //设置数据表名称。
        SetIndexName(indexName).    //设置多元索引名称。
        SetSearchQuery(search.NewSearchQuery().
            SetQuery(&search.MatchAllQuery{}).    //匹配所有行。
            SetLimit(0).
            Aggregation(search.NewAvgAggregation("agg1", "Col_Long")).              //计算Col_Long字段的平均值。
            Aggregation(search.NewDistinctCountAggregation("agg2", "Col_Long")).    //计算Col_Long字段不同取值的个数。
            Aggregation(search.NewMaxAggregation("agg3", "Col_Long")).              //计算Col_Long字段的最大值。
            Aggregation(search.NewSumAggregation("agg4", "Col_Long")).              //计算Col_Long字段的和。
            Aggregation(search.NewCountAggregation("agg5", "Col_Long")))            //计算存在Col_Long字段的行数。

    //设置返回所有列。
    searchRequest.SetColumnsToGet(&tablestore.ColumnsToGet{
        ReturnAll: true,
    })
    searchResponse, err := client.Search(searchRequest)
    if err != nil {
        fmt.Printf("%#v", err)
        return
    }
    aggResults := searchResponse.AggregationResults    //获取统计聚合结果。

    //获取求平均值的统计聚合结果。
    agg1, err := aggResults.Avg("agg1")            //获取名称为"agg1"的统计聚合结果，类型为Avg。
    if err != nil {
        panic(err)
    }
    if agg1.HasValue() {                           //名称为"agg1"的统计聚合结果是否有Value值。
        fmt.Println("(avg) agg1: ", agg1.Value)    //打印Col_Long字段平均值。
    } else {
        fmt.Println("(avg) agg1: no value")        //所有行都不存在Col_Long字段时的打印信息。
    }

    //获取去重统计行数的统计聚合结果。
    agg2, err := aggResults.DistinctCount("agg2")  //获取名称为"agg2"的统计聚合结果，类型为DistinctCount。
    if err != nil {
        panic(err)
    }
    fmt.Println("(distinct) agg2: ", agg2.Value)   //打印Col_Long字段不同取值的个数。

    //获取求最大值的统计聚合结果。
    agg3, err := aggResults.Max("agg3")            //获取名称为"agg3"的统计聚合结果，类型为Max。
    if err != nil {
        panic(err)
    }
    if agg3.HasValue() {
        fmt.Println("(max) agg3: ", agg3.Value)    //打印Col_Long字段最大值。
    } else {
        fmt.Println("(max) agg3: no value")        //所有行都不存在Col_Long字段时的打印信息。
    }

    //获取求和的统计聚合结果。
    agg4, err := aggResults.Sum("agg4")            //获取名称为"agg4"的统计聚合结果，类型为Sum。
    if err != nil {
        panic(err)
    }
    fmt.Println("(sum) agg4: ", agg4.Value)        //打印Col_Long字段的和。

    //获取统计行数的统计聚合结果。
    agg5, err := aggResults.Count("agg5")          //获取名称为"agg5"的统计聚合结果，类型为Count。
    if err != nil {
        panic(err)
    }
    fmt.Println("(count) agg6: ", agg5.Value)      //打印存在Col_Long字段的个数。
}

示例2

func MultipleAggregationsAndGroupBysSample(client *tablestore.TableStoreClient, tableName string, indexName string) {
    searchRequest := &tablestore.SearchRequest{}

    searchRequest.
        SetTableName(tableName).    //设置数据表名称。
        SetIndexName(indexName).    //设置多元索引名称。
        SetSearchQuery(search.NewSearchQuery().
            SetQuery(&search.MatchAllQuery{}).    //匹配所有行。
            SetLimit(0).
            Aggregation(search.NewAvgAggregation("agg1", "Col_Long")).              //计算Col_Long字段的平均值。
            Aggregation(search.NewDistinctCountAggregation("agg2", "Col_Long")).    //计算Col_Long字段不同取值的个数。
            Aggregation(search.NewMaxAggregation("agg3", "Col_Long")).              //计算Col_Long字段的最大值。
            GroupBy(search.NewGroupByField("group1", "Col_Keyword").    //对Col_Keyword字段做GroupByField取值统计聚合。
                GroupBySorters([]search.GroupBySorter{}).    //指定返回结果分组的顺序。
                Size(2).                                     //仅返回前2个分组。
                SubAggregation(search.NewAvgAggregation("sub_agg1", "Col_Long")).    //对每个分组进行子统计聚合。
                SubGroupBy(search.NewGroupByField("sub_group1", "Col_Keyword2"))).   //对每个分组进行子统计聚合。
            GroupBy(search.NewGroupByRange("group2", "Col_Long").        //对Col_Long字段做GroupByRange范围。
                Range(search.NegInf, 3).     //第一个分组包含Col_Long在(NegInf, 3)的索引行。
                Range(3, 5).                 //第二个分组包含Col_Long在[3, 5)的索引行。
                Range(5, search.Inf)))       //第三个分组包含Col_Long在[5, Inf)的索引行。

    // 设置返回所有列。
    searchResponse, err := client.Search(searchRequest)
    if err != nil {
        fmt.Printf("%#v", err)
        return
    }

    aggResults := searchResponse.AggregationResults    //获取统计聚合结果。
    //获取求平均值的统计聚合结果。
    agg1, err := aggResults.Avg("agg1")            //获取名称为"agg1"的统计聚合结果，类型为Avg。
    if err != nil {
        panic(err)
    }
    if agg1.HasValue() {                           //名称为"agg1"的统计聚合结果是否有Value值。
        fmt.Println("(avg) agg1: ", agg1.Value)    //打印Col_Long字段平均值。
    } else {
        fmt.Println("(avg) agg1: no value")        //所有行都不存在Col_Long字段时的打印信息。
    }

    //获取去重统计行数的统计聚合结果。
    agg2, err := aggResults.DistinctCount("agg2")  //获取名称为"agg2"的统计聚合结果，类型为DistinctCount。
    if err != nil {
        panic(err)
    }
    fmt.Println("(distinct) agg2: ", agg2.Value)   //打印Col_Long字段不同取值的个数。

    //获取求最大值的统计聚合结果。
    agg3, err := aggResults.Max("agg3")        //获取名称为"agg3"的统计聚合结果，类型为Max。
    if err != nil {
        panic(err)
    }
    if agg3.HasValue() {
        fmt.Println("(max) agg3: ", agg3.Value)    //打印Col_Long字段最大值。
    } else {
        fmt.Println("(max) agg3: no value")        //所有行都不存在Col_Long字段时的打印信息。
    }

    groupByResults := searchResponse.GroupByResults    //获取统计聚合结果。
    //获取按字段值分组的统计聚合结果。
    group1, err := groupByResults.GroupByField("group1")    //获取名称为"group1"的统计聚合结果，类型为GroupByField。
    if err != nil {
        panic(err)
    }
    fmt.Println("group1: ")
    for _, item := range group1.Items {    //遍历返回的所有分组。
        //item
        fmt.Println("\tkey: ", item.Key, ", rowCount: ", item.RowCount)    //打印本次分组的行数。

        //获取求平均值的子统计聚合结果。
        subAgg1, err := item.SubAggregations.Avg("sub_agg1")    //获取名称为"sub_agg1"的子统计聚合结果，类型为Avg。
        if err != nil {
            panic(err)
        }
        if subAgg1.HasValue() {    //如果子统计聚合"sub_agg1"计算出了Col_Long字段的平均值，则HasValue()返回true。
            fmt.Println("\t\tsub_agg1: ", subAgg1.Value)    //打印本次分组中，子统计聚合计算的Col_Long字段的平均值。
        }

        //获取按字段值分组的子统计聚合结果。
        subGroup1, err := item.SubGroupBys.GroupByField("sub_group1")    //获取名称为"sub_group1"的子统计聚合结果，类型为GroupByField。
        if err != nil {
            panic(err)
        }
        fmt.Println("\t\tsub_group1")
        for _, subItem := range subGroup1.Items {    //遍历名称为"sub_group1"的子统计聚合结果。
            fmt.Println("\t\t\tkey: ", subItem.Key, ", rowCount: ", subItem.RowCount)    //打印"sub_group1"的子统计聚合结果分组，即分组中的行数。
            tablestore.Assert(subItem.SubAggregations.Empty(), "")
            tablestore.Assert(subItem.SubGroupBys.Empty(), "")
        }
    }

    //获取按范围分组的统计聚合结果。
    group2, err := groupByResults.GroupByRange("group2")    //获取名称为"group2"的统计聚合结果，类型为GroupByRange。
    if err != nil {
        panic(err)
    }
    fmt.Println("group2: ")
    for _, item := range group2.Items {    //遍历返回的所有分组。
        fmt.Println("\t[", item.From, ", ", item.To, "), rowCount: ", item.RowCount)    //打印本次分组的行数。
    }
}

附录：多字段分组不同实现方式对比

如果要根据多个字段对查询结果进行分组，您可以通过字段分组嵌套（即嵌套多层 groupBy）或者多字段分组（即直接使用 GroupByComposite）功能实现。具体实现方式的功能对比说明请参见下表。

功能对比	字段分组嵌套	多字段分组
size	2000	2000
字段列限制	最高支持 5 层	最高支持 32 列
翻页	不支持	支持通过 GroupByComposite 中的 nextToken 翻页
分组中 Item 排序规则	按照值的字典序升序或降序排列按照行数升序或降序排列按照子统计聚合结果中值升序或降序排列	按照值的字典序升序或降序排列
是否支持子统计聚合	支持	支持
兼容性	日期类型根据 Format 返回	日期类型返回时间戳字符串