Aggregate clause for group statistics

Use aggregate clauses to compute group statistics, such as counts, sums, maximums, and minimums, over search results without viewing individual documents.

Syntax

Aggregate clause syntax:

group_key:field, range:number1~number2, agg_fun:func1#func2, agg_filter:filter_clause, agg_sampler_threshold:number, agg_sampler_step:number, max_group:number

Parameters:

Parameter	Type	Required	Valid value	Default value	Description
group_key:field	field: an attribute field	Yes	INT, LITERAL, INT_ARRAY, or LITERAL_ARRAY fields. For INT_ARRAY or LITERAL_ARRAY fields, repeated items are counted individually.		The attribute field to group by for statistics collection.
agg_fun		Yes	The built-in functions count(), sum(id), max(id), min(id), and distinct_count(id).		The statistics functions to apply to each group. Separate multiple functions with number signs (#). The sum(), max(), and min() functions accept arithmetic expressions across multiple fields, such as max(hits+replies).
range		No	Values between number1 and number2, and values greater than number2. STRING fields cannot be aggregated.		Collects statistics by value ranges for data distribution analysis. Only one range parameter is allowed per aggregate clause.
agg_filter		No			Filters documents by the specified conditions before aggregation.
agg_sampler_threshold	INT	No			The sampling threshold. Matching documents that come before this position are counted one by one. Documents that come after it are sampled at the interval defined by agg_sampler_step.
agg_sampler_step	INT	No			The sampling interval for documents that come after agg_sampler_threshold. For sum() and count(), the statistics from the sampled documents are multiplied by the step size and added to the statistics from the sequentially counted documents to produce the final result.
max_group	INT	No		1000	Maximum number of groups returned.
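
As a rough illustration of how the parameters in the table combine into a single clause, the following Python sketch assembles a clause string. The helper name build_aggregate_clause and its argument names are hypothetical and not part of any OpenSearch SDK. The helper only joins key:value pairs with commas and joins multiple functions with number signs (#).

```python
from typing import Optional, Sequence, Tuple


def build_aggregate_clause(
    group_key: str,
    agg_fun: Sequence[str],
    value_range: Optional[Tuple[int, int]] = None,
    agg_filter: Optional[str] = None,
    sampler_threshold: Optional[int] = None,
    sampler_step: Optional[int] = None,
    max_group: Optional[int] = None,
) -> str:
    """Hypothetical helper: join aggregate parameters into one clause string."""
    if not agg_fun:
        raise ValueError("agg_fun is required")
    # group_key and agg_fun are the two required parameters.
    parts = [f"group_key:{group_key}", "agg_fun:" + "#".join(agg_fun)]
    if value_range is not None:
        low, high = value_range
        parts.append(f"range:{low}~{high}")
    if agg_filter is not None:
        parts.append(f"agg_filter:{agg_filter}")
    if sampler_threshold is not None:
        parts.append(f"agg_sampler_threshold:{sampler_threshold}")
    if sampler_step is not None:
        parts.append(f"agg_sampler_step:{sampler_step}")
    if max_group is not None:
        parts.append(f"max_group:{max_group}")
    return ",".join(parts)


print(build_aggregate_clause("group_id", ["sum(price)", "max(price)"], max_group=100))
# group_key:group_id,agg_fun:sum(price)#max(price),max_group:100
```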

Usage notes

An aggregate clause is optional.
All referenced fields must be configured as attribute fields in the application schema.
Aggregate results are returned in the facet node. The agg_fun functions (such as sum() and count()) produce the statistics.
Specify multiple group_key parameters to collect statistics for different fields. Separate them with semicolons (;).

Example:

group_key:field1,agg_fun:func1;group_key:field2,agg_fun:func2

To display statistics in the response, set the format parameter of the config clause to full JSON (fulljson).
distinct_count is supported only in exclusive clusters. Add enable_accurate_statistics to the kvpairs clause and set it to true. Only facet-node statistics are returned when this feature is enabled.
count(), max(), min(), and sum() in exclusive clusters also require enable_accurate_statistics set to true in the kvpairs clause.
Accurate statistics are guaranteed for up to 100,000 matching documents. Beyond this limit, results may be approximate. In exclusive clusters, set enable_accurate_statistics to true in the kvpairs clause for improved accuracy.
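
The notes above can be combined into one search string. The sketch below is a minimal illustration, not an official client: the function name build_search_string is hypothetical, and the literal forms config=format:fulljson and kvpairs=enable_accurate_statistics:true are assumptions about how those clauses are spelled. Only the && separator between clauses and the semicolon between aggregate groups are taken from this page.

```python
from typing import Sequence


def build_search_string(
    query: str,
    aggregate_clauses: Sequence[str],
    accurate: bool = False,
) -> str:
    """Hypothetical helper: join query, config, aggregate, and kvpairs clauses."""
    sections = [
        f"query={query}",
        # Assumed spelling of the full JSON format setting.
        "config=format:fulljson",
        # Multiple group_key clauses are separated with semicolons.
        "aggregate=" + ";".join(aggregate_clauses),
    ]
    if accurate:
        # Assumed spelling of the kvpairs setting described in the notes.
        sections.append("kvpairs=enable_accurate_statistics:true")
    return "&&".join(sections)


clauses = [
    "group_key:group_id,agg_fun:sum(price)#max(price)",
    "group_key:company_id,agg_fun:count()",
]
print(build_search_string("default:'Zhejiang University'", clauses, accurate=True))
```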

Examples

Query documents containing "Zhejiang University". Group by group_id to calculate the sum and maximum of price, and group by company_id to count documents.

query=default:'Zhejiang University'&&aggregate=group_key:group_id,agg_fun:sum(price)#max(price);group_key:company_id,agg_fun:count()

Sample return result:

{
  status: "OK",
  result: {
    searchtime: 0.015634,
    total: 5,
    num: 1,
    viewtotal: 5,
    items: [  // The return result.
      { ... }
    ],
    facet: [
      {
        key: "group_id",
        items: [
          {
            value: 43,
            sum: 81,
            max: 20
          },
          {
            value: 63,
            sum: 91,
            max: 50
          }
        ]
      },
      {
        key: "company_id",
        items: [
          {
            value: 13,
            count: 4
          },
          {
            value: 10,
            count: 1
          }
        ]
      }
    ]
  },
  errors: [ ],
  tracer: ""
}
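
To show how the facet node might be consumed on the client side, the following Python sketch flattens it into a nested dictionary. It assumes the response has already been parsed into a Python dict with quoted keys. The function name facet_stats is hypothetical, and the dict literal repeats only the facet values from the sample above.

```python
response = {
    "status": "OK",
    "result": {
        "total": 5,
        "facet": [
            {
                "key": "group_id",
                "items": [
                    {"value": 43, "sum": 81, "max": 20},
                    {"value": 63, "sum": 91, "max": 50},
                ],
            },
            {
                "key": "company_id",
                "items": [
                    {"value": 13, "count": 4},
                    {"value": 10, "count": 1},
                ],
            },
        ],
    },
}


def facet_stats(response: dict) -> dict:
    """Map each group_key to {group value: {function name: statistic}}."""
    stats = {}
    for facet in response.get("result", {}).get("facet", []):
        stats[facet["key"]] = {
            # Every key except "value" is a statistic produced by agg_fun.
            item["value"]: {k: v for k, v in item.items() if k != "value"}
            for item in facet["items"]
        }
    return stats


stats = facet_stats(response)
print(stats["group_id"][43]["sum"])      # 81
print(stats["company_id"][13]["count"])  # 4
```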

Query documents containing "Zhejiang University", group by group_id, and calculate sum(price) with sampling (threshold: 10,000, step: 5).
```
query=default:'Zhejiang University'&&aggregate=group_key:group_id,agg_fun:sum(price),agg_sampler_threshold:10000,agg_sampler_step:5
```
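
The arithmetic behind sampled statistics can be sketched locally. The Python example below is a simplified model rather than the engine's implementation: it assumes that the first threshold documents are summed exactly and that, after the threshold, the first document of each step-sized interval is the one sampled. The function name sampled_sum and the input values are made up for illustration.

```python
from typing import Sequence


def sampled_sum(values: Sequence[float], threshold: int, step: int) -> float:
    """Estimate a sum: exact part before the threshold, scaled sample after it."""
    # Documents before the threshold are counted one by one.
    exact = sum(values[:threshold])
    # After the threshold, take one document per interval of `step`
    # (assumption: the first document of each interval).
    sampled = sum(values[threshold::step])
    # Sampled statistics are multiplied by the step size and added to the exact part.
    return exact + sampled * step


prices = list(range(1, 21))  # 20 made-up price values: 1, 2, ..., 20
print(sum(prices))                                   # 210 (exact sum)
print(sampled_sum(prices, threshold=10, step=5))     # 190 (sampled estimate)
```

In this model the estimate differs from the exact sum because only two of the ten documents after the threshold are read, which is the trade-off that sampling makes for speed.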
Query documents containing "Zhejiang University", group by group_id, and count documents with group_id values in the range 10–50.
```
query=default:'Zhejiang University'&&aggregate=group_key:group_id,agg_fun:count(),range:10~50
```
Query documents containing "Zhejiang University", group by group_id, and calculate max(hits + replies) for documents with create_timestamp greater than 1423456781.
```
query=default:'Zhejiang University'&&aggregate=group_key:group_id,agg_fun:max(hits+replies),agg_filter:create_timestamp>1423456781
```
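
To make the order of operations in the last example concrete, the Python sketch below emulates it over a few in-memory documents: the filter is applied first, then max(hits+replies) is computed per group_id. The documents and the function name max_hits_plus_replies are made up for illustration. This is a local model of the semantics, not how the engine evaluates the clause.

```python
from typing import Dict, Iterable


def max_hits_plus_replies(docs: Iterable[dict], min_timestamp: int) -> Dict[int, int]:
    """Filter by create_timestamp, then take max(hits + replies) per group_id."""
    result: Dict[int, int] = {}
    for doc in docs:
        # agg_filter: keep only documents with create_timestamp > min_timestamp.
        if doc["create_timestamp"] <= min_timestamp:
            continue
        score = doc["hits"] + doc["replies"]  # arithmetic expression inside max()
        group = doc["group_id"]
        result[group] = max(result.get(group, score), score)
    return result


# Made-up sample documents.
docs = [
    {"group_id": 1, "hits": 10, "replies": 2, "create_timestamp": 1423456000},
    {"group_id": 1, "hits": 7, "replies": 9, "create_timestamp": 1423457000},
    {"group_id": 2, "hits": 3, "replies": 1, "create_timestamp": 1423458000},
    {"group_id": 2, "hits": 30, "replies": 5, "create_timestamp": 1423450000},
]
print(max_hits_plus_replies(docs, 1423456781))  # {1: 16, 2: 4}
```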