aggs clause

Use the aggs clause to compute statistics over your search results without inspecting individual documents. Common use cases include:

How many products fall into each price range?
What is the total sales amount per vendor?
What is the highest-rated item in each category?

Syntax

{
  "aggs": [
    {
      "group_key": "<field>",
      "agg_fun": ["<func1>", "<func2>"],
      "agg_filter": "<filter_expression>",
      "agg_range": [<number1>, <number2>],
      "max_group": <number>,
      "order_by": "count"
    }
  ]
}

Parameters

Parameter	Required	Type	Description
`group_key`	Yes	STRING or INTEGER attribute field	The field to group results by. Must be an attribute field defined in `schema.json`.
`agg_fun`	Yes	Array of function strings	One or more aggregation functions to apply. See Aggregation functions.
`agg_filter`	No	Logical expression	Filters documents before aggregation. Uses the same syntax as the filter clause.
`agg_range`	No	`[number1, number2]`	Restricts aggregation to a numeric range. One range per `aggs` clause. Not supported for STRING fields.
`max_group`	No	Integer; default: `1000`	Maximum number of groups to return. Keep this at or below `10000` to avoid out-of-memory (OOM) errors on the Query Result Searcher (QRS) worker.
`order_by`	No	`"count"`	Sorts groups by document count. If omitted, groups are sorted in lexicographic order of the `group_key` values.
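
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch, assuming you assemble the clause yourself in Python. `build_agg` is a hypothetical helper written for illustration and is not part of any OpenSearch SDK. Rejecting `max_group` values above `10000` is a choice made in this sketch, based on the OOM guidance above.

```python
import json


def build_agg(group_key, agg_fun, agg_filter=None, agg_range=None,
              max_group=None, order_by=None):
    """Build one aggs object (hypothetical helper, not an SDK function)."""
    if not group_key:
        raise ValueError("group_key is required")
    if not agg_fun:
        raise ValueError("agg_fun needs at least one function")

    agg = {"group_key": group_key, "agg_fun": list(agg_fun)}

    if agg_filter is not None:
        agg["agg_filter"] = agg_filter
    if agg_range is not None:
        # The syntax section shows exactly two numbers.
        if len(agg_range) != 2:
            raise ValueError("agg_range must be [number1, number2]")
        agg["agg_range"] = list(agg_range)
    if max_group is not None:
        # Assumption of this sketch: fail early instead of risking an OOM error.
        if max_group > 10000:
            raise ValueError("max_group above 10000 risks OOM on the QRS worker")
        agg["max_group"] = max_group
    if order_by is not None:
        # "count" is the only value listed in the parameter table.
        if order_by != "count":
            raise ValueError('order_by only accepts "count"')
        agg["order_by"] = order_by
    return agg


clause = {
    "aggs": [
        build_agg("group_id", ["sum(price)"], max_group=500, order_by="count")
    ]
}
print(json.dumps(clause, indent=2))
```

Omitted optional parameters are left out of the object, so the engine defaults described in the table still apply.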

Aggregation functions

All functions are specified in the agg_fun array. You can combine multiple functions in a single aggs clause.

Function	Description
`count()`	Number of documents in each group
`sum(<field>)`	Sum of the field values in each group
`max(<field>)`	Maximum field value in each group
`min(<field>)`	Minimum field value in each group
`distinct_count(<field>)`	Enables the semi-exact statistics feature. Uses the HyperLogLog (HLL) algorithm to compute an approximate count of distinct field values. Accuracy exceeds 99% in most cases.
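
To make the meaning of each function concrete, the sketch below emulates them locally over a small in-memory list. It is an illustration only, not how the engine computes results. The sample documents are hypothetical, and `distinct_count` is computed exactly with a set here, whereas the engine returns an HLL-based approximation.

```python
from collections import defaultdict

# Hypothetical sample documents; field names mirror the examples on this page.
docs = [
    {"group_id": 43, "price": 30},
    {"group_id": 43, "price": 51},
    {"group_id": 63, "price": 91},
]


def aggregate(docs, group_key, field):
    """Group docs by group_key and compute every function over field."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_key]].append(doc[field])
    return {
        key: {
            "count": len(values),                 # count()
            "sum": sum(values),                   # sum(<field>)
            "max": max(values),                   # max(<field>)
            "min": min(values),                   # min(<field>)
            "distinct_count": len(set(values)),   # exact here, approximate in the engine
        }
        for key, values in groups.items()
    }


stats = aggregate(docs, "group_id", "price")
print(stats[43]["sum"], stats[63]["sum"])  # 81 91
```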

Examples

Simple aggregation

Sum the price field, grouped by group_id:

{
  "aggs": [
    {
      "group_key": "group_id",
      "agg_fun": ["sum(price)"]
    }
  ]
}

Aggregation results are returned in the facet node of the response. To access the facet node, set format to fulljson in the config clause.

{
  "result": {
    "facet": [
      {
        "key": "group_id",
        "items": [
          { "value": 43, "sum": 81 },
          { "value": 63, "sum": 91 }
        ]
      }
    ]
  }
}

Each item in items corresponds to one group: value is the group_key value, and sum is the result of the sum(price) function.
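
On the client side, the facet node can be flattened into a lookup table. The following is a minimal sketch that parses the sample response above with the standard `json` module. `facet_to_dict` is a hypothetical helper name, and the sketch assumes the response has exactly the shape shown in this example.

```python
import json

# The sample response from this section, as it would arrive over the wire.
response_text = """
{
  "result": {
    "facet": [
      {
        "key": "group_id",
        "items": [
          { "value": 43, "sum": 81 },
          { "value": 63, "sum": 91 }
        ]
      }
    ]
  }
}
"""


def facet_to_dict(response, group_key, metric):
    """Map each group value to one metric for the facet keyed by group_key."""
    for facet in response["result"]["facet"]:
        if facet["key"] == group_key:
            return {item["value"]: item[metric] for item in facet["items"]}
    return {}  # no facet entry for this group_key


response = json.loads(response_text)
sums = facet_to_dict(response, "group_id", "sum")
print(sums)  # {43: 81, 63: 91}
```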

Multiple aggregation functions

Apply sum(), max(), and min() in one clause:

{
  "aggs": [
    {
      "group_key": "company_id",
      "agg_fun": ["sum(id)", "max(id)", "min(id)"]
    }
  ]
}

Aggregation across multiple fields

Use multiple aggs objects to aggregate different fields independently in one request:

{
  "aggs": [
    {
      "group_key": "group_id",
      "agg_fun": ["sum(price)"]
    },
    {
      "group_key": "company_id",
      "agg_fun": ["count()"]
    }
  ]
}

Filtered aggregation

Aggregate only documents where price > 100:

{
  "aggs": [
    {
      "group_key": "group_id",
      "agg_fun": ["sum(price)"],
      "agg_filter": "price > 100"
    }
  ]
}
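
Because `agg_filter` is applied before aggregation, documents that fail the filter contribute nothing to their group. The sketch below emulates that order of operations locally. The sample documents and the `filtered_sum` helper are hypothetical, and the code does not reflect the engine's actual implementation.

```python
# Hypothetical sample documents.
docs = [
    {"group_id": 43, "price": 80},
    {"group_id": 43, "price": 150},
    {"group_id": 63, "price": 120},
    {"group_id": 63, "price": 300},
]


def filtered_sum(docs, group_key, field, predicate):
    """Sum field per group, counting only docs that pass predicate."""
    totals = {}
    for doc in docs:
        if predicate(doc):  # filter first, then aggregate
            key = doc[group_key]
            totals[key] = totals.get(key, 0) + doc[field]
    return totals


# Equivalent in spirit to "agg_filter": "price > 100" with "sum(price)".
totals = filtered_sum(docs, "group_id", "price", lambda d: d["price"] > 100)
print(totals)  # {43: 150, 63: 420}
```

The document with `price` 80 is dropped before summing, so group 43 totals 150 rather than 230.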

Distinct count (approximate)

Count the number of distinct brand values per company_id:

{
  "aggs": [
    {
      "group_key": "company_id",
      "agg_fun": ["distinct_count(brand)"]
    }
  ]
}

Usage notes

Fields used in group_key and aggregation functions must be attribute fields declared in schema.json.
Aggregation results are returned in the facet node of the response. Set format to fulljson in the config clause to include this node in the response.
Accurate statistics are guaranteed for up to 100,000 documents per partition. If a partition contains more than 100,000 matching documents, results may be approximate because the engine applies performance limits during distributed execution. To raise this limit, adjust the maximum document count in the cluster configuration.
Setting max_group above 10,000 increases memory consumption on the QRS worker and may cause an OOM error.