aggs clause


Use the aggs clause to compute statistics over your search results without inspecting individual documents. Common use cases include:

  • How many products fall into each price range?

  • What is the total sales amount per vendor?

  • What is the highest-rated item in each category?

Syntax

{
  "aggs": [
    {
      "group_key": "<field>",
      "agg_fun": ["<func1>", "<func2>"],
      "agg_filter": "<filter_expression>",
      "agg_range": [<number1>, <number2>],
      "max_group": <number>,
      "order_by": "count"
    }
  ]
}

Parameters

| Parameter | Required | Type | Description |
|---|---|---|---|
| group_key | Yes | STRING or INTEGER attribute field | The field to group results by. Must be an attribute field defined in schema.json. |
| agg_fun | Yes | Array of function strings | One or more aggregation functions to apply. See Aggregation functions. |
| agg_filter | No | Logical expression | Filters documents before aggregation. Uses the same syntax as the filter clause. |
| agg_range | No | [number1, number2] | Restricts aggregation to a numeric range. Only one range is allowed per aggs clause. Not supported for STRING fields. |
| max_group | No | Integer (default: 1000) | Maximum number of groups to return. Keep this at or below 10,000 to avoid out-of-memory (OOM) errors on the Query Result Searcher (QRS) worker. |
| order_by | No | "count" | Sorts groups by document count. If omitted, groups are sorted in lexicographic order of the group_key values. |
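If you assemble aggs clauses in application code, a small builder can catch mistakes before the request is sent. The Python sketch below is hypothetical: `make_agg` is not part of any SDK, and it checks only the constraints stated in the table above plus the function names listed under Aggregation functions. Omitted optional parameters are left out of the object so the engine applies its own defaults.

```python
import json
import re
import warnings

# Hypothetical client-side check against the functions in "Aggregation functions".
_FUNC_RE = re.compile(r"^(count\(\)|(sum|max|min|distinct_count)\(\w+\))$")


def make_agg(group_key, agg_fun, agg_filter=None, agg_range=None,
             max_group=None, order_by=None):
    """Build one object for the "aggs" array (illustrative helper, not an official API)."""
    if not group_key:
        raise ValueError("group_key is required")
    if not agg_fun:
        raise ValueError("agg_fun must list at least one function")
    for fn in agg_fun:
        if not _FUNC_RE.match(fn):
            raise ValueError(f"unsupported aggregation function: {fn!r}")

    agg = {"group_key": group_key, "agg_fun": list(agg_fun)}
    if agg_filter is not None:
        agg["agg_filter"] = agg_filter  # passed through; the engine parses it
    if agg_range is not None:
        if len(agg_range) != 2 or not all(isinstance(x, (int, float)) for x in agg_range):
            raise ValueError("agg_range must be [number1, number2]")
        agg["agg_range"] = list(agg_range)
    if max_group is not None:
        if max_group > 10000:
            # Documented risk, not a hard limit, so warn instead of failing.
            warnings.warn("max_group > 10000 may cause OOM errors on the QRS worker")
        agg["max_group"] = max_group
    if order_by is not None:
        if order_by != "count":
            raise ValueError('order_by only accepts "count"')
        agg["order_by"] = order_by
    return agg


body = {"aggs": [make_agg("group_id", ["sum(price)"], agg_filter="price > 100")]}
print(json.dumps(body))
```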

Aggregation functions

All functions are specified in the agg_fun array. You can combine multiple functions in a single aggs clause.

| Function | Description |
|---|---|
| count() | Number of documents in each group. |
| sum(<field>) | Sum of the field values in each group. |
| max(<field>) | Maximum field value in each group. |
| min(<field>) | Minimum field value in each group. |
| distinct_count(<field>) | Approximate number of distinct field values in each group (semi-exact statistics), computed with the HyperLogLog (HLL) algorithm. Accuracy exceeds 99% in most cases. |
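To make the semantics of each function concrete, here is a small in-memory model of a single aggs object in Python. It is an illustration only, not the engine's implementation: the documents are hypothetical (chosen so that sum(price) reproduces the sample response in the Simple aggregation example below), distinct_count is computed exactly rather than with HLL, and result keys other than sum are this sketch's own naming, since the documented response shows only the sum key.

```python
import re
from collections import defaultdict

# "count()" takes no field; the other functions take exactly one field name.
_CALL = re.compile(r"^(?:(count)\(\)|(sum|max|min|distinct_count)\((\w+)\))$")


def aggregate_locally(docs, group_key, agg_fun):
    """Group docs by group_key and apply each function per group (illustrative model)."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_key]].append(doc)

    items = []
    for value in sorted(groups):  # sorted only to make the output deterministic
        members = groups[value]
        item = {"value": value}
        for fn in agg_fun:
            match = _CALL.match(fn)
            if match is None:
                raise ValueError(f"unsupported function: {fn!r}")
            name = match.group(1) or match.group(2)
            field = match.group(3)
            if name == "count":
                item["count"] = len(members)
                continue
            values = [d[field] for d in members]
            if name == "sum":
                item["sum"] = sum(values)
            elif name == "max":
                item["max"] = max(values)
            elif name == "min":
                item["min"] = min(values)
            else:  # distinct_count: exact here; the engine approximates it with HLL
                item["distinct_count"] = len(set(values))
        items.append(item)
    return items


# Hypothetical documents.
docs = [
    {"group_id": 43, "price": 30},
    {"group_id": 43, "price": 51},
    {"group_id": 63, "price": 91},
]
print(aggregate_locally(docs, "group_id", ["sum(price)", "count()"]))
```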

Examples

Simple aggregation

Sum the price field, grouped by group_id:

{
  "aggs": [
    {
      "group_key": "group_id",
      "agg_fun": ["sum(price)"]
    }
  ]
}

Aggregation results are returned in the facet node of the response. To access the facet node, set format to fulljson in the config clause.

{
  "result": {
    "facet": [
      {
        "key": "group_id",
        "items": [
          { "value": 43, "sum": 81 },
          { "value": 63, "sum": 91 }
        ]
      }
    ]
  }
}

Each item in items corresponds to one group: value is the group_key value, and sum is the result of the sum(price) function.
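On the client side, the facet node is easiest to work with when indexed by key and then by group value. The Python sketch below assumes the fulljson response body has already been fetched as a string (the HTTP call is omitted) and relies only on the field names visible in the sample response above (result, facet, key, items, value). If a response carries several facet entries, each is kept under its own key.

```python
import json


def facets_by_key(response_text):
    """Index the facet node of a fulljson response as {key: {value: item}}."""
    response = json.loads(response_text)
    indexed = {}
    for facet in response["result"]["facet"]:
        indexed[facet["key"]] = {item["value"]: item for item in facet["items"]}
    return indexed


# The sample response shown above.
sample = """
{"result": {"facet": [{"key": "group_id",
  "items": [{"value": 43, "sum": 81}, {"value": 63, "sum": 91}]}]}}
"""
facets = facets_by_key(sample)
print(facets["group_id"][43]["sum"])
```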

Multiple aggregation functions

Apply sum(), max(), and min() in one clause:

{
  "aggs": [
    {
      "group_key": "company_id",
      "agg_fun": ["sum(id)", "max(id)", "min(id)"]
    }
  ]
}

Aggregation across multiple fields

Use multiple aggs objects to aggregate different fields independently in one request:

{
  "aggs": [
    {
      "group_key": "group_id",
      "agg_fun": ["sum(price)"]
    },
    {
      "group_key": "company_id",
      "agg_fun": ["count()"]
    }
  ]
}

Filtered aggregation

Aggregate only documents where price > 100:

{
  "aggs": [
    {
      "group_key": "group_id",
      "agg_fun": ["sum(price)"],
      "agg_filter": "price > 100"
    }
  ]
}
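Because agg_filter is applied before aggregation, documents that fail the filter never contribute to any group. The Python model below illustrates that ordering with hypothetical documents; the lambda is only a stand-in for the filter expression "price > 100", which the engine parses itself. In this model, group 63 has no surviving documents and therefore produces no group at all.

```python
from collections import defaultdict


def filtered_sum(docs, group_key, field, predicate):
    """Filter first, then sum `field` per group (models agg_filter ordering)."""
    totals = defaultdict(int)
    for doc in docs:
        if predicate(doc):  # documents failing the filter never reach a group
            totals[doc[group_key]] += doc[field]
    return dict(sorted(totals.items()))


# Hypothetical documents.
docs = [
    {"group_id": 43, "price": 30},
    {"group_id": 43, "price": 150},
    {"group_id": 63, "price": 91},
]
# Python stand-in for the filter expression "price > 100".
print(filtered_sum(docs, "group_id", "price", lambda d: d["price"] > 100))
```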

Distinct count (approximate)

Count the number of distinct brand values per company_id:

{
  "aggs": [
    {
      "group_key": "company_id",
      "agg_fun": ["distinct_count(brand)"]
    }
  ]
}
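The distinct_count() result is approximate because HyperLogLog keeps a fixed number of small registers instead of every distinct value. The following is a minimal textbook HLL in Python (with the linear-counting correction for small cardinalities), shown only to illustrate the technique; it is not the engine's implementation, and the precision p=12 (4,096 registers, standard error of roughly 1.04/sqrt(4096), about 1.6%), the SHA-1-based hash, and the class name are assumptions of this sketch. One sketch per group mirrors grouping distinct_count(brand) by company_id, using hypothetical data.

```python
import hashlib
import math
from collections import defaultdict


class HyperLogLog:
    """Textbook HyperLogLog cardinality estimator (illustrative, not the engine's)."""

    def __init__(self, p=12):
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16 for this alpha constant")
        self.p = p
        self.m = 1 << p  # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128

    def add(self, value):
        # 64-bit hash taken from the first 8 bytes of a SHA-1 digest.
        h = int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")
        width = 64 - self.p
        idx = h >> width                      # top p bits select a register
        rest = h & ((1 << width) - 1)         # remaining bits
        rank = width - rest.bit_length() + 1  # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


# Hypothetical data: each company_id sees 2,500 distinct brand values.
sketches = defaultdict(HyperLogLog)
for i in range(20000):
    company_id, brand = i % 2, f"brand-{i % 5000}"
    sketches[company_id].add(brand)

for company_id in sorted(sketches):
    print(company_id, round(sketches[company_id].estimate()))
```

Each group costs a fixed 4,096 small registers regardless of how many distinct values it sees, which is why HLL-based counting stays cheap on large result sets at the price of a small error.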

Usage notes

  • Fields used in group_key and aggregation functions must be attribute fields declared in schema.json.

  • Aggregation results are computed on the Searcher workers and returned in the facet node of the response. Set format to fulljson in the config clause to include this node in the response.

  • Accurate statistics are guaranteed for up to 100,000 documents per partition. If a partition contains more than 100,000 matching documents, results may be approximate because the engine applies performance limits during distributed execution. To raise this limit, adjust the maximum document count in the cluster configuration.

  • Setting max_group above 10,000 increases memory consumption on the QRS worker and may cause an OOM error.