Group statistics - aggregate clause

Use aggregate clauses to compute group statistics, such as counts, sums, maximums, and minimums, over search results without viewing individual documents.

Syntax

Aggregate clause syntax:

group_key:field, range:number1~number2, agg_fun:func1#func2, agg_filter:filter_clause, agg_sampler_threshold:number, agg_sampler_step:number, max_group:number

Parameters:

| Parameter | Type | Required | Valid values | Default value | Description |
| --- | --- | --- | --- | --- | --- |
| group_key | field (an attribute field) | Yes | INT, LITERAL, INT_ARRAY, or LITERAL_ARRAY fields. For INT_ARRAY or LITERAL_ARRAY fields, repeated items are counted individually. | | The attribute field to group by for statistics collection. |
| agg_fun | | Yes | The built-in functions count(), sum(id), max(id), min(id), and distinct_count(id). | | Separate multiple functions with number signs (#). The sum(), max(), and min() functions support arithmetic expressions across multiple fields. |
| range | | No | Values between number1 and number2, and values greater than number2. STRING fields cannot be aggregated. | | Collects statistics by value ranges for data distribution analysis. Only one range parameter is allowed per aggregate clause. |
| agg_filter | | No | | | Filters documents by the specified conditions before aggregation. |
| agg_sampler_threshold | INT | No | | | The sampling threshold. Documents ranked above this value are counted sequentially. Documents ranked below it are sampled at intervals defined by agg_sampler_step. |
| agg_sampler_step | INT | No | | | The sampling interval for documents ranked below agg_sampler_threshold. For sum() and count(), sampled statistics are multiplied by the step size and added to the sequential statistics to produce the final result. |
| max_group | INT | No | | 1000 | Maximum number of groups returned. |
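
The sampling arithmetic described for agg_sampler_threshold and agg_sampler_step can be illustrated with a small simulation. This is only a sketch of the estimate as described above, not the engine's implementation. The function name sampled_count_and_sum is hypothetical, and the code assumes that sampling takes every step-th document starting with the first document past the threshold.

```python
def sampled_count_and_sum(values, threshold, step):
    """Estimate count() and sum() over ranked values using threshold/step sampling.

    Assumption: the first `threshold` documents are counted one by one; after
    that, every `step`-th document is sampled and its contribution is
    multiplied by `step`.
    """
    head = values[:threshold]              # counted sequentially
    tail_sample = values[threshold::step]  # sampled at a fixed interval
    est_count = len(head) + len(tail_sample) * step
    est_sum = sum(head) + sum(tail_sample) * step
    return est_count, est_sum


# 20 documents, each with price 2: the exact count is 20 and the exact sum is 40.
prices = [2] * 20
print(sampled_count_and_sum(prices, threshold=10, step=5))  # (20, 40)
```

When the number of documents past the threshold is not a multiple of the step, the estimate deviates from the exact value, which is why sampled statistics should be treated as approximate.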

Usage notes

  • An aggregate clause is optional.

  • All referenced fields must be configured as attribute fields in the application schema.

  • Aggregate results are returned in the facet node. The agg_fun functions (such as sum() and count()) produce the statistics.

  • Specify multiple group_key parameters to collect statistics for different fields. Separate them with semicolons (;).

    Example:

    group_key:field1,agg_fun:func1;group_key:field2,agg_fun:func2

  • To display statistics in the response, set the config clause format to full JSON.

  • distinct_count is supported only in exclusive clusters. Add enable_accurate_statistics to the kvpairs clause and set it to true. Only facet-node statistics are returned when this feature is enabled.

  • count(), max(), min(), and sum() in exclusive clusters also require enable_accurate_statistics set to true in the kvpairs clause.

  • Accurate statistics are guaranteed for up to 100,000 matching documents. Beyond this limit, results may be approximate. In exclusive clusters, set enable_accurate_statistics to true in the kvpairs clause for improved accuracy.
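
The separators used in an aggregate clause (commas between parameters, number signs between functions, semicolons between group keys) can be assembled programmatically. The following is a minimal sketch, not part of any official SDK: the helper name build_aggregate and the dictionary layout of each group are assumptions made for illustration.

```python
def build_aggregate(groups):
    """Build the value of the aggregate clause from a list of group specs.

    Each spec is a dict with `group_key` and `agg_fun` (a list of function
    strings), plus optional keys such as `range`, `agg_filter`, or `max_group`.
    """
    optional = ("range", "agg_filter", "agg_sampler_threshold",
                "agg_sampler_step", "max_group")
    clauses = []
    for spec in groups:
        if "group_key" not in spec or not spec.get("agg_fun"):
            raise ValueError("group_key and agg_fun are required")
        parts = [
            f"group_key:{spec['group_key']}",
            "agg_fun:" + "#".join(spec["agg_fun"]),  # functions joined with '#'
        ]
        for name in optional:
            if name in spec:
                parts.append(f"{name}:{spec[name]}")
        clauses.append(",".join(parts))              # parameters joined with ','
    return ";".join(clauses)                         # group keys joined with ';'


aggregate = build_aggregate([
    {"group_key": "group_id", "agg_fun": ["sum(price)", "max(price)"]},
    {"group_key": "company_id", "agg_fun": ["count()"]},
])
print(aggregate)
# group_key:group_id,agg_fun:sum(price)#max(price);group_key:company_id,agg_fun:count()
```

The helper only concatenates strings. It does not check that the referenced fields are attribute fields or that the functions are supported.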

Examples

  1. Query documents containing "Zhejiang University". Group the results by group_id to calculate sum(price) and max(price), and by company_id to count the documents in each group.

    query=default:'Zhejiang University'&&aggregate=group_key:group_id,agg_fun:sum(price)#max(price);group_key:company_id,agg_fun:count()

    Sample return result:

    {
      status: "OK",
      result: {
        searchtime: 0.015634,
        total: 5,
        num: 1,
        viewtotal: 5,
        items: [        // The return result.
          { ... }
        ],
        facet: [
          {
            key: "group_id",
            items: [
              {
                value: 43,
                sum: 81,
                max: 20,
              },
              {
                value: 63,
                sum: 91,
                max: 50,
              },
            ],
          },
          {
            key: "company_id",
            items: [
              {
                value: 13,
                count: 4,
              },
              {
                value: 10,
                count: 1,
              },
            ],
          },
        ],
      },
      errors: [ ],
      tracer: "",
    }

  2. Query documents containing "Zhejiang University", group by group_id, and calculate sum(price) with sampling (threshold: 10,000, step: 5).

    query=default:'Zhejiang University'&&aggregate=group_key:group_id,agg_fun:sum(price),agg_sampler_threshold:10000,agg_sampler_step:5

  3. Query documents containing "Zhejiang University", group by group_id, and count documents with group_id values in the range 10–50.

    query=default:'Zhejiang University'&&aggregate=group_key:group_id,agg_fun:count(),range:10~50
  4. Query documents containing "Zhejiang University", group by group_id, and calculate max(hits + replies) for documents with create_timestamp greater than 1423456781.

    query=default:'Zhejiang University'&&aggregate=group_key:group_id,agg_fun:max(hits+replies),agg_filter:create_timestamp>1423456781
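
Once a response has been received, the statistics can be read from the facet node. The sketch below assumes that the response has already been decoded into a Python dictionary (for example with json.loads) and that it follows the structure of the sample result in example 1. The function name facet_stats is hypothetical.

```python
def facet_stats(response):
    """Map each group key to {group value: {statistic name: number}}."""
    stats = {}
    for facet in response.get("result", {}).get("facet", []):
        per_value = {}
        for item in facet.get("items", []):
            # Every entry except "value" is a statistic such as sum, max, or count.
            per_value[item["value"]] = {
                name: number for name, number in item.items() if name != "value"
            }
        stats[facet["key"]] = per_value
    return stats


# Facet data copied from the sample result of example 1 (document items omitted).
response = {
    "status": "OK",
    "result": {
        "total": 5,
        "facet": [
            {"key": "group_id", "items": [
                {"value": 43, "sum": 81, "max": 20},
                {"value": 63, "sum": 91, "max": 50},
            ]},
            {"key": "company_id", "items": [
                {"value": 13, "count": 4},
                {"value": 10, "count": 1},
            ]},
        ],
    },
}

stats = facet_stats(response)
print(stats["group_id"][43]["sum"])      # 81
print(stats["company_id"][13]["count"])  # 4
```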