Distinct clause

The distinct clause diversifies search results by limiting how many documents that share the same value in a given field appear together. Without it, high-scoring documents from a single source, such as one user, can dominate an entire results page.

Syntax

dist_key:field,dist_count:1,dist_times:1,reserved:false
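As a minimal sketch of the syntax above, the clause value can be composed from its four basic parameters. The helper name build_distinct_clause is hypothetical and not part of any SDK; it only reproduces the comma-separated key:value layout shown in the syntax line.

```python
def build_distinct_clause(dist_key, dist_count=1, dist_times=1, reserved=True):
    """Compose the value of a distinct clause (hypothetical helper).

    Only the four parameters from the syntax line are covered here.
    """
    if not dist_key:
        # dist_key is the only required parameter.
        raise ValueError("dist_key is required")
    parts = [
        f"dist_key:{dist_key}",
        f"dist_count:{dist_count}",
        f"dist_times:{dist_times}",
        # Booleans are written in lowercase, as in the syntax line.
        f"reserved:{str(bool(reserved)).lower()}",
    ]
    return ",".join(parts)


clause = build_distinct_clause("name", dist_count=2, dist_times=1, reserved=False)
print("distinct=" + clause)
# prints: distinct=dist_key:name,dist_count:2,dist_times:1,reserved:false
```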

| Parameter | Type | Required | Valid value | Default value | Description |
| --- | --- | --- | --- | --- | --- |
| dist_key | string | Yes | | | The field to scatter. |
| dist_times | int | No | | 1 | The number of extraction rounds. |
| dist_count | int | No | | 1 | The number of documents extracted per round. |
| reserved | true/false | No | true/false | true | Whether to retain remaining documents after extraction. If set to false, remaining documents are discarded and the total match count becomes inaccurate. |
| update_total_hit | true/false | No | true/false | false | When reserved is false and update_total_hit is true, the system subtracts discarded documents from total_hit. The result may be inaccurate. When set to false, total_hit includes discarded documents. |
| dist_filter | string | No | | | A filter condition. Filtered documents skip scattering and sort with the first group of scattered results. By default, all documents are scattered. |
| grade | float | No | | | Score thresholds that classify documents into categories before scattering. Separate multiple thresholds with vertical bars. If omitted, all documents belong to one category. |

Example for grade: grade:3.0 splits documents into two categories (scores below 3.0 and scores at or above 3.0). grade:3.0|5.0 creates three categories. Categories follow the same sort order as the first category.
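The grade thresholds can be illustrated with a short sketch. It assumes that every threshold follows the convention stated for grade:3.0 (a score equal to a threshold falls into the higher category), and the category numbers 0, 1, 2 are labels chosen for this illustration only. The function names are hypothetical.

```python
from bisect import bisect_right


def parse_grade(grade):
    """Split a grade value such as '3.0|5.0' into a list of float thresholds."""
    return [float(t) for t in grade.split("|")]


def grade_category(score, thresholds):
    """Return an illustrative category index for a score.

    Assumption: a score equal to a threshold belongs to the higher category,
    matching "scores at or above 3.0" in the description of grade:3.0.
    """
    return bisect_right(thresholds, score)


thresholds = parse_grade("3.0|5.0")
for score in (2.9, 3.0, 4.2, 5.0):
    print(score, grade_category(score, thresholds))
# prints:
# 2.9 0
# 3.0 1
# 4.2 1
# 5.0 2
```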

dist_count and dist_times examples

The following examples use six documents where id is the primary key and name is the field to scatter.

doc1: id:11 name:a

doc2: id:22 name:a

doc3: id:33 name:a

doc4: id:44 name:b

doc5: id:55 name:c

doc6: id:66 name:c

Example 1: distinct=dist_key:name,dist_count:2,dist_times:1,reserved:false. One round extracts two documents per group. Result: doc1, doc2, doc4, doc5, and doc6.

Example 2: distinct=dist_key:name,dist_count:1,dist_times:2,reserved:false. Two rounds extract one document per group each. Result: doc1, doc4, doc5, doc2, and doc6.

Example 3: distinct=dist_key:name,dist_count:1,dist_times:1,reserved:false. One round extracts one document per group. Result: doc1, doc4, and doc5.
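The three examples can be reproduced with a small simulation. This is a sketch of the extraction semantics described above, not the engine's implementation. It assumes the six documents arrive already sorted in the order doc1 to doc6, and, for reserved:true, that retained documents follow the extracted ones in their original order. The function name scatter is hypothetical.

```python
from collections import defaultdict

# (document, value of the name field), assumed to be in ranked order already.
DOCS = [
    ("doc1", "a"), ("doc2", "a"), ("doc3", "a"),
    ("doc4", "b"), ("doc5", "c"), ("doc6", "c"),
]


def scatter(docs, dist_count=1, dist_times=1, reserved=True):
    """Simulate distinct extraction over documents given in ranked order."""
    used = set()
    result = []
    for _ in range(dist_times):
        taken_this_round = defaultdict(int)  # documents taken per key value
        for doc_id, key in docs:
            if doc_id in used:
                continue
            if taken_this_round[key] < dist_count:
                taken_this_round[key] += 1
                used.add(doc_id)
                result.append(doc_id)
    if reserved:
        # Assumption: retained documents are appended after the extracted ones.
        result.extend(doc_id for doc_id, _ in docs if doc_id not in used)
    return result


print(scatter(DOCS, dist_count=2, dist_times=1, reserved=False))
# prints: ['doc1', 'doc2', 'doc4', 'doc5', 'doc6']
print(scatter(DOCS, dist_count=1, dist_times=2, reserved=False))
# prints: ['doc1', 'doc4', 'doc5', 'doc2', 'doc6']
print(scatter(DOCS, dist_count=1, dist_times=1, reserved=False))
# prints: ['doc1', 'doc4', 'doc5']
```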

Usage notes

  1. The distinct clause is optional.

  2. Fields used in a distinct clause must be configured as attribute fields in the application schema.

  3. ARRAY fields are not supported. Only INT and LITERAL fields can be used.

  4. Only one field can be specified to scatter.

  5. Sorting does not remove duplicates. To deduplicate, use a distinct clause with the target field (such as title) and set dist_count:1, dist_times:1.

distinct uniq plug-in

When reserved is set to false, the total and viewtotal values become inaccurate, which can cause pagination errors. The distinct uniq plug-in fixes this when dist_times, dist_count, and reserved are set to 1, 1, and false. Add duniqfield:field to the kvpairs clause. Example: kvpairs=duniqfield:name.

Notes

  • The field value must match the dist_key value in the distinct clause.

  • The plug-in works only when dist_times, dist_count, and reserved are set to 1, 1, and false.

  • For performance reasons, the plug-in returns at most 5,000 results per query.

  • Queries that match millions of records may time out.
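The preconditions in the notes above can be checked on the client before a query is sent. The following is a minimal sketch with a hypothetical helper name, uniq_kvpairs. It only validates the three required settings and reuses the dist_key as the duniqfield value so that the two always match.

```python
def uniq_kvpairs(dist_key, dist_count, dist_times, reserved):
    """Return the kvpairs value for the distinct uniq plug-in (hypothetical helper).

    Raises ValueError unless dist_count, dist_times, and reserved
    are 1, 1, and false, which is the only combination the plug-in supports.
    """
    if (dist_count, dist_times, reserved) != (1, 1, False):
        raise ValueError(
            "the distinct uniq plug-in requires dist_count:1, dist_times:1, reserved:false"
        )
    # The duniqfield value must be the same field as dist_key.
    return f"duniqfield:{dist_key}"


print("kvpairs=" + uniq_kvpairs("name", 1, 1, False))
# prints: kvpairs=duniqfield:name
```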

Examples

  1. Search for documents containing "Zhejiang University" where create_time is greater than 1402301230. Scatter results by company_id with 10 rounds of 2 documents each. Because reserved defaults to true, the documents that remain after extraction are retained and ranked after the extracted ones.

    query=default:'Zhejiang University'&&filter=create_time>1402301230&&distinct=dist_key:company_id,dist_count:2,dist_times:10

  2. Search for documents containing "Zhejiang University". Scatter by company_id with one round of one document. Discard remaining documents and return only the extracted ones.

    query=default:'Zhejiang University'&&distinct=dist_key:company_id,dist_count:1,dist_times:1,reserved:false&&kvpairs=duniqfield:company_id
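The query strings in these examples join name=value clauses with &&. A minimal sketch of that assembly follows. The helper name build_query_string is hypothetical, and the sketch does no URL encoding, which a real HTTP request would likely need.

```python
def build_query_string(clauses):
    """Join clauses as name=value pairs separated by '&&' (hypothetical helper)."""
    return "&&".join(f"{name}={value}" for name, value in clauses.items())


# Clause values copied from example 2; dict insertion order is preserved.
query_string = build_query_string({
    "query": "default:'Zhejiang University'",
    "distinct": "dist_key:company_id,dist_count:1,dist_times:1,reserved:false",
    "kvpairs": "duniqfield:company_id",
})
print(query_string)
```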