Distinct clause (OpenSearch)

The distinct clause diversifies search results by limiting how many documents that share the same value of a field, such as the same user ID, appear together. Without it, high-scoring documents from one user can dominate an entire results page.

Syntax

```
dist_key:field,dist_count:1,dist_times:1,reserved:false
```
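The clause is a comma-separated list of name:value pairs. The following Python sketch splits a clause into its parameters and fills in the defaults listed in the parameter table. The names parse_distinct_clause and DEFAULTS are hypothetical, and the sketch assumes that no parameter value contains a comma, which may not hold for every dist_filter expression.

```python
# Defaults taken from the parameter table; dist_key has no default.
DEFAULTS = {
    "dist_count": "1",
    "dist_times": "1",
    "reserved": "true",
    "update_total_hit": "false",
}


def parse_distinct_clause(clause: str) -> dict:
    """Split 'name:value,name:value,...' into a dict of raw string values."""
    params = dict(DEFAULTS)
    for pair in clause.split(","):
        # partition splits at the first colon only, so the value keeps any later colons.
        name, sep, value = pair.partition(":")
        if not sep:
            raise ValueError(f"expected name:value, got {pair!r}")
        params[name.strip()] = value.strip()
    if "dist_key" not in params:
        raise ValueError("dist_key is required")
    return params


print(parse_distinct_clause("dist_key:name,dist_count:2,reserved:false"))
```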

| Parameter | Type | Required | Valid values | Default value | Description |
| --- | --- | --- | --- | --- | --- |
| dist_key | string | Yes | | | The field by which documents are scattered. |
| dist_times | int | No | | 1 | The number of extraction rounds. |
| dist_count | int | No | | 1 | The number of documents extracted from each group in each round. |
| reserved | true/false | No | true/false | true | Whether to retain the documents left over after extraction. If set to false, the leftover documents are discarded and the total match count becomes inaccurate. |
| update_total_hit | true/false | No | true/false | false | Applies when reserved is false. If true, the number of discarded documents is subtracted from total_hit, although the result may still be inaccurate. If false, total_hit still includes the discarded documents. |
| dist_filter | string | No | | | A filter condition. Documents that are filtered out by this condition are not scattered; they are sorted together with the documents extracted in the first round. By default, all documents are scattered. |
| grade | float | No | | | Score thresholds that divide documents into categories before scattering. Separate multiple thresholds with vertical bars (\|). If omitted, all documents belong to one category. For example, grade:3.0 creates two categories (scores below 3.0 and scores at or above 3.0), and grade:3.0\|5.0 creates three. Categories are ordered in the same direction as the first sort dimension. |
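The grade thresholds behave like bucket boundaries. The Python sketch below assigns a score to a category index so that a score equal to a threshold falls into the higher category, matching "scores at or above 3.0". The function names are hypothetical, and the sketch only models the categorization step, not how the engine scatters documents within each category.

```python
from bisect import bisect_right


def parse_grade(grade: str) -> list:
    """Turn a grade value such as '3.0|5.0' into sorted float thresholds."""
    return sorted(float(t) for t in grade.split("|"))


def grade_category(score: float, thresholds: list) -> int:
    """Return 0 for scores below the first threshold, 1 for the next band, and so on.

    bisect_right places a score equal to a threshold in the higher band.
    """
    return bisect_right(thresholds, score)


thresholds = parse_grade("3.0|5.0")
print([grade_category(s, thresholds) for s in [1.5, 3.0, 4.9, 5.0, 7.2]])  # [0, 1, 1, 2, 2]
```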

dist_count and dist_times examples

The following examples use six documents, listed in their original ranked order, where id is the primary key and name is the scatter field.

doc1: id:11 name:a

doc2: id:22 name:a

doc3: id:33 name:a

doc4: id:44 name:b

doc5: id:55 name:c

doc6: id:66 name:c

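Example 1: distinct=dist_key:name,dist_count:2,dist_times:1,reserved:false. A single round extracts up to two documents from each name group. Result: doc1, doc2, doc4, doc5, and doc6. doc3 is discarded because reserved is false.

Example 2: distinct=dist_key:name,dist_count:1,dist_times:2,reserved:false. Each of two rounds extracts one document per group: the first round yields doc1, doc4, and doc5, and the second yields doc2 and doc6 (group b is already empty). Result: doc1, doc4, doc5, doc2, and doc6. doc3 is discarded.

Example 3: distinct=dist_key:name,dist_count:1,dist_times:1,reserved:false. A single round extracts one document per group. Result: doc1, doc4, and doc5. doc2, doc3, and doc6 are discarded.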
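To make the round-by-round behavior concrete, the following Python sketch simulates dist_count, dist_times, and reserved on the six documents above and reproduces the three results. It is a client-side model, not the engine's implementation. Two details are assumptions: documents extracted in the same round keep their original rank order, and with reserved:true the leftover documents are appended after all extracted ones in their original order.

```python
def scatter(docs, key, dist_count=1, dist_times=1, reserved=True):
    """Simulate dist_count/dist_times/reserved on docs given in ranked order."""
    # Group document positions by their scatter-field value, preserving rank order.
    groups = {}
    for pos, doc in enumerate(docs):
        groups.setdefault(doc[key], []).append(pos)

    taken = set()
    result = []
    for _ in range(dist_times):
        round_positions = []
        for positions in groups.values():
            # Take up to dist_count not-yet-extracted documents from each group.
            round_positions.extend(positions[:dist_count])
            del positions[:dist_count]
        # Assumption: documents extracted in the same round keep their original rank order.
        round_positions.sort()
        taken.update(round_positions)
        result.extend(docs[p] for p in round_positions)

    if reserved:
        # Assumption: leftovers follow all extracted documents, in original order.
        result.extend(doc for pos, doc in enumerate(docs) if pos not in taken)
    return result


docs = [
    {"doc": "doc1", "id": 11, "name": "a"},
    {"doc": "doc2", "id": 22, "name": "a"},
    {"doc": "doc3", "id": 33, "name": "a"},
    {"doc": "doc4", "id": 44, "name": "b"},
    {"doc": "doc5", "id": 55, "name": "c"},
    {"doc": "doc6", "id": 66, "name": "c"},
]


def labels(result):
    return [d["doc"] for d in result]


print(labels(scatter(docs, "name", dist_count=2, dist_times=1, reserved=False)))
print(labels(scatter(docs, "name", dist_count=1, dist_times=2, reserved=False)))
print(labels(scatter(docs, "name", dist_count=1, dist_times=1, reserved=False)))
```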

Usage notes

The distinct clause is optional.
Fields used in a distinct clause must be configured as attribute fields in the application schema.
ARRAY fields are not supported. Only INT and LITERAL fields can be used.
You can specify only one scatter field.
Sorting does not remove duplicates. To deduplicate on a field such as title, use a distinct clause on that field with dist_count:1, dist_times:1, and reserved:false.
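With dist_count:1, dist_times:1, and reserved:false, each value of the scatter field contributes exactly one document, its highest-ranked one, as Example 3 shows. The following Python sketch expresses that expected effect as a client-side function, for example to check results in a test. The function name and the sample documents are hypothetical.

```python
def dedupe_by_field(docs, field):
    """Keep only the highest-ranked document for each value of `field`."""
    seen = set()
    kept = []
    for doc in docs:  # docs are assumed to be in ranked order
        if doc[field] not in seen:
            seen.add(doc[field])
            kept.append(doc)
    return kept


# Hypothetical ranked results with duplicate titles.
ranked = [
    {"id": 1, "title": "Intro to search"},
    {"id": 2, "title": "Ranking basics"},
    {"id": 3, "title": "Intro to search"},
    {"id": 4, "title": "Ranking basics"},
    {"id": 5, "title": "Query syntax"},
]
print([d["id"] for d in dedupe_by_field(ranked, "title")])  # [1, 2, 5]
```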

distinct uniq plug-in

When reserved is set to false, the total and viewtotal values become inaccurate, which can cause pagination errors. The distinct uniq plug-in corrects these values, but only when dist_times, dist_count, and reserved are set to 1, 1, and false. To enable it, add duniqfield:field to the kvpairs clause, where field is the scatter field. Example: kvpairs=duniqfield:name.
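The pagination problem is simple arithmetic. Take Example 3: six documents match, but only three survive the distinct clause. The Python sketch below assumes that total still counts all six matches (as it does when update_total_hit is false) and shows that a client sizing its page list from that total advertises a page that has no results. The function name and page size are hypothetical.

```python
import math


def pages_needed(result_count: int, hits_per_page: int) -> int:
    """Number of pages a client would render for `result_count` results."""
    return math.ceil(result_count / hits_per_page)


reported_total = 6  # assumption: total still counts the discarded documents
returned = 3        # doc1, doc4, doc5 survive dist_count:1,dist_times:1,reserved:false
hits_per_page = 2

print(pages_needed(reported_total, hits_per_page))  # 3 pages advertised
print(pages_needed(returned, hits_per_page))        # 2 pages actually contain results
```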

Notes

The duniqfield value must be the same field as dist_key in the distinct clause.
The plug-in works only when dist_times, dist_count, and reserved are set to 1, 1, and false.
For performance reasons, the plug-in returns at most 5,000 results per query.
Queries that match millions of records may time out.
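Because the plug-in returns at most 5,000 results per query, a client that pages through its results should not offer pages beyond that cap. The Python sketch below computes the number of pages that can actually be filled, assuming the cap applies to the whole result list the client pages through. The constant and function names are hypothetical.

```python
import math

MAX_UNIQ_RESULTS = 5000  # per-query cap stated for the distinct uniq plug-in


def last_reachable_page(total: int, hits_per_page: int) -> int:
    """Number of pages a client can fill when at most 5,000 results are returned."""
    return math.ceil(min(total, MAX_UNIQ_RESULTS) / hits_per_page)


print(last_reachable_page(12000, 20))  # 250
print(last_reachable_page(130, 20))    # 7
```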

Examples

Search for documents that contain "Zhejiang University" and whose create_time is greater than 1402301230. Scatter the results by company_id in 10 rounds of 2 documents each. Because reserved defaults to true, documents that are not extracted are kept and ranked after the extracted ones.
```
query=default:'Zhejiang University'&&filter=create_time>1402301230&&distinct=dist_key:company_id,dist_count:2,dist_times:10
```
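Search for documents that contain "Zhejiang University". Scatter by company_id in one round of one document, discard the remaining documents, and return only the extracted ones. The kvpairs clause enables the distinct uniq plug-in so that total and viewtotal stay accurate.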
```
query=default:'Zhejiang University'&&distinct=dist_key:company_id,dist_count:1,dist_times:1,reserved:false&&kvpairs=duniqfield:company_id
```
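When queries are generated in code, it can help to format the distinct clause from typed parameters rather than by hand. The Python sketch below uses two hypothetical helpers that rebuild both example query strings exactly, joining clauses with && as the examples do. It only builds the string; how the request is sent (SDK, signing, URL encoding) is outside its scope.

```python
from typing import Optional


def distinct_clause(dist_key: str, dist_count: Optional[int] = None,
                    dist_times: Optional[int] = None,
                    reserved: Optional[bool] = None) -> str:
    """Format a distinct clause; parameters left as None fall back to the defaults."""
    parts = [f"dist_key:{dist_key}"]
    if dist_count is not None:
        parts.append(f"dist_count:{dist_count}")
    if dist_times is not None:
        parts.append(f"dist_times:{dist_times}")
    if reserved is not None:
        parts.append(f"reserved:{'true' if reserved else 'false'}")
    return "distinct=" + ",".join(parts)


def join_clauses(*clauses: str) -> str:
    """Join clauses with '&&', the separator used in the examples."""
    return "&&".join(clauses)


first = join_clauses(
    "query=default:'Zhejiang University'",
    "filter=create_time>1402301230",
    distinct_clause("company_id", dist_count=2, dist_times=10),
)
second = join_clauses(
    "query=default:'Zhejiang University'",
    distinct_clause("company_id", dist_count=1, dist_times=1, reserved=False),
    "kvpairs=duniqfield:company_id",
)
print(first)
print(second)
```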