The distinct clause diversifies search results by limiting how many documents from the same source appear together. Without it, high-scoring documents from one user can dominate an entire results page.
Syntax
distinct=dist_key:field,dist_count:1,dist_times:1,reserved:false
| Parameter | Type | Required | Valid value | Default value | Description |
| --- | --- | --- | --- | --- | --- |
| dist_key | string | Yes | | | The field to scatter by. |
| dist_times | int | No | | 1 | The number of extraction rounds. |
| dist_count | int | No | | 1 | The number of documents extracted from each group in each round. |
| reserved | true/false | No | true/false | true | Whether to retain the documents that remain after extraction. If set to false, remaining documents are discarded and the total match count becomes inaccurate. |
| update_total_hit | true/false | No | true/false | false | Applies when reserved is false. If set to true, the system subtracts discarded documents from total_hit, although the result may still be inaccurate. If set to false, total_hit includes discarded documents. |
| dist_filter | string | No | | | A filter condition. Documents filtered by this condition are not scattered and are ranked together with the first group of scattered results. If omitted, all documents are scattered. |
| grade | float | No | | | Score thresholds that classify documents into categories before scattering. Separate multiple thresholds with vertical bars (\|). If omitted, all documents belong to one category. Example: grade:3.0 creates two categories (scores below 3.0, and scores at or above 3.0); grade:3.0\|5.0 creates three. Each category uses the same sort order as the first category. |
dist_count and dist_times examples
The following examples use six documents, ranked by score in the order listed, where id is the primary key and name is the field to scatter.
doc1: id:11 name:a
doc2: id:22 name:a
doc3: id:33 name:a
doc4: id:44 name:b
doc5: id:55 name:c
doc6: id:66 name:c
Example 1: distinct=dist_key:name,dist_count:2,dist_times:1,reserved:false. One round extracts up to two documents per group. Result: doc1, doc2, doc4, doc5, and doc6. doc3 is discarded because reserved is false.
Example 2: distinct=dist_key:name,dist_count:1,dist_times:2,reserved:false. Two rounds each extract one document per group. Result: doc1, doc4, doc5, doc2, and doc6. doc3 is discarded.
Example 3: distinct=dist_key:name,dist_count:1,dist_times:1,reserved:false. One round extracts one document per group. Result: doc1, doc4, and doc5. doc2, doc3, and doc6 are discarded.
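To make the round-by-round behavior concrete, here is a small Python simulation that reproduces the three examples above. It is a sketch under stated assumptions, not the engine's implementation: it assumes the input list is already sorted by score, does not model dist_filter or grade, and works on a single list. The function name `simulate_distinct` is hypothetical. It also returns the number of discarded documents, which is the amount update_total_hit:true would subtract from total_hit.

```python
from collections import defaultdict


def simulate_distinct(docs, key, dist_count=1, dist_times=1, reserved=True):
    """Scatter docs by the value of `key` (hypothetical sketch).

    Assumes docs are already sorted by score; dist_filter and grade are
    not modeled. Returns (result, discarded), where discarded is the
    number of documents dropped because reserved is False.
    """
    remaining = list(docs)
    result = []
    for _ in range(dist_times):
        taken = defaultdict(int)  # documents taken per group in this round
        leftover = []
        for doc in remaining:
            group = doc[key]
            if taken[group] < dist_count:
                taken[group] += 1
                result.append(doc)
            else:
                leftover.append(doc)
        remaining = leftover
    if reserved:
        # Documents never extracted are kept, in score order, after the rest.
        return result + remaining, 0
    return result, len(remaining)


docs = [
    {"id": 11, "name": "a"}, {"id": 22, "name": "a"}, {"id": 33, "name": "a"},
    {"id": 44, "name": "b"}, {"id": 55, "name": "c"}, {"id": 66, "name": "c"},
]

for count, times in ((2, 1), (1, 2), (1, 1)):
    result, discarded = simulate_distinct(docs, "name", count, times, reserved=False)
    ids = [doc["id"] for doc in result]
    # With update_total_hit:true, total_hit would drop from 6 to 6 - discarded.
    print(ids, "total_hit:", len(docs) - discarded)
# [11, 22, 44, 55, 66] total_hit: 5
# [11, 44, 55, 22, 66] total_hit: 5
# [11, 44, 55] total_hit: 3
```

Passing reserved=True instead keeps the non-extracted documents at the back: for Example 3 the order becomes ids 11, 44, 55, 22, 33, 66.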
Usage notes
-
The distinct clause is optional.
-
Fields used in a distinct clause must be configured as attribute fields in the application schema.
-
ARRAY fields are not supported. Only INT and LITERAL fields can be used.
-
Only one field can be specified to scatter.
-
Sorting does not remove duplicates. To deduplicate, use a distinct clause with the target field (such as title) and set dist_count:1, dist_times:1, reserved:false so that duplicates are discarded rather than ranked at the back.
distinct uniq plug-in
When reserved is set to false, the total and viewtotal values become inaccurate, which can cause pagination errors. The distinct uniq plug-in fixes this when dist_times, dist_count, and reserved are set to 1, 1, and false. To enable it, add duniqfield:field to the kvpairs clause. Example: kvpairs=duniqfield:name.
Notes
-
The duniqfield value must be the same field as dist_key in the distinct clause.
-
The plug-in works only when dist_times, dist_count, and reserved are set to 1, 1, and false.
-
For performance reasons, the plug-in returns at most 5,000 results per query.
-
Queries that match millions of records may time out.
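The plug-in conditions above can be checked before a query is sent. The sketch below uses a hypothetical helper, `build_query`, which is not part of any OpenSearch SDK. It assembles the `&&`-joined clause string used in the Examples section and refuses to add duniqfield unless dist_count, dist_times, and reserved are 1, 1, and false. It does not URL-encode the string or enforce the 5,000-result limit.

```python
from typing import Optional


def build_query(query: str, dist_key: str, filter_expr: Optional[str] = None,
                dist_count: int = 1, dist_times: int = 1,
                reserved: bool = True, uniq_plugin: bool = False) -> str:
    """Hypothetical helper: join query, filter, distinct, and kvpairs clauses."""
    if uniq_plugin and (dist_count, dist_times, reserved) != (1, 1, False):
        raise ValueError(
            "distinct uniq plug-in requires dist_count:1, dist_times:1, reserved:false")
    distinct = (f"dist_key:{dist_key},dist_count:{dist_count},"
                f"dist_times:{dist_times},reserved:{str(reserved).lower()}")
    clauses = [f"query={query}"]
    if filter_expr:
        clauses.append(f"filter={filter_expr}")
    clauses.append(f"distinct={distinct}")
    if uniq_plugin:
        # duniqfield must name the same field as dist_key.
        clauses.append(f"kvpairs=duniqfield:{dist_key}")
    return "&&".join(clauses)


print(build_query("default:'Zhejiang University'", "company_id",
                  reserved=False, uniq_plugin=True))
# query=default:'Zhejiang University'&&distinct=dist_key:company_id,dist_count:1,dist_times:1,reserved:false&&kvpairs=duniqfield:company_id
```

The helper always writes reserved explicitly. Because reserved defaults to true, omitting it, as the first example does, should be equivalent to reserved:true.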
Examples
-
Search for documents containing "Zhejiang University" where create_time is greater than 1402301230. Scatter results by company_id with 10 rounds of 2 documents each. Because reserved defaults to true, documents that are not extracted are retained and ranked after the extracted ones.
query=default:'Zhejiang University'&&filter=create_time>1402301230&&distinct=dist_key:company_id,dist_count:2,dist_times:10
-
Search for documents containing "Zhejiang University". Scatter results by company_id with one round of one document. Discard the remaining documents, return only the extracted ones, and use the distinct uniq plug-in to keep the total counts accurate.
query=default:'Zhejiang University'&&distinct=dist_key:company_id,dist_count:1,dist_times:1,reserved:false&&kvpairs=duniqfield:company_id