Add a distinct clause to a query to control result diversity — limiting how many documents from the same group appear in ranked results. This prevents a single user, brand, or company from dominating an entire results page.
Example use cases:
Skew correction: Your results for a query contain too many documents from the same company. Set dist_key to company_id and dist_count to 2 so that at most two documents per company appear in each extraction round.
Deduplication: You want only one representative document per category. Set dist_count to 1 and dist_times to 1.
Syntax
"distinct": {
  "default": {
    "dist_key": "field",
    "dist_count": number,
    "dist_times": number,
    "dist_filter": "filter_expression",
    "reserved": boolean,
    "max_item_count": number,
    "grade": []
  },
  "rank": {
    "dist_key": "field",
    "dist_count": number,
    "dist_times": number,
    "dist_filter": "filter_expression",
    "reserved": boolean,
    "max_item_count": number,
    "grade": []
  },
  "rerank": {
    "dist_key": "field",
    "dist_count": number,
    "dist_times": number,
    "dist_filter": "filter_expression",
    "reserved": boolean,
    "max_item_count": number,
    "grade": []
  }
}
OpenSearch Retrieval Engine Edition applies dispersing in two phases: the rough sort phase and the fine sort phase. Use the default, rank, and rerank rule keys to control which rule applies to which phase.
| Rules specified | Rough sort phase | Fine sort phase |
|---|---|---|
| default only | default | default |
| rank only | rank | — |
| rerank only | — | rerank |
| default + rank | rank | default |
| default + rerank | default | rerank |
| rank + rerank | rank | rerank |
| default + rank + rerank | rank | rerank |
At least one of default, rank, or rerank must be specified.
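The table above reduces to a simple fallback: each phase prefers its own rule key and otherwise uses default if one is present. The following is a minimal sketch of that lookup; the function name resolve_phase_rules is hypothetical and is not part of the engine's API.

```python
def resolve_phase_rules(specified):
    """Return (rough_sort_rule, fine_sort_rule) for the given rule keys.

    None means that no distinct rule applies to that phase.
    Hypothetical helper that mirrors the rule table; not an engine API.
    """
    keys = set(specified)
    unknown = keys - {"default", "rank", "rerank"}
    if unknown:
        raise ValueError(f"unknown rule keys: {sorted(unknown)}")
    if not keys:
        raise ValueError("at least one of default, rank, or rerank is required")

    # default acts as the fallback for any phase without its own rule.
    fallback = "default" if "default" in keys else None
    rough = "rank" if "rank" in keys else fallback
    fine = "rerank" if "rerank" in keys else fallback
    return rough, fine


if __name__ == "__main__":
    for combo in (["default"], ["rank"], ["default", "rank"], ["default", "rank", "rerank"]):
        print(combo, "->", resolve_phase_rules(combo))
```

A check like this can be run on a query before it is sent, to confirm which rule each phase will pick up.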
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| dist_key | Yes | — | The attribute field to group documents by for dispersing. |
| dist_count | No | 1 | Number of documents to extract per group in each round. |
| dist_times | No | 1 | Number of extraction rounds to perform. |
| dist_filter | No | No filter (all documents are dispersed) | A filter expression. Documents matching the filter are excluded from dispersing. In the fine sort phase, filtered documents are sorted together with the extracted documents. |
| reserved | No | true | Specifies whether to retain documents not extracted by the distinct clause. Set to false to discard them. When set to false, the total and viewtotal values in the response may be inaccurate. |
| max_item_count | No | — | Maximum number of documents retained in the DISTINCT calculation. The effective value is max(max_item_count, hit). For example, if 10 results appear per page and up to 100 pages are returned, set this to 1000. |
| grade | No | One grade | Threshold values (separated by \|) that classify documents into relevance grades based on rough sort scores. Documents within each grade are sorted in the same order as the rough sort phase. |
grade examples
grade:3.0 defines two grades: score < 3.0 (grade 1) and score >= 3.0 (grade 2).
grade:3.0|5.0 defines three grades: score < 3.0 (grade 1), 3.0 <= score < 5.0 (grade 2), and score >= 5.0 (grade 3).
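The grade boundaries can be reproduced with a binary search over the thresholds. This is only an illustrative sketch of the classification described above: parse_grade and grade_of are hypothetical helper names, and the engine's internal implementation is not documented here.

```python
from bisect import bisect_right


def parse_grade(spec):
    """Split a threshold string such as "3.0|5.0" into a list of floats."""
    return [float(part) for part in spec.split("|") if part]


def grade_of(score, thresholds):
    """Return the 1-based grade for a rough sort score.

    A score equal to a threshold falls into the higher grade, which
    matches "score >= 3.0 (grade 2)" in the examples.
    """
    return bisect_right(sorted(thresholds), score) + 1


if __name__ == "__main__":
    thresholds = parse_grade("3.0|5.0")
    for score in (2.9, 3.0, 4.9, 5.0):
        print(score, "-> grade", grade_of(score, thresholds))
```

With an empty threshold list every score lands in grade 1, which corresponds to the default of a single grade.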
Example
The following example performs 10 rounds of extraction grouped by company_id, extracting up to 2 documents per company in each round. Because reserved defaults to true, documents that are not extracted are retained and ranked below the extracted ones.
"distinct": {
  "default": {
    "dist_key": "company_id",
    "dist_count": 2,
    "dist_times": 10
  }
}
How dist_count and dist_times interact
dist_count controls how many documents are extracted per group per round; dist_times controls how many rounds of extraction run. Together they determine the final document order.
Consider six documents where name is the distinct key:
doc1: id:1, name:a
doc2: id:2, name:a
doc3: id:3, name:a
doc4: id:4, name:b
doc5: id:5, name:c
doc6: id:6, name:c
Case 1 — dist_count:2, dist_times:1
Extract 2 documents per group, 1 round:
"distinct": {
  "default": {
    "dist_key": "name",
    "dist_count": 2,
    "dist_times": 1
  }
}
Result order: doc1, doc2, doc4, doc5, doc6
Round 1 extracts up to 2 documents from each group: 2 from group a (doc1, doc2), 1 from group b (doc4), 2 from group c (doc5, doc6). doc3 is not extracted.
Case 2 — dist_count:1, dist_times:2
Extract 1 document per group, 2 rounds:
"distinct": {
  "default": {
    "dist_key": "name",
    "dist_count": 1,
    "dist_times": 2
  }
}
Result order: doc1, doc4, doc5, doc2, doc6
Round 1 extracts 1 from each group: doc1 (a), doc4 (b), doc5 (c). Round 2 extracts the next 1 from each group: doc2 (a), doc6 (c). doc3 is not extracted.
Case 3 — dist_count:1, dist_times:1
Extract 1 document per group, 1 round:
"distinct": {
  "default": {
    "dist_key": "name",
    "dist_count": 1,
    "dist_times": 1
  }
}
Result order: doc1, doc4, doc5
Only 1 round runs. One document is extracted per group: doc1 (a), doc4 (b), doc5 (c). The remaining documents are not extracted.
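The three cases can be reproduced with a small simulation. This is a minimal sketch, not the engine's implementation. It assumes that the input list is already in ranked order, that groups are visited in order of first appearance, and that documents keep their ranked order within a group. The function name disperse is hypothetical.

```python
def disperse(docs, dist_key, dist_count=1, dist_times=1):
    """Simulate round-based extraction and return (extracted, remaining).

    Assumes docs is already in ranked order. Round r takes the next
    dist_count documents from every group that still has documents.
    """
    groups = {}  # group value -> indexes into docs, in ranked order
    for index, doc in enumerate(docs):
        groups.setdefault(doc[dist_key], []).append(index)

    picked = []
    for round_index in range(dist_times):
        start = round_index * dist_count
        for members in groups.values():
            picked.extend(members[start:start + dist_count])

    picked_set = set(picked)
    extracted = [docs[i] for i in picked]
    remaining = [docs[i] for i in range(len(docs)) if i not in picked_set]
    return extracted, remaining


if __name__ == "__main__":
    docs = [
        {"id": 1, "name": "a"}, {"id": 2, "name": "a"}, {"id": 3, "name": "a"},
        {"id": 4, "name": "b"}, {"id": 5, "name": "c"}, {"id": 6, "name": "c"},
    ]
    for count, times in ((2, 1), (1, 2), (1, 1)):
        extracted, remaining = disperse(docs, "name", count, times)
        print(f"dist_count={count}, dist_times={times}:",
              [d["id"] for d in extracted], "not extracted:",
              [d["id"] for d in remaining])
```

The second list returned by the sketch holds the documents that were not extracted. The reserved parameter determines whether the engine keeps or discards them.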
Fix inaccurate total counts with the distinct uniq plug-in
When reserved is set to false, the total and viewtotal values in the response may be inaccurate, which can cause errors in pagination or any logic that depends on those values.
The distinct uniq plug-in corrects this by computing accurate counts. To activate it, add duniqfield:<field> to a kvpairs clause.
The plug-in only works when dist_times is 1, dist_count is 1, and reserved is false. The duniqfield value must match the dist_key value. For performance reasons, the plug-in returns at most 5,000 results per query.
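These preconditions are easy to check on the client side before a query is sent. The sketch below assumes the request is held as a Python dict with the same shape as the JSON in this section, and it applies the defaults from the parameter table. The helper name check_duniq_query is hypothetical. Checking every rule key under distinct is also an assumption, because how the plug-in treats multiple rules is not specified here.

```python
def check_duniq_query(query):
    """Return a list of reasons why the distinct uniq plug-in would not apply.

    An empty list means the documented preconditions are met.
    Hypothetical client-side helper; not part of the engine.
    """
    problems = []
    duniqfield = query.get("kvpairs", {}).get("duniqfield")
    if duniqfield is None:
        problems.append("kvpairs.duniqfield is missing")

    rules = query.get("distinct", {})
    if not rules:
        problems.append("distinct clause is missing")

    # Assumption: every rule key (default, rank, rerank) must qualify.
    for name, rule in rules.items():
        if rule.get("dist_count", 1) != 1:
            problems.append(f"{name}: dist_count must be 1")
        if rule.get("dist_times", 1) != 1:
            problems.append(f"{name}: dist_times must be 1")
        if rule.get("reserved", True) is not False:
            problems.append(f"{name}: reserved must be false")
        if duniqfield is not None and rule.get("dist_key") != duniqfield:
            problems.append(f"{name}: dist_key must equal duniqfield")
    return problems


if __name__ == "__main__":
    query = {
        "distinct": {"default": {"dist_key": "company_id", "reserved": False}},
        "kvpairs": {"duniqfield": "company_id"},
    }
    print(check_duniq_query(query) or "preconditions met")
```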
{
  "distinct": {
    "default": {
      "dist_key": "company_id",
      "dist_count": 1,
      "dist_times": 1,
      "reserved": false
    }
  },
  "kvpairs": {
    "duniqfield": "company_id"
  }
}
Usage notes
Fields specified in a distinct clause must be attribute fields defined in schema.json.
Only INT and LITERAL field types are supported. ARRAY is not supported.