Forward index attributes can consume significant storage space, especially when documents share many duplicate values or use floating-point fields. OpenSearch Retrieval Engine Edition provides three compression techniques to reduce forward index storage size. Enable one or more based on your field types and data distribution.
Multi-value attribute deduplication
When documents share many identical attribute values, storing each duplicate separately wastes space. Multi-value attribute deduplication removes duplicate values from the index before storage, reducing the size of generated indexes.
Applies to: multi-value attributes and single-value fields with the STRING data type.
Trade-off: deduplication requires additional memory during build and merge operations due to the dictionaries used internally. Skip this option if your duplication rate is low.
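The trade-off can be illustrated with a small sketch. It is not the engine's implementation: the function name `deduplicate` and the in-memory layout are hypothetical, chosen only to show how a build-time dictionary lets each distinct value be stored once.

```python
def deduplicate(values_per_doc):
    """Store each distinct multi-value once and point documents at it."""
    dictionary = {}      # value -> slot index; this is the extra build-time memory
    unique_values = []   # each distinct value, stored a single time
    doc_slots = []       # per-document index into unique_values
    for values in values_per_doc:
        key = tuple(values)
        if key not in dictionary:
            dictionary[key] = len(unique_values)
            unique_values.append(key)
        doc_slots.append(dictionary[key])
    return unique_values, doc_slots


# Five documents, but only two distinct multi-values.
docs = [[1, 2], [3], [1, 2], [1, 2], [3]]
unique_values, doc_slots = deduplicate(docs)
print(unique_values)  # [(1, 2), (3,)]
print(doc_slots)      # [0, 1, 0, 0, 1]
```

With a low duplication rate, `unique_values` is nearly as large as the input, so the dictionary costs memory without saving much space.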
Equal-value compression
After documents are globally sorted by a field, the offset values of single-value and multi-value attributes often contain duplicates in consecutive positions. Equal-value compression stores these runs of duplicate values using fewer bits, reducing the size of the corresponding offset files.
Applies to: offset files of single-value attributes and multi-value attributes. For multi-value attributes and STRING-type single-value attributes, combine this with multi-value attribute deduplication for greater space savings.
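The idea of storing runs of equal values compactly can be sketched with run-length encoding. This is a stand-in technique for illustration only; the encoding the engine actually uses is not described here, and the function names are hypothetical.

```python
from itertools import groupby


def run_length_encode(offsets):
    """Collapse each run of consecutive equal offsets into (value, count)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(offsets)]


def run_length_decode(runs):
    """Expand (value, count) pairs back into the original offset list."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


# Offsets as they might look after sorting and deduplication:
# documents that share a value also share an offset.
offsets = [0, 0, 0, 8, 8, 20, 20, 20, 20]
runs = run_length_encode(offsets)
print(runs)                                # [(0, 3), (8, 2), (20, 4)]
print(run_length_decode(runs) == offsets)  # True
```

Nine offsets shrink to three pairs here. The gain depends on how long the runs are, which is why combining this with deduplication helps.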
Self-adaptive storage for offset files
Each multi-value attribute has its own offset file. Storing every offset in 8 bytes results in high storage overhead. OpenSearch automatically stores each offset in 4 bytes when the total size of all offset files is under 4 GB. No configuration is required.
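A minimal sketch of width selection, assuming offsets are fixed-width unsigned integers: a 4 GB limit corresponds to what a 4-byte (32-bit) slot can address, with 8-byte slots as the fallback. The function `pack_offsets` and the little-endian layout are illustrative assumptions, not the engine's file format.

```python
import struct

FOUR_GB = 1 << 32  # largest range a 4-byte unsigned offset can address


def pack_offsets(offsets):
    """Use 4-byte slots when every offset is below 4 GB, else 8-byte slots."""
    width = 4 if max(offsets, default=0) < FOUR_GB else 8
    code = "I" if width == 4 else "Q"  # struct codes: uint32 / uint64
    return width, struct.pack("<%d%s" % (len(offsets), code), *offsets)


small_width, small = pack_offsets([0, 12, 40])
large_width, large = pack_offsets([0, 12, FOUR_GB + 5])
print(small_width, len(small))  # 4 12
print(large_width, len(large))  # 8 24
```

For the same number of offsets, the narrow layout takes half the space.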
Configure compression
Set compress_type on each field in your schema configuration to enable compression. The default value is an empty string (no compression).
{
  "fields": [
    {
      "field_name": "category",
      "field_type": "INTEGER",
      "multi_value": true,
      "compress_type": "uniq|equal"
    },
    {
      "field_name": "price",
      "field_type": "INTEGER",
      "user_defined_param": {
        "key": "hello"
      }
    }
  ]
}
compress_type values
| Value | Effect | Applicable field types | Combinable with |
|---|---|---|---|
| uniq | Multi-value attribute deduplication | Multi-value attributes; single-value STRING fields | equal |
| equal | Equal-value compression | Single-value and multi-value attributes (offset files) | uniq |
| patch_compress | Patch file-based compression | - | - |
| block_fp | Floating-point block compression | Multi-value FLOAT attributes | - |
| fp16 | Half-precision floating-point compression | Single-value and multi-value FLOAT attributes | - |
| int8#N | INT8 quantization; N defines the value range (−N to +N) | Single-value and multi-value FLOAT attributes | - |
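The two lossy float options can be illustrated with a short sketch. The half-precision round trip uses the standard IEEE 754 binary16 format. The INT8 mapping shown (clamp to the range −N to +N, then scale linearly onto −127 to 127) is an assumption made for illustration; the engine's exact quantization formula is not given here, and all function names are hypothetical.

```python
import struct


def to_fp16(x):
    """Round-trip a float through IEEE 754 half precision (2 bytes)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_int8(x, n):
    """Clamp x to [-n, n] and scale it to an integer in [-127, 127] (assumed mapping)."""
    clamped = max(-n, min(n, x))
    return round(clamped / n * 127)


def dequantize_int8(q, n):
    """Recover an approximation of the original float from its INT8 code."""
    return q / 127 * n


print(to_fp16(0.1))             # close to 0.1, but not exact
q = quantize_int8(0.5, 2)       # behaves like int8#2 under the assumed mapping
print(q, dequantize_int8(q, 2))
print(quantize_int8(5.0, 2))    # values outside the range saturate
```

Both options trade precision for space, so they suit fields where small rounding errors are acceptable.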
To combine compression methods, separate values with a vertical bar (|). For example, "compress_type": "uniq|equal".
Do not combine fp16 or int8#N with uniq|equal on a single-value FLOAT attribute.
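The combination rules can be checked before a schema is deployed. The following is a hedged sketch of such a check, not part of any OpenSearch SDK: `parse_compress_type` and `check_field` are hypothetical helpers, the accepted form of N in int8#N is assumed to be a plain number, and the restriction is read as forbidding fp16 or int8#N alongside either uniq or equal.

```python
import re

KNOWN = {"uniq", "equal", "patch_compress", "block_fp", "fp16"}
INT8 = re.compile(r"int8#\d+(\.\d+)?")  # assumed syntax for N


def parse_compress_type(value):
    """Split a compress_type string on '|' and reject unknown methods."""
    if value == "":
        return []  # empty string means no compression
    methods = value.split("|")
    for method in methods:
        if method not in KNOWN and not INT8.fullmatch(method):
            raise ValueError("unknown compress_type value: %r" % method)
    return methods


def check_field(field):
    """Apply the single-value FLOAT restriction to one field definition."""
    methods = parse_compress_type(field.get("compress_type", ""))
    lossy = [m for m in methods if m == "fp16" or m.startswith("int8#")]
    single_float = (field.get("field_type") == "FLOAT"
                    and not field.get("multi_value", False))
    if single_float and lossy and {"uniq", "equal"} & set(methods):
        raise ValueError("%s: do not combine %s with uniq or equal"
                         % (field.get("field_name"), lossy[0]))
    return methods


print(check_field({"field_name": "category", "field_type": "INTEGER",
                   "multi_value": True, "compress_type": "uniq|equal"}))
# ['uniq', 'equal']
```

Running the check over every entry in `fields` catches typos and disallowed combinations early.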
Field parameters
| Parameter | Description | Default |
|---|---|---|
| field_name | Name of the field | - |
| field_type | Data type of the field | - |
| multi_value | Whether the field holds multiple values per document | false |
| compress_type | Compression method(s) for attribute storage | "" (no compression) |