Quantized clustering configurations

Quantized clustering (QC) partitions a vector space into clusters and searches only the most relevant clusters at query time, which reduces memory usage and improves query speed compared to scanning all vectors. Use this reference when configuring the index builder (QcBuilder) or the searcher (QcSearcher) for an OpenSearch Retrieval Engine Edition instance.
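
The idea of searching only the most relevant clusters can be illustrated with a small, self-contained sketch. It is not the engine's implementation: the function names, the `n_probe` parameter, and the toy data are hypothetical, and quantization is left out so that only the cluster pruning step is shown.

```python
import math

def nearest_clusters(query, centroids, n_probe):
    """Return the indices of the n_probe centroids closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda i: math.dist(query, centroids[i]))
    return order[:n_probe]

def clustered_search(query, centroids, clusters, n_probe, top_k):
    """Scan only the vectors stored in the n_probe nearest clusters."""
    candidates = []
    for c in nearest_clusters(query, centroids, n_probe):
        for doc_id, vec in clusters[c]:
            candidates.append((math.dist(query, vec), doc_id))
    candidates.sort()  # smallest distance first
    return [doc_id for _, doc_id in candidates[:top_k]]

# Toy data: two clusters, each holding (doc_id, vector) pairs.
centroids = [(0.0, 0.0), (10.0, 10.0)]
clusters = {
    0: [("a", (0.1, 0.2)), ("b", (1.0, -0.5))],
    1: [("c", (9.5, 10.2)), ("d", (11.0, 9.0))],
}

# Only cluster 0 is scanned, so "c" and "d" are never compared to the query.
result = clustered_search((0.0, 0.1), centroids, clusters, n_probe=1, top_k=2)
```

Scanning fewer clusters lowers latency but can miss neighbors that sit in a cluster that was skipped, which is the speed-versus-accuracy trade-off the searcher parameters control.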

QcBuilder

QcBuilder parameters control how the quantized clustering index is built from your documents.

Parameter	Type	Default value	Description
`qc.builder.train_sample_count`	uint32	0	Number of documents used as training data. Set to `0` to use all documents.
`qc.builder.thread_count`	uint32	0	Number of threads used during index building. Set to `0` to match the number of CPU cores of the instance.
`qc.builder.centroid_count`	string	Optional	Number of centroids per cluster level. Supports hierarchical clusters: separate levels with an asterisk (`*`). For one level: `1000`. For two levels: `100*100`. For two-level hierarchical clusters, set more centroids at the first level than at the second level; the first level delivers 10x the search gain of the second level. Leave this parameter unset to let the system infer the appropriate count automatically.
`qc.builder.quantizer_class`	string	—	Quantizer applied to vector data. Specifying a quantizer reduces index size and improves query performance, but may reduce retrieval accuracy in some cases. Valid values: `Int8QuantizerConverter`, `HalfFloatConverter`, `DoubleBitConverter`.
`qc.builder.quantize_by_centroid`	bool	False	Whether to perform quantization relative to each centroid's local coordinate space. Takes effect only when `qc.builder.quantizer_class` is set to `Int8QuantizerConverter`.
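
As a rough illustration of how these values fit together, the sketch below collects the builder parameters in a plain Python dictionary and interprets the `qc.builder.centroid_count` string. The helper functions are hypothetical, the sample values are arbitrary, and the assumption that the number of leaf clusters is the product of the per-level counts is not stated by this reference. How the parameters are passed to an instance is not shown here.

```python
def parse_centroid_count(value):
    """Split a centroid_count string such as "1000*100" into per-level counts."""
    levels = [int(part) for part in value.split("*")]
    if any(n <= 0 for n in levels):
        raise ValueError(f"invalid centroid_count: {value!r}")
    return levels

def leaf_cluster_count(levels):
    """Assumption: every centroid at one level owns a full set at the next level."""
    total = 1
    for n in levels:
        total *= n
    return total

def quantize_by_centroid_effective(params):
    """quantize_by_centroid only takes effect with Int8QuantizerConverter."""
    enabled = bool(params.get("qc.builder.quantize_by_centroid", False))
    return enabled and params.get("qc.builder.quantizer_class") == "Int8QuantizerConverter"

# Example values only; 0 means "use all documents" / "match the CPU core count".
builder_params = {
    "qc.builder.train_sample_count": 0,
    "qc.builder.thread_count": 0,
    "qc.builder.centroid_count": "1000*100",  # more centroids at the first level
    "qc.builder.quantizer_class": "Int8QuantizerConverter",
    "qc.builder.quantize_by_centroid": True,
}

levels = parse_centroid_count(builder_params["qc.builder.centroid_count"])
print(levels, leaf_cluster_count(levels))
print(quantize_by_centroid_effective(builder_params))
```

A check like `quantize_by_centroid_effective` is a convenient way to catch a configuration in which `qc.builder.quantize_by_centroid` is enabled alongside a quantizer that ignores it.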

QcSearcher

QcSearcher parameters control how many clusters are scanned at query time. Adjust these to tune the speed-versus-accuracy trade-off without rebuilding the index.

Parameter	Type	Default value	Description
`qc.searcher.scan_ratio`	float	0.01	Maximum fraction of documents scanned per query. Used to derive `max_scan_num` with the formula: `max_scan_num = total documents × scan_ratio`. Increase this value to improve recall at the cost of higher query latency.
`qc.searcher.brute_force_threshold`	int	1000	Document count below which linear retrieval is performed instead of cluster-based search. When the total number of documents is less than this value, the system scans all vectors directly.
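
The interaction between the two searcher parameters can be sketched as follows. The function `plan_scan` and its return shape are hypothetical, and rounding `max_scan_num` to the nearest integer is an assumption; the reference only gives the formula `max_scan_num = total documents × scan_ratio` and the threshold rule.

```python
def plan_scan(total_docs, scan_ratio=0.01, brute_force_threshold=1000):
    """Decide between linear and cluster-based retrieval for one query."""
    if total_docs < brute_force_threshold:
        # Below the threshold every vector is scanned directly.
        return {"mode": "linear", "max_scan_num": total_docs}
    # Otherwise cap the number of scanned documents (rounding is an assumption).
    return {"mode": "cluster", "max_scan_num": round(total_docs * scan_ratio)}

small = plan_scan(500)          # fewer documents than the default threshold
large = plan_scan(1_000_000)    # default scan_ratio of 0.01
wider = plan_scan(1_000_000, scan_ratio=0.25)
print(small, large, wider)
```

With the defaults, an index of one million documents scans at most about 10,000 documents per query; raising `scan_ratio` widens that cap and trades latency for recall.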