Quantized clustering configurations

Quantized clustering (QC) partitions a vector space into clusters and searches only the most relevant clusters at query time, which reduces memory usage and improves query speed compared with scanning all vectors. Use this reference when configuring the index builder (QcBuilder) or the searcher (QcSearcher) for an OpenSearch Retrieval Engine Edition instance.
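
The cluster-pruning idea can be illustrated with a toy sketch. This is not the engine's implementation: the centroids are fixed by hand rather than trained, the quantization step is omitted, and the names `build_clusters`, `search`, and `probes` are hypothetical helpers introduced only for this example.

```python
import math

def nearest(centroids, v):
    """Return the index of the centroid closest to v (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], v))

def build_clusters(vectors, centroids):
    """Assign every vector id to the cluster of its nearest centroid."""
    clusters = {i: [] for i in range(len(centroids))}
    for vid, v in enumerate(vectors):
        clusters[nearest(centroids, v)].append(vid)
    return clusters

def search(query, vectors, centroids, clusters, probes=1, top_k=1):
    """Scan only the `probes` clusters whose centroids are closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda i: math.dist(centroids[i], query))
    candidates = [vid for c in order[:probes] for vid in clusters[c]]
    candidates.sort(key=lambda vid: math.dist(vectors[vid], query))
    # Return the best matches and how many vectors were actually compared.
    return candidates[:top_k], len(candidates)

vectors = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (9.9, 0.1), (10.0, 0.3)]
centroids = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]  # hand-picked; a real builder trains these
clusters = build_clusters(vectors, centroids)

hits, scanned = search((5.0, 5.0), vectors, centroids, clusters, probes=1, top_k=1)
print(hits, scanned)  # only 2 of the 6 vectors are compared against the query
```

Raising `probes` in this sketch scans more clusters, which mirrors the speed-versus-accuracy trade-off that the searcher parameters expose.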

QcBuilder

QcBuilder parameters control how the quantized clustering index is built from your documents.

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| `qc.builder.train_sample_count` | uint32 | 0 | Number of documents used as training data. Set to 0 to use all documents. |
| `qc.builder.thread_count` | uint32 | 0 | Number of threads used during index building. Set to 0 to match the number of CPU cores of the instance. |
| `qc.builder.centroid_count` | string | Optional | Number of centroids per cluster level. Supports hierarchical clusters: separate levels with an asterisk (`*`). For one level: `1000`. For two levels: `100*100`. For two-level hierarchical clusters, set more centroids at the first level than at the second level; the first level delivers 10x the search gain of the second level. Leave this parameter unset to let the system infer the appropriate count automatically. |
| `qc.builder.quantizer_class` | string | | Quantizer applied to vector data. Specifying a quantizer reduces index size and improves query performance, but may reduce retrieval accuracy in some cases. Valid values: `Int8QuantizerConverter`, `HalfFloatConverter`, `DoubleBitConverter`. |
| `qc.builder.quantize_by_centroid` | bool | False | Whether to perform quantization relative to each centroid's local coordinate space. Takes effect only when `qc.builder.quantizer_class` is set to `Int8QuantizerConverter`. |
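
As a rough illustration of how these builder parameters relate to each other, the sketch below parses a `centroid_count` string and checks whether `quantize_by_centroid` would take effect. The dictionary layout and the helper names (`parse_centroid_count`, `leaf_cluster_count`, `quantize_by_centroid_effective`) are assumptions made for this example; they are not part of the engine's API, and the way parameters are actually supplied to an instance may differ. The leaf-cluster calculation also assumes that every first-level centroid owns a full set of second-level centroids.

```python
def parse_centroid_count(value):
    """Split a centroid_count string such as "100*100" into per-level counts."""
    levels = [int(part) for part in value.split("*")]
    if any(n <= 0 for n in levels):
        raise ValueError(f"invalid centroid_count: {value!r}")
    return levels

def leaf_cluster_count(levels):
    """Total leaf clusters, assuming each centroid owns a full next level."""
    total = 1
    for n in levels:
        total *= n
    return total

def quantize_by_centroid_effective(params):
    """quantize_by_centroid only applies with the Int8 quantizer (per the table)."""
    return (bool(params.get("qc.builder.quantize_by_centroid", False))
            and params.get("qc.builder.quantizer_class") == "Int8QuantizerConverter")

# Hypothetical parameter set; the keys and values come from the table above.
builder_params = {
    "qc.builder.train_sample_count": 0,   # 0 = train on all documents
    "qc.builder.thread_count": 0,         # 0 = one thread per CPU core
    "qc.builder.centroid_count": "100*100",
    "qc.builder.quantizer_class": "Int8QuantizerConverter",
    "qc.builder.quantize_by_centroid": True,
}

levels = parse_centroid_count(builder_params["qc.builder.centroid_count"])
print(levels, leaf_cluster_count(levels))
print(quantize_by_centroid_effective(builder_params))
```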

QcSearcher

QcSearcher parameters control how many clusters are scanned at query time. Adjust these to tune the speed-versus-accuracy trade-off without rebuilding the index.

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| `qc.searcher.scan_ratio` | float | 0.01 | Maximum fraction of documents scanned per query. Used to derive `max_scan_num` with the formula: `max_scan_num` = total documents × `scan_ratio`. Increase this value to improve recall at the cost of higher query latency. |
| `qc.searcher.brute_force_threshold` | int | 1000 | Document count below which linear retrieval is performed instead of cluster-based search. When the total number of documents is less than this value, the system scans all vectors directly. |
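
The interaction between the two searcher parameters can be worked through with a small sketch. The function name `plan_scan` and its return shape are hypothetical, and truncating `max_scan_num` to an integer is an assumption; the source only gives the formula and the threshold rule.

```python
def plan_scan(total_docs, scan_ratio=0.01, brute_force_threshold=1000):
    """Estimate how a query would be served for a given document count."""
    if total_docs < brute_force_threshold:
        # Below the threshold, every vector is scanned directly.
        return {"mode": "linear", "max_scan_num": total_docs}
    # max_scan_num = total documents x scan_ratio (truncation is an assumption).
    return {"mode": "cluster", "max_scan_num": int(total_docs * scan_ratio)}

print(plan_scan(500))        # small index: linear retrieval over all 500 documents
print(plan_scan(1_000_000))  # large index: cluster search capped by scan_ratio
```

With the defaults, an index of 1,000,000 documents scans at most 10,000 documents per query; doubling `scan_ratio` to 0.02 doubles that cap, along with the expected scan cost.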