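In multi-tenant scenarios with massive indexes, custom routing keys are the core technique for user-level data isolation and precise queries. When a user identifier (such as a user ID) is bound as the routing key, each query touches only the target user's data. This protects data security and also improves query performance. This topic describes how to use the custom routing key feature.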
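Prerequisites
Preparations
Before you use advanced features, connect to the search engine by using curl. For the procedure and a description of the connection parameters, see Connect to the search engine.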
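Create an index
Indexes that support only pure vector queries
If the index holds fewer than 10,000 records, we recommend a flat index. For tens of thousands to hundreds of thousands of records, we recommend an hnsw index. For millions of records, we recommend an ivfpq index. You can also use a sparse vector index if your workload requires it.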
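The sizing guidance above can be expressed as a small lookup. This is a minimal Python sketch: the exact cut-offs (below 10,000 for flat, below 1,000,000 for hnsw, otherwise ivfpq) are an assumed reading of "fewer than 10,000", "tens or hundreds of thousands", and "millions", and the function name recommend_index_type is hypothetical. It does not cover sparse vectors, which are chosen by data type rather than by volume.

```python
def recommend_index_type(doc_count: int) -> str:
    """Map an expected record count to the dense index type suggested above.

    Assumed cut-offs: < 10,000 -> flat, < 1,000,000 -> hnsw, otherwise ivfpq.
    """
    if doc_count < 0:
        raise ValueError("doc_count must be non-negative")
    if doc_count < 10_000:
        return "flat"
    if doc_count < 1_000_000:
        return "hnsw"
    return "ivfpq"


print(recommend_index_type(5_000))      # flat
print(recommend_index_type(300_000))    # hnsw
print(recommend_index_type(2_000_000))  # ivfpq
```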
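In custom routing key scenarios, the primary key _id must be globally unique across the entire index.
When you create the index, specify "knn_routing": true to enable the custom routing key feature. For an ivfpq index, also set "meta": {"offline.construction": "true"} on the vector field.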
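To make the two routing-specific requirements concrete, the sketch below assembles a create-index body in Python for the dense index types shown in this section. It assumes the field names vector1 and field1 from the examples, and the helper routing_index_body is hypothetical. It only sets knn_routing in the index settings and adds the offline.construction meta entry when the method is ivfpq; it does not send any request.

```python
import json
from typing import Optional


def routing_index_body(method: str, dimension: int, space_type: str = "l2",
                       parameters: Optional[dict] = None, shards: int = 2) -> dict:
    """Build a create-index body for a dense vector field with custom routing enabled."""
    vector_field = {
        "type": "knn_vector",
        "dimension": dimension,
        "data_type": "float",
        "method": {
            "engine": "lvector",
            "name": method,
            "space_type": space_type,
            "parameters": parameters or {},
        },
    }
    if method == "ivfpq":
        # ivfpq is built offline, so the vector field carries this meta entry.
        vector_field["meta"] = {"offline.construction": "true"}
    return {
        "settings": {
            # knn_routing turns on the custom routing key feature.
            "index": {"number_of_shards": shards, "knn": True, "knn_routing": True}
        },
        "mappings": {
            "_source": {"excludes": ["vector1"]},
            "properties": {"vector1": vector_field, "field1": {"type": "long"}},
        },
    }


body = routing_index_body("ivfpq", 3, "cosinesimil", {"m": 3, "nlist": 2}, shards=4)
print(json.dumps(body, indent=2))
```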
flat routing index
curl -u <username>:<password> -H 'Content-Type: application/json' -XPUT "http://ld-bp1h002998iv8****.lindorm.aliyuncs.com:30070/vector_routing_flat_test?pretty" -d '
{
"settings" : {
"index": {
"number_of_shards": 2,
"knn": true,
"knn_routing": true
}
},
"mappings": {
"_source": {
"excludes": ["vector1"]
},
"properties": {
"vector1": {
"type": "knn_vector",
"dimension": 3,
"data_type": "float",
"method": {
"engine": "lvector",
"name": "flat",
"space_type": "l2",
"parameters": {}
}
},
"field1": {
"type": "long"
}
}
}
}
'
hnsw routing index
curl -u <username>:<password> -H 'Content-Type: application/json' -XPUT "http://ld-bp1h002998iv8****.lindorm.aliyuncs.com:30070/vector_routing_hnsw_test?pretty" -d '
{
"settings" : {
"index": {
"number_of_shards": 2,
"knn": true,
"knn_routing": true
}
},
"mappings": {
"_source": {
"excludes": ["vector1"]
},
"properties": {
"vector1": {
"type": "knn_vector",
"dimension": 3,
"method": {
"engine": "lvector",
"name": "hnsw",
"space_type": "l2",
"parameters": {
"m": 24,
"ef_construction": 500
}
}
},
"field1": {
"type": "long"
}
}
}
}'
sparse_hnsw routing index (sparse vectors)
curl -u <username>:<password> -H 'Content-Type: application/json' -XPUT "http://ld-bp1h002998iv8****.lindorm.aliyuncs.com:30070/vector_routing_sparse_test?pretty" -d '
{
"settings" : {
"index": {
"number_of_shards": 2,
"knn": true,
"knn_routing": true
}
},
"mappings": {
"_source": {
"excludes": ["vector1"]
},
"properties": {
"vector1": {
"type": "knn_vector",
"data_type": "sparse_vector",
"method": {
"engine": "lvector",
"name": "sparse_hnsw",
"space_type": "innerproduct",
"parameters": {
"m": 24,
"ef_construction": 200
}
}
},
"field1": {
"type": "long"
}
}
}
}'
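ivfpq routing index
In custom routing key scenarios, each routing key usually holds relatively little data (for example, a few hundred thousand records or fewer), so ivfpq parameters should be set differently from the general strategy for tens or hundreds of millions of records. For example, choose nlist, which defines the number of clusters, so that each cluster holds 1,000 to 30,000 records. If each routing key holds a few thousand records, you can set nlist to 2.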
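The rule of 1,000 to 30,000 records per cluster gives a range for nlist: at least ceil(n / 30,000) clusters and at most floor(n / 1,000), where n is the number of records per routing key. The following Python sketch computes that range. The function name nlist_range is hypothetical, and clamping the result to at least 1 is an assumption for very small routing keys.

```python
import math


def nlist_range(docs_per_routing_key: int, min_per_cluster: int = 1_000,
                max_per_cluster: int = 30_000) -> tuple:
    """Return (low, high) nlist bounds so each cluster holds min..max records."""
    # Fewest clusters: no cluster may exceed max_per_cluster records.
    low = max(1, math.ceil(docs_per_routing_key / max_per_cluster))
    # Most clusters: every cluster should hold at least min_per_cluster records.
    high = max(low, docs_per_routing_key // min_per_cluster)
    return low, high


print(nlist_range(5_000))    # (1, 5)
print(nlist_range(300_000))  # (10, 300)
```

For 5,000 records per routing key the range is (1, 5), so the value 2 suggested above falls inside it.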
curl -u <username>:<password> -H 'Content-Type: application/json' -XPUT "http://ld-bp1h002998iv8****.lindorm.aliyuncs.com:30070/vector_routing_ivfpq_test?pretty" -d '
{
"settings": {
"index": {
"number_of_shards": 4,
"knn": true,
"knn_routing": true
}
},
"mappings": {
"_source": {
"excludes": ["vector1"]
},
"properties": {
"vector1": {
"type": "knn_vector",
"dimension": 3,
"data_type": "float",
"meta": {"offline.construction": "true"},
"method": {
"engine": "lvector",
"name": "ivfpq",
"space_type": "cosinesimil",
"parameters": {
"m": 3, // Set to the same value as dimension
"nlist": 2,
"centroids_use_hnsw": false,
"centroids_hnsw_m": 48,
"centroids_hnsw_ef_construct": 500,
"centroids_hnsw_ef_search": 200
}
}
},
"field1": {
"type": "long"
}
}
}
}
'
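Indexes that support both pure vector queries and hybrid queries
To run hybrid queries, specify a full-text search field when you create the index by adding the following field mapping: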
"text_field": {
"type": "text",
"analyzer": "ik_max_word"
}
The following statement uses an hnsw routing index as an example:
curl -u <username>:<password> -H 'Content-Type: application/json' -XPUT "http://ld-bp1h002998iv8****-proxy-search-pub.lindorm.aliyuncs.com:30070/vector1_routing_hnsw_hybirdSearch?pretty" -d '
{
"settings" : {
"index": {
"number_of_shards": 2,
"knn": true,
"knn_routing": true
}
},
"mappings": {
"_source": {
"excludes": ["vector1"]
},
"properties": {
"vector1": {
"type": "knn_vector",
"dimension": 3,
"data_type": "float",
"method": {
"engine": "lvector",
"name": "hnsw",
"space_type": "l2",
"parameters": {
"m": 24,
"ef_construction": 500
}
}
},
"text_field": {
"type": "text",
"analyzer": "ik_max_word"
},
"field1": {
"type": "long"
}
}
}
}'
Write data
Write a single document
The following example writes a document to the flat index vector_routing_flat_test with the routing value user123 (the tenant).
curl -u <username>:<password> -H 'Content-Type: application/json' -XPUT "http://ld-bp1h002998iv8****.lindorm.aliyuncs.com:30070/vector_routing_flat_test/_doc/1?routing=user123" -d '
{
"vector1": [1.2, 1.3, 1.4],
"field1": 1
}
'
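The routing value is passed as the routing query parameter of the request URL. If your tenant IDs can contain characters such as spaces or slashes, they should be URL-encoded. The sketch below is a hypothetical helper built on Python's urllib.parse; the endpoint http://localhost:30070 is a placeholder, not a real instance address.

```python
from urllib.parse import quote, urlencode


def doc_url(endpoint: str, index: str, doc_id: str, routing: str) -> str:
    """Build a single-document write URL that carries the routing value."""
    base = endpoint.rstrip("/")
    # Percent-encode the path segments so any index name or ID is safe.
    path = f"{base}/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}"
    # urlencode escapes the routing value (for example "/" becomes %2F).
    query = urlencode({"routing": routing})
    return f"{path}?{query}"


print(doc_url("http://localhost:30070", "vector_routing_flat_test", "1", "user123"))
# http://localhost:30070/vector_routing_flat_test/_doc/1?routing=user123
```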
Bulk write
The following example bulk writes documents to the hnsw index vector_routing_hnsw_test with the routing values 1 and 2, respectively.
curl -u <username>:<password> -H "Content-Type: application/json" -XPOST "http://ld-bp1h002998iv8****.lindorm.aliyuncs.com:30070/_bulk?pretty" -d '
{ "index" : { "_index" : "vector_routing_hnsw_test", "_id" : "2", "routing": "1"} }
{ "field1" : 2, "vector1": [2.2, 2.3, 2.4]}
{ "index" : { "_index" : "vector_routing_hnsw_test", "_id" : "3", "routing": "2" } }
{ "field1" : 3, "vector1": [3.2, 3.3, 3.4]}
'
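When the bulk payload is generated by a program, each document needs its own action line carrying _index, _id, and routing, followed by its source line. The following Python sketch builds such an NDJSON body. The helper name bulk_body and the (id, routing, source) tuple layout are assumptions for illustration, and the trailing newline follows the usual convention of Elasticsearch-compatible _bulk APIs.

```python
import json


def bulk_body(index: str, docs: list) -> str:
    """Serialize (id, routing, source) tuples into an NDJSON _bulk body."""
    lines = []
    for doc_id, routing, source in docs:
        # Action line: target index, primary key, and routing value.
        action = {"index": {"_index": index, "_id": str(doc_id), "routing": str(routing)}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The body must end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body("vector_routing_hnsw_test", [
    (2, 1, {"field1": 2, "vector1": [2.2, 2.3, 2.4]}),
    (3, 2, {"field1": 3, "vector1": [3.2, 3.3, 3.4]}),
])
print(body, end="")
```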
Bulk write sparse vectors
curl -u <username>:<password> -H "Content-Type: application/json" -XPOST "http://ld-bp1h002998iv8****.lindorm.aliyuncs.com:30070/_bulk?pretty" -d '
{ "index" : { "_index" : "vector_routing_sparse_test", "_id" : "2", "routing": "1"} }
{ "field1" : 2, "vector1": {"indices": [10, 12, 16], "values": [0.5, 0.5, 0.2]}}
{ "index" : { "_index" : "vector_routing_sparse_test", "_id" : "3", "routing": "2" } }
{ "field1" : 3, "vector1": {"indices": [10, 12, 16], "values": [0.5, 0.5, 0.2]}}
'
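Build the index
Build an ivfpq index
Only ivfpq indexes must be built manually. The index must have been created with "meta": {"offline.construction": "true"} on the vector field, which marks it as an offline-built index.
Before you start the build, make sure the index contains enough data: more than 256 records and more than 30 times nlist.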
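The data-volume check before building can be written as a simple predicate. In this Python sketch both function names are hypothetical, and doc_count is assumed to be obtained separately (for example, from your own write statistics); the thresholds are exactly the two conditions stated above.

```python
def ready_to_build(doc_count: int, nlist: int) -> bool:
    """True if the index holds more than 256 records and more than 30 * nlist."""
    return doc_count > 256 and doc_count > 30 * nlist


def min_docs_before_build(nlist: int) -> int:
    """Smallest record count that satisfies both conditions."""
    return max(256, 30 * nlist) + 1


print(min_docs_before_build(2))    # 257
print(min_docs_before_build(100))  # 3001
```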
curl -u <username>:<password> -H 'Content-Type: application/json' -XPOST "http://ld-bp1h002998iv8****.lindorm.aliyuncs.com:30070/_plugins/_vector/index/build" -d '
{
"indexName": "vector_routing_ivfpq_test",
"fieldName": "vector1",
"removeOldIndex": "true",
"ivf_train_only": "false"
}'
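Parameters
Parameter | Required | Description |
ivf_train_only | Yes | true: uses the existing data only to train the codebook and does not generate index data for it. false: generates index data for the existing data with the trained codebook and keeps the data. |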
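Clear the training data and keep the codebook
If ivf_train_only is set to true, this step is required: in that mode, the build uses the existing data only to train the codebook and does not generate index data for it.
The parameter reserve_codebook=true is required and keeps the codebook. After the training data is cleared, you must write the data again before you can run pure vector (kNN) queries.
If ivf_train_only is set to false, the existing data is indexed with the trained codebook and kept, so you can skip this step.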
curl -u <username>:<password> -H "Content-Type: application/json" -XPOST "http://ld-bp1h002998iv8****.lindorm.aliyuncs.com:30070/_truncate/vector_routing_ivfpq_test?pretty&reserve_codebook=true"
Query data
Pure vector queries
Pure vector queries use the knn structure.
flat routing index
curl -u <username>:<password> -H "Content-Type: application/json" -XPOST "http://ld-bp1h002998iv8****.lindorm.aliyuncs.com:30070/vector_routing_flat_test/_search?pretty&routing=user123" -d '{
"size": 20,
"query": {
"knn": {
"vector1": {
"vector": [2.3, 3.3, 4.4],
"k": 20
}
}
}
}'
hnsw routing index
curl -u <username>:<password> -H 'Content-Type: application/json' -XGET "http://ld-bp1h002998iv8****.lindorm.aliyuncs.com:30070/vector_routing_hnsw_test/_search?pretty&routing=1" -d '
{
"size": 10,
"query": {
"knn": {
"vector1": {
"vector": [2.2, 2.3, 2.4],
"k": 10
}
}
},
"ext": {"lvector": {"ef_search": "100"}}
}'
sparse_hnsw routing index (sparse vectors)
curl -u <username>:<password> -H 'Content-Type: application/json' -XGET "http://ld-bp1h002998iv8****.lindorm.aliyuncs.com:30070/vector_routing_sparse_test/_search?pretty&routing=1" -d '
{
"size": 10,
"query": {
"knn": {
"vector1": {
"vector": {"indices": [10, 45, 16], "values": [0.5, 0.5, 0.2]},
"k": 10
}
}
},
"ext": {"lvector": {"ef_search": "100"}}
}'
ivfpq routing index
curl -u <username>:<password> -H 'Content-Type: application/json' -XGET "http://ld-bp1h002998iv8****.lindorm.aliyuncs.com:30070/vector_routing_ivfpq_test/_search?pretty&routing=1" -d '
{
"size": 10,
"query": {
"knn": {
"vector1": {
"vector": [2.2, 2.3, 2.4],
"k": 10
}
}
},
"ext": {"lvector": {"nprobe": "2", "reorder_factor": "2","client_refactor":"true"}}
}'
In custom routing key scenarios, you can set nprobe to the nlist value specified when the index was created.
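Setting nprobe equal to nlist means that, in the usual IVF sense, every cluster is probed, which is affordable when each routing key holds little data. The Python sketch below builds the ivfpq query body from the example above with nprobe derived from nlist. The helper name is hypothetical, and reorder_factor and client_refactor are simply copied from the example. The routing value still goes in the URL.

```python
import json
from typing import Sequence


def ivfpq_routing_query(vector: Sequence[float], k: int, nlist: int,
                        reorder_factor: int = 2) -> dict:
    """Build a knn query body whose nprobe equals the index's nlist."""
    return {
        "size": k,
        "query": {"knn": {"vector1": {"vector": list(vector), "k": k}}},
        "ext": {
            "lvector": {
                # nprobe == nlist: probe every cluster of the routing key's data.
                "nprobe": str(nlist),
                "reorder_factor": str(reorder_factor),
                "client_refactor": "true",  # copied from the example above
            }
        },
    }


print(json.dumps(ivfpq_routing_query([2.2, 2.3, 2.4], k=10, nlist=2)))
```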
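Hybrid queries
Before you run a hybrid query, make sure the index has a full-text search field. The following examples query an index named vector_text_hybridSearch; replace it with your own index, such as vector1_routing_hnsw_hybirdSearch created earlier.
Hybrid full-text and vector search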
curl -u <username>:<password> -H "Content-Type: application/json" -XPOST "http://ld-bp1h002998iv8****-proxy-search-vpc.lindorm.rds.aliyuncs.com:30070/vector_text_hybridSearch/_search?pretty&routing=1" -d '{
"size": 10,
"_source": false,
"query": {
"knn": {
"vector1": {
"vector": [2.8, 2.3, 2.4],
"filter": {
"bool": {
"must": [{
"bool": {
"must": [{
"match": {
"text_field": { // Replace with the full-text field to search
"query": "test1 test2"
}
}
},
{
"term": {
"_routing": "1" // Must match the routing value in the URL, for example 1 or user123
}
}]
}
}]
}
},
"k": 10
}
}
},
"ext": {
"lvector": {
"hybrid_search_type": "filter_rrf",
"rrf_rank_constant": "60",
"rrf_knn_weight_factor": "0.5"
}
}
}'
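In these hybrid queries the routing value appears twice: in the URL and in the _routing term of the filter. Building the request from one variable keeps the two in sync. The Python sketch below reproduces the first hybrid example; the function name, its parameters, and the default field names text_field and vector1 are assumptions taken from the examples, and the attribute range filter of the second example is not included.

```python
from urllib.parse import urlencode


def hybrid_routing_search(index: str, routing: str, vector: list, text: str,
                          k: int = 10, text_field: str = "text_field"):
    """Return (path, body) with the same routing value in the URL and the _routing filter."""
    query = urlencode({"routing": routing})
    path = f"/{index}/_search?{query}"
    text_and_tenant = {"bool": {"must": [
        {"match": {text_field: {"query": text}}},
        {"term": {"_routing": routing}},  # must equal the URL routing value
    ]}}
    body = {
        "size": k,
        "_source": False,
        "query": {"knn": {"vector1": {
            "vector": list(vector),
            "filter": {"bool": {"must": [text_and_tenant]}},
            "k": k,
        }}},
        "ext": {"lvector": {
            "hybrid_search_type": "filter_rrf",
            "rrf_rank_constant": "60",
            "rrf_knn_weight_factor": "0.5",
        }},
    }
    return path, body


path, body = hybrid_routing_search("vector_text_hybridSearch", "1", [2.8, 2.3, 2.4], "test1 test2")
print(path)  # /vector_text_hybridSearch/_search?routing=1
```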
Vector, full-text, and attribute filtering
curl -u <username>:<password> -H "Content-Type: application/json" -XPOST "http://ld-bp1h002998iv8****-proxy-search-vpc.lindorm.rds.aliyuncs.com:30070/vector_text_hybridSearch/_search?pretty&routing=1" -d '{
"size": 10,
"_source": false,
"query": {
"knn": {
"vector1": {
"vector": [2.8, 2.3, 2.4],
"filter": {
"bool": {
"must": [{
"bool": {
"must": [{
"match": {
"text_field": { // Replace with the full-text field to search
"query": "test1 test2"
}
}
},
{
"term": {
"_routing": "1" // Must match the routing value in the URL, for example 1 or user123
}
}]
}
},
{
"bool": {
"filter": [{
"range": {
"field1": {
"gt": 2
}
}
}]
}
}]
}
},
"k": 10
}
}
},
"ext": {
"lvector": {
"hybrid_search_type": "filter_rrf",
"rrf_rank_constant": "60",
"rrf_knn_weight_factor": "0.5",
"filter_type": "efficient_filter"
}
}
}'