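The OpenAPI wraps the vector DDL and DML operations of AnalyticDB for PostgreSQL, so you can manage vector data through API calls. This topic uses the Java SDK to show how to import and query vector data through the API.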

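Prerequisites

Procedure

  1. Initialize the vector database

  2. Create a Namespace

  3. Create a Collection

  4. Upload vector data

  5. Recall vector data

Initialize the vector database

Before you use vector search, initialize the knowledgebase database and the full-text search features.

Sample call: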

InitVectorDatabaseRequest request = new InitVectorDatabaseRequest();
request.setDBInstanceId("gp-bp1c62r3l489****");
request.setManagerAccount("myaccount");
request.setManagerAccountPassword("myaccount_password");
request.setRegionId("ap-southeast-1");
InitVectorDatabaseResponse response = client.getAcsResponse(request);
System.out.println(new Gson().toJson(response));

For parameter details, see InitVectorDatabase - Initialize the vector database.
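The samples in this topic use a `client` object and hard-coded passwords without showing where the credentials come from. The sketch below is one possible way to load an AccessKey pair from environment variables before building the client. The class name `AccessKeyConfig` is hypothetical, and the variable names `ALIBABA_CLOUD_ACCESS_KEY_ID` and `ALIBABA_CLOUD_ACCESS_KEY_SECRET` are assumed conventions, not something this topic specifies.

```java
import java.util.Map;

/** Hypothetical holder for the AccessKey pair needed to build the SDK client. */
final class AccessKeyConfig {
    final String accessKeyId;
    final String accessKeySecret;

    private AccessKeyConfig(String accessKeyId, String accessKeySecret) {
        this.accessKeyId = accessKeyId;
        this.accessKeySecret = accessKeySecret;
    }

    /** Reads both values from the given map and fails fast if either is missing or blank. */
    static AccessKeyConfig fromEnv(Map<String, String> env) {
        // Variable names are an assumption; adjust them to your own deployment.
        return new AccessKeyConfig(
                require(env, "ALIBABA_CLOUD_ACCESS_KEY_ID"),
                require(env, "ALIBABA_CLOUD_ACCESS_KEY_SECRET"));
    }

    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value.trim();
    }
}
```

A caller would typically use `AccessKeyConfig.fromEnv(System.getenv())` and pass the two fields to the SDK core library, for example `DefaultProfile.getProfile(regionId, accessKeyId, accessKeySecret)` followed by `new DefaultAcsClient(profile)`. The same approach can be applied to the manager account and Namespace passwords.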

Create a Namespace

A Namespace provides schema isolation. Before you use vectors, create at least one Namespace or use the public Namespace.

Sample call:

CreateNamespaceRequest request = new CreateNamespaceRequest();
request.setDBInstanceId("gp-bp1c62r3l489****");
request.setManagerAccount("myaccount");
request.setManagerAccountPassword("myaccount_password");
request.setNamespace("vector_test");
request.setNamespacePassword("vector_test_password");
request.setRegionId("ap-southeast-1");
CreateNamespaceResponse response = client.getAcsResponse(request);
System.out.println(new Gson().toJson(response));

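For parameter details, see CreateNamespace - Create a namespace.

After the Namespace is created, you can view the corresponding schema in the knowledgebase database of the instance.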

SELECT schema_name FROM information_schema.schemata;

Create a Collection

A Collection stores vector data and is isolated by Namespace.

Sample call:

Map<String,String> metadata = new HashMap<>();
metadata.put("title", "text");
metadata.put("link", "text");
metadata.put("content", "text");
metadata.put("pv", "int");
List<String> fullTextRetrievalFields = Arrays.asList("title", "content");

CreateCollectionRequest request = new CreateCollectionRequest();
request.setDBInstanceId("gp-bp1c62r3l489****");
request.setManagerAccount("myaccount");
request.setManagerAccountPassword("myaccount_password");
request.setNamespace("vector_test");
request.setCollection("document");
request.setDimension(10L);
request.setFullTextRetrievalFields(StringUtils.join(fullTextRetrievalFields, ","));
request.setMetadata(new Gson().toJson(metadata));
request.setParser("zh_cn");
request.setRegionId("ap-southeast-1");
CreateCollectionResponse response = client.getAcsResponse(request);
System.out.println(new Gson().toJson(response));

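For parameter details, see CreateCollection - Create a vector collection.

After the Collection is created, you can view the corresponding table in the knowledgebase database of the instance.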

SELECT tablename FROM pg_tables WHERE schemaname='vector_test';
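The Collection sample above passes metadata field definitions and a comma-separated list of full-text retrieval fields. The sketch below shows a local check that could run before the request is sent. It assumes that every full-text retrieval field must also be declared as a metadata key; `CollectionSchemaCheck` and its methods are hypothetical helpers, not part of the SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-flight checks for a Collection definition. */
final class CollectionSchemaCheck {

    /** Returns the full-text fields that are not declared as metadata keys. */
    static List<String> missingFields(Map<String, String> metadata, List<String> fullTextFields) {
        List<String> missing = new ArrayList<String>();
        for (String field : fullTextFields) {
            if (!metadata.containsKey(field)) {
                missing.add(field);
            }
        }
        return missing;
    }

    /** Joins field names with commas using only the JDK (no StringUtils needed). */
    static String joinFields(List<String> fields) {
        return String.join(",", fields);
    }
}
```

If `missingFields` returns a non-empty list, the definition can be corrected before any API call is made. `joinFields` produces the same comma-separated value that the sample builds with `StringUtils.join`.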

Upload vector data

Upload the prepared embedding vectors to the corresponding Collection.

Sample call:

UpsertCollectionDataRequest request = new UpsertCollectionDataRequest();
request.setDBInstanceId("gp-bp1c62r3l489****");
request.setCollection("document");
request.setNamespace("vector_test");
request.setNamespacePassword("vector_test_password");
request.setRegionId("ap-southeast-1");

List<UpsertCollectionDataRequest.UpsertCollectionDataRequestRows> rows = new ArrayList<>();
UpsertCollectionDataRequest.UpsertCollectionDataRequestRows row = new UpsertCollectionDataRequest.UpsertCollectionDataRequestRows();
row.setId("0CB55798-ECF5-4064-B81E-FE35B19E01A6");
row.setVector(Arrays.asList(0.2894745251078251,0.5364747050266715,0.1276845661831275,0.22528871956822372,0.7009319238651552,0.40267406135256123,0.8873626696379067,0.1248525955774931,0.9115507046412368,0.2450859133174706));
Map<String, String> rowsMetadata = new HashMap<>();
rowsMetadata.put("title", "测试文档");
rowsMetadata.put("content","测试内容");
rowsMetadata.put("link","http://127.0.0.1/document1");
rowsMetadata.put("pv","1000");
row.setMetadata(rowsMetadata);
rows.add(row);
request.setRows(rows);
UpsertCollectionDataResponse response = client.getAcsResponse(request);
System.out.println(new Gson().toJson(response));

For parameter details, see UpsertCollectionData - Upload vector data.

After the upload completes, you can view the data in the knowledgebase database of the instance.

SELECT * FROM vector_test.document;
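The Collection in this topic is created with a dimension of 10, and each uploaded row carries a 10-element vector. The sketch below shows one way to validate vectors and split rows into batches before calling the upload API. `VectorBatching` is a hypothetical helper, and the batch size is an arbitrary choice for illustration; this topic does not state a per-request row limit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers for preparing rows before an upload call. */
final class VectorBatching {

    /** Rejects vectors whose length differs from the Collection dimension or that hold invalid numbers. */
    static void requireDimension(List<Double> vector, int dimension) {
        if (vector == null || vector.size() != dimension) {
            throw new IllegalArgumentException("Expected a vector of length " + dimension);
        }
        for (Double value : vector) {
            if (value == null || value.isNaN() || value.isInfinite()) {
                throw new IllegalArgumentException("Vector contains a null, NaN, or infinite value");
            }
        }
    }

    /** Splits rows into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<List<T>>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            batches.add(new ArrayList<T>(rows.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch returned by `partition` could then be passed to `request.setRows(...)` in its own upload request.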

Recall vector data

Prepare the query vector or the full-text search content to recall, then call the query API.

Sample call:

QueryCollectionDataRequest request = new QueryCollectionDataRequest();
request.setDBInstanceId("gp-bp1c62r3l489****");
request.setCollection("document");
request.setNamespace("vector_test");
request.setNamespacePassword("vector_test_password");
request.setContent("测试");
request.setFilter("pv > 10");
request.setTopK(10L);
request.setVector(Arrays.asList(0.7152607422256894,0.5524872066437732,0.1168505269851303,0.704130971473022,0.4118874999967596,0.2451574619214022,0.18193414783144812,0.3050522957905741,0.24846180714868163,0.0549715380856951));
request.setRegionId("ap-southeast-1");
QueryCollectionDataResponse response = client.getAcsResponse(request);
System.out.println(new Gson().toJson(response));

The response is as follows:

{
  "Matches": {
    "match": [
      {
        "Id": "0CB55798-ECF5-4064-B81E-FE35B19E01A6",
        "Metadata": {
          "title":"测试文档",
          "content":"测试内容",
          "link":"http://127.0.0.1/document1",
          "pv":"1000"
        },
        "Values": [
           0.2894745251078251,
           0.5364747050266715,
           0.1276845661831275,
           0.22528871956822372,
           0.7009319238651552,
           0.40267406135256123,
           0.8873626696379067,
           0.1248525955774931,
           0.9115507046412368,
           0.2450859133174706
        ]
      }
    ]
  },
  "RequestId": "ABB39CC3-4488-4857-905D-2E4A051D0521",
  "Status": "success"
}
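To sanity-check a match locally, the query vector can be compared with the returned Values. The sketch below computes cosine similarity as one common measure. This is an assumption made for illustration: the metric a Collection actually uses is determined when the Collection is created and is not stated in this topic. `VectorSimilarity` is a hypothetical helper.

```java
import java.util.List;

/** Hypothetical helper for comparing two vectors on the client side. */
final class VectorSimilarity {

    /** Returns the cosine similarity of two equal-length, non-zero vectors. */
    static double cosine(List<Double> a, List<Double> b) {
        if (a.isEmpty() || a.size() != b.size()) {
            throw new IllegalArgumentException("Vectors must be non-empty and of equal length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.size(); i++) {
            double x = a.get(i);
            double y = b.get(i);
            dot += x * y;
            normA += x * x;
            normB += y * y;
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Passing the query vector and a match's Values to `cosine` gives a number between -1 and 1, where values closer to 1 indicate vectors pointing in more similar directions.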

Related documents

Vector search