OpenSearch Vector Retrieval

A hands-on walkthrough of a pure vector retrieval scenario with OpenSearch.

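1. What is vector retrieval

AI algorithms can abstract the unstructured data produced by people, objects, and scenes in the physical world (speech, images, video, text, behavior, and so on) into multi-dimensional vectors. These vectors act like coordinates in a mathematical space, identifying entities and the relationships between them. The process of turning unstructured data into vectors is generally called embedding, and unstructured retrieval is the process of searching over the generated vectors to find the corresponding entities.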
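To make the idea of searching over vectors concrete, here is a minimal sketch that ranks a handful of documents by cosine similarity to a query vector. It is an illustration only: the three-dimensional vectors and document names are invented, real embeddings come from a model and have far more dimensions, and a production service such as OpenSearch relies on a vector index rather than the exhaustive scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=2):
    """Return the k (doc_id, score) pairs most similar to query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional "embeddings", made up for this example.
corpus = {
    "cat_photo": [0.9, 0.1, 0.0],
    "dog_photo": [0.8, 0.3, 0.1],
    "invoice_pdf": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.0, 0.0], corpus, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

The exhaustive scan costs time proportional to the corpus size for every query, which is why large-scale systems build approximate nearest-neighbor indexes instead.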

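Unstructured retrieval is, in essence, vector retrieval. Its main application areas include face recognition, recommender systems, image search, video fingerprinting, speech processing, natural language processing, and file search. As AI is adopted more widely and data volumes keep growing, vector retrieval has become an indispensable link in the AI technology chain. It also complements traditional search and supports multimodal search.

To serve more diverse and more complex multimodal retrieval scenarios, OpenSearch provides a vector retrieval feature that lets you build a high-performance vector retrieval system in one place.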

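2. Creating an OpenSearch instance

Step 1: Click “Buy Now”.

Step 2: Configure the instance specification parameters.

Configuration notes

  • Product type: pay-as-you-go (pay-as-you-go can be used during testing);

  • Region and zone: China East 1 (Hangzhou) (customizable);

  • Application name: test_vector_opensearch (customizable);

  • Edition: general-purpose edition;

  • Specification: select 10 GB, 1000 LCU, dedicated compute type (the lowest configuration), then click “Buy Now”.

Step 3: Confirm the order: select “I have read and agree”, then click “Confirm Activation”.

The OpenSearch instance is now created.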

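3. Configuring the vector recall service instance

Configuring an application in the OpenSearch console follows these steps in order: feature selection --> application structure --> index structure --> data source --> finish.

3.1. Application structure

In the OpenSearch console, go to Application Management --> Application List, find the application, and click “Configure”.

Step 1: Configure the application structure

There are four ways to create the application structure: from a data source, manually, by uploading a template, or by uploading a document. This walkthrough uses MaxCompute as the example: click Create from Data Source, select MaxCompute, and click New Database.

Fill in the database connection information:

Step 2: Select the corresponding table and click Confirm.

Step 3: Select the primary table and primary key. If you need to join multiple tables, see the multi-table join documentation.

Note: the vector field must be set to the double array type.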

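3.2. Index structure

  1. Index field description

After the application structure is configured, the system automatically generates the index fields along with their analyzers, index tags, and included fields:

Note: the vector field (vector_field) needs a vector index configured here. Choose the dimension according to your needs; by default OpenSearch supports 64-, 128-, 256-, 512-, and 1536-dimensional vectors.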
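Because the vector field is a double array with a fixed dimension, it can help to check vectors on the client side before importing them. The sketch below is one possible pre-import check, not part of OpenSearch: `validate_vector` is a hypothetical helper, and the dimension list simply mirrors the defaults named above (whether other dimensions can be configured is not covered here).

```python
import math

# Dimensions the text above lists as supported by default.
DEFAULT_DIMENSIONS = (64, 128, 256, 512, 1536)


def validate_vector(values, expected_dim):
    """Return values as a list of floats, or raise ValueError if unusable.

    Hypothetical helper: checks that the dimension is one of the defaults,
    that the vector has exactly that many components, and that every
    component is a finite number.
    """
    if expected_dim not in DEFAULT_DIMENSIONS:
        raise ValueError(f"dimension {expected_dim} is not one of {DEFAULT_DIMENSIONS}")
    vector = [float(v) for v in values]
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} components, got {len(vector)}")
    if not all(math.isfinite(v) for v in vector):
        raise ValueError("vector contains NaN or infinite components")
    return vector


# Usage: a 64-dimensional vector of integers is converted to floats.
checked = validate_vector([1] * 64, 64)
print(len(checked), type(checked[0]).__name__)
```

Rejecting malformed vectors before the import keeps dimension mismatches from surfacing later as indexing or query errors.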

  2. Attribute field and default display field description

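3.3. Data source

If you selected a MaxCompute data source when configuring the application structure, the corresponding project table is mapped automatically here. You only need to fill in the partition import condition as required; if it is left blank, all partitions of the table are imported by default:

If a field name in the data source table differs from the name configured in the application structure, click the edit button to change the field mapping manually:

After confirming that everything is correct, click Finish:

3.4. Configuration complete

4. Online queries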

For the vector query syntax, see the vector query syntax documentation.

  • Search from the search test page: Extended Features > Search Test

# A 1536-dimensional vector is used here; not all values are shown
vector_index:'-0.01786,0.03692,0.03710,0.01668,0.03655,-0.03515,0.02017,-0.00653,-0.01419,-0.01708,-0.00091,-0.03528,0.02821,-0.02194,-0.01609,-0.02045,0.02209,0.06413,0.06233,0.03064,-0.00863,-0.06810,0.00729,0.07912,-0.03948,0.06932,0.02051,-0.00688,-0.01138,0.03207,0.03040,-0.00050,0.06220,-0.03895,0.04575,-0.00259,0.04358,0.02027,0.03342,-0.02916,0.04793,-0.02954,0.04327,0.06156,-0.00230,0.00653,0.01515,-0.00287,0.03546,-0.01551,-0.03049,0.07542,-0.01563,0.00680,0.00598,-0.00396,0.00330,0.00359,-0.03395,-0.00825,-0.02175,0.04479,0.04008,0.03558,-0.03011,-0.00015,0.03086,-0.00941,0.03113,0.00758,-0.04333,0.04607,-0.02520,-0.01260,-0.04726,0.00564,-0.02423,-0.00439,-0.02739,-0.01674,0.06426,-0.05995,0.01762,0.04370,0.02211,-0.03174,0.04465,0.00475,-0.03577,0.01111,-0.00963,0.03510,-0.02533,-0.00444,0.00161,0.00561,0.00066,-0.04074,0.00682,0.03293,-0.01630,-0.02575,0.02834,0.02679,-0.04558,0.02395,0.00531,0.01240,0.04064,0.03599,0.00172,0.00413,-0.06839...&sf=0.8'
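Typing a 1536-value clause by hand is error-prone, so the query string is usually generated. The sketch below assembles a clause in the same shape as the example above (index name, colon, comma-separated values in single quotes, optional `&sf=` suffix). `build_vector_clause` is a hypothetical helper, the five-decimal formatting only mirrors the example, and the exact meaning of `sf` should be confirmed against the vector query syntax documentation.

```python
def build_vector_clause(index_name, vector, sf=None):
    """Format a vector query clause shaped like the example above.

    Hypothetical helper: index_name is the vector index to query, vector is
    a sequence of numbers, and sf (optional) is appended as "&sf=<value>".
    """
    if not vector:
        raise ValueError("vector must not be empty")
    # Five decimal places mirrors the example; this precision is an assumption.
    values = ",".join(f"{float(v):.5f}" for v in vector)
    suffix = f"&sf={sf}" if sf is not None else ""
    return f"{index_name}:'{values}{suffix}'"


# Usage with a deliberately short vector; a real query needs every component.
clause = build_vector_clause("vector_index", [-0.01786, 0.03692, 0.03710], sf=0.8)
print(clause)
```

The resulting string can be pasted into the search test page or passed as the query clause by whatever client issues the request.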