OSS data source

Prerequisites

Add an OSS data source

Go to the OpenSearch console, switch to OpenSearch Vector Search Edition in the upper-left corner, find your instance on the instance list page, and click Manage in the Actions column.
On the Configuration Center > Data Source page, click Add Data Source. In the dialog box that appears, select OSS, enter the Data Source Name, OSS Path, and Bucket, and then click Verify.

Parameters:

Data Source Name: A custom name for the data source. (The name must start with a letter and can contain letters, digits, and underscores (_).)
OSS Path: The path to access the objects in the OSS bucket.
Bucket: The name of the OSS bucket.

Note:

The directory name in the OSS path must contain the string opensearch. Otherwise, the system cannot read the data. The name cannot contain special characters such as =, &, or ?.
To obtain the OSS path, go to your bucket in the OSS console, create a directory, and use the path of that directory. For example: /opensearch_index_data/.

To find the bucket name, log on to the OSS console and click Buckets in the left-side navigation pane. Find the target bucket, confirm its name, and enter it in the Bucket field of the data source configuration. Use the name of a bucket that you created in OSS.
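The naming rules above can be checked locally before you fill in the dialog box. The following is a minimal sketch, not part of the console or any SDK: the function names are hypothetical, letters are assumed to be ASCII, and the check accepts a path if any directory in it contains opensearch, because the text does not say which directory level must carry the string.

```python
import re

# Rule from the parameter description: starts with a letter, followed by
# letters, digits, or underscores. ASCII-only letters are an assumption.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Characters the note names as not allowed. The note says "such as",
# so the real list may be longer.
_FORBIDDEN_PATH_CHARS = set("=&?")


def check_data_source_name(name: str) -> bool:
    """Return True if the name matches the documented naming rule."""
    return _NAME_RE.fullmatch(name) is not None


def check_oss_path(path: str) -> bool:
    """Return True if some directory in the path contains 'opensearch'
    and the path has none of the listed special characters."""
    if any(ch in _FORBIDDEN_PATH_CHARS for ch in path):
        return False
    directories = [part for part in path.split("/") if part]
    return any("opensearch" in part for part in directories)
```

For example, check_oss_path("/opensearch_index_data/") passes, while a path such as /index_data/ is rejected because no directory name contains opensearch.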

Add an index table:
1. After creating the OSS data source, go to Configuration Center > Index Schema to add an index table.
2. Configure the index table. Specify a custom Index Table Name and select the OSS data source that you configured.

Field settings:
pk: Type STRING. Select Primary Key.
embeddings: Type FLOAT. Select Multi-value and set the delimiter to ^].
For both fields, select Attribute Field and Display in Search Results. Set Data Compression to No Compression and Analyzer to uniq.

Index settings:
pk index: Type PRIMARYKEY64. Includes field: pk.
vector index: Type CUSTOMIZED. Includes fields: pk and embeddings.

This example configures two fields: pk and embeddings. For sample data, see oss_test.txt.

CMD=add
pk=999000
embeddings=0.00.0039257140.0098142860.0039257140.00
pk=999000
embeddings=0.00.0039257140

The data format for the HA schema is described below.
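As an illustration, records for the pk and embeddings fields could be written with a few lines of Python. This is a sketch under assumptions: it uses the control-character delimiters listed in the File delimiters section, the function names are hypothetical, and formatting each float with repr() is a choice made here, not a documented requirement.

```python
KV_SEP = "\x1f\n"   # key-value delimiter, displayed as ^_ plus a line break
CMD_SEP = "\x1e\n"  # command delimiter, displayed as ^^ plus a line break
MULTI_SEP = "\x1d"  # multi-value delimiter, displayed as ^]


def vector_add_command(pk, embedding):
    """Build one add command with a pk field and a multi-value embeddings field."""
    values = MULTI_SEP.join(repr(float(x)) for x in embedding)
    lines = ["CMD=add", f"pk={pk}", f"embeddings={values}"]
    return "".join(line + KV_SEP for line in lines) + CMD_SEP


def write_vector_file(path, records):
    """Write (pk, embedding) pairs to a UTF-8 file.

    newline="" keeps "\n" from being translated to "\r\n" on Windows,
    which would change the delimiter bytes.
    """
    with open(path, "w", encoding="utf-8", newline="") as f:
        for pk, embedding in records:
            f.write(vector_add_command(pk, embedding))
```

The resulting file can then be uploaded to the directory that the OSS Path of the data source points to.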

Reindexing: Go to the O&M Center > O&M Management > Reindexing page to start reindexing.

In the Reindexing dialog box, select a Data Source Name, such as ha-cn-9lb3aulas01_oss_test, confirm the Data Source Type and Associated Index Table, set the Timestamp, and then click OK.

After reindexing is complete, you can run test queries.

HA file format

The data source file for indexing must be UTF-8 encoded and adhere to the format described in this section.

The following example shows the content of a complete data file named standard_sample.data.

CMD=add^_
PK=12345321^_
url=http://www.aliyun.com/index.html^_
title=Alibaba Cloud Computing Co., Ltd.^_
body=xxxxxx xxx^_
time=3123423421^_
multi_value_field=1234^]324^]342^_
bidwords=mp3^\price=35.8^Ptime=13867236221^]mp4^\price=32.8^Ptime=13867236221^_
^^
CMD=delete^_
PK=12345321^_

The data file above contains two commands: add and delete. Each command consists of multiple lines, and each line is a key-value pair. Commands are separated by '^^\n', key-value pairs by '^_\n', and multiple values by '^]'. See below for details.
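To make the layout concrete, the sketch below reads a data file back into a list of commands. It is an illustrative reader written for this page, not the parser OpenSearch uses, and parse_ha_file and split_multi_value are hypothetical names. It assumes the delimiters are the control characters 0x1E, 0x1F, and 0x1D that ^^, ^_, and ^] stand for.

```python
def parse_ha_file(data: bytes) -> list:
    """Split a data file into commands, each returned as a dict of fields."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError if not UTF-8
    commands = []
    for block in text.split("\x1e\n"):      # command delimiter (^^ + newline)
        fields = {}
        for pair in block.split("\x1f\n"):  # key-value delimiter (^_ + newline)
            if not pair.strip():
                continue                    # skip the empty tail after the last pair
            # Split on the first "=" only, so values may contain "=" themselves.
            key, _, value = pair.partition("=")
            fields[key] = value
        if fields:
            commands.append(fields)
    return commands


def split_multi_value(value: str) -> list:
    """Split a multi-value field on the multi-value delimiter (^])."""
    return value.split("\x1d")
```

Applied to the sample file, this yields two dictionaries, one whose CMD is add and one whose CMD is delete.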

File delimiters

C++ encoding	ASCII (hexadecimal)	Description	Display in Emacs/Vi	Input method in Emacs	Input method in Vi
"\x1F\n"	1F0A	Key-value delimiter	^_ (followed by a line break)	C-q C-7	C-v C-7
"\x1E\n"	1E0A	Command delimiter	^^ (followed by a line break)	C-q C-6	C-v C-6
"\x1D"	1D	Multi-value delimiter	^]	C-q C-5	C-v C-5
"\x1C"	1C	Section weight identifier	^\	C-q C-4	C-v C-4
"\x1D"	1D	Section delimiter	^]	C-q C-5	C-v C-5
"\x03"	03	Sub-document field delimiter	^C	C-q C-c	C-v C-c
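The Emacs/Vi display column uses caret notation, in which ^X denotes the control character whose code is the code of X with bit 0x40 cleared. These are single control bytes, not a literal caret followed by another character. The following sketch derives the table's byte values from that rule. The helper and dictionary names are hypothetical.

```python
def caret_to_char(caret: str) -> str:
    """Convert caret notation such as '^_' to the control character it denotes."""
    if len(caret) != 2 or caret[0] != "^":
        raise ValueError(f"not caret notation: {caret!r}")
    # Flipping bit 0x40 maps '_' (0x5F) to 0x1F, ']' (0x5D) to 0x1D, and so on.
    return chr(ord(caret[1].upper()) ^ 0x40)


# Delimiters from the table, built from their Emacs/Vi display form.
DELIMITERS = {
    "key_value": caret_to_char("^_") + "\n",
    "command": caret_to_char("^^") + "\n",
    "multi_value": caret_to_char("^]"),
    "section_weight": caret_to_char("^\\"),
    "sub_document_field": caret_to_char("^C"),
}
```

Encoding DELIMITERS["key_value"] as UTF-8 gives the two bytes 1F 0A, which matches the hexadecimal column of the table.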

Command format definition
Add command format

Use the add command to add content to an index.

The first line of an add command must be CMD=add, followed by the document's fields. The fields can be in any order. All fields in the command must be defined in the schema.

CMD=add^_
PK=12345321^_
url=http://www.aliyun.com/index.html^_
title=Alibaba Cloud Computing Co., Ltd.^_
body=xxxxxx xxx^_
time=3123423421^_
multi_value_field=1234^]324^]342^_
bidwords=mp3^\price=35.8^Ptime=13867236221^]mp4^\price=32.8^Ptime=13867236221^_
^^
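The rules for an add command can be sketched as a small builder that rejects fields missing from the schema. This is an illustration only: build_add_command is a hypothetical helper, schema_fields stands in for the field names of your index table, and list values are joined with the multi-value delimiter.

```python
def build_add_command(fields: dict, schema_fields: set) -> str:
    """Serialize one add command; fields keep the order of the dict."""
    unknown = set(fields) - set(schema_fields)
    if unknown:
        # Every field in the command must be defined in the schema.
        raise ValueError(f"fields not defined in the schema: {sorted(unknown)}")
    lines = ["CMD=add"]  # CMD=add must be the first line
    for name, value in fields.items():
        if isinstance(value, (list, tuple)):
            value = "\x1d".join(str(v) for v in value)  # multi-value delimiter (^])
        lines.append(f"{name}={value}")
    # Each key-value pair ends with ^_ + newline; the command ends with ^^ + newline.
    return "".join(line + "\x1f\n" for line in lines) + "\x1e\n"
```

Checking field names at build time surfaces schema mismatches before the file is uploaded and reindexed.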

Delete command format

Use the delete command to remove content from an index.

The first line of a delete command must be CMD=delete. The subsequent lines must specify the field that is defined as the primary key in the index schema, as well as any fields used for partition hashing.

If the primary key and the hash field are the same, you only need to specify the field once.

CMD=delete^_
PK=12345321^_
^^
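A delete command can be built the same way. In this sketch, build_delete_command and its parameters are hypothetical names. The hash field is written only when it differs from the primary key field, which follows the rule that a shared field needs to be specified once.

```python
def build_delete_command(pk_field, pk_value, hash_field=None, hash_value=None):
    """Serialize one delete command with the primary key and, if it is a
    different field, the field used for partition hashing."""
    lines = ["CMD=delete", f"{pk_field}={pk_value}"]  # CMD=delete comes first
    if hash_field is not None and hash_field != pk_field:
        lines.append(f"{hash_field}={hash_value}")
    # Each key-value pair ends with ^_ + newline; the command ends with ^^ + newline.
    return "".join(line + "\x1f\n" for line in lines) + "\x1e\n"
```

Calling build_delete_command("PK", "12345321") produces the same bytes as the example above, with ^_ and ^^ replaced by the actual control characters.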

Delete an OSS data source

On the Data Source Configuration page, find the data source you want to delete and click Delete.

Important:

Deletion is irreversible. A deleted data source cannot be recovered, so proceed with caution.

If an OSS data source is associated with an index table, you must delete the index table before you can delete the data source.

Usage notes

The OSS service must be activated in the same region as your OpenSearch instance.
OpenSearch Vector Search Edition does not support OSS buckets that do not have a region attribute.
When you add an OSS data source, a service-linked role named AliyunServiceRoleForSearchEngine is automatically created if it does not already exist. OpenSearch uses this role to access your resources in other cloud services.