Vector index best practices

Set up a vector index in OpenSearch Retrieval Engine Edition and run vector queries using the SDK. This guide covers the full configuration workflow: table setup, data synchronization, index schema definition, and index rebuilding.

Prerequisites

Before you begin, make sure you have:

A purchased OpenSearch Retrieval Engine Edition instance. For more information, see Purchase an OpenSearch Retrieval Engine Edition instance.
Vector data ready to ingest (pre-generated vectors, or raw data you plan to vectorize)
Credentials for your data source (for example, AccessKey ID and AccessKey Secret for MaxCompute)

How it works

After you purchase an instance, OpenSearch automatically deploys an empty cluster that matches the number and specifications of the query nodes and data nodes you purchased. The instance shows the Pending configuration status until you complete the setup.

Complete the following steps in order:

Step	What you configure
1. Table basic information	Table name, shard count, and data update resources
2. Data synchronization	Data source connection and initial data load
3. Index schema	Fields, including the required primary key and vector fields
4. Index rebuilding	Trigger the initial full index build

Configure table basic information

Set the Table name, Number of shards, and Number of data update resources.

Warning

The shard count is capped at 256. Choose carefully.

Note

Keep the shard count at or below three times the number of data nodes in your instance.
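You can check both shard limits before you submit the form. The following Python sketch is a hypothetical helper, not part of OpenSearch; it encodes only the 256-shard cap and the three-times-data-nodes guideline stated in this section.

```python
MAX_SHARDS = 256  # hard cap stated in the warning above


def max_recommended_shards(data_nodes: int) -> int:
    """Largest shard count that respects both the cap and the 3x guideline."""
    if data_nodes < 1:
        raise ValueError("an instance needs at least one data node")
    return min(MAX_SHARDS, 3 * data_nodes)


def check_shard_count(shards: int, data_nodes: int) -> list:
    """Return human-readable problems; an empty list means no limit is violated."""
    problems = []
    if shards < 1:
        problems.append("shard count must be at least 1")
    if shards > MAX_SHARDS:
        problems.append(f"{shards} shards exceeds the cap of {MAX_SHARDS}")
    if shards > 3 * data_nodes:
        problems.append(f"{shards} shards is more than 3x the {data_nodes} data node(s)")
    return problems


print(max_recommended_shards(4))  # 12
print(check_shard_count(16, 4))   # ['16 shards is more than 3x the 4 data node(s)']
```

With four data nodes, the guideline limits the table to 12 shards; the 256 cap only becomes the binding limit once an instance has more than 85 data nodes.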

Each table includes two data update resources free of charge. If a table uses n data update resources in total, you are billed for n - 2 of them.
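As a worked example of the billing rule, the following Python sketch computes the number of billed data update resources. Clamping the result at zero for tables with fewer than two resources is an assumption, and the table names in the example are hypothetical.

```python
FREE_UPDATE_RESOURCES_PER_TABLE = 2  # free allowance per table, per this section


def billable_update_resources(total: int) -> int:
    """Billed data update resources for one table: n - 2, clamped at zero (assumed)."""
    if total < 0:
        raise ValueError("resource count cannot be negative")
    return max(0, total - FREE_UPDATE_RESOURCES_PER_TABLE)


def billable_for_instance(resources_per_table: dict) -> int:
    """The free allowance applies per table, so sum the per-table results."""
    return sum(billable_update_resources(n) for n in resources_per_table.values())


for n in (2, 3, 5):
    print(n, "->", billable_update_resources(n))  # 2 -> 0, 3 -> 1, 5 -> 3

# Two hypothetical tables: 4 resources (2 billed) and 2 resources (0 billed).
print(billable_for_instance({"docs": 4, "images": 2}))  # 2
```

Because the free allowance is per table, two tables with four resources each are billed for four resources in total, not six.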

Connect a data source

OpenSearch supports three types of full data sources. Choose one based on where your data is stored and how it is updated:

Data source	Best for
MaxCompute	Large-scale batch data already in MaxCompute tables
API push	Real-time or incremental data pushed from your application
Object Storage Service (OSS)	Data stored as files in OSS buckets

To add a data source:

1. Go to Data synchronization and click Add data source.
2. Select the data source type.
3. Fill in the required connection details.

For MaxCompute, provide the project name, AccessKey ID, AccessKey Secret, table name, and partition key. Consider enabling Automatic index rebuilding.
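Before you fill in the form, it can help to collect the MaxCompute values in one place and confirm that none are empty. The following Python sketch is a local checklist only, not an OpenSearch SDK or console API. The class, the project, table, and partition key names, and the environment variable names `ALIBABA_CLOUD_ACCESS_KEY_ID` and `ALIBABA_CLOUD_ACCESS_KEY_SECRET` are all assumptions for illustration.

```python
import os
from dataclasses import dataclass, field


@dataclass
class MaxComputeSource:
    """Hypothetical local record of the values the console asks for."""
    project: str
    access_key_id: str
    access_key_secret: str = field(repr=False)  # keep the secret out of logs
    table: str
    partition_key: str
    auto_index_rebuild: bool = False

    def missing_fields(self) -> list:
        """Names of required values that are empty."""
        required = ("project", "access_key_id", "access_key_secret", "table", "partition_key")
        return [name for name in required if not getattr(self, name)]


def source_from_env(project: str, table: str, partition_key: str,
                    auto_index_rebuild: bool = False) -> MaxComputeSource:
    # Assumed variable names: read credentials from the environment, not source code.
    return MaxComputeSource(
        project=project,
        access_key_id=os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID", ""),
        access_key_secret=os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET", ""),
        table=table,
        partition_key=partition_key,
        auto_index_rebuild=auto_index_rebuild,
    )


# Hypothetical project, table, and partition key names.
src = source_from_env("my_project", "vector_docs", "ds", auto_index_rebuild=True)
missing = src.missing_fields()
print("missing:", missing if missing else "none")
```

Keeping the AccessKey Secret out of `repr` means the record can be logged while you debug the connection without exposing the credential.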

Define the index schema

In Index schema, define all fields your index will use.

Required fields

Every vector index requires the following two fields:

Field	Type	Notes
Primary key field	Any	Uniquely identifies each document
Vector field	Multi-value float	Stores the vector embeddings

Warning

The vector field must be configured as a multi-value float type. Other types are not supported.
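The two required-field rules can be expressed as a quick pre-check. The following Python sketch models the schema as a hypothetical list of field definitions; the `FieldDef` class and the type labels such as `"INT64"` and `"FLOAT"` are illustrative and do not reproduce the console's own schema format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    """Hypothetical stand-in for one field in the index schema."""
    name: str
    type: str                  # illustrative labels such as "INT64", "FLOAT", "STRING"
    multi_value: bool = False


def check_required_fields(fields, primary_key: str, vector_field: str) -> list:
    """Check the two required fields; return a list of problems (empty if none)."""
    by_name = {f.name: f for f in fields}
    problems = []
    if primary_key not in by_name:  # the primary key may be of any type
        problems.append(f"primary key field '{primary_key}' is not defined")
    vec = by_name.get(vector_field)
    if vec is None:
        problems.append(f"vector field '{vector_field}' is not defined")
    elif vec.type != "FLOAT" or not vec.multi_value:
        problems.append(f"vector field '{vector_field}' must be multi-value FLOAT")
    return problems


schema = [FieldDef("id", "INT64"), FieldDef("embedding", "FLOAT", multi_value=True)]
print(check_required_fields(schema, "id", "embedding"))  # []
```

A single-value FLOAT field, or a multi-value field of any other type, fails the vector-field check, which matches the warning above.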

Optional fields

To filter vector results by category, add a category field. Set its type to single-value or multi-value integer.
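Conceptually, a category filter restricts the candidate documents to those whose category value matches before the nearest vectors are selected. The following Python sketch is a brute-force model of that behavior only. The squared Euclidean metric and the document layout are assumptions, and the engine's actual index structures and filter execution are not shown here and may differ.

```python
import heapq


def squared_l2(u, v) -> float:
    """Squared Euclidean distance (an assumed metric, chosen for illustration)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum((a - b) ** 2 for a, b in zip(u, v))


def filtered_search(docs, query, category: int, k: int) -> list:
    """Brute-force top-k among documents whose category values include `category`.

    docs: iterable of (primary_key, vector, categories), where categories is a
    set of ints (a single-value category field becomes a one-element set).
    """
    candidates = [
        (squared_l2(vector, query), pk)
        for pk, vector, categories in docs
        if category in categories
    ]
    return [pk for _, pk in heapq.nsmallest(k, candidates)]


docs = [
    (1, [0.0, 0.0], {10}),
    (2, [1.0, 1.0], {20}),
    (3, [0.1, 0.0], {10, 20}),   # multi-value category
    (4, [5.0, 5.0], {10}),
]
print(filtered_search(docs, [0.0, 0.0], category=20, k=2))  # [3, 2]
```

Document 1 is the closest vector overall, but it does not appear in the result because its category is 10, not 20.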

Compression settings

Attribute fields

Configuration mode	Options
Form mode	Uncompressed, Compressed
Developer mode	`no_compressor`, `file_compressor`

Field content

By default, field content is uncompressed. The following defaults apply when compression is enabled:

Field type	Default compression
Multi-value and STRING fields	uniq compression
Single-value numeric fields	equal compression
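The name uniq compression suggests that identical multi-value or STRING values are stored only once. Assuming that interpretation, the following Python sketch dictionary-encodes a column to illustrate the idea. It is a conceptual model, not the engine's on-disk format, and equal compression is not modeled.

```python
def uniq_encode(values):
    """Store each distinct value once; each document keeps an index into the table.

    Values must be hashable, so represent a multi-value entry as a tuple.
    """
    table, position, refs = [], {}, []
    for value in values:
        if value not in position:
            position[value] = len(table)
            table.append(value)
        refs.append(position[value])
    return table, refs


def uniq_decode(table, refs):
    """Rebuild the original per-document values."""
    return [table[i] for i in refs]


# Hypothetical multi-value tag field across four documents.
tags = [("red", "blue"), ("red",), ("red", "blue"), ("green",)]
table, refs = uniq_encode(tags)
print(table)  # [('red', 'blue'), ('red',), ('green',)]
print(refs)   # [0, 1, 0, 2]
print(uniq_decode(table, refs) == tags)  # True
```

The savings grow with the number of repeated values, which is why this kind of encoding suits multi-value and STRING fields, where many documents often share the same value.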

What's next

After you define the index schema, trigger an index rebuild to load your data into the index. Then use the SDK to run vector queries.