Vector index best practices

Set up a vector index in OpenSearch Retrieval Engine Edition and run vector queries using the SDK. This guide covers the full configuration workflow: table setup, data synchronization, index schema definition, and index rebuilding.

Prerequisites

Before you begin, make sure you have:

  • A purchased OpenSearch Retrieval Engine Edition instance. See Purchase an OpenSearch Retrieval Engine Edition instance.

  • Vector data ready to ingest (pre-generated vectors, or raw data you plan to vectorize)

  • Credentials for your data source (for example, AccessKey ID and AccessKey Secret for MaxCompute)

How it works

After you purchase an instance, OpenSearch automatically deploys an empty cluster that matches the number and specifications of the query nodes and data nodes you purchased. The instance stays in the Pending configuration status until you complete setup.

Complete the following steps in order:

| Step | What you configure |
| --- | --- |
| 1. Table basic information | Table name, shard count, and update resources |
| 2. Data synchronization | Data source connection and initial data load |
| 3. Index schema | Fields, including the required primary key and vector fields |
| 4. Index rebuilding | Trigger the initial full index build |

Configure table basic information

Set the Table name, Number of shards, and Number of data update resources.

Warning

The shard count is capped at 256. Choose carefully.

Note

Keep the shard count at or below three times the number of data nodes in your instance.

Each table includes two data update resources free of charge. Resources beyond those two are billed: for a table with n resources in total, the billed quantity is n - 2.
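The sizing rules above can be checked before you fill in the form. The following is a minimal sketch, not part of any OpenSearch SDK: the function names are hypothetical, and the lower bound of one shard is an assumption rather than a documented limit.

```python
from typing import List

MAX_SHARDS = 256            # cap stated in the warning above
FREE_UPDATE_RESOURCES = 2   # free data update resources per table


def check_shard_count(shards: int, data_nodes: int) -> List[str]:
    """Return the problems found with a proposed shard count (empty if none)."""
    problems = []
    if shards < 1:
        # Assumption: a table needs at least one shard.
        problems.append("shard count must be at least 1")
    if shards > MAX_SHARDS:
        problems.append(f"shard count {shards} exceeds the cap of {MAX_SHARDS}")
    if shards > 3 * data_nodes:
        problems.append(
            f"shard count {shards} exceeds 3 x {data_nodes} data nodes"
        )
    return problems


def billable_update_resources(total: int) -> int:
    """Billed data update resources: n - 2, never below zero."""
    return max(0, total - FREE_UPDATE_RESOURCES)
```

For example, `check_shard_count(6, 2)` returns an empty list, and a table with five data update resources is billed for `billable_update_resources(5)`, that is, three.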

Connect a data source

OpenSearch supports three full data source types. Choose based on where your data lives and how it is updated:

| Data source | Best for |
| --- | --- |
| MaxCompute | Large-scale batch data already in MaxCompute tables |
| API push | Real-time or incremental data pushed from your application |
| Object Storage Service (OSS) | Data stored as files in OSS buckets |

To add a data source:

  1. Go to Data synchronization and click Add data source.

  2. Select the data source type.

  3. Fill in the required connection details.

For MaxCompute, provide the project name, AccessKey ID, AccessKey Secret, table name, and partition key. Consider enabling Automatic index rebuilding.
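Before opening the console, it can help to collect the MaxCompute connection details in one place and confirm that none is missing. This is a sketch only: the dictionary keys, the `auto_rebuild_index` flag, and the placeholder values are hypothetical names chosen for illustration, not fields of an OpenSearch API.

```python
from typing import Dict, List

# Details the console asks for when the data source is MaxCompute.
# Key names are illustrative, not an official schema.
REQUIRED_MAXCOMPUTE_FIELDS = (
    "project_name",
    "access_key_id",
    "access_key_secret",
    "table_name",
    "partition_key",
)


def missing_maxcompute_fields(config: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or blank in config."""
    return [
        name
        for name in REQUIRED_MAXCOMPUTE_FIELDS
        if not str(config.get(name) or "").strip()
    ]


# Placeholder values; load real credentials from a secret store, not source code.
example_config = {
    "project_name": "my_project",
    "access_key_id": "<your-access-key-id>",
    "access_key_secret": "<your-access-key-secret>",
    "table_name": "my_vector_table",
    "partition_key": "<your-partition>",
    "auto_rebuild_index": True,  # hypothetical flag for Automatic index rebuilding
}
```

Calling `missing_maxcompute_fields(example_config)` returns an empty list; passing a partial dictionary returns the names still to be filled in.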

Define the index schema

In Index schema, define all fields your index will use.

Required fields

Every vector index requires the following two fields:

| Field | Type | Notes |
| --- | --- | --- |
| Primary key field | Any | Uniquely identifies each document |
| Vector field | Multi-value float | Stores the vector embeddings |

Warning

The vector field must be configured as a multi-value float type. Other types are not supported.

Optional fields

To filter vector results by category, add a category field. Set its type to single-value or multi-value integer.
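The field rules above lend themselves to a quick pre-flight check of a planned schema. The sketch below is a local validation under stated assumptions: the `Field` class, the type labels such as `"FLOAT"` and `"INT64"`, and the function name are hypothetical and do not mirror the console's schema format.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative labels for integer types; the real type names may differ.
INTEGER_TYPES = {"INT8", "INT16", "INT32", "INT64"}


@dataclass
class Field:
    name: str
    type: str                  # for example "INT64", "FLOAT", "STRING"
    multi_value: bool = False
    primary_key: bool = False


def validate_vector_schema(
    fields: List[Field],
    vector_field: str,
    category_field: Optional[str] = None,
) -> List[str]:
    """Return the rule violations found in a planned vector index schema."""
    by_name = {f.name: f for f in fields}
    errors = []

    # Rule 1: a primary key field is required (any type).
    if not any(f.primary_key for f in fields):
        errors.append("a primary key field is required")

    # Rule 2: the vector field must be a multi-value float.
    vec = by_name.get(vector_field)
    if vec is None:
        errors.append(f"vector field '{vector_field}' is not defined")
    elif not (vec.type == "FLOAT" and vec.multi_value):
        errors.append(f"vector field '{vector_field}' must be multi-value float")

    # Rule 3 (optional): a category field must be an integer type,
    # single-value or multi-value.
    if category_field is not None:
        cat = by_name.get(category_field)
        if cat is None:
            errors.append(f"category field '{category_field}' is not defined")
        elif cat.type not in INTEGER_TYPES:
            errors.append(f"category field '{category_field}' must be an integer type")

    return errors
```

A schema with an `id` primary key, a multi-value `FLOAT` field named `embedding`, and an `INT64` field named `category` passes with no errors; declaring `embedding` as single-value produces one error.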

Compression settings

Attribute fields

| Mode | Options |
| --- | --- |
| Form mode | Uncompressed, Compressed |
| Developer mode | no_compressor, file_compressor |

Field content

By default, field content is uncompressed. The following defaults apply when compression is enabled:

| Field type | Default compression |
| --- | --- |
| Multi-value and STRING | uniq compression |
| Single-value numeric | equal compression |
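The defaults in the table can be read as a small decision rule. The sketch below encodes one reading of it, namely that multi-value fields and STRING fields default to uniq while single-value numeric fields default to equal. The function name, the type labels, and the `None` result for unlisted types are assumptions made for illustration.

```python
from typing import Optional

# Illustrative labels for numeric types; the real type names may differ.
NUMERIC_TYPES = {"INT8", "INT16", "INT32", "INT64", "FLOAT", "DOUBLE"}


def default_field_compression(
    field_type: str, multi_value: bool, compression_enabled: bool
) -> Optional[str]:
    """Return the default field-content compression, or None if uncompressed."""
    if not compression_enabled:
        return None          # field content is uncompressed by default
    if multi_value or field_type == "STRING":
        return "uniq"        # multi-value fields and STRING fields
    if field_type in NUMERIC_TYPES:
        return "equal"       # single-value numeric fields
    return None              # assumption: no default for other types
```

Under this reading, a multi-value `FLOAT` vector field with compression enabled gets `"uniq"`, and a single-value `INT64` category field gets `"equal"`.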

What's next

After completing the index schema, trigger an index rebuild to populate the index with your data, then use the SDK to run vector queries.