Set up a vector index in OpenSearch Retrieval Engine Edition and run vector queries using the SDK. This guide covers the full configuration workflow: table setup, data synchronization, index schema definition, and index rebuilding.
Prerequisites
Before you begin, make sure you have:
A purchased OpenSearch Retrieval Engine Edition instance. For details, see Purchase an OpenSearch Retrieval Engine Edition instance.
Vector data ready to ingest (pre-generated vectors, or raw data you plan to vectorize)
Credentials for your data source (for example, AccessKey ID and AccessKey Secret for MaxCompute)
How it works
After purchasing an instance, OpenSearch automatically deploys an empty cluster that matches the number and specifications of your purchased query and data nodes. The instance shows a Pending configuration status until you complete setup.
Complete the following steps in order:
| Step | What you configure |
|---|---|
| 1. Table basic information | Table name, shard count, and data update resources |
| 2. Data synchronization | Data source connection and initial data load |
| 3. Index schema | Fields, including the required primary key and vector fields |
| 4. Index rebuilding | Trigger the initial full index build |
Configure table basic information
Set the Table name, Number of shards, and Number of data update resources.
The maximum number of shards is 256.
As a guideline, keep the shard count at or below three times the number of data nodes in your instance. For example, an instance with 4 data nodes should use no more than 12 shards.
Each table includes two data update resources free of charge. If a table uses n data update resources in total, you are billed for n - 2 of them.
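You can check the sizing rules above before you submit the form. The following Python sketch is a hypothetical helper and is not part of any OpenSearch SDK. It treats the 256-shard cap and the three-times-data-nodes guideline as the only constraints. It also assumes the billed count never drops below zero.

```python
MAX_SHARDS = 256            # upper limit on shards per table, per this guide
FREE_UPDATE_RESOURCES = 2   # data update resources included free per table


def check_shard_count(shards: int, data_nodes: int) -> list[str]:
    """Return a list of issues with the proposed shard count (empty if none)."""
    issues = []
    if shards < 1:
        issues.append("shard count must be at least 1")
    if shards > MAX_SHARDS:
        issues.append(f"shard count {shards} exceeds the maximum of {MAX_SHARDS}")
    # Guideline, not a hard limit: at most 3 shards per data node.
    if shards > 3 * data_nodes:
        issues.append(f"shard count {shards} is more than 3x the {data_nodes} data nodes")
    return issues


def billable_update_resources(total: int) -> int:
    """Billed data update resources for one table: n - 2, floored at zero."""
    return max(total - FREE_UPDATE_RESOURCES, 0)
```

For example, `check_shard_count(12, 4)` returns an empty list, and `billable_update_resources(5)` returns 3.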
Connect a data source
OpenSearch supports three types of full data sources. Choose one based on where your data is stored and how it is updated:
| Data source | Best for |
|---|---|
| MaxCompute | Large-scale batch data already in MaxCompute tables |
| API push | Real-time or incremental data pushed from your application |
| Object Storage Service (OSS) | Data stored as files in OSS buckets |
To add a data source:
Go to Data synchronization and click Add data source.
Select the data source type.
Fill in the required connection details.
For MaxCompute, provide the project name, AccessKey ID, AccessKey Secret, table name, and partition key. Consider enabling Automatic index rebuilding.
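If you script your setup, you can collect the MaxCompute connection details in one place and check for blank values before you enter them in the console. This sketch is hypothetical and is not an OpenSearch API. The following names are illustrative assumptions:
- the `MaxComputeSource` class and its field names
- the `ALIBABA_CLOUD_ACCESS_KEY_ID` and `ALIBABA_CLOUD_ACCESS_KEY_SECRET` environment variable names

Reading the AccessKey pair from the environment keeps credentials out of source code.

```python
import os
from dataclasses import dataclass, fields


@dataclass
class MaxComputeSource:
    """Hypothetical holder for MaxCompute connection details; not an OpenSearch API."""
    project: str
    access_key_id: str
    access_key_secret: str
    table: str
    partition_key: str
    auto_rebuild: bool = False  # mirrors the Automatic index rebuilding option

    def missing_fields(self) -> list[str]:
        """Names of required text fields that are empty or whitespace only."""
        return [
            f.name
            for f in fields(self)
            if isinstance(getattr(self, f.name), str) and not getattr(self, f.name).strip()
        ]


def source_from_env(project: str, table: str, partition_key: str) -> MaxComputeSource:
    # The environment variable names are an assumed convention.
    return MaxComputeSource(
        project=project,
        access_key_id=os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID", ""),
        access_key_secret=os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET", ""),
        table=table,
        partition_key=partition_key,
    )
```

An empty result from `missing_fields()` means every connection detail listed above has a value. It does not mean the credentials are valid.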
Define the index schema
In Index schema, define all fields your index will use.
Required fields
Every vector index requires the following two fields:
| Field | Type | Notes |
|---|---|---|
| Primary key field | Any | Uniquely identifies each document |
| Vector field | Multi-value float | Stores the vector embeddings |
The vector field must be configured as a multi-value float type. Other types are not supported.
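You can express the two required fields as a simple pre-flight check. The `Field` class and the type labels `FLOAT`, `INT64`, and `STRING` are hypothetical stand-ins for whatever schema representation you use. The sketch also assumes that a schema has exactly one primary key field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical field description, not the OpenSearch schema format."""
    name: str
    type: str                 # hypothetical type label, e.g. "FLOAT", "INT64", "STRING"
    multi_value: bool = False
    primary_key: bool = False
    vector: bool = False


def validate_vector_schema(schema: list[Field]) -> list[str]:
    """Return schema errors; an empty list means the required fields are present."""
    errors = []
    primary_keys = [f for f in schema if f.primary_key]
    if len(primary_keys) != 1:
        errors.append(f"expected one primary key field, found {len(primary_keys)}")
    vector_fields = [f for f in schema if f.vector]
    if not vector_fields:
        errors.append("no vector field defined")
    # The vector field must be multi-value float; no other type is accepted.
    for f in vector_fields:
        if f.type != "FLOAT" or not f.multi_value:
            errors.append(f"vector field {f.name!r} must be multi-value FLOAT")
    return errors
```

For example, a schema whose vector field is a single-value `FLOAT` fails the check, because the table above requires a multi-value float.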
Optional fields
To filter vector results by category, add a category field. Set its type to single-value or multi-value integer.
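Conceptually, a category filter keeps only the documents whose category field contains the requested integer. Only those candidates are then ranked by vector distance.

The following brute-force Python sketch illustrates that behavior only. It is not how the OpenSearch engine searches its index. It assumes Euclidean distance, which may differ from the distance metric your index uses. A single-value category field is modeled as a one-element tuple, and a multi-value field as a tuple of several integers.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    pk: str
    vector: list[float]
    categories: tuple[int, ...]  # one integer for single-value, several for multi-value


def filtered_knn(docs: list[Doc], query: list[float], k: int,
                 category: Optional[int] = None) -> list[str]:
    """Brute-force k-nearest search, optionally restricted to one category value."""
    # Keep only documents whose category field contains the requested value.
    candidates = [d for d in docs if category is None or category in d.categories]
    # Rank the remaining candidates by Euclidean distance to the query vector.
    candidates.sort(key=lambda d: math.dist(d.vector, query))
    return [d.pk for d in candidates[:k]]
```

A document with a multi-value category field, such as `(1, 2)`, matches a filter on either value.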
Compression settings
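Attribute fields
The compression options for attribute fields depend on whether you define the schema in form mode or developer mode: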
| Mode | Options |
|---|---|
| Form mode | Uncompressed, Compressed |
| Developer mode | no_compressor, file_compressor |
Field content
By default, field content is uncompressed. The following defaults apply when compression is enabled:
| Field type | Default compression |
|---|---|
| Multi-value and STRING | uniq compression |
| Single-value numeric | equal compression |
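You can write the default field content compression rules in the table as a small lookup. This helper is hypothetical. The function name and the numeric type labels in `NUMERIC_TYPES` are assumptions. Types that the table does not cover return `None`.

```python
from typing import Optional

# Hypothetical labels for numeric field types.
NUMERIC_TYPES = {"INT8", "INT16", "INT32", "INT64", "FLOAT", "DOUBLE"}


def default_content_compression(
    field_type: str, multi_value: bool, compression_enabled: bool = True
) -> Optional[str]:
    """Return the default compression for field content, or None if uncompressed."""
    if not compression_enabled:
        # Field content is uncompressed by default.
        return None
    if multi_value or field_type == "STRING":
        return "uniq"
    if field_type in NUMERIC_TYPES:
        return "equal"
    # Not covered by the table above.
    return None
```

For example, a single-value `INT64` field defaults to `equal`, and a multi-value `FLOAT` vector field defaults to `uniq`.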
What's next
After completing the index schema, trigger an index rebuild to populate the index with your data, then use the SDK to run vector queries.