Vector storage

ApsaraDB RDS for MySQL natively stores and queries floating-point vectors with up to 16,383 dimensions and uses Hierarchical Navigable Small World (HNSW) indexes for high-performance approximate nearest neighbor search. You query vector data with standard SQL, so no separate vector database is required.

Prerequisites

Before you begin, ensure that you have:

How it works

  1. Enable Vector Storage on your RDS instance from the console.

  2. Create a table with a VECTOR(n) column and an HNSW index specifying the distance type.

  3. Insert vector data using VEC_FROMTEXT, or batch-load from existing tables.

  4. Run similarity queries with VEC_DISTANCE and LIMIT — the HNSW index accelerates retrieval automatically.

The HNSW index uses single instruction, multiple data (SIMD) hardware acceleration, Bloom filter search pruning, and LIMIT condition pushdown to speed up large-scale vector retrieval.

Vector Storage is fully compatible with the MySQL protocol and supports Java Database Connectivity (JDBC), object-relational mapping (ORM) tools, and mainstream developer frameworks. It integrates with Data Transmission Service (DTS) for data synchronization and Data Management (DMS) for instance management, which covers the full data lifecycle: synchronization, management, backup, and recovery. Existing instances can be upgraded with one click, so no new cluster is needed.

Key concepts

VECTOR data type — Stores floating-point vectors of up to 16,383 dimensions. Compatible with standard SQL interfaces for read, write, and batch update.

HNSW index — A graph-based approximate nearest neighbor index. The two key parameters that control its behavior are:

  • M: the maximum number of connections per node in the graph. A higher M value improves recall at the cost of more memory and slower index builds.

  • ef_search: the search range during queries, that is, how many candidate neighbors the search keeps while it traverses the graph. A higher ef_search value improves recall at the cost of slower queries.

Distance types — Two distance metrics are supported:

  • EUCLIDEAN: straight-line (geometric) distance between vectors in multidimensional space. A smaller distance means the vectors are more similar.

  • COSINE: distance derived from the cosine of the angle between vectors, conventionally 1 minus the cosine similarity. It measures directional similarity and ignores vector length. A smaller distance means the vectors are more similar.
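
To see how the two metrics differ, a minimal sketch is to compare two vectors that point in the same direction but have different lengths, using the explicit distance functions from the Vector functions table. The literal vectors are illustrative and not taken from the product documentation.

```sql
-- Illustrative vectors: [2, 4, 6] is [1, 2, 3] scaled by 2
-- (same direction, different length)
SELECT
  VEC_DISTANCE_EUCLIDEAN(VEC_FROMTEXT('[1, 2, 3]'), VEC_FROMTEXT('[2, 4, 6]')) AS euclidean_distance,
  VEC_DISTANCE_COSINE(VEC_FROMTEXT('[1, 2, 3]'), VEC_FROMTEXT('[2, 4, 6]')) AS cosine_distance;
```

Worked by hand, the Euclidean distance is √(1² + 2² + 3²) = √14 ≈ 3.74, while the angle between the vectors is zero, so the cosine distance should be at or very near 0, subject to floating-point rounding. If length matters for your embeddings, EUCLIDEAN is likely the better fit; if only direction matters, COSINE is.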

Enable Vector Storage

Enabling or disabling Vector Storage does not require an instance restart.

  1. Go to the Instances page. In the top navigation bar, select the region where your instance resides, then click the instance ID.

  2. On the Basic Information page, find Vector Storage in the Status section and click Enable.

  3. Wait for the status to change to Enabled.

Create a table and insert vector data

Step 1: Create a table with a vector column and HNSW index

-- Create a table with a 5-dimension vector column and an HNSW index
CREATE TABLE product_embeddings (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  product_name VARCHAR(255),
  embedding VECTOR(5) NOT NULL,
  -- Specify the M value and distance type when creating the index
  VECTOR INDEX idx_embedding(embedding) M=16 DISTANCE=COSINE
);

Step 2: Insert vector data

Use VEC_FROMTEXT to convert string representations into vectors.

-- Insert three product embeddings
INSERT INTO product_embeddings (product_name, embedding) VALUES
  ('product_A', VEC_FROMTEXT('[0.1, 0.2, 0.3, 0.4, 0.5]')),
  ('product_B', VEC_FROMTEXT('[0.6, 0.7, 0.8, 0.9, 1.0]')),
  ('product_C', VEC_FROMTEXT('[0.11, 0.22, 0.33, 0.44, 0.55]'));
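
For the batch-load path mentioned in How it works, one plausible approach is INSERT ... SELECT with VEC_FROMTEXT applied to a text column. This is a sketch only: the staging table product_embeddings_staging and its embedding_text column are hypothetical, and each string is assumed to hold exactly five comma-separated numbers in square brackets so that it matches VECTOR(5).

```sql
-- Hypothetical staging table that holds embeddings as text,
-- for example '[0.1, 0.2, 0.3, 0.4, 0.5]'
CREATE TABLE product_embeddings_staging (
  product_name VARCHAR(255),
  embedding_text VARCHAR(255) NOT NULL
);

-- Convert each text value to a vector while copying rows
-- into the table that carries the HNSW index
INSERT INTO product_embeddings (product_name, embedding)
SELECT product_name, VEC_FROMTEXT(embedding_text)
FROM product_embeddings_staging;
```

Rows whose text does not parse as a five-dimension vector would presumably cause the statement to fail, so validating the staging data first is a reasonable precaution.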

Run vector similarity queries

Find the two products most similar to a given vector using cosine distance. A smaller cosine distance means greater similarity.

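-- Return the two rows closest to the query vector (index distance type: COSINE)
SELECT
  id,
  product_name,
  -- The value is a distance, so smaller means more similar
  VEC_DISTANCE(embedding, VEC_FROMTEXT('[0.1, 0.2, 0.3, 0.4, 0.51]')) AS cosine_distance
FROM product_embeddings
ORDER BY cosine_distance ASC
LIMIT 2;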

If one argument to VEC_DISTANCE is an indexed column, the index distance type is applied automatically.

Vector functions

| Function | Description |
| --- | --- |
| VECTOR_DIM | Returns the number of dimensions in a vector |
| VEC_FROMTEXT / TO_VECTOR / STRING_TO_VECTOR | Converts a string to a vector |
| VEC_TOTEXT / FROM_VECTOR / VECTOR_TO_STRING | Converts a vector to a string |
| VEC_DISTANCE | Calculates the distance between two vectors. If one argument is an indexed column, the index distance type is applied automatically. |
| VEC_DISTANCE_EUCLIDEAN | Calculates Euclidean distance between two vectors |
| VEC_DISTANCE_COSINE | Calculates cosine distance between two vectors |
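
As a quick sanity check after loading data, the conversion and dimension functions can be combined in one query. This sketch assumes the product_embeddings table created earlier on this page; the exact text formatting that VEC_TOTEXT returns is not specified here, so treat the output as something to inspect rather than parse.

```sql
-- Read stored vectors back as text and confirm their dimensionality
SELECT
  id,
  product_name,
  VEC_TOTEXT(embedding) AS embedding_text,  -- vector rendered as a string
  VECTOR_DIM(embedding) AS dimensions       -- should match the VECTOR(5) column definition
FROM product_embeddings
ORDER BY id;
```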

Manage parameters

All vector-related parameters are dynamic — changes take effect immediately without an instance restart.

Parameter reference

| Parameter | Scope | Type | Default | Range | Description |
| --- | --- | --- | --- | --- | --- |
| vidx_default_distance | Session | String | EUCLIDEAN | EUCLIDEAN, COSINE | Default distance type for vector queries. EUCLIDEAN measures straight-line distance; COSINE measures directional similarity. |
| vidx_hnsw_default_m | Session | Integer | 6 | [3, 200] | Default M value for HNSW indexes (max connections per node). Higher values improve recall but use more memory. |
| vidx_hnsw_ef_search | Session | Integer | 20 | [1, 10000] | Search range for HNSW queries. Higher values improve recall at the cost of query speed. |
| vidx_hnsw_cache_size | Global | BigInt | 1048576 | [1048576, 18446744073709551615] | Maximum memory the HNSW index cache can use, in bytes. |
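
Because three of these parameters are session-scoped, recall can presumably be tuned for a single connection without changing instance-wide settings. The sketch assumes the parameters are exposed as ordinary MySQL system variables that accept SET SESSION, which this page does not confirm. If your instance rejects the statements, use the console procedure in Modify parameters instead.

```sql
-- Inspect current vector-related settings
-- (assumes they are visible as system variables)
SHOW VARIABLES LIKE 'vidx%';

-- Widen the search range for this session only; the value must fall within [1, 10000]
SET SESSION vidx_hnsw_ef_search = 100;

-- Similarity queries issued later in this session would use the new value
SELECT id, product_name
FROM product_embeddings
ORDER BY VEC_DISTANCE(embedding, VEC_FROMTEXT('[0.1, 0.2, 0.3, 0.4, 0.51]')) ASC
LIMIT 2;
```

The value 100 is arbitrary. A practical approach is to raise ef_search step by step and stop once recall on a known test set no longer improves enough to justify the added latency. vidx_hnsw_cache_size is global in scope, so it is not a candidate for per-session tuning.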

Modify parameters

  1. Go to the Instances page. Select the region and click the instance ID.

  2. In the left navigation pane, click Parameter.

  3. On the Modifiable Parameters tab, find the parameter and set a new value.

  4. Click OK, then Apply Changes. In the dialog that appears, select when the changes take effect.

Limitations

  • Vector indexes can only be created on InnoDB tables.

  • The primary key of a table with a vector index cannot exceed 256 bytes.

  • The INPLACE algorithm (ALGORITHM=INPLACE) cannot be used to create, modify, or drop vector indexes.

  • Vector indexes cannot be set to INVISIBLE.

  • Tables with vector indexes do not support the Recycle Bin feature.

  • Data modification and queries on vector indexes support only the Read Committed (RC) isolation level.

  • Because the HNSW algorithm involves random levels and heuristic graph construction, the graph structures on primary and standby instances are not guaranteed to be identical.

  • If the source database uses the vector type in stored procedures or functions, synchronization or migration to a destination that does not support vectors will fail.
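
The Read Committed requirement in the list above can be checked and met per session with standard MySQL statements. This is a sketch using stock MySQL syntax; the default isolation level on your instance depends on its parameter template, so check before assuming a change is needed.

```sql
-- Check the isolation level of the current session
-- (variable name as used in MySQL 8.0)
SELECT @@transaction_isolation;

-- Switch this session to Read Committed before reading from
-- or writing to tables that have vector indexes
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
```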

Use cases

  • Semantic search — Store text or image embeddings and retrieve the most semantically similar items using nearest neighbor queries.

  • AI-powered recommendation — Find products, articles, or content similar to a user's past interactions based on vector distance.

  • Multi-modal analysis — Combine vector similarity search with SQL filters to support mixed structured and unstructured data queries.
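
A minimal sketch of the mixed query pattern described in the last item, reusing the product_embeddings table from this page. The filter is illustrative. How the optimizer combines the predicate with the HNSW index, and whether fewer than LIMIT rows can be returned when the filter is very selective, is not covered here.

```sql
-- Restrict candidates with an ordinary SQL predicate, then rank by vector distance
SELECT
  id,
  product_name,
  VEC_DISTANCE(embedding, VEC_FROMTEXT('[0.1, 0.2, 0.3, 0.4, 0.51]')) AS distance
FROM product_embeddings
WHERE product_name <> 'product_B'   -- structured filter (illustrative)
ORDER BY distance ASC
LIMIT 2;
```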

What's next