A vector is a list of numbers that captures the meaning and relationships of data — text, images, or any other content. Vectors that are numerically close together represent data that is semantically similar. DashVector stores, indexes, and searches these vectors so you can build similarity search, recommendation, and retrieval-augmented generation (RAG) applications at scale.
How embedding works
Before storing data in DashVector, convert it to a vector using an embedding model. This conversion is called embedding.
For example, the Model Studio text-embedding-v1 model converts a piece of input text into a vector:
```python
import dashscope
from dashscope import TextEmbedding

# Replace the placeholder with your own API key.
dashscope.api_key = 'YOUR_API_KEY'


def embed_with_str():
    # Convert one piece of text into a vector with text-embedding-v1.
    resp = TextEmbedding.call(
        model=TextEmbedding.Models.text_embedding_v1,
        input='The quality of the clothes is simply outstanding, and they look fabulous. The wait was completely worth it, and I am absolutely delighted with my purchase. I will definitely be a repeat customer here.')
    print(resp)


if __name__ == '__main__':
    embed_with_str()
```

The response contains an embedding field, and that array of numbers is the vector:
```json
{
    "status_code": 200,
    "request_id": "617b3670-6f9e-9f47-ad57-997ed8aeba6a",
    "code": "",
    "message": "",
    "output": {
        "embeddings": [
            {
                "embedding": [
                    0.09393704682588577,
                    2.4155092239379883,
                    -1.8923076391220093,
                    ...
                ],
                "text_index": 0
            }
        ]
    },
    "usage": {
        "total_tokens": 23
    }
}
```

Key concepts
Dimensions
The dimension of a vector is the number of elements in the array. A vector with 1,024 numbers is a 1,024-dimensional vector. The dimension is fixed for a given model (or, for text-embedding-v3, for the output dimension you select): every output of text-embedding-v1 has exactly 1,536 dimensions.
| Model name | Vector dimension | Maximum rows | Maximum tokens per row | Supported languages |
|---|---|---|---|---|
| text-embedding-v3 | 1,024 (default), 768, 512, 256, 128, or 64 | 10 | 8,192 | Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, and more than 50 other major languages |
| text-embedding-v2 | 1,536 | 25 | 2,048 | Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian |
| text-embedding-v1 | 1,536 | 25 | 2,048 | Chinese, English, Spanish, French, Portuguese, Indonesian |
| text-embedding-async-v2 | 1,536 | 100,000 | 2,048 | Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian |
| text-embedding-async-v1 | 1,536 | 100,000 | 2,048 | Chinese, English, Spanish, French, Portuguese, Indonesian |
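Because the dimension is simply the number of elements, you can read it straight from the embedding array. The sketch below assumes the response has been parsed into a plain Python dict with the structure shown in the JSON output above; `embedding_dimensions` is a hypothetical helper, and `sample_response` is a truncated stand-in, so it reports 3 rather than a real model's dimension.

```python
from typing import Any


def embedding_dimensions(response: dict[str, Any]) -> list[int]:
    """Return the dimension (element count) of every embedding in a response.

    Assumes the shape: output -> embeddings -> [{"embedding": [...], "text_index": i}, ...]
    """
    embeddings = response["output"]["embeddings"]
    # Order results by text_index so they line up with the input texts.
    ordered = sorted(embeddings, key=lambda item: item["text_index"])
    return [len(item["embedding"]) for item in ordered]


# Hypothetical, truncated response: a real vector has far more elements.
sample_response = {
    "status_code": 200,
    "output": {
        "embeddings": [
            {"embedding": [0.0939, 2.4155, -1.8923], "text_index": 0},
        ]
    },
}

print(embedding_dimensions(sample_response))  # [3]
```

Run against a full text-embedding-v1 response, the same function would report the model's fixed dimension for each input text.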
Data type
The data type of a vector is the data type of its elements. For example, text-embedding-v1 produces float elements, so its vectors are of the float type. A simple integer vector like [1, 2, 3, 4] is a 4-dimensional int vector.
Dimension and data type vary by embedding model. When creating a collection, set these parameters to match the model you use.
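A client-side check can catch a mismatch between your model's output and the collection's settings before you send data. The sketch below is a hypothetical helper, not part of any DashVector or DashScope SDK; `dimension` and `dtype` stand in for the values you chose when creating the collection.

```python
def check_vector(vector: list, dimension: int, dtype: type = float) -> None:
    """Raise ValueError if `vector` does not fit a collection's dimension and dtype."""
    if len(vector) != dimension:
        raise ValueError(f"expected {dimension} dimensions, got {len(vector)}")
    for i, value in enumerate(vector):
        # bool is a subclass of int in Python, so rule it out explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"element {i} is not a number: {value!r}")
        # A float collection also accepts ints, since they convert to float.
        if dtype is int and not isinstance(value, int):
            raise ValueError(f"element {i} is not an int: {value!r}")


check_vector([1, 2, 3, 4], dimension=4, dtype=int)  # the 4-dimensional int vector: passes

try:
    check_vector([0.09, 2.41, -1.89], dimension=1536)
except ValueError as err:
    print(err)  # expected 1536 dimensions, got 3
```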
Distance metrics
DashVector measures similarity between vectors with a distance metric and supports three metrics. For cosine distance and Euclidean distance, a smaller value means the two vectors are more similar; for dot product, a larger value means they are more similar.
Cosine distance
Cosine distance measures the angle between two vectors, not their magnitude. It works well for text and semantic search where the direction of meaning matters more than the scale.
DashVector defines cosine distance as:
cosine distance = 1 − cosine similarity
The valid range is [0, 2]. A smaller value indicates greater similarity.
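$$
\text{cosine distance} = 1 - \frac{A \cdot B}{\|A\| \, \|B\|} = 1 - \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}
$$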
Where _A_ and _B_ are two vectors, _n_ is the dimension, · is the dot product, and ||_A_|| and ||_B_|| are the magnitudes of the two vectors.
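The definition above translates directly into plain Python. This is an illustrative sketch for understanding the metric, not DashVector code: the service computes distances itself, and `cosine_distance` is a hypothetical helper name.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity, in the range [0, 2]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1 - dot / (norm_a * norm_b)


print(cosine_distance([1, 2], [2, 4]))   # ~0: same direction, only the magnitude differs
print(cosine_distance([1, 0], [0, 1]))   # 1.0: orthogonal
print(cosine_distance([1, 0], [-1, 0]))  # 2.0: opposite directions
```

The first pair shows why cosine distance suits semantic search: doubling a vector's length does not change its distance to the original.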


Euclidean distance
Euclidean distance is the straight-line distance between two vectors in space. It works well for spatial data and geometric similarity where the actual magnitude and position of vectors matter.
A shorter Euclidean distance indicates greater similarity.
$$
d(A, B) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}
$$
Where _A_ and _B_ are two vectors and _n_ is the dimension.
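A minimal sketch of the Euclidean distance formula in plain Python, for illustration only; `euclidean_distance` is a hypothetical name. Python 3.8+ also provides `math.dist`, which computes the same value.

```python
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance: square root of the summed squared differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance([0, 0], [3, 4]))  # 5.0
print(euclidean_distance([1, 2], [2, 4]))  # ~2.236, although the two point the same way
```

Unlike cosine distance, Euclidean distance separates [1, 2] and [2, 4] because their magnitudes differ.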

Dot product
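Dot product (also called scalar product) sums the element-wise products of two vectors, so its value depends both on how closely the vectors point in the same direction and on their magnitudes. It works well for recommendation systems where you need to quantify how strongly two items agree with each other.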
A larger dot product indicates greater similarity.
$$
A \cdot B = \sum_{i=1}^{n} A_i B_i
$$
Where _A_ and _B_ are two vectors and _n_ is the dimension.
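The sketch below computes the dot product in plain Python and ranks a few items against a query, highest score first. The item names and vectors are hypothetical, chosen only to show that a larger dot product means greater similarity.

```python
def dot_product(a: list[float], b: list[float]) -> float:
    """Sum of element-wise products; a larger value means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


query = [1.0, 2.0]
items = {"item_a": [2.0, 4.0], "item_b": [2.0, -1.0], "item_c": [0.5, 1.0]}

# Rank items by dot product with the query, highest (most similar) first.
ranked = sorted(items, key=lambda name: dot_product(query, items[name]), reverse=True)
print(ranked)  # ['item_a', 'item_c', 'item_b']
```

item_a and item_c point in the same direction as the query, but item_a scores higher (10.0 versus 2.5) because it is longer. That sensitivity to magnitude is what distinguishes dot product from cosine distance.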

Common models and parameters
When creating a collection, match the dimension, data type, and distance metric to your embedding model.
| Model | Vector dimensions | Data type | Recommended distance metric |
|---|---|---|---|
| text-embedding-v1 / text-embedding-v2 | 1,536 | Float(32) | Cosine |
| text-embedding-v3 | 1,024 (default) | Float(32) | Cosine |
| OpenAI Embedding | 1,536 | Float(32) | Cosine |
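One way to keep collection settings in sync with the model is a small lookup table. The sketch below is hypothetical: `MODEL_PARAMS` and `collection_params` are illustrative names, the values come from the model tables in this section, and the key names only mirror the settings described here, so confirm the exact argument names against the DashVector SDK reference before passing them to a create call.

```python
# Hypothetical lookup table: values taken from the model tables in this section.
MODEL_PARAMS = {
    "text-embedding-v2": {"dimension": 1536, "dtype": float, "metric": "cosine"},
    "text-embedding-v3": {"dimension": 1024, "dtype": float, "metric": "cosine"},
}


def collection_params(model: str) -> dict:
    """Return the dimension, data type, and metric to use for `model`."""
    try:
        return dict(MODEL_PARAMS[model])  # copy so callers cannot mutate the table
    except KeyError:
        raise ValueError(f"no known collection settings for {model!r}") from None


print(collection_params("text-embedding-v3"))
```

If you change text-embedding-v3's output dimension from the default, update the table entry to match.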

text-embedding-v3 model specifications
| Specification | Value |
|---|---|
| Vector dimensions | 1,024 (default), 768, 512, 256, 128, or 64 |
| Maximum rows per request | 10 |
| Maximum tokens per row | 8,192 |
| Supported languages | Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, and 50+ other languages |
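Because text-embedding-v3 accepts at most 10 rows per request, a longer list of texts has to be split into batches. The sketch below shows only the splitting; `batches` is a hypothetical helper, each batch would then go into its own embedding request, and it does not check the 8,192-token-per-row limit, which would require the model's tokenizer.

```python
from collections.abc import Iterator, Sequence

MAX_ROWS_PER_REQUEST = 10  # text-embedding-v3 limit from the table above


def batches(texts: Sequence[str], size: int = MAX_ROWS_PER_REQUEST) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` texts, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield list(texts[start:start + size])


texts = [f"review {i}" for i in range(23)]
print([len(batch) for batch in batches(texts)])  # [10, 10, 3]
```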
For more information, see the text-embedding-v3 quick start guide.