What are vectors

A vector is a list of numbers that captures the meaning and relationships of data — text, images, or any other content. Vectors that are numerically close together represent data that is semantically similar. DashVector stores, indexes, and searches these vectors so you can build similarity search, recommendation, and retrieval-augmented generation (RAG) applications at scale.

How embedding works

Before storing data in DashVector, convert it to a vector using an embedding model. This conversion is called embedding.

For example, the Model Studio text-embedding-v1 model converts a piece of input text into a vector:

import dashscope
from dashscope import TextEmbedding

# Replace the placeholder with your own API key.
dashscope.api_key = 'YOUR_API_KEY'

def embed_with_str():
    resp = TextEmbedding.call(
        model=TextEmbedding.Models.text_embedding_v1,
        input='The quality of the clothes is simply outstanding, and they look fabulous. The wait was completely worth it, and I am absolutely delighted with my purchase. I will definitely be a repeat customer here.')
    print(resp)

if __name__ == '__main__':
    embed_with_str()

Each item in output.embeddings contains an embedding field. That array of numbers is the vector:

{
  "status_code": 200,
  "request_id": "617b3670-6f9e-9f47-ad57-997ed8aeba6a",
  "code": "",
  "message": "",
  "output": {
    "embeddings": [
      {
        "embedding": [
          0.09393704682588577,
          2.4155092239379883,
          -1.8923076391220093,
          ...
        ],
        "text_index": 0
      }
    ]
  },
  "usage": {
    "total_tokens": 23
  }
}
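As a minimal sketch of reading the vector out of a response shaped like the one above, the snippet below works on a plain dictionary instead of a live API response, so it runs without an API key. The helper name `extract_embeddings` and the shortened sample values are illustrative assumptions, not part of the SDK.

```python
def extract_embeddings(response):
    """Return the vectors from a dict shaped like the example response above."""
    if response.get("status_code") != 200:
        raise ValueError(f"embedding request failed: {response.get('message')}")
    # Sort by text_index so the vectors line up with the order of the input texts.
    items = sorted(response["output"]["embeddings"], key=lambda e: e["text_index"])
    return [item["embedding"] for item in items]


# Shortened stand-in for a real response; a real vector has many more elements.
sample_response = {
    "status_code": 200,
    "output": {
        "embeddings": [
            {"embedding": [0.0939, 2.4155, -1.8923], "text_index": 0},
        ]
    },
}

vectors = extract_embeddings(sample_response)
print(len(vectors), len(vectors[0]))  # number of vectors, elements per vector
```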

Key concepts

Dimensions

The dimension of a vector is the number of elements in the array. A vector with 1,024 numbers is a 1,024-dimensional vector. For a model such as text-embedding-v1 the dimension is fixed: every output has exactly 1,536 dimensions.

| Model name | Vector dimension | Maximum rows per request | Maximum tokens per row | Supported languages |
| --- | --- | --- | --- | --- |
| text-embedding-v3 | 1,024 (default), 768, 512, 256, 128, or 64 | 10 | 8,192 | Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, and more than 50 other major languages |
| text-embedding-v2 | 1,536 | 25 | 2,048 | Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian |
| text-embedding-v1 | 1,536 | 25 | 2,048 | Chinese, English, Spanish, French, Portuguese, Indonesian |
| text-embedding-async-v2 | 1,536 | 100,000 | 2,048 | Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian |
| text-embedding-async-v1 | 1,536 | 100,000 | 2,048 | Chinese, English, Spanish, French, Portuguese, Indonesian |

Data type

The data type of a vector is the data type of its elements. For example, text-embedding-v1 produces float elements, so its vectors are of the float type. A simple integer vector like [1, 2, 3, 4] is a 4-dimensional int vector.

Dimension and data type vary by embedding model. When creating a collection, set these parameters to match the model you use.
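To make the two properties concrete, here is a small sketch that reports the dimension and element type of a vector held as a Python list. The function name `describe_vector` is hypothetical and used only for illustration.

```python
def describe_vector(vector):
    """Return (dimension, element type name) for a list-like vector."""
    if not vector:
        raise ValueError("a vector needs at least one element")
    element_types = {type(x) for x in vector}
    if len(element_types) != 1:
        raise TypeError("all elements of a vector must share one data type")
    # The dimension is simply the number of elements.
    return len(vector), element_types.pop().__name__


print(describe_vector([1, 2, 3, 4]))         # (4, 'int')
print(describe_vector([0.09, 2.41, -1.89]))  # (3, 'float')
```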

Distance metrics

DashVector measures similarity between vectors using a distance metric and supports three of them. For cosine distance and Euclidean distance, a smaller value means the two vectors are more similar. For dot product, a larger value means they are more similar.

Cosine distance

Cosine distance measures the angle between two vectors, not their magnitude. It works well for text and semantic search where the direction of meaning matters more than the scale.

DashVector defines cosine distance as:

cosine distance = 1 − cosine similarity

The valid range is [0, 2]. A smaller value indicates greater similarity.

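$$\text{cosine similarity}(A, B) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$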

Where _A_ and _B_ are two vectors, _n_ is the dimension, · is the dot product, and ||_A_|| and ||_B_|| are the magnitudes of the two vectors.
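The definition "cosine distance = 1 − cosine similarity" can be sketched in plain Python as follows. This is an illustrative implementation for small lists, not DashVector's internal code, and the function name `cosine_distance` is an assumption.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1, 2, 3], [2, 4, 6]))  # same direction: approximately 0
print(cosine_distance([1, 0], [0, 1]))        # orthogonal: 1.0
print(cosine_distance([1, 0], [-1, 0]))       # opposite direction: 2.0
```

The three calls cover the ends and the midpoint of the [0, 2] range. Scaling a vector does not change the result, which is why the first pair has a distance of about 0 even though the magnitudes differ.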

Euclidean distance

Euclidean distance is the straight-line distance between two vectors in space. It works well for spatial data and geometric similarity where the actual magnitude and position of vectors matter.

A shorter Euclidean distance indicates greater similarity.

$$d(A, B) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}$$

Where _A_ and _B_ are two vectors and _n_ is the dimension.
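A minimal sketch of the straight-line distance, written in plain Python for illustration. The function name `euclidean_distance` is an assumption and not a DashVector API.

```python
import math


def euclidean_distance(a, b):
    """Return the straight-line distance between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    # Sum the squared element-wise differences, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance([0, 0], [3, 4]))  # 5.0
print(euclidean_distance([1, 2], [1, 2]))  # 0.0, identical vectors
```

Unlike cosine distance, this value changes when a vector is scaled, so it suits data where magnitude carries meaning.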

Dot product

Dot product (also called scalar product) measures how much two vectors align. It works well for recommendation systems where you need to quantify how strongly two items agree with each other.

A larger dot product indicates greater similarity.

$$A \cdot B = \sum_{i=1}^{n} A_i B_i$$

Where _A_ and _B_ are two vectors and _n_ is the dimension.
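The dot product can be sketched in a few lines of plain Python. The function name `dot_product` is an illustrative assumption.

```python
def dot_product(a, b):
    """Return the sum of element-wise products of two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


print(dot_product([1, 2, 3], [4, 5, 6]))  # 32
print(dot_product([2, 4, 6], [4, 5, 6]))  # 64, doubling one vector doubles the score
```

The second call shows that the score grows with magnitude as well as alignment, which is the main difference from cosine distance.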

Common models and parameters

When creating a collection, match the dimension, data type, and distance metric to your embedding model.

| Model | Vector dimensions | Data type | Recommended distance metric |
| --- | --- | --- | --- |
| text-embedding-v3 | 1,024 (default) | Float(32) | Cosine |
| Multimodal embedding | 1,024 | Float(32) | Cosine |
| OpenAI Embedding | 1,536 | Float(32) | Cosine |
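One way to keep collection parameters and model output in agreement is to record the parameters once and check each vector against them before inserting it. The sketch below is an assumption-laden illustration: `CollectionParams`, `PARAMS_BY_MODEL`, and `check_vector` are hypothetical names, the metric strings are placeholders, and no DashVector SDK call is made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionParams:
    dimension: int
    dtype: type
    metric: str  # placeholder label, not necessarily the SDK's spelling


# Values copied from two rows of the table above.
PARAMS_BY_MODEL = {
    "Multimodal embedding": CollectionParams(1024, float, "cosine"),
    "OpenAI Embedding": CollectionParams(1536, float, "cosine"),
}


def check_vector(params, vector):
    """Raise if the vector does not match the collection parameters."""
    if len(vector) != params.dimension:
        raise ValueError(f"expected {params.dimension} dimensions, got {len(vector)}")
    if not all(isinstance(x, params.dtype) for x in vector):
        raise TypeError(f"expected every element to be {params.dtype.__name__}")


params = PARAMS_BY_MODEL["OpenAI Embedding"]
check_vector(params, [0.0] * 1536)  # matches, so nothing is raised
```

A vector from a different model, for example one with 1,024 elements, would fail this check instead of being written to a collection that expects 1,536 dimensions.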

text-embedding-v3 model specifications

| Specification | Value |
| --- | --- |
| Vector dimensions | 1,024 (default), 768, 512, 256, 128, or 64 |
| Maximum rows per request | 10 |
| Maximum tokens per row | 8,192 |
| Supported languages | Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, and more than 50 other major languages |

For more information, see the text-embedding-v3 quick start guide.