Embedding - Alibaba Cloud Model Studio

Embedding models convert data such as text, images, and videos into vectors for downstream tasks, including semantic search, recommendation, clustering, classification, and anomaly detection.

Prerequisites

Obtain an API key and export it as an environment variable. The examples on this page read the key from DASHSCOPE_API_KEY. If you use the OpenAI SDK or DashScope SDK to make calls, install the SDK.
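As a minimal sketch of the prerequisite above, the following helper fails early when the key is missing and fills in the {WorkspaceId} placeholder used in the base URLs on this page. The function names and region keys are illustrative and not part of any SDK. The Beijing and Singapore hosts are copied from the examples on this page; the Singapore `api/v1` path is an assumption that it mirrors the Beijing pattern.

```python
import os

# Region hosts copied from the examples on this page.
REGION_HOSTS = {
    "beijing": "cn-beijing.maas.aliyuncs.com",
    "singapore": "ap-southeast-1.maas.aliyuncs.com",
}


def get_api_key(env_var="DASHSCOPE_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    api_key = os.getenv(env_var)
    if not api_key:
        raise RuntimeError(f"{env_var} is not set; export your Model Studio API key first.")
    return api_key


def build_base_url(workspace_id, region="beijing", openai_compatible=True):
    """Build the base URL for the OpenAI compatible or DashScope endpoint."""
    host = REGION_HOSTS[region]  # Raises KeyError for an unknown region.
    # Assumption: the DashScope path (api/v1) follows the same host pattern in every region.
    path = "compatible-mode/v1" if openai_compatible else "api/v1"
    return f"https://{workspace_id}.{host}/{path}"
```

Because API keys are region-specific, pass the region that matches the key you exported.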

Get embeddings

Text embedding

To make an API request, specify the text to embed and the model to use.

OpenAI compatible API

Python

import os
from openai import OpenAI

input_text = "The quality of the clothes is excellent"

client = OpenAI(
    # If you have not set an environment variable, replace this line with your Model Studio API key: api_key="sk-xxx",
    # API keys are region-specific. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),  
    # The following is the base URL for the Beijing region. If you use a model in the Singapore region, replace base_url with: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
    base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1"
)

completion = client.embeddings.create(
    model="text-embedding-v4",
    input=input_text
)

print(completion.model_dump_json())

Node.js

const OpenAI = require("openai");

const openai = new OpenAI({
    // If you have not set an environment variable, replace this line with your Model Studio API key: apiKey:'sk-xxx',
    // API keys are region-specific. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
    apiKey: process.env.DASHSCOPE_API_KEY,
    // The following is the base URL for the Beijing region. If you use a model in the Singapore region, replace baseURL with: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
    baseURL: 'https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1'
});

async function getEmbedding() {
    try {
        const inputTexts = "The quality of the clothes is excellent";
        const completion = await openai.embeddings.create({
            model: "text-embedding-v4",
            input: inputTexts
        });

        console.log(JSON.stringify(completion, null, 2));
    } catch (error) {
        console.error('Error:', error);
    }
}

getEmbedding();

curl

curl --location 'https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1/embeddings' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "text-embedding-v4",
    "input": "The quality of the clothes is excellent"
}'

DashScope

Python

import dashscope
from http import HTTPStatus
# This is the URL for the China (Beijing) region.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"
input_text = "The quality of the clothes is excellent"
resp = dashscope.TextEmbedding.call(
    model="text-embedding-v4",
    input=input_text,
)

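if resp.status_code == HTTPStatus.OK:
    print(resp)
else:
    # Surface the error code and message instead of failing silently.
    print(f"Request failed: {resp.code} - {resp.message}")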

Java

import com.alibaba.dashscope.embeddings.TextEmbedding;
import com.alibaba.dashscope.embeddings.TextEmbeddingParam;
import com.alibaba.dashscope.embeddings.TextEmbeddingResult;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.Constants;

import java.util.Collections;

public class Main {
    // This configuration is for the China (Beijing) region. Replace {WorkspaceId} with your workspace ID.
    static {Constants.baseHttpApiUrl="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1";}
    public static void main(String[] args) {
        String inputTexts = "The quality of the clothes is excellent";
        try {
            // Build the request parameters.
            TextEmbeddingParam param = TextEmbeddingParam
                    .builder()
                    .model("text-embedding-v4")
                    // Input text.
                    .texts(Collections.singleton(inputTexts))
                    .build();

            // Create a model instance and make a call.
            TextEmbedding textEmbedding = new TextEmbedding();
            TextEmbeddingResult result = textEmbedding.call(param);

            // Print the result.
            System.out.println(result);

        } catch (NoApiKeyException e) {
            // Catch and handle the exception for a missing API key.
            System.err.println("An exception occurred during the API call: " + e.getMessage());
            System.err.println("Please check if your API key is configured correctly.");
            e.printStackTrace();
        }
    }
}

curl

# ======= Important =======
# API keys are region-specific. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
# The following is the URL for the China (Beijing) region. URLs are region-specific.
# === Remove this comment before running ===
curl --location 'https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1/services/embeddings/text-embedding/text-embedding' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "text-embedding-v4",
    "input": {
        "texts": [
        "The quality of the clothes is excellent"
        ]
    }
}'

Independent multimodal vectors

You can generate separate vectors for different types of content, such as text, images, and videos. This is useful for processing each content type individually.

To generate independent multimodal vectors, use the DashScope SDK or call the API directly. This feature is unavailable through the OpenAI compatible API or in the console.

Python

import dashscope
import json
import os
from http import HTTPStatus
# This configuration is for the China (Beijing) region. Replace {WorkspaceId} with your workspace ID.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"

# The input can be a video.
# video = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20250107/lbcemt/new+video.mp4"
# input_data = [{'video': video}]
# Or an image.
image = "https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png"
# Named input_data to avoid shadowing the built-in input() function.
input_data = [{'image': image}]
resp = dashscope.MultiModalEmbedding.call(
    # If you have not set an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model="tongyi-embedding-vision-plus",
    input=input_data
)

if resp.status_code == HTTPStatus.OK:
    print(json.dumps(resp.output, indent=4))
else:
    print(f"Request failed: {resp.code} - {resp.message}")

Java

import com.alibaba.dashscope.embeddings.MultiModalEmbedding;
import com.alibaba.dashscope.embeddings.MultiModalEmbeddingItemImage;
import com.alibaba.dashscope.embeddings.MultiModalEmbeddingItemVideo;
import com.alibaba.dashscope.embeddings.MultiModalEmbeddingParam;
import com.alibaba.dashscope.embeddings.MultiModalEmbeddingResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

import java.util.Collections;

public class Main {
    // This configuration is for the China (Beijing) region. Replace {WorkspaceId} with your workspace ID.
    static {Constants.baseHttpApiUrl="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1";}
    public static void main(String[] args) {
        try {
            MultiModalEmbedding embedding = new MultiModalEmbedding();
            // The input can be a video.
            // MultiModalEmbeddingItemVideo video = new MultiModalEmbeddingItemVideo(
            //     "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20250107/lbcemt/new+video.mp4");
            // Or an image.
            MultiModalEmbeddingItemImage image = new MultiModalEmbeddingItemImage(
                "https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png");

            MultiModalEmbeddingParam param = MultiModalEmbeddingParam.builder()
                .model("tongyi-embedding-vision-plus")
                .contents(Collections.singletonList(image))
                .build();

            MultiModalEmbeddingResult result = embedding.call(param);
            System.out.println(result);

        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.err.println("An exception occurred during the API call: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

Multimodal fused vectors

You can combine content from different modalities, such as text, images, and videos, into a single fused vector. This enables applications such as text-to-image search, image-to-image search, text-to-video search, and cross-modal retrieval.

To generate multimodal fused vectors, use the Python DashScope SDK or call the API directly. This feature is unavailable through the OpenAI compatible API, the Java DashScope SDK, or the console.

qwen3-vl-embedding: Supports both fused and independent vectors. To generate a fused vector, set the Boolean parameter enable_fusion to true.
qwen2.5-vl-embedding: Supports only fused vectors, not independent vectors.
tongyi-embedding-vision-plus-2026-03-06 and tongyi-embedding-vision-flash-2026-03-06: Support both fused and independent vectors. To generate a fused vector, place the text, image, and video in the same content object. The enable_fusion parameter is not needed.

Python

import dashscope
import json
import os
from http import HTTPStatus
# The following configuration is for the China (Beijing) region. When making the call, replace {WorkspaceId} with your actual workspace ID. Configurations are region-specific.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"

# Multimodal fused vector: Combines text, images, and videos into a single fused vector.
# Suitable for use cases like cross-modal retrieval and image search.
text = "This is a test text for generating a multimodal fused vector"
image = "https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png"
video = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20250107/lbcemt/new+video.mp4"

# The input contains text, an image, and a video. A fused vector is generated by setting the enable_fusion parameter.
input_data = [
    {"text": text},
    {"image": image},
    {"video": video}
]

# Use qwen3-vl-embedding to generate a fused vector.
resp = dashscope.MultiModalEmbedding.call(
    # If you have not set an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen3-vl-embedding",
    input=input_data,
    enable_fusion=True,
    # Optional parameter: Specifies the vector dimension. Supported values: 2560, 2048, 1536, 1024, 768, 512, and 256. Default: 2560.
    # dimension = 1024
)

if resp.status_code == HTTPStatus.OK:
    print(json.dumps(resp.output, indent=4))
else:
    print(f"Request failed: {resp.code} - {resp.message}")

The following example uses tongyi-embedding-vision-plus-2026-03-06 to generate a fused embedding. Unlike qwen3-vl-embedding, this model achieves fusion by placing text, image, and video in the same content object and does not require the enable_fusion parameter.

import dashscope
import json
import os
from http import HTTPStatus
# The following configuration is for the China (Beijing) region. When making the call, replace {WorkspaceId} with your actual workspace ID. Configurations are region-specific.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"

# Example of a tongyi-embedding-vision-plus-2026-03-06 fused vector.
# Place text and image in the same content object. The enable_fusion parameter is not required.
text = "White sneakers, lightweight and breathable, suitable for running and daily wear"
image = "https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png"

# Multimodal content in the same object is fused into a single vector (type: "fused").
input_data = [
    {"text": text, "image": image}
]

resp = dashscope.MultiModalEmbedding.call(
    # If you have not set an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="tongyi-embedding-vision-plus-2026-03-06",
    input=input_data,
    # Optional parameter: Specifies the vector dimension. Supported values: 1152, 1024, 512, 256, 128, and 64. Default: 1152.
    dimension=1152
)

if resp.status_code == HTTPStatus.OK:
    print(json.dumps(resp.output, indent=4))
else:
    print(f"Request failed: {resp.code} - {resp.message}")

Java (HTTP)

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Main {
    public static void main(String[] args) throws Exception {
        // If you have not set an environment variable, replace the following line with your Model Studio API key: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Use enable_fusion to combine text, an image, and a video into a single fused vector.
        String requestBody = "{"
                + "\"model\": \"qwen3-vl-embedding\","
                + "\"input\": {"
                + "  \"contents\": ["
                + "    {\"text\": \"This is a test text for generating a multimodal fused vector\"},"
                + "    {\"image\": \"https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png\"},"
                + "    {\"video\": \"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20250107/lbcemt/new+video.mp4\"}"
                + "  ]"
                + "},"
                + "\"parameters\": {"
                + "  \"enable_fusion\": true"
                + "}"
                + "}";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1/services/embeddings/multimodal-embedding/multimodal-embedding"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody))
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}

The following example uses tongyi-embedding-vision-plus-2026-03-06 to generate a fused embedding. Unlike qwen3-vl-embedding, this model achieves fusion by placing text and image in the same content object, and does not require the enable_fusion parameter.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Main {
    public static void main(String[] args) throws Exception {
        // If you have not set an environment variable, replace the following line with your Model Studio API key: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // tongyi-embedding-vision-plus-2026-03-06 fused vector.
        // Place text and image in the same content object. The enable_fusion parameter is not required.
        String requestBody = "{"
                + "\"model\": \"tongyi-embedding-vision-plus-2026-03-06\","
                + "\"input\": {"
                + "  \"contents\": ["
                + "    {"
                + "      \"text\": \"White sneakers, lightweight and breathable, suitable for running and daily wear\","
                + "      \"image\": \"https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png\""
                + "    }"
                + "  ]"
                + "},"
                + "\"parameters\": {"
                + "  \"dimension\": 1152"
                + "}"
                + "}";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1/services/embeddings/multimodal-embedding/multimodal-embedding"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody))
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
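The two request shapes used in the fused-vector examples above can be summarized in one small builder. This is a sketch only: the function name is hypothetical, and it covers just the three models whose fusion style is stated on this page (qwen2.5-vl-embedding is left out because its request shape is not shown here).

```python
# Models that fuse when modalities share one content object (per this page).
SAME_OBJECT_MODELS = {
    "tongyi-embedding-vision-plus-2026-03-06",
    "tongyi-embedding-vision-flash-2026-03-06",
}
# Models that fuse separate content objects when enable_fusion is true (per this page).
ENABLE_FUSION_MODELS = {"qwen3-vl-embedding"}


def build_fused_request(model, text=None, image=None, video=None):
    """Return a request body dict for a fused multimodal embedding."""
    parts = {"text": text, "image": image, "video": video}
    present = {key: value for key, value in parts.items() if value is not None}
    if not present:
        raise ValueError("Provide at least one of text, image, or video.")

    if model in SAME_OBJECT_MODELS:
        # All modalities go into a single content object; no enable_fusion flag.
        return {"model": model, "input": {"contents": [present]}}
    if model in ENABLE_FUSION_MODELS:
        # One content object per modality, fused by the enable_fusion parameter.
        contents = [{key: value} for key, value in present.items()]
        return {
            "model": model,
            "input": {"contents": contents},
            "parameters": {"enable_fusion": True},
        }
    raise ValueError(f"Fusion style for model {model!r} is not covered by this sketch.")
```

Serialize the returned dict with json.dumps and send it as the POST body to the multimodal-embedding endpoint shown in the HTTP examples.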

Model selection

Choose the right model based on your input data type and use case.

Processing plain text or code: Use text-embedding-v4. It is the highest-performing text model and supports advanced features such as task instructions and sparse vectors, so it covers most text processing use cases.
Processing multimodal content:
- Fused embedding: To represent single-modal or mixed-modal inputs as a fused embedding for use cases such as cross-modal retrieval and image search, use qwen2.5-vl-embedding, qwen3-vl-embedding, tongyi-embedding-vision-plus-2026-03-06, or tongyi-embedding-vision-flash-2026-03-06. For example, you can input an image of a shirt with the text "find a similar style that looks more youthful," and the model fuses the image and task instruction into a single embedding for processing.
- Independent embedding: To generate an independent embedding for each input part (such as an image and its corresponding text caption), use tongyi-embedding-vision-plus, tongyi-embedding-vision-flash, tongyi-embedding-vision-plus-2026-03-06, tongyi-embedding-vision-flash-2026-03-06, or the general-purpose multimodal model multimodal-embedding-v1.
Processing large-scale data: To process large-scale, non-real-time text data, use text-embedding-v4 with the OpenAI compatible batch API to significantly reduce costs.

The following tables detail the specifications for all available embedding models.

Text embedding

Beijing

Model	Embedding dimensions	Batch size	Max tokens per batch (note)	Price (per 1,000 input tokens)	Free quota (note)	Languages
text-embedding-v4 (part of the Qwen3-Embedding series)	2,048, 1,536, 1,024 (default), 768, 512, 256, 128, or 64	10	33,000	CNY 0.0005; batch API call: CNY 0.00025	1 million tokens, valid for 90 days after you activate Model Studio	100+ languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian
text-embedding-v3	1,024 (default), 768, 512, 256, 128, or 64	10	8,192	CNY 0.0005; batch API call: CNY 0.00025	500,000 tokens each, valid for 90 days after you activate Model Studio	50+ languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian
text-embedding-v2	1,536	25	2,048	CNY 0.0007; batch API call: CNY 0.00035		Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian
text-embedding-v1		25		CNY 0.0007; batch API call: CNY 0.00035		Chinese, English, Spanish, French, Portuguese, and Indonesian
text-embedding-async-v2		100,000		CNY 0.0007	20 million tokens, valid for 90 days after you activate Model Studio	Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian
text-embedding-async-v1		100,000		CNY 0.0007		Chinese, English, Spanish, French, Portuguese, and Indonesian

Singapore

Model	Embedding dimensions	Batch size	Max tokens per batch (note)	Price (per 1,000 input tokens)	Free quota (note)	Languages
text-embedding-v4 (part of the Qwen3-Embedding series)	2,048, 1,536, 1,024 (default), 768, 512, 256, 128, or 64		8,192	USD 0.000514	No free quota	100+ languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian
text-embedding-v3	1,024 (default), 768, 512, 256, 128, or 64					50+ languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian

Note

Batch size is the maximum number of texts per API call. For example, text-embedding-v4 has a batch size of 10, which lets you include up to 10 texts for vectorization per request, where each text is limited to 8,192 tokens. This limit applies to:

string array input: The array can contain up to 10 elements.
file input: The text file can contain up to 10 lines.
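To stay within the batch size described in the note above, a client can split a longer list of texts into chunks before calling the API. The helper below is a minimal sketch with a hypothetical name; the default of 10 matches the text-embedding-v4 batch size stated above, and per-text token limits still have to be checked separately.

```python
def batch_texts(texts, batch_size=10):
    """Yield consecutive slices of texts, each holding at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


# 23 texts are split into batches of 10, 10, and 3.
reviews = [f"review {i}" for i in range(23)]
batch_lengths = [len(batch) for batch in batch_texts(reviews)]
print(batch_lengths)
```

Each yielded batch can then be passed as the input of one embedding request.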

Multimodal embedding

Multimodal embedding models generate embeddings from text, image, or video inputs. You can use these embeddings for tasks such as video and image classification, image-text retrieval, and text-to-image or text-to-video search.

The API accepts single text, image, or video inputs, as well as combinations like text and images. Some models support multiple inputs of the same type, such as multiple images. For details, see the limitations for each model.

China (Beijing)

Model	Embedding dimensions	Text length limit	Image size limit	Video size limit	Price (per 1,000 tokens)	Free quota (Note)
qwen3-vl-embedding	2560 (default), 2048, 1536, 1024, 768, 512, 256	32,000 tokens	Up to 10 MB per image	Up to 50 MB per video file	Image/Video: CNY 0.0018 Text: CNY 0.0007	1 million tokens. This free quota expires 90 days after you activate Model Studio.
qwen2.5-vl-embedding	2048, 1024 (default), 768, 512	32,000 tokens	Up to 5 MB per image	Up to 50 MB per video file	Image/Video: CNY 0.0018 Text: CNY 0.0007
tongyi-embedding-vision-plus-2026-03-06	1152 (default), 1024, 512, 256, 128, 64	1,024 tokens	Recommended: up to 5 MB per image; maximum: 10 MB. Supports up to 64 images.	Up to 50 MB per video file. Video encoding must be H.264 or H.265.	CNY 0.0005
tongyi-embedding-vision-flash-2026-03-06	768 (default), 512, 256, 128, 64				CNY 0.00015
tongyi-embedding-vision-plus	1152		Up to 3 MB per image. Supports up to 8 images.	Up to 10 MB per video file	CNY 0.0005
tongyi-embedding-vision-flash	768		Up to 3 MB per image. Supports up to 8 images.	Up to 10 MB per video file	CNY 0.00015
multimodal-embedding-v1	1,024	512 tokens	Up to 3 MB per image	Up to 10 MB per video file	Image/Video: CNY 0.0009 Text: CNY 0.0007

Singapore

Model	Embedding dimensions	Text length limit	Image size limit	Video size limit	Price (per 1,000 tokens)
tongyi-embedding-vision-plus	1152	1,024 tokens	Up to 8 images, with a maximum size of 3 MB each.	Up to 10 MB per video file	CNY 0.0005
tongyi-embedding-vision-flash	768	1,024 tokens	Up to 8 images, with a maximum size of 3 MB each.	Up to 10 MB per video file	CNY 0.00015

Input and language restrictions

Fused multimodal models
Model	Text	Image	Video	Request limit
qwen3-vl-embedding	Supports 33 major languages, such as Chinese, English, Japanese, Korean, French, and German. All supported languages: Chinese, Japanese, Korean, Indonesian, Vietnamese, Thai, English, French, German, Russian, Portuguese, Spanish, Italian, Swedish, Danish, Czech, Norwegian, Dutch, Finnish, Turkish, Polish, Swahili, Romanian, Serbian, Greek, Kazakh, Uzbek, Cebuano, Arabic, Urdu, Persian, Hindi / Devanagari, and Hebrew.	JPEG, PNG, WEBP, BMP, TIFF, ICO, DIB, ICNS, and SGI (URL or Base64 supported)	MP4, AVI, and MOV (URL only)	A single request can contain up to 20 content elements. This includes a maximum of 10 images and 1 video.
qwen2.5-vl-embedding	Supports 11 major languages, such as Chinese, English, Japanese, Korean, French, and German. All supported languages: Chinese, English, Japanese, Korean, French, German, Russian, Portuguese, Spanish, Italian, and Indonesian.			A single request can contain a maximum of one of each content type: image, text, video, and fused object.
Multimodal embedding models
Model	Text	Image	Video	Request limit
tongyi-embedding-vision-plus-2026-03-06	Supports over 30 major languages, such as Chinese, English, Japanese, and Korean. All supported languages: Chinese, Japanese, Korean, Indonesian, Vietnamese, Thai, English, French, German, Russian, Portuguese, Spanish, Italian, Swedish, Danish, Czech, Norwegian, Dutch, Finnish, Turkish, Polish, Swahili, Romanian, Serbian, Greek, Kazakh, Uzbek, Cebuano, Arabic, Urdu, Persian, Hindi / Devanagari, and Hebrew.	JPEG, PNG, WEBP, BMP, TIFF, ICO, DIB, ICNS, and SGI (URL or Base64 supported)	MP4, MPEG, MOV, MPG, WEBM, AVI, FLV, and MKV (URL only)	A single request can contain up to 20 text elements, 64 images, and 8 videos.
tongyi-embedding-vision-flash-2026-03-06
tongyi-embedding-vision-plus	Chinese and English	JPG, PNG, and BMP (URL or Base64 supported)	MP4, MPEG, MOV, MPG, WEBM, AVI, FLV, and MKV (URL only)	No limit on the number of content elements. The total number of input tokens must not exceed the batch processing limit.
tongyi-embedding-vision-flash
multimodal-embedding-v1	Chinese and English	JPG, PNG, and BMP (URL or Base64 supported)		A single request is limited to 20 content elements. This limit is shared among a maximum of 1 image, 1 video, and 20 text elements.
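Request limits like those in the tables above can be checked on the client before a request is sent. The following sketch covers only the qwen3-vl-embedding limits listed above (up to 20 content elements, including at most 10 images and 1 video). The function name is hypothetical, and the content objects are assumed to use the "text", "image", and "video" keys shown in the examples on this page.

```python
def validate_qwen3_vl_contents(contents):
    """Return a list of limit violations for a qwen3-vl-embedding request (empty if valid)."""
    problems = []
    if len(contents) > 20:
        problems.append(f"{len(contents)} content elements (maximum 20)")

    # Count elements that carry an image or a video.
    images = sum(1 for item in contents if "image" in item)
    videos = sum(1 for item in contents if "video" in item)
    if images > 10:
        problems.append(f"{images} images (maximum 10)")
    if videos > 1:
        problems.append(f"{videos} videos (maximum 1)")
    return problems


ok_request = [{"text": "white sneakers"}, {"image": "https://example.com/shoe.png"}]
print(validate_qwen3_vl_contents(ok_request))
```

File size and format limits are not checked here because they require inspecting the files themselves.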

Core features

Customize vector dimensions

The text-embedding-v4, text-embedding-v3, tongyi-embedding-vision-plus-2026-03-06, tongyi-embedding-vision-flash-2026-03-06, qwen3-vl-embedding, and qwen2.5-vl-embedding models support custom vector dimensions. Higher dimensions preserve more semantic information but increase storage and compute costs.

General use cases (Recommended): A dimension of 1024 provides an optimal balance between performance and cost, making it ideal for most semantic search tasks.
High-precision scenarios: Select a dimension of 1536 or 2048. This improves precision but significantly increases storage and compute overhead.
Resource-constrained environments: In cost-sensitive scenarios, select a dimension of 768 or lower. This significantly reduces resource consumption at the cost of some semantic information.
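The storage side of this trade-off is simple arithmetic. The sketch below assumes each vector component is stored as a 4-byte float32 value and ignores index and metadata overhead, both of which depend on the vector database you use; the function name is illustrative.

```python
def embedding_storage_bytes(num_vectors, dimension, bytes_per_value=4):
    """Raw storage for dense vectors: count x dimension x bytes per component."""
    return num_vectors * dimension * bytes_per_value


# Compare raw storage for one million vectors at three dimensions.
for dim in (2048, 1024, 256):
    gib = embedding_storage_bytes(1_000_000, dim) / 1024 ** 3
    print(f"{dim:>5} dimensions: {gib:.2f} GiB per million vectors")
```

Under these assumptions, storage scales linearly with the dimension, so moving from 1024 to 256 dimensions cuts raw vector storage to a quarter.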

OpenAI compatible API

import os
from openai import OpenAI

client = OpenAI(
    # API Keys are region-specific. To get an API Key, see https://help.aliyun.com/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # This is the base URL for the China (Beijing) region. For the Singapore region, use https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
    base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1",
)

resp = client.embeddings.create(
    model="text-embedding-v4",
    input=["I like it and will buy from here again"],
    # Set the vector dimension to 256
    dimensions=256
)
print(f"Vector dimension: {len(resp.data[0].embedding)}")

DashScope

import dashscope
# This configuration is for the China (Beijing) region. Replace {WorkspaceId} with your Workspace ID. Configurations are region-specific.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"
resp = dashscope.TextEmbedding.call(
    model="text-embedding-v4",
    input=["I like it and will buy from here again"],
    # Set the vector dimension to 256
    dimension=256
)

print(f"Vector dimension: {len(resp.output['embeddings'][0]['embedding'])}")

Query vs. document text (text_type)

The text_type parameter is only available through the DashScope SDK and API.

To achieve optimal results in search tasks, you should vectorize content differently based on its role. The text_type parameter is designed for this purpose:

text_type: 'query': Use for user-provided query text. The model generates a "title-like" vector that is more directional and optimized for information retrieval.
text_type: 'document' (default): Use for the document text stored in your knowledge base. The model generates a "body-like" vector that contains more comprehensive information and is optimized for matching.

When matching short text against long text, you should distinguish between query and document. However, for tasks such as clustering or classification where all texts have the same role, you do not need to set this parameter.
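The query and document roles described above can be kept apart with a small helper that prepares the keyword arguments for each call. This is a sketch: the helper name and sample texts are made up, and the commented-out lines assume the DashScope SDK is installed and configured as in the earlier examples.

```python
def embedding_kwargs(texts, role, model="text-embedding-v4"):
    """Build keyword arguments for TextEmbedding.call with an explicit text_type."""
    if role not in ("query", "document"):
        raise ValueError("role must be 'query' or 'document'")
    return {"model": model, "input": texts, "text_type": role}


# The short user query and the stored documents get different roles.
query_kwargs = embedding_kwargs("How do I return an item?", "query")
doc_kwargs = embedding_kwargs(
    ["Returns are accepted within 30 days.", "Shipping takes 3 to 5 days."],
    "document",
)

# With the SDK installed and configured:
#   import dashscope
#   query_resp = dashscope.TextEmbedding.call(**query_kwargs)
#   doc_resp = dashscope.TextEmbedding.call(**doc_kwargs)
```

Document embeddings are typically computed once at indexing time, while the query embedding is computed per search.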

Task instructions (instruct)

The instruct parameter is only available through the DashScope SDK and API.

You can provide a clear English task instruction to guide the text-embedding-v4 model in optimizing vector quality for specific retrieval scenarios, improving precision. When using this feature, you must set the text_type parameter to query.

import dashscope

# Example: Add an instruction to optimize retrieval quality when embedding a query.
resp = dashscope.TextEmbedding.call(
    model="text-embedding-v4",
    input="Research papers on machine learning",
    text_type="query",
    instruct="Given a research paper query, retrieve relevant research paper"
)

Dense and sparse vectors

The output_type parameter is only available through the DashScope SDK and API.

The text-embedding-v4 and text-embedding-v3 models support three vector output types to accommodate different retrieval strategies.

Vector type (output_type)	Advantages	Limitations	Use cases
dense	Deep semantic understanding that identifies synonyms and context for more relevant results.	Higher compute and storage costs. Does not guarantee an exact match for keywords.	Semantic search, AI-powered Q&A, content recommendation.
sparse	High computational efficiency, focusing on an exact match for keywords and enabling fast filtering.	Lacks semantic understanding and cannot process synonyms or context.	Log retrieval, product SKU search, precise information filtering.
dense&sparse	Combines semantic and keyword matching for optimal search results. The generation cost is unchanged, and the API call overhead is identical to the single-vector mode.	Requires more storage, and the system architecture and retrieval logic are more complex.	High-quality, production-grade hybrid search engine.
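To illustrate the hybrid strategy in the last table row, the sketch below combines a dense cosine similarity with a sparse dot product using a weighted sum. Several things here are assumptions rather than documented behavior: the sparse vector is assumed to be a list of objects with "index" and "value" fields, the weighted sum is one common heuristic among many, and the weight alpha=0.7 is an arbitrary starting point to tune on your own data.

```python
import math


def cosine(a, b):
    """Cosine similarity of two dense vectors; 0.0 if either has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def sparse_dot(a, b):
    """Dot product of two sparse vectors given as lists of {"index", "value"} entries."""
    weights_b = {entry["index"]: entry["value"] for entry in b}
    return sum(entry["value"] * weights_b.get(entry["index"], 0.0) for entry in a)


def hybrid_score(query_dense, doc_dense, query_sparse, doc_sparse, alpha=0.7):
    """Weighted sum of dense (semantic) and sparse (keyword) similarity."""
    return alpha * cosine(query_dense, doc_dense) + (1 - alpha) * sparse_dot(query_sparse, doc_sparse)


# Toy vectors for illustration only.
score = hybrid_score(
    [1.0, 0.0], [1.0, 0.0],
    [{"index": 3, "value": 0.5}],
    [{"index": 3, "value": 2.0}, {"index": 7, "value": 1.0}],
    alpha=0.5,
)
print(score)
```

Because the two scores are on different scales, production systems often normalize them or use rank-based fusion instead of a raw weighted sum.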

Use cases

The following code is for demonstration purposes only. For production, pre-compute and store embeddings in a vector database. This way, you only need to generate the query embedding for retrieval.
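The pre-compute-and-store pattern can be sketched with an in-memory stand-in for a vector database. The class below is hypothetical and uses toy 2-dimensional vectors in place of real embeddings. Document vectors are normalized once when they are added, so each search only needs the query embedding and a dot product per stored vector.

```python
import math


def normalize(vector):
    """Scale a vector to unit length; return a copy unchanged if its norm is zero."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


class InMemoryIndex:
    """Toy stand-in for a vector database: stores normalized document vectors."""

    def __init__(self):
        self._items = []

    def add(self, doc_id, embedding):
        # Normalize once at indexing time so search is a plain dot product.
        self._items.append((doc_id, normalize(embedding)))

    def search(self, query_embedding, top_k=3):
        query = normalize(query_embedding)
        scored = [
            (doc_id, sum(a * b for a, b in zip(query, vector)))
            for doc_id, vector in self._items
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


index = InMemoryIndex()
index.add("a", [1.0, 0.0])
index.add("b", [0.0, 1.0])
index.add("c", [1.0, 1.0])
results = index.search([2.0, 0.0], top_k=2)
print(results)
```

A real vector database replaces the linear scan with an approximate nearest-neighbor index, but the division of work between indexing time and query time is the same.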

Semantic search

Perform precise semantic matching by calculating the similarity between the query embedding and the document embeddings.

import dashscope
import numpy as np
from dashscope import TextEmbedding
# This configuration is for the China (Beijing) region. Replace {WorkspaceId} with your Workspace ID.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"

def cosine_similarity(a, b):
    """Calculate cosine similarity."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def semantic_search(query, documents, top_k=5):
    """Perform semantic search."""
    # Generate the query embedding.
    query_resp = TextEmbedding.call(
        model="text-embedding-v4",
        input=query,
        dimension=1024
    )
    query_embedding = query_resp.output['embeddings'][0]['embedding']

    # Generate the document embeddings.
    doc_resp = TextEmbedding.call(
        model="text-embedding-v4",
        input=documents,
        dimension=1024
    )

    # Calculate similarities.
    similarities = []
    for i, doc_emb in enumerate(doc_resp.output['embeddings']):
        similarity = cosine_similarity(query_embedding, doc_emb['embedding'])
        similarities.append((i, similarity))

    # Sort and return the top-k results.
    similarities.sort(key=lambda x: x[1], reverse=True)
    return [(documents[i], sim) for i, sim in similarities[:top_k]]

# Example usage
documents = [
    "Artificial intelligence is a branch of computer science",
    "Machine learning is an important method for achieving artificial intelligence",
    "Deep learning is a subfield of machine learning"
]
query = "What is AI?"
results = semantic_search(query, documents, top_k=2)
for doc, sim in results:
    print(f"Similarity: {sim:.3f}, Document: {doc}")

Recommendation system

Analyze a user's behavioral history embeddings to identify their interests and recommend similar items.

import dashscope
import numpy as np
from dashscope import TextEmbedding
# This configuration is for the China (Beijing) region. Replace {WorkspaceId} with your Workspace ID.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"

def cosine_similarity(a, b):
    """Calculate cosine similarity."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def build_recommendation_system(user_history, all_items, top_k=10):
    """Build a recommendation system."""
    # Generate user history embeddings.
    history_resp = TextEmbedding.call(
        model="text-embedding-v4",
        input=user_history,
        dimension=1024
    )

    # Calculate the user preference embedding by averaging.
    user_embedding = np.mean([
        emb['embedding'] for emb in history_resp.output['embeddings']
    ], axis=0)

    # Generate all item embeddings.
    items_resp = TextEmbedding.call(
        model="text-embedding-v4",
        input=all_items,
        dimension=1024
    )

    # Calculate recommendation scores.
    recommendations = []
    for i, item_emb in enumerate(items_resp.output['embeddings']):
        score = cosine_similarity(user_embedding, item_emb['embedding'])
        recommendations.append((all_items[i], score))

    # Sort and return the recommendation results.
    recommendations.sort(key=lambda x: x[1], reverse=True)
    return recommendations[:top_k]

# Example usage
user_history = ["Science Fiction", "Action", "Suspense"]
all_movies = ["Future World", "Space Adventure", "Ancient War", "Romantic Journey", "Superhero"]
recommendations = build_recommendation_system(user_history, all_movies)
for movie, score in recommendations:
    print(f"Recommendation Score: {score:.3f}, Movie: {movie}")
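
The example above averages the history embeddings with equal weight. If recent behavior should count more, a weighted average is one possible variation. The sketch below assumes hypothetical recency weights and toy 2-dimensional vectors. It is not part of the DashScope SDK.

```python
def weighted_mean(vectors, weights):
    """Average equal-length vectors, scaling each one by its weight."""
    total = sum(weights)
    dimension = len(vectors[0])
    return [
        sum(weight * vector[d] for vector, weight in zip(vectors, weights)) / total
        for d in range(dimension)
    ]

# Toy embeddings for two history items, oldest first (assumption).
history_embeddings = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical weights: the most recent item counts three times as much.
recency_weights = [1.0, 3.0]

user_embedding = weighted_mean(history_embeddings, recency_weights)
print(user_embedding)
```

The resulting vector can replace the plain np.mean result when scoring candidate items.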

Text clustering

Group similar texts by analyzing the distances between their embeddings.

# scikit-learn is required: pip install scikit-learn
import dashscope
import numpy as np
from sklearn.cluster import KMeans
# This configuration is for the China (Beijing) region. Replace {WorkspaceId} with your Workspace ID.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"


def cluster_texts(texts, n_clusters=2):
    """Cluster a set of texts."""
    # 1. Get the embeddings for all texts.
    resp = dashscope.TextEmbedding.call(
        model="text-embedding-v4",
        input=texts,
        dimension=1024
    )
    embeddings = np.array([item['embedding'] for item in resp.output['embeddings']])

    # 2. Use the KMeans algorithm for clustering.
    kmeans = KMeans(n_clusters=n_clusters, random_state=0, n_init='auto').fit(embeddings)

    # 3. Organize and return the results.
    clusters = {i: [] for i in range(n_clusters)}
    for i, label in enumerate(kmeans.labels_):
        clusters[label].append(texts[i])
    return clusters


# Example usage
documents_to_cluster = [
    "Mobile phone company A releases a new phone",
    "Search engine company B launches a new system",
    "World Cup final: Argentina vs. France",
    "China wins another gold medal at the Olympics",
    "A company releases its latest AI chip",
    "European Cup match report"
]
clusters = cluster_texts(documents_to_cluster, n_clusters=2)
for cluster_id, docs in clusters.items():
    print(f"--- Cluster {cluster_id} ---")
    for doc in docs:
        print(f"- {doc}")
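
KMeans groups points by Euclidean distance, while the other examples on this page compare embeddings by cosine similarity. For unit-length vectors the two agree, because the squared Euclidean distance equals 2 - 2 * cosine similarity. The sketch below checks that identity on toy vectors. Whether the returned embeddings are already unit length is not stated here, so normalizing before clustering is offered as an optional step, not a requirement.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for two text embeddings.
a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])

distance = squared_euclidean(a, b)
from_cosine = 2 - 2 * cosine_similarity(a, b)
print(f"squared distance: {distance:.4f}, 2 - 2*cos: {from_cosine:.4f}")
```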

Text classification

Perform zero-shot text classification by calculating the similarity between an input text's embedding and predefined label embeddings. This process classifies text into new categories without requiring pre-labeled examples.

import dashscope
import numpy as np
# This configuration is for the China (Beijing) region. Replace {WorkspaceId} with your Workspace ID.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"


def cosine_similarity(a, b):
    """Calculate cosine similarity."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def classify_text_zero_shot(text, labels):
    """Perform zero-shot text classification."""
    # 1. Get the embeddings for the input text and all labels.
    resp = dashscope.TextEmbedding.call(
        model="text-embedding-v4",
        input=[text] + labels,
        dimension=1024
    )
    embeddings = resp.output['embeddings']
    text_embedding = embeddings[0]['embedding']
    label_embeddings = [emb['embedding'] for emb in embeddings[1:]]

    # 2. Calculate the similarity with each label.
    scores = [cosine_similarity(text_embedding, label_emb) for label_emb in label_embeddings]

    # 3. Return the label with the highest similarity.
    best_match_index = np.argmax(scores)
    return labels[best_match_index], scores[best_match_index]


# Example usage
text_to_classify = "The fabric of this dress is comfortable, and the style is nice too"
possible_labels = ["Digital Products", "Apparel & Accessories", "Food & Beverage", "Home & Living"]

label, score = classify_text_zero_shot(text_to_classify, possible_labels)
print(f"Input text: '{text_to_classify}'")
print(f"Best matching category: '{label}' (Similarity: {score:.3f})")
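
The classifier above always returns the highest-scoring label, even when two labels score almost the same. One way to flag such ambiguous inputs is to look at the margin between the best and second-best scores. The sketch below assumes a hypothetical pick_label helper and a min_margin value chosen for illustration only.

```python
def pick_label(labels, scores, min_margin=0.05):
    """Return (label, margin), or (None, margin) when the top two scores are too close.

    Requires at least two labels. min_margin is an illustrative value.
    """
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    best_label, best_score = ranked[0]
    runner_up_score = ranked[1][1]
    margin = best_score - runner_up_score
    if margin < min_margin:
        return None, margin
    return best_label, margin

# Toy similarity scores standing in for cosine similarities to each label.
labels = ["Digital Products", "Apparel & Accessories", "Home & Living"]
scores = [0.30, 0.55, 0.33]

label, margin = pick_label(labels, scores)
print(f"Label: {label}, margin over runner-up: {margin:.2f}")
```

Inputs that return None can be routed to a fallback such as manual review.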

Anomaly detection

Identify anomalous data by calculating the similarity between a text's embedding and the central embedding of normal samples. Data that significantly deviates from this pattern is considered an anomaly.

The threshold in the example is for demonstration purposes. The ideal value varies based on data content and distribution, so you must calibrate it using your own dataset.

import dashscope
import numpy as np
# This configuration is for the China (Beijing) region. Replace {WorkspaceId} with your Workspace ID.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"


def cosine_similarity(a, b):
    """Calculate cosine similarity."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def detect_anomaly(new_comment, normal_comments, threshold=0.6):
    """Flag a comment as anomalous if it is dissimilar to the normal comments."""
    # 1. Generate embeddings for all normal comments and the new comment.
    all_texts = normal_comments + [new_comment]
    resp = dashscope.TextEmbedding.call(
        model="text-embedding-v4",
        input=all_texts,
        dimension=1024
    )
    embeddings = [item['embedding'] for item in resp.output['embeddings']]

    # 2. Calculate the center embedding (average) of the normal comments.
    normal_embeddings = np.array(embeddings[:-1])
    normal_center_vector = np.mean(normal_embeddings, axis=0)

    # 3. Calculate the similarity between the new comment's embedding and the center embedding.
    new_comment_embedding = np.array(embeddings[-1])
    similarity = cosine_similarity(new_comment_embedding, normal_center_vector)

    # 4. Determine if it is an anomaly.
    is_anomaly = similarity < threshold
    return is_anomaly, similarity


# Example usage
normal_user_comments = [
    "Today's meeting was productive",
    "The project is progressing smoothly",
    "The new version will be released next week",
    "User feedback is positive"
]

test_comments = {
    "Normal comment": "The feature works as expected",
    "Anomaly - meaningless garbled text": "asdfghjkl zxcvbnm"
}

print("--- Anomaly Detection Example ---")
for desc, comment in test_comments.items():
    is_anomaly, score = detect_anomaly(comment, normal_user_comments)
    result = "Yes" if is_anomaly else "No"
    print(f"Comment: '{comment}'")
    print(f"Is anomaly: {result} (Similarity to normal samples: {score:.3f})\n")
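
As noted above, the threshold must be calibrated on your own data. One possible approach is leave-one-out scoring: compare each normal sample with the centroid of the remaining samples, then set the threshold a few standard deviations below the mean score. The sketch below uses toy 2-dimensional vectors in place of real embeddings. The function names and the num_std value are assumptions for illustration.

```python
import math
import statistics

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Component-wise average of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def calibrate_threshold(normal_embeddings, num_std=2.0):
    """Score each normal sample against the centroid of the others."""
    scores = []
    for i, embedding in enumerate(normal_embeddings):
        others = normal_embeddings[:i] + normal_embeddings[i + 1:]
        scores.append(cosine_similarity(embedding, mean_vector(others)))
    return statistics.mean(scores) - num_std * statistics.pstdev(scores)

def is_anomaly(embedding, normal_embeddings, threshold):
    """True when the embedding is less similar to the centroid than the threshold."""
    return cosine_similarity(embedding, mean_vector(normal_embeddings)) < threshold

# Toy vectors standing in for embeddings of known-normal comments.
normal_embeddings = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.95, 0.15]]
threshold = calibrate_threshold(normal_embeddings)

print(f"Calibrated threshold: {threshold:.3f}")
print(is_anomaly([0.97, 0.1], normal_embeddings, threshold))
print(is_anomaly([0.0, 1.0], normal_embeddings, threshold))
```

A percentile of the leave-one-out scores is a reasonable alternative when the scores are not roughly symmetric.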

API reference

General text embedding
- Synchronous processing API
- Batch processing API
Multimodal embedding
- Multimodal embedding API

Error codes

If the model call fails and returns an error message, see Error codes for resolution.

Rate limiting

For the rate limits that apply to each model, see Rate limiting.

Model performance (MTEB/CMTEB)

Evaluation benchmarks

MTEB (Massive Text Embedding Benchmark): A comprehensive benchmark that assesses the general-purpose performance of text embeddings on tasks such as classification, clustering, and retrieval.
CMTEB (Chinese Massive Text Embedding Benchmark): A large-scale benchmark specifically for evaluating Chinese text embeddings.
Scores range from 0 to 100. Higher scores indicate better performance.

Model	MTEB	MTEB (retrieval task)	CMTEB	CMTEB (retrieval task)
text-embedding-v1	58.30	45.47	59.84	56.59
text-embedding-v2	60.13	49.49	62.17	62.78
text-embedding-v3 (64 dimensions)	57.40	46.52	59.19	62.03
text-embedding-v3 (128 dimensions)	60.19	52.51	63.81	68.22
text-embedding-v3 (256 dimensions)	61.13	54.41	65.92	71.07
text-embedding-v3 (512 dimensions)	62.11	54.30	66.81	71.88
text-embedding-v3 (768 dimensions)	62.43	54.74	67.90	72.29
text-embedding-v3 (1024 dimensions)	63.39	55.41	68.92	73.23
text-embedding-v4 (512 dimensions)	64.73	56.34	68.79	73.33
text-embedding-v4 (1024 dimensions)	68.36	59.30	70.14	73.98
text-embedding-v4 (2048 dimensions)	71.58	61.97	71.99	75.01
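
Higher dimensions score better in the table above but also take more space to store. The sketch below pairs the text-embedding-v4 MTEB scores from the table with a rough index size estimate. It assumes each vector component is stored as a 32-bit float and ignores any index overhead, so treat the sizes as lower bounds.

```python
# MTEB scores for text-embedding-v4, copied from the table above.
V4_MTEB = {512: 64.73, 1024: 68.36, 2048: 71.58}

# Assumption: each component is stored as a 32-bit float (4 bytes).
BYTES_PER_FLOAT32 = 4

def index_size_gib(dimension, num_vectors):
    """Raw storage for num_vectors embeddings of the given dimension, in GiB."""
    return dimension * BYTES_PER_FLOAT32 * num_vectors / 2**30

for dimension, score in V4_MTEB.items():
    size = index_size_gib(dimension, 1_000_000)
    print(f"{dimension} dimensions: MTEB {score:.2f}, about {size:.2f} GiB per million vectors")
```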
