AI_EMBED-Hologres(Hologres)-阿里云帮助中心

Computes a fixed-dimension continuous vector for a given text or image input.

Prerequisites

Before using this function, deploy the required model in the Hologres console. For more information, see Managed models.

Syntax

Computes a fixed-dimension continuous vector for a given text or image input.

-- Compute a text embedding vector
SELECT ai_embed([model], content);

-- Compute an image embedding vector (via FILE object)
SELECT ai_embed([model], file);

-- Compute an image embedding vector (via BYTEA/BLOB binary data)
SELECT ai_embed(model_name TEXT, binary BYTEA, file_format TEXT);

Parameters

Parameter	Type	Required	Version	Description
model	TEXT	Optional	Hologres V3.2 and later	The name of a deployed embedding model.
content	TEXT	Required for text embedding	Hologres V3.2 and later	The input text. Must be a character type (char, varchar, or text). Returns NULL if the value is NULL or an empty string. Supports Hologres-managed models and built-in AI node models. For details, see Managed models.
file	FILE	Required for image embedding	Hologres V4.0 and later	A FILE object, typically an image FILE object. Returns NULL if the value is NULL. Supports Hologres-managed models and built-in AI node models. For the model list, see the supported models table below.
binary_data	BYTEA	Required for image embedding	Hologres V4.2 and later	Binary image data. Supports the BYTEA type from Hologres internal tables and the BLOB type from Paimon external tables. Only Hologres-managed models are supported.
file_format	TEXT	Required for image embedding	Hologres V4.2 and later	The image format. Required when using binary_data input. Supported formats include 'jpg', 'png', 'webp', 'gif', 'bmp', and 'heic'.

Return value

Returns a vector based on the model used. The model source can be Managed models or AI resources.
Returns NULL if the content parameter is NULL or an empty string.
Returns NULL if the FILE parameter is NULL.

Supported models

Text and image models can be Hologres-managed models or built-in AI node models. The BYTEA type only supports Hologres-managed models. When using a managed model, the formats are as follows. For more usage limits, see Managed models.

Note

Model names in the following table use hyphens (for example, qwen3-vl-embedding). When calling these models in SQL, replace the hyphens with underscores (for example, qwen3_vl_embedding).
Video models are available only in the Beijing and Singapore regions. AI node built-in models require Hologres V3.2 or later (the CLIP series and the Qwen3-Embedding series also require V3.2 or later).

Model	Source	Dimensions	Text	Image	Video
text-embedding-v1	Managed Models	1,536	Supported	Not supported	Not supported
text-embedding-v2		1,536	Supported	Not supported	Not supported
text-embedding-v3		2,048 / 1,536 / 1,024 (default) / 768 / 512 / 256 / 128 / 64	Supported	Not supported	Not supported
text-embedding-v4		1,024 (default) / 768 / 512 / 256 / 128 / 64	Supported	Not supported	Not supported
tongyi-embedding-vision-plus		1,152	30+ languages including Chinese, English, Japanese, and Korean	JPEG, PNG, WEBP, BMP, TIFF, ICO, DIB, ICNS, SGI (URL or Base64)	MP4, MPEG, MOV, MPG, WEBM, AVI, FLV, MKV (URL only)
tongyi-embedding-vision-flash		768	Supported	JPEG, PNG, WEBP, BMP, TIFF, ICO, DIB, ICNS, SGI (URL or Base64)	MP4, MPEG, MOV, MPG, WEBM, AVI, FLV, MKV (URL only)
multimodal-embedding-v1		1,024	Chinese and English	JPG, PNG, BMP (URL or Base64)	A single request can contain up to 20 content elements in total, with a maximum of 1 image and 1 video. Text elements can be up to 20, sharing the total count limit.
qwen3-vl-embedding		2,560 (default) / 2,048 / 1,536 / 1,024 / 768 / 512 / 256	33 languages including Chinese, English, Japanese, Korean, French, and German	JPEG, PNG, WEBP, BMP, TIFF, ICO, DIB, ICNS, SGI (URL or Base64)	MP4, AVI, MOV (URL only)
qwen2.5-vl-embedding		2,048 / 1,024 (default) / 768 / 512	11 languages including Chinese, English, Japanese, Korean, French, and German	—	—
iic/nlp_gte_sentence-embedding_chinese-base	Hologres AI Node	768	Supported	Not supported	Not supported
iic/nlp_gte_sentence-embedding_chinese-large		1,024	Supported	Not supported	Not supported
iic/nlp_gte_sentence-embedding_chinese-small		512	Supported	Not supported	Not supported
Qwen/Qwen3-Embedding-0.6B		—	Supported	Not supported	Not supported
Qwen/Qwen3-Embedding-4B		—	Supported	Not supported	Not supported
Qwen/Qwen3-Embedding-8B		—	Supported	Not supported	Not supported
BAAI/bge-base-en-v1.5		768	Supported	Not supported	Not supported
BAAI/bge-base-zh-v1.5		768	Supported	Not supported	Not supported
BAAI/bge-large-en-v1.5		1,024	Supported	Not supported	Not supported
BAAI/bge-large-zh-v1.5		1,024	Supported	Not supported	Not supported
BAAI/bge-small-en-v1.5		384	Supported	Not supported	Not supported
BAAI/bge-small-zh-v1.5		512	Supported	Not supported	Not supported
clip-ViT-B-16		512	Supported	Supported	Not supported
clip-ViT-B-32-multilingual-v1		512	Supported	Supported	Not supported
clip-ViT-B-32		512	Supported	Supported	Not supported
clip-ViT-L-14		768	Supported	Supported	Not supported

Examples

Text embedding

Note

The following example omits the model name parameter. When the model name is omitted, you must first call set_ai_function_info to set a default model. Otherwise, an error is returned. You can also specify the model name directly in the function, for example, ai_embed('text_embedding_v4', 'your text').

-- Text embedding
SELECT ai_embed('Hologres is a one-stop real-time data warehousing engine independently developed by Alibaba. It supports real-time ingestion, real-time updates, real-time processing, and real-time analytics over massive datasets.');

-- Result (1024-dimension vector example)
{-0.020090256,-0.009496426,-0.01584659,0.014317295,...}

Image embedding (FILE object)

Note

This feature is supported in Hologres V4.0 and later.

-- Image embedding via an OSS FILE object
SELECT ai_embed('qwen3_vl_embedding', to_file('oss://****', 'oss-cn-hangzhou-internal.aliyuncs.com', 'acs:ram::****'));

-- Result (2560-dimension vector)
{-0.000733634,-0.00160808,0.0354091,...}

Image embedding (BYTEA binary)

Note

This feature is supported in Hologres V4.2 and later.

Method 1: Pass binary data directly

-- Convert an image to BYTEA and compute its embedding (supports webp, jpg, png, and other formats)
SELECT ai_embed('qwen3_vl_embedding',
                decode('/9j/4AAQSkZJRgABAQEAYABgAAD...', 'base64'),
                'jpg');

Method 2: Read BYTEA data from a Hologres internal table

-- Create an image storage table
CREATE TABLE image_table (
    image_name TEXT,
    image_bin BYTEA
);

-- Insert image data
INSERT INTO image_table VALUES
('concert.webp', decode('...webp_binary...', 'base64')),
('cat.png', decode('...png_binary...', 'base64'));

-- Generate embeddings using ai_embed
SELECT image_name,
       ai_embed('qwen3_vl_embedding', image_bin,
                CASE
                  WHEN image_name LIKE '%.webp' THEN 'webp'
                  WHEN image_name LIKE '%.png' THEN 'png'
                  WHEN image_name LIKE '%.jpg' THEN 'jpg'
                END) as embedding
FROM image_table;

Method 3: Read BLOB data from a Paimon external table (Hologres V4.2 and later)

-- Paimon external table (with a BLOB column)
CREATE EXTERNAL TABLE paimon_db.image_blob_table (
    id INT,
    image_name TEXT,
    picture BYTEA  -- Paimon BLOB is mapped to BYTEA
)
WITH (...);

-- Generate embeddings directly from Paimon BLOB data
SELECT id, image_name,
       ai_embed('qwen3_vl_embedding', picture, 'jpg') as embedding
FROM paimon_db.image_blob_table;

Typical use cases

Scenario 1: Image vector search with Hologres internal tables

This scenario deploys the qwen3-vl-embedding model and demonstrates the full workflow of image embedding, image-to-image similarity search, and text-to-image search using vector indexing.

Note

This feature is supported in Hologres V4.2 and later.

Step 1: Prepare data

-- 0. Create a source image table (stores raw image binary data)
CREATE TABLE source_image_table (
    image_id SERIAL PRIMARY KEY,
    image_name TEXT,
    image_bin BYTEA,
    image_format TEXT
);

-- Insert image data (example: read from local files and insert)
INSERT INTO source_image_table (image_name, image_bin, image_format) VALUES
('concert.webp', decode('UklGRiIAAABXRUJQVlA4IBYAAAAwAQCdASo...', 'base64'), 'webp'),
('cat.png', decode('iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==', 'base64'), 'png'),
('dog.jpg', decode('/9j/4AAQSkZJRgABAQEAYABgAAD//gA7Q1JFQVRPUjogZ2QtanBlZyB2MS4wICh1c2luZyBJSkcgSlBFRyB2ODAp', 'base64'), 'jpg');

-- 1. Create a target table with a vector index
CREATE TABLE image_search_table (
    image_id SERIAL PRIMARY KEY,
    image_name TEXT,
    image_bin BYTEA,
    image_format TEXT,
    embedding float4[] CHECK (array_ndims(embedding) = 1 AND array_length(embedding, 1) = 2560)
) WITH (
    vectors = '{ "embedding": { "algorithm": "HGraph", "distance_method": "Euclidean", "builder_params": { "base_quantization_type": "sq8_uniform", "max_degree": 64, "ef_construction": 400 } } }'
);

-- 2. Insert images from the source table and generate vectors
INSERT INTO image_search_table (image_name, image_bin, image_format, embedding)
SELECT image_name, image_bin, image_format,
       ai_embed('qwen3_vl_embedding', image_bin, image_format)
FROM source_image_table;

Step 2: Image similarity search

-- 3. Image-to-image similarity search (approximate nearest neighbor)
SELECT a.image_name as query_image,
       b.image_name as result_image,
       approx_euclidean_distance(a.embedding, b.embedding) as distance
FROM image_search_table a,
     image_search_table b
WHERE a.image_id = 1
ORDER BY distance ASC;

-- Sample result
--  query_image  | result_image | distance
-- --------------+--------------+----------
--  concert.webp | concert.webp |        0
--  concert.webp | cat.png      | 15.23457
--  concert.webp | dog.jpg      | 18.45679

Step 3: Text-to-image search

-- 4. Text-to-image search (multimodal retrieval)
SELECT image_id, image_name,
       approx_euclidean_distance(
           embedding,
           ai_embed('qwen3_vl_embedding', 'concert photo')
       ) as text_img_distance
FROM image_search_table
ORDER BY text_img_distance ASC;

-- Sample result
--  image_id | image_name   | text_img_distance
-- ----------+--------------+-------------------
--         1 | concert.webp |          8.123456
--         3 | dog.jpg      |         22.345678
--         2 | cat.png      |         25.678901

Scenario 2: Offline vector indexing from Paimon external tables

This scenario reads the BLOB type from Paimon and demonstrates the full workflow of embedding and searching Paimon BLOB data, enabling accelerated queries on unstructured data.

Note

This feature is supported in Hologres V4.2 and later.

The BLOB type in Paimon can be written via Flink or Spark. This is not covered here.

Step 1: Prepare data

-- 0. Create an External Database to connect to the Paimon data source
CREATE EXTERNAL DATABASE paimon_blob_bj
WITH (
    catalog_type 'paimon'
    metastore_type 'dlf-rest'
    dlf_catalog 'paimon_bj'
    dlf_access_id 'xxxx'
    dlf_access_key '***'
);
-- Create an external schema
CREATE EXTERNAL SCHEMA paimon_blob_bj.test_paimon_db;

-- 1. Create a Paimon external table (with a BLOB column)
CREATE EXTERNAL TABLE paimon_blob_bj.test_paimon_db.image_table (
    id INT,
    image_name TEXT,
    picture BYTEA  -- Paimon BLOB is mapped to BYTEA
)
WITH (
   "table_format" = 'paimon',
    "file_format" = 'orc',
    "bucket" = '1',
    "bucket-key" = 'id'
);

-- 2. Create a Hologres target vector table (with a vector index)
CREATE TABLE public.image_vector_table (
    id SERIAL PRIMARY KEY,
    image_name TEXT,
    picture BYTEA,
    image_format TEXT,
    embedding float4[] CHECK (array_ndims(embedding) = 1 AND array_length(embedding, 1) = 2560)
) WITH (
    vectors = '{
        "embedding": {
            "algorithm": "HGraph",
            "distance_method": "Euclidean",
            "builder_params": {
                "base_quantization_type": "sq8_uniform",
                "max_degree": 64,
                "ef_construction": 400
            }
        }
    }'
);

Step 2: Load Paimon data into the Hologres vector table (optional)

This step is optional. You can import Paimon data into Hologres for accelerated queries, or query the Paimon table directly depending on your requirements.

-- 3. Read BLOB data from the Paimon external table, generate vectors, and write to Hologres
INSERT INTO image_vector_table (id, image_name, picture, image_format, embedding)
SELECT
    id,
    image_name,
    picture,
    'jpg',
    ai_embed('qwen3_vl_embedding', picture, 'jpg')
FROM paimon_blob_bj.test_paimon_db.image_table;

Step 3: Image similarity search

SELECT
    a.id as query_id,
    a.image_name as query_image,
    b.id as result_id,
    b.image_name as result_image,
    approx_euclidean_distance(a.embedding, b.embedding) as distance
FROM image_vector_table a,
     image_vector_table b
WHERE a.id = 1
ORDER BY distance ASC
LIMIT 5;

-- Sample result
--  query_id | query_image  | result_id | result_image | distance
-- ----------+--------------+-----------+--------------+----------
--         1 | concert.webp |         1 | concert.webp |        0
--         1 | concert.webp |         3 | dog.jpg      | 15.23457
--         1 | concert.webp |         2 | cat.png      | 18.45679

Step 4: Text-to-image search

SELECT
    id,
    image_name,
    approx_euclidean_distance(
        embedding,
        ai_embed('qwen3_vl_embedding', 'concert photo')
    ) as text_img_distance
FROM image_vector_table
ORDER BY text_img_distance ASC
LIMIT 5;

-- Sample result
--  id | image_name   | text_img_distance
-- ----+--------------+-------------------
--   1 | concert.webp |          8.123456
--   3 | dog.jpg      |         22.345678
--   2 | cat.png      |         25.678901

Considerations

Model selection
- Image embedding requires a multimodal model such as qwen3_vl_embedding (2560 dimensions) or tongyi_embedding_vision_plus (1152 dimensions).
- Text embedding uses a text-only model such as text_embedding_v4 (1024 dimensions).
Vector dimensions
- Different models return vectors with different dimensions. The CHECK constraint in your table definition must match the model's output dimension.
- qwen3_vl_embedding: 2560 dimensions.
- tongyi_embedding_vision_plus: 1152 dimensions.
- text_embedding_v4: 1024 dimensions.
Performance optimization
- Create an HGraph vector index on the vector column for approximate nearest neighbor searches.
- For large datasets, use INSERT...SELECT to build vectors offline.
Error handling
- Returns NULL if binary_data is NULL.
- An error is returned if file_format is not supported.
- An error is returned if the image is corrupted or in an invalid format.