AI_EMBED

更新时间:
复制 MD 格式

Computes a fixed-dimension continuous vector for a given text or image input.

Prerequisites

Before using this function, deploy the required model in the Hologres console. For more information, see Model Studio models.

Syntax

Computes a fixed-dimension continuous vector for a given text or image input.

-- Compute a text embedding vector
SELECT ai_embed([model], content);

-- Compute an image embedding vector (via FILE object)
SELECT ai_embed([model], file);

-- Compute an image embedding vector (via BYTEA/BLOB binary data)
SELECT ai_embed(model_name TEXT, binary BYTEA, file_format TEXT);

Parameters

Parameter

Type

Required

Version

Description

model

TEXT

Optional

Hologres V3.2 and later

The name of a deployed embedding model.

content

TEXT

Required for text embedding

Hologres V3.2 and later

The input text. Must be a character type (char, varchar, or text). Returns NULL if the value is NULL or an empty string. Supports models from Model Studio (Bailian) and built-in AI node models. For details, see Model Studio models.

file

FILE

Required for image embedding

Hologres V4.0 and later

A FILE object, typically an image FILE object. Returns NULL if the value is NULL. Supports models from Model Studio (Bailian) and built-in AI node models. For the model list, see the supported models table below.

binary_data

BYTEA

Required for image embedding

Hologres V4.2 and later

Binary image data. Supports the BYTEA type from Hologres internal tables and the BLOB type from Paimon external tables. Only Model Studio (Bailian) models are supported.

file_format

TEXT

Required for image embedding

Hologres V4.2 and later

The image format. Required when using binary_data input. Supported formats include 'jpg', 'png', 'webp', 'gif', 'bmp', and 'heic'.

Return value

  • Returns a vector based on the model used. The model source can be Model Studio models or AI resources.

  • Returns NULL if the content parameter is NULL or an empty string.

  • Returns NULL if the FILE parameter is NULL.

Supported models

Text and image models can be Model Studio (Bailian) models or built-in AI node models. The BYTEA type only supports Model Studio (Bailian) models. When using a Model Studio model, the formats are as follows. For more usage limits, see Model Studio models.

Model

Dimensions

Text

Image

Video

qwen3-vl-embedding

2560 (default), 2048, 1536, 1024, 768, 512, 256

33 languages including Chinese, English, Japanese, Korean, French, and German

JPEG, PNG, WEBP, BMP, TIFF, ICO, DIB, ICNS, SGI (URL or Base64)

MP4, AVI, MOV (URL only)

qwen2.5-vl-embedding

2048, 1024 (default), 768, 512

11 languages including Chinese, English, Japanese, Korean, French, and German

tongyi-embedding-vision-plus-2026-03-06

1152 (default), 1024, 512, 256, 128, 64

30+ languages including Chinese, English, Japanese, and Korean

JPEG, PNG, WEBP, BMP, TIFF, ICO, DIB, ICNS, SGI (URL or Base64)

MP4, MPEG, MOV, MPG, WEBM, AVI, FLV, MKV (URL only)

tongyi-embedding-vision-flash-2026-03-06

768 (default), 512, 256, 128, 64

tongyi-embedding-vision-plus

1152

Chinese and English

JPG, PNG, BMP (URL or Base64)

MP4, MPEG, MOV, MPG, WEBM, AVI, FLV, MKV (URL only)

tongyi-embedding-vision-flash

768

multimodal-embedding-v1

1,024

Chinese and English

JPG, PNG, BMP (URL or Base64)

A single request can contain up to 20 content elements in total, with a maximum of 1 image and 1 video. Text elements can be up to 20, sharing the total count limit.

Examples

Text embedding

-- Text embedding
SELECT ai_embed('Hologres is a real-time data warehouse engine developed by Alibaba Cloud.');

-- Result (1024-dimension vector example)
{-0.020090256,-0.009496426,-0.01584659,0.014317295,...}

Image embedding (FILE object)

Note

This feature is supported in Hologres V4.0 and later.

-- Image embedding via an OSS FILE object
SELECT ai_embed('qwen3_vl_embedding', to_file('oss://****', 'oss-cn-hangzhou-internal.aliyuncs.com', 'acs:ram::****'));

-- Result (2560-dimension vector)
{-0.000733634,-0.00160808,0.0354091,...}

Image embedding (BYTEA binary)

Note

This feature is supported in Hologres V4.2 and later.

Method 1: Pass binary data directly

-- Convert an image to BYTEA and compute its embedding (supports webp, jpg, png, and other formats)
SELECT ai_embed('qwen3_vl_embedding',
                decode('/9j/4AAQSkZJRgABAQEAYABgAAD...', 'base64'),
                'jpg');

Method 2: Read BYTEA data from a Hologres internal table

-- Create an image storage table
CREATE TABLE image_table (
    image_name TEXT,
    image_bin BYTEA
);

-- Insert image data
INSERT INTO image_table VALUES
('concert.webp', decode('...webp_binary...', 'base64')),
('cat.png', decode('...png_binary...', 'base64'));

-- Generate embeddings using ai_embed
SELECT image_name,
       ai_embed('qwen3_vl_embedding', image_bin,
                CASE
                  WHEN image_name LIKE '%.webp' THEN 'webp'
                  WHEN image_name LIKE '%.png' THEN 'png'
                  WHEN image_name LIKE '%.jpg' THEN 'jpg'
                END) as embedding
FROM image_table;

Method 3: Read BLOB data from a Paimon external table (Hologres V4.2 and later)

-- Paimon external table (with a BLOB column)
CREATE EXTERNAL TABLE paimon_db.image_blob_table (
    id INT,
    image_name TEXT,
    picture BYTEA  -- Paimon BLOB is mapped to BYTEA
)
WITH (...);

-- Generate embeddings directly from Paimon BLOB data
SELECT id, image_name,
       ai_embed('qwen3_vl_embedding', picture, 'jpg') as embedding
FROM paimon_db.image_blob_table;

Typical use cases

Scenario 1: Image vector search with Hologres internal tables

This scenario deploys the qwen3-vl-embedding model and demonstrates the full workflow of image embedding, image-to-image similarity search, and text-to-image search using vector indexing.

Note

This feature is supported in Hologres V4.2 and later.

Step 1: Prepare data

-- 0. Create a source image table (stores raw image binary data)
CREATE TABLE source_image_table (
    image_id SERIAL PRIMARY KEY,
    image_name TEXT,
    image_bin BYTEA,
    image_format TEXT
);

-- Insert image data (example: read from local files and insert)
INSERT INTO source_image_table (image_name, image_bin, image_format) VALUES
('concert.webp', decode('UklGRiIAAABXRUJQVlA4IBYAAAAwAQCdASo...', 'base64'), 'webp'),
('cat.png', decode('iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==', 'base64'), 'png'),
('dog.jpg', decode('/9j/4AAQSkZJRgABAQEAYABgAAD//gA7Q1JFQVRPUjogZ2QtanBlZyB2MS4wICh1c2luZyBJSkcgSlBFRyB2ODAp', 'base64'), 'jpg');

-- 1. Create a target table with a vector index
CREATE TABLE image_search_table (
    image_id SERIAL PRIMARY KEY,
    image_name TEXT,
    image_bin BYTEA,
    image_format TEXT,
    embedding float4[] CHECK (array_ndims(embedding) = 1 AND array_length(embedding, 1) = 2560)
) WITH (
    vectors = '{ "embedding": { "algorithm": "HGraph", "distance_method": "Euclidean" } }'
);

-- 2. Insert images from the source table and generate vectors
INSERT INTO image_search_table (image_name, image_bin, image_format, embedding)
SELECT image_name, image_bin, image_format,
       ai_embed('qwen3_vl_embedding', image_bin, image_format)
FROM source_image_table;

Step 2: Image similarity search

-- 3. Image-to-image similarity search (approximate nearest neighbor)
SELECT a.image_name as query_image,
       b.image_name as result_image,
       approx_euclidean_distance(a.embedding, b.embedding) as distance
FROM image_search_table a,
     image_search_table b
WHERE a.image_id = 1
ORDER BY distance ASC;

-- Sample result
--  query_image  | result_image | distance
-- --------------+--------------+----------
--  concert.webp | concert.webp |        0
--  concert.webp | cat.png      | 15.23457
--  concert.webp | dog.jpg      | 18.45679

Step 3: Text-to-image search

-- 4. Text-to-image search (multimodal retrieval)
SELECT image_id, image_name,
       approx_euclidean_distance(
           embedding,
           ai_embed('qwen3_vl_embedding', 'concert photo')
       ) as text_img_distance
FROM image_search_table
ORDER BY text_img_distance ASC;

-- Sample result
--  image_id | image_name   | text_img_distance
-- ----------+--------------+-------------------
--         1 | concert.webp |          8.123456
--         3 | dog.jpg      |         22.345678
--         2 | cat.png      |         25.678901

Scenario 2: Offline vector indexing from Paimon external tables

This scenario reads the BLOB type from Paimon and demonstrates the full workflow of embedding and searching Paimon BLOB data, enabling accelerated queries on unstructured data.

Note

This feature is supported in Hologres V4.2 and later.

The BLOB type in Paimon can be written via Flink or Spark. This is not covered here.

Step 1: Prepare data

-- 0. Create an External Database to connect to the Paimon data source
CREATE EXTERNAL DATABASE paimon_blob_bj
WITH (
    catalog_type 'paimon'
    metastore_type 'dlf-rest'
    dlf_catalog 'paimon_bj'
    dlf_access_id 'xxxx'
    dlf_access_key '***'
);
-- Create an external schema
CREATE EXTERNAL SCHEMA paimon_blob_bj.test_paimon_db;

-- 1. Create a Paimon external table (with a BLOB column)
CREATE EXTERNAL TABLE paimon_blob_bj.test_paimon_db.image_table (
    id INT,
    image_name TEXT,
    picture BYTEA  -- Paimon BLOB is mapped to BYTEA
)
WITH (
   "table_format" = 'paimon',
    "file_format" = 'orc',
    "bucket" = '1',
    "bucket-key" = 'id'
);

-- 2. Create a Hologres target vector table (with a vector index)
CREATE TABLE public.image_vector_table (
    id SERIAL PRIMARY KEY,
    image_name TEXT,
    picture BYTEA,
    image_format TEXT,
    embedding float4[] CHECK (array_ndims(embedding) = 1 AND array_length(embedding, 1) = 2560)
) WITH (
    vectors = '{
        "embedding": {
            "algorithm": "HGraph",
            "distance_method": "Euclidean",
            "builder_params": {
                "base_quantization_type": "sq8_uniform",
                "max_degree": 64,
                "ef_construction": 400
            }
        }
    }'
);

Step 2: Load Paimon data into the Hologres vector table (optional)

This step is optional. You can import Paimon data into Hologres for accelerated queries, or query the Paimon table directly depending on your requirements.

-- 3. Read BLOB data from the Paimon external table, generate vectors, and write to Hologres
INSERT INTO image_vector_table (id, image_name, picture, image_format, embedding)
SELECT
    id,
    image_name,
    picture,
    'jpg',
    ai_embed('qwen3_vl_embedding', picture, 'jpg')
FROM paimon_blob_bj.test_paimon_db.image_table;

Step 3: Image similarity search

SELECT
    a.id as query_id,
    a.image_name as query_image,
    b.id as result_id,
    b.image_name as result_image,
    approx_euclidean_distance(a.embedding, b.embedding) as distance
FROM image_vector_table a,
     image_vector_table b
WHERE a.id = 1
ORDER BY distance ASC
LIMIT 5;

-- Sample result
--  query_id | query_image  | result_id | result_image | distance
-- ----------+--------------+-----------+--------------+----------
--         1 | concert.webp |         1 | concert.webp |        0
--         1 | concert.webp |         3 | dog.jpg      | 15.23457
--         1 | concert.webp |         2 | cat.png      | 18.45679

Step 4: Text-to-image search

SELECT
    id,
    image_name,
    approx_euclidean_distance(
        embedding,
        ai_embed('qwen3_vl_embedding', 'concert photo')
    ) as text_img_distance
FROM image_vector_table
ORDER BY text_img_distance ASC
LIMIT 5;

-- Sample result
--  id | image_name   | text_img_distance
-- ----+--------------+-------------------
--   1 | concert.webp |          8.123456
--   3 | dog.jpg      |         22.345678
--   2 | cat.png      |         25.678901

Considerations

  • Model selection

    • Image embedding requires a multimodal model such as qwen3_vl_embedding (2560 dimensions) or tongyi_embedding_vision_plus (1152 dimensions).

    • Text embedding uses a text-only model such as text_embedding_v4 (1024 dimensions).

  • Vector dimensions

    • Different models return vectors with different dimensions. The CHECK constraint in your table definition must match the model's output dimension.

    • qwen3_vl_embedding: 2560 dimensions.

    • tongyi_embedding_vision_plus: 1152 dimensions.

    • text_embedding_v4: 1024 dimensions.

  • Performance optimization

    • Create an HGraph vector index on the vector column for approximate nearest neighbor searches.

    • For large datasets, use INSERT...SELECT to build vectors offline.

  • Error handling

    • Returns NULL if binary_data is NULL.

    • An error is returned if file_format is not supported.

    • An error is returned if the image is corrupted or in an invalid format.