AI functions

更新时间:
复制 MD 格式

Hologres V3.2+ provides built-in AI functions for embedding, reranking, LLM inference, and more. Call them with standard SQL — each function automatically invokes a deployed model. Deploy models from Model Studio or a Hologres AI node (requires GPU resources).

Prerequisites

  • A model must be deployed in Hologres from Model Studio or a Hologres AI node. To use a Hologres AI node, first purchase AI resources, then deploy AI models.

  • Each AI function has recommended models listed in the summary table and the AI functions and models section below.

Limitations

  • AI functions require Hologres V3.2 or later.

  • Each AI function has unique limitations, described in its respective documentation.

AI functions

Hologres supports the following AI functions.

  • Hologres automatically maps each AI function to an optimal deployed model. View or change these mappings in system tables. Modify the model mapped to an AI function.

  • If multiple models are deployed, specify the model name when calling a function to avoid ambiguity.

  • Each model requires different AI node resources. Select a suitable plan from AI resource pricing and purchasing.

Function

Description

Supported models

Supported versions

ai_gen

Uses an LLM to perform inference on text or images based on a prompt. After the relevant generative models are deployed, it supports multimodal generation tasks such as text-to-image, image editing, multi-image fusion, text-to-video, and video generation from images (using a first frame, first and last frames, or a reference image).

Model Studio models: Qwen series models, and the qwen-image and wan series for intelligent content generation.

Built-in AI node models:

  • Text: Qwen3 series LLMs (recommended: Qwen/Qwen3-32B).

  • Image/video/audio: Qwen-wl-embedding series and qwen-vl series.

For multimodal generation model names and task types, see Image generation/editing and Video generation in Model Studio models.

  • Text inference: Hologres V3.2 and later

  • Image inference: Hologres V4.0 and later

to_file

Converts a URL to a FILE type.

No model required. This function is typically used with Object Tables.

Hologres V4.0 and later

prompt

Assembles a prompt for an LLM and packages multimodal prompts.

No model required.

ai_parse_document

Parses unstructured data, such as PDF files and images, into text.

ds4sd/docling-models.

ai_embed

Computes a fixed-dimensional dense vector for input text or images.

Model Studio models: text-embedding series, tongyi-embedding series, and qwen3-vl-embedding.

Built-in AI node models:

  • Text: iic/nlp_gte_sentence-embedding_chinese series and Qwen/Qwen3-Embedding-XB series.

  • Image: clip-ViT-B series.

  • Text vector: Hologres V3.2 and later

  • Image vector: Hologres V4.0 and later

ai_rank

Scores the relevance of the input text.

Qwen3 series LLMs (recommended: Qwen/Qwen3-32B).

Hologres V3.2.2 and later

ai_chunk

Performs text chunking.

recursive-character-text-splitter.

ai_classify

Classifies input text based on the category labels you provide.

Qwen3 series LLMs (recommended: Qwen/Qwen3-32B).

Hologres V3.2 and later

ai_extract

Extracts specified label information from the input text.

Qwen3 series LLMs (recommended: Qwen/Qwen3-32B).

ai_mask

Masks specified tag information from the input text by replacing it with the [MASKED] placeholder.

Qwen3 series LLMs (recommended: Qwen/Qwen3-32B).

ai_fix_grammar

Corrects grammar errors in the input text.

Qwen3 series LLMs (recommended: Qwen/Qwen3-32B).

ai_summarize

Generates a summary of input text.

Qwen3 series LLMs (recommended: Qwen/Qwen3-32B).

ai_translate

Translates the input text into a specified language.

Model Studio models: qwen-mt translation model series.

Built-in AI node models: Qwen3 series LLMs (recommended: Qwen/Qwen3-32B).

ai_similarity

Calculates the similarity between two text inputs.

Qwen3 series LLMs (recommended: Qwen/Qwen3-32B).

ai_analyze_sentiment

Analyzes the sentiment of the input text.

  • Qwen3 series LLMs (recommended: Qwen/Qwen3-32B).

  • iic/nlp_structbert_sentiment-classification_chinese-base model.

AI function reference

ai_gen

  • Description: Calls an LLM to perform inference on text or images based on a prompt. With a deployed multimodal generation model, ai_gen also supports text-to-image, image editing, multi-image blending, text-to-video, and video generation from a start frame, start/end frames, or a reference image. Model names and parameters are listed in Model Studio models.

    -- Text inference
    SELECT ai_gen([model,] text)
    -- Image inference
    SELECT ai_gen([model,] text, file)
    -- Prompt object input
    SELECT ai_gen([model,] prompt)
    -- Multimodal generation (Example: Specify a model and JSON prompt; 'file' is used for OSS authentication)
    SELECT ai_gen('<model_name>', <prompt_json>::text, to_file('oss://...','oss-xxx-internal.aliyuncs.com','acs:ram::...'));
  • Parameters

    • model: (Optional) The model name. Defaults to the optimal deployed model. To switch, deploy the target model first. Update the model for an AI function.

    • text: (Required) The input prompt. This parameter supports CHAR, VARCHAR, and TEXT types.

    • file: (Required) A FILE type value, such as an image from an Object Table converted to the FILE type. This parameter is supported only in Hologres V4.0 and later.

    • prompt: (Required) A JSON type value returned by the prompt() function.

  • Return value

    • Returns the model's response.

    • If the text parameter is NULL, returns NULL.

    • If the text parameter is an empty string (""), returns an empty string ("").

    • If the prompt parameter is NULL, the function reports an error.

  • Examples

    • Text inference

      CREATE TABLE questions (
          question TEXT
      );
      
      INSERT INTO questions (question) VALUES
        ('What is artificial intelligence?'),
        ('How can I improve my English speaking skills?'),
        ('What are the key points of a healthy diet?');
      
      SELECT
        question,
        ai_gen('Answer the following question in 20 words: ' || question) AS answer
      FROM
        questions;

      The following result is returned.

                     question                |                                             answer
      -----------------------------------------+-------------------------------------------------------------------------------------------------
       How can I improve my English speaking skills?   | Practice often, mimic pronunciation, build vocabulary, and speak up.
       What are the key points of a healthy diet?      | Eat balanced meals, limit oil, salt, and sugar, eat more fruits and vegetables, and keep regular mealtimes.
       What is artificial intelligence?        | Artificial intelligence is a computer system that simulates human intelligence to learn, reason, perceive, and solve problems.
    • Image inference

      SELECT  ai_gen('jpg_llm','What is in this image?', to_file('oss://****/bd****k/val/images/b9b53753-91a5****.jpg','oss-cn-hangzhou-internal.aliyuncs.com','acs:ram::****' ) )

      The following result is returned.

      ai_gen
      -----
      This image shows a city street scene. A white car with the license plate BTB-9784 is parked on the side of the road. There are several cars on the street, including a yellow taxi. In the background, there are buildings and trees. The weather appears to be overcast, possibly a rainy day. There are also pedestrians and traffic lights on the street.
    • Multimodal generation (images and videos)

      Set prompt to a JSON string that contains business parameters. Pass a to_file function as the third argument to complete OSS authentication. You must pass a valid FILE object to pass the permission check, even when the function does not need to read image or video content. The result is typically a JSON text, with fields such as image_urls and video_url that vary by model. If you configure the output_dir parameter to write results to your OSS bucket, you may incur public network transfer fees. We recommend that you carefully evaluate the potential costs.

      Text-to-image (example)

      SELECT ai_gen(
        'qwen_image_2_pro',
        json_build_object(
          'prompt', 'Example: Generate a product promotion image',
          'parameters', json_build_object('n', 1, 'size', '1024*1024', 'watermark', false)
        )::text,
        to_file('oss://your-bucket/path/placeholder.png', 'oss-cn-hangzhou-internal.aliyuncs.com', 'acs:ram::your-account-id:role/your-role')
      );

      Text-to-video (example)

      SELECT ai_gen(
        'wan26_t2v',
        json_build_object(
          'prompt', 'Example: Show a game character entrance in 10 seconds',
          'parameters', json_build_object(
            'size', '1280*720',
            'prompt_extend', true,
            'duration', 5,
            'shot_type', 'multi'
          )
        )::text,
        to_file('oss://your-bucket/path/placeholder.png', 'oss-cn-hangzhou-internal.aliyuncs.com', 'acs:ram::your-account-id:role/your-role')
      );

      Quick reference for scenarios and models (Actual deployment names are shown in the console. This table is for reference only.)

      Scenario

      Example model names

      Text-to-image

      qwen-image-2.0-pro, qwen-image-max, qwen-image

      Image editing / multi-image blending

      qwen-image-edit, qwen-image-2.0-pro

      Text-to-video

      wan2.6-t2v

      Video generation from a start frame, start/end frames, or a reference image

      wan2.6-i2v-flash, wan2.2-kf2v-flash, wan2.6-r2v

      For a complete end-to-end pipeline example of game ad video generation, see Best practice: Generate game ad videos with an AI function.

to_file

  • Description: Converts a URL to the FILE type.

    select to_file(oss_url, oss_endpoint, oss_rolearn);
  • Usage notes

    No model required.

  • Parameters

    • oss_url: (Required) TEXT. The path of the OSS file to parse.

    • oss_endpoint: (Required) TEXT. The OSS region endpoint. Only classic network domain names are supported.

    • oss_rolearn: (Required) The RAM role ARN for accessing OSS.

  • Return value

    Returns a FILE type value. If the URL path is invalid or the file does not exist, the function reports an error.

  • Example

    select to_file('oss://****/bd****k/val/images/b9b53753-91a5****.jpg','oss-cn-hangzhou-internal.aliyuncs.com','acs:ram::****' );

prompt

  • Description: Assembles prompts for LLMs, including multimodal prompts.

    SELECT prompt('<template_string>', <expr_1> [ , <expr_2>, ... ])
        FROM <table>;
  • Usage notes

    The prompt() function does not support scalar string input. If you have only one string, pass it directly to the LLM function instead of using prompt(). This function is typically used when querying from a table.

  • Parameters

    • template_string: (Required) TEXT. The prompt template string. Use placeholders such as {0} and {1} for variables.

    • <expr_1> [ , <expr_2>, ... ]: One or more expression parameters. This parameter supports TEXT, NUMERIC, and FILE types.

  • Return value

    In most cases, this function returns a JSON type value in the following format:

    {
      "prompt": "<template_string>",
      "args": ARRAY(<value_1>, <value_2>, ...)
    }

    Special cases:

    • If the template_string parameter is NULL, the function reports an error.

    • If an expression evaluates to NULL, the corresponding placeholder in template_string is replaced with the string 'None'.

    • If all expressions in a row evaluate to NULL, the row is not filtered. Instead, all elements in the args array are populated with 'None'.

  • Example

    create table customer_service_konwledge_detail(
      question text, 
      question_summarize text
    );
    
    insert into customer_service_konwledge_detail values
    ('The instance is suddenly experiencing many OOM SQL errors.', 'A backend investigation revealed that an increase in customer traffic exceeded available resources. The customer resolved the issue by scaling up.'),
    ('DataWorks cannot connect to Hologres.', 'This is not our issue. Please contact the DataWorks on-call team.');
    
    -- prompt
    SELECT question, question_summarize, ai_gen(
      prompt('Customer ticket content: {0}, Support response: {1}. Did the response resolve the customer issue? Reply with yes or no only.', question, question_summarize)) 
          from customer_service_konwledge_detail;
         

    The following result is returned.

              question           |                                 question_summarize                                 | ai_gen 
    -----------------------------+------------------------------------------------------------------------------------+--------
     DataWorks cannot connect to Hologres | This is not our issue. Please contact the DataWorks on-call team.                                              | no
     The instance is suddenly experiencing many OOM SQL errors    | A backend investigation revealed that an increase in customer traffic exceeded available resources. The customer resolved the issue by scaling up. | yes
    (2 rows)

ai_parse_document

  • Description: Parses unstructured data (PDF, images, Word, PPT, TXT, Markdown) into text.

    SELECT ai_parse_document([model,] input_bytes , input_format [, output_format]);
    SELECT ai_parse_document([model,] file [, input_format, output_format]);
  • Parameters

    Parameter

    Description

    model

    (Optional) The model name. Defaults to the optimal deployed model. To switch, deploy the target model first. Update the model for an AI function.

    input_bytes

    (Required) BYTEA. The binary content of the file to parse.

    file

    (Required) FILE. We recommend that you use this function with an Object Table.

    input_format

    Optional. The default value is auto. The type is TEXT. Supported file formats include PDF, Word, PPT, TXT, IMAGE, and AUTO.

    • Supported image formats: ["jpg", "jpeg", "png", "tif", "tiff", "bmp"].

    • If an OSS directory contains documents of multiple types, you can set input_format=auto. The model automatically determines the file type based on the file extension.

    output_format

    (Optional) TEXT. The format of the parsed result. Supported formats include JSON and markdown. Default value: JSON.

  • Return value

    Returns a TEXT type value. The format is determined by the output_format setting.

    • If output_format=JSON, the output is a TEXT type JSON string. You must explicitly cast it to JSON for use.

    • If parsing fails, the function returns an error description as text instead of reporting an error.

  • Examples

    • Convert a single PDF file in OSS to text.

      SELECT object_uri, etag, ai_parse_document(to_file ('oss://xxxx-hangzhou/bs_challenge_financial_14b_dataset/pdf', 'oss-cn-hangzhou-internal.aliyuncs.com', 'acs:ram::18xxx:role/xxx'), 'auto', 'markdown') AS doc 
          FROM pdf_bs_challenge_financial_14b_dataset limit 1);
    • To convert unstructured data from an Object Table to text, see Unstructured data (Object Table).

ai_embed

  • Description: Computes a fixed-dimensional continuous vector for input text or an image.

    -- Compute text vector
    select ai_embed([model,] content);
    -- Compute image vector
    select ai_embed([model,] file);
  • Parameters

    • model: (Optional) The model name. Defaults to the optimal deployed model. To switch, deploy the target model first. Update the model for an AI function.

    • content: (Required) The input text. This parameter supports CHAR, VARCHAR, and TEXT types.

    • file: (Required) A FILE type value, which is typically the FILE object of an image. This type is supported only in Hologres V4.0 and later.

  • Return value

  • Examples

    • Text embedding

      SELECT ai_embed('Hologres is a one-stop real-time data warehouse engine self-developed by Alibaba, supporting real-time ingestion, updates, processing, and analysis of massive data.');

      The following result is returned.

      ai_embed
      -------
      {-0.020090256, -0.009496426, -0.01584659, ..., -0.057956327}
    • Image embedding

      -- The following example shows how to embed an image from OSS.
      SELECT ai_embed('clip-ViT-B-32', to_file('oss://****', 'oss-cn-hangzhou-internal.aliyuncs.com', 'acs:ram::****'));

ai_rank

  • Description: Calculates the relevance score between two texts.

    SELECT ai_rank([model,] source_sentence, sentence_to_compare);
  • Parameters

    • model: (Optional) The model name. Defaults to the optimal deployed model. To switch, deploy the target model first. Update the model for an AI function.

    • source_sentence: (Required) The source text. This parameter supports CHAR, VARCHAR, and TEXT types.

    • sentence_to_compare: (Required) The text to compare with the source_sentence. This parameter supports CHAR, VARCHAR, and TEXT types.

  • Return value

    • Returns a FLOAT type relevance score in the range [0, 1]. A higher value indicates higher relevance.

    • If either the source_sentence or sentence_to_compare parameter is NULL, returns 0.

  • Example

    SELECT knowledge, ai_rank('What was the revenue of Alibaba in 2024?', knowledge) AS score
      FROM (
          VALUES ('Amazon revenue in 2024 was 638 billion USD'), 
                 ('Alibaba revenue in 2024 was 941.168 billion CNY'), 
                 ('Alibaba revenue in 2023 was 868.687 billion CNY')
      ) AS knowledge_table(knowledge)
      ORDER BY score DESC;

    The following result is returned.

    knowledge	                  | score
    -----------------------------|-------
    Alibaba revenue in 2024 was 941.168 billion CNY	 |0.899999976
    Alibaba revenue in 2023 was 868.687 billion CNY |0.200000003
    Amazon revenue in 2024 was 638 billion USD	   |0.100000001

ai_chunk

  • Description: Splits a long text into chunks.

    SELECT ai_chunk([model,] long_sentence[, chunk_size, chunk_overlap, separators])
  • Parameters

    Parameter

    Description

    model

    (Optional) The model name. Defaults to the optimal deployed model. To switch, deploy the target model first. Update the model for an AI function.

    long_sentence

    (Required) The source text to be chunked. This parameter supports CHAR, VARCHAR, and TEXT types.

    chunk_size

    (Optional) INT. The maximum number of characters in each chunk. Default value: 300.

    chunk_overlap

    (Optional) INT. Overlap between adjacent chunks to preserve semantic continuity. Default: 50.

    separators

    Optional. The separators used to split text into chunks. This parameter is of the TEXT[] type. The default value is ["\n\n", "\n", " ", ""], which is suitable for English text. For Chinese text, we recommend using Chinese separators, such as ["\n\n", "\n", "。", "!", "?", ";", ",", " "].

  • Return value

    • Returns a TEXT[] value, which is the list of chunks.

    • If the long_sentence parameter is NULL, returns NULL.

  • Example

    SELECT ai_chunk('Hologres is a one-stop real-time data warehouse engine (Real-Time Data Warehouse) self-developed by Alibaba. It supports real-time ingestion, updating, processing, and analysis of massive data, standard SQL (compatible with PostgreSQL protocol and syntax, supporting most PostgreSQL functions), PB-level multidimensional analysis (OLAP) and ad-hoc analysis (Ad Hoc), high-concurrency and low-latency online data services (Serving), and fine-grained isolation for multiple workloads and enterprise-level security capabilities. It is deeply integrated with MaxCompute, Flink, and DataWorks to provide an enterprise-level full-stack data warehouse solution for online and offline integration.',40,10);

    The following result is returned.

    ai_chunk
    ---
    "{\"Hologres is a one-stop real-time data\",\"warehouse engine (Real-Time Data Wareho\",\"use) self-developed by Alibaba. It suppo\",\"orts real-time ingestion, updating, proce\",\"ssing, and analysis of massive data, stand\",\"ard SQL (compatible with PostgreSQL protoc\",\"ol and syntax, supporting most PostgreSQL\",\" functions), PB-level multidimensional an\",\"alysis (OLAP) and ad-hoc analysis (Ad Ho\",\"c), high-concurrency and low-latency onli\",\"ne data services (Serving), and fine-grai\",\"ned isolation for multiple workloads and e\",\"nterprise-level security capabilities. It\",\" is deeply integrated with MaxCompute, Fl\",\"ink, and DataWorks to provide an enterpris\",\"e-level full-stack data warehouse solutio\",\"n for online and offline integration.\"}"

ai_classify

  • Description: Classifies input text based on a provided list of category labels.

    SELECT ai_classify([model,] content, labels)
  • Parameters

    • model: (Optional) The model name. Defaults to the optimal deployed model. To switch, deploy the target model first. Update the model for an AI function.

    • content: (Required) The text to be classified. This parameter supports CHAR, VARCHAR, and TEXT types.

    • labels: (Required) ARRAY. A list of expected output category labels. The number of labels must be between 2 and 20.

  • Return value

    • Returns the matched category label. The return type is TEXT.

    • If the content parameter is NULL, returns NULL.

    • If the content parameter is an empty string (""), returns NULL.

    • If the number of labels is outside the valid range, the function reports an error.

  • Example

    CREATE TABLE product_detail(
        product_name TEXT,
        product_desc TEXT
    );
    INSERT INTO product_detail VALUES
    ('iphone','Apple phone'),
    ('p50','Huawei phone'),
    ('x200','vivo phone'),
    ('aaa','Dior dress'),
    ('bbb','Dior pants'),
    ('Sheng Sheng Oolong','Cha Yan Yue Se milk tea'),
    ('sandwich cookie','Oreo cookie');
    
    --Classify text using ai_classify
    SELECT
        product_name,
        ai_classify(product_desc, ARRAY['Electronics', 'Apparel', 'Food & Beverage']) AS category
      FROM
         product_detail
      LIMIT 10;

    The following result is returned.

    product_name	|category
    --------------|------
    aaa	          |Apparel
    iphone	      |Electronics
    Sheng Sheng Oolong	      |Food & Beverage
    p50	          |Electronics
    x200	        |Electronics
    bbb	          |Apparel
    sandwich cookie	      |Food & Beverage

ai_extract

  • Description: Extracts specified information from input text and returns the results as a JSON object.

    SELECT ai_extract([model,] content, labels)
  • Parameters

    • model: (Optional) The model name. Defaults to the optimal deployed model. To switch, deploy the target model first. Update the model for an AI function.

    • content: (Required) The input text from which to extract information. This parameter supports CHAR, VARCHAR, and TEXT types.

    • labels: (Required) ARRAY. The labels for the information to extract. The number of labels must be between 1 and 20.

  • Return value

    • Returns the extracted information for each label in JSON format.

    • If the content parameter is NULL or an empty string (""), returns NULL.

    • If the number of labels is outside the valid range, the function reports an error.

  • Example

    CREATE TABLE users (
      user_id TEXT,
      resume TEXT
    );
    
    INSERT INTO users (user_id, resume) VALUES
      ('u001', 'Name: Zhang San, Male, 28 years old. Email: zhangsan@example.com, Phone: 1380013****. Extensive work experience.'),
      ('u002', 'Name: Li Si, Female, 35 years old. Phone: 1390013****, Email: lisi@example.com. Has management experience.'),
      ('u003', 'Name: Wang Wu, Male, 25 years old. Email: wangwu@example.com. Phone: 1370013****.');
    
    SELECT
      user_id,
      ai_extract(resume, ARRAY['name','email','phone','gender','age']) AS user_desc_obj
    FROM
      users;

    The following result is returned.

    user_id	|user_desc_obj
    --------|-------------
    u002	  |"{"name":"Li Si","age":"35 years old","gender":"Female","phone":"1390013****","email":"lisi@example.com"}"
    u003	  |"{"name":"Wang Wu","age":"25 years old","gender":"Male","phone":"1370013****","email":"wangwu@example.com"}"
    u001	  |"{"name":"Zhang San","age":"28 years old","gender":"Male","phone":"1380013****","email":"zhangsan@example.com"}"

ai_mask

  • Description: Masks specified information in the input text, replacing matches with the [MASKED] placeholder.

    SELECT ai_mask([model,] content, labels)
  • Parameters:

    • model: (Optional) The model name. Defaults to the optimal deployed model. To switch, deploy the target model first. Update the model for an AI function.

    • content: (Required) The input text to be masked. This parameter supports CHAR, VARCHAR, and TEXT types.

    • labels: (Required) ARRAY. The labels for the information to be masked. The number of labels must be between 1 and 20.

  • Return value

    • Returns the masked text content.

    • If the content parameter is NULL, returns NULL.

    • If the content parameter is an empty string (""), returns an empty string.

    • If the number of labels is outside the valid range, the function reports an error.

  • Example

    SELECT ai_mask(
      'User Wang Xiaoming, ID number: 23030611111111, phone number: 1388888****.',
      ARRAY['ID number', 'phone number']); 

    The following result is returned.

    ai_mask
    -------
    User Wang Xiaoming, ID number: [MASKED], phone number: [MASKED].

ai_fix_grammar

  • Description: Corrects grammatical errors in the input text.

    SELECT ai_fix_grammar([model,] content)
  • Parameters

    • model: (Optional) The model name. Defaults to the optimal deployed model. To switch, deploy the target model first. Update the model for an AI function.

    • content: (Required) The input text to correct. This parameter supports CHAR, VARCHAR, and TEXT types.

  • Return value

    • Returns the corrected text content.

    • If the content parameter is NULL, returns NULL.

    • If the content parameter is an empty string (""), returns an empty string ("").

  • Example

    SELECT ai_fix_grammar('He dont know what to did.');

    The following result is returned.

    ai_fix_grammar
    --------------
    He doesn't know what to do.

ai_summarize

  • Description: Generates a summary of the input text.

    SELECT ai_summarize([model,] content[, max_words])
  • Parameters

    • model: (Optional) The model name. Defaults to the optimal deployed model. To switch, deploy the target model first. Update the model for an AI function.

    • content: (Required) The input text to be summarized. This parameter supports CHAR, VARCHAR, and TEXT types.

    • max_words: (Optional) The target number of words for the summary. The model attempts to generate a result that is close to this value. The default value is 50. If you set this parameter to 0, no limit is applied.

  • Return value

    • Returns a summary of the text.

    • If the content parameter is NULL, returns NULL.

    • If the content parameter is an empty string (""), returns an empty string ("").

    • If the value of max_words is less than 0, the function reports an error.

  • Example

    SELECT ai_summarize('Hologres is a one-stop real-time data warehouse engine (Real-Time Data Warehouse) self-developed by Alibaba. It supports real-time ingestion, updating, processing, and analysis of massive data, standard SQL (compatible with PostgreSQL protocol and syntax, supporting most PostgreSQL functions), PB-level multidimensional analysis (OLAP) and ad-hoc analysis (Ad Hoc), high-concurrency and low-latency online data services (Serving), and fine-grained isolation for multiple workloads and enterprise-level security capabilities. It is deeply integrated with MaxCompute, Flink, and DataWorks to provide an enterprise-level full-stack data warehouse solution for online and offline integration.', 15);

    The following result is returned.

    ai_summarize
    ------------
    Hologres is Alibaba's self-developed real-time data warehouse engine, supporting real-time processing and multidimensional analysis of massive data.

ai_translate

  • Description: Translates input text into a specified language.

    SELECT ai_translate([model,] content, to_lang)
  • Parameters

    • model: (Optional) The model name. Defaults to the optimal deployed model. To switch, deploy the target model first. Update the model for an AI function.

    • content: (Required) The input text to be translated. This parameter supports CHAR, VARCHAR, and TEXT types.

    • to_lang: (Required) The target language code per ISO 639-1.

  • Return value

    • Returns the translated text.

    • If the content parameter is NULL, returns NULL.

    • If the content parameter is an empty string (""), returns an empty string ("").

    • If the to_lang parameter value is invalid, the function reports an error.

  • Example

    SELECT ai_translate('Hologres是阿里巴巴自主研发的一站式实时数仓引擎(Real-Time Data Warehouse),支持海量数据实时写入、实时更新、实时加工、实时分析,支持标准SQL(兼容PostgreSQL协议和语法,支持大部分PostgreSQL函数),支持PB级数据多维分析(OLAP)与即席分析(Ad Hoc),支持高并发低延迟的在线数据服务(Serving),支持多种负载的细粒度隔离与企业级安全能力,与MaxCompute、Flink、DataWorks深度融合,提供企业级离在线一体化全栈数仓解决方案。', 'en');

    The following result is returned.

    ai_translate
    -----------
    Hologres is a one-stop real-time data warehouse engine self-developed by Alibaba. It supports real-time writing, updating, processing, and analysis of massive data, standard SQL (compatible with PostgreSQL protocol and syntax, and supports most PostgreSQL functions), PB-level multidimensional analysis (OLAP) and ad-hoc analysis (Ad Hoc), high-concurrency and low-latency online data services (Serving), and fine-grained isolation for multiple workloads and enterprise-level security capabilities. It is deeply integrated with MaxCompute, Flink, and DataWorks, providing an enterprise-level full-stack data warehouse solution that supports online and offline integration.

ai_similarity

  • Description: Computes the similarity score between two text inputs.

    SELECT ai_similarity([model,] text1, text2)
  • Parameters

    • model: (Optional) The model name. Defaults to the optimal deployed model. To switch, deploy the target model first. Update the model for an AI function.

    • text1 and text2: (Required) The two texts to be compared for similarity. These parameters support CHAR, VARCHAR, and TEXT types.

  • Return value

    • Returns a value of the FLOAT type in the range [0, 1], where a larger value indicates higher similarity. A value of 0 indicates no similarity, and a value of 1 indicates that the two texts are identical.

    • If either the text1 or text2 parameter is NULL, returns 0.

    • If both the text1 and text2 parameters are empty strings (""), returns 1.

    • If one parameter is an empty string ("") and the other is a non-empty string, returns 0.

  • Example

    CREATE TABLE products2 (
        product_name TEXT
    );
    
    INSERT INTO products2 (product_name) VALUES
      ('white shirt'), ('black suit pants'), ('casual top'), ('sports jacket'), ('white dress'),
      ('bluetooth headset'), ('milk chocolate'), ('white top'), ('men''s t-shirt'), ('down jacket');
    
    SELECT product_name FROM products2 
      ORDER BY ai_similarity(product_name, 'white top') DESC LIMIT 5;
    

    The following result is returned.

    product_name
    ----------
    white top
    white shirt
    casual top
    white dress
    men's t-shirt

ai_analyze_sentiment

  • Description: Performs sentiment analysis on the input text.

    select ai_analyze_sentiment([model,] content);
  • Parameters

    • model: (Optional) The model name. Defaults to the optimal deployed model. To switch, deploy the target model first. Update the model for an AI function.

    • content: (Required) The input text to be analyzed. This parameter supports CHAR, VARCHAR, and TEXT types.

  • Return value

    • Returns the resulting sentiment label as a TEXT value. The returned labels vary by model.

      • Qwen3 series LLMs return one of the following labels: positive, negative, neutral, or mixed. If the input is empty, NULL is returned.

      • The iic/nlp_structbert_sentiment-classification_chinese-base model: Returns the label with the highest probability, which can be positive, negative, or NULL. If the input is empty, it returns NULL.

    • If the content parameter is NULL or an empty string (""), returns NULL.

  • Examples

    -- Using an LLM:
    SELECT ai_analyze_sentiment('A joyful night of celebration, a time of great success.');
    -- Example output: positive
    
    -- Using the iic/nlp_structbert_sentiment-classification_chinese-base model:
    SELECT ai_analyze_sentiment('A drizzling rain falls on the day of mourning; the traveler on the road is heartbroken.');
    -- Example output: negative

AI functions and models

Mapping between AI functions and models

The list_ai_function_infos system table shows the mapping between AI functions and models. After you deploy a model in the Hologres console, Hologres automatically updates this system table to map each AI function to the deployed model. You can then call the AI function to invoke the corresponding model.

Note

Different AI functions require specific model types. For example, ai_embed requires an embedding model, and ai_classify requires an LLM. If only one model type is deployed, some functions may lack an assigned model and cannot be used.

SELECT * FROM list_ai_function_infos();

The following result is returned.

    function_name     |    model_name    
----------------------+------------------
 ai_embed             | my_gte_embedding
 ai_classify          | my_qwen32b
 ai_extract           | my_qwen32b

Update the model mapping

Hologres maps AI functions to deployed models by default. Update this mapping with the following methods.

  • Global modification

    SELECT set_ai_function_info('<function_name>', '<model_name>');

    Parameters

    • function_name: The name of the AI function. You can find function names in the AI function summary.

    • model_name: The name of a deployed model. You can view the names of deployed models on the AI Node page in the Hologres console.

    Note

    The system returns an error if the specified AI function or model name does not exist.

    Example

    SELECT set_ai_function_info('ai_embed', 'my_gte_embedding');
  • Session-level modification

    A session-level configuration takes precedence over the global configuration (ai_function_info).

    --This setting applies only to the current connection.
    SET hg_experimental_ai_function_name_to_model_name_mapping='<function_name>:<model_name>[,<function_name1>:<model_name1>]';

Best practices

Explore these best practices for solving complex business problems with AI functions.