Configuration template: PAI Model Gallery

Model Gallery provides a rich library of models for various artificial intelligence (AI) scenarios, such as large language models (LLMs), artificial intelligence generated content (AIGC), computer vision (CV), natural language processing (NLP), and speech. It supports one-click model training, including hyperparameter configuration, as well as compression, evaluation, and deployment, so that you can quickly validate a model against your business requirements. This topic describes how to use a model deployed on Platform for AI (PAI) as a model inference service in Alibaba Cloud Elasticsearch (ES).

Prerequisites

An Alibaba Cloud ES instance of version 8.15 or later (kernel version 2.1.2 or later) is created.
The Platform for AI (PAI) service is activated. If you have not activated it, see Activate PAI and create a default workspace.

Step 1: Deploy a model in PAI

This topic uses the deployment of an embedding model as an example. On the PAI Model Gallery page, choose Scenario > Natural Language Processing > Embedding, and then select the bge-m3 general-purpose vector model. The steps for other models are similar.

Note
Deploy the service in the same region as your ES cluster and configure it to use the same virtual private cloud (VPC). This allows ES to access the deployed service over a private network, which provides lower latency and a more stable connection.

Click Deploy in the bottom-right corner of the bge-m3 general-purpose vector model to open the model deployment page.

Parameter	Description
Deployment Method	The deployment method. Two methods are supported: vLLM Accelerated Deployment and FlagEmbedding. Each method has its own set of supported instance types.
Basic Information	The name of the service. You can specify a custom name.
Resource Deployment	Deploy resources as needed.
Virtual Private Cloud (VPC)	The virtual private cloud. The VPC in the deployment configuration must be the same as the VPC of your ES cluster.
Service Features	Set the service features as needed.
Service Configuration	Configure the service as needed.

On the model deployment page, configure the parameters and click Deploy at the bottom of the page.
In the navigation pane on the left, click Model Deployment > Elastic Algorithm Service (EAS).

Note
You can call the model API only after the service status changes to Running.

Click the model name in the Name/ID column to go to the Overview tab of the model.
Click View Endpoint Information to view the URL and token for calling the service.
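
Before you register the service in ES, it can help to confirm that the endpoint responds. The following is a minimal Python sketch, assuming the service accepts the same request body and Bearer authorization header that the templates in Step 2 use, and that it returns embeddings under data[*].embedding. The function names are hypothetical, and the URL and token are whatever View Endpoint Information shows for your service.

```python
import json
import urllib.request


def build_embedding_request(url, token, texts):
    """Build (but do not send) a POST request that mirrors the Step 2 template."""
    body = json.dumps({"input": texts, "embedding_type": "dense"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Assumption: the token is passed as a Bearer token, as in Step 2.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json;charset=utf-8",
        },
    )


def fetch_embeddings(url, token, texts, timeout=10):
    """Send the request and return one embedding list per input text."""
    request = build_embedding_request(url, token, texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # Assumption: the response carries the vectors under data[*].embedding.
    return [item["embedding"] for item in payload["data"]]
```

Calling fetch_embeddings(url, token, ["hello", "world"]) from a machine in the same VPC should return two vectors if the service is Running. A network error or an HTTP 401 response at this stage points to a connectivity or token problem rather than an ES configuration problem.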

Step 2: Create a model inference service in Alibaba Cloud Elasticsearch for the PAI model

Note

You can run the following code in the Kibana console of your Alibaba Cloud ES instance to create the model inference service.

The procedure for each task type is as follows:

`text_embedding` type

Syntax template for model creation:

PUT _inference/text_embedding/pai_embedding
{
  "service":"alibaba-cloud-custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":"<Replace with your API key>"
    },
    "url":"<Replace with your service URL>",
    "path":{
      "<Replace with your service path>":{
        "POST":{
          "headers":{
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json;charset=utf-8"
          },
          "request":{
            "format":"string",
            "content":"""
            {
              "input":${input}, 
              "embedding_type":"dense"
            }
            """
          },
          "response":{
            "json_parser":{
              "text_embeddings":"$.data[*].embedding"
            }
          }
        }
      }
    }
  }
}

Example:

PUT _inference/text_embedding/pai_embedding
{
  "service":"alibaba-cloud-custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":"xxx"
    },
    "url":"http://xxx.cn-hangzhou.pai-eas.aliyuncs.com",
    "path":{
      "/":{
        "POST":{
          "headers":{
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json;charset=utf-8"
          },
          "request":{
            "format":"string",
            "content":"""
            {
              "input":${input}, 
              "embedding_type":"dense"
            }
            """
          },
          "response":{
            "json_parser":{
              "text_embeddings":"$.data[*].embedding"
            }
          }
        }
      }
    }
  }
}
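
If you create the inference service from a script instead of the Kibana console, the same request body can be assembled as plain JSON. The triple-quoted string is a Kibana console convenience, so outside the console the request content becomes an ordinary JSON string. The following is a minimal Python sketch of that body. The function name build_text_embedding_config is hypothetical, and the sketch assumes that the ${api_key} and ${input} placeholders are passed through literally for ES to resolve.

```python
import json


def build_text_embedding_config(url, api_key, path="/"):
    """Return the request body for PUT _inference/text_embedding/<name>."""
    # The ${...} placeholders are kept as literal text, not filled in here.
    content = '{"input":${input},"embedding_type":"dense"}'
    return {
        "service": "alibaba-cloud-custom-model",
        "service_settings": {
            "secret_parameters": {"api_key": api_key},
            "url": url,
            "path": {
                path: {
                    "POST": {
                        "headers": {
                            "Authorization": "Bearer ${api_key}",
                            "Content-Type": "application/json;charset=utf-8",
                        },
                        "request": {"format": "string", "content": content},
                        "response": {
                            "json_parser": {
                                "text_embeddings": "$.data[*].embedding"
                            }
                        },
                    }
                }
            },
        },
    }


# Same placeholder values as the example above.
config = build_text_embedding_config(
    "http://xxx.cn-hangzhou.pai-eas.aliyuncs.com", "xxx"
)
print(json.dumps(config, indent=2))
```

The printed JSON can then be sent as the body of the PUT request with any HTTP client that can reach the ES instance.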

Call the model:

POST _inference/text_embedding/pai_embedding
{
  "input":["hello", "world"]
}

Response:

{
  "text_embedding": [
    {
      "embedding": [
        -0.016567165,
        -0.015161497,
        ...
      ]
    },
    {
      "embedding": [
        -0.023222955,
        0.031465773,
        ...
      ]
    }
  ]
}
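
The json_parser setting "$.data[*].embedding" determines how the upstream response turns into the response shown above. The following Python sketch mimics that mapping on already-parsed JSON. It is an illustration under assumptions, not the ES implementation: the function names are hypothetical, and the sample upstream response contains toy vectors shaped only to match the JSONPath.

```python
def extract_text_embeddings(upstream):
    """Mimic the JSONPath $.data[*].embedding on a parsed upstream response."""
    return [item["embedding"] for item in upstream["data"]]


def to_inference_response(vectors):
    """Wrap the vectors in the shape of the Response example."""
    return {"text_embedding": [{"embedding": vector} for vector in vectors]}


def read_inference_response(es_response):
    """Pull the plain vectors back out of an ES inference response."""
    return [item["embedding"] for item in es_response["text_embedding"]]


# Hypothetical upstream response with toy two-dimensional vectors.
sample_upstream = {
    "data": [
        {"embedding": [0.1, 0.2]},
        {"embedding": [0.3, 0.4]},
    ]
}

vectors = extract_text_embeddings(sample_upstream)
es_response = to_inference_response(vectors)
print(read_inference_response(es_response))
```

Each entry in text_embedding corresponds, in order, to one string in the input array of the call.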

`completion` type

Syntax template for model creation:

PUT _inference/completion/pai_deepseek
{
  "service":"alibaba-cloud-custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":"<Replace with your API key>"
    },
    "url":"<Replace with your service URL>",
    "path":{
      "<Replace with your service path>":{
        "POST":{
          "headers":{
            "Authorization": "Bearer ${api_key}"
          },
          "request":{
            "format":"string",
            "content":"""
            {
              "prompt":"${prompt}",
              "max_tokens":"${max_tokens}"
            }
            """
          },
          "response":{
            "json_parser":{
              "completion_result":"$.choices[*].text"
            }
          }
        }
      }
    }
  },
  "task_settings":{
    "parameters":{
      "max_tokens":"300"
    }
  }
}
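
In this template, the max_tokens value under task_settings.parameters feeds the ${max_tokens} placeholder in the request content, and "$.choices[*].text" selects the generated text from the upstream response. The following Python sketch illustrates both steps. It assumes plain textual substitution of the placeholders, which this topic does not confirm, and it uses a hypothetical upstream response shaped only to match the JSONPath. The function names are also hypothetical.

```python
import json
from string import Template

# Request content copied from the completion template above.
CONTENT_TEMPLATE = Template('{"prompt":"${prompt}","max_tokens":"${max_tokens}"}')


def render_request(prompt, parameters):
    """Fill the placeholders by plain text substitution (an assumption)."""
    return CONTENT_TEMPLATE.substitute(prompt=prompt, **parameters)


def extract_completions(upstream):
    """Mimic the JSONPath $.choices[*].text on a parsed upstream response."""
    return [choice["text"] for choice in upstream["choices"]]


# Parameters as configured under task_settings.parameters.
body = render_request("Say hello", {"max_tokens": "300"})
print(json.loads(body))

# Hypothetical upstream response, not real model output.
sample_upstream = {"choices": [{"text": "Hello!"}]}
print(extract_completions(sample_upstream))
```

Under this assumption, max_tokens reaches the service as the quoted string "300" because the placeholder sits inside quotation marks in the template. If your deployed model requires a number, check how the service handles that field.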