Model Gallery provides a library of models for artificial intelligence (AI) scenarios such as large language models (LLMs), artificial intelligence generated content (AIGC), computer vision (CV), natural language processing (NLP), and speech. It supports one-click model training with hyperparameter configuration, as well as model compression, evaluation, and deployment, so that you can quickly validate your business requirements. This topic describes how to use a model deployed on Platform for AI (PAI) as a model inference service in Alibaba Cloud Elasticsearch (ES).
Prerequisites
An Alibaba Cloud ES instance of version 8.15 or later (kernel version 2.1.2 or later) is created.
The Platform for AI (PAI) service is activated. If you have not activated it, see Activate PAI and create a default workspace.
Step 1: Deploy a model in PAI
This topic uses the deployment of an embedding model as an example. Go to the PAI Model Gallery page, click , and then select the bge-m3 general-purpose vector model. The steps for other models are similar.
Note: Deploy the service in the same region as your ES cluster and configure it to use the same virtual private cloud (VPC). This allows ES to access the deployed service over a private network, which provides lower latency and a more stable connection.
Click Deploy in the bottom-right corner of the bge-m3 general-purpose vector model to open the model deployment page.
Parameter | Description
Deployment Method | The deployment method. Two methods are supported: vLLM Accelerated Deployment and FlagEmbedding, along with their corresponding instance types.
Basic Information | The name of the service. You can specify a custom name.
Resource Deployment | Deploy resources as needed.
Virtual Private Cloud (VPC) | The virtual private cloud. The VPC in the deployment configuration must be the same as the VPC of your ES cluster.
Service Features | Set the service features as needed.
Service Configuration | Configure the service as needed.
On the model deployment page, configure the parameters and click Deploy at the bottom of the page.
In the navigation pane on the left, click .
Note: You can call the model API when the service status is Running.
Click the model name in the Name/ID column to go to the Overview tab of the model.
Click View Endpoint Information to view the URL and token for calling the service.
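Before you register the service in ES, it can help to confirm that the URL and token work. The following Python sketch builds such a request with the standard library only. It assumes the request body and response shape that Step 2 of this topic uses for the embedding model (an `input` array plus `embedding_type`, and embeddings under `data[*].embedding`). Your deployment may expect a different body or path, and the function names are hypothetical helpers, not part of any PAI SDK.

```python
import json
import urllib.request


def build_embedding_request(url: str, token: str, texts: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request for the deployed embedding service.

    Assumption: the service accepts the same JSON body that Step 2 configures.
    """
    body = json.dumps({"input": texts, "embedding_type": "dense"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json;charset=utf-8",
        },
    )


def extract_embeddings(response_body: dict) -> list[list[float]]:
    """Mirror the JSONPath $.data[*].embedding: one vector per input text."""
    return [item["embedding"] for item in response_body["data"]]


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (requires network access to the VPC)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

To try it from a machine inside the same VPC, pass the URL and token shown in View Endpoint Information to `build_embedding_request`, call `send`, and check that `extract_embeddings` returns one vector per input text.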
Step 2: Create a model inference service in Alibaba Cloud Elasticsearch for the PAI model
You can run the following code in the Kibana console of your Alibaba Cloud ES instance to create the model inference service.
The methods for each type are as follows:
text_embedding type
Syntax template for model creation:
PUT _inference/text_embedding/pai_embedding
{
  "service": "alibaba-cloud-custom-model",
  "service_settings": {
    "secret_parameters": {
      "api_key": "<Replace with your API key>"
    },
    "url": "<Replace with your service URL>",
    "path": {
      "<Replace with your service path>": {
        "POST": {
          "headers": {
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json;charset=utf-8"
          },
          "request": {
            "format": "string",
            "content": """
              {
                "input":${input},
                "embedding_type":"dense"
              }
              """
          },
          "response": {
            "json_parser": {
              "text_embeddings": "$.data[*].embedding"
            }
          }
        }
      }
    }
  }
}
Example:
PUT _inference/text_embedding/pai_embedding
{
  "service": "alibaba-cloud-custom-model",
  "service_settings": {
    "secret_parameters": {
      "api_key": "xxx"
    },
    "url": "http://xxx.cn-hangzhou.pai-eas.aliyuncs.com",
    "path": {
      "/": {
        "POST": {
          "headers": {
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json;charset=utf-8"
          },
          "request": {
            "format": "string",
            "content": """
              {
                "input":${input},
                "embedding_type":"dense"
              }
              """
          },
          "response": {
            "json_parser": {
              "text_embeddings": "$.data[*].embedding"
            }
          }
        }
      }
    }
  }
}
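If you create several endpoints that differ only in URL, path, and key, generating the request body can be less error-prone than editing the nested JSON by hand. The following Python sketch assembles the same body as the example above. The function name is a hypothetical helper, and the `content` string is written on one line here, on the assumption that the whitespace inside the template is not significant.

```python
import json


def build_text_embedding_endpoint(url: str, path: str, api_key: str) -> dict:
    """Return the request body for PUT _inference/text_embedding/<endpoint_id>."""
    return {
        "service": "alibaba-cloud-custom-model",
        "service_settings": {
            "secret_parameters": {"api_key": api_key},
            "url": url,
            "path": {
                path: {
                    "POST": {
                        "headers": {
                            # ${api_key} is left as a literal placeholder for ES to fill in.
                            "Authorization": "Bearer ${api_key}",
                            "Content-Type": "application/json;charset=utf-8",
                        },
                        "request": {
                            "format": "string",
                            # ${input} is likewise a literal placeholder, not a Python variable.
                            "content": '{"input":${input},"embedding_type":"dense"}',
                        },
                        "response": {
                            "json_parser": {"text_embeddings": "$.data[*].embedding"}
                        },
                    }
                }
            },
        },
    }


# Placeholder values copied from the example above.
body = build_text_embedding_endpoint("http://xxx.cn-hangzhou.pai-eas.aliyuncs.com", "/", "xxx")
print(json.dumps(body, indent=2))
```

You can paste the printed JSON into the Kibana console after the `PUT _inference/text_embedding/pai_embedding` line.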
Call the model:
POST _inference/text_embedding/pai_embedding
{
  "input": ["hello", "world"]
}
Response:
{
  "text_embedding": [
    {
      "embedding": [
        -0.016567165,
        -0.015161497,
        ...
      ]
    },
    {
      "embedding": [
        -0.023222955,
        0.031465773,
        ...
      ]
    }
  ]
}
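A common next step is to compare the returned vectors. The following Python sketch reads the `text_embedding` array from a response shaped like the one above and computes the cosine similarity between two embeddings. The sample vectors are made-up three-dimensional values for illustration only, and the helper names are hypothetical.

```python
import math


def embeddings_from_response(response: dict) -> list[list[float]]:
    """Collect the vectors from a text_embedding inference response."""
    return [item["embedding"] for item in response["text_embedding"]]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy response with invented vectors, not real model output.
sample = {
    "text_embedding": [
        {"embedding": [1.0, 0.0, 0.0]},
        {"embedding": [0.0, 1.0, 0.0]},
    ]
}
first, second = embeddings_from_response(sample)
print(cosine_similarity(first, second))
```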
completion type
Syntax template for model creation:
PUT _inference/completion/pai_deepseek
{
  "service": "alibaba-cloud-custom-model",
  "service_settings": {
    "secret_parameters": {
      "api_key": "<Replace with your API key>"
    },
    "url": "<Replace with your service URL>",
    "path": {
      "<Replace with your service path>": {
        "POST": {
          "headers": {
            "Authorization": "Bearer ${api_key}"
          },
          "request": {
            "format": "string",
            "content": """
              {
                "prompt":"${prompt}",
                "max_tokens":"${max_tokens}"
              }
              """
          },
          "response": {
            "json_parser": {
              "completion_result": "$.choices[*].text"
            }
          }
        }
      }
    }
  },
  "task_settings": {
    "parameters": {
      "max_tokens": "300"
    }
  }
}
Example:
PUT _inference/completion/pai_deepseek
{
  "service": "alibaba-cloud-custom-model",
  "service_settings": {
    "secret_parameters": {
      "api_key": "xxx"
    },
    "url": "http://xxx.cn-hangzhou.pai-eas.aliyuncs.com",
    "path": {
      "/api/predict/xxx/v1/completions": {
        "POST": {
          "headers": {
            "Authorization": "Bearer ${api_key}"
          },
          "request": {
            "format": "string",
            "content": """
              {
                "prompt":"${prompt}",
                "max_tokens":"${max_tokens}"
              }
              """
          },
          "response": {
            "json_parser": {
              "completion_result": "$.choices[*].text"
            }
          }
        }
      }
    }
  },
  "task_settings": {
    "parameters": {
      "max_tokens": "300"
    }
  }
}
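The `${prompt}` and `${max_tokens}` placeholders in the request content are filled from parameters: `max_tokens` has a default in `task_settings`, and `prompt` is supplied when the model is invoked. The following Python sketch is only a conceptual model of that substitution, using `string.Template`, which happens to share the `${name}` syntax. It assumes that call-time parameters override the endpoint defaults and that values are inserted as plain text. It is not a description of how ES implements the feature.

```python
import json
from string import Template

# Same shape as the "content" template in the endpoint definition, on one line.
REQUEST_TEMPLATE = Template('{"prompt":"${prompt}","max_tokens":"${max_tokens}"}')


def render_request(default_parameters: dict, call_parameters: dict) -> str:
    """Merge endpoint defaults with call-time parameters and fill the template.

    Assumption: call-time values take precedence over defaults.
    """
    merged = {**default_parameters, **call_parameters}
    return REQUEST_TEMPLATE.substitute(merged)


rendered = render_request(
    {"max_tokens": "300"},                   # from task_settings.parameters at creation
    {"prompt": "what is elastic search"},    # from task_settings.parameters at call time
)
print(rendered)
print(json.loads(rendered))
```

Under this plain-text model, a prompt that contains a double quote or a backslash would produce invalid JSON unless it is escaped first, so it is worth testing such prompts against your own endpoint.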
Invoke the model:
POST _inference/completion/pai_deepseek
{
  "input": "",
  "task_settings": {
    "parameters": {
      "prompt": "what is elastic search"
    }
  }
}
Response:
{
  "completion": [
    {
      "result": """ and how is it used?
Elastic Search is a search engine that's built on top of Lucene, a search engine framework. It's designed to store, search, and analyze text documents efficiently. The key feature of Elastic Search is its ability to index and query structured data in real-time. It can build indices from existing data sources like databases or APIs and then query that data as if it were in-memory. Elastic Search is often used in applications that require fast search and analytics, such as search engines, customer relationship management systems, and big data platforms.
How to use it:
1. Storing Data:
Elastic Search primarily uses the NoSQL document model to store data. Each document is represented as a JSON object, which makes it easy to work with structured data.
2. Indexing:
Elastic Search provides several ways to index data. The most common method is through the REST API, where you can upload data in bulk or in real-time. It also supports indexing data from databases, APIs, or even cloud storage.
3. Querying:
Elastic Search supports various querying mechanisms. It has a simple query syntax that lets you search for specific fields within documents. It also supports aggregate queries, which allow you to perform aggregations like counts, sums, and averages over your data. Additionally, Elastic Search lets you match phrases, ranges, and more.
4. Updating and Deleting Data:
Elastic Search lets you update and delete documents from your index. This"""
    }
  ]
}
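Outside Kibana, an application can send the same invocation over HTTP. The following Python sketch builds the request with the standard library and extracts the generated text from a response shaped like the one above. The ES address, the use of HTTP Basic authentication, and the credentials are assumptions for illustration. Replace them with the access method configured for your instance. The function names are hypothetical.

```python
import base64
import json
import urllib.request


def build_completion_call(es_url: str, endpoint_id: str, prompt: str,
                          username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) POST _inference/completion/<endpoint_id>."""
    body = json.dumps(
        {"input": "", "task_settings": {"parameters": {"prompt": prompt}}}
    ).encode("utf-8")
    # Assumption: the instance accepts HTTP Basic authentication.
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{es_url.rstrip('/')}/_inference/completion/{endpoint_id}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
        },
    )


def completion_text(response: dict) -> str:
    """Join the result strings from a completion inference response."""
    return "".join(item["result"] for item in response["completion"])


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (requires access to the ES instance)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

The sample response above ends mid-sentence, which is consistent with generation stopping at the configured `max_tokens` value of 300, so callers may want to handle truncated text.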