Alibaba Cloud Elasticsearch extends the Inference API to provide a flexible and efficient method for deploying and managing custom AI models. This feature is ideal for scenarios such as recommendation systems, image retrieval, and natural language processing. You can quickly combine custom AI models with the high-performance Elasticsearch engine to enhance the efficiency and accuracy of your business applications.
Background information
The official Inference API from Elasticsearch (ES) supports calling model inference services hosted on external platforms, such as Hugging Face and OpenAI. For more information, see Create inference API.
Alibaba Cloud Elasticsearch (ES) extends the Inference API so that, in addition to the officially supported models, you can invoke models from other platforms and define your own model inference services in ES. Alibaba Cloud ES maps your custom model IDs to the model IDs used by the Inference API and integrates processes such as embedding and rerank into the read and write workflows of ES. This lets you quickly integrate AI model capabilities with the high-performance engine of ES to serve real-world business scenarios. Standardized templates are provided for common model platforms. For quick configuration, see the following templates:
Limits
Only Elasticsearch 8.15 and later versions support custom model inference services.
Create an inference model
General template
PUT _inference/<task_type>/<inference_id>
{
"service": "alibaba-cloud-custom-model",
"service_settings": {
"secret_parameters": {
...
},
"url": "<url>",
"path": {
"<path>": {
"<method>": {
"query_string": "<query_string>",
"headers": {
...
},
"request": {
"format": "string",
"content": "<content>"
},
"response": {
"json_parser":{
...
}
}
}
}
}
},
"task_settings": {
"parameters":{
...
}
}
}
Parameters
Parameter | Type | Required | Description |
| / | Yes | The model type. Valid values are:
|
| / | Yes | A custom parameter. The name of the inference model to create. |
|
| Yes | Specifies the service to use. For Alibaba Cloud custom models, the value is |
| / | Yes | Service settings. |
|
| Yes | Configure sensitive parameters, such as |
|
| Yes | The service's |
|
| Yes | The service's |
|
| Yes | The |
|
| No | The |
|
| No | The |
|
| Yes | The |
|
| Yes | The |
|
| Yes | The |
|
| No | When |
|
| This parameter is not required when | Defines how to parse the |
|
| Required only when | When |
|
| Required only when | When |
|
| The path to parse | |
|
| The path to parse | |
|
| The path to parse | |
|
| Required when | The path to parse the |
|
| No | The path to parse the |
|
| No | The path to parse the |
|
| Required when | The path to parse the |
|
| No | Custom parameters. Configure them in |
Optional parameters for the model service are configured in task_settings.parameters. By default, the inference model's default values are used. To change a value, set it when you create the service or pass task_settings.parameters when you call the service.
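The override behavior described above can be pictured as a dictionary merge. The following is a minimal sketch, assuming that values passed at call time take precedence over the values stored when the service was created. The function name `effective_parameters` and the `query` value are hypothetical and only illustrate the idea.

```python
def effective_parameters(defaults, overrides=None):
    """Return call-time parameters layered over create-time defaults.

    Assumption: a value supplied when calling the service replaces the
    value stored in task_settings.parameters at creation time.
    """
    merged = dict(defaults)          # copy so the stored defaults stay untouched
    merged.update(overrides or {})   # call-time values win on key collisions
    return merged


# Defaults stored with the service (same key as the sample endpoint in this document).
stored = {"input_type": "document"}

# No override at call time: the stored defaults apply.
without_override = effective_parameters(stored)

# Hypothetical override passed in task_settings.parameters at call time.
with_override = effective_parameters(stored, {"input_type": "query"})

print(without_override)
print(with_override)
```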
The response parameter defines how the HTTP response of the model service is parsed into an object that ES can recognize, which is what integrates the model inference service with the ES write and query processes. You can use JSONPath expressions to parse the response.
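To make the role of a JSONPath expression concrete, the following sketch evaluates a path such as `$.result.embeddings[*].embedding` against a parsed response. It is not the parser that ES uses. It is a hypothetical helper (`select`) that supports only the `$`, `.key`, and `[*]` steps, and the response values are made up.

```python
import re

# Matches one path step: either ".key" or the "[*]" wildcard.
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|(\[\*\])")


def select(document, path):
    """Evaluate a small JSONPath subset ($, .key, [*]) against parsed JSON."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    nodes = [document]
    for key, wildcard in _STEP.findall(path[1:]):
        if wildcard:
            # [*] fans out over every element of each matched array.
            nodes = [item for node in nodes for item in node]
        else:
            # .key descends into the named field of each matched object.
            nodes = [node[key] for node in nodes]
    return nodes


# Shortened, made-up response in the shape of an embedding result.
response = {
    "result": {
        "embeddings": [
            {"index": 0, "embedding": [0.25, -0.5]},
            {"index": 1, "embedding": [0.75, 1.0]},
        ]
    }
}

# One vector is selected per input text.
vectors = select(response, "$.result.embeddings[*].embedding")
print(vectors)
```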
ES supports custom models of the text_embedding, sparse_embedding, rerank, completion, and custom types. The custom type does not have a response parameter. The other four types have different response formats. The following sections provide examples of the response format for each type:
text_embedding type
For the text_embedding type, the input is a string or a List<string>. The required result is a List<List<Float>>, which represents the vector result for each input text.
The following is a sample response:
{
"request_id": "B4AB89C8-B135-****-A6F8-2BAB801A2CE4",
"latency": 38,
"usage": {
"token_count": 3072
},
"result": {
"embeddings": [
{
"index": 0,
"embedding": [
-0.02868066355586052,
0.022033605724573135,
...
]
}
]
}
}
The corresponding parameter settings are:
"response":{
"json_parser":{
"text_embeddings":"$.result.embeddings[*].embedding"
}
}
sparse_embedding type
For the sparse_embedding type, the input is a string or a List<string>. The required result includes a List<List<string>> of tokens and a List<List<Float>> of weights.
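The two nested lists described above can be derived from a response in two stages. The sketch below assumes that an outer path selects one element per input text and that the token and weight paths are then evaluated relative to each selected element. The helpers `select` and `parse_sparse` are hypothetical, `select` handles only `$`, `.key`, and `[*]`, and the response is a shortened version of the kind of result a sparse embedding service might return.

```python
import re

_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|(\[\*\])")


def select(document, path):
    """Evaluate a small JSONPath subset ($, .key, [*]) against parsed JSON."""
    nodes = [document]
    for key, wildcard in _STEP.findall(path[1:]):
        if wildcard:
            nodes = [item for node in nodes for item in node]
        else:
            nodes = [node[key] for node in nodes]
    return nodes


def parse_sparse(response, item_path, token_path, weight_path):
    """Return (tokens, weights), each holding one list per input text."""
    items = select(response, item_path)
    # The inner paths are resolved against each item, not the whole response.
    tokens = [select(item, token_path) for item in items]
    weights = [select(item, weight_path) for item in items]
    return tokens, weights


response = {
    "result": {
        "sparse_embeddings": [
            {"index": 0, "embedding": [
                {"tokenId": 6, "weight": 0.10137939453125},
                {"tokenId": 163040, "weight": 0.2841796875},
            ]},
            {"index": 1, "embedding": [
                {"tokenId": 9803, "weight": 0.1951904296875},
            ]},
        ]
    }
}

tokens, weights = parse_sparse(
    response,
    "$.result.sparse_embeddings[*]",
    "$.embedding[*].tokenId",
    "$.embedding[*].weight",
)
print(tokens)
print(weights)
```

Note that the field name in the inner path must match the response exactly, because the lookup is case-sensitive.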
Example response:
{
"request_id": "75C50B5B-E79E-4930-****-F48DBB392231",
"latency": 22,
"usage": {
"token_count": 11
},
"result": {
"sparse_embeddings": [
{
"index": 0,
"embedding": [
{
"tokenId": 6,
"weight": 0.10137939453125
},
{
"tokenId": 163040,
"weight": 0.2841796875
},
{
"tokenId": 354,
"weight": 0.1431884765625
},
{
"tokenId": 5998,
"weight": 0.161376953125
},
{
"tokenId": 8550,
"weight": 0.2388916015625
},
{
"tokenId": 2017,
"weight": 0.1614990234375
}
]
},
{
"index": 1,
"embedding": [
{
"tokenId": 9803,
"weight": 0.1951904296875
},
{
"tokenId": 86250,
"weight": 0.317138671875
},
{
"tokenId": 5889,
"weight": 0.17529296875
},
{
"tokenId": 2564,
"weight": 0.11614990234375
},
{
"tokenId": 59529,
"weight": 0.1666259765625
}
]
}
]
}
}
The corresponding parameter settings are:
"response":{
"json_parser":{
"sparse_result":{
"path":"$.result.sparse_embeddings[*]",
"value":{
"sparse_token":"$.embedding[*].tokenId",
"sparse_weight":"$.embedding[*].weight"
}
}
}
}
rerank type
For the rerank type, the required result is score, a List<Float> that holds the relevance score of each input text for the query. Two optional results are also supported: index, a List<int> that gives the position of each document in the input text array (if it is not specified, the default order is used), and text, a List<string> that holds the original input text corresponding to each score.
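The relationship between the required score list and the optional index and text lists can be sketched as follows. This is an illustration only: the function `build_rerank_result` and the `text` output key are hypothetical, and the fallback to positional order reflects the default-order rule stated above.

```python
def build_rerank_result(scores, indices=None, texts=None):
    """Combine parsed rerank fields into one entry per input document."""
    if indices is None:
        # No index parsed: fall back to the order of the input array.
        indices = list(range(len(scores)))
    results = []
    for position, (index, score) in enumerate(zip(indices, scores)):
        entry = {"index": index, "relevance_score": score}
        if texts is not None:
            entry["text"] = texts[position]
        results.append(entry)
    return results


scores = [0.45026873385713345, 1.1412238544346029e-4, 8.029784284533197e-5]

# Only scores were parsed, so entries follow the input order.
with_default_order = build_rerank_result(scores)

# Scores, indices, and texts were all parsed (made-up values).
with_explicit_index = build_rerank_result(
    scores, indices=[2, 0, 1], texts=["c", "a", "b"]
)

print(with_default_order)
print(with_explicit_index)
```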
Example Response:
{
"request_id":"24B004E0-ADEF-****-879B-F28359BFAD1D",
"latency":19,
"usage":{
"doc_count":3
},
"result":{
"scores":[
{
"index":0,"score":0.45026873385713345
},
{
"index":1,"score":1.1412238544346029E-4
},
{
"index":2,"score":8.029784284533197E-5
}
]
}
}
The corresponding parameter settings are:
"response":{
"json_parser":{
"relevance_score":"$.result.scores[*].score",
"reranked_index":"$.result.scores[*].index"
}
}
completion type
For the completion type, the required result is a List<string>.
Example response:
{
"request_id": "450fcb80-f796-****-8d69-e1e86d29aa9f",
"latency": 564.903929,
"result": {
"text":"Zhengzhou is a modern city with a long and rich cultural history, and it has many fun places to visit. Here are some recommended tourist attractions:..."
},
"usage": {
"output_tokens": 6320,
"input_tokens": 35,
"total_tokens": 6355
}
}
The corresponding parameter settings are:
"response":{
"json_parser":{
"completion_result":"$.result.text"
}
}
Creation examples for each type
TEXT_EMBEDDING (text embedding)
PUT _inference/text_embedding/<model_id>
{
"service":"alibaba-cloud-custom-model",
"service_settings":{
"secret_parameters":{
<secret_parameter_values>
},
"url":"<your_url>",
"path":{
"<your_path>":{
"POST":{
"query_string": "<query_string_values>",
"headers":{
<header_values>
},
"request":{
"format":"string",
"content":"<model_request_format>"
},
"response":{
"json_parser":{
"text_embeddings":"<path_to_parse_text_embeddings>"
}
}
}
}
}
},
"task_settings":{
"parameters":{
<parameter_values>
}
}
}
SPARSE_EMBEDDING (sparse text embedding)
PUT _inference/sparse_embedding/<model_id>
{
"service":"alibaba-cloud-custom-model",
"service_settings":{
"secret_parameters":{
<secret_parameter_values>
},
"url":"<your_url>",
"path":{
"<your_path>":{
"<method>":{
"query_string": "<query_string_values>",
"headers":{
<header_values>
},
"request":{
"format":"string",
"content":"<model_request_format>"
},
"response":{
"json_parser":{
"sparse_result":{
"path":"<path_to_parse_sparse_embedding>",
"value":{
"sparse_token":"<path_to_parse_sparse_embedding_token>",
"sparse_weight":"<path_to_parse_sparse_embedding_weight>"
}
}
}
}
}
}
}
},
"task_settings":{
"parameters":{
<parameter_values>
}
}
}
RERANK (sorting service)
PUT _inference/rerank/<model_id>
{
"service":"alibaba-cloud-custom-model",
"service_settings":{
"secret_parameters":{
<secret_parameter_values>
},
"url":"<your_url>",
"path":{
"<your_path>":{
"<method>":{
"query_string": "<query_string_values>",
"headers":{
<header_values>
},
"request":{
"format":"string",
"content":"<model_request_format>"
},
"response":{
"json_parser":{
"relevance_score":"<path_to_parse_rerank_relevance_score>",
"reranked_index":"<path_to_parse_rerank_reranked_index>",
"document_text":"<path_to_parse_rerank_document_text>"
}
}
}
}
}
}
}
COMPLETION (content generation service)
PUT _inference/completion/<model_id>
{
"service":"alibaba-cloud-custom-model",
"service_settings":{
"secret_parameters":{
<secret_parameter_values>
},
"url":"<your_url>",
"path":{
"<your_path>":{
"<method>":{
"query_string": "<query_string_values>",
"headers":{
<header_values>
},
"request":{
"format":"string",
"content":"<model_request_format>"
},
"response":{
"json_parser":{
"completion_result":"<path_to_parse_completion>"
}
}
}
}
}
},
"task_settings":{
"parameters":{
<parameter_values>
}
}
}
CUSTOM (custom service)
For model types that ES does not currently support, or when you need to retrieve the full response, you can set the model type to custom. In this case, the response that ES returns contains the complete model response.
Models of the text_embedding, sparse_embedding, rerank, and completion types can also be defined as custom.
PUT _inference/custom/<model_id>
{
"service":"alibaba-cloud-custom-model",
"service_settings":{
"secret_parameters":{
<secret_parameter_values>
},
"url":"<your_url>",
"path":{
"<your_path>":{
"<method>":{
"query_string": "<query_string_values>",
"headers":{
<header_values>
},
"request":{
"format":"string",
"content":"<model_request_format>"
}
}
}
}
},
"task_settings":{
"parameters":{
<parameter_values>
}
}
}
Get an inference model
Syntax
GET /_inference/_all
GET /_inference/<inference_id>
GET /_inference/<task_type>/_all
GET /_inference/<task_type>/<inference_id>
Parameters
Parameter | Content |
<inference_id> | The identifier of the custom inference endpoint. |
<task_type> | The type of the inference interface. |
Example
Sample request:
GET _inference/_all
Response:
{
"endpoints": [
{
"inference_id": "os_deployment_emb",
"task_type": "text_embedding",
"service": "alibaba-cloud-custom-model",
"service_settings": {
"url": "http://xxxx.opensearch.aliyuncs.com",
"path": {
"/v3/openapi/deployments/xxx/predict": {
"POST": {
"headers": {
"Authorization": "Bearer ${api_key}",
"Token": "${Token}"
},
"request": {
"format": "string",
"content": """{"input":${input},"input_type":"${input_type}"}"""
},
"response": {
"json_parser": {
"text_embeddings": "$.embeddings[*].embedding"
}
}
}
}
},
"rate_limit": {
"requests_per_minute": 10000
}
},
"task_settings": {
"parameters": {
"input_type": "document"
}
}
}
]
}
Call an inference model
Syntax
POST /_inference/<inference_id>
POST /_inference/<task_type>/<inference_id>
Parameters
Path parameters
Parameter | Content |
<inference_id> | The identifier of the custom inference endpoint. |
<task_type> | The type of the inference interface. Supported values: text_embedding, sparse_embedding, rerank, completion, and custom. |
Query parameters
Parameter | Content |
| Optional. string. Controls the timeout duration for waiting for a request. The default value is 30 s. |
Request body parameters
Parameter | Content |
| Required. |
| Optional. |
| Optional. |
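The call syntax and parameters above can be assembled into an HTTP request as in the following sketch. It only builds the request and does not send it. Several names are assumptions rather than values from this document: the base URL `http://localhost:9200`, the helper `build_inference_request`, the query parameter name `timeout` (the name used by the open source Elasticsearch Inference API), and the body fields `input` and `task_settings`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_inference_request(base_url, task_type, inference_id, text,
                            timeout="30s", task_settings=None):
    """Build (but do not send) a POST request to an inference endpoint."""
    body = {"input": text}
    if task_settings:
        # Assumed shape: {"parameters": {...}} overriding stored defaults.
        body["task_settings"] = task_settings
    query = urlencode({"timeout": timeout})
    url = f"{base_url}/_inference/{task_type}/{inference_id}?{query}"
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical cluster address; the endpoint ID matches the sample call below.
req = build_inference_request(
    "http://localhost:9200", "text_embedding", "os_deployment_emb", "hello world"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To send the request, pass `req` to `urllib.request.urlopen` and add whatever authentication your cluster requires.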
Examples
text_embedding
Sample call:
POST _inference/text_embedding/os_deployment_emb
{
"input":"hello world"
}
Response:
{
"text_embedding": [
{
"embedding": [
-0.026062012,
0.01574707,
-0.03842163,
0.012580872,
...
]
}
]
}
rerank
Sample call:
POST _inference/rerank/os_deployment_custom_rerank
{
"input": ["luke", "like", "leia", "chewy","r2d2", "star", "wars"],
"query": "star wars main character"
}
Response:
{
"rerank": [
{
"index": 0,
"relevance_score": 0.8502201
},
{
"index": 1,
"relevance_score": 0.062216982
},
{
"index": 2,
"relevance_score": 0.60352296
},
{
"index": 3,
"relevance_score": 0.35611072
},
{
"index": 4,
"relevance_score": 0.40951595
},
{
"index": 5,
"relevance_score": 0.16277891
},
{
"index": 6,
"relevance_score": 0.12918286
}
]
}
Delete an inference model
Syntax
DELETE /_inference/<inference_id>
DELETE /_inference/<task_type>/<inference_id>
Parameters
Parameter | Content |
| The identifier of the custom Example: |
| The type of the
Example: |
Example
Delete example:
DELETE _inference/custom-rerank
Response:
{
"acknowledged": true,
"pipelines": []
}