Custom model inference service - Elasticsearch (ES) - Alibaba Cloud Help Center

Alibaba Cloud Elasticsearch extends the Inference API to provide a flexible and efficient method for deploying and managing custom AI models. This feature is ideal for scenarios such as recommendation systems, image retrieval, and natural language processing. You can quickly combine custom AI models with the high-performance Elasticsearch engine to enhance the efficiency and accuracy of your business applications.

Background information

The official Inference API from Elasticsearch (ES) supports calling model inference services from external platforms, such as Hugging Face and OpenAI. For more information, see Create inference API.
Alibaba Cloud Elasticsearch (ES) extends the Inference API so that you can invoke models from other platforms in addition to the officially supported ones, and define your own model inference services in ES. Alibaba Cloud ES maps your custom model IDs to the model IDs used by the Inference API and integrates steps such as embedding and rerank into the ES read and write workflows. This lets you combine AI model capabilities with the high-performance ES engine in real-world business scenarios. Standardized templates are provided for common model platforms to speed up configuration.

Limits

Only Elasticsearch 8.15 and later versions support custom model inference services.
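
A client-side guard for this limit should compare version numbers numerically, because a plain string comparison ranks "8.9" above "8.15". The following is only a sketch: the helper name `supports_custom_inference` is hypothetical, and the version string is assumed to have at least a major and a minor component (for example `8.15.1`).

```python
def supports_custom_inference(version: str) -> bool:
    """Return True if `version` (for example "8.15.1") is 8.15 or later."""
    # Compare major and minor as integers, not as strings.
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (8, 15)


print(supports_custom_inference("8.15.1"))
print(supports_custom_inference("8.9.2"))
```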

Create an inference model

General template

PUT _inference/<task_type>/<inference_id>
{
  "service": "alibaba-cloud-custom-model",
  "service_settings": {
    "secret_parameters": {
      ...
    },
    "url": "<url>",
    "path": {
      "<path>": {
        "<method>": {
          "query_string": "<query_string>",
          "headers": {
            ...
          },
          "request": {
            "format": "string",
            "content": "<content>"
          },
          "response": {
            "json_parser":{
              ...
            }
          }
        }
      }
    }
  },
  "task_settings": {
    "parameters":{
      ...
    }
  }
}

Parameters

Parameter	Type	Required	Description
`<task_type>`	/	Yes	The model type. Valid values: `text_embedding`, `sparse_embedding`, `rerank`, `completion`, and `custom`. Use `custom` for inference models that do not fit the other four types, or when you need the complete response. In this case, the `result` is the complete `http response`.
`<inference_id>`	/	Yes	A name that you choose for the inference model to create.
`service`	`string`	Yes	Specifies the service to use. For Alibaba Cloud custom models, the value is `alibaba-cloud-custom-model`.
`service_settings`	/	Yes	Service settings.
`secret_parameters`	`object`	Yes	Configure sensitive parameters, such as `api_key` and `token`, in `secret_parameters`. Use placeholders to replace them where needed.
`<url>`	`string`	Yes	The service's `url` address.
`<path>`	`string`	Yes	The service's `path`, combined with the `url`, forms the service invocation endpoint.
`<method>`	`string`	Yes	The `http` request method. `POST`, `PUT`, and `GET` are supported.
`<query_string>`	`string`	No	The `query string` parameters for the `http` request.
`headers`	`object`	No	The `header` parameters for the `http` request.
`request`	`object`	Yes	The `http request` for the custom model.
`request.format`	`string`	Yes	The `body` type of the `request`. Currently, only `string` is supported.
`request.content`	`string`	Yes	The `body` of the `request`. Pass the `JSON`-formatted `http request body` as an escaped `string`.
`response`	`object`	No	Required when `task_type` is `text_embedding`, `sparse_embedding`, `rerank`, or `completion`. In these cases, you must configure the corresponding `response.json_parser`.
`response.json_parser`	`object`	Not required when `<task_type>` is `custom`. Required for all other types.	Defines how to parse the `http response`. It uses JSONPath syntax to parse the `response` into an object that ES can recognize.
`response.json_parser.text_embeddings`	`string`	Required only when `<task_type>` is `text_embedding`.	The path from which to parse the text embeddings.
`response.json_parser.sparse_result`	`object`	Required only when `<task_type>` is `sparse_embedding`.	Defines how to parse the sparse embedding result.
`response.json_parser.sparse_result.path`	`string`		The path to parse `sparse_result`.
`response.json_parser.sparse_result.value.sparse_token`	`string`		The path to parse `sparse_token`.
`response.json_parser.sparse_result.value.sparse_weight`	`string`		The path to parse `sparse_weight`.
`response.json_parser.relevance_score`	`string`	Required when `<task_type>` is `rerank`. Do not specify this for other types.	The path to parse the `rerank relevance_score`.
`response.json_parser.reranked_index`	`string`	No	The path to parse the `rerank reranked_index`.
`response.json_parser.document_text`	`string`	No	The path to parse the `rerank document_text`.
`response.json_parser.completion_result`	`string`	Required when `<task_type>` is `completion`. Do not specify this for other types.	The path to parse the `completion completion_result`.
`task_settings.parameters`	`object`	No	Custom parameters. Configure them in `parameters` and use placeholders to replace them where needed.

Optional parameters for the model service are configured in task_settings.parameters. The values that you set when you create the inference model act as defaults. To use different values for a single request, pass task_settings.parameters when you call the service.
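
The interaction between default and call-time parameters can be illustrated with a small simulation. This is a sketch under stated assumptions, not the ES implementation: the helper `render_request_body` is hypothetical, the `${...}` placeholder syntax is taken from the sample endpoint later in this topic, the `query` value for `input_type` is made up, and it is assumed that `${input}` is replaced with the JSON encoding of the input.

```python
import json
from string import Template


def render_request_body(content, default_parameters, call_parameters, input_value):
    """Fill the ${...} placeholders of a request.content template."""
    # Call-time task_settings.parameters take precedence over the defaults
    # stored with the inference model.
    values = {**default_parameters, **call_parameters}
    # Assumption: ${input} receives the JSON encoding of the input.
    values["input"] = json.dumps(input_value)
    return Template(content).substitute(values)


content = '{"input":${input},"input_type":"${input_type}"}'
defaults = {"input_type": "document"}

# Uses the default stored with the model.
print(render_request_body(content, defaults, {}, ["hello world"]))
# Overrides the default for this call only.
print(render_request_body(content, defaults, {"input_type": "query"}, ["hello world"]))
```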

The response parameter parses the http response, converting it into an object that is recognizable by ES. This process integrates the model inference service with the ES write and query processes. You can use JSONPath expressions to parse the response.

ES supports custom models of the text_embedding, sparse_embedding, rerank, completion, and custom types. The custom type does not have a response parameter. The other four types have different response formats. The following sections provide examples of the response format for each type:

`text_embedding` type

For the text_embedding type, the input is a string or a List<string>. The required result is a List<List<Float>>, which represents the vector result for each input text.

The following is a sample response:

{
    "request_id": "B4AB89C8-B135-****-A6F8-2BAB801A2CE4",
    "latency": 38,
    "usage": {
        "token_count": 3072
    },
    "result": {
        "embeddings": [
            {
                "index": 0,
                "embedding": [
                    -0.02868066355586052,
                    0.022033605724573135,
                    ...
                ]
            }
        ]
    }
}

The corresponding parameter settings are:

"response":{
	"json_parser":{
		"text_embeddings":"$.result.embeddings[*].embedding"
	}
}
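
To see what this path selects, the following sketch evaluates it against the sample response with a tiny hand-written resolver. The `select` helper is hypothetical and supports only `$`, `.key`, and `[*]`. It is not the parser that ES uses. The vector is shortened to the two values shown in the sample.

```python
import re


def select(document, path):
    """Evaluate a small JSONPath subset: `$`, `.key`, and `[*]`."""
    nodes = [document]
    for key, wildcard in re.findall(r"\.(\w+)|(\[\*\])", path):
        if wildcard:
            # [*] fans out over every element of each current list.
            nodes = [item for node in nodes for item in node]
        else:
            # .key descends into the named field of each current object.
            nodes = [node[key] for node in nodes]
    return nodes


response = {
    "result": {
        "embeddings": [
            {"index": 0, "embedding": [-0.02868066355586052, 0.022033605724573135]}
        ]
    }
}

# One vector per input text, that is, a list of lists of floats.
vectors = select(response, "$.result.embeddings[*].embedding")
print(vectors)
```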

`sparse_embedding` type

For the sparse_embedding type, the input is a string or a List<string>. The required result includes a List<List<string>> of tokens and a List<List<Float>> of weights.

The following is a sample response:

{
	"request_id": "75C50B5B-E79E-4930-****-F48DBB392231",
	"latency": 22,
	"usage": {
		"token_count": 11
	},
	"result": {
		"sparse_embeddings": [
			{
				"index": 0,
				"embedding": [
					{
						"tokenId": 6,
						"weight": 0.10137939453125
					},
					{
						"tokenId": 163040,
						"weight": 0.2841796875
					},
					{
						"tokenId": 354,
						"weight": 0.1431884765625
					},
					{
						"tokenId": 5998,
						"weight": 0.161376953125
					},
					{
						"tokenId": 8550,
						"weight": 0.2388916015625
					},
					{
						"tokenId": 2017,
						"weight": 0.1614990234375
					}
				]
			},
			{
				"index": 1,
				"embedding": [
					{
						"tokenId": 9803,
						"weight": 0.1951904296875
					},
					{
						"tokenId": 86250,
						"weight": 0.317138671875
					},
					{
						"tokenId": 5889,
						"weight": 0.17529296875
					},
					{
						"tokenId": 2564,
						"weight": 0.11614990234375
					},
					{
						"tokenId": 59529,
						"weight": 0.1666259765625
					}
				]
			}
		]
	}
}

The corresponding parameter settings are:

"response":{
	"json_parser":{
		"sparse_result":{
			"path":"$.result.sparse_embeddings[*]",
			"value":{
				"sparse_token":"$.embedding[*].tokenId",
				"sparse_weight":"$.embedding[*].weight"
			}
		}
	}
}
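
The two-level structure of sparse_result can be read as follows: path selects one entry per input text, and the value paths are then resolved relative to each entry. The sketch below mimics that with plain Python on a shortened copy of the sample response. The function name `parse_sparse_result` is hypothetical, and converting token IDs to strings is an assumption based on the List<List<string>> requirement above.

```python
def parse_sparse_result(response):
    """Return (tokens, weights), each holding one inner list per input text."""
    tokens, weights = [], []
    # `path` selects one entry per input text.
    for entry in response["result"]["sparse_embeddings"]:
        # `sparse_token` and `sparse_weight` are read relative to that entry.
        tokens.append([str(item["tokenId"]) for item in entry["embedding"]])
        weights.append([item["weight"] for item in entry["embedding"]])
    return tokens, weights


response = {
    "result": {
        "sparse_embeddings": [
            {"index": 0, "embedding": [
                {"tokenId": 6, "weight": 0.10137939453125},
                {"tokenId": 163040, "weight": 0.2841796875},
            ]},
            {"index": 1, "embedding": [
                {"tokenId": 9803, "weight": 0.1951904296875},
            ]},
        ]
    }
}

tokens, weights = parse_sparse_result(response)
print(tokens)
print(weights)
```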

`rerank` type

For the rerank type, the required result is a score of the List<Float> type, which holds the relevance score of each input text for the query. Two optional fields are also supported. The index field, of the List<int> type, gives the position of each document in the input text array. If it is not specified, the results are assumed to follow the input order. The text field, of the List<string> type, gives the original input text that corresponds to each result.

Example Response:

{
  "request_id": "24B004E0-ADEF-****-879B-F28359BFAD1D",
  "latency": 19,
  "usage": {
    "doc_count": 3
  },
  "result": {
    "scores": [
      {
        "index": 0,
        "score": 0.45026873385713345
      },
      {
        "index": 1,
        "score": 1.1412238544346029E-4
      },
      {
        "index": 2,
        "score": 8.029784284533197E-5
      }
    ]
  }
}

The corresponding parameter settings are:

"response":{
	"json_parser":{
		"relevance_score":"$.result.scores[*].score",
		"reranked_index":"$.result.scores[*].index"
	}
}
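
The following sketch shows how the parsed lists could be combined into one entry per document, including the fallback to input order when no index is parsed. The function `build_rerank_result` and the `text` key are hypothetical. Only the `index` and `relevance_score` keys appear in the rerank response shown later in this topic.

```python
def build_rerank_result(scores, indices=None, texts=None):
    """Combine the parsed lists into one entry per document."""
    if indices is None:
        # Without reranked_index, scores are assumed to follow the input order.
        indices = range(len(scores))
    results = []
    for position, (index, score) in enumerate(zip(indices, scores)):
        entry = {"index": index, "relevance_score": score}
        if texts is not None:
            # document_text is optional and only added when it was parsed.
            entry["text"] = texts[position]
        results.append(entry)
    return results


# Scores taken from the sample response above.
scores = [0.45026873385713345, 1.1412238544346029e-4, 8.029784284533197e-5]
print(build_rerank_result(scores, indices=[0, 1, 2]))
```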

`completion` type

For the completion type, the required result is a List<string>.

The following is a sample response:

{
  "request_id": "450fcb80-f796-****-8d69-e1e86d29aa9f",
  "latency": 564.903929,
  "result": {
    "text": "Zhengzhou is a modern city with a long and rich cultural history, and it has many fun places to visit. Here are some recommended tourist attractions:..."
  },
  "usage": {
    "output_tokens": 6320,
    "input_tokens": 35,
    "total_tokens": 6355
  }
}

The corresponding parameter settings are:

"response":{
	"json_parser":{
		"completion_result":"$.result.text"
	}
}

Creation examples for each type

TEXT_EMBEDDING (text embedding)

PUT _inference/text_embedding/<model_id>
{
  "service":"alibaba-cloud-custom-model",
  "service_settings":{
    "secret_parameters":{
      <secret_parameter_values>
    },
    "url":"<your_url>",
    "path":{
      "<your_path>":{
        "POST":{
          "query_string": "<query_string_values>",
          "headers":{
            <header_values>
          },
          "request":{
            "format":"string",
            "content":"<model_request_format>"
          },
          "response":{
            "json_parser":{
              "text_embeddings":"<path_to_parse_text_embeddings>"
            }
          }
        }
      }
    }
  },
  "task_settings":{
    "parameters":{
      <parameter_values>
    }
  }
}
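
Because request.content must be an escaped string, it can be convenient to build the request body programmatically and let a JSON library do the escaping. The following sketch does this for a text_embedding model. The request template and the parser path are copied from examples in this topic. The shape of `secret_parameters` (a single `api_key` entry referenced as `${api_key}` in a header) is an assumption, and the angle-bracket values remain placeholders.

```python
import json

# Request template for the model service. The ${...} placeholders are kept as-is.
content = '{"input":${input},"input_type":"${input_type}"}'

body = {
    "service": "alibaba-cloud-custom-model",
    "service_settings": {
        # Assumption: one secret named api_key, referenced in the header below.
        "secret_parameters": {"api_key": "<your_api_key>"},
        "url": "<your_url>",
        "path": {
            "<your_path>": {
                "POST": {
                    "headers": {"Authorization": "Bearer ${api_key}"},
                    "request": {"format": "string", "content": content},
                    "response": {
                        "json_parser": {
                            "text_embeddings": "$.result.embeddings[*].embedding"
                        }
                    },
                }
            }
        },
    },
    "task_settings": {"parameters": {"input_type": "document"}},
}

# json.dumps escapes the quotes inside `content`, producing the escaped
# string form that request.content expects.
serialized = json.dumps(body, indent=2)
print(serialized)
```

The printed JSON can then be used as the body of the PUT _inference/text_embedding/<model_id> request.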

SPARSE_EMBEDDING (sparse text embedding)

PUT _inference/sparse_embedding/<model_id>
{
  "service":"alibaba-cloud-custom-model",
  "service_settings":{
    "secret_parameters":{
      <secret_parameter_values>
    },
    "url":"<your_url>",
    "path":{
      "<your_path>":{
        "<method>":{
          "query_string": "<query_string_values>",
          "headers":{
            <header_values>
          },
          "request":{
            "format":"string",
            "content":"<model_request_format>"
          },
          "response":{
            "json_parser":{
              "sparse_result":{
                "path":"<path_to_parse_sparse_embedding>",
                "value":{
                  "sparse_token":"<path_to_parse_sparse_embedding_token>",
                  "sparse_weight":"<path_to_parse_sparse_embedding_weight>"
                }
              }
            }
          }
        }
      }
    }
  },
  "task_settings":{
    "parameters":{
      <parameter_values>
    }
  }
}

RERANK (sorting service)

PUT _inference/rerank/<model_id>
{
  "service":"alibaba-cloud-custom-model",
  "service_settings":{
    "secret_parameters":{
      <secret_parameter_values>
    },
    "url":"<your_url>",
    "path":{
      "<your_path>":{
        "<method>":{
          "query_string": "<query_string_values>",
          "headers":{
            <header_values>
          },
          "request":{
            "format":"string",
            "content":"<model_request_format>"
          },
          "response":{
            "json_parser":{
              "relevance_score":"<path_to_parse_rerank_relevance_score>",
              "reranked_index":"<path_to_parse_rerank_reranked_index>",
              "document_text":"<path_to_parse_rerank_document_text>"
            }
          }
        }
      }
    }
  }
}

COMPLETION (content generation service)

PUT _inference/completion/<model_id>
{
  "service":"alibaba-cloud-custom-model",
  "service_settings":{
    "secret_parameters":{
      <secret_parameter_values>
    },
    "url":"<your_url>",
    "path":{
      "<your_path>":{
        "<method>":{
          "query_string": "<query_string_values>",
          "headers":{
            <header_values>
          },
          "request":{
            "format":"string",
            "content":"<model_request_format>"
          },
          "response":{
            "json_parser":{
              "completion_result":"<path_to_parse_completion>"
            }
          }
        }
      }
    }
  },
  "task_settings":{
    "parameters":{
      <parameter_values>
    }
  }
}

CUSTOM (custom service)

For model types that ES does not currently support, or when you need to retrieve the full response, you can set the model type to custom. In this case, the response that ES returns contains the complete model response.

Models of the text_embedding, sparse_embedding, rerank, and completion types can also be defined as custom.

PUT _inference/custom/<model_id>
{
  "service":"alibaba-cloud-custom-model",
  "service_settings":{
    "secret_parameters":{
      <secret_parameter_values>
    },
    "url":"<your_url>",
    "path":{
      "<your_path>":{
        "<method>":{
          "query_string": "<query_string_values>",
          "headers":{
            <header_values>
          },
          "request":{
            "format":"string",
            "content":"<model_request_format>"
          }
        }
      }
    }
  },
  "task_settings":{
    "parameters":{
      <parameter_values>
    }
  }
}

Get an inference model

Syntax

GET /_inference/_all

GET /_inference/<inference_id>

GET /_inference/<task_type>/_all

GET /_inference/<task_type>/<inference_id>

Parameters

Parameter	Content
`<inference_id>`	The identifier of the custom inference endpoint.
`<task_type>`	The type of the inference interface. Valid values: `text_embedding`, `sparse_embedding`, `rerank`, `completion`, and `custom`.

Example

Sample request:

GET _inference/_all

Response:

{
  "endpoints": [
    {
      "inference_id": "os_deployment_emb",
      "task_type": "text_embedding",
      "service": "alibaba-cloud-custom-model",
      "service_settings": {
        "url": "http://xxxx.opensearch.aliyuncs.com",
        "path": {
          "/v3/openapi/deployments/xxx/predict": {
            "POST": {
              "headers": {
                "Authorization": "Bearer ${api_key}",
                "Token": "${Token}"
              },
              "request": {
                "format": "string",
                "content": """{"input":${input},"input_type":"${input_type}"}"""
              },
              "response": {
                "json_parser": {
                  "text_embeddings": "$.embeddings[*].embedding"
                }
              }
            }
          }
        },
        "rate_limit": {
          "requests_per_minute": 10000
        }
      },
      "task_settings": {
        "parameters": {
          "input_type": "document"
        }
      }
    }
  ]
}

Call an inference model

Syntax

POST /_inference/<inference_id>

POST /_inference/<task_type>/<inference_id>

Parameters

Path parameters

Parameter	Content
`<inference_id>`	The identifier of the custom inference endpoint.
`<task_type>`	The type of the inference interface. Valid values: `text_embedding`, `sparse_embedding`, `rerank`, `completion`, and `custom`.

Query parameters

Parameter	Content
`timeout`	Optional. `string`. The maximum time to wait for the request to complete. The default value is 30 s.

Request body parameters

Parameter	Content
`input`	Required. `string or array of strings`. The input text for calling the model.
`query`	Optional. `string`. Used only for the `rerank` interface. The input query content.
`task_settings`	Optional. `object`. The `task_settings` configuration for this call. It overrides the `task_settings` that were configured when the inference model was created.
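
The path, query, and body parameters above can be assembled into an HTTP request as in the following sketch, which uses only the Python standard library and does not send anything. The helper `build_inference_request`, the base URL `http://localhost:9200`, and the `60s` timeout format are assumptions. Authentication is omitted.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_inference_request(base_url, inference_id, input_text, task_type=None,
                            query=None, task_settings=None, timeout=None):
    """Build (but do not send) a POST request for an inference endpoint."""
    segments = ["_inference"] + ([task_type] if task_type else []) + [inference_id]
    url = base_url.rstrip("/") + "/" + "/".join(quote(s, safe="") for s in segments)
    if timeout:
        url += "?" + urlencode({"timeout": timeout})
    body = {"input": input_text}
    if query is not None:
        body["query"] = query  # only meaningful for rerank
    if task_settings is not None:
        body["task_settings"] = task_settings  # overrides the stored settings
    return Request(url, data=json.dumps(body).encode("utf-8"),
                   headers={"Content-Type": "application/json"}, method="POST")


req = build_inference_request(
    "http://localhost:9200", "os_deployment_custom_rerank", ["luke", "leia"],
    task_type="rerank", query="star wars main character", timeout="60s",
)
print(req.get_method(), req.full_url)
```

To send the request, pass `req` to `urllib.request.urlopen` after adding whatever credentials your cluster requires.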

Examples

text_embedding

Sample call:

POST _inference/text_embedding/os_deployment_emb
{
  "input":"hello world"
}

Response:

{
  "text_embedding": [
    {
      "embedding": [
        -0.026062012,
        0.01574707,
        -0.03842163,
        0.012580872,
        ...
      ]
    }
  ]
}

rerank

Sample call:

POST _inference/rerank/os_deployment_custom_rerank
{
  "input": ["luke", "like", "leia", "chewy","r2d2", "star", "wars"],
  "query": "star wars main character"
}

Response:

{
  "rerank": [
    {
      "index": 0,
      "relevance_score": 0.8502201
    },
    {
      "index": 1,
      "relevance_score": 0.062216982
    },
    {
      "index": 2,
      "relevance_score": 0.60352296
    },
    {
      "index": 3,
      "relevance_score": 0.35611072
    },
    {
      "index": 4,
      "relevance_score": 0.40951595
    },
    {
      "index": 5,
      "relevance_score": 0.16277891
    },
    {
      "index": 6,
      "relevance_score": 0.12918286
    }
  ]
}
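
The entries in this response follow the input order, so a client that wants the documents ranked has to sort by relevance_score and map each index back to its input text. A minimal sketch, with the hypothetical helper `rank_documents` and the data from the sample call above:

```python
def rank_documents(documents, response):
    """Return (document, score) pairs, most relevant first."""
    ranked = sorted(response["rerank"],
                    key=lambda item: item["relevance_score"], reverse=True)
    # Each index points back into the original input array.
    return [(documents[item["index"]], item["relevance_score"]) for item in ranked]


documents = ["luke", "like", "leia", "chewy", "r2d2", "star", "wars"]
response = {
    "rerank": [
        {"index": 0, "relevance_score": 0.8502201},
        {"index": 1, "relevance_score": 0.062216982},
        {"index": 2, "relevance_score": 0.60352296},
        {"index": 3, "relevance_score": 0.35611072},
        {"index": 4, "relevance_score": 0.40951595},
        {"index": 5, "relevance_score": 0.16277891},
        {"index": 6, "relevance_score": 0.12918286},
    ]
}

for document, score in rank_documents(documents, response):
    print(f"{score:.4f}  {document}")
```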

Delete an inference model

Syntax

DELETE /_inference/<inference_id>

DELETE /_inference/<task_type>/<inference_id>

Parameters

Parameter	Content
`<inference_id>`	The identifier of the custom inference endpoint. Example: `DELETE /_inference/<inference_id>`.
`<task_type>`	The type of the inference interface. Valid values: `text_embedding`, `sparse_embedding`, `rerank`, `completion`, and `custom`. Example: `DELETE /_inference/<task_type>/<inference_id>`.

Example

Sample request:

DELETE _inference/custom-rerank

Response:

{
  "acknowledged": true,
  "pipelines": []
}
