Manage AI inference models, create an AI inference model - Elasticsearch (ES)

To use an AI inference model in an Elasticsearch (ES) instance, you must first create the model within the service. The creation process differs for built-in and self-deployed models. This topic describes both procedures.

Billing

This is a pay-as-you-go service. You are billed based on the number of API calls. No fees are incurred if no calls are made.

Create a model

Built-in models

Log on to the Alibaba Cloud Elasticsearch console.
In the left navigation menu, choose Elasticsearch Clusters.
Navigate to the target cluster.
1. In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
2. On the Elasticsearch Clusters page, find the cluster and click its ID.
In the navigation pane on the left, choose AI Service Center > Model Management.

Click Initialize Model. The system automatically checks whether the AI Search Open Platform service is activated for your account.

If the service is activated, the model initialization is completed automatically.

If the service is not activated, you are redirected to the AI Search Open Platform activation page. After you activate the service, you are returned to the ES console, and the initialization is completed.

If the message "Models that are not created exist." appears after initialization, click Create Model next to any model. The system then automatically creates all AI model services.

Parameter	Description
Model service namespace	A unit for data isolation. When you activate AI Search Open Platform for the first time, a default namespace named default is automatically created. You can also create custom namespaces. For more information, see Create a namespace.
API key	The credential for identity authentication when you make calls. For more information about how to obtain an API key, see Manage API keys.
Endpoint	ES instances provide endpoints for AI models primarily over a Virtual Private Cloud (VPC). ES instances in the China (Hangzhou), China (Shenzhen), China (Beijing), China (Zhangjiakou), and China (Qingdao) regions can call AI Search Open Platform services across regions over a VPC. To call the services over the public network, see Obtain a public endpoint.
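The following is a minimal sketch of how these three parameters could map onto an inference endpoint definition. It assumes the request body shape used by the `alibabacloud-ai-search` inference service in open-source Elasticsearch (`api_key`, `host`, `workspace`, `service_id`); the console may create the endpoint differently, and `build_endpoint_body` and all placeholder values are hypothetical.

```python
import json


def build_endpoint_body(api_key: str, host: str, workspace: str, service_id: str) -> dict:
    """Build a body for PUT _inference/<task_type>/<inference_id>.

    Assumption: field names follow the open-source Elasticsearch
    `alibabacloud-ai-search` inference service, not a console-specific API.
    """
    return {
        "service": "alibabacloud-ai-search",
        "service_settings": {
            "api_key": api_key,        # "API key" row in the table above
            "host": host,              # "Endpoint" row in the table above
            "workspace": workspace,    # "Model service namespace" row
            "service_id": service_id,  # model identifier (placeholder here)
        },
    }


# Placeholders only; substitute the values shown in the console.
body = build_endpoint_body("<api_key>", "<endpoint>", "default", "<service_id>")
print(json.dumps(body, indent=2))
```

Keeping the mapping in one function makes it easy to see which console value ends up in which setting before any request is sent.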

Self-deployed models

Log on to the Alibaba Cloud Elasticsearch console.
In the left navigation menu, choose Elasticsearch Clusters.
Navigate to the target cluster.
1. In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
2. On the Elasticsearch Clusters page, find the cluster and click its ID.
In the navigation pane on the left, choose AI Service Center > Model Management.
In the AI Search Open Platform console, activate AI Search Open Platform, deploy the model service, and create an API key.

On the Model Management page, choose AI Search Open Platform Models > Self-deployed Models, and then click Create Model.
Review the parameter information required to call the AI model.

Note
Obtain a token: Obtain an internal-facing access token for the corresponding region.

For more information about how to call a self-deployed model from AI Search Open Platform in an ES instance, see Create a custom model inference service using the Inference API.

Call examples

Notes

Do not delete or disable the model service in AI Search Open Platform. Otherwise, the corresponding model in the ES console will become invalid.
Cross-border data compliance
- Data flow: When you call the service, your business data is transferred to the China (Shanghai) region for processing.
- Your responsibilities:
  Ensure that the data transfer complies with local laws and regulations.
  Implement data protection measures, provide privacy statements, and obtain necessary authorizations.
  Ensure that the data content is legal and compliant.
- Disclaimer: If you violate the preceding statements and warranties, causing any loss to Alibaba Cloud and/or its affiliates, you are liable for the corresponding compensation.

Data parsing models

Parse document content

POST _inference/doc_analyze/<inference_id>
{
  "input": ["http://opensearch-shanghai.oss-cn-shanghai.aliyuncs.com/chatos/rag/file-parser/samples/GB10767.pdf"], # Can be a URL, content, or task_id
  "task_settings": {
    "document": {
      "input_type": "url", # Optional. Can be url, content, or task_id. The default is url. If you specify task_id, the system queries the result of an asynchronous task.
      "file_name": "<file_name>", # Optional. Specify this if the file name cannot be inferred from the URL.
      "file_type": "<file_type>" # Optional. Specify this if the file type cannot be inferred from the file name.
    },
    "output": {
      "image_storage": "<image_storage>" # Optional. The default is base64.
    },
    "is_async": "<true or false>" # The default is false.
  }
}
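Outside the Kibana console, the same call has to be sent as strict JSON, without the `#` comments. The sketch below builds such a request with Python's standard library. `ES_URL`, the inference ID `my-doc-parser`, and the sample file URL are assumptions for illustration, and authentication for your cluster is left out.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: replace with your cluster endpoint


def doc_analyze_request(inference_id, file_url, file_name=None, file_type=None):
    """Build (but do not send) a POST _inference/doc_analyze/<inference_id> request."""
    document = {"input_type": "url"}
    # Optional hints, only needed when they cannot be inferred from the URL.
    if file_name:
        document["file_name"] = file_name
    if file_type:
        document["file_type"] = file_type
    body = {"input": [file_url], "task_settings": {"document": document}}
    return urllib.request.Request(
        url=f"{ES_URL}/_inference/doc_analyze/{inference_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = doc_analyze_request("my-doc-parser", "https://example.com/report.pdf", file_type="pdf")
# To send it: urllib.request.urlopen(req), after adding credentials for your cluster.
```

Optional fields are omitted rather than sent empty, so the service falls back to its documented defaults.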

Extract image content

POST _inference/img_analyze/<inference_id>
{
  "input": ["https://img.alicdn.com/imgextra/i1/O1CN01WksnF41hlhBFsXDNB_!!6000000004318-0-tps-1000-1400.jpg"], # Can be a URL, content, or task_id
  "task_settings": {
    "document": {
      "input_type": "url", # Optional. Can be url, content, or task_id. The default is url.
      "file_name": "<file_name>", # Optional. Specify this if the file name cannot be inferred from the URL.
      "file_type": "<file_type>" # Optional. Specify this if the file type cannot be inferred from the file name.
    },
    "is_async": "<true or false>" # The default is false.
  }
}
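Because `input_type` accepts `task_id` and `is_async` can be enabled, an asynchronous call presumably takes two requests: one to submit the image and one to query the result. The sketch below only builds the two request bodies. It assumes `is_async` is accepted as a JSON boolean, and it does not parse the submit response, because this topic does not describe how the task ID is returned. The function names are hypothetical.

```python
import json


def submit_body(image_url: str) -> dict:
    """Body that asks the service to process the image asynchronously."""
    return {
        "input": [image_url],
        "task_settings": {
            "document": {"input_type": "url"},
            "is_async": True,  # assumption: sent as a JSON boolean
        },
    }


def poll_body(task_id: str) -> dict:
    """Body that queries the result of a previously submitted task."""
    return {
        "input": [task_id],
        "task_settings": {"document": {"input_type": "task_id"}},
    }


print(json.dumps(submit_body("https://example.com/photo.jpg")))
print(json.dumps(poll_body("<task_id>")))
```

Both bodies go to the same `POST _inference/img_analyze/<inference_id>` path; only `input` and `input_type` change between the submit and the query.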

Segment a document

POST _inference/doc_split/<inference_id>
{
  "input":"<input>"
}

Embedding models

Text embedding

POST _inference/text_embedding/<inference_id>
{
  "input":[<input>]
}
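A common next step is to compare the returned vectors. The sketch below assumes the response follows the open-source Elasticsearch Inference API shape, `{"text_embedding": [{"embedding": [...]}]}`, which this topic does not confirm. The sample response is made up for illustration.

```python
import math


def extract_vectors(response: dict) -> list:
    """Pull the dense vectors out of an (assumed) text_embedding response."""
    return [item["embedding"] for item in response["text_embedding"]]


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up response for three inputs, using tiny 2-dimensional vectors.
sample = {
    "text_embedding": [
        {"embedding": [1.0, 0.0]},
        {"embedding": [0.0, 1.0]},
        {"embedding": [2.0, 0.0]},
    ]
}
vecs = extract_vectors(sample)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Vectors come back in the same order as the entries in `input`, so index `i` in `vecs` corresponds to the `i`-th input text under this assumption.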

Text sparse vector

POST _inference/sparse_embedding/<inference_id>
{
  "input":[<input>]
}

Reranking

Rerank

POST _inference/rerank/<inference_id>
{
  "input": [<input_list>],
  "query": "<query>"
}
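The service scores each entry of `input` against `query`. To put the original documents in relevance order, the scores have to be joined back to the input list. The sketch below assumes the open-source Elasticsearch response shape, `{"rerank": [{"index": ..., "relevance_score": ...}]}`, where `index` points into the submitted list. The documents and scores are made up.

```python
def order_by_relevance(documents, response):
    """Return (document, score) pairs, highest relevance first.

    Assumption: each result carries the position of the document in the
    original input list under "index".
    """
    ranked = sorted(response["rerank"], key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked]


docs = ["how to reset a password", "pricing overview", "password policy"]
sample = {
    "rerank": [
        {"index": 0, "relevance_score": 0.91},
        {"index": 1, "relevance_score": 0.05},
        {"index": 2, "relevance_score": 0.64},
    ]
}
ordered = order_by_relevance(docs, sample)
for text, score in ordered:
    print(f"{score:.2f}  {text}")
```

Sorting explicitly avoids relying on the service to return results already ordered.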

Large language models

Query analysis

POST _inference/query_analyze/<inference_id>
{
  "input":"<input>",
  "task_settings": {
    "history": [
      {
        "content": "<history.content>",
        "role": "<history.role>"
      },
      {
        "content": "<history.content>",
        "role": "<history.role>"
      }
    ]
  }
}
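When the conversation is tracked in application code, the `history` array can be generated instead of written by hand. The sketch below builds the request body from a list of `(role, content)` pairs. The role values `user` and `assistant` are assumptions, because this topic shows only the `<history.role>` placeholder.

```python
import json


def query_analyze_body(question: str, turns) -> dict:
    """Build the body for POST _inference/query_analyze/<inference_id>.

    `turns` is a list of (role, content) pairs, oldest first.
    """
    history = [{"content": content, "role": role} for role, content in turns]
    return {"input": question, "task_settings": {"history": history}}


turns = [
    ("user", "Which regions support cross-region calls?"),      # assumed role value
    ("assistant", "Hangzhou, Shenzhen, Beijing, and others."),  # assumed role value
]
body = query_analyze_body("What about public network access?", turns)
print(json.dumps(body, indent=2))
```

Passing the turns oldest first keeps the array in the same order as the conversation.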

Content generation

POST _inference/completion/<inference_id>
{
  "input":["<input>"]
}

For more examples of Inference API calls, see Call built-in model services of AI Search Open Platform.