Configuration template: PAI Model Gallery


Model Gallery provides a rich library of models for various artificial intelligence (AI) scenarios, such as large language models (LLMs), AI-generated content (AIGC), computer vision (CV), natural language processing (NLP), and speech. It supports one-click model training (including hyperparameter configuration), compression, evaluation, and deployment, so you can quickly validate your business requirements. This topic describes how to use a model deployed in Platform for AI (PAI) as a model inference service in Alibaba Cloud Elasticsearch (ES).

Prerequisites

Step 1: Deploy a model in PAI

  1. This topic uses an embedding model as an example. Go to the PAI Model Gallery page, choose Scenario > Natural Language Processing > Embedding, and select the bge-m3 general-purpose vector model. The steps for other models are similar.

    Note

    Deploy the service in the same region as your ES cluster and configure it to use the same virtual private cloud (VPC). This allows ES to access the deployed service over a private network, which provides lower latency and a more stable connection.

  2. On the bge-m3 general-purpose vector model page, click Deploy in the lower-right corner to open the model deployment page.

    Parameter descriptions:

    Deployment Method: The deployment method. Two methods are supported, vLLM Accelerated Deployment and FlagEmbedding. Each method has its own set of instance types.

    Basic Information: The service name. You can specify a custom name.

    Resource Deployment: Configure the deployment resources as needed.

    Virtual Private Cloud (VPC): The VPC of the service. It must be the same VPC as that of your ES cluster.

    Service Features: Configure the service features as needed.

    Service Configuration: Configure the service as needed.

  3. On the model deployment page, configure the parameters and click Deploy at the bottom of the page.

  4. In the navigation pane on the left, click Model Deployment > Elastic Algorithm Service (EAS).

    Note

    You can call the model API only after the service status changes to Running.

  5. Click the service name in the Name/ID column to open the Overview tab of the service.

  6. Click View Endpoint Information to view the URL and token for calling the service. In Step 2, the token is used as the API key.
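Before you connect ES to the service, you can optionally confirm that the URL and token work by calling the service directly. The following Python sketch reuses the headers and request body from the text_embedding template in Step 2. It assumes that the service accepts that body and Bearer-token authorization. The helper names `build_eas_embedding_request` and `call_eas` are hypothetical. The machine that runs the sketch must be able to reach the endpoint. For a VPC endpoint, this means a machine in the same VPC.

```python
import json
import urllib.request


def build_eas_embedding_request(endpoint_url: str, token: str, texts: list[str]) -> urllib.request.Request:
    """Build a POST request that mirrors the text_embedding template (assumed body and headers)."""
    body = {"input": texts, "embedding_type": "dense"}
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Same header format as the ES template: "Bearer ${api_key}".
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json;charset=utf-8",
        },
        method="POST",
    )


def call_eas(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (requires network access to the endpoint)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With your real URL and token, `call_eas(build_eas_embedding_request(url, token, ["hello"]))` should return the raw service response. Building the request separately from sending it makes the request easy to inspect before any network call.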

Step 2: Create a model inference service in Alibaba Cloud Elasticsearch for the PAI model

Note

You can run the following code in the Kibana console of your Alibaba Cloud ES instance to create the model inference service.

The following sections show how to create and call an inference service for each task type.

text_embedding type

Syntax template for model creation:

PUT _inference/text_embedding/pai_embedding
{
  "service":"alibaba-cloud-custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":"<Replace with your API key>"
    },
    "url":"<Replace with your service URL>",
    "path":{
      "<Replace with your service path>":{
        "POST":{
          "headers":{
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json;charset=utf-8"
          },
          "request":{
            "format":"string",
            "content":"""
            {
              "input":${input}, 
              "embedding_type":"dense"
            }
            """
          },
          "response":{
            "json_parser":{
              "text_embeddings":"$.data[*].embedding"
            }
          }
        }
      }
    }
  }
}
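ES fills in the `${...}` placeholders in the template at request time. `${api_key}` comes from `secret_parameters`, and `${input}` comes from the input of each call. The following Python sketch imitates this substitution with `string.Template`, which happens to use the same `${name}` syntax. It shows why `${input}` is left unquoted. The sketch assumes that ES inserts the input list as a JSON array. The exact substitution rules are an assumption here, not the documented ES implementation.

```python
import json
from string import Template

# Request body copied from the text_embedding template above.
EMBEDDING_BODY = Template("""
{
  "input":${input},
  "embedding_type":"dense"
}
""")


def render_embedding_body(texts: list[str]) -> dict:
    # Assumption: ${input} is replaced with the JSON array form of the inputs,
    # which is why the placeholder is not wrapped in quotes in the template.
    rendered = EMBEDDING_BODY.substitute(input=json.dumps(texts))
    # json.loads raises an error if the rendered body is not valid JSON.
    return json.loads(rendered)


print(render_embedding_body(["hello", "world"]))
```

If `${input}` were quoted in the template, the rendered array would become a string, or the body would be invalid JSON. This is the reason for the different quoting of `${input}` here and `${prompt}` in the completion template.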

Example:

PUT _inference/text_embedding/pai_embedding
{
  "service":"alibaba-cloud-custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":"xxx"
    },
    "url":"http://xxx.cn-hangzhou.pai-eas.aliyuncs.com",
    "path":{
      "/":{
        "POST":{
          "headers":{
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json;charset=utf-8"
          },
          "request":{
            "format":"string",
            "content":"""
            {
              "input":${input}, 
              "embedding_type":"dense"
            }
            """
          },
          "response":{
            "json_parser":{
              "text_embeddings":"$.data[*].embedding"
            }
          }
        }
      }
    }
  }
}

Call the model:

POST _inference/text_embedding/pai_embedding
{
  "input":["hello", "world"]
}
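Applications usually send this request over HTTP instead of through Kibana. The following Python sketch builds the equivalent `POST _inference/text_embedding/pai_embedding` request with basic authentication. The ES address, the `elastic` user, and the helper names are hypothetical placeholders. Replace them with the endpoint and credentials of your own ES instance.

```python
import base64
import json
import urllib.request


def build_inference_request(es_url: str, inference_id: str, texts: list[str],
                            user: str, password: str) -> urllib.request.Request:
    """Build the same call as the Kibana request: POST _inference/text_embedding/<inference_id>."""
    url = f"{es_url.rstrip('/')}/_inference/text_embedding/{inference_id}"
    creds = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps({"input": texts}).encode("utf-8"),
        headers={"Authorization": f"Basic {creds}", "Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(result: dict) -> list[list[float]]:
    """Pull the vectors out of the "text_embedding" array of the ES response."""
    return [item["embedding"] for item in result["text_embedding"]]


def embed(req: urllib.request.Request, timeout: float = 30.0) -> list[list[float]]:
    """Send the request (requires network access to ES) and return the vectors."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_embeddings(json.loads(resp.read()))
```

`parse_embeddings` assumes the response shape shown in this section, with one `embedding` entry per input.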

Response:

{
  "text_embedding": [
    {
      "embedding": [
        -0.016567165,
        -0.015161497,
        ...
      ]
    },
    {
      "embedding": [
        -0.023222955,
        0.031465773,
        ...
      ]
    }
  ]
}
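The `json_parser` expression `$.data[*].embedding` in the template produces the `text_embedding` array in this response. The expression collects the `embedding` field of every element in the `data` array of the service response. The following Python sketch evaluates only that `$.<array>[*].<field>` shape and is not a full JSONPath implementation. The raw service response in the sketch is hypothetical and assumes an OpenAI-style embeddings payload.

```python
def extract_wildcard_path(doc: dict, path: str) -> list:
    """Evaluate a JSONPath of the form $.<array>[*].<field>; no other shapes are supported."""
    prefix = "$."
    if not path.startswith(prefix) or "[*]." not in path:
        raise ValueError(f"unsupported path: {path}")
    array_key, field = path[len(prefix):].split("[*].", 1)
    # Collect the field from every element of the array, preserving order.
    return [item[field] for item in doc[array_key]]


# Hypothetical raw service response (assumed OpenAI-style embeddings payload).
raw_response = {"data": [{"embedding": [0.1, 0.2]}, {"embedding": [0.3, 0.4]}]}

vectors = extract_wildcard_path(raw_response, "$.data[*].embedding")
# Wrap each vector the way the ES response above presents it.
es_shaped = {"text_embedding": [{"embedding": v} for v in vectors]}
print(es_shaped)
```

The same helper also handles the `$.choices[*].text` expression that the completion template uses.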

completion type

Syntax template for model creation:

PUT _inference/completion/pai_deepseek
{
  "service":"alibaba-cloud-custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":"<Replace with your API key>"
    },
    "url":"<Replace with your service URL>",
    "path":{
      "<Replace with your service path>":{
        "POST":{
          "headers":{
            "Authorization": "Bearer ${api_key}"
          },
          "request":{
            "format":"string",
            "content":"""
            {
              "prompt":"${prompt}",
              "max_tokens":"${max_tokens}"
            }
            """
          },
          "response":{
            "json_parser":{
              "completion_result":"$.choices[*].text"
            }
          }
        }
      }
    }
  },
  "task_settings":{
    "parameters":{
      "max_tokens":"300"
    }
  }
}
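In this template, `${prompt}` is wrapped in quotes because the prompt is a single string. `${max_tokens}` is filled from `task_settings.parameters`, and the call below supplies `prompt` in the same `parameters` object. The following Python sketch renders the request body by merging the stored parameters with the per-request parameters. Two behaviors are assumptions made for this sketch, not documented ES behavior: request values override stored defaults, and values are JSON-escaped before substitution.

```python
import json
from string import Template

# Request body copied from the completion template above.
COMPLETION_BODY = Template("""
{
  "prompt":"${prompt}",
  "max_tokens":"${max_tokens}"
}
""")

# Defaults stored with the inference service (task_settings.parameters in the template).
STORED_PARAMETERS = {"max_tokens": "300"}


def render_completion_body(request_parameters: dict) -> dict:
    # Assumption: per-request parameters are merged over the stored defaults.
    params = {**STORED_PARAMETERS, **request_parameters}
    # Escape each value so that it stays valid inside the quoted placeholder;
    # json.dumps adds surrounding quotes, which [1:-1] strips off.
    escaped = {key: json.dumps(str(value))[1:-1] for key, value in params.items()}
    return json.loads(COMPLETION_BODY.substitute(escaped))


print(render_completion_body({"prompt": "what is elastic search"}))
```

Because the template quotes `${max_tokens}`, the rendered `max_tokens` value is the string `"300"`, not a number.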

Example:

PUT _inference/completion/pai_deepseek
{
  "service":"alibaba-cloud-custom-model",
  "service_settings":{
    "secret_parameters":{
      "api_key":"xxx"
    },
    "url":"http://xxx.cn-hangzhou.pai-eas.aliyuncs.com",
    "path":{
      "/api/predict/xxx/v1/completions":{
        "POST":{
          "headers":{
            "Authorization": "Bearer ${api_key}"
          },
          "request":{
            "format":"string",
            "content":"""
            {
              "prompt":"${prompt}",
              "max_tokens":"${max_tokens}"
            }
            """
          },
          "response":{
            "json_parser":{
              "completion_result":"$.choices[*].text"
            }
          }
        }
      }
    }
  },
  "task_settings":{
    "parameters":{
      "max_tokens":"300"
    }
  }
}

Call the model:

POST _inference/completion/pai_deepseek
{
  "input":"",
  "task_settings":{
    "parameters":{
      "prompt":"what is elastic search"
    }
  }
}

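Response (the result continues the prompt text and ends mid-sentence because generation stops at the max_tokens limit):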

{
  "completion": [
    {
      "result": """ and how is it used?
Elastic Search is a search engine that's built on top of Lucene, a search engine framework. It's designed to store, search, and analyze text documents efficiently. The key feature of Elastic Search is its ability to index and query structured data in real-time. It can build indices from existing data sources like databases or APIs and then query that data as if it were in-memory. Elastic Search is often used in applications that require fast search and analytics, such as search engines, customer relationship management systems, and big data platforms.

How to use it:

1. Storing Data:

Elastic Search primarily uses the NoSQL document model to store data. Each document is represented as a JSON object, which makes it easy to work with structured data.

2. Indexing:

Elastic Search provides several ways to index data. The most common method is through the REST API, where you can upload data in bulk or in real-time. It also supports indexing data from databases, APIs, or even cloud storage.

3. Querying:

Elastic Search supports various querying mechanisms. It has a simple query syntax that lets you search for specific fields within documents. It also supports aggregate queries, which allow you to perform aggregations like counts, sums, and averages over your data. Additionally, Elastic Search lets you match phrases, ranges, and more.

4. Updating and Deleting Data:

Elastic Search lets you update and delete documents from your index. This"""
    }
  ]
}