Model inference


The Lindorm AI engine provides a RESTful model inference API operation. Use it to call a model that is in the READY state to perform tasks such as inference and generation.

Prerequisites

A model has been created or imported, and its state is READY. For more information about how to view the model state, see View model details.

API operation

POST v1/ai/models/${MODEL_NAME}/infer

Request parameters

| Parameter | Type | Description |
| --- | --- | --- |
| input | No fixed type | The input for model inference. Different models can have different input formats. For more information, see Model inference input (input). |
| params | JSON | The parameters for model inference. Different models can have different inference parameters. For more information, see Model inference parameters (params). |
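As a rough illustration of how the request body and path fit together, the following Python sketch builds (but does not send) a request using only the standard library. The endpoint `http://lindorm-ai.example.com` and the helper name `build_infer_request` are hypothetical, and any authentication your instance requires is not shown.

```python
import json
import urllib.request
from urllib.parse import quote


def build_infer_request(base_url, model_name, input_data, params=None):
    """Build a POST request for the infer operation without sending it.

    base_url is a placeholder for your instance endpoint (an assumption).
    """
    body = {"input": input_data}
    if params is not None:
        body["params"] = params
    path = "v1/ai/models/" + quote(model_name, safe="") + "/infer"
    url = base_url.rstrip("/") + "/" + path
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_infer_request(
    "http://lindorm-ai.example.com",  # hypothetical endpoint
    "bge_m3_model",
    ["I love Tiananmen", "test text"],
    params={"normalize": True},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send the request. Omitting `params` leaves the key out of the body so that the server-side defaults apply.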

Model inference input (input)

The input format varies based on the task, as described below:

  • FEATURE_EXTRACTION (Text Embedding): The input format is a string or an array of strings.

  • FEATURE_EXTRACTION (Multimodal Embedding): The following two input formats are supported.

    Array of JSON objects

    Each JSON object must contain the following parameters:

    | Parameter | Type | Description |
    | --- | --- | --- |
    | image | String | The Base64-encoded string or OSS URI of the input image for model inference. |
    | text | String | The input text string for model inference. |

    JSON object

    | Parameter | Type | Description |
    | --- | --- | --- |
    | images | Array of strings | The Base64-encoded strings or OSS URIs of the input images for model inference. |
    | texts | Array of strings | The input text strings for model inference. |

    You can use this input format to send requests for inference scenarios that involve only images or only plain text.

  • QUESTION_ANSWERING: The input format is a string.

  • SEMANTIC_SIMILARITY: The input format is a JSON object that contains the following parameters:

    | Parameter | Type | Description |
    | --- | --- | --- |
    | query | String | The query for which you want to compare similarity (rerank). |
    | chunks | Array of strings | A list of document chunks for which you want to compare similarity. |
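The input shapes listed above can be sketched as small Python helpers. The function names are hypothetical, the OSS URI is a made-up placeholder, and the assumption that the single-object multimodal form may omit `images` or `texts` is inferred from the note about image-only and text-only requests rather than confirmed.

```python
def text_embedding_input(texts):
    """FEATURE_EXTRACTION (text): a string or an array of strings."""
    return texts if isinstance(texts, str) else list(texts)


def multimodal_pairs_input(pairs):
    """Array-of-objects form: each object carries 'image' and 'text'."""
    return [{"image": image, "text": text} for image, text in pairs]


def multimodal_batch_input(images=None, texts=None):
    """Single-object form with 'images' and/or 'texts' arrays.

    Assumption: a key may be left out for image-only or text-only requests.
    """
    payload = {}
    if images:
        payload["images"] = list(images)
    if texts:
        payload["texts"] = list(texts)
    if not payload:
        raise ValueError("provide at least one of images or texts")
    return payload


def rerank_input(query, chunks):
    """SEMANTIC_SIMILARITY: a query plus the chunks to compare against."""
    return {"query": query, "chunks": list(chunks)}


# "oss://example-bucket/cat.jpg" is a hypothetical URI used for illustration.
pairs = multimodal_pairs_input([("oss://example-bucket/cat.jpg", "a cat")])
text_only = multimodal_batch_input(texts=["a cat", "a dog"])
rerank = rerank_input("a dog", ["a slide", "a small yellow dog"])
```

Each return value is what would be placed under the `input` key of the request body.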

Model inference parameters (params)

FEATURE_EXTRACTION

| Parameter | Type | Description |
| --- | --- | --- |
| normalize | BOOLEAN | Specifies whether to normalize the vector. The default value is true. |
| batchSize | INT | The batch size for inference. A smaller value reduces GPU memory consumption. The default value is 32. |

QUESTION_ANSWERING

| Parameter | Type | Description |
| --- | --- | --- |
| max_length | INT | Specifies the maximum length of the generated text. |
| num_beams | INT | Sets the number of beams for beam search. A larger num_beams value improves the quality of the generated text but increases the computing cost. |
| do_sample | BOOLEAN | Specifies whether to use sampling to generate text. |
| top_p | FLOAT | Specifies the probability threshold for nucleus sampling. The next token is selected only from the smallest set of most probable tokens whose cumulative probability reaches top_p. |
| temperature | FLOAT | Specifies the temperature for sampling. A higher temperature value makes the generated text more random and diverse. A lower temperature value makes the generated text more deterministic and focused. |
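To make the effect of temperature and top_p concrete, here is a small self-contained Python sketch of the general techniques. It illustrates how temperature scaling and nucleus filtering are commonly defined; it is not the engine's implementation, and the function names and toy probabilities are invented for the example.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs, top_p):
    """Keep the most probable tokens until their cumulative mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept


toy_probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}  # illustrative values
candidates = nucleus(toy_probs, top_p=0.7)
sharp = apply_temperature([2.0, 1.0, 0.0], temperature=0.5)
flat = apply_temperature([2.0, 1.0, 0.0], temperature=2.0)
```

With the toy distribution, only the tokens in `candidates` remain eligible for sampling, and `sharp` concentrates more probability on the top token than `flat` does.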

SEMANTIC_SIMILARITY

| Parameter | Type | Description |
| --- | --- | --- |
| topK | INT | The number of most similar data entries to return during reranking. The value must be in the range of [1, 10000]. The default value is the length of the input chunks array. |
| batchSize | INT | The batch size for inference. A smaller value reduces GPU memory consumption. The default value is 32. |
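A client may want to check parameter values before sending them. The following Python sketch builds `params` objects for the FEATURE_EXTRACTION and SEMANTIC_SIMILARITY tasks and enforces the documented topK range. The helper names are hypothetical, and the requirement that batchSize be positive is an assumption, not a documented constraint.

```python
def _batch_size(value):
    # Assumption: a batch size below 1 is not meaningful.
    if value < 1:
        raise ValueError("batchSize must be a positive integer")
    return int(value)


def feature_extraction_params(normalize=None, batch_size=None):
    """Include only the keys the caller sets so server defaults apply otherwise."""
    params = {}
    if normalize is not None:
        params["normalize"] = bool(normalize)
    if batch_size is not None:
        params["batchSize"] = _batch_size(batch_size)
    return params


def rerank_params(top_k=None, batch_size=None):
    """Validate topK against the documented range [1, 10000]."""
    params = {}
    if top_k is not None:
        if not 1 <= top_k <= 10000:
            raise ValueError("topK must be in the range [1, 10000]")
        params["topK"] = int(top_k)
    if batch_size is not None:
        params["batchSize"] = _batch_size(batch_size)
    return params


embedding_params = feature_extraction_params(normalize=False, batch_size=8)
top_five = rerank_params(top_k=5)
```

Leaving a key out relies on the documented defaults: normalize is true, batchSize is 32, and topK is the length of the chunks array.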

Response parameters

The response parameters in the data field vary based on the task. The following sections provide details.

FEATURE_EXTRACTION

No parameters. The output is a vector.

QUESTION_ANSWERING

| Parameter | Type | Description |
| --- | --- | --- |
| data.output | string | This parameter is mutually exclusive with the outputs parameter. When stream_mode is set to off during model creation, this parameter contains the response from the large language model (LLM). |
| data.outputs | string array | This parameter is mutually exclusive with the output parameter. When stream_mode is set to on during model creation, this parameter contains the streaming output from the LLM. The response appears as multiple string elements in the outputs array. |

SEMANTIC_SIMILARITY

| Parameter | Type | Description |
| --- | --- | --- |
| data | JSON array | An array of the reranking results. |
| data.chunk | string | The document chunk after it is reranked. |
| data.score | floating-point number | The similarity score of the chunk. |
| data.index | integer | The index of the chunk in the original input. |
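The task-specific `data` shapes described above could be handled on the client roughly as follows. This Python sketch uses hypothetical helper names and hand-written sample responses. It converts scores with `float()` so that a score serialized as either a number or a string is accepted, which is a defensive choice rather than documented behavior.

```python
def qa_text(response):
    """Return the generated text from a QUESTION_ANSWERING response.

    'output' is present when streaming is off; 'outputs' is a list of
    fragments when streaming is on. The two are mutually exclusive.
    """
    data = response["data"]
    if "output" in data:
        return data["output"]
    return "".join(data["outputs"])


def rerank_results(response):
    """Return (original_index, score, chunk) tuples in the returned order."""
    return [
        (item["index"], float(item["score"]), item["chunk"])
        for item in response["data"]
    ]


# Hand-written sample responses for illustration only.
streamed = {"code": 0, "data": {"outputs": ["", "Life is like ", "a journey"]}}
reranked = {
    "code": 0,
    "data": [
        {"chunk": "a small yellow dog", "score": "0.6785464882850647", "index": 1},
        {"chunk": "a slide", "score": "0.5992767214775085", "index": 0},
    ],
}

text = qa_text(streamed)
results = rerank_results(reranked)
```

Keeping the `index` value alongside each chunk lets the caller map a reranked entry back to its position in the original chunks array.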

Response encoding format

You can include the Content-Encoding request header in an HTTP request to receive a compressed response. By default, responses are not compressed. Only gzip compression is supported.
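On the client side, a gzip-compressed response body has to be decompressed before the JSON is parsed. The following Python sketch shows one way to do that with the standard library. The helper name `decode_body` is hypothetical, and the sample payload is compressed locally to stand in for a server response.

```python
import gzip
import json


def decode_body(raw, content_encoding=None):
    """Decode a response body, decompressing it first if it is gzip-encoded."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


# Simulate a compressed response body locally; no network call is made.
payload = {"code": 0, "msg": "SUCCESS", "success": True}
compressed = gzip.compress(json.dumps(payload).encode("utf-8"))

decoded = decode_body(compressed, content_encoding="gzip")
plain = decode_body(json.dumps(payload).encode("utf-8"))
```

The same helper handles uncompressed responses, which is the default behavior when no compression is requested.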

Gzip-compressed request:

POST <URI> HTTP/1.1
Content-Encoding: gzip
Content-Length: 88

"request contents here"

Response:

HTTP/1.1 200 OK
Content-encoding: gzip
Date: Tue, 24 Sep 2024 09:01:12 GMT
Content-type: application/json
Content-length: 39342

"response contents here"

Examples

Example 1: Feature extraction

Text feature extraction

  • Single string request:

    POST v1/ai/models/bge_m3_model/infer HTTP/1.1
    Content-Type: application/json
    
    {
        "input": "I love Tiananmen"
    }

    Response:

    HTTP/1.1 200 OK
    Date: Tue, 28 Nov 2023 03:18:55 GMT
    Content-type: application/json
    Content-length: 419
    
    {
      "code": 0,
      "msg": "SUCCESS",
      "data": [0.027204733341932297,0.004229982383549213, ...],
      "success": true,
      "request_id":"..."
    }
  • Batch string request:

    POST v1/ai/models/bge_m3_model/infer HTTP/1.1
    Content-Type: application/json
    
    {
        "input": ["I love Tiananmen", "test text"]
    }

    Response:

    HTTP/1.1 200 OK
    Date: Tue, 28 Nov 2023 03:18:55 GMT
    Content-type: application/json
    Content-length: 419
    
    {
      "code": 0,
      "msg": "SUCCESS",
      "data": [[0.027204733341932297,0.004229982383549213, ...], 
      [-0.05367295444011688,0.022600287571549416, ...]],
      "success": true
    }
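The vectors returned in `data` are typically compared on the client side. As a sketch, the following Python code computes the cosine similarity between the two vectors of a batch response. The two-dimensional vectors are invented stand-ins for real embeddings, and the helper name is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Illustrative response with toy vectors, not real model output.
response = {
    "code": 0,
    "msg": "SUCCESS",
    "data": [[0.6, 0.8], [0.8, 0.6]],
    "success": True,
}

first, second = response["data"]
similarity = cosine_similarity(first, second)
```

If the vectors are already unit length, which is an assumption about what the normalize parameter does, the dot product alone gives the same value.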

Example 2: Q&A

Request:

POST v1/ai/models/qa_model/infer HTTP/1.1
Content-Type: application/json

{
    "input": "Write an essay of about 200 words"
}

Response:

HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:18:55 GMT
Transfer-encoding: chunked
Content-type: application/json

{"code":0,"msg":"SUCCESS","success":true,"data":{"outputs":["","Life is like ","a journey, full of ","unknowns and challenges.","On this journey, ","we will meet various people and ","events. Some make us ","feel excited and curious",", while others make us ","feel frustrated and disappointed",". But no matter what ","situations we encounter, ","we must maintain a positive and optimistic attitude",", believe in ourselves, believe ","in the future, and keep moving forward.","\n\nOn the journey of life",", we will encounter ","various people and events. ","Some people make us feel excited ","and curious, while others ","make us feel frustrated ","and disappointed. But ","no matter what situations ","we encounter, we must maintain ","a positive and optimistic attitude, believe in ourselves",", believe in the future, ","and keep moving forward.","\n\nSometimes, we may ","feel lost and unsure of what to do",". At these times, we need to give ourselves ","some time and space to think ","about what we want and how to achieve ","our goals. At the same time, ","we also need to communicate with others",", listen to their opinions ","and ideas, and from ","them, gain inspiration and help.","\n\nOn the journey ","of life, we need to ","constantly learn and grow ","to better face the challenges ahead",". No matter what ","difficulties and challenges we face, ","we must maintain a positive and optimistic attitude",", believe in ourselves, believe ","in the future, and keep moving forward.","\n\nOn the journey of life",", we need to constantly ","learn and grow to better ","face the challenges ahead.","No matter what ","difficulties and challenges we face, we must ","maintain a positive and optimistic attitude,"]}}

Example 3: Semantic similarity

Request:

POST v1/ai/models/bge_rerank_model/infer HTTP/1.1
Content-Type: application/json

{
    "input": {"query": "a dog", "chunks": ["a slide", "a small yellow dog"]}
}

Response:

HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:18:55 GMT
Content-type: application/json
Content-length: 419

{
  "code" : 0,
  "msg" : "SUCCESS",
  "data" : [ {
    "chunk" : "a small yellow dog",
    "score" : "0.6785464882850647",
    "index" : 1
  }, {
    "chunk" : "a slide",
    "score" : "0.5992767214775085",
    "index" : 0
  }],
  "success" : true
}