AI model inference

The Lindorm AI engine provides a RESTful model inference API operation that you can use to call models in the READY state for tasks such as inference and generation.

Prerequisites

A model has been created or imported, and its state is READY. For more information about how to view the model state, see View model details.

API operation

POST v1/ai/models/${MODEL_NAME}/infer

Request parameters

Request parameters	Type	Description
input	No fixed type	The input for model inference. Different models can have different input formats. For more information, see Model inference input (input).
params	JSON	The parameters for model inference. Different models can have different inference parameters. For more information, see Model inference parameters (params).
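As a minimal sketch of how such a request might be assembled, the following Python code builds (but does not send) the POST request using only the standard library. The base URL `http://lindorm.example.com` and the helper name `build_infer_request` are hypothetical placeholders, and any authentication headers that your instance requires are not shown.

```python
import json
import urllib.request

# Hypothetical endpoint; replace it with the AI engine address of your instance.
BASE_URL = "http://lindorm.example.com"


def build_infer_request(model_name, model_input, params=None):
    """Build a POST request for v1/ai/models/${MODEL_NAME}/infer without sending it."""
    body = {"input": model_input}
    if params is not None:
        body["params"] = params
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/ai/models/{model_name}/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_infer_request("bge_m3_model", "test text", {"normalize": True})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send the request. Keeping construction separate from sending makes the body easy to inspect first.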

Model inference input (input)

The input format varies based on the task, as described below:

FEATURE_EXTRACTION (Text Embedding): The input format is a string or an array of strings.
FEATURE_EXTRACTION (Multimodal Embedding): The following two input formats are supported.
Array of JSON objects: Each JSON object must contain the following parameters.
Parameter	Type	Description
image	String	The Base64-encoded string or OSS URI of the input image for model inference.
text	String	The input text string for model inference.
JSON object:
Parameter name	Type	Description
images	Array of strings	The Base64-encoded strings or OSS URIs of the input images for model inference.
texts	Array of strings	The input text strings for model inference.
You can use this input format to send requests for inference scenarios that involve only images or only plain text.
QUESTION_ANSWERING: The input format is a string.
SEMANTIC_SIMILARITY: The input format is a JSON object that contains the following parameters:
Parameter	Type	Description
query	String	The query for which you want to compare similarity (rerank).
chunks	Array of strings	A list of document chunks for which you want to compare similarity.
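
The input shapes listed above can be sketched in Python as follows. The sample values are made up for illustration, and the image bytes are a placeholder. Whether the service expects a bare Base64 string, as assumed here, or some prefixed form is not stated in this topic.

```python
import base64
import json

# Placeholder bytes standing in for the contents of a real image file.
image_bytes = b"placeholder image bytes"
image_b64 = base64.b64encode(image_bytes).decode("ascii")

inputs_by_task = {
    # Text embedding: a string or an array of strings.
    "text_embedding_single": "test text",
    "text_embedding_batch": ["test text", "another text"],
    # Multimodal embedding, format 1: array of objects with image and text.
    "multimodal_pairs": [{"image": image_b64, "text": "a small yellow dog"}],
    # Multimodal embedding, format 2: one object; only texts are sent here.
    "multimodal_texts_only": {"texts": ["a small yellow dog", "a slide"]},
    # Question answering: a string.
    "question_answering": "Write an essay of about 200 words",
    # Semantic similarity: a query plus the chunks to compare against it.
    "semantic_similarity": {
        "query": "a dog",
        "chunks": ["a slide", "a small yellow dog"],
    },
}

# Each value becomes the "input" field of the request body.
for name, model_input in inputs_by_task.items():
    print(name, json.dumps({"input": model_input}))
```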

Model inference parameters (params)

FEATURE_EXTRACTION

Parameter	Type	Description
normalize	BOOLEAN	Specifies whether to normalize the vector. The default value is `true`.
batchSize	INT	The batch size for inference. A smaller value reduces GPU memory consumption. The default value is `32`.

QUESTION_ANSWERING

Parameter	Type	Description
max_length	INT	Specifies the maximum length of the generated text.
num_beams	INT	The number of beams for beam search. A larger num_beams value can improve the quality of the generated text but increases the computing cost.
do_sample	BOOLEAN	Specifies whether to use sampling to generate text.
top_p	FLOAT	The probability threshold for nucleus sampling. The next token is sampled only from the smallest set of most likely tokens whose cumulative probability reaches top_p.
temperature	FLOAT	Specifies the temperature for sampling. A higher temperature value makes the generated text more random and diverse. A lower temperature value makes the generated text more deterministic and focused.
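
To make the effect of temperature and top_p more concrete, here is a small illustrative sketch in plain Python. It is not the Lindorm implementation. The function names and the logits are made up, and the code only demonstrates the general idea of temperature scaling followed by nucleus selection.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over the logits after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs, top_p):
    """Indices of the most likely tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
sharp = apply_temperature(logits, 0.5)  # lower temperature: more peaked
flat = apply_temperature(logits, 2.0)   # higher temperature: more uniform
print(nucleus(sharp, 0.9), nucleus(flat, 0.9))
```

With these made-up numbers, the lower temperature concentrates probability on fewer tokens, so fewer candidates survive the same top_p threshold.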

SEMANTIC_SIMILARITY

Parameter name	Type	Description
topK	INT	The number of most similar data entries to return during reranking. The value must be in the range of [1, 10000]. The default value is the length of the input chunks array.
batchSize	INT	The batch size for inference. A smaller value reduces GPU memory consumption. The default value is `32`.
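
A reranking request body with these parameters might be assembled as in the following sketch. The helper `build_rerank_body` is hypothetical. The client-side range check simply mirrors the documented [1, 10000] limit for topK and is not something the API requires you to do.

```python
import json


def build_rerank_body(query, chunks, top_k=None, batch_size=None):
    """Assemble a SEMANTIC_SIMILARITY request body; raises if topK is out of range."""
    body = {"input": {"query": query, "chunks": list(chunks)}}
    params = {}
    if top_k is not None:
        if not 1 <= top_k <= 10000:
            raise ValueError("topK must be in the range [1, 10000]")
        params["topK"] = top_k
    if batch_size is not None:
        params["batchSize"] = batch_size
    if params:
        # Omit "params" entirely so that the server-side defaults apply.
        body["params"] = params
    return body


body = build_rerank_body("a dog", ["a slide", "a small yellow dog"], top_k=1)
print(json.dumps(body))
```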

Response parameter descriptions

The response parameters in the data field vary based on the task. The following sections provide details.

FEATURE_EXTRACTION

No parameters. The data field contains the output vector, or an array of vectors when the input is an array of strings.

QUESTION_ANSWERING

Parameter name	Type	Description
data.output	string	This parameter is mutually exclusive with the outputs parameter. When stream_mode is set to `off` during model creation, this parameter contains the response from the Large Language Model (LLM).
data.outputs	string array	This parameter is mutually exclusive with the output parameter. When stream_mode is set to `on` during model creation, this parameter contains the streaming output from the LLM. The response appears as multiple string elements in the outputs array.
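
Because only one of the two fields is present, client code may need to handle both. The following is a minimal sketch, assuming the response body has already been parsed into a Python dict and that the streamed fragments concatenate in order into the full text. The helper name `extract_answer` and the sample bodies are illustrative.

```python
def extract_answer(response):
    """Return the generated text from a QUESTION_ANSWERING response body (a dict)."""
    data = response["data"]
    if "outputs" in data:
        # Streaming mode: the text arrives as fragments, joined here in order.
        return "".join(data["outputs"])
    return data["output"]


streamed = {"code": 0, "msg": "SUCCESS", "success": True,
            "data": {"outputs": ["", "Life is like ", "a journey"]}}
single = {"code": 0, "msg": "SUCCESS", "success": True,
          "data": {"output": "Life is like a journey"}}
print(extract_answer(streamed))
print(extract_answer(single))
```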

SEMANTIC_SIMILARITY

Parameter name	Type	Description
data	JSON array	An array of the reranking results.
data.chunk	string	The document chunk after it is reranked.
data.score	floating-point number	The similarity score of the chunk.
data.index	integer	The index of the chunk in the original input.
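
As a sketch of how these fields might be consumed, the following code maps each result back to the original input through data.index. The helper name `top_chunks` is hypothetical. The score is passed through `float()` so that the code works whether the score is serialized as a number or as a string.

```python
def top_chunks(response, original_chunks):
    """Return (original_chunk, score) pairs in the order the service ranked them."""
    ranked = []
    for item in response["data"]:
        # float() accepts both numeric scores and scores serialized as strings.
        ranked.append((original_chunks[item["index"]], float(item["score"])))
    return ranked


chunks = ["a slide", "a small yellow dog"]
response = {"code": 0, "msg": "SUCCESS", "success": True, "data": [
    {"chunk": "a small yellow dog", "score": "0.6785464882850647", "index": 1},
    {"chunk": "a slide", "score": "0.5992767214775085", "index": 0},
]}
for chunk, score in top_chunks(response, chunks):
    print(f"{score:.4f}  {chunk}")
```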

Response encoding format

You can include the Content-Encoding request header in an HTTP request to receive a compressed response. By default, responses are not compressed. Only gzip compression is supported.
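
On the client side, a gzip-compressed body has to be decompressed before it is parsed. The following is a minimal sketch using the Python standard library. The helper name `decode_body` is hypothetical, and the compressed bytes are simulated locally instead of being fetched from the service.

```python
import gzip
import json


def decode_body(raw_body, content_encoding=None):
    """Decode a response body, decompressing it first if it is gzip-encoded."""
    if content_encoding and content_encoding.lower() == "gzip":
        raw_body = gzip.decompress(raw_body)
    return json.loads(raw_body.decode("utf-8"))


# Simulated compressed response; a real body would come from the HTTP client.
payload = {"code": 0, "msg": "SUCCESS", "success": True, "data": [0.1, 0.2]}
compressed = gzip.compress(json.dumps(payload).encode("utf-8"))
print(decode_body(compressed, "gzip") == payload)
```

Some HTTP client libraries decompress gzip responses automatically, in which case this step is unnecessary.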

Gzip-compressed request:

POST <URI> HTTP/1.1
Content-Encoding: gzip
Content-Length: 88

"request contents here"

Response:

HTTP/1.1 200 OK
Content-encoding: gzip
Date: Tue, 24 Sep 2024 09:01:12 GMT
Content-type: application/json
Content-length: 39342

"response contents here"

Examples

Example 1: Feature extraction

Text feature extraction

Single string request:

POST v1/ai/models/bge_m3_model/infer HTTP/1.1
Content-Type: application/json

{
    "input": "I love Tiananmen"
}

Response:

HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:18:55 GMT
Content-type: application/json
Content-length: 419

{
  "code": 0,
  "msg": "SUCCESS",
  "data": [0.027204733341932297,0.004229982383549213, ...],
  "success": true,
  "request_id":"..."
}

Batch string request:

POST v1/ai/models/bge_m3_model/infer HTTP/1.1
Content-Type: application/json

{
    "input": ["I love Tiananmen", "test text"]
}

Response:

HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:18:55 GMT
Content-type: application/json
Content-length: 419

{
  "code": 0,
  "msg": "SUCCESS",
  "data": [[0.027204733341932297,0.004229982383549213, ...], 
  [-0.05367295444011688,0.022600287571549416, ...]],
  "success": true
}

Example 2: Q&A

Request:

POST v1/ai/models/qa_model/infer HTTP/1.1
Content-Type: application/json

{
    "input": "Write an essay of about 200 words"
}

Response:

HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:18:55 GMT
Transfer-encoding: chunked
Content-type: application/json

{"code":0,"msg":"SUCCESS","success":true,"data":{"outputs":["","Life is like ","a journey, full of ","unknowns and challenges.","On this journey, ","we will meet various people and ","events. Some make us ","feel excited and curious",", while others make us ","feel frustrated and disappointed",". But no matter what ","situations we encounter, ","we must maintain a positive and optimistic attitude",", believe in ourselves, believe ","in the future, and keep moving forward.","\n\nOn the journey of life",", we will encounter ","various people and events. ","Some people make us feel excited ","and curious, while others ","make us feel frustrated ","and disappointed. But ","no matter what situations ","we encounter, we must maintain ","a positive and optimistic attitude, believe in ourselves",", believe in the future, ","and keep moving forward.","\n\nSometimes, we may ","feel lost and unsure of what to do",". At these times, we need to give ourselves ","some time and space to think ","about what we want and how to achieve ","our goals. At the same time, ","we also need to communicate with others",", listen to their opinions ","and ideas, and from ","them, gain inspiration and help.","\n\nOn the journey ","of life, we need to ","constantly learn and grow ","to better face the challenges ahead",". No matter what ","difficulties and challenges we face, ","we must maintain a positive and optimistic attitude",", believe in ourselves, believe ","in the future, and keep moving forward.","\n\nOn the journey of life",", we need to constantly ","learn and grow to better ","face the challenges ahead.","No matter what ","difficulties and challenges we face, we must ","maintain a positive and optimistic attitude,"]}}

Example 3: Semantic similarity

Request:

POST v1/ai/models/bge_rerank_model/infer HTTP/1.1
Content-Type: application/json

{
    "input": {"query": "a dog", "chunks": ["a slide", "a small yellow dog"]}
}

Response:

HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:18:55 GMT
Content-type: application/json
Content-length: 419

{
  "code" : 0,
  "msg" : "SUCCESS",
  "data" : [ {
    "chunk" : "a small yellow dog",
    "score" : "0.6785464882850647",
    "index" : 1
  }, {
    "chunk" : "a slide",
    "score" : "0.5992767214775085",
    "index" : 0
  }],
  "success" : true
}
