The Lindorm AI engine provides a RESTful model inference API operation. You can use it to call a model that is in the READY state to perform tasks such as inference and generation.
Prerequisites
A model has been created or imported, and its state is READY. For more information about how to view the model state, see View model details.
API operation
POST v1/ai/models/${MODEL_NAME}/infer
Request parameters
Parameter | Type | Description |
input | No fixed type | The input for model inference. Different models can have different input formats. For more information, see Model inference input (input). |
params | JSON | The parameters for model inference. Different models can have different inference parameters. For more information, see Model inference parameters (params). |
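As a minimal sketch of how the path and the two request parameters fit together, the following Python snippet builds (but does not send) an inference request with the standard library only. The base URL `http://lindorm-ai.example.com` and the helper name `build_infer_request` are hypothetical placeholders, and any authentication headers your instance requires are not covered in this section and are omitted here.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; replace it with the address of your own instance.
BASE_URL = "http://lindorm-ai.example.com"


def build_infer_request(model_name, model_input, params=None, base_url=BASE_URL):
    """Build a POST request for v1/ai/models/${MODEL_NAME}/infer without sending it."""
    quoted_name = urllib.parse.quote(model_name, safe="")
    url = f"{base_url}/v1/ai/models/{quoted_name}/infer"
    payload = {"input": model_input}
    if params is not None:
        # The examples in this document omit params, so it is treated as optional.
        payload["params"] = params
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_infer_request(
    "bge_m3_model", ["I love Tiananmen", "test text"], {"normalize": True}
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen` would send the request. The sketch stops before that step so that it can be run without a live instance.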
Model inference input (input)
The input format varies based on the task, as described below:
FEATURE_EXTRACTION (Text Embedding): The input format is a string or an array of strings.
FEATURE_EXTRACTION (Multimodal Embedding): The following two input formats are supported.
Array of JSON objects
Each JSON object must contain the following parameters:
Parameter | Type | Description |
image | String | The Base64-encoded string or OSS URI of the input image for model inference. |
text | String | The input text string for model inference. |
JSON object
Parameter name | Type | Description |
images | Array of strings | The Base64-encoded strings or OSS URIs of the input images for model inference. |
texts | Array of strings | The input text strings for model inference. |
You can use this input format to send requests for inference scenarios that involve only images or only plain text.
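The two multimodal input formats can be sketched in Python as follows. The helper names (`encode_image`, `paired_input`, `columnar_input`) are hypothetical, the image bytes are a stand-in rather than a real image, and the assumption that an image-only or text-only request simply leaves out the unused key is not confirmed by this document.

```python
import base64


def encode_image(image_bytes):
    """Return the Base64 text form of raw image bytes."""
    return base64.b64encode(image_bytes).decode("ascii")


def paired_input(pairs):
    """Array-of-JSON-objects format: every item carries both an image and a text."""
    return [{"image": image, "text": text} for image, text in pairs]


def columnar_input(images=None, texts=None):
    """Single-JSON-object format: images only, texts only, or both."""
    if not images and not texts:
        raise ValueError("provide at least one of images or texts")
    body = {}
    if images:
        body["images"] = list(images)
    if texts:
        # Assumption: the unused key is omitted for single-modality requests.
        body["texts"] = list(texts)
    return body


fake_image = encode_image(b"placeholder image bytes")  # stand-in, not a real image
print(paired_input([(fake_image, "a small yellow dog")]))
print(columnar_input(texts=["a slide", "a small yellow dog"]))
```

Either return value can be used as the `input` field of the request body.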
QUESTION_ANSWERING: The input format is a string.
SEMANTIC_SIMILARITY: The input format is a JSON object that contains the following parameters:
Parameter | Type | Description |
query | String | The query for which you want to compare similarity (rerank). |
chunks | Array of strings | A list of document chunks for which you want to compare similarity. |
Model inference parameters (params)
FEATURE_EXTRACTION
Parameter | Type | Description |
normalize | BOOLEAN | Specifies whether to normalize the vector. The default value is |
batchSize | INT | The batch size for inference. A smaller value reduces GPU memory consumption. The default value is |
QUESTION_ANSWERING
Parameter | Type | Description |
max_length | INT | Specifies the maximum length of the generated text. |
num_beams | INT | The number of beams used in beam search. A larger num_beams value can improve the quality of the generated text but increases the computing cost. |
do_sample | BOOLEAN | Specifies whether to use sampling to generate text. |
top_p | FLOAT | The probability threshold for nucleus sampling. The next token is sampled only from the smallest set of most probable tokens whose cumulative probability reaches top_p. |
temperature | FLOAT | Specifies the temperature for sampling. A higher temperature value makes the generated text more random and diverse. A lower temperature value makes the generated text more deterministic and focused. |
SEMANTIC_SIMILARITY
Parameter name | Type | Description |
topK | INT | The number of most similar data entries to return during reranking. The value must be in the range of [1, 10000]. The default value is the length of the input chunks array. |
batchSize | INT | The batch size for inference. A smaller value reduces GPU memory consumption. The default value is |
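The documented topK rules (a range of [1, 10000] and a default equal to the length of the chunks array) can be checked on the client before a request is sent. The following is a minimal sketch; the helper name `resolve_top_k` is hypothetical, and how the service treats a topK larger than the number of chunks is not stated here, so the sketch does not handle that case specially.

```python
def resolve_top_k(chunks, top_k=None):
    """Return the topK value to send, following the documented default and range."""
    if top_k is None:
        # Documented default: the length of the input chunks array.
        return len(chunks)
    if not 1 <= top_k <= 10000:
        raise ValueError("topK must be in the range [1, 10000]")
    return top_k


chunks = ["a slide", "a small yellow dog"]
params = {"topK": resolve_top_k(chunks, top_k=1)}
print(params)
```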
Response parameters
The response parameters in the data field vary based on the task. The following sections provide details.
FEATURE_EXTRACTION
No parameters. The output is a vector.
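In the feature extraction examples in this document, a single-string request returns one flat vector in `data`, while a batch request returns a list of vectors. A client that wants a uniform shape could normalize both cases as sketched below. The helper name `embeddings_from_response` is hypothetical, and the assumption that a failed call sets `success` to false is inferred from the success responses shown here, not from a documented error format.

```python
def embeddings_from_response(response):
    """Return a list of vectors for both single-string and batch requests."""
    if not response.get("success"):
        # Assumption: failures are reported through the success, code, and msg fields.
        raise RuntimeError(
            f"inference failed: code={response.get('code')} msg={response.get('msg')}"
        )
    data = response["data"]
    # A single-string request yields one flat vector; a batch yields a list of vectors.
    if data and isinstance(data[0], (int, float)):
        return [data]
    return data


single = {"code": 0, "msg": "SUCCESS", "data": [0.1, 0.2], "success": True}
batch = {"code": 0, "msg": "SUCCESS", "data": [[0.1, 0.2], [0.3, 0.4]], "success": True}
print(len(embeddings_from_response(single)), len(embeddings_from_response(batch)))
```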
QUESTION_ANSWERING
Parameter name | Type | Description |
data.output | string | This parameter is mutually exclusive with the outputs parameter. When stream_mode is set to |
data.outputs | string array | This parameter is mutually exclusive with the output parameter. When stream_mode is set to |
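Because `output` and `outputs` are mutually exclusive, a client can accept either form and reduce it to a single string. The sketch below assumes that the fragments in `outputs` concatenate in order into the full text, which matches the Q&A example in this document. The helper name `extract_answer` is hypothetical.

```python
def extract_answer(response):
    """Return the generated text from either the output or the outputs field."""
    data = response["data"]
    if "outputs" in data:
        # Fragments are joined in order without a separator.
        return "".join(data["outputs"])
    return data["output"]


chunked = {"code": 0, "success": True, "data": {"outputs": ["", "Life is like ", "a journey"]}}
whole = {"code": 0, "success": True, "data": {"output": "Life is like a journey"}}
print(extract_answer(chunked))
print(extract_answer(whole))
```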
SEMANTIC_SIMILARITY
Parameter name | Type | Description |
data | JSON array | An array of the reranking results. |
data.chunk | string | The document chunk after it is reranked. |
data.score | floating-point number | The similarity score of the chunk. |
data.index | integer | The index of the chunk in the original input. |
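A reranking response can be consumed as sketched below. The semantic similarity example in this document returns `score` as a quoted string, so the sketch converts it with `float()`, which also accepts a plain number. The helper name `ranked_chunks` is hypothetical.

```python
def ranked_chunks(response):
    """Return (index, score, chunk) tuples sorted from most to least similar."""
    results = []
    for item in response["data"]:
        # float() accepts both "0.67" and 0.67, so either score encoding works.
        results.append((item["index"], float(item["score"]), item["chunk"]))
    return sorted(results, key=lambda r: r[1], reverse=True)


response = {
    "code": 0,
    "msg": "SUCCESS",
    "data": [
        {"chunk": "a small yellow dog", "score": "0.6785464882850647", "index": 1},
        {"chunk": "a slide", "score": "0.5992767214775085", "index": 0},
    ],
    "success": True,
}
for index, score, chunk in ranked_chunks(response):
    print(index, score, chunk)
```

The `index` value points back into the `chunks` array that was sent, which is useful when the chunks carry metadata that is kept on the client side.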
Response encoding format
You can include the Content-Encoding request header in an HTTP request to receive a compressed response. By default, responses are not compressed. Only gzip compression is supported.
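On the client side, gzip handling can be sketched with the Python standard library as follows. Under standard HTTP semantics, a `Content-Encoding: gzip` request header also declares that the request body itself is gzip-compressed, so the sketch compresses it; whether the service requires this is not stated in this document. The helper names `gzip_request` and `decode_response` are hypothetical, and the response is simulated locally instead of being fetched from a server.

```python
import gzip
import json


def gzip_request(payload):
    """Return (headers, body) for a gzip-compressed JSON request."""
    body = gzip.compress(json.dumps(payload).encode("utf-8"))
    headers = {
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",
        "Content-Length": str(len(body)),
    }
    return headers, body


def decode_response(headers, raw_body):
    """Decompress the body if the response is labelled gzip, then parse the JSON."""
    encoding = ""
    for name, value in headers.items():
        # Header names are case-insensitive in HTTP.
        if name.lower() == "content-encoding":
            encoding = value.strip().lower()
    if encoding == "gzip":
        raw_body = gzip.decompress(raw_body)
    return json.loads(raw_body.decode("utf-8"))


headers, body = gzip_request({"input": "I love Tiananmen"})
print(headers["Content-Encoding"], headers["Content-Length"])

# Simulated compressed response body, standing in for what a server would return.
fake_body = gzip.compress(b'{"code": 0, "msg": "SUCCESS", "data": [0.1, 0.2], "success": true}')
print(decode_response({"Content-encoding": "gzip"}, fake_body))
```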
Gzip-compressed request:
POST <URI> HTTP/1.1
Content-Encoding: gzip
Content-Length: 88
"request contents here"
Response:
HTTP/1.1 200 OK
Content-encoding: gzip
Date: Tue, 24 Sep 2024 09:01:12 GMT
Content-type: application/json
Content-length: 39342
"response contents here"
Examples
Example 1: Feature extraction
Text feature extraction
Single string request:
POST v1/ai/models/bge_m3_model/infer HTTP/1.1
Content-Type: application/json
{
"input": "I love Tiananmen"
}
Response:
HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:18:55 GMT
Content-type: application/json
Content-length: 419
{
"code": 0,
"msg": "SUCCESS",
"data": [0.027204733341932297,0.004229982383549213, ...],
"success": true,
"request_id":"..."
}
Batch string request:
POST v1/ai/models/bge_m3_model/infer HTTP/1.1
Content-Type: application/json
{
"input": ["I love Tiananmen", "test text"]
}
Response:
HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:18:55 GMT
Content-type: application/json
Content-length: 419
{
"code": 0,
"msg": "SUCCESS",
"data": [[0.027204733341932297,0.004229982383549213, ...], [-0.05367295444011688,0.022600287571549416, ...]],
"success": true
}
Example 2: Q&A
Request:
POST v1/ai/models/qa_model/infer HTTP/1.1
Content-Type: application/json
{
"input": "Write an essay of about 200 words"
}
Response:
HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:18:55 GMT
Transfer-encoding: chunked
Content-type: application/json
{"code":0,"msg":"SUCCESS","success":true,"data":{"outputs":["","Life is like ","a journey, full of ","unknowns and challenges.","On this journey, ","we will meet various people and ","events. Some make us ","feel excited and curious",", while others make us ","feel frustrated and disappointed",". But no matter what ","situations we encounter, ","we must maintain a positive and optimistic attitude",", believe in ourselves, believe ","in the future, and keep moving forward.","\n\nOn the journey of life",", we will encounter ","various people and events. ","Some people make us feel excited ","and curious, while others ","make us feel frustrated ","and disappointed. But ","no matter what situations ","we encounter, we must maintain ","a positive and optimistic attitude, believe in ourselves",", believe in the future, ","and keep moving forward.","\n\nSometimes, we may ","feel lost and unsure of what to do",". At these times, we need to give ourselves ","some time and space to think ","about what we want and how to achieve ","our goals. At the same time, ","we also need to communicate with others",", listen to their opinions ","and ideas, and from ","them, gain inspiration and help.","\n\nOn the journey ","of life, we need to ","constantly learn and grow ","to better face the challenges ahead",". No matter what ","difficulties and challenges we face, ","we must maintain a positive and optimistic attitude",", believe in ourselves, believe ","in the future, and keep moving forward.","\n\nOn the journey of life",", we need to constantly ","learn and grow to better ","face the challenges ahead.","No matter what ","difficulties and challenges we face, we must ","maintain a positive and optimistic attitude,"]}}
Example 3: Semantic similarity
Request:
POST v1/ai/models/bge_rerank_model/infer HTTP/1.1
Content-Type: application/json
{
"input": {"query": "a dog", "chunks": ["a slide", "a small yellow dog"]}
}
Response:
HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:18:55 GMT
Content-type: application/json
Content-length: 419
{
"code" : 0,
"msg" : "SUCCESS",
"data" : [ {
"chunk" : "a small yellow dog",
"score" : "0.6785464882850647",
"index" : 1
}, {
"chunk" : "a slide",
"score" : "0.5992767214775085",
"index" : 0
}],
"success" : true
}