This document specifies the API parameters for the OpenAI-compatible content generation service.
URL
{host}/compatible-mode/v1/chat/completions
host: The service endpoint. You can call the API over the public network or from a VPC. For more information, see Get service endpoints.
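As a minimal sketch of how the URL is assembled, the helper below joins a service endpoint with the fixed path shown above. The function name `chat_completions_url` is hypothetical, and the host is the placeholder from the cURL example in this document, not a real endpoint.

```python
def chat_completions_url(host: str) -> str:
    """Join a service endpoint with the fixed chat-completions path."""
    # Drop any trailing slash on the host so the path does not contain "//".
    return host.rstrip("/") + "/compatible-mode/v1/chat/completions"


# Placeholder host copied from the cURL example; replace it with your own endpoint.
url = chat_completions_url(
    "http://your-endpoint-in-china-shanghai.opensearch.aliyuncs.com/"
)
print(url)
```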

Request parameters
Parameter | Type | Required | Description | Example |
messages | List[Dict] | Yes | A list of messages that make up the conversation so far. | [ {"role": "system", "content": "You are a robot assistant"}, {"role": "user", "content": "What is the capital of Henan?"}, {"role": "assistant", "content": "Zhengzhou"}, {"role": "user", "content": "What are some fun places to visit there?"} ] |
model | String | Yes | The service ID of the model to use. For a list of supported service IDs, see | ops-qwen-turbo |
| Int | No | The maximum number of tokens to generate for the chat completion. If the output reaches this limit, the | 1024 |
| Float | No | Controls the probability distribution over candidate tokens during generation, which affects the randomness and diversity of the response. The value must be in the range Higher values flatten the distribution, making the model more likely to select lower-probability tokens and produce more diverse output. Lower values sharpen the distribution, making the model favor higher-probability tokens and produce more deterministic output. | 1 |
| Float | No | The probability threshold for nucleus sampling. The value must be in the range | 0.8 |
| Float | No | Controls the model's tendency to repeat topics within the generated sequence. The value must be in the range A higher | 0 |
| Float | No | Reduces the likelihood of the model repeating the same lines of text. The value must be in the range Positive values penalize tokens based on their frequency in the text so far, reducing the model's likelihood of repeating the same phrases. | 0 |
| String, List[String] | No | A stop sequence. The model stops generating text before outputting the specified string or token ID. The generated content does not include the stop sequence. This parameter can be a string or a list of strings. | null |
| Boolean | No | Controls whether to use streaming output. If set to | false |
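The following sketch shows one way a request body could be assembled from the parameters above. Only `model` and `messages` appear by name in this document's cURL example; the other key names (`max_tokens`, `temperature`, `top_p`, `stop`, `stream`) are assumed to follow the OpenAI chat-completions convention, and the helper `build_request_body` is hypothetical. Value ranges are not checked here.

```python
import json
from typing import Any, Dict, List, Optional, Union


def build_request_body(
    model: str,
    messages: List[Dict[str, str]],
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    stop: Optional[Union[str, List[str]]] = None,
    stream: bool = False,
) -> Dict[str, Any]:
    """Build a request body, leaving out optional parameters that were not set."""
    # Local sanity check only (an assumption): the required list should not be empty.
    if not messages:
        raise ValueError("messages must contain at least one entry")
    body: Dict[str, Any] = {"model": model, "messages": messages, "stream": stream}
    # Optional key names are assumed, not confirmed by this document.
    optional = {
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stop": stop,
    }
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


body = build_request_body(
    model="ops-qwen-turbo",
    messages=[
        {"role": "system", "content": "You are a robot assistant"},
        {"role": "user", "content": "What is the capital of Henan?"},
    ],
    max_tokens=1024,
    temperature=1,
    top_p=0.8,
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Omitting unset optional keys lets the service apply its own defaults instead of receiving explicit nulls.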
Response parameters
Parameter | Type | Description | Example |
id | String | The system-generated ID for this request. | 2244F3A8-4201-4F37-BF86-42013B1026D6 |
object | String | The object type, which is always chat.completion. | chat.completion |
created | Long | The Unix timestamp, in seconds, when the response was created. | 1719313883 |
model | String | The model name used for the request. | ops-qwen-turbo |
choices.index | Int | The index of the result, which starts at 0. | 0 |
choices.message | Map | The message output by the model. | { "role":"assistant", "content":"This is an example" } |
choices.finish_reason | String | Possible values for both standard and streaming modes: | stop |
usage.completion_tokens | Int | The number of tokens in the generated response. | 150 |
usage.prompt_tokens | Int | The number of tokens in the user's input prompt. | 180 |
usage.total_tokens | Int | The total number of tokens used, which is the sum of usage.prompt_tokens and usage.completion_tokens. | 330 |
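To illustrate how the response fields fit together, the sketch below parses a sample response built from the example values in the table above and pulls out the reply text, the finish reason, and the token usage. The helper `summarize_response` is hypothetical, and the code assumes the first entry in `choices` is the one of interest.

```python
import json
from typing import Any, Dict

# Sample response assembled from the example column of the table above.
SAMPLE = """
{
  "id": "2244F3A8-4201-4F37-BF86-42013B1026D6",
  "object": "chat.completion",
  "created": 1719313883,
  "model": "ops-qwen-turbo",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "This is an example"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"completion_tokens": 150, "prompt_tokens": 180, "total_tokens": 330}
}
"""


def summarize_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the reply text, finish reason, and token totals from a response."""
    first_choice = response["choices"][0]
    usage = response["usage"]
    return {
        "reply": first_choice["message"]["content"],
        "finish_reason": first_choice["finish_reason"],
        "total_tokens": usage["total_tokens"],
        # True when total_tokens equals prompt_tokens plus completion_tokens.
        "usage_consistent": usage["total_tokens"]
        == usage["prompt_tokens"] + usage["completion_tokens"],
    }


summary = summarize_response(json.loads(SAMPLE))
print(summary["reply"])
```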
cURL request example
curl http://your-endpoint-in-china-shanghai.opensearch.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
  "model": "ops-qwen-turbo",
  "messages": [
    {"role": "system", "content": "You are a robot assistant"},
    {"role": "user", "content": "Recommend 1 science fiction book"}
  ]
}'
Response example
{
  "id": "fb4b3860e051ecad0b019971******",
  "object": "chat.completion",
  "created": 1749804786,
  "model": "ops-qwen-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The 'Three-Body Problem' series by Liu Cixin. This is a story about..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 48,
    "total_tokens": 70
  }
}
Status codes
For details, see the OpenSearch Status codes.
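A caller would typically check the HTTP status before parsing the body. The sketch below mirrors the cURL request example using only the Python standard library and returns the status code alongside the body. The helpers `build_request` and `send`, the 30-second timeout, and the choice to return error bodies as raw text are all assumptions; the URL and API key are the placeholders from the cURL example.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict, Tuple

# Placeholders copied from the cURL example; replace them with real values.
URL = (
    "http://your-endpoint-in-china-shanghai.opensearch.aliyuncs.com"
    "/compatible-mode/v1/chat/completions"
)
API_KEY = "YOUR_API_KEY"


def build_request(url: str, api_key: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Create a POST request carrying the same headers as the cURL example."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> Tuple[int, Any]:
    """Send the request and return (status code, parsed JSON or raw error text)."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as error:
        # Non-2xx responses arrive here; the error body format is not assumed.
        return error.code, error.read().decode("utf-8", errors="replace")


request = build_request(
    URL,
    API_KEY,
    {
        "model": "ops-qwen-turbo",
        "messages": [
            {"role": "system", "content": "You are a robot assistant"},
            {"role": "user", "content": "Recommend 1 science fiction book"},
        ],
    },
)
# Calling send(request) performs the network call; it is not invoked here.
```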