OpenAI-compatible chat completions

This document specifies the API parameters for the OpenAI-compatible content generation service.

URL

{host}/compatible-mode/v1/chat/completions

host: The service endpoint. You can call the API over the public network or from a VPC. For more information, see Get service endpoints.
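As a minimal sketch, the request URL can be assembled by joining the service endpoint with the path shown above. The helper name `build_chat_url` is hypothetical, and the host value is only the placeholder used in the cURL example on this page.

```python
# Path taken from the URL section of this page.
CHAT_COMPLETIONS_PATH = "/compatible-mode/v1/chat/completions"


def build_chat_url(host: str) -> str:
    """Join a service endpoint with the documented chat completions path.

    `build_chat_url` is a hypothetical helper, not part of any SDK.
    """
    # Drop trailing slashes so the joined URL has no '//' before the path.
    return host.rstrip("/") + CHAT_COMPLETIONS_PATH


# Placeholder endpoint; replace it with the endpoint of your own instance.
url = build_chat_url("http://your-endpoint-in-china-shanghai.opensearch.aliyuncs.com/")
print(url)
```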

Request parameters

Parameter	Type	Required	Description	Example
`messages`	List[Dict]	Yes	A list of messages that make up the conversation so far. Each message has two fields. `role`: The role of the message author. Valid values are `system`, `user`, and `assistant`. A `system` message is optional but, if present, must be the first message in the list (`messages[0]`). `user` and `assistant` messages represent the turns of the conversation between the user and the model and should alternate to simulate a real conversation. `content`: The content of the message. This field cannot be empty.	[ {"role": "system", "content": "You are a robot assistant"}, {"role": "user", "content": "What is the capital of Henan?"}, {"role": "assistant", "content": "Zhengzhou"}, {"role": "user", "content": "What are some fun places to visit there?"} ]
`model`	String	Yes	The service ID of the model to use. For a list of supported service IDs, see List of supported services.	ops-qwen-turbo
`max_tokens`	Int	No	The maximum number of tokens to generate for the chat completion. If generation stops because the output reaches this limit, the `finish_reason` is `length`. If the model finishes its output on its own, the `finish_reason` is `stop`.	1024
`temperature`	Float	No	Controls the probability distribution over candidate tokens during generation, which affects the randomness and diversity of the response. The value must be in the range `[0, 2)`. A value of 0 makes the output deterministic by always selecting the most likely token. Higher values flatten the distribution, making the model more likely to select lower-probability tokens and produce more diverse output. Lower values sharpen the distribution, making the model favor higher-probability tokens and produce more deterministic output.	1
`top_p`	Float	No	The probability threshold for nucleus sampling. The value must be in the range `(0, 1.0)`. A higher value increases randomness, while a lower value increases determinism.	0.8
`presence_penalty`	Float	No	Controls the model's tendency to repeat topics within the generated sequence. The value must be in the range `[-2.0, 2.0]`. The default is 0. A higher `presence_penalty` value reduces the likelihood of topic repetition.	0
`frequency_penalty`	Float	No	Reduces the likelihood of the model repeating the same lines of text. The value must be in the range `[-2.0, 2.0]`. The default is 0. Positive values penalize tokens based on their frequency in the text so far, reducing the model's likelihood of repeating the same phrases.	0
`stop`	String, List[String]	No	A stop sequence. The model stops generating text before outputting the specified string or token ID. The generated content does not include the stop sequence. This parameter can be a string or a list of strings.	null
`stream`	Boolean	No	Controls whether to use streaming output. If set to `true`, the API returns a generator that streams incremental results. You must iterate through the generator to receive each part of the sequence. The default is `false`.	false
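The constraints in the table above can be checked on the client before a request is sent. The following is a sketch only: `validate_request` is a hypothetical helper, it covers just the rules stated on this page, and the service may enforce additional rules or report errors differently.

```python
from typing import Any, Dict, List

VALID_ROLES = {"system", "user", "assistant"}


def validate_request(body: Dict[str, Any]) -> List[str]:
    """Return the problems found; an empty list means these checks passed.

    Hypothetical client-side helper based only on the parameter table.
    """
    errors: List[str] = []
    if not body.get("model"):
        errors.append("model is required")
    messages = body.get("messages") or []
    if not messages:
        errors.append("messages is required")
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            errors.append(f"messages[{i}]: invalid role {role!r}")
        # A system message is optional but must be messages[0] if present.
        if role == "system" and i != 0:
            errors.append(f"messages[{i}]: system message must be messages[0]")
        if not msg.get("content"):
            errors.append(f"messages[{i}]: content cannot be empty")
    # Documented ranges: temperature in [0, 2), top_p in (0, 1.0),
    # presence_penalty and frequency_penalty in [-2.0, 2.0].
    temperature = body.get("temperature")
    if temperature is not None and not (0 <= temperature < 2):
        errors.append("temperature must be in [0, 2)")
    top_p = body.get("top_p")
    if top_p is not None and not (0 < top_p < 1.0):
        errors.append("top_p must be in (0, 1.0)")
    for name in ("presence_penalty", "frequency_penalty"):
        value = body.get(name)
        if value is not None and not (-2.0 <= value <= 2.0):
            errors.append(f"{name} must be in [-2.0, 2.0]")
    return errors


ok_body = {
    "model": "ops-qwen-turbo",
    "messages": [
        {"role": "system", "content": "You are a robot assistant"},
        {"role": "user", "content": "What is the capital of Henan?"},
    ],
    "temperature": 1,
    "top_p": 0.8,
}
bad_body = {
    "model": "ops-qwen-turbo",
    "messages": [
        {"role": "user", "content": "Hi"},
        {"role": "system", "content": "You are a robot assistant"},
    ],
    "temperature": 2,  # outside [0, 2) because the upper bound is exclusive
}
print(validate_request(ok_body))
print(validate_request(bad_body))
```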

Response parameters

Parameter	Type	Description	Example
id	String	The system-generated ID for this request.	2244F3A8-4201-4F37-BF86-42013B1026D6
object	String	The object type, which is always `chat.completion`.	chat.completion
created	Long	The Unix timestamp, in seconds, when the response was created.	1719313883
model	String	The model name used for the request.	ops-qwen-turbo
choices.index	Int	The index of the result, which starts at `0`.	0
choices.message	Map	The message output by the model.	{ "role":"assistant", "content":"This is an example" }
choices.finish_reason	String	Possible values for both standard and streaming modes: `stop`: The model returned a complete output. `length`: Generation stopped because the output exceeded the `max_tokens` limit. To generate longer content, increase the `max_tokens` value. Values starting with `content_filter` indicate that the output was filtered for safety reasons.	stop
usage.completion_tokens	Int	The number of tokens in the generated response.	150
usage.prompt_tokens	Int	The number of tokens in the user's input prompt.	180
usage.total_tokens	Int	The total number of tokens used, which is the sum of `prompt_tokens` and `completion_tokens`.	330
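To illustrate how these fields fit together, the sketch below reads one choice from a parsed response and interprets `finish_reason` and `usage` as described in the table. `summarize_choice` is a hypothetical helper, and the sample dictionary is assembled from the example values in the table rather than from a real response.

```python
from typing import Any, Dict


def summarize_choice(response: Dict[str, Any], index: int = 0) -> Dict[str, Any]:
    """Extract the generated text and interpret finish_reason and usage.

    Hypothetical helper; field names follow the response parameter table.
    """
    choice = response["choices"][index]
    reason = choice.get("finish_reason") or ""
    usage = response.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "complete": reason == "stop",
        # "length" means the output hit max_tokens; raise it for longer output.
        "truncated": reason == "length",
        # Values starting with "content_filter" indicate safety filtering.
        "filtered": reason.startswith("content_filter"),
        # total_tokens is documented as prompt_tokens + completion_tokens.
        "usage_consistent": usage.get("total_tokens")
        == usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0),
    }


sample = {
    "id": "2244F3A8-4201-4F37-BF86-42013B1026D6",
    "object": "chat.completion",
    "created": 1719313883,
    "model": "ops-qwen-turbo",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "This is an example"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"completion_tokens": 150, "prompt_tokens": 180, "total_tokens": 330},
}
print(summarize_choice(sample))
```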

cURL request example

curl http://your-endpoint-in-china-shanghai.opensearch.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
        "model":"ops-qwen-turbo",
        "messages":[
            {"role": "system", "content": "You are a robot assistant"},
            {"role": "user", "content": "Recommend 1 science fiction book"}
         ]
  }'
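
The same request can be sketched in Python with only the standard library. `build_request` is a hypothetical helper, and the endpoint and API key are the placeholders from the cURL example. The network call is left commented out because it needs a real endpoint and key.

```python
import json
import urllib.request


def build_request(url: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build a POST request with the same headers and JSON body as the cURL call.

    Hypothetical helper; it only constructs the request and does not send it.
    """
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_request(
    "http://your-endpoint-in-china-shanghai.opensearch.aliyuncs.com/compatible-mode/v1/chat/completions",
    "YOUR_API_KEY",
    {
        "model": "ops-qwen-turbo",
        "messages": [
            {"role": "system", "content": "You are a robot assistant"},
            {"role": "user", "content": "Recommend 1 science fiction book"},
        ],
    },
)
print(req.get_method(), req.full_url)

# To send the request, substitute a real endpoint and API key, then:
# with urllib.request.urlopen(req) as resp:
#     result = json.loads(resp.read().decode("utf-8"))
#     print(result["choices"][0]["message"]["content"])
```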

Response example

{
  "id": "fb4b3860e051ecad0b019971******",
  "object": "chat.completion",
  "created": 1749804786,
  "model": "ops-qwen-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The 'Three-Body Problem' series by Liu Cixin. This is a story about..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 48,
    "total_tokens": 70
  }
}
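
The example above shows a non-streaming response. When `stream` is `true`, results arrive incrementally, but this page does not show the wire format. The sketch below therefore rests on an assumption: that the service follows the usual OpenAI streaming convention of `data:` lines, each carrying a JSON chunk whose text is in `choices[].delta.content`, ending with `data: [DONE]`. `iter_stream_text` is a hypothetical helper and the sample lines are made up for illustration, so verify the actual format against the service before relying on this.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from streamed lines.

    ASSUMPTION: OpenAI-style chunks ("data: {...}" lines, terminated by
    "data: [DONE]"). This format is not confirmed by this page.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything unrecognized
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Made-up sample lines in the assumed format, not real service output.
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": ", world"}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample_lines)))
```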

Status codes

For details, see Status codes in the OpenSearch documentation.