Content generation service


This document specifies the API parameters for the OpenAI-compatible content generation service.

URL

{host}/compatible-mode/v1/chat/completions

host: The service endpoint. You can call the API over the public network or from a VPC. For more information, see Get service endpoints.


Request parameters

Each parameter below is listed with its type, whether it is required, a description, and an example value.

messages (List[Dict], required)

A list of the messages that make up the conversation so far. Each message has the following fields:

  • role: The role of the message author. Valid values are system, user, and assistant.

    • system: Optional. If present, it must be the first message in the list (messages[0]).

    • user and assistant: Represent the conversation between the user and the model. These roles should alternate to simulate a real conversation.

  • content: The content of the message. This field cannot be empty.

Example:

[
  {"role": "system", "content": "You are a robot assistant"},
  {"role": "user", "content": "What is the capital of Henan?"},
  {"role": "assistant", "content": "Zhengzhou"},
  {"role": "user", "content": "What are some fun places to visit there?"}
]

model (String, required)

The service ID of the model to use. For a list of supported service IDs, see List of supported services.

Example: ops-qwen-turbo

max_tokens (Int, optional)

The maximum number of tokens to generate in the chat completion. If generation stops because this limit is reached, finish_reason is length; if the model finishes its output normally, finish_reason is stop.

Example: 1024

temperature (Float, optional)

The sampling temperature. It reshapes the probability distribution over candidate tokens during generation and therefore controls the randomness and diversity of the response. The value must be in the range [0, 2). A value of 0 makes the output deterministic because the most likely token is always selected.

Higher values flatten the distribution, so the model is more likely to select lower-probability tokens and produce more diverse output. Lower values sharpen the distribution, so the model favors higher-probability tokens and produces more predictable output.

Example: 1

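top_p (Float, optional)

The probability threshold for nucleus sampling: the model samples only from the smallest set of candidate tokens whose cumulative probability reaches top_p. The value must be in the range (0, 1.0). A higher value increases randomness; a lower value makes the output more deterministic.

Example: 0.8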

presence_penalty (Float, optional)

Controls the model's tendency to repeat topics within the generated text. The value must be in the range [-2.0, 2.0]. The default is 0. A higher value reduces the likelihood of topic repetition.

Example: 0

frequency_penalty (Float, optional)

Reduces the likelihood of the model repeating the same lines of text. The value must be in the range [-2.0, 2.0]. The default is 0. Positive values penalize tokens based on how often they have already appeared in the text so far, which makes the model less likely to repeat the same phrases.

Example: 0

stop (String or List[String], optional)

One or more stop sequences. The model stops generating text just before it would output a stop sequence, and the stop sequence is not included in the returned content. Specify either a single string or a list of strings.

Example: null

stream (Boolean, optional)

Specifies whether to stream the output. If set to true, the output is returned incrementally as it is generated, and the client receives it as a generator that you must iterate over to collect each part of the sequence. The default is false.

Example: false
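The temperature and top_p descriptions above can be illustrated with a small simulation. This is a generic sketch of temperature scaling followed by nucleus (top-p) filtering over a toy set of next-token logits; it is not the service's actual decoding code, and the token names, logit values, and function names are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities, scaled by temperature.

    temperature == 0 is treated as greedy decoding: all probability
    mass goes to the highest-scoring token.
    """
    if temperature == 0:
        best = max(logits, key=logits.get)
        return {tok: (1.0 if tok == best else 0.0) for tok in logits}
    scaled = {tok: v / temperature for tok, v in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def nucleus_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize what is kept."""
    kept = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Apply temperature, then top_p filtering, then draw one token."""
    probs = nucleus_filter(softmax_with_temperature(logits, temperature), top_p)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Hypothetical logits for the next token.
logits = {"Zhengzhou": 3.0, "Luoyang": 2.0, "Kaifeng": 1.0, "Beijing": -1.0}

for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, {tok: round(p, 3) for tok, p in probs.items()})

print(nucleus_filter(softmax_with_temperature(logits, 1.0), 0.8))
```

Lowering the temperature raises the share of the most likely token, and with these logits a top_p of 0.8 at temperature 1 keeps only the two most likely candidates.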

Response parameters

Each parameter below is listed with its type, a description, and an example value.

id (String)

The system-generated ID for this request.

Example: 2244F3A8-4201-4F37-BF86-42013B1026D6

object (String)

The object type, which is always chat.completion.

Example: chat.completion

created (Long)

The Unix timestamp, in seconds, when the response was created.

Example: 1719313883

model (String)

The name of the model used for the request.

Example: ops-qwen-turbo

choices.index (Int)

The zero-based index of this result in the choices list.

Example: 0

choices.message (Map)

The message generated by the model.

Example:

{"role": "assistant", "content": "This is an example"}

choices.finish_reason (String)

The reason the model stopped generating. The same values are used for both non-streaming and streaming output:

  • stop: The model finished generating a complete output.

  • length: Generation stopped because the output reached the max_tokens limit. To generate longer content, increase the max_tokens value.

  • Values that start with content_filter: The output was filtered for safety reasons.

Example: stop

usage.completion_tokens (Int)

The number of tokens in the generated response.

Example: 150

usage.prompt_tokens (Int)

The number of tokens in the input prompt, that is, the messages sent in the request.

Example: 180

usage.total_tokens (Int)

The total number of tokens used, which is the sum of prompt_tokens and completion_tokens.

Example: 330
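A client typically branches on choices.finish_reason and may also want to sanity-check the usage totals. The sketch below works on an already decoded response dict shaped like the fields in the table above; the function names describe_completion and check_usage and the returned labels are hypothetical, and the sample values are the examples from the table.

```python
def describe_completion(response: dict) -> str:
    """Classify why generation ended, based on choices[0].finish_reason."""
    reason = response["choices"][0].get("finish_reason")
    if reason == "stop":
        return "complete"
    if reason == "length":
        # The output hit max_tokens; raise max_tokens to get longer content.
        return "truncated"
    if isinstance(reason, str) and reason.startswith("content_filter"):
        return "filtered"
    return "unknown"


def check_usage(response: dict) -> bool:
    """Check that total_tokens == prompt_tokens + completion_tokens."""
    usage = response["usage"]
    return usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]


example = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "This is an example"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"completion_tokens": 150, "prompt_tokens": 180, "total_tokens": 330},
}

print(describe_completion(example), check_usage(example))
```

Matching content_filter by prefix follows the table's wording that such values only start with content_filter; the exact suffixes are not listed here.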

cURL request example

curl http://your-endpoint-in-china-shanghai.opensearch.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
        "model": "ops-qwen-turbo",
        "messages": [
          {"role": "system", "content": "You are a robot assistant"},
          {"role": "user", "content": "Recommend 1 science fiction book"}
        ]
      }'
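The same request can be built from Python with only the standard library. This is a minimal sketch: HOST and API_KEY are the placeholders from the cURL example, and validate_messages, validate_options, build_request, and send are hypothetical helpers (not part of any SDK) that check the messages rules and value ranges stated in the request parameter table before the request is sent.

```python
import json
import urllib.request

# Placeholders from the cURL example; replace with your endpoint and API key.
HOST = "http://your-endpoint-in-china-shanghai.opensearch.aliyuncs.com"
API_KEY = "YOUR_API_KEY"


def validate_messages(messages):
    """Enforce the documented rules for the messages parameter."""
    if not messages:
        raise ValueError("messages must not be empty")
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"messages[{i}]: invalid role {role!r}")
        if role == "system" and i != 0:
            raise ValueError("a system message must be messages[0]")
        if not msg.get("content"):
            raise ValueError(f"messages[{i}]: content must not be empty")


def validate_options(options):
    """Check optional parameters against the documented value ranges."""
    t = options.get("temperature")
    if t is not None and not 0 <= t < 2:
        raise ValueError("temperature must be in [0, 2)")
    p = options.get("top_p")
    if p is not None and not 0 < p < 1.0:
        raise ValueError("top_p must be in (0, 1.0)")
    for name in ("presence_penalty", "frequency_penalty"):
        v = options.get(name)
        if v is not None and not -2.0 <= v <= 2.0:
            raise ValueError(f"{name} must be in [-2.0, 2.0]")


def build_request(host, api_key, model, messages, **options):
    """Build, but do not send, a POST request to the chat completions URL."""
    validate_messages(messages)
    validate_options(options)
    body = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        url=f"{host}/compatible-mode/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send a non-streaming request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_request(
    HOST,
    API_KEY,
    "ops-qwen-turbo",
    [
        {"role": "system", "content": "You are a robot assistant"},
        {"role": "user", "content": "Recommend 1 science fiction book"},
    ],
    max_tokens=1024,
)
# response = send(req)  # needs a reachable endpoint and a valid API key
```

User/assistant alternation is described as "should" rather than "must", so the sketch does not enforce it.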

Response example

{
  "id": "fb4b3860e051ecad0b019971******",
  "object": "chat.completion",
  "created": 1749804786,
  "model": "ops-qwen-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The 'Three-Body Problem' series by Liu Cixin. This is a story about..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 48,
    "total_tokens": 70
  }
}
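To consume a non-streaming response like the one above, a client can decode the JSON and pull out the assistant message and token usage. This sketch uses only the standard library; the dataclass names Usage and ChatResult and the function parse_chat_completion are hypothetical, and the JSON literal is the response example above.

```python
import json
from dataclasses import dataclass


@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


@dataclass
class ChatResult:
    id: str
    model: str
    created: int
    content: str
    finish_reason: str
    usage: Usage


def parse_chat_completion(raw: str) -> ChatResult:
    """Decode a non-streaming response and keep the commonly used fields."""
    data = json.loads(raw)
    if data.get("object") != "chat.completion":
        raise ValueError(f"unexpected object type: {data.get('object')!r}")
    first = data["choices"][0]  # read the first (index 0) result
    u = data["usage"]
    return ChatResult(
        id=data["id"],
        model=data["model"],
        created=data["created"],
        content=first["message"]["content"],
        finish_reason=first["finish_reason"],
        usage=Usage(
            prompt_tokens=u["prompt_tokens"],
            completion_tokens=u["completion_tokens"],
            total_tokens=u["total_tokens"],
        ),
    )


raw = '''{
  "id": "fb4b3860e051ecad0b019971******",
  "object": "chat.completion",
  "created": 1749804786,
  "model": "ops-qwen-turbo",
  "choices": [{"index": 0,
               "message": {"role": "assistant",
                           "content": "The 'Three-Body Problem' series by Liu Cixin. This is a story about..."},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 22, "completion_tokens": 48, "total_tokens": 70}
}'''

result = parse_chat_completion(raw)
print(result.content)
print(result.finish_reason, result.usage.total_tokens)
```

Reading the usage keys explicitly, rather than unpacking the whole usage object, keeps the sketch working if the service returns extra usage fields.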

Status codes

For details, see the OpenSearch Status codes.