Anthropic-compatible Messages

This topic covers the request and response parameters of the Anthropic-compatible Messages API, with code examples.

To migrate an existing Anthropic application to Model Studio, change these three settings:

api_key: Replace with the Model Studio API key.
base_url: Replace with a Model Studio endpoint listed below.
model: Replace with a supported model name, such as qwen3.7-plus.

Important

Alibaba Cloud Model Studio has released workspace-specific domains for the China (Beijing) and Singapore regions. The new dedicated domains offer better performance and higher stability for inference requests. We recommend migrating to the new domains:

China (Beijing): from https://dashscope.aliyuncs.com to https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com
Singapore: from https://dashscope-intl.aliyuncs.com to https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com

{WorkspaceId} is your workspace ID. You can find it on the Workspace Details page in the Alibaba Cloud Model Studio console. The existing domains remain fully functional.

China (Beijing)

SDK base_url: https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic

HTTP request URL: POST https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic/v1/messages

Singapore

SDK base_url: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/apps/anthropic

HTTP request URL: POST https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/apps/anthropic/v1/messages

US (Virginia)

SDK base_url: https://dashscope-us.aliyuncs.com/apps/anthropic

HTTP request URL: POST https://dashscope-us.aliyuncs.com/apps/anthropic/v1/messages

Germany (Frankfurt)

SDK base_url: https://{WorkspaceId}.eu-central-1.maas.aliyuncs.com/apps/anthropic

HTTP request URL: POST https://{WorkspaceId}.eu-central-1.maas.aliyuncs.com/apps/anthropic/v1/messages

Japan (Tokyo)

SDK base_url: https://{WorkspaceId}.ap-northeast-1.maas.aliyuncs.com/apps/anthropic

HTTP request URL: POST https://{WorkspaceId}.ap-northeast-1.maas.aliyuncs.com/apps/anthropic/v1/messages

Replace {WorkspaceId} with your actual workspace ID.
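
The endpoint pattern above can be captured in a small helper. The following is a minimal sketch, not part of any SDK: `build_base_url`, `messages_url`, and the `us-virginia` label are hypothetical names chosen for illustration, and the region IDs are copied from the endpoint list above.

```python
from typing import Optional

# Region IDs copied from the endpoint list above.
WORKSPACE_REGIONS = {"cn-beijing", "ap-southeast-1", "eu-central-1", "ap-northeast-1"}

# US (Virginia) is listed with a fixed host instead of a workspace domain.
US_HOST = "dashscope-us.aliyuncs.com"


def build_base_url(region: str, workspace_id: Optional[str] = None) -> str:
    """Return the SDK base_url for a region (hypothetical helper)."""
    if region == "us-virginia":  # hypothetical label for US (Virginia)
        return f"https://{US_HOST}/apps/anthropic"
    if region not in WORKSPACE_REGIONS:
        raise ValueError(f"unknown region: {region}")
    if not workspace_id:
        raise ValueError("workspace_id is required for this region")
    return f"https://{workspace_id}.{region}.maas.aliyuncs.com/apps/anthropic"


def messages_url(base_url: str) -> str:
    """Return the raw HTTP request URL for the Messages API."""
    return base_url.rstrip("/") + "/v1/messages"


base_url = build_base_url("cn-beijing", "my-workspace")
print(base_url)
print(messages_url(base_url))
```

Pass the first value as `base_url` to the SDK; use the second only when you send raw HTTP requests.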

Authentication: Pass your Model Studio API key in either the x-api-key header or the Authorization: Bearer header.
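
For raw HTTP calls, the two header options can be sketched as follows. This is an illustrative example using only the Python standard library; `auth_headers` and `build_request` are hypothetical helper names, and the request object is only constructed here, not sent.

```python
import json
import urllib.request


def auth_headers(api_key: str, use_bearer: bool = False) -> dict:
    """Build request headers using one of the two supported auth styles."""
    headers = {"Content-Type": "application/json"}
    if use_bearer:
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        headers["x-api-key"] = api_key
    return headers


def build_request(url: str, api_key: str, payload: dict, use_bearer: bool = False):
    """Create (but do not send) a POST request for the Messages API."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=auth_headers(api_key, use_bearer),
        method="POST",
    )


payload = {
    "model": "qwen3.7-plus",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Who are you?"}],
}
request = build_request(
    "https://dashscope-us.aliyuncs.com/apps/anthropic/v1/messages",
    "sk-placeholder",  # placeholder; read the real key from DASHSCOPE_API_KEY
    payload,
    use_bearer=True,
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen(request)`.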

Request Body

Basic Call

Python

```python
import anthropic
import os

client = anthropic.Anthropic(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic",
)

message = client.messages.create(
    model="qwen3.7-plus",
    max_tokens=1024,
    system="You are a helpful assistant",
    messages=[
        {"role": "user", "content": "Who are you?"}
    ],
    thinking={"type": "disabled"},
)
print(message.content[0].text)
```

TypeScript

```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic",
});

async function main() {
  const message = await anthropic.messages.create({
    model: "qwen3.7-plus",
    max_tokens: 1024,
    system: "You are a helpful assistant",
    messages: [{ role: "user", content: "Who are you?" }],
    thinking: { type: "disabled" },
  });
  console.log(message.content[0].text);
}

main().catch(console.error);
```

curl

```bash
curl -X POST "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $DASHSCOPE_API_KEY" \
  -d '{
    "model": "qwen3.7-plus",
    "max_tokens": 1024,
    "system": "You are a helpful assistant",
    "messages": [
      {"role": "user", "content": "Who are you?"}
    ],
    "thinking": {"type": "disabled"}
  }'
```

Streaming

Python

```python
import anthropic
import os

client = anthropic.Anthropic(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic",
)

stream = client.messages.create(
    model="qwen3.7-plus",
    max_tokens=1024,
    stream=True,
    messages=[
        {"role": "user", "content": "Give a brief introduction to artificial intelligence."}
    ],
    thinking={"type": "disabled"},
)

# Print text deltas as they arrive
for chunk in stream:
    if chunk.type == "content_block_delta":
        if hasattr(chunk.delta, 'text'):
            print(chunk.delta.text, end="", flush=True)
```

TypeScript

```typescript
import Anthropic from "@anthropic-ai/sdk";

async function main() {
  const anthropic = new Anthropic({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic",
  });

  const stream = await anthropic.messages.create({
    model: "qwen3.7-plus",
    max_tokens: 1024,
    stream: true,
    messages: [{ role: "user", content: "Give a brief introduction to artificial intelligence." }],
    thinking: { type: "disabled" },
  });

  for await (const chunk of stream) {
    if (chunk.type === "content_block_delta" && 'text' in chunk.delta) {
      process.stdout.write(chunk.delta.text);
    }
  }
}

main().catch(console.error);
```

curl

```bash
curl -X POST "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $DASHSCOPE_API_KEY" \
  --no-buffer \
  -d '{
    "model": "qwen3.7-plus",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Give a brief introduction to artificial intelligence."}
    ],
    "thinking": {"type": "disabled"}
  }'
```

Extended Thinking

Python

```python
import anthropic
import os

client = anthropic.Anthropic(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic",
)

stream = client.messages.create(
    model="qwen3.7-plus",
    max_tokens=2048,
    stream=True,
    thinking={
        "type": "enabled",
        "budget_tokens": 1024
    },
    messages=[
        {"role": "user", "content": "Analyze the future prospects of quantum computing."}
    ]
)

# Print thinking deltas first, then text deltas, as they arrive
for chunk in stream:
    if chunk.type == "content_block_delta":
        if hasattr(chunk.delta, 'thinking'):
            print(chunk.delta.thinking, end="", flush=True)
        elif hasattr(chunk.delta, 'text'):
            print(chunk.delta.text, end="", flush=True)
```

TypeScript

```typescript
import Anthropic from "@anthropic-ai/sdk";

async function main() {
  const anthropic = new Anthropic({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic",
  });

  const stream = await anthropic.messages.create({
    model: "qwen3.7-plus",
    max_tokens: 2048,
    stream: true,
    thinking: { type: "enabled", budget_tokens: 1024 },
    messages: [{ role: "user", content: "Analyze the future prospects of quantum computing." }]
  });

  for await (const chunk of stream) {
    if (chunk.type === "content_block_delta") {
      if ('thinking' in chunk.delta) {
        process.stdout.write(chunk.delta.thinking);
      } else if ('text' in chunk.delta) {
        process.stdout.write(chunk.delta.text);
      }
    }
  }
}

main().catch(console.error);
```

curl

```bash
curl -X POST "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $DASHSCOPE_API_KEY" \
  -d '{
    "model": "qwen3.7-plus",
    "max_tokens": 2048,
    "stream": true,
    "thinking": {
      "type": "enabled",
      "budget_tokens": 1024
    },
    "messages": [
      {"role": "user", "content": "Analyze the future prospects of quantum computing."}
    ]
  }'
```

Image Understanding

Python

```python
import anthropic
import os

client = anthropic.Anthropic(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic",
)

stream = client.messages.create(
    model="qwen3.7-plus",
    max_tokens=1024,
    stream=True,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250414/mqqmiy/animal_01.jpg",
                    },
                },
                {"type": "text", "text": "Describe the content of this image."},
            ],
        }
    ],
    thinking={"type": "disabled"},
)

for chunk in stream:
    if chunk.type == "content_block_delta":
        if hasattr(chunk.delta, 'text'):
            print(chunk.delta.text, end="", flush=True)
```

TypeScript

```typescript
import Anthropic from "@anthropic-ai/sdk";

async function main() {
  const anthropic = new Anthropic({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic",
  });

  const stream = await anthropic.messages.create({
    model: "qwen3.7-plus",
    max_tokens: 1024,
    stream: true,
    messages: [{
      role: "user",
      content: [
        {
          type: "image",
          source: {
            type: "url",
            url: "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250414/mqqmiy/animal_01.jpg",
          },
        },
        { type: "text", text: "Describe the content of this image." },
      ],
    }],
    thinking: { type: "disabled" },
  });

  for await (const chunk of stream) {
    if (chunk.type === "content_block_delta" && 'text' in chunk.delta) {
      process.stdout.write(chunk.delta.text);
    }
  }
}

main().catch(console.error);
```

curl

```bash
curl -X POST "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $DASHSCOPE_API_KEY" \
  -d '{
    "model": "qwen3.7-plus",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image",
            "source": {
              "type": "url",
              "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250414/mqqmiy/animal_01.jpg"
            }
          },
          {"type": "text", "text": "Describe the content of this image."}
        ]
      }
    ],
    "thinking": {"type": "disabled"}
  }'
```

Video Understanding

Python

```python
import anthropic
import os

client = anthropic.Anthropic(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic",
)

stream = client.messages.create(
    model="qwen3.7-plus",
    max_tokens=1024,
    stream=True,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "source": {
                        "type": "url",
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251208/zpupby/3e81ef38-98f0-4d55-bbb6-259334ca18d0.mp4",
                    },
                },
                {"type": "text", "text": "Describe the content of this video."},
            ],
        }
    ],
    thinking={"type": "disabled"},
)

for chunk in stream:
    if chunk.type == "content_block_delta":
        if hasattr(chunk.delta, 'text'):
            print(chunk.delta.text, end="", flush=True)
```

TypeScript

```typescript
import Anthropic from "@anthropic-ai/sdk";

async function main() {
  const anthropic = new Anthropic({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic",
  });

  const stream = await anthropic.messages.create({
    model: "qwen3.7-plus",
    max_tokens: 1024,
    stream: true,
    messages: [{
      role: "user",
      content: [
        {
          type: "video",
          source: {
            type: "url",
            url: "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251208/zpupby/3e81ef38-98f0-4d55-bbb6-259334ca18d0.mp4",
          },
        },
        { type: "text", text: "Describe the content of this video." },
      ],
    }],
    thinking: { type: "disabled" },
  });

  for await (const chunk of stream) {
    if (chunk.type === "content_block_delta" && 'text' in chunk.delta) {
      process.stdout.write(chunk.delta.text);
    }
  }
}

main().catch(console.error);
```

curl

```bash
curl -X POST "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $DASHSCOPE_API_KEY" \
  -d '{
    "model": "qwen3.7-plus",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "video",
            "source": {
              "type": "url",
              "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251208/zpupby/3e81ef38-98f0-4d55-bbb6-259334ca18d0.mp4"
            }
          },
          {"type": "text", "text": "Describe the content of this video."}
        ]
      }
    ],
    "thinking": {"type": "disabled"}
  }'
```

Function calling

Python

```python
import anthropic
import os

client = anthropic.Anthropic(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic",
)

tools = [
    {
        "name": "get_weather",
        "description": "Get weather information for a specified city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }
]

message = client.messages.create(
    model="qwen3.7-plus",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather like in Hangzhou today?"}
    ]
)
print(message.content)
```

TypeScript

```typescript
import Anthropic from "@anthropic-ai/sdk";

async function main() {
  const anthropic = new Anthropic({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic",
  });

  const message = await anthropic.messages.create({
    model: "qwen3.7-plus",
    max_tokens: 1024,
    tools: [
      {
        name: "get_weather",
        description: "Get weather information for a specified city",
        input_schema: {
          type: "object",
          properties: {
            city: { type: "string", description: "City name" }
          },
          required: ["city"],
        },
      },
    ],
    messages: [{ role: "user", content: "What's the weather like in Hangzhou today?" }],
  });

  console.log(JSON.stringify(message.content, null, 2));
}

main().catch(console.error);
```

curl

```bash
curl -X POST "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $DASHSCOPE_API_KEY" \
  -d '{
    "model": "qwen3.7-plus",
    "max_tokens": 1024,
    "tools": [
      {
        "name": "get_weather",
        "description": "Get weather information for a specified city",
        "input_schema": {
          "type": "object",
          "properties": {
            "city": {"type": "string", "description": "City name"}
          },
          "required": ["city"]
        }
      }
    ],
    "messages": [
      {"role": "user", "content": "What's the weather like in Hangzhou today?"}
    ]
  }'
```

Prompt Caching

Python

```python
import anthropic
import os

client = anthropic.Anthropic(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic",
)

# Simulate code repository content. Must reach minimum cacheable length (1024 tokens)
long_text_content = "<Your Code Here>" * 400


def get_completion(user_input):
    response = client.messages.create(
        # Choose a model that supports prompt caching
        model="qwen3.7-plus",
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": long_text_content,
                # Add cache_control on a text block to mark a cache breakpoint.
                # Can also be placed on content blocks in the messages array
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[
            {"role": "user", "content": user_input},
        ],
    )
    return response


# First request: Create cache
first = get_completion("What does this code do?")
print(f"Cache creation tokens: {first.usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {first.usage.cache_read_input_tokens}")

print("=" * 20)

# Second request: Same long content, different question -> Cache hit
second = get_completion("How can this code be optimized?")
print(f"Cache creation tokens: {second.usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {second.usage.cache_read_input_tokens}")
```

TypeScript

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic",
});

// Simulate code repository content. Must reach minimum cacheable length (1024 tokens)
const longTextContent = "<Your Code Here>".repeat(400);

async function getCompletion(userInput: string) {
  return client.messages.create({
    // Choose a model that supports prompt caching
    model: "qwen3.7-plus",
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: longTextContent,
        // Add cache_control on a text block to mark a cache breakpoint.
        // Can also be placed on content blocks in the messages array
        cache_control: { type: "ephemeral" },
      },
    ],
    messages: [{ role: "user", content: userInput }],
  });
}

// First request: Create cache
const first = await getCompletion("What does this code do?");
console.log(`Cache creation tokens: ${first.usage.cache_creation_input_tokens}`);
console.log(`Cache read tokens: ${first.usage.cache_read_input_tokens}`);

console.log("=".repeat(20));

// Second request: Same long content, different question -> Cache hit
const second = await getCompletion("How can this code be optimized?");
console.log(`Cache creation tokens: ${second.usage.cache_creation_input_tokens}`);
console.log(`Cache read tokens: ${second.usage.cache_read_input_tokens}`);
```

curl

```bash
curl -X POST "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $DASHSCOPE_API_KEY" \
  -d '{
    "model": "qwen3.7-plus",
    "max_tokens": 1024,
    "system": [
      {
        "type": "text",
        "text": "<Place cacheable content here with at least 1024 tokens>",
        "cache_control": {"type": "ephemeral"}
      }
    ],
    "messages": [
      {"role": "user", "content": "What does this code do?"}
    ]
  }'
```

Structured Outputs

Python

```python
import anthropic
import os

client = anthropic.Anthropic(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic",
)

message = client.messages.create(
    model="deepseek-v4-pro",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Extract key info from this email: John Smith (john@example.com) is interested in the Enterprise plan and wants to schedule a demo for next Tuesday at 2pm."
        }
    ],
    extra_body={
        "output_config": {
            "format": {
                "type": "json_schema",
                "schema": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "email": {"type": "string"},
                        "plan_interest": {"type": "string"},
                        "demo_requested": {"type": "boolean"}
                    },
                    "required": ["name", "email", "plan_interest", "demo_requested"],
                    "additionalProperties": False
                }
            }
        }
    },
)

# deepseek-v4-pro returns a thinking block; find the text content block
text_block = next(block for block in message.content if block.type == "text")
print(text_block.text)
```

TypeScript

```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic",
});

async function main() {
  // output_config is a Model Studio platform extension parameter, pass via body
  const message = await (anthropic.messages.create as Function)({
    model: "deepseek-v4-pro",
    max_tokens: 1024,
    messages: [{
      role: "user",
      content: "Extract key info from this email: John Smith (john@example.com) is interested in the Enterprise plan and wants to schedule a demo for next Tuesday at 2pm."
    }],
    output_config: {
      format: {
        type: "json_schema",
        schema: {
          type: "object",
          properties: {
            name: { type: "string" },
            email: { type: "string" },
            plan_interest: { type: "string" },
            demo_requested: { type: "boolean" }
          },
          required: ["name", "email", "plan_interest", "demo_requested"],
          additionalProperties: false
        }
      }
    }
  });

  // deepseek-v4-pro returns a thinking block; find the text content block
  const textBlock = message.content.find(
    (block: { type: string }) => block.type === "text"
  );
  console.log(textBlock?.text);
}

main().catch(console.error);
```

curl

```bash
curl -X POST "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/apps/anthropic/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $DASHSCOPE_API_KEY" \
  -d '{
    "model": "deepseek-v4-pro",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": "Extract key info from this email: John Smith (john@example.com) is interested in the Enterprise plan and wants to schedule a demo for next Tuesday at 2pm."
      }
    ],
    "output_config": {
      "format": {
        "type": "json_schema",
        "schema": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "email": {"type": "string"},
            "plan_interest": {"type": "string"},
            "demo_requested": {"type": "boolean"}
          },
          "required": ["name", "email", "plan_interest", "demo_requested"],
          "additionalProperties": false
        }
      }
    }
  }'
```

model `string` (Required) Model name.

Supported Models
  Qwen-Max: qwen3.8-max-preview (Token Plan only), qwen3.7-max, qwen3.7-max-2026-05-20, qwen3.7-max-2026-06-08, qwen3.6-max-preview, qwen3-max, qwen3-max-2026-01-23, qwen3-max-preview
  Qwen-Plus: qwen3.7-plus, qwen3.7-plus-2026-05-26, qwen3.6-plus, qwen3.6-plus-2026-04-02, qwen3.5-plus, qwen3.5-plus-2026-04-20, qwen3.5-plus-2026-02-15, qwen-plus, qwen-plus-latest, qwen-plus-2025-09-11
  Qwen-Flash: qwen3.6-flash, qwen3.6-flash-2026-04-16, qwen3.5-flash, qwen3.5-flash-2026-02-23, qwen-flash, qwen-flash-2025-07-28
  Qwen-Turbo: qwen-turbo
  Qwen-Coder: qwen3-coder-next, qwen3-coder-plus, qwen3-coder-plus-2025-09-23, qwen3-coder-flash
  Qwen-VL: qwen3-vl-plus, qwen3-vl-flash, qwen-vl-max, qwen-vl-plus
  Qwen Open-Source Models: qwen3.6-27b, qwen3.5-397b-a17b, qwen3.5-122b-a10b, qwen3.5-27b, qwen3.5-35b-a3b

Third-Party Models
  deepseek-v4-pro, deepseek-v4-flash, kimi-k2.5, kimi-k2-thinking, glm-5.1, glm-5, glm-4.7, glm-4.6, MiniMax-M2.5, MiniMax-M2.1

max_tokens `integer` (Required) Maximum number of tokens for the reply content. If the generated content exceeds this value, generation stops early and `stop_reason` is `max_tokens`. `max_tokens` does not limit the length of the thinking process. When extended thinking is enabled, the thinking tokens are controlled separately by `thinking.budget_tokens`.
system `string or array` (Optional) System prompt that defines model behavior. `system` is a top-level parameter; the `messages` array does not accept a `system` role. A string is equivalent to a single `type="text"` block. Pass an array to mark prompt caching breakpoints.

Properties
  type `string` (Required) Fixed value: `text`.
  text `string` (Required) The system prompt text.
  cache_control `object` (Optional) Prompt caching breakpoint. On a cache hit, subsequent requests are billed at the cache read rate. Contains only `type`, fixed to `ephemeral`.

messages `array` (Required) The message array, arranged in alternating `user`/`assistant` turns.

messages array element
  role `string` (Required) The message role. Valid values: `user`, `assistant`.
  content `string or array` (Required) Plain text string or structured content array. A string equals a single `content` block with `type="text"`.

content array element types

Text
  type `string` (Required) Fixed value: `text`.
  text `string` (Required) The text content.
  cache_control `object` (Optional) Prompt caching breakpoint. Contains only `type`, fixed to `ephemeral`.

Image (requires vision model)
  type `string` (Required) Fixed value: `image`.
  source `object` (Required) The source of the image data.
    type `string` (Required) Valid values: `url` (public image URL), `base64` (Base64-encoded).
    url `string` The public URL of the image. Required when `type` is `url`.
    media_type `string` The MIME type of the image, such as `image/jpeg`. Required when `type` is `base64`.
    data `string` The Base64-encoded image data. Required when `type` is `base64`.

Video (requires vision model)
  type `string` (Required) Fixed value: `video`.
  source `object` (Required) The source of the video data.
    type `string` (Required) Valid values: `url` (public video URL), `base64` (Base64-encoded).
    url `string` The public URL of the video. Required when `type` is `url`.
    media_type `string` The MIME type of the video, such as `video/mp4`. Required when `type` is `base64`.
    data `string` The Base64-encoded video data. Required when `type` is `base64`.

Tool use (assistant role; tool call instruction returned by the model)
  type `string` (Required) Fixed value: `tool_use`.
  id `string` (Required) The unique identifier of the tool call, used to associate the result in a subsequent `tool_result`.
  name `string` (Required) The name of the called tool.
  input `object` (Required) The input parameters of the tool call. The structure is determined by the `input_schema` of the corresponding tool in `tools`.
  cache_control `object` (Optional) Prompt caching breakpoint. Contains only `type`, fixed to `ephemeral`. The tool call content participates in the cache prefix.

Tool result (user role; execution result of a tool sent back to the model)
  type `string` (Required) Fixed value: `tool_result`.
  tool_use_id `string` (Required) Corresponds to the `id` in the `tool_use` block.
  content `string` (Required) Content returned by the tool.
  cache_control `object` (Optional) Prompt caching breakpoint. Contains only `type`, fixed to `ephemeral`.

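
The `tool_use` and `tool_result` blocks described above pair up through `id` and `tool_use_id`. The following is a minimal sketch of building the follow-up `messages` array after the model returns a tool call. It works on plain dictionaries as returned by a raw HTTP call; `append_tool_results`, `fake_weather`, and the `toolu_demo` ID are hypothetical names and sample data, not part of the API.

```python
def append_tool_results(messages, assistant_content, run_tool):
    """Return a new messages list with the assistant turn and tool results appended.

    assistant_content is the `content` array of the model response;
    run_tool(name, input) is a caller-supplied function that executes a tool.
    """
    tool_results = []
    for block in assistant_content:
        if block.get("type") != "tool_use":
            continue
        output = run_tool(block["name"], block["input"])
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # must match the tool_use block id
            "content": str(output),      # content is a string per the table above
        })
    return messages + [
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": tool_results},
    ]


def fake_weather(name, tool_input):
    """Stand-in tool implementation used only for this example."""
    return f"Sunny in {tool_input['city']}"


history = [{"role": "user", "content": "What's the weather like in Hangzhou today?"}]
# Sample response content; a real response comes from the Messages API.
response_content = [
    {"type": "tool_use", "id": "toolu_demo", "name": "get_weather", "input": {"city": "Hangzhou"}},
]
next_messages = append_tool_results(history, response_content, fake_weather)
print(next_messages[-1])
```

Send `next_messages` in the next request, together with the same `tools` definition, so the model can produce its final answer.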
stream `boolean` (Optional) Whether to enable streaming. Default value: `false`.
temperature `number` (Optional) Controls the diversity of generated text. Value range: [0, 2). Higher values produce more random results. Note This range is different from the official Anthropic range of [0.0, 1.0]. When migrating from Anthropic, verify the value of this parameter.
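
When migrating, a small guard can make the half-open range explicit. This is an illustrative client-side check only; `validate_temperature` is a hypothetical helper, and the service performs its own validation.

```python
def validate_temperature(value: float) -> float:
    """Return value unchanged if it lies in [0, 2); otherwise raise ValueError."""
    if not (0 <= value < 2):
        raise ValueError(f"temperature must be in [0, 2), got {value}")
    return value


# Values valid for Anthropic ([0.0, 1.0]) are also inside [0, 2).
print(validate_temperature(1.0))
```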
top_p `number` (Optional) Nucleus sampling probability threshold. Both `temperature` and `top_p` can control the diversity of generated text. We recommend setting only one of them. For more information, see Overview.
top_k `integer` (Optional) Candidate set size during sampling.
stop_sequences `array` (Optional) Text sequences that trigger generation to stop. Output ends before the matched sequence. Note After a match, the `stop_reason` in the response is still `end_turn`, and the response does not include the matched sequence.
thinking `object` (Optional) Extended thinking configuration. When enabled, the model reasons before responding, and the response includes `thinking`-type content blocks. Not all models support thinking mode.

Properties
  type `string` (Required) Valid values: `enabled` (enable thinking mode), `disabled` (disable thinking mode).
  budget_tokens `integer` (Optional) Maximum number of tokens for the thinking process. This budget is separate from `max_tokens`: `budget_tokens` limits the thinking portion, while `max_tokens` limits the final reply. A larger budget allows more thorough analysis of complex questions. Takes effect only when `type` is `enabled`.

tools `array` (Optional) Tool definitions for function calling.

tools array element
  name `string` (Required) The tool name.
  description `string` (Optional) The description of the tool function.
  input_schema `object` (Required) The JSON Schema definition of the tool input parameters.

tool_choice `object` (Optional) Tool selection strategy:
  `{"type": "auto"}`: The model decides whether to call a tool (default).
  `{"type": "any"}`: Force the model to call any tool.
  `{"type": "none"}`: Prohibit the model from calling tools.
  `{"type": "tool", "name": "tool_name"}`: Force the model to call a specified tool.

output_config `object` (Optional)

Properties

effort `string` (Optional) Controls the inference intensity of the model. The valid values and default value vary by model. Default value: `max`. Valid values:
  `high`: High-intensity inference.
  `max`: Maximum-intensity inference.
`low` and `medium` are mapped to `high`, and `xhigh` is mapped to `max`. Supported models: glm-5.2, deepseek-v4-pro, and deepseek-v4-flash (supplied by Alibaba Cloud).

format `object` (Optional) Structured output configuration. When enabled, the model returns a JSON string. Behavior varies by model:
  Strict structured outputs: Available for deepseek and glm series models. The model strictly follows the provided JSON Schema, so field types and hierarchy match the schema.
  Regular structured outputs: For all other models, schema field constraints are not enforced. The API automatically falls back to a plain JSON mode that only guarantees the output is a valid JSON string. In this fallback mode, the request must satisfy both of the following: (1) the `output_config` parameter is explicitly provided; (2) the `system` or `messages` content contains the keyword "JSON" (case-insensitive). If the keyword is missing, the API returns the error: `'messages' must contain the word 'json' in some form`.

  Properties
    type `string` (Required) Fixed value: `json_schema`.
    schema `object` (Required) JSON Schema object that follows the standard JSON Schema specification. It should include `type` (data type), `properties` (field definitions), `required` (array of required field names), and `additionalProperties` (must be set to `false`).
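
For models that use the fallback JSON mode, a client-side check can catch a missing "JSON" keyword before the request is sent. The sketch below is an assumption-laden illustration: `mentions_json` is a hypothetical helper, and it inspects only plain strings and `text` blocks, which may not match exactly how the service performs its check.

```python
def _iter_text(content):
    """Yield text from a string or from `text` blocks in a content array."""
    if isinstance(content, str):
        yield content
    elif isinstance(content, list):
        for block in content:
            if isinstance(block, dict) and block.get("type") == "text":
                yield block.get("text", "")


def mentions_json(system, messages) -> bool:
    """Return True if `system` or any message text contains "json" (case-insensitive)."""
    texts = list(_iter_text(system))
    for message in messages:
        texts.extend(_iter_text(message.get("content")))
    return any("json" in text.lower() for text in texts)


messages = [{"role": "user", "content": "Extract the name and email from this message."}]
if not mentions_json(None, messages):
    # Add the keyword through the system prompt rather than editing user input.
    system = "Reply in JSON."
else:
    system = None
print(mentions_json(system, messages))
```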

Non-streaming Response

Response Example

```json
{
  "id": "msg_e2898f19-fc0e-4cb3-bd9b-5b7dc4ea3bc9",
  "type": "message",
  "role": "assistant",
  "model": "qwen3.7-plus",
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this problem...",
      "signature": ""
    },
    {
      "type": "text",
      "text": "Hello! I am Qwen..."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 22,
    "output_tokens": 223,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0
  }
}
```

id `string` Unique message identifier.
type `string` Fixed value: `message`.
role `string` Fixed value: `assistant`.
model `string` The model used for generation.
content `array` The content array.

content array element types

Text
  type `string` Fixed value: `text`.
  text `string` The text response generated by the model.

Thinking (returned when Extended Thinking is enabled)
  type `string` Fixed value: `thinking`.
  thinking `string` The model's reasoning before the final response.
  signature `string` Currently fixed as an empty string.

Tool use (function call scenario)
  type `string` Fixed value: `tool_use`.
  id `string` Unique tool call identifier, used to match the `tool_result`.
  name `string` The name of the called tool.
  input `object` The input parameters of the tool call.

stop_reason `string` Reason generation stopped. Valid values: `end_turn` (normal completion), `max_tokens` (token limit reached), `tool_use` (tool call).
stop_sequence `string` Always `null`.
usage `object` Token usage statistics.

Note: In streaming calls, the `usage` field of the `message_start` event contains only `input_tokens` and `output_tokens`. All four fields are returned in the `message_delta` event.

Properties
  input_tokens `integer` Input tokens.
  output_tokens `integer` Output tokens.
  cache_creation_input_tokens `integer` Tokens consumed for cache creation.
  cache_read_input_tokens `integer` Tokens consumed from cache reads.

Streaming Response

Streaming response example

```json
{"type":"message_start","message":{"id":"msg_xxx","type":"message","role":"assistant","model":"qwen3.7-plus","content":[],"usage":{"input_tokens":15,"output_tokens":0}}}
{"type":"content_block_start","index":0,"content_block":{"type":"thinking","thinking":"","signature":""}}
{"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":"Here's a thinking process:\n\n1. Analyze User Input:\n - Topic: Artificial Intelligence (AI)\n - Request: Give a brief introduction to artificial intelligence."}}
{"type":"content_block_delta","index":0,"delta":{"type":"signature_delta","signature":""}}
{"type":"content_block_stop","index":0}
{"type":"content_block_start","index":1,"content_block":{"type":"text","text":""}}
{"type":"content_block_delta","index":1,"delta":{"type":"text_delta","text":"Artificial intelligence (AI) is an important branch of computer science..."}}
{"type":"content_block_stop","index":1}
{"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"input_tokens":15,"output_tokens":1078,"cache_creation_input_tokens":0,"cache_read_input_tokens":0}}
{"type":"message_stop"}
```

message_start The first stream event. Marks the start of the message.
  type `string` Fixed value: `message_start`.
  message `object` The initial message object. `content` is an empty array, and `usage` contains only `input_tokens` and `output_tokens`.

content_block_start Marks the start of a content block.
  type `string` Fixed value: `content_block_start`.
  index `integer` 0-based index corresponding to the position in the `content` array.
  content_block `object` The initial object of the content block. The `type` value is `text`, `thinking`, or `tool_use`. For the `tool_use` type, the `input` field is an empty object in this event, and the complete input parameters are assembled from subsequent `content_block_delta` deltas.

content_block_delta Incremental content block update. Multiple deltas are sent per block.
  type `string` Fixed value: `content_block_delta`.
  index `integer` The index of the associated content block.
  delta `object` Delta object. `type` values:
    `text_delta`: Text delta, containing the `text` field.
    `thinking_delta`: Thinking delta, containing the `thinking` field.
    `signature_delta`: Signature delta, containing the `signature` field (currently fixed as an empty string).
    `input_json_delta`: Tool call input parameter delta, containing the `partial_json` field.

content_block_stop Marks the end of a content block.
  type `string` Fixed value: `content_block_stop`.
  index `integer` The index of the ended content block.

message_delta Sent after all content blocks end. Contains the stop reason and final token usage.
  type `string` Fixed value: `message_delta`.
  delta `object` Contains `stop_reason` and `stop_sequence`. For valid values, see the Non-streaming Response table above.
  usage `object` Complete token usage statistics, including `input_tokens`, `output_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`.

message_stop The final event. Marks the end of the message.
  type `string` Fixed value: `message_stop`.

In addition, streaming responses periodically send ping events (`{"type":"ping"}`) to keep the connection alive. Clients can ignore them.
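
The event sequence described above can be folded back into a single message. The following is a minimal sketch that works on already-parsed event dictionaries; `accumulate` and `SAMPLE_EVENTS` are hypothetical names, and the sample events are shortened illustrations that follow the documented event shapes rather than real service output.

```python
import json


def accumulate(events):
    """Rebuild content blocks, stop_reason, and usage from parsed stream events."""
    blocks = {}          # content block index -> block dict
    partial_inputs = {}  # content block index -> accumulated partial_json
    result = {"content": [], "stop_reason": None, "usage": {}}
    for event in events:
        etype = event["type"]
        if etype == "message_start":
            result["usage"].update(event["message"].get("usage", {}))
        elif etype == "content_block_start":
            blocks[event["index"]] = dict(event["content_block"])
            partial_inputs[event["index"]] = ""
        elif etype == "content_block_delta":
            block, delta = blocks[event["index"]], event["delta"]
            if delta["type"] == "text_delta":
                block["text"] += delta["text"]
            elif delta["type"] == "thinking_delta":
                block["thinking"] += delta["thinking"]
            elif delta["type"] == "signature_delta":
                block["signature"] = delta["signature"]
            elif delta["type"] == "input_json_delta":
                partial_inputs[event["index"]] += delta["partial_json"]
        elif etype == "content_block_stop":
            index = event["index"]
            # Tool input arrives as JSON fragments; parse once the block ends.
            if blocks[index]["type"] == "tool_use" and partial_inputs[index]:
                blocks[index]["input"] = json.loads(partial_inputs[index])
        elif etype == "message_delta":
            result["stop_reason"] = event["delta"].get("stop_reason")
            result["usage"].update(event.get("usage", {}))
        # ping and message_stop carry no content and are ignored.
    result["content"] = [blocks[i] for i in sorted(blocks)]
    return result


SAMPLE_EVENTS = [
    {"type": "message_start", "message": {"content": [], "usage": {"input_tokens": 15, "output_tokens": 0}}},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello, "}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "world"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "ping"},
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "tool_use", "id": "toolu_demo", "name": "get_weather", "input": {}}},
    {"type": "content_block_delta", "index": 1, "delta": {"type": "input_json_delta", "partial_json": "{\"city\": "}},
    {"type": "content_block_delta", "index": 1, "delta": {"type": "input_json_delta", "partial_json": "\"Hangzhou\"}"}},
    {"type": "content_block_stop", "index": 1},
    {"type": "message_delta", "delta": {"stop_reason": "tool_use", "stop_sequence": None},
     "usage": {"input_tokens": 15, "output_tokens": 42,
               "cache_creation_input_tokens": 0, "cache_read_input_tokens": 0}},
    {"type": "message_stop"},
]

message = accumulate(SAMPLE_EVENTS)
print(message["content"][0]["text"], message["content"][1]["input"], message["stop_reason"])
```

Because `usage` from `message_delta` is merged over the values from `message_start`, the result ends up with all four usage fields.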

FAQ

After configuring Claude Desktop or Claude Code, the connection test fails with Model discovery — Gateway /v1/models returned HTTP 404, or the request URL contains /v1/v1/models. How do I fix it?

The model discovery feature of clients such as Claude Desktop and Claude Code automatically appends /v1/models to the configured base URL. Check the following two points:

Do not end the base URL with /v1/: it should end at /apps/anthropic (for example, for China (Beijing) use https://dashscope.aliyuncs.com/apps/anthropic; see the endpoint information above for other regions). If you mistakenly enter .../apps/anthropic/v1/, the client appends /v1/models and produces the duplicated path /v1/v1/models, which returns HTTP 404. Therefore, when you get a 404, first check whether the actual request URL contains a duplicated /v1/v1/; if so, remove the trailing /v1/ from the base URL.
Add models manually to skip discovery: the Model Studio Anthropic-compatible endpoint provides only the Messages API (/v1/messages) and does not provide a model list endpoint (/v1/models), so the model discovery request returns 404 as well. Manually add models (for example, qwen3.7-plus) under Models in the client to skip automatic discovery.
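
The first check can also be automated before the base URL is written to a client configuration. This is a minimal sketch; `normalize_base_url` is a hypothetical helper, not a setting of Claude Desktop or Claude Code.

```python
def normalize_base_url(base_url: str) -> str:
    """Strip trailing slashes and a trailing /v1 so clients can append /v1/... themselves."""
    url = base_url.rstrip("/")
    if url.endswith("/v1"):
        url = url[: -len("/v1")]
    return url


print(normalize_base_url("https://dashscope.aliyuncs.com/apps/anthropic/v1/"))
```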
