This document describes how to call the Kimi model inference service, which Moonshot AI provides directly on the Alibaba Cloud Model Studio platform.
This document applies only to the China (Beijing) region. To use the model, you must use an API key from the China (Beijing) region.
Activate the service
Go to the Model Studio console, search for Kimi, find the Kimi model card, and click Activate Now.
In the pop-up window, confirm the activation and authorization.
After you complete these steps, you can call the Moonshot AI Kimi model service.
Getting started
Prerequisites
You have obtained an API key and configured the API key as an environment variable.
If you call the service using a software development kit (SDK), you must install the SDK.
kimi/kimi-k2.7-code-highspeed, kimi/kimi-k2.7-code, kimi/kimi-k2.6, and kimi/kimi-k2.5 all support text, image, or video input. kimi/kimi-k2.7-code-highspeed and kimi/kimi-k2.7-code are thinking-only models (enable_thinking is always true and cannot be set to false). For kimi/kimi-k2.6 and kimi/kimi-k2.5, you can use the enable_thinking parameter to control the thinking mode, which is enabled by default:
Thinking mode (enable_thinking: true): The model outputs a detailed reasoning process (reasoning_content).
Non-thinking mode (enable_thinking: false or if the parameter is omitted): The model directly outputs the result without the reasoning process.
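The rules above can be sketched as a small helper that decides what to pass in extra_body for each model. This is only an illustration: the function name thinking_extra_body and the two model sets are hypothetical names introduced here, and omitting the flag for thinking-only models is an assumption rather than documented behavior.

```python
# Hypothetical helper, not part of any SDK.
THINKING_ONLY_MODELS = {"kimi/kimi-k2.7-code-highspeed", "kimi/kimi-k2.7-code"}
HYBRID_MODELS = {"kimi/kimi-k2.6", "kimi/kimi-k2.5"}


def thinking_extra_body(model: str, enable_thinking: bool = True) -> dict:
    """Return a dict suitable for the extra_body argument of the OpenAI Python SDK."""
    if model in THINKING_ONLY_MODELS:
        if not enable_thinking:
            raise ValueError(f"{model} is thinking-only; enable_thinking cannot be false")
        # Assumption: nothing needs to be sent because thinking is always on.
        return {}
    if model in HYBRID_MODELS:
        # Always send the flag explicitly so the request does not rely on a default.
        return {"enable_thinking": enable_thinking}
    raise ValueError(f"Unknown model: {model}")
```

The returned dict can then be passed as extra_body=thinking_extra_body("kimi/kimi-k2.6", False) when creating a completion. Sending the flag explicitly for hybrid models keeps the behavior independent of whichever default applies.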
kimi/kimi-k2.7-code-highspeed, kimi/kimi-k2.7-code, and kimi/kimi-k2.6 support using the preserve_thinking parameter to pass the thinking process in multi-turn conversations. For more information, see Multi-turn conversation.
The following example shows how to call the kimi/kimi-k2.6 model in thinking mode to perform text generation.
OpenAI compatible
enable_thinking is not a standard OpenAI parameter. For the OpenAI Python SDK, you can pass it in the extra_body parameter. For the Node.js SDK, you can pass it as a top-level parameter.
Python
from openai import OpenAI
import os
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="kimi/kimi-k2.6",
    messages=[{"role": "user", "content": "What is 1+1?"}],
    # Use extra_body to set enable_thinking and enable thinking mode.
    extra_body={"enable_thinking": True},
)
msg = completion.choices[0].message
# reasoning_content is present only when the model returns a reasoning process.
if getattr(msg, "reasoning_content", None):
    print("\n" + "=" * 20 + "Reasoning Process" + "=" * 20 + "\n")
    print(msg.reasoning_content or "")
print("\n" + "=" * 20 + "Full Response" + "=" * 20 + "\n")
print(msg.content)
Response
====================Reasoning Process====================
The user asked a simple math question: "What is 1+1?"
This is a very basic arithmetic problem. The answer is 2.
I should answer this question directly and clearly. Although the user asked in Chinese, the answer is a universal mathematical fact.
Answer structure:
1. Directly provide the answer: 2
2. Briefly explain that this is a basic arithmetic operation.
No need to overcomplicate. Keep it simple and clear.
====================Full Response====================
1+1 equals **2**.
This is the most basic arithmetic operation, where adding two units results in two units.
Node.js
import OpenAI from "openai";
import process from 'process';
const client = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1",
});
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is 1+1?" },
];
const response = await client.chat.completions.create({
  model: "kimi/kimi-k2.6",
  messages,
  // In the Node.js SDK, pass enable_thinking as a top-level parameter.
  enable_thinking: true,
});
const msg = response.choices[0].message;
if (msg.reasoning_content) {
  console.log("\n" + "=".repeat(20) + "Reasoning Process" + "=".repeat(20) + "\n");
  console.log(msg.reasoning_content || "");
}
console.log("\n" + "=".repeat(20) + "Full Response" + "=".repeat(20) + "\n");
console.log(msg.content);
Response
====================Reasoning Process====================
The user asked a simple math question: "What is 1+1?"
This is a very basic arithmetic problem. The answer is 2.
I should answer this question directly and clearly. Although the user asked in Chinese, the answer is a universal mathematical fact.
Answer structure:
1. Directly provide the answer: 2
2. Briefly explain that this is a basic arithmetic operation.
No need to overcomplicate. Keep it simple and clear.
====================Full Response====================
1+1 equals **2**.
This is the most basic arithmetic operation, where adding two units results in two units.
HTTP
curl
curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
  "model": "kimi/kimi-k2.6",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is 1+1?"
    }
  ],
  "enable_thinking": true
}'
Multimodal call examples
In addition to plain text conversations, kimi/kimi-k2.7-code-highspeed, kimi/kimi-k2.7-code, kimi/kimi-k2.6, and kimi/kimi-k2.5 have powerful multimodal understanding capabilities. This section describes how to use the model to understand image and video content.
Image and video files can be provided only using public network URLs. Base64 encoding is not supported.
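As a rough illustration of the URL-only rule, the following sketch builds the content parts used in the examples in this section and rejects anything that is not an http(s) URL, such as a Base64 data URI. The helper names media_part and user_message are hypothetical, and the check only inspects the URL scheme; it cannot confirm that a URL is actually reachable from the public network.

```python
from urllib.parse import urlparse


def media_part(kind: str, url: str) -> dict:
    """Build an image_url or video_url content part (hypothetical helper)."""
    if kind not in ("image_url", "video_url"):
        raise ValueError(f"Unsupported content type: {kind}")
    parsed = urlparse(url)
    # A Base64 data URI parses with scheme "data" and no host, so it is rejected here.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("Only http(s) URLs are accepted; Base64 input is not supported")
    return {"type": kind, kind: {"url": url}}


def user_message(text: str, *parts: dict) -> dict:
    """Combine a text prompt and media parts into one user message."""
    return {"role": "user", "content": [{"type": "text", "text": text}, *parts]}
```

The result of user_message(...) can be placed directly in the messages list of a request.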
Image understanding
The image understanding feature allows the Kimi model to recognize and analyze image content. You can provide one or more images. For more information about limitations on image files, see Image limitations.
OpenAI compatible
Python
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
# Single-image input example (thinking mode enabled)
completion = client.chat.completions.create(
    model="kimi/kimi-k2.6",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What scene is depicted in the image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                    }
                }
            ]
        }
    ],
    extra_body={"enable_thinking": True}  # Enable thinking mode
)
# Output the reasoning process
if hasattr(completion.choices[0].message, 'reasoning_content') and completion.choices[0].message.reasoning_content:
    print("\n" + "=" * 20 + "Reasoning Process" + "=" * 20 + "\n")
    print(completion.choices[0].message.reasoning_content)
# Output the response content
print("\n" + "=" * 20 + "Full Response" + "=" * 20 + "\n")
print(completion.choices[0].message.content)
# Multi-image input example (thinking mode enabled, uncomment to use)
# completion = client.chat.completions.create(
#     model="kimi/kimi-k2.6",
#     messages=[
#         {
#             "role": "user",
#             "content": [
#                 {"type": "text", "text": "What content do these images depict?"},
#                 {
#                     "type": "image_url",
#                     "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}
#                 },
#                 {
#                     "type": "image_url",
#                     "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"}
#                 }
#             ]
#         }
#     ],
#     extra_body={"enable_thinking": True}
# )
#
# # Output the reasoning process and response
# if hasattr(completion.choices[0].message, 'reasoning_content') and completion.choices[0].message.reasoning_content:
#     print("\nReasoning Process:\n" + completion.choices[0].message.reasoning_content)
# print("\nFull Response:\n" + completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
import process from 'process';
const openai = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});
// Single-image input example (thinking mode enabled)
const completion = await openai.chat.completions.create({
  model: 'kimi/kimi-k2.6',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What scene is depicted in the image?' },
        {
          type: 'image_url',
          image_url: {
            url: 'https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg'
          }
        }
      ]
    }
  ],
  enable_thinking: true // Enable thinking mode
});
// Output the reasoning process
if (completion.choices[0].message.reasoning_content) {
  console.log('\n' + '='.repeat(20) + 'Reasoning Process' + '='.repeat(20) + '\n');
  console.log(completion.choices[0].message.reasoning_content);
}
// Output the response content
console.log('\n' + '='.repeat(20) + 'Full Response' + '='.repeat(20) + '\n');
console.log(completion.choices[0].message.content);
// Multi-image input example (thinking mode enabled, uncomment to use)
// const multiCompletion = await openai.chat.completions.create({
//   model: 'kimi/kimi-k2.6',
//   messages: [
//     {
//       role: 'user',
//       content: [
//         { type: 'text', text: 'What content do these images depict?' },
//         {
//           type: 'image_url',
//           image_url: { url: 'https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg' }
//         },
//         {
//           type: 'image_url',
//           image_url: { url: 'https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png' }
//         }
//       ]
//     }
//   ],
//   enable_thinking: true
// });
//
// // Output the reasoning process and response
// if (multiCompletion.choices[0].message.reasoning_content) {
//   console.log('\nReasoning Process:\n' + multiCompletion.choices[0].message.reasoning_content);
// }
// console.log('\nFull Response:\n' + multiCompletion.choices[0].message.content);
HTTP
curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "kimi/kimi-k2.6",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What scene is depicted in the image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
          }
        }
      ]
    }
  ],
  "enable_thinking": true
}'
# Multi-image input example (uncomment to use)
# curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
# -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
# -H "Content-Type: application/json" \
# -d '{
#   "model": "kimi/kimi-k2.6",
#   "messages": [
#     {
#       "role": "user",
#       "content": [
#         {
#           "type": "text",
#           "text": "What content do these images depict?"
#         },
#         {
#           "type": "image_url",
#           "image_url": {
#             "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
#           }
#         },
#         {
#           "type": "image_url",
#           "image_url": {
#             "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
#           }
#         }
#       ]
#     }
#   ],
#   "enable_thinking": true
# }'
Video understanding
For more information about limitations on video files, see Video limitations.
OpenAI compatible
Python
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="kimi/kimi-k2.6",
    messages=[
        {
            "role": "user",
            "content": [
                # When passing a video file directly, set the type to video_url.
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                    }
                },
                {
                    "type": "text",
                    "text": "What is the content of this video?"
                }
            ]
        }
    ]
)
print(completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});
async function main() {
  const response = await openai.chat.completions.create({
    model: "kimi/kimi-k2.6",
    messages: [
      {
        role: "user",
        content: [
          // When passing a video file directly, set the type to video_url.
          {
            type: "video_url",
            video_url: {
              url: "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
            }
          },
          {
            type: "text",
            text: "What is the content of this video?"
          }
        ]
      }
    ]
  });
  console.log(response.choices[0].message.content);
}
main();
HTTP
curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "model": "kimi/kimi-k2.6",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "video_url",
          "video_url": {
            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
          },
          "fps": 2
        },
        {
          "type": "text",
          "text": "What is the content of this video?"
        }
      ]
    }
  ]
}'
File limitations
Image files
Image resolution: The recommended image resolution is 4K (4096 × 2160) or lower.
Supported image formats: PNG, JPEG, WEBP, and GIF.
Image size and quantity: There is no limit on the size or quantity of images, but you must ensure that the total size of the text and images in the request does not exceed 100 MB.
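A client-side pre-check along the lines of the image limits above might look like the following sketch. The extension list is an assumed mapping from the documented formats (PNG, JPEG, WEBP, GIF) to common file extensions, "4K or lower" is interpreted here as width up to 4096 and height up to 2160, and check_image is a hypothetical helper. The caller is expected to know the image dimensions already; the 100 MB request limit is not checked.

```python
import os
from urllib.parse import urlparse

# Assumption: common file extensions for the documented formats.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".gif"}
# Assumption: "4K (4096 x 2160) or lower" read as per-axis upper bounds.
MAX_WIDTH, MAX_HEIGHT = 4096, 2160


def check_image(url: str, width: int, height: int) -> list[str]:
    """Return a list of warnings for an image URL; an empty list means no issue found."""
    warnings = []
    extension = os.path.splitext(urlparse(url).path)[1].lower()
    if extension not in IMAGE_EXTENSIONS:
        warnings.append(f"Unexpected image extension: {extension or '(none)'}")
    if width > MAX_WIDTH or height > MAX_HEIGHT:
        warnings.append(f"Resolution {width}x{height} exceeds the recommended 4K")
    return warnings
```

Because the resolution is a recommendation rather than a hard limit, the sketch returns warnings instead of raising an error.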
Video files
Video size and duration: There is no limit on the size or duration of the video, but you must ensure that the total size of the text and video in the request does not exceed 100 MB.
Video formats: MP4, MPEG, MOV, AVI, X-FLV, MPG, WEBM, WMV, and 3GPP.
Video dimensions: There is no specific limit on video dimensions. A resolution of 2K or lower is recommended. Higher resolutions increase the processing time but do not improve the model's understanding.
Audio understanding: The model does not support understanding audio in video files.
Other features
Model | ||||||
kimi/kimi-k2.7-code-highspeed | ||||||
kimi/kimi-k2.7-code | ||||||
kimi/kimi-k2.6 | ||||||
kimi/kimi-k2.5 |
kimi/kimi-k2.7-code-highspeed, kimi/kimi-k2.7-code, kimi/kimi-k2.6, and kimi/kimi-k2.5 support context cache (implicit cache, enabled by default). For kimi/kimi-k2.7-code-highspeed and kimi/kimi-k2.7-code, input tokens that hit the cache are billed at 20.0% of the standard input price. For kimi/kimi-k2.6, input tokens that hit the cache are billed at 16.9% of the standard input price. For kimi/kimi-k2.5, input tokens that hit the cache are billed at 17.5% of the standard input price.
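To make the cache discount concrete, the following sketch computes the input cost of a single request. The discount rates come from the paragraph above; the function name input_cost, the price argument, and the per-million-token pricing unit are assumptions for illustration, so check the console for actual prices.

```python
# Fraction of the standard input price billed for cache-hit tokens (from the text above).
CACHE_HIT_RATE = {
    "kimi/kimi-k2.7-code-highspeed": 0.200,
    "kimi/kimi-k2.7-code": 0.200,
    "kimi/kimi-k2.6": 0.169,
    "kimi/kimi-k2.5": 0.175,
}


def input_cost(model: str, input_tokens: int, cached_tokens: int,
               price_per_million: float) -> float:
    """Estimate input cost; price_per_million is a hypothetical standard input price."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    # Cache-hit tokens are billed at a reduced rate; the rest at the full price.
    billable = uncached_tokens + cached_tokens * CACHE_HIT_RATE[model]
    return billable * price_per_million / 1_000_000
```

For example, a kimi/kimi-k2.6 request with 10,000 input tokens, of which 8,000 hit the cache, is billed as 2,000 + 8,000 × 0.169 = 3,352 full-price tokens.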
When using kimi/kimi-k2.7-code-highspeed, kimi/kimi-k2.7-code, kimi/kimi-k2.6, or kimi/kimi-k2.5 for tool calling in thinking mode, you must keep the reasoning_content field in each assistant message, and tool_choice supports only "auto" (default) and "none". Otherwise, an error occurs.
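The two tool-calling constraints can be sketched as follows. The example assumes messages are plain dicts shaped like the JSON in the HTTP examples; assistant_history_entry and check_tool_choice are hypothetical helpers, not SDK functions.

```python
ALLOWED_TOOL_CHOICES = ("auto", "none")


def assistant_history_entry(message: dict) -> dict:
    """Copy an assistant message into the history, keeping reasoning_content."""
    if message.get("role") != "assistant":
        raise ValueError("Expected an assistant message")
    if "reasoning_content" not in message:
        raise ValueError("reasoning_content must be kept in each assistant message")
    entry = {
        "role": "assistant",
        "content": message.get("content"),
        "reasoning_content": message["reasoning_content"],
    }
    # Keep any tool calls so the following tool messages can refer to them.
    if message.get("tool_calls"):
        entry["tool_calls"] = message["tool_calls"]
    return entry


def check_tool_choice(tool_choice="auto"):
    """Reject values other than "auto" and "none", such as forcing a specific tool."""
    if tool_choice not in ALLOWED_TOOL_CHOICES:
        raise ValueError('tool_choice supports only "auto" and "none" in thinking mode')
    return tool_choice
```

Appending assistant_history_entry(message) to the messages list before the tool result keeps the reasoning available for the next turn.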
Parameter default values
Model | stream_options | temperature | top_p | repetition_penalty | presence_penalty | tool_choice | top_k | preserve_thinking |
kimi/kimi-k2.7-code-highspeed | The only supported value is | 1.0 | 0.95 | 0.0 | 0.0 | auto | - | Enabled by default |
kimi/kimi-k2.7-code | The only supported value is | 1.0 | 0.95 | 0.0 | 0.0 | auto | - | Enabled by default |
kimi/kimi-k2.6 | The only supported value is | Thinking mode: 1.0 | Thinking mode/Non-thinking mode: 0.95 | Thinking mode/Non-thinking mode: 0.0 | Thinking mode/Non-thinking mode: 0.0 | Thinking mode/Non-thinking mode: auto | - | Disabled by default |
kimi/kimi-k2.5 | - |
The stream_options parameter supports only the value true. The values for the temperature, top_p, repetition_penalty, and presence_penalty parameters cannot be modified.
In thinking mode, you cannot force the model to call a specific tool. The tool_choice parameter supports only the values auto (default) and none.
A hyphen (-) indicates that the parameter has no default value and cannot be set.
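One way to respect these fixed values is to drop the affected parameters before sending a request. The following is a minimal sketch under the assumption that the restrictions apply as listed in the table and notes above; strip_fixed_params is a hypothetical helper.

```python
# Parameters whose values cannot be modified, per the notes above.
FIXED_PARAMS = ("temperature", "top_p", "repetition_penalty", "presence_penalty")
# Parameters marked with a hyphen in the table: no default and cannot be set.
UNSETTABLE_PARAMS = ("top_k",)


def strip_fixed_params(params: dict) -> tuple[dict, list[str]]:
    """Return (cleaned request parameters, names of the parameters that were dropped)."""
    dropped = [name for name in FIXED_PARAMS + UNSETTABLE_PARAMS if name in params]
    cleaned = {key: value for key, value in params.items() if key not in dropped}
    return cleaned, dropped
```

Returning the dropped names lets the caller log what was removed instead of silently changing the request.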
Model list and billing
kimi/kimi-k2.7-code-highspeed and kimi/kimi-k2.7-code are thinking-only models (enable_thinking is always true and cannot be set to false). kimi/kimi-k2.6 and kimi/kimi-k2.5 are hybrid thinking models. You can use the enable_thinking parameter to enable or disable the thinking mode. Note: The thinking_budget parameter cannot be used to limit the length of the chain-of-thought for any of these models.
For more information about model context length and pricing, see the Alibaba Cloud Model Studio console.
Billing is based on the number of input and output tokens for the model.
In thinking mode, the chain-of-thought is billed as output tokens.
Error codes
If an execution error occurs, see Error messages to resolve the issue.