The Assistant API helps developers quickly build Large Language Model (LLM) applications called Assistants, such as personal assistants, intelligent shopping guides, and meeting assistants. Unlike the text generation API, the Assistant API includes built-in components for multi-turn conversations and tool calling. This reduces the cost of developing LLM applications.
The Assistant API is being discontinued. Migrate to the Responses API, which offers multiple built-in tools and supports multi-turn context management, making it a suitable alternative.
Agent applications and Assistants are both LLM applications. However, they have independent features and are used in different ways.
Agent applications: You can create, view, update, and delete them only in the console. You can call them using the application invocation API.
Assistants: You can create, view, update, delete, and call them only using the Assistant API.
Why choose the Assistant API
The Assistant API provides an efficient and flexible way to build LLM applications. It has the following core advantages:
Built-in official tools: Provides practical tools such as code execution, text-to-image, and online search. For example, an Assistant can run Python code to generate results, call the search function to get real-time information, or generate images for creative designs.
Built-in conversation management: Provides context management tools. You do not need to manually maintain the conversation history.
Quickly build multi-agent systems: Provides simple templates for Assistants, context, message encapsulation, and flow control. This lets you flexibly and efficiently implement multi-agent systems, such as a multi-agent system with automatic planning capabilities.
Quickly build an Assistant
You can have a multi-turn conversation with an Assistant and enable streaming output. To do this, you typically complete the following four steps:
Create an Assistant: An Assistant is configured with an LLM, instructions, and a list of tools to perform specific tasks.
Create a Thread: A thread records all messages between the user and the Assistant to enable multi-turn conversations.
Create a Message: A message holds a single piece of content sent by the user or the Assistant and is stored in a thread.
Create a Run: A run represents the series of steps an Assistant takes to respond in a multi-turn conversation, including model inference and tool calling. In this step, you can also enable streaming output for a natural interactive experience.
For more information, see Quick Start for Assistant API. This guide shows you how to use Assistants, including model inference, tool calling, multi-turn conversations, and streaming output.
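The relationship between the four objects above can be sketched with a small in-memory model. This is a conceptual illustration only, not the Assistant API SDK: the class names, the `create_run` helper, and the `fake_infer` stand-in for model inference are all hypothetical, and a real run is executed by the service rather than locally.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Assistant:
    """Step 1: an LLM plus instructions and a list of tools."""
    model: str
    instructions: str
    tools: List[Dict[str, str]] = field(default_factory=list)


@dataclass
class Message:
    """Step 3: one entry in a thread, written by the user or the Assistant."""
    role: str  # "user" or "assistant"
    content: str


@dataclass
class Thread:
    """Step 2: the ordered record of all messages in a conversation."""
    messages: List[Message] = field(default_factory=list)

    def add(self, role: str, content: str) -> Message:
        message = Message(role, content)
        self.messages.append(message)
        return message


def create_run(assistant: Assistant, thread: Thread,
               infer: Callable[[Assistant, List[Message]], str]) -> Message:
    """Step 4: let the Assistant respond using the whole thread as context.

    `infer` is a hypothetical stand-in for model inference and tool calling.
    """
    reply = infer(assistant, thread.messages)
    return thread.add("assistant", reply)


def fake_infer(assistant: Assistant, history: List[Message]) -> str:
    # Placeholder "model": reports how many user turns it can see.
    user_turns = sum(1 for m in history if m.role == "user")
    return f"[{assistant.model}] reply to user turn {user_turns}"


assistant = Assistant(model="qwen-max",
                      instructions="You are a helpful assistant.",
                      tools=[{"type": "code_interpreter"}])
thread = Thread()
for question in ["Hello", "What did I just say?"]:
    thread.add("user", question)
    answer = create_run(assistant, thread, fake_infer)
    print(answer.content)
```

Because every run receives the entire thread, the second reply can see both user turns. In the real service this history is kept for you, which is what removes the need to maintain it manually.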
Compatibility
Model support
The Assistant API supports several mainstream Qwen models. You can view and try these models in the Model Square.
Snapshot versions of the Qwen-Turbo, Qwen-Plus, and Qwen-Max models, such as qwen-plus-1220, are compatible only with the function calling and retrieval-augmented generation (RAG) tools. Actual compatibility is determined by how the model behaves at run time.
Model series | Model identifier |
Qwen-Turbo | qwen-turbo |
Qwen-Plus | qwen-plus |
Qwen-Max | qwen-max |
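The snapshot restriction described above could be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper names are hypothetical, and it assumes that a snapshot identifier is a base identifier followed by a hyphenated suffix (as in qwen-plus-1220), which may also match identifiers that are not snapshots.

```python
from typing import List

# Identifiers from the model table above.
BASE_MODELS = ("qwen-turbo", "qwen-plus", "qwen-max")
# Tools the text lists as compatible with snapshot versions.
SNAPSHOT_TOOLS = {"function", "rag"}


def is_snapshot(model: str) -> bool:
    """Assumption: a snapshot is a base identifier plus a '-suffix'."""
    return any(model.startswith(base + "-") for base in BASE_MODELS)


def unsupported_tools(model: str, tool_ids: List[str]) -> List[str]:
    """Return the tool identifiers a snapshot model is not documented to support."""
    if not is_snapshot(model):
        return []
    return [t for t in tool_ids if t not in SNAPSHOT_TOOLS]


print(unsupported_tools("qwen-plus-1220", ["function", "code_interpreter", "rag"]))
print(unsupported_tools("qwen-plus", ["code_interpreter"]))
```

A check like this only reflects the documented rule. Since actual compatibility depends on run-time behavior, it should be treated as an early warning rather than a guarantee.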
Tool support
The Assistant API supports multiple official tools, along with custom function calls and plugins.
Actual plugin compatibility is determined by how the plugin behaves at run time. For more information, see the Plugin List.
Tool (tools) | Unique identifier | Purpose |
Code Interpreter | code_interpreter | Helps execute Python code. Suitable for scenarios such as programming problems, mathematical calculations, and data analysis. |
Quark Search | quark_search | Retrieves web information in real time to enhance knowledge acquisition. |
Text-to-image | text_to_image | Converts text descriptions into images to enrich response formats. |
Calculator | calculator | Has strong calculation capabilities. Use it for precise arithmetic tasks. |
Generate QR code | generate_qrcode | Converts text to a QR code. |
GitHub Search | github_search | Searches for real-time information on GitHub projects. |
Function calling | function | Executes functions that you define on your local device, without relying on external network services. |
Retrieval-augmented generation (RAG) | rag | Retrieves external knowledge to improve the accuracy of LLM responses. |
Custom plugin | ${plugin_id} | Connects to custom business interfaces to extend AI business capabilities. |
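To show how the identifiers in the table might be used, the sketch below assembles a tool list for an Assistant. The `{"type": ...}` entry shape and the JSON Schema style function definition are assumptions made for illustration, as are the helper names and the `get_order_status` function; consult the API reference for the exact request format.

```python
from typing import Any, Dict, List

# Built-in tool identifiers taken from the table above.
BUILT_IN_TOOLS = {
    "code_interpreter", "quark_search", "text_to_image",
    "calculator", "generate_qrcode", "github_search", "rag",
}


def built_in_tool(identifier: str) -> Dict[str, Any]:
    """Build an entry for a built-in tool (assumed shape: {"type": id})."""
    if identifier not in BUILT_IN_TOOLS:
        raise ValueError(f"unknown tool identifier: {identifier}")
    return {"type": identifier}


def function_tool(name: str, description: str,
                  parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Build an entry for a custom function (assumed JSON Schema parameters)."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }


tools: List[Dict[str, Any]] = [
    built_in_tool("code_interpreter"),
    built_in_tool("quark_search"),
    function_tool(
        "get_order_status",  # hypothetical function name
        "Look up the status of an order by its ID.",
        {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    ),
]
print([tool["type"] for tool in tools])
```

Validating identifiers against the table before sending a request catches typos early, while custom plugins would be referenced by their own plugin ID instead of one of the fixed identifiers.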


