Assistant API (being unpublished)


The Assistant API helps developers quickly build Large Language Model (LLM) applications called Assistants, such as personal assistants, intelligent shopping guides, and meeting assistants. Unlike the text generation API, the Assistant API includes built-in components for multi-turn conversations and tool calling. This reduces the cost of developing LLM applications.

Important

The Assistant API is being unpublished. Migrate to the Responses API, which offers multiple built-in tools and supports multi-turn context management, making it a suitable replacement.

Note

Agent applications and Assistants are both LLM applications. However, they have independent features and are used in different ways.

  • Agent applications: You can create, view, update, and delete them only in the console. You can call them using the application invocation API.

  • Assistants: You can create, view, update, delete, and call them only using the Assistant API.

Why choose the Assistant API

The Assistant API provides an efficient and flexible way to build LLM applications. It has the following core advantages:

Built-in official tools: Provides practical tools such as code execution, text-to-image, and online search. For example, an Assistant can run Python code to generate results, call the search function to get real-time information, or generate images for creative designs.

# Sample code for reference only
from dashscope import Messages, Runs


def submit_message(thread, assistant, message):
    """Add a user message to the thread, then stream the Assistant's reply."""
    Messages.create(
        thread_id=thread.id,
        role="user",
        content=message
    )

    # stream=True makes Runs.create return an iterator of (event, data) pairs
    run = Runs.create(
        thread_id=thread.id,
        assistant_id=assistant.id,
        stream=True
    )

    # Local flag, reset on every call (a function attribute would persist across calls)
    tool_name_shown = False
    for event, data in run:
        if event == 'thread.message.delta':
            # Incremental text generated by the model
            yield data.delta.content.text.value
        elif event == 'thread.message.completed':
            yield '\n'
        elif event == 'thread.run.step.delta':
            tool_call = data.delta.step_details.tool_calls[0]
            # When step.delta is detected for the first time, output the tool name
            if not tool_name_shown:
                tool_name_shown = True
                yield f"\nUsing tool: {tool_call['type']}\n\n"

            # format_tool_output is a helper defined elsewhere in the full sample
            formatted_output = format_tool_output(tool_call)
            if formatted_output is not None:
                yield formatted_output
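Because the loop above only iterates over (event, data) pairs, its text-handling logic can be exercised offline. The following minimal sketch replays a hand-built event stream through a simplified consumer. The fake payloads built with `types.SimpleNamespace` only mimic the attribute path used in the sample (`data.delta.content.text.value`); they are an assumption for illustration, not the SDK's actual response classes.

```python
from types import SimpleNamespace


def text_delta(value):
    """Build a fake 'thread.message.delta' payload with the attribute path used above."""
    return SimpleNamespace(
        delta=SimpleNamespace(content=SimpleNamespace(text=SimpleNamespace(value=value)))
    )


def render_stream(run):
    """Concatenate streamed text the same way submit_message yields it."""
    parts = []
    for event, data in run:
        if event == 'thread.message.delta':
            parts.append(data.delta.content.text.value)
        elif event == 'thread.message.completed':
            parts.append('\n')
        # Other events (run status changes, tool steps) are ignored in this sketch
    return ''.join(parts)


fake_run = [
    ('thread.message.delta', text_delta('Hello, ')),
    ('thread.message.delta', text_delta('world!')),
    ('thread.message.completed', None),
    ('thread.run.completed', None),
]
result = render_stream(fake_run)
# result == 'Hello, world!\n'
```

Keeping the consumer independent of the network call makes it easy to test the display logic before wiring it to a real run.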

Built-in conversation management: Threads store the conversation context for you, so you do not need to maintain the conversation history manually.

# Sample code for reference only
import json

from dashscope import Runs


def handle_run_stream(run, thread, tools_map):
    """Stream a run, executing local function tools whenever the run asks for them.

    tools_map maps a function name to a local Python callable (defined elsewhere).
    """
    while True:
        for event, data in run:
            if event == 'thread.message.delta':
                yield data.delta.content.text.value
            elif event == 'thread.run.requires_action':
                # Tool call => May need to submit tool output => Leads to a new run generator
                tool_calls = data.required_action.submit_tool_outputs.tool_calls
                if not tool_calls:
                    continue

                tool_outputs = []
                for tool_call in tool_calls:
                    name = tool_call.function.name
                    # Arguments arrive as a JSON string
                    arguments = json.loads(tool_call.function.arguments)
                    output = tools_map[name](**arguments)
                    # Normal tool
                    tool_outputs.append({"output": output})

                # Submit the tool output to the run, which returns a new run generator
                run = Runs.submit_tool_outputs(
                    thread_id=thread.id,  # Native conversation thread, no manual management needed
                    run_id=data.id,
                    tool_outputs=tool_outputs,
                    stream=True
                )
                yield "Transferring...\n"
                break  # Leave the current event loop and continue with the new run generator

            elif event in ('thread.run.completed', 'thread.run.cancelled',
                           'thread.run.expired', 'thread.run.failed'):
                # The run has reached a terminal state
                return
        else:
            # The for loop ended normally (without a break): the run stream is exhausted
            return
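The tool-dispatch step inside the loop can be tested on its own. The following sketch assumes, as the sample above does, that each tool call exposes `function.name` and a JSON-encoded `function.arguments` string; it runs each call against a local `tools_map` and builds the `tool_outputs` list. The `get_weather` function and the fake tool-call objects are hypothetical.

```python
import json
from types import SimpleNamespace


def get_weather(location):
    """Hypothetical local tool: a stub that returns a fixed forecast."""
    return f"The weather in {location} is sunny."


# Maps the function name the model requests to the local Python callable
tools_map = {"get_weather": get_weather}


def build_tool_outputs(tool_calls, tools_map):
    """Run every requested function locally and collect the results to submit."""
    tool_outputs = []
    for tool_call in tool_calls:
        name = tool_call.function.name
        # Arguments arrive as a JSON string and must be decoded before the call
        arguments = json.loads(tool_call.function.arguments)
        output = tools_map[name](**arguments)
        tool_outputs.append({"output": output})
    return tool_outputs


# A fake tool call shaped like the objects the sample reads from the run
fake_call = SimpleNamespace(
    function=SimpleNamespace(name="get_weather", arguments='{"location": "Hangzhou"}')
)
outputs = build_tool_outputs([fake_call], tools_map)
# outputs == [{"output": "The weather in Hangzhou is sunny."}]
```

A name missing from `tools_map` raises `KeyError` here; a production version would likely return an error message as the tool output instead.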

Quickly build multi-agent systems: Provides simple templates for Assistants, context, message encapsulation, and flow control. This lets you flexibly and efficiently implement multi-agent systems, such as a multi-agent system with automatic planning capabilities.

# Sample code for reference only
import ast

# get_agent_response, PlannerAssistant, ChatAssistant, and assistant_mapper
# are defined elsewhere in the full sample.


# Get the multi-agent response. The input and output must align with the parameters on the Gradio frontend interface.
def get_multi_agent_response(query, history):
    # Handle cases where the input is empty (a generator must yield, not return, its output)
    if len(query) == 0:
        yield "", history + [("", "")], "", ""
        return
    # Ask the planner Assistant for the execution order, expected as a Python list literal of names
    assistant_order = get_agent_response(PlannerAssistant, query)
    try:
        order_stk = ast.literal_eval(assistant_order)
        order_text = "----->".join(order_stk)
        cur_query = query
        # Run the agents in order
        for i, agent_name in enumerate(order_stk):
            yield order_text, history + [(query, "The multi-agent is working...")], f"{agent_name} is processing the information...", ""
            cur_assistant = assistant_mapper[agent_name]
            response = get_agent_response(cur_assistant, cur_query)
            yield order_text, history + [(query, "The multi-agent is working...")], response, ""
            # If the current agent is the last one, use its output as the output of the multi-agent system
            if i == len(order_stk) - 1:
                yield order_text, history + [(query, response)], "The assistant has finished processing.", ""
            # Otherwise, pass its output to the next agent as reference information
            else:
                # State the reference information separately from the question so the LLM does not confuse them
                cur_query = f"You can refer to the known information:\n{response}\nAnswer the user's question completely. The question is: {query}."
    # Fallback policy: If the program fails, call ChatAssistant directly
    except Exception:
        yield "ChatAssistant", history + [(query, get_agent_response(ChatAssistant, query))], "", ""
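The control flow of the multi-agent sample (plan, run each agent in order, feed each answer into the next query) can be checked without any model calls. In the sketch below, plain Python functions stand in for the Assistants and the planner output is a hard-coded list literal; the agent names and their behavior are hypothetical.

```python
import ast


def run_pipeline(query, plan_text, agents):
    """Run agents in the planned order, passing each answer to the next one."""
    order = ast.literal_eval(plan_text)  # a Python list literal of agent names
    cur_query = query
    response = ""
    for i, name in enumerate(order):
        response = agents[name](cur_query)
        if i < len(order) - 1:
            # Same prompt shape as the sample: previous answer first, then the question
            cur_query = (
                f"You can refer to the known information:\n{response}\n"
                f"Answer the user's question completely. The question is: {query}."
            )
    return response


# Hypothetical stand-ins for Assistants
agents = {
    "Researcher": lambda q: "RESEARCH NOTES",
    "Writer": lambda q: "used notes" if "RESEARCH NOTES" in q else "no notes",
}
answer = run_pipeline("What is RAG?", "['Researcher', 'Writer']", agents)
# answer == "used notes"
```

Using `ast.literal_eval` rather than `eval` means a malformed or malicious planner reply raises an exception instead of executing code, which is what lets the sample fall back to `ChatAssistant`.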

Quickly build an Assistant

You can have a multi-turn conversation with an Assistant and enable streaming output. To do this, you typically complete the following four steps:

  1. Create an Assistant: An Assistant is configured with an LLM, instructions, and a list of tools to perform specific tasks.

  2. Create a Thread: A thread records all messages between the user and the Assistant to enable multi-turn conversations.

  3. Create a Message: A message holds a single piece of content from the user or the Assistant. Messages are added to a thread in order.

  4. Create a Run: A run represents the series of steps an Assistant takes to respond in a multi-turn conversation, including model inference and tool calling. In this step, you can also enable streaming output for a natural interactive experience.
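The relationship between the four objects can be modeled in a few lines. The following sketch is a local, in-memory stand-in, not the DashScope SDK: the class names, fields, and the stub `run` function are hypothetical, and a real run performs model inference and tool calling on the server. It shows why a thread removes the need to resend history: each run reads every message already stored in the thread.

```python
from dataclasses import dataclass, field


@dataclass
class Assistant:
    model: str
    instructions: str
    tools: list = field(default_factory=list)


@dataclass
class Message:
    role: str  # "user" or "assistant"
    content: str


@dataclass
class Thread:
    messages: list = field(default_factory=list)


def create_message(thread, content, role="user"):
    """Step 3: append a message to the thread."""
    message = Message(role, content)
    thread.messages.append(message)
    return message


def run(thread, assistant):
    """Step 4 (simulated): read the whole thread, then append a reply.

    A real run calls the model; this stub only reports how much context it saw.
    """
    reply = f"[{assistant.model}] saw {len(thread.messages)} message(s)"
    return create_message(thread, reply, role="assistant")


assistant = Assistant(model="qwen-max", instructions="You are a helpful assistant.")  # step 1
thread = Thread()  # step 2
create_message(thread, "Hello")
first = run(thread, assistant)
create_message(thread, "And a follow-up question")
second = run(thread, assistant)
# first.content == "[qwen-max] saw 1 message(s)"
# second.content == "[qwen-max] saw 3 message(s)"
```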

For more information, see Quick Start for Assistant API. This guide shows you how to use Assistants, including model inference, tool calling, multi-turn conversations, and streaming output.

Compatibility

Model support

The Assistant API supports several mainstream Qwen models. You can view and try these models in the Model Square.

Note

Snapshot versions of the Qwen-Turbo, Qwen-Plus, and Qwen-Max models, such as qwen-plus-1220, are compatible only with the function calling and retrieval-augmented generation (RAG) tools. Actual model compatibility may vary, so confirm it by testing your model and tool combination.

| Model series | Model identifier |
| --- | --- |
| Qwen-Turbo | qwen-turbo |
| Qwen-Plus | qwen-plus |
| Qwen-Max | qwen-max |
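Based on the note above, a client could check a model and tool combination before creating an Assistant. The following sketch assumes that a snapshot identifier is a base identifier from the table followed by a hyphen and digits (as in qwen-plus-1220), and that base models accept every tool; both are illustrative assumptions, and the `unsupported_tools` helper is hypothetical. Actual compatibility should still be confirmed by testing.

```python
import re

BASE_MODELS = {"qwen-turbo", "qwen-plus", "qwen-max"}
# Per the note, snapshot versions work only with function calling and RAG
SNAPSHOT_TOOLS = {"function", "rag"}
# Assumed snapshot naming: base identifier + "-" + digits, e.g. qwen-plus-1220
SNAPSHOT_PATTERN = re.compile(r"^(qwen-turbo|qwen-plus|qwen-max)-\d+$")


def unsupported_tools(model, tool_types):
    """Return the tool types that the note says this model cannot use.

    Raises ValueError for identifiers outside the table and its snapshots.
    """
    if model in BASE_MODELS:
        return []  # assumption: base models accept all listed tools
    if SNAPSHOT_PATTERN.match(model):
        return [t for t in tool_types if t not in SNAPSHOT_TOOLS]
    raise ValueError(f"Model not listed for the Assistant API: {model}")


blocked = unsupported_tools("qwen-plus-1220", ["function", "code_interpreter"])
# blocked == ["code_interpreter"]
```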

Tool support

The Assistant API supports multiple official tools, along with custom function calls and plugins.

Note

Plugin compatibility may vary, so confirm it by running the plugin. For more information, see the Plugin List.

| Tool (tools) | Unique identifier | Purpose |
| --- | --- | --- |
| Code Interpreter | code_interpreter | Executes Python code. Suitable for scenarios such as programming problems, mathematical calculations, and data analytics. |
| Quark Search | quark_search | Retrieves web information in real time to supplement the model's knowledge. |
| Text-to-image | text_to_image | Converts text descriptions into images to enrich response formats. |
| Calculator | calculator | Performs precise arithmetic calculations. |
| Generate QR code | generate_qrcode | Converts text to a QR code. |
| GitHub Search | github_search | Searches for real-time information about GitHub projects. |
| Function calling | function | Calls functions that you define and execute locally, without relying on external network services. |
| Retrieval-augmented generation (RAG) | rag | Retrieves external knowledge to improve the accuracy of LLM responses. |
| Custom plugin | ${plugin_id} | Connects to custom business interfaces to extend AI business capabilities. |
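Tools are referenced by the identifiers in the table. The sketch below assembles a `tools` list in which each entry is a dict whose `type` is the identifier, and function tools additionally carry a JSON Schema description of the function. This payload shape follows the common OpenAI-style convention and is an assumption here, as is the hypothetical `get_weather` function; `rag` and custom plugins are left out because their extra fields are not described in this table. Check the API reference for the exact fields.

```python
import json

# Built-in tools from the table, referenced by identifier only (assumed shape)
BUILTIN_TOOLS = {"code_interpreter", "quark_search", "text_to_image",
                 "calculator", "generate_qrcode", "github_search"}


def builtin_tool(tool_type):
    """Return a tool entry for a built-in tool identifier."""
    if tool_type not in BUILTIN_TOOLS:
        raise ValueError(f"Unknown built-in tool: {tool_type}")
    return {"type": tool_type}


def function_tool(name, description, properties, required):
    """Return a function-calling tool entry whose parameters use JSON Schema."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


tools = [
    builtin_tool("code_interpreter"),
    builtin_tool("quark_search"),
    # Hypothetical local function that the client executes itself
    function_tool(
        "get_weather",
        "Get the current weather for a city.",
        {"location": {"type": "string", "description": "City name"}},
        ["location"],
    ),
]
print(json.dumps(tools, indent=2))
```

Validating identifiers on the client catches typos such as `quark-search` before the request is sent.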