File Q&A

After you upload files in an agent application, you can ask questions about their content. The feature supports content understanding, information extraction, and Q&A for documents, images, audio, and video files, and offers three processing modes. Choose the mode that best fits your task:

  • Full-text citation: An internal parser extracts the file content and passes all of it to the model as a single input, up to the model’s context length limit.

    • Best for: Tasks requiring global understanding, such as document summarization, full-text translation, or style refinement.

    • Key traits: Simple and direct, but limited by the model’s context length.

  • Chunked retrieval (RAG): An internal parser extracts the file content and splits it into smaller chunks. When you ask a question, the system retrieves the most relevant chunks and passes them, together with your question, to the model to generate an answer.

    • Best for: Long-document Q&A, knowledge base search, or scenarios where precise source attribution is required.

    • Key traits: Handles very long files. Answer quality depends on chunking and retrieval strategy.

  • Custom processing: The file is passed to the model as a URL or as raw content. Based on the task, the model then decides whether to call external tools, such as plug-ins or MCP Servers, to process the file.

    • Best for: Tasks requiring additional file operations, such as image style transfer or generating reports after video analysis.

    • Key traits: Powerful and flexible, but depends on the configured tools (such as plug-ins and MCP Servers).

Scope

Supported regions

This documentation applies only to the China (Beijing) region.

Supported models

Note

This list may not be up to date. For the latest supported models, check the model list in your agent application.

Text generation models

Visual understanding models

  • Qwen-VL-Max, Qwen-VL-Plus, Qwen-VL-OCR

Supported file formats

You can upload up to 10 files per session. Each file must be no larger than 10 MB.

Important

Uploaded files are valid only in the current session. Refreshing or closing the page will delete them. Complete your tasks promptly.

You can upload local documents, images, videos, or audio files in the following formats:

  • Documents: .doc, .docx, .wps, .ppt, .pptx, .xls, .xlsx, .md, .txt, .pdf.

  • Images: .png, .jpg, .jpeg, .bmp, .gif.

  • Videos: .mp4, .mkv, .avi, .mov, .wmv, .webm, .flv.

  • Audio: .aac, .amr, .flac, .m4a, .mp3, .mpeg, .ogg, .opus, .wav, .wma.

To process files larger than 10 MB, use the file upload API. See the API reference section.
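You can check these limits on the client before uploading. The following is a minimal Python sketch. It assumes that 10 MB means 10 × 1024 × 1024 bytes and that formats are recognized by file extension alone. `check_uploads` and `file_category` are hypothetical helpers, not part of any Model Studio SDK.

```python
from pathlib import PurePath
from typing import List, Optional, Tuple

# Limits from this section; assumes 10 MB means 10 * 1024 * 1024 bytes.
MAX_FILES_PER_SESSION = 10
MAX_FILE_BYTES = 10 * 1024 * 1024

# Supported extensions, copied from the format list above.
SUPPORTED_EXTENSIONS = {
    "document": {".doc", ".docx", ".wps", ".ppt", ".pptx", ".xls", ".xlsx",
                 ".md", ".txt", ".pdf"},
    "image": {".png", ".jpg", ".jpeg", ".bmp", ".gif"},
    "video": {".mp4", ".mkv", ".avi", ".mov", ".wmv", ".webm", ".flv"},
    "audio": {".aac", ".amr", ".flac", ".m4a", ".mp3", ".mpeg", ".ogg",
              ".opus", ".wav", ".wma"},
}


def file_category(name: str) -> Optional[str]:
    """Return 'document', 'image', 'video', or 'audio', or None if unsupported."""
    ext = PurePath(name).suffix.lower()  # extension check only (assumption)
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if ext in extensions:
            return category
    return None


def check_uploads(files: List[Tuple[str, int]]) -> List[str]:
    """Check (file_name, size_in_bytes) pairs; an empty result means no problems found."""
    problems = []
    if len(files) > MAX_FILES_PER_SESSION:
        problems.append(f"too many files: {len(files)} > {MAX_FILES_PER_SESSION}")
    for name, size in files:
        if file_category(name) is None:
            problems.append(f"{name}: unsupported format")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: larger than 10 MB, use the file upload API")
    return problems
```

For example, `check_uploads([("manual.pdf", 2_000_000), ("demo.mkv", 15_000_000)])` reports only the oversized video. That file would go through the file upload API instead.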

How to use

Full-text citation

Steps

  1. Select a model in the agent application.

  2. In the Planning > File Processing module, select Full-text Reference.

  3. In the debugging window on the right, click the upload icon to upload a local file. Then ask questions about its content.

Parameter settings

Click the configuration icon to open the configuration page:

  • Maximum Parse Length per File (tokens): Limits how many tokens the system extracts from a single file. Content beyond this limit is truncated from the end of the file.

  • Maximum Assembly Length: Limits the total number of tokens across all files. Content beyond this limit is truncated from the end of the last assembled file.
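One reading of these two limits: the per-file limit cuts the tail of each file, and the assembly limit then cuts whatever overflows from the end of the assembled content. The minimal Python sketch below follows that reading. It assumes each file is already parsed into a list of tokens and that files are assembled in upload order. The real tokenizer and assembly order are not documented here, and `assemble_full_text` is hypothetical.

```python
from typing import List


def assemble_full_text(files: List[List[str]], max_per_file: int,
                       max_total: int) -> List[List[str]]:
    """Return the tokens kept for each file, in assembly order (a sketch)."""
    assembled = []
    remaining = max_total
    for tokens in files:
        if remaining <= 0:
            break  # assembly budget used up: later files are not included
        # Maximum Parse Length per File: keep the beginning, drop the tail.
        kept = tokens[:max_per_file]
        # Maximum Assembly Length: the last file that only partly fits is cut at its end.
        kept = kept[:remaining]
        assembled.append(kept)
        remaining -= len(kept)
    return assembled
```

With three files of 6, 4, and 3 tokens, a per-file limit of 4, and an assembly limit of 6, the first file keeps 4 tokens, the second keeps 2, and the third is dropped.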

Note

To avoid losing information, set Maximum Parse Length per File (tokens) high enough to cover the whole file, or use Chunk Retrieval for long files.

Chunked retrieval

Parameter settings

Click the configuration icon to open the configuration page:

  • Number of Recalled Chunks: Maximum number of text fragments the model references when answering.

  • Maximum Assembly Length: Limits the total token count of all recalled fragments. If exceeded, the system discards the lowest-scoring fragments until the limit is met.
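The following minimal Python sketch shows one way these two settings could combine. It assumes each recalled chunk carries a relevance score and a token count. `Chunk` and `select_chunks` are hypothetical names, and the service's actual scoring is not documented here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    score: float  # relevance score from retrieval (higher is better; assumption)
    tokens: int   # token count of the chunk


def select_chunks(candidates: List[Chunk], top_k: int,
                  max_total_tokens: int) -> List[Chunk]:
    """Apply Number of Recalled Chunks, then Maximum Assembly Length."""
    # Number of Recalled Chunks: keep at most top_k highest-scoring chunks.
    recalled = sorted(candidates, key=lambda c: c.score, reverse=True)[:top_k]
    # Maximum Assembly Length: drop the lowest-scoring chunk until the total fits.
    while recalled and sum(c.tokens for c in recalled) > max_total_tokens:
        recalled.pop()  # sorted by descending score, so the last item scores lowest
    return recalled
```

For example, with four 300-token chunks, a top-k of 3, and an assembly limit of 700 tokens, the three best chunks total 900 tokens. The lowest-scoring of them is dropped, which leaves two chunks.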

Search only the uploaded file

Steps
  1. Select a model in the agent application.

  2. In the Planning > File Processing module, select Chunk Retrieval.

  3. In the debugging window on the right, click the upload icon to upload a local file. Then ask questions about its content.

Example

Upload Alibaba Cloud Model Studio Mobile Product Introduction.docx and ask: “Recommend a smartphone priced around CNY 5,000.”

Hybrid retrieval: file + knowledge base

Steps
  1. Select a model in the agent application.

  2. In the Planning > File Processing module, select Chunk Retrieval.

  3. In the Planning > Knowledge > Document module, click the + button to select and add an existing knowledge base. If none exists, create one first on the knowledge base page.

  4. In the debugging window on the right, click the upload icon to upload a local file. You can now ask questions that draw on both the uploaded file and the linked knowledge base.

Example

Add a video file to your knowledge base first. Then click the upload icon in the input box to upload test.mp4 and ask: “Does any person in this video appear in the knowledge base?”

Custom processing

Steps

  1. Select a model in the agent application.

  2. In the Planning > File Processing module, select Custom Processing.

  3. Under Skills, add the tools the task requires, such as MCP Servers or plug-ins.

  4. In the debugging window on the right, click the upload icon to upload a file. Then give instructions in the chat so that the model calls the configured tools to process it.

Image processing settings for specific models

After you select a Qwen-VL series model and upload a file, click the configuration icon next to Custom Processing to configure image processing options.

In custom processing mode, file handling logic varies by type:

  • Image files: Choose one of the following options:

    • Model-only processing: The model analyzes the image with its built-in visual capabilities and answers directly, without calling external tools. Best for visual Q&A.

    • Model processing + planning: After analyzing the image, the model decides whether to call your configured external tools, such as plug-ins, for more complex tasks. Best for editing, transforming, or analyzing images with tools.

  • Other files (documents, audio, video): The model autonomously decides whether to call tools.

Example

Agent configuration:

  1. Select a Qwen-VL series model. Click the configuration icon next to Custom processing and set image processing to Model processing + planning.

  2. Under MCPs, add the Character style re-rendering tool.

Usage: Upload girl.png and ask: “Transform this image into a vibrant cartoon style.”

API reference

Prerequisites

  1. Application publishing: You must publish your application in the console before you can call the API.

  2. Processing mode: API calls process files with the mode saved in your agent application, such as full-text citation or chunked retrieval. You cannot change the mode per API call.

Call limits

File Q&A API calls follow the unified rate-limiting policy of the associated agent application.

  • Default limit: Up to 100 calls per minute per agent application.

  • Shared quota: The limit covers every API request to the application, File Q&A or otherwise. For example, if you make 50 File Q&A calls in one minute, only 50 calls remain for other API requests in that minute.
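If one client sends both File Q&A and other requests to the same application, it can throttle itself against the shared quota. The following Python sketch uses a sliding 60-second window and assumes the default limit of 100 calls per minute. The server's own counting method is not documented here. Treat `SharedMinuteLimiter` as a hypothetical client-side guard, not a copy of the service's rate limiter.

```python
import time
from collections import deque
from typing import Callable, Deque


class SharedMinuteLimiter:
    """Client-side sliding-window limiter for one agent application (a sketch).

    All request types (File Q&A and others) draw from one budget, as in the
    shared quota described above. The server-side algorithm may differ.
    """

    def __init__(self, max_calls: int = 100, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self.calls: Deque[float] = deque()  # timestamps of accepted calls

    def try_acquire(self) -> bool:
        """Record a call and return True if the budget allows it; otherwise return False."""
        now = self.clock()
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(now)
        return True
```

Call `try_acquire()` before every request to the application, whatever its type, and wait or back off when it returns False.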

File delivery methods and parameters

  • image_list: Pass image URLs for image retrieval and visual understanding. Max file size: 10 MB.

  • file_list: Pass generic file URLs. In full-text citation or chunked retrieval mode, the system extracts and uses the file’s text. In custom processing mode, the model receives the raw file URL so that it can call tools. Max file size: 10 MB.

  • session_file_id: Upload the file with the file upload API. Recommended for production environments.

    1. Call the file upload API to upload the file and get a session_file_id.

    2. Pass this ID in your chat API request.

    Benefits: Supports larger files and offers more stable transfers.
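You can express the choice between these delivery methods as a small decision helper. This Python sketch assumes the 10 MB limit applies to the file behind each URL. It returns only the parameter name to use. How each parameter is placed in the request body is defined by the API reference and is not shown here. `choose_delivery` is a hypothetical helper.

```python
MAX_URL_FILE_BYTES = 10 * 1024 * 1024  # 10 MB limit for image_list and file_list


def choose_delivery(kind: str, size_bytes: int, has_public_url: bool,
                    production: bool = False) -> str:
    """Return 'image_list', 'file_list', or 'session_file_id' for one file.

    kind: 'image', 'document', 'audio', or 'video'.
    """
    # Production traffic, files without a public URL, and files over 10 MB
    # go through the file upload API, which returns a session_file_id.
    if production or not has_public_url or size_bytes > MAX_URL_FILE_BYTES:
        return "session_file_id"
    # Small, publicly reachable files can be passed by URL.
    return "image_list" if kind == "image" else "file_list"
```

The helper routes production traffic to the file upload API because the section above recommends that path for production environments.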

For more information, see File Q&A (documents, images, audio, video).

Billing

  • File upload: Uploading files is free of charge.

  • Model calls: Q&A based on file content consumes input and output tokens. You are charged at the standard rate for your selected model. For more information, see the Model List. Token usage varies by mode:

    • Full-text citation: The entire parsed file, or its truncated version, is passed as input, so input token usage is high.

    • Chunked retrieval: Only your question and the most relevant retrieved fragments are passed as input. Input token usage is typically much lower, which makes this mode well suited to long documents.

    • Custom processing: Token usage depends on the complexity of the interaction, for example interpreting your instruction, calling tools, and summarizing tool results.

  • Tool calls: Some tools incur fees. Pricing is available on the tool's details page.
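The difference between the first two modes can be estimated with simple arithmetic. The following Python sketch counts only the question and the file content. It ignores system prompts, conversation history, and output tokens. All parameter values in the example are hypothetical, not product defaults. The chunked figure is an upper bound, because fragments are dropped until they fit the assembly limit.

```python
from typing import List


def full_text_input_tokens(question: int, file_tokens: List[int],
                           max_per_file: int, max_assembly: int) -> int:
    """Estimate input tokens in full-text citation mode."""
    # Each file is cut to the per-file limit, then the total is cut to the assembly limit.
    parsed = sum(min(t, max_per_file) for t in file_tokens)
    return question + min(parsed, max_assembly)


def chunked_input_tokens_upper_bound(question: int, top_k: int, chunk_size: int,
                                     max_assembly: int) -> int:
    """Upper bound on input tokens in chunked retrieval mode."""
    # At most top_k chunks are recalled, and their total never exceeds the assembly limit.
    return question + min(top_k * chunk_size, max_assembly)


# Hypothetical numbers: a 50-token question about one 50,000-token document.
full = full_text_input_tokens(50, [50_000], max_per_file=30_000, max_assembly=100_000)
chunked = chunked_input_tokens_upper_bound(50, top_k=5, chunk_size=500, max_assembly=4_000)
print(full, chunked)  # 30050 2550
```

In this example, chunked retrieval sends more than ten times fewer input tokens per question.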

FAQ

  1. How do I provide a publicly accessible file URL for the API?

    We recommend Object Storage Service (OSS). It provides highly available, reliable storage and enables you to easily generate public URLs.

    Ensure your URL is accessible to Alibaba Cloud Model Studio services. Test it in a browser or with curl to confirm the file downloads successfully.
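The check can also be scripted. The following Python sketch sends a HEAD request, much like `curl -I`, using only the standard library. It reports HTTP errors and files whose Content-Length exceeds 10 MB. It only proves the URL is reachable from your machine. Model Studio fetches files from its own network, and some servers reject HEAD even when GET works. `check_public_url` is a hypothetical helper.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse

MAX_URL_FILE_BYTES = 10 * 1024 * 1024  # 10 MB limit for URL-based delivery


def check_public_url(url: str, timeout: float = 10.0) -> str:
    """Return 'ok' or a short reason why the URL looks unusable."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return "not an http(s) URL"
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            length = response.headers.get("Content-Length")
            if length is not None and int(length) > MAX_URL_FILE_BYTES:
                return "file is larger than 10 MB"
    except urllib.error.HTTPError as err:
        # A 403 often means the object is private or a signed URL has expired.
        return f"HTTP {err.code}"
    except urllib.error.URLError as err:
        return f"unreachable: {err.reason}"
    except OSError as err:  # for example, a timeout while reading the response
        return f"network error: {err}"
    return "ok"
```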

  2. How long is a file valid?

    • Uploaded via chat window: Valid only in the current session. Files expire when you close or refresh the page, or when the session times out.

    • Uploaded via file upload API (session_file_id): Typically valid for 24 hours.

    • Provided via URL (image_list, file_list): Validity depends entirely on your URL.

  3. Why did my file upload fail?

    Upload failures may result from several causes:

    • File issues: Check file size and format against the supported file formats.

    • Network issues: Check your network connection and connectivity to the service endpoint.

    • URL issues: Confirm the URL is publicly accessible and not a temporary signed URL.

    • API issues: Verify your authentication credentials and parameter formatting.

    • Error messages: Read the API response body. It usually identifies the exact issue.

  4. Why is the model’s answer incomplete or inaccurate?

    Possible reasons:

    • Content truncation: File content was cut off due to token limits. In full-text citation mode, increase Maximum Parse Length per File (tokens) and Maximum Assembly Length. For very long files, switch to chunked retrieval.

    • Vague questions: Be more specific. Clearly state what information you need so that the model better understands your intent.

    • Poor retrieval (chunked retrieval mode): Inaccurate answers may stem from low-quality retrieved fragments or suboptimal chunking. Refine your question for better targeting. Adjust the chunk size in your app settings to improve retrieval coverage.

    • Poor file quality: Parsers struggle with low-definition scans, documents with complex tables or formulas, or poorly encoded text files. Provide clear, simple source files whenever possible.