PAI-RAG Web UI configuration

Configure knowledge bases, models, search services, Code sandbox, MCP tools, file chunking, and FAQs in the PAI-RAG Web UI.

Configure models

Click Settings > Model in the lower-left corner. On the LLM tab, add a model.

Note

All-in-one deployments generate a model configuration record automatically. You can add models from other sources.

  • Model ID: Identifies a model configuration.

  • Endpoint URL: The service endpoint of the model.

    Note
    • Alibaba Cloud Model Studio models are billed separately. For details, see Billing of Alibaba Cloud Model Studio.

    • For EAS model services, go to the service details page, click Basic Information, and then click View Endpoint Information. Append /v1 to the endpoint URL.

  • API Key: For Alibaba Cloud Model Studio, obtain an API key. For EAS services, enter the token from the endpoint information.

  • Model Name: The model name. For EAS services using the vLLM inference engine, enter the exact model name from the /v1/models API. For other deployment modes, set this to default.

  • Multimodal Model: Select this option if the model supports multimodal input. Disabled by default.

  • Thinking Model: Enables or disables thinking mode for models that support it. Disabled by default.
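
The Endpoint URL and Model Name rules above can be checked with a small script before you enter the values in the Web UI. The following is a minimal sketch, not part of PAI-RAG: the helper names `build_base_url` and `model_ids` are hypothetical, the example host is a placeholder, and the response shape assumes the OpenAI-style `/v1/models` format that vLLM exposes.

```python
from urllib.parse import urlsplit, urlunsplit


def build_base_url(endpoint: str) -> str:
    """Return the endpoint with exactly one trailing /v1 path segment."""
    parts = urlsplit(endpoint.strip())
    path = parts.path.rstrip("/")
    if not path.endswith("/v1"):
        path += "/v1"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def model_ids(models_response: dict) -> list[str]:
    """Extract model names from an OpenAI-style /v1/models response body.

    Assumed shape: {"object": "list", "data": [{"id": "<model name>"}, ...]}
    """
    return [item["id"] for item in models_response.get("data", [])]


if __name__ == "__main__":
    # Placeholder endpoint and response, for illustration only.
    print(build_base_url("http://example.com/api/predict/my_service"))
    print(model_ids({"object": "list", "data": [{"id": "my-model"}]}))
```

For a vLLM-based EAS service, one of the names returned by `model_ids` is the value to enter in Model Name.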

After configuring the model, test it. Click New Chat in the left navigation pane, select the model at the top of the chat page, and start a conversation.

Configure MCP

Click Settings > MCP in the lower-left corner to add an MCP service.

  • MCP Link: The MCP service endpoint URL.

  • MCP Type: SSE, STDIO, or Streamable HTTP.

  • Bearer Token: (Optional) Access token for Bearer token authentication.
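
The three fields above can be modeled as a small record to show how they relate. This is an illustrative sketch only: the `McpConfig` class and its methods are hypothetical and not part of PAI-RAG. The `Authorization: Bearer <token>` header follows the standard Bearer token scheme, and the example URL is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional

# The MCP types listed in the Web UI.
MCP_TYPES = {"SSE", "STDIO", "Streamable HTTP"}


@dataclass
class McpConfig:
    link: str                           # MCP Link
    mcp_type: str                       # MCP Type
    bearer_token: Optional[str] = None  # Bearer Token (optional)

    def validate(self) -> None:
        """Raise ValueError if a field is missing or the type is unknown."""
        if not self.link:
            raise ValueError("MCP Link must not be empty")
        if self.mcp_type not in MCP_TYPES:
            raise ValueError(f"Unsupported MCP Type: {self.mcp_type!r}")

    def headers(self) -> dict:
        """Return HTTP headers; the Authorization header is added only if a token is set."""
        if self.bearer_token:
            return {"Authorization": f"Bearer {self.bearer_token}"}
        return {}


if __name__ == "__main__":
    config = McpConfig("https://example.com/mcp", "Streamable HTTP", "my-token")
    config.validate()
    print(config.headers())
```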

Configure search

When the knowledge base does not cover a question or real-time information is needed, enable a search service (Tavily or Alibaba Cloud General Search) as a supplement.

Click Settings > Search in the lower-left corner.

Tavily search

Register at the official Tavily website and obtain an API key.

Alibaba Cloud General Search

General Search Endpoint:

AccessKey ID and AccessKey secret:

  • Create a RAM user and grant permissions. For access mode, select Use permanent AccessKey for access. After the user is created, copy the AccessKey ID and AccessKey secret.

  • Grant the AliyunIQSFullAccess permission to the RAM user. Without this permission, the search feature returns errors.

Configure the Code sandbox

The Code sandbox provides a secure Python execution environment. When enabled, the AI assistant automatically invokes the Code sandbox to run code.

Scenarios

  • Data analytics: Statistics, aggregation, and filtering. Example: "Analyze sales data and calculate the average sales per region."

  • Data visualization: Chart generation and trend plotting. Example: "Plot the sales trend for the past year."

  • Mathematical operations: Complex calculations and equation solving. Example: "Calculate the standard deviation of this sequence."

  • File processing: Parse CSV, Excel, and other files to extract and transform data.

  • Other tasks that require code execution
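
To make the scenarios concrete, the following sketch shows the kind of Python code the assistant might generate and run in the sandbox for the "average sales per region" and "standard deviation" prompts. The data and function names are invented for illustration, and the code the assistant actually produces will differ.

```python
from collections import defaultdict
from statistics import mean, pstdev


def average_by_region(rows):
    """Group (region, amount) pairs by region and return the mean per region."""
    grouped = defaultdict(list)
    for region, amount in rows:
        grouped[region].append(amount)
    return {region: mean(amounts) for region, amounts in grouped.items()}


if __name__ == "__main__":
    # Made-up sample data.
    sales = [("North", 120.0), ("South", 80.0), ("North", 100.0), ("South", 60.0)]
    print(average_by_region(sales))

    # Population standard deviation of a sequence.
    print(pstdev([2, 4, 4, 4, 5, 5, 7, 9]))
```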

Prerequisites

Complete the following before configuring the Code sandbox:

  1. Activate Function Compute: Go to the Function Compute console and follow the prompts to activate it.

  2. Create an AgentRun interpreter: Go to the AgentRun console. Choose Sandbox in the left navigation pane. Create a sandbox template with type Code Interpreter.

    Note

    The default Network Type is Allow the default NIC to access the public network, which requires the RAG service to have public network access.

    Alternatively, select Allow access to VPC and configure the same VPC for the RAG service.

  3. Obtain the Alibaba Cloud account ID and Sandbox ID. If you configured access credentials, also obtain an API key.

Configuration

Click Settings > Code Sandbox in the lower-left corner and configure the following parameters:

  • Enable Sandbox: Turns the sandbox feature on or off.

  • Sandbox Type: Only Alibaba Cloud FC sandboxes are supported.

  • Alibaba Cloud ID: Your Alibaba Cloud account ID.

  • Interpreter ID: The sandbox ID.

  • Interpreter Name: The name of the code interpreter.

  • API Key: The API key used for identity verification. Required only if you configured access credentials for the sandbox.

  • Default Timeout (s): Maximum code execution duration in seconds. Default: 50.

Configure the file chunking policy

Chunking settings determine how documents in a knowledge base are split into chunks for vectorization and retrieval. Well-chosen settings improve retrieval hit rate and answer quality.

Chunking can be configured at two levels:

  • Knowledge base level: Uploaded files use the knowledge base chunking settings by default.

  • File level:

    • Specify chunking parameters when uploading a file.

    • Specify chunking settings when re-parsing an existing file.

Prerequisites

Before configuring chunking, ensure the following:

  • A knowledge base: An active knowledge base exists in the system.

  • An embedding configuration: At least one embedding model is configured.

  • (Optional) A multimodal model: Required for image understanding. Configure a vision model in the system.

Parameter description

Parameter

Description

Chunk Type

Select a chunk type based on document characteristics:

  • Structured: Splits by structure (titles, paragraphs). Suitable for most Markdown, PDF, and Word files.

  • By token: Splits by token count. Suitable for strict length requirements.

  • Table: Splits by table structure. Suitable for Excel and CSV files. Configure Max table header row index, Merge rows, Row delimiter, and Format as JSON.

  • Paragraph: Splits by custom separator, such as \n\n. You can also set chunk size and chunk overlap.

Chunk Size

Maximum length of each chunk in characters or tokens, depending on the chunk type. Recommended: 1000.

Chunk Overlap

Overlap length between adjacent chunks. Preserves context and prevents semantic truncation. Recommended: 50.

Important

Chunk size must exceed chunk overlap.

Image Understanding Model

  • Interprets image content in documents, such as embedded images in PDFs or Markdown files.

  • Select a configured vision model from the drop-down list, or choose not to use one.

  • When selected, images are interpreted during parsing and included in chunking and retrieval.

Vector Model

  • Converts chunked text and image understanding results into vectors for retrieval.

  • Select an embedding model configured in the system, such as BAAI/bge-m3.

Note

Tuning suggestion: Upload a few documents with default settings first. Analyze recall and accuracy in the evaluation module, then adjust parameters based on the results.
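
To illustrate how Chunk Size and Chunk Overlap interact, here is a minimal sliding-window sketch that splits by character count. It is not the PAI-RAG implementation, and the function name `chunk_text` is hypothetical. It also shows why chunk size must exceed chunk overlap: otherwise the window would never advance.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the previous
    one, so adjacent chunks share chunk_overlap characters.
    """
    if chunk_overlap < 0:
        raise ValueError("chunk_overlap must not be negative")
    if chunk_size <= chunk_overlap:
        raise ValueError("chunk_size must exceed chunk_overlap")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1):
        print(chunk)
```

With the recommended values of 1000 and 50, each chunk starts 950 characters after the previous one.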

Usage suggestions

  • RAG for long documents: Set Chunk Type to Structured, Chunk Size to 1000, and Chunk Overlap to 50. This controls chunk length while preserving context.

  • Strict chunking by separator: Use Paragraph chunking with a custom separator.

  • Tabular data: Set Chunk Type to Table. Configure the table header row and row delimiter for Excel or CSV files. Without Merge rows, data is chunked by row. With Merge rows, rows are merged into blocks based on the chunk size limit.

  • Multimodal documents: Enable Image Understanding Model to include image content from PDFs and other documents in retrieval and answer generation.
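
The difference between row-by-row chunking and Merge rows for tabular data can be sketched as follows. This is an illustration under assumptions, not the PAI-RAG implementation: the function `chunk_rows` is hypothetical, rows are pre-serialized strings, and size is measured in characters.

```python
def chunk_rows(rows: list[str], chunk_size: int, merge_rows: bool,
               row_delimiter: str = "\n") -> list[str]:
    """Chunk table rows, optionally packing rows into blocks up to chunk_size."""
    if not merge_rows:
        # Without Merge rows: one chunk per row.
        return list(rows)

    chunks = []
    current = ""
    for row in rows:
        candidate = row if not current else current + row_delimiter + row
        if current and len(candidate) > chunk_size:
            # Adding this row would exceed the limit, so close the current block.
            chunks.append(current)
            current = row
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    rows = ["a,1", "b,2", "c,3"]
    print(chunk_rows(rows, chunk_size=7, merge_rows=False))
    print(chunk_rows(rows, chunk_size=7, merge_rows=True))
```

In this sketch, a single row longer than the chunk size still becomes its own chunk rather than being split.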

Configure the application FAQ

The FAQ feature maintains a question-and-answer knowledge base for each application. Use it for product manuals, customer service scripts, and frequently asked questions.

With FAQ enabled and entries configured, conversations follow this flow:

  1. The AI assistant retrieves the FAQ entry most similar to the user's question.

  2. If a match meets the similarity threshold, the system either returns the FAQ answer directly or generates an answer with the model based on the FAQ result, depending on the configuration.

  3. If no match is found or the direct return option is not enabled, the system falls back to other capabilities such as the knowledge base and search.
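
The flow above can be sketched as a small routing function. This is one possible reading of the steps, not the PAI-RAG implementation: the function names, the use of cosine similarity, and the returned labels (`faq_direct`, `generate_with_faq`, `fallback`) are assumptions made for illustration.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(query_vec, faq_entries, threshold: float = 0.8, direct_return: bool = True):
    """Pick a handling path for a query.

    faq_entries is a list of (embedding, answer) pairs.
    Returns (path, answer), where path is one of the assumed labels.
    """
    best_score, best_answer = 0.0, None
    for vec, answer in faq_entries:
        score = cosine(query_vec, vec)
        if score > best_score:
            best_score, best_answer = score, answer

    if best_answer is None or best_score < threshold:
        # No FAQ entry is similar enough: use the knowledge base, search, and so on.
        return ("fallback", None)
    if direct_return:
        return ("faq_direct", best_answer)
    # Pass the FAQ result to the model as context for a generated answer.
    return ("generate_with_faq", best_answer)


if __name__ == "__main__":
    faq = [([1.0, 0.0], "Answer A"), ([0.0, 1.0], "Answer B")]
    print(route([1.0, 0.1], faq))
    print(route([1.0, 1.0], faq))
```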

Configuration:

  1. Log on to the system and go to the configuration page of the target application.

  2. Enable FAQ: Turn on the Enable FAQ switch in the application configuration and save.

  3. Open the FAQ management page to perform the following:

    • FAQ Reply Settings: Click Settings. Configure the similarity score threshold (0.8 to 1.0 recommended), the embedding model, whether to include questions or answers in retrieval and display, and whether to return tool results directly.

    • Manage FAQs:

      • Add, edit, or delete a single FAQ entry.

      • Batch delete entries.

      • Batch import: Upload an Excel file, map the question and answer columns, and import entries.

  4. After saving, conversations in this application automatically prioritize FAQ retrieval.