Integrate RAG with OpenAI-compatible interfaces-Platform For AI(PAI)-阿里云帮助中心

PAI-RAG is a modular, open-source framework that enhances Large Language Models (LLMs) like Qwen and DeepSeek with knowledge-grounded reasoning capabilities. It supports features such as knowledge base uploads, web search, and data analysis. PAI-RAG provides an OpenAI-compatible API, allowing you to configure and use the RAG service with a single click in web UIs and local LLM applications. This guide shows you how to integrate and use the RAG service in these environments.

Before you begin

Before you integrate the RAG service using its OpenAI-compatible API, complete the following prerequisites:

Deploy the RAG service and an LLM service, and configure the LLM service on the RAG service web UI. For more information, see Custom deployment of a RAG service.

Important
By default, Elastic Algorithm Service (EAS) services cannot access the internet. To enable the web search feature for the RAG service, you must grant your EAS service public or internal network access. For more information, see Access public or internal resources from an EAS service.
Obtain the endpoint and token for your RAG service. You will need these to connect to the RAG service from your web UI or local application.
1. On the Elastic Algorithm Service (EAS) page, click the name of your RAG service.
2. In the Basic Information section, click View Invocation Information. In the Invocation Information dialog box, you can find the public endpoint, VPC endpoint, and token on the Public Endpoint and VPC Direct Connect tabs.
Configure the following features on the RAG service web UI:
- To use the web search feature in your web UI or local application, on the Chat tab of the web UI, select the Default Web Search parameter and configure the search engine parameters. For more information, see RAG-based conversational system for LLMs (v0.3.x).
- To generate fact-based responses from your existing knowledge, upload files to your knowledge base on the Knowledge Base tab of the web UI.

Integrate the RAG service with Open-WebUI

Open-WebUI is an open-source project that lets you create custom user interfaces for various pre-trained language models. Follow these steps to connect to and use the RAG service for model inference.

Step 1: Deploy Open-WebUI and connect RAG

Deploy Open-WebUI using one of the following methods.

Deploy locally

In your local terminal, run the following commands to install and start Open-WebUI. Python 3.11 or later is recommended.

pip install open-webui
open-webui serve

Deploy with EAS

By deploying Open-WebUI with EAS, you can easily manage your deployment from code to cloud without worrying about the underlying infrastructure. Additionally, EAS lets you store backend data in Alibaba Cloud Object Storage Service (OSS) for data persistence and multi-user management.

Log on to the PAI console. Select a region on the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).
On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Custom Model Deployment section, click Deploy with JSON.

In the JSON editor, paste the following configuration. Modify the fields as needed based on the parameter descriptions below, and then click Deploy.

{
  "cloud": {
    "computing": {
      "instances": [
        {
          "type": "ecs.c6.large"
        }
      ]
    },
    "networking": {
      "security_group_id": "",
      "vpc_id": "",
      "vswitch_id": ""
    }
  },
  "containers": [
    {
      "env": [
        {
          "name": "ENABLE_LOGIN_FORM",
          "value": "True"
        },
        {
          "name": "ENABLE_SIGNUP",
          "value": "True"
        },
        {
          "name": "ENABLE_OPENAI_API",
          "value": "True"
        },
        {
          "name": "OPENAI_API_BASE_URL",
          "value": ""
        },
        {
          "name": "OPENAI_API_KEY",
          "value": ""
        },
        {
          "name": "WEBUI_AUTH",
          "value": "True"
        },
        {
          "name": "PORT",
          "value": "3000"
        },
        {
          "name": "RAG_EMBEDDING_MODEL",
          "value": "sentence-transformers/all-MiniLM-L6-v2"
        },
        {
          "name": "ENABLE_EVALUATION_ARENA_MODELS",
          "value": "False"
        },
        {
          "name": "WEBUI_URL",
          "value": "http://0.0.0.0:3000"
        },
        {
          "name": "ENABLE_TAGS_GENERATION",
          "value": "False"
        },
        {
          "name": "DEFAULT_USER_ROLE",
          "value": "admin"
        },
        {
          "name": "ENABLE_RAG_WEB_SEARCH",
          "value": "True"
        }
      ],
      "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/open-webui:main",
      "port": 3000,
      "script": "bash start.sh"
    }
  ],
  "metadata": {
    "cpu": 2,
    "enable_webservice": true,
    "instance": 1,
    "memory": 4000,
    "name": "dpsk_test_ui",
    "workspace_id": "484***"
  },
  "storage": [
    {
      "mount_path": "/app/backend/data",
      "oss": {
        "endpoint": "oss-cn-hangzhou-internal.aliyuncs.com",
        "path": "oss://examplebucket/test_web/",
        "readOnly": false
      },
      "properties": {
        "resource_type": "model"
      }
    }
  ]
}

The following table describes the key parameters.

Parameter		Description
networking	security_group_id	If you access the RAG service via a public IP address, you must select a Virtual Private Cloud (VPC), a vSwitch, and a security group that have public network access. For more information, see Access public or internal resources from EAS. If you access the RAG service using an internal endpoint, select the same Virtual Private Cloud (VPC) as the RAG service.
	vpc_id
	vswitch_id
metadata	name	A custom name for the EAS service.
metadata	workspace_id	The ID of your workspace. You can find this on the workspace details page.
containers	env	The following list describes the key environment variables. OPENAI_API_BASE_URL: Set this to the endpoint of the deployed RAG service with `/v1` appended to the end, for example, `http://test**0220.115770327099.cn-hangzhou.pai-eas.aliyuncs.com/v1`. Alternatively, you can configure this on the Open-WebUI page after startup. OPENAI_API_KEY: Set this to the token for your deployed RAG service. You can also configure this on the Open-WebUI page after deployment. WEBUI_AUTH**: Set to `True`: Enables user authentication. Users must log on to Open-WebUI, which facilitates user management. You must obtain a username and password from the Open-WebUI administrator in advance. Set to `False`: Disables user authentication. Users can access the Open-WebUI page without logging on.
containers	image	Replace the region ID in the image address with the corresponding region ID. For example, the image address for the China (Beijing) region is `eas-registry-vpc.cn-beijing.cr.aliyuncs.com/pai-eas/open-webui:main`. For more information about region IDs, see Regions and Endpoints.
storage	oss	Add an Object Storage Service (OSS) mount configuration. EAS stores backend data in OSS for data persistence and multi-user management. endpoint: The OSS endpoint. For a list of regions and their endpoints, see Regions and Endpoints. path: The OSS storage path. To learn how to create an OSS bucket and directory, see Quick Start.

After the service is deployed, click the service name, and then click View Web App in the upper-right corner.

Connect to the RAG service on the Open-WebUI page.

Navigate to the admin page. In the upper-right corner of the Open-WebUI page, click the user profile icon and select Admin Panel from the drop-down menu.

Connect to the PAI-RAG service. In the Admin Panel, click Settings > External Connections. In the OpenAI API section, turn on the connection switch, and then click + to add a connection. In the dialog box that appears, configure the connection parameters and click Save.

The following table describes the key parameters.

Parameter	Description
URL	The endpoint for the RAG service. Add the `/v1` suffix to the end. For example, `http://test**0220.115770327099.cn-hangzhou.pai-eas.aliyuncs.com/v1`. Note** A local deployment of Open-WebUI requires a public endpoint to access the RAG service. An EAS deployment of Open-WebUI can use a VPC endpoint. If public network access is enabled, you can also use a public endpoint.
Key	The token for the RAG service.
Model ID	A custom model ID, such as pai-rag-new. Use this ID to select the RAG service on the Open-WebUI page.

Step 2: Test results

In the upper-left corner of the Open-WebUI page, select the model ID that corresponds to the OpenAI API connection (for example, pai-rag-new) to invoke the service.

Enter a question in the chat window and send it to view the model's response. The conversation history is displayed on the left, and you can switch between different chat models in the upper-left corner.

Step 3: Use advanced PAI-RAG features

You can add filter functions in Open-WebUI to use advanced PAI-RAG features, such as data analysis. Follow these steps:

1. Configure RAG

To use the automated data analysis feature of PAI-RAG in the Open-WebUI chat interface, you must first configure it on the Data Analysis tab of the RAG service web UI. For more information, see Automated data analysis.

On the Data Analysis tab of the PAI-RAG console, configure the database connection information, including Database Type, Username, and Password. Set the LLM Data Analysis Guidance prompt, and then wait for the Data source loaded successfully message to appear.

2. Configure Open-WebUI

In the upper-right corner of the Open-WebUI page, click the user profile icon and select Admin Panel from the drop-down menu.
Click the function, and then click the plus sign on the right.
In the inlet method of the Filter class, add the required extension functionality (to use the data analysis feature, add body["chat_db"] = True), customize the function name and description in the upper-left corner, and then click the Save button in the lower-right corner.

Important
Currently, autonomous recognition of multiple functions is not supported. If you encounter interference, you can set other configurations to False. For example, body["chat_knowledgebase"] = False or body["search_web"] = False.

The system returns to the Functions page. You must enable the switch on the right to activate the new filter. A filter entry named chat_db appears in the function list.
On the left, click Workspace. In the model area on the right, click +. On the Model page, assign the new filter to a specific model. On the model configuration page, specify a custom model name, select chat_db in the Filters section, and then click Save and Update.
Return to the chat page and select the model to start a conversation using the advanced features of PAI-RAG. For example, if you select the chat_new_db model, you can ask database-related questions to get data analysis results.

Integrate RAG with local applications

This section shows how to connect to and use the RAG service, using three applications as examples.

Chatbox

Chatbox AI is an intelligent assistant and client application that supports various advanced AI models and APIs. It is available on Windows, macOS, Android, iOS, Linux, and the web.

Step 1: Connection method

This section shows how to connect Chatbox to the PAI-RAG service, using Windows as an example. The following steps are for reference only, and the actual process may vary.

Go to the Chatbox website, select the version for your operating system, and then download, install, and open Chatbox.
In the left-side navigation pane, click Settings, select Model Provider, and click Add in the drop-down list. In the Add Model Provider dialog box, configure the following parameters.
- Name: A custom name, such as PAI-RAG.
- API Mode: Select OpenAI API Compatible.

Select the newly configured model provider and configure the service request parameters.

Parameter	Description
API key	The token for the RAG service.
API domain	API Host: Set to the endpoint of the RAG service and append the `/v1` suffix. For example, `http://test**0220.115770327099.cn-hangzhou.pai-eas.aliyuncs.com/v1`. API Path**: Leave this blank.
Model	A custom model name, such as PAI-RAG.

Step 2: Test results

Once configured, you can invoke the service. Your results may vary. In the Chatbox chat window, enter a question and send it to view the model's response. The currently used model (such as PAI-RAG) is displayed in the lower-right corner.

Cherry Studio

Cherry Studio AI is a powerful multi-model AI assistant available for iOS, macOS, and Windows. It lets you quickly switch between various advanced LLMs to enhance work and learning productivity.

Step 1: Connection method

This section shows how to connect Cherry Studio to the PAI-RAG service. The following steps are for reference only, and the actual process may vary.

Go to the Cherry Studio website, download, install, and open Cherry Studio.

Connect to the PAI-RAG service. In the list of model providers on the left, select OpenAI. In the configuration panel on the right, configure the parameters as shown in the table below. In the Management section at the bottom, click Add to add a custom model name.

The following table describes the key parameters.

Parameter	Description
API key	The token for the RAG service.
API address	The endpoint of the RAG service.
Model	A custom model name, such as PAI-RAG.

Step 2: Test results

Once configured, you can invoke the service. Your results may vary.

On the Cherry Studio chat page, select the PAI-RAG model, enter a question, and send it to view the response.

AnythingLLM

AnythingLLM is an AI client application that supports various advanced AI models and APIs.

Step 1: Connection method

This section shows how to connect AnythingLLM to the PAI-RAG service, using Windows as an example. The following steps are for reference only, and the actual process may vary.

Go to the AnythingLLM website, select the version for your operating system, and then download, install, and open AnythingLLM.

First, create a new workspace and integrate the PAI-RAG service. Then, on the Chat settings page, click Update workspace at the bottom to update the workspace configuration. In the Chat settings tab, search for and select Generic OpenAI as the LLM provider, fill in the configuration parameters below, and click Save settings.

The following table describes the key parameters.

Parameter	Description
Base URL	Set this to the RAG service endpoint and append the `/v1` suffix. For example, `http://test**0220.115770327099**.cn-hangzhou.pai-eas.aliyuncs.com/v1`.
API key	The token for the RAG service.
Chat model name	A custom model name, such as PAI-RAG.

Step 2: Test results

Once configured, you can invoke the service. Your results may vary. In the workspace chat window, enter a question and send it to view the model's response.