Create and use a DTS RAGFlow knowledge base


This topic describes how to create and use a RAGFlow knowledge base with Data Transmission Service (DTS).

Prerequisites

  • You have created a vector database that meets the following requirements:

    Supported vector databases and their requirements:

    • AnalyticDB for PostgreSQL instance

      • Engine version: 7.0 Standard.

      • Kernel version: Must be upgraded to 7.2.1.2 or later.

      • Vector engine optimization: Must be enabled.

    • PolarSearch cluster: A PolarDB for MySQL cluster with the PolarSearch feature enabled.

    • Lindorm instance: The engine type must include Search Engine and Vector Engine.

    • PolarDB for PostgreSQL cluster: The PGVector plugin must be installed.

  • You have created an OSS bucket of the Standard storage class in the same region as the vector database. For the storage redundancy type, we recommend Zone-redundant Storage.

  • Region: This feature is available only in the China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Hong Kong), Singapore, and Indonesia (Jakarta) regions.
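Two of the prerequisites above, the minimum kernel version and the supported regions, are easy to check in a script before you open the console. The following is a minimal sketch, not part of DTS: the helper names are hypothetical, and the region IDs are the standard Alibaba Cloud identifiers for the regions listed above.

```python
from typing import Tuple

# Region IDs for the regions named in the prerequisites.
SUPPORTED_REGION_IDS = {
    "cn-hangzhou",     # China (Hangzhou)
    "cn-shanghai",     # China (Shanghai)
    "cn-beijing",      # China (Beijing)
    "cn-shenzhen",     # China (Shenzhen)
    "cn-hongkong",     # China (Hong Kong)
    "ap-southeast-1",  # Singapore
    "ap-southeast-5",  # Indonesia (Jakarta)
}

# Minimum AnalyticDB for PostgreSQL kernel version from the prerequisites.
MIN_KERNEL_VERSION = (7, 2, 1, 2)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '7.2.1.2' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def kernel_version_ok(version: str) -> bool:
    """Return True if the kernel version is 7.2.1.2 or later."""
    # Tuples compare element by element, so 7.10.0.0 ranks above 7.2.1.2.
    return parse_version(version) >= MIN_KERNEL_VERSION


def region_supported(region_id: str) -> bool:
    """Return True if the feature is available in the given region."""
    return region_id in SUPPORTED_REGION_IDS
```

Comparing versions as integer tuples rather than strings avoids ranking 7.10 below 7.2.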

Precautions

  • You cannot disable the public endpoint for a RAGFlow knowledge base after it is enabled.

  • A RAGFlow account is valid only for the RAGFlow knowledge base on which it was registered.

Billing

For more information, see Billing for Data Preparation.

Procedure

Create a RAGFlow knowledge base

  1. Go to the RAGFlow knowledge base list page for the destination region.

    1. Log on to the Data Transmission Service (DTS) console.

    2. In the left-side navigation pane, click Data Preparation.

    3. In the upper-left corner, select the region where the data preparation instance resides.

    4. Click the RAGFlow knowledge base tab.

  2. Click Create Knowledge Base to open the task configuration page.

  3. Configure the RAGFlow knowledge base.

    1. In the Deployment Scope section, enter an Instance Name for the RAGFlow knowledge base.

    2. In the Network and Zone section, select a VPC, a Primary Zone and vSwitch, and a Secondary Zone and vSwitch for the RAGFlow knowledge base.

    3. In the RAGFlow Knowledge Base Configuration section, enter the Number of Knowledge Base Services.

      Note

      In this example, the Configuration Plan is kept as Default.

    4. In the Vector Database Configuration section, configure the vector database.

      Note

      If you select Import from Existing Instance, enter the Database Name, Database Schema Name, and Database Account of the existing instance.

      ADB PostgreSQL

      Set Engine to AnalyticDB for PostgreSQL. In the Database field, select the destination AnalyticDB for PostgreSQL instance and enter the Database Name, Database Schema Name, Database Account, and Password for that instance.

      PolarSearch

      Set Engine to PolarSearch. In the Database field, select a PolarDB for MySQL cluster with PolarSearch enabled, and enter the Database Account and Password for that cluster.

      PolarDB PostgreSQL

      Set Engine to PolarDB PostgreSQL. In the Database field, select the destination PolarDB for PostgreSQL cluster and enter the Database Name, Database Schema Name, Database Account, and Password for that cluster.

      Lindorm

      Set Engine to Lindorm. In the Database field, select the destination Lindorm instance and enter the Database Account and Password for that instance.

    5. In the OSS Configuration section, select the destination bucket and enter the data storage path.

    Parameters

    • Billing Method: Only Pay-as-you-go is supported.

    • Region: The region where the RAGFlow knowledge base resides.

    Section: Deployment Scope

    • Deployment Scope: The default value is RAGFlow knowledge base.

    • Instance Name: The name of the RAGFlow knowledge base. Choose an easily identifiable name that reflects its business purpose.

    Section: Permission Check

    • SLR Authorization: Make sure that you have the AliyunServiceRoleForADBPG service-linked role for AnalyticDB for PostgreSQL.

    Section: Network and Zone

    • Network Type: The default value is VPC.

    • VPC: The VPC in which the RAGFlow knowledge base resides.

    • Primary Zone and vSwitch: The primary availability zone and vSwitch for the RAGFlow knowledge base.

    • Deployment Solution: Only multi-zone deployment is supported.

    • Secondary Zone and vSwitch: The secondary availability zone and vSwitch for the RAGFlow knowledge base.

      Note

      The secondary availability zone cannot be the same as the primary availability zone.

    Section: RAGFlow Knowledge Base Configuration

    • Configuration Plan: DTS supports the Default and Custom plans.

      Note

      If you select Default, you only need to set Number of Knowledge Base Services. The service specification currently supports only 4 vCPU, 16 GB memory.

    • Basic Knowledge Base Service Specifications: The specifications of the RAGFlow knowledge base basic services. Currently, only 4 vCPU, 16 GB Memory is supported.

    • Number of Knowledge Base Basic Services: The number of RAGFlow knowledge base basic services. The default value is 4.

      Note

      This number affects the fees for the RAGFlow knowledge base.

    • Repository Data Preparation Service Specification: The specifications of the RAGFlow knowledge base data preparation services. Currently, only 4 vCPU, 16 GB Memory is supported.

    • Number of Knowledge Base Data Preparation Services: The number of RAGFlow knowledge base data preparation services. The default value is 2.

      Note

      This number affects the fees for the RAGFlow knowledge base.

    Section: Vector Database Configuration

    • Vector database: Currently, only Import from Existing Instance is supported.

    • Engine: The type of the destination vector database.

    • Database: The destination vector database instance.

    • Database Name: The name of the database in the instance to receive data.

      Note

      This parameter is available only when Engine is set to AnalyticDB for PostgreSQL or PolarDB PostgreSQL.

    • Database Schema Name: The name of the schema in the database to receive data.

      Note

      • This parameter is available only when Engine is set to AnalyticDB for PostgreSQL or PolarDB PostgreSQL.

      • The current value is fixed at public and cannot be changed.

    • Database Account and Password: The username and password of the database account for the vector database instance.

    Section: OSS Configuration

    • OSS Bucket: The destination bucket.

    • Path: The path in the bucket where data is stored.

  4. After you complete the configuration, click Buy Now on the right side of the page.

  5. Return to the RAGFlow knowledge base list page and wait for the instance to start. When startup is complete, the Status changes to Running.

    Note

    You can click the refresh icon in the upper-right corner to view the latest status of the RAGFlow knowledge base.
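The constraints stated in step 3 (distinct primary and secondary zones, a database name and the fixed public schema only for the two PostgreSQL-based engines, and so on) can be checked before you fill in the console form. The following is a minimal sketch under stated assumptions: `KnowledgeBaseConfig` and `validate` are hypothetical helpers for planning purposes, not a DTS API, and the field names only mirror the console labels.

```python
from dataclasses import dataclass
from typing import List, Optional

# Engines for which Database Name and Database Schema Name are available.
ENGINES_WITH_SCHEMA = {"AnalyticDB for PostgreSQL", "PolarDB PostgreSQL"}
ALL_ENGINES = ENGINES_WITH_SCHEMA | {"PolarSearch", "Lindorm"}


@dataclass
class KnowledgeBaseConfig:
    """Hypothetical container mirroring the console fields in step 3."""
    instance_name: str
    vpc_id: str
    primary_zone: str
    secondary_zone: str
    engine: str
    database_account: str
    password: str
    oss_bucket: str
    oss_path: str
    database_name: Optional[str] = None
    schema_name: Optional[str] = None


def validate(cfg: KnowledgeBaseConfig) -> List[str]:
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []
    if not cfg.instance_name.strip():
        problems.append("Instance Name is empty.")
    if cfg.primary_zone == cfg.secondary_zone:
        problems.append("Secondary zone must differ from the primary zone.")
    if cfg.engine not in ALL_ENGINES:
        problems.append(f"Unsupported engine: {cfg.engine!r}.")
    elif cfg.engine in ENGINES_WITH_SCHEMA:
        if not cfg.database_name:
            problems.append("Database Name is required for this engine.")
        if cfg.schema_name != "public":
            problems.append("Database Schema Name is fixed at 'public'.")
    if not cfg.database_account or not cfg.password:
        problems.append("Database Account and Password are required.")
    if not cfg.oss_bucket:
        problems.append("OSS Bucket is required.")
    return problems
```

Returning every problem at once, rather than raising on the first one, lets you correct the whole plan in a single pass.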

Configure an IP whitelist

  1. In the Actions column of the target RAGFlow knowledge base, click Set up a white list.

  2. In the Set up a white list panel, add IP addresses to the whitelist based on your access method.

    • Access method: Internal network

      Example scenario: The client and the RAGFlow knowledge base are in the same VPC.

      IP whitelist: The private IP address or CIDR block of the client.

    • Access method: Internet

      Example scenario: The client is on your on-premises server.

      IP whitelist: The public IP address or CIDR block of the client.

    Description

    • Separate multiple IP addresses or CIDR blocks with commas (,).

    • To find the client's public IP address, run the curl ipinfo.io/ip (recommended) or curl ifconfig.me command.

  3. Click Yes.
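Before you paste entries into the whitelist panel, you can validate them and join them with commas in a script. The following is a minimal sketch that uses Python's standard `ipaddress` module. The `build_whitelist` helper and its `access_method` values are hypothetical, and the private/public check reflects the access-method guidance above rather than any validation that the console is known to perform.

```python
import ipaddress
from typing import Iterable


def build_whitelist(entries: Iterable[str], access_method: str) -> str:
    """Validate IP addresses or CIDR blocks and join them with commas.

    access_method is 'internal' (private addresses expected) or
    'internet' (public addresses expected).
    """
    if access_method not in ("internal", "internet"):
        raise ValueError(f"unknown access method: {access_method!r}")

    normalized = []
    for raw in entries:
        text = raw.strip()
        if "/" in text:
            # CIDR block; strict=False clears host bits (10.0.0.5/24 -> 10.0.0.0/24).
            item = ipaddress.ip_network(text, strict=False)
        else:
            item = ipaddress.ip_address(text)  # raises ValueError if malformed

        if access_method == "internal" and not item.is_private:
            raise ValueError(f"{text} is not a private address")
        if access_method == "internet" and item.is_private:
            raise ValueError(f"{text} is not a public address")
        normalized.append(str(item))

    return ",".join(normalized)
```

For example, `build_whitelist(["192.168.1.10", "10.0.0.5/24"], "internal")` returns `192.168.1.10,10.0.0.0/24`.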

Log on to RAGFlow

  1. In the Actions column of the target RAGFlow knowledge base, click Manage.

    Note

    You can also click Actions in the Login to Knowledge Base column and choose to log on over the internal network or the internet.

  2. In the Endpoint section, click Login external network address or Login Intranet Address.

    Note

    To access the RAGFlow knowledge base over the internet, you must enable the public endpoint for the instance.

  3. On the logon page, enter the email address and password for your account, and then click Login.

  4. On the RAGFlow page, manage knowledge bases and perform other operations.

    Note

    For more information, see the official RAGFlow documentation.

(Optional) Network configuration

By default, RAGFlow cannot access external networks. To add model providers in RAGFlow, you must configure a NAT gateway for the VPC that hosts the vector database used by RAGFlow. This allows the RAGFlow knowledge base to access external models.

  • Connect over a private network (Alibaba Cloud Model Studio)

    Accessing Alibaba Cloud Model Studio over a private network improves data transfer security and efficiency. You can use PrivateLink to establish a network connection between your VPC and Alibaba Cloud Model Studio. For detailed instructions, see Access Model Studio models and application APIs over a private network.

  • Connect over the internet

    Configure a NAT gateway for the VPC that hosts the vector database used by RAGFlow to allow access to external models. For more information about NAT gateways, see Public NAT gateway.
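After you configure the NAT gateway or PrivateLink, a quick TCP check from a host in the same VPC can indicate whether a model provider's endpoint is reachable. The following is a minimal sketch under stated assumptions: `can_reach` is a hypothetical helper, the host name in the usage note is a placeholder, and a result from your own host only approximates what the managed RAGFlow service can reach.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens the socket;
        # the with-block closes it again as soon as the check is done.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `can_reach("model-provider.example.com")` checks port 443 on a placeholder host. Replace it with the endpoint of the provider that you plan to add.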

Appendix

Enable the public endpoint

  1. In the Actions column of the target RAGFlow knowledge base, click Manage.

  2. In the Endpoint section, click Open external network address.

  3. In the Open external network address dialog box that appears, click OK.

  4. Wait for the public endpoint to be enabled. When the operation is complete, the Status in the Basic Information section changes to Running.
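If you track the status from a script rather than by refreshing the console, the wait reduces to a polling loop with a timeout. The following is a minimal sketch: this topic does not describe an API for reading the status, so `fetch_status` is a hypothetical callable that you would supply yourself, and only the `Running` value comes from the steps above.

```python
import time
from typing import Callable


def wait_until_running(
    fetch_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch_status until it returns 'Running' or the timeout expires.

    fetch_status is a caller-supplied placeholder; this sketch does not
    call any DTS API. Returns True on success and False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch_status() == "Running":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

The `sleep` parameter is injectable so that the loop can be exercised in tests without real delays.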

Register a RAGFlow account

  1. Go to the RAGFlow logon page for the target RAGFlow knowledge base.

  2. On the RAGFlow logon page, click Register.

  3. Enter the email address, name, and password for the account.

  4. Click Continue.

    A message appears at the top of the page, indicating that the account has been registered.