Hands-on Guide: Build GraphRAG for AnalyticDB for PostgreSQL Using DTS RAGFlow

This topic describes how to use DTS RAGFlow with AnalyticDB for PostgreSQL to build GraphRAG. With this solution, you can build a knowledge graph for a knowledge base. The graph enables deep understanding and precise retrieval for questions about complex relationships and overcomes the limitations of traditional vector search.

Note

This feature is currently in private preview. To use this feature, submit a ticket to enable it for your account.

Use cases

Traditional Retrieval-Augmented Generation (RAG) solutions often fail to answer complex queries when a knowledge base contains deep logical connections, such as in financial reports or technical manuals. These queries require multi-step reasoning or relationship mining that goes beyond simple vector similarity.

GraphRAG enhances traditional vector search by structuring the implicit entities and relationships within unstructured text as an explicit knowledge graph. This enables the system to use graph search to locate relevant, structured subgraphs. By providing the Large Language Model (LLM) with this richer, more contextual information, GraphRAG significantly improves the quality of answers to complex, multi-hop queries.
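To make "multi-hop" concrete: a question such as "Where is the supplier of the company that CompanyA acquired located?" can only be answered by chaining three separate facts, which may sit in chunks that share little vocabulary with the question. Once those facts are stored as explicit subject-predicate-object triples, a graph traversal can recover the chain directly. The sketch below is a hypothetical, in-memory illustration of that idea using breadth-first search. The company names are made up, and the code does not describe how the graph analytics engine of AnalyticDB for PostgreSQL is implemented.

```python
from collections import deque

# Made-up triples for illustration only; not output from any real extractor.
TRIPLES = [
    ("CompanyA", "acquired", "CompanyB"),
    ("CompanyB", "supplies", "CompanyC"),
    ("CompanyC", "located_in", "Hangzhou"),
]


def build_adjacency(triples):
    """Index (subject, predicate, object) triples as an undirected adjacency list."""
    adj = {}
    for subj, pred, obj in triples:
        adj.setdefault(subj, []).append((pred, obj))
        adj.setdefault(obj, []).append((pred, subj))
    return adj


def find_path(adj, start, goal):
    """Breadth-first search for the shortest chain of hops from start to goal.

    Returns a list of (from, predicate, to) hops in traversal order, or None if
    goal is unreachable. An edge walked against its original direction appears
    with subject and object swapped.
    """
    parent = {start: None}  # node -> (previous node, predicate used to reach it)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            hops = []
            while parent[node] is not None:
                prev, pred = parent[node]
                hops.append((prev, pred, node))
                node = prev
            return hops[::-1]
        for pred, nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, pred)
                queue.append(nxt)
    return None


hops = find_path(build_adjacency(TRIPLES), "CompanyA", "Hangzhou")
print(" -> ".join([hops[0][0]] + [to for _, _, to in hops]))
# prints: CompanyA -> CompanyB -> CompanyC -> Hangzhou
```

A vector search would have to surface all three supporting chunks for the same query; the traversal only needs the two endpoint entities.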

Solution architecture

This solution integrates the data processing capabilities of DTS RAGFlow with the graph analytics engine of AnalyticDB for PostgreSQL to create an end-to-end pipeline from document ingestion to knowledge graph construction and intelligent Q&A.

The workflow is as follows:

Data ingestion and processing: Upload unstructured documents to a DTS RAGFlow knowledge base.
Knowledge extraction and storage:
- RAGFlow automatically parses documents, performs chunking, and generates embeddings.
- When GraphRAG is enabled, RAGFlow invokes a knowledge extraction operator to extract Subject-Predicate-Object (S-P-O) triples from the text.
- The vectorized text chunks and the extracted knowledge graph data (entities and edges) are written to the specified AnalyticDB for PostgreSQL instance for unified storage.
Hybrid search:
- The system performs a hybrid search when you run a retrieval test or submit a query through an API.
- The system performs a vector search to find relevant text chunks.
- In parallel, the system performs a graph search in the graph analytics engine of AnalyticDB for PostgreSQL to find relevant entities and their associated subgraphs.
Context augmentation and generation: The system submits the text chunks from the vector search and the associated subgraphs from the graph search to the LLM as context. The LLM then generates the final answer based on this enriched information.
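The hybrid search and context augmentation steps can be pictured as two independent lookups whose results are concatenated into the prompt. The toy sketch below shows that shape under loud assumptions: 2-dimensional vectors stand in for embeddings, query entities are supplied by hand instead of being recognized from the question, graph search returns only one-hop edges, and all function names are hypothetical. In the actual solution, embeddings come from your configured model and graph search runs in AnalyticDB for PostgreSQL.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, chunks, top_k=2):
    """Return the text of the top_k chunks ranked by cosine similarity to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["text"] for c in ranked[:top_k]]


def graph_search(query_entities, edges):
    """Return every (subject, predicate, object) edge that touches a query entity."""
    return [e for e in edges if e[0] in query_entities or e[2] in query_entities]


def build_context(chunk_texts, subgraph):
    """Merge both result sets into one context string to place in the LLM prompt."""
    lines = ["Text chunks:"] + [f"- {t}" for t in chunk_texts]
    lines += ["Relations:"] + [f"- {s} --{p}--> {o}" for s, p, o in subgraph]
    return "\n".join(lines)


# Toy data: 2-dimensional vectors stand in for real embeddings.
CHUNKS = [
    {"text": "RAG retrieves chunks by vector similarity.", "vec": [1.0, 0.0]},
    {"text": "GraphRAG adds a knowledge graph.", "vec": [0.7, 0.7]},
    {"text": "The cafeteria opens at 8:00.", "vec": [0.0, 1.0]},
]
EDGES = [("GraphRAG", "extends", "RAG"), ("Paris", "capital_of", "France")]

context = build_context(vector_search([1.0, 0.1], CHUNKS), graph_search({"GraphRAG"}, EDGES))
print(context)
```

Because neither lookup depends on the other's result, the two searches can run concurrently before the context is assembled.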

Procedure

Step 1: Prepare the environment

Create an AnalyticDB for PostgreSQL instance.
1. The kernel version must be 7.3 or later.
2. Enable the GraphRAG service by installing the corresponding plugin.
In Data Transmission Service (DTS), create a RAGFlow knowledge base and configure an IP allowlist.
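Two prerequisites in this step involve comparisons that are easy to get wrong by eye: version numbers (compared as strings, "7.10" sorts before "7.3") and allowlist CIDR blocks. The helpers below are a hypothetical preflight sketch that uses only the Python standard library. They assume you already have the kernel version as a dotted string and the allowlist as a list of entries; how to retrieve those values is not covered here, and the sample addresses are made up.

```python
import ipaddress


def version_at_least(version, minimum=(7, 3)):
    """Numerically compare a dotted version string such as "7.3.1" with a minimum.

    Components are compared as integers, so "7.10" counts as newer than "7.3"
    (a plain string comparison would get this wrong). Parsing stops at the first
    non-numeric component.
    """
    parts = []
    for piece in version.strip().lstrip("vV").split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts) >= tuple(minimum)


def ip_allowed(ip, allowlist):
    """Return True if ip is covered by any allowlist entry (CIDR block or single address)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(entry, strict=False) for entry in allowlist)


# Hypothetical values; read the real ones from your instance and allowlist settings.
print(version_at_least("7.3.1"))                       # True
print(ip_allowed("192.168.1.20", ["192.168.1.0/24"]))  # True
```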

Step 2: Configure knowledge base and enable GraphRAG

In this step, you associate the DTS RAGFlow knowledge base with your AnalyticDB for PostgreSQL instance.

Go to the RAGFlow knowledge base list page for the destination region.
1. Log on to the Data Transmission Service (DTS) console.
2. In the left-side navigation pane, click Data Preparation.
3. In the upper-left corner, select the region where the data preparation instance resides.
4. Click the RAGFlow knowledge base tab.
Log on to RAGFlow.
1. In the Actions column of the target RAGFlow knowledge base, click Manage.
  
  Note
  Alternatively, in the Login to Knowledge Base column of the target knowledge base, choose to log on over the internal network or the internet.
2. In the Endpoint section, click Login external network address or Login Intranet Address.
  
  Note
  To access the RAGFlow knowledge base over the internet, you must enable the public endpoint for the instance.
3. On the logon page, enter the email address and password for your account, and then click Login.
4. On the RAGFlow page, manage knowledge bases and perform other operations.
  
  Note
  For more information, see the official RAGFlow documentation.
Click Create Knowledge Base. On the page that appears, enter a name for the knowledge base and turn on the Enable GraphRAG switch.
Note
- Before you create a knowledge base, add your models in Model Factory, and then select the models to use under System Settings > Model Factory.
- If you need to use external models and LLMs in your DTS RAGFlow knowledge base, you must configure a NAT Gateway for the knowledge base's Virtual Private Cloud (VPC) to allow access to external networks.
  - Create an Internet NAT gateway: Go to the NAT Gateway purchase page to create a gateway. During creation, make sure to select the same VPC and VSwitch as your DTS RAGFlow knowledge base.
  - Configure an SNAT entry: Go to the Internet NAT Gateway page. In the Actions column of the target gateway, click Configure SNAT, and then click Create SNAT Entry. Configure the parameters as follows:
    
    SNAT entry granularity: VPC granularity.
    
    Select Elastic IP Address: Select an Elastic IP Address (EIP) from the drop-down list.
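If you prefer to script knowledge base creation, open-source RAGFlow provides an HTTP API in which a knowledge base is called a dataset. The sketch below only constructs the request with the standard library and sends nothing. It assumes, without confirmation from this guide, that DTS RAGFlow exposes the same `/api/v1/datasets` endpoint and that GraphRAG is controlled by a `use_graphrag` flag inside `parser_config`. The base URL, API key, and dataset name are placeholders, and the model prerequisites in the note above still apply.

```python
import json
import urllib.request


def build_create_dataset_request(base_url, api_key, name, use_graphrag=True):
    """Build, but do not send, a request that creates a dataset with GraphRAG on.

    ASSUMPTION: the path /api/v1/datasets, Bearer-token auth, and the
    parser_config.graphrag.use_graphrag field follow the open-source RAGFlow
    HTTP API. Check them (and whether a partial parser_config is merged with
    defaults) against the API reference for your RAGFlow version.
    """
    body = {
        "name": name,
        "parser_config": {"graphrag": {"use_graphrag": use_graphrag}},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/datasets",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder endpoint and key: substitute your knowledge base's address and API key.
req = build_create_dataset_request("http://ragflow.example.internal", "YOUR_API_KEY", "finance_reports")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```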

Step 3: Upload documents and build knowledge graph

In the newly created knowledge base, go to the Dataset tab and click Add File to upload a local file.
After the file is uploaded, RAGFlow automatically parses, chunks, and embeds it. Because GraphRAG is enabled, the system also extracts knowledge from the file and writes the resulting entity and edge data to the configured AnalyticDB for PostgreSQL instance to build the knowledge graph. When processing is complete, the parsing status of the file in the file list changes to Succeeded. Click the record to view the parsing logs and confirm that both document parsing and the knowledge graph upload to AnalyticDB for PostgreSQL are complete.
After the knowledge graph is built, a Knowledge Graph component appears in the left-side navigation pane. Click Knowledge Graph, and then click GraphRAG Graph to view a visualization of the extracted entity nodes and their relationship network. Use this view to browse and verify the extracted entities and relationships.
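Extraction usually yields overlapping triples: the same entity, or the same pair of entities, can appear in many chunks. Building a graph from them therefore means merging duplicates into one node per entity and one edge per entity pair. The sketch below illustrates that merge on made-up triples. It is a conceptual aid only, and its row fields (`name`, `mentions`, `source`, `target`, `description`) are hypothetical rather than the schema RAGFlow actually writes to AnalyticDB for PostgreSQL.

```python
def triples_to_graph(triples):
    """Merge extracted triples into one row per entity and one row per entity pair.

    triples: iterable of (subject, predicate, object, description) tuples.
    Returns (entity_rows, edge_rows) as lists of dicts sorted by key. The row
    fields are hypothetical and do not reflect any real storage schema.
    """
    mentions = {}  # entity name -> number of triples that mention it
    edges = {}     # (source, target) -> list of "predicate: description" strings
    for subj, pred, obj, desc in triples:
        for name in (subj, obj):
            mentions[name] = mentions.get(name, 0) + 1
        edges.setdefault((subj, obj), []).append(f"{pred}: {desc}")
    entity_rows = [{"name": n, "mentions": c} for n, c in sorted(mentions.items())]
    edge_rows = [
        {"source": s, "target": t, "description": "; ".join(d)}
        for (s, t), d in sorted(edges.items())
    ]
    return entity_rows, edge_rows


# Made-up triples, as if extracted from two different chunks.
entities, edges = triples_to_graph([
    ("RAG", "uses", "vector search", "retrieves chunks by similarity"),
    ("GraphRAG", "extends", "RAG", "adds a knowledge graph"),
    ("GraphRAG", "extends", "RAG", "supports multi-hop queries"),
])
print(len(entities), len(edges))  # 3 2
```

Merging descriptions per entity pair is what lets a single edge carry evidence gathered from several chunks.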

Step 4: Test retrieval

Perform comparative tests to verify how effectively GraphRAG handles different types of queries.
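The comparison can also be scripted so that the same question runs with and without GraphRAG. The sketch below builds both requests with the Python standard library and sends nothing. It assumes, without confirmation from this guide, that DTS RAGFlow exposes open-source RAGFlow's `/api/v1/retrieval` endpoint and its `use_kg` flag as the API counterpart of the Use GraphRAG checkbox. The base URL, API key, and dataset ID are placeholders.

```python
import json
import urllib.request


def build_retrieval_request(base_url, api_key, question, dataset_ids, use_kg):
    """Build, but do not send, a retrieval request for one question.

    ASSUMPTION: the path /api/v1/retrieval and the question, dataset_ids, and
    use_kg fields follow the open-source RAGFlow HTTP API, where use_kg plays
    the role of the Use GraphRAG checkbox. Verify them for your RAGFlow version.
    """
    body = {"question": question, "dataset_ids": dataset_ids, "use_kg": use_kg}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/retrieval",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Same question with and without GraphRAG; endpoint, key, and dataset ID are placeholders.
question = "Compare the pros and cons of RAG and GraphRAG"
pair = [
    build_retrieval_request("http://ragflow.example.internal", "YOUR_API_KEY",
                            question, ["YOUR_DATASET_ID"], use_kg=flag)
    for flag in (False, True)
]
# Send each with urllib.request.urlopen(req) and compare the returned results.
```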

Go to the Retrieval Test page of the knowledge base.
Scenario 1: Standard query

For simple, fact-based queries (for example, "What is RAG?"), the system primarily relies on vector search to return the most relevant text chunks, regardless of whether you use GraphRAG.
Scenario 2: Complex query
- For complex queries that require understanding relationships between entities (for example, "Compare the pros and cons of RAG and GraphRAG"), select the Use GraphRAG checkbox.
  
  The search results then include not only relevant text chunks but also a structured subgraph associated with the query. The subgraph is presented as tables of comma-separated values that list the core entities involved in the query and the relationships among them. This gives the LLM structured context, enabling it to generate a more logical and in-depth answer.
  
  The returned subgraph is formatted as follows:
```
---- Entities ----
,Entity,Description
0,entity1,description1
1,entity2,description2
---- Relations ----
,From Entity,To Entity,Description
0,source_entity1,target_entity1,description1
1,source_entity2,target_entity2,description2
```
- If you do not select the Use GraphRAG checkbox, the search results contain only text chunks, and the LLM may fail to provide an accurate answer because it lacks structured relational information.
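If you consume retrieval results programmatically, the returned subgraph text can be converted back into records. The parser below is a sketch that assumes the layout shown in the example above: a `---- Entities ----` or `---- Relations ----` marker, a CSV header whose first column is an unnamed row index, and one record per line. It accepts the marker and the column header either on one line or on separate lines. The function name is hypothetical, and a description containing a line break would violate the one-record-per-line assumption.

```python
import csv

SECTIONS = ("Entities", "Relations")


def parse_subgraph(text):
    """Parse the Entities/Relations text returned with Use GraphRAG into dicts.

    Accepts the section marker and the CSV column header either on one line or
    on separate lines. The first CSV column is an unnamed row index and is
    dropped. Assumes every record fits on a single line.
    """
    result = {name: [] for name in SECTIONS}
    current, header = None, None
    for raw in text.splitlines():
        line = raw.strip()
        marker = next((n for n in SECTIONS if line.startswith(f"---- {n} ----")), None)
        if marker:
            # Start a new section; anything after the marker is the column header.
            current, header = marker, None
            line = line[len(f"---- {marker} ----"):].strip()
        if not line or current is None:
            continue
        row = next(csv.reader([line]))  # csv handles quoted fields containing commas
        if header is None:
            header = row
        else:
            result[current].append(dict(zip(header[1:], row[1:])))
    return result


sample = """---- Entities ----
,Entity,Description
0,GraphRAG,"Extends RAG with a knowledge graph, entities, and relations"
---- Relations ----
,From Entity,To Entity,Description
0,GraphRAG,RAG,extends"""
parsed = parse_subgraph(sample)
print(parsed["Relations"][0]["To Entity"])  # RAG
```

Keying each record by the header row, rather than by fixed positions, means an extra column (for example, a score) would simply appear as another dictionary key.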