GraphRAG - PolarDB - Alibaba Cloud Help Center

This document introduces GraphRAG, a feature of the knowledge platform, and covers knowledge base management, document processing, knowledge graph visualization, knowledge Q&A, and model configuration.

Overview

GraphRAG is a knowledge-enhanced retrieval system built on a graph structure. It parses document content into an entity-relation graph and retrieves over that graph to answer complex semantic questions that span multiple documents and sections.

GraphRAG includes the following core concepts:

Concept	Description
Workspace	A workspace is the top-level organizational unit in GraphRAG. Data is isolated between workspaces.
Entity	A typed knowledge node, such as a person, organization, location, or event, extracted from a document.
Relation	A semantic edge that connects two entities and describes their association.
Chunk	A text segment created by chunking a document. It is the fundamental data unit for knowledge graph construction and retrieval.
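
As a rough illustration of how these four concepts relate, the following Python sketch models them as plain data classes. The class and field names are hypothetical and do not reflect GraphRAG's actual schema. The only behavior taken from the table above is that a relation connects two entities and that each workspace holds its own isolated data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Entity:
    name: str
    entity_type: str  # for example "person" or "organization"


@dataclass(frozen=True)
class Relation:
    source: str       # name of the first entity
    target: str       # name of the second entity
    description: str  # how the two entities are associated


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    index: int
    text: str


@dataclass
class Workspace:
    """Hypothetical container: each workspace keeps its own data."""
    name: str
    chunks: List[Chunk] = field(default_factory=list)
    entities: Dict[str, Entity] = field(default_factory=dict)
    relations: List[Relation] = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def add_relation(self, relation: Relation) -> None:
        # A relation is only meaningful if both endpoints exist as entities.
        for endpoint in (relation.source, relation.target):
            if endpoint not in self.entities:
                raise KeyError(f"unknown entity: {endpoint}")
        self.relations.append(relation)
```

Because every `Workspace` instance owns its own lists and dictionary, adding data to one instance never affects another, which mirrors the isolation described above.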

The GraphRAG processing workflow is as follows:

Document parsing and chunking: After you upload a document, the system uses a parsing engine to extract text, tables, and image content. The system then splits the text into chunks based on the configured parameters and generates an embedding for each chunk.
Knowledge graph construction: The system calls a large language model (LLM) to perform entity extraction and relation extraction on the chunks. The system then writes the extracted entities and relations to a knowledge graph, which is stored using the polar_age graph extension for PolarDB.
Multi-mode retrieval and Q&A: When you ask a question, the system retrieves relevant context from the knowledge graph and vector database based on the selected retrieval mode (such as local mode, global mode, hybrid mode, or mixed retrieval). The LLM then generates an answer.

Knowledge base

Use the Knowledge base page to manage workspaces. A workspace corresponds to an independent collection of documents and a knowledge graph. You can create multiple workspaces to manage documents for different business scenarios.

Create a workspace

In the left-side navigation pane of the GraphRAG page, click Knowledge Base.
Click New Knowledge Base.

In the creation dialog box, configure the following parameters:

Parameter	Description
Name	A unique name for the workspace to identify the document collection. We recommend using a name that reflects your business.
Parsing parameters	Configure document parsing parameters, including the parser type, parsing method, chunk size, and chunk overlap. For details, see Parameter settings.

Click Create to create the workspace.

Switch workspaces

In the workspace list, find the target workspace and click Switch in the Actions column to set it as the active workspace. After you switch, features such as document processing, knowledge graph, and knowledge Q&A will run on the data in that workspace.

Delete a workspace

In the workspace list, find the target workspace and click Delete.

Note

Deleting a workspace also deletes all documents, the knowledge graph, and related data within it. This action cannot be undone. Proceed with caution.

Default workspace

When the system starts for the first time, a default workspace named default is automatically created. If you delete all workspaces, the system prompts you to create a new one the next time you access it.

Document processing

Document processing is the data ingestion entry point for GraphRAG. The system processes uploaded documents through a workflow of parsing, chunking, graph construction, and status updates. After processing, the system transforms the document content into an entity-relation graph for use in knowledge Q&A.

Supported document formats

GraphRAG supports the following document formats:

PDF
Word (.doc, .docx)
PPT (.ppt, .pptx)
Excel (.xls, .xlsx)
TXT
Markdown (.md)
Image (JPG, PNG, etc.)

Upload documents

In the left-side navigation pane of the GraphRAG page, click Document Processing.
Confirm that the current workspace is the one you want to use.
Click Upload Documents and select the files you want to process.
After the upload is complete, the system automatically starts the document processing workflow. You can view the processing status in the document list.

Document processing status

During processing, a document passes through the following statuses:

Status	Description
`pending`	The document has been uploaded and is waiting to be processed.
`processing`	The document is being processed, including parsing and chunking operations.
`preprocessed`	Document preprocessing is complete, and graph construction is in progress.
`processed`	Document processing is complete. Entities and relations have been extracted into the knowledge graph and are ready for knowledge Q&A.
`failed`	Document processing failed. You can view the logs to understand the cause of failure and reprocess the document.
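
The status values above can be read as a small state machine. The sketch below is an assumption about how the statuses follow one another, inferred from the descriptions in the table. The transition map, including the idea that a reprocessed document returns to `pending`, and the function name are illustrative and not part of GraphRAG.

```python
from enum import Enum


class DocStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    PREPROCESSED = "preprocessed"
    PROCESSED = "processed"
    FAILED = "failed"


# Assumed transitions, inferred from the status descriptions.
ALLOWED_TRANSITIONS = {
    DocStatus.PENDING: {DocStatus.PROCESSING},
    DocStatus.PROCESSING: {DocStatus.PREPROCESSED, DocStatus.FAILED},
    DocStatus.PREPROCESSED: {DocStatus.PROCESSED, DocStatus.FAILED},
    DocStatus.PROCESSED: set(),
    # Assumption: reprocessing a failed document puts it back in the queue.
    DocStatus.FAILED: {DocStatus.PENDING},
}


def can_transition(current: DocStatus, target: DocStatus) -> bool:
    """Return True if the assumed workflow allows current -> target."""
    return target in ALLOWED_TRANSITIONS[current]
```

A client that polls the document list could use such a map to decide when to stop waiting: under these assumptions, `processed` has no outgoing transitions, and `failed` changes only when you trigger reprocessing.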

Document management operations

You can perform the following operations on documents in the document list:

View execution log: The Execution Log panel at the bottom of the page displays detailed logs for document processing, including records for parsing, chunking, and graph construction. You can expand or collapse the log panel.
View chunk details: Click a document's file name to view a list of its chunks and their content.
Reprocess failed documents: For documents that failed to process, click the Reprocess Failed Documents button in the toolbar at the top of the page. The system will restart the processing workflow for all failed documents.
Delete document: Click the Delete icon in the Actions column for the target document to remove it and its associated graph data.

Parameter settings

When you create or edit a workspace, you can configure the following document processing parameters:

Parameter	Values	Description
Parser	`mineru`, `docling`, `paddleocr`	Select the document parsing engine. Different parsers are suitable for different document scenarios. See the parser selection recommendations below.
Parsing method	`auto`, `txt`, `ocr`	Set the document parsing method. `auto` enables automatic detection, `txt` performs plain text extraction, and `ocr` uses Optical Character Recognition (OCR).
Chunk size	Default: 1200	Set the maximum number of characters for each text chunk. A larger chunk size preserves more context, while a smaller size can improve retrieval precision.
Chunk overlap	Default: 100	Set the number of overlapping characters between adjacent text chunks. A moderate overlap helps prevent splitting key information across chunk boundaries.
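
To make the interaction between chunk size and chunk overlap concrete, here is a minimal character-based sliding-window splitter in Python. It is a sketch under the assumption that chunks are cut at fixed character offsets. The actual splitter used by GraphRAG is not described here and may, for example, respect sentence or paragraph boundaries. The function name `chunk_text` is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1200, chunk_overlap: int = 100) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Adjacent chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Example: a 2500-character text with the default parameters.
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [1200, 1200, 300]
```

With the defaults, each new chunk starts 1100 characters after the previous one, so the last 100 characters of one chunk are repeated at the start of the next.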

Parser selection recommendations

We recommend choosing a parser based on the type and layout of your documents:

Parser	Use case	Description
MinerU	Complex layouts	Ideal for documents with complex layouts, such as multi-column text, nested tables, and mixed text and images. It excels at layout analysis.
Docling	Structured formats	Best for documents with well-defined structures, such as Word, PPT, and Excel files. It effectively preserves document hierarchy and formatting information.
PaddleOCR	Image-only documents	Designed for image-only documents like scans and screenshots. It uses OCR technology to recognize and extract text from images.
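
The recommendations above can be summarized as a simple decision rule. The following Python helper is only a sketch: the function name, the `scanned` and `complex_layout` flags, and the fallback value are assumptions introduced for illustration, and GraphRAG does not necessarily choose a parser this way. The returned strings are the parser values listed in the parameter table.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
STRUCTURED_EXTENSIONS = {".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx"}


def recommend_parser(filename: str, *, scanned: bool = False,
                     complex_layout: bool = False,
                     fallback: str = "mineru") -> str:
    """Suggest a parser value based on the recommendations table."""
    extension = Path(filename).suffix.lower()
    if scanned or extension in IMAGE_EXTENSIONS:
        return "paddleocr"  # image-only documents need OCR
    if complex_layout:
        return "mineru"     # multi-column text, nested tables, mixed content
    if extension in STRUCTURED_EXTENSIONS:
        return "docling"    # Word, PPT, and Excel files
    # Assumption: the table gives no recommendation for other cases,
    # so the caller decides what to fall back to.
    return fallback
```

For example, `recommend_parser("report.docx")` suggests `docling`, while `recommend_parser("paper.pdf", complex_layout=True)` suggests `mineru`.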

Knowledge graph

The Knowledge graph module visualizes the network of entities and relations extracted from your documents. This graph view helps you intuitively understand the semantic connections within your content.

Default entity types

By default, the system extracts the following types of entities:

Entity type	Description
`organization`	An organization, such as a company, department, or team.
`person`	A person, such as an author, manager, or participant.
`geo`	A geographical location, such as a country, city, or region.
`event`	An event, such as a project launch, version release, or meeting.
`category`	A category, such as a technical field or product classification.

Browse the graph

On the knowledge graph visualization page, you can perform the following actions:

Zoom: Use your mouse wheel or trackpad gestures to adjust the graph's display scale, allowing you to see the overall structure or focus on local details.
Pan: Click and drag the canvas to move the visible area of the graph.
Click to highlight: Click an entity node to highlight it and its directly connected entities and relations. This makes it easier to trace the network of a specific entity.

Label filtering

In the label filtering panel on the graph page, you can filter the displayed entities by type, such as organization or person, to focus on specific types of knowledge networks.

Adjust the display scale

You can adjust the scale of the displayed graph using the following parameters:

Depth: Set the number of relation layers to expand in the graph. The value can range from 1 to 5. A higher value displays more connection layers.
Number of nodes: Control the maximum number of nodes to display in the graph. We recommend setting this value to 300 or less to ensure a smooth experience.

Note

When the number of nodes is too large, the graph may become slow to render. We recommend using label filtering and adjusting the depth to control the display scale.
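
To see why depth and node count together bound the size of the displayed graph, consider the following breadth-first expansion in Python. It is an illustrative sketch and not the query GraphRAG runs against the graph store. The adjacency-dictionary representation and the function name are assumptions.

```python
from collections import deque
from typing import Dict, Iterable


def expand_subgraph(adjacency: Dict[str, Iterable[str]], start: str,
                    max_depth: int = 2, max_nodes: int = 300) -> Dict[str, int]:
    """Return {node: depth} for nodes within max_depth hops of start.

    Expansion stops once max_nodes nodes have been collected.
    """
    if not 1 <= max_depth <= 5:
        raise ValueError("max_depth must be between 1 and 5")
    if max_nodes < 1:
        raise ValueError("max_nodes must be at least 1")

    visited = {start: 0}
    queue = deque([start])
    while queue and len(visited) < max_nodes:
        node = queue.popleft()
        depth = visited[node]
        if depth == max_depth:
            continue  # do not expand beyond the configured depth
        for neighbor in adjacency.get(node, ()):
            if neighbor in visited:
                continue
            if len(visited) >= max_nodes:
                break  # node budget exhausted
            visited[neighbor] = depth + 1
            queue.append(neighbor)
    return visited


graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["E"], "E": []}
print(expand_subgraph(graph, "A", max_depth=1))  # {'A': 0, 'B': 1, 'C': 1}
```

Each extra level of depth can multiply the number of visited nodes in a densely connected graph, which is why lowering the depth is usually the most effective way to keep rendering responsive.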

Search entities

Enter an entity name or keyword in the search box on the graph page. The system will locate and highlight matching entity nodes, helping you quickly find a target entity in a large-scale graph.

Knowledge Q&A

The Knowledge Q&A module provides a conversational question-and-answer feature enhanced by the knowledge graph and supports multi-turn conversation. For each question, the system retrieves relevant information from the knowledge graph and document content and uses it to generate the answer.

Retrieval modes

GraphRAG supports the following retrieval modes. You can choose the appropriate mode based on your question type:

Mode	Use case	Description
Local mode	Specific detail queries	Performs retrieval based on the local graph structure. Ideal for querying detailed information, such as the attributes or relations of a specific entity.
Global mode	High-level summary questions	Performs retrieval and summarization based on the global graph structure. Best for high-level questions that require analysis across multiple documents.
Hybrid mode	Comprehensive questions	Combines the retrieval results of local and global modes, balancing detailed information with a global perspective.
Mixed retrieval	General use (Recommended)	Combines graph retrieval and vector search. Recommended as the default mode for most Q&A scenarios.
Naive mode	Simple keyword matching	A pure vector search mode that does not use graph enhancement. Suitable for simple keyword matching scenarios and offers faster response times.
Bypass mode	Pure LLM conversation	Skips all retrieval and uses the large language model directly. Suitable for general questions that are unrelated to the document content.
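
The use cases in the table can be expressed as a lookup from question type to retrieval mode. The Python sketch below is illustrative only: the enum values and the use-case keys are hypothetical labels chosen for this example and are not identifiers from the GraphRAG interface. Falling back to mixed retrieval follows the recommendation in the table.

```python
from enum import Enum


class RetrievalMode(Enum):
    LOCAL = "local"
    GLOBAL = "global"
    HYBRID = "hybrid"
    MIXED = "mixed"
    NAIVE = "naive"
    BYPASS = "bypass"


# Hypothetical use-case labels mapped to the modes in the table above.
USE_CASE_TO_MODE = {
    "specific_detail": RetrievalMode.LOCAL,
    "high_level_summary": RetrievalMode.GLOBAL,
    "comprehensive": RetrievalMode.HYBRID,
    "general": RetrievalMode.MIXED,
    "keyword_match": RetrievalMode.NAIVE,
    "pure_llm_chat": RetrievalMode.BYPASS,
}


def pick_mode(use_case: str) -> RetrievalMode:
    """Return the mode for a use case, defaulting to mixed retrieval."""
    return USE_CASE_TO_MODE.get(use_case, RetrievalMode.MIXED)
```

For instance, `pick_mode("specific_detail")` yields the local mode, and any label that is not in the mapping falls back to mixed retrieval.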

Q&A operations

In the left-side navigation pane of the GraphRAG page, click Knowledge Q&A.
Confirm that the current workspace is the one you want to use, and select a suitable retrieval mode. We recommend using mixed retrieval.
Enter your question in the conversation input box and press Enter or click the Send button. The system streams the answer in real time.
After the answer is generated, you can continue the multi-turn conversation by asking follow-up questions. The system will use the conversation context to generate more accurate answers.

If the answer is not satisfactory, try switching the retrieval mode. For example, use local mode for specific details or global mode for a high-level overview.

To start a new, independent session, click the Clear Conversation button to clear the current session's context.

View references

The system annotates each answer with references. You can click a reference marker to view the original document content, which helps you verify the accuracy of the answer and trace the information back to its origin.

FAQ

Why does the answer show "No relevant information found"?
Possible reasons include: the document has not finished processing, you are using a restrictive retrieval mode like local mode, or the wording of your question differs significantly from the document content. Try switching to mixed retrieval or hybrid mode and ask again.
Why is the answer quality poor?
Possible reasons include: incorrect LLM configuration, a chunk size that is too small leading to incomplete context, or a lack of custom business-specific entity types. Check that your model configuration is correct, try increasing the chunk size in the parameter settings, and consider adding more specific entity types for your business in the knowledge base settings.

Model configuration

The model configuration module is used to manage the AI models for GraphRAG. The system uses a two-layer configuration architecture of model provider + model, where the model ID is in the format model_name@provider_name.

Model types

GraphRAG uses the following four types of AI models:

Model type	Required	Description
LLM	Yes	A large language model, used for core tasks such as entity extraction, relation extraction, and knowledge Q&A.
Embedding model	Yes	Converts text into vector representations to support semantic similarity search.
Rerank model	No	Reorders the initial retrieval results to improve retrieval quality. Configuring this model enhances Q&A accuracy.
Vision language model (VLM)	No	Processes documents that contain images. Configuring this model enhances the ability to understand image-based content.

Note

LLM and embedding models are required, and you must configure them before the system can run. Rerank and VLM models are optional and can be configured based on your needs.

Core concepts

Model provider: The platform or vendor that supplies the model. Each provider is configured with an API key and an optional Base URL. The system comes with presets for over 30 mainstream providers.
Model: A specific model instance associated with a provider. It includes attributes like the model name, type (chat, embedding, rerank, etc.), and maximum token count.
Model ID: The system uses model_name@provider_name as the unique identifier for a model, such as qwen-plus@Tongyi-Qianwen.
Relationship between API keys and models: When you modify a provider's API key, the system automatically updates the key for all models from that provider.
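
The model ID format lends itself to a small parse-and-format helper. The Python sketch below assumes that the provider name is everything after the last `@` in the ID. The source only gives the overall format and one example, so how the system treats an `@` inside a model name is not known. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelId:
    model_name: str
    provider_name: str

    def __str__(self) -> str:
        return f"{self.model_name}@{self.provider_name}"


def parse_model_id(model_id: str) -> ModelId:
    """Split 'model_name@provider_name' into its two parts."""
    # Assumption: split on the last "@" so the provider is the final segment.
    model_name, separator, provider_name = model_id.rpartition("@")
    if not separator or not model_name or not provider_name:
        raise ValueError(f"not a valid model ID: {model_id!r}")
    return ModelId(model_name, provider_name)


parsed = parse_model_id("qwen-plus@Tongyi-Qianwen")
print(parsed.model_name)     # qwen-plus
print(parsed.provider_name)  # Tongyi-Qianwen
```

Grouping parsed IDs by `provider_name` reflects the two-layer structure: one API key per provider, shared by all of that provider's models.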

First-time setup

When you first use GraphRAG, you need to configure your models by following these steps:

Add a model provider: On the Model configuration page, expand the Add from Preset Model Providers section. Select your target provider (such as Tongyi-Qianwen, OpenAI, or DeepSeek) and fill in the connection information, including the API key.
Add a model: Click Add Custom Model, select a provider, and enter the model name and type. The system automatically generates a model ID in the format model_name@provider_name.
Set default models: In the default model settings section at the top of the page, select the default models for the LLM and embedding types, then click Save.

Model management operations

After the initial setup, you can also perform the following management operations:

Modify a provider's API key: In the list of added models, find the target provider and click the API Key button to the right of the provider row. Modify the API key or Base URL in the pop-up dialog box and save your changes. This simultaneously updates the key for all models under this provider.
Delete individual models: Expand the model list for the target provider, select the checkboxes for the models you want to delete, and click the Delete Selected button. The Delete Selected button appears only after you select at least one model.
Delete a provider and all its models: Click the Delete All button to the right of the provider row. After confirmation, the system will remove the provider's API key record and all models under it.
Reload configured models: If you modify environment variables such as LLM_MODEL, LLM_API_KEY, or EMBEDDING_MODEL in the server-side .env file, click the Reload Configured Models button to synchronize the model configuration from the environment variables to the configuration file without restarting the service.
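
Before reloading, it can help to confirm that the variables are actually set in the server environment. The Python sketch below checks only the three variable names mentioned above. Treating all three as required is an assumption, and the real .env file may define additional variables. The function name is hypothetical.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the text above; the list may not be complete.
MODEL_VARIABLES = ("LLM_MODEL", "LLM_API_KEY", "EMBEDDING_MODEL")


def missing_model_variables(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of model variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in MODEL_VARIABLES if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the process environment.
example_env = {"LLM_MODEL": "qwen-plus", "LLM_API_KEY": "placeholder-key"}
print(missing_model_variables(example_env))  # ['EMBEDDING_MODEL']
```

Calling `missing_model_variables()` without arguments inspects the current process environment, which is only meaningful if the .env file has already been loaded into it.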

Supported model providers

The system has presets for the following mainstream model providers (partial list):

Provider	Supported types	Description
Tongyi-Qianwen	LLM, embedding, rerank, VLM	Alibaba Cloud Tongyi-Qianwen
OpenAI	LLM, embedding, VLM	Official OpenAI
DeepSeek	LLM	DeepSeek
ZHIPU-AI	LLM, embedding, rerank	ZHIPU-AI GLM
Ollama	LLM, embedding	Supports local model deployment
Azure-OpenAI	LLM, embedding, VLM	Azure-hosted OpenAI
Bedrock	LLM, embedding, VLM	AWS Bedrock

Note

This is only a partial list. For the complete list of providers, see the Add from Preset Model Providers section on the Model configuration page.

Troubleshooting

If you encounter problems with model configuration, check the following symptoms, possible causes, and solutions:

Symptom	Possible cause	Solution
The document status remains `pending`.	The default LLM or embedding model is not configured.	Go to the Model configuration page and ensure the default LLM and embedding models are set correctly.
API key invalid	The API key is invalid or has expired.	Check that the model provider's API key is correct. Use the connection test feature to validate the key. If the key has expired, generate a new one from the model provider's platform.
Connection timeout	The request to the model provider's API endpoint timed out.	Check network connectivity and confirm that the knowledge platform's environment can access the model provider's API endpoint. If you are using a VPC network, verify that your security group and allowlist configurations are correct.
Model not found	The model name is incorrect or is not in the provider's supported list.	Confirm that the model name is spelled correctly and check that the model is available in the corresponding provider's list of supported models.