This document introduces GraphRAG, a feature of the knowledge platform, and covers knowledge base management, document processing, knowledge graph visualization, knowledge Q&A, and model configuration.
Overview
GraphRAG is a knowledge-enhanced retrieval system built on a graph structure. Its core principle is to parse document content into an entity-relation graph. This graph serves as the foundation for retrieval to answer complex semantic questions that span multiple documents and sections.
GraphRAG includes the following core concepts:
Concept | Description |
Workspace | A workspace is the top-level organizational unit in GraphRAG. Data is isolated between workspaces. |
Entity | A typed knowledge node, such as a person, organization, location, or event, extracted from a document. |
Relation | A semantic edge that connects two entities and describes their association. |
Chunk | A text segment created by chunking a document. It is the fundamental data unit for knowledge graph construction and retrieval. |
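The four concepts above can be pictured as a small data model. The following is a minimal sketch for illustration only: the class and field names (such as chunk_id and source_chunks) are assumptions and do not reflect the platform's actual storage schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    """A text segment produced by chunking a document."""
    chunk_id: str      # hypothetical identifier
    document: str      # name of the source document
    text: str


@dataclass(frozen=True)
class Entity:
    """A typed knowledge node extracted from one or more chunks."""
    name: str
    entity_type: str                      # e.g. "person", "organization"
    source_chunks: tuple[str, ...] = ()   # hypothetical back-references to chunks


@dataclass(frozen=True)
class Relation:
    """A semantic edge that connects two entities."""
    source: str
    target: str
    description: str


@dataclass
class Workspace:
    """Top-level container; each workspace holds its own isolated data."""
    name: str
    chunks: list[Chunk] = field(default_factory=list)
    entities: dict[str, Entity] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)

    def add_relation(self, relation: Relation) -> None:
        # An edge only makes sense if both endpoints exist in this workspace.
        for endpoint in (relation.source, relation.target):
            if endpoint not in self.entities:
                raise KeyError(f"unknown entity: {endpoint}")
        self.relations.append(relation)


# Example: one chunk yields two entities and one relation.
ws = Workspace(name="default")
ws.chunks.append(Chunk("c1", "handbook.pdf", "Alice leads the Data team."))
ws.entities["Alice"] = Entity("Alice", "person", ("c1",))
ws.entities["Data team"] = Entity("Data team", "organization", ("c1",))
ws.add_relation(Relation("Alice", "Data team", "leads"))
```

Keeping entities and relations inside the workspace object mirrors the isolation rule: a relation can only reference entities from the same workspace.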
The GraphRAG processing workflow is as follows:
Document parsing and chunking: After you upload a document, the system uses a parsing engine to extract text, tables, and image content. The system then splits the text into chunks based on the configured parameters and generates an embedding for each chunk.
Knowledge graph construction: The system calls a large language model (LLM) to perform entity extraction and relation extraction on the chunks. The system then writes the extracted entities and relations to a knowledge graph, which is stored using the polar_age graph extension for PolarDB.
Multi-mode retrieval and Q&A: When you ask a question, the system retrieves relevant context from the knowledge graph and vector database based on the selected retrieval mode (such as local mode, global mode, hybrid mode, or mixed retrieval). The LLM then generates an answer.
Knowledge base
Use the Knowledge base page to manage workspaces. A workspace corresponds to an independent collection of documents and a knowledge graph. You can create multiple workspaces to manage documents for different business scenarios.
Create a workspace
In the left-side navigation pane of the GraphRAG page, click Knowledge Base.
Click New Knowledge Base.
In the creation dialog box, configure the following parameters:
Parameter | Description |
Name | A unique name for the workspace to identify the document collection. We recommend using a name that reflects your business. |
Parsing parameters | Configure document parsing parameters, including the parser type, parsing method, chunk size, and chunk overlap. For details, see Parameter settings. |
Click Create to create the workspace.
Switch workspaces
In the workspace list, find the target workspace and click Switch in the Actions column to set it as the active workspace. After you switch, features such as document processing, knowledge graph, and knowledge Q&A will run on the data in that workspace.
Delete a workspace
In the workspace list, find the target workspace and click Delete.
Deleting a workspace also deletes all documents, the knowledge graph, and related data within it. This action cannot be undone. Proceed with caution.
Default workspace
When the system starts for the first time, a default workspace named default is automatically created. If you delete all workspaces, the system prompts you to create a new one the next time you access GraphRAG.
Document processing
Document processing is the data ingestion entry point for GraphRAG. Each uploaded document passes through a workflow of parsing, chunking, graph construction, and status updates. When processing finishes, the document content is available as an entity-relation graph for use in knowledge Q&A.
Supported document formats
GraphRAG supports the following document formats:
PDF
Word (.doc, .docx)
PPT (.ppt, .pptx)
Excel (.xls, .xlsx)
TXT
Markdown (.md)
Image (JPG, PNG, etc.)
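If you upload files in bulk, it can help to screen them locally first. The sketch below checks a file name against the extensions named in the list above. It is an illustrative helper, not part of GraphRAG. The image entry ends with "etc.", so the platform may accept image types beyond the two included here.

```python
from pathlib import Path

# Extensions taken from the list above. Only the image types named
# explicitly (JPG, PNG) are included; others may also be accepted.
SUPPORTED_EXTENSIONS = {
    ".pdf",
    ".doc", ".docx",
    ".ppt", ".pptx",
    ".xls", ".xlsx",
    ".txt",
    ".md",
    ".jpg", ".png",
}


def is_supported(filename: str) -> bool:
    """Return True if the file extension appears in the supported list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Partition file names into (supported, unsupported)."""
    supported = [f for f in filenames if is_supported(f)]
    unsupported = [f for f in filenames if not is_supported(f)]
    return supported, unsupported


ok, skipped = split_by_support(["report.PDF", "slides.pptx", "archive.zip"])
```

The check is case-insensitive, so report.PDF is treated the same as report.pdf.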
Upload documents
In the left-side navigation pane of the GraphRAG page, click Document Processing.
Confirm that the current workspace is the one you want to use.
Click Upload Documents and select the files you want to process.
After the upload is complete, the system automatically starts the document processing workflow. You can view the processing status in the document list.
Document processing status
During processing, a document passes through the following statuses:
Status | Description |
| The document has been uploaded and is waiting to be processed. |
| The document is being processed, including parsing and chunking operations. |
| Document preprocessing is complete, and graph construction is in progress. |
| Document processing is complete. Entities and relations have been extracted into the knowledge graph and are ready for knowledge Q&A. |
| Document processing failed. You can view the logs to understand the cause of failure and reprocess the document. |
Document management operations
You can perform the following operations on documents in the document list:
View execution log: The Execution Log panel at the bottom of the page displays detailed logs for document processing, including records for parsing, chunking, and graph construction. You can expand or collapse the log panel.
View chunk details: Click a document's file name to view a list of its chunks and their content.
Reprocess failed documents: For documents that failed to process, click the Reprocess Failed Documents button in the toolbar at the top of the page. The system will restart the processing workflow for all failed documents.
Delete document: Click the Delete icon in the Actions column for the target document to remove it and its associated graph data.
Parameter settings
When you create or edit a workspace, you can configure the following document processing parameters:
Parameter | Values | Description |
Parser | | Select the document parsing engine. Different parsers are suitable for different document scenarios. See the parser selection recommendations below. |
Parsing method | | Set the document parsing method. |
Chunk size | Default: 1200 | Set the maximum number of characters for each text chunk. A larger chunk size preserves more context, while a smaller size can improve retrieval precision. |
Chunk overlap | Default: 100 | Set the number of overlapping characters between adjacent text chunks. A moderate overlap helps prevent splitting key information across chunk boundaries. |
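To see how chunk size and chunk overlap interact, consider the following minimal sketch of character-based sliding-window chunking. It assumes a plain fixed-width split that uses the defaults above (1200 and 100). The function name is hypothetical, and the platform's actual splitter may behave differently, for example by respecting sentence or paragraph boundaries.

```python
def split_into_chunks(text: str, chunk_size: int = 1200, chunk_overlap: int = 100) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Adjacent chunks share chunk_overlap characters, so each new chunk
    starts (chunk_size - chunk_overlap) characters after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
chunks = split_into_chunks(sample)
# 2500 characters with a step of 1100 gives chunks of 1200, 1200, and 300 characters.
```

With the defaults, the last 100 characters of one chunk are repeated at the start of the next, which is what keeps a sentence that straddles a boundary visible in at least one chunk.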
Parser selection recommendations
We recommend choosing a parser based on the type and layout of your documents:
Parser | Use case | Description |
MinerU | Complex layouts | Ideal for documents with complex layouts, such as multi-column text, nested tables, and mixed text and images. It excels at layout analysis. |
Docling | Structured formats | Best for documents with well-defined structures, such as Word, PPT, and Excel files. It effectively preserves document hierarchy and formatting information. |
PaddleOCR | Image-only documents | Designed for image-only documents like scans and screenshots. It uses OCR technology to recognize and extract text from images. |
Knowledge graph
The Knowledge graph module visualizes the network of entities and relations extracted from your documents. This graph view helps you intuitively understand the semantic connections within your content.
Default entity types
By default, the system extracts the following types of entities:
Entity type | Description |
| An organization, such as a company, department, or team. |
| A person, such as an author, manager, or participant. |
| A geographical location, such as a country, city, or region. |
| An event, such as a project launch, version release, or meeting. |
| A category, such as a technical field or product classification. |
Browse the graph
On the knowledge graph visualization page, you can perform the following actions:
Zoom: Use your mouse wheel or trackpad gestures to adjust the graph's display scale, allowing you to see the overall structure or focus on local details.
Pan: Click and drag the canvas to move the visible area of the graph.
Click to highlight: Click an entity node to highlight it and its directly connected entities and relations. This makes it easier to trace the network of a specific entity.
Label filtering
In the label filtering panel on the graph page, you can filter the displayed entities by type, such as organization or person, to focus on specific types of knowledge networks.
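Conceptually, label filtering keeps only the nodes whose type is selected. The sketch below illustrates this on a tiny in-memory graph. The function name and the lowercase type strings are assumptions, and dropping relations whose endpoints are hidden is one plausible behavior, not a confirmed description of the graph page.

```python
def filter_by_type(
    entity_types: dict[str, str],
    relations: list[tuple[str, str]],
    selected_types: set[str],
) -> tuple[set[str], list[tuple[str, str]]]:
    """Keep entities of the selected types and the relations between them."""
    kept = {name for name, etype in entity_types.items() if etype in selected_types}
    # Assumption: a relation is shown only if both of its endpoints remain visible.
    kept_relations = [(s, t) for s, t in relations if s in kept and t in kept]
    return kept, kept_relations


entity_types = {"Alice": "person", "Acme": "organization", "Paris": "location"}
relations = [("Alice", "Acme"), ("Acme", "Paris")]

visible, visible_relations = filter_by_type(entity_types, relations, {"person", "organization"})
```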
Adjust the display scale
You can adjust the scale of the displayed graph using the following parameters:
Depth: Set the number of relation layers to expand in the graph. The value can range from 1 to 5. A higher value displays more connection layers.
Number of nodes: Control the maximum number of nodes to display in the graph. We recommend setting this value to 300 or less to ensure a smooth experience.
When the number of nodes is too large, the graph may become slow to render. We recommend using label filtering and adjusting the depth to control the display scale.
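The effect of the two parameters can be illustrated with a breadth-first expansion that stops at a depth limit and a node limit. This is a minimal sketch under assumed names (expand_subgraph, max_depth, max_nodes). How the graph page actually selects nodes when the limit is reached is not documented here.

```python
from collections import deque


def expand_subgraph(
    adjacency: dict[str, list[str]],
    start: str,
    max_depth: int = 2,
    max_nodes: int = 300,
) -> dict[str, int]:
    """Breadth-first expansion from start.

    Returns a mapping of node -> number of relation layers from start,
    limited to max_depth layers and at most max_nodes nodes.
    """
    if not 1 <= max_depth <= 5:
        raise ValueError("max_depth must be between 1 and 5")
    if max_nodes < 1:
        raise ValueError("max_nodes must be at least 1")

    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth_of[node] == max_depth:
            continue  # do not expand beyond the configured depth
        for neighbor in adjacency.get(node, ()):
            if neighbor in depth_of:
                continue
            if len(depth_of) >= max_nodes:
                return depth_of  # node budget exhausted
            depth_of[neighbor] = depth_of[node] + 1
            queue.append(neighbor)
    return depth_of


# A small chain with one branch: C - A - B - D - E
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}
one_layer = expand_subgraph(adjacency, "A", max_depth=1)
two_layers = expand_subgraph(adjacency, "A", max_depth=2)
```

Each extra layer of depth can multiply the number of visible nodes, which is why lowering the depth is usually the quickest way to shrink a slow-rendering graph.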
Search entities
Enter an entity name or keyword in the search box on the graph page. The system will locate and highlight matching entity nodes, helping you quickly find a target entity in a large-scale graph.
Knowledge Q&A
The Knowledge Q&A module provides conversational question answering that is enhanced by the knowledge graph and supports multi-turn conversation. For each question, the system retrieves relevant information from the knowledge graph and document content, and the LLM uses that context to generate the answer.
Retrieval modes
GraphRAG supports the following retrieval modes. You can choose the appropriate mode based on your question type:
Mode | Use case | Description |
Local mode | Specific detail queries | Performs retrieval based on the local graph structure. Ideal for querying detailed information, such as the attributes or relations of a specific entity. |
Global mode | High-level summary questions | Performs retrieval and summarization based on the global graph structure. Best for high-level questions that require analysis across multiple documents. |
Hybrid mode | Comprehensive questions | Combines the retrieval results of local and global modes, balancing detailed information with a global perspective. |
Mixed retrieval | General use (Recommended) | The recommended default mode. It combines graph retrieval and vector search to achieve the best results in most Q&A scenarios. |
Naive mode | Simple keyword matching | A pure vector search mode that does not use graph enhancement. Suitable for simple keyword matching scenarios and offers faster response times. |
Bypass mode | Pure LLM conversation | Skips all retrieval and uses the large language model directly. Suitable for general questions that are unrelated to the document content. |
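The table above can be summarized as a small lookup. The following sketch encodes only what the table states: naive mode does not use graph enhancement, bypass mode skips retrieval entirely, and mixed retrieval is the recommended default. The enum values and the question-kind keys are hypothetical labels chosen for this example, not identifiers from the product.

```python
from enum import Enum


class RetrievalMode(Enum):
    # The string values are illustrative labels, not confirmed API values.
    LOCAL = "local"
    GLOBAL = "global"
    HYBRID = "hybrid"
    MIXED = "mixed"
    NAIVE = "naive"
    BYPASS = "bypass"

    @property
    def uses_retrieval(self) -> bool:
        """Bypass mode sends the question straight to the LLM."""
        return self is not RetrievalMode.BYPASS

    @property
    def uses_graph(self) -> bool:
        """Naive mode is pure vector search; bypass mode uses no retrieval."""
        return self not in (RetrievalMode.NAIVE, RetrievalMode.BYPASS)


# Use-case column of the table, keyed by hypothetical question kinds.
RECOMMENDED_MODE = {
    "specific_detail": RetrievalMode.LOCAL,
    "high_level_summary": RetrievalMode.GLOBAL,
    "comprehensive": RetrievalMode.HYBRID,
    "general": RetrievalMode.MIXED,
    "keyword_match": RetrievalMode.NAIVE,
    "unrelated_to_documents": RetrievalMode.BYPASS,
}


def pick_mode(question_kind: str) -> RetrievalMode:
    """Fall back to mixed retrieval, the recommended default."""
    return RECOMMENDED_MODE.get(question_kind, RetrievalMode.MIXED)
```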
Q&A operations
In the left-side navigation pane of the GraphRAG page, click Knowledge Q&A.
Confirm that the current workspace is the one you want to use, and select a suitable retrieval mode. We recommend using mixed retrieval.
Enter your question in the conversation input box and press Enter or click the Send button. The system streams the answer in real time.
After the answer is generated, you can continue the multi-turn conversation by asking follow-up questions. The system will use the conversation context to generate more accurate answers.
If the answer is not satisfactory, try switching the retrieval mode. For example, use local mode for specific details or global mode for a high-level overview.
To start a new, independent session, click the Clear Conversation button to clear the current session's context.
View references
The system annotates each answer with references. You can click a reference marker to view the original document content, which helps you verify the accuracy of the answer and trace the information back to its origin.
FAQ
Why does the answer show "No relevant information found"?
Possible reasons include: the document has not finished processing, you are using a restrictive retrieval mode like local mode, or the wording of your question differs significantly from the document content. Try switching to mixed retrieval or hybrid mode and ask again.
Why is the answer quality poor?
Possible reasons include: incorrect LLM configuration, a chunk size that is too small leading to incomplete context, or a lack of custom business-specific entity types. Check that your model configuration is correct, try increasing the chunk size in the parameter settings, and consider adding more specific entity types for your business in the knowledge base settings.
Model configuration
The model configuration module manages the AI models that GraphRAG uses. Configuration has two layers, model provider and model, and each model ID takes the form model_name@provider_name.
Model types
GraphRAG uses the following four types of AI models:
Model type | Required | Description |
LLM | Yes | A large language model (LLM), used for core tasks such as entity extraction, relation extraction, and knowledge Q&A. |
Embedding model | Yes | A text embedding model, used to convert text into vector representations to support semantic similarity search. |
Rerank model | No | A rerank model, used to refine the initial retrieval results and improve retrieval quality. Configuring this model enhances Q&A accuracy. |
Vision language model (VLM) | No | A vision language model (VLM), used for processing documents that contain images. Configuring this model enhances the ability to understand image-based content. |
LLM and embedding models are required, and you must configure them before the system can run. Rerank and VLM models are optional and can be configured based on your needs.
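The required and optional split can be expressed as a simple pre-flight check. The sketch below is illustrative only: the dictionary keys ("llm", "embedding", "rerank", "vlm") are assumed names for the four model types, not the platform's configuration keys.

```python
from typing import Mapping, Optional

REQUIRED_TYPES = ("llm", "embedding")   # must be configured before the system can run
OPTIONAL_TYPES = ("rerank", "vlm")      # configure based on your needs


def missing_required_models(defaults: Mapping[str, Optional[str]]) -> list[str]:
    """Return the required model types that have no default model set."""
    return [model_type for model_type in REQUIRED_TYPES if not defaults.get(model_type)]


# Example: an LLM is set, but no embedding model has been chosen yet.
defaults = {"llm": "qwen-plus@Tongyi-Qianwen", "embedding": None, "rerank": None}
missing = missing_required_models(defaults)
```

An empty result means both required types are set. Optional types never appear in the result.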
Core concepts
Model provider: The platform or vendor that supplies the model. Each provider is configured with an API key and an optional Base URL. The system comes with presets for over 30 mainstream providers.
Model: A specific model instance associated with a provider. It includes attributes like the model name, type (chat, embedding, rerank, etc.), and maximum token count.
Model ID: The system uses model-name@provider-name as the unique identifier for a model, such as qwen-plus@Tongyi-Qianwen.
Relationship between API keys and models: When you modify a provider's API key, the system automatically updates the key for all models from that provider.
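Because a model ID combines the model name and the provider name around an @ sign, it can be split back into its parts. The following is a minimal sketch with hypothetical helper names. It assumes the provider name never contains @, so the split happens at the last @ in the string. Apart from qwen-plus@Tongyi-Qianwen, the model IDs in the example are placeholders.

```python
from typing import NamedTuple


class ModelId(NamedTuple):
    model_name: str
    provider_name: str


def parse_model_id(model_id: str) -> ModelId:
    """Split 'model-name@provider-name' at the last '@'."""
    name, sep, provider = model_id.rpartition("@")
    if not sep or not name or not provider:
        raise ValueError(f"not a valid model ID: {model_id!r}")
    return ModelId(name, provider)


def models_for_provider(model_ids: list[str], provider_name: str) -> list[str]:
    """List the model IDs that belong to one provider.

    These are the models affected when that provider's API key changes.
    """
    return [m for m in model_ids if parse_model_id(m).provider_name == provider_name]


configured = [
    "qwen-plus@Tongyi-Qianwen",
    "example-embedding@Tongyi-Qianwen",   # placeholder model name
    "example-chat@DeepSeek",              # placeholder model name
]
parsed = parse_model_id(configured[0])
affected = models_for_provider(configured, "Tongyi-Qianwen")
```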
First-time setup
When you first use GraphRAG, you need to configure your models by following these steps:
Add a model provider: On the Model configuration page, expand the Add from Preset Model Providers section. Select your target provider (such as Tongyi-Qianwen, OpenAI, or DeepSeek) and fill in the connection information, including the API key.
Add Model: Click Add Custom Model, select a provider, and enter the model name and type. The system automatically generates a model ID in the format ModelName@ProviderName.
Set default models: In the default model settings section at the top of the page, select the default models for the LLM and embedding types, then click Save.
Model management operations
After the initial setup, you can also perform the following management operations:
Modify a provider's API key: In the list of added models, find the target provider and click the API Key button to the right of the provider row. Modify the API key or Base URL in the pop-up dialog box and save your changes. This simultaneously updates the key for all models under this provider.
Delete individual models: Expand the model list for the target provider, select the checkboxes for the models you want to delete, and click the Delete Selected button.
Note: The Delete Selected button appears only after you select at least one model.
Delete a provider and all its models: Click the Delete All button to the right of the provider row. After confirmation, the system will remove the provider's API key record and all models under it.
Reload Configured Models: If you modify environment variables such as LLM_MODEL, LLM_API_KEY, and EMBEDDING_MODEL in the server-side .env file, you can click the Reload Configured Models button to synchronize the model configuration from the environment variables to the configuration file without restarting the service.
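Before you reload, you may want to confirm which model-related variables actually changed. The sketch below parses simple KEY=VALUE lines and compares two versions of a .env file. It is an illustrative local helper, not a GraphRAG tool. It checks only the three variables named above, although the server may read others, and all values shown are placeholders.

```python
MODEL_KEYS = ("LLM_MODEL", "LLM_API_KEY", "EMBEDDING_MODEL")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def changed_model_settings(old_text: str, new_text: str) -> list[str]:
    """Return the model-related keys whose values differ between two files."""
    old, new = parse_env(old_text), parse_env(new_text)
    return [key for key in MODEL_KEYS if old.get(key) != new.get(key)]


# Placeholder values for illustration only.
before = """
# model settings
LLM_MODEL=qwen-plus
LLM_API_KEY="old-placeholder-key"
EMBEDDING_MODEL=example-embedding
"""
after = before.replace("old-placeholder-key", "new-placeholder-key")

changed = changed_model_settings(before, after)
```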
Supported model providers
The system has presets for the following mainstream model providers (partial list):
Provider | Supported types | Description |
Tongyi-Qianwen | LLM, embedding, rerank, VLM | Alibaba Cloud Tongyi-Qianwen |
OpenAI | LLM, embedding, VLM | Official OpenAI |
DeepSeek | LLM | DeepSeek |
ZHIPU-AI | LLM, embedding, rerank | ZHIPU-AI GLM |
Ollama | LLM, embedding | Supports local model deployment |
Azure-OpenAI | LLM, embedding, VLM | Azure-hosted OpenAI |
Bedrock | LLM, embedding, VLM | AWS Bedrock |
This is only a partial list. For the complete list of providers, see the Add from Preset Model Providers section on the Model configuration page.
Troubleshooting
If you encounter problems during model configuration, refer to the following troubleshooting solutions:
Symptom | Possible cause | Solution |
The document status remains | The default LLM or embedding model is not configured. | Go to the Model configuration page and ensure the default LLM and embedding models are set correctly. |
API key invalid | The API key is invalid or has expired. | Check that the model provider's API key is correct. Use the connection test feature to validate the key. If the key has expired, generate a new one from the model provider's platform. |
Connection timeout | Network connection timed out. | Check network connectivity and confirm that the knowledge platform's environment can access the model provider's API endpoint. If you are using a VPC network, verify that your security group and allowlist configurations are correct. |
Model not found | The model name is incorrect or is not in the provider's supported list. | Confirm that the model name is spelled correctly and check that the model is available in the corresponding provider's list of supported models. |