If you encounter incomplete knowledge retrieval or inaccurate content with the retrieval-augmented generation (RAG) feature in Alibaba Cloud Model Studio, refer to the suggestions and examples in this topic to improve RAG performance.
1. RAG process
Retrieval-augmented generation (RAG) is a technique that combines information retrieval with text generation, allowing a large model to use relevant information from an external knowledge base when generating answers. RAG effectiveness depends on three core stages:
- Indexing: Parsing, chunking, and vectorizing knowledge.
- Retrieval and recall: Matching and retrieving relevant text chunks from vector storage based on the user query (prompt).
- Answer generation: The large model generates the final answer based on the retrieved text chunks and the user prompt.

This topic introduces several strategies for optimizing RAG performance across these three stages.
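The three stages can be sketched end to end in a few lines. This is a toy illustration only, not how Model Studio implements RAG: the bag-of-words `embed` function stands in for a real embedding model, and all function names and the sample text are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str) -> list[str]:
    """Indexing: split a document into chunks (here, one per blank-line-separated paragraph)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def embed(text: str) -> Counter:
    """Indexing: toy 'vectorization' as a bag of lowercase words (stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], top_k: int = 1) -> list[str]:
    """Retrieval and recall: return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Answer generation: assemble the prompt that would be sent to the large model."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


# Hypothetical sample knowledge, indexed once up front.
DOC = "Product X has a two-year warranty.\n\nProduct Y ships with a travel case."
INDEX = [(c, embed(c)) for c in chunk(DOC)]
```

Calling `retrieve("What is the warranty for Product X?", INDEX)` selects the warranty paragraph, and `build_prompt` wraps it with the question. A weakness at any stage (a bad split in `chunk`, a poor match in `retrieve`, a loose prompt) degrades the final answer, which is why the sections below treat the stages separately.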
2. Establish an evaluation baseline
Before optimizing, establish a quantifiable evaluation baseline to measure the impact of subsequent improvements.
- Create an evaluation set
  - Purpose: Define a set of standard, repeatable test cases. Each case must include a question and its expected result.
  - Procedure: Use the automatic evaluation feature in Alibaba Cloud Model Studio to create an evaluation set with at least 100 question-answer pairs. This set should cover core, real-world scenarios. Include the following question types:
    - Factual: What is the warranty period for "Product X"?
    - Comparative: Compare the main differences between "Product X" and "Product Y."
    - Instructional: How do I install "Product X"?
    - Analytical: Why have sales of "Product X" increased over the past three months?
- Run the evaluation and record the results
  - Purpose: Record the RAG application's performance with its initial configuration to serve as a baseline for future optimizations.
  - Procedure: Run the entire evaluation set once and record the retrieved content and diagnostic results for each case.
  - This step produces a comprehensive RAG baseline performance report detailing the success or failure of the current configuration for each test case, along with the diagnostic results.
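If you draft the evaluation set offline before loading it into the automatic evaluation feature, a small check like the following can confirm that it meets the size and coverage guidance above. This is a sketch under assumptions: the `EvalCase` fields and the type labels are hypothetical and do not reflect the import format that Model Studio expects.

```python
from collections import Counter
from dataclasses import dataclass

QUESTION_TYPES = ("factual", "comparative", "instructional", "analytical")
MIN_CASES = 100  # minimum number of question-answer pairs suggested above


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    question_type: str  # one of QUESTION_TYPES (hypothetical labels)
    question: str
    expected_answer: str


def check_eval_set(cases: list[EvalCase]) -> list[str]:
    """Return a list of problems; an empty list means size and coverage look fine."""
    problems = []
    if len(cases) < MIN_CASES:
        problems.append(f"only {len(cases)} cases, expected at least {MIN_CASES}")
    counts = Counter(c.question_type for c in cases)
    for qtype in QUESTION_TYPES:
        if counts[qtype] == 0:
            problems.append(f"no '{qtype}' questions")
    return problems


# One example case; the expected answer must come from your own documentation.
example = EvalCase(
    case_id="case-001",
    question_type="factual",
    question='What is the warranty period for "Product X"?',
    expected_answer="<expected answer from your knowledge base>",
)
```

Running `check_eval_set([example])` reports the missing size and the three uncovered question types, which makes gaps visible before the baseline run.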
3. Diagnosis and improvement
Review the failed cases (large model score < 4) from the baseline test report and make targeted improvements based on the diagnostic results.
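To decide which of the following subsections to start with, it can help to group the failed cases by their diagnostic result. The sketch below assumes the baseline report has been exported as a list of records with hypothetical `case_id`, `score`, and `diagnosis` fields; the actual report format may differ.

```python
from collections import defaultdict

PASS_SCORE = 4  # cases scoring below this count as failed, as described above


def failed_by_diagnosis(results: list[dict]) -> dict[str, list[str]]:
    """Group the IDs of failed cases by their diagnostic label."""
    groups: dict[str, list[str]] = defaultdict(list)
    for record in results:
        if record["score"] < PASS_SCORE:
            groups[record["diagnosis"]].append(record["case_id"])
    return dict(groups)
```

The diagnosis with the most failed cases is usually the most productive place to start.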
3.1 Invalid retrieval: No relevant knowledge found
Solutions:
- Include relevant knowledge: If the knowledge base lacks relevant information, the large model cannot answer related questions. Update the knowledge base with the necessary information.
- Optimize source file content and layout: Review and correct source files to ensure key content is not lost during parsing due to formatting issues. Follow these best practices:
  - Ensure headings at all levels are distinct and the content structure is clear.
  - Remove page watermarks.
  - Avoid complex tables, such as those with merged or cross-page cells.
  - Use Markdown format whenever possible. For formats like PDF or DOCX, convert them to Markdown before importing.
    To convert a PDF to Markdown, you can use the DashScopeParse tool in Alibaba Cloud Model Studio. For usage instructions, see the RAG chapter of the Alibaba Cloud Large Model ACP course.
- Align with prompt language: If user prompts are predominantly in a foreign language (e.g., English), your source files should also use that language. For technical terms, consider multilingual processing.
- Entity disambiguation: Standardize the expressions used for the same entity. For example, variants such as "ML" and "Machine Learning" can be standardized as "Machine Learning".
  You can use a large model for standardization. If the content is long, split it into smaller parts and input them sequentially.
- Enable multi-turn conversation rewriting: Automatically supplement user queries based on conversation history. This ensures the large model correctly understands pronouns and omitted context in multi-turn dialogues.
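For a small, known set of variants, entity standardization can also be done with a plain alias table before the files are imported. The following is a minimal sketch of that rule-based alternative, not the large-model approach mentioned above; the alias table and function name are hypothetical.

```python
import re

# Hypothetical alias table: variant -> canonical form.
ALIASES = {
    "ML": "Machine Learning",
    "machine learning": "Machine Learning",
}


def standardize(text: str, aliases: dict[str, str]) -> str:
    """Replace each whole-word alias (matched case-insensitively) with its canonical form."""
    # Try longer aliases first so multi-word variants are not shadowed by shorter ones.
    ordered = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(a) for a in ordered) + r")\b", re.IGNORECASE)
    lookup = {alias.lower(): canonical for alias, canonical in aliases.items()}
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)
```

Because matching is case-insensitive, a lowercase "ml" used as a unit would also be rewritten, so review the alias table against your own content before applying it in bulk.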
3.2 Invalid retrieval: Irrelevant knowledge retrieved
- Typical problem: A knowledge base contains files from multiple categories. When a query is about content in a Category A file, the retrieval results include irrelevant text chunks from other categories, such as Category B.
  Solution: Add tags to files. The knowledge base then filters files by tag before performing vector retrieval.
- Typical problem: The knowledge base contains multiple files with similar or identical structures, such as both File A and File B having a "Feature Overview" section. You only want to retrieve from the "Feature Overview" in File A.
  Solution: Define metadata for the files. This allows the knowledge base to perform a structured search before retrieval, precisely locating the target file and extracting relevant information.
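Both solutions follow the same pattern: narrow the candidate set with an exact filter first, then rank what remains by vector similarity. The sketch below illustrates that idea with toy two-dimensional vectors; the `Chunk` structure, the field names, and the `search` function are hypothetical and are not the knowledge base's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    text: str
    vector: tuple[float, ...]                      # toy embedding
    tags: frozenset = frozenset()                  # file-level tags
    metadata: dict = field(default_factory=dict)   # file-level metadata


def dot(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    return sum(x * y for x, y in zip(a, b))


def search(
    chunks: list[Chunk],
    query_vector: tuple[float, ...],
    tag: Optional[str] = None,
    metadata: Optional[dict] = None,
    top_k: int = 3,
) -> list[Chunk]:
    """Filter by tag and metadata first, then rank the survivors by similarity."""
    wanted = metadata or {}
    candidates = [
        c for c in chunks
        if (tag is None or tag in c.tags)
        and all(c.metadata.get(key) == value for key, value in wanted.items())
    ]
    candidates.sort(key=lambda c: dot(c.vector, query_vector), reverse=True)
    return candidates[:top_k]
```

With two near-identical "Feature Overview" chunks, an unfiltered search returns both, while `search(chunks, q, metadata={"file": "A"})` returns only the one from File A, no matter how similar the other chunk's vector is.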
3.3 Incomplete chunks
Files imported into a knowledge base are parsed and chunked to reduce interference during vectorization while maintaining semantic integrity. An inappropriate chunking method can lead to the following problems:
| Text chunks too short | Text chunks too long | Abrupt semantic breaks |
| --- | --- | --- |
| Overly short chunks can lack semantic context, leading to retrieval mismatches. | Overly long chunks can contain irrelevant topics, leading to noisy retrieval. | Forced semantic breaks can cause content to be lost during retrieval. |
In practice, aim for text chunks that are semantically complete while minimizing irrelevant information.
Solutions:
- Use the smart chunking strategy: This approach splits text based on semantic relevance, which helps preserve semantic integrity.
- Manually inspect and correct text chunk content: Ensure files are parsed and chunked correctly.
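The difference between an abrupt break and a semantically complete chunk can be seen by comparing fixed-size splitting with splitting on sentence boundaries. This is a simplified sketch for illustration; it is not the algorithm behind the smart chunking strategy, and the function names and sample text are hypothetical.

```python
import re


def chunk_fixed(text: str, size: int) -> list[str]:
    """Cut every `size` characters, regardless of sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def split_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_by_sentence(text: str, max_chars: int) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars (a longer single sentence stays whole)."""
    chunks, current = [], ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


TEXT = (
    "Product X has a two-year warranty. Register within 30 days. "
    "Contact support to file a claim."
)
```

`chunk_fixed(TEXT, 40)` cuts the second sentence in the middle of a word, while `chunk_by_sentence(TEXT, 70)` keeps every sentence intact. When you inspect chunks manually, a chunk that starts or ends mid-sentence is the kind of break to look for.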
3.4 Poor reranking
After the knowledge base finds text chunks related to the user's prompt, it sends them to a reranking model. The similarity threshold is then used to filter the reranked text chunks. Only chunks with a similarity score above this threshold are provided to the large model.

Lowering this threshold retrieves more text chunks, but may also include less relevant ones. Raising it reduces the number of retrieved chunks.
If this value is set too high, the knowledge base might discard all relevant text chunks, limiting the model's ability to get sufficient background information to generate an answer.

Solutions:
- Adjust the "similarity threshold": Lower the threshold to relax the retrieval conditions and avoid missed retrievals caused by overly strict filtering.
- Increase the "Number of Recalled Chunks": For complex questions requiring summarization, enumeration, or comparison, increasing this value helps the large model generate more complete answers.
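The interaction between the two settings can be sketched as a filter followed by a cap. The function below is an assumption-laden illustration: the parameter names, the order of operations, and the scores are hypothetical and may not match how the knowledge base applies them internally.

```python
def select_chunks(
    scored: list[tuple[str, float]],
    similarity_threshold: float,
    max_chunks: int,
) -> list[str]:
    """Keep chunks scoring above the threshold, best first, capped at max_chunks.

    `scored` holds (chunk_text, rerank_score) pairs.
    """
    kept = [(text, score) for text, score in scored if score > similarity_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in kept[:max_chunks]]


# Hypothetical rerank scores for three chunks.
SCORED = [("warranty terms", 0.82), ("claim process", 0.55), ("shipping info", 0.31)]
```

With these scores, a threshold of 0.9 discards every chunk, 0.5 keeps the two most relevant ones, and 0.2 admits the weakly related chunk as well unless `max_chunks` caps the result.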
3.5 Model misunderstanding
- Typical problem: The large model fails to understand the relationship between the knowledge and the user's prompt, resulting in an answer that reads as if it were stitched together.
  Solution: Switch to a generation model that better understands the relationship between the knowledge and the user's prompt.
- Typical problem: The returned result does not follow instructions or is incomplete.
  Solution: Optimize the prompt template. By adjusting the prompt, you can influence the large model's behavior (such as how it utilizes retrieved knowledge), which can indirectly improve RAG performance.
- Typical problem: The response includes the large model's own general knowledge instead of being strictly based on the knowledge base.
  Solution: Enable rejection to restrict answers to only the knowledge retrieved from the knowledge base.
- Typical problem: You want the same prompt to produce either consistent results or varied results each time.
  Solution: Adjust the large model parameters that control output randomness, such as temperature.
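As an illustration of the prompt template and rejection ideas, the sketch below builds a prompt that restricts the model to the retrieved knowledge and skips the model call entirely when nothing was retrieved. The template wording, the refusal message, and the function name are all hypothetical; this is not the built-in rejection feature.

```python
from typing import Optional

REFUSAL = "The knowledge base does not contain an answer to this question."

TEMPLATE = (
    "Answer the question using only the numbered knowledge below.\n"
    "If the knowledge does not contain the answer, reply exactly: {refusal}\n\n"
    "Knowledge:\n{knowledge}\n\n"
    "Question: {question}"
)


def render_prompt(question: str, chunks: list[str]) -> Optional[str]:
    """Return the prompt for the model, or None when nothing was retrieved.

    On None, the caller can return REFUSAL directly instead of calling the model.
    """
    if not chunks:
        return None
    knowledge = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return TEMPLATE.format(refusal=REFUSAL, knowledge=knowledge, question=question)
```

Numbering the chunks makes it easier to ask the model to cite which piece of knowledge it used, which in turn helps when diagnosing stitched-together answers.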
4. Next steps
4.1 Continuous iteration
- Re-evaluate: After each configuration change, rerun the evaluation set created previously.
- Compare and analyze: Compare the results with the baseline report to quantitatively analyze the impact of the changes (what issues were solved, and whether new ones were introduced).
- Iterate continuously: Based on the data analysis, decide on the next optimization strategy.
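The comparison step can be made concrete by listing which cases were fixed and which regressed between two runs. The sketch below assumes each run is available as a mapping from a hypothetical case ID to its large model score, and that both runs cover the same cases.

```python
PASS_SCORE = 4  # same pass mark used when reviewing the baseline report


def compare_runs(baseline: dict[str, int], current: dict[str, int]) -> dict:
    """Summarize the effect of a configuration change on the same evaluation set."""
    if baseline.keys() != current.keys():
        raise ValueError("both runs must cover the same case IDs")
    fixed = sorted(c for c in baseline if baseline[c] < PASS_SCORE <= current[c])
    regressed = sorted(c for c in baseline if current[c] < PASS_SCORE <= baseline[c])
    return {
        "fixed": fixed,
        "regressed": regressed,
        "baseline_passed": sum(score >= PASS_SCORE for score in baseline.values()),
        "current_passed": sum(score >= PASS_SCORE for score in current.values()),
    }
```

A change that leaves the number of passed cases unchanged can still hide one fix and one regression, so review the `fixed` and `regressed` lists rather than the totals alone.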
4.2 Model fine-tuning
Finally, if you have exhausted the preceding methods and need to further improve performance, consider model fine-tuning for your specific scenario.