Integrate the retrieval-augmented generation (RAG) service from Alibaba Cloud Model Studio with LlamaIndex.
Get started
Prerequisites
You have obtained an API key and configured it as an environment variable. This configuration method is scheduled for deprecation and will be incorporated into the main API key configuration process.
You have enabled the knowledge base service in the Model Studio console. When you open the knowledge base page for the first time, follow the on-screen prompts to enable the service.
If you want to specify a workspace, obtain its ID.
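The environment prerequisites above can be checked before running any of the later examples. The following is a minimal sketch, not part of the official SDK: the helper name `check_dashscope_env` is hypothetical, and the variable names `DASHSCOPE_API_KEY` and `DASHSCOPE_WORKSPACE_ID` are the ones used in the code later in this topic.

```python
import os


def check_dashscope_env(environ=None):
    """Return (api_key, workspace_id); raise if the API key is missing.

    `check_dashscope_env` is a hypothetical helper for illustration only.
    """
    env = os.environ if environ is None else environ
    api_key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "DASHSCOPE_API_KEY is not set; configure it before running the examples."
        )
    # The workspace ID is optional. An empty value selects the default workspace.
    workspace_id = env.get("DASHSCOPE_WORKSPACE_ID", "").strip()
    return api_key, workspace_id
```

Calling `check_dashscope_env()` with no arguments reads the current process environment. Passing a dictionary instead makes the check easy to test.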
In your terminal, run the following commands to install the packages required for DashScopeCloudIndex. These packages require Python 3.8 to 3.12.
pip install llama-index-core
pip install llama-index-llms-dashscope
pip install llama-index-indices-managed-dashscope
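To confirm that the interpreter falls within the supported range stated above, a short check such as the following may help. This is a sketch that uses only the standard library; the function name `is_supported_python` is hypothetical.

```python
import sys

MIN_VERSION = (3, 8)   # lowest supported version, per the requirement above
MAX_VERSION = (3, 12)  # highest supported version, per the requirement above


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version is within 3.8 to 3.12 inclusive."""
    major_minor = tuple((version_info or sys.version_info)[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if not is_supported_python():
    print("Warning: this Python version is outside the supported range 3.8 to 3.12.")
```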
File parsing
Prepare your knowledge base files:
You can use one or more individual files.
You can place all files in a single folder.
The following example uses DashScopeParse from Alibaba Cloud Model Studio as the document parser.
The DashScopeParse parser supports online parsing of .doc, .docx, and .pdf files. Each file must be smaller than 100 MB and have fewer than 1,000 pages.
import os
from llama_index.readers.dashscope.base import DashScopeParse
from llama_index.readers.dashscope.utils import ResultType
# Set the workspace ID. This determines which workspace the parsed documents are uploaded to in the "Create a knowledge base" step.
os.environ['DASHSCOPE_WORKSPACE_ID'] = "<Your Workspace id, Default workspace is empty.>"
# Method 1: Use the document parser to parse one or more files.
file = [
    # Files to parse. Supported formats: .pdf, .doc, .docx.
]
# Parse the files.
parse = DashScopeParse(result_type=ResultType.DASHSCOPE_DOCMIND)
documents = parse.load_data(file_path=file)
# Method 2: Use the document parser to parse files of a specific type within a folder.
from llama_index.core import SimpleDirectoryReader
parse = DashScopeParse(result_type=ResultType.DASHSCOPE_DOCMIND)
# Define parsers for different document types.
file_extractor = {".pdf": parse, '.doc': parse, '.docx': parse}
# Read the folder, then extract and parse file information.
documents = SimpleDirectoryReader(
    "your_folder", file_extractor=file_extractor
).load_data(num_workers=1)
After the files are uploaded, go to the Data Connectors page. On the card for your connector, click View Details to view the uploaded documents.
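Because DashScopeParse accepts only .doc, .docx, and .pdf files smaller than 100 MB, it can be useful to filter a folder locally before parsing. The sketch below uses only the standard library. The helper name `select_parsable_files` is hypothetical, "100 MB" is assumed to mean 100 × 1024 × 1024 bytes, and the page-count limit is not checked because that would require opening each document.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".doc", ".docx", ".pdf"}
MAX_BYTES = 100 * 1024 * 1024  # assumption: "100 MB" is interpreted as 100 MiB


def select_parsable_files(folder, max_bytes=MAX_BYTES):
    """Return sorted file paths directly under `folder` that pass both checks.

    Only top-level entries are inspected; subfolders are not traversed.
    """
    selected = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip subfolders and other non-file entries
        if path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue  # unsupported format
        if path.stat().st_size >= max_bytes:
            continue  # each file must be smaller than the size limit
        selected.append(str(path))
    return selected
```

The returned list can be passed as the `file_path` argument shown in Method 1.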
Create a knowledge base
You can use the `documents` object to create a knowledge base.
from llama_index.indices.managed.dashscope import DashScopeCloudIndex
# create a new index
index = DashScopeCloudIndex.from_documents(
    documents,
    "my_first_index",
    verbose=True,
)
After you create a knowledge base, it appears on the Knowledge Base page.
Read a knowledge base
You can use the following code to initialize an existing knowledge base in LlamaIndex.
index = DashScopeCloudIndex("my_first_index")
Get a retriever
You can obtain a retriever from the index object, or initialize a DashScopeCloudRetriever directly with the knowledge base name.
# convert from index
retriever = index.as_retriever()
# initialize from DashScopeCloudRetriever
from llama_index.indices.managed.dashscope.retriever import DashScopeCloudRetriever
retriever = DashScopeCloudRetriever("my_first_index")
nodes = retriever.retrieve("my query")
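To inspect what the retriever returned, the results can be summarized one per line. The following sketch assumes that each item in `nodes` is a llama-index-core `NodeWithScore` object, which exposes a `score` attribute and a `get_content()` method. The helper name `summarize_nodes` is hypothetical.

```python
def summarize_nodes(nodes, max_chars=80):
    """Return one summary line per retrieved node: rank, score, and a text preview."""
    lines = []
    for rank, item in enumerate(nodes, start=1):
        # Flatten newlines so that each node fits on a single line.
        text = item.get_content().replace("\n", " ")
        if len(text) > max_chars:
            text = text[:max_chars] + "..."
        # The score may be None when the backend does not report one.
        score_str = "n/a" if item.score is None else f"{item.score:.3f}"
        lines.append(f"{rank}. score={score_str} {text}")
    return lines
```

For example, `print("\n".join(summarize_nodes(nodes)))` prints the ranked previews for the query above.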
Get a query engine
import os
from llama_index.llms.dashscope import DashScope, DashScopeGenerationModels
dashscope_llm = DashScope(
    model_name=DashScopeGenerationModels.QWEN_MAX,
    api_key=os.environ["DASHSCOPE_API_KEY"],
)
query_engine = index.as_query_engine(llm=dashscope_llm)
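Once the query engine exists, a question can be sent to it. The sketch below assumes the engine follows the standard llama-index-core query engine interface, in which `query()` returns a response object whose string form is the answer and whose `source_nodes` attribute lists the retrieved chunks. The helper name `ask` is hypothetical.

```python
def ask(query_engine, question):
    """Send one question and return (answer_text, number_of_source_nodes)."""
    response = query_engine.query(question)
    # source_nodes may be absent or empty, depending on the engine.
    sources = getattr(response, "source_nodes", None) or []
    return str(response), len(sources)
```

A call such as `answer, n_sources = ask(query_engine, "my query")` then gives the generated answer and the number of knowledge base chunks it drew on.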
Add or delete documents in a knowledge base
# add documents to index
index._insert(documents)
# delete documents from index; doc_id is a placeholder for the ID of a document already in the knowledge base
index.delete_ref_doc([doc_id])