Build RAG applications using LlamaIndex

Integrate the retrieval-augmented generation (RAG) service from Alibaba Cloud Model Studio with LlamaIndex.

Get started

Prerequisites

  • You have obtained an API key and configured it as an environment variable. This configuration method is scheduled for deprecation and will be incorporated into the main API key configuration process.

  • You have enabled the knowledge base service in the Model Studio console. When you open the knowledge base page for the first time, follow the on-screen prompts to enable the service.

  • If you want to specify a workspace, obtain its ID.

  • In your terminal, run the following commands to install the packages that provide DashScopeCloudIndex. These packages require Python 3.8 to 3.12.

    pip install llama-index-core
    pip install llama-index-llms-dashscope
    pip install llama-index-indices-managed-dashscope
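
Before continuing, it can help to confirm the first and last prerequisites from Python. The following is a minimal sketch, not part of the Model Studio SDK: the helper name check_prerequisites is hypothetical, and it assumes the API key is stored in the DASHSCOPE_API_KEY environment variable, which is the variable read later in this guide.

```python
import os
import sys


def check_prerequisites(env=None, version=None):
    """Return a list of problems; an empty list means the checks passed.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else tuple(version[:2])
    problems = []
    # The packages above are documented as requiring Python 3.8 to 3.12.
    if not ((3, 8) <= version <= (3, 12)):
        problems.append(
            f"Python {version[0]}.{version[1]} is outside the supported range 3.8-3.12"
        )
    # Assumption: the API key lives in DASHSCOPE_API_KEY.
    if not env.get("DASHSCOPE_API_KEY"):
        problems.append("DASHSCOPE_API_KEY is not set")
    return problems
```

Calling check_prerequisites() with no arguments inspects the current interpreter and environment. Passing a dictionary and a version tuple makes the check easy to test.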

File parsing

Prepare your knowledge base files:

  • You can use one or more individual files.

  • You can place all files in a single folder.

The following example uses DashScopeParse from Alibaba Cloud Model Studio as the document parser.

The DashScopeParse parser supports online parsing of .doc, .docx, and .pdf files. Each file must be smaller than 100 MB and have fewer than 1,000 pages.

import os

from llama_index.readers.dashscope.base import DashScopeParse
from llama_index.readers.dashscope.utils import ResultType

# Set the workspace ID. This determines which workspace the parsed documents are uploaded to in the "Create a knowledge base" step.
os.environ['DASHSCOPE_WORKSPACE_ID'] = "<Your Workspace id, Default workspace is empty.>"

# Method 1: Use the document parser to parse one or more files.
file = [
    # Files to parse. Supported formats: .pdf, .doc, .docx.
]
# Parse the files.
parse = DashScopeParse(result_type=ResultType.DASHSCOPE_DOCMIND)
documents = parse.load_data(file_path=file)

# Method 2: Use the document parser to parse files of a specific type within a folder.
from llama_index.core import SimpleDirectoryReader
parse = DashScopeParse(result_type=ResultType.DASHSCOPE_DOCMIND)
# Define parsers for different document types.
file_extractor = {".pdf": parse, '.doc': parse, '.docx': parse}
# Read the folder, then extract and parse file information.
documents = SimpleDirectoryReader(
    "your_folder", file_extractor=file_extractor
).load_data(num_workers=1)
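
The format and size limits stated above can be checked locally before any file is sent to the parser. The following is a rough sketch that uses only the standard library. The helper name select_parsable_files is hypothetical, "100 MB" is interpreted as 100 * 1024 * 1024 bytes (an assumption), and the page-count limit is not checked because that would require a format-specific library.

```python
from pathlib import Path

# Formats the guide lists as supported by DashScopeParse.
SUPPORTED_SUFFIXES = {".pdf", ".doc", ".docx"}
# Assumption: "100 MB" is read as 100 MiB.
MAX_BYTES = 100 * 1024 * 1024


def select_parsable_files(folder):
    """Split the files in `folder` into (accepted, rejected) path lists.

    Hypothetical helper; subfolders are ignored and page counts are not checked.
    """
    accepted, rejected = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        supported = path.suffix.lower() in SUPPORTED_SUFFIXES
        small_enough = path.stat().st_size < MAX_BYTES
        (accepted if supported and small_enough else rejected).append(str(path))
    return accepted, rejected
```

The accepted list has the same shape as the file list in Method 1, so it could be passed as file_path to parse.load_data.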

After the files are uploaded, go to the Data Connectors page. On the card for your connector, click View Details to view the uploaded documents.

Create a knowledge base

You can use the `documents` object to create a knowledge base.

from llama_index.indices.managed.dashscope import DashScopeCloudIndex

# create a new index
index = DashScopeCloudIndex.from_documents(
    documents,
    "my_first_index",
    verbose=True,
)

After you create a knowledge base, it appears on the Knowledge Base page.

Read a knowledge base

You can use the following code to load an existing knowledge base by name in LlamaIndex.

index = DashScopeCloudIndex("my_first_index")

Get a retriever

You can get a retriever from the index object, or initialize a DashScopeCloudRetriever directly with the knowledge base name.

# convert from index
retriever = index.as_retriever()

# initialize from DashScopeCloudRetriever
from llama_index.indices.managed.dashscope.retriever import DashScopeCloudRetriever
retriever = DashScopeCloudRetriever("my_first_index")

nodes = retriever.retrieve("my query")
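
To see what the retriever returned, the results can be summarized one line per node. This sketch assumes that retrieve returns standard LlamaIndex NodeWithScore objects, which expose a score attribute and a get_content() method. The helper name summarize_nodes is hypothetical.

```python
def summarize_nodes(nodes, max_chars=80):
    """Return one line per retrieved node: rank, score, and a text preview.

    Assumes each item has `.score` and `.get_content()`, as NodeWithScore does.
    """
    lines = []
    for rank, item in enumerate(nodes, start=1):
        # The score may be None if the retriever does not provide one.
        score = "n/a" if item.score is None else f"{item.score:.3f}"
        # Collapse newlines so each node fits on one line.
        preview = item.get_content().replace("\n", " ")[:max_chars]
        lines.append(f"{rank}. score={score} {preview}")
    return lines
```

For example, print("\n".join(summarize_nodes(nodes))) prints a ranked overview of the chunks retrieved for the query.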

Get a query engine

import os

from llama_index.llms.dashscope import DashScope, DashScopeGenerationModels

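# Create the LLM that generates answers. The API key is read from the DASHSCOPE_API_KEY environment variable.
dashscope_llm = DashScope(
    model_name=DashScopeGenerationModels.QWEN_MAX,
    api_key=os.environ["DASHSCOPE_API_KEY"],
)

# Build a query engine on top of the knowledge base index.
query_engine = index.as_query_engine(llm=dashscope_llm)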
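
Once the query engine exists, a question can be sent through its query method. The following is a minimal sketch with a hypothetical helper named ask. It assumes the standard LlamaIndex behavior that query returns a response object whose string form is the answer text, and that the response may carry a source_nodes list with the retrieved chunks.

```python
def ask(query_engine, question):
    """Send one question and return the answer text plus the source count.

    Hypothetical helper; works with any object that has a `.query(str)` method.
    """
    response = query_engine.query(question)
    # source_nodes may be missing on some response types, so fall back to [].
    sources = getattr(response, "source_nodes", None) or []
    return {"answer": str(response), "source_count": len(sources)}
```

With the engine created above, ask(query_engine, "my query") would return the generated answer together with the number of knowledge base chunks that supported it.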

Add or delete documents in a knowledge base

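# Add documents to the knowledge base. `documents` is a list of parsed documents, as produced in the "File parsing" step.
index._insert(documents)
# Delete documents from the knowledge base. `doc_id` is the ID of a document to remove.
index.delete_ref_doc([doc_id])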