This tutorial shows how to use DashVector, a vector retrieval service, together with a Large Language Model (LLM) to build a Q&A service based on proprietary knowledge from a specific domain. You access the LLM and text embedding capabilities through the Qwen API and Embedding API on Alibaba Cloud Model Studio.
Background and implementation
Large Language Models (LLMs) are a core technology in natural language processing and provide extensive NLP capabilities. However, their training corpora have limitations. These corpora typically include general knowledge, common sense information such as Wikipedia articles, news, and novels, and professional knowledge from various fields. As a result, LLMs often lack sufficient depth or accuracy when representing or applying knowledge in specific domains—especially proprietary knowledge within a vertical industry or enterprise.
To build a Q&A service for a specific domain, you must enable the LLM to understand and access domain-specific knowledge that lies outside its training data. You can also design targeted prompts to help the LLM interpret user intent and answer questions using this injected domain knowledge. Unlike search engines—where users often enter only a few keywords—users of Q&A services typically ask questions in complete sentences. Direct keyword matching against a corporate knowledge base is therefore often ineffective. Long sentences also require additional processing, such as tokenization and weighting. In contrast, converting both the question and knowledge base content into high-quality vectors enables semantic search via vector retrieval. This approach makes extracting relevant knowledge points simple and efficient.
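To make the idea of semantic matching concrete, the following minimal sketch ranks three sentences against a question by cosine similarity, one common way to compare vectors. The three-dimensional vectors are hand-made values chosen for illustration, not output from an embedding model, and the sentences are invented examples.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made 3-dimensional "embeddings" (illustrative values, not model output).
knowledge_base = {
    "Two trucks collided on the highway.": [0.9, 0.1, 0.0],
    "A magnitude 5 earthquake struck the coast.": [0.1, 0.9, 0.1],
    "Food poisoning was reported at a school.": [0.0, 0.1, 0.9],
}

# Pretend embedding of the question "Where did the crash happen?"
question_vector = [0.8, 0.2, 0.1]

# Pick the sentence whose vector points in the most similar direction.
best_match = max(
    knowledge_base,
    key=lambda text: cosine_similarity(question_vector, knowledge_base[text]),
)
print(best_match)
```

The question and the best match share no keywords ("crash" versus "collided"), yet their vectors are close. A real embedding model is what makes this kind of match possible for free-form sentences.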
This tutorial uses the Chinese Emergency Corpus (CEC Corpus) to demonstrate a Q&A service for news reports about emergency events.
Overall flow

The process has three main stages:
- Vectorize the local knowledge base. Use a text embedding model to convert the knowledge base into high-quality, low-dimensional vector data, and write it to DashVector. This tutorial uses the Embedding API on Model Studio for data vectorization.
- Extract relevant knowledge points. Vectorize the user’s question and use DashVector to retrieve the original text of relevant knowledge points.
- Construct a prompt and ask the question. Combine the relevant knowledge points with the question to create a prompt, then send it to Qwen.
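The query-time stages (2 and 3) can be sketched as one function that takes its three dependencies as plain callables. This is only an outline under assumptions: the function name `answer_with_retrieval`, the prompt layout, and the stand-in lambdas are hypothetical, and stage 1 (indexing) is assumed to have been completed beforehand.

```python
from typing import Callable, List, Sequence


def answer_with_retrieval(
    question: str,
    embed: Callable[[str], Sequence[float]],
    search: Callable[[Sequence[float]], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Run stages 2 and 3 of the flow with injected embed/search/generate steps."""
    # Stage 2: vectorize the question and retrieve relevant knowledge points.
    question_vector = embed(question)
    knowledge_points = search(question_vector)

    # Stage 3: combine the knowledge points with the question and ask the LLM.
    context = "\n".join(knowledge_points)
    prompt = f"Content:\n{context}\n\nQuestion: {question}"
    return generate(prompt)


# Stand-in callables so the sketch runs without any external service.
answer = answer_with_retrieval(
    "What happened?",
    embed=lambda text: [float(len(text))],
    search=lambda vector: ["Placeholder knowledge point."],
    generate=lambda prompt: "ANSWER<" + prompt + ">",
)
print(answer)
```

Passing the steps in as callables keeps the flow independent of any particular embedding model, vector store, or LLM.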
Preparations
1. Prepare API keys and a cluster
- Activate Alibaba Cloud Model Studio and obtain an API key. For more information, see Obtain an API key and Configure the API key as an environment variable.
- Activate DashVector and obtain an API key. For more information, see DashVector API key management.
- Activate DashVector and create a cluster.
- Obtain the cluster endpoint. For more information, see Cluster details.
The API key for Alibaba Cloud Model Studio is separate from the API key for DashVector. You must obtain them individually.
2. Prepare the environment
This tutorial requires Python 3.7 or later. After confirming your Python version, install the DashVector and DashScope SDKs:
pip3 install dashvector dashscope
3. Prepare the data
git clone https://github.com/shijiebei2009/CEC-Corpus.git
Steps
In this tutorial, you must replace your-xxx-api-key and your-xxx-cluster-endpoint with your own API key and cluster endpoint for the code to run correctly.
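As an alternative to pasting keys into the source files, the credentials could be read from environment variables. The following is a minimal sketch: the helper name `load_settings` and the variable names `DASHSCOPE_API_KEY`, `DASHVECTOR_API_KEY`, and `DASHVECTOR_ENDPOINT` are assumptions chosen for this example, so adjust them to match your own setup.

```python
import os


def load_settings(environ=os.environ):
    """Collect the three credentials this tutorial needs from environment variables."""
    # Variable names are assumptions for this sketch, not names required by the SDKs.
    names = ("DASHSCOPE_API_KEY", "DASHVECTOR_API_KEY", "DASHVECTOR_ENDPOINT")
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# Demonstration with a plain dict standing in for the real environment.
demo = load_settings({
    "DASHSCOPE_API_KEY": "sk-demo",
    "DASHVECTOR_API_KEY": "dv-demo",
    "DASHVECTOR_ENDPOINT": "demo-endpoint.example.com",
})
print(sorted(demo))
```

Failing early with a list of the missing names makes a misconfigured environment easier to diagnose than an authentication error from a later API call.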
1. Vectorize the local knowledge base
The CEC-Corpus dataset contains the corpus and annotated data for 332 news reports on emergency events. For this tutorial, you only need to extract the original news text, vectorize it, and store it in DashVector. For a tutorial on text vectorization, see Implement semantic search using Vector Retrieval Service and TextEmbedding. Create an embedding.py file and copy the following sample code into it:
import os

import dashscope
from dashscope import TextEmbedding
from dashvector import Client, Doc


def prepare_data(path, batch_size=25):
    # Read every file in the directory and yield the texts in batches
    batch_docs = []
    for file in os.listdir(path):
        with open(os.path.join(path, file), 'r', encoding='utf-8') as f:
            batch_docs.append(f.read())
            if len(batch_docs) == batch_size:
                yield batch_docs
                batch_docs = []

    # Yield the final, possibly smaller, batch
    if batch_docs:
        yield batch_docs


def generate_embeddings(news):
    # Accepts a single string or a list of strings
    rsp = TextEmbedding.call(
        model=TextEmbedding.Models.text_embedding_v1,
        input=news
    )
    embeddings = [record['embedding'] for record in rsp.output['embeddings']]
    return embeddings if isinstance(news, list) else embeddings[0]


if __name__ == '__main__':
    dashscope.api_key = '{your-dashscope-api-key}'

    # Initialize the DashVector client
    client = Client(
        api_key='{your-dashvector-api-key}',
        endpoint='{your-dashvector-cluster-endpoint}'
    )

    # Create a collection. Specify the collection name and vector dimensions. The text_embedding_v1 model generates vectors with 1536 dimensions.
    rsp = client.create('news_embeddings', 1536)
    assert rsp

    # Load the corpus
    next_id = 0
    collection = client.get('news_embeddings')
    for news in prepare_data('CEC-Corpus/raw corpus/allSourceText'):
        # Assign consecutive IDs to the documents in this batch
        ids = [next_id + i for i, _ in enumerate(news)]
        next_id += len(news)

        vectors = generate_embeddings(news)
        # Write to DashVector to build the index
        rsp = collection.upsert(
            [
                Doc(id=str(doc_id), vector=vector, fields={"raw": doc})
                for doc_id, vector, doc in zip(ids, vectors, news)
            ]
        )
        assert rsp
In the example, the embedding vectors and the news report text (as the raw field) are stored together in DashVector. This allows the original text to be retrieved during vector search.
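The batching and ID bookkeeping in the loading loop can be looked at in isolation. The sketch below uses hypothetical helpers (`chunk` and `assign_ids`) and placeholder strings instead of real news files to show how consecutive IDs carry across batches of 25.

```python
def chunk(items, size):
    """Yield consecutive slices of at most `size` items."""
    for begin in range(0, len(items), size):
        yield items[begin:begin + size]


def assign_ids(batches, start=0):
    """Yield (ids, batch) pairs with consecutive string IDs across batches."""
    next_id = start
    for batch in batches:
        ids = [str(next_id + i) for i in range(len(batch))]
        next_id += len(batch)
        yield ids, batch


# Placeholder documents standing in for the news files.
docs = [f"news report {n}" for n in range(60)]
pairs = list(assign_ids(chunk(docs, 25)))

print([len(ids) for ids, _ in pairs])  # [25, 25, 10]
print(pairs[-1][0][-1])                # 59
```

One caveat: `os.listdir` returns entries in arbitrary order, so if the script is run again, a given ID is not guaranteed to map to the same file. Sorting the file names first would be one way to keep the mapping stable.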
2. Extract knowledge points
After writing all the news reports from the CEC-Corpus dataset to DashVector, you can perform fast vector retrieval. To do this, vectorize the question and search DashVector for the most relevant knowledge points—that is, related news reports. Create a search.py file and copy the following sample code into it.
from dashvector import Client

from embedding import generate_embeddings


def search_relevant_news(question):
    # Initialize the DashVector client
    client = Client(
        api_key='{your-dashvector-api-key}',
        endpoint='{your-dashvector-cluster-endpoint}'
    )

    # Get the collection you just stored data in
    collection = client.get('news_embeddings')
    assert collection

    # Vector retrieval: specify topk=1
    rsp = collection.query(generate_embeddings(question), output_fields=['raw'],
                           topk=1)
    assert rsp
    return rsp.output[0].fields['raw']
3. Construct a prompt to query the LLM (Qwen)
After retrieving relevant knowledge points, combine the question and those knowledge points into a prompt based on a specific template, then send it to the LLM. The LLM used here is Qwen, a large-scale language model developed by Alibaba. It interprets user intent through natural language understanding and semantic analysis of user input. You can obtain more accurate results by providing clear and detailed instructions—or prompts. These capabilities are available through the Qwen API.
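One practical detail when combining retrieved text with a question is prompt length, because a retrieved report can be long. The sketch below fills a generic template and trims the context to a character budget. The function name `build_prompt`, the template wording, and the 2000-character default are all assumptions for illustration, not limits or conventions taken from the Qwen API.

```python
def build_prompt(question, context, max_context_chars=2000):
    """Fill a simple template, trimming the context to a character budget."""
    # The budget is an arbitrary illustrative value, not a documented model limit.
    trimmed = context[:max_context_chars]
    return (
        "Answer using only the content between the dashed lines.\n"
        "---\n"
        f"{trimmed}\n"
        "---\n"
        f"Question: {question}"
    )


# A long placeholder context is cut down to the requested budget.
prompt = build_prompt("What was the cause?", "#" * 5000, max_context_chars=100)
print(prompt)
```

Keeping prompt construction in its own function also makes it easy to inspect the exact text before it is sent to the model.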
The prompt template designed for this tutorial is: Please answer the question based on the content I provide. The content is {___}, and my question is {___}. You can also design your own template. Create an answer.py file and copy the following sample code into it.
from dashscope import Generation


def answer_question(question, context):
    prompt = f'''Please answer the question based on the content within the triple backticks.
```
{context}
```
My question is: {question}.
'''

    rsp = Generation.call(model='qwen-turbo', prompt=prompt)
    return rsp.output.text
Q&A
After completing these preparations, you can ask the LLM questions related to specific knowledge points. For example, the CEC-Corpus news dataset includes a report on a rear-end collision in Ding'an, Hainan. Because the entire news dataset has already been converted into vectors and stored, you can use this report as a knowledge point and ask a specific question, such as: Where did the Hainan Ding'an rear-end collision happen? What was the cause? What were the casualties? You can then view the answer that Qwen returns.

Create a run.py file and copy the following sample code into it.
import dashscope

from search import search_relevant_news
from answer import answer_question


if __name__ == '__main__':
    dashscope.api_key = '{your-dashscope-api-key}'

    question = 'Where did the Hainan Ding\'an rear-end collision happen? What was the cause? What were the casualties?'
    # Retrieve the most relevant news report, then ask Qwen with it as context
    context = search_relevant_news(question)
    answer = answer_question(question, context)

    print(f'question: {question}\n' f'answer: {answer}')

As you can see, using DashVector as the foundation for vector retrieval extends the LLM’s knowledge scope to a proprietary, specific domain—and enables it to provide accurate answers.
Conclusion
This tutorial demonstrates that DashVector, as a standalone vector retrieval service, provides powerful, out-of-the-box vector retrieval capabilities. When combined with various AI models, these capabilities support diverse AI applications. In this example, the LLM Q&A and text embedding generation capabilities are accessed through the Qwen API and Embedding API on Alibaba Cloud Model Studio. In practice, you can also implement these capabilities using other third-party services or open source model communities, such as the various open source LLM models on ModelScope.