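The Python agent is an observability data collection agent for Python, developed in-house by the Alibaba Cloud observability products. It implements automatic instrumentation based on the OpenTelemetry standard and supports tracing LLM applications.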
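Background information
An LLM (Large Language Model) application is any application built on a large language model. Trained on large volumes of data with a large number of parameters, these models can answer questions in human-like natural language, which is why they are widely used in natural language processing, text generation, and intelligent dialogue.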
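LLM output is often hard to predict accurately. Several factors are also difficult to control: results in production may diverge from those seen in training, drift in the data distribution may degrade performance, data may become stale, and external data sources may be unreliable. Any of these can affect the overall behavior of an LLM application, so it is important to detect a drop in output quality promptly.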
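ARMS supports automatic instrumentation of LLM applications through the Python agent and manual instrumentation through OpenTelemetry. After you connect an LLM application to ARMS, you can open its trace view and analyze information such as the inputs and outputs of different operation types and token consumption more intuitively. For more information, see LLM trace analysis.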
Supported LLM application frameworks
Framework | Minimum version | Maximum version
OpenAI | v1.0.0 | No limit
DashScope | v1.0.0 | No limit
LlamaIndex | v0.10.5 | v0.11.0
LangChain | v0.1.0 | No limit
Dify | v0.8.3 | No limit
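Before installing the agent, it can help to check whether the framework version in your environment falls inside the ranges in the table above. The following is a minimal sketch, not part of the agent: the helper names are hypothetical, the PyPI distribution names (`openai`, `dashscope`, `llama-index`, `langchain`) are assumptions about what to look up, the maximum version is assumed to be inclusive, and Dify is left out of the sketch.

```python
from importlib import metadata

# Ranges copied from the table above as (minimum, maximum); None means no upper limit.
SUPPORTED = {
    "openai": ((1, 0, 0), None),
    "dashscope": ((1, 0, 0), None),
    "llama-index": ((0, 10, 5), (0, 11, 0)),
    "langchain": ((0, 1, 0), None),
}


def parse_version(text):
    """Turn '0.10.62' or 'v0.10.5' into a 3-tuple of ints; suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.lstrip("v").split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '1.0' compares equal to '1.0.0'.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def is_supported(package, version):
    """Check a version string against the range recorded for the package."""
    minimum, maximum = SUPPORTED[package]
    parsed = parse_version(version)
    if parsed < minimum:
        return False
    # Assumption: the maximum version in the table is inclusive.
    return maximum is None or parsed <= maximum


def check_installed(package):
    """Return True or False for an installed package, or None if it is not installed."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return is_supported(package, installed)
```

For example, `check_installed("llama-index")` returns `None` when LlamaIndex is not installed and a boolean otherwise.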
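Install the Python agent
Choose the installation method that suits the environment in which your LLM application is deployed.
Start the application with the Python agent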
aliyun-instrument python llm_app.py
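Note
Replace llm_app.py with your actual application. If you do not yet have an LLM application to connect, you can use one of the demo applications in the appendix.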
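Result
After about one minute, if the Python application appears in the ARMS console and data is being reported, the application is connected successfully.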
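Data is only reported once the application handles requests. The sketch below sends a few requests to a locally running demo so that traces are produced. It assumes the app listens on port 8000 and serves the `/` route, as the demos in the appendix do. The function name and the `opener` parameter are hypothetical and exist only to make the sketch easy to try out.

```python
import urllib.request


def send_test_requests(url="http://127.0.0.1:8000/", count=5, opener=urllib.request.urlopen):
    """Call the endpoint `count` times and return one result per call."""
    results = []
    for _ in range(count):
        try:
            with opener(url, timeout=30) as response:
                results.append(response.status)
        except OSError as exc:  # URLError, HTTPError, and timeouts all derive from OSError
            results.append(f"failed: {exc}")
    return results
```

Calling `send_test_requests()` while the demo is running returns a list of HTTP status codes. A list of `failed: ...` entries usually means the application is not listening on that address.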
Appendix
OpenAI Demo
llm_app.py
import openai
from fastapi import FastAPI
import uvicorn

app = FastAPI()


@app.get("/")
def call_openai():
    # Replace the placeholder with your own OpenAI API key.
    client = openai.OpenAI(api_key="sk-xxx")
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Write a haiku."}],
        max_tokens=20,
    )
    return {"data": f"{response}"}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
requirements.txt
fastapi
uvicorn
openai >= 1.0.0
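The demo returns the whole response object as a string. If you want to compare the token counts shown in the trace view with what the API returned, a helper along these lines can pull out the reply text and the usage fields. This is a sketch: `summarize_completion` is a hypothetical name, and the `SimpleNamespace` object is only a stand-in with the same attribute layout as a chat completion response, so the example runs without an API key.

```python
from types import SimpleNamespace


def summarize_completion(response):
    """Extract the reply text and token counts from a chat completion response."""
    usage = response.usage
    return {
        "text": response.choices[0].message.content,
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }


# Stand-in response; in the demo you would pass the object returned by
# client.chat.completions.create(...) instead.
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="An old silent pond"))],
    usage=SimpleNamespace(prompt_tokens=11, completion_tokens=5, total_tokens=16),
)
print(summarize_completion(fake_response))
```

In `call_openai`, returning `summarize_completion(response)` instead of the stringified object would give a smaller, structured payload.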
DashScope Demo
llm_app.py
from http import HTTPStatus

import dashscope
from dashscope import Generation
from fastapi import FastAPI
import uvicorn

app = FastAPI()


@app.get("/")
def call_dashscope():
    # Replace the placeholder with your own DashScope API key.
    dashscope.api_key = 'YOUR-DASHSCOPE-API-KEY'
    responses = Generation.call(model=Generation.Models.qwen_turbo,
                                prompt='Is the weather nice today?')
    resp = ""
    if responses.status_code == HTTPStatus.OK:
        resp = f"Result is: {responses.output}"
    else:
        resp = (
            f"Failed request_id: {responses.request_id}, "
            f"status_code: {responses.status_code}, "
            f"code: {responses.code}, message: {responses.message}"
        )
    return {"data": f"{resp}"}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
requirements.txt
fastapi
uvicorn
dashscope >= 1.0.0
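The demo hard-codes the API key for brevity. One possible alternative is to read it from an environment variable, so that the key stays out of the source file. The sketch below assumes the variable is named `DASHSCOPE_API_KEY`. The helper name and its `env` parameter are hypothetical.

```python
import os


def read_api_key(name="DASHSCOPE_API_KEY", env=None):
    """Return the API key stored in the given environment variable, or fail loudly."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before starting the app")
    return value
```

In the demo this would replace the literal key, for example `dashscope.api_key = read_api_key()`.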
LlamaIndex Demo
Place the knowledge base documents (text formats such as pdf, txt, and doc) in the data directory.
llm_app.py
import os

import aiohttp
import chromadb
import dashscope
import uvicorn
from dotenv import load_dotenv
from fastapi import FastAPI
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    get_response_synthesizer,
)
from llama_index.core.llms import ChatMessage
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.embeddings.dashscope import DashScopeEmbedding
from llama_index.llms.dashscope import DashScope, DashScopeGenerationModels
from llama_index.vector_stores.chroma import ChromaVectorStore

load_dotenv()

# Replace the placeholders with your own DashScope API key.
os.environ["DASHSCOPE_API_KEY"] = 'sk-xxxxxx'
dashscope.api_key = 'sk-xxxxxxx'
api_key = 'sk-xxxxxxxx'
llm = DashScope(model_name=DashScopeGenerationModels.QWEN_MAX, api_key=api_key)

# create client and a new collection
chroma_client = chromadb.EphemeralClient()
chroma_collection = chroma_client.create_collection("chapters")

# define embedding function
embed_model = DashScopeEmbedding(model_name="text-embedding-v1", api_key=api_key)

# load documents
filename_fn = lambda filename: {"file_name": filename}

# automatically sets the metadata of each document according to filename_fn
documents = SimpleDirectoryReader(
    "./data/", file_metadata=filename_fn
).load_data()

# set up ChromaVectorStore and load in data
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)

retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=4,
    verbose=True
)

# configure response synthesizer
response_synthesizer = get_response_synthesizer(llm=llm, response_mode="refine")

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)
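
SYSTEM_PROMPT = """
You are a general-knowledge chatbot for children. Your task is to take the user's question, find the most relevant content in the knowledge base, and generate an answer based on that content. Do not answer subjective questions.
"""

# Initialize the conversation with a system message
messages = [ChatMessage(role="system", content=SYSTEM_PROMPT)]

app = FastAPI()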


async def fetch(question):
    # Call the service configured in LLM_INFRA_URL, or fall back to a default URL.
    url = "https://www.aliyun.com"
    call_url = os.environ.get("LLM_INFRA_URL")
    if call_url is None or call_url == "":
        call_url = url
    else:
        call_url = f"{call_url}?question={question}"
    print(call_url)
    async with aiohttp.ClientSession() as session:
        async with session.get(call_url) as response:
            print(f"GET Status: {response.status}")
            data = await response.text()
            print(f"GET Response JSON: {data}")
            return data


@app.get("/heartbeat")
def heartbeat():
    return {"msg": "ok"}


cnt = 0


@app.get("/query")
async def call(question: str = None):
    global cnt
    cnt += 1
    # Every 20th request resets the counter and raises an error.
    if cnt == 20:
        cnt = 0
        raise BaseException("query is over limit,20 ", 401)
    # Add user message to the conversation history
    message = ChatMessage(role="user", content=question)
    # Convert messages into a string
    message_string = f"{message.role}:{message.content}"
    search = await fetch(question)
    print(f"search:{search}")
    resp = query_engine.query(message_string)
    print(resp)
    return {"data": f"{resp}".encode('utf-8').decode('utf-8')}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
requirements.txt
fastapi
uvicorn
numpy==1.23.5
llama-index==0.10.62
llama-index-core==0.10.28
llama-index-embeddings-dashscope==0.1.3
llama-index-llms-dashscope==0.1.2
llama-index-vector-stores-chroma==0.1.6
aiohttp
python-dotenv
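In the demo, `fetch` appends the question to `LLM_INFRA_URL` with plain string formatting, so a question that contains spaces, `&`, or `?` may produce a malformed URL. A possible way to build the URL with proper escaping is sketched below, using only the standard library. The helper name and the example base URL are hypothetical.

```python
from urllib.parse import urlencode


def build_call_url(base_url, question):
    """Append the question as a properly escaped query parameter."""
    return f"{base_url}?{urlencode({'question': question})}"


# Hypothetical base URL, used only for illustration.
print(build_call_url("http://localhost:9000/search", "why is the sky blue?"))
# → http://localhost:9000/search?question=why+is+the+sky+blue%3F
```

Inside `fetch`, the `else` branch could then assign `call_url = build_call_url(call_url, question)`.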
LangChain Demo
llm_app.py
from fastapi import FastAPI
# FakeListLLM returns canned responses, so the demo needs no API key.
from langchain_community.llms.fake import FakeListLLM
import uvicorn
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

app = FastAPI()
llm = FakeListLLM(responses=["I'll callback later.", "You 'console' them!"])

template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"


@app.get("/")
def call_langchain():
    res = llm_chain.run(question)
    return {"data": res}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
requirements.txt
fastapi
uvicorn
langchain
langchain_community
Dify Demo
To set up a Dify application quickly, see Build a customized AI Q&A assistant for web pages with Dify.