LLM 大语言模型应用接入 ARMS

Python探针是阿里云可观测产品自研的Python语言的可观测采集探针,其基于OpenTelemetry标准实现了自动化埋点能力,支持追踪LLM应用程序。

背景信息

LLM(Large Language Model)应用是指基于大语言模型所开发的各种应用,大语言模型通过大量的数据和参数训练,能够回答类似人类自然语言的问题,因此在自然语言处理、文本生成和智能对话等领域有广泛应用。

由于LLM的输出结果往往很难准确预测,同时面临训练和生产效果可能出现偏差、数据分布漂移导致性能下降、数据质量不保鲜、依赖外部数据不可靠等若干因素的不可控情况,这往往会影响LLM应用的整体表现,当模型输出质量下降时能够及时识别就显得非常重要。

ARMS支持对LLM应用通过Python探针自动埋点或通过OpenTelemetry手动埋点,将LLM应用接入ARMS后,您即可查看LLM应用的调用链视图,更直观地分析不同操作类型的输入输出、Token消耗等信息。更多信息,请参见LLM调用链分析

支持的 LLM 应用框架

框架

PyPI/Github仓库地址

低版本

高版本

OpenAI

https://pypi.org/project/openai/

v1.0.0

无限制

DashScope

https://pypi.org/project/dashscope/

v1.0.0

无限制

LlamaIndex

https://pypi.org/project/llama-index/

v0.10.5

v0.11.0

LangChain

https://pypi.org/project/langchain/

v0.1.0

无限制

Dify

https://github.com/langgenius/dify

v0.8.3

无限制

安装 Python 探针

根据LLM应用部署环境选择合适的安装方式:

通过 Python 探针启动应用

aliyun-instrument python llm_app.py
说明

请将llm_app.py替换为实际应用,如果您暂时没有可接入的LLM应用,您也可以使用附录提供的应用Demo。

执行结果

约一分钟后,若Python应用出现在ARMS控制台LLM应用监控 > 应用列表页面中且有数据上报,则说明接入成功。

2025-01-07_11-44-13

附录

OpenAI Demo

llm_app.py

import openai
from fastapi import FastAPI
import uvicorn

app = FastAPI()

@app.get("/")
def call_openai():
    client = openai.OpenAI(api_key="sk-xxx')
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Write a haiku."}],
        max_tokens=20,
    )
    return {"data": f"{response}"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)    

requirements.txt

fastapi
uvicorn
openai >= 1.0.0

DashScope Demo

llm_app.py

from http import HTTPStatus
import dashscope
from dashscope import Generation
from fastapi import FastAPI
import uvicorn

app = FastAPI()

@app.get("/")
def call_dashscope():
    dashscope.api_key = 'YOUR-DASHSCOPE-API-KEY'
    responses = Generation.call(model=Generation.Models.qwen_turbo,
                                prompt='今天天气好吗?')
    resp = ""
    if responses.status_code == HTTPStatus.OK:
        resp = f"Result is: {responses.output}"
    else:
        resp = f"Failed request_id: {responses.request_id}, status_code: {responses.status_code}, code: {responses.code}, message: {responses.message})
    return {"data": f"{resp}"}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

requirements.txt

fastapi
uvicorn
dashscope >= 1.0.0

LlamaIndex Demo

data目录下存放知识库文档(pdf、txt、doc等文本格式)。

llm_app.py

import time

from fastapi import FastAPI
import uvicorn
import aiohttp

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext
from llama_index.embeddings.dashscope import DashScopeEmbedding
import chromadb
import dashscope
import os
from dotenv import load_dotenv
from llama_index.core.llms import ChatMessage
from llama_index.core import VectorStoreIndex, get_response_synthesizer
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.llms.dashscope import DashScope, DashScopeGenerationModels
import random

load_dotenv()

os.environ["DASHSCOPE_API_KEY"] = 'sk-xxxxxx'
dashscope.api_key = 'sk-xxxxxxx'
api_key = 'sk-xxxxxxxx'

llm = DashScope(model_name=DashScopeGenerationModels.QWEN_MAX,api_key=api_key)

# create client and a new collection
chroma_client = chromadb.EphemeralClient()
chroma_collection = chroma_client.create_collection("chapters")

# define embedding function
embed_model = DashScopeEmbedding(model_name="text-embedding-v1", api_key=api_key)

# load documents
filename_fn = lambda filename: {"file_name": filename}

# automatically sets the metadata of each document according to filename_fn
documents = SimpleDirectoryReader(
    "./data/", file_metadata=filename_fn
).load_data()

# set up ChromaVectorStore and load in data
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)

retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=4,
    verbose=True
)

# configure response synthesizer
response_synthesizer = get_response_synthesizer(llm=llm, response_mode="refine")

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)

SYSTEM_PROMPT = """
你是一个儿童通识类聊天机器人,你的任务是根据用户输入的问题,结合知识库中找到的最相关的内容,然后根据内容生成回答。注意不回答主观问题。
"""

# Initialize the conversation with a system message
messages = [ChatMessage(role="system", content=SYSTEM_PROMPT)]

app = FastAPI()


async def fetch(question):
    url = "https://www.aliyun.com"
    call_url = os.environ.get("LLM_INFRA_URL")
    if call_url is None or call_url == "":
        call_url = url
    else:
        call_url = f"{call_url}?question={question}"
    print(call_url)
    async with aiohttp.ClientSession() as session:
        async with session.get(call_url) as response:
            print(f"GET Status: {response.status}")
            data = await response.text()
            print(f"GET Response JSON: {data}")
            return data


@app.get("/heatbeat")
def heatbeat():
    return {"msg", "ok"}


cnt = 0


@app.get("/query")
async def call(question: str = None):
    global cnt
    cnt += 1
    if cnt == 20:
        cnt = 0
        raise BaseException("query is over limit,20 ", 401)
    # Add user message to the conversation history
    message = ChatMessage(role="user", content=question)
    # Convert messages into a string
    message_string = f"{message.role}:{message.content}"

    search = await fetch(question)
    print(f"search:{search}")
    resp = query_engine.query(message_string)
    print(resp)
    return {"data": f"{resp}".encode('utf-8').decode('utf-8')}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

requirements.txt

fastapi
uvicorn
numpy==1.23.5
llama-index==0.10.62
llama-index-core==0.10.28
llama-index-embeddings-dashscope==0.1.3
llama-index-llms-dashscope==0.1.2
llama-index-vector-stores-chroma==0.1.6
aiohttp

LangChain Demo

llm_app.py

from fastapi import FastAPI
from langchain.llms.fake import FakeListLLM
import uvicorn
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

app = FastAPI()
llm = FakeListLLM(responses=["I'll callback later.", "You 'console' them!"])

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])

llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "What NFL team won the Super Bowl in the year Justin Beiber was born?"

@app.get("/")
def call_langchain():
    res = llm_chain.run(question)
    return {"data": res}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

requirements.txt

fastapi
uvicorn
langchain
langchain_community

Dify Demo

您可以参考基于Dify构建网页定制化AI问答助手文档快速搭建一个Dify应用。