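AI Application Observability provides a set of built-in evaluation templates for common scenarios. If you need custom evaluation metrics, you can implement them yourself on top of the real-time computing capabilities of the observability service.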
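Query the built-in evaluation templates
Use the following statement to query all Chinese-language evaluation templates. You can modify a built-in template to adapt it to a specific scenario.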
* | select * from "resource.llm_evaluation" where lang='cn'
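Call Bailian
http_call is a scalar function in SQL and SPL that takes a URL, an HTTP method, headers, a body, and other arguments, and sends a request to an external service. You can use it to call Bailian and process data with a model.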
Syntax
http_call(url, method, headers, params, body, timeout)
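Parameters
Note
Token consumption fees are billed through the Alibaba Cloud observability product by default. To use your own Bailian account instead, pass your API key as a Bearer token in the headers argument.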
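Parameter | Type | Description |
url | string | URL to call. Currently only Bailian is supported; enter a Bailian HTTPS endpoint. |
method | string | HTTP method, for example POST. |
headers | JSON-formatted string | Request headers for Bailian. You can put your own Bailian API key here. |
params | JSON-formatted string | Query parameters, used for GET requests. For POST requests, set it to an empty string. |
body | JSON-formatted string | Request body sent to Bailian, containing the prompt and the model to call. |
timeout | number | Timeout in milliseconds. |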
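To make the parameter list concrete, here is a minimal Python sketch that builds the equivalent HTTP request with the standard library. It is not how http_call is implemented: build_request is a hypothetical helper, appending params to the URL as a query string is an assumption about GET semantics, and sk-placeholder stands in for your own Bailian API key. It also shows the Bearer header described in the note.

```python
import json
import urllib.parse
import urllib.request


def build_request(url, method, headers, params, body, timeout_ms):
    """Hypothetical helper that mirrors the http_call parameter list.

    headers and params are JSON strings (either may be ''), body is a JSON
    string, and timeout_ms is in milliseconds. Nothing is sent here.
    """
    header_map = json.loads(headers) if headers else {}
    if params:
        # Assumption: query parameters (used with GET) are appended to the URL.
        url = url + "?" + urllib.parse.urlencode(json.loads(params))
    data = body.encode("utf-8") if body else None
    request = urllib.request.Request(url, data=data, headers=header_map, method=method)
    # urllib expects the timeout in seconds; http_call takes milliseconds.
    return request, timeout_ms / 1000


# To bill your own Bailian account, send your API key as a Bearer token.
API_KEY = "sk-placeholder"  # placeholder, not a real key
headers = json.dumps({
    "Content-Type": "application/json",
    "Authorization": "Bearer " + API_KEY,
})
request, timeout_s = build_request(
    "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions",
    "POST",
    headers,
    "",
    '{"model": "qwen-plus", "messages": [{"role": "user", "content": "hi"}]}',
    60 * 1000,
)
# urllib.request.urlopen(request, timeout=timeout_s) would actually send it.
```

Passing headers and params as JSON strings, rather than as maps, keeps the sketch aligned with the way the arguments are written in the SQL examples.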
Example
http_call(
'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions',
'POST',
'{ "Content-Type":"application/json"}',
'',
'{
"model": "qwen-plus",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Who are you?"
}
]
}',
60 * 1000
)
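The example writes the request body as a hand-written JSON literal and targets the OpenAI-compatible endpoint. The following Python sketch builds the same body with json.dumps and reads the reply. It assumes the response follows the OpenAI chat completion layout (choices[0].message.content). extract_reply and the sample response are hypothetical and are for illustration only.

```python
import json

# The example's body, built with json.dumps so quoting and escaping always
# produce valid JSON (the SQL example writes the literal by hand).
body = json.dumps(
    {
        "model": "qwen-plus",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who are you?"},
        ],
    },
    ensure_ascii=False,
)


def extract_reply(response_body: str) -> str:
    """Return the assistant message from an OpenAI-compatible response."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


# Hypothetical, abbreviated response body, for illustration only.
sample_response = json.dumps({
    "choices": [{"message": {"role": "assistant", "content": "I am Qwen."}}],
    "usage": {"prompt_tokens": 20, "completion_tokens": 5, "total_tokens": 25},
})
print(extract_reply(sample_response))  # I am Qwen.
```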
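Custom evaluation
The following SQL uses a custom evaluation template to summarize each log entry in one sentence. The evaluation instruction is defined directly in the SQL.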
(* and id: 999 and type: dca) | set session velox_use_io_executor = true;
with t1 as (
select
__time__,
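"sentence1" as trans, -- column holding the data to evaluate; replace "sentence1" with your own column
'{{query}}' as targets, -- placeholder used in the evaluation template
'{"model":"<QWEN_MODEL>","input":{"messages":[{"role":"system","content":"<SYSTEM_PROMPT>"},{"role":"user","content":"<USER_PROMPT>"}]}}' as body_template, -- request body template for Bailian
cast('Summarize the following content in one sentence: {{query}}' as varchar) as eval_prompt -- evaluation template: the instruction plus the placeholder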
FROM log
),
t1_1 as (
select
__time__,
body_template,
eval_prompt,
replace(eval_prompt, targets, trans) as eval_content,
trans
FROM t1
),
t2 as (
select
__time__,
body_template,
eval_content,
trans
FROM t1_1
),
t3 as (
select
__time__,
trans as oldeval,
body_template,
replace(
replace(
replace(
replace(
replace(
replace(replace(eval_content, chr(92), '\\'), '"', '\"'),
chr(8),
'\b'
),
chr(12),
'\f'
),
chr(10),
'\n'
),
chr(13),
'\r'
),
chr(9),
'\t'
) as eval,
trans
FROM t2
),
t4 as (
select
__time__,
replace(
replace(
replace(body_template, '<QWEN_MODEL>', 'qwen-turbo'),
'<SYSTEM_PROMPT>',
'You are a helpful assistant.'
),
'<USER_PROMPT>',
eval
) as body,
oldeval,
trans
FROM t3
),
t5 as (
select
__time__,
http_call(
'https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation',
'POST',
'{ "Content-Type":"application/json"}',
'',
body,
60 * 1000
) as response,
body,
oldeval,
trans
FROM t4
),
t6 as (
select
__time__,
oldeval,
body,
response.code,
response.header,
response.body body_res,
response.error_msg as error,
trans
FROM t5
)
select
__time__,
trans as "Original text",
replace(
replace(
json_extract_scalar(body_res, '$.output.text'),
'```json',
' '
),
'```',
' '
) as "Summary",
error,
json_extract_scalar(body_res, '$.usage.total_tokens') as "Tokens consumed"
FROM t6
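The CTEs t1_1, t3, and t4 build each request body in three string steps. First, they substitute the text into the {{query}} placeholder. Next, they escape it for embedding in a JSON string. Finally, they fill the body template. The following Python sketch mirrors those steps, using the same replacement order, so you can check the escaping locally. escape_json_string and build_body are hypothetical names, and the English prompt is illustrative.

```python
import json

BODY_TEMPLATE = (
    '{"model":"<QWEN_MODEL>","input":{"messages":['
    '{"role":"system","content":"<SYSTEM_PROMPT>"},'
    '{"role":"user","content":"<USER_PROMPT>"}]}}'
)
EVAL_PROMPT = "Summarize the following content in one sentence: {{query}}"


def escape_json_string(s: str) -> str:
    """Same order as t3: backslash first, then double quote, then the
    backspace, form feed, newline, carriage return, and tab characters."""
    s = s.replace("\\", "\\\\").replace('"', '\\"')
    for ch, esc in (("\b", "\\b"), ("\f", "\\f"), ("\n", "\\n"),
                    ("\r", "\\r"), ("\t", "\\t")):
        s = s.replace(ch, esc)
    return s


def build_body(text: str, model: str = "qwen-turbo") -> str:
    eval_content = EVAL_PROMPT.replace("{{query}}", text)  # t1_1
    eval_escaped = escape_json_string(eval_content)  # t3
    return (
        BODY_TEMPLATE  # t4
        .replace("<QWEN_MODEL>", model)
        .replace("<SYSTEM_PROMPT>", "You are a helpful assistant.")
        .replace("<USER_PROMPT>", eval_escaped)
    )


raw = 'He said "hi"\n\tpath: C:\\tmp'
print(json.loads(build_body(raw))["input"]["messages"][1]["content"] == EVAL_PROMPT.replace("{{query}}", raw))  # True
```

The backslash must be escaped first. Otherwise, the backslashes added for quotes and control characters would themselves be doubled. The chain handles only these five control characters, so text containing other characters below U+0020 (for example U+0000) would still not be valid JSON.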
The same evaluation expressed in SPL, applied to the input of LLM spans:
.let t3=.logstore with(query='attributes.gen_ai.span.kind: LLM and attributes.input.value: * and *')
|extend trans = "attributes.input.value",
  targets = '{{query}}',
  body_template = '{"model":"<QWEN_MODEL>","input":{"messages":[{"role":"system","content":"<SYSTEM_PROMPT>"},{"role":"user","content":"<USER_PROMPT>"}]}}',
  eval_prompt = cast('Summarize the following content in one sentence: {{query}}' as varchar)
|project __time__ ,trans,targets,body_template,eval_prompt;
$t3 |extend eval_content=replace(eval_prompt, targets, trans)
| extend eval = replace(replace(replace(replace(replace(replace(replace(eval_content, chr(92), '\\'), '"', '\"'), chr(8), '\b'), chr(12), '\f'), chr(10), '\n'), chr(13), '\r'), chr(9), '\t')
| extend body = replace(
replace(
replace(body_template, '<QWEN_MODEL>', 'qwen-max'),
'<SYSTEM_PROMPT>',
'You are a helpful assistant.'
),
'<USER_PROMPT>',
eval
)
| extend response = http_call(
'https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation',
'POST',
'{ "Content-Type":"application/json"}',
'',
body,
60 * 1000
)
| extend code=response.code, header=response.header,body_res = response.body,error = response.error_msg
| extend evaluationResult = replace(replace(json_extract_scalar(body_res ,'$.output.text'),'```json',' '),'```',' ') ,evaluationTemplate='mcp_tool_poisoning_attack_cn'
| limit 10000
| where evaluationResult <> 'null'
| project __time__,evaluationResult, evaluationTemplate, error ,body_res
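The SQL and SPL versions post-process the response in the same way. Each reads $.output.text and replaces the ```json and ``` fences with spaces. The SPL version also drops rows without a result after limit 10000. Below is a Python sketch of that step. It assumes that a missing output.text is the case the where evaluationResult <> 'null' stage is meant to filter, and it applies the limit before the filter, in the order the stages are written. extract_result and postprocess are hypothetical names.

```python
import json
from typing import List, Optional


def strip_code_fences(text: str) -> str:
    """Mirror replace(replace(text, '```json', ' '), '```', ' ')."""
    return text.replace("```json", " ").replace("```", " ")


def extract_result(body_res) -> Optional[str]:
    """Return output.text with fences blanked out, or None if it is missing."""
    try:
        data = json.loads(body_res)
    except (TypeError, ValueError):
        return None  # no body, or the body is not JSON
    output = data.get("output") if isinstance(data, dict) else None
    text = output.get("text") if isinstance(output, dict) else None
    return strip_code_fences(text) if isinstance(text, str) else None


def postprocess(bodies: List[str], limit: int = 10000) -> List[str]:
    """Apply the limit first, then drop rows without a result."""
    results = []
    for body_res in bodies[:limit]:
        result = extract_result(body_res)
        if result is not None:
            results.append(result)
    return results
```

Because the limit comes first, the pipeline can return fewer than 10000 rows when some responses are errors.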