AI函数

更新时间:
复制为 MD 格式

AI函数支持在数据库内直接调用AI服务,例如基于AI的文本生成、条件判断、情感分析和分类等。本文介绍云原生数据仓库 AnalyticDB MySQL 版AI函数的用法与示例。

重要

由于AI模型本身具有随机性,即使输入相同,通过 AI 函数计算后也可能产生不同的输出结果。

  • 文本函数

    • ai_filter:对输入文本进行事实性或合规性判断。

    • ai_translate:把输入文本翻译成对应的语言。

    • ai_classify:对输入文本进行语义分析,并将其归类到预定义的类别或标签中。

    • ai_extract:从非结构化文本中抽取指定类型的结构化信息(如人名、日期、地址、关键词等)。

    • ai_sentiment:分析输入文本的情感倾向,返回正面、负面或中性等情感判断。

    • ai_similarity:计算两段文本在语义上的相似度。

    • ai_mask:对输入文本中的敏感信息(如身份证号、手机号、邮箱等)进行掩码处理。

    • ai_summarize:对长文本进行语义压缩,生成简洁、准确的摘要内容。

  • 音频函数

    • ai_audio_transcribe:将一段音频转录成对应的文本,可以按需求指定切分模式、目标语言等。

    • ai_audio_embed:将一段音频转换成其对应的嵌入向量,返回对应的嵌入向量数组。

前提条件

  • AnalyticDB for MySQL企业版、基础版及湖仓版,或数仓版弹性模式,已开启ENI访问。

    重要
    • 登录云原生数据仓库AnalyticDB MySQL控制台,在集群管理 > 集群信息网络信息区域,打开ENI网络开关。

    • 开启和关闭ENI网络会导致数据库连接中断大约2分钟,无法读写。请谨慎评估影响后再开启或关闭ENI网络。

  • 目前功能处于公测阶段,如需使用请提交工单联系技术支持开通。

准备工作

如需使用AI函数能力,需配置AnalyticDB for MySQL到阿里云百炼的网络链路,并完成模型创建。

通过公网连接

  1. 配置公网NAT网关。

    创建与AnalyticDB for MySQL集群同地域的公网NAT网关,并为其绑定弹性公网IP(EIP),然后创建SNAT条目(推荐按交换机粒度创建SNAT条目,指定任意交换机即可)。具体操作,请参见公网 NAT 网关

  2. 创建模型。模型创建的语法说明,请参见CREATE MODEL

    示例:

    CREATE MODEL qwen_plus_external
    OPTIONS (
        type='external',
        provider='bailian',
        name='qwen-plus',
        interface='TEXT_TO_TEXT',
        api_key='sk-xxx' --需替换成阿里云百炼侧开通的模型服务的api_key。
    )

通过PrivateLink私网连接

  1. 配置终端节点并获取私网域名。具体操作,请参见私网访问阿里云百炼模型或应用 API

    重要

    阿里云百炼服务所在地域:华北2(北京)、新加坡

    终端节点须与阿里云百炼服务位于同一地域,如需从其他地域的VPC内进行私网访问,请参考跨地域私网访问阿里云百炼 API

  2. 创建模型。模型创建的语法说明,请参见CREATE MODEL

    示例:

    CREATE MODEL qwen_plus_external
    OPTIONS (
        type='external',
        provider='bailian',
        name='qwen-plus',
        interface='TEXT_TO_TEXT',
        api_key='sk-xxx', --需替换成阿里云百炼侧开通的模型服务的api_key
        endpoint='ep-xxx.com' --需替换成私网连接终端节点的域名
    )

文本函数

ai_filter

ai_filter(text)
ai_filter(model_name, text)
  • 命令说明:对输入文本进行事实性或合规性判断,若内容可信、合理或符合预设规则,则返回真;否则返回假。

  • 输入值类型:

    • model_name:模型名,VARCHAR类型。

    • text:输入文本,VARCHAR类型。

  • 返回值类型:BOOLEAN。

示例1

SELECT ai_filter("浙江在中国么?")

返回结果

+---------------------------------+
| ai_filter("浙江在中国么?")       |
+---------------------------------+
| 1                               | 

示例2

SELECT ai_filter("qwen_plus_external", "浙江在中国么?")

返回结果

+----------------------------------------------+
| ai_filter("qwen_plus_external", "浙江在中国么?")       |
+----------------------------------------------+
| 1                                            | 

ai_translate

ai_translate(text, targetLang)
ai_translate(model_name, text, targetLang)
  • 命令说明:把输入文本翻译成对应的语言。

  • 输入值类型:

    • model_name:模型名,VARCHAR类型。

    • text:输入文本,VARCHAR类型。

    • targetLang:需要翻译的目标语言,VARCHAR类型。

      重要

      目标语言需要模型支持该语言。

  • 返回值类型:VARCHAR。

示例

SELECT ai_translate("AnalyticDB for MySQL is a data analytics platform based on a lakehouse architecture.", "cn") as translate_text

返回结果

+-------------------------------------------------------+
| translate_text                                        |
+-------------------------------------------------------+
| AnalyticDB for MySQL 是一个基于湖仓一体架构的数据分析平台。 | 

ai_classify

ai_classify(text, labels)
ai_classify(model_name, text, labels)
  • 命令说明:对输入文本进行语义分析,并将其归类到预定义的类别或标签中。

  • 输入值类型:

    • model_name:模型名,VARCHAR类型。

    • text:输入文本,VARCHAR类型。

    • labels:预先定义的类别标签列表,支持VARCHAR类型,或Array<VARCHAR>类型。

  • 返回值类型:VARCHAR。

示例1

SELECT ai_classify('昨晚在厨房尝试了新买的意大利面食谱,酱汁浓郁,家人赞不绝口!', "['travel', 'cooking', 'reading', 'driving']")
AS predicted_label;

返回结果

+----------------------------------------------+
| predicted_label                              |
+----------------------------------------------+
| cooking                                      | 

示例2

SELECT ai_classify('昨晚在厨房尝试了新买的意大利面食谱,酱汁浓郁,家人赞不绝口!', ARRAY['travel', 'cooking', 'reading', 'driving'])
AS predicted_label;

返回结果

+----------------------------------------------+
| predicted_label                              |
+----------------------------------------------+
| cooking                                      | 

ai_extract

ai_extract(text, labels)
ai_extract(model_name, text, labels)
  • 命令说明:从非结构化文本中抽取指定类型的结构化信息(如人名、日期、地址、关键词等)。

  • 输入值类型:

    • model_name:模型名,VARCHAR类型。

    • text:输入文本,VARCHAR类型。

    • labels:预先定义的类别标签列表,支持VARCHAR类型,或Array<VARCHAR>类型。

  • 返回值类型:VARCHAR。

示例

select ai_extract('我昨天(2024年6月15日)在淘宝买了iPhone 15 Pro,它的钛金属机身和A17芯片真的很惊艳!', "['product_name', 'date', 'key_feature']") as result

返回结果

+----------------------------------------------+
| result                                       |
+----------------------------------------------+
| product_name=iPhone 15 Pro, date=2024年6月15日, key_feature=钛金属机身和A17芯片                                         | 

ai_generate

ai_generate(text)
ai_generate(model_name, text)
  • 命令说明:从非结构化文本中抽取指定类型的结构化信息(如人名、日期、地址、关键词等)。

  • 输入值类型:

    • model_name:模型名,VARCHAR类型。

    • text:输入文本,VARCHAR类型。

  • 返回值类型:VARCHAR。

示例

select ai_generate('一句话介绍一下TPC-H测试集') as result

返回结果

+----------------------------------------------+
| result                                       |
+----------------------------------------------+
| TPC-H 是用于评估数据库决策支持能力的标准基准测试集。 | 

ai_sentiment

ai_sentiment(text)
ai_sentiment(model_name, text)
  • 命令说明:分析输入文本的情感倾向,返回正面、负面或中性等情感判断。返回值固定为positivenegativeneutralmixed四个标签其中一个。

  • 输入值类型:

    • model_name:模型名,VARCHAR类型。

    • text:输入文本,VARCHAR类型。

  • 返回值类型:VARCHAR。

示例

SELECT ai_sentiment('这款耳机音质出色,佩戴舒适,强烈推荐!') AS sentiment;

返回结果

+----------------------------------------------+
| sentiment                                    |
+----------------------------------------------+
| positive                                     | 

ai_similarity

ai_similarity(text1, text2)
ai_similarity(model_name, text1, text2)
  • 命令说明:计算两段文本在语义上的相似度。返回010之间的相似度分数,分数越大表示相似程度越高。

  • 输入值类型:

    • model_name:模型名,VARCHAR类型。

    • text1text2:输入文本,VARCHAR类型。

  • 返回值类型:VARCHAR。

示例

SELECT ai_similarity(
  '如何重置我的账户密码?',
  '我忘记了登录密码,该怎么找回?'
) AS result;

返回结果

+----------------------------------------------+
| result                                       |
+----------------------------------------------+
| 9.3                                          | 

ai_mask

ai_mask(text, labels)
ai_mask(model_name, text, labels)
  • 命令说明:对输入文本中的敏感信息(如身份证号、手机号、邮箱等)进行掩码处理。

  • 输入值类型:

    • model_name:模型名,VARCHAR类型。

    • text:输入文本,VARCHAR类型。

    • labels:预先定义的类别标签列表,支持VARCHAR类型,或Array<VARCHAR>类型。

  • 返回值类型:VARCHAR。

示例

SELECT ai_mask(
  '联系我:13812345678,邮箱是user@example.com,身份证号110101199003072316',
  "['phone', 'email', 'id_card']"
) AS result;

返回结果

+----------------------------------------------+
| result                                       |
+----------------------------------------------+
| 联系我:[MSKED],邮箱是[MSKED],身份证号[MSKED]   | 

ai_summarize

ai_summarize(text)
ai_summarize(model_name, text)
  • 命令说明:对长文本进行语义压缩,生成简洁、准确的摘要内容。

  • 输入值类型:

    • model_name:模型名,VARCHAR类型。

    • text:输入文本,VARCHAR类型。

  • 返回值类型:VARCHAR。

示例

SELECT ai_summarize('曲曲折折的荷塘上面,弥望的是田田的叶子。叶子出水很高,像亭亭的舞女的裙。层层的叶子中间,零星地点缀着些白花,有袅娜地开着的,有羞涩地打着朵儿的;正如一粒粒的明珠,又如碧天里的星星,又如刚出浴的美人。微风过处,送来缕缕清香,仿佛远处高楼上渺茫的歌声似的。这时候叶子与花也有一丝的颤动,像闪电般,霎时传过荷塘的那边去了。叶子本是肩并肩密密地挨着,这便宛然有了一道凝碧的波痕。叶子底下是脉脉的流水,遮住了,不能见一些颜色;而叶子却更见风致了。月光如流水一般,静静地泻在这一片叶子和花上。薄薄的青雾浮起在荷塘里。叶子和花仿佛在牛乳中洗过一样;又像笼着轻纱的梦。虽然是满月,天上却有一层淡淡的云,所以不能朗照;但我以为这恰是到了好处——酣眠固不可少,小睡也别有风味的。月光是隔了树照过来的,高处丛生的灌木,落下参差的斑驳的黑影,峭楞楞如鬼一般;弯弯的杨柳的稀疏的倩影,却又像是画在荷叶上。塘中的月色并不均匀;但光与影有着和谐的旋律,如梵婀玲上奏着的名曲。') AS result;

返回结果

+---------------------------------------------------------+
| result                                                  |
+---------------------------------------------------------+
| 月光下的荷塘静谧优美:田田荷叶如舞女裙,白花点缀其间,似明珠、星星或出浴美人;微风送香,叶花轻颤,形成碧波;流水隐于叶下,更显风致。月光如水,青雾轻笼,叶花如洗,朦胧如梦。云遮满月,光影斑驳,灌木黑影峭楞,杨柳倩影如画,光与影和谐如名曲。| 

音频函数

ai_audio_transcribe

仅支持使用内置模型,模型服务创建请参见基于AnalyticDB Ray创建推理服务

ai_audio_transcribe(url)
ai_audio_transcribe(model_name, url)
ai_audio_transcribe(model_name, url, options)
  • 命令说明:将一段音频转录成对应的文本,可以按需求指定切分模式、目标语言等。返回转录后的文本或对应文件url。

  • 输入值类型:

    • model_name:模型名,VARCHAR类型。

    • url:输入的音频文件地址,VARCHAR类型。

    • options:可选参数,是JSON格式的文本。支持的参数如下。

      参数名

      含义

      可选配置项

      默认值

      language

      转录文本的目标语言

      • cn(中文)

      • en(英语)

      • ja(日语)

      • yue(粤语)

      • fspk(中英文自由说)

      cn

      diarization_mode

      文本划分段落的模式

      • word(按单词维度)

      • sentence(按句子维度)

      • speaker(按说话者维度)

      word(按单词维度切分)

      output_type

      输出格式

      • json(直接返回转录后的json文本)

      • url(保存到oss上并返回文本文件的url地址)

      json

  • 返回值类型:VARCHAR。

示例1

SELECT ai_audio_transcribe("https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav") as result

返回结果

+----------------------------------------------+
| result                                       |
+----------------------------------------------+
| {"TaskId": "xxxx", "Transcription": {"AudioInfo": {"Size": 128480, "Duration": 3834, "SampleRate": 16000, "Language": "cn"}, "Paragraphs": [{"ParagraphId": "1768447804076500000", "SpeakerId": "1", "Words": [{"Id": 10, "SentenceId": 1, "Start": 100, "End": 595, "Text": "Hello, "}, {"Id": 20, "SentenceId": 1, "Start": 596, "End": 841, "Text": "world, "}, {"Id": 30, "SentenceId": 1, "Start": 844, "End": 1588, "Text": "这里是"}, {"Id": 40, "SentenceId": 1, "Start": 1588, "End": 2580, "Text": "阿里巴巴"}, {"Id": 50, "SentenceId": 1, "Start": 2580, "End": 3076, "Text": "语音"}, {"Id": 60, "SentenceId": 1, "Start": 3076, "End": 3820, "Text": "实验室。"}]}], "AudioSegments": [[100, 3820]]}}                                | 

示例2

SELECT ai_audio_transcribe("tingwu", "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav", "{'diarization_mode':'sentence'}") as result

返回结果

+----------------------------------------------+
| result                                       |
+----------------------------------------------+
| {"TaskId": "xxxx", "Transcription": {"AudioInfo": {"Size": 128480, "Duration": 3834, "SampleRate": 16000, "Language": "cn"}, "Speakers": [{"SpeakerId": "1", "Sentences": [{"SentenceId": 1, "Start": 100, "End": 3820, "Text": "Hello, world, 这里是阿里巴巴语音实验室。"}]}]}}| 

ai_audio_embed

音频向量嵌入,仅支持使用内置模型,模型服务创建请参见基于AnalyticDB Ray创建推理服务

ai_audio_embed(text)
ai_audio_embed(model_name, text)
ai_audio_embed(model_name, text, options)
  • 命令说明:将一段音频转换成其对应的嵌入向量,返回对应的嵌入向量数组。可以通过参数控制是否需要对音频按起止时间切分后再生成嵌入向量。

  • 输入值类型:

    • model_name:模型名,VARCHAR类型。

    • text:输入文本,VARCHAR类型。

    • options:可选参数,是JSON格式的文本。支持的参数如下。

      参数名

      含义

      说明

      access_id

      ossAK

      在使用role_arn的情况下,不再需要指定access_id、access_secrettoken信息。

      access_secret

      ossSK

      token

      osstoken凭证

      role_arn

      角色扮演凭证

      start_time

      音频的开始时间

      不填默认为从头开始。

      end_time

      音频的结束时间

      不填默认为到尾结束。

  • 返回值类型:Array<FLOAT>。

示例

SELECT ai_audio_embed("https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav")

返回结果

[0.4234241,0.3575562,0.022631245,-0.09829298,-0.40605977,-0.72465694,-1.8039973,2.4773192,0.91702485,0.15146817,0.7209677,0.78260005,0.8019302,2.7577147,-0.86063147,-0.8249354,0.28732437,0.9009849,0.8899265,-1.5320694,-2.0866642,0.5051579,1.3553369,2.0348034,-0.74032015,-0.66065377,0.2756823,0.15714298,-2.8250875,0.8162984,0.20878711,0.8285897,1.4863579,1.0645012,-1.2933203,0.6919463,0.87850314,1.2212433,-0.44420764,-1.0146322,-1.8209833,0.16996253,-1.534067,-0.10253409,0.07611574,2.4005992,-1.2903963,0.6613166,-0.2554945,0.97570425,-0.108424075,0.82061934,-0.18102498,-0.27137256,-1.2723781,-0.9188018,-0.027503261,-0.15346082,-0.9564006,-0.4618776,-1.077228,-0.27756396,0.28749034,0.2570488,-0.39753774,0.5270934,0.6970718,-1.2984067,2.4312186,1.485573,1.6031872,-0.6158852,-1.1512791,0.49305362,-1.1939837,0.72933173,0.2659372,2.5559616,-0.41886744,0.6304756,-0.6171451,-0.5282561,0.78426796,-0.57570314,-0.91166306,-0.531768,0.96598166,0.17423655,-0.9133532,-0.09258405,-0.26951456,-0.34667063,-0.21612991,-0.054430377,0.052798696,0.5198285,1.9431844,-0.5638291,-2.1043544,-0.47198293,0.21959576,0.5975169,0.3951007,0.28112686,1.792782,-0.15896009,-0.71152246,1.1540072,0.37563428,0.0016163457,0.6544563,0.043974634,-1.6375796,-0.3698572,1.5310435,-0.20282944,0.73641545,2.1867669,1.4115202,-0.8114418,-0.36265984,0.2191038,-0.8933491,0.6616433,1.545423,0.10970531,-0.109743424,0.44605017,-0.087265715,0.29077715,-2.4435875,-1.1484046,0.078253455,0.02283861,-0.29482847,2.0237377,-0.6009212,-1.2377249,0.05194488,-0.16573223,0.8370868,-0.43602425,-0.4706988,0.17996503,0.88799554,0.6562157,-0.7520023,0.64789253,0.36865348,-1.4820247,0.5299587,1.1884397,1.4351603,0.21073115,-0.1712549,2.9410155,1.7485802,-0.6828801,-0.4833641,-0.4477328,2.3307724,-0.35595444,-0.61682695,-0.5370858,-0.8068234,1.2195143,-1.0834758,0.45275012,1.6243625,-1.3629726,-0.2959109,0.05071621,-0.7280639,-0.1713935,-0.43650395,-0.2131698,0.25380868,-0.16288652,2.0921175,0.3555297,-0.10752705,0.5716714,-0.09943808,-1.9066161,0.12683412,0.8566219,-0.20323975,1.4260024,-0.28789783,0.32412016,0.88897985,0.86555415,0.5940666,0.6645551,0.046334613,-1.5640875,-0.3510718,0.55363727,-0.13503519,0.67008746,-0.9184686,0.025278697,1.0128921,-0.61241907,-0.01107134,-0.8309523,-0.51025873,1.1141272,0.28813183,-0.10634196,-0.9954202,-0.07171043,-0.5856978,-1.2660285,2.1327019,-0.60268104,-0.6227884,-0.42101067,0.059599742,-0.09640202,-1.258711,0.5054924,-1.1847625,-0.044398762,-0.98595804,0.8883682,0.6221085,-2.5484293,0.7249505,0.69930685,0.7739025,-0.8139478,-1.1988907,-1.0416493,2.0153732,-1.7091763,-0.5611238,-1.2147603,0.9113469,1.5113174,0.23810485,-1.702736,-0.3295935,-0.41867778,-1.0378691,-0.45600057,-0.43525052,-0.1078409,-1.2993969,0.12842774,0.026097976,-0.7705405,-1.0907317,0.28274077,1.236289,1.6190177,2.1874366,0.16072829,-0.33150536,2.218483,-1.1703843,0.10300327,1.1994884,0.48275462,-0.40795812,0.5020531,1.1787555,-0.082187966,0.6315653,0.36654752,-0.24940589,-0.8652801,1.6283739,0.41405886,0.6377814,0.08396838,-1.0169003,1.2100558,1.4457762,-0.07999261,-0.012102162,0.85055244,-0.09711141,-1.0452846,0.13768612,-2.0506873,-1.6474499,0.043265514,-1.0009454,-0.0111249415,-1.2523409,-0.080719866,-0.6187693,-1.398226,0.6425289,-0.4808641,-0.06030046,-0.10275636,-0.31625932,-1.5993032,-0.20966552,-1.4618409,0.34925935,-0.5034448,0.100028045,0.25327235,-1.078896,-0.23394233,1.2247928,-2.6050038,-0.71609926,-0.77765155,-1.2089496,0.8526703,-0.1358416,1.1074059,1.1545771,-0.94525933,0.41012967,0.9361201,-0.14788401,-0.29333082,1.5782444,1.1100405,-0.4074414,-0.3862537,-0.5779069,-0.88644946,0.2233385,1.3612705,1.2413827,-1.3625424,-1.3623037,0.3056319,-1.4446377,0.64613384,0.15064861,-0.61473364,-1.3611295,-0.1975697,-1.0701923,0.7591377,-1.2106745,-2.067824,0.45041704,-0.71582735,-1.743847,1.169414,2.0158787,0.4734838,-0.3133036,-1.9916989,1.1441987,0.9155275,-1.3003027,0.82898057,-0.7439868,1.2072865,0.46877453,0.6648313,0.80477613,1.6927507,0.5842916,0.36608973,2.259473,-1.1628797,-0.3311869,0.36989415,-0.25035658,-0.28012496,1.092324,-0.40238732,0.0046352614,-2.0625768,1.161326,0.92277956,-0.20316431,-0.15164377,1.1715052,-0.7067665,-0.8608931,0.079684004,-0.89916384,0.02488108,-0.57668805,-0.879138,-1.7274998,0.824049,1.5638052,0.28415555,-0.17347385,-2.085917,0.4987632,-0.031601395,-1.0190377,-0.7815816,0.69643223,-0.33574122,2.1242745,0.19785832,0.55690974,-1.3932099,-0.44952002,-0.7719798,1.1326215,0.43839702,0.996999,-0.55692834,1.4014084,-2.3939395,-1.0112343,-1.4143248,1.1514114,1.1233637,-0.3385779,-0.23665123,-0.100857966,0.4633971,1.6215613,-0.0692888,-0.031505972,1.0472811,0.57112235,0.6015763,0.07582237,0.52702487,-1.3809607,-1.7482765,0.38386008,0.99316126,0.95603,0.40644804,-1.5072054,-0.34419048,0.63205683,-1.0854999,-0.92245156,-0.2712947,-0.75696105,0.996232,-0.10738732,-0.4674776,-1.2149413,-1.5094053,1.6796608,0.21961057,-0.35295358,-1.2609407,-0.040009048,-0.38785484,0.7788784,-0.65823495,2.0559616,-1.0074826,1.2282485,1.2540467,0.4914942,-0.47057188,-0.47061247,-0.16255763,-0.6718562,-0.53630847,0.4804698,-0.3134068,0.6407026,-0.727981,0.0481851,0.06927338,0.8321921,-0.6639807,0.74932885,0.23291564,0.76362675,-1.2966217,0.8806557,-1.2141875,0.6996881,-0.8293652,0.9085288,1.8878758,0.11363957,0.148718,0.5030497,-1.0422761,0.08673843,-0.80342984,-0.8046266,-0.18026677,0.28900644,0.76534355,-1.3163859,-0.72775376,0.36529016,-0.3660175,0.31056792,0.052575216,-1.11831,0.7895246,1.1172394,-0.31374845,0.17143561,-0.42633826,0.16579832,-0.012790448,-1.4290546,-0.47322562,0.7557427,1.0487514,-0.14331971,0.30455515,0.6938429,0.5565799]

配置AI函数默认模型

如果您在调用AI函数时未指定模型名,系统将使用默认模型,您可以通过以下配置项修改默认模型。配置修改操作,请参见ConfigHint配置参数

模型类型

默认模型

配置项

TEXT_TO_TEXT(文本模型)

qwen-plus

AI_FUNCTION_TEXT_TO_TEXT_MODEL

TEXT_TO_EMBEDDING(音频嵌入向量模型)

text-embedding-v2

AI_FUNCTION_TEXT_TO_EMBEDDING_MODEL

AUDIO_TO_TEXT(音频转文本模型)

tingwu

AI_FUNCTION_AUDIO_TO_TEXT_MODEL

AUDIO_TO_EMBEDDING(音频嵌入向量模型)

qwen2.5-vl-embedding

AI_FUNCTION_AUDIO_TO_EMBEDDING_MODEL

典型场景示例

场景一:文本召回

适用场景:在诸如RAG、客服系统等场景下,根据用户提问的语义,从文本库中搜索某段文本,选取最匹配的若干条结果。

通过ai_embedd编码将待搜索文本和文本库的文本转换为向量,然后排序计算出余弦相似度最高的十条文本。

SELECT cosine_similarity(ai_embedd("待搜索的语义文本"), ai_embedd(text)) AS cos
FROM text_table
ORDER BY cos
LIMIT 10

进一步优化向量搜索性能,请参见向量检索

场景二:声纹检索

image.png

适用场景:在会议、日常对话等场景下,客户希望能把录音转录成文本,并按照说话人区分对话信息。

若需进一步分析说话人信息,可以按说话人切分音频后,和语音数据集做embedding比对,整体方案请参见声纹检索

  • 转录与切分:使用AI_AUDIO_TRANSCRIBE函数将音频转化成文本,并按照说话人维度切分。

    SELECT AI_AUDIO_TRANSCRIBE("oss://xxx", "{'diarization_mode':'speaker'}")
  • 声纹检索:使用AI_AUDIO_EMBED函数对输入音频和原始的音频库分别求出嵌入向量,计算l2_distance后,选取最接近的10个。

    SELECT l2_distance(input_embedding, origin_embedding) AS ld
    FROM(
      SELECT ai_audio_embed("oss://xxx") AS input_embedding
      JOIN
      SELECT ai_audio_embed(url) AS origin_embedding
      FROM audio_table)
    ORDER BY ld
    LIMIT 10

场景三:提取顾客的不满意评论

适用场景:从用户评论中,筛选出不满意的反馈。

customer_comment表中已经存储了每天顾客的评论,通过情感分析函数ai_sentiment来判断顾客的情感倾向,筛选出202611日所有不满意的顾客评论。

-- 已提前创建好qwen_plus_external模型,对应百炼提供的qwen-plus模型。
SELECT name, comment
FROM customer_comment
WHERE ai_sentiment("qwen_plus_external", comment)='negative'
AND dt='20260101'