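With the Push method, you first build a collection of records in the required format, then call Push once to send the whole collection to the application as a single batch.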
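The collect-then-push flow described above can be sketched as a small buffer. This is only an illustration and not part of the SDK: `PushBuffer` and the `push_fn` callable are hypothetical names, and in practice `push_fn` would wrap whatever call actually sends the batch to the application.

```python
from typing import Any, Callable, Dict, List


class PushBuffer:
    """Collects push records and sends them in one batch (hypothetical helper)."""

    def __init__(self, push_fn: Callable[[List[Dict[str, Any]]], Any]):
        # push_fn is an assumption: any callable that accepts the full batch.
        self._push_fn = push_fn
        self._records: List[Dict[str, Any]] = []

    def add(self, cmd: str, fields: Dict[str, Any]) -> None:
        # Each record pairs a command with the document fields.
        self._records.append({"cmd": cmd, "fields": fields})

    def push(self) -> Any:
        # Send a copy of everything collected so far in a single call.
        result = self._push_fn(list(self._records))
        # Clear only after the call returns, so a failed push keeps the records.
        self._records.clear()
        return result

    def __len__(self) -> int:
        return len(self._records)
```

Clearing the buffer only after `push_fn` returns means a batch that fails with an exception can be retried without rebuilding it.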
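Dependencies
To upload files with the SDK, add the following dependencies.
Java (Maven):
<dependency>
    <groupId>com.aliyun.opensearch</groupId>
    <artifactId>aliyun-sdk-opensearch</artifactId>
    <version>4.0.0</version>
</dependency>
Python:
pip install alibabacloud_tea_util
pip install alibabacloud_opensearch_util
pip install alibabacloud_credentials
The Python sample also imports Config and Client from a BaseRequest module; for that module, see the Python client example.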
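Configure environment variables
Set the environment variables ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET.
The AccessKey of an Alibaba Cloud account has access to all APIs. We recommend using a RAM user for API access and routine operations; for details, see "Create a RAM user".
To create an AccessKey ID and AccessKey Secret, see "Create an AccessKey".
If you use the AccessKey of a RAM user, make sure the Alibaba Cloud account has authorized the AliyunServiceRoleForOpenSearch service-linked role; see "OpenSearch Industry Algorithm Edition service-linked role" and, for related details, "Access authentication rules".
Do not store the AccessKey ID and AccessKey Secret in project code. Doing so can leak the AccessKey and compromise the security of every resource under your account.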
Linux and macOS:
Run the following commands, replacing <access_key_id> with your RAM user's AccessKey ID and <access_key_secret> with your RAM user's AccessKey Secret.
export ALIBABA_CLOUD_ACCESS_KEY_ID=<access_key_id>
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=<access_key_secret>
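Windows:
Create an environment variable file, add the environment variables ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET, and set them to your AccessKey ID and AccessKey Secret.
Restart Windows for the change to take effect.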
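Because the samples read both values from the environment, it can help to fail early with a clear message when either variable is missing. The following is a minimal sketch; `load_credentials` is a hypothetical helper name, not something provided by the SDK.

```python
import os
from typing import Mapping, Tuple

REQUIRED_VARS = (
    "ALIBABA_CLOUD_ACCESS_KEY_ID",
    "ALIBABA_CLOUD_ACCESS_KEY_SECRET",
)


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (access_key_id, access_key_secret) or raise if either is unset."""
    # Treat unset and empty variables the same way.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[REQUIRED_VARS[0]], env[REQUIRED_VARS[1]]
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.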
Push demo sample code
Java:
package com.leiyu.push;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.aliyun.opensearch.OpenSearchClient;
import com.aliyun.opensearch.sdk.generated.OpenSearch;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchClientException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchResult;

public class PushNonStructuralLLM {
    private static String appName = "replace with your application name";
    private static String host = "replace with the application's API endpoint";
    private static String path = "/apps/%s/actions/knowledge-bulk";

    public static void main(String[] args) throws IOException {
        // User credentials.
        // Read the AccessKey ID and AccessKey Secret from environment variables;
        // the variables must be configured before running this sample.
        String accesskey = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
        String secret = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");
        String appPath = String.format(path, appName);

        // Create and configure the OpenSearch object.
        OpenSearch openSearch = new OpenSearch(accesskey, secret, host);
        // Create the OpenSearchClient, passing the OpenSearch object to its constructor.
        OpenSearchClient openSearchClient = new OpenSearchClient(openSearch);

        // Build a single document. The local variable is named filePath so it
        // does not shadow the static request path above.
        Path filePath = Paths.get("C:/Users/LEIYU/Desktop/Word/test.docx");
        JSONObject oneRequest = new JSONObject();
        // For unstructured documents (pdf, word, html), cmd is BASE64.
        oneRequest.put("cmd", "BASE64");

        JSONObject fields = new JSONObject();
        // Primary key ID; must be unique.
        fields.put("id", "50");
        // File name including its extension.
        fields.put("title", "test.docx");
        // Document link.
        fields.put("url", "www.baidu.com");
        // File content, Base64-encoded.
        fields.put("content", Base64.getEncoder().encodeToString(Files.readAllBytes(filePath)));
        fields.put("category", "docs");
        oneRequest.put("fields", fields);

        // Several records can be added to the same request.
        final JSONArray request = new JSONArray();
        request.add(oneRequest);
        //request.add(twoRequest);

        Map<String, String> params = new HashMap<String, String>() {{
            put("format", "full_json");
            put("_POST_BODY", request.toString());
        }};

        try {
            OpenSearchResult openSearchResult = openSearchClient.callAndDecodeResult(appPath, params, "POST");
            // Print the result.
            System.out.println(openSearchResult.getResult());
        } catch (OpenSearchException e) {
            e.printStackTrace();
        } catch (OpenSearchClientException e) {
            e.printStackTrace();
        }
    }
}
Python:
# -*- coding: utf-8 -*-
import time, os
import base64

from Tea.exceptions import TeaException
from Tea.request import TeaRequest
from alibabacloud_tea_util import models as util_models
from BaseRequest import Config, Client


class knowledge:
    def __init__(self, config: Config):
        self.Clients = Client(config=config)
        self.runtime = util_models.RuntimeOptions(
            connect_timeout=10000,
            read_timeout=10000,
            autoretry=False,
            ignore_ssl=False,
            max_idle_conns=50,
            max_attempts=3
        )
        self.header = {}

    def docBulk(self, app_name: str, doc_content: list):
        try:
            response = self.Clients._request(method="POST",
                                             pathname=f'/v3/openapi/apps/{app_name}/actions/knowledge-bulk',
                                             query={}, headers=self.header,
                                             body=doc_content, runtime=self.runtime)
            return response
        except Exception as e:
            # On failure the error is printed and None is returned.
            print(e)


if __name__ == "__main__":
    # Unified request endpoint. Note: omit the http:// prefix from the host.
    endpoint = "<endpoint>"
    # Protocol: HTTPS or HTTP.
    endpoint_protocol = "HTTP"
    # User credentials.
    # Read the AccessKey ID and AccessKey Secret from environment variables;
    # configure them first as described in "Configure environment variables" above.
    access_key_id = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
    access_key_secret = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")
    # Authentication type: sts or access_key (default: access_key).
    # Use sts for RAM-STS authentication.
    auth_type = "access_key"
    # For RAM-STS authentication, set security_token. The STS credentials
    # can be obtained through Alibaba Cloud AssumeRole.
    security_token = "<security_token>"
    # Common request configuration.
    # Note: omit the security_token and type parameters unless you are using a RAM user (sub-account).
    Configs = Config(endpoint=endpoint, access_key_id=access_key_id, access_key_secret=access_key_secret,
                     security_token=security_token, type=auth_type, protocol=endpoint_protocol)
    # Create the client for the OpenSearch LLM-Based Conversational Search Edition instance.
    # Replace <应用名称> with the name of the instance you created.
    ops = knowledge(Configs)
    app_name = "<应用名称>"

    # --------------- Push an unstructured document ---------------
    # Only the local file path needs to be changed.
    with open('/Users/liu/Downloads/test.docx', 'rb') as file:
        data = file.read()
    # b64encode returns bytes; decode to str so the value can be placed in a JSON body.
    data_b64 = base64.b64encode(data).decode("utf-8")
    document = [
        {
            "fields": {
                "id": "1",
                "title": "test.docx",
                "url": "www.baidu.com",
                "content": data_b64,
                "category": "opensearch",
                "timestamp": 1691722088645,
                "score": 0.8821945219723084
            },
            "cmd": "BASE64"
        }
    ]
    # Record for deleting a document (defined as an example; not sent below).
    deletedocument = {"cmd": "DELETE", "fields": {"id": 2}}
    documents = document
    res5 = ops.docBulk(app_name=app_name, doc_content=documents)
    print(res5)
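One detail worth checking before sending a batch from Python: `base64.b64encode` returns `bytes`, which the standard `json` module cannot serialize. The sketch below converts any `bytes` field values to `str` without modifying the input. `to_json_safe` is a hypothetical helper, and whether the BaseRequest client serializes the body with `json` is an assumption.

```python
import base64
import json
from typing import Any, Dict, List


def to_json_safe(documents: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a copy of the batch with bytes field values decoded to str."""
    safe = []
    for doc in documents:
        fields = {
            # Base64 output is pure ASCII, so an ASCII decode is lossless.
            key: value.decode("ascii") if isinstance(value, bytes) else value
            for key, value in doc.get("fields", {}).items()
        }
        safe.append({**doc, "fields": fields})
    return safe


batch = [{"cmd": "BASE64", "fields": {"id": "1", "content": base64.b64encode(b"hello")}}]
# json.dumps would raise TypeError on the raw batch; the converted copy serializes.
body = json.dumps(to_json_safe(batch))
```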
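cmd must be "BASE64".
Put the unstructured content to be pushed in the content field, Base64-encoded as in the sample code above.
Put the file name, including its extension, in the title field.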
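The three rules above can be folded into one helper that turns a local file into a push record. This is a sketch under stated assumptions: `build_base64_record` is a hypothetical name, and treating `url` and `category` as optional is an assumption rather than something the rules above specify.

```python
import base64
import os
from typing import Any, Dict


def build_base64_record(doc_id: str, file_path: str,
                        url: str = "", category: str = "") -> Dict[str, Any]:
    """Build one push record for an unstructured file (hypothetical helper)."""
    with open(file_path, "rb") as f:
        # content carries the file bytes, Base64-encoded and decoded to str.
        content = base64.b64encode(f.read()).decode("ascii")
    fields: Dict[str, Any] = {
        "id": doc_id,
        # title is the file name including its extension.
        "title": os.path.basename(file_path),
        "content": content,
    }
    # Assumption: url and category are only included when provided.
    if url:
        fields["url"] = url
    if category:
        fields["category"] = category
    return {"cmd": "BASE64", "fields": fields}
```

A list of such records can then be passed as the batch body, for example `[build_base64_record("1", "/path/to/test.docx")]`.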