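This article uses the Java SDK as an example to show how to upload data through the SDK and quickly set up enterprise knowledge base Q&A. An ADD (add data) operation directly overwrites the data of any existing document with the same id, which is how an UPDATE is performed. For bulk additions, deletions, and modifications, as well as scheduled data synchronization and updates, we therefore recommend using the API/SDK.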
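To make the overwrite behavior concrete, the following sketch simulates it locally with a map keyed by document id. It only illustrates the semantics described above and does not call the service; the class and method names (`UpsertSketch`, `add`, `get`, `size`) are hypothetical and not part of the OpenSearch SDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Local illustration only: models "ADD overwrites a document with the same id".
// UpsertSketch and its methods are hypothetical names, not part of the OpenSearch SDK.
final class UpsertSketch {
    private final Map<String, String> documentsById = new LinkedHashMap<>();

    // Stores the content under the id; an existing entry with the same id is replaced.
    void add(String id, String content) {
        documentsById.put(id, content);
    }

    String get(String id) {
        return documentsById.get(id);
    }

    int size() {
        return documentsById.size();
    }

    public static void main(String[] args) {
        UpsertSketch store = new UpsertSketch();
        store.add("1", "first version");
        store.add("1", "second version"); // same id: behaves like an update
        System.out.println(store.size() + " document(s), id 1 = " + store.get("1"));
    }
}
```

Because the id is the only key, re-sending a document with an unchanged id is enough to refresh its content during a scheduled sync.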
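Prerequisites
Make sure you have obtained the AccessKey ID and AccessKey Secret of a RAM user. They serve as the credentials for calling the SDK.
Note: The AccessKey Secret is displayed only when it is created and cannot be viewed afterward.
Make sure the environment variables ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET are set in the environment where the code runs. For configuration details, see: Configure environment variables in Linux, macOS, and Windows.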
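Since every sample in this article reads its credentials from these two variables, a quick check before running can save a confusing authentication error. The sketch below is one possible way to do that; `CredentialCheck` and `missingVariables` are hypothetical helper names, not SDK classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: report which credential environment variables are missing.
// CredentialCheck and missingVariables are hypothetical helper names.
final class CredentialCheck {
    static final String[] REQUIRED = {
        "ALIBABA_CLOUD_ACCESS_KEY_ID",
        "ALIBABA_CLOUD_ACCESS_KEY_SECRET"
    };

    // Returns the names of required variables that are absent or blank in env.
    static List<String> missingVariables(Map<String, String> env) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingVariables(System.getenv());
        if (missing.isEmpty()) {
            System.out.println("Credentials found in the environment.");
        } else {
            System.out.println("Missing environment variables: " + missing);
        }
    }
}
```

Passing the environment in as a map keeps the check easy to exercise without changing the real process environment.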
Install the required dependencies
This article uses a Maven project as an example. To use the OpenSearch Java SDK in a Maven project, add the following dependencies to pom.xml.
<dependency>
    <groupId>com.aliyun.opensearch</groupId>
    <artifactId>aliyun-sdk-opensearch</artifactId>
    <version>6.0.0</version>
</dependency>
<dependency>
    <groupId>com.aliyun</groupId>
    <artifactId>aliyun-java-sdk-core</artifactId>
    <version>4.6.0</version>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.76</version>
</dependency>
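Create an instance
First, create an OpenSearch LLM-Based Conversational Search Edition instance. For details, see: Create an LLM-Based Conversational Search Edition instance.
Configure the enterprise knowledge base
Now that the instance exists, the next step is to upload your enterprise knowledge. Depending on the data type, you can push structured data, unstructured data, or website content.
Document import
The following sample code imports one or more structured documents: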
import java.util.HashMap;
import java.util.Map;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.aliyun.opensearch.OpenSearchClient;
import com.aliyun.opensearch.sdk.generated.OpenSearch;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchClientException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchResult;

/**
 * Adds structured data.
 */
public class testPushDemo {
    private static String appName = "test"; // Your instance name
    private static String host = "http://opensearch-cn-shanghai.aliyuncs.com"; // Service endpoint
    private static String path = "/apps/%s/actions/knowledge-bulk"; // API path

    public static void main(String[] args) {
        String appPath = String.format(path, appName);

        // Credentials: read the AccessKey ID and AccessKey Secret from environment variables.
        // The environment variables must be configured before running this sample.
        String accesskey = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
        String secret = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");

        // Create the OpenSearch object.
        OpenSearch openSearch = new OpenSearch(accesskey, secret, host);
        // Create the OpenSearchClient object, passing the OpenSearch object to its constructor.
        OpenSearchClient openSearchClient = new OpenSearchClient(openSearch);

        // Build a single structured document.
        JSONObject oneRequest = new JSONObject();
        oneRequest.put("cmd", "ADD");
        JSONObject fields = new JSONObject();
        fields.put("id", "1"); // (Required) Document ID, must be unique
        fields.put("title", "产品优势"); // (Optional) Document title
        fields.put("url", "https://help.aliyun.com/document_detail/464900.html"); // (Optional) Document URL
        fields.put("content", "行业算法版智能内置丰富的定制化算法模型,并结合不同行业搜索特点,推出行业召回、"
                + "排序算法,保障更优搜索效果。灵活、可定制开发者可基于自身业务特性与数据,定制相应的算法模型、应用结构、"
                + "数据处理、查询分析、排序等配置,满足个性化搜索需求,提升搜索结果点击率,实现业务快速迭代,极大缩短需求上线的周期。"
                + "安全、稳定提供7×24小时的运行维护,并以在线工单和电话报障等方式提供技术支持,具备完善的故障监控、自动告警、"
                + "快速定位等一系列故障应急响应机制。"); // (Required) Document content
        fields.put("category", "OpenSearch,行业算法版"); // (Optional) Document category
        fields.put("timestamp", "1720668785888"); // (Optional) Timestamp, indicates document freshness
        oneRequest.put("fields", fields);

        // Multiple documents can be added in one request.
        JSONArray request = new JSONArray();
        request.add(oneRequest);
        //request.add(twoRequest);

        Map<String, String> params = new HashMap<String, String>() {{
            put("format", "full_json");
            put("_POST_BODY", request.toJSONString());
        }};

        try {
            OpenSearchResult openSearchResult = openSearchClient.callAndDecodeResult(appPath, params, "POST");
            // Print the result.
            System.out.println(openSearchResult.getResult());
        } catch (OpenSearchException e) {
            e.printStackTrace();
        } catch (OpenSearchClientException e) {
            e.printStackTrace();
        }
    }
}
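The comments in the sample above mark `id` and `content` as required and the other fields as optional. A small pre-check along the following lines can catch incomplete documents before they are pushed. This is a sketch under that assumption; `StructuredDocCheck` and `missingRequiredFields` are hypothetical helper names, and the helper accepts a plain `Map` so that it does not depend on any JSON library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: verify the fields marked as required in the sample above before pushing.
// StructuredDocCheck and missingRequiredFields are hypothetical helper names.
final class StructuredDocCheck {
    // Per the sample's comments, "id" and "content" are required for structured documents.
    private static final String[] REQUIRED_FIELDS = {"id", "content"};

    // Returns the required field names that are absent or blank.
    static List<String> missingRequiredFields(Map<String, ?> fields) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED_FIELDS) {
            Object value = fields.get(name);
            if (value == null || value.toString().trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, Object> fields = new HashMap<>();
        fields.put("id", "1");
        fields.put("title", "Product benefits"); // optional
        // Assumed to be epoch milliseconds as a string, like the sample value.
        fields.put("timestamp", String.valueOf(System.currentTimeMillis()));
        System.out.println("Missing required fields: " + missingRequiredFields(fields));
    }
}
```

Running the check on each document before adding it to the request array keeps a single bad record from failing the whole push.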
The following sample code imports one or more unstructured documents (supported formats: doc, docx, pdf, html, txt, ppt, and pptx):
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.aliyun.opensearch.OpenSearchClient;
import com.aliyun.opensearch.sdk.generated.OpenSearch;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchClientException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchResult;

public class PushNonStructuralLLM {
    private static String appName = "test"; // Your instance name
    private static String host = "http://opensearch-cn-shanghai.aliyuncs.com"; // Service endpoint
    private static String path = "/apps/%s/actions/knowledge-bulk"; // API path

    public static void main(String[] args) throws IOException {
        // Credentials: read the AccessKey ID and AccessKey Secret from environment variables.
        // The environment variables must be configured before running this sample.
        String accesskey = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
        String secret = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");
        String appPath = String.format(path, appName);

        // Create the OpenSearch object.
        OpenSearch openSearch = new OpenSearch(accesskey, secret, host);
        // Create the OpenSearchClient object, passing the OpenSearch object to its constructor.
        OpenSearchClient openSearchClient = new OpenSearchClient(openSearch);

        // Build a single document. The local variable is named filePath so that it
        // does not shadow the static field "path".
        Path filePath = Paths.get("/Users/xxx/Documents/示例企业知识库.docx");
        JSONObject oneRequest = new JSONObject();
        // For unstructured documents (doc, docx, pdf, html, txt, ppt, pptx), cmd is BASE64.
        oneRequest.put("cmd", "BASE64");
        JSONObject fields = new JSONObject();
        fields.put("id", "2"); // Document ID, must be unique
        fields.put("title", "示例企业知识库.docx"); // (Required) File name including the extension
        fields.put("url", "https://help.aliyun.com/document_detail/464900.html"); // (Optional) Document URL
        fields.put("content", Base64.getEncoder().encodeToString(Files.readAllBytes(filePath))); // (Required) Document content
        fields.put("category", "OpenSearch,智能问答版"); // (Required) Document category
        fields.put("timestamp", "1720668785888"); // (Optional) Timestamp, indicates document freshness
        oneRequest.put("fields", fields);

        // Multiple documents can be added in one request.
        final JSONArray request = new JSONArray();
        request.add(oneRequest);
        //request.add(twoRequest);

        Map<String, String> params = new HashMap<String, String>() {{
            put("format", "full_json");
            put("_POST_BODY", request.toString());
        }};

        try {
            OpenSearchResult openSearchResult = openSearchClient.callAndDecodeResult(appPath, params, "POST");
            // Print the result.
            System.out.println(openSearchResult.getResult());
        } catch (OpenSearchException e) {
            e.printStackTrace();
        } catch (OpenSearchClientException e) {
            e.printStackTrace();
        }
    }
}
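Because the unstructured upload only accepts the formats listed above and expects Base64-encoded content, it can help to check the file extension and encode the bytes in one place. The following is a minimal sketch of that idea; `UnstructuredDocPrep`, `isSupported`, and `encode` are hypothetical helper names, and matching the extension case-insensitively is an assumption rather than documented service behavior.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Sketch: check the file extension and Base64-encode the bytes before building the request.
// UnstructuredDocPrep and its methods are hypothetical helper names.
final class UnstructuredDocPrep {
    // Formats listed in this article for unstructured uploads.
    private static final Set<String> SUPPORTED = new HashSet<>(
            Arrays.asList("doc", "docx", "pdf", "html", "txt", "ppt", "pptx"));

    // Returns true when the file name ends with one of the listed extensions.
    // Case-insensitive matching is an assumption made for this sketch.
    static boolean isSupported(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SUPPORTED.contains(extension);
    }

    // Encodes raw file bytes with the standard Base64 alphabet.
    static String encode(byte[] fileBytes) {
        return Base64.getEncoder().encodeToString(fileBytes);
    }

    public static void main(String[] args) {
        String fileName = "notes.txt"; // hypothetical file name
        // Stands in for Files.readAllBytes(...) so the sketch runs without a file.
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        if (isSupported(fileName)) {
            System.out.println(encode(bytes));
        } else {
            System.out.println("Unsupported format: " + fileName);
        }
    }
}
```

In a real upload, the value returned by `encode` would go into the `content` field and the file name, including its extension, into `title`.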
Note: Keep the number of documents in a single batch push within the documented limit. Exceeding the limit may cause the push to fail.
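One way to stay under the limit is to split a large document list into fixed-size batches and push each batch in its own request. The sketch below shows only the splitting step; `BatchSplitter` and `split` are hypothetical names, and the batch size used in `main` is a placeholder, since this article does not state the actual limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: split a list of documents into batches of at most batchSize items.
// BatchSplitter and split are hypothetical names; the real limit is not given here.
final class BatchSplitter {
    static <T> List<List<T>> split(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        int batchSize = 2; // placeholder value, replace with the documented limit
        List<String> documentIds = Arrays.asList("1", "2", "3", "4", "5");
        for (List<String> batch : split(documentIds, batchSize)) {
            // In a real push, each batch would become one JSONArray request body.
            System.out.println(batch);
        }
    }
}
```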
For API details, see: PushKnowledgeDocuments (document push).
Website import
The following sample code creates a website import task:
import java.nio.charset.StandardCharsets;

import com.aliyuncs.CommonRequest;
import com.aliyuncs.CommonResponse;
import com.aliyuncs.DefaultAcsClient;
import com.aliyuncs.IAcsClient;
import com.aliyuncs.exceptions.ClientException;
import com.aliyuncs.exceptions.ServerException;
import com.aliyuncs.http.FormatType;
import com.aliyuncs.http.MethodType;
import com.aliyuncs.http.ProtocolType;
import com.aliyuncs.profile.DefaultProfile;

/**
 * Website import.
 */
public class CreateSpider {
    private static String appName = "test"; // Your instance name
    private static String path = "/v4/openapi/app-groups/%s/chatos/spiders"; // API path

    public static void main(String[] args) {
        String appPath = String.format(path, appName);

        // Please ensure that the environment variables ALIBABA_CLOUD_ACCESS_KEY_ID and
        // ALIBABA_CLOUD_ACCESS_KEY_SECRET are set.
        DefaultProfile profile = DefaultProfile.getProfile(
                "cn-shanghai",
                System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"),
                System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"));

        /** use STS Token
        DefaultProfile profile = DefaultProfile.getProfile(
                "<your-region-id>",                                 // The region ID
                System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"),       // The AccessKey ID of the RAM account
                System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"),   // The AccessKey Secret of the RAM account
                System.getenv("ALIBABA_CLOUD_SECURITY_TOKEN"));     // STS Token
        **/

        IAcsClient client = new DefaultAcsClient(profile);
        CommonRequest request = new CommonRequest();
        //request.setProtocol(ProtocolType.HTTPS);
        request.setMethod(MethodType.POST);
        request.setDomain("opensearch.cn-shanghai.aliyuncs.com");
        request.setVersion("2017-12-25");
        request.setUriPattern(appPath);

        String requestBody =
                "{\"url\":\"https://help.aliyun.com/zh/open-search/product-overview\",\"category\":\"opensearch帮助文档\"}";
        request.putHeadParameter("Content-Type", "application/json");
        // Encode explicitly as UTF-8 so the bytes match the declared encoding on every platform.
        request.setHttpContent(requestBody.getBytes(StandardCharsets.UTF_8), "utf-8", FormatType.JSON);

        try {
            CommonResponse response = client.getCommonResponse(request);
            System.out.println(response.getData());
        } catch (ServerException e) {
            e.printStackTrace();
        } catch (ClientException e) {
            e.printStackTrace();
        }
    }
}
Note: For API details, see: CreateSpider (create a website import task).
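The website import sample writes its JSON request body as a hand-escaped string literal, which becomes error-prone once the URL or category comes from user input. In a project that already includes fastjson, building the body with `JSONObject` is probably the simpler route. The dependency-free sketch below only illustrates which characters need escaping; `SpiderBody`, `escape`, and `build` are hypothetical names, and only the `url` and `category` keys shown in the sample are assumed.

```java
// Sketch: build the {"url": ..., "category": ...} body with basic JSON string escaping.
// SpiderBody, escape, and build are hypothetical helper names.
final class SpiderBody {
    // Escapes quotes, backslashes, and control characters for use inside a JSON string.
    static String escape(String value) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the four-digit hex escape form.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    // Builds the request body with the two keys used in the sample above.
    static String build(String url, String category) {
        return "{\"url\":\"" + escape(url) + "\",\"category\":\"" + escape(category) + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(build(
                "https://help.aliyun.com/zh/open-search/product-overview",
                "opensearch docs"));
    }
}
```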
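Test the results
You have now built an enterprise-specific knowledge base. Use the following sample code to test the Q&A quality of the knowledge base: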
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.aliyun.opensearch.OpenSearchClient;
import com.aliyun.opensearch.sdk.generated.OpenSearch;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchClientException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchResult;

import java.util.HashMap;
import java.util.Map;

public class LLMsearch {
    private static String appName = "proLLM"; // Your instance name
    private static String host = "http://opensearch-cn-shanghai.aliyuncs.com"; // Service endpoint
    private static String path = "/apps/%s/actions/knowledge-search"; // API path

    public static void main(String[] args) {
        String appPath = String.format(path, appName);

        // Credentials: read the AccessKey ID and AccessKey Secret from environment variables.
        // The environment variables must be configured before running this sample.
        String accesskey = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
        String secret = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");

        OpenSearch openSearch = new OpenSearch(accesskey, secret, host);
        openSearch.setTimeout(62000); // API read timeout
        OpenSearchClient openSearchClient = new OpenSearchClient(openSearch);

        // Build a single query.
        JSONObject oneRequest = new JSONObject();
        JSONObject question = new JSONObject();
        question.put("text", "什么是OpenSearch"); // The question to ask ("What is OpenSearch")
        // Setting a session enables multi-turn conversation.
        //question.put("session", "<conversation session>");
        question.put("type", "TEXT");
        oneRequest.put("question", question);

        Map<String, String> params = new HashMap<String, String>() {{
            put("format", "full_json");
            put("_POST_BODY", oneRequest.toJSONString());
        }};

        try {
            OpenSearchResult openSearchResult = openSearchClient
                    .callAndDecodeResult(appPath, params, "POST");
            System.out.println("RequestID=" + openSearchResult.getTraceInfo().getRequestId());
            System.out.println(openSearchResult.getResult());
        } catch (OpenSearchException e) {
            System.out.println("RequestID=" + e.getRequestId());
            System.out.println("ErrorCode=" + e.getCode());
            System.out.println("ErrorMessage=" + e.getMessage());
        } catch (OpenSearchClientException e) {
            System.out.println("ErrorMessage=" + e.getMessage());
        }
    }
}
The query returns the following result:
{
  "data": [
    {
      "reference": [
        {
          "tokenNum": 141,
          "id": "c598ea1cf340fdb5a6bea0eb2c90db2a",
          "title": "网站问答_智能开放搜索 OpenSearch(Open Search)-阿里云帮助中心",
          "category": "LLM",
          "url": "https://help.aliyun.com/zh/open-search/llm-intelligent-q-a-version/website-q-a?spm=a2c4g.11186623.0.0.496565707tXzDl"
        },
        {
          "tokenNum": 708,
          "id": "08d48b6f3fd96b158beca07e9858abc7",
          "title": "智能开放搜索有哪些产品优势_智能开放搜索 OpenSearch(Open Search)-阿里云帮助中心",
          "category": "opensearch",
          "url": "https://help.aliyun.com/zh/open-search/product-overview/benefits"
        }
      ],
      "answer": "OpenSearch,即智能开放搜索,是阿里云提供的一项服务。它具有以下特点和优势:\n\n1. **行业算法版**:内置丰富的定制化算法模型,结合不同行业搜索特点,提供行业召回、排序算法,以保障更优的搜索效果。\n\n2. **灵活、可定制**:开发者可以根据自身业务特性与数据定制算法模型、应用结构、数据处理、查询分析、排序等配置,以满足个性化搜索需求。\n\n3. **安全、稳定**:提供7×24小时的运行维护,具备故障监控、自动告警、快速定位等应急响应机制。通过安全加密对保证用户数据安全,并进行权限控制和隔离。\n\n4. **弹性伸缩**:用户可以根据需要扩展或缩减资源。\n\n5. **丰富的外围功能**:支持热搜、底纹、下拉提示、统计报表等搜索外围功能。\n\n6. **开箱即用**:无需运维部署集群,可快速接入搜索服务。\n\n7. **高性能检索版**:支持高吞吐,单表支持万级别写入TPS,秒级更新。\n\n8. **向量检索版**:底层稳定,支持海量数据检索和实时更新,提供低成本的索引压缩策略,支持向量算法和SQL查询。\n\nOpenSearch提供的服务还包括问答测试、数据推送(如网站导入)、数据查询(如搜索Demo)以及其他功能(如文本向量化及切片向量化)。此外,它还提供了产品概述、快速入门、操作指南、实践教程、开发参考、服务支持和视频专区等文档资料供用户了解和使用。",
      "type": "TEXT"
    },
    {
      "reference": [
        {
          "id": "08d48b6f3fd96b158beca07e9858abc7",
          "title": "智能开放搜索有哪些产品优势_智能开放搜索 OpenSearch(Open Search)-阿里云帮助中心",
          "category": "opensearch",
          "url": "https://help.aliyun.com/zh/open-search/product-overview/benefits"
        }
      ],
      "answer": "https://img.alicdn.com/tfs/TB1AOdINW6qK1RjSZFmXXX0PFXa-258-258.jpg",
      "type": "IMAGE"
    }
  ]
}
You can also set additional parameters to suit your scenario and the results you expect. For details, see: SearchKnowledge (Q&A document query).
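In the sample response, each entry under `data` carries a `type` (TEXT or IMAGE), an `answer`, and a `reference` list. Once the response has been parsed into maps (for example with fastjson, which is outside this sketch), the text answers and the image URLs can be separated by type. The code below is a minimal sketch under that assumption; `AnswerFilter`, `answersOfType`, and `entry` are hypothetical names, and the sample values in `main` are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: pick answers by type from an already parsed "data" array.
// AnswerFilter, answersOfType, and entry are hypothetical helper names.
final class AnswerFilter {
    // Returns the "answer" values of all entries whose "type" equals the given type.
    static List<String> answersOfType(List<Map<String, Object>> data, String type) {
        List<String> answers = new ArrayList<>();
        for (Map<String, Object> entry : data) {
            Object answer = entry.get("answer");
            if (type.equals(entry.get("type")) && answer != null) {
                answers.add(answer.toString());
            }
        }
        return answers;
    }

    // Builds a minimal entry with the two keys this sketch reads.
    static Map<String, Object> entry(String type, String answer) {
        Map<String, Object> map = new HashMap<>();
        map.put("type", type);
        map.put("answer", answer);
        return map;
    }

    public static void main(String[] args) {
        // Placeholder data shaped like the "data" array in the sample response.
        List<Map<String, Object>> data = Arrays.asList(
                entry("TEXT", "OpenSearch is a search service."),
                entry("IMAGE", "https://example.com/picture.jpg"));
        System.out.println("Text answers: " + answersOfType(data, "TEXT"));
        System.out.println("Image URLs: " + answersOfType(data, "IMAGE"));
    }
}
```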
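Summary
You have now implemented enterprise knowledge base Q&A with the Java SDK. To support knowledge base Q&A in production, integrate the corresponding OpenSearch APIs into your business application. By building different types of knowledge bases, you can also support a wide range of scenarios such as intelligent documents, e-commerce shopping guides, and educational Q&A.
You can test with public datasets. For details, see: Test datasets.
To learn more about the LLM-Based Conversational Search Edition, see: https://www.aliyun.com/activity/bigdata/opensearch/llmsearch
If you have other questions about RAG systems or data construction, you are welcome to join the DingTalk support group for OpenSearch LLM-Based Conversational Search Edition to learn more about technical details and usage.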