
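Term Weight Intervention Dictionary

Last updated: 2020-09-23 11:51:55

Concepts

Term weight is the weight assigned to each term of a query. By creating a term weight intervention dictionary and enabling it in query analysis, you can override the term weights that the system would otherwise assign.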

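Usage

Manual intervention is currently supported for the built-in system term weight dictionary. Intervention takes four steps:

  1. Create a term weight intervention dictionary. Currently this can only be done through the API/SDK.
  2. Add intervention entries to the dictionary.
  3. Use the intervention dictionary. Once the dictionary has been created and populated, it can be selected in the query analysis of any application.
  4. Test the intervention dictionary and bring it online. After a query analysis rule starts using the dictionary, run search tests before applying the rule online to check that the results match what the intervention is meant to achieve.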

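Matching Rules

Intervention data takes effect according to the following rules:

  1. An exact match on the entire query takes priority.
  2. Position priority: a match that starts earlier in the query wins.
  3. At the same position, the longer match wins. Length is measured in terms after tokenization, and each term has a length of 1. Matching is subject to a maximum length, which is currently 5. Example: mysql数据库 is tokenized into mysql and 数据库, which is two terms, so its length is 2.
  4. When configuring the term weight intervention dictionary in query analysis, you can choose whether spaces in the query are ignored.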


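Notes:

  • Query rewriting may produce two queries. The first query keeps the terms with weight 7 and weight 4 for recall. The second query is used for a re-search (by default, the system re-searches only when the first query returns no results); to broaden recall, it keeps only the terms with weight 7.
  • Error code 6612: term_weight makeup data fail. Meaning: the intervention data did not take effect.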
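
The term selection described in the first note can be sketched in Java as follows. This is only an illustration, assuming that a rewrite result can be modelled as an ordered map from term to weight; the class `RecallTermsSketch` and its methods are hypothetical names and are not part of the OpenSearch SDK.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not an OpenSearch API: shows which terms each
// rewritten query would keep, based on the weights 7 / 4 / 1.
public class RecallTermsSketch {

    // Keep the terms whose weight is at least minWeight, preserving query order.
    static List<String> termsWithWeightAtLeast(Map<String, Integer> weights, int minWeight) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            if (e.getValue() >= minWeight) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    // First query: terms with weight 7 and 4 take part in recall.
    static List<String> firstQueryTerms(Map<String, Integer> weights) {
        return termsWithWeightAtLeast(weights, 4);
    }

    // Re-search query: only terms with weight 7 take part in recall.
    static List<String> retryQueryTerms(Map<String, Integer> weights) {
        return termsWithWeightAtLeast(weights, 7);
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("数据库", 7);
        weights.put("权限", 4);
        weights.put("管理", 1);
        System.out.println(firstQueryTerms(weights)); // [数据库, 权限]
        System.out.println(retryQueryTerms(weights)); // [数据库]
    }
}
```

Dropping the weight 4 terms in the re-search relaxes the query, which is why it can return results when the first query returns none.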

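Examples

  • Intervention data:
    a. 数据库权限管理 -> 数据库:7 | 权限:4 | 管理:1
    b. mysql数据库 -> mysql:7 | 数据库:1
    c. 数据库权限 -> 数据库:4 | 权限:1
    d. linux环境中的mysql数据库权限管理 -> linux:7 | 环境:1 | 中:1 | 的:1 | mysql:7 | 数据库:4 | 权限:1 | 管理:1 (longer than 5 terms)
  • Queries:
    • mysql数据库权限管理 => entry b takes effect (position priority)
    • sqlserver数据库权限管理 => entry a takes effect (same position, length priority)
    • 数据库权限如何设置 => entry c takes effect (position priority)
    • 数据库设置权限 => no intervention data takes effect (no matching entry)
    • linux环境中的mysql数据库管理权限设置 => entry d takes effect (exact match on the entire query)
    • linux环境中的mysql数据库管理权限设置攻略 => entry b takes effect (entry d is part of this query, but it contains more than 5 terms after tokenization, so it cannot be applied as a partial match)
    • mysql数据库的数据库权限设置 => entries b and c both take effect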
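
One way to read the matching rules is as a left-to-right scan that prefers the longest entry at each position. The Java sketch below simulates that reading on the entries a to d. It is an interpretation, not the service's actual implementation: the class `InterventionMatchSketch`, the assumption that matched terms are consumed before scanning continues, and the tokenization of each query (for example 如何设置 into 如何 and 设置) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the matching rules; not part of the OpenSearch SDK.
public class InterventionMatchSketch {
    // Maximum number of terms a partial match may span (the current limit in the rules).
    static final int MAX_PARTIAL_LENGTH = 5;

    static List<String> terms(String... t) {
        return Arrays.asList(t);
    }

    // Entries a-d from the examples, keyed by their tokenized form.
    static Map<List<String>, String> exampleEntries() {
        Map<List<String>, String> entries = new LinkedHashMap<>();
        entries.put(terms("数据库", "权限", "管理"), "a");
        entries.put(terms("mysql", "数据库"), "b");
        entries.put(terms("数据库", "权限"), "c");
        entries.put(terms("linux", "环境", "中", "的", "mysql", "数据库", "权限", "管理"), "d");
        return entries;
    }

    // Returns the ids of the entries that take effect for an already tokenized query.
    static List<String> match(List<String> query, Map<List<String>, String> entries) {
        List<String> hits = new ArrayList<>();
        // Rule 1: an exact match on the entire query wins, regardless of length.
        String exact = entries.get(query);
        if (exact != null) {
            hits.add(exact);
            return hits;
        }
        // Rule 2: scan from the left, so earlier positions are tried first.
        int pos = 0;
        while (pos < query.size()) {
            int maxLen = Math.min(MAX_PARTIAL_LENGTH, query.size() - pos);
            String hit = null;
            int hitLen = 0;
            // Rule 3: at the same position, try the longest span first.
            for (int len = maxLen; len >= 1; len--) {
                hit = entries.get(query.subList(pos, pos + len));
                if (hit != null) {
                    hitLen = len;
                    break;
                }
            }
            if (hit != null) {
                hits.add(hit);
                pos += hitLen; // assumed: matched terms are not reused
            } else {
                pos++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<List<String>, String> e = exampleEntries();
        System.out.println(match(terms("mysql", "数据库", "权限", "管理"), e));             // [b]
        System.out.println(match(terms("sqlserver", "数据库", "权限", "管理"), e));         // [a]
        System.out.println(match(terms("数据库", "权限", "如何", "设置"), e));              // [c]
        System.out.println(match(terms("数据库", "设置", "权限"), e));                      // []
        System.out.println(match(terms("mysql", "数据库", "的", "数据库", "权限", "设置"), e)); // [b, c]
    }
}
```

Under this reading, entry d can only ever take effect through rule 1, because its eight terms exceed the partial-match limit of 5.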

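Hands-On Walkthrough

Business scenario: a content-industry business has configured a query analysis rule that includes the term weight feature in its OpenSearch application instance. A bad case was found online, so the team decided to use the intervention feature.
Bad case: a user searches for the query 数据权限管理, which is rewritten to default:'权限' RANK default:'数据' RANK default:'管理', but the user actually wants to hit "数据" rather than "权限".
Diagnosis: the bad case is caused by the built-in term weight dictionary, so term weight intervention is required.
Solution: create a term weight intervention dictionary, then apply it to the query analysis rule that is used online.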

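Procedure

Step 1: Search testing shows that the query rewrite produced by the term weight feature does not match expectations, so term weight intervention is needed and intervention entries must be added manually. Before adding entries, confirm how the query is tokenized, and set the weights per tokenized term.

Step 2: Use the API/SDK to create the query processor (qp) and the intervention dictionary:
Java SDK Maven dependencies: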

    <dependency>
        <groupId>com.aliyun</groupId>
        <artifactId>aliyun-java-sdk-opensearch</artifactId>
        <version>0.7.0</version>
    </dependency>
    <dependency>
        <groupId>com.aliyun</groupId>
        <artifactId>aliyun-java-sdk-core</artifactId>
        <version>4.5.0</version>
    </dependency>

Java SDK Demo

    import java.io.UnsupportedEncodingException;

    import com.aliyuncs.DefaultAcsClient;
    import com.aliyuncs.exceptions.ClientException;
    import com.aliyuncs.http.FormatType;
    import com.aliyuncs.http.HttpResponse;
    // Request classes; package name assumed for SDK 0.7.0, adjust if your version differs.
    import com.aliyuncs.opensearch.model.v20171225.*;
    import com.aliyuncs.profile.DefaultProfile;
    import com.aliyuncs.profile.IClientProfile;

    public class TestTermWeightingInQueryProcessor {
        // Shared client used by all helper methods below.
        private static DefaultAcsClient client;

        public static void main(String[] args) throws Exception {
            String regionId = "cn-hangzhou"; // region ID
            IClientProfile profile = DefaultProfile.getProfile(regionId, "{ak}", "{secret}");
            DefaultProfile.addEndpoint(regionId, regionId, "Opensearch", "opensearch." + regionId + ".aliyuncs.com");
            // Initialize the static field so the helper methods can use it.
            client = new DefaultAcsClient(profile);
            String dictionaryName = "词典、qp名称"; // name of the dictionary and qp to create
            String appName = "应用名称"; // name of the application to use
            int versionId = 1234; // application version ID

            // System.out.println("List intervention dictionaries");
            // listInterventionDictionaries();
            Thread.sleep(10000);
            System.out.println("Create intervention dictionary: " + dictionaryName);
            createDictionary(dictionaryName);
            //
            // Thread.sleep(10000);
            // System.out.println("Describe intervention dictionary");
            // describeInterventionDictionary(dictionaryName);
            //
            // Thread.sleep(10000);
            // System.out.println("List intervention dictionary entries before");
            // listEntries(dictionaryName);
            Thread.sleep(10000);
            System.out.println("Post dictionary entries added");
            postEntries(dictionaryName, "add");
            Thread.sleep(10000);
            System.out.println("List intervention dictionary entries after add");
            listEntries(dictionaryName);
            Thread.sleep(10000);
            System.out.println("Set intervention dictionary to qp");
            setQueryProcessor(appName, versionId, dictionaryName);
            // Use with caution: set this qp as the default only after confirming on the
            // search test page that it meets expectations.
            Thread.sleep(10000);
            System.out.println("Set default query processor");
            setDefaultQueryProcessor(appName, versionId, dictionaryName);
            // Thread.sleep(10000);
            // System.out.println("Delete dictionary");
            // deleteDictionary(dictionaryName);
        }

        // Lists up to 50 intervention dictionaries.
        public static void listInterventionDictionaries() throws ClientException {
            ListInterventionDictionariesRequest listInterventionDictionariesRequest = new ListInterventionDictionariesRequest();
            listInterventionDictionariesRequest.setPageSize(50);
            HttpResponse response = client.doAction(listInterventionDictionariesRequest);
            System.out.println(response.getHttpContentString());
        }

        // Creates an intervention dictionary of type term_weighting.
        public static void createDictionary(String dictionaryName) throws UnsupportedEncodingException, ClientException {
            CreateInterventionDictionaryRequest request = new CreateInterventionDictionaryRequest();
            String body = "{\"name\": \"" + dictionaryName + "\", \"type\": \"term_weighting\"}";
            request.setHttpContent(body.getBytes("UTF-8"), "UTF-8", FormatType.JSON);
            HttpResponse response = client.doAction(request);
            System.out.println(response.getHttpContentString());
        }

        public static void describeInterventionDictionary(String dictionaryName) throws ClientException {
            DescribeInterventionDictionaryRequest request = new DescribeInterventionDictionaryRequest();
            request.setName(dictionaryName);
            HttpResponse response = client.doAction(request);
            System.out.println(response.getHttpContentString());
        }

        // Lists the entries currently stored in the dictionary.
        public static void listEntries(String dictionaryName) throws ClientException {
            ListInterventionDictionaryEntriesRequest request = new ListInterventionDictionaryEntriesRequest();
            request.setName(dictionaryName);
            HttpResponse response = client.doAction(request);
            System.out.println(response.getHttpContentString());
        }

        // Pushes intervention entries; cmd is the operation to apply, for example "add".
        public static void postEntries(String dictionaryName, String cmd) throws UnsupportedEncodingException, ClientException {
            PushInterventionDictionaryEntriesRequest request = new PushInterventionDictionaryEntriesRequest();
            request.setName(dictionaryName);
            // Replace with the entries you want to intervene on; high/medium/low weights are 7/4/1.
            // Data structure reference: https://help.aliyun.com/document_detail/173606.html?spm=a2c4g.11186623.6.727.28514cf4my0AzY
            String body = "[{\n" +
                    " \"word\": \"数据权限管理\",\n" +
                    " \"cmd\": \"" + cmd + "\",\n" +
                    " \"tokens\": [\n" +
                    " {\n" +
                    " \"token\": \"数据\",\n" +
                    " \"weight\": 7\n" +
                    " },\n" +
                    " {\n" +
                    " \"token\": \"权限\",\n" +
                    " \"weight\": 4\n" +
                    " },\n" +
                    " {\n" +
                    " \"token\": \"管理\",\n" +
                    " \"weight\": 1\n" +
                    " }\n" +
                    " ]\n" +
                    "}]";
            request.setHttpContent(body.getBytes("UTF-8"), "UTF-8", FormatType.JSON);
            HttpResponse response = client.doAction(request);
            System.out.println(response.getHttpContentString());
        }

        public static void deleteDictionary(String dictionaryName) throws ClientException {
            RemoveInterventionDictionaryRequest request = new RemoveInterventionDictionaryRequest();
            request.setName(dictionaryName);
            HttpResponse response = client.doAction(request);
            System.out.println(response.getHttpContentString());
        }

        // Creates a query processor (qp) that uses the intervention dictionary.
        public static void setQueryProcessor(String appName, int versionId, String dictionaryName) throws UnsupportedEncodingException, ClientException {
            CreateQueryProcessorRequest request = new CreateQueryProcessorRequest();
            request.setAppGroupIdentity(appName);
            request.setAppId(versionId);
            // Query processor documentation: https://help.aliyun.com/document_detail/170014.html?spm=a2c4g.11186623.6.719.1002401051A9Lq
            String body = "{\"name\":\"" + dictionaryName + "\",\"domain\":\"GENERAL\",\"indexes\":[\"default\"],\"processors\":[{\"name\":\"term_weighting\",\"useSystemDictionary\":true, \"interventionDictionary\":\"" + dictionaryName + "\"}]}";
            request.setHttpContent(body.getBytes("UTF-8"), "UTF-8", FormatType.JSON);
            HttpResponse response = client.doAction(request);
            System.out.println(response.getHttpContentString());
        }

        // Marks the query processor as active, making it the default qp.
        public static void setDefaultQueryProcessor(String appName, int versionId, String dictionaryName) throws UnsupportedEncodingException, ClientException {
            ModifyQueryProcessorRequest request = new ModifyQueryProcessorRequest();
            request.setAppGroupIdentity(appName);
            request.setAppId(versionId);
            request.setName(dictionaryName);
            String body = "{\"active\":true}";
            request.setHttpContent(body.getBytes("UTF-8"), "UTF-8", FormatType.JSON);
            HttpResponse response = client.doAction(request);
            System.out.println(response.getHttpContentString());
        }

        public static void deleteQueryProcessor(String appName, int versionId, String dictionaryName) throws ClientException {
            RemoveQueryProcessorRequest request = new RemoveQueryProcessorRequest();
            request.setAppGroupIdentity(appName);
            request.setAppId(versionId);
            request.setName(dictionaryName);
            HttpResponse response = client.doAction(request);
            System.out.println(response.getHttpContentString());
        }

        public static void listQueryProcessors(String appName, int versionId) throws ClientException {
            ListQueryProcessorsRequest request = new ListQueryProcessorsRequest();
            request.setAppGroupIdentity(appName);
            request.setAppId(versionId);
            HttpResponse response = client.doAction(request);
            System.out.println(response.getHttpContentString());
        }
    }
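
When entries come from data rather than being typed by hand, the request body used in postEntries can be built programmatically. The following is a minimal sketch that uses only the JDK and produces the same structure (word, cmd, tokens with token and weight). The class `EntryBodySketch` and its methods are hypothetical, and the check that only the weights 7, 4 and 1 are accepted is an assumption based on the high/medium/low weights mentioned in the demo comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for building the entries body; not part of the OpenSearch SDK.
public class EntryBodySketch {

    // Minimal JSON string escaping: backslashes and double quotes only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a one-entry JSON array in the structure used by the demo's postEntries.
    static String buildEntriesBody(String word, String cmd, Map<String, Integer> tokenWeights) {
        StringBuilder sb = new StringBuilder();
        sb.append("[{\"word\":\"").append(escape(word)).append("\",");
        sb.append("\"cmd\":\"").append(escape(cmd)).append("\",");
        sb.append("\"tokens\":[");
        boolean first = true;
        for (Map.Entry<String, Integer> e : tokenWeights.entrySet()) {
            int weight = e.getValue();
            // Assumed restriction: high / medium / low correspond to 7 / 4 / 1.
            if (weight != 7 && weight != 4 && weight != 1) {
                throw new IllegalArgumentException("weight must be 7, 4 or 1: " + weight);
            }
            if (!first) {
                sb.append(',');
            }
            sb.append("{\"token\":\"").append(escape(e.getKey()))
              .append("\",\"weight\":").append(weight).append('}');
            first = false;
        }
        sb.append("]}]");
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the tokens in query order.
        Map<String, Integer> tokens = new LinkedHashMap<>();
        tokens.put("数据", 7);
        tokens.put("权限", 4);
        tokens.put("管理", 1);
        System.out.println(buildEntriesBody("数据权限管理", "add", tokens));
    }
}
```

The returned string can be passed to request.setHttpContent(body.getBytes("UTF-8"), "UTF-8", FormatType.JSON) in the same way as the hand-written body in the demo.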

Step 3: Run a search test and check the result to confirm that it meets expectations.
