rum extension (PolarDB cloud-native database)

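The rum extension extends the basic concept of the GIN (Generalized Inverted Index) index. It is built on the GIN access method code and provides faster full-text search.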

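Prerequisites

The following versions of PolarDB for PostgreSQL (Compatible with Oracle) are supported:

Oracle syntax compatibility 2.0 (minor kernel version 2.0.14.3.0 or later).

Note

You can run the following statement to check the minor kernel version of your PolarDB for PostgreSQL (Compatible with Oracle) cluster: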

show polar_version;

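Background

GIN indexes support full-text search over the tsvector and tsquery data types, but they have the following limitations:

Slow ranking: ranking requires positional information about lexemes. A GIN index does not store lexeme positions, so after the index scan an additional scan is needed to retrieve them.
Slow phrase search: a phrase search also needs positional information, which a GIN index does not store.
Slow ordering by timestamp: a GIN index cannot store related information alongside the lexemes in the index, so an additional scan is needed.

The rum extension is based on the GIN index and addresses these limitations by storing additional information (lexeme positions or timestamps) in the RUM index.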
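
To make the contrast concrete, here is a minimal sketch. It assumes the rum extension is already installed and uses a hypothetical table docs with a tsvector column body_tsv; neither name comes from this topic. With a GIN index, ranking is computed by ts_rank on the matching rows, whereas with a RUM index the ORDER BY on the <=> operator can be served by the index itself.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE docs (id int, body text, body_tsv tsvector);

-- GIN: the index finds the matching rows, but lexeme positions are not
-- in the index, so ts_rank is evaluated on each matching row before sorting.
CREATE INDEX docs_gin_idx ON docs USING gin (body_tsv);

SELECT id
FROM docs
WHERE body_tsv @@ to_tsquery('english', 'beautiful & place')
ORDER BY ts_rank(body_tsv, to_tsquery('english', 'beautiful & place')) DESC
LIMIT 10;

-- RUM: positions are stored in the index, so ordering by the distance
-- operator <=> (smaller distance means a closer match) can use the index.
CREATE INDEX docs_rum_idx ON docs USING rum (body_tsv rum_tsvector_ops);

SELECT id
FROM docs
WHERE body_tsv @@ to_tsquery('english', 'beautiful & place')
ORDER BY body_tsv <=> to_tsquery('english', 'beautiful & place')
LIMIT 10;
```

Whether the planner actually chooses the RUM index depends on table size and cost settings, so it is worth checking the plan with EXPLAIN.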

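Note

Because a RUM index stores additional information besides the keys and uses generic WAL records, building a RUM index and inserting into it are slower than for a GIN index.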

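Common rum operators

The rum module provides the following operators:

Operator	Return type	Description
`tsvector` `<=>` `tsquery`	float4	Returns the distance between a `tsvector` and a `tsquery`.
`timestamp` `<=>` `timestamp`	float8	Returns the distance between two timestamps.
`timestamp` `<=|` `timestamp`	float8	Returns the distance only for timestamps earlier than the given timestamp.
`timestamp` `|=>` `timestamp`	float8	Returns the distance only for timestamps later than the given timestamp.

The <=>, <=|, and |=> operators are also available for the following data types: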

timestamptz
int2
int4
int8
float4
float8
money
oid
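
As a quick way to get a feel for these operators, the sketch below evaluates them on literal values. It assumes the rum extension is installed in the current database; the literal values are arbitrary and chosen only for illustration.

```sql
-- Compare the symmetric distance with the two one-sided distances
-- for a timestamp (ts) and a reference point (ref).
SELECT ts <=> ref AS distance,
       ts <=| ref AS left_distance,
       ts |=> ref AS right_distance
FROM (VALUES (timestamp '2016-05-16 14:21:22',
              timestamp '2016-05-16 14:21:25')) AS v(ts, ref);

-- The same distance operator on one of the other supported types.
SELECT 10::int4 <=> 7::int4 AS int_distance;
```

Comparing the three result columns of the first query shows how the one-sided operators differ from <=>.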

Usage

Create the extension

Run the following SQL statement to install the extension.

CREATE EXTENSION rum;

Note

If the extension cannot be created even though the version requirement is met, contact us.

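Operator class usage

rum_tsvector_ops stores tsvector lexemes together with their positional information. It supports ordering by the <=> operator and prefix search.

Example:

Prepare the test data: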

CREATE TABLE test_rum(t text, a tsvector);
CREATE TRIGGER tsvectorupdate
    BEFORE UPDATE OR INSERT ON test_rum
    FOR EACH ROW
    EXECUTE PROCEDURE tsvector_update_trigger('a', 'pg_catalog.english', 't');
INSERT INTO test_rum(t) VALUES ('The situation is most beautiful');
INSERT INTO test_rum(t) VALUES ('It is a beautiful');
INSERT INTO test_rum(t) VALUES ('It looks like a beautiful place');

Create a RUM index:

CREATE INDEX rumidx ON test_rum USING rum (a rum_tsvector_ops);

Run the following queries:

Query 1:

SELECT t, a <=> to_tsquery('english', 'beautiful | place') AS rank
FROM test_rum
WHERE a @@ to_tsquery('english', 'beautiful | place')
ORDER BY a <=> to_tsquery('english', 'beautiful | place');

The query returns the following result:

                t                |   rank
---------------------------------+----------
 It looks like a beautiful place |  8.22467
 The situation is most beautiful | 16.44934
 It is a beautiful               | 16.44934
(3 rows)

Query 2:

SELECT t, a <=> to_tsquery('english', 'place | situation') AS rank
FROM test_rum
WHERE a @@ to_tsquery('english', 'place | situation')
ORDER BY a <=> to_tsquery('english', 'place | situation');

The query returns the following result:

                t                |   rank
---------------------------------+----------
 The situation is most beautiful | 16.44934
 It looks like a beautiful place | 16.44934
(2 rows)
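
The two queries above exercise ordering by <=>. The sketch below reuses the same test_rum table and rumidx index to illustrate prefix search and phrase search with standard tsquery syntax (:* and <->). It is an illustrative addition, so no output is shown.

```sql
-- Prefix search: match any lexeme that starts with 'beaut'.
SELECT t
FROM test_rum
WHERE a @@ to_tsquery('english', 'beaut:*');

-- Phrase search: 'beautiful' immediately followed by 'place'.
-- Only rows in which the two words are adjacent satisfy this condition.
SELECT t
FROM test_rum
WHERE a @@ to_tsquery('english', 'beautiful <-> place')
ORDER BY a <=> to_tsquery('english', 'beautiful <-> place');
```

On a table this small the planner may prefer a sequential scan; running SET enable_seqscan TO off before EXPLAIN makes it easier to confirm that the index is used.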

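rum_tsvector_hash_ops stores hashes of tsvector lexemes together with their positional information. It supports ordering by the <=> operator but does not support prefix search.

rum_TYPE_ops is a family of operator classes, where TYPE is the name of a data type. The applicable data types and supported operators are as follows:

Applicable data types: int2, int4, int8, float4, float8, money, oid, time, timetz, date, interval, macaddr, inet, cidr, text, varchar, char, bytea, bit, varbit, numeric, timestamp, timestamptz.
Supported operators: the <, <=, =, >=, and > operators are supported for all of these data types. The <=>, <=|, and |=> operators are supported for the int2, int4, int8, float4, float8, money, oid, timestamp, and timestamptz data types.

Note

rum_TYPE_ops supports ordering by the <=>, <=|, and |=> operators and can be used together with rum_tsvector_addon_ops, rum_tsvector_hash_addon_ops, and rum_anyarray_addon_ops.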
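
A minimal sketch of a rum_TYPE_ops operator class on its own, assuming a hypothetical table events with a timestamp column created (both names are illustrative). Here TYPE is timestamp, so the operator class is rum_timestamp_ops.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE events (id int4, created timestamp);

-- RUM index on a scalar column using the operator class for its type.
CREATE INDEX events_created_idx ON events
    USING rum (created rum_timestamp_ops);

-- Range filter with a comparison operator, then order the matches by
-- their distance from a reference point.
SELECT id, created
FROM events
WHERE created <= '2016-05-16 14:21:25'
ORDER BY created <=> '2016-05-16 14:21:25'
LIMIT 5;
```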

rum_tsvector_addon_ops stores tsvector lexemes together with the value of an additional field of any type supported by the module.

Example:

Prepare the data:

CREATE TABLE tsts (id int, t tsvector, d timestamp);
\copy tsts from 'external/rum/data/tsts.data'
CREATE INDEX tsts_idx ON tsts USING rum (t rum_tsvector_addon_ops, d) WITH (attach = 'd', to = 't');

View the execution plan:

EXPLAIN (costs off)
SELECT id, d, d <=> '2016-05-16 14:21:25'
FROM tsts
WHERE t @@ 'wr&qh'
ORDER BY d <=> '2016-05-16 14:21:25'
LIMIT 5;

                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Limit
   ->  Index Scan using tsts_idx on tsts
         Index Cond: (t @@ '''wr'' & ''qh'''::tsquery)
         Order By: (d <=> '2016-05-16 14:21:25'::timestamp without time zone)
(4 rows)

Run the following query:

SELECT id, d, d <=> '2016-05-16 14:21:25'
FROM tsts
WHERE t @@ 'wr&qh'
ORDER BY d <=> '2016-05-16 14:21:25'
LIMIT 5;

The query returns the following result:

 id  |             d              |   ?column?
-----+----------------------------+---------------
 355 | 2016-05-16 14:21:22.326724 |      2.673276
 354 | 2016-05-16 13:21:22.326724 |   3602.673276
 371 | 2016-05-17 06:21:22.326724 |  57597.326724
 406 | 2016-05-18 17:21:22.326724 | 183597.326724
 415 | 2016-05-19 02:21:22.326724 | 215997.326724
(5 rows)

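Note

Because posting trees have a fixed-length right bound and fixed-length non-leaf posting items, RUM may behave incorrectly when an index is created with ordering over additional information that is passed by reference.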

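rum_tsvector_hash_addon_ops stores hashes of tsvector lexemes together with the value of an additional field of any type supported by the module. It does not support prefix search.

rum_tsquery_ops applies to the tsquery data type and stores the branches of the query tree in the additional information.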
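
A minimal sketch of how this operator class might be used, assuming a hypothetical table query that stores tsquery values along with a tag (the table, column, and index names are illustrative). Indexing the stored queries lets the database find which of them match a given document, the reverse of the usual document search.

```sql
-- Hypothetical table of stored queries, used only for illustration.
CREATE TABLE query (q tsquery, tag text);

INSERT INTO query VALUES
    ('supernova & star', 'sn'),
    ('black', 'color'),
    ('big & bang & black & hole', 'bang'),
    ('spiral & galaxy', 'shape'),
    ('black & hole', 'color');

-- Index the tsquery column with the rum_tsquery_ops operator class.
CREATE INDEX query_idx ON query USING rum (q rum_tsquery_ops);

-- Find the stored queries that match the given text.
SELECT *
FROM query
WHERE to_tsvector('black holes never exists before we think about them') @@ q;
```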

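rum_anyarray_ops stores anyarray elements together with the length of the array. It supports the &&, @>, <@, =, and % operators, as well as ordering by the <=> operator.

Example:

Prepare the data: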

CREATE TABLE test_array (i int2[]);
INSERT INTO test_array VALUES ('{}'), ('{0}'), ('{1,2,3,4}'), ('{1,2,3}'), ('{1,2}'), ('{1}');
CREATE INDEX idx_array ON test_array USING rum (i rum_anyarray_ops);

Run the following statements to view the execution plan:

SET enable_seqscan TO off;

EXPLAIN (COSTS OFF)
SELECT *
FROM test_array
WHERE i && '{1}'
ORDER BY i <=> '{1}' ASC;

                QUERY PLAN
------------------------------------------
 Index Scan using idx_array on test_array
   Index Cond: (i && '{1}'::smallint[])
   Order By: (i <=> '{1}'::smallint[])
(3 rows)

Run the following query:

SELECT *
FROM test_array
WHERE i && '{1}'
ORDER BY i <=> '{1}' ASC;

The query returns the following result:

     i
-----------
 {1}
 {1,2}
 {1,2,3}
 {1,2,3,4}
(4 rows)

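rum_anyarray_addon_ops stores anyarray elements together with the value of an additional field of any type supported by the module.

Uninstall the extension

Run the following SQL statement to uninstall the extension.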
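
A minimal sketch of an index that uses this operator class, assuming a hypothetical table tagged_items (the table, column, and index names are illustrative). It follows the same WITH (attach = ..., to = ...) pattern as the rum_tsvector_addon_ops example: the timestamp column is attached to the array column so that array matches can be ordered by timestamp distance.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE tagged_items (id int, tags int2[], created timestamp);

-- Attach the timestamp column to the array column as additional information.
CREATE INDEX tagged_items_idx ON tagged_items
    USING rum (tags rum_anyarray_addon_ops, created)
    WITH (attach = 'created', to = 'tags');

-- Filter on the array column, then order by distance on the attached column.
SELECT id, created
FROM tagged_items
WHERE tags && '{1}'
ORDER BY created <=> '2016-05-16 14:21:25'
LIMIT 5;
```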

DROP EXTENSION rum;
