Quick Start-PolarDB(PolarDB)-阿里云帮助中心

When keyword search with LIKE is too slow or lacks linguistic intelligence — no stemming, no relevance ranking, no index support — full-text search solves these problems. This guide walks you through full-text search on PolarDB for PostgreSQL (Compatible with Oracle) using the RUM extension: installing extensions, creating tables and indexes, and running ranked queries on both English and Chinese text.

Prerequisites

Before you begin, make sure you have:

A PolarDB for PostgreSQL (Compatible with Oracle) cluster
Database connection credentials with the CREATE EXTENSION privilege

Install extensions

Install the RUM extension for full-text search:

CREATE EXTENSION rum;

To support Chinese text, also install the pg_jieba extension for Chinese word segmentation:

CREATE EXTENSION pg_jieba;

Note

For version requirements, see pg_jieba and rum.

Create tables

Create a table for English text:

CREATE TABLE test_english(id serial, t text, d timestamp);

Create a table for Chinese text. The v column stores the pre-computed tsvector used for indexing and querying:

CREATE TABLE test_chinese(id serial, t text, d timestamp, v tsvector);

Insert data

The examples in this guide use the following test data.

English table (`test_english`)

id	t	d
1	The situation is most beautiful	2016-05-01 20:21:22
2	It is a beautiful	2016-05-01 21:21:22
3	It looks like a beautiful place	2016-05-01 22:21:22
4	It looks l a beautiful place	2016-05-02 00:21:22
5	It looks like a beautiful places	2016-05-02 02:21:22
6	It looks like a beaut places	2016-05-02 03:21:22
7	It looks like a pig	2016-05-02 03:21:22

Chinese table (`test_chinese`) — four rows of Chinese financial news text, all timestamped 2016-05-01 20:21:22.

id	t (excerpt)	d
1	According to Nancai Wealth Management data，5May public wealth management products...	2016-05-01 20:21:22
2	Among them，CITIC Bank Wealth Management"Anying Xiang Fixed Income Steady Profit..."...	2016-05-01 20:21:22
3	5The largest fundraising scale for new products launched in May was CITIC Bank Wealth Management...	2016-05-01 20:21:22
4	According to Nancai Wealth Management data，"Anying Xiang Fixed Income Steady Profit..."...	2016-05-01 20:21:22

Populate the English table:

INSERT INTO test_english(t,d) VALUES ('The situation is most beautiful', '2016-05-01 20:21:22.326724');
INSERT INTO test_english(t,d) VALUES ('It is a beautiful', '2016-05-01 21:21:22.326724');
INSERT INTO test_english(t,d) VALUES ('It looks like a beautiful place', '2016-05-01 22:21:22.326724');
INSERT INTO test_english(t,d) VALUES ('It looks l a beautiful place', '2016-05-02 00:21:22.326724');
INSERT INTO test_english(t,d) VALUES ('It looks like a beautiful places', '2016-05-02 02:21:22.326724');
INSERT INTO test_english(t,d) VALUES ('It looks like a beaut places', '2016-05-02 03:21:22.326724');
INSERT INTO test_english(t,d) VALUES ('It looks like a pig', '2016-05-02 03:21:22.326724');

Populate the Chinese table, then compute the tsvector column using the jiebacfg configuration for Chinese word segmentation:

INSERT INTO test_chinese(t,d) VALUES ('南财理财通数据显示，5月公募理财产品吸金榜前十名归属于5家机构，信银理财有5只产品上榜，南银理财占两席，民生理财、恒丰理财、浦银理财各占一席，上榜产品均为中低风险的封闭式固收类产品。从投资周期看，本月榜单产品中有6只产品投资周期为1-2年，3-6个月及6-12个月期限产品各有2只。', '2016-05-01 20:21:22.326724');
INSERT INTO test_chinese(t,d) VALUES ('其中，信银理财"安盈象固收稳利十四个月封闭式95号"理财产品夺得吸金榜冠军，募集规模达到106.204 亿元；民生理财"富竹固收优选14个月封闭39号"位列第二，募集规模为46.606亿元；信银理财"安盈象固收稳利十四个月封闭式94号"排名第三，募集规模为41.365 亿元。', '2016-05-01 20:21:22.326724');
INSERT INTO test_chinese(t,d) VALUES ('5月新发产品募集规模最大的是信银理财"安盈象固收稳利十四个月封闭式95号"，实际募集资金高达106.204亿元，该系列产品已蝉联4个月榜首。', '2016-05-01 20:21:22.326724');
INSERT INTO test_chinese(t,d) VALUES ('南财理财通数据显示，"安盈象固收稳利十四个月封闭式95号"理财产品成立于2024年05月16日，将于2025年07月16日到期，风险评级PR2（中低风险），产品有A、C、D、K类份额，投资周期1-2年，业绩比较基准2.90%~3.45%。', '2016-05-01 20:21:22.326724');

-- Populate the tsvector column. Run this after each INSERT, or automate it with a trigger.
UPDATE test_chinese
SET v = to_tsvector('jiebacfg', t);

Note

The UPDATE above populates existing rows. In production, keep the v column current by running this UPDATE after each write, or by creating a trigger that calls to_tsvector('jiebacfg', t) automatically on every INSERT or UPDATE.

Create indexes

Create a RUM index on the English table using rum_text_ops, which indexes the raw text column directly:

CREATE INDEX ON test_english USING rum (t rum_text_ops);

Create a RUM index on the Chinese table using rum_tsvector_ops, which indexes the pre-computed tsvector column:

CREATE INDEX ON test_chinese USING rum (v rum_tsvector_ops);

Run queries

Two operators power full-text search queries:

Operator	Description
`@@`	Match operator. Returns `true` if a `tsvector` matches a `tsquery`.
`<=>`	Distance operator (RUM-specific). Returns a float score where a higher value means lower relevance (greater distance from the query terms).

Query the English table

Match rows containing `beautiful` or `place`:

SELECT * FROM test_english
WHERE to_tsvector('english', t) @@ to_tsquery('english', 'beautiful | place');

Output:

 id |                t                 |             d
----+----------------------------------+----------------------------
  1 | The situation is most beautiful  | 2016-05-01 20:21:22.326724
  2 | It is a beautiful                | 2016-05-01 21:21:22.326724
  3 | It looks like a beautiful place  | 2016-05-01 22:21:22.326724
  4 | It looks l a beautiful place     | 2016-05-02 00:21:22.326724
  5 | It looks like a beautiful places | 2016-05-02 02:21:22.326724
  6 | It looks like a beaut places     | 2016-05-02 03:21:22.326724

Match rows containing `beautiful` or `place`, ordered by distance score:

The <=> operator assigns a distance score to each matching row. Because a higher score means lower relevance, ordering by DESC places the least-relevant rows first. Rows that match only one query term receive a higher score than rows that match both terms.

SELECT *, to_tsvector('english', t) <=> to_tsquery('english', 'beautiful | place') AS rank
FROM test_english
WHERE to_tsvector('english', t) @@ to_tsquery('english', 'beautiful | place')
ORDER BY to_tsvector('english', t) <=> to_tsquery('english', 'beautiful | place') DESC;

Output:

 id |                t                 |             d              |  rank
----+----------------------------------+----------------------------+---------
  1 | The situation is most beautiful  | 2016-05-01 20:21:22.326724 | 16.4493
  2 | It is a beautiful                | 2016-05-01 21:21:22.326724 | 16.4493
  6 | It looks like a beaut places     | 2016-05-02 03:21:22.326724 | 16.4493
  3 | It looks like a beautiful place  | 2016-05-01 22:21:22.326724 | 8.22467
  4 | It looks l a beautiful place     | 2016-05-02 00:21:22.326724 | 8.22467
  5 | It looks like a beautiful places | 2016-05-02 02:21:22.326724 | 8.22467
(6 rows)

Rows 1, 2, and 6 each match only one query term (beautiful or place but not both), giving them a higher distance score (16.4493). Rows 3–5 match both terms, so their distance is lower (8.22467). To sort by most-relevant first, change DESC to ASC.

Query the Chinese table

Match rows containing `数据` or `理财`, ordered by distance score:

SELECT id, v <=> to_tsquery('jiebacfg', '数据 | 理财') as rank
FROM test_chinese
WHERE v @@ to_tsquery('jiebacfg', '数据 | 理财');

Output:

 id |   rank
----+-----------
  1 | 6.6024785
  2 |  12.08523
  3 |  16.44934
  4 |   8.22467

Match rows where `v` contains `数据`, ranked by similarity to `数据 | 理财`:

SELECT id, v <=> to_tsquery('jiebacfg', '数据 | 理财') as rank
FROM test_chinese
WHERE v @@ to_tsquery('jiebacfg', '数据');

Output:

 id |   rank
----+-----------
  1 | 6.6024785
  4 |   8.22467

Uninstall extensions (optional)

DROP EXTENSION rum CASCADE;
DROP EXTENSION pg_jieba CASCADE;

What's next

pg_jieba — Chinese word segmentation configuration and options
rum — RUM extension reference, including version requirements and operator details