本文介绍PolarDB通用倒排索引GIN(Generalized Inverted Index)。
GIN是一种存储(Key和Posting list)对集合的索引结构,其中Key是一个键值,Posting list代表一组包含该Key的位置。例如,在'hello', '14:2 23:4'
中,表示字符串hello
在位置14:2
和23:4
中出现过,这些位置实际上对应于元组的tid(行号,包括数据块ID,大小为32 bit,以及item point,大小为16 bit)。通过这种索引结构,可以快速查找到包含指定关键字的元组,因此GIN索引特别适用于多值类型元素的搜索。
操作符
操作符 | 示例 |
<@ | select * from test where id <@ array[1,2]; |
@> | select * from test where id @> array[1,2]; |
= | select * from test where id = array[1,2]; |
&& | select * from test where id && array[1,2]; |
您也可以通过btree_gin插件,支持btree相关的操作符类。
索引结构
组成结构 | 说明 |
Entry | GIN索引中的一个元素。 |
Entry tree | 在Entry上构建的B树。 |
Posting list | Entry物理位置的链表。 |
Posting tree | Posting list构建的B树。 |
Pending list | 索引元组的临时存储链表,用于fastupdate插入。 |
应用场景
搜索多值类型
创建表及插入数据。
CREATE TABLE t_gin1 (id int, arr int[]); do language plpgsql $$ declare begin for i in 1..10000 loop insert into t_gin1 select i, array(select random()*1000 from generate_series(1,10)); end loop; end; $$;
查询数据。
select * from t_gin1 limit 3;
id | arr 1 | {819,751,515,192,909,255,683,39,258,423} 2 | {844,43,834,710,447,344,480,749,814,18} 3 | {339,69,28,428,859,510,560,465,906,788}
创建索引。
CREATE INDEX idx_t_gin1_1 ON t_gin1 USING gin (arr);
执行计划。
EXPLAIN (analyze, verbose, timing, costs, buffers) SELECT * FROM t_gin1 WHERE arr && array[1,2];
Bitmap Heap Scan on public.t_gin1 (cost=7.08..126.44 rows=204 width=65) (actual time=0.042..0.161 rows=204 loops=1) Output: id, arr Recheck Cond: (t_gin1.arr && '{1,2}'::integer[]) Heap Blocks: exact=106 Buffers: shared hit=111 (main=111 vm=0 fsm=0) -> Bitmap Index Scan on idx_t_gin1_1 (cost=0.00..7.03 rows=204 width=0) (actual time=0.026..0.026 rows=204 loops=1) Index Cond: (t_gin1.arr && '{1,2}'::integer[]) Buffers: shared hit=5 (main=5 vm=0 fsm=0) Query Identifier: -6536619930866726302 Planning: Buffers: shared hit=44 (main=44 vm=0 fsm=0) dirtied=2 (main=0 vm=0 fsm=0) Planning Time: 0.160 ms Execution Time: 0.196 ms
按照任意列进行搜索
安装插件。
create extension btree_gin;
创建表及插入数据。
CREATE TABLE t_gin2 (id int, c1 int); INSERT INTO t_gin2 SELECT generate_series(1,100000), random()*10 ;
创建索引。
CREATE INDEX idx_t_gin2_1 on t_gin2 using gin (c1);
执行计划。
EXPLAIN (analyze,verbose,timing,costs,buffers) SELECT * FROM t_gin2 WHERE c1=1;
Bitmap Heap Scan on public.t_gin2 (cost=81.66..745.49 rows=9827 width=8) (actual time=0.743..2.775 rows=9964 loops=1) Output: id, c1 Recheck Cond: (t_gin2.c1 = 1) Heap Blocks: exact=541 Buffers: shared hit=546 (main=546 vm=0 fsm=0) -> Bitmap Index Scan on idx_t_gin2_1 (cost=0.00..79.20 rows=9827 width=0) (actual time=0.691..0.692 rows=9964 loops=1) Index Cond: (t_gin2.c1 = 1) Buffers: shared hit=5 (main=5 vm=0 fsm=0) Query Identifier: -7119324855706379201 Planning: Buffers: shared hit=43 (main=43 vm=0 fsm=0) dirtied=1 (main=0 vm=0 fsm=0) Planning Time: 0.136 ms Execution Time: 3.490 ms
查找的数据比较稀疏
创建表及插入数据。
CREATE TABLE t_gin3 (id int, c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, c7 int, c8 int, c9 int); INSERT INTO t_gin3 SELECT generate_series(1,100000), random()*10, random()*20, random()*30, random()*40, random()*50, random()*60, random()*70, random()*80, random()*90;
创建索引。
CREATE INDEX idx_t_gin3_1 ON t_gin3 USING gin (c1,c2,c3, c4, c5, c6,c7,c8,c9);
执行计划。
EXPLAIN (analyze,verbose,timing,costs, buffers) SELECT * FROm t_gin3 WHERE c1=1 OR c2=1 AND c3=1 OR c4=1 AND (c6=1 OR c7=2) OR c8=9 OR c9=10;
Bitmap Heap Scan on public.t_gin3 (cost=141.08..1445.89 rows=12051 width=40) (actual time=1.539..4.541 rows=12240 loops=1) Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9 Recheck Cond: ((t_gin3.c1 = 1) OR ((t_gin3.c2 = 1) AND (t_gin3.c3 = 1)) OR (((t_gin3.c4 = 1) AND (t_gin3.c6 = 1)) OR ((t_gin3.c4 = 1) AND (t_gin3.c7 = 2))) OR (t_gin3.c8 = 9) OR (t_gin3.c9 = 10)) Heap Blocks: exact=935 Buffers: shared hit=968 (main=968 vm=0 fsm=0) -> BitmapOr (cost=141.08..141.08 rows=12327 width=0) (actual time=1.441..1.445 rows=0 loops=1) Buffers: shared hit=33 (main=33 vm=0 fsm=0) -> Bitmap Index Scan on idx_t_gin3_1 (cost=0.00..79.12 rows=9670 width=0) (actual time=0.816..0.817 rows=9844 loops=1) Index Cond: (t_gin3.c1 = 1) Buffers: shared hit=6 (main=6 vm=0 fsm=0) -> Bitmap Index Scan on idx_t_gin3_1 (cost=0.00..7.09 rows=159 width=0) (actual time=0.262..0.262 rows=178 loops=1) Index Cond: ((t_gin3.c2 = 1) AND (t_gin3.c3 = 1)) Buffers: shared hit=8 (main=8 vm=0 fsm=0) -> BitmapOr (cost=17.85..17.85 rows=82 width=0) (actual time=0.222..0.224 rows=0 loops=1) Buffers: shared hit=13 (main=13 vm=0 fsm=0) -> Bitmap Index Scan on idx_t_gin3_1 (cost=0.00..5.94 rows=44 width=0) (actual time=0.119..0.119 rows=49 loops=1) Index Cond: ((t_gin3.c4 = 1) AND (t_gin3.c6 = 1)) Buffers: shared hit=6 (main=6 vm=0 fsm=0) -> Bitmap Index Scan on idx_t_gin3_1 (cost=0.00..5.88 rows=38 width=0) (actual time=0.103..0.103 rows=38 loops=1) Index Cond: ((t_gin3.c4 = 1) AND (t_gin3.c7 = 2)) Buffers: shared hit=7 (main=7 vm=0 fsm=0) -> Bitmap Index Scan on idx_t_gin3_1 (cost=0.00..12.93 rows=1283 width=0) (actual time=0.072..0.072 rows=1289 loops=1) Index Cond: (t_gin3.c8 = 9) Buffers: shared hit=3 (main=3 vm=0 fsm=0) -> Bitmap Index Scan on idx_t_gin3_1 (cost=0.00..11.80 rows=1133 width=0) (actual time=0.063..0.063 rows=1143 loops=1) Index Cond: (t_gin3.c9 = 10) Buffers: shared hit=3 (main=3 vm=0 fsm=0) Query Identifier: -7228854819162599043 Planning: Buffers: shared hit=111 (main=111 vm=0 fsm=0) dirtied=4 (main=0 vm=0 fsm=0) Planning Time: 0.220 ms Execution Time: 5.459 ms
文档内容是否对您有帮助?