全局索引

全局索引(Global Index)是分区表上的一种索引技术,可以创建在分区表的非分区键上,也支持提供唯一约束。

背景信息

分区表的Local Index,指每个分区都以一个Local Index分区,Local Index的分区规则和表的分区规则保持一致,因此Local Index只有包含分区键时才能作为唯一索引。

Global Index与Local Index不同,全局索引是分区表父表上的单个索引,该索引映射到许多基础表分区。 父表本身没有单个统一的基础存储,因此,它必须从物理分布式表中检索满足索引约束的数据。全局索引将数据存储在一个位置,因此可以一次访问跨多个分区的数据,而不是分别查询每个分区。

Global Index是建立在分区表父表上的一个btree index,分区上则没有。它支持创建在分区表的非分区键上,也支持提供唯一约束。

全局分区索引(Global Partitioned Index

Global Partitioned Index(GPI)旨在在分区表上创建一个全局的分区索引。GPI常用于数据量较大的分区表,其特点是数据量大,构建索引更复杂。相比于Global Index,GPI有自己的分区策略,和分区表的分区并不一致。传统的本地分区索引与子表一一对应,相当于子表的本地索引,而GPI既有本地索引的分区特性,又有全局索引的全局特性。

创建Global Index

语法

CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON [ ONLY ] table_name [ USING method ]
    ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass [ ( opclass_parameter = value [, ... ] ) ] ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
    [ INCLUDE ( column_name [, ...] ) ]
    [ WITH ( storage_parameter [= value] [, ... ] ) ]
[ GLOBAL/LOCAL/global_partitioned_index ]
    [ TABLESPACE tablespace_name ]
    [ WHERE predicate ]
global_partitioned_index:
    GLOBAL PARTITION BY {RANGE (column_list) (index_partitioning_clause) |
                            HASH (column_list) (hash_partitions_by_quantity)}

在分区表上创建一个Global Index。

CREATE unique index m_city_id_idx_global ON measurement(city_id) global;

在分区表上创建一个Global Partitioned Index。

CREATE UNIQUE INDEX m_peaktemp_idx_global ON measurement(peaktemp) global
PARTITION BY range (peaktemp)
(
PARTITION m_peaktemp_idx_global_p1 values less than(1000),
PARTITION m_peaktemp_idx_global_p2 values less than(5000),
PARTITION m_peaktemp_idx_global_p3 values less than(MAXVALUE)
);

说明

  • GLOBAL/LOCAL参数指定为GLOBAL即创建Global Index。

  • 如果不指定创建GLOBAL/LOCAL参数,则默认创建Local Index。

  • Global Index的CREATE语法支持使用CONCURRENTLY模式创建。

  • 非分区表,包括分区表的子表上不支持创建Global Index。

  • Global Index不支持表达式索引。

  • Global Partitioned Index支持将Global Index进行range或hash分区,分区的语法和partition table类似。

  • Global Partitioned Index不支持按照List分区。

  • Global Partitioned Index只支持在分区表主表上创建,支持创建在分区键的列上。

  • Global Partitioned Index支持全局唯一约束。

  • Global Partitioned Index分区列必须指定索引列的左前缀。

  • Global Partitioned range Index支持有序扫描。

  • Global Partitioned hash Index不支持有序扫描。

  • Global Partitioned Index支持索引剪枝优化,即优化器剪枝和执行器剪枝。

Global Index Scan

Global index和普通的btree索引一样,当使用索引键进行查询时,index scan将会提升查询的性能。

Global Index支持以下几种index scan:

  • Global Index Scan

  • Global Partitioned Index Scan

  • Global Index Only Scan

  • Global Partitioned Index Only Scan

  • Global Index Bitmap Scan

  • Global Partitioned Index Bitmap Scan

  • Global Index Parallel Scan

  • Global Only Index Parallel Scan

示例

  • Global Index Scan

    explain select * from measurement where city_id = 5;
                                               QUERY PLAN                                           
    ------------------------------------------------------------------------------------------------
     Global Index Scan using m_city_id_idx_global on measurement  (cost=0.12..8.14 rows=1 width=20)
       Index Cond: (city_id = 5)
    (2 rows)
  • Global Partitioned Index Scan

    explain select * from measurement where peaktemp = 5;
                                                        QUERY PLAN                                                    
    ------------------------------------------------------------------------------------------------------------------
     Global Partitioned Index Scan(1/3) using m_peaktemp_idx_global on measurement  (cost=0.50..8.52 rows=1 width=20)
       Index Cond: (peaktemp = 5)
    (2 rows)
  • Global Index Only Scan

    explain select city_id from measurement where city_id = 5;
                                                 QUERY PLAN                                             
    ----------------------------------------------------------------------------------------------------
     Global Index Only Scan using m_city_id_idx_global on measurement  (cost=0.12..8.14 rows=1 width=4)
       Index Cond: (city_id = 5)
    (2 rows)
  • Global Partitioned Index Only Scan

    explain select peaktemp from measurement where peaktemp = 5;
                                                          QUERY PLAN                                                      
    ----------------------------------------------------------------------------------------------------------------------
     Global Partitioned Index Only Scan(1/3) using m_peaktemp_idx_global on measurement  (cost=0.12..8.14 rows=1 width=4)
       Index Cond: (peaktemp = 5)
    (2 rows)
  • Global Index Bitmap Scan

    explain select * from measurement where city_id > 5 and city_id < 10000;
                                               QUERY PLAN                                           
    ------------------------------------------------------------------------------------------------
     Append  (cost=1.03..14.47 rows=32 width=20)
       ->  Bitmap Heap Scan on measurement_y2023q1  (cost=1.03..3.58 rows=8 width=20)
             Recheck Cond: ((city_id > 5) AND (city_id < 10000))
             Filter: ((city_id > 5) AND (city_id < 10000))
             ->  Bitmap Global Index Scan on m_city_id_idx_global  (cost=0.00..4.13 rows=1 width=0)
                   Index Cond: ((city_id > 5) AND (city_id < 10000))
       ->  Bitmap Heap Scan on measurement_y2023q2  (cost=1.03..3.58 rows=8 width=20)
             Recheck Cond: ((city_id > 5) AND (city_id < 10000))
             Filter: ((city_id > 5) AND (city_id < 10000))
             ->  Bitmap Global Index Scan on m_city_id_idx_global  (cost=0.00..4.13 rows=1 width=0)
                   Index Cond: ((city_id > 5) AND (city_id < 10000))
       ->  Bitmap Heap Scan on measurement_y2023q3  (cost=1.03..3.58 rows=8 width=20)
             Recheck Cond: ((city_id > 5) AND (city_id < 10000))
             Filter: ((city_id > 5) AND (city_id < 10000))
             ->  Bitmap Global Index Scan on m_city_id_idx_global  (cost=0.00..4.13 rows=1 width=0)
                   Index Cond: ((city_id > 5) AND (city_id < 10000))
       ->  Bitmap Heap Scan on measurement_y2023q4  (cost=1.03..3.58 rows=8 width=20)
             Recheck Cond: ((city_id > 5) AND (city_id < 10000))
             Filter: ((city_id > 5) AND (city_id < 10000))
             ->  Bitmap Global Index Scan on m_city_id_idx_global  (cost=0.00..4.13 rows=1 width=0)
                   Index Cond: ((city_id > 5) AND (city_id < 10000))
    (21 rows)
  • Global Partitioned Index Bitmap Scan

    explain select peaktemp from measurement where peaktemp > 5 and peaktemp< 1000;
                                                        QUERY PLAN                                                    
    ------------------------------------------------------------------------------------------------------------------
     Append  (cost=1.03..14.47 rows=32 width=4)
       ->  Bitmap Heap Scan on measurement_y2023q1  (cost=1.03..3.58 rows=8 width=4)
             Recheck Cond: ((peaktemp > 5) AND (peaktemp < 1000))
             Filter: ((peaktemp > 5) AND (peaktemp < 1000))
             ->  Bitmap Global Partitioned Index Scan(1/3) on m_peaktemp_idx_global  (cost=0.00..4.13 rows=1 width=0)
                   Index Cond: ((peaktemp > 5) AND (peaktemp < 1000))
       ->  Bitmap Heap Scan on measurement_y2023q2  (cost=1.03..3.58 rows=8 width=4)
             Recheck Cond: ((peaktemp > 5) AND (peaktemp < 1000))
             Filter: ((peaktemp > 5) AND (peaktemp < 1000))
             ->  Bitmap Global Partitioned Index Scan(1/3) on m_peaktemp_idx_global  (cost=0.00..4.13 rows=1 width=0)
                   Index Cond: ((peaktemp > 5) AND (peaktemp < 1000))
       ->  Bitmap Heap Scan on measurement_y2023q3  (cost=1.03..3.58 rows=8 width=4)
             Recheck Cond: ((peaktemp > 5) AND (peaktemp < 1000))
             Filter: ((peaktemp > 5) AND (peaktemp < 1000))
             ->  Bitmap Global Partitioned Index Scan(1/3) on m_peaktemp_idx_global  (cost=0.00..4.13 rows=1 width=0)
                   Index Cond: ((peaktemp > 5) AND (peaktemp < 1000))
       ->  Bitmap Heap Scan on measurement_y2023q4  (cost=1.03..3.58 rows=8 width=4)
             Recheck Cond: ((peaktemp > 5) AND (peaktemp < 1000))
             Filter: ((peaktemp > 5) AND (peaktemp < 1000))
             ->  Bitmap Global Partitioned Index Scan(1/3) on m_peaktemp_idx_global  (cost=0.00..4.13 rows=1 width=0)
                   Index Cond: ((peaktemp > 5) AND (peaktemp < 1000))
    (21 rows)

Global Index作为主键/外键引用

PolarDB PostgreSQL版(兼容Oracle)支持分区表中的任意一列作为主键或者外键引用,该优势依赖于Global Index,无论主键还是外键引用都需要有唯一索引,而Local Index作为唯一索引的前提是它的索引键必须包括分区键,如果没有Global Index,主键必须包含分区键,外键引用也必须包括分区键。但Global Index没有该限制,它支持在任意一列上创建主键或者作为外键引用。

您可以在创建分区表时指定主键,PolarDB PostgreSQL版(兼容Oracle)将根据您指定的主键,智能的选择使用Local Index还是Global Index。简单来说,当您指定主键包含分区键时,PolarDB PostgreSQL版(兼容Oracle)会使用Local Index为您创建唯一索引,否则将使用Global Index创建唯一索引。

示例

  • 分区表创建主键。

    -- local index
    CREATE TABLE pk_rel(a int primary key, b int, c int, d int) PARTITION BY RANGE (a);
    
    postgres=# \d pk_rel
                   Table "public.pk_rel"
     Column |  Type   | Collation | Nullable | Default 
    --------+---------+-----------+----------+---------
     a      | integer |           | not null | 
     b      | integer |           |          | 
     c      | integer |           |          | 
     d      | integer |           |          | 
    Partition key: RANGE (a)
    Indexes:
        "pk_rel_pkey" PRIMARY KEY, btree (a)
    Number of partitions: 0
    
    --- global index 
    CREATE TABLE pk_rel(a int, b int primary key, c int, d int) PARTITION BY RANGE (a);
    
    postgres=# \d pk_rel
                   Table "public.pk_rel"
     Column |  Type   | Collation | Nullable | Default 
    --------+---------+-----------+----------+---------
     a      | integer |           |          | 
     b      | integer |           | not null | 
     c      | integer |           |          | 
     d      | integer |           |          | 
    Partition key: RANGE (a)
    Indexes:
        "pk_rel_pkey" PRIMARY KEY, btree (b) GLOBAL
    Number of partitions: 0
    
    说明
    • 您也可以使用语句alter table xxx add primary key using index xxx显式的指定某种索引作为主键。

    • PolarDB PostgreSQL版(兼容Oracle)可以使用分区表的任意列作为外键引用,像一个普通表一样,前提是需要为这些列创建唯一索引或者主键。同样的,如果这些列是分区键,则使用Local Index,否则使用Global Index。

  • 分区表引用外键。

    CREATE TABLE fk_rel(a int, b int REFERENCES pk_rel(b), c int, d int) PARTITION BY RANGE (a);
    
    postgres=# \d fk_rel
                   Table "public.fk_rel"
     Column |  Type   | Collation | Nullable | Default 
    --------+---------+-----------+----------+---------
     a      | integer |           |          | 
     b      | integer |           |          | 
     c      | integer |           |          | 
     d      | integer |           |          | 
    Partition key: RANGE (a)
    Foreign-key constraints:
        "fk_rel_b_fkey" FOREIGN KEY (b) REFERENCES pk_rel(b)
    Number of partitions: 0

性能测试

使用pgbench工具产生scale为80000的数据,分别创建分区表与非分区表。

说明

TPS结果不能作为绝对标准,根据不同物理环境的不同而有差异,这里的数据用于对比Global Index和Local Index的性能。

非分区键上的点查性能

类别

TPS

Prepared Statement

不使用

使用

并发数

1

32

64

1

32

64

普通表

27,732

494,433

430,848

53,935

985,880

886,882

分区表+局部索引

367

4,155

3,688

856

8,742

6,790

分区表+全局索引

19,006

308,128

262,941

45,090

820,924

731,557

非分区键上的TPC-B性能

说明

包含了点查和DML。

类别

TPS

Prepared Statement

不使用

使用

并发数

1

32

64

1

32

64

普通表

1,115

51,025

60,409

4,822

90,312

100,802

分区表+局部索引

271

2,903

2,524

550

5,276

4,237

分区表+全局索引

3,453

36,320

39,941

4,334

69,040

75,232

结论

全局索引对分区表的点查和DML能够带来一个数量级的性能提升。