Bitmap index

更新时间:
复制 MD 格式

This topic describes how to use bitmap indexes in Hologres.

Overview

In Hologres, the bitmap_columns property specifies a bitmap index. A bitmap index is an independent index structure, separate from data storage, designed to accelerate equality queries. It uses a bitmap vector to rapidly filter data within file blocks, making it suitable for equality filtering scenarios. Use the following syntax:

-- Syntax supported by Hologres V2.1 and later
CREATE TABLE <table_name> (...) WITH (bitmap_columns = '[<columnName>{:[on|off]}[,...]]');
-- Syntax supported by all versions
CREATE TABLE <table_name> (...);
CALL set_table_property('<table_name>', 'bitmap_columns', '[<columnName>{:[on|off]}[,...]]');

Parameter

Description

table_name

The name of the table.

columnName

The name of the column.

on

Enables the bitmap index for the specified column.

off

Disables the bitmap index for the specified column.

Recommendations

  • Create bitmap indexes on columns that are frequently used in equality queries. This allows the system to quickly locate the row numbers of matching data. However, using bitmap indexes on high-cardinality columns (those with few duplicate values) can cause significant storage overhead.

  • Avoid creating bitmap indexes on every column. This practice not only incurs extra storage overhead but also impairs write throughput, as Hologres must construct an index for each column during data ingestion.

  • Do not create bitmap indexes on text columns that store JSON-formatted data.

Limitations

  • Bitmap indexes are supported on column-oriented and row-column hybrid tables, but not on row-oriented tables.

  • Columns with a bitmap index can contain null values.

  • By default, bitmap indexes are implicitly created on all columns of the text data type.

  • You can run commands to set bitmap indexes outside of a transaction. The changes do not take effect immediately, as Hologres performs the creation and deletion of bitmap encodings asynchronously in the background. For more information, see ALTER TABLE.

  • The bitmap_columns property supports only on or off. As of Hologres V2.0, setting the bitmap_columns property to auto is no longer supported.

How it works

Unlike a distribution key or a clustering key, a bitmap index is an independent index structure separate from data storage. When a query uses this index, it can quickly locate the row number to filter data. However, bitmap indexes have overhead. Consider the following scenarios:

  • Columns with high cardinality: Hologres generates a bitmap for each distinct value. If there are many non-repeating values, this process forms a sparse array that consumes a large amount of storage.

  • All columns of a wide table have bitmap indexes: If you create a bitmap index for each column in a wide table, Hologres must construct a bitmap for each value during data ingestion, which creates system overhead and impairs write throughput.

In summary, a bitmap index is a space-for-time trade-off. It is most cost-effective for columns with a relatively even data distribution.位图索引

As shown in the following example, you can use explain SQL to check whether a query uses a bitmap index. If the execution plan contains Bitmap Filter, the query used the index.

  • Syntax supported by Hologres V2.1 and later:

    CREATE TABLE bitmap_test (
        uid int NOT NULL,
        name text NOT NULL,
        gender text NOT NULL,
        class text NOT NULL,
        PRIMARY KEY (uid)
    )
    WITH (
        bitmap_columns = 'gender,class'
    );
    INSERT INTO bitmap_test VALUES 
    (1,'Bob','Male','Class 1'),
    (2,'Eric','Male','Class 3'),
    (3,'Ada','Female','Class 2'),
    (4,'Joyce','Female','Class 2'),
    (5,'Leo','Male','Class 2'),
    (6,'Steve','Male','Class 3'),
    (7,'Dora','Female','Class 1');
    explain SELECT * FROM bitmap_test where gender='Male' AND  class='Class 1';
  • Syntax supported by all versions:

    begin;
    create table bitmap_test (
      uid int not null,
      name text not null,
      gender text not null,
      class text not null,
      PRIMARY KEY (uid)
    );
    call set_table_property('bitmap_test', 'bitmap_columns', 'gender,class');
    commit;
    INSERT INTO bitmap_test VALUES
    (1,'Bob','Male','Class 1'),
    (2,'Eric','Male','Class 3'),
    (3,'Ada','Female','Class 2'),
    (4,'Joyce','Female','Class 2'),
    (5,'Leo','Male','Class 2'),
    (6,'Steve','Male','Class 3'),
    (7,'Dora','Female','Class 1');
    explain SELECT * FROM bitmap_test where gender='Male' AND  class='Class 1';

The presence of the Bitmap Filter operator in the execution plan, such as Bitmap Filter: (gender = 'Male'::text) AND (class = 'Class 1'::text), confirms that the query used the bitmap index.

Bitmap index vs. clustering key

  • Both bitmap indexes and clustering keys filter data within files.

  • Bitmap indexes are ideal for equality queries because they locate data by row number. In contrast, clustering keys sort data within files, making them better for range queries.

    A clustering key has a higher priority than a bitmap index. If both are created on the same column, the optimizer prefers to use the clustering key. The following example shows the details:

    • Syntax supported by Hologres V2.1 and later:

      -- The uid, class, and date columns are set as the clustering key.
      -- By default, text columns such as name, class, and date have bitmap indexes.
      CREATE TABLE ck_bit_test (
          uid int NOT NULL,
          name text NOT NULL,
          class text NOT NULL,
          date text NOT NULL,
          PRIMARY KEY (uid)
      )
      WITH (
          clustering_key = 'uid,class,date',
          bitmap_columns = 'name,class,date'
      );
      INSERT INTO ck_bit_test VALUES 
      (1,'Bob','1','2022-10-19'),
      (2,'Eric','3','2022-10-19'),
      (3,'Ada','2','2022-10-20'),
      (4,'Joyce','2','2022-10-20'),
      (5,'Leo','2','2022-10-18'),
      (6,'Steve','3','2022-10-17'),
      (7,'Dora','3','2022-10-20');
    • Syntax supported by all versions:

      -- The uid, class, and date columns are set as the clustering key.
      -- By default, text columns such as name, class, and date have bitmap indexes.
      begin;
      create table ck_bit_test (
        uid int not null,
        name text not null,
        class text not null,
        date text not null,
        PRIMARY KEY (uid)
      );
      call set_table_property('ck_bit_test', 'clustering_key', 'uid,class,date');
      call set_table_property('ck_bit_test', 'bitmap_columns', 'name,class,date');
      commit;
      INSERT INTO ck_bit_test VALUES
      (1,'Bob','1','2022-10-19'),
      (2,'Eric','3','2022-10-19'),
      (3,'Ada','2','2022-10-20'),
      (4,'Joyce','2','2022-10-20'),
      (5,'Leo','2','2022-10-18'),
      (6,'Steve','3','2022-10-17'),
      (7,'Dora','3','2022-10-20');
    • This equality query targets the uid,class,date columns. Because it follows the leftmost matching principle, the optimizer uses the clustering key instead of the bitmap index.

      SELECT * FROM ck_bit_test WHERE uid = '3' AND class ='2' AND date > '2022-10-17';

      The execution plan contains the Cluster Filter operator but not the Bitmap Filter operator, indicating that the query used the clustering key. The plan shows an Index Scan using holo_index:[1] on ck_bit_test and filters using Cluster Filter: ((uid = 3) AND (class = '2'::text) AND (date > '2022-10-17'::text)), which confirms the use of the clustering key.

    • The query is performed on the uid,class,date columns. However, the class column is used in a range query. According to the leftmost matching principle, the matching stops if > or < is used in an SQL query. Therefore, the predicate on the date column cannot use the clustering key. Because a bitmap index exists on the date column, the query uses it instead.

      SELECT * FROM ck_bit_test WHERE uid = '3' AND class >'2' AND date = '2022-10-17';

      The execution plan contains both the Cluster Filter operator and the Bitmap Filter operator. This indicates that the query used the clustering key for the uid,class columns and the bitmap index for the date column.

      explain SELECT * FROM ck_bit_test WHERE uid = '3' AND class >'2' AND date = '2022-10-17';
      QUERY PLAN
      Exchange (Gather Exchange)  (cost=0.00..1.10 rows=1 width=24)
        ->  Decode  (cost=0.00..1.10 rows=1 width=24)
              ->  Index Scan using holo_index:[1] on ck_bit_test  (cost=0.00..1.00 rows=1 width=24)
                    Cluster Filter: ((uid = 3) AND (class > '2'::text))
                    Bitmap Filter: (date = '2022-10-17'::text)
                    Shard Selector(Eagerly):
                      ->: l0 [3]
      Optimizer: HQO version 1.3.0

Examples

  • Syntax supported by Hologres V2.1 and later:

    CREATE TABLE tbl (
        a text NOT NULL,
        b text NOT NULL
    )
    WITH (
        bitmap_columns = 'a:on,b:off'
    );
    -- Modify bitmap_columns
    ALTER TABLE tbl SET (bitmap_columns = 'a:off');-- The ALTER TABLE statement supports only full modification.
  • Syntax supported by all versions:

    -- Create the tbl table and set a bitmap index.
    begin; 
    create table tbl (
      a text not null,
      b text not null
    );
    call set_table_property('tbl', 'bitmap_columns', 'a:on,b:off');
    commit;
    -- Modify the bitmap index.
    call set_table_property('tbl', 'bitmap_columns', 'a:off');-- Full modification, which disables the bitmap index for column a.
    call update_table_property('tbl', 'bitmap_columns', 'b:off');-- Incrementally modifies the property, disabling the bitmap index for column b while retaining the setting for column a.

Related documents

The following topics provide more information about Data Definition Language (DDL) statements for internal tables in Hologres: