Bitmap indexing

This topic describes how to create and manage a bitmap index, along with usage cases.

A bitmap index is a special database index that uses bitmaps, which are an array of bits. A bit is always in one of two values: 0 and 1. Each bit in the bitmap corresponds to a single row in the table. The value of each bit depends upon the value of the corresponding row.

A bitmap index can help improve the query performance on a given column. If a query hits a sort key column, CelerData efficiently returns the query result by using the prefix index. However, the prefix index entry for a data block cannot exceed 36 bytes in length. If you want to improve the query performance on a column, which is not used as a sort key, you can create a bitmap index for the column.

Benefits

You can benefit from bitmap indexes in the following aspects:

  • Reduce response time when the column has low cardinality, such as the columns of the ENUM type. If the number of distinct values in a column is relatively high, we recommend that you use a bloom filter index to improve query speed. For more information, see Bloom filter indexing.
  • Use less storage space compared to other indexing techniques. Bitmap indexes typically take up only a fraction of the size of the indexed data in a table.
  • Combine multiple bitmap indexes together to fire queries on multiple columns. For more information, see Query multiple columns.

Usage notes

  • You can create a bitmap index for a column that can be filtered by using the equal (=) or [NOT] IN operator.
  • You can create bitmap indexes for all columns of a table that uses the Duplicate Key table or Unique Key table. For a table that uses the Aggregate table or Primary Key table, you can only create bitmap indexes for key columns.
  • The columns of the FLOAT, DOUBLE, BOOLEAN, and DECIMAL types do not support creating bitmap indexes.
  • You can check whether a query uses bitmap indexes by viewing the BitmapIndexFilterRows field of the query's profile.

Create a bitmap index

There are two ways to create a bitmap index for a column.

  • Create a bitmap index for a column when you create a table. Example:

    CREATE TABLE d0.table_hash
    (
        k1 TINYINT,
        k2 DECIMAL(10, 2) DEFAULT "10.5",
        v1 CHAR(10) REPLACE,
        v2 INT SUM,
        INDEX index_name (column_name [, ...]) USING BITMAP COMMENT ''
    )
    ENGINE = olap
    AGGREGATE KEY(k1, k2)
    DISTRIBUTED BY HASH(k1) BUCKETS 10
    PROPERTIES ("storage_type" = "column");

    The following table describes the parameters related to the bitmap index.

    ParameterRequiredDescription
    index_nameYesThe name of the bitmap index.
    column_nameYesThe name of the column on which a bitmap index is created. You can create the index for multiple columns at a time by specifying these column names.
    COMMENTNoThe comment of the bitmap index.

    For other parameter descriptions of the CREATE TABLE statement, see CREATE TABLE.

  • Create a bitmap index for a column of a table using the CREATE INDEX statement. For parameter descriptions and examples, see CREATE INDEX.

    CREATE INDEX index_name ON table_name (column_name) [USING BITMAP] [COMMENT ''];

Display bitmap indexes

You can view all bitmap indexes created in a table using the SHOW INDEX statement. For parameter descriptions and examples, see SHOW INDEX.

SHOW { INDEX[ES] | KEY[S] } FROM [db_name.]table_name [FROM db_name];

Note

Creating indexes is an asynchronous process. Therefore, you can only see the indexes that have completed the creation process.

Delete a bitmap index

You can delete a bitmap index from a table using the DROP INDEX statement. For parameter descriptions and examples, see DROP INDEX.

DROP INDEX index_name ON [db_name.]table_name;

Usage cases

For example, the following table employee shows a portion of a company's employee information.

IDGenderPositionIncome_level
01femaleDeveloperlevel_1
02femaleAnalystlevel_2
03femaleSalesmanlevel_1
04maleAccountantlevel_3

Query a single column

For example, if you want to improve the query performance on the Gender column, you can create a bitmap index for the column by using the following statement.

CREATE INDEX index1 ON employee (Gender) USING BITMAP COMMENT 'index1';

After you execute the preceding statement, the bitmap index is generated as shown in the following figure.

figure

  1. Build a dictionary: CelerData builds a dictionary for the Gender column and maps female and male to coded values of the INT type: 0 and 1.
  2. Generate bitmaps: CelerData generates bitmaps for female and male based on the coded values. The bitmap of female is 1110 because female displays in the first three rows. The bitmap of male is 0001 because male only displays in the fourth row.

If you want to find out the male employee in the company, you can send a query as follows.

SELECT xxx FROM employee WHERE Gender = male;

After the query is sent, CelerData searches for the dictionary to get the coded value of male, which is 1and then gets the bitmap of male, which is 0001. This means that only the fourth row matches the query condition. Then CelerData will skip the first three rows and read only the fourth row.

Query multiple columns

For example, if you want to improve the query performance on the Gender and Income_levelcolumn, you can create bitmap indexes for these two columns by using the following statements.

  • Gender

    CREATE INDEX index1 ON employee (Gender) USING BITMAP COMMENT 'index1';
  • Income_level

    CREATE INDEX index2 ON employee (Income_level) USING BITMAP COMMENT 'index2';

After you execute the preceding two statements, the bitmap indexes are generated as shown in the following figure.

figure

CelerData respectively builds a dictionary for the Gender and Income_level columns and then generates bitmaps for the distinct values in these two columns.

  • Gender: The bitmap of female is 1110 and the bitmap of male is 0001.
  • Producer: The bitmap of level_1 is 1010, the bitmap of level_2 is 0100, and the bitmap of level_3 is 0001.

If you want to find out the female employees whose salary is in the level_1, you can send a query as follows.

 SELECT xxx FROM employee 
 WHERE Gender = female AND Income_level = Level_1;

After the query is sent, CelerData search for the dictionaries of Gender and Income_level at the same time to get the following information:

  • The coded value of female is 0 and the bitmap of female is 1110.
  • The coded value of level_1 is 0 and the bitmap of level_1 is 1010.

CelerData performs a bitwise logical operation 1110 & 1010 based on the AND operator to get the result 1010. According to the result, CelerData only reads the first row and the third row.