Bitmap indexing
This topic describes how to create and manage a bitmap index, along with usage cases.
A bitmap index is a special database index that uses bitmaps, which are an array of bits. A bit is always in one of two values: 0 and 1. Each bit in the bitmap corresponds to a single row in the table. The value of each bit depends upon the value of the corresponding row.
A bitmap index can help improve the query performance on a given column. If a query hits a sort key column, CelerData efficiently returns the query result by using the prefix index. However, the prefix index entry for a data block cannot exceed 36 bytes in length. If you want to improve the query performance on a column, which is not used as a sort key, you can create a bitmap index for the column.
Benefits
You can benefit from bitmap indexes in the following aspects:
- Reduce response time when the column has low cardinality, such as the columns of the ENUM type. If the number of distinct values in a column is relatively high, we recommend that you use a bloom filter index to improve query speed. For more information, see Bloom filter indexing.
- Use less storage space compared to other indexing techniques. Bitmap indexes typically take up only a fraction of the size of the indexed data in a table.
- Combine multiple bitmap indexes together to fire queries on multiple columns. For more information, see Query multiple columns.
Usage notes
- You can create a bitmap index for a column that can be filtered by using the equal (
=
) or [NOT] IN operator. - You can create bitmap indexes for all columns of a table that uses the Duplicate Key table or Unique Key table. For a table that uses the Aggregate table or Primary Key table, you can only create bitmap indexes for key columns.
- The columns of the FLOAT, DOUBLE, BOOLEAN, and DECIMAL types do not support creating bitmap indexes.
- You can check whether a query uses bitmap indexes by viewing the
BitmapIndexFilterRows
field of the query's profile.
Create a bitmap index
There are two ways to create a bitmap index for a column.
-
Create a bitmap index for a column when you create a table. Example:
CREATE TABLE d0.table_hash
(
k1 TINYINT,
k2 DECIMAL(10, 2) DEFAULT "10.5",
v1 CHAR(10) REPLACE,
v2 INT SUM,
INDEX index_name (column_name [, ...]) USING BITMAP COMMENT ''
)
ENGINE = olap
AGGREGATE KEY(k1, k2)
DISTRIBUTED BY HASH(k1) BUCKETS 10
PROPERTIES ("storage_type" = "column");The following table describes the parameters related to the bitmap index.
Parameter Required Description index_name Yes The name of the bitmap index. column_name Yes The name of the column on which a bitmap index is created. You can create the index for multiple columns at a time by specifying these column names. COMMENT No The comment of the bitmap index. For other parameter descriptions of the CREATE TABLE statement, see CREATE TABLE.
-
Create a bitmap index for a column of a table using the CREATE INDEX statement. For parameter descriptions and examples, see CREATE INDEX.
CREATE INDEX index_name ON table_name (column_name) [USING BITMAP] [COMMENT ''];
Display bitmap indexes
You can view all bitmap indexes created in a table using the SHOW INDEX statement. For parameter descriptions and examples, see SHOW INDEX.
SHOW { INDEX[ES] | KEY[S] } FROM [db_name.]table_name [FROM db_name];
Note
Creating indexes is an asynchronous process. Therefore, you can only see the indexes that have completed the creation process.
Delete a bitmap index
You can delete a bitmap index from a table using the DROP INDEX statement. For parameter descriptions and examples, see DROP INDEX.
DROP INDEX index_name ON [db_name.]table_name;
Usage cases
For example, the following table employee
shows a portion of a company's employee information.
ID | Gender | Position | Income_level |
---|---|---|---|
01 | female | Developer | level_1 |
02 | female | Analyst | level_2 |
03 | female | Salesman | level_1 |
04 | male | Accountant | level_3 |
Query a single column
For example, if you want to improve the query performance on the Gender
column, you can create a bitmap index for the column by using the following statement.
CREATE INDEX index1 ON employee (Gender) USING BITMAP COMMENT 'index1';
After you execute the preceding statement, the bitmap index is generated as shown in the following figure.
- Build a dictionary: CelerData builds a dictionary for the
Gender
column and mapsfemale
andmale
to coded values of the INT type:0
and1
. - Generate bitmaps: CelerData generates bitmaps for
female
andmale
based on the coded values. The bitmap offemale
is1110
becausefemale
displays in the first three rows. The bitmap ofmale
is0001
becausemale
only displays in the fourth row.
If you want to find out the male employee in the company, you can send a query as follows.
SELECT xxx FROM employee WHERE Gender = male;
After the query is sent, CelerData searches for the dictionary to get the coded value of male
, which is 1
and then gets the bitmap of male
, which is 0001
. This means that only the fourth row matches the query condition. Then CelerData will skip the first three rows and read only the fourth row.
Query multiple columns
For example, if you want to improve the query performance on the Gender
and Income_level
column, you can create bitmap indexes for these two columns by using the following statements.
-
Gender
CREATE INDEX index1 ON employee (Gender) USING BITMAP COMMENT 'index1';
-
Income_level
CREATE INDEX index2 ON employee (Income_level) USING BITMAP COMMENT 'index2';
After you execute the preceding two statements, the bitmap indexes are generated as shown in the following figure.
CelerData respectively builds a dictionary for the Gender
and Income_level
columns and then generates bitmaps for the distinct values in these two columns.
Gender
: The bitmap offemale
is1110
and the bitmap ofmale
is0001
.Producer
: The bitmap oflevel_1
is1010
, the bitmap oflevel_2
is0100
, and the bitmap oflevel_3
is0001
.
If you want to find out the female employees whose salary is in the level_1
, you can send a query as follows.
SELECT xxx FROM employee
WHERE Gender = female AND Income_level = Level_1;
After the query is sent, CelerData search for the dictionaries of Gender
and Income_level
at the same time to get the following information:
- The coded value of
female
is0
and the bitmap offemale
is1110
. - The coded value of
level_1
is0
and the bitmap oflevel_1
is1010
.
CelerData performs a bitwise logical operation 1110 & 1010
based on the AND
operator to get the result 1010
. According to the result, CelerData only reads the first row and the third row.