- Release Notes
- Get Started
- Clusters
- Cloud Settings
- Table Type
- Query Data Lakes
- Integration
- Query Acceleration
- Data Loading
- Concepts
- Batch load data from Amazon S3
- Batch load data from Azure cloud storage
- Load data from a local file system
- Load data from Confluent Cloud
- Load data from Amazon MSK
- Load data from Amazon Kinesis
- Data Unloading
- Data Backup
- Security
- Console Access Control
- Data Access Control
- Application keys
- Service accounts
- Use SSL connection
- Alarm
- Usage and Billing
- Organizations and Accounts
- Reference
- Amazon Web Services (AWS)
- Microsoft Azure
- SQL Reference
- Keywords
- ALL statements
- User Account Management
- Cluster Management
- ADMIN CANCEL REPAIR
- ADMIN CHECK TABLET
- ADMIN REPAIR
- ADMIN SET CONFIG
- ADMIN SET REPLICA STATUS
- ADMIN SHOW CONFIG
- ADMIN SHOW REPLICA DISTRIBUTION
- ADMIN SHOW REPLICA STATUS
- ALTER RESOURCE GROUP
- ALTER SYSTEM
- CANCEL DECOMMISSION
- CREATE FILE
- CREATE RESOURCE GROUP
- DROP FILE
- DROP RESOURCE GROUP
- EXPLAIN
- INSTALL PLUGIN
- SET
- SHOW BACKENDS
- SHOW BROKER
- SHOW COMPUTE NODES
- SHOW FRONTENDS
- SHOW FULL COLUMNS
- SHOW INDEX
- SHOW PLUGINS
- SHOW PROCESSLIST
- SHOW RESOURCE GROUP
- SHOW TABLE STATUS
- SHOW FILE
- SHOW VARIABLES
- UNINSTALL PLUGIN
- DDL
- ALTER DATABASE
- ALTER MATERIALIZED VIEW
- ALTER TABLE
- ALTER VIEW
- ANALYZE TABLE
- BACKUP
- CANCEL ALTER TABLE
- CANCEL BACKUP
- CANCEL RESTORE
- CREATE ANALYZE
- CREATE DATABASE
- CREATE EXTERNAL CATALOG
- CREATE INDEX
- CREATE MATERIALIZED VIEW
- CREATE REPOSITORY
- CREATE TABLE AS SELECT
- CREATE TABLE LIKE
- CREATE TABLE
- CREATE VIEW
- CREATE FUNCTION
- DROP ANALYZE
- DROP STATS
- DROP CATALOG
- DROP DATABASE
- DROP INDEX
- DROP MATERIALIZED VIEW
- DROP REPOSITORY
- DROP TABLE
- DROP VIEW
- DROP FUNCTION
- KILL ANALYZE
- RECOVER
- REFRESH EXTERNAL TABLE
- RESTORE
- SET CATALOG
- SHOW ANALYZE JOB
- SHOW ANALYZE STATUS
- SHOW META
- SHOW FUNCTION
- TRUNCATE TABLE
- USE
- DML
- ALTER LOAD
- ALTER ROUTINE LOAD
- BROKER LOAD
- CANCEL LOAD
- CANCEL EXPORT
- CANCEL REFRESH MATERIALIZED VIEW
- CREATE ROUTINE LOAD
- DELETE
- EXPORT
- GROUP BY
- INSERT
- PAUSE ROUTINE LOAD
- RESUME ROUTINE LOAD
- REFRESH MATERIALIZED VIEW
- SELECT
- SHOW ALTER
- SHOW ALTER MATERIALIZED VIEW
- SHOW BACKUP
- SHOW CATALOGS
- SHOW CREATE CATALOG
- SHOW CREATE MATERIALIZED VIEW
- SHOW CREATE TABLE
- SHOW CREATE VIEW
- SHOW DATA
- SHOW DATABASES
- SHOW DELETE
- SHOW DYNAMIC PARTITION TABLES
- SHOW EXPORT
- SHOW LOAD
- SHOW MATERIALIZED VIEW
- SHOW PARTITIONS
- SHOW REPOSITORIES
- SHOW RESTORE
- SHOW ROUTINE LOAD
- SHOW ROUTINE LOAD TASK
- SHOW SNAPSHOT
- SHOW TABLES
- SHOW TABLET
- SHOW TRANSACTION
- STOP ROUTINE LOAD
- STREAM LOAD
- SUBMIT TASK
- UPDATE
- Auxiliary Commands
- Data Types
- Keywords
- SQL Functions
- Function list
- Java UDFs
- Window functions
- Lambda expression
- Date Functions
- add_months
- adddate
- convert_tz
- current_date
- current_time
- current_timestamp
- date
- date_add
- date_diff
- date_format
- date_slice
- date_sub, subdate
- date_trunc
- datediff
- day
- dayofweek_iso
- dayname
- dayofmonth
- dayofweek
- dayofyear
- days_add
- days_diff
- days_sub
- from_days
- from_unixtime
- hour
- hours_add
- hours_diff
- hours_sub
- jodatime_format
- last_day
- makedate
- microseconds_add
- microseconds_sub
- minute
- minutes_add
- minutes_diff
- minutes_sub
- month
- monthname
- months_add
- months_diff
- months_sub
- next_day
- now
- previous_day
- quarter
- second
- seconds_add
- seconds_diff
- seconds_sub
- str_to_date
- str_to_jodatime
- str2date
- time_slice
- time_to_sec
- timediff
- timestamp
- timestampadd
- timestampdiff
- to_date
- to_days
- to_iso8601
- to_tera_date
- to_tera_timestamp
- unix_timestamp
- utc_timestamp
- week
- week_iso
- weekofyear
- weeks_add
- weeks_diff
- weeks_sub
- year
- years_add
- years_diff
- years_sub
- Aggregate Functions
- any_value
- approx_count_distinct
- array_agg
- avg
- bitmap
- bitmap_agg
- count
- count_if
- corr
- covar_pop
- covar_samp
- group_concat
- grouping
- grouping_id
- hll_empty
- hll_hash
- hll_raw_agg
- hll_union
- hll_union_agg
- max
- max_by
- min
- min_by
- multi_distinct_sum
- multi_distinct_count
- percentile_approx
- percentile_cont
- percentile_disc
- retention
- stddev
- stddev_samp
- sum
- variance, variance_pop, var_pop
- var_samp
- window_funnel
- Geographic Functions
- String Functions
- append_trailing_char_if_absent
- ascii
- char
- char_length
- character_length
- concat
- concat_ws
- ends_with
- find_in_set
- group_concat
- hex
- hex_decode_binary
- hex_decode_string
- instr
- lcase
- left
- length
- locate
- lower
- lpad
- ltrim
- money_format
- null_or_empty
- parse_url
- repeat
- replace
- reverse
- right
- rpad
- rtrim
- space
- split
- split_part
- substring_index
- starts_with
- strleft
- strright
- str_to_map
- substring
- trim
- ucase
- unhex
- upper
- url_decode
- url_encode
- Pattern Matching Functions
- JSON Functions
- Overview of JSON functions and operators
- JSON operators
- JSON constructor functions
- JSON query and processing functions
- Bit Functions
- Bitmap Functions
- Array Functions
- all_match
- any_match
- array_agg
- array_append
- array_avg
- array_concat
- array_contains
- array_contains_all
- array_cum_sum
- array_difference
- array_distinct
- array_filter
- array_generate
- array_intersect
- array_join
- array_length
- array_map
- array_max
- array_min
- array_position
- array_remove
- array_slice
- array_sort
- array_sortby
- array_sum
- arrays_overlap
- array_to_bitmap
- cardinality
- element_at
- reverse
- unnest
- Map Functions
- Binary Functions
- cast function
- hash function
- Cryptographic Functions
- Math Functions
- Pattern Matching Functions
- Percentile Functions
- Scalar Functions
- Struct Functions
- Table Functions
- Utility Functions
- AUTO_INCREMENT
- Generated columns
- System variables
- System limits
- Information Schema
- Overview
- be_bvars
- be_cloud_native_compactions
- be_compactions
- character_sets
- collations
- column_privileges
- columns
- engines
- events
- global_variables
- key_column_usage
- load_tracking_logs
- loads
- materialized_views
- partitions
- pipe_files
- pipes
- referential_constraints
- routines
- schema_privileges
- schemata
- session_variables
- statistics
- table_constraints
- table_privileges
- tables
- tables_config
- task_runs
- tasks
- triggers
- user_privileges
- views
- System Metadatabase
- API
- Overview
- Actions
- Clusters
- Create and Manage Clusters
- Query Clusters
- Identity and Access Management
- Organization and Account
- Usage and Billing
- Clusters
- Terraform Provider
- Run scripts
Bitmap indexing
This topic describes how to create and manage a bitmap index, along with usage cases.
A bitmap index is a special database index that uses bitmaps, which are an array of bits. A bit is always in one of two values: 0 and 1. Each bit in the bitmap corresponds to a single row in the table. The value of each bit depends upon the value of the corresponding row.
A bitmap index can help improve the query performance on a given column. If a query hits a sort key column, CelerData efficiently returns the query result by using the prefix index. However, the prefix index entry for a data block cannot exceed 36 bytes in length. If you want to improve the query performance on a column, which is not used as a sort key, you can create a bitmap index for the column.
Benefits
You can benefit from bitmap indexes in the following aspects:
- Reduce response time when the column has low cardinality, such as the columns of the ENUM type. If the number of distinct values in a column is relatively high, we recommend that you use a bloom filter index to improve query speed. For more information, see Bloom filter indexing.
- Use less storage space compared to other indexing techniques. Bitmap indexes typically take up only a fraction of the size of the indexed data in a table.
- Combine multiple bitmap indexes together to fire queries on multiple columns. For more information, see Query multiple columns.
Usage notes
- You can create a bitmap index for a column that can be filtered by using the equal (
=
) or [NOT] IN operator. - You can create bitmap indexes for all columns of a table that uses the Duplicate Key table or Unique Key table. For a table that uses the Aggregate table or Primary Key table, you can only create bitmap indexes for key columns.
- The columns of the FLOAT, DOUBLE, BOOLEAN, and DECIMAL types do not support creating bitmap indexes.
- You can check whether a query uses bitmap indexes by viewing the
BitmapIndexFilterRows
field of the query's profile.
Create a bitmap index
There are two ways to create a bitmap index for a column.
Create a bitmap index for a column when you create a table. Example:
CREATE TABLE d0.table_hash ( k1 TINYINT, k2 DECIMAL(10, 2) DEFAULT "10.5", v1 CHAR(10) REPLACE, v2 INT SUM, INDEX index_name (column_name [, ...]) USING BITMAP COMMENT '' ) ENGINE = olap AGGREGATE KEY(k1, k2) DISTRIBUTED BY HASH(k1) BUCKETS 10 PROPERTIES ("storage_type" = "column");
The following table describes the parameters related to the bitmap index.
Parameter Required Description index_name Yes The name of the bitmap index. column_name Yes The name of the column on which a bitmap index is created. You can create the index for multiple columns at a time by specifying these column names. COMMENT No The comment of the bitmap index. For other parameter descriptions of the CREATE TABLE statement, see CREATE TABLE.
Create a bitmap index for a column of a table using the CREATE INDEX statement. For parameter descriptions and examples, see CREATE INDEX.
CREATE INDEX index_name ON table_name (column_name) [USING BITMAP] [COMMENT ''];
Display bitmap indexes
You can view all bitmap indexes created in a table using the SHOW INDEX statement. For parameter descriptions and examples, see SHOW INDEX.
SHOW { INDEX[ES] | KEY[S] } FROM [db_name.]table_name [FROM db_name];
Note
Creating indexes is an asynchronous process. Therefore, you can only see the indexes that have completed the creation process.
Delete a bitmap index
You can delete a bitmap index from a table using the DROP INDEX statement. For parameter descriptions and examples, see DROP INDEX.
DROP INDEX index_name ON [db_name.]table_name;
Usage cases
For example, the following table employee
shows a portion of a company's employee information.
ID | Gender | Position | Income_level |
---|---|---|---|
01 | female | Developer | level_1 |
02 | female | Analyst | level_2 |
03 | female | Salesman | level_1 |
04 | male | Accountant | level_3 |
Query a single column
For example, if you want to improve the query performance on the Gender
column, you can create a bitmap index for the column by using the following statement.
CREATE INDEX index1 ON employee (Gender) USING BITMAP COMMENT 'index1';
After you execute the preceding statement, the bitmap index is generated as shown in the following figure.
- Build a dictionary: CelerData builds a dictionary for the
Gender
column and mapsfemale
andmale
to coded values of the INT type:0
and1
. - Generate bitmaps: CelerData generates bitmaps for
female
andmale
based on the coded values. The bitmap offemale
is1110
becausefemale
displays in the first three rows. The bitmap ofmale
is0001
becausemale
only displays in the fourth row.
If you want to find out the male employee in the company, you can send a query as follows.
SELECT xxx FROM employee WHERE Gender = male;
After the query is sent, CelerData searches for the dictionary to get the coded value of male
, which is 1
and then gets the bitmap of male
, which is 0001
. This means that only the fourth row matches the query condition. Then CelerData will skip the first three rows and read only the fourth row.
Query multiple columns
For example, if you want to improve the query performance on the Gender
and Income_level
column, you can create bitmap indexes for these two columns by using the following statements.
Gender
CREATE INDEX index1 ON employee (Gender) USING BITMAP COMMENT 'index1';
Income_level
CREATE INDEX index2 ON employee (Income_level) USING BITMAP COMMENT 'index2';
After you execute the preceding two statements, the bitmap indexes are generated as shown in the following figure.
CelerData respectively builds a dictionary for the Gender
and Income_level
columns and then generates bitmaps for the distinct values in these two columns.
Gender
: The bitmap offemale
is1110
and the bitmap ofmale
is0001
.Producer
: The bitmap oflevel_1
is1010
, the bitmap oflevel_2
is0100
, and the bitmap oflevel_3
is0001
.
If you want to find out the female employees whose salary is in the level_1
, you can send a query as follows.
SELECT xxx FROM employee
WHERE Gender = female AND Income_level = Level_1;
After the query is sent, CelerData search for the dictionaries of Gender
and Income_level
at the same time to get the following information:
- The coded value of
female
is0
and the bitmap offemale
is1110
. - The coded value of
level_1
is0
and the bitmap oflevel_1
is1010
.
CelerData performs a bitwise logical operation 1110 & 1010
based on the AND
operator to get the result 1010
. According to the result, CelerData only reads the first row and the third row.