- Release Notes
- Get Started
- Clusters
- Cloud Settings
- Table Type
- Query Data Lakes
- Integration
- Query Acceleration
- Data Loading
- Concepts
- Batch load data from Amazon S3
- Batch load data from Azure cloud storage
- Load data from a local file system
- Load data from Confluent Cloud
- Load data from Amazon MSK
- Load data from Amazon Kinesis
- Data Unloading
- Data Backup
- Security
- Console Access Control
- Data Access Control
- Application keys
- Service accounts
- Use SSL connection
- Alarm
- Usage and Billing
- Organizations and Accounts
- Reference
- Amazon Web Services (AWS)
- Microsoft Azure
- SQL Reference
- Keywords
- ALL statements
- User Account Management
- Cluster Management
- ADMIN CANCEL REPAIR
- ADMIN CHECK TABLET
- ADMIN REPAIR
- ADMIN SET CONFIG
- ADMIN SET REPLICA STATUS
- ADMIN SHOW CONFIG
- ADMIN SHOW REPLICA DISTRIBUTION
- ADMIN SHOW REPLICA STATUS
- ALTER RESOURCE GROUP
- ALTER SYSTEM
- CANCEL DECOMMISSION
- CREATE FILE
- CREATE RESOURCE GROUP
- DROP FILE
- DROP RESOURCE GROUP
- EXPLAIN
- INSTALL PLUGIN
- SET
- SHOW BACKENDS
- SHOW BROKER
- SHOW COMPUTE NODES
- SHOW FRONTENDS
- SHOW FULL COLUMNS
- SHOW INDEX
- SHOW PLUGINS
- SHOW PROCESSLIST
- SHOW RESOURCE GROUP
- SHOW TABLE STATUS
- SHOW FILE
- SHOW VARIABLES
- UNINSTALL PLUGIN
- DDL
- ALTER DATABASE
- ALTER MATERIALIZED VIEW
- ALTER TABLE
- ALTER VIEW
- ANALYZE TABLE
- BACKUP
- CANCEL ALTER TABLE
- CANCEL BACKUP
- CANCEL RESTORE
- CREATE ANALYZE
- CREATE DATABASE
- CREATE EXTERNAL CATALOG
- CREATE INDEX
- CREATE MATERIALIZED VIEW
- CREATE REPOSITORY
- CREATE TABLE AS SELECT
- CREATE TABLE LIKE
- CREATE TABLE
- CREATE VIEW
- CREATE FUNCTION
- DROP ANALYZE
- DROP STATS
- DROP CATALOG
- DROP DATABASE
- DROP INDEX
- DROP MATERIALIZED VIEW
- DROP REPOSITORY
- DROP TABLE
- DROP VIEW
- DROP FUNCTION
- KILL ANALYZE
- RECOVER
- REFRESH EXTERNAL TABLE
- RESTORE
- SET CATALOG
- SHOW ANALYZE JOB
- SHOW ANALYZE STATUS
- SHOW META
- SHOW FUNCTION
- TRUNCATE TABLE
- USE
- DML
- ALTER LOAD
- ALTER ROUTINE LOAD
- BROKER LOAD
- CANCEL LOAD
- CANCEL EXPORT
- CANCEL REFRESH MATERIALIZED VIEW
- CREATE ROUTINE LOAD
- DELETE
- EXPORT
- GROUP BY
- INSERT
- PAUSE ROUTINE LOAD
- RESUME ROUTINE LOAD
- REFRESH MATERIALIZED VIEW
- SELECT
- SHOW ALTER
- SHOW ALTER MATERIALIZED VIEW
- SHOW BACKUP
- SHOW CATALOGS
- SHOW CREATE CATALOG
- SHOW CREATE MATERIALIZED VIEW
- SHOW CREATE TABLE
- SHOW CREATE VIEW
- SHOW DATA
- SHOW DATABASES
- SHOW DELETE
- SHOW DYNAMIC PARTITION TABLES
- SHOW EXPORT
- SHOW LOAD
- SHOW MATERIALIZED VIEW
- SHOW PARTITIONS
- SHOW REPOSITORIES
- SHOW RESTORE
- SHOW ROUTINE LOAD
- SHOW ROUTINE LOAD TASK
- SHOW SNAPSHOT
- SHOW TABLES
- SHOW TABLET
- SHOW TRANSACTION
- STOP ROUTINE LOAD
- STREAM LOAD
- SUBMIT TASK
- UPDATE
- Auxiliary Commands
- Data Types
- Keywords
- SQL Functions
- Function list
- Java UDFs
- Window functions
- Lambda expression
- Date Functions
- add_months
- adddate
- convert_tz
- current_date
- current_time
- current_timestamp
- date
- date_add
- date_diff
- date_format
- date_slice
- date_sub, subdate
- date_trunc
- datediff
- day
- dayofweek_iso
- dayname
- dayofmonth
- dayofweek
- dayofyear
- days_add
- days_diff
- days_sub
- from_days
- from_unixtime
- hour
- hours_add
- hours_diff
- hours_sub
- jodatime_format
- last_day
- makedate
- microseconds_add
- microseconds_sub
- minute
- minutes_add
- minutes_diff
- minutes_sub
- month
- monthname
- months_add
- months_diff
- months_sub
- next_day
- now
- previous_day
- quarter
- second
- seconds_add
- seconds_diff
- seconds_sub
- str_to_date
- str_to_jodatime
- str2date
- time_slice
- time_to_sec
- timediff
- timestamp
- timestampadd
- timestampdiff
- to_date
- to_days
- to_iso8601
- to_tera_date
- to_tera_timestamp
- unix_timestamp
- utc_timestamp
- week
- week_iso
- weekofyear
- weeks_add
- weeks_diff
- weeks_sub
- year
- years_add
- years_diff
- years_sub
- Aggregate Functions
- any_value
- approx_count_distinct
- array_agg
- avg
- bitmap
- bitmap_agg
- count
- count_if
- corr
- covar_pop
- covar_samp
- group_concat
- grouping
- grouping_id
- hll_empty
- hll_hash
- hll_raw_agg
- hll_union
- hll_union_agg
- max
- max_by
- min
- min_by
- multi_distinct_sum
- multi_distinct_count
- percentile_approx
- percentile_cont
- percentile_disc
- retention
- stddev
- stddev_samp
- sum
- variance, variance_pop, var_pop
- var_samp
- window_funnel
- Geographic Functions
- String Functions
- append_trailing_char_if_absent
- ascii
- char
- char_length
- character_length
- concat
- concat_ws
- ends_with
- find_in_set
- group_concat
- hex
- hex_decode_binary
- hex_decode_string
- instr
- lcase
- left
- length
- locate
- lower
- lpad
- ltrim
- money_format
- null_or_empty
- parse_url
- repeat
- replace
- reverse
- right
- rpad
- rtrim
- space
- split
- split_part
- substring_index
- starts_with
- strleft
- strright
- str_to_map
- substring
- trim
- ucase
- unhex
- upper
- url_decode
- url_encode
- Pattern Matching Functions
- JSON Functions
- Overview of JSON functions and operators
- JSON operators
- JSON constructor functions
- JSON query and processing functions
- Bit Functions
- Bitmap Functions
- Array Functions
- all_match
- any_match
- array_agg
- array_append
- array_avg
- array_concat
- array_contains
- array_contains_all
- array_cum_sum
- array_difference
- array_distinct
- array_filter
- array_generate
- array_intersect
- array_join
- array_length
- array_map
- array_max
- array_min
- array_position
- array_remove
- array_slice
- array_sort
- array_sortby
- array_sum
- arrays_overlap
- array_to_bitmap
- cardinality
- element_at
- reverse
- unnest
- Map Functions
- Binary Functions
- cast function
- hash function
- Cryptographic Functions
- Math Functions
- Pattern Matching Functions
- Percentile Functions
- Scalar Functions
- Struct Functions
- Table Functions
- Utility Functions
- AUTO_INCREMENT
- Generated columns
- System variables
- System limits
- Information Schema
- Overview
- be_bvars
- be_cloud_native_compactions
- be_compactions
- character_sets
- collations
- column_privileges
- columns
- engines
- events
- global_variables
- key_column_usage
- load_tracking_logs
- loads
- materialized_views
- partitions
- pipe_files
- pipes
- referential_constraints
- routines
- schema_privileges
- schemata
- session_variables
- statistics
- table_constraints
- table_privileges
- tables
- tables_config
- task_runs
- tasks
- triggers
- user_privileges
- views
- System Metadatabase
- API
- Overview
- Actions
- Clusters
- Create and Manage Clusters
- Query Clusters
- Identity and Access Management
- Organization and Account
- Usage and Billing
- Clusters
- Terraform Provider
- Run scripts
Synchronous materialized view
This topic describes how to create, use, and manage a synchronous materialized view (Rollup).
For a synchronous materialized view, all changes in the base table are simultaneously updated to the corresponding synchronous materialized views. The refresh of a synchronous materialized view is triggered automatically. Synchronous materialized views are significantly inexpensive to maintain and update, making them suitable for transparent acceleration of real-time, single-table aggregate queries.
Synchronous materialized views in CelerData can be created only on a single base table from the default catalog. They are essentially a special index for query acceleration rather than a physical table like asynchronous materialized views.
CelerData provides asynchronous materialized views, which supports creation on multiple tables and more aggregation operators. For the usage of asynchronous materialized views, see Asynchronous materialized view.
The following table compares the asynchronous materialized views (ASYNC MVs), and the synchronous materialized view (SYNC MV) in the perspective of features that they support:
Single-table aggregation | Multi-table join | Query rewrite | Refresh strategy | Base table | |
---|---|---|---|---|---|
ASYNC MV | Yes | Yes | Yes |
| Multiple tables from:
|
SYNC MV (Rollup) | Limited choices of aggregate functions | No | Yes | Synchronous refresh during data loading | Single table in the default catalog |
Basic concepts
Base table
Base tables are the driving tables of a materialized view.
For CelerData' synchronous materialized views, base tables must be a single native table from the default catalog. CelerData supports creating synchronous materialized views on Duplicate Key table, Aggregate tables, and Unique Key tables.
Refresh
A synchronous materialized view updates itself every time the data in the base table changes. You do not need to trigger the refresh manually.
Query rewrite
Query rewrite means that when executing a query on base tables with materialized views built on, the system automatically judges whether the pre-computed results in the materialized view can be reused for the query. If they can be reused, the system will load the data directly from the relevant materialized view to avoid the time- and resource-consuming computations or joins.
Synchronous materialized views support query rewrite based on some of the aggregate operators. For more information, see Correspondence of aggregate operators.
Preparation
Before creating a synchronous materialized view, check if your data warehouse is eligible for query acceleration through synchronous materialized views. For example, check if the queries reuse certain sub-query statements.
The following example is based on the table sales_records
, which contains the transaction ID record_id
, salesperson ID seller_id
, store ID store_id
, date sale_date
, and sales amount sale_amt
for each transaction. Follow these steps to create the table and insert data into it:
CREATE TABLE sales_records(
record_id INT,
seller_id INT,
store_id INT,
sale_date DATE,
sale_amt BIGINT
) DISTRIBUTED BY HASH(record_id);
INSERT INTO sales_records
VALUES
(001,01,1,"2022-03-13",8573),
(002,02,2,"2022-03-14",6948),
(003,01,1,"2022-03-14",4319),
(004,03,3,"2022-03-15",8734),
(005,03,3,"2022-03-16",4212),
(006,02,2,"2022-03-17",9515);
The business scenario of this example demands frequent analyses on the sales amounts of different stores. As a result, the sum()
function is used on each query, consuming a massive amount of compute resources. You can run the query to record its time, and view its query profile by using EXPLAIN command.
MySQL > SELECT store_id, SUM(sale_amt)
FROM sales_records
GROUP BY store_id;
+----------+-----------------+
| store_id | sum(`sale_amt`) |
+----------+-----------------+
| 2 | 16463 |
| 3 | 12946 |
| 1 | 12892 |
+----------+-----------------+
3 rows in set (0.02 sec)
MySQL > EXPLAIN SELECT store_id, SUM(sale_amt)
FROM sales_records
GROUP BY store_id;
+-----------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:3: store_id | 6: sum |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 4:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: 3: store_id |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 04 |
| UNPARTITIONED |
| |
| 3:AGGREGATE (merge finalize) |
| | output: sum(6: sum) |
| | group by: 3: store_id |
| | |
| 2:EXCHANGE |
| |
| PLAN FRAGMENT 2 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 02 |
| HASH_PARTITIONED: 3: store_id |
| |
| 1:AGGREGATE (update serialize) |
| | STREAMING |
| | output: sum(5: sale_amt) |
| | group by: 3: store_id |
| | |
| 0:OlapScanNode |
| TABLE: sales_records |
| PREAGGREGATION: ON |
| partitions=1/1 |
| rollup: sales_records |
| tabletRatio=10/10 |
| tabletList=12049,12053,12057,12061,12065,12069,12073,12077,12081,12085 |
| cardinality=1 |
| avgRowSize=2.0 |
| numNodes=0 |
+-----------------------------------------------------------------------------+
45 rows in set (0.00 sec)
It can be observed that the query takes about 0.02 seconds, and no synchronous materialized view is used to accelerate the query because the value of rollup
field in the query profile is sales_records
, which is the base table.
Create a synchronous materialized view
You can create a synchronous materialized view based on a specific query statement using CREATE MATERIALIZED VIEW.
Based on the table sales_records
and the query statement mentioned above, the following example creates the synchronous materialized view store_amt
to analyze the sum of sales amount in each store.
CREATE MATERIALIZED VIEW store_amt AS
SELECT store_id, SUM(sale_amt)
FROM sales_records
GROUP BY store_id;
CAUTION
- When using aggregate functions in synchronous materialized views, you must use the GROUP BY clause and specify at least one GROUP BY column in your SELECT list.
- Synchronous materialized views do not support using one aggregate function on multiple columns. Query statements in the form of
sum(a+b)
are not supported.- Synchronous materialized views do not support using multiple aggregate functions on one column. Query statements in the form of
select sum(a), min(a) from table
are not supported.- JOIN and WHERE clauses are not supported when creating a synchronous materialized view.
- When using ALTER TABLE DROP COLUMN to drop a specific column in a base table, you need to ensure that all synchronous materialized views of the base table do not contain the dropped column, otherwise the drop operation cannot be performed. To drop a column that used in synchronous materialized views, you need to first drop all synchronous materialized views that contain the column, and then drop the column.
- Creating too many synchronous materialized views for a table will affect the data loading efficiency. When data is being loaded to the base table, the data in synchronous materialized views and base table are updated synchronously. If the base table contains
n
synchronous materialized views, the efficiency of loading data into the base table is about the same as that of loading data inton
tables.- Currently, CelerData does not support creating multiple synchronous materialized views at the same time. A new synchronous materialized view can only be created when the previous one is completed.
Check the building status of a synchronous materialized view
Creating a synchronous materialized view is an asynchronous operation. Executing CREATE MATERIALIZED VIEW successfully indicates that the task of creating the materialized view is submitted successfully. You can view the building status of the synchronous materialized view in a database via SHOW ALTER MATERIALIZED VIEW.
MySQL > SHOW ALTER MATERIALIZED VIEW\G
*************************** 1. row ***************************
JobId: 12090
TableName: sales_records
CreateTime: 2022-08-25 19:41:10
FinishedTime: 2022-08-25 19:41:39
BaseIndexName: sales_records
RollupIndexName: store_amt
RollupId: 12091
TransactionId: 10
State: FINISHED
Msg:
Progress: NULL
Timeout: 86400
1 row in set (0.00 sec)
The RollupIndexName
section indicates the name of the synchronous materialized view, and State
section indicates if the building is completed.
Query a synchronous materialized view
Because a synchronous materialized view is essentially an index of the base table rather than a physical table, you can only query a synchronous materialized view using the hint [_SYNC_MV_]
:
-- Do not omit the brackets [] in the hint.
MySQL > SELECT * FROM store_amt [_SYNC_MV_];
+----------+----------+
| store_id | sale_amt |
+----------+----------+
| 2 | 6948 |
| 3 | 8734 |
| 1 | 4319 |
| 2 | 9515 |
| 3 | 4212 |
| 1 | 8573 |
+----------+----------+
CAUTION
Currently, CelerData automatically generates names for columns in a synchronous materialized view even if you have specified aliases for them.
Query with the synchronous materialized view
The synchronous materialized view you created contains the complete set of pre-computed results in accordance with the query statement. Subsequent queries use the data within it. You can run the same query to test the query time as you did in the preparation.
MySQL > SELECT store_id, SUM(sale_amt)
FROM sales_records
GROUP BY store_id;
+----------+-----------------+
| store_id | sum(`sale_amt`) |
+----------+-----------------+
| 2 | 16463 |
| 3 | 12946 |
| 1 | 12892 |
+----------+-----------------+
3 rows in set (0.01 sec)
It can be observed that the query time is reduced to 0.01 seconds.
Check if a query hits the synchronous materialized view
Execute EXPLAIN command again to check if the query hits the synchronous materialized view.
MySQL > EXPLAIN SELECT store_id, SUM(sale_amt) FROM sales_records GROUP BY store_id;
+-----------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:3: store_id | 6: sum |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 4:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: 3: store_id |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 04 |
| UNPARTITIONED |
| |
| 3:AGGREGATE (merge finalize) |
| | output: sum(6: sum) |
| | group by: 3: store_id |
| | |
| 2:EXCHANGE |
| |
| PLAN FRAGMENT 2 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 02 |
| HASH_PARTITIONED: 3: store_id |
| |
| 1:AGGREGATE (update serialize) |
| | STREAMING |
| | output: sum(5: sale_amt) |
| | group by: 3: store_id |
| | |
| 0:OlapScanNode |
| TABLE: sales_records |
| PREAGGREGATION: ON |
| partitions=1/1 |
| rollup: store_amt |
| tabletRatio=10/10 |
| tabletList=12092,12096,12100,12104,12108,12112,12116,12120,12124,12128 |
| cardinality=6 |
| avgRowSize=2.0 |
| numNodes=0 |
+-----------------------------------------------------------------------------+
45 rows in set (0.00 sec)
It can be observed that the value of rollup
section in the query profile is now store_amt
, which is the synchronous materialized view you have built. That means this query has hit the synchronous materialized view.
Show synchronous materialized views
You can execute DESC \<tbl_name> ALL to check the schema of a table and its subordinate synchronous materialized views.
MySQL > DESC sales_records ALL;
+---------------+---------------+-----------+--------+------+-------+---------+-------+
| IndexName | IndexKeysType | Field | Type | Null | Key | Default | Extra |
+---------------+---------------+-----------+--------+------+-------+---------+-------+
| sales_records | DUP_KEYS | record_id | INT | Yes | true | NULL | |
| | | seller_id | INT | Yes | true | NULL | |
| | | store_id | INT | Yes | true | NULL | |
| | | sale_date | DATE | Yes | false | NULL | NONE |
| | | sale_amt | BIGINT | Yes | false | NULL | NONE |
| | | | | | | | |
| store_amt | AGG_KEYS | store_id | INT | Yes | true | NULL | |
| | | sale_amt | BIGINT | Yes | false | NULL | SUM |
+---------------+---------------+-----------+--------+------+-------+---------+-------+
8 rows in set (0.00 sec)
Drop a synchronous materialized view
Under the following circumstances, you need to drop the synchronous materialized view:
- You have created a wrong materialized view and you need to drop it before the building completed.
- You have created too many materialized views, which results in a huge drop in load performance, and some of the materialized views are duplicate.
- The frequency of the involved queries is low, and you can tolerate a relatively high query latency.
Drop an unfinished synchronous materialized view
You can drop a synchronous materialized view that is being created by canceling the in-progress creation task. First, you need to get the job ID JobID
of the materialized view creation task by checking the building status of the materialized view. After getting the job ID, you need to cancel the creation task with the CANCEL ALTER command.
CANCEL ALTER TABLE ROLLUP FROM sales_records (12090);
Drop an existing synchronous materialized view
You can drop an existing synchronous materialized view with the DROP MATERIALIZED VIEW command.
DROP MATERIALIZED VIEW store_amt;
Best practices
Exact count distinct
The following example is based on an advertisement business analysis table advertiser_view_record
, which records the date that the ad is viewed click_time
, the name of the ad advertiser
, the channel of the ad channel
, and the ID of the user who viewed the ID user_id
.
CREATE TABLE advertiser_view_record(
click_time DATE,
advertiser VARCHAR(10),
channel VARCHAR(10),
user_id INT
) distributed BY hash(click_time);
Analysis is mainly focused on the UV of the ads.
SELECT advertiser, channel, count(distinct user_id)
FROM advertiser_view_record
GROUP BY advertiser, channel;
To accelerate exact count distinct, you can create a synchronous materialized view based on this table and use the bitmap_union function to pre-aggregate the data.
CREATE MATERIALIZED VIEW advertiser_uv AS
SELECT advertiser, channel, bitmap_union(to_bitmap(user_id))
FROM advertiser_view_record
GROUP BY advertiser, channel;
After the synchronous materialized view is created, the sub-query count(distinct user_id)
in the subsequent queries will be automatically rewritten as bitmap_union_count (to_bitmap(user_id))
so that they can hit the synchronous materialized view.
Approximate count distinct
Use the table advertiser_view_record
above as an example again. To accelerate approximate count distinct, you can create a synchronous materialized view based on this table and use the hll_union() function to pre-aggregate the data.
CREATE MATERIALIZED VIEW advertiser_uv2 AS
SELECT advertiser, channel, hll_union(hll_hash(user_id))
FROM advertiser_view_record
GROUP BY advertiser, channel;
Set extra sort keys
Suppose that the base table tableA
contains columns k1
, k2
and k3
, where only k1
and k2
are sort keys. If the query including the sub-query where k3=x
must be accelerated, you can create a synchronous materialized view with k3
as the first column.
CREATE MATERIALIZED VIEW k3_as_key AS
SELECT k3, k2, k1
FROM tableA
Correspondence of aggregate operators
When a query is executed with a synchronous materialized view, the original query statement will be automatically rewritten and used to query the intermediate results stored in the synchronous materialized view. The following table shows the correspondence between the aggregate function in the original query and the aggregate function used to construct the synchronous materialized view. You can select the corresponding aggregate function to build a synchronous materialized view according to your business scenario.
aggregate function in the original query | aggregate function of the materialized view |
---|---|
sum | sum |
min | min |
max | max |
count | count |
bitmap_union, bitmap_union_count, count(distinct) | bitmap_union |
hll_raw_agg, hll_union_agg, ndv, approx_count_distinct | hll_union |
- Synchronous materialized view
- Basic concepts
- Preparation
- Create a synchronous materialized view
- Check the building status of a synchronous materialized view
- Query a synchronous materialized view
- Query with the synchronous materialized view
- Check if a query hits the synchronous materialized view
- Show synchronous materialized views
- Drop a synchronous materialized view
- Best practices
- Correspondence of aggregate operators