- Release Notes
- Introduction to CelerData Cloud Serverless
- Quick Start
- Sign up for CelerData Cloud Serverless
- A quick tour of the console
- Connect to CelerData Cloud Serverless
- Create an IAM integration
- Create and assign a warehouse
- Create an external catalog
- Load data from cloud storage
- Load data from Apache Kafka/Confluent Cloud
- Try your first query
- Invite new users
- Design data access control policy
- Warehouses
- Catalog, database, table, view, and MV
- Overview of database objects
- Catalog
- Table types
- Asynchronous materialized views
- Data Loading
- Data access control
- Networking and private connectivity
- Usage and Billing
- Organization and Account
- Integration
- Query Acceleration
- Reference
- AWS IAM policies
- Information Schema
- Overview
- be_bvars
- be_cloud_native_compactions
- be_compactions
- character_sets
- collations
- column_privileges
- columns
- engines
- events
- global_variables
- key_column_usage
- load_tracking_logs
- loads
- materialized_views
- partitions
- pipe_files
- pipes
- referential_constraints
- routines
- schema_privileges
- schemata
- session_variables
- statistics
- table_constraints
- table_privileges
- tables
- tables_config
- task_runs
- tasks
- triggers
- user_privileges
- views
- Data Types
- System Metadatabase
- Keywords
- SQL Statements
- Account Management
- Data Definition
- CREATE TABLE
- ALTER TABLE
- DROP CATALOG
- CREATE TABLE LIKE
- REFRESH EXTERNAL TABLE
- RESTORE
- SET CATALOG
- DROP TABLE
- RECOVER
- USE
- CREATE MATERIALIZED VIEW
- DROP DATABASE
- ALTER MATERIALIZED VIEW
- DROP REPOSITORY
- CANCEL RESTORE
- DROP INDEX
- DROP MATERIALIZED VIEW
- CREATE DATABASE
- CREATE TABLE AS SELECT
- BACKUP
- CANCEL BACKUP
- CREATE REPOSITORY
- CREATE INDEX
- Data Manipulation
- INSERT
- SHOW CREATE DATABASE
- SHOW BACKUP
- SHOW ALTER MATERIALIZED VIEW
- SHOW CATALOGS
- SHOW CREATE MATERIALIZED VIEW
- SELECT
- SHOW ALTER
- SHOW MATERIALIZED VIEW
- RESUME ROUTINE LOAD
- ALTER ROUTINE LOAD
- SHOW TABLES
- STREAM LOAD
- SHOW PARTITIONS
- CANCEL REFRESH MATERIALIZED VIEW
- SHOW CREATE CATALOG
- SHOW ROUTINE LOAD TASK
- SHOW RESTORE
- CREATE ROUTINE LOAD
- STOP ROUTINE LOAD
- SHOW DATABASES
- BROKER LOAD
- SHOW ROUTINE LOAD
- PAUSE ROUTINE LOAD
- SHOW SNAPSHOT
- SHOW CREATE TABLE
- CANCEL LOAD
- REFRESH MATERIALIZED VIEW
- SHOW REPOSITORIES
- SHOW LOAD
- Administration
- DESCRIBE
- SQL Functions
- Function List
- String Functions
- CONCAT
- HEX
- LOWER
- SPLIT
- LPAD
- SUBSTRING
- PARSE_URL
- INSTR
- REPEAT
- LCASE
- REPLACE
- HEX_DECODE_BINARY
- RPAD
- SPLIT_PART
- STRCMP
- SPACE
- CHARACTER_LENGTH
- URL_ENCODE
- APPEND_TAILING_CHAR_IF_ABSENT
- LTRIM
- HEX_DECODE_STRING
- URL_DECODE
- LEFT
- STARTS_WITH
- CONCAT
- GROUP_CONCAT
- STR_TO_MAP
- STRLEFT
- STRRIGHT
- MONEY_FORMAT
- RIGHT
- SUBSTRING_INDEX
- UCASE
- TRIM
- FIND_IN_SET
- RTRIM
- ASCII
- UPPER
- REVERSE
- LENGTH
- UNHEX
- ENDS_WITH
- CHAR_LENGTH
- NULL_OR_EMPTY
- LOCATE
- CHAR
- Predicate Functions
- Map Functions
- Binary Functions
- Geospatial Functions
- Lambda Expression
- Utility Functions
- Bitmap Functions
- BITMAP_SUBSET_LIMIT
- TO_BITMAP
- BITMAP_AGG
- BITMAP_FROM_STRING
- BITMAP_OR
- BITMAP_REMOVE
- BITMAP_AND
- BITMAP_TO_BASE64
- BITMAP_MIN
- BITMAP_CONTAINS
- SUB_BITMAP
- BITMAP_UNION
- BITMAP_COUNT
- BITMAP_UNION_INT
- BITMAP_XOR
- BITMAP_UNION_COUNT
- BITMAP_HAS_ANY
- BITMAP_INTERSECT
- BITMAP_AND_NOT
- BITMAP_TO_STRING
- BITMAP_HASH
- INTERSECT_COUNT
- BITMAP_EMPTY
- BITMAP_MAX
- BASE64_TO_ARRAY
- BITMAP_TO_ARRAY
- Struct Functions
- Aggregate Functions
- RETENTION
- MI
- MULTI_DISTINCT_SUM
- WINDOW_FUNNEL
- STDDEV_SAMP
- GROUPING_ID
- HLL_HASH
- AVG
- HLL_UNION_AGG
- COUNT
- BITMAP
- HLL_EMPTY
- SUM
- MAX_BY
- PERCENTILE_CONT
- COVAR_POP
- PERCENTILE_APPROX
- HLL_RAW_AGG
- STDDEV
- CORR
- COVAR_SAMP
- MIN_BY
- MAX
- VAR_SAMP
- STD
- HLL_UNION
- APPROX_COUNT_DISTINCT
- MULTI_DISTINCT_COUNT
- VARIANCE
- ANY_VALUE
- COUNT_IF
- GROUPING
- PERCENTILE_DISC
- Array Functions
- ARRAY_CUM_SUM
- ARRAY_MAX
- ARRAY_LENGTH
- ARRAY_REMOVE
- UNNEST
- ARRAY_SLICE
- ALL_MATCH
- ARRAY_CONCAT
- ARRAY_SORT
- ARRAY_POSITION
- ARRAY_DIFFERENCE
- ARRAY_CONTAINS
- ARRAY_JOIN
- ARRAY_INTERSECT
- CARDINALITY
- ARRAY_CONTAINS_ALL
- ARRAYS_OVERLAP
- ARRAY_MIN
- ARRAY_MAP
- ELEMENT_AT
- ARRAY_APPEND
- ARRAY_SORTBY
- ARRAY_TO_BITMAP
- ARRAY_GENERATE
- ARRAY_AVG
- ARRAY_FILTER
- ANY_MATCH
- REVERSE
- ARRAY_AGG
- ARRAY_DISTINCT
- ARRAY_SUM
- Condition Functions
- Math Functions
- Date and Time Functions
- DAYNAME
- MINUTE
- FROM_UNIXTIME
- HOUR
- MONTHNAME
- MONTHS_ADD
- ADD_MONTHS
- DATE_SUB
- PREVIOUS_DAY
- TO_TERA_DATA
- MINUTES_SUB
- WEEKS_ADD
- HOURS_DIFF
- UNIX_TIMESTAMP
- DAY
- DATE_SLICE
- DATE
- CURTIME
- SECONDS_SUB
- MONTH
- WEEK
- TO_DATE
- TIMEDIFF
- MONTHS_DIFF
- STR_TO_JODATIME
- WEEK_ISO
- MICROSECONDS_SUB
- TIME_SLICE
- MAKEDATE
- DATE_TRUNC
- JODATIME
- DAYOFWEEK
- YEARS_SUB
- TIMESTAMP_ADD
- HOURS_SUB
- STR2DATE
- TIMESTAMP
- FROM_DAYS
- WEEK_OF_YEAR
- YEAR
- TIMESTAMP_DIFF
- TO_TERA_TIMESTAMP
- DAYOFMONTH
- DAYOFYEAR
- DATE_FORMAT
- MONTHS_SUB
- NEXT_DAY
- MINUTES_DIFF
- DATA_ADD
- MINUTES_ADD
- CURDATE
- DAY_OF_WEEK_ISO
- CURRENt_TIMESTAMP
- STR_TO_DATE
- LAST_DAY
- WEEKS_SUB
- TO_DAYS
- DATEDIFF
- NOW
- TO_ISO8601
- TIME_TO_SEC
- QUARTER
- SECONDS_DIFF
- UTC_TIMESTAMP
- DATA_DIFF
- SECONDS_ADD
- ADDDATE
- WEEKSDIFF
- CONVERT_TZ
- MICROSECONDS_ADD
- SECOND
- YEARS_DIFF
- YEARS_ADD
- HOURS_ADD
- DAYS_SUB
- DAYS_DIFF
- Cryptographic Functions
- Percentile Functions
- Bit Functions
- JSON Functions
- Hash Functions
- Scalar Functions
- Table Functions
Query rewrite with materialized views
This topic describes how to leverage CelerData's asynchronous materialized views to rewrite and accelerate queries.
NOTE
This feature is included in the Premium software edition. Please see the Software editions documentation for details on the differences between Standard and Premium editions if you are subscribed to the Standard edition.
Overview
CelerData's asynchronous materialized view uses a widely adopted transparent query rewrite algorithm based on the SPJG (select-project-join-group-by) form. Without the need to modify the query statement, CelerData can automatically rewrite queries against the base tables into queries against the corresponding materialized view that contains the pre-computed results. As a result, materialized views can help you significantly reduce computational costs, and substantially accelerate query execution.
The query rewrite feature based on asynchronous materialized views is particularly useful in the following scenarios:
Pre-aggregation of metrics
You can use materialized views to create a pre-aggregated metric layer if you are dealing with a high dimensionality of data.
Joins of wide tables
Materialized views allow you to transparently accelerate queries with joins of multiple large wide tables in complex scenarios.
Query acceleration in the data lake
Building an external catalog-based materialized view can easily accelerate queries against data in your data lake.
NOTE
Asynchronous materialized views created on base tables in a JDBC catalog do not support query rewrite.
Features
CelerData's asynchronous materialized view-based automatic query rewrite features the following attributes:
- Strong data consistency: If the base tables are native tables, CelerData ensures that the results obtained through the materialized view-based query rewrite are consistent with the results returned from the direct query against the base tables.
- Staleness rewrite: CelerData supports staleness rewrite, allowing you to tolerate a certain level of data expiration to cope with scenarios with frequent data changes.
- Multi-table joins: CelerData's asynchronous materialized view supports various types of joins, including some complex join scenarios like View Delta Joins and Derivable Joins, allowing you to accelerate queries in scenarios involving large wide tables.
- Aggregation rewrite: CelerData can rewrite queries with aggregations to improve report performance.
- Nested materialized view: CelerData supports rewriting complex queries based on nested materialized views, expanding the scope of queries that can be rewritten.
- Union rewrite: You can combine the Union rewrite feature with the TTL (Time-to-Live) of the materialized view's partitions to achieve the separation of the hot and cold data, which allows you to query hot data from materialized views and historical data from the base table.
- Materialized views on views: You can accelerate the queries in scenarios with data modeling based on views.
- Materialized views on external catalogs: You can accelerate queries in data lakes.
- Complex expression rewrite: It can handle complex expressions, including function calls and arithmetic operations, catering to advanced analytical and calculation requirements.
These features will be elaborated in the following sections.
Join rewrite
CelerData supports rewriting queries with various types of joins, including Inner Join, Cross Join, Left Outer Join, Full Outer Join, Right Outer Join, Semi Join, and Anti Join.
The following is an example of rewriting queries with joins. Create two base tables as follows:
CREATE TABLE customer (
c_custkey INT(11) NOT NULL,
c_name VARCHAR(26) NOT NULL,
c_address VARCHAR(41) NOT NULL,
c_city VARCHAR(11) NOT NULL,
c_nation VARCHAR(16) NOT NULL,
c_region VARCHAR(13) NOT NULL,
c_phone VARCHAR(16) NOT NULL,
c_mktsegment VARCHAR(11) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(c_custkey)
DISTRIBUTED BY HASH(c_custkey) BUCKETS 12;
CREATE TABLE lineorder (
lo_orderkey INT(11) NOT NULL,
lo_linenumber INT(11) NOT NULL,
lo_custkey INT(11) NOT NULL,
lo_partkey INT(11) NOT NULL,
lo_suppkey INT(11) NOT NULL,
lo_orderdate INT(11) NOT NULL,
lo_orderpriority VARCHAR(16) NOT NULL,
lo_shippriority INT(11) NOT NULL,
lo_quantity INT(11) NOT NULL,
lo_extendedprice INT(11) NOT NULL,
lo_ordtotalprice INT(11) NOT NULL,
lo_discount INT(11) NOT NULL,
lo_revenue INT(11) NOT NULL,
lo_supplycost INT(11) NOT NULL,
lo_tax INT(11) NOT NULL,
lo_commitdate INT(11) NOT NULL,
lo_shipmode VARCHAR(11) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(lo_orderkey)
DISTRIBUTED BY HASH(lo_orderkey) BUCKETS 48;
With the above base tables, you can create a materialized view as follows:
CREATE MATERIALIZED VIEW join_mv1
DISTRIBUTED BY HASH(lo_orderkey)
AS
SELECT lo_orderkey, lo_linenumber, lo_revenue, lo_partkey, c_name, c_address
FROM lineorder INNER JOIN customer
ON lo_custkey = c_custkey;
Such a materialized view can rewrite the following query:
SELECT lo_orderkey, lo_linenumber, lo_revenue, c_name, c_address
FROM lineorder INNER JOIN customer
ON lo_custkey = c_custkey;
CelerData supports rewriting join queries with complex expressions, such as arithmetic operations, string functions, date functions, CASE WHEN expressions, and OR predicates. For example, the above materialized view can rewrite the following query:
SELECT
lo_orderkey,
lo_linenumber,
(2 * lo_revenue + 1) * lo_linenumber,
upper(c_name),
substr(c_address, 3)
FROM lineorder INNER JOIN customer
ON lo_custkey = c_custkey;
In addition to the conventional scenario, CelerData further supports rewriting join queries in more complicated scenarios.
Query Delta Join rewrite
Query Delta Join refers to a scenario in which the tables joined in a query are a superset of the tables joined in a materialized view. For instance, consider the following query that involves joins of three tables: lineorder
, customer
, and part
. If the materialized view join_mv1
contains only the join of lineorder
and customer
, CelerData can rewrite the query using join_mv1
.
Example:
SELECT lo_orderkey, lo_linenumber, lo_revenue, c_name, c_address, p_name
FROM
lineorder INNER JOIN customer ON lo_custkey = c_custkey
INNER JOIN part ON lo_partkey = p_partkey;
Its original query plan and the one after the rewrite are as follows:
View Delta Join rewrite
View Delta Join refers to a scenario in which the tables joined in a query are a subset of the tables joined in a materialized view. This feature is typically used in scenarios involving large wide tables. For example, in the context of the Star Schema Benchmark (SSB), you can create a materialized view that joins all tables to improve query performance. Through testing, it has been found that query performance for multi-table joins can achieve the same level of performance as querying the corresponding large wide table after transparently rewriting the queries through the materialized view.
To perform a View Delta Join rewrite, the materialized view must contain the 1:1 cardinality preservation join that does not exist in the query. Here are the nine types of joins that are considered cardinality preservation joins, and satisfying any one of them enables View Delta Join rewriting:
Take SSB tests as an example, create the following base tables:
CREATE TABLE customer (
c_custkey INT(11) NOT NULL,
c_name VARCHAR(26) NOT NULL,
c_address VARCHAR(41) NOT NULL,
c_city VARCHAR(11) NOT NULL,
c_nation VARCHAR(16) NOT NULL,
c_region VARCHAR(13) NOT NULL,
c_phone VARCHAR(16) NOT NULL,
c_mktsegment VARCHAR(11) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(c_custkey)
DISTRIBUTED BY HASH(c_custkey) BUCKETS 12
PROPERTIES (
"unique_constraints" = "c_custkey" -- Specify the unique constraints.
);
CREATE TABLE dates (
d_datekey DATE NOT NULL,
d_date VARCHAR(20) NOT NULL,
d_dayofweek VARCHAR(10) NOT NULL,
d_month VARCHAR(11) NOT NULL,
d_year INT(11) NOT NULL,
d_yearmonthnum INT(11) NOT NULL,
d_yearmonth VARCHAR(9) NOT NULL,
d_daynuminweek INT(11) NOT NULL,
d_daynuminmonth INT(11) NOT NULL,
d_daynuminyear INT(11) NOT NULL,
d_monthnuminyear INT(11) NOT NULL,
d_weeknuminyear INT(11) NOT NULL,
d_sellingseason VARCHAR(14) NOT NULL,
d_lastdayinweekfl INT(11) NOT NULL,
d_lastdayinmonthfl INT(11) NOT NULL,
d_holidayfl INT(11) NOT NULL,
d_weekdayfl INT(11) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(d_datekey)
DISTRIBUTED BY HASH(d_datekey) BUCKETS 1
PROPERTIES (
"unique_constraints" = "d_datekey" -- Specify the unique constraints.
);
CREATE TABLE supplier (
s_suppkey INT(11) NOT NULL,
s_name VARCHAR(26) NOT NULL,
s_address VARCHAR(26) NOT NULL,
s_city VARCHAR(11) NOT NULL,
s_nation VARCHAR(16) NOT NULL,
s_region VARCHAR(13) NOT NULL,
s_phone VARCHAR(16) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(s_suppkey)
DISTRIBUTED BY HASH(s_suppkey) BUCKETS 12
PROPERTIES (
"unique_constraints" = "s_suppkey" -- Specify the unique constraints.
);
CREATE TABLE part (
p_partkey INT(11) NOT NULL,
p_name VARCHAR(23) NOT NULL,
p_mfgr VARCHAR(7) NOT NULL,
p_category VARCHAR(8) NOT NULL,
p_brand VARCHAR(10) NOT NULL,
p_color VARCHAR(12) NOT NULL,
p_type VARCHAR(26) NOT NULL,
p_size TINYINT(11) NOT NULL,
p_container VARCHAR(11) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(p_partkey)
DISTRIBUTED BY HASH(p_partkey) BUCKETS 12
PROPERTIES (
"unique_constraints" = "p_partkey" -- Specify the unique constraints.
);
CREATE TABLE lineorder (
lo_orderdate DATE NOT NULL, -- Specify it as NOT NULL.
lo_orderkey INT(11) NOT NULL,
lo_linenumber TINYINT NOT NULL,
lo_custkey INT(11) NOT NULL, -- Specify it as NOT NULL.
lo_partkey INT(11) NOT NULL, -- Specify it as NOT NULL.
lo_suppkey INT(11) NOT NULL, -- Specify it as NOT NULL.
lo_orderpriority VARCHAR(100) NOT NULL,
lo_shippriority TINYINT NOT NULL,
lo_quantity TINYINT NOT NULL,
lo_extendedprice INT(11) NOT NULL,
lo_ordtotalprice INT(11) NOT NULL,
lo_discount TINYINT NOT NULL,
lo_revenue INT(11) NOT NULL,
lo_supplycost INT(11) NOT NULL,
lo_tax TINYINT NOT NULL,
lo_commitdate DATE NOT NULL,
lo_shipmode VARCHAR(100) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(lo_orderdate,lo_orderkey)
PARTITION BY RANGE(lo_orderdate)
(PARTITION p1 VALUES [("0000-01-01"), ("1993-01-01")),
PARTITION p2 VALUES [("1993-01-01"), ("1994-01-01")),
PARTITION p3 VALUES [("1994-01-01"), ("1995-01-01")),
PARTITION p4 VALUES [("1995-01-01"), ("1996-01-01")),
PARTITION p5 VALUES [("1996-01-01"), ("1997-01-01")),
PARTITION p6 VALUES [("1997-01-01"), ("1998-01-01")),
PARTITION p7 VALUES [("1998-01-01"), ("1999-01-01")))
DISTRIBUTED BY HASH(lo_orderkey) BUCKETS 48
PROPERTIES (
"foreign_key_constraints" = "
(lo_custkey) REFERENCES customer(c_custkey);
(lo_partkey) REFERENCES part(p_partkey);
(lo_suppkey) REFERENCES supplier(s_suppkey)" -- Specify the Foreign Keys.
);
Create the materialized view lineorder_flat_mv
that joins lineorder
, customer
, supplier
, part
, and dates
:
CREATE MATERIALIZED VIEW lineorder_flat_mv
DISTRIBUTED BY HASH(LO_ORDERDATE, LO_ORDERKEY) BUCKETS 48
PARTITION BY LO_ORDERDATE
REFRESH MANUAL
PROPERTIES (
"session.partition_refresh_number"="1"
)
AS SELECT /*+ SET_VAR(query_timeout = 7200) */ -- Set timeout for the refresh operation.
l.LO_ORDERDATE AS LO_ORDERDATE,
l.LO_ORDERKEY AS LO_ORDERKEY,
l.LO_LINENUMBER AS LO_LINENUMBER,
l.LO_CUSTKEY AS LO_CUSTKEY,
l.LO_PARTKEY AS LO_PARTKEY,
l.LO_SUPPKEY AS LO_SUPPKEY,
l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY,
l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY,
l.LO_QUANTITY AS LO_QUANTITY,
l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE,
l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE,
l.LO_DISCOUNT AS LO_DISCOUNT,
l.LO_REVENUE AS LO_REVENUE,
l.LO_SUPPLYCOST AS LO_SUPPLYCOST,
l.LO_TAX AS LO_TAX,
l.LO_COMMITDATE AS LO_COMMITDATE,
l.LO_SHIPMODE AS LO_SHIPMODE,
c.C_NAME AS C_NAME,
c.C_ADDRESS AS C_ADDRESS,
c.C_CITY AS C_CITY,
c.C_NATION AS C_NATION,
c.C_REGION AS C_REGION,
c.C_PHONE AS C_PHONE,
c.C_MKTSEGMENT AS C_MKTSEGMENT,
s.S_NAME AS S_NAME,
s.S_ADDRESS AS S_ADDRESS,
s.S_CITY AS S_CITY,
s.S_NATION AS S_NATION,
s.S_REGION AS S_REGION,
s.S_PHONE AS S_PHONE,
p.P_NAME AS P_NAME,
p.P_MFGR AS P_MFGR,
p.P_CATEGORY AS P_CATEGORY,
p.P_BRAND AS P_BRAND,
p.P_COLOR AS P_COLOR,
p.P_TYPE AS P_TYPE,
p.P_SIZE AS P_SIZE,
p.P_CONTAINER AS P_CONTAINER,
d.D_DATE AS D_DATE,
d.D_DAYOFWEEK AS D_DAYOFWEEK,
d.D_MONTH AS D_MONTH,
d.D_YEAR AS D_YEAR,
d.D_YEARMONTHNUM AS D_YEARMONTHNUM,
d.D_YEARMONTH AS D_YEARMONTH,
d.D_DAYNUMINWEEK AS D_DAYNUMINWEEK,
d.D_DAYNUMINMONTH AS D_DAYNUMINMONTH,
d.D_DAYNUMINYEAR AS D_DAYNUMINYEAR,
d.D_MONTHNUMINYEAR AS D_MONTHNUMINYEAR,
d.D_WEEKNUMINYEAR AS D_WEEKNUMINYEAR,
d.D_SELLINGSEASON AS D_SELLINGSEASON,
d.D_LASTDAYINWEEKFL AS D_LASTDAYINWEEKFL,
d.D_LASTDAYINMONTHFL AS D_LASTDAYINMONTHFL,
d.D_HOLIDAYFL AS D_HOLIDAYFL,
d.D_WEEKDAYFL AS D_WEEKDAYFL
FROM lineorder AS l
INNER JOIN customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY
INNER JOIN supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY
INNER JOIN part AS p ON p.P_PARTKEY = l.LO_PARTKEY
INNER JOIN dates AS d ON l.LO_ORDERDATE = d.D_DATEKEY;
SSB Q2.1 involves joining four tables, but it lacks the customer
table compared to the materialized view lineorder_flat_mv
. In lineorder_flat_mv
, lineorder INNER JOIN customer
is essentially a cardinality preservation join. Therefore, logically, this join can be eliminated without affecting the query results. As a result, Q2.1 can be rewritten using lineorder_flat_mv
.
SSB Q2.1:
SELECT sum(lo_revenue) AS lo_revenue, d_year, p_brand
FROM lineorder
JOIN dates ON lo_orderdate = d_datekey
JOIN part ON lo_partkey = p_partkey
JOIN supplier ON lo_suppkey = s_suppkey
WHERE p_category = 'MFGR#12' AND s_region = 'AMERICA'
GROUP BY d_year, p_brand
ORDER BY d_year, p_brand;
Its original query plan and the one after the rewrite are as follows:
Similarly, other queries in the SSB can also be transparently rewritten using lineorder_flat_mv
, thus optimizing query performance.
Join Derivability rewrite
Join Derivability refers to a scenario in which the join types in the materialized view and the query are not consistent, but the materialized view's join results contain the results of the query's join. Currently, it supports two scenarios - joining three or more tables, and joining two tables.
Scenario one: Joining three or more tables
Suppose the materialized view contains a Left Outer Join between tables
t1
andt2
and an Inner Join between tablest2
andt3
. In both joins, the join condition includes columns fromt2
.The query, on the other hand, contains an Inner Join between t1 and t2, and an Inner Join between t2 and t3. In both joins, the join condition includes columns from t2.
In this case, the query can be rewritten using the materialized view. This is because in the materialized view, the Left Outer Join is executed first, followed by the Inner Join. The right table generated by the Left Outer Join has no results for the matching (that is, columns in the right table are NULL). These results are subsequently filtered out during the Inner Join. Therefore, the logic of the materialized view and the query is equivalent, and the query can be rewritten.
Example:
Create the materialized view
join_mv5
:CREATE MATERIALIZED VIEW join_mv5 PARTITION BY lo_orderdate DISTRIBUTED BY hash(lo_orderkey) PROPERTIES ( "partition_refresh_number" = "1" ) AS SELECT lo_orderkey, lo_orderdate, lo_linenumber, lo_revenue, c_custkey, c_address, p_name FROM customer LEFT OUTER JOIN lineorder ON c_custkey = lo_custkey INNER JOIN part ON p_partkey = lo_partkey;
join_mv5
can rewrite the following query:SELECT lo_orderkey, lo_orderdate, lo_linenumber, lo_revenue, c_custkey, c_address, p_name FROM customer INNER JOIN lineorder ON c_custkey = lo_custkey INNER JOIN part ON p_partkey = lo_partkey;
Its original query plan and the one after the rewrite are as follows:
Similarly, if the materialized view is defined as
t1 INNER JOIN t2 INNER JOIN t3
, and the query isLEFT OUTER JOIN t2 INNER JOIN t3
, the query can also be rewritten. Furthermore, this rewriting capability extends to scenarios involving more than three tables.Scenario two: Joining two tables
The Join Derivability Rewrite feature involving two tables supports the following specific cases:
In cases 1 to 9, filtering predicates must be added to the rewritten result to ensure semantic equivalence. For example, create a materialized view as follows:
CREATE MATERIALIZED VIEW join_mv3 DISTRIBUTED BY hash(lo_orderkey) AS SELECT lo_orderkey, lo_linenumber, lo_revenue, c_custkey, c_address FROM lineorder LEFT OUTER JOIN customer ON lo_custkey = c_custkey;
The following query can be rewritten using
join_mv3
, and the predicatec_custkey IS NOT NULL
is added to the rewritten result:SELECT lo_orderkey, lo_linenumber, lo_revenue, c_custkey, c_address FROM lineorder INNER JOIN customer ON lo_custkey = c_custkey;
Its original query plan and the one after the rewrite are as follows:
In case 10, the Left Outer Join query must include the filtering predicate
IS NOT NULL
in the right table, for example,=
,<>
,>
,<
,<=
,>=
,LIKE
,IN
,NOT LIKE
, orNOT IN
. For example, create a materialized view as follows:CREATE MATERIALIZED VIEW join_mv4 DISTRIBUTED BY hash(lo_orderkey) AS SELECT lo_orderkey, lo_linenumber, lo_revenue, c_custkey, c_address FROM lineorder INNER JOIN customer ON lo_custkey = c_custkey;
join_mv4
can rewrite the following query, wherecustomer.c_address = "Sb4gxKs7"
is the filtering predicateIS NOT NULL
:SELECT lo_orderkey, lo_linenumber, lo_revenue, c_custkey, c_address FROM lineorder LEFT OUTER JOIN customer ON lo_custkey = c_custkey WHERE customer.c_address = "Sb4gxKs7";
Its original query plan and the one after the rewrite are as follows:
Aggregation rewrite
CelerData's asynchronous materialized view supports rewriting multi-table aggregate queries with all available aggregate functions, including bitmap_union, hll_union, and percentile_union. For example, create a materialized view as follows:
CREATE MATERIALIZED VIEW agg_mv1
DISTRIBUTED BY hash(lo_orderkey)
AS
SELECT
lo_orderkey,
lo_linenumber,
c_name,
sum(lo_revenue) AS total_revenue,
max(lo_discount) AS max_discount
FROM lineorder INNER JOIN customer
ON lo_custkey = c_custkey
GROUP BY lo_orderkey, lo_linenumber, c_name;
It can rewrite the following query:
SELECT
lo_orderkey,
lo_linenumber,
c_name,
sum(lo_revenue) AS total_revenue,
max(lo_discount) AS max_discount
FROM lineorder INNER JOIN customer
ON lo_custkey = c_custkey
GROUP BY lo_orderkey, lo_linenumber, c_name;
Its original query plan and the one after the rewrite are as follows:
The following sections expound on the scenarios where the Aggregation Rewrite feature can be useful.
Aggregation Rollup rewrite
CelerData supports rewriting queries with Aggregation Rollup, that is, CelerData can rewrite aggregate queries with a GROUP BY a
clause using an asynchronous materialized view created with a GROUP BY a,b
clause. For example, the following query can be rewritten using agg_mv1
:
SELECT
lo_orderkey,
c_name,
sum(lo_revenue) AS total_revenue,
max(lo_discount) AS max_discount
FROM lineorder INNER JOIN customer
ON lo_custkey = c_custkey
GROUP BY lo_orderkey, c_name;
Its original query plan and the one after the rewrite are as follows:
NOTE
Currently, rewriting the grouping set, grouping set with rollup, or grouping set with cube is not supported.
Only certain aggregate functions support query rewrite with Aggregate Rollup. In the preceding example, if the materialized view order_agg_mv
uses count(distinct client_id)
instead of bitmap_union(to_bitmap(client_id))
, CelerData cannot rewrite the queries with Aggregate Rollup.
The following table shows the correspondence between the aggregate functions in the original query and the aggregate function used to build the materialized view. You can select the corresponding aggregate functions to build a materialized view according to your business scenario.
Aggregate function supported in original queries | Function supported Aggregate Rollup in materialized view |
---|---|
sum | sum |
count | count |
min | min |
max | max |
avg | sum / count |
bitmap_union, bitmap_union_count, count(distinct) | bitmap_union |
hll_raw_agg, hll_union_agg, ndv, approx_count_distinct | hll_union |
percentile_approx, percentile_union | percentile_union |
DISTINCT aggregates without the corresponding GROUP BY column cannot be rewritten with Aggregate Rollup. However, if a query with an Aggregate Rollup DISTINCT aggregate function does not have a GROUP BY column but an equal predicate, it can also be rewritten by the relevant materialized view because CelerData can convert the equal predicates into a GROUP BY constant expression.
In the following example, CelerData can rewrite the query with the materialized view order_agg_mv1
.
CREATE MATERIALIZED VIEW order_agg_mv1
DISTRIBUTED BY HASH(`order_id`) BUCKETS 12
REFRESH ASYNC START('2022-09-01 10:00:00') EVERY (interval 1 day)
AS
SELECT
order_date,
count(distinct client_id)
FROM order_list
GROUP BY order_date;
-- Query
SELECT
order_date,
count(distinct client_id)
FROM order_list WHERE order_date='2023-07-03';
COUNT DISTINCT rewrite
CelerData supports rewriting COUNT DISTINCT calculations into bitmap-based calculations, enabling high-performance, precise deduplication using materialized views. For example, create a materialized view as follows:
CREATE MATERIALIZED VIEW distinct_mv
DISTRIBUTED BY hash(lo_orderkey)
AS
SELECT lo_orderkey, bitmap_union(to_bitmap(lo_custkey)) AS distinct_customer
FROM lineorder
GROUP BY lo_orderkey;
It can rewrite the following query:
SELECT lo_orderkey, count(distinct lo_custkey)
FROM lineorder
GROUP BY lo_orderkey;
Nested materialized view rewrite
CelerData supports rewriting queries using nested materialized view. For example, create the materialized views join_mv2
, agg_mv2
, and agg_mv3
as follows:
CREATE MATERIALIZED VIEW join_mv2
DISTRIBUTED BY hash(lo_orderkey)
AS
SELECT lo_orderkey, lo_linenumber, lo_revenue, c_name, c_address
FROM lineorder INNER JOIN customer
ON lo_custkey = c_custkey;
CREATE MATERIALIZED VIEW agg_mv2
DISTRIBUTED BY hash(lo_orderkey)
AS
SELECT
lo_orderkey,
lo_linenumber,
c_name,
sum(lo_revenue) AS total_revenue,
max(lo_discount) AS max_discount
FROM join_mv2
GROUP BY lo_orderkey, lo_linenumber, c_name;
CREATE MATERIALIZED VIEW agg_mv3
DISTRIBUTED BY hash(lo_orderkey)
AS
SELECT
lo_orderkey,
sum(total_revenue) AS total_revenue,
max(max_discount) AS max_discount
FROM agg_mv2
GROUP BY lo_orderkey;
Their relationship is as follows:
agg_mv3
can rewrite the following query:
SELECT
lo_orderkey,
sum(lo_revenue) AS total_revenue,
max(lo_discount) AS max_discount
FROM lineorder INNER JOIN customer
ON lo_custkey = c_custkey
GROUP BY lo_orderkey;
Its original query plan and the one after the rewrite are as follows:
Union rewrite
Predicate Union rewrite
When the predicate scope of a materialized view is a subset of the predicate scope of a query, the query can be rewritten using a UNION operation.
For example, create a materialized view as follows:
CREATE MATERIALIZED VIEW agg_mv4
DISTRIBUTED BY hash(lo_orderkey)
AS
SELECT
lo_orderkey,
sum(lo_revenue) AS total_revenue,
max(lo_discount) AS max_discount
FROM lineorder
WHERE lo_orderkey < 300000000
GROUP BY lo_orderkey;
It can rewrite the following query:
select
lo_orderkey,
sum(lo_revenue) AS total_revenue,
max(lo_discount) AS max_discount
FROM lineorder
GROUP BY lo_orderkey;
Its original query plan and the one after the rewrite are as follows:
In this context, agg_mv5
contains data where lo_orderkey < 300000000
. Data where lo_orderkey >= 300000000
is directly obtained from the base table lineorder
. Finally, these two sets of data are combined using a UNION operation and then aggregated to obtain the final result.
Partition Union rewrite
Suppose you created a partitioned materialized view based on a partitioned table. When the partition range that a rewritable query scanned is a superset of the most recent partition range of the materialized view, the query will be rewritten using a UNION operation.
For example, consider the following materialized view agg_mv4
. Its base table lineorder
currently contains partitions from p1
to p7
, and the materialized view also contains partitions from p1
to p7
.
CREATE MATERIALIZED VIEW agg_mv5
DISTRIBUTED BY hash(lo_orderkey)
PARTITION BY RANGE(lo_orderdate)
REFRESH MANUAL
AS
SELECT
lo_orderdate,
lo_orderkey,
sum(lo_revenue) AS total_revenue,
max(lo_discount) AS max_discount
FROM lineorder
GROUP BY lo_orderkey;
If a new partition p8
, with a partition range of [("19990101"), ("20000101"))
, is added to lineorder
, the following query can be rewritten using a UNION operation:
SELECT
lo_orderdate,
lo_orderkey,
sum(lo_revenue) AS total_revenue,
max(lo_discount) AS max_discount
FROM lineorder
GROUP BY lo_orderkey;
Its original query plan and the one after the rewrite are as follows:
As shown above, agg_mv5
contains the data from partitions p1
to p7
, and the data from partition p8
is directly queried from lineorder
. Finally, these two sets of data are combined using a UNION operation.
View-based materialized view rewrite
CelerData supports creating materialized views based on views. Subsequent queries against the views can be transparently rewritten.
For example, create the following views:
CREATE VIEW customer_view1
AS
SELECT c_custkey, c_name, c_address
FROM customer;
CREATE VIEW lineorder_view1
AS
SELECT lo_orderkey, lo_linenumber, lo_custkey, lo_revenue
FROM lineorder;
Then, create the following materialized view based on the views:
CREATE MATERIALIZED VIEW join_mv1
DISTRIBUTED BY hash(lo_orderkey)
AS
SELECT lo_orderkey, lo_linenumber, lo_revenue, c_name
FROM lineorder_view1 INNER JOIN customer_view1
ON lo_custkey = c_custkey;
During query rewrite, queries against customer_view1
and lineorder_view1
are automatically expanded to the base tables and then transparently matched and rewritten.
External catalog-based materialized view rewrite
CelerData supports building asynchronous materialized views on Hive catalogs, Hudi catalogs, and Iceberg catalogs, and transparently rewriting queries with them. External catalog-based materialized views support most of the query rewrite capabilities, but there are some limitations:
- Hudi, Iceberg, or JDBC catalog-based materialized views do not support Union rewrite.
- Hudi, Iceberg, or JDBC catalog-based materialized views do not support View Delta Join rewrite.
- Hudi, Iceberg, or JDBC catalog-based materialized views do not support the incremental refresh of partitions.
Configure query rewrite
You can configure the asynchronous materialized view query rewrite through the following session variables:
Variable | Default | Description |
---|---|---|
enable_materialized_view_union_rewrite | true | Boolean value to control if to enable materialized view Union query rewrite. |
enable_rule_based_materialized_view_rewrite | true | Boolean value to control if to enable rule-based materialized view query rewrite. This variable is mainly used in single-table query rewrite. |
nested_mv_rewrite_max_level | 3 | The maximum levels of nested materialized views that can be used for query rewrite. Type: INT. Range: [1, +∞). The value of 1 indicates that materialized views created on other materialized views will not be used for query rewrite. |
Check if a query is rewritten
You can check if your query is rewritten by viewing its query plan using the EXPLAIN statement. If the field TABLE
under the section OlapScanNode
shows the name of the corresponding materialized view, it means that the query has been rewritten based on the materialized view.
mysql> EXPLAIN SELECT
order_id, sum(goods.price) AS total
FROM order_list INNER JOIN goods
ON goods.item_id1 = order_list.item_id2
GROUP BY order_id;
+------------------------------------+
| Explain String |
+------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:1: order_id | 8: sum |
| PARTITION: RANDOM |
| |
| RESULT SINK |
| |
| 1:Project |
| | <slot 1> : 9: order_id |
| | <slot 8> : 10: total |
| | |
| 0:OlapScanNode |
| TABLE: order_mv |
| PREAGGREGATION: ON |
| partitions=1/1 |
| rollup: order_mv |
| tabletRatio=0/12 |
| tabletList= |
| cardinality=3 |
| avgRowSize=4.0 |
| numNodes=0 |
+------------------------------------+
20 rows in set (0.01 sec)
Disable query rewrite
By default, CelerData enables query rewrite for asynchronous materialized views created based on the default catalog. You can disable this feature by setting the session variable enable_materialized_view_rewrite
to false
.
For asynchronous materialized views created based on an external catalog, you can disable this feature by setting the materialized view property force_external_table_query_rewrite
to false
using ALTER MATERIALIZED VIEW.
Limitations
In terms of materialized view-based query rewrite, CelerData currently has the following limitations:
- CelerData does not support rewriting queries with non-deterministic functions, including rand, random, uuid, and sleep.
- CelerData does not support rewriting queries with window functions.
- Materialized views defined with statements containing LIMIT, ORDER BY, UNION, EXCEPT, INTERSECT, MINUS, GROUPING SETS, WITH CUBE, or WITH ROLLUP cannot be used for query rewrite.
- Strong consistency of query results is not guaranteed between base tables and materialized views built on external catalogs.
- Asynchronous materialized views created on base tables in a JDBC catalog do not support query rewrite.