Generated columns
CelerData supports generated columns, which accelerate queries that contain complex expressions. A generated column precomputes and stores the result of an expression, and queries that repeat the same expression can be rewritten to read the stored result, which significantly speeds them up.
You can define one or more generated columns at table creation to store the results of expressions. When you execute a query that contains an expression whose result is stored in a generated column, the cost-based optimizer (CBO) rewrites the query to read the data directly from the generated column. Alternatively, you can query the generated column directly.
Because computing the expressions takes time during loading, evaluate the impact of generated columns on loading performance. Additionally, create generated columns at table creation whenever possible, because adding or modifying generated columns after table creation is time-consuming and costly.
Basic operations
Create generated columns
Syntax
```sql
<col_name> <data_type> [NULL] AS <expr> [COMMENT 'string']
```
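For example, a column definition that uses both optional clauses might look like this (a minimal sketch; the column name and comment are illustrative):

```sql
-- A nullable generated column with a descriptive comment.
newcol1 DOUBLE NULL AS array_avg(data_array) COMMENT 'average of data_array'
```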
Create generated columns at table creation (Recommended)
Create a table named `test_tbl1` with five columns, of which `newcol1` and `newcol2` are generated columns whose values are computed by the specified expressions from the regular columns `data_array` and `data_json`, respectively.
```sql
CREATE TABLE test_tbl1
(
    id INT NOT NULL,
    data_array ARRAY<int> NOT NULL,
    data_json JSON NOT NULL,
    newcol1 DOUBLE AS array_avg(data_array),
    newcol2 String AS json_string(json_query(data_json, "$.a"))
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH(id);
```
NOTICE:
- Generated columns must be defined after regular columns.
- Aggregate functions cannot be used in the expressions for generated columns.
- The expression for a generated column cannot reference other generated columns or auto-increment columns, but it can reference multiple regular columns (see the sketch below).
- The data type of a generated column must match the data type of the result produced by its expression.
- Generated columns cannot be created on Aggregate tables.
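For instance (a hypothetical sketch based on the rules above; the exact error message may differ), the following definition would be rejected because the expression for `newcol2` references another generated column:

```sql
-- Invalid: newcol2 references the generated column newcol1.
CREATE TABLE invalid_tbl
(
    id INT NOT NULL,
    data_array ARRAY<int> NOT NULL,
    newcol1 DOUBLE AS array_avg(data_array),
    newcol2 DOUBLE AS newcol1 + 1    -- Not allowed per the rules above.
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH(id);
```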
Add generated columns after table creation
NOTICE
This operation is time-consuming and resource-intensive. Therefore, it is recommended to add generated columns at table creation. If it is unavoidable to use ALTER TABLE to add generated columns, it is recommended to evaluate the cost and time involved in advance.
Create a table named `test_tbl2` with three regular columns: `id`, `data_array`, and `data_json`. Insert a data row into the table.

```sql
-- Create a table.
CREATE TABLE test_tbl2
(
    id INT NOT NULL,
    data_array ARRAY<int> NOT NULL,
    data_json JSON NOT NULL
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH(id);

-- Insert a data row.
INSERT INTO test_tbl2 VALUES (1, [1,2], parse_json('{"a" : 1, "b" : 2}'));

-- Query the table.
MySQL [example_db]> select * from test_tbl2;
+------+------------+------------------+
| id   | data_array | data_json        |
+------+------------+------------------+
|    1 | [1,2]      | {"a": 1, "b": 2} |
+------+------------+------------------+
1 row in set (0.04 sec)
```
Execute ALTER TABLE ... ADD COLUMN ... to add the generated columns `newcol1` and `newcol2`, whose values are computed by evaluating the expressions based on the values of the regular columns `data_array` and `data_json`.

```sql
ALTER TABLE test_tbl2 ADD COLUMN newcol1 DOUBLE AS array_avg(data_array);
ALTER TABLE test_tbl2 ADD COLUMN newcol2 String AS json_string(json_query(data_json, "$.a"));
```
NOTICE:
- Adding generated columns to Aggregate tables is not supported.
- Regular columns must be defined before generated columns. When you use the ALTER TABLE ... ADD COLUMN ... statement to add a regular column without specifying its position, the system automatically places it before the generated columns, as in the sketch below. You cannot use AFTER to explicitly place the regular column after a generated column.
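For example (a sketch with a hypothetical column `extra_col`), adding a regular column to `test_tbl2` without a position clause places it before the generated columns `newcol1` and `newcol2`:

```sql
-- extra_col is automatically positioned before newcol1 and newcol2.
ALTER TABLE test_tbl2 ADD COLUMN extra_col INT DEFAULT "0";
```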
Query the table data.

```sql
MySQL [example_db]> SELECT * FROM test_tbl2;
+------+------------+------------------+---------+---------+
| id   | data_array | data_json        | newcol1 | newcol2 |
+------+------------+------------------+---------+---------+
|    1 | [1,2]      | {"a": 1, "b": 2} |     1.5 | 1       |
+------+------------+------------------+---------+---------+
1 row in set (0.04 sec)
```

The result shows that the generated columns `newcol1` and `newcol2` are added to the table, and CelerData automatically computes their values based on the expressions.
Load data into generated columns
During data loading, CelerData automatically calculates the values for generated columns based on the expressions. You cannot specify the values of generated columns. The following example uses the INSERT INTO statement to load data:
Use INSERT INTO to insert a record into the `test_tbl1` table. Note that you cannot specify values for the generated columns within the VALUES () clause.

```sql
INSERT INTO test_tbl1 (id, data_array, data_json) VALUES (1, [1,2], parse_json('{"a" : 1, "b" : 2}'));
```
Query the table data.

```sql
MySQL [example_db]> SELECT * FROM test_tbl1;
+------+------------+------------------+---------+---------+
| id   | data_array | data_json        | newcol1 | newcol2 |
+------+------------+------------------+---------+---------+
|    1 | [1,2]      | {"a": 1, "b": 2} |     1.5 | 1       |
+------+------------+------------------+---------+---------+
1 row in set (0.01 sec)
```

The result shows that CelerData automatically computes the values for the generated columns `newcol1` and `newcol2` based on the expressions.

NOTICE:
The following error is returned if you specify values for the generated columns during data loading:

```sql
MySQL [example_db]> INSERT INTO test_tbl1 (id, data_array, data_json, newcol1, newcol2) VALUES (2, [3,4], parse_json('{"a" : 3, "b" : 4}'), 3.5, "3");
ERROR 1064 (HY000): Getting analyzing error. Detail message: materialized column 'newcol1' can not be specified.

MySQL [example_db]> INSERT INTO test_tbl1 VALUES (2, [3,4], parse_json('{"a" : 3, "b" : 4}'), 3.5, "3");
ERROR 1064 (HY000): Getting analyzing error. Detail message: Column count doesn't match value count.
```
Modify generated columns
NOTICE
This operation is time-consuming and resource-intensive. If it is unavoidable to use ALTER TABLE to modify generated columns, it is recommended to evaluate the cost and time involved in advance.
You can modify the data type and expression of a generated column.
Create a table named `test_tbl3` with five columns, of which `newcol1` and `newcol2` are generated columns whose values are computed by the specified expressions from the regular columns `data_array` and `data_json`, respectively. Insert a data row into the table.

```sql
-- Create a table.
MySQL [example_db]> CREATE TABLE test_tbl3
(
    id INT NOT NULL,
    data_array ARRAY<int> NOT NULL,
    data_json JSON NOT NULL,
    -- The data types and expressions of the generated columns are specified as follows:
    newcol1 DOUBLE AS array_avg(data_array),
    newcol2 String AS json_string(json_query(data_json, "$.a"))
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH(id);

-- Insert a data row.
INSERT INTO test_tbl3 (id, data_array, data_json) VALUES (1, [1,2], parse_json('{"a" : 1, "b" : 2}'));

-- Query the table.
MySQL [example_db]> select * from test_tbl3;
+------+------------+------------------+---------+---------+
| id   | data_array | data_json        | newcol1 | newcol2 |
+------+------------+------------------+---------+---------+
|    1 | [1,2]      | {"a": 1, "b": 2} |     1.5 | 1       |
+------+------------+------------------+---------+---------+
1 row in set (0.01 sec)
```
Modify the generated columns `newcol1` and `newcol2`:

Change the data type of the generated column `newcol1` to ARRAY<INT> and change its expression to `data_array`.

```sql
ALTER TABLE test_tbl3 MODIFY COLUMN newcol1 ARRAY<INT> AS data_array;
```

Modify the expression of the generated column `newcol2` to extract the value of field `b` from the regular column `data_json`.

```sql
ALTER TABLE test_tbl3 MODIFY COLUMN newcol2 String AS json_string(json_query(data_json, "$.b"));
```
View the modified schema and the data in the table.

View the modified schema.

```sql
MySQL [example_db]> show create table test_tbl3\G
*************************** 1. row ***************************
       Table: test_tbl3
Create Table: CREATE TABLE test_tbl3 (
  id int(11) NOT NULL COMMENT "",
  data_array array<int(11)> NOT NULL COMMENT "",
  data_json json NOT NULL COMMENT "",
  -- After modification, the data types and expressions of the generated columns are as follows:
  newcol1 array<int(11)> NULL AS example_db.test_tbl3.data_array COMMENT "",
  newcol2 varchar(65533) NULL AS json_string(json_query(example_db.test_tbl3.data_json, '$.b')) COMMENT ""
) ENGINE=OLAP
PRIMARY KEY(id)
DISTRIBUTED BY HASH(id)
PROPERTIES (...);
1 row in set (0.00 sec)
```
Query the table data after modification. The result shows that CelerData recalculates the values of the generated columns `newcol1` and `newcol2` based on the modified expressions.

```sql
MySQL [example_db]> select * from test_tbl3;
+------+------------+------------------+---------+---------+
| id   | data_array | data_json        | newcol1 | newcol2 |
+------+------------+------------------+---------+---------+
|    1 | [1,2]      | {"a": 1, "b": 2} | [1,2]   | 2       |
+------+------------+------------------+---------+---------+
1 row in set (0.01 sec)
```
Drop a generated column
Drop column `newcol1` from the table `test_tbl3`.

```sql
ALTER TABLE test_tbl3 DROP COLUMN newcol1;
```
NOTICE:
If the expression of a generated column references a regular column, you cannot directly drop or modify that regular column. Instead, first drop the generated column, and then drop or modify the regular column, as in the sketch below.
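For instance, with `test_tbl3` as originally created, dropping the regular column `data_json` directly would fail because the generated column `newcol2` references it; the generated column must be dropped first (a sketch based on the rule above):

```sql
-- Fails while newcol2 still references data_json.
ALTER TABLE test_tbl3 DROP COLUMN data_json;

-- Works: drop the generated column first, then the regular column.
ALTER TABLE test_tbl3 DROP COLUMN newcol2;
ALTER TABLE test_tbl3 DROP COLUMN data_json;
```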
Query rewrites
If the expression in a query matches the expression of a generated column, the optimizer automatically rewrites the query to directly read the values of the generated column.
Suppose that you create a table `test_tbl4` with the following schema:

```sql
CREATE TABLE test_tbl4
(
    id INT NOT NULL,
    data_array ARRAY<int> NOT NULL,
    data_json JSON NOT NULL,
    newcol1 DOUBLE AS array_avg(data_array),
    newcol2 String AS json_string(json_query(data_json, "$.a"))
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH(id);
```
If you query the table `test_tbl4` by using the statement `SELECT array_avg(data_array), json_string(json_query(data_json, "$.a")) FROM test_tbl4;`, the query references only the regular columns `data_array` and `data_json`. However, the expressions in the query match the expressions of the generated columns `newcol1` and `newcol2`. In this case, the execution plan shows that the CBO automatically rewrites the query to read the values of the generated columns `newcol1` and `newcol2`.

```sql
MySQL [example_db]> EXPLAIN SELECT array_avg(data_array), json_string(json_query(data_json, "$.a")) FROM test_tbl4;
+---------------------------------------+
| Explain String                        |
+---------------------------------------+
| PLAN FRAGMENT 0                       |
|  OUTPUT EXPRS:4: newcol1 | 5: newcol2 |   -- The query is rewritten to read the generated columns newcol1 and newcol2.
|   PARTITION: RANDOM                   |
|                                       |
|   RESULT SINK                         |
|                                       |
|   0:OlapScanNode                      |
|      TABLE: test_tbl4                 |
|      PREAGGREGATION: ON               |
|      partitions=0/1                   |
|      rollup: test_tbl4                |
|      tabletRatio=0/0                  |
|      tabletList=                      |
|      cardinality=1                    |
|      avgRowSize=2.0                   |
+---------------------------------------+
15 rows in set (0.00 sec)
```
Partial updates and generated columns
To perform partial updates on a Primary Key table, you must specify all the regular columns referenced by the generated columns in the `columns` parameter. The following example uses Stream Load to perform partial updates.
Create a table named `test_tbl5` with five columns, of which `newcol1` and `newcol2` are generated columns whose values are computed by the specified expressions from the regular columns `data_array` and `data_json`, respectively. Insert a data row into the table.

```sql
-- Create a table.
CREATE TABLE test_tbl5
(
    id INT NOT NULL,
    data_array ARRAY<int> NOT NULL,
    data_json JSON NULL,
    newcol1 DOUBLE AS array_avg(data_array),
    newcol2 String AS json_string(json_query(data_json, "$.a"))
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH(id);

-- Insert a data row.
INSERT INTO test_tbl5 (id, data_array, data_json) VALUES (1, [1,2], parse_json('{"a" : 1, "b" : 2}'));

-- Query the table.
MySQL [example_db]> select * from test_tbl5;
+------+------------+------------------+---------+---------+
| id   | data_array | data_json        | newcol1 | newcol2 |
+------+------------+------------------+---------+---------+
|    1 | [1,2]      | {"a": 1, "b": 2} |     1.5 | 1       |
+------+------------+------------------+---------+---------+
1 row in set (0.01 sec)
```
Prepare a CSV file named `my_data1.csv` to update some of the columns in the `test_tbl5` table.

```plaintext
1|[3,4]|{"a": 3, "b": 4}
2|[3,4]|{"a": 3, "b": 4}
```
Use Stream Load with the `my_data1.csv` file to update some of the columns of the `test_tbl5` table. Set `partial_update:true` and specify all the regular columns referenced by the generated columns in the `columns` parameter.

```bash
curl --location-trusted -u <username>:<password> -H "label:1" \
    -H "column_separator:|" \
    -H "partial_update:true" \
    -H "columns:id,data_array,data_json" \
    -T my_data1.csv -XPUT \
    http://<fe_host>/api/example_db/test_tbl5/_stream_load
```
Query the table data.

```sql
MySQL [example_db]> select * from test_tbl5;
+------+------------+------------------+---------+---------+
| id   | data_array | data_json        | newcol1 | newcol2 |
+------+------------+------------------+---------+---------+
|    1 | [3,4]      | {"a": 3, "b": 4} |     3.5 | 3       |
|    2 | [3,4]      | {"a": 3, "b": 4} |     3.5 | 3       |
+------+------------+------------------+---------+---------+
2 rows in set (0.01 sec)
```
Stream Load returns an error if you perform partial updates without specifying all the regular columns referenced by the generated columns.

Prepare a CSV file named `my_data2.csv`.

```plaintext
1|[3,4]
2|[3,4]
```
When you perform partial updates by using Stream Load with the `my_data2.csv` file, no values for the `data_json` column are provided in `my_data2.csv`, and the `columns` parameter in the Stream Load job does not include the `data_json` column. In this case, even though the `data_json` column allows null values, Stream Load returns an error because the `data_json` column is referenced by the generated column `newcol2`.
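For reference, a Stream Load job like the following (a sketch mirroring the earlier command, with `data_json` omitted from the `columns` parameter) would trigger that error:

```bash
# Fails: data_json, which the generated column newcol2 references,
# is missing from the columns parameter.
curl --location-trusted -u <username>:<password> -H "label:2" \
    -H "column_separator:|" \
    -H "partial_update:true" \
    -H "columns:id,data_array" \
    -T my_data2.csv -XPUT \
    http://<fe_host>/api/example_db/test_tbl5/_stream_load
```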