- Release Notes
- Introduction to CelerData Cloud Serverless
- Quick Start
- Sign up for CelerData Cloud Serverless
- A quick tour of the console
- Connect to CelerData Cloud Serverless
- Create an IAM integration
- Create and assign a warehouse
- Create an external catalog
- Load data from cloud storage
- Load data from Apache Kafka/Confluent Cloud
- Try your first query
- Invite new users
- Design data access control policy
- Warehouses
- Catalog, database, table, view, and MV
- Overview of database objects
- Catalog
- Table types
- Asynchronous materialized views
- Data Loading
- Data access control
- Networking and private connectivity
- Usage and Billing
- Organization and Account
- Integration
- Query Acceleration
- Reference
- AWS IAM policies
- Information Schema
- Overview
- be_bvars
- be_cloud_native_compactions
- be_compactions
- character_sets
- collations
- column_privileges
- columns
- engines
- events
- global_variables
- key_column_usage
- load_tracking_logs
- loads
- materialized_views
- partitions
- pipe_files
- pipes
- referential_constraints
- routines
- schema_privileges
- schemata
- session_variables
- statistics
- table_constraints
- table_privileges
- tables
- tables_config
- task_runs
- tasks
- triggers
- user_privileges
- views
- Data Types
- System Metadatabase
- Keywords
- SQL Statements
- Account Management
- Data Definition
- CREATE TABLE
- ALTER TABLE
- DROP CATALOG
- CREATE TABLE LIKE
- REFRESH EXTERNAL TABLE
- RESTORE
- SET CATALOG
- DROP TABLE
- RECOVER
- USE
- CREATE MATERIALIZED VIEW
- DROP DATABASE
- ALTER MATERIALIZED VIEW
- DROP REPOSITORY
- CANCEL RESTORE
- DROP INDEX
- DROP MATERIALIZED VIEW
- CREATE DATABASE
- CREATE TABLE AS SELECT
- BACKUP
- CANCEL BACKUP
- CREATE REPOSITORY
- CREATE INDEX
- Data Manipulation
- INSERT
- SHOW CREATE DATABASE
- SHOW BACKUP
- SHOW ALTER MATERIALIZED VIEW
- SHOW CATALOGS
- SHOW CREATE MATERIALIZED VIEW
- SELECT
- SHOW ALTER
- SHOW MATERIALIZED VIEW
- RESUME ROUTINE LOAD
- ALTER ROUTINE LOAD
- SHOW TABLES
- STREAM LOAD
- SHOW PARTITIONS
- CANCEL REFRESH MATERIALIZED VIEW
- SHOW CREATE CATALOG
- SHOW ROUTINE LOAD TASK
- SHOW RESTORE
- CREATE ROUTINE LOAD
- STOP ROUTINE LOAD
- SHOW DATABASES
- BROKER LOAD
- SHOW ROUTINE LOAD
- PAUSE ROUTINE LOAD
- SHOW SNAPSHOT
- SHOW CREATE TABLE
- CANCEL LOAD
- REFRESH MATERIALIZED VIEW
- SHOW REPOSITORIES
- SHOW LOAD
- Administration
- DESCRIBE
- SQL Functions
- Function List
- String Functions
- CONCAT
- HEX
- LOWER
- SPLIT
- LPAD
- SUBSTRING
- PARSE_URL
- INSTR
- REPEAT
- LCASE
- REPLACE
- HEX_DECODE_BINARY
- RPAD
- SPLIT_PART
- STRCMP
- SPACE
- CHARACTER_LENGTH
- URL_ENCODE
- APPEND_TAILING_CHAR_IF_ABSENT
- LTRIM
- HEX_DECODE_STRING
- URL_DECODE
- LEFT
- STARTS_WITH
- CONCAT
- GROUP_CONCAT
- STR_TO_MAP
- STRLEFT
- STRRIGHT
- MONEY_FORMAT
- RIGHT
- SUBSTRING_INDEX
- UCASE
- TRIM
- FIND_IN_SET
- RTRIM
- ASCII
- UPPER
- REVERSE
- LENGTH
- UNHEX
- ENDS_WITH
- CHAR_LENGTH
- NULL_OR_EMPTY
- LOCATE
- CHAR
- Predicate Functions
- Map Functions
- Binary Functions
- Geospatial Functions
- Lambda Expression
- Utility Functions
- Bitmap Functions
- BITMAP_SUBSET_LIMIT
- TO_BITMAP
- BITMAP_AGG
- BITMAP_FROM_STRING
- BITMAP_OR
- BITMAP_REMOVE
- BITMAP_AND
- BITMAP_TO_BASE64
- BITMAP_MIN
- BITMAP_CONTAINS
- SUB_BITMAP
- BITMAP_UNION
- BITMAP_COUNT
- BITMAP_UNION_INT
- BITMAP_XOR
- BITMAP_UNION_COUNT
- BITMAP_HAS_ANY
- BITMAP_INTERSECT
- BITMAP_AND_NOT
- BITMAP_TO_STRING
- BITMAP_HASH
- INTERSECT_COUNT
- BITMAP_EMPTY
- BITMAP_MAX
- BASE64_TO_ARRAY
- BITMAP_TO_ARRAY
- Struct Functions
- Aggregate Functions
- RETENTION
- MI
- MULTI_DISTINCT_SUM
- WINDOW_FUNNEL
- STDDEV_SAMP
- GROUPING_ID
- HLL_HASH
- AVG
- HLL_UNION_AGG
- COUNT
- BITMAP
- HLL_EMPTY
- SUM
- MAX_BY
- PERCENTILE_CONT
- COVAR_POP
- PERCENTILE_APPROX
- HLL_RAW_AGG
- STDDEV
- CORR
- COVAR_SAMP
- MIN_BY
- MAX
- VAR_SAMP
- STD
- HLL_UNION
- APPROX_COUNT_DISTINCT
- MULTI_DISTINCT_COUNT
- VARIANCE
- ANY_VALUE
- COUNT_IF
- GROUPING
- PERCENTILE_DISC
- Array Functions
- ARRAY_CUM_SUM
- ARRAY_MAX
- ARRAY_LENGTH
- ARRAY_REMOVE
- UNNEST
- ARRAY_SLICE
- ALL_MATCH
- ARRAY_CONCAT
- ARRAY_SORT
- ARRAY_POSITION
- ARRAY_DIFFERENCE
- ARRAY_CONTAINS
- ARRAY_JOIN
- ARRAY_INTERSECT
- CARDINALITY
- ARRAY_CONTAINS_ALL
- ARRAYS_OVERLAP
- ARRAY_MIN
- ARRAY_MAP
- ELEMENT_AT
- ARRAY_APPEND
- ARRAY_SORTBY
- ARRAY_TO_BITMAP
- ARRAY_GENERATE
- ARRAY_AVG
- ARRAY_FILTER
- ANY_MATCH
- REVERSE
- ARRAY_AGG
- ARRAY_DISTINCT
- ARRAY_SUM
- Condition Functions
- Math Functions
- Date and Time Functions
- DAYNAME
- MINUTE
- FROM_UNIXTIME
- HOUR
- MONTHNAME
- MONTHS_ADD
- ADD_MONTHS
- DATE_SUB
- PREVIOUS_DAY
- TO_TERA_DATA
- MINUTES_SUB
- WEEKS_ADD
- HOURS_DIFF
- UNIX_TIMESTAMP
- DAY
- DATE_SLICE
- DATE
- CURTIME
- SECONDS_SUB
- MONTH
- WEEK
- TO_DATE
- TIMEDIFF
- MONTHS_DIFF
- STR_TO_JODATIME
- WEEK_ISO
- MICROSECONDS_SUB
- TIME_SLICE
- MAKEDATE
- DATE_TRUNC
- JODATIME
- DAYOFWEEK
- YEARS_SUB
- TIMESTAMP_ADD
- HOURS_SUB
- STR2DATE
- TIMESTAMP
- FROM_DAYS
- WEEK_OF_YEAR
- YEAR
- TIMESTAMP_DIFF
- TO_TERA_TIMESTAMP
- DAYOFMONTH
- DAYOFYEAR
- DATE_FORMAT
- MONTHS_SUB
- NEXT_DAY
- MINUTES_DIFF
- DATA_ADD
- MINUTES_ADD
- CURDATE
- DAY_OF_WEEK_ISO
- CURRENt_TIMESTAMP
- STR_TO_DATE
- LAST_DAY
- WEEKS_SUB
- TO_DAYS
- DATEDIFF
- NOW
- TO_ISO8601
- TIME_TO_SEC
- QUARTER
- SECONDS_DIFF
- UTC_TIMESTAMP
- DATA_DIFF
- SECONDS_ADD
- ADDDATE
- WEEKSDIFF
- CONVERT_TZ
- MICROSECONDS_ADD
- SECOND
- YEARS_DIFF
- YEARS_ADD
- HOURS_ADD
- DAYS_SUB
- DAYS_DIFF
- Cryptographic Functions
- Percentile Functions
- Bit Functions
- JSON Functions
- Hash Functions
- Scalar Functions
- Table Functions
System variables
CelerData Cloud Serverless provides many system variables that can be changed to suit your requirements. This section describes the variables supported by CelerData.
The variables in CelerData refer to the variable sets in MySQL, but some variables are only compatible with the MySQL client protocol and do not function on the MySQL database.
NOTE
Any user has the privilege to run SHOW VARIABLES and make a variable take effect at session level. However, only users with the SYSTEM-level OPERATE privilege can make a variable take effect globally. Globally effective variables do not affect the current session, only subsequent new sessions.
View variables
You can view all or some variables by using SHOW VARIABLES [LIKE 'xxx']
. Example:
-- Show all variables in the system.
SHOW VARIABLES;
-- Show variables that match a certain pattern.
SHOW VARIABLES LIKE '%time_zone%';
Set variables
Set variables globally or for a single session
You can set variables to take effect globally or only on the current session. When set to global, the new value will be used for subsequent new sessions, while the current session still uses the original value. When set to "current session only", the variable will only take effect on the current session.
A variable set by SET var_name = xxx;
only takes effect for the current session. Example:
SET query_mem_limit = 137438953472;
SET forward_to_master = true;
SET time_zone = "Asia/Shanghai";
A variable set by SET GLOBAL var_name = xxx;
takes effect globally. Example:
SET GLOBAL query_mem_limit = 137438953472;
Variables that can only be set globally effective include:
- activate_all_roles_on_login
- character_set_database
- default_rowset_type
- enable_query_queue_select
- enable_query_queue_statistic
- enable_query_queue_load
- init_connect
- lower_case_table_names
- license
- language
- query_cache_size
- query_queue_fresh_resource_usage_interval_ms
- query_queue_concurrency_limit
- query_queue_mem_used_pct_limit
- query_queue_cpu_used_permille_limit
- query_queue_pending_timeout_second
- query_queue_max_queued_queries
- system_time_zone
- version_comment
- version
In addition, variable settings also support constant expressions, such as:
SET query_mem_limit = 10 * 1024 * 1024 * 1024;
SET forward_to_master = concat('tr', 'u', 'e');
Set variables in a single query statement
In some scenarios, you may need to set variables specifically for certain queries. By using the SET_VAR
hint, you can set session variables that will take effect only within a single statement. Example:
SELECT /*+ SET_VAR(query_mem_limit = 8589934592) */ name FROM people ORDER BY name;
SELECT /*+ SET_VAR(query_timeout = 1) */ sleep(3);
NOTE
SET_VAR
can only be placed after theSELECT
keyword and enclosed in/*+...*/
.
You can also set multiple variables in a single statement. Example:
SELECT /*+ SET_VAR
(
exec_mem_limit = 515396075520,
query_timeout=10000000,
batch_size=4096,
parallel_fragment_exec_instance_num=32
)
*/ * FROM TABLE;
Descriptions of variables
The variables are described in alphabetical order. Variables with the global
label can only take effect globally. Other variables can take effect either globally or for a single session.
activate_all_roles_on_login (global)
Whether to enable all roles (including default roles and granted roles) for a CelerData user when the user connects to the CelerData cloud account.
- If enabled (true), all roles of the user are activated at user login.
- If disabled (false), the roles set by SET DEFAULT ROLE are activated.
Default value: false.
auto_increment_increment
Used for MySQL client compatibility. No practical usage.
autocommit
Used for MySQL client compatibility. No practical usage.
batch_size
Used to specify the number of rows of a single packet transmitted by each node during query execution. The default is 1024, i.e., every 1024 rows of data generated by the source node is packaged and sent to the destination node. A larger number of rows will improve the query throughput in large data volume scenarios, but may increase the query latency in small data volume scenarios. Also, it may increase the memory overhead of the query. We recommend to set
batch_size
between 1024 to 4096.catalog
Used to specify the catalog to which the session belongs.
cbo_enable_low_cardinality_optimize
Whether to enable low cardinality optimization. After this feature is enabled, the performance of querying STRING columns improves by about three times. Default value: true.
character_set_database (global)
The character set supported by CelerData. Only UTF8 (
utf8
) is supported.connector_io_tasks_per_scan_operator
The maximum number of concurrent I/O tasks that can be issued by a scan operator during external table queries. The value is an integer. Default value: 16.
Currently, CelerData can adaptively adjust the number of concurrent I/O tasks when querying external tables. This feature is controlled by the variable
enable_connector_adaptive_io_tasks
, which is enabled by default.count_distinct_column_buckets
The number of buckets for the COUNT DISTINCT column in a group-by-count-distinct query. This variable takes effect only when
enable_distinct_column_bucketization
is set totrue
. Default value: 1024.default_rowset_type (global)
Global variable. Used to set the default storage format used by the storage engine of the computing node. The currently supported storage formats are
alpha
andbeta
.default_table_compression
The default compression algorithm for table storage. Supported compression algorithms are
snappy, lz4, zlib, zstd
. Default value: lz4_frame.Note that if you specified the
compression
property in a CREATE TABLE statement, the compression algorithm specified bycompression
takes effect.disable_colocate_join
Used to control whether the Colocation Join is enabled. The default value is
false
, meaning the feature is enabled. When this feature is disabled, query planning will not attempt to execute Colocation Join.disable_streaming_preaggregations
Used to enable the streaming pre-aggregations. The default value is
false
, meaning it is enabled.div_precision_increment
Used for MySQL client compatibility. No practical usage.
enable_connector_adaptive_io_tasks
Whether to adaptively adjust the number of concurrent I/O tasks when querying external tables. Default value: true.
If this feature is not enabled, you can manually set the number of concurrent I/O tasks using the variable
connector_io_tasks_per_scan_operator
.enable_distinct_column_bucketization
Whether to enable bucketization for the COUNT DISTINCT colum in a group-by-count-distinct query. Use the
select a, count(distinct b) from t group by a;
query as an example. If the GROUP BY columa
is a low-cardinality column and the COUNT DISTINCT columnb
is a high-cardinality column which has severe data skew, performance bottleneck will occur. In this situation, you can split data in the COUNT DISTINCT column into multiple buckets to balance data and prevent data skew.Default value: false, which means this feature is disabled. You must use this variable with the variable
count_distinct_column_buckets
.You can also enable bucketization for the COUNT DISTINCT column by adding the
skew
hint to your query, for example,select a,count(distinct [skew] b) from t group by a;
.enable_iceberg_metadata_cache
Whether to cache pointers and partition names for Iceberg tables. From v3.2.1 to v3.2.3, this parameter is set to
true
by default, regardless of what metastore service is used. In v3.2.4 and later, if the Iceberg cluster uses AWS Glue as metastore, this parameter still defaults totrue
. However, if the Iceberg cluster uses other metastore service such as Hive metastore, this parameter defaults tofalse
.enable_insert_strict
Used to enable the strict mode when loading data using the INSERT statement. The default value is
true
, indicating the strict mode is enabled by default.enable_materialized_view_union_rewrite
Boolean value to control whether to enable materialized view Union query rewrite. Default:
true
.enable_rule_based_materialized_view_rewrite
Boolean value to control whether to enable rule-based materialized view query rewrite. This variable is mainly used in single-table query rewrite. Default:
true
.enable_spill
Whether to enable intermediate result spilling. Default:
false
. If it is set totrue
, CelerData spills the intermediate results to disk to reduce the memory usage when processing aggregate, sort, or join operators in queries.enable_profile
Specifies whether to send the profile of a query for analysis. The default value is
false
, which means no profile is required.By default, a profile is sent to the FE only when a query error occurs in the BE. Profile sending causes network overhead and therefore affects high concurrency.
If you need to analyze the profile of a query, you can set this variable to
true
. After the query is completed, the profile can be viewed on the web page of the currently connected FE (address:fe_host:fe_http_port/query
). This page displays the profiles of the latest 100 queries withenable_profile
turned on.enable_query_queue_load (global)
Boolean value to enable query queues for loading tasks. Default:
false
.enable_query_queue_select (global)
Boolean value to enable query queues for SELECT queries. Default:
false
.enable_query_queue_statistic (global)
Boolean value to enable query queues for statistics queries.
enable_scan_block_cache
Specifies whether to enable the Data Cache feature. After this feature is enabled, CelerData caches hot data read from external storage systems into blocks, which accelerates queries and analysis.
enable_populate_block_cache
Specifies whether to cache data blocks read from external storage systems in CelerData. If you do not want to cache data blocks read from external storage systems, set this variable to
false
. Default value: true.enable_tablet_internal_parallel
Whether to enable adaptive parallel scanning of tablets. After this feature is enabled, multiple threads can be used to scan one tablet by segment, increasing the scan concurrency. Default value: true.
enable_query_cache
Specifies whether to enable the Query Cache feature. Valid values: true and false.
true
specifies to enable this feature, andfalse
specifies to disable this feature.enable_adaptive_sink_dop
Specifies whether to enable adaptive parallelism for data loading. After this feature is enabled, the system automatically sets load parallelism for INSERT INTO and Broker Load jobs, which is equivalent to the mechanism of
pipeline_dop
.enable_pipeline_engine
Specifies whether to enable the pipeline execution engine.
true
indicates enabled andfalse
indicates the opposite. Default value:true
.enable_sort_aggregate
Specifies whether to enable sorted streaming.
true
indicates sorted streaming is enabled to sort data in data streams.enable_global_runtime_filter
Whether to enable global runtime filter (RF for short). RF filters data at runtime. Data filtering often occurs in the Join stage. During multi-table joins, optimizations such as predicate pushdown are used to filter data, in order to reduce the number of scanned rows for Join and the I/O in the Shuffle stage, thereby speeding up the query.
CelerData offers two types of RF: Local RF and Global RF. Local RF is suitable for Broadcast Hash Join and Global RF is suitable for Shuffle Join.
Default value:
true
, which means global RF is enabled. If this feature is disabled, global RF does not take effect. Local RF can still work.enable_multicolumn_global_runtime_filter
Whether to enable multi-column global runtime filter. Default value:
false
, which means multi-column global RF is disabled.If a Join (other than Broadcast Join and Replicated Join) has multiple equi-join conditions:
- If this feature is disabled, only Local RF works.
- If this feature is enabled, multi-column Global RF takes effect and carries
multi-column
in the partition by clause.
event_scheduler
Used for MySQL client compatibility. No practical usage.
force_streaming_aggregate
Used to control whether the aggregation node enables streaming aggregation for computing. The default value is false, meaning the feature is not enabled.
hash_join_push_down_right_table
Used to control whether the data of the left table can be filtered by using the filter condition against the right table in the Join query. If so, it can reduce the amount of data that needs to be processed during the query.
true
indicates the operation is allowed and the system decides whether the left table can be filtered.false
indicates the operation is disabled. The default value istrue
.init_connect (global)
Used for MySQL client compatibility. No practical usage.
interactive_timeout
Used for MySQL client compatibility. No practical usage.
io_tasks_per_scan_operator
The number of concurrent I/O tasks that can be issued by a scan operator. Increase this value if you want to access remote storage systems such as HDFS or S3 but the latency is high. However, a larger value causes more memory consumption.
The value is an integer. Default value: 4.
language (global)
Used for MySQL client compatibility. No practical usage.
license (global)
Displays the license of CelerData.
load_mem_limit
Specifies the memory limit for the import operation. The default value is 0, meaning that this variable is not used and
query_mem_limit
is used instead.This variable is only used for the
INSERT
operation which involves both query and import. If the user does not set this variable, the memory limit for both query and import will be set asexec_mem_limit
. Otherwise, the memory limit for query will be set asexec_mem_limit
and the memory limit for import will be asload_mem_limit
.Other import methods such as
BROKER LOAD
still useexec_mem_limit
for memory limit.lower_case_table_names (global)
Used for MySQL client compatibility. No practical usage. Table names in CelerData are case-sensitive.
max_allowed_packet
Used for compatibility with the JDBC connection pool C3P0. This variable specifies the maximum size of packets that can be transmitted between the client and server. Default value: 32 MB. Unit: Byte. You can raise this value if the client reports "PacketTooBigException".
max_scan_key_num
The maximum number of scan key segmented by each query. Default value: -1, indicating that the value in the
be.conf
file is used. If this variable is set to a value greater than 0, the value inbe.conf
is ignored.max_pushdown_conditions_per_column
The maximum number of predicates that can be pushed down for a column. Default value: -1, indicating that the value in the
be.conf
file is used. If this variable is set to a value greater than 0, the value inbe.conf
is ignored.nested_mv_rewrite_max_level
The maximum levels of nested materialized views that can be used for query rewrite. Type: INT. Range: [1, +∞). The value of
1
indicates that only materialized views created on base tables can be used for query rewrite. Default:3
.net_buffer_length
Used for MySQL client compatibility. No practical usage.
net_read_timeout
Used for MySQL client compatibility. No practical usage.
net_write_timeout
Used for MySQL client compatibility. No practical usage.
new_planner_optimize_timeout
The timeout duration of the query optimizer. When the optimizer times out, an error is returned and the query is stopped, which affects the query performance. You can set this variable to a larger value based on your query or contact CelerData technical support for troubleshooting. A timeout often occurs when a query has too many joins.
Default value: 3000. Unit: seconds.
parallel_exchange_instance_num
Used to set the number of exchange nodes that an upper-level node uses to receive data from a lower-level node in the execution plan. The default value is -1, meaning the number of exchange nodes is equal to the number of execution instances of the lower-level node. When this variable is set to be greater than 0 but smaller than the number of execution instances of the lower-level node, the number of exchange nodes equals the set value.
In a distributed query execution plan, the upper-level node usually has one or more exchange nodes to receive data from the execution instances of the lower-level node on different BEs. Usually the number of exchange nodes is equal to the number of execution instances of the lower-level node.
In some aggregation query scenarios where the amount of data decreases drastically after aggregation, you can try to modify this variable to a smaller value to reduce the resource overhead. An example would be running aggregation queries using the Duplicate Key table.
parallel_fragment_exec_instance_num
Used to set the number of instances used to scan nodes on each BE. The default value is 1.
A query plan typically produces a set of scan ranges. This data is distributed across multiple BE nodes. A BE node will have one or more scan ranges, and by default, each BE node's set of scan ranges is processed by only one execution instance. When machine resources suffice, you can increase this variable to allow more execution instances to process a scan range simultaneously for efficiency purposes.
The number of scan instances determines the number of other execution nodes in the upper level, such as aggregation nodes and join nodes. Therefore, it increases the concurrency of the entire query plan execution. Modifying this variable will help improve efficiency, but larger values will consume more machine resources, such as CPU, memory, and disk IO.
performance_schema
Used for compatibility with MySQL JDBC versions 8.0.16 and above. No practical usage.
prefer_compute_node
Specifies whether the FEs distribute query execution plans to CN nodes. Valid values:
- true: indicates that the FEs distribute query execution plans to CN nodes.
- false: indicates that the FEs do not distribute query execution plans to CN nodes.
pipeline_dop
The parallelism of a pipeline instance, which is used to adjust the query concurrency. Default value: 0, indicating the system automatically adjusts the parallelism of each pipeline instance. You can also set this variable to a value greater than 0. Generally, set the value to half the number of physical CPU cores.
CelerData adaptively adjusts this variable based on query parallelism.
pipeline_profile_level
Controls the level of the query profile. A query profile often has five layers: Fragment, FragmentInstance, Pipeline, PipelineDriver, and Operator. Different levels provide different details of the profile:
- 0: CelerData combines metrics of the profile and shows only a few core metrics.
- 1: default value. CelerData simplifies the profile and combines metrics of the profile to reduce profile layers.
- 2: CelerData retains all the layers of the profile. The profile size is large in this scenario, especially when the SQL query is complex. This value is not recommended.
query_cache_entry_max_bytes
The threshold for triggering the Passthrough mode. Valid values: 0 to 9223372036854775807. When the number of bytes or rows from the computation results of a specific tablet accessed by a query exceeds the threshold specified by
query_cache_entry_max_bytes
orquery_cache_entry_max_rows
, the query is switched to Passthrough mode.query_cache_entry_max_rows
The upper limit of rows that can be cached. See the description in
query_cache_entry_max_bytes
. Default value: 409600.query_cache_agg_cardinality_limit
The upper limit of cardinality for GROUP BY in Query Cache. Query Cache is not enabled if the rows generated by GROUP BY exceeds this value. Default value: 5000000. If
query_cache_entry_max_bytes
orquery_cache_entry_max_rows
is set to 0, the Passthrough mode is used even when no computation results are generated from the involved tablets.query_cache_size (global)
Used for MySQL client compatibility. No practical use.
query_cache_type
Used for compatibility with JDBC connection pool C3P0. No practical use.
query_mem_limit
Used to set the memory limit of a query on each backend node. The default value is 0, which means no limit for it. Units including
B/K/KB/M/MB/G/GB/T/TB/P/PB
are supported.When the
Memory Exceed Limit
error happens, you could try to increase this variable.query_queue_concurrency_limit (global)
The upper limit of concurrent queries on a BE. It takes effect only after being set greater than
0
. Default:0
.query_queue_cpu_used_permille_limit (global)
The upper limit of CPU usage permille (CPU usage * 1000) on a BE. It takes effect only after being set greater than
0
. Default:0
. Range: [0, 1000]query_queue_max_queued_queries (global)
The upper limit of queries in a queue. When this threshold is reached, incoming queries are rejected. It takes effect only after being set greater than
0
. Default:0
.query_queue_mem_used_pct_limit (global)
The upper limit of memory usage percentage on a BE. It takes effect only after being set greater than
0
. Default:0
. Range: [0, 1]query_queue_pending_timeout_second (global)
The maximum timeout of a pending query in a queue. When this threshold is reached, the corresponding query is rejected. Unit: second. Default:
300
.query_timeout
Used to set the query timeout in "seconds". This variable will act on all query statements in the current connection, as well as INSERT statements. The default value is 300 seconds. Value range: [1, 259200].
range_pruner_max_predicate
The maximum number of IN predicates that can be used for Range partition pruning. Default value: 100. A value larger than 100 may cause the system to scan all tablets, which compromises the query performance.
rewrite_count_distinct_to_bitmap_hll
Used to decide whether to rewrite count distinct queries to bitmap_union_count and hll_union_agg.
runtime_filter_on_exchange_node
Whether to place GRF on Exchange Node after GRF is pushed down across the Exchange operator to a lower-level operator. The default value is
false
, which means GRF will not be placed on Exchange Node after it is pushed down across the Exchange operator to a lower-level operator. This prevents repetitive use of GRF and reduces the computation time.However, GRF delivery is a "try-best" process. If the lower-level operator fails to receive the GRF but the GRF is not placed on Exchange Node, data cannot be filtered, which compromises filter performance.
true
means GRF will still be placed on Exchange Node even after it is pushed down across the Exchange operator to a lower-level operator.runtime_join_filter_push_down_limit
The maximum number of rows allowed for the Hash table based on which Bloom filter Local RF is generated. Local RF will not be generated if this value is exceeded. This variable prevents the generation of an excessively long Local RF.
The value is an integer. Default value: 1024000.
spill_mode
The execution mode of intermediate result spilling. Valid values:
auto
: Spilling is automatically triggered when the memory usage threshold is reached.force
: CelerData forcibly executes spilling for all relevant operators, regardless of memory usage.
This variable takes effect only when the variable
enable_spill
is set totrue
.SQL_AUTO_IS_NULL
Used for compatibility with the JDBC connection pool C3P0. No practical usage.
sql_dialect
The SQL dialect that is used. For example, you can run the
set sql_dialect = 'trino';
command to set the SQL dialect to Trino, so you can use Trino-specific SQL syntax and functions in your queries.NOTICE
After you configure CelerData to use the Trino dialect, identifiers in queries are not case-sensitive by default. Therefore, you must specify names in lowercase for your databases and tables at database and table creation. If you specify database and table names in uppercase, queries against these databases and tables will fail.
sql_mode
Used to specify the SQL mode to accommodate certain SQL dialects.
sql_safe_updates
Used for MySQL client compatibility. No practical usage.
sql_select_limit
Used for MySQL client compatibility. No practical usage.
storage_engine
The types of engines supported by CelerData:
- olap: CelerData system-owned engine.
- mysql: MySQL external tables.
- broker: Access external tables through a broker program.
- elasticsearch or es: Elasticsearch external tables.
- hive: Hive external tables.
- iceberg: Iceberg external tables.
- hudi: Hudi external tables.
- jdbc: external table for JDBC-compatible databases.
streaming_preaggregation_mode
Used to specify the preaggregation mode for the first phase of GROUP BY. If the preaggregation effect in the first phase is not satisfactory, you can use the streaming mode, which performs simple data serialization before streaming data to the destination. Valid values:
auto
: The system first tries local preaggregation. If the effect is not satisfactory, it switches to the streaming mode. This is the default value.force_preaggregation
: The system directly performs local preaggregation.force_streaming
: The system directly performs streaming.
system_time_zone
Used to display the time zone of the current system. Cannot be changed.
time_zone
Used to set the time zone of the current session. The time zone can affect the results of certain time functions.
tx_isolation
Used for MySQL client compatibility. No practical usage.
use_compute_nodes
The maximum number of CN nodes that can be used. This variable is valid when
prefer_compute_node=true
. Valid values:-1
: indicates that all CN nodes are used.0
: indicates that no CN nodes are used.
use_v2_rollup
Used to control the query to fetch data using the rollup index of the segment v2 storage format. This variable is used for validation when going online with segment v2. It is not recommended for other cases.
version (global)
Used for MySQL client compatibility. No practical usage.
version_comment (global)
Used to display the version of CelerData. Cannot be changed.
wait_timeout
Used to set the connection timeout for idle connections. When an idle connection does not interact with CelerData for that length of time, CelerData will actively disconnect the link. The default value is 8 hours, in seconds.