# AUTO_INCREMENT

CelerData supports the `AUTO_INCREMENT` column attribute, which can simplify data management. This topic introduces the application scenarios, usage, and features of the `AUTO_INCREMENT` column attribute.
## Introduction

When a new data row is loaded into a table and no values are specified for the `AUTO_INCREMENT` column, CelerData automatically assigns an integer value to the row's `AUTO_INCREMENT` column as its unique ID across the table. Subsequent values for the `AUTO_INCREMENT` column automatically increase at a specific step, starting from that row's ID. An `AUTO_INCREMENT` column can be used to simplify data management and speed up some queries. Here are some application scenarios of an `AUTO_INCREMENT` column:
- Serve as primary keys: An `AUTO_INCREMENT` column can be used as the primary key to ensure that each row has a unique ID, making it easy to query and manage data.
- Join tables: When multiple tables are joined, an `AUTO_INCREMENT` column can be used as the join key, which can expedite queries compared to joining on a column whose data type is STRING, for example, a UUID (see the sketch after this list).
- Count the number of distinct values in a high-cardinality column: An `AUTO_INCREMENT` column can be used to represent the unique-value column in a dictionary. Compared to directly counting distinct STRING values, counting distinct integer values of the `AUTO_INCREMENT` column can sometimes improve the query speed by several times or even tens of times.
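Below is a minimal sketch of the join-key scenario. The `dim_users` and `fact_orders` tables, their columns, and the query are hypothetical illustrations, not part of this topic's example set:

```sql
-- Hypothetical dimension table: CelerData generates user_id automatically.
CREATE TABLE dim_users
(
    user_uuid STRING NOT NULL,                 -- original UUID identifier
    user_id   BIGINT NOT NULL AUTO_INCREMENT   -- compact surrogate join key
)
PRIMARY KEY (user_uuid)
DISTRIBUTED BY HASH(user_uuid)
PROPERTIES ("replicated_storage" = "true");

-- A hypothetical fact table (fact_orders) stores the BIGINT user_id instead
-- of the UUID string, so the join compares 8-byte integers rather than
-- long strings:
SELECT o.order_amount, u.user_uuid
FROM fact_orders AS o
JOIN dim_users AS u ON o.user_id = u.user_id;
```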
You need to specify an `AUTO_INCREMENT` column in the CREATE TABLE statement. The data type of an `AUTO_INCREMENT` column must be BIGINT. The value for an `AUTO_INCREMENT` column can be implicitly assigned or explicitly specified. It starts from 1 and increments by 1 for each new row.
## Basic operations

### Specify an AUTO_INCREMENT column at table creation

Create a table named `test_tbl1` with two columns, `id` and `number`. Specify the column `number` as the `AUTO_INCREMENT` column.
```sql
CREATE TABLE test_tbl1
(
    id BIGINT NOT NULL,
    number BIGINT NOT NULL AUTO_INCREMENT
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH(id)
PROPERTIES ("replicated_storage" = "true");
```
### Assign values for the AUTO_INCREMENT column

#### Assign values implicitly

When you load data into a table, you do not need to specify values for the `AUTO_INCREMENT` column. CelerData automatically assigns unique integer values to that column and inserts them into the table.
```sql
INSERT INTO test_tbl1 (id) VALUES (1);
INSERT INTO test_tbl1 (id) VALUES (2);
INSERT INTO test_tbl1 (id) VALUES (3),(4),(5);
```
View the data in the table.

```plain
mysql > SELECT * FROM test_tbl1 ORDER BY id;
+------+--------+
| id   | number |
+------+--------+
|    1 |      1 |
|    2 |      2 |
|    3 |      3 |
|    4 |      4 |
|    5 |      5 |
+------+--------+
5 rows in set (0.02 sec)
```
When you load data into a table, you can also specify the values for the `AUTO_INCREMENT` column as `DEFAULT`. CelerData automatically assigns unique integer values to that column and inserts them into the table.
```sql
INSERT INTO test_tbl1 (id, number) VALUES (6, DEFAULT);
```
View the data in the table.

```plain
mysql > SELECT * FROM test_tbl1 ORDER BY id;
+------+--------+
| id   | number |
+------+--------+
|    1 |      1 |
|    2 |      2 |
|    3 |      3 |
|    4 |      4 |
|    5 |      5 |
|    6 |      6 |
+------+--------+
6 rows in set (0.02 sec)
```
In actual usage, the following result may be returned when you view the data in the table. This is because CelerData cannot guarantee that the values for the `AUTO_INCREMENT` column are strictly monotonic. However, CelerData can guarantee that the values roughly increase in chronological order. For more information, see [Monotonicity](#monotonicity).
```plain
mysql > SELECT * FROM test_tbl1 ORDER BY id;
+------+--------+
| id   | number |
+------+--------+
|    1 |      1 |
|    2 | 100001 |
|    3 | 200001 |
|    4 | 200002 |
|    5 | 200003 |
|    6 | 200004 |
+------+--------+
6 rows in set (0.01 sec)
```
#### Specify values explicitly

You can also explicitly specify values for the `AUTO_INCREMENT` column and insert them into the table.
```sql
INSERT INTO test_tbl1 (id, number) VALUES (7, 100);
```

View the data in the table.

```plain
mysql > SELECT * FROM test_tbl1 ORDER BY id;
+------+--------+
| id   | number |
+------+--------+
|    1 |      1 |
|    2 | 100001 |
|    3 | 200001 |
|    4 | 200002 |
|    5 | 200003 |
|    6 | 200004 |
|    7 |    100 |
+------+--------+
7 rows in set (0.01 sec)
```
Moreover, explicitly specifying values does not affect the subsequent values generated by CelerData for newly inserted data rows.
```sql
INSERT INTO test_tbl1 (id) VALUES (8);
```

View the data in the table.

```plain
mysql > SELECT * FROM test_tbl1 ORDER BY id;
+------+--------+
| id   | number |
+------+--------+
|    1 |      1 |
|    2 | 100001 |
|    3 | 200001 |
|    4 | 200002 |
|    5 | 200003 |
|    6 | 200004 |
|    7 |    100 |
|    8 |      2 |
+------+--------+
8 rows in set (0.01 sec)
```
> **NOTICE**
>
> We recommend that you do not use implicitly assigned values and explicitly specified values for the `AUTO_INCREMENT` column at the same time, because the specified values may be the same as the values generated by CelerData, breaking the global uniqueness of auto-incremented IDs.
## Basic features

### Uniqueness

In general, CelerData guarantees that the values for an `AUTO_INCREMENT` column are globally unique across a table. We recommend that you do not implicitly assign and explicitly specify values for the `AUTO_INCREMENT` column at the same time; doing so may break the global uniqueness of auto-incremented IDs. Here is a simple example.

Create a table named `test_tbl2` with two columns, `id` and `number`. Specify the column `number` as the `AUTO_INCREMENT` column.
```sql
CREATE TABLE test_tbl2
(
    id BIGINT NOT NULL,
    number BIGINT NOT NULL AUTO_INCREMENT
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH(id)
PROPERTIES ("replicated_storage" = "true");
```
Implicitly assign and explicitly specify values for the `AUTO_INCREMENT` column `number` in the table `test_tbl2`.
```sql
INSERT INTO test_tbl2 (id, number) VALUES (1, DEFAULT);
INSERT INTO test_tbl2 (id, number) VALUES (2, 2);
INSERT INTO test_tbl2 (id) VALUES (3);
```
Query the table `test_tbl2`.
```plain
mysql > SELECT * FROM test_tbl2 ORDER BY id;
+------+--------+
| id   | number |
+------+--------+
|    1 |      1 |
|    2 |      2 |
|    3 | 100001 |
+------+--------+
3 rows in set (0.08 sec)
```
### Monotonicity

To improve the performance of allocating auto-incremented IDs, BEs cache some auto-incremented IDs locally. In this situation, CelerData cannot guarantee that the values for the `AUTO_INCREMENT` column are strictly monotonic. It can only guarantee that the values roughly increase in chronological order.
> **NOTE**
>
> The number of auto-incremented IDs cached by the BEs is determined by the FE dynamic parameter `auto_increment_cache_size`, which defaults to `100,000`. You can modify the value by using `ADMIN SET FRONTEND CONFIG ("auto_increment_cache_size" = "xxx");`.
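For instance, the following is a brief sketch of checking and then adjusting this parameter; the `LIKE` pattern and the new value shown are illustrative only.

```sql
-- Check the current value of the FE dynamic parameter.
ADMIN SHOW FRONTEND CONFIG LIKE "auto_increment_cache_size";

-- Illustrative value: a smaller cache shrinks the gaps between the ID ranges
-- cached by different BEs, at the cost of more frequent allocation requests.
ADMIN SET FRONTEND CONFIG ("auto_increment_cache_size" = "50000");
```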
For example, a CelerData cluster has one FE node and two BE nodes. Create a table named `test_tbl3` and insert five rows of data as follows:
```sql
CREATE TABLE test_tbl3
(
    id BIGINT NOT NULL,
    number BIGINT NOT NULL AUTO_INCREMENT
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH(id)
PROPERTIES ("replicated_storage" = "true");

INSERT INTO test_tbl3 VALUES (1, DEFAULT);
INSERT INTO test_tbl3 VALUES (2, DEFAULT);
INSERT INTO test_tbl3 VALUES (3, DEFAULT);
INSERT INTO test_tbl3 VALUES (4, DEFAULT);
INSERT INTO test_tbl3 VALUES (5, DEFAULT);
```
The auto-incremented IDs in the table `test_tbl3` do not monotonically increase, because the two BE nodes each cache a range of auto-incremented IDs: [1, 100000] and [100001, 200000], respectively. When data is loaded by using multiple INSERT statements, the data is sent to different BE nodes, each of which allocates auto-incremented IDs independently. Therefore, it cannot be guaranteed that the auto-incremented IDs are strictly monotonic.
```plain
mysql > SELECT * FROM test_tbl3 ORDER BY id;
+------+--------+
| id   | number |
+------+--------+
|    1 |      1 |
|    2 | 100001 |
|    3 | 200001 |
|    4 |      2 |
|    5 | 100002 |
+------+--------+
5 rows in set (0.07 sec)
```
## Partial updates and the AUTO_INCREMENT column

This section explains how to update only a few specified columns in a table that contains an `AUTO_INCREMENT` column.

> **NOTE**
>
> Currently, only Primary Key tables support partial updates.

### The AUTO_INCREMENT column is the primary key

You need to specify the primary key during partial updates. Therefore, if the `AUTO_INCREMENT` column is the primary key or part of the primary key, the user behavior for partial updates is exactly the same as when no `AUTO_INCREMENT` column is defined.
1. Create a table `test_tbl4` in the database `example_db` and insert one data row.

   ```sql
   -- Create a table.
   CREATE TABLE test_tbl4
   (
       id BIGINT AUTO_INCREMENT,
       name BIGINT NOT NULL,
       job1 BIGINT NOT NULL,
       job2 BIGINT NOT NULL
   )
   PRIMARY KEY (id, name)
   DISTRIBUTED BY HASH(id)
   PROPERTIES ("replicated_storage" = "true");

   -- Prepare data.
   mysql > INSERT INTO test_tbl4 (id, name, job1, job2) VALUES (0, 0, 1, 1);
   Query OK, 1 row affected (0.04 sec)
   {'label':'insert_6af28e77-7d2b-11ed-af6e-02424283676b', 'status':'VISIBLE', 'txnId':'152'}

   -- Query the table.
   mysql > SELECT * FROM test_tbl4 ORDER BY id;
   +------+------+------+------+
   | id   | name | job1 | job2 |
   +------+------+------+------+
   |    0 |    0 |    1 |    1 |
   +------+------+------+------+
   1 row in set (0.01 sec)
   ```
2. Prepare the CSV file `my_data4.csv` to update the table `test_tbl4`. The CSV file includes values for the `AUTO_INCREMENT` column and does not include values for the column `job1`. The primary key of the first row already exists in the table `test_tbl4`, while the primary key of the second row does not.

   ```plain
   0,0,99
   1,1,99
   ```
3. Run a Stream Load job and use the CSV file to update the table `test_tbl4`.

   ```bash
   curl --location-trusted -u <username>:<password> -H "label:1" \
       -H "column_separator:," \
       -H "partial_update:true" \
       -H "columns:id,name,job2" \
       -T my_data4.csv -XPUT \
       http://<fe_host>/api/example_db/test_tbl4/_stream_load
   ```
4. Query the updated table. The first row of data already exists in the table `test_tbl4`, and the value for the column `job1` remains unchanged. The second row of data is newly inserted, and because the default value for the column `job1` is not specified, the partial update framework directly sets the value for this column to `0`.

   ```plain
   mysql > SELECT * FROM test_tbl4 ORDER BY id;
   +------+------+------+------+
   | id   | name | job1 | job2 |
   +------+------+------+------+
   |    0 |    0 |    1 |   99 |
   |    1 |    1 |    0 |   99 |
   +------+------+------+------+
   2 rows in set (0.01 sec)
   ```
### The AUTO_INCREMENT column is not the primary key

If the `AUTO_INCREMENT` column is not the primary key or part of the primary key, and auto-incremented IDs are not provided in a Stream Load job, the following situations occur:

- If the row already exists in the table, CelerData does not update the auto-incremented ID.
- If the row is newly loaded into the table, CelerData generates a new auto-incremented ID.
This feature can be used to build a dictionary table for quickly computing distinct STRING values.
1. In the database `example_db`, create a table named `test_tbl5`, specify the column `job1` as the `AUTO_INCREMENT` column, and insert one data row into the table `test_tbl5`.

   ```sql
   -- Create a table.
   CREATE TABLE test_tbl5
   (
       id BIGINT NOT NULL,
       name BIGINT NOT NULL,
       job1 BIGINT NOT NULL AUTO_INCREMENT,
       job2 BIGINT NOT NULL
   )
   PRIMARY KEY (id, name)
   DISTRIBUTED BY HASH(id)
   PROPERTIES ("replicated_storage" = "true");

   -- Prepare data.
   mysql > INSERT INTO test_tbl5 VALUES (0, 0, -1, -1);
   Query OK, 1 row affected (0.04 sec)
   {'label':'insert_458d9487-80f6-11ed-ae56-aa528ccd0ebf', 'status':'VISIBLE', 'txnId':'94'}

   -- Query the table.
   mysql > SELECT * FROM test_tbl5 ORDER BY id;
   +------+------+------+------+
   | id   | name | job1 | job2 |
   +------+------+------+------+
   |    0 |    0 |   -1 |   -1 |
   +------+------+------+------+
   1 row in set (0.01 sec)
   ```
2. Prepare a CSV file `my_data5.csv` to update the table `test_tbl5`. The CSV file does not contain values for the `AUTO_INCREMENT` column `job1`. The primary key of the first row already exists in the table, while the primary keys of the second and third rows do not.

   ```plain
   0,0,99
   1,1,99
   2,2,99
   ```
3. Run a Stream Load job to load data from the CSV file into the table `test_tbl5`.

   ```bash
   curl --location-trusted -u <username>:<password> -H "label:2" \
       -H "column_separator:," \
       -H "partial_update:true" \
       -H "columns: id,name,job2" \
       -T my_data5.csv -XPUT \
       http://<fe_host>/api/example_db/test_tbl5/_stream_load
   ```
4. Query the updated table. The first row of data already exists in the table `test_tbl5`, so the `AUTO_INCREMENT` column `job1` retains its original value. The second and third rows of data are newly inserted, so CelerData generates new values for the `AUTO_INCREMENT` column `job1`.

   ```plain
   mysql > SELECT * FROM test_tbl5 ORDER BY id;
   +------+------+--------+------+
   | id   | name | job1   | job2 |
   +------+------+--------+------+
   |    0 |    0 |     -1 |   99 |
   |    1 |    1 |      1 |   99 |
   |    2 |    2 | 100001 |   99 |
   +------+------+--------+------+
   3 rows in set (0.01 sec)
   ```
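Tying this back to the dictionary-table use case mentioned before the example: the sketch below uses hypothetical names (`uuid_dict`, `order_uuid`, `order_id`, `fact_tbl`) to outline the overall pattern of assigning each distinct STRING value a compact BIGINT ID and counting distinct integers instead of strings.

```sql
-- Hypothetical dictionary table: each distinct STRING value receives a
-- compact BIGINT ID generated by the AUTO_INCREMENT column.
CREATE TABLE uuid_dict
(
    order_uuid STRING NOT NULL,               -- high-cardinality STRING value
    order_id   BIGINT NOT NULL AUTO_INCREMENT -- compact integer ID
)
PRIMARY KEY (order_uuid)
DISTRIBUTED BY HASH(order_uuid)
PROPERTIES ("replicated_storage" = "true");

-- Load new keys with a partial-update Stream Load (as with test_tbl5 above)
-- so that keys already present in the dictionary keep their existing IDs.

-- Downstream tables can then store order_id instead of order_uuid, and
-- distinct counts run over integers (fact_tbl is hypothetical):
SELECT COUNT(DISTINCT order_id) FROM fact_tbl;
```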
## Limits

- When a table with an `AUTO_INCREMENT` column is created, `'replicated_storage' = 'true'` must be set to ensure that all replicas have the same auto-incremented IDs.
- Each table can have only one `AUTO_INCREMENT` column.
- The data type of the `AUTO_INCREMENT` column must be BIGINT.
- The `AUTO_INCREMENT` column must be `NOT NULL` and cannot have a default value.
- You can delete data from a Primary Key table with an `AUTO_INCREMENT` column. However, if the `AUTO_INCREMENT` column is not the primary key or part of the primary key, you need to note the following limits when you delete data in these scenarios:
  - While the DELETE operation runs, there is also a load job for partial updates that contains only UPSERT operations. If both an UPSERT operation and the DELETE operation hit the same data row and the UPSERT operation is executed after the DELETE operation, the UPSERT operation may not take effect.
  - There is a load job for partial updates that includes multiple UPSERT and DELETE operations on the same data row. If an UPSERT operation is executed after a DELETE operation, the UPSERT operation may not take effect.
- Adding the `AUTO_INCREMENT` attribute by using ALTER TABLE is not supported.
- CelerData does not support specifying the starting value and step size for the `AUTO_INCREMENT` column.
## Keywords

AUTO_INCREMENT, AUTO INCREMENT