CREATE TABLE
Create a new table in StarRocks.
This operation requires the CREATE TABLE privilege on the destination database.
Syntax
CREATE [EXTERNAL] [TEMPORARY] TABLE [IF NOT EXISTS] [database.]table_name
(column_definition1[, column_definition2, ...]
[, index_definition1[, index_definition12,]])
[ENGINE = [olap|mysql|elasticsearch|hive|hudi|iceberg|jdbc]]
[key_desc]
[COMMENT "table comment"]
[partition_desc]
[distribution_desc]
[rollup_index]
[ORDER BY (column_name1,...)]
[PROPERTIES ("key"="value", ...)]
[BROKER PROPERTIES ("key"="value", ...)]
Parameters
- The table name, partition name, column name, and index name you create must follow the naming conventions in System limits.
- When you specify database name, table name, column name, or partition name, note that some literals are used as reserved keywords in StarRocks. Do not directly use these keywords in SQL statements. If you want to use such a keyword in an SQL statement, enclose it in a pair of backticks (`). See Keywords for these reserved keywords.
column_definition
Syntax:
col_name col_type [agg_type] [NULL | NOT NULL] [DEFAULT "default_value"] [AUTO_INCREMENT] [AS generation_expr]
col_name: column name.
Note that normally you cannot create a column whose name is initiated with __op
or __row
because these name formats are reserved for special purposes in StarRocks and creating such columns may result in undefined behavior. If you do need to create such column, set the FE dynamic parameter allow_system_reserved_names
to TRUE
.
col_type: Column type. Specific column information, such as types and ranges:
-
TINYINT (1 byte): Ranges from -2^7 + 1 to 2^7 - 1.
-
SMALLINT (2 bytes): Ranges from -2^15 + 1 to 2^15 - 1.
-
INT (4 bytes): Ranges from -2^31 + 1 to 2^31 - 1.
-
BIGINT (8 bytes): Ranges from -2^63 + 1 to 2^63 - 1.
-
LARGEINT (16 bytes): Ranges from -2^127 + 1 to 2^127 - 1.
-
FLOAT (4 bytes): Supports scientific notation.
-
DOUBLE (8 bytes): Supports scientific notation.
-
DECIMAL[(precision, scale)] (16 bytes)
-
Default value: DECIMAL(10, 0)
-
precision: 1 ~ 38
-
scale: 0 ~ precision
-
Integer part: precision - scale
Scientific notation is not supported.
-
-
DATE (3 bytes): Ranges from 0000-01-01 to 9999-12-31.
-
DATETIME (8 bytes): Ranges from 0000-01-01 00:00:00 to 9999-12-31 23:59:59.
-
CHAR[(length)]: Fixed length string. Range: 1 ~ 255. Default value: 1.
-
VARCHAR[(length)]: A variable-length string. The default value is 1. Unit: bytes. In versions earlier than StarRocks 2.1, the value range of
length
is 1–65533. [Preview] In StarRocks 2.1 and later versions, the value range oflength
is 1–1048576. -
HLL (1~16385 bytes): For HLL type, there's no need to specify length or default value. The length will be controlled within the system according to data aggregation. HLL column can only be queried or used by hll_union_agg, Hll_cardinality, and hll_hash.
-
BITMAP: Bitmap type does not require specified length or default value. It represents a set of unsigned bigint numbers. The largest element could be up to 2^64 - 1.
agg_type: aggregation type. If not specified, this column is key column. If specified, it is value column. The aggregation types supported are as follows:
- SUM, MAX, MIN, REPLACE
- HLL_UNION (only for HLL type)
- BITMAP_UNION(only for BITMAP)
- REPLACE_IF_NOT_NULL: This means the imported data will only be replaced when it is of non-null value. If it is of null value, StarRocks will retain the original value.
NOTE
- When the column of aggregation type BITMAP_UNION is imported, its original data types must be TINYINT, SMALLINT, INT, and BIGINT.
- If NOT NULL is specified by REPLACE_IF_NOT_NULL column when the table was created, StarRocks will still convert the data to NULL without sending an error report to the user. With this, the user can import selected columns.
This aggregation type applies ONLY to the Aggregate table whose key_desc type is AGGREGATE KEY. Since v3.1.9, REPLACE_IF_NOT_NULL
newly supports the columns of the BITMAP type.
NULL | NOT NULL: Whether the column is allowed to be NULL
. By default, NULL
is specified for all columns in a table that uses the Duplicate Key, Aggregate, or Unique Key table. In a table that uses the Primary Key table, by default, value columns are specified with NULL
, whereas key columns are specified with NOT NULL
. If NULL
values are included in the raw data, present them with \N
. StarRocks treats \N
as NULL
during data loading.
DEFAULT "default_value": the default value of a column. When you load data into StarRocks, if the source field mapped onto the column is empty, StarRocks automatically fills the default value in the column. You can specify a default value in one of the following ways:
- DEFAULT current_timestamp: Use the current time as the default value. For more information, see current_timestamp().
- DEFAULT
<default_value>
: Use a given value of the column data type as the default value. For example, if the data type of the column is VARCHAR, you can specify a VARCHAR string, such as beijing, as the default value, as presented inDEFAULT "beijing"
. Note that default values cannot be any of the following types: ARRAY, BITMAP, JSON, HLL, and BOOLEAN. - DEFAULT (<expr>): Use the result returned by a given function as the default value. Only the uuid() and uuid_numeric() expressions are supported.
AUTO_INCREMENT: specifies an AUTO_INCREMENT
column. The data types of AUTO_INCREMENT
columns must be BIGINT. Auto-incremented IDs start from 1 and increase at a step of 1. For more information about AUTO_INCREMENT
columns, see AUTO_INCREMENT. Since v3.0, StarRocks supports AUTO_INCREMENT
columns.
AS generation_expr: specifies the generated column and its expression. The generated column can be used to precompute and store the results of expressions, which significantly accelerates queries with the same complex expressions. Since v3.1, StarRocks supports generated columns.
index_definition
INDEX index_name (col_name[, col_name, ...]) [USING BITMAP] COMMENT 'xxxxxx'
For more information about parameter descriptions and usage notes, see Bitmap indexing.
ENGINE type
Default value: olap
. If this parameter is not specified, an OLAP table (StarRocks native table) is created by default.
Optional value: mysql
, elasticsearch
, hive
, jdbc
(2.3 and later), iceberg
, and hudi
(2.2 and later). If you want to create an external table to query external data sources, specify CREATE EXTERNAL TABLE
and set ENGINE
to any of these values. You can refer to External table for more information.
We recommend that you use catalogs to query data from Hive, Iceberg, Hudi, and JDBC data sources. External tables are deprecated.
From v3.1 onwards, StarRocks supports creating Parquet-formatted tables in Iceberg catalogs, and you can insert data to these Parquet-formatted Iceberg tables by using INSERT INTO.
From v3.2 onwards, StarRocks supports creating Parquet-formatted tables in Hive catalogs, and supports sinking data to these Parquet-formatted Hive tables by using INSERT INTO. From v3.3 onwards, StarRocks supports creating ORC- and Textfile-formatted tables in Hive catalogs, and supports sinking data to these ORC- and Textfile-formatted Hive tables by using INSERT INTO
-
For MySQL, specify the following properties:
PROPERTIES (
"host" = "mysql_server_host",
"port" = "mysql_server_port",
"user" = "your_user_name",
"password" = "your_password",
"database" = "database_name",
"table" = "table_name"
)Note:
"table_name" in MySQL should indicate the real table name. In contrast, "table_name" in CREATE TABLE statement indicates the name of this mysql table on StarRocks. They can either be different or the same.
The aim of creating MySQL tables in StarRocks is to access MySQL database. StarRocks itself does not maintain or store any MySQL data.
-
For Elasticsearch, specify the following properties:
PROPERTIES (
"hosts" = "http://192.168.0.1:8200,http://192.168.0.2:8200",
"user" = "root",
"password" = "root",
"index" = "tindex",
"type" = "doc"
)hosts
: the URL that is used to connect your Elasticsearch cluster. You can specify one or more URLs.user
: the account of the root user that is used to log in to your Elasticsearch cluster for which basic authentication is enabled.password
: the password of the preceding root account.index
: the index of the StarRocks table in your Elasticsearch cluster. The index name is the same as the StarRocks table name. You can set this parameter to the alias of the StarRocks table.type
: the type of index. The default value isdoc
.
-
For Hive, specify the following properties:
PROPERTIES (
"database" = "hive_db_name",
"table" = "hive_table_name",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083"
)Here, database is the name of the corresponding database in Hive table. Table is the name of Hive table.
hive.metastore.uris
is the server address. -
For JDBC, specify the following properties:
PROPERTIES (
"resource"="jdbc0",
"table"="dest_tbl"
)resource
is the JDBC resource name andtable
is the destination table. -
For Iceberg, specify the following properties:
PROPERTIES (
"resource" = "iceberg0",
"database" = "iceberg",
"table" = "iceberg_table"
)resource
is the Iceberg resource name.database
is the Iceberg database.table
is the Iceberg table. -
For Hudi, specify the following properties:
PROPERTIES (
"resource" = "hudi0",
"database" = "hudi",
"table" = "hudi_table"
)
key_desc
Syntax:
key_type(k1[,k2 ...])
Data is sequenced in specified key columns and has different attributes for different key types:
- AGGREGATE KEY: Identical content in key columns will be aggregated into value columns according to the specified aggregation type. It usually applies to business scenarios such as financial statements and multi-dimensional analysis.
- UNIQUE KEY/PRIMARY KEY: Identical content in key columns will be replaced in value columns according to the import sequence. It can be applied to make addition, deletion, modification and query on key columns.
- DUPLICATE KEY: Identical content in key columns, which also exists in StarRocks at the same time. It can be used to store detailed data or data with no aggregation attributes. DUPLICATE KEY is the default type. Data will be sequenced according to key columns.
NOTE
Value columns do not need to specify aggregation types when other key_type is used to create tables with the exception of AGGREGATE KEY.
COMMENT
You can add a table comment when you create a table, optional. Note that COMMENT must be placed after key_desc
. Otherwise, the table cannot be created.
From v3.1 onwards, you can modify the table comment suing ALTER TABLE <table_name> COMMENT = "new table comment"
.
partition_desc
Partition description can be used in the following ways:
Create partitions dynamically
Dynamic partitioning provides a time-to-live (TTL) management for partitions. StarRocks automatically creates new partitions in advance and removes expired partitions to ensure data freshness. To enable this feature, you can configure Dynamic partitioning related properties at table creation.
Create partitions one by one
Specify only the upper bound for a partition
Syntax:
PARTITION BY RANGE ( <partitioning_column1> [, <partitioning_column2>, ... ] )
PARTITION <partition1_name> VALUES LESS THAN ("<upper_bound_for_partitioning_column1>" [ , "<upper_bound_for_partitioning_column2>", ... ] )
[ ,
PARTITION <partition2_name> VALUES LESS THAN ("<upper_bound_for_partitioning_column1>" [ , "<upper_bound_for_partitioning_column2>", ... ] )
, ... ]
)
Note:
Please use specified key columns and specified value ranges for partitioning.
-
For the naming conventions of partitions, see System limits.
-
Before v3.3.0, columns for the range partitioning only support the following types: TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DATE, and DATETIME. Since v3.3.0, three specific time functions can be used as columns for the range partitioning. For detailed usage, see Data distribution.
-
Partitions are left closed and right open. The left boundary of the first partition is of minimum value.
-
NULL value is stored only in partitions that contain minimum values. When the partition containing the minimum value is deleted, NULL values can no longer be imported.
-
Partition columns can either be single columns or multiple columns. The partition values are the default minimum values.
-
When only one column is specified as the partitioning column, you can set
MAXVALUE
as the upper bound for the partitioning column of the most recent partition.PARTITION BY RANGE (pay_dt) (
PARTITION p1 VALUES LESS THAN ("20210102"),
PARTITION p2 VALUES LESS THAN ("20210103"),
PARTITION p3 VALUES LESS THAN MAXVALUE
)
Please note:
- Partitions are often used for managing data related to time.
- When data backtracking is needed, you may want to consider emptying the first partition for adding partitions later when necessary.
Specify both the lower and upper bounds for a partition
Syntax:
PARTITION BY RANGE ( <partitioning_column1> [, <partitioning_column2>, ... ] )
(
PARTITION <partition_name1> VALUES [( "<lower_bound_for_partitioning_column1>" [ , "<lower_bound_for_partitioning_column2>", ... ] ), ( "<upper_bound_for_partitioning_column1?" [ , "<upper_bound_for_partitioning_column2>", ... ] ) )
[,
PARTITION <partition_name2> VALUES [( "<lower_bound_for_partitioning_column1>" [ , "<lower_bound_for_partitioning_column2>", ... ] ), ( "<upper_bound_for_partitioning_column1>" [ , "<upper_bound_for_partitioning_column2>", ... ] ) )
, ...]
)
Note:
-
Fixed Range is more flexible than LESS THAN. You can customize the left and right partitions.
-
Fixed Range is the same as LESS THAN in the other aspects.
-
When only one column is specified as the partitioning column, you can set
MAXVALUE
as the upper bound for the partitioning column of the most recent partition.PARTITION BY RANGE (pay_dt) (
PARTITION p202101 VALUES [("20210101"), ("20210201")),
PARTITION p202102 VALUES [("20210201"), ("20210301")),
PARTITION p202103 VALUES [("20210301"), (MAXVALUE))
)
Create multiple partitions in a batch
Syntax
-
If the partitioning column is of a date type.
PARTITION BY RANGE (<partitioning_column>) (
START ("<start_date>") END ("<end_date>") EVERY (INTERVAL <N> <time_unit>)
) -
If the partitioning column is of an integer type.
PARTITION BY RANGE (<partitioning_column>) (
START ("<start_integer>") END ("<end_integer>") EVERY (<partitioning_granularity>)
)
Description
You can specify the start and end values in START()
and END()
and the time unit or partitioning granularity in EVERY()
to create multiple partitions in a batch.
- Before v3.3.0, columns for the range partitioning only support the following types: TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DATE, and DATETIME. Since v3.3.0, three specific time functions can be used as columns for the range partitioning. For detailed usage, see Data distribution.
- If the partitioning column is of a date type, you need to use the
INTERVAL
keyword to specify the time interval. You can specify the time unit as hour (since v3.0), day, week, month, or year. The naming conventions of partitions are the same as those for dynamic partitions.
For more information, see Data distribution.