Unique Key table

When you create a table, you can define primary key columns and metric columns. This way, queries return the most recent record among a group of records that have the same primary key. Compared with the Duplicate Key table, the Unique Key table simplifies the data loading process to better support real-time and frequent data updates.

Scenarios

The Unique Key table is suitable for business scenarios in which data needs to be frequently updated in real time. For example, in e-commerce scenarios, hundreds of millions of orders can be placed per day, and the statuses of the orders frequently change.

Principle

The Unique Key table can be considered a special Aggregate table in which the REPLACE aggregate function is specified for metric columns to return the most recent record among a group of records that have the same primary key.
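
As a rough illustration of this equivalence, the following sketch declares an Aggregate table whose metric columns all use REPLACE. The table name orders_agg_sketch and its columns are hypothetical, and the statement assumes that the AGGREGATE KEY syntax and the REPLACE aggregate function are available in your environment. A Unique Key table declared with UNIQUE KEY on the same key columns is expected to behave in a similar way.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE IF NOT EXISTS orders_agg_sketch (
    create_time DATE NOT NULL COMMENT "key column",
    order_id BIGINT NOT NULL COMMENT "key column",
    -- REPLACE keeps the value from the most recently loaded record
    order_state INT REPLACE COMMENT "metric column",
    total_price BIGINT REPLACE COMMENT "metric column"
)
AGGREGATE KEY(create_time, order_id)
DISTRIBUTED BY HASH(order_id) BUCKETS 8;
```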

When you load data into a Unique Key table, the data is split into multiple batches, and each batch is assigned a version number. Records with the same primary key may therefore exist in multiple versions, and queries retrieve only the most recent version (that is, the record with the largest version number).

As shown in the following table, ID is the primary key column, value is a metric column, and _version holds the data version numbers generated within CelerData. In this example, the record with an ID of 1 is loaded by two batches whose version numbers are 1 and 2, and the record with an ID of 2 is loaded by three batches whose version numbers are 3, 4, and 5.

ID	value	_version
1	100	1
1	101	2
2	100	3
2	101	4
2	102	5

When you query the record with an ID of 1, the most recent record with the largest version number, which is 2 in this case, is returned. When you query the record with an ID of 2, the most recent record with the largest version number, which is 5 in this case, is returned. The following table shows the records returned by the two queries:

ID	value
1	101
2	102
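
The behavior above can be reproduced with a small sketch. The table name version_demo is hypothetical, and the example assumes that each INSERT statement is committed as a separate load batch. The _version column is internal, so it is neither declared nor queried here.

```sql
-- Hypothetical table mirroring the ID/value example above.
CREATE TABLE IF NOT EXISTS version_demo (
    id BIGINT NOT NULL COMMENT "primary key column",
    `value` BIGINT COMMENT "metric column"
)
UNIQUE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 8;

-- Each statement is assumed to be loaded as its own batch.
INSERT INTO version_demo VALUES (1, 100);
INSERT INTO version_demo VALUES (1, 101);
INSERT INTO version_demo VALUES (2, 100);
INSERT INTO version_demo VALUES (2, 101);
INSERT INTO version_demo VALUES (2, 102);

-- Only the most recent record per id is visible to queries.
SELECT id, `value` FROM version_demo ORDER BY id;
```

Based on the tables above, the query should return (1, 101) and (2, 102).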

Create a table

In e-commerce scenarios, you often need to collect and analyze the statuses of orders by date. In this example, create a table named orders to hold the orders. Define create_time and order_id, which are frequently used as filter conditions, as primary key columns, and define the other two columns, order_state and total_price, as metric columns. This way, the orders can be updated in real time as their statuses change, and can be quickly filtered to accelerate queries.

The statement for creating the table is as follows:

CREATE TABLE IF NOT EXISTS orders (
    create_time DATE NOT NULL COMMENT "create time of an order",
    order_id BIGINT NOT NULL COMMENT "id of an order",
    order_state INT COMMENT "state of an order",
    total_price BIGINT COMMENT "price of an order"
)
UNIQUE KEY(create_time, order_id)
DISTRIBUTED BY HASH(order_id) BUCKETS 8;

Usage notes

Take note of the following points about the primary key of a table:
- The primary key is defined by using the UNIQUE KEY keyword.
- The primary key must be created on columns on which unique constraints are enforced and whose names cannot be changed.
- The primary key must be properly designed:
  - When queries are run, primary key columns are filtered before the aggregation of multiple data versions, whereas metric columns are filtered after the aggregation of multiple data versions. Therefore, we recommend that you identify the columns that are frequently used as filter conditions and define these columns as primary key columns. This way, data filtering can start before the aggregation of multiple data versions to improve query performance.
  - During the aggregation process, CelerData compares all primary key columns. This is time-consuming and may decrease query performance. Therefore, do not define a large number of primary key columns. If a column is rarely used as a filter condition for queries, we recommend that you do not define the column as a primary key column.

When you create a table, you cannot create BITMAP indexes or Bloom Filter indexes on the metric columns of the table.

The Unique Key table does not support materialized views.
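
To make the primary key design guidance concrete, the following queries are a sketch against the orders table created earlier. The literal values are made up for illustration. The first query filters on primary key columns, so rows can be filtered before data versions are aggregated. The second filters on a metric column, so the filter is applied only after aggregation.

```sql
-- Filter on primary key columns (applied before version aggregation)
SELECT order_id, order_state, total_price
FROM orders
WHERE create_time = '2024-01-01' AND order_id = 1001;

-- Filter on a metric column (applied after version aggregation)
SELECT order_id, total_price
FROM orders
WHERE order_state = 2;
```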

What to do next

After a table is created, you can use the Load from S3 method to load data into CelerData.

When you load data into a Unique Key table, you can only update all columns of the table. For example, when you update the preceding orders table, you must update all of its columns: create_time, order_id, order_state, and total_price.

When you query data from a Unique Key table, CelerData needs to aggregate records of multiple data versions, and a large number of data versions decreases query performance. Therefore, we recommend that you load data into the table at a frequency that meets your requirements for real-time data analytics without generating a large number of data versions. For example, if you require minute-level data, you can specify a loading frequency of 1 minute instead of 1 second.
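
The full-row update requirement can be sketched with INSERT statements against the orders table. INSERT is used here only for brevity, on the assumption that it is available in your environment, and the values are made up for illustration.

```sql
-- Initial load of an order (all four columns supplied)
INSERT INTO orders (create_time, order_id, order_state, total_price)
VALUES ('2024-01-01', 1001, 1, 5000);

-- Update the order state: every column is supplied again,
-- including the unchanged total_price
INSERT INTO orders (create_time, order_id, order_state, total_price)
VALUES ('2024-01-01', 1001, 2, 5000);

-- The query is expected to see only the most recent version of the row
SELECT * FROM orders
WHERE create_time = '2024-01-01' AND order_id = 1001;
```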
