JSON

This topic describes the basic concepts of JSON. It also describes how to create a JSON column, load JSON data, query JSON data, and use JSON functions and operators to construct and process JSON data.

What is JSON

JSON is a lightweight data-interchange format designed for semi-structured data. It presents data in a hierarchical tree structure that is flexible and easy to read and write, which suits a wide range of data storage and analytics scenarios. JSON supports NULL values and the following data types: NUMBER, STRING, BOOLEAN, ARRAY, and OBJECT.

For more information about JSON, visit the JSON website. For information about the input and output syntax of JSON, see JSON specifications at RFC 7159.

CelerData supports both storage and efficient querying and analytics of JSON data. CelerData does not directly store the input text. Instead, it stores JSON data in a binary format to reduce the cost of parsing and increase query efficiency.

Use JSON data

Create a JSON column

When you create a table, you can use the JSON keyword to specify the j column as a JSON column.

CREATE TABLE `tj` (
    `id` INT(11) NOT NULL COMMENT "",
    `j`  JSON NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`id`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`id`)
PROPERTIES (
    "replication_num" = "3",
    "storage_format" = "DEFAULT"
);

Load data and store the data as JSON data

CelerData provides the following methods for you to load data and store the data as JSON data:

Method 1: Use INSERT INTO to write data to a JSON column of a table. In the following example, a table named tj is used, and the j column of the table is a JSON column.

INSERT INTO tj (id, j) VALUES (1, parse_json('{"a": 1, "b": true}'));
INSERT INTO tj (id, j) VALUES (2, parse_json('{"a": 2, "b": false}'));
INSERT INTO tj (id, j) VALUES (3, parse_json('{"a": 3, "b": true}'));
INSERT INTO tj (id, j) VALUES (4, json_object('a', 4, 'b', false));

The parse_json function can interpret STRING data as JSON data. The json_object function can construct a JSON object or convert an existing table to a JSON file. For more information, see parse_json and json_object.
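As a small sketch of the two construction paths described above, the following statements build JSON values without touching the tj table. The keys and values are made up for illustration, and json_array is assumed to behave as described in the JSON functions reference.

```sql
-- Parse a STRING literal into a JSON value (here, a JSON array).
SELECT parse_json('[1, 2, 3]');

-- Build a JSON object from key-value pairs; json_array supplies a nested array.
-- The key names 'a' and 'tags' are illustrative only.
SELECT json_object('a', 5, 'tags', json_array('x', 'y'));
```

Either expression can be used in the VALUES list of an INSERT INTO statement that targets a JSON column.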

Method 2: Use Broker Load to load a Parquet file and store the file as JSON data. For more information, see Broker Load.

CelerData supports the following data type conversions when loading Parquet files.

Data type of Parquet file	JSON data type
INTEGER (INT8, INT16, INT32, INT64, UINT8, UINT16, UINT32, and UINT64)	NUMBER
FLOAT and DOUBLE	NUMBER
BOOLEAN	BOOLEAN
STRING	STRING
MAP	OBJECT
STRUCT	OBJECT
LIST	ARRAY
Other data types such as UNION and TIMESTAMP	Not supported

Method 3: Use Routine Load to continuously load JSON data from a Confluent cluster.

Query and process JSON data

CelerData supports the querying and processing of JSON data and the use of JSON functions and operators.

In the following examples, a table named tj is used, and the j column of the table is specified as the JSON column.

mysql> select * from tj;
+------+----------------------+
| id   |          j           |
+------+----------------------+
| 1    | {"a": 1, "b": true}  |
| 2    | {"a": 2, "b": false} |
| 3    | {"a": 3, "b": true}  |
| 4    | {"a": 4, "b": false} |
+------+----------------------+

Example 1: Filter the table to retrieve the rows that meet the id = 1 filter condition.

mysql> select * from tj where id = 1;
+------+---------------------+
| id   |           j         |
+------+---------------------+
| 1    | {"a": 1, "b": true} |
+------+---------------------+

Example 2: Filter data of the JSON column j to retrieve the data that meets the specified filter condition.

j->'a' returns JSON data. You can compare it with a SQL value directly, as in the first query below (note that an implicit conversion is performed in this case). Alternatively, you can convert the JSON data to INT by using the CAST function and then compare the data, as in the second query.

mysql> select * from tj where j->'a' = 1;
+------+---------------------+
| id   | j                   |
+------+---------------------+
|    1 | {"a": 1, "b": true} |
+------+---------------------+

mysql> select * from tj where cast(j->'a' as INT) = 1;
+------+---------------------+
| id   | j                   |
+------+---------------------+
|    1 | {"a": 1, "b": true} |
+------+---------------------+

Example 3: Use the CAST function to convert the values in the JSON column of the table to BOOLEAN values. Then, filter the data of the JSON column to retrieve the data that meets the specified filter condition.

mysql> select * from tj where cast(j->'b' as boolean);
+------+---------------------+
|  id  |          j          |
+------+---------------------+
| 1    | {"a": 1, "b": true} |
| 3    | {"a": 3, "b": true} |
+------+---------------------+

Example 4: Use the CAST function to convert the values in the JSON column of the table to BOOLEAN values. Then, filter the data of the JSON column to retrieve the data that meets the specified filter condition, and perform arithmetic operations on the data.

mysql> select cast(j->'a' as int) from tj where cast(j->'b' as boolean);
+-----------------------+
|  CAST(j->'a' AS INT)  |
+-----------------------+
|          3            |
|          1            |
+-----------------------+

mysql> select sum(cast(j->'a' as int)) from tj where cast(j->'b' as boolean);
+----------------------------+
| sum(CAST(j->'a' AS INT))   |
+----------------------------+
|              4             |
+----------------------------+

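Example 5: Sort the data of the table by a value extracted from the JSON column. Because ORDER BY does not accept a JSON column directly, the value is converted to INT by using the CAST function first.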

mysql> select * from tj
    ->        where j->'a' <= parse_json('3')
    ->        order by cast(j->'a' as int);
+------+----------------------+
| id   | j                    |
+------+----------------------+
|    1 | {"a": 1, "b": true}  |
|    2 | {"a": 2, "b": false} |
|    3 | {"a": 3, "b": true}  |
+------+----------------------+
3 rows in set (0.05 sec)

JSON functions and operators

You can use JSON functions and operators to construct and process JSON data. For more information, see Overview of JSON functions and operators.
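The following is a minimal sketch of a few of these functions applied to the tj table used earlier. It assumes that json_query, json_exists, json_length, and json_keys accept the arguments shown here; check the function reference for the exact signatures and return types before relying on it.

```sql
-- Inspect the JSON column j of the tj table.
-- '$.a' and '$.b' are JSON path expressions that address top-level keys.
SELECT
    id,
    json_query(j, '$.a')  AS a_value,    -- value under key "a", returned as JSON
    json_exists(j, '$.b') AS has_b,      -- whether key "b" is present
    json_length(j)        AS key_count,  -- number of elements in the object
    json_keys(j)          AS key_names   -- top-level keys as a JSON array
FROM tj;
```

The j->'a' operator used in the earlier examples extracts the same element as json_query(j, '$.a'), so the two forms can be mixed in one query.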

Limits and usage notes

The maximum length of a JSON value is 16 MB.
The ORDER BY, GROUP BY, and JOIN clauses do not support direct references to JSON columns. To sort, group, or join on a JSON value, use the CAST function to convert it to a SQL data type first. For more information, see cast.
JSON columns are supported in Duplicate Key, Primary Key, and Unique Key tables. They are not supported in Aggregate tables.
JSON columns cannot be used as partition keys, bucketing keys, or dimension columns of Duplicate Key, Primary Key, and Unique Key tables.
CelerData allows you to use the following JSON comparison operators to query JSON data: <, <=, >, >=, =, and !=. It does not allow you to use IN to query JSON data.
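The limits above can usually be worked around by casting the extracted JSON value to a SQL type. The sketch below reuses the tj table from the earlier examples and is one possible approach, not the only one.

```sql
-- GROUP BY cannot reference a JSON value directly,
-- so group on the value after casting it to BOOLEAN.
SELECT cast(j->'b' AS BOOLEAN) AS b, count(*) AS cnt
FROM tj
GROUP BY cast(j->'b' AS BOOLEAN);

-- IN is not allowed on JSON data,
-- so cast the extracted value to INT and apply IN to the SQL value.
SELECT *
FROM tj
WHERE cast(j->'a' AS INT) IN (1, 3);
```

Casting in the query keeps the stored data as JSON. If a value is filtered or grouped on frequently, storing it in its own typed column at load time may be simpler.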
