What is CelerData Cloud Serverless?

CelerData Cloud Serverless is a fully managed, blazingly fast data lake analytics platform built on top of StarRocks.

By taking advantage of an architecture that features separation of storage and compute, CelerData brings users cost-effective compute resources for their analytical SQL workloads.

With its out-of-the-box data analytics infrastructure, CelerData provides timely insights to all stakeholders. CelerData has brought many engineering breakthroughs to the market, delivering over 3x performance gains in standard benchmarks and up to an 80% reduction in operating costs. A number of significant customers worldwide choose CelerData as their analytics platform.

CelerData Cloud Serverless supports a wide range of use cases within one platform:

Data lake analytics

Directly query the data in your own data lake without data migration. CelerData Cloud Serverless supports all mainstream open data lake formats, including Apache Hive™, Apache Hudi, Apache Iceberg, and Delta Lake.

Query acceleration

Accelerate the analytics workloads on your data lake using asynchronous materialized views.

Data warehousing

Ingest data from external data sources into CelerData Cloud Serverless to support many more low-latency and high-concurrency data analysis scenarios.

The following figure shows the architecture of CelerData.

[Figure: CelerData architecture]

Software editions in CelerData Cloud Serverless

CelerData Cloud Serverless offers two software editions to choose from, ensuring that your usage fits your organization’s specific requirements:

CelerData Cloud Serverless Standard Edition
CelerData Cloud Serverless Premium Edition

Premium Edition builds on Standard Edition by adding edition-specific features and higher levels of service. As your organization's needs change and grow, changing editions is easy.

When you create an account in CelerData, you must select the software edition for the account.

Standard Edition

Customer benefits:

Provides the compute engine for your data lake at minimum cost
Stores in open standard formats
Easy to query directly against AWS S3 or AWS Glue
One-click migration from Trino, Presto, or Athena
Provides a unified query layer to build reports based on multiple data sources

In Standard Edition, CelerData does not provide managed storage volumes. You can use CelerData as a query engine for low-latency interactive analysis of your data lake, but you cannot store data within CelerData.

Premium Edition

Customer benefits:

Everything included in Standard Edition
Higher performance queries for low-latency customer-facing workloads
Pipeline-free query acceleration using materialized views for simplified architecture/maintenance
Real-time low-latency workloads

In Premium Edition, CelerData provides managed storage, which means data can be stored persistently within CelerData instead of, or in addition to, data in your data lake. In this edition, CelerData can therefore serve as both a query engine and an analytical database. Importing data from your data lake into CelerData further accelerates analysis and supports business scenarios with more stringent latency and concurrency requirements.

Feature comparison

Feature	Standard Edition	Premium Edition
Integrate with an external metastore (AWS Glue or Hive metastore) and query data in external data systems (Apache Hudi, Apache Iceberg, Apache Hive, or Delta Lake)	✓	✓
Table creation and data processing in external data systems (Apache Hudi, Apache Iceberg, Apache Hive, or Delta Lake)	✓	✓
Query load isolation based on multi-warehousing	✓	✓
Warehouse compute node scale-in and scale-out	✓	✓
Warehouse auto-suspend	✓	✓
Use local-disk cache to speed up queries	✓	✓
Table creation in CelerData managed storage volumes		✓
Batch data ingestion from cloud storage (customer managed AWS S3) into CelerData managed tables		✓
Integrate with Confluent Cloud and routinely load data into CelerData managed tables		✓
Use HTTP Streaming API to push data from local sources into CelerData managed tables		✓
Incrementally load Parquet or ORC files from customer-managed AWS S3 buckets into CelerData managed tables		✓
Accelerate queries and build models with materialized views		✓
Integration with BI tools	✓	✓
Role-based access control	✓	✓
Audit log	✓	✓
Private link	✓	✓
IP address whitelist	✓	✓

Choosing between Standard and Premium Editions

Standard Edition

If you already have a data lake storage tier (for example, a data lake built on Iceberg, Hudi, Hive, or Delta Lake in your own AWS S3 bucket) and you are looking for a fully managed query engine that is compatible with it and delivers strong query performance, consider Standard Edition. You only need to create warehouses (collections of compute resources) to execute your SQL queries. In CelerData Cloud Serverless, warehouses are responsible for SQL computation, not storage. The underlying compute nodes do not store data persistently; they handle hot data caching and computation. You can scale computation horizontally by adding and removing warehouses, and you choose a warehouse size based on the trade-off between price and performance. With Standard Edition, you are charged only for the time that your warehouses run, so you can avoid unnecessary costs by suspending and resuming warehouses on demand.
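To make the effect of suspending warehouses concrete, here is a minimal sketch of run-time-based billing. It assumes that cost is simply running time multiplied by an hourly rate, with no minimum charge or rounding; the rate, the function names, and the sample schedule are all hypothetical and do not reflect actual CelerData pricing.

```python
from datetime import datetime, timedelta

# Hypothetical hourly rate for illustration only; real prices depend on
# the software edition and warehouse size and are not stated in this article.
HYPOTHETICAL_RATE_PER_HOUR = 4.0


def billable_hours(intervals):
    """Sum the hours a warehouse ran, given (resumed_at, suspended_at) pairs."""
    total = timedelta()
    for resumed_at, suspended_at in intervals:
        if suspended_at < resumed_at:
            raise ValueError("suspend time precedes resume time")
        total += suspended_at - resumed_at
    return total.total_seconds() / 3600


def compute_cost(intervals, rate_per_hour=HYPOTHETICAL_RATE_PER_HOUR):
    """Assumed model: cost is proportional to running time, with no rounding."""
    return billable_hours(intervals) * rate_per_hour


day = datetime(2024, 1, 1)

# The warehouse is resumed for two working sessions and suspended otherwise.
sessions = [
    (day + timedelta(hours=9), day + timedelta(hours=12)),
    (day + timedelta(hours=14), day + timedelta(hours=16, minutes=30)),
]

on_demand = compute_cost(sessions)
always_on = compute_cost([(day, day + timedelta(days=1))])
print(f"on demand: {on_demand:.2f}, always on: {always_on:.2f}")
```

Under these assumed numbers, the warehouse that runs only during the two sessions accrues 5.5 billable hours, compared with 24 hours for one left running all day.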

Premium Edition

If you have not built a storage tier yet and are looking for a fully managed data analytics platform that can persist data and provide high-performance analytics, choose Premium Edition. Premium Edition is also a good fit if you already have a storage tier, whether a streaming store (for example, Kafka), a data lake store (for example, AWS S3 or Apache Hive), or a database (for example, MySQL or PostgreSQL), and you want to load that data into a system with lower query latency to accelerate your analytics. In summary, Premium Edition provides data import and persistent storage capabilities, as well as the ability to query both CelerData managed data and external systems. Queries on CelerData managed data have lower latency.

Premium Edition is built on the shared-data architecture of StarRocks, which separates storage and compute resources. It uses object storage for data persistence, which offers almost unlimited storage capacity. With Premium Edition, you pay for both storage resources and warehouse compute consumption, and the hourly price is slightly higher than that of Standard Edition.

Pricing concern

In CelerData Cloud Serverless, the hourly warehouse price varies by software edition. We provide a free trial so that you can test the product without worrying about cost. If you are satisfied with the features and performance and would like a detailed price list, contact our sales team.

Upgrading

If you are testing with Standard Edition and want to upgrade to Premium Edition, contact our technical support team for help with the upgrade. Automatic upgrades are not yet supported, so an upgrade takes about a day. Edition changes are one-way: you can upgrade, but CelerData Cloud Serverless does not currently support rolling back to a previous edition, so choose carefully when you create an account and select the edition.
