- Release Notes
- Introduction to CelerData Cloud Serverless
- Quick Start
- Sign up for CelerData Cloud Serverless
- A quick tour of the console
- Connect to CelerData Cloud Serverless
- Create an IAM integration
- Create and assign a warehouse
- Create an external catalog
- Load data from cloud storage
- Load data from Apache Kafka/Confluent Cloud
- Try your first query
- Invite new users
- Design data access control policy
- Warehouses
- Catalog, database, table, view, and MV
- Overview of database objects
- Catalog
- Table types
- Asynchronous materialized views
- Data Loading
- Data access control
- Networking and private connectivity
- Usage and Billing
- Organization and Account
- Integration
- Query Acceleration
- Reference
- AWS IAM policies
- Information Schema
- Overview
- be_bvars
- be_cloud_native_compactions
- be_compactions
- character_sets
- collations
- column_privileges
- columns
- engines
- events
- global_variables
- key_column_usage
- load_tracking_logs
- loads
- materialized_views
- partitions
- pipe_files
- pipes
- referential_constraints
- routines
- schema_privileges
- schemata
- session_variables
- statistics
- table_constraints
- table_privileges
- tables
- tables_config
- task_runs
- tasks
- triggers
- user_privileges
- views
- Data Types
- System Metadatabase
- Keywords
- SQL Statements
- Account Management
- Data Definition
- CREATE TABLE
- ALTER TABLE
- DROP CATALOG
- CREATE TABLE LIKE
- REFRESH EXTERNAL TABLE
- RESTORE
- SET CATALOG
- DROP TABLE
- RECOVER
- USE
- CREATE MATERIALIZED VIEW
- DROP DATABASE
- ALTER MATERIALIZED VIEW
- DROP REPOSITORY
- CANCEL RESTORE
- DROP INDEX
- DROP MATERIALIZED VIEW
- CREATE DATABASE
- CREATE TABLE AS SELECT
- BACKUP
- CANCEL BACKUP
- CREATE REPOSITORY
- CREATE INDEX
- Data Manipulation
- INSERT
- SHOW CREATE DATABASE
- SHOW BACKUP
- SHOW ALTER MATERIALIZED VIEW
- SHOW CATALOGS
- SHOW CREATE MATERIALIZED VIEW
- SELECT
- SHOW ALTER
- SHOW MATERIALIZED VIEW
- RESUME ROUTINE LOAD
- ALTER ROUTINE LOAD
- SHOW TABLES
- STREAM LOAD
- SHOW PARTITIONS
- CANCEL REFRESH MATERIALIZED VIEW
- SHOW CREATE CATALOG
- SHOW ROUTINE LOAD TASK
- SHOW RESTORE
- CREATE ROUTINE LOAD
- STOP ROUTINE LOAD
- SHOW DATABASES
- BROKER LOAD
- SHOW ROUTINE LOAD
- PAUSE ROUTINE LOAD
- SHOW SNAPSHOT
- SHOW CREATE TABLE
- CANCEL LOAD
- REFRESH MATERIALIZED VIEW
- SHOW REPOSITORIES
- SHOW LOAD
- Administration
- DESCRIBE
- SQL Functions
- Function List
- String Functions
- CONCAT
- HEX
- LOWER
- SPLIT
- LPAD
- SUBSTRING
- PARSE_URL
- INSTR
- REPEAT
- LCASE
- REPLACE
- HEX_DECODE_BINARY
- RPAD
- SPLIT_PART
- STRCMP
- SPACE
- CHARACTER_LENGTH
- URL_ENCODE
- APPEND_TAILING_CHAR_IF_ABSENT
- LTRIM
- HEX_DECODE_STRING
- URL_DECODE
- LEFT
- STARTS_WITH
- CONCAT
- GROUP_CONCAT
- STR_TO_MAP
- STRLEFT
- STRRIGHT
- MONEY_FORMAT
- RIGHT
- SUBSTRING_INDEX
- UCASE
- TRIM
- FIND_IN_SET
- RTRIM
- ASCII
- UPPER
- REVERSE
- LENGTH
- UNHEX
- ENDS_WITH
- CHAR_LENGTH
- NULL_OR_EMPTY
- LOCATE
- CHAR
- Predicate Functions
- Map Functions
- Binary Functions
- Geospatial Functions
- Lambda Expression
- Utility Functions
- Bitmap Functions
- BITMAP_SUBSET_LIMIT
- TO_BITMAP
- BITMAP_AGG
- BITMAP_FROM_STRING
- BITMAP_OR
- BITMAP_REMOVE
- BITMAP_AND
- BITMAP_TO_BASE64
- BITMAP_MIN
- BITMAP_CONTAINS
- SUB_BITMAP
- BITMAP_UNION
- BITMAP_COUNT
- BITMAP_UNION_INT
- BITMAP_XOR
- BITMAP_UNION_COUNT
- BITMAP_HAS_ANY
- BITMAP_INTERSECT
- BITMAP_AND_NOT
- BITMAP_TO_STRING
- BITMAP_HASH
- INTERSECT_COUNT
- BITMAP_EMPTY
- BITMAP_MAX
- BASE64_TO_ARRAY
- BITMAP_TO_ARRAY
- Struct Functions
- Aggregate Functions
- RETENTION
- MI
- MULTI_DISTINCT_SUM
- WINDOW_FUNNEL
- STDDEV_SAMP
- GROUPING_ID
- HLL_HASH
- AVG
- HLL_UNION_AGG
- COUNT
- BITMAP
- HLL_EMPTY
- SUM
- MAX_BY
- PERCENTILE_CONT
- COVAR_POP
- PERCENTILE_APPROX
- HLL_RAW_AGG
- STDDEV
- CORR
- COVAR_SAMP
- MIN_BY
- MAX
- VAR_SAMP
- STD
- HLL_UNION
- APPROX_COUNT_DISTINCT
- MULTI_DISTINCT_COUNT
- VARIANCE
- ANY_VALUE
- COUNT_IF
- GROUPING
- PERCENTILE_DISC
- Array Functions
- ARRAY_CUM_SUM
- ARRAY_MAX
- ARRAY_LENGTH
- ARRAY_REMOVE
- UNNEST
- ARRAY_SLICE
- ALL_MATCH
- ARRAY_CONCAT
- ARRAY_SORT
- ARRAY_POSITION
- ARRAY_DIFFERENCE
- ARRAY_CONTAINS
- ARRAY_JOIN
- ARRAY_INTERSECT
- CARDINALITY
- ARRAY_CONTAINS_ALL
- ARRAYS_OVERLAP
- ARRAY_MIN
- ARRAY_MAP
- ELEMENT_AT
- ARRAY_APPEND
- ARRAY_SORTBY
- ARRAY_TO_BITMAP
- ARRAY_GENERATE
- ARRAY_AVG
- ARRAY_FILTER
- ANY_MATCH
- REVERSE
- ARRAY_AGG
- ARRAY_DISTINCT
- ARRAY_SUM
- Condition Functions
- Math Functions
- Date and Time Functions
- DAYNAME
- MINUTE
- FROM_UNIXTIME
- HOUR
- MONTHNAME
- MONTHS_ADD
- ADD_MONTHS
- DATE_SUB
- PREVIOUS_DAY
- TO_TERA_DATA
- MINUTES_SUB
- WEEKS_ADD
- HOURS_DIFF
- UNIX_TIMESTAMP
- DAY
- DATE_SLICE
- DATE
- CURTIME
- SECONDS_SUB
- MONTH
- WEEK
- TO_DATE
- TIMEDIFF
- MONTHS_DIFF
- STR_TO_JODATIME
- WEEK_ISO
- MICROSECONDS_SUB
- TIME_SLICE
- MAKEDATE
- DATE_TRUNC
- JODATIME
- DAYOFWEEK
- YEARS_SUB
- TIMESTAMP_ADD
- HOURS_SUB
- STR2DATE
- TIMESTAMP
- FROM_DAYS
- WEEK_OF_YEAR
- YEAR
- TIMESTAMP_DIFF
- TO_TERA_TIMESTAMP
- DAYOFMONTH
- DAYOFYEAR
- DATE_FORMAT
- MONTHS_SUB
- NEXT_DAY
- MINUTES_DIFF
- DATA_ADD
- MINUTES_ADD
- CURDATE
- DAY_OF_WEEK_ISO
- CURRENt_TIMESTAMP
- STR_TO_DATE
- LAST_DAY
- WEEKS_SUB
- TO_DAYS
- DATEDIFF
- NOW
- TO_ISO8601
- TIME_TO_SEC
- QUARTER
- SECONDS_DIFF
- UTC_TIMESTAMP
- DATA_DIFF
- SECONDS_ADD
- ADDDATE
- WEEKSDIFF
- CONVERT_TZ
- MICROSECONDS_ADD
- SECOND
- YEARS_DIFF
- YEARS_ADD
- HOURS_ADD
- DAYS_SUB
- DAYS_DIFF
- Cryptographic Functions
- Percentile Functions
- Bit Functions
- JSON Functions
- Hash Functions
- Scalar Functions
- Table Functions
Overview of database objects
As an open cloud-native lakehouse, Celerdata Cloud Serverless can serve as a data warehouse to store data, as well as a query engine to query external data in your data lake. As for organizing data for CelerData, database objects such as catalogs, database, and tables play an important role. For example, the internal catalog (default_catalog
) is used to organize all data stored in CelerData, while external catalogs are used to query data from various external data systems.
This topic introduces database objects in CelerData to help you better design your own data governance paradigm and efficiently manage your organization's data assets.
Overview
The following figure displays an efficient catalog-based data management architecture in CelerData.
External catalogs help you directly query your data stored in the external data systems like AWS S3 at high performance without data ingestion in scenarios such as ad-hoc queries and data exploring.
When your business application requires lower latency and higher concurrency, you can load data in the internal catalog in CelerData. The internal catalog also provides various database objects to help you manipulate data, such as databases, tables, and materialized views.
Catalog
Catalogs enable you to manage internal and external data in one system, and offer a flexible way for you to easily query and analyze data that is stored in various external systems.
CelerData provides two types of catalogs: internal catalog and external catalog.
Internal catalog
NOTE
This feature is only available in Premium Edition.
The internal catalog manages the data loaded into CelerData. Each account has only one built-in internal catalog (default_catalog
), which can include one or more databases. You can use the internal catalog as a data warehouse to store data, significantly enhancing query performance, especially for complex analytical queries on large volumes of data.
External catalog
An external catalog allows access to external data in your data lake. Each account can have more than one external catalog, and each external catalog can include one or more databases. With external catalogs, you can directly query the external data in your data lake without loading the data into CelerData. However, CelerData also caches metadata of these external tables to accelerate queries by default.
When you query external data from your data lake, CelerData interacts with the two components of the data lake to generate a query execution plan, scan the target external data, perform calculations and then return the query result. The two components are metastore (for example, Hive metastore, AWS Glue, or Tabular) and data storage system (for example, AWS S3).
Currently, CelerData supports the following types of external catalogs: Hive catalog, Iceberg catalog, Tabular catalog, Hudi catalog, and Delta Lake catalog.
Database
A database is a collection of data objects, such as tables, materialized views, and pipes used to store, manage, and manipulate data.
Table
Tables are categorized into internal tables and external tables.
Internal table
NOTE
This feature is only available in Premium Edition.
Internal tables are maintained in databases under the internal catalog (default_catalog
). The metadata and datafiles of internal tables are managed in your CelerData. An internal table logically consists of rows and columns, where each row represents a record and each column represents an attribute or field of a data row.
CelerData provides four types of internal tables, Primary Key tables, Duplicate Key tables, Aggregate tables, and Unique Key tables, to store various data, such as raw logs, realtime data, and aggregated data, to meet your varying business needs.
An internal table adopts a partitioning+bucketing two-tier data distribution strategy to achieve even data distribution.
External table
External tables are maintained in external catalogs. You can use external tables to query data from external data sources. The metadata and datafiles of external tables are stored in external data sources. However, CelerData also caches metadata of these external tables to accelerate queries by default.
Materialized view
NOTE
This feature is only available in Premium Edition.
A materialized view is a special physical table that holds pre-computed query results from one or more base tables (internal or external tables). Materialized views can only be created in the internal catalog.
Pipe
NOTE
This feature is only available in Premium Edition.
A pipe is a data pipeline that is used to ingest data from an external data source into a table.
Access control
A role-based access control (RBAC) framework is employed in CelerData, which allows account administrators to precisely manage and regulate access to objects or data in their CelerData accounts.