Understand column and row-level security
Background
In real-world business scenarios, a table can consist of numerous rows and columns. Customers require a fine-grained access control mechanism that allows only specific roles to query specific data columns or rows, in order to protect sensitive data.
Part of the column and row-level security functionality can be achieved using views. However, a view cannot be applied to multiple tables. If the column to mask appears in multiple tables, users have to create a separate view for each table, resulting in an explosion in the number of views. Views are also inflexible for delegating privileges: users must assign view and table privileges to data administrators or platform maintenance personnel, which reduces the flexibility of privilege management and increases the workload of the platform maintenance team.
What is column and row-level security?
CelerData supports using column masking and row access policies to protect sensitive data from unauthorized access while allowing authorized users to access data at query runtime. Users can specify masking and filter conditions when they create a policy. Only data columns or data rows that match the conditions are returned. This helps achieve column-level data masking and row-level data filtering.
When users query a table with a policy applied, the query is rewritten. Conceptually, a temporary view is generated based on the policy, and the original query is redirected from the table to that temporary view. This ensures that only the data permitted by the policy is returned, rather than all data.
CelerData does not modify sensitive data at query runtime, nor does it encrypt data when storing the data.
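For example, suppose a masking policy exposes only the last four digits of a phone number and a row access policy restricts each sales team to its own region. Conceptually, the runtime rewrite behaves like the following sketch (the table, column, and literal values are hypothetical; the rewritten form is generated internally and never issued by the user):

```sql
-- Original query issued by a data analyst (hypothetical table and columns).
SELECT customer_id, phone
FROM sales_orders;

-- Conceptually, CelerData evaluates the query as if it ran against a temporary view
-- that injects the masking expression and the row filter defined in the policies.
SELECT customer_id,
       CONCAT('***-***-', RIGHT(phone, 4)) AS phone  -- column masking policy
FROM sales_orders
WHERE region = 'US';                                 -- row access policy
```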
Typical use scenarios:
- Scenario 1: For sensitive data like client phone numbers, customers want different roles to have different access permissions on such data. For example, sales personnel need to see plaintext phone numbers to sell products, while data analysts only need to see the last four digits of the phone number. This scenario requires a column masking policy, where columns are dynamically masked based on the role in the current session.
- Scenario 2: Sales data tables contain data from different regions, and customers want each region's sales staff to view only the data from their own region. This scenario requires a row access policy, where only the data rows of the relevant region are returned based on the role in the current session and the region information.
The use cases for these two scenarios can be found in Masking policies and Row access policies.
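As a rough sketch of the policies behind these two scenarios (the exact DDL syntax, the role-checking function, and all object names below are assumptions for illustration; see Masking policies and Row access policies for the authoritative syntax):

```sql
-- Scenario 1 (sketch): show plaintext phone numbers to the sales role,
-- and only the last four digits to everyone else.
CREATE MASKING POLICY phone_mask AS (phone VARCHAR) RETURNS VARCHAR ->
    CASE WHEN is_role_in_session('sales') THEN phone   -- role check is an assumed helper
         ELSE CONCAT('***-***-', RIGHT(phone, 4))
    END;

-- Scenario 2 (sketch): return only the rows that belong to the current role's region.
CREATE ROW ACCESS POLICY region_filter AS (region VARCHAR) RETURNS BOOLEAN ->
    (is_role_in_session('sales_us') AND region = 'US')
    OR (is_role_in_session('sales_eu') AND region = 'EU');
```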
Policies can be created once and applied to multiple tables. For example, you can create a masking policy for a sensitive field and then apply that policy to all tables that contain this sensitive field.
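For instance, a single masking policy for phone numbers might be attached to every table that stores that field, along the lines of the following sketch (the ALTER TABLE clause and the table and column names are assumptions; see Masking policies for the exact statement):

```sql
-- Apply the same policy wherever the sensitive column appears (hypothetical names).
ALTER TABLE crm.customers    MODIFY COLUMN phone          SET MASKING POLICY phone_mask;
ALTER TABLE sales.orders     MODIFY COLUMN contact_phone  SET MASKING POLICY phone_mask;
ALTER TABLE support.tickets  MODIFY COLUMN reporter_phone SET MASKING POLICY phone_mask;
```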
Policy creation and application are controlled by privileges such as CREATE and APPLY. Administrators can determine whether to delegate these privileges to certain departments based on business scenarios. This allows for flexible privilege management and consistent application of policies across multiple tables, ensuring that sensitive data is protected consistently and according to the defined policies. For more information about privilege management, see Manage privileges for policies.
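For example, an administrator might delegate policy management roughly as follows (a sketch only; the privilege names, GRANT syntax, and role names are assumptions, and the authoritative statements are listed in Manage privileges for policies):

```sql
-- Let a dedicated data security role create masking policies in a database (assumed privilege wording).
GRANT CREATE MASKING POLICY ON DATABASE sales_db TO ROLE data_security;
-- Let the data platform role apply existing policies to tables without being able to modify them.
GRANT APPLY MASKING POLICY ON DATABASE sales_db TO ROLE data_platform;
```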
Benefits
- Ease of use
  - A policy can be created once and then applied to multiple tables, eliminating the need to create numerous views.
- Easy modification
  - A policy is easy to modify, and changes take effect immediately without re-applying the policy to each table.
- Flexible access control and segregation of duties
  - Column masking and row access policies enable segregation of duties (SoD), allowing fine-grained control over data access. Data management privileges can be decoupled from the platform maintenance team. Data owners or dedicated data security teams can determine which roles can access data at which security levels.
Overall, column masking and row access policies provide flexibility, ease of management, and enhanced control over data access and security, ensuring that sensitive data is protected and accessed only by authorized individuals or roles.
Objects of a policy
A policy can be applied to a table, view, materialized view, or external table. The SQL statements used to create, apply, alter, or drop a policy are similar for these objects.
Impact scope of a policy
A policy affects all queries that run against a column with the policy applied, including SELECT, CTAS, INSERT, and INSERT INTO ... SELECT operations.
Usage notes
- You can apply column masking and row access policies to tables, views, materialized views, and external tables. The statements are similar for these objects.
- A policy is a database-level object, but it can be referenced across databases and catalogs when you apply it, in the format `catalog.db.policy`. If no catalog is specified, the current catalog is used by default.
- A table can have multiple column masking policies and row access policies. However, one column cannot have multiple row access policies.
- A row access policy and a column masking policy cannot be applied to the same column. This prevents a row access policy from exposing the data of a masked column.
- Policies are not inherited. For example, CTAS and CREATE TABLE LIKE operations do not inherit the policies of the source table.
- Policies do not affect query rewrite for materialized views. When you query a base table, the query can still be rewritten to use a materialized view, regardless of whether the materialized view has a policy applied.
- In essence, column masking and row access policies add predicates and expressions to the original query, which may degrade query performance. You can decide whether to tune a policy based on the output of the EXPLAIN ANALYZE statement, as shown in the sketch after this list. The tuning and troubleshooting methods are similar to those for query profile analysis.
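A minimal sketch of such a check, assuming a hypothetical table name:

```sql
-- Compare the runtime profile with and without the policy applied to see the cost of the
-- injected masking expressions and row filters.
EXPLAIN ANALYZE
SELECT customer_id, phone
FROM sales_orders
WHERE order_date >= '2024-01-01';
```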