Unload data into AWS S3
This topic describes how to unload CSV-formatted data from CelerData into an AWS S3 bucket in three steps.
Step 1: Configure an S3 bucket access policy
AWS access control requirements
CelerData requires the following permissions on an S3 bucket and folder to be able to access files in the folder and its sub-folders:
- s3:GetObject
- s3:PutObject
- s3:DeleteObject
- s3:ListBucket
Creating an IAM policy
The following step-by-step instructions describe how to configure access permissions for CelerData in your AWS Management Console so that you can use an S3 bucket to unload data:
Log in to the AWS Management Console.
From the home dashboard, choose Identity & Access Management (IAM).
Choose Account settings from the left-hand navigation pane.
Expand the Security Token Service Regions list, find the AWS region in which your CelerData cluster is deployed, and choose Activate if the status is Inactive.
Choose Policies from the left-hand navigation pane.
Click Create Policy.
Click the JSON tab.
Add the policy document that will allow CelerData to access the S3 bucket and folder.
The following policy in JSON format provides CelerData with the required access permissions for the specified bucket and folder path. You can copy and paste the text into the policy editor:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:ListBucket" ], "Resource": [ "arn:aws:s3:::<bucket_name>" ] }, { "Effect": "Allow", "Action": [ "s3:PutObject", "s3:GetObject", "s3:DeleteObject", ], "Resource": [ "arn:aws:s3:::<bucket_name>/*" ] } ] }
NOTE
Replace <bucket_name> with the name of the S3 bucket that you want to access. For example, if your S3 bucket is named bucket_s3, replace <bucket_name> with bucket_s3.
Click Review policy.
Enter the policy name (for example, celerdata_export_policy) and optionally a policy description. Then, click Create policy to create the policy.
Step 2: Create an IAM user
Choose Users from the left-hand navigation pane and click Add user.
On the Add user page, enter a new user name (for example, celerdata_export), select Access key - Programmatic access as the access type, and then click Next: Permissions.
Click Attach existing policies directly, select the policy you created earlier, and then click Next: Tags.
Review the user details. Then, click Create user.
Record the access credentials. The easiest way to record them is to click Download Credentials to write them to a file (for example, credentials.csv).
NOTICE
Once you leave this page, the Secret Access Key will no longer be available anywhere in the AWS Management Console. If you lose the key, you must generate a new set of credentials for the user.
You have now:
- Created an IAM policy for an S3 bucket.
- Created an IAM user and generated access credentials for the user.
- Attached the policy to the user.
With the AWS access key (AK) and secret key (SK) for the S3 bucket, you have the credentials that CelerData needs to access the bucket through a broker when unloading data.
Step 3: Unload data into AWS S3
Unload table data into Amazon S3
You can use the EXPORT statement to unload all data from a table or only the data from specific partitions of a table into an S3 bucket.
EXPORT TABLE <table_name>
[PARTITION (<partition_name>[, <partition_name>, ...])]
TO "s3a://<bucket_name>/path/to"
WITH BROKER
(
"aws.s3.access_key" = <your_access_key_id>,
"aws.s3.secret_key" = <your_secret_access_key>,
"aws.s3.endpoint" = "s3.<region_id>.amazonaws.com"
);
The following table describes the parameters.
Parameter | Description |
---|---|
table_name | The name of the table from which you want to unload data. |
partition_name | The name of the partition from which you want to unload data. If you specify multiple partitions, separate the partition names with a comma (,) and a space. |
bucket_name | The name of the S3 bucket into which you want to unload data. |
aws.s3.access_key | The Access Key ID that you can use to access the S3 bucket. |
aws.s3.secret_key | The Secret Access Key that you can use to access the S3 bucket. |
aws.s3.endpoint | The endpoint that you can use to access the S3 bucket. region_id in the endpoint is the ID of the AWS region to which the S3 bucket belongs. |
NOTE
The S3A protocol is used to unload data into Amazon S3. Therefore, the file path that you specify must start with the prefix s3a://.
For more information, see EXPORT.
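For example, the following statement is a minimal sketch that unloads two partitions of a hypothetical table sales_db.sales_records into a hypothetical bucket named celerdata-export-bucket in the us-west-2 region. The table, partition, and bucket names are illustrative placeholders, and the credential values must be replaced with the AK and SK you recorded in Step 2:
EXPORT TABLE sales_db.sales_records
PARTITION (p20230401, p20230402)
TO "s3a://celerdata-export-bucket/path/to"
WITH BROKER
(
"aws.s3.access_key" = "<your_access_key_id>",
"aws.s3.secret_key" = "<your_secret_access_key>",
"aws.s3.endpoint" = "s3.us-west-2.amazonaws.com"
);
EXPORT jobs run asynchronously, so the statement returns immediately; you can check the progress of the job with SHOW EXPORT.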
Unload query results into Amazon S3
You can use the SELECT statement to unload the result of a query into one or more text files stored in an S3 bucket.
Unload data using the SELECT statement:
SELECT * FROM <xxx>
INTO OUTFILE "s3a://<bucket_name>/path/to/<filename_prefix>"
FORMAT AS CSV
PROPERTIES
(
"broker.name" = "",
"aws.s3.access_key" = <your_access_key_id>,
"aws.s3.secret_key" = <your_secret_access_key>,
"aws.s3.endpoint" = "s3.<region_id>.amazonaws.com",
"column_separator" = ",",
"line_delimiter" = "\n",
"max_file_size" = "1024MB"
);
The following table describes the parameters.
Parameter | Description |
---|---|
bucket_name | The name of the S3 bucket into which you want to unload data. |
FORMAT AS | The format of the exported file. Valid value: CSV. |
broker.name | Leave the value of this parameter empty, as shown in the statement above. |
aws.s3.access_key | The Access Key ID that you can use to access the S3 bucket. |
aws.s3.secret_key | The Secret Access Key that you can use to access the S3 bucket. |
aws.s3.endpoint | The endpoint that you can use to access the S3 bucket. region_id in the endpoint is the ID of the AWS region to which the S3 bucket belongs. |
column_separator | The column separator used in the exported CSV file. |
line_delimiter | The row separator used in the exported CSV file. |
max_file_size | The maximum size of each individual file unloaded into the S3 bucket. Unit: MB. |
NOTE
The S3A protocol is used to unload data into Amazon S3. Therefore, the file path that you specify must start with the prefix s3a://.
If the filename_prefix that you specify already exists, the error message "Error 1064: Cannot truncate a file by broker" is returned.
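For example, the following statement is a minimal sketch that unloads the full result of a query against a hypothetical table sales_db.sales_records into a hypothetical bucket named celerdata-export-bucket in the us-west-2 region, writing CSV files whose names start with sales_result_. All object names are illustrative, and the credential values must be replaced with your own:
SELECT * FROM sales_db.sales_records
INTO OUTFILE "s3a://celerdata-export-bucket/path/to/sales_result_"
FORMAT AS CSV
PROPERTIES
(
"broker.name" = "",
"aws.s3.access_key" = "<your_access_key_id>",
"aws.s3.secret_key" = "<your_secret_access_key>",
"aws.s3.endpoint" = "s3.us-west-2.amazonaws.com",
"column_separator" = ",",
"line_delimiter" = "\n",
"max_file_size" = "1024MB"
);
Because max_file_size is set to 1024MB, a result set larger than that is split across multiple files that share the sales_result_ prefix.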
Limits
- Only the CSV format is supported.
- An AWS access key (AK) and secret key (SK) must be provided as credentials.