Batch load data from Azure cloud storage
CelerData provides two options for loading data from Microsoft Azure Storage:
- Synchronous loading using INSERT+FILES()
- Asynchronous loading using Broker Load
Each of these options has its own advantages, which are detailed in the following sections.
In most cases, we recommend that you use the INSERT+FILES() method, which is much easier to use.
However, the INSERT+FILES() method currently supports only the Parquet and ORC file formats. If you need to load data in other file formats such as CSV, or to apply data changes such as DELETE during loading, use Broker Load instead.
Before you begin
Make source data ready
Make sure that the source data you want to load into CelerData is properly stored in a container within your Azure storage account.
The examples in this topic assume that you want to load a Parquet-formatted sample dataset (user_behavior_ten_million_rows.parquet) stored in the root directory of a container (celerdata-container) within an Azure Data Lake Storage Gen2 (ADLS Gen2) storage account (celerdata).
Check privileges
You can load data into tables within your CelerData cluster only as a user who has the INSERT privilege on those tables. If you do not have the INSERT privilege, follow the instructions provided in GRANT to grant the INSERT privilege to the user that you use to connect to your CelerData cluster.
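As a rough illustration of the privilege step, the statement below grants INSERT on a single table. It is a sketch only: the database name (mydatabase), table name (user_behavior), and user identity ('loader'@'%') are hypothetical placeholders, and the exact syntax supported by your cluster version is described in GRANT.

```sql
-- Hypothetical names: replace mydatabase, user_behavior, and 'loader'@'%'
-- with your own database, destination table, and user identity.
GRANT INSERT ON TABLE mydatabase.user_behavior TO USER 'loader'@'%';
```

Run the statement as a user who is allowed to grant privileges on the destination table.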
Gather connection details
The examples in this topic use the Shared Key authentication method. To ensure that you have permission to read data from ADLS Gen2, we recommend that you read Azure Data Lake Storage Gen2 > Shared Key (access key of storage account) to understand the authentication parameters that you need to configure.
In short, if you use Shared Key authentication, you need to gather the following information:
- The name of your ADLS Gen2 storage account
- The shared key (access key) of your ADLS Gen2 storage account
For information about all the authentication methods available, see Authenticate to Azure cloud storage.
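To show where the two values are used, the following query reads a few rows from the sample file through FILES(). Treat it as a sketch: it assumes that the file is addressed with an abfss:// URI built from the container and storage account names, and that the credential properties are named azure.adls2.storage_account and azure.adls2.shared_key. Confirm the exact property names against Authenticate to Azure cloud storage before relying on them.

```sql
-- Quick read check against the sample file.
-- Assumed: abfss:// path layout and azure.adls2.* property names.
-- Replace <shared_key> with the access key of your storage account.
SELECT * FROM FILES
(
    "path" = "abfss://celerdata-container@celerdata.dfs.core.windows.net/user_behavior_ten_million_rows.parquet",
    "format" = "parquet",
    "azure.adls2.storage_account" = "celerdata",
    "azure.adls2.shared_key" = "<shared_key>"
)
LIMIT 3;
```

If the query returns rows, the storage account name and shared key allow the cluster to read the file.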
Use INSERT+FILES()
This method supports only the Parquet and ORC file formats.