Create an external catalog to access your data lake

This tutorial walks you through all the steps necessary to create an external catalog in CelerData Cloud Serverless.

Suppose your data lake uses Amazon S3 for storage and AWS Glue as its metastore, and you have already created an IAM integration that allows access to your S3 bucket and your AWS Glue Data Catalog. Follow these steps to create an external catalog for accessing your data lake:

  1. Sign in to the CelerData Cloud Serverless console.

  2. In the left-side navigation pane, choose Catalog.

  3. On the View your data page, click Add external catalog.

    Catalog - overview

  4. In Step 1: Add data source, choose Amazon S3 as the data source and click Next.

    Catalog - Step 1: Add data source

  5. In Step 2: Configure data source, configure the data source as follows and click Next:

    • Catalog name: Enter a name for the external catalog.

    • Description: Optionally enter a description for the external catalog.

    • S3 bucket region: Select the AWS region in which your S3 bucket resides. This bucket is the one referenced in your IAM integration (see Create an IAM integration).

    • Table format: Select the default table format to use when creating tables. Hive is selected by default. This setting affects only table creation: CelerData can read data from any table format regardless of what you select here.

    • Metastore configuration: Select the type of metastore used by your data lake. For this example, select AWS Glue, and then, in the AWS Glue region field that appears below, select the AWS region in which your AWS Glue Data Catalog resides.

      NOTE

      CelerData supports two types of metastore services, AWS Glue and Hive Metastore:

      • If you select AWS Glue, the AWS Glue region field appears below. Use it to select the AWS region in which your AWS Glue Data Catalog resides.
      • If you select Hive Metastore, the Hive Metastore host and Port fields appear below. Use them to enter the host and port of your Hive Metastore.

    Catalog - Step 2: Configure data source

  6. In Step 3: Configure authentication information, choose IAM Role as the authentication method. For Role ARN for S3 and Role ARN for Glue, select the IAM role ARN referenced in your IAM integration (see Create an IAM integration). Then click Create.

    Catalog - Step 3: Configure authentication information
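
For reference, the choices made in the wizard correspond roughly to a CREATE EXTERNAL CATALOG statement. The following is a minimal sketch, assuming CelerData Cloud Serverless accepts the same Hive catalog properties as open-source StarRocks. The catalog name, comment, regions, and role ARN are hypothetical placeholders, and the DDL that the console actually generates may differ.

```sql
-- Hypothetical values throughout: replace the name, regions, and ARN with your own.
CREATE EXTERNAL CATALOG my_lake_catalog
COMMENT "External catalog for my data lake"
PROPERTIES
(
    -- Table format: Hive (the default selection in Step 2)
    "type" = "hive",
    -- Metastore configuration: AWS Glue and the region of the Glue Data Catalog
    "hive.metastore.type" = "glue",
    "aws.glue.region" = "us-west-2",
    -- Authentication for Glue: assume the IAM role from the IAM integration
    "aws.glue.use_instance_profile" = "true",
    "aws.glue.iam_role_arn" = "arn:aws:iam::123456789012:role/my-lake-role",
    -- S3 bucket region and authentication for S3, using the same assumed role
    "aws.s3.region" = "us-west-2",
    "aws.s3.use_instance_profile" = "true",
    "aws.s3.iam_role_arn" = "arn:aws:iam::123456789012:role/my-lake-role"
);
```

The sketch is meant only to show how the wizard fields relate to one another. The console remains the documented way to create the catalog.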

After the external catalog is created, it appears in the External catalog section of the View your data page. Click it to open the catalog details page, where you can view its DDL and the databases it contains.
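
The same information can likely be retrieved with SQL as well. This is a short sketch, assuming the StarRocks-compatible statements SHOW CATALOGS and SHOW CREATE CATALOG are available in your environment. The catalog name my_lake_catalog is a hypothetical placeholder.

```sql
-- List all catalogs visible to the current user; the new external catalog should appear here.
SHOW CATALOGS;

-- Display the DDL of the external catalog (my_lake_catalog is a hypothetical name).
SHOW CREATE CATALOG my_lake_catalog;
```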

You can also navigate to the SQL Editor page and run SHOW DATABASES FROM <catalog_name>. If the statement returns the databases in your data lake, CelerData Cloud Serverless has successfully connected to your data lake. For detailed operations, see Run queries using SQL Editor.
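
A minimal sketch of that connectivity check, followed by a first query, might look as follows. The names my_lake_catalog, sales_db, and orders are hypothetical; substitute your own catalog, database, and table names.

```sql
-- Connectivity check: list the databases that the metastore exposes through the catalog.
SHOW DATABASES FROM my_lake_catalog;

-- List the tables in one of the returned databases (sales_db is a hypothetical name).
SHOW TABLES FROM my_lake_catalog.sales_db;

-- Read a few rows using the fully qualified name catalog.database.table.
SELECT * FROM my_lake_catalog.sales_db.orders LIMIT 10;
```

Using fully qualified names keeps the query independent of whichever catalog is current in the session.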