elastic_cluster_v2

note

The resource's API may change in subsequent versions to simplify user experience.

Deploy a multi-warehouse elastic CelerData cluster on AWS EC2 instances, Azure virtual machines, or GCP virtual machines.

This resource is one step in the overall cluster deployment procedure and depends on a data credential, a deployment credential, and a network configuration. For detailed procedures of cluster deployments on AWS, Azure, and GCP, see Provision CelerData Cloud BYOC on AWS and Provision CelerData Cloud BYOC on Azure.

Supported Node Sizes

For information about the instance types supported by CelerData, see Supported instance types.

Example Usage

resource "celerdatabyoc_elastic_cluster_v2" "elastic_cluster_1" {
  deployment_credential_id = "<deployment_credential_resource_ID>"
  data_credential_id = "<data_credential_resource_ID>"
  network_id = "<network_configuration_resource_ID>"

  cluster_name = "<cluster_name>"
  coordinator_node_size = "<coordinator_node_instance_type>"
  coordinator_node_count = <coordinator_node_number>
  
  // optional
  coordinator_node_volume_config {
    vol_size = <vol_size>
    iops = <iops>
    throughput = <throughput>
  }
  // optional
  coordinator_node_configs = {
    <key> = <value>
  }
  
  // The configuration for "default_warehouse" is required.
  default_warehouse {
    compute_node_size        = "<compute_node_instance_type>"
    compute_node_count       = <compute_node_number>
   
    // optional
    compute_node_volume_config {
      vol_number = <vol_number>
      vol_size = <vol_size>
      iops = <iops>
      throughput = <throughput>
    }
    // optional
    compute_node_configs = {
      <key> = <value>
    }
    // optional 
    resource_tags = {
      <tag_key> = "<tag_name>"
    }
  }

  warehouse {
    name                           = "<warehouse_name>"
    compute_node_size              = "<compute_node_instance_type>"
    compute_node_count             = <compute_node_number>
    // When using an EBS-backed instance type, specify the following two parameters. Otherwise, delete them.
    compute_node_ebs_disk_number   = <compute_node_ebs_disk_number>
    compute_node_ebs_disk_per_size = <compute_node_ebs_disk_per_size>
    distribution_policy            = "{specify_az | crossing_az}"

    // specify_az                  = "us-west-2b"
    // expected_state              = "Suspended"
    // auto_scaling_policy         = celerdatabyoc_auto_scaling_policy.policy_1.policy_json

    // optional
    compute_node_volume_config {
      vol_number = <vol_number>
      vol_size = <vol_size>
      iops = <iops>
      throughput = <throughput>
    }
    // optional
    compute_node_configs = {
      <key> = <value>
    }
    // optional
    resource_tags = {
      <tag_key> = "<tag_name>"
    }
  }

  custom_ami {
    ami = "<ami_id>"
    // ami = "ami-09245d5773578a1d6"
    os = "al2023"
  }

  // optional
  /*
    global_session_variables = {
      <key> = <value>
    }
  */

  // optional
  scheduling_policy {
    policy_name = "auto-resume-suspend"
    description = "Auto resume/suspend"
    active_days = ["TUESDAY"]
    time_zone = "UTC" // IANA Time-Zone
    resume_at   = "09:00"
    suspend_at  = "18:00"
    enable      = true
  }
  
  default_admin_password = "<SQL_user_initial_password>"
  expected_cluster_state = "{Suspended | Running}"
  ldap_ssl_certs = [
    "<ssl_cert_s3_path>"
  ]
  ranger_certs_dir_path = "<ranger_config_s3_path>" // Deprecated
  ranger_config_id = "<ranger_config_ID>"
  resource_tags = {
    <tag_key> = "<tag_name>"
  }
  csp = "{aws | azure}"
  region = "<cloud_provider_region>"

  init_scripts {
    logs_dir    = "{<log_s3_path>|<log_sub_path_of_azure_storage_container>|<log_gs_path>}"
    script_path = "{<script_s3_path>|<script_sub_path_of_azure_storage_container>|<script_gs_path>}"
  }

  # Unlike `init_scripts`, when the `rerun` attribute is true, the `scripts` will be executed once every time `terraform apply` is executed
  scripts {
    logs_dir    = "{<log_s3_path>|<log_azure_storage_container_path>|<log_gs_path>}"
    script_path = "{<script_s3_path>|<script_azure_storage_container_path>|<script_gs_path>}"
    rerun = false
  }

  run_scripts_parallel = false
  query_port = 9030
  idle_suspend_interval = 60
  enabled_termination_protection = false
}

Argument Reference

note

This section explains only the arguments of the celerdatabyoc_elastic_cluster_v2 resource. For the explanation of arguments of other resources, see the corresponding resource topics.

The celerdatabyoc_elastic_cluster_v2 resource contains the following required arguments and optional arguments:

Required:

cluster_name: (Not allowed to modify) The desired name for the cluster. Enter a unique name.
coordinator_node_size: The instance type for coordinator nodes in the cluster. Select a coordinator node instance type from the table "Supported Node Sizes". For example, you can set this argument to m6i.4xlarge.
deployment_credential_id: (Not allowed to modify) The ID of the deployment credential.
- If you deploy the cluster on AWS, set this argument to celerdatabyoc_aws_deployment_role_credential.<resource_name>.id and replace <resource_name> with the name of the celerdatabyoc_aws_deployment_role_credential resource.
- If you deploy the cluster on Azure, set this argument to celerdatabyoc_azure_deployment_credential.<resource_name>.id and replace <resource_name> with the name of the celerdatabyoc_azure_deployment_credential resource.
data_credential_id: (Not allowed to modify) The ID of the data credential.
- If you deploy the cluster on AWS, set this argument to celerdatabyoc_aws_data_credential.<resource_name>.id and replace <resource_name> with the name of the celerdatabyoc_aws_data_credential resource.
- If you deploy the cluster on Azure, set this argument to celerdatabyoc_azure_data_credential.<resource_name>.id and replace <resource_name> with the name of the celerdatabyoc_azure_data_credential resource.
network_id: (Not allowed to modify) The ID of the network configuration.
- If you deploy the cluster on AWS, set this argument to celerdatabyoc_aws_network.<resource_name>.id and replace <resource_name> with the name of the celerdatabyoc_aws_network resource.
- If you deploy the cluster on Azure, set this argument to celerdatabyoc_azure_network.<resource_name>.id and replace <resource_name> with the name of the celerdatabyoc_azure_network resource.
default_warehouse: (List of Object) The default warehouse. The attributes of a default warehouse include:
- compute_node_size: (Required) The instance type for compute nodes in the cluster. Select a compute node instance type from the table "Supported Node Sizes". For example, you can set this argument to r6id.4xlarge.
- compute_node_count: (Optional) The number of compute nodes in the warehouse. Valid values: any positive integer. Default value: 3.
- compute_node_volume_config: The compute nodes volume configuration.
  - vol_number: The number of disks for each compute node. Valid values: [1,16]. Default value: 2.
  - vol_size: The size per disk for each compute node. Unit: GB. Default value: 100. You can only increase the value of this parameter.
  - iops: (Available only for AWS) Disk IOPS.
  - throughput: (Available only for AWS) Disk throughput.
  
  ~> You can use the vol_number and vol_size arguments to specify the disk space. The total disk space provisioned to a compute node is equal to vol_number * vol_size.
- compute_node_configs: The static configuration items you want to customize for the compute nodes in the warehouse.
- resource_tags: The tags to be attached to the warehouse. Note that resource_tags is a CelerData concept: on AWS and Azure, each entry is added as a tag to the corresponding resources; on GCP, it is added as a label to the corresponding GCP resources.
- auto_scaling_policy: (Optional) This policy will automatically scale the number of compute nodes, based on CPU utilization of the warehouse. For more information, see Enable Auto Scaling for your warehouse. You can generate the policy_json value for this argument using the celerdatabyoc_auto_scaling_policy resource.
- distribution_policy: (Optional, available only for AWS) The compute node distribution policy for the warehouse if you want to enable Multi-AZ deployment for the cluster. Valid values: specify_az (nodes are deployed in the primary availability zone) and crossing_az (nodes are deployed across three availability zones). For more information, see Multi-AZ Deployment.
  
  ~> To enable Multi-AZ Deployment, you must deploy at least 3 coordinator nodes, that is, coordinator_node_count must be greater than or equal to 3.
- specify_az: (Optional, available only for AWS) The primary availability zone for node deployment. This argument is available only when distribution_policy is set to specify_az.
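
The default_warehouse attributes above can be sketched together as follows. This is a minimal illustration rather than a recommended sizing: the resource name multi_az and the disk values are hypothetical, and the required cluster-level arguments are left out. With vol_number = 2 and vol_size = 200, each compute node is provisioned 2 * 200 = 400 GB of disk space.

```hcl
resource "celerdatabyoc_elastic_cluster_v2" "multi_az" {
  # Other required cluster-level arguments are omitted for brevity.

  # Multi-AZ deployment needs at least 3 coordinator nodes.
  coordinator_node_count = 3

  default_warehouse {
    compute_node_size  = "r6id.4xlarge"
    compute_node_count = 3

    # Total disk space per compute node = vol_number * vol_size = 400 GB.
    compute_node_volume_config {
      vol_number = 2
      vol_size   = 200
    }

    # Spread compute nodes across availability zones (AWS only).
    distribution_policy = "crossing_az"
  }
}
```

To pin the nodes to a single zone instead, set distribution_policy to "specify_az" and add specify_az with the zone name, for example "us-west-2b".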
custom_ami: (Optional, available only for AWS) The custom Amazon Machine Image (AMI) used to deploy the cluster. You can specify this argument only when creating the cluster. If this argument is not specified, the default AMI is used.
- ami: The ID of the custom AMI.
- os: The operating system (OS) of the AMI. Currently only al2023 (Amazon Linux 2023) is supported. The value of this field must be consistent with the actual OS of the AMI. Otherwise, the deployment will fail.
default_admin_password: (Not allowed to modify) The initial password of the cluster admin user.
expected_cluster_state: When creating a cluster, you need to declare the status of the cluster you are creating. Cluster states are categorized as Suspended and Running. If you want the cluster to start after provisioning, set this argument to Running. If you do not do so, the cluster will be suspended after provisioning.
csp: (Not allowed to modify) The cloud service provider of the cluster.
- If you deploy the cluster on AWS, set this argument to aws.
- If you deploy the cluster on Azure, set this argument to azure.
region: (Not allowed to modify) The ID of the cloud provider region to which the network hosting the cluster belongs. See Supported cloud platforms and regions.
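
As an illustration, the required arguments above could be combined into a minimal AWS configuration like the one below. It is a sketch under assumptions: the resource names (aws_deploy, aws_data, aws_net, minimal), the cluster name, the region, and the admin_password variable are all hypothetical, and the three referenced resources are assumed to be defined elsewhere in the same configuration. Confirm the region against Supported cloud platforms and regions before use.

```hcl
# Hypothetical variable that keeps the initial admin password out of the configuration file.
variable "admin_password" {
  type      = string
  sensitive = true
}

resource "celerdatabyoc_elastic_cluster_v2" "minimal" {
  csp    = "aws"
  region = "us-west-2" # assumed region, for illustration only

  cluster_name = "demo-elastic-cluster"

  # IDs exported by resources assumed to exist elsewhere in this configuration.
  deployment_credential_id = celerdatabyoc_aws_deployment_role_credential.aws_deploy.id
  data_credential_id       = celerdatabyoc_aws_data_credential.aws_data.id
  network_id               = celerdatabyoc_aws_network.aws_net.id

  coordinator_node_size = "m6i.4xlarge"

  default_warehouse {
    compute_node_size = "r6id.4xlarge"
    # compute_node_count is left unset and falls back to its default of 3.
  }

  default_admin_password = var.admin_password
  expected_cluster_state = "Running"
}
```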

Optional:

coordinator_node_count: The number of coordinator nodes in the cluster. Valid values: 1, 3, and 5. Default value: 1. If you want to enable Multi-AZ Deployment (available only for AWS), you must deploy at least 3 coordinator nodes, that is, coordinator_node_count must be greater than or equal to 3.
coordinator_node_volume_config: The coordinator nodes volume configuration.
- vol_size: The size per disk for each coordinator node. Unit: GB. Default value: 150. You can only increase the value of this parameter.
- iops: (Available only for AWS) Disk IOPS.
- throughput: (Available only for AWS) Disk throughput.
coordinator_node_configs: The coordinator node static configuration.
warehouse: (List of Object) The list of warehouses. The attributes of a warehouse include:
- name: (Required) The warehouse name must be unique within the cluster and cannot be named "default_warehouse".
- compute_node_size: (Required) The instance type for compute nodes in the cluster. Select a compute node instance type from the table "Supported Node Sizes". For example, you can set this argument to r6id.4xlarge.
- compute_node_count: The number of compute nodes in the warehouse. Valid values: any positive integer. Default value: 3.
- compute_node_volume_config: The compute nodes volume configuration.
  - vol_number: The number of disks for each compute node. Valid values: [1,16]. Default value: 2.
  - vol_size: The size per disk for each compute node. Unit: GB. Default value: 100. You can only increase the value of this parameter.
  - iops: (Available only for AWS) Disk IOPS.
  - throughput: (Available only for AWS) Disk throughput.
  
  ~> You can use the vol_number and vol_size arguments to specify the disk space. The total disk space provisioned to a compute node is equal to vol_number * vol_size.
- compute_node_configs: The compute node static configuration.
- resource_tags: The tags to be attached to the warehouse. Note that resource_tags is a CelerData concept: on AWS and Azure, each entry is added as a tag to the corresponding resources; on GCP, it is added as a label to the corresponding GCP resources.
- expected_state: When creating a non-default warehouse, you can declare the state of the warehouse. Warehouse states are categorized as Suspended and Running. If you want the warehouse to start after provisioning, set this argument to Running. If you set this argument to Suspended, the warehouse will be suspended after provisioning.
- idle_suspend_interval: The amount of time (in minutes) during which the warehouse can stay idle. After the specified time period elapses, the warehouse will be automatically suspended. To enable the Auto Suspend feature, set this argument to an integer within the range of 15 to 999999. To disable this feature again, remove this argument from your Terraform configuration.
- auto_scaling_policy: This policy will automatically scale the number of Compute nodes (CN), based on CPU utilization of the warehouse. For more information, see Enable Auto Scaling for your warehouse. You can generate the policy_json value for this argument using the celerdatabyoc_auto_scaling_policy resource.
- distribution_policy: (Available only for AWS) The compute node distribution policy for the warehouse if you want to enable Multi-AZ deployment for the cluster. Valid values: specify_az (nodes are deployed in the primary availability zone) and crossing_az (nodes are deployed across three availability zones). For more information, see Multi-AZ Deployment.
  
  ~> To enable Multi-AZ Deployment, you must deploy at least 3 coordinator nodes, that is, coordinator_node_count must be greater than or equal to 3.
- specify_az: (Available only for AWS) The primary availability zone for node deployment. This argument is available only when distribution_policy is set to specify_az.
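
The warehouse attributes above might be combined as in the following sketch. The resource name with_etl_warehouse and the warehouse name etl_wh are hypothetical, the required cluster-level arguments are omitted, and celerdatabyoc_auto_scaling_policy.policy_1 is assumed to be declared elsewhere (see the topic for that resource for its arguments).

```hcl
resource "celerdatabyoc_elastic_cluster_v2" "with_etl_warehouse" {
  # Required cluster-level arguments and default_warehouse are omitted for brevity.

  warehouse {
    # Must be unique within the cluster and must not be "default_warehouse".
    name               = "etl_wh"
    compute_node_size  = "r6id.4xlarge"
    compute_node_count = 2

    # Start the warehouse right after provisioning.
    expected_state = "Running"

    # Automatically suspend after 30 idle minutes (valid range: 15 to 999999).
    idle_suspend_interval = 30

    # Keep all nodes in one availability zone (AWS only).
    distribution_policy = "specify_az"
    specify_az          = "us-west-2b"

    # Policy JSON generated by a separately declared auto scaling policy resource.
    auto_scaling_policy = celerdatabyoc_auto_scaling_policy.policy_1.policy_json
  }
}
```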
global_session_variables: Global session variables of the cluster. You can find all configurable variables by running select VARIABLE_NAME from information_schema.global_variables;.
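
A possible shape for this argument is sketched below. The variable name query_timeout is used purely as an illustration and is an assumption here: check that it appears in the output of the query above before relying on it. The value is written as a string on the assumption that the map holds string values.

```hcl
resource "celerdatabyoc_elastic_cluster_v2" "with_session_variables" {
  # Required arguments are omitted for brevity.

  global_session_variables = {
    # Assumed variable name; confirm it against information_schema.global_variables.
    query_timeout = "600"
  }
}
```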
ldap_ssl_certs: (Available only for AWS) The path in the AWS S3 bucket that stores the LDAP SSL certificates. Multiple paths must be separated by commas (,). CelerData supports using LDAP over SSL by uploading the LDAP SSL certificates from S3. To allow CelerData to successfully fetch the certificates, you must grant the ListObject and GetObject permissions to CelerData. To delete the certificates uploaded, you only need to remove this argument.
ranger_certs_dir: (Deprecated) The parent directory path in the AWS S3 bucket that stores the Ranger SSL certificates. CelerData supports using Ranger over SSL by uploading the Ranger SSL certificates from S3. To allow CelerData to successfully fetch the certificates, you must grant the ListObject and GetObject permissions to CelerData. To delete the uploaded certificates, you only need to remove this argument.

note

You can only upload or delete LDAP or Ranger SSL certificates while the cluster's expected_cluster_state is set to Running.

ranger_config_id: The ID of the Ranger configuration you want to apply to the cluster. You can set this argument to celerdatabyoc_ranger_config.<resource_name>.id or a hard-coded ID value.
resource_tags: The tags to be attached to the cluster. Note that resource_tags is a CelerData concept: on AWS and Azure, each entry is added as a tag to the corresponding resources; on GCP, it is added as a label to the corresponding GCP resources.
init_scripts: (Available for AWS/Azure/GCP) The configuration block to specify the paths where scripts and script execution results are stored. The maximum number of executable scripts is 20. For information about the formats supported by these arguments, see scripts.logs_dir and scripts.script_path in Run scripts.
- logs_dir: The path where script execution results are stored. This can be the same as or different from the path you specify in the celerdatabyoc_aws_data_credential resource.
- script_path: The path that stores the scripts to run via Terraform. This path must be the one you specify in the celerdatabyoc_aws_data_credential resource.
scripts: (Available for AWS/Azure/GCP) The configuration block to specify the paths where scripts and script execution results are stored. The maximum number of executable scripts is 20. For information about the formats supported by these arguments, see scripts.logs_dir and scripts.script_path in Run one-time scripts.
- logs_dir: The path where the script execution results are stored. This path can be the same as or different from the path specified in the celerdatabyoc_aws_data_credential resource.
- script_path: The path where the scripts run via Terraform are stored. This path must be the one specified in the celerdatabyoc_aws_data_credential resource.
- rerun: If the value is true, the current script will be re-run every time terraform apply is executed.
run_scripts_parallel: Whether to execute the scripts in parallel. Valid values: true and false. Default value: false.
run_scripts_timeout: The amount of time after which the script execution times out. Unit: Seconds. Default: 3600 (1 hour). The maximum value of this item is 21600 (6 hours).
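
The script-related arguments above can be sketched together as follows. The paths are left as placeholders because their exact format is covered in Run scripts, and the example assumes the scripts block may be repeated, which the limit of 20 scripts suggests but this topic does not state outright.

```hcl
resource "celerdatabyoc_elastic_cluster_v2" "with_scripts" {
  # Required arguments are omitted for brevity.

  # Re-run on every `terraform apply`.
  scripts {
    logs_dir    = "<log_s3_path>"
    script_path = "<script_s3_path>"
    rerun       = true
  }

  # Not re-run on later applies.
  scripts {
    logs_dir    = "<log_s3_path>"
    script_path = "<another_script_s3_path>"
    rerun       = false
  }

  # Execute the scripts in parallel rather than one after another.
  run_scripts_parallel = true

  # Time out after 2 hours (default: 3600 seconds, maximum: 21600 seconds).
  run_scripts_timeout = 7200
}
```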
query_port: The query port, which must be within the range of 1-65535 excluding 443. The default query port is port 9030. Note that this argument can be specified only at cluster deployment, and cannot be modified once it is set.
idle_suspend_interval: The amount of time (in minutes) during which the cluster can stay idle. After the specified time period elapses, the cluster will be automatically suspended. The Auto Suspend feature is disabled by default. To enable the Auto Suspend feature, set this argument to an integer within the range of 15 to 999999. To disable this feature again, remove this argument from your Terraform configuration.
enabled_termination_protection: When enabled, termination protection prevents the deletion of the cluster via Console or APIs. To delete the cluster, this feature needs to be disabled. This has no effect on termination from scale-in, auto scaling events, and scheduled maintenance.
scheduling_policy: (Optional, List) When specified, CelerData automatically suspends the cluster on schedule to save the majority of costs on EC2 (only EBS costs are incurred) and resumes it for use as scheduled.
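
As a small sketch of the two arguments above, the following configuration enables Auto Suspend and termination protection together. The resource name protected and the interval of 60 minutes are illustrative choices.

```hcl
resource "celerdatabyoc_elastic_cluster_v2" "protected" {
  # Required arguments are omitted for brevity.

  # Suspend the cluster after 60 idle minutes (valid range: 15 to 999999).
  idle_suspend_interval = 60

  # Block deletion of the cluster until this is set back to false.
  enabled_termination_protection = true
}
```

Because deletion requires the feature to be disabled, removing such a cluster would presumably take two steps: set enabled_termination_protection to false and run terraform apply, then run terraform destroy.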
- policy_name: (Required) Policy name.
- description: (Optional) Explanation of this policy strategy.
- active_days: (Required) The days of the week on which the cluster scheduling policy is triggered. Available values:
  - MONDAY
  - TUESDAY
  - WEDNESDAY
  - THURSDAY
  - FRIDAY
  - SATURDAY
  - SUNDAY
- time_zone: (Optional) Specify your IANA Time-Zone. Default: UTC.
- resume_at: (Optional) Cluster auto resume time. resume_at and suspend_at cannot both be empty.
- suspend_at: (Optional) Cluster auto suspend time.
- enable: (Required) Whether to enable this scheduling policy. When specified as true, the system will perform cluster scheduling according to this policy.
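
For instance, a policy that resumes the cluster on weekday mornings and suspends it in the evening might look like the sketch below. The policy name, the times, and the time zone are illustrative; America/New_York is one example of an IANA time zone name, and the HH:MM time format follows the Example Usage section.

```hcl
resource "celerdatabyoc_elastic_cluster_v2" "scheduled" {
  # Required arguments are omitted for brevity.

  scheduling_policy {
    policy_name = "weekday-business-hours"
    description = "Resume at 08:00 and suspend at 20:00 on weekdays"
    active_days = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY"]
    time_zone   = "America/New_York"
    resume_at   = "08:00"
    suspend_at  = "20:00"
    enable      = true
  }
}
```

Setting enable to false keeps the policy in the configuration without applying it, which can be handy for temporarily pausing the schedule.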
