

Provides a GPDB Hadoop Data Source resource.

Hadoop DataSource Config.

For information about GPDB Hadoop Data Source and how to use it, see What is Hadoop Data Source.

-> NOTE: Available since v1.230.0.

Example Usage

Basic Usage

variable "name" {
  default = "terraform-example"
}

provider "alicloud" {
  region = "cn-beijing"
}

data "alicloud_zones" "default" {
  available_resource_creation = "VSwitch"
}

data "alicloud_vpcs" "default" {
  name_regex = "^default-NODELETING$"
}

data "alicloud_vswitches" "default" {
  vpc_id  = data.alicloud_vpcs.default.ids.0
  zone_id = "cn-beijing-h"
}

resource "alicloud_ecs_key_pair" "default" {
  key_pair_name =
}

resource "alicloud_security_group" "default" {
  name   =
  vpc_id = data.alicloud_vpcs.default.ids.0
}

resource "alicloud_ram_role" "default" {
  name        =
  document    = <<EOF
        "Statement": [
            "Action": "sts:AssumeRole",
            "Effect": "Allow",
            "Principal": {
            "Service": [
        "Version": "1"
  description = "this is a role example."
  force       = true

data "alicloud_resource_manager_resource_groups" "default" {
  status = "OK"
}

data "alicloud_kms_keys" "default" {
  status = "Enabled"
}

resource "alicloud_emrv2_cluster" "default" {
  node_groups {
    vswitch_ids = [
    instance_types = [
    node_count           = "1"
    spot_instance_remedy = "false"
    data_disks {
      count             = "3"
      category          = "cloud_essd"
      size              = "80"
      performance_level = "PL0"
    }

    node_group_name   = "emr-master"
    payment_type      = "PayAsYouGo"
    with_public_ip    = "false"
    graceful_shutdown = "false"
    system_disk {
      category          = "cloud_essd"
      size              = "80"
      performance_level = "PL0"
      count             = "1"
    }

    node_group_type = "MASTER"
  }
  node_groups {
    spot_instance_remedy = "false"
    node_group_type      = "CORE"
    vswitch_ids = [
    node_count        = "2"
    graceful_shutdown = "false"
    system_disk {
      performance_level = "PL0"
      count             = "1"
      category          = "cloud_essd"
      size              = "80"
    }

    data_disks {
      count             = "3"
      performance_level = "PL0"
      category          = "cloud_essd"
      size              = "80"
    }

    node_group_name = "emr-core"
    payment_type    = "PayAsYouGo"
    instance_types = [
    with_public_ip = "false"
  }

  deploy_mode = "NORMAL"
  tags = {
    Created = "TF"
    For     = "example"
  }
  release_version = "EMR-5.10.0"
  applications = [
  node_attributes {
    zone_id              = "cn-beijing-h"
    key_pair_name        =
    data_disk_encrypted  = "true"
    data_disk_kms_key_id = data.alicloud_kms_keys.default.ids.0
    vpc_id               = data.alicloud_vpcs.default.ids.0
    ram_role             =
    security_group_id    =
  }

  resource_group_id = data.alicloud_resource_manager_resource_groups.default.ids.0
  cluster_name      =
  payment_type      = "PayAsYouGo"
  cluster_type      = "DATAFLOW"
}

resource "alicloud_gpdb_instance" "defaultZoepvx" {
  instance_spec         = "2C8G"
  description           =
  seg_node_num          = "2"
  seg_storage_type      = "cloud_essd"
  instance_network_type = "VPC"
  payment_type          = "PayAsYouGo"
  ssl_enabled           = "0"
  engine_version        = "6.0"
  zone_id               = "cn-beijing-h"
  vswitch_id            = data.alicloud_vswitches.default.ids[0]
  storage_size          = "50"
  master_cu             = "4"
  vpc_id                = data.alicloud_vpcs.default.ids.0
  db_instance_mode      = "StorageElastic"
  engine                = "gpdb"
  db_instance_category  = "Basic"
}

resource "alicloud_gpdb_external_data_service" "defaultyOxz1K" {
  service_name        =
  db_instance_id      =
  service_description =
  service_spec        = "8"
}

resource "alicloud_gpdb_hadoop_data_source" "default" {
  hdfs_conf               = "aaa"
  data_source_name        = alicloud_gpdb_external_data_service.defaultyOxz1K.service_name
  yarn_conf               = "aaa"
  hive_conf               = "aaa"
  hadoop_create_type      = "emr"
  data_source_description =
  map_reduce_conf         = "aaa"
  data_source_type        = "hive"
  hadoop_core_conf        = "aaa"
  emr_instance_id         =
  db_instance_id          =
  hadoop_hosts_address    = "aaa"
}

Argument Reference

The following arguments are supported:

  • db_instance_id - (Required, ForceNew) The instance ID.

  • data_source_description - (Optional) The description of the data source.

  • data_source_name - (Optional, ForceNew) The name of the data source.

  • data_source_type - (Optional) The type of the data source. Valid values:

    • mysql
    • postgresql
    • hdfs
    • hive
  • emr_instance_id - (Optional) The ID of the EMR instance.

  • hadoop_core_conf - (Optional) The string that specifies the content of the Hadoop core-site.xml file.

  • hadoop_create_type - (Optional) The type of the external service. Valid values:

    • emr: E-MapReduce (EMR) Hadoop cluster.
    • selfCreate: self-managed Hadoop cluster.
  • hadoop_hosts_address - (Optional) The IP address and hostname of the Hadoop cluster (data source) in the /etc/hosts file.

  • hdfs_conf - (Optional) The string that specifies the content of the Hadoop hdfs-site.xml file. This parameter must be specified when data_source_type is set to hdfs.

  • hive_conf - (Optional) The string that specifies the content of the Hadoop hive-site.xml file. This parameter must be specified when data_source_type is set to hive.

  • map_reduce_conf - (Optional) The string that specifies the content of the Hadoop mapred-site.xml file. This parameter must be specified when data_source_type is set to hdfs.

  • yarn_conf - (Optional) The string that specifies the content of the Hadoop yarn-site.xml file. This parameter must be specified when data_source_type is set to hdfs.
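
The arguments above could be combined for a self-managed Hadoop cluster roughly as follows. This is a minimal sketch, not a tested configuration: the variable db_instance_id, the resource label self_managed, the data source name, the hosts entry, and the file paths under conf/ are all hypothetical placeholders. It loads each site file with Terraform's built-in file() function instead of inlining the XML. Whether a self-managed source also needs an external data service, as in the Basic Usage example, is not stated on this page.

```hcl
# Hypothetical input: ID of an existing GPDB instance.
variable "db_instance_id" {
  type = string
}

resource "alicloud_gpdb_hadoop_data_source" "self_managed" {
  db_instance_id     = var.db_instance_id
  data_source_name   = "example-hdfs-source" # placeholder name
  data_source_type   = "hdfs"
  hadoop_create_type = "selfCreate"

  # Placeholder entry, assuming the usual /etc/hosts layout of "IP hostname".
  hadoop_hosts_address = "192.168.0.10 hadoop-master"

  # Site files read from disk; the paths are assumptions for illustration.
  hadoop_core_conf = file("${path.module}/conf/core-site.xml")

  # Per the argument descriptions, these three are required when
  # data_source_type is "hdfs".
  hdfs_conf       = file("${path.module}/conf/hdfs-site.xml")
  map_reduce_conf = file("${path.module}/conf/mapred-site.xml")
  yarn_conf       = file("${path.module}/conf/yarn-site.xml")
}
```

For a Hive data source, the same pattern would presumably set data_source_type to "hive" and supply hive_conf from a hive-site.xml file.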

Attributes Reference

The following attributes are exported:

  • id - The ID of the resource supplied above. The value is formulated as <db_instance_id>:<data_source_id>.

  • create_time - The time when the data source was created.

  • data_source_id - The data source ID.

  • status - The status of the data source.
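
As an illustration of how the exported attributes might be consumed, the sketch below exposes them as outputs. It assumes the resource is labelled "default" as in the Basic Usage example; the output names are hypothetical. The last output uses the built-in split() function to recover the instance ID from the composite id, relying on the <db_instance_id>:<data_source_id> format described above.

```hcl
# Output names are placeholders chosen for this example.
output "hadoop_data_source_id" {
  value = alicloud_gpdb_hadoop_data_source.default.data_source_id
}

output "hadoop_data_source_status" {
  value = alicloud_gpdb_hadoop_data_source.default.status
}

# The composite id has the form <db_instance_id>:<data_source_id>,
# so the first element after splitting on ":" is the instance ID.
output "hadoop_data_source_instance_id" {
  value = split(":", alicloud_gpdb_hadoop_data_source.default.id)[0]
}
```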


Timeouts

The timeouts block allows you to specify timeouts for certain actions:

  • create - (Defaults to 5 mins) Used when creating the Hadoop Data Source.
  • delete - (Defaults to 5 mins) Used when deleting the Hadoop Data Source.
  • update - (Defaults to 5 mins) Used when updating the Hadoop Data Source.
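
A timeouts block is nested inside the resource. The fragment below is a sketch that raises all three limits from the 5-minute default; the 10m values are illustrative only, and the other arguments are omitted for brevity.

```hcl
resource "alicloud_gpdb_hadoop_data_source" "default" {
  # ... other arguments as in the Basic Usage example ...

  # Illustrative values; each defaults to 5 minutes when unset.
  timeouts {
    create = "10m"
    update = "10m"
    delete = "10m"
  }
}
```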


Import

GPDB Hadoop Data Source can be imported using the id, e.g.

$ terraform import alicloud_gpdb_hadoop_data_source.example <db_instance_id>:<data_source_id>
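
On Terraform 1.5 or later, the same import can likely be expressed declaratively with an import block instead of the CLI command. The following is a sketch under that assumption; the id keeps the placeholder form used in the command and must be replaced with real values.

```hcl
# Requires Terraform 1.5 or later. Replace the placeholder id with the
# actual <db_instance_id>:<data_source_id> pair.
import {
  to = alicloud_gpdb_hadoop_data_source.example
  id = "<db_instance_id>:<data_source_id>"
}
```

A matching resource "alicloud_gpdb_hadoop_data_source" "example" block still has to exist in the configuration for the import to apply.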