Big data instance families combine large-scale local storage with high internal network bandwidth. They are optimized for Hadoop MapReduce, Hadoop Distributed File System (HDFS), Hive, HBase, Spark, Elasticsearch, and Kafka workloads. Most families offer a 1:4 CPU-to-memory ratio.
Instance availability varies by region. Before you select an instance type, check purchase availability by region and review the instance type selection guide. For metric definitions, see instance metric descriptions. To estimate costs, use the ECS Price Calculator.
Recommended instance families
| Recommended | Not recommended (use the recommended families if these are sold out) |
|---|---|
| d3s, d3c, d2c, d2s | d1ne |
Choose between d3s and d3c
Both d3s and d3c use Intel Ice Lake processors and support hot disk swapping, but they serve different workload profiles:
| | d3s (storage-intensive) | d3c (compute-intensive) |
|---|---|---|
| Max local storage per instance | 32 × 11,918 GB (~380 TB) | 4 × 13,743 GB (~55 TB) |
| Max network bandwidth | 80 Gbit/s | 40 Gbit/s |
| Disk IOPS spec | Not published | Up to 100,000 IOPS |
| OS support | Not restricted (Linux and Windows supported) | Linux only |
| Storage-compute decoupling | Not mentioned in product documentation | Supported (EMR JindoFS + OSS) |
Choose d3s when storage capacity and sequential throughput are the primary constraints — for example, large HDFS clusters with high data volumes per node. Choose d3c when you need higher disk IOPS, storage-compute decoupling via EMR JindoFS and Object Storage Service (OSS), or hot/cold data separation.
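To make the storage-intensive versus compute-intensive split concrete, the following sketch derives local storage per vCPU for the largest instance type in each family. It uses the disk counts, disk sizes, and vCPU counts from the instance type tables on this page. The function name gb_per_vcpu is illustrative, and integer division truncates the result.

```shell
# Local storage (GB) per vCPU, truncated to a whole number.
# Arguments: disk count, size of one disk in GB, vCPU count.
gb_per_vcpu() {
  echo $(( $1 * $2 / $3 ))
}

# ecs.d3s.16xlarge: 32 disks of 11,918 GB, 64 vCPUs
gb_per_vcpu 32 11918 64
# ecs.d3c.14xlarge: 4 disks of 13,743 GB, 56 vCPUs
gb_per_vcpu 4 13743 56
```

With these figures, d3s provides about 5,959 GB of local storage per vCPU and d3c about 981 GB, roughly a sixfold difference.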
Local disk considerations
Local disk data durability depends on the reliability of the physical host. A hardware failure on the host can result in data loss. Store only temporary or replicated data on local disks. For more information, see Local disks.
Frameworks such as HDFS and Kafka replicate data across multiple nodes by design. Size your cluster with enough replicas to maintain durability if a single node fails.
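As a rough sizing aid, the sketch below estimates how much unique data a cluster can hold once replication is accounted for. It assumes a replication factor of 3 (the HDFS default for dfs.replication) and a hypothetical cluster of 10 ecs.d3s.16xlarge nodes. It ignores file system overhead and any space reserved for non-DFS use, so treat the result as an upper bound. The function name usable_gb is illustrative.

```shell
# Usable capacity in GB = nodes * disks per node * GB per disk / replicas.
# Integer division truncates the result.
usable_gb() {
  echo $(( $1 * $2 * $3 / $4 ))
}

# Assumed example: 10 nodes, 32 disks of 11,918 GB each, 3 replicas.
usable_gb 10 32 11918 3
```

In this example, about 3,814 TB of raw disk yields about 1,271 TB of usable capacity.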
Additional constraints:
Instances with local SSDs do not support instance type changes.
Local disks are tied to specific instance types. The count and capacity vary by instance type. Local disks cannot be purchased separately or moved to other instances.
Snapshots cannot be created for local disks. To create an image from an instance with local SSDs, snapshot only the system disk and any cloud data disks (not local disks), then combine those snapshots into an image.
Images that combine system disk snapshots with local SSD data disk snapshots cannot be created.
Standard SSDs can be attached to instances with local SSDs, and their capacity can be extended.
Certain instance operations affect data on local disks. For details, see Impacts of instance operations on data stored on local disks.
Initialize local disks
Linux kernel v2.6.37 and later enable the lazyinit feature by default, which defers inode table initialization until the file system is mounted. On instances with many local disks, this deferred initialization can consume up to 600 MB/s of disk throughput and affect service stability. Linux kernel v4.x increased the concurrency limit for lazy initialization. For the upstream fix, see this kernel commit.
To initialize all local disks before starting services:
List all local serial advanced technology attachment (SATA) HDDs on the instance.
Run the following command for each local disk to disable lazy initialization. This example formats /dev/vdb with an ext4 file system:
    mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdb &
Run this command in parallel for each disk (note the trailing &).
After all disks finish formatting, run the following command and wait until the I/O activity for every disk drops to 0:
    iostat -x 5
Mount all disks.
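The formatting step can be wrapped in a small helper when an instance has many disks. The following is a minimal sketch, not an official script: the function name format_local_disks, the DRY_RUN variable, and the device names /dev/vdb and /dev/vdc are assumptions for illustration. By default it only prints the commands it would run.

```shell
# Sketch: format several local disks in parallel with lazy init disabled.
# DRY_RUN defaults to 1, so nothing is formatted unless you opt in.
format_local_disks() {
  local dev
  for dev in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Print the command instead of running it.
      echo "mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 $dev"
    else
      # Destructive: erases any existing data on the device.
      mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 "$dev" &
    fi
  done
  # Block until every background mkfs job has exited.
  wait
}

# Hypothetical device names; replace with the disks found on your instance.
format_local_disks /dev/vdb /dev/vdc
```

Set DRY_RUN=0 only after confirming the device list, because mkfs.ext4 erases the existing contents of each disk. The wait builtin replaces manually tracking the background jobs. Checking iostat afterwards, as described above, is still a useful confirmation before mounting.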
d3s, storage-intensive big data instance family
Key specs: Up to 32 × 11,918 GB local SATA HDDs (~380 TB raw), up to 80 Gbit/s network bandwidth, 2.7 GHz Intel® Xeon® Scalable (Ice Lake) processors with 3.5 GHz all-core turbo frequency.
Use cases:
Hadoop MapReduce, HDFS, Hive, and HBase workloads
Spark in-memory computing and MLlib
Elasticsearch and Kafka deployments
Hardware:
All instances are I/O optimized
Supported cloud disk types: ESSDs and ESSD AutoPL disks
Network: IPv4 and IPv6. For IPv6 setup, see IPv6 communication. Network performance scales with instance size.
Disk failure handling: d3s supports online replacement and hot swapping of failed disks without instance shutdown. When a disk fails, you receive a system event. Initiate the disk repair process to resolve it. For details, see O&M scenarios and system events for instances equipped with local disks.
Data on a failed disk cannot be restored after you initiate the repair process.
Instance types:
| Instance type | vCPUs | Memory (GiB) | Local storage | Network baseline/burst bandwidth (Gbit/s) | Packet forwarding rate (pps) | Disk baseline/burst bandwidth (Gbit/s) |
|---|---|---|---|---|---|---|
| ecs.d3s.2xlarge | 8 | 32 | 4 × 11,918 GB (4 × 11,100 GiB) | 10/burstable up to 15 | 2,000,000 | 3/burstable up to 5 |
| ecs.d3s.4xlarge | 16 | 64 | 8 × 11,918 GB (8 × 11,100 GiB) | 25/none | 3,000,000 | 5/none |
| ecs.d3s.8xlarge | 32 | 128 | 16 × 11,918 GB (16 × 11,100 GiB) | 40/none | 6,000,000 | 8/none |
| ecs.d3s.12xlarge | 48 | 192 | 24 × 11,918 GB (24 × 11,100 GiB) | 60/none | 9,000,000 | 12/none |
| ecs.d3s.16xlarge | 64 | 256 | 32 × 11,918 GB (32 × 11,100 GiB) | 80/none | 12,000,000 | 16/none |
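The Local storage column gives each disk size in both GB (10^9 bytes) and GiB (2^30 bytes). The sketch below converts GiB to GB with integer arithmetic. Truncating instead of rounding is an inference that reproduces the values in the tables on this page, not a documented rule, and the function name gib_to_gb is illustrative.

```shell
# Convert GiB (2^30 bytes) to whole GB (10^9 bytes), truncating the remainder.
gib_to_gb() {
  echo $(( $1 * 1073741824 / 1000000000 ))
}

gib_to_gb 11100   # d3s disk size in GiB
gib_to_gb 12800   # d3c disk size in GiB
```

The two calls print 11918 and 13743, which match the per-disk GB values listed for d3s and d3c.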
d3c, compute-intensive big data instance family
Key specs: Up to 4 × 13,743 GB local disks (~55 TB raw), up to 40 Gbit/s network bandwidth, third-generation 2.9 GHz Intel® Xeon® Scalable (Ice Lake) processors with 3.5 GHz all-core turbo frequency.
d3c supports Linux images only. Select a Linux image when creating an instance.
Use cases:
Hadoop MapReduce, HDFS, Hive, and HBase workloads
Storage-compute decoupling with EMR JindoFS and OSS (hot/cold data separation)
Spark in-memory computing and MLlib
Elasticsearch and Kafka deployments
Hardware:
All instances are I/O optimized
Supported cloud disk types: ESSDs and ESSD AutoPL disks
Network: IPv4 and IPv6. For IPv6 setup, see IPv6 communication. Network performance scales with instance size.
Disk failure handling: d3c supports online replacement and hot swapping of failed disks without instance shutdown. When a disk fails, you receive a system event. Initiate the disk repair process to resolve it. For details, see O&M scenarios and system events for instances equipped with local disks.
Data on a failed disk cannot be restored after you initiate the repair process.
Instance types:
| Instance type | vCPUs | Memory (GiB) | Local storage | Network baseline/burst bandwidth (Gbit/s) | Packet forwarding rate (pps) | Disk baseline/burst IOPS | Disk baseline/burst bandwidth (Gbit/s) |
|---|---|---|---|---|---|---|---|
| ecs.d3c.3xlarge | 14 | 56.0 | 1 × 13,743 GB (1 × 12,800 GiB) | 8/burstable up to 10 | 1,600,000 | 40,000/none | 3/none |
| ecs.d3c.7xlarge | 28 | 112.0 | 2 × 13,743 GB (2 × 12,800 GiB) | 16/burstable up to 25 | 2,500,000 | 50,000/none | 4/none |
| ecs.d3c.14xlarge | 56 | 224.0 | 4 × 13,743 GB (4 × 12,800 GiB) | 40/none | 5,000,000 | 100,000/none | 8/none |
d2c, compute-intensive big data instance family
Key specs: Up to 12 × 3,972 GB local SATA HDDs (~48 TB raw), up to 35 Gbit/s network bandwidth, 2.5 GHz Intel® Xeon® Platinum 8269CY (Cascade Lake) processors.
Use cases:
Hadoop MapReduce, HDFS, Hive, and HBase workloads
Storage-compute decoupling with EMR JindoFS and OSS (hot/cold data separation)
Spark in-memory computing and MLlib
Elasticsearch and Kafka deployments
Hardware:
All instances are I/O optimized
Supported cloud disk types: Enhanced SSDs (ESSDs), ESSD AutoPL disks, standard SSDs, and ultra disks
Network: IPv4 and IPv6. For IPv6 setup, see IPv6 communication. Network performance scales with instance size.
Disk failure handling: d2c supports online replacement and hot swapping of failed disks without instance shutdown. When a disk fails, you receive a system event. Initiate the disk repair process to resolve it. For details, see O&M scenarios and system events for instances equipped with local disks.
Data on a failed disk cannot be restored after you initiate the repair process.
Instance types:
| Instance type | vCPUs | Memory (GiB) | Local storage | Network baseline bandwidth (Gbit/s) | Packet forwarding rate (pps) |
|---|---|---|---|---|---|
| ecs.d2c.6xlarge | 24 | 88.0 | 3 × 3,972 GB (3 × 3,700 GiB) | 12.0 | 1,600,000 |
| ecs.d2c.12xlarge | 48 | 176.0 | 6 × 3,972 GB (6 × 3,700 GiB) | 20.0 | 2,000,000 |
| ecs.d2c.24xlarge | 96 | 352.0 | 12 × 3,972 GB (12 × 3,700 GiB) | 35.0 | 4,500,000 |
d2s, storage-intensive big data instance family
Key specs: Up to 30 × 7,838 GB local SATA HDDs (~235 TB raw), up to 35 Gbit/s network bandwidth, 2.5 GHz Intel® Xeon® Platinum 8163 (Skylake) processors.
Use cases:
Hadoop MapReduce, HDFS, Hive, and HBase workloads
Spark in-memory computing and MLlib
Elasticsearch and Kafka deployments
Hardware:
All instances are I/O optimized
Supported cloud disk types: ESSDs, ESSD AutoPL disks, standard SSDs, and ultra disks
Network: IPv4 and IPv6. For IPv6 setup, see IPv6 communication. Network performance scales with instance size.
Disk failure handling: d2s supports online replacement and hot swapping of failed disks without instance shutdown. When a disk fails, you receive a system event. Initiate the disk repair process to resolve it. For details, see O&M scenarios and system events for instances equipped with local disks.
Data on a failed disk cannot be restored after you initiate the repair process.
Instance types:
| Instance type | vCPUs | Memory (GiB) | Local storage | Network baseline bandwidth (Gbit/s) | Packet forwarding rate (pps) |
|---|---|---|---|---|---|
| ecs.d2s.5xlarge | 20 | 88.0 | 8 × 7,838 GB (8 × 7,300 GiB) | 12.0 | 1,600,000 |
| ecs.d2s.10xlarge | 40 | 176.0 | 15 × 7,838 GB (15 × 7,300 GiB) | 20.0 | 2,000,000 |
| ecs.d2s.20xlarge | 80 | 352.0 | 30 × 7,838 GB (30 × 7,300 GiB) | 35.0 | 4,500,000 |
d1ne, network-enhanced big data instance family (not recommended)
d1ne is no longer recommended. Use d3s, d3c, d2c, or d2s instead.
Key specs: Up to 28 × 5,905 GB local SATA HDDs (~165 TB raw), up to 35 Gbit/s network bandwidth, 1:4 CPU-to-memory ratio.
Use cases:
Hadoop MapReduce, HDFS, Hive, and HBase workloads
Spark in-memory computing and MLlib
Elasticsearch deployments
Hardware:
Processor: 2.5 GHz Intel® Xeon® E5-2682 v4 (Broadwell) or Intel® Xeon® Platinum 8163 (Skylake)
All instances are I/O optimized
Supported cloud disk types: standard SSDs and ultra disks only
Network: IPv4 and IPv6. For IPv6 setup, see IPv6 communication. Network performance scales with instance size.
Instance types:
| Instance type | vCPUs | Memory (GiB) | Local storage | Network baseline bandwidth (Gbit/s) | Packet forwarding rate (pps) |
|---|---|---|---|---|---|
| ecs.d1ne.2xlarge | 8 | 32.0 | 4 × 5,905 GB (4 × 5,500 GiB) | 6.0 | 1,000,000 |
| ecs.d1ne.4xlarge | 16 | 64.0 | 8 × 5,905 GB (8 × 5,500 GiB) | 12.0 | 1,600,000 |
| ecs.d1ne.6xlarge | 24 | 96.0 | 12 × 5,905 GB (12 × 5,500 GiB) | 16.0 | 2,000,000 |
| ecs.d1ne-c8d3.8xlarge | 32 | 128.0 | 12 × 5,905 GB (12 × 5,500 GiB) | 20.0 | 2,000,000 |
| ecs.d1ne.8xlarge | 32 | 128.0 | 16 × 5,905 GB (16 × 5,500 GiB) | 20.0 | 2,500,000 |
| ecs.d1ne-c14d3.14xlarge | 56 | 160.0 | 12 × 5,905 GB (12 × 5,500 GiB) | 35.0 | 4,500,000 |
| ecs.d1ne.14xlarge | 56 | 224.0 | 28 × 5,905 GB (28 × 5,500 GiB) | 35.0 | 4,500,000 |