These benchmark tests measure the performance gap between HBase Community Edition and ApsaraDB for HBase Performance-enhanced Edition across three dimensions: throughput, response latency, and storage efficiency. Run these tests to verify whether the Performance-enhanced Edition meets your production requirements before migrating, or to reproduce the published benchmark results in your own environment.
All tests use Yahoo Cloud Serving Benchmark (YCSB) as the load generator.
Prepare data
Create a test table in each cluster. All tests share the same schema: 200 partitions based on YCSB data distribution.
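The create statements in this section pre-split the table with a Ruby expression, since the HBase shell is based on JRuby. The sketch below evaluates the same expression in plain Ruby to show how 199 split points yield 200 partitions. The constant and variable names are illustrative and do not appear in the original statements.

```ruby
# Standalone version of the SPLITS expression from the create statements.
# 2**63 - 1 is the largest signed 64-bit value; the key space is cut into equal steps.
MAX_KEY = 2**63 - 1
STEP = MAX_KEY / 199 # integer division, as in the original expression

split_keys = (1..199).map do |i|
  # Left-pad to 19 digits so that lexicographic order matches numeric order.
  "user" + (i * STEP).to_s.rjust(19, "0")
end

puts "split points: #{split_keys.size}"
puts "partitions:   #{split_keys.size + 1}"
puts "first key:    #{split_keys.first}"
puts "last key:     #{split_keys.last}"
```

Each split point is the prefix user followed by a 19-digit, zero-padded number, so the boundaries are evenly spaced over the signed 64-bit range.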
ApsaraDB for HBase Performance-enhanced Edition uses INDEX encoding and Zstandard (ZSTD) compression. INDEX encoding is exclusive to this edition: if you set the encoding to DIFF, it is automatically upgraded to INDEX.
Statement to create the table:
create 'test', {NAME => 'f', DATA_BLOCK_ENCODING => 'DIFF', COMPRESSION => 'ZSTD'}, {SPLITS => (1..199).map{|i| "user#{(i * ((2**63-1)/199)).to_s.rjust(19, "0")}"} }
HBase Community Edition uses DIFF encoding and SNAPPY compression, as recommended by Apache HBase.
Statement to create the table:
create 'test', {NAME => 'f', DATA_BLOCK_ENCODING => 'DIFF', COMPRESSION => 'SNAPPY'}, {SPLITS => (1..199).map{|i| "user#{(i * ((2**63-1)/199)).to_s.rjust(19, "0")}"} }
For the single-row read and range read tests, load an initial dataset of 2 billion rows, 20 columns per row, 20 bytes per column.
YCSB workload configuration:
recordcount=2000000000
operationcount=150000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=20
fieldlength=20
readproportion=1.0
updateproportion=0.0
scanproportion=0
insertproportion=0
requestdistribution=uniform
Command to load data:
bin/ycsb load hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -s
Throughput benchmark tests
These tests compare throughput at the same number of threads. The dataset is 2 billion rows, 20 columns per row, 20 bytes per column. The four scenarios are independent.
For all read and range scan tests, follow this pre-test procedure: run a major compaction and wait for it to complete, then run a warm-up test for 20 minutes before starting the formal 20-minute test.
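The pre-test procedure can be written down as an ordered list of commands. The following is a minimal sketch that only builds and prints the steps and executes nothing. The workload file name is hypothetical, and piping a statement into hbase shell is an assumption about how the shell is invoked in your environment.

```ruby
# Builds the ordered steps for one read or scan scenario.
# Nothing is executed; the strings are meant to be reviewed and run by hand.
WARMUP_SECONDS = 20 * 60 # 20-minute warm-up
FORMAL_SECONDS = 20 * 60 # 20-minute formal test

def ycsb_run(workload_file, threads, seconds)
  "bin/ycsb run hbase10 -P #{workload_file} -p table=test " \
    "-threads #{threads} -p columnfamily=f -p maxexecutiontime=#{seconds}"
end

def pre_test_plan(workload_file, threads)
  [
    "echo \"major_compact 'test'\" | hbase shell",      # 1. trigger the major compaction
    "# wait until the major compaction has completed",  # 2. manual check, not automated here
    ycsb_run(workload_file, threads, WARMUP_SECONDS),   # 3. warm-up run, results discarded
    ycsb_run(workload_file, threads, FORMAL_SECONDS)    # 4. formal run, results recorded
  ]
end

# "workloads/read_single_row" is a hypothetical path to the workload file.
pre_test_plan("workloads/read_single_row", 200).each { |step| puts step }
```

Keeping the warm-up and the formal run on the same command line, apart from the label you give the results, helps ensure that both runs exercise the same access pattern.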
Read data in a single row (query range: 10 million rows, 200 threads, 20-minute formal test)
YCSB workload configuration:
recordcount=10000000
operationcount=2000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=1
fieldlength=20
readproportion=1.0
updateproportion=0.0
scanproportion=0
insertproportion=0
requestdistribution=uniform
Command to run the test:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200
Read data within a specified range (query range: 10 million rows, 50 rows per scan, 100 threads, 20-minute formal test)
YCSB workload configuration:
recordcount=10000000
operationcount=2000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=1
fieldlength=20
readproportion=0.0
updateproportion=0.0
scanproportion=1.0
insertproportion=0
requestdistribution=uniform
maxscanlength=50
hbase.usepagefilter=false
Command to run the test:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200
Write data into a single row (1 column per insert, 20 bytes per column, 200 threads, 20-minute test)
YCSB workload configuration:
recordcount=2000000000
operationcount=100000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=1
fieldlength=20
readproportion=0.0
updateproportion=0.0
scanproportion=0
insertproportion=1.0
requestdistribution=uniform
Command to run the test:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200
Write data into multiple rows (1 column per insert, 20 bytes per column, 100 rows per batch, 100 threads, 20-minute test)
YCSB workload configuration:
recordcount=2000000000
operationcount=10000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
fieldcount=1
fieldlength=20
cyclickey=true
readallfields=false
readproportion=0
updateproportion=0
scanproportion=0
insertproportion=0.0
batchproportion=1.0
batchsize=100
requestdistribution=uniform
Command to run the test:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200
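Each throughput workload above sets both an operationcount and a 1,200-second maxexecutiontime. Assuming standard YCSB behavior, where a run stops at whichever limit is reached first, the sketch below computes the throughput above which a run would finish before the full 20 minutes. The scenario labels and helper name are illustrative.

```ruby
MAX_EXECUTION_SECONDS = 1200 # -p maxexecutiontime=1200

# operationcount values copied from the four throughput workloads above.
OPERATION_COUNTS = {
  "single-row read"  => 2_000_000_000,
  "range read"       => 2_000_000_000,
  "single-row write" => 100_000_000,
  "batch write"      => 10_000_000
}

# Throughput at which the operation budget runs out exactly at the time limit.
# Assumption: the run ends at whichever of the two limits is hit first.
def break_even_ops(operation_count, seconds)
  operation_count.to_f / seconds
end

OPERATION_COUNTS.each do |scenario, count|
  puts format("%-17s ends early above %12.1f ops/s",
              scenario, break_even_ops(count, MAX_EXECUTION_SECONDS))
end
```

If a cluster sustains more than the printed rate, the run covers less than 20 minutes, so you may want to check the reported run time before comparing results.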
Response latency benchmark tests
These tests compare response latency at a fixed Operations per Second (OPS). By capping throughput to the same OPS on both systems, the tests reveal how each system handles latency under a controlled load. The dataset is 2 billion rows, 20 columns per row, 20 bytes per column.
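To make the throughput cap concrete, the sketch below works out the pacing each client thread would follow in the four latency scenarios. It assumes that YCSB divides the target evenly across client threads; the helper name and the scenario labels are illustrative.

```ruby
# Per-thread pacing implied by a throughput cap.
# Assumption: the cap is split evenly across client threads.
def per_thread_pacing(target_ops, threads)
  ops_per_thread = target_ops.to_f / threads
  { ops_per_thread: ops_per_thread, interval_ms: 1000.0 / ops_per_thread }
end

# [scenario, max OPS, threads] taken from the four latency scenarios in this section.
scenarios = [
  ["single-row read",  5_000,  200],
  ["range read",       5_000,  100],
  ["single-row write", 50_000, 200],
  ["batch write",      2_000,  100]
]

scenarios.each do |name, target, threads|
  pacing = per_thread_pacing(target, threads)
  puts format("%-17s %6.1f ops/s per thread, one operation every %5.1f ms",
              name, pacing[:ops_per_thread], pacing[:interval_ms])
end
```

Because every thread has idle time between operations at these rates, the measured latency reflects the service time of each request rather than client-side queuing.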
For all read and range scan tests, follow this pre-test procedure: run a major compaction and wait for it to complete, then run a warm-up test for 20 minutes before starting the formal 20-minute test.
Read data in a single row (query range: 10 million rows, max OPS: 5,000, 200 threads, 20-minute formal test)
YCSB workload configuration:
recordcount=10000000
operationcount=2000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=1
fieldlength=20
readproportion=1.0
updateproportion=0.0
scanproportion=0
insertproportion=0
requestdistribution=uniform
Command to run the test:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200 -p target=5000
Read data within a specified range (query range: 10 million rows, 50 rows per scan, max OPS: 5,000, 100 threads, 20-minute formal test)
YCSB workload configuration:
recordcount=10000000
operationcount=2000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=1
fieldlength=20
readproportion=0.0
updateproportion=0.0
scanproportion=1.0
insertproportion=0
requestdistribution=uniform
maxscanlength=50
hbase.usepagefilter=false
Command to run the test:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200 -p target=5000
Write data into a single row (1 column per insert, 20 bytes per column, max OPS: 50,000, 200 threads, 20-minute test)
YCSB workload configuration:
recordcount=2000000000
operationcount=100000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=1
fieldlength=20
readproportion=0.0
updateproportion=0.0
scanproportion=0
insertproportion=1.0
requestdistribution=uniform
Command to run the test:
bin/ycsb run hbase10 -P <workload> -p table=testwrite -threads 200 -p columnfamily=f -p maxexecutiontime=1200 -p target=50000
Write data into multiple rows (1 column per insert, 20 bytes per column, 100 rows per batch, max OPS: 2,000, 100 threads, 20-minute test)
YCSB workload configuration:
recordcount=2000000000
operationcount=10000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
fieldcount=1
fieldlength=20
cyclickey=true
readallfields=false
readproportion=0
updateproportion=0
scanproportion=0
insertproportion=0.0
batchproportion=1.0
batchsize=100
requestdistribution=uniform
Command to run the test:
bin/ycsb run hbase10 -P <workload> -p table=testwrite -threads 100 -p columnfamily=f -p maxexecutiontime=1200 -p target=2000
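To compare latency between the two editions, you need the latency figures from each run's summary. The sketch below parses summary lines of the form [OPERATION], Metric, value, which is the text format YCSB normally prints at the end of a run. The sample text is a placeholder and not a measured result, and the metric names present in your output may differ depending on the YCSB build.

```ruby
# Parses YCSB summary lines of the form "[OPERATION], Metric, value" into a nested hash.
def parse_ycsb_summary(text)
  summary = Hash.new { |hash, operation| hash[operation] = {} }
  text.each_line do |line|
    match = line.match(/\A\[(\w+)\],\s*([^,]+),\s*([\d.]+)\s*\z/)
    next unless match # skip anything that is not a summary line
    summary[match[1]][match[2].strip] = match[3].to_f
  end
  summary
end

# Placeholder summary, NOT a measured result; substitute the output of a real run.
sample = <<~OUT
  [OVERALL], Throughput(ops/sec), 5000.0
  [READ], AverageLatency(us), 800.0
  [READ], 99thPercentileLatency(us), 2500
OUT

read = parse_ycsb_summary(sample)["READ"]
# YCSB reports latency in microseconds; convert to milliseconds for display.
puts format("avg %.2f ms, p99 %.2f ms",
            read["AverageLatency(us)"] / 1000, read["99thPercentileLatency(us)"] / 1000)
```

Running the same parser over the output of both clusters gives directly comparable average and tail latencies for each capped scenario.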
Compression ratio benchmark tests
All four tests follow the same procedure: insert 5 million rows via YCSB, manually trigger a flush and major compaction, then check the table size.
Each test uses a different column configuration:
| Columns per row | Column size (bytes) |
|---|---|
| 1 | 10 |
| 1 | 100 |
| 20 | 10 |
| 20 | 20 |
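For reference, the raw value payload of each configuration in the table can be worked out from the row count. The sketch below counts value bytes only. Row keys, column qualifiers, and timestamps are stored as well, so the uncompressed footprint in HBase is larger than these figures. The helper name is illustrative.

```ruby
ROWS = 5_000_000 # rows inserted in each compression test

# [columns per row, column size in bytes] for the four tests in the table above.
CONFIGS = [[1, 10], [1, 100], [20, 10], [20, 20]]

# Raw value bytes only; row keys, qualifiers, and timestamps are not counted.
def raw_value_bytes(rows, columns, column_size)
  rows * columns * column_size
end

CONFIGS.each do |columns, size|
  bytes = raw_value_bytes(ROWS, columns, size)
  puts format("%2d x %3d bytes: %.2f GB of raw values", columns, size, bytes / 1e9)
end
```

Dividing the table size measured after the major compaction by the corresponding figure is one possible way to express the result as a ratio, as long as the same baseline is used for both editions.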
YCSB workload configuration:
recordcount=5000000
operationcount=150000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=<Number of columns in each row>
fieldlength=<Size of each column>
readproportion=1.0
requestdistribution=uniform
Command to insert data:
bin/ycsb load hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -s
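Because the four compression tests differ only in fieldcount and fieldlength, the workload files can be generated from one template. The following is a minimal sketch that renders the configuration above for each column layout. The file names are hypothetical, and the contents are printed rather than written to disk.

```ruby
# Renders the compression workload template for one column configuration.
def compression_workload(field_count, field_length)
  <<~PROPS
    recordcount=5000000
    operationcount=150000000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    readallfields=false
    fieldcount=#{field_count}
    fieldlength=#{field_length}
    readproportion=1.0
    requestdistribution=uniform
  PROPS
end

# File names are hypothetical; any path can be passed to the -P option.
workloads = [[1, 10], [1, 100], [20, 10], [20, 20]].to_h do |count, length|
  ["compression_#{count}x#{length}.properties", compression_workload(count, length)]
end

workloads.each do |file_name, contents|
  puts "# #{file_name}"
  puts contents
end
```

To use the output, save each rendered block under its file name and pass that file to the insert command in place of the workload placeholder.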