Benchmark methods

These benchmark tests measure the performance gap between HBase Community Edition and ApsaraDB for HBase Performance-enhanced Edition across three dimensions: throughput, response latency, and storage efficiency. Run these tests to verify whether the Performance-enhanced Edition meets your production requirements before migrating, or to reproduce the published benchmark results in your own environment.

All tests use Yahoo Cloud Serving Benchmark (YCSB) as the load generator.

Prepare data

Create a test table in each cluster. All tests share the same schema: the table is pre-split into 200 partitions (regions) that follow the YCSB key distribution.

ApsaraDB for HBase Performance-enhanced Edition uses INDEX encoding and Zstandard (ZSTD) compression. INDEX encoding is exclusive to this edition. If you set the encoding to DIFF, as the statement below does, it is automatically upgraded to INDEX.

HBase shell statement to create the table:

create 'test', {NAME => 'f', DATA_BLOCK_ENCODING => 'DIFF', COMPRESSION => 'ZSTD'}, {SPLITS => (1..199).map{|i| "user#{(i * ((2**63-1)/199)).to_s.rjust(19, "0")}"} }

HBase Community Edition uses DIFF encoding and SNAPPY compression, as recommended by Apache HBase.

create 'test', {NAME => 'f', DATA_BLOCK_ENCODING => 'DIFF', COMPRESSION => 'SNAPPY'}, {SPLITS => (1..199).map{|i| "user#{(i * ((2**63-1)/199)).to_s.rjust(19, "0")}"} }
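
To make the SPLITS expression in the two create statements easier to read, the following sketch evaluates the same expression in plain Ruby, outside the HBase shell. The constant names (`MAX_LONG`, `STEP`, `SPLIT_KEYS`) are illustrative and do not appear in the original statements.

```ruby
# Evaluate the SPLITS expression from the create statements in plain Ruby.
# Constant names are illustrative; they are not part of the HBase shell.
MAX_LONG = 2**63 - 1       # largest signed 64-bit integer
STEP     = MAX_LONG / 199  # integer division: width of one key range

SPLIT_KEYS = (1..199).map do |i|
  # Zero-pad to 19 digits so that string order matches numeric order.
  "user#{(i * STEP).to_s.rjust(19, '0')}"
end

puts "split keys: #{SPLIT_KEYS.length}"
puts "regions:    #{SPLIT_KEYS.length + 1}"  # n split points give n + 1 regions
puts "first key:  #{SPLIT_KEYS.first}"
puts "last key:   #{SPLIT_KEYS.last}"
```

The 199 split points divide the key space from 0 to 2^63 - 1 into 200 ranges of equal width, which is where the figure of 200 partitions comes from.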

For the single-row read and range read tests, load an initial dataset of 2 billion rows, 20 columns per row, 20 bytes per column.

YCSB workload configuration:

recordcount=2000000000
operationcount=150000000
workload=com.yahoo.ycsb.workloads.CoreWorkload

readallfields=false
fieldcount=20
fieldlength=20

readproportion=1.0
updateproportion=0.0
scanproportion=0
insertproportion=0

requestdistribution=uniform

Command to load data:

bin/ycsb load hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -s

Throughput benchmark tests

These tests compare the throughput of the two editions when both are driven by the same number of client threads. The dataset is 2 billion rows, 20 columns per row, 20 bytes per column. The four scenarios are independent of one another.

For all read and range scan tests, follow this pre-test procedure: run a major compaction and wait for it to complete, then run a warm-up test for 20 minutes before starting the formal 20-minute test.

  • Read data in a single row — query range: 10 million rows, 200 threads, 20-minute formal test

    YCSB workload configuration:

    recordcount=10000000
    operationcount=2000000000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    
    readallfields=false
    fieldcount=1
    fieldlength=20
    
    readproportion=1.0
    updateproportion=0.0
    scanproportion=0
    insertproportion=0
    
    requestdistribution=uniform

    Command to run the test:

    bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200
  • Read data within a specified range — query range: 10 million rows, 50 rows per scan, 100 threads, 20-minute formal test

    YCSB workload configuration:

    recordcount=10000000
    operationcount=2000000000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    
    readallfields=false
    fieldcount=1
    fieldlength=20
    
    readproportion=0.0
    updateproportion=0.0
    scanproportion=1.0
    insertproportion=0
    
    requestdistribution=uniform
    maxscanlength=50
    hbase.usepagefilter=false

    Command to run the test:

    bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200
  • Write data into a single row — 1 column per insert, 20 bytes per column, 200 threads, 20-minute test

    YCSB workload configuration:

    recordcount=2000000000
    operationcount=100000000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    
    readallfields=false
    fieldcount=1
    fieldlength=20
    
    readproportion=0.0
    updateproportion=0.0
    scanproportion=0
    insertproportion=1.0
    
    requestdistribution=uniform

    Command to run the test:

    bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200
  • Write data into multiple rows — 1 column per insert, 20 bytes per column, 100 rows per batch, 100 threads, 20-minute test

    YCSB workload configuration:

    recordcount=2000000000
    operationcount=10000000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    fieldcount=1
    fieldlength=20
    cyclickey=true
    
    readallfields=false
    readproportion=0
    updateproportion=0
    scanproportion=0
    insertproportion=0.0
    batchproportion=1.0
    batchsize=100
    
    requestdistribution=uniform

    Command to run the test:

    bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200
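
The four throughput workloads above differ in only a few properties, so copy-paste drift is one risk when maintaining them as separate files. The following sketch renders them from one shared base. The scenario names and the `render_workload` helper are hypothetical. The property names and values are copied from the configurations above, except that zero proportions are written uniformly as `0`.

```ruby
# Render the four throughput workloads from a shared base so that only the
# differing properties are spelled out. Scenario names are hypothetical.
BASE = {
  'workload'            => 'com.yahoo.ycsb.workloads.CoreWorkload',
  'readallfields'       => 'false',
  'fieldcount'          => '1',
  'fieldlength'         => '20',
  'readproportion'      => '0',
  'updateproportion'    => '0',
  'scanproportion'      => '0',
  'insertproportion'    => '0',
  'requestdistribution' => 'uniform'
}.freeze

SCENARIOS = {
  'single_row_read'  => { 'recordcount' => '10000000', 'operationcount' => '2000000000',
                          'readproportion' => '1.0' },
  'range_read'       => { 'recordcount' => '10000000', 'operationcount' => '2000000000',
                          'scanproportion' => '1.0', 'maxscanlength' => '50',
                          'hbase.usepagefilter' => 'false' },
  'single_row_write' => { 'recordcount' => '2000000000', 'operationcount' => '100000000',
                          'insertproportion' => '1.0' },
  'batch_write'      => { 'recordcount' => '2000000000', 'operationcount' => '10000000',
                          'cyclickey' => 'true', 'batchproportion' => '1.0',
                          'batchsize' => '100' }
}.freeze

# Merge one scenario over the base and emit key=value lines.
def render_workload(name)
  BASE.merge(SCENARIOS.fetch(name)).map { |k, v| "#{k}=#{v}" }.join("\n") + "\n"
end

SCENARIOS.each_key do |name|
  puts "# #{name}"
  puts render_workload(name)
end
```

Each rendered block can be saved to a file and passed to `bin/ycsb` through the `-P` option in place of `<workload>`.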

Response latency benchmark tests

These tests compare response latency at a fixed number of operations per second (OPS). Capping both systems at the same OPS shows how the latency of each one behaves under an identical, controlled load. The dataset is 2 billion rows, 20 columns per row, 20 bytes per column.

For all read and range scan tests, follow this pre-test procedure: run a major compaction and wait for it to complete, then run a warm-up test for 20 minutes before starting the formal 20-minute test.

  • Read data in a single row — query range: 10 million rows, max OPS: 5,000, 200 threads, 20-minute formal test

    YCSB workload configuration:

    recordcount=10000000
    operationcount=2000000000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    
    readallfields=false
    fieldcount=1
    fieldlength=20
    
    readproportion=1.0
    updateproportion=0.0
    scanproportion=0
    insertproportion=0
    
    requestdistribution=uniform

    Command to run the test:

    bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200 -p target=5000
  • Read data within a specified range — query range: 10 million rows, 50 rows per scan, max OPS: 5,000, 100 threads, 20-minute formal test

    YCSB workload configuration:

    recordcount=10000000
    operationcount=2000000000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    
    readallfields=false
    fieldcount=1
    fieldlength=20
    
    readproportion=0.0
    updateproportion=0.0
    scanproportion=1.0
    insertproportion=0
    
    requestdistribution=uniform
    maxscanlength=50
    hbase.usepagefilter=false

    Command to run the test:

    bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200 -p target=5000
  • Write data into a single row — 1 column per insert, 20 bytes per column, max OPS: 50,000, 200 threads, 20-minute test

    YCSB workload configuration:

    recordcount=2000000000
    operationcount=100000000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    
    readallfields=false
    fieldcount=1
    fieldlength=20
    
    readproportion=0.0
    updateproportion=0.0
    scanproportion=0
    insertproportion=1.0
    
    requestdistribution=uniform

    Command to run the test:

    bin/ycsb run hbase10 -P <workload> -p table=testwrite -threads 200 -p columnfamily=f -p maxexecutiontime=1200 -p target=50000
  • Write data into multiple rows — 1 column per insert, 20 bytes per column, 100 rows per batch, max OPS: 2,000, 100 threads, 20-minute test

    YCSB workload configuration:

    recordcount=2000000000
    operationcount=10000000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    fieldcount=1
    fieldlength=20
    cyclickey=true
    
    readallfields=false
    readproportion=0
    updateproportion=0
    scanproportion=0
    insertproportion=0.0
    batchproportion=1.0
    batchsize=100
    
    requestdistribution=uniform

    Command to run the test:

    bin/ycsb run hbase10 -P <workload> -p table=testwrite -threads 100 -p columnfamily=f -p maxexecutiontime=1200 -p target=2000
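
As a rough cross-check of the four latency scenarios above, the following sketch relates the `target` cap to the pace of each client thread and to the number of operations a 1200-second run can issue. It assumes that YCSB spreads the target evenly across client threads and that one batch counts as one operation. The `Scenario` struct and the helper names are illustrative.

```ruby
# Relate the target cap to the per-thread pace and to the number of
# operations a 1200-second run can issue. Figures come from the four
# latency scenarios; the struct and helper names are illustrative.
Scenario = Struct.new(:name, :target_ops, :threads, :operationcount)

RUN_SECONDS = 1200  # maxexecutiontime

LATENCY_SCENARIOS = [
  Scenario.new('single-row read',  5_000,  200, 2_000_000_000),
  Scenario.new('range read',       5_000,  100, 2_000_000_000),
  Scenario.new('single-row write', 50_000, 200, 100_000_000),
  Scenario.new('batch write',      2_000,  100, 10_000_000)
].freeze

# Operations per second that each client thread is paced at
# (assumes the target is divided evenly across threads).
def per_thread_ops(scenario)
  scenario.target_ops.to_f / scenario.threads
end

# Upper bound on operations issued before the time limit ends the run.
def max_ops_in_run(scenario)
  scenario.target_ops * RUN_SECONDS
end

LATENCY_SCENARIOS.each do |s|
  limit = max_ops_in_run(s) < s.operationcount ? 'time limit' : 'operationcount'
  puts format('%-17s %7.1f ops/s per thread, at most %d ops, ends by %s',
              s.name, per_thread_ops(s), max_ops_in_run(s), limit)
end
```

Under these assumptions every scenario issues fewer operations in 1200 seconds than its `operationcount`, so each run is ended by `maxexecutiontime` and lasts the full 20 minutes.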

Compression ratio benchmark tests

All four tests follow the same procedure: insert 5 million rows via YCSB, manually trigger a flush and major compaction, then check the table size.

Each test uses a different column configuration:

Columns per row    Column size (bytes)
1                  10
1                  100
20                 10
20                 20

YCSB workload configuration:

recordcount=5000000
operationcount=150000000
workload=com.yahoo.ycsb.workloads.CoreWorkload

readallfields=false
fieldcount=<Number of columns in each row>
fieldlength=<Size of each column>

readproportion=1.0

requestdistribution=uniform

Command to insert data:

bin/ycsb load hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -s
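
To put the measured table sizes in context, the following sketch computes the raw value bytes that each of the four column configurations writes, and offers one possible way to express a compression ratio. It counts only field values and ignores row keys, column qualifiers, and timestamps. The helper names and this definition of the ratio are assumptions, not taken from the original procedure.

```ruby
# Raw value bytes for each column configuration, plus a helper that turns a
# measured on-disk size into a ratio. Only field values are counted; row
# keys, qualifiers and timestamps are ignored. Helper names are illustrative.
ROWS = 5_000_000  # recordcount

# [columns per row, column size in bytes] for the four tests
CONFIGS = [[1, 10], [1, 100], [20, 10], [20, 20]].freeze

def raw_value_bytes(rows, columns, column_size)
  rows * columns * column_size
end

# Assumed definition: raw value bytes divided by the measured table size.
# A larger result means the table takes less space for the same data.
def compression_ratio(raw_bytes, on_disk_bytes)
  raw_bytes.to_f / on_disk_bytes
end

CONFIGS.each do |columns, size|
  raw = raw_value_bytes(ROWS, columns, size)
  puts format('%2d columns x %3d bytes: %d raw value bytes', columns, size, raw)
end
```

After the flush and major compaction finish, pass the table size that you observe as `on_disk_bytes` to `compression_ratio`. Use the same definition for both editions so that the results are comparable.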