Using the faster-bulk plug-in


The faster-bulk plug-in is a built-in plug-in that improves write throughput by aggregating bulk requests on each data node until a configured size or time interval is reached. This prevents small-batch writes from blocking the write queue, which makes the plug-in well suited to high-throughput scenarios with many index shards. Because aggregation adds latency, the plug-in is not recommended for low-latency write scenarios. The plug-in is disabled by default and must be manually enabled.

Usage notes

You must install the plug-in before using it. For more information, see Install or uninstall a built-in plug-in.

Write performance

The following reference data shows the performance of the faster-bulk plug-in in a specific test environment.

Test environment: three 16-core, 64 GB data nodes and two independent 16-core, 64 GB client nodes. Dataset: the official esrally nyc-taxis dataset (650 bytes per document). apack.fasterbulk.combine.interval was set to 200 ms.

| Translog status | Without plug-in | With plug-in | Performance improvement |
| --- | --- | --- | --- |
| Synchronous (default) | 182,314/s | 226,242/s | 23% |
| Asynchronous | 218,732/s | 241,060/s | 10% |

Enable bulk aggregation

PUT _cluster/settings
{
   "transient" : {
      "apack.fasterbulk.combine.enabled":"true"
   }
}
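
The same request can be issued from a script. The following is a minimal Python sketch using only the standard library. The endpoint in ES_URL and the helper name build_toggle_request are assumptions for illustration, and authentication is omitted. Adapt both to your cluster.

```python
import json
import urllib.request

# Assumption: placeholder endpoint. Replace it with your cluster address
# and add authentication headers if your cluster requires them.
ES_URL = "http://localhost:9200"


def build_toggle_request(enabled: bool, base_url: str = ES_URL) -> urllib.request.Request:
    """Build (but do not send) the PUT _cluster/settings request
    that turns bulk aggregation on or off."""
    body = {
        "transient": {
            "apack.fasterbulk.combine.enabled": "true" if enabled else "false"
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_toggle_request(True)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To apply the setting, send the request:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

Calling build_toggle_request(False) produces the request body that disables bulk aggregation.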

Configure aggregation parameters

Configure the aggregation size and time interval for bulk requests. The system triggers a data write when either the cumulative size of bulk requests or the aggregation time interval on a single data node reaches the configured threshold.

PUT _cluster/settings
{
   "transient" : {
      "apack.fasterbulk.combine.flush_threshold_size":"1mb",
      "apack.fasterbulk.combine.interval":"50"
   }
}

| Parameter | Description | Default |
| --- | --- | --- |
| apack.fasterbulk.combine.flush_threshold_size | The maximum cumulative size of aggregated bulk requests on a single data node. | 1mb |
| apack.fasterbulk.combine.interval | The maximum time interval for aggregating bulk requests. Unit: ms. | 50 |

For high-concurrency scenarios with large data volumes, you can increase the maximum aggregation size or time interval within the capacity of your cluster. This helps prevent bulk requests from blocking the write queue.
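
To reason about how the two thresholds interact, the following toy model may help. It is a simplified Python sketch of "flush when either the size or the interval threshold is reached", not the plug-in's actual implementation. The class name BulkCombiner and the assumption that the interval is measured from the first pending request are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BulkCombiner:
    """Toy model of size-or-interval flushing on one data node.
    Illustrative only; not the plug-in's code."""
    flush_threshold_bytes: int = 1024 * 1024  # mirrors the 1mb default
    interval_ms: int = 50                     # mirrors the 50 ms default
    pending_bytes: int = 0
    window_start_ms: int = 0
    flushes: List[str] = field(default_factory=list)

    def add(self, now_ms: int, request_bytes: int) -> None:
        """Record an incoming bulk request; flush if the size threshold is reached."""
        if self.pending_bytes == 0:
            # Assumption: the interval starts with the first pending request.
            self.window_start_ms = now_ms
        self.pending_bytes += request_bytes
        if self.pending_bytes >= self.flush_threshold_bytes:
            self._flush("size")

    def tick(self, now_ms: int) -> None:
        """Flush pending data if the aggregation interval has elapsed."""
        if self.pending_bytes > 0 and now_ms - self.window_start_ms >= self.interval_ms:
            self._flush("interval")

    def _flush(self, reason: str) -> None:
        """Record why the write was triggered and reset the pending size."""
        self.flushes.append(reason)
        self.pending_bytes = 0


if __name__ == "__main__":
    combiner = BulkCombiner()
    combiner.add(0, 600 * 1024)
    combiner.add(10, 600 * 1024)   # cumulative size exceeds 1 MB: size flush
    combiner.add(20, 100 * 1024)
    combiner.tick(60)              # only 40 ms elapsed: nothing happens
    combiner.tick(70)              # 50 ms elapsed: interval flush
    print(combiner.flushes)
```

In this model, raising flush_threshold_bytes or interval_ms produces fewer, larger writes at the cost of each request waiting longer before it is written, which is the trade-off described above.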

Directed routing

When you batch-write documents without specifying a routing value or a primary key (_id), you can enable directed routing for the cluster or a specific index to improve write speed. This feature does not affect write requests that already specify a routing value or a primary key (_id).

To enable directed routing for a cluster:

PUT _cluster/settings
{
  "persistent" : {
    "index.direct_routing.global.enable" : "true"
  }
}

To enable directed routing for a specific index:

PUT <index_name>/_settings
{
  "index.direct_routing.enable" : "true"
}
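
To estimate how much of your write traffic directed routing could apply to, you can inspect the action lines of your bulk requests on the client side. The following Python sketch checks whether an action line omits both the document ID and the routing value. The helper name eligible_for_directed_routing is hypothetical, and the check only mirrors the condition stated above. The cluster itself decides how each request is routed.

```python
import json


def eligible_for_directed_routing(action_line: str) -> bool:
    """Return True if a bulk action line specifies neither _id nor routing.

    A bulk action line has a single key (index, create, ...) whose value
    is the action metadata, for example {"index": {"_index": "logs"}}.
    """
    action = json.loads(action_line)
    meta = next(iter(action.values()))
    # "_routing" is checked as well because older clients may still send it.
    return not any(key in meta for key in ("_id", "routing", "_routing"))


if __name__ == "__main__":
    lines = [
        '{"index": {"_index": "logs"}}',
        '{"index": {"_index": "logs", "_id": "1"}}',
        '{"index": {"_index": "logs", "routing": "user1"}}',
    ]
    for line in lines:
        print(eligible_for_directed_routing(line), line)
```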

Disable bulk aggregation

PUT _cluster/settings
{
   "transient" : {
      "apack.fasterbulk.combine.enabled":"false"
   }
}