Optimize large transaction commits

更新时间:
复制 MD 格式

In workloads that involve large transactions, commits can take an excessive amount of time to write to the binlog. This can make the instance non-writable for extended periods or even cause it to hang. ApsaraDB RDS for MySQL introduces the binlog cache free flush feature to optimize the commit speed for large transactions. This feature resolves performance fluctuations and prolonged write unavailability, improving overall instance stability.

Features

Because binlog writes are serial, a large transaction is slow to write its binlog events and also blocks other transactions from committing. This feature optimizes the binlog write mechanism for large transaction commits, making commit speed independent of the number of binlog events. This feature resolves the following common issues:

  • Extremely large transactions block the instance, preventing other transactions from committing.

  • Large transactions cause high I/O consumption, which can exhaust I/O resources and slow down other operations, such as queries.

  • Large transactions slow down the execution of small transactions, leading to a surge in active connections and performance fluctuations. In severe cases, this can trigger a cascading failure and make the instance unavailable.

  • Frequent execution of large transactions causes sustained performance instability for your workload.

Note
  • A large transaction refers to a transaction that generates a large volume of binlog events. For more information, see Problem analysis.

  • For more information about how the binlog cache free flush feature works, see How it works.

Prerequisites

The instance must meet one of the following version requirements:

  • MySQL 8.4

  • MySQL 8.0 with a minor engine version of 20240731 or later.

  • MySQL 5.7 with a minor engine version of 20240731 or later.

    Note

    On the Basic Information page, check the Configuration Information section for the Upgrade minor engine version button. If the button is available, you can upgrade your instance. If the button is not available, your instance is already on the latest minor engine version. For more information, see Upgrade minor engine version.

Usage

  • New instances: The binlog cache free flush feature is enabled by default.

  • Existing instances: You can enable this feature by setting the loose_binlog_cache_free_flush global system variable. The change takes effect immediately without requiring an instance restart.

  • Related parameter:

    loose_binlog_cache_free_flush_limit_size: After you enable the binlog cache free flush feature, if the binlog generated by a transaction exceeds the value of this parameter, the binlog cache temporary file is promoted to a binlog file upon commit. Default value: 256 MB. Value range: 20971520 to 18446744073709551615 (unit: bytes).

Problem analysis

When a MySQL instance executes a large transaction, the slow query log may record unusual delays for statements that are expected to complete quickly, such as COMMIT.image

This issue is caused by resource contention in the serial binlog write mechanism. A large transaction in one thread takes a long time to write its binlog, blocking other transactions from committing and causing a global, serialized delay.

During a sysbench oltp_write_only test, after running a large UPDATE operation that generates 512 MB of binlog every 5 seconds, we observed the following effects:

  • Every time the large UPDATE commits, the Transactions Per Second (TPS) drops sharply.

  • The performance impact is proportional to the binlog size: the larger the binlog generated by the transaction, the more significant the latency impact on other workloads.

The following figure illustrates the impact of large transactions on a running workload.

image

The UPDATE-related operations:

CREATE TABLE t_large (a INT, b LONGTEXT) ENGINE = InnoDB;

# Insert a row with 256 MB of text. The resulting UPDATE generates 512 MB of binlog events.
INSERT INTO t_large VALUES (1, repeat('a', 256000000)); 

UPDATE t_large SET a = a + 1;

Optimization effects

  • Significantly improved TPS stability

    • Before optimization: The large UPDATE operation, which runs every 5 seconds, causes severe TPS fluctuations.

    • After optimization (orange line): The impact of the large UPDATE on TPS is almost eliminated.

    image

  • No significant latency fluctuations

    • As shown by the orange line (after optimization), the impact of UPDATE operations of different sizes on latency is no longer significant.

    • Even when the binlog size exceeds 512 MB, the commit latency of other transactions is not significantly affected.

    image

How it works

Binlog cache

image

The binlog cache is a temporary, session-level space used to stage binlog events. Each session is allocated its own binlog cache, which consists of a memory cache (buffer) and a temporary file. The size of the memory cache is controlled by the binlog_cache_size parameter. When a transaction is large enough to fill the memory cache, its events are written to the temporary file.

During transaction execution, generated events are temporarily stored in the binlog cache. At commit time, the system must read these events sequentially from the binlog cache, update the end_pos (end position) and checksum for each event, and then write them to the binlog file. To ensure the transaction's events are contiguous in the binlog file, a lock protects this entire process. While one transaction holds this lock to write to the binlog file, all other committing transactions are blocked.

Issues with large transaction binlog writes

image

Writing these events to the binlog file during commit can be time-consuming and severely impact the instance in two main ways:

  • The process holds a binlog write lock, making the entire instance non-writable for the duration of the write.

  • The write process consumes significant I/O resources. In environments with limited I/O capacity, this can cause the entire instance to hang.

Binlog cache free flush optimization

image

AliSQL redesigns the temporary files used by the binlog cache, allowing them to be directly promoted to binlog files. When this feature is enabled, committing a large transaction promotes its temporary file directly into a new binlog file. This process is extremely fast, consumes minimal I/O resources, and completely eliminates the negative impacts of writing binlogs for large transactions.

Changes to the binlog cache

image

To allow a binlog cache temporary file to be directly promoted into a binlog file, AliSQL modifies how the binlog cache is used in two key ways:

  • When writing to the binlog cache's temporary file, a space is reserved at the beginning of the file. During the promotion process, this space is used to store the header events of the binlog.

  • When writing to the binlog cache, the end_pos of each event is calculated based on the size of this reserved space.

Binlog file promotion

image

When a large transaction is committed, its binlog cache temporary file is promoted directly into a new binlog file. During this process, the binlog's file header and initial header events are written into the reserved space. This reserved space contains four main components:

  • File Header: A 4-byte marker, [ 0xFE 'bin'], is written at the beginning of every binlog file to identify it as such.

  • Header Events: Includes the Format description event and Previous gtid event.

  • Empty event: An existing, ignorable event type is used to fill any remaining reserved space.

  • Gtid event: The GTID for the large transaction is generated during the commit phase, and its event is also written into the reserved space.

The content following the reserved space consists of the events from the original binlog cache, which typically include Query, Table map, Row, and Xid events.

A binlog file promoted by this feature differs from a regular one in two ways:

  • It contains an additional Empty event.

  • The checksum for this file is disabled by default.

The binlog cache free flush feature does not alter the binlog format. Replication and third-party tools that rely on binlog files are not affected.