Memcg backend asynchronous reclaim

When the memory usage of a memcg (memory control group) reaches its limit, the kernel triggers direct memory reclamation, a synchronous operation that runs in the memory allocation path and stalls the allocating process. To reduce this latency, Alibaba Cloud Linux provides backend asynchronous reclaim for memcgs. The kernel starts reclaiming memory in the background before usage reaches the hard limit, so processes are far less likely to block while waiting for reclamation to complete.

This feature is available on:

  • Alibaba Cloud Linux 2 with kernel version 4.19.81-17.al7 and later

  • Alibaba Cloud Linux 3 with kernel version 5.10.134-12.al8 and later

Warning

Keep the following behavior in mind before enabling this feature:

  • Memory allocation in an existing memcg may recursively trigger backend asynchronous reclamation in the parent cgroup.

  • Reclamation starts at the memcg where it was triggered and proceeds top-down through the cgroup hierarchy.

  • When memory.high is set to a value less than memory.limit_in_bytes, memory.wmark_high and memory.wmark_low are calculated based on memory.high instead of memory.limit_in_bytes.
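
The last point can be read as taking the smaller of the two values as the base for the watermark formulas. The following is a minimal sketch of that reading, not the kernel's code. `wmark_base` is a hypothetical helper, and both arguments are assumed to be plain byte counts.

```shell
# wmark_base LIMIT_BYTES HIGH_BYTES
# Prints the value the watermarks would be derived from:
# memory.high if it is less than memory.limit_in_bytes, else the limit.
# (Hypothetical helper for illustration; not a kernel interface.)
wmark_base() {
    if [ "$2" -lt "$1" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

# Example: 1 GiB limit with memory.high set to 512 MiB.
wmark_base $((1024 * 1024 * 1024)) $((512 * 1024 * 1024))
```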

How it works

The feature introduces a watermark-based mechanism. Setting memory.wmark_ratio to a percentage of the memcg limit defines a high watermark (memory.wmark_high). When memory usage rises above that watermark, the kernel starts reclaiming memory asynchronously in the background. Reclamation stops when usage drops below a low watermark (memory.wmark_low), which sits below memory.wmark_high by a gap controlled by memory.wmark_scale_factor.

This creates a buffer zone: reclamation runs in the background before the hard limit is reached, preventing the abrupt stalls caused by direct reclamation.
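
The start and stop behavior described above is a hysteresis loop, which can be sketched as a small state function. This is only an illustration of the description, not kernel code. `next_state` is a hypothetical name, and the exact boundary handling (strictly above the high watermark, strictly below the low watermark) is an assumption.

```shell
# next_state STATE USAGE HIGH LOW
# STATE is "idle" or "reclaiming"; prints the state after observing USAGE.
# (Hypothetical model of the watermark logic; boundary cases are assumed.)
next_state() {
    state=$1
    usage=$2
    high=$3
    low=$4
    if [ "$usage" -gt "$high" ]; then
        # Usage is above the high watermark: background reclaim runs.
        echo reclaiming
    elif [ "$state" = reclaiming ] && [ "$usage" -ge "$low" ]; then
        # Already reclaiming and not yet below the low watermark: keep going.
        echo reclaiming
    else
        echo idle
    fi
}

# Walk through a sequence of usage samples with high=950 and low=900.
state=idle
for usage in 900 960 940 920 899; do
    state=$(next_state "$state" "$usage" 950 900)
    echo "$usage -> $state"
done
```

In this walk-through, reclaim starts at the sample 960, continues through 940 and 920 because usage is still at or above the low watermark, and stops at 899.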

Interface reference

memory.wmark_ratio

Read-write. Enables the feature and sets the high watermark as a percentage of the memcg limit.

  • Valid values: 0–100

  • Default: 0 (feature disabled)

  • When set to a non-zero value, the feature is enabled and memory.wmark_high is recalculated.

memory.wmark_high

Read-only. The memory usage threshold that triggers backend asynchronous reclamation.

  • Formula: memory.wmark_high = memory.limit_in_bytes × memory.wmark_ratio / 100

  • When the feature is disabled (memory.wmark_ratio = 0), this value is set to the maximum possible value to prevent triggering.

  • Not present in the memcg root directory.

memory.wmark_low

Read-only. The memory usage threshold at which backend asynchronous reclamation stops.

  • Formula: memory.wmark_low = memory.wmark_high - memory.limit_in_bytes × memory.wmark_scale_factor / 10000

  • Not present in the memcg root directory.

memory.wmark_scale_factor

Read-write. Controls the gap between memory.wmark_high and memory.wmark_low.

  • Unit: 0.01% of the memcg limit

  • Valid values: 1–1000

  • Default: 50 (0.50% of the memcg limit), inherited from the parent cgroup when the memcg is created

  • Not present in the memcg root directory.
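
As a worked example of the two formulas, the sketch below computes both watermarks with integer shell arithmetic. `wmarks` is a hypothetical helper, and it assumes memory.high is not set below the limit. The values the kernel reports may differ slightly, for example if it rounds to whole pages.

```shell
# wmarks LIMIT_BYTES RATIO SCALE_FACTOR
# Prints "HIGH LOW" in bytes, following the formulas in this section.
# (Hypothetical helper; uses truncating integer division.)
wmarks() {
    limit=$1
    ratio=$2
    scale=$3
    # Documented ranges: ratio 0-100 (0 disables the feature),
    # scale factor 1-1000.
    if [ "$ratio" -lt 1 ] || [ "$ratio" -gt 100 ]; then
        echo "wmark_ratio must be 1-100 to enable the feature" >&2
        return 1
    fi
    if [ "$scale" -lt 1 ] || [ "$scale" -gt 1000 ]; then
        echo "wmark_scale_factor must be 1-1000" >&2
        return 1
    fi
    high=$(( limit * ratio / 100 ))
    low=$(( high - limit * scale / 10000 ))
    echo "$high $low"
}

# 1 GiB limit, 95% ratio, default scale factor of 50.
wmarks $((1024 * 1024 * 1024)) 95 50
# prints: 1020054732 1014686023
```

With these inputs the gap between the two watermarks is about 5.1 MiB, which is 0.50% of the 1 GiB limit.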

Enable backend asynchronous reclaim

The following example creates a test memcg, sets a 1 GiB memory limit (written as 1G), enables the feature with a 95% watermark, and reads back the resulting interface values.

  1. Create a test memcg.

    sudo mkdir /sys/fs/cgroup/memory/test/

  2. Set the memory limit to 1 GiB.

    sudo sh -c 'echo 1G > /sys/fs/cgroup/memory/test/memory.limit_in_bytes'

  3. Enable backend asynchronous reclaim by setting memory.wmark_ratio to 95. This sets memory.wmark_high to 95% of the memcg limit.

    sudo sh -c 'echo 95 > /sys/fs/cgroup/memory/test/memory.wmark_ratio'

  4. Read back the interface values to verify the configuration.

    Read memory.wmark_scale_factor. The default inherited value is 50, which represents 0.50% of the memcg limit.

    cat /sys/fs/cgroup/memory/test/memory.wmark_scale_factor

    Read memory.wmark_high to verify the high watermark.

    cat /sys/fs/cgroup/memory/test/memory.wmark_high

    Read memory.wmark_low to verify the low watermark.

    cat /sys/fs/cgroup/memory/test/memory.wmark_low
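
The read-back in step 4 can also be checked against the documented formulas. The following is a minimal sketch under stated assumptions: `check_wmarks` and `within` are hypothetical helpers, memory.high is assumed not to be set below the limit, and the default tolerance of 4096 bytes is an assumed allowance for page rounding, not a documented value.

```shell
# within NAME GOT WANT TOLERANCE
# Succeeds if GOT is within TOLERANCE bytes of WANT. (Hypothetical helper.)
within() {
    diff=$(( $2 - $3 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    if [ "$diff" -le "$4" ]; then
        echo "ok   $1: got $2, formula gives $3"
    else
        echo "FAIL $1: got $2, formula gives $3"
        return 1
    fi
}

# check_wmarks DIR [TOLERANCE_BYTES]
# Reads the interface files in DIR and compares both watermarks with the
# values the formulas give. Assumes a finite limit small enough that
# limit * ratio does not overflow shell arithmetic.
check_wmarks() {
    dir=$1
    tol=${2:-4096}   # assumed allowance for page rounding
    limit=$(cat "$dir/memory.limit_in_bytes") || return 2
    ratio=$(cat "$dir/memory.wmark_ratio") || return 2
    scale=$(cat "$dir/memory.wmark_scale_factor") || return 2
    high=$(cat "$dir/memory.wmark_high") || return 2
    low=$(cat "$dir/memory.wmark_low") || return 2
    want_high=$(( limit * ratio / 100 ))
    want_low=$(( want_high - limit * scale / 10000 ))
    rc=0
    within memory.wmark_high "$high" "$want_high" "$tol" || rc=1
    within memory.wmark_low "$low" "$want_low" "$tol" || rc=1
    return "$rc"
}
```

For the memcg created above, the check would be run as `check_wmarks /sys/fs/cgroup/memory/test`. To turn the feature off again, write 0 (the default) to memory.wmark_ratio.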