THP reclaim

THP reclaim splits Transparent Huge Pages (THPs) into subpages and reclaims zero subpages to prevent out-of-memory (OOM) errors caused by memory bloat. This topic describes the interfaces for configuring THP reclaim in memory control groups and how to enable and test the feature.

Background

Linux manages memory in pages. Regular pages are 4 KiB. THPs are larger blocks (2 MiB or 1 GiB) that the kernel allocates dynamically to reduce Translation Lookaside Buffer (TLB) misses and improve application performance.

THPs can cause memory bloat. When an application requests only 8 KiB, the kernel may allocate a 2 MiB THP. That THP contains the 2 requested 4 KiB pages plus 510 zero-filled 4 KiB pages (zero pages), wasting resident set size (RSS) and increasing OOM risk.
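The arithmetic behind this example can be sketched in a few lines of shell. This is only an illustration, assuming 4 KiB base pages and 2 MiB THPs as described above; the function name zero_subpages is hypothetical and not part of any kernel interface.

```shell
#!/bin/sh
# Illustrative arithmetic only: assumes 4 KiB base pages and 2 MiB THPs.
PAGE_KIB=4
THP_KIB=2048

# zero_subpages REQUEST_KIB
# Prints how many 4 KiB subpages of one 2 MiB THP stay zero-filled
# when the application touches only REQUEST_KIB of it.
zero_subpages() {
    request_kib="$1"
    used=$(( (request_kib + PAGE_KIB - 1) / PAGE_KIB ))   # round up to whole pages
    total=$(( THP_KIB / PAGE_KIB ))                        # 512 subpages per THP
    echo $(( total - used ))
}

zero_subpages 8   # prints 510
```

For an 8 KiB request, 2 of the 512 subpages hold data and the remaining 510 are zero subpages that still count toward RSS.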

THP reclaim addresses this by splitting THPs into subpages and reclaiming zero subpages. Note that enabling THP reclaim may degrade memory performance.

Interfaces

The following interfaces control THP reclaim behavior.

Interface | Description
memory.thp_reclaim | Enables or disables THP reclaim for a memory control group. Valid values: reclaim (enable), swap (reserved for future use), disable (default). Enable reclaim when memory bloat is causing OOM errors; note it may degrade memory performance.
memory.thp_reclaim_stat | Queries THP reclaim status for a memory control group. Reports three counters per NUMA node, listed in ascending NUMA node ID order (node0, node1, ...): queue_length (THPs currently in the reclaim queue), split_hugepage (total THPs split), reclaim_subpage (total zero subpages reclaimed).
memory.thp_reclaim_ctrl | Controls how THP reclaim is triggered. Parameters: threshold (maximum number of zero subpages in a THP before reclaim triggers; default: 16), reclaim (write-only; triggers reclaim immediately). Tune threshold to control what qualifies as a zero-heavy THP: lower values trigger reclaim more aggressively, and higher values reduce reclaim frequency.
/sys/kernel/mm/transparent_hugepage/reclaim | Global interface that overrides per-cgroup settings. Valid values: memcg (each cgroup uses its own memory.thp_reclaim config; default), reclaim (force-enables reclaim for all cgroups), swap (reserved for future use), disable (force-disables reclaim for all cgroups). When set to reclaim or disable, this interface takes precedence over per-cgroup memory.thp_reclaim configurations. The per-cgroup configurations are preserved and take effect again if the global interface is reset to memcg.
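The precedence between the global and per-cgroup interfaces can be sketched as a small shell helper. This is a minimal sketch, not part of the kernel interface: the function names active_value and effective_thp_reclaim are hypothetical, and the bracket parsing assumes the format that memory.thp_reclaim prints, in which the active value is enclosed in brackets.

```shell
#!/bin/sh
# active_value: reads one line on stdin and prints the bracketed (active)
# value, for example "reclaim" from "[reclaim] swap disable".
active_value() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# effective_thp_reclaim GLOBAL CGROUP
# Prints the mode that applies to a cgroup: the global value wins
# unless it is "memcg", in which case the cgroup's own value is used.
effective_thp_reclaim() {
    global="$1"
    cgroup="$2"
    if [ "$global" = "memcg" ]; then
        printf '%s\n' "$cgroup"
    else
        printf '%s\n' "$global"
    fi
}

printf '%s\n' '[reclaim] swap disable' | active_value   # prints reclaim
effective_thp_reclaim memcg reclaim                      # prints reclaim
effective_thp_reclaim disable reclaim                    # prints disable
```

On a live system, the two arguments could be read with active_value from /sys/kernel/mm/transparent_hugepage/reclaim and from the cgroup's memory.thp_reclaim file, assuming the global file marks its active value with brackets in the same way.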

Supported kernel versions

THP reclaim is supported on the following Alibaba Cloud Linux kernel versions:

  • Alibaba Cloud Linux 2: kernel 4.19.91-24.al7 or later

  • Alibaba Cloud Linux 3: kernel 5.10.134-15.al8 or later

To check your kernel version, run:

uname -r
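If you want to script the comparison, the following sketch compares the running kernel against one of the minimum versions listed above. It assumes a sort implementation that supports version sorting with -V (GNU coreutils does); the function name kernel_at_least is hypothetical. The result is only meaningful on Alibaba Cloud Linux kernels, and you need to pick the minimum that matches your release.

```shell
#!/bin/sh
# kernel_at_least CURRENT MINIMUM
# Succeeds if CURRENT sorts at or after MINIMUM in version order.
# Assumes "sort -V" (version sort) is available.
kernel_at_least() {
    current="$1"
    minimum="$2"
    lowest=$(printf '%s\n%s\n' "$current" "$minimum" | sort -V | head -n 1)
    [ "$lowest" = "$minimum" ]
}

# Minimum for Alibaba Cloud Linux 3; use 4.19.91-24.al7 for Alibaba Cloud Linux 2.
minimum="5.10.134-15.al8"
if kernel_at_least "$(uname -r)" "$minimum"; then
    echo "kernel $(uname -r) meets the minimum $minimum"
else
    echo "kernel $(uname -r) is older than $minimum"
fi
```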

Configure THP reclaim

The steps below configure THP reclaim for a memory control group named test.

Prerequisites

Before you begin, ensure that:

  • Your kernel version meets the minimum requirements listed in Supported kernel versions

  • You have sudo privileges on the instance

Procedure

  1. Create a memory control group named test:

    sudo mkdir /sys/fs/cgroup/memory/test/
  2. Enable THP reclaim for test:

    sudo sh -c 'echo reclaim > /sys/fs/cgroup/memory/test/memory.thp_reclaim'
  3. Verify that THP reclaim is enabled:

    cat /sys/fs/cgroup/memory/test/memory.thp_reclaim

    The active setting is enclosed in brackets. The following output confirms that THP reclaim is enabled:

    [reclaim] swap disable
  4. (Optional) Override per-cgroup settings using the global interface.

    To force-enable THP reclaim for all memory control groups:

    sudo sh -c 'echo reclaim > /sys/kernel/mm/transparent_hugepage/reclaim'

    To force-disable THP reclaim for all memory control groups:

    sudo sh -c 'echo disable > /sys/kernel/mm/transparent_hugepage/reclaim'

    Note: When the global interface is set to reclaim or disable, it takes precedence over the memory.thp_reclaim interface. The per-cgroup memory.thp_reclaim settings are preserved and take effect again once the global interface is set back to memcg.
  5. Set the zero-subpage threshold for test.

    The threshold parameter defines the maximum number of zero subpages a THP can contain before THP reclaim is triggered. The default is 16. To change it to 32:

    sudo sh -c 'echo "threshold 32" > /sys/fs/cgroup/memory/test/memory.thp_reclaim_ctrl'

    With this setting, THP reclaim triggers when a THP contains more than 32 zero subpages.

  6. Trigger zero subpage reclaim manually.

    Note: The reclaim parameter in memory.thp_reclaim_ctrl is write-only. Running cat on this interface does not return the reclaim setting.

    To trigger reclaim for test only:

    sudo sh -c 'echo "reclaim 1" > /sys/fs/cgroup/memory/test/memory.thp_reclaim_ctrl'

    To trigger reclaim recursively for test and all its child control groups:

    sudo sh -c 'echo "reclaim 2" > /sys/fs/cgroup/memory/test/memory.thp_reclaim_ctrl'

    THP reclaim is also triggered automatically when:

  7. Check THP reclaim status for test:

    cat /sys/fs/cgroup/memory/test/memory.thp_reclaim_stat

    Example output:

    queue_length        14
    split_hugepage     523
    reclaim_subpage 256207

    • queue_length 14: 14 THPs are currently in the reclaim queue.

    • split_hugepage 523: 523 THPs have been split.

    • reclaim_subpage 256207: 256,207 zero subpages have been reclaimed.
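The counters in memory.thp_reclaim_stat can also be summarized with a short script. The sketch below is an assumption-laden illustration: the function name thp_stat_summary is hypothetical, and it assumes that on multi-node systems the per-NUMA-node values appear as additional whitespace-separated columns on each counter line, which it adds together.

```shell
#!/bin/sh
# thp_stat_summary: reads memory.thp_reclaim_stat text on stdin.
# Sums all numeric columns of each counter line (assumed to be one
# column per NUMA node) and prints the totals plus the average number
# of zero subpages reclaimed per split THP, rounded down.
thp_stat_summary() {
    awk '
        {
            total = 0
            for (i = 2; i <= NF; i++) total += $i
            sum[$1] = total
        }
        END {
            printf "queued=%d split=%d reclaimed=%d\n",
                sum["queue_length"], sum["split_hugepage"], sum["reclaim_subpage"]
            if (sum["split_hugepage"] > 0)
                printf "avg_reclaimed_per_split=%d\n",
                    int(sum["reclaim_subpage"] / sum["split_hugepage"])
        }'
}

# Using the example output shown above:
printf '%s\n' 'queue_length 14' 'split_hugepage 523' 'reclaim_subpage 256207' | thp_stat_summary
# prints:
# queued=14 split=523 reclaimed=256207
# avg_reclaimed_per_split=489
```

On a live system, the same function could read the file directly, for example: thp_stat_summary < /sys/fs/cgroup/memory/test/memory.thp_reclaim_stat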

Test the THP reclaim feature

The sample C code below allocates 1 GiB of memory (512 THPs). The first 10 THPs are only half written, so half of the subpages in each of them (256 per THP) remain zero subpages. Use it to compare memory behavior with THP reclaim enabled and disabled.

  1. Set the memory limit for test to 1 GiB:

    sudo sh -c 'echo 1G > /sys/fs/cgroup/memory/test/memory.limit_in_bytes'

    Verify the value:

    cat /sys/fs/cgroup/memory/test/memory.limit_in_bytes

    The expected output is:

    1073741824
  2. Disable the memcg backend asynchronous reclaim feature to isolate the test:

    sudo sh -c 'echo 0 > /sys/fs/cgroup/memory/test/memory.wmark_ratio'

    For details on this feature, see Memcg backend asynchronous reclaim.

  3. Save the sample C code below to a source file, compile it, and run it once with THP reclaim enabled and once with THP reclaim disabled. Compare the results.

    Compile:

    gcc -o test <test.c>

    Replace <test.c> with your actual source file name.

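    // Allocates 1 GiB (512 THPs). The first 10 THPs are only half written,
    // so the second half of each one stays zero-filled (256 zero subpages).
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    
    #define HUGEPAGE_SIZE (4096UL * 512) /* 2 MiB */
    
    int main(void)
    {
        int i, thp = 512;
        void *buf;
        char *addr;
    
        /* Align to 2 MiB so that each chunk can be backed by one THP. */
        if (posix_memalign(&buf, HUGEPAGE_SIZE, HUGEPAGE_SIZE * thp) != 0) {
            fprintf(stderr, "posix_memalign failed\n");
            return 1;
        }
        addr = buf;
    
        /* First 10 THPs: write only the first half. */
        for (i = 0; i < 10; i++) {
            memset(addr, 0xc, HUGEPAGE_SIZE >> 1);
            addr += HUGEPAGE_SIZE;
        }
    
        /* Remaining THPs: write every byte. */
        for (; i < thp; i++) {
            memset(addr, 0xc, HUGEPAGE_SIZE);
            addr += HUGEPAGE_SIZE;
        }
    
        /* Keep the process alive so that its memory usage can be inspected. */
        pause();
        return 0;
    }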

    In a separate terminal, monitor the kernel log for OOM errors:

    dmesg -wH

    Expected results:

    • THP reclaim enabled: The feature splits the THPs that contain zero subpages and reclaims those subpages (in this sample, up to 10 × 256 = 2,560 subpages, or 10 MiB), which reduces memory usage and prevents OOM errors.

    • THP reclaim disabled: THPs are not split and zero subpages are not reclaimed. OOM errors may occur. The kernel log reports that memory is insufficient and lists details of the processes that may be terminated.
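The steps above do not show how the test program is placed in the test control group, which is needed for the 1 GiB limit and the THP reclaim setting to apply to it. One common way under cgroup v1 is to write the process ID to the group's cgroup.procs file before the program starts. The helper below is a minimal sketch under that assumption; the function name run_in_cgroup is hypothetical.

```shell
#!/bin/sh
# run_in_cgroup CGROUP_DIR COMMAND [ARGS...]
# Starts a child shell that writes its own PID into CGROUP_DIR/cgroup.procs
# and then replaces itself with COMMAND, so COMMAND runs inside that cgroup.
# Writing to a real cgroup.procs file requires root privileges.
run_in_cgroup() {
    cgroup_dir="$1"
    shift
    # Inside the child shell, $0 is the cgroup directory and "$@" is the command.
    sh -c 'echo "$$" > "$0/cgroup.procs" && exec "$@"' "$cgroup_dir" "$@"
}

# Example (from a root shell, with the binary built in step 3):
#   run_in_cgroup /sys/fs/cgroup/memory/test ./test
```

Because the PID is written before exec, every allocation the test program makes is charged to the test control group from the start.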

What's next