Detecting kernel memory pollution with KFENCE


Kernel Electric-Fence (KFENCE) is a built-in Linux kernel tool that can be enabled in an online environment. KFENCE detects memory pollution issues in the kernel and kernel modules and, when it detects an issue, generates an error message that contains the details of the issue. Alibaba Cloud enhanced KFENCE in Alibaba Cloud Linux 3 so that you can enable or disable KFENCE dynamically and use it to detect memory pollution issues during both online detection and offline debugging.

Limits on operating systems

  • x86 architecture

    Alibaba Cloud Linux 3 whose kernel version is 5.10.84-10 or later

  • Arm architecture

    Alibaba Cloud Linux 3 whose kernel version is 5.10.134-16 or later
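
To check whether a running kernel meets these minimum versions, you can compare the output of uname -r with the required version. The following is a minimal sketch rather than an official tool. The function name kernel_at_least is hypothetical, and the sketch assumes that the release string starts with a version of the form X.Y.Z-N and that GNU sort with the -V option is available.

```bash
#!/bin/bash
# Hypothetical helper: succeed if the kernel release in $1 is at least
# the minimum version in $2 (for example, 5.10.84-10).
kernel_at_least() {
	local cur min=$2
	# Keep only the leading X.Y.Z-N part, for example 5.10.134-16.
	cur=$(printf '%s\n' "$1" | grep -oE '^[0-9]+\.[0-9]+\.[0-9]+-[0-9]+')
	[ -n "$cur" ] || return 2   # unexpected release format
	# In a version sort, the minimum must come first (or be equal).
	[ "$(printf '%s\n%s\n' "$min" "$cur" | sort -V | head -n 1)" = "$min" ]
}

if kernel_at_least "$(uname -r)" "5.10.84-10"; then
	echo "kernel version meets the x86 requirement"
else
	echo "kernel version is too old or has an unexpected format"
fi
```

On an Arm instance, pass 5.10.134-16 as the second argument instead.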

Note
  • If you are a developer of the kernel or kernel modules, you can use KFENCE to check whether memory pollution occurs in the kernel or kernel modules.

  • If you are a regular user and encounter a kernel crash, you can enable KFENCE to provide kernel or third-party driver developers with more information.

Terms

  • memory pollution: An issue in which memory areas are incorrectly modified or corrupted while a program is running, which causes exceptions or crashes in the program. Memory pollution can be caused by programming errors, software vulnerabilities, malware, or hardware failures.

  • slab: Slab allocation is an efficient memory allocation mechanism in the Linux kernel. The kernel uses slabs to pre-allocate a specific number of memory objects in a memory cache pool for quick memory allocation and release. Slabs reduce the cost of frequent memory allocation and release operations and improve the efficiency of memory allocation.

  • order-0 page: The Linux kernel divides memory into fixed-size blocks called page frames. An order-0 page is a single page frame, which in most cases is 4 KiB in size and is the basic unit of memory allocation. When an application or the kernel requires small blocks of memory, memory is allocated in order-0 pages.

Enable KFENCE

KFENCE is used in the following business scenarios:

Online detection scenario

Scenario 1: Use KFENCE to detect whether a memory pollution issue occurs

Note

This scenario uses 2 MiB of memory with no significant impact on performance.

  • Add the kfence.sample_interval parameter to enable KFENCE.

    Replace <kfence.sample_interval> with the value that you want to specify. For example, a value of 100 specifies that KFENCE is automatically enabled the next time the system starts and that the sampling interval is set to 100 milliseconds.

    sudo grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="kfence.sample_interval=<kfence.sample_interval>"
  • Add the kfence.booting_max parameter to limit the maximum amount of memory that KFENCE can consume based on the memory specifications.

    Note
    • In kernel version 5.10.134-17 or later, the default configuration kfence.booting_max=0-2G:0,2G-32G:2M,32G-:32M is added to the boot command-line parameters. Together with the default value (255) of the num_objects parameter, this configuration ensures that the memory overhead of KFENCE does not exceed 1‰ of the total memory for all memory specifications. With these defaults, KFENCE can consume up to 2 MiB of memory if standard 4-KiB pages are used and up to 32 MiB of memory if 64-KiB pages are used.

    • This parameter sets the upper limit for memory overhead and constrains the num_objects parameter. This value is only an upper bound and does not represent the actual memory overhead.

    Replace <kfence.booting_max> with the value that you want to specify, such as 0-128M:0,128M-256M:1M,256M-:2M. Description of the segments in the sample value:

    • 0-128M:0: If the total memory on the machine that you use is less than 128 MiB in size, KFENCE is disabled.

    • 128M-256M:1M: If the total memory on the machine that you use is larger than or equal to 128 MiB but less than or equal to 256 MiB in size, KFENCE can consume up to 1 MiB of memory. The value of the num_objects parameter cannot exceed 127.

    • 256M-:2M: If the total memory on the machine that you use is larger than 256 MiB in size, KFENCE can consume up to 2 MiB of memory. The value of the num_objects parameter cannot exceed 255.

    sudo grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="kfence.booting_max=<kfence.booting_max>"

    The preceding configuration applies only to the scenario in which KFENCE is started on system startup by adding parameters to the boot commandline parameter list. The configuration does not take effect in Scenario 2 in which KFENCE is configured after the system starts.

The configuration automatically takes effect the next time the system starts.
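
To see which limit a kfence.booting_max value yields for a given machine, the segment matching described above can be sketched in a few lines of bash. This is an illustration only, not the kernel's parser. The function names to_mib and booting_max_limit are hypothetical, and the sketch assumes that each range includes its lower bound and excludes its upper bound, which may differ from the kernel's exact boundary handling.

```bash
#!/bin/bash
# Hypothetical helper: convert a size such as 128M or 2G to MiB.
to_mib() {
	case "$1" in
		*G) echo $(( ${1%G} * 1024 )) ;;
		*M) echo $(( ${1%M} )) ;;
		*)  echo "$1" ;;   # a bare number such as 0 is passed through
	esac
}

# Print the limit that applies to a machine with $1 MiB of total memory,
# given a booting_max value in $2.
# Assumption: each range is treated as [low, high).
booting_max_limit() {
	local total_mib=$1 spec=$2 seg range limit lo hi
	local IFS=,
	for seg in $spec; do
		range=${seg%%:*}
		limit=${seg##*:}
		lo=$(to_mib "${range%%-*}")
		hi=${range#*-}
		if [ -z "$hi" ]; then
			# Open-ended range such as 256M-
			if [ "$total_mib" -ge "$lo" ]; then echo "$limit"; return 0; fi
		else
			hi=$(to_mib "$hi")
			if [ "$total_mib" -ge "$lo" ] && [ "$total_mib" -lt "$hi" ]; then
				echo "$limit"; return 0
			fi
		fi
	done
	return 1
}

# A machine with 192 MiB of memory falls into the 128M-256M segment.
booting_max_limit 192 "0-128M:0,128M-256M:1M,256M-:2M"
```

A printed limit of 0 means that KFENCE is disabled for that memory size.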

Scenario 2: Use KFENCE to capture a memory pollution issue in specific slabs

Important

This scenario consumes a large amount of memory, on the order of GiB. Exercise caution when you use a machine with a small amount of memory.

  1. Create a memory allocation script and add the following content. In the following example, the script is named kfence.sh and the slab type to be monitored is kmalloc-64.

    #!/bin/bash
    # usage: ./kfence.sh kmalloc-64
    
    SLAB_PREFIX=/sys/kernel/slab
    MODULE_PREFIX=/sys/module/kfence/parameters
    
    if [ $# -eq 0 ]; then
    	echo "err: please input slabs"
    	exit 1
    fi
    
    #check whether slab exists
    for i in "$@"; do
    	slab_path=$SLAB_PREFIX/$i
    	if [ ! -d "$slab_path" ]; then
    		echo "err: slab $i does not exist!"
    		exit 1
    	fi
    done
    
    #calculate num_objects
    sumobj=0
    for i in $@; do
    	objects=($(cat $SLAB_PREFIX/$i/objects))
    	maxobj=1
    	for ((j=1; j<${#objects[@]}; j++)); do
    		nodeobj=$(echo ${objects[$j]} | awk -F= '{print $2}')
    		[ $maxobj -lt $nodeobj ] && maxobj=$nodeobj
    	done
    	((sumobj += maxobj))
    done
    echo "recommend num_objects per node: $sumobj"
    
    #check kfence stats
    if [ $(cat $MODULE_PREFIX/sample_interval) -ne 0 ]; then
    	echo "kfence is running, disable it and wait..."
    	echo 0 > $MODULE_PREFIX/sample_interval
    	sleep 1
    fi
    
    #disable all slabs catching
    for file in $SLAB_PREFIX/*
    do
    	(echo 0 > $file/kfence_enable) 2>/dev/null || echo 1 > $file/skip_kfence
    done
    
    #disable order0 page catching
    echo 0 > $MODULE_PREFIX/order0_page
    
    #enable setting slabs catching
    for i in $@; do
    	(echo 1 > $SLAB_PREFIX/$i/kfence_enable) 2>/dev/null || echo 0 > $SLAB_PREFIX/$i/skip_kfence
    done
    
    #setting num_objects and node mode
    echo $sumobj > $MODULE_PREFIX/num_objects
    echo node > $MODULE_PREFIX/pool_mode
    
    #start kfence
    echo -1 > $MODULE_PREFIX/sample_interval
    if [ $? -ne 0 ]; then
    	echo "err: kfence enable fail!"
    	exit 1
    fi
    echo "kfence enabled!"

    The script reads the number of active objects in the specified slabs, estimates an appropriate KFENCE pool size based on that number, and then enables KFENCE in full mode to monitor the memory allocations of only those slabs.

    Note

    Slabs are commonly used in memory management to optimize memory allocation and release operations. This improves system performance and efficiency. KFENCE can monitor slabs and order-0 pages.

  2. Run the following command to execute the detection script.

    sudo bash ./kfence.sh kmalloc-64
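
The num_objects estimate in kfence.sh depends on the format of /sys/kernel/slab/<slab>/objects. The script treats the first field as the total count and each following field as a per-node count in the form N<id>=<count>. The following sketch isolates that step so that it can be tried on a sample line. The function name max_node_objects is hypothetical, and the sample line is made up for illustration rather than taken from a real system.

```bash
#!/bin/bash
# Hypothetical helper: print the largest per-node object count from one
# line in the assumed format "<total> N0=<count> N1=<count> ...".
max_node_objects() {
	local fields=($1) max=1 f n
	# Skip the first field (the total) and inspect the per-node fields.
	for f in "${fields[@]:1}"; do
		n=${f#*=}
		if [ "$max" -lt "$n" ]; then max=$n; fi
	done
	echo "$max"
}

# Sample line with two NUMA nodes; the larger node holds 700 objects.
max_node_objects "1200 N0=700 N1=500"
```

As in kfence.sh, the result never drops below 1, and the per-slab maximums are then summed to obtain the recommended num_objects value per node.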

Offline debugging scenario

Enable KFENCE by specifying parameters for the x86 architecture

  1. Run the following commands to enable KFENCE:

    sudo grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="kfence.num_objects=1000000"
    sudo grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="kfence.sample_interval=-1"
    sudo grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="kfence.fault=panic"
    • num_objects: the size of the KFENCE pool, which is the maximum number of slab objects that KFENCE can monitor.

      • When the value of the num_objects parameter is smaller than or equal to 131071, the maximum amount of memory that KFENCE can consume is calculated using the following formula: (num_objects + 1) × 8 KiB.

      • When the value of the num_objects parameter is greater than 131071, the maximum amount of memory that KFENCE can consume is calculated using the following formula: ⌈num_objects/131071⌉ GiB. The ⌈⌉ symbols specify that the calculation result is rounded up to the nearest integer.

        Note

        We recommend that you set the num_objects parameter so that KFENCE consumes about 10% of the maximum available memory. For example, if you set the num_objects parameter to 1,000,000, KFENCE can consume up to 8 GiB of memory, which is calculated using the following formula: ⌈ 1,000,000/131071 ⌉ GiB = 8 GiB.

    • sample_interval: The value can be one of the following.

      • 0: KFENCE is disabled and does not monitor memory.

      • Positive number: the sampling interval in milliseconds. For example, a value of 100 specifies that KFENCE samples an allocation every 100 milliseconds.

      • Negative number: Specifies full mode. In this mode, KFENCE monitors all memory that meets the slab type filtering conditions.

    • fault: This parameter is introduced in kernel version 5.10.134-16. Default value: report. If you set the fault parameter to panic, the kernel panics and the instance goes down when an issue is detected, which preserves the core dump file generated at the time of the issue.

  2. Restart the operating system to allow the configurations to take effect.

    For more information, see Restart an instance.
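
Before you choose a num_objects value, you can estimate the upper bound of the memory that KFENCE may consume by applying the two formulas above. The following is a minimal sketch of that arithmetic. The function name kfence_max_kib is hypothetical, and the sketch assumes standard 4-KiB pages.

```bash
#!/bin/bash
# Hypothetical helper: print the upper bound of KFENCE memory in KiB
# for a given num_objects value (standard 4-KiB pages assumed).
kfence_max_kib() {
	local n=$1
	if [ "$n" -le 131071 ]; then
		# (num_objects + 1) x 8 KiB
		echo $(( (n + 1) * 8 ))
	else
		# Round num_objects/131071 up to a whole number of GiB,
		# then convert GiB to KiB.
		echo $(( (n + 131071 - 1) / 131071 * 1024 * 1024 ))
	fi
}

kfence_max_kib 255       # 2048 KiB, that is, 2 MiB
kfence_max_kib 1000000   # 8388608 KiB, that is, 8 GiB
```

The second call matches the example in the preceding note: 1,000,000 objects round up to 8 GiB.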

Use a script to enable KFENCE for the x86 or Arm architecture

Note
  • After you run a script to enable KFENCE, KFENCE cannot detect the memory pollution issues that may occur during kernel startup.

  • If you want to change the value of the num_objects or sample_interval parameter after you enable KFENCE, you must first disable KFENCE.

Run the following commands to enable KFENCE:

sudo sh -c 'echo 1000000 > /sys/module/kfence/parameters/num_objects'
sudo sh -c 'echo -1 > /sys/module/kfence/parameters/sample_interval'
sudo sh -c 'echo panic > /sys/module/kfence/parameters/fault'
  • num_objects: Determines the size of the KFENCE pool. The memory consumed is ⌈num_objects/131071⌉ GiB. The ⌈⌉ symbols indicate rounding up to the nearest integer.

    Note

    We recommend that you set the num_objects parameter so that KFENCE consumes about 10% of the maximum available memory. For example, if you set the num_objects parameter to 1,000,000, KFENCE can consume up to 8 GiB of memory, which is calculated using the following formula: ⌈ 1,000,000/131071 ⌉ GiB = 8 GiB.

  • sample_interval: The value can be one of the following.

    • 0: KFENCE is disabled and does not monitor memory.

    • Positive number: the sampling interval in milliseconds. For example, a value of 100 specifies that KFENCE samples an allocation every 100 milliseconds.

    • Negative number: Specifies full mode. In this mode, KFENCE monitors all memory that meets the slab type filtering conditions.

  • fault: This parameter is introduced in kernel version 5.10.134-16. Default value: report. If you set the fault parameter to panic, the kernel panics and the instance goes down when an issue is detected, which preserves the core dump file generated at the time of the issue.

    Note

    If your kernel version is earlier than 5.10.134-16, an error message is reported when you run the preceding command. The error does not affect KFENCE. You can ignore the error message.
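
Because KFENCE must be disabled before num_objects or sample_interval can be changed, the order of the writes matters. The following sketch shows one way to wrap that order in a function. The function name kfence_reconfigure is hypothetical, and the one-second wait simply mirrors the kfence.sh script in Scenario 2. The demo runs against a temporary directory so that it can be tried without root permissions or a KFENCE-enabled kernel.

```bash
#!/bin/bash
# Hypothetical helper: apply new num_objects and sample_interval values.
# $1: parameter directory (on a real system, /sys/module/kfence/parameters)
# $2: new num_objects value
# $3: new sample_interval value
kfence_reconfigure() {
	local dir=$1 num_objects=$2 interval=$3
	# Disable KFENCE first if it is running, then wait briefly.
	if [ "$(cat "$dir/sample_interval")" -ne 0 ]; then
		echo 0 > "$dir/sample_interval" || return 1
		sleep 1
	fi
	echo "$num_objects" > "$dir/num_objects" || return 1
	echo "$interval" > "$dir/sample_interval" || return 1
}

# Dry run against a temporary directory instead of the real sysfs path.
demo=$(mktemp -d)
echo 100 > "$demo/sample_interval"
echo 255 > "$demo/num_objects"
kfence_reconfigure "$demo" 1000000 -1
cat "$demo/num_objects" "$demo/sample_interval"
rm -rf "$demo"
```

On a real system, run the function as the root user and pass /sys/module/kfence/parameters as the first argument.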

View results

After KFENCE detects memory pollution issues, you can view the number of issues and detailed error messages.

  • View the number of detected problems.

    sudo cat /sys/kernel/debug/kfence/stats

    The following figure shows the command output, which indicates that the total bugs count increases.

    [Figure: command output of /sys/kernel/debug/kfence/stats]

  • View the details of error messages.

    dmesg | grep -i kfence

    The following figure shows the command output, which indicates that one error message is returned.

    [Figure: command output of dmesg]
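
If you want to track the bug count from a script, you can extract the counter from the stats output. The following sketch assumes that the stats file contains a line of the form "total bugs: <number>". This matches the counter name mentioned above, but the exact layout on your kernel may differ. The function name kfence_total_bugs is hypothetical, and the sample input is made up for illustration rather than taken from a real system.

```bash
#!/bin/bash
# Hypothetical helper: read stats text on stdin and print the value of
# the "total bugs" counter. Fails if no such line is found.
kfence_total_bugs() {
	awk -F': *' '$1 == "total bugs" { print $2; found = 1 } END { exit !found }'
}

# Illustrative input only; real output contains more counters.
printf 'enabled: 1\ntotal allocations: 40\ntotal bugs: 2\n' | kfence_total_bugs
```

On a real system, the input would come from sudo cat /sys/kernel/debug/kfence/stats piped into the function.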

Disable KFENCE

  • Run the following command to disable KFENCE:

    sudo bash -c 'echo 0 > /sys/module/kfence/parameters/sample_interval'

    After you disable KFENCE, KFENCE no longer detects memory pollution issues. When all monitored memory in the pool is released, KFENCE returns the memory to the kernel buddy system at a granularity of 1 GiB.

  • In scenarios in which KFENCE is started by adding parameters to the boot commandline parameter list, you can run the following command to remove the parameters. Then, KFENCE is not automatically enabled the next time the system starts.

    sudo grubby --update-kernel=/boot/vmlinuz-$(uname -r) --remove-args="kfence.sample_interval"

FAQ

  • What are the impacts of KFENCE on memory and performance?

    • Impacts on memory

      KFENCE trades higher memory overhead for lower performance interference, so it can consume a large amount of memory. If you use the sampling mode that is enabled at boot (the mode supported by the Linux community), you can set the num_objects parameter to a smaller value to conserve memory. If you use full mode or enable KFENCE dynamically, memory on the order of GiB is consumed. Exercise caution when you use machines with a small amount of memory.

    • Impacts on performance

      • In sampling mode, the impact on performance is small.

      • In full mode, the impacts on the performance are acceptable if memory that meets a specific condition is monitored. For example, memory of a specific slab type is monitored.

      Note
      • We recommend that you perform a phased test based on the actual business scenario to observe the impacts of enabling KFENCE on the actual business performance and then determine the subsequent deployment.

      • Using full mode and comprehensive monitoring during offline debugging has a significant impact on performance and memory. However, this performance impact is not a concern because this scenario is typically used to pinpoint issues.

  • What is the difference between KFENCE and Kernel Address Sanitizer (KASAN)?

    KFENCE and KASAN are built-in Linux kernel tools that detect memory pollution. Alibaba Cloud enhanced KFENCE in kernel version 5.10. KFENCE can be enabled and disabled in a more flexible manner, supports sampling, and can run in an online business environment. The following section describes the functional differences between KFENCE and KASAN:

    • KFENCE can monitor slab objects up to 4 KiB in size (such as kmalloc-4k) and order-0 pages. KASAN can monitor more types of memory, including all slab types, pages, stack memory, and global memory.

    • KFENCE has a higher success rate than KASAN in detecting abnormal memory behaviors within the monitoring range.

    • KFENCE has a higher memory overhead than KASAN but less impact on service performance.

    In most cases, we recommend that you do not use KFENCE and KASAN at the same time, because KFENCE takes over objects that KASAN would otherwise monitor.

  • How stable is KFENCE?

    A known issue exists in kernel version 5.10.134-15 and earlier. When KFENCE monitors memory of order-0 pages and slabs, downtime may occur in specific scenarios. To prevent this issue, run the following command to disable KFENCE from monitoring memory of order-0 pages:

    sudo grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="kfence.order0_page=0"