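Announcement: Occasional abnormal CPU utilization spikes on some Linux nodes when using Terway

Because of the impact of a Linux community kernel patch on eBPF programs, clusters that use the Terway container network plugin with the eBPF-based DataPath V2 feature enabled (including clusters with NetworkPolicy enabled) may see node CPU utilization occasionally rise abnormally.

Affected scope

Nodes that meet both of the following conditions are affected:

  • The node operating system is Alibaba Cloud Linux 3 or ContainerOS, and the kernel version is between 5.10.134-15 and 5.10.134-19.1 (inclusive).

  • The cluster uses the Terway container network plugin with DataPath V2 mode enabled (DataPath V2 is enabled automatically when NetworkPolicy support is enabled).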
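
The kernel-version condition can be tried out on sample release strings before touching any node. The following is a minimal sketch, not the official check: the function names `extract_kernel_version` and `in_affected_range` are hypothetical, GNU `grep -oE` and `sort -V` are assumed to be available, and the sketch covers only the version range, not the operating system type or the Terway DataPath V2 condition.

```shell
#!/bin/bash
# Sketch: is a kernel release string inside the affected range?
# Function names are illustrative; GNU grep and GNU sort (-V) are assumed.

MIN_VERSION="5.10.134-15"
MAX_VERSION="5.10.134-19.1"

# Print the base version (5.10.134-N or 5.10.134-N.M) found in a release string.
# Returns non-zero when the string does not contain a 5.10.134 kernel version.
extract_kernel_version() {
    ekv_version=$(printf '%s\n' "$1" | grep -oE '5\.10\.134-[0-9]+(\.[0-9]+)?' | head -n1)
    [ -n "$ekv_version" ] && printf '%s\n' "$ekv_version"
}

# Succeed when MIN_VERSION <= $1 <= MAX_VERSION under version sort.
in_affected_range() {
    iar_lowest=$(printf '%s\n' "$1" "$MIN_VERSION" | sort -V | head -n1)
    iar_highest=$(printf '%s\n' "$1" "$MAX_VERSION" | sort -V | tail -n1)
    [ "$iar_lowest" = "$MIN_VERSION" ] && [ "$iar_highest" = "$MAX_VERSION" ]
}

# Example: classify a few sample release strings.
for release in "5.10.134-18.al8.x86_64" "5.10.134-18.0.1.lifsea8.x86_64" "5.10.134-19.2.al8.x86_64"; do
    if version=$(extract_kernel_version "$release") && in_affected_range "$version"; then
        echo "$release: in affected range ($version)"
    else
        echo "$release: outside affected range"
    fi
done
```

Note that a ContainerOS-style release string such as `5.10.134-18.0.1.lifsea8.x86_64` yields the base version `5.10.134-18.0`, because only the first dotted number after the dash is kept.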

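Solution

Check whether a node needs to be fixed

First determine whether the nodes in the cluster need to be fixed. Use ECS Cloud Assistant to run the following command in batch on the nodes in the cluster:

Check nodes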

#!/bin/bash

# Script functionality:
# 1. Check if /sys/fs/bpf/tc/globals/cilium_ct4_global exists
# 2. Only proceed if kernel release contains 'al8' or 'lifsea8'
# 3. Check if kernel version is in the range 5.10.134-15 to 5.10.134-19.1 (inclusive)
# 4. Check if kpatch_22519882 module is loaded
# 5. Behavior differs by kernel type:
#    - al8: allow fix (with -y) or prompt to install
#    - lifsea8: do NOT install; instead, warn user to upgrade to ContainerOS 3.5.1
# 6. Other kernels (e.g., el8) are skipped.

set -euo pipefail

# Default: dry-run mode (no install)
INSTALL=false

# Parse arguments
while getopts "y" opt; do
  case $opt in
    y)
      INSTALL=true
      ;;
    \?)
      echo "Usage: $0 [-y]" >&2
      exit 1
      ;;
  esac
done


# Define critical paths to check (in order of preference)
declare -a CT4_GLOBAL_PATHS=(
    "/sys/fs/bpf/tc/globals/cilium_ct4_global"
    "/.lifsea/rootfs/sys/fs/bpf/tc/globals/cilium_ct4_global"
)

# Variable to store the found path
CT4_GLOBAL_PATH=""

# Check each path in order
for path in "${CT4_GLOBAL_PATHS[@]}"; do
    if [[ -e "$path" ]]; then
        CT4_GLOBAL_PATH="$path"
        break
    fi
done

# If no valid path found, skip check
if [[ -z "$CT4_GLOBAL_PATH" ]]; then
    echo "Warning: None of the expected paths for cilium_ct4_global exist. Skipping check."
    exit 0
fi

echo "Detected $CT4_GLOBAL_PATH, proceeding with kernel version check..."

# Get full kernel release
KERNEL_RELEASE=$(uname -r)
echo "Current kernel release: $KERNEL_RELEASE"

# Extract the base version (e.g., 5.10.134-19.1)
if [[ $KERNEL_RELEASE =~ 5\.10\.134-[0-9]+(\.[0-9]+)? ]]; then
    KERNEL_VERSION="${BASH_REMATCH[0]}"
else
    echo "Error: Unable to extract kernel version from $KERNEL_RELEASE" >&2
    exit 1
fi

# Determine kernel type
if [[ $KERNEL_RELEASE == *"al8"* ]]; then
    KERNEL_TYPE="al8"
elif [[ $KERNEL_RELEASE == *"lifsea8"* ]]; then
    KERNEL_TYPE="lifsea8"
else
    echo "Kernel type not supported (neither al8 nor lifsea8), skipping."
    exit 0
fi

echo "Detected kernel type: $KERNEL_TYPE"

# Define version range (inclusive): 5.10.134-15 <= version <= 5.10.134-19.1
MIN_VERSION="5.10.134-15"
MAX_VERSION="5.10.134-19.1"

# Version comparison functions using natural sort
version_ge() {
    [[ "$1" == "$(printf '%s\n' "$1" "$2" | sort -V | tail -n1)" ]]
}

version_le() {
    [[ "$1" == "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" ]]
}

# Check version range
if ! version_ge "$KERNEL_VERSION" "$MIN_VERSION"; then
    echo "Kernel version $KERNEL_VERSION is below $MIN_VERSION, skipping."
    exit 0
fi

if ! version_le "$KERNEL_VERSION" "$MAX_VERSION"; then
    echo "Kernel version $KERNEL_VERSION is above $MAX_VERSION, skipping."
    exit 0
fi

echo "Kernel version $KERNEL_VERSION is within range $MIN_VERSION ~ $MAX_VERSION."

# Check if kpatch_22519882 module is loaded
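# grep without -q reads all of lsmod's output, so lsmod cannot be killed by
# SIGPIPE and make the pipeline fail under 'set -o pipefail'
if lsmod | grep "kpatch_22519882" >/dev/null; then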
    echo "kpatch_22519882 module is already loaded, no action needed."
    exit 0
fi

# If we reach here, the hotfix is missing
HOTFIX_PKG="kernel-hotfix-22519882-$KERNEL_VERSION"

case "$KERNEL_TYPE" in
    "al8")
        echo "kpatch_22519882 module is not loaded. Hotfix package '$HOTFIX_PKG' needs to be installed."

        if [[ "$INSTALL" == true ]]; then
            echo "Installing $HOTFIX_PKG..."
            if yum install -y "$HOTFIX_PKG"; then
                echo "Installation successful."
                exit 0
            else
                echo "Installation failed. Please check yum repository or permissions." >&2
                exit 1
            fi
        else
            echo "Running in dry-run mode. Use -y to install the hotfix."
            exit 1
        fi
        ;;
    "lifsea8")
        echo "WARNING: This is a lifsea8 kernel ($KERNEL_RELEASE)." >&2
        echo "The issue cannot be fixed by hotpatch. You must upgrade to ContainerOS 3.5.1 or later." >&2
        echo "See official documentation or contact support for upgrade instructions." >&2
        exit 1
        ;;
esac

If a node needs to be fixed, the following result is returned:

Alibaba Cloud Linux 3 nodes

Detected /sys/fs/bpf/tc/globals/cilium_ct4_global, proceeding with kernel version check...
Current kernel release: 5.10.134-18.al8.x86_64
Detected kernel type: al8
Kernel version 5.10.134-18 is within range 5.10.134-15 ~ 5.10.134-19.1.
kpatch_22519882 module is not loaded. Hotfix package 'kernel-hotfix-22519882-5.10.134-18' needs to be installed.
Running in dry-run mode. Use -y to install the hotfix.

ContainerOS nodes

Detected /sys/fs/bpf/tc/globals/cilium_ct4_global, proceeding with kernel version check...
Current kernel release: 5.10.134-18.0.1.lifsea8.x86_64
Detected kernel type: lifsea8
Kernel version 5.10.134-18.0 is within range 5.10.134-15 ~ 5.10.134-19.1.
WARNING: This is a lifsea8 kernel (5.10.134-18.0.1.lifsea8.x86_64).
The issue cannot be fixed by hotpatch. You must upgrade to ContainerOS 3.5.1 or later.
See official documentation or contact support for upgrade instructions.

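If either of the following results is returned, the node does not need to be fixed:

  • The Terway eBPF feature is not enabled on the node, so hotfix installation is skipped.

    Warning: None of the expected paths for cilium_ct4_global exist. Skipping check.
  • The hotfix is already installed and does not need to be installed again.

    ...
    kpatch_22519882 module is already loaded, no action needed.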
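
When the check runs on many nodes at once, it can help to sort the returned text into a few buckets. The sketch below is one possible way to do that, not part of the official tooling: the function name `classify_check_output` and the bucket labels are hypothetical, and the matching relies only on message fragments printed by the check script above.

```shell
#!/bin/bash
# Sketch: map the check script's output text to a triage label.
# The function name and the labels are illustrative assumptions.

classify_check_output() {
    case "$1" in
        *"needs to be installed"*)       echo "needs-hotfix" ;;      # Alibaba Cloud Linux 3, hotfix missing
        *"cannot be fixed by hotpatch"*) echo "needs-os-upgrade" ;;  # ContainerOS, upgrade required
        *"already loaded"*)              echo "patched" ;;           # kpatch module present
        *[Ss]"kipping"*)                 echo "not-affected" ;;      # eBPF path absent or kernel out of range
        *)                               echo "unknown" ;;
    esac
}

# Example with the key line of a dry-run result from an Alibaba Cloud Linux 3 node.
classify_check_output "kpatch_22519882 module is not loaded. Hotfix package 'kernel-hotfix-22519882-5.10.134-18' needs to be installed."
```

The order of the patterns matters: the "needs" cases are tested first, so a full multi-line result that also contains the earlier "proceeding with kernel version check" line still lands in the right bucket.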

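Apply the fix

The fix differs depending on whether the operating system is ContainerOS or Alibaba Cloud Linux 3:

Fixing ContainerOS nodes

For ContainerOS nodes, this issue will be fixed automatically in the upcoming ContainerOS 3.5.1 release. Watch the ContainerOS image release notes and, after 3.5.1 is released, upgrade the version as described in "Change the operating system".

Fixing Alibaba Cloud Linux 3 nodes

Add custom data for newly scaled-out nodes

When you create a node pool or edit an existing one, add the following script to the instance pre-defined custom data so that newly scaled-out nodes automatically install the kernel hotfix after startup. For details, see "Create and manage node pools".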

Instance pre-defined custom data

#!/bin/bash

# Function: check_and_apply_hotfix
# Purpose: Checks the current kernel version and type, verifies if it's within the supported range,
#          and installs the hotfix package (kpatch_22519882) if not already applied.
check_and_apply_hotfix() {
    local KERNEL_RELEASE
    local KERNEL_VERSION
    local KERNEL_TYPE
    local MIN_VERSION="5.10.134-15"
    local MAX_VERSION="5.10.134-19.1"
    local HOTFIX_PKG

    # Get the full kernel release string
    KERNEL_RELEASE=$(uname -r)
    echo "Current kernel release: $KERNEL_RELEASE"

    # Extract base kernel version (e.g., 5.10.134-19.1)
    if [[ $KERNEL_RELEASE =~ 5\.10\.134-[0-9]+(\.[0-9]+)? ]]; then
        KERNEL_VERSION="${BASH_REMATCH[0]}"
    else
        echo "Error: Unable to extract kernel version from $KERNEL_RELEASE" >&2
        return 1
    fi

    # Detect kernel type (al8 or lifsea8)
    if [[ $KERNEL_RELEASE == *"al8"* ]]; then
        KERNEL_TYPE="al8"
    elif [[ $KERNEL_RELEASE == *"lifsea8"* ]]; then
        KERNEL_TYPE="lifsea8"
    else
        echo "Kernel type not supported (neither al8 nor lifsea8), skipping."
        return 0
    fi

    echo "Detected kernel type: $KERNEL_TYPE"

    # Version comparison functions using natural (version) sort
    version_ge() {
        [[ "$1" == "$(printf '%s\n%s' "$1" "$2" | sort -V | tail -n1)" ]]
    }

    version_le() {
        [[ "$1" == "$(printf '%s\n%s' "$1" "$2" | sort -V | head -n1)" ]]
    }

    # Check if kernel version is >= minimum supported version
    if ! version_ge "$KERNEL_VERSION" "$MIN_VERSION"; then
        echo "Kernel version $KERNEL_VERSION is below $MIN_VERSION, skipping."
        return 0
    fi

    # Check if kernel version is <= maximum supported version
    if ! version_le "$KERNEL_VERSION" "$MAX_VERSION"; then
        echo "Kernel version $KERNEL_VERSION is above $MAX_VERSION, skipping."
        return 0
    fi

    echo "Kernel version $KERNEL_VERSION is within range $MIN_VERSION ~ $MAX_VERSION."

    # Check if the kpatch module is already loaded
    if lsmod | grep -q "kpatch_22519882"; then
        echo "kpatch_22519882 module is already loaded, no action needed."
        return 0
    fi

    # If module is not loaded, prepare the hotfix package name
    HOTFIX_PKG="kernel-hotfix-22519882-$KERNEL_VERSION"

    # Handle installation based on kernel type
    case "$KERNEL_TYPE" in
        "al8")
            echo "kpatch_22519882 module is not loaded. Hotfix package '$HOTFIX_PKG' needs to be installed."
            echo "Installing $HOTFIX_PKG..."

            if yum install -y "$HOTFIX_PKG"; then
                echo "Installation successful."
                return 0
            else
                echo "Installation failed. Please check yum repository or permissions." >&2
                return 1
            fi
            ;;
        "lifsea8")
            echo "Kernel type 'lifsea8' is recognized but not currently supported for automatic installation."
            return 0
            ;;
        *)
            echo "Unknown kernel type: $KERNEL_TYPE"
            return 1
            ;;
    esac
}

# =====================
# Call the function
# =====================
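check_and_apply_hotfix

Install the hotfix on existing nodes

For existing Alibaba Cloud Linux 3 nodes, use ECS Cloud Assistant to run the following command on the nodes that need to be fixed:

Fix nodes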

#!/bin/bash

# Script functionality:
# 1. Check if /sys/fs/bpf/tc/globals/cilium_ct4_global exists
# 2. Only proceed if kernel release contains 'al8' or 'lifsea8'
# 3. Check if kernel version is in the range 5.10.134-15 to 5.10.134-19.1 (inclusive)
# 4. Check if kpatch_22519882 module is loaded
# 5. Behavior differs by kernel type:
#    - al8: install the hotfix (INSTALL defaults to true in this copy, so -y is optional)
#    - lifsea8: do NOT install; instead, warn user to upgrade to ContainerOS 3.5.1
# 6. Other kernels (e.g., el8) are skipped.

set -euo pipefail

# Default: install mode (this copy of the script applies the hotfix even without -y)
INSTALL=true

# Parse arguments
while getopts "y" opt; do
  case $opt in
    y)
      INSTALL=true
      ;;
    \?)
      echo "Usage: $0 [-y]" >&2
      exit 1
      ;;
  esac
done


# Define critical paths to check (in order of preference)
declare -a CT4_GLOBAL_PATHS=(
    "/sys/fs/bpf/tc/globals/cilium_ct4_global"
    "/.lifsea/rootfs/sys/fs/bpf/tc/globals/cilium_ct4_global"
)

# Variable to store the found path
CT4_GLOBAL_PATH=""

# Check each path in order
for path in "${CT4_GLOBAL_PATHS[@]}"; do
    if [[ -e "$path" ]]; then
        CT4_GLOBAL_PATH="$path"
        break
    fi
done

# If no valid path found, skip check
if [[ -z "$CT4_GLOBAL_PATH" ]]; then
    echo "Warning: None of the expected paths for cilium_ct4_global exist. Skipping check."
    exit 0
fi

echo "Detected $CT4_GLOBAL_PATH, proceeding with kernel version check..."

# Get full kernel release
KERNEL_RELEASE=$(uname -r)
echo "Current kernel release: $KERNEL_RELEASE"

# Extract the base version (e.g., 5.10.134-19.1)
if [[ $KERNEL_RELEASE =~ 5\.10\.134-[0-9]+(\.[0-9]+)? ]]; then
    KERNEL_VERSION="${BASH_REMATCH[0]}"
else
    echo "Error: Unable to extract kernel version from $KERNEL_RELEASE" >&2
    exit 1
fi

# Determine kernel type
if [[ $KERNEL_RELEASE == *"al8"* ]]; then
    KERNEL_TYPE="al8"
elif [[ $KERNEL_RELEASE == *"lifsea8"* ]]; then
    KERNEL_TYPE="lifsea8"
else
    echo "Kernel type not supported (neither al8 nor lifsea8), skipping."
    exit 0
fi

echo "Detected kernel type: $KERNEL_TYPE"

# Define version range (inclusive): 5.10.134-15 <= version <= 5.10.134-19.1
MIN_VERSION="5.10.134-15"
MAX_VERSION="5.10.134-19.1"

# Version comparison functions using natural sort
version_ge() {
    [[ "$1" == "$(printf '%s\n' "$1" "$2" | sort -V | tail -n1)" ]]
}

version_le() {
    [[ "$1" == "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" ]]
}

# Check version range
if ! version_ge "$KERNEL_VERSION" "$MIN_VERSION"; then
    echo "Kernel version $KERNEL_VERSION is below $MIN_VERSION, skipping."
    exit 0
fi

if ! version_le "$KERNEL_VERSION" "$MAX_VERSION"; then
    echo "Kernel version $KERNEL_VERSION is above $MAX_VERSION, skipping."
    exit 0
fi

echo "Kernel version $KERNEL_VERSION is within range $MIN_VERSION ~ $MAX_VERSION."

# Check if kpatch_22519882 module is loaded
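# grep without -q reads all of lsmod's output, so lsmod cannot be killed by
# SIGPIPE and make the pipeline fail under 'set -o pipefail'
if lsmod | grep "kpatch_22519882" >/dev/null; then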
    echo "kpatch_22519882 module is already loaded, no action needed."
    exit 0
fi

# If we reach here, the hotfix is missing
HOTFIX_PKG="kernel-hotfix-22519882-$KERNEL_VERSION"

case "$KERNEL_TYPE" in
    "al8")
        echo "kpatch_22519882 module is not loaded. Hotfix package '$HOTFIX_PKG' needs to be installed."

        if [[ "$INSTALL" == true ]]; then
            echo "Installing $HOTFIX_PKG..."
            if yum install -y "$HOTFIX_PKG"; then
                echo "Installation successful."
                exit 0
            else
                echo "Installation failed. Please check yum repository or permissions." >&2
                exit 1
            fi
        else
            echo "Running in dry-run mode. Use -y to install the hotfix."
            exit 1
        fi
        ;;
    "lifsea8")
        echo "WARNING: This is a lifsea8 kernel ($KERNEL_RELEASE)." >&2
        echo "The issue cannot be fixed by hotpatch. You must upgrade to ContainerOS 3.5.1 or later." >&2
        echo "See official documentation or contact support for upgrade instructions." >&2
        exit 1
        ;;
esac

The expected output is as follows, indicating that the hotfix has been installed:

......
Total                                              1.0 MB/s |  52 kB     00:00
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                        1/1
  Installing       : kpatch-dnf-0.9.7_0.4-2.0.1.al8.noarch                                  1/3
  Running scriptlet: kpatch-dnf-0.9.7_0.4-2.0.1.al8.noarch                                  1/3
To enable automatic kpatch-patch subscription, run:
        $ dnf kpatch auto

  Installing       : kpatch-0.9.7-2.0.1.al8.noarch                                          2/3
  Running scriptlet: kernel-hotfix-22519882-5.10.134-18-1.0-20250804154834.al8.x86_64       3/3
  Installing       : kernel-hotfix-22519882-5.10.134-18-1.0-20250804154834.al8.x86_64       3/3
  Running scriptlet: kernel-hotfix-22519882-5.10.134-18-1.0-20250804154834.al8.x86_64       3/3
Created symlink /etc/systemd/system/multi-user.target.wants/kpatch.service → /usr/lib/systemd/system/kpatch.service.
installing /var/khotfix/5.10.134-18.al8.x86_64/22519882/kpatch-22519882.ko (5.10.134-18.al8.x86_64)
loading patch module: /var/khotfix/5.10.134-18.al8.x86_64/22519882/kpatch-22519882.ko

  Verifying        : kpatch-0.9.7-2.0.1.al8.noarch                                          1/3
  Verifying        : kpatch-dnf-0.9.7_0.4-2.0.1.al8.noarch                                  2/3
  Verifying        : kernel-hotfix-22519882-5.10.134-18-1.0-20250804154834.al8.x86_64       3/3

Installed:
  kernel-hotfix-22519882-5.10.134-18-1.0-20250804154834.al8.x86_64
  kpatch-0.9.7-2.0.1.al8.noarch
  kpatch-dnf-0.9.7_0.4-2.0.1.al8.noarch

Complete!
Installation successful.
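
To confirm afterwards that the hotfix is active, one option is to look for the module in the kernel's module list. The sketch below is an illustrative example rather than an official step: `hotfix_loaded` is a hypothetical helper, and it reads `/proc/modules` (the file that `lsmod` formats) and searches for the same `kpatch_22519882` substring that the scripts above look for.

```shell
#!/bin/bash
# Sketch: report whether the kpatch_22519882 module appears in a module list.
# hotfix_loaded is an illustrative helper; the optional argument exists only
# so the check can be pointed at a saved copy of /proc/modules.

hotfix_loaded() {
    hl_modules_file=${1:-/proc/modules}
    grep -q "kpatch_22519882" "$hl_modules_file" 2>/dev/null
}

if hotfix_loaded; then
    echo "kpatch_22519882 is loaded"
else
    echo "kpatch_22519882 is not loaded"
fi
```

Reading the file directly avoids a pipeline, so the result does not depend on shell options such as `pipefail`.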

Related documentation

Alibaba Cloud Linux 3: handling abnormally high CPU utilization caused by eBPF programs that use LRU hash