Solutions for network packet loss after container memory exhaustion on Alibaba Cloud Linux 3


On Alibaba Cloud Linux 3, network packet loss can occur when a container's memory is exhausted. This document explains why this happens and provides two mitigation methods.

Problem description

When an application container's memory is exhausted on Alibaba Cloud Linux 3, the container may experience network packet loss, as shown in the following example:

Network packet loss metrics observed after container memory exhaustion
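When packet loss appears, first confirm that the container has actually reached its memory limit. The following Python sketch is one way to do this. It reads the cgroup v2 interface files `memory.current`, `memory.max`, and `memory.stat`, and reports the remaining headroom and how much of the usage is file page cache. It assumes cgroup v2. On cgroup v1, the rough equivalents are `memory.usage_in_bytes`, `memory.limit_in_bytes`, and the `cache` entry in `memory.stat`. The cgroup directory passed to `read_cgroup` is a placeholder that you must replace with your container's cgroup path.

```python
from pathlib import Path
from typing import Optional


def parse_limit(text: str) -> Optional[int]:
    """memory.max holds a byte count, or the literal "max" when there is no limit."""
    value = text.strip()
    return None if value == "max" else int(value)


def parse_stat(text: str) -> dict[str, int]:
    """Parse memory.stat, which has one "key value" pair per line."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats


def headroom_report(current: int, limit: Optional[int], stat: dict[str, int]) -> dict:
    """Summarize how close a cgroup is to its limit and how much usage is page cache."""
    headroom = None if limit is None else limit - current
    return {
        "usage_bytes": current,
        "headroom_bytes": headroom,          # None means no limit is set
        "file_cache_bytes": stat.get("file", 0),
    }


def read_cgroup(cgroup_dir: str) -> dict:
    # cgroup_dir is a placeholder, for example /sys/fs/cgroup/<container-cgroup>.
    d = Path(cgroup_dir)
    return headroom_report(
        int((d / "memory.current").read_text()),
        parse_limit((d / "memory.max").read_text()),
        parse_stat((d / "memory.stat").read_text()),
    )
```

A headroom near zero together with a large `file_cache_bytes` value matches the situation described under Cause: the container still holds reclaimable cache, yet packet allocations fail. This check shows only that the limit was reached. It does not prove that the drops were caused by the limit, so correlate it with your packet loss metrics.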

Cause

The Linux kernel allocates memory for network packets in software interrupt (softirq) context, and these allocations are non-blocking. The kernel cannot sleep in this context to wait for memory to be freed, so the allocation must either succeed immediately or fail.

When a container's memory usage reaches its limit, the kernel does not reclaim the container's reclaimable cache in software interrupt context, because reclaim may block. The allocation therefore fails and the packet is dropped, even though the container still holds cache that could have been reclaimed.
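The following Python sketch is a toy model, not kernel code. It isolates the asymmetry described above. A blocking charge in process context may reclaim cache before charging. A non-blocking charge in software interrupt context fails as soon as usage reaches the limit. The class name, its fields, and the page-based sizes are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class ToyMemcg:
    """Toy memory cgroup; all sizes are in pages. Illustrative only, not kernel code."""

    limit: int
    pinned: int = 0  # non-reclaimable usage, e.g. application memory and packet buffers
    cache: int = 0   # reclaimable page cache

    @property
    def usage(self) -> int:
        return self.pinned + self.cache

    def _reclaim(self, pages: int) -> int:
        freed = min(pages, self.cache)
        self.cache -= freed
        return freed

    def charge_blocking(self, pages: int) -> bool:
        """Process context: may sleep, so it reclaims cache before charging."""
        shortfall = self.usage + pages - self.limit
        if shortfall > 0:
            self._reclaim(shortfall)
        if self.usage + pages > self.limit:
            return False
        self.pinned += pages
        return True

    def charge_nonblocking(self, pages: int) -> bool:
        """Softirq context: cannot sleep, so no reclaim; succeed or fail immediately."""
        if self.usage + pages > self.limit:
            return False  # packet buffer not allocated, so the packet is dropped
        self.pinned += pages
        return True


cg = ToyMemcg(limit=100, pinned=60, cache=40)  # at the limit, but 40 pages are reclaimable
print(cg.charge_nonblocking(1))  # False: dropped despite reclaimable cache
print(cg.charge_blocking(1))     # True: 1 page of cache is reclaimed first
```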

Scope

All kernel versions of Alibaba Cloud Linux 3.

Solution

This behavior is not a kernel defect, so no kernel patch is available. Use one of the following methods to mitigate the problem.

Choose a solution

| | Method 1: Enable asynchronous background reclaim for Memcg | Method 2: Upgrade to Alibaba Cloud Linux 4 |
|---|---|---|
| How it works | Asynchronously reclaims the Memory Control Group (Memcg) cache before memory is exhausted | Allows the kernel to request memory from the system's reserved pool when a container's memory is exhausted |
| Disruption | No OS upgrade required | Requires upgrading the operating system |
| When to use | Container memory pressure is frequent but an OS upgrade is not feasible | Long-term fix when an OS upgrade is acceptable |
| Limitation | Cache reclaim may not keep up under heavy memory pressure | The allocation succeeds only if enough reserved memory is available |

Method 1: Enable asynchronous background reclaim for Memcg

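Enable the asynchronous background reclaim feature for Memcg so that cache is reclaimed in the background before a container's memory is exhausted. This keeps free memory available within the container's limit, so non-blocking packet allocations in software interrupt context can succeed.

This feature reclaims cache at the Memcg level.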
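The effect of background reclaim can be illustrated with a similar toy model. This is not the kernel implementation. A hypothetical pair of high and low watermarks, expressed as fractions of the limit, decides when a background worker starts reclaiming cache and how far it reclaims. The names `high_ratio` and `low_ratio` are illustrative only and are not the feature's actual configuration interface. In each tick, page cache grows and one packet buffer is allocated without blocking.

```python
from dataclasses import dataclass


@dataclass
class ToyMemcg:
    """Toy memory cgroup (sizes in pages) with hypothetical background-reclaim watermarks."""

    limit: int
    pinned: int = 0          # non-reclaimable usage
    cache: int = 0           # reclaimable page cache
    high_ratio: float = 0.9  # hypothetical: start reclaiming above this fraction of the limit
    low_ratio: float = 0.8   # hypothetical: reclaim down to this fraction of the limit

    @property
    def usage(self) -> int:
        return self.pinned + self.cache

    def charge_nonblocking(self, pages: int) -> bool:
        """Softirq-style charge: no reclaim, fail immediately at the limit."""
        if self.usage + pages > self.limit:
            return False
        self.pinned += pages
        return True

    def background_reclaim(self) -> int:
        """Runs outside softirq context; frees cache down to the low watermark."""
        if self.usage <= self.high_ratio * self.limit:
            return 0
        target = int(self.low_ratio * self.limit)
        freed = min(self.cache, max(0, self.usage - target))
        self.cache -= freed
        return freed


def simulate(ticks: int, async_reclaim: bool) -> int:
    """Return the number of dropped packets over the run."""
    cg = ToyMemcg(limit=1000, pinned=500)
    dropped = 0
    for _ in range(ticks):
        # File I/O adds page cache until the cgroup is full.
        cg.cache += min(20, cg.limit - cg.usage)
        if async_reclaim:
            cg.background_reclaim()
        # One packet buffer per tick, freed right after the packet is processed.
        if cg.charge_nonblocking(1):
            cg.pinned -= 1
        else:
            dropped += 1
    return dropped


print(simulate(100, async_reclaim=False))  # 76
print(simulate(100, async_reclaim=True))   # 0
```

In this model, reclaim runs every tick before the packet arrives, so it never falls behind. In practice, if the workload consumes memory faster than background reclaim can free it, usage can still reach the limit. This is the limitation listed for Method 1.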

Method 2: Upgrade to Alibaba Cloud Linux 4

Upgrade to Alibaba Cloud Linux 4. In later versions of the Linux kernel, memory for network packets is still allocated in a non-blocking manner. However, when a container's memory is exhausted, the kernel can request memory from the system's reserved pool. The allocation succeeds as long as enough reserved memory is available.
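The following toy sketch contrasts the two behaviors. It is a simplified model, not the kernel's actual charging logic. When the container is at its limit, the newer behavior lets a non-blocking packet allocation draw on a system-wide reserve, and the allocation still fails if that reserve is too small. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToyAllocator:
    """Toy model (sizes in pages) of a container memcg plus a system reserve. Not kernel code."""

    memcg_limit: int
    memcg_usage: int
    reserve_free: int  # hypothetical system-wide reserved pool
    use_reserve: bool  # False models the older behavior, True the newer one

    def alloc_packet_nonblocking(self, pages: int) -> bool:
        # Normal path: charge within the container's limit.
        if self.memcg_usage + pages <= self.memcg_limit:
            self.memcg_usage += pages
            return True
        # Container exhausted: the newer behavior falls back to the reserve if it is large enough.
        if self.use_reserve and self.reserve_free >= pages:
            self.reserve_free -= pages
            return True
        return False  # packet dropped


old = ToyAllocator(memcg_limit=100, memcg_usage=100, reserve_free=10, use_reserve=False)
new = ToyAllocator(memcg_limit=100, memcg_usage=100, reserve_free=10, use_reserve=True)
print(old.alloc_packet_nonblocking(1))  # False: models the Alibaba Cloud Linux 3 behavior
print(new.alloc_packet_nonblocking(1))  # True: served from the reserve
```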