Use SysOM to troubleshoot container memory issues

Because the container engine layer is opaque, troubleshooting containerized workloads can be difficult. To address this, Alibaba Cloud Container Service for Kubernetes (ACK) provides SysOM, which collects container monitoring data at the operating system kernel layer and improves observability into container memory issues. With this data, you can view and diagnose issues at the container engine layer, which helps smooth the migration to containers. This topic describes how to use SysOM to troubleshoot container memory issues.

Prerequisites

Billing

After ack-sysom-monitor is enabled, related components automatically send monitoring metrics to Managed Service for Prometheus. These metrics are billed as custom metrics.

Before enabling this feature, read the Billing overview to understand how custom metrics are charged. Fees vary based on cluster size and the number of applications running. To monitor and control resource usage, see View resource usage.

Background

Containerization is a best practice for enterprise IT architecture because it offers lower costs, higher efficiency, flexibility, and scalability.

However, containerization also introduces opacity at the container engine layer. Without visibility into this layer, memory consumption can grow unnoticed until it exceeds the configured limits and triggers out-of-memory (OOM) events.

To address this, the Alibaba Cloud Container Service for Kubernetes (ACK) team and the GuestOS operating system team collaborated to provide kernel-level container monitoring. This monitoring enables precise memory control and helps prevent these OOM issues.

Container memory composition

Container memory consists of application memory, kernel memory, and free memory.

Application memory

Memory used by an application at runtime. Application memory consists of the following components:

  • Anonymous memory (Anon): Memory not associated with a file, such as a process's heap, stack, or data segment. This includes heap memory allocated by brk() and mmap().

  • File cache: Memory used to cache data for file reads and writes. Frequently accessed cache is known as active file cache and is less likely to be reclaimed by the system.

  • Buffer: Memory used to store metadata for block devices or file systems.

  • Huge pages (HugeTLB): Memory allocated using huge pages technology.

Kernel memory

Memory used by the operating system kernel. Kernel memory consists of the following components:

  • Slab: A memory pool for caching kernel objects.

  • Vmalloc: A mechanism for allocating large, virtually contiguous memory areas.

  • allocpage: A mechanism for allocating local memory.

  • Others: Includes the kernel stack, page tables, reserved memory, and more.

Free memory

Unused, available memory. This category has no subcategories.

How it works

Kubernetes uses the working set to monitor and manage container memory usage. When a container's memory consumption exceeds its configured limit or a Node is under memory pressure, Kubernetes uses the working set to decide whether to evict or terminate the container. Monitoring a Pod's working set with SysOM provides more comprehensive and precise memory analysis capabilities. This helps operations and development teams quickly identify and resolve issues caused by a large working set, thereby improving container performance and stability.

A working set is the memory that a container actively uses within a specific time frame, that is, the memory required for its current operations. It is calculated using the formula: working set = InactiveAnon + ActiveAnon + ActiveFile. InactiveAnon and ActiveAnon together make up the total anonymous memory, and ActiveFile is the size of the active file cache. By monitoring and analyzing this data, operations teams can manage resources more effectively and ensure application stability.
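The working set formula can be illustrated with a minimal Python sketch. It assumes the counters are available as "key value" text in the style of a cgroup memory.stat file; how SysOM itself collects these counters is not described in this topic. The helper names (parse_memory_stat, working_set_bytes) and the sample values are hypothetical and exist only for illustration.

```python
def parse_memory_stat(text):
    """Parse 'key value' lines into a dict of integer byte counts."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed 'name number' lines; skip anything else.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def working_set_bytes(stats):
    """working set = inactive_anon + active_anon + active_file."""
    return stats["inactive_anon"] + stats["active_anon"] + stats["active_file"]


# Hypothetical sample values in bytes, not taken from a real Pod.
SAMPLE = """\
inactive_anon 518000640
active_anon 1048576
inactive_file 2097152
active_file 4194304
"""

if __name__ == "__main__":
    stats = parse_memory_stat(SAMPLE)
    print(f"working set: {working_set_bytes(stats) / 2**20:.1f} MiB")
```

Note that inactive_file is parsed but deliberately left out of the sum, because the formula counts only the active part of the file cache.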

Use SysOM

SysOM provides kernel-level dashboards for Pods and Nodes, allowing you to monitor system-level metrics such as memory, network, and storage in real time. For detailed information about SysOM metrics, see SysOM kernel-level container monitoring.

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Operations > Prometheus Monitoring.

  3. On the Prometheus Monitoring page, click the SysOM tab, and then click the SysOM - Pods tab to view the Pod memory data on the dashboard.

    Use the following formulas to analyze and identify potential memory black hole issues.

    Pod total memory = RSS + Cache ≈ inactive_anon + active_anon + inactive_file + active_file

    working set = inactive_anon + active_anon + active_file

    1. In the Pod Memory Monitor section, you can break down the total Pod memory into cache and RSS memory based on the formula. The cache is further broken down by proportion into active_file, inactive_file, and shmem (shared memory). The RSS memory is broken down by proportion into active_anon and inactive_anon.

      As shown in the following figure, inactive_anon memory accounts for the largest proportion.

      [Figure: Pod Memory Monitor section showing the memory breakdown of the Pod]

    2. In the Pod Resource Analysis section, use the Top tool to quickly locate the Pod that consumes the most InactiveAnon memory in the cluster.

      As shown in the following figure, the arms-prom Pod consumes the most memory.

      The table is sorted in descending order by InactiveAnon size, showing that the arms-prometheus Pod has an InactiveAnon size of 494 MiB, accounting for 99.4%, which is much higher than other Pods in the kube-system namespace.

    3. In the Pod Memory Details section, view the detailed memory composition of the Pod. Monitoring different memory components, such as Pod Cache, InactiveFile (inactive file memory), InactiveAnon (inactive anonymous memory), and Dirty Memory, helps you identify common Pod memory black hole issues.

      The table also displays the Namespace, Pod name, Usage (memory usage), Page fault delta, and Total page faults for each Pod. You can sort the table by any column header. By default, it is sorted by Usage in descending order.

  4. In the Pod File Cache section, investigate the cause of high cache memory usage.

    If a Pod's memory cache is large, it can increase the Pod's working set. This cached memory can become a memory black hole within the Pod's working set, impacting application performance.

    The Pod file cache table includes the Namespace, Pod name, Container, File, Memory cache size, and Total file size columns. By default, it is sorted by total file size in descending order, which helps you locate the Pod and the corresponding file with the largest cache consumption.

  5. Fix the memory black hole issue.

    After you identify a container memory black hole, you can use the fine-grained scheduling feature of ACK to resolve it. For more information, see Enable container memory QoS.
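The analysis in step 3 (breaking down Pod total memory by proportion, then ranking Pods by one component) can be approximated offline with a short Python sketch. This is not how the SysOM dashboard is implemented. The function names, Pod names, and numbers below are hypothetical, and the values are assumed to be in MiB.

```python
COMPONENTS = ("inactive_anon", "active_anon", "inactive_file", "active_file")


def breakdown(stats):
    """Return each component's share of Pod total memory.

    Pod total memory ≈ inactive_anon + active_anon + inactive_file + active_file
    """
    total = sum(stats[k] for k in COMPONENTS)
    if total == 0:
        return {k: 0.0 for k in COMPONENTS}
    return {k: stats[k] / total for k in COMPONENTS}


def top_pods(pods, key="inactive_anon", n=3):
    """Return up to n Pod names sorted in descending order by one component."""
    return sorted(pods, key=lambda name: pods[name][key], reverse=True)[:n]


# Hypothetical per-Pod counters in MiB, for illustration only.
PODS = {
    "pod-a": {"inactive_anon": 400, "active_anon": 50, "inactive_file": 30, "active_file": 20},
    "pod-b": {"inactive_anon": 10, "active_anon": 20, "inactive_file": 60, "active_file": 10},
    "pod-c": {"inactive_anon": 100, "active_anon": 100, "inactive_file": 100, "active_file": 100},
}

if __name__ == "__main__":
    for name in top_pods(PODS):
        shares = breakdown(PODS[name])
        largest = max(shares, key=shares.get)
        print(f"{name}: largest component is {largest} ({shares[largest]:.1%})")
```

A Pod whose largest share is inactive_anon or inactive_file is a reasonable first candidate to inspect in the Pod Memory Details and Pod File Cache sections.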

Related documents