Enable container memory QoS (Quality of Service) in an ACK cluster to improve application performance

Use ack-koordinator to prioritize memory for latency-sensitive containers and reduce OOM errors during contention.

Note

Before using memory QoS, read Pod Quality of Service Classes and Assign Memory Resources to Containers and Pods in the Kubernetes documentation.

How it works

Kubernetes assigns each pod a memory request and limit. Two failure modes occur under memory pressure:

Container-level pressure: When a container's memory usage (including page cache) approaches its limit, the kernel triggers memcg-level direct reclamation, which blocks the allocating processes. If allocation outpaces reclamation, the OOM killer terminates processes in the container.
Node-level pressure: When the sum of container memory limits exceeds the node's physical memory (overcommitment) and free memory runs low, the kernel reclaims memory across containers indiscriminately, degrading performance and potentially triggering OOM errors.

ack-koordinator addresses both by configuring the memory control group (memcg) per container, using three Alibaba Cloud Linux (Alinux) kernel features:

Memcg QoS: locks a minimum amount of memory so high-priority containers retain their working set
Memcg backend asynchronous reclamation: proactively reclaims memory before the limit is reached, avoiding blocking direct reclamation
Memcg global minimum watermark rating: adjusts the per-container reclamation threshold so latency-sensitive (LS) containers are reclaimed last

This produces fairer memory distribution and lower latency during overcommitment.

Advantages over open-source Kubernetes memory QoS

The upstream Kubernetes memory QoS feature (Kubernetes 1.22+) supports only cgroup v2, requires manual kubelet configuration, and lacks per-pod or per-namespace granularity.

ack-koordinator improves on the upstream implementation in two ways:

Broader kernel compatibility: Supports cgroup v1 and v2, backed by Alinux kernel features such as memcg backend asynchronous reclamation and minimum watermark rating. See Overview of kernel features and interfaces.
Fine-grained configuration: Configure memory QoS per pod, namespace, or cluster through pod annotations or ConfigMaps.

Configuration mechanism

Memory QoS works through the following cgroup parameters. Except for the hard limit, which Kubernetes sets, each maps to an option in Advanced parameters:

cgroup parameter	Controls	Configured by
`memory.limit_in_bytes`	Hard upper limit for the container	Kubernetes (from `limits.memory`)
`memory.high`	Throttling threshold; reclamation starts when usage exceeds it	`throttlingPercent`
`memory.wmark_high`	Asynchronous reclamation trigger	`wmarkRatio`
`memory.min`	Unreclaimable memory floor	`minLimitPercent`
`memory.low`	Relatively unreclaimable memory floor	`lowLimitPercent`
`memory.wmark_min_adj`	Per-container adjustment of the global minimum watermark	`wmarkMinAdj`

Configuration priority

When multiple configuration sources apply to the same pod, ack-koordinator uses the following priority order (highest first):

Pod annotation (koordinator.sh/memoryQOS)
Namespace-level ConfigMap (ack-slo-pod-config)
Cluster-level ConfigMap (ack-slo-config)
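
The precedence above can be sketched as a small resolver. This is an illustrative Python model, not ack-koordinator's actual code: the function name `resolve_memory_qos` and its arguments are hypothetical, and the handling of `policy` (including the assumption that `default` falls through to the ConfigMaps) is inferred from the descriptions in Advanced parameters.

```python
import json
from typing import Optional, Sequence


def resolve_memory_qos(
    annotation: Optional[str],
    namespace: str,
    enabled_namespaces: Sequence[str],
    disabled_namespaces: Sequence[str],
    cluster_enabled: bool,
) -> bool:
    """Return True if memory QoS would be enabled for a pod (illustrative model)."""
    # 1. Pod annotation (koordinator.sh/memoryQOS) has the highest priority.
    if annotation is not None:
        policy = json.loads(annotation).get("policy", "default")
        if policy == "auto":
            return True
        if policy == "none":
            return False
        # Assumption: "default" defers to the ConfigMaps below.
    # 2. Namespace-level ConfigMap (ack-slo-pod-config).
    if namespace in enabled_namespaces:
        return True
    if namespace in disabled_namespaces:
        return False
    # 3. Cluster-level ConfigMap (ack-slo-config).
    return cluster_enabled
```

For example, a pod annotated with `{"policy": "none"}` stays disabled even in a namespace listed under `enabledNamespaces`, because the annotation is evaluated first.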

QoS class mapping

If a pod lacks the koordinator.sh/qosClass label, ack-koordinator maps Kubernetes QoS classes automatically:

Kubernetes QoS class	koordinator QoS class
Guaranteed	Default memory QoS settings
Burstable	LS (latency-sensitive)
BestEffort	BE (best-effort)
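
The fallback in the table can be written as a lookup that is consulted only when the label is missing. This is a minimal Python sketch for illustration; the helper name `effective_qos_class` is hypothetical, and `None` is used here to stand for "default memory QoS settings".

```python
from typing import Optional

# Fallback mapping from the table; None stands for "default memory QoS settings".
K8S_TO_KOORDINATOR = {
    "Guaranteed": None,
    "Burstable": "LS",
    "BestEffort": "BE",
}


def effective_qos_class(labels: dict, k8s_qos_class: str) -> Optional[str]:
    """Prefer the explicit koordinator.sh/qosClass label, else use the fallback."""
    explicit = labels.get("koordinator.sh/qosClass")
    if explicit is not None:
        return explicit
    return K8S_TO_KOORDINATOR[k8s_qos_class]
```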

Prerequisites

Before you begin, make sure you have:

An ACK cluster running Kubernetes 1.18+. See Manually update ACK clusters.
Alinux as the node OS. Some advanced parameters depend on Alinux kernel features. See Advanced parameters.
ack-koordinator 0.8.0+ installed. See ack-koordinator.

Enable memory QoS for a specific pod

Add the following annotation to the pod spec:

annotations:
  # Enable memory QoS with recommended settings
  koordinator.sh/memoryQOS: '{"policy": "auto"}'
  # Disable memory QoS
  # koordinator.sh/memoryQOS: '{"policy": "none"}'
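
The annotation value is a JSON string, so quoting mistakes are easy to make when manifests are generated by tooling. The following Python sketch builds the value with `json.dumps` and rejects policies other than the three listed in Advanced parameters. The helper name `memory_qos_annotation` is hypothetical and not part of any ACK tooling.

```python
import json

ALLOWED_POLICIES = {"auto", "default", "none"}


def memory_qos_annotation(policy: str) -> dict:
    """Build the pod annotations mapping; raises ValueError for unknown policies."""
    if policy not in ALLOWED_POLICIES:
        raise ValueError(f"unsupported policy: {policy!r}")
    # json.dumps produces the same JSON text used in the manifest above.
    return {"koordinator.sh/memoryQOS": json.dumps({"policy": policy})}
```

The returned mapping can be merged into `metadata.annotations` before the manifest is serialized.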

Enable memory QoS for a cluster

Use the ack-slo-config ConfigMap to apply memory QoS to all pods in the cluster.

Create configmap.yaml with the following content:

apiVersion: v1
kind: ConfigMap
metadata:
  name: ack-slo-config
  namespace: kube-system
data:
  resource-qos-config: |-
    {
      "clusterStrategy": {
        "lsClass": {
          "memoryQOS": {
            "enable": true
          }
        },
        "beClass": {
          "memoryQOS": {
            "enable": true
          }
        }
      }
    }

Set each pod's QoS class with the koordinator.sh/qosClass label:

apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  labels:
    koordinator.sh/qosClass: 'LS'

Apply the ConfigMap:
- If ack-slo-config already exists in kube-system, patch it so that other settings are preserved:
```
kubectl patch cm -n kube-system ack-slo-config --patch "$(cat configmap.yaml)"
```
- If it does not exist, create it:
```
kubectl apply -f configmap.yaml
```

(Optional) Configure advanced parameters.

Enable memory QoS for a namespace

Use the ack-slo-pod-config ConfigMap to enable or disable memory QoS for pods in specific namespaces.

Create ack-slo-pod-config.yaml with the following content:

apiVersion: v1
kind: ConfigMap
metadata:
  name: ack-slo-pod-config
  namespace: kube-system
data:
  memory-qos: |
    {
      "enabledNamespaces": ["allow-ns"],
      "disabledNamespaces": ["block-ns"]
    }

Replace allow-ns and block-ns with the actual namespace names.

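Apply the ConfigMap. The following command updates an existing ack-slo-pod-config ConfigMap; if the ConfigMap does not exist yet, create it with kubectl apply -f ack-slo-pod-config.yaml instead: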

kubectl patch cm -n kube-system ack-slo-pod-config --patch "$(cat ack-slo-pod-config.yaml)"

(Optional) Configure advanced parameters.

Example: Redis under memory overcommitment

This example shows how memory QoS reduces Redis latency and increases throughput under memory overcommitment. The test uses:

An ACK Pro cluster with two nodes (8 vCPUs, 32 GB each)
One node for Redis, the other for the stress test

Run the test

Create redis-demo.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-demo-config
data:
  redis-config: |
    appendonly yes
    appendfsync no
---
apiVersion: v1
kind: Pod
metadata:
  name: redis-demo
  labels:
    koordinator.sh/qosClass: 'LS'
    name: redis-demo  # Must match the selector of the redis-demo Service below.
  annotations:
    koordinator.sh/memoryQOS: '{"policy": "auto"}'
spec:
  containers:
  - name: redis
    image: redis:5.0.4
    command:
      - redis-server
      - "/redis-master/redis.conf"
    env:
    - name: MASTER
      value: "true"
    ports:
    - containerPort: 6379
    resources:
      limits:
        cpu: "2"
        memory: "6Gi"
      requests:
        cpu: "2"
        memory: "2Gi"
    volumeMounts:
    - mountPath: /redis-master-data
      name: data
    - mountPath: /redis-master
      name: config
  volumes:
    - name: data
      emptyDir: {}
    - name: config
      configMap:
        name: redis-demo-config
        items:
        - key: redis-config
          path: redis.conf
  nodeName: # Set to the name of the node running Redis.
---
apiVersion: v1
kind: Service
metadata:
  name: redis-demo
spec:
  ports:
  - name: redis-port
    port: 6379
    protocol: TCP
    targetPort: 6379
  selector:
    name: redis-demo
  type: ClusterIP

Deploy Redis:
```
kubectl apply -f redis-demo.yaml
```

Simulate memory overcommitment with the Stress tool. Create stress-demo.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: stress-demo
  labels:
    koordinator.sh/qosClass: 'BE'
  annotations:
    koordinator.sh/memoryQOS: '{"policy": "auto"}'
spec:
  containers:
    - args:
        - '--vm'
        - '2'
        - '--vm-bytes'
        - 11G
        - '-c'
        - '2'
        - '--vm-hang'
        - '2'
      command:
        - stress
      image: polinux/stress
      imagePullPolicy: Always
      name: stress
  restartPolicy: Always
  nodeName: # Set to the same node as redis-demo.

Deploy the stress workload:
```
kubectl apply -f stress-demo.yaml
```
Verify the global minimum watermark on the node that runs Redis before running the benchmark.

Important

In memory overcommitment scenarios, a global minimum watermark that is too low can cause the OOM killer to run before memory is reclaimed. For a 32 GB node, set this value to at least 4,000,000 KB.

```
cat /proc/sys/vm/min_free_kbytes
```
Expected output:
```
4000000
```
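
When several nodes need the same check, it can be scripted. The sketch below is a minimal Python example meant to run on the node itself; the function name `watermark_is_sufficient` is hypothetical, and the 4,000,000 KB default is simply the value recommended above for a 32 GB node.

```python
from pathlib import Path

MIN_FREE_KBYTES_PATH = Path("/proc/sys/vm/min_free_kbytes")


def watermark_is_sufficient(raw_value: str, required_kb: int = 4_000_000) -> bool:
    """Compare the content of min_free_kbytes with the required minimum (in KB)."""
    return int(raw_value.strip()) >= required_kb


# Only read the file when it exists, so the script is harmless on non-Linux hosts.
if __name__ == "__main__" and MIN_FREE_KBYTES_PATH.exists():
    value = MIN_FREE_KBYTES_PATH.read_text()
    status = "OK" if watermark_is_sufficient(value) else "too low"
    print(f"min_free_kbytes={value.strip()} ({status})")
```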

Deploy memtier-benchmark to send requests to the Redis pod:

apiVersion: v1
kind: Pod
metadata:
  labels:
    name: memtier-demo
  name: memtier-demo
spec:
  containers:
    - command:
        - memtier_benchmark
        - '-s'
        - 'redis-demo'
        - '--data-size'
        - '200000'
        - "--ratio"
        - "1:4"
      image: 'redislabs/memtier_benchmark:1.3.0'
      name: memtier
  restartPolicy: Never
  nodeName: # Set to the name of the node sending requests.

Check the benchmark results:
```
kubectl logs -f memtier-demo
```

To compare, disable memory QoS on both pods and repeat the test:

apiVersion: v1
kind: Pod
metadata:
  name: redis-demo
  labels:
    koordinator.sh/qosClass: 'LS'
  annotations:
    koordinator.sh/memoryQOS: '{"policy": "none"}'
spec:
  ...
---
apiVersion: v1
kind: Pod
metadata:
  name: stress-demo
  labels:
    koordinator.sh/qosClass: 'BE'
  annotations:
    koordinator.sh/memoryQOS: '{"policy": "none"}'

Test results

Important

Results are for reference only. Actual values depend on cluster configuration and workload.

Metric	Memory QoS disabled	Memory QoS enabled
`Latency-avg`	51.32 ms	47.25 ms
`Throughput-avg`	149.0 MB/s	161.9 MB/s

Enabling memory QoS reduced Redis latency by 7.9% and increased throughput by 8.7% under memory overcommitment.
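
The percentages follow directly from the table. As a quick check, this short Python snippet recomputes them from the reported averages (the helper `percent_change` is only for illustration):

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


latency_change = percent_change(51.32, 47.25)      # QoS disabled -> enabled
throughput_change = percent_change(149.0, 161.9)   # QoS disabled -> enabled

print(f"latency: {latency_change:.1f}%")         # latency: -7.9%
print(f"throughput: {throughput_change:+.1f}%")  # throughput: +8.7%
```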

Advanced parameters

Configure these parameters in pod annotations or the ack-slo-config ConfigMap. Pod annotations take precedence.

Note

The Pod annotation and ConfigMap columns indicate whether a parameter can be set through a pod annotation and through the ConfigMap. In the examples in this topic, `enable` is set in the ack-slo-config ConfigMap and `policy` is set in the pod annotation.

Parameter	Type	Value range	Description	Pod annotation	ConfigMap
`enable`	Boolean	`true`, `false`	`true`: enables memory QoS for all containers with the recommended memcg settings. `false`: disables memory QoS for all containers and restores the default memcg settings.
`policy`	String	`auto`, `default`, `none`	`auto`: enables memory QoS with the recommended settings, overriding the ack-slo-pod-config ConfigMap. `default`: inherits the settings from the ack-slo-pod-config ConfigMap. `none`: disables memory QoS and restores the default memcg settings, overriding the ack-slo-pod-config ConfigMap.
`minLimitPercent`	Int	0~100	Unit: %. Default value: `0` (disabled). Proportion of the pod memory request that cannot be reclaimed. Use it to keep files cached for page-cache-sensitive applications. See Memcg QoS feature of the cgroup v1 interface. Formula: `memory.min = request × minLimitPercent/100`. For example, with a memory request of 100 MiB and `minLimitPercent=100`, `memory.min` is `104857600` (100 MiB).
`lowLimitPercent`	Int	0~100	Unit: %. Default value: `0` (disabled). Proportion of the pod memory request that is relatively unreclaimable. See Memcg QoS feature of the cgroup v1 interface. Formula: `memory.low = request × lowLimitPercent/100`. For example, with a memory request of 100 MiB and `lowLimitPercent=100`, `memory.low` is `104857600` (100 MiB).
`throttlingPercent`	Int	0~100	Unit: %. Default value: `0` (disabled). Memory throttling threshold, as a percentage of the container memory limit. When usage exceeds the threshold, the container's memory is reclaimed. This helps prevent cgroup-level OOM in overcommitment scenarios. See Memcg QoS feature of the cgroup v1 interface. Formula: `memory.high = limit × throttlingPercent/100`. For example, with a memory limit of 100 MiB and `throttlingPercent=80`, `memory.high` is `83886080` (80 MiB).
`wmarkRatio`	Int	0~100	Unit: %. Default value: `95`. `0` disables this parameter. Threshold for memcg backend asynchronous reclamation, as a percentage of the memory limit or, when throttling is enabled, of `memory.high`. When usage exceeds the threshold, asynchronous reclamation is triggered. See Memcg backend asynchronous reclamation. If `throttlingPercent` is disabled: `memory.wmark_high = limit × wmarkRatio/100`. If `throttlingPercent` is enabled: `memory.wmark_high = memory.high × wmarkRatio/100`. For example, with a memory limit of 100 MiB, `wmarkRatio=95`, and `throttlingPercent=80`: `memory.high` is `83886080` (80 MiB), `memory.wmark_ratio` is `95`, and `memory.wmark_high` is `79691776` (76 MiB).
`wmarkMinAdj`	Int	-25~50	Unit: %. Default value: `-25` for the `LS` QoS class and `50` for the `BE` QoS class. `0` disables this parameter. Adjusts the global minimum watermark per container. Negative values postpone reclamation; positive values expedite it. See Memcg global minimum watermark rating. For example, an LS pod defaults to `memory.wmark_min_adj=-25`, which lowers its minimum watermark by 25%.
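
The formulas in the table can be combined into one small calculator to preview the memcg values that a given request, limit, and parameter set would produce. This is a Python sketch under stated assumptions: the function name `memcg_values` is hypothetical, and integer floor division is assumed for rounding, which the table does not specify.

```python
MIB = 1024 * 1024


def memcg_values(request, limit, min_limit_percent=0, low_limit_percent=0,
                 throttling_percent=0, wmark_ratio=95):
    """Apply the table's formulas; sizes are in bytes, percentages are 0-100."""
    values = {
        "memory.min": request * min_limit_percent // 100,
        "memory.low": request * low_limit_percent // 100,
    }
    # memory.high is only derived when throttlingPercent is enabled (non-zero).
    high = limit * throttling_percent // 100 if throttling_percent else None
    if high is not None:
        values["memory.high"] = high
    # The async reclamation watermark is relative to memory.high when throttling
    # is enabled, otherwise relative to the memory limit. 0 disables it.
    if wmark_ratio:
        base = high if high is not None else limit
        values["memory.wmark_high"] = base * wmark_ratio // 100
    return values


example = memcg_values(request=100 * MIB, limit=100 * MIB,
                       min_limit_percent=100, throttling_percent=80)
print(example["memory.high"])        # 83886080
print(example["memory.wmark_high"])  # 79691776
```

With the default `wmarkRatio` of 95 and throttling at 80%, the result matches the worked example in the `wmarkRatio` row: `memory.high` is 80 MiB and `memory.wmark_high` is 76 MiB.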

FAQ

Is the memory QoS configuration from ack-slo-manager still valid after upgrading to ack-koordinator?

Yes. ack-koordinator is backward compatible with the annotation-based protocol used in ack-slo-manager 0.8.0 and earlier:

alibabacloud.com/qosClass — sets the QoS class
alibabacloud.com/memoryQOS — configures memory QoS

The following table shows which protocols each version supports:

Component version	alibabacloud.com protocol	koordinator.sh protocol
≥ 0.3.0 and < 0.8.0	✓	×
≥ 0.8.0	✓	✓

Support for the alibabacloud.com protocol ended on July 30, 2023. Migrate to the koordinator.sh protocol.
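
The version table can be expressed as a lookup, which may help when auditing clusters that run mixed component versions. This Python sketch reflects only the table, not the end-of-support date. The function names are hypothetical, and versions below 0.3.0 return an empty list because the table does not cover them.

```python
def parse_version(version: str) -> tuple:
    """Turn '0.8.0' into (0, 8, 0) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def supported_protocols(component_version: str) -> list:
    """Label and annotation prefixes supported by a component version."""
    version = parse_version(component_version)
    if version >= (0, 8, 0):
        return ["alibabacloud.com", "koordinator.sh"]
    if version >= (0, 3, 0):
        return ["alibabacloud.com"]
    return []  # Not covered by the table.
```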

Billing

No fee is charged for installing or using the ack-koordinator component. However, costs may apply in the following cases:

Node resource usage: ack-koordinator is a non-managed component that runs on worker nodes. You can configure the resource requests for each module at install time.
Prometheus metrics: If you enable Prometheus metrics for ack-koordinator and use Managed Service for Prometheus, the metrics are billed as custom metrics. Before enabling this feature, review the Managed Service for Prometheus billing rules. To monitor usage, see Query the amount of observable data and bills.

Next steps

Overview of kernel features and interfaces — kernel features required by ACK memory QoS
Memcg QoS feature of the cgroup v1 interface
Memcg backend asynchronous reclamation
Memcg global minimum watermark rating
Enable CPU QoS for containers — limit and evict reclaimed resources to protect latency-sensitive workloads
Enable resource isolation based on the L3 cache and MBA
KEP-2570: Support Memory QoS with cgroups v2