Use ack-koordinator to prioritize memory for latency-sensitive containers and reduce OOM errors during contention.
How it works
Kubernetes assigns each pod a memory request and limit. Two failure modes occur under memory pressure:
- Container-level pressure: When a container's memory usage (including page cache) nears its limit, the OS triggers memcg-level direct reclamation, which blocks the allocating processes. If allocation outpaces reclamation, an OOM error terminates the pod.
- Node-level pressure: When total container memory limits exceed node physical memory, the kernel reclaims memory across containers indiscriminately, degrading performance and potentially triggering OOM errors.
ack-koordinator addresses both by configuring the memory control group (memcg) per container, using three Alibaba Cloud Linux (Alinux) kernel features:
- Memcg QoS: locks a minimum amount of memory so high-priority containers retain their working set
- Memcg backend asynchronous reclamation: proactively reclaims memory before the limit is reached, avoiding blocking direct reclamation
- Memcg global minimum watermark rating: adjusts the per-container reclamation threshold so latency-sensitive (LS) containers are reclaimed last
This produces fairer memory distribution and lower latency during overcommitment.
Advantages over open-source Kubernetes memory QoS
The upstream Kubernetes memory QoS feature (Kubernetes 1.22+) supports only cgroup v2, requires manual kubelet configuration, and lacks per-pod or per-namespace granularity.
ack-koordinator improves on the upstream implementation in two ways:
- Broader kernel compatibility: Supports cgroup v1 and v2, backed by Alinux kernel features such as memcg backend asynchronous reclamation and minimum watermark rating. See Overview of kernel features and interfaces.
- Fine-grained configuration: Configure memory QoS per pod, namespace, or cluster through pod annotations or ConfigMaps.
Configuration mechanism
ack-koordinator uses four cgroup parameters for memory QoS. Each maps to configuration options in Advanced parameters:
| cgroup parameter | Controls | Configured by |
|---|---|---|
| memory.limit_in_bytes | Hard upper limit for the container | Kubernetes (from limits.memory) |
| memory.high | Throttling threshold; reclamation starts here | throttlingPercent |
| memory.wmark_high | Async reclamation trigger | wmarkRatio |
| memory.min | Unreclaimable memory floor | minLimitPercent (lowLimitPercent sets the related memory.low) |
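To see which of these values are actually set for a workload, you can read the memcg files directly. The following is a minimal sketch, not an official tool: the helper name `show_memcg` is hypothetical, and the default directory assumes a cgroup v1 memory hierarchy as seen from inside a container. On other setups the path differs, and some files exist only on Alinux kernels.

```bash
# Print the memcg files from the table above for one cgroup directory.
# The default directory is an assumption; pass the real path as $1.
show_memcg() {
  local dir="${1:-/sys/fs/cgroup/memory}"
  local f
  for f in memory.limit_in_bytes memory.high memory.wmark_high memory.min; do
    if [ -r "$dir/$f" ]; then
      printf '%s=%s\n' "$f" "$(cat "$dir/$f")"
    else
      # The file is missing when the kernel or cgroup version does not provide it.
      printf '%s=unavailable\n' "$f"
    fi
  done
}
```

Call it as `show_memcg` inside the container, or as `show_memcg /path/to/container/cgroup` on the node.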
Configuration priority
When multiple configuration sources apply to the same pod, ack-koordinator uses the following priority order (highest first):
1. Pod annotation (koordinator.sh/memoryQOS)
2. Namespace-level ConfigMap (ack-slo-pod-config)
3. Cluster-level ConfigMap (ack-slo-config)
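The priority order can be illustrated with a small shell function. This is only a sketch of the lookup order described above, not ack-koordinator's code: `resolve_memory_qos` and its string arguments are hypothetical, and an empty argument stands for "not configured at this level". A pod policy of `default` is treated as unset because that policy inherits from the ConfigMap.

```bash
# Return the first configured setting, checked from highest to lowest priority.
# $1: pod annotation policy, $2: namespace-level setting, $3: cluster-level setting
resolve_memory_qos() {
  local pod="$1"
  local ns="$2"
  local cluster="$3"
  if [ -n "$pod" ] && [ "$pod" != "default" ]; then
    echo "pod:$pod"
  elif [ -n "$ns" ]; then
    echo "namespace:$ns"
  elif [ -n "$cluster" ]; then
    echo "cluster:$cluster"
  else
    echo "none"
  fi
}

resolve_memory_qos "auto" "disabled" "enabled"     # prints pod:auto
resolve_memory_qos "default" "disabled" "enabled"  # prints namespace:disabled
resolve_memory_qos "" "" "enabled"                 # prints cluster:enabled
```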
QoS class mapping
If a pod lacks the koordinator.sh/qosClass label, ack-koordinator maps Kubernetes QoS classes automatically:
| Kubernetes QoS class | koordinator QoS class |
|---|---|
| Guaranteed | Default memory QoS settings |
| Burstable | LS (latency-sensitive) |
| BestEffort | BE (best-effort) |
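The mapping in the table can be written as a lookup. The sketch below is illustrative only: `map_qos_class` is a hypothetical helper, and the string `default` stands for "default memory QoS settings". Reading `.status.qosClass` with `kubectl` is standard Kubernetes behavior; the pod name in the comment is a placeholder.

```bash
# Map a Kubernetes QoS class to the class that applies when the
# koordinator.sh/qosClass label is absent.
map_qos_class() {
  case "$1" in
    Guaranteed) echo "default" ;;
    Burstable)  echo "LS" ;;
    BestEffort) echo "BE" ;;
    *)          echo "unknown"; return 1 ;;
  esac
}

map_qos_class Burstable   # prints LS

# With cluster access, the input can come from the pod status:
# map_qos_class "$(kubectl get pod pod-demo -o jsonpath='{.status.qosClass}')"
```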
Prerequisites
Before you begin, make sure you have:
- An ACK cluster running Kubernetes 1.18+. See Manually update ACK clusters.
- Alinux as the node OS. Some advanced parameters depend on Alinux kernel features. See Advanced parameters.
- ack-koordinator 0.8.0+ installed. See ack-koordinator.
Enable memory QoS for a specific pod
Add the following annotation to the pod spec:
```yaml
annotations:
  # Enable memory QoS with recommended settings
  koordinator.sh/memoryQOS: '{"policy": "auto"}'
  # Disable memory QoS
  # koordinator.sh/memoryQOS: '{"policy": "none"}'
```
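After the pod is created, you may want to confirm that the annotation is present. The helper below is a sketch: `get_memory_qos_annotation` is a hypothetical name, and the namespace defaults to `default` as an assumption. It relies only on the standard `kubectl get -o jsonpath` syntax, in which dots inside an annotation key are escaped with a backslash.

```bash
# Print the koordinator.sh/memoryQOS annotation of a pod.
# $1: pod name, $2: namespace (assumed to be "default" when omitted)
get_memory_qos_annotation() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{.metadata.annotations.koordinator\.sh/memoryQOS}'
}
```

For example, `get_memory_qos_annotation pod-demo` should print the JSON string from the annotation if it was applied.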
Enable memory QoS for a cluster
Use the ack-slo-config ConfigMap to apply memory QoS to all pods in the cluster.
1. Create configmap.yaml with the following content:

   ```yaml
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: ack-slo-config
     namespace: kube-system
   data:
     resource-qos-config: |-
       {
         "clusterStrategy": {
           "lsClass": {
             "memoryQOS": {
               "enable": true
             }
           },
           "beClass": {
             "memoryQOS": {
               "enable": true
             }
           }
         }
       }
   ```

2. Set each pod's QoS class with the koordinator.sh/qosClass label:

   ```yaml
   apiVersion: v1
   kind: Pod
   metadata:
     name: pod-demo
     labels:
       koordinator.sh/qosClass: 'LS'
   ```

3. Apply the ConfigMap:

   - If ack-slo-config already exists in kube-system, patch it so that other settings are preserved:

     ```bash
     kubectl patch cm -n kube-system ack-slo-config --patch "$(cat configmap.yaml)"
     ```

   - If it does not exist, create it:

     ```bash
     kubectl apply -f configmap.yaml
     ```

4. (Optional) Configure advanced parameters.
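The patch-or-create decision in the apply step can be wrapped in one function. This is a convenience sketch under the assumption that `kubectl` is already configured for the target cluster; `apply_slo_config` is a hypothetical helper name, while the `kubectl` commands are the ones shown in the steps above.

```bash
# Patch ack-slo-config when it already exists, otherwise create it.
# $1: path to the ConfigMap manifest (for example, configmap.yaml)
apply_slo_config() {
  local file="$1"
  if kubectl get cm -n kube-system ack-slo-config >/dev/null 2>&1; then
    # Patching keeps the other keys of the existing ConfigMap.
    kubectl patch cm -n kube-system ack-slo-config --patch "$(cat "$file")"
  else
    kubectl apply -f "$file"
  fi
}
```

Run it as `apply_slo_config configmap.yaml`.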
Enable memory QoS for a namespace
Use the ack-slo-pod-config ConfigMap to enable or disable memory QoS for pods in specific namespaces.
1. Create ack-slo-pod-config.yaml with the following content:

   ```yaml
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: ack-slo-pod-config
     namespace: kube-system
   data:
     memory-qos: |
       {
         "enabledNamespaces": ["allow-ns"],
         "disabledNamespaces": ["block-ns"]
       }
   ```

   Replace allow-ns and block-ns with the actual namespace names.

2. Apply the ConfigMap:

   ```bash
   kubectl patch cm -n kube-system ack-slo-pod-config --patch "$(cat ack-slo-pod-config.yaml)"
   ```

3. (Optional) Configure advanced parameters.
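To reason about what the two lists mean for a given namespace, the following sketch may help. It is not taken from ack-koordinator: `namespace_memory_qos` is a hypothetical function, the lists are passed as space-separated strings, and checking `disabledNamespaces` first is an assumption, since the behavior for a namespace that appears in both lists is not documented here. A namespace in neither list falls through to the cluster-level configuration.

```bash
# Report what the namespace-level lists say about one namespace.
# $1: namespace, $2: enabledNamespaces, $3: disabledNamespaces (space-separated)
namespace_memory_qos() {
  local ns="$1"
  local n
  for n in $3; do
    if [ "$n" = "$ns" ]; then echo "disabled"; return 0; fi
  done
  for n in $2; do
    if [ "$n" = "$ns" ]; then echo "enabled"; return 0; fi
  done
  echo "inherit"   # not listed: the cluster-level ConfigMap applies
}

namespace_memory_qos allow-ns "allow-ns" "block-ns"   # prints enabled
namespace_memory_qos block-ns "allow-ns" "block-ns"   # prints disabled
namespace_memory_qos other-ns "allow-ns" "block-ns"   # prints inherit
```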
Example: Redis under memory overcommitment
This example shows how memory QoS reduces Redis latency and increases throughput under memory overcommitment. The test uses:
- An ACK Pro cluster with two nodes (8 vCPUs, 32 GB each)
- One node for Redis, the other for the stress test
Run the test
1. Create redis-demo.yaml:

   ```yaml
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: redis-demo-config
   data:
     redis-config: |
       appendonly yes
       appendfsync no
   ---
   apiVersion: v1
   kind: Pod
   metadata:
     name: redis-demo
     labels:
       koordinator.sh/qosClass: 'LS'
       name: redis-demo # Required so that the redis-demo Service selector matches this pod.
     annotations:
       koordinator.sh/memoryQOS: '{"policy": "auto"}'
   spec:
     containers:
     - name: redis
       image: redis:5.0.4
       command:
         - redis-server
         - "/redis-master/redis.conf"
       env:
       - name: MASTER
         value: "true"
       ports:
       - containerPort: 6379
       resources:
         limits:
           cpu: "2"
           memory: "6Gi"
         requests:
           cpu: "2"
           memory: "2Gi"
       volumeMounts:
       - mountPath: /redis-master-data
         name: data
       - mountPath: /redis-master
         name: config
     volumes:
       - name: data
         emptyDir: {}
       - name: config
         configMap:
           name: redis-demo-config
           items:
           - key: redis-config
             path: redis.conf
     nodeName: # Set to the name of the node running Redis.
   ---
   apiVersion: v1
   kind: Service
   metadata:
     name: redis-demo
   spec:
     ports:
     - name: redis-port
       port: 6379
       protocol: TCP
       targetPort: 6379
     selector:
       name: redis-demo
     type: ClusterIP
   ```

2. Deploy Redis:

   ```bash
   kubectl apply -f redis-demo.yaml
   ```

3. Simulate memory overcommitment with the Stress tool. Create stress-demo.yaml:

   ```yaml
   apiVersion: v1
   kind: Pod
   metadata:
     name: stress-demo
     labels:
       koordinator.sh/qosClass: 'BE'
     annotations:
       koordinator.sh/memoryQOS: '{"policy": "auto"}'
   spec:
     containers:
       - args:
           - '--vm'
           - '2'
           - '--vm-bytes'
           - 11G
           - '-c'
           - '2'
           - '--vm-hang'
           - '2'
         command:
           - stress
         image: polinux/stress
         imagePullPolicy: Always
         name: stress
     restartPolicy: Always
     nodeName: # Set to the same node as redis-demo.
   ```

4. Deploy the stress workload:

   ```bash
   kubectl apply -f stress-demo.yaml
   ```

5. Verify the global minimum watermark before running the benchmark.

   Important: In memory overcommitment scenarios, a low global minimum watermark can cause the OOM killer to run before memory is reclaimed. For a 32 GB node, set this value to at least 4,000,000 KB.

   ```bash
   cat /proc/sys/vm/min_free_kbytes
   ```

   Expected output:

   ```
   4000000
   ```

6. Deploy memtier-benchmark to send requests to the Redis pod, using the following manifest:

   ```yaml
   apiVersion: v1
   kind: Pod
   metadata:
     labels:
       name: memtier-demo
     name: memtier-demo
   spec:
     containers:
       - command:
           - memtier_benchmark
           - '-s'
           - 'redis-demo'
           - '--data-size'
           - '200000'
           - "--ratio"
           - "1:4"
         image: 'redislabs/memtier_benchmark:1.3.0'
         name: memtier
     restartPolicy: Never
     nodeName: # Set to the name of the node sending requests.
   ```

7. Check the benchmark results:

   ```bash
   kubectl logs -f memtier-demo
   ```

8. To compare, disable memory QoS on both pods and repeat the test:

   ```yaml
   apiVersion: v1
   kind: Pod
   metadata:
     name: redis-demo
     labels:
       koordinator.sh/qosClass: 'LS'
     annotations:
       koordinator.sh/memoryQOS: '{"policy": "none"}'
   spec:
     ...
   ---
   apiVersion: v1
   kind: Pod
   metadata:
     name: stress-demo
     labels:
       koordinator.sh/qosClass: 'BE'
     annotations:
       koordinator.sh/memoryQOS: '{"policy": "none"}'
   ```
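The watermark check from the procedure can be scripted so that the benchmark does not start on a misconfigured node. The sketch below assumes the 4,000,000 KB floor quoted above for a 32 GB node; `check_min_free_kbytes` is a hypothetical helper, and the `sysctl` command in the comment is the standard Linux way to change `vm.min_free_kbytes` (it requires root on the node).

```bash
# Compare a min_free_kbytes value with the required floor.
# $1: current value in KB, $2: required value in KB (assumed 4000000 when omitted)
check_min_free_kbytes() {
  local current="$1"
  local required="${2:-4000000}"
  if [ "$current" -ge "$required" ]; then
    echo "ok"
  else
    echo "too low: $current < $required"
    return 1
  fi
}

# On the node:
#   check_min_free_kbytes "$(cat /proc/sys/vm/min_free_kbytes)"
# To raise the value:
#   sysctl -w vm.min_free_kbytes=4000000
```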
Test results
Results are for reference only. Actual values depend on cluster configuration and workload.
| Metric | Memory QoS disabled | Memory QoS enabled |
|---|---|---|
| Latency-avg | 51.32 ms | 47.25 ms |
| Throughput-avg | 149.0 MB/s | 161.9 MB/s |
Enabling memory QoS reduced Redis latency by 7.9% and increased throughput by 8.7% under memory overcommitment.
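The percentages follow directly from the table. As a quick check, this sketch recomputes them with `awk`; the function name `percent_change` is only illustrative.

```bash
# Relative change from a baseline value to a new value, in percent.
# $1: value with memory QoS disabled, $2: value with memory QoS enabled
percent_change() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (after - before) / before * 100 }'
}

percent_change 51.32 47.25   # latency: prints -7.9
percent_change 149.0 161.9   # throughput: prints 8.7
```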
Advanced parameters
Configure these parameters in pod annotations or the ack-slo-config ConfigMap. Pod annotations take precedence.
<table>
<thead>
<tr> <td><p><b>Parameter</b></p></td> <td><p><b>Type</b></p></td> <td><p><b>Value range</b></p></td> <td><p><b>Description</b></p></td> <td><p><b>Pod annotation</b></p></td> <td><p><b>ConfigMap</b></p></td> </tr>
</thead>
<tbody>
<tr> <td><p><code>enable</code></p></td> <td><p>Boolean</p></td> <td> <ul> <li><p><code>true</code></p></li> <li><p><code>false</code></p></li> </ul></td> <td> <ul> <li><p><code>true</code>: enables memory QoS for all containers with recommended memcg settings.</p></li> <li><p><code>false</code>: disables memory QoS for all containers and restores default memcg settings.</p></li> </ul></td> <td><p><img></p></td> <td><p><img></p></td> </tr>
<tr> <td><p><code>policy</code></p></td> <td><p>String</p></td> <td> <ul> <li><p><code>auto</code></p></li> <li><p><code>default</code></p></li> <li><p><code>none</code></p></li> </ul></td> <td> <ul> <li><p><code>auto</code>: enables memory QoS with recommended settings, overriding the ack-slo-pod-config ConfigMap.</p></li> <li><p><code>default</code>: inherits settings from the ack-slo-pod-config ConfigMap.</p></li> <li><p><code>none</code>: disables memory QoS and restores default memcg settings, overriding the ack-slo-pod-config ConfigMap.</p></li> </ul></td> <td><p><img></p></td> <td><p><img></p></td> </tr>
<tr> <td><p><code>minLimitPercent</code></p></td> <td><p>Int</p></td> <td><p>0~100</p></td> <td><p>Unit: %. Default value: <code>0</code> (disabled).</p><p>Unreclaimable proportion of the pod memory request. Use this to cache files for page-cache-sensitive applications. See <a href="https://www.alibabacloud.com/help/en/document_detail/169536.html#concept-2482889">Memcg QoS feature of the cgroup v1 interface</a>.</p><p>Formula: <code>Value of memory.min = Memory request × Value of minLimitPercent/100</code>. For example, with <code>Memory Request=100MiB</code> and <code>minLimitPercent=100</code>, <code>the value of memory.min is 104857600</code>.</p></td> <td><p><img></p></td> <td><p><img></p></td> </tr>
<tr> <td><p><code>lowLimitPercent</code></p></td> <td><p>Int</p></td> <td><p>0~100</p></td> <td><p>Unit: %. Default value: <code>0</code> (disabled).</p><p>Relatively unreclaimable proportion of the pod memory request. See <a href="https://www.alibabacloud.com/help/en/document_detail/169536.html#concept-2482889">Memcg QoS feature of the cgroup v1 interface</a>.</p><p>Formula: <span><code>Value of memory.low = Memory request × Value of lowLimitPercent/100</code></span>. For example, with <span><code>Memory Request=100MiB</code></span> and <span><code>lowLimitPercent=100</code></span>, <span><code>the value of memory.low is 104857600</code></span>.</p></td> <td><p><img></p></td> <td><p><img></p></td> </tr>
<tr> <td><p><code>throttlingPercent</code></p></td> <td><p>Int</p></td> <td><p>0~100</p></td> <td><p>Unit: %. Default value: <code>0</code> (disabled).</p><p>Memory throttling threshold as a ratio of container usage to limit. When exceeded, memory is reclaimed. Prevents cgroup-level OOM in overcommitment scenarios. See <a href="https://www.alibabacloud.com/help/en/document_detail/169536.html#concept-2482889">Memcg QoS feature of the cgroup v1 interface</a>.</p><p>Formula: <code>Value of memory.high = Memory limit × Value of throttlingPercent/100</code>. For example, with <code>Memory Limit=100MiB</code> and <code>throttlingPercent=80</code>, <code>the value of memory.high is 83886080 (80 MiB)</code>.</p></td> <td><p><img></p></td> <td><p><img></p></td> </tr>
<tr> <td><p><code>wmarkRatio</code></p></td> <td><p>Int</p></td> <td><p>0~100</p></td> <td><p>Unit: %. Default value: <code>95</code>. <code>0</code> disables this parameter. When usage exceeds the threshold, memcg backend asynchronous reclamation triggers.</p><p>Asynchronous reclamation threshold: ratio of usage to limit, or usage to <code>memory.high</code>. See <a href="https://www.alibabacloud.com/help/en/document_detail/169535.html#task-2487938">Memcg backend asynchronous reclaim</a>.</p><p>If throttlingPercent is disabled, the formula is: <code>Value of memory.wmark_high = Memory limit × wmarkRatio/100</code>. If throttlingPercent is enabled, the formula is: <code>Value of memory.wmark_high = Value of memory.high × wmarkRatio/100</code>. For example, with <code>Memory Limit=100MiB</code> and <code>wmarkRatio=95, throttlingPercent=80</code>: <code>memory.high is 83886080 (80 MiB)</code>, <code>memory.wmark_ratio is 95</code>, and <code>memory.wmark_high is 79691776 (76 MiB)</code>.</p></td> <td><p><img></p></td> <td><p><img></p></td> </tr>
<tr> <td><p><code>wmarkMinAdj</code></p></td> <td><p>Int</p></td> <td><p>-25~50</p></td> <td><p>Unit: %. The default value is <code>-25</code> for the <code>LS</code> QoS class and <code>50</code> for the <code>BE</code> QoS class. <code>0</code> disables this parameter.</p><p>Adjusts the global minimum watermark per container. Negative values postpone reclamation; positive values expedite it. See <a href="https://www.alibabacloud.com/help/en/document_detail/169537.html#task-2492619">Memcg global minimum watermark rating</a>.</p><p>For example, an LS pod defaults to <code>memory.wmark_min_adj=-25</code>, decreasing the minimum watermark by 25%.</p></td> <td><p><img></p></td> <td><p><img></p></td> </tr>
</tbody>
</table>
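The formulas in the table can be checked with shell integer arithmetic. This is a worked sketch of the documented formulas, not the component's implementation: the function names are hypothetical, sizes are in bytes, and integer division is an assumption about rounding (it does not matter for the example values, which divide evenly).

```bash
# value × percent / 100, used by memory.min, memory.low, and memory.high.
percent_of() { echo $(( $1 * $2 / 100 )); }

# memory.wmark_high: based on memory.high when throttlingPercent is enabled,
# otherwise based on the memory limit.
# $1: memory limit in bytes, $2: throttlingPercent, $3: wmarkRatio
memory_wmark_high() {
  local base="$1"
  if [ "$2" -gt 0 ]; then
    base="$(percent_of "$1" "$2")"
  fi
  percent_of "$base" "$3"
}

limit=$(( 100 * 1024 * 1024 ))     # 100 MiB
percent_of "$limit" 100            # memory.min with minLimitPercent=100: 104857600
percent_of "$limit" 80             # memory.high with throttlingPercent=80: 83886080
memory_wmark_high "$limit" 80 95   # throttling enabled: 79691776
memory_wmark_high "$limit" 0 95    # throttling disabled: 99614720
```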
FAQ
Is the memory QoS configuration from ack-slo-manager still valid after upgrading to ack-koordinator?
Yes. ack-koordinator is backward compatible with the annotation-based protocol used in ack-slo-manager 0.8.0 and earlier:
- alibabacloud.com/qosClass: sets the QoS class
- alibabacloud.com/memoryQOS: configures memory QoS
The following table shows which protocols each version supports:
| Component version | alibabacloud.com protocol | koordinator.sh protocol |
|---|---|---|
| ≥ 0.3.0 and < 0.8.0 | ✓ | × |
| ≥ 0.8.0 | ✓ | ✓ |
Support for the alibabacloud.com protocol ended on July 30, 2023. Migrate to the koordinator.sh protocol.
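One possible way to start the migration is to rewrite the keys in existing manifests. The sketch below only renames the two keys listed above; `migrate_protocol` is a hypothetical helper, and it assumes that the values are accepted unchanged under the koordinator.sh keys, which you should verify for your component version before applying the result.

```bash
# Rename the alibabacloud.com keys to their koordinator.sh equivalents.
# Reads the named files, or standard input when no file is given, and
# writes the converted manifest to standard output.
migrate_protocol() {
  sed -e 's#alibabacloud\.com/qosClass#koordinator.sh/qosClass#g' \
      -e 's#alibabacloud\.com/memoryQOS#koordinator.sh/memoryQOS#g' "$@"
}
```

For example, `migrate_protocol old-pod.yaml > new-pod.yaml` writes a converted copy and leaves the original file untouched (both file names are placeholders).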
Billing
No fee is charged for installing or using the ack-koordinator component. However, costs may apply in the following cases:
- Node resource usage: ack-koordinator is a non-managed component that runs on worker nodes. You can configure the resource requests for each module at install time.
- Prometheus metrics: If you enable Prometheus metrics for ack-koordinator and use Managed Service for Prometheus, the metrics are billed as custom metrics. Before enabling this feature, review the Managed Service for Prometheus billing rules. To monitor usage, see Query the amount of observable data and bills.
Next steps
- Overview of kernel features and interfaces: kernel features required by ACK memory QoS
- Enable CPU QoS for containers: limit and evict reclaimed resources to protect latency-sensitive workloads