ACK lets you harden pod security with controls that reduce the risk of container escapes and privilege escalation, including restrictions on privileged mode, root execution, hostPath volumes, and ServiceAccount token mounting.
Why container escapes matter
Container escapes let attackers elevate privileges from a container to control the host. Two default Kubernetes behaviors create this risk.
Default root context. Container processes run as root by default. Docker restricts root with Linux capabilities, but the default set is broad:
cap_chown, cap_dac_override, cap_fowner, cap_fsetid, cap_kill, cap_setgid, cap_setuid, cap_setpcap, cap_net_bind_service, cap_net_raw, cap_sys_chroot, cap_mknod, cap_audit_write, cap_setfcap
An attacker who compromises a containerized application can use these capabilities to read Secrets, ConfigMaps, and other sensitive data on the host. Avoid privileged mode, which grants all Linux capabilities of the host root user.
Node-wide API access via the kubelet. The Kubernetes API server uses the Node authorizer to authorize API requests from kubelets. It grants each kubelet read access to Services, Endpoints, Nodes, and Pods, and to the Secrets, ConfigMaps, persistent volumes (PVs), and persistent volume claims (PVCs) used by pods bound to its node, plus write access to node status, pod status, and Events. It also grants read/write access to the CertificateSigningRequest (CSR) API for TLS bootstrapping, and the ability to create TokenReview and SubjectAccessReview objects for delegated authentication and authorization. An attacker who escapes to the host can reuse the kubelet's credentials to exercise all of these permissions.
By default, ACK clusters enable the NodeRestriction admission controller, which limits each kubelet to modifying only its own node and bound pods. However, NodeRestriction alone cannot prevent an attacker from querying the Kubernetes API to discover cluster information.
Enforcement mechanism
ACK supports pod security policies built on Open Policy Agent (OPA) and Gatekeeper that validate pod create and update requests against your rules, rejecting non-compliant requests. Each recommendation below has a predefined ACK policy for namespace-level enforcement.
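As a rough illustration of namespace-level enforcement, the sketch below follows the generic Gatekeeper constraint layout. It assumes that the ACK policy name (here ACKPSPPrivilegedContainer, described in recommendation 1) doubles as the constraint kind and that the constraint API version is constraints.gatekeeper.sh/v1beta1. The constraint name and the demo-ns namespace are hypothetical. Check the policy definitions installed in your cluster before relying on it.

```yaml
# Sketch only: generic Gatekeeper constraint shape; kind and apiVersion are assumptions.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: ACKPSPPrivilegedContainer     # assumed to match the ACK policy name
metadata:
  name: forbid-privileged-demo      # hypothetical constraint name
spec:
  enforcementAction: deny           # reject non-compliant requests
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]              # validate pod create and update requests
    namespaces:
      - demo-ns                     # hypothetical namespace to enforce in
```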
Pod security recommendations
Apply these nine controls together for defense in depth against common attack vectors.
1. Forbid privileged containers
Privileged containers inherit all Linux capabilities of the host root user. Most workloads do not need them. Forbid privileged mode to prevent attackers from directly accessing host resources.
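Restricted fields:
| Field | Allowed values |
| --- | --- |
| spec.containers[*].securityContext.privileged | Undefined, false |
| spec.initContainers[*].securityContext.privileged | Undefined, false |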
Deploy the ACKPSPPrivilegedContainer policy to enforce this restriction across specified namespaces.
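A compliant pod might look like the following minimal sketch. The pod name, container names, and image references are placeholders, not values from ACK.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-unprivileged                            # hypothetical name
spec:
  initContainers:
    - name: init
      image: registry.example.com/tools/init:1.0    # placeholder image
      securityContext:
        privileged: false                           # init containers are checked too
  containers:
    - name: app
      image: registry.example.com/apps/web:1.0      # placeholder image
      securityContext:
        privileged: false                           # leaving the field undefined has the same effect
```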
2. Run pods as a non-root user
Containers run as root by default. An attacker with shell access to a root container has a much easier path to the host. Run containers as a non-root user to limit the blast radius of a compromise.
Use any of these approaches:
- Remove the shell from the container image.
- Add a USER instruction to the Dockerfile.
- Set spec.securityContext.runAsUser and runAsGroup in the podSpec.
Deploy the ACKPSPAllowedUsers policy to restrict which users and groups can run containers in specified namespaces.
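The podSpec approach can be sketched as follows. The UID and GID values, the pod name, and the image are hypothetical. The runAsNonRoot field is an addition that the text above does not mention: it is a standard Kubernetes field that makes the kubelet refuse to start a container that would run as UID 0.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-non-root                              # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true    # addition: block containers that would run as root
    runAsUser: 1000       # hypothetical UID
    runAsGroup: 3000      # hypothetical GID
  containers:
    - name: app
      image: registry.example.com/apps/web:1.0    # placeholder image
```

Pod-level settings apply to every container in the pod unless a container overrides them in its own securityContext.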
3. Forbid Docker-in-Docker and Docker.sock mounting
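Building or running images inside a container by using Docker-in-Docker or by mounting Docker.sock (the Docker daemon socket, typically /var/run/docker.sock) gives the container process control over the node, because any process that can talk to the daemon can start privileged containers or mount host directories.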
Use alternative image-build approaches instead:
- Container Registry Enterprise Edition: build images with a Container Registry Enterprise Edition instance.
- kaniko: builds images inside Kubernetes without Docker daemon access.
- img: performs rootless image builds without Docker.sock.
4. Restrict hostPath volumes
A hostPath volume mounts a host directory into a pod. A root container with write access can modify kubelet settings, create symbolic links to files outside the mounted path (such as /etc/shadow), install SSH keys, read host Secrets, or perform other malicious operations. Set hostPath mounts to read-only to limit the damage.
volumeMounts:
  - name: hostpath-volume   # volume names must be lowercase DNS labels
    readOnly: true
    mountPath: /host-path
Deploy the ACKPSPHostFilesystem policy to restrict which host directories can be mounted in specified namespaces.
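For context, a complete pod that pairs the read-only mount with its volume definition might look like this sketch. The host directory /var/log, the pod name, and the image are hypothetical. The volume name is written in lowercase because Kubernetes requires volume names to be valid DNS labels.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-hostpath-readonly                     # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/apps/web:1.0    # placeholder image
      volumeMounts:
        - name: hostpath-volume
          mountPath: /host-path
          readOnly: true                          # container cannot write to the host directory
  volumes:
    - name: hostpath-volume
      hostPath:
        path: /var/log                            # hypothetical host directory
        type: Directory                           # the directory must already exist on the node
```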
5. Set resource requests and limits
A pod without resource requests or limits can exhaust node CPU and memory, crash the kubelet, or evict other pods. Set requests and limits to reduce resource contention.
Specify CPU and memory requests and limits in the podSpec. Apply a resource quota to the namespace to require all containers to declare requests and limits. Use a LimitRange to set per-container defaults and bounds.
Deploy the ACKContainerLimits policy to enforce resource limits in specified namespaces.
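The three mechanisms described above can be sketched together as follows. All names, the demo-ns namespace, and the CPU and memory figures are illustrative assumptions; size them for your own workloads.

```yaml
# Per-container requests and limits in the podSpec.
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-limits                           # hypothetical name
  namespace: demo-ns                              # hypothetical namespace
spec:
  containers:
    - name: app
      image: registry.example.com/apps/web:1.0    # placeholder image
      resources:
        requests:
          cpu: 250m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 256Mi
---
# Defaults and upper bounds for containers that omit their own values.
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: demo-ns
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: 100m
        memory: 64Mi
      default:
        cpu: 500m
        memory: 256Mi
      max:
        cpu: "1"
        memory: 512Mi
---
# Namespace-wide cap; pods must declare (or inherit) requests and limits to be admitted.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: demo-ns
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```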
6. Forbid privilege escalation
Privilege escalation lets a process gain elevated permissions at runtime — for example, by executing a SUID or SGID binary such as sudo. Disable this to prevent non-root processes from regaining root-level access.
Restricted field:
| Field | Allowed values |
| --- | --- |
| securityContext.allowPrivilegeEscalation | false |
securityContext:
  allowPrivilegeEscalation: false
Deploy the ACKPSPAllowPrivilegeEscalationContainer policy to enforce this setting in specified namespaces.
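Because allowPrivilegeEscalation is a container-level field, it has to be set on each container rather than in the pod-level securityContext. A minimal sketch, with a hypothetical pod name and placeholder image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-no-escalation                         # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/apps/web:1.0    # placeholder image
      securityContext:
        allowPrivilegeEscalation: false           # sets no_new_privs, so SUID/SGID binaries cannot raise privileges
```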
7. Disable automatic ServiceAccount token mounting
For pods that do not need Kubernetes API access, disable automatic ServiceAccount token mounting to prevent token exposure if the pod is compromised.
Disable token mounting for a specific pod:
apiVersion: v1
kind: Pod
metadata:
  name: pod-no-automount
spec:
  automountServiceAccountToken: false
Disable token mounting for all pods that use a specific ServiceAccount:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-no-automount
automountServiceAccountToken: false
Disabling token mounting does not prevent the pod from reaching the Kubernetes API — a pod can still make network connections to the API server. To block API access entirely, restrict the ACK cluster API server endpoint exposure and configure network policies.
Deploy the ACKBlockAutomountToken policy to enforce automountServiceAccountToken: false across application pods in specified namespaces.
8. Disable service discovery
For pods that do not need other cluster services, disable service links and change the DNS policy to limit what an attacker can enumerate if the pod is compromised.
apiVersion: v1
kind: Pod
metadata:
  name: pod-no-service-info
spec:
  dnsPolicy: Default # Despite its name, Default is not the default DNS policy; it inherits the node's DNS configuration.
  enableServiceLinks: false
By default, a pod's DNS policy is ClusterFirst, which routes queries through the in-cluster CoreDNS service. Setting dnsPolicy: Default routes DNS through the node's resolver instead. Setting enableServiceLinks: false prevents Services in the namespace from being injected as environment variables.
These settings do not block direct CoreDNS access. An attacker can still enumerate cluster services by running dig SRV *.*.svc.cluster.local @$CLUSTER_DNS_IP. Use network policies to fully restrict service discovery.
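One way to close this gap is an egress-blocking network policy, sketched below. It assumes that the cluster's network plugin enforces NetworkPolicy objects; the policy name, the demo-ns namespace, and the network-isolation label are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-egress-isolated          # hypothetical name
  namespace: demo-ns                  # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      network-isolation: strict       # hypothetical label on the pods to isolate
  policyTypes:
    - Egress
  # No egress rules are listed, so all outbound traffic from the selected pods
  # is denied, including DNS queries to CoreDNS and calls to the API server.
```

Pods that need some outbound access would require explicit egress rules that allow only the required destinations.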
9. Use a read-only root file system
A read-only root file system prevents attackers from overwriting application binaries or configuration files. If the application must write to disk, use a tmpfs volume or a mounted persistent volume instead.
Restricted field:
| Field | Allowed values |
| --- | --- |
| securityContext.readOnlyRootFilesystem | true |
securityContext:
  readOnlyRootFilesystem: true
Deploy the ACKPSPReadOnlyRootFilesystem policy to enforce a read-only root file system for pods in specified namespaces.
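For an application that still needs scratch space, a tmpfs-backed volume can be combined with the read-only root file system, as in this sketch. The mount path /tmp, the size limit, the pod name, and the image are assumptions for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-readonly-rootfs                       # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/apps/web:1.0    # placeholder image
      securityContext:
        readOnlyRootFilesystem: true              # container cannot modify its own image contents
      volumeMounts:
        - name: scratch
          mountPath: /tmp                         # only this path is writable
  volumes:
    - name: scratch
      emptyDir:
        medium: Memory                            # tmpfs-backed; contents are lost when the pod stops
        sizeLimit: 64Mi                           # hypothetical cap on scratch usage
```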
Next steps
- Review all predefined ACK security policies.
- Use network policies to restrict pod-to-pod and pod-to-API-server traffic.