Pod security

ACK lets you harden pod security with controls that reduce the risk of container escapes and privilege escalation, including restrictions on privileged mode, root execution, hostPath volumes, and ServiceAccount token mounting.

Why container escapes matter

Container escapes let attackers elevate privileges from a container to control the host. Two default Kubernetes behaviors create this risk.

Default root context. Container processes run as root by default. Docker restricts root with Linux capabilities, but the default set is broad:

cap_chown, cap_dac_override, cap_fowner, cap_fsetid, cap_kill, cap_setgid, cap_setuid, cap_setpcap, cap_net_bind_service, cap_net_raw, cap_sys_chroot, cap_mknod, cap_audit_write, cap_setfcap

An attacker who compromises a containerized application can use these capabilities to read Secrets, ConfigMaps, and other sensitive data on the host. Avoid privileged mode, which grants a container all the Linux capabilities of the host root user.

Node-wide API access via the kubelet. Kubernetes worker nodes use the node authorizer to govern kubelet API requests. It grants each kubelet read access to Services, Endpoints, Nodes, Pods, Secrets, ConfigMaps, persistent volumes (PVs), and persistent volume claims (PVCs) for pods on that node, plus write access to node status, pod status, and Events. It also grants read/write access to the CertificateSigningRequest (CSR) API for TLS bootstrapping, and the ability to create TokenReview and SubjectAccessReview for delegated authentication and authorization.

By default, ACK clusters enable the NodeRestriction admission controller, which limits each kubelet to modifying only its own node and bound pods. However, NodeRestriction alone cannot prevent an attacker from querying the Kubernetes API to discover cluster information.

Enforcement mechanism

ACK supports pod security policies built on Open Policy Agent (OPA) and Gatekeeper. These policies validate pod create and update requests against your rules and reject non-compliant requests. Most of the recommendations below have a predefined ACK policy that you can enforce at the namespace level.
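As a minimal sketch, assuming an ACK policy is installed as a Gatekeeper ConstraintTemplate whose constraint kind matches the policy name, a constraint that applies ACKPSPPrivilegedContainer to a single namespace might look like the following. The constraint name and the namespace demo are hypothetical; the apiVersion, match, and enforcementAction fields follow the standard Gatekeeper constraint format.

```yaml
# Hypothetical constraint; assumes the ACKPSPPrivilegedContainer
# ConstraintTemplate is already installed in the cluster.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: ACKPSPPrivilegedContainer
metadata:
  name: psp-privileged-container   # hypothetical name
spec:
  enforcementAction: deny          # reject non-compliant requests (Gatekeeper also supports dryrun and warn)
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
    namespaces:
    - demo                         # hypothetical namespace to enforce
```

Starting with enforcementAction: dryrun lets you audit existing workloads before switching to deny.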

Pod security recommendations

Apply these nine controls together for defense in depth against common attack vectors.

1. Forbid privileged containers

Privileged containers inherit all Linux capabilities of the host root user. Most workloads do not need them. Forbid privileged mode to prevent attackers from directly accessing host resources.

Restricted fields and allowed values:

  • spec.containers[*].securityContext.privileged: undefined or false

  • spec.initContainers[*].securityContext.privileged: undefined or false
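A minimal Pod that satisfies both restricted fields might look like the following sketch. The pod name and images are placeholders; busybox is used only as an example init container image.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-no-privileged          # hypothetical name
spec:
  initContainers:
  - name: init
    image: busybox:1.36            # placeholder image
    command: ["sh", "-c", "true"]
    securityContext:
      privileged: false            # explicit false; leaving the field undefined is also allowed
  containers:
  - name: app
    image: registry.example.com/app:1.0   # hypothetical image
    securityContext:
      privileged: false
```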

Deploy the ACKPSPPrivilegedContainer policy to enforce this restriction across specified namespaces.

2. Run pods as a non-root user

Containers run as root by default. An attacker with shell access to a root container has a much easier path to the host. Run containers as a non-root user to limit the blast radius of a compromise.

Use any of these approaches:

  • Remove the shell from the container image.

  • Add a USER instruction to the Dockerfile.

  • Set spec.securityContext.runAsUser and runAsGroup in the podSpec.
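The third approach can be sketched as follows. The UID and GID values, pod name, and image are arbitrary placeholders. The example also sets runAsNonRoot, a standard Kubernetes field not mentioned above, which makes the kubelet refuse to start a container that would run as UID 0.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-non-root               # hypothetical name
spec:
  securityContext:                 # Pod-level settings apply to all containers
    runAsNonRoot: true             # kubelet fails the container if it would run as UID 0
    runAsUser: 10001               # arbitrary non-zero UID
    runAsGroup: 10001              # arbitrary non-zero GID
  containers:
  - name: app
    image: registry.example.com/app:1.0   # hypothetical image
```

The image must be able to run as the chosen UID, for example if its files are readable by that user.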

Deploy the ACKPSPAllowedUsers policy to restrict which users and groups can run containers in specified namespaces.

3. Forbid Docker-in-Docker and Docker.sock mounting

Building or running images inside a container by using Docker-in-Docker, or by mounting the Docker socket (typically /var/run/docker.sock), gives the container process control over the node.
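To help you recognize the pattern this section forbids, the following sketch shows a pod that mounts the node's Docker socket through a hostPath volume. Do not deploy it. The pod name and image are hypothetical, and nodes that run containerd instead of Docker may not have this socket at all. Restricting hostPath volumes, as described in the next section, is one way to keep such mounts out of a namespace.

```yaml
# Anti-pattern: do NOT deploy. A process that can reach the Docker socket
# can start privileged containers on the node and effectively control it.
apiVersion: v1
kind: Pod
metadata:
  name: docker-sock-example        # hypothetical name
spec:
  containers:
  - name: builder
    image: registry.example.com/builder:1.0   # hypothetical image
    volumeMounts:
    - name: docker-sock
      mountPath: /var/run/docker.sock
  volumes:
  - name: docker-sock
    hostPath:
      path: /var/run/docker.sock
      type: Socket
```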

Use alternative image-build approaches instead:

4. Restrict hostPath volumes

A hostPath volume mounts a host directory into a pod. A root container with write access can modify kubelet settings, create symbolic links to files outside the mounted path (such as /etc/shadow), install SSH keys, read host Secrets, or perform other malicious operations. Set hostPath mounts to read-only to limit the damage.

volumeMounts:
- name: hostpath-volume
  readOnly: true
  mountPath: /host-path
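The volumeMounts fragment needs a matching entry under volumes. A complete Pod might look like the following sketch, in which the pod name, image, and host directory are hypothetical. Volume names must be lowercase DNS labels, so the sketch uses hostpath-volume.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-hostpath-readonly      # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0   # hypothetical image
    volumeMounts:
    - name: hostpath-volume
      readOnly: true               # container cannot write through this mount
      mountPath: /host-path
  volumes:
  - name: hostpath-volume          # must match the volumeMounts name
    hostPath:
      path: /var/log/app           # hypothetical host directory
      type: Directory              # pod fails to start if the directory does not already exist
```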

Deploy the ACKPSPHostFilesystem policy to restrict which host directories can be mounted in specified namespaces.

5. Set resource requests and limits

A pod without resource requests or limits can exhaust node CPU and memory, crash the kubelet, or cause other pods to be evicted. Set requests and limits to reduce resource contention.

Specify CPU and memory requests and limits in the podSpec. Apply a resource quota to the namespace to require all containers to declare requests and limits. Use a LimitRange to set per-container defaults and bounds.
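The three measures can be sketched together as follows. The namespace demo, object names, image, and all quantities are illustrative assumptions; choose values that fit your workloads. Because the ResourceQuota tracks CPU and memory, pods that neither declare these values nor receive defaults from the LimitRange are rejected.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota              # hypothetical name
  namespace: demo                  # hypothetical namespace
spec:
  hard:                            # namespace-wide totals
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits           # hypothetical name
  namespace: demo
spec:
  limits:
  - type: Container
    defaultRequest:                # applied when a container omits requests
      cpu: 250m
      memory: 256Mi
    default:                       # applied when a container omits limits
      cpu: 500m
      memory: 512Mi
    min:                           # smallest values a container may declare
      cpu: 50m
      memory: 64Mi
    max:                           # largest values a container may declare
      cpu: "2"
      memory: 2Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-resources         # hypothetical name
  namespace: demo
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0   # hypothetical image
    resources:
      requests:                    # used by the scheduler for placement
        cpu: 250m
        memory: 256Mi
      limits:                      # CPU is throttled and memory overuse is OOM-killed above these
        cpu: 500m
        memory: 512Mi
```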

Deploy the ACKContainerLimits policy to enforce resource limits in specified namespaces.

6. Forbid privilege escalation

Privilege escalation lets a process gain more privileges than its parent process at runtime, for example by executing a SUID or SGID binary such as sudo. Disable it to prevent non-root processes from gaining root-level access.

Restricted field and allowed value:

  • securityContext.allowPrivilegeEscalation (set in each container's securityContext): false

securityContext:
  allowPrivilegeEscalation: false
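In a full manifest, the field goes in each container's securityContext, because the Pod-level securityContext does not have it. The following sketch pairs it with a non-root user; the pod name, UID, and image are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-no-escalation          # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001               # arbitrary non-zero UID
  containers:
  - name: app
    image: registry.example.com/app:1.0   # hypothetical image
    securityContext:
      # Kubernetes treats allowPrivilegeEscalation as true whenever the
      # container is privileged or has CAP_SYS_ADMIN, so keep privileged off too.
      allowPrivilegeEscalation: false
      privileged: false
```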

Deploy the ACKPSPAllowPrivilegeEscalationContainer policy to enforce this setting in specified namespaces.

7. Disable automatic ServiceAccount token mounting

For pods that do not need Kubernetes API access, disable automatic ServiceAccount token mounting to prevent token exposure if the pod is compromised.

Disable token mounting for a specific pod:

apiVersion: v1
kind: Pod
metadata:
  name: pod-no-automount
spec:
  automountServiceAccountToken: false
  containers:
  - name: app
    image: registry.example.com/app:1.0   # placeholder image

Disable token mounting for all pods that use a specific ServiceAccount:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-no-automount
automountServiceAccountToken: false

Important

Disabling token mounting does not prevent the pod from reaching the Kubernetes API, because the pod can still open network connections to the API server. To block API access entirely, restrict exposure of the ACK cluster's API server endpoint and configure network policies.

Deploy the ACKBlockAutomountToken policy to enforce automountServiceAccountToken: false across application pods in specified namespaces.
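When both the ServiceAccount and the Pod spec set automountServiceAccountToken, the Pod spec takes precedence. The following sketch uses that rule to let one pod that genuinely needs the API opt back in while the ServiceAccount keeps the safe default. The pod name and image are hypothetical, and such a pod would presumably need to run in a namespace where ACKBlockAutomountToken is not enforced.

```yaml
# ServiceAccount that disables token automounting for its pods by default.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-no-automount
automountServiceAccountToken: false
---
# Pod-level true overrides the ServiceAccount-level false for this pod only.
apiVersion: v1
kind: Pod
metadata:
  name: pod-needs-api              # hypothetical name
spec:
  serviceAccountName: sa-no-automount
  automountServiceAccountToken: true
  containers:
  - name: app
    image: registry.example.com/app:1.0   # hypothetical image
```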

8. Disable service discovery

For pods that do not need other cluster services, disable service links and change the DNS policy to limit what an attacker can enumerate if the pod is compromised.

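apiVersion: v1
kind: Pod
metadata:
  name: pod-no-service-info
spec:
  dnsPolicy: Default # Inherits the node's DNS configuration. "Default" is not the default policy, which is ClusterFirst.
  enableServiceLinks: false
  containers:
  - name: app
    image: registry.example.com/app:1.0   # placeholder image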

By default, a pod's DNS policy is ClusterFirst, which routes queries through the in-cluster CoreDNS service. Setting dnsPolicy: Default routes DNS through the node's resolver instead. Setting enableServiceLinks: false prevents Services in the namespace from being injected as environment variables.

Important

These settings do not block direct CoreDNS access. An attacker can still enumerate cluster services by running dig SRV *.*.svc.cluster.local @$CLUSTER_DNS_IP. Use network policies to fully restrict service discovery.
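As a minimal sketch of such a network policy, the following NetworkPolicy denies all egress traffic, including DNS queries to CoreDNS, for pods that carry a hypothetical app: no-service-info label in a hypothetical demo namespace. It assumes the cluster's network plugin enforces NetworkPolicy. Because it blocks every outbound connection, it suits only pods that need no egress; otherwise, add egress rules for the destinations the pod requires.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-egress-no-service-info   # hypothetical name
  namespace: demo                     # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: no-service-info            # hypothetical label on the target pods
  policyTypes:
  - Egress
  # No egress rules are listed, so all outbound traffic from the selected pods is denied.
```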

9. Use a read-only root file system

A read-only root file system prevents attackers from overwriting application binaries or configuration files. If the application must write to disk, use a tmpfs volume or a mounted persistent volume instead.

Restricted field and allowed value:

  • securityContext.readOnlyRootFilesystem: true

securityContext:
  readOnlyRootFilesystem: true
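If the application needs scratch space, a memory-backed emptyDir volume provides a tmpfs mount alongside the read-only root file system. In the following sketch, the pod name, image, mount path, and size limit are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-readonly-rootfs        # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0   # hypothetical image
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
    - name: tmp
      mountPath: /tmp              # the only writable path in this container
  volumes:
  - name: tmp
    emptyDir:
      medium: Memory               # tmpfs; written data counts toward the container's memory limit
      sizeLimit: 64Mi
```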

Deploy the ACKPSPReadOnlyRootFilesystem policy to enforce a read-only root file system for pods in specified namespaces.

Next steps