Container Escape - Breaking Out of Docker/Kubernetes Containers

Container escape refers to breaking out of the container sandbox onto the host system or into other containers. Containers provide isolation via Linux namespaces and cgroups, but this isolation is weaker than hardware virtualization because all containers share the host kernel. Misconfigurations or vulnerabilities can bypass it.

Container Isolation and Its Limitations

How container isolation works:

  Linux containers are based on:
  □ Namespaces: process, network, and filesystem isolation
  □ cgroups: resource limits (CPU, memory)
  □ Capabilities: granular permissions instead of root/non-root
  □ Seccomp: whitelist of allowed syscalls
  □ AppArmor/SELinux: Mandatory Access Control

  Difference from VMs:
  VM:        Hypervisor → virtualized hardware, separate guest kernel → strong isolation
  Container: shared kernel with host → weaker isolation!
  → Container escape = attack on the shared kernel or the runtime (VM escape: significantly harder)
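
To see which namespaces a container actually shares with another process, the namespace links under /proc can be compared. The following is a minimal Python sketch, not a definitive tool: the helper names (parse_ns_link, read_namespaces, shared_namespaces) are invented for illustration, and it assumes a Linux /proc filesystem.

```python
import os

NAMESPACE_TYPES = ("pid", "net", "mnt", "uts", "ipc", "user", "cgroup")


def parse_ns_link(link: str) -> int:
    """Extract the inode from a namespace link such as 'pid:[4026531836]'."""
    return int(link[link.index("[") + 1:link.index("]")])


def read_namespaces(pid: str = "self") -> dict:
    """Read the namespace inodes of a process from /proc/<pid>/ns (Linux only)."""
    result = {}
    for ns in NAMESPACE_TYPES:
        try:
            result[ns] = parse_ns_link(os.readlink(f"/proc/{pid}/ns/{ns}"))
        except OSError:
            continue  # namespace type unavailable or permission denied
    return result


def shared_namespaces(a: dict, b: dict) -> list:
    """Return the namespace types for which both processes have the same inode."""
    return sorted(ns for ns in a if ns in b and a[ns] == b[ns])


# Hypothetical inode values: a container started with --net=host
host_proc = {"pid": 4026531836, "net": 4026531840, "mnt": 4026531841}
container_proc = {"pid": 4026532201, "net": 4026531840, "mnt": 4026532199}
print(shared_namespaces(host_proc, container_proc))
```

Run on the host, read_namespaces("1") and read_namespaces(str(container_pid)) could be compared in the same way; any type that appears in the result is not isolated between the two processes.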

Docker privilege model:
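  Default user in the container: root (UID 0) in the container namespace
  With user namespace remapping (userns-remap): root in the container ≠ root on the host
  → Namespace mapping: Container root → unprivileged Host UID (e.g., 100000)
  → Without a user namespace (the Docker default): Container root = Host root (UID 0),
    restricted only by capabilities, seccomp, and AppArmor/SELinux!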
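
The UID mapping can be read from /proc/self/uid_map inside the container. The sketch below illustrates how such a mapping translates container UIDs to host UIDs; the function names and the two sample maps are assumptions made for this example (the second mirrors the 100000 offset mentioned above).

```python
def parse_uid_map(text: str) -> list:
    """Parse uid_map lines into (inside_start, outside_start, length) tuples."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            entries.append((int(parts[0]), int(parts[1]), int(parts[2])))
    return entries


def host_uid_for(container_uid: int, uid_map_text: str):
    """Translate a UID inside the namespace to the UID in the parent namespace."""
    for inside, outside, length in parse_uid_map(uid_map_text):
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None  # UID is not mapped at all


# Sample contents of /proc/self/uid_map (illustrative values)
NO_USER_NAMESPACE = "         0          0 4294967295\n"
REMAPPED = "         0     100000      65536\n"

print(host_uid_for(0, NO_USER_NAMESPACE))  # container root is host root
print(host_uid_for(0, REMAPPED))           # container root is an unprivileged host UID
```

If host_uid_for(0, ...) returns 0, a process that escapes the container as root is also root on the host.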

Container Escape Techniques

Vector 1 - Privileged Container (--privileged):

  docker run --privileged nginx
  → Container has ALL Linux capabilities
  → Can: mount new devices, use raw sockets, manipulate cgroups

  Trivial escape method:
  # From inside the --privileged container:
  mkdir /tmp/host-mount
  mount /dev/sda1 /tmp/host-mount    # Mount the host hard drive!
  chroot /tmp/host-mount             # Host filesystem as root
  # → Full host access!

  Or:
  # cgroup v1 release_agent:
  mkdir /tmp/cgrp && mount -t cgroup -o memory cgroup /tmp/cgrp
  echo 1 > /tmp/cgrp/notify_on_release
  echo "$path/cmd" > /tmp/cgrp/release_agent
  sh -c "echo ... > /tmp/cgrp/x/cgroup.procs"
  # → Command is executed with host privileges!

Vector 2 - Docker Socket Mounted (/var/run/docker.sock):

  Commonly seen in CI/CD pipelines:
  docker run -v /var/run/docker.sock:/var/run/docker.sock image

  Escape:
  # From inside the container:
  docker run --rm -v /:/host alpine chroot /host /bin/bash
  # Or:
  curl -X POST --unix-socket /var/run/docker.sock \
    -H 'Content-Type: application/json' \
    'http://localhost/containers/create' \
    -d '{"Image":"alpine","HostConfig":{"Binds":["/:/host"],"Privileged":true}}'
  # → New container with host root → Escape!

  Why: Docker socket = root access to Docker daemon!
  If an attacker reaches the socket = full host control
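
From inside a container, one way to notice an exposed Docker socket is to inspect the mount table. The sketch below is only an illustration under stated assumptions: it parses text in the /proc/self/mountinfo format and flags mount points ending in /docker.sock. The function names and sample lines are hypothetical, and a socket reachable through a mounted parent directory (for example all of /var/run) would not be caught.

```python
def mount_points(mountinfo_text: str) -> list:
    """Return the mount point (fifth field) of each mountinfo line."""
    points = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            points.append(fields[4])
    return points


def exposed_docker_sockets(mountinfo_text: str) -> list:
    """Return mount points that look like a bind-mounted Docker socket."""
    return [p for p in mount_points(mountinfo_text) if p.endswith("/docker.sock")]


# Illustrative mountinfo excerpt of a container started with
# -v /var/run/docker.sock:/var/run/docker.sock
SAMPLE_MOUNTINFO = (
    "1498 1300 0:60 / / rw,relatime - overlay overlay rw\n"
    "1520 1498 0:24 /docker.sock /run/docker.sock rw,nosuid,nodev,noexec - tmpfs tmpfs rw,mode=755\n"
)

print(exposed_docker_sockets(SAMPLE_MOUNTINFO))
```

In a real check, the text would come from open("/proc/self/mountinfo").read().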

Vector 3 - Sensitive Volume Mounts:

  Common dangerous mounts:
  -v /etc:/etc → Host /etc editable!
  -v /proc:/proc → Host /proc accessible!
  -v ~/.ssh:/root/.ssh → SSH keys exposed!
  -v /home:/home → User home directories readable

  Exploitation:
  Mount /etc: → Edit /etc/crontab → Run cron as root → RCE on host
  Mount /proc: → /proc/sysrq-trigger → Kernel crash
  Mount /sys: → sysfs → Hardware manipulation
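
On the host side, the bind mounts of a running container can be audited, for example from the HostConfig.Binds list that docker inspect returns. The following sketch assumes that list is already available as Python strings of the form host_path:container_path[:options]; the list of sensitive paths and the function names are choices made for this example, not a complete policy.

```python
import posixpath

# Host paths treated as sensitive in this sketch (taken from the mounts discussed above)
SENSITIVE_HOST_PATHS = ("/", "/etc", "/proc", "/sys", "/dev", "/home", "/root",
                        "/var/run/docker.sock", "/run/docker.sock")


def is_sensitive(host_path: str) -> bool:
    """True if host_path is a sensitive path or lies below one (except below '/')."""
    path = posixpath.normpath(host_path)
    for sensitive in SENSITIVE_HOST_PATHS:
        if path == sensitive:
            return True
        if sensitive != "/" and path.startswith(sensitive + "/"):
            return True
    return False


def risky_binds(binds) -> list:
    """Return the bind specifications whose host side is sensitive."""
    findings = []
    for bind in binds or []:  # Binds may be null when no bind mounts exist
        host_path = bind.split(":")[0]
        if host_path.startswith("/") and is_sensitive(host_path):
            findings.append(bind)
    return findings


example = ["/etc:/etc", "/srv/app/data:/data:ro", "appdata:/var/lib/app",
           "/home/alice/.ssh:/root/.ssh"]
print(risky_binds(example))
```

Named volumes such as appdata have no leading slash on the host side and are therefore skipped.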

Vector 4 - Kernel exploits (direct attack):

  Container and host share the same kernel:
  → Kernel CVE in the host → Exploit possible from the container!

  Known CVEs (kernel and container runtime):
  CVE-2022-0492 (cgroup escape, Linux Kernel 5.x)
  CVE-2019-5736 (runc overwrite, affected Docker &lt; 18.09.2)
  CVE-2019-16884 (AppArmor bypass in runc)
  CVE-2020-15257 (containerd shim escape)

  runc CVE-2019-5736:
  → Attacker container with special binary
  → When executing docker exec: runc binary overwritten on host
  → Next docker run: compromised runc → Host RCE!

Vector 5 - Capabilities Abuse:

  Dangerous Capabilities (which containers sometimes have):
  CAP_SYS_ADMIN:  like --privileged, very far-reaching
  CAP_NET_ADMIN:  Manipulate network interfaces
  CAP_SYS_PTRACE: Debug other processes (including host processes!)
  CAP_NET_RAW:    Raw sockets → ARP spoofing, sniffing

  SYS_PTRACE exploit:
  # Container has CAP_SYS_PTRACE:
  # If the container shares the host PID namespace (--pid=host), host processes are visible from inside the container
  # ptrace on host PID → inject shellcode into host process
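
Which of these capabilities a process actually holds can be read from the CapEff bitmask in /proc/self/status. The sketch below decodes that mask for the four capabilities listed above; the bit numbers follow the Linux capability definitions, while the function names and sample masks are assumptions for illustration (the first mask is one commonly seen with Docker's default capability set).

```python
# Bit positions from the Linux capability definitions
DANGEROUS_CAPS = {
    12: "CAP_NET_ADMIN",
    13: "CAP_NET_RAW",
    19: "CAP_SYS_PTRACE",
    21: "CAP_SYS_ADMIN",
}


def cap_eff_from_status(status_text: str) -> int:
    """Extract the effective capability mask from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def dangerous_caps(cap_eff: int) -> list:
    """List the dangerous capabilities set in the mask, ordered by bit number."""
    return [name for bit, name in sorted(DANGEROUS_CAPS.items())
            if cap_eff & (1 << bit)]


DEFAULT_LIKE_MASK = 0x00000000A80425FB  # typical for a default Docker container
ALL_CAPS_MASK = 0x000001FFFFFFFFFF      # all bits set, as in a --privileged container

print(dangerous_caps(DEFAULT_LIKE_MASK))
print(dangerous_caps(ALL_CAPS_MASK))
```

Inside a container, cap_eff_from_status(open("/proc/self/status").read()) would supply the real mask.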

Vector 6 - Kubernetes Specifics:

  Service Account Credentials (too broad):
  cat /var/run/secrets/kubernetes.io/serviceaccount/token
  → Default service account too powerful in many K8s clusters!
  → kubectl with token: other pods, secrets, ClusterRoleBindings?

  Over-privileged service account:
  kubectl auth can-i create pods -n kube-system
  → If yes: Pod with hostPath:/,privileged=true → Escape!

  etcd directly accessible:
  → Is etcd port 2379 accessible from the pod?
  → Without client authentication and encryption at rest: all Kubernetes secrets readable in plain text!

Detection Strategies

Detect container escapes:

Falco (falcosecurity/falco, open source):
  → Kernel-level monitoring via eBPF
  → Detects: sensitive mounts, /proc access, syscall anomalies

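  Important Falco rules:
  - rule: Container started in privileged mode
    desc: A container was launched with --privileged
    condition: container_started and container and
               container.privileged=true
    output: Privileged container started (container=%container.name image=%container.image.repository)
    priority: WARNING

  - rule: Access to sensitive files in the container
    desc: A process in a container read credential or key files
    condition: open_read and container and
               fd.name in (/etc/shadow, /etc/passwd, /root/.ssh/id_rsa)
    output: Sensitive file read in container (file=%fd.name command=%proc.cmdline container=%container.name)
    priority: CRITICAL

  - rule: Docker socket access in the container
    desc: A process in a container opened the Docker socket
    condition: open_write and container and
               fd.name=/var/run/docker.sock
    output: Docker socket opened in container (command=%proc.cmdline container=%container.name)
    priority: CRITICAL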
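
Alerts from rules like these are often consumed by a script or a SIEM. Assuming Falco is configured for JSON output (one JSON object per line with rule and priority fields), a minimal filter by severity could look like the sketch below. The function name and sample lines are made up for illustration.

```python
import json

# Falco priority levels, most severe first
PRIORITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]


def alerts_at_least(lines, minimum: str = "warning") -> list:
    """Return parsed alerts whose priority is at least as severe as `minimum`."""
    threshold = PRIORITIES.index(minimum.lower())
    selected = []
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON log lines
        if not isinstance(event, dict):
            continue
        priority = str(event.get("priority", "")).lower()
        if priority in PRIORITIES and PRIORITIES.index(priority) <= threshold:
            selected.append(event)
    return selected


SAMPLE_LINES = [
    '{"rule": "Docker socket access in the container", "priority": "Critical", "output": "demo"}',
    '{"rule": "Container started in privileged mode", "priority": "Warning", "output": "demo"}',
    '{"rule": "Some debug rule", "priority": "Debug", "output": "demo"}',
    'not json',
]

for alert in alerts_at_least(SAMPLE_LINES, "critical"):
    print(alert["rule"])
```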

System audit (auditd):
  -w /var/run/docker.sock -p rw -k docker-socket
  -w /proc/sysrq-trigger -p w -k kernel-trigger
  → Log read/write access to the Docker socket and writes to sysrq-trigger
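
The keys assigned with -k make the resulting records easy to find in the audit log. The sketch below counts records per key in audit.log-style lines; the sample lines are shortened, hypothetical records, and the function name is an assumption for this example. Records that carry several keys at once are not handled.

```python
import re
from collections import Counter

# Matches the key="..." field of an audit record
KEY_PATTERN = re.compile(r'(?:^|\s)key="([^"]+)"')


def count_by_key(log_lines, keys=("docker-socket", "kernel-trigger")) -> dict:
    """Count audit records per watched key."""
    counts = Counter()
    for line in log_lines:
        match = KEY_PATTERN.search(line)
        if match and match.group(1) in keys:
            counts[match.group(1)] += 1
    return dict(counts)


SAMPLE_LOG = [
    'type=SYSCALL msg=audit(1700000000.101:901): syscall=257 success=yes comm="curl" exe="/usr/bin/curl" key="docker-socket"',
    'type=SYSCALL msg=audit(1700000000.202:902): syscall=257 success=yes comm="sh" exe="/bin/sh" key="kernel-trigger"',
    'type=SYSCALL msg=audit(1700000000.303:903): syscall=257 success=yes comm="docker" exe="/usr/bin/docker" key="docker-socket"',
    'type=SYSCALL msg=audit(1700000000.404:904): syscall=59 success=yes comm="ls" exe="/bin/ls" key=(null)',
]

print(count_by_key(SAMPLE_LOG))
```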

Kubernetes Audit Logging:
  → Pod creation with hostPath / privileged=true → Alert!
  → Service account from a non-system namespace bound to cluster-admin → Alert!
  → kubectl exec on privileged pods → Alert!
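
These alert conditions can be approximated by inspecting audit events. The sketch below assumes events in the Kubernetes audit Event format, logged at a level that includes requestObject; the function name and sample events are hypothetical. It flags every exec into a pod, not only exec into privileged pods, because the exec event itself does not carry the pod spec.

```python
def audit_findings(event: dict) -> list:
    """Return a list of findings for one Kubernetes audit event."""
    findings = []
    ref = event.get("objectRef") or {}
    resource = ref.get("resource")
    subresource = ref.get("subresource")
    request = event.get("requestObject") or {}

    if resource == "pods" and subresource == "exec":
        findings.append("exec into pod " + str(ref.get("name")))
    elif resource == "pods" and not subresource and event.get("verb") == "create":
        spec = request.get("spec") or {}
        containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
        for container in containers:
            if (container.get("securityContext") or {}).get("privileged") is True:
                findings.append("privileged container " + str(container.get("name")))
        for volume in spec.get("volumes") or []:
            if "hostPath" in volume:
                path = (volume["hostPath"] or {}).get("path")
                findings.append("hostPath volume " + str(path))
    elif resource == "clusterrolebindings" and event.get("verb") == "create":
        if (request.get("roleRef") or {}).get("name") == "cluster-admin":
            findings.append("new cluster-admin binding")
    return findings


pod_event = {
    "verb": "create",
    "objectRef": {"resource": "pods", "namespace": "kube-system"},
    "requestObject": {"spec": {
        "containers": [{"name": "x", "securityContext": {"privileged": True}}],
        "volumes": [{"name": "host", "hostPath": {"path": "/"}}],
    }},
}
exec_event = {"verb": "get",
              "objectRef": {"resource": "pods", "subresource": "exec", "name": "web-0"}}

print(audit_findings(pod_event))
print(audit_findings(exec_event))
```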

Mitigation Measures

Container Hardening:

1. No privileged containers (--privileged prohibited!):
   docker run --security-opt no-new-privileges nginx
   → No privilege escalation in the container
   Kubernetes:
   securityContext:
     allowPrivilegeEscalation: false
     privileged: false
     capabilities:
       drop: ["ALL"]   # Remove all capabilities!

2. Read-Only Root Filesystem:
   docker run --read-only nginx
   Kubernetes:
   securityContext:
     readOnlyRootFilesystem: true
   → Malware cannot write executables!

3. Do not run as root:
   Dockerfile: USER 1000
   Kubernetes:
   securityContext:
     runAsNonRoot: true
     runAsUser: 1000

4. Seccomp profiles (system call filtering):
   docker run --security-opt seccomp=profile.json nginx
   Kubernetes:
   securityContext:
     seccompProfile:
       type: RuntimeDefault
   → Only allowed syscalls → Reduced kernel exploit surface area

5. AppArmor profiles:
   docker run --security-opt apparmor=docker-default nginx
   → MAC rules in addition to capabilities

6. No dangerous volume mounts:
   □ No /var/run/docker.sock
   □ No /proc, /sys, /dev
   □ No /, /etc, /home
   □ hostPath: avoid if possible; if necessary: readOnly: true

7. Kubernetes-specific:
   □ Pod Security Admission (PSA): enforce Restricted profile
   □ Network Policies: allow only the pod traffic that is actually required
   □ Service Account: separate SA per deployment, minimal permissions
   □ automountServiceAccountToken: false if not necessary
   □ etcd: accessible only via API server, no direct pod access
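
The container-level settings from points 1 to 4 can be checked automatically before deployment. The sketch below takes a container entry from a pod spec as a Python dict (for example after loading the manifest) and reports deviations. It is a simplified illustration with an invented function name; settings inherited from the pod-level securityContext are not considered.

```python
def lint_container(container: dict) -> list:
    """Report hardening gaps in one container's securityContext."""
    sc = container.get("securityContext") or {}
    problems = []
    if sc.get("privileged") is True:
        problems.append("privileged is true")
    if sc.get("allowPrivilegeEscalation") is not False:
        problems.append("allowPrivilegeEscalation is not false")
    if sc.get("readOnlyRootFilesystem") is not True:
        problems.append("readOnlyRootFilesystem is not true")
    if sc.get("runAsNonRoot") is not True:
        problems.append("runAsNonRoot is not true")
    dropped = [c.upper() for c in (sc.get("capabilities") or {}).get("drop") or []]
    if "ALL" not in dropped:
        problems.append("capabilities do not drop ALL")
    if (sc.get("seccompProfile") or {}).get("type") not in ("RuntimeDefault", "Localhost"):
        problems.append("no seccomp profile set")
    return problems


hardened = {
    "name": "web",
    "securityContext": {
        "privileged": False,
        "allowPrivilegeEscalation": False,
        "readOnlyRootFilesystem": True,
        "runAsNonRoot": True,
        "runAsUser": 1000,
        "capabilities": {"drop": ["ALL"]},
        "seccompProfile": {"type": "RuntimeDefault"},
    },
}

print(lint_container(hardened))
print(lint_container({"name": "legacy", "securityContext": {"privileged": True}}))
```

Pod Security Admission with the Restricted profile enforces comparable rules inside the cluster; a check like this only gives earlier feedback, for example in CI.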
