Container can't fork: it's hit its --pids-limit
Medium

Problem

The worker service is stuck at partial capacity. It's Up, but its log is full of "can't start new thread" and "Resource temporarily unavailable" errors, and it only managed to spawn ~98 of its 200 workers. The host has plenty of RAM and the app code is unchanged. Find what's stopping it from creating more processes or threads and lift the cap.

Initial setup

  • worker (myapp:latest), Up 20 minutes, target 200-thread pool, stuck at ~98.
  • Host RAM is fine; no OOM in dmesg.

Example interaction

$ docker logs worker
RuntimeError: can't start new thread
[error] OSError: [Errno 11] Resource temporarily unavailable
[warn] pool degraded: 98/200 workers; new tasks queuing

Errno 11 (Resource temporarily unavailable) on a fork or thread creation is EAGAIN: the container has reached a limit on how many processes and threads it may hold at once. The cap is in docker inspect … HostConfig.
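One way to check this with the docker CLI alone is to read the configured cap and the live PID count side by side. The following is a minimal sketch, not the lab's reference solution: the helper name pids_status is hypothetical, and it assumes the installed Docker version exposes HostConfig.PidsLimit in docker inspect and the .PIDs placeholder in docker stats.

```shell
# Hypothetical helper: print a container's live PID count next to its cap.
pids_status() {
  name="$1"
  # Configured cap, as stored in the container's HostConfig.
  limit=$(docker inspect --format '{{.HostConfig.PidsLimit}}' "$name") || return 1
  # Live count of processes and threads the container currently holds.
  used=$(docker stats --no-stream --format '{{.PIDs}}' "$name") || return 1
  case "$limit" in
    # Empty, non-numeric, or zero: treat as "no positive cap reported".
    ''|*[!0-9]*|0) echo "$name: $used PIDs in use, no positive cap reported ($limit)" ;;
    # Otherwise show how much room is left under the cap.
    *) echo "$name: $used of $limit PIDs in use, headroom $((limit - used))" ;;
  esac
}
```

Calling pids_status worker and seeing the in-use count at or right next to the limit points at the container's own cap rather than at host RAM or application code.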

Acceptance

You've solved it when:

  • You've established worker is hitting its --pids-limit (the per-container process cap), read from docker inspect: HostConfig.PidsLimit is 100. You have NOT misdiagnosed it as host RAM / OOM, a host ulimit -u, or app code.
  • You've raised the limit so the pool can grow, either with `docker update --pids-limit 500 worker` (live) or by recreating the container with a higher/removed cap, and confirmed HostConfig.PidsLimit now reflects the larger value (500), with worker still running.
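The live fix and its verification could be scripted roughly as follows. This is a sketch under assumptions: the helper name raise_pids_limit is hypothetical, and it relies on docker update accepting --pids-limit (as used in the acceptance criteria) and on docker inspect exposing State.Running.

```shell
# Hypothetical helper: raise the cap in place, then confirm it took effect.
raise_pids_limit() {
  name="$1"
  new_limit="$2"
  # Apply the new cap to the running container (no recreate needed).
  docker update --pids-limit "$new_limit" "$name" >/dev/null || return 1
  # Read back the stored cap and the container's run state.
  actual=$(docker inspect --format '{{.HostConfig.PidsLimit}}' "$name") || return 1
  running=$(docker inspect --format '{{.State.Running}}' "$name") || return 1
  if [ "$actual" = "$new_limit" ] && [ "$running" = "true" ]; then
    echo "ok: $name PidsLimit=$actual, running"
  else
    echo "check failed: PidsLimit=$actual, running=$running" >&2
    return 1
  fi
}
```

For this scenario the call would be raise_pids_limit worker 500. The helper only checks the stored configuration and run state; whether the pool actually grows to 200 workers still has to be confirmed from docker logs worker.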

Constraints

  • Tools: docker CLI only.
  • Raising the host ulimit -u does nothing — the cap is the container's
pids cgroup, not a host limit. Fix it on the container.

Follow-up

  1. Why does the same Resource temporarily unavailable error also come from a host ulimit -u, and how do you tell the two apart?
  2. pids.max counts threads, not just processes. Why does a JVM or a thread pool hit a --pids-limit far sooner than ps would suggest?
  3. What's a sane --pids-limit for this workload, and why cap it at all (fork-bomb containment)?