The worker service is stuck at partial capacity. It's Up, but its log is full of can't start new thread / Resource temporarily unavailable and it only managed to spawn ~98 of its 200 workers. The host has plenty of RAM and the app code is unchanged. Find what's stopping it from creating more processes and lift the cap.
worker (myapp:latest), Up 20 minutes, target 200-thread pool
… dmesg.
$ docker logs worker
RuntimeError: can't start new thread
[error] OSError: [Errno 11] Resource temporarily unavailable
[warn] pool degraded: 98/200 workers; new tasks queuing
Errno 11 / Resource temporarily unavailable on a fork or thread creation is EAGAIN: the kernel refused to create another task because this container has reached its process/thread cap. The cap is in docker inspect … HostConfig.
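As a rough sketch of where that cap shows up, the snippet below reads the field through docker inspect's --format template and labels the result. The helper names describe_pids_limit and container_pids_limit are made up for illustration. Treating 0, -1 and <nil> as "no limit" reflects how Docker normally reports an unset limit, which is worth confirming on your version.

```shell
#!/bin/sh
# Hypothetical helper (not part of docker): label a raw PidsLimit value.
describe_pids_limit() {
  case "$1" in
    ''|0|-1|'<nil>'|'<no value>') echo "no pids limit set" ;;
    *[!0-9]*) echo "unexpected value: $1" ;;
    *) echo "pids limit: $1" ;;
  esac
}

# Read the limit for one container; --format takes a Go template.
# Needs a reachable docker daemon, so it is only defined here, not called.
container_pids_limit() {
  docker inspect --format '{{.HostConfig.PidsLimit}}' "$1"
}

# Example, assuming a container named worker exists:
#   describe_pids_limit "$(container_pids_limit worker)"
```

A small numeric value here, together with EAGAIN in the log and plenty of free RAM, points at the container's pids cap rather than at memory.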
You've solved it when:
You've identified that worker is hitting its --pids-limit (the per-container cap shown by docker inspect: HostConfig.PidsLimit is 100) and have NOT misdiagnosed it as host RAM / OOM, a host ulimit -u, or app code.
HostConfig.PidsLimit now reflects the larger value (500),
with worker still running.
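One way to reach that end state, sketched under the assumption that the Docker version in use supports docker update --pids-limit (it changes the limit on a live container without recreating it). The function names raise_pids_limit and check_result are hypothetical. The value 500 is the target named above.

```shell
#!/bin/sh
# Pure check, kept separate so it can be exercised without docker.
# args: wanted limit, actual limit, running flag ("true"/"false")
check_result() {
  if [ "$2" = "$1" ] && [ "$3" = "true" ]; then
    echo "ok: limit=$2, still running"
  else
    echo "not done: limit=$2 (wanted $1), running=$3"
    return 1
  fi
}

# Hypothetical helper: raise the limit in place, then re-read it and the run state.
# args: container name, new limit
raise_pids_limit() {
  docker update --pids-limit "$2" "$1" >/dev/null || return 1
  actual="$(docker inspect --format '{{.HostConfig.PidsLimit}}' "$1")"
  running="$(docker inspect --format '{{.State.Running}}' "$1")"
  check_result "$2" "$actual" "$running"
}

# Example, assuming a container named worker exists:
#   raise_pids_limit worker 500
```

Updating in place rather than removing and re-running the container is what keeps worker running throughout.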
docker CLI only.
ulimit -u does nothing; the cap is the container's …
… Resource temporarily unavailable error also come from ulimit -u, and how do you tell the two apart?
pids.max counts threads, not just processes: why does a JVM or a … --pids-limit far sooner than ps would suggest?
… --pids-limit for this workload, and why cap it at all
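To make the threads-versus-processes point concrete, here is a minimal sketch that compares two counts using only the docker CLI. It assumes docker stats exposes the .PIDs format field (the kernel's task count, which includes threads) and that docker top prints one header line followed by one row per process. The function names and the sample numbers in the usage comment are hypothetical.

```shell
#!/bin/sh
# Task count (processes plus threads) as charged against the pids limit.
task_count() {
  docker stats --no-stream --format '{{.PIDs}}' "$1"
}

# Process count as ps sees it: drop the header row, count the rest.
process_count() {
  docker top "$1" | tail -n +2 | wc -l | tr -d ' '
}

# Pure helper: one-line summary. args: limit, tasks, processes
pids_report() {
  echo "limit=$1 tasks=$2 processes=$3 headroom=$(( $1 - $2 ))"
}

# Example, assuming a container named worker exists:
#   pids_report 500 "$(task_count worker)" "$(process_count worker)"
# With made-up numbers, pids_report 500 210 3 prints:
#   limit=500 tasks=210 processes=3 headroom=290
```

A large gap between tasks and processes is the signature of a thread-heavy workload: ps shows a handful of rows while the limit is being consumed by threads.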