App latency spikes — a hidden --cpus limit is throttling it

Problem

The api service is slow — p99 latency blew past the 500ms SLO and requests are taking 4–6 seconds. It is not crashing: docker ps shows it Up, logs show it booted fine and is still serving, just slowly. CPU on the host is mostly idle. Find why api is starved and give it room to run.

Initial setup

api — myapp:latest, Up 5 hours, 4 gunicorn workers, serving but slow.
Host has spare CPU; nothing else is hot.

Example interaction

$ docker logs api
[warn] request GET /api/report took 4812ms (slo: 500ms)
[warn] worker 3 heartbeat late by 3.9s

$ docker stats --no-stream api
CONTAINER ID   NAME   CPU %   MEM USAGE / LIMIT   MEM %   ...   PIDS
ccaa11bb22dd   api    --      ...                  ...           5

Note docker stats does not reveal CFS throttling (CPU% reads -- here) — that's the trap. The container's configured CPU ceiling is in docker inspect … HostConfig.

Acceptance

You've solved it when:

You've established api is CPU-throttled by its own --cpus limit,

read from docker inspect — HostConfig.NanoCpus is 500000000 (= 0.5 CPU) — and NOT misdiagnosed it as slow app code, GC, a slow dependency, or host CPU saturation.

You've raised the limit so api gets more CPU — `docker update

--cpus 2 api` (live, no restart) or recreate with a higher/removed cap — and confirmed HostConfig.NanoCpus now reflects the larger value (2000000000), with api still running.

Constraints

Tools: docker CLI only.
Adding more replicas does NOT fix it — each replica is throttled at 0.5

the same way. Raise the per-container quota.

Follow-up

Why did docker stats CPU% mislead you, and what cgroup file

(cpu.stat: nrthrottled / nrperiods) would have shown the throttling directly?

--cpus 0.5 sets NanoCpus; HostConfig.CpuQuota stays 0. Why — and

how does --cpus differ from --cpu-shares?

Is docker update --cpus (live) or a recreate the safer fix in

production, and why?