Cgroup OOM-Kill: A Capped Service Keeps Dying on a Healthy Host

Problem

The api service on api-host keeps dying. Its python3 process shows up in ps and holds steady for a while, then it's gone, and a moment later it's back with a new PID (something keeps relaunching it). There is NO Python traceback anywhere in the app's own logs: the process isn't crashing, something is killing it.

The box itself looks healthy. Find out what is killing the service, prove it, and then say how you'd stop it.

Initial setup

Host: api-host, Alpine, cgroup v2 enabled.
Service: python3 -m uvicorn api:app runs under a 256 MB memory cgroup.

The host has plenty of RAM free; the disk is fine.
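
As a starting point, the cap and the kill counter can be read straight from the cgroup v2 filesystem. The sketch below assumes the service's cgroup directory is known; the function name read_cgroup_memory and any path passed to it are illustrative, not something the setup above specifies. It reads the standard cgroup v2 files memory.max, memory.current and memory.events.

```python
from pathlib import Path


def read_cgroup_memory(cgroup_dir):
    """Return the limit, current usage and event counters of one cgroup v2 directory.

    cgroup_dir is an assumption: pass the directory the service actually runs in.
    """
    base = Path(cgroup_dir)

    # memory.max holds either a byte count or the literal string "max" (no limit).
    raw_max = (base / "memory.max").read_text().strip()
    limit = None if raw_max == "max" else int(raw_max)

    # memory.current is the cgroup's current usage in bytes.
    current = int((base / "memory.current").read_text().strip())

    # memory.events is one "key value" pair per line, e.g. "oom_kill 3".
    events = {}
    for line in (base / "memory.events").read_text().splitlines():
        key, _, value = line.partition(" ")
        if value:
            events[key] = int(value)

    return {"limit_bytes": limit, "current_bytes": current, "events": events}
```

If the 256 MB cap is expressed in binary units, limit_bytes would read 268435456, and a nonzero events["oom_kill"] would indicate that the kernel has already killed at least one process in this cgroup.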

Acceptance

You've solved it when:

You've shown via dmesg that the kernel cgroup OOM-killer killed it ("Memory cgroup out of memory: Killed process 4711 (python3)"), with the memory: usage 262144kB, limit 262144kB line proving usage hit the cgroup cap.

You've ruled OUT global memory pressure with free: the host has ~1.4 GB available, so this is the per-cgroup limit, not the host.

You've named the fix: raise the cgroup memory limit (cgroup v2 memory.max / docker update --memory / systemd MemoryMax=), or shrink the workload so peak RSS fits under the cap.
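
The first two checks can be turned into a small script. This is a minimal sketch, assuming the dmesg output has been captured as text and that host memory is read from /proc/meminfo (its MemAvailable field is what free typically reports as "available"). The function name summarize_oom and the result keys are hypothetical; the regular expressions match only the two kernel message fragments quoted above.

```python
import re

# Patterns for the two kernel log fragments quoted in the acceptance criteria.
KILL_RE = re.compile(r"Memory cgroup out of memory: Killed process (\d+) \(([^)]+)\)")
USAGE_RE = re.compile(r"memory: usage (\d+)kB, limit (\d+)kB")
# MemAvailable line from /proc/meminfo, e.g. "MemAvailable:    1468006 kB".
AVAIL_RE = re.compile(r"^MemAvailable:\s+(\d+) kB", re.MULTILINE)


def summarize_oom(dmesg_text, meminfo_text):
    """Extract the cgroup OOM evidence and the host's available memory."""
    kill = KILL_RE.search(dmesg_text)
    usage = USAGE_RE.search(dmesg_text)
    avail = AVAIL_RE.search(meminfo_text)

    usage_kb = int(usage.group(1)) if usage else None
    limit_kb = int(usage.group(2)) if usage else None

    return {
        # True only if the cgroup-specific OOM message is present.
        "cgroup_oom_kill": kill is not None,
        "pid": int(kill.group(1)) if kill else None,
        "comm": kill.group(2) if kill else None,
        "usage_kb": usage_kb,
        "limit_kb": limit_kb,
        # Usage at or above the limit means the cgroup cap was reached.
        "hit_limit": usage is not None and usage_kb >= limit_kb,
        # Host-wide available memory, to rule out global pressure.
        "host_available_kb": int(avail.group(1)) if avail else None,
    }
```

A result with cgroup_oom_kill and hit_limit both true while host_available_kb remains large would support the conclusion that the per-cgroup limit, not the host, is responsible.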