Alive but Crawling: a Box Thrashing on Swap
Hard

Problem

orders-host (2 GB RAM) is alive but crawling: every request takes seconds and the box feels frozen, yet nothing has crashed. The on-call engineer sees swap almost full and is about to add more. Work out what is actually wrong first, because adding swap here makes it worse.

Initial setup

  • Host: orders-host, Debian 12, 2 GB RAM, 2 GB swap.
  • A gunicorn app (orders-api) with a pool of workers.

Acceptance

You've solved it when:

  • You've read vmstat: swap is heavily USED (swpd ~1.9 GiB), but the tell is the sustained si/so TRAFFIC (tens of MB/s swapped in AND out continuously) with b (blocked processes) and wa (I/O wait) elevated. The box is thrashing its working set, not merely parking cold pages.
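  • You've used free / top / ps aux --sort=-rss to see this is an OVER-COMMIT, not a leak: the gunicorn workers each hold a stable ~300 MiB and there is no single growing offender. 6 workers x ~300 MiB is about 1.8 GiB resident, which leaves almost nothing of the 2 GiB for the kernel, page cache, and everything else, so the pool simply does not fit.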
  • You've found the root cause in the gunicorn config (workers = 6, sized for the old 8 GB box and never reduced after the downsize) and named the fix: reduce memory pressure. Cut the worker count to fit RAM (or move workers to another host, add RAM, or enable preload_app so workers can share memory). NOT add more swap, NOT reboot, NOT kill one worker (gunicorn respawns it and the pressure comes straight back).
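
The sizing arithmetic behind the last two checks can be sketched in a few lines of Python. This is a rough illustration only: the RAM size, per-worker RSS, and worker count come from the scenario above, while RESERVED_MIB (headroom for the kernel, page cache, and other daemons) and the helper function names are assumptions made up for this example, so measure the real figures on the host before picking a number.

```python
# Sizing sketch: how many gunicorn workers fit in RAM without swapping.
# RESERVED_MIB is an assumed headroom figure, not a value measured on the host.

TOTAL_RAM_MIB = 2048      # orders-host has 2 GB RAM
PER_WORKER_RSS_MIB = 300  # stable RSS per worker, as seen in ps aux --sort=-rss
CONFIGURED_WORKERS = 6    # workers = 6 in the gunicorn config
RESERVED_MIB = 768        # ASSUMPTION: kernel, page cache, and other daemons


def worker_demand_mib(workers: int, per_worker_mib: int) -> int:
    """Resident memory the worker pool wants (upper bound: shared pages count per worker)."""
    return workers * per_worker_mib


def max_workers_that_fit(total_mib: int, per_worker_mib: int, reserved_mib: int) -> int:
    """Largest worker count whose combined RSS fits in RAM after the reserve."""
    usable = total_mib - reserved_mib
    if usable <= 0 or per_worker_mib <= 0:
        return 0
    return usable // per_worker_mib


demand = worker_demand_mib(CONFIGURED_WORKERS, PER_WORKER_RSS_MIB)
budget = TOTAL_RAM_MIB - RESERVED_MIB
fit = max_workers_that_fit(TOTAL_RAM_MIB, PER_WORKER_RSS_MIB, RESERVED_MIB)

print(f"pool demand: {demand} MiB, budget: {budget} MiB")
print(f"workers that fit: {fit}")
```

Under these assumed numbers the pool wants 1800 MiB against a 1280 MiB budget, and 4 workers would fit. Whatever count the real measurements give would go into the workers setting of the gunicorn config; a different reserve changes the answer, which is why the headroom figure is worth measuring rather than guessing.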