db-host is paging: queries that normally take milliseconds are taking tens of seconds. Load is up. The on-call already knows "it's I/O" from top — but the box has two disks. Which one is saturated, how badly, and what's hammering it? Adding CPU or rebooting won't help.
db-host, Debian 12, 4 vCPU, two volumes: vda (OS) and vdb (data).

You've solved it when:
- You've confirmed with top that the CPU is mostly in iowait (high %wa,
  CPU otherwise idle), so the box is not CPU-bound.
- You've run iostat -x and localised the saturation to one device, vdb:
  %util ~99.8, aqu-sz ~95 (requests piling up), and w_await ~240 ms. All
  three agree, so the data volume is genuinely saturated, while vda is
  idle. (%util alone can mislead on SSD/RAID, which serve requests in
  parallel; here aqu-sz and await confirm it.)
- You've used ps and the Postgres log to find what's hammering vdb: a
  VACUUM FULL rewriting the whole orders table, with ordinary backends
  stuck in D state (uninterruptible sleep) behind it.
- You've proposed a fix that relieves vdb: cancel the VACUUM FULL or
  reschedule it off-peak (a plain VACUUM doesn't rewrite the table),
  and/or move the data to faster storage, or split WAL onto its own
  volume. NOT add vCPUs, NOT reboot, NOT kill Postgres wholesale.
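
The "all three agree" reasoning behind the iostat check can be sketched in a few lines of Python. This is an illustration only, not part of the scenario's tooling: the thresholds and the names DeviceStats, is_saturated and saturated_devices are hypothetical. The vdb figures are the ones quoted above, while the vda figures are made-up values standing in for "idle".

```python
from dataclasses import dataclass

# Hypothetical thresholds chosen for illustration; tune them for your storage.
UTIL_PCT_MIN = 90.0     # device busy nearly all the time
QUEUE_DEPTH_MIN = 4.0   # aqu-sz: average number of requests in the queue
AWAIT_MS_MIN = 50.0     # average time per request, queueing included


@dataclass
class DeviceStats:
    name: str
    util_pct: float   # iostat %util
    aqu_sz: float     # iostat aqu-sz
    await_ms: float   # iostat r_await or w_await, whichever is worse


def is_saturated(dev: DeviceStats) -> bool:
    """Call a device saturated only when all three signals agree."""
    return (
        dev.util_pct >= UTIL_PCT_MIN
        and dev.aqu_sz >= QUEUE_DEPTH_MIN
        and dev.await_ms >= AWAIT_MS_MIN
    )


def saturated_devices(devices):
    """Return the names of the devices that pass the three-way check."""
    return [d.name for d in devices if is_saturated(d)]


if __name__ == "__main__":
    sample = [
        # vda: invented "idle" numbers (assumption, not from the scenario)
        DeviceStats("vda", util_pct=0.4, aqu_sz=0.01, await_ms=0.8),
        # vdb: the figures reported by iostat -x in the scenario
        DeviceStats("vdb", util_pct=99.8, aqu_sz=95.0, await_ms=240.0),
    ]
    print(saturated_devices(sample))  # prints ['vdb']
```

Requiring all three conditions is the point of the sketch: a device showing high %util with a short queue and low await would not be flagged, which matches the SSD/RAID caveat.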