I/O Wait, But Which Disk? Pinpointing a Saturated Volume
Hard

Problem

db-host is paging: queries that normally take milliseconds are taking tens of seconds. Load is up. The on-call already knows "it's I/O" from top — but the box has two disks. Which one is saturated, how badly, and what's hammering it? Adding CPU or rebooting won't help.

Initial setup

  • Host: db-host, Debian 12, 4 vCPU, two volumes: vda (OS) and vdb (the Postgres data volume).

Acceptance

You've solved it when:

  • You've confirmed from top that the CPUs are mostly waiting on I/O (high %wa, low %us and %sy), so the host is not CPU-bound.
  • You've run iostat -x and localised the saturation to one device, vdb: %util ~99.8, aqu-sz ~95 (requests piling up), and w_await ~240 ms. All three agree, so the data volume is genuinely saturated, while vda is idle. (%util alone can mislead on SSD/RAID, which can serve requests in parallel; here aqu-sz and await confirm it.)
  • You've used ps and the Postgres log to find what's hammering vdb: a VACUUM FULL rewriting the whole orders table, with ordinary backends stuck in D state (uninterruptible sleep) behind it.
  • You've named the fix: take the I/O load off vdb. Cancel or reschedule the VACUUM FULL (a plain VACUUM doesn't rewrite the table; run it off-peak), and/or move the data to faster storage or split WAL onto its own volume. Do NOT add vCPUs, reboot, or kill Postgres wholesale.
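
The "all three agree" check on the iostat -x figures can be sketched in a few lines of Python. This is a minimal illustration, not part of the lab: the thresholds are assumptions chosen for the example, the function names are hypothetical, and SAMPLE is a hand-written table trimmed to three columns. Its vdb row uses the figures from the acceptance criteria, and its vda row uses made-up near-zero values to stand for "idle".

```python
# Sketch: flag devices where %util, aqu-sz and await all agree on saturation.
# SAMPLE is illustrative, not captured output. The vdb row uses the figures
# from the acceptance criteria; the vda values are made up to mean "idle".
SAMPLE = """\
Device  w_await  aqu-sz  %util
vda     0.90     0.00    0.3
vdb     240.00   95.00   99.8
"""

# Assumed thresholds; tune them for the storage behind each device.
UTIL_PCT = 90.0     # device busy almost all of the time
QUEUE_DEPTH = 4.0   # requests queueing rather than being served
AWAIT_MS = 50.0     # per-request latency, in milliseconds


def parse_iostat(text):
    """Return {device: {column: value}} from `iostat -x` style text."""
    stats, header = {}, None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].rstrip(":") == "Device":
            # Look columns up by name; their order varies between versions.
            header = fields[1:]
        elif header and len(fields) == len(header) + 1:
            try:
                values = [float(f) for f in fields[1:]]
            except ValueError:
                continue  # not a device row
            # A later report for the same device overwrites an earlier one.
            stats[fields[0]] = dict(zip(header, values))
    return stats


def saturated(stats):
    """List devices where utilisation, queue depth and await all exceed the thresholds."""
    hits = []
    for dev, s in stats.items():
        worst_await = max(s.get("r_await", 0.0), s.get("w_await", 0.0))
        if (s.get("%util", 0.0) >= UTIL_PCT
                and s.get("aqu-sz", 0.0) >= QUEUE_DEPTH
                and worst_await >= AWAIT_MS):
            hits.append(dev)
    return hits


print(saturated(parse_iostat(SAMPLE)))  # ['vdb']
```

To try it on the host, SAMPLE could be replaced with the output of something like iostat -x 1 2. The first iostat report shows averages since boot, and because later rows overwrite earlier ones in this sketch, the result reflects the last interval. Requiring all three signals is deliberate: a device can show high %util while still keeping up, so queue depth and latency are what confirm requests are actually backing up.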