I/O Wait, But Which Disk? Pinpointing a Saturated Volume

Problem

db-host is paging: queries that normally take milliseconds are taking tens of seconds. Load is up. The on-call already knows "it's I/O" from top — but the box has two disks. Which one is saturated, how badly, and what's hammering it? Adding CPU or rebooting won't help.

Initial setup

Host: db-host, Debian 12, 4 vCPU, two volumes: vda (OS) and vdb (the Postgres data volume).

Acceptance

You've solved it when:

You've confirmed from top that the CPU is mostly in iowait (high %wa, otherwise idle CPU), so the host is not CPU-bound.
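
The %wa figure in top comes from the kernel's CPU tick counters, so the same check can be scripted. A minimal sketch, assuming a Linux /proc/stat with the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are illustrative and not part of any tool:

```python
def parse_cpu_line(line):
    """Return the tick counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def cpu_shares(before, after):
    """Percentage of elapsed CPU time spent busy, idle, and in iowait."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    # Field order: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted inside user/nice, so only the first 8 are summed.
    user, nice, system, idle, iowait, irq, softirq, steal = (deltas + [0] * 8)[:8]
    total = user + nice + system + idle + iowait + irq + softirq + steal
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    busy = total - idle - iowait
    return {
        "busy_pct": 100.0 * busy / total,
        "idle_pct": 100.0 * idle / total,
        "iowait_pct": 100.0 * iowait / total,
    }


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line (always the first line of /proc/stat)."""
    with open(path) as handle:
        return handle.readline()


# Usage on the host: take two samples a few seconds apart, then compare.
#   first = read_cpu_line(); time.sleep(5); second = read_cpu_line()
#   print(cpu_shares(first, second))
```

A high iowait_pct alongside a low busy_pct is the "waiting on I/O, not CPU-bound" pattern the criterion describes. It still says nothing about which disk is responsible.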

You've run iostat -x and localised the saturation to one device, vdb: %util ~99.8, aqu-sz ~95 (requests piling up), and w_await ~240 ms. All three agree, so the data volume is genuinely saturated, while vda is idle. (%util alone can mislead on SSD/RAID, which serve requests in parallel and can report near 100% with headroom to spare; here aqu-sz and await confirm it.)
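
The "all three agree" rule can be written as a small filter over iostat -x output. The sketch below looks columns up by header name, because the column set differs between sysstat versions. The thresholds are assumptions chosen for illustration, not standard values. The sample text is trimmed and hand-written: the vdb w_await, aqu-sz and %util figures are the ones quoted above, and the remaining numbers are placeholders.

```python
def parse_iostat(text):
    """Map device name -> {column: value} from `iostat -x` text output."""
    rows = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            header = None  # a blank line ends the current device table
            continue
        if fields[0].rstrip(":") == "Device":
            header = fields[1:]
            continue
        if header and len(fields) == len(header) + 1:
            rows[fields[0]] = dict(zip(header, (float(v) for v in fields[1:])))
    return rows


def saturated_devices(rows, util_min=90.0, queue_min=4.0, await_min_ms=50.0):
    """Devices where %util, aqu-sz and await all cross their (assumed) thresholds."""
    hits = []
    for device, metrics in rows.items():
        worst_await = max(metrics.get("r_await", 0.0), metrics.get("w_await", 0.0))
        if (metrics["%util"] >= util_min
                and metrics["aqu-sz"] >= queue_min
                and worst_await >= await_min_ms):
            hits.append(device)
    return hits


# Illustrative, trimmed sample (not captured output).
SAMPLE = """
Device r_await w_await aqu-sz %util
vda 0.50 0.80 0.01 0.40
vdb 12.00 240.00 95.00 99.80
"""

print(saturated_devices(parse_iostat(SAMPLE)))
```

Requiring all three signals is what keeps a device that is merely busy, with high %util but a short queue and low latency, from being reported as saturated.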

You've used ps and the Postgres log to find what's hammering vdb: a VACUUM FULL rewriting the whole orders table, with ordinary backends stuck in D state (uninterruptible sleep, typically waiting on disk) behind it.
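
Finding the processes stuck in D amounts to filtering ps output on the STAT column. A minimal sketch, assuming the text comes from `ps -eo pid,stat,args`; the PIDs, client addresses and process titles in the sample are hypothetical, written only to show the shape of the output:

```python
def blocked_processes(ps_text):
    """Return (pid, command) pairs for processes whose state starts with 'D'."""
    blocked = []
    for line in ps_text.splitlines():
        parts = line.split(None, 2)  # pid, stat, remainder of the command line
        if len(parts) < 3 or not parts[0].isdigit():
            continue  # skips the header row and blank lines
        pid, stat, command = parts
        if stat.startswith("D"):  # D = uninterruptible sleep
            blocked.append((int(pid), command.strip()))
    return blocked


# Hypothetical sample in the layout of `ps -eo pid,stat,args`.
SAMPLE_PS = """
    PID STAT COMMAND
   1412 Ss   postgres: checkpointer
   2201 Ds   postgres: app shop 10.0.0.7(40112) VACUUM
   2215 Ds   postgres: app shop 10.0.0.7(40120) SELECT
   2230 Ss   postgres: app shop 10.0.0.7(40131) idle
"""

for pid, command in blocked_processes(SAMPLE_PS):
    print(pid, command)
```

On the host, the input could be captured with `subprocess.run(["ps", "-eo", "pid,stat,args"], capture_output=True, text=True).stdout`. A process title shows only the command tag, so the Postgres log is still needed to confirm that the VACUUM is a VACUUM FULL on orders.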

You've named the fix: take the I/O load off vdb. Cancel or reschedule the VACUUM FULL (a plain VACUUM doesn't rewrite the table; run it off-peak), and/or move the data to faster storage or split WAL onto its own volume. The fix is NOT adding vCPUs, NOT rebooting, and NOT killing Postgres wholesale.
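
Cancelling one statement, rather than killing Postgres, can be done with pg_cancel_backend against the PID reported by pg_stat_activity. The sketch below only builds the SQL and a psql command line; it runs nothing itself. The database name is a placeholder, the helper names are illustrative, and the role used is assumed to be allowed to signal the backend (superuser, the same role, or a member of pg_signal_backend).

```python
# Lists active statements that mention VACUUM ... FULL, excluding this session.
FIND_VACUUM_FULL = (
    "SELECT pid, query_start, query FROM pg_stat_activity "
    "WHERE state = 'active' AND query ILIKE '%vacuum%full%' "
    "AND pid <> pg_backend_pid();"
)


def cancel_backend_sql(pid):
    """SQL that cancels the current statement of one backend, leaving it connected."""
    return f"SELECT pg_cancel_backend({int(pid)});"  # int() rejects non-numeric input


def psql_argv(sql, dbname):
    """Argument list for running one statement through psql."""
    # -X: skip psqlrc, -A: unaligned output, -t: rows only, -c: run this command
    return ["psql", "-X", "-At", "-d", dbname, "-c", sql]


# Usage sketch ("shop" is a placeholder database name):
#   subprocess.run(psql_argv(FIND_VACUUM_FULL, "shop"), check=True)
#   subprocess.run(psql_argv(cancel_backend_sql(2201), "shop"), check=True)
```

pg_cancel_backend interrupts only the running statement. The cancelled VACUUM FULL rolls back, so the orders table is left as it was and the maintenance can be rescheduled off-peak as a plain VACUUM.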