The Disk Is Maxed Out — On Reads. What's Reading?
Medium

Problem

app-db slows to a crawl every weekday mid-morning: queries that are normally instant take seconds, and load is up. The on-call sees the disk "maxed out" and is about to file for a faster/bigger volume or more vCPUs.

Before spending money: the disk is busy — but doing what, and driven by whom? Reads or writes? Find the offender and say what you'd actually do.

Initial setup

  • Host: app-db, Debian 12, 4 vCPU, two volumes: vda (OS) and vdb (the Postgres data volume).

Acceptance

You've solved it when:

  • You've confirmed from top that the CPUs spend most of their time in iowait (high %wa, low us/sy), so the bottleneck is I/O, not compute.
  • You've run vmstat and seen that bi (blocks read in from disk, in KiB/s) is huge (~297000) while bo (blocks written out) is low: the system is reading from disk hard, not writing. (This is the opposite of a write-saturation / VACUUM-style problem.)
  • You've run iostat -x and localised it to vdb: high r/s (~2400), high rkB/s (~297000, about 290 MB/s), %util ~99, aqu-sz ~86, while w/s and w_await stay low. The saturation is entirely on the read path; vda is idle.
  • You've used ps (and /etc/cron.d/orders-backup) to find the reader: a pg_dump backup (with its Postgres COPY backend in D state) reading the whole orders database. It is scheduled at 09:00 on weekdays, i.e. during business hours, and is now overrunning. The ordinary backends in D state are its victims.
  • You've named the fix: take the read load off the primary during the day. Reschedule the backup to off-peak, ionice/rate-limit it (ionice -c3 / --rate), or run it from a read replica / standby. NOT add vCPUs (the CPUs are stuck in %wa, not busy), NOT add RAM, NOT chase a writer / VACUUM (writes are low), NOT kill -9 the D-state backend (it's uninterruptible until its I/O completes), NOT reboot.
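
The read-versus-write check from the vmstat step can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name classify_vmstat and the 10x ratio used as the threshold are made up for this sketch and are not part of the exercise. It reads vmstat output on stdin, finds the bi and bo columns by their header names, averages them over the samples, and prints a verdict.

```shell
#!/bin/sh
# classify_vmstat: hypothetical helper (name and 10x threshold are assumptions).
# Reads `vmstat <interval> <count>` output on stdin and reports whether the
# sampled block traffic is dominated by reads (bi) or writes (bo).
classify_vmstat() {
    awk '
        # Header row: remember which fields hold bi and bo.
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) {
                if ($i == "bi") bi_col = i
                if ($i == "bo") bo_col = i
            }
            next
        }
        # Data rows start with a number; the "procs ---" banner does not.
        bi_col && $1 ~ /^[0-9]+$/ {
            n++
            # The first data row is the average since boot, so skip it.
            if (n == 1) next
            bi += $bi_col; bo += $bo_col; samples++
        }
        END {
            if (!samples) { print "no samples"; exit 1 }
            bi /= samples; bo /= samples
            if (bi > 10 * bo)      verdict = "read-heavy"
            else if (bo > 10 * bi) verdict = "write-heavy"
            else                   verdict = "mixed"
            printf "%s bi=%d bo=%d\n", verdict, bi, bo
        }
    '
}

# Typical use on the host: five one-second samples.
# vmstat 1 5 | classify_vmstat
```

On a host behaving like app-db during the slowdown, a read-heavy verdict would be the cue to move on to iostat -x and ps to find which device and which process are doing the reading. At least two samples are needed, because the first vmstat row is discarded.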