Load Average 8 on a 4-Core Box — But the CPU Is Idle
Hard

Problem

db-host (4 vCPU) is paging alerts: load average ~8 and the database is crawling. The on-call's first instinct is "the CPU is melting — find the hog and kill it." But load is not the same as CPU. Work out what's actually saturated, identify the real source, and name the fix — without killing the wrong thing.

Initial setup

  • Host: db-host, Debian 12, 4 vCPU.
  • A node API, a Postgres instance, and an hourly backup job.

Acceptance

You've solved it when:

  • You've read top's header: load ~8 on 4 cores, BUT %Cpu(s) shows low
us and high wa (I/O wait) with the CPU mostly idle — so this is NOT CPU saturation.
  • You've used the STAT column (top, or ps -eo pid,%cpu,stat,comm)
to see the load-drivers are in state D (uninterruptible sleep on I/O): the rsync backup plus the Postgres backends blocked behind it. You've rejected the node process (STAT R, ~12% CPU) as the decoy it is.
  • You've named the fix: relieve the disk I/O (stop/ionice the runaway
hourly rsync and reschedule it off-peak — see the cron), NOT kill the node process, NOT add CPUs, NOT reboot.
Live session
Code
SavedNo commands yet