CPU Pegged, But No Process Is Using It
Medium

Problem

web-edge (a small reverse-proxy VM) is unresponsive and dropping requests. uptime and top show the CPU pegged, but when you sort the process list by CPU, no application process is using much of it; nginx itself is nearly idle. So what is eating the CPU, and why won't adding vCPUs or restarting nginx fix it?

Initial setup

  • Host: web-edge, Debian 12, 2 vCPU, runs nginx as a reverse proxy.

Acceptance

You've solved it when:

  • You've read top's %Cpu(s) line and seen that the CPU time is in si (softirq), not us (user) or sy (system); us is only about 2%. The CPU is busy in the kernel's interrupt path, which is why no userland process shows up as the hog. (ps or top sorted by %CPU shows only ksoftirqd, the kernel's softirq thread, running flat out; nginx is idle.)
  • You've run vmstat and seen that the in (interrupts/s) column is enormous (~152,000) while cs (context switches/s) is only moderately elevated, and si/so (vmstat's swap-in/swap-out, not top's softirq si) and wa (I/O wait) are 0. This is an interrupt storm: not swapping, not I/O wait, and not a context-switch storm.
  • You've confirmed the source in dmesg: the NIC (eth0, driver r8169) is logging repeated NETDEV WATCHDOG ... transmit queue ... timed out / Reset adapter messages. The NIC is generating interrupts faster than one core can service them, and all NET_RX softirq work is landing on a single core.
  • You've named the fix: spread the interrupt/softirq load. Enable or restart irqbalance or set IRQ affinity, enable multiple NIC RX queues (RSS via ethtool -L), turn on software steering (RPS/RFS), and/or fix interrupt coalescing (ethtool -C) or update the NIC driver. The fix is NOT adding vCPUs (a second idle core won't help while all softirq work is pinned to one core), NOT restarting or killing nginx (it isn't the consumer), and NOT rebooting; and the cause is NOT swap (si/so 0) or I/O wait (wa 0).
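The first acceptance check, reading top's %Cpu(s) line, can be made scriptable. The sketch below parses such a line and reports which non-idle state dominates. It assumes top's default English output with '.' as the decimal separator, and the sample line is made up to match the scenario (us about 2%, nearly everything else in si).

```python
import re

# The CPU-state fields top prints on its %Cpu line, in order.
# Note: top's "si" is softirq time; vmstat reuses the name "si" for swap-in.
FIELDS = ("us", "sy", "ni", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line: str) -> dict[str, float]:
    """Parse a top '%Cpu(s):' summary line into {state: percent}.

    Assumes default English output with '.' as the decimal separator.
    """
    _, _, body = line.partition(":")
    pairs = re.findall(r"([\d.]+)\s*([a-z]{2})\b", body)
    return {name: float(value) for value, name in pairs if name in FIELDS}


def busiest_state(line: str) -> str:
    """Return the non-idle CPU state with the largest share."""
    stats = parse_cpu_line(line)
    busy = {name: pct for name, pct in stats.items() if name != "id"}
    return max(busy, key=busy.get)


if __name__ == "__main__":
    # Hypothetical line shaped like the scenario: little us, lots of si.
    sample = "%Cpu(s):  2.0 us,  1.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 96.5 si,  0.0 st"
    print(busiest_state(sample))  # si
```

On the live host you would feed it the last %Cpu(s) line from `top -b -n 2 -d 1`, since top's first batch iteration can reflect averages since boot rather than the current interval.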
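The vmstat check in the second acceptance bullet is a process of elimination: rule out swap and I/O wait, then look at interrupts versus context switches. A minimal sketch of that reasoning follows. The threshold values are hypothetical cut-offs chosen for illustration, not figures from the scenario, and columns are looked up by header name so the parser does not depend on vmstat's exact column order.

```python
# Hypothetical cut-offs for illustration; tune them for your own hosts.
IN_STORM = 50_000   # interrupts per second
CS_STORM = 100_000  # context switches per second
WA_HIGH = 20        # percent of CPU time in I/O wait


def parse_vmstat(output: str) -> dict[str, int]:
    """Map vmstat's column names to the values on its last sample line.

    vmstat's first sample is an average since boot, so run it with an
    interval (for example `vmstat 1 5`) and read the last line.
    """
    lines = [line.split() for line in output.splitlines() if line.strip()]
    header = next(cols for cols in lines if cols[:2] == ["r", "b"])
    return dict(zip(header, map(int, lines[-1])))


def classify(sample: dict[str, int]) -> str:
    """Name the most likely bottleneck, checking the alternatives in order."""
    if sample["si"] or sample["so"]:  # vmstat si/so = swap-in/swap-out
        return "swapping"
    if sample["wa"] >= WA_HIGH:
        return "I/O wait"
    if sample["in"] >= IN_STORM:
        return "interrupt storm"
    if sample["cs"] >= CS_STORM:
        return "context-switch storm"
    return "inconclusive"
```

With a sample line showing in at about 152,000, cs modest, and si/so/wa at 0, `classify(parse_vmstat(output))` returns "interrupt storm".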
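The claim in the third acceptance bullet that all NET_RX softirq work lands on one core can be checked against /proc/softirqs, which lists per-CPU softirq counts. Those counters are cumulative since boot, so the sketch below compares two snapshots and reports which CPU handled most of the NET_RX work in between. The function names are hypothetical helpers, not part of any tool.

```python
def parse_softirqs(text: str) -> dict[str, list[int]]:
    """Parse /proc/softirqs into {softirq name: [count on CPU0, CPU1, ...]}."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header line: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        name, sep, counts = line.partition(":")
        if sep:
            table[name.strip()] = [int(c) for c in counts.split()[:ncpus]]
    return table


def net_rx_hotspot(before: str, after: str) -> tuple[int, float]:
    """Return (CPU index, share) for the CPU that handled the most NET_RX
    softirqs between two snapshots of /proc/softirqs."""
    old = parse_softirqs(before)["NET_RX"]
    new = parse_softirqs(after)["NET_RX"]
    delta = [n - o for o, n in zip(old, new)]
    total = sum(delta)
    cpu = max(range(len(delta)), key=delta.__getitem__)
    return cpu, (delta[cpu] / total if total else 0.0)


def snapshot() -> str:
    """Read the current per-CPU softirq counters (Linux only)."""
    with open("/proc/softirqs") as f:
        return f.read()
```

Usage: take `first = snapshot()`, wait a second, then call `net_rx_hotspot(first, snapshot())`. A share close to 1.0 means one core is absorbing essentially all receive processing while the other sits idle, which is why adding vCPUs does not help.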
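Of the fixes in the last acceptance bullet, software steering (RPS/RFS) is the simplest to show concretely. The sketch below builds the hex CPU bitmask and the list of sysfs/procfs writes needed to enable both on every RX queue. It assumes eth0 has a single RX queue (rx-0) and that both vCPUs should share receive work. The 32768 flow-table size is an assumed starting value, one the kernel's networking scaling documentation describes as working well on a moderately loaded server. The script only prints the writes, which would need root to apply.

```python
def cpu_mask(cpus: list[int]) -> str:
    """Hex CPU bitmask, the format rps_cpus and /proc/irq/<N>/smp_affinity take.

    Assumes fewer than 32 CPUs; larger masks are comma-separated 32-bit groups.
    """
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def rps_rfs_plan(iface: str, rx_queues: int, cpus: list[int],
                 flow_entries: int = 32768) -> dict[str, str]:
    """Return {path: value} writes that enable RPS and RFS on every RX queue.

    flow_entries is an assumed starting size for the global RFS table,
    split evenly across the RX queues for rps_flow_cnt.
    """
    mask = cpu_mask(cpus)
    plan = {"/proc/sys/net/core/rps_sock_flow_entries": str(flow_entries)}
    for q in range(rx_queues):
        queue = f"/sys/class/net/{iface}/queues/rx-{q}"
        plan[f"{queue}/rps_cpus"] = mask
        plan[f"{queue}/rps_flow_cnt"] = str(flow_entries // rx_queues)
    return plan


if __name__ == "__main__":
    # Dry run: print the writes instead of performing them.
    for path, value in rps_rfs_plan("eth0", rx_queues=1, cpus=[0, 1]).items():
        print(f"echo {value} > {path}")
```

The same mask format applies if you pin the NIC's hardware interrupt by hand instead of relying on irqbalance: look up the IRQ number for eth0 in /proc/interrupts and write the mask to that IRQ's smp_affinity file.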