web-edge (a small reverse-proxy VM) is unresponsive and dropping requests. uptime/top show the CPU pegged — but when you sort the process list by CPU, no application process is using much CPU at all. nginx is barely working. So what is eating the CPU, and why won't adding vCPUs or restarting nginx fix it?
web-edge, Debian 12, 2 vCPU, runs nginx as a reverse proxy.
You've solved it when:
You've read top's %Cpu(s) line and seen the CPU is in si (softirq), not us (user) or sy (system); us is ~2%. The CPU is
busy in the kernel's interrupt path, which is why no userland process
shows up as the hog. (ps/top by %CPU shows only ksoftirqd, the
kernel softirq thread, running flat out — nginx is idle.)
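
The same us/sy/si split that top reports can be derived from two samples of the aggregate "cpu" line in /proc/stat. The sketch below is only an illustration: the two sample lines are made up, and on a live machine you would read /proc/stat twice a few seconds apart instead.

```python
# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]

def parse_cpu_line(line):
    """Return the first eight counters of a /proc/stat 'cpu' line as a dict."""
    parts = line.split()
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))

def cpu_shares(before, after):
    """Percentage of elapsed CPU time spent in each state between two samples."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values()) or 1  # guard against identical samples
    return {k: 100.0 * v / total for k, v in delta.items()}

# Made-up sample lines, for illustration only.
SAMPLE_BEFORE = "cpu 1000 0 500 8000 0 0 500 0 0 0"
SAMPLE_AFTER = "cpu 1004 0 506 8100 0 0 590 0 0 0"

shares = cpu_shares(parse_cpu_line(SAMPLE_BEFORE), parse_cpu_line(SAMPLE_AFTER))
for state in ("user", "system", "softirq", "idle"):
    print(f"{state:8s} {shares[state]:5.1f}%")
```

A large softirq share next to a tiny user share is the pattern described above: the time is spent in the kernel, so no userland process accounts for it.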
You've run vmstat and seen the in (interrupts/s) column is very high, while cs (context switches) is only moderately
elevated, and si/so (swap) and wa (iowait) are 0 — so this is an
interrupt storm, not swapping, not I/O-wait, and not a
context-switch storm.
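
The in and cs columns of vmstat come from kernel counters that /proc/stat also exposes, as the "intr" and "ctxt" lines. As a rough sketch, the rates can be computed from two samples like this. The sample text and the 2-second interval are invented for illustration and are not readings from web-edge.

```python
def read_counters(stat_text):
    """Extract total interrupts and context switches from /proc/stat text."""
    counters = {}
    for line in stat_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "intr":
            counters["intr"] = int(parts[1])  # first number is the grand total
        elif parts[0] == "ctxt":
            counters["ctxt"] = int(parts[1])
    return counters

def rates(before, after, seconds):
    """Per-second rate of each counter between two samples."""
    return {k: (after[k] - before[k]) / seconds for k in before}

# Made-up samples, assumed to be taken 2 seconds apart.
SAMPLE_T0 = "intr 1000000 5 0 0\nctxt 500000\n"
SAMPLE_T1 = "intr 1600000 5 0 0\nctxt 520000\n"

per_second = rates(read_counters(SAMPLE_T0), read_counters(SAMPLE_T1), 2.0)
print(per_second)
```

An interrupt rate far above the context-switch rate points to an interrupt storm and not a scheduling problem.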
You've checked dmesg: the NIC (eth0, r8169) is logging NETDEV WATCHDOG ... transmit queue ... timed out /
Reset adapter — the NIC is generating interrupts faster than one core
can service them, with all NET_RX softirq landing on a single core.
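
One way to confirm that NET_RX softirq work is landing on a single core is to look at the per-CPU counters in /proc/softirqs. The sketch below parses text in that layout and reports each CPU's share of NET_RX. The sample table is invented. The real counters are cumulative since boot, so on a live system it is better to diff two reads.

```python
def net_rx_share(softirqs_text):
    """Return {cpu_name: percent of NET_RX softirqs} from /proc/softirqs text."""
    lines = softirqs_text.strip().splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        if name.strip() == "NET_RX":
            counts = [int(x) for x in rest.split()]
            total = sum(counts) or 1  # guard against an all-zero row
            return {cpu: 100.0 * c / total for cpu, c in zip(cpus, counts)}
    return {}

# Made-up sample in the /proc/softirqs layout, for illustration only.
SAMPLE = """
                    CPU0       CPU1
          HI:          0          0
       TIMER:     120000     118000
      NET_TX:         40         12
      NET_RX:     990000      10000
"""

share = net_rx_share(SAMPLE)
print(share)
```

A split this lopsided is what "all NET_RX softirq landing on a single core" looks like in the counters.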
You've identified the fix: enable irqbalance or set IRQ affinity, enable multiple NIC RX queues
(RSS via ethtool -L), turn on software steering (RPS/RFS), and/or fix
interrupt coalescing (ethtool -C) or update the NIC driver. NOT add
vCPUs (a second idle core won't help while all softirq pins to one core),
NOT restart/kill nginx (it isn't the consumer), NOT reboot, and
it is NOT swap (si/so 0) or I/O-wait (wa 0).
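
Both manual IRQ affinity (/proc/irq/<n>/smp_affinity) and RPS (/sys/class/net/<iface>/queues/rx-<q>/rps_cpus) take a hexadecimal CPU bitmask. The helper below is a minimal sketch of how such a mask is built. The function names are hypothetical, and the code only computes the values; it does not write them, since that needs root and the right IRQ number for the NIC.

```python
def cpu_mask(cpus):
    """Hex bitmask with one bit set per CPU index (valid for up to 32 CPUs)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

def rps_path(iface, queue):
    """sysfs file that controls RPS for one receive queue of an interface."""
    return f"/sys/class/net/{iface}/queues/rx-{queue}/rps_cpus"

# On a 2-vCPU box, steer receive processing for eth0's first queue to both cores.
mask = cpu_mask([0, 1])
print(f"echo {mask} > {rps_path('eth0', 0)}")
```

The printed line is the command an operator would run as root. The same mask format applies when pinning a NIC IRQ to a specific core through smp_affinity.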