pg-host (4 vCPU) shows 100% CPU and throughput has collapsed — queries that took milliseconds now take seconds. The on-call's reflex is "CPU is maxed, scale up the cores." But the CPU is busy doing the wrong thing. Work out what, and why adding cores would make it worse.
pg-host, Debian 12, 4 vCPU, a Postgres database under aYou've solved it when:
top: the CPU is pegged but it's almost all sy (system /%sy ~84, %us ~10. The box is busy in the kernel, not
running application code.
vmstat and seen the smoking gun: cs (contextr ~38 (far above 4 cores),
while si/so are 0 (not swap) and wa is 0 (not I/O). This is a
context-switch / lock-contention storm.
ps/top to see dozens of Postgres backends all STAT R,postgresql.conf: max_connections = 600 with no connection pooler,
so the spike put hundreds of active backends on 4 cores.
max_connections) so only a handful run at
once. NOT add vCPUs (more cores worsens the contention), NOT kill
backends one by one (they're victims), NOT reboot.