Two pages fired in the same minute and it looks like two separate fires:
redis ... Connection refused.
It is tempting to split up and debug two outages. Don't. Two services that don't talk to each other failing at the exact same second with the same error host is a tell: they share something. Find the one thing both depend on.
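A minimal sketch of that "find the one thing both depend on" step, assuming the two error strings are already in hand. The sample strings are the web and worker errors from this incident; the helper name shared_endpoints and the host:port pattern are illustrative assumptions, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: print every host:port that appears in BOTH messages.
shared_endpoints() {
    for msg in "$1" "$2"; do
        # Pull host:port tokens out of one message, de-duplicated per message.
        printf '%s\n' "$msg" | grep -oE '[A-Za-z0-9.-]+:[0-9]+' | sort -u
    done | sort | uniq -d   # a token listed twice was seen in both messages
}

# Sample input: the errors reported by web and worker in this incident.
web_err='redis.exceptions.ConnectionError: Error 111 connecting to redis:6379.'
worker_err='Cannot connect to redis://redis:6379/0.'

shared_endpoints "$web_err" "$worker_err"
```

With these two strings the only endpoint printed is redis:6379. The same idea extends to real logs by feeding in the output of docker logs for each service instead of the sample strings.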
web is Up but every request that touches the cache 500s with redis.exceptions.ConnectionError: Error 111 connecting to redis:6379.
worker is Up but its consumer loop is stuck retrying Cannot connect to redis://redis:6379/0.
redis:6379 appears in both unrelated services' errors.
$ docker ps -a --format 'table {{.Names}}\t{{.Status}}'
NAMES STATUS
web Up 8 hours
worker Up 8 hours
redis Exited (137) 90 seconds ago
$ docker inspect redis --format '{{.State.OOMKilled}} / {{.State.ExitCode}}'
true / 137
$ docker network inspect appnet
[
{
"Name": "appnet",
"Containers": {
"web": { "Name": "web", "EndpointState": "connected", "ipv4_address": "172.32.0.10" },
"worker": { "Name": "worker", "EndpointState": "connected", "ipv4_address": "172.32.0.20" }
}
}
]
Note redis is absent from the network's active members — an exited container drops its endpoint — while web and worker are still there. redis is Exited (137) (SIGKILL, State.OOMKilled=true) — one dead container. Both web (which uses it as a cache) and worker (which uses it as a broker) point at the same redis:6379. That single Redis is a shared dependency / single point of failure: when it died, the blast radius hit two tiers at once. Two symptoms, one root cause.
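The 137 is worth decoding once by hand. A process killed by a signal is reported with exit code 128 + signal number, so 137 is 128 + 9, and signal 9 is SIGKILL. A small sketch of that arithmetic follows; the helper name explain_exit is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: explain an exit code using the 128 + N signal convention.
explain_exit() {
    code=$1
    if [ "$code" -gt 128 ]; then
        # Codes above 128 mean "terminated by signal (code - 128)".
        echo "exit $code = 128 + $((code - 128)): terminated by signal $((code - 128))"
    else
        echo "exit $code: process exited on its own"
    fi
}

explain_exit 137   # signal 9 is SIGKILL
```

Exit code 137 alone only says SIGKILL; a manual docker kill produces it too. It is State.OOMKilled=true from docker inspect that pins the kill on memory.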
You've solved it when:
One redis container is down (Exited (137),
OOM-killed), and both web (cache) and worker (broker) depend on
it: one root cause, not two independent outages.
The evidence is redis:6379 in both errors (and docker network inspect / both containers'
Redis target being the one node).
docker start redis (and, since it was OOM-killed, set maxmemory + an eviction
policy, and ideally split the cache Redis from the broker Redis so
one death can't take down two tiers).
redis.status == running, web's cache 500s stop, and worker's consumer reconnects.
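The first of those checks can be scripted. This is a rough sketch, assuming the container is named redis as in this incident; wait_running is a hypothetical helper, and the retry count and one-second pause are arbitrary choices. It only covers the container status, so the web 500s and the worker reconnect still have to be confirmed separately.

```shell
#!/bin/sh
# Hypothetical helper: poll a container until Docker reports it as running.
# Usage: wait_running <container> [tries]
wait_running() {
    name=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        # State.Status is e.g. "running" or "exited"; empty if inspect fails.
        status=$(docker inspect --format '{{.State.Status}}' "$name" 2>/dev/null)
        if [ "$status" = "running" ]; then
            echo "$name is running"
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    echo "$name did not reach running (last status: ${status:-unknown})" >&2
    return 1
}
```

A typical call would be `docker start redis && wait_running redis`. If the container is OOM-killed again right after starting, the loop times out and reports the last status it saw instead of claiming success.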
docker CLI only.
Do not treat web or worker as the fix: they are healthy clients of a dead dependency.
Which hardening would you choose (maxmemory+eviction, a restart policy, a replica, or
splitting cache from broker), and why?
web and worker were both Up the whole time; docker ps never showed a problem with either of them.
Suppose a client had logged SocketClosedUnexpectedlyError: Socket closed unexpectedly rather than
ECONNREFUSED. Why does "was connected, then dropped" look different
from "never connected", and which one tells you the container died
mid-flight?