claude-code - 💡(How to fix) Fix Opus (CLI) took down a node's DNS by deploying a k8s LoadBalancer on :53, after dismissing on-screen evidence the port was the host resolver

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Claude Code CLI (Opus) took down a host's DNS during a live infrastructure deploy — after looking straight at the evidence that should have stopped it.

On a single-node k3s cluster whose host runs systemd-resolved on 127.0.0.53:53, the agent deployed a Kubernetes LoadBalancer Service on port 53. k3s servicelb (klipper) responds by binding the LB port as a hostPort on 0.0.0.0, which seized 0.0.0.0:53 and clobbered the host resolver. The node then could not resolve anything (image pulls, its own hostname), cascading into stuck workloads. Recovery required deleting the Service.

Root Cause

Claude Code CLI (Opus) took down a host's DNS during a live infrastructure deploy — after looking straight at the evidence that should have stopped it.

On a single-node k3s cluster whose host runs systemd-resolved on 127.0.0.53:53, the agent deployed a Kubernetes LoadBalancer Service on port 53. k3s servicelb (klipper) responds by binding the LB port as a hostPort on 0.0.0.0, which seized 0.0.0.0:53 and clobbered the host resolver. The node then could not resolve anything (image pulls, its own hostname), cascading into stuck workloads. Recovery required deleting the Service.

RAW_BUFFERClick to expand / collapse

Summary

Claude Code CLI (Opus) took down a host's DNS during a live infrastructure deploy — after looking straight at the evidence that should have stopped it.

On a single-node k3s cluster whose host runs systemd-resolved on 127.0.0.53:53, the agent deployed a Kubernetes LoadBalancer Service on port 53. k3s servicelb (klipper) responds by binding the LB port as a hostPort on 0.0.0.0, which seized 0.0.0.0:53 and clobbered the host resolver. The node then could not resolve anything (image pulls, its own hostname), cascading into stuck workloads. Recovery required deleting the Service.

The part that makes this a Claude-judgment bug, not an ops mistake

In its own pre-flight the agent ran ss -ulpn and saw 127.0.0.53:53 and 127.0.0.54:53 listening, then wrote (paraphrase): "only loopback resolvers, so :53 is free" — and proceeded to put a LoadBalancer on :53. Those loopback listeners ARE the host resolver. The correct read of "something is on 127.0.0.53:53" is "do not bind :53 on this node." The agent had the disqualifying evidence on screen and explicitly rationalized it away.

Underlying it: the agent imported an unnecessary constraint ("a DNS server must use port 53") that the architecture never required — the only direct clients connect to an explicit address:port and accept any port. Moving the server to :5353 removes the conflict entirely. The agent only recognized this after the operator asked "why did you think you needed port 53?"

Behavior pattern worth flagging

  1. Dismissed contradicting evidence it had just gathered, with a confident one-line rationalization.
  2. Charged a live, blast-radius-heavy infra change (DNS) on an unverified assumption, rather than treating "is this port truly free for a wildcard bind?" as a gate.
  3. Didn't reason about the kind of bind a k8s LoadBalancer performs (0.0.0.0:port hostPort) vs. a plain listener, so "the public IP has no listener" was mistaken for "the port is free."

Environment

  • Surface: Claude Code CLI
  • Model: Opus (large-context)
  • OS: macOS (driving a remote Linux/k3s host)

Ask

Flagging as a risk-assessment / evidence-handling failure: when on-screen evidence contradicts an assumption behind an irreversible or service-affecting action, the model should treat that as a hard stop, not rationalize past it. Happy to provide redacted transcript excerpts.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

claude-code - 💡(How to fix) Fix Opus (CLI) took down a node's DNS by deploying a k8s LoadBalancer on :53, after dismissing on-screen evidence the port was the host resolver