claude-code - 💡(How to fix) Fix Opus (CLI) took down a node's DNS by deploying a k8s LoadBalancer on :53, after dismissing on-screen evidence the port was the host resolver

claude-code2026-05-31 02:35:19

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

Claude Code CLI (Opus) took down a host's DNS during a live infrastructure deploy — after looking straight at the evidence that should have stopped it.

On a single-node k3s cluster whose host runs systemd-resolved on 127.0.0.53:53, the agent deployed a Kubernetes LoadBalancer Service on port 53. k3s servicelb (klipper) responds by binding the LB port as a hostPort on 0.0.0.0, which seized 0.0.0.0:53 and clobbered the host resolver. The node then could not resolve anything (image pulls, its own hostname), cascading into stuck workloads. Recovery required deleting the Service.

Root Cause

Claude Code CLI (Opus) took down a host's DNS during a live infrastructure deploy — after looking straight at the evidence that should have stopped it.

RAW_BUFFERClick to expand / collapse

Summary

Claude Code CLI (Opus) took down a host's DNS during a live infrastructure deploy — after looking straight at the evidence that should have stopped it.

The part that makes this a Claude-judgment bug, not an ops mistake

In its own pre-flight the agent ran ss -ulpn and saw 127.0.0.53:53 and 127.0.0.54:53 listening, then wrote (paraphrase): "only loopback resolvers, so :53 is free" — and proceeded to put a LoadBalancer on :53. Those loopback listeners ARE the host resolver. The correct read of "something is on 127.0.0.53:53" is "do not bind :53 on this node." The agent had the disqualifying evidence on screen and explicitly rationalized it away.

Underlying it: the agent imported an unnecessary constraint ("a DNS server must use port 53") that the architecture never required — the only direct clients connect to an explicit address:port and accept any port. Moving the server to :5353 removes the conflict entirely. The agent only recognized this after the operator asked "why did you think you needed port 53?"

Behavior pattern worth flagging

Dismissed contradicting evidence it had just gathered, with a confident one-line rationalization.
Charged a live, blast-radius-heavy infra change (DNS) on an unverified assumption, rather than treating "is this port truly free for a wildcard bind?" as a gate.
Didn't reason about the kind of bind a k8s LoadBalancer performs (0.0.0.0:port hostPort) vs. a plain listener, so "the public IP has no listener" was mistaken for "the port is free."

Environment

Surface: Claude Code CLI
Model: Opus (large-context)
OS: macOS (driving a remote Linux/k3s host)

Ask

Flagging as a risk-assessment / evidence-handling failure: when on-screen evidence contradicts an assumption behind an irreversible or service-affecting action, the model should treat that as a hard stop, not rationalize past it. Happy to provide redacted transcript excerpts.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix Opus (CLI) took down a node's DNS by deploying a k8s LoadBalancer on :53, after dismissing on-screen evidence the port was the host resolver

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Summary

The part that makes this a Claude-judgment bug, not an ops mistake

Behavior pattern worth flagging

Environment

Ask

Still need to ship something?

TRENDING