claude-code - 💡(How to fix) Fix [BUG] 2.1.128 deterministically hangs during settings init under libkrun + uid-shifted virtio-fs [1 comments, 2 participants]

GitHub stats
anthropics/claude-code#56220 (fetched 2026-05-06 06:33:58)
Comments: 1
Participants: 2
Timeline events: 6
Reactions: 0
Timeline (top): labeled ×5, commented ×1

Fix / Workaround

Workaround: drop --userns=keep-id from the podman run invocation. Inside a krun microVM, --userns=keep-id is mostly cosmetic anyway — the guest has its own kernel and virtio-fs presents files as root regardless of the host-side mapping — so the only consequence of dropping it is that bind-mounted files written by the guest land owned by the rootless subuid range on the host, easy to fix up with chown if needed.


Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report
  • I am using the latest version of Claude Code

What's Wrong?

Starting in 2.1.128, claude deterministically hangs during early init when run inside a libkrun microVM (podman + crun's run.oci.handler=krun) with --userns=keep-id. The CLI never produces output, never exits on its own, and is killable only by an external timeout. Same image, same flags, with 2.1.126 — works fine. Same image with 2.1.128 but without --userns=keep-id — also works fine. The hang requires the conjunction.

The debug log always ends the same way: the Programmatic settings change notification for policySettings event, followed by four broken-symlink probes for the settings.json candidates. After that, the process drops into a pure event-loop poll (epoll_pwait2 + futex + clock_gettime + madvise) and never resumes.

What Should Happen?

claude -p "say pong" should produce a streamed response within a few seconds, exactly as 2.1.126 does in the same environment, and as 2.1.128 itself does without --userns=keep-id.

Error Messages/Logs

# /work/d.log: last 6 lines from a hung run (full log = 70 lines, all of it before the divergence point)
2026-05-05T06:11:27.866Z [DEBUG] Initializing Plugins
2026-05-05T06:11:27.868Z [DEBUG] Programmatic settings change notification for policySettings
2026-05-05T06:11:27.869Z [DEBUG] Broken symlink or missing file encountered for settings.json at path: /home/node/.claude/settings.json
2026-05-05T06:11:27.869Z [DEBUG] Broken symlink or missing file encountered for settings.json at path: /work/.claude/settings.json
2026-05-05T06:11:27.870Z [DEBUG] Broken symlink or missing file encountered for settings.json at path: /work/.claude/settings.local.json
2026-05-05T06:11:27.871Z [DEBUG] Broken symlink or missing file encountered for settings.json at path: /etc/claude-code/managed-settings.json
# (no further log lines; process hangs until SIGTERM from outer timeout)

# Comparison: 2.1.126 in the same environment writes ~184 lines and exits 0.
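
The log signature above can be checked mechanically. The following is a minimal sketch, not part of the original report: log_ends_at_settings_probes is a hypothetical helper name, and it assumes the hang is recognisable purely from the final non-empty line of the --debug-file log.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): succeeds when a --debug-file log
# stops at the settings.json probes, the signature described for the hang.
log_ends_at_settings_probes() {
    log_file=$1
    # Take the last non-empty line of the log.
    last_line=$(grep -v '^[[:space:]]*$' "$log_file" | tail -n 1)
    case $last_line in
        *"Broken symlink or missing file encountered for settings.json"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: log_ends_at_settings_probes /tmp/d.log && echo "log stops at the settings probes". A passing run logs further lines after the probes, so the check fails for it.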

Symptom matrix (3 trials each, exact same Dockerfile/host/flags differing only in the marked dimension):

| claude version | krun, no --userns | krun + --userns=keep-id    |
|----------------|-------------------|----------------------------|
| 2.1.126        | works (184 lines) | works (184 lines)          |
| 2.1.128        | works (185 lines) | hangs (70 lines, exit 124) |
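
A matrix like this can be collected with a small trial loop. This is a sketch under assumptions: run_trials is a hypothetical helper, the wrapped command is expected to apply its own timeout, and exit status 124 (the status GNU timeout returns when the limit fires) is treated as a hang.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): run a command N times and label
# each trial. The command is expected to enforce its own timeout; exit status
# 124 is what GNU timeout returns when the limit fires.
run_trials() {
    trials=$1
    shift
    i=1
    while [ "$i" -le "$trials" ]; do
        "$@" >/dev/null 2>&1 </dev/null
        status=$?
        if [ "$status" -eq 124 ]; then
            echo "trial $i: hangs (exit 124)"
        elif [ "$status" -eq 0 ]; then
            echo "trial $i: works (exit 0)"
        else
            echo "trial $i: fails (exit $status)"
        fi
        i=$((i + 1))
    done
}
```

For example, run_trials 3 podman run --rm --runtime crun --annotation run.oci.handler=krun --userns=keep-id --entrypoint /bin/bash claude-repro:128 -c 'timeout 25 claude -p "say pong" </dev/null' would label three trials of the hanging configuration, assuming podman run propagates the container's exit status under krun as it normally does.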

Strings present in claude.exe for 2.1.128 but absent in 2.1.126 that look related (extracted via strings on the bundled binary):

  • migration_bypass_permissions_to_settings
  • migration_auto_updates_to_settings
  • migration_mcp_servers_to_settings
  • migration_user_intent_to_settings
  • sandbox_set_settings
  • getPolicySettingsLoadErrors
  • Cannot destructure property 'getSettingsForSource' from null or undefined value

These suggest 2.1.128 introduced a new settings-migration / policy-settings code path; the hang sitting immediately after the policySettings notification + symlink probes is consistent with that path being where it stalls. Pre-seeding ~/.claude.json with migrationVersion: 13 from a prior successful run does not prevent the hang.
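
The string comparison described above could be reproduced roughly as follows. This is a sketch, not the reporter's actual procedure: new_strings and extract_strings are hypothetical helper names, the default filter "settings" is an assumption, and the grep fallback only approximates strings(1) when binutils is not installed.

```shell
#!/bin/sh
# Hypothetical helpers (not from the report): list printable strings that are
# present in the new binary but absent from the old one, filtered by pattern.
extract_strings() {
    if command -v strings >/dev/null 2>&1; then
        strings "$1"
    else
        # Fallback when binutils is missing: runs of 4+ printable characters.
        LC_ALL=C grep -aoE '[[:print:]]{4,}' "$1"
    fi
}

new_strings() {
    old_bin=$1
    new_bin=$2
    pattern=${3:-settings}
    old_list=$(mktemp)
    new_list=$(mktemp)
    # comm(1) requires sorted input; sort -u also removes duplicates.
    extract_strings "$old_bin" | LC_ALL=C sort -u > "$old_list"
    extract_strings "$new_bin" | LC_ALL=C sort -u > "$new_list"
    # comm -13 keeps only the lines unique to the second (new) list.
    LC_ALL=C comm -13 "$old_list" "$new_list" | grep -i -- "$pattern"
    rm -f "$old_list" "$new_list"
}
```

Usage: new_strings path/to/old/claude.exe path/to/new/claude.exe settings, where both paths are placeholders for wherever the two bundled binaries were extracted.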

Steps to Reproduce

Host: Fedora 44, podman 5.x rootless, crun 1.27.1 with +LIBKRUN, libkrun 1.17.4, libkrunfw 5.3.0, kernel 6.12.
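
Before reproducing, it may help to confirm that the local crun build includes krun support, as the +LIBKRUN feature flag in the host description indicates. A minimal sketch, where crun_has_libkrun is a hypothetical helper that reads the output of crun --version on stdin:

```shell
#!/bin/sh
# Hypothetical helper (not from the report): reads `crun --version` output on
# stdin and succeeds if the +LIBKRUN feature flag is listed.
crun_has_libkrun() {
    # -F matches the text literally, so "+" is not treated as a regex operator.
    grep -qF -- '+LIBKRUN'
}
```

Usage: crun --version | crun_has_libkrun || echo "this crun build lacks libkrun support".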

  1. Build a minimal image with v2.1.128:

    # Dockerfile
    FROM docker.io/library/node:22-bookworm-slim
    RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates curl && rm -rf /var/lib/apt/lists/*
    RUN npm install -g @anthropic-ai/claude-code@2.1.128 && claude --version
    ENTRYPOINT ["/bin/bash"]

    podman build -t claude-repro:128 .
  2. Run under krun without --userns=keep-id — this works:

    podman run --rm \
      --runtime crun --annotation run.oci.handler=krun \
      --entrypoint /bin/bash claude-repro:128 \
      -c 'timeout 25 claude -p "say pong" --debug-file /tmp/d.log </dev/null; echo "exit=$?"; wc -l /tmp/d.log'
    # → exit=0, ~185 debug-log lines, response received
  3. Add --userns=keep-id — same image, same command otherwise — hangs:

    podman run --rm \
      --runtime crun --annotation run.oci.handler=krun \
      --userns=keep-id \
      --entrypoint /bin/bash claude-repro:128 \
      -c 'timeout 25 claude -p "say pong" --debug-file /tmp/d.log </dev/null; echo "exit=$?"; wc -l /tmp/d.log'
    # → exit=124 (timeout fired), 70 debug-log lines, no output
  4. Swap the version pin to 2.1.126 in step 1, rebuild, and repeat step 3: it works. The regression is the difference in step 3's behaviour between 2.1.126 and 2.1.128.

Reproduces in 3/3 trials, every time, on a clean state.
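
To keep the two builds identical apart from the version pin (step 4), the Dockerfile can be generated from a single template. This is a sketch under assumptions: write_dockerfile is a hypothetical helper, and @anthropic-ai/claude-code is assumed to be the npm package the Dockerfile in step 1 installs.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): emit the reproduction Dockerfile
# for a given Claude Code version so builds differ only in the version pin.
# Assumption: the npm package is @anthropic-ai/claude-code.
write_dockerfile() {
    version=$1
    cat <<EOF
FROM docker.io/library/node:22-bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates curl && rm -rf /var/lib/apt/lists/*
RUN npm install -g @anthropic-ai/claude-code@${version} && claude --version
ENTRYPOINT ["/bin/bash"]
EOF
}
```

Usage: write_dockerfile 2.1.126 > Dockerfile && podman build -t claude-repro:126 . (the claude-repro:126 tag is illustrative), then repeat with 2.1.128.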

Claude Model

Not sure / Multiple models (headless claude -p doesn't reach the model selector — the hang is well before any API call)

Is this a regression?

Yes, this worked in a previous version

Last Working Version

2.1.126

Claude Code Version

2.1.128 (Claude Code)

Platform

Anthropic API

Operating System

Other Linux (Fedora 44 host; container is node:22-bookworm-slim — Debian)

Terminal/Shell

Non-interactive/CI environment

Additional Information

Workaround: drop --userns=keep-id from the podman run invocation. Inside a krun microVM, --userns=keep-id is mostly cosmetic anyway — the guest has its own kernel and virtio-fs presents files as root regardless of the host-side mapping — so the only consequence of dropping it is that bind-mounted files written by the guest land owned by the rootless subuid range on the host, easy to fix up with chown if needed.
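
To see whether the ownership side effect actually occurred after a run, the bind-mounted directory can be scanned on the host. A minimal sketch, where list_foreign_owned is a hypothetical helper and the directory argument is whatever host path was bind-mounted:

```shell
#!/bin/sh
# Hypothetical helper (not from the report): list entries under a host
# directory that are not owned by the invoking user, i.e. candidates for an
# ownership fix-up after a run without --userns=keep-id.
list_foreign_owned() {
    dir=$1
    # find -user accepts a numeric user ID.
    find "$dir" ! -user "$(id -u)" -print
}
```

If the list is non-empty on a rootless host, one option to try is podman unshare chown -R 0:0 <dir>, which runs chown inside the rootless user namespace where UID 0 corresponds to the invoking user. Treat that as an assumption to verify on your setup rather than a confirmed fix for the krun case.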

Why I think it's specifically the new settings-migration code: I diffed the bundled claude.exe between 2.1.126 and 2.1.128 by string-table extraction (binaries are sha256-different but the public binding patterns are visible). The strings listed above are the only group I found that corresponds to a new init-time code path with plausible filesystem behaviour. An strace of the hung process shows no further filesystem syscalls after the symlink probes — just epoll/futex/clock_gettime — so whatever it's waiting for is internal to the JS runtime (Bun) rather than an outstanding I/O request, and it never times out on its own.
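
The strace observation can be summarised by counting syscalls by name in a saved trace. This is a sketch, not the reporter's tooling: syscall_histogram is a hypothetical helper, and it assumes the trace was written with strace -o (optionally with -f, which prefixes each line with a PID).

```shell
#!/bin/sh
# Hypothetical helper (not from the report): count syscalls by name in a trace
# saved with `strace -o` (with -f, each line starts with a PID column).
# Lines that do not start a syscall ("<... resumed>", "+++ exited") are skipped.
syscall_histogram() {
    trace_file=$1
    # Drop the optional PID column and keep the name before the "(".
    sed -n 's/^[0-9]*[[:space:]]*\([a-z_][a-z0-9_]*\)(.*/\1/p' "$trace_file" \
        | sort | uniq -c | sort -rn
}
```

A histogram from the hung process that contains only epoll_pwait2, futex, clock_gettime and madvise, with no openat, stat or readlink entries, would match the description above.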

If a maintainer wants to dig in: the cleanest signal would be adding debug logging immediately around whatever runs after Programmatic settings change notification for policySettings, particularly any settings-migration code that walks user dirs or watches for symlink targets. Happy to provide captured claude --debug-file logs from both passing and hanging cases, or run additional probes.
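
Given one passing and one hanging debug log, the first message the hung run never reached can be located by ignoring timestamps. This is a minimal sketch with hypothetical helper names (strip_timestamps, next_after_hang); it assumes both runs log identical message text up to the point of the hang.

```shell
#!/bin/sh
# Hypothetical helpers (not from the report): print the message in the passing
# log that follows the hung log's final message, ignoring timestamps.
strip_timestamps() {
    # Remove a leading ISO-8601 timestamp such as 2026-05-05T06:11:27.866Z.
    sed 's/^[0-9TZ:.-][0-9TZ:.-]* //' "$1"
}

next_after_hang() {
    pass_log=$1
    hang_log=$2
    LAST_HUNG=$(strip_timestamps "$hang_log" | tail -n 1)
    export LAST_HUNG
    # Remember the line after each match; the END block prints the one that
    # follows the last occurrence of the hung log's final message.
    strip_timestamps "$pass_log" | awk '
        found { candidate = $0; found = 0 }
        $0 == ENVIRON["LAST_HUNG"] { found = 1 }
        END { if (candidate != "") print candidate }
    '
}
```

Usage: next_after_hang pass.log hang.log, where both file names are placeholders. The printed line is a starting point for finding the code that runs right after the settings probes.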

extent analysis

TL;DR

The hang can be avoided by removing the --userns=keep-id flag from the podman run invocation. It appears only when that flag is combined with claude 2.1.128 under krun. The reporter suspects the new settings-migration code path in 2.1.128, but the root cause is unconfirmed.

Guidance

  • The hang appears only when 2.1.128 runs under krun with --userns=keep-id. The reporter attributes it to the new settings-migration / policy-settings code path in 2.1.128, based on a string diff of the two binaries; this is not confirmed.
  • Removing --userns=keep-id from the podman run invocation avoids the hang. Inside a krun microVM the flag is mostly cosmetic; the one side effect of dropping it is the host-side ownership of bind-mounted files.
  • To verify, run the same podman run command without --userns=keep-id and check that claude -p "say pong" returns a response and exits 0.
  • If the hang persists, capture a log with --debug-file and compare it against a passing run to find where the two diverge.

Example

The working invocation is the step 2 command in Steps to Reproduce: the same podman run, without --userns=keep-id.

Notes

Without --userns=keep-id, bind-mounted files written by the guest end up owned by the rootless subuid range on the host. This can be fixed up with chown if needed.

Recommendation

Apply the workaround by removing the --userns=keep-id flag from the podman run invocation. If the flag is required, 2.1.126 is the last version reported to work with it in this environment.
