claude-code - ✅(Solved) Fix --bare mode should respect --tools flag to allow adding tools back (e.g. Agent, Skill) [1 pull requests, 1 participants]

astefanutti · 2026-05-19T13:36:57Z

[claude-code] PR 67: Add --bare mode for reproducible headless execution - Repository: opendatahub-io/agent-eval-harness - Author: astefanutti - State: open |… # PR #67: Add --bare mode for reproducible headless execution - Repository: opendatahub-io/agent-eval-harness - Author: astefanutti - State: open | merged: False - Link: https://github.com/opendatahub-io/agent-eval-harness/pull/67 ## Description (problem / solution / changelog) ## Summary - Add `--bare` flag to Claude Code CLI invocation for faster, reproducible eval runs - Fix EvalHub adapter to create minimal settings.json per case (was passing no settings at all) - Add Claude Code documentation references to CLAUDE.md, eval-yaml-template.md, and data-pipeline.md ## Details **`--bare` mode** ([docs](https://code.claude.com/docs/en/headless#start-faster-with-bare-mode)) skips auto-discovery of hooks, plugins, MCP servers, and settings files. The harness already builds self-contained workspaces with explicit `--settings`, `--allowed-tools`, `--plugin-dir`, and `--append-system-prompt` flags — `--bare` just stops Claude from also loading things outside the workspace that the harness didn't put there. Benefits: - **Faster startup**: no scanning `~/.claude/`, `.mcp.json`, marketplace cache per invocation - **Reproducible**: same result on every machine regardless of user/project config - **Explicit**: only flags passed by the harness take effect **EvalHub adapter fix**: the `AgentEvalAdapter.run_benchmark_job()` path called `run_skill()` without `settings_path`, so the runner had no settings file. With `--bare` this means no permissions at all. New `_ensure_case_settings()` creates a minimal `.claude/settings.json` with permissions from eval.yaml before each case run. **Doc references** added: - [Headless mode](https://code.claude.com/docs/en/headless) — `--bare`, `--print`, streaming - [Tools reference](https://code.claude.com/docs/en/tools-reference) — tool names for permission rules - [Permissions](https://code.claude.com/docs/en/permissions) — `allow`/`deny` rule syntax ## Test plan - [x] 123 tests pass, 0 regressions - [ ] Run eval with `--bare` and verify identical results to without 🤖 Generated with [Claude Code](https://claude.com/claude-code) ## Summary by CodeRabbit * **Documentation** * Updated Claude Code CLI runner documentation to reflect correct `--bare --print` invocation * Added references to headless mode, tools reference, and permissions documentation * Updated YAML template and data pipeline docs with correct invocation and output format details * **New Features** * Claude Code runner now operates in headless bare mode for explicit configuration control * Permissions can now be configured per case workspace via settings files ## Changed files - `CLAUDE.md` (modified, +8/-1) - `agent_eval/agent/claude_code.py` (modified, +5/-0) - `agent_eval/evalhub/adapter.py` (modified, +33/-0) - `skills/eval-analyze/references/eval-yaml-template.md` (modified, +1/-0) - `skills/eval-run/references/data-pipeline.md` (modified, +2/-2) ## Fix / Workaround This means skills that use the `Agent` tool for parallel subagent dispatch (or `Skill` for sub-skill invocation) cannot run correctly in `--bare` mode. The model correctly reports "I don't have the Agent tool" and falls back to doing all work inline, which produces architecturally different results. 3. **Multi-agent skills are common** — skills that use `Agent` for parallel dispatch (e.g., launching N subagents for batch processing) produce architecturally different results when forced to run inline. In our case: $48/63min with agents vs $8/20min without — but with 20-0 pairwise quality wins for the agent-based run. - #40959 (closed as duplicate) — "--bare mode drops custom agents from --agents JSON for subagent_type dispatch" - #18895 (closed, stale) — "SDK programmatic agents ignored" - #33513 (closed) — "--agents flag agents invisible to Claude" ### Preflight Checklist - [x] I have searched [existing issues](https://github.com/anthropics/claude-code/issues?q=is%3Aissue+state%3Aopen+label%3Abug) and this hasn't been reported yet - [x] This is a single bug report - [x] I am using the latest version of Claude Code ### What's Wrong? `--bare` mode restricts the available tool set to `[Bash, Read, Edit]`. The `--tools` flag ("Restrict which built-in tools Claude can use") cannot add tools back — it can only filter within the bare set. Similarly, `--allowed-tools` has no effect on tool availability. This means skills that use the `Agent` tool for parallel subagent dispatch (or `Skill` for sub-skill invocation) cannot run correctly in `--bare` mode. The model correctly reports "I don't have the Agent tool" and falls back to doing all work inline, which produces architecturally different results. **Reproduction:** ```bash # Without --bare: Agent is available (not

claude-code2026-05-19 13:36:57

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

anthropics/claude-code#60547•Fetched 2026-05-20 03:55:46

View on GitHub

Comments

Participants

Timeline

Reactions

Author

astefanutti

Participants

astefanutti

Timeline (top)

labeled ×5cross-referenced ×1

Root Cause

This is important because:

Fix Action

Fix / Workaround

This means skills that use the Agent tool for parallel subagent dispatch (or Skill for sub-skill invocation) cannot run correctly in --bare mode. The model correctly reports "I don't have the Agent tool" and falls back to doing all work inline, which produces architecturally different results.

Multi-agent skills are common — skills that use Agent for parallel dispatch (e.g., launching N subagents for batch processing) produce architecturally different results when forced to run inline. In our case: $48/63min with agents vs $8/20min without — but with 20-0 pairwise quality wins for the agent-based run.

#40959 (closed as duplicate) — "--bare mode drops custom agents from --agents JSON for subagent_type dispatch"
#18895 (closed, stale) — "SDK programmatic agents ignored"
#33513 (closed) — "--agents flag agents invisible to Claude"

PR fix notes

PR #67: Add --bare mode for reproducible headless execution

Repository: opendatahub-io/agent-eval-harness
Author: astefanutti
State: open | merged: False
Link: https://github.com/opendatahub-io/agent-eval-harness/pull/67

Description (problem / solution / changelog)

Summary

Add --bare flag to Claude Code CLI invocation for faster, reproducible eval runs
Fix EvalHub adapter to create minimal settings.json per case (was passing no settings at all)
Add Claude Code documentation references to CLAUDE.md, eval-yaml-template.md, and data-pipeline.md

Details

--bare mode (docs) skips auto-discovery of hooks, plugins, MCP servers, and settings files. The harness already builds self-contained workspaces with explicit --settings, --allowed-tools, --plugin-dir, and --append-system-prompt flags — --bare just stops Claude from also loading things outside the workspace that the harness didn't put there.

Benefits:

Faster startup: no scanning ~/.claude/, .mcp.json, marketplace cache per invocation
Reproducible: same result on every machine regardless of user/project config
Explicit: only flags passed by the harness take effect

EvalHub adapter fix: the AgentEvalAdapter.run_benchmark_job() path called run_skill() without settings_path, so the runner had no settings file. With --bare this means no permissions at all. New _ensure_case_settings() creates a minimal .claude/settings.json with permissions from eval.yaml before each case run.

Doc references added:

Headless mode — --bare, --print, streaming
Tools reference — tool names for permission rules
Permissions — allow/deny rule syntax

Test plan

123 tests pass, 0 regressions
Run eval with --bare and verify identical results to without

🤖 Generated with Claude Code

Summary by CodeRabbit

Documentation
- Updated Claude Code CLI runner documentation to reflect correct --bare --print invocation
- Added references to headless mode, tools reference, and permissions documentation
- Updated YAML template and data pipeline docs with correct invocation and output format details
New Features
- Claude Code runner now operates in headless bare mode for explicit configuration control
- Permissions can now be configured per case workspace via settings files

Changed files

CLAUDE.md (modified, +8/-1)
agent_eval/agent/claude_code.py (modified, +5/-0)
agent_eval/evalhub/adapter.py (modified, +33/-0)
skills/eval-analyze/references/eval-yaml-template.md (modified, +1/-0)
skills/eval-run/references/data-pipeline.md (modified, +2/-2)

Code Example

# Without --bare: Agent is available (not listed in init but usable)
echo 'Use the Agent tool to spawn a subagent that returns "hello"' | \
  claude --print --output-format stream-json --verbose --max-turns 3 2>/dev/null | \
  python3 -c "
import sys, json
for line in sys.stdin:
    try:
        obj = json.loads(line.strip())
        if obj.get('type') == 'assistant':
            for block in obj.get('message',{}).get('content',[]):
                if block.get('type') == 'tool_use':
                    print(f'Tool used: {block[\"name\"]}')
    except: pass
"
# Output: Tool used: Agent

# With --bare: Agent is unavailable, even with --tools
echo 'Use the Agent tool to spawn a subagent that returns "hello"' | \
  claude --bare --print --output-format stream-json --verbose --max-turns 3 2>/dev/null | \
  python3 -c "
import sys, json
for line in sys.stdin:
    try:
        obj = json.loads(line.strip())
        if obj.get('type') == 'assistant':
            for block in obj.get('message',{}).get('content',[]):
                if block.get('type') == 'text' and block.get('text','').strip():
                    print(block['text'][:200])
    except: pass
"
# Output: I don't have an "Agent tool" available in my current toolset. The tools I have access to are:
# 1. Bash  2. Edit  3. Read

RAW_BUFFERClick to expand / collapse

Preflight Checklist

I have searched existing issues and this hasn't been reported yet
This is a single bug report
I am using the latest version of Claude Code

What's Wrong?

--bare mode restricts the available tool set to [Bash, Read, Edit]. The --tools flag ("Restrict which built-in tools Claude can use") cannot add tools back — it can only filter within the bare set. Similarly, --allowed-tools has no effect on tool availability.

Reproduction:

# Without --bare: Agent is available (not listed in init but usable)
echo 'Use the Agent tool to spawn a subagent that returns "hello"' | \
  claude --print --output-format stream-json --verbose --max-turns 3 2>/dev/null | \
  python3 -c "
import sys, json
for line in sys.stdin:
    try:
        obj = json.loads(line.strip())
        if obj.get('type') == 'assistant':
            for block in obj.get('message',{}).get('content',[]):
                if block.get('type') == 'tool_use':
                    print(f'Tool used: {block[\"name\"]}')
    except: pass
"
# Output: Tool used: Agent

# With --bare: Agent is unavailable, even with --tools
echo 'Use the Agent tool to spawn a subagent that returns "hello"' | \
  claude --bare --print --output-format stream-json --verbose --max-turns 3 2>/dev/null | \
  python3 -c "
import sys, json
for line in sys.stdin:
    try:
        obj = json.loads(line.strip())
        if obj.get('type') == 'assistant':
            for block in obj.get('message',{}).get('content',[]):
                if block.get('type') == 'text' and block.get('text','').strip():
                    print(block['text'][:200])
    except: pass
"
# Output: I don't have an "Agent tool" available in my current toolset. The tools I have access to are:
# 1. Bash  2. Edit  3. Read

The init event confirms the tool restriction:

Without --bare: Tools: ['AskUserQuestion', 'Bash', ..., 'Skill', ..., 'Write'] (24 tools)
With --bare: Tools: ['Bash', 'Edit', 'Read'] (3 tools)
With --bare --tools "Bash,Edit,Read,Agent,Skill": still Tools: ['Bash', 'Edit', 'Read'] — --tools has no effect

What Should Happen?

--tools should be able to expand the tool set in --bare mode. If a user explicitly passes --bare --tools "Bash,Edit,Read,Agent,Skill", Agent and Skill should be available.

This is important because:

--bare is documented as becoming the default for -p ("Bare mode ... will become the default for -p in a future release"), which means all headless/SDK usage will lose Agent and Skill tools unless there's a way to add them back.
--bare is designed for reproducibility (skip hooks, plugins, CLAUDE.md auto-discovery), not for restricting built-in tools. Agent and Skill are built-in tools, not config-dependent — they don't need hooks or plugins to function.
Multi-agent skills are common — skills that use Agent for parallel dispatch (e.g., launching N subagents for batch processing) produce architecturally different results when forced to run inline. In our case: $48/63min with agents vs $8/20min without — but with 20-0 pairwise quality wins for the agent-based run.

Related Issues

#40959 (closed as duplicate) — "--bare mode drops custom agents from --agents JSON for subagent_type dispatch"
#18895 (closed, stale) — "SDK programmatic agents ignored"
#33513 (closed) — "--agents flag agents invisible to Claude"

All were auto-closed without resolution.

Environment

Claude Code version: 2.1.121
Platform: macOS (Darwin 24.6.0, arm64)

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#batch processing #configuration error #environment variable #network issue #logging issue

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - ✅(Solved) Fix --bare mode should respect --tools flag to allow adding tools back (e.g. Agent, Skill) [1 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

PR fix notes

PR #67: Add --bare mode for reproducible headless execution

Description (problem / solution / changelog)

Summary

Details

Test plan

Summary by CodeRabbit

Changed files

Code Example

Preflight Checklist

What's Wrong?

What Should Happen?

Related Issues

Environment

Still need to ship something?

TRENDING

claude-code - ✅(Solved) Fix --bare mode should respect --tools flag to allow adding tools back (e.g. Agent, Skill) [1 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

PR fix notes

PR #67: Add --bare mode for reproducible headless execution

Description (problem / solution / changelog)

Summary

Details

Test plan

Summary by CodeRabbit

Changed files

Code Example

Preflight Checklist

What's Wrong?

What Should Happen?

Related Issues

Environment

Still need to ship something?

RELATED_DISCOVERY

TRENDING