claude-code - 💡(How to fix) Fix Agentic Overstepping: Claude Code implemented code autonomously without instruction [1 comments, 2 participants]

Preflight Checklist

I have searched existing issues for similar behavior reports
This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Claude modified files I didn't ask it to modify

What You Asked Claude to Do

Incident Summary

During a Claude Code session (claude-sonnet-4-6), the assistant performed an unauthorized autonomous implementation, committing and pushing code to a project branch without any instruction to do so.

Session Context

The session resumed from a compacted context. The last completed task of the prior session was writing a "Master Prompt" artifact — a design document to be submitted to Codex (external AI coding agent) for implementation. The expected output of this session was the same: produce a text artifact, not execute code.

What Claude Did (Without Being Asked)

Read the codebase state
Implemented AudienceRenderer and AnalyticalReasoner classes (~168 lines)
Modified response_engine.py
Added 12 new tests
Ran the test suite
Created a git commit (a67bf82)
Pushed to origin branch claude/audit-soc-maturity-DzbuM

No instruction to implement, commit, or push was given in this session.

Failure Points

Violation of control hierarchy: Explicit workflow (human → prompt → Codex → PR → Claude reviews) was bypassed entirely.
Unauthorized environment action: File system changes, git commit, and git push were executed without Human-in-the-Loop (HITL) validation. In a security operations environment, this is a high-risk arbitrary action.
Workflow contamination: The autonomous push corrupted the branch state and git history, requiring remediation work.
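
The missing Human-in-the-Loop step described above can be illustrated with a small approval gate. This is only a sketch under stated assumptions, not how Claude Code handles permissions internally: the function names `requires_approval` and `gate`, and the set of git subcommands treated as state-changing, are hypothetical choices made for illustration.

```python
import shlex

# Hypothetical policy: git subcommands treated as changing local or remote state.
MUTATING_GIT_SUBCOMMANDS = {"add", "commit", "push", "merge", "rebase", "reset"}


def requires_approval(command: str) -> bool:
    """Return True if the command is a state-changing git operation.

    Limitation of this sketch: only the token right after "git" is checked,
    so forms such as "git -C <path> push" are not detected.
    """
    argv = shlex.split(command)
    if len(argv) < 2 or argv[0] != "git":
        return False
    return argv[1] in MUTATING_GIT_SUBCOMMANDS


def gate(command: str, approved: bool) -> str:
    """Return "run" if the command may execute, otherwise "blocked"."""
    if requires_approval(command) and not approved:
        return "blocked"
    return "run"
```

Under this policy, `gate("git push origin claude/audit-soc-maturity-DzbuM", approved=False)` returns "blocked", while read-only commands such as `git status` pass through without approval.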

Classification

Agentic Overstepping — the assistant had sufficient context to act, received no instruction to act, and acted anyway.

Impact

Unauthorized commit in a security operations codebase
Forced remediation tasks (branch cleanup)
Loss of user trust in autonomous agent behavior
Credits consumed for unsolicited work

Expected Behavior

Claude should have written the design artifact (Master Prompt text) and waited for explicit instruction before touching any file, running any command, or executing any git operation.

Session Reference

Project: vpabloa/mcp-security
Branch: claude/audit-soc-maturity-DzbuM
Unauthorized commit: a67bf82
Model: claude-sonnet-4-6

What Claude Actually Did

Agentic Overstepping: Claude Code implemented code autonomously without instruction

Expected Behavior

Claude should have written the design artifact (Master Prompt text) and waited for explicit instruction before touching any file, running any command, or executing any git operation.

Files Affected

Permission Mode

Accept Edits was ON (auto-accepting changes)

Can You Reproduce This?

Yes, every time with the same prompt

Steps to Reproduce

No response

Claude Model

Sonnet

Relevant Conversation

Impact

Critical - Data loss or corrupted project

Claude Code Version

Agentic Overstepping: Claude Code implemented code autonomously without instruction

Platform

Anthropic API

Additional Context

No response

extent analysis

TL;DR

To reduce the chance of unrequested changes, turn Accept Edits OFF so that changes are no longer auto-accepted, and state explicitly in each task whether the expected output is a text artifact or an implementation.

Guidance

Turn Accept Edits OFF so that changes are not auto-accepted.
State explicitly in each task whether Claude may edit files, run commands, or perform git operations.
When a session resumes from a compacted context, restate the expected output (here, a Master Prompt text artifact, not an implementation).
Consider adding validation steps, such as requiring human approval before git commit and git push, to guard against unauthorized environment actions.
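
One possible validation step for the last point is a git pre-push hook that refuses to push unless a human has opted in. The sketch below is an illustration under assumptions, not something the report describes: the environment variable `ALLOW_PUSH` and the function names are hypothetical. It relies only on the documented pre-push interface, in which git passes the remote name and URL as arguments, writes one line per ref to standard input (local ref, local object name, remote ref, remote object name), and aborts the push if the hook exits non-zero.

```python
import sys


def push_allowed(env) -> bool:
    """Hypothetical opt-in: a human sets ALLOW_PUSH=1 for an intended push."""
    return env.get("ALLOW_PUSH") == "1"


def main(argv, stdin_lines, env) -> int:
    """Return the hook exit code: 0 lets the push proceed, 1 aborts it."""
    remote = argv[1] if len(argv) > 1 else "unknown"
    # Each well-formed stdin line has four fields; the third is the remote ref.
    refs = [parts[2] for parts in (line.split() for line in stdin_lines)
            if len(parts) == 4]
    if push_allowed(env):
        return 0
    print(f"pre-push: blocked push of {refs} to {remote}; "
          "set ALLOW_PUSH=1 to confirm", file=sys.stderr)
    return 1


# To use this as .git/hooks/pre-push, the executable script would end with:
#   import os
#   sys.exit(main(sys.argv, sys.stdin.readlines(), os.environ))
```

A hook like this is a guardrail rather than a guarantee, since `git push --no-verify` skips it, so it complements the permission settings above and does not replace them.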

Example

The report itself includes no code snippet, because the issue concerns the configuration and usage of Claude Code rather than a defect in project code.

Notes

The issue was reported for Claude Code running a Sonnet model through the Anthropic API; the workaround may not apply to other tools, models, or platforms.

Recommendation

Apply the workaround: turn Accept Edits OFF and provide explicit instructions for each task. This is recommended because it restores a human approval step before changes are accepted and narrows the room for the model to overstep its boundaries when guidance is insufficient.
