gemini-cli: Coding agent loses workflow state, exceeds approved scope, and becomes unreliable in multi-step repository tasks

What happened?

I used Gemini as a coding agent on a non-trivial Go repository with an explicit checkpoint-based workflow. The task was split into small commit-sized slices. Each slice was supposed to be planned, approved, implemented, tested, committed, summarized, and then stopped until the next explicit approval.

Gemini was useful for planning and scaffolding, but became unreliable during implementation.

Observed behavior included:

It repeatedly claimed it was leaving planning/read-only mode and starting implementation, but then returned to the plan approval screen or attempted edits while still unable to edit.
It treated approval for one commit-sized slice as approval to continue planning or implementing the next slice.
It appeared to treat its own generated statements, such as “the user signaled continuation”, as if they were user instructions.
It became confused by staged files and git reset/staging state, created incorrectly split commits, staged the wrong files, reset commits, and tried to repair history automatically.
It reported that tests had passed or that a checkpoint was complete, while the generated patch still contained obvious Go compile errors such as duplicate method definitions, unused imports, and malformed test syntax.
It created HTTP/session tests that looked plausible but did not actually preserve cookies between requests.
During a bugfix, it repeatedly failed to remove the correct duplicate function with the edit tool, tried progressively broader replacements, and finally decided to rewrite the entire file. It then printed the intended file contents into the chat but never actually wrote the file.

The result was that the agent took more effort to supervise than it saved. The issue was not just imperfect code generation; the workflow/tool state itself became unreliable.

What did you expect to happen?

I expected the agent to reliably respect the explicit workflow boundaries:

Approval to plan should not imply approval to implement.
Approval to implement one commit should not imply approval to continue to the next commit.
When instructed to stop after a checkpoint or commit, the agent should stop.
The agent should not treat its own generated text as user instructions.
If it is stuck in planning/read-only mode, it should clearly stop and tell the user that the UI/tool state prevents implementation.
If tests were not actually run, it should say so instead of implying success.
If tests were run, the resulting code should not contain obvious compile errors.
If git state is ambiguous, it should stop and ask instead of attempting reset/stage recovery.
If edit tools fail repeatedly, it should stop and report the editing failure instead of attempting broad rewrites.
Each commit-sized slice should be implemented, tested, committed, summarized, and then wait for explicit approval before moving on.
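The git-state expectation in the list above could be checked mechanically before any commit. The following is a rough sketch, not something gemini-cli is known to do: `unexpectedStaged` and the approved-path set are hypothetical, and the parser only handles the simple two-character status prefix of `git status --porcelain` (v1) output. Renames and quoted paths are deliberately not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// unexpectedStaged scans `git status --porcelain` (v1) output and returns
// every staged path that is not in the approved set for the current slice.
// Simplification: rename entries ("old -> new") and quoted paths are not parsed.
func unexpectedStaged(porcelain string, approved map[string]bool) []string {
	var out []string
	for _, line := range strings.Split(porcelain, "\n") {
		if len(line) < 4 {
			continue // blank or malformed line
		}
		x := line[0] // first column: status in the index (staged)
		if x == ' ' || x == '?' || x == '!' {
			continue // unstaged, untracked, or ignored
		}
		path := line[3:]
		if !approved[path] {
			out = append(out, path)
		}
	}
	return out
}

func main() {
	// Sample output; in practice this would come from running git.
	status := "M  auth/session.go\n M README.md\nA  cmd/debug.go\n?? notes.txt\n"
	approved := map[string]bool{"auth/session.go": true}

	if extra := unexpectedStaged(status, approved); len(extra) > 0 {
		fmt.Println("stop and ask: unexpected staged files", extra)
		return
	}
	fmt.Println("staging area matches the approved slice")
}
```

The point of the design is that a mismatch produces a stop-and-ask message rather than an automatic reset or restage.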

For larger repository work, I need the agent to behave like a controlled coding assistant that does not continue autonomously beyond the approved scope.
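The approval boundaries described above can be sketched as a small per-slice state machine. This is only an illustration of the expected behavior under my own assumptions; the `Slice` type, the phase names, and the `fromUser` flag are hypothetical and do not describe how gemini-cli is implemented.

```go
package main

import (
	"errors"
	"fmt"
)

// Phase is one step in the lifecycle of a commit-sized slice.
type Phase int

const (
	Planned Phase = iota
	Approved
	Implemented
	Tested
	Committed
	Summarized
)

var (
	ErrNeedsApproval = errors.New("slice has no explicit user approval")
	ErrNotFromUser   = errors.New("approval did not come from the user")
	ErrSliceDone     = errors.New("slice is finished; stop and wait")
)

// Slice tracks a single commit-sized unit of work (hypothetical type).
type Slice struct {
	Name  string
	Phase Phase
}

// Approve accepts approval for this slice only, and only from the user.
// Text the agent generated itself must be passed with fromUser == false.
func (s *Slice) Approve(fromUser bool) error {
	if !fromUser {
		return ErrNotFromUser
	}
	if s.Phase != Planned {
		return fmt.Errorf("slice %q is already past planning", s.Name)
	}
	s.Phase = Approved
	return nil
}

// Advance moves one phase forward. It never crosses the approval boundary
// and never rolls over into another slice.
func (s *Slice) Advance() error {
	switch s.Phase {
	case Planned:
		return ErrNeedsApproval
	case Summarized:
		return ErrSliceDone
	}
	s.Phase++
	return nil
}

func main() {
	first := &Slice{Name: "auth-bootstrap"}
	second := &Slice{Name: "session-tests"}

	fmt.Println(first.Approve(false)) // agent-authored "approval" is rejected
	fmt.Println(first.Approve(true))  // explicit user approval for this slice
	for first.Advance() == nil {
		// implement, test, commit, summarize
	}
	fmt.Println(first.Phase == Summarized)
	fmt.Println(second.Advance()) // the next slice still needs its own approval
}
```

Because each slice carries its own phase, approving the first slice says nothing about the second, and finishing a slice yields an explicit stop condition instead of a silent continuation.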

Client information

CLI Version: 0.41.2
Git Commit: b0c7a1722
Session ID: 85064480-a9d7-4550-9d4a-fe2f7842bd7d
Operating System: linux v24.14.0
Sandbox Environment: no sandbox
Model Version: auto-gemini-2.5
Auth Type: oauth-personal
Memory Usage: 444.1 MB
Terminal Name: Konsole 26.04.0
Terminal Background: #2c2c2c
Kitty Keyboard Protocol: Unsupported

Login information

Google Account

Anything else we need to know?

The repository/task itself was not exotic: a Go web server with authentication/bootstrap UI changes, tests, and git commits.

Gemini was helpful for high-level planning and scaffolding. The failures appeared once the task required sustained multi-step execution with file edits, tests, git commits, and strict checkpoint boundaries.

This seems like a workflow/tooling reliability issue rather than only a model quality issue. The most serious problems were loss of execution state, inability to reliably distinguish planning from implementation, failure to obey stop boundaries, and false confidence about edits/tests.

I am intentionally not sharing the full chat log or private repository details. The issue is reproducible at the workflow level: give the agent a multi-step coding task with explicit “plan only”, “implement only this commit”, “stop after summary”, and “do not continue without approval” instructions, then observe that it may continue, confuse its own output for user approval, or get stuck between planning and editing modes.
