gemini-cli: Coding agent loses workflow state, exceeds approved scope, and becomes unreliable in multi-step repository tasks


What happened?

I used Gemini as a coding agent on a non-trivial Go repository with an explicit checkpoint-based workflow. The task was split into small commit-sized slices. Each slice was supposed to be planned, approved, implemented, tested, committed, summarized, and then stopped until the next explicit approval.

Gemini was useful for planning and scaffolding, but became unreliable during implementation.

Observed behavior included:

  • It repeatedly claimed it was leaving planning/read-only mode and starting implementation, but then either returned to the plan approval screen or attempted edits while still in read-only mode, where edits could not succeed.
  • It treated approval for one commit-sized slice as approval to continue planning or implementing the next slice.
  • It appeared to treat its own generated statements, such as “the user signaled continuation”, as if they were user instructions.
  • It became confused by staged files and git reset/staging state, created incorrectly split commits, staged the wrong files, reset commits, and tried to repair history automatically.
  • It reported that tests had passed or that a checkpoint was complete, while the generated patch still contained obvious Go compile errors such as duplicate method definitions, unused imports, and malformed test syntax.
  • It created HTTP/session tests that looked plausible but did not actually preserve cookies between requests.
  • During a bugfix, it repeatedly failed to remove the correct duplicate function with the edit tool, then tried broader replacements, then decided to rewrite the entire file, printed the intended file contents into the chat, but did not actually write the file.

The result was that supervising the agent took more work than it saved. The issue was not just imperfect code generation; the workflow/tool state itself became unreliable.

What did you expect to happen?

I expected the agent to reliably respect the explicit workflow boundaries:

  • Approval to plan should not imply approval to implement.
  • Approval to implement one commit should not imply approval to continue to the next commit.
  • When instructed to stop after a checkpoint or commit, the agent should stop.
  • The agent should not treat its own generated text as user instructions.
  • If it is stuck in planning/read-only mode, it should clearly stop and tell the user that the UI/tool state prevents implementation.
  • If tests were not actually run, it should say so instead of implying success.
  • If tests were run, the resulting code should not contain obvious compile errors.
  • If git state is ambiguous, it should stop and ask instead of attempting reset/stage recovery.
  • If edit tools fail repeatedly, it should stop and report the editing failure instead of attempting broad rewrites.
  • Each commit-sized slice should be implemented, tested, committed, summarized, and then wait for explicit approval before moving on.

For larger repository work, I need the agent to behave like a controlled coding assistant, not continue autonomously beyond the approved scope.
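One way to picture the expected boundaries is as a small state machine in which every approval is scoped to one slice and one phase, and only user-originated messages can grant it. This is an illustrative sketch; the `Workflow`, `Approval`, `Phase`, and `Source` types are hypothetical and do not describe gemini-cli internals.

```go
package main

import "fmt"

// Phase is what the agent is currently allowed to do.
type Phase int

const (
	Idle Phase = iota // waiting for explicit user approval
	Planning
	Implementing
)

// Source distinguishes user messages from the agent's own output.
type Source int

const (
	FromUser Source = iota
	FromAgent
)

// Approval grants exactly one phase for exactly one slice.
type Approval struct {
	Source Source
	Slice  int
	Phase  Phase
}

// Workflow tracks the current slice and phase.
type Workflow struct {
	slice   int
	phase   Phase
	planned bool // true once the current slice's plan is finished
}

// Approve applies an approval or explains why it is rejected.
func (w *Workflow) Approve(a Approval) error {
	switch {
	case a.Source != FromUser:
		return fmt.Errorf("approval must come from the user, not from agent output")
	case w.phase != Idle:
		return fmt.Errorf("slice %d is still in progress", w.slice)
	case a.Slice != w.slice:
		return fmt.Errorf("approval is for slice %d, current slice is %d", a.Slice, w.slice)
	case a.Phase == Planning:
		w.phase = Planning
	case a.Phase == Implementing && w.planned:
		w.phase = Implementing
	default:
		return fmt.Errorf("slice %d cannot enter the requested phase", w.slice)
	}
	return nil
}

// Finish ends the current phase and always returns to Idle, so the next
// step requires a fresh approval. Finishing implementation advances the slice.
func (w *Workflow) Finish() {
	switch w.phase {
	case Planning:
		w.planned = true
	case Implementing:
		w.slice++
		w.planned = false
	}
	w.phase = Idle
}

// CanEdit reports whether file edits are currently permitted.
func (w *Workflow) CanEdit() bool {
	return w.phase == Implementing
}
```

In this model, approval to plan never enables `CanEdit`, an approval for slice 0 is rejected once the workflow has moved to slice 1, and text generated by the agent can never act as an approval.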

Client information

  • CLI Version: 0.41.2
  • Git Commit: b0c7a1722
  • Session ID: 85064480-a9d7-4550-9d4a-fe2f7842bd7d
  • Operating System: linux v24.14.0
  • Sandbox Environment: no sandbox
  • Model Version: auto-gemini-2.5
  • Auth Type: oauth-personal
  • Memory Usage: 444.1 MB
  • Terminal Name: Konsole 26.04.0
  • Terminal Background: #2c2c2c
  • Kitty Keyboard Protocol: Unsupported

Login information

Google Account

Anything else we need to know?

The repository/task itself was not exotic: a Go web server with authentication/bootstrap UI changes, tests, and git commits.

Gemini was helpful for high-level planning and scaffolding. The failures appeared once the task required sustained multi-step execution with file edits, tests, git commits, and strict checkpoint boundaries.

This seems like a workflow/tooling reliability issue rather than only a model quality issue. The most serious problems were loss of execution state, inability to reliably distinguish planning from implementation, failure to obey stop boundaries, and false confidence about edits/tests.

I am intentionally not sharing the full chat log or private repository details. The issue is reproducible at the workflow level: give the agent a multi-step coding task with explicit “plan only”, “implement only this commit”, “stop after summary”, and “do not continue without approval” instructions, then observe that it may continue, confuse its own output for user approval, or get stuck between planning and editing modes.
