gemini-cli - 💡(How to fix) Fix [Bug/Perf] Redundant utility_router calls in LocalAgentExecutor during subagent execution [2 comments, 3 participants]

Official PRs (…)
ON THIS PAGE

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
google-gemini/gemini-cli#25156Fetched 2026-04-11 06:31:05
View on GitHub
Comments
2
Participants
3
Timeline
9
Reactions
0
Author
Timeline (top)
commented ×2labeled ×2mentioned ×2subscribed ×2

Root Cause

Because the utility_router (typically using gemini-2.5-flash-lite) is used to classify task complexity, calling it repeatedly within the same subagent session is redundant. For a subagent task that requires 10 turns to complete, the system makes 10 extra AI requests for classification.

Code Example

> /about
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                                     │
About Gemini CLI│                                                                                                                                                     │
CLI Version                                        0.39.0-nightly.20260408.e77b22e63Git Commit                                         21e1c6092                                                                                        │
Model                                              Auto (Gemini 3)Sandbox                                            no sandbox                                                                                       │
OS                                                 win32                                                                                            │
Auth Method                                        Signed in with Google (i@lm902.cn)Tier                                               Gemini Code Assist in Google One AI Pro│                                                                                                                                                     │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
RAW_BUFFERClick to expand / collapse

What happened?

When invoking a subagent (e.g., @codebase_investigator or @generalist) with the model configuration set to auto (or inherited auto), the LocalAgentExecutor triggers a fresh call to ModelRouterService.route() on every single turn of the subagent's loop.

Because the utility_router (typically using gemini-2.5-flash-lite) is used to classify task complexity, calling it repeatedly within the same subagent session is redundant. For a subagent task that requires 10 turns to complete, the system makes 10 extra AI requests for classification.

This behavior differs from the main GeminiClient implementation. The GeminiClient correctly implements "model stickiness" by caching the routing decision in currentSequenceModel and reusing it for the duration of a sequence. LocalAgentExecutor currently lacks this caching mechanism, leading to:

  1. Increased Latency: Every turn waits for an extra classification call before the actual model generation.
  2. Wasted Quota: Multiplies the number of API requests sent to the routing model.
  3. Redundant Telemetry: Floods the logs with identical routing decisions for the same task.

Image: Example of Flash Lite usage almost mirroring Flash usage after heavy usage of subagents.

<img width="2902" height="1049" alt="Image" src="https://github.com/user-attachments/assets/97e8604c-de18-4f36-9ee9-4b08fabd8305" />

What did you expect to happen?

The LocalAgentExecutor should cache the routing decision made during the first turn of a subagent's execution. Subsequent turns within that same subagent session should reuse the cached model choice instead of re-invoking the utility_router.

Client information

<details> <summary>Client Information</summary>

Run gemini to enter the interactive CLI, then run the /about command.

> /about
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                                     │
│ About Gemini CLI                                                                                                                                    │
│                                                                                                                                                     │
│ CLI Version                                        0.39.0-nightly.20260408.e77b22e63                                                                │
│ Git Commit                                         21e1c6092                                                                                        │
│ Model                                              Auto (Gemini 3)                                                                                  │
│ Sandbox                                            no sandbox                                                                                       │
│ OS                                                 win32                                                                                            │
│ Auth Method                                        Signed in with Google ([email protected])                                                               │
│ Tier                                               Gemini Code Assist in Google One AI Pro                                                          │
│                                                                                                                                                     │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
</details>

Login information

Signed in with Google

Anything else we need to know?

Users can verify this by running /stats model and observing the disproportionately high call count for the utility_router role compared to the number of user-initiated prompts.

extent analysis

TL;DR

Implementing a caching mechanism in LocalAgentExecutor to store the routing decision for the duration of a subagent session is likely to fix the issue.

Guidance

  • Identify the LocalAgentExecutor class and locate the method responsible for calling ModelRouterService.route() to understand where the caching mechanism needs to be implemented.
  • Consider adding a cache, such as a dictionary, to store the routing decision for each subagent session, using a unique identifier for the session as the key.
  • Modify the LocalAgentExecutor to check the cache before calling ModelRouterService.route(), and update the cache with the routing decision if it's not already stored.
  • Verify the fix by running /stats model and checking that the call count for the utility_router role is reduced.

Example

# Pseudo-code example of caching mechanism
class LocalAgentExecutor:
    def __init__(self):
        self.route_cache = {}

    def execute_subagent(self, subagent_id):
        if subagent_id not in self.route_cache:
            # Call ModelRouterService.route() and store the result in the cache
            route_decision = ModelRouterService.route()
            self.route_cache[subagent_id] = route_decision
        else:
            # Reuse the cached routing decision
            route_decision = self.route_cache[subagent_id]
        # Proceed with the subagent execution using the route decision

Notes

The exact implementation of the caching mechanism may vary depending on the specific requirements and constraints of the system. Additionally, the cache should be properly invalidated when a subagent session ends to avoid stale data.

Recommendation

Apply a workaround by implementing a caching mechanism in LocalAgentExecutor, as this is likely to fix the issue and reduce the redundant API requests.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING