gemini-cli: [bug] --model pin silently leaks: hardcoded aux-role models + single-element last-resort fallback override user selection



Summary

gemini --model <pin> does not reliably pin inference to the specified model. Two separate upstream mechanisms inside gemini-cli 0.41.2 (plus a third, related downgrade path described below) silently reroute requests to other models, mostly the higher-tier gemini-2.5-pro and gemini-3-pro-preview, regardless of the user's pin. This makes cost control and model-comparative benchmarking impossible: the CLI attributes requests to the pinned model in its local stats.models JSON, but server-side billing and actual model behavior reflect the leaked tier.


Repro

# Expect: model responds as gemini-3.1-flash-lite-preview
# Observe: textual self-report is gemini-2.0-pro-exp-02-05 (Pro family)
GEMINI_CLI_TRUST_WORKSPACE=true KESTREL_NON_INTERACTIVE=1 \
  gemini --skip-trust \
        --model gemini-3.1-flash-lite-preview \
        --output-format json \
        'Reply with EXACTLY this format and nothing else: MODEL=<the exact model id you are running as>' \
        < /dev/null

Observed (from response field in --output-format json):

--model passed                 | Model's textual self-report
gemini-3.1-flash-lite-preview  | MODEL=gemini-2.0-pro-exp-02-05
gemini-3-flash-preview         | MODEL=gemini-2.0-flash
gemini-3.1-pro-preview         | MODEL=gemini-2.5-pro

Expected: model self-reports the pinned model id (or an error if that model is unavailable).
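The comparison in the table can be automated. The following is a minimal sketch, assuming only that the JSON output carries a string `response` field as described above; `parseSelfReport` and `compareToPin` are hypothetical helper names, not part of gemini-cli. A self-report is only as reliable as the model's knowledge of its own id, so a mismatch is better treated as a reason to check further than as proof of rerouting.

```javascript
// Sketch: extract the model's self-reported id from one `--output-format json` run.
// Assumption: the JSON document has a string `response` field.
function parseSelfReport(jsonText) {
  const parsed = JSON.parse(jsonText);
  const response = typeof parsed.response === "string" ? parsed.response : "";
  // The repro prompt asks the model to answer with exactly "MODEL=<id>".
  const match = response.match(/MODEL=([A-Za-z0-9._-]+)/);
  return match ? match[1] : null;
}

// Hypothetical helper: pair the pinned id with the self-report for one run.
function compareToPin(pinnedModel, jsonText) {
  const selfReport = parseSelfReport(jsonText);
  return { pinnedModel, selfReport, matches: selfReport === pinnedModel };
}

// Sample payload built from the first row of the table above.
const sample = JSON.stringify({ response: "MODEL=gemini-2.0-pro-exp-02-05" });
console.log(compareToPin("gemini-3.1-flash-lite-preview", sample));
```

Running this over each pinned model in a benchmark lane gives one `matches` flag per run, which is cheap to log next to the CLI's own attribution.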

Negative control — fake model errors hard (good):

gemini --model definitely-not-a-real-model-xyz 'echo hi'
# → ModelNotFoundError: Requested entity was not found. code: 404

So the model name IS sent to the server; the silent fallback only kicks in for known preview models.

Routing log shows pin is initially honored:

[Routing] Selected model: gemini-3.1-flash-lite-preview (Source: agent-router/override, Latency: 0ms)
[Routing] Reasoning: Routing bypassed by forced model directive. Using: gemini-3.1-flash-lite-preview

The fallback fires after this log line, inside the policy-chain availability code.


Root Cause — Two Mechanisms

Mechanism A: Hardcoded aux-role models (chunk-6DSAZLFF.js lines 318556–318590)

"chat-compression-3-pro":         { modelConfig: { model: "gemini-3-pro-preview" } },
"chat-compression-default":       { modelConfig: { model: "gemini-3-pro-preview" } },
"loop-detection-double-check":    { modelConfig: { model: "gemini-3-pro-preview" } },
"agent-history-provider-summarizer": { modelConfig: { model: "gemini-3-flash-preview" } },

Helper roles that fire automatically during every agentic turn (loop detection, chat compression, history summarization) use hardcoded models regardless of the user's --model pin: gemini-3-pro-preview for chat compression and loop detection, and gemini-3-flash-preview for the history summarizer. Even for a single-turn prompt these roles are invoked. On Google's server-side usage dashboard, each chat-compression and loop-detection call counts toward Pro quota, and none of this is visible to the user.
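To make the leak concrete, here is a small sketch of how a pin could be applied to the role table. The role names and models are copied from the excerpt above; `pinAuxRoles` and `findLeakingRoles` are hypothetical helpers written for illustration and do not exist in the bundle.

```javascript
// Role names and models copied from the excerpt above.
const AUX_ROLE_CONFIGS = {
  "chat-compression-3-pro": { modelConfig: { model: "gemini-3-pro-preview" } },
  "chat-compression-default": { modelConfig: { model: "gemini-3-pro-preview" } },
  "loop-detection-double-check": { modelConfig: { model: "gemini-3-pro-preview" } },
  "agent-history-provider-summarizer": { modelConfig: { model: "gemini-3-flash-preview" } },
};

// Hypothetical: list the roles whose hardcoded model differs from the pin.
function findLeakingRoles(roleConfigs, pinnedModel) {
  return Object.entries(roleConfigs)
    .filter(([, config]) => config.modelConfig.model !== pinnedModel)
    .map(([role]) => role);
}

// Hypothetical: return a copy of the table in which every role uses the pin.
// The input table is left untouched.
function pinAuxRoles(roleConfigs, pinnedModel) {
  const pinned = {};
  for (const [role, config] of Object.entries(roleConfigs)) {
    pinned[role] = {
      ...config,
      modelConfig: { ...config.modelConfig, model: pinnedModel },
    };
  }
  return pinned;
}

const pin = "gemini-3.1-flash-lite-preview";
console.log(findLeakingRoles(AUX_ROLE_CONFIGS, pin)); // all four roles differ from this pin
console.log(pinAuxRoles(AUX_ROLE_CONFIGS, pin));
```

Returning a copy rather than mutating the table keeps the default behavior available when no strict option is set.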

Mechanism B: Single-element last-resort chain → silent gemini-2.5-pro fallback

getModelPolicyChain (chunk-6DSAZLFF.js lines 270044–270049):

if (options.previewEnabled) {
  return [
    definePolicy({ model: previewModel }),                              // gemini-3-pro-preview
    definePolicy({ model: PREVIEW_GEMINI_FLASH_MODEL, isLastResort: true })  // gemini-3-flash-preview
  ];
}

gemini-3.1-flash-lite-preview is not in this chain.

applyDynamicSlicing (chunk-6DSAZLFF.js lines 270153–270158): when the requested model isn't found in the chain, it falls back to:

return [createDefaultPolicy(resolvedModel, { isLastResort: true })];

Flash-Lite-preview ends up in a single-element last-resort chain.

selectModelForAvailability (chunk-6DSAZLFF.js lines 270183–270189):

const selection = config2.getModelAvailabilityService().selectFirstAvailable(chain2.map((p) => p.model));
if (selection.selectedModel) return selection;
const backupModel = chain2.find((p) => p.isLastResort)?.model ?? DEFAULT_GEMINI_MODEL;
return { selectedModel: backupModel, skipped: [] };

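DEFAULT_GEMINI_MODEL = "gemini-2.5-pro" (bundle line 43501). The ModelAvailabilityService.health map persists across requests within a single CLI process: once Flash-Lite-Preview returns a 429 or TerminalQuotaError, markTerminal() sets available = false for the lifetime of the process, so selectFirstAvailable returns no selection for every subsequent request and the backup path is taken. Per this report, those requests then silently route to Pro. Note that in the excerpt above the ?? DEFAULT_GEMINI_MODEL branch is reached only when no policy in the chain has isLastResort set, so the exact step that produces Pro for a single-element last-resort chain is not visible in the quoted lines.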
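The behavior of the quoted selection code can be checked in isolation. This is a sketch under stated assumptions: `FakeAvailabilityService` is a hypothetical stand-in for ModelAvailabilityService (its real internals and the real shape of `skipped` entries are not shown in the report), while `selectModelForAvailability` mirrors the quoted body.

```javascript
const DEFAULT_GEMINI_MODEL = "gemini-2.5-pro";

// Hypothetical stand-in: a health map that lives as long as the object,
// which models "for the lifetime of the CLI process".
class FakeAvailabilityService {
  constructor() {
    this.health = new Map();
  }
  markTerminal(model) {
    this.health.set(model, { available: false });
  }
  selectFirstAvailable(models) {
    const skipped = [];
    for (const model of models) {
      const state = this.health.get(model);
      if (!state || state.available) return { selectedModel: model, skipped };
      skipped.push(model);
    }
    return { selectedModel: null, skipped };
  }
}

// Same logic as the quoted selectModelForAvailability body.
function selectModelForAvailability(service, chain) {
  const selection = service.selectFirstAvailable(chain.map((p) => p.model));
  if (selection.selectedModel) return selection;
  const backupModel = chain.find((p) => p.isLastResort)?.model ?? DEFAULT_GEMINI_MODEL;
  return { selectedModel: backupModel, skipped: [] };
}

const service = new FakeAvailabilityService();
const pinnedChain = [{ model: "gemini-3.1-flash-lite-preview", isLastResort: true }];
const noLastResortChain = [{ model: "gemini-3.1-flash-lite-preview" }];

service.markTerminal("gemini-3.1-flash-lite-preview"); // e.g. after a 429

console.log(selectModelForAvailability(service, pinnedChain).selectedModel);
// → gemini-3.1-flash-lite-preview
console.log(selectModelForAvailability(service, noLastResortChain).selectedModel);
// → gemini-2.5-pro
```

In this sketch the default model appears only for a chain with no last-resort entry, and in both backup cases the `skipped` list is reset to empty, so the caller receives no record that the first choice was unavailable.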

Bonus — Mechanism C: resolveModel preview-access-revoked silent downgrade (chunk-XRLFHCHC.js lines 43576–43594)

if (!hasAccessToPreview && isPreviewModel(resolved)) {
  switch (resolved) {
    case PREVIEW_GEMINI_3_1_FLASH_LITE_MODEL: return DEFAULT_GEMINI_FLASH_LITE_MODEL;  // gemini-2.5-flash-lite
    case PREVIEW_GEMINI_MODEL: case PREVIEW_GEMINI_3_1_MODEL: return DEFAULT_GEMINI_MODEL;  // gemini-2.5-pro
    ...
  }
}

If preview access is revoked (or never confirmed), the model is silently rewritten to a 2.5-family equivalent with no warning to the user.
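A strict variant of this downgrade could look like the sketch below. Everything here is illustrative: `resolveModelSketch`, `StrictModelError`, and the `DOWNGRADES` table are hypothetical, the model pairs are inferred from the comments in the excerpt, and the cases elided by `...` in the excerpt are deliberately left out.

```javascript
// Hypothetical downgrade table; pairs inferred from the excerpt's comments.
const DOWNGRADES = new Map([
  ["gemini-3.1-flash-lite-preview", "gemini-2.5-flash-lite"],
  ["gemini-3-pro-preview", "gemini-2.5-pro"],
]);

class StrictModelError extends Error {}

// Sketch: resolve a requested model, refusing to rewrite it in strict mode.
// `rewritten` lets a caller print a warning when a downgrade happened.
function resolveModelSketch(requested, { hasAccessToPreview, strictModel }) {
  const downgraded = DOWNGRADES.get(requested);
  if (hasAccessToPreview || downgraded === undefined) {
    return { model: requested, rewritten: false };
  }
  if (strictModel) {
    throw new StrictModelError(
      `model ${requested} requires preview access; --strict-model prevents downgrade`
    );
  }
  return { model: downgraded, rewritten: true };
}

console.log(
  resolveModelSketch("gemini-3.1-flash-lite-preview", {
    hasAccessToPreview: false,
    strictModel: false,
  })
);
```

Even without a strict option, returning a `rewritten` flag instead of a bare string would give the CLI a place to emit the missing warning.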


Requested Fix

Please add a --strict-model flag (or an equivalent opt-in) that:

  1. Prevents aux-role model overrides (Mechanism A): when --strict-model is set, chat-compression-default, loop-detection-double-check, and other helper roles must use the user's pinned model (or be disabled) rather than their hardcoded Pro assignments.

  2. Disables the last-resort silent fallback (Mechanism B): when --strict-model is set, selectModelForAvailability must return an error (non-zero exit, descriptive message) rather than silently routing to DEFAULT_GEMINI_MODEL when the pinned model is unavailable.

  3. Disables the preview-access-revoked silent downgrade (Mechanism C): when --strict-model is set, resolveModel must error rather than rewriting the model.

Desired strict-mode behavior:

$ gemini --strict-model --model gemini-3.1-flash-lite-preview 'hello'
# If quota exhausted → exits non-zero: "Error: model gemini-3.1-flash-lite-preview unavailable (quota exhausted); --strict-model prevents fallback"
# If no preview access → exits non-zero: "Error: model gemini-3.1-flash-lite-preview requires preview access; --strict-model prevents downgrade"
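The quota-exhausted case above could be sketched as follows. This is one possible shape, not the CLI's implementation: `selectWithStrictMode`, the `isAvailable` callback, the `fellBack` flag, and the `exitCode` property are all hypothetical, and the sketch assumes quota exhaustion is the only reason a model is unavailable.

```javascript
const DEFAULT_GEMINI_MODEL = "gemini-2.5-pro";

// Sketch: pick a model from a policy chain, erroring in strict mode
// instead of falling back. `isAvailable` is a caller-supplied predicate.
function selectWithStrictMode(chain, isAvailable, strictModel) {
  const firstAvailable = chain.find((p) => isAvailable(p.model));
  if (firstAvailable) {
    return { selectedModel: firstAvailable.model, fellBack: false };
  }
  if (strictModel) {
    const pinned = chain[0]?.model ?? "(none)";
    const error = new Error(
      `model ${pinned} unavailable (quota exhausted); --strict-model prevents fallback`
    );
    error.exitCode = 1; // hypothetical hint for the process exit status
    throw error;
  }
  const backupModel = chain.find((p) => p.isLastResort)?.model ?? DEFAULT_GEMINI_MODEL;
  return { selectedModel: backupModel, fellBack: true };
}

const chain = [{ model: "gemini-3.1-flash-lite-preview", isLastResort: true }];
try {
  selectWithStrictMode(chain, () => false, true);
} catch (err) {
  console.error(`Error: ${err.message}`);
}
```

The `fellBack` flag in the non-strict path is a design choice for visibility: a caller can log the fallback even when strict mode is off.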

Impact

When running model comparisons or cost benchmarks across tiers (e.g., Flash-Lite vs Flash vs Pro), pin leaks silently invalidate lane scores: the CLI's local stats.models attribution records the intended model, but server-side billing and actual LLM behavior reflect a different, higher-cost tier. There is no warning, no non-zero exit, and no way to detect the leak without comparing CLI attribution against server-side billing, which is a costly out-of-band check. --strict-model would make the failure visible and actionable at the point of invocation.
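The out-of-band check described above amounts to diffing two per-model tallies. The sketch below assumes both sides have already been reduced to a plain object mapping model id to request count; that shape, the `findAttributionGaps` helper, and the sample numbers are all made up for illustration and do not reflect the real stats.models or billing export formats.

```javascript
// Hypothetical: both inputs map model id -> request count.
function findAttributionGaps(cliCounts, billedCounts) {
  const models = new Set([...Object.keys(cliCounts), ...Object.keys(billedCounts)]);
  const gaps = [];
  for (const model of models) {
    const cli = cliCounts[model] ?? 0;
    const billed = billedCounts[model] ?? 0;
    // A positive delta means the server billed more calls than the CLI attributed.
    if (cli !== billed) gaps.push({ model, cli, billed, delta: billed - cli });
  }
  return gaps.sort((a, b) => a.model.localeCompare(b.model));
}

// Illustrative numbers only.
const cliCounts = { "gemini-3.1-flash-lite-preview": 10 };
const billedCounts = { "gemini-3.1-flash-lite-preview": 6, "gemini-2.5-pro": 4 };
console.log(findAttributionGaps(cliCounts, billedCounts));
```

An empty result means the two tallies agree; any entry for a model the user never pinned is a candidate leak.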


Filed via Vivian, Nova's research agent — gemini-cli version 0.41.2, bundle chunk-6DSAZLFF.js + chunk-XRLFHCHC.js, macOS.
