gemini-cli - 💡(How to fix) Fix Gemini 3.1 Pro Performance Audit — Real Math Tests [3 comments, 1 participants]

GitHub stats
google-gemini/gemini-cli#25903 · Fetched 2026-04-24 10:43:24
Comments: 3 · Participants: 1 · Timeline events: 4 · Reactions: 0
Timeline (top): commented ×3, labeled ×1

Error Message

The integrand is strictly positive on the entire interval, so the integral must be positive. Gemini returned a negative value. This is not a rounding error; it is a wrong answer.

Where the error occurs: Step 3 of Gemini's derivation, during the $S_2$ sign computation. Gemini itself flags the confusion inline: "(Note: after distributing the signs and simplifying...)" (translated from Korean), a sign error it partially detected but did not correct.

What happened?

[ACTION REQUIRED] 📎 PLEASE ATTACH THE EXPORTED CHAT HISTORY JSON FILE TO THIS ISSUE IF YOU FEEL COMFORTABLE SHARING IT.

What did you expect to happen?

Gemini 3.1 Pro Performance Audit — Real Math Tests

Client information

  • CLI Version: 0.39.0
  • Git Commit: 398f78dca
  • Session ID: 9414f156-5cae-4619-ada2-b747504ca527
  • Operating System: linux v20.20.2
  • Sandbox Environment: no sandbox
  • Model Version: gemini-3.1-pro-preview
  • Auth Type: oauth-personal
  • Memory Usage: 943.7 MB
  • Terminal Name: VTE(7600)
  • Terminal Background: #380c2a
  • Kitty Keyboard Protocol: Unsupported

Login information

Google

Anything else we need to know?

Gemini 3.1 Pro Performance Audit: Real Math Tests

Environment: Google AI Ultra subscription, Gemini CLI (gemini-3.1-pro-preview)

Summary

This is not "cheating" in the sense of serving a weaker model; the model string checks out. But performance varies significantly by problem type, with a concrete failure case documented below.

Test 1: Number Theory (PASS ✅)

Problem: Find all $f:\mathbb{Z}_{>0}\to\mathbb{Z}_{>0}$ satisfying $f(m)+f(n) \mid m\cdot f(m)+n\cdot f(n)$

Result: Correctly proved that no such function exists from the single case $m=1, n=2$. That case alone suffices: it requires $f(1)+f(2) \mid f(1)+2f(2)$, hence $f(1)+f(2) \mid f(2)$, which is impossible because $0 < f(2) < f(1)+f(2)$. Clean and minimal proof.

Test 2: Analytic Integration (FAIL ❌)

Problem: $$I = \int_0^1 \frac{\ln x \cdot \ln(1-x)}{1+x}\,dx$$

Gemini's answer: $\dfrac{9}{8}\zeta(3) - \dfrac{\pi^2}{4}\ln 2 \approx -0.358$

Correct answer: $\dfrac{13}{8}\zeta(3) - \dfrac{\pi^2}{4}\ln 2 \approx +0.243$

Why Gemini is provably wrong (one-line check):

On $x \in (0,1)$: $\ln x < 0$, $\ln(1-x) < 0$, $1+x > 0$

$$\frac{\ln x \cdot \ln(1-x)}{1+x} = \frac{(\text{negative})(\text{negative})}{\text{positive}} > 0 \quad \forall x \in (0,1)$$

The integrand is strictly positive on the entire interval, so the integral must be positive. Gemini returned a negative value. This is not a rounding error; it is a wrong answer.

Where the error occurs: Step 3 of Gemini's derivation, during the $S_2$ sign computation. Gemini itself flags the confusion inline: "(Note: after distributing the signs and simplifying...)" (translated from Korean), a sign error it partially detected but did not correct.

Verification Code

from scipy import integrate
import numpy as np

# Integrate numerically on (0, 1); the limits are nudged inward to avoid log(0).
result, _ = integrate.quad(
    lambda x: np.log(x) * np.log(1 - x) / (1 + x),
    1e-10, 1 - 1e-10
)
print(f"Numerical:      {result:.6f}")   # ~+0.243

zeta3 = 1.20206  # Apery's constant zeta(3), rounded
gemini = (9/8) * zeta3 - (np.pi**2 / 4) * np.log(2)
correct = (13/8) * zeta3 - (np.pi**2 / 4) * np.log(2)
print(f"Gemini answer:  {gemini:.6f}")   # ~-0.358 ❌
print(f"Correct answer: {correct:.6f}")  # ~+0.243 ✅

Key Takeaways

Problem Type | Gemini 3.1 Pro
Discrete reasoning / number theory | Strong ✅
Multi-step analytic computation (Euler sums, polylogarithms) | Sign errors likely ⚠️

Fastest sanity check for closed-form integrals:

  1. Check the integrand's sign over the domain; the sign of the result must match.
  2. Plug the closed form into a calculator and compare against scipy.integrate.quad.
  3. If they diverge, ask the model to re-derive step by step and identify where the discrepancy starts.

Bottom Line

Gemini 3.1 Pro is genuinely strong on reasoning and discrete math. On long multi-step analytic derivations involving alternating sums and polylogarithm identities, it produces plausible-looking but numerically wrong answers, and may not self-correct without being explicitly prompted.

Trust but verify. Always run the number.
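In the spirit of "always run the number", the Test 1 result can also be confirmed by brute force. The sketch below is not taken from the issue: the helper name `divisibility_holds` and the search bound `LIMIT` are hypothetical, chosen for illustration. It tries every pair of candidate values for f(1) and f(2) up to the bound and checks whether the divisibility condition at m=1, n=2 can hold.

```python
def divisibility_holds(f1: int, f2: int) -> bool:
    """Check f(1) + f(2) | 1*f(1) + 2*f(2) for candidate values f(1)=f1, f(2)=f2."""
    return (1 * f1 + 2 * f2) % (f1 + f2) == 0


LIMIT = 200  # hypothetical search bound, for illustration only

# Collect every pair (f(1), f(2)) of positive integers that satisfies the condition.
survivors = [
    (f1, f2)
    for f1 in range(1, LIMIT + 1)
    for f2 in range(1, LIMIT + 1)
    if divisibility_holds(f1, f2)
]

print(survivors)  # [] : no pair passes, so the case m=1, n=2 already rules out every f
```

A finite search proves nothing by itself, but it agrees with the remainder argument: (f(1) + 2f(2)) mod (f(1) + f(2)) always equals f(2), which is never zero.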

extent analysis

TL;DR

To address the issue with Gemini 3.1 Pro's performance on multi-step analytic computations, verify its answers through numerical integration using tools like scipy.integrate.quad and check the sign of the integrand over the domain.

Guidance

  • Check the sign of the integrand: Ensure the sign of the result matches the sign of the integrand over the entire domain, as seen in the provided example where the integrand is strictly positive.
  • Numerical verification: Use scipy.integrate.quad or similar functions to numerically compute the integral and compare it with Gemini's answer to identify discrepancies.
  • Step-by-step re-derivation: If discrepancies are found, ask the model to re-derive the solution step by step to identify where the error occurs, such as the sign error in Step 3 of Gemini's derivation.
  • Prompt for self-correction: Explicitly prompt Gemini to correct its derivation, especially in cases involving alternating sums and polylogarithm identities.
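The first two checks in this list can be sketched as one small helper. This is a minimal illustration under stated assumptions, not code from the issue: `midpoint_integral` and `sanity_check` are hypothetical names, and a plain composite midpoint rule from the standard library stands in for scipy.integrate.quad so the sketch has no dependencies. The midpoint rule never evaluates the integrand at the endpoints, which sidesteps log(0).

```python
import math


def midpoint_integral(f, a, b, n=20_000):
    """Composite midpoint rule; never evaluates f at the endpoints a or b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def sanity_check(f, a, b, candidate, tol=1e-3, samples=1_000):
    """Return (sign_ok, value_ok, numeric) for a candidate closed-form value."""
    h = (b - a) / samples
    values = [f(a + (i + 0.5) * h) for i in range(samples)]
    # Check 1: if the sampled integrand keeps one sign, the candidate must share it.
    sign_ok = True
    if all(v > 0 for v in values):
        sign_ok = candidate > 0
    elif all(v < 0 for v in values):
        sign_ok = candidate < 0
    # Check 2: the candidate must agree with a numerical estimate within tol.
    numeric = midpoint_integral(f, a, b)
    value_ok = abs(numeric - candidate) <= tol
    return sign_ok, value_ok, numeric


def integrand(x):
    return math.log(x) * math.log(1 - x) / (1 + x)


ZETA3 = 1.2020569031595943  # Apery's constant zeta(3)
gemini = (9 / 8) * ZETA3 - (math.pi**2 / 4) * math.log(2)
correct = (13 / 8) * ZETA3 - (math.pi**2 / 4) * math.log(2)

print(sanity_check(integrand, 0.0, 1.0, gemini))   # both checks fail
print(sanity_check(integrand, 0.0, 1.0, correct))  # both checks pass
```

The sign check is only a sampled heuristic: it can miss a sign change between sample points, so it complements the numerical comparison and does not replace it.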

Example

The provided Python code snippet using scipy.integrate.quad demonstrates how to numerically verify the integral and compare it with Gemini's answer:

from scipy import integrate
import numpy as np

result, _ = integrate.quad(
    lambda x: np.log(x) * np.log(1 - x) / (1 + x),
    1e-10, 1 - 1e-10
)
print(f"Numerical:      {result:.6f}")   # ~+0.243

Notes

This approach focuses on verification and identification of errors in Gemini's derivations, particularly for multi-step analytic computations. It does not directly fix the underlying issue within Gemini but provides a method to detect and potentially correct errors through external verification.

Recommendation

Apply the workaround: always verify Gemini's answers for multi-step analytic computations numerically, and check that the sign of the result matches the sign of the integrand.
