gemini-cli - 💡(How to fix) Fix Gemini 3.1 Pro Performance Audit — Real Math Tests [3 comments, 1 participants]

Error Message

<p>The integrand is <strong>strictly positive on the entire interval</strong>, so the integral must be positive. Gemini returned a negative value. This is not a rounding error; it is a wrong answer.</p> <p><strong>Where the error occurs:</strong> Step 3 of Gemini's derivation, during the $S_2$ sign computation. Gemini itself flags the confusion inline: <em>"(Note: after distributing the sign and simplifying...)"</em>, a sign error it partially detected but did not correct.</p>

What happened?


What did you expect to happen?

Gemini 3.1 Pro Performance Audit — Real Math Tests

Client information

CLI Version: 0.39.0
Git Commit: 398f78dca
Session ID: 9414f156-5cae-4619-ada2-b747504ca527
Operating System: linux v20.20.2
Sandbox Environment: no sandbox
Model Version: gemini-3.1-pro-preview
Auth Type: oauth-personal
Memory Usage: 943.7 MB
Terminal Name: VTE(7600)
Terminal Background: #380c2a
Kitty Keyboard Protocol: Unsupported

Login information

Google

Anything else we need to know?

<html><head></head><body><h2>Gemini 3.1 Pro Performance Audit — Real Math Tests</h2>
<p><strong>Environment:</strong> Google AI Ultra subscription, Gemini CLI (<code>gemini-3.1-pro-preview</code>)</p>
<hr>
<h3>Summary</h3>
<p>This is not "cheating" in the sense of serving a weaker model: the model string checks out. But <strong>performance varies significantly by problem type</strong>, with a concrete failure case documented below.</p>
<hr>
<h3>Test 1: Number Theory (PASS ✅)</h3>
<p><strong>Problem:</strong> Find all $f:\mathbb{Z}_{>0}\to\mathbb{Z}_{>0}$ satisfying $f(m)+f(n) \mid m\cdot f(m)+n\cdot f(n)$</p>
<p><strong>Result:</strong> Correctly proved that no such function exists, using a single case ($m=1, n=2$). That case forces $f(1)+f(2) \mid f(2)$, which is impossible because $0 < f(2) < f(1)+f(2)$. Clean and minimal proof.</p>
<hr>
<h3>Test 2: Analytic Integration (FAIL ❌)</h3>
<p><strong>Problem:</strong> $$I = \int_0^1 \frac{\ln x \cdot \ln(1-x)}{1+x}\,dx$$</p>
<p><strong>Gemini's answer:</strong> $\dfrac{9}{8}\zeta(3) - \dfrac{\pi^2}{4}\ln 2 \approx -0.358$</p>
<p><strong>Correct answer:</strong> $\dfrac{13}{8}\zeta(3) - \dfrac{\pi^2}{4}\ln 2 \approx +0.243$</p>
<p><strong>Why Gemini is provably wrong (one-line check):</strong></p>
<p>On $x \in (0,1)$: $\ln x < 0$, $\ln(1-x) < 0$, $1+x > 0$</p>
<p>$$\frac{\ln x \cdot \ln(1-x)}{1+x} = \frac{(\text{negative})(\text{negative})}{\text{positive}} > 0 \quad \forall x \in (0,1)$$</p>
<p>The integrand is <strong>strictly positive on the entire interval</strong>, so the integral must be positive. Gemini returned a negative value. This is not a rounding error; it is a wrong answer.</p>
<p><strong>Where the error occurs:</strong> Step 3 of Gemini's derivation, during the $S_2$ sign computation. Gemini itself flags the confusion inline: <em>"(Note: after distributing the sign and simplifying...)"</em>, a sign error it partially detected but did not correct.</p>
<hr>
<h3>Verification Code</h3>
<pre><code class="language-python">from scipy import integrate
import numpy as np

# Integrate just inside (0, 1) to avoid evaluating log(0) at the endpoints.
result, _ = integrate.quad(
    lambda x: np.log(x) * np.log(1 - x) / (1 + x),
    1e-10, 1 - 1e-10
)
print(f"Numerical: {result:.6f}")  # ~+0.243

# 1.20206 approximates zeta(3).
gemini = (9/8) * 1.20206 - (np.pi**2 / 4) * np.log(2)
correct = (13/8) * 1.20206 - (np.pi**2 / 4) * np.log(2)
print(f"Gemini answer: {gemini:.6f}")    # ~-0.358 ❌
print(f"Correct answer: {correct:.6f}")  # ~+0.243 ✅
</code></pre>

<hr> <h3>Key Takeaways</h3>

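<table>
<thead><tr><th>Problem Type</th><th>Gemini 3.1 Pro</th></tr></thead>
<tbody>
<tr><td>Discrete reasoning / number theory</td><td>Strong ✅</td></tr>
<tr><td>Multi-step analytic computation (Euler sums, polylogarithms)</td><td>Sign errors likely ⚠️</td></tr>
</tbody>
</table>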

<p><strong>Fastest sanity check for closed-form integrals:</strong></p>
<ol>
<li>Check the sign of the integrand over the domain; the sign of the result must match</li>
<li>Plug the closed form into a calculator and compare against <code>scipy.integrate.quad</code></li>
<li>If they diverge, ask the model to re-derive step by step and identify where the discrepancy starts</li>
</ol>
<hr>
<h3>Bottom Line</h3>
<p>Gemini 3.1 Pro is genuinely strong on reasoning and discrete math. On long multi-step analytic derivations involving alternating sums and polylogarithm identities, it produces plausible-looking but numerically wrong answers, and it may not self-correct without being explicitly prompted.</p>
<p><strong>Trust but verify. Always run the number.</strong></p></body></html>
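As a worked check on the two closed forms quoted in Test 2, evaluating and subtracting them shows how far apart they are. The decimals below assume $\zeta(3) \approx 1.2020569$. The closing remark about a flipped term is only one possible reading of the gap and is not confirmed by the issue.

```latex
\begin{align*}
\tfrac{13}{8}\zeta(3) - \tfrac{\pi^2}{4}\ln 2 &\approx 1.953342 - 1.710272 = 0.243070, \\
\tfrac{9}{8}\zeta(3) - \tfrac{\pi^2}{4}\ln 2 &\approx 1.352314 - 1.710272 = -0.357958, \\
\left(\tfrac{13}{8} - \tfrac{9}{8}\right)\zeta(3) &= \tfrac{1}{2}\zeta(3) \approx 0.601028.
\end{align*}
```

Flipping the sign of a single term $t$ shifts a sum by $2t$, so a gap of $\tfrac{1}{2}\zeta(3)$ would be consistent with one mis-signed contribution of size $\tfrac{1}{4}\zeta(3) \approx 0.3005$, although other combinations of errors could produce the same gap.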

extent analysis

TL;DR

To address the issue with Gemini 3.1 Pro's performance on multi-step analytic computations, verify its answers through numerical integration using tools like scipy.integrate.quad and check the sign of the integrand over the domain.

Guidance

Check the sign of the integrand: Ensure the sign of the result matches the sign of the integrand over the entire domain, as seen in the provided example where the integrand is strictly positive.
Numerical verification: Use scipy.integrate.quad or similar functions to numerically compute the integral and compare it with Gemini's answer to identify discrepancies.
Step-by-step re-derivation: If discrepancies are found, ask the model to re-derive the solution step by step to identify where the error occurs, such as the sign error in Step 3 of Gemini's derivation.
Prompt for self-correction: Explicitly prompt Gemini to correct its derivation, especially in cases involving alternating sums and polylogarithm identities.
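The sign check in the first guidance item can be sketched in a few lines of Python using only the standard library. The helper names `constant_sign` and `sign_consistent` are hypothetical, and sampling a grid is a quick screen rather than a proof of positivity; the analytic argument in the issue is what actually establishes the sign.

```python
import math

ZETA3 = 1.2020569031595942  # Apery's constant zeta(3), hard-coded

def integrand(x: float) -> float:
    """ln(x) * ln(1 - x) / (1 + x), the integrand from Test 2."""
    return math.log(x) * math.log1p(-x) / (1.0 + x)

def constant_sign(f, a: float, b: float, samples: int = 1000) -> int:
    """Return +1 or -1 if f has that strict sign at every interior sample, else 0."""
    values = [f(a + (b - a) * k / samples) for k in range(1, samples)]
    if all(v > 0 for v in values):
        return 1
    if all(v < 0 for v in values):
        return -1
    return 0

def sign_consistent(candidate: float, integrand_sign: int) -> bool:
    """A strictly positive (negative) integrand forces a positive (negative) integral."""
    if integrand_sign == 0:
        return True  # mixed sign: this check says nothing
    return candidate * integrand_sign > 0

sign = constant_sign(integrand, 0.0, 1.0)
pi_term = (math.pi ** 2 / 4) * math.log(2)
gemini_value = (9 / 8) * ZETA3 - pi_term    # closed form Gemini returned
correct_value = (13 / 8) * ZETA3 - pi_term  # closed form reported as correct

print(sign, sign_consistent(gemini_value, sign), sign_consistent(correct_value, sign))
# prints: 1 False True
```

A `False` from `sign_consistent` is enough to reject a candidate closed form without computing the integral at all.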

Example

The Python snippet below, taken from the issue, uses scipy.integrate.quad to evaluate the integral numerically so that the result can be compared with Gemini's answer:

from scipy import integrate
import numpy as np

result, _ = integrate.quad(
    lambda x: np.log(x) * np.log(1 - x) / (1 + x),
    1e-10, 1 - 1e-10
)
print(f"Numerical:      {result:.6f}")   # ~+0.243
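To complete the comparison, both closed forms can be evaluated and set against a numerical estimate. The sketch below deliberately avoids the scipy dependency: it swaps in a composite midpoint rule (a stand-in for `quad`, not the method used in the issue) and a hard-coded value of ζ(3). The function name `midpoint_integral` and the choice of 100,000 subintervals are assumptions made for illustration.

```python
import math

ZETA3 = 1.2020569031595942  # Apery's constant zeta(3), hard-coded

def integrand(x: float) -> float:
    """ln(x) * ln(1 - x) / (1 + x), the integrand from Test 2."""
    return math.log(x) * math.log1p(-x) / (1.0 + x)

def midpoint_integral(f, a: float, b: float, n: int = 100_000) -> float:
    """Composite midpoint rule; it never evaluates f at the endpoints."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

numeric = midpoint_integral(integrand, 0.0, 1.0)
pi_term = (math.pi ** 2 / 4) * math.log(2)
gemini = (9 / 8) * ZETA3 - pi_term    # closed form Gemini returned
correct = (13 / 8) * ZETA3 - pi_term  # closed form reported as correct

print(f"Numerical:      {numeric:.6f}")
print(f"Gemini answer:  {gemini:.6f}  (diff {abs(numeric - gemini):.6f})")
print(f"Correct answer: {correct:.6f}  (diff {abs(numeric - correct):.6f})")
```

The midpoint rule is used because it samples only interior points, so the logarithms at 0 and 1 are never evaluated. For production use, an adaptive routine such as `scipy.integrate.quad` is the more robust choice.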

Notes

This approach focuses on verification and identification of errors in Gemini's derivations, particularly for multi-step analytic computations. It does not directly fix the underlying issue within Gemini but provides a method to detect and potentially correct errors through external verification.

Recommendation

Apply the workaround: always verify Gemini's answers for multi-step analytic computations numerically, and check that the sign of the result matches the sign of the integrand.
