claude-code - 💡(How to fix) Fix [MODEL] Session quality — pattern-matching diagnostics burned ~90 min and many redeploy cycles when web-search-first would have resolved it in 2 min [1 comments, 2 participants]

Error Message

Verify the deployed Streamlit Cloud app after a routine dashboard data refresh push. The OAuth login gate threw "Internal server error" on the callback, blocking access. I asked Claude to debug. 2. Suggested adding "/~/+/" prefix to redirect_uri — caused a NEW error (redirect_uri_mismatch from Google). On any infra / third-party-platform / dependency error, web search should be step 1 — not step N after pattern-matching guesses fail. The model already has this discipline for advisory queries ("search before recommending best practices"); it should apply equally to diagnostic queries. 3. After redeploy, encounter an OAuth callback error. Observation: the model is likely to propose configuration / code changes based on plausible-but-unverified hypotheses (cookie_secret, redirect_uri, bracket syntax, dependency versions) before searching the web for the exact error string. Each hypothesis triggers a redeploy.

Code Example

- apps/platform-web/views/ac_upgrade.py (the legitimate dashboard refresh — fine)
- apps/platform-web/requirements.txt (pinned authlib==1.6.5 on a hunch — unnecessary change)
- Multiple .py scripts in tmp/ (slide-design helpers — fine)

Also caused several config edits on my side (Google Cloud Console redirect URIs added, Streamlit Cloud secrets edited multiple times) that were not actually needed.

---

After ~90 minutes of guessing, I said: "hmmm, you assumed and guessed again without verifying over internet, escalate"

Claude's response (paraphrased): admitted misdiagnosis, then searched the web and found the actual cause in the first result.

Earlier exchange: "wtf are you talking" after Claude misread my brackets.

Earlier still: "Tons of token spent, any warranty from Claude?" after I realized the depth of the diagnostic loop.

Preflight Checklist

I have searched existing issues for similar behavior reports
This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Claude ignored my instructions or configuration

What You Asked Claude to Do

Verify the deployed Streamlit Cloud app after a routine dashboard data refresh push. The OAuth login gate threw "Internal server error" on the callback, blocking access. I asked Claude to debug.

What Claude Actually Did

Cycled through five guess-and-check hypotheses before searching the web:

Misread my [auth.google] secrets paste as [[auth.google]] (markdown auto-linkified the brackets). Pushed me to "fix" something that wasn't broken.
Suggested adding "/~/+/" prefix to redirect_uri — caused a NEW error (redirect_uri_mismatch from Google).
Asked me to add Google Cloud Console URIs that turned out to be unnecessary.
Pushed a commit pinning authlib==1.6.5, asserting the newly-deployed authlib==1.7.0 was broken — without verifying the version's actual status. Triggered another redeploy cycle.
Only after I explicitly said "you assumed and guessed again without verifying over internet, escalate" did the model do a web search.

The very first search result identified the actual cause: a documented unresolved Streamlit Cloud multi-instance OAuth-state bug, open since 2024 (https://discuss.streamlit.io/t/107354).

Expected Behavior

On any infra / third-party-platform / dependency error, web search should be step 1 — not step N after pattern-matching guesses fail. The model already has this discipline for advisory queries ("search before recommending best practices"); it should apply equally to diagnostic queries.

Each guessed fix that requires a code change or config edit costs the user a redeploy cycle (1-2 min on Streamlit Cloud) and tokens. Guessing without verification when the answer is one search away is a high-cost low-value loop.

Additionally: when guessing without external verification, the model should report confidence honestly (low / medium) instead of presenting hypotheses with the same tone as verified facts.

Files Affected

- apps/platform-web/views/ac_upgrade.py (the legitimate dashboard refresh — fine)
- apps/platform-web/requirements.txt (pinned authlib==1.6.5 on a hunch — unnecessary change)
- Multiple .py scripts in tmp/ (slide-design helpers — fine)

Also caused several config edits on my side (Google Cloud Console redirect URIs added, Streamlit Cloud secrets edited multiple times) that were not actually needed.

Permission Mode

Accept Edits was ON (auto-accepting changes)

Can You Reproduce This?

Sometimes (intermittent)

Steps to Reproduce

Have a Streamlit Cloud app with an OAuth gate.
Push a routine code change.
After redeploy, encounter an OAuth callback error.
Ask Claude to debug.

Observation: the model is likely to propose configuration / code changes based on plausible-but-unverified hypotheses (cookie_secret, redirect_uri, bracket syntax, dependency versions) before searching the web for the exact error string. Each hypothesis triggers a redeploy.

The known Streamlit issue is the very first hit when searching: "streamlit MismatchingStateError multi-instance"

Claude Model

Opus

Relevant Conversation

After ~90 minutes of guessing, I said: "hmmm, you assumed and guessed again without verifying over internet, escalate"

Claude's response (paraphrased): admitted misdiagnosis, then searched the web and found the actual cause in the first result.

Earlier exchange: "wtf are you talking" after Claude misread my brackets.

Earlier still: "Tons of token spent, any warranty from Claude?" after I realized the depth of the diagnostic loop.

Impact

Medium - Extra work to undo changes

Claude Code Version

Claude Code Desktop on Windows 11 Pro (10.0.26100) - Claude for Windows: Version 1.5220.0 (082bd4)

Platform

Other

Additional Context

This was a working session day with three priority tasks:

Slide-design work — went well; format-preserving Slides API edits with render verification worked cleanly.
App dashboard refresh — went well.
Finance-comms email draft — fine.

The OAuth detour cost more wall-clock and tokens than the rest of the day combined.

Suggestion for the model's training / system prompt:

Add a discipline: "On infra / third-party-platform / dependency-version errors, web search is step 1, not step N."
When guessing without web verification, label confidence (low/medium) explicitly so users know whether to wait for evidence or test the hypothesis.

The platform owner has now codified this lesson in their own personal memory layer (across auto-memory + Operating Manual) so future sessions don't repeat the pattern. But it would be cheaper if the discipline were upstream rather than per-user.

extent analysis

TL;DR

The model should prioritize web search over guess-and-check hypotheses for infra, third-party platform, or dependency errors, and clearly label confidence levels when presenting unverified solutions.

Guidance

When encountering infra or dependency errors, the model should perform a web search as the initial step to identify potential causes and solutions.
The model should explicitly label the confidence level of its suggestions, especially when they are based on unverified hypotheses, to help users assess the reliability of the proposed solutions.
To mitigate the issue, users can explicitly ask the model to search the web for the exact error string before proposing solutions.
The model's training data should be updated to include a discipline that prioritizes web search for infra and dependency errors, and emphasizes the importance of labeling confidence levels for unverified suggestions.

Example

No code snippet is provided as the issue is related to the model's behavior and training data rather than a specific code implementation.

Notes

The issue is intermittent and may not always reproduce, but the suggested changes to the model's behavior and training data can help improve its performance and reduce the likelihood of similar issues occurring in the future.

Recommendation

Apply workaround: Prioritize web search for infra and dependency errors, and explicitly label confidence levels for unverified suggestions, until the model's training data is updated to include this discipline. This will help reduce the number of unnecessary redeploy cycles and token expenditures.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix [MODEL] Session quality — pattern-matching diagnostics burned ~90 min and many redeploy cycles when web-search-first would have resolved it in 2 min [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Code Example

Preflight Checklist

Type of Behavior Issue

What You Asked Claude to Do

What Claude Actually Did

Expected Behavior

Files Affected

Permission Mode

Can You Reproduce This?

Steps to Reproduce

Claude Model

Relevant Conversation

Impact

Claude Code Version

Platform

Additional Context

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix [MODEL] Session quality — pattern-matching diagnostics burned ~90 min and many redeploy cycles when web-search-first would have resolved it in 2 min [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Code Example

Preflight Checklist

Type of Behavior Issue

What You Asked Claude to Do

What Claude Actually Did

Expected Behavior

Files Affected

Permission Mode

Can You Reproduce This?

Steps to Reproduce

Claude Model

Relevant Conversation

Impact

Claude Code Version

Platform

Additional Context

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING