openclaw - ✅ (Solved) Fix: Agent repeatedly uses web_fetch for Twitter/X URLs despite documented alternatives, needs tool-level domain routing [1 pull request, 1 participant]

GitHub stats
openclaw/openclaw#59872 (fetched 2026-04-08 02:39:30)
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

When an agent receives a Twitter/X URL (e.g. https://x.com/user/status/123), it consistently reaches for the web_fetch tool first, which always fails on x.com (returns empty page or login wall). This happens despite:

  1. Documented SOPs with working alternatives
  2. Explicit rules in workspace files (AGENTS.md, TOOLS.md)
  3. Logged regressions (5+ incidents across sessions)
  4. A dedicated web-search SOP with a fxtwitter routing table

The agent has the tools and knowledge to fetch tweets correctly. The problem is behavioral: the web_fetch tool description ("Fetch and extract readable content from a URL") is the strongest pattern match when a URL appears, and nothing at the tool-description level warns against using it on blocked domains.

Error Message

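web_fetch raises no explicit error on x.com; it returns an empty page or a login wall. The issue proposes a config option like tools.web.fetch.domainBlocklist that rejects calls to known-bad domains and returns an error message pointing to the correct tool.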

Root Cause

The agent identified the root cause itself:

"The system prompt lists web_fetch as: Fetch and extract readable content from a URL. That is the most direct pattern match in my tool list when I see a URL. URL appears → I scan available tools → web_fetch says give me a URL, get content → I call it. No intermediate thinking."

The tool description is the strongest signal. All other mitigations require a deliberate pause-and-check step that the agent consistently skips under normal operation.

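Fix / Workaround

PR #66047 addresses this at the tool layer: it adds detection of twitter.com / x.com tweet URLs to web_fetch and routes them through the public FxTwitter JSON API, falling through to the normal fetch pipeline if FxTwitter fails. The instruction-level mitigations tried earlier (SOPs, AGENTS.md rules, regression logs) did not change the behavior.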

PR fix notes

PR #66047: feat(web-fetch): add FxTwitter API fallback for Twitter/X URLs

Description (problem / solution / changelog)

Summary

  • Detect twitter.com / x.com tweet URLs in web_fetch and route them through the public FxTwitter JSON API (api.fxtwitter.com)
  • Twitter returns login walls to non-browser user agents, making web_fetch useless for tweet URLs
  • Fail-safe: if FxTwitter is down, falls through to the normal fetch + provider fallback pipeline

Addresses #59872

Changes

File | Change
src/agents/tools/web-fetch-twitter.ts | New: Twitter URL detection + FxTwitter API fetch + markdown formatter
src/agents/tools/web-fetch-twitter.test.ts | 9 test cases for URL detection and rewriting
src/agents/tools/web-fetch.ts | Import + intercept Twitter URLs before normal fetch

How it works

  1. After URL validation, check if the URL matches twitter.com/*/status/* or x.com/*/status/*
  2. If match: fetch api.fxtwitter.com/{user}/status/{id} (JSON API, no auth required)
  3. Format response as markdown (author, text, metrics, media links)
  4. If FxTwitter fails: fall through to normal fetch pipeline (unchanged behavior)
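
Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration, not the code from PR #66047: the names TweetRef, parseTweetUrl, and toFxTwitterApiUrl, the accepted host list, and the username pattern are all assumptions made for the example.

```typescript
// Hypothetical helpers; the real implementation lives in
// src/agents/tools/web-fetch-twitter.ts and may differ.
interface TweetRef {
  user: string;
  id: string;
}

// Assumed set of hostnames that should be treated as Twitter/X.
const TWITTER_HOSTS = new Set([
  "twitter.com",
  "www.twitter.com",
  "mobile.twitter.com",
  "x.com",
  "www.x.com",
]);

// Returns the user and tweet id for URLs shaped like
// https://x.com/<user>/status/<id>, or null for anything else.
function parseTweetUrl(raw: string): TweetRef | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // not a valid absolute URL
  }
  if (!TWITTER_HOSTS.has(url.hostname)) return null;

  // Trailing segments such as /photo/1 are tolerated and ignored.
  const match = /^\/([A-Za-z0-9_]{1,15})\/status\/(\d+)(?:\/.*)?$/.exec(url.pathname);
  return match ? { user: match[1], id: match[2] } : null;
}

// Builds the FxTwitter JSON API URL in the form quoted in the issue.
function toFxTwitterApiUrl(ref: TweetRef): string {
  return `https://api.fxtwitter.com/${ref.user}/status/${ref.id}`;
}
```

Under these assumptions, a caller would try parseTweetUrl first and hand the URL to the normal fetch pipeline whenever it returns null or the FxTwitter request fails, which keeps the existing behavior as the fallback.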

Test plan

  • 9 unit tests (URL detection, rewriting, edge cases)
  • pnpm build succeeds
  • No import cycle regressions

🤖 Generated with Claude Code

Changed files

  • src/agents/tools/web-fetch-twitter.test.ts (added, +49/-0)
  • src/agents/tools/web-fetch-twitter.ts (added, +103/-0)
  • src/agents/tools/web-fetch.ts (modified, +14/-0)

Code Example

"web_fetch blocked for x.com — use fxtwitter API: https://api.fxtwitter.com/<user>/status/<id>"

---

{
  "tools": {
    "web": {
      "fetch": {
        "domainRoutes": {
          "x.com": "https://api.fxtwitter.com/{path}",
          "twitter.com": "https://api.fxtwitter.com/{path}"
        }
      }
    }
  }
}
Original issue text

Summary

When an agent receives a Twitter/X URL (e.g. https://x.com/user/status/123), it consistently reaches for the web_fetch tool first, which always fails on x.com (returns empty page or login wall). This happens despite:

  1. Documented SOPs with working alternatives
  2. Explicit rules in workspace files (AGENTS.md, TOOLS.md)
  3. Logged regressions (5+ incidents across sessions)
  4. A dedicated web-search SOP with a fxtwitter routing table

The agent has the tools and knowledge to fetch tweets correctly. The problem is behavioral: the web_fetch tool description ("Fetch and extract readable content from a URL") is the strongest pattern match when a URL appears, and nothing at the tool-description level warns against using it on blocked domains.

What works (but the agent does not reach for)

  • fxtwitter API: web_fetch("https://api.fxtwitter.com/<user>/status/<id>") — works instantly, returns full tweet JSON
  • xAI x_search: Direct Twitter API via Grok Responses API — works for threads and fresh content
  • SearXNG: Local search engine proxy — works for discovery
  • xurl CLI: Authenticated X API tool (when configured)

What we tried to fix it (all failed to change behavior)

1. Detailed SOP (ops/playbooks/tools/sops/web-search.md)

  • Full routing table: "Specific tweet by ID → https://api.fxtwitter.com/<user>/status/<id>"
  • Decision matrix for every URL type
  • Explicit "DO NOT give up" rule with fallback order
  • Result: Agent does not read the SOP before acting on URLs

2. Rules in AGENTS.md (loaded every session in system prompt)

  • Added: "When a URL arrives, check TOOLS.md and the relevant SOP before calling web_fetch"
  • Linked to regression numbers for weight
  • Result: Agent skips the check step entirely — goes straight from URL → web_fetch

3. Gotcha in regressions.md (searchable, referenced in boot files)

  • Added to Gotchas section: "x.com / twitter.com URLs: NEVER use web_fetch"
  • With explicit routing: fxtwitter → x_search → SearXNG
  • Result: Agent does not search regressions before tool use

4. Regression logging (5 regressions: #144, #145, #146, #147 + earlier incidents)

  • Each regression documents the failure pattern and fix
  • Pattern tags: tool-exists-didnt-use, capability-gap, wrong-fallback
  • Result: Regressions document the problem but do not prevent recurrence

5. URL routing table at top of TOOLS.md (latest attempt)

  • Moved the routing table to the very top of TOOLS.md (system prompt injected)
  • Table format: domain → "web_fetch works?" → "use instead"
  • x.com row: ❌ NEVER → fxtwitter API
  • Result: TBD — just implemented, but same pattern as previous attempts (adding text the agent needs to voluntarily read before acting)

Root cause analysis

The agent identified the root cause itself:

"The system prompt lists web_fetch as: Fetch and extract readable content from a URL. That is the most direct pattern match in my tool list when I see a URL. URL appears → I scan available tools → web_fetch says give me a URL, get content → I call it. No intermediate thinking."

The tool description is the strongest signal. All other mitigations require a deliberate pause-and-check step that the agent consistently skips under normal operation.

Proposed solution: Domain-aware tool routing at the platform level

The fix should happen at the OpenClaw/tool layer, not the agent instruction layer:

Option A: Domain blocklist on web_fetch

Add a config option like tools.web.fetch.domainBlocklist that rejects calls to known-bad domains and returns an error message pointing to the correct tool:

"web_fetch blocked for x.com — use fxtwitter API: https://api.fxtwitter.com/<user>/status/<id>"
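
One way Option A might look inside the tool is sketched below. The BlockRule shape (a domain plus a hint), the checkBlocklist function, and the exact message format are assumptions for illustration; the issue only proposes the config key and an example message.

```typescript
// Hypothetical shapes: tools.web.fetch.domainBlocklist is a proposal in this
// issue, not a confirmed OpenClaw option.
interface BlockRule {
  domain: string; // e.g. "x.com"
  hint: string;   // what the agent should do instead
}

type BlockResult = { blocked: false } | { blocked: true; message: string };

// Checks a URL against the blocklist before any network call is made.
function checkBlocklist(rawUrl: string, rules: BlockRule[]): BlockResult {
  let host: string;
  try {
    host = new URL(rawUrl).hostname;
  } catch {
    return { blocked: false }; // leave invalid URLs to normal validation
  }
  for (const rule of rules) {
    // Match the domain itself and any subdomain (www.x.com, mobile.x.com),
    // but not lookalikes such as api.fxtwitter.com.
    if (host === rule.domain || host.endsWith("." + rule.domain)) {
      return {
        blocked: true,
        message: `web_fetch blocked for ${rule.domain}: ${rule.hint}`,
      };
    }
  }
  return { blocked: false };
}
```

Returning the hint as the tool result puts the correction in the agent's context at the moment it makes the wrong call, which is the point of this option. Matching on a dot boundary matters here, because a plain suffix check on "twitter.com" would also block the fxtwitter API host.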

Option B: Domain-to-tool routing config

A tools.web.fetch.domainRoutes config that automatically rewrites URLs before fetching:

{
  "tools": {
    "web": {
      "fetch": {
        "domainRoutes": {
          "x.com": "https://api.fxtwitter.com/{path}",
          "twitter.com": "https://api.fxtwitter.com/{path}"
        }
      }
    }
  }
}
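
A rough sketch of how a domainRoutes map like the one above could be applied before fetching. The applyDomainRoute function, the stripping of a leading "www.", and the choice to drop the query string are assumptions; the issue does not specify how {path} should be expanded.

```typescript
// Hypothetical: hostname -> URL template containing a "{path}" placeholder.
type DomainRoutes = Record<string, string>;

// Rewrites a URL using the template for its hostname.
// Returns the input unchanged when no route applies.
function applyDomainRoute(rawUrl: string, routes: DomainRoutes): string {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return rawUrl; // leave invalid URLs to normal validation
  }
  const host = url.hostname.replace(/^www\./, "");
  // Own-property check so hostnames like "constructor" never hit the prototype.
  if (!Object.prototype.hasOwnProperty.call(routes, host)) return rawUrl;

  // Drop leading slashes so "https://host/{path}" does not produce "//".
  // The query string (e.g. tracking params like ?s=20) is intentionally dropped.
  const path = url.pathname.replace(/^\/+/, "");
  // Function replacer avoids "$" in the path being read as a replacement pattern.
  return routes[host].replace("{path}", () => path);
}
```

With the config shown above, "https://x.com/user/status/123?s=20" would be rewritten to "https://api.fxtwitter.com/user/status/123", while URLs on unlisted domains pass through untouched.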

Option C: Tool description override

Allow tools.web.fetch.description to be user-configurable so operators can append domain-specific warnings that appear at the same salience level as the default description.
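
Option C could be as small as the following sketch. The FetchToolConfig shape, the descriptionWarnings field, and buildDescription are hypothetical; only the default description string is taken from the issue.

```typescript
// Hypothetical config shape for tools.web.fetch; neither field is a confirmed option.
interface FetchToolConfig {
  description?: string;           // full replacement for the default text
  descriptionWarnings?: string[]; // extra lines appended after the description
}

// Default description as quoted in the issue.
const DEFAULT_DESCRIPTION = "Fetch and extract readable content from a URL";

// Builds the description the model sees in its tool list.
function buildDescription(config: FetchToolConfig): string {
  const base = config.description ?? DEFAULT_DESCRIPTION;
  const warnings = (config.descriptionWarnings ?? [])
    .map((line) => line.trim())
    .filter((line) => line.length > 0); // ignore blank entries
  return [base, ...warnings].join("\n");
}
```

Because the warning ships inside the tool description itself, it is read at the same moment the agent pattern-matches a URL to web_fetch, rather than living in a file the agent has to choose to consult.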

Any of these would solve the problem at the layer where the decision actually happens, rather than relying on the agent to check external documentation before every tool call.

Environment

  • OpenClaw 2026.3.23
  • Models tested: Claude Opus, Claude Sonnet, GPT-5.3-Codex (all exhibit the same behavior)
  • 5+ documented regressions across multiple sessions and models

extent analysis

TL;DR

Implement domain-aware tool routing at the platform level by adding a domain blocklist or routing config to the web_fetch tool.

Guidance

  • Identify the most suitable option among the proposed solutions (domain blocklist, domain-to-tool routing config, or tool description override) based on the specific requirements and constraints of the system.
  • Implement the chosen solution at the OpenClaw/tool layer to prevent the agent from using web_fetch on blocked domains like x.com.
  • Test the solution with different models (e.g., Claude Opus, Claude Sonnet, GPT-5.3-Codex) to ensure it works consistently across multiple sessions and models.
  • Monitor the system for any regressions or issues after implementing the solution and adjust as needed.

Example

For Option A (domain blocklist), the config might look like:

{
  "tools": {
    "web": {
      "fetch": {
        "domainBlocklist": ["x.com", "twitter.com"]
      }
    }
  }
}

For Option B (domain-to-tool routing config), the config might look like:

{
  "tools": {
    "web": {
      "fetch": {
        "domainRoutes": {
          "x.com": "https://api.fxtwitter.com/{path}",
          "twitter.com": "https://api.fxtwitter.com/{path}"
        }
      }
    }
  }
}

Notes

The solution should be implemented at the platform level to ensure consistency and reliability. The agent's behavior is driven by the tool description, so modifying the tool configuration is the most effective way to address the issue.

Recommendation

Apply Option B (domain-to-tool routing config) as it provides a more flexible and scalable solution, allowing for easy addition or modification of domain routes as needed. This approach also ensures that the agent uses the correct tool for each domain without requiring manual intervention or documentation checks.
