openclaw - ✅(Solved) Fix Agent repeatedly uses web_fetch for Twitter/X URLs despite documented alternatives — needs tool-level domain routing [1 pull requests, 1 participants]

sene1337 · 2026-04-02T19:17:32Z



GitHub stats

openclaw/openclaw#59872•Fetched 2026-04-08 02:39:30

Author

sene1337

Participants

sene1337

When an agent receives a Twitter/X URL (e.g. https://x.com/user/status/123), it consistently reaches for the web_fetch tool first, which always fails on x.com (returns empty page or login wall). This happens despite:

1. Documented SOPs with working alternatives
2. Explicit rules in workspace files (AGENTS.md, TOOLS.md)
3. Logged regressions (5+ incidents across sessions)
4. A dedicated web-search SOP with a fxtwitter routing table

The agent has the tools and knowledge to fetch tweets correctly. The problem is behavioral: the web_fetch tool description ("Fetch and extract readable content from a URL") is the strongest pattern match when a URL appears, and nothing at the tool-description level warns against using it on blocked domains.

PR fix notes

PR #66047: feat(web-fetch): add FxTwitter API fallback for Twitter/X URLs

Repository: openclaw/openclaw
Author: asakir44
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/66047

Description (problem / solution / changelog)

Summary

Detect twitter.com / x.com tweet URLs in web_fetch and route them through the public FxTwitter JSON API (api.fxtwitter.com)
Twitter returns login walls to non-browser user agents, making web_fetch useless for tweet URLs
Fail-safe: if FxTwitter is down, falls through to the normal fetch + provider fallback pipeline

Addresses #59872

Changes

| File | Change |
|------|--------|
| `src/agents/tools/web-fetch-twitter.ts` | New: Twitter URL detection + FxTwitter API fetch + markdown formatter |
| `src/agents/tools/web-fetch-twitter.test.ts` | 9 test cases for URL detection and rewriting |
| `src/agents/tools/web-fetch.ts` | Import + intercept Twitter URLs before normal fetch |

How it works

1. After URL validation, check if the URL matches twitter.com/*/status/* or x.com/*/status/*
2. If match: fetch api.fxtwitter.com/{user}/status/{id} (JSON API, no auth required)
3. Format response as markdown (author, text, metrics, media links)
4. If FxTwitter fails: fall through to normal fetch pipeline (unchanged behavior)
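
The detection and rewriting steps above could look roughly like the following. This is a minimal sketch, not the PR's actual code (the contents of `web-fetch-twitter.ts` are not shown on this page). The function name `toFxTwitterUrl`, the host list, and the username pattern are all assumptions made for illustration.

```typescript
// Hypothetical helper: the name, host list, and regex are illustrative assumptions.
const TWEET_HOSTS = new Set([
  "twitter.com",
  "www.twitter.com",
  "mobile.twitter.com",
  "x.com",
  "www.x.com",
]);

// Returns the FxTwitter API URL for a tweet link, or null if the input
// is not a recognisable twitter.com / x.com status URL.
function toFxTwitterUrl(input: string): string | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // not a valid absolute URL
  }
  if (!TWEET_HOSTS.has(url.hostname)) return null;

  // Expect /{user}/status/{id}, optionally followed by more segments (e.g. /photo/1).
  const match = /^\/([A-Za-z0-9_]{1,15})\/status\/(\d+)(?:\/|$)/.exec(url.pathname);
  if (!match) return null;

  const [, user, id] = match;
  return `https://api.fxtwitter.com/${user}/status/${id}`;
}
```

A caller would try the rewritten URL first and, when this helper returns `null` or the FxTwitter request fails, continue with the normal fetch pipeline, matching the fail-safe described in step 4.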

Test plan

9 unit tests (URL detection, rewriting, edge cases)
pnpm build succeeds
No import cycle regressions

🤖 Generated with Claude Code

Changed files

src/agents/tools/web-fetch-twitter.test.ts (added, +49/-0)
src/agents/tools/web-fetch-twitter.ts (added, +103/-0)
src/agents/tools/web-fetch.ts (modified, +14/-0)

What works (but the agent does not reach for)

fxtwitter API: web_fetch("https://api.fxtwitter.com/<user>/status/<id>") — works instantly, returns full tweet JSON
xAI x_search: Direct Twitter API via Grok Responses API — works for threads and fresh content
SearXNG: Local search engine proxy — works for discovery
xurl CLI: Authenticated X API tool (when configured)

What we tried to fix it (all failed to change behavior)

1. Detailed SOP (`ops/playbooks/tools/sops/web-search.md`)

Full routing table: "Specific tweet by ID → https://api.fxtwitter.com/<user>/status/<id>"
Decision matrix for every URL type
Explicit "DO NOT give up" rule with fallback order
Result: Agent does not read the SOP before acting on URLs

2. Rules in AGENTS.md (loaded every session in system prompt)

Added: "When a URL arrives, check TOOLS.md and the relevant SOP before calling web_fetch"
Linked to regression numbers for weight
Result: Agent skips the check step entirely — goes straight from URL → web_fetch

3. Gotcha in regressions.md (searchable, referenced in boot files)

Added to Gotchas section: "x.com / twitter.com URLs: NEVER use web_fetch"
With explicit routing: fxtwitter → x_search → SearXNG
Result: Agent does not search regressions before tool use

4. Regression logging (5 regressions: #144, #145, #146, #147 + earlier incidents)

Each regression documents the failure pattern and fix
Pattern tags: tool-exists-didnt-use, capability-gap, wrong-fallback
Result: Regressions document the problem but do not prevent recurrence

5. URL routing table at top of TOOLS.md (latest attempt)

Moved the routing table to the very top of TOOLS.md (system prompt injected)
Table format: domain → "web_fetch works?" → "use instead"
x.com row: ❌ NEVER → fxtwitter API
Result: TBD — just implemented, but same pattern as previous attempts (adding text the agent needs to voluntarily read before acting)

Root cause analysis

The agent identified the root cause itself:

"The system prompt lists web_fetch as: Fetch and extract readable content from a URL. That is the most direct pattern match in my tool list when I see a URL. URL appears → I scan available tools → web_fetch says give me a URL, get content → I call it. No intermediate thinking."

The tool description is the strongest signal. All other mitigations require a deliberate pause-and-check step that the agent consistently skips under normal operation.

Proposed solution: Domain-aware tool routing at the platform level

The fix should happen at the OpenClaw/tool layer, not the agent instruction layer:

Option A: Domain blocklist on `web_fetch`

Add a config option like tools.web.fetch.domainBlocklist that rejects calls to known-bad domains and returns an error message pointing to the correct tool:

"web_fetch blocked for x.com — use fxtwitter API: https://api.fxtwitter.com/<user>/status/<id>"
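
A blocklist check of this kind might be sketched as follows. `tools.web.fetch.domainBlocklist` is only a proposal in this issue, so nothing here reflects an existing OpenClaw API: the function name `checkBlocklist` and the shape of the blocklist (a map from domain to a hint string) are assumptions, and the returned message only approximates the wording quoted above.

```typescript
// Hypothetical shape: domain -> hint telling the agent what to use instead.
type DomainBlocklist = Record<string, string>;

// Returns an error message if the URL's host is blocked, otherwise null.
// Assumes rawUrl has already passed URL validation (new URL() throws otherwise).
function checkBlocklist(rawUrl: string, blocklist: DomainBlocklist): string | null {
  const host = new URL(rawUrl).hostname;
  for (const [domain, hint] of Object.entries(blocklist)) {
    // Match the domain itself and any subdomain (e.g. mobile.twitter.com).
    if (host === domain || host.endsWith(`.${domain}`)) {
      return `web_fetch blocked for ${domain}: ${hint}`;
    }
  }
  return null;
}

const exampleBlocklist: DomainBlocklist = {
  "x.com": "use fxtwitter API: https://api.fxtwitter.com/<user>/status/<id>",
  "twitter.com": "use fxtwitter API: https://api.fxtwitter.com/<user>/status/<id>",
};
```

Returning the hint as the tool result puts the correction in front of the agent at the moment it makes the wrong call, which is the property the issue says the documentation-based mitigations lack.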

Option B: Domain-to-tool routing config

A tools.web.fetch.domainRoutes config that automatically rewrites URLs before fetching:

{
  "tools": {
    "web": {
      "fetch": {
        "domainRoutes": {
          "x.com": "https://api.fxtwitter.com/{path}",
          "twitter.com": "https://api.fxtwitter.com/{path}"
        }
      }
    }
  }
}
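
One way the `{path}` substitution in such a config could be applied is sketched below. `domainRoutes` is a proposed option rather than a shipped one, so the function name `applyDomainRoute`, the `www.` stripping, and the decision to drop the query string are all assumptions.

```typescript
// Hypothetical shape mirroring the proposed config: domain -> URL template.
type DomainRoutes = Record<string, string>;

// Rewrites rawUrl using the template for its host, or returns it unchanged.
// Assumes rawUrl has already passed URL validation.
function applyDomainRoute(rawUrl: string, routes: DomainRoutes): string {
  const url = new URL(rawUrl);
  const host = url.hostname.replace(/^www\./, "");

  // Own-property check so hosts like "constructor" never hit the prototype.
  if (!Object.prototype.hasOwnProperty.call(routes, host)) return rawUrl;

  // "{path}" is the original path without its leading slash; the query string is dropped.
  const path = url.pathname.replace(/^\/+/, "");
  // Function replacer avoids special handling of "$" sequences in the path.
  return routes[host].replace("{path}", () => path);
}

const exampleRoutes: DomainRoutes = {
  "x.com": "https://api.fxtwitter.com/{path}",
  "twitter.com": "https://api.fxtwitter.com/{path}",
};
```

With these routes, `https://x.com/user/status/123` becomes `https://api.fxtwitter.com/user/status/123`. Note that a plain `{path}` rewrite also applies to non-tweet paths such as profile pages, so a real implementation may want a per-route path pattern.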

Option C: Tool description override

Allow tools.web.fetch.description to be user-configurable so operators can append domain-specific warnings that appear at the same salience level as the default description.
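
As a rough illustration of Option C, the operator-supplied warnings could simply be appended to the default description before the tool list is sent to the model. The function name `buildDescription` and the idea of passing warnings as a string array are assumptions; only the default description text comes from the issue.

```typescript
// Default description as quoted in the issue.
const DEFAULT_DESCRIPTION = "Fetch and extract readable content from a URL";

// Hypothetical helper: appends operator warnings to the base description
// so they appear alongside it rather than in separate documentation.
function buildDescription(base: string, warnings: string[]): string {
  const extra = warnings.map((w) => w.trim()).filter((w) => w.length > 0);
  if (extra.length === 0) return base;
  return `${base}. ${extra.join(" ")}`;
}

const description = buildDescription(DEFAULT_DESCRIPTION, [
  "Do NOT use for x.com or twitter.com URLs; use https://api.fxtwitter.com/<user>/status/<id> instead.",
]);
```

Unlike Options A and B, this still relies on the model heeding the text, but it places the warning in the tool description itself, which the issue identifies as the strongest signal.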

Any of these would solve the problem at the layer where the decision actually happens, rather than relying on the agent to check external documentation before every tool call.

Environment

OpenClaw 2026.3.23
Models tested: Claude Opus, Claude Sonnet, GPT-5.3-Codex (all exhibit the same behavior)
5+ documented regressions across multiple sessions and models

extent analysis

TL;DR

Implement domain-aware tool routing at the platform level by adding a domain blocklist or routing config to the web_fetch tool.

Guidance

Identify the most suitable option among the proposed solutions (domain blocklist, domain-to-tool routing config, or tool description override) based on the specific requirements and constraints of the system.
Implement the chosen solution at the OpenClaw/tool layer to prevent the agent from using web_fetch on blocked domains like x.com.
Test the solution with different models (e.g., Claude Opus, Claude Sonnet, GPT-5.3-Codex) to ensure it works consistently across multiple sessions and models.
Monitor the system for any regressions or issues after implementing the solution and adjust as needed.

Example

For Option A (domain blocklist), the config might look like:

{
  "tools": {
    "web": {
      "fetch": {
        "domainBlocklist": ["x.com", "twitter.com"]
      }
    }
  }
}

For Option B (domain-to-tool routing config), the config might look like:

{
  "tools": {
    "web": {
      "fetch": {
        "domainRoutes": {
          "x.com": "https://api.fxtwitter.com/{path}",
          "twitter.com": "https://api.fxtwitter.com/{path}"
        }
      }
    }
  }
}

Notes

The solution should be implemented at the platform level to ensure consistency and reliability. The agent's behavior is driven by the tool description, so modifying the tool configuration is the most effective way to address the issue.

Recommendation

Apply Option B (domain-to-tool routing config) as it provides a more flexible and scalable solution, allowing for easy addition or modification of domain routes as needed. This approach also ensures that the agent uses the correct tool for each domain without requiring manual intervention or documentation checks.
