langchain - 💡(How to fix) Fix Tool integration: anybrowse for Cloudflare-bypass web scraping in LangChain agents [3 comments, 3 participants]

langchain · 2026-03-21 01:32:18


langchain-ai/langchain#36134 • Fetched 2026-04-08 01:08:02

Timeline: commented ×3, closed ×1, labeled ×1

Context

LangChain's web loaders, including WebBaseLoader, fail on Cloudflare-protected sites such as most major news outlets, LinkedIn, Amazon, and government pages. Research chains that depend on them then fail silently.

Proposed tool

anybrowse uses real residential Chrome to bypass Cloudflare and returns clean markdown. It could fit as a BaseTool or a BaseLoader:

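from langchain.tools import tool
import requests

@tool
def scrape_url(url: str) -> str:
    """Scrape any URL and return clean markdown, including Cloudflare-protected sites."""
    try:
        # A timeout keeps a stalled request from hanging the agent.
        r = requests.post("https://anybrowse.dev/scrape", json={"url": url}, timeout=60)
    except requests.RequestException as exc:
        # Report network errors as text so the agent can react to them.
        return f"Scrape failed: {exc}"
    if r.ok:
        return r.json().get("markdown", "")
    return f"Scrape failed: {r.status_code}"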

Or as a Document loader:

from typing import Optional

import requests
from langchain_core.documents import Document

class AnybrowseLoader:
    """Load a single URL through the anybrowse scrape endpoint."""

    def __init__(self, url: str, api_key: Optional[str] = None):
        self.url = url
        self.api_key = api_key

    def load(self) -> list[Document]:
        headers = {}
        if self.api_key:
            headers["Authorization"] = f"Bearer {self.api_key}"
        r = requests.post("https://anybrowse.dev/scrape",
                          json={"url": self.url}, headers=headers, timeout=60)
        # Raise on HTTP errors instead of failing later on a missing "markdown" key.
        r.raise_for_status()
        data = r.json()
        return [Document(page_content=data["markdown"],
                         metadata={"source": self.url, "title": data.get("title", "")})]

Free: 10/day, no key
Paid: $5 for 3,000 docs (never expire)
Docs: https://anybrowse.dev/docs
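
The pricing above works out to roughly $0.0017 per document on the paid tier. A small sketch of the arithmetic follows. It assumes paid credits are bought in whole $5 bundles, which the listing does not state, and the function names are made up for illustration.

```python
import math

# Figures taken from the pricing lines above.
FREE_DOCS_PER_DAY = 10
BUNDLE_PRICE_USD = 5.0
BUNDLE_DOCS = 3_000


def price_per_doc() -> float:
    """Paid price of a single document in USD."""
    return BUNDLE_PRICE_USD / BUNDLE_DOCS


def paid_cost(total_docs: int) -> float:
    """Cost in USD of total_docs, assuming whole bundles only (an assumption)."""
    return math.ceil(total_docs / BUNDLE_DOCS) * BUNDLE_PRICE_USD


def days_on_free_tier(total_docs: int) -> int:
    """Days needed to fetch total_docs using only the free daily allowance."""
    return math.ceil(total_docs / FREE_DOCS_PER_DAY)
```

Under these assumptions, a 500-document research job would take 50 days on the free tier or fit inside a single $5 bundle.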

extent analysis

Fix Plan

To work around LangChain web loaders failing on Cloudflare-protected sites, integrate anybrowse either as a tool or as a document loader:

Option 1: Using anybrowse as a BaseTool
1. Install the requests library if not already installed: pip install requests
2. Use the provided scrape_url function as a tool in LangChain. The @tool decorator already turns it into a tool, so no extra wrapping is needed.
3. Example usage:

# Call the tool directly to check that it works.
markdown = scrape_url.invoke({"url": "https://example.com"})
print(markdown)

# To let a model decide when to scrape, bind the tool to a chat model
# that supports tool calling (llm is a placeholder for your own model).
# llm_with_tools = llm.bind_tools([scrape_url])


Option 2: Using anybrowse as a BaseLoader
1. Install the requests library if not already installed: pip install requests
2. Use the provided AnybrowseLoader class as a document loader in LangChain
3. Example usage:

loader = AnybrowseLoader("https://example.com")
docs = loader.load()
for doc in docs:
    print(doc.page_content)

Verification

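To verify that the fix worked, run the scrape_url function or the AnybrowseLoader class against a Cloudflare-protected URL and check that the returned markdown contains the page's actual content rather than a challenge page, an empty string, or a "Scrape failed" message.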
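
One way to automate that check is a small heuristic over the returned text. The sketch below is an illustration only: the marker phrases are assumptions about what Cloudflare challenge or block pages tend to contain, the 200-character threshold is arbitrary, and the function name is hypothetical.

```python
# Assumed marker phrases for challenge or block pages; adjust for your targets.
CHALLENGE_MARKERS = (
    "just a moment...",
    "checking your browser",
    "attention required! | cloudflare",
)


def looks_like_real_content(markdown: str, min_chars: int = 200) -> bool:
    """Heuristic: True if the text is long enough and has no challenge markers."""
    text = markdown.strip()
    # Empty results and short error strings such as "Scrape failed: 403" fail here.
    if len(text) < min_chars:
        return False
    lowered = text.lower()
    return not any(marker in lowered for marker in CHALLENGE_MARKERS)
```

A passing check does not prove the markdown is correct, so it is still worth comparing one or two results against the page in a browser.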

Extra Tips

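Handle errors explicitly: scrape_url reports a bad HTTP status as a "Scrape failed" string instead of raising, while AnybrowseLoader raises if the response has no "markdown" field.
Consider adding a retry mechanism with a delay between attempts to handle temporary failures.
Keep the usage limits and costs in mind: the free tier allows 10 per day without a key, and the paid tier is $5 for 3,000 docs.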
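
A retry mechanism can be sketched with the standard library alone. The helper below is a hypothetical example, not part of LangChain or anybrowse: it retries any callable that raises, doubling the delay after each failure. The default of three attempts and a one-second base delay are arbitrary choices.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds or attempts run out, doubling the delay each time."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            # Re-raise on the final attempt so the caller sees the real error.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

With the loader from the issue, a call might look like `docs = retry_with_backoff(lambda: AnybrowseLoader(url).load())`. Each retry counts against the daily or paid quota, so a low attempt limit is probably sensible.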
