langchain - ✅ (Solved) Add CRW partner integration (CrwLoader document loader) [1 pull request, 4 comments, 3 participants]

GitHub stats

  • Issue: langchain-ai/langchain#36273 (fetched 2026-04-08 01:36:19)
  • Comments: 4
  • Participants: 3
  • Timeline events: 9
  • Reactions: 0
  • Timeline (top): commented ×4, mentioned ×2, subscribed ×2, labeled ×1

PR fix notes

PR #36270: Add CrwLoader partner integration

Description (problem / solution / changelog)

Summary

  • Adds langchain-crw, a partner integration for CRW — a high-performance, Firecrawl-compatible web scraper written in Rust
  • Provides CrwLoader document loader with scrape, crawl, and map modes
  • Works with self-hosted crw-server or the fastcrw.com cloud API

Test plan

  • make test in libs/partners/crw
  • Verify import test passes
  • Integration test with running crw-server instance

Changed files

  • libs/partners/crw/.gitignore (added, +1/-0)
  • libs/partners/crw/LICENSE (added, +21/-0)
  • libs/partners/crw/Makefile (added, +69/-0)
  • libs/partners/crw/README.md (added, +39/-0)
  • libs/partners/crw/langchain_crw/__init__.py (added, +7/-0)
  • libs/partners/crw/langchain_crw/document_loaders.py (added, +241/-0)
  • libs/partners/crw/langchain_crw/py.typed (added, +0/-0)
  • libs/partners/crw/pyproject.toml (added, +140/-0)
  • libs/partners/crw/scripts/check_imports.py (added, +19/-0)
  • libs/partners/crw/scripts/lint_imports.sh (added, +18/-0)
  • libs/partners/crw/tests/__init__.py (added, +0/-0)
  • libs/partners/crw/tests/integration_tests/__init__.py (added, +0/-0)
  • libs/partners/crw/tests/integration_tests/test_compile.py (added, +8/-0)
  • libs/partners/crw/tests/unit_tests/__init__.py (added, +0/-0)
  • libs/partners/crw/tests/unit_tests/test_imports.py (added, +12/-0)

Feature Request

Add a partner integration for CRW — a high-performance, Firecrawl-compatible web scraper written in Rust.

Motivation

CRW provides a Firecrawl-compatible REST API for web scraping, crawling, and site mapping, optimized for AI agent workflows. It ships as a single binary that uses about 6 MB of RAM when idle. According to the issue author, it is 5.5x faster than Firecrawl and uses 75x less memory on a benchmark of 1K real-world URLs.

A LangChain partner integration (langchain-crw) would give users a CrwLoader document loader with:

  • Scrape mode: Single page to LangChain Document
  • Crawl mode: BFS crawl with async polling, multiple pages to Documents
  • Map mode: URL discovery via sitemap + link extraction
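
The crawl mode above is described as asynchronous with polling. A minimal sketch of such a polling loop follows, using only the standard library. The `poll_crawl` helper, the `fetch_status` callable, and the `status`/`data`/`error` field names are hypothetical and modeled on Firecrawl-style responses; they are not taken from the CRW API or from the PR.

```python
import time
from typing import Any, Callable, Dict, List


def poll_crawl(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, Any]]:
    """Poll a crawl job until it completes, then return its pages.

    `fetch_status` is a hypothetical callable returning the job state as a
    dict, e.g. {"status": "scraping"} or {"status": "completed", "data": [...]}.
    These field names are an assumption, not the documented CRW schema.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_status()
        status = state.get("status")
        if status == "completed":
            return list(state.get("data", []))
        if status == "failed":
            raise RuntimeError(f"crawl failed: {state.get('error', 'unknown error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError("crawl did not finish before the timeout")
        sleep(interval)


# Usage with a fake status source standing in for real HTTP calls.
_states = iter([
    {"status": "scraping"},
    {"status": "scraping"},
    {"status": "completed", "data": [{"markdown": "# Home"}, {"markdown": "# About"}]},
])
pages = poll_crawl(lambda: next(_states), interval=0.0)
print(len(pages))  # prints 2
```

Passing the status fetcher and the sleep function as arguments keeps the loop testable without a running crw-server; in a real loader the fetcher would wrap an HTTP request for the job status.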

Why Partner Integration (not Community)

CRW maintains its own API and release cycle. A partner package (langchain-crw) follows the same pattern as langchain-exa, langchain-firecrawl, etc.

Implementation

I have a working implementation ready: https://github.com/langchain-ai/langchain/pull/36270

Key details:

  • Uses requests directly (no SDK dependency)
  • Supports self-hosted (localhost:3000) and cloud (fastcrw.com)
  • Follows BaseLoader interface with lazy_load()
  • Includes unit tests, integration test scaffolding, and full package structure


Fix Plan

To integrate CRW with LangChain, we will create a partner package langchain-crw with a CrwLoader document loader. Here are the steps:

  • Create a new Python package langchain-crw with the necessary dependencies.
  • Implement the CrwLoader class with scrape, crawl, and map modes.
  • Use the requests library to interact with the CRW API.

Example Code

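from collections.abc import Iterator

import requests
from langchain_core.document_loaders import BaseLoader
from langchain_core.documents import Document


class CrwLoader(BaseLoader):
    """Illustrative sketch only; the actual loader is in PR #36270."""

    def __init__(self, url: str, api_url: str = "http://localhost:3000", mode: str = "scrape"):
        if mode not in ("scrape", "crawl", "map"):
            raise ValueError(f"Unsupported mode: {mode!r}")
        self.url = url
        self.api_url = api_url.rstrip("/")
        self.mode = mode

    def lazy_load(self) -> Iterator[Document]:
        # BaseLoader.lazy_load() takes no arguments and yields Documents one at a time.
        # The endpoint paths, GET parameters, and response fields used here are
        # assumptions kept from the original sketch; check them against the CRW API.
        # Crawl mode is described as asynchronous with polling, which is omitted here.
        response = requests.get(
            f"{self.api_url}/{self.mode}", params={"url": self.url}, timeout=30
        )
        response.raise_for_status()
        payload = response.json()
        # Scrape is assumed to return one object; crawl and map a list of them.
        items = payload if isinstance(payload, list) else [payload]
        for item in items:
            yield self._to_document(item)

    def _to_document(self, item) -> Document:
        # Map mode may return bare URL strings rather than page objects (assumed).
        if isinstance(item, str):
            return Document(page_content=item, metadata={"source": item})
        return Document(
            page_content=item.get("markdown", ""),
            metadata=item.get("metadata", {}),
        )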

Verification

To verify the fix, run the unit tests and integration tests provided in the implementation. Additionally, test the CrwLoader class with different modes and URLs to ensure it works as expected.

Extra Tips

  • Make sure to handle errors and exceptions properly when interacting with the CRW API.
  • Consider adding support for authentication and rate limiting if necessary.
  • Follow the same pattern as other LangChain partner packages, such as langchain-exa and langchain-firecrawl.
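
The first two tips can be sketched as follows, using only the standard library. The helpers `build_headers` and `send_with_retry` are hypothetical names, the bearer-token scheme is an assumption based on Firecrawl compatibility, and the `(status, body)` tuple is a simplified stand-in for a real HTTP response object.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, str]  # (HTTP status code, body): simplified stand-in


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers, adding a bearer token only when a key is set.

    The "Authorization: Bearer ..." scheme is assumed, not confirmed for CRW.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `send` until it succeeds, backing off on 429 and 5xx responses."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if 200 <= status < 300:
            return body
        retryable = status == 429 or 500 <= status < 600
        if not retryable or attempt == max_attempts - 1:
            raise RuntimeError(f"request failed with HTTP {status}: {body}")
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Usage with canned responses instead of real network calls.
_responses = iter([(429, "slow down"), (503, "busy"), (200, '{"success": true}')])
delays = []
body = send_with_retry(lambda: next(_responses), sleep=delays.append)
print(body, delays)  # prints {"success": true} [0.5, 1.0]
```

Client errors other than 429 are raised immediately because retrying them would not change the outcome; only rate-limit and server-side failures are retried.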
