langchain - ✅(Solved) Fix Add CRW partner integration (CrwLoader document loader) [1 pull requests, 4 comments, 3 participants]

us · 2026-03-26T13:00:28Z




GitHub stats

langchain-ai/langchain#36273 (fetched 2026-04-08 01:36:19)

Participants: ccurme, majorelalexis-stack

Timeline (top): commented ×4, mentioned ×2, subscribed ×2, labeled ×1

PR fix notes

PR #36270: Add CrwLoader partner integration

Repository: langchain-ai/langchain
Author: us
State: closed | merged: False
Link: https://github.com/langchain-ai/langchain/pull/36270

Description (problem / solution / changelog)

Summary

Adds langchain-crw, a partner integration for CRW (https://github.com/crw-rs/crw), a high-performance, Firecrawl-compatible web scraper written in Rust
Provides CrwLoader document loader with scrape, crawl, and map modes
Works with self-hosted crw-server or the fastcrw.com cloud API

Test plan

make test in libs/partners/crw
Verify import test passes
Integration test with running crw-server instance

Changed files

libs/partners/crw/.gitignore (added, +1/-0)
libs/partners/crw/LICENSE (added, +21/-0)
libs/partners/crw/Makefile (added, +69/-0)
libs/partners/crw/README.md (added, +39/-0)
libs/partners/crw/langchain_crw/__init__.py (added, +7/-0)
libs/partners/crw/langchain_crw/document_loaders.py (added, +241/-0)
libs/partners/crw/langchain_crw/py.typed (added, +0/-0)
libs/partners/crw/pyproject.toml (added, +140/-0)
libs/partners/crw/scripts/check_imports.py (added, +19/-0)
libs/partners/crw/scripts/lint_imports.sh (added, +18/-0)
libs/partners/crw/tests/__init__.py (added, +0/-0)
libs/partners/crw/tests/integration_tests/__init__.py (added, +0/-0)
libs/partners/crw/tests/integration_tests/test_compile.py (added, +8/-0)
libs/partners/crw/tests/unit_tests/__init__.py (added, +0/-0)
libs/partners/crw/tests/unit_tests/test_imports.py (added, +12/-0)


Feature Request

Add a partner integration for CRW — a high-performance, Firecrawl-compatible web scraper written in Rust.

Motivation

CRW provides a Firecrawl-compatible REST API for web scraping, crawling, and site mapping, optimized for AI agent workflows. It ships as a single binary that uses about 6 MB of RAM when idle; the author reports that, in a benchmark on 1K real-world URLs, it is 5.5x faster than Firecrawl and uses 75x less memory.

A LangChain partner integration (langchain-crw) would give users a CrwLoader document loader with:

Scrape mode: Single page to LangChain Document
Crawl mode: BFS crawl with async polling, multiple pages to Documents
Map mode: URL discovery via sitemap + link extraction
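Crawl mode's "async polling" means the client starts a crawl job and then repeatedly checks its status until the job finishes. The sketch below shows that loop in a transport-agnostic way: poll_crawl_job is a hypothetical helper, and the status values ("scraping", "completed", "failed") and the "data" field are assumptions modeled on Firecrawl-style crawl responses, not confirmed details of CRW's API.

```python
import time
from typing import Any, Callable, Dict, List


def poll_crawl_job(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, Any]]:
    """Poll a crawl job until it completes and return its page records.

    fetch_status is any callable returning the job's status payload, for
    example a wrapper around a GET on the crawl-status endpoint (the exact
    path is an assumption; check the CRW API docs).
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        state = status.get("status")  # assumed field name
        if state == "completed":
            return status.get("data", [])
        if state == "failed":
            raise RuntimeError(f"crawl job failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError("crawl job did not finish before the timeout")
        sleep(interval)  # wait before asking again


# Usage with a fake status sequence (no server needed):
responses = iter([
    {"status": "scraping"},
    {"status": "completed", "data": [{"markdown": "# Home"}, {"markdown": "# About"}]},
])
pages = poll_crawl_job(lambda: next(responses), sleep=lambda seconds: None)
print([p["markdown"] for p in pages])  # ['# Home', '# About']
```

Injecting sleep and fetch_status keeps the loop testable without a running crw-server; the timeout guards against a job that never reaches a terminal state.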

Why Partner Integration (not Community)

CRW maintains its own API and release cycle. A partner package (langchain-crw) follows the same pattern as langchain-exa, langchain-firecrawl, etc.

Implementation

I have a working implementation ready: https://github.com/langchain-ai/langchain/pull/36270

Key details:

Uses requests directly (no SDK dependency)
Supports self-hosted (localhost:3000) and cloud (fastcrw.com)
Follows BaseLoader interface with lazy_load()
Includes unit tests, integration test scaffolding, and full package structure
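In LangChain, BaseLoader.lazy_load() is a generator that yields Documents one at a time, and the default BaseLoader.load() simply returns list(self.lazy_load()). The stdlib-only sketch below mimics that contract so it runs without langchain_core or a CRW server. Doc, SketchLoader, and fake_fetch are hypothetical stand-ins. The point it illustrates: with a generator, each page is fetched only when the caller consumes the next item.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List


@dataclass
class Doc:
    """Stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class SketchLoader:
    """Mimics the BaseLoader contract: lazy_load() yields, load() collects."""

    def __init__(self, urls: List[str], fetch: Callable[[str], str]) -> None:
        self.urls = urls
        self.fetch = fetch  # e.g. a wrapper around a scrape request

    def lazy_load(self) -> Iterator[Doc]:
        for url in self.urls:
            # Each page is fetched only when the caller asks for the next item.
            yield Doc(page_content=self.fetch(url), metadata={"source": url})

    def load(self) -> List[Doc]:
        return list(self.lazy_load())


calls: List[str] = []


def fake_fetch(url: str) -> str:
    calls.append(url)  # record every fetch so laziness is observable
    return f"content of {url}"


loader = SketchLoader(["https://a.example", "https://b.example", "https://c.example"], fake_fetch)
first = next(iter(loader.lazy_load()))
print(first.page_content)  # content of https://a.example
print(len(calls))          # 1
```

For large crawls this matters: a caller can stream Documents into a splitter or vector store without holding every page in memory at once.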

References

GitHub: https://github.com/us/crw
API docs: https://us.github.io/crw/rest-api
Cloud: https://fastcrw.com

extent analysis

Fix Plan

To integrate CRW with LangChain, we will create a partner package langchain-crw with a CrwLoader document loader. Here are the steps:

Create a new Python package langchain-crw with the necessary dependencies.
Implement the CrwLoader class with scrape, crawl, and map modes.
Use the requests library to interact with the CRW API.

Example Code

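import time
from typing import Iterator, Literal, Optional

import requests
from langchain_core.document_loaders import BaseLoader
from langchain_core.documents import Document


class CrwLoader(BaseLoader):
    # Sketch only: the /v1/* paths, Bearer auth, and response fields assume CRW
    # mirrors Firecrawl's v1 REST API; check https://us.github.io/crw/rest-api.
    def __init__(self, url: str, api_url: str = "http://localhost:3000",
                 mode: Literal["scrape", "crawl", "map"] = "scrape",
                 api_key: Optional[str] = None, poll_interval: float = 2.0):
        self.url, self.mode, self.poll_interval = url, mode, poll_interval
        self.api_url = api_url.rstrip("/")
        self.headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}

    def _request(self, method: str, path: str, **kwargs) -> dict:
        resp = requests.request(method, f"{self.api_url}{path}", headers=self.headers, timeout=60, **kwargs)
        resp.raise_for_status()  # surface HTTP errors instead of parsing error bodies
        return resp.json()

    def lazy_load(self) -> Iterator[Document]:
        # BaseLoader.lazy_load() takes no arguments; the target URL is set in __init__.
        if self.mode == "scrape":
            # Scrape mode: one page -> one Document
            pages = [self._request("POST", "/v1/scrape", json={"url": self.url})["data"]]
        elif self.mode == "crawl":
            # Crawl mode: start an async job, then poll its status until it completes
            job_id = self._request("POST", "/v1/crawl", json={"url": self.url})["id"]
            while (status := self._request("GET", f"/v1/crawl/{job_id}")).get("status") != "completed":
                if status.get("status") == "failed":
                    raise RuntimeError(f"CRW crawl job {job_id} failed")
                time.sleep(self.poll_interval)
            pages = status.get("data", [])  # result pagination, if any, is not handled here
        else:
            # Map mode: discovered URLs only, one Document per link
            for link in self._request("POST", "/v1/map", json={"url": self.url}).get("links", []):
                yield Document(page_content=link, metadata={"source": link})
            return
        for page in pages:
            yield Document(page_content=page.get("markdown", ""), metadata=page.get("metadata", {}))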

Verification

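To verify the fix, run make test in libs/partners/crw (this includes the import test), then run the integration test against a running crw-server instance. Additionally, exercise CrwLoader in each mode (scrape, crawl, and map) against several URLs to confirm it returns the expected Documents.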
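Unit tests that run without a live crw-server are easier to write if the HTTP call is injected rather than hard-coded. The sketch below shows that pattern with the standard unittest module. scrape_to_records is a hypothetical helper, and the "/v1/scrape" path and the "success"/"data"/"markdown"/"metadata" fields are assumptions modeled on Firecrawl's v1 scrape response, not confirmed CRW behavior.

```python
import unittest
from typing import Any, Callable, Dict, List, Tuple

Payload = Dict[str, Any]


def scrape_to_records(post: Callable[[str, Payload], Payload], url: str) -> List[Tuple[str, Payload]]:
    """Hypothetical scrape-mode helper: POST the URL and map the response
    to (content, metadata) pairs. post is injected so tests need no server."""
    body = post("/v1/scrape", {"url": url})
    if not body.get("success", False):
        raise RuntimeError(f"scrape failed: {body}")
    data = body["data"]
    return [(data.get("markdown", ""), data.get("metadata", {}))]


class ScrapeModeTest(unittest.TestCase):
    def test_maps_markdown_and_metadata(self) -> None:
        sent: List[Tuple[str, Payload]] = []

        def fake_post(path: str, json_body: Payload) -> Payload:
            sent.append((path, json_body))  # capture the outgoing request
            return {"success": True, "data": {"markdown": "# Hi", "metadata": {"title": "Hi"}}}

        records = scrape_to_records(fake_post, "https://example.com")
        self.assertEqual(records, [("# Hi", {"title": "Hi"})])
        self.assertEqual(sent, [("/v1/scrape", {"url": "https://example.com"})])

    def test_raises_on_failure(self) -> None:
        with self.assertRaises(RuntimeError):
            scrape_to_records(lambda path, body: {"success": False}, "https://example.com")
```

Saved as a test module, this runs with python -m unittest; the integration test against a real crw-server then only needs to confirm that the assumed response shape matches.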

Extra Tips

Make sure to handle errors and exceptions properly when interacting with the CRW API.
Consider adding support for authentication and rate limiting if necessary.
Follow the same pattern as other LangChain partner packages, such as langchain-exa and langchain-firecrawl.
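For the error-handling and rate-limiting tips, one common approach is to retry transient failures with exponential backoff and a little random jitter, while honoring a server-provided delay when there is one. This is a generic sketch, not part of the PR. RetryableError and with_backoff are hypothetical names, and the defaults (5 attempts, 0.5 s base delay, 30 s cap) are illustrative.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Transient failure, e.g. HTTP 429 or 5xx, optionally with a Retry-After delay."""

    def __init__(self, message: str, retry_after: Optional[float] = None) -> None:
        super().__init__(message)
        self.retry_after = retry_after


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call, retrying RetryableError with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 1
    while True:
        try:
            return call()
        except RetryableError as err:
            if attempt >= max_attempts:
                raise  # give up and surface the last error
            if err.retry_after is not None:
                delay = err.retry_after  # the server said how long to wait
            else:
                delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% random jitter so many clients don't retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
            attempt += 1
```

In a requests-based loader, the wrapped call would check response.status_code and raise RetryableError for 429 or 5xx responses, passing the Retry-After header as seconds when it holds a number. That header can also carry an HTTP date, which this sketch does not parse.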
