langchain - ✅(Solved) Fix Add Plasmate integration - SOM-based web browsing with 10x fewer tokens [1 pull requests, 1 comments, 2 participants]

GitHub stats
langchain-ai/langchain#36175 (fetched 2026-04-08 01:16:59)
Comments: 1
Participants: 2
Timeline: 5
Reactions: 0
Timeline (top): closed ×1, commented ×1, cross-referenced ×1, labeled ×1

Add a langchain-plasmate partner integration for Plasmate, a headless browser engine that outputs SOM (Semantic Object Model) instead of raw HTML.

Root Cause

Add a langchain-plasmate partner integration for Plasmate, a headless browser engine that outputs SOM (Semantic Object Model) instead of raw HTML.

Fix Action

Fixed

PR fix notes

PR #36169: partners: Add Plasmate integration - SOM-based web browsing with 10x fewer tokens

Description (problem / solution / changelog)

Resolves #36175

Summary

Adds langchain-plasmate partner integration for Plasmate, a headless browser engine that outputs SOM (Semantic Object Model) instead of raw HTML.

Security

All URLs are validated via validate_safe_url(url, allow_private=False, allow_http=True) before being passed to the plasmate subprocess. This blocks SSRF attacks targeting private networks and cloud metadata endpoints.
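
The kind of check described above can be sketched with the standard library alone. This is not the validate_safe_url implementation: the function name looks_safe_url, the hostname blocklist, and the choice to skip DNS resolution are all assumptions made for illustration. A production check would also resolve hostnames and validate the resulting addresses.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical blocklist of hostnames; not taken from the PR.
BLOCKED_HOSTS = {"localhost", "metadata.google.internal"}


def looks_safe_url(url: str, allow_private: bool = False, allow_http: bool = True) -> bool:
    """Return True if the URL passes basic, offline SSRF checks."""
    parts = urlsplit(url)
    allowed_schemes = {"https", "http"} if allow_http else {"https"}
    if parts.scheme not in allowed_schemes or not parts.hostname:
        return False
    if allow_private:
        return True
    host = parts.hostname.lower()
    if host in BLOCKED_HOSTS:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. A real validator would resolve DNS here.
        return True
    # Reject private ranges, loopback, and link-local (cloud metadata) addresses.
    return not (addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved)
```

With these assumptions, a public HTTPS URL passes, while http://169.254.169.254/ (a common cloud metadata address) and http://127.0.0.1/ are rejected before any subprocess is started.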

Why

Sending raw HTML to LLMs is expensive. SOM compiles web pages into structured JSON that preserves content and interactivity while discarding presentation markup.

Benchmarks across 49 real-world websites:

Metric               Result
Overall compression  16.6x (HTML tokens to SOM tokens)
Median compression   10.5x
Cost savings         94% at GPT-4, GPT-4o, and Claude pricing
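
The compression ratios and the savings percentage can be related with a short calculation. The sketch below assumes that input cost scales linearly with token count and that the 94% figure follows from the overall 16.6x ratio. The source does not state how the figure was derived, so treat this as a plausibility check rather than the benchmark's method.

```python
def savings_from_compression(ratio: float) -> float:
    """Fraction of input tokens saved when the input shrinks by `ratio` times."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / ratio


for label, ratio in (("overall", 16.6), ("median", 10.5)):
    saved = savings_from_compression(ratio)
    print(f"{label}: {ratio}x -> {saved:.1%} fewer input tokens")
# overall: 16.6x -> 94.0% fewer input tokens
# median: 10.5x -> 90.5% fewer input tokens
```

Because the saving depends only on the ratio, it would be the same percentage at any per-token price, which is consistent with one figure being quoted across several models.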

What's included

  • PlasmateFetchTool: Fetch web pages as structured SOM content
  • PlasmateNavigateTool: Navigate with interactive element details for agent workflows
  • PlasmateLoader: Document loader for RAG pipelines with compression metadata
  • SSRF validation on all URL inputs via validate_safe_url

Usage

from langchain_plasmate import PlasmateFetchTool

tool = PlasmateFetchTool()
result = tool.invoke({"url": "https://news.ycombinator.com"})

Dependencies

  • langchain-core>=1.0.0
  • som-parser>=0.3.0 (PyPI)
  • plasmate binary (install)

Changed files

  • libs/partners/plasmate/LICENSE (added, +21/-0)
  • libs/partners/plasmate/Makefile (added, +12/-0)
  • libs/partners/plasmate/README.md (added, +109/-0)
  • libs/partners/plasmate/langchain_plasmate/__init__.py (added, +13/-0)
  • libs/partners/plasmate/langchain_plasmate/_utilities.py (added, +116/-0)
  • libs/partners/plasmate/langchain_plasmate/document_loaders.py (added, +98/-0)
  • libs/partners/plasmate/langchain_plasmate/py.typed (added, +0/-0)
  • libs/partners/plasmate/langchain_plasmate/tools.py (added, +160/-0)
  • libs/partners/plasmate/pyproject.toml (added, +42/-0)
  • libs/partners/plasmate/tests/__init__.py (added, +1/-0)
  • libs/partners/plasmate/tests/integration_tests/__init__.py (added, +1/-0)
  • libs/partners/plasmate/tests/unit_tests/__init__.py (added, +1/-0)
  • libs/partners/plasmate/tests/unit_tests/test_tools.py (added, +38/-0)

Feature Request

Description

Add a langchain-plasmate partner integration for Plasmate, a headless browser engine that outputs SOM (Semantic Object Model) instead of raw HTML.

Motivation

Sending raw HTML to LLMs is expensive. SOM compiles web pages into structured JSON that preserves content and interactivity while discarding presentation markup.

Benchmarks across 49 real-world websites:

  • 16.6x overall token compression (HTML tokens to SOM tokens)
  • 10.5x median compression
  • 94% cost savings at GPT-4, GPT-4o, and Claude pricing

Proposed Solution

A partner package (langchain-plasmate) providing:

  • PlasmateFetchTool: Fetch web pages as structured SOM content
  • PlasmateNavigateTool: Navigate with interactive element details for agent workflows
  • PlasmateLoader: Document loader for RAG pipelines with compression metadata

Dependencies: langchain-core, som-parser (published on PyPI)
Requires: plasmate binary (cargo install plasmate)


Fix Plan

To integrate Plasmate with LangChain, we need to create a partner package langchain-plasmate with the proposed tools.

Step-by-Step Solution

  • Install required dependencies: langchain-core, som-parser, and plasmate binary
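  • Create a new Python package langchain-plasmate. The importable module directory must be named langchain_plasmate, since hyphens are not valid in Python import names. Use the following structure:
langchain_plasmate/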
├── __init__.py
├── plasmate_fetch_tool.py
├── plasmate_navigate_tool.py
├── plasmate_loader.py
└── requirements.txt
  • Implement PlasmateFetchTool to fetch web pages as structured SOM content:
# plasmate_fetch_tool.py
import subprocess
import som_parser

class PlasmateFetchTool:
    def fetch(self, url):
        # Use plasmate binary to fetch SOM content
        som_content = subprocess.check_output(["plasmate", url])
        # Parse SOM content using som-parser
        som_data = som_parser.parse(som_content)
        return som_data
  • Implement PlasmateNavigateTool to navigate with interactive element details:
# plasmate_navigate_tool.py
import subprocess

import som_parser


class PlasmateNavigateTool:
    def navigate(self, url, action):
        # Use plasmate binary to navigate and get interactive element details
        interactive_elements = subprocess.check_output(["plasmate", url, action])
        # Parse interactive element details
        interactive_elements_data = som_parser.parse(interactive_elements)
        return interactive_elements_data
  • Implement PlasmateLoader to load documents with compression metadata:
# plasmate_loader.py
from .plasmate_fetch_tool import PlasmateFetchTool


class PlasmateLoader:
    def load(self, url):
        # Use PlasmateFetchTool to fetch SOM content
        som_data = PlasmateFetchTool().fetch(url)
        # Extract compression metadata (assumes the parsed SOM is dict-like)
        compression_metadata = som_data.get("compression_metadata")
        return som_data, compression_metadata
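
The subprocess calls in the steps above have no timeout or error handling. A slightly more defensive wrapper might look like the sketch below. It keeps the plan's argument form (["plasmate", url]), which has not been checked against the real CLI, and it decodes the output as JSON only because the PR describes SOM as structured JSON. The name run_plasmate and the injectable runner parameter are hypothetical.

```python
import json
import subprocess
from typing import Any, Callable


def run_plasmate(
    url: str,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    timeout: float = 30.0,
) -> Any:
    """Run the plasmate binary for one URL and decode its stdout as JSON."""
    # Assumption: same argument form as the plan above, not verified against the CLI.
    argv = ["plasmate", url]
    try:
        # A list argv with no shell means the URL is never interpreted by a shell.
        proc = runner(argv, capture_output=True, text=True, timeout=timeout, check=True)
    except FileNotFoundError as exc:
        raise RuntimeError("plasmate binary not found on PATH") from exc
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"plasmate timed out after {timeout}s") from exc
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(f"plasmate exited with status {exc.returncode}") from exc
    return json.loads(proc.stdout)
```

Passing the runner as a parameter lets unit tests substitute a fake without the binary installed.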

Verification

To verify the fix, test the langchain-plasmate package by fetching a web page as structured SOM content and navigating with interactive element details.
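
One way to run such a check without the plasmate binary or network access is to mock the subprocess call. The sketch below is illustrative only: fetch_raw is a hypothetical stand-in for the plan's fetch step, and the command form is the one assumed in the plan, not a verified CLI invocation.

```python
import subprocess
from unittest import mock


def fetch_raw(url: str) -> bytes:
    """Stand-in for the plan's fetch step: return plasmate's raw output."""
    return subprocess.check_output(["plasmate", url])


def test_fetch_raw_invokes_plasmate() -> None:
    fake_output = b'{"title": "Example"}'
    # Replace subprocess.check_output so no real process is started.
    with mock.patch("subprocess.check_output", return_value=fake_output) as fake:
        assert fetch_raw("https://example.com") == fake_output
    # The binary should have been invoked exactly once with the URL.
    fake.assert_called_once_with(["plasmate", "https://example.com"])
```

A test runner such as pytest would collect test_fetch_raw_invokes_plasmate automatically. An end-to-end check against a real page would still need the binary installed.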

Extra Tips

  • Make sure to install the plasmate binary and required dependencies before using the langchain-plasmate package.
  • Refer to the Plasmate GitHub and SOM Spec v1.0 for more information on Plasmate and SOM.
