langchain - ✅ (Solved) Security: path traversal in FileChatMessageHistory docstring example via unsanitized session_id [1 pull request, 1 comment, 2 participants]

GitHub stats
langchain-ai/langchain#36887 (fetched 2026-04-20 11:58:57)
Comments: 1
Participants: 2
Timeline: 4
Reactions: 0
Timeline (top): closed ×1, commented ×1, cross-referenced ×1, labeled ×1

Root Cause

The BaseChatMessageHistory docstring in libs/core/langchain_core/chat_history.py contains a FileChatMessageHistory example that developers copy into real applications. That example constructs file paths with:

file_path = os.path.join(self.storage_path, self.session_id)

session_id is user-controlled in typical web deployments — it arrives via RunnableWithMessageHistory's configurable dict, which is populated from request parameters.

Fix Action

Fixed

PR fix notes

PR #36888: fix(core): guard FileChatMessageHistory example against path traversal

Description (problem / solution / changelog)

Fixes #36887

Summary

The BaseChatMessageHistory docstring example uses os.path.join(storage_path, session_id) directly. session_id is user-controlled in typical web deployments (via RunnableWithMessageHistory's configurable dict). Two bypass vectors exist:

  1. Relative traversal: session_id="../../etc/shadow" resolves outside storage_path
  2. Absolute path: on POSIX, os.path.join("/safe/dir", "/etc/shadow") returns "/etc/shadow", discarding the base entirely

Fix: Added _safe_path() helper using os.path.realpath() to assert the resolved path stays within storage_path before any file I/O. All three operations (messages, add_messages, clear) go through this helper.

def _safe_path(self) -> str:
    base = os.path.realpath(self.storage_path)
    resolved = os.path.realpath(os.path.join(self.storage_path, self.session_id))
    if not resolved.startswith(base + os.sep) and resolved != base:
        raise ValueError(
            f"Invalid session_id '{self.session_id}': "
            "path resolves outside storage_path."
        )
    return resolved

Test plan

  • session_id = "safe_session" → works normally
  • session_id = "../../etc/passwd" → raises ValueError
  • session_id = "/etc/passwd" → raises ValueError (absolute path bypass)
  • session_id = "subdir/session" → works if subdir is within storage_path
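
The test plan above can be exercised with a small standalone sketch. Here `safe_path` is a free-function version of the PR's `_safe_path`; the function name, the `is_rejected` helper, and the temporary directory are assumptions made for illustration and are not part of the PR.

```python
import os
import tempfile


def safe_path(storage_path: str, session_id: str) -> str:
    """Free-function variant of the PR's _safe_path (illustrative only)."""
    base = os.path.realpath(storage_path)
    resolved = os.path.realpath(os.path.join(storage_path, session_id))
    if not resolved.startswith(base + os.sep) and resolved != base:
        raise ValueError(
            f"Invalid session_id '{session_id}': path resolves outside storage_path."
        )
    return resolved


def is_rejected(storage_path: str, session_id: str) -> bool:
    """Hypothetical helper: True when safe_path refuses the session_id."""
    try:
        safe_path(storage_path, session_id)
    except ValueError:
        return True
    return False


# The four cases from the test plan, run against a throwaway directory.
with tempfile.TemporaryDirectory() as storage:
    results = {
        sid: is_rejected(storage, sid)
        for sid in ("safe_session", "../../etc/passwd", "/etc/passwd", "subdir/session")
    }

print(results)
```

The two traversal inputs should be rejected and the two in-tree inputs accepted; no files need to exist because `os.path.realpath` resolves non-existent paths as well.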

Note: langchain-community's FileChatMessageHistory uses a different API (file_path directly) and is not affected.

Some code in this commit was written with assistance from Claude Sonnet 4.6 (AI).

🤖 Generated with Claude Code

Changed files

  • libs/core/langchain_core/chat_history.py (modified, +19/-7)

Description

The BaseChatMessageHistory docstring in libs/core/langchain_core/chat_history.py contains a FileChatMessageHistory example that developers copy into real applications. That example constructs file paths with:

file_path = os.path.join(self.storage_path, self.session_id)

session_id is user-controlled in typical web deployments — it arrives via RunnableWithMessageHistory's configurable dict, which is populated from request parameters.

Attack vectors

1. Relative traversal

session_id = "../../etc/shadow"
os.path.join("/safe/storage", "../../etc/shadow")
# → "/safe/storage/../../etc/shadow" → resolves to "/etc/shadow"

2. Absolute path bypass (POSIX)

session_id = "/etc/passwd"
os.path.join("/safe/storage", "/etc/passwd")
# → "/etc/passwd"  (os.path.join discards the base for absolute second arg)

Both allow reading or writing arbitrary files as the process user.
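
Both vectors can be reproduced without touching the filesystem. The sketch below uses `posixpath` so that POSIX rules apply on any host; `STORAGE` and `naive_path` are hypothetical names, and `normpath` is only a lexical stand-in for what the OS resolves when no symlinks are involved.

```python
import posixpath  # POSIX path semantics regardless of the host OS

STORAGE = "/safe/storage"  # hypothetical storage_path


def naive_path(session_id: str) -> str:
    """Join as the docstring example does, then normalise lexically."""
    return posixpath.normpath(posixpath.join(STORAGE, session_id))


relative = naive_path("../../etc/shadow")  # climbs out through ".."
absolute = naive_path("/etc/passwd")       # join discards STORAGE entirely
benign = naive_path("user-123")            # stays inside STORAGE

print(relative)  # /etc/shadow
print(absolute)  # /etc/passwd
print(benign)    # /safe/storage/user-123
```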

Affected code

  • libs/core/langchain_core/chat_history.py — docstring example (lines ~57–89)
  • Possibly also langchain-community's FileChatMessageHistory production implementation (the fix PR later notes that it takes file_path directly and is not affected)

Proposed fix

Add a _safe_path() helper that validates the resolved path stays within storage_path:

def _safe_path(self) -> str:
    base = os.path.realpath(self.storage_path)
    resolved = os.path.realpath(os.path.join(self.storage_path, self.session_id))
    if not resolved.startswith(base + os.sep) and resolved != base:
        raise ValueError(
            f"Invalid session_id '{self.session_id}': "
            "path resolves outside storage_path."
        )
    return resolved

I have a ready PR with this fix — happy to reopen it once this issue is acknowledged.
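
As a rough illustration of how such a helper might be wired in so that every read, write, and clear passes through it, consider the sketch below. `FileHistorySketch` and its `read`/`add`/`clear` methods are hypothetical and store plain strings as JSON; this is not the LangChain class or its API, only the same shape under simplified assumptions.

```python
import json
import os
import tempfile


class FileHistorySketch:
    """Hypothetical stand-in for the docstring example (not the LangChain API)."""

    def __init__(self, storage_path: str, session_id: str) -> None:
        self.storage_path = storage_path
        self.session_id = session_id

    def _safe_path(self) -> str:
        # Same containment check as the proposed fix.
        base = os.path.realpath(self.storage_path)
        resolved = os.path.realpath(os.path.join(self.storage_path, self.session_id))
        if not resolved.startswith(base + os.sep) and resolved != base:
            raise ValueError(
                f"Invalid session_id '{self.session_id}': "
                "path resolves outside storage_path."
            )
        return resolved

    def read(self) -> list:
        path = self._safe_path()  # validated before any file I/O
        if not os.path.isfile(path):
            return []
        with open(path, encoding="utf-8") as f:
            return json.load(f)

    def add(self, new_messages: list) -> None:
        path = self._safe_path()
        existing = self.read()
        with open(path, "w", encoding="utf-8") as f:
            json.dump(existing + new_messages, f)

    def clear(self) -> None:
        with open(self._safe_path(), "w", encoding="utf-8") as f:
            json.dump([], f)


with tempfile.TemporaryDirectory() as storage:
    history = FileHistorySketch(storage, "safe_session")
    history.add(["hello"])
    history.add(["world"])
    stored = history.read()
    history.clear()
    after_clear = history.read()
    try:
        FileHistorySketch(storage, "../../etc/passwd").add(["x"])
        traversal_blocked = False
    except ValueError:
        traversal_blocked = True
```

Because the path is computed in one place, a later method that forgets to validate is less likely than when each method joins the path itself.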

Extent analysis

TL;DR

To prevent arbitrary file access vulnerabilities, validate that the resolved file path stays within the intended storage directory using a _safe_path() helper function.

Guidance

  • Implement the proposed _safe_path() function to ensure the session_id does not lead to paths outside the storage_path.
  • Use os.path.realpath() to resolve both the base storage_path and the joined path with session_id to handle symlinks and relative paths correctly.
  • Raise a ValueError if the resolved path is not within the storage_path to prevent potential security breaches.
  • Check the FileChatMessageHistory implementation in langchain-community as well; according to the PR notes it takes file_path directly and is not affected by this issue.
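
One detail of the check that is easy to lose when adapting it is the `os.sep` appended to the base before `startswith`. The sketch below shows why a bare prefix test is not enough, and includes an `os.path.commonpath`-style comparison as an alternative formulation. The paths are made up, `posixpath` is used so the result is the same on any host, and the `commonpath` variant is an assumption for illustration rather than what the PR does.

```python
import posixpath  # POSIX path semantics regardless of the host OS

base = "/safe/storage"                  # hypothetical storage_path
sibling = "/safe/storage_evil/secret"   # outside base, but shares its prefix
inside = "/safe/storage/session-1"      # genuinely inside base

# A bare prefix test wrongly accepts the sibling directory.
naive_accepts_sibling = sibling.startswith(base)

# Appending the separator, as the proposed helper does, rejects it.
strict_accepts_sibling = sibling.startswith(base + "/")
strict_accepts_inside = inside.startswith(base + "/")

# Alternative: compare the longest common path with the base.
common_accepts_sibling = posixpath.commonpath([base, sibling]) == base
common_accepts_inside = posixpath.commonpath([base, inside]) == base
```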

Notes

This fix assumes that storage_path itself is a trusted, application-controlled directory; the check only guarantees that the resolved path stays inside it. Ensure that storage_path is not derived from user input and is not writable by untrusted users.

Recommendation

Apply the proposed fix: implement _safe_path() and route every file operation through it, since it directly addresses the path traversal described above.
