langchain - ✅(Solved) Fix Feature: Work Ledger integration — regression testing & diff for LangChain runs [1 pull requests, 1 comments, 2 participants]

GitHub stats

langchain-ai/langchain#35725 · Fetched 2026-04-08 00:24:55

  • Comments: 1
  • Participants: 2
  • Timeline: 4
  • Reactions: 0
  • Timeline (top): labeled ×2, commented ×1, cross-referenced ×1

Fix Action

Fixed

PR fix notes

PR #3044: docs: add Work Ledger callback handler integration

Description (problem / solution / changelog)

Summary

Adds an integration page for Work Ledger under Python > Integrations > Callbacks.

Work Ledger is an open-source tool (MIT) for recording, diffing, and regression-testing LangChain runs. It ships a WorkLedgerCallbackHandler that inherits from BaseCallbackHandler and captures LLM calls, tool invocations, retriever queries, and chain I/O with token metrics and causal links.

Related feature request: https://github.com/langchain-ai/langchain/issues/35725

Changes

  • Added src/oss/python/integrations/callbacks/work_ledger.mdx

Notes

  • docs.json navigation update may be needed — happy to add it if you point me to the right section
  • The handler has been tested with real OpenAI API calls through langchain-openai
  • 276 tests passing, no heavy dependencies

Changed files

  • src/oss/python/integrations/callbacks/work_ledger.mdx (added, +97/-0)


Checked other resources

  • This is a feature request, not a bug report or usage question.
  • I added a clear and descriptive title that summarizes the feature request.
  • I used the GitHub search to find a similar feature request and didn't find it.
  • I checked the LangChain documentation and API reference to see if this feature already exists.
  • This is not related to the langchain-community package.

Package (Required)

  • langchain-core

Feature Description

Work Ledger is an open-source library (MIT) for recording, replaying, and comparing LLM agent runs. It already ships a WorkLedgerCallbackHandler that inherits from langchain_core.callbacks.BaseCallbackHandler and captures:

  • LLM/chat model calls with token usage
  • Tool invocations with inputs/outputs
  • Retriever queries with documents
  • Chain-level inputs/outputs
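  • Causal links between steps

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from work_ledger import WorkLedger, WorkLedgerCallbackHandler

# Any LCEL runnable works here; this prompt | model chain and the model name are placeholders.
chain = ChatPromptTemplate.from_template("{question}") | ChatOpenAI(model="gpt-4o-mini")

ledger = WorkLedger(store="./runs")
handler = WorkLedgerCallbackHandler(ledger, run_name="my-chain")

# Attach the handler through the run config so each step of the run is recorded.
chain.invoke({"question": "hi"}, config={"callbacks": [handler]})
run = handler.get_run()  # structured Run with steps, metrics, causal links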

After recording runs, you can diff them:

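from work_ledger.testing.diff import RunDiff

# run_v1 and run_v2 are Run objects from two recorded executions,
# each obtained with handler.get_run() as shown earlier.
diff = RunDiff(run_v1, run_v2)
print(f"Similarity: {diff.similarity:.0%}")
print(f"Token delta: {diff.token_diff:+d}")
print(f"Steps added: {diff.steps_added}")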
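
To make those three fields concrete, here is a minimal, self-contained sketch of what a run diff could compute. It does not use Work Ledger and is not its implementation: the Step and Run dataclasses, the diff_runs helper, and the use of difflib.SequenceMatcher for the similarity score are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for one recorded step (LLM call, tool call, ...)."""
    kind: str   # e.g. "llm", "tool", "retriever"
    name: str
    tokens: int = 0


@dataclass
class Run:
    """Hypothetical stand-in for a recorded run."""
    steps: list[Step] = field(default_factory=list)

    @property
    def total_tokens(self) -> int:
        return sum(step.tokens for step in self.steps)


def diff_runs(old: Run, new: Run) -> dict:
    """Compare two runs by their sequence of (kind, name) step signatures."""
    old_sig = [(s.kind, s.name) for s in old.steps]
    new_sig = [(s.kind, s.name) for s in new.steps]
    matcher = SequenceMatcher(a=old_sig, b=new_sig, autojunk=False)
    added, removed = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(new_sig[j1:j2])
        if tag in ("delete", "replace"):
            removed.extend(old_sig[i1:i2])
    return {
        "similarity": matcher.ratio(),
        "token_diff": new.total_tokens - old.total_tokens,
        "steps_added": added,
        "steps_removed": removed,
    }


# Two illustrative runs: the second one gains an extra tool call.
run_v1 = Run([Step("llm", "plan", 120), Step("tool", "search"), Step("llm", "answer", 200)])
run_v2 = Run([Step("llm", "plan", 130), Step("tool", "search"),
              Step("tool", "calculator"), Step("llm", "answer", 240)])

result = diff_runs(run_v1, run_v2)
print(f"Similarity: {result['similarity']:.0%}")
print(f"Token delta: {result['token_diff']:+d}")
print(f"Steps added: {result['steps_added']}")
```

Comparing step signatures rather than raw outputs keeps the score stable when the model rewords an answer but follows the same path. A real tool may well weigh outputs too.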

Use Case

When developing LangChain applications, prompt changes, model swaps, or tool modifications can silently alter behavior. Currently there's no standard way to:

  1. Record a chain/agent execution as a structured artifact
  2. Compare two runs to see exactly what changed (steps, outputs, tokens, cost)
  3. Set up golden-file regression tests for CI

Work Ledger fills this gap. It's not an observability platform — it's a testing/debugging tool that complements LangSmith.

Typical workflow:

  • Record a "known good" run
  • Make changes (prompt, model, tools)
  • Record the new run
  • RunDiff shows exactly what changed
  • CLI: work-ledger diff <run1> <run2>
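
The workflow above maps naturally onto a golden-file test. The sketch below is framework-agnostic and does not call Work Ledger: it assumes a run can be reduced to a plain dict, and the steps and total_tokens keys, the check_against_golden helper, and the 10% token budget are all hypothetical. It shows only the record-then-compare pattern.

```python
import json
import tempfile
from pathlib import Path


def check_against_golden(run: dict, golden_path: Path,
                         max_token_growth: float = 0.10) -> list[str]:
    """Return a list of regressions; write the golden file on first use."""
    if not golden_path.exists():
        golden_path.write_text(json.dumps(run, indent=2, sort_keys=True))
        return []

    golden = json.loads(golden_path.read_text())
    problems = []
    if run["steps"] != golden["steps"]:
        problems.append(f"step sequence changed: {golden['steps']} -> {run['steps']}")
    budget = golden["total_tokens"] * (1 + max_token_growth)
    if run["total_tokens"] > budget:
        problems.append(
            f"token usage grew from {golden['total_tokens']} to {run['total_tokens']}"
        )
    return problems


with tempfile.TemporaryDirectory() as tmp:
    golden_file = Path(tmp) / "my-chain.golden.json"

    # 1. Record a "known good" run (this call writes the baseline).
    known_good = {"steps": ["llm:plan", "tool:search", "llm:answer"], "total_tokens": 320}
    first = check_against_golden(known_good, golden_file)

    # 2. After a prompt or model change, compare the new run with the baseline.
    after_change = {"steps": ["llm:plan", "llm:answer"], "total_tokens": 400}
    second = check_against_golden(after_change, golden_file)

    for problem in second:
        print("REGRESSION:", problem)
```

In CI, the returned list would typically feed a plain assertion such as `assert not problems`, so any behavioral drift fails the build.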

Proposed Solution

I'd like to propose adding Work Ledger to LangChain's community integrations documentation, so users can discover it as a testing tool.

The integration is already working — the WorkLedgerCallbackHandler properly inherits from BaseCallbackHandler, passes isinstance checks, and works with LCEL chains via RunnableConfig.

Tested with real OpenAI API calls through langchain-openai:

  • Simple LLM calls ✓
  • LCEL chains (prompt | llm | parser) ✓
  • Tool-calling LLMs ✓
  • Cross-run diff ✓

Alternatives Considered

  • LangSmith: Great for observability/monitoring but focuses on tracing, not structured regression testing with diffs
  • Manual scripts: Every team builds their own — no standard approach
  • agent-vcr / agentgraph: Similar tools but single-framework; Work Ledger supports LangChain, LangGraph, PydanticAI, CrewAI, LlamaIndex, OpenAI SDK, Anthropic SDK

Additional Context

  • Repository: https://github.com/metawake/work-ledger
  • License: MIT
  • 276 tests passing
  • Also integrates with LangGraph via wrap_graph()
  • No heavy dependencies — langchain-core is optional (handler falls back to plain class)
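
The optional-dependency behavior in the last bullet can be sketched as follows. This is not Work Ledger's source: RecordingHandler and its events list are hypothetical. Only the langchain_core.callbacks.BaseCallbackHandler import path and the on_chain_start, on_chain_end, and on_tool_start hook names come from langchain-core. The handler subclasses the real base class when it is installed and a plain stand-in otherwise.

```python
import uuid

try:
    # Real base class when langchain-core is installed.
    from langchain_core.callbacks import BaseCallbackHandler
    HAS_LANGCHAIN = True
except ImportError:
    # Plain stand-in so the handler can still be imported and driven directly.
    class BaseCallbackHandler:  # type: ignore[no-redef]
        pass
    HAS_LANGCHAIN = False


class RecordingHandler(BaseCallbackHandler):
    """Hypothetical handler that records callback events with parent links."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[dict] = []

    def _record(self, kind, run_id, parent_run_id, payload):
        self.events.append(
            {"kind": kind, "run_id": run_id, "parent": parent_run_id, "payload": payload}
        )

    def on_chain_start(self, serialized, inputs, *, run_id, parent_run_id=None, **kwargs):
        self._record("chain_start", run_id, parent_run_id, inputs)

    def on_chain_end(self, outputs, *, run_id, parent_run_id=None, **kwargs):
        self._record("chain_end", run_id, parent_run_id, outputs)

    def on_tool_start(self, serialized, input_str, *, run_id, parent_run_id=None, **kwargs):
        self._record("tool_start", run_id, parent_run_id, input_str)


# Drive the hooks by hand, the way a callback manager would during a run.
handler = RecordingHandler()
chain_id, tool_id = uuid.uuid4(), uuid.uuid4()
handler.on_chain_start({}, {"question": "hi"}, run_id=chain_id)
handler.on_tool_start({"name": "search"}, "hi", run_id=tool_id, parent_run_id=chain_id)
handler.on_chain_end({"answer": "hello"}, run_id=chain_id)

print("langchain-core available:", HAS_LANGCHAIN)
print([event["kind"] for event in handler.events])
```

The parent_run_id passed to each hook is what allows a recorder to rebuild parent-child links between steps. With langchain-core installed, the same class also satisfies isinstance checks against BaseCallbackHandler.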

extent analysis

Solution Plan

Add Work Ledger to LangChain's Community Integrations Documentation

To integrate Work Ledger with LangChain, follow these steps:

  1. Update LangChain Documentation:

    • Create a new section in the community integrations documentation for Work Ledger.
    • Include a brief description, installation instructions, and example usage.
  2. Add Work Ledger to LangChain's CI/CD Pipeline:

    • Integrate Work Ledger with LangChain's CI/CD pipeline to ensure seamless testing and validation.
  3. Update LangChain's API Reference:

    • Add Work Ledger to the list of supported callback handlers in the API reference.
  4. Example Usage:

from work_ledger import WorkLedger, WorkLedgerCallbackHandler

# Create a Work Ledger instance
ledger = WorkLedger(store="./runs")

# Create a Work Ledger callback handler
handler = WorkLedgerCallbackHandler(ledger, run_name="my-chain")

# Use the callback handler in your LangChain chain (chain is any runnable built elsewhere)
chain.invoke({"question": "hi"}, config={"callbacks": [handler]})

  5. Verify the Integration:

    • Test the integration with various LangChain chains and callback handlers.
    • Ensure that the Work Ledger callback handler properly captures and records chain execution data.

Verification

To verify that the integration is working correctly:

  1. Run a LangChain Chain with Work Ledger:

    • Create a LangChain chain with the Work Ledger callback handler.
    • Run the chain and verify that the Work Ledger instance is properly capturing and recording chain execution data.
  2. Compare Runs with RunDiff:

    • Record two runs with different inputs or configurations.
    • Use the RunDiff class to compare the two runs and verify that the differences are correctly reported.

Extra Tips

  • Make sure to update the Work Ledger documentation to reflect the new integration with LangChain.
  • Consider adding
