litellm: [Bug]: langfuse_otel /v1/rerank requests emit duplicate traces even when responses/embeddings stay single-trace

GitHub stats

BerriAI/litellm#25720 (fetched 2026-04-16 06:36:58)
Comments: 1 · Participants: 1 · Timeline events: 2 · Reactions: 0
Timeline (top): commented ×1, labeled ×1


Summary

In a self-hosted LiteLLM + Langfuse setup, POST /v1/rerank requests produce duplicate raw Langfuse trace rows for a single logical request, while equivalent responses and embeddings requests remain single-trace.

Verified on both:

  • production LiteLLM 1.83.7
  • separately staged LiteLLM candidate 1.83.7 on the same host after restoring config parity with production

This points to a rerank-path-specific tracing problem rather than stale-version debt or local config drift.

Environment

  • LiteLLM: 1.83.7
  • Langfuse: self-hosted :3 channel images
  • callback mode: langfuse_otel
  • rerank provider shape: LiteLLM cohere/* rerank path backed by a local OpenAI-compatible server endpoint
  • observed route: /v1/rerank

Ruled out

Not just an old LiteLLM build

Compared:

  • prod 1.83.4
  • candidate 1.83.7

Rerank duplicate behavior appeared on both.

Not stale staged-candidate contract drift

A later bakeoff surfaced a separate staging problem where the candidate config had drifted behind prod and was missing assistant-private / assistant-deep. That caused false chat-lane failures, but after syncing candidate config from prod and re-running the bakeoff, rerank duplication still remained.

Not a generic trace duplication problem on every route

In the same stack and with the same trace-discipline tooling:

  • responses can remain single-trace for a logical request
  • embeddings can remain single-trace for a logical request
  • rerank is the route that repeatedly inflates

Minimal repro

We use a small repro that sends repeated rerank requests with:

  • unique request-level trace_name
  • shared batch/session identity
  • explicit metadata for route, capability, lane, and model

Example:

source ~/.secrets
ai-gateway-tunnel up
python examples/private-api/python-rerank-trace-repro.py

This emits rerank requests with unique trace names like:

  • rerank-duplicate-repro-<batch>-req01
  • rerank-duplicate-repro-<batch>-req02
  • rerank-duplicate-repro-<batch>-req03
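
The request shape described above can be sketched as follows. The repro script itself is not shown in the issue, so the function name `build_rerank_payloads`, the metadata keys (`trace_name`, `session_id`, `route`, `capability`, `lane`, `model`), and the sample query and documents are assumptions for illustration. Only the trace-name pattern and the `local-rerank` label come from the report.

```python
from uuid import uuid4


def build_rerank_payloads(batch: str, count: int = 3, model: str = "local-rerank") -> list[dict]:
    """Build one rerank request body per logical request (hypothetical shape)."""
    payloads = []
    for i in range(1, count + 1):
        payloads.append({
            "model": model,
            # Placeholder query and documents; the real repro inputs are not shown.
            "query": "which document mentions tracing?",
            "documents": ["doc about tracing", "doc about billing"],
            "metadata": {
                # Unique per request, following the naming pattern in the report.
                "trace_name": f"rerank-duplicate-repro-{batch}-req{i:02d}",
                # Shared by every request in the batch (assumed key name).
                "session_id": f"rerank-duplicate-repro-{batch}",
                "route": "/v1/rerank",
                "capability": "rerank",
                "lane": model,
                "model": model,
            },
        })
    return payloads


if __name__ == "__main__":
    batch = uuid4().hex[:8]
    for body in build_rerank_payloads(batch):
        print(body["metadata"]["trace_name"])
```

Because each request carries its own trace name, any trace name that later appears on more than one raw row can be attributed to duplication rather than to the client sending the request twice.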

Actual behavior

For each logical rerank request:

  • one request-level trace name is emitted
  • one request-level session/batch id is emitted
  • two raw Langfuse trace rows appear for that single logical request

Concrete bundle counts from this stack:

  • production bundle: 6 raw rows for 3 logical requests (3 extra duplicate rows)
  • candidate bundle: 6 raw rows for 3 logical requests (3 extra duplicate rows)
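
The arithmetic behind these counts can be sketched as below. This is an illustration rather than the collector's actual logic: it assumes each fetched trace row is a dict with a `name` field holding the request-level trace name, and the sample rows are synthetic, shaped only to mirror the 6-rows-for-3-requests pattern reported here.

```python
from collections import Counter


def duplicate_summary(rows: list[dict]) -> dict:
    """Compare raw trace rows against distinct request-level trace names."""
    names = Counter(row["name"] for row in rows)
    raw_rows = sum(names.values())
    logical_requests = len(names)
    return {
        "raw_rows": raw_rows,
        "logical_requests": logical_requests,
        "extra_rows": raw_rows - logical_requests,
    }


if __name__ == "__main__":
    # Synthetic rows: two rows per trace name, three trace names.
    rows = [
        {"id": f"t{i}", "name": f"rerank-duplicate-repro-demo-req{i // 2 + 1:02d}"}
        for i in range(6)
    ]
    print(duplicate_summary(rows))
```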

Representative normalized helper output shows duplicate burden concentrated on:

  • /v1/rerank | local-rerank
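
One way such a per-route view could be produced is sketched below. The helper's real implementation is not shown in the issue; the `metadata.route` and `metadata.lane` keys, the function name, and the `local-embed` label in the demo are hypothetical.

```python
from collections import defaultdict


def extra_rows_by_route(rows: list[dict]) -> dict:
    """Count extra (duplicate) rows per 'route | lane' key."""
    raw_count = defaultdict(int)
    names = defaultdict(set)
    for row in rows:
        meta = row.get("metadata") or {}
        key = f"{meta.get('route', '?')} | {meta.get('lane', '?')}"
        raw_count[key] += 1
        names[key].add(row["name"])
    # Extra rows = raw rows minus distinct request-level trace names.
    return {key: raw_count[key] - len(names[key]) for key in raw_count}


if __name__ == "__main__":
    rows = [
        {"name": "req01", "metadata": {"route": "/v1/rerank", "lane": "local-rerank"}},
        {"name": "req01", "metadata": {"route": "/v1/rerank", "lane": "local-rerank"}},
        {"name": "emb01", "metadata": {"route": "/v1/embeddings", "lane": "local-embed"}},
    ]
    for key, extra in extra_rows_by_route(rows).items():
        print(key, extra)
```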

Expected behavior

A single logical rerank request should produce one raw Langfuse trace row/group, comparable to the behavior seen on responses and embeddings.

Why this matters

This makes rerank observability materially noisier than other routes and forces downstream dedupe logic in operator tooling. It also makes route-level behavior harder to compare cleanly in Langfuse without custom normalization.
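
For illustration, the kind of downstream dedupe this forces might look like the sketch below. It assumes each row has a `name` and an ISO-8601 `timestamp` in a uniform format, and it keeps the earliest row per trace name. The issue does not say which of the two rows is the canonical one, so that choice is an assumption.

```python
def dedupe_by_trace_name(rows: list[dict]) -> list[dict]:
    """Keep only the earliest row for each request-level trace name."""
    seen = set()
    kept = []
    # Uniformly formatted ISO-8601 timestamps sort correctly as strings.
    for row in sorted(rows, key=lambda r: r.get("timestamp", "")):
        if row["name"] in seen:
            continue
        seen.add(row["name"])
        kept.append(row)
    return kept


if __name__ == "__main__":
    rows = [
        {"id": "b", "name": "req01", "timestamp": "2026-04-14T22:21:51Z"},
        {"id": "a", "name": "req01", "timestamp": "2026-04-14T22:21:50Z"},
        {"id": "c", "name": "req02", "timestamp": "2026-04-14T22:21:52Z"},
    ]
    print([row["id"] for row in dedupe_by_trace_name(rows)])
```

Having to carry logic like this in operator tooling for one route only is the cost the report describes.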

Local evidence bundle

We built a small evidence collector that writes a bundle containing:

  • raw repro output
  • raw Langfuse trace fetch
  • filtered matching traces for the repro batch
  • normalized duplicate summary
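
A bundle with these four parts could be written along the following lines. This is a sketch only: the collector is a shell script whose contents are not shown, so the function name, the file names, and the `name` field on trace rows are all hypothetical.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def write_bundle(out_dir: str, repro_output: str, raw_traces: list[dict], batch: str) -> dict:
    """Write the four bundle parts to out_dir and return the duplicate summary."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Keep only traces whose name follows the repro naming pattern for this batch.
    prefix = f"rerank-duplicate-repro-{batch}-"
    matching = [t for t in raw_traces if str(t.get("name", "")).startswith(prefix)]

    names = Counter(t["name"] for t in matching)
    summary = {
        "raw_rows": len(matching),
        "logical_requests": len(names),
        "extra_rows": len(matching) - len(names),
    }

    # Hypothetical file names, one per bundle part.
    (out / "repro-output.txt").write_text(repro_output, encoding="utf-8")
    (out / "raw-traces.json").write_text(json.dumps(raw_traces, indent=2), encoding="utf-8")
    (out / "matching-traces.json").write_text(json.dumps(matching, indent=2), encoding="utf-8")
    (out / "duplicate-summary.json").write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return summary


if __name__ == "__main__":
    traces = [
        {"name": "rerank-duplicate-repro-demo-req01"},
        {"name": "rerank-duplicate-repro-demo-req01"},
        {"name": "unrelated-trace"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        print(write_bundle(tmp, "repro ok", traces, "demo"))
```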

Collector command:

source ~/.secrets
ai-gateway-tunnel up
bash examples/private-api/collect-rerank-duplicate-evidence.sh prod

Equivalent candidate run:

source ~/.secrets
AI_GATEWAY_BASE_URL=http://127.0.0.1:14110/v1 bash examples/private-api/collect-rerank-duplicate-evidence.sh candidate

Current captured examples:

  • .artifacts/rerank-duplicate-evidence/prod-rerank-duplicate-evidence-prod-20260414T222150.287024Z/
  • .artifacts/rerank-duplicate-evidence/candidate-rerank-duplicate-evidence-candidate-20260414T222201.383579Z/

Repo-local references

  • examples/private-api/python-rerank-trace-repro.py
  • examples/private-api/collect-rerank-duplicate-evidence.sh
  • examples/private-api/run-runtime-bakeoff.sh

extent analysis

TL;DR

No confirmed fix is identified yet. Because the duplicate Langfuse trace rows are confined to the /v1/rerank route, the most likely place to look is the interaction between the langfuse_otel callback mode and the LiteLLM cohere/* rerank path.

Guidance

  • Review the tracing configuration for the langfuse_otel callback mode to ensure it is not causing duplicate traces for the /v1/rerank route.
  • Investigate the LiteLLM cohere/* rerank path implementation to see if it is generating multiple traces for a single logical request.
  • Use the provided python-rerank-trace-repro.py script to reproduce the issue and gather more information about the tracing behavior.
  • Analyze the output of the collect-rerank-duplicate-evidence.sh script to understand the pattern of duplicate traces and identify potential causes.

Example

No code snippet is provided as the issue is more related to configuration and tracing setup rather than a specific code problem.

Notes

The issue seems to be specific to the /v1/rerank route and the langfuse_otel callback mode, so any changes or adjustments should be targeted at this specific configuration. It's also important to review the tracing setup for the LiteLLM cohere/* rerank path to ensure it is not contributing to the duplicate traces.

Recommendation

Until the rerank path is fixed, target any workaround at the langfuse_otel callback mode and the LiteLLM cohere/* rerank path, with the goal of emitting a single trace for each logical /v1/rerank request. The issue does not identify a specific setting that achieves this, so downstream deduplication remains the fallback.
