litellm - 💡(How to fix) Fix [Bug]: langfuse_otel /v1/rerank requests emit duplicate traces even when responses/embeddings stay single-trace [1 comments, 1 participants]

StartupBros · 2026-04-14T22:41:09Z

## Summary

In a self-hosted LiteLLM + Langfuse setup, `POST /v1/rerank` requests produce duplicate raw Langfuse trace rows for a single logical request, while equivalent `responses` and `embeddings` requests remain single-trace.

Verified on both:

- production LiteLLM `1.83.7`
- separately staged LiteLLM candidate `1.83.7` on the same host after restoring config parity with production

This points to a rerank-path-specific tracing problem rather than stale-version debt or local config drift.

## Environment

- LiteLLM: `1.83.7`
- Langfuse: self-hosted `:3` channel images
- callback mode: `langfuse_otel`
- rerank provider shape: LiteLLM `cohere/*` rerank path backed by a local OpenAI-compatible server endpoint
- observed route: `/v1/rerank`

## Ruled out

### Not just an old LiteLLM build

Compared:

- prod `1.83.4`
- candidate `1.83.7`

Rerank duplicate behavior appeared on both.

### Not stale staged-candidate contract drift

A later bakeoff surfaced a separate staging problem where the candidate config had drifted behind prod and was missing `assistant-private` / `assistant-deep`. That caused false chat-lane failures, but after syncing candidate config from prod and re-running the bakeoff, rerank duplication still remained.

### Not a generic trace duplication problem on every route

In the same stack and with the same trace-discipline tooling:

- `responses` can remain single-trace for a logical request
- `embeddings` can remain single-trace for a logical request
- rerank is the route that repeatedly inflates

## Minimal repro

We use a small repro that sends repeated rerank requests with:

- unique request-level `trace_name`
- shared batch/session identity
- explicit metadata for route, capability, lane, and model

Example:

```bash
source ~/.secrets
ai-gateway-tunnel up
python examples/private-api/python-rerank-trace-repro.py
```

This emits rerank requests with unique trace names like:

- `rerank-duplicate-repro-<batch>-req01`
- `rerank-duplicate-repro-<batch>-req02`
- `rerank-duplicate-repro-<batch>-req03`

## Actual behavior

For each logical rerank request:

- one request-level trace name is emitted
- one request-level session/batch id is emitted
- two raw Langfuse trace rows appear for that single logical request

Concrete bundle counts from this stack:

- production bundle: `6` raw rows for `3` logical requests (`3` extra duplicate rows)
- candidate bundle: `6` raw rows for `3` logical requests (`3` extra duplicate rows)

Representative normalized helper output shows duplicate burden concentrated on:

- `/v1/rerank | local-rerank`

## Expected behavior

A single logical rerank request should produce one raw Langfuse trace row/group, comparable to the behavior seen on responses and embeddings.

## Why this matters

This makes rerank observability materially noisier than other routes and forces downstream dedupe logic in operator tooling. It also makes route-level behavior harder to compare cleanly in Langfuse without custom normalization.

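The downstream dedupe burden described above can be illustrated with a minimal sketch. It assumes each raw trace row has already been fetched as a dict with a `name` field holding the request-level `trace_name`. That row shape and the helper name `summarize_duplicates` are hypothetical; they are not taken from the issue's tooling or from the Langfuse API.

```python
from collections import Counter


def summarize_duplicates(rows):
    """Count raw trace rows per trace name and report the surplus.

    `rows` is assumed to be a list of dicts with a "name" key
    (a hypothetical shape, not the actual Langfuse schema).
    """
    counts = Counter(row["name"] for row in rows)
    logical = len(counts)        # distinct trace names = logical requests
    raw = sum(counts.values())   # every fetched row, duplicates included
    return {
        "logical_requests": logical,
        "raw_rows": raw,
        "extra_rows": raw - logical,
        "duplicated_names": sorted(n for n, c in counts.items() if c > 1),
    }


# Synthetic rows mirroring the reported bundle: 3 logical requests, 2 rows each.
rows = [
    {"name": f"rerank-duplicate-repro-demo-req{i:02d}"}
    for i in (1, 2, 3)
    for _ in range(2)
]
summary = summarize_duplicates(rows)
print(summary["raw_rows"], summary["logical_requests"], summary["extra_rows"])
```

With the synthetic rows above, the summary reports 6 raw rows, 3 logical requests, and 3 extra rows, matching the counts quoted for both bundles.
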
## Local evidence bundle

We built a small evidence collector that writes a bundle containing:

- raw repro output
- raw Langfuse trace fetch
- filtered matching traces for the repro batch
- normalized duplicate summary

Collector command:

```bash
source ~/.secrets
ai-gateway-tunnel up
bash examples/private-api/collect-rerank-duplicate-evidence.sh prod
```

Equivalent candidate run:

```bash
source ~/.secrets
AI_GATEWAY_BASE_URL=http://127.0.0.1:14110/v1 bash examples/private-api/collect-rerank-duplicate-evidence.sh candidate
```

Current captured examples:

- `.artifacts/rerank-duplicate-evidence/prod-rerank-duplicate-evidence-prod-20260414T222150.287024Z/`
- `.artifacts/rerank-duplicate-evidence/candidate-rerank-duplicate-evidence-candidate-20260414T222201.383579Z/`

## Repo-local references

- `examples/private-api/python-rerank-trace-repro.py`
- `examples/private-api/collect-rerank-duplicate-evidence.sh`
- `examples/private-api/run-runtime-bakeoff.sh`
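
The repro script listed above is not included in the issue. As a rough illustration of the request shape it describes (unique `trace_name` per request, shared batch/session identity, explicit route/capability/lane/model metadata), here is a minimal sketch that only builds the request bodies. The function name, the metadata key names other than `trace_name`, the `session_id` format, and the sample query and documents are all assumptions made for illustration; the actual script may differ.

```python
import json
import uuid


def build_rerank_requests(batch_id, count=3, model="local-rerank"):
    """Build rerank request bodies with unique trace names and a shared session.

    All metadata keys below are illustrative assumptions, not the
    contents of the issue author's repro script.
    """
    bodies = []
    for i in range(1, count + 1):
        bodies.append({
            "model": model,
            "query": "which document mentions tracing?",
            "documents": ["tracing notes", "unrelated text"],
            "metadata": {
                # Unique per request, so each logical request is identifiable.
                "trace_name": f"rerank-duplicate-repro-{batch_id}-req{i:02d}",
                # Shared across the batch.
                "session_id": f"rerank-duplicate-repro-{batch_id}",
                "route": "/v1/rerank",
                "capability": "rerank",
                "lane": "local-rerank",
                "model": model,
            },
        })
    return bodies


requests_to_send = build_rerank_requests(uuid.uuid4().hex[:8])
print(json.dumps(requests_to_send[0]["metadata"], indent=2))
```

Each body would then be sent as a `POST` to the gateway's `/v1/rerank` route; the sketch stops short of sending so that it runs without the private tunnel or credentials.
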

BerriAI/litellm#25720 • Fetched 2026-04-16 06:36:58

extent analysis

TL;DR

No confirmed fix is identified. The most promising next step is to investigate how the langfuse_otel callback mode handles the LiteLLM cohere/* rerank path, since the duplicate Langfuse trace rows appear only on the /v1/rerank route.

Guidance

Review the tracing configuration for the langfuse_otel callback mode to ensure it is not causing duplicate traces for the /v1/rerank route.
Investigate the LiteLLM cohere/* rerank path implementation to see if it is generating multiple traces for a single logical request.
Use the provided python-rerank-trace-repro.py script to reproduce the issue and gather more information about the tracing behavior.
Analyze the output of the collect-rerank-duplicate-evidence.sh script to understand the pattern of duplicate traces and identify potential causes.
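
One way to approach the last step is to compare the two raw rows recorded for the same logical request and list the fields that differ, since the differing fields may hint at which code path emitted each row. This is a minimal sketch assuming rows are plain dicts; the field names in the example are invented for illustration and are not Langfuse's actual schema.

```python
def diff_rows(first, second):
    """Return {field: (first_value, second_value)} for fields that differ.

    A field missing from one row is reported with None on that side.
    """
    keys = sorted(set(first) | set(second))
    return {
        key: (first.get(key), second.get(key))
        for key in keys
        if first.get(key) != second.get(key)
    }


# Two hypothetical rows sharing a trace name but differing elsewhere.
row_a = {"name": "req01", "session": "batch-1", "source": "emitter-a"}
row_b = {"name": "req01", "session": "batch-1", "source": "emitter-b", "extra": 1}
differences = diff_rows(row_a, row_b)
print(differences)
```

Fields that are identical across both rows are omitted, so the output stays focused on what distinguishes the duplicate from the original.
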

Example

The analysis provides no fix snippet, because the issue concerns configuration and tracing setup rather than a specific code defect.

Notes

The issue seems to be specific to the /v1/rerank route and the langfuse_otel callback mode, so any changes or adjustments should be targeted at this specific configuration. It's also important to review the tracing setup for the LiteLLM cohere/* rerank path to ensure it is not contributing to the duplicate traces.

Recommendation

Until the root cause is found, treat this as a workaround problem: adjust the tracing setup for the langfuse_otel callback mode and the LiteLLM cohere/* rerank path so that each logical /v1/rerank request yields a single trace. The analysis does not identify a specific setting that achieves this.
