litellm - 💡(How to fix) Fix [Bug]: When an agent with tools turns on either redis-semanticor qdrant-semantic cache mode and carries tool return JSON requests to the gateway, the tool return content will be lost.

litellm2026-05-25 12:22:18

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

Root Cause

Observed Behavior After tool invocation, the agent began repeating requests. Logs confirmed that tool responses were correctly returned, so the issue was not an infinite loop. Root Cause Analysis The most probable cause is improper injection of tool results into the model context: the results lacked the required role=tooland corresponding tool_call_id. Langfuse Tracing Multiple ToolNodes were observed, each showing normal tool input and output. However, once execution returned to the parent Chat Model node, the tool outputs disappeared. Only the tool request and assistant tool_call_idremained. Hypothesis When tool invocation is combined with semantic similarity caching, tool response payloads are lost. The agent, seeing only the tool request, assumes the tool has not been executed and repeatedly issues new calls, resulting in persistent errors.

RAW_BUFFERClick to expand / collapse

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Steps to Reproduce

create a simple agent with a tool
set up litellm on your machine with the following config: model_list:

model_name: tokenhub-deepseek-v4-pro litellm_params: model: openai/tokenhub-deepseek-v4-pro api_base: os.environ/UPSTREAM_OPENAI_BASE_URL api_key: os.environ/UPSTREAM_OPENAI_API_KEY timeout: 600 temperature: 0.6 top_p: 0.95
model_name: venus-embedding litellm_params: model: openai/bge-large-zh api_base: os.environ/VENUS_PROXY_EMBEDDING_BASE_URL api_key: os.environ/VENUS_PROXY_API_TOKEN timeout: 60

litellm_settings: set_verbose: true cache: true cache_params: type: redis-semantic # mode: default_off ttl: 600 similarity_threshold: 0.5 # similarity threshold for semantic cache redis_semantic_cache_embedding_model: openai/bge-large-zh

general_settings: master_key: os.environ/LITELLM_MASTER_KEY database_url: os.environ/DATABASE_URL disable_master_key_return: true store_model_in_db: true store_prompts_in_spend_logs: true 3. ask the agent to call the tool and observe.

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.82.6

Twitter / LinkedIn details

No response

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering