litellm - ✅ (Solved) Fix [Bug]: Pass-through endpoint forwards stale Content-Length after gzip auto-decompression [1 pull request, 1 comment, 2 participants]

BerriAI/litellm#25624 (fetched 2026-04-14 05:38:38)


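Fix Action

Fix / Workaround

Add "content-length" to excluded_headers in get_response_headers() (PR #25655, below). The issue was reported against v1.82.0-stable.patch5 (ghcr.io/berriai/litellm-database:main-v1.82.0-stable.patch5).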

PR fix notes

PR #25655: fix: strip stale Content-Length in pass-through after gzip decompression

Description (problem / solution / changelog)

Summary

Fixes #25624

When a pass-through endpoint proxies a gzip-compressed upstream response, httpx auto-decompresses the body but response.headers["content-length"] still reflects the compressed size. get_response_headers() strips content-encoding but keeps the now-stale content-length, causing a mismatch between the header and the actual body size.

The fix: add "content-length" to excluded_headers in get_response_headers(). When no Content-Length is passed in, FastAPI's Response(content=bytes) calculates it from the actual body.

This is safe for all cases:

  • Gzip responses: Fixes the mismatch
  • Non-gzip responses: FastAPI calculates the same value as upstream
  • Streaming responses: StreamingResponse does not use Content-Length
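The stale value wins because FastAPI's Response is Starlette's Response, which only fills in Content-Length from the body when the caller has not already supplied one. The function below is a simplified model of that rule, assumed to match Starlette's `Response.init_headers` in recent versions; `build_headers` is a hypothetical name, not a Starlette or LiteLLM API.

```python
from __future__ import annotations


def build_headers(body: bytes, headers: dict[str, str] | None, status_code: int = 200) -> dict[str, str]:
    """Simplified model of how a Starlette Response derives its headers.

    A caller-supplied Content-Length is kept as-is; otherwise one is computed
    from the body, except for status codes that carry no body (1xx, 204, 304).
    """
    result = {k.lower(): v for k, v in (headers or {}).items()}
    if "content-length" not in result and not (status_code < 200 or status_code in (204, 304)):
        result["content-length"] = str(len(body))
    return result


body = b'{"id":"cgt-20260413154202-bd945"}'

# Before the fix: the compressed size forwarded from upstream is kept verbatim.
stale = build_headers(body, {"content-length": "58"})

# After the fix: content-length was excluded, so it is recomputed from the body.
fresh = build_headers(body, {})
```

For a non-gzip upstream response the recomputed value equals what upstream sent, which is why that case is unaffected.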

Changes

  • litellm/proxy/pass_through_endpoints/pass_through_endpoints.py: Added "content-length" to excluded_headers set

Test plan

  • Verify pass-through with gzip-compressed upstream (e.g. BytePlus Ark API) returns correct Content-Length
  • Verify non-gzip pass-through responses are unaffected
  • Verify streaming pass-through responses are unaffected

Changed files

  • litellm/proxy/pass_through_endpoints/pass_through_endpoints.py (modified, +1/-1)


What happened?

When a pass-through endpoint proxies a request to an upstream API that returns a gzip-compressed response, the client receives a Content-Length header that does not match the actual response body size, causing connection errors.

This is because get_response_headers() in pass_through_endpoints.py strips content-encoding but does not strip content-length. Since httpx auto-decompresses the gzip response, the forwarded content-length reflects the compressed size while the actual body sent to the client is the decompressed (smaller) size.

Root Cause

In pass_through_endpoints.py line 321:

excluded_headers = {"transfer-encoding", "content-encoding"}

The flow:

  1. LiteLLM's internal httpx client requests the upstream with an Accept-Encoding header that includes gzip (httpx sends gzip, deflate by default)
  2. Upstream returns gzip-compressed body (e.g. 58 bytes) with Content-Length: 58 and Content-Encoding: gzip
  3. httpx auto-decompresses the body to 34 bytes, but response.headers["content-length"] still reads 58
  4. get_response_headers() strips content-encoding: gzip but keeps content-length: 58
  5. FastAPI (Starlette) computes Content-Length from the body only when none is supplied, so it sends the 34-byte body with the forwarded Content-Length: 58 header to the client
  6. Client reads 34 bytes, expects 24 more, connection closes → error
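The flow above can be reproduced offline with only the standard library. This is a simplified model, not LiteLLM's code: `filter_response_headers` and `simulate_pass_through` are hypothetical stand-ins for get_response_headers() and the proxy path, and `gzip.decompress` plays the role of httpx's automatic decoding. The compressed size depends on the compressor, so it will not be exactly 58 bytes here.

```python
from __future__ import annotations

import gzip

# Pre-fix exclusion set, as quoted from pass_through_endpoints.py.
OLD_EXCLUDED = {"transfer-encoding", "content-encoding"}


def filter_response_headers(headers: dict[str, str], excluded: set[str]) -> dict[str, str]:
    """Hypothetical helper: drop any header whose lowercase name is in `excluded`."""
    return {k: v for k, v in headers.items() if k.lower() not in excluded}


def simulate_pass_through(body: bytes, excluded: set[str]) -> tuple[dict[str, str], bytes]:
    # Steps 1-2: upstream gzip-compresses the body and advertises the compressed size.
    compressed = gzip.compress(body)
    upstream_headers = {
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
    }
    # Step 3: the client decompresses the body, but the headers are left untouched.
    decoded = gzip.decompress(compressed)
    # Step 4: only the excluded headers are removed before forwarding.
    return filter_response_headers(upstream_headers, excluded), decoded


headers, sent_body = simulate_pass_through(b'{"id":"cgt-20260413154202-bd945"}', OLD_EXCLUDED)
advertised = int(headers["Content-Length"])
print(f"Content-Length: {advertised}, actual body: {len(sent_body)} bytes")
print(f"client would wait for {advertised - len(sent_body)} more bytes")
```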

Steps to Reproduce

  1. Configure a pass-through endpoint to any upstream that returns gzip-compressed responses (e.g. BytePlus Ark API):
general_settings:
  pass_through_endpoints:
    - path: /pass/bytepluses
      target: https://ark.ap-southeast.bytepluses.com/api/v3
      headers:
        Authorization: Bearer <API_KEY>
        content-type: application/json
      include_subpath: true
  2. Make a request through the proxy:
curl --http1.1 -v -X POST http://localhost:4000/pass/bytepluses/contents/generations/tasks \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <LITELLM_KEY>" \
  -d '{"model":"dreamina-seedance-2-0-260128","content":[{"type":"text","text":"a cat walking"}],"ratio":"16:9","duration":5,"watermark":false}'
  3. Observe the response headers show mismatched Content-Length:
< Content-Length: 58        ← compressed size from upstream
{ [34 bytes data]           ← actual decompressed body
* transfer closed with 24 bytes remaining to read

Error Messages

  • HTTP/1.1: curl: (18) transfer closed with 24 bytes remaining to read
  • HTTP/2: curl: (92) HTTP/2 stream 1 was not closed cleanly: INTERNAL_ERROR (err 2)
  • Python httpx: httpcore.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
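All three client errors share one cause: the response advertises more bytes than it delivers. Python's standard `http.client` reports this as `http.client.IncompleteRead`. The sketch below feeds a canned HTTP/1.1 response through an in-memory fake socket (a testing trick, not how LiteLLM, curl, or httpx work) so the failure can be seen without a running proxy; `FakeSocket` is a hypothetical helper.

```python
import http.client
import io

BODY = b'{"id":"cgt-20260413154202-bd945"}'
ADVERTISED = len(BODY) + 24  # stale Content-Length: 24 bytes more than are sent


class FakeSocket:
    """Minimal socket stand-in: HTTPResponse only calls makefile() on it."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def makefile(self, mode: str, *args, **kwargs) -> io.BytesIO:
        return io.BytesIO(self._data)


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/json\r\n"
    + f"Content-Length: {ADVERTISED}\r\n".encode("ascii")
    + b"\r\n"
    + BODY
)

response = http.client.HTTPResponse(FakeSocket(raw))
response.begin()  # parse the status line and headers

received = missing = None
try:
    response.read()
except http.client.IncompleteRead as err:
    # err.partial holds the bytes that did arrive; err.expected is the shortfall.
    received, missing = len(err.partial), err.expected
    print(f"read {received} bytes, {missing} bytes missing")
```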

Proposed Fix

Add "content-length" to excluded_headers:

excluded_headers = {"transfer-encoding", "content-encoding", "content-length"}

FastAPI's Response(content=bytes) automatically calculates the correct Content-Length from the actual body. This is safe for all cases:

  • Gzip responses: Fixes the mismatch (compressed size → correct decompressed size)
  • Non-gzip responses: FastAPI calculates the same value as upstream, no behavior change
  • Streaming responses: StreamingResponse does not use Content-Length
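A minimal sketch of the proposed change applied to the two non-streaming cases in the list above. `forward` is hypothetical: it combines the new exclusion set with the Content-Length recomputation that the fix relies on FastAPI to perform, and is not LiteLLM's actual code path.

```python
from __future__ import annotations

import gzip

# Proposed exclusion set from the fix.
EXCLUDED = {"transfer-encoding", "content-encoding", "content-length"}


def forward(upstream_headers: dict[str, str], decoded_body: bytes) -> dict[str, str]:
    """Hypothetical model of the non-streaming response path after the fix."""
    headers = {k.lower(): v for k, v in upstream_headers.items() if k.lower() not in EXCLUDED}
    # The upstream Content-Length is never forwarded, so derive it from the bytes actually sent.
    headers["content-length"] = str(len(decoded_body))
    return headers


body = b'{"id":"cgt-20260413154202-bd945"}'

# Gzip upstream: advertised length is the compressed size, body arrives decompressed.
compressed = gzip.compress(body)
gzip_out = forward(
    {"Content-Encoding": "gzip", "Content-Length": str(len(compressed))},
    gzip.decompress(compressed),
)

# Plain upstream: advertised length already equals the body length.
plain_out = forward({"Content-Length": str(len(body))}, body)
```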

Related Issues

This is the response-direction counterpart of #10010, which reported the same class of Content-Length mismatch bug in the request direction for Bedrock pass-through.

Relevant log output

Content-Length: 58
actual body: {"id":"cgt-20260413154202-bd945"} (34 bytes)
mismatch: 58 - 34 = 24 bytes (matches curl error exactly)

What LiteLLM version are you on?

v1.82.0-stable.patch5 (ghcr.io/berriai/litellm-database:main-v1.82.0-stable.patch5)

Are you a ML Ops Team?

No

extent analysis

TL;DR

To fix the issue, add "content-length" to the excluded_headers set in pass_through_endpoints.py to prevent forwarding the incorrect Content-Length header from the upstream API.

Guidance

  • The root cause is the mismatch between the Content-Length header and the actual response body size due to gzip decompression.
  • To verify the fix, confirm that the Content-Length header in the proxied response equals the number of body bytes the client receives.
  • Removing the upstream Content-Length header lets FastAPI compute the correct value from the body it actually sends.
  • Test the fix with both gzip-compressed and non-gzip responses to ensure correct behavior in all cases.

Example

excluded_headers = {"transfer-encoding", "content-encoding", "content-length"}

This change will prevent the incorrect Content-Length header from being forwarded to the client.

Notes

This fix does not depend on the upstream Content-Length being valid: for non-streaming responses the header is no longer forwarded and is recomputed from the body LiteLLM actually sends, so a missing or incorrect upstream value is corrected as well.

Recommendation

Apply the proposed fix by adding "content-length" to excluded_headers, as it is a safe and effective solution that fixes the mismatch issue for gzip responses and does not affect non-gzip responses.
