litellm - ✅(Solved) Fix [Bug]: Pass-through endpoint forwards stale Content-Length after gzip auto-decompression [1 pull requests, 1 comments, 2 participants]

darin-srp · 2026-04-13T09:53:20Z



GitHub stats

BerriAI/litellm#25624 • Fetched 2026-04-14 05:38:38

Author: darin-srp

Participants: darin-srp, vishal-vanam

Timeline (top): commented ×1, cross-referenced ×1, labeled ×1, referenced ×1


Fix Action

Fix / Workaround

No released fix is listed: PR #25655 is still open and unmerged. The issue was reported against v1.82.0-stable.patch5 (ghcr.io/berriai/litellm-database:main-v1.82.0-stable.patch5).

PR fix notes

PR #25655: fix: strip stale Content-Length in pass-through after gzip decompression

Repository: BerriAI/litellm
Author: vishal-vanam
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/25655

Description (problem / solution / changelog)

Summary

Fixes #25624

When a pass-through endpoint proxies a gzip-compressed upstream response, httpx auto-decompresses the body but response.headers["content-length"] still reflects the compressed size. get_response_headers() strips content-encoding but keeps the now-stale content-length, causing a mismatch between the header and actual body size.

The fix: Add "content-length" to excluded_headers in get_response_headers(). FastAPI's Response(content=bytes) auto-calculates the correct value from the actual body.

This is safe for all cases:

- Gzip responses: Fixes the mismatch
- Non-gzip responses: FastAPI calculates the same value as upstream
- Streaming responses: StreamingResponse does not use Content-Length

Changes

litellm/proxy/pass_through_endpoints/pass_through_endpoints.py: Added "content-length" to excluded_headers set

Test plan

- [ ] Verify pass-through with gzip-compressed upstream (e.g. BytePlus Ark API) returns correct Content-Length
- [ ] Verify non-gzip pass-through responses are unaffected
- [ ] Verify streaming pass-through responses are unaffected
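The first two items of the test plan can be approximated offline. The following is a minimal sketch, not LiteLLM's code: `simulate_pass_through` is a hypothetical stand-in that decodes a gzip body the way httpx would, drops the excluded headers, and recomputes Content-Length from the decoded body, which is what the PR relies on FastAPI to do.

```python
import gzip

# Same set the PR proposes; everything else here is an illustrative assumption.
EXCLUDED = {"transfer-encoding", "content-encoding", "content-length"}


def simulate_pass_through(upstream_headers, upstream_body):
    """Hypothetical model of the proxy: decode, filter headers, recompute length."""
    normalized = {k.lower(): v for k, v in upstream_headers.items()}
    # Mimic the client library's automatic gzip decoding.
    if normalized.get("content-encoding", "").lower() == "gzip":
        body = gzip.decompress(upstream_body)
    else:
        body = upstream_body
    headers = {k: v for k, v in normalized.items() if k not in EXCLUDED}
    # Stand-in for the framework computing Content-Length from the real body.
    headers["content-length"] = str(len(body))
    return headers, body


def check_case(payload, use_gzip):
    """Return True when the downstream Content-Length matches the body."""
    raw = gzip.compress(payload) if use_gzip else payload
    upstream = {"Content-Length": str(len(raw)), "Content-Type": "application/json"}
    if use_gzip:
        upstream["Content-Encoding"] = "gzip"
    headers, body = simulate_pass_through(upstream, raw)
    return body == payload and int(headers["content-length"]) == len(body)
```

Calling `check_case(payload, True)` and `check_case(payload, False)` covers the gzip and non-gzip items; the streaming item needs a real proxy and is not modeled here.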

Changed files

litellm/proxy/pass_through_endpoints/pass_through_endpoints.py (modified, +1/-1)


RAW_BUFFER

What happened?

When a pass-through endpoint proxies a request to an upstream API that returns a gzip-compressed response, the client receives a Content-Length header that does not match the actual response body size, causing connection errors.

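This is because get_response_headers() in pass_through_endpoints.py strips content-encoding but does not strip content-length. Since httpx auto-decompresses the gzip response, the forwarded content-length reflects the compressed size while the body actually sent to the client is the decompressed payload. In this example the decompressed body is the smaller of the two (34 bytes versus 58), because gzip's fixed header and trailer outweigh any savings on such a short payload; for larger bodies the decompressed size is normally the larger one, and the header is wrong in either case.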

Root Cause

In pass_through_endpoints.py line 321:

excluded_headers = {"transfer-encoding", "content-encoding"}

The flow:

1. LiteLLM's internal httpx client requests the upstream with Accept-Encoding: gzip (httpx default)
2. Upstream returns gzip-compressed body (e.g. 58 bytes) with Content-Length: 58 and Content-Encoding: gzip
3. httpx auto-decompresses the body to 34 bytes, but response.headers["content-length"] still reads 58
4. get_response_headers() strips content-encoding: gzip but keeps content-length: 58
5. FastAPI sends 34 bytes of body with Content-Length: 58 header to the client
6. Client reads 34 bytes, expects 24 more, connection closes → error
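The size arithmetic in the flow above can be checked with the standard library alone. This is an illustrative sketch, not part of the issue: `length_mismatch` is a hypothetical helper, and the compressed size it reports depends on the compressor and its settings, so it will not necessarily equal the 58 bytes the upstream sent.

```python
import gzip


def length_mismatch(body: bytes) -> tuple[int, int, int]:
    """Return (advertised, delivered, missing) byte counts for a gzip round trip."""
    compressed = gzip.compress(body)
    advertised = len(compressed)                  # what upstream puts in Content-Length
    delivered = len(gzip.decompress(compressed))  # what the client gets after decoding
    return advertised, delivered, advertised - delivered


if __name__ == "__main__":
    # Body taken from the issue's log output.
    advertised, delivered, missing = length_mismatch(b'{"id":"cgt-20260413154202-bd945"}')
    print(f"Content-Length: {advertised}, body: {delivered} bytes, missing: {missing}")
```

For a short JSON body like this one, the compressed form comes out larger than the original, which is why the client ends up waiting for bytes that never arrive.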

Steps to Reproduce

Configure a pass-through endpoint to any upstream that returns gzip-compressed responses (e.g. BytePlus Ark API):

general_settings:
  pass_through_endpoints:
    - path: /pass/bytepluses
      target: https://ark.ap-southeast.bytepluses.com/api/v3
      headers:
        Authorization: Bearer <API_KEY>
        content-type: application/json
      include_subpath: true

Make a request through the proxy:

curl --http1.1 -v -X POST http://localhost:4000/pass/bytepluses/contents/generations/tasks \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <LITELLM_KEY>" \
  -d '{"model":"dreamina-seedance-2-0-260128","content":[{"type":"text","text":"a cat walking"}],"ratio":"16:9","duration":5,"watermark":false}'

Observe the response headers show mismatched Content-Length:

< Content-Length: 58        ← compressed size from upstream
{ [34 bytes data]           ← actual decompressed body
* transfer closed with 24 bytes remaining to read

Error Messages

- HTTP/1.1: curl: (18) transfer closed with 24 bytes remaining to read
- HTTP/2: curl: (92) HTTP/2 stream 1 was not closed cleanly: INTERNAL_ERROR (err 2)
- Python httpx: httpcore.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
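The same class of failure can be reproduced without LiteLLM or an upstream API. The sketch below uses only the Python standard library and is an illustration, not the proxy's code: a hypothetical `StaleLengthHandler` advertises Content-Length: 58 but writes 34 bytes and then closes the connection, and `http.client` surfaces the shortfall as `IncompleteRead`.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"x" * 34  # stands in for the 34-byte decompressed body


class StaleLengthHandler(BaseHTTPRequestHandler):
    """Hypothetical server that sends a Content-Length larger than its body."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "58")  # stale, compressed size
        self.end_headers()
        self.wfile.write(BODY)  # connection closes after this request

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_with_stale_length():
    """Return (bytes_received, bytes_still_expected) as seen by the client."""
    server = HTTPServer(("127.0.0.1", 0), StaleLengthHandler)
    thread = threading.Thread(target=server.handle_request, daemon=True)
    thread.start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    try:
        conn.request("GET", "/")
        response = conn.getresponse()
        try:
            response.read()
        except http.client.IncompleteRead as exc:
            return len(exc.partial), exc.expected
        return None
    finally:
        conn.close()
        thread.join(timeout=5)
        server.server_close()
```

The client receives 34 bytes and is left expecting 24 more, the same gap curl reports in the HTTP/1.1 error above.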

Proposed Fix

Add "content-length" to excluded_headers:

excluded_headers = {"transfer-encoding", "content-encoding", "content-length"}

FastAPI's Response(content=bytes) automatically calculates the correct Content-Length from the actual body. This is safe for all cases:

- Gzip responses: Fixes the mismatch (compressed size → correct decompressed size)
- Non-gzip responses: FastAPI calculates the same value as upstream, no behavior change
- Streaming responses: StreamingResponse does not use Content-Length
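As a rough picture of what the one-line change does, here is a minimal sketch of header filtering with the extended set. `filter_response_headers` is a hypothetical name used for illustration; the real `get_response_headers()` in LiteLLM may be structured differently, and only the contents of `excluded_headers` come from the issue.

```python
# The set proposed in the issue; header names are compared case-insensitively.
excluded_headers = {"transfer-encoding", "content-encoding", "content-length"}


def filter_response_headers(upstream_headers: dict[str, str]) -> dict[str, str]:
    """Hypothetical filter: drop headers that no longer describe the forwarded body."""
    return {
        name: value
        for name, value in upstream_headers.items()
        if name.lower() not in excluded_headers
    }


if __name__ == "__main__":
    upstream = {
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",
        "Content-Length": "58",
    }
    print(filter_response_headers(upstream))
```

With Content-Length removed here, the response object is free to set the header from the bytes it actually sends.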

Related Issues

This is the response-direction counterpart of #10010, which reported the same class of Content-Length mismatch bug in the request direction for Bedrock pass-through.

Relevant log output

Content-Length: 58
actual body: {"id":"cgt-20260413154202-bd945"} (34 bytes)
mismatch: 58 - 34 = 24 bytes (matches curl error exactly)

What LiteLLM version are you on?

v1.82.0-stable.patch5 (ghcr.io/berriai/litellm-database:main-v1.82.0-stable.patch5)


extent analysis

TL;DR

To fix the issue, add "content-length" to the excluded_headers set in pass_through_endpoints.py to prevent forwarding the incorrect Content-Length header from the upstream API.

Guidance

The root cause is that the upstream Content-Length, which describes the compressed payload, is forwarded unchanged after httpx has decompressed the body; the header/body mismatch is the resulting symptom.
To verify the fix, send a request through the pass-through endpoint after changing excluded_headers and confirm that the Content-Length header equals the number of body bytes received.
Dropping the upstream Content-Length lets FastAPI's Response(content=bytes) compute the header from the actual body.
Test with gzip-compressed, non-gzip, and streaming upstream responses to confirm correct behavior in all three cases.

Example

excluded_headers = {"transfer-encoding", "content-encoding", "content-length"}

This change will prevent the incorrect Content-Length header from being forwarded to the client.

Notes

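Because the fix drops the upstream Content-Length and lets FastAPI recompute it from the body, it does not depend on the upstream value being correct. It does rely on non-streaming responses being returned through Response(content=bytes); if a code path builds the response differently, check that Content-Length is still derived from the actual body.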

Recommendation

Apply the proposed fix by adding "content-length" to excluded_headers, as it is a safe and effective solution that fixes the mismatch issue for gzip responses and does not affect non-gzip responses.


#api #ssr #tool integration #LLM response #prompt template
