litellm - ✅(Solved) Fix [Bug]: Null bytes (\x00) in LLM request/response payloads cause PostgreSQL 22P05 error in spend logs [1 pull request, 1 participant]

GitHub stats
BerriAI/litellm#24310 (fetched 2026-04-08 01:13:23)
Comments: 0 · Participants: 1 · Timeline: 2 · Reactions: 0
Timeline (top): cross-referenced ×1, referenced ×1

When LLM request or response content contains null bytes (\x00 / ^@ characters), the spend log write to PostgreSQL fails with:

ERROR: invalid byte sequence for encoding "UTF8": 0x00
SQLSTATE: 22P05

This causes update_spend_logs to fail silently or raise an exception, losing spend tracking data.

Root Cause

The safe_dumps() function in litellm/litellm_core_utils/safe_json_dumps.py does not strip null bytes before serialization. When the resulting JSON string is written to PostgreSQL, the database rejects the NUL character: PostgreSQL text values cannot contain 0x00 at all, and the jsonb type likewise rejects the \u0000 escape that json.dumps() produces for it.

The issue affects any field that passes through spend_tracking_utils.py:

  • messages field (from _get_messages_for_spend_logs_payload)
  • request_body field (from _get_proxy_server_request_for_spend_logs_payload)
  • Any string value in the spend log payload
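One detail worth keeping in mind when reading the root cause: Python's json.dumps() never emits a raw 0x00 byte; it writes the six-character escape \u0000 instead. If the target column is jsonb (an assumption about the spend-log schema here), PostgreSQL rejects that escape; if it is plain text, any raw NUL is rejected. Either way, the reliable fix is to clean the Python strings before serialization, not to search the output for a literal 0x00. A minimal standard-library sketch; the payload is illustrative, not LiteLLM's actual spend-log shape:

```python
import json

# Illustrative payload: a tool response that happens to contain a NUL character.
content = "result\x00tail"

raw = json.dumps({"content": content})
# json.dumps() never emits a raw 0x00; it writes the six-character escape \u0000.
print("\x00" in raw)      # False
print("\\u0000" in raw)   # True

# Stripping the Python string before serialization removes the escape as well.
clean = json.dumps({"content": content.replace("\x00", "")})
print(clean)              # {"content": "resulttail"}
```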

Fix Action

Fix / Workaround

PR fix notes

PR #24314: fix(core): add strip_null_bytes() to safe_dumps — prevents PostgreSQL 22P05 errors in spend logs

Description (problem / solution / changelog)

Summary

Fixes PostgreSQL 22P05: invalid byte sequence for encoding "UTF8": 0x00 errors that occur when LLM request/response payloads containing null bytes are written to spend log tables.

Fixes #24310
Related: #21290, #15519

Problem

Null bytes (\x00 / ^@) can appear in LLM payloads, for example from multimodal requests, tool call responses, or certain model outputs. When these are serialized with json.dumps() and written to PostgreSQL spend log columns, the database rejects them with:

ERROR:  invalid byte sequence for encoding "UTF8": 0x00
SQLSTATE: 22P05

Changes

litellm/litellm_core_utils/safe_json_dumps.py

Add strip_null_bytes() helper and integrate null byte removal into safe_dumps() at the string serialization level:

from typing import Any

def strip_null_bytes(data: Any) -> Any:
    """Recursively remove \\x00 null bytes from strings to prevent PostgreSQL 22P05 errors."""
    if isinstance(data, str):
        return data.replace("\x00", "")
    if isinstance(data, dict):
        return {k: strip_null_bytes(v) for k, v in data.items()}
    if isinstance(data, list):
        return [strip_null_bytes(item) for item in data]
    ...

Inside _serialize():

- if isinstance(obj, (str, int, float, bool, type(None))):
-     return obj
+ if isinstance(obj, str):
+     return obj.replace("\x00", "")   # strip null bytes inline
+ if isinstance(obj, (int, float, bool, type(None))):
+     return obj
  ...
  try:
-     return str(obj)
+     return str(obj).replace("\x00", "")  # also strip fallback str()
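The diff shows only the two changed branches of _serialize(). To see how they fit into a complete serializer, here is a minimal stand-alone sketch in the same spirit. It is not LiteLLM's implementation: the name safe_dumps_sketch, the default depth, the sentinel strings, and the choice to clean dict keys as well are all assumptions of this sketch.

```python
import json
from typing import Any

# Hypothetical default; LiteLLM's real DEFAULT_MAX_RECURSE_DEPTH may differ.
SKETCH_MAX_DEPTH = 10


def safe_dumps_sketch(data: Any, max_depth: int = SKETCH_MAX_DEPTH) -> str:
    """Illustrative serializer that strips NUL from every string it emits."""

    def _serialize(obj: Any, depth: int, seen: set) -> Any:
        if depth > max_depth:
            return "MaxDepthExceeded"  # hypothetical sentinel
        if isinstance(obj, str):
            return obj.replace("\x00", "")  # strip at the string leaf
        if isinstance(obj, (int, float, bool, type(None))):
            return obj
        if id(obj) in seen:
            return "CircularReference"  # hypothetical sentinel
        if isinstance(obj, dict):
            seen.add(id(obj))
            # Assumption of this sketch: keys are stringified and cleaned too.
            out = {
                str(k).replace("\x00", ""): _serialize(v, depth + 1, seen)
                for k, v in obj.items()
            }
            seen.discard(id(obj))
            return out
        if isinstance(obj, (list, tuple, set)):
            seen.add(id(obj))
            out = [_serialize(item, depth + 1, seen) for item in obj]
            seen.discard(id(obj))
            return out
        try:
            return str(obj).replace("\x00", "")  # strip the str() fallback too
        except Exception:
            return "Unserializable Object"

    return json.dumps(_serialize(data, 0, set()))
```

Because every string leaf and the str() fallback pass through the same replace call, callers get NUL-free output without having to pre-clean their payloads.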

litellm/proxy/spend_tracking/spend_tracking_utils.py

Replace ad-hoc json.dumps() with safe_dumps() in two call sites:

- return json.dumps(messages, default=str)
+ return safe_dumps(messages)

- _request_body_json_str = json.dumps(_request_body, default=str)
+ _request_body_json_str = safe_dumps(_request_body)

Also add early null byte stripping in _sanitize_request_body_for_spend_logs_payload string handling:

  elif isinstance(value, str):
+     value = strip_null_bytes(value)
      if len(value) > max_string_length_prompt_in_db:
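The ordering in that hunk matters: stripping before the length check means the truncation limit counts only characters that will actually be stored. Below is a minimal sketch of a sanitizer with that ordering; the function name, the default limit, and the truncation marker are hypothetical and not taken from LiteLLM's _sanitize_request_body_for_spend_logs_payload.

```python
from typing import Any

# Hypothetical values; LiteLLM's real max_string_length_prompt_in_db may differ.
DEFAULT_MAX_LEN = 20
TRUNCATION_MARKER = "...(truncated)"


def sanitize_for_spend_log(value: Any, max_len: int = DEFAULT_MAX_LEN) -> Any:
    """Recursively strip NUL characters, then truncate long strings."""
    if isinstance(value, dict):
        return {k: sanitize_for_spend_log(v, max_len) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize_for_spend_log(item, max_len) for item in value]
    if isinstance(value, str):
        value = value.replace("\x00", "")  # strip first ...
        if len(value) > max_len:           # ... so the limit counts stored characters only
            return value[:max_len] + TRUNCATION_MARKER
    return value
```

With this order, a prompt of five letters followed by 100 NULs is stored as the five letters, instead of being truncated into a string that is mostly NULs and then marked as cut off.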

Why centralize in safe_dumps vs. caller level

The current codebase has ad-hoc _strip_null_bytes() in proxy/utils.py for some paths, but safe_dumps() is the shared serialization utility. Centralizing here means any future caller of safe_dumps() is automatically protected without remembering to strip separately.

Testing

The existing safe_json_dumps test suite covers the serialization path. New behavior:

  • Strings with \x00 pass through safe_dumps() with null bytes removed
  • All spend log serialization paths (messages, request_body) now use safe_dumps()
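The PR text does not show its new tests. The following pytest-style sketch shows checks that would pin down the behaviour. It runs against a copy of the full strip_null_bytes() body from the code example, so the file stands alone; the test names are hypothetical.

```python
import json
from typing import Any


def strip_null_bytes(data: Any) -> Any:
    """Copy of the PR helper so this file runs on its own."""
    if isinstance(data, str):
        return data.replace("\x00", "")
    if isinstance(data, dict):
        return {k: strip_null_bytes(v) for k, v in data.items()}
    if isinstance(data, list):
        return [strip_null_bytes(item) for item in data]
    if isinstance(data, tuple):
        return tuple(strip_null_bytes(item) for item in data)
    if isinstance(data, set):
        return {strip_null_bytes(item) for item in data}
    return data


def test_plain_string():
    assert strip_null_bytes("a\x00b\x00") == "ab"


def test_nested_containers():
    payload = {"messages": [{"content": "hi\x00"}], "meta": ("x\x00", {"y\x00"})}
    expected = {"messages": [{"content": "hi"}], "meta": ("x", {"y"})}
    assert strip_null_bytes(payload) == expected


def test_non_strings_untouched():
    data = {"n": 1, "f": 1.5, "b": True, "z": None}
    assert strip_null_bytes(data) == data


def test_json_output_has_no_nul_escape():
    out = json.dumps(strip_null_bytes({"content": "tool\x00output"}))
    assert "\\u0000" not in out


def test_keys_are_not_stripped():
    # Documents current behaviour: only dict values are cleaned.
    assert "k\x00" in strip_null_bytes({"k\x00": "v"})
```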

Impact

  • Minimal scope: 2 files, ~25 lines
  • No breaking changes: safe_dumps() signature unchanged; output may differ only when input contains \x00
  • Backward compatible: strip_null_bytes() exported as public function for reuse

Changed files

  • litellm/litellm_core_utils/safe_json_dumps.py (modified, +20/-2)
  • litellm/proxy/spend_tracking/spend_tracking_utils.py (modified, +4/-3)

Code Example

ERROR: invalid byte sequence for encoding "UTF8": 0x00
SQLSTATE: 22P05

---

# litellm/litellm_core_utils/safe_json_dumps.py
from typing import Any

def strip_null_bytes(data: Any) -> Any:
    """Recursively remove \\x00 null bytes from strings to prevent PostgreSQL 22P05 errors."""
    if isinstance(data, str):
        return data.replace("\x00", "")
    if isinstance(data, dict):
        return {k: strip_null_bytes(v) for k, v in data.items()}
    if isinstance(data, list):
        return [strip_null_bytes(item) for item in data]
    if isinstance(data, tuple):
        return tuple(strip_null_bytes(item) for item in data)
    if isinstance(data, set):
        return {strip_null_bytes(item) for item in data}
    return data


def safe_dumps(data: Any, max_depth: int = DEFAULT_MAX_RECURSE_DEPTH) -> str:
    def _serialize(obj, depth, seen):
        ...
        if isinstance(obj, str):
            return strip_null_bytes(obj)   # ← strip here
        ...
        try:
            return strip_null_bytes(str(obj))   # ← and here for fallback
        except Exception:
            return "Unserializable Object"

---

# Before
return json.dumps(messages, default=str)
_request_body_json_str = json.dumps(_request_body, default=str)

# After  
return safe_dumps(messages)
_request_body_json_str = safe_dumps(_request_body)
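One gap in the helper as written: strip_null_bytes() rewrites only dict values, so a key containing \x00 passes through unchanged, and json.dumps() will then escape it as \u0000. Whether safe_dumps() itself cleans keys depends on its dict branch, which the excerpt elides. A possible variant, which is an assumption and not part of the PR, also cleans string keys. The trade-off is that two keys differing only by NUL characters collide, and the later one wins.

```python
from typing import Any


def strip_null_bytes_incl_keys(data: Any) -> Any:
    """Hypothetical variant of strip_null_bytes() that also cleans str dict keys."""
    if isinstance(data, str):
        return data.replace("\x00", "")
    if isinstance(data, dict):
        cleaned = {}
        for k, v in data.items():
            # Only str keys can hold NUL; other hashable keys pass through.
            new_key = k.replace("\x00", "") if isinstance(k, str) else k
            # If two keys differ only by NUL, the later value overwrites the earlier one.
            cleaned[new_key] = strip_null_bytes_incl_keys(v)
        return cleaned
    if isinstance(data, list):
        return [strip_null_bytes_incl_keys(item) for item in data]
    if isinstance(data, tuple):
        return tuple(strip_null_bytes_incl_keys(item) for item in data)
    if isinstance(data, set):
        return {strip_null_bytes_incl_keys(item) for item in data}
    return data
```

For example, {"to\x00ol": ["a\x00"]} becomes {"tool": ["a"]}, while {"a": 1, "a\x00": 2} collapses to {"a": 2}.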
Current Workaround Pattern

Some call sites in proxy/utils.py have ad-hoc _strip_null_bytes() helpers, but the core serialization path still allows null bytes to pass through.

Why in safe_dumps vs. caller level

Centralizing null byte stripping in safe_dumps() ensures all serialization paths are protected without requiring every call site to remember to strip. This is more robust than the current ad-hoc approach.

Related Issues

  • #21290 (open) — update_spend_logs fails with PostgreSQL 22P05
  • #15519 (closed) — DB exception in update_spend caused by null bytes

I have a PR ready with tests.

Environment

  • LiteLLM proxy with PostgreSQL backend
  • Triggered by: multimodal requests, tool use responses, or any model that returns binary/null content in its output

Extent analysis

Fix Plan

To fix the issue of null bytes in LLM request or response content causing spend log write failures to PostgreSQL, follow these steps:

  • Update safe_dumps() in litellm/litellm_core_utils/safe_json_dumps.py to strip null bytes before serialization: add a recursive strip_null_bytes() helper and apply it both to string values and to the str() fallback inside _serialize(), as in the code example above.
  • Replace the ad-hoc json.dumps(..., default=str) calls in spend_tracking_utils.py (for the messages and request_body fields) with safe_dumps().

Verification

To verify that the fix worked, test the following scenarios:

  • Send a request with null bytes in the content and verify that the spend log is written successfully to PostgreSQL.
  • Check the PostgreSQL logs and confirm that no new 22P05 (invalid byte sequence) errors appear.
  • Run the tests included in the PR to ensure that the fix does not introduce any regressions.
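The first two checks can be backed by a small standard-library helper that runs before anything reaches PostgreSQL. It parses the serialized payload back and walks every key and value looking for NUL. The function names are hypothetical; feed it whatever string your spend-log path produces, for example the output of safe_dumps().

```python
import json
from typing import Any, Iterator


def _iter_strings(obj: Any) -> Iterator[str]:
    """Yield every string (keys and values) in a decoded JSON document."""
    if isinstance(obj, str):
        yield obj
    elif isinstance(obj, dict):
        for k, v in obj.items():
            yield k
            yield from _iter_strings(v)
    elif isinstance(obj, list):
        for item in obj:
            yield from _iter_strings(item)


def is_postgres_json_safe(serialized: str) -> bool:
    """Return True if the JSON text decodes to data with no NUL characters."""
    # A raw NUL in the text is already a failure (and json.loads would reject it).
    if "\x00" in serialized:
        return False
    # Decoding first avoids false positives: user text containing a literal
    # backslash followed by "u0000" is not a NUL and is stored fine.
    return all("\x00" not in s for s in _iter_strings(json.loads(serialized)))
```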

Extra Tips

  • Audit any other call sites that pass json.dumps() output to PostgreSQL and switch them to safe_dumps(), so that every serialization path is protected.
  • Consider adding additional tests to cover different scenarios where null bytes may be present in the request or response content.
