LlamaIndex - ✅ (Solved) fix: base64 encode returns bytes instead of string, causing serialization crashes [3 pull requests, 1 comment, 2 participants]

GitHub stats
run-llama/llama_index#21186 (fetched 2026-04-08 01:45:30)
Comments: 1 · Participants: 2 · Timeline: 8 · Reactions: 0
Timeline (top): cross-referenced ×3, referenced ×2, commented ×1, mentioned ×1

Severity: high
File: llama-index-core/llama_index/core/img_utils.py

Root Cause

base64.b64encode() returns a bytes object (e.g., b"Zm9v"). The function uses typing.cast(str, ...) which only satisfies the static type checker but does not actually convert the bytes to a string at runtime. Because the function signature advertises a str return type, downstream code will likely attempt to concatenate it with other strings (e.g., f"data:image/jpeg;base64,{img}") or serialize it to JSON for API requests. This will result in runtime errors like TypeError: Object of type bytes is not JSON serializable.
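The root cause can be reproduced without the library. The following is a minimal sketch, not the library's code: `encode_with_cast` is a hypothetical stand-in for the buggy pattern. It shows that `typing.cast` returns its argument unchanged at runtime, that interpolating the result into an f-string silently embeds the bytes repr rather than raising, and that `json.dumps` rejects the value.

```python
import base64
import json
from typing import cast


def encode_with_cast(data: bytes) -> str:
    # Hypothetical stand-in for the buggy pattern: cast() only informs the
    # type checker and returns the bytes object unchanged at runtime.
    return cast(str, base64.b64encode(data))


result = encode_with_cast(b"foo")
print(type(result))  # <class 'bytes'>, despite the "-> str" annotation

# An f-string does not raise here; it embeds the bytes repr, which yields
# a malformed data URL.
data_url = f"data:image/jpeg;base64,{result}"
print(data_url)  # data:image/jpeg;base64,b'Zm9v'

# JSON serialization fails outright.
try:
    json.dumps({"image": result})
    error_message = ""
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

Note that the f-string case is arguably the more dangerous one, since the corrupted value can travel to an API before anything fails.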

PR fix notes

PR #21187: refactor(llama-index-core): base64 encode returns bytes instead of string, causing serialization crashes

Description (problem / solution / changelog)

Code Quality

Problem

base64.b64encode() returns a bytes object (e.g., b"Zm9v"). The function uses typing.cast(str, ...) which only satisfies the static type checker but does not actually convert the bytes to a string at runtime. Because the function signature advertises a str return type, downstream code will likely attempt to concatenate it with other strings (e.g., f"data:image/jpeg;base64,{img}") or serialize it to JSON for API requests. This will result in runtime errors like TypeError: Object of type bytes is not JSON serializable.

Severity: high
File: llama-index-core/llama_index/core/img_utils.py

Solution

Replace the cast with an actual decode call.

Changes

  • llama-index-core/llama_index/core/img_utils.py (modified)

Contributed by Lê Thành Chỉnh

Closes #21186

Changed files

  • llama-index-core/llama_index/core/img_utils.py (modified, +2/-3)

PR #21209: fix: img_2_b64 returns bytes instead of str due to cast bypass

Description (problem / solution / changelog)

Fixes #21186

Problem

img_2_b64() calls base64.b64encode(), which returns bytes, but wraps it with typing.cast(str, ...). The cast only satisfies the static type checker; at runtime the return value is still bytes. Downstream code that concatenates the result (e.g. f"data:image/jpeg;base64,{img}") or serializes it to JSON will crash with TypeError: Object of type bytes is not JSON serializable.

Fix

Replace cast(str, base64.b64encode(buff.getvalue())) with base64.b64encode(buff.getvalue()).decode("ascii").

The cast import is still used by b64_2_img, so it stays.
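As a rough illustration of the replacement expression, the sketch below applies it to an in-memory buffer. `buffer_to_b64` is a hypothetical helper introduced only for this example; the real img_2_b64 operates on an image object, which is omitted here to keep the snippet dependency-free.

```python
import base64
import json
from io import BytesIO


def buffer_to_b64(buff: BytesIO) -> str:
    """Hypothetical helper: base64-encode a buffer and return a real str."""
    # .decode("ascii") performs an actual bytes -> str conversion at runtime,
    # unlike typing.cast, which leaves the value untouched.
    return base64.b64encode(buff.getvalue()).decode("ascii")


buff = BytesIO(b"foo")  # stands in for a buffer holding saved image data
encoded = buffer_to_b64(buff)

# Both downstream uses named in the issue now behave as expected.
data_url = f"data:image/jpeg;base64,{encoded}"
payload = json.dumps({"image": encoded})
print(data_url)
print(payload)
```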

Changed files

  • llama-index-core/llama_index/core/img_utils.py (modified, +1/-1)

PR #21316: Fix: base64 image encoding returns bytes instead of string (#21186)

Description (problem / solution / changelog)

Description

This PR fixes issue #21186.

The function img_2_b64 returned a base64-encoded value as bytes, which caused JSON serialization issues and inconsistent behavior across the library.

Before

img_2_b64(image) returned: b'/9j/4AAQSkZJRgABAQ...'

After

img_2_b64(image) returns: '/9j/4AAQSkZJRgABAQ...'

Added .decode("utf-8") to ensure the output is a string.
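This PR decodes with "utf-8" while PR #21209 decodes with "ascii". Because the standard base64 alphabet consists only of ASCII characters, the two should produce identical strings. A small check, assuming only the Python standard library:

```python
import base64

# Cover every possible byte value so the whole base64 alphabet appears.
payload = bytes(range(256))
raw = base64.b64encode(payload)

as_ascii = raw.decode("ascii")
as_utf8 = raw.decode("utf-8")

same_text = as_ascii == as_utf8                     # both codecs agree
only_ascii = as_ascii.isascii()                     # output never leaves ASCII
round_trip = base64.b64decode(as_ascii) == payload  # decoding restores the input
print(same_text, only_ascii, round_trip)
```

Either codec is therefore a reasonable choice; "ascii" states the invariant more explicitly, while "utf-8" matches Python's default.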

Changed files

  • llama-index-core/llama_index/core/img_utils.py (modified, +1/-1)
  • llama-index-core/tests/test_img_utils.py (added, +41/-0)

Description

base64.b64encode() returns a bytes object (e.g., b"Zm9v"). The function uses typing.cast(str, ...) which only satisfies the static type checker but does not actually convert the bytes to a string at runtime. Because the function signature advertises a str return type, downstream code will likely attempt to concatenate it with other strings (e.g., f"data:image/jpeg;base64,{img}") or serialize it to JSON for API requests. This will result in runtime errors like TypeError: Object of type bytes is not JSON serializable.

Severity: high
File: llama-index-core/llama_index/core/img_utils.py

Expected Behavior

The function should return an actual str at runtime, matching its annotated return type, so that callers can concatenate the result into a data URL or serialize it to JSON without errors.

extent analysis

Fix Plan

To fix the issue, we need to decode the bytes object returned by base64.b64encode() to a string.

Code Changes

We can achieve this by using the decode() method:

import base64

def encode_image(img):
    # Assuming img is the image data to be encoded
    encoded_img = base64.b64encode(img)
    # Decode the bytes object to a string
    encoded_img_str = encoded_img.decode('utf-8')
    return encoded_img_str

Alternatively, you can chain the decode() call onto base64.b64encode() directly in the return statement:

import base64

def encode_image(img):
    # Assuming img is the image data to be encoded
    return base64.b64encode(img).decode('utf-8')

Verification

To verify that the fix worked, you can test the function with a sample image and check that the returned value is a string:

img = b'foo'  # Sample image data
encoded_img = encode_image(img)
print(type(encoded_img))  # Should print <class 'str'>
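A slightly stricter check could be written as a regression test. The sketch below is an assumption about what such a test might look like, not the test added in any of the PRs; it redefines the illustrative encode_image so the snippet runs on its own, and adds the type hints suggested in this report.

```python
import base64
import json


def encode_image(img: bytes) -> str:
    # Illustrative function from this report, with explicit type hints.
    return base64.b64encode(img).decode("utf-8")


def test_encode_image_returns_json_safe_str() -> None:
    encoded = encode_image(b"foo")
    # The runtime type must match the annotation.
    assert isinstance(encoded, str)
    assert encoded == "Zm9v"
    # The value must survive a JSON round trip unchanged.
    assert json.loads(json.dumps({"image": encoded}))["image"] == encoded
    # Decoding must recover the original bytes.
    assert base64.b64decode(encoded) == b"foo"


test_encode_image_returns_json_safe_str()
print("ok")
```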

Extra Tips

  • No extra error handling is needed for the decode step: base64.b64encode() emits only ASCII characters, so decoding its output as "ascii" or "utf-8" cannot fail.
  • Consider adding type hints to the function signature to indicate that the return value is a string.
