litellm - ✅(Solved) Fix [Bug]: amazon.titan-embed-image-v1 — usage.prompt_tokens=0 for image inputs (EmbeddingResponse field, not cost calc) [1 pull requests, 1 participants]

GitHub stats
BerriAI/litellm#25857 · Fetched 2026-04-17 08:28:39
Comments: 0 · Participants: 1 · Timeline events: 6 · Reactions: 0
Timeline (top): labeled ×3, cross-referenced ×1, referenced ×1, subscribed ×1

Root Cause

In litellm/llms/bedrock/embed/amazon_titan_multimodal_transformation.py, _transform_response():

total_prompt_tokens += _parsed_response["inputTextTokenCount"]  # always 0 for image inputs

# image_count IS correctly counted (fix from #21646):
if image_count > 0:
    prompt_tokens_details = PromptTokensDetailsWrapper(image_count=image_count)

# But prompt_tokens is set from total_prompt_tokens, which is still 0:
usage = Usage(
    prompt_tokens=total_prompt_tokens,   # <- 0 for image-only input
    total_tokens=total_prompt_tokens,    # <- 0
    prompt_tokens_details=prompt_tokens_details,
)

AWS returns inputTextTokenCount=0 for image inputs by design — amazon.titan-embed-image-v1 charges per image at a flat rate of $0.00006/image, not per token. LiteLLM's pricing registry already encodes this correctly:

"amazon.titan-embed-image-v1": {
    "input_cost_per_image": 6e-05,
    "input_cost_per_token": 8e-07
}

PR #21646 fixed the cost calculator path (generic_cost_per_token in utils.py now reads image_count from prompt_tokens_details and applies input_cost_per_image). But the usage.prompt_tokens field on the EmbeddingResponse was never updated, so it remains 0.


Fix Action

Fix / Workaround

| Field | Expected | Actual (v1.83.8) |
| --- | --- | --- |
| usage.prompt_tokens | N (number of images, e.g. 1) | 0 ← bug |
| usage.total_tokens | N | 0 ← bug |
| usage.prompt_tokens_details.image_count | N | correct (fixed by #21646) |
| litellm.completion_cost(response) | correct ($0.00006) | correct (fixed by #21646) |

PR fix notes

PR #25907: fix: count image tokens in Titan embed response

Description (problem / solution / changelog)

What's broken?

When using amazon.titan-embed-image-v1 with image inputs, EmbeddingResponse.usage.prompt_tokens is always 0, even though the model successfully processed the image and returned an embedding.

Who is affected?

Any downstream consumer (quota/billing system, logging layer, monitoring tool) reading response.usage.prompt_tokens directly — rather than going through LiteLLM's cost calculator — gets 0 and cannot detect that work was performed.

Text-only embedding calls are not affected (they correctly report inputTextTokenCount).
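
On an affected version, a consumer that meters on `usage` could fall back to `prompt_tokens_details.image_count` when `prompt_tokens` is 0. The sketch below uses `SimpleNamespace` stand-ins rather than real LiteLLM objects. `work_units` is a hypothetical name; only the attribute names `prompt_tokens`, `prompt_tokens_details`, and `image_count` come from the issue.

```python
from types import SimpleNamespace


def work_units(usage) -> int:
    """Hypothetical metering helper: return a non-zero value whenever work was done."""
    prompt_tokens = getattr(usage, "prompt_tokens", 0) or 0
    details = getattr(usage, "prompt_tokens_details", None)
    image_count = getattr(details, "image_count", 0) or 0
    # max() avoids double counting once prompt_tokens already includes images.
    return max(prompt_tokens, image_count)


# Stand-in for an image-only response on an affected version.
affected = SimpleNamespace(
    prompt_tokens=0,
    prompt_tokens_details=SimpleNamespace(image_count=1),
)

# Stand-in for a text-only response, which carries no image details.
text_only = SimpleNamespace(prompt_tokens=7, prompt_tokens_details=None)

affected_units = work_units(affected)    # 1
text_only_units = work_units(text_only)  # 7
```

This only guarantees a non-zero signal. For a request that mixes text and images on an affected version, `max()` reports the text token count and does not add the images on top.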

Root Cause

In _transform_response() of amazon_titan_multimodal_transformation.py, prompt_tokens is set solely from inputTextTokenCount, which AWS intentionally returns as 0 for image inputs (Titan charges per-image at a flat rate, not per-token).

PR #21646 previously fixed the cost calculator path by reading image_count from prompt_tokens_details, but the usage.prompt_tokens field on the EmbeddingResponse was never updated.

Fix

Added image_count to both prompt_tokens and total_tokens in the Usage object:

usage = Usage(
    prompt_tokens=total_prompt_tokens + image_count,   # was: total_prompt_tokens
    total_tokens=total_prompt_tokens + image_count,     # was: total_prompt_tokens
    ...
)

This is backward-compatible: text-only paths are unaffected since image_count=0.
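
The effect of the change can be sketched without LiteLLM or AWS. The dataclass `UsageSketch` and the function `build_usage` below are hypothetical stand-ins for LiteLLM's `Usage` and the tail of `_transform_response()`; only the arithmetic mirrors the diff above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UsageSketch:
    """Stand-in for LiteLLM's Usage; not the real class."""
    prompt_tokens: int
    total_tokens: int
    image_count: Optional[int] = None


def build_usage(text_token_counts: List[int], image_count: int, fixed: bool) -> UsageSketch:
    """Sum per-item inputTextTokenCount values; add image_count only when fixed."""
    total_prompt_tokens = sum(text_token_counts)
    extra = image_count if fixed else 0
    return UsageSketch(
        prompt_tokens=total_prompt_tokens + extra,
        total_tokens=total_prompt_tokens + extra,
        image_count=image_count if image_count > 0 else None,
    )


# One image input: AWS reports inputTextTokenCount=0 for it.
before = build_usage([0], image_count=1, fixed=False)   # prompt_tokens == 0
after = build_usage([0], image_count=1, fixed=True)     # prompt_tokens == 1

# One text input with 12 tokens: identical with or without the fix.
text_only = build_usage([12], image_count=0, fixed=True)  # prompt_tokens == 12
```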

Testing

Updated existing unit test test_titan_multimodal_embedding_image_cost_tracking to assert prompt_tokens == 1 and total_tokens == 1 for image inputs. All 3 related tests pass:

  • test_titan_multimodal_embedding_image_cost_tracking
  • test_titan_multimodal_embedding_text_no_image_count
  • test_titan_multimodal_embedding_backward_compat_no_batch_data

Fixes #25857

Changed files

  • litellm/llms/bedrock/embed/amazon_titan_multimodal_transformation.py (modified, +2/-2)
  • tests/test_litellm/llms/bedrock/embed/test_bedrock_embedding.py (modified, +2/-0)

RAW_BUFFER

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When calling litellm.embedding() with amazon.titan-embed-image-v1 and an image input (data URI), the returned EmbeddingResponse.usage.prompt_tokens is always 0, even though the model successfully processed the image and returned an embedding.

This is distinct from the cost-calculation bug fixed in PR #21646 (merged Feb 20, 2026). That PR correctly fixed LiteLLM's internal cost calculator to use input_cost_per_image via prompt_tokens_details.image_count. However, usage.prompt_tokens itself is still 0 in the returned response object.

Any downstream consumer (e.g., a quota/billing system, logging layer, or monitoring tool) reading response.usage.prompt_tokens directly — rather than going through LiteLLM's cost calculator — gets 0 and cannot detect that work was performed.


Proposed fix

In _transform_response() of amazon_titan_multimodal_transformation.py:

prompt_tokens_details = None
if image_count > 0:
    prompt_tokens_details = PromptTokensDetailsWrapper(
        image_count=image_count,
        image_tokens=image_count,  # populate the existing but currently unused field
    )

usage = Usage(
    prompt_tokens=total_prompt_tokens + image_count,   # non-zero when images were processed
    completion_tokens=0,
    total_tokens=total_prompt_tokens + image_count,
    prompt_tokens_details=prompt_tokens_details,
)

This ensures prompt_tokens > 0 for any completed image embedding call and is fully backward-compatible (text-only paths unaffected since image_count=0).

The same gap likely exists in amazon_nova_transformation.py — worth checking AmazonNovaEmbeddingConfig._transform_response() for identical behaviour.
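
One way to check another provider for the same gap is to assert an invariant on the returned usage: if `image_count` is positive, `prompt_tokens` should be at least `image_count`. The sketch below is an assumption about how such a check could look. `has_image_usage_gap` is a hypothetical helper, and the objects are `SimpleNamespace` stand-ins, not real Nova or Titan responses.

```python
from types import SimpleNamespace


def has_image_usage_gap(usage) -> bool:
    """Hypothetical check: True if images were counted but not reflected in prompt_tokens."""
    details = getattr(usage, "prompt_tokens_details", None)
    image_count = getattr(details, "image_count", None) or 0
    prompt_tokens = getattr(usage, "prompt_tokens", None) or 0
    return image_count > 0 and prompt_tokens < image_count


# Shape of the usage described in this issue (image tracked, prompt_tokens still 0).
buggy = SimpleNamespace(
    prompt_tokens=0,
    prompt_tokens_details=SimpleNamespace(image_count=2),
)

# Shape of the usage after the proposed fix.
patched = SimpleNamespace(
    prompt_tokens=2,
    prompt_tokens_details=SimpleNamespace(image_count=2),
)

buggy_has_gap = has_image_usage_gap(buggy)      # True
patched_has_gap = has_image_usage_gap(patched)  # False
```

In a real test, the stand-ins would be replaced with the `usage` attribute of an actual embedding response from the provider under test.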


Related

  • PR #21646 — fixed cost calculation, but not usage.prompt_tokens

Steps to Reproduce

Minimal pure-LiteLLM repro (the bug is in LiteLLM's response transformation, but running this snippet makes a real Bedrock call and needs an AWS account):

import base64
from io import BytesIO
from PIL import Image
from litellm import embedding

# Image must be a data URI — litellm's is_base64_encoded() requires `data:` prefix
img = Image.new("RGB", (8, 8), color="blue")
buf = BytesIO()
img.save(buf, format="PNG")
img_b64 = base64.b64encode(buf.getvalue()).decode("utf-8")
img_data_uri = f"data:image/png;base64,{img_b64}"

response = embedding(
    model="amazon.titan-embed-image-v1",
    input=[img_data_uri],
    aws_region_name="us-east-1",
)

print(response.usage)
# Usage(prompt_tokens=0, total_tokens=0, ...)  ← BUG: should be non-zero

print(response.usage.prompt_tokens_details)
# PromptTokensDetailsWrapper(image_count=1)  ← image IS tracked here, just not surfaced in prompt_tokens

Raw boto3 confirmation (proves this is a LiteLLM-level issue, not AWS):

import json, base64, boto3
from io import BytesIO
from PIL import Image

img = Image.new("RGB", (8, 8), color="blue")
buf = BytesIO()
img.save(buf, format="PNG")
img_b64 = base64.b64encode(buf.getvalue()).decode("utf-8")

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
resp = bedrock.invoke_model(
    modelId="amazon.titan-embed-image-v1",
    body=json.dumps({"inputImage": img_b64}),
    accept="application/json",
    contentType="application/json",
)

print(json.loads(resp["body"].read()))
# {"embedding": [...], "inputTextTokenCount": 0}
# AWS intentionally returns inputTextTokenCount=0 for images —
# the model is priced PER IMAGE ($0.00006/image), not per token.
# There is no "inputImageTokenCount" field — AWS does not expose one.

Relevant log output

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on?

v1.83.8

Twitter / LinkedIn details

https://www.linkedin.com/in/shay-margalit-40581668/

extent analysis

TL;DR

Update the _transform_response() method in amazon_titan_multimodal_transformation.py so that usage.prompt_tokens and usage.total_tokens include image_count for image inputs.

Guidance

  • Review the proposed fix in the issue description, which suggests updating the usage calculation to include image_count when processing image inputs.
  • Verify that the image_count is correctly populated in the prompt_tokens_details object.
  • Check for similar issues in other transformation modules, such as amazon_nova_transformation.py.
  • Test the updated code with the provided minimal repro example to ensure that usage.prompt_tokens is correctly calculated.

Notes

The issue affects the LiteLLM SDK (Python package) and was reported against v1.83.8. The proposed fix is backward-compatible and does not affect text-only paths.

Recommendation

Apply the proposed fix: update the _transform_response() method in amazon_titan_multimodal_transformation.py so that usage.prompt_tokens and usage.total_tokens include image_count. Downstream consumers that read these fields directly can then detect that image inputs were processed.
