litellm - 💡(How to fix) Fix [Bug]: Vertex AI Batch usage/cost recorded as 0 after batch output is auto-transformed to OpenAI format [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
BerriAI/litellm#27891Fetched 2026-05-14 03:29:56
View on GitHub
Comments
0
Participants
1
Timeline
3
Reactions
0
Author
Participants
Timeline (top)
labeled ×2subscribed ×1

Error Message

"error": null "error": None,

Root Cause

The Vertex batch cost path has a format mismatch with the file-content path. Walking the flow that the enterprise CheckBatchCost poller takes:

  1. CheckBatchCost.check_batch_cost() polls LiteLLM_ManagedObjectTable and calls llm_router.aretrieve_batch(model=model_id, batch_id=batch_id). Status transitions to completed correctly (JOB_STATE_SUCCEEDEDcompleted via VertexAIBatchTransformation._get_batch_job_status_from_vertex_ai_batch_response).

  2. CheckBatchCost downloads the output file via afile_content(file_id=raw_output_file_id, **credentials) (enterprise/litellm_enterprise/proxy/common_utils/check_batch_cost.py around line 258).

  3. For custom_llm_provider == \"vertex_ai\", afile_content routes to VertexAIFilesHandler.afile_content (litellm/llms/vertex_ai/files/handler.py:148). That downloads predictions.jsonl from GCS and then runs VertexAIFilesConfig.transform_file_content_response (litellm/llms/vertex_ai/files/transformation.py:570), which calls _try_transform_vertex_batch_output_to_openai.

    This was added in PR #25627. After this call, the bytes returned to the caller are OpenAI batch JSONL:

    {
      \"id\": \"batch_req_...\",
      \"custom_id\": \"request-1\",
      \"response\": {
        \"status_code\": 200,
        \"request_id\": \"chatcmpl-...\",
        \"body\": { /* OpenAI ChatCompletion ModelResponse, including \"usage\": {\"prompt_tokens\": ..., \"completion_tokens\": ..., \"total_tokens\": ...} */ }
      },
      \"error\": null
    }

    The original Vertex shape (response.usageMetadata.{promptTokenCount,candidatesTokenCount,totalTokenCount}) is gone unless the caller has set litellm.disable_vertex_batch_output_transformation = True.

  4. CheckBatchCost then calls calculate_batch_cost_and_usage(file_content_dictionary, custom_llm_provider=\"vertex_ai\", model_name=..., model_info=deployment_model_info).

  5. Inside litellm/batches/batch_utils.py, _batch_cost_calculator (line ~117) takes the Vertex branch:

    if custom_llm_provider == \"vertex_ai\" and model_name:
        batch_cost, _ = calculate_vertex_ai_batch_cost_and_usage(
            file_content_dictionary, model_name
        )
        return batch_cost

    calculate_vertex_ai_batch_cost_and_usage (line ~134) reads response.usageMetadata.promptTokenCount / candidatesTokenCount / totalTokenCount from each line. Those fields do not exist on the now-OpenAI-shaped JSONL, so every line evaluates to _prompt=0, _completion=0, _total=0, and total_cost stays at 0.0. The usage Usage(...) returned to _get_batch_job_total_usage_from_file_content for the vertex branch (line ~366) is also all zeros.

So today: file content is OpenAI-shaped, but the cost calculator only knows how to read raw Vertex shape. Two paths that both touch Vertex have drifted out of sync.

Code Example

{
     \"id\": \"batch_req_...\",
     \"custom_id\": \"request-1\",
     \"response\": {
       \"status_code\": 200,
       \"request_id\": \"chatcmpl-...\",
       \"body\": { /* OpenAI ChatCompletion ModelResponse, including \"usage\": {\"prompt_tokens\": ..., \"completion_tokens\": ..., \"total_tokens\": ...} */ }
     },
     \"error\": null
   }

---

if custom_llm_provider == \"vertex_ai\" and model_name:
       batch_cost, _ = calculate_vertex_ai_batch_cost_and_usage(
           file_content_dictionary, model_name
       )
       return batch_cost

---

if custom_llm_provider == \"vertex_ai\":
       raise ValueError(\"Vertex AI does not support file content retrieval\")

---

# config.yaml
model_list:
  - model_name: gemini-batch
    litellm_params:
      model: vertex_ai/gemini-2.0-flash-001
      vertex_project: <your-project>
      vertex_location: us-central1
      vertex_credentials: <path-or-json>

general_settings:
  master_key: sk-1234
  database_url: <postgres-url>

litellm_settings:
  callbacks: [\"prometheus\"]

---

from litellm.batches.batch_utils import calculate_vertex_ai_batch_cost_and_usage

# Vertex line shape POST PR #25627 transformation (what CheckBatchCost actually sees today):
openai_shaped = [{
    \"id\": \"batch_req_1\",
    \"custom_id\": \"request-1\",
    \"response\": {
        \"status_code\": 200,
        \"request_id\": \"chatcmpl-xyz\",
        \"body\": {
            \"id\": \"chatcmpl-xyz\",
            \"model\": \"gemini-2.0-flash-001\",
            \"usage\": {\"prompt_tokens\": 42, \"completion_tokens\": 17, \"total_tokens\": 59},
            \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"...\"}, \"finish_reason\": \"stop\"}]
        }
    },
    \"error\": None,
}]

cost, usage = calculate_vertex_ai_batch_cost_and_usage(openai_shaped, model_name=\"gemini-2.0-flash-001\")
print(cost, usage)   # -> 0.0, Usage(prompt_tokens=0, completion_tokens=0, total_tokens=0)

---

# verbose_logger.info from calculate_vertex_ai_batch_cost_and_usage on a completed Vertex batch
vertex_ai batch cost: cost=0.0, prompt=0, completion=0, total=0

---

# from llm_router.aretrieve_batch on the same batch_id
status=completed, output_file_id=gs://<bucket>/litellm-vertex-files/.../predictions.jsonl
RAW_BUFFERClick to expand / collapse

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate. Related: #19925 (closed/stale, but the symptom still reproduces) and #14044.

What happened?

Vertex AI Batch jobs submitted through LiteLLM Proxy complete successfully and surface in the UI logs, but batch cost and token usage are always recorded as 0. OpenAI Batch jobs in the same proxy attribute cost and tokens correctly.

This is not the same root cause that the original reporter on #19925 hit (Vertex usageMetadata at the root / wrapped in a list). After PR #25627 ("transform batch prediction outputs to OpenAI format"), the GCS predictions.jsonl is now automatically rewritten into OpenAI batch shape before any downstream consumer sees it — but the cost-tracking path was not updated to match.

The net effect is the same as #19925 from a user/UI perspective ($0 cost, 0 tokens for completed Vertex batches), so users assume Vertex Batch tracking is unsupported. It is partially supported — only the cost/usage step is broken.

Expected: Vertex Batch jobs that complete via /v1/batches show non-zero spend, prompt_tokens, and completion_tokens in the proxy UI logs / spend logs / S3 callbacks, matching the behavior of OpenAI batches.

Actual: spend = 0, prompt_tokens = 0, completion_tokens = 0 on every completed Vertex batch, even though the batch finished successfully and the output file is present in GCS.

Root cause

The Vertex batch cost path has a format mismatch with the file-content path. Walking the flow that the enterprise CheckBatchCost poller takes:

  1. CheckBatchCost.check_batch_cost() polls LiteLLM_ManagedObjectTable and calls llm_router.aretrieve_batch(model=model_id, batch_id=batch_id). Status transitions to completed correctly (JOB_STATE_SUCCEEDEDcompleted via VertexAIBatchTransformation._get_batch_job_status_from_vertex_ai_batch_response).

  2. CheckBatchCost downloads the output file via afile_content(file_id=raw_output_file_id, **credentials) (enterprise/litellm_enterprise/proxy/common_utils/check_batch_cost.py around line 258).

  3. For custom_llm_provider == \"vertex_ai\", afile_content routes to VertexAIFilesHandler.afile_content (litellm/llms/vertex_ai/files/handler.py:148). That downloads predictions.jsonl from GCS and then runs VertexAIFilesConfig.transform_file_content_response (litellm/llms/vertex_ai/files/transformation.py:570), which calls _try_transform_vertex_batch_output_to_openai.

    This was added in PR #25627. After this call, the bytes returned to the caller are OpenAI batch JSONL:

    {
      \"id\": \"batch_req_...\",
      \"custom_id\": \"request-1\",
      \"response\": {
        \"status_code\": 200,
        \"request_id\": \"chatcmpl-...\",
        \"body\": { /* OpenAI ChatCompletion ModelResponse, including \"usage\": {\"prompt_tokens\": ..., \"completion_tokens\": ..., \"total_tokens\": ...} */ }
      },
      \"error\": null
    }

    The original Vertex shape (response.usageMetadata.{promptTokenCount,candidatesTokenCount,totalTokenCount}) is gone unless the caller has set litellm.disable_vertex_batch_output_transformation = True.

  4. CheckBatchCost then calls calculate_batch_cost_and_usage(file_content_dictionary, custom_llm_provider=\"vertex_ai\", model_name=..., model_info=deployment_model_info).

  5. Inside litellm/batches/batch_utils.py, _batch_cost_calculator (line ~117) takes the Vertex branch:

    if custom_llm_provider == \"vertex_ai\" and model_name:
        batch_cost, _ = calculate_vertex_ai_batch_cost_and_usage(
            file_content_dictionary, model_name
        )
        return batch_cost

    calculate_vertex_ai_batch_cost_and_usage (line ~134) reads response.usageMetadata.promptTokenCount / candidatesTokenCount / totalTokenCount from each line. Those fields do not exist on the now-OpenAI-shaped JSONL, so every line evaluates to _prompt=0, _completion=0, _total=0, and total_cost stays at 0.0. The usage Usage(...) returned to _get_batch_job_total_usage_from_file_content for the vertex branch (line ~366) is also all zeros.

So today: file content is OpenAI-shaped, but the cost calculator only knows how to read raw Vertex shape. Two paths that both touch Vertex have drifted out of sync.

Secondary issues uncovered while triaging

These should be fixed at the same time so the path is robust:

  1. Deployment-level batch pricing is ignored on the Vertex branch. _batch_cost_calculator only passes model_info to _get_batch_job_cost_from_file_content (non-vertex branch). The vertex branch calls calculate_vertex_ai_batch_cost_and_usage, which internally calls batch_cost_calculator(...) without model_info. This means custom per-deployment overrides like input_cost_per_token_batches / output_cost_per_token_batches in model_info (which the CheckBatchCost call site already plumbs through as deployment_model_info) do not apply to Vertex even after the primary bug is fixed.

  2. _handle_completed_batch is dead for Vertex. _get_batch_output_file_content_as_dictionary (litellm/batches/batch_utils.py:222) unconditionally raises:

    if custom_llm_provider == \"vertex_ai\":
        raise ValueError(\"Vertex AI does not support file content retrieval\")

    This is reachable from any SDK / hook path that calls _handle_completed_batch (e.g. a future on-retrieve cost callback, or anyone using the SDK helper directly). It should now succeed because afile_content does work for vertex_ai (it goes through VertexAIFilesHandler.afile_content).

  3. disable_vertex_batch_output_transformation is a footgun. If a user sets this flag (added in commit 439217511a so they can consume raw predictions.jsonl), the file content remains in Vertex shape and the current cost calculator would work — but most users won't set it. A robust fix must handle both shapes regardless of the flag.

Steps to Reproduce

Minimal config (no provider credentials shown):

# config.yaml
model_list:
  - model_name: gemini-batch
    litellm_params:
      model: vertex_ai/gemini-2.0-flash-001
      vertex_project: <your-project>
      vertex_location: us-central1
      vertex_credentials: <path-or-json>

general_settings:
  master_key: sk-1234
  database_url: <postgres-url>

litellm_settings:
  callbacks: [\"prometheus\"]
  1. Start proxy: uv run litellm --config config.yaml --port 4000.
  2. Upload a Vertex batch input file via OpenAI SDK against the proxy (POST /v1/files with purpose=\"batch\" and extra_headers={\"custom-llm-provider\": \"vertex_ai\"}).
  3. Create a batch via POST /v1/batches with endpoint=/v1/chat/completions, input_file_id=<id from step 2>, extra_headers={\"custom-llm-provider\": \"vertex_ai\"}.
  4. Wait for JOB_STATE_SUCCEEDED. CheckBatchCost should poll and emit the success log callback.
  5. Inspect the proxy UI log entry for the batch, or query LiteLLM_SpendLogs for the row created by CheckBatchCost.

Observed: spend = 0.0, prompt_tokens = 0, completion_tokens = 0, total_tokens = 0. Identical input run against an OpenAI deployment records non-zero values.

To confirm the diagnosis without rerunning a full batch:

from litellm.batches.batch_utils import calculate_vertex_ai_batch_cost_and_usage

# Vertex line shape POST PR #25627 transformation (what CheckBatchCost actually sees today):
openai_shaped = [{
    \"id\": \"batch_req_1\",
    \"custom_id\": \"request-1\",
    \"response\": {
        \"status_code\": 200,
        \"request_id\": \"chatcmpl-xyz\",
        \"body\": {
            \"id\": \"chatcmpl-xyz\",
            \"model\": \"gemini-2.0-flash-001\",
            \"usage\": {\"prompt_tokens\": 42, \"completion_tokens\": 17, \"total_tokens\": 59},
            \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"...\"}, \"finish_reason\": \"stop\"}]
        }
    },
    \"error\": None,
}]

cost, usage = calculate_vertex_ai_batch_cost_and_usage(openai_shaped, model_name=\"gemini-2.0-flash-001\")
print(cost, usage)   # -> 0.0, Usage(prompt_tokens=0, completion_tokens=0, total_tokens=0)

Requirements for fix

  1. Vertex AI batch jobs that complete via /v1/batches MUST record non-zero spend, prompt_tokens, completion_tokens, and total_tokens in proxy spend logs, the UI log table, S3/Prometheus callbacks, and CheckBatchCost's emitted success event.
  2. The fix MUST work whether the GCS output has been transformed to OpenAI shape (default since PR #25627) or left in raw Vertex shape (litellm.disable_vertex_batch_output_transformation = True).
  3. Deployment-level batch pricing overrides (input_cost_per_token_batches, output_cost_per_token_batches, cache_read_input_token_cost, etc. in model_info) MUST apply to Vertex the same way they apply to OpenAI/Azure batches.
  4. _handle_completed_batch MUST succeed for custom_llm_provider=\"vertex_ai\" and return a valid (cost, usage, models) tuple.
  5. Unit + integration tests covering both line shapes (Vertex-raw and OpenAI-transformed) so this can't silently regress again.
  6. No behavior change for OpenAI / Azure / Bedrock / Anthropic batches.

Proposed steps to fix

(Direction only — happy to defer to maintainers on the exact factoring.)

  1. Make the Vertex cost calculator format-aware. In litellm/batches/batch_utils.py::calculate_vertex_ai_batch_cost_and_usage, detect the line shape and extract usage from whichever is present:
    • If response.usageMetadata exists → use Vertex keys (promptTokenCount / candidatesTokenCount / totalTokenCount).
    • Else if response.body.usage (or response.body.usage.prompt_tokens) exists → use OpenAI keys.
    • Else → log debug and skip the line.
  2. Plumb model_info through the Vertex branch. Extend the signature of calculate_vertex_ai_batch_cost_and_usage (and the caller in _batch_cost_calculator) to accept model_info: Optional[ModelInfo] = None, and pass it into batch_cost_calculator(usage=..., model=..., custom_llm_provider=\"vertex_ai\", model_info=model_info) so input_cost_per_token_batches overrides apply.
  3. Reuse the existing OpenAI cost path when the file is OpenAI-shaped. Lowest-risk alternative to (1): if the first line of file_content_dictionary has the OpenAI shape (response.body.usage), fall through to _get_batch_job_cost_from_file_content with custom_llm_provider=\"vertex_ai\" and model_info — this already iterates response.body.usage correctly and respects model_info. Keep the Vertex-raw branch as a fallback for users who set disable_vertex_batch_output_transformation = True.
  4. Remove the dead ValueError for Vertex in _get_batch_output_file_content_as_dictionary. Now that afile_content for vertex_ai actually works (downloads from GCS, transforms), _handle_completed_batch should be supported. The function already takes the same litellm_params that the call site holds, so credentials propagation is fine.
  5. Unify the usage extraction in _get_batch_job_total_usage_from_file_content. Same dual-shape detection as (1), so emitted usage matches emitted cost.
  6. Tests in tests/batches_tests/:
    • test_calculate_vertex_ai_batch_cost_and_usage_openai_shape — feed the OpenAI-transformed line, expect non-zero cost matching model_prices_and_context_window.json.
    • test_calculate_vertex_ai_batch_cost_and_usage_raw_vertex_shape — feed raw {response: {usageMetadata: {...}}} line, expect equivalent cost (regression coverage for #19925).
    • test_calculate_vertex_ai_batch_cost_with_deployment_input_cost_per_token_batches — confirm model_info-based override is applied.
    • test_handle_completed_batch_vertex_ai — end-to-end through _handle_completed_batch with a mocked afile_content, asserting non-zero cost and that the ValueError is gone.
    • End-to-end CheckBatchCost test (already mocked extensively in tests/test_litellm/enterprise/proxy/test_file_deletion_blocking.py etc.) — extend to assert spend/usage on the emitted success event.

Why this matters

Many production users plan to shift large pipelines onto Vertex Batch specifically because of the lower batch pricing — the whole point of using batch is to track and optimize that lower spend, and right now LiteLLM silently records it as $0. Customers using LiteLLM for cost attribution / chargeback to teams cannot adopt Vertex Batch until this is closed.

Relevant log output

# verbose_logger.info from calculate_vertex_ai_batch_cost_and_usage on a completed Vertex batch
vertex_ai batch cost: cost=0.0, prompt=0, completion=0, total=0

(Job state on the same batch confirms it succeeded:)

# from llm_router.aretrieve_batch on the same batch_id
status=completed, output_file_id=gs://<bucket>/litellm-vertex-files/.../predictions.jsonl

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

Reproduced against the current main branch as of 2026-05-13 (commit 7c94149). All referenced line numbers are against that commit. PR #25627 (the auto-transformation) merged 2026-05-02 and is the inflection point.

Twitter / LinkedIn details

No response

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

litellm - 💡(How to fix) Fix [Bug]: Vertex AI Batch usage/cost recorded as 0 after batch output is auto-transformed to OpenAI format [1 participants]