langchain: Inflated Token Usage Reporting for accounts/fireworks/models/qwen3p6-plus with reasoning_effort="none"

We are tracking token usage and cost accounting across multiple Fireworks-hosted and external models using standard LangChain callbacks and response usage_metadata.

All other Fireworks models behave correctly. Recently, however, with no changes to our local code, the reasoning model accounts/fireworks/models/qwen3p6-plus began returning massively inflated, corrupted token counts.

Previously, we were able to work around token accounting bugs on this model by explicitly setting "reasoning_effort": "none" in the provider's model_kwargs. However, the workaround no longer mitigates the issue, and the API response payload now returns extremely large, invalid integer values for usage metrics.

What we are doing: Instantiating the accounts/fireworks/models/qwen3p6-plus model via the init_chat_model utility with reasoning_effort set to "none" in model_kwargs.

What we expect to happen: The model should return reasonable, accurate token counts matching the actual prompt size and completion size (e.g., ~400 to 600 total tokens for a standard query of ~300 words).

What is currently happening: The model returns massively inflated numbers (e.g., total_tokens: 1229418586), breaking downstream cost tracking and database operations.
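Until the counts are fixed at the source, downstream accounting can be shielded with a plausibility check before any usage record is written. The following is a minimal sketch, not part of LangChain: the function name is_plausible_usage and the ceiling MAX_PLAUSIBLE_TOTAL_TOKENS are hypothetical, and the ceiling should be tuned to the model's actual context and output limits.

```python
from typing import Mapping

# Hypothetical ceiling (an assumption, not a documented limit). Pick a value
# above the model's context window plus its maximum output length.
MAX_PLAUSIBLE_TOTAL_TOKENS = 2_000_000


def is_plausible_usage(
    token_usage: Mapping[str, object],
    max_total: int = MAX_PLAUSIBLE_TOTAL_TOKENS,
) -> bool:
    """Return True if the token counts look sane enough to record."""
    counts = [
        token_usage.get("prompt_tokens"),
        token_usage.get("completion_tokens"),
        token_usage.get("total_tokens"),
    ]
    # Every field must be present and be a non-negative integer.
    for count in counts:
        if not isinstance(count, int) or isinstance(count, bool) or count < 0:
            return False
    # Reject totals above the configured ceiling.
    return counts[2] <= max_total
```

Records that fail the check can be logged and skipped instead of being written to the cost database, so one corrupted response does not poison aggregate totals.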

Error Message

No Python exception is raised during invocation; instead, the returned response metadata contains corrupted integer values:

{
  "token_usage": {
    "prompt_tokens": 125657,
    "completion_tokens": 1229292929,
    "total_tokens": 1229418586
  }
}
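A quick arithmetic check on the reported values may help narrow down where the corruption sits. The sketch below only uses the numbers from the payload above plus the 600-token upper expectation stated in the description; it does not identify a root cause.

```python
# Values copied from the reported response metadata.
reported = {
    "prompt_tokens": 125657,
    "completion_tokens": 1229292929,
    "total_tokens": 1229418586,
}

# The total is exactly prompt + completion, so the sum itself is consistent.
sum_matches = (
    reported["prompt_tokens"] + reported["completion_tokens"]
    == reported["total_tokens"]
)
print(sum_matches)  # True

# Compare against the upper end of the expected range (~600 total tokens).
expected_upper = 600
inflation = reported["total_tokens"] / expected_upper
print(f"total is inflated by a factor of about {inflation:,.0f}")
```

Because the total equals the sum of its parts, the inflation appears to enter through the individual counts rather than through the addition. Note that prompt_tokens (125657) is also far above what a short query should produce, so both fields look affected.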

Code Example

import os
from langchain.chat_models import init_chat_model

# Set a placeholder only if FIREWORKS_API_KEY is not already in the environment
os.environ.setdefault("FIREWORKS_API_KEY", "your_fireworks_api_key_here")

# Initialize using standard dynamic model initialization
llm = init_chat_model(
    model="accounts/fireworks/models/qwen3p6-plus",
    model_provider="fireworks",
    model_kwargs={
        "reasoning_effort": "none"  # Workaround that no longer prevents the issue
    }
)

# Execute query
response = llm.invoke("Explain vector databases in simple terms")

# Inspect response outputs
print("--- RESPONSE METADATA ---")
print(response.response_metadata)

print("\n--- USAGE METADATA ---")
# usage_metadata may exist but be None when no usage was reported
usage = getattr(response, "usage_metadata", None)
if usage is not None:
    print(usage)
else:
    print("No usage_metadata found on response object.")
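To help tell whether the inflated numbers arrive in the provider payload or are introduced during normalization, the two metadata views printed above can be lined up field by field. This is a sketch under assumptions: compare_usage is a hypothetical helper, the raw key names are taken from the token_usage payload in this issue, and input_tokens / output_tokens / total_tokens are the keys LangChain uses in usage_metadata.

```python
from typing import Mapping, Optional


def compare_usage(
    response_metadata: Mapping,
    usage_metadata: Optional[Mapping],
) -> dict:
    """Pair provider-reported counts with LangChain-normalized counts."""
    raw = response_metadata.get("token_usage") or {}
    normalized = usage_metadata or {}
    pairs = {
        "input": (raw.get("prompt_tokens"), normalized.get("input_tokens")),
        "output": (raw.get("completion_tokens"), normalized.get("output_tokens")),
        "total": (raw.get("total_tokens"), normalized.get("total_tokens")),
    }
    # A field "matches" only when both views report the same value.
    return {
        name: {"raw": r, "normalized": n, "match": r == n}
        for name, (r, n) in pairs.items()
    }
```

It could be called as compare_usage(response.response_metadata, response.usage_metadata). If every field matches, the inflated values were probably already present in the API response and passed through unchanged; a mismatch would point toward the conversion layer instead.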

Submission checklist

  • This is a bug, not a usage question.
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • This is not related to the langchain-community package.
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

System Info

OS: macOS (ARM64)
Python Version: 3.13
langchain: >=1.2.15
langchain-core: >=0.3.0
langchain-openai: >=1.2.1
langchain-fireworks: >=1.3.1
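The versions listed here are lower bounds rather than exact pins, so it may help maintainers to see what is actually installed. A small sketch using the standard library's importlib.metadata follows; the helper name installed_versions is hypothetical, and the package list simply mirrors the entries above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the System Info listing.
PACKAGES = [
    "langchain",
    "langchain-core",
    "langchain-openai",
    "langchain-fireworks",
]


def installed_versions(packages):
    """Map each distribution name to its installed version string."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Report missing packages instead of raising.
            found[name] = "not installed"
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver}")
```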
