langchain - 💡(How to fix) Fix Bug: Inflated Token Usage Reporting for accounts/fireworks/models/qwen3p6-plus with reasoning_effort="none"

langchain · 2026-05-28 18:34:32

We are tracking token usage and cost accounting across multiple Fireworks-hosted and external models using standard LangChain callbacks and response usage_metadata.

All other Fireworks models report usage correctly. Recently, however, with no changes to our local code, the reasoning model accounts/fireworks/models/qwen3p6-plus began returning massively inflated token counts.

Previously, we worked around token accounting bugs on this model by explicitly setting "reasoning_effort": "none" in the provider's model_kwargs. That workaround no longer helps: the API response payload now contains extremely large, invalid integer values for the usage metrics.

What we are doing: Instantiating the accounts/fireworks/models/qwen3p6-plus model via the init_chat_model utility with reasoning_effort set to "none".

What we expect to happen: The model should return reasonable, accurate token counts matching the actual prompt size and completion size (e.g., ~400 to 600 total tokens for a standard query of ~300 words).

What is currently happening: The model returns massively inflated numbers (e.g., total_tokens: 1229418586), breaking downstream cost tracking and database operations.

Error Message

No Python exception is raised during invocation; instead, the returned response metadata contains corrupted integer values:

{
  "token_usage": {
    "prompt_tokens": 125657,
    "completion_tokens": 1229292929,
    "total_tokens": 1229418586
  }
}
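Because nothing is raised, the corruption only shows up when the numbers are inspected. Note that the reported total_tokens equals prompt_tokens + completion_tokens (125657 + 1229292929 = 1229418586), so a sum-consistency check alone would not catch it. Below is a minimal sketch of a plausibility guard that could run before usage is written to a cost table. The function name usage_problems and the MAX_PLAUSIBLE_TOKENS ceiling are hypothetical, not part of LangChain or Fireworks, and the ceiling should be tuned to the model's real context window.

```python
from typing import Mapping

# Assumed upper bound for one request; NOT a documented Fireworks limit.
MAX_PLAUSIBLE_TOKENS = 2_000_000

USAGE_KEYS = ("prompt_tokens", "completion_tokens", "total_tokens")


def usage_problems(token_usage: Mapping[str, int],
                   ceiling: int = MAX_PLAUSIBLE_TOKENS) -> list[str]:
    """Return human-readable problems found in a provider token_usage dict."""
    problems = []
    for key in USAGE_KEYS:
        value = token_usage.get(key)
        if not isinstance(value, int):
            problems.append(f"{key} is missing or not an integer")
        elif value < 0:
            problems.append(f"{key} is negative: {value}")
        elif value > ceiling:
            problems.append(f"{key} exceeds ceiling {ceiling}: {value}")

    # Consistency check, only when all three counts are present.
    counts = [token_usage.get(key) for key in USAGE_KEYS]
    if all(isinstance(v, int) for v in counts):
        prompt, completion, total = counts
        if prompt + completion != total:
            problems.append("total_tokens != prompt_tokens + completion_tokens")
    return problems


# Values copied from the payload in this report.
reported = {
    "prompt_tokens": 125657,
    "completion_tokens": 1229292929,
    "total_tokens": 1229418586,
}
for problem in usage_problems(reported):
    print(problem)
```

A caller could skip or quarantine the record whenever the returned list is non-empty, rather than letting an oversized integer reach the database.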

Code Example

import os
from langchain.chat_models import init_chat_model

# Use FIREWORKS_API_KEY from the environment; set the placeholder only if it is missing
os.environ.setdefault("FIREWORKS_API_KEY", "your_fireworks_api_key_here")

# Initialize using standard dynamic model initialization
llm = init_chat_model(
    model="accounts/fireworks/models/qwen3p6-plus",
    model_provider="fireworks",
    model_kwargs={
        "reasoning_effort": "none"  # Workaround that no longer prevents the issue
    }
)

# Execute query
response = llm.invoke("Explain vector databases in simple terms")

# Inspect response outputs
print("--- RESPONSE METADATA ---")
print(response.response_metadata)

print("\n--- USAGE METADATA ---")
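# usage_metadata can be present but None, so test the value rather than the attribute
if getattr(response, "usage_metadata", None) is not None: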
    print(response.usage_metadata)
else:
    print("No usage_metadata found on response object.")
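The script above prints two views of usage: the provider-style token_usage inside response_metadata and LangChain's normalized usage_metadata (keys input_tokens, output_tokens, total_tokens). Comparing them can hint at where the inflation enters. The helper below is a rough sketch; compare_usage is a hypothetical name, and the example values for usage_metadata are assumed to mirror the payload in this report, since the report does not show them.

```python
from typing import Any, Mapping, Optional

# Provider-style keys (as in the payload from this report) mapped to
# LangChain's normalized usage_metadata keys.
KEY_PAIRS = {
    "prompt_tokens": "input_tokens",
    "completion_tokens": "output_tokens",
    "total_tokens": "total_tokens",
}


def compare_usage(response_metadata: Mapping[str, Any],
                  usage_metadata: Optional[Mapping[str, Any]]) -> dict:
    """Return {provider_key: (raw_value, normalized_value)} for each mismatch."""
    raw = response_metadata.get("token_usage") or {}
    normalized = usage_metadata or {}
    return {
        raw_key: (raw.get(raw_key), normalized.get(norm_key))
        for raw_key, norm_key in KEY_PAIRS.items()
        if raw.get(raw_key) != normalized.get(norm_key)
    }


# Raw values from this report; the normalized values are an assumption.
raw_meta = {
    "token_usage": {
        "prompt_tokens": 125657,
        "completion_tokens": 1229292929,
        "total_tokens": 1229418586,
    }
}
assumed_usage = {
    "input_tokens": 125657,
    "output_tokens": 1229292929,
    "total_tokens": 1229418586,
}
# An empty dict means both views agree.
print(compare_usage(raw_meta, assumed_usage))
```

In the reproduction this would be called as compare_usage(response.response_metadata, response.usage_metadata). If the result is empty while both views are inflated, that would suggest the numbers already arrive inflated in the provider payload; a non-empty result would point toward the normalization step instead.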

Submission checklist

This is a bug, not a usage question.
I added a clear and descriptive title that summarizes this issue.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
This is not related to the langchain-community package.
I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Package (Required)

Related Issues / PRs

No response

System Info

OS: macOS (ARM64)
Python Version: 3.13
langchain: >=1.2.15
langchain-core: >=0.3.0
langchain-openai: >=1.2.1
langchain-fireworks: >=1.3.1
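The versions above are lower-bound specifiers rather than the versions actually installed, which makes it harder to tell whether a package update coincided with the regression. As a small sketch, the exact installed versions can be read with the standard library's importlib.metadata; the helper name installed_versions is hypothetical, and the package list is simply the one from this report.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the System Info in this report.
PACKAGES = (
    "langchain",
    "langchain-core",
    "langchain-openai",
    "langchain-fireworks",
)


def installed_versions(packages=PACKAGES) -> dict:
    """Map each distribution name to its installed version string."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # Keep the entry so a missing package is visible in the output.
            result[name] = "not installed"
    return result


for name, ver in installed_versions().items():
    print(f"{name}: {ver}")
```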
