ollama - 💡(How to fix) Fix add the support for the token calculations and reduce the input tokens while answering [1 participant]

ollama · 2026-04-17 01:33:50

GitHub stats

ollama/ollama#15639 • Fetched 2026-04-17 08:27:00

Author

KunjShah95

Participants

KunjShah95

Timeline (top)

labeled ×1

RAW_BUFFER

Please add support for tracking how many tokens are spent on the selected Ollama models, whether they run locally or in the cloud, and store the recently used ones in a cache so that the cost can be reduced.

extent analysis

TL;DR

Implement a token tracking mechanism that monitors and stores the usage of the selected Ollama models, both local and cloud-based, so that cost can be optimized.

Guidance

Identify the components involved in tracking token usage: the Ollama models being called and the cache that stores their usage.
Design a data structure that stores the usage information for each model: the model name, usage count, and timestamp.
Develop a mechanism that updates the usage information in real time for both local and cloud-based models.
Consider a caching strategy that keeps recently used models together with their token usage.
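
The data structure and caching strategy described above could be sketched as follows. This is only an illustration under assumptions: `UsageRecord`, `RecentUsageCache`, and the `max_models` limit are hypothetical names that do not come from the issue or from Ollama, and eviction of the least recently used model is one possible policy among several.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class UsageRecord:
    """Per-model usage: name, usage count, token total, timestamp (hypothetical)."""
    model_name: str
    request_count: int = 0
    total_tokens: int = 0
    last_used: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class RecentUsageCache:
    """Keeps records for the most recently used models, evicting the oldest."""

    def __init__(self, max_models=8):
        self.max_models = max_models
        self._records = OrderedDict()  # model name -> UsageRecord, oldest first

    def record(self, model_name, token_count):
        rec = self._records.pop(model_name, None)
        if rec is None:
            rec = UsageRecord(model_name)
        rec.request_count += 1
        rec.total_tokens += token_count
        rec.last_used = datetime.now(timezone.utc)
        self._records[model_name] = rec  # re-insert as the most recent entry
        while len(self._records) > self.max_models:
            self._records.popitem(last=False)  # drop the least recently used
        return rec

    def get(self, model_name):
        return self._records.get(model_name)

    def models(self):
        # Model names ordered from least to most recently used
        return list(self._records)
```

Bounding the cache keeps memory use predictable. The trade-off is that totals for evicted models are lost unless they are written somewhere else first.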

Example

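# Example of a simple token usage tracker
class TokenTracker:
    """Accumulates token counts per model name in memory."""

    def __init__(self):
        # Maps model name -> total tokens recorded so far
        self.usage_cache = {}

    def update_usage(self, model_name, token_count):
        # Add to the running total, starting from 0 for a model not seen before
        self.usage_cache[model_name] = self.usage_cache.get(model_name, 0) + token_count

    def get_usage(self, model_name):
        # Models that were never recorded report 0 tokens
        return self.usage_cache.get(model_name, 0)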
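
A tracker like this needs token counts from somewhere. Ollama's `/api/generate` and `/api/chat` responses report them in the `prompt_eval_count` (input) and `eval_count` (output) fields, so one option is to read them from each response body. The sketch below assumes the response has already been parsed into a dictionary. The helper names and the sample values are hypothetical, and whether cloud-hosted models report the same fields is an assumption to verify.

```python
from collections import defaultdict
from typing import Tuple


def tokens_from_response(response: dict) -> Tuple[int, int]:
    """Return (input_tokens, output_tokens) from a parsed Ollama response."""
    # Either field can be absent (for example on non-final streaming chunks),
    # so fall back to 0 instead of raising KeyError.
    return response.get("prompt_eval_count", 0), response.get("eval_count", 0)


# Running totals per model, split into input and output tokens
usage = defaultdict(lambda: {"input": 0, "output": 0})


def record_response(response: dict) -> None:
    """Add the token counts of one response to the per-model totals."""
    input_tokens, output_tokens = tokens_from_response(response)
    model = response.get("model", "unknown")
    usage[model]["input"] += input_tokens
    usage[model]["output"] += output_tokens


# Illustrative response body; the model name and numbers are made up
sample = {"model": "llama3", "done": True, "prompt_eval_count": 26, "eval_count": 290}
record_response(sample)
```

Keeping input and output tokens separate makes it easier to see whether long prompts or long answers account for most of the usage.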

Notes

Implementation details depend on the specific requirements and the technology stack. The example keeps totals in memory only and shows just the basic idea of tracking token usage.

Recommendation

Apply workaround: Implement a custom token tracking mechanism, as described in the Guidance section, to monitor and optimize token usage for the selected Ollama models.
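
For the "reduce the input tokens" half of the request, one possible client-side approach is to trim older chat messages before each call so that the prompt stays within a budget. The sketch below is an assumption-heavy illustration: `estimate_tokens` uses a rough four-characters-per-token heuristic rather than a real tokenizer, and `trim_history` and `max_input_tokens` are hypothetical names. Actual counts are reported by Ollama in `prompt_eval_count` and can be used to check the estimate.

```python
def estimate_tokens(text):
    """Very rough estimate: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def trim_history(messages, max_input_tokens):
    """Keep the newest messages whose estimated tokens fit the budget.

    A leading system message, if present, is always kept.
    """
    system = messages[:1] if messages and messages[0]["role"] == "system" else []
    rest = messages[len(system):]
    budget = max_input_tokens - sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    # Walk from newest to oldest and stop at the first message that does not fit
    for msg in reversed(rest):
        cost = estimate_tokens(msg["content"])
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    return system + kept[::-1]  # restore chronological order
```

Dropping the oldest turns first is the simplest policy. Summarizing them instead would preserve more context at the price of an extra model call.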
