hermes - ✅(Solved) Fix Session auto-title generation silently fails with reasoning models (max_tokens too low) [1 pull request, 1 comment, 2 participants]

hermes · 2026-05-05 16:16:46


GitHub stats

NousResearch/hermes-agent#20305 • Fetched 2026-05-06 06:37:27


Author

WegoW

Participants

alt-glitch

WegoW

Timeline (top)

labeled ×4, commented ×1, cross-referenced ×1

Error Message

Session auto-title generation silently fails when using reasoning-capable LLMs (e.g., DeepSeek V4 Flash, Claude with thinking, QwQ). Session titles remain NULL with no user-facing error or warning. The failure is silent because generate_title logs at WARNING only when an exception is raised; when the response has finish_reason='length' and empty content, no exception is raised, so the code strips the empty string and returns None.

Root Cause

In agent/title_generator.py, the generate_title() function calls call_llm() with max_tokens=500. Reasoning models output a separate reasoning_content block before the actual response content. With only 500 tokens budgeted, these models frequently exhaust the entire budget on reasoning alone, producing finish_reason='length' and an empty content string. The function strips the empty string and returns None.

# Current upstream (line ~60):
response = call_llm(
    task="title_generation",
    messages=messages,
    max_tokens=500,        # <-- too small for reasoning models
    temperature=0.3,
    timeout=timeout,
    main_runtime=main_runtime,
)
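The failure path described above can be reproduced without a real model. The sketch below is not the actual body of generate_title(), which is not shown here; it assumes an OpenAI-style response shape (choices[0].message.content, finish_reason), and the helpers extract_title and fake_response are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


def extract_title(response):
    """Return the stripped title, or None when the content is empty."""
    content = response.choices[0].message.content or ""
    return content.strip() or None


def fake_response(content, finish_reason):
    """Build a stand-in for an OpenAI-style chat completion (assumed shape)."""
    message = SimpleNamespace(content=content, reasoning_content="...")
    choice = SimpleNamespace(message=message, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


# Budget exhausted during reasoning: content is empty, nothing raises.
truncated = fake_response("", "length")
# Enough budget: the model finishes and returns a title.
complete = fake_response("  Fix login redirect bug  ", "stop")

print(extract_title(truncated))  # None
print(extract_title(complete))   # Fix login redirect bug
```

The truncated case returns None through the normal return path, which is why a handler that only logs on exceptions never sees it.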

Fix Action

Fix

Increase max_tokens from 500 to at least 2000 in agent/title_generator.py:

max_tokens=2000,

This provides enough headroom for the reasoning phase while still leaving tokens for the actual title output. The title itself is only 3–7 words, so 2000 tokens is generous but lightweight enough to not meaningfully impact cost or latency.

PR fix notes

PR #20338: fix(title-gen): increase max_tokens for reasoning model compatibility

Repository: NousResearch/hermes-agent
Author: luyao618
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/20338

Description (problem / solution / changelog)

Summary

Fixes #20305 — Session auto-title generation silently fails with reasoning models.

Problem

Reasoning-capable LLMs (DeepSeek V4 Flash, Claude with thinking, QwQ, etc.) output a reasoning_content block before the actual response content. With only 500 tokens budgeted in generate_title(), these models exhaust the entire budget on reasoning alone, producing finish_reason='length' and empty content. The function strips the empty string and returns None — leaving session titles as NULL with no log warning.

Changes

agent/title_generator.py: Increase max_tokens from 500 to 2000 to provide enough headroom for the reasoning phase while still being lightweight for cost/latency
Add a WARNING-level log message when title generation returns empty content, making the previously silent failure path visible in agent.log
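A minimal sketch of the second change, assuming the check sits right where the content is stripped. The PR's actual log message and variable names are not shown here, so title_or_warn, the logger name, and the message text below are hypothetical.

```python
import logging

# Hypothetical logger name; the real module's logger is not shown in the PR notes.
logger = logging.getLogger("title_generator_sketch")


def title_or_warn(content, finish_reason):
    """Return the stripped title, logging a WARNING when it is empty."""
    title = (content or "").strip()
    if not title:
        # Including finish_reason lets 'length' truncation be told apart
        # from other causes of an empty reply.
        logger.warning(
            "Title generation returned empty content (finish_reason=%s)",
            finish_reason,
        )
        return None
    return title
```

With a check like this, a truncated reasoning response leaves a trace in the log even though no exception is raised.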

Testing

All 21 existing title_gen tests pass (pytest -k title_gen: 21 passed, 4 skipped)
The title itself is only 3–7 words, so 2000 tokens is generous but ensures reasoning models can complete both their thinking and the actual output

Risk

Minimal — only affects the token budget for a fire-and-forget background task. No behavioral change for non-reasoning models (they don't use the extra headroom).

Changed files

agent/title_generator.py (modified, +3/-1)

Description

Symptoms

hermes sessions list shows sessions with empty titles
SQLite query shows many rows with title IS NULL
Agent log shows repeated INFO agent.auxiliary_client: Auxiliary title_generation: using auto … entries
No Title generation failed warning in logs (the function returns None without raising when content is empty)
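The SQLite symptom can be checked with a short script. This sketch runs against an in-memory database so it is self-contained; the table name sessions and the id column are assumptions (only title IS NULL comes from the report), and the real Hermes schema and database path are not shown here.

```python
import sqlite3

# In-memory stand-in for the session store; the schema is assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO sessions (id, title) VALUES (?, ?)",
    [("s1", None), ("s2", "Refactor config loader"), ("s3", None)],
)


def count_untitled(connection):
    """Count sessions whose auto-generated title was never stored."""
    row = connection.execute(
        "SELECT COUNT(*) FROM sessions WHERE title IS NULL"
    ).fetchone()
    return row[0]


print(count_untitled(conn))  # 2
```

Running the same query before and after the fix gives a quick measure of whether new sessions are still left untitled.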

Steps to Reproduce

Configure a reasoning-capable model (e.g., DeepSeek V4 Flash, Claude Sonnet with thinking enabled)
Start a new session and send a message
Wait for the auto-title generation to complete
Check the session title via /title or hermes sessions list → it will be empty

Environment

Hermes Agent version: current main
Model affected: DeepSeek V4 Flash (and any other model that outputs reasoning_content)
Provider: OpenRouter / opencode-go (but provider-agnostic)

Additional Context

This is a silent failure because generate_title only logs at WARNING on exceptions, but when finish_reason='length' with empty content, no exception is raised — the code simply strips the empty string and returns None silently.

The same max_tokens issue could affect other auxiliary tasks (compression, web_extract, session_search) when used with reasoning models.

extent analysis

TL;DR

Increase the max_tokens parameter in agent/title_generator.py to at least 2000 to accommodate the output of reasoning-capable LLMs.

Guidance

Confirm that the max_tokens limit is the cause by checking finish_reason in the LLM response; it is 'length' when the token budget is exhausted.
Raise max_tokens to 2000 or more in agent/title_generator.py so that both the reasoning phase and the title output fit within the budget.
Test the fix by following the steps to reproduce and checking that the session title is now generated.
Consider reviewing other auxiliary tasks (e.g., compression, web_extract, session_search), which may hit the same max_tokens limit when used with reasoning models.

Example

response = call_llm(
    task="title_generation",
    messages=messages,
    max_tokens=2000,  # Increased token budget
    temperature=0.3,
    timeout=timeout,
    main_runtime=main_runtime,
)

Notes

The fix assumes that increasing the max_tokens value will not significantly impact cost or latency. However, this may vary depending on the specific use case and model configuration.
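The cost side of this note can be bounded with simple arithmetic, because max_tokens is a ceiling rather than a target. The sketch below assumes reasoning tokens are billed as output tokens, which depends on the provider, and uses a placeholder price that is not a quoted rate for any model.

```python
def worst_case_extra_cost(old_max_tokens, new_max_tokens, usd_per_million_output_tokens):
    """Upper bound on added cost per call when the ceiling is raised."""
    extra_tokens = max(0, new_max_tokens - old_max_tokens)
    return extra_tokens * usd_per_million_output_tokens / 1_000_000


# Hypothetical price, for illustration only.
PLACEHOLDER_USD_PER_MILLION = 1.00

# Raising the ceiling from 500 to 2000 adds at most 1500 output tokens per title.
print(worst_case_extra_cost(500, 2000, PLACEHOLDER_USD_PER_MILLION))  # 0.0015
```

This is a per-call upper bound: a model that finishes its title early is billed only for the tokens it actually generates.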

Recommendation

Apply the workaround by increasing the max_tokens value to at least 2000, as this provides a straightforward solution to the silent failure issue caused by the token budget limit.
