hermes - ✅ (Solved) Fix: Session auto-title generation silently fails with reasoning models (max_tokens too low) [1 pull request, 1 comment, 2 participants]

GitHub stats
NousResearch/hermes-agent#20305 (fetched 2026-05-06 06:37:27)
Comments: 1 · Participants: 2 · Timeline events: 6 · Reactions: 0
Timeline (top): labeled ×4, commented ×1, cross-referenced ×1

Session auto-title generation silently fails when using reasoning-capable LLMs (e.g., DeepSeek V4 Flash, Claude with thinking, QwQ, etc.). The resulting session titles remain NULL with no user-facing error or warning.

Error Message

No error message is produced, which is the core of the problem. generate_title only logs at WARNING on exceptions, but when the response arrives with finish_reason='length' and empty content, no exception is raised; the code strips the empty string and returns None.

Root Cause

In agent/title_generator.py, the generate_title() function calls call_llm() with max_tokens=500. Reasoning models output a separate reasoning_content block before the actual response content. With only 500 tokens budgeted, these models frequently exhaust the entire budget on reasoning alone, producing finish_reason='length' and an empty content string. The function strips the empty string and returns None.

# Current upstream (line ~60):
response = call_llm(
    task="title_generation",
    messages=messages,
    max_tokens=500,        # <-- too small for reasoning models
    temperature=0.3,
    timeout=timeout,
    main_runtime=main_runtime,
)
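To make the failure mode concrete, here is a minimal simulation rather than the Hermes code. `fake_reasoning_call` and `FakeResponse` are hypothetical stand-ins for `call_llm()` and its return value, tokens are approximated as words, and the 800-token reasoning cost is an illustrative assumption, not a measurement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an LLM response; not the real call_llm() return type."""
    reasoning_content: str
    content: str
    finish_reason: str


def fake_reasoning_call(max_tokens: int, reasoning_tokens: int, title: str) -> FakeResponse:
    """Simulate a model that emits reasoning tokens before any visible content.

    Tokens are approximated as whitespace-separated words (an assumption).
    """
    title_words = title.split()
    spent_on_reasoning = min(reasoning_tokens, max_tokens)
    remaining = max_tokens - spent_on_reasoning
    emitted = title_words[:remaining]
    truncated = spent_on_reasoning < reasoning_tokens or len(emitted) < len(title_words)
    return FakeResponse(
        reasoning_content="think " * spent_on_reasoning,
        content=" ".join(emitted),
        finish_reason="length" if truncated else "stop",
    )


def title_from(response: FakeResponse) -> Optional[str]:
    """Mirror the behavior described above: strip the content, return None if empty."""
    return (response.content or "").strip() or None


# Illustrative numbers: 800 reasoning tokens do not fit in a 500-token budget.
small = fake_reasoning_call(max_tokens=500, reasoning_tokens=800, title="Fix silent title failure")
large = fake_reasoning_call(max_tokens=2000, reasoning_tokens=800, title="Fix silent title failure")
```

With the smaller budget the simulated model never reaches the title, so `title_from(small)` yields `None` without any exception; with the larger budget the same call completes normally.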

Fix Action

Increase max_tokens from 500 to at least 2000 in agent/title_generator.py:

max_tokens=2000,

This provides enough headroom for the reasoning phase while still leaving tokens for the actual title output. The title itself is only 3–7 words, so 2000 tokens is generous but lightweight enough to not meaningfully impact cost or latency.

PR fix notes

PR #20338: fix(title-gen): increase max_tokens for reasoning model compatibility

Description (problem / solution / changelog)

Summary

Fixes #20305 — Session auto-title generation silently fails with reasoning models.

Problem

Reasoning-capable LLMs (DeepSeek V4 Flash, Claude with thinking, QwQ, etc.) output a reasoning_content block before the actual response content. With only 500 tokens budgeted in generate_title(), these models exhaust the entire budget on reasoning alone, producing finish_reason='length' and empty content. The function strips the empty string and returns None — leaving session titles as NULL with no log warning.

Changes

  • agent/title_generator.py: Increase max_tokens from 500 to 2000 to provide enough headroom for the reasoning phase while still being lightweight for cost/latency
  • Add a WARNING-level log message when title generation returns empty content, making the previously silent failure path visible in agent.log
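The second change could look roughly like the sketch below. The PR diff itself is not shown here, so the helper name `extract_title`, the logger name, the log message text, and the OpenAI-style response shape (`choices[0].message.content`, `choices[0].finish_reason`) are all assumptions made for illustration.

```python
import logging
from types import SimpleNamespace
from typing import Optional

# Hypothetical logger name; the real module's logger is not shown in the source.
logger = logging.getLogger("title_generator_sketch")


def extract_title(response) -> Optional[str]:
    """Return the stripped title, or None (with a WARNING) when content is empty.

    Assumes an OpenAI-style response object; this shape is an assumption.
    """
    choice = response.choices[0]
    content = (choice.message.content or "").strip()
    if not content:
        # Make the previously silent path visible in the log.
        logger.warning(
            "Title generation returned empty content (finish_reason=%s)",
            choice.finish_reason,
        )
        return None
    return content


def make_response(content, finish_reason):
    """Build a stub response for demonstration purposes only."""
    message = SimpleNamespace(content=content)
    return SimpleNamespace(
        choices=[SimpleNamespace(message=message, finish_reason=finish_reason)]
    )
```

Including `finish_reason` in the message lets a reader of the log tell a budget exhaustion (`length`) apart from other causes of an empty reply.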

Testing

  • All 21 existing title_gen tests pass (pytest -k title_gen: 21 passed, 4 skipped)
  • The title itself is only 3–7 words, so 2000 tokens is generous but ensures reasoning models can complete both their thinking and the actual output

Risk

Minimal — only affects the token budget for a fire-and-forget background task. No behavioral change for non-reasoning models (they don't use the extra headroom).

Changed files

  • agent/title_generator.py (modified, +3/-1)


Symptoms

  • hermes sessions list shows sessions with empty titles
  • SQLite query shows many rows with title IS NULL
  • Agent log shows repeated INFO agent.auxiliary_client: Auxiliary title_generation: using auto … entries
  • No Title generation failed warning in logs (the function returns None without raising when content is empty)
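The NULL-title symptom can be checked with a query along these lines. The issue does not show the actual database schema, so the `sessions` table and its `id` and `title` columns are hypothetical; the sketch uses an in-memory database with made-up rows so that it runs on its own.

```python
import sqlite3


def count_untitled_sessions(conn: sqlite3.Connection) -> int:
    """Count rows whose title is NULL (table and column names are assumptions)."""
    row = conn.execute("SELECT COUNT(*) FROM sessions WHERE title IS NULL").fetchone()
    return row[0]


# Demonstration against an in-memory database with illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO sessions (id, title) VALUES (?, ?)",
    [("s1", None), ("s2", "Refactor parser"), ("s3", None)],
)
untitled = count_untitled_sessions(conn)
```

Against a real installation, the connection would point at the agent's session database instead, and the table and column names would need to match its actual schema.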

Steps to Reproduce

  1. Configure a reasoning-capable model (e.g., DeepSeek V4 Flash, Claude Sonnet with thinking enabled)
  2. Start a new session and send a message
  3. Wait for the auto-title generation to complete
  4. Check the session title via /title or hermes sessions list → it will be empty

Environment

  • Hermes Agent version: current main
  • Model affected: DeepSeek V4 Flash (and any other model that outputs reasoning_content)
  • Provider: OpenRouter / opencode-go (but provider-agnostic)

Additional Context

The same max_tokens issue could affect other auxiliary tasks (compression, web_extract, session_search) when used with reasoning models.

Extended analysis

TL;DR

Increase the max_tokens parameter in agent/title_generator.py to at least 2000 to accommodate the output of reasoning-capable LLMs.

Guidance

  • Verify that the issue is indeed caused by the max_tokens limit by checking the finish_reason in the LLM response, which should be 'length' when the token budget is exhausted.
  • Update the max_tokens value to 2000 or more in agent/title_generator.py to provide sufficient headroom for the reasoning phase and title output.
  • Test the fix by following the reproduction steps and confirming that the session title is now generated.
  • Consider reviewing other auxiliary tasks (e.g., compression, web_extract, session_search) that may also be affected by the max_tokens limit when used with reasoning models.
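The first guidance point can be sketched as a small classifier over the two fields that matter. The function name and the category labels are hypothetical, chosen only for this illustration; the inputs are assumed to be the `finish_reason` and `content` values read from the LLM response.

```python
from typing import Optional


def classify_title_response(finish_reason: Optional[str], content: Optional[str]) -> str:
    """Classify a title-generation response for diagnosis.

    Labels are illustrative:
      ok               - non-empty content, generation finished normally
      truncated_title  - non-empty content, but the budget ran out mid-output
      budget_exhausted - empty content with finish_reason == "length"
      empty_other      - empty content for some other reason
    """
    has_content = bool((content or "").strip())
    if has_content:
        return "truncated_title" if finish_reason == "length" else "ok"
    if finish_reason == "length":
        return "budget_exhausted"
    return "empty_other"
```

Only the `budget_exhausted` case matches the failure described in this issue; an `empty_other` result would point to a different cause that a larger max_tokens is unlikely to fix.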

Example

response = call_llm(
    task="title_generation",
    messages=messages,
    max_tokens=2000,  # Increased token budget
    temperature=0.3,
    timeout=timeout,
    main_runtime=main_runtime,
)

Notes

The fix assumes that increasing the max_tokens value will not significantly impact cost or latency. However, this may vary depending on the specific use case and model configuration.

Recommendation

Apply the fix by raising max_tokens to at least 2000; it directly addresses the silent failure caused by the token budget limit.
