hermes - 💡(How to fix) Fix [Bug]: custom:llmgateway tool calls fail when streamed


Error Message

The failing logs show Hermes using:

chat_completion_stream_request provider=custom:llmgateway model=gpt-5.4

Then the request fails repeatedly with:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Hermes retries the same streaming path three times, then the tool workflow fails.

I tested a local workaround: disable streaming only when the request has tools or functions and the provider/base URL is LLM Gateway.

After that, Hermes used:

chat_completion_request provider=custom:llmgateway model=gpt-5.4

The same stress test then passed, completing 13 API calls and 18 tool turns.

Expected behavior: Hermes should provide a config option such as stream_tool_calls: false for custom providers, or retry non-streaming when a streamed tool turn fails before useful output.

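Root Cause

The likely root cause is that Hermes uses the streaming chat-completions path for tool turns through a custom OpenAI-compatible provider. For custom:llmgateway, streamed tool turns can fail before useful output with an empty or malformed response, and the retry logic retries the same streaming path, so the retry does not recover.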

Fix / Workaround

I tested a local workaround: disable streaming only when the request has tools or functions and the provider/base URL is LLM Gateway.

Key redacted evidence:

  • Before workaround: custom:llmgateway + gpt-5.4 used chat_completion_stream_request and failed with JSONDecodeError: Expecting value: line 1 column 1 (char 0).
  • After workaround: the same provider and model used chat_completion_request and completed 13 API calls and 18 tool turns.


### Bug Description

Hermes tool-heavy turns fail with custom:llmgateway when Hermes uses the streaming chat-completions path.

Model tested: gpt-5.4
Provider: custom:llmgateway
Base URL: https://api.llmgateway.io/v1

The failing logs show Hermes using:

chat_completion_stream_request
provider=custom:llmgateway
model=gpt-5.4

Then the request fails repeatedly with:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Hermes retries the same streaming path three times, then the tool workflow fails.

I tested a local workaround: disable streaming only when the request has tools or functions and the provider/base URL is LLM Gateway.

After that, Hermes used:

chat_completion_request
provider=custom:llmgateway
model=gpt-5.4

The same stress test then passed, completing 13 API calls and 18 tool turns.

Expected behavior: Hermes should provide a config option such as stream_tool_calls: false for custom providers, or retry non-streaming when a streamed tool turn fails before useful output.


**Steps to Reproduce**

1. Configure Hermes with:

   ```yaml
   model:
     provider: custom:llmgateway
     base_url: https://api.llmgateway.io/v1
     api_mode: chat_completions
     default: gpt-5.4
   ```

2. Start Hermes gateway.
3. Run a tool-heavy local task, such as:
   - write a Python file
   - run it
   - read it back
   - compute line count and SHA256
   - write a report
   - read the report
4. Observe that Hermes uses `chat_completion_stream_request`.
5. The workflow fails with `JSONDecodeError: Expecting value: line 1 column 1 (char 0)`.
6. If streaming is disabled only for LLM Gateway tool turns, the same workflow succeeds.

**Environment**

```text
Hermes Agent: v0.13.0, up to date as of 2026-05-12
OS: Ubuntu 24.04
Python: 3.11.15
OpenAI SDK: 2.24.0
Provider: custom:llmgateway
Base URL: https://api.llmgateway.io/v1
API mode: chat_completions
Model: gpt-5.4
```

### Expected Behavior

Hermes should complete the tool workflow successfully.

If streamed tool calls are unreliable for a custom provider, Hermes should either:

1. Allow disabling streamed tool calls for that provider, or
2. Retry once with non-streaming `chat_completion_request` when streaming fails before useful output.

Non-streaming tool calls worked with the same provider and model in my test.
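
The second option can be sketched in a few lines. This is only an illustration of the control flow, not Hermes code: `call_with_stream_fallback`, `broken_stream`, and `working_plain` are hypothetical names, and the two callables stand in for whatever Hermes uses internally for `chat_completion_stream_request` and `chat_completion_request`.

```python
import json
from typing import Any, Callable, Dict, Iterable, List


def call_with_stream_fallback(
    stream_call: Callable[[Dict[str, Any]], Iterable[str]],
    plain_call: Callable[[Dict[str, Any]], str],
    request: Dict[str, Any],
) -> str:
    """Try streaming first; fall back once to non-streaming if it fails before any output."""
    chunks: List[str] = []
    try:
        for chunk in stream_call(request):
            chunks.append(chunk)
        return "".join(chunks)
    except json.JSONDecodeError:
        if chunks:
            # Output was already delivered, so a silent retry could duplicate it.
            raise
        # Nothing useful arrived: retry once on the non-streaming path.
        return plain_call(request)


def broken_stream(request: Dict[str, Any]) -> Iterable[str]:
    """Hypothetical stand-in for a stream whose first event has an empty body."""
    json.loads("")  # raises JSONDecodeError before any chunk is yielded
    yield "unreachable"


def working_plain(request: Dict[str, Any]) -> str:
    """Hypothetical stand-in for a successful non-streaming call."""
    return "ok"


print(call_with_stream_fallback(broken_stream, working_plain, {"tools": []}))
```

The guard on `chunks` is the "before useful output" condition: once partial output has reached the user, switching paths is no longer safe to do silently.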

### Actual Behavior

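Hermes used the streaming path:

```text
chat_completion_stream_request
provider=custom:llmgateway
model=gpt-5.4
```

The request then failed with `JSONDecodeError: Expecting value: line 1 column 1 (char 0)` on all three attempts, and the tool workflow failed.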

### Affected Component

Agent Core (conversation loop, context compression, memory)

### Messaging Platform (if gateway-related)

_No response_

### Debug Report

I did not upload a full debug share because this install contains provider configuration and local logs. Relevant redacted logs are included above.

Key redacted evidence:
- Before workaround: `custom:llmgateway` + `gpt-5.4` used `chat_completion_stream_request` and failed with `JSONDecodeError: Expecting value: line 1 column 1 (char 0)`.
- After workaround: the same provider and model used `chat_completion_request` and completed 13 API calls and 18 tool turns.

### Operating System

Ubuntu 24.04

### Python Version

3.11.15

### Hermes Version

v0.13.0 (2026.5.7) Up to date as of 2026-05-12

### Additional Logs / Traceback (optional)

```text
OpenAI client created (chat_completion_stream_request, shared=False)
provider=custom:llmgateway
base_url=https://api.llmgateway.io/v1
model=gpt-5.4

Streaming failed before delivery: Expecting value: line 1 column 1 (char 0)

API call failed (attempt 1/3) error_type=JSONDecodeError
API call failed (attempt 2/3) error_type=JSONDecodeError
API call failed (attempt 3/3) error_type=JSONDecodeError

API call failed after 3 retries. Expecting value: line 1 column 1 (char 0)

After local workaround:

OpenAI client created (chat_completion_request, shared=False)
provider=custom:llmgateway
base_url=https://api.llmgateway.io/v1
model=gpt-5.4

Successful run:
API calls: 13
tool turns: 18
finish_reason=stop
```

### Root Cause Analysis (optional)

The likely root cause is that Hermes uses the streaming chat-completions path for tool turns through a custom OpenAI-compatible provider.

For custom:llmgateway, streamed tool turns can fail before useful output with an empty or malformed response. Hermes then surfaces this as:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The retry logic retries the same streaming path, so the retry does not recover.

Local evidence: the same provider/model/tool workflow failed with chat_completion_stream_request, but succeeded after forcing non-streaming chat_completion_request only for LLM Gateway tool turns.
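
The exact error text is what Python's standard `json` module produces when asked to decode an empty string or a body that does not start with a JSON value, which is consistent with the "empty or malformed response" reading. The sketch below only demonstrates that behavior and why repeating the same decode cannot recover. It does not show where in Hermes or the OpenAI SDK the decode actually happens, and `retry_same_path` is a hypothetical helper.

```python
import json
from typing import List, Optional, Tuple


def retry_same_path(payload: str, attempts: int = 3) -> Tuple[Optional[object], List[str]]:
    """Decode the same payload repeatedly, mimicking a retry that never changes path."""
    errors: List[str] = []
    for attempt in range(1, attempts + 1):
        try:
            return json.loads(payload), errors
        except json.JSONDecodeError as exc:
            errors.append(f"attempt {attempt}/{attempts}: {exc}")
    return None, errors


# An empty body and a non-JSON body (for example an HTML error page)
# both fail at the very first character.
for body in ("", "<html>bad gateway</html>"):
    result, errors = retry_same_path(body)
    print(repr(body), result)
    for line in errors:
        print("  ", line)
```

Because the input is identical on every attempt, all three attempts report the same message, matching the `attempt 1/3` to `attempt 3/3` lines in the logs.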

### Proposed Fix (optional)

Add a supported option to disable streaming for tool turns on custom providers, for example:

    custom_providers:
      llmgateway:
        stream_tool_calls: false

Or automatically retry once with non-streaming chat_completion_request when a streamed tool turn fails before useful output with a JSON decode / empty response / malformed SSE error.

My local workaround was to set `_use_streaming = False` only when the request includes tools or functions and the provider or base URL is LLM Gateway.

This fixed the issue without disabling streaming for normal text turns or other providers.
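
A minimal sketch of that condition as a standalone predicate, assuming the request is a plain dict in the OpenAI chat-completions shape. The function name `should_stream`, the `stream_tool_calls` parameter (modeled on the proposed option), and the hostname check are illustrative assumptions, not Hermes internals.

```python
from typing import Any, Dict
from urllib.parse import urlparse

# Assumption: LLM Gateway is identified by the provider name or this hostname.
LLM_GATEWAY_PROVIDER = "custom:llmgateway"
LLM_GATEWAY_HOST = "api.llmgateway.io"


def should_stream(
    provider: str,
    base_url: str,
    request: Dict[str, Any],
    stream_tool_calls: bool = False,
) -> bool:
    """Return False only for tool turns sent to LLM Gateway; stream everything else."""
    has_tools = bool(request.get("tools") or request.get("functions"))
    is_llm_gateway = (
        provider == LLM_GATEWAY_PROVIDER
        or urlparse(base_url).hostname == LLM_GATEWAY_HOST
    )
    if has_tools and is_llm_gateway and not stream_tool_calls:
        return False
    return True


tool_request = {"model": "gpt-5.4", "tools": [{"type": "function"}]}
text_request = {"model": "gpt-5.4", "messages": []}
url = "https://api.llmgateway.io/v1"

print(should_stream("custom:llmgateway", url, tool_request))  # tool turn on LLM Gateway
print(should_stream("custom:llmgateway", url, text_request))  # plain text turn
print(should_stream("openai", "https://api.openai.com/v1", tool_request))  # other provider
```

Keeping the check this narrow is what leaves streaming enabled for normal text turns and for other providers.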

### Are you willing to submit a PR for this?

- I'd like to fix this myself and submit a PR
