ollama - ✅ (Solved) Fix bge-m3 only returns NaN on bitcoin whitepaper, other docs [1 pull request, 5 comments, 4 participants]

GitHub stats
ollama/ollama#14657 · Fetched 2026-04-08 00:33:16
Comments: 5 · Participants: 4 · Timeline events: 12 · Reactions: 0
Timeline (top): commented ×5, mentioned ×2, subscribed ×2, cross-referenced ×1

Error Message

openai.InternalServerError: Error code: 500 - {'error': {'message': 'failed to encode response: json: unsupported value: NaN', 'type': 'api_error', 'param': None, 'code': None}}

Fix Action

Fixed

PR fix notes

PR #14739: server: handle NaN values in embedding responses

Description (problem / solution / changelog)

Fixes #14657

Summary

  • Added ValidateEmbedding function in the llm package to detect NaN/Inf values before JSON serialization
  • Applied validation in both runner-level embedding handlers (ollamarunner and llamarunner) where the crash originates
  • Also added NaN/Inf check in the deprecated EmbeddingsHandler endpoint which was missing the validation that EmbedHandler already had via normalize()
  • Returns a clear error message ("model produced invalid embedding values (NaN or Inf)") instead of crashing with json: unsupported value: NaN
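
The validation described in this summary lives in Go inside Ollama. Purely as an illustration, the same kind of check can be sketched in Python. The names `validate_embedding` and `is_valid_embedding` are hypothetical and do not come from the PR; only the error message text is quoted from it. Treating an empty or missing embedding as valid is also an assumption, since the summary does not say how the Go function handles that case.

```python
import math

# Error text quoted from the PR summary; everything else here is illustrative.
INVALID_EMBEDDING_MSG = "model produced invalid embedding values (NaN or Inf)"


def validate_embedding(embedding):
    """Raise ValueError if any value is NaN or +/-Inf (hypothetical helper)."""
    # Assumption: an empty or None embedding is treated as valid.
    for value in embedding or []:
        if math.isnan(value) or math.isinf(value):
            raise ValueError(INVALID_EMBEDDING_MSG)


def is_valid_embedding(embedding):
    """Convenience wrapper that returns a bool instead of raising."""
    try:
        validate_embedding(embedding)
    except ValueError:
        return False
    return True
```

Running the check before serialization means the caller sees a descriptive error rather than an encoder failure.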

Context

Go's encoding/json does not support NaN or Inf float values. When a model (e.g., bge-m3 with certain inputs) produces NaN values in its embeddings, the JSON encoder crashes with an unhelpful 500 error. The EmbedHandler path already catches this via the normalize() function, but the runner-level handlers and the deprecated EmbeddingsHandler did not have this protection.
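
The same constraint can be observed outside Go: JSON has no literal for NaN or Infinity, and Python's `json.dumps` refuses to emit them when called with `allow_nan=False`. The sketch below is only an analogy to the Go behaviour described above, not Ollama code, and `encode_strict` is a hypothetical helper name.

```python
import json


def encode_strict(payload):
    """Serialize to standards-compliant JSON; return None if it contains NaN/Inf."""
    try:
        # allow_nan=False makes json.dumps raise ValueError on NaN and +/-Inf.
        return json.dumps(payload, allow_nan=False)
    except ValueError:
        return None
```

A strict encoder can only reject such values, so a server has to detect them before encoding if it wants to return a meaningful error.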

The workaround OLLAMA_FLASH_ATTENTION=false mentioned in the issue suggests the root cause may be in flash attention computation, which could be investigated separately as a deeper fix.

Testing

  • Added TestValidateEmbedding with coverage for valid embeddings, NaN, positive/negative Inf, empty/nil slices, and edge cases
  • Existing TestNormalize continues to pass

This contribution was developed with AI assistance (Claude Code).

Changed files

  • llm/embedding_test.go (added, +79/-0)
  • llm/server.go (modified, +12/-0)
  • runner/llamarunner/runner.go (modified, +4/-0)
  • runner/ollamarunner/runner.go (modified, +7/-1)
  • server/routes.go (modified, +5/-0)


What is the issue?

The bge-m3 model returns NaN values through the OpenAI-compatible embeddings API (/v1/embeddings) when processing certain text content, particularly technical documents.

This causes a 500 error with the message: failed to encode response: json: unsupported value: NaN

Environment

  • Ollama Version: 0.17.6
  • OS: Windows 10 / MSYS_NT-10.0-26100
  • Model: bge-m3:latest (ID: 790764642607, Size: 1.2 GB)
  • GPU: NVIDIA GeForce RTX 2080 Ti

Steps to Reproduce

  1. Pull the bge-m3 model
  ollama pull bge-m3

  2. Test with simple text
  curl -X POST http://localhost:11434/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
      "model": "bge-m3",
      "input": "This is a test sentence."
    }'

✅ Returns valid 1024-dimensional embedding

  3. Test with technical document content
  curl -X POST http://localhost:11434/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
      "model": "bge-m3",
      "input": "Bitcoin: A Peer-to-Peer Electronic Cash System. Abstract. A purely peer-to-peer version of electronic cash would allow online payments to be sent directly from one party to another without going through a financial institution. Digital signatures provide part of the solution, but the main benefits are lost if a trusted third party is still required to prevent double-spending. We propose a solution to the double-spending problem using a peer-to-peer network. The network timestamps transactions by hashing them into an ongoing chain of hash-based proof-of-work, forming a record that cannot be changed without redoing the proof-of-work."
    }'

Expected Behavior

Should return a valid 1024-dimensional embedding array, similar to the simple text case.

Actual Behavior

  {
    "error": {
      "message": "failed to encode response: json: unsupported value: NaN",
      "type": "api_error",
      "param": null,
      "code": null
    }
  }
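
On a server version without the fix, a client may want to recognise this specific failure so it can skip the document or retry with another model. A minimal sketch, assuming the error body has the shape shown above; the function name `is_nan_encoding_error` is hypothetical.

```python
import json

# Substring of the error message reported in this issue.
NAN_MARKER = "json: unsupported value: NaN"


def is_nan_encoding_error(body):
    """Return True if a response body (str or dict) is the NaN encoding error."""
    if isinstance(body, str):
        try:
            body = json.loads(body)
        except json.JSONDecodeError:
            return False
    if not isinstance(body, dict):
        return False
    error = body.get("error")
    if not isinstance(error, dict):
        return False
    return NAN_MARKER in str(error.get("message", ""))
```

Matching on the message substring is brittle, so this is best treated as a temporary client-side guard.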

Additional Context

  • The same technical text works perfectly with nomic-embed-text:latest, which returns valid 768-dimensional embeddings without any NaN values
  • This issue occurs consistently with content from technical PDFs (e.g., Bitcoin whitepaper, research papers)
  • The issue appears to be specific to bge-m3 - other embedding models handle the same content without issues

Relevant log output

openai.InternalServerError: Error code: 500 - {'error': {'message': 'failed to encode response: json: unsupported value: NaN', 'type': 'api_error', 'param': None, 'code': None}}

OS: Windows
GPU: Nvidia
CPU: Intel
Ollama version: 0.17.6

extent analysis

Fix Plan

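To address the NaN values returned by the bge-m3 model, the plan has three generic parts. Only the last one corresponds to what PR #14739 actually changes; the first two are mitigations outside Ollama.

  • Clip input values: Where a pipeline feeds numeric tensors to a model, keep them within a finite range. Clipping does not remove NaN entries, so those still need separate handling, and it does not apply to plain text sent to /v1/embeddings.
  • Handle NaN values in the model output: Replace NaN entries with a fallback value (e.g., zero), or reject the embedding outright. A zero-filled vector no longer represents the input, so rejecting is usually the safer choice.
  • Update the API to handle NaN values: Check the embedding for NaN/Inf before serialization and return a clear error, which is the approach PR #14739 takes.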

Code Changes

Here's an example code snippet in Python that demonstrates how to clip input values and handle NaN values:

import numpy as np
import torch

# Clip finite input values into [min_value, max_value].
# np.clip leaves NaN entries unchanged, so NaN still has to be handled separately.
def clip_input_values(input_values, min_value=-1e6, max_value=1e6):
    return np.clip(input_values, min_value, max_value)

# Replace NaN entries in the model output with zeros.
def handle_nan_values(model_output):
    return torch.where(torch.isnan(model_output), torch.zeros_like(model_output), model_output)

# Example usage:
input_values = np.array([1.0, 2.0, np.nan, 4.0])
clipped_input_values = clip_input_values(input_values)  # the NaN entry is still NaN
model_output = torch.tensor([1.0, 2.0, np.nan, 4.0])
handled_output = handle_nan_values(model_output)  # the NaN entry becomes 0.0

print("Clipped Input Values:", clipped_input_values)
print("Handled Output:", handled_output)

API Updates

To handle NaN values in the API response, you can add a check for NaN values before returning the response:

import json
import numpy as np

# Return an error object if the embedding contains NaN or Inf; otherwise
# return a plain list, which json.dumps can serialize.
def handle_nan_in_response(response):
    if not np.isfinite(response).all():
        return {"error": "NaN or Inf values encountered in response"}
    return response.tolist()

# Example usage:
response = np.array([1.0, 2.0, np.nan, 4.0])
handled_response = handle_nan_in_response(response)

print("Handled Response:", json.dumps(handled_response))

Verification

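To verify the fix, resend the technical document request that previously failed. With PR #14739 applied, the server should no longer fail with json: unsupported value: NaN. It should return either a valid embedding or the explicit error "model produced invalid embedding values (NaN or Inf)".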
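
The verification step can be sketched as a small script. It assumes the OpenAI-compatible response shape (`data[0].embedding`) and the 1024-dimension output reported in the issue. `check_embedding_response` and `fetch_embedding` are hypothetical helper names, and the URL is the local default used in the reproduction steps.

```python
import json
import math
import urllib.request

URL = "http://localhost:11434/v1/embeddings"  # endpoint from the reproduction steps


def check_embedding_response(response, expected_dim=1024):
    """Return (ok, reason) for a parsed /v1/embeddings response body."""
    if "error" in response:
        return False, str(response["error"])
    embedding = response["data"][0]["embedding"]
    if len(embedding) != expected_dim:
        return False, f"expected {expected_dim} dimensions, got {len(embedding)}"
    if not all(math.isfinite(v) for v in embedding):
        return False, "embedding contains NaN or Inf"
    return True, "ok"


def fetch_embedding(text, model="bge-m3", url=URL):
    """POST the text to a running Ollama server and return the parsed JSON body."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())
```

With a server running, `check_embedding_response(fetch_embedding(text))` covers the happy path. Note that `urllib` raises `HTTPError` on a 500 response, so wrap the call if you want to inspect the error body.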

Extra Tips

  • Ensure that the model is properly trained and validated to handle a wide range of input values.
  • Consider adding input validation and sanitization to prevent invalid or malicious input values.
  • Monitor the API for any issues related to NaN values and update the fix as needed.
