ollama - 💡(How to fix) Fix Error 499 and CUDA error [2 comments, 2 participants]

GitHub stats
ollama/ollama#14619 · Fetched 2026-04-08 00:33:43
Comments: 2 · Participants: 2 · Timeline events: 4 · Reactions: 0
Timeline (top): commented ×2, closed ×1, labeled ×1

Error Message

And ollama reacted like the log output. I think the error is related to https://github.com/ollama/ollama/issues/14615.

║ jobautomation/OpenEuroLLM-Catalan:latest │ UNSUPPORTED │ 0 │ Failed to run chat request: HTTP 400: Bad Request - {"error":"registry.ollama.ai/jobautomation/OpenEuroLLM-Catalan:latest does not support tools"} ║
║ nomic-embed-text:latest │ UNSUPPORTED │ 0 │ Failed to run chat request: HTTP 400: Bad Request - {"error":"\"nomic-embed-text:latest\" does not support chat"} ║
║ mxbai-embed-large:latest │ UNSUPPORTED │ 0 │ Failed to run chat request: HTTP 400: Bad Request - {"error":"\"mxbai-embed-large:latest\" does not support chat"} ║
║ deepseek-r1:latest │ UNSUPPORTED │ 0 │ Failed to run chat request: HTTP 400: Bad Request - {"error":"registry.ollama.ai/library/deepseek-r1:latest does not support tools"} ║
║ gemma3:latest │ UNSUPPORTED │ 0 │ Failed to run chat request: HTTP 400: Bad Request - {"error":"registry.ollama.ai/library/gemma3:latest does not support tools"} ║
║ codellama:latest │ UNSUPPORTED │ 0 │ Failed to run chat request: HTTP 400: Bad Request - {"error":"registry.ollama.ai/library/codellama:latest does not support tools"} ║

time=2026-03-04T12:17:27.038+01:00 level=INFO source=sched.go:516 msg="Load failed" model=C:\Users\user\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 error="context canceled"
time=2026-03-04T12:17:33.243+01:00 level=INFO source=runner.go:464 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[C:\\Users\\user\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\user\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13]" extra_envs=map[] error="failed to finish discovery before timeout"
time=2026-03-04T12:17:33.245+01:00 level=WARN source=runner.go:356 msg="unable to refresh free memory, using old values"

What is the issue?

I ran this command: npx llm-checker toolcheck --all, and Ollama responded as shown in the log output below. I think the error is related to https://github.com/ollama/ollama/issues/14615.

Here is half of the command's output:

╟──────────────────────────────────────────┼─────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ llama3.2:1b                              │ SUPPORTED   │ 100   │ Model emitted structured tool_calls.                                                                                                                 ║
╟──────────────────────────────────────────┼─────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ phi4-mini:latest                         │ UNSUPPORTED │ 0     │ Failed to run chat request: This operation was aborted                                                                                               ║
╟──────────────────────────────────────────┼─────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ jobautomation/OpenEuroLLM-Catalan:latest │ UNSUPPORTED │ 0     │ Failed to run chat request: HTTP 400: Bad Request - {"error":"registry.ollama.ai/jobautomation/OpenEuroLLM-Catalan:latest does not support tools"}   ║
╟──────────────────────────────────────────┼─────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ nomic-embed-text:latest                  │ UNSUPPORTED │ 0     │ Failed to run chat request: HTTP 400: Bad Request - {"error":"\"nomic-embed-text:latest\" does not support chat"}                                    ║
╟──────────────────────────────────────────┼─────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ mxbai-embed-large:latest                 │ UNSUPPORTED │ 0     │ Failed to run chat request: HTTP 400: Bad Request - {"error":"\"mxbai-embed-large:latest\" does not support chat"}                                   ║
╟──────────────────────────────────────────┼─────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ deepseek-r1:latest                       │ UNSUPPORTED │ 0     │ Failed to run chat request: HTTP 400: Bad Request - {"error":"registry.ollama.ai/library/deepseek-r1:latest does not support tools"}                 ║
╟──────────────────────────────────────────┼─────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ gemma3:latest                            │ UNSUPPORTED │ 0     │ Failed to run chat request: HTTP 400: Bad Request - {"error":"registry.ollama.ai/library/gemma3:latest does not support tools"}                      ║
╟──────────────────────────────────────────┼─────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ codellama:latest                         │ UNSUPPORTED │ 0     │ Failed to run chat request: HTTP 400: Bad Request - {"error":"registry.ollama.ai/library/codellama:latest does not support tools"}                   ║
╚══════════════════════════════════════════╧═════════════╧═══════╧══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝

Relevant log output

print_info: arch             = qwen2
print_info: vocab_only       = 1
print_info: no_alloc         = 0
print_info: model type       = ?B
print_info: model params     = 7.62 B
print_info: general.name     = Qwen2.5 7B Instruct
print_info: vocab type       = BPE
print_info: n_vocab          = 152064
print_info: n_merges         = 151387
print_info: BOS token        = 151643 '<|endoftext|>'
print_info: EOS token        = 151645 '<|im_end|>'
print_info: EOT token        = 151645 '<|im_end|>'
print_info: PAD token        = 151643 '<|endoftext|>'
print_info: LF token         = 198 'Ċ'
print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
print_info: FIM MID token    = 151660 '<|fim_middle|>'
print_info: FIM PAD token    = 151662 '<|fim_pad|>'
print_info: FIM REP token    = 151663 '<|repo_name|>'
print_info: FIM SEP token    = 151664 '<|file_sep|>'
print_info: EOG token        = 151643 '<|endoftext|>'
print_info: EOG token        = 151645 '<|im_end|>'
print_info: EOG token        = 151662 '<|fim_pad|>'
print_info: EOG token        = 151663 '<|repo_name|>'
print_info: EOG token        = 151664 '<|file_sep|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2026-03-04T12:17:09.811+01:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\user\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --model C:\\Users\\user\\.ollama\\models\\blobs\\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 --port 45930"
time=2026-03-04T12:17:09.829+01:00 level=INFO source=sched.go:489 msg="system memory" total="15.7 GiB" free="1.3 GiB" free_swap="19.8 GiB"
time=2026-03-04T12:17:09.830+01:00 level=INFO source=sched.go:496 msg="gpu memory" id=GPU-a4f6355b-902f-14e3-2b28-2189eb9ad638 library=CUDA available="0 B" free="430.5 MiB" minimum="457.0 MiB" overhead="0 B"
time=2026-03-04T12:17:09.915+01:00 level=INFO source=server.go:497 msg="loading model" "model layers"=29 requested=-1
time=2026-03-04T12:17:10.059+01:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="4.1 GiB"
time=2026-03-04T12:17:10.395+01:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="224.0 MiB"
time=2026-03-04T12:17:10.395+01:00 level=INFO source=device.go:272 msg="total memory" size="4.3 GiB"
time=2026-03-04T12:17:18.807+01:00 level=INFO source=runner.go:965 msg="starting go runner"
load_backend: loaded CPU backend from C:\Users\user\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-alderlake.dll
time=2026-03-04T12:17:27.038+01:00 level=INFO source=sched.go:516 msg="Load failed" model=C:\Users\user\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 error="context canceled"
[GIN] 2026/03/04 - 12:17:27 | 200 |    156.1757ms |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/03/04 - 12:17:27 | 499 |   45.5433371s |       127.0.0.1 | POST     "/api/chat"
time=2026-03-04T12:17:30.611+01:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\user\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 8626"
time=2026-03-04T12:17:33.243+01:00 level=INFO source=runner.go:464 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[C:\\Users\\user\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\user\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13]" extra_envs=map[] error="failed to finish discovery before timeout"
time=2026-03-04T12:17:33.245+01:00 level=WARN source=runner.go:356 msg="unable to refresh free memory, using old values"
time=2026-03-04T12:17:33.247+01:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-03-04T12:17:33.248+01:00 level=INFO source=cpu_windows.go:164 msg="efficiency cores detected" maxEfficiencyClass=1
time=2026-03-04T12:17:33.248+01:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=14 efficiency=8 threads=20
llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from C:\Users\user\.ollama\models\blobs\sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 1B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
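
The memory figures in the log above can be compared programmatically. The following is a minimal sketch, not part of the original report: it assumes the key="value" layout seen in these log lines, and the helper names (to_bytes, field, summarize) are hypothetical. It extracts the free system memory, the free and minimum GPU memory, and the total model size from three lines copied from the log.

```python
import re

# Three lines copied verbatim from the log above.
LOG = '''time=2026-03-04T12:17:09.829+01:00 level=INFO source=sched.go:489 msg="system memory" total="15.7 GiB" free="1.3 GiB" free_swap="19.8 GiB"
time=2026-03-04T12:17:09.830+01:00 level=INFO source=sched.go:496 msg="gpu memory" id=GPU-a4f6355b-902f-14e3-2b28-2189eb9ad638 library=CUDA available="0 B" free="430.5 MiB" minimum="457.0 MiB" overhead="0 B"
time=2026-03-04T12:17:10.395+01:00 level=INFO source=device.go:272 msg="total memory" size="4.3 GiB"'''

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def to_bytes(text):
    """Convert a size such as '430.5 MiB' into a number of bytes."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def field(line, key):
    """Return the quoted value of key="..." in a log line, or None."""
    match = re.search(rf'\b{key}="([^"]+)"', line)
    return match.group(1) if match else None


def summarize(log):
    """Collect the memory figures from the scheduler and device lines."""
    out = {}
    for line in log.splitlines():
        msg = field(line, "msg")
        if msg == "system memory":
            out["ram_free"] = to_bytes(field(line, "free"))
        elif msg == "gpu memory":
            out["gpu_free"] = to_bytes(field(line, "free"))
            out["gpu_min"] = to_bytes(field(line, "minimum"))
        elif msg == "total memory":
            out["model_total"] = to_bytes(field(line, "size"))
    return out


summary = summarize(LOG)
print("GPU below minimum:", summary["gpu_free"] < summary["gpu_min"])
print("Model larger than free RAM:", summary["model_total"] > summary["ram_free"])
```

Both comparisons hold for this log: free GPU memory (430.5 MiB) is below the stated minimum (457.0 MiB), and the model's total size (4.3 GiB) exceeds free system memory (1.3 GiB). That is consistent with a slow CPU load, although the log alone does not prove this was the cause of the aborted request.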

OS

Windows

GPU

Intel, Nvidia

CPU

Intel

Ollama version

0.17.6

extent analysis

Fix Plan

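To resolve the issue, address two separate symptoms: the HTTP 400 responses for models that lack tool or chat support, and the aborted chat request that Ollama logged with status 499 and "context canceled", which typically means the client closed the connection before the server finished. Here are the steps:

  • Update Ollama: The issue was reported on Ollama 0.17.6. Check whether a newer release is available and retest after upgrading.
  • Model Compatibility: The HTTP 400 responses are not crashes. "does not support tools" means the model does not accept tool definitions, and "does not support chat" is returned for embedding models such as nomic-embed-text and mxbai-embed-large. An UNSUPPORTED result for these models is expected.
  • GPU and CPU Configuration: The log shows GPU discovery timing out and the model weights being placed on the CPU. Make sure the NVIDIA driver is current and that Ollama recognizes the GPU. Ollama does not document a --cpu flag; to force CPU-only inference, Ollama's GPU documentation describes setting CUDA_VISIBLE_DEVICES to an invalid ID such as -1 before starting the server.
  • Free Up Memory: The log reports 1.3 GiB of free system memory out of 15.7 GiB, and 430.5 MiB of free GPU memory against a 457.0 MiB minimum, while the model needs about 4.3 GiB in total. Close other applications or test with a smaller model. OLLAMA_MEMORY is not an environment variable that Ollama documents, so setting it is unlikely to have any effect.

Example: after freeing memory, rerun the check:

npx llm-checker toolcheck --all

  • Disable Tools: If a model does not support tools, send requests to it without tool definitions. The --no-tools flag shown below is unverified; check the llm-checker documentation to confirm that the option exists before relying on it.
npx llm-checker toolcheck --all --no-tools
  • Check for Loaded Models: Installed models do not conflict with each other on their own, but a model that is still loaded keeps memory occupied. Run ollama ps to list loaded models and ollama stop <model> to unload one before retesting.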
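
The failure messages in the tool-check table fall into a few distinct groups, and only one of them points at the 499 problem. As an illustration, here is a minimal sketch that sorts the messages by cause. The function name classify and the category labels are hypothetical; only the matched substrings come from the issue's output.

```python
def classify(detail: str) -> str:
    """Map a tool-check failure message to a rough cause.

    The category labels are illustrative; the matched substrings are
    taken from the error messages quoted in this issue.
    """
    text = detail.lower()
    if "does not support tools" in text:
        return "no-tool-support"   # expected HTTP 400 for models without tools
    if "does not support chat" in text:
        return "not-a-chat-model"  # expected HTTP 400 for embedding models
    if "aborted" in text or "context canceled" in text:
        return "client-aborted"    # the case that matches the 499 log entry
    return "other"


rows = {
    "phi4-mini:latest": "Failed to run chat request: This operation was aborted",
    "gemma3:latest": 'HTTP 400: Bad Request - {"error":"registry.ollama.ai/library/gemma3:latest does not support tools"}',
    "nomic-embed-text:latest": 'HTTP 400: Bad Request - {"error":"\\"nomic-embed-text:latest\\" does not support chat"}',
}

for model, detail in rows.items():
    print(f"{model}: {classify(detail)}")
```

Only the rows classified as client-aborted need troubleshooting on the Ollama side; the other two groups describe model capabilities rather than faults.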

Verification

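To verify the fix, run npx llm-checker toolcheck --all again and check the output. Models without tool or chat support will still report HTTP 400. The fix is confirmed when no request is reported as aborted and the Ollama log no longer shows status 499 or "context canceled" for POST /api/chat. If the issue persists, check the Ollama logs for more detailed error messages.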
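
To test whether the abort comes from a client-side timeout rather than from Ollama itself, it can help to send one chat request with a generous timeout. The sketch below is a hedged example using only the Python standard library. It assumes Ollama listens on its default address http://127.0.0.1:11434; the 300-second timeout is an arbitrary illustrative value, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address and port.
OLLAMA_URL = "http://127.0.0.1:11434/api/chat"


def build_chat_request(model, prompt):
    """Build a non-streaming POST request for Ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat_once(model, prompt, timeout_s=300.0):
    """Send one chat request and wait up to timeout_s seconds for the reply."""
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


# Usage (requires a running Ollama server, so it is left commented out):
# reply = chat_once("phi4-mini:latest", "Say hello.")
# print(reply["message"]["content"])
```

If this request succeeds after a long wait while the tool check still reports an aborted operation, the client's timeout is the more likely cause than a server fault.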

Extra Tips

  • Make sure to check the Ollama documentation and GitHub page for any known issues or updates related to the models you are using.
  • If you are still experiencing issues, try reaching out to the Ollama community or opening a new issue on the GitHub page for further assistance.
