ollama: Can't change context window size of qwen3.6 (mlx runner) [6 comments, 4 participants]

ollama/ollama#15944, fetched 2026-05-04 04:58:30
Comments: 6 · Participants: 4 · Timeline events: 18 · Reactions: 0


What is the issue?

I'm trying to change the context window size, but models such as qwen3.6:35b-a3b-coding-nvfp4 and qwen3.6:27b-coding-nvfp4 ignore the setting. I launch the Ollama server with the following script:

$ cat ollama_serve_bigger_context
#!/bin/sh

export OLLAMA_REQUEST_TIMEOUT=120m 
export OLLAMA_KEEP_ALIVE=120m 
export OLLAMA_CONTEXT_LENGTH=190000
export OLLAMA_DEBUG=1
export OLLAMA_NUM_PARALLEL=1
export OLLAMA_MAX_LOADED_MODELS=1

ollama serve

which starts the server:

% ollama_serve_bigger_context 
time=2026-05-03T11:36:56.431+02:00 level=INFO source=routes.go:1782 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:190000 OLLAMA_DEBUG:DEBUG OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:2h0m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/aitest/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2026-05-03T11:36:56.431+02:00 level=INFO source=routes.go:1784 msg="Ollama cloud disabled: false"
time=2026-05-03T11:36:56.456+02:00 level=INFO source=images.go:517 msg="total blobs: 997"
time=2026-05-03T11:36:56.460+02:00 level=INFO source=images.go:524 msg="total unused blobs removed: 0"
time=2026-05-03T11:36:56.460+02:00 level=DEBUG source=model_recommendations.go:59 msg="starting model recommendations cache" default_recommendations=6 refresh_interval=4h0m0s fetch_timeout=3s
time=2026-05-03T11:36:56.460+02:00 level=INFO source=routes.go:1847 msg="Listening on 127.0.0.1:11434 (version 0.22.1)"
time=2026-05-03T11:36:56.460+02:00 level=DEBUG source=sched.go:145 msg="starting llm scheduler"
time=2026-05-03T11:36:56.460+02:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-05-03T11:36:56.461+02:00 level=DEBUG source=model_recommendations.go:264 msg="loaded model recommendations snapshot" path=/Users/aitest/.ollama/cache/model-recommendations.json count=7
time=2026-05-03T11:36:56.461+02:00 level=DEBUG source=model_recommendations.go:194 msg="refreshing model recommendations from remote" url=https://ollama.com/api/experimental/model-recommendations
time=2026-05-03T11:36:56.461+02:00 level=INFO source=server.go:444 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.22.1/libexec/ollama runner --ollama-engine --port 54145"
time=2026-05-03T11:36:56.461+02:00 level=DEBUG source=server.go:445 msg=subprocess OLLAMA_MAX_LOADED_MODELS=1 OLLAMA_CONTEXT_LENGTH=190000 OLLAMA_KEEP_ALIVE=120m OLLAMA_REQUEST_TIMEOUT=120m PATH=/opt/homebrew/bin:/opt/homebrew/sbin:/Users/aitest/.opencode/bin:/Users/aitest/.nvm/versions/node/v24.14.1/bin:/Users/aitest/Library/Android/sdk/platform-tools:/Users/aitest/Library/Android/sdk/emulator:/Users/aitest/.local/bin:/Users/aitest/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/usr/local/MacGPG2/bin:/usr/local/go/bin OLLAMA_NUM_PARALLEL=1 OLLAMA_DEBUG=1 DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.22.1/libexec:/opt/homebrew/Cellar/ollama/0.22.1/libexec/lib/ollama/mlx_metal_v3 OLLAMA_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.22.1/libexec
time=2026-05-03T11:36:56.509+02:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=49.0995ms OLLAMA_LIBRARY_PATH=[/opt/homebrew/Cellar/ollama/0.22.1/libexec] extra_envs=map[]
time=2026-05-03T11:36:56.510+02:00 level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=1
time=2026-05-03T11:36:56.510+02:00 level=DEBUG source=runner.go:193 msg="adjusting filtering IDs" FilterID=0 new_ID=0
time=2026-05-03T11:36:56.510+02:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=49.259542ms
time=2026-05-03T11:36:56.510+02:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=Metal compute=0.0 name=Metal description="Apple M4 Pro" libdirs="" driver=0.0 pci_id="" type=discrete total="36.0 GiB" available="36.0 GiB"
time=2026-05-03T11:36:56.510+02:00 level=INFO source=routes.go:1897 msg="vram-based default context" total_vram="36.0 GiB" default_num_ctx=32768
time=2026-05-03T11:36:56.666+02:00 level=DEBUG source=model_recommendations.go:227 msg="model recommendations refreshed" count=7
time=2026-05-03T11:36:56.673+02:00 level=DEBUG source=model_recommendations.go:304 msg="persisted model recommendations snapshot" path=/Users/aitest/.ollama/cache/model-recommendations.json count=7
time=2026-05-03T11:36:56.673+02:00 level=INFO source=model_recommendations.go:179 msg="model recommendations cache sleep scheduled" wait=4h35m36.405282306s consecutive_failures=0

The moment I launch claude or opencode I get these logs:

time=2026-05-03T11:37:59.021+02:00 level=INFO source=runner.go:162 msg="Starting HTTP server" host=127.0.0.1 port=54156
time=2026-05-03T11:37:59.038+02:00 level=INFO source=server.go:189 msg=ServeHTTP method=GET path=/v1/status took=18.125µs status="200 OK"
time=2026-05-03T11:37:59.038+02:00 level=INFO source=client.go:147 msg="mlx runner is ready" port=54156
time=2026-05-03T11:37:59.038+02:00 level=DEBUG source=sched.go:573 msg="finished setting up" runner.name=registry.ollama.ai/library/qwen3.6:35b-a3b-coding-nvfp4 runner.size="20.4 GiB" runner.vram="20.4 GiB" runner.parallel=1 runner.pid=96254 runner.model=digest:cd2692a833e66c4c98991b67e9fbaa0bb15a93285baac9240c022f2f40075b6d runner.num_ctx=190000
time=2026-05-03T11:37:59.038+02:00 level=INFO source=server.go:189 msg=ServeHTTP method=GET path=/v1/status took=3.291µs status="200 OK"
time=2026-05-03T11:37:59.039+02:00 level=INFO source=cache.go:126 msg="cache miss" total=195 matched=0 cached=0 left=195
time=2026-05-03T11:38:00.566+02:00 level=INFO source=pipeline.go:135 msg="Prompt processing progress" processed=191 total=195
time=2026-05-03T11:38:00.567+02:00 level=DEBUG source=cache.go:401 msg="created snapshot" offset=191
time=2026-05-03T11:38:00.672+02:00 level=INFO source=pipeline.go:135 msg="Prompt processing progress" processed=194 total=195
[GIN] 2026/05/03 - 11:38:01 | 200 |  5.914781542s |       127.0.0.1 | POST     "/v1/messages?beta=true"
time=2026-05-03T11:38:01.897+02:00 level=INFO source=server.go:189 msg=ServeHTTP method=POST path=/v1/completions took=2.858332875s status="200 OK"
time=2026-05-03T11:38:01.897+02:00 level=DEBUG source=sched.go:581 msg="context for request finished"
time=2026-05-03T11:38:01.897+02:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.6:35b-a3b-coding-nvfp4 runner.size="20.4 GiB" runner.vram="20.4 GiB" runner.parallel=1 runner.pid=96254 runner.model=digest:cd2692a833e66c4c98991b67e9fbaa0bb15a93285baac9240c022f2f40075b6d runner.num_ctx=190000 refCount=1
time=2026-05-03T11:38:01.897+02:00 level=INFO source=pipeline.go:71 msg="peak memory" size="19.82 GiB"
time=2026-05-03T11:38:01.898+02:00 level=DEBUG source=cache.go:250 msg="switching cache path" page_out=1 page_in=0
time=2026-05-03T11:38:01.898+02:00 level=INFO source=cache.go:126 msg="cache miss" total=22877 matched=3 cached=0 left=22877
time=2026-05-03T11:38:01.918+02:00 level=INFO source=pipeline.go:135 msg="Prompt processing progress" processed=3 total=22877
time=2026-05-03T11:38:01.919+02:00 level=DEBUG source=cache.go:401 msg="created snapshot" offset=3
^Ctime=2026-05-03T11:38:02.199+02:00 level=DEBUG source=sched.go:908 msg="shutting down runner" model=digest:cd2692a833e66c4c98991b67e9fbaa0bb15a93285baac9240c022f2f40075b6d
time=2026-05-03T11:38:02.199+02:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.6:35b-a3b-coding-nvfp4 runner.size="20.4 GiB" runner.vram="20.4 GiB" runner.parallel=1 runner.pid=96254 runner.model=digest:cd2692a833e66c4c98991b67e9fbaa0bb15a93285baac9240c022f2f40075b6d runner.num_ctx=190000
time=2026-05-03T11:38:02.199+02:00 level=DEBUG source=sched.go:161 msg="shutting down scheduler pending loop"
time=2026-05-03T11:38:02.199+02:00 level=DEBUG source=sched.go:287 msg="shutting down scheduler completed loop"
[GIN] 2026/05/03 - 11:38:02 | 500 |  6.202840125s |       127.0.0.1 | POST     "/v1/messages?beta=true"
time=2026-05-03T11:38:02.199+02:00 level=INFO source=client.go:182 msg="stopping mlx runner subprocess" pid=96254
time=2026-05-03T11:38:02.370+02:00 level=DEBUG source=model_recommendations.go:183 msg="stopping model recommendations cache"
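
In these logs the scheduler reports runner.num_ctx=190000 on every line that describes the runner. A minimal sketch for pulling that value out of a saved log follows. The function name log_num_ctx and the file name server.log are illustrative, and the pattern assumes the key=value layout of the lines quoted above.

```shell
#!/bin/sh
# Print each distinct runner.num_ctx value found in log text read from stdin.
# Assumes the key=value layout of the scheduler lines quoted above.
log_num_ctx() {
    grep -o 'runner\.num_ctx=[0-9][0-9]*' | cut -d= -f2 | sort -u
}
```

One way to use it is to run `ollama serve 2>&1 | tee server.log` in one terminal and `log_num_ctx < server.log` in another. For the excerpt above it prints 190000, which suggests the scheduler accepted the requested value even though `ollama ps` reports something else.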

Despite these settings, the model is loaded with a larger context window (262144 instead of the requested 190000):

% ollama ps
NAME                            ID              SIZE     PROCESSOR    CONTEXT    UNTIL            
qwen3.6:35b-a3b-coding-nvfp4    cd2692a833e6    21 GB    100% GPU     262144     2 hours from now

However, with earlier models such as qwen3-coder:30b-a3b-q4_K_M, the context window setting is respected:

% ollama ps
NAME                          ID              SIZE     PROCESSOR    CONTEXT    UNTIL            
qwen3-coder:30b-a3b-q4_K_M    06c1097efce0    37 GB    100% GPU     190000     2 hours from now
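
The comparison between the two `ollama ps` outputs can be automated. The sketch below reads `ollama ps` output and marks each loaded model whose CONTEXT value differs from the requested one. The function name check_ps_context is illustrative, and the field positions assume the column layout shown above (two-word SIZE and PROCESSOR values); other Ollama versions may print different columns.

```shell
#!/bin/sh
# Read `ollama ps` output on stdin and report, per model, whether the
# CONTEXT column equals the expected value passed as $1.
# Assumed layout: NAME ID SIZE(2 words) PROCESSOR(2 words) CONTEXT UNTIL...
check_ps_context() {
    expected=$1
    awk -v want="$expected" '
        NR == 1 { next }   # skip the header row
        NF == 0 { next }   # skip blank lines
        {
            # $1 NAME, $2 ID, $3 $4 SIZE, $5 $6 PROCESSOR, $7 CONTEXT
            status = ($7 == want) ? "ok" : "mismatch"
            printf "%s %s %s\n", $1, $7, status
        }'
}
```

Running `ollama ps | check_ps_context 190000` against the first output above prints `qwen3.6:35b-a3b-coding-nvfp4 262144 mismatch`; against the second it prints `qwen3-coder:30b-a3b-q4_K_M 190000 ok`.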

OS: macOS
GPU: Apple
CPU: Apple
Ollama version: 0.22.1

extent analysis

TL;DR

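The model loads with a 262144-token context even though OLLAMA_CONTEXT_LENGTH=190000 was set and the scheduler logged runner.num_ctx=190000. Because the requested value is below the one actually in use, a model-specific maximum does not explain the mismatch; the behavior appears to be specific to models served by the mlx runner.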

Guidance

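  • Check the model page for qwen3.6:35b-a3b-coding-nvfp4 for its maximum context length. The 262144 reported by ollama ps may be the model's native maximum, but the issue does not confirm this.
  • Confirm that the server received the setting: the "server config" log line should contain OLLAMA_CONTEXT_LENGTH:190000, and the scheduler lines should show runner.num_ctx=190000.
  • Compare those values with the CONTEXT column of ollama ps once the model is loaded. In this report they disagree for the qwen3.6 model and agree for qwen3-coder:30b-a3b-q4_K_M.
  • The requested 190000 is already below 262144, so lowering OLLAMA_CONTEXT_LENGTH further is unlikely to help on its own. Retest with ollama ps after any configuration change.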

Example

No code fix is provided, because the issue concerns how the server applies its configuration rather than user code.
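
That said, a per-request override is one more diagnostic worth trying. Ollama's native /api/generate endpoint accepts an options.num_ctx field; whether the mlx runner honors it is not established by this issue, so treat the sketch below as an experiment rather than a fix. The function names are illustrative, the address comes from the OLLAMA_HOST value in the server log, and the prompt is a fixed placeholder because the simple printf template does not escape quotes.

```shell
#!/bin/sh
# Build a JSON body for POST /api/generate that requests a specific num_ctx.
# $1 = model name, $2 = context length in tokens.
# Neither value is escaped, so they must not contain quotes or backslashes.
build_generate_payload() {
    printf '{"model":"%s","prompt":"hi","stream":false,"options":{"num_ctx":%d}}\n' "$1" "$2"
}

# Send the request to a local server (address taken from the log above).
# Not called here; run it manually while `ollama serve` is up.
probe_num_ctx() {
    build_generate_payload "$1" "$2" |
        curl -s http://127.0.0.1:11434/api/generate \
            -H 'Content-Type: application/json' -d @-
}
```

After `probe_num_ctx qwen3.6:35b-a3b-coding-nvfp4 190000`, run `ollama ps` and check the CONTEXT column. If it still shows 262144, the per-request value is being ignored in the same way as the environment variable.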

Notes

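The behavior appears to depend on the model or its runner: qwen3.6:35b-a3b-coding-nvfp4, served by the mlx runner, loads with a 262144-token context despite OLLAMA_CONTEXT_LENGTH=190000, while qwen3-coder:30b-a3b-q4_K_M honors the setting. The text quoted here does not establish the root cause.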

Recommendation

The text quoted here does not establish a confirmed workaround. Because the requested value (190000) is already below the value in use (262144), adjusting OLLAMA_CONTEXT_LENGTH alone is unlikely to change the result for these models. Track ollama/ollama#15944 for a fix, and verify the effective context with ollama ps after any configuration change.
