ollama - ✅(Solved) Fix Add OLLAMA_NO_FILE_FRAGMENTATION option to prevent severe NTFS file fragmentation on Windows [1 pull requests, 1 comments, 2 participants]

GitHub stats
ollama/ollama#16220 · Fetched 2026-05-20 03:39:32
Comments: 1 · Participants: 2 · Timeline: 3 · Reactions: 0
Timeline (top): commented ×1, cross-referenced ×1, labeled ×1

Error Message

Executing fsutil file setvaliddata without an elevated prompt throws an Access is denied error. Forcing users to run the background Ollama server engine as an Administrator to avoid fragmentation represents an unacceptable security risk.

Root Cause

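Root Cause & File System Mechanics: The fragmentation is caused by the unconditional invocation of FSCTL_SET_SPARSE inside server/sparse_windows.go, in addition to having numDownloadParts = 16 instead of 1.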

Fix Action

Fix / Workaround

I've specifically tested this and confirmed 0 fragments on that file (via Defraggler) after setting numDownloadParts = 1 in server/download.go and commenting out the FSCTL_SET_SPARSE call in server/sparse_windows.go.

PR fix notes

PR #16032: mlx: defer runtime initialization until imagegen use

Description (problem / solution / changelog)

Summary

This defers MLX runtime initialization until the image generation runner actually needs MLX work. Fixes #15481

Regular Ollama startup currently imports the imagegen packages and can reach MLX package initialization, which loads MLX and creates the global random state. On Windows systems with only an integrated GPU, or systems that only use cloud models and do not have a usable CUDA runtime, that startup-time MLX/CUDA work can fail before the user gets to use non-MLX paths.

This is not intended to make integrated GPUs support MLX image generation. It keeps MLX/CUDA runtime work out of ordinary CLI/server startup so cloud-only and non-imagegen users are not blocked by a local MLX runtime failure.

Changes

  • Split MLX initialization into library/symbol loading (InitMLX) and runtime work (InitRuntime).
  • Removed MLX runtime work from package initialization.
  • Updated imagegen runner entry points to call InitRuntime explicitly.
  • Updated MLX and NN tests to initialize the runtime before exercising MLX operations.
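The split described in this list can be pictured as lazy, run-once initialization. The following is only a sketch of that pattern and not the PR's code: setupRuntime and the runtimeInits counter are hypothetical placeholders, and the use of sync.Once is an assumption about how such a deferral might be done.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	runtimeOnce  sync.Once
	runtimeErr   error
	runtimeInits int // counts how often the expensive setup actually ran
)

// setupRuntime stands in for the expensive, failure-prone runtime work.
// It is a hypothetical placeholder, not the real MLX code.
func setupRuntime() error {
	runtimeInits++
	return nil
}

// InitRuntime performs the runtime setup on first use only. Every later
// call returns the result of that first attempt without repeating the work.
func InitRuntime() error {
	runtimeOnce.Do(func() { runtimeErr = setupRuntime() })
	return runtimeErr
}

func main() {
	// Nothing runs at package initialization; the runner entry point opts in.
	if err := InitRuntime(); err != nil {
		fmt.Println("runtime unavailable:", err)
		return
	}
	_ = InitRuntime() // second call is a no-op
	fmt.Println("runtime setup ran", runtimeInits, "time(s)")
}
```

With this shape, code paths that never call InitRuntime never touch the runtime, which is the property the PR summary describes for cloud-only and non-imagegen users.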

Validation

  • git diff --check HEAD~1..HEAD
  • go test -run '^$' ./cmd
  • go test -run '^$' ./runner
  • go test -run '^$' ./server
  • go test -run '^$' ./x/imagegen/mlx ./x/imagegen/nn ./x/imagegen/safetensors ./x/imagegen/models/...
  • go run . --version
  • OLLAMA_LIBRARY_PATH=%LOCALAPPDATA%\Programs\Ollama\lib\ollama go run . --version

Changed files

  • x/imagegen/cmd/engine/main.go (modified, +3/-3)
  • x/imagegen/mlx/mlx.go (modified, +43/-23)
  • x/imagegen/mlx/mlx_test.go (modified, +5/-5)
  • x/imagegen/nn/nn_test.go (modified, +1/-1)
  • x/imagegen/runner.go (modified, +3/-2)


(Note: AI was used to write most of this)

Description:

When downloading large model layers (-partial files) on Windows (e.g., ollama pull granite4.1:30b-q8_0, which is 31GB), the file system experiences extreme file fragmentation, so later model loads are slow because reading the fragmented file requires random reads. Even with the downloader forced to a single thread (numDownloadParts = 1 instead of 16), interrupting a 31GB download after just 3GB reveals over 859 individual file fragments on an NTFS volume. Larger files were typically seen with tens of thousands of fragments, which slows reads by 10x or more when ollama loads the model from disk (ollama run ...). With the default OLLAMA_LOAD_TIMEOUT=5m0s, the load typically times out before it finishes, at least for a 56GB model in my case.

While the performance penalty of this layout is masked on high-speed NVMe/SSDs due to near-zero hardware seek times, it causes severe read/write degradation on mechanical drives (HDDs). Furthermore, creating ~8,500+ physical extents for a single 31GB file adds substantial and entirely unnecessary overhead within the Windows Master File Table (MFT).

Root Cause & File System Mechanics:

The fragmentation is caused by the unconditional invocation of FSCTL_SET_SPARSE inside server/sparse_windows.go (and presumably x/transfer/sparse_windows.go in the future, though I have not tested that), in addition to having numDownloadParts = 16 instead of 1:

func setSparse(file *os.File) {
	// exFat (and other FS types) don't support sparse files, so ignore errors
	windows.DeviceIoControl( //nolint:errcheck
		windows.Handle(file.Fd()), windows.FSCTL_SET_SPARSE,
		nil, 0,
		nil, 0,
		nil, nil,
	)
}

The codebase uses sparse files to accommodate the default 16-thread concurrent downloader. Because threads write blocks out of order, the sparse attribute allows the file's logical size to be truncated to its full length without forcing the Windows kernel to block operations by auto-prefilling the gaps with zeroes (for security reasons).

However, flagging a file with FSCTL_SET_SPARSE causes NTFS to completely discard its sequential clustering optimizations. NTFS assumes the application will perform random, scattered writes, so it allocates physical storage in tiny, fragmented extents (averaging one fragment every ~3.5MB in testing) even if the data stream arrives linearly. That is why numDownloadParts = 1 is not enough on its own: the file must also be non-sparse.
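As a quick sanity check, the figures quoted in this report are consistent with each other. The sketch below assumes decimal units (1GB = 1000MB) and uses hypothetical helper names; it derives the average extent size from the 859 fragments seen after 3GB and projects it onto the full 31GB file.

```go
package main

import "fmt"

// avgExtentMB returns the mean extent size in MB for a region of writtenMB
// that is split into the given number of fragments.
func avgExtentMB(writtenMB float64, fragments int) float64 {
	return writtenMB / float64(fragments)
}

// projectedExtents scales the observed fragment density to a full-size file.
func projectedExtents(totalMB, extentMB float64) int {
	return int(totalMB / extentMB)
}

func main() {
	extent := avgExtentMB(3000, 859) // 3GB written, 859 fragments observed
	fmt.Printf("average extent: %.2f MB\n", extent)
	fmt.Println("projected extents for 31GB:", projectedExtents(31000, extent))
}
```

The result lands at roughly 3.5MB per extent and a little under 9,000 extents for the whole file, in line with the ~3.5MB and ~8,500+ figures reported here.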

The Interdependency Trap:

Simply bypassing setSparse() while maintaining a multi-threaded download is not a viable solution (I have not tested this variant, though). If a standard (non-sparse) file is truncated to its full size on Windows and an application attempts to write out-of-order chunks (e.g., writing at the 15GB offset before the 1GB offset is filled), the Windows kernel will automatically block and zero-fill the entire unwritten gap in the background to enforce security boundaries (more on this below). On a 30GB+ model, this native filesystem zero-filling causes severe I/O stalls and system stutter.

Therefore, to achieve a zero-fragment layout without triggering background OS zero-fill stalls, the download pipeline must be both non-sparse AND strictly single-threaded/sequential.
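The write pattern this conclusion calls for can be sketched as follows. This is an illustration only, not Ollama's download code: writeSequential and roundTrip are hypothetical names. The file is created as a normal file, is never pre-sized or marked sparse, and is filled strictly front to back.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// writeSequential creates (or overwrites) the file at path and streams src
// into it from byte 0 onward. There is no Truncate to the final size and no
// sparse flag, so the file only grows as data arrives in order.
func writeSequential(path string, src io.Reader) (int64, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o644)
	if err != nil {
		return 0, err
	}
	n, err := io.Copy(f, src)
	if cerr := f.Close(); err == nil {
		err = cerr
	}
	return n, err
}

// roundTrip writes data through writeSequential into a temporary file and
// reads it back, which makes the sketch easy to check on any platform.
func roundTrip(data []byte) ([]byte, error) {
	dir, err := os.MkdirTemp("", "seq-download")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "blob-partial")
	if _, err := writeSequential(path, bytes.NewReader(data)); err != nil {
		return nil, err
	}
	return os.ReadFile(path)
}

func main() {
	got, err := roundTrip([]byte("model bytes arriving in order"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("wrote and read back %d bytes\n", len(got))
}
```

Because no write ever lands beyond the current end of the file, there is no unwritten gap for the kernel to zero-fill. Whether NTFS then allocates the clusters contiguously depends on the volume's free space layout.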

Proposed Solution:

Introduce a new environment variable: OLLAMA_NO_FILE_FRAGMENTATION=1 (or OLLAMA_PREVENT_FILE_FRAGMENTATION=1 or (less clear) OLLAMA_HDD_MODE=1).

When this flag is active, it enforces a specialized, contiguous file-allocation mode by coupling two mandatory behaviors under the hood:

  1. It forces numDownloadParts = 1 inside server/download.go to ensure network blocks are requested end-to-end linearly.
  2. It acts as a conditional guard to completely skip the setSparse() call in sparse_windows.go, allowing the -partial payload to be created as a normal file, ensuring that NTFS allocates space contiguously.

When these two conditions are met simultaneously, NTFS recognizes the linear, sequential write pattern from byte 0 onward. It naturally maps physical clusters consecutively ahead of the growing write pointer, achieving 0 fragments over a massive file payload without requiring elevated privileges or causing background zero-fill delays.
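A minimal sketch of how the proposed flag could couple the two behaviors is shown below. The variable name comes from this proposal; downloadPlan and planFor are hypothetical, and parsing the value with strconv.ParseBool is an assumption rather than how Ollama reads its other environment variables.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// downloadPlan captures the two settings the proposal couples together.
type downloadPlan struct {
	Parts  int  // number of concurrent download parts
	Sparse bool // whether setSparse() would be called on the -partial file
}

// planFor maps the raw value of the proposed variable to a plan.
// Unset, false, or unparsable values keep the current defaults
// (16 parts, sparse file).
func planFor(raw string) downloadPlan {
	if on, err := strconv.ParseBool(raw); err == nil && on {
		return downloadPlan{Parts: 1, Sparse: false}
	}
	return downloadPlan{Parts: 16, Sparse: true}
}

func main() {
	plan := planFor(os.Getenv("OLLAMA_NO_FILE_FRAGMENTATION"))
	fmt.Printf("parts=%d sparse=%t\n", plan.Parts, plan.Sparse)
}
```

Deriving both settings from a single value keeps them from drifting apart, which matters because either change alone does not deliver the contiguous layout described in this proposal.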

I've specifically tested this and confirmed the 0 fragments on that file (via Defraggler):

diff --git a/server/download.go b/server/download.go
index 0019fa13..c3dc39ed 100644
--- a/server/download.go
+++ b/server/download.go
@@ -97,7 +97,7 @@ func (p *blobDownloadPart) UnmarshalJSON(b []byte) error {
 }
 
 const (
-	numDownloadParts          = 16
+	numDownloadParts          = 1
 	minDownloadPartSize int64 = 100 * format.MegaByte
 	maxDownloadPartSize int64 = 1000 * format.MegaByte
 )
diff --git a/server/sparse_windows.go b/server/sparse_windows.go
index f21cbbda..790ba9af 100644
--- a/server/sparse_windows.go
+++ b/server/sparse_windows.go
@@ -3,15 +3,15 @@ package server
 import (
 	"os"
 
-	"golang.org/x/sys/windows"
+//	"golang.org/x/sys/windows"
 )
 
 func setSparse(file *os.File) {
-	// exFat (and other FS types) don't support sparse files, so ignore errors
-	windows.DeviceIoControl( //nolint:errcheck
-		windows.Handle(file.Fd()), windows.FSCTL_SET_SPARSE,
-		nil, 0,
-		nil, 0,
-		nil, nil,
-	)
+//	// exFat (and other FS types) don't support sparse files, so ignore errors
+//	windows.DeviceIoControl( //nolint:errcheck
+//		windows.Handle(file.Fd()), windows.FSCTL_SET_SPARSE,
+//		nil, 0,
+//		nil, 0,
+//		nil, nil,
+//	)
 }

(Compiled with go mod vendor, then go build -mod=vendor -ldflags="-w -s -X=github.com/ollama/ollama/version.Version=custom-mod -X=github.com/ollama/ollama/server.mode=release" -o ollama.exe . and the original exe replaced.)

Average download speed was 344.44Mbps for that 31GB model, which is on par with the speeds I have typically seen with sparse files and 16 parts (though I don't remember the earlier speed for this specific model). Depending on the situation (such as when a model was actively replying in the background), it ranged from about 150Mbps slower to about 100Mbps faster.

Alternative Approaches Dismissed:

  1. Using SetFileValidData (equivalent to fsutil file setvaliddata): While this method avoids the OS-level zero-fill requirement on standard files by advancing the Valid Data Length (VDL) to match the allocated size, it strictly requires elevated Administrator privileges (SE_MANAGE_VOLUME_NAME). This can be verified on the command line via:

fsutil file createnew <filename> <size_in_bytes>
fsutil file setvaliddata <filename> <size_in_bytes>
fsutil file queryValidData <filename>

     Executing the setvaliddata command without an elevated prompt throws an Access is denied error. Forcing users to run the background Ollama server engine as an Administrator to avoid fragmentation represents an unacceptable security risk.

  2. Disabling sparse file allocation alone (equivalent to omitting fsutil sparse setflag): If a file is created normally without the sparse attribute (e.g., pre-sizing a standard file using fsutil file seteof <filename> <size_in_bytes>) but is still handed off to a multi-threaded downloader, severe disk degradation occurs. Because threads write chunks out of order (for instance, writing at the 20GB mark before the 5GB mark is filled), the Windows kernel is forced to automatically block the I/O loop and zero-fill the unwritten gaps in the background to maintain OS security boundaries. On 30GB+ models, this mandatory filesystem-level zero-filling causes massive write overhead, severe disk locking, and system stuttering.
