ollama - cmd: add `ollama fit` to recommend compatible models based on available hardware

GitHub stats
ollama/ollama#14771 · Fetched 2026-04-08 00:31:51
Comments: 0 · Participants: 1 · Timeline events: 1 (labeled ×1) · Reactions: 2


Problem

A new Ollama user faces a blank prompt with no guidance on which model to run. Choosing the wrong one leads to:

  • Out-of-memory crashes when VRAM is insufficient
  • Multi-minute load times from unexpected CPU offloading
  • No way to know in advance whether a 70B model will run at all

There is currently no way to ask Ollama "what can my machine actually run?"

Proposed Solution

Add a new ollama fit subcommand, with a matching GET /api/fit endpoint, that scans the machine and ranks a built-in model catalogue by hardware compatibility.

CLI example:

$ ollama fit

Ollama Fit Check
──────────────────────────────────────────────────────────────
  CPU  : linux (amd64)
  RAM  : 22.4 GB free / 31.9 GB total
  GPU  : CUDA NVIDIA RTX 3080 - 9.2 GB free / 10.0 GB total
  Disk : 180.0 GB free  →  /home/user/.ollama/models
──────────────────────────────────────────────────────────────

  ✅  IDEAL — Full GPU inference, fast
  ────────────────────────────────────────────────────────────
  llama3.2:3b          Q4_K_M    2.0 GB    ~82 tok/s  GPU
  phi3:3.8b            Q4_K_M    2.3 GB    ~80 tok/s  GPU
  mistral:7b           Q4_K_M    4.5 GB    ~55 tok/s  GPU
  llama3.1:8b          Q4_K_M    4.9 GB    ~51 tok/s  GPU

  🟡  GOOD — Minor CPU offload
  ────────────────────────────────────────────────────────────
  gemma2:9b            Q4_K_M    5.5 GB    ~38 tok/s  GPU+CPU

API example:

GET /api/fit
GET /api/fit?tags=code
GET /api/fit?family=qwen&all=true
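
A client could assemble these query strings with net/url instead of by hand. The following is only a sketch: FitQuery and fitURL are hypothetical names that do not come from the issue, the base address is the one used in the curl command under Tests, and the meaning of all=true is not spelled out in the source. Only the tags, family, and all parameter names are taken from the examples above.

```go
package main

import (
	"fmt"
	"net/url"
)

// FitQuery holds the filters that appear in the API examples.
// The type itself is hypothetical and exists only for this sketch.
type FitQuery struct {
	Tags   string // e.g. "code"
	Family string // e.g. "qwen"
	All    bool   // sent as all=true; its exact semantics are not stated in the issue
}

// fitURL builds a GET /api/fit URL and omits parameters that are unset.
func fitURL(base string, q FitQuery) string {
	v := url.Values{}
	if q.Tags != "" {
		v.Set("tags", q.Tags)
	}
	if q.Family != "" {
		v.Set("family", q.Family)
	}
	if q.All {
		v.Set("all", "true")
	}
	u := base + "/api/fit"
	if enc := v.Encode(); enc != "" {
		u += "?" + enc
	}
	return u
}

func main() {
	base := "http://localhost:11434"
	fmt.Println(fitURL(base, FitQuery{}))
	fmt.Println(fitURL(base, FitQuery{Tags: "code"}))
	fmt.Println(fitURL(base, FitQuery{Family: "qwen", All: true}))
}
```

url.Values.Encode sorts keys alphabetically, so the last URL is emitted as ?all=true&family=qwen rather than in the order shown in the example; the two forms are equivalent query strings.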

Startup TUI: A "Fit Check" entry in the ollama menu opens a tabbed screen. Users browse tier tabs with ←/→, select models with space, and press Enter to pull them — without leaving the terminal.

Why This Belongs in Ollama Core

No new hardware detection. The implementation delegates entirely to discover.GPUDevices() and ml.SystemInfo — the same paths the scheduler already uses. Disk space uses syscall.Statfs, which is one syscall.
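
To illustrate the disk-space claim, here is a minimal sketch of a free-space query through syscall.Statfs on Linux or macOS. It is not taken from the fork: freeDiskBytes is a hypothetical helper name, and the path "/" stands in for the models directory.

```go
package main

import (
	"fmt"
	"syscall"
)

// freeDiskBytes reports the bytes available to an unprivileged user on the
// filesystem that contains path, using a single statfs call.
// Hypothetical helper; intended for Linux and macOS only.
func freeDiskBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, fmt.Errorf("statfs %s: %w", path, err)
	}
	// Bavail is the number of blocks available to non-root users and Bsize is
	// the block size. Their integer types differ between platforms, so both
	// are converted explicitly.
	return uint64(st.Bavail) * uint64(st.Bsize), nil
}

func main() {
	free, err := freeDiskBytes("/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.1f GB free\n", float64(free)/1e9)
}
```

Windows has no statfs, which is presumably why the package lists a separate disk_windows.go alongside disk_unix.go.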

No new dependencies. Only packages already in go.mod are used.

Follows existing patterns exactly:

  • Handler is a *Server method in server/routes.go, same as ListHandler, ShowHandler, etc.
  • Client method in api/client.go follows the same pattern as client.List().
  • CLI uses the same Cobra + tabwriter pattern as ollama list.
  • TUI screen is a self-contained bubbletea model injected into the existing state machine — zero changes to the core render loop.

Works offline. The catalogue is static data compiled into the binary. No network calls.

Installed models detected correctly. Uses manifest.Manifests() — the same path as ollama list — to mark already-downloaded models.

Implementation

Working implementation on my fork: https://github.com/khalilkhamassi62-oss/ollama/commit/773609a7

New package fitcheck/:

  • hardware.go — collects GPU, RAM, disk into HardwareProfile
  • requirements.go — 165-entry catalogue across 72 model families (Llama, Mistral, Phi, Gemma, Qwen, DeepSeek, Granite, vision, embedding, reasoning models)
  • scorer.go — 4-component scoring: VRAM fit (40%), RAM headroom (25%), disk space (15%), GPU generation speed class (20%)
  • disk_unix.go / disk_windows.go — platform disk stats
  • scorer_test.go — 10 tests, no real hardware required

Scoring Model

VRAM score:  model fits entirely on GPU → 1.0 (RunMode: GPU)
             partial fit, RAM can offload → 0.25–0.65 (GPU+CPU)
             no GPU or can't fit → 0.0 (CPU)

RAM score:   available ≥ required → 1.0
             total ≥ 85% of required → 0.5 + warning note
             insufficient → 0.0

Disk score:  available ≥ model size → 1.0
             insufficient → 0.0

Speed score: Metal (Apple Silicon 36GB+) → 1.0 / ~120 tok/s
             CUDA SM9+ (H100, RTX 40xx)  → 1.0 / ~150 tok/s
             CUDA SM8  (A100, RTX 30xx)  → 0.85 / ~100 tok/s
             CUDA SM7  (V100, RTX 20xx)  → 0.65 / ~60 tok/s
             ROCm                         → 0.70 / ~70 tok/s
             CPU only                     → 0.15 / ~3 tok/s

Final = VRAM×0.40 + RAM×0.25 + Disk×0.15 + Speed×0.20

Tier:  ≥0.82 → Ideal  |  ≥0.62 → Good  |  ≥0.38 → Marginal
       ≥0.15 → Possible  |  <0.15 or RAM+Disk both 0 → Too Large

Alternatives Considered

Alternative                            | Why not
Recommend models on ollama.com         | Leaves the terminal, ignores current free VRAM/RAM
Add requirements to ollama show        | Only useful after you already know which model
Separate installable tool              | Installation friction, splits the UX
Dynamic fetch from ollama.com/library  | Network dependency, latency at startup

Open Questions for Maintainers

  1. Should the TUI entry be gated behind OLLAMA_EXPERIMENT=fitcheck initially?
  2. Is GET /api/fit the right path, or would /api/hardware returning just the hardware profile (separate from scoring) be more composable?
  3. Should EstTPS be removed from the JSON response since it is estimated, not measured?

Tests

# No hardware required
go test ./fitcheck/...

# Requires `ollama serve`
ollama fit
ollama fit --tags code --json | jq '.models[0]'
curl 'http://localhost:11434/api/fit?tags=embed' | jq '.models[].req.name'

Tested on: Ubuntu 24.04, no GPU, 32 GB RAM

extent analysis

Fix Plan

To implement the ollama fit subcommand and GET /api/fit endpoint, follow these steps:

  1. Create a new package fitcheck/:

    • Add hardware.go to collect GPU, RAM, and disk information into a HardwareProfile struct.
    • Add requirements.go to define a catalogue of models with their requirements.
    • Add scorer.go to calculate a score based on VRAM fit, RAM headroom, disk space, and GPU generation speed class.
    • Add disk_unix.go and disk_windows.go for platform-specific disk statistics.
    • Add scorer_test.go for unit tests.
  2. Implement the ollama fit subcommand:

    • Use the Cobra library to create a new command.
    • Call the fitcheck package to scan the machine and rank models by hardware compatibility.
    • Print the results in a tabular format.
  3. Implement the GET /api/fit endpoint:

    • Create a new handler function in server/routes.go.
    • Call the fitcheck package to scan the machine and rank models by hardware compatibility.
    • Return the results in JSON format.
  4. Add a TUI entry for the fit check:

    • Create a new bubbletea model for the fit check screen.
    • Inject the model into the existing state machine.

Example Code

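// fitcheck/hardware.go
package fitcheck

import "errors"

// HardwareProfile summarises the resources that determine whether a model fits.
// Sizes are assumed to be in bytes; the source does not state the unit.
type HardwareProfile struct {
    CPU  string // OS and architecture
    RAM  uint64 // system memory
    GPU  string // GPU description
    Disk uint64 // free space for the models directory
}

// GetHardwareProfile collects hardware information using
// discover.GPUDevices() and ml.SystemInfo. The wiring is left as a stub.
func GetHardwareProfile() (*HardwareProfile, error) {
    return nil, errors.New("fitcheck: hardware detection not implemented")
}

// fitcheck/scorer.go

// Model is one catalogue entry together with its resource requirements.
type Model struct {
    Name  string
    VRAM  uint64
    RAM   uint64
    Disk  uint64
    Speed float64
}

// GetModelCatalogue returns the built-in catalogue. It is empty in this stub.
func GetModelCatalogue() []Model {
    return nil
}

// CalculateScore rates a model against the hardware based on VRAM fit, RAM
// headroom, disk space, and GPU generation speed class. Stub: always 0.
func CalculateScore(hardware *HardwareProfile, model *Model) float64 {
    return 0
}

// ollama_fit.go
package cmd

import (
    "fmt"
    "text/tabwriter"

    "github.com/spf13/cobra"

    "github.com/ollama/ollama/fitcheck"
)

// FitCommand returns the `ollama fit` subcommand.
func FitCommand() *cobra.Command {
    return &cobra.Command{
        Use:   "fit",
        Short: "Check which models can run on your machine",
        // RunE propagates errors to Cobra instead of silently ignoring them.
        RunE: func(cmd *cobra.Command, args []string) error {
            hardware, err := fitcheck.GetHardwareProfile()
            if err != nil {
                return err
            }
            // Print results in a tabular format.
            w := tabwriter.NewWriter(cmd.OutOrStdout(), 0, 8, 2, ' ', 0)
            fmt.Fprintln(w, "MODEL\tSCORE")
            for _, model := range fitcheck.GetModelCatalogue() {
                score := fitcheck.CalculateScore(hardware, &model)
                fmt.Fprintf(w, "%s\t%.2f\n", model.Name, score)
            }
            return w.Flush()
        },
    }
}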

// server/routes.go
func FitHandler(w http
