ollama - cmd: add `ollama fit` to recommend compatible models based on available hardware

ollama2026-03-11 01:11:04

ollama/ollama#14771•Fetched 2026-04-08 00:31:51

Author

khalilkhamassi62-oss


Problem

A new Ollama user faces a blank prompt with no guidance on which model to run. Choosing wrong leads to:

- Out-of-memory crashes when VRAM is insufficient
- Multi-minute load times from unexpected CPU offloading
- No way to know in advance whether a 70B model will run at all

There is currently no way to ask Ollama "what can my machine actually run?"

Proposed Solution

Add a new ollama fit subcommand, with a matching GET /api/fit endpoint, that scans the machine and ranks a built-in model catalogue by hardware compatibility.

CLI example:

$ ollama fit

Ollama Fit Check
──────────────────────────────────────────────────────────────
  CPU  : linux (amd64)
  RAM  : 22.4 GB free / 31.9 GB total
  GPU  : CUDA NVIDIA RTX 3080  •  9.2 GB free / 10.0 GB total
  Disk : 180.0 GB free  →  /home/user/.ollama/models
──────────────────────────────────────────────────────────────

  ✅  IDEAL — Full GPU inference, fast
  ────────────────────────────────────────────────────────────
  llama3.2:3b          Q4_K_M    2.0 GB    ~82 tok/s  GPU
  phi3:3.8b            Q4_K_M    2.3 GB    ~80 tok/s  GPU
  mistral:7b           Q4_K_M    4.5 GB    ~55 tok/s  GPU
  llama3.1:8b          Q4_K_M    4.9 GB    ~51 tok/s  GPU

  🟡  GOOD — Minor CPU offload
  ────────────────────────────────────────────────────────────
  gemma2:9b            Q4_K_M    5.5 GB    ~38 tok/s  GPU+CPU

API example:

GET /api/fit
GET /api/fit?tags=code
GET /api/fit?family=qwen&all=true
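A client could assemble these query strings as in the sketch below. The `fitURL` helper is hypothetical; only the `/api/fit` path and the parameter names `tags`, `family` and `all` come from the examples above, and the sketch assumes a single tag value per request.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// fitURL builds a request URL for the proposed GET /api/fit endpoint.
// Hypothetical helper: empty arguments are simply left out of the query.
func fitURL(base, tag, family string, all bool) string {
	q := url.Values{}
	if tag != "" {
		q.Set("tags", tag)
	}
	if family != "" {
		q.Set("family", family)
	}
	if all {
		q.Set("all", "true")
	}
	u := strings.TrimRight(base, "/") + "/api/fit"
	if enc := q.Encode(); enc != "" {
		u += "?" + enc // Encode sorts keys, so "all" precedes "family"
	}
	return u
}

func main() {
	const base = "http://localhost:11434"
	fmt.Println(fitURL(base, "", "", false))
	fmt.Println(fitURL(base, "code", "", false))
	fmt.Println(fitURL(base, "", "qwen", true))
}
```

The parameter order in the last URL differs from the example above because `url.Values.Encode` sorts by key; the two forms are equivalent to a server that parses the query string.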

Startup TUI: A "Fit Check" entry in the ollama menu opens a tabbed screen. Users browse tier tabs with ←/→, select models with space, and press Enter to pull them — without leaving the terminal.

Why This Belongs in Ollama Core

No new hardware detection. The implementation delegates entirely to discover.GPUDevices() and ml.SystemInfo — the same paths the scheduler already uses. Disk space uses syscall.Statfs, which is one syscall.
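As a rough illustration of the disk check, the sketch below reads free space with a single `syscall.Statfs` call. It is not taken from the fork: the `freeBytes` name is an assumption, and the code compiles only on Unix-like systems (the proposal keeps Windows in a separate disk_windows.go).

```go
package main

import (
	"fmt"
	"syscall"
)

// freeBytes reports the space available to unprivileged users on the
// filesystem that contains path. Hypothetical helper, Unix-like systems only.
func freeBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, fmt.Errorf("statfs %s: %w", path, err)
	}
	// Bavail counts blocks available to non-root users. The explicit
	// conversions keep this portable across platforms where the field
	// types differ.
	return uint64(st.Bavail) * uint64(st.Bsize), nil
}

func main() {
	n, err := freeBytes("/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.1f GB free\n", float64(n)/1e9)
}
```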

No new dependencies. Only packages already in go.mod are used.

Follows existing patterns exactly:

- Handler is a *Server method in server/routes.go, same as ListHandler, ShowHandler, etc.
- Client method in api/client.go follows the same pattern as client.List().
- CLI uses the same Cobra + tabwriter pattern as ollama list.
- TUI screen is a self-contained bubbletea model injected into the existing state machine, with zero changes to the core render loop.
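For readers unfamiliar with the tabwriter pattern mentioned above, here is a minimal standard-library sketch of column-aligned output. The `row` type and `renderTable` function are illustrative names, not code from the fork; the sample values are copied from the CLI example in this issue.

```go
package main

import (
	"bytes"
	"fmt"
	"text/tabwriter"
)

// row is a hypothetical result record used only for this illustration.
type row struct {
	Name, Quant, RunMode string
	SizeGB, EstTPS       float64
}

// renderTable formats rows as tab-aligned columns, the way a list-style
// CLI command typically does with text/tabwriter.
func renderTable(rows []row) string {
	var buf bytes.Buffer
	w := tabwriter.NewWriter(&buf, 0, 8, 2, ' ', 0)
	fmt.Fprintln(w, "MODEL\tQUANT\tSIZE\tEST. SPEED\tMODE")
	for _, r := range rows {
		fmt.Fprintf(w, "%s\t%s\t%.1f GB\t~%.0f tok/s\t%s\n",
			r.Name, r.Quant, r.SizeGB, r.EstTPS, r.RunMode)
	}
	w.Flush() // column widths are computed when the writer is flushed
	return buf.String()
}

func main() {
	fmt.Print(renderTable([]row{
		{Name: "llama3.2:3b", Quant: "Q4_K_M", RunMode: "GPU", SizeGB: 2.0, EstTPS: 82},
		{Name: "gemma2:9b", Quant: "Q4_K_M", RunMode: "GPU+CPU", SizeGB: 5.5, EstTPS: 38},
	}))
}
```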

Works offline. The catalogue is static data compiled into the binary. No network calls.

Installed models detected correctly. Uses manifest.Manifests() — the same path as ollama list — to mark already-downloaded models.

Implementation

Working implementation on my fork: https://github.com/khalilkhamassi62-oss/ollama/commit/773609a7

New package fitcheck/:

- hardware.go: collects GPU, RAM, disk into HardwareProfile
- requirements.go: 165-entry catalogue across 72 model families (Llama, Mistral, Phi, Gemma, Qwen, DeepSeek, Granite, vision, embedding, reasoning models)
- scorer.go: 4-component scoring: VRAM fit (40%), RAM headroom (25%), disk space (15%), GPU generation speed class (20%)
- disk_unix.go / disk_windows.go: platform disk stats
- scorer_test.go: 10 tests, no real hardware required

Scoring Model

VRAM score:  model fits entirely on GPU → 1.0 (RunMode: GPU)
             partial fit, RAM can offload → 0.25–0.65 (GPU+CPU)
             no GPU or can't fit → 0.0 (CPU)

RAM score:   available ≥ required → 1.0
             total ≥ 85% of required → 0.5 + warning note
             insufficient → 0.0

Disk score:  available ≥ model size → 1.0
             insufficient → 0.0

Speed score: Metal (Apple Silicon 36GB+) → 1.0 / ~120 tok/s
             CUDA SM9+ (H100, RTX 40xx)  → 1.0 / ~150 tok/s
             CUDA SM8  (A100, RTX 30xx)  → 0.85 / ~100 tok/s
             CUDA SM7  (V100, RTX 20xx)  → 0.65 / ~60 tok/s
             ROCm                         → 0.70 / ~70 tok/s
             CPU only                     → 0.15 / ~3 tok/s

Final = VRAM×0.40 + RAM×0.25 + Disk×0.15 + Speed×0.20

Tier:  ≥0.82 → Ideal  |  ≥0.62 → Good  |  ≥0.38 → Marginal
       ≥0.15 → Possible  |  <0.15 or RAM+Disk both 0 → Too Large
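The rules above translate fairly directly into code. The following is a sketch under stated assumptions, not the fork's scorer.go: all function names are hypothetical, sizes are taken to be in bytes, the VRAM and speed scores are passed in rather than derived, and "RAM+Disk both 0" is read as "the RAM score and the disk score are both zero".

```go
package main

import "fmt"

// ramScore applies the RAM rule: full marks when free RAM covers the
// requirement, half marks when total RAM reaches 85% of it, else zero.
func ramScore(available, total, required uint64) float64 {
	switch {
	case available >= required:
		return 1.0
	case float64(total) >= 0.85*float64(required):
		return 0.5 // the proposal also attaches a warning note in this case
	default:
		return 0.0
	}
}

// diskScore is all-or-nothing: the model either fits on disk or it does not.
func diskScore(available, modelSize uint64) float64 {
	if available >= modelSize {
		return 1.0
	}
	return 0.0
}

// finalScore combines the four component scores with the stated weights.
func finalScore(vram, ram, disk, speed float64) float64 {
	return vram*0.40 + ram*0.25 + disk*0.15 + speed*0.20
}

// tier maps a final score to a tier name. The Too Large override is
// checked first so it wins over any numeric threshold.
func tier(final, ram, disk float64) string {
	switch {
	case final < 0.15 || (ram == 0 && disk == 0):
		return "Too Large"
	case final >= 0.82:
		return "Ideal"
	case final >= 0.62:
		return "Good"
	case final >= 0.38:
		return "Marginal"
	default:
		return "Possible"
	}
}

func main() {
	// Full GPU fit on a CUDA SM8 card (speed score 0.85).
	gpu := finalScore(1.0, 1.0, 1.0, 0.85)
	fmt.Printf("%.2f %s\n", gpu, tier(gpu, 1.0, 1.0)) // 0.97 Ideal

	// CPU-only machine with enough RAM and disk (speed score 0.15).
	cpu := finalScore(0.0, 1.0, 1.0, 0.15)
	fmt.Printf("%.2f %s\n", cpu, tier(cpu, 1.0, 1.0)) // 0.43 Marginal
}
```

One consequence of these weights: a CPU-only machine scores at most 0×0.40 + 1×0.25 + 1×0.15 + 0.15×0.20 = 0.43, so under this reading it can never rank a model above Marginal.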

Alternatives Considered

| Alternative | Why not |
| --- | --- |
| Recommend models on ollama.com | Leaves the terminal, ignores current free VRAM/RAM |
| Add requirements to `ollama show` | Only useful after you already know which model |
| Separate installable tool | Installation friction, splits the UX |
| Dynamic fetch from ollama.com/library | Network dependency, latency at startup |

Open Questions for Maintainers

- Should the TUI entry be gated behind OLLAMA_EXPERIMENT=fitcheck initially?
- Is GET /api/fit the right path, or would /api/hardware returning just the hardware profile (separate from scoring) be more composable?
- Should EstTPS be removed from the JSON response since it is estimated, not measured?

Tests

# No hardware required
go test ./fitcheck/...

# Requires `ollama serve`
ollama fit
ollama fit --tags code --json | jq '.models[0]'
curl 'http://localhost:11434/api/fit?tags=embed' | jq '.models[].req.name'

Tested on: Ubuntu 24.04, no GPU, 32 GB RAM

extent analysis

Fix Plan

To implement the ollama fit subcommand and GET /api/fit endpoint, follow these steps:

1. Create a new package fitcheck/:
   - Add hardware.go to collect GPU, RAM, and disk information into a HardwareProfile struct.
   - Add requirements.go to define a catalogue of models with their requirements.
   - Add scorer.go to calculate a score based on VRAM fit, RAM headroom, disk space, and GPU generation speed class.
   - Add disk_unix.go and disk_windows.go for platform-specific disk statistics.
   - Add scorer_test.go for unit tests.
2. Implement the ollama fit subcommand:
   - Use the Cobra library to create a new command.
   - Call the fitcheck package to scan the machine and rank models by hardware compatibility.
   - Print the results in a tabular format.
3. Implement the GET /api/fit endpoint:
   - Create a new handler function in server/routes.go.
   - Call the fitcheck package to scan the machine and rank models by hardware compatibility.
   - Return the results in JSON format.
4. Add a TUI entry for the fit check:
   - Create a new bubbletea model for the fit check screen.
   - Inject the model into the existing state machine.
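The endpoint step can be sketched with the standard library alone. This is an illustration, not the fork's handler: it uses plain net/http instead of the router in server/routes.go, returns a canned result instead of calling fitcheck, and the response types are hypothetical. Only the JSON path `.models[].req.name` is implied by the jq examples in the issue.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Hypothetical response types shaped to match .models[].req.name.
type modelReq struct {
	Name string `json:"name"`
}

type fitResult struct {
	Req modelReq `json:"req"`
}

type fitResponse struct {
	Models []fitResult `json:"models"`
}

// fitHandler answers GET /api/fit with a canned ranking. A real handler
// would scan the hardware and score the catalogue here.
func fitHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	resp := fitResponse{Models: []fitResult{{Req: modelReq{Name: "llama3.2:3b"}}}}
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(resp); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}

// serve runs one request through the handler in memory, without opening
// a network port, and returns the status code and body.
func serve(method, target string) (int, string) {
	rec := httptest.NewRecorder()
	fitHandler(rec, httptest.NewRequest(method, target, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := serve(http.MethodGet, "/api/fit?tags=code")
	fmt.Print(code, " ", body)
}
```

Driving the handler through httptest keeps the example runnable without `ollama serve`, in the same spirit as the hardware-free unit tests described in the issue.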

Example Code

// hardware.go
type HardwareProfile struct {
    CPU    string
    RAM    uint64
    GPU    string
    Disk   uint64
}

func GetHardwareProfile() (*HardwareProfile, error) {
    // Collect hardware information using discover.GPUDevices() and ml.SystemInfo
    panic("not implemented") // placeholder so the stub compiles
}

// scorer.go
type Model struct {
    Name    string
    VRAM    uint64
    RAM     uint64
    Disk    uint64
    Speed   float64
}

func CalculateScore(hardware *HardwareProfile, model *Model) float64 {
    // Calculate score based on VRAM fit, RAM headroom, disk space, and GPU generation speed class
    panic("not implemented") // placeholder so the stub compiles
}

// ollama_fit.go
func FitCommand() *cobra.Command {
    cmd := &cobra.Command{
        Use:   "fit",
        Short: "Check which models can run on your machine",
        // RunE lets Cobra report the error instead of continuing with a nil profile.
        RunE: func(cmd *cobra.Command, args []string) error {
            hardware, err := GetHardwareProfile()
            if err != nil {
                return err
            }
            models := GetModelCatalogue()
            scores := make(map[string]float64)
            for _, model := range models {
                scores[model.Name] = CalculateScore(hardware, model)
            }
            // Print results in a tabular format
            return nil
        },
    }
    return cmd
}

// server/routes.go
func FitHandler(w http
