ollama: [Performance Disclosure] Temple C-Runtime: 41x faster TTFB and 59x storage reduction vs pgvector [1 comment, 1 participant]

ollama/ollama#15816 · Fetched 2026-04-26 05:06:01
Summary

I am disclosing a performance breakthrough in LLM inference and storage architecture. Temple is a C-backed runtime boundary inspired by the 'motorbike' philosophy of Terry Davis.

I have released a public Proof Harness to verify these claims, while the core optimization engine remains a private suite.

Benchmarked Results (Audited)

On the frozen_v1_x128 dataset, Temple outperforms the pgvector baseline significantly:

| Metric | pgvector (Baseline) | Temple (Private Suite) | Advantage |
|---|---|---|---|
| Startup (TTFB) | 0.414s | 0.010s | 41x Faster |
| Storage Footprint | 1,073 KB | 18 KB | 59x Smaller |
| Data I/O (Read) | 389 KB | 16 KB | 23x Less I/O |
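The "Advantage" column can be recomputed directly from the other two columns. The snippet below is a minimal sketch of that arithmetic, using only the rounded values printed in the table; the names `ROWS` and `advantage` are illustrative and do not come from the Temple repository.

```python
# Values copied from the table above (seconds and kilobytes, as printed).
ROWS = {
    "Startup (TTFB), s": (0.414, 0.010),
    "Storage footprint, KB": (1073, 18),
    "Data I/O (read), KB": (389, 16),
}


def advantage(baseline: float, temple: float) -> float:
    """Return how many times larger the baseline value is than Temple's."""
    return baseline / temple


ratios = {name: advantage(b, t) for name, (b, t) in ROWS.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

With the rounded inputs, the ratios come out to roughly 41.4x, 59.6x, and 24.3x. The last is slightly above the stated 23x, which may simply reflect rounding in the published KB figures.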

HF-backed scaling proof:

~10k train examples (Source: comparison.md)

  • Train examples: 9,984
  • First batch: Temple 0.060s, Baseline 0.432s
  • Throughput: Temple 2354.376 ex/s, Baseline 420.715 ex/s
  • Token throughput: Temple 46631.687 tok/s, Baseline 8039.136 tok/s
  • Total train time: Temple 4.241s, Baseline 24.163s
  • Padding waste: Temple 0.019, Baseline 0.130
  • Bytes read: Temple 217,938, Baseline 5,559,424
  • Storage: Temple 234,190, Baseline 6,447,104
  • Eval exact match: 0.0 / 0.0

~100k train examples (Source: comparison.md)

  • Train examples: 100,608
  • First batch: Temple 0.050s, Baseline 2.434s
  • Throughput: Temple 4986.644 ex/s, Baseline 515.727 ex/s
  • Token throughput: Temple 96738.244 tok/s, Baseline 9612.227 tok/s
  • Total train time: Temple 20.175s, Baseline 197.514s
  • Padding waste: Temple 0.018, Baseline 0.129
  • Bytes read: Temple 2,196,046, Baseline 56,122,496
  • Storage: Temple 2,359,231, Baseline 60,243,968
  • Eval exact match: 0.0 / 0.0

What the curve shows

  • Temple startup stayed near-zero: 0.060s -> 0.050s
  • Temple throughput increased: 2354 -> 4987 ex/s
  • Temple padding waste stayed low: 0.019 -> 0.018
  • Baseline startup got much worse: 0.432s -> 2.434s
  • Temple kept a very large bytes-read and storage advantage
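The two runs quoted above can be cross-checked against each other with simple arithmetic: examples divided by throughput should roughly equal the total train time. The sketch below does that using only the figures reported in the issue; the structure and names (`RUNS`, `report`) are illustrative.

```python
# Figures copied from the two comparison.md runs quoted in the issue.
RUNS = {
    "10k": {
        "examples": 9_984,
        "temple": {"ex_per_s": 2354.376, "total_s": 4.241, "storage": 234_190},
        "baseline": {"ex_per_s": 420.715, "total_s": 24.163, "storage": 6_447_104},
    },
    "100k": {
        "examples": 100_608,
        "temple": {"ex_per_s": 4986.644, "total_s": 20.175, "storage": 2_359_231},
        "baseline": {"ex_per_s": 515.727, "total_s": 197.514, "storage": 60_243_968},
    },
}

report = {}
for label, run in RUNS.items():
    temple, baseline = run["temple"], run["baseline"]
    report[label] = {
        # Time implied by the reported throughput alone.
        "temple_implied_s": run["examples"] / temple["ex_per_s"],
        "baseline_implied_s": run["examples"] / baseline["ex_per_s"],
        # Relative advantages at this dataset size.
        "throughput_ratio": temple["ex_per_s"] / baseline["ex_per_s"],
        "storage_ratio": baseline["storage"] / temple["storage"],
    }

for label, row in report.items():
    print(label, {key: round(value, 3) for key, value in row.items()})
```

On these inputs the Temple implied times agree with the reported totals to within rounding, and the throughput advantage grows from about 5.6x to about 9.7x. The baseline implied times (about 23.7s and 195.1s) fall short of the reported totals by almost exactly the reported first-batch times, so the baseline throughput may have been computed with startup excluded; the issue does not say how either figure was measured.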

The "Temple" Architecture

  1. The Public Proof (Available now): The GitHub repository contains the src/temple.c runtime layer and the proof/ harness. This allows anyone to verify the accuracy jumps (98% on gemma2:2b) and the speed-to-correct metrics using the public interface.
  2. The Private Suite: The logic responsible for the 90% model storage reduction and the 0.01s startup is contained in a separate, private runtime suite. This suite handles live training and immutable version review.

Why This Matters

By moving the validation and monolith path into a dedicated C engine, Temple bypasses the "Python tax" and standard vector bloat. This is not just a benchmark; it is a validated path to running high-accuracy models on edge hardware with near-zero latency.

Repo: https://github.com/xxartfulxx/Temple

I will shortly post a proof page demo of the rest of the suite I have built.

CC: @jmorganca

extent analysis

TL;DR

To achieve significant performance breakthroughs in LLM inference and storage architecture, as demonstrated by Temple, consider implementing a C-backed runtime boundary that optimizes startup time, storage footprint, and data I/O.

Guidance

  • Review the Temple architecture and its public Proof Harness to understand how it achieves accuracy jumps and speed-to-correct metrics.
  • Investigate the use of a dedicated C engine to bypass the "Python tax" and standard vector bloat, as seen in Temple's private suite.
  • Consider the implications of moving validation and monolith paths into a dedicated C engine for running high-accuracy models on edge hardware with near-zero latency.
  • Explore the Temple repository (https://github.com/xxartfulxx/Temple) for more information on the src/temple.c runtime layer and the proof/ harness.
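When reviewing claims like these, it can help to measure first-batch latency and throughput independently of any one harness. The following is a minimal sketch of such a timer. It is not the Temple proof harness, and `measure_loader` and `toy_loader` are hypothetical names introduced here for illustration.

```python
import time
from typing import Callable, Iterable, Iterator


def measure_loader(make_batches: Callable[[], Iterable[list]]) -> dict:
    """Time the first batch and the overall example throughput of a loader."""
    start = time.perf_counter()
    batches: Iterator[list] = iter(make_batches())
    first = next(batches)  # startup cost is paid before this returns
    first_batch_s = time.perf_counter() - start

    examples = len(first)
    for batch in batches:
        examples += len(batch)
    total_s = time.perf_counter() - start

    return {
        "first_batch_s": first_batch_s,
        "total_s": total_s,
        "examples": examples,
        # Throughput here includes startup time; state this when comparing.
        "ex_per_s": examples / total_s if total_s > 0 else float("inf"),
    }


def toy_loader():
    """Hypothetical stand-in loader: 10 batches of 8 examples each."""
    for i in range(10):
        yield [i] * 8


stats = measure_loader(toy_loader)
print(stats)
```

Passing the same function both loaders under comparison, with the same definition of throughput, avoids mixing figures that include startup time with figures that do not.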

Notes

The provided information focuses on the performance breakthroughs achieved by Temple and does not explicitly state a problem to be solved. Therefore, the guidance is centered around understanding and potentially implementing similar optimizations.

Recommendation

Apply the principles demonstrated by Temple's architecture, such as using a C-backed runtime boundary and optimizing for startup time, storage footprint, and data I/O, to achieve significant performance improvements in LLM inference and storage.
