QueryForge: A SQL Database Engine Built From Scratch
Systems · completed

Mar 2026 · 8 weeks
3 vs 18 row visits with the index versus without: flip one toggle and watch the cost of a missing index appear in real time on screen.

Overview

Most "database demos" are a thin UI over Postgres. QueryForge is the opposite: the database is the project. Rows live in real 4 KB pages, the index is a real B+tree where every node is a page, and durability comes from a real write-ahead log — built in honest layers, each independently testable. The headline feature is that you can see it all happen: type SQL and the tokenizer, parser, planner, B+tree and pages animate in order, replayed from the engine's own event log.
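To make "rows live in real 4 KB pages" concrete, here is a minimal sketch of a slotted page. The layout (a 4-byte header, a slot directory growing up, row bytes growing down) and every name in it are assumptions for illustration, not QueryForge's actual page format.

```typescript
// Hypothetical layout (an assumption, not the project's real format):
// [u16 slotCount][u16 freeEnd][slot 0: u16 offset, u16 length]... free ...[row 1][row 0]
const PAGE_SIZE = 4096;
const HEADER_SIZE = 4;
const SLOT_SIZE = 4;

class SlottedPage {
  readonly bytes = new Uint8Array(PAGE_SIZE);
  private readonly view = new DataView(this.bytes.buffer);

  constructor() {
    this.view.setUint16(0, 0);         // slot count
    this.view.setUint16(2, PAGE_SIZE); // row data grows downward from the page end
  }

  get slotCount(): number {
    return this.view.getUint16(0);
  }

  private get freeEnd(): number {
    return this.view.getUint16(2);
  }

  /** Stores a row and returns its slot index, or -1 if the row does not fit. */
  insert(row: Uint8Array): number {
    const slotCount = this.slotCount;
    const slotDirEnd = HEADER_SIZE + (slotCount + 1) * SLOT_SIZE;
    const rowStart = this.freeEnd - row.length;
    if (rowStart < slotDirEnd) return -1; // directory and row data would collide

    this.bytes.set(row, rowStart);
    const slotPos = HEADER_SIZE + slotCount * SLOT_SIZE;
    this.view.setUint16(slotPos, rowStart);
    this.view.setUint16(slotPos + 2, row.length);
    this.view.setUint16(0, slotCount + 1);
    this.view.setUint16(2, rowStart);
    return slotCount;
  }

  /** Returns a copy of the row bytes stored in the given slot. */
  read(slot: number): Uint8Array {
    if (slot < 0 || slot >= this.slotCount) throw new RangeError(`no slot ${slot}`);
    const slotPos = HEADER_SIZE + slot * SLOT_SIZE;
    const offset = this.view.getUint16(slotPos);
    const length = this.view.getUint16(slotPos + 2);
    return this.bytes.slice(offset, offset + length);
  }
}

const page = new SlottedPage();
const slot = page.insert(new TextEncoder().encode("alice"));
const text = new TextDecoder().decode(page.read(slot));
```

The slot directory is what lets a row be addressed as (page, slot) while its bytes stay free to move inside the page.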

Tech Stack

engine: TypeScript (strict), B+tree, slotted pages, WAL
frontend: Next.js, React, Tailwind, CodeMirror 6
testing: Vitest, fast-check (property-based)

Challenges

  • A balanced tree is an invariant, not a vibe — splits have to propagate correctly all the way up to a growing root, or lookups silently rot.
  • Durability with no database to lean on: surviving a "pull the plug" meant getting write-ahead logging and replay genuinely right.
  • Keeping the on-screen animation honest — it may only ever show events the engine actually emitted, never a pretty approximation.
  • A two-evening heisenbug: stale bytes from a shrunk B+tree node leaking into the next decode, so a leaf "remembered" keys it no longer had.
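The stale-bytes bug in the last bullet can be reproduced in miniature. The sketch below assumes a hypothetical leaf format (non-zero u32 keys terminated by a 0 key) chosen only because it makes the failure easy to see; the real node encoding may differ. It shows how reusing a buffer without clearing it lets a shrunk leaf "remember" old keys.

```typescript
const NODE_SIZE = 32;

// Hypothetical leaf format (an assumption): u32 keys packed from offset 0,
// terminated by a 0 key. Keys are assumed to be non-zero.
function serializeLeaf(keys: number[], buf: Uint8Array, zeroFirst: boolean): void {
  if (zeroFirst) buf.fill(0); // clear whatever the previous, larger node left behind
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  keys.forEach((key, i) => view.setUint32(i * 4, key));
}

function decodeLeaf(buf: Uint8Array): number[] {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const keys: number[] = [];
  for (let off = 0; off + 4 <= buf.length; off += 4) {
    const key = view.getUint32(off);
    if (key === 0) break; // terminator
    keys.push(key);
  }
  return keys;
}

// Serialize a four-key leaf, then a two-key leaf into the same buffer.
function roundTripAfterShrink(zeroFirst: boolean): number[] {
  const buf = new Uint8Array(NODE_SIZE);
  serializeLeaf([10, 20, 30, 40], buf, zeroFirst);
  serializeLeaf([10, 20], buf, zeroFirst); // the node shrank, e.g. after a split
  return decodeLeaf(buf);
}

const buggy = roundTripAfterShrink(false); // [10, 20, 30, 40]: keys 30 and 40 leak through
const fixed = roundTripAfterShrink(true);  // [10, 20]
```

Nothing fails at write time, which is why this kind of bug tends to surface only on a later read.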

Solution

The engine is pure TypeScript with zero DOM dependencies, so the exact same code runs in the Vitest suite and in your browser tab. A tracer threads through the pager, B+tree and executor; the playground is a pure replay of that event log, so the visualization literally can't drift from reality. fast-check property tests pin down the parts where correctness is subtle — 10,000 random inserts, every lookup correct, every leaf at the same depth — and shrank that nasty split bug to a four-key counterexample in seconds. The fix was one line: zero the node body before re-serializing.
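The "pure replay" idea can be sketched in a few lines. The event shapes, the `Tracer` class and the `scanPage` stand-in below are hypothetical names for illustration, not the engine's real API. The point is only that the display side computes everything from the log and never from the engine's internals.

```typescript
// Hypothetical event shapes; the real engine presumably emits more kinds.
type TraceEvent =
  | { kind: "page-read"; pageId: number }
  | { kind: "row-visit"; pageId: number; slot: number };

class Tracer {
  private readonly log: TraceEvent[] = [];

  emit(event: TraceEvent): void {
    this.log.push(event);
  }

  events(): readonly TraceEvent[] {
    return this.log;
  }
}

// Stand-in for an engine component: it does its work and reports each step.
function scanPage(tracer: Tracer, pageId: number, rowCount: number): void {
  tracer.emit({ kind: "page-read", pageId });
  for (let slot = 0; slot < rowCount; slot++) {
    tracer.emit({ kind: "row-visit", pageId, slot });
  }
}

// Display side: derives what it shows from the log alone.
function countRowVisits(events: readonly TraceEvent[]): number {
  return events.filter((e) => e.kind === "row-visit").length;
}

const tracer = new Tracer();
scanPage(tracer, 7, 3);
scanPage(tracer, 8, 2);
const visits = countRowVisits(tracer.events()); // 5
const pagesRead = tracer.events().filter((e) => e.kind === "page-read").length; // 2
```

Because the counter is a function of the log, a step the engine never emitted has no way to appear on screen.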

Outcome

QueryForge runs entirely client-side — you type a query, watch the plan light up, then flip "force full scan" and feel the difference in the row-visit counter. Pull the plug to wipe memory, hit Recover, and the WAL rebuilds every page in front of you. It's the most-used black box in software, opened up — and it's the project that convinced me I finally understand the thing under the thing.
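The pull-the-plug and Recover cycle boils down to "log first, then replay". Here is a minimal sketch assuming a logical, insert-only log kept in an array that stands in for durable storage; `Wal`, `Engine` and `pullThePlug` are illustrative names, and the in-memory maps stand in for pages.

```typescript
// Hypothetical logical WAL record: it describes the operation, not page bytes.
type WalRecord = { lsn: number; table: string; key: number; value: string };

class Wal {
  // Stands in for durable storage; it survives the simulated crash below.
  private readonly records: WalRecord[] = [];

  append(table: string, key: number, value: string): WalRecord {
    const record: WalRecord = { lsn: this.records.length + 1, table, key, value };
    this.records.push(record);
    return record;
  }

  all(): readonly WalRecord[] {
    return this.records;
  }
}

class Engine {
  private tables = new Map<string, Map<number, string>>(); // volatile state
  private readonly wal: Wal;

  constructor(wal: Wal) {
    this.wal = wal;
  }

  insert(table: string, key: number, value: string): void {
    const record = this.wal.append(table, key, value); // log first
    this.apply(record);                                // then mutate memory
  }

  get(table: string, key: number): string | undefined {
    return this.tables.get(table)?.get(key);
  }

  pullThePlug(): void {
    this.tables = new Map(); // all in-memory state is lost
  }

  /** Rebuilds state by replaying every record; returns how many were replayed. */
  recover(): number {
    this.tables = new Map();
    for (const record of this.wal.all()) this.apply(record);
    return this.wal.all().length;
  }

  private apply(record: WalRecord): void {
    let rows = this.tables.get(record.table);
    if (rows === undefined) {
      rows = new Map();
      this.tables.set(record.table, rows);
    }
    rows.set(record.key, record.value);
  }
}

const engine = new Engine(new Wal());
engine.insert("users", 1, "ada");
engine.insert("users", 2, "bob");
engine.pullThePlug();
const afterCrash = engine.get("users", 2);   // undefined
const replayed = engine.recover();           // 2
const afterRecover = engine.get("users", 2); // "bob"
```

Replaying from the first record is simple and correct, but recovery time grows with the length of the log.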

What I'd do differently

The WAL is logical and replays from the start of history with no checkpoints. That is fine for a teaching engine but painful at scale, so checkpointing is first on the list. I scoped out DELETE/UPDATE, transactions and JOINs on purpose; the Volcano iterator model is already shaped for joins (nested-loop first, then hash), so that's the natural next chapter rather than a rewrite.
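To show why the Volcano model leaves room for joins, here is a sketch of how a nested-loop join could slot in as one more operator. JOINs are not implemented in QueryForge, so the `Operator` interface, `Scan` and `NestedLoopJoin` are hypothetical and only illustrate the open/next/close shape.

```typescript
type Row = Record<string, number | string>;

// Volcano-style operator: the parent pulls one row at a time from its child.
interface Operator {
  open(): void;
  next(): Row | null;
  close(): void;
}

class Scan implements Operator {
  private pos = 0;
  private readonly rows: Row[];

  constructor(rows: Row[]) {
    this.rows = rows;
  }

  open(): void {
    this.pos = 0;
  }

  next(): Row | null {
    const row = this.rows[this.pos];
    if (row === undefined) return null;
    this.pos++;
    return row;
  }

  close(): void {}
}

class NestedLoopJoin implements Operator {
  private outerRow: Row | null = null;
  private readonly outer: Operator;
  private readonly inner: Operator;
  private readonly predicate: (o: Row, i: Row) => boolean;

  constructor(outer: Operator, inner: Operator, predicate: (o: Row, i: Row) => boolean) {
    this.outer = outer;
    this.inner = inner;
    this.predicate = predicate;
  }

  open(): void {
    this.outer.open();
    this.outerRow = this.outer.next();
    this.inner.open();
  }

  next(): Row | null {
    while (this.outerRow !== null) {
      const outerRow = this.outerRow;
      for (let innerRow = this.inner.next(); innerRow !== null; innerRow = this.inner.next()) {
        if (this.predicate(outerRow, innerRow)) return { ...outerRow, ...innerRow };
      }
      // Inner side exhausted: advance the outer row and rewind the inner scan.
      this.outerRow = this.outer.next();
      this.inner.close();
      this.inner.open();
    }
    return null;
  }

  close(): void {
    this.outer.close();
    this.inner.close();
  }
}

// Drains an operator tree into an array.
function collect(op: Operator): Row[] {
  op.open();
  const out: Row[] = [];
  for (let row = op.next(); row !== null; row = op.next()) out.push(row);
  op.close();
  return out;
}

const users = new Scan([{ userId: 1, name: "ada" }, { userId: 2, name: "bob" }]);
const orders = new Scan([
  { orderId: 100, buyerId: 2 },
  { orderId: 101, buyerId: 1 },
  { orderId: 102, buyerId: 2 },
]);
const joined = collect(new NestedLoopJoin(users, orders, (u, o) => u.userId === o.buyerId));
const orderIds = joined.map((r) => r.orderId); // [101, 100, 102]
```

A hash join would implement the same three methods, so the executor above it would not need to change.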

Built with

TypeScript, Next.js, React, B+tree, WAL, Vitest, fast-check, CodeMirror