Synthesis: An Autonomous Multi-Agent Research Engine

AI · completed

Apr 2026 · 10 weeks

0 keys: the entire planner → researcher → synthesizer → critic pipeline runs fully offline, with no API keys and no containers. Every dependency has a real fallback behind the same interface.

Overview

Synthesis is what happens when you point a "don't-trust-output-you-can't-trace" instinct straight at language models. The product is the visualization of the agent graph: a planner decomposes your question into sub-questions, a pool of researchers chases each one down in parallel (search → fetch → chunk → embed → retrieve), a synthesizer writes a cited report, and a critic verifies every claim against its sources — all streamed to a live control room over SSE.
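The shape of that pipeline can be sketched in a few lines. This is an illustration only, not the project's code: the `Agents` interface, its method signatures, and `runPipeline` are hypothetical names, and the agents are injected so the same orchestration runs against real or mock implementations.

```typescript
// Hypothetical agent signatures; the write-up describes the roles, not the types.
interface Agents {
  plan(question: string): Promise<string[]>; // question -> sub-questions
  research(subQuestion: string): Promise<string[]>; // sub-question -> retrieved passages
  synthesize(question: string, evidence: string[][]): Promise<string>; // -> cited report
  critique(report: string, evidence: string[][]): Promise<boolean>; // true = claims verified
}

interface RunResult {
  report: string;
  passed: boolean;
}

async function runPipeline(question: string, agents: Agents): Promise<RunResult> {
  const subQuestions = await agents.plan(question);

  // Researchers run concurrently. Promise.all keeps results in sub-question
  // order regardless of which researcher finishes first.
  const evidence = await Promise.all(subQuestions.map((q) => agents.research(q)));

  const report = await agents.synthesize(question, evidence);
  const passed = await agents.critique(report, evidence);
  return { report, passed };
}
```

Passing the agents in as an argument keeps the orchestration identical whether the leaves are hosted models or offline mocks.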

Tech Stack

orchestration

concurrent event bus, Server-Sent Events

agents

Planner, Researchers (parallel), Synthesizer, Critic

llm

Google Gemini, mock provider (offline)

data

SearXNG, Postgres + pgvector, embeddings

frontend

Next.js (App Router), Tailwind v4

Challenges

Multiple researchers run at once, but SSE is a single ordered stream — fanning parallel agents into one timeline without races was the biggest time sink.
A model will happily write "[3]" after any sentence; citations had to be verified against the actual cited text, not taken on faith.
Knowing when all the concurrent producers were genuinely done — completion under load is deceptively hard.
Making the offline mode a real path, not a toy — the same orchestration, just with different leaves.
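One way to handle the fan-in and completion problems together is a single queue plus a count of open producers, so the stream ends only once every producer has finished and the queue is empty. The sketch below assumes that design; it is not the project's implementation, and the `RunEvent` fields, `EventBus`, `register`, and `toSseFrame` are hypothetical names.

```typescript
// Hypothetical event shape; the real RunEvent fields are not given in the write-up.
type RunEvent = { seq: number; agent: string; kind: string; payload?: unknown };

class EventBus {
  private queue: RunEvent[] = [];
  private waiter: (() => void) | null = null;
  private openProducers = 0;
  private seq = 0;

  // Each agent registers before it starts and calls done() when it finishes.
  // A producer that spawns children should register them before calling its own done().
  register(agent: string) {
    this.openProducers++;
    let finished = false;
    return {
      emit: (kind: string, payload?: unknown): void => {
        // Runs synchronously, so numbering and enqueueing cannot interleave.
        this.queue.push({ seq: this.seq++, agent, kind, payload });
        this.wake();
      },
      done: (): void => {
        if (finished) return; // a second done() must not be counted twice
        finished = true;
        this.openProducers--;
        this.wake();
      },
    };
  }

  private wake(): void {
    const w = this.waiter;
    this.waiter = null;
    if (w) w();
  }

  // Single consumer: yields events in seq order and returns once all
  // registered producers are done and the queue has been drained.
  async *drain(): AsyncGenerator<RunEvent> {
    while (true) {
      while (this.queue.length > 0) yield this.queue.shift()!;
      if (this.openProducers === 0) return;
      await new Promise<void>((resolve) => {
        this.waiter = resolve;
      });
    }
  }
}

// Serialize one event as a Server-Sent Events frame (terminated by a blank line).
function toSseFrame(e: RunEvent): string {
  return `id: ${e.seq}\nevent: ${e.kind}\ndata: ${JSON.stringify(e)}\n\n`;
}
```

Because the sequence number is assigned at enqueue time, the consumer sees one total order no matter how the researchers interleave, and completion reduces to "no open producers and an empty queue".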

Solution

Agents never touch React; they emit typed RunEvents onto a many-producer / single-consumer event bus, and the SSE route drains it, so the UI is a pure projection of what the graph actually did. Citation integrity is a first-class check — the critic re-reads each claim against the passages it cites, labels confidence (supported / single-source / disputed), and can trigger a bounded revise loop. Every capability (LLM, search, embeddings, vector store) is an interface with a real and a mock implementation, selected at the boundary by env, so the agents only ever see the interface. There's even an MCP server exposing the research tools.
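The "interface with a real and a mock implementation, selected by env" pattern might look roughly like this for the LLM capability. Everything named here is an assumption for illustration: `LlmProvider`, `mockLlm`, `hostedLlm`, `selectLlm`, and the `LLM_API_KEY` variable are hypothetical, and the hosted call itself is left out.

```typescript
// Hypothetical capability interface; the write-up names the capabilities, not their signatures.
interface LlmProvider {
  readonly name: string;
  complete(prompt: string): Promise<string>;
}

// Deterministic offline implementation: no network, no API key.
function mockLlm(): LlmProvider {
  return {
    name: "mock",
    async complete(prompt) {
      return `[mock] ${prompt.slice(0, 40)}`;
    },
  };
}

// Placeholder for a hosted-model implementation; the network call is deliberately omitted.
function hostedLlm(apiKey: string): LlmProvider {
  return {
    name: "hosted",
    async complete(_prompt) {
      if (!apiKey) throw new Error("missing API key");
      throw new Error("network call omitted from this sketch");
    },
  };
}

type Env = Record<string, string | undefined>;

// Selection happens once, at the boundary. LLM_API_KEY is an assumed variable name.
function selectLlm(env: Env): LlmProvider {
  const key = env.LLM_API_KEY;
  return key ? hostedLlm(key) : mockLlm();
}
```

Agents would receive an `LlmProvider` and never read the environment themselves, which is what keeps the offline path the same orchestration with different leaves.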

Outcome

Point it at a question and the lanes fill in real time — tool calls landing, sources registering, a report typing itself with inline citations, a per-claim confidence ledger, and an interactive evidence graph — then a shareable /run/<id> URL and a token + USD cost meter at the end. Dangling citations get caught and surfaced, so the confidence labels come from verification, not the model's own self-assessment.
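As a rough sketch of how a per-claim check could produce those labels: extract the citation markers, flag any that point at no registered source, and count how many cited passages actually back the claim. The labeling rule and all names (`checkClaim`, `ClaimCheck`, `supports`) are assumptions; the write-up does not say how the critic judges support, so that judgment is injected as a function.

```typescript
type Confidence = "supported" | "single-source" | "disputed";

interface ClaimCheck {
  claim: string; // claim text with citation markers stripped
  cited: number[]; // distinct citation numbers found in the claim
  dangling: number[]; // cited numbers with no registered source
  confidence: Confidence;
}

// Pull "[3]"-style markers out of a sentence.
function extractCitations(sentence: string): number[] {
  const found: number[] = [];
  const re = /\[(\d+)\]/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(sentence)) !== null) found.push(Number(m[1]));
  return found;
}

// `supports` stands in for whatever judgment the critic applies to a (claim, passage) pair.
function checkClaim(
  sentence: string,
  sources: Map<number, string>,
  supports: (claim: string, passage: string) => boolean,
): ClaimCheck {
  const cited = Array.from(new Set(extractCitations(sentence)));
  const dangling = cited.filter((n) => !sources.has(n));
  const claim = sentence.replace(/\[\d+\]/g, "").trim();
  const backing = cited.filter((n) => {
    const passage = sources.get(n);
    return passage !== undefined && supports(claim, passage);
  });

  // Assumed rule: any missing or non-supporting citation makes the claim disputed;
  // otherwise one backing source is "single-source" and two or more is "supported".
  let confidence: Confidence;
  if (backing.length === 0 || backing.length < cited.length) confidence = "disputed";
  else if (backing.length === 1) confidence = "single-source";
  else confidence = "supported";

  return { claim, cited, dangling, confidence };
}
```

The point of the structure is that the label is derived from the cited passages, so a "[3]" with nothing behind it shows up in `dangling` instead of passing silently.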

What I'd do differently

SSE is one-way, so "stop" is a client-side AbortController rather than a real server signal — WebSockets would buy genuine mid-run control. The in-memory vector store is perfect for demos but pgvector is the path to anything that survives a restart, and the bounded revise loop is a cost-versus-quality dial, not a guarantee.
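For illustration, a client-side stop of this kind could look like the sketch below. It assumes the run is read with streaming `fetch` (the write-up does not say whether `fetch` or `EventSource` is used), and `readRun` and its return labels are hypothetical.

```typescript
// Reads a streamed response until it ends or the caller aborts.
async function readRun(
  url: string,
  signal: AbortSignal,
  onChunk: (text: string) => void,
): Promise<"finished" | "stopped"> {
  try {
    const res = await fetch(url, {
      headers: { Accept: "text/event-stream" },
      signal,
    });
    if (!res.body) return "finished";
    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    while (true) {
      const { done, value } = await reader.read();
      if (done) return "finished";
      onChunk(decoder.decode(value, { stream: true }));
    }
  } catch (err) {
    // abort() rejects the pending fetch or read; treat that as a user stop.
    if (signal.aborted) return "stopped";
    throw err;
  }
}
```

A stop button would hold the `AbortController` and call `abort()` on it. Note that this only closes the client's side of the connection: no message reaches the agents, so server-side work continues unless the server notices the dropped connection.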

Built with

Next.js, TypeScript, SSE, Google Gemini, SearXNG, pgvector, MCP, Vitest, zod