A framework, two applications built on it, and a fourth bet on motion. NeuroStack is the foundation; everything else is what gets built on top.
A brain-inspired memory framework for AI agents. Eight memory layers across four temperature zones, multi-signal retrieval, a compaction agent that runs the lifecycle, and a model router that's local-first by default. Built to be embedded by any agent that needs to remember, not just look things up.
"The most novel part of any agent architecture isn't the model. It's the memory."
// THE PROBLEM I SOLVED
The original architecture — profile table (hot) + episodes/papers table (cold) — looked clean on paper but failed at every junction. Each row below is a real symptom that drove a NeuroStack subsystem.
| Problem | Impact |
|---|---|
| 500-char hot memory | A business card — no room for actual job history |
| no temporal context | Agent had no idea what you'd been doing lately |
| LIKE-only episode search | Semantic similarity completely missing |
| no decay or salience | Trivial note ≡ career decision |
| no episodic → semantic pathway | Insights never distilled into lasting knowledge |
| no re-promotion | Cold memories could never resurface automatically |
| 3 duplicate consolidation paths | Chat / assistant / career each had their own broken impl |
| no dedicated memory agent | Memory management scattered, ad hoc, inconsistent |
// THE THEORY UNDERNEATH
// no langchain. no autogen. no orm. every layer hand-rolled.
// THE ARCHITECTURE
Hot lives in every prompt. Cool retrieves on demand. Archive stays soft-deleted forever.
~1500 chars · stable
Who you ARE — priority-ordered: personal → experience → education → projects → skills. Experience entries never evicted; skills compacted first.
~800 chars · 5-min TTL cache
What you've been DOING — today's sessions + this week's summary. Auto-rebuilt by CompactionAgent.
uncapped
Full job descriptions from resume import — every role, tech stack, achievement, always visible.
last N turns · auto-compress > 30
Verbatim recent conversation. When the buffer crosses 30 messages, the context compressor (qwen3.5:4b local) summarizes older turns to free room.
bi-temporal · ACT-R + A-MEM
Full episodes with salience scores. Semantic search via embeddings + LIKE fallback. Zettelkasten links between related episodes. Spreading activation: top-k → follow links 1-hop.
Kuzu embedded · bi-temporal
(subject, relation, object) facts extracted from episodes. valid_from / valid_to edges — old facts get invalidated, never deleted. SQLite fallback when Kuzu unavailable.
tool / task patterns
What CodeAgent succeeded or failed at. Lessons injected before next similar task. Embedded so similar tasks retrieve relevant priors.
activation < -3.0 · salience < 0.2 · 30+ days idle
Episodes that meet all three thresholds are flagged archived = 1. Still queryable, never garbage-collected. Memory should fade, not disappear.
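A minimal sketch of that archive rule, assuming a hypothetical `Episode` record with `activation`, `salience`, `last_accessed`, and `archived` fields (the real schema is not shown here). Only the three thresholds come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Thresholds from the archive rule: all three must hold.
ACTIVATION_FLOOR = -3.0
SALIENCE_FLOOR = 0.2
IDLE_DAYS = 30


@dataclass
class Episode:  # hypothetical record shape, for illustration only
    activation: float
    salience: float
    last_accessed: datetime
    archived: int = 0


def should_archive(ep: Episode, now: datetime) -> bool:
    """True only when activation, salience, and idle time all cross their thresholds."""
    idle = now - ep.last_accessed
    return (
        ep.activation < ACTIVATION_FLOOR
        and ep.salience < SALIENCE_FLOOR
        and idle >= timedelta(days=IDLE_DAYS)
    )


def archive_pass(episodes, now):
    """Soft-delete: flip the flag, never remove the record."""
    for ep in episodes:
        if not ep.archived and should_archive(ep, now):
            ep.archived = 1
    return episodes
```

Because the flag is the only thing that changes, an archived episode can still be queried and, in principle, re-promoted later.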
// MULTI-SIGNAL RETRIEVAL
brain.query() replaces flat cosine ranking. Four signals, weighted. Every memory included in a response gets an access logged → ACT-R activation increases → that memory ranks higher next time, automatically.
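The episodic layer's spreading activation (top-k, then follow links one hop) can be sketched roughly as below. The function name, the `scores` and `links` dictionaries, and the default `k` are assumptions for illustration, not the actual brain.query() internals.

```python
def spread_one_hop(scores, links, k=3):
    """Return the top-k episode ids by score, followed by their 1-hop linked neighbors.

    scores: {episode_id: relevance score}   (hypothetical input shape)
    links:  {episode_id: [linked episode ids]}
    """
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    result = list(top)
    seen = set(top)
    for eid in top:
        for neighbor in links.get(eid, []):
            if neighbor not in seen:  # avoid returning the same episode twice
                seen.add(neighbor)
                result.append(neighbor)
    return result
```

Linked episodes come back even when their own score is low, which is the point of following the Zettelkasten links: a related note surfaces because its neighbor matched.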
// THE FORGETTING-CURVE MATH
ACT-R's base-level activation IS the Ebbinghaus forgetting curve, generalized to multiple accesses. Each access adds a fresh term. Strong memories stay strong; unused ones fade past the archive threshold.
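For reference, ACT-R's base-level learning equation is B = ln(Σ t_j^(-d)), where t_j is the time since the j-th access and d is the decay rate (0.5 is the conventional default). A minimal sketch follows; the time unit and how it lines up with the -3.0 archive threshold are assumptions, since neither is stated here.

```python
import math


def base_level_activation(access_times, now, decay=0.5):
    """ACT-R base-level activation: ln of the summed power-law traces.

    access_times and now share one time unit (an assumption; e.g. hours).
    Each past access contributes age ** -decay, so recent and frequent
    accesses both raise the activation.
    """
    ages = [now - t for t in access_times if now > t]
    if not ages:
        return float("-inf")  # never accessed: no trace to sum
    return math.log(sum(age ** -decay for age in ages))
```

Each logged access appends one more term to the sum, which is how a retrieved memory ranks higher the next time around.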
Ebbinghaus 1885 · Anderson & Lebiere 1998 · applied to agent context windows, 2026.
// THE COMPACTION AGENT
Runs every 5 minutes asynchronously. Replaces the old _embed_loop. Ten jobs, one process, no scattered consolidation paths.
// SALIENCE GATE — fires at every brain.save_episode()
A background thread embeds the episode, computes surprise (1 − cosine_sim vs. recent mean), scores importance via gpt-5-nano (1-10), and writes:
salience = 0.5 × surprise + 0.5 × (importance / 10)
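A sketch of that formula in plain Python. It assumes "recent mean" means the mean of recent episode embeddings, and that an episode with no history counts as fully surprising; both are my reading, not confirmed behavior. The importance score is passed in rather than fetched from a model.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def salience(embedding, recent_embeddings, importance):
    """salience = 0.5 * surprise + 0.5 * (importance / 10), importance in 1..10."""
    if recent_embeddings:
        surprise = 1.0 - cosine_sim(embedding, mean_vector(recent_embeddings))
    else:
        surprise = 1.0  # assumption: nothing to compare against means maximally novel
    return 0.5 * surprise + 0.5 * (importance / 10)
```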
// MODEL ROUTING — LOCAL-FIRST
Every LLM call goes through ModelRouter.call(). On model-unavailable / deprecated errors it walks the fallback chain — no code changes when OpenAI deprecates a model.
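The fallback walk might look like the sketch below. `ModelUnavailable`, `call_with_fallback`, and the `send` callable are hypothetical stand-ins, not the real ModelRouter.call() signature; only the chain contents come from this page.

```python
class ModelUnavailable(Exception):
    """Hypothetical error type for model-not-found / deprecated responses."""


NANO_CHAIN = ["qwen3.5:4b", "gpt-5-nano", "gpt-5-mini", "gpt-4o-mini"]


def call_with_fallback(chain, prompt, send):
    """Try each model in order; return (model_used, response).

    send(model, prompt) stands in for the real backend call and is expected
    to raise ModelUnavailable when a model is missing or deprecated.
    """
    last_error = None
    for model in chain:
        try:
            return model, send(model, prompt)
        except ModelUnavailable as err:
            last_error = err  # remember why, then move down the chain
    raise RuntimeError(f"no model available in chain: {chain}") from last_error
```

Only availability errors trigger a fallback in this sketch; any other exception propagates, so a bad prompt does not silently burn through the whole chain.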
| Tier | Model | Backend | Tasks |
|---|---|---|---|
| LOCAL · NANO | qwen3.5:4b | Ollama · on-device · free | salience scoring, intent classification, entity / profile extraction, context compression |
| LONG_CTX | gpt-4.1 | OpenAI · 1M context | document ingestion (entire resume in one call) |
| STANDARD | gpt-5.2 | OpenAI | trends, paper chat, briefings, assistant chat, career, code generation |
| PRO | gpt-5.2 | OpenAI | memory consolidation, reflection, re-promotion decisions |
NANO fallback chain
qwen3.5:4b → gpt-5-nano → gpt-5-mini → gpt-4o-mini
STANDARD fallback chain
gpt-5.2 → gpt-5.1 → gpt-5 → gpt-4.1 → gpt-4o
A separate ModelCheckAgent runs at startup & weekly: tests every known OpenAI model with both max_tokens and max_completion_tokens, detects which parameter each requires, persists results to model_config.json and the model_availability table.
// THE API SURFACE
An application embedding NeuroStack sees only four primitives: write an episode, query for relevant memory, configure the model router, and let the compaction agent run. Everything else (decay, salience, consolidation, re-promotion, embeddings, graph extraction) is handled in the background.
The first thing I built on NeuroStack. A personal autonomous research agent that runs silently in the system tray on macOS & Windows — researches papers continuously, briefs me every morning, and remembers everything we've ever discussed.
// WHAT IT DOES
// HOW IT USES NEUROSTACK
Research Agent is mostly UI and orchestration. Every paper found, every chat turn, every reflection lands in NeuroStack. Every recall — "what did I read about KV-cache last month?" — runs through brain.query(). The agent doesn't manage memory. It uses memory.
// THREADING MODEL · 7 DAEMONS
| Thread | Role | Detail |
|---|---|---|
| Main | tkinter root + ui_queue | 100ms |
| Daemon 1 | pystray system tray | |
| Thread 2 | ResearchAgent | search every 2h |
| Daemon 3 | TrendAgent | every 4h |
| Daemon 4 | BriefingAgent | checks every 60s |
| Daemon 5 | CompactionAgent | every 5 min |
| Daemon 6 | ModelCheckAgent | weekly |
| Daemon 7 | LocalModelWarmup | qwen3.5:4b |
UI calls dispatched via ui_queue → main thread. Blocking calls use threading.Event.
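A rough sketch of that dispatch pattern using only the standard library. The helper names `call_on_ui_thread` and `drain_ui_queue` are assumptions; in the real app the drain step would presumably be rescheduled on the tkinter root with `root.after(100, ...)`.

```python
import queue
import threading

ui_queue = queue.Queue()


def call_on_ui_thread(fn, *args):
    """Called from a worker thread; blocks until the main thread has run fn."""
    done = threading.Event()
    box = {}

    def job():
        try:
            box["value"] = fn(*args)
        except Exception as err:  # hand the failure back to the caller
            box["error"] = err
        finally:
            done.set()

    ui_queue.put(job)
    done.wait()
    if "error" in box:
        raise box["error"]
    return box["value"]


def drain_ui_queue():
    """Run on the main thread: execute every job queued so far."""
    while True:
        try:
            job = ui_queue.get_nowait()
        except queue.Empty:
            break
        job()
```

Workers never touch widgets directly; they enqueue a closure and wait on the event, so every UI mutation happens on the thread that owns the tkinter root.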
The next thing on NeuroStack — a fully local agent runtime. No OpenAI fallback, no Anthropic, no provider lock-in. Just NeuroStack memory + Ollama / llama.cpp / MLX, running entirely on your machine.
// WHY
// REQUEST FLOW
The missing layer between pose landmarks and movement understanding. A platform that generates the right movement analyzer for the right task, on demand.
Every action has a pattern over time. A movement isn't a set of frames — it's a repeatable signature that can be captured, scored, and compared.
// THE STATUS QUO
// THE GAP
Pose landmarks tell you where the joints are. They don't tell you whether the squat was deep enough, whether the elbow flared on the curl, or whether the rehab patient is making progress.
A user describes the movement they want tracked. An agent network generates the analyzer. A sandbox validates it. The runtime stays fast and deterministic.
// WHY IT MATTERS
Today it can understand a curl. Tomorrow rehab, sport, skill, and performance.
Fitness coaching, rep quality, real-time form correction.
Visit-to-visit guidance, longitudinal progress tracking.
Movement consistency, symmetry, ergonomic workflows.
// COLLABORATE
I'm always up for comparing notes on agent memory, model routing, or motion analytics. Reach out: happy to share what's working and what's not.