Charith Purushotham

// LAB · ACTIVE EXPERIMENTS · MAY 2026

What I'm Building

A framework, two applications built on it, and a fourth bet on motion. NeuroStack is the foundation; everything else is what gets built on top.

PROJECT 01 · THE FRAMEWORK · in development

NeuroStack

A brain-inspired memory framework for AI agents. Eight memory layers across four temperature zones, multi-signal retrieval, a compaction agent that runs the lifecycle, and a model router that's local-first by default. Built to be embedded by any agent that needs to remember, not just look things up.

Python framework | SQLite · 28 tables | Kuzu graph DB | ACT-R · A-MEM · CLS | no LangChain · no AutoGen · no ORM

"The most novel part of any agent architecture isn't the model. It's the memory."

// THE PROBLEM I SOLVED

v0 was a flat 2-tier store. It broke in 8 ways.

The original architecture — profile table (hot) + episodes/papers table (cold) — looked clean on paper but failed at every junction. Each row below is a real symptom that drove a NeuroStack subsystem.

Problem                              Impact
500-char hot memory                  A business card: no room for actual job history
no temporal context                  Agent had no idea what you'd been doing lately
LIKE-only episode search             Semantic similarity completely missing
no decay or salience                 Trivial note ≡ career decision
no episodic → semantic pathway       Insights never distilled into lasting knowledge
no re-promotion                      Cold memories could never resurface automatically
3 duplicate consolidation paths      Chat / assistant / career each had their own broken impl
no dedicated memory agent            Memory management scattered, ad hoc, inconsistent

// THE THEORY UNDERNEATH

Built from scratch. Inspired by six papers.

CLS
Complementary Learning Systems
McClelland et al — neuroscience: hippocampus + neocortex split
A-MEM
Zettelkasten for agents
cosine-linked memory notes; powers L3 episode_links
Zep / Graphiti
Bi-temporal knowledge graphs
valid_from / valid_to edges; powers L4 entity facts
MemGPT
Virtual context management
paging hot memory in/out; powers L1 trim_working_memory
ACT-R
Base-level activation
Anderson 1998 — the math of retrieval-strengthened memory
Generative Agents
Reflection
Park et al — periodic insight generation; powers maybe_reflect

// no langchain. no autogen. no orm. every layer hand-rolled.

// THE ARCHITECTURE

Eight layers. Four temperature zones.

Hot lives in every prompt. Cool retrieves on demand. Archive stays soft-deleted forever.

HOT MEMORY

// injected into every prompt · zero latency
L1A
Identity Block

~1500 chars · stable

Who you ARE — priority-ordered: personal → experience → education → projects → skills. Experience entries never evicted; skills compacted first.

L1B
Temporal Context Window

~800 chars · 5-min TTL cache

What you've been DOING — today's sessions + this week's summary. Auto-rebuilt by CompactionAgent.
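
A minimal sketch of how a 5-minute TTL cache with explicit invalidation could work for this block. `TTLCache`, its `rebuild` callback, and the injectable `clock` are hypothetical names used for illustration, not NeuroStack's actual classes.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Caches one rebuilt string and expires it after ttl_seconds."""

    def __init__(
        self,
        rebuild: Callable[[], str],
        ttl_seconds: float = 300.0,  # 5 minutes, as quoted for L1B
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rebuild = rebuild
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._built_at = 0.0

    def get(self) -> str:
        """Return the cached value, rebuilding it if missing or expired."""
        now = self._clock()
        if self._value is None or now - self._built_at >= self._ttl:
            self._value = self._rebuild()
            self._built_at = now
        return self._value

    def invalidate(self) -> None:
        """Drop the cached value so the next get() rebuilds it."""
        self._value = None
```

Passing the clock in keeps the expiry logic testable without sleeping; a background job would only need to call `invalidate()`.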

L1C
Experience Details

uncapped

Full job descriptions from resume import — every role, tech stack, achievement, always visible.

SESSION

// in-memory · per conversation
L2
Episodic Buffer

last N turns · auto-compress > 30

Verbatim recent conversation. When the buffer crosses 30 messages, the context compressor (qwen3.5:4b local) summarizes older turns to free room.
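
The compression step can be sketched roughly as follows. The `summarize` callback stands in for the local compressor model, and `keep_recent` (how many verbatim turns survive) is an assumption; the page only states the 30-message trigger.

```python
from typing import Callable, List


def compress_buffer(
    turns: List[str],
    summarize: Callable[[List[str]], str],
    max_turns: int = 30,
    keep_recent: int = 10,  # assumed value, not stated in the text
) -> List[str]:
    """Replace older turns with one summary once the buffer exceeds max_turns."""
    if len(turns) <= max_turns:
        return list(turns)
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    # The summary takes the place of every older turn; recent turns stay verbatim.
    return [summarize(older)] + recent
```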

COOL MEMORY

// retrieved on demand via brain.query()
L3
Episodic Store

bi-temporal · ACT-R + A-MEM

Full episodes with salience scores. Semantic search via embeddings + LIKE fallback. Zettelkasten links between related episodes. Spreading activation: top-k → follow links 1-hop.
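
One way to read "top-k → follow links 1-hop" is sketched below. The `links` dictionary is a hypothetical stand-in for the L3 episode_links table, and the function name is illustrative only.

```python
from typing import Dict, List, Set


def spread_one_hop(top_k: List[int], links: Dict[int, List[int]]) -> List[int]:
    """Return the top-k episode ids followed by their directly linked neighbors."""
    seen: Set[int] = set(top_k)
    result = list(top_k)
    for episode_id in top_k:
        for neighbor in links.get(episode_id, []):
            # Only one hop: neighbors of neighbors are never followed.
            if neighbor not in seen:
                seen.add(neighbor)
                result.append(neighbor)
    return result
```

With `links = {1: [2, 3], 3: [4]}` and `top_k = [1, 2]`, episode 3 is pulled in through its link from 1, while 4 stays out because it is two hops away.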

L4
Entity Graph

Kuzu embedded · bi-temporal

(subject, relation, object) facts extracted from episodes. valid_from / valid_to edges — old facts get invalidated, never deleted. SQLite fallback when Kuzu unavailable.
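
A small sketch of the "invalidated, never deleted" idea using plain SQLite. The `facts` table and its columns are hypothetical, and the sketch assumes each (subject, relation) pair holds a single valid object at a time; the real L4 schema may differ.

```python
import sqlite3
from typing import List, Tuple


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical facts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE facts (subject TEXT, relation TEXT, object TEXT, "
        "valid_from TEXT, valid_to TEXT)"
    )
    return conn


def assert_fact(conn: sqlite3.Connection, subject: str, relation: str,
                obj: str, now: str) -> None:
    """Close the currently valid fact (if any), then insert the new one."""
    conn.execute(
        "UPDATE facts SET valid_to = ? "
        "WHERE subject = ? AND relation = ? AND valid_to IS NULL",
        (now, subject, relation),
    )
    conn.execute(
        "INSERT INTO facts VALUES (?, ?, ?, ?, NULL)",
        (subject, relation, obj, now),
    )


def current_facts(conn: sqlite3.Connection, subject: str) -> List[Tuple[str, str]]:
    """Return (relation, object) pairs whose validity interval is still open."""
    rows = conn.execute(
        "SELECT relation, object FROM facts "
        "WHERE subject = ? AND valid_to IS NULL ORDER BY relation",
        (subject,),
    )
    return rows.fetchall()
```

The old row stays in the table with a closed `valid_to`, so questions about a past date can still be answered.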

L5
Procedural Memory

tool / task patterns

What CodeAgent succeeded or failed at. Lessons injected before next similar task. Embedded so similar tasks retrieve relevant priors.

ARCHIVE

// soft-deleted · never hard-deleted
L6
Cold Archive

activation < -3.0 · salience < 0.2 · 30+ days idle

Episodes that meet all three thresholds are flagged archived = 1. They stay queryable and are never garbage-collected. Memory should fade, not disappear.
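
As a minimal sketch, the three-way gate could be a single predicate. `EpisodeStats` and `should_archive` are hypothetical names, and the comparison operators at the exact boundary values are an assumption.

```python
from dataclasses import dataclass


@dataclass
class EpisodeStats:
    activation: float  # ACT-R base-level activation
    salience: float    # 0.0 to 1.0
    idle_days: float   # days since the last access


def should_archive(ep: EpisodeStats) -> bool:
    """True only when all three archive thresholds are met."""
    return ep.activation < -3.0 and ep.salience < 0.2 and ep.idle_days >= 30
```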

// MULTI-SIGNAL RETRIEVAL

brain.query() replaces flat cosine ranking.

Four signals, weighted. Every memory included in a response gets an access logged → ACT-R activation increases → that memory ranks higher next time, automatically.

score(memory) = 0.4 × cosine_similarity           # semantic relevance
              + 0.3 × sigmoid(ACT-R_activation)   # access freq × recency
              + 0.2 × salience                    # importance × surprise at write
              + 0.1 × recency_bonus               # exp(-age_days/30)
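
The weighted sum translates directly into code. This is a sketch of the formula as written, not the actual brain.query() implementation; the function signature is an assumption.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def score(cosine_similarity: float, activation: float,
          salience: float, age_days: float) -> float:
    """Combine the four retrieval signals with the 0.4 / 0.3 / 0.2 / 0.1 weights."""
    recency_bonus = math.exp(-age_days / 30.0)
    return (
        0.4 * cosine_similarity
        + 0.3 * sigmoid(activation)
        + 0.2 * salience
        + 0.1 * recency_bonus
    )
```

Two memories with the same cosine similarity can rank differently: the one with higher activation (more frequent or more recent accesses) scores higher.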

// THE FORGETTING-CURVE MATH

Retrieval strengthens memory. Decay follows a power law.

ACT-R's base-level activation is a power-law form of the Ebbinghaus forgetting curve, generalized to multiple accesses. Each access adds a fresh term to the sum. Strong memories stay strong; unused ones fade past the archive threshold.

\[ B_i = \ln\left( \sum_{j=1}^{n} t_j^{-0.5} \right) \]
B_i = activation of memory i
t_j = time since the j-th access
n = number of recorded accesses
−0.5 = decay exponent (Anderson's decay rate d = 0.5)

Ebbinghaus 1885 · Anderson & Lebiere 1998 · applied to agent context windows, 2026.
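
A short sketch of the base-level activation formula in code. The time unit for each access age is an assumption (the page does not state it), and the function name is illustrative.

```python
import math
from typing import Iterable


def base_level_activation(ages: Iterable[float], decay: float = 0.5) -> float:
    """ACT-R base-level activation: ln of the sum of age ** -decay over accesses.

    `ages` holds the time since each access, in whatever unit the store uses.
    Returns -inf when there are no usable accesses.
    """
    total = sum(age ** -decay for age in ages if age > 0)
    if total == 0:
        return float("-inf")
    return math.log(total)
```

A single access one time unit ago gives an activation of 0. The same access four units ago gives ln(0.5) ≈ -0.69, and every additional access raises the sum and therefore the activation.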

// THE COMPACTION AGENT

The only entity that moves memories between tiers.

Runs every 5 minutes asynchronously. Replaces the old _embed_loop. Ten jobs, one process, no scattered consolidation paths.

01
embed_pending_papers
text-embedding-3-small over new arXiv / Semantic Scholar pulls
02
embed_pending_episodes
embed episode summaries + backfill missing salience
03
link_new_papers
A-MEM Zettelkasten cosine links — score > 0.5, top-3
04
link_new_episodes
A-MEM cosine links between episodes — score > 0.3, top-3
05
decay_and_prune
archive episodes with activation < -3.0 AND salience < 0.2 AND 30+ days idle
06
consolidate_to_semantic
extract (s, r, o) triples from mature episodes → L4 graph
07
maybe_reflect
cumulative-salience threshold → gpt-5.2 generates 3 insights, written back as 0.85-salience episodes
08
re_promote
3+ accesses in 7 days → gpt-5.2 decides if stable fact → staged for user review
09
trim_working_memory
evict stale profile entries: preferences first, experience/education last
10
update_temporal_context
invalidate L1B cache — next query rebuilds it
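
Jobs 03 and 04 both come down to "score every candidate by cosine similarity, keep those above a threshold, take the top three". A minimal sketch of that selection follows; the function names and the in-memory dictionary of embeddings are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_candidates(
    new_vec: Sequence[float],
    existing: Dict[str, Sequence[float]],
    threshold: float = 0.3,  # 0.3 for episodes, 0.5 for papers
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to top_n (id, score) pairs whose score exceeds the threshold."""
    scored = [(key, cosine(new_vec, vec)) for key, vec in existing.items()]
    kept = [pair for pair in scored if pair[1] > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_n]
```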

// SALIENCE GATE — fires at every brain.save_episode()

A background thread embeds the episode, computes surprise (1 − cosine_sim vs. recent mean), scores importance on a 1-10 scale via the NANO tier (qwen3.5:4b, falling back to gpt-5-nano), and writes:

salience = 0.5 × surprise + 0.5 × (importance / 10)
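
A sketch of the salience computation under two stated assumptions: "recent mean" is read as the mean vector of recent embeddings, and surprise is clamped to [0, 1] so salience stays in the same range. Neither detail is confirmed by the text, and an episode with no history is treated as fully surprising.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def salience(embedding: Sequence[float],
             recent: List[Sequence[float]],
             importance: int) -> float:
    """salience = 0.5 * surprise + 0.5 * (importance / 10)."""
    if recent:
        # Component-wise mean of the recent embeddings (assumed interpretation).
        mean_vec = [sum(col) / len(recent) for col in zip(*recent)]
        surprise = 1.0 - cosine(embedding, mean_vec)
    else:
        surprise = 1.0
    surprise = min(1.0, max(0.0, surprise))  # clamp: an assumption
    return 0.5 * surprise + 0.5 * (importance / 10)
```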

// MODEL ROUTING — LOCAL-FIRST

qwen3.5:4b on Ollama. OpenAI as fallback.

Every LLM call goes through ModelRouter.call(). On model-unavailable / deprecated errors it walks the fallback chain — no code changes when OpenAI deprecates a model.

Tier            Model         Backend                      Tasks
LOCAL · NANO    qwen3.5:4b    Ollama · on-device · free    salience scoring, intent classification, entity / profile extraction, context compression
LONG_CTX        gpt-4.1       OpenAI · 1M context          document ingestion (entire resume in one call)
STANDARD        gpt-5.2       OpenAI                       trends, paper chat, briefings, assistant chat, career, code generation
PRO             gpt-5.2       OpenAI                       memory consolidation, reflection, re-promotion decisions

NANO fallback chain

qwen3.5:4b → gpt-5-nano → gpt-5-mini → gpt-4o-mini

STANDARD fallback chain

gpt-5.2 → gpt-5.1 → gpt-5 → gpt-4.1 → gpt-4o
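
Walking a fallback chain can be sketched as a loop that tries each model in order. `ModelUnavailable`, `call_with_fallback`, and the `invoke` callback are hypothetical; the real ModelRouter.call() signature is not shown on this page.

```python
from typing import Callable, Optional, Sequence

NANO_CHAIN = ("qwen3.5:4b", "gpt-5-nano", "gpt-5-mini", "gpt-4o-mini")


class ModelUnavailable(Exception):
    """Raised by the backend when a model is missing or deprecated."""


def call_with_fallback(
    chain: Sequence[str],
    invoke: Callable[[str, str], str],
    prompt: str,
) -> str:
    """Try each model in order; return the first successful response."""
    last_error: Optional[Exception] = None
    for model in chain:
        try:
            return invoke(model, prompt)
        except ModelUnavailable as exc:
            last_error = exc  # remember the failure and move down the chain
    raise RuntimeError("no model in the chain was available") from last_error
```

Only availability errors trigger a fallback here; any other exception propagates, so a genuine bug is not masked by silently switching models.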

A separate ModelCheckAgent runs at startup & weekly: tests every known OpenAI model with both max_tokens and max_completion_tokens, detects which parameter each requires, persists results to model_config.json and the model_availability table.
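
The parameter detection might look roughly like this. The `probe` callback is a hypothetical stand-in for a real test request that returns True when the model accepts the given parameter; the JSON layout is also an assumption, not the actual model_config.json format.

```python
import json
from typing import Callable, Dict, Iterable, Optional

TOKEN_PARAMS = ("max_completion_tokens", "max_tokens")


def detect_token_param(model: str,
                       probe: Callable[[str, str], bool]) -> Optional[str]:
    """Return the first token-limit parameter the model accepts, else None."""
    for param in TOKEN_PARAMS:
        if probe(model, param):
            return param
    return None


def check_models(models: Iterable[str],
                 probe: Callable[[str, str], bool]) -> Dict[str, Optional[str]]:
    """Map each model name to its accepted parameter (None = unavailable)."""
    return {model: detect_token_param(model, probe) for model in models}


def to_config_json(results: Dict[str, Optional[str]]) -> str:
    """Serialize the results in a stable, diff-friendly form."""
    return json.dumps(results, indent=2, sort_keys=True)
```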

// THE API SURFACE

A small interface. The complexity hides behind it.

An application embedding NeuroStack only sees four primitives: write an episode, query for relevant memory, configure the model router, and let the compaction agent run. Everything else — decay, salience, consolidation, re-promotion, embeddings, graph extraction — is handled in the background.

  • brain.save_episode(): writes to L3, fires salience-gate thread
  • brain.query(): multi-signal scored retrieval across L3-L5
  • brain.profile.set(): updates L1A identity block
  • router.call(): local-first model dispatch with fallback

# Embedding NeuroStack in any agent
from neurostack import Brain, ModelRouter
# Assumed exports: the snippet uses these names without importing them.
from neurostack import CompactionAgent, Ollama, OpenAI

router = ModelRouter(
    nano=Ollama("qwen3.5:4b"),
    standard=OpenAI("gpt-5.2"),
)
brain = Brain(db="~/agent.db", router=router)

# write
brain.save_episode("Charith mentioned ACT-R")

# read: scored retrieval, not just cosine
memories = brain.query("forgetting curve math", k=5)

# compaction runs in the background, every 5 min
CompactionAgent(brain).start()

PROJECT 02 · FIRST APPLICATION · private repo · running in tray · built on NeuroStack

Research Agent

The first thing I built on NeuroStack. A personal autonomous research agent that runs silently in the system tray on macOS & Windows — researches papers continuously, briefs me every morning, and remembers everything we've ever discussed.

tkinter + pystray | arXiv + Semantic Scholar + web | 7-daemon orchestrator | ~696 KB Python

// WHAT IT DOES

every 2h
Autonomous research
arXiv, Semantic Scholar, web — papers in your fields, on its own clock
every 4h
Trend reports
synthesizes what's happening across your areas
daily
Morning briefings
overnight discoveries + your recent activity
on demand
Personal assistant chat
knows your full profile, work history, every conversation
on demand
Career assistant
resume tailoring, job analysis, cover letters — full profile loaded
sandboxed
Code agent
writes & executes Python in a workspace with procedural memory

// HOW IT USES NEUROSTACK

A thin app on top of a deep memory.

Research Agent is mostly UI and orchestration. Every paper found, every chat turn, every reflection lands in NeuroStack. Every recall — "what did I read about KV-cache last month?" — runs through brain.query(). The agent doesn't manage memory. It uses memory.

// THREADING MODEL · 7 DAEMONS

Main          tkinter root + ui_queue (100ms)
Daemon 1      pystray system tray
Daemon 2      ResearchAgent: search every 2h
Daemon 3      TrendAgent: every 4h
Daemon 4      BriefingAgent: checks every 60s
Daemon 5      CompactionAgent: every 5 min
Daemon 6      ModelCheckAgent: weekly
Daemon 7      LocalModelWarmup: qwen3.5:4b

UI calls dispatched via ui_queue → main thread. Blocking calls use threading.Event.
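
The dispatch pattern can be sketched without tkinter. Workers enqueue a job and block on a threading.Event, and the main thread drains the queue on a timer (in tkinter this would typically be a `root.after(100, ...)` callback). `call_on_main` and `drain_ui_queue` are hypothetical names.

```python
import queue
import threading
from typing import Any, Callable

ui_queue: "queue.Queue[Callable[[], None]]" = queue.Queue()


def call_on_main(func: Callable[..., Any], *args: Any) -> Any:
    """Called from a worker thread: run func on the draining thread and wait."""
    done = threading.Event()
    box: dict = {}

    def job() -> None:
        try:
            box["value"] = func(*args)
        except Exception as exc:  # hand the error back to the caller
            box["error"] = exc
        finally:
            done.set()

    ui_queue.put(job)
    done.wait()  # blocks the worker, never the UI thread
    if "error" in box:
        raise box["error"]
    return box.get("value")


def drain_ui_queue() -> None:
    """Called periodically by the main thread: run every pending job."""
    while True:
        try:
            job = ui_queue.get_nowait()
        except queue.Empty:
            break
        job()
```

Because every job runs inside `drain_ui_queue()`, widget access stays on one thread, which is what tkinter requires.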

PROJECT 03 · NEXT APPLICATION · in design · built on NeuroStack

Brain Component

The next thing on NeuroStack: an agent runtime that is fully local by default. No required OpenAI fallback, no Anthropic dependency, no provider lock-in. Just NeuroStack memory + Ollama / llama.cpp / MLX, running entirely on your machine unless you opt in to a cloud model.

// WHY

Local-first. Provider-independent. Works offline.

  • No vendor lock-in. Swap Ollama → llama.cpp → Claude → GPT-4 with one config flag.
  • Privacy by default. Your conversations, memories, and tool calls never leave your machine unless you opt in.
  • Cost-bounded. Use a 7B local model for cheap iteration, route only the hard calls to a frontier model.
  • Resilient. If a provider rate-limits you or sunsets a model, you keep working.

# Brain Component architecture
class Brain:
    memory: NeuroStack     # 3-tier brain memory
    model: ModelRouter     # local OR cloud
    tools: ToolRegistry    # MCP-compatible
    agents: AgentNetwork   # planner + workers

router = ModelRouter(
    local=Ollama("llama3.2:8b"),
    cloud=Anthropic("claude-opus-4-7"),
    policy="local-first",  # cost-aware
)
brain = Brain(memory=NeuroStack(), model=router)
brain.run()  # offline-capable

// REQUEST FLOW

01
Intent
user prompt
02
NeuroStack
hot + episodic + semantic recall
03
Router
local · cloud · policy
04
Tools
MCP shell + file + web
05
Response
+ writeback to NeuroStack

PROJECT 04 · research · drafted

Adaptive Motion Intelligence

The missing layer between pose landmarks and movement understanding. A platform that generates the right movement analyzer for the right task, on demand.

// FROM POSE SNAPSHOTS TO ACTION SIGNATURES

Every action has a pattern over time. A movement isn't a set of frames — it's a repeatable signature that can be captured, scored, and compared.

// THE STATUS QUO

Pose detection is a wall.

  • × Most systems stop at skeleton detection.
  • × Apps are hardcoded for a few movements.
  • × New movements require manual logic.
  • × Motion products are rigid and slow to scale.

// THE GAP

Task-specific movement understanding.

Pose landmarks tell you where the joints are. They don't tell you whether the squat was deep enough, whether the elbow flared on the curl, or whether the rehab patient is making progress.
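
To make the gap concrete, here is a minimal sketch of one hand-written check: squat depth from three 2D landmarks. The 90-degree threshold and the function names are illustrative assumptions, not part of the platform described here.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle at b, in degrees, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("landmarks must not coincide with the joint")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def squat_is_deep(hip: Point, knee: Point, ankle: Point,
                  max_knee_angle: float = 90.0) -> bool:
    """True when the knee is bent to max_knee_angle degrees or less."""
    return joint_angle(hip, knee, ankle) <= max_knee_angle
```

Each new movement needs another rule like this written by hand.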

NOVEL PART · 01

The analyzer is generated, not just prewritten.

A user describes the movement they want tracked. An agent network generates the analyzer. A sandbox validates it. The runtime stays fast and deterministic.

01
Intent
"track my front lever progress"
02
Agent Network
master + specialists generate analyzer
03
Validation Sandbox
simulate · test · gate
04
Execution
live runtime · deterministic

// WHY IT MATTERS

One platform · three surfaces.

Today it can understand a curl. Tomorrow rehab, sport, skill, and performance.

🏋️

Coaching

Fitness coaching, rep quality, real-time form correction.

🩺

Rehab

Visit-to-visit guidance, longitudinal progress tracking.

📊

Analytics

Movement consistency, symmetry, ergonomic workflows.

// COLLABORATE

Working on something adjacent?

I'm always up to compare notes on agent memory, model routing, or motion analytics. Reach out — happy to share what's working and what's not.

Get in touch