// LAB · ACTIVE EXPERIMENTS · 2026

What I'm Building

A framework, two applications built on it, and a fourth bet on motion. NeuroStack is the foundation; everything else is what gets built on top.

PROJECT 01 · THE FRAMEWORK · in development

NeuroStack

A brain-inspired memory framework for AI agents. Eight memory layers across four temperature zones, multi-signal retrieval, a compaction agent that runs the lifecycle, and a model router that's local-first by default. Built to be embedded by any agent that needs to remember, not just look things up.

Python framework | SQLite · 28 tables | Kuzu graph DB | ACT-R · A-MEM · CLS | no LangChain · no AutoGen · no ORM

"The most novel part of any agent architecture isn't the model. It's the memory."

// THE PROBLEM I SOLVED

v0 was a flat 2-tier store. It broke in 8 ways.

The original architecture — profile table (hot) + episodes/papers table (cold) — looked clean on paper but failed at every junction. Each row below is a real symptom that drove a NeuroStack subsystem.

Problem	Impact
500-char hot memory	A business card — no room for actual job history
no temporal context	Agent had no idea what you'd been doing lately
LIKE-only episode search	Semantic similarity completely missing
no decay or salience	Trivial note ≡ career decision
no episodic → semantic pathway	Insights never distilled into lasting knowledge
no re-promotion	Cold memories could never resurface automatically
3 duplicate consolidation paths	Chat / assistant / career each had their own broken impl
no dedicated memory agent	Memory management scattered, ad hoc, inconsistent

// THE THEORY UNDERNEATH

Built from scratch. Inspired by six papers.

CLS

Complementary Learning Systems

McClelland et al — neuroscience: hippocampus + neocortex split

A-MEM

Zettelkasten for agents

cosine-linked memory notes; powers L3 episode_links

Zep / Graphiti

Bi-temporal knowledge graphs

valid_from / valid_to edges; powers L4 entity facts

MemGPT

Virtual context management

paging hot memory in/out; powers L1 trim_working_memory

ACT-R

Base-level activation

Anderson 1998 — the math of retrieval-strengthened memory

Generative Agents

Reflection

Park et al — periodic insight generation; powers maybe_reflect

// no langchain. no autogen. no orm. every layer hand-rolled.

// THE ARCHITECTURE

Eight layers. Four temperature zones.

Hot lives in every prompt. Session holds the live conversation. Cool retrieves on demand. Archive stays soft-deleted forever.

HOT MEMORY // injected into every prompt · zero latency

L1A

Identity Block

~1500 chars · stable

Who you ARE — priority-ordered: personal → experience → education → projects → skills. Experience entries never evicted; skills compacted first.

L1B

Temporal Context Window

~800 chars · 5-min TTL cache

What you've been DOING — today's sessions + this week's summary. Auto-rebuilt by CompactionAgent.
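
The 5-minute TTL on this window can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not NeuroStack's code: `TemporalContextCache`, `build_context`, and the injectable clock are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, List, Optional


class TemporalContextCache:
    """Caches a built context string; 300 s matches the 5-min TTL in the text."""

    def __init__(self, build: Callable[[], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._build = build
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._built_at = 0.0

    def get(self) -> str:
        """Return the cached context, rebuilding it if missing or expired."""
        now = self._clock()
        if self._value is None or now - self._built_at >= self._ttl:
            self._value = self._build()
            self._built_at = now
        return self._value

    def invalidate(self) -> None:
        """Drop the cached value so the next get() rebuilds it."""
        self._value = None


# Demo with a fake clock so the behaviour is deterministic.
fake_now: List[float] = [0.0]
builds: List[float] = []


def build_context() -> str:
    # Hypothetical builder; the real one would summarize recent sessions.
    builds.append(fake_now[0])
    return f"context built at t={fake_now[0]:.0f}s"


cache = TemporalContextCache(build_context, clock=lambda: fake_now[0])
first = cache.get()      # cold cache: builds
fake_now[0] = 120.0
second = cache.get()     # inside the TTL: served from cache
fake_now[0] = 301.0
third = cache.get()      # TTL expired: rebuilt
cache.invalidate()
fourth = cache.get()     # explicit invalidation: rebuilt
```

Explicit invalidation is the path a background job would take; the TTL only bounds how stale the window can get if no job fires.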

L1C

Experience Details

uncapped

Full job descriptions from resume import — every role, tech stack, achievement, always visible.

SESSION // in-memory · per conversation

Episodic Buffer

last N turns · auto-compress > 30

Verbatim recent conversation. When the buffer crosses 30 messages, the context compressor (qwen3.5:4b local) summarizes older turns to free room.
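
A rough sketch of the compress-over-30 rule, assuming the most recent turns are kept verbatim and everything older collapses into one summary entry. `compress_buffer`, `KEEP_RECENT`, and the stand-in summarizer are hypothetical; in the real system the summary would come from the local model.

```python
from typing import Callable, List

MAX_TURNS = 30     # threshold stated in the text
KEEP_RECENT = 10   # assumption: how many recent turns stay verbatim


def compress_buffer(turns: List[str],
                    summarize: Callable[[List[str]], str],
                    max_turns: int = MAX_TURNS,
                    keep_recent: int = KEEP_RECENT) -> List[str]:
    """Return the buffer unchanged until it exceeds max_turns, then fold
    the older turns into a single summary entry."""
    if len(turns) <= max_turns:
        return turns
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    return ["[summary] " + summarize(older)] + recent


def fake_summarize(older: List[str]) -> str:
    # Stand-in for the LLM call; only reports how much was folded away.
    return f"{len(older)} earlier turns"


buffer = [f"turn {i}" for i in range(31)]
compressed = compress_buffer(buffer, fake_summarize)
```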

COOL MEMORY // retrieved on demand via brain.query()

Episodic Store

bi-temporal · ACT-R + A-MEM

Full episodes with salience scores. Semantic search via embeddings + LIKE fallback. Zettelkasten links between related episodes. Spreading activation: top-k → follow links 1-hop.
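
The "top-k, then follow links 1-hop" step might look like the sketch below. It assumes episodes are identified by integer ids and that links are stored as an adjacency map; `spread_one_hop` and both data shapes are illustrative choices, not the framework's actual interface.

```python
from typing import Dict, List


def spread_one_hop(ranked_ids: List[int],
                   links: Dict[int, List[int]],
                   k: int = 3) -> List[int]:
    """Take the top-k ranked episodes as seeds, then append every episode
    linked directly to a seed (one hop), without duplicates."""
    seeds = ranked_ids[:k]
    result = list(seeds)
    seen = set(seeds)
    for episode_id in seeds:
        for neighbour in links.get(episode_id, []):
            if neighbour not in seen:
                seen.add(neighbour)
                result.append(neighbour)
    return result


# Episode 1 links to 7 and 2; episode 2 links to 9.
links = {1: [7, 2], 2: [9], 3: []}
expanded = spread_one_hop([1, 2, 3, 4], links, k=2)
```

Seeds keep their original ranking and neighbours are appended after them, so a linked episode never outranks a direct hit in this version.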

Entity Graph

Kuzu embedded · bi-temporal

(subject, relation, object) facts extracted from episodes. valid_from / valid_to edges — old facts get invalidated, never deleted. SQLite fallback when Kuzu unavailable.
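
As an illustration of "invalidated, never deleted", here is a small SQLite sketch. The table name `entity_facts`, the helper names, and the rule that a new fact closes the previous one for the same (subject, relation) are all assumptions; the source only specifies the valid_from / valid_to columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE entity_facts (
           subject    TEXT NOT NULL,
           relation   TEXT NOT NULL,
           object     TEXT NOT NULL,
           valid_from TEXT NOT NULL,
           valid_to   TEXT            -- NULL means "still valid"
       )"""
)


def record_fact(conn: sqlite3.Connection, subject: str, relation: str,
                obj: str, now: str) -> None:
    """Close the currently valid fact for (subject, relation), then insert
    the new one. Nothing is ever deleted."""
    conn.execute(
        "UPDATE entity_facts SET valid_to = ? "
        "WHERE subject = ? AND relation = ? AND valid_to IS NULL",
        (now, subject, relation),
    )
    conn.execute(
        "INSERT INTO entity_facts VALUES (?, ?, ?, ?, NULL)",
        (subject, relation, obj, now),
    )


def current_facts(conn: sqlite3.Connection, subject: str):
    """Facts that have not been invalidated."""
    return conn.execute(
        "SELECT relation, object FROM entity_facts "
        "WHERE subject = ? AND valid_to IS NULL ORDER BY relation",
        (subject,),
    ).fetchall()


record_fact(conn, "user", "works_at", "Acme", "2025-01-01")
record_fact(conn, "user", "works_at", "Globex", "2026-03-01")

history_rows = conn.execute("SELECT COUNT(*) FROM entity_facts").fetchone()[0]
```

After the second call the Acme row is still in the table with `valid_to = '2026-03-01'`, so "where did the user work in mid-2025?" remains answerable.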

Procedural Memory

tool / task patterns

What CodeAgent succeeded or failed at. Lessons injected before next similar task. Embedded so similar tasks retrieve relevant priors.

ARCHIVE // soft-deleted · never hard-deleted

Cold Archive

activation < -3.0 · salience < 0.2 · 30+ days idle

Episodes that meet all three thresholds are flagged archived = 1. They stay queryable and are never garbage-collected. Memory should fade, not disappear.

// MULTI-SIGNAL RETRIEVAL

`brain.query()` replaces flat cosine ranking.

Four signals, weighted. Every memory included in a response gets an access logged → ACT-R activation increases → that memory ranks higher next time, automatically.

score(memory) =  0.4 × cosine_similarity           # semantic relevance
                + 0.3 × sigmoid(ACT-R_activation)  # access freq × recency
                + 0.2 × salience                    # surprise + importance, set at write
                + 0.1 × recency_bonus               # exp(-age_days/30)
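
The weighted sum above translates almost line for line into Python. The function and parameter names below are illustrative; only the weights and the 30-day recency constant come from the formula.

```python
import math


def sigmoid(x: float) -> float:
    """Squash an unbounded activation into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def score(cosine_similarity: float, activation: float,
          salience: float, age_days: float) -> float:
    """Multi-signal score using the four weights from the formula."""
    recency_bonus = math.exp(-age_days / 30.0)
    return (0.4 * cosine_similarity
            + 0.3 * sigmoid(activation)
            + 0.2 * salience
            + 0.1 * recency_bonus)


# Same semantic match, different history.
fresh = score(cosine_similarity=0.8, activation=0.0, salience=0.5, age_days=0.0)
stale = score(cosine_similarity=0.8, activation=-3.0, salience=0.5, age_days=90.0)
```

With identical cosine similarity and salience, the recently used memory scores 0.67 while the idle one lands near 0.44, which is the gap that flat cosine ranking cannot express.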

// THE FORGETTING-CURVE MATH

Retrieval strengthens memory. Decay is a power law.

ACT-R's base-level activation is a power-law forgetting curve in the spirit of Ebbinghaus, generalized to multiple accesses. Each access adds a fresh term to the sum, and every term shrinks as a power of its age. Strong memories stay strong; unused ones fade past the archive threshold.

B_i = ln ( Σ t_j^−0.5 )

B_i = activation of memory i

t_j = time since jth access

−0.5 = decay exponent (decay rate d = 0.5, the ACT-R default)

Ebbinghaus 1885 · Anderson & Lebiere 1998 · applied to agent context windows, 2026.
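
A direct transcription of the activation formula, as a sketch. The time unit is not stated in the text, so days are assumed here; `base_level_activation` is a hypothetical name.

```python
import math
from typing import Sequence


def base_level_activation(ages: Sequence[float], decay: float = 0.5) -> float:
    """B_i = ln(sum_j t_j^(-decay)), where each t_j is the time since the
    j-th access. Ages must be strictly positive."""
    if not ages:
        raise ValueError("at least one access is required")
    if any(t <= 0 for t in ages):
        raise ValueError("ages must be positive")
    return math.log(sum(t ** -decay for t in ages))


# One access 4 days ago versus two accesses (1 and 4 days ago).
single = base_level_activation([4.0])
double = base_level_activation([1.0, 4.0])
```

A single access 4 units old gives ln(0.5) ≈ −0.69; adding an access 1 unit old lifts it to ln(1.5) ≈ 0.41. For a memory accessed exactly once, activation drops below −3.0 once its age exceeds e^6 ≈ 403 units, so the choice of unit decides how quickly that threshold bites.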

// THE COMPACTION AGENT

The only entity that moves memories between tiers.

Runs every 5 minutes asynchronously. Replaces the old _embed_loop. Ten jobs, one process, no scattered consolidation paths.

embed_pending_papers

text-embedding-3-small over new arXiv / Semantic Scholar pulls

embed_pending_episodes

embed episode summaries + backfill missing salience

link_new_papers

A-MEM Zettelkasten cosine links — score > 0.5, top-3

link_new_episodes

A-MEM cosine links between episodes — score > 0.3, top-3

decay_and_prune

archive episodes with activation < -3.0 AND salience < 0.2 AND 30+ days idle

consolidate_to_semantic

extract (s, r, o) triples from mature episodes → L4 graph

maybe_reflect

cumulative-salience threshold → gpt-5.2 generates 3 insights, written back as 0.85-salience episodes

re_promote

3+ accesses in 7 days → gpt-5.2 decides if stable fact → staged for user review

trim_working_memory

evict stale profile entries: preferences first, experience/education last

update_temporal_context

invalidate L1B cache — next query rebuilds it
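
To make the two linking jobs concrete, the sketch below picks link targets for a new item by cosine score, keeping at most three candidates above a threshold (0.3 for episodes, 0.5 for papers, per the list above). Storing embeddings as plain Python lists keyed by id is an assumption made for a dependency-free example, and `link_candidates` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_candidates(new_vec: Sequence[float],
                    existing: Dict[str, Sequence[float]],
                    threshold: float = 0.3,
                    top_k: int = 3) -> List[Tuple[str, float]]:
    """Return up to top_k (id, score) pairs with score > threshold,
    best first."""
    scored = [(item_id, cosine(new_vec, vec)) for item_id, vec in existing.items()]
    kept = [pair for pair in scored if pair[1] > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


existing = {
    "a": [1.0, 0.0],    # identical direction
    "b": [1.0, 1.0],    # 45 degrees away
    "c": [0.0, 1.0],    # orthogonal
    "d": [-1.0, 0.0],   # opposite
}
new_links = link_candidates([1.0, 0.0], existing)
```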

// SALIENCE GATE — fires at every brain.save_episode()

A background thread embeds the episode, computes surprise (1 − cosine_sim vs. recent mean), scores importance via gpt-5-nano (1-10), and writes:

salience = 0.5 × surprise + 0.5 × (importance / 10)
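
A sketch of the gate's arithmetic. Two points are assumptions rather than statements from the text: "recent mean" is read as the mean vector of recent episode embeddings, and surprise is clamped to [0, 1] because cosine similarity can be negative. The importance score is passed in as a plain integer standing in for the model's 1-10 rating.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def surprise(new_vec: Sequence[float], recent: List[Sequence[float]]) -> float:
    """1 - cosine(new, mean of recent), clamped to [0, 1].
    With no history, everything is maximally surprising (assumption)."""
    if not recent:
        return 1.0
    mean_vec = [sum(column) / len(recent) for column in zip(*recent)]
    return min(1.0, max(0.0, 1.0 - cosine(new_vec, mean_vec)))


def salience(surprise_value: float, importance: int) -> float:
    """salience = 0.5 * surprise + 0.5 * (importance / 10)."""
    return 0.5 * surprise_value + 0.5 * (importance / 10)


recent = [[1.0, 0.0], [1.0, 0.0]]
novel = salience(surprise([0.0, 1.0], recent), importance=8)     # off-topic, important
routine = salience(surprise([1.0, 0.0], recent), importance=2)   # on-topic, trivial
```

The off-topic, high-importance episode scores 0.9 and the routine one 0.1, so a trivial note no longer ranks the same as a career decision.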

// MODEL ROUTING — LOCAL-FIRST

qwen3.5:4b on Ollama. OpenAI as fallback.

Every LLM call goes through ModelRouter.call(). On model-unavailable / deprecated errors it walks the fallback chain — no code changes when OpenAI deprecates a model.

Tier	Model	Backend	Tasks
LOCAL · NANO	qwen3.5:4b	Ollama · on-device · free	salience scoring, intent classification, entity / profile extraction, context compression
LONG_CTX	gpt-4.1	OpenAI · 1M context	document ingestion (entire resume in one call)
STANDARD	gpt-5.2	OpenAI	trends, paper chat, briefings, assistant chat, career, code generation
PRO	gpt-5.2	OpenAI	memory consolidation, reflection, re-promotion decisions

NANO fallback chain

qwen3.5:4b → gpt-5-nano → gpt-5-mini → gpt-4o-mini

STANDARD fallback chain

gpt-5.2 → gpt-5.1 → gpt-5 → gpt-4.1 → gpt-4o
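
Walking a fallback chain can be sketched as a loop that moves on only when a model is reported unavailable. `ModelUnavailable`, `call_with_fallback`, and the fake backend are hypothetical stand-ins; the real router would call Ollama and OpenAI and map their errors onto something like this.

```python
from typing import Callable, Optional, Sequence, Tuple


class ModelUnavailable(Exception):
    """Hypothetical error type: model missing, unloaded, or deprecated."""


NANO_CHAIN = ["qwen3.5:4b", "gpt-5-nano", "gpt-5-mini", "gpt-4o-mini"]


def call_with_fallback(chain: Sequence[str], prompt: str,
                       backend: Callable[[str, str], str]) -> Tuple[str, str]:
    """Try each model in order; return (model_used, reply) from the first
    one that answers. Errors other than ModelUnavailable propagate."""
    last_error: Optional[Exception] = None
    for model in chain:
        try:
            return model, backend(model, prompt)
        except ModelUnavailable as exc:
            last_error = exc
    raise RuntimeError("no model in the chain is available") from last_error


def fake_backend(model: str, prompt: str) -> str:
    # Pretend the local model and the first cloud model are both down.
    available = {"gpt-5-mini", "gpt-4o-mini"}
    if model not in available:
        raise ModelUnavailable(model)
    return f"[{model}] {prompt}"


used_model, reply = call_with_fallback(NANO_CHAIN, "rate importance 1-10", fake_backend)
```

Catching only the unavailability error keeps genuine failures (bad prompt, auth problems) visible instead of silently burning through the chain.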

A separate ModelCheckAgent runs at startup & weekly: tests every known OpenAI model with both max_tokens and max_completion_tokens, detects which parameter each requires, persists results to model_config.json and the model_availability table.

// THE API SURFACE

A small interface. The complexity hides behind it.

An application embedding NeuroStack only sees four primitives: write an episode, query for relevant memory, configure the model router, and let the compaction agent run. Everything else — decay, salience, consolidation, re-promotion, embeddings, graph extraction — is handled in the background.

brain.save_episode()    writes to L3, fires salience-gate thread
brain.query()           multi-signal scored retrieval across L3-L5
brain.profile.set()     updates L1A identity block
router.call()           local-first model dispatch with fallback

# Embedding NeuroStack in any agent
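# assumption: Ollama, OpenAI and CompactionAgent are exported by neurostack too
from neurostack import Brain, CompactionAgent, ModelRouter, Ollama, OpenAI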

router = ModelRouter(
    nano     = Ollama("qwen3.5:4b"),
    standard = OpenAI("gpt-5.2"),
)

brain = Brain(db="~/agent.db", router=router)

# write
brain.save_episode("Charith mentioned ACT-R")

# read — scored retrieval, not just cosine
memories = brain.query("forgetting curve math", k=5)

# compaction runs in background
CompactionAgent(brain).start()  # every 5 min

PROJECT 02 · FIRST APPLICATION · private repo · running in tray · built on NeuroStack

Research Agent

The first thing I built on NeuroStack. A personal autonomous research agent that runs silently in the system tray on macOS & Windows — researches papers continuously, briefs me every morning, and remembers everything we've ever discussed.

tkinter + pystray | arXiv + Semantic Scholar + web | 7-daemon orchestrator | ~696 KB Python

// WHAT IT DOES

every 2h

Autonomous research

arXiv, Semantic Scholar, web — papers in your fields, on its own clock

every 4h

Trend reports

synthesizes what's happening across your areas

daily

Morning briefings

overnight discoveries + your recent activity

on demand

Personal assistant chat

knows your full profile, work history, every conversation

on demand

Career assistant

resume tailoring, job analysis, cover letters — full profile loaded

sandboxed

Code agent

writes & executes Python in a workspace with procedural memory

// HOW IT USES NEUROSTACK

A thin app on top of a deep memory.

Research Agent is mostly UI and orchestration. Every paper found, every chat turn, every reflection lands in NeuroStack. Every recall — "what did I read about KV-cache last month?" — runs through brain.query(). The agent doesn't manage memory. It uses memory.

// THREADING MODEL · 7 DAEMONS

Main          tkinter root + ui_queue (100ms)
Daemon 1      pystray system tray
Daemon 2      ResearchAgent — search every 2h
Daemon 3      TrendAgent — every 4h
Daemon 4      BriefingAgent — checks every 60s
Daemon 5      CompactionAgent — every 5 min
Daemon 6      ModelCheckAgent — weekly
Daemon 7      LocalModelWarmup — qwen3.5:4b

UI calls dispatched via ui_queue → main thread. Blocking calls use threading.Event.
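
The queue-plus-Event pattern can be shown without a GUI. In the sketch below a plain loop stands in for the tkinter main thread; in a real app the drain function would be rescheduled with `root.after(100, ...)`. `call_on_main` and `drain_ui_queue` are hypothetical names.

```python
import queue
import threading
from typing import Any, Callable, Dict, List

ui_queue: "queue.Queue[Callable[[], None]]" = queue.Queue()


def call_on_main(func: Callable[..., Any], *args: Any) -> Any:
    """Called from a worker thread: enqueue func for the main thread and
    block on a threading.Event until it has run."""
    done = threading.Event()
    box: Dict[str, Any] = {}

    def task() -> None:
        try:
            box["value"] = func(*args)
        except Exception as exc:      # hand the error back to the caller
            box["error"] = exc
        finally:
            done.set()                # always release the waiting worker

    ui_queue.put(task)
    done.wait()
    if "error" in box:
        raise box["error"]
    return box["value"]


def drain_ui_queue() -> None:
    """Run by the main thread: execute every task currently queued."""
    while True:
        try:
            task = ui_queue.get_nowait()
        except queue.Empty:
            break
        task()


results: List[str] = []
worker = threading.Thread(
    target=lambda: results.append(call_on_main(str.upper, "briefing ready")),
    daemon=True,
)
worker.start()

# Stand-in for the tkinter event loop polling the queue.
while worker.is_alive():
    drain_ui_queue()
    worker.join(timeout=0.01)
```

Setting the Event in a `finally` block matters: if the UI call raises, the worker is still released instead of hanging forever.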

PROJECT 03 · NEXT APPLICATION · in design · built on NeuroStack

Brain Component

The next thing on NeuroStack: a local-first agent runtime that can run fully offline. No required OpenAI fallback, no required Anthropic account, no provider lock-in. The default is NeuroStack memory + Ollama / llama.cpp / MLX running entirely on your machine; cloud models are opt-in.

// WHY

Local-first. Provider-independent. Works offline.

No vendor lock-in. Swap Ollama → llama.cpp → Claude → GPT-4 with one config flag.
Privacy by default. Your conversations, memories, and tool calls never leave your machine unless you opt in.
Cost-bounded. Use a 7B local model for cheap iteration, route only the hard calls to a frontier model.
Resilient. If a provider rate-limits you or sunsets a model, you keep working.

# Brain Component architecture

class Brain:
    memory: NeuroStack      # 8-layer, 4-zone memory
    model:  ModelRouter     # local OR cloud
    tools:  ToolRegistry    # MCP-compatible
    agents: AgentNetwork    # planner + workers

router = ModelRouter(
    local  = Ollama("llama3.1:8b"),
    cloud  = Anthropic("claude-opus-4-7"),
    policy = "local-first",         # cost-aware
)

brain = Brain(memory=NeuroStack(), model=router)
brain.run()  # offline-capable

// REQUEST FLOW

Intent

user prompt

NeuroStack

hot + episodic + semantic recall

Router

local · cloud · policy

Tools

MCP shell + file + web

Response

+ writeback to NeuroStack

PROJECT 04 · research · drafted

Adaptive Motion Intelligence

The missing layer between pose landmarks and movement understanding. A platform that generates the right movement analyzer for the right task, on demand.

// FROM POSE SNAPSHOTS TO ACTION SIGNATURES

Every action has a pattern over time. A movement isn't a set of frames — it's a repeatable signature that can be captured, scored, and compared.

// THE STATUS QUO

Pose detection is a wall.

× Most systems stop at skeleton detection.
× Apps are hardcoded for a few movements.
× New movements require manual logic.
× Motion products are rigid and slow to scale.

// THE GAP

Task-specific movement understanding.

Pose landmarks tell you where the joints are. They don't tell you whether the squat was deep enough, whether the elbow flared on the curl, or whether the rehab patient is making progress.
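
A small example of the gap: landmarks are coordinates, while a task-level judgement needs derived quantities such as a joint angle. The sketch below computes the knee angle from three 2D points. The coordinates and the 90-degree depth cutoff are illustrative assumptions, not values from the project.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle at b, in degrees, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("landmarks must be distinct points")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_angle = max(-1.0, min(1.0, cos_angle))   # guard against rounding
    return math.degrees(math.acos(cos_angle))


def squat_deep_enough(hip: Point, knee: Point, ankle: Point,
                      max_knee_angle: float = 90.0) -> bool:
    """Hypothetical rule: the squat counts once the knee angle closes to
    max_knee_angle degrees or less."""
    return joint_angle(hip, knee, ankle) <= max_knee_angle + 1e-9


standing = joint_angle((0.0, 2.0), (0.0, 1.0), (0.0, 0.0))   # straight leg
bottom = joint_angle((0.0, 1.0), (0.0, 0.0), (1.0, 0.0))     # right angle at the knee
```

Every new movement needs its own version of `squat_deep_enough`, which is exactly the hand-written logic the generated analyzers are meant to replace.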

NOVEL PART · 01

The analyzer is generated, not just prewritten.

A user describes the movement they want tracked. An agent network generates the analyzer. A sandbox validates it. The runtime stays fast and deterministic.

Intent

"track my front lever progress"

Agent Network

master + specialists generate analyzer

Validation Sandbox

simulate · test · gate

Execution

live runtime · deterministic

// WHY IT MATTERS

One platform · three surfaces.

Today it can understand a curl. Tomorrow rehab, sport, skill, and performance.

Coaching

Fitness coaching, rep quality, real-time form correction.

Rehab

Visit-to-visit guidance, longitudinal progress tracking.

Analytics

Movement consistency, symmetry, ergonomic workflows.

// COLLABORATE

Working on something adjacent?

I'm always up to compare notes on agent memory, model routing, or motion analytics. Reach out — happy to share what's working and what's not.

Get in touch
