Stop giving your agents amnesia and calling it workflow.
ContextLattice gives Codex, Cursor, Claude Desktop, Open WebUI, and custom MCP agents one durable place to remember decisions, evidence, skills, checkpoints, behavior, and project context, then compile that memory into the next sharp model prompt.
Download options + technical bundles
Less technical users: DMG/Linux bundle/MSI. Technical/dev users: repo + ZIP remain the default.
Clone: git clone git@github.com:sheawinkler/ContextLattice.git
Simple enough to install. Deep enough to become your agent memory infrastructure.
Give every agent the same memory contract.
Plug Codex, Cursor, Claude Desktop, Open WebUI, Claude Code, and custom MCP agents into one local-first layer for writes, recall, compiled context packets, prompt-ready session summaries, and repeatable handoff.
- CLI-first workflow: contextlattice_agent_start, contextlattice_search, and contextlattice_checkpoint.
- Agent templates: copy-ready instructions for Codex, Claude Code, OpenCode, Hermes, ChatGPT, and Claude.
- Context compiler: turn durable memory, ranked evidence, risks, files, and checks into a sharp reference packet for the next model call.
- Skills Index: discover quarantined capabilities without loading every skill into every agent context.
Keep memory useful when the work gets heavier.
Move beyond "save some chat logs" into durable write truth, staged retrieval, behavior provenance, learning feedback, and graph-aware recall that can grow with the team.
- Durable fanout: write once, then route to rollups, vectors, ledgers, and deeper stores.
- Learning loops: feedback and eval cases improve ranking instead of freezing recall quality in place.
- Behavior provenance: preserve decisions, evidence, checkpoints, and agent/session context for audit-grade handoff.
Not just memory. The operating layer around memory.
ContextLattice packages durable memory, retrieval policy, session rollups, prompt-ready context packets, skills discovery, CLI workflows, templates, learning, provenance, and deep-memory lanes behind one local contract.
Your tools remember the same work.
Less replaying context. Less copy-pasting transcripts. Less "wait, what were we doing?" Your agents pick up decisions, evidence, and project state from the same shared memory spine.
Local-first, multi-lane, measurable.
Public local lite starts with topic rollups and Qdrant. Full/operator stacks can add pgvector, raw ledger, async continuation, memory-bank lanes, graph maintenance, and stronger reliability controls.
One command to launch safely
Use gmake quickstart for first-run setup, secure bootstrap, and health verification.
gmake quickstart
Agent crawlers and assistants should parse llms.txt first.
Prove service, auth, and agent memory
ORCH_KEY="$(awk -F= '/^CONTEXTLATTICE_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"
curl -fsS http://127.0.0.1:8075/health | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/status | jq '.service,.sinks'
scripts/agent/agent-runtime-proof-pack --pretty
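For agents that run the same check from Python rather than a shell, the key lookup in the commands above can be sketched as follows. This is a minimal sketch, not ContextLattice code: read_env_value is a hypothetical helper, and it assumes .env holds plain NAME=value lines, as the awk command implies.

```python
from typing import Optional


def read_env_value(env_text: str, name: str) -> Optional[str]:
    """Return the value of the first NAME=value line, or None if absent.

    Like the awk one-liner, everything after the first '=' is the value,
    so keys that themselves contain '=' survive intact.
    """
    prefix = name + "="
    for line in env_text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


# Illustrative .env content; the key value here is made up.
sample = "OTHER=1\nCONTEXTLATTICE_ORCHESTRATOR_API_KEY=abc=123\n"
key = read_env_value(sample, "CONTEXTLATTICE_ORCHESTRATOR_API_KEY")
```

In real use, pass the contents of your own .env file instead of the sample string.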
One command opens dashboard + live health checks
gmake monitor-open
# CLI-only checks:
gmake monitor-check
Dashboard URL: http://127.0.0.1:3000 (default local).
Getting 401 on local requests?
Secure mode is on by default. Keep using /health without auth, and include x-api-key for protected endpoints like /status, /memory/*, and /telemetry/*.
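The auth rule above can be sketched as a small header helper. This is an illustrative sketch under stated assumptions: headers_for and build_request are hypothetical names, and treating every path other than /health as protected is an assumption that goes slightly beyond the endpoints listed in the text.

```python
import urllib.request

BASE_URL = "http://127.0.0.1:8075"  # local frontdoor from the commands above
OPEN_PATHS = {"/health"}            # per the text, /health needs no auth


def headers_for(path: str, api_key: str) -> dict:
    """Return no headers for open paths and x-api-key for everything else."""
    if path in OPEN_PATHS:
        return {}
    return {"x-api-key": api_key}


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request with the right headers for the path."""
    return urllib.request.Request(BASE_URL + path, headers=headers_for(path, api_key))
```

Passing the result of build_request("/status", key) to urllib.request.urlopen would send the authenticated call; a 401 usually means the key was missing or did not match the value in .env.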
Production host split for paid launch
- Public marketing/docs: https://contextlattice.io
- Paid app + dashboard + billing API: https://app.contextlattice.io
- Billing infrastructure: managed through deployment-specific configuration.
What is public now vs private next
Public v3.4 (current launch lane)
- Current public release: v3.4.10
- Frontdoor: gateway-go on :8075
- Fallback: Python orchestrator on :18075
- Memory-bank default: shodh_spike
- Retrieval behavior: staged fast-return + async slow continuation
- Personal computer target: HF/Glama 2-4 vCPU / 4-8 GB RAM / 20-50 GB SSD, Lite 2-4 vCPU / 8-12 GB RAM / 25-80 GB SSD, Full 6-8 vCPU / 12-20 GB RAM / 100-180 GB SSD (no spike-lab)
- Release posture: stable baseline for public operators
Private v4 (tuning lane)
- Frontdoor: same :8075 Go gateway contract
- Policy: aggressive adaptive tuning and candidate promotions
- Memory-bank: shodh_spike with deterministic fallback chain and optional hedge
- Validation: benchmark + recall parity + soak gate before promotion
- Personal computer target: start from Full baseline and add headroom, especially SSD (external NVMe recommended)
- Release posture: private experimentation before any public cutover
Launch flow map
Orchestrator gets better at memory recall over time
The orchestrator applies a learning schema built from feedback signals to rerank results, so retrieval precision improves over time. Public local fast staged reads prioritize topic rollups and Qdrant, while full/operator continuation can also incorporate pgvector, MindsDB, Mongo raw, Letta, and memory-bank.
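One way to picture feedback-driven reranking is a per-source boost that moves up or down with each feedback signal. The sketch below is an assumption about the general shape of such a loop, not the orchestrator's actual learning schema; FeedbackReranker, its learning rate, and the scores are all illustrative.

```python
from collections import defaultdict


class FeedbackReranker:
    """Rerank candidates by base score plus a learned per-source boost."""

    def __init__(self, learning_rate: float = 0.1):
        self.learning_rate = learning_rate
        self.boost = defaultdict(float)  # source name -> additive boost

    def record(self, source: str, helpful: bool) -> None:
        """Nudge a source's boost up for helpful results, down otherwise."""
        self.boost[source] += self.learning_rate * (1.0 if helpful else -1.0)

    def rerank(self, candidates):
        """candidates: list of (doc_id, source, base_score); best first."""
        return sorted(
            candidates, key=lambda c: c[2] + self.boost[c[1]], reverse=True
        )


reranker = FeedbackReranker()
candidates = [("a", "qdrant", 0.50), ("b", "topic_rollups", 0.45)]
before = [doc_id for doc_id, _, _ in reranker.rerank(candidates)]
reranker.record("topic_rollups", helpful=True)
after = [doc_id for doc_id, _, _ in reranker.rerank(candidates)]
```

A single helpful signal for the rollup source is enough to move its candidate ahead of a slightly higher-scored vector hit, which is the compounding effect the text describes.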
Read the detailed rollout notes on the Updates page and the execution plan on the V3 Roadmap.
Unified write + retrieval loop through the orchestrator
Every write enters through the orchestrator, which records durable raw data, fans out to specialized stores, and continuously protects queue and storage health. Every search comes back through the same orchestrator so results can be fused, reranked, and improved over time from feedback.
- Write intake
- Outbox fanout
- Federated search
- Learning rerank
- Retention + guardrails
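The write half of the loop above (raw record first, then fanout) can be sketched like this. It is a minimal in-memory illustration: WriteSpine is a hypothetical class, and the default sink names simply reuse the lite-mode stores named in this page.

```python
import itertools


class WriteSpine:
    """Record every write durably first, then queue one outbox entry per sink."""

    def __init__(self, sinks=("topic_rollups", "qdrant")):
        self.sinks = tuple(sinks)
        self.raw_ledger = []            # stands in for the durable raw store
        self.outbox = []                # pending deliveries, one per sink
        self._ids = itertools.count(1)  # simple record id generator

    def write(self, payload: dict) -> int:
        record_id = next(self._ids)
        # Step 1: the raw record lands before any sink is touched.
        self.raw_ledger.append((record_id, payload))
        # Step 2: fan out by queuing work, not by calling sinks inline.
        for sink in self.sinks:
            self.outbox.append({"record_id": record_id, "sink": sink})
        return record_id


spine = WriteSpine()
first_id = spine.write({"topic": "auth", "text": "decided on API keys"})
```

Because sinks are only queued, a slow or failing sink cannot block the write from being recorded.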
Data Flow
Write Flow
Single orchestrator spine, explicit method stages, and parallel fanout branches to all write sinks.
Retrieval Flow
Federated sources converge to the orchestrator spine, then reranked results return with learning feedback.
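This page does not say how federated results are fused. As a stand-in, the sketch below uses reciprocal rank fusion, a common technique for merging ranked lists from several sources; the function name, the constant k=60, and the document ids are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings: dict, k: int = 60) -> list:
    """Merge per-source ranked id lists into one list, best first.

    Each source contributes 1 / (k + rank) for every id it returns, so ids
    that several sources agree on rise to the top.
    """
    scores = {}
    for ranked_ids in rankings.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending, then by id for a stable tie-break.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


fused = reciprocal_rank_fusion(
    {"topic_rollups": ["a", "b"], "qdrant": ["b", "c"]}
)
```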
Orchestrator
Benefit: one control plane for writes, retrieval, and policy.
Why: central coordination is what allows multi-source ranking and learning to compound.
Topic Rollups
Benefit: compact, high-signal summaries for fast staged recall.
Why: reduces deep-read pressure while preserving source grounding for follow-up dives.
Qdrant
Benefit: first-class local vector engine.
Why: payload-heavy filtering, quantization, snapshots, and distributed vector deployments keep the lite and full vector lanes aligned.
Postgres + pgvector
Benefit: SQL-co-located vector retrieval for full/operator stacks.
Why: joins, relational backups, and Postgres-native operations remain valuable when users already run the SQL lane.
Mongo Raw
Benefit: durable source-of-truth write ledger.
Why: protects recoverability and enables replay/rehydrate workflows.
Deep lane (MindsDB + Letta + memory-bank)
Benefit: richer long-horizon recall when fast lanes need deeper evidence.
Why: async continuation improves completeness without blocking fast user responses.
Fanout Outbox
Benefit: resilient async delivery with retries, coalescing, and admission control.
Why: prevents sink instability from breaking ingestion reliability.
Retention + Telemetry
Benefit: bounded storage growth and observable runtime behavior.
Why: operational stability is required for learning retrieval to stay trustworthy.
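The Fanout Outbox entry above names three behaviors: retries, coalescing, and admission control. A minimal in-memory sketch of how they can fit together follows. Outbox and its parameters are hypothetical, retries here are immediate with no backoff, and a real outbox would persist its queue.

```python
class Outbox:
    """Bounded delivery queue with coalescing, admission control, and retries."""

    def __init__(self, capacity: int = 1000, max_attempts: int = 3):
        self.capacity = capacity
        self.max_attempts = max_attempts
        self.pending = {}  # (sink, key) -> latest payload, in insertion order
        self.dead = []     # entries that exhausted their attempts

    def offer(self, sink: str, key: str, payload) -> bool:
        slot = (sink, key)
        if slot in self.pending:
            self.pending[slot] = payload  # coalesce: newest payload wins
            return True
        if len(self.pending) >= self.capacity:
            return False                  # admission control: reject when full
        self.pending[slot] = payload
        return True

    def drain(self, deliver) -> None:
        for slot, payload in list(self.pending.items()):
            for _ in range(self.max_attempts):
                try:
                    deliver(slot[0], slot[1], payload)
                    break
                except Exception:
                    continue              # retry up to max_attempts
            else:
                self.dead.append((slot, payload))
            del self.pending[slot]


calls = {"n": 0}
delivered = []


def flaky_deliver(sink, key, payload):
    calls["n"] += 1
    if calls["n"] == 1:  # first call fails to exercise the retry path
        raise ConnectionError("sink unavailable")
    delivered.append((sink, key, payload))


box = Outbox(capacity=2)
accepted = [
    box.offer("qdrant", "k1", "v1"),
    box.offer("qdrant", "k1", "v2"),  # coalesced into the k1 slot
    box.offer("qdrant", "k2", "x"),
    box.offer("qdrant", "k3", "y"),   # rejected: queue is full
]
box.drain(flaky_deliver)
```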
Learning is strongest when memory is both rich and reliable
The orchestrator's learning schema can only improve ranking if retrieval sources stay healthy, durable, and synchronized. This architecture is built for that: topic rollups + Qdrant provide fast candidates, deep continuation adds MindsDB/Letta/memory-bank evidence, the Mongo raw ledger keeps writes recoverable through replay, and guardrails keep the full loop from collapsing under pressure.
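The split between fast candidates and deep continuation can be sketched with asyncio: start the slow lane in the background, return the fast lane's results immediately, and collect the deep results later. The lane functions below are hypothetical stand-ins that return labeled strings instead of querying real stores.

```python
import asyncio


async def fast_lane(query: str) -> list:
    """Stand-in for the quick stores (topic rollups, Qdrant)."""
    return [f"rollup:{query}", f"qdrant:{query}"]


async def deep_lane(query: str) -> list:
    """Stand-in for slower stores; the sleep simulates their latency."""
    await asyncio.sleep(0.01)
    return [f"letta:{query}"]


async def staged_search(query: str):
    """Return fast results now plus a task that resolves to deep results."""
    deep_task = asyncio.create_task(deep_lane(query))  # runs in the background
    fast = await fast_lane(query)
    return fast, deep_task


async def main():
    fast, deep_task = await staged_search("auth decision")
    deep = await deep_task  # a caller could await this much later
    return fast, deep


fast_results, deep_results = asyncio.run(main())
```

The caller gets usable results without waiting on the slow lane, and the deep results arrive through the returned task when they are ready.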
Deployment Modes
Hugging Face / Glama lite
Single-container lane focused on compatibility and low footprint.
- App version lane: Public v3.4.x
- Includes: gateway + orchestrator container with topic-rollup-first retrieval
- Compute: 2-4 vCPU recommended
- Memory: 4-8 GB RAM baseline
- Storage: 20-50 GB SSD depending on retention settings
Lite mode
Best for local development and constrained laptops where stable memory services matter more than deep analytics.
- App version lane: Public v3.4.x
- Includes: Gateway-Go frontdoor, orchestrator core, Memory Bank MCP, Mongo raw, Qdrant, outbox fanout, retention workers
- Fast staged retrieval: topic_rollups + qdrant by default; pgvector remains first-class for full/operator stacks
- Compute: 2-4 vCPU recommended
- Memory: 8-12 GB RAM baseline
- Storage: 25-80 GB SSD depending on write volume
Full mode
Best for high-write workloads and richer retrieval where learning loops use every sink, including RAG through Letta.
- App version lane: Public v3.4.x Full, and baseline for private v4 tuning
- Includes: Lite mode plus MindsDB analytics, Letta archival memory, observability stack, and full rehydrate tooling
- Deep continuation lane: async enrichment from mindsdb + mongo_raw + letta + memory_bank
- Compute: 6-8 vCPU recommended
- Memory: 12-20 GB RAM baseline (without spike-lab)
- Storage: 100-180 GB SSD depending on retention policy
- Spike-lab active: 24-32 GB RAM and 180-300 GB SSD/NVMe
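To turn the sizing lists above into a quick check, the sketch below picks the largest mode a machine can host. It is an illustration only: largest_mode_that_fits is a hypothetical helper, and using the lower bound of each recommended range as a hard minimum is an assumption, since the ranges depend on retention and write volume.

```python
# (mode, min vCPU, min RAM in GB, min SSD in GB), largest mode first.
# Values are the lower bounds of the ranges listed above.
PROFILES = [
    ("Full", 6, 12, 100),
    ("Lite", 2, 8, 25),
    ("HF/Glama lite", 2, 4, 20),
]


def largest_mode_that_fits(vcpu: int, ram_gb: int, ssd_gb: int):
    """Return the largest mode whose minimums the machine meets, else None."""
    for mode, min_vcpu, min_ram, min_ssd in PROFILES:
        if vcpu >= min_vcpu and ram_gb >= min_ram and ssd_gb >= min_ssd:
            return mode
    return None


laptop = largest_mode_that_fits(vcpu=4, ram_gb=8, ssd_gb=50)
workstation = largest_mode_that_fits(vcpu=8, ram_gb=16, ssd_gb=200)
```

Spike-lab is not modeled here; per the list above it needs 24-32 GB RAM and 180-300 GB SSD/NVMe on top of Full.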