// the find
plastic-labs/honcho
Memory library for building stateful agents
Honcho is a memory-as-a-service layer for AI agents: it stores conversations, runs background reasoning to build per-peer representations, and exposes those representations back to your LLM calls. It's aimed at developers who want agents that remember users across sessions without building that infrastructure themselves. You can use the managed service or self-host the FastAPI server.
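To make the store-now, reason-later split concrete, here is a minimal asyncio sketch of the general pattern. It is not Honcho's code or SDK. Writes store the message and enqueue work, a background worker runs a stubbed derivation step, and reads return whatever representation has already been computed. `MemoryStore`, `add_message`, `representation` and `fake_derive` are hypothetical names, and `fake_derive` stands in for an LLM call.

```python
import asyncio
from collections import defaultdict

# Hypothetical sketch of a "store now, reason later" memory layer.
# None of these names come from Honcho; derive() stands in for an LLM call.
class MemoryStore:
    def __init__(self, derive):
        self.messages = defaultdict(list)   # peer -> raw messages
        self.representations = {}           # peer -> derived summary
        self.queue = asyncio.Queue()        # peers awaiting re-derivation
        self.derive = derive

    async def add_message(self, peer, text):
        # Write path: persist and enqueue; no inference on the request path.
        self.messages[peer].append(text)
        await self.queue.put(peer)

    def representation(self, peer):
        # Read path: return whatever was last derived, without calling a model.
        return self.representations.get(peer, "")

    async def worker(self):
        # Background "deriver": drain the queue and refresh representations.
        while True:
            peer = await self.queue.get()
            try:
                snapshot = list(self.messages[peer])
                self.representations[peer] = await self.derive(peer, snapshot)
            finally:
                self.queue.task_done()


async def fake_derive(peer, msgs):
    await asyncio.sleep(0.01)  # pretend this is slow model inference
    return f"{peer} has said {len(msgs)} thing(s); latest: {msgs[-1]}"


async def main():
    store = MemoryStore(fake_derive)
    worker = asyncio.create_task(store.worker())
    await store.add_message("alice", "I prefer Rust")
    before = store.representation("alice")  # derivation hasn't run yet
    await store.queue.join()                 # wait for the background worker
    after = store.representation("alice")
    worker.cancel()
    return before, after


before, after = asyncio.run(main())
print(repr(before), "->", repr(after))
```

The write path only appends and enqueues, so `before` is still empty. The representation appears once the worker has drained the queue. A production system would persist to a database and batch or debounce derivation, both of which this sketch leaves out.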
The peer model is the right abstraction: treating users and agents as symmetric, first-class entities means multi-agent sessions work naturally instead of being bolted on. Because reasoning runs asynchronously in the background (the 'deriver'), you don't pay inference latency on every request, and the static representation endpoint gives you a way to inject context without waiting on an LLM call. The hybrid BM25 + vector search is a real improvement over pure embedding similarity, which tends to miss exact names and keywords. The SDK ergonomics are genuinely clean: `session.context().to_openai()` is the right interface, because you shouldn't have to think about prompt assembly.
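The hybrid-search point can be illustrated with a small, self-contained sketch. The write-up doesn't say how Honcho combines the two signals. This sketch assumes reciprocal rank fusion (RRF, with the conventional k = 60), a common choice, over Okapi BM25 (Lucene-style non-negative IDF) and cosine similarity. The "embeddings" are hand-picked 2-d toy vectors, and every function name is hypothetical.

```python
import math
from collections import Counter

# Hypothetical hybrid search: BM25 + cosine, fused with reciprocal rank fusion.
# This is NOT Honcho's implementation; RRF is an assumed fusion method.

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc against a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(scores):
    """Doc indices ordered best-first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def rrf(rankings, k=60):
    """Reciprocal rank fusion: each doc gets sum of 1 / (k + rank) over rankings."""
    fused = Counter()
    for ranking in rankings:
        for pos, idx in enumerate(ranking):
            fused[idx] += 1.0 / (k + pos + 1)
    return [idx for idx, _ in fused.most_common()]

def hybrid_search(query_tokens, query_vec, docs_tokens, docs_vecs, k=60):
    bm25 = bm25_scores(query_tokens, docs_tokens)
    # Only docs with an actual keyword match take part in the lexical ranking.
    lexical = [i for i in rank(bm25) if bm25[i] > 0]
    semantic = rank([cosine(query_vec, v) for v in docs_vecs])
    return rrf([lexical, semantic], k=k)

docs_tokens = [["alice", "prefers", "rust"],
               ["user", "likes", "systems", "languages"],
               ["bob", "prefers", "go"]]
docs_vecs = [[0.3, 1.0], [1.0, 0.1], [0.9, 0.4]]  # toy 2-d "embeddings"
results = hybrid_search(["alice"], [1.0, 0.0], docs_tokens, docs_vecs)
print(results)  # → [0, 1, 2]
```

On these toy vectors, cosine similarity alone ranks the Alice document last for the query "alice". The exact-token BM25 match is what pulls it to the top of the fused list, which is the failure mode of pure embedding search described above.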
AGPL-3.0 is a hard no for most commercial products: if you self-host and expose Honcho as part of a SaaS, the license's network clause obliges you to offer the source of your modified version to the users interacting with it, and legal teams are often wary of how far that obligation reaches into surrounding code. The reasoning pipeline is a black box: you can query what Honcho 'knows', but you can't inspect the intermediate deduction steps or audit why a wrong conclusion was formed, which matters when the agent says something wrong about a user. By default the pipeline depends on multiple LLM providers at inference time (Gemini for derivation, Anthropic for reasoning, OpenAI for embeddings), which means three separate API keys and three separate failure surfaces just to run the basic pipeline. Version history shows major API churn (v1 had Apps/Users, v2 had Peers/Observations, v3 renamed things again), so early adopters have already migrated twice, and the migration guides suggest the upgrades aren't fully mechanical.