// the find
BasedHardware/omi
AI that sees your screen, listens to your conversations and tells you what to do
Omi is an always-on AI capture system: it records your conversations and screen, transcribes them in real time with speaker diarization, and provides an LLM chat that can reference everything you've said or seen. It spans hardware (a Bluetooth wearable necklace and ESP32 glasses), a Flutter mobile app, a Swift/Rust macOS app, and a Python/FastAPI backend. It's aimed at people who want a persistent personal knowledge base built from their actual lives rather than from manual note-taking.
The speaker diarization pipeline (VAD + GPU diarizer + Deepgram STT) is the right architecture for meeting capture: most DIY setups skip diarization and end up with a useless transcript blob that never says who said what. The quick start actually works: `./run.sh --yolo` connects to their cloud backend, so you can evaluate the product before spinning up the full stack, which is smart onboarding. Open hardware designs (schematics, firmware, and a build guide) ship alongside the software, so 'open source wearable' isn't just marketing. The MCP server and multi-language SDKs (Python, Swift, React Native) make it a reasonable foundation for building on top of captured context rather than treating it as a closed silo.
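To make the 'transcript blob' point concrete: diarization only pays off once each transcribed word is tied to a speaker. What follows is a minimal, hypothetical sketch of that merge step, not Omi's actual code. It assumes the STT stage returns word-level timestamps and the diarizer returns speaker turns, both in seconds; `Word`, `SpeakerTurn`, `overlap`, and `attribute_words` are invented names for illustration. Each word is assigned to the turn it overlaps most, and consecutive words from the same speaker are joined into one line.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # One word from the STT stage (hypothetical shape; times in seconds).
    text: str
    start: float
    end: float


@dataclass
class SpeakerTurn:
    # One turn from the diarizer (hypothetical shape; times in seconds).
    speaker: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals, or 0.0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def attribute_words(words: list[Word], turns: list[SpeakerTurn]) -> list[tuple[str, str]]:
    """Hypothetical merge step: label each word with the speaker whose turn
    overlaps it most, then join consecutive same-speaker words into lines."""
    lines: list[tuple[str, list[str]]] = []
    for w in words:
        # Pick the turn with the largest time overlap with this word.
        best = max(turns, key=lambda t: overlap(w.start, w.end, t.start, t.end), default=None)
        if best is not None and overlap(w.start, w.end, best.start, best.end) > 0:
            speaker = best.speaker
        else:
            # Word falls outside every diarized turn.
            speaker = "unknown"
        if lines and lines[-1][0] == speaker:
            lines[-1][1].append(w.text)  # same speaker keeps talking
        else:
            lines.append((speaker, [w.text]))  # speaker change starts a new line
    return [(speaker, " ".join(texts)) for speaker, texts in lines]


words = [Word("hi", 0.0, 0.3), Word("there", 0.3, 0.6),
         Word("hello", 1.0, 1.4), Word("back", 1.4, 1.8)]
turns = [SpeakerTurn("A", 0.0, 0.8), SpeakerTurn("B", 0.9, 2.0)]
print(attribute_words(words, turns))
# → [('A', 'hi there'), ('B', 'hello back')]
```

Maximum-overlap assignment is the simplest reasonable choice; a production pipeline would also have to deal with overlapping speech and with words that land in gaps between turns, which this sketch simply labels `unknown`.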
The `--yolo` quick start silently sends your audio to their cloud, and the README doesn't flag this clearly, which is a serious issue for a tool that records every conversation you have. True self-hosting requires Firebase, Redis, GCP, and GPU compute for VAD/diarization, so the 'fully open source' claim is accurate but 'fully self-hostable without cloud dependency' is not. The stack fragmentation is brutal for contributors: Swift+Rust for desktop, Flutter (Dart) for mobile, Python for the backend, C for firmware, and Next.js for personas. That's six languages across five runtime environments, so someone who can actually fix a cross-cutting bug is hard to find. There's also no meaningful privacy documentation, which is exactly what an always-on audio and screen capture tool needs: users should be able to find out how long data is retained, who can access transcripts, and what happens to embeddings stored in the cloud backend.