// the find

BasedHardware/omi

★ 12,763 · Dart · MIT · updated Jun 2026

AI that sees your screen, listens to your conversations and tells you what to do

Omi is an always-on AI capture system: it records your conversations and screen, transcribes in real time with speaker diarization, and surfaces an LLM chat that can reference everything you've said or seen. It spans hardware (a Bluetooth wearable necklace and ESP32 glasses), a Flutter mobile app, a Swift/Rust macOS app, and a Python/FastAPI backend. It is aimed at people who want a persistent personal knowledge base built from their actual life rather than from manual note-taking.

The speaker diarization pipeline (VAD + GPU diarizer + Deepgram STT) is the right architecture for meeting capture: most DIY setups skip diarization and end up with an unattributed transcript blob. The quick start works as advertised: `./run.sh --yolo` connects to their cloud backend, so you can evaluate the product without spinning up the full stack first, which is smart onboarding. Open hardware designs (schematics, firmware, and a build guide) ship alongside the software, so 'open source wearable' isn't just marketing. The MCP server and multi-language SDKs (Python, Swift, React Native) make it a reasonable foundation for building on top of captured context rather than treating it as a closed silo.
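
To make the "transcript blob" point concrete, here is a minimal sketch of the step that diarization enables: attaching a speaker label to each transcribed word by time overlap. This is not Omi's code. The `Word` and `Turn` types, the `SPEAKER_n` labels, and the overlap heuristic are all assumptions for illustration; a real pipeline would take word timestamps from the STT service and speaker turns from the diarizer.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with timestamps in seconds (hypothetical STT output)."""
    text: str
    start: float
    end: float


@dataclass
class Turn:
    """One diarized speaker turn (hypothetical diarizer output)."""
    speaker: str
    start: float
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_words(words, turns):
    """Pair each word with the speaker whose turn overlaps it most."""
    labeled = []
    for w in words:
        best_speaker, best_overlap = "unknown", 0.0
        for t in turns:
            o = overlap(w.start, w.end, t.start, t.end)
            if o > best_overlap:
                best_speaker, best_overlap = t.speaker, o
        labeled.append((best_speaker, w.text))
    return labeled


def to_transcript(labeled):
    """Merge consecutive words from the same speaker into one line."""
    lines = []
    for speaker, text in labeled:
        if lines and lines[-1][0] == speaker:
            lines[-1][1].append(text)
        else:
            lines.append((speaker, [text]))
    return [f"{speaker}: {' '.join(ws)}" for speaker, ws in lines]


words = [Word("hi", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hello", 1.2, 1.6)]
turns = [Turn("SPEAKER_0", 0.0, 1.0), Turn("SPEAKER_1", 1.0, 2.0)]
transcript = to_transcript(label_words(words, turns))
print("\n".join(transcript))
# SPEAKER_0: hi there
# SPEAKER_1: hello
```

Without the `turns` input, every word would fall back to `unknown` and the output would collapse into a single unattributed line, which is the failure mode described above.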

The `--yolo` quick start silently sends your audio to their cloud. This is buried, and the README doesn't flag it clearly, which is a serious issue for a tool that records every conversation you have. True self-hosting requires Firebase, Redis, GCP, and GPU compute for VAD/diarization; the 'fully open source' claim is accurate, but 'fully self-hostable without cloud dependency' is not. The stack fragmentation is brutal for contributors: Swift+Rust for desktop, Flutter for mobile, Python for backend, C for firmware, Next.js for personas. That is five separate stacks with five runtime environments, so contributors who can actually fix a cross-cutting bug are rare. There's no meaningful privacy documentation for an always-on audio and screen capture tool, which is exactly where users need to know how long data is retained, who can access transcripts, and what happens to embeddings stored in the cloud backend.
