// the find

akitaonrails/FrankSherlock

★ 278 · Rust · GPL-3.0 · updated Jun 2026

Intelligent File Searcher that can search for descriptions in media files like photos, for example "receipt from market of $42.00" using AI based classification

Frank Sherlock is a local-first desktop app (Tauri + Rust backend, React frontend) that classifies your photo/video/PDF library using a locally-running Ollama vision model, then lets you search it with natural language. Everything stays on your machine — no cloud, no subscription. It's aimed at people with large NAS libraries who want to find files by content rather than filename.

The 4-phase incremental scan is well-engineered: discovery uses only filesystem metadata, moved files are detected by fingerprint rather than re-classified, and each phase can be cancelled and resumed independently, so rescanning 10k unchanged images takes seconds rather than minutes. The OCR pipeline shows real care: Surya runs in an isolated Python venv as the primary engine, with the vision LLM as fallback, and benchmark data (RESULTS.md) justifies the choice at 95% reference similarity. Face clustering uses SCRFD + ArcFace running as native ONNX, with no Python dependency on the hot path, which is a legitimate technical choice that avoids the usual 'just call a cloud API' shortcut. The test suite (322 Rust + 299 frontend tests) is larger than in most hobby projects of this complexity, and CI runs on all three platforms.
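
To make the incremental-scan idea concrete, here is a minimal Rust sketch of how a scanner could decide what to do with each file. It is not Frank Sherlock's actual code: every name in it (FileMeta, IndexedFile, ScanAction, plan_scan, the fingerprint_of callback) is hypothetical, and it assumes the previous index stores size, modification time, and a content fingerprint per path. The point it illustrates is that files whose metadata is unchanged are skipped without being read, and a fingerprint is only computed for paths that look new or modified.

```rust
use std::collections::HashMap;

/// Cheap filesystem metadata gathered during discovery (no file contents read).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileMeta {
    pub size: u64,
    pub mtime_secs: u64,
}

/// A previously indexed file: its metadata plus a content fingerprint.
/// (Hypothetical shape; the real schema is not documented in the README.)
#[derive(Debug, Clone)]
pub struct IndexedFile {
    pub meta: FileMeta,
    pub fingerprint: String,
}

/// What the scanner should do with a file found on disk.
#[derive(Debug, PartialEq, Eq)]
pub enum ScanAction {
    /// Same content as before: skip entirely.
    Unchanged,
    /// Same content as a file that disappeared from this old path: reuse its results.
    MovedFrom(String),
    /// Genuinely new or modified content: send to the classifier.
    NeedsClassification,
}

/// Plans a rescan from the previous index and the current directory listing.
/// `fingerprint_of` is only called for paths whose metadata does not match.
pub fn plan_scan(
    previous: &HashMap<String, IndexedFile>,
    current: &HashMap<String, FileMeta>,
    fingerprint_of: impl Fn(&str) -> String,
) -> HashMap<String, ScanAction> {
    // Fingerprints of files that vanished from their old path: fingerprint -> old path.
    let mut missing: HashMap<&str, &str> = HashMap::new();
    for (path, file) in previous {
        if !current.contains_key(path) {
            missing.insert(file.fingerprint.as_str(), path.as_str());
        }
    }

    let mut plan = HashMap::new();
    for (path, meta) in current {
        let action = match previous.get(path) {
            // Fast path: metadata matches, so the file is never opened.
            Some(old) if old.meta == *meta => ScanAction::Unchanged,
            known => {
                let fp = fingerprint_of(path);
                if known.map_or(false, |old| old.fingerprint == fp) {
                    // Metadata changed (e.g. touched) but content did not.
                    ScanAction::Unchanged
                } else if let Some(old_path) = missing.remove(fp.as_str()) {
                    ScanAction::MovedFrom(old_path.to_string())
                } else {
                    ScanAction::NeedsClassification
                }
            }
        };
        plan.insert(path.clone(), action);
    }
    plan
}
```

With this shape, a rescan of an unchanged library does one map lookup and one metadata comparison per file, which is consistent with the "seconds rather than minutes" behaviour described above. A moved file costs one fingerprint computation rather than a full pass through the vision model.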

The dependency stack for a 'simple' local tool is heavy: Ollama running as a separate daemon, optional Surya requiring a Python venv, PDFium and ONNX runtime as downloaded native libraries, and ffmpeg for video. Setup is a multi-step manual process even with pre-built binaries, and the Windows installer triggers SmartScreen because there's no code-signing certificate. The query parser appears to be custom (query_parser.rs) but the README only shows basic examples — it's unclear how it handles ambiguous or unsupported queries, and there's no mention of error feedback when a query fails to parse. Video support exists but classification relies on keyframe extraction, so temporal content (e.g. 'the scene where X happens') isn't searchable — the README doesn't surface this limitation. There's no mention of an export path: if you build a library of 50k classified images and want to migrate to a different tool, your metadata is stuck in a local SQLite file with no documented schema or export tooling.
