// the find

chiphuyen/sotawhat

★ 1,410 · Python · updated Feb 2024

Returns latest research results by crawling arxiv papers and summarizing abstracts. Helps you stay afloat with so many new papers every day.

A CLI tool that searches arxiv, scrapes recent papers matching a keyword, and summarizes their abstracts using NLTK. Aimed at researchers or engineers who want a quick terminal-based way to scan what's been published on a topic without opening a browser.

Simple, focused interface — one command, one keyword, useful output with no ceremony. The idea of summarizing abstracts rather than just listing titles saves actual time. MIT licensed with a pip-installable package structure, so it's easy to drop into a workflow. Covers a genuinely useful niche that arxiv's own search UI handles poorly for quick lookups.

The last meaningful commit appears to be years old, and the Heroku web UI it references is almost certainly dead (Heroku killed free dynos in 2022). The summarization is NLTK extractive summarization from 2018, not LLM-based, so it's essentially sentence ranking over abstract text that is already short. There is no way to filter by date range, so results for one keyword can mix 2018 and 2024 papers, and nothing surfaces that. The SSL certificate workaround and Windows encoding hack documented in the README are warning signs that setup friction was real even when this was actively maintained.
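To make "sentence ranking" concrete, here is a minimal sketch of frequency-based extractive summarization. It is not sotawhat's actual code: the function names, the tiny stopword list, and the regex sentence splitter are all assumptions chosen to keep the example standard-library only, where the real tool relies on NLTK.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; NLTK ships a much fuller one).
STOPWORDS = {
    "a", "an", "the", "of", "to", "and", "in", "on", "for", "is",
    "are", "we", "that", "this", "with", "it", "as", "by",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercased alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Keep the highest-scoring sentences, returned in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        if not tokens:
            return 0.0
        # Average word frequency, so long sentences do not win by length alone.
        return sum(freq[w] for w in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    abstract = (
        "We propose a new transformer model. "
        "The transformer model improves translation quality. "
        "Experiments were run last year."
    )
    print(summarize(abstract, max_sentences=2))
```

On an abstract of four or five sentences, a ranker like this can only drop a sentence or two, which is why the payoff over simply printing the abstract is limited.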

