// the find

MinishLab/model2vec

★ 2,129 · Python · MIT · updated Jun 2026

Fast State-of-the-Art Static Embeddings

Model2Vec distills a sentence transformer into a static embedding model by forward-passing a vocabulary through it once, then keeping only the resulting token vectors. The result is a lookup table with no transformer at inference time, which is where the project's claimed speedup of up to 500x on CPU comes from. It's aimed at anyone who needs embeddings at scale but can't afford the latency or memory of a full transformer.
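To make the "lookup table" idea concrete, here is a toy sketch in plain Python. It is not Model2Vec's implementation: `toy_encoder` is a made-up stand-in for the transformer forward pass, the whitespace tokenizer and the mean pooling are assumptions of this sketch, and it skips any post-processing the real library applies.

```python
from typing import Callable, Dict, List

Vector = List[float]


def toy_encoder(token: str) -> Vector:
    """Hypothetical stand-in for a sentence transformer forward pass."""
    # Deterministic fake features: length, vowel count, first-letter code.
    vowels = sum(ch in "aeiou" for ch in token)
    return [float(len(token)), float(vowels), float(ord(token[0]) % 7)]


def distill_table(vocab: List[str], encoder: Callable[[str], Vector]) -> Dict[str, Vector]:
    """Run the encoder once per vocabulary token and keep only the outputs."""
    return {token: encoder(token) for token in vocab}


def embed(text: str, table: Dict[str, Vector]) -> Vector:
    """Embed text by looking up known tokens and averaging their vectors."""
    vectors = [table[tok] for tok in text.lower().split() if tok in table]
    if not vectors:
        raise ValueError("no known tokens in input")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# The encoder is only needed here, at distillation time.
vocab = ["fast", "static", "embeddings"]
table = distill_table(vocab, toy_encoder)

# Inference is a dictionary lookup plus an average; no encoder call.
sentence_vector = embed("fast static embeddings", table)
```

After `distill_table` returns, the encoder can be thrown away. That is the whole trade: inference cost drops to a lookup and an average, and everything the encoder did with surrounding context is gone.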

The distillation step needs no training data and runs in ~30 seconds on a CPU, which helps when you want domain-specific embeddings from your own base model without a labeling budget. The base package's only major dependency is numpy, so dropping it into an existing service doesn't pull in a PyTorch dependency tree. The pre-trained potion models sit on MTEB above every other static embedding baseline, so the performance tradeoff versus a full transformer is documented rather than hand-wavy. Integration with sentence-transformers and LangChain means you can swap it in with one line.
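For a sense of what loading and distilling look like in practice, here is a sketch based on the project's documented Python API. Treat the exact names (`StaticModel.from_pretrained`, `encode`, `distill`, the `pca_dims` parameter) and the model identifiers as assumptions to check against the current README. The two wrapper functions are hypothetical helpers, and the imports are deferred so the file loads even when the package is not installed.

```python
from typing import List


def embed_texts(texts: List[str], model_name: str = "minishlab/potion-base-8M"):
    """Load a pre-trained static model and embed a list of texts.

    Assumes `pip install model2vec`; downloads the model on first use.
    """
    from model2vec import StaticModel  # deferred: optional dependency

    model = StaticModel.from_pretrained(model_name)
    return model.encode(texts)


def distill_from_base(base_model: str = "BAAI/bge-base-en-v1.5", pca_dims: int = 256):
    """Distill a static model from a sentence transformer, no training data.

    Assumes the distillation extra is installed (it pulls in heavier
    dependencies than the numpy-only base package).
    """
    from model2vec.distill import distill  # deferred: optional dependency

    return distill(model_name=base_model, pca_dims=pca_dims)
```

Calling `embed_texts(["fast static embeddings"])` should return one vector per input text. Only the distillation path needs the transformer stack; the inference path is the lightweight one.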

Static embeddings have a hard ceiling on semantic quality: once you've distilled away the contextual attention mechanism, you can't recover it. The word 'bank' gets the same vector in 'the bank by the river' and 'the bank approved my loan', because context doesn't exist at inference; the two sentence embeddings differ only through their other tokens. The multilingual model is 128M params, which narrows the size advantage considerably for that use case. Fine-tuning is limited to classification heads on top of frozen embeddings, so if your task needs the embedding space itself to shift, you're out of luck. The quantization and dimensionality reduction features are relatively new, and the docs don't say much about where performance degrades, so you'd need to run your own benchmarks.
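The context limitation is easy to see in a toy example. The sketch below uses made-up 2-d vectors and assumes mean pooling over whitespace tokens, which is a simplification rather than the library's actual pipeline. It shows two consequences of a pure lookup: a token's vector never depends on its neighbors, and word order has no effect on the pooled result.

```python
from typing import Dict, List

# Hypothetical 2-d token vectors, invented for illustration only.
TABLE: Dict[str, List[float]] = {
    "bank": [0.5, 0.5],
    "river": [0.0, 1.0],
    "loan": [1.0, 0.0],
    "dog": [1.0, 0.0],
    "bites": [0.0, 1.0],
    "man": [1.0, 1.0],
}


def lookup(text: str, table: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Return the vector used for each known token in the text."""
    return {tok: table[tok] for tok in text.lower().split() if tok in table}


def mean_pool(text: str, table: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the known tokens in the text."""
    vectors = list(lookup(text, table).values())
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# "bank" resolves to the same vector in both sentences.
river_tokens = lookup("the bank by the river", TABLE)
loan_tokens = lookup("the bank approved my loan", TABLE)
same_bank_vector = river_tokens["bank"] == loan_tokens["bank"]

# Word order is invisible to an average of token vectors.
order_is_ignored = mean_pool("dog bites man", TABLE) == mean_pool("man bites dog", TABLE)
```

Both flags come out true in this sketch. If your task depends on word sense or on who did what to whom, that is the kind of input to include in your own evaluation set.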
