We're crowdsourcing the data labeling that embedding models need to scale.
Run a container on your machine, label query-document pairs while you sleep,
and help build open-source models that top the leaderboard.
LLMs improve with more data. Embedding models don't, because scaling their training data introduces false negatives: relevant documents incorrectly treated as negatives. These poison the contrastive learning signal until, past a point, more data actively hurts performance.
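The mechanism is visible directly in a contrastive loss such as InfoNCE. A minimal sketch (the similarity values and temperature are illustrative, not from any real training run): when one in-batch "negative" is actually relevant, the loss spikes and the gradient pushes a relevant document away from the query.

```python
import math

def info_nce(sim_pos, sim_negs, tau=0.05):
    """InfoNCE loss for one query: -log softmax of the positive's
    similarity over the positive plus the negatives."""
    logits = [sim_pos / tau] + [s / tau for s in sim_negs]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - sim_pos / tau

# True negatives: low similarity to the query -> near-zero loss.
clean = info_nce(0.9, [0.10, 0.05, 0.0])

# One "negative" is actually relevant (a false negative): its high
# similarity inflates the loss and generates a gradient that pushes
# the relevant document apart from the query.
poisoned = info_nce(0.9, [0.85, 0.05, 0.0])

print(clean, poisoned)
```

Filtering the pair out of the negative set removes that spurious gradient entirely.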
The fix is straightforward: use a language model to verify every (query, document) pair and filter the false negatives. But at the scale needed (hundreds of millions of pairs), no single lab can afford it.
The community can.
The labeling task is simple: “Does this document answer this query?” A small model running on consumer hardware handles this reliably. Leave the container running overnight. Wake up having contributed to the best embedding model ever built.
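A sketch of what one labeling step looks like, assuming a binary YES/NO prompt; the container's actual prompt and model call may differ:

```python
def build_prompt(query, document):
    """Hypothetical verification prompt; the container's real
    prompt template may differ."""
    return (
        "Does this document answer this query? Reply YES or NO.\n\n"
        f"Query: {query}\n"
        f"Document: {document}\n"
        "Answer:"
    )

def parse_label(model_output):
    """Map the model's free-text reply to a binary relevance label."""
    return 1 if model_output.strip().upper().startswith("YES") else 0

prompt = build_prompt("capital of France?", "Paris is the capital of France.")
print(parse_label(" Yes, it directly answers the query."))  # 1
print(parse_label("No, the document is off-topic."))        # 0
```

Binary verification like this is far easier than generation, which is why a small quantized model handles it reliably.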
| Hardware | Model | Throughput | Pairs overnight (8 hr) |
|---|---|---|---|
| RTX 4090 (24GB) | Qwen2.5-7B Q8 | ~120 pairs/min | ~57,600 pairs |
| RTX 3090 (24GB) | Qwen2.5-7B Q8 | ~85 pairs/min | ~40,800 pairs |
| RTX 3060 (12GB) | Qwen2.5-3B Q8 | ~60 pairs/min | ~28,800 pairs |
| M2/M3 MacBook (16GB) | Qwen2.5-3B Q4 | ~20 pairs/min | ~9,600 pairs |
| CPU only (16GB RAM) | Qwen2.5-1.5B Q4 | ~8 pairs/min | ~3,840 pairs |
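The overnight totals above are just throughput times minutes, a quick spot-check:

```python
def overnight_pairs(pairs_per_min, hours=8):
    """Pairs labeled in one overnight session."""
    return pairs_per_min * hours * 60

print(overnight_pairs(120))  # 57600 (RTX 4090 row)
print(overnight_pairs(8))    # 3840  (CPU-only row)
```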
The container probes your hardware and picks the best model. No config needed.
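As an illustration, the selection can be as simple as mapping detected memory to the tiers in the table above; the container's real probe logic may differ, and the thresholds here are assumed from the table:

```python
def pick_model(vram_gb=None):
    """Map detected GPU memory (or its absence) to a model tier.
    Thresholds mirror the hardware table; illustrative only."""
    if vram_gb is not None and vram_gb >= 24:
        return "Qwen2.5-7B Q8"
    if vram_gb is not None and vram_gb >= 12:
        return "Qwen2.5-3B Q8"
    if vram_gb is not None:
        return "Qwen2.5-3B Q4"
    return "Qwen2.5-1.5B Q4"  # CPU-only fallback

print(pick_model(vram_gb=24))  # Qwen2.5-7B Q8
print(pick_model())            # Qwen2.5-1.5B Q4
```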
The model runs locally. Only pair ID + label are uploaded.
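What an upload might look like, assuming a JSON payload; the field names are illustrative, and the point is what's absent: no query text, no document text.

```python
import json

def make_submission(pair_id, label):
    """Build the upload payload. Only the pair ID and the binary
    label leave the machine; the texts stay local."""
    return json.dumps({"pair_id": pair_id, "label": label})

print(make_submission("p-001", 1))  # {"pair_id": "p-001", "label": 1}
```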
Every pair is labeled by 2+ contributors. Honeypot pairs with known answers catch bad actors.
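A minimal sketch of the quality-control logic, with assumed thresholds; the production checks may be more involved:

```python
from collections import Counter

def consensus(labels, min_votes=2):
    """Accept a pair's label only once 2+ contributors agree by
    strict majority; otherwise route it for more votes."""
    if len(labels) < min_votes:
        return None
    (top, n), = Counter(labels).most_common(1)
    if n < min_votes or n * 2 <= len(labels):  # no strict majority yet
        return None
    return top

def passes_honeypots(answers, truth, min_accuracy=0.9):
    """Flag contributors whose labels on known-answer (honeypot)
    pairs fall below a threshold (0.9 is an assumed value)."""
    correct = sum(a == t for a, t in zip(answers, truth))
    return correct / len(truth) >= min_accuracy

print(consensus([1]))     # None: needs a second vote
print(consensus([1, 1]))  # 1
print(consensus([1, 0]))  # None: disagreement, escalate
```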
Public ledger. Weekly retraining. MTEB scores published live.
We accept donations of API credits, GPU compute time, and cloud resources. Every donated resource is accounted for publicly, and we publish exactly how it's used. No waste. If you give us compute, we label pairs.