What LLM Can I Run is a hardware-first local LLM ranker. Type your GPU, get every model that fits — ranked by real bench
| Founded year: | 2026 |
| Country: | Armenia |
| Funding rounds: | Not set |
| Total funding amount: | Not set |
Description
What LLM Can I Run is a hardware-first ranker for local large language models. The premise: most LLM "reviews" tell you which model is smartest in absolute terms, then you discover it needs 80 GB of VRAM you don't have. We flip the question: tell us your GPU, and we'll rank only the models that actually fit, scored by real benchmarks rather than vibes.
The registry covers 88 GPUs across NVIDIA RTX 20/30/40/50 series, datacenter cards (H100, H200, A100, L40S, A6000 Pro), and Apple Silicon M1 through M4 with every memory configuration including the often-forgotten M3 Pro 18 GB and M3 Max 128 GB. For each card, we compute VRAM headroom across six quantization levels (Q2_K through FP16), estimate tokens/sec from memory bandwidth, and surface a fit grade (great / ok / tight) so you know whether you're cruising or scraping by.
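To make the headroom and tokens/sec arithmetic concrete, here is a minimal sketch of that kind of estimate. It is an illustration under stated assumptions, not the site's actual code. The four levels listed between Q2_K and FP16, the bits-per-weight figures, the fixed `overhead_gb` allowance, the grade thresholds, and the extra "no fit" grade are all assumed for the example. The speed figure uses the common rule of thumb that memory-bound decoding reads every weight once per token, so it is an upper bound rather than a measurement.

```python
from dataclasses import dataclass

# Assumed, approximate bits per weight for six levels from Q2_K to FP16.
# Illustrative values only; real quantized files vary by model.
BITS_PER_WEIGHT = {
    "Q2_K": 2.6,
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "FP16": 16.0,
}


@dataclass
class Fit:
    quant: str
    weights_gb: float
    headroom_gb: float
    tokens_per_sec: float  # bandwidth-bound upper estimate
    grade: str


def estimate_fit(params_b, vram_gb, bandwidth_gbs, overhead_gb=1.5):
    """Estimate fit for a model with `params_b` billion parameters.

    overhead_gb is a hypothetical flat allowance for KV cache and runtime
    buffers; the grade thresholds below are likewise assumptions.
    """
    results = []
    for quant, bpw in BITS_PER_WEIGHT.items():
        # billions of parameters * bytes per parameter = gigabytes
        weights_gb = params_b * bpw / 8
        headroom_gb = vram_gb - weights_gb - overhead_gb
        if headroom_gb < 0:
            grade = "no fit"
        elif headroom_gb < 0.10 * vram_gb:
            grade = "tight"
        elif headroom_gb < 0.25 * vram_gb:
            grade = "ok"
        else:
            grade = "great"
        # Each generated token streams all weights from memory once.
        tokens_per_sec = bandwidth_gbs / weights_gb
        results.append(Fit(quant, weights_gb, headroom_gb, tokens_per_sec, grade))
    return results


if __name__ == "__main__":
    # Hypothetical card: 24 GB of VRAM, 1000 GB/s memory bandwidth.
    for fit in estimate_fit(params_b=34, vram_gb=24, bandwidth_gbs=1000):
        print(f"{fit.quant:7s} {fit.weights_gb:6.1f} GB  {fit.grade}")
```

A flat overhead keeps the sketch short. A more careful estimate would scale the KV cache with context length and model architecture.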
Model rankings cross-match three benchmark sources: LiveBench (reasoning, coding, language), Aider Polyglot (real-world code editing pass rate), and Chatbot Arena Elo (human preference). Every score has a confidence level, every number is dated, every provenance link is one click away. If data is stale, we say so. No email gate, no upsell, no AI-generated filler text.
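One way to picture the cross-matching is a small record per source that carries its own date, so confidence and staleness can be derived rather than asserted. The sketch below is hypothetical throughout. The `Score` type, the plain average, the "more sources means higher confidence" rule, and the 180-day staleness cutoff are assumptions for illustration, not the scoring the site uses. It also assumes the caller has already rescaled each source to a common 0 to 100 range, since pass rates and Elo ratings are not directly comparable.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Score:
    source: str     # e.g. "LiveBench", "Aider Polyglot", "Chatbot Arena"
    value: float    # assumed to be pre-normalized to 0..100 by the caller
    measured: date  # date the number was published


def summarize(scores, today, stale_after_days=180):
    """Combine per-source scores into one dated, hedged summary."""
    if not scores:
        return None
    mean = sum(s.value for s in scores) / len(scores)
    distinct_sources = len({s.source for s in scores})
    # Assumed rule: agreement across more sources earns more confidence.
    confidence = {1: "low", 2: "medium"}.get(distinct_sources, "high")
    stale = [s.source for s in scores
             if (today - s.measured).days > stale_after_days]
    return {
        "score": round(mean, 1),
        "confidence": confidence,
        "stale_sources": stale,
    }


if __name__ == "__main__":
    # Made-up example values, not real benchmark results.
    example = [
        Score("LiveBench", 60.0, date(2026, 1, 10)),
        Score("Aider Polyglot", 70.0, date(2026, 2, 1)),
        Score("Chatbot Arena", 80.0, date(2025, 3, 1)),
    ]
    print(summarize(example, today=date(2026, 3, 1)))
```

Keeping the date on every record is what lets stale entries be flagged instead of silently averaged in.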
Built for the local-LLM crowd running Ollama, llama.cpp, MLX, and anyone wondering whether the RTX 3090 → 5090 upgrade is worth it (usually no). Next time someone asks "what LLM can I run on my machine," send them this link instead of a 40-tab Reddit thread.