The computer for local AI
| Founded year: | 2026 |
| Country: | United States of America |
| Funding rounds: | Not set |
| Total funding amount: | Not set |
Description
Lucebox pairs an RTX 3090 with a Ryzen AI MAX+ 395 and a custom inference engine tuned for it. It beats every machine at its price on tokens per second, and ships with everything pre-installed.
Key Features
- Hybrid memory architecture: 128GB of LPDDR5X unified memory on the Ryzen AI MAX+ 395 holds large models, while the RTX 3090's 24GB of fast GDDR6X serves as a high-bandwidth tier. Speculative decoding across the two tiers delivers up to 10x faster inference than comparable single-tier machines.
- Custom open-source inference engine: Lucebox ships with hand-tuned CUDA kernels, DFlash speculative decoding, and PFlash speculative prefill (10x faster than llama.cpp), all open source with 2,000+ GitHub stars and an active contributor community.
- One-command model deployment: A single CLI pulls, configures, and serves any open model. No driver hunting, no quantization guesswork, no environment setup. Plug it in and run inference in minutes.
- Pre-tuned for the exact hardware: Unlike generic builds, the entire software stack is optimized for this specific chip pairing, so you get the full performance the silicon is capable of, out of the box.
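For readers unfamiliar with speculative decoding, which the features above rely on, here is a minimal sketch of the generic greedy draft-and-verify idea. It is not Lucebox's DFlash or PFlash implementation, whose internals this listing does not describe. The `target_model` and `draft_model` functions are hypothetical toy stand-ins for a large and a small model, and `speculative_decode` is an illustrative name.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token context to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(ctx: List[int]) -> int:
    """Hypothetical toy stand-in for the large, slow model."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_model(ctx: List[int]) -> int:
    """Hypothetical toy stand-in for the small, fast model.

    It agrees with the target except when the context length is a multiple of 4.
    """
    guess = target_model(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns the tokens and the number of verify rounds."""
    out = list(prompt)
    goal = len(prompt) + n_new
    verify_rounds = 0
    while len(out) < goal:
        # 1. The draft model cheaply proposes up to k tokens.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target model checks the proposal. A real engine would score all
        #    proposed positions in one batched forward pass; this toy loops instead.
        verify_rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)       # draft guessed right, keep it
            else:
                accepted.append(expected)  # first mismatch: take the target's token
                break
        out.extend(accepted)
    return out, verify_rounds


if __name__ == "__main__":
    tokens, rounds = speculative_decode(target_model, draft_model, [1, 2, 3], 20)
    print(tokens == greedy_decode(target_model, [1, 2, 3], 20), rounds)
```

Because every kept token is one the target model would have chosen itself, the output matches plain greedy decoding with the target alone. Any speedup comes from needing fewer verification rounds than tokens generated, and it depends on how often the draft model guesses correctly.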
Benefits
- Best tokens per second for the money: At $4,900, Lucebox outperforms every machine in its price class, including the Mac Studio and NVIDIA DGX Spark, on local LLM inference speed.
- Your data never leaves your desk: Models run entirely on-device, so prompts, code, and documents stay private. No cloud provider sees your traffic.
- Predictable costs instead of token bills: One upfront purchase replaces escalating cloud API spend. Run agents 24/7 without watching a usage meter.
- Zero setup time: Everything arrives pre-installed and pre-tuned. You skip the weeks of kernel tweaking, driver debugging, and benchmark chasing that a DIY build demands.
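The cost trade-off above can be checked with simple break-even arithmetic. The sketch below uses the $4,900 price from this listing. The cloud API price is a hypothetical placeholder, not a quote from any provider, and the calculation ignores electricity and maintenance.

```python
def break_even_tokens(hardware_cost_usd: float, api_price_per_million_usd: float) -> float:
    """Tokens at which cumulative API spend equals the one-time hardware cost."""
    if api_price_per_million_usd <= 0:
        raise ValueError("API price must be positive")
    return hardware_cost_usd / api_price_per_million_usd * 1_000_000


LUCEBOX_PRICE_USD = 4900.0                   # price stated in this listing
HYPOTHETICAL_API_PRICE_PER_MILLION = 10.0    # assumption: substitute your provider's rate

if __name__ == "__main__":
    tokens = break_even_tokens(LUCEBOX_PRICE_USD, HYPOTHETICAL_API_PRICE_PER_MILLION)
    print(f"Break-even at about {tokens:,.0f} tokens")
```

Substituting your own blended API rate shows how much sustained usage is needed before the upfront purchase pays for itself.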