NVIDIA DGX Spark (GB10, 128GB)
Why this one: It's the only way to load 70B–200B models at this price: 128GB of unified memory in a book-sized, turnkey CUDA box. It solves the problem a 32GB 5090 physically can't: a 120B model simply loads and runs.
What it beat: Mac Studio at equivalent memory (more expensive, no CUDA — most LLM tooling is CUDA-first) and 4× GPU franken-rigs for buyers who won't build one.
Tighter budget? A pair of used 3090s (48GB, ~$1,900) runs 70B Q4 faster than the Spark runs anything — if you'll build it. The Spark's premium is capacity + turnkey.
Single-vendor integrated hardware with NVIDIA support behind it. It's a v1 platform: early firmware quirks are documented and steadily patched; buy from retailers with real return windows.
Common concerns
- Speed reality check — 273 GB/s bandwidth means ~6–8 tok/s on 27–30B models: reading pace, not chat pace. A 5090 is ~7× faster on anything that fits in 32GB. Buy the Spark for capacity, never speed.
- Who it's actually for — running/fine-tuning 70B+ models, MoE models (sparse activation suits low bandwidth), and prototyping against the full CUDA stack without a server.
- Benchmark context — a 5× used-3090 rig hit 124 tok/s on a 120B model vs the Spark's 38: DIY still wins raw throughput; the Spark wins your weekends back.
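Back-of-envelope check on the speed claims above: token generation is mostly limited by how fast the weights can be streamed from memory, so a rough ceiling is bandwidth divided by the bytes read per token. The sketch below assumes that simple model. The function name, the 8-bit and 4-bit weight sizes, the 5090's 1,792 GB/s figure, and the 5B-active MoE example are illustrative assumptions, not figures from this guide.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float,
                         active_params_b: float,
                         bytes_per_param: float) -> float:
    """Rough upper bound on decode speed for a bandwidth-bound model.

    Assumes every generated token streams all *active* weights from
    memory exactly once and that nothing else limits throughput.
    """
    if min(bandwidth_gb_s, active_params_b, bytes_per_param) <= 0:
        raise ValueError("all inputs must be positive")
    gb_read_per_token = active_params_b * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    SPARK_GB_S = 273.0      # bandwidth quoted in this guide
    RTX5090_GB_S = 1792.0   # assumption: published spec, not from this guide

    # Dense 30B model at 8-bit weights (assumed 1 byte per parameter).
    print(f"Spark, 30B dense, 8-bit: {decode_ceiling_tok_s(SPARK_GB_S, 30, 1.0):.1f} tok/s ceiling")

    # Bandwidth ratio behind the "~7x faster" comparison.
    print(f"5090 vs Spark bandwidth: {RTX5090_GB_S / SPARK_GB_S:.1f}x")

    # Hypothetical MoE: 120B total, 5B active per token, 4-bit weights.
    dense = decode_ceiling_tok_s(SPARK_GB_S, 120, 0.5)
    sparse = decode_ceiling_tok_s(SPARK_GB_S, 5, 0.5)
    print(f"120B dense vs 5B-active MoE ceiling: {dense:.1f} vs {sparse:.1f} tok/s")
```

These are ceilings, not predictions. Measured speeds land lower once KV-cache reads and compute overhead are counted, which is why a ~9 tok/s ceiling is in line with the ~6–8 tok/s quoted for 27–30B models. The MoE line shows why sparse activation suits a low-bandwidth box: only the active parameters are read per token.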
