Used GeForce RTX 3090 24GB (business sellers only)
Why this one: The community-standard budget inference card for good reason: 24GB VRAM and 936 GB/s run 32B-class models fully on-card at roughly 87% of a 4090's inference speed, for well under $1,000. And it stacks: two cards = 48GB = 70B Q4 at genuinely usable speeds.
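The VRAM math behind those claims can be sanity-checked with a rough sketch. Assumptions not from the text: about 4.5 bits per weight for Q4 (the figure for llama.cpp's Q4_0 format; other Q4 variants run somewhat larger), a flat 3GB allowance for KV cache and runtime buffers, and the hypothetical helper names `weights_gb` and `fits`. Treat the result as a ballpark, not a guarantee.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, vram_gb: float,
         bits_per_weight: float = 4.5,   # assumption: Q4_0-style quantization
         overhead_gb: float = 3.0) -> bool:  # assumption: KV cache + runtime buffers
    """True if weights plus the assumed overhead fit in the given VRAM."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    cases = [
        ("32B Q4 on one 3090 (24GB)", 32, 24),
        ("70B Q4 on one 3090 (24GB)", 70, 24),
        ("70B Q4 on two 3090s (48GB)", 70, 48),
    ]
    for label, params, vram in cases:
        size = weights_gb(params, 4.5)
        verdict = "fits" if fits(params, vram) else "does not fit"
        print(f"{label}: ~{size:.1f}GB of weights, {verdict}")
```

Longer contexts grow the KV cache, so raise `overhead_gb` if you plan to run large context windows.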
What it beat: New RTX 5060 Ti 16GB (~$450, but the 16GB VRAM ceiling walls you out of the models worth running) and 4060-class cards (bandwidth-starved for this job).
Tighter budget? This IS the save-money pick. Below it, run small models (8B) on whatever GPU you have.
It's used, mining-era silicon with no manufacturer warranty, hence the honest 3/5. Manage the risk: buy ONLY from established business sellers with 30-day+ returns and real stock (our standing rule: never one-off listings), expect to replace the thermal paste and check the thermal pads, and stress-test within the return window.
Common concerns (4)
- Two-card setups: x8/x8 PCIe 4.0 is fine for inference (splitting layers or tensors across cards needs little bus bandwidth). Each card draws 350W, so plan on a 1200W+ PSU, and power-limit each card to ~280W for ~5% less performance and much less heat.
- Skip NVLink: llama.cpp and exllama don't meaningfully benefit, so save the $200 bridge money.
- Software support? Full CUDA support, and still first-class in every inference stack. Ampere isn't going anywhere for years.
- Buying safely on eBay: stick to business sellers with 1,000+ feedback, quantity listings, and 30-day returns. Test hard in week one: memtest_vulkan, sustained load, all 24GB touched.
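The two-card power numbers above work out as follows. This is a small sketch, assuming the 350W stock and ~280W limited figures from the text and a 1200W PSU; the helper names are hypothetical. It only prints the `nvidia-smi` commands that would apply the limit (`-i` selects the GPU index, `-pl` sets the power limit in watts) rather than running them, since they need root.

```python
STOCK_W = 350      # per-card draw, from the text
LIMITED_W = 280    # suggested per-card power limit, from the text
PSU_W = 1200       # minimum PSU size suggested in the text


def gpu_power_budget(num_cards: int, watts_per_card: int) -> int:
    """Total GPU draw in watts for the given per-card figure."""
    return num_cards * watts_per_card


def power_limit_commands(num_cards: int, limit_w: int) -> list[list[str]]:
    """Build one nvidia-smi power-limit command per GPU index."""
    return [
        ["nvidia-smi", "-i", str(index), "-pl", str(limit_w)]
        for index in range(num_cards)
    ]


if __name__ == "__main__":
    cards = 2
    stock = gpu_power_budget(cards, STOCK_W)
    limited = gpu_power_budget(cards, LIMITED_W)
    print(f"GPUs at stock:   {stock}W, leaving {PSU_W - stock}W for the rest of the system")
    print(f"GPUs at {LIMITED_W}W cap: {limited}W, leaving {PSU_W - limited}W for the rest of the system")
    for cmd in power_limit_commands(cards, LIMITED_W):
        print("sudo " + " ".join(cmd))
```

The limit set this way typically does not survive a reboot, so reapply it at startup if you want it to stick.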
