ExoBrain
chips and hardware, compute infrastructure, consumer AI, local AI

Nvidia ships a beautiful disappointment

Nvidia's DGX Spark faces criticism for poor inference performance relative to its price, highlighting the critical importance of memory bandwidth in local AI hardware.

Joel Miller



After months of delays, Nvidia’s DGX Spark has finally reached reviewers’ hands. The £3,999 compact desktop, built around the GB10 Grace Blackwell chip, promises one petaflop of AI performance in a Mac mini-sized box. UK retailer Scan suggests stock will arrive by late October, making this one of the first consumer-facing products combining Nvidia’s ARM CPU and Blackwell GPU architectures.

Sadly, the reviews paint a mixed picture. The hardware is stunning, but 4.35 tokens per second on a quantised Llama 3.1 70B is a disappointment. A Mac Studio with M3 Ultra manages 6.5 tokens per second on the same model, whilst a dual RTX 5090 setup reaches ~50 tokens per second. The DGX Spark’s 128GB of unified memory offers flexibility for development work, yet its 273GB/s of memory bandwidth is a major bottleneck at this price point. By comparison, the Nvidia RTX Pro 6000 offers 1,800GB/s, although it costs double the price for the GPU alone. Memory bandwidth remains the most critical performance factor for LLM inference, because generating each token means streaming the model’s weights from memory.
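The bandwidth argument can be made concrete with a back-of-envelope estimate. The sketch below assumes single-stream decoding of a dense model in which every generated token reads all of the weights once, so memory bandwidth divided by weight size gives an upper bound on tokens per second. The 4-bit quantisation level is an assumption (the reviews are only described as using "quantisation"), the function name is purely illustrative, and the estimate ignores KV-cache traffic and compute limits.

```python
def decode_ceiling_tokens_per_s(bandwidth_gb_s: float,
                                params_billion: float,
                                bits_per_weight: float) -> float:
    """Rough upper bound on decode speed for a bandwidth-bound dense model.

    Assumes each generated token streams every weight from memory once.
    """
    # Billions of parameters times bytes per parameter gives gigabytes.
    weight_gb = params_billion * bits_per_weight / 8
    return bandwidth_gb_s / weight_gb


PARAMS_BILLION = 70   # Llama 3.1 70B
BITS_PER_WEIGHT = 4   # ASSUMPTION: the source does not state the quantisation level

# Bandwidth figures quoted in the article, in GB/s.
devices = {"DGX Spark": 273, "RTX Pro 6000": 1800}

for name, bandwidth in devices.items():
    ceiling = decode_ceiling_tokens_per_s(bandwidth, PARAMS_BILLION, BITS_PER_WEIGHT)
    print(f"{name}: at most ~{ceiling:.1f} tokens/s")
```

Under these assumptions the Spark’s ceiling is roughly 7.8 tokens per second, so the reported 4.35 would be a little over half of the theoretical limit. Changing `BITS_PER_WEIGHT` shows how strongly the bound depends on the quantisation actually used.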

Meanwhile, Apple announced the M5 chip this week, claiming over 4x the peak GPU compute performance of M4. The Neural Accelerator embedded in each of the M5’s GPU cores targets exactly the workloads where the DGX Spark struggles.

Where the DGX Spark shows promise is in Nvidia’s software ecosystem. DGX OS, the CUDA playbooks, and the fine-tuning capabilities create a rich environment for AI research and model development. Clustering demonstrations over the extremely fast ConnectX-7 networking, both with other Sparks and even with Mac Studios, show interesting possibilities for distributed inference. But for standalone inference work, the value proposition weakens considerably.

The wider competitive landscape offers other options. HP’s Z2 G1a mini workstation with an AMD AI Max+ 395 costs around £2,000 in the UK and delivers performance comparable to the Spark’s. Even older discrete GPUs like the RTX 3090 remain compelling for many workloads. Looking ahead, Nvidia’s more powerful GB300 will represent the true “supercomputer on a desk” (with 20 petaflops of FP4 compute), but expect very high pricing and limited availability.

Takeaways: The DGX Spark confirms that building competitive ARM-based AI hardware remains difficult outside Apple’s ecosystem. GPU availability is improving, but pricing reflects continued supply constraints. For most developers, a Mac with M-series silicon or a traditional x86 workstation with discrete Nvidia RTX cards offers much better value. The real question isn’t whether the DGX Spark is good enough, but whether local AI development needs specialised hardware at all when cloud GPUs are probably the easier route to high-performance compute. The Spark is nonetheless a beautiful piece of hardware for those who can afford it, and its focus on experimentation will no doubt inspire new AI innovation.