
What is DiffusionGemma? Google DeepMind's open model that runs local AI 4x faster

Ars Technica · 2 h ago
A modern laptop screen displaying abstract code lines. Photo: Daniil Komov / Pexels

Google DeepMind released the latest member of its open Gemma series this week. According to user tests reported by Ars Technica, DiffusionGemma runs roughly four times faster on laptop and mobile hardware than a conventional transformer of the same size.

The difference is architectural. Transformer models, the standard for chat and text generation over the past three years, produce output one token at a time. Diffusion models shape and refine the entire response in parallel.
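The contrast can be illustrated with a toy sketch. It is not DiffusionGemma's actual algorithm, which the article does not describe; the function names, the `MASK` placeholder and the "oracle" that simply copies tokens from a known target are all assumptions made for illustration. The only point is the number of model passes: one per token for sequential decoding, versus a fixed number of refinement passes that each update many positions at once.

```python
import random

MASK = "_"  # hypothetical placeholder for a position not yet decided


def autoregressive_decode(target):
    """Emit one token per model pass; returns (tokens, passes)."""
    out, passes = [], 0
    for tok in target:
        passes += 1          # one forward pass per generated token
        out.append(tok)      # the target stands in for the model's prediction
    return out, passes


def parallel_refine_decode(target, steps=4, seed=0):
    """Start fully masked and fill in a share of positions on every pass.

    Returns (tokens, passes). The target again stands in for a real model.
    """
    rng = random.Random(seed)
    out = [MASK] * len(target)
    passes = 0
    for step in range(steps):
        masked = [i for i, tok in enumerate(out) if tok == MASK]
        if not masked:
            break
        passes += 1                                  # one pass over the whole sequence
        remaining = steps - step
        k = -(-len(masked) // remaining)             # ceiling division
        for i in rng.sample(masked, k):              # positions resolved this pass
            out[i] = target[i]
    return out, passes


if __name__ == "__main__":
    tokens = "the quick brown fox jumps over the lazy dog again and again".split()
    print(autoregressive_decode(tokens)[1], "sequential passes")
    print(parallel_refine_decode(tokens, steps=4)[1], "parallel refinement passes")
```

In this sketch a 12-token response takes 12 sequential passes but only 4 refinement passes. Each refinement pass does more work than a single-token pass, so the pass count alone does not determine real-world speed.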

The diffusion architecture has been the established choice for image generation. DeepMind's contribution is to make the same approach practically usable for language tasks. The company released two versions, at 2 billion and 9 billion parameters, as open weights.

The speed advantage comes from hardware utilisation. Token-by-token generation leaves a device's GPU idle between steps. Diffusion computes the whole response in parallel, which doubles or triples utilisation.

In benchmarks reported by Ars Technica, an Apple Silicon laptop with 8 GB of memory ran the classical 9-billion-parameter Gemma 2 at around 12 tokens per second. DiffusionGemma on the same hardware can output roughly 48 tokens per second.
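As a quick check on what those two throughput figures mean in practice, the arithmetic can be written out. The 12 and 48 tokens-per-second values come from the reported benchmark; the 480-token response length is a hypothetical chosen only to make the numbers round.

```python
def speedup(baseline_tps: float, new_tps: float) -> float:
    """Ratio of new throughput to baseline throughput."""
    return new_tps / baseline_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Wall-clock seconds to emit `tokens` at a steady `tps` rate."""
    return tokens / tps


GEMMA2_TPS = 12.0           # reported figure for the 9B Gemma 2
DIFFUSIONGEMMA_TPS = 48.0   # reported figure for DiffusionGemma
RESPONSE_TOKENS = 480       # hypothetical response length

if __name__ == "__main__":
    print(speedup(GEMMA2_TPS, DIFFUSIONGEMMA_TPS))        # 4.0
    print(seconds_for(RESPONSE_TOKENS, GEMMA2_TPS))        # 40.0
    print(seconds_for(RESPONSE_TOKENS, DIFFUSIONGEMMA_TPS))  # 10.0
```

Under these assumptions the same response would arrive in 10 seconds instead of 40, which matches the "roughly four times faster" headline figure.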

General accuracy scores come in marginally below the classical model. DeepMind says DiffusionGemma scores about 3% lower than transformer Gemma 2 on MMLU and HumanEval. In return there are gains on speed, latency and energy.

For developers, the practical impact is on local agent applications. AI-powered features that run on the device no longer need a cloud round-trip, which adds a new argument to the privacy debate.

Mobile hardware makers are already engaged. Engineers at Qualcomm, Samsung and MediaTek told Ars Technica that optimisation work to run the model on phone silicon is under way.

The open-weights decision matters for competition. Against the closed models of OpenAI and Anthropic, a fast diffusion model that runs locally gives device makers and app developers a concrete lever for distribution.

Vesper covers tech news for information only. The performance figures cited come from tests reported by the publisher and will vary with hardware, drivers and workload.

This article is an AI-curated summary based on Ars Technica. The illustration is a stock photo by Daniil Komov from Pexels.
