What is DiffusionGemma? Google DeepMind's open model that runs local AI 4x faster

Google DeepMind released the latest member of its open Gemma series this week. According to user tests reported by Ars Technica, DiffusionGemma runs roughly four times faster on laptop and mobile hardware than a conventional transformer of the same size.
The difference is architectural. Transformer models, the standard for chat and text generation over the past three years, produce output one token at a time. Diffusion models shape and refine the entire response in parallel.
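The contrast in generation order can be illustrated with a toy sketch. This is not DeepMind's algorithm: the "model" here is an oracle that already knows the target text, and the function names, the mask symbol and the step count are assumptions made for illustration only. Real diffusion language models typically choose which positions to refine by confidence rather than in fixed chunks. The sketch only counts how many model passes each approach needs.

```python
import math

MASK = "_"  # hypothetical placeholder for a not-yet-generated position


def generate_autoregressive(target):
    """Emit one token per model pass, left to right."""
    out, passes = [], 0
    for token in target:
        out.append(token)  # a real model would predict this token
        passes += 1
    return out, passes


def generate_diffusion_style(target, steps):
    """Start fully masked and fill many positions per pass.

    Each pass updates a whole chunk of positions at once, so the number
    of passes depends on `steps`, not on the length of the response.
    """
    n = len(target)
    out = [MASK] * n
    passes = 0
    per_step = math.ceil(n / steps)
    for step in range(steps):
        if MASK not in out:
            break
        start = step * per_step
        for i in range(start, min(start + per_step, n)):
            out[i] = target[i]  # all positions in the chunk change in one pass
        passes += 1
    return out, passes


words = "the quick brown fox jumps over the lazy dog".split()
ar_text, ar_passes = generate_autoregressive(words)
df_text, df_passes = generate_diffusion_style(words, steps=3)
print(ar_passes, df_passes)
```

In this toy, both routes produce the same nine words, but the sequential route needs one pass per word while the parallel route needs only as many passes as refinement steps.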
Diffusion has long been the established architecture for image generation. DeepMind's contribution is to make the same approach practical for language tasks. The company released two versions, at 2 billion and 9 billion parameters, as open weights.
The speed advantage comes from hardware utilisation. Token-by-token generation leaves a device's GPU idle between steps. Diffusion treats the whole response as one parallel computation, which doubles or triples utilisation.
In benchmarks reported by Ars Technica, an Apple Silicon laptop with 8 GB of memory ran the classical 9-billion-parameter Gemma 2 at around 12 tokens per second. DiffusionGemma on the same hardware can output roughly 48 tokens per second.
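The arithmetic behind the headline figure is simple. The short sketch below uses the two throughput numbers reported above; the 600-token response length is a hypothetical value chosen only to show what the gap means in waiting time.

```python
# Throughput figures as reported in the article (tokens per second).
transformer_tps = 12.0
diffusion_tps = 48.0

# Hypothetical response length, chosen for illustration only.
response_tokens = 600

speedup = diffusion_tps / transformer_tps
transformer_seconds = response_tokens / transformer_tps
diffusion_seconds = response_tokens / diffusion_tps

print(f"speedup: {speedup:.1f}x")
print(f"transformer: {transformer_seconds:.1f} s, diffusion: {diffusion_seconds:.1f} s")
```

On those assumptions, a reply that takes 50 seconds on the conventional model would take 12.5 seconds on DiffusionGemma.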
General accuracy scores come in marginally below the classical model. DeepMind says DiffusionGemma scores about 3% lower than transformer Gemma 2 on MMLU and HumanEval. In return, it gains on speed, latency and energy use.
For developers, the practical impact is on local agent applications. AI-powered features running on the device no longer need a cloud round-trip, which adds a new dimension to the privacy debate.
Mobile hardware makers are already engaged. Engineers at Qualcomm, Samsung and MediaTek told Ars Technica that optimisation work to run the model on phone silicon is under way.
The open-weights decision matters for competition. Against the closed models of OpenAI and Anthropic, a fast diffusion model that runs locally gives device makers and app developers a concrete lever for distribution.
Vesper covers tech news for information only. The performance figures cited come from published third-party tests and will vary with hardware, drivers and workload.