Google's new Gemini Omni model: an AI that turns any input into any output

The Verge, 1 d ago

Aerial view of Mountain View in Silicon Valley in daylight. — Photo: Zetong Li / Pexels

Google's AI research unit has unveiled its new 'Gemini Omni' model. According to an extensive hands-on review published in The Verge, the model is positioned as a universal multimodal AI capable of converting any input (text, image, video, audio, code) into any output.

In The Verge's test, the model produced a short video from a photograph, a music composition from a text prompt, and a 3D model from a scene description spoken in an audio recording. The Verge writer Alex Heath commented: 'Gemini Omni's capability is broad enough to eliminate the importance of tracking which model can do what.'

Google DeepMind's launch blog post describes Gemini Omni's 'universal multimodal' structure as merging the functions of Imagen 4, Veo 3.5, and Lyria 2, which had previously been developed as separate models, into a single model. Google DeepMind chief Demis Hassabis described the model as 'the most comprehensive product of our AI research so far.'

According to the limited technical information Google has published, Gemini Omni uses an architecture called a 'unified representation space'. This architecture represents different kinds of input in a shared vector space, which allows them to be converted directly into one another.
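The idea of a shared vector space can be illustrated with a toy sketch. This is not Google's architecture, whose details have not been published. The projection matrices, feature values, and function names below are all hypothetical, chosen only to show how inputs of different kinds and sizes might be mapped into one space where they become directly comparable.

```python
import math

SHARED_DIM = 3  # size of the hypothetical shared space (assumption)

# Hypothetical per-modality projections: each row yields one shared dimension.
# Text features have length 2, audio features have length 4 (made-up sizes).
TEXT_PROJ = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]
AUDIO_PROJ = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.25, 0.25, 0.25, 0.25],
]


def project(matrix, features):
    """Map modality-specific features into the shared space (matrix-vector product)."""
    if any(len(row) != len(features) for row in matrix):
        raise ValueError("feature length does not match projection matrix")
    return [sum(w * x for w, x in zip(row, features)) for row in matrix]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two inputs of different kinds end up as vectors of the same length.
text_vec = project(TEXT_PROJ, [1.0, 0.0])
audio_vec = project(AUDIO_PROJ, [1.0, 1.0, 0.0, 0.0])
similarity = cosine(text_vec, audio_vec)
```

Once both inputs live in the same space, a single similarity measure applies to any pair of modalities. A real system would learn the projections from data and would add decoders that turn shared vectors back into text, images, or audio. This sketch covers only the shared representation step.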

A notable point in The Verge's test was that the model also produced high-quality results for synthetic content (deepfake) generation. Heath wrote that he had created a professionally realistic fake video from a photograph of himself: 'The result could fool any viewer; if I saw my own face saying something like this, I would believe it.'

In its safety policy announcement, Google said the model would ship with SynthID provenance markers for visual and audio content. SynthID is a watermarking system that embeds a digital signature in images or audio recordings, which content-scanning algorithms can then detect. Google said it had also applied a filter that blocks prompts requesting the face or voice of political figures.

Margrethe Vestager, a spokesperson for the EU AI Office in Brussels, said after Google's announcement: 'The release of multimodal AI systems requires assessment under the EU AI Act framework as a high-risk classification; we will be examining Google's safety measures through our review process.'

Concerns in the research community include the generative model's potential for misuse, copyright issues, and energy consumption. Fei-Fei Li, director of Stanford University's Institute for Human-Centered AI, wrote in a post on X: 'As the capability boundary of multimodal systems expands, control mechanisms must develop in parallel.'

Developers will get limited access to Gemini Omni through Google AI Studio from 1 June. General public release has been announced for September 2026. Integration of the model into Google Search is planned for the Gemini 4.0 release.

In terms of market positioning, Google's move is a significant competitive counter to OpenAI's GPT-5 and Anthropic's Claude Opus 4.7. Bloomberg Intelligence analyst Mandeep Singh said: 'With its unified multimodal models, Google is strengthening its position in the enterprise market against the Microsoft-OpenAI partnership.'

This article provides general information; given the pace of AI development, details may change over time.

This article is an AI-curated summary based on The Verge. The illustration is a stock photo by Zetong Li from Pexels.