Gemini Omni is here to change everything you knew about Google AI
This new multimodal model can receive text, images, audio and video at the same time and generate content with unprecedented quality
Google did not come to I/O 2026 to play around. The technology giant presented one of the most ambitious launches in its recent history, and as we have been reporting at La Opinión, the company is building an artificial intelligence ecosystem that looks nothing like what we knew just a year ago.
In the midst of a cascade of announcements, there was one that stole all the attention from the first moment it appeared on stage: Gemini Omni, the new model that promises to completely redefine how users interact with Google's AI.
And it is not an exaggeration. This is something genuinely different.
What makes Gemini Omni different from everything before it
To understand why Gemini Omni matters so much, you have to understand the problem that existed before. Generative video models, like Google's own Veo, worked on a “text in, video out” logic: you typed a description, and the model tried to create it. Useful, yes, but limited compared to what's coming now.
Gemini Omni is what Google calls a “natively multimodal” model, which means it can receive text, images, audio, and video clips simultaneously to generate much richer and more accurate content. This is not about choosing one type of input or another, but about mixing all of them within a single prompt to obtain a result that no previous model could produce with the same cohesion. Sundar Pichai himself summed it up during the event: “Gemini Omni is our new model capable of generating samples in any output mode from any input data.”
Plus, the model comes bundled with all of Gemini's knowledge and reasoning, meaning it doesn't just “generate pretty pictures.” It understands the context, reasons about it, and then produces the content. That's a huge leap from generative video tools that simply interpret keywords.
And the icing on the cake is that Google confirmed that Gemini Omni will replace Veo within the Gemini application. Veo's era as the company's flagship video model is over, and its successor is considerably more powerful.
Create, edit and clone yourself: the three superpowers of Gemini Omni
Once you dive into what Gemini Omni can do, the model reveals itself to be a kind of audiovisual production studio tucked inside an app. Its capabilities are grouped into three large areas that, together, change the creative experience radically.
The first is multimodal video generation. You can combine written instructions, reference photos, music, and previous clips to build entire scenes from scratch. The result is much more realistic than what previous generations produced, with particular precision in elements that have historically been the Achilles heel of generative AI, such as signs, subtitles, and people typing text on the screen.
The second is advanced editing of existing videos, which is perhaps the most disruptive use case. Gemini Omni doesn't just create new content; it can also take a recording you made with your cell phone and modify it in depth. The possibilities range from changing the camera angle to generating new characters, altering the sequence of scenes, or adding details that completely transform the visual narrative. We are talking about professional video editing in natural language, without having to touch traditional editing software.
The third is the Avatar feature, which is where the model gets truly futuristic. With just a text prompt, Gemini Omni can generate videos using the user's voice, appearance, and style without the user having to record themselves in front of a camera. This feature is launching this week for YouTube Shorts users, turning anyone with an account into a potential video content creator: no camera, no set, no manual editing.
Google Gemini Omni availability
Initial access is not universal, but it is not reserved for an unattainable elite either. Google has already enabled Gemini Omni Flash, the first model in the Omni family, in the Gemini app, Google Flow, and YouTube Shorts. Full access in the Gemini app is available to users on the Google AI Plus, Pro, and Ultra plans, provided they are over 18 years of age.
However, Google has made it clear that it needs to bring this to as many people as possible, so some Omni features, especially those related to creating Shorts, will arrive for free on YouTube in the coming months. Developers and companies will also get access through APIs that will open in the coming weeks, creating a huge range of possibilities for integrating this technology into external products.
What you have to keep in mind is that Gemini Omni is just getting started. For now the focus is on video generation and editing, but Google announced that the model will later be able to create images and audio with the same multimodal logic. In other words, what we saw at Google I/O 2026 is just the initial version of something that will steadily grow in capabilities.
The big question is no longer whether generative AI is going to transform the way people create content. That's already happening. The question now is how quickly Gemini Omni will get into the hands of everyday users, and how long it will take for creating a quality video without technical experience to become as normal as posting a photo on Instagram. Based on what Google showed this week, that moment isn't far away.

