Google’s Gemini Omni turns images, audio, and text into video — and that’s just the beginning

🔥 Check out this must-read post from TechCrunch 📖

📂 **Category**: AI,Media & Entertainment,gemini omni flash,Google,google gemini omni,google io,google io 2026,Veo

✅ **What You’ll Learn**:

When Google launched Gemini three years ago, the goal was to build a large multimodal language model: a single neural network trained on text, images, audio, and video that could create content in any of those formats.

Today, at its Google I/O developer conference, the company took a concrete step toward that goal with Gemini Omni, a new family of multimodal models that Google CEO Sundar Pichai says will be able to “create anything from any input.”

Omni will start with video. Users can now combine images, audio, video, and text, and instead of just stringing these inputs together, Omni reasons across all of them to produce a coherent output. The result is high-quality video that reflects an understanding of physics, culture, history, and science.

Omni also allows users to edit images using plain text commands instead of complex editing software, similar to Google’s Nano Banana.

Google already has a dedicated video model, Veo, that lets users turn text and images into videos, and even direct and customize avatars. But Nicole Brichtova, director of product management at Google DeepMind, says today’s release is more than just an update to Veo: “It’s the next step toward progress in combining Gemini’s intelligence with the rendering capabilities of our media models.”

Koray Kavukcuoglu, DeepMind’s chief technologist, gave reporters one example during a press conference on Monday: When Omni was given a prompt as simple as “Explain protein folding with clay,” it quickly produced a stop-motion demonstration video with a voiceover saying: “Proteins start out as chains of amino acids. They fold into patterns like alpha helices and flat parts called beta sheets, forming a perfect 3D shape.”

The long-term vision for Omni is broader and includes using the model to do things like create images from audio, or audio from video.

“When we first announced Gemini, our first AI model was natively multimodal,” Pichai said during the press conference. “We knew that training it on a combination of text, code, audio, images and video would give it a deeper understanding of the world. With world models, AI goes from predicting text to simulating reality. Gemini Omni is the next step in this direction.”

As part of the release, users will also be able to create videos using their digital avatars, something OpenAI popularized with Cameos in its now-defunct Sora app. To prevent deepfakes, users have to go through a dedicated process, which involves registering themselves and speaking a series of numbers aloud, according to Brichtova. The avatar is then stored for future use.

Additionally, all videos created with Omni will include Google’s SynthID digital watermark, which allows users to verify whether videos were created via Gemini products.

The first model in the family is Gemini Omni Flash, which launches today in the Gemini app, YouTube Shorts, and the AI creative studio Flow. Flash will be able to generate 10 seconds of video, which Brichtova says is not a limitation of the model, but rather a decision based on the desire to get it into more hands and the expectation that most users won’t want to create much longer videos yet. Longer video durations are in the works, however.

Google appears to be promoting Omni Flash as more of a consumer tool. The examples Brichtova and Gabi Barth-Maron, a research engineer at DeepMind, gave on a call with TechCrunch about the uses of digital avatars were all personal: creating a video of yourself winning an award or going to the moon, or removing a bystander from the background of a video you took while on vacation.

Barth-Maron put it more simply: “It’s like personal memes.”

“We definitely focused on making this easy to use for consumers,” Brichtova said. “Not many video models have been able to bridge that gap with consumers, so this is our play to do that.”

Ease of use comes with a caveat: Brichtova and Barth-Maron point out that editing prompts need to be very specific, otherwise Omni risks over-editing or inadvertently changing elements the user wants to keep, a problem Nano Banana users may have encountered.

Despite the near-term consumer focus, Omni’s enterprise and creative implications are clear, and Google will make Omni available via an application programming interface (API) in the coming weeks. The avatar creation tool, available today in Shorts, is something Google expects content creators to pick up. But on a larger scale, end-to-end multimedia workflows could be transformative for advertisers and filmmakers.

Startup Luma AI is building something similar: an agentic tool, powered by its “unified” model, that can create an entire ad campaign from a short brief and a product image.

“We’re actually very proud of the model’s text rendering capabilities, which is really useful for things like ads,” Brichtova said. “If you want a product out there, or even just a logo, it has to be precise… We definitely expect filmmakers and other creatives to use this model as well.”

More professional use cases may be better served by the Omni Pro model, which should perform better across all Omni tasks. Google hasn’t yet said when it will launch Pro, but Brichtova said it will happen when “we feel like we’re at a point where we have a step change over Flash.”

Follow the rest of the important news for Google IO 2026

Google Search as you know it is over

Google updates Gemini to take on ChatGPT and Claude

Google offers Gemini Spark, a 24/7 proxy assistant with Gmail integration

How to use Google’s new information agents



🕒 **Posted on**: 1779252450
