Gemini Omni: How to Actually Use Google’s New Video Model

Gemini Omni just dropped at I/O 2026. Here's a hands-on guide to using Omni Flash, working within the 10-second cap, and setting up the avatar step that confuses most users.

Riley Brooks · 2026-05-20 · 7 min read · Beginner

It’s Tuesday night. You scroll through your feed and half the people you follow are posting weird 10-second clips of themselves turning into mirror-people, or claymation explainers of protein folding. That’s Gemini Omni, and Google dropped it about 36 hours ago at I/O 2026. The hype is real, but the actual experience of using it is less polished than the demos suggest – so here’s what I figured out after spending an evening with it.

The scenario: you have one weekend and a Google AI Plus account

Say you’ve got a 6-second selfie video of yourself walking through a parking lot. You want to turn it into something interesting – yourself walking on Mars, or yourself as a claymation figure, or whatever. Before Omni, this took either real editing skills or a clumsy chain of three different AI tools. Now it’s one chat window.

The catch: you have to know exactly where to click and what the model will (and won’t) do. The blog post Google published makes it sound effortless. It isn’t quite.

What Gemini Omni actually is (in one paragraph)

Announced May 19, 2026 by DeepMind CTO Koray Kavukcuoglu, Omni Flash takes mixed inputs – images, audio, video, text, all at once – and generates video grounded in Gemini’s real-world knowledge, then lets you refine through conversation. Not a Veo rebrand: Omni and Veo are now separate model surfaces, with Omni sitting under Gemini while Veo continues as Google’s dedicated video line. That split matters for where you find it and what it costs.

That’s the marketing. The interesting parts are buried.

Getting access without wasting an hour

Three ways in. The Gemini app is the fastest – available to Google AI Plus, Pro, and Ultra subscribers globally as of May 19, 2026. Google Flow works better for multi-shot projects and needs the same subscription. YouTube Shorts and the YouTube Create App are rolling out at no cost this week, so if you don’t have a paid plan, that’s your route.

API access? Not yet. Google says developers and enterprise customers get it “in the coming weeks” – that’s a direct quote from the launch post, with no specific date. If you’re a dev planning a product around Omni, you’re waiting.

The avatar setup nobody explains

This is the part where most tutorials hand-wave and you get stuck staring at a recording screen. To use Omni’s “you in the video” feature, you have to build a personal avatar first. The onboarding requires recording yourself and speaking a series of numbers aloud – confirmed by Nicole Brichtova, Google DeepMind’s product director for Omni, in a TechCrunch interview. The avatar gets stored for reuse, which is Google’s anti-deepfake mechanism: the model can only generate your likeness after you’ve actively enrolled it.

If you skip this step and try to generate yourself doing something, Omni produces a generic person instead of you. People keep running into this in early Reddit threads, and the answer is always the same: you didn’t enroll your avatar.

Pro tip: Do the avatar enrollment in a quiet room with even lighting on your face. Reading numbers aloud sounds silly, but the cadence helps the model lock your voice. Once it’s saved, you don’t redo it per project.

Actually generating something

Open the Gemini app and look for the Videos or Create video tab in the side nav. Then:

1. Drop in your inputs. One video maximum per project – that’s a hard constraint, not a UI bug. You can pair that single video with several still images if you need more visual material. This trips up most people on day one.
2. Write the prompt as an instruction, not a description. “Replace the parking lot with a Martian valley, keep my walk cycle, dusty red light” works better than “a cool Mars video.”
3. Wait. Clip generation isn’t instant – and it costs quota.
4. Refine through chat. Each instruction builds on the last: characters stay consistent, the physics hold, the scene remembers context. That continuity is what separates this from older generation tools.
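If you like to sanity-check a project before spending quota, the constraints above fit in a small local checklist. This is only a sketch: it does not call Omni (there is no public API yet), and every name in it (ProjectDraft, preflight, the constants) is made up for illustration. The limits are the ones described in this article and may change.

```python
from dataclasses import dataclass, field

# Limits as described in this article; treat them as assumptions that may change.
MAX_VIDEOS = 1         # one video per project
MAX_CLIP_SECONDS = 10  # current output cap


@dataclass
class ProjectDraft:
    """Hypothetical description of what you plan to drop into the chat window."""
    prompt: str
    videos: list[str] = field(default_factory=list)
    images: list[str] = field(default_factory=list)
    uses_own_likeness: bool = False
    avatar_enrolled: bool = False
    target_seconds: int = 10


def preflight(draft: ProjectDraft) -> list[str]:
    """Return a list of problems to fix before generating (empty means ready)."""
    problems = []
    if len(draft.videos) > MAX_VIDEOS:
        problems.append(
            f"{len(draft.videos)} videos attached; keep {MAX_VIDEOS} and use still images for the rest"
        )
    if draft.target_seconds > MAX_CLIP_SECONDS:
        problems.append(
            f"asked for {draft.target_seconds}s; clips are capped at {MAX_CLIP_SECONDS}s for now"
        )
    if draft.uses_own_likeness and not draft.avatar_enrolled:
        problems.append("avatar not enrolled; the result will show a generic person")
    return problems


draft = ProjectDraft(
    prompt="Replace the parking lot with a Martian valley, keep my walk cycle, dusty red light",
    videos=["walk.mp4", "extra_angle.mp4"],
    images=["mars_reference.jpg"],
    uses_own_likeness=True,
)
for problem in preflight(draft):
    print("Fix first:", problem)
```

Run on the draft above, the check flags the second video and the missing avatar enrollment, which are the two mistakes most likely to cost you a wasted generation.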

Total time for a usable 10-second clip on my first try: about 8 minutes, including two re-prompts. Not bad.

The honest limitations

The 10-second wall. Turns out this isn’t a hard model limit – it’s a deployment decision. Brichtova described it to TechCrunch as a way to widen access while compute demand is high, betting that most users don’t yet need longer clips. Could change overnight. Don’t design a content series around 10 seconds being permanent, but don’t expect 60-second clips next week either.

The quota burn. Per early hands-on reports, one generation eats a meaningful slice of your daily allowance. Google hasn’t published exact numbers for Omni-specific limits as of May 20, 2026 – you’ll find your ceiling by hitting it. Plan accordingly and don’t burn through your quota on test runs before you have a real prompt ready.
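Since the ceiling is something you discover rather than look up, it helps to keep a tally. The sketch below is a hypothetical local log, not anything Google provides: you record each generation yourself and note the day you get cut off. It assumes every generation costs the same, which may not be true.

```python
from collections import Counter
from datetime import date
from typing import Optional


class GenerationLog:
    """Hand-maintained tally of generations per day (illustrative only)."""

    def __init__(self) -> None:
        self._runs = Counter()   # day -> generations completed
        self._limit_hit = {}     # day -> runs completed when the limit appeared

    def record_run(self, day: date) -> None:
        self._runs[day] += 1

    def record_limit_hit(self, day: date) -> None:
        # Keep only the first time the limit showed up that day.
        self._limit_hit.setdefault(day, self._runs[day])

    def observed_ceiling(self) -> Optional[int]:
        """Lowest run count at which a limit was seen, or None if never hit."""
        if not self._limit_hit:
            return None
        return min(self._limit_hit.values())


log = GenerationLog()
today = date(2026, 5, 20)
for _ in range(3):
    log.record_run(today)
log.record_limit_hit(today)
print("Observed ceiling:", log.observed_ceiling())
```

Once you have a number, spend test runs only up to a fraction of it and save the rest for prompts you actually care about.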

Audio editing isn’t really shipping. This one matters most. Google’s launch post states plainly that, beyond the avatar feature, editing audio and speech in existing videos is something it is “still working to test” so it can “better understand how we can bring this capability to users responsibly.” So when you see a demo of someone changing what a person in a video says – you can’t do that yet. Only your own avatar voice works, because you enrolled it.

Where this goes next

A higher-end Omni Pro model is planned. No release date. Brichtova said it arrives when Google sees a “step change above Flash” – which is at least honest about the vagueness. Meanwhile, Decrypt reported Google DeepMind CEO Demis Hassabis calling Omni “our new model that can create anything from any input” and describing it as a step toward AGI. Google’s launch post also notes plans to expand Omni’s output modalities to include image and audio over time.

So the Omni you’re using this weekend is the smallest, most cautious version of what’s coming. That’s either reassuring or frustrating depending on what you wanted from it.

FAQ

Is Gemini Omni free?

Only on YouTube Shorts and the YouTube Create App. The Gemini app and Google Flow both require a paid Google AI Plus, Pro, or Ultra subscription.

Can I download my generated video without the SynthID watermark?

No. Every Omni-generated clip carries a SynthID digital watermark, verifiable through the Gemini app. It’s invisible to viewers but always present – per Google’s official launch post. Given where the deepfake conversation is heading, that’s probably the right call on Google’s part, though it does mean Omni clips can always be identified as AI-generated.

Why does my generated person look generic instead of like me?

You skipped avatar enrollment – the step where you record yourself speaking a number sequence in the Gemini app settings. Here’s the fix: go to settings, find avatar setup, record yourself speaking the number sequence, and save it. Then re-run your original prompt, this time explicitly referencing yourself. Omni will pull the stored avatar, and new clips will look like you. The enrollment only has to happen once; it carries across all future projects.

Next move: Open the Gemini app right now and do the avatar enrollment before you generate anything else. You’ll thank yourself when you actually want to put your face in a clip and don’t have to interrupt your creative flow to set it up.