Google Cloud - Veo Studios

Turning dreams and doodles into immersive movie trailers with Gemini’s multimodal reasoning

2025
Services
Apps + Platforms,
Artificial Intelligence,
UI / UX,
Visual Design,
Front-End Development,
Back-End Development

Veo Studios turned imagination into reality in no time. Trailer courtesy of Google Cloud.

Introduction

From the time we kicked off work for Cloud Next ’25, we knew that Veo Studios was going to make a focal point of the interplay between multimodal reasoning, Imagen and Veo, ultimately resulting in rich generative video. The demo ended up being the living embodiment of so many of the things that the Vertex suite does best.


From outside the Serra-inspired sculptural coil, attendees could view featured videos recently made inside of the experience, but they could also see through the tall, blue, gauzy walls into what we called the ‘edit bay of the future.’ Over the course of two and a half days, 1,000+ generative images were made and edited using the interpretive power of Gemini, and over 575 videos resulted from Gemini, Imagen and Veo’s collaborations. 


The experience was the product of a simple and intuitive UI that allowed people to sketch and doodle things from their imagination, whether dream images or interpretations of the world around them in the moment, and watch as Gemini recognized them. The rough doodles could be tied together and sequenced with a simple tap: the sequenced images became a shot, shots became scenes, and so on until a final video was produced in a matter of minutes.

Background

Veo Studios played at the edges of possibility, with Google's most advanced AI models working in concert. The evolution of the experience was led by one key question: how can we celebrate the creative potential of multimodal reasoning in a way that feels both magical and accessible?


The showcase aimed to prove how Gemini's interpretive prowess can bridge the gap between human creativity and the generative capabilities of AI. We wanted users to experience the thrill of seeing their creative impulses transformed into cinematic narratives that retained their original meaning and, in many cases, went beyond it.


Beyond showcasing the technical capabilities, we hoped to create a memorable hands-on experience that sparked imagination about future applications in content creation, from rapid prototyping in advertising to democratizing video production for creators without traditional technical skills. In short: a doodle that builds a Hollywood-quality trailer.

Creative Technology

Veo Studios orchestrated a symphony of models, each playing its own distinct “instrument” in the performance.


The journey began when creators provided a simple sketch or two on the touchscreen. Gemini was then able to analyze this input through sophisticated multimodal reasoning, recognizing not just what was drawn, but extrapolating how it could be output as a cinematic element. Rather than simply identifying "a cloud" or "a lighthouse," our prompt structure guided Gemini to think in terms of shot composition, emotional resonance and narrative potential—"a lighthouse stands defiant against the storm, its beam slicing through the chaos of wind and waves."
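As a rough illustration of what such a prompt structure might look like, the sketch below builds a text instruction that asks a model for a cinematic reading of a recognized doodle. The function name, the field list and the wording are all hypothetical; the actual Veo Studios prompts are not reproduced here, and no model is called.

```python
# Hypothetical prompt builder: names and wording are illustrative assumptions,
# not the prompts used in Veo Studios.
CINEMATIC_FIELDS = ("shot composition", "emotional resonance", "narrative potential")


def build_cinematic_prompt(subject: str, fields=CINEMATIC_FIELDS) -> str:
    """Return an instruction asking for a cinematic description of a doodle subject."""
    bullet_lines = "\n".join(f"- {field}" for field in fields)
    return (
        f"The user sketched: {subject}.\n"
        "Do not simply name the object. Describe it as a single cinematic shot, "
        "considering:\n"
        f"{bullet_lines}\n"
        "Answer in one vivid sentence."
    )


if __name__ == "__main__":
    # Prints the instruction text that would be sent alongside the sketch.
    print(build_cinematic_prompt("a lighthouse"))
```

Keeping the cinematic criteria in a tuple makes it easy to experiment with what the model is asked to consider without rewriting the surrounding instruction.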


This generative output was passed to Imagen 3, where the initial visual representation was rendered with remarkable fidelity to the creative intent. Gemini then provided native image editing capabilities, allowing for real-time refinements. Don’t like the shot composition? Too close-in? Want to add a stylistic layer, like a film noir aesthetic? Veo Studios has you covered.

Veo Studios edit bay

From there, the Imagen Style Transfer tool ensured visual consistency across multiple shots, maintaining cohesive aesthetics that would later make up a single, unified video.


In the scene analysis phase, Gemini examined the generative imagery and crafted detailed prompts for Veo. These weren't simple descriptions but sophisticated directorial instructions that specified camera movements, transitions and composition. The system's ability to infer narrative from static imagery was a standout; creators were inspired by the way it consistently suggested motions and progressions that enhanced the storytelling rather than animating elements arbitrarily.
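One way to picture these directorial instructions is as a small structured record per shot that gets flattened into a text prompt for the video model. The sketch below is an assumption for illustration only: the `ShotDirection` class, its fields and the output format are hypothetical and do not reflect the prompts Gemini actually produced.

```python
from dataclasses import dataclass


@dataclass
class ShotDirection:
    """Hypothetical container for one shot's directorial instructions."""

    description: str      # what the still image shows
    camera_movement: str  # e.g. "slow push-in"
    transition: str       # how this shot hands off to the next
    composition: str      # framing notes

    def to_video_prompt(self) -> str:
        """Flatten the fields into a single text prompt for a video model."""
        return (
            f"{self.description} "
            f"Camera: {self.camera_movement}. "
            f"Composition: {self.composition}. "
            f"Transition out: {self.transition}."
        )


if __name__ == "__main__":
    shot = ShotDirection(
        description="A lighthouse stands defiant against the storm.",
        camera_movement="slow push-in from sea level",
        transition="cut on the sweep of the beam",
        composition="lighthouse on the right third, waves in the foreground",
    )
    print(shot.to_video_prompt())
```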


Finally, Veo 2 transformed these static images into dynamic video, stitching multiple shots together with intelligent transitions that respected the narrative flow. The result was a finished movie trailer that maintained thematic and visual consistency from the first doodle to the final frame.
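The flow described above, from doodle to interpreted description to styled image to video prompt to clip, can be sketched as a simple chain of stages. Every function below is a hypothetical stand-in that returns a labeled string; none of them is a real Gemini, Imagen or Veo API call, and the interactive image-editing step is left out for brevity.

```python
from typing import List


# Each stage is a placeholder for a model call (assumption: one call per stage).
def interpret_doodle(doodle: str) -> str:
    """Stand-in for multimodal reasoning over a sketch."""
    return f"cinematic description of {doodle}"


def render_image(description: str) -> str:
    """Stand-in for image generation from the description."""
    return f"image[{description}]"


def apply_style(image: str, style: str) -> str:
    """Stand-in for style transfer that keeps shots visually consistent."""
    return f"{image} in {style} style"


def write_video_prompt(image: str) -> str:
    """Stand-in for scene analysis that writes directions for the video model."""
    return f"directions for {image}"


def generate_clip(image: str, prompt: str) -> str:
    """Stand-in for image-to-video generation."""
    return f"clip({image}; {prompt})"


def make_trailer(doodles: List[str], style: str) -> List[str]:
    """Run every doodle through the stages and return the ordered clips."""
    clips = []
    for doodle in doodles:
        description = interpret_doodle(doodle)
        image = apply_style(render_image(description), style)
        clips.append(generate_clip(image, write_video_prompt(image)))
    return clips


if __name__ == "__main__":
    for clip in make_trailer(["cloud", "lighthouse"], style="film noir"):
        print(clip)
```

Applying one shared `style` to every shot mirrors the role the text gives to style transfer: consistency is enforced before video generation, so the final stitch works with clips that already match.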


The technical architecture made it possible to keep the experience fluid and responsive, and most of the prompting complexity happened behind the scenes so users could focus entirely on their creative expression. Many attendees were fascinated by the seamless collaboration between these models, particularly how each maintained context awareness throughout the creative process.

Someone using Veo Studios

Results

Veo Studios captured the imagination of Next '25 attendees, many of whom returned multiple times to create new videos. The installation's ability to transform simple doodles into emotionally resonant movie trailers demonstrated the extraordinary potential of AI as a creative collaborator rather than just a tool.


The metrics tell the story: over 1,000 generative images created, 575+ completed videos, and countless moments of delight as creators watched their rough sketches evolve into cinematic worlds in front of their eyes. 


Particularly impressive was how the system maintained thematic consistency throughout the process. A simple cloud doodle might evolve into a hot air balloon journey through dawn-lit skies, with Gemini inferring the narrative potential of flight and freedom rather than just generating a literal interpretation of the sketch.


We were exceedingly proud of how we answered the call and solved the challenge of stitching multiple AI models together into a smooth, joyous creative canvas.


Veo Studios will be available as a single-screen web experience after Cloud Next ’25 to increase access for those who couldn’t attend the conference. We’re also expecting to see the booths re-homed and used in other Google properties across the world, so that Google and clients can continue to explore applications ranging from advertising production to educational storytelling. Basically, anywhere the ability to quickly visualize narrative concepts could transform traditional workflows, Veo Studios has the potential to help.


In showcasing how AI can amplify rather than replace human creativity, Veo Studios points toward a future where the boundary between imagination and realization continues to blur—where anyone with a creative vision can bring it to life, regardless of technical skill or production resources.