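I built marshmallow castles in Google’s new AI world generator

TechCrunch

About 1 month ago

AI-generated summary

Google DeepMind has released Project Genie, an experimental AI tool that lets users create interactive game worlds from text or image prompts, with the aim of gathering feedback and training data for future world model development.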


Google DeepMind is opening up access to Project Genie, its AI tool for creating interactive game worlds from text prompts or images.

Starting Thursday, Google AI Ultra subscribers in the U.S. can play around with the experimental research prototype, which is powered by a combination of Google’s latest world model Genie 3, its image generation model Nano Banana Pro, and Gemini.

Coming five months after Genie 3’s research preview, the move is part of a broader push to gather user feedback and training data as DeepMind races to develop more capable world models.

World models are AI systems that generate an internal representation of an environment, and can be used to predict future outcomes and plan actions. Many AI leaders, including those at DeepMind, believe world models are a crucial step to achieving artificial general intelligence (AGI). But in the nearer term, labs like DeepMind envision a go-to-market plan that starts with video games and other forms of entertainment and branches out into training embodied agents (aka robots) in simulation.

DeepMind’s release of Project Genie comes as the world model race is beginning to heat up. Fei-Fei Li’s World Labs late last year released its first commercial product called Marble. Runway, the AI video generation startup, has also launched a world model recently. And former Meta chief scientist Yann LeCun’s startup AMI Labs will also focus on developing world models.

“I think it’s exciting to be in a place where we can have more people access it and give us feedback,” Shlomi Fruchter, a research director at DeepMind, told TechCrunch via video interview, smiling ear-to-ear in clear excitement over Project Genie’s release.

DeepMind researchers that TechCrunch spoke to were upfront about the tool’s experimental nature. It can be inconsistent, sometimes impressively generating playable worlds, other times producing baffling results that miss the mark. Here’s how it works.

You start with a “world sketch” by providing text prompts for both the environment and a main character, whom you will later be able to maneuver through the world in either first- or third-person view. Nano Banana Pro creates an image based on the prompts that you can, in theory, modify before Genie uses the image as a jumping-off point for an interactive world. The modifications mostly worked, but the model occasionally stumbled and would give you purple hair when you asked for green.

You can also use real-life photos as a baseline for the model to build a world on, which, again, was hit or miss. (More on that later.)

Once you’re satisfied with the image, it takes a few seconds for Project Genie to create an explorable world. You can also remix existing worlds into new interpretations by building on top of their prompts, or explore curated worlds in the gallery or via the randomizer tool for inspiration. You can then download videos of the world you just explored.

DeepMind is only granting 60 seconds of world generation and navigation at the moment, in part due to budget and compute constraints. Because Genie 3 is an auto-regressive model, it takes a lot of dedicated compute, which puts a tight ceiling on how much DeepMind is able to provide to users.

“The reason we limit it to 60 seconds is because we wanted to bring it to more users,” Fruchter said. “Basically when you’re using it, there’s a chip somewhere that’s only yours and it’s being dedicated to your session.”

He added that extending it beyond 60 seconds would diminish the incremental value of the testing.

“The environments are interesting, but at some point, because of their level of interaction and the dynamism of the environment is somewhat limited. Still, we see that as a limitation we hope to improve on.”

Whimsy works, realism doesn’t

When I used the model, the safety guardrails were already up and running. I couldn’t generate anything resembling nudity, nor could I generate worlds that even remotely sniffed of Disney or other copyrighted material. (In December, Disney hit Google with a cease-and-desist, accusing the firm’s AI models of copyright infringement by training on Disney’s characters and IP and generating unauthorized content, among other things.) I couldn’t even get Genie to generate worlds of mermaids exploring underwater fantasy lands or ice queens in their wintery castles.

Still, the demo was deeply impressive. The first world I built was an attempt to live out a small childhood fantasy, in which I could explore a castle in the clouds made up of marshmallows with a chocolate sauce river and trees made of candy. (Yes, I was a chubby kid.) I asked the model to do it in claymation style, and it delivered a whimsical world that childhood me would have eaten up, the castle’s pastel-and-white colored spires and turrets looking puffy and tasty enough to rip off a chunk and dunk it into the chocolate moat. (Video above.)

That said, Project Genie still has some kinks to work out.

The models excelled at creating worlds based on artistic prompts, like watercolors, anime style or classic cartoon aesthetics. But they tended to fail when it came to photorealistic or cinematic worlds, which often came out looking like a video game rather than real people in a real setting.

It also didn’t always respond well when given real photos to work with. When I gave it a photo of my office and asked it to create a world based on the photo exactly as it was, it gave me a world that had some of the same furnishings as my office (a wooden desk, plants, a grey couch), but laid out differently. And it looked sterile, digital, not lifelike.

When I fed it a photo of my desk with a stuffed toy, Project Genie animated the toy navigating the space, and even had other objects occasionally react as it moved past them.

That interactivity is something DeepMind is working on improving. There were several occasions when my characters walked right through walls or other solid objects.

When DeepMind initially released Genie 3, researchers highlighted how the model’s auto-regressive architecture meant that it could remember what it had generated. I wanted to test that by returning to parts of the environment it had already generated to see if they would be the same. For the most part, the model succeeded. In one case, I generated a cat exploring yet another desk, and only once, when I turned back to the right side of the desk, did the model generate a second mug.

The part I found most frustrating was the way you navigated the space: using the arrow keys to look around, the spacebar to jump or ascend, and the W-A-S-D keys to move. I’m not a gamer, so this didn’t come naturally to me, but the keys were also often non-responsive, or they sent you in the wrong direction. Trying to walk from one side of the room to a doorway on the other side often became a chaotic zigzagging exercise, like trying to steer a shopping cart with a broken wheel.

Fruchter assured me that his team was aware of these shortcomings, reminding me again that Project Genie is an experimental prototype. In the future, he said, the team hopes to enhance the realism and improve interaction capabilities, including giving users more control over actions and environments.

“We don’t think about [Project Genie] as an end-to-end product that people can go back to every day, but we think there is already a glimpse of something that’s interesting and unique and can’t be done in another way,” he said.

Senior Reporter

Rebecca Bellan is a senior reporter at TechCrunch where she covers the business, policy, and emerging trends shaping artificial intelligence. Her work has also appeared in Forbes, Bloomberg, The Atlantic, The Daily Beast, and other publications.

You can contact or verify outreach from Rebecca by emailing [email protected] or via encrypted message at rebeccabellan.491 on Signal.
