Google’s new AI tool Whisk uses images as prompts

Google has another AI tool to add to the pile. Whisk is a Google Labs image generator that lets you use an existing image as your prompt. But its output only captures the “essence” of your starter image rather than recreating it with new details. So, it’s better suited to brainstorming and rapid-fire visualizations than to editing the source image.

The company describes Whisk as “a new kind of creative tool.” It opens on a bare-bones screen with inputs for style and topic. This simple introductory interface lets you choose from just three predefined styles: sticker, enamel pin, and plushie. I suspect Google found that these three allow for the kind of rough-outline output to which the experimental tool is best suited in its current form.

As you can see in the image above, it produced a solid image of a Wilford Brimley plushie. (Taking photos of celebrities is prohibited by Google’s terms, but Wilford sneaked through the gate with Quaker Oats on hand without the guard noticing.)

Whisk also includes a more advanced editor (found by clicking “Start from scratch” on the main screen). In this mode, you can use text or a source image in three categories: subject, scene, and style. There’s also an input bar for adding more text for finishing touches. However, in their current form, the advanced controls didn’t produce results that matched my queries.

For example, check out my attempt to generate the late Mr. Brimley in a lightbox scene in the style of a walrus plushie image I found online.

Whisk showed something like a Wilford Brimley-esque actor eating oatmeal inside a lightbox frame. As far as I can tell, the guy is not a plushie. So, it’s clear why Google recommends using this tool for “fast visual exploration” rather than for production-ready content.

Google admits that Whisk will only draw from “some key features” of your source image. “For example, the generated subject’s height, weight, hairstyle, or skin color may be different,” the company warns.

To understand why that is, look no further than Google’s description of how Whisk works. It uses the Gemini language model to write a detailed caption of the source image you upload. It then feeds that description into the Imagen 3 image generator. So, the result is an image based on Gemini’s words about your image, not on the source image itself.
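
For the curious, that two-step flow can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Google’s code: `caption_image` and `generate_image` are hypothetical stand-ins for the Gemini captioning step and the Imagen 3 generation step, `SourceImage` is an invented container, and no real API is called.

```python
from dataclasses import dataclass


@dataclass
class SourceImage:
    """Toy stand-in for an uploaded image (hypothetical type)."""
    pixels: bytes   # raw image data; never passed to the generator
    depicts: str    # what the fake captioner below will "see"


def caption_image(image: SourceImage) -> str:
    """Hypothetical stand-in for the captioning step (Gemini in Whisk)."""
    # A real captioner would analyze the pixels; this stub just returns text.
    return f"A detailed description of {image.depicts}"


def generate_image(prompt: str) -> str:
    """Hypothetical stand-in for the text-to-image step (Imagen 3 in Whisk)."""
    # Returns a label instead of real image data.
    return f"<generated image from prompt: {prompt!r}>"


def whisk_like_pipeline(image: SourceImage, style: str) -> str:
    """Caption the source image, then generate from the caption alone."""
    caption = caption_image(image)
    prompt = f"{caption}, in the style of a {style}"
    # Only text crosses this boundary; the original pixels are left behind.
    return generate_image(prompt)


if __name__ == "__main__":
    photo = SourceImage(pixels=b"\x00\x01", depicts="an older man with a mustache")
    print(whisk_like_pipeline(photo, "plushie"))
```

Because only the caption passes from the first step to the second, two different photos that yield the same description would produce the same prompt in this sketch. That fits Google’s warning that details such as height or hairstyle may not survive the trip.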

Whisk is only available in the US, at least for now. You can try it out on the project’s Google Labs site.
