Efficient image generation AI: Stable Cascade

MLBoy
2 min read · Feb 16, 2024

What is efficient?

Devices that everyone in the future will have

Stable Cascade reportedly needs less training time and less inference time than Stable Diffusion, apparently thanks to a more efficient architecture.

Furthermore, our compact latent representations allows us to perform inference over twice as fast, slashing the usual costs and carbon footprint of a state-of-the-art (SOTA) diffusion model significantly, without compromising the end performance.

[slashing the usual costs and carbon footprint] That sounds like 2024.
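To make "compact latent representations" a little more concrete, here is a rough back-of-the-envelope sketch. The compression factors are assumptions that do not come from this article: 42 is the figure given in the Stable Cascade model card (a 1024x1024 image becomes roughly 24x24 in latent space), and 8 is the usual factor for Stable Diffusion's VAE. The sketch only counts spatial positions and ignores channel counts, so treat the ratio as an illustration, not a speed prediction.

```python
# Assumed spatial compression factors (not stated in this article):
# 42 per the Stable Cascade model card, 8 for Stable Diffusion's VAE.
STABLE_CASCADE_FACTOR = 42
STABLE_DIFFUSION_FACTOR = 8


def latent_side(image_side: int, compression_factor: int) -> int:
    """Side length of the latent grid for a square image, rounded down."""
    return image_side // compression_factor


def latent_positions(image_side: int, compression_factor: int) -> int:
    """Number of spatial positions in the latent grid."""
    side = latent_side(image_side, compression_factor)
    return side * side


cascade = latent_positions(1024, STABLE_CASCADE_FACTOR)
diffusion = latent_positions(1024, STABLE_DIFFUSION_FACTOR)

print(f"Stable Cascade latent positions:   {cascade}")
print(f"Stable Diffusion latent positions: {diffusion}")
print(f"Ratio: {diffusion / cascade:.1f}x fewer positions to denoise")
```

Fewer latent positions means less work per denoising step, which is one plausible reading of where the claimed speedup comes from.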

How to use

Install

A sample notebook is also available in the GitHub repository, but it didn't work for me, so I installed Stable Cascade support from a fork of diffusers, pinned to a specific commit:

pip install git+https://github.com/kashif/diffusers.git@a3dc21385b7386beb3dab3a9845962ede6765887

Hugging Face listed the command below, but it was broken when I tried it, so I went through past commits to find the working one above.

pip install git+https://github.com/kashif/diffusers.git@wuerstchen-v3
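Since one of the install commands turned out to be broken, it may help to confirm that the installed diffusers build actually exposes the two pipeline classes before loading any weights. The following is a minimal sketch; the helper name `missing_pipelines` is made up for illustration, and the only assumption about diffusers is that the classes are importable from the top-level package, as in the script in the Run section.

```python
import importlib

# The two classes the script in the Run section imports from diffusers.
REQUIRED = ("StableCascadePriorPipeline", "StableCascadeDecoderPipeline")


def missing_pipelines(module_name="diffusers", names=REQUIRED):
    """Return the names from `names` that cannot be loaded from the module.

    If the module itself cannot be imported, every name counts as missing.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception:
        return list(names)

    missing = []
    for name in names:
        try:
            getattr(module, name)
        except Exception:
            # Covers a plain missing attribute as well as lazy-import failures.
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_pipelines()
    if missing:
        print("This diffusers install lacks:", ", ".join(missing))
    else:
        print("Stable Cascade pipelines are available.")
```

If the check reports missing classes, the install most likely picked up a build without Stable Cascade support.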

Run

import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

device = "cuda"
num_images_per_prompt = 2

# Load the prior (text -> image embeddings) and the decoder (embeddings -> images).
prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16).to(device)
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", torch_dtype=torch.float16).to(device)

prompt = "Clothes that make you popular"
negative_prompt = "illustration"

# Stage 1: generate image embeddings from the prompt.
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=4.0,
    num_images_per_prompt=num_images_per_prompt,
    num_inference_steps=20
)

# Stage 2: decode the embeddings into images.
# The prior runs in bfloat16, so cast its output to float16 for the decoder.
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings.half(),
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=0.0,
    output_type="pil",
    num_inference_steps=10
).images

# Now decoder_output is a list with your PIL images
for i, image in enumerate(decoder_output):
    image.save(f"{i}.jpg")
Clothes that make you popular
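To check the speed claim on your own hardware, you could time the prior and decoder calls separately. Below is a minimal sketch of a timing helper; `stopwatch` and `timings` are hypothetical names, and the workload here is a stand-in so the snippet runs without a GPU.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label, timings):
    """Record the wall-clock seconds spent inside the `with` block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


timings = {}

# Stand-in workload; in the script above you would wrap the
# prior(...) call and the decoder(...) call in the same way.
with stopwatch("stand-in", timings):
    total = sum(i * i for i in range(100_000))

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

GPU work can be queued asynchronously, so when timing the prior stage it is probably safer to call torch.cuda.synchronize() just before leaving the `with` block. The first call also tends to include one-time warm-up cost, so a second run is usually more representative.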

🐣

I’m a freelance engineer.
For work inquiries, please feel free to contact me with a brief description of what you want to build.
rockyshikoku@gmail.com

I am creating applications using machine learning and AR technology.

I share information about machine learning and AR.
