Efficient image generation AI StableCascade

Feb 16, 2024

What is efficient?

Devices that everyone in the future will have

GitHub - Stability-AI/StableCascade

github.com

Stable Cascade reportedly needs less training time and less inference time than Stable Diffusion.
The savings appear to come from the efficient architecture it is built on, Würstchen.

Würstchen: An Efficient Architecture for Large-Scale Text-to-Image...

We introduce Würstchen, a novel architecture for text-to-image synthesis that combines competitive performance with…

openreview.net

Furthermore, our compact latent representations allow us to perform inference over twice as fast, slashing the usual costs and carbon footprint of a state-of-the-art (SOTA) diffusion model significantly, without compromising the end performance.

"Slashing the usual costs and carbon footprint": that sounds very 2024.
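
To get a feel for what "compact latent representations" means, here is a small arithmetic sketch. The compression factors are assumptions, not numbers from this post: 8 for a Stable Diffusion style VAE, and about 42 (42.67 in the code) for Stable Cascade's most compressed stage, as I understand the repository README. The helper name latent_side is made up for illustration.

```python
import math


def latent_side(pixels, compression):
    """Side length of the latent grid for an image side of `pixels`."""
    return math.ceil(pixels / compression)


# Assumed compression factors (not taken from this post):
#   8     -> Stable Diffusion style VAE
#   42.67 -> Stable Cascade's most compressed stage
sd_side = latent_side(1024, 8)
cascade_side = latent_side(1024, 42.67)

# Number of spatial positions the diffusion model has to process
sd_positions = sd_side ** 2
cascade_positions = cascade_side ** 2
ratio = sd_positions / cascade_positions

print(f"Stable Diffusion style: {sd_side}x{sd_side} = {sd_positions} positions")
print(f"Stable Cascade style:   {cascade_side}x{cascade_side} = {cascade_positions} positions")
print(f"Roughly {ratio:.1f}x fewer positions")
```

Under these assumptions a 1024x1024 image becomes a 24x24 grid instead of 128x128. This only counts spatial positions; it ignores channel counts and model size, so it is not a speed prediction.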

How to use

Install

A sample notebook is also available in the GitHub repository, but it didn't work for me, so I used the diffusers library instead.

stabilityai/stable-cascade · Hugging Face

huggingface.co

pip install git+https://github.com/kashif/diffusers.git@a3dc21385b7386beb3dab3a9845962ede6765887

The Hugging Face model card gave this command instead:
pip install git+https://github.com/kashif/diffusers.git@wuerstchen-v3
That branch was broken when I tried it, so I went back through past commits and pinned the one shown above.
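
Because the working install depends on a specific commit, it can help to check that the installed diffusers actually exposes the two pipeline classes before loading any weights. This is only a minimal sketch; the helper name missing_pipelines is made up for illustration and is not part of diffusers.

```python
import importlib

# The two classes the script in this post imports from diffusers
REQUIRED = ("StableCascadePriorPipeline", "StableCascadeDecoderPipeline")


def missing_pipelines(module_name="diffusers", names=REQUIRED):
    """Return the names from `names` that `module_name` does not provide."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Module not installed or broken: everything is missing
        return list(names)

    missing = []
    for name in names:
        try:
            getattr(module, name)
        except Exception:
            # Lazy-loading packages can raise errors other than AttributeError
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_pipelines()
    if missing:
        print("Not available in this install:", ", ".join(missing))
    else:
        print("Stable Cascade pipelines found")
```

If anything is reported as missing, the installed diffusers version probably does not include Stable Cascade support and the pip install step needs another look.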

Run

import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

device = "cuda"
num_images_per_prompt = 2

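# The prior turns the prompt into compact image embeddings;
# the decoder turns those embeddings into full-resolution images.
prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16).to(device)
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", torch_dtype=torch.float16).to(device)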

prompt = "Clothes that make you popular"
negative_prompt = "illustration"

prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=4.0,
    num_images_per_prompt=num_images_per_prompt,
    num_inference_steps=20
)
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings.half(),
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=0.0,
    output_type="pil",
    num_inference_steps=10
).images

# decoder_output is now a list of PIL images; save each one to disk
for i, image in enumerate(decoder_output):
    image.save(f"{i}.jpg")

Clothes that make you popular
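
Since the appeal of Stable Cascade is shorter inference time, it is natural to want to time the two stages yourself. The following is a minimal sketch using only the standard library; the helper name timed and the timings dictionary are made up for illustration. The prior and decoder calls are shown as comments because they need a GPU and the downloaded weights.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent inside the with-block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}

# With the pipelines from this post loaded, the calls would be wrapped like this:
#   with timed("prior", timings):
#       prior_output = prior(prompt=prompt, height=1024, width=1024)
#   with timed("decoder", timings):
#       images = decoder(image_embeddings=prior_output.image_embeddings.half(),
#                        prompt=prompt).images

# Stand-in workload so the sketch runs without a GPU
with timed("stand-in", timings):
    time.sleep(0.01)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

Wall-clock timing on a GPU can be misleading because CUDA work is queued asynchronously. Calling torch.cuda.synchronize() before leaving each block should give more reliable numbers, and the first run is usually slower than later ones.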

🐣

I’m a freelance engineer.
Work inquiries
Please feel free to contact me with a brief description of the development work.
rockyshikoku@gmail.com

I am creating applications using machine learning and AR technology.

I share machine learning and AR related information on GitHub, Twitter, and Medium.