Upscaler comparison: Easily try out different models with spandrel

MLBoy
4 min read · Jun 16, 2024


spandrel is a library that can load and run PyTorch models from nothing but their checkpoint files (.pth). It supports many super-resolution upscalers, and it is the library used in the Stable Diffusion WebUI.

The models come from this repository.

Many more are listed on this web page.

This time we will try them on this 256×238 image.

For comparison, here is what you get if you simply resize it by 4x:
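The post does not say how the plain resize was done. As a minimal sketch, assuming Pillow and bicubic interpolation (both are assumptions, and `naive_upscale` is a hypothetical helper name), the baseline could look like this:

```python
from PIL import Image


def naive_upscale(img: Image.Image, factor: int = 4) -> Image.Image:
    """Enlarge an image by plain interpolation, with no learned model."""
    new_size = (img.width * factor, img.height * factor)
    # Bicubic is an assumed choice; the post does not name the filter.
    return img.resize(new_size, Image.BICUBIC)
```

For the 256×238 input this produces a 1024×952 image, the same output size a 4x upscaler model gives, which makes the two directly comparable.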

4xRealWebPhoto_v3_atd.pth: T4 execution time 5 seconds

4xTextures_GTAV_rgt-s.pth: T4 execution time 4 seconds

4xRealWebPhoto_v4_drct-l.pth: T4 execution time 11 seconds

4xRealWebPhoto_v4_dat2.pth: T4 execution time (not given)

4xRealWebPhoto_v2_rtg_s.pth: T4 execution time 6 seconds

4x_UniversalUpscalerV2-Sharp_101000_G.pth: T4 execution time 0.6 seconds

4xLexicaDAT2_otf.pth: T4 execution time 4 seconds

4x_foolhardy_Remacri.pth: T4 execution time 0.7 seconds

4x-UltraSharp.pth: T4 execution time 0.6 seconds

8x_NMKD-Superscale_150000_G.pth: T4 execution time 1 second

4x_NMKD-Siax_200k.pth: T4 execution time 0.7 seconds

Swin2SR_RealworldSR_X4_64_BSRGAN_PSNR.pth: T4 execution time 2 seconds

4xNickelbackFS_72000_G.pth: T4 execution time 0.6 seconds

4x_Valar_v1.pth: T4 execution time 0.6 seconds

4xNomos8kSC.pth: T4 execution time 0.6 seconds

4xNomos8k_span_otf_medium.pth: T4 execution time (not given)
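The post does not describe how these execution times were measured. One plausible way, sketched here with a hypothetical `timed` helper, is to wrap the call in a wall-clock timer. CUDA kernels run asynchronously, so on a GPU the timer should wait for the device before and after the call; the optional `sync` hook is there for that.

```python
import time
from typing import Any, Callable, Optional, Tuple


def timed(fn: Callable[..., Any], *args: Any,
          sync: Optional[Callable[[], None]] = None) -> Tuple[Any, float]:
    """Run fn(*args) and return (result, elapsed_seconds)."""
    if sync is not None:
        sync()  # finish pending work so it is not billed to fn
    start = time.perf_counter()
    result = fn(*args)
    if sync is not None:
        sync()  # wait for asynchronous work started by fn
    return result, time.perf_counter() - start
```

With the `process` function defined later in this post, a call might look like `output, seconds = timed(process, image, sync=torch.cuda.synchronize)`. The first run of a model usually includes one-time warm-up cost, so timing a second run may give a fairer number.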

How to use spandrel

Install spandrel

pip install spandrel

Model Initialization

from spandrel import ImageModelDescriptor, ModelLoader
import torch
model_name = "your_model.pth"
# load a model from disk
model = ModelLoader().load_from_file(model_name)
# make sure it's an image to image model
assert isinstance(model, ImageModelDescriptor)
# send it to the GPU and put it in inference mode
model.cuda().eval()

Inference and image pre-/post-processing functions

from PIL import Image
import numpy as np


def pil_image_to_torch_bgr(img: Image.Image) -> torch.Tensor:
    # Convert a PIL image to a 1xCxHxW float tensor in [0, 1] on the GPU
    img = np.array(img.convert("RGB"))
    img = img[:, :, ::-1]  # flip RGB to BGR
    img = np.transpose(img, (2, 0, 1))  # HWC to CHW
    img = np.ascontiguousarray(img) / 255  # Rescale to [0, 1]
    return torch.from_numpy(img).unsqueeze(0).float().cuda()


def torch_bgr_to_pil_image(tensor: torch.Tensor) -> Image.Image:
    if tensor.ndim == 4:
        # If we're given a tensor with a batch dimension, squeeze it out
        # (but only if it's a batch of size 1).
        if tensor.shape[0] != 1:
            raise ValueError(f"{tensor.shape} does not describe a BCHW tensor")
        tensor = tensor.squeeze(0)
    assert tensor.ndim == 3, f"{tensor.shape} does not describe a CHW tensor"
    # TODO: is `tensor.float().cpu()...numpy()` the most efficient idiom?
    arr = tensor.float().cpu().clamp_(0, 1).numpy()  # clamp
    arr = 255.0 * np.moveaxis(arr, 0, 2)  # CHW to HWC, rescale
    arr = arr.round().astype(np.uint8)
    arr = arr[:, :, ::-1]  # flip BGR to RGB
    return Image.fromarray(arr, "RGB")


def process(image: torch.Tensor) -> torch.Tensor:
    # Run the model loaded above without tracking gradients
    with torch.no_grad():
        return model(image)

Inference

image = pil_image_to_torch_bgr(Image.open("input.jpg"))
image = process(image)
image = torch_bgr_to_pil_image(image)
image.save("output.png")
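To compare several checkpoints the way this post does, the steps above can be repeated in a loop. The sketch below is only one way to organize that. `run_all` is a hypothetical helper, and the loading and upscaling steps are passed in as functions so that the loop itself does not depend on any particular library.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, TypeVar

Model = TypeVar("Model")
Img = TypeVar("Img")


def run_all(checkpoints: Iterable[str],
            load: Callable[[str], Model],
            upscale: Callable[[Model, Img], Img],
            image: Img) -> Dict[str, Img]:
    """Upscale one image with every checkpoint, keyed by file name stem."""
    results: Dict[str, Img] = {}
    for path in checkpoints:
        model = load(path)  # e.g. a wrapper around ModelLoader().load_from_file
        results[Path(path).stem] = upscale(model, image)
    return results
```

Here `load` could wrap `ModelLoader().load_from_file(path)` followed by `.cuda().eval()`, and `upscale` could call `model(image)` inside `torch.no_grad()`, as in the code above. Each result can then go through `torch_bgr_to_pil_image` and be saved under its key, for example `4x-UltraSharp.png`.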

Yes, it’s easy.

🐣

I’m a freelance engineer.
Work inquiries
Please feel free to contact me with a brief description of what you want developed.
rockyshikoku@gmail.com

I create applications using machine learning and AR technology.

I post information about machine learning and AR.
