Super-resolution has come a long way. SeeSR fills in the details to make images clearer.

MLBoy
3 min read · Jun 16, 2024


Filling in every detail

↑ This is a low-resolution image. When it is upscaled with the popular super-resolution model RealESRGAN,

↑ the details are smoothed out like this.
If you instead use SeeSR, which fills in the details with a diffusion model, you get the following:

Previous super-resolution techniques could enlarge an image and make it a little clearer, but they tended to smooth out the details. There was a limit to how much detail that was not in the original image could be recovered.
Now it is possible to upscale an image while filling in the details with a diffusion model.
SeeSR first analyzes what is in the image, then uses that text description to guide how the details are filled in.

Setup

I ran this on Colab. I had to switch the runtime to high memory; otherwise it ran out of memory.
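Before going further, it can help to see how much memory the runtime actually has. The following is a minimal sketch, not part of SeeSR: `total_ram_gb` and `gpu_memory_gb` are hypothetical helper names, the RAM check assumes a Linux runtime such as Colab, and the GPU check assumes PyTorch is already installed (it returns None otherwise).

```python
import os


def total_ram_gb():
    """Total physical RAM in GiB (assumes a Unix-like OS that supports os.sysconf)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 1024**3


def gpu_memory_gb():
    """Total memory of GPU 0 in GiB, or None if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


print(f"System RAM: {total_ram_gb():.1f} GiB")
gpu = gpu_memory_gb()
print("GPU memory:", "not available" if gpu is None else f"{gpu:.1f} GiB")
```

If the numbers look low, switching the Colab runtime type before installing anything saves repeating the setup.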

Clone the SeeSR repository

git clone https://github.com/cswry/SeeSR.git
cd SeeSR

Edit requirements.txt as follows (because I don’t use conda)

requirements.txt

accelerate==0.25.0
diffusers==0.21.0
torch==2.0.1
pytorch_lightning==2.1.3
transformers==4.25.0
xformers
loralib==0.1.2
fairscale==0.4.13
pydantic==1.10.11
gradio==3.24.0
opencv-python==4.9.0.80
chardet==5.2.0
einops==0.7.0
scipy==1.10.1
timm==0.9.12

Install the libraries listed in requirements.txt

pip install -r requirements.txt
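After the install, it may be worth confirming that the pinned versions are the ones that actually ended up in the environment, since Colab ships with its own preinstalled packages. Here is a small sketch using the standard library's `importlib.metadata`. The helper names `parse_pins` and `check_pins` are my own, and the parser only understands plain `name` and `name==version` lines like the ones in the file above.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(lines):
    """Map package name -> pinned version (or None) from requirements-style lines."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, pinned = line.partition("==")
        pins[name.strip()] = pinned.strip() if sep else None
    return pins


def check_pins(pins):
    """Return (package, wanted, installed) for each missing or mismatched package."""
    problems = []
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append((name, wanted, None))
            continue
        # Ignore local build tags such as "+cu118" when comparing versions
        if wanted is not None and installed.split("+")[0] != wanted:
            problems.append((name, wanted, installed))
    return problems
```

To use it, read the file and print whatever comes back, for example `check_pins(parse_pins(open("requirements.txt")))`. An empty list means every pinned package matches.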

Create a preset/models directory to place the models you want to use.

import os
os.makedirs(os.path.join("preset", "models"), exist_ok=True)

Download the Stable Diffusion model from Hugging Face and save it in preset/models.

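from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler
import torch

# Fetch Stable Diffusion 2 base and its Euler scheduler from the Hugging Face Hub
model_id = "stabilityai/stable-diffusion-2-base"
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)

# Save a local copy; this directory is passed to --pretrained_model_path later
save_directory = "preset/models/stable-diffusion-2-base"
pipe.save_pretrained(save_directory)
# The pipeline save already writes the scheduler config; this call rewrites it explicitly
scheduler.save_pretrained(save_directory + "/scheduler")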

Download ram_swin_large_14m.pth from HuggingFace and place it in preset/models.

wget -P preset/models https://huggingface.co/spaces/xinyu1205/recognize-anything/resolve/main/ram_swin_large_14m.pth

Get the SeeSR folder and DAPE.pth from the Google Drive link provided in the repository. If you are using Colab, you can add a shortcut to them in your own Drive and access them from there, or you can download them and upload them yourself.
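With several files coming from different places, a quick existence check before the long run can save some time. This is just a sketch: `missing_paths` is a hypothetical helper, and the paths in the usage note are the ones used in this article, so adjust them to wherever you put the files.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk, in the same order."""
    return [p for p in paths if not Path(p).exists()]
```

For the layout in this article, the call would look like `missing_paths(["preset/models/stable-diffusion-2-base", "preset/models/ram_swin_large_14m.pth", "/content/drive/MyDrive/pretrained models/seesr", "/content/drive/MyDrive/pretrained models/DAPE.pth"])`. An empty list means everything is in place.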

Execution

Put the images you want to super-resolve into preset/datasets/test_datasets and run it.

!python test_seesr.py \
--pretrained_model_path preset/models/stable-diffusion-2-base \
--prompt '' \
--seesr_model_path /content/drive/MyDrive/pretrained\ models/seesr \
--ram_ft_path /content/drive/MyDrive/pretrained\ models/DAPE.pth \
--image_path preset/datasets/test_datasets \
--output_dir preset/datasets/output \
--start_point lr \
--num_inference_steps 50 \
--guidance_scale 5.5 \
--process_size 512

Large images are split into smaller tiles for processing, but an image that is too large will still exceed the memory limit.
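To give a feel for what tiling means, here is an illustrative sketch of splitting an image into overlapping tiles. It is not SeeSR's actual implementation: the function name `tile_boxes` and the default tile size and overlap are assumptions chosen for the example.

```python
def tile_boxes(width, height, tile=512, overlap=64):
    """Return (left, top, right, bottom) boxes that cover a width x height image."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than tile")
    stride = tile - overlap

    def starts(length):
        # A side that fits in one tile needs only a single start position
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, stride))
        positions.append(length - tile)  # final tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]
```

The number of tiles grows with the image area, so `len(tile_boxes(width, height))` gives a rough idea of how much work a large input creates. The overlap is there so that neighbouring tiles can be blended without visible seams.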

So, if you’re having trouble with details disappearing when upscaling, this is definitely for you.

The diffusion step takes quite a while. If speed matters, there is also a faster model called AddSR.

🐣

I’m a freelance engineer.
Work consultation
Please feel free to contact me with a brief description of what you want to develop.
rockyshikoku@gmail.com

I am creating applications using machine learning and AR technology.

I send machine learning / AR related information.
