Converting mlprogram output to an image ~ An example using GFPGAN

MLBoy · Jan 3, 2024

Case study: GFPGAN

GFPGAN can restore face images of poor quality.
In this post, we will get this PyTorch model to output images through Core ML.

Converting the model as is does not produce a good image.

When converting a PyTorch model to a Core ML model with coremltools, you can set the output to an image as shown below, but normally the image will not come out properly as is.

import torch
import coremltools as ct

dummy_input = torch.randn((1, 3, 512, 512)).cuda()
jit_model = torch.jit.trace(model, dummy_input)

coreml_model = ct.convert(
    jit_model,
    convert_to="mlprogram",
    compute_precision=ct.precision.FLOAT32,
    compute_units=ct.ComputeUnit.CPU_AND_GPU,
    inputs=[
        ct.ImageType(
            name="image",
            shape=dummy_input.shape,
            bias=[-1, -1, -1],  # with scale, maps input pixels from [0, 255] to [-1, 1]
            scale=1 / 127.5,
        )
    ],
    outputs=[ct.ImageType(name="output")],
)

The result is a pure-black image. It's sad.

This is because when you run a PyTorch model, the raw inference result is usually post-processed into an image after the model runs. If you convert only the model to Core ML, none of that post-processing is included.

Post-processing required

For example, the torch output is normalized to the range -1 to 1, but the pixel range of an image is 0 to 255, so if you convert it to an image as is, the values are far too small and the image is pitch black.
Therefore, multiplying the resulting values by 127.5 and adding 127.5 brings them back to 0~255 (for example, an output value of 0.5 becomes 0.5 × 127.5 + 127.5 = 191.25).

# PyTorch post-processing.
_tensor = _tensor.squeeze(0).float().detach().cpu().clamp_(-1, 1)
_tensor = (_tensor - (-1)) / (1 - (-1))  # map [-1, 1] to [0, 1]
img_np = _tensor.numpy()
img_np = img_np.transpose(1, 2, 0)  # CHW -> HWC
img_np = (img_np * 255.0).round()

This is the post-processing done by BasicSR's tensor2img. It is a little hard to read, but it essentially rescales -1~1 to 0~1 and then multiplies by 255.
I think it is equivalent to the following.

output = torch.clamp(_tensor * 127.5 + 127.5, min=0, max=255)
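
As a quick numerical sanity check (my own, not part of the original pipeline), you can confirm that the two formulas produce the same values:

import torch

t = torch.randn(1, 3, 512, 512)

# BasicSR-style: clamp to [-1, 1], rescale to [0, 1], then scale to [0, 255]
a = (t.clone().clamp_(-1, 1) + 1) / 2 * 255.0

# One-liner: scale and shift to [0, 255], then clamp
b = torch.clamp(t * 127.5 + 127.5, min=0, max=255)

print(torch.allclose(a, b))  # True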

With cv2 the array needs to be (512, 512, 3), but for conversion to a Core ML model the 4-dimensional shape (1, 3, 512, 512) is fine, so the squeeze and transpose are not needed here.

Create a wrapped model in Python and then convert it

So, how can we apply this post-processing on iOS? You can handle MLMultiArray in Swift, but let's build the post-processing into the Core ML model instead.
Before converting, create a model class with the post-processing added, and convert that.

class CoreMGFPGAN(torch.nn.Module):
    def __init__(self, gfpgan):
        super(CoreMGFPGAN, self).__init__()
        self.gfpgan = gfpgan

    def forward(self, image):
        gfpgan_out = self.gfpgan(image)
        # Bake the post-processing into the model: map [-1, 1] back to [0, 255]
        output = torch.clamp(gfpgan_out * 127.5 + 127.5, min=0, max=255)
        return output

model = CoreMGFPGAN(gfpgan).eval()
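
Before tracing, it is worth a quick check (my own addition, assuming the model is on the GPU) that the wrapper now emits pixel-range values:

with torch.no_grad():
    out = model(torch.randn(1, 3, 512, 512).cuda())
print(out.shape, out.min().item(), out.max().item())  # min/max should fall within [0, 255]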

All you have to do is convert it normally.

import torch
import coremltools as ct

ex = torch.randn((1, 3, 512, 512)).cuda()
jit_model = torch.jit.trace(model, ex)

coreml_model = ct.convert(
    jit_model,
    convert_to="mlprogram",
    compute_precision=ct.precision.FLOAT32,
    compute_units=ct.ComputeUnit.CPU_AND_GPU,
    inputs=[
        ct.ImageType(
            name="image",
            shape=ex.shape,
            bias=[-1, -1, -1],  # with scale, maps input pixels from [0, 255] to [-1, 1]
            scale=1 / 127.5,
        )
    ],
    outputs=[ct.ImageType(name="output")],
)

The image is now output correctly.
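
To try the converted model from Python on macOS (a quick sketch; the file names here are placeholders), save it and run a prediction. Because the output is declared as an ImageType, predict returns a PIL image:

from PIL import Image

coreml_model.save("GFPGAN.mlpackage")

face = Image.open("face.png").convert("RGB").resize((512, 512))
result = coreml_model.predict({"image": face})
result["output"].save("restored.png")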

Click here for a conversion example.

🐣

I’m a freelance engineer.
For work inquiries, please feel free to contact me with a brief description of the project.
rockyshikoku@gmail.com

I create applications using machine learning and AR technology.

I post information about machine learning and AR.

GitHub

Twitter
Medium
