Generate images using a GAN on iOS devices

MLBoy
15 min read · Jun 6, 2021

Theory

Machine learning models can be used on iOS

By using Core ML, trained models can be run even on edge devices.
All of the following model types can be used on iOS devices without any server communication.

Image Classification: Convert EfficientNet to Core ML [a converted model is available]

Similarity measurement

Object Detection

Semantic Segmentation: Converting DeepLabV3 to a Core ML model

Image Generation: this article

Style Transfer

Sound Classification

Text Classification

Data Classification

Data Regression

Recommendation

Here are examples of apps that use machine learning models on iOS:

AnimateU: Make your face photo look like an anime with image generation

Blur: Cut and blur people with semantic segmentation

Rework: Remove facial retouching with Pix2Pix

What is GAN?

A GAN (Generative Adversarial Network) is a type of machine learning model.
By training on a set of images, it can generate new images that resemble the dataset, or transform existing images into the style of the dataset.

It consists of two neural networks: a Generator and a Discriminator.

A neural network is a computation graph, loosely inspired by the human brain, whose values are repeatedly recalculated and automatically optimized for the task.

The Generator receives numbers or an image as input and outputs an image. Early in training, it produces what looks like random noise.

The Discriminator is an image classification network. It classifies inputs as "correct answer" or "incorrect answer".
The Discriminator is trained by backpropagation so that its output approaches 1 when it receives data from the dataset (correct answer) and approaches 0 when it receives a fake image from the Generator (incorrect answer).

Generator and Discriminator

The Generator passes its output image to the Discriminator and is trained by backpropagation so that the Discriminator's output approaches the correct answer (1).

By alternately training the two networks, the Discriminator gets better at telling real from fake, and the Generator gets better at fooling it.

After training, the Generator can produce realistic images.
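To make the alternating training concrete, here is a minimal sketch of one training step in TensorFlow 2 (close to the pattern used in the DCGAN tutorial later in this article; the generator, discriminator, and their optimizers are assumed to be defined elsewhere):

import tensorflow as tf

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

@tf.function
def train_step(real_images, noise_dim=100):
    noise = tf.random.normal([tf.shape(real_images)[0], noise_dim])
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        fake_images = generator(noise, training=True)
        real_out = discriminator(real_images, training=True)
        fake_out = discriminator(fake_images, training=True)
        # Discriminator: push real images toward 1 and fakes toward 0
        disc_loss = cross_entropy(tf.ones_like(real_out), real_out) + \
                    cross_entropy(tf.zeros_like(fake_out), fake_out)
        # Generator: try to make the Discriminator output 1 for fakes
        gen_loss = cross_entropy(tf.ones_like(fake_out), fake_out)
    gen_grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
    disc_grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    generator_optimizer.apply_gradients(zip(gen_grads, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(disc_grads, discriminator.trainable_variables))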

In this article, we will convert the Generator of a trained GAN model so that it can generate images on iOS.

GAN collection

UGATIT https://github.com/taki0112/UGATIT
Pix2Pix
CycleGAN
CartoonGAN https://github.com/mnicnc404/CartoonGan-tensorflow
WarpGAN https://qiita.com/john-rocky/items/a2693bf9e08292504284

How to use GAN on iOS

Machine learning models are usually written with frameworks such as TensorFlow and PyTorch.
TensorFlow is an open-source machine learning framework released by Google, and PyTorch was released by Facebook (now Meta).
Both are used from Python.
As they are, these models cannot be used on iOS.

To use these models on iOS, we need to convert them to Apple's Core ML format.
A model converted to Core ML format can be used in an Xcode project.
For the conversion, use Core ML Tools, a conversion tool released by Apple.
Core ML Tools accepts TensorFlow and PyTorch machine learning models and outputs Core ML models.
The resulting Core ML model file can be added to an Xcode project by dragging and dropping it in, or downloaded from a URL at runtime.

From TensorFlow and PyTorch to Core ML

To use the Core ML model in your app, the easiest way is to issue a model execution request with VNCoreMLRequest from Apple's Vision (computer vision) framework.

You don’t have to know Python

It's OK if you don't understand code like this.

The original GAN model is written in Python like this. You don't need to understand all of the machine learning code in order to convert it and use it with Core ML.

The point is to find the model to convert in the script.

Conversion script

Now, here is a script that converts a model from another framework to Core ML with Core ML Tools.

import coremltools as ct

mlmodel = ct.convert(model,
    inputs=[ct.ImageType(shape=[1,256,256,3], bias=[-1,-1,-1], scale=1/127.5)])

Pass one of the following formats as the "model" argument.

TensorFlow2:

tf.keras.Model, HDF5 file path (.h5), SavedModel directory path, concrete function

TensorFlow1:

Frozen graph, Frozen graph file path (.pb)

PyTorch:

TorchScript

Once you have the model in one of the formats in the table above, all you have to do is pass it to the conversion script.
The key to converting to Core ML is how to extract one of these model formats from the Python scripts.
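For illustration, here is roughly how each of these formats is handed to ct.convert (a hedged sketch; keras_model, torch_model, and the paths are placeholders, not names from any specific project):

import coremltools as ct
import torch

# TensorFlow 2: a tf.keras.Model instance or a SavedModel directory path
mlmodel = ct.convert(keras_model)           # an in-memory tf.keras.Model
mlmodel = ct.convert('./my_saved_model')    # a SavedModel directory

# PyTorch: trace the model to TorchScript first, then convert the traced model
example_input = torch.rand(1, 3, 256, 256)
traced = torch.jit.trace(torch_model.eval(), example_input)
mlmodel = ct.convert(traced, inputs=[ct.TensorType(shape=example_input.shape)])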

This time only the Generator will be converted, so the point is to find the Generator efficiently in the original script.
My strong recommendation is to look for the Generator in the test script of the original project.
Following the test script will very likely lead you to a model object that can be converted.
You also need to understand how the input is preprocessed, as in the conversion script above; that, too, can be found in the original project's tests.
I will cover both of these in the Practice section.

Save model

mlmodel.save('./gan.mlmodel')

(Everything up to this point runs as Python code.)

Use the model with Xcode

If the input is an image

Use the Vision framework.
Create a VNCoreMLRequest and execute it with a VNImageRequestHandler.

import Vision

lazy var coreMLRequest: VNCoreMLRequest = {
    let model: VNCoreMLModel = try! VNCoreMLModel(for: gan(configuration: MLModelConfiguration()).model)
    let request = VNCoreMLRequest(model: model, completionHandler: self.completionHandler)
    return request
}()

let handler = VNImageRequestHandler(ciImage: ciImage, options: [:])
DispatchQueue.global(qos: .userInitiated).async { [self] in
    do {
        try handler.perform([self.coreMLRequest])
    } catch let error {
        print(error)
    }
}

When the request finishes executing, the completion handler is called.

If the input is an array of numbers

Use the prediction function within the Core ML framework.

let model = dcgan()
let input = try? MLMultiArray(shape: [1,100] as [NSNumber], dataType: MLMultiArrayDataType.float32)

for i in 0...input!.count - 1 {
input![i] = NSNumber(value: Float32.random(in: 0...1))
}

let mlinput = dcganInput(dense_input: input!)
let output = try? model.prediction(input: mlinput)

Turn the output into an image

Method 1: Change the Core ML Model itself to an image output type

By default, the output of a Core ML model is a MultiArray.
As it is, it cannot be displayed as an image in the app.
The output of the Core ML model can instead be made a pixel buffer (an image).
Here’s the Core ML Tools code snippet for that.

import coremltools.proto.FeatureTypes_pb2 as ft

mlmodel = ct.models.MLModel('./gan.mlmodel')
spec = mlmodel.get_spec()
output = spec.description.output[0]
output.type.imageType.colorSpace = ft.ImageFeatureType.RGB
output.type.imageType.height = 256
output.type.imageType.width = 256
ct.utils.save_spec(spec, './ganImageOut.mlmodel')

After converting the Core ML model to its spec format, change the output type and save it.
However, if the shape of the model's output does not match the width, height, and color channel settings, the image output will not work.
In that case, you need to change the output shape at the last layer.
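Before editing the spec, it can help to print what shape the converted model actually outputs; here is a small sketch using the same coremltools calls as above (the shape list may be empty if the model uses flexible shapes):

import coremltools as ct

spec = ct.models.MLModel('./gan.mlmodel').get_spec()
for out in spec.description.output:
    print(out.name, list(out.type.multiArrayType.shape))
# e.g. an output of [1, 256, 256, 3] matches the 256 x 256 RGB settings above;
# anything else means the last layer's output has to be reshaped first.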

Method 2: Use Core ML Helpers

The easiest way is to leave the output as a multi-array and use the open-source CoreMLHelpers library.
CoreMLHelpers provides helper functions that convert multi-arrays to images.
Since there are options for various width, height, and channel arrangements, it is a good idea to first check with CoreMLHelpers whether the image can be output properly.

Converting from MultiArray to Image with CoreML Helpers.

Practice

Find the model to convert

You can find models on GitHub, in the official model hubs of TensorFlow and PyTorch, and on the pages of machine learning and computer vision conferences.
GitHub search Results: GAN
TensorFlowHub
Pytorch Hub

You can also come across models at machine learning conferences and on social media.

Search for GAN on GitHub.
Models that look interesting come up one after another.

Models that are already trained vs. models you train yourself

For models on GitHub, the repository author often publishes pre-trained weights (the saved model state) as a file. In that case, you can convert the pre-trained model directly to a Core ML model.
If there is no pre-trained model, or if you want to train with your own data, you'll need to train the model yourself and then convert it.

DCGAN conversion (TensorFlow 2): train, then convert

DCGAN is a basic GAN model from the early days of GANs.
Its Generator takes random numbers as input and generates images that resemble the training data.
As practice, we will train the DCGAN from the TensorFlow tutorial and then convert it.

Training on Colab

TensorFlow Core DCGAN Tutorial Colaboratory Notebook

With this tutorial, we can train a DCGAN that generates images of handwritten digits from random noise.

You can train the DCGAN simply by running the Colab cells from top to bottom.
The notebook does everything: installing modules, loading the dataset, defining the model, and running the training steps.
Even if you don't understand all of it yet, it's fine as long as you can get it to run.

Training finishes in a few tens of minutes.

Convert on Colab

After training, we will move on to conversion.
We will write the conversion code directly in the same Colab notebook.
As an aside, Colab is also very useful for Core ML conversion. Since an up-to-date Python environment is already installed, you won't get stuck on environment setup.
You can use a GPU for free, access data in Google Drive, train as above, or clone a model from GitHub, try it out, and then do the Core ML conversion.
By default, Python, TensorFlow, PyTorch, and the NVIDIA GPU drivers are already set up.
It's free and safe, and you can create as many new notebooks as you like.
Suggestions for properties, methods, and arguments are also displayed as you type.
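If you want to confirm what the runtime provides, a quick check in a fresh cell looks like this (assuming the usual Colab defaults, where TensorFlow and PyTorch come preinstalled):

import sys
import tensorflow as tf
import torch

print(sys.version)        # preinstalled Python
print(tf.__version__)     # preinstalled TensorFlow
print(torch.__version__)  # preinstalled PyTorch
!nvidia-smi               # the assigned GPU, when a GPU runtime is selected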

Install Core ML Tools

Add a new code cell in Colab with the + button and install Core ML Tools.

!pip install coremltools
# In Colab, prefix shell commands with !

Write a conversion script

import coremltools as ct

mlmodel = ct.convert(model)  # Let's find a generator to pass in as `model`!

Let's find the model to pass to this conversion script.
The trick to finding the model instance is to find the line that actually generates an image.
If an image is being generated, a Generator must be there.

def generate_and_save_images(model, epoch, test_input):
    # Notice `training` is set to False.
    # This is so all layers run in inference mode (batchnorm).
    predictions = model(test_input, training=False)  # ← Here!!!!!!!!
    fig = plt.figure(figsize=(4, 4))
    for i in range(predictions.shape[0]):
        plt.subplot(4, 4, i + 1)
        plt.imshow(predictions[i, :, :, 0] * 127.5 + 127.5, cmap='gray')
        plt.axis('off')
    plt.savefig('image_at_epoch_{:04d}.png'.format(epoch))
    plt.show()

The test input is being passed to an instance of the model class, and the prediction result is displayed with plt.
Since this model is an argument of the function, let's look for the place where the function is called…

It is called in the training step, in the 20th cell of the Colab notebook.
The generator passed to the function here is the model we want to convert.

def train(dataset, epochs):
    for epoch in range(epochs):
        start = time.time()

        for image_batch in dataset:
            train_step(image_batch)

        # Produce images for the GIF as we go
        display.clear_output(wait=True)
        generate_and_save_images(generator,  # ← Here it is!!!!!!!!!
                                 epoch + 1,
                                 seed)

        # Save the model every 15 epochs
        if (epoch + 1) % 15 == 0:
            checkpoint.save(file_prefix=checkpoint_prefix)

        print('Time for epoch {} is {} sec'.format(epoch + 1, time.time() - start))

    # Generate after the final epoch
    display.clear_output(wait=True)
    generate_and_save_images(generator, epochs, seed)

By the way, this generator is initialized in the 10th cell, and TensorFlow 2 keeps the trained state on the model instance, so you can pass it to the conversion script as it is.
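For reference, the generator defined in that cell is a small tf.keras.Sequential model, roughly like this (a sketch from memory of the tutorial; check the notebook for the exact code):

import tensorflow as tf
from tensorflow.keras import layers

def make_generator_model():
    model = tf.keras.Sequential([
        layers.Dense(7 * 7 * 256, use_bias=False, input_shape=(100,)),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Reshape((7, 7, 256)),
        layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        # tanh keeps the output pixels in the -1 to 1 range
        layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'),
    ])
    return model

generator = make_generator_model()  # a tf.keras.Model, which ct.convert accepts directly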

mlmodel = ct.convert(generator)

When you run it, you will see output like this:

Running TensorFlow Graph Passes: 100%|██████████| 5/5 [00:00<00:00, 13.17 passes/s]
Converting Frontend ==> MIL Ops: 100%|██████████| 65/65 [00:00<00:00, 1223.26 ops/s]
Running MIL optimization passes: 100%|██████████| 17/17 [00:00<00:00, 324.31 passes/s]
Translating MIL ==> MLModel Ops: 100%|██████████| 51/51 [00:00<00:00, 136.84 ops/s]

Save it and take a look at the description of the model.

mlmodel.save('./dcgan.mlmodel')
print(mlmodel)

The input is a MultiArray with the shape [1, 100] (a sequence of 100 random numbers).
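As an optional sanity check on the Python side (assuming the tutorial's generator is still in memory), you can feed it a [1, 100] noise vector and look at the output shape and value range, which also tells you what min/max to use when turning the multi-array back into an image later:

import tensorflow as tf

noise = tf.random.normal([1, 100])
generated = generator(noise, training=False)
print(generated.shape)  # the shape of the digit image produced by the generator
print(float(tf.reduce_min(generated)), float(tf.reduce_max(generated)))  # output value range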

You can download it from the file folder on the left of Colab.

Looking at Xcode, it looks like this.

Run in Xcode

let model = dcgan() // Model initialization
let input = try? MLMultiArray(shape: [1, 100] as [NSNumber], dataType: MLMultiArrayDataType.float32) // Make an empty multi-array with a shape of 1 x 100

for i in 0...input!.count - 1 {
    input![i] = NSNumber(value: Float32.random(in: 0...1))
} // Fill the multi-array with random numbers. The values entered here are in the range 0 to 1.

let mlinput = dcganInput(dense_input: input!) // Shape the multi-array into the model's input type.
let output = try? model.prediction(input: mlinput) // Run

Input is normalized

Now we need to find out the numeric range the model works with.
The only way to find out is to look at the input pre-processing and output post-processing of the Python model.
Machine learning inputs are normalized from raw pixel values (0 to 255) down to a smaller range to keep the calculations well behaved.

For example, -1 to 1 or 0 to 1.
In this case, in the 6th cell of the Colab notebook, 127.5 is subtracted from the dataset images and the result is divided by 127.5.

train_images = (train_images - 127.5) / 127.5

The original pixel values are in the range 0 to 255, so subtracting 127.5 and then dividing by 127.5 normalizes them to the range -1 to 1.

Preprocessing: (data image - 127.5) / 127.5, or equivalently data image / 127.5 - 1

Numeric range after normalization: -1 to 1

Preprocessing: data image / 255

Numeric range after normalization: 0 to 1

There are many such patterns.

The numeric ranges of the input and output basically follow this normalization; in this model we feed the Generator random values in the range 0 to 1.
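As a quick arithmetic check of those two patterns (a throwaway snippet, not part of the original tutorial):

import numpy as np

pixels = np.array([0.0, 127.5, 255.0])
print((pixels - 127.5) / 127.5)  # -1, 0, 1
print(pixels / 255.0)            # 0, 0.5, 1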

Let's display the output of the model as an image with CoreMLHelpers.

See the link for how to use CoreML Helpers.

let uiImage = output?.Identity.image(min: 0, max: 1, channel: nil, axes: nil) 
DispatchQueue.main.async { imageView.image = uiImage }

Something that looks like a handwritten digit was displayed on iOS.

AnimeGANv2 conversion (TensorFlow 1): converting a pre-trained model

Converting a TensorFlow 1 model is quite different from converting a TensorFlow 2 model.

AnimeGANv2 is a model that takes an image as input and converts it into an anime style.

This is also converted in Colab.

Try the model

Clone the repository from GitHub.

!git clone https://github.com/TachibanaYoshino/AnimeGANv2.git

The structure of the repository looks like this.
GAN repositories are generally set up so that main.py or test.py calls the model class (AnimeGANv2.py) and runs it.

Change the working directory.

cd AnimeGANv2/

The default version of Colab’s TensorFlow is 2, so switch to 1.

%tensorflow_version 1.x

After cloning the repository, it's a good idea to run the test script to check that the model works.
The test procedure is usually described on the project's GitHub page.

Here is the test script provided by the author.

We will run it as-is. The arguments specify the pre-trained weight checkpoint, the directory of test images, and where to save the results.

!python test.py --checkpoint_dir checkpoint/generator_Hayao_weight --test_dir dataset/test/HR_photo --style_name Hayao/HR_photo

The test images are converted to anime style and saved in the results directory.

Now that we know that the model works, let’s convert it.

Make a frozen graph: where is the model?

The Core ML Tools installation procedure is the same as for DCGAN above.
With TensorFlow 1, we need to create a frozen graph and convert that.

To do this, we need a .pbtxt that defines the model graph, plus the checkpoint weights.
The checkpoints are located in the checkpoint directory of the repository.
The .pbtxt has to be created ourselves, and to create it we need to find the model graph.
Where can we find the model graph?

The trick is to look in the test file that generates the image.
We ran test.py in the previous step.
Let's open test.py.
The model runs on line 67.

# test.py
fake_img = sess.run(test_generated, feed_dict={test_real: sample_image})

This code runs the graph session with sample_image as input.
When you need to find the model in TensorFlow 1, look for this sess.run; this is usually how the model is run.
Now that we have found the graph, let's create the .pbtxt.

The script to create the .pbtxt looks like this.

# test.py
tf.train.write_graph(sess.graph_def, './', 'animegan.pbtxt')

My recommended approach is to write the script that creates the .pbtxt directly into test.py.
Since this is the code that actually generates the image, you can be confident the graph includes the Generator.
Write it just below the sess.run line you found earlier.

We also need the name of the graph's output node to make the frozen graph, so print the node names just below that as well.

# test.py
graph = sess.graph
print([node.name for node in graph.as_graph_def().node])

Now when you run the test script (test.py) again,
animegan.pbtxt will be saved in the current directory
and all the node names will be printed out.

['test', 'generator/G_MODEL/A/MirrorPad/paddings', 'generator/G_MODEL/A/MirrorPad', 'generator/G_MODEL/A/Conv/weights/Initializer/truncated_normal/shape', 'generator/G_MODEL/A/Conv/weights/Initializer/truncated_normal/mean',
...
'generator/G_MODEL/out_layer/Conv/weights/read', 'generator/G_MODEL/out_layer/Conv/dilation_rate', 'generator/G_MODEL/out_layer/Conv/Conv2D', 'generator/G_MODEL/out_layer/Tanh', 'save/filename/input',
...
'save/Assign_74', 'save/Assign_75', 'save/Assign_76', 'save/restore_all']

Find the last node among the names that start with generator. The name of the output node is generator/G_MODEL/out_layer/Tanh.
Tanh is an activation function; the last node is usually named after one, like this.
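If you prefer not to scan the full printout, a small helper (not part of the original repo) placed next to the print above narrows it down:

# Keep only the nodes under 'generator' and take the last one
gen_nodes = [n.name for n in sess.graph.as_graph_def().node
             if n.name.startswith('generator')]
print(gen_nodes[-1])  # expected: generator/G_MODEL/out_layer/Tanh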

Make the frozen graph from the .pbtxt you created, the output node name, and the checkpoint.

from tensorflow.python.tools.freeze_graph import freeze_graph

graph_def_file = 'animegan.pbtxt'
checkpoint_file = 'checkpoint/generator_Hayao_weight/Hayao-64.ckpt'
frozen_model_file = './frozen_model.pb'
output_node_names = 'generator/G_MODEL/out_layer/Tanh'

freeze_graph(input_graph=graph_def_file,
             input_saver="",
             input_binary=False,
             input_checkpoint=checkpoint_file,
             output_node_names=output_node_names,
             restore_op_name="save/restore_all",
             filename_tensor_name="save/Const:0",
             output_graph=frozen_model_file,
             clear_devices=True,
             initializer_nodes="")

Convert.

Pass this frozen graph to Core ML Tools for conversion.

import coremltools as ct

mlmodel = ct.convert('./frozen_model.pb',
                     inputs=[ct.ImageType(shape=[1,256,256,3], bias=[-1,-1,-1], scale=1/127.5)])
mlmodel.save('./animegan.mlmodel')

Since the input is an image this time, an ImageType is specified in inputs.
Here you need to specify the shape of the input image and the preprocessing.
Look for both of these in the test script as well.

# test.py
def test(checkpoint_dir, style_name, test_dir, if_adjust_brightness, img_size=[256,256]):  # ← height and width are 256
    # tf.reset_default_graph()
    result_dir = 'results/' + style_name
    check_folder(result_dir)
    test_files = glob('{}/*.*'.format(test_dir))

    test_real = tf.placeholder(tf.float32, [1, None, None, 3], name='test')  # ← batch of 1, 3 color channels
    ...
    sample_image = np.asarray(load_test_data(sample_file, img_size))  # ← preprocessed with a function called load_test_data

load_test_data is in utils.py, and if you follow it further you can see that the preprocessing divides by 127.5 and subtracts 1.

def preprocessing(img, size):
    h, w = img.shape[:2]
    if h <= size[0]:
        h = size[0]
    else:
        x = h % 32
        h = h - x
    if w < size[1]:
        w = size[1]
    else:
        y = w % 32
        w = w - y
    # the cv2 resize func : dsize format is (W, H)
    img = cv2.resize(img, (w, h))
    return img/127.5 - 1.0  # ← Here

In the preprocessing, the value of each pixel is divided by 127.5, so the scale of the ImageType is 1/127.5; and 1 is subtracted from every color channel, so the bias is [-1, -1, -1] (one value each for red, green, and blue). The color channels are specified separately because some models subtract a different value for each channel during preprocessing.
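As a sanity check (a small sketch, not from the original repository), the ImageType parameters map onto that preprocessing like this:

# Core ML applies `scale * pixel + bias` per color channel of the input image.
# To reproduce the repository's img / 127.5 - 1.0 preprocessing:
scale = 1 / 127.5
bias = [-1.0, -1.0, -1.0]           # one bias value per color channel
for pixel in (0, 127.5, 255):
    print(scale * pixel + bias[0])  # -1.0, 0.0, 1.0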

In this state, the model can be used with the Vision framework snippet from the Theory section, and the output multi-array can be turned into an image with CoreMLHelpers.

func completionHandler(request: VNRequest?, error: Error?) {
    let result = request?.results?.first as! VNCoreMLFeatureValueObservation
    let multiArray = result.featureValue.multiArrayValue
    // Since the input was normalized with /127.5 - 1, the output range is -1 to 1.
    // If the image does not come out right, the output shape probably does not match; try axes: (3, 1, 2).
    let uiImage = multiArray?.image(min: -1, max: 1, channel: nil, axes: nil)
}

When applied to a video, it becomes an animation.

Promotion

1. I made an open-source container that you can use just by dragging and dropping a converted GAN model into it.

CoreGANContainer
It can generate from random number inputs, images, and videos.

2. Models already converted to Core ML are also available.

You can also download WarpGAN for caricature conversion, as well as image classification models, from CoreMLModels.

3. My articles on converting GANs for iOS:

Converting UGATIT to CoreML Model.

Converting Pix2Pix to CoreML model.

Converting Cycle GAN to CoreML model.

Use CartoonGAN in iOS.

Caricature generation by deep learning. Use WarpGAN in iOS. https://rockyshikoku.medium.com/caricature-generation-by-deep-learning-use-warpgan-on-ios-with-converted-model-99ef7584bd81

🐣

****

Request for work:

rockyshikoku@gmail.com

We send information related to machine learning.

Twitter
