Stable Diffusion can be converted to Core ML.
The converted model runs on macOS and, although generation takes longer, on iOS.
Conversion steps
Install ml-stable-diffusion
Clone Apple’s ml-stable-diffusion repository and install its requirements.
git clone https://github.com/apple/ml-stable-diffusion.git
cd ml-stable-diffusion
pip3 install -r requirements.txt
pip3 install omegaconf
pip3 install safetensors
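Before running the converter, it can help to confirm that the extra packages are importable. The following is a minimal sketch: the helper name missing_packages is hypothetical, the module names are assumed to match the pip package names above, and the check only confirms that a module can be found, not that its version is compatible.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names in `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: the importable module names equal the pip package names.
    required = ["omegaconf", "safetensors"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```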
Case 1: Convert a Stable Diffusion model from the Hugging Face Hub
Any model on the Hugging Face Hub that provides a model_index.json can be converted.
python3 -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker --model-version <model-version-string-from-hub> -o <output-mlpackages-directory>
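If you convert several models, it may be easier to assemble this command from a script. The sketch below builds the same argument list as the command above. The function name build_torch2coreml_command and its convert_safety_checker parameter are hypothetical; only the flags themselves come from the command shown in this article.

```python
import sys


def build_torch2coreml_command(model_version, output_dir, convert_safety_checker=True):
    """Assemble the torch2coreml invocation for a Hub model as an argument list."""
    command = [
        sys.executable, "-m", "python_coreml_stable_diffusion.torch2coreml",
        "--convert-unet",
        "--convert-text-encoder",
        "--convert-vae-decoder",
    ]
    if convert_safety_checker:
        command.append("--convert-safety-checker")
    command += ["--model-version", model_version, "-o", output_dir]
    return command


if __name__ == "__main__":
    # Placeholders: replace with a real Hub model string and output directory.
    cmd = build_torch2coreml_command(
        "<model-version-string-from-hub>", "<output-mlpackages-directory>"
    )
    print(" ".join(cmd))
    # To run the conversion, pass the list to subprocess.run(cmd, check=True).
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.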
Case 2: Convert other models (checkpoint → diffusers → Core ML)
Many Stable Diffusion models have been trained on a wide variety of data.
They are often distributed as single safetensors or ckpt checkpoint files.
Convert these files to the diffusers format first, and then convert the result to Core ML.
Checkpoint → diffusers
Create a file named convert_original_stable_diffusion_to_diffusers.py containing the following script, or download it from the diffusers repository.
convert_original_stable_diffusion_to_diffusers.py
# coding=utf-8
# Copyright 2024 The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Conversion script for the LDM checkpoints. """
import argparse
import importlib
import torch
from diffusers.pipelines.stable_diffusion.convert_from_ckpt import download_from_original_stable_diffusion_ckpt
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--checkpoint_path", default=None, type=str, required=True, help="Path to the checkpoint to convert."
    )
    # !wget https://raw.githubusercontent.com/CompVis/stable-diffusion/main/configs/stable-diffusion/v1-inference.yaml
    parser.add_argument(
        "--original_config_file",
        default=None,
        type=str,
        help="The YAML config file corresponding to the original architecture.",
    )
    parser.add_argument(
        "--config_files",
        default=None,
        type=str,
        help="The YAML config file corresponding to the architecture.",
    )
    parser.add_argument(
        "--num_in_channels",
        default=None,
        type=int,
        help="The number of input channels. If `None` number of input channels will be automatically inferred.",
    )
    parser.add_argument(
        "--scheduler_type",
        default="pndm",
        type=str,
        help="Type of scheduler to use. Should be one of ['pndm', 'lms', 'ddim', 'euler', 'euler-ancestral', 'dpm']",
    )
    parser.add_argument(
        "--pipeline_type",
        default=None,
        type=str,
        help=(
            "The pipeline type. One of 'FrozenOpenCLIPEmbedder', 'FrozenCLIPEmbedder', 'PaintByExample'"
            ". If `None` pipeline will be automatically inferred."
        ),
    )
    parser.add_argument(
        "--image_size",
        default=None,
        type=int,
        help=(
            "The image size that the model was trained on. Use 512 for Stable Diffusion v1.X and Stable Diffusion v2"
            " Base. Use 768 for Stable Diffusion v2."
        ),
    )
    parser.add_argument(
        "--prediction_type",
        default=None,
        type=str,
        help=(
            "The prediction type that the model was trained on. Use 'epsilon' for Stable Diffusion v1.X and Stable"
            " Diffusion v2 Base. Use 'v_prediction' for Stable Diffusion v2."
        ),
    )
    parser.add_argument(
        "--extract_ema",
        action="store_true",
        help=(
            "Only relevant for checkpoints that have both EMA and non-EMA weights. Whether to extract the EMA weights"
            " or not. Defaults to `False`. Add `--extract_ema` to extract the EMA weights. EMA weights usually yield"
            " higher quality images for inference. Non-EMA weights are usually better to continue fine-tuning."
        ),
    )
    parser.add_argument(
        "--upcast_attention",
        action="store_true",
        help=(
            "Whether the attention computation should always be upcasted. This is necessary when running stable"
            " diffusion 2.1."
        ),
    )
    parser.add_argument(
        "--from_safetensors",
        action="store_true",
        help="If `--checkpoint_path` is in `safetensors` format, load checkpoint with safetensors instead of PyTorch.",
    )
    parser.add_argument(
        "--to_safetensors",
        action="store_true",
        help="Whether to store pipeline in safetensors format or not.",
    )
    parser.add_argument("--dump_path", default=None, type=str, required=True, help="Path to the output model.")
    parser.add_argument("--device", type=str, help="Device to use (e.g. cpu, cuda:0, cuda:1, etc.)")
    parser.add_argument(
        "--stable_unclip",
        type=str,
        default=None,
        required=False,
        help="Set if this is a stable unCLIP model. One of 'txt2img' or 'img2img'.",
    )
    parser.add_argument(
        "--stable_unclip_prior",
        type=str,
        default=None,
        required=False,
        help="Set if this is a stable unCLIP txt2img model. Selects which prior to use. If `--stable_unclip` is set to `txt2img`, the karlo prior (https://huggingface.co/kakaobrain/karlo-v1-alpha/tree/main/prior) is selected by default.",
    )
    parser.add_argument(
        "--clip_stats_path",
        type=str,
        help="Path to the clip stats file. Only required if the stable unclip model's config specifies `model.params.noise_aug_config.params.clip_stats_path`.",
        required=False,
    )
    parser.add_argument(
        "--controlnet", action="store_true", default=None, help="Set flag if this is a controlnet checkpoint."
    )
    parser.add_argument("--half", action="store_true", help="Save weights in half precision.")
    parser.add_argument(
        "--vae_path",
        type=str,
        default=None,
        required=False,
        help="Set to a path, hub id to an already converted vae to not convert it again.",
    )
    parser.add_argument(
        "--pipeline_class_name",
        type=str,
        default=None,
        required=False,
        help="Specify the pipeline class name",
    )
    args = parser.parse_args()
    if args.pipeline_class_name is not None:
        library = importlib.import_module("diffusers")
        class_obj = getattr(library, args.pipeline_class_name)
        pipeline_class = class_obj
    else:
        pipeline_class = None
    pipe = download_from_original_stable_diffusion_ckpt(
        checkpoint_path_or_dict=args.checkpoint_path,
        original_config_file=args.original_config_file,
        config_files=args.config_files,
        image_size=args.image_size,
        prediction_type=args.prediction_type,
        model_type=args.pipeline_type,
        extract_ema=args.extract_ema,
        scheduler_type=args.scheduler_type,
        num_in_channels=args.num_in_channels,
        upcast_attention=args.upcast_attention,
        from_safetensors=args.from_safetensors,
        device=args.device,
        stable_unclip=args.stable_unclip,
        stable_unclip_prior=args.stable_unclip_prior,
        clip_stats_path=args.clip_stats_path,
        controlnet=args.controlnet,
        vae_path=args.vae_path,
        pipeline_class=pipeline_class,
    )
    if args.half:
        pipe.to(dtype=torch.float16)
    if args.controlnet:
        # only save the controlnet model
        pipe.controlnet.save_pretrained(args.dump_path, safe_serialization=args.to_safetensors)
    else:
        pipe.save_pretrained(args.dump_path, safe_serialization=args.to_safetensors)
Run the script to convert the checkpoint to the diffusers format.
python convert_original_stable_diffusion_to_diffusers.py --checkpoint_path <MODEL-NAME>.safetensors --from_safetensors --device cpu --extract_ema --dump_path <MODEL-NAME>_diffusers
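The --from_safetensors flag only applies when the checkpoint is a safetensors file, so a ckpt file needs a slightly different command. As a rough sketch, the hypothetical helper below (build_checkpoint_conversion_args is not part of diffusers) picks the flag from the file extension and derives the output folder name from the checkpoint name, following the <MODEL-NAME>_diffusers convention used above.

```python
from pathlib import Path


def build_checkpoint_conversion_args(checkpoint_path, device="cpu", extract_ema=True):
    """Build the argument list for convert_original_stable_diffusion_to_diffusers.py."""
    checkpoint = Path(checkpoint_path)
    args = ["--checkpoint_path", str(checkpoint)]
    # Only safetensors files need the --from_safetensors flag.
    if checkpoint.suffix.lower() == ".safetensors":
        args.append("--from_safetensors")
    args += ["--device", device]
    if extract_ema:
        args.append("--extract_ema")
    # The output folder is created relative to the current working directory.
    args += ["--dump_path", f"{checkpoint.stem}_diffusers"]
    return args


if __name__ == "__main__":
    # Placeholder file name: replace with your own checkpoint.
    script = "convert_original_stable_diffusion_to_diffusers.py"
    print(" ".join(["python", script] + build_checkpoint_conversion_args("<MODEL-NAME>.ckpt")))
```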
diffusers → Core ML
python -m python_coreml_stable_diffusion.torch2coreml --convert-vae-decoder --convert-vae-encoder --convert-unet --unet-support-controlnet --convert-text-encoder --model-version <MODEL-NAME>_diffusers --bundle-resources-for-swift-cli --attention-implementation SPLIT_EINSUM -o <MODEL-NAME>_split-einsum && python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --model-version <MODEL-NAME>_diffusers --bundle-resources-for-swift-cli --attention-implementation SPLIT_EINSUM -o <MODEL-NAME>_split-einsum
This generates the Core ML mlpackage files and a Resources folder in the directory specified by -o.
The Resources folder contains TextEncoder.mlmodelc, Unet.mlmodelc, VAEEncoder.mlmodelc, VAEDecoder.mlmodelc, merges.txt, and vocab.json.
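A quick way to confirm that the conversion finished is to check the Resources folder for the files listed above. This is a minimal sketch: the helper name missing_resources is hypothetical, and the expected list assumes the exact options used in this article. Other options, such as chunking the Unet, produce differently named files, so adjust the list accordingly.

```python
from pathlib import Path

# Files this article's conversion command is expected to produce.
EXPECTED_RESOURCES = [
    "TextEncoder.mlmodelc",
    "Unet.mlmodelc",
    "VAEEncoder.mlmodelc",
    "VAEDecoder.mlmodelc",
    "merges.txt",
    "vocab.json",
]


def missing_resources(resources_dir, expected=EXPECTED_RESOURCES):
    """Return the expected entries that do not exist inside `resources_dir`."""
    root = Path(resources_dir)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    # Placeholder path: the Resources folder inside your -o output directory.
    missing = missing_resources("<MODEL-NAME>_split-einsum/Resources")
    print("Missing:", missing if missing else "nothing")
```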
Conversion script arguments
--model-version
Specify either a Hugging Face Hub repository name or the path to a locally generated diffusers folder.
--quantize-nbits
Quantizes the Unet and TextEncoder to the specified number of bits (8, 6, etc.).
This reduces model size and can increase inference speed.
--chunk-unet
Splits the Unet into two files.
This is required on iOS/iPadOS when the model is not quantized to 6 bits or less.
--attention-implementation
SPLIT_EINSUM optimizes the transformer for the Neural Engine and is recommended for iPhone and iPad.
ORIGINAL is intended for Mac.
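The rules above can be summarized as a small decision function. The sketch below is only an illustration of this article's recommendations: the function name recommended_flags and the "ios" / "mac" target labels are hypothetical, and the output is meant to be appended to the torch2coreml command.

```python
def recommended_flags(target, quantize_nbits=None):
    """Return extra torch2coreml flags for a target ("ios" or "mac").

    `quantize_nbits` is the bit width passed to --quantize-nbits, or None
    if the model is not quantized.
    """
    if target not in ("ios", "mac"):
        raise ValueError(f"unknown target: {target!r}")
    flags = []
    if quantize_nbits is not None:
        flags += ["--quantize-nbits", str(quantize_nbits)]
    if target == "ios":
        # iPhone and iPad: optimize attention for the Neural Engine.
        flags += ["--attention-implementation", "SPLIT_EINSUM"]
        # Chunking is needed unless the model is quantized to 6 bits or less.
        if quantize_nbits is None or quantize_nbits > 6:
            flags.append("--chunk-unet")
    else:
        flags += ["--attention-implementation", "ORIGINAL"]
    return flags


if __name__ == "__main__":
    print(recommended_flags("ios"))
    print(recommended_flags("ios", quantize_nbits=6))
    print(recommended_flags("mac", quantize_nbits=8))
```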
Use in Swift
To use the model in a project, load the Resources folder from Swift.
(The mlpackage files are probably unnecessary unless you want to quantize the model later.)
Add the package
Add Apple’s ml-stable-diffusion repository as a Swift package.
Initializing the model
Add the generated Resources folder to your project in Xcode via File → Add Files to…
import StableDiffusion
guard let resourceURL = Bundle.main.url(forResource: "Resources", withExtension: nil) else { return }
do {
    pipeline = try StableDiffusionPipeline(resourcesAt: resourceURL, controlNet: [])
    try pipeline.loadResources()
} catch let error {
    print(error)
}
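Execution
var config = StableDiffusionPipeline.Configuration(prompt: "cat")
config.negativePrompt = ""
config.seed = 1
// generateImages returns an array of optional images.
let images = try pipeline.generateImages(configuration: config, progressHandler: { progress in
    print(progress.step)
    return true // return false to cancel generation
})
// Unwrap safely instead of force-unwrapping the first element.
guard let cgImage = images.first ?? nil else { return }
let resultImage = UIImage(cgImage: cgImage)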
A convenient repository for use on Mac
Download the Mac sample app from the repository below and place the converted model in its models folder to use it right away.
Notes
You can also run the conversion on Colab or a similar service.
Compiling to mlmodelc, however, requires a Mac with Xcode.