AI that can generate various images of a person from a photo

MLBoy
2 min read · Feb 16, 2024


You can describe any scene in text and get an image with that person's face in it.

I'll try it with this person.

Who is he? This is Jeff, from Pexels.

https://www.pexels.com/@jeffreyreed/
A single photo is enough.
Enter a text prompt describing the photo you want of this person.

a man img wearing a kimono and juggling snakes, in the sea

A fun image is generated.

This can be done with a model called PhotoMaker.
It’s open source.

How to use

Install

pip install diffusers
pip install git+https://github.com/TencentARC/PhotoMaker.git
pip install accelerate
git clone https://github.com/TencentARC/PhotoMaker.git
cd PhotoMaker/
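
Before moving on, it can help to confirm that the packages are importable in the current environment. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for this example and is not part of PhotoMaker or diffusers.

```python
import importlib.util


def missing_packages(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the packages used in this article.
required = ["torch", "diffusers", "accelerate", "photomaker", "huggingface_hub"]
print("Missing packages:", missing_packages(required))
```

An empty list means everything needed by the code below was found.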

Model pipeline initialization

import torch
import numpy as np
import random
import os
from PIL import Image

from diffusers.utils import load_image
from diffusers import EulerDiscreteScheduler, DDIMScheduler
from huggingface_hub import hf_hub_download
from photomaker import PhotoMakerStableDiffusionXLPipeline

base_model_path = 'SG161222/RealVisXL_V3.0'
device = "cuda"
photomaker_ckpt = hf_hub_download(repo_id="TencentARC/PhotoMaker", filename="photomaker-v1.bin", repo_type="model")

pipe = PhotoMakerStableDiffusionXLPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    variant="fp16",
).to(device)

pipe.load_photomaker_adapter(
    os.path.dirname(photomaker_ckpt),
    subfolder="",
    weight_name=os.path.basename(photomaker_ckpt),
    trigger_word="img"
)
pipe.id_encoder.to(device)

#pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
#pipe.fuse_lora()

pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
# pipe.set_adapters(["photomaker"], adapter_weights=[1.0])
pipe.fuse_lora()
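
In the listing above, `load_photomaker_adapter` is given the checkpoint directory and the file name as separate arguments, which is why the downloaded path is split with `os.path.dirname` and `os.path.basename`. Here is a small sketch of that split, using a made-up path (the real one depends on the local Hugging Face cache):

```python
import os


def split_checkpoint_path(ckpt_path):
    """Split a checkpoint path into (directory, file name)."""
    return os.path.dirname(ckpt_path), os.path.basename(ckpt_path)


# Hypothetical path, for illustration only.
example_path = "/tmp/hf_cache/photomaker-v1.bin"
ckpt_dir, weight_name = split_checkpoint_path(example_path)
print(ckpt_dir)     # /tmp/hf_cache
print(weight_name)  # photomaker-v1.bin
```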

Run

Enter the text of the image you want to generate in the prompt and run it.
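
In PhotoMaker prompts, the trigger word (`img` here) has to come right after a class word such as `man` or `woman`. The check below is a rough sketch for catching a missing or misplaced trigger word before running the pipeline. `check_prompt`, its list of class words, and the "exactly once" rule are assumptions made for this example, not the library's own validation.

```python
CLASS_WORDS = {"man", "woman", "boy", "girl", "person"}  # assumed list, not exhaustive


def check_prompt(prompt, trigger_word="img"):
    """Return True if `trigger_word` appears exactly once, right after a class word."""
    # Drop surrounding commas and periods, and lowercase each word.
    words = [w.strip(",.").lower() for w in prompt.split()]
    positions = [i for i, w in enumerate(words) if w == trigger_word]
    if len(positions) != 1:
        return False
    i = positions[0]
    return i > 0 and words[i - 1] in CLASS_WORDS


print(check_prompt("a man img wearing a kimono and juggling snakes, in the sea"))  # True
print(check_prompt("a man wearing a kimono img"))  # False
```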

input_folder_name = 'jeff' # input images directory
image_basename_list = os.listdir(input_folder_name)
image_path_list = sorted([os.path.join(input_folder_name, basename) for basename in image_basename_list])

input_id_images = []
for image_path in image_path_list:
    input_id_images.append(load_image(image_path))

## Note that the trigger word `img` must follow the class word for personalization
prompt = "close up portrait, a man img wearing a kimono and juggling snakes, in the sea, face, high quality, film grain"
negative_prompt = "(asymmetry, worst quality, low quality, illustration, 3d, 2d, painting, cartoons, sketch)"
generator = torch.Generator(device=device).manual_seed(42)

## Parameter setting
num_steps = 50
style_strength_ratio = 30
start_merge_step = int(float(style_strength_ratio) / 100 * num_steps)
if start_merge_step > 30:
    start_merge_step = 30

images = pipe(
    prompt=prompt,
    input_id_images=input_id_images,
    negative_prompt=negative_prompt,
    num_images_per_prompt=4,
    num_inference_steps=num_steps,
    start_merge_step=start_merge_step,
    generator=generator,
).images

save_path = "./outputs"
os.makedirs(save_path, exist_ok=True)
for idx, image in enumerate(images):
    image.save(os.path.join(save_path, f"photomaker_{idx:02d}.png"))
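
The parameter section of the listing turns `style_strength_ratio`, a percentage, into the step index passed as `start_merge_step`, and caps it at 30. Here is the same arithmetic as a standalone function, written as a sketch (the function name is made up here):

```python
def compute_start_merge_step(style_strength_ratio, num_steps, max_step=30):
    """Convert a percentage of the total steps into a step index, capped at `max_step`."""
    step = int(float(style_strength_ratio) / 100 * num_steps)
    return min(step, max_step)


print(compute_start_merge_step(30, 50))  # 15
print(compute_start_merge_step(80, 50))  # 30 (40 is capped at 30)
```

With the values used in this article (30 percent of 50 steps), the result is 15.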

The output is a list of PIL Images.
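
One thing to watch in the listing above: `os.listdir` returns every entry in the input folder, so a stray file such as `.DS_Store` would also be handed to `load_image`. Here is a hedged sketch that filters by extension first; `list_image_paths` and the extension set are choices made for this example.

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}  # assumed set; adjust as needed


def list_image_paths(folder):
    """Return sorted paths of the files in `folder` that have an image extension."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    )
```

It could stand in for the two lines that build `image_path_list`, for example `image_path_list = list_image_paths('jeff')`.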

A person who has completely become a cat person

🐣

I’m a freelance engineer.
Work inquiries
Please feel free to contact me with a brief description of what you want to build.
rockyshikoku@gmail.com

I build applications using machine learning and AR technology.

I share information about machine learning and AR.

