How to use the TensorFlow Object Detection API (with the Colab sample notebooks)

MLBoy
22 min read · May 13, 2021

After reading this, you will know how to use the TensorFlow Object Detection API.

Inference

The sample notebook

You can try inference with the TensorFlow Object Detection API by simply running the cells in the sample notebook one by one. If you change the image_path in the last cell, you can run object detection on your own images.

The TensorFlow Object Detection API provides a lot of models!

The Code

0. Install TensorFlow 2

!pip install -U --pre tensorflow=="2.2.0"

1. Clone “Models” from the TensorFlow repository

import os 
import pathlib
# If we are somewhere inside the "models" repository, move up to its parent directory.
# If "models" doesn't exist yet, clone it.
if "models" in pathlib.Path.cwd().parts:
  while "models" in pathlib.Path.cwd().parts:
    os.chdir('..')
elif not pathlib.Path('models').exists():
  !git clone --depth 1 https://github.com/tensorflow/models

2. Install the Object Detection API and the modules required

%%bash
# Run this whole cell as shell commands.
cd models/research/
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
python -m pip install .

3. Import the modules

import matplotlib 
import matplotlib.pyplot as plt
import io
import scipy.misc
import numpy as np
from six import BytesIO
from PIL import Image, ImageDraw, ImageFont
import tensorflow as tf
from object_detection.utils import label_map_util
from object_detection.utils import config_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.builders import model_builder
%matplotlib inline

4. The functions for reading the images

def load_image_into_numpy_array(path):
  """Read an image and put it into a numpy array.

  Puts image into numpy array to feed into tensorflow graph.
  Note that by convention we put it into a numpy array with shape
  (height, width, channels), where channels=3 for RGB.

  Args:
    path: the file path to the image

  Returns:
    uint8 numpy array with shape (img_height, img_width, 3)
  """
  img_data = tf.io.gfile.GFile(path, 'rb').read()
  image = Image.open(BytesIO(img_data))
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)

def get_keypoint_tuples(eval_config):
  """Return a tuple list of keypoint edges from the eval config.

  Args:
    eval_config: an eval config containing the keypoint edges

  Returns:
    a list of edge tuples, each in the format (start, end)
  """
  tuple_list = []
  kp_list = eval_config.keypoint_edge
  for edge in kp_list:
    tuple_list.append((edge.start, edge.end))
  return tuple_list

5. Download the model

!wget http://download.tensorflow.org/models/object_detection/tf2/20200713/centernet_hg104_512x512_coco17_tpu-8.tar.gz
!tar -xf centernet_hg104_512x512_coco17_tpu-8.tar.gz

You can get the model you want from the Model Zoo.

You can see the download URL by hovering the mouse over a model name in the link above.

The model zoo

It is also interesting to compare the benchmarks of the models.

Once the download and extraction complete, you get a directory containing the checkpoint, saved_model, and pipeline.config.
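For reference, the extracted directory looks roughly like this (the exact contents can vary slightly between models):

centernet_hg104_512x512_coco17_tpu-8/
├── checkpoint/       # ckpt-0.data-*, ckpt-0.index, checkpoint
├── saved_model/      # SavedModel export
└── pipeline.config   # configuration used to train this model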

6. Read the pipeline config (the configurations of the model), and build the model.

# The path to the pipeline config.
pipeline_config = "./centernet_hg104_512x512_coco17_tpu-8/pipeline.config"
# The path to the checkpoint.
model_dir = "./centernet_hg104_512x512_coco17_tpu-8/checkpoint"
# Reading the model configurations.
configs = config_util.get_configs_from_pipeline_file(pipeline_config)
model_config = configs['model']
# Build the model with the configurations read.
detection_model = model_builder.build(model_config=model_config, is_training=False)
# Restore the weights from the checkpoint.
ckpt = tf.compat.v2.train.Checkpoint(model=detection_model)
ckpt.restore(os.path.join(model_dir, 'ckpt-0')).expect_partial()

The repo also has a "configs" folder with a collection of config files, but they differ slightly from the configs bundled with the downloaded models, so we use the downloaded one here.

7. Prepare the inference function

def get_model_detection_function(model):
  """Get a tf.function for detection."""

  @tf.function
  def detect_fn(image):
    """Detect objects in image."""
    image, shapes = model.preprocess(image)
    prediction_dict = model.predict(image, shapes)
    detections = model.postprocess(prediction_dict, shapes)
    return detections, prediction_dict, tf.reshape(shapes, [-1])

  return detect_fn

detect_fn = get_model_detection_function(detection_model)

8. Prepare the labels

For inference, you need the labels of the objects that were used in training.

You can find the labels in "models/research/object_detection/data/" in the repository. We use mscoco_label_map.pbtxt because our model was trained on the COCO dataset.

label_map_path = './models/research/object_detection/data/mscoco_label_map.pbtxt'
label_map = label_map_util.load_labelmap(label_map_path)
categories = label_map_util.convert_label_map_to_categories(
    label_map,
    max_num_classes=label_map_util.get_max_label_map_index(label_map),
    use_display_name=True)
category_index = label_map_util.create_category_index(categories)
label_map_dict = label_map_util.get_label_map_dict(label_map, use_display_name=True)

9. Run the object detection with your own images

Upload your own images to Colab, then set the path to the images to image_path.

If your images have 4 channels (RGBA), you need to convert them to 3 channels first.
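As a minimal sketch (the file name here is just an example), you can drop the alpha channel with PIL before converting to a numpy array:

from PIL import Image
import numpy as np

# Convert an RGBA (4-channel) image to RGB (3 channels).
rgba_image = Image.open('my_image.png')          # example path of an image with alpha
rgb_image = rgba_image.convert('RGB')            # drops the alpha channel
image_np = np.array(rgb_image).astype(np.uint8)  # shape (height, width, 3)

The detection code itself is below.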

image_dir = 'models/research/object_detection/test_images/'
image_path = os.path.join(image_dir, 'image2.jpg')
image_np = load_image_into_numpy_array(image_path)
# Things to try:
# Flip horizontally
# image_np = np.fliplr(image_np).copy()
# Convert image to grayscale
# image_np = np.tile(
# np.mean(image_np, 2, keepdims=True), (1, 1, 3)).astype(np.uint8)
input_tensor = tf.convert_to_tensor(np.expand_dims(image_np, 0), dtype=tf.float32)
detections, predictions_dict, shapes = detect_fn(input_tensor)
label_id_offset = 1
image_np_with_detections = image_np.copy()
# Use keypoints if available in detections
keypoints, keypoint_scores = None, None
if 'detection_keypoints' in detections:
  keypoints = detections['detection_keypoints'][0].numpy()
  keypoint_scores = detections['detection_keypoint_scores'][0].numpy()
viz_utils.visualize_boxes_and_labels_on_image_array(
image_np_with_detections,
detections['detection_boxes'][0].numpy(),
(detections['detection_classes'][0].numpy() + label_id_offset).astype(int),
detections['detection_scores'][0].numpy(),
category_index,
use_normalized_coordinates=True,
max_boxes_to_draw=200,
min_score_thresh=.30,
agnostic_mode=False,
keypoints=keypoints,
keypoint_scores=keypoint_scores,
keypoint_edges=get_keypoint_tuples(configs['eval_config']))
plt.figure(figsize=(12,16))
plt.imshow(image_np_with_detections)
plt.show()

The boxes, the labels and the confidences will be displayed.

Simplified few-shot training

The sample notebook

You can detect the objects you want by fine-tuning a pre-trained model.

After training, you can save and restore the model trained on your own objects.

This is transfer learning of the last layers of the pre-trained model.

The Code

0. Install TensorFlow 2

!pip install -U --pre tensorflow=="2.2.0"

1. Clone “Models” from the TensorFlow repository

import os 
import pathlib
# If we are somewhere inside the "models" repository, move up to its parent directory.
# If "models" doesn't exist yet, clone it.
if "models" in pathlib.Path.cwd().parts:
  while "models" in pathlib.Path.cwd().parts:
    os.chdir('..')
elif not pathlib.Path('models').exists():
  !git clone --depth 1 https://github.com/tensorflow/models

2. Install the Object Detection API and the modules required

%%bash
# Run this whole cell as shell commands.
cd models/research/
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
python -m pip install .

3. Import the modules

import matplotlib 
import matplotlib.pyplot as plt
import io
import scipy.misc
import numpy as np
from six import BytesIO
from PIL import Image, ImageDraw, ImageFont
import tensorflow as tf
from object_detection.utils import label_map_util
from object_detection.utils import config_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.builders import model_builder
%matplotlib inline

4. The functions for reading the images

def load_image_into_numpy_array(path):
  """Read an image and put it into a numpy array.

  Puts image into numpy array to feed into tensorflow graph.
  Note that by convention we put it into a numpy array with shape
  (height, width, channels), where channels=3 for RGB.

  Args:
    path: the file path to the image

  Returns:
    uint8 numpy array with shape (img_height, img_width, 3)
  """
  img_data = tf.io.gfile.GFile(path, 'rb').read()
  image = Image.open(BytesIO(img_data))
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)

5. Function to visualize the results

def plot_detections(image_np,
                    boxes,
                    classes,
                    scores,
                    category_index,
                    figsize=(12, 16),
                    image_name=None):
  """Wrapper function to visualize detections.

  Args:
    image_np: uint8 numpy array with shape (img_height, img_width, 3)
    boxes: a numpy array of shape [N, 4]
    classes: a numpy array of shape [N]. Note that class indices are 1-based,
      and match the keys in the label map.
    scores: a numpy array of shape [N] or None. If scores=None, then
      this function assumes that the boxes to be plotted are groundtruth
      boxes and plots all boxes as black with no classes or scores.
    category_index: a dict containing category dictionaries (each holding
      category index `id` and category name `name`) keyed by category indices.
    figsize: size for the figure.
    image_name: a name for the image file.
  """
  image_np_with_annotations = image_np.copy()
  viz_utils.visualize_boxes_and_labels_on_image_array(
      image_np_with_annotations,
      boxes,
      classes,
      scores,
      category_index,
      use_normalized_coordinates=True,
      min_score_thresh=0.8)
  if image_name:
    plt.imsave(image_name, image_np_with_annotations)
  else:
    plt.imshow(image_np_with_annotations)

6. Prepare the images and the label map, the annotations data

Things required.

1. Array of paths to the images.

2. Label map (a dictionary mapping each ID to a label name)

3. Array of IDs

4. Array of bounding boxes

<Example>

# Array of paths to the images
train_image_filenames = [
'./datasets/train_images/train_image0001.jpg',
'./datasets/train_images/train_image0002.jpg'
]
# Label map ids start from "1"
category_index = {
1: {'id': 1, 'name': 'cat'},
2: {'id': 2, 'name': 'dog'}
}

# Number of classes
num_classes = 2
# Array of IDs
gt_labels = [
np.array([1,1]),
np.array([1,2,2])
]
# Bounding boxes. Numpy array of [ miny, minx, maxy, maxx ]
gt_boxes = [
np.array([[0.436, 0.591, 0.629, 0.712],[0.539, 0.583, 0.73, 0.71]], dtype=np.float32),
np.array([[0.464, 0.414, 0.626, 0.548],[0.313, 0.308, 0.648, 0.526],[0.256, 0.444, 0.484, 0.629]], dtype=np.float32)
]

<Requirements>

Images have to be resized to the input size of the model.

The indices in the arrays of images, labels, and boxes have to correspond to each other.

<You can resize a batch of images with this code>

import os 
import glob
from PIL import Image
src = glob.glob('./dataset/*.jpg')  # Paths to the original images.
dst = './dataset_resized/'          # Destination directory for saving.
width = 513   # desired width
height = 513  # desired height
os.makedirs(dst, exist_ok=True)     # make sure the destination directory exists
for f in src:
  img = Image.open(f)
  img = img.resize((width, height))
  img.save(dst + os.path.basename(f))

7. Put images in numpy array

train_image_dir = 'models/research/object_detection/test_images/ducky/train/' # Path to the directory of images
train_images_np = []
for filename in train_image_filenames:
  train_images_np.append(load_image_into_numpy_array(filename))
# Display the first image.
plt.imshow(train_images_np[0])
plt.show()

8. Put class labels in one hot tensor, put images in tensor, put boxes in tensor

A "one-hot" vector is an array of 0s and 1s. It represents a number by putting a 1 at that index.

With 2 classes, "1" is [1,0] as one-hot, and "2" is [0,1].

<Example of one hot>

[1,1,2]
can be represented as one-hot:
array([[1., 0.], [1., 0.], [0., 1.]], dtype=float32)
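As a minimal standalone sketch of this conversion (mirroring what the code below does), tf.one_hot produces exactly this once the 1-based labels are shifted down by label_id_offset:

import numpy as np
import tensorflow as tf

gt_label_np = np.array([1, 1, 2])   # 1-based class IDs
label_id_offset = 1                 # shift so classes start at index 0
zero_indexed = tf.convert_to_tensor(gt_label_np - label_id_offset)
one_hot = tf.one_hot(zero_indexed, depth=2)
print(one_hot.numpy())              # [[1. 0.] [1. 0.] [0. 1.]]

The full conversion over all training images, boxes, and labels follows.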
# Convert class labels to one-hot; convert everything to tensors.
# The `label_id_offset` here shifts all classes by a certain number of indices;
# we do this here so that the model receives one-hot labels where non-background
# classes start counting at the zeroth index. This is ordinarily just handled
# automatically in our training binaries, but we need to reproduce it here.
label_id_offset = 1
train_image_tensors = []
gt_classes_one_hot_tensors = []
gt_box_tensors = []
for (train_image_np, gt_box_np, gt_label_np) in zip(
    train_images_np, gt_boxes, gt_labels):
  train_image_tensors.append(tf.expand_dims(tf.convert_to_tensor(
      train_image_np, dtype=tf.float32), axis=0))  # put the image in a tensor
  gt_box_tensors.append(tf.convert_to_tensor(gt_box_np, dtype=tf.float32))  # put the boxes in a tensor
  zero_indexed_groundtruth_classes = tf.convert_to_tensor(
      gt_label_np - label_id_offset)  # shift labels so they start at 0
  gt_classes_one_hot_tensors.append(tf.one_hot(
      zero_indexed_groundtruth_classes, num_classes))  # label tensor to one-hot
print('Done prepping data.')

9. Visualize ground truth boxes

dummy_scores = np.array([1.0], dtype=np.float32)  # temporarily give boxes a score of 100%
plt.figure(figsize=(30, 15))
for idx in range(5):
  plt.subplot(2, 3, idx+1)
  plot_detections(
      train_images_np[idx],
      gt_boxes[idx],
      gt_labels[idx],
      dummy_scores, category_index)
plt.show()

10. Build the model and restore the weights

Restore weights except the last layer. Only the last layer is initialized with random weights for training.

In this article, we use RetinaNet with a ResNet backbone.

Object Detection API has a lot of models.

# Download the model.
!wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz # you can get the URLs of other models in the link above.
!tar -xf ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz
!mv ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint models/research/object_detection/test_data/

Each pre-trained model has its own pipeline config, in which parameters such as the number of classes are written. The pipeline config files are in the Object Detection repo, or in the directory of the downloaded and extracted model.

Rewrite the number of classes in the config file to match your own dataset.

The "head" specifies which layers to restore from the checkpoint. This time we don't restore the weights of the classification head, so we only specify the box regression head.

tf.keras.backend.clear_session()
print('Building model and restoring weights for fine-tuning...', flush=True)
num_classes = 1 # number of classes of your dataset
pipeline_config = 'models/research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config'
checkpoint_path = 'models/research/object_detection/test_data/checkpoint/ckpt-0'

# Load pipeline config and build a detection model.
#
# Since we are working off of a COCO architecture which predicts 90
# class slots by default, override the `num_classes` field here.
configs = config_util.get_configs_from_pipeline_file(pipeline_config)
model_config = configs['model']
model_config.ssd.num_classes = num_classes
model_config.ssd.freeze_batchnorm = True
detection_model = model_builder.build(
model_config=model_config, is_training=True)

# Set up object-based checkpoint restore --- RetinaNet has two prediction
# `heads` --- one for classification, the other for box regression. We will
# restore the box regression head but initialize the classification head
# from scratch (we show the omission below by commenting out the line that
# we would add if we wanted to restore both heads)
fake_box_predictor = tf.compat.v2.train.Checkpoint(
_base_tower_layers_for_heads=detection_model._box_predictor._base_tower_layers_for_heads,
# _prediction_heads=detection_model._box_predictor._prediction_heads,
# (i.e., the classification head that we *will not* restore)
_box_prediction_head=detection_model._box_predictor._box_prediction_head,
)
fake_model = tf.compat.v2.train.Checkpoint(
_feature_extractor=detection_model._feature_extractor,
_box_predictor=fake_box_predictor)
ckpt = tf.compat.v2.train.Checkpoint(model=fake_model)
ckpt.restore(checkpoint_path).expect_partial()

# Run model through a dummy image (array of zero) so that variables are created
image, shapes = detection_model.preprocess(tf.zeros([1, 640, 640, 3]))
prediction_dict = detection_model.predict(image, shapes)
_ = detection_model.postprocess(prediction_dict, shapes)
print('Weights restored!')

11. Training

Training takes only a few minutes.

tf.keras.backend.set_learning_phase(True)

# These parameters can be tuned; since our training set has 5 images
# it doesn't make sense to have a much larger batch size, though we could
# fit more examples in memory if we wanted to.
batch_size = 4
learning_rate = 0.01
num_batches = 100

# Select variables in top layers to fine-tune.
trainable_variables = detection_model.trainable_variables
to_fine_tune = []
prefixes_to_train = [
'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalBoxHead',
'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalClassHead']
for var in trainable_variables:
  if any([var.name.startswith(prefix) for prefix in prefixes_to_train]):
    to_fine_tune.append(var)

# Set up forward + backward pass for a single train step.
def get_model_train_step_function(model, optimizer, vars_to_fine_tune):
  """Get a tf.function for a training step."""

  # Use tf.function for a bit of speed.
  # Comment out the tf.function decorator if you want the inside of the
  # function to run eagerly.
  @tf.function
  def train_step_fn(image_tensors,
                    groundtruth_boxes_list,
                    groundtruth_classes_list):
    """A single training iteration.

    Args:
      image_tensors: A list of [1, height, width, 3] Tensor of type tf.float32.
        Note that the height and width can vary across images, as they are
        reshaped within this function to be 640x640.
      groundtruth_boxes_list: A list of Tensors of shape [N_i, 4] with type
        tf.float32 representing groundtruth boxes for each image in the batch.
      groundtruth_classes_list: A list of Tensors of shape [N_i, num_classes]
        with type tf.float32 representing groundtruth classes for each image in
        the batch.

    Returns:
      A scalar tensor representing the total loss for the input batch.
    """
    shapes = tf.constant(batch_size * [[640, 640, 3]], dtype=tf.int32)
    model.provide_groundtruth(
        groundtruth_boxes_list=groundtruth_boxes_list,
        groundtruth_classes_list=groundtruth_classes_list)
    with tf.GradientTape() as tape:
      preprocessed_images = tf.concat(
          [detection_model.preprocess(image_tensor)[0]
           for image_tensor in image_tensors], axis=0)
      prediction_dict = model.predict(preprocessed_images, shapes)
      losses_dict = model.loss(prediction_dict, shapes)
      total_loss = (losses_dict['Loss/localization_loss'] +
                    losses_dict['Loss/classification_loss'])
    gradients = tape.gradient(total_loss, vars_to_fine_tune)
    optimizer.apply_gradients(zip(gradients, vars_to_fine_tune))
    return total_loss

  return train_step_fn

optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.9)
train_step_fn = get_model_train_step_function(
detection_model, optimizer, to_fine_tune)

import random  # needed for shuffling the example keys below

print('Start fine-tuning!', flush=True)
for idx in range(num_batches):
  # Grab keys for a random subset of examples
  all_keys = list(range(len(train_images_np)))
  random.shuffle(all_keys)
  example_keys = all_keys[:batch_size]

  # Note that we do not do data augmentation in this demo. If you want a
  # fun exercise, we recommend experimenting with random horizontal flipping
  # and random cropping :)
  gt_boxes_list = [gt_box_tensors[key] for key in example_keys]
  gt_classes_list = [gt_classes_one_hot_tensors[key] for key in example_keys]
  image_tensors = [train_image_tensors[key] for key in example_keys]

  # Training step (forward pass + backwards pass)
  total_loss = train_step_fn(image_tensors, gt_boxes_list, gt_classes_list)

  if idx % 10 == 0:
    print('batch ' + str(idx) + ' of ' + str(num_batches)
          + ', loss=' + str(total_loss.numpy()), flush=True)

print('Done fine-tuning!')

batch 0 of 100, loss=1.2068503
batch 10 of 100, loss=0.12002414
batch 20 of 100, loss=0.10228661
batch 30 of 100, loss=0.0361837
batch 40 of 100, loss=0.011348422
batch 50 of 100, loss=0.0028579112
batch 60 of 100, loss=0.0032960502
batch 70 of 100, loss=0.0023721359

12. Test with new images the model has never seen before

Put the test images in a numpy array and run the new model.

The results are returned as 100 bounding boxes, 100 labels, and 100 scores.

The i-th bounding box corresponds to the i-th label and the i-th score.

The API always returns 100 results.

You can take the high-scoring results among these 100 as the final detections and visualize them.

In the visualization function, the default threshold is 0.8.

In my test, when I could visually confirm 2 correct bounding boxes, exactly 2 results had scores over 0.5; the other scores were very low, around 0.02. So it is not difficult to find the boxes and labels we can be confident in (if the training succeeded).
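As a rough sketch of this filtering (score_threshold is an arbitrary example value, and detections comes from the detect function defined below):

label_id_offset = 1
score_threshold = 0.5  # arbitrary example value
scores = detections['detection_scores'][0].numpy()    # shape (100,)
boxes = detections['detection_boxes'][0].numpy()      # shape (100, 4)
classes = detections['detection_classes'][0].numpy()  # shape (100,)

keep = scores >= score_threshold  # boolean mask over the 100 results
print(boxes[keep], (classes[keep] + label_id_offset).astype(int), scores[keep])

The actual test loop is below.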

!pip install natsort
from natsort import natsorted
test_image_dir = './dataset/test'
test_images_np = []
file_names = os.listdir(test_image_dir)
test_paths = natsorted(file_names)
for test_path in test_paths:
  test_images_np.append(np.expand_dims(
      load_image_into_numpy_array(os.path.join(test_image_dir, test_path)), axis=0))
# Again, comment out this decorator if you want to run inference eagerly
@tf.function
def detect(input_tensor):
  """Run detection on an input image.

  Args:
    input_tensor: A [1, height, width, 3] Tensor of type tf.float32.
      Note that height and width can be anything since the image will be
      immediately resized according to the needs of the model within this
      function.

  Returns:
    A dict containing 3 Tensors (`detection_boxes`, `detection_classes`,
    and `detection_scores`).
  """
  preprocessed_image, shapes = detection_model.preprocess(input_tensor)
  prediction_dict = detection_model.predict(preprocessed_image, shapes)
  return detection_model.postprocess(prediction_dict, shapes)

# Note that the first frame will trigger tracing of the tf.function, which will
# take some time, after which inference should be fast.

label_id_offset = 1
for i in range(len(test_images_np)):
  input_tensor = tf.convert_to_tensor(test_images_np[i], dtype=tf.float32)
  detections = detect(input_tensor)

  plot_detections(
      test_images_np[i][0],
      detections['detection_boxes'][0].numpy(),
      detections['detection_classes'][0].numpy().astype(np.uint32)
      + label_id_offset,
      detections['detection_scores'][0].numpy(),
      category_index, figsize=(15, 20),
      image_name="gif_frame_" + ('%02d' % i) + ".jpg")
  print(detections)

# The output is shown below. Although truncated, there are 100 entries of each.
# 'detection_boxes', 'detection_classes' and 'detection_scores' are the final results.
# 'detection_anchor_indices', 'raw_detection_boxes' and 'raw_detection_scores' are
# intermediate data used to compute the final results (I think).

{'detection_anchor_indices': <tf.Tensor: shape=(1, 100), dtype=int32, numpy= array([[49416, 50753, … 51112, 26364]], dtype=int32)>,

'detection_boxes': <tf.Tensor: shape=(1, 100, 4), dtype=float32, numpy= array([[[0.43758985, 0.7465773 , 0.63472795, 0.9252911 ], [0.1677289 , 0.6480559 , 0.890319 , 1. ], … [0.40918362, 0.3183376 , 1. , 0.9439225 ], [0.639281 , 0.8898159 , 0.7221419 , 0.97141266]]], dtype=float32)>,

'detection_classes': <tf.Tensor: shape=(1, 100), dtype=float32, numpy= array([[0., 0., … 1., 0.]], dtype=float32)>,

'detection_multiclass_scores': <tf.Tensor: shape=(1, 100, 3), dtype=float32, numpy= array([[[5.47093153e-03, 3.10172260e-01, 1.57460570e-03], [3.18378210e-03, 2.98067868e-01, 1.27398968e-03], … [1.98462605e-03, 7.14010894e-02, 1.30185485e-03]]], dtype=float32)>,

'detection_scores': <tf.Tensor: shape=(1, 100), dtype=float32, numpy= array([[0.31017226, 0.29806787, 0.26563442, 0.23411435, 0.22276634, 0.21396422, 0.20716852, 0.18401867, 0.17277354, 0.16559672, … 0.14484483, 0.14467192, 0.13986477, 0.13589099, 0.13474342, 0.07329145, 0.0723871 , 0.07223672, 0.07157233, 0.07140109]], dtype=float32)>,

'num_detections': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([100.], dtype=float32)>,

'raw_detection_boxes': <tf.Tensor: shape=(1, 51150, 4), dtype=float32, numpy= array([[[-3.6555314e-03, -1.2414398e-02, 1.4784184e-02, 1.0699857e-02], [-9.5088510e-03, -2.2957223e-02, 3.9035182e-02, 1.7941574e-02], …, [ 3.1216300e-01, 6.6491508e-01, 1.3707981e+00, 1.0911807e+00], [ 6.6202581e-02, 4.6959493e-01, 1.5031044e+00, 1.2707567e+00]]], dtype=float32)>,

'raw_detection_scores': <tf.Tensor: shape=(1, 51150, 3), dtype=float32, numpy= array([[[9.3629062e-03, 7.2856843e-03, 4.1753352e-03], [4.8707724e-03, 1.5826846e-06, 3.3203959e-03], …, [7.2056055e-03, 1.9515157e-02, 1.4944762e-02], [8.9454055e-03, 1.9429326e-03, 1.5336275e-03]]], dtype=float32)>}

13. Display the results as a GIF

import glob
import imageio
from IPython.display import display, Image as IPyImage

imageio.plugins.freeimage.download()

anim_file = 'duckies_test.gif'

filenames = glob.glob('gif_frame_*.jpg')
filenames = sorted(filenames)
last = -1
images = []
for filename in filenames:
  image = imageio.imread(filename)
  images.append(image)

imageio.mimsave(anim_file, images, 'GIF-FI', fps=5)

display(IPyImage(open(anim_file, 'rb').read()))

14. Save the new model

import os  
ckpt_path = 'ckpt/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8'
os.makedirs(ckpt_path, exist_ok=True)
checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=detection_model)
manager = tf.train.CheckpointManager(checkpoint, directory=ckpt_path, max_to_keep=5)
manager.save()

15. Restore the model

trained_model = model_builder.build(model_config=model_config, is_training=False)
ckpt_trained = tf.compat.v2.train.Checkpoint(model=trained_model)
# Generate the variables by running the model on dummy inputs.
image, shapes = trained_model.preprocess(tf.zeros([1, 640, 640, 3]))
prediction_dict = trained_model.predict(image, shapes)
_ = trained_model.postprocess(prediction_dict, shapes)
ckpt_trained.restore('ckpt/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/ckpt-1')
print('Restored!')

16. Run the restored model

Run the testing script in 12. above (rewrite "detection_model" in the script to "trained_model").
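For clarity, a minimal sketch of the same inference with the restored model (assuming test_images_np was prepared as in 12. and trained_model was restored in 15.) might look like this:

@tf.function
def detect_restored(input_tensor):
  """Run detection with the restored model."""
  preprocessed_image, shapes = trained_model.preprocess(input_tensor)
  prediction_dict = trained_model.predict(preprocessed_image, shapes)
  return trained_model.postprocess(prediction_dict, shapes)

input_tensor = tf.convert_to_tensor(test_images_np[0], dtype=tf.float32)
detections = detect_restored(input_tensor)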

Fully training the model with your own dataset

This is how to train an object detection model from scratch.

You can start from random weights (from scratch) or do transfer learning.

1. Clone “Models” from the TensorFlow repository

import os 
import pathlib
# If we are somewhere inside the "models" repository, move up to its parent directory.
# If "models" doesn't exist yet, clone it.
if "models" in pathlib.Path.cwd().parts:
  while "models" in pathlib.Path.cwd().parts:
    os.chdir('..')
elif not pathlib.Path('models').exists():
  !git clone --depth 1 https://github.com/tensorflow/models

2. Install the Object Detection API and the modules required

%%bash
# Run this whole cell as shell commands.
cd models/research/
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
python -m pip install .

3. Prepare the training data, label map, and validation data

Prepare the directory structure below.

.
├── data/
│   ├── eval-00000-of-00001.tfrecord   # evaluation data
│   ├── label_map.txt                  # label map
│   ├── train-00000-of-00002.tfrecord  # training data
│   └── train-00001-of-00002.tfrecord  # training data
└── models/
    └── my_model_dir/
        ├── eval/                      # generated by evaluation
        ├── my_model.config
        ├── model_ckpt-100-data@1      # generated by training
        ├── model_ckpt-100-index       # generated by training
        └── checkpoint                 # generated by training

<Convert Dataset to TFRecord>

Convert your datasets to TFRecord.

This function converts the data for one image into a tf_example.

import tensorflow as tf
from object_detection.utils import dataset_util
def create_tf_example(height,
width,
filename,
image_format,
xmins,xmaxs,
ymins,
ymaxs,
classes_text,
classes):
  # TODO(user): Populate the following variables from your example.
  # height = None              # Image height
  # width = None               # Image width
  # filename = None            # Filename of the image. Empty if image is not from file
  # encoded_image_data = None  # Encoded image bytes
  # image_format = None        # b'jpeg' or b'png'
  # xmins = []         # List of normalized left x coordinates in bounding box (1 per box)
  # xmaxs = []         # List of normalized right x coordinates in bounding box (1 per box)
  # ymins = []         # List of normalized top y coordinates in bounding box (1 per box)
  # ymaxs = []         # List of normalized bottom y coordinates in bounding box (1 per box)
  # classes_text = []  # List of string class names of bounding box (1 per box)
  # classes = []       # List of integer class ids of bounding box (1 per box)
  with tf.io.gfile.GFile(filename, 'rb') as fid:
    encoded_jpg = fid.read()
  # encoded_jpg_io = io.BytesIO(encoded_jpg)
  tf_example = tf.train.Example(features=tf.train.Features(feature={
      'image/height': dataset_util.int64_feature(height),
      'image/width': dataset_util.int64_feature(width),
      'image/filename': dataset_util.bytes_feature(filename.encode('utf-8')),
      'image/source_id': dataset_util.bytes_feature(filename.encode('utf-8')),
      'image/encoded': dataset_util.bytes_feature(encoded_jpg),
      'image/format': dataset_util.bytes_feature(image_format),
      'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
      'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
      'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
      'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
      'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
      'image/object/class/label': dataset_util.int64_list_feature(classes),
  }))
  return tf_example

Make a tf_example for each image's data and write it with TFRecordWriter.

For example, suppose you have annotation data like the sample below…

{
  "categories": [
    {"id": 1, "name": "cat"},
    {"id": 2, "name": "dog"}
  ],
  "annotations": [
    {
      "filename": "train_000.jpg",
      "image_height": 3840,
      "image_width": 2160,
      "labels": [1, 1, 2],
      "label_texts": ["cat", "cat", "dog"],
      "boxes": [
        [1250, 790, 1850, 1300],
        [920, 1230, 1310, 1550],
        [12, 1180, 550, 1450]
      ]
    },
    ...
  ]
}
# a box is [minx, miny, maxx, maxy]

Turn each image of the dataset into a tf_example and write them all to a TFRecord file.

String fields of tf.train.Feature only accept bytes, so you need to encode strings to bytes.

import tensorflow as tf
import os
import numpy as np
from PIL import Image
# from object_detection.utils import dataset_util
output_path = './data.tfrecords'
image_dir = './train_images/'
writer = tf.io.TFRecordWriter(output_path)
# `dataset` is your parsed annotation data (e.g. the JSON example above loaded with json.load).
annotations = dataset['annotations']
for annotation in annotations:
  if annotation['boxes'] != []:
    height = annotation['image_height']
    width = annotation['image_width']
    filename = image_dir + annotation['filename']
    image_format = b'jpeg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    for box in annotation['boxes']:
      xmins.append(box[0] / width)  # normalize to 0-1
      xmaxs.append(box[2] / width)
      ymins.append(box[1] / height)
      ymaxs.append(box[3] / height)
    classes_text = []
    for text in annotation['label_texts']:
      classes_text.append(text.encode('utf-8'))
    classes = []
    for label in annotation['labels']:
      classes.append(label)  # integer class IDs
    tf_example = create_tf_example(height, width, filename, image_format,
                                   xmins, xmaxs, ymins, ymaxs, classes_text, classes)
    writer.write(tf_example.SerializeToString())
writer.close()

<Write as sharded datasets>

A low-memory environment like Colab cannot handle one large TFRecord file.

If you have a large dataset, it is useful to split the TFRecord into multiple files.
According to the official documentation:

The tf.data.Dataset API can read input examples in parallel, improving throughput.

The tf.data.Dataset API can shuffle the examples better with sharded files, which improves performance of the model slightly.

import contextlib2 
from object_detection.dataset_tools import tf_record_creation_util
num_shards=10
output_filebase='./train_dataset.record'
with contextlib2.ExitStack() as tf_record_close_stack:
  output_tfrecords = tf_record_creation_util.open_sharded_output_tfrecords(
      tf_record_close_stack, output_filebase, num_shards)
  annotations = dataset['annotations']
  for i in range(len(annotations)):
    if annotations[i]['boxes'] != []:
      height = annotations[i]['image_height']
      width = annotations[i]['image_width']
      filename = image_dir + annotations[i]['filename']
      image_format = b'jpeg'
      xmins = []
      xmaxs = []
      ymins = []
      ymaxs = []
      for box in annotations[i]['boxes']:
        xmins.append(box[0] / width)
        xmaxs.append(box[2] / width)
        ymins.append(box[1] / height)
        ymaxs.append(box[3] / height)
      classes_text = []
      for text in annotations[i]['label_texts']:
        classes_text.append(text.encode('utf-8'))
      classes = []
      for label in annotations[i]['labels']:
        classes.append(label)  # integer class IDs
      tf_example = create_tf_example(height, width, filename, image_format,
                                     xmins, xmaxs, ymins, ymaxs, classes_text, classes)
      output_shard_index = i % num_shards
      output_tfrecords[output_shard_index].write(tf_example.SerializeToString())

This script produces the split files:

./train_dataset.record-00000-of-00010
./train_dataset.record-00001-of-00010
…
./train_dataset.record-00009-of-00010

When you use the sharded data for training, set the input path like below.

tf_record_input_reader {
  input_path: "/path/to/train_dataset.record-?????-of-00010"
}

<Make label_map.pbtxt>

Label map relates the label IDs and the label names.

This is written as label_map.pbtxt

item {
  id: 1
  name: 'Abyssinian'
}
item {
  id: 2
  name: 'american_bulldog'
}
item {
  id: 3
  name: 'american_pit_bull_terrier'
}

An easy way to write a label map in the correct format is to copy a sample label map from the repository and rewrite it for your dataset.

cp object_detection/data/pet_label_map.pbtxt data/my_label_map.pbtxt

<Make pipeline.config>

The file that sets the training configuration.

Whatever you write, add, or edit in this file is reflected in the training.

There are config files for each model in "object_detection/configs/tf2" in the repository, so you can edit them for your own training.

🐥 I've put comments (🐥) at the points you need to rewrite at a minimum.

model {
ssd {
num_classes: 90 🐥 Rewrite to the number of classes of your own dataset.
image_resizer {
fixed_shape_resizer {
height: 640
width: 640
}
}
feature_extractor {
type: "ssd_resnet50_v1_fpn_keras"
depth_multiplier: 1.0
min_depth: 16
conv_hyperparams {
regularizer {
l2_regularizer {
weight: 0.00039999998989515007
}
}
initializer {
truncated_normal_initializer {
mean: 0.0
stddev: 0.029999999329447746
}
}
activation: RELU_6
batch_norm {
decay: 0.996999979019165
scale: true
epsilon: 0.0010000000474974513
}
}
override_base_feature_extractor_hyperparams: true
fpn {
min_level: 3
max_level: 7
}
}
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
use_matmul_gather: true
}
}
similarity_calculator {
iou_similarity {
}
}
box_predictor {
weight_shared_convolutional_box_predictor {
conv_hyperparams {
regularizer {
l2_regularizer {
weight: 0.00039999998989515007
}
}
initializer {
random_normal_initializer {
mean: 0.0
stddev: 0.009999999776482582
}
}
activation: RELU_6
batch_norm {
decay: 0.996999979019165
scale: true
epsilon: 0.0010000000474974513
}
}
depth: 256
num_layers_before_predictor: 4
kernel_size: 3
class_prediction_bias_init: -4.599999904632568
}
}
anchor_generator {
multiscale_anchor_generator {
min_level: 3
max_level: 7
anchor_scale: 4.0
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
scales_per_octave: 2
}
}
post_processing {
batch_non_max_suppression {
score_threshold: 9.99999993922529e-09
iou_threshold: 0.6000000238418579
max_detections_per_class: 100
max_total_detections: 100
use_static_shapes: false
}
score_converter: SIGMOID
}
normalize_loss_by_num_matches: true
loss {
localization_loss {
weighted_smooth_l1 {
}
}
classification_loss {
weighted_sigmoid_focal {
gamma: 2.0
alpha: 0.25
}
}
classification_weight: 1.0
localization_weight: 1.0
}
encode_background_as_zeros: true
normalize_loc_loss_by_codesize: true
inplace_batchnorm_update: true
freeze_batchnorm: false
}
}
train_config {
batch_size: 64 🐥 If you don't have large memory, reduce batch size.
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
random_crop_image {
min_object_covered: 0.0
min_aspect_ratio: 0.75
max_aspect_ratio: 3.0
min_area: 0.75
max_area: 1.0
overlap_thresh: 0.0
}
}
sync_replicas: true
optimizer {
momentum_optimizer {
learning_rate {
cosine_decay_learning_rate {
learning_rate_base: 0.03999999910593033
total_steps: 25000
warmup_learning_rate: 0.013333000242710114
warmup_steps: 2000
}
}
momentum_optimizer_value: 0.8999999761581421
}
use_moving_average: false
}
fine_tune_checkpoint: "my_model_dir/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/saved_model/checkpoint/ckpt-01" 🐥 If you use a pre trained model and fine tune it, rewrite here to the path to the checkpoint of the pre trained model.
num_steps: 25000
startup_delay_steps: 0.0
replicas_to_aggregate: 8
max_number_of_boxes: 100
unpad_groundtruth_tensors: false
fine_tune_checkpoint_type: "classification" 🐥 Rewrite this to 'detection'
use_bfloat16: true
fine_tune_checkpoint_version: V2
}
train_input_reader {
label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" 🐥 Rewrite here to the path to the label map file.
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/train_dataset.record-?????-of-0010.tfrecord" 🐥 Rewrite this to the path to the TFRecords file of your training dataset.
}
}
eval_config {
metrics_set: "coco_detection_metrics"
use_moving_averages: false
}
eval_input_reader {
label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" 🐥 Rewrite here to the path to the label map file.
shuffle: false
num_epochs: 1
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/eval_dataset.record-?????-of-0010.tfrecord" 🐥 Rewrite this to the path to the TFRecords file of your evaluation dataset.
}
}

If you use a machine with little memory, make the batch size small; otherwise, training will crash.
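For example, on a Colab-sized machine you might reduce it in train_config (8 here is just an arbitrary example value):

train_config {
  batch_size: 8  # reduced from 64 to fit in memory
  ...
}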

If your checkpoint consists of 3 files like

checkpoint/
├── checkpoint
├── ckpt-0.data-00000-of-00001
├── ckpt-0.index

set your checkpoint path in the config file as

PATH/checkpoint/ckpt-0

If you have a single file called "model.ckpt", set

PATH/model.ckpt
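In other words, the relevant lines of pipeline.config would look something like this (the paths are placeholders):

fine_tune_checkpoint: "PATH/checkpoint/ckpt-0"  # or "PATH/model.ckpt" for a single-file checkpoint
fine_tune_checkpoint_type: "detection"          # as noted in the config above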

<Pre trained model>

You can train the model from scratch, but it takes a few days, so instead you can download a pre-trained model from the TensorFlow Model Zoo and fine-tune it.

Download paths are in the Model Zoo.

wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz 
tar -xf ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz

The extracted directory contains the checkpoint and the config file.

If you use a pre-trained model, you can rewrite this config file to set the various parameters for your model.

Run Training

python object_detection/model_main_tf2.py \
--pipeline_config_path="my_model_dir/pipeline.config" \
--model_dir="./my_model_dir" \
--alsologtostderr

If the training runs successfully, you will see messages like the ones below.

INFO:tensorflow:Step 100 per-step time 0.211s loss=35.350
I0102 05:04:25.884553 140388036892544 model_lib_v2.py:651] Step 100 per-step time 0.211s loss=35.350
INFO:tensorflow:Step 200 per-step time 0.218s loss=36.062
I0102 05:04:46.017316 140388036892544 model_lib_v2.py:651] Step 200 per-step time 0.218s loss=36.062
INFO:tensorflow:Step 300 per-step time 0.203s loss=35.008
I0102 05:05:06.347388 140388036892544 model_lib_v2.py:651] Step 300 per-step time 0.203s loss=35.008
INFO:tensorflow:Step 400 per-step time 0.219s loss=35.200

Evaluation

If you pass a checkpoint directory as an execution argument, the script runs in evaluation mode.

python object_detection/model_main_tf2.py \
  --pipeline_config_path="my_model_dir/pipeline.config" \
  --model_dir="/content/models/my_model_dir" \
  --checkpoint_dir="/content/models/my_model_dir" \
  --alsologtostderr
# checkpoint_dir: path to the checkpoints generated by training.

Saving the model

A checkpoint is saved in the checkpoint directory every 1000 steps during the training run.

checkpoint/
├── checkpoint
├── ckpt-01.data-00000-of-00001
├── ckpt-01.index
├── ckpt-02.data-00000-of-00001
├── ckpt-02.index

You can set how often checkpoints are saved and how they are rotated (by default, once 7 sets have accumulated, the oldest is replaced) with the arguments of model_main_tf2.py.

Restore the model

Import the modules

import matplotlib 
import matplotlib.pyplot as plt
import os
import io
import scipy.misc
import numpy as np
from six import BytesIO
from PIL import Image, ImageDraw, ImageFont
import tensorflow as tf
from object_detection.utils import label_map_util
from object_detection.utils import config_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.builders import model_builder
%matplotlib inline

Give the model builder the same pipeline.config file as you did during training to build the model structure, then restore the weights from the trained checkpoints.

pipeline_config = "my_model_dir/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/pipeline.config"
model_dir = "my_model_dir"  # Path to the checkpoint directory.
# Read the configuration.
configs = config_util.get_configs_from_pipeline_file(pipeline_config)
model_config = configs['model']
# Build the model from the configuration.
detection_model = model_builder.build(model_config=model_config, is_training=False)
# Restore the checkpoint, specifying the checkpoint number.
ckpt = tf.compat.v2.train.Checkpoint(model=detection_model)
ckpt.restore(os.path.join(model_dir, 'ckpt-20')).expect_partial()

Infer new images with the trained model

Prepare inference function

def get_model_detection_function(model):
  """Get a tf.function for detection."""

  @tf.function
  def detect_fn(image):
    """Detect objects in image."""
    image, shapes = model.preprocess(image)
    prediction_dict = model.predict(image, shapes)
    detections = model.postprocess(prediction_dict, shapes)
    return detections, prediction_dict, tf.reshape(shapes, [-1])

  return detect_fn

detect_fn = get_model_detection_function(detection_model)

Prepare a dictionary that associates the label ID with the label text by using the label map file used in the training.

label_map_path = 'data/label_map.pbtxt'
label_map = label_map_util.load_labelmap(label_map_path)
categories = label_map_util.convert_label_map_to_categories(
label_map,
max_num_classes=label_map_util.get_max_label_map_index(label_map),
use_display_name=True)
category_index = label_map_util.create_category_index(categories)
label_map_dict = label_map_util.get_label_map_dict(label_map, use_display_name=True)

Prepare a function to make an image a Numpy Array

def load_image_into_numpy_array(path):
  """Load an image into a numpy array to feed into the tensorflow graph.

  By convention, the array has shape (height, width, channels), where channels=3 for RGB.

  Args:
    path: the file path to the image

  Returns:
    uint8 numpy array with shape (img_height, img_width, 3)
  """
  img_data = tf.io.gfile.GFile(path, 'rb').read()
  image = Image.open(BytesIO(img_data))
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)

def get_keypoint_tuples(eval_config):
  """Return a tuple list of keypoint edges from the eval config.

  Args:
    eval_config: an eval config containing the keypoint edges

  Returns:
    a list of edge tuples, each in the format (start, end)
  """
  tuple_list = []
  kp_list = eval_config.keypoint_edge
  for edge in kp_list:
    tuple_list.append((edge.start, edge.end))
  return tuple_list

Inference

image_dir = 'test_images'
image_path = os.path.join(image_dir, 'test_0000.jpg')
image_np = load_image_into_numpy_array(image_path)
# Things to try:
# Flip horizontally
# image_np = np.fliplr(image_np).copy()
# Convert image to grayscale
# image_np = np.tile(
# np.mean(image_np, 2, keepdims=True), (1, 1, 3)).astype(np.uint8)
input_tensor = tf.convert_to_tensor(
np.expand_dims(image_np, 0), dtype=tf.float32)
detections, predictions_dict, shapes = detect_fn(input_tensor)
label_id_offset = 1
image_np_with_detections = image_np.copy()
# Use keypoints if available in detections
keypoints, keypoint_scores = None, None
if 'detection_keypoints' in detections:
  keypoints = detections['detection_keypoints'][0].numpy()
  keypoint_scores = detections['detection_keypoint_scores'][0].numpy()
viz_utils.visualize_boxes_and_labels_on_image_array(
    image_np_with_detections,
    detections['detection_boxes'][0].numpy(),
    (detections['detection_classes'][0].numpy() + label_id_offset).astype(int),
    detections['detection_scores'][0].numpy(),
    category_index,
    use_normalized_coordinates=True,
    max_boxes_to_draw=200,
    min_score_thresh=.15,
    agnostic_mode=False,
    keypoints=keypoints,
    keypoint_scores=keypoint_scores,
    keypoint_edges=get_keypoint_tuples(configs['eval_config']))
plt.figure(figsize=(12,16))
plt.imshow(image_np_with_detections)
plt.show()

Detection results are returned as 100 boxes, 100 scores, and 100 labels.

You will see the image with boxes drawn wherever the score exceeds the min_score_thresh argument of the visualization tool (0.15 in the code above).

The score threshold can be adjusted with arguments.
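For instance, to show only higher-confidence boxes, you could raise min_score_thresh in the visualization call above (0.5 here is just an example value):

viz_utils.visualize_boxes_and_labels_on_image_array(
    image_np_with_detections,
    detections['detection_boxes'][0].numpy(),
    (detections['detection_classes'][0].numpy() + label_id_offset).astype(int),
    detections['detection_scores'][0].numpy(),
    category_index,
    use_normalized_coordinates=True,
    min_score_thresh=0.5)  # raise or lower this to filter what is drawn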

🐣


Request for work:

rockyshikoku@gmail.com

We send information related to machine learning.
