Train DeepLab v3+ with your own dataset

4 min read · Oct 10, 2020

You can train DeepLab v3+ on your own dataset.
This article uses the official TensorFlow implementation.
Basic usage of DeepLab is documented in the official repository.
If that documentation leaves you with questions, this article walks through the steps.

The images above are from the PASCAL VOC dataset, but you can train on your own dataset in the same way.

Procedure

Step 1. Set up the model and modules

1. Clone the official model repository

git clone https://github.com/tensorflow/models

2. Work in the research directory.

cd models/research

3. When working in Colab, select TensorFlow 1.x.

%tensorflow_version 1.x

4. Install tf_slim.

pip install tf_slim

The required modules are:

Numpy
Pillow 1.0
tf Slim (which is included in the "tensorflow/models/research/" checkout)
Matplotlib
Tensorflow

If any of them are missing, install them.
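One way to see which of these modules still need installing is a small check like the one below. It is only a sketch: the import names (for example "PIL" for Pillow) are the usual ones, and the helper name missing_modules is made up for this example.

```python
import importlib.util

# Import names for the modules listed above (Pillow is imported as "PIL").
REQUIRED_MODULES = ["numpy", "PIL", "tf_slim", "matplotlib", "tensorflow"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Anything reported as missing can then be installed with pip.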

5. Add the tensorflow/models/research/ directory to PYTHONPATH so that you can use the library.

# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

Step 2. Prepare the data

Prepare the following:

1. Original images

2. Class (label) images

・ Image requirements

- Put each set of images in its own directory.
  Original images directory name: JPEGImages
  Class images directory name: SegmentationClass
- The original images and class images must match in number and in file name (name without extension).
- Image size is arbitrary. Keep in mind that images are cropped to 513,513 during training.
  (If you want the entire image to fit, 513,513 or smaller is recommended.)

Tools such as labelme can be used to create segmentation data.
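The requirement that image and label names match can be checked before moving on. The following is a minimal sketch, assuming both directories contain only the dataset files. The function name unmatched_stems and the "dataset" path in the usage comment are placeholders, not part of DeepLab.

```python
from pathlib import Path


def unmatched_stems(image_dir, label_dir):
    """Return (images without a label, labels without an image) by file stem."""
    image_stems = {p.stem for p in Path(image_dir).iterdir() if p.is_file()}
    label_stems = {p.stem for p in Path(label_dir).iterdir() if p.is_file()}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


# Example usage ("dataset" stands in for {DataDirectory}):
# no_label, no_image = unmatched_stems("dataset/JPEGImages",
#                                      "dataset/SegmentationClass")
# print("Images without a label:", no_label)
# print("Labels without an image:", no_image)
```

Both lists should be empty before you continue.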

Step 3. Preprocess the data

1. If the label images are color images, convert them to grayscale label images.

mkdir "{DataDirectory}/SegmentationClassRaw"

python deeplab/datasets/remove_gt_colormap.py \
  --original_gt_folder="{DataDirectory}/SegmentationClass" \
  --output_dir="{DataDirectory}/SegmentationClassRaw"

The converted grayscale images are saved in the SegmentationClassRaw directory.
Because each pixel holds only a small label value, the images look almost black.

2. Create text files listing the image file names.

image0
image1
image3

List the file names one per line, without extensions.
Create the following three text files and save them in {DataDirectory}/ImageSets/Segmentation (the folder passed as --list_folder when building the TFRecords).
1. trainval.txt: all image file names
2. train.txt: training set (for example, about 90% of trainval)
3. val.txt: validation set held out from trainval.txt (for example, about 10% of trainval)

There is no single agreed-upon ratio for splitting data between training and validation.
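The three list files can also be generated with a short script. This is a sketch rather than part of the official tooling. It assumes the images are .jpg files, uses a 90/10 split by default, and the function name write_split_lists is made up for this example.

```python
import random
from pathlib import Path


def write_split_lists(image_dir, list_dir, val_fraction=0.1, seed=0):
    """Write trainval.txt, train.txt and val.txt; return the size of each."""
    stems = sorted(p.stem for p in Path(image_dir).glob("*.jpg"))
    shuffled = stems[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_val = int(round(len(shuffled) * val_fraction))
    splits = {
        "trainval": stems,
        "train": sorted(shuffled[n_val:]),
        "val": sorted(shuffled[:n_val]),
    }
    out = Path(list_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, items in splits.items():
        # One file name per line, without extension.
        (out / f"{name}.txt").write_text("".join(f"{s}\n" for s in items))
    return {name: len(items) for name, items in splits.items()}


# Example usage ("dataset" stands in for {DataDirectory}):
# print(write_split_lists("dataset/JPEGImages", "dataset/ImageSets/Segmentation"))
```

The returned counts are the numbers you will need later when editing the split sizes in data_generator.py.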

3. Convert the dataset to TFRecord format.

Convert the dataset to TFRecord format, which TensorFlow can read efficiently during training.

python deeplab/datasets/build_voc2012_data.py \
  --image_folder="{DataDirectory}/JPEGImages" \
  --semantic_segmentation_folder="{DataDirectory}/SegmentationClassRaw" \
  --list_folder="{DataDirectory}/ImageSets/Segmentation" \
  --image_format="jpg" \
  --output_dir="{DataDirectory}/tfrecord/"

Step 4. Set up training

1. Edit deeplab/datasets/data_generator.py to match your own data.

_PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 4552,  # Rewrite to the number of training image files
        'train_aug': 10582,  # OK as it is
        'trainval': 5088,  # Rewrite to the total number of image files
        'val': 536,  # Rewrite to the number of validation image files
    },
    num_classes=21,  # OK as it is
    ignore_label=255,  # OK as it is
)
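The numbers to put into splits_to_sizes can be read off the list files instead of being counted by hand. The sketch below assumes the lists hold one file name per line. The function name split_sizes and the path in the usage comment are placeholders.

```python
from pathlib import Path


def split_sizes(list_dir):
    """Map each split name (file stem) to its number of non-empty lines."""
    sizes = {}
    for list_file in sorted(Path(list_dir).glob("*.txt")):
        lines = list_file.read_text().splitlines()
        sizes[list_file.stem] = sum(1 for line in lines if line.strip())
    return sizes


# Example usage ("dataset" stands in for {DataDirectory}):
# print(split_sizes("dataset/ImageSets/Segmentation"))
```

The printed values for train, trainval and val are the ones to copy into the descriptor.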

2. Download the pre-trained model.

The weights of the pre-trained model are used for transfer learning.
Download the checkpoint you want from the links in the official repository.
Both a MobileNet v2 based model (22 MB) and an Xception based model (439 MB) are available.

Step 5. Training

1. MobileNet v2 backbone

# Create the checkpoint output directory (--train_logdir) first.
# If using CPU only, set --fine_tune_batch_norm=true.
python deeplab/train.py --logtostderr \
   --training_number_of_steps=30000 \
   --train_split="train" \
   --model_variant="mobilenet_v2" \
   --output_stride=16 \
   --decoder_output_stride=4 \
   --train_crop_size="513,513" \
   --train_batch_size=1 \
   --dataset="pascal_voc_seg" \
   --tf_initial_checkpoint="{CheckpointDirectory}/deeplabv3_mnv2_pascal_train_aug/model.ckpt-30000" \
   --train_logdir="{DataDirectory}/checkpoint" \
   --dataset_dir="{DataDirectory}/tfrecord" \
   --fine_tune_batch_norm=false \
   --initialize_last_layer=true \
   --last_layers_contain_logits_only=false

2. Xception_65 backbone

# Create the checkpoint output directory (--train_logdir) first.
# If using CPU only, set --fine_tune_batch_norm=true.
python deeplab/train.py --logtostderr \
   --training_number_of_steps=30000 \
   --train_split="train" \
   --model_variant="xception_65" \
   --atrous_rates=6 \
   --atrous_rates=12 \
   --atrous_rates=18 \
   --output_stride=16 \
   --decoder_output_stride=4 \
   --train_crop_size="513,513" \
   --train_batch_size=1 \
   --dataset="pascal_voc_seg" \
   --tf_initial_checkpoint="{CheckpointDirectory}/deeplabv3_pascal_train_aug/model.ckpt" \
   --train_logdir="{DataDirectory}/checkpoint" \
   --dataset_dir="{DataDirectory}/tfrecord" \
   --fine_tune_batch_norm=false \
   --initialize_last_layer=true \
   --last_layers_contain_logits_only=false

* Reference result

Training on about 5,000 images for 30,000 steps with the MobileNet v2 backbone on a Colab GPU took about 1 hour.
The result was very satisfying, probably because there was only one object (2 labels) that I wanted to segment.
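Note that --training_number_of_steps counts training steps, not passes over the dataset. As a rough illustration, the arithmetic below converts steps to approximate epochs. It assumes the batch size of 1 from the training commands and the 4552 training images from the data_generator.py example. The function name approx_epochs is made up for this sketch.

```python
def approx_epochs(steps, batch_size, num_train_images):
    """Approximate number of passes over the training set."""
    return steps * batch_size / num_train_images


# 30000 steps, batch size 1, 4552 training images.
print(round(approx_epochs(30000, 1, 4552), 1))  # prints 6.6
```

Under those assumptions, 30,000 steps corresponds to between six and seven passes over the training data.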

Step 6. Test

Test with the validation set.
The original images and the color segmentation maps are saved in the log directory (--vis_logdir).

python deeplab/vis.py --logtostderr \
  --vis_split="val" \
  --model_variant="mobilenet_v2" \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --vis_crop_size="513,513" \
  --dataset="pascal_voc_seg" \
  --checkpoint_dir="{DataDirectory}/checkpoint" \
  --vis_logdir="{Directory path to write the result images}" \
  --dataset_dir="{DataDirectory}/tfrecord" \
  --max_number_of_iterations=1 --eval_interval_secs=0