This notebook is authored by F. Chollet and contains code samples from Chapter 5, Section 2 of Deep Learning with Python.

Training a ConvNet from Scratch on a Small Dataset

Training an image-classification model on very little data is a common situation in computer vision. As a practical example, we focus on classifying images as "dogs" or "cats", using 4,000 pictures (2,000 cats, 2,000 dogs) split between training, validation, and test sets. We cover three strategies (a sketch of the dataset preparation follows the list):
  1. Training from scratch - baseline accuracy of ~71%
  2. Data augmentation - improves to ~82% accuracy
  3. Transfer learning - achieves up to 95% accuracy
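Throughout, the code expects the images organized into train/validation/test directories, each with a cats/ and dogs/ subfolder. As a minimal sketch, assuming the raw Kaggle dogs-vs-cats images have been unpacked locally (original_dataset_dir and base_dir below are placeholder paths to adapt), the layout can be built like this, mirroring the book's split of 1,000 training, 500 validation, and 500 test images per class:

import os
import shutil

original_dataset_dir = '/path/to/kaggle/train'   # placeholder: unpacked Kaggle data
base_dir = '/path/to/cats_and_dogs_small'        # placeholder: where subsets will live
os.makedirs(base_dir, exist_ok=True)

# Create train/validation/test folders, each with cats/ and dogs/ subfolders
for split in ('train', 'validation', 'test'):
    for label in ('cats', 'dogs'):
        os.makedirs(os.path.join(base_dir, split, label), exist_ok=True)

# Copy the first 1,000 images of each class for training,
# the next 500 for validation, and the next 500 for testing
ranges = {'train': range(0, 1000),
          'validation': range(1000, 1500),
          'test': range(1500, 2000)}
for split, idx in ranges.items():
    for label in ('cat', 'dog'):
        for i in idx:
            fname = '{}.{}.jpg'.format(label, i)
            shutil.copyfile(os.path.join(original_dataset_dir, fname),
                            os.path.join(base_dir, split, label + 's', fname))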

The Relevance of Deep Learning for Small-Data Problems

Deep learning models are highly repurposable: a model trained on a large-scale dataset can be reused on a significantly different problem with only minor changes. In computer vision, many pre-trained models (usually trained on ImageNet) are publicly available and can bootstrap powerful vision models from very little data.
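As a brief illustration of this transfer-learning idea (covered in detail later in the chapter), a pre-trained convolutional base such as VGG16 can be loaded in one call; include_top=False drops the ImageNet classifier so that a new, small classifier can be trained on top:

from keras.applications import VGG16

# Download the VGG16 convolutional base, pre-trained on ImageNet,
# configured for 150x150 RGB inputs and without its dense classifier
conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(150, 150, 3))
conv_base.summary()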

Building the Network

from keras import layers
from keras import models

# Four Conv2D + MaxPooling2D stages: feature-map depth grows (32 -> 128)
# while spatial size shrinks (150x150 -> 7x7)
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
# Classifier head: flatten the feature maps, then end in a single
# sigmoid unit for binary (cat vs. dog) classification
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
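Because the network ends in a single sigmoid unit, it is compiled with the binary_crossentropy loss; following the book's setup, an RMSprop optimizer with a small learning rate works well here (lr is the argument name in this older Keras API):

from keras import optimizers

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])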

Data Augmentation

Data augmentation generates more training data from existing samples by applying random transformations that yield believable-looking images, so the model never sees the exact same picture twice during training:
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=40,        # randomly rotate images by up to 40 degrees
    width_shift_range=0.2,    # randomly shift horizontally by up to 20% of width
    height_shift_range=0.2,   # randomly shift vertically by up to 20% of height
    shear_range=0.2,          # apply random shearing transformations
    zoom_range=0.2,           # randomly zoom inside images by up to 20%
    horizontal_flip=True,     # randomly flip half the images horizontally
    fill_mode='nearest')      # fill newly created pixels with the nearest value
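A usage sketch, assuming the base_dir layout built earlier: the augmented generator feeds the training data via flow_from_directory, while the validation data is only rescaled, never augmented:

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# Note: validation data must not be augmented
test_datagen = ImageDataGenerator(rescale=1./255)

train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),   # resize all images to 150x150
    batch_size=32,
    class_mode='binary')      # binary labels, matching binary_crossentropy

validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')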

Model with Dropout

To further fight overfitting, we also add a Dropout layer, right before the densely connected classifier:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))   # randomly zero 50% of the flattened features during training
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
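Training then proceeds by compiling as before and fitting from the generators. A sketch using the book's settings (step and epoch counts can be adjusted):

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])

history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,      # 100 batches of 32 images per epoch
    epochs=100,
    validation_data=validation_generator,
    validation_steps=50)

model.save('cats_and_dogs_small_2.h5')   # save the trained model for later reuse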
With data augmentation and dropout, we achieve ~82% accuracy, a relative improvement of about 15% over the non-regularized baseline. Key references: Simonyan & Zisserman (2014); Szegedy et al. (2016); Ronneberger et al. (2015); Peng et al. (2014); Zeiler & Fergus (2013).

References

  • Peng, X., Sun, B., Ali, K., Saenko, K. (2014). Learning Deep Object Detectors from 3D Models.
  • Ronneberger, O., Fischer, P., Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation.
  • Simonyan, K., Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition.
  • Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A. (2016). Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning.
  • Zeiler, M., Fergus, R. (2013). Visualizing and Understanding Convolutional Networks.