End-to-End Driving with NVIDIA PilotNet

This is a historical case study from 2016 using the now-discontinued Udacity self-driving car simulator. For the current hands-on tutorial on behavioral cloning, see Behavioral Cloning with CarRacing-v3. For conceptual background, see the imitation learning overview.

Introduction

This case study demonstrates end-to-end behavioral cloning for autonomous driving using the NVIDIA PilotNet architecture. The Udacity self-driving car simulator was used to collect training data: a human driver drives the car while the simulator records camera images and steering angles, and a CNN then learns to map images to steering commands. A screenshot of the simulator is shown below:

A convolutional neural network was developed using Keras, a high-level deep learning API, with a TensorFlow backend. The network predicts steering angles from images.

Model Architecture and Training Strategy

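The model was based on NVIDIA’s work (Bojarski et al., 2016), extended with two preprocessing stages. The original NVIDIA model is shown in the figure below:

Apart from the normalization layer, the network consists of five convolutional layers followed by fully connected layers; the Keras implementation on this page uses five Dense layers, including the output layer. The model was modified as follows: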

The preprocessing cropped 50 rows from the top and 20 rows from the bottom of every collected image, reducing each 160x320 frame to 90x320. This was done to remove image content that is irrelevant to the steering problem.
Batch normalization was then applied to the cropped images.

   model.add(Cropping2D(cropping=((50, 20), (0, 0)), input_shape=(160, 320, 3)))
   model.add(BatchNormalization(epsilon=0.001, axis=3, input_shape=(90, 320, 3)))
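
To make the cropping arithmetic concrete, the following minimal sketch reproduces the effect of the Cropping2D layer with plain NumPy slicing on a dummy frame. The helper name crop_rows is hypothetical and exists only for illustration; it is not part of the original project.

```python
import numpy as np


def crop_rows(image, top=50, bottom=20):
    """Drop `top` rows from the top and `bottom` rows from the bottom (hypothetical helper)."""
    height = image.shape[0]
    return image[top:height - bottom, :, :]


# Dummy frame with the simulator's image dimensions (height 160, width 320, 3 channels).
frame = np.zeros((160, 320, 3), dtype=np.uint8)
cropped = crop_rows(frame)
print(cropped.shape)  # (90, 320, 3)
```

The resulting height of 90 matches the input_shape passed to the BatchNormalization layer above.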

The complete model, expressed with the Keras API, is shown below:

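from keras.models import Sequential
from keras.layers import BatchNormalization, Conv2D, Cropping2D, Dense, Flatten


def nvidia_model():
    """Build the PilotNet-style steering model with cropping and batch normalization."""
    model = Sequential()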

    model.add(Cropping2D(cropping=((50, 20), (0, 0)), input_shape=(160, 320, 3)))
    model.add(BatchNormalization(epsilon=0.001, axis=3, input_shape=(90, 320, 3)))

    model.add(Conv2D(24, (5, 5), padding='valid', activation='relu', strides=(2, 2)))
    model.add(Conv2D(36, (5, 5), padding='valid', activation='relu', strides=(2, 2)))
    model.add(Conv2D(48, (5, 5), padding='valid', activation='relu', strides=(2, 2)))
    model.add(Conv2D(64, (3, 3), padding='valid', activation='relu', strides=(1, 1)))
    model.add(Conv2D(64, (3, 3), padding='valid', activation='relu', strides=(1, 1)))
    model.add(Flatten())
    model.add(Dense(1164, activation='relu'))
    model.add(Dense(100, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1, activation='tanh'))

    return model
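
As a rough check of the architecture above, the sketch below traces the feature-map size through the five convolutional layers. It assumes the standard output-size rule for unpadded ('valid') convolutions, floor((n - k) / s) + 1 per dimension; the function name conv_output_size is illustrative only.

```python
def conv_output_size(size, kernel, stride):
    """Output length of a 'valid' (unpadded) convolution along one dimension."""
    return (size - kernel) // stride + 1


# (filters, kernel, stride) for the five convolutional layers in the model above.
conv_layers = [(24, 5, 2), (36, 5, 2), (48, 5, 2), (64, 3, 1), (64, 3, 1)]

height, width = 90, 320  # image size after cropping
shapes = []
for filters, kernel, stride in conv_layers:
    height = conv_output_size(height, kernel, stride)
    width = conv_output_size(width, kernel, stride)
    shapes.append((height, width, filters))
    print(shapes[-1])

# Number of values handed to the first Dense layer after Flatten().
flattened = height * width * filters
print(flattened)  # 8448
```

Under these assumptions the last convolutional layer produces a 4x33x64 feature map, so Flatten() yields a vector of 8448 values that feeds Dense(1164).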

The model was trained and validated on separate data sets to check that it was not overfitting.

Datasets

In this use case, the training datasets represent correct driving behavior. The following data collection strategy was adopted:

Initially, three complete rounds of track-1 were recorded, with the vehicle kept as close as possible to the center of the road.
Subsequently, the car was turned to face the reverse direction and another three complete rounds of track-1 were recorded.
In selected turns, the car was placed in orientations from which a recovery action was needed, and the recoveries were recorded. Note that only the recoveries were recorded; the deviations from the center of the road were not, because the goal was to teach the network how to recover, not how to get into challenging situations.

The images were read in BGR color space (the OpenCV default), converted to RGB, and augmented by flipping each image horizontally and negating its steering angle, as shown below.

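import csv
import os

import cv2

# drive_dirs, root_data_dir and num_cameras are assumed to be defined earlier.
images = []
augmented_images = []
measurements = []
augmented_measurements = []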
for directory in drive_dirs:
    print(directory)
    with open(os.path.join(root_data_dir, directory, 'driving_log.csv')) as csvfile:
        reader = csv.reader(csvfile)
        for line in reader:
            # use all available cameras
            for camera in range(num_cameras):
                # log paths use '/' as separator (simulator data was collected on OSX)
                filename = line[camera].split('/')[-1]
                # cv2.imread returns images in BGR color space, so convert to RGB
                image_bgr = cv2.imread(os.path.join(root_data_dir, directory, 'IMG/', filename))
                image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
                images.append(image_rgb)
                # column 3 of the log holds the steering angle; it is reused for every camera
                measurements.append(float(line[3]))

# augmentation: keep each image and add a mirrored copy with the steering sign inverted
for image, measurement in zip(images, measurements):
    augmented_images.append(image)
    augmented_images.append(cv2.flip(image, 1))
    augmented_measurements.append(measurement)
    augmented_measurements.append(measurement*(-1.0))
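
The effect of the flip augmentation can be seen on a tiny array. The sketch below is an illustration rather than the project's code: it uses np.fliplr in place of cv2.flip(image, 1), which mirrors the image the same way for this purpose, and the helper name flip_sample is hypothetical.

```python
import numpy as np


def flip_sample(image, steering):
    """Mirror an image left-to-right and negate its steering angle (hypothetical helper)."""
    return np.fliplr(image), -steering


# Tiny 1x3 single-channel 'image' so the mirroring is easy to see.
image = np.array([[[1], [2], [3]]], dtype=np.uint8)
flipped, flipped_angle = flip_sample(image, 0.25)
print(flipped[:, :, 0].tolist())  # [[3, 2, 1]]
print(flipped_angle)              # -0.25
```

Mirroring turns a left turn into a right turn of equal magnitude, which is why the steering sign is inverted for the flipped copy.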

The original (top) and flipped (bottom) images are shown below.

Cropping was applied to both images as described above. In total, the 5882 original images yielded 11764 images after augmentation. With a 20% validation split, 9411 images were used for training and 2353 for validation. The model used the Adam optimizer, so the learning rate was not tuned manually.

The car was then driven autonomously by the Convolutional Neural Network (CNN), without any human intervention. It successfully drives around track one without leaving the road, as shown in the video below.
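
The image counts quoted above can be checked with a few lines of arithmetic. This sketch assumes the training share is rounded down to a whole number of images; the variable names are illustrative.

```python
original_images = 5882
augmented_total = 2 * original_images  # each image plus its mirrored copy

validation_percent = 20
# Integer arithmetic avoids floating-point rounding; the training share is rounded down.
train_count = augmented_total * (100 - validation_percent) // 100
validation_count = augmented_total - train_count

print(augmented_total, train_count, validation_count)  # 11764 9411 2353
```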

Key references: (Ioffe & Szegedy, 2015; Szegedy et al., 2015; Bojarski et al., 2016; Mnih et al., 2013)

References

Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., et al. (2016). End to End Learning for Self-Driving Cars.
Ioffe, S., Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., et al. (2013). Playing Atari with Deep Reinforcement Learning.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z. (2015). Rethinking the Inception Architecture for Computer Vision.
