Training a deep neural network on a large dataset takes a lot of time. For instance, the popular AlexNet took five to six days to train, and that was with two GPUs. Not everyone has access to that kind of compute, or to the time it takes. Transfer learning provides a way of avoiding these long training times when building a neural network. This popular technique involves reusing the weights of a model that was already trained on a large, robust dataset. An example of such a dataset is the training data from the ImageNet competition, which contained more than 1 million images. Models that performed well on this kind of data can then be downloaded and reused in a new computer vision problem. That is transfer learning. In this tutorial, we will take a look at what transfer learning is about and how to build a CNN using transfer learning.
By the end of this tutorial, you will discover the following:
- What transfer learning is and why it matters
- The common patterns for using pretrained models
- How to use a pretrained VGG16 model in Keras to classify a new image
- How to build your own image classifier on top of VGG16 using transfer learning
Let’s get started.
What Is Transfer Learning?
Transfer learning is the act of reusing a model that was trained on a dataset for one problem to solve another, closely related problem. Transfer learning is an important concept for data scientists to understand because it not only saves the time required for training, it also lets you make use of powerful models without needing much computing power. It is essentially a weight initialization scheme: the weights of a model that performed well on one dataset are reused on another.
Using Transfer Learning for Image Recognition
During the ImageNet challenge, which ran from 2010 to 2017, a lot of high-performing image classification models were built. By the final years of the competition, the winning models had error rates below 5%, better than the reported human-level error. These models paved the way for the massive advances deep learning has brought, especially in image recognition. What is more interesting is that the models have been released to the public under permissive licenses. Examples of such models include VGGNet, ResNet, Xception, DenseNet, etc.
The availability of these models has formed the bedrock of transfer learning. Researchers do not have to reinvent the wheel by training models from scratch over and over again. Since these models were trained on over 1 million images spanning 1,000 classes, they have proven to generalize well to new datasets for similar tasks.
The models can be accessed through APIs and are downloadable for free. Going forward in this tutorial, we will see how to use one of these models in Keras.
How to Use Pretrained Models
The possibilities with transfer learning are nearly limitless. Generally speaking, using pretrained models follows three patterns:
- Building a classifier: The models trained on ImageNet can be downloaded and used immediately, without any further training, to classify new images.
- Using the model as a feature extractor: This is perhaps the most popular use case. Since convolutional neural networks learn progressively more abstract features layer by layer, the layers preceding the output layer can be reused for other similar tasks. In cases where the new task is not so similar, the earlier layers, which at that point identify lines, edges, shadows, and so on, can still be used for basic pattern recognition. Stacking a new set of layers on top of these initial layers gives you strong initial weights for pattern recognition, plus your own layers for the purpose you have in mind.
On the other hand, if your task is similar to the one the pretrained model was built for, you can swap out the input layer to fit the sort of data you wish to pass in, swap out the output layer to fit the number of classes your data contains, and leave all the hidden layers as they are, without retraining them.
Using this approach, you can classify images that were not contained in the ImageNet dataset with good accuracy. As mentioned earlier, this saves computational power and training time, since you do not need to retrain the bulk of the model.
- Using the model for weight initialization: This involves taking a portion of the pretrained model and using its weights as an initialization. In other words, that portion of the pretrained model is not trained from scratch by you. Its weights are either used as-is or trained further with a much lower learning rate, so that the model still behaves like a weight initialization setup. A short sketch of this pattern follows below.
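As a minimal sketch of this third pattern (the 4-class head and the exact learning rate here are illustrative assumptions, not part of the original example), the pretrained base can be kept trainable while compiling with a much lower learning rate than usual:

#a minimal fine-tuning sketch; the 4-class head and learning rate are assumptions
from keras.applications.vgg16 import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model
from keras.optimizers import Adam

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
head = Dense(4, activation='softmax')(Flatten()(base.output))
model = Model(inputs=base.input, outputs=head)

#keep the pretrained layers trainable, but use a learning rate roughly
#10-100x smaller than usual so the weights move only slightly away
#from their pretrained values
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss='categorical_crossentropy',
              metrics=['accuracy'])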
Using Pre-trained Models in Keras
Keras offers a myriad of powerful pre-trained models, especially from the ImageNet competitions. They can be accessed through the Applications API of Keras, and used either with their pre-trained weights or without them. One thing to point out is that it is good practice to prepare your input in the way the pre-trained model expects. For instance, the pretrained model may expect images scaled to some range, or pixel values in a particular format. To get the best out of the model, make sure the input data is in the format it expects (a brief loading sketch appears after the list below). Some of the pre-trained models available in Keras include:
- Xception
- VGG16
- VGG19
- ResNet50
- ResNet101
- InceptionV3
- MobileNet
- DenseNet etc.
Each of these models has its own architecture, number of trainable parameters, and by extension top accuracy. Going forward, we will use the VGG16 model to classify images of our own.
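As a quick illustration of the Applications API (the choice of ResNet50 here is arbitrary), loading a pretrained model takes only a couple of lines, and each model ships with its own matching preprocess_input function:

#illustrative sketch: load ResNet50 with its ImageNet weights
from keras.applications.resnet50 import ResNet50, preprocess_input
model = ResNet50(weights='imagenet')  #weights are downloaded on first use
#preprocess_input puts raw pixel arrays in the format this model expects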
Using the VGG16 Model to Classify an Image
In this section, we will use the VGG16 model to classify a completely new image. To be sure the model has not seen the image before, I took the picture myself: it is an image of one of the cups in my home. Here is the picture of my beautiful cup.
Now let’s use the VGG16 model to classify the image.
Step 1: Load the file as an image.
The load_img() function of Keras is used for this purpose.
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras import Input
from keras.preprocessing.image import load_img, img_to_array

#load the image
my_image = load_img('cup.jpeg', target_size=(224, 224))
Observe two things. First, the image was saved as cup.jpeg in the same directory as the Jupyter notebook. Second, the target_size parameter was set to (224, 224), which is the input shape the VGG16 model expects. Since a phone camera produces much larger images, it is important to resize them down to this shape.
Step 2: Convert the image to array
It is no longer news that machine learning models work with numbers. The image therefore needs to be converted to an array before being fed into the model. Once again, a Keras function, img_to_array(), is used for this purpose.
#convert the image to an array
my_image = img_to_array(my_image)
Step 3: Reshape and preprocess the array
Since it is a coloured image, the array from the previous step is 3-dimensional, with shape (224, 224, 3). VGG16, however, requires a 4-dimensional array of shape (1, 224, 224, 3). The extra dimension accommodates the fact that batches of multiple images can be passed into the model at once. There is also a need to prepare the pixel values for the model, which is what the preprocess_input function does. Let’s reshape the image and preprocess it for the model.
#reshape the image to add the batch dimension
my_image = my_image.reshape((1, my_image.shape[0], my_image.shape[1], my_image.shape[2]))
#preprocess the image to be in the best form for the model
my_image = preprocess_input(my_image)
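As an aside, the same batch dimension can be added with NumPy's expand_dims, which some find more readable than the manual reshape (this is an equivalent alternative, not an extra step):

#equivalent alternative to the reshape above
import numpy as np
my_image = np.expand_dims(my_image, axis=0)  #shape becomes (1, 224, 224, 3)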
Step 4: Instantiate the model and make predictions on the image (array).
#instantiate the model
model = VGG16()
#make predictions
prediction = model.predict(my_image)
Step 5: Interpret the result
The predict() method returns the probability of each of the 1,000 classes in the ImageNet dataset. To convert the probabilities to actual labels, the decode_predictions() function is used; it returns the labels sorted in descending order of probability. What we care about is the label with the highest probability, i.e. the first label returned, which can be accessed by indexing as seen in the block of code below.
#change the probabilities to actual labels
prediction = decode_predictions(prediction)
#return the label with the highest probability
item = prediction[0][0]
#print the result
print(f"{item[1]} with a probability of {int(item[2]*100)}%")
Output:
coffee_mug with a probability of 25%
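Since decode_predictions returns the labels sorted by probability, it can be instructive to print the full top-5 list and see what else the model considered. A short sketch:

#inspect the top-5 predictions, highest probability first
for class_id, label, prob in prediction[0]:
    print(f"{label}: {prob:.4f}")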
As seen, the model predicts a coffee mug, which is not just the generic object name but a fairly precise description. Here is the entire code for this example.
#import necessary libraries
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras import Input
from keras.preprocessing.image import load_img, img_to_array

#load the image
my_image = load_img('cup.jpeg', target_size=(224, 224))
#convert the image to an array
my_image = img_to_array(my_image)
#reshape the image
my_image = my_image.reshape((1, my_image.shape[0], my_image.shape[1], my_image.shape[2]))
#preprocess the image to be in the best form for the model
my_image = preprocess_input(my_image)
#instantiate the model
model = VGG16()
#make predictions
prediction = model.predict(my_image)
#change the probabilities to actual labels
prediction = decode_predictions(prediction)
#return the label with the highest probability
item = prediction[0][0]
#print the result
print(f"{item[1]} with a probability of {int(item[2]*100)}%")
Using Transfer Learning for Image Classification
In this section, we will go ahead and use the VGG16 model to build an image classifier. The model will predict whether a picture shows sunny, rainy, sunrise or cloudy weather. The dataset can be downloaded here.
The idea is to reuse the VGG16 architecture, except that the fully connected top layers are chopped off, the output of the last convolutional block is flattened, and a dense layer with 4 nodes is added on top. This of course makes sense, since the problem we want to solve contains only 4 classes.
Below is how the dataset is arranged on my PC, along with a sample of the cloudy pictures.
To begin, we import all necessary libraries.
#import all necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from glob import glob
from keras.models import Model
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras import Input
from keras.preprocessing.image import load_img, img_to_array, ImageDataGenerator
from keras.layers import Dense, Flatten
The first thing is to instantiate the VGG16 model with the ImageNet weights. The fully connected top layers are not included (include_top=False) because the data to be fed into the network is different from ImageNet. Next, we flatten the last layer so that a fully connected layer, which serves as the output layer, can be stacked on top of it. The functional API is used to create the model: the input has the shape the VGG16 model expects, while the output is the fully connected layer with 4 nodes, one per class. To get an overview of what the model looks like, you can use the summary() method. The code below implements the above explanation.
#instantiate the VGG model; fixing the input shape lets Flatten know its output size
vgg = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
#ensure the layers are not trained; in other words, the weights are used as-is
for layer in vgg.layers:
    layer.trainable = False

#flatten the last layer and add a fully connected layer as output
hidden = Flatten()(vgg.output)
outputs = Dense(4, activation='softmax')(hidden)

#create the model
model = Model(inputs=vgg.input, outputs=outputs)

#check the model architecture
model.summary()
Output:
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_11 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
flatten_4 (Flatten) (None, 25088) 0
_________________________________________________________________
dense_4 (Dense) (None, 4) 100356
=================================================================
Total params: 14,815,044
Trainable params: 100,356
Non-trainable params: 14,714,688
_________________________________________________________________
The next step is to compile the model, using the categorical cross-entropy loss, accuracy as the metric, and the Adam optimizer.
#compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
Now that the model architecture has been built, the next step is to define the datasets to be passed into the model. The dataset has been split into Train and Test folders on my machine. In Keras, we use the ImageDataGenerator class to perform the necessary preprocessing and augmentation, which gives the model more robust data to train on. For instance, if a model is trained to identify a cup from only one angle, it is likely to mispredict when the angle at which the picture was taken changes. Hence, horizontal and vertical flips were activated, alongside some shear and zoom. Note that data augmentation is only applied to the training data; the test data only needs to be rescaled so that the pixel values range from 0 to 1.
#generate the train and test data
train_data_gen = ImageDataGenerator(rescale=1.0/255, shear_range=0.5, zoom_range=0.7,
                                    horizontal_flip=True, vertical_flip=True)
test_data_gen = ImageDataGenerator(rescale=1.0/255)
train_data = train_data_gen.flow_from_directory('Weather/Train', target_size=(224, 224),
                                                class_mode='categorical')
test_data = test_data_gen.flow_from_directory('Weather/Test', target_size=(224, 224),
                                              class_mode='categorical')
Output:
Found 925 images belonging to 4 classes.
Found 200 images belonging to 4 classes.
As seen, it discovers the 4 classes and the number of images in each split.
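To see exactly which folder was mapped to which class index, you can inspect the generator's class_indices attribute; flow_from_directory assigns indices in alphabetical order of the folder names. The mapping shown in the comment is an assumption based on how the dataset is described:

#inspect the class-to-index mapping produced by flow_from_directory
print(train_data.class_indices)
#e.g. {'Cloudy': 0, 'Rainy': 1, 'Sunny': 2, 'Sunrise': 3}, depending on your folder names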
#fit the model
history = model.fit_generator(train_data, validation_data=test_data, epochs=5)
Output:
Epoch 1/5
29/29 [==============================] - 342s 12s/step - loss: 1.5313 - accuracy: 0.4267 - val_loss: 1.0044 - val_accuracy: 0.5400
Epoch 2/5
29/29 [==============================] - 330s 11s/step - loss: 0.6061 - accuracy: 0.7654 - val_loss: 0.2769 - val_accuracy: 0.9350
Epoch 3/5
29/29 [==============================] - 330s 11s/step - loss: 0.4741 - accuracy: 0.8202 - val_loss: 0.4416 - val_accuracy: 0.8700
Epoch 4/5
29/29 [==============================] - 334s 12s/step - loss: 0.3433 - accuracy: 0.8863 - val_loss: 0.2776 - val_accuracy: 0.9250
Epoch 5/5
29/29 [==============================] - 338s 12s/step - loss: 0.3262 - accuracy: 0.8750 - val_loss: 0.1970 - val_accuracy: 0.9400
As seen, after five epochs the model reaches a training accuracy of 87.5% and a validation accuracy of 94%.
Now, let’s test the model on a completely new image. It is a beautiful morning, just about 9am here and the sun is rising. I decided to go out and take a picture of the sunrise. Here is the lovely picture.
Because I took this picture myself, I can be sure this image was not in the training data. The lines of code below preprocess the image and pass it to our model for prediction. (Note that preprocess_input applies VGG16's own preprocessing, which differs slightly from the 1/255 rescaling used during training; ideally the same preprocessing should be used in both places.) The predict() method returns the list of probabilities for each class.
#load the image
my_image = load_img('weather_test.jpeg', target_size=(224, 224))
#preprocess the image
my_image = img_to_array(my_image)
my_image = my_image.reshape((1, my_image.shape[0], my_image.shape[1], my_image.shape[2]))
my_image = preprocess_input(my_image)
#make the prediction
prediction = model.predict(my_image)
Output:
array([[3.4296335e-28, 1.6078907e-13, 3.4901095e-19, 1.0000000e+00]],
dtype=float32)
To make the result clearer, let’s round the predicted probabilities to whole numbers.
[np.round(x) for x in prediction]
Output:
[array([0., 0., 0., 1.], dtype=float32)]
As seen, the fourth class is the predicted one. Since flow_from_directory assigns class indices in alphabetical order of the folder names (see the class_indices mapping above), the fourth class is Sunrise, which means the model’s prediction was correct. This is how to use transfer learning for other image classification problems.
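Rather than rounding, a more direct way to read off the predicted class is np.argmax, mapped back through the generator's class_indices. A small sketch:

#get the index of the winning class directly
predicted_index = np.argmax(prediction)
#invert the class_indices mapping to recover the label
index_to_label = {v: k for k, v in train_data.class_indices.items()}
print(index_to_label[predicted_index])  #expected: the sunrise class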
In summary, this tutorial has introduced you to transfer learning and why it is important. We then used the weights of the popular VGG16 model, first to classify a new image directly and then to build our own image classifier. If you have any questions, feel free to leave them in the comment section and I’d do my best to answer them.
5 Responses
Hello,
is there some code missing in the weather example just before the lines:
#ensure the layers are not trained; in other words, the weights are used as-is
for layer in vgg.layers:
    layer.trainable = False
?
I’m sorry for the stupid question, but I’m very new to this.
Hi Chris. Apologies that this reply is coming late.
The only code before that line imports the necessary libraries, which is needed before calling the vgg.layers object.
If you had an error while running the code, please let me know the error message so I can help you further.
Hi David
I think Chris is right; there is some missing code. In what is shown here, the vgg object is not actually defined anywhere, so there must be some code you executed that defines vgg. Maybe it is just something like
vgg = VGG16(include_top=False)
That’s very correct. The VGG model was not instantiated. Thanks for pointing it out. I have updated the article.