Multispectral image classification with transfer learning
Heads up: today we are jumping into the deep end of deep learning with the fastai library. If you haven’t spent much time with fastai, this walkthrough may be a little full on.
If you’re in the worlds of remote sensing and deep learning, you have no doubt run into the issue of wanting to use transfer learning but also wanting to use multispectral imagery. Unfortunately, there are two major issues when combining these. Firstly, pretrained models (used for transfer learning) expect RGB imagery, and secondly (depending on your library of choice), the built-in image augmentations may also expect RGB imagery. It turns out neither of these issues is a showstopper; they just required a couple of days of experimentation and some help from the fastai forum (specifically Malcolm McLean) to solve.
In the fastai 2020 Lesson 6 tutorial, Jeremy Howard was asked about using pretrained models for four-channel images. Jeremy’s response was that this ‘should be pretty much automatic’. This was exactly what I was after, so I went digging and found this tutorial by Maurício Cordeiro. That tutorial was very helpful; however, I only wanted to do image classification (not segmentation), and I wanted to use pretrained weights for my additional channels, unlike the tutorial, which initialised the additional channels with weights of ‘0’ (the fastai default behavior).
This post will walk through some of the pain points of multispectral imagery and my workarounds for dealing with them. In particular, it covers creating a custom data loader, modifying a pretrained model, and sorting out multispectral augmentations using the fastai (v2) deep learning library for image classification.
The notebook starts by importing the necessary libraries. The only slightly uncommon libraries here are ‘rasterio’, a relatively user-friendly interface for ‘GDAL’ raster operations, and ‘albumentations’, which is used for the multispectral image augmentations.
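A minimal import cell might look something like this (the exact list will vary with your notebook):

```python
from fastai.vision.all import *  # fastai v2 vision API
import rasterio                  # friendly wrapper around GDAL raster I/O
import albumentations as A       # augmentations that (mostly) handle extra channels
import numpy as np
import torch
```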
The expected data structure for the notebook is multispectral ‘.tif’ files in folders denoting the class name. Just point the notebook at the parent folder of the data. This cell also adds a folder named ‘models’ which will contain our finished models.
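Something along these lines, where the ‘data/multispectral’ location is just a placeholder:

```python
# Point this at the parent folder; each subfolder name is a class label.
path = Path('data/multispectral')
(path/'models').mkdir(exist_ok=True)  # finished models will be saved here
```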
The next cell sets the image size; all the images will be resized to a square with this many pixels on each side. This is useful if the data is inconsistently sized, or if you just want to downsample your input data to speed up training. The batch size is also set at this point.
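For example (both values are placeholders to tune for your data and GPU):

```python
img_size = 256  # every image is resized to img_size x img_size pixels
bs = 16         # batch size; lower this if you run out of GPU memory
```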
This cell sets up a bunch of helper functions, mostly for handling tensors and displaying images.
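As a rough sketch of what those helpers might look like — the ‘open_tif’ name, the 8-bit scaling, and resizing at read time are my assumptions here, while ‘show_tensor’ is described below:

```python
def open_tif(fn):
    "Open a multispectral .tif, resize it, and return a float tensor scaled to 0-1."
    with rasterio.open(str(fn)) as src:
        img = src.read(out_shape=(src.count, img_size, img_size)).astype(np.float32)
    return torch.from_numpy(img / 255.0)  # assumes 8-bit source data

def show_tensor(t, chnls=[0, 1, 2], ax=None):
    "Display three chosen channels of a tensor as an RGB image."
    show_image(t[chnls], ax=ax)  # fastai's show_image handles CxHxW tensors
```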
This next cell uses the fastai function ‘get_files()’ to retrieve a list of the training data files. This list will be used later to test the augmentations.
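For example:

```python
files = get_files(path, extensions='.tif', recurse=True)
len(files), files[0]
```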
At this point, the notebook opens a sample of the data to make sure everything is working as expected. The ‘show_tensor()’ function will display three channels of a tensor as an image. In my particular example, I’m dealing with six channel imagery which is actually two stacked RGB images. The notebook is displaying the first three channels as an image and the last three as another image.
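Using the sketched helpers from above, that check might look like:

```python
t = open_tif(files[0])
print(t.shape)                   # expecting (6, H, W): two stacked RGB images
show_tensor(t, chnls=[0, 1, 2])  # first RGB image
show_tensor(t, chnls=[3, 4, 5])  # second RGB image
```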
If the images above look as expected, the data structure is probably correct and you can move on to setting up some augmentations. The built-in fastai image augmentations will no longer work, as they expect three-channel images. To work around this, the notebook uses the ‘albumentations’ library instead. Albumentations has implemented augmentations which (mostly) work with multispectral imagery. The list of transforms chosen is roughly based on the default fastai image augmentations, as those have proven to be a good starting point. Keep in mind that these augmentations are performed on your CPU as each batch is loaded, so you may experience a slowdown in training if you add many of them.
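A plausible starting list, loosely mirroring the fastai defaults (flip, rotate, brightness/contrast, shift/scale) — the exact transforms and parameters here are illustrative, not the notebook’s definitive list:

```python
aug_list = [
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=10, p=0.5),
    A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
    A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=0, p=0.5),
]
```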
Now that the augmentation list is defined, the notebook sets up a function to apply them. The ‘aug_tfm()’ function takes a tensor and applies the augmentations one after another. The ‘if’ statement in this function simply checks the number of dimensions of the input tensor, to avoid augmenting any image labels that are passed to this function. There is probably a better way to deal with this, but it works fine. One other problem here is that some augmentations (such as ‘RandomBrightnessContrast’) will shift tensor values outside the range of 0-1. To address this, ‘numpy.clip’ is applied to remove any out-of-range values.
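A sketch of how ‘aug_tfm()’ might look, assuming the channels-first tensor layout produced by ‘open_tif’ above:

```python
def aug_tfm(t):
    "Apply each augmentation in turn; pass labels through untouched."
    if len(t.shape) < 3:                # labels are 0-d tensors -- skip them
        return t
    img = t.numpy().transpose(1, 2, 0)  # albumentations expects HxWxC arrays
    for aug in aug_list:
        img = aug(image=img)['image']
    img = np.clip(img, 0.0, 1.0)        # some augs push values outside 0-1
    return torch.from_numpy(img.transpose(2, 0, 1).copy())
```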
Finally, the ‘multi_tfm()’ function is created. This uses ‘RandTransform’ to tell fastai to only apply our augmentations to the training images and not the validation set.
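One plausible minimal version just wraps ‘aug_tfm’ directly — ‘RandTransform’ defaults to ‘split_idx=0’, which is fastai’s marker for ‘training set only’:

```python
# p=1.0 so the wrapper always calls aug_tfm; the individual albumentations
# transforms already carry their own probabilities.
multi_tfm = RandTransform(enc=aug_tfm, p=1.0)
```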
Now that all the transforms are sorted, let’s test them out! This next cell grabs an image, opens it as a tensor, and then applies a random set of augmentations six times. Each image you see should be slightly different.
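Roughly:

```python
import matplotlib.pyplot as plt

t = open_tif(files[0])
fig, axs = plt.subplots(2, 3, figsize=(12, 8))
for ax in axs.flatten():
    show_tensor(aug_tfm(t), ax=ax)  # fresh random augmentations on each call
```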
Assuming everything is working, you can now set up the fastai ‘DataBlock’. This cell sets up two blocks: first, a ‘TransformBlock’, which will contain the images, and second, a ‘CategoryBlock’, which will contain the labels. The fastai function ‘get_image_files’ finds the paths to all of the images, and the ‘get_labels’ function extracts each image’s class from its path. The fastai ‘RandomSplitter’ splits the data into training and validation sets, with a fixed seed so you always get the same split (and can compare different runs). Lastly, the augmentations are passed in, and the ‘DataBlock’ is complete.
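Putting that together, the block might look like this (the 80/20 split and seed value are illustrative; ‘get_labels’ simply reads the parent folder name):

```python
def get_labels(fn):
    return fn.parent.name  # the class is the name of the containing folder

dblock = DataBlock(
    blocks=(TransformBlock(type_tfms=open_tif), CategoryBlock),
    get_items=get_image_files,
    get_y=get_labels,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    item_tfms=multi_tfm,
)
dls = dblock.dataloaders(path, bs=bs)
```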
The next few cells perform a quick sanity check: the first prints the channel count, the second displays an image from the validation set, and the third displays an image from the training set (which, unlike the validation image, should show random augmentation).
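Along these lines:

```python
xb, yb = dls.one_batch()
print(xb.shape[1])  # channel count -- should be 6 in this example

show_tensor(dls.valid.one_batch()[0][0].cpu())  # validation image: no augmentation
show_tensor(dls.train.one_batch()[0][0].cpu())  # training image: randomly augmented
```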
The notebook now sets up a learner as normal, except that the ‘n_in’ argument is set to the channel count; this tells fastai to expect more than three channels.
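For example — ‘resnet18’ and the accuracy metric are just my placeholder choices, and ‘normalize=False’ sidesteps fastai’s built-in ImageNet normalisation statistics, which are three-channel:

```python
learn = cnn_learner(dls, resnet18, n_in=6, normalize=False, metrics=accuracy)
```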
The extra channels that you just told fastai to expect are not pretrained; all of their weights have been set to a value of ‘0’. You can see this yourself in the fastai code here. To get around this, the notebook duplicates the pretrained RGB weights into the newly created input channels. This process starts by getting a reference to the input layer in the cell below.
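For a ResNet-style model built by fastai, the first convolution sits at the head of the body, so something like this should grab it (the exact indexing depends on the architecture):

```python
first_conv = learn.model[0][0]  # first Conv2d of the body
first_conv.weight.shape         # torch.Size([64, 6, 7, 7]) for six-channel resnet18
```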
Now that the notebook has a reference to the first layer, it simply duplicates the RGB weights across all the additional channels and then scales all the weights down by the channel-count ratio to keep the total magnitude of the input layer the same.
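For the six-channel case that works out to copying channels 0-2 into channels 3-5 and multiplying everything by 3/6:

```python
n_chnls = first_conv.weight.shape[1]  # 6 in this example
with torch.no_grad():
    # channels 3-5 currently hold zeros; copy the pretrained RGB weights in
    first_conv.weight[:, 3:] = first_conv.weight[:, :3].clone()
    # scale by 3/6 so the layer's overall output magnitude stays the same
    first_conv.weight *= 3 / n_chnls
```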
That’s it! The notebook is now ready to train: just call the learning rate finder and then train as normal!
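In other words (epoch count and learning rate are placeholders):

```python
learn.lr_find()                    # pick a learning rate from the resulting plot
learn.fine_tune(10, base_lr=1e-3)  # then train as usual
```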