Using Autoencoders to Clean (or Denoise) Noisy Images with the help of Fashion MNIST | Unsupervised Deep Learning with TensorFlow
If you are reading this article, I am sure that we share similar interests and are/will be in similar industries. So let’s connect via Linkedin! Please do not hesitate to send a contact request! Orhan G. Yalçın — Linkedin
If you are on this page, you are also probably somewhat familiar with different neural network architectures. You have probably heard of feedforward neural networks, CNNs, RNNs and that these neural networks are very good for solving supervised learning tasks such as regression and classification.
But we also have a whole world of problems in the unsupervised learning sphere, such as dimensionality reduction, feature extraction, anomaly detection, data generation and augmentation, and noise reduction. For these tasks, we need neural networks designed specifically for unsupervised learning, which means they must be able to learn useful structure from the data without labeled examples. One of these special neural network architectures is the autoencoder.
What are autoencoders?
Autoencoders are neural network architectures that consist of two sub-networks, namely, encoder and decoder networks, which are tied to each other with a latent space. Autoencoders were first developed by Geoffrey Hinton, one of the most respected scientists in the AI community, and the PDP group in the 1980s. Hinton and the PDP Group aimed to address the “backpropagation without a teacher” problem, a.k.a. unsupervised learning, by using the input as the teacher. In other words, they simply used feature data both as feature data and label data. Let’s take a closer look at how autoencoders work!
An autoencoder consists of an encoder network, which takes the feature data and encodes it to fit into the latent space. The decoder then uses this encoded data (i.e., the code) to reconstruct the feature data. During training, the model learns how to encode the data efficiently so that the decoder can convert it back to the original. Therefore, the essential part of autoencoder training is to learn an optimized latent space.
Note that in most cases the number of neurons in the latent space is much smaller than in the input and output layers, but it does not have to be that way. There are different types of autoencoders, such as undercomplete, overcomplete, sparse, denoising, contractive, and variational autoencoders. In this tutorial, we focus only on an undercomplete autoencoder that we train for denoising.
Layers in an Autoencoder
The standard practice when building an autoencoder is to design an encoder and then create a mirrored version of this network as the decoder. As long as the encoder and the decoder mirror each other in this way, you are free to add any layer to these sub-networks. For example, if you are dealing with image data, you would typically use convolution and pooling layers. On the other hand, if you are dealing with sequence data, you would probably need LSTM, GRU, or RNN units. The important point here is that you are free to build anything you want.
Now that you have an idea of the kind of autoencoder you can build for image noise reduction, we can move on to the tutorial and start writing the code for our image noise reduction model. For the tutorial, we do our own take on one of TensorFlow’s official tutorials, Intro to Autoencoders, and we use a dataset that is very popular in the AI community: Fashion MNIST.
Downloading the Fashion MNIST Dataset
Fashion MNIST is designed and maintained by Zalando, a European e-commerce company based in Berlin, Germany. It consists of a training set of 60,000 images and a test set of 10,000 images. Each example is a 28×28 grayscale image, associated with a label from 10 classes. Fashion MNIST, which contains images of clothing items (as shown in Figure 4), is designed as an alternative to the MNIST dataset, which contains handwritten digits. We choose Fashion MNIST simply because MNIST is already overused in many tutorials.
The lines below import TensorFlow and load Fashion MNIST:
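A minimal sketch of this step, assuming the built-in `tf.keras.datasets.fashion_mnist` loader (the helper name `load_fashion_mnist` is only for illustration), might look like this:

```python
import tensorflow as tf


def load_fashion_mnist():
    """Return the Fashion MNIST training and test images as NumPy arrays.

    The first call downloads the dataset and caches it locally.
    """
    # The labels are discarded: an autoencoder uses the images themselves
    # as both the input and the target.
    (x_train, _), (x_test, _) = tf.keras.datasets.fashion_mnist.load_data()
    return x_train, x_test


# Usage (needs network access on the first call):
# x_train, x_test = load_fashion_mnist()
```

Because we only train on the images, the class labels returned by the loader can simply be ignored.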
Now let’s generate a grid with samples from our dataset with the following lines:
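One way to draw such a grid, sketched here with Matplotlib: the helper name `plot_image_grid` and the 5×10 layout are assumptions, and random stand-in pixels are used so the snippet runs without downloading anything.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_image_grid(images, rows=5, cols=10):
    """Draw the first rows*cols grayscale images in a grid and return the figure."""
    # squeeze=False keeps `axes` two-dimensional even for a single row or column.
    fig, axes = plt.subplots(rows, cols, figsize=(cols, rows), squeeze=False)
    for ax in axes.ravel():
        ax.axis("off")  # hide ticks and frames on every cell
    for ax, image in zip(axes.ravel(), images):
        # np.squeeze drops a trailing channel axis if there is one.
        ax.imshow(np.squeeze(image), cmap="gray")
    return fig


# Stand-in data; pass x_test here to see real clothing items.
demo_images = np.random.randint(0, 256, size=(50, 28, 28), dtype=np.uint8)
fig = plot_image_grid(demo_images)
```

In a notebook the figure is displayed automatically; in a script, call `plt.show()` afterwards.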
Our output shows the first 50 samples from the test dataset:
Processing the Fashion MNIST Data
For computational efficiency and model reliability, we have to apply Minmax normalization to our image data, limiting the value range between 0 and 1. Since each pixel is stored as an 8-bit grayscale intensity, the minimum value is 0 and the maximum value is 255, so we can conduct the Minmax normalization operation by dividing by 255 with the following lines:
We also have to reshape our NumPy arrays, as the current shapes of the datasets are (60000, 28, 28) and (10000, 28, 28). We just need to add a fourth dimension with a single value (e.g., from (60000, 28, 28) to (60000, 28, 28, 1)). This fourth dimension is the channel dimension that convolutional layers expect: a single value per pixel means the data is grayscale, with that value representing an intensity between black and white. If we had color images, we would need three values in the fourth dimension. But all we need here is a fourth dimension containing a single value, since we use grayscale images. The following lines do this:
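As a rough sketch of the normalization (the helper name `minmax_normalize` is hypothetical, and a tiny stand-in array replaces the real images):

```python
import numpy as np


def minmax_normalize(images, max_value=255.0):
    """Scale 8-bit pixel values from [0, 255] to floats in [0, 1]."""
    # With a known minimum of 0, min-max scaling reduces to a division.
    return images.astype("float32") / max_value


demo = np.array([[0, 128, 255]], dtype=np.uint8)
normalized = minmax_normalize(demo)
print(normalized.min(), normalized.max())
```

The same call would be applied to both the training and the test images.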
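A small sketch of this reshaping, using `np.newaxis` on a stand-in array (the helper name `add_channel_axis` is an assumption):

```python
import numpy as np


def add_channel_axis(images):
    """Append a trailing channel axis: (N, 28, 28) -> (N, 28, 28, 1)."""
    return images[..., np.newaxis]


demo = np.zeros((4, 28, 28), dtype=np.float32)
reshaped = add_channel_axis(demo)
print(reshaped.shape)  # (4, 28, 28, 1)
```

No pixel values change here; only the shape that the convolutional layers see.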
Let’s take a look at the shape of our NumPy arrays with the following lines:
Output: (60000, 28, 28, 1) and (10000, 28, 28, 1)
Adding Noise to the Images
Remember our goal is to build a model that is capable of performing noise reduction on images. To be able to do this, we will take the existing image data and add random noise to it. Then, we will feed the noisy images as input and the original images as the target output. Our autoencoder will learn the relationship between a noisy image and its clean version, and therefore how to clean a noisy image. So let’s create a noisy version of our Fashion MNIST dataset.
For this task, we generate a random value for each array item with the tf.random.normal method, multiply it by a noise_factor, which you can play around with, and add the result to the original pixel value. The following code adds noise to images:
We also need to make sure that our array item values stay within the range of 0 to 1. For this, we can use the tf.clip_by_value method, which replaces any value below the minimum or above the maximum with the designated min or max value. The following code clips the values out of range:
Now that we have created a clipped, noisy version of our dataset, we can check out how it looks:
As you can see, it is almost impossible to tell what the noisy images show. However, our autoencoder will learn to clean them remarkably well.
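A minimal sketch of the noising step. The helper name `add_gaussian_noise` and the value 0.4 for `noise_factor` are assumptions for illustration, and an all-black stand-in batch replaces the real images:

```python
import tensorflow as tf


def add_gaussian_noise(images, noise_factor):
    """Add zero-mean, unit-variance Gaussian noise scaled by noise_factor."""
    images = tf.convert_to_tensor(images, dtype=tf.float32)
    noise = tf.random.normal(shape=tf.shape(images))
    return images + noise_factor * noise


noise_factor = 0.4  # assumed value; larger means noisier images
clean = tf.zeros((2, 28, 28, 1))
noisy = add_gaussian_noise(clean, noise_factor)
print(noisy.shape)
```

Note that the result can fall below 0 or above 1, which is why a clipping step is still needed afterwards.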
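To see the effect of clipping on a handful of hand-picked stand-in values:

```python
import tensorflow as tf

# Two of these stand-in values fall outside the valid [0, 1] range.
noisy = tf.constant([-0.3, 0.2, 0.9, 1.4])
clipped = tf.clip_by_value(noisy, clip_value_min=0.0, clip_value_max=1.0)
print(clipped.numpy())  # -0.3 becomes 0 and 1.4 becomes 1; the rest is unchanged
```

Applied to the noisy image arrays, the same call keeps every pixel inside the range the model was normalized for.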
Building Our Model
In TensorFlow, apart from the Sequential API and the Functional API, there is a third option for building models: model subclassing. With model subclassing, we are free to implement everything from scratch, so we can build any type of custom model. It does require a basic level of object-oriented programming knowledge: our custom class subclasses tf.keras.Model and has to declare a few attributes and methods. However, it is nothing to be afraid of.
Also note that since we are dealing with image data, it is more efficient to build a convolutional autoencoder, which would look like this:
To build a model, we simply need to complete the following tasks:
- Create a class extending the keras.Model class
- Create an __init__ method that declares two separate models built with the Sequential API. Their layers need to mirror each other: a Conv2D layer in the encoder model is paired with a Conv2DTranspose layer in the decoder model.
- Create a call method that tells the model how to process the inputs using the sub-models initialized in __init__:
  - Call the initialized encoder model, which takes the images as input
  - Call the initialized decoder model, which takes the output of the encoder model (encoded) as input
  - Return the output of the decoder
We can achieve all of them with the code below:
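The steps above can be sketched as follows. The class name `NoiseReducer` is the one used later in this article, but the number of layers, filter counts, strides, and activations are illustrative assumptions rather than a definitive architecture:

```python
import tensorflow as tf
from tensorflow.keras import layers


class NoiseReducer(tf.keras.Model):
    """Convolutional autoencoder with mirrored encoder and decoder models."""

    def __init__(self):
        super().__init__()
        # Encoder: two strided convolutions shrink 28x28x1 to 7x7x8.
        self.encoder = tf.keras.Sequential([
            layers.Input(shape=(28, 28, 1)),
            layers.Conv2D(16, (3, 3), activation="relu", padding="same", strides=2),
            layers.Conv2D(8, (3, 3), activation="relu", padding="same", strides=2),
        ])
        # Decoder: transposed convolutions undo the downsampling step by step;
        # the final sigmoid keeps output pixels in the [0, 1] range.
        self.decoder = tf.keras.Sequential([
            layers.Conv2DTranspose(8, kernel_size=3, strides=2, activation="relu", padding="same"),
            layers.Conv2DTranspose(16, kernel_size=3, strides=2, activation="relu", padding="same"),
            layers.Conv2D(1, kernel_size=(3, 3), activation="sigmoid", padding="same"),
        ])

    def call(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded


autoencoder = NoiseReducer()
encoded_demo = autoencoder.encoder(tf.zeros((1, 28, 28, 1)))
output_demo = autoencoder(tf.zeros((1, 28, 28, 1)))
print(encoded_demo.shape, output_demo.shape)
```

With these assumed settings the code holds 7×7×8 = 392 values per image, fewer than the 784 input pixels, which is what makes this autoencoder undercomplete.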
And let’s create the model by instantiating our class:
Configuring Our Model
For this task, we will use the Adam optimizer and Mean Squared Error as the loss function for our model. We can easily use the compile function to configure our autoencoder, as shown below:
Finally, we can run our model for 10 epochs by feeding the noisy and the clean images, which will take about 1 minute to train. We also use test datasets for validation. The following code is for training the model:
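A rough sketch of the configuration and training steps. The helper `train_denoiser` is hypothetical, and a tiny one-layer stand-in model with random data is used so the snippet runs on its own in a few seconds; in the tutorial you would pass the autoencoder and the Fashion MNIST arrays instead:

```python
import numpy as np
import tensorflow as tf


def train_denoiser(model, x_noisy, x_clean, validation_data, epochs=10):
    """Compile with Adam and MSE, then fit noisy inputs against clean targets."""
    model.compile(optimizer="adam", loss="mse")
    return model.fit(
        x_noisy,            # input: noisy images
        x_clean,            # target: the original clean images
        epochs=epochs,
        shuffle=True,
        validation_data=validation_data,
        verbose=0,
    )


# Stand-in model and data (assumptions, not the tutorial's real setup).
stand_in = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(1, 3, padding="same", activation="sigmoid"),
])
rng = np.random.default_rng(0)
clean = rng.random((32, 28, 28, 1)).astype("float32")
noise = 0.4 * rng.standard_normal(clean.shape).astype("float32")
noisy = np.clip(clean + noise, 0.0, 1.0)

history = train_denoiser(stand_in, noisy, clean, (noisy[:8], clean[:8]), epochs=1)
print(sorted(history.history.keys()))
```

The key design point is the argument order in `fit`: the noisy images are the input and the clean images are the target, so the reconstruction loss rewards removing the noise.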
Reducing Image Noise with Our Trained Autoencoder
Now that we trained our autoencoder, we can start cleaning noisy images. Note that we have access to both encoder and decoder networks since we define them under the NoiseReducer object.
So, first, we will use an encoder to encode our noisy test dataset (x_test_noisy). Then, we will take the encoded output from the encoder to feed into the decoder to obtain the cleaned image. The following lines complete these tasks:
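These two calls can be sketched as below. The helper name `denoise` is an assumption, and an untrained stand-in object with `encoder` and `decoder` attributes replaces the trained autoencoder so the snippet is self-contained:

```python
from types import SimpleNamespace

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers


def denoise(model, noisy_images):
    """Run the encoder, then the decoder, and return both results as NumPy arrays."""
    encoded = model.encoder(noisy_images).numpy()
    decoded = model.decoder(encoded).numpy()
    return encoded, decoded


# Untrained stand-in exposing the same two attributes as the trained model.
stand_in = SimpleNamespace(
    encoder=tf.keras.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(8, 3, strides=2, padding="same", activation="relu"),
    ]),
    decoder=tf.keras.Sequential([
        layers.Input(shape=(14, 14, 8)),
        layers.Conv2DTranspose(1, 3, strides=2, padding="same", activation="sigmoid"),
    ]),
)

x_test_noisy = np.random.rand(10, 28, 28, 1).astype("float32")  # stand-in data
encoded_imgs, decoded_imgs = denoise(stand_in, x_test_noisy)
print(encoded_imgs.shape, decoded_imgs.shape)
```

With the trained model in place of the stand-in, `decoded_imgs` holds the cleaned images.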
Now let’s plot the first 10 samples for a side-by-side comparison:
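A possible plotting helper for this comparison, sketched with Matplotlib. The name `plot_comparison` is hypothetical, and the same random stand-in batch is passed three times just to keep the snippet runnable:

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_comparison(noisy, cleaned, original, n=10):
    """Plot n samples in three rows: noisy, cleaned, and original images."""
    rows = [("noisy", noisy), ("cleaned", cleaned), ("original", original)]
    fig, axes = plt.subplots(len(rows), n, figsize=(2 * n, 6), squeeze=False)
    for row_axes, (title, images) in zip(axes, rows):
        for i, ax in enumerate(row_axes):
            # np.squeeze drops the trailing channel axis before drawing.
            ax.imshow(np.squeeze(images[i]), cmap="gray")
            ax.set_title(title)
            ax.axis("off")
    return fig


demo = np.random.rand(10, 28, 28, 1).astype("float32")
fig = plot_comparison(demo, demo, demo)
```

In the tutorial you would pass the noisy test images, the decoder output, and the clean test images, in that order.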
The first row is for noisy images, the second row is for cleaned (reconstructed) images, and finally, the third row is for original images. See how the cleaned images are similar to the original images:
You have built an autoencoder model that can successfully clean very noisy images it has never seen before (we used the test dataset). There are obviously some unrecovered distortions, such as the missing bottom of the slippers in the second image from the right. Yet, if you consider how deformed the noisy images are, we can say that our model is pretty successful in recovering the distorted images.
Off the top of my head, you could, for instance, extend this autoencoder and embed it into a photo enhancement app to increase the clarity and crispness of photos.