Subscribe to Get All the Blog Posts and Colab Notebooks 

Image Noise Reduction in 10 Minutes with Deep Convolutional Autoencoders

Image Noise Reduction in 10 Minutes with Deep Convolutional Autoencoders

Using Autoencoders to Clean (or Denoise) Noisy Images with the help of Fashion MNIST | Unsupervised Deep Learning with TensorFlow

If you are reading this article, I am sure that we share similar interests and are/will be in similar industries. So let’s connect via Linkedin! Please do not hesitate to send a contact request! Orhan G. Yalçın — Linkedin

 Figure 1. Before and After the Noise Reduction of an Image of a Playful Dog (Photo by Anna Dudkova on Unsplash)

If you are on this page, you are also probably somewhat familiar with different neural network architectures. You have probably heard of feedforward neural networks, CNNs, RNNs and that these neural networks are very good for solving supervised learning tasks such as regression and classification.

But, we have a whole world of problems on the unsupervised learning sphere such as dimensionality reduction, feature extraction, anomaly detection, data generation, and augmentation as well as noise reduction. For these tasks, we need the help of special neural networks that are developed particularly for unsupervised learning tasks. Therefore, they must be able to solve mathematical equations without needing supervision. One of these special neural network architectures is autoencoders.


What are autoencoders?

Autoencoders are neural network architectures that consist of two sub-networks, namely, encoder and decoder networks, which are tied to each other with a latent space. Autoencoders were first developed by Geoffrey Hinton, one of the most respected scientists in the AI community, and the PDP group in the 1980s. Hinton and the PDP Group aimed to address the “backpropagation without a teacher” problem, a.k.a. unsupervised learning, by using the input as the teacher. In other words, they simply used feature data both as feature data and label data. Let’s take a closer look at how autoencoders work!

  Figure 2. An Autoencoder Network with Encoder and Decoder Networks

Autoencoder Architecture

Autoencoders consists of an encoder network, which takes the feature data and encodes it to fit into the latent space. This encoded data (i.e., code) is used by the decoder to convert back to the feature data. In an encoder, what the model learns is how to encode the data efficiently so that the decoder can convert it back to the original. Therefore, the essential part of autoencoder training is to generate an optimized latent space.

Now, know that in most cases, the number of neurons in the latent space is much smaller than the input and output layers, but it does not have to be that way. There are different types of autoencoders such as undercomplete, overcomplete, sparse, denoising, contractive, and variational autoencoders. In this tutorial, we only focus on undercomplete autoencoders which are used for denoising.

Layers in an Autoencoder

The standard practice when building an autoencoder is to design an encoder and to create an inversed version of this network as the decoder of this autoencoder. So, as long as there is an inverse relationship between the encoder and the decoder network, you are free to add any layer to these sub-networks. For example, if you are dealing with image data, you would surely need convolution and pooling layers. On the other hand, if you are dealing with sequence data, you would probably need LSTM, GRU, or RNN units. The important point here is that you are free to build anything you want.

 Figure 3. Latent Spaces in UnderComplete Autoencoders are Usually Narrower than Other Layers

And now that you have an idea of autoencoders that you can build for image noise reduction, we can move on to the tutorial and start writing our code for our image noise reduction model. For the tutorial, we choose to do our own take on one of TensorFlow’s official tutorials, Intro to Autoencoders and we will use a very popular dataset among the members of the AI community: Fashion MNIST.

Downloading the Fashion MNIST Dataset

Fashion-MNIST is designed and maintained by Zalando, a European e-commerce company based in Berlin, Germany. Fashion MNIST consists of a training set of 60,000 images and a test set of 10,000 images. Each example is a 28×28 grayscale image, associated with a label from 10 classes. Fashion MNIST, which contains images of clothing items (as shown in Figure 4), is designed as an alternative dataset to the MNIST dataset, which contains handwritten digits. We choose Fashion MNIST simply because MNIST is already overused in many tutorials.

The lines below import TensorFlow and load Fashion MNIST:

Now let’s generate a grid with samples from our dataset with the following lines:

Our output shows the first 50 samples from the test dataset:

 Figure 4. A 5×10 Grid Showing the First 50 Samples in Fashion MNIST Test Dataset

Processing the Fashion MNIST Data

For computational efficiency and model reliability, we have to apply Minmax normalization to our image data, limiting the value range between 0 and 1. Since our data is in RGB format, the minimum value is 0 and the maximum value is 255 and we can conduct the Minmax normalization operation with the following lines:

We also have to reshape our NumPy array as the current shape of the datasets is (60000, 28, 28) and (10000, 28, 28). We just need to add a fourth dimension with a single value (e.g., from (60000, 28, 28) to (60000, 28, 28, 1)). The fourth dimension acts pretty much as proof that our data is in grayscale format with a single value representing color information ranging from white to black. If we’d have colored images, then we would need three values in our fourth dimension. But all we need is a fourth dimension containing a single value since we use grayscale images. The following lines do this:

Let’s take a look at the shape of our NumPy arrays with the following lines:

Output: (60000, 28, 28, 1) and (10000, 28, 28, 1)

Adding Noise to the Images

Remember our goal is to build a model, which is capable of performing noise reduction on images. To be able to do this, we will use existing image data and add them to random noise. Then, we will feed the original images as input and noisy images as output. Our autoencoder will learn the relationship between a clean image and a noisy image and how to clean a noisy image. So let’s create a noisy version of our Fashion MNIST dataset.

For this task, we add a randomly generated value to each array item by using tf.random.normal method. Then, we multiply the random value with a noise_factor, which you can play around with. The following code adds noise to images:

We also need to make sure that our array item values are within the range of 0 to 1. For this, we may use tf.clip_by_value method. clip_by_value is a TensorFlow method which clips the values outside of the Min-Max range and replaces them with the designated min or max value. The following code clips the values out of range:

Now that we created a regularized and noisy version of our dataset, we can check out how it looks:

 Figure 5. A 2×5 Grid Showing Clean and Noisy Image Samples

As you can see, it is almost impossible to understand what we see in noisy images. However, our autoencoders will marvelously learn to clean it.

Building Our Model

In TensorFlow, apart from Sequential API and Functional API, there is a third option to build models: Model subclassing. In model subclassing, we are free to implement everything from scratch. Model subclassing is fully customizable and enables us to implement our own custom model. It is a very powerful method since we can build any type of model. However, it requires a basic level of object-oriented programming knowledge. Our custom class would subclass the tf.keras.Model object. It also requires declaring several variables and functions. However, it is nothing to be afraid of.

Also note that since we are dealing with image data, it is more efficient to build a convolutional autoencoder, which would look like this:

 Figure 6. A Convolutional Autoencoder Example

To build a model, we simply need to complete the following tasks:

  • Create a class extending the keras.Model object
  • Create an __init__ function to declare two separate models built with Sequential API. Within them, we need to declare layers that would reverse each other. One Conv2D layer for the encoder model whereas one Conv2DTranspose layer for the decoder model.
  • Create a call function to tell the model how to process the inputs using the initialized variables with __init__ method:
  • We need to call the initialized encoder model which takes the images as input
  • We also need to call the initialized decoder model which takes the output of the encoder model (encoded) as input
  • Return the output of the decoder

We can achieve all of them with the code below:

And let’s create the model with an object call:

Configuring Our Model

For this task, we will use an Adam optimizer and Mean Squared Error for our model. We can easily use <strong>compile</strong> function to configure our autoencoder, as shown below:

Finally, we can run our model for 10 epochs by feeding the noisy and the clean images, which will take about 1 minute to train. We also use test datasets for validation. The following code is for training the model:

 Figure 7. Deep Convolutional Autoencoder Training Performance

Reducing Image Noise with Our Trained Autoencoder

Now that we trained our autoencoder, we can start cleaning noisy images. Note that we have access to both encoder and decoder networks since we define them under the NoiseReducer object.

So, first, we will use an encoder to encode our noisy test dataset (x_test_noisy). Then, we will take the encoded output from the encoder to feed into the decoder to obtain the cleaned image. The following lines complete these tasks:

and let’s plot the first 10 samples for a side-by-side comparison:

The first row is for noisy images, the second row is for cleaned (reconstructed) images, and finally, the third row is for original images. See how the cleaned images are similar to the original images:

 Figure 5. A 3×10 Grid Showing Clean and Noisy Image Samples along with Their Reconstructed Counterparts


You have built an autoencoder model, which can successfully clean very noisy images, which it has never seen before (we used the test dataset). There are obviously some non-recovered distortions, such as the missing bottom of the slippers in the second image from the right. Yet, if you consider how deformed the noisy images, we can say that our model is pretty successful in recovering the distorted images.

Off the top of my head, you can -for instance- consider extending this autoencoder and embed it into a photo enhancement app, which can increase the clarity and crispiness of the photos.

Image Generation in 10 Minutes with Generative Adversarial Networks

Image Generation in 10 Minutes with Generative Adversarial Networks

Using Unsupervised Deep Learning to Generate Handwritten Digits with Deep Convolutional GANs using TensorFlow and the MNIST Dataset

Machines are generating perfect images these days and it’s becoming more and more difficult to distinguish the machine-generated images from the originals.

If you are reading this article, I am sure that we share similar interests and are/will be in similar industries. So let’s connect via Linkedin! Please do not hesitate to send a contact request! Orhan G. Yalçın — Linkedin

Figure 1. Examples of Images Generated by Nvidia’s StyleGAN [2]

Figure 2. Machine Generated Digits using MNIST [3]

After receiving more than 300k views for my article, Image Classification in 10 Minutes with MNIST Dataset, I decided to prepare another tutorial on deep learning. But this time, instead of classifying images, we will generate images using the same MNIST dataset, which stands for Modified National Institute of Standards and Technology database. It is a large database of handwritten digits that is commonly used for training various image processing systems[1].

Generative Adversarial Networks

To generate -well basically- anything with machine learning, we have to use a generative algorithm and at least for now, one of the best performing generative algorithms for image generation is Generative Adversarial Networks (or GANs).

The invention of Generative Adversarial Network

 Figure 3. A Photo of Ian Goodfellow on Wikipedia [4]

The invention of GANs has occurred pretty unexpectedly. The famous AI researcher, then, a Ph.D. fellow at the University of Montreal, Ian Goodfellow, landed on the idea when he was discussing with his friends -at a friend’s going away party- about the flaws of the other generative algorithms. After the party, he came home with high hopes and implemented the concept he had in mind. Surprisingly, everything went as he hoped in the first trial [5] and he successfully created the Generative Adversarial Networks (shortly, GANs). According to Yann Lecun, the director of AI research at Facebook and a professor at New York University, GANs are “the most interesting idea in the last 10 years in machine learning” [6].

The rough structure of the GANs may be demonstrated as follows:

 Figure 4. Generative Adversarial Networks (GANs) utilizing CNNs | (Graph by author)

In an ordinary GAN structure, there are two agents competing with each other: a Generator and a Discriminator. They may be designed using different networks (e.g. Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), or just Regular Neural Networks (ANNs or RegularNets)). Since we will generate images, CNNs are better suited for the task. Therefore, we will build our agents with convolutional neural networks.

How does our GAN model operate?

  Figure 5. Generator and Discriminator Relationship in a GAN Network | (Graph by author)

In a nutshell, we will ask the generator to generate handwritten digits without giving it any additional data. Simultaneously, we will fetch the existing handwritten digits to the discriminator and ask it to decide whether the images generated by the Generator are genuine or not. At first, the Generator will generate lousy images that will immediately be labeled as fake by the Discriminator. After getting enough feedback from the Discriminator, the Generator will learn to trick the Discriminator as a result of the decreased variation from the genuine images. Consequently, we will obtain a very good generative model which can give us very realistic outputs.

Building the GAN Model

GANs often use computationally complex calculations and therefore, GPU-enabled machines will make your life a lot easier. Therefore, I will use Google Colab to decrease the training time with GPU acceleration.

GPU-Enabled Training with Google Colab

For machine learning tasks, for a long time, I used to use -iPython- Jupyter Notebook via Anaconda distribution for model building, training, and testing almost exclusively. Lately, though, I have switched to Google Colab for several good reasons.

Google Colab offers several additional features on top of the Jupyter Notebook such as (i) collaboration with other developers, (ii) cloud-based hosting, and (iii) GPU & TPU accelerated training. You can do all these with the free version of Google Colab. The relationship between Python, Jupyter Notebook, and Google Colab can be visualized as follows:

 Figure 6. Relationship between iPython, Jupyter Notebook, Google Colab | (Graph by author)

Anaconda provides a free and open-source distribution of the Python and R programming languages for scientific computing with tools like Jupyter Notebook (iPython) or Jupyter Lab. On top of these tools, Google Colab lets its users use the iPython notebook and lab tools with the computing power of their servers.

Now that we have a general understanding of generative adversarial networks as our neural network architecture and Google Collaboratory as our programming environment, we can start building our model. In this tutorial, we will do our own take from an official TensorFlow tutorial [7].

Initial Imports

Colab already has most machine learning libraries pre-installed, and therefore, you can just import them as shared below:

TensorFlow, Keras Layers, and Matplotlib Imports

For the sake of shorter code, I prefer to import layers individually, as shown above.

Load and Process the MNIST Dataset

For this tutorial, we can use the MNIST dataset. The MNIST dataset contains 60,000 training images and 10,000 testing images taken from American Census Bureau employees and American high school students [8].

Luckily we may directly retrieve the MNIST dataset from the TensorFlow library. We retrieve the dataset from Tensorflow because this way, we can have the already processed version of it. We still need to do a few preparation and processing works to fit our data into the GAN model. Therefore, in the second line, we separate these two groups as train and test and also separated the labels and the images.

x_train and x_test parts contain greyscale RGB codes (from 0 to 255) while y_train and y_test parts contain labels from 0 to 9 which represents which number they actually are. Since we are doing an unsupervised learning task, we will not need label values and therefore, we use underscores (i.e., _) to ignore them. We also need to convert our dataset to 4-dimensions with the reshape function. Finally, we convert our NumPy array to a TensorFlow Dataset object for more efficient training. The lines below do all these tasks:

Our data is already processed and it is time to build our GAN model.

Build the Model

As mentioned above, every GAN must have at least one generator and one discriminator. Since we are dealing with image data, we need to benefit from Convolution and Transposed Convolution (Inverse Convolution) layers in these networks. Let’s define our generator and discriminator networks below.

Generator Network

Our generator network is responsible for generating 28×28 pixels grayscale fake images from random noise. Therefore, it needs to accept 1-dimensional arrays and output 28×28 pixels images. For this task, we need Transposed Convolution layers after reshaping our 1-dimensional array to a 2-dimensional array. Transposed Convolution layers can increase the size of a smaller array. We also take advantage of BatchNormalization and LeakyReLU layers. The below lines create a function which would generate a generator network with Keras Sequential API:

We can call our generator function with the following code:

 Figure 7. The Summary of Our Generator Network | (Graph by Author)

Now that we have our generator network, we can easily generate a sample image with the following code:

which would look like this:

 Figure 8. A Sample Image Generated by Non-Trained Generator Network | (Image by author)

It is just plain noise. But, the fact that it can create an image from a random noise array proves its potential.

Discriminator Network

For our discriminator network, we need to follow the inverse version of our generator network. It takes the 28×28 pixels image data and outputs a single value, representing the possibility of authenticity. So, our discriminator can review whether a sample image generated by the generator is fake.

We follow the same method that we used to create a generator network, The following lines create a function that would create a discriminator model using Keras Sequential API:

We can call the function to create our discriminator network with the following line:

 Figure 9. The Summary of Our Discriminator Network | (Graph by author)

Finally, we can check what our non-trained discriminator says about the sample generated by the non-trained generator:

Output: tf.Tensor([[-0.00108097]], shape=(1, 1), dtype=float32)

A negative value shows that our non-trained discriminator concludes that the image sample in Figure 8 is fake. At the moment, what’s important is that it can examine images and provide results, and the results will be much more reliable after training.

Configure the Model

Since we are training two sub-networks inside a GAN network, we need to define two loss functions and two optimizers.

Loss Functions: We start by creating a Binary Crossentropy object from tf.keras.losses module. We also set the from_logits parameter to True. After creating the object, we fill them with custom discriminator and generator loss functions. Our discriminator loss is calculated as a combination of (i) the discriminator’s predictions on real images to an array of ones and (ii) its predictions on generated images to an array of zeros. Our generator loss is calculated by measuring how well it was able to trick the discriminator. Therefore, we need to compare the discriminator’s decisions on the generated images to an array of 1s.

Optimizers: We also set two optimizers separately for generator and discriminator networks. We can use the Adam optimizer object from tf.keras.optimizers module.

The following lines configure our loss functions and optimizers

Set the Checkpoints

We would like to have access to previous training steps and TensorFlow has an option for this: checkpoints. By setting a checkpoint directory, we can save our progress at every epoch. This will be especially useful when we restore our model from the last epoch. The following lines configure the training checkpoints by using the os library to set a path to save all the training steps

Train the Model

Now our data ready, our model is created and configured. It is time to design our training loop. Note that at the moment, GANs require custom training loops and steps. I will try to make them as understandable as possible for you. Make sure that you read the code comments in the Github Gists.

Let’s create some of the variables with the following lines:

Our seed is the noise that we use to generate images on top of. The code below generates a random array with normal distribution with the shape (16, 100).

Define the Training Step

This is the most unusual part of our tutorial: We are setting a custom training step. After defining the custom train_step() function by annotating the tf.function module, our model will be trained based on the custom train_step() function we defined.

The code below with excessive comments are for the training step. Please read the comments carefully:

Now that we created our custom training step with tf.function annotation, we can define our train function for the training loop.

Define the Training Loop

We define a function, named train, for our training loop. Not only we run a for loop to iterate our custom training step over the MNIST, but also do the following with a single function:

During the Training:

  • Start recording time spent at the beginning of each epoch;
  • Produce GIF images and display them,
  • Save the model every five epochs as a checkpoint,
  • Print out the completed epoch time; and
  • Generate a final image in the end after the training is completed.

The following lines with detailed comments, do all these tasks:

Image Generation Function

In the train function, there is a custom image generation function that we haven’t defined yet. Our image generation function does the following tasks:

  • Generate images by using the model;
  • Display the generated images in a 4×4 grid layout using matplotlib;
  • Save the final figure in the end

The following lines are in charge of these tasks:

Start the Training

After training three complex functions, starting the training is fairly easy. Just call the train function with the below arguments:

If you use GPU enabled Google Colab notebook, the training will take around 10 minutes. If you are using CPU, it may take much more. Let’s see our final product after 60 epochs.

 Figure 10. The Digits Generated by Our GAN after 60 Epochs. Note that we are seeing 16 samples because we configured our output this way. | (Image by author)

Generate Digits

Before generating new images, let’s make sure we restore the values from the latest checkpoint with the following line:

We can also view the evolution of our generative GAN model by viewing the generated 4×4 grid with 16 sample digits for any epoch with the following code:

This code gives us the latest generated Grid. Pass a different number between 0 to 60 in display_image(function)

or better yet, let’s create a GIF image visualizing the evolution of the samples generated by our GAN with the following code:

Our output is as follows:

Figure 11. The GIF Image Showing the Evolution of our GAN Generated Sample Digits over Time | (Image by author)

As you can see in Figure 11, the outputs generated by our GAN becomes much more realistic over time.


You have built and trained a generative adversarial network (GAN) model, which can successfully create handwritten digits. There are obviously some samples that are not very clear, but only for 60 epochs trained on only 60,000 samples, I would say that the results are very promising.

Once you can build and train this network, you can generate much more complex images,

  • by working with a larger dataset with colored images in high definition;
  • by creating a more sophisticated discriminator and generator network;
  • by increasing the number of epochs;
  • by working on a GPU-enabled powerful hardware

In the end, you can create art pieces such as poems, paintings, text or realistic photos or videos.

Image Classification in 10 Minutes with MNIST Dataset

Image Classification in 10 Minutes with MNIST Dataset

Using Convolutional Neural Networks to Classify Handwritten Digits with TensorFlow and Keras | Supervised Deep Learning

If you are reading this article, I am sure that we share similar interests and are/will be in similar industries. So let’s connect via Linkedin! Please do not hesitate to send a contact request! Orhan G. Yalçın – Linkedin

 MNIST Dataset and Number Classification by Katakoda

Before diving into this article, I just want to let you know that if you are into deep learning, I believe you should also check my other articles such as:

1 — Image Noise Reduction in 10 Minutes with Deep Convolutional Autoencoders where we learned to build autoencoders for image denoising;

2 — Predict Tomorrow’s Bitcoin (BTC) Price with Recurrent Neural Networks where we use an RNN to predict BTC prices and since it uses an API, the results always remain up-to-date.

When you start learning deep learning with different neural network architectures, you realize that one of the most powerful supervised deep learning techniques is the Convolutional Neural Networks (abbreviated as “CNN”). The final structure of a CNN is actually very similar to Regular Neural Networks (RegularNets) where there are neurons with weights and biases. In addition, just like in RegularNets, we use a loss function (e.g. crossentropy or softmax) and an optimizer (e.g. adam optimizer) in CNNs [CS231]. Additionally though, in CNNs, there are also Convolutional Layers, Pooling Layers, and Flatten Layers. CNNs are mainly used for image classification although you may find other application areas such as natural language processing.

Why Convolutional Neural Networks

The main structural feature of RegularNets is that all the neurons are connected to each other. For example, when we have images with 28 by 28 pixels in greyscale, we will end up having 784 (28 x 28 x 1) neurons in a layer that seems manageable. However, most images have way more pixels and they are not grey-scaled. Therefore, assuming that we have a set of color images in 4K Ultra HD, we will have 26,542,080 (4096 x 2160 x 3) different neurons connected to each other in the first layer which is not really manageable. Therefore, we can say that RegularNets are not scalable for image classification. However, especially when it comes to images, there seems to be little correlation or relation between two individual pixels unless they are close to each other. This leads to the idea of Convolutional Layers and Pooling Layers.

Layers in a CNN

We are capable of using many different layers in a convolutional neural network. However, convolution, pooling, and fully connected layers are the most important ones. Therefore, I will quickly introduce these layers before implementing them.

Convolutional Layers

The convolutional layer is the very first layer where we extract features from the images in our datasets. Due to the fact that pixels are only related to the adjacent and close pixels, convolution allows us to preserve the relationship between different parts of an image. Convolution is basically filtering the image with a smaller pixel filter to decrease the size of the image without losing the relationship between pixels. When we apply convolution to a 5×5 image by using a 3×3 filter with 1×1 stride (1-pixel shift at each step). We will end up having a 3×3 output (64% decrease in complexity).

 Figure 1: Convolution of 5 x 5 pixel image with 3 x 3 pixel filter (stride = 1 x 1 pixel)

Pooling Layer

When constructing CNNs, it is common to insert pooling layers after each convolution layer to reduce the spatial size of the representation to reduce the parameter counts which reduces the computational complexity. In addition, pooling layers also helps with the overfitting problem. Basically we select a pooling size to reduce the amount of the parameters by selecting the maximum, average, or sum values inside these pixels. Max Pooling, one of the most common pooling techniques, may be demonstrated as follows:

 Max Pooling by 2 x 2

A Set of Fully Connected Layers

A fully connected network is our RegularNet where each parameter is linked to one another to determine the true relation and effect of each parameter on the labels. Since our time-space complexity is vastly reduced thanks to convolution and pooling layers, we can construct a fully connected network in the end to classify our images. A set of fully-connected layers looks like this:

 A fully connected layer with two hidden layers

Now that you have some idea about the individual layers that we will use, I think it is time to share an overview look of a complete convolutional neural network.

 A Convolutional Neural Network Example by Mathworks

And now that you have an idea about how to build a convolutional neural network that you can build for image classification, we can get the most cliche dataset for classification: the MNIST dataset, which stands for Modified National Institute of Standards and Technology database. It is a large database of handwritten digits that is commonly used for training various image processing systems.

Downloading the MNIST Dataset

The MNIST dataset is one of the most common datasets used for image classification and accessible from many different sources. In fact, even Tensorflow and Keras allow us to import and download the MNIST dataset directly from their API. Therefore, I will start with the following two lines to import TensorFlow and MNIST dataset under the Keras API.

The MNIST database contains 60,000 training images and 10,000 testing images taken from American Census Bureau employees and American high school students [Wikipedia]. Therefore, in the second line, I have separated these two groups as train and test and also separated the labels and the images. x_train and x_test parts contain greyscale RGB codes (from 0 to 255) while y_train and y_test parts contain labels from 0 to 9 which represents which number they actually are. To visualize these numbers, we can get help from matplotlib.

When we run the code above, we will get the greyscale visualization of the RGB codes as shown below.

A visualization of the sample image at index 7777

We also need to know the shape of the dataset to channel it to the convolutional neural network. Therefore, I will use the “shape” attribute of a NumPy array with the following code:

You will get (60000, 28, 28). As you might have guessed 60000 represents the number of images in the train dataset and (28, 28) represents the size of the image: 28 x 28 pixel.

Reshaping and Normalizing the Images

To be able to use the dataset in Keras API, we need 4-dims NumPy arrays. However, as we see above, our array is 3-dims. In addition, we must normalize our data as it is always required in neural network models. We can achieve this by dividing the RGB codes to 255 (which is the maximum RGB code minus the minimum RGB code). This can be done with the following code:

Building the Convolutional Neural Network

We will build our model by using high-level Keras API which uses either TensorFlow or Theano on the backend. I would like to mention that there are several high-level TensorFlow APIs such as Layers, Keras, and Estimators which helps us create neural networks with high-level knowledge. However, this may lead to confusion since they all vary in their implementation structure. Therefore, if you see completely different codes for the same neural network although they all use TensorFlow, this is why. I will use the most straightforward API which is Keras. Therefore, I will import the Sequential Model from Keras and add Conv2D, MaxPooling, Flatten, Dropout, and Dense layers. I have already talked about Conv2D, Maxpooling, and Dense layers. In addition, Dropout layers fight with the overfitting by disregarding some of the neurons while training while Flatten layers flatten 2D arrays to 1D arrays before building the fully connected layers.

We may experiment with any number for the first Dense layer; however, the final Dense layer must have 10 neurons since we have 10 number classes (0, 1, 2, …, 9). You may always experiment with kernel size, pool size, activation functions, dropout rate, and a number of neurons in the first Dense layer to get a better result.

Compiling and Fitting the Model

With the above code, we created a non-optimized empty CNN. Now it is time to set an optimizer with a given loss function that uses a metric. Then, we can fit the model by using our train data. We will use the following code for these tasks:

You can experiment with the optimizer, loss function, metrics, and epochs. However, I can say that adam optimizer is usually out-performs the other optimizers. I am not sure if you can actually change the loss function for multi-class classification. Feel free to experiment and comment below. The epoch number might seem a bit small. However, you will reach to 98–99% test accuracy. Since the MNIST dataset does not require heavy computing power, you may easily experiment with the epoch number as well.

Evaluating the Model

Finally, you may evaluate the trained model with x_test and y_test using one line of code:

The results are pretty good for 10 epochs and for such a simple model.


The evaluation shows 98.5% accuracy on the test set!

We achieved 98.5% accuracy with such a basic model. To be frank, in many image classification cases (e.g. for autonomous cars), we cannot even tolerate 0.1% error since, as an analogy, it will cause 1 accident in 1000 cases. However, for our first model, I would say the result is still pretty good. We can also make individual predictions with the following code:

Our model will classify the image as a ‘9’ and here is the visual of the image:









Our model correctly classifies this image as a 9 (Nine)

Although it is not really a good handwriting of the number 9, our model was able to classify it as 9.


You have successfully built a convolutional neural network to classify handwritten digits with Tensorflow’s Keras API. You have achieved accuracy of over 98% and now you can even save this model & create a digit-classifier app! If you are curious about saving your model, I would like to direct you to the Keras Documentation. After all, to be able to efficiently use an API, one must learn how to read and use the documentation.

Fast Neural Style Transfer in 5 Minutes with TensorFlow Hub & Magenta

Fast Neural Style Transfer in 5 Minutes with TensorFlow Hub & Magenta

Transferring van Gogh’s Unique Style to Photos with Magenta’s Arbitrary Image Stylization Network and Deep Learning

Before we start the tutorial: If you are reading this article, we probably share similar interests and are/will be in similar industries. So let’s connect via Linkedin! Please do not hesitate to send a contact request! Orhan G. Yalçın — Linkedin

 Figure 1. A Neural Style Transfer Example made with Arbitrary Image Stylization Network

I am sure you have come across to deep learning projects on transferring styles of famous painters to new photos. Well, I have been thinking about working on a similar project, but I realized that you can make neural style transfer within minutes, like the one in Figure 1. I will show you how in a second. But, let’s cover some basics first:

Neural Style Transfer (NST)

Neural style transfer is a method to blend two images and create a new image from a content image by copying the style of another image, called style image. This newly created image is often referred to as the stylized image.

History of NST

Image stylization is a two-decade-old problem in the field of non-photorealistic rendering. Non-photorealistic rendering is the opposite of photorealism, which is the study of reproducing an image as realistically as possible. The output of a neural style transfer model is an image that looks similar to the content image but in painting form in the style of the style image.

 Figure 2. Original Work of Leon Gatys on CV-Foundation

Neural style transfer (NST) was first published in the paper “A Neural Algorithm of Artistic Style” by Gatys et al., originally released in 2015. The novelty of the NST method was the use of deep learning to separate the representation of the content of an image from its style of depiction. To achieve this, Gatys et al. used VGG-19 architecture, which was pre-trained on the ImageNet dataset. Even though we can build a custom model following the same methodology, for this tutorial, we will benefit from the models provided in TensorFlow Hub.

Image Analogy

Before the introduction of NST, the most prominent solution to image stylization was the image analogy method. Image Analogy is a method of creating a non-photorealistic rendering filter automatically from training data. In this process, the transformation between photos (A) and non-photorealistic copies (A’) are learned. After this learning process, the model can produce a non-photorealistic copy (B’) from another photo (B). However, NST methods usually outperform image analogy due to the difficulty of finding training data for the image analogy models. Therefore, we can talk about the superiority of NST over image analogy in real-world applications, and that’s why we will focus on the application of an NST model.

 Figure 3. Photo by Jonathan Cosens on Unsplash

Is it Art?

Well, once we build the model, you will see that creating non-photorealistic images with Neural Style Transfer is a very easy task. You can create a lot of samples by blending beautiful photos with the paintings of talented artists. There has been a discussion about whether these outputs are regarded as art because of the little work the creator needs to add to the end product. Feel free to build the model, generate your samples, and share your thoughts in the comments section.

Now that you know the basics of Neural Style Transfer, we can move on to TensorFlow Hub, the repository that we use for our NST work.

TensorFlow Hub

TensorFlow Hub is a collection of trained machine learning models that you can use with ease. TensorFlow’s official description for the Hub is as follows:

TensorFlow Hub is a repository of trained machine learning models ready for fine-tuning and deployable anywhere. Reuse trained models like BERT and Faster R-CNN with just a few lines of code.

Apart from pre-trained models such as BERT or Faster R-CNN, there are a good amount of pre-trained models. The one we will use is Magenta’s Arbitrary Image Stylization network. Let’s take a look at what Magenta is.

Magenta and Arbitrary Image Stylization

What is Magenta?

 Figure 4. Magenta Logo on Magenta

Magenta is an open-source research project, backed by Google, which aims to provide machine learning solutions to musicians and artists. Magenta has support in both Python and Javascript. Using Magenta, you can create songs, paintings, sounds, and more. For this tutorial, we will use a network trained and maintained by the Magenta team for Arbitrary Image Stylization.

Arbitrary Image Stylization

After observing that the original work for NST proposes a slow optimization for style transfer, the Magenta team developed a fast artistic style transfer method, which can work in real-time. Even though the customizability of the model is limited, it is satisfactory enough to perform a non-photorealistic rendering work with NST. Arbitrary Image Stylization under TensorFlow Hub is a module that can perform fast artistic style transfer that may work on arbitrary painting styles.

By now, you already know what Neural Style Transfer is. You also know that we will benefit from the Arbitrary Image Stylization module developed by the Magenta team, which is maintained in TensorFlow Hub.

Now it is time to code!

Get the Image Paths

 Figure 5. Photo by Paul Hanaoka on Unsplash

We will start by selecting two image files. I will directly load these image files from URLs. You are free to choose any photo you want. Just change the filename and URL in the code below. The content image I selected for this tutorial is the photo of a cat staring at the camera, as you can see in Figure 5.

 Figure 6. Bedroom in Arles by Vincent van Gogh

I would like to transfer the style of van Gogh. So, I chose one of his famous paintings: Bedroom in Arles, which he painted in 1889 while staying in Arles, Bouches-du-Rhône, France. Again, you are free to choose any painting of any artist you want. You can even use your own drawings.

The below code sets the path to get the image files shown in Figure 5 and Figure 6.


Custom Function for Image Scaling

  One thing I noticed that, even though we are very limited with model customization, by rescaling the images, we can change the style transferred to the photo. In fact, I found out that the smaller the images, the better the model transfers the style. Just play with the max_dim parameter if you would like to experiment. Just note that a larger max_dim means, it will take slightly longer to generate the stylized image.  


We will call the img_scaler function below, inside the load_img function.

Custom Function for Preprocessing the Image

Now that we set our image paths to load and img_scaler function to scale the loaded image, we can actually load our image files with the custom function below.

Every line in the Gist below is explained with comments. Please read carefully.


    Now our custom image loading function, load_img, is also created. All we have to do is to call it.  

Load the Content and Style Images

  For content image and style image, we need to call the load_img function once and the result will be a 4-dimensional Tensor, which is what will be required by our model below. The below lines is for this operation.  


Now that we successfully loaded our images, we can plot them with matplotlib, as shown below:


and here is the output:


Figure 7. Content Image on the Left (Photo by Paul Hanaoka on Unsplash) | Style Image on the Right (Bedroom in Arles by Vincent van Gogh)

You are not gonna believe this, but the difficult part is over. Now we can create our network and pass these image Tensors as arguments for NST operation.

Load the Arbitrary Image Stylization Network

We need to import the tensorflow_hub library so that we can use the modules containing the pre-trained models. After importing tensorflow_hub, we can use the load function to load the Arbitrary Image Stylization module as shown below. Finally, as shown in the documentation, we can pass the content and style images as arguments in tf.constant object format. The module returns our stylized image in an array format.

All we have to do is to use this array and plot it with matplotlib. The below lines create a plot free from all the axis and large enough for you to review the image.

… And here is our stylized image:


Figure 8. Paul Hanaoka’s Photo after Neural Style Transfer

Figure 9 summarizes what we have done in this tutorial:

  Figure 9. A Neural Style Transfer Example made with Arbitrary Image Stylization Network


As you can see, with a minimal amount of code (we did not even train a model), we did a pretty good Neural Style Transfer on a random image we took from Unsplash using a painting from Vincent van Gogh. Try different photos and paintings to discover the capabilities of the Arbitrary Image Stylization network. Also, play around with max_dim size, you will see that the style transfer changes to a great extent.

4 Pre-Trained CNN Models to Use for Computer Vision with Transfer Learning

4 Pre-Trained CNN Models to Use for Computer Vision with Transfer Learning

Using State-of-the-Art Pre-trained Neural Network Models to Tackle Computer Vision Problems with Transfer Learning

Before we start, if you are reading this article, I am sure that we share similar interests and are/will be in similar industries. So let’s connect via Linkedin! Please do not hesitate to send a contact request! Orhan G. Yalçın — Linkedin

 Figure 1. How Transfer Learning Works (Image by Author)

If you have been trying to build machine learning models with high accuracy; but never tried Transfer Learning, this article will change your life. At least, it did mine!

<script src=””></script>

Most of us have already tried several machine learning tutorials to grasp the basics of neural networks. These tutorials were very helpful to understand the basics of artificial neural networks such as Recurrent Neural Networks, Convolutional Neural Networks, GANs, and Autoencoders. But, their main functionality was to prepare you for real-world implementations.

Now, if you are planning to build an AI system that utilizes deep learning, you either (i) have to have a very large budget for training and excellent AI researchers at your disposal or (ii) have to benefit from transfer learning.

What is Transfer Learning?

Transfer learning is a subfield of machine learning and artificial intelligence which aims to apply the knowledge gained from one task (source task) to a different but similar task (target task).

For example, the knowledge gained while learning to classify Wikipedia texts can be used to tackle legal text classification problems. Another example would be using the knowledge gained while learning to classify cars to recognize the birds in the sky. As you can see there is a relation between these examples. We are not using a text classification model on bird detection.

Transfer learning is the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned.

In summary, transfer learning is a field that saves you from having to reinvent the wheel and helps you build AI applications in a very short amount of time.

Figure 2. Don’t Reinvent the Wheel, Transfer the Existing Knowledge (Photo by Jon Cartagena on Unsplash)

History of Transfer Learning

To show the power of transfer learning, we can quote from Andrew Ng:

Transfer learning will be the next driver of machine learning’s commercial success after supervised learning.

The history of Transfer Learning dates back to 1993. With her paper, Discriminability-Based Transfer between Neural Networks, Lorien Pratt opened the pandora’s box and introduced the world with the potential of transfer learning. In July 1997, the journal Machine Learning published a special issue for transfer learning papers. As the field advanced, adjacent topics such as multi-task learning were also included under the field of transfer learning. Learning to Learn is one of the pioneer books in this field. Today, transfer learning is a powerful source for tech entrepreneurs to build new AI solutions and researchers to push the frontiers of machine learning.

 Figure 3. Andrew Ng’s Expectation for the Commercial Success of Machine Learning Subfields (Image by Author)

How Does Transfer Learning Work?

There are three requirements to achieve transfer learning:

  • Development of an Open Source Pre-trained Model by a Third Party
  • Repurposing the Model
  • Fine Tuning for the Problem

Development of an Open Source Pre-trained Model

A pre-trained model is a model created and trained by someone else to solve a problem that is similar to ours. In practice, someone is almost always a tech giant or a group of star researchers. They usually choose a very large dataset as their base datasets such as ImageNet or the Wikipedia Corpus. Then, they create a large neural network (e.g., VGG19 has 143,667,240 parameters) to solve a particular problem (e.g., this problem is image classification for VGG19). Of course, this pre-trained model must be made public so that we can take these models and repurpose them.

Repurposing the Model

After getting our hands on these pre-trained models, we repurpose the learned knowledge, which includes the layers, features, weights, and biases. There are several ways to load a pre-trained model into our environment. In the end, it is just a file/folder which contains the relevant information. However, deep learning libraries already host many of these pre-trained models, which makes them more accessible and convenient:

You can use one of the sources above to load a trained model. It will usually come with all the layers and weights and you can edit the network as you wish.

Fine-Tuning for the Problem

Well, while the current model may work for our problem. It is often better to fine-tune the pre-trained model for two reasons:

  • So that we can achieve even higher accuracy;
  • Our fine-tuned model can generate the output in the correct format.

Generally speaking, in a neural network, while the bottom and mid-level layers usually represent general features, the top layers represent the problem-specific features. Since our new problem is different than the original problem, we tend to drop the top layers. By adding layers specific to our problems, we can achieve higher accuracy.

After dropping the top layers, we need to place our own layers so that we can get the output we want. For example, a model trained with ImageNet can classify up to 1000 objects. If we are trying to classify handwritten digits (e.g., MNIST classification), it may be better to end up with a final layer with only 10 neurons.

After we add our custom layers to the pre-trained model, we can configure it with special loss functions and optimizers and fine-tune it with extra training.

For a quick Transfer Learning tutorial, you may visit the post below:

4 Pre-Trained Models for Computer Vision

Here are the four pre-trained networks you can use for computer vision tasks such as ranging from image generation, neural style transfer, image classification, image captioning, anomaly detection, and so on:

  • VGG19
  • Inceptionv3 (GoogLeNet)
  • ResNet50
  • EfficientNet

Let’s dive into them one-by-one.


VGG is a convolutional neural network which has a depth of 19 layers. It was build and trained by Karen Simonyan and Andrew Zisserman at the University of Oxford in 2014 and you can access all the information from their paper, Very Deep Convolutional Networks for Large-Scale Image Recognition, which was published in 2015. The VGG-19 network is also trained using more than 1 million images from the ImageNet database. Naturally, you can import the model with the ImageNet trained weights. This pre-trained network can classify up to 1000 objects. The network was trained on 224×224 pixels colored images. Here is brief info about its size and performance:

  • Size: 549 MB
  • Top-1: Accuracy: 71.3%
  • Top-5: Accuracy: 90.0%
  • Number of Parameters: 143,667,240
  • Depth: 26
  Figure 4. An Illustration of the VGG-19 Network (Figure by Clifford K. Yang and Yufeng Zheng on ResearchGate)

Inceptionv3 (GoogLeNet)

Inceptionv3 is a convolutional neural network which has a depth of 50 layers. It was build and trained by Google and you can access all the information on the paper, titled “Going deeper with convolutions”. The pre-trained version of Inceptionv3 with the ImageNet weights can classify up to 1000 objects. The image input size of this network was 299×299 pixels, which is larger than the VGG19 network. While VGG19 was the runner up in 2014’s ImageNet competition, Inception was the winner. The brief summary of Inceptionv3 features is as follows:

  • Size: 92 MB
  • Top-1: Accuracy: 77.9%
  • Top-5: Accuracy: 93.7%
  • Number of Parameters: 23,851,784
  • Depth: 159



Figure 5. An Illustration of the Inceptionv3 Network (Figure by Masoud Mahdianpari, Bahram Salehi, and Mohammad Rezaee on ResearchGate)

ResNet50 (Residual Network)

ResNet50 is a convolutional neural network which has a depth of 50 layers. It was build and trained by Microsoft in 2015 and you can access the model performance results on their paper, titled Deep Residual Learning for Image Recognition. This model is also trained on more than 1 million images from the ImageNet database. Just like VGG-19, it can classify up to 1000 objects and the network was trained on 224×224 pixels colored images. Here is brief info about its size and performance:

  • Size: 98 MB
  • Top-1: Accuracy: 74.9%
  • Top-5: Accuracy: 92.1%
  • Number of Parameters: 25,636,712

If you compare ResNet50 to VGG19, you will see that ResNet50 actually outperforms VGG19 even though it has lower complexity. ResNet50 was improved several times and you also have access to newer versions such as ResNet101, ResNet152, ResNet50V2, ResNet101V2, ResNet152V2.




 Figure 6. An Illustration of the ResNet50 Network (Figure by Masoud Mahdianpari, Bahram Salehi, and Mohammad Rezaee on ResearchGate)


EfficientNet is a state-of-the-art convolutional neural network that was trained and released to the public by Google with the paper “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks” in 2019. There are 8 alternative implementations of EfficientNet (B0 to B7) and even the simplest one, EfficientNetB0, is outstanding. With 5.3 million parameters, it achieves a 77.1% Top-1 accuracy performance.

 Figure 7. EfficientNet Model Size vs. ImageNet Accuracy (Plot by Mingxing Tan and Quoc V. Le on Arxiv)



The brief summary of EfficientNetB0 features is as follows:

  • Size: 29 MB
  • Top-1: Accuracy: 77.1%
  • Top-5: Accuracy: 93.3%
  • Number of Parameters: ~5,300,000
  • Depth: 159

Other Pre-Trained Models for Computer Vision Problems

We listed the four state-of-the-art award-winning convolutional neural network models. However, there are dozens of other models available for transfer learning. Here is a benchmark analysis of these models, which are all available in Keras Applications.

Check out the Table: Benchmark Analysis of Pre-Trained CNN Models (Table by Author)
Table 1. Benchmark Analysis of Pre-Trained CNN Models (Table by Author)


In a world where we have easy access to state-of-the-art neural network models, trying to build your own model with limited resources is like trying to reinvent the wheel. It is pointless.

Instead try to work with these train models, add a couple of new layers on top considering your particular computer vision task, and train. The results will be much more successful than a model you build from scratch.

Subscribe to the Newsletter for Google Colab Notebooks

If you would like to have access to full code on Google Colab and have access to the latest content, subscribe to the mailing list!