Mastering TensorFlow Tensors in 5 Easy Steps

Discover how the building blocks of TensorFlow work at the lower level and learn how to make the most of Tensor objects | Deep Learning with TensorFlow 2.x

If you are reading this article, I am sure that we share similar interests and are/will be in similar industries. So let’s connect via Linkedin! Please do not hesitate to send a contact request! Orhan G. Yalçın — Linkedin




Photo by Esther Jiao on Unsplash

In this post, we will dive into the details of TensorFlow Tensors. We will cover the main topics related to Tensors in TensorFlow in these five simple steps:

  • Step I: Definition of Tensors → What is a Tensor?
  • Step II: Creation of Tensors → Functions to Create Tensor Objects
  • Step III: Qualifications of Tensors → Characteristics and Features of Tensor Objects
  • Step IV: Operations with Tensors → Indexing, Basic Tensor Operations, Shape Manipulation, and Broadcasting
  • Step V: Special Types of Tensors → Special Tensor Types Other than Regular Tensors

Let’s start!

Definition of Tensors: What is a Tensor?



Figure 1. A Visualization of Rank-3 Tensors (Figure by Author)

Tensors are TensorFlow’s multi-dimensional arrays with uniform type. They are very similar to NumPy arrays, and they are immutable, which means that they cannot be altered once created. You can only create a new copy with the edits.

Let’s see how Tensors work with a code example. But first, to work with TensorFlow objects, we need to import the TensorFlow library. We often use NumPy with TensorFlow, so let’s also import NumPy with the following lines:
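
The import lines did not survive in this copy of the post. A minimal sketch, assuming the conventional aliases tf and np (the version printout is only there to confirm that both libraries load):

```python
import numpy as np
import tensorflow as tf

# Print the installed versions to confirm that both libraries load
print("TensorFlow:", tf.__version__)
print("NumPy:", np.__version__)
```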

Creation of Tensors: Creating Tensor Objects

There are several ways to create a tf.Tensor object. Let’s start with a few TensorFlow functions that return Tensor objects, as shown in the examples below:
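
The original code cell is missing here, so the following is a reconstruction rather than the author's exact code. The variable names x, y, z, and q are assumptions; the arguments were chosen so that the results match the printed output in this section:

```python
import tensorflow as tf

# A (1, 5) tensor built from explicit values
x = tf.constant([[1, 2, 3, 4, 5]])
# (1, 5) tensors filled with ones and zeros; only the shape is required
y = tf.ones((1, 5))
z = tf.zeros((1, 5))
# A rank-1 tensor holding the integers 1 to 5 (limit is exclusive)
q = tf.range(start=1, limit=6, delta=1)

print(x)
print(y)
print(z)
print(q)
```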

tf.constant, tf.ones, tf.zeros, and tf.range are some of the functions you can use to create Tensor objects
tf.Tensor([[1 2 3 4 5]], shape=(1, 5), dtype=int32)
tf.Tensor([[1. 1. 1. 1. 1.]], shape=(1, 5), dtype=float32)
tf.Tensor([[0. 0. 0. 0. 0.]], shape=(1, 5), dtype=float32)
tf.Tensor([1 2 3 4 5], shape=(5,), dtype=int32)

As you can see, we created Tensor objects with the shape (1, 5) using three different functions and a fourth Tensor object with the shape (5,) using the tf.range() function. Note that tf.ones and tf.zeros accept the shape as their required argument since their element values are predetermined.

Qualifications of Tensors: Characteristics and Features of Tensor Objects

TensorFlow Tensors are created as tf.Tensor objects, and they have several characteristic features. First of all, they have a rank based on the number of dimensions they have. Secondly, they have a shape, a list that consists of the lengths of all their dimensions. All tensors have a size, which is the total number of elements within a Tensor. Finally, their elements are all recorded in a uniform Dtype (data type). Let’s take a closer look at each of these features.

Rank System and Dimension

Tensors are categorized based on the number of dimensions they have:

  • Rank-0 (Scalar) Tensor: A tensor containing a single value and no axes (0-dimension);
  • Rank-1 Tensor: A tensor containing a list of values in a single axis (1-dimension);
  • Rank-2 Tensor: A tensor containing 2 axes (2 dimensions); and
  • Rank-N Tensor: A tensor containing N axes (N dimensions).
 Figure 2. Rank-1 Tensor | Rank-2 Tensor| Rank-3 Tensor (Figure by Author)

For example, we can create a Rank-3 tensor by passing a three-level nested list object to the tf.constant function. For this example, we split the numbers 0 to 11 into a three-level nested list: two blocks, each holding two rows of three elements:
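
A sketch of what that call could look like. The nested list is reconstructed from the printed output, and the name rank_3_tensor is the one the text uses later:

```python
import tensorflow as tf

# Two blocks, each with two rows of three values -> shape (2, 2, 3)
rank_3_tensor = tf.constant([
    [[0, 1, 2], [3, 4, 5]],
    [[6, 7, 8], [9, 10, 11]],
])
print(rank_3_tensor)
```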

The code to create a Rank-3 Tensor object
tf.Tensor(
[[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]], shape=(2, 2, 3), dtype=int32)

We can view the number of dimensions that our `rank_3_tensor` object currently has with the `.ndim` attribute.

The number of dimensions in our Tensor object is 3


The shape feature is another attribute that every Tensor has. It shows the size of each dimension in the form of a list. We can view the shape of the rank_3_tensor object we created with the .shape attribute, as shown below:

The shape of our Tensor object is (2, 2, 3)

As you can see, our tensor has 2 elements at the first level, 2 elements in the second level, and 3 elements in the third level.


Size is another feature that Tensors have: the total number of elements in a Tensor. We cannot read the size from an attribute of the Tensor object. Instead, we need to use the tf.size() function. Finally, we convert the output to NumPy with the instance method .numpy() to get a more readable result:

The size of our Tensor object is 12


Tensors often contain numerical data types such as floats and ints, but may contain many other data types such as complex numbers and strings.

Each Tensor object, however, must store all its elements in a single uniform data type. Therefore, we can also view the type of data selected for a particular Tensor object with the .dtype attribute, as shown below:

The data type selected for this Tensor object is <dtype: 'int32'>
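
The four features discussed in this section (rank, shape, size, and dtype) can be inspected together. This is a minimal sketch that rebuilds the rank_3_tensor from above; the print labels are illustrative and not the author's exact wording:

```python
import tensorflow as tf

rank_3_tensor = tf.constant([
    [[0, 1, 2], [3, 4, 5]],
    [[6, 7, 8], [9, 10, 11]],
])

# Rank: the number of axes
print("Number of dimensions:", rank_3_tensor.ndim)
# Shape: the length of each axis
print("Shape:", rank_3_tensor.shape)
# Size: tf.size returns a scalar tensor, so convert it with .numpy()
print("Size:", tf.size(rank_3_tensor).numpy())
# Dtype: the single data type shared by all elements
print("Data type:", rank_3_tensor.dtype)
```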

Operations with Tensors


An index is a numerical representation of an item’s position in a sequence. This sequence can refer to many things: a list, a string of characters, or any arbitrary sequence of values.

TensorFlow follows standard Python indexing rules, which are similar to list indexing or NumPy array indexing.

A few rules about indexing:

  1. Indices start at zero (0).
  2. Negative index (“-n”) value means backward counting from the end.
  3. Colons (“:”) are used for slicing: start:stop:step.
  4. Commas (“,”) are used to reach deeper levels.

Let’s create a rank_1_tensor with the following lines:

tf.Tensor([ 0 1 2 3 4 5 6 7 8 9 10 11],
shape=(12,), dtype=int32)

and test out our rules no.1, no.2, and no.3:

First element is: 0
Last element is: 11
Elements in between the 1st and the last are: [ 1 2 3 4 5 6 7 8 9 10]
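
A sketch of code that produces the output above (the variable names other than rank_1_tensor are assumptions). It exercises rule 1 (zero-based index), rule 2 (negative index), and rule 3 (slicing):

```python
import tensorflow as tf

rank_1_tensor = tf.constant([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

first = rank_1_tensor[0].numpy()        # rule 1: indices start at zero
last = rank_1_tensor[-1].numpy()        # rule 2: -1 counts from the end
middle = rank_1_tensor[1:-1].numpy()    # rule 3: start:stop slicing

print("First element is:", first)
print("Last element is:", last)
print("Elements in between the 1st and the last are:", middle)
```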

Now, let’s create our rank_2_tensor object with the following code:

tf.Tensor( [[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]], shape=(2, 6), dtype=int32)

and test the 4th rule with several examples:

The first element of the first level is: [0 1 2 3 4 5]
The second element of the first level is: [ 6 7 8 9 10 11]
The first element of the second level is: 0
The third element of the second level is: 2
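
The same idea for the rank-2 case, again as a reconstruction from the printed output rather than the original cell. A comma separates the index for each level (rule 4):

```python
import tensorflow as tf

rank_2_tensor = tf.constant([
    [0, 1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10, 11],
])

# A single index selects a whole row on the first level
first_row = rank_2_tensor[0].numpy()
second_row = rank_2_tensor[1].numpy()
# A comma reaches into the second level: [row, column]
first_item = rank_2_tensor[0, 0].numpy()
third_item = rank_2_tensor[0, 2].numpy()

print("The first element of the first level is:", first_row)
print("The second element of the first level is:", second_row)
print("The first element of the second level is:", first_item)
print("The third element of the second level is:", third_item)
```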

Now, we covered the basics of indexing, so let’s take a look at the basic operations we can conduct on Tensors.

Basic Operations with Tensors

You can easily do basic math operations on tensors such as:

  1. Addition
  2. Element-wise Multiplication
  3. Matrix Multiplication
  4. Finding the Maximum or Minimum
  5. Finding the Index of the Max Element
  6. Computing Softmax Value

Let’s see these operations in action. We will create two Tensor objects and apply these operations.

We can start with addition.

tf.Tensor( [[ 3. 7.]
[11. 15.]], shape=(2, 2), dtype=float32)

Let’s continue with the element-wise multiplication.

tf.Tensor( [[ 2. 12.]
[30. 56.]], shape=(2, 2), dtype=float32)

We can also do matrix multiplication:

tf.Tensor( [[22. 34.]
[46. 74.]], shape=(2, 2), dtype=float32)

NOTE: Matmul operations lie at the heart of deep learning algorithms. Therefore, although you will rarely call matmul directly, it is crucial to be aware of these operations.

Examples of other operations we listed above:

The Max value of the tensor object b is: 7.0
The index position of the Max of the tensor object b is: [1 1]
The softmax computation result of the tensor object b is: [[0.11920291 0.880797 ] [0.11920291 0.880797 ]]
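
The two input tensors are not shown in this copy of the post. The values of a and b below are inferred from the printed results (they reproduce every output in this section), so treat them as a reconstruction:

```python
import tensorflow as tf

a = tf.constant([[2.0, 4.0], [6.0, 8.0]])
b = tf.constant([[1.0, 3.0], [5.0, 7.0]])

total = tf.add(a, b)            # element-wise addition
product = tf.multiply(a, b)     # element-wise multiplication
matmul = tf.matmul(a, b)        # matrix multiplication
maximum = tf.reduce_max(b)      # largest value in b
argmax = tf.argmax(b)           # row index of the max in each column (axis 0)
softmax = tf.nn.softmax(b)      # softmax along the last axis (each row)

print(total, product, matmul, sep="\n")
print("Max:", maximum.numpy())
print("Argmax:", argmax.numpy())
print("Softmax:", softmax.numpy())
```

Note that tf.argmax works along axis 0 by default, so [1 1] means that the largest value of each column sits in row 1.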

Manipulating Shapes

Just as in NumPy arrays and pandas DataFrames, you can reshape Tensor objects as well.

The tf.reshape operations are very fast since the underlying data does not need to be duplicated. For the reshape operation, we can use the tf.reshape() function. Let’s use the tf.reshape function in code:

The shape of our initial Tensor object is: (1, 6)
The shape of our initial Tensor object is: (6, 1)
The shape of our initial Tensor object is: (3, 2)
The shape of our flattened Tensor object is: tf.Tensor([1 2 3 4 5 6], shape=(6,), dtype=int32)
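
A sketch that yields the shapes listed above. The variable names are assumptions; the values 1 to 6 come from the flattened output:

```python
import tensorflow as tf

a = tf.constant([[1, 2, 3, 4, 5, 6]])   # shape (1, 6)

b = tf.reshape(a, [6, 1])    # six rows, one column
c = tf.reshape(a, [3, 2])    # three rows, two columns
d = tf.reshape(a, [-1])      # -1 flattens into a single axis

print("Initial shape:", a.shape)
print("Reshaped to:", b.shape)
print("Reshaped to:", c.shape)
print("Flattened:", d)
```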

As you can see, we can easily reshape our Tensor objects. But beware that the new shape must hold exactly the same number of elements as the original; otherwise, tf.reshape raises an error. Even a valid reshape can mix up your data if the new axes no longer match its meaning. So, look out for that 😀.


When we try to do combined operations using multiple Tensor objects, the smaller Tensors can stretch out automatically to fit larger tensors, just as NumPy arrays can. For example, when you attempt to multiply a scalar Tensor with a Rank-2 Tensor, the scalar is stretched to multiply every Rank-2 Tensor element. See the example below:

tf.Tensor( [[ 5 10]
[15 20]], shape=(2, 2), dtype=int32)
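
A sketch of the scalar-times-matrix case. The values are inferred from the printed result, and the variable names are assumptions:

```python
import tensorflow as tf

m = tf.constant([[1, 2], [3, 4]])   # rank-2 tensor
s = tf.constant(5)                  # scalar (rank-0) tensor

# The scalar is broadcast across every element of m
result = tf.multiply(m, s)
print(result)
```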

Thanks to broadcasting, you don’t have to manually match sizes when doing math operations on Tensors, as long as their shapes are compatible.

Special Types of Tensors

We tend to generate Tensors in a rectangular shape and store numerical values as elements. However, TensorFlow also supports irregular, or specialized, Tensor types, which are:

  1. Ragged Tensors
  2. String Tensors
  3. Sparse Tensors

Figure 3. Ragged Tensor | String Tensor| Sparse Tensor (Figure by Author)

Let’s take a closer look at what each of them is.

Ragged Tensors

Ragged tensors are tensors with different numbers of elements along some axis, as shown in Figure 3.

You can build a Ragged Tensor, as shown below:

<tf.RaggedTensor [[1, 2, 3],
[4, 5],
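
The output above is cut off in this copy, so the full input is unknown. A minimal sketch with tf.ragged.constant, where the first two rows match the visible output and the third row ([6]) is a made-up placeholder:

```python
import tensorflow as tf

# Rows of different lengths; the last row is a hypothetical example value
ragged_list = [[1, 2, 3], [4, 5], [6]]
ragged_tensor = tf.ragged.constant(ragged_list)

print(ragged_tensor)
print("Row lengths:", ragged_tensor.row_lengths().numpy())
```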

String Tensors

String Tensors are tensors that store string objects. We can build a String Tensor just as we create a regular Tensor object, but we pass string objects as elements instead of numerical objects, as shown below:

tf.Tensor([b'With this'
b'code, I am'
b'creating a String Tensor'],
shape=(3,), dtype=string)
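
A sketch that reproduces the output above (the variable name is an assumption). Each Python string is stored as a byte string, which is why the printed values carry a b prefix:

```python
import tensorflow as tf

string_tensor = tf.constant([
    "With this",
    "code, I am",
    "creating a String Tensor",
])
print(string_tensor)
```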

Sparse tensors

Finally, Sparse Tensors are rectangular Tensors for sparse data. When you have holes (i.e., Null values) in your data, Sparse Tensors are the go-to objects. Creating a sparse Tensor is a bit time-consuming and could be more streamlined. But, here is an example:


tf.Tensor( [[ 25 0 0 0 0]
[ 0 0 0 0 0]
[ 0 0 50 0 0]
[ 0 0 0 0 0]
[ 0 0 0 0 100]], shape=(5, 5), dtype=int32)
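
A sketch that produces the dense matrix above. The indices and values are read off the printed output; converting with tf.sparse.to_dense is one way to display the result and may differ from the original cell:

```python
import tensorflow as tf

# Only the non-zero entries are stored: their positions and their values
sparse_tensor = tf.sparse.SparseTensor(
    indices=[[0, 0], [2, 2], [4, 4]],
    values=[25, 50, 100],
    dense_shape=[5, 5],
)

# Expand to a regular tensor so the zeros become visible
dense = tf.sparse.to_dense(sparse_tensor)
print(dense)
```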


We have successfully covered the basics of TensorFlow’s Tensor objects.

Give yourself a pat on the back!

This should give you a lot of confidence since you are now much more informed about the building blocks of the TensorFlow framework.

Continue with Part 3 of the series:

Image Noise Reduction in 10 Minutes with Deep Convolutional Autoencoders

Using Autoencoders to Clean (or Denoise) Noisy Images with the help of Fashion MNIST | Unsupervised Deep Learning with TensorFlow

 Figure 1. Before and After the Noise Reduction of an Image of a Playful Dog (Photo by Anna Dudkova on Unsplash)

If you are on this page, you are also probably somewhat familiar with different neural network architectures. You have probably heard of feedforward neural networks, CNNs, RNNs and that these neural networks are very good for solving supervised learning tasks such as regression and classification.

But, we have a whole world of problems in the unsupervised learning sphere such as dimensionality reduction, feature extraction, anomaly detection, data generation, and augmentation as well as noise reduction. For these tasks, we need the help of special neural networks that are developed particularly for unsupervised learning tasks. Therefore, they must be able to learn patterns from the data without labeled examples. One of these special neural network architectures is the autoencoder.


What are autoencoders?

Autoencoders are neural network architectures that consist of two sub-networks, namely, encoder and decoder networks, which are tied to each other with a latent space. Autoencoders were first developed by Geoffrey Hinton, one of the most respected scientists in the AI community, and the PDP group in the 1980s. Hinton and the PDP Group aimed to address the “backpropagation without a teacher” problem, a.k.a. unsupervised learning, by using the input as the teacher. In other words, they simply used feature data both as feature data and label data. Let’s take a closer look at how autoencoders work!

  Figure 2. An Autoencoder Network with Encoder and Decoder Networks

Autoencoder Architecture

An autoencoder consists of an encoder network, which takes the feature data and encodes it to fit into the latent space. This encoded data (i.e., code) is used by the decoder to convert back to the feature data. In an autoencoder, what the model learns is how to encode the data efficiently so that the decoder can convert it back to the original. Therefore, the essential part of autoencoder training is to generate an optimized latent space.

Now, know that in most cases, the number of neurons in the latent space is much smaller than the input and output layers, but it does not have to be that way. There are different types of autoencoders such as undercomplete, overcomplete, sparse, denoising, contractive, and variational autoencoders. In this tutorial, we only focus on undercomplete autoencoders which are used for denoising.

Layers in an Autoencoder

The standard practice when building an autoencoder is to design an encoder and to create a mirrored version of this network as the decoder of this autoencoder. So, as long as there is an inverse relationship between the encoder and the decoder network, you are free to add any layer to these sub-networks. For example, if you are dealing with image data, you would most likely need convolution and pooling layers. On the other hand, if you are dealing with sequence data, you would probably need LSTM, GRU, or RNN units. The important point here is that you are free to build anything you want.

 Figure 3. Latent Spaces in UnderComplete Autoencoders are Usually Narrower than Other Layers

And now that you have an idea of autoencoders that you can build for image noise reduction, we can move on to the tutorial and start writing our code for our image noise reduction model. For the tutorial, we choose to do our own take on one of TensorFlow’s official tutorials, Intro to Autoencoders and we will use a very popular dataset among the members of the AI community: Fashion MNIST.

Downloading the Fashion MNIST Dataset

Fashion-MNIST is designed and maintained by Zalando, a European e-commerce company based in Berlin, Germany. Fashion MNIST consists of a training set of 60,000 images and a test set of 10,000 images. Each example is a 28×28 grayscale image, associated with a label from 10 classes. Fashion MNIST, which contains images of clothing items (as shown in Figure 4), is designed as an alternative dataset to the MNIST dataset, which contains handwritten digits. We choose Fashion MNIST simply because MNIST is already overused in many tutorials.

The lines below import TensorFlow and load Fashion MNIST:

Now let’s generate a grid with samples from our dataset with the following lines:

Our output shows the first 50 samples from the test dataset:

 Figure 4. A 5×10 Grid Showing the First 50 Samples in Fashion MNIST Test Dataset

Processing the Fashion MNIST Data

For computational efficiency and model reliability, we apply min-max normalization to our image data, limiting the value range to between 0 and 1. Since each pixel is stored as an 8-bit grayscale intensity, the minimum value is 0 and the maximum value is 255, and we can conduct the min-max normalization operation with the following lines:

We also have to reshape our NumPy arrays, as the current shapes of the datasets are (60000, 28, 28) and (10000, 28, 28). We just need to add a fourth dimension with a single value (e.g., from (60000, 28, 28) to (60000, 28, 28, 1)). The fourth dimension acts pretty much as proof that our data is in grayscale format, with a single value representing color information ranging from white to black. If we had color images, we would need three values in our fourth dimension. But all we need is a fourth dimension containing a single value since we use grayscale images. The following lines do this:

Let’s take a look at the shape of our NumPy arrays with the following lines:

Output: (60000, 28, 28, 1) and (10000, 28, 28, 1)
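
The preprocessing cells are missing in this copy. Below is a minimal NumPy sketch of the two steps described above (scaling to the 0 to 1 range and adding a channel axis). To stay runnable offline, it uses a small random array as a stand-in for the images; with the real data, x_train would come from tf.keras.datasets.fashion_mnist.load_data():

```python
import numpy as np

# Stand-in for the real images: 8 random 28x28 arrays of 8-bit pixel values
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(8, 28, 28), dtype=np.uint8)

# Min-max normalization: with min 0 and max 255 this is a division by 255
x_train = x_train.astype("float32") / 255.0

# Add a trailing channel axis: (8, 28, 28) -> (8, 28, 28, 1)
x_train = x_train[..., np.newaxis]

print(x_train.shape)
```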

Adding Noise to the Images

Remember, our goal is to build a model that is capable of performing noise reduction on images. To be able to do this, we will take existing image data and add random noise to it. Then, we will feed the noisy images as input and use the original images as the target output. Our autoencoder will learn the relationship between a noisy image and its clean version, and thus how to clean a noisy image. So let’s create a noisy version of our Fashion MNIST dataset.

For this task, we generate a random value for each array item by using the tf.random.normal method, multiply it by a noise_factor, which you can play around with, and add the result to the original value. The following code adds noise to images:

We also need to make sure that our array item values are within the range of 0 to 1. For this, we may use the tf.clip_by_value method, which clips the values outside of the min-max range and replaces them with the designated min or max value. The following code clips the values out of range:
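
A sketch of both steps under stated assumptions: the noise_factor of 0.4 is a placeholder value (the original value is not visible here), and a random tensor stands in for the normalized images:

```python
import tensorflow as tf

noise_factor = 0.4  # assumed value; tune it to make the task easier or harder

# Stand-in for a batch of normalized images with values in [0, 1]
x_train = tf.random.uniform((4, 28, 28, 1))

# Add scaled Gaussian noise to every pixel
x_train_noisy = x_train + noise_factor * tf.random.normal(shape=x_train.shape)

# Clip the result back into the valid [0, 1] range
x_train_noisy = tf.clip_by_value(
    x_train_noisy, clip_value_min=0.0, clip_value_max=1.0
)
```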

Now that we have created a normalized and noisy version of our dataset, we can check out how it looks:

 Figure 5. A 2×5 Grid Showing Clean and Noisy Image Samples

As you can see, it is almost impossible to understand what we see in the noisy images. However, our autoencoder will marvelously learn to clean them.

Building Our Model

In TensorFlow, apart from Sequential API and Functional API, there is a third option to build models: Model subclassing. In model subclassing, we are free to implement everything from scratch. Model subclassing is fully customizable and enables us to implement our own custom model. It is a very powerful method since we can build any type of model. However, it requires a basic level of object-oriented programming knowledge. Our custom class would subclass the tf.keras.Model object. It also requires declaring several variables and functions. However, it is nothing to be afraid of.

Also note that since we are dealing with image data, it is more efficient to build a convolutional autoencoder, which would look like this:

 Figure 6. A Convolutional Autoencoder Example

To build a model, we simply need to complete the following tasks:

  • Create a class extending the keras.Model object.
  • Create an __init__ function to declare two separate models built with the Sequential API. Within them, we need to declare layers that reverse each other: one Conv2D layer for the encoder model and one Conv2DTranspose layer for the decoder model.
  • Create a call function to tell the model how to process the inputs using the variables initialized in the __init__ method:
      • Call the initialized encoder model, which takes the images as input.
      • Call the initialized decoder model, which takes the output of the encoder model (encoded) as input.
      • Return the output of the decoder.

We can achieve all of them with the code below:
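
The model code is missing in this copy, so here is a sketch of the steps listed above. The class name NoiseReducer appears later in the article; the number of layers, filter counts, kernel sizes, and strides are assumptions loosely modeled on TensorFlow's Intro to Autoencoders tutorial, not the author's exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers


class NoiseReducer(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Encoder: strided convolutions shrink 28x28 -> 14x14 -> 7x7
        self.encoder = tf.keras.Sequential([
            layers.Input(shape=(28, 28, 1)),
            layers.Conv2D(16, (3, 3), activation="relu", padding="same", strides=2),
            layers.Conv2D(8, (3, 3), activation="relu", padding="same", strides=2),
        ])
        # Decoder: transposed convolutions grow 7x7 -> 14x14 -> 28x28
        self.decoder = tf.keras.Sequential([
            layers.Conv2DTranspose(8, (3, 3), strides=2, activation="relu", padding="same"),
            layers.Conv2DTranspose(16, (3, 3), strides=2, activation="relu", padding="same"),
            layers.Conv2D(1, (3, 3), activation="sigmoid", padding="same"),
        ])

    def call(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded


autoencoder = NoiseReducer()

# The two sub-networks can also be called on their own
batch = tf.random.uniform((2, 28, 28, 1))
encoded = autoencoder.encoder(batch)
decoded = autoencoder.decoder(encoded)
print(encoded.shape, decoded.shape)
```

Because the encoder and decoder are stored as attributes, the same two calls at the bottom can be reused after training to clean a batch of noisy images step by step.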

And let’s create the model with an object call:

Configuring Our Model

For this task, we will use an Adam optimizer and Mean Squared Error for our model. We can easily use the compile function to configure our autoencoder, as shown below:

Finally, we can run our model for 10 epochs by feeding the noisy and the clean images, which will take about 1 minute to train. We also use test datasets for validation. The following code is for training the model:
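
A sketch of the compile and fit calls. To keep it short and runnable without downloads, it uses a tiny stand-in convolutional model, random arrays in place of Fashion MNIST, and 2 epochs instead of the 10 mentioned above; all of those are assumptions for illustration:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Tiny stand-in for the autoencoder (hypothetical, for illustration only)
autoencoder = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(4, (3, 3), activation="relu", padding="same"),
    layers.Conv2D(1, (3, 3), activation="sigmoid", padding="same"),
])

# Adam optimizer with Mean Squared Error as the loss
autoencoder.compile(optimizer="adam", loss="mse")

# Random stand-ins for the clean and noisy image arrays
rng = np.random.default_rng(0)
x_clean = rng.random((32, 28, 28, 1), dtype=np.float32)
noise = 0.4 * rng.standard_normal(x_clean.shape).astype("float32")
x_noisy = np.clip(x_clean + noise, 0.0, 1.0)

# Noisy images are the input, clean images are the target
history = autoencoder.fit(
    x_noisy, x_clean,
    epochs=2,
    batch_size=16,
    validation_data=(x_noisy, x_clean),
    verbose=0,
)
print(history.history["loss"])
```

With the real data, the validation_data argument would receive the noisy and clean test sets rather than the training arrays reused here.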

 Figure 7. Deep Convolutional Autoencoder Training Performance

Reducing Image Noise with Our Trained Autoencoder

Now that we trained our autoencoder, we can start cleaning noisy images. Note that we have access to both encoder and decoder networks since we define them under the NoiseReducer object.

So, first, we will use an encoder to encode our noisy test dataset (x_test_noisy). Then, we will take the encoded output from the encoder to feed into the decoder to obtain the cleaned image. The following lines complete these tasks:

and let’s plot the first 10 samples for a side-by-side comparison:

The first row is for noisy images, the second row is for cleaned (reconstructed) images, and finally, the third row is for original images. See how the cleaned images are similar to the original images:

 Figure 8. A 3×10 Grid Showing Clean and Noisy Image Samples along with Their Reconstructed Counterparts


You have built an autoencoder model that can successfully clean very noisy images it has never seen before (we used the test dataset). There are obviously some non-recovered distortions, such as the missing bottom of the slippers in the second image from the right. Yet, if you consider how deformed the noisy images are, we can say that our model is pretty successful in recovering the distorted images.

Off the top of my head, you can, for instance, consider extending this autoencoder and embedding it into a photo enhancement app, which can increase the clarity and crispness of the photos.

Image Generation in 10 Minutes with Generative Adversarial Networks

Using Unsupervised Deep Learning to Generate Handwritten Digits with Deep Convolutional GANs using TensorFlow and the MNIST Dataset

Machines are generating perfect images these days and it’s becoming more and more difficult to distinguish the machine-generated images from the originals.

Figure 1. Examples of Images Generated by Nvidia’s StyleGAN [2]

Figure 2. Machine Generated Digits using MNIST [3]

After receiving more than 300k views for my article, Image Classification in 10 Minutes with MNIST Dataset, I decided to prepare another tutorial on deep learning. But this time, instead of classifying images, we will generate images using the same MNIST dataset, which stands for Modified National Institute of Standards and Technology database. It is a large database of handwritten digits that is commonly used for training various image processing systems[1].

Generative Adversarial Networks

To generate -well basically- anything with machine learning, we have to use a generative algorithm and at least for now, one of the best performing generative algorithms for image generation is Generative Adversarial Networks (or GANs).

The invention of Generative Adversarial Network

 Figure 3. A Photo of Ian Goodfellow on Wikipedia [4]

The invention of GANs occurred pretty unexpectedly. Ian Goodfellow, the famous AI researcher who was then a Ph.D. fellow at the University of Montreal, landed on the idea while discussing the flaws of other generative algorithms with his friends at a friend’s going-away party. After the party, he came home with high hopes and implemented the concept he had in mind. Surprisingly, everything went as he hoped on the first trial [5], and he successfully created Generative Adversarial Networks (GANs for short). According to Yann LeCun, the director of AI research at Facebook and a professor at New York University, GANs are “the most interesting idea in the last 10 years in machine learning” [6].

The rough structure of the GANs may be demonstrated as follows:

 Figure 4. Generative Adversarial Networks (GANs) utilizing CNNs | (Graph by author)

In an ordinary GAN structure, there are two agents competing with each other: a Generator and a Discriminator. They may be designed using different networks (e.g. Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), or just Regular Neural Networks (ANNs or RegularNets)). Since we will generate images, CNNs are better suited for the task. Therefore, we will build our agents with convolutional neural networks.

How does our GAN model operate?

  Figure 5. Generator and Discriminator Relationship in a GAN Network | (Graph by author)

In a nutshell, we will ask the generator to generate handwritten digits without giving it any additional data. Simultaneously, we will feed the existing handwritten digits to the discriminator and ask it to decide whether the images generated by the Generator are genuine or not. At first, the Generator will generate lousy images that will immediately be labeled as fake by the Discriminator. After getting enough feedback from the Discriminator, the Generator will learn to trick the Discriminator as its outputs come to differ less and less from the genuine images. Consequently, we will obtain a very good generative model which can give us very realistic outputs.

Building the GAN Model

GANs often use computationally complex calculations, so GPU-enabled machines will make your life a lot easier. Therefore, I will use Google Colab to decrease the training time with GPU acceleration.

GPU-Enabled Training with Google Colab

For machine learning tasks, for a long time, I used to use -iPython- Jupyter Notebook via Anaconda distribution for model building, training, and testing almost exclusively. Lately, though, I have switched to Google Colab for several good reasons.

Google Colab offers several additional features on top of the Jupyter Notebook such as (i) collaboration with other developers, (ii) cloud-based hosting, and (iii) GPU & TPU accelerated training. You can do all these with the free version of Google Colab. The relationship between Python, Jupyter Notebook, and Google Colab can be visualized as follows:

 Figure 6. Relationship between iPython, Jupyter Notebook, Google Colab | (Graph by author)

Anaconda provides a free and open-source distribution of the Python and R programming languages for scientific computing with tools like Jupyter Notebook (iPython) or Jupyter Lab. On top of these tools, Google Colab lets its users use the iPython notebook and lab tools with the computing power of their servers.

Now that we have a general understanding of generative adversarial networks as our neural network architecture and Google Colaboratory as our programming environment, we can start building our model. In this tutorial, we will do our own take on an official TensorFlow tutorial [7].

Initial Imports

Colab already has most machine learning libraries pre-installed, and therefore, you can just import them as shared below:

TensorFlow, Keras Layers, and Matplotlib Imports

For the sake of shorter code, I prefer to import layers individually, as shown above.

Load and Process the MNIST Dataset

For this tutorial, we can use the MNIST dataset. The MNIST dataset contains 60,000 training images and 10,000 testing images taken from American Census Bureau employees and American high school students [8].

Luckily, we can retrieve the MNIST dataset directly from the TensorFlow library. We retrieve the dataset from TensorFlow because this way, we get an already processed version of it. We still need to do some preparation and processing to fit our data into the GAN model. Therefore, in the second line, we separate the data into train and test groups and also separate the labels from the images.

The x_train and x_test parts contain grayscale pixel values (from 0 to 255), while the y_train and y_test parts contain labels from 0 to 9, which indicate the digit each image shows. Since we are doing an unsupervised learning task, we will not need the label values, and therefore, we use underscores (i.e., _) to ignore them. We also need to convert our dataset to 4 dimensions with the reshape function. Finally, we convert our NumPy array to a TensorFlow Dataset object for more efficient training. The lines below do all these tasks:
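
A sketch of these preparation steps. A random array stands in for the images so the snippet runs without a download; with the real data, the first line would unpack tf.keras.datasets.mnist.load_data() and discard the labels with underscores. The buffer size, batch size, and the scaling to [-1, 1] are assumptions (the scaling follows the official DCGAN tutorial and is not stated in the text above):

```python
import numpy as np
import tensorflow as tf

BUFFER_SIZE = 64   # hypothetical shuffle buffer size
BATCH_SIZE = 16    # hypothetical batch size

# Stand-in for: (train_images, _), (_, _) = tf.keras.datasets.mnist.load_data()
rng = np.random.default_rng(0)
train_images = rng.integers(0, 256, size=(64, 28, 28), dtype=np.uint8)

# Convert to 4 dimensions: (samples, height, width, channels)
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype("float32")

# Assumed scaling from [0, 255] to [-1, 1]
train_images = (train_images - 127.5) / 127.5

# Wrap the array in a shuffled, batched tf.data.Dataset
train_dataset = (
    tf.data.Dataset.from_tensor_slices(train_images)
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE)
)

first_batch = next(iter(train_dataset))
print(first_batch.shape)
```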

Our data is already processed and it is time to build our GAN model.

Build the Model

As mentioned above, every GAN must have at least one generator and one discriminator. Since we are dealing with image data, we need to benefit from Convolution and Transposed Convolution (Inverse Convolution) layers in these networks. Let’s define our generator and discriminator networks below.

Generator Network

Our generator network is responsible for generating 28×28 pixels grayscale fake images from random noise. Therefore, it needs to accept 1-dimensional arrays and output 28×28 pixels images. For this task, we need Transposed Convolution layers after reshaping our 1-dimensional array to a 2-dimensional array. Transposed Convolution layers can increase the size of a smaller array. We also take advantage of BatchNormalization and LeakyReLU layers. The below lines create a function which would generate a generator network with Keras Sequential API:
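
The generator code is not included in this copy. The sketch below follows the layer types named above (Dense, reshape, Transposed Convolution, BatchNormalization, LeakyReLU); the function name, the noise length of 100, the filter counts, and the tanh output are assumptions taken from the official DCGAN tutorial that this article builds on:

```python
import tensorflow as tf
from tensorflow.keras import layers


def make_generator_model():
    """Map a 100-value noise vector to a 28x28x1 image."""
    return tf.keras.Sequential([
        layers.Input(shape=(100,)),
        layers.Dense(7 * 7 * 256, use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        # Turn the flat vector into a small 7x7 feature map
        layers.Reshape((7, 7, 256)),
        layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        # Each stride-2 transposed convolution doubles height and width
        layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding="same",
                               use_bias=False, activation="tanh"),
    ])


generator = make_generator_model()

# Generate one (untrained) sample from random noise
noise = tf.random.normal([1, 100])
generated_image = generator(noise, training=False)
print(generated_image.shape)
```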

We can call our generator function with the following code:

 Figure 7. The Summary of Our Generator Network | (Graph by Author)

Now that we have our generator network, we can easily generate a sample image with the following code:

which would look like this:

 Figure 8. A Sample Image Generated by Non-Trained Generator Network | (Image by author)

It is just plain noise. But, the fact that it can create an image from a random noise array proves its potential.

Discriminator Network

For our discriminator network, we need to follow the inverse version of our generator network. It takes the 28×28 pixels image data and outputs a single value, representing how likely the image is to be authentic. So, our discriminator can review whether a sample image generated by the generator is fake.

We follow the same method that we used to create the generator network. The following lines create a function that creates a discriminator model using the Keras Sequential API:
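
A sketch of such a function. As with the generator, the function name, filter counts, and dropout rate are assumptions borrowed from the official DCGAN tutorial rather than the author's confirmed code. The final Dense layer has no activation, so the output is a raw score (logit), where positive leans toward real and negative leans toward fake:

```python
import tensorflow as tf
from tensorflow.keras import layers


def make_discriminator_model():
    """Map a 28x28x1 image to a single real-vs-fake score."""
    return tf.keras.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(64, (5, 5), strides=(2, 2), padding="same"),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Conv2D(128, (5, 5), strides=(2, 2), padding="same"),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Flatten(),
        layers.Dense(1),
    ])


discriminator = make_discriminator_model()

# Score a random stand-in image with the untrained network
sample_image = tf.random.normal([1, 28, 28, 1])
decision = discriminator(sample_image, training=False)
print(decision)
```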

We can call the function to create our discriminator network with the following line:

 Figure 9. The Summary of Our Discriminator Network | (Graph by author)

Finally, we can check what our non-trained discriminator says about the sample generated by the non-trained generator:

Output: tf.Tensor([[-0.00108097]], shape=(1, 1), dtype=float32)

A negative value shows that our non-trained discriminator concludes that the image sample in Figure 8 is fake. At the moment, what’s important is that it can examine images and provide results, and the results will be much more reliable after training.

Configure the Model

Since we are training two sub-networks inside a GAN network, we need to define two loss functions and two optimizers.

Loss Functions: We start by creating a Binary Crossentropy object from the tf.keras.losses module. We also set the from_logits parameter to True. After creating the object, we use it inside custom discriminator and generator loss functions. Our discriminator loss is calculated as the sum of two terms: (i) the cross-entropy between the discriminator’s predictions on real images and an array of ones, and (ii) the cross-entropy between its predictions on generated images and an array of zeros. Our generator loss is calculated by measuring how well it was able to trick the discriminator. Therefore, we need to compare the discriminator’s decisions on the generated images to an array of ones.

Optimizers: We also set two optimizers separately for generator and discriminator networks. We can use the Adam optimizer object from tf.keras.optimizers module.

The following lines configure our loss functions and optimizers:
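
A sketch of the two loss functions and the two optimizers described above. The function names and the learning rate of 1e-4 are assumptions; the structure of the losses follows the description in the text:

```python
import tensorflow as tf

# from_logits=True because the discriminator outputs raw scores
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)


def discriminator_loss(real_output, fake_output):
    # Real images should be scored as ones, generated images as zeros
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss


def generator_loss(fake_output):
    # The generator succeeds when its images are scored as ones
    return cross_entropy(tf.ones_like(fake_output), fake_output)


# One optimizer per network (learning rate is an assumed value)
generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)

# Sanity check: a score of 0 means "50% real", so each term equals ln(2)
undecided = tf.zeros((4, 1))
print(generator_loss(undecided).numpy())
print(discriminator_loss(undecided, undecided).numpy())
```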

Set the Checkpoints

We would like to have access to previous training steps and TensorFlow has an option for this: checkpoints. By setting a checkpoint directory, we can save our progress at every epoch. This will be especially useful when we restore our model from the last epoch. The following lines configure the training checkpoints by using the os library to set a path to save all the training steps:

Train the Model

Now that our data is ready and our model is created and configured, it is time to design our training loop. Note that at the moment, GANs require custom training loops and steps. I will try to make them as understandable as possible for you. Make sure that you read the code comments in the GitHub Gists.

Let’s create some of the variables with the following lines:

Our seed is the noise from which we generate images. The code below generates a random array drawn from a normal distribution with the shape (16, 100).
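As a minimal sketch of these variables, assuming the names EPOCHS, noise_dim, and num_examples_to_generate (the values 60, 100, and 16 come from the text; the names are my own choice):

```python
import tensorflow as tf

EPOCHS = 60                    # number of training epochs used in this tutorial
noise_dim = 100                # length of each random noise vector
num_examples_to_generate = 16  # one noise vector per image in the 4x4 grid

# Fixed noise, reused at every epoch so the generated grids are comparable.
seed = tf.random.normal([num_examples_to_generate, noise_dim])
print(seed.shape)
```

Keeping the seed fixed is what makes the epoch-by-epoch images show the same 16 samples improving over time.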

Define the Training Step

This is the most unusual part of our tutorial: we are setting a custom training step. After we define the custom train_step() function and decorate it with tf.function, our model will be trained based on this function.

The heavily commented code below defines the training step. Please read the comments carefully:
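For readers who want to see the shape of such a step, here is a self-contained sketch. The two tiny Dense networks are stand-ins I made up so the example runs on its own; the tutorial's actual generator and discriminator are larger, and the learning rates are assumed values:

```python
import tensorflow as tf

noise_dim = 100

# Hypothetical stand-in networks, used only to keep this sketch runnable.
generator = tf.keras.Sequential([
    tf.keras.Input(shape=(noise_dim,)),
    tf.keras.layers.Dense(28 * 28, activation="tanh"),
    tf.keras.layers.Reshape((28, 28, 1)),
])
discriminator = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1),  # raw score (logit), no activation
])

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)

@tf.function  # compiles the step into a TensorFlow graph
def train_step(images):
    # One noise vector per real image in the batch.
    noise = tf.random.normal([tf.shape(images)[0], noise_dim])

    # Two tapes, because each network has its own loss and gradients.
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)
        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = cross_entropy(tf.ones_like(fake_output), fake_output)
        disc_loss = (cross_entropy(tf.ones_like(real_output), real_output)
                     + cross_entropy(tf.zeros_like(fake_output), fake_output))

    gen_grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
    disc_grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(
        zip(gen_grads, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(
        zip(disc_grads, discriminator.trainable_variables))
    return gen_loss, disc_loss

# Run one step on a random batch that stands in for real MNIST images.
gen_loss, disc_loss = train_step(tf.random.normal([8, 28, 28, 1]))
```

Both networks are updated in the same step, each from its own loss.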

Now that we have created our custom training step with the tf.function decorator, we can define our train function for the training loop.

Define the Training Loop

We define a function, named train, for our training loop. Not only do we run a for loop to iterate our custom training step over the MNIST dataset, but we also do the following within a single function:

During the Training:

  • Start recording time spent at the beginning of each epoch;
  • Produce GIF images and display them;
  • Save the model every five epochs as a checkpoint;
  • Print out the completed epoch time; and
  • Generate a final image at the end, after the training is completed.

The following lines, with detailed comments, do all these tasks:

Image Generation Function

In the train function, there is a custom image generation function that we haven’t defined yet. Our image generation function does the following tasks:

  • Generate images by using the model;
  • Display the generated images in a 4×4 grid layout using matplotlib;
  • Save the final figure in the end

The following lines are in charge of these tasks:

Start the Training

After defining three complex functions, starting the training is fairly easy. Just call the train function with the arguments below:

If you use a GPU-enabled Google Colab notebook, the training will take around 10 minutes. If you are using a CPU, it may take much longer. Let’s see our final product after 60 epochs.

 Figure 10. The Digits Generated by Our GAN after 60 Epochs. Note that we are seeing 16 samples because we configured our output this way. | (Image by author)

Generate Digits

Before generating new images, let’s make sure we restore the values from the latest checkpoint with the following line:

We can also view the evolution of our generative GAN model by viewing the generated 4×4 grid with 16 sample digits for any epoch with the following code:

This code gives us the latest generated grid. Pass a different number between 0 and 60 to the display_image function to view the grid for another epoch.

Or better yet, let’s create a GIF image visualizing the evolution of the samples generated by our GAN with the following code:

Our output is as follows:

Figure 11. The GIF Image Showing the Evolution of our GAN Generated Sample Digits over Time | (Image by author)

As you can see in Figure 11, the outputs generated by our GAN become much more realistic over time.


You have built and trained a generative adversarial network (GAN) model, which can successfully create handwritten digits. There are obviously some samples that are not very clear, but after only 60 epochs of training on only 60,000 samples, I would say that the results are very promising.

Once you can build and train this network, you can generate much more complex images:

  • by working with a larger dataset with colored images in high definition;
  • by creating a more sophisticated discriminator and generator network;
  • by increasing the number of epochs;
  • by working on powerful GPU-enabled hardware.

In the end, you can create art pieces such as poems, paintings, text or realistic photos or videos.

Image Classification in 10 Minutes with MNIST Dataset

Using Convolutional Neural Networks to Classify Handwritten Digits with TensorFlow and Keras | Supervised Deep Learning

 MNIST Dataset and Number Classification by Katakoda

Before diving into this article, I just want to let you know that if you are into deep learning, I believe you should also check my other articles such as:

1 — Image Noise Reduction in 10 Minutes with Deep Convolutional Autoencoders where we learned to build autoencoders for image denoising;

2 — Predict Tomorrow’s Bitcoin (BTC) Price with Recurrent Neural Networks where we use an RNN to predict BTC prices and since it uses an API, the results always remain up-to-date.

When you start learning deep learning with different neural network architectures, you realize that one of the most powerful supervised deep learning techniques is the Convolutional Neural Network (abbreviated as “CNN”). The final structure of a CNN is actually very similar to that of Regular Neural Networks (RegularNets), where there are neurons with weights and biases. In addition, just like in RegularNets, we use a loss function (e.g. cross-entropy, also known as softmax loss) and an optimizer (e.g. the Adam optimizer) in CNNs [CS231]. In CNNs, however, there are also Convolutional Layers, Pooling Layers, and Flatten Layers. CNNs are mainly used for image classification, although you may find other application areas such as natural language processing.

Why Convolutional Neural Networks

The main structural feature of RegularNets is that every neuron in a layer is connected to every neuron in the adjacent layers. For example, when we have greyscale images of 28 by 28 pixels, we end up with 784 (28 x 28 x 1) input neurons, which seems manageable. However, most images have far more pixels and are not greyscale. Assuming that we have a set of color images in 4K resolution (4096 x 2160), we will have 26,542,080 (4096 x 2160 x 3) input neurons, each connected to every neuron in the first hidden layer, which is not really manageable. Therefore, we can say that RegularNets are not scalable for image classification. Moreover, especially when it comes to images, there seems to be little correlation or relation between two individual pixels unless they are close to each other. This leads to the idea of Convolutional Layers and Pooling Layers.
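The arithmetic behind these numbers is easy to check. The short sketch below recomputes both input sizes and, as an added illustration, the weight count of one fully connected layer; the 1,000 hidden neurons are an assumed figure of mine, not something from the article:

```python
def input_size(height, width, channels):
    """Number of input values when every pixel channel feeds one neuron."""
    return height * width * channels

mnist_inputs = input_size(28, 28, 1)       # greyscale MNIST digit
four_k_inputs = input_size(2160, 4096, 3)  # color image at 4096 x 2160

# Assumed hidden layer size, only to show how the weight count grows.
hidden_neurons = 1000

print(mnist_inputs)                    # 784
print(four_k_inputs)                   # 26542080
print(mnist_inputs * hidden_neurons)   # 784000 weights
print(four_k_inputs * hidden_neurons)  # 26542080000 weights
```

Even a modest hidden layer turns the 4K case into tens of billions of weights, which is the scalability problem described above.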

Layers in a CNN

We can use many different layers in a convolutional neural network. However, convolution, pooling, and fully connected layers are the most important ones. Therefore, I will quickly introduce these layers before implementing them.

Convolutional Layers

The convolutional layer is the very first layer, where we extract features from the images in our datasets. Because pixels are mostly related only to adjacent and nearby pixels, convolution allows us to preserve the relationship between different parts of an image. Convolution is basically filtering the image with a smaller pixel filter to decrease the size of the image without losing the relationship between pixels. When we apply convolution to a 5×5 image by using a 3×3 filter with a 1×1 stride (a 1-pixel shift at each step), we end up with a 3×3 output: 9 values instead of 25, a 64% decrease in size.

 Figure 1: Convolution of 5 x 5 pixel image with 3 x 3 pixel filter (stride = 1 x 1 pixel)
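To make the 5×5 to 3×3 reduction concrete, here is a small NumPy sketch of a convolution without padding. The averaging filter and the input values are assumptions chosen for illustration. Like most deep learning libraries, it slides the filter without flipping it:

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` without padding and sum the products."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh = (ih - kh) // stride + 1  # output height
    ow = (iw - kw) // stride + 1  # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
kernel = np.ones((3, 3)) / 9.0                    # assumed averaging filter

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)                    # (3, 3)
print(1 - feature_map.size / image.size)    # 0.64
```

The output size follows (input size - filter size) / stride + 1 in each dimension, which is where the 3×3 result comes from.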

Pooling Layer

When constructing CNNs, it is common to insert a pooling layer after each convolutional layer. Pooling reduces the spatial size of the representation, which lowers the parameter count and therefore the computational complexity. In addition, pooling layers also help with the overfitting problem. Basically, we select a pooling size and reduce the amount of data by taking the maximum, average, or sum of the values inside each window of that size. Max Pooling, one of the most common pooling techniques, may be demonstrated as follows:

 Max Pooling by 2 x 2
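The same operation can be written in a few lines of NumPy. This is only a sketch of 2×2 max pooling with a stride of 2, and the input values are made up for the example:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    h, w = feature_map.shape
    h, w = h - h % 2, w - w % 2  # drop an odd trailing row or column
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 7, 8],
                        [3, 2, 1, 0],
                        [1, 2, 3, 4]])

pooled = max_pool_2x2(feature_map)
print(pooled)
# [[6 8]
#  [3 4]]
```

A 4×4 input becomes a 2×2 output, so three quarters of the values are discarded while the strongest response in each window is kept.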

A Set of Fully Connected Layers

A fully connected network is our RegularNet, where each neuron is linked to every neuron in the next layer so that the network can determine the relation and effect of each input on the labels. Since our time-space complexity is vastly reduced thanks to the convolution and pooling layers, we can construct a fully connected network at the end to classify our images. A set of fully connected layers looks like this:

 A fully connected layer with two hidden layers

Now that you have some idea about the individual layers that we will use, I think it is time to share an overview of a complete convolutional neural network.

 A Convolutional Neural Network Example by Mathworks

Now that you have an idea about how to build a convolutional neural network for image classification, we can get the most cliché dataset for classification: the MNIST dataset, which stands for Modified National Institute of Standards and Technology database. It is a large database of handwritten digits that is commonly used for training various image processing systems.

Downloading the MNIST Dataset

The MNIST dataset is one of the most common datasets used for image classification and is accessible from many different sources. In fact, even TensorFlow and Keras allow us to import and download the MNIST dataset directly from their API. Therefore, I will start with the following two lines to import TensorFlow and the MNIST dataset under the Keras API.

The MNIST database contains 60,000 training images and 10,000 testing images taken from American Census Bureau employees and American high school students [Wikipedia]. Therefore, in the second line, I have separated these two groups as train and test and also separated the labels and the images. The x_train and x_test parts contain greyscale pixel values (from 0 to 255), while the y_train and y_test parts contain labels from 0 to 9 that indicate which digit each image shows. To visualize these digits, we can get help from matplotlib.

When we run the code above, we will get the greyscale visualization of the pixel values as shown below.

A visualization of the sample image at index 7777

We also need to know the shape of the dataset to feed it to the convolutional neural network. Therefore, I will use the “shape” attribute of a NumPy array with the following code:

You will get (60000, 28, 28). As you might have guessed, 60000 represents the number of images in the train dataset and (28, 28) represents the size of each image: 28 x 28 pixels.

Reshaping and Normalizing the Images

To be able to use the dataset with the Keras API, we need 4-dimensional NumPy arrays. However, as we see above, our array is 3-dimensional. In addition, we should normalize our data, as is common practice for neural network models. We can achieve this by dividing the pixel values by 255 (which is the maximum pixel value minus the minimum pixel value). This can be done with the following code:
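A minimal sketch of these two steps is shown below. To keep it runnable without downloading anything, it uses a small random array as a stand-in for x_train; that synthetic data is an assumption for illustration, but the reshape and division apply unchanged to the real arrays:

```python
import numpy as np

# Synthetic stand-in with the same layout as x_train: (samples, 28, 28), uint8.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)

# Add a trailing channel axis so each image is (28, 28, 1).
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)

# Convert to float before dividing, then scale pixel values into [0, 1].
x_train = x_train.astype("float32") / 255.0

print(x_train.shape)  # (100, 28, 28, 1)
```

The extra axis of size 1 is the channel dimension: greyscale images have one channel, whereas color images would have three.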

Building the Convolutional Neural Network

We will build our model by using the high-level Keras API, which uses either TensorFlow or Theano on the backend. I would like to mention that there are several high-level TensorFlow APIs such as Layers, Keras, and Estimators, which help us create neural networks with high-level knowledge. However, this may lead to confusion since they all vary in their implementation structure. Therefore, if you see completely different code for the same neural network although it all uses TensorFlow, this is why. I will use the most straightforward API, which is Keras. Therefore, I will import the Sequential model from Keras and add Conv2D, MaxPooling, Flatten, Dropout, and Dense layers. I have already talked about Conv2D, MaxPooling, and Dense layers. In addition, Dropout layers fight overfitting by disregarding some of the neurons during training, while Flatten layers flatten 2D arrays to 1D arrays before building the fully connected layers.

We may experiment with any number for the first Dense layer; however, the final Dense layer must have 10 neurons since we have 10 number classes (0, 1, 2, …, 9). You may always experiment with kernel size, pool size, activation functions, dropout rate, and the number of neurons in the first Dense layer to get a better result.
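A model of this kind might be sketched as follows. The filter count, kernel size, pool size, dropout rate, and the 128 neurons in the first Dense layer are assumed values chosen for illustration; only the 10-neuron output layer is fixed by the task:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),                # one greyscale channel
    tf.keras.layers.Conv2D(28, kernel_size=(3, 3)),   # feature extraction
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),   # spatial downsampling
    tf.keras.layers.Flatten(),                        # 2D maps -> 1D vector
    tf.keras.layers.Dense(128, activation="relu"),    # assumed width
    tf.keras.layers.Dropout(0.2),                     # assumed dropout rate
    tf.keras.layers.Dense(10, activation="softmax"),  # one neuron per digit
])

model.summary()
```

The softmax activation on the last layer turns the 10 outputs into class probabilities that sum to 1.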

Compiling and Fitting the Model

With the above code, we created a non-optimized empty CNN. Now it is time to set an optimizer with a given loss function that uses a metric. Then, we can fit the model by using our train data. We will use the following code for these tasks:

You can experiment with the optimizer, loss function, metrics, and epochs. However, I can say that the Adam optimizer usually outperforms the other optimizers. I am not sure if you can actually change the loss function for multi-class classification. Feel free to experiment and comment below. The epoch number might seem a bit small. However, you will reach 98–99% test accuracy. Since the MNIST dataset does not require heavy computing power, you may easily experiment with the epoch number as well.

Evaluating the Model

Finally, you may evaluate the trained model with x_test and y_test using one line of code:

The results are pretty good for 10 epochs and for such a simple model.


The evaluation shows 98.5% accuracy on the test set!

We achieved 98.5% accuracy with such a basic model. To be frank, in many image classification cases (e.g. for autonomous cars), we cannot even tolerate 0.1% error since, as an analogy, it will cause 1 accident in 1000 cases. However, for our first model, I would say the result is still pretty good. We can also make individual predictions with the following code:

Our model will classify the image as a ‘9’ and here is the visual of the image:

Our model correctly classifies this image as a 9 (Nine)

Although the digit is not written very clearly, our model was able to classify it as 9.


You have successfully built a convolutional neural network to classify handwritten digits with Tensorflow’s Keras API. You have achieved accuracy of over 98% and now you can even save this model & create a digit-classifier app! If you are curious about saving your model, I would like to direct you to the Keras Documentation. After all, to be able to efficiently use an API, one must learn how to read and use the documentation.

Fast Neural Style Transfer in 5 Minutes with TensorFlow Hub & Magenta

Transferring van Gogh’s Unique Style to Photos with Magenta’s Arbitrary Image Stylization Network and Deep Learning

 Figure 1. A Neural Style Transfer Example made with Arbitrary Image Stylization Network

I am sure you have come across deep learning projects on transferring the styles of famous painters to new photos. Well, I have been thinking about working on a similar project, but I realized that you can perform neural style transfer within minutes, like the one in Figure 1. I will show you how in a second. But, let’s cover some basics first:

Neural Style Transfer (NST)

Neural style transfer is a method to blend two images and create a new image from a content image by copying the style of another image, called the style image. This newly created image is often referred to as the stylized image.

History of NST

Image stylization is a two-decade-old problem in the field of non-photorealistic rendering. Non-photorealistic rendering is the opposite of photorealism, which is the study of reproducing an image as realistically as possible. The output of a neural style transfer model is an image that looks similar to the content image but in painting form in the style of the style image.

 Figure 2. Original Work of Leon Gatys on CV-Foundation

Neural style transfer (NST) was first published in the paper “A Neural Algorithm of Artistic Style” by Gatys et al., originally released in 2015. The novelty of the NST method was the use of deep learning to separate the representation of the content of an image from its style of depiction. To achieve this, Gatys et al. used VGG-19 architecture, which was pre-trained on the ImageNet dataset. Even though we can build a custom model following the same methodology, for this tutorial, we will benefit from the models provided in TensorFlow Hub.

Image Analogy

Before the introduction of NST, the most prominent solution to image stylization was the image analogy method. Image Analogy is a method of creating a non-photorealistic rendering filter automatically from training data. In this process, the transformation between photos (A) and non-photorealistic copies (A’) is learned. After this learning process, the model can produce a non-photorealistic copy (B’) from another photo (B). However, NST methods usually outperform image analogy due to the difficulty of finding training data for the image analogy models. Therefore, we can talk about the superiority of NST over image analogy in real-world applications, and that’s why we will focus on the application of an NST model.

 Figure 3. Photo by Jonathan Cosens on Unsplash

Is it Art?

Well, once we build the model, you will see that creating non-photorealistic images with Neural Style Transfer is a very easy task. You can create a lot of samples by blending beautiful photos with the paintings of talented artists. There has been a discussion about whether these outputs are regarded as art because of the little work the creator needs to add to the end product. Feel free to build the model, generate your samples, and share your thoughts in the comments section.

Now that you know the basics of Neural Style Transfer, we can move on to TensorFlow Hub, the repository that we use for our NST work.

TensorFlow Hub

TensorFlow Hub is a collection of trained machine learning models that you can use with ease. TensorFlow’s official description for the Hub is as follows:

TensorFlow Hub is a repository of trained machine learning models ready for fine-tuning and deployable anywhere. Reuse trained models like BERT and Faster R-CNN with just a few lines of code.

Apart from pre-trained models such as BERT or Faster R-CNN, there are many other pre-trained models. The one we will use is Magenta’s Arbitrary Image Stylization network. Let’s take a look at what Magenta is.

Magenta and Arbitrary Image Stylization

What is Magenta?

 Figure 4. Magenta Logo on Magenta

Magenta is an open-source research project, backed by Google, which aims to provide machine learning solutions to musicians and artists. Magenta has support in both Python and JavaScript. Using Magenta, you can create songs, paintings, sounds, and more. For this tutorial, we will use a network trained and maintained by the Magenta team for Arbitrary Image Stylization.

Arbitrary Image Stylization

After observing that the original work for NST proposes a slow optimization for style transfer, the Magenta team developed a fast artistic style transfer method, which can work in real-time. Even though the customizability of the model is limited, it is satisfactory enough to perform a non-photorealistic rendering work with NST. Arbitrary Image Stylization under TensorFlow Hub is a module that can perform fast artistic style transfer that may work on arbitrary painting styles.

By now, you already know what Neural Style Transfer is. You also know that we will benefit from the Arbitrary Image Stylization module developed by the Magenta team, which is maintained in TensorFlow Hub.

Now it is time to code!

Get the Image Paths

 Figure 5. Photo by Paul Hanaoka on Unsplash

We will start by selecting two image files. I will directly load these image files from URLs. You are free to choose any photo you want. Just change the filename and URL in the code below. The content image I selected for this tutorial is the photo of a cat staring at the camera, as you can see in Figure 5.

 Figure 6. Bedroom in Arles by Vincent van Gogh

I would like to transfer the style of van Gogh. So, I chose one of his famous paintings: Bedroom in Arles, which he painted in 1889 while staying in Arles, Bouches-du-Rhône, France. Again, you are free to choose any painting of any artist you want. You can even use your own drawings.

The below code sets the path to get the image files shown in Figure 5 and Figure 6.


Custom Function for Image Scaling

One thing I noticed is that, even though we are very limited with model customization, we can change the style transferred to the photo by rescaling the images. In fact, I found out that the smaller the images, the better the model transfers the style. Just play with the max_dim parameter if you would like to experiment. Note that with a larger max_dim, it will take slightly longer to generate the stylized image.


We will call the img_scaler function below, inside the load_img function.
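A scaling helper along these lines could look like the sketch below. The implementation is my own guess at what img_scaler does (resize so that the longer side equals max_dim while keeping the aspect ratio), and the default of 512 is an assumed value:

```python
import tensorflow as tf

def img_scaler(image, max_dim=512):
    """Resize `image` (height, width, channels) so its longer side is max_dim."""
    # Height and width as floats, without the channel dimension.
    original_shape = tf.cast(tf.shape(image)[:-1], tf.float32)
    # Ratio that maps the longer side onto max_dim.
    scale_ratio = max_dim / tf.reduce_max(original_shape)
    # New height and width, keeping the aspect ratio.
    new_shape = tf.cast(original_shape * scale_ratio, tf.int32)
    return tf.image.resize(image, new_shape)

# A blank 100 x 200 RGB tensor stands in for a decoded photo.
dummy_image = tf.zeros([100, 200, 3])
scaled = img_scaler(dummy_image, max_dim=50)
print(scaled.shape)  # (25, 50, 3)
```

Because both sides are multiplied by the same ratio, the image is never distorted, only shrunk or enlarged.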

Custom Function for Preprocessing the Image

Now that we set our image paths to load and img_scaler function to scale the loaded image, we can actually load our image files with the custom function below.

Every line in the Gist below is explained with comments. Please read carefully.


Now our custom image loading function, load_img, is ready. All we have to do is call it.

Load the Content and Style Images

We need to call the load_img function once for the content image and once for the style image. Each result will be a 4-dimensional Tensor, which is what our model below requires. The lines below perform this operation.


Now that we successfully loaded our images, we can plot them with matplotlib, as shown below:


and here is the output:


Figure 7. Content Image on the Left (Photo by Paul Hanaoka on Unsplash) | Style Image on the Right (Bedroom in Arles by Vincent van Gogh)

You are not gonna believe this, but the difficult part is over. Now we can create our network and pass these image Tensors as arguments for NST operation.

Load the Arbitrary Image Stylization Network

We need to import the tensorflow_hub library so that we can use the modules containing the pre-trained models. After importing tensorflow_hub, we can use the load function to load the Arbitrary Image Stylization module as shown below. Finally, as shown in the documentation, we can pass the content and style images as arguments in tf.constant object format. The module returns our stylized image in an array format.

All we have to do is use this array and plot it with matplotlib. The lines below create a plot without any axes that is large enough for you to review the image.

… And here is our stylized image:


Figure 8. Paul Hanaoka’s Photo after Neural Style Transfer

Figure 9 summarizes what we have done in this tutorial:

  Figure 9. A Neural Style Transfer Example made with Arbitrary Image Stylization Network


As you can see, with a minimal amount of code (we did not even train a model), we did a pretty good Neural Style Transfer on a random image we took from Unsplash using a painting from Vincent van Gogh. Try different photos and paintings to discover the capabilities of the Arbitrary Image Stylization network. Also, play around with the max_dim size, and you will see that the style transfer changes to a great extent.

3 Ways to Build Neural Networks in TensorFlow with the Keras API

Building Deep Learning models with Keras in TensorFlow 2.x is possible with the Sequential API, the Functional API, and Model Subclassing

  Figure 1. The Sequential API, The Functional API, Model Subclassing Methods Side-by-Side

If you are going around, checking out different tutorials, doing Google searches, spending a lot of time on Stack Overflow about TensorFlow, you might have realized that there are a ton of different ways to build neural network models. This has been an issue for TensorFlow for a long time. It is almost like TensorFlow is trying to find its path towards a bright deep learning environment. Well if you think about it, this is exactly what is happening and this is pretty normal for a library in its version 2.x. Since TensorFlow is so far the most mature deep learning library on the market, this is basically the best you can get.

Keras-TensorFlow Relationship

A Little Background

TensorFlow’s evolution into a deep learning platform did not happen overnight. Initially, TensorFlow marketed itself as a symbolic math library for dataflow programming across a range of tasks. Therefore, the value proposition that TensorFlow initially offered was not a pure machine learning library. The goal was to create an efficient math library so that custom machine learning algorithms that are built on top of this efficient structure would train in a short amount of time with high accuracy.

However, building models from scratch with low-level APIs repetitively was not very ideal. So, François Chollet, a Google engineer, developed Keras, as a separate high-level deep learning library. Although Keras has been capable of running on top of different libraries such as TensorFlow, Microsoft Cognitive Toolkit, Theano, or PlaidML, TensorFlow was and still is the most common library that people use Keras with.

Current Situation

After seeing the messiness around the model-building process, the TensorFlow team announced that Keras is going to be the central high-level API used to build and train models in TensorFlow 2.0. The alternative high-level API, the Estimator API, has started to lose its already-diminishing popularity after this announcement.

The Estimator API and The Keras API

 Figure 2. The Positions of the Keras API and the Estimator API within TensorFlow Diagram

Now, let’s go back to the problem: there are a lot of different methods that people use to build their models in TensorFlow. The main reason for this problem is TensorFlow’s failure to adopt a single Model API.

In version 1.x, for production-level projects, the go-to model-building API was the Estimator API. But, with the recent changes, the Keras API has almost caught up with the Estimator API. Initially, the Estimator API was more scalable, supported distributed training, and had convenient cross-platform functionality. Yet, most of these advantages of the Estimator API are now eliminated, and therefore, soon the Keras API will probably become the single standard API to build TensorFlow models.

So, in this post, we will only focus on the Keras API methods to build models in TensorFlow and there are three of them:

  • Using the Sequential API
  • Using the Functional API
  • Model Subclassing

I will compare them directly through their corresponding model-building code so that you can actually test them yourself. Let’s dive into coding.

Initial Code to Make the Comparison

To test these three Keras methods, we need to select a deep learning problem. Image Classification with MNIST is a very straightforward task. What we are trying to achieve is training a model to recognize handwritten digits, using the famous MNIST dataset.

Figure 3. Our Dummy Task for Benchmark Analysis: MNIST Image Classification

MNIST dataset, which stands for Modified National Institute of Standards and Technology database, is a large database of handwritten digits that is commonly used for training various image processing systems. The MNIST database contains 60,000 training images and 10,000 testing images taken from American Census Bureau employees and American high school students. You can find my separate tutorial about Image Classification if you would like to follow the full tutorial.

With the code below, we will import all layers and models so that it would not bother us in the upcoming parts. We also download the MNIST dataset and preprocess it so that it can be used in all models we will build with these three different methods. Just run the code below:

Gist 1. Necessary Imports, MNIST Loading, Preprocessing

Now that this part is out of the way, let’s focus on the three methods to build TensorFlow models.

3 Ways to Build a Keras Model

There are three methods to build a Keras model in TensorFlow:

  • The Sequential API: The Sequential API is the best method when you are trying to build a simple model with a single input, output, and layer branch. It is an excellent option for newcomers who would like to learn fast.
  • The Functional API: The Functional API is the most popular method to build Keras models. It can do everything that the Sequential API can do. Also, it allows multiple inputs, multiple outputs, branching, and layer sharing. It is a clean and easy-to-use method, and it still allows a good level of customization flexibility.
  • Model Subclassing: Model subclassing is for advanced level developers who need full control over their model, layer, and training process. You need to create a custom class defining the model, and you probably won’t need it for daily tasks. But, if you are a researcher with experimental needs, then model subclassing might be the best option for you since it would give you all the flexibility you need.

 Figure 4. Converting 2-Dim Image Array into 1 Dim Array with Flatten Layer

Let’s see how these methods are implemented. We will build a basic feedforward neural network with a single Flatten layer to convert 2-dimensional image arrays to 1-dimensional arrays and two Dense layers.

The Sequential API

In the Sequential API, we need to create a Sequential object from the tf.keras.models module. We can simply pass all the layers as a single argument in list format, as shown below. As you can see, it is very simple.

Gist 2. A Feedforward Neural Network Built with Keras Sequential API
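As a rough sketch of the feedforward network described above (one Flatten layer and two Dense layers), assuming 256 neurons and ReLU in the hidden layer, which are my illustrative choices:

```python
import tensorflow as tf

# All layers are passed to Sequential as one list, in execution order.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),                   # one MNIST image
    tf.keras.layers.Flatten(),                        # 28 x 28 -> 784
    tf.keras.layers.Dense(256, activation="relu"),    # assumed hidden size
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit
])

model.summary()
```

The model is fully defined by the order of the list, which is why this style suits a single chain of layers.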

The Functional API

With Functional API, we need to define our input separately. Then, we need to create an output object by also creating all the layers which are tied to one another and to the output. Finally, we create a Model object which would accept inputs and outputs as arguments. The code is still very clean, but we have much more flexibility in the Functional API.

Gist 3. A Feedforward Neural Network Built with the Keras Functional API
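The same network can be sketched with the Functional API as follows. As before, the 256 hidden neurons and the ReLU activation are assumed values used for illustration:

```python
import tensorflow as tf

# The input is declared on its own, then each layer is called on the previous output.
inputs = tf.keras.Input(shape=(28, 28))
x = tf.keras.layers.Flatten()(inputs)
x = tf.keras.layers.Dense(256, activation="relu")(x)  # assumed hidden size
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)

# The Model object ties the declared input to the final output.
model = tf.keras.Model(inputs=inputs, outputs=outputs)

model.summary()
```

Because every intermediate tensor has a name, it could be fed to more than one layer or combined with other inputs, which is where the extra flexibility comes from.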

Model Subclassing

Let’s move on to model subclassing. In model subclassing, we start with creating a class extending tf.keras.Model class. There are two crucial functions in Model subclassing:

  • The __init__ function acts as a constructor. Thanks to __init__, we can initialize the attributes (e.g., layers) of our model. super is used to call the parent constructor (the constructor in tf.keras.Model), and self is used to refer to instance attributes (e.g., layers).
  • The call function is where the operations are defined, using the layers created in the __init__ function.

To build the same model with Model Subclassing, we need to write much more code, as shown below:

Gist 4. A Feedforward Neural Network Built with Keras Model Subclassing
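A subclassed version of the same network might look like the sketch below. The class name CustomModel, the attribute names, and the 256 hidden neurons are hypothetical choices made for this example:

```python
import tensorflow as tf

class CustomModel(tf.keras.Model):
    def __init__(self, **kwargs):
        # Call the parent constructor in tf.keras.Model first.
        super().__init__(**kwargs)
        # Layers are created once here and stored as instance attributes.
        self.flatten = tf.keras.layers.Flatten()
        self.dense_1 = tf.keras.layers.Dense(256, activation="relu")
        self.dense_2 = tf.keras.layers.Dense(10, activation="softmax")

    def call(self, inputs):
        # The forward pass: how the layers are applied to the input.
        x = self.flatten(inputs)
        x = self.dense_1(x)
        return self.dense_2(x)

model = CustomModel()

# Calling the model on a dummy batch of two images builds its weights.
predictions = model(tf.zeros([2, 28, 28]))
print(predictions.shape)  # (2, 10)
```

Since call is ordinary Python, it can contain conditions, loops, or custom operations that the other two styles cannot easily express.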

The End Code

Now that you can create the same model with three different methods, you can choose any single one of them, build the model, and run the code below.

Gist 5. Model Configuration, Training, and Evaluation

The lines above take care of model configuration, training, and evaluation. When we compare the performances of these three methods, we see that they are pretty close but slightly different.

Table 1. Performances of Different Keras Methods: The Sequential API, The Functional API, and Model Subclassing

 Figure 5. Photo by Jonathan Chng on Unsplash

Our more complex Model Subclassing method outperforms the Sequential API and the Functional API. This shows that there are slight differences in the design of these methods at the low level as well. However, these differences are negligible.

Final Evaluations

By now, you have an idea about the similarities and differences between these three Keras methods. But, let’s summarize them all in a table:

Table 2. A Comparison of Different Methods to Build Keras Model in TensorFlow: The Sequential API, The Functional API, and Model Subclassing

In summary, if you are just starting out, stick with the Sequential API. As you dive into more complex models, try out the Functional API. If you are doing a Ph.D. or just enjoy conducting independent research, try out Model Subclassing. If you are a professional, stick with the Functional API. It will probably satisfy your needs.

Let me know your favorite method to build Keras models in the comments.