Eager Execution vs. Graph Execution in TensorFlow: Which is Better?

Comparing Eager Execution and Graph Execution with Code Examples, Understanding When to Use Each, and Why TensorFlow Switched to Eager Execution | Deep Learning with TensorFlow 2.x

 Figure 1. Eager Execution vs. Graph Execution (Figure by Author)

This is Part 4 of the Deep Learning with TensorFlow 2.x Series, and we will compare two execution options available in TensorFlow:

Eager Execution vs. Graph Execution

You may not even have noticed that you can choose between these two. The reason is that TensorFlow sets eager execution as the default option and does not bother you unless you are looking for trouble😀. But this was not the case in TensorFlow 1.x. Let's see what eager execution is and why TensorFlow made a major shift away from graph execution with TensorFlow 2.0.

 
Figure 2. An Analogy to Graph Execution vs. Eager Execution (Photo by James Pond on Unsplash | Photo by TVBEATS on Unsplash)

Eager Execution

Eager execution is a powerful execution environment that evaluates operations immediately. It does not build graphs; operations return actual values instead of computational graphs to be run later. With eager execution, TensorFlow calculates the values of tensors as they occur in your code.

Eager execution simplifies the model-building experience in TensorFlow, and you can see the result of a TensorFlow operation instantly. Since eager execution is intuitive and easy to test, it is an excellent option for beginners. Not only is debugging easier with eager execution, but it also reduces the need for repetitive boilerplate code. Eager execution is also a flexible option for research and experimentation. It provides:

  • An intuitive interface with natural Python code and data structures;
  • Easier debugging with calling operations directly to inspect and test models;
  • Natural control flow with Python, instead of graph control flow; and
  • Support for GPU & TPU acceleration.

In eager execution, TensorFlow operations are executed by the native Python environment, one operation after another. This is what makes eager execution (i) easy to debug, (ii) intuitive, (iii) easy to prototype, and (iv) beginner-friendly. For these reasons, the TensorFlow team adopted eager execution as the default option with TensorFlow 2.0. But more on that in the next sections…

Let’s take a look at the Graph Execution.

Graph Execution

We covered how useful and beneficial eager execution is in the previous section, but there is a catch:

Eager execution is slower than graph execution!

 Figure 3. The Graph Visualization of the Model Example Below in Tensorboard (Figure by Author)

Since eager execution runs all operations one by one in Python, it cannot take advantage of potential acceleration opportunities. Graph execution extracts tensor computations from Python and builds an efficient graph before evaluation. Graphs, or tf.Graph objects, are special data structures containing tf.Operation and tf.Tensor objects. While tf.Operation objects represent computational units, tf.Tensor objects represent data units. Graphs can be saved, run, and restored without the original Python code, which provides extra flexibility for cross-platform applications. With a graph, you can take advantage of your model in mobile, embedded, and backend environments where Python is unavailable. In a later part of this series, we will see that trained models are saved as graphs no matter which execution option you choose.

Graphs are also easy to optimize. They allow compiler-level transformations such as statically inferring tensor values through constant folding, distributing sub-parts of a computation across threads and devices, and simplifying arithmetic operations. TensorFlow's graph optimizer, Grappler, performs these optimizations. In graph execution, operations are evaluated only after the entire graph has been built. So, in summary, graph execution is:

  • Very fast;
  • Very flexible;
  • Able to run in parallel, even at the sub-operation level; and
  • Very efficient on multiple devices, with GPU & TPU acceleration capability.

Therefore, despite being difficult-to-learn, difficult-to-test, and non-intuitive, graph execution is ideal for large model training. For small model training, beginners, and average developers, eager execution is better suited.

Well, considering that eager execution is easy to build and test while graph execution is efficient and fast, you would want to build with eager execution and then run with graph execution, right? Well, we will get to that…

Looking for the best of two worlds? A fast but easy-to-build option? Keep reading 🙂

Before we dive into the code examples, let’s discuss why TensorFlow switched from graph execution to eager execution in TensorFlow 2.0.

Why Did TensorFlow Adopt Eager Execution?

Before version 2.0, TensorFlow prioritized graph execution because it was fast, efficient, and flexible. The difficulty of implementation was just a trade-off for seasoned programmers. PyTorch, on the other hand, prioritized dynamic computation graphs, a concept similar to eager execution. Although dynamic computation graphs are not as efficient as TensorFlow's graph execution, they provided an easy and intuitive interface for the new wave of researchers and AI programmers. This difference in the default execution strategy made PyTorch more attractive to newcomers. Soon enough, PyTorch, although a latecomer, started to catch up with TensorFlow.

Figure 4. TensorFlow vs. PyTorch Google Search Results by Google Trends (Figure by Author)

After seeing PyTorch's increasing popularity, the TensorFlow team soon realized that they had to prioritize eager execution. Therefore, they adopted it as the default execution method, with graph execution as an option. This mirrors PyTorch, which sets dynamic computation graphs as the default and lets you opt into static computation graphs for efficiency.

Now that both TensorFlow and PyTorch have adopted beginner-friendly execution methods, PyTorch has lost its competitive advantage among beginners. Currently, due to its maturity, TensorFlow has the upper hand. However, there is no doubt that PyTorch is also a good alternative for building and training deep learning models. The choice is yours…

Code with Eager, Execute with Graph

In this section, we will compare eager execution with graph execution using basic code examples. For the sake of simplicity, we will deliberately avoid building complex models. But in the upcoming parts of this series, we will also compare these execution methods using more complex models.

We have mentioned that TensorFlow prioritizes eager execution. But that's not all. You can now build models as in eager execution and then run them with graph execution. TensorFlow 1.x required users to create graphs manually and then compile them by passing a set of output tensors and input tensors to a session.run() call. With TensorFlow 2.0, graph building and session calls were reduced to an implementation detail. This simplification is achieved by replacing session.run() with the tf.function() decorator: in TensorFlow 2.0, you can decorate a Python function with tf.function() to run it as a single graph object. With this new method, you can easily build models and gain all the graph execution benefits.
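As a quick illustration of the decorator (a hedged sketch of my own, not from the original gists; multiply_fn is a made-up name):

```python
import tensorflow as tf

@tf.function  # traces the Python function into a tf.Graph on its first call
def multiply_fn(a, b):
    return tf.matmul(a, b)

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(multiply_fn(x, x))  # runs with graph execution, returns a concrete tf.Tensor
```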

Code Examples

This post will test eager and graph execution with a few basic examples and a full dummy model. Please note that since this is an introductory post, we will not dive deep into a full benchmark analysis for now.

Basic Examples

We will start with two initial imports:
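Since the original gist does not render here, a minimal reconstruction of those two imports would be:

```python
import tensorflow as tf  # TensorFlow 2.x
import timeit            # to time small bits of Python code
```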

timeit is a Python module that provides a simple way to time small bits of Python code, and it will be useful for comparing the performance of eager execution and graph execution.

To run code with eager execution, we don't have to do anything special; we create a function, pass a tf.Tensor object, and run the code. In the code below, we create a function called eager_function to calculate the square of Tensor values. Then, we create a tf.Tensor object, and finally we call the function we created. Our code is executed with eager execution:
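The embedded code is not shown here, but a sketch consistent with the description (including the print statement referenced later in the timing discussion) would be:

```python
def eager_function(x):
    result = x ** 2
    print(result)  # under eager execution, this prints a concrete tf.Tensor value
    return result

x = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0])
eager_function(x)
```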

Output: tf.Tensor([ 1.  4.  9. 16. 25.], shape=(5,), dtype=float32)

Now let's see how we can run the same function with graph execution.
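Again as a sketch of the missing snippet, wrapping the same function with tf.function() is all it takes:

```python
graph_function = tf.function(eager_function)
graph_function(x)  # during tracing, print() sees a symbolic tensor, not values
```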

Output: Tensor("pow:0", shape=(5,), dtype=float32)

By wrapping our eager_function with tf.function(), we can run our code with graph execution. We can compare the execution times of these two methods with timeit, as shown below:
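A hedged reconstruction of the timing snippet (number=1 is an assumption based on the small durations below):

```python
print("Eager time:", timeit.timeit(lambda: eager_function(x), number=1))
print("Graph time:", timeit.timeit(lambda: graph_function(x), number=1))
```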

Output:
Eager time: 0.0008830739998302306
Graph time: 0.0012101310003345134

As you can see, graph execution took more time. But why? Well, for simple operations, graph execution does not pay off, because it has to spend extra up-front compute to build the graph. We see the power of graph execution in complex calculations. If I run the code 100 times (by changing the number parameter), the results change dramatically, mainly because of the print statement in this example: under eager execution it runs on every call, while under graph execution it runs only once, during tracing:

Output:
Eager time: 0.06957343100020807 
Graph time: 0.02631650599960267

Full Model Test

Now that we have covered the basic code examples, let's build a dummy neural network to compare the performance of eager and graph execution. We will:

1 — Make TensorFlow imports to use the required modules;

2 — Build a basic feedforward neural network;

3 — Create a random Input object;

4 — Run the model with eager execution;

5 — Wrap the model with tf.function() to run it with graph execution.

If you are new to TensorFlow, don’t worry about how we are building the model. We will cover this in detail in the upcoming parts of this Series.

The following lines do all of these operations:
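Since the gist is not embedded here, the following is a hedged reconstruction of those five steps (layer sizes, input shape, and the number of timing iterations are illustrative assumptions, not the original code):

```python
# 1 — imports
import timeit
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Flatten

# 2 — a basic feedforward neural network
class FeedForwardModel(Model):
    def __init__(self):
        super(FeedForwardModel, self).__init__()
        self.flatten = Flatten()
        self.d1 = Dense(128, activation="relu")
        self.d2 = Dense(10, activation="softmax")

    def call(self, x):
        x = self.flatten(x)
        x = self.d1(x)
        return self.d2(x)

# 3 — a random input batch
input_data = tf.random.uniform([1000, 28, 28])

# 4 — the model runs with eager execution by default
eager_model = FeedForwardModel()

# 5 — wrapping the same model with tf.function() enables graph execution
graph_model = tf.function(eager_model)

print("Eager time:", timeit.timeit(lambda: eager_model(input_data), number=10000))
print("Graph time:", timeit.timeit(lambda: graph_model(input_data), number=10000))
```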

Output:
Eager time: 27.14511264399971
Graph time: 17.878579870000067

As you can see, graph execution outperformed eager execution, completing the run in roughly two-thirds of the time. In more complex model training operations, this margin grows much larger.

Final Notes

In this post, we compared eager execution with graph execution. While eager execution is easy to use and intuitive, graph execution is faster, more flexible, and more robust. Therefore, sticking with the default option, eager execution, is a no-brainer for beginners. However, if you are a seasoned programmer who wants to take advantage of the extra flexibility and speed, graph execution is for you. On the other hand, thanks to the latest improvements in TensorFlow, using graph execution is much simpler than it used to be, so you can even push your limits and try it out. Just be aware that debugging is also more difficult in graph execution.

The code examples above showed us that it is easy to apply graph execution for simple examples. For more complex models, there is some added workload that comes with graph execution.

Note that when you wrap your model with tf.function(), you cannot use several model functions like model.compile() and model.fit(), because they already build a graph automatically. We will cover those examples in a different, more advanced post of this series.

Congratulations

We have successfully compared Eager Execution with Graph Execution.

Give yourself a pat on the back!

This should give you a lot of confidence, since you are now much more informed about eager execution, graph execution, and the pros and cons of each execution method.

Beginner’s Guide to TensorFlow 2.x for Deep Learning Applications

Understanding the TensorFlow Platform and What It Has to Offer to a Machine Learning Expert

If you are reading this article, I am sure that we share similar interests and are/will be in similar industries. So let's connect via LinkedIn! Please do not hesitate to send a contact request! Orhan G. Yalçın — LinkedIn

If you have recently started learning machine learning, you might have already realized the power of artificial neural networks and deep learning compared to traditional machine learning. Compared to other models, artificial neural networks require an extra set of technical skills and conceptual knowledge.

Figure 1. Comparison of Deep Learning and the Traditional Machine Learning Approaches (Figure by Author)

The most important of these technical skills is the ability to use a deep learning framework. A good deep learning framework speeds up the development process and provides efficient data processing, visualization, and deployment tools. When it comes to choosing a deep learning framework, as of 2020, you only have two viable options:

Figure 2. PyTorch by Facebook | Figure 3. TensorFlow by Google

Well, we could compare TensorFlow and PyTorch for days, but this post is not about framework benchmarking. This post is about what you can achieve with TensorFlow.

What is TensorFlow?

TensorFlow is an end-to-end framework and platform designed to build and train machine learning models, especially deep learning models. It was developed by Google and released as an open-source platform in 2015.

The two programming languages with stable and official TensorFlow APIs are Python and C. In addition, C++, Java, JavaScript, Go, and Swift offer limited-to-extensive TensorFlow compatibility. Most developers end up using Python, since it has compelling data libraries such as NumPy, pandas, and Matplotlib.

Why Should We Use TensorFlow?

There are several advantages of using a powerful deep learning framework, and the non-exhaustive list below points out some of them:

  • Reduced time to build and train models;
  • Useful data processing tools;
  • Compatibility with other popular data libraries such as NumPy, matplotlib, and pandas;
  • A rich catalog of pre-trained models with TF Hub;
  • Tools to deploy trained models across different devices such as iOS, Android, Windows, macOS, and Web;
  • Great community support;
  • A skill desired by tech companies.

A Brief History of TensorFlow

Currently, we are using the second major version of TensorFlow: TensorFlow 2.x. It took almost nine years to achieve this level of maturity. However, I would say that we are still in the beginning phase of the ultimate deep learning platform, because current trends indicate that deep learning processes will be much more streamlined in the future. Some claim that API-based practices will become the standard way of using deep learning and artificial neural networks. But let's not get ahead of ourselves and take a look at the history of the TensorFlow platform:

The TensorFlow team deliberately uses the term platform since its deep learning library is just a part of the whole technology.

2011–2016: The Infancy and Initial Developments

I — In 2011, Google Brain developed a proprietary machine learning library for internal Google use, called DistBelief. DistBelief was primarily used for Google’s core businesses, such as Google Search and Google Ads.

I — In 2015, to speed up the advancements in artificial intelligence, Google decided to release TensorFlow as an open-source library. The TensorFlow beta was released.

I — In 2016, Google announced Tensor Processing Units (TPUs). Tensors are the building blocks of TensorFlow applications, and as the name suggests, TPUs are ASICs specially designed for deep learning operations.

Figure 4. Google’s Tensor Processing Units on Wikipedia
ASIC stands for application-specific integrated circuit. ASICs are customized for a particular use such as deep learning or cryptocurrency mining, rather than general-purpose use.

2017–2019: First Major Version and the Advancements in Cross-Platform Technologies

The Developments of 2017:

I — In February, TensorFlow 1.0 was released, setting a milestone. Before February 2017, TensorFlow was still in its 0.x.x versions, the initial development phase. In general, a 1.0.0 version defines a public API with stable production capability. Therefore, February 2017 was indeed a big milestone for TensorFlow.

I — Seeing the rapid advancements in mobile technologies, the TensorFlow team announced TensorFlow Lite, a library for machine learning development on mobile devices, in May 2017.

I — Finally, in December 2017, Google introduced Kubeflow, an open-source platform that allows operation and deployment of TensorFlow models on Kubernetes. In other words, it is "the Machine Learning Toolkit for Kubernetes".

The Developments of 2018:

I — In March, Google announced TensorFlow.js, which enables developers to implement and serve machine learning models using JavaScript.

I — In July 2018, Google announced the Edge TPU, Google's purpose-built ASIC designed to run TensorFlow Lite machine learning (ML) models on edge devices.

The Developments of 2019:

I — In January 2019, the TensorFlow team announced the official release date for TensorFlow 2.0.0: September 2019.

I — In May 2019, TensorFlow Graphics was announced to tackle issues related to graphics rendering and 3D modeling.

2019–2020: From September 2019 Onwards: TensorFlow 2.0+

I — In September 2019, the TensorFlow team released TensorFlow 2.0, the current major version, which streamlined many of the inconveniences of building neural networks.

I — With version 2.0, TensorFlow finally embraced Keras as its official high-level API for building, training, and evaluating neural networks.

I — TensorFlow 2.0 streamlined the data loading and processing tools and provided newly added features.

I — Eager execution was made the default option, replacing graph execution. This strategy was adopted because PyTorch had attracted many researchers with its eager-style execution.

With Eager execution, TensorFlow calculates the values of tensors as they occur in your code.

As you can see, TensorFlow is much more than a deep learning library for Python. It is an end-to-end platform with which you can process your data, build and train machine learning models, and serve the trained models across different devices in different programming languages. Below you can see the current diagram of the TensorFlow platform:

Figure 5. The Current Diagram of the TensorFlow Platform (Figure by Author)

How Popular is TensorFlow?

As of 2020, the real competition is taking place between TensorFlow and PyTorch. Due to its maturity, extensive support in multiple programming languages, popularity in the job market, extensive community support, and supporting technologies, TensorFlow currently has the upper hand.

Figure 6. Deep Learning Framework Power Score 2018 (based on Jeff Hale’s Work) (Figure by Author)

In 2018, Jeff Hale developed a power ranking for the deep learning frameworks on the market, weighting mentions in online job listings, relevant articles and blog posts, and activity on GitHub. Since 2018, PyTorch has gained momentum, and I believe it must have a higher score by now. But I believe TensorFlow still has the edge over PyTorch due to its maturity.

I am Convinced! What’s Next?

You have come this far, and I hope you have already developed an understanding of what TensorFlow is and how you can benefit from it. If you are convinced to learn TensorFlow, in the next posts I will explain the topics below with actual code examples:

  • The very basics of TensorFlow: Tensors, Variables, and Eager Execution; and
  • Five major capabilities of TensorFlow 2.x that cover the entire deep learning pipeline operations.

The second post is already published:

And the third one:

You can follow my account and subscribe to my newsletter:

Final Notes

Over the years, TensorFlow turned into a big platform covering every need of machine learning experts from head to toe. There is still a long way to go, but we are far ahead compared to where we were ten years ago. Join the rise of this new technology and learn to implement your own deep learning models with TensorFlow’s help. Don’t miss out…

Finally, if you are interested in applied deep learning tutorials, check out some of my articles:

3 Pre-Trained Model Series to Use for NLP with Transfer Learning

Using State-of-the-Art Pre-trained Neural Network Models (OpenAI’s GPTs, BERTs, ELMos) to Tackle Natural Language Processing Problems with Transfer Learning

 

Figure 1. Photo by Safar Safarov on Unsplash

Before we start, if you are reading this article, I am sure that we share similar interests and are/will be in similar industries. So let's connect via LinkedIn! Please do not hesitate to send a contact request! Orhan G. Yalçın — LinkedIn

If you have been trying to build machine learning models with high accuracy but have never tried transfer learning, this article will change your life. At least, it did mine!

 

Figure 2. A Depiction of Transfer Learning Logic (Figure by Author)

Note that this post is a follow-up to my post on transfer learning for computer vision tasks. That post has started to gain popularity, and now I wanted to share the NLP version with you. But, just in case, check it out:

Most of us have already tried several machine learning tutorials to grasp the basics of neural networks. These tutorials helped us understand the basics of artificial neural networks such as Recurrent Neural Networks, Convolutional Neural Networks, GANs, and Autoencoders. But, their main functionality was to prepare you for real-world implementations.

Now, if you are planning to build an AI system that utilizes deep learning, you have to either

  • have deep pockets for training and excellent AI researchers at your disposal*; or
  • benefit from transfer learning.
* According to BD Tech Talks, the training cost of OpenAI's GPT-3 exceeded US$4.6 million.

What is Transfer Learning?

Transfer learning is a subfield of machine learning and artificial intelligence, which aims to apply the knowledge gained from one task (source task) to a different but similar task (target task). In other words:

Transfer learning is the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned.

For example, the knowledge gained while learning to classify Wikipedia texts can help tackle legal text classification problems. Another example would be using the knowledge gained while learning to classify cars to recognize birds in the sky. As you can see, there is a relation between these examples: we are not using a text classification model for bird detection.

In summary, transfer learning saves us from reinventing the wheel, meaning we don’t waste time doing the things that have already been done by a major company. Thanks to transfer learning, we can build AI applications in a very short amount of time.

History of Transfer Learning

The history of transfer learning dates back to 1993. With her paper Discriminability-Based Transfer between Neural Networks, Lorien Pratt opened Pandora's box and introduced the world to the potential of transfer learning. In July 1997, the journal Machine Learning published a special issue on transfer learning papers. As the field advanced, adjacent topics such as multi-task learning were also included under the umbrella of transfer learning. Learning to Learn is one of the pioneering books in this field. Today, transfer learning is a powerful source for tech entrepreneurs to build new AI solutions and for researchers to push the frontiers of machine learning.

To show the power of transfer learning, we can quote from Andrew Ng:

Transfer learning will be the next driver of machine learning’s commercial success after supervised learning.

 

Figure 3. A Depiction of Commercial Potential of Learning Approaches (Figure by Author)

There are three requirements to achieve transfer learning:

  • Development of an Open Source Pre-trained Model by a Third Party
  • Repurposing the Model
  • Fine Tuning for the Problem

Development of an Open Source Pre-trained Model

A pre-trained model is a model created and trained by someone else to solve a similar problem. In practice, someone is almost always a tech giant or a group of star researchers. They usually choose a very large dataset as their base datasets, such as ImageNet or the Wikipedia Corpus. Then, they create a large neural network (e.g., VGG19 has 143,667,240 parameters) to solve a particular problem (e.g., this problem is image classification for VGG19). Of course, this pre-trained model must be made public so that we can take it and repurpose it.

Repurposing the Model

After getting our hands on these pre-trained models, we repurpose the learned knowledge, which includes the layers, features, weights, and biases. There are several ways to load a pre-trained model into our environment. In the end, it is just a file/folder which contains the relevant information. Deep learning libraries already host many of these pre-trained models, which makes them more accessible and convenient:

You can use one of the sources above to load a trained model. It will usually come with all the layers and weights, and you can edit the network as you wish. Additionally, some research labs maintain their own repos, as you will see for ELMo later in this post.
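As a hedged illustration (my addition, not the post's original code), loading a pre-trained BERT through Hugging Face's Transformers library takes only a few lines; "bert-base-uncased" is one of the hosted checkpoints, and exact signatures may vary across library versions:

```python
# pip install transformers  (with TensorFlow 2.x installed as the backend)
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Transfer learning saves time.", return_tensors="tf")
outputs = model(inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```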

Fine-Tuning for the Problem

Now, while the current model may work for our problem as-is, it is often better to fine-tune the pre-trained model, for two reasons:

  • So that we can achieve even higher accuracy;
  • Our fine-tuned model can generate the output in the correct format.

Generally speaking, in a neural network, while the bottom and mid-level layers usually represent general features, the top layers represent the problem-specific features. Since our new problem is different than the original problem, we tend to drop the top layers. By adding layers specific to our problems, we can achieve higher accuracy.

After dropping the top layers, we need to place our own layers so that we can get the output we want. For example, a model trained on English Wikipedia, such as BERT, can be customized by adding additional layers and then further trained with the IMDB Reviews dataset to predict movie review sentiment.

After adding our custom layers to the pre-trained model, we can configure it with special loss functions and optimizers and fine-tune it with extra training.
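A hedged sketch of this recipe for the BERT/IMDB example above (the sequence length, learning rate, and single sigmoid output are illustrative assumptions, and call signatures may differ between transformers versions):

```python
import tensorflow as tf
from transformers import TFBertModel

bert = TFBertModel.from_pretrained("bert-base-uncased")

# problem-specific top: one sigmoid unit for binary sentiment
input_ids = tf.keras.layers.Input(shape=(128,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.layers.Input(shape=(128,), dtype=tf.int32, name="attention_mask")

cls_embedding = bert(input_ids, attention_mask=attention_mask).last_hidden_state[:, 0, :]
sentiment = tf.keras.layers.Dense(1, activation="sigmoid")(cls_embedding)

model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=sentiment)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
              loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(...) on tokenized IMDB Reviews would then fine-tune the whole stack
```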

For a quick Transfer Learning tutorial, you may visit the post below:

3 Popular Pre-Trained Model Series for Natural Language Processing

Here are the three pre-trained network series you can use for natural language processing tasks, ranging from text classification, sentiment analysis, and text generation to word embedding and machine translation:

 

Figure 4. Overall Network Comparison for BERT, OpenAI GPT, ELMo (Figure from the BERT paper)

While BERT and OpenAI GPT are based on transformer networks, ELMo takes advantage of a bidirectional LSTM network.

Ok, let’s dive into them one-by-one.

OpenAI GPT Series (GPT-1, GPT-2, and GPT-3)

There are three generations of GPT models created by OpenAI. GPT, which stands for Generative Pre-trained Transformer, is an autoregressive language model that uses deep learning to produce human-like text. Currently, the most advanced GPT available is GPT-3, and its most complex version has 175 billion parameters. Before the release of GPT-3 in May 2020, the most complex pre-trained NLP model was Microsoft's Turing-NLG.

GPT-3 can create very realistic text that is sometimes difficult to distinguish from human-generated text. That's why the engineers warned of GPT-3's potential dangers and called for risk-mitigation research. Here is a video about 14 cool apps built on GPT-3:

As opposed to most other pre-trained NLP models, OpenAI chose not to share GPT-3's source code. Instead, it allowed invitation-based API access, and you can apply for a license by visiting its website. Check it out:

On September 22, 2020, Microsoft announced it had licensed “exclusive” use of GPT-3. Therefore, while others have to rely on the API to receive output, Microsoft has control of the source code. Here is brief info about its size and performance:

  • Year Published: 2020 (GPT-3)
  • Size: Unknown
  • Q&A: F1-Scores of 81.5 in zero-shot, 84.0 in one-shot, 85.0 in few-shot learning
  • TriviaQA: Accuracy of 64.3%
  • LAMBADA: Accuracy of 76.2%
  • Number of Parameters: 175,000,000,000

BERTs (BERT, RoBERTa (by Facebook), DistilBERT, and XLNet)

BERT stands for Bidirectional Encoder Representations from Transformers, and it is a state-of-the-art machine learning model used for NLP tasks. Jacob Devlin and his colleagues developed BERT at Google in 2018. Devlin and his colleagues trained BERT on English Wikipedia (2.5B words) and BooksCorpus (0.8B words) and achieved the best accuracies on some of the NLP tasks of 2018. There are two pre-trained general BERT variations: the base model is a 12-layer, 768-hidden, 12-head, 110M-parameter neural network architecture, whereas the large model is a 24-layer, 1024-hidden, 16-head, 340M-parameter neural network architecture. Figure 5 shows the visualization of the BERT network created by Devlin et al.

 

Figure 5. Overall pre-training and fine-tuning procedures for BERT (Figure from the BERT paper)

Even though BERT seems inferior to GPT-3, the availability of its source code to the public makes the model much more popular among developers. You can easily load a BERT variation for your NLP task using Hugging Face's Transformers library. Besides the original BERT, there are several variations, such as RoBERTa (by Facebook), DistilBERT, and XLNet. Here is a helpful TDS post comparing them:

Here is brief info about BERT’s size and performance:

  • Year Published: 2018
  • Size: 440 MB (BERT Baseline)
  • GLUE Benchmark: Average accuracy of 82.1%
  • SQuAD v2.0: Accuracy of 86.3%
  • Number of Parameters: 110,000,000–340,000,000

ELMo Variations

ELMo, short for Embeddings from Language Models, is a word embedding system for representing words and phrases as vectors. ELMo models the syntax and semantics of words as well as their linguistic context, and it was developed by the Allen Institute for AI. There are several variations of ELMo, and the most complex ELMo model (ELMo 5.5B) was trained on a dataset of 5.5B tokens consisting of Wikipedia (1.9B) and all of the monolingual news crawl data from WMT 2008–2012 (3.6B). While both BERT and GPT models are based on transformer networks, ELMo models are based on bidirectional LSTM networks.

Here is brief info about ELMo’s size and performance:

  • Year Published: 2018
  • Size: 357 MB (ELMo 5.5B)
  • SQuAD: Accuracy of 85.8%
  • NER: Accuracy of 92.2%
  • Number of Parameters: 93,600,000

Just like the BERT models, we also have access to the ELMo source code. You can download the different variations of ELMo from AllenNLP's website:

Other Pre-Trained Models for Natural Language Processing Problems

Although there are several other pre-trained NLP models available in the market (e.g., GloVe), GPT, BERT, and ELMo are currently the best pre-trained models out there. Since this post aims to introduce these models, we will not have a code-along tutorial. But, I will share several tutorials where we exploit these very advanced pre-trained NLP models.

Conclusion

In a world where we have easy access to state-of-the-art neural network models, trying to build your own model with limited resources is like trying to reinvent the wheel. It is pointless.

Instead, try to work with these trained models, add a couple of new layers on top tailored to your particular natural language processing task, and train. The results will be much more successful than those of a model you build from scratch.

4 Pre-Trained CNN Models to Use for Computer Vision with Transfer Learning

Using State-of-the-Art Pre-trained Neural Network Models to Tackle Computer Vision Problems with Transfer Learning

Before we start, if you are reading this article, I am sure that we share similar interests and are/will be in similar industries. So let's connect via LinkedIn! Please do not hesitate to send a contact request! Orhan G. Yalçın — LinkedIn

 Figure 1. How Transfer Learning Works (Image by Author)

If you have been trying to build machine learning models with high accuracy but have never tried transfer learning, this article will change your life. At least, it did mine!


Most of us have already tried several machine learning tutorials to grasp the basics of neural networks. These tutorials were very helpful to understand the basics of artificial neural networks such as Recurrent Neural Networks, Convolutional Neural Networks, GANs, and Autoencoders. But, their main functionality was to prepare you for real-world implementations.

Now, if you are planning to build an AI system that utilizes deep learning, you either (i) have to have a very large budget for training and excellent AI researchers at your disposal or (ii) have to benefit from transfer learning.

What is Transfer Learning?

Transfer learning is a subfield of machine learning and artificial intelligence which aims to apply the knowledge gained from one task (source task) to a different but similar task (target task).

For example, the knowledge gained while learning to classify Wikipedia texts can be used to tackle legal text classification problems. Another example would be using the knowledge gained while learning to classify cars to recognize birds in the sky. As you can see, there is a relation between these examples: we are not using a text classification model for bird detection.

Transfer learning is the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned.

In summary, transfer learning is a field that saves you from having to reinvent the wheel and helps you build AI applications in a very short amount of time.

 
Figure 2. Don’t Reinvent the Wheel, Transfer the Existing Knowledge (Photo by Jon Cartagena on Unsplash)

History of Transfer Learning

To show the power of transfer learning, we can quote from Andrew Ng:

Transfer learning will be the next driver of machine learning’s commercial success after supervised learning.

The history of transfer learning dates back to 1993. With her paper Discriminability-Based Transfer between Neural Networks, Lorien Pratt opened Pandora's box and introduced the world to the potential of transfer learning. In July 1997, the journal Machine Learning published a special issue on transfer learning papers. As the field advanced, adjacent topics such as multi-task learning were also included under the umbrella of transfer learning. Learning to Learn is one of the pioneering books in this field. Today, transfer learning is a powerful source for tech entrepreneurs to build new AI solutions and researchers to push the frontiers of machine learning.

 Figure 3. Andrew Ng’s Expectation for the Commercial Success of Machine Learning Subfields (Image by Author)

How Does Transfer Learning Work?

There are three requirements to achieve transfer learning:

  • Development of an Open Source Pre-trained Model by a Third Party
  • Repurposing the Model
  • Fine Tuning for the Problem

Development of an Open Source Pre-trained Model

A pre-trained model is a model created and trained by someone else to solve a problem that is similar to ours. In practice, someone is almost always a tech giant or a group of star researchers. They usually choose a very large dataset as their base datasets such as ImageNet or the Wikipedia Corpus. Then, they create a large neural network (e.g., VGG19 has 143,667,240 parameters) to solve a particular problem (e.g., this problem is image classification for VGG19). Of course, this pre-trained model must be made public so that we can take these models and repurpose them.

Repurposing the Model

After getting our hands on these pre-trained models, we repurpose the learned knowledge, which includes the layers, features, weights, and biases. There are several ways to load a pre-trained model into our environment. In the end, it is just a file/folder which contains the relevant information. However, deep learning libraries already host many of these pre-trained models, which makes them more accessible and convenient:

You can use one of the sources above to load a trained model. It will usually come with all the layers and weights and you can edit the network as you wish.
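For instance (a hedged example of my own, not from the original post), tf.keras.applications hosts many of these models, and loading VGG19 with its ImageNet-trained weights is a one-liner:

```python
from tensorflow.keras.applications import VGG19

# downloads the ImageNet-trained weights (over 500 MB) on first use
model = VGG19(weights="imagenet")
model.summary()  # inspect the layers before editing the network as you wish
```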

Fine-Tuning for the Problem

Now, while the current model may work for our problem as-is, it is often better to fine-tune the pre-trained model, for two reasons:

  • So that we can achieve even higher accuracy;
  • Our fine-tuned model can generate the output in the correct format.

Generally speaking, in a neural network, while the bottom and mid-level layers usually represent general features, the top layers represent the problem-specific features. Since our new problem is different than the original problem, we tend to drop the top layers. By adding layers specific to our problems, we can achieve higher accuracy.

After dropping the top layers, we need to place our own layers so that we can get the output we want. For example, a model trained with ImageNet can classify up to 1000 objects. If we are trying to classify handwritten digits (e.g., MNIST classification), it may be better to end up with a final layer with only 10 neurons.

After we add our custom layers to the pre-trained model, we can configure it with special loss functions and optimizers and fine-tune it with extra training.
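A hedged sketch of this recipe with ResNet50 (the frozen base and the 10-neuron softmax head are illustrative choices for an MNIST-like, 10-class problem, not the post's original code):

```python
import tensorflow as tf
from tensorflow.keras.applications import ResNet50

# drop the ImageNet top (include_top=False) and keep a pooled feature vector
base = ResNet50(weights="imagenet", include_top=False,
                input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # freeze the general-purpose bottom and mid-level layers

# problem-specific top: a final layer with only 10 neurons
outputs = tf.keras.layers.Dense(10, activation="softmax")(base.output)
model = tf.keras.Model(base.input, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```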

For a quick Transfer Learning tutorial, you may visit the post below:

4 Pre-Trained Models for Computer Vision

Here are the four pre-trained networks you can use for computer vision tasks ranging from image generation, neural style transfer, and image classification to image captioning and anomaly detection:

  • VGG19
  • Inceptionv3 (GoogLeNet)
  • ResNet50
  • EfficientNet

Let’s dive into them one-by-one.

VGG-19

VGG is a convolutional neural network with a depth of 19 layers. It was built and trained by Karen Simonyan and Andrew Zisserman at the University of Oxford in 2014, and you can access all the information in their paper, Very Deep Convolutional Networks for Large-Scale Image Recognition, published in 2015. The VGG-19 network is trained on more than 1 million images from the ImageNet database, and you can import the model with the ImageNet-trained weights. This pre-trained network can classify up to 1000 objects and was trained on 224×224-pixel colored images. Here is brief info about its size and performance:

  • Size: 549 MB
  • Top-1 Accuracy: 71.3%
  • Top-5 Accuracy: 90.0%
  • Number of Parameters: 143,667,240
  • Depth: 26

Figure 4. An Illustration of the VGG-19 Network (Figure by Clifford K. Yang and Yufeng Zheng on ResearchGate)

Inceptionv3 (GoogLeNet)

Inceptionv3 is a convolutional neural network with a depth of 48 layers. It was built and trained by Google, and you can access all the information in the paper titled "Going deeper with convolutions". The pre-trained version of Inceptionv3 with the ImageNet weights can classify up to 1000 objects. The image input size of this network is 299×299 pixels, larger than that of the VGG19 network. While VGG19 was the runner-up in 2014's ImageNet competition, Inception was the winner. A brief summary of Inceptionv3's features is as follows:

  • Size: 92 MB
  • Top-1 Accuracy: 77.9%
  • Top-5 Accuracy: 93.7%
  • Number of Parameters: 23,851,784
  • Depth: 159

Figure 5. An Illustration of the Inceptionv3 Network (Figure by Masoud Mahdianpari, Bahram Salehi, and Mohammad Rezaee on ResearchGate)

ResNet50 (Residual Network)

ResNet50 is a convolutional neural network with a depth of 50 layers. It was built and trained by Microsoft in 2015, and you can access the model's performance results in their paper, Deep Residual Learning for Image Recognition. This model is also trained on more than 1 million images from the ImageNet database. Just like VGG-19, it can classify up to 1000 objects, and the network was trained on 224×224-pixel colored images. Here is brief info about its size and performance:

  • Size: 98 MB
  • Top-1 Accuracy: 74.9%
  • Top-5 Accuracy: 92.1%
  • Number of Parameters: 25,636,712

If you compare ResNet50 to VGG19, you will see that ResNet50 actually outperforms VGG19 even though it has lower complexity. ResNet50 has been improved several times, and you also have access to newer versions such as ResNet101, ResNet152, ResNet50V2, ResNet101V2, and ResNet152V2.


 Figure 6. An Illustration of the ResNet50 Network (Figure by Masoud Mahdianpari, Bahram Salehi, and Mohammad Rezaee on ResearchGate)

EfficientNet

EfficientNet is a state-of-the-art convolutional neural network that was trained and released to the public by Google with the paper "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" in 2019. There are 8 alternative implementations of EfficientNet (B0 to B7), and even the simplest one, EfficientNetB0, is outstanding: with 5.3 million parameters, it achieves a 77.1% Top-1 accuracy.

 Figure 7. EfficientNet Model Size vs. ImageNet Accuracy (Plot by Mingxing Tan and Quoc V. Le on Arxiv)


The brief summary of EfficientNetB0 features is as follows:

  • Size: 29 MB
  • Top-1 Accuracy: 77.1%
  • Top-5 Accuracy: 93.3%
  • Number of Parameters: ~5,300,000
  • Depth: 159

Other Pre-Trained Models for Computer Vision Problems

We listed four state-of-the-art, award-winning convolutional neural network models. However, there are dozens of other models available for transfer learning. Here is a benchmark analysis of these models, all of which are available in Keras Applications.

Table 1. Benchmark Analysis of Pre-Trained CNN Models (Table by Author)

Conclusion

In a world where we have easy access to state-of-the-art neural network models, trying to build your own model with limited resources is like trying to reinvent the wheel. It is pointless.

Instead, try to work with these trained models, add a couple of new layers on top tailored to your particular computer vision task, and train. The results will be much more successful than those of a model you build from scratch.

Subscribe to the Newsletter for Google Colab Notebooks

If you would like access to the full code on Google Colab and the latest content, subscribe to the mailing list!