Sentiment Analysis in 10 Minutes with BERT and TensorFlow

Learn the basics of the pre-trained NLP model, BERT, and build a sentiment classifier using the IMDB movie reviews dataset, TensorFlow, and Hugging Face transformers

I prepared this tutorial because it is surprisingly difficult to find a blog post with working BERT code from beginning to end; the examples I found were full of bugs. So I dug into several articles, combined and edited their code, and finally got a working BERT model. Just by running the code in this tutorial, you can create a BERT model and fine-tune it for sentiment analysis.


Figure 1. Photo by Lukas on Unsplash

Natural language processing (NLP) is one of the most cumbersome areas of artificial intelligence when it comes to data preprocessing. Apart from preprocessing and tokenizing text datasets, training a successful NLP model takes a lot of time. But today is your lucky day! We will build a sentiment classifier with a pre-trained NLP model: BERT.

What is BERT?

BERT stands for Bidirectional Encoder Representations from Transformers, and it is a state-of-the-art machine learning model used for NLP tasks. Jacob Devlin and his colleagues developed BERT at Google in 2018. They trained BERT on English Wikipedia (2,500M words) and BooksCorpus (800M words) and achieved the best accuracies on several NLP tasks in 2018. There are two pre-trained general BERT variations: the base model is a 12-layer, 768-hidden, 12-head, 110M-parameter neural network architecture, whereas the large model is a 24-layer, 1024-hidden, 16-head, 340M-parameter neural network architecture. Figure 2 shows the visualization of the BERT network created by Devlin et al.



Figure 2. Overall pre-training and fine-tuning procedures for BERT (Figure from the BERT paper)

So, I don’t want to dive deep into BERT since we need a whole different post for that. In fact, I already scheduled a post aimed at comparing rival pre-trained NLP models. But, you will have to wait for a bit.

Additionally, I should mention that although OpenAI’s GPT-3 outperforms BERT, the limited access to GPT-3 forces us to use BERT. But rest assured, BERT is also an excellent NLP model. Here is a basic visual network comparison among rival NLP models: BERT, GPT, and ELMo:


Figure 3. Differences in pre-training model architectures of BERT, GPT, and ELMo (Figure from the BERT paper)

Installing Hugging Face Transformers Library

One of the hardest questions for me to resolve was where to find a BERT model that I could use with TensorFlow. Finally, I discovered Hugging Face’s Transformers library.

Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation, etc in 100+ languages. Its aim is to make cutting-edge NLP easier to use for everyone.

We can easily load a pre-trained BERT from the Transformers library. But, make sure you install it since it is not pre-installed in the Google Colab notebook.

Sentiment Analysis with BERT

Now that we covered the basics of BERT and Hugging Face, we can dive into our tutorial. We will do the following operations to train a sentiment analysis model:

  • Install the Transformers library;
  • Load the BERT Classifier and Tokenizer along with the Input modules;
  • Download the IMDB Reviews Data and create a processed dataset (this will take several operations);
  • Configure the Loaded BERT model and Train for Fine-tuning;
  • Make Predictions with the Fine-tuned Model.

Let’s get started!

I strongly recommend using a Google Colab notebook for this tutorial.

Installing Transformers

Installing the Transformers library is fairly easy. Just run the following pip line in a Google Colab cell: !pip install transformers

After the installation is completed, we will load the pre-trained BERT Tokenizer and Sequence Classifier as well as InputExample and InputFeatures. Then, we will build our model with the Sequence Classifier and our tokenizer with BERT’s Tokenizer.

Let’s see the summary of our BERT model:

Here are the results. We have the main BERT model, a dropout layer to prevent overfitting, and finally a dense layer for the classification task:



Figure 4. Summary of BERT Model for Sentiment Classification

Now that we have our model, let’s create our input sequences from the IMDB reviews dataset:

IMDB Dataset

The IMDB Reviews Dataset is a large movie review dataset collected and prepared by Andrew L. Maas from the popular movie rating service, IMDB. It is used for binary sentiment classification: deciding whether a review is positive or negative. It contains 25,000 movie reviews for training and 25,000 for testing. All these 50,000 reviews are labeled data that may be used for supervised deep learning. Besides, there are an additional 50,000 unlabeled reviews that we will not use in this case study. In this case study, we will only use the training dataset.

Initial Imports

We will first have two imports: TensorFlow and Pandas.

Get the Data from the Stanford Repo

Then, we can download the dataset from Stanford’s relevant directory with the tf.keras.utils.get_file function, as shown below:

Remove Unlabeled Reviews

To remove the unlabeled reviews, we need the following operations. The comments below explain each operation:

Train and Test Split

Now that we have our data cleaned and prepared, we can create our datasets with text_dataset_from_directory using the following lines. I want to process the entire dataset in a single batch, which is why I selected a very large batch size:

Convert to Pandas to View and Process

Now that we have our basic train and test datasets, I want to prepare them for our BERT model. To make it more comprehensible, I will create a pandas dataframe from our TensorFlow dataset object. The following code converts our train Dataset object to a train pandas dataframe:

Here are the first 5 rows of our dataset:



Figure 5. First 5 Rows of Our Dataset

I will do the same operations for the test dataset with the following lines:

Creating Input Sequences

We have two pandas Dataframe objects waiting for us to convert them into suitable objects for the BERT model. We will take advantage of the InputExample class, which helps us create sequences from our dataset. InputExample can be called as follows:

Now we will create two main functions:

1 — convert_data_to_examples: This will accept our train and test datasets and convert each row into an InputExample object.

2 — convert_examples_to_tf_dataset: This function will tokenize the InputExample objects, create the required input format from the tokenized objects, and finally create an input dataset that we can feed to the model.

We can call the functions we created above with the following lines:

Our dataset containing the processed input sequences is ready to be fed to the model.

Configuring the BERT model and Fine-tuning

We will use Adam as our optimizer, SparseCategoricalCrossentropy as our loss function (our labels are integer class IDs, which is also why the metric is the sparse variant), and SparseCategoricalAccuracy as our accuracy metric. Fine-tuning the model for 2 epochs will give us around 95% accuracy, which is great.

Training the model might take a while, so make sure you have enabled GPU acceleration in the Notebook Settings. After training is completed, we can move on to making sentiment predictions.

Making Predictions

I created a list of two reviews. The first one is a positive review, while the second one is clearly negative.

We need to tokenize our reviews with our pre-trained BERT tokenizer. We will then feed these tokenized sequences to our model and run a final softmax layer to get the predictions. We can then use the argmax function to determine whether our sentiment prediction for the review is positive or negative. Finally, we will print out the results with a simple for loop. The following lines do all of these said operations:
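
The last two steps, softmax followed by argmax, can be sketched in plain Python without any model. The logits and the label order below are made-up assumptions for illustration only; in the tutorial they would come from the fine-tuned BERT model.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: List[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Assumed label order: class 0 = negative, class 1 = positive.
LABELS = ["Negative", "Positive"]

# Hypothetical logits for two reviews, as a classifier head might emit them.
example_logits = [[-2.1, 2.4], [3.0, -2.7]]

predictions = []
for logits in example_logits:
    probs = softmax(logits)
    predictions.append(LABELS[argmax(probs)])

print(predictions)
```

Since softmax preserves the ordering of the scores, taking argmax of the raw logits would give the same labels; the softmax step matters when you also want to report a probability.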


Figure 6. Our Dummy Reviews with Their Predictions

Also, with the code above, you can predict the sentiment of as many reviews as you like.


You have successfully built a transformers network with a pre-trained BERT model and achieved ~95% accuracy on the sentiment analysis of the IMDB reviews dataset! If you are curious about saving your model, I would like to direct you to the Keras Documentation. After all, to efficiently use an API, one must learn how to read and use the documentation.

Predict Tomorrow’s Bitcoin (BTC) Price with Recurrent Neural Networks

Using Recurrent Neural Networks to Predict the Next Day’s Cryptocurrency Prices with TensorFlow and Keras | Supervised Deep Learning

If you are reading this article, I am sure that we share similar interests and are/will be in similar industries. So let’s connect via Linkedin! Please do not hesitate to send a contact request! Orhan G. Yalçın — Linkedin


Photo by Andre Francois on Unsplash

Wouldn’t it be awesome if you were, somehow, able to predict tomorrow’s Bitcoin (BTC) price? As you all know, the cryptocurrency market has experienced tremendous volatility over the last year. The value of Bitcoin reached its peak on December 16, 2017, climbing to nearly $20,000, and then declined steeply at the beginning of 2018. Yet just a year ago its value was almost half of what it is today, so the yearly BTC price chart shows that the price is still high. Even more shocking, only two years ago BTC’s value was one-tenth of its current value. You may explore the historical BTC prices using the plot below:

Historical Bitcoin (BTC) Prices by CoinDesk

There are several conspiracy theories regarding the precise reasons behind this volatility, and these theories are also used to support predictions of crypto prices, particularly of BTC. Such subjective arguments can be valuable for predicting the future of cryptocurrencies. Our methodology, on the other hand, evaluates historical data to predict cryptocurrency prices from an algorithmic trading perspective. We plan to use numerical historical data to train a recurrent neural network (RNN) to predict BTC prices.

Obtaining the Historical Bitcoin Prices

There are quite a few resources we may use to obtain historical Bitcoin price data. While some of these resources allow users to download CSV files manually, others provide an API that one can hook up to their code. Since we would like a model trained on time series data to make up-to-date predictions, I prefer to use an API to obtain the latest figures whenever we run our program. After a quick search, I decided to use CoinRanking’s API, which provides up-to-date coin prices that we can use on any platform.

Recurrent Neural Networks

Since we are using a time series dataset, a plain feedforward neural network is not a good fit: tomorrow’s BTC price is most correlated with today’s price, not with the price a month ago.

A recurrent neural network (RNN) is a class of artificial neural network where connections between nodes form a directed graph along a sequence. — Wikipedia

An RNN shows temporal dynamic behavior for a time sequence, and it can use its internal state to process sequences. In practice, this can be achieved with LSTM and GRU layers.
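
To make the idea of an internal state concrete, here is a minimal sketch of a single-unit recurrent cell in plain Python. It is a textbook simple RNN rather than an LSTM or GRU, and the weights are hand-picked assumptions instead of trained values.

```python
import math
from typing import List


def run_rnn(inputs: List[float], w_x: float, w_h: float, bias: float) -> List[float]:
    """Apply h_t = tanh(w_x * x_t + w_h * h_{t-1} + bias) over a sequence."""
    hidden = 0.0  # the internal state starts at zero
    states = []
    for x in inputs:
        # The new state depends on the current input and on the previous state.
        hidden = math.tanh(w_x * x + w_h * hidden + bias)
        states.append(hidden)
    return states


# Hypothetical, hand-picked weights; training would normally learn these.
forward = run_rnn([0.1, 0.2, 0.3], w_x=1.0, w_h=0.5, bias=0.0)
backward = run_rnn([0.3, 0.2, 0.1], w_x=1.0, w_h=0.5, bias=0.0)

print(forward[-1], backward[-1])
```

Both sequences contain the same three values, yet the final states differ because the order of the inputs matters. A feedforward layer that simply summed its inputs would not see that difference.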

Here you can see the difference between a regular feedforward-only neural network and a recurrent neural network (RNN):

 RNN vs. Regular Nets by Niklas Donges on TowardsDataScience

Our Roadmap

To be able to create a program that trains on the historical BTC prices and predicts tomorrow’s BTC price, we need to complete several tasks as follows:

1 — Obtaining, Cleaning, and Normalizing the Historical BTC Prices

2 — Building an RNN with LSTM

3 — Training the RNN and Saving The Trained Model

4 — Predicting Tomorrow’s BTC Price and “Deserialize” It

BONUS: Deserializing the X_Test Predictions and Creating a Chart

Obtaining, Cleaning, and Normalizing the Historical BTC Prices

Obtaining the BTC Data

As I mentioned above, we will use CoinRanking’s API for the BTC dataset and convert it into a Pandas dataframe with the following code:

Obtaining the BTC Prices with CoinRanking API

This function is set to fetch 5 years of BTC/USD prices by default. However, you may always change this by passing in different parameter values.

Cleaning the Data with Custom Functions

After obtaining the data and converting it to a pandas dataframe, we may define custom functions to clean our data, normalize it (a must for accurate results with a neural network), and apply a custom train-test split. We created a custom train-test split function (not scikit-learn’s) because we need to keep the time series in order to train our RNN properly. We may achieve this with the following code, and you may find further explanations of each function in the code snippet below:

Defining custom functions for matrix creation, normalizing, and train-test split
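
As a rough illustration of the matrix creation and the order-preserving split, here is a plain-Python sketch. The function names, the window length, and the 80/20 ratio are assumptions made for this example and are not taken from the tutorial's own code; normalization is left out to keep it short.

```python
from typing import List, Tuple


def make_windows(prices: List[float], window: int) -> Tuple[List[List[float]], List[float]]:
    """Pair each run of `window` consecutive prices with the price that follows it."""
    inputs, targets = [], []
    for start in range(len(prices) - window):
        inputs.append(prices[start:start + window])
        targets.append(prices[start + window])
    return inputs, targets


def chronological_split(inputs, targets, train_fraction: float = 0.8):
    """Split without shuffling so every test sample is later than every training sample."""
    cut = int(len(inputs) * train_fraction)
    return inputs[:cut], targets[:cut], inputs[cut:], targets[cut:]


# Toy price series standing in for real API data: 12 consecutive "days".
toy_prices = [float(p) for p in range(100, 112)]
X, y = make_windows(toy_prices, window=3)
X_train, y_train, X_test, y_test = chronological_split(X, y, train_fraction=0.8)

print(len(X_train), len(X_test))
print(X_test[0], y_test[0])
```

A shuffled split, such as the default behavior of scikit-learn's train_test_split, would let the model train on days that come after the ones it is tested on, which is why the cut here is made at a single point in time.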

After defining these functions, we may call them with the following code:

Calling the defined functions for data cleaning, preparation, and splitting

Building an RNN with LSTM

After preparing our data, it is time to build the model that we will later train using the cleaned and normalized data. We will start by importing our Keras components and setting some parameters with the following code:

Setting the RNN Parameters in Advance

Then, we will create our Sequential model with two LSTM and two Dense layers with the following code:

Creating a Sequential Model and Filling it with LSTM and Dense Layers

Training the RNN and Saving The Trained Model

Now it is time to train our model with the cleaned data. You can also measure the time spent during the training with the following code:

Training the RNN Model using the Prepared Data

Don’t forget to save it:

Saving the Trained Model

I am keen to save the model and load it later because it is quite satisfying to know that you can actually save a trained model and re-load it to use it next time. This is basically the first step for web or mobile integrated machine learning applications.

Predicting Tomorrow’s BTC Price and “Deserialize” It

After we train the model, we need to obtain the current data for predictions. Since we normalized our training data, the predictions will also be normalized, so we need to de-normalize them back to their original values. First, we will obtain the data in a similar, though slightly different, manner with the following code:

Loading the last 30 days’ BTC Prices

We will only have the normalized data for prediction: No train-test split. We will also reshape the data manually to be able to use it in our saved model.

After cleaning and preparing our data, we will load the trained RNN model for prediction and predict tomorrow’s price.

Loading the Trained Model and Making the Prediction

However, our results will vary between -1 and 1, which will not make a lot of sense. Therefore, we need to de-normalize them back to their original values. We can achieve this with a custom function:

We need a deserializer for the Original BTC Prediction Value in USD
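
The round trip between prices and the [-1, 1] range can be sketched with min-max scaling. This is only one possible normalization scheme, chosen here as an assumption; the tutorial's own functions may scale the data differently, and the prices below are made up.

```python
from typing import List, Tuple


def normalize(values: List[float]) -> Tuple[List[float], float, float]:
    """Min-max scale values into [-1, 1]; also return the min and max for later."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        raise ValueError("cannot scale a constant series")
    scaled = [2 * (v - low) / span - 1 for v in values]
    return scaled, low, high


def denormalize(scaled_value: float, low: float, high: float) -> float:
    """Invert the scaling above to recover a price in the original units."""
    return (scaled_value + 1) / 2 * (high - low) + low


prices = [9500.0, 9800.0, 10100.0, 9900.0]  # made-up USD values
scaled, low, high = normalize(prices)

# Pretend the model predicted 0.25 in normalized space.
predicted_usd = denormalize(0.25, low, high)
print(scaled)
print(predicted_usd)
```

The key design point is that the minimum and maximum used for scaling must be kept around, because the inverse step cannot recover a USD value without them.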

After defining the custom function, we will call this function and extract tomorrow’s BTC price with the following code:

Calling the deserializer and extracting the Price in USD

With the code above, you can actually get the model’s prediction for tomorrow’s BTC prices.

Deserializing the X_Test Predictions and Creating a Chart

You may also be interested in the overall result of the RNN model and prefer to see it as a chart. We can achieve this by using our X_test data from the training part of the tutorial.

We will start by loading our model (consider this as an alternative to the single prediction case) and making the prediction on X_test data so that we can make predictions for a proper number of days for plotting with the following code:

Loading the Trained Model and Making Prediction Using the X_test Values

Next, we will import Plotly and set the properties for a good plotting experience. We will achieve this with the following code:

Importing Plotly and Setting the Parameters

After setting all the properties, we can finally plot our predictions and observation values with the following code:

Creating a Dataframe and Using it in Plotly’s iPlot

When you run this code, you will come up with the up-to-date version of the following plot: Chart for BTC Price Predictions

How Reliable Are These Results?

As you can see, it does not look bad at all. However, you need to know that even though the patterns match pretty closely, the predicted and observed values are still far apart if you inspect them on a day-to-day basis. Therefore, the code must be further developed to get better results.


You have successfully created and trained an RNN model that can predict BTC prices, and you even saved the trained model for later use. You may use this trained model on a web or mobile application by switching to Object-Oriented Programming. Pat yourself on the back for successfully developing a model relevant to artificial intelligence, blockchain, and finance. I think it sounds pretty cool to touch these areas all at once with this simple project.

Mastering Word Embeddings in 10 Minutes with TensorFlow

Covering the Basics of Word Embedding, One Hot Encoding, Text Vectorization, Embedding Layers, and an Example Neural Network Architecture for NLP

Figure 1. Photo by Nick Hillier on Unsplash

Word embedding is one of the most important concepts in Natural Language Processing (NLP). It is an NLP technique where words or phrases (i.e., strings) from a vocabulary are mapped to vectors of real numbers. The need to map strings into vectors of real numbers originated from computers’ inability to do operations with strings.

Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.

There are several NLP techniques to convert strings into representative numbers, such as:

  • One Hot Encoding
  • Encoding with a Unique Number
  • Word Embedding

Before diving into word embedding, let’s compare these three options to see why Word embedding is the best.

One Hot Encoding

One-hot encoding can be achieved by creating a vector of zeros with the length of the entire vocabulary. Then, we place a “one” only at the index that corresponds to the word. We create one such vector for each word. Below you can see an example of one-hot encoding where we encode the sentence, “His dog is two years old”.

 Figure 2. One Hot Encoding for “His dog is two years old” (Figure by Author)

As the vocabulary size increases, so does the size of the vector. One-hot encoding is usually regarded as an inefficient method to vectorize strings. As you can see in the example above, most of the values are zero. This sparsity makes the approach unnecessarily expensive in memory and computation.
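
The encoding is easy to reproduce in plain Python. The sketch below assumes the vocabulary consists only of the words in the example sentence, sorted alphabetically; the helper name is hypothetical.

```python
from typing import Dict, List


def one_hot_encode(sentence: str) -> Dict[str, List[int]]:
    """Map each word to a vocabulary-length vector with a single 1."""
    words = sentence.lower().split()
    vocabulary = sorted(set(words))  # alphabetical order fixes each word's index
    vectors = {}
    for word in words:
        vector = [0] * len(vocabulary)
        vector[vocabulary.index(word)] = 1
        vectors[word] = vector
    return vectors


encoded = one_hot_encode("His dog is two years old")
for word, vector in encoded.items():
    print(f"{word:>5}: {vector}")
```

With six words, each vector already holds five zeros. With a 10,000-word vocabulary, each vector would hold 9,999 of them.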

Encoding with a Unique ID Number

Instead of creating a vector of zeros for each word in a sentence, you may choose to assign each word a unique ID number. Therefore, instead of using strings, you may assign 1 to “dog”, 2 to “his”, 3 to “is”, 4 to “old”, 5 to “two”, 6 to “years”. This way, we can represent our string, “His dog is two years old”, as [2, 1, 3, 5, 6, 4].

 Figure 3. Encoding with a Unique ID Number for “His dog is two years old” (Figure by Author)
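
The same mapping can be reproduced with a few lines of Python. Sorting the words alphabetically and numbering them from 1 is an assumption that happens to match the IDs used in the text; the function names are hypothetical.

```python
from typing import Dict, List


def build_word_ids(sentence: str) -> Dict[str, int]:
    """Assign IDs 1..N to the unique words in alphabetical order."""
    vocabulary = sorted(set(sentence.lower().split()))
    return {word: index for index, word in enumerate(vocabulary, start=1)}


def encode(sentence: str, word_ids: Dict[str, int]) -> List[int]:
    """Replace each word with its ID."""
    return [word_ids[word] for word in sentence.lower().split()]


sentence = "His dog is two years old"
word_ids = build_word_ids(sentence)
ids = encode(sentence, word_ids)
print(word_ids)
print(ids)  # [2, 1, 3, 5, 6, 4]
```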

As you can see, it is much easier to create. However, it comes with a caveat: these numbers do not have any relational representation since the encoding is arbitrary. Nothing stops us from assigning the IDs differently. Therefore, these values do not capture any relationship between words. The lack of relational representation makes it difficult for models to interpret these values since the value of the feature (the number assigned) does not mean anything. Therefore, encoding with a unique ID number is also not a very good way to capture patterns.

Word Embedding

Since one-hot encoding is inefficient and encoding with unique IDs does not offer relational representation, we have to rely on another method. This method is word embedding. As we already covered above, word embedding is the task of converting words into vectors.

 Figure 4. A Basic 3-Dimensional Word Embedding for “His dog is” (Figure by Author)

As you can see above, word embedding gives us a relational representation of the words, a feature we didn’t have when encoding with a unique number. Furthermore, we can represent each word with a less complex vector compared to one-hot encoding: the size of our final array is much smaller than a one-hot encoded vocabulary.

Deep Learning for the Vectorization

So, in an ideal world, our goal is to find the perfect vector weights for each word so that their relations can be properly represented. But how are we going to know whether a word is closely related to another? Of course, we have an intuition since we have been learning languages all of our lives. But, our model must receive the training it requires to analyze these words. Therefore, we need a machine learning problem. We can adopt supervised, unsupervised, or semi-supervised learning approaches for word embedding. In this post, we will tackle a supervised learning problem to vectorize our strings.

A Neural Network Architecture for Word Embedding

We need to create a neural network to find ideal vector values for each word. At this stage, what we need is the following:

1 — A Text Vectorization layer for converting Text to 1D Tensor of Integers

2 — An Embedding layer to convert 1D Tensors of Integers into dense vectors of fixed size.

3 — A fully connected neural network for backpropagation and cost function and other deep learning tasks

Text Vectorization Layer

Text vectorization is an experimental layer that offers a lot of value for automating text preprocessing. The layer performs the following steps:

  • Standardize each sentence with lowercasing and punctuation stripping
  • Split each sentence into words
  • Recombine words into tokens
  • Index these tokens
  • Transform each sentence using the index into a 1D tensor of integer token indices or float values.

Here is a straightforward example of how the TextVectorization layer works.

We first create a vocabulary in line with our examples above.

 Figure 5. The Vocabulary of TextVectorization Layer Example

First, we create a TextVectorization layer. Next, we create a dummy model with the Keras Sequential API. Finally, we feed it several sentences to create a list of 1D tensors of integers:

 Figure 6. The I/O of the Operations with TextVectorization Layer

The length of the 1D output tensors is three because we set it to three. Feel free to try other values.
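
To see what happens inside such a layer, here is a plain-Python stand-in, not the Keras layer itself. It follows the Keras convention of reserving index 0 for padding and index 1 for out-of-vocabulary words, and it skips the n-gram recombination step; the vocabulary, sentences, and function names are assumptions for illustration.

```python
import string
from typing import Dict, List

PAD_ID = 0  # reserved for padding
OOV_ID = 1  # reserved for out-of-vocabulary words


def standardize(sentence: str) -> str:
    """Lowercase and strip punctuation."""
    lowered = sentence.lower()
    return lowered.translate(str.maketrans("", "", string.punctuation))


def build_index(vocabulary: List[str]) -> Dict[str, int]:
    """Give every vocabulary word an integer ID after the two reserved IDs."""
    return {word: i for i, word in enumerate(vocabulary, start=2)}


def vectorize(sentence: str, index: Dict[str, int], length: int) -> List[int]:
    """Standardize, split, index, then truncate or pad to a fixed length."""
    tokens = standardize(sentence).split()
    ids = [index.get(token, OOV_ID) for token in tokens][:length]
    return ids + [PAD_ID] * (length - len(ids))


vocab = ["dog", "his", "is", "old", "two", "years"]
index = build_index(vocab)

for text in ["His dog is two years old.", "Two dogs!", "Old"]:
    print(text, "->", vectorize(text, index, length=3))
```

Long sentences are cut to the first three tokens and short ones are padded with zeros, so every output has the same length regardless of the input.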

Embedding Layer

The embedding layer has a simple capability:

It turns positive integers (indexes) into dense vectors of fixed size.

Let’s see it with a basic example:

I passed the output from the TextVectorization example as input and set the output dimension to two. Therefore, each of our input integers is now represented with a 2-dims vector. While our input shape is (3, 3), our output shape is (3, 3, 2).

 Figure 7. The I/O of the Operations with Embedding Layer
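
Conceptually, an embedding layer is a lookup table with one row per integer ID. The sketch below mimics that lookup in plain Python with random, untrained values; the table size, the input IDs, and the helper names are assumptions made for this example.

```python
import random
from typing import List


def make_embedding_table(vocab_size: int, dim: int, seed: int = 0) -> List[List[float]]:
    """One small random vector per integer ID, like freshly initialized weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size)]


def embed(batch: List[List[int]], table: List[List[float]]) -> List[List[List[float]]]:
    """Replace every ID in a batch of sequences with its vector."""
    return [[table[token_id] for token_id in sequence] for sequence in batch]


table = make_embedding_table(vocab_size=8, dim=2)
batch = [[3, 2, 4], [6, 1, 0], [5, 0, 0]]  # three sequences of three IDs each
embedded = embed(batch, table)

shape = (len(embedded), len(embedded[0]), len(embedded[0][0]))
print(shape)  # (3, 3, 2)
```

Every occurrence of the same ID maps to the same vector, which is what allows training to adjust one row of the table and affect that word everywhere it appears.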

By changing the output dimension, you can easily generate a more complex vector space, which increases the computational complexity but can potentially capture more patterns.

A Set of Fully Connected Layers

 Figure 8. A Basic Example of A Set of Fully Connected Layers (Figure by Author)

After the TextVectorization and Embedding layers, we end up with the desired vector space. But these values are still essentially random. Therefore, we need an optimizer that minimizes a cost function and adjusts the values with backpropagation. This optimizer should be run on top of a set of fully connected layers. In some situations, you can also add other types of layers such as LSTM, GRU, or Convolution layers. To keep it simple, I will refer to them as a set of fully connected layers.

Note: If your embedding output has -by any chance- variable length, make sure to add a Global Pooling Layer before the set of fully connected layers.
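
What a global average pooling step does can be shown in a few lines of plain Python. This is a simplified sketch with made-up vectors, not the Keras layer: it averages the word vectors of one sequence so that sequences of any length end up with the same output size.

```python
from typing import List


def global_average_pool(sequence: List[List[float]]) -> List[float]:
    """Average the word vectors of one sequence, dimension by dimension."""
    if not sequence:
        raise ValueError("cannot pool an empty sequence")
    dim = len(sequence[0])
    return [sum(vector[d] for vector in sequence) / len(sequence) for d in range(dim)]


short = [[1.0, 2.0], [3.0, 4.0]]               # 2 words, 2 dimensions
longer = [[1.0, 0.0], [2.0, 0.0], [3.0, 3.0]]  # 3 words, 2 dimensions

print(global_average_pool(short))   # [2.0, 3.0]
print(global_average_pool(longer))  # [2.0, 1.0]
```

Both outputs have two values even though the inputs have different lengths, which is exactly what the fully connected layers need.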

Using a set of fully connected layers, we can feed our strings against labels to adjust the vector values. For example, assume that we have a dataset of sentences with their sentiment labels (e.g., positive or negative). We can then vectorize and embed these sentences using a vocabulary we created from our unique words. We can then create a set of fully connected layers to predict whether their sentiment is positive or negative. We can use the labels to backpropagate our model to optimize these vectors. Our result would look something similar to this:

 Figure 9. Word Embedding created using Word2Vec (Figure by Author on Embedding Projector)

Final Notes

Roughly speaking, a neural network architecture for Word embedding would look like this:

 Figure 10. A Basic Artificial Neural Network Architecture for Word embedding (Figure by Author)

As I mentioned above, a complex word embedding model would require other layers such as Global Pooling, LSTM, GRU, Convolution, and others. However, the structure remains the same.

Now that you have an idea about Word embedding, in Part II of this series, we can actually create our own word embedding using the IMDB Reviews dataset and visualize it as in Figure 9. Check out Part 2:

Mastering Word Embeddings in 10 Minutes with IMDB Reviews

Learn the Basics of Text Vectorization, Create a Word Embedding Model trained with a Neural Network on IMDB Reviews Dataset, and Visualize it with TensorBoard Embedding Projector

Figure 1. Photo by Raphael Schaller on Unsplash

This is a follow-up tutorial prepared after Part I of the tutorial, Mastering Word Embeddings in 10 Minutes with TensorFlow, where we introduce several word vectorization concepts such as One Hot Encoding and Encoding with a Unique ID Value. I would highly recommend you to check this tutorial if you are new to natural language processing.

In Part II of the tutorial, we will vectorize our words and train their values using the IMDB Reviews dataset. This tutorial is our own take on TensorFlow’s tutorial on word embedding. We will train a word embedding using a simple Keras model and the IMDB Reviews dataset. Then, we will visualize it using Embedding Projector.

Let’s start:

Create a New Google Colab Notebook

First of all, you need the environment to start coding. For the sake of simplicity, I recommend you work with Google Colab. It comes with all the libraries pre-installed, and you won’t have to worry about them. All you need is a Google account, and I am sure you have one. So, create a new Colab notebook (see Figure 2) and start coding.

  Figure 2: Create a New Google Colab Notebook

Initial Imports

We will start by importing TensorFlow and os libraries. We will use the os library for some directory level operations we will do below and the TensorFlow library for dataset loading, deep learning models, and text preprocessing.

Download the IMDB Reviews Dataset

The IMDB Reviews Dataset is a large movie review dataset collected and prepared by Andrew L. Maas from the popular movie rating service, IMDB. It is used for binary sentiment classification: deciding whether a review is positive or negative. It contains 25,000 movie reviews for training and 25,000 for testing. All these 50,000 reviews are labeled data that may be used for supervised deep learning. Besides, there are an additional 50,000 unlabeled reviews that we will not use in this case study. In this case study, we will only use the training dataset.

We can download the dataset from Stanford’s relevant directory with the tf.keras.utils.get_file function, as shown below:

Dataset Creation

We need a little bit of housekeeping to create a proper dataset. Let’s start by viewing the contents of our main directory:

As you can see below, we have our train and test folders. For this study, we will only use the /train folder.

Figure 3. The Content of Main Directory, “aclImdb”

With the following lines, let’s view what’s under the /train subdirectory:

and here it is:

Figure 4.a. The Content of Sub-Directory, “aclImdb/train”

We have reviews with negative sentiments and positive sentiments. In the next step, we will remove the unsup folder, which contains unlabeled reviews. Since we are working on a supervised learning problem in this tutorial, we do not need it.

As you can see in Figure 4.b, we removed the unsup folder thanks to the shutil library:

 Figure 4.b. The Content of Sub-Directory, “aclImdb/train” after we remove the “unsup” folder
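
The removal step can be sketched with os and shutil as follows. To keep the example runnable anywhere, it works on a temporary directory that only mimics the aclImdb/train layout; the helper name is hypothetical, and in the notebook you would point it at the real train folder instead.

```python
import os
import shutil
import tempfile
from typing import List


def remove_unlabeled(train_dir: str, folder: str = "unsup") -> List[str]:
    """Delete the unlabeled-review folder and return what is left in train_dir."""
    target = os.path.join(train_dir, folder)
    if os.path.isdir(target):
        shutil.rmtree(target)  # recursively deletes the folder and its files
    return sorted(os.listdir(train_dir))


# Demo on a throwaway directory that mimics the aclImdb/train layout.
demo_root = tempfile.mkdtemp()
demo_train = os.path.join(demo_root, "aclImdb", "train")
for name in ("neg", "pos", "unsup"):
    os.makedirs(os.path.join(demo_train, name))

remaining = remove_unlabeled(demo_train)
print(remaining)  # ['neg', 'pos']
shutil.rmtree(demo_root)  # clean up the demo directory
```

Removing the folder matters because a directory-based dataset loader treats every subfolder as a class, so a leftover unsup folder would show up as a third label.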

Create the Dataset

Now that we have cleaned our directory, we can create our Dataset object. For this, we can use the tf.keras.preprocessing.text_dataset_from_directory function. As the name suggests, the text_dataset_from_directory function allows us to create text datasets directly from a directory. We selected an 80/20 train and validation split, but feel free to play around by adjusting the validation_split argument.

As you can see in Figure 5, we have 20,000 reviews for training and 5,000 for validation.

 Figure 5. The volume of Our Train and Validation Dataset

Let’s check how our dataset looks by using the .take() function and run a for-loop. Note that our dataset is a TensorFlow Dataset object. It requires a little more effort to print out its elements. The following line does that:

And here are the results in Figure 6:

 Figure 6. The First Five Reviews from the Training Dataset with Their Sentiment Info in the Beginning

Configure the Dataset

Now, since we are in the realm of deep learning, optimization is essential for a bearable training experience. TensorFlow has an experimental tool that we can use to optimize the workload and shorten the time needed for preprocessing, training, and other parallel operations. We can optimize our pipeline with the following lines:

Text Preprocessing

Now that we created our dataset, it is time to process its elements so that our model can understand them.

Custom Standardization

We will create a custom string standardization function to get the most out of standardization. Standardization can be described as a set of preprocessing operations for NLP studies, including lowercasing, tag removal, and punctuation stripping. The code below does exactly this:

Now our data will be more standardized with our custom function.
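
The three operations can be illustrated on a plain Python string. This sketch uses the standard library only and assumes the tags to remove are the `<br />` line breaks found in IMDB reviews; a function passed to the TextVectorization layer would need to use TensorFlow string operations instead, so treat this as a model of the logic rather than a drop-in replacement.

```python
import re
import string

# Matches any single punctuation character.
PUNCTUATION = re.compile("[%s]" % re.escape(string.punctuation))


def custom_standardization(text: str) -> str:
    """Lowercase, replace <br /> tags with spaces, then strip punctuation."""
    lowered = text.lower()
    without_tags = lowered.replace("<br />", " ")
    return PUNCTUATION.sub("", without_tags)


raw = "Great movie!<br /><br />Loved it, really."
clean = custom_standardization(raw)
print(clean)
```

The tags are replaced before punctuation is stripped. Doing it the other way around would remove the angle brackets and slash first and leave a stray "br" token in the text.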


Since we created our custom standardization function, we can pass it in the TextVectorization layer we import from TensorFlow. TextVectorization is a layer that we use to map our strings to integers. We will pass in our custom standardization function, we will use up to 10,000 unique words (vocabulary), and we will keep a maximum of 100 words for each review. Check the below lines:

We will remove the labels from the train dataset and call the .adapt() function to build the vocabulary to use later on. Note that we haven’t vectorized our dataset yet; we have only created the vocabulary with the lines below:

Model Building and Training

We already processed our reviews, and it is time to create our model.

Model Creation

We will make the initial imports, which include the Sequential API for model building and the Embedding, GlobalAveragePooling1D, and Dense layers we will use in the model.

We set the embedding dimension to 16, so each word will have 16 representative values. We limit the vocabulary size to 10,000 in parallel with the code above.

We add the following layers to our Keras model:

1 — A TextVectorization layer for converting strings to integers;

2 — An Embedding layer to convert integer values into 16-dimensional vectors;

3 — A Global Average Pooling 1D layer to resolve the issue of having reviews with different lengths;

4 — A Dense layer with 16 neurons and a ReLU activation;

5 — A final Dense layer with 1 neuron to classify if the review has a positive or negative sentiment.

The following lines do all these:
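A hedged sketch of layers 2 to 5 is given below. To keep it runnable on its own, the TextVectorization layer (layer 1) is left out and the model is fed already-vectorized integers; the layer name "embedding" and the sample batch are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Embedding, GlobalAveragePooling1D

vocab_size = 10000
embedding_dim = 16

model = Sequential([
    # Maps each integer token to a 16-dimensional vector.
    Embedding(vocab_size, embedding_dim, name="embedding"),
    # Averages over the sequence axis, giving one fixed-size vector per review.
    GlobalAveragePooling1D(),
    Dense(16, activation="relu"),
    # Single output unit; no activation here, so the model returns a raw score.
    Dense(1),
])

# Two fake reviews of 100 token ids each, only to check the output shape.
sample_batch = tf.zeros((2, 100), dtype=tf.int32)
scores = model(sample_batch)
print(scores.shape)
```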

Set Up Callbacks for TensorBoard

Since we want to see how our model evolves and performs over time, we will configure our callback settings with the following lines:

We will use these callbacks to visualize our model performance at each epoch using TensorBoard.

Configure the Model

Then, we will configure our model. We use Adam as the optimizer and Binary Crossentropy as the loss function, because this is a binary classification task, and we select accuracy as our performance metric.

Start the Training

Now that our model is configured, we can use the .fit() function to start the training. We will run for 15 epochs and record the callbacks for TensorBoard.
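The callback, configuration, and training steps can be sketched together. The assumptions here are mine: random synthetic data instead of the IMDB reviews, a temporary log directory, from_logits=True because the final Dense layer in this sketch has no activation, and only 2 epochs (instead of 15) so that it runs quickly:

```python
import tempfile

import numpy as np
import tensorflow as tf

# Synthetic stand-in data: 32 "reviews" of 100 token ids with binary labels.
rng = np.random.default_rng(0)
x = rng.integers(0, 10000, size=(32, 100))
y = rng.integers(0, 2, size=(32, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 16, name="embedding"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Write TensorBoard logs to a temporary directory (hypothetical location).
log_dir = tempfile.mkdtemp()
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

history = model.fit(x, y, epochs=2, callbacks=[tensorboard_callback], verbose=0)
print(history.history.keys())
```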

Figure 7 shows a screenshot of the training process:

Figure 7. Model Performance at Each Epoch

Visualize the Results

Now that we have concluded our model training, let’s do some visualization to better understand what we built and trained.

The Summary of the Model

We can easily see the summary of our model with the .summary() function, as shown below:

Figure 8 shows how our model looks and lists the number of parameters and output shape for each layer:

 Figure 8. Our Model Summary

Training Performance on TensorBoard

Let’s see how our model evolved as it trained on the IMDB reviews dataset. We can use TensorFlow’s visualization kit, TensorBoard. TensorBoard can be used for several machine learning visualization tasks such as:

  • Tracking and visualizing loss and accuracy measures
  • Visualizing the model graph
  • Viewing the evolution of weights, biases, and other tensor values
  • Displaying images, text, and audio data
  • and more…
 Figure 9. A Screenshot of Our Tensorboard Instance

In this tutorial, we will use %load_ext to load TensorBoard and view the logs. The lines above will run a small server within our cell to visualize our metric values over time.

As you can see on the left, our accuracy increases over time while our loss values decrease. Figure 9 shows that our model does what it is supposed to do because decreasing loss value means that our model is doing something to lower its mistakes: learning.

Visualization with Embedding Projector

Our model looks nice and it learned a lot in just 15 epochs. But, the main goal of this tutorial is to create a word embedding. We will not predict review sentiments in this tutorial. Instead, we will visualize our word embedding cloud using Embedding Projector.

Embedding Projector is a tool built on top of TensorBoard. It is a useful tool to analyze data and visualize the position of embedding values relative to one another. Using Embedding Projector, we can graphically represent high-dimensional embeddings by simplifying them with algorithms like PCA.

Get the Vector Values and Vocabulary Data

We will start by getting our 16-dimensional embedding values for each word. Also, we will get a list of all these words we embedded. We can achieve these two tasks with the following code:
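As a rough sketch of these two lookups, the snippet below uses a freshly created (untrained) Embedding layer and a TextVectorization layer adapted on two toy sentences. In the tutorial, the trained model's layers would be used instead; all names here are illustrative:

```python
import tensorflow as tf

# Build a small vocabulary from toy sentences (stand-in for the IMDB reviews).
vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=10000, output_mode="int", output_sequence_length=100
)
vectorize_layer.adapt(tf.constant(["a great movie", "a terrible movie"]))

# An untrained embedding layer; calling it once creates its weight matrix.
embedding = tf.keras.layers.Embedding(10000, 16, name="embedding")
_ = embedding(vectorize_layer(tf.constant(["a great movie"])))

# One 16-dimensional row per token id, plus the token list itself.
vectors = embedding.get_weights()[0]
vocab = vectorize_layer.get_vocabulary()

print(vectors.shape)
print(vocab[:2])
```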

Let’s see how a word and its vector values look with a random example. We select the word with index no. 500 and view it with the following code:

The vector values and the corresponding word for index no. 500 are shown in Figure 10:

 Figure 10. A Random Example of Word-Vector Pair

Feel free to change the index value to view other words with their vector values.

Save the Data to New Files

Now we have the entire list of words (vocabulary) with their corresponding 16-dimensional vector values. We will save word names to the metadata.tsv file and vector values to the vectors.tsv file. The following lines create new files, write our data to these new files, save the data, close the files, and download them to your local machine:
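A minimal sketch of the file-writing step, using a made-up four-word vocabulary and random vectors in place of the trained values. The download step is environment-specific (for example, Colab's file download helper) and is left out:

```python
import io

import numpy as np

# Hypothetical stand-ins for the real vocabulary and embedding weights.
vocab = ["", "[UNK]", "movie", "great"]
vectors = np.random.rand(len(vocab), 16)

with io.open("vectors.tsv", "w", encoding="utf-8") as out_v, \
        io.open("metadata.tsv", "w", encoding="utf-8") as out_m:
    for index, word in enumerate(vocab):
        if index == 0:
            continue  # index 0 is the padding token, so it is skipped
        # One tab-separated row of 16 values per word.
        out_v.write("\t".join(str(x) for x in vectors[index]) + "\n")
        out_m.write(word + "\n")
```

The with statement closes both files automatically, so no explicit close() calls are needed.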

Load to Embedding Projector

Now we visit the Embedding Projector website:

Then, we click the “Load” button on the left to load our vectors.tsv and metadata.tsv files. Then, we can click anywhere outside of the popup window.

and, voilà!

 Figure 11. Our Word Embedding Trained on IMDB Reviews Dataset

Note that Embedding Projector runs a PCA algorithm to reduce the 16-dimensional vector space to 3 dimensions, since we cannot visualize 16 dimensions directly.


You have successfully built a neural network to train a word embedding model, and it takes a lot of effort to achieve this. Pat yourself on the back and keep improving yourself in the field of natural language processing, as there are many unsolved problems.

Mastering TensorFlow Tensors in 5 Easy Steps

Discover how the building blocks of TensorFlow work at the lower level and learn how to make the most of Tensor objects | Deep Learning with TensorFlow 2.x

If you are reading this article, I am sure that we share similar interests and are/will be in similar industries. So let’s connect via Linkedin! Please do not hesitate to send a contact request! Orhan G. Yalçın — Linkedin




Photo by Esther Jiao on Unsplash

In this post, we will dive into the details of TensorFlow Tensors. We will cover all the topics related to Tensors in TensorFlow in these five simple steps:

  • Step I: Definition of Tensors → What is a Tensor?
  • Step II: Creation of Tensors → Functions to Create Tensor Objects
  • Step III: Qualifications of Tensors → Characteristics and Features of Tensor Objects
  • Step IV: Operations with Tensors → Indexing, Basic Tensor Operations, Shape Manipulation, and Broadcasting
  • Step V: Special Types of Tensors → Special Tensor Types Other than Regular Tensors

Let’s start!

Definition of Tensors: What is a Tensor?



Figure 1. A Visualization of Rank-3 Tensors (Figure by Author)

Tensors are TensorFlow’s multi-dimensional arrays with uniform type. They are very similar to NumPy arrays, and they are immutable, which means that they cannot be altered once created. You can only create a new copy with the edits.

Let’s see how Tensors work with a code example. But first, to work with TensorFlow objects, we need to import the TensorFlow library. We often use NumPy with TensorFlow, so let’s also import NumPy with the following lines:

Creation of Tensors: Creating Tensor Objects

There are several ways to create a tf.Tensor object. Let’s start with a few examples. You can create Tensor objects with several TensorFlow functions, as shown in the below examples:
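A short sketch of the four creation functions; the variable names are illustrative, and the values are chosen to match the printed results that follow:

```python
import numpy as np
import tensorflow as tf

# tf.constant takes the values themselves.
x = tf.constant([[1, 2, 3, 4, 5]])
# tf.ones and tf.zeros only need a shape.
y = tf.ones((1, 5))
z = tf.zeros((1, 5))
# tf.range generates a sequence, much like Python's range().
q = tf.range(start=1, limit=6, delta=1)

print(x)
print(y)
print(z)
print(q)
```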

tf.constant, tf.ones, tf.zeros, and tf.range are some of the functions you can use to create Tensor objects
tf.Tensor([[1 2 3 4 5]], shape=(1, 5), dtype=int32)
tf.Tensor([[1. 1. 1. 1. 1.]], shape=(1, 5), dtype=float32)
tf.Tensor([[0. 0. 0. 0. 0.]], shape=(1, 5), dtype=float32)
tf.Tensor([1 2 3 4 5], shape=(5,), dtype=int32)

As you can see, we created Tensor objects with the shape (1, 5) with three different functions and a fourth Tensor object with the shape (5,) using the tf.range() function. Note that tf.ones and tf.zeros accept the shape as the required argument since their element values are pre-determined.

Qualifications of Tensors: Characteristics and Features of Tensor Objects

TensorFlow Tensors are created as tf.Tensor objects, and they have several characteristic features. First of all, they have a rank based on the number of dimensions they have. Secondly, they have a shape, a list that consists of the lengths of all their dimensions. All tensors have a size, which is the total number of elements within a Tensor. Finally, their elements are all recorded in a uniform Dtype (data type). Let’s take a closer look at each of these features.

Rank System and Dimension

Tensors are categorized based on the number of dimensions they have:

  • Rank-0 (Scalar) Tensor: A tensor containing a single value and no axes (0-dimension);
  • Rank-1 Tensor: A tensor containing a list of values in a single axis (1-dimension);
  • Rank-2 Tensor: A tensor containing two axes (2-dimensions); and
  • Rank-N Tensor: A tensor containing N axes (N-dimensions).
 Figure 2. Rank-1 Tensor | Rank-2 Tensor| Rank-3 Tensor (Figure by Author)

For example, we can create a Rank-3 tensor by passing a three-level nested list object to the tf.constant function. For this example, we can split the numbers from 0 to 11 into a three-level nested list: two blocks, each holding two rows of three elements:
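A sketch of that construction, with the nested list written out explicitly:

```python
import tensorflow as tf

# Outer list: 2 blocks; each block: 2 rows; each row: 3 values.
rank_3_tensor = tf.constant([
    [[0, 1, 2], [3, 4, 5]],
    [[6, 7, 8], [9, 10, 11]],
])
print(rank_3_tensor)
```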

The code to create a Rank-3 Tensor object
tf.Tensor(
[[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]], shape=(2, 2, 3), dtype=int32)

We can view the number of dimensions that our `rank_3_tensor` object currently has with the `.ndim` attribute.

The number of dimensions in our Tensor object is 3


Shape

The shape feature is another attribute that every Tensor has. It shows the size of each dimension in the form of a list. We can view the shape of the rank_3_tensor object we created with the .shape attribute, as shown below:

The shape of our Tensor object is (2, 2, 3)

As you can see, our tensor has 2 elements at the first level, 2 elements in the second level, and 3 elements in the third level.


Size

Size is another feature that Tensors have, and it means the total number of elements a Tensor has. We cannot measure the size with an attribute of the Tensor object. Instead, we need to use the tf.size() function. Finally, we will convert the output to NumPy with the instance function .numpy() to get a more readable result:

The size of our Tensor object is 12


Dtype

Tensors often contain numerical data types such as floats and ints, but may contain many other data types such as complex numbers and strings.

Each Tensor object, however, must store all its elements in a single uniform data type. Therefore, we can also view the type of data selected for a particular Tensor object with the .dtype attribute, as shown below:

The data type selected for this Tensor object is <dtype: 'int32'>
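The four properties discussed above (rank, shape, size, and dtype) can be read from one tensor as in this sketch, which rebuilds the rank_3_tensor so that it runs on its own:

```python
import tensorflow as tf

rank_3_tensor = tf.constant([
    [[0, 1, 2], [3, 4, 5]],
    [[6, 7, 8], [9, 10, 11]],
])

# Rank: number of axes.
print("The number of dimensions in our Tensor object is", rank_3_tensor.ndim)
# Shape: length of each axis.
print("The shape of our Tensor object is", rank_3_tensor.shape)
# Size: total element count; tf.size returns a Tensor, so convert it with .numpy().
print("The size of our Tensor object is", tf.size(rank_3_tensor).numpy())
# Dtype: the single data type shared by all elements.
print("The data type selected for this Tensor object is", rank_3_tensor.dtype)
```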

Operations with Tensors


Indexing

An index is a numerical representation of an item’s position in a sequence. This sequence can refer to many things: a list, a string of characters, or any arbitrary sequence of values.

TensorFlow also follows standard Python indexing rules, which are similar to list indexing or NumPy array indexing.

A few rules about indexing:

  1. Indices start at zero (0).
  2. Negative index (“-n”) value means backward counting from the end.
  3. Colons (“:”) are used for slicing: start:stop:step.
  4. Commas (“,”) are used to reach deeper levels.

Let’s create a rank_1_tensor with the following lines:

tf.Tensor([ 0 1 2 3 4 5 6 7 8 9 10 11],
shape=(12,), dtype=int32)

and test out our rules no.1, no.2, and no.3:

First element is: 0
Last element is: 11
Elements in between the 1st and the last are: [ 1 2 3 4 5 6 7 8 9 10]

Now, let’s create our rank_2_tensor object with the following code:

tf.Tensor( [[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]], shape=(2, 6), dtype=int32)

and test the 4th rule with several examples:

The first element of the first level is: [0 1 2 3 4 5]
The second element of the first level is: [ 6 7 8 9 10 11]
The first element of the second level is: 0
The third element of the second level is: 2
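The four indexing rules can be tried out with the following sketch, which recreates both tensors with the values shown in the printed results above:

```python
import tensorflow as tf

rank_1_tensor = tf.constant([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

# Rules 1-3: zero-based indices, negative indices, and slicing.
first = rank_1_tensor[0]
last = rank_1_tensor[-1]
middle = rank_1_tensor[1:-1]

rank_2_tensor = tf.constant([[0, 1, 2, 3, 4, 5],
                             [6, 7, 8, 9, 10, 11]])

# Rule 4: a comma moves one level deeper (row index, then column index).
first_row = rank_2_tensor[0]
second_row = rank_2_tensor[1]
row0_col0 = rank_2_tensor[0, 0]
row0_col2 = rank_2_tensor[0, 2]

print(first.numpy(), last.numpy(), middle.numpy())
print(first_row.numpy(), second_row.numpy(), row0_col0.numpy(), row0_col2.numpy())
```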

Now that we have covered the basics of indexing, let’s take a look at the basic operations we can conduct on Tensors.

Basic Operations with Tensors

You can easily do basic math operations on tensors such as:

  1. Addition
  2. Element-wise Multiplication
  3. Matrix Multiplication
  4. Finding the Maximum or Minimum
  5. Finding the Index of the Max Element
  6. Computing Softmax Value

Let’s see these operations in action. We will create two Tensor objects and apply these operations.

We can start with addition.

tf.Tensor( [[ 3. 7.]
[11. 15.]], shape=(2, 2), dtype=float32)

Let’s continue with the element-wise multiplication.

tf.Tensor( [[ 2. 12.]
[30. 56.]], shape=(2, 2), dtype=float32)

We can also do matrix multiplication:

tf.Tensor( [[22. 34.]
[46. 74.]], shape=(2, 2), dtype=float32)

NOTE: Matmul operations lie at the heart of deep learning algorithms. Therefore, even though you will rarely call matmul directly, it is crucial to be aware of these operations.

Examples of other operations we listed above:

The Max value of the tensor object b is: 7.0
The index position of the Max of the tensor object b is: [1 1]
The softmax computation result of the tensor object b is: [[0.11920291 0.880797 ] [0.11920291 0.880797 ]]
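The six operations can be reproduced with the sketch below. The input values for a and b are not shown in the text; they are inferred here so that the results match the printed outputs above:

```python
import tensorflow as tf

a = tf.constant([[2, 4], [6, 8]], dtype=tf.float32)
b = tf.constant([[1, 3], [5, 7]], dtype=tf.float32)

added = tf.add(a, b)            # element-wise addition
multiplied = tf.multiply(a, b)  # element-wise multiplication
matmul = tf.matmul(a, b)        # matrix multiplication

max_b = tf.reduce_max(b)        # largest value in b
argmax_b = tf.argmax(b)         # row index of the max in each column (axis 0)
softmax_b = tf.nn.softmax(b)    # softmax over the last axis (each row)

print(added, multiplied, matmul, sep="\n")
print(max_b.numpy(), argmax_b.numpy(), softmax_b.numpy())
```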

Manipulating Shapes

Just as in NumPy arrays and pandas DataFrames, you can reshape Tensor objects as well.

The tf.reshape operations are very fast since the underlying data does not need to be duplicated. For the reshape operation, we can use the tf.reshape() function. Let’s use the tf.reshape function in code:

The shape of our initial Tensor object is: (1, 6)
The shape of our first reshaped Tensor object is: (6, 1)
The shape of our second reshaped Tensor object is: (3, 2)
The shape of our flattened Tensor object is: tf.Tensor([1 2 3 4 5 6], shape=(6,), dtype=int32)
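A sketch of these reshape calls; the names a, b, c, and d are illustrative, and the initial values are inferred from the flattened output above:

```python
import tensorflow as tf

a = tf.constant([[1, 2, 3, 4, 5, 6]])  # initial shape: (1, 6)

b = tf.reshape(a, [6, 1])   # six rows, one column
c = tf.reshape(a, [3, 2])   # three rows, two columns
d = tf.reshape(a, [-1])     # -1 flattens into a single axis

print(a.shape, b.shape, c.shape)
print(d)
```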

As you can see, we can easily reshape our Tensor objects. But beware that the new shape must hold exactly the same number of elements as the original; otherwise, TensorFlow raises an error. Reshaping also does not reorder the data, so a careless reshape can mix up which values belong to which axis. So, look out for that 😀.


Broadcasting

When we try to do combined operations using multiple Tensor objects, the smaller Tensors can stretch out automatically to fit larger tensors, just as NumPy arrays can. For example, when you attempt to multiply a scalar Tensor with a Rank-2 Tensor, the scalar is stretched to multiply every Rank-2 Tensor element. See the example below:

tf.Tensor( [[ 5 10]
[15 20]], shape=(2, 2), dtype=int32)
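A sketch of the scalar-times-matrix case described above, with illustrative names and values chosen to match the printed result:

```python
import tensorflow as tf

scalar = tf.constant(5)
matrix = tf.constant([[1, 2], [3, 4]])

# The scalar is broadcast across every element of the rank-2 tensor.
product = tf.multiply(scalar, matrix)
print(product)
```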

Thanks to broadcasting, you don’t have to manually expand the smaller Tensor before doing math operations, as long as the two shapes are compatible.

Special Types of Tensors

We tend to generate Tensors in a rectangular shape and store numerical values as elements. However, TensorFlow also supports irregular, or specialized, Tensor types, which are:

  1. Ragged Tensors
  2. String Tensors
  3. Sparse Tensors

Figure 3. Ragged Tensor | String Tensor| Sparse Tensor (Figure by Author)

Let’s take a closer look at what each of them is.

Ragged Tensors

Ragged tensors are tensors with different numbers of elements along one of their axes, as shown in Figure 3.

You can build a Ragged Tensor, as shown below:
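A minimal sketch using tf.ragged.constant; the nested list below is an illustrative example with rows of different lengths:

```python
import tensorflow as tf

# Each inner list may have a different length.
ragged_list = [[1, 2, 3], [4, 5], [6]]
ragged_tensor = tf.ragged.constant(ragged_list)

print(ragged_tensor)
print(ragged_tensor.row_lengths())
```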

<tf.RaggedTensor [[1, 2, 3],
[4, 5],

String Tensors

String Tensors are tensors that store string objects. We can build a String Tensor just as we create a regular Tensor object. But, we pass string objects as elements instead of numerical objects, as shown below:
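A sketch that builds such a tensor from the three strings shown in the output below:

```python
import tensorflow as tf

string_tensor = tf.constant([
    "With this",
    "code, I am",
    "creating a String Tensor",
])

# Elements are stored as byte strings, hence the b'...' prefix when printed.
print(string_tensor)
```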

tf.Tensor([b'With this'
b'code, I am'
b'creating a String Tensor'],
shape=(3,), dtype=string)

Sparse tensors

Finally, Sparse Tensors are rectangular Tensors for sparse data. When most of the entries in your data are empty (i.e., zero or missing), Sparse Tensors are the go-to objects. Creating a Sparse Tensor is a bit more time consuming than creating a regular one. But, here is an example:
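A sketch that stores only the three non-zero entries and their positions, then converts to a dense Tensor for display. The indices and values are inferred from the dense output shown below:

```python
import tensorflow as tf

sparse_tensor = tf.sparse.SparseTensor(
    indices=[[0, 0], [2, 2], [4, 4]],  # (row, column) of each stored value
    values=[25, 50, 100],              # the stored values themselves
    dense_shape=[5, 5],                # shape of the full matrix
)

# Convert to a regular Tensor; unspecified entries become 0.
dense = tf.sparse.to_dense(sparse_tensor)
print(dense)
```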


tf.Tensor( [[ 25 0 0 0 0]
[ 0 0 0 0 0]
[ 0 0 50 0 0]
[ 0 0 0 0 0]
[ 0 0 0 0 100]], shape=(5, 5), dtype=int32)


We have successfully covered the basics of TensorFlow’s Tensor objects.

Give yourself a pat on the back!

This should give you a lot of confidence since you are now much more informed about the building blocks of the TensorFlow framework.

Check Part 1 of this tutorial series:

Continue with Part 3 of the series:

Mastering TensorFlow “Variables” in 5 Easy Steps

Learn how to use TensorFlow Variables, their differences from plain Tensor objects, and when they are preferred over these Tensor objects | Deep Learning with TensorFlow 2.x

WARNING: Do not confuse this article with “Mastering TensorFlow Tensors in 5 Easy Steps”!


Figure 1. Photo by Crissy Jarvis on Unsplash

In this tutorial, we will focus on TensorFlow Variables. After the tutorial, you will be able to create, update, and manage TensorFlow Variables effectively. As usual, our tutorial will deliver code examples with detailed explanations as well as conceptual explanations. We will master TensorFlow Variables in 5 easy steps:

  • Step 1: Definition of Variables → A Brief Introduction, Comparison with Tensors
  • Step 2: Creation of Variables → Instantiating tf.Variable Objects
  • Step 3: Qualifications of Variables → Characteristics and Features
  • Step 4: Operations with Variables → Basic Tensor Operations, Indexing, Shape Manipulation, and Broadcasting
  • Step 5: Hardware Selection for Variables → GPUs, CPUs, TPUs

Fasten your belts, and let’s start!

Definition of Variables

In this step, we will briefly cover what Variables are and understand the difference between plain Tensor objects and Variable objects.

A Brief Introduction

A TensorFlow Variable is the preferred object type representing a shared and persistent state that you can manipulate with any operation, including TensorFlow models. Manipulation refers to any value or parameter update. This characteristic is the most distinguishing feature of Variables compared to tf.Tensor objects. TensorFlow Variables are recorded as tf.Variable objects. Let’s make a brief comparison between tf.Tensor and tf.Variable objects to understand their similarities and differences.

 Figure 2. Variable Values can be Updated (Figure by Author)

Comparison with Tensors

So, the most important difference between Variables and Tensors is mutability. The values in a Variable object can be updated (e.g., with the assign() function) as opposed to Tensors.

“The values of tensor objects cannot be updated, and you can only create a new Tensor object with the new values.”

Variable objects are mainly used to store model parameters, and since these values are constantly updated during training, using Variables, instead of Tensors, is a necessity rather than a choice.

The shape of a Variable object can be changed with the tf.reshape() function, just like the shape of a Tensor object; note that tf.reshape() returns a new Tensor rather than modifying the Variable in place. Since Variable objects are built on top of Tensor objects, they have common attributes such as .shape and .dtype. But, Variables also have attributes such as .trainable and .name that plain Tensors do not offer in eager mode.

 Figure 3. A Tensorflow Variable is actually a wrapper around a TensorFlow Tensor with additional features (Figure by Author)

Let’s see how we can create tf.Variable objects!

Creation of Variables

We can instantiate (i.e., create) tf.Variable objects with the tf.Variable() function. The tf.Variable() function accepts different data types as its initial value, such as integers, floats, strings, lists, and tf.constant() objects.

Before showing different Variable object examples with these different data types, I want you to start a new Google Colab notebook and import TensorFlow library with the following code:

Now, we can start creating tf.Variable objects.

1 — We can pass a tf.constant() object as the initial_value:

2 — We can pass a single integer as the initial_value:

3 — We can pass a list of integers or floats as the initial_value:

4 — We can pass a single string as the initial_value:

5 — We can pass a list of strings as the initial_value:
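The five cases listed above can be sketched as follows. The variable names and initial values are illustrative and not taken from the original notebook:

```python
import tensorflow as tf

# 1. A tf.constant() object
a = tf.Variable(tf.constant([[1.0, 2.0], [1.0, 2.0]]))
# 2. A single integer
b = tf.Variable(1000)
# 3. A list of integers or floats
c = tf.Variable([[1.0, 2.0], [1.0, 2.0]])
# 4. A single string
d = tf.Variable("String")
# 5. A list of strings
e = tf.Variable(["Hello", "world"])

for variable in (a, b, c, d, e):
    print(variable.dtype, variable.shape)
```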

As you can see, there are several data types that the tf.Variable() function accepts as the initial_value argument. Now let’s take a look at the characteristics and features of variables.

Qualifications of Variables

Every Variable must have some properties such as value, name, uniform data type, shape, rank, size, and more. In this section, we will see what these properties are and how we can view these properties in a Colab notebook.


Value

Every Variable must specify an initial_value. Otherwise, TensorFlow raises a ValueError stating that initial_value must be specified. Therefore, make sure that you pass an initial_value argument when creating Variable objects. To view a Variable’s values, we can use the .value() function as well as the .numpy() function. See the example below:

The values stored in the variables:
tf.Tensor( [[1. 2.]  
            [1. 2.]], shape=(2, 2), dtype=float32)
The values stored in the variables:
[[1. 2.]
[1. 2.]]


Name

Name is a Variable attribute that helps developers track the updates on a particular variable. You can pass a name argument while creating the Variable object. If you don’t specify a name, TensorFlow assigns a default name, as shown below:

The name of the variable:  Variable:0


Dtype

Each Variable must have a uniform data type that it stores. Since there is a single type of data stored for every Variable, you can also view this type with the .dtype attribute. See the example below:

The selected datatype for the variable:  <dtype: 'float32'>

Shape, Rank, and Size

The shape property shows the size of each dimension in the form of a list. We can view the shape of the Variable object with the .shape attribute. Then, we can view the number of dimensions (the rank) that a Variable object has with the tf.rank() function. Finally, size corresponds to the total number of elements a Variable has, which we count with the tf.size() function. See the code below for all three properties:

The shape of the variable:  (2, 2)
The number of dimensions in the variable: 2
The number of elements in the variable: 4
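The properties covered in this step can be inspected together, as in this sketch. The Variable a is recreated here with the values shown in the outputs above:

```python
import tensorflow as tf

a = tf.Variable([[1.0, 2.0], [1.0, 2.0]])

# Value: as a Tensor, or as a NumPy array.
print("The values stored in the variables:\n", a.value())
print("The values stored in the variables:\n", a.numpy())

# Name and dtype.
print("The name of the variable: ", a.name)
print("The selected datatype for the variable: ", a.dtype)

# Shape, rank (number of dimensions), and size (number of elements).
print("The shape of the variable: ", a.shape)
print("The number of dimensions in the variable:", tf.rank(a).numpy())
print("The number of elements in the variable:", tf.size(a).numpy())
```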

Operations with Variables

There are several basic operations you can easily conduct with math operators and TensorFlow functions. On top of what we covered in Part 2 of this tutorial series, you may also use the following math operators for Variable operations.

Basic Tensor Operations

 Figure 4. You May Benefit from Basic Math Operators (Figure by Author)
  • Addition and Subtraction: We can conduct addition and subtraction with + and - signs.
Addition by 2:
tf.Tensor( [[3. 4.]  [3. 4.]], shape=(2, 2), dtype=float32)
Subtraction by 2:
tf.Tensor( [[-1.  0.]  [-1.  0.]], shape=(2, 2), dtype=float32)
  • Multiplication and Division: We can conduct multiplication and division with * and / signs.
Multiplication by 2:
tf.Tensor( [[2. 4.]  [2. 4.]], shape=(2, 2), dtype=float32)
Division by 2:
tf.Tensor( [[0.5 1. ]  [0.5 1. ]], shape=(2, 2), dtype=float32)
  • Matmul and Modulo Operations: Finally, you can also do matmul and modulo operations with @ and % signs:
Matmul operation with itself:
tf.Tensor( [[3. 6.]  [3. 6.]], shape=(2, 2), dtype=float32)
Modulo operation by 2:
tf.Tensor( [[1. 0.]  [1. 0.]], shape=(2, 2), dtype=float32)
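The operators listed above can be tried on one Variable, as sketched here with the same values that produce the printed results:

```python
import tensorflow as tf

a = tf.Variable([[1.0, 2.0], [1.0, 2.0]])

added = a + 2        # element-wise addition
subtracted = a - 2   # element-wise subtraction
multiplied = a * 2   # element-wise multiplication
divided = a / 2      # element-wise division
matmul = a @ a       # matrix multiplication
modulo = a % 2       # element-wise remainder

# Each result is a regular tf.Tensor; the Variable itself is unchanged.
print(added, subtracted, multiplied, divided, matmul, modulo, sep="\n")
```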

These are elementary examples, but they can be extended into complex calculations, which creates the algorithms that we use for deep learning applications.

Note: These operators also work on regular Tensor objects.

Assignment, Indexing, Broadcasting, and Shape Manipulation


Assignment

With the .assign() method, you may assign new values to a Variable object without creating a new object. Being able to reassign values in place is one of the main advantages of Variables over Tensors. Here is an example of value reassignment:

...array([[  2., 100.],
          [  1.,  10.]],...
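A sketch of in-place reassignment with the Variable's assign() method, using the new values shown in the output above. The starting values are an assumption:

```python
import tensorflow as tf

a = tf.Variable([[1.0, 2.0], [1.0, 2.0]])

# The new value must have the same shape and a compatible dtype.
a.assign([[2.0, 100.0], [1.0, 10.0]])
print(a)
```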


Indexing

Just as in Tensors, you may easily access particular elements using index values, as shown below:

The 1st element of the first level is: [1. 2.]
The 2nd element of the first level is: [1. 2.]
The 1st element of the second level is: 1.0
The 3rd element of the second level is: 2.0


Broadcasting

Just as with Tensor objects, when we try to do combined operations using multiple Variable objects, the smaller Variables can stretch out automatically to fit larger Variables, just as NumPy arrays can. For example, when you attempt to multiply a scalar Variable with a 2-dimensional Variable, the scalar is stretched to multiply every 2-dimensional Variable element. See the example below:

tf.Tensor([[ 5 10]
           [15 20]], shape=(2, 2), dtype=int32)

Shape Manipulation

Just as in Tensor objects, you can reshape Variable objects as well. For the reshape operation, we can use the tf.reshape() function. Let’s use the tf.reshape() function in code:
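A sketch of reshaping a Variable, assuming the same 2x2 values used earlier. Note that tf.reshape() returns a new Tensor and leaves the Variable's own shape untouched:

```python
import tensorflow as tf

a = tf.Variable([[1.0, 2.0], [1.0, 2.0]])

# Reshape the four values into a single column.
reshaped = tf.reshape(a, (4, 1))

print(reshaped)
print(a.shape)  # the Variable itself keeps its original shape
```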

tf.Tensor( [[1.]
            [2.]
            [1.]
            [2.]], shape=(4, 1), dtype=float32)

Hardware Selection for Variables

As you will see in the upcoming Parts, we will accelerate our model training with GPUs and TPUs. To be able to see what type of device (i.e., processor) our variable is processed with, we can use .device attribute:

The device which processes the variable:   /job:localhost/replica:0/task:0/device:GPU:0

We can also set which device should process a particular calculation with the tf.device() function by passing the device name as an argument. See the example below:

The device which processes the variable a: /job:localhost/replica:0/task:0/device:CPU:0
The device which processes the variable b: /job:localhost/replica:0/task:0/device:CPU:0
The device which processes the calculation: /job:localhost/replica:0/task:0/device:GPU:0
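A sketch of explicit device placement with tf.device(). Unlike the output above, which runs the calculation on a GPU, this version keeps everything on "CPU:0" so that it also runs on machines without a GPU; the values are illustrative:

```python
import tensorflow as tf

# Everything created inside this block is placed on the first CPU.
with tf.device("CPU:0"):
    a = tf.Variable([[1.0, 2.0], [1.0, 2.0]])
    b = tf.Variable([[3.0, 4.0], [5.0, 6.0]])
    c = a @ b

print("The device which processes the variable a:", a.device)
print("The device which processes the variable b:", b.device)
print("The device which processes the calculation:", c.device)
```

On a machine with a GPU, wrapping only the matrix multiplication in a separate tf.device("GPU:0") block would move that calculation to the GPU while the Variables stay on the CPU.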

Even though you will not have to set this manually while training a model, there might be circumstances where you have to choose a device for a particular calculation or data processing work. So, be aware of this option.


We have successfully covered the basics of TensorFlow’s Variable objects.

Give yourself a pat on the back!

This should give you a lot of confidence since you are now much more informed about the main mutable Variable object type used for all kinds of operations in TensorFlow.

If this is your first post, consider starting from Part 1 of this tutorial series:

or check out Part 2: