Subscribe to Get All the Blog Posts and Colab Notebooks 

Mastering TensorFlow Tensors in 5 Easy Steps

Mastering TensorFlow Tensors in 5 Easy Steps

Discover how the building blocks of TensorFlow works at the lower level and learn how to make the most of Tensor objects | Deep Learning with TensorFlow 2.x

If you are reading this article, I am sure that we share similar interests and are/will be in similar industries. So let’s connect via Linkedin! Please do not hesitate to send a contact request! Orhan G. Yalçın — Linkedin




Photo by Esther Jiao on Unsplash

In this post, we will dive into the details of TensorFlow Tensors. We will cover all the topics related to Tensors in Tensorflow in these five simple steps:

  • Step I: Definition of Tensors → What is a Tensor?
  • Step II: Creation of Tensors → Functions to Create Tensor Objects
  • Step III: Qualifications of Tensors → Characteristics and Features of Tensor Objects
  • Step IV: Operations with Tensors → Indexing, Basic Tensor Operations, Shape Manipulation, and Broadcasting
  • Step V: Special Types of Tensors → Special Tensor Types Other than Regular Tensors

Let’s start!

Definition of Tensors: What is a Tensor?



Figure 1. A Visualization of Rank-3 Tensors (Figure by Author)

Tensors are TensorFlow’s multi-dimensional arrays with uniform type. They are very similar to NumPy arrays, and they are immutable, which means that they cannot be altered once created. You can only create a new copy with the edits.

Let’s see how Tensors work with code example. But first, to work with TensorFlow objects, we need to import the TensorFlow library. We often use NumPy with TensorFlow, so let’s also import NumPy with the following lines:

Creation of Tensors: Creating Tensor Objects

There are several ways to create a tf.Tensor object. Let’s start with a few examples. You can create Tensor objects with several TensorFlow functions, as shown in the below examples:

tf.constant, tf.ones, tf.zeros, and tf.range are some of the functions you can use to create Tensor objects
tf.Tensor([[1 2 3 4 5]], shape=(1, 5), dtype=int32)
tf.Tensor([[1. 1. 1. 1. 1.]], shape=(1, 5), dtype=float32)
tf.Tensor([[0. 0. 0. 0. 0.]], shape=(1, 5), dtype=float32)
tf.Tensor([1 2 3 4 5], shape=(5,), dtype=int32)

As you can see, we created Tensor objects with the shape (1, 5) with three different functions and a fourth Tensor object with the shape (5, )using tf.range() function. Note that tf.ones and tf.zeros accepts the shape as the required argument since their element values are pre-determined.

Qualifications of Tensors: Characteristics and Features of Tensor Objects

TensorFlow Tensors are created as tf.Tensor objects, and they have several characteristic features. First of all, they have a rank based on the number of dimensions they have. Secondly, they have a shape, a list that consists of the lengths of all their dimensions. All tensors have a size, which is the total number of elements within a Tensor. Finally, their elements are all recorded in a uniform Dtype (data type). Let’s take a closer look at each of these features.

Rank System and Dimension

Tensors are categorized based on the number of dimensions they have:

  • Rank-0 (Scalar) Tensor: A tensor containing a single value and no axes (0-dimension);
  • Rank-1 Tensor: A tensor containing a list of values in a single axis (1-dimension);
  • Rank-2 Tensor: A tensor containing 2-axes (2-dimensions); and
  • Rank-N Tensor: A tensor containing N-axis (N-dimensions).
 Figure 2. Rank-1 Tensor | Rank-2 Tensor| Rank-3 Tensor (Figure by Author)

For example, we can create a Rank-3 tensor by passing a three-level nested list object to the tf.constant function. For this example, we can split the numbers into a 3-level nested list with three-element at each level:

The code to create a Rank-3 Tensor object
tf.Tensor( [[[ 0 1 2]
[ 3 4 5]]



[[ 6 7 8]
[ 9 10 11]]],
shape=(2, 2, 3), dtype=int32)

We can view the number of dimensions that our `rank_3_tensor` object currently has with the `.ndim` attribute.

The number of dimensions in our Tensor object is 3


The shape feature is another attribute that every Tensor has. It shows the size of each dimension in the form of a list. We can view the shape of the rank_3_tensor object we created with the .shape attribute, as shown below:

The shape of our Tensor object is (2, 2, 3)

As you can see, our tensor has 2 elements at the first level, 2 elements in the second level, and 3 elements in the third level.


Size is another feature that Tensors have, and it means the total number of elements a Tensor has. We cannot measure the size with an attribute of the Tensor object. Instead, we need to use tf.size() function. Finally, we will convert the output to NumPy with the instance function .numpy() to get a more readable result:

The size of our Tensor object is 12


Tensors often contain numerical data types such as floats and ints, but may contain many other data types such as complex numbers and strings.

Each Tensor object, however, must store all its elements in a single uniform data type. Therefore, we can also view the type of data selected for a particular Tensor object with the .dtype attribute, as shown below:

The data type selected for this Tensor object is <dtype: 'int32'>

Operations with Tensors


An index is a numerical representation of an item’s position in a sequence. This sequence can refer to many things: a list, a string of characters, or any arbitrary sequence of values.

TensorFlow also follows standard Python indexing rules, which is similar to list indexing or NumPy array indexing.

A few rules about indexing:

  1. Indices start at zero (0).
  2. Negative index (“-n”) value means backward counting from the end.
  3. Colons (“:”) are used for slicing: start:stop:step.
  4. Commas (“,”) are used to reach deeper levels.

Let’s create a rank_1_tensor with the following lines:

tf.Tensor([ 0 1 2 3 4 5 6 7 8 9 10 11],
shape=(12,), dtype=int32)

and test out our rules no.1, no.2, and no.3:

First element is: 0
Last element is: 11
Elements in between the 1st and the last are: [ 1 2 3 4 5 6 7 8 9 10]

Now, let’s create our rank_2_tensor object with the following code:

tf.Tensor( [[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]], shape=(2, 6), dtype=int32)

and test the 4th rule with several examples:

The first element of the first level is: [0 1 2 3 4 5]
The second element of the first level is: [ 6 7 8 9 10 11]
The first element of the second level is: 0
The third element of the second level is: 2

Now, we covered the basics of indexing, so let’s take a look at the basic operations we can conduct on Tensors.

Basic Operations with Tensors

You can easily do basic math operations on tensors such as:

  1. Addition
  2. Element-wise Multiplication
  3. Matrix Multiplication
  4. Finding the Maximum or Minimum
  5. Finding the Index of the Max Element
  6. Computing Softmax Value

Let’s see these operations in action. We will create two Tensor objects and apply these operations.

We can start with addition.

tf.Tensor( [[ 3. 7.]
[11. 15.]], shape=(2, 2), dtype=float32)

Let’s continue with the element-wise multiplication.

tf.Tensor( [[ 2. 12.]
[30. 56.]], shape=(2, 2), dtype=float32)

We can also do matrix multiplication:

tf.Tensor( [[22. 34.]
[46. 74.]], shape=(2, 2), dtype=float32)

NOTE: Matmul operations lays in the heart of deep learning algorithms. Therefore, although you will not use matmul directly, it is crucial to be aware of these operations.

Examples of other operations we listed above:

The Max value of the tensor object b is: 7.0
The index position of the Max of the tensor object b is: [1 1]
The softmax computation result of the tensor object b is: [[0.11920291 0.880797 ] [0.11920291 0.880797 ]]

Manipulating Shapes

Just as in NumPy arrays and pandas DataFrames, you can reshape Tensor objects as well.

The tf.reshape operations are very fast since the underlying data does not need to be duplicated. For the reshape operation, we can use thetf.reshape() function. Let’s use the tf.reshape function in code:

The shape of our initial Tensor object is: (1, 6)
The shape of our initial Tensor object is: (6, 1)
The shape of our initial Tensor object is: (3, 2)
The shape of our flattened Tensor object is: tf.Tensor([1 2 3 4 5 6], shape=(6,), dtype=int32)

As you can see, we can easily reshape our Tensor objects. But beware that when doing reshape operations, a developer must be reasonable. Otherwise, the Tensor might get mixed up or can even raise an error. So, look out for that 😀.


When we try to do combined operations using multiple Tensor objects, the smaller Tensors can stretch out automatically to fit larger tensors, just as NumPy arrays can. For example, when you attempt to multiply a scalar Tensor with a Rank-2 Tensor, the scalar is stretched to multiply every Rank-2 Tensor element. See the example below:

tf.Tensor( [[ 5 10]
[15 20]], shape=(2, 2), dtype=int32)

Thanks to broadcasting, you don’t have to worry about matching sizes when doing math operations on Tensors.

Special Types of Tensors

We tend to generate Tensors in a rectangular shape and store numerical values as elements. However, TensorFlow also supports irregular, or specialized, Tensor types, which are:

  1. Ragged Tensors
  2. String Tensors
  3. Sparse Tensors

Figure 3. Ragged Tensor | String Tensor| Sparse Tensor (Figure by Author)

Let’s take a closer look at what each of them is.

Ragged Tensors

Ragged tensors are tensors with different numbers of elements along the size axis, as shown in Figure X.

You can build a Ragged Tensor, as shown below:

<tf.RaggedTensor [[1, 2, 3],
[4, 5],

String Tensors

String Tensors are tensors, which stores string objects. We can build a String Tensor just as you create a regular Tensor object. But, we pass string objects as elements instead of numerical objects, as shown below:

tf.Tensor([b'With this'
b'code, I am'
b'creating a String Tensor'],
shape=(3,), dtype=string)

Sparse tensors

Finally, Sparse Tensors are rectangular Tensors for sparse data. When you have holes (i.e., Null values) in your data, Sparse Tensors are to-go objects. Creating a sparse Tensor is a bit time consuming and should be more mainstream. But, here is an example:


tf.Tensor( [[ 25 0 0 0 0]
[ 0 0 0 0 0]
[ 0 0 50 0 0]
[ 0 0 0 0 0]
[ 0 0 0 0 100]], shape=(5, 5), dtype=int32)


We have successfully covered the basics of TensorFlow’s Tensor objects.

Give yourself a pat on the back!

This should give you a lot of confidence since you are now much more informed about the building blocks of the TensorFlow framework.

Check Part 1 of this tutorial series:

Continue with Part 3 of the series:

Mastering TensorFlow “Variables” in 5 Easy Steps

Mastering TensorFlow “Variables” in 5 Easy Steps

Learn how to use TensorFlow Variables, their differences from plain Tensor objects, and when they are preferred over these Tensor objects | Deep Learning with TensorFlow 2.x

WARNING: Do not confuse this article with “Mastering TensorFlow Tensors in 5 Easy Steps”!

If you are reading this article, I am sure that we share similar interests and are/will be in similar industries. So let’s connect via Linkedin! Please do not hesitate to send a contact request! Orhan G. Yalçın — Linkedin


Figure 1. Photo by Crissy Jarvis on Unsplash

In this tutorial, we will focus on TensorFlow Variables. After the tutorial, you will be able to create, update, and manage TensorFlow Variables effectively. As usual, our tutorial will deliver code examples with detailed explanations as well as conceptual explanations. We will master TensorFlow Variables in 5 easy steps:

  • Step 1: Definition of Variables →A Brief Introduction, Comparison with Tensors
  • Step 2: Creation of Variables → Instantiating tf.Variable Objects
  • Step 3: Qualifications of Variables → Characteristics and Features
  • Step 4: Operations with Variables → Basic Tensor Operations, Indexing, Shape Manipulation, and Broadcasting
  • Step 5: Hardware Selection for Variables → GPUs, CPUs, TPUs

Fasten your belts, and let’s start!

Definition of Variables

In this step, we will briefly cover what Variables are and understand the difference between plain Tensor objects and Variable objects.

A Brief Introduction

A TensorFlow Variable is the preferred object type representing a shared and persistent state that you can manipulate with any operation, including TensorFlow models. Manipulation refers to any value or parameter update. This characteristic is the most distinguishing feature of Variables compared to tf.Tensor objects. TensorFlow Variables are recorded as tf.Variable objects. Let’s make a brief comparison between tf.Tensor and tf.Variable objects to understand their similarities and differences.

 Figure 2. Variable Values can be Updated (Figure by Author)

Comparison with Tensors

So, the most important difference between Variables and Tensors is mutability. The values in a Variable object can be updated (e.g., with the assign() function) as opposed to Tensors.

“The values of tensor objects cannot be updated, and you can only create a new Tensor object with the new values.

Variable objects are mainly used to store model parameters, and since these values are constantly updated during training, using Variables, instead of Tensors, is a necessity rather than a choice.

The shape of a Variable object can be updated with the reshape() instance function just like the shape of a Tensor object. Since Variable objects are built on top of Tensor objects, they have common attributes such as .shape and .dtype. But, Variables also have unique attributes such as .trainable,.device, and .name attributes that the Tensors do not have.

 Figure 3. A Tensorflow Variable is actually a wrapper around a TensorFlow Tensor with additional features (Figure by Author)

Let’s see how we can create tf.Variable objects!

Creation of Variables

We can instantiate (i.e., create) tf.Variableobjects with the tf.Variable() function. The tf.Variable() function accepts different data types as parameter such as integers, floats, strings, lists, and tf.Constant objects.

Before showing different Variable object examples with these different data types, I want you to start a new Google Colab notebook and import TensorFlow library with the following code:

Now, we can start creating tf.Variable objects.

1 — We can pass a tf.constant() object as the initial_value:

2 — We can pass a single integer as the initial_value:

3 — We can pass a list of integers or floats as the initial_value:

4— We can pass a single string as the initial_value:

5— We can pass a list of strings as the initial_value:

As you can see, there are several data types that th etf.Variable() function accepts as the initial_value argument. Now let’s take a look at the characteristics and features of variables.

Qualifications of Variables

Every Variable must have some properties such as value, name, uniform data type, shape, rank, size, and more. In this section, we will see what these properties are and how we can view these properties in a Colab notebook.


Every Variable must specify an initial_value. Otherwise, TensorFlow raises an error and says that Value Error: initial_value must be specified. Therefore, make sure that you pass on an initial_valueargument when creating Variable objects. To be able to view a Variable’s values, we can use the .value() function as well as the .numpy() function. See the example below:

The values stored in the variables:
tf.Tensor( [[1. 2.]  
            [1. 2.]], shape=(2, 2), dtype=float32)
The values stored in the variables:
[[1. 2.]
[1. 2.]]


Name is a Variable attribute which helps developers to track the updates on a particular variable. You can pass a name argument while creating the Variable object. If you don’t specify a name, TensorFlow assigns a default name, as shown below:

The name of the variable:  Variable:0


Each Variable must have a uniform data type that it stores. Since there is a single type of data stored for every Variable, you can also view this type with the .dtype attribute. See the example below:

The selected datatype for the variable:  <dtype: 'float32'>

Shape, Rank, and Size

The shape property shows the size of each dimension in the form of a list. We can view the shape of the Variable object with the .shape attribute. Then, we can view the number of dimensions that a Variable object has with the tf.size() function. Finally, Size corresponds to the total number of elements a Variable has. We need to use the tf.size() function to count the number of elements in a Variable. See the code below for all three properties:

The shape of the variable:  (2, 2)
The number of dimensions in the variable: 2
The number of dimensions in the variable: 4

Operations with Variables

There are several basic operations you can easily conduct with math operators and TensorFlow functions. On top of what we covered in Part 2 of this tutorial series, you may also use the following math operators for Variable operations.

Basic Tensor Operations

 Figure 4. You May Benefit from Basic Math Operators (Figure by Author)
  • Addition and Subtraction: We can conduct addition and subtraction with + and  signs.
Addition by 2:
tf.Tensor( [[3. 4.]  [3. 4.]], shape=(2, 2), dtype=float32)
Substraction by 2:
tf.Tensor( [[-1.  0.]  [-1.  0.]], shape=(2, 2), dtype=float32)
  • Multiplication and Division: We can conduct multiplication and division with * and / signs.
Multiplication by 2:
tf.Tensor( [[2. 4.]  [2. 4.]], shape=(2, 2), dtype=float32)
Division by 2:
tf.Tensor( [[0.5 1. ]  [0.5 1. ]], shape=(2, 2), dtype=float32)
  • Matmul and Modulo Operations: Finally, you can also do matmul and modulo operations with @ and % signs:
Matmul operation with itself:
tf.Tensor( [[3. 6.]  [3. 6.]], shape=(2, 2), dtype=float32)
Modulo operation by 2:
tf.Tensor( [[1. 0.]  [1. 0.]], shape=(2, 2), dtype=float32)

These are elementary examples, but they can be extended into complex calculations, which creates the algorithms that we use for deep learning applications.

Note: These operators also work on regular Tensor objects.

Assignment, Indexing, Broadcasting, and Shape Manipulation


With the tf.assign() function, you may assign new values to a Variable object without creating a new object. Being able to assign new values is one of the advantages of Variables, where value reassignment is required. Here is an example of reassignment of values:

...array([[  2., 100.],
          [  1.,  10.]],...


Just as in Tensors, you may easily access particular elements using index values, as shown below:

The 1st element of the first level is: [1. 2.]
The 2nd element of the first level is: [1. 2.]
The 1st element of the second level is: 1.0
The 3rd element of the second level is: 2.0


Just as with Tensor objects, when we try to do combined operations using multiple Variable objects, the smaller Variables can stretch out automatically to fit larger Variables, just as NumPy arrays can. For example, when you attempt to multiply a scalar Variable with a 2-dimensional Variable, the scalar is stretched to multiply every 2-dimensional Variable element. See the example below:

tf.Tensor([[ 5 10]
           [15 20]], shape=(2, 2), dtype=int32)

Shape Manipulation

Just as in Tensor objects, you can reshape Variable objects as well. For the reshape operation, we can use the tf.reshape() function. Let’s use the tf.reshape() function in code:

tf.Tensor( [[1.]
            [2.]], shape=(4, 1), dtype=float32)

Hardware Selection for Variables

As you will see in the upcoming Parts, we will accelerate our model training with GPUs and TPUs. To be able to see what type of device (i.e., processor) our variable is processed with, we can use .device attribute:

The device which process the variable:   /job:localhost/replica:0/task:0/device:GPU:0

We can also set which device should process a particular calculation with the tf.device() function by passing the device name as an argument. See the example below:

The device which processes the variable a: /job:localhost/replica:0/task:0/device:CPU:0
The device which processes the variable b: /job:localhost/replica:0/task:0/device:CPU:0
The device which processes the calculation: /job:localhost/replica:0/task:0/device:GPU:0

Even though you will not have to set this manually while training a model, there might be circumstances where you have to choose a device for a particular calculation or data processing work. So, beware of this option.


We have successfully covered the basics of TensorFlow’s Variable objects.

Give yourself a pat on the back!

This should give you a lot of confidence since you are now much more informed about the main mutable Variable object type used for all kinds of operations in TensorFlow.

If this is your first post, consider starting from Part 1 of this tutorial series:

or check out Part 2:

Kaggle’s Titanic Competition in 10 Minutes | Part-III

Kaggle’s Titanic Competition in 10 Minutes | Part-III

Using Natural Language Processing (NLP), Deep Learning, and GridSearchCV in Kaggle’s Titanic Competition | Machine Learning Tutorials

Figure 1. Titanic Under Construction on Unsplash

If you follow my tutorial series on Kaggle’s Titanic Competition (Part-I and Part-II) or have already participated in the Competition, you are familiar with the whole story. If you are not familiar with it, since this is a follow-up tutorial, I strongly recommend you to check out the Competition Page or Part-I and Part-II of this tutorial series. In Part-III (Final) of the series, (i) we will use natural language processing (NLP) techniques to obtain the titles of the passengers, (ii) create an Artificial Neural Network (ANN or RegularNet) to train the model, and (iii) use Grid Search Cross-Validation to tune the ANN so that we get the best results.

Let’s start!


Part-I of the Tutorial

Throughout this tutorial series, we try to keep things simple and develop the story slowly and clearly. In Part-I of the tutorial, we learned to write a python program with less than 20 lines to enter the Kaggle’s Competition. Things were kept as simple as possible. We cleaned the non-numerical parts, took care of the null values, trained our model using the train.csv file, predicted the passenger’s survival in the test.csv file, and saved it as a CSV file for submission.

Part-II of the Tutorial

Since we did not explore the dataset properly in Part-I, we focus on data exploration in Part-II using Matplotlib and Seaborn. We impute the null values instead of dropping them by using aggregated functions, better cleaned the data, and finally generated the dummy variables from the categorical variables. Then, we use a RandomForestClassifier model instead of LogisticRegression, which also improves precision. We achieve an approximately 20% increase in precision compared to the model in Part-I.

Part-III of the Tutorial

Figure 2. A Diagram of an Artificial Neural Network with one Hidden Layer (Figure by Author)

We will now use the Name column to derive the passengers’ titles, which played a significant role in their survival chances. We will also create an Artificial Neural Network (ANN or RegularNets) with Keras to obtain better results. Finally, to tune the ANN model, we will use GridSearchCV to detect the best parameters. Finally, we will generate a new CSV file for submission.

Preparing the Dataset

Like what we have done in Part-I and Part-II, I will start cleaning data and imputing the null values. This time, we will adopt a different approach and combine the two datasets for cleaning and imputing. We already covered why we impute the null values the way we do in Part-II; therefore, we will give you the code straight away. If you feel that some operations do not make sense, you may refer to Part-II or comment below. However, since we saw in Part-II that people younger than 18 had a greater chance of survival, we should add a new feature to measure this effect.

Data Cleaning and Null Value Imputation

Deriving Passenger Titles with NLP

We will drop the unnecessary columns and generate the dummy variables from the categorical variables above. But first, we need to extract the titles from the ‘Name’ column. To understand what we are doing, we will start by running the following code to get the first 10 rows Name column values.

Name Column Values of the First 10 Rows

And here is what we get:

Figure 3. Name Column Values of the First 10 Row (Figure by Author)

The structure of the name column value is as follows:


Therefore, we need to split these String based on the dot and comma and extract the title. We can accomplish this with the following code:

Splitting the Name Values to Extract Titles

Once we run this code, we will have a Title column with titles in it. To be able to see what kind of titles do we have, we will run this:

Grouping Titles and Get the Counts

Figure 4. Unique Title Counts (Figure by Author)

It seems that we have four major groups: ‘Mr’, ‘Mrs’, ‘Miss’, ‘Master’, and others. However, before grouping all the other titles as Others, we need to take care of the French titles. We need to convert them to their corresponding English titles with the following code:

French to English Title Converter

Now, we only have officers and royal titles. It makes sense to combine them as Others. We can achieve this with the following code:

Combining All the Non-Major Titles as Others (Contains Officer and Royal Titles)

Figure 5. Final Unique Title Counts (Figure by Author)

Final Touch on Data Preparation

Now that our Titles are more manageable, we can create dummies and drop the unnecessary columns with the following code:

Final Touch on Data Preparation

Creating an Artificial Neural Network for Training

Figure 6. A Diagram of an Artificial Neural Network with Two Hidden Layers (Figure by Author)

Standardizing Our Data with Standard Scaler

To get a good result, we must scale our data by using Scikit Learn’s Standard Scaler. Standard Scaler standardizes features by removing the mean and scaling to unit variance (i.e., standardization), which is different than MinMaxScaler. The mathematical difference between Standardization and Normalizer is as follows:

Figure 7. Standardization vs. Normalization (Figure by Author)

We will choose StandardScaler() for scaling our dataset and run the following code:

Scaling Train and Test Datasets

Building the ANN Model

After standardizing our data, we can start building our artificial neural network. We will create one Input Layer (Dense), one Output Layer (Dense), and one Hidden Layer (Dense). After each layer until the Output Layer, we will apply 0.2 Dropout for regularization to fight over-fitting. Finally, we will build the model with Keras Classifier to apply GridSearchCV on this neural network. As we have 14 explanatory variables, our input_dimension must be equal to 14. Since we will make binary classification, our final output layer must output a single value for Survived or Not-Survived classification. The other units in between are “try-and-see” values, and we selected 128 neurons.

Building an ANN with Keras Classifier

Grid Search Cross-Validation

After building the ANN, we will use scikit-learn GridSearchCV to find the best parameters and tune our ANN to get the best results. We will try different optimizers, epochs, and batch_sizes with the following code.

Grid Search with Keras Classifier

After running this code and printing out the best parameters, we get the following output:

Figure 8. Best Parameters and the Accuracy

Please note that we did not activate the Cross-Validation in the GridSearchCV. If you would like to add cross-validation functionality to GridSearchCV, select a cv value inside the GridSearch (e.g., cv=5).

Fitting the Model with Best Parameters

Now that we found the best parameters, we can re-create our classifier with the best parameter values and fit our training dataset with the following code:

Fitting with the Best Parameters

Since we obtain the prediction, we may conduct the final operations to make it ready for submission. One thing to note is that our ANN gives us the probabilities of survival, which is a continuous numerical variable. However, we need a binary categorical variable. Therefore, we are also making the necessary operation with the lambda function below to convert the continuous values to binary values (0 or 1) and writing the results to a CSV file.

Creating the Submission File


Figure 9. Deep Learning vs. Older Algorithms (Figure by Author)

You have created an artificial neural network to classify the Survivals of titanic passengers. Neural Networks are proved to outperform all the other machine learning algorithms as long as there is a large volume of data. Since our dataset only consists of 1309 lines, some machine learning algorithms such as Gradient Boosting Tree or Random Forest with good tuning may outperform neural networks. However, for datasets with large volumes, this will not be the case, as you may see on the chart below:

I would say that Titanic Dataset may be on the left side of the intersection of where older algorithms outperform deep learning algorithms. However, we will still achieve an accuracy rate higher than 80%, around the natural accuracy level.

Kaggle’s Titanic Competition in 10 Minutes | Part-II

Kaggle’s Titanic Competition in 10 Minutes | Part-II

Improving Our Code to Obtain Better Results for Kaggle’s Titanic Competition with Data Analysis & Visualization and Gradient Boosting Algorithm

In Part-I of this tutorial, we developed a small python program with less than 20 lines that allowed us to enter the first Kaggle competition.

However, this model did not perform very well since we did not make good data exploration and preparation to understand the data and structure the model better. In Part-II of the tutorial, we will explore the dataset using Seaborn and Matplotlib. Besides, new concepts will be introduced and applied for a better performing model. Finally, we will increase our ranking in the second submission.

 Figure 1. Sea Trials of RMS Titanic on Wikipedia

Using Jupyter or Google Colab Notebook

For your programming environment, you may choose one of these two options: Jupyter Notebook and Google Colab Notebook:

Jupyter Notebook

As mentioned in Part-I, you need to install Python on your system to run any Python code. Also, you need to install libraries such as Numpy, Pandas, Matplotlib, Seaborn. Also, you need an IDE (text editor) to write your code. You may use your choice of IDE, of course. However, I strongly recommend installing Jupyter Notebook with Anaconda Distribution. Jupyter Notebook utilizes iPython, which provides an interactive shell, which provides a lot of convenience for testing your code. So, you should definitely check it if you are not already using it.

Google Colab Notebook

Google Colab is built on top of the Jupyter Notebook and gives you cloud computing capabilities. Instead of completing all the steps above, you can create a Google Colab notebook, which comes with the libraries pre-installed. So, it is much more streamlined. I recommend Google Colab over Jupyter, but in the end, it is up to you.

Exploring Our Data

To be able to create a good model, firstly, we need to explore our data. Seaborn, a statistical data visualization library, comes in pretty handy. First, let’s remember how our dataset looks like:

 Table 1. Top 5 Rows of our Training Data (Table by Author)

and this is the explanation of the variables you see above:

 Table 2. Explanation of the Variables (Table by Author)

So, now it is time to explore some of these variables’ effects on survival probability!

Our first suspicion is that there is a correlation between a person’s gender (male-female) and his/her survival probability. To be able to understand this relationship, we create a bar plot of the males & females categories against survived & not-survived labels:

Figure 2. Survival Counts of Males and Females (Figure by Author)

As you can see in the plot, females had a greater chance of survival compared to males. Therefore, gender must be an explanatory variable in our model.

Secondly, we suspect that there is a correlation between the passenger class and survival rate as well. When we plot Pclass against Survival, we obtain the plot below:

Figure 3. Survival Counts of Different Passenger Classes (Figure by Author)

Just as we suspected, passenger class has a significant influence on one’s survival chance. It seems that if someone is traveling in third class, it has a great chance of non-survival. Therefore, Pclass is definitely explanatory on survival probability.

Thirdly, we also suspect that the number of siblings aboard (SibSp) and the number of parents aboard (Parch) are also significant in explaining the survival chance. Therefore, we need to plot SibSp and Parch variables against Survival, and we obtain this:

Figure 4. Survival Counts Based on Siblings and Parents on Board (Figure by Author)

So, we reach this conclusion: As the number of siblings on board or number of parents on board increases, the chances of survival increase. In other words, people traveling with their families had a higher chance of survival.

Another potential explanatory variable (feature) of our model is the Embarked variable. When we plot Embarked against the Survival, we obtain this outcome:

 Figure 5. Survival Counts Based on the Port of Embarkation (Figure by Author)

It is clearly visible that people who embarked on Southampton Port were less fortunate compared to the others. Therefore, we will also include this variable in our model.

So far, we checked 5 categorical variables (Sex, Plclass, SibSp, Parch, Embarked), and it seems that they all played a role in a person’s survival chance.

Now it is time to work on our numerical variables Fare and Age. First of all, we would like to see the effect of Age on Survival chance. Therefore, we plot the Age variable (seaborn.distplot):

Figure 6. Survivals Plotted Against Age (Figure by Author)

We can see that the survival rate is higher for children below 18, while for people above 18 and below 35, this rate is low. Age plays a role in Survival.

Finally, we need to see whether the Fare helps explain the Survival probability. Therefore, we plot the Fare variable (seaborn.distplot):

 Figure 7. Survivals Plotted Against Fare (Figure by Author)

In general, we can see that as the Fare paid by the passenger increases, the chance of survival increases, as we expected.

We will ignore three columns: Name, Cabin, Ticket since we need to use more advanced techniques to include these variables in our model. To give an idea of how to extract features from these variables: You can tokenize the passenger’s Names and derive their titles. Apart from titles like Mr. and Mrs., you will find other titles such as Master or Lady, etc. Surely, this played a role in who to save during that night. Therefore, you can take advantage of the given Name column as well as Cabin and Ticket columns.

Checking the Data for Null Values

Null values are our enemies! In the Titanic dataset, we have some missing values. First of all, we will combine the two datasets after dropping the training dataset’s Survived column.

We need to get information about the null values! There are two ways to accomplish this: .info() function and heatmaps (way cooler!). To be able to detect the nulls, we can use seaborn’s heatmap with the following code:

Here is the outcome. Yellow lines are the missing values.

 Figure 8. Heatmap of the Null Values (Figure by Author)

There are a lot of missing Age and Cabin values. Two values are missing in the Embarked column while one is missing in the Fare column. Let’s take care of these first. Alternatively, we can use the .info() function to receive the same information in text form:


 Figure 9. Null Value Information on Combined Titanic Data (Figure by Author)

Reading the Datasets

We will not get into the details of the dataset since it was covered in Part-I. Using the code below, we can import Pandas & Numpy libraries and read the train & test CSV files.

As we know from the above, we have null values in both train and test sets. We need to impute these null values and prepare the datasets for the model fitting and prediction separately.

Imputing Null Values

There are two main approaches to solve the missing values problem in datasets: drop or fill. Drop is the easy and naive way out; although, sometimes it might actually perform better. In our case, we will fill them unless we have decided to drop a whole column altogether.

The initial look of our dataset is as follows:

 Table 3. Initial Look of the Train Dataset (Table by Author)

We will make several imputation and transformations to get a fully numerical and clean dataset to be able to fit the machine learning model with the following code (it also contain imputation):

Python Code to Clean Train Dataset

After running this code on the train dataset, we get this:

 Table 4. Clean Version of the Train Dataset (Table by Author)

There are no null values, no strings, or categories that would get in our way. Now, we can split the data into two, Features (X or explanatory variables) and Label (Y or response variable), and then we can use the sklearn’s train_test_split() function to make the train test splits inside the train dataset.

Note: We have another dataset called test. This isn’t very clear due to the naming made by Kaggle. We are training and testing our model using the train dataset by splitting it into X_train, X_test, y_train, y_test DataFrames, and then applying the trained model on our test dataset to generate a predictions file.

Creating a Gradient Boosting Model and Train

 Figure 10. A Visualization of Gradient Boosting Algorithm (Figure by Author)

In Part-I, we used a basic Decision Tree model as our machine learning algorithm. Another well-known machine learning algorithm is Gradient Boosting Classifier, and since it usually outperforms Decision Tree, we will use Gradient Boosting Classifier in this tutorial. The code shared below allows us to import the Gradient Boosting Classifier algorithm, create a model based on it, fit and train the model using X_train and y_train DataFrames, and finally make predictions on X_test.

Now, we have the predictions, and we also know the answers since X_test is split from the train dataframe. To be able to measure our success, we can use the confusion matrix and classification report. You can achieve this by running the code below:

And this is the output:


Figure 11. Confusion Matrix and Classification Report on Our Results (Figure by Author)

We obtain about 82% accuracy, which may be considered pretty good, although there is still room for improvement.

Create the Prediction File for the Kaggle Competition

Now, we have a trained and working model that we can use to predict the passenger’s survival probabilities in the test.csv file.

First, we will clean and prepare the data with the following code (quite similar to how we clean the training dataset). Just note that we save PassengerId columns as a separate dataframe before removing it under the name ‘ids’.

Finally, we can predict the Survival values of the test dataframe and write to a CSV file as required with the following code.

There you have a new and better model for Kaggle competition. We made several improvements in our code, which increased the accuracy by around 15–20%, which is a good improvement. As I mentioned above, there is still some room for improvement, and the accuracy can increase to around 85–86%. However, the scoreboard scores are not very reliable, in my opinion, since many people used dishonest techniques to increase their ranking.

Part III of the This Mini-Series

In Part III, we will use more advanced techniques such as Natural Language Processing (NLP), Deep Learning, and GridSearchCV to increase our accuracy in Kaggle’s Titanic Competition.

Since you are reading this article, I am sure that we share similar interests and are/will be in similar industries. So let’s connect via Linkedin! Please do not hesitate to send a contact request! Orhan G. Yalçın — Linkedin

Kaggle’s Titanic Competition in 10 Minutes | Part-I

Kaggle’s Titanic Competition in 10 Minutes | Part-I

Complete Your First Kaggle Competition in Less Than 20 Lines of Code with Decision Tree Classifier | Machine Learning Tutorials

Since you are reading this article, I am sure that we share similar interests and are/will be in similar industries. So let’s connect via Linkedin! Please do not hesitate to send a contact request! Orhan G. Yalçın — Linkedin

  Photo by Markus Spiske on Unsplash

If you are interested in machine learning, you have probably heard of Kaggle. Kaggle is a platform where you can learn a lot about machine learning with Python and R, do data science projects, and (this is the most fun part) join machine learning competitions. Competitions are changed and updated over time. Currently, “Titanic: Machine Learning from Disaster” is “the beginner’s competition” on the platform. In this post, we will create a ready-to-upload submission file with less than 20 lines of Python code. To be able to this, we will use Pandas and Scikit-Learn libraries.

Titanic RMS and the Infamous Accident

RMS Titanic was the largest ship afloat when it entered service, and it sank after colliding with an iceberg during its first voyage to the United States on 15 April 1912. There were 2,224 passengers and crew aboard during the voyage, and unfortunately, 1,502 of them died. It was one of the deadliest commercial peacetime maritime disasters in the 20th century.

 Figure 1. A Greyscale Photo of Titanic RMS on Wikipedia

One of the main reasons for such a high number of casualties was the lack of sufficient lifeboats for the passengers and the crew. Although luck played a part in surviving the accident, some people such as women, children, and the upper-class passengers were more likely to survive than the rest. We will calculate this likelihood and effect of having particular features on the likelihood of surviving. And we will accomplish this in less than 20 lines of code and have a file ready for submission. … Let’s Get Started!

Download the Data

The Titanic dataset is an open dataset where you can reach from many different repositories and GitHub accounts. However, downloading from Kaggle will definitely be the best choice as the other sources may have slightly different versions and may not offer separate train and test files. So, please visit this link to download the datasets (Train.csv and Test.csv) to get started.

Normally our Train.csv file looks like this in Excel:

Table 1. Train Dataset in CSV Format

After converting it to the table in Excel (Data->Text to Columns), we get this view:

 Table 2. Train Dataset after Text to Column Operation

Way nicer, right! Now, we can clearly see that we have 12 variables. While the “Survived” variable represents whether a particular passenger survived the accident, the rest is the essential information about this passenger. Here is a brief explanation of the variables:

 Table 3. The Information on the Train Dataset Features

Load and Process The Training Data

 Figure 2. Photo by UX Indonesia on Unsplash

I assume that you have your Python environment installed. However, if you don’t have Python on your computer, you may refer to this link for Windows and this link for macOS. After making sure that you have Python installed on your system, open your favorite IDE, and start coding!

Note that using a Google Colab Notebook is another option, which does not require local Python3 installation. To have access to the Google Colab Notebook with the full code, consider signing up to the Newsletter using the slider below.

First, we will load the training data for cleaning and getting it ready for training our model. We will (i) load the data, (ii) delete the rows with empty values, (iii) select the “Survival” column as my response variable, (iv) drop the for-now irrelevant explanatory variables, (v) convert categorical variables to dummy variables, and we will accomplish all this with 7 lines of code:

Create the Model and Train

To uncover the relationship between the Survival variable and other variables (or features if you will), you need to select a statistical machine learning model and train your model with the processed data.

 Figure 4. A Simplified Decision Tree Schema for Titanic Case (Figure by Author)

Scikit-learn provides several algorithms for this. We will select the DecisionTreeClassifier, which is a basic but powerful algorithm for machine learning. And get this: We will only need 3 lines of code to reveal the hidden relationship between Survival (denoted as y) and the selected explanatory variables (denoted as X)

Make Predictions and Save Your Results

We may prepare our testing data for the prediction phase after revealing the hidden relationship between Survival and the selected explanatory variables. Test.csv file is slightly different than the Train.csv file: It does not contain the “Survival” column. This makes sense because if we would know all the answers, we could have just faked our algorithm and submit the correct answers after writing by hand (wait! some people somehow have already done that?). Anyway, our testing data needs almost the same kind of cleaning, massaging, prepping, and preprocessing for the prediction phase. We will accomplish this with 5 lines of code:

Now our test data is clean and prepared for prediction. Finally, make the predictions for the given test file and save it to memory:

So easy, right! Before saving these predictions, we need to obtain proper structure so that Kaggle can automatically score our predictions. Remember, we saved the PassengerId column to the memory as a separate dataset (DataFrame, if you will)? Now we will assign (or attach) the predictions dataset to PassengerIds (note that they are both single-column datasets). Finally, we will get the data from memory and save it in CSV (comma separated values) format required by Kaggle.

Figure 5. Photo by Pietro Mattia on Unsplash

Now you can visit Kaggle’s Titanic competition page, and after login, you can upload your submission file.

Will You Make It to the Top?

Definitely not! We tried to implement a simple machine learning algorithm enabling you to enter a Kaggle competition. As you improve this basic code, you will be able to rank better in the following submissions.

Fast Neural Style Transfer in 5 Minutes with TensorFlow Hub & Magenta

Fast Neural Style Transfer in 5 Minutes with TensorFlow Hub & Magenta

Transferring van Gogh’s Unique Style to Photos with Magenta’s Arbitrary Image Stylization Network and Deep Learning

Before we start the tutorial: If you are reading this article, we probably share similar interests and are/will be in similar industries. So let’s connect via Linkedin! Please do not hesitate to send a contact request! Orhan G. Yalçın — Linkedin

 Figure 1. A Neural Style Transfer Example made with Arbitrary Image Stylization Network

I am sure you have come across to deep learning projects on transferring styles of famous painters to new photos. Well, I have been thinking about working on a similar project, but I realized that you can make neural style transfer within minutes, like the one in Figure 1. I will show you how in a second. But, let’s cover some basics first:

Neural Style Transfer (NST)

Neural style transfer is a method to blend two images and create a new image from a content image by copying the style of another image, called style image. This newly created image is often referred to as the stylized image.

History of NST

Image stylization is a two-decade-old problem in the field of non-photorealistic rendering. Non-photorealistic rendering is the opposite of photorealism, which is the study of reproducing an image as realistically as possible. The output of a neural style transfer model is an image that looks similar to the content image but in painting form in the style of the style image.

 Figure 2. Original Work of Leon Gatys on CV-Foundation

Neural style transfer (NST) was first published in the paper “A Neural Algorithm of Artistic Style” by Gatys et al., originally released in 2015. The novelty of the NST method was the use of deep learning to separate the representation of the content of an image from its style of depiction. To achieve this, Gatys et al. used VGG-19 architecture, which was pre-trained on the ImageNet dataset. Even though we can build a custom model following the same methodology, for this tutorial, we will benefit from the models provided in TensorFlow Hub.

Image Analogy

Before the introduction of NST, the most prominent solution to image stylization was the image analogy method. Image Analogy is a method of creating a non-photorealistic rendering filter automatically from training data. In this process, the transformation between photos (A) and non-photorealistic copies (A’) are learned. After this learning process, the model can produce a non-photorealistic copy (B’) from another photo (B). However, NST methods usually outperform image analogy due to the difficulty of finding training data for the image analogy models. Therefore, we can talk about the superiority of NST over image analogy in real-world applications, and that’s why we will focus on the application of an NST model.

 Figure 3. Photo by Jonathan Cosens on Unsplash

Is it Art?

Well, once we build the model, you will see that creating non-photorealistic images with Neural Style Transfer is a very easy task. You can create a lot of samples by blending beautiful photos with the paintings of talented artists. There has been a discussion about whether these outputs are regarded as art because of the little work the creator needs to add to the end product. Feel free to build the model, generate your samples, and share your thoughts in the comments section.

Now that you know the basics of Neural Style Transfer, we can move on to TensorFlow Hub, the repository that we use for our NST work.

TensorFlow Hub

TensorFlow Hub is a collection of trained machine learning models that you can use with ease. TensorFlow’s official description for the Hub is as follows:

TensorFlow Hub is a repository of trained machine learning models ready for fine-tuning and deployable anywhere. Reuse trained models like BERT and Faster R-CNN with just a few lines of code.

Apart from pre-trained models such as BERT or Faster R-CNN, there are a good amount of pre-trained models. The one we will use is Magenta’s Arbitrary Image Stylization network. Let’s take a look at what Magenta is.

Magenta and Arbitrary Image Stylization

What is Magenta?

 Figure 4. Magenta Logo on Magenta

Magenta is an open-source research project, backed by Google, which aims to provide machine learning solutions to musicians and artists. Magenta has support in both Python and Javascript. Using Magenta, you can create songs, paintings, sounds, and more. For this tutorial, we will use a network trained and maintained by the Magenta team for Arbitrary Image Stylization.

Arbitrary Image Stylization

After observing that the original work for NST proposes a slow optimization for style transfer, the Magenta team developed a fast artistic style transfer method, which can work in real-time. Even though the customizability of the model is limited, it is satisfactory enough to perform a non-photorealistic rendering work with NST. Arbitrary Image Stylization under TensorFlow Hub is a module that can perform fast artistic style transfer that may work on arbitrary painting styles.

By now, you already know what Neural Style Transfer is. You also know that we will benefit from the Arbitrary Image Stylization module developed by the Magenta team, which is maintained in TensorFlow Hub.

Now it is time to code!

Get the Image Paths

 Figure 5. Photo by Paul Hanaoka on Unsplash

We will start by selecting two image files. I will directly load these image files from URLs. You are free to choose any photo you want. Just change the filename and URL in the code below. The content image I selected for this tutorial is the photo of a cat staring at the camera, as you can see in Figure 5.

 Figure 6. Bedroom in Arles by Vincent van Gogh

I would like to transfer the style of van Gogh. So, I chose one of his famous paintings: Bedroom in Arles, which he painted in 1889 while staying in Arles, Bouches-du-Rhône, France. Again, you are free to choose any painting of any artist you want. You can even use your own drawings.

The below code sets the path to get the image files shown in Figure 5 and Figure 6.


Custom Function for Image Scaling

  One thing I noticed that, even though we are very limited with model customization, by rescaling the images, we can change the style transferred to the photo. In fact, I found out that the smaller the images, the better the model transfers the style. Just play with the max_dim parameter if you would like to experiment. Just note that a larger max_dim means, it will take slightly longer to generate the stylized image.  


We will call the img_scaler function below, inside the load_img function.

Custom Function for Preprocessing the Image

Now that we set our image paths to load and img_scaler function to scale the loaded image, we can actually load our image files with the custom function below.

Every line in the Gist below is explained with comments. Please read carefully.


    Now our custom image loading function, load_img, is also created. All we have to do is to call it.  

Load the Content and Style Images

  For content image and style image, we need to call the load_img function once and the result will be a 4-dimensional Tensor, which is what will be required by our model below. The below lines is for this operation.  


Now that we successfully loaded our images, we can plot them with matplotlib, as shown below:


and here is the output:


Figure 7. Content Image on the Left (Photo by Paul Hanaoka on Unsplash) | Style Image on the Right (Bedroom in Arles by Vincent van Gogh)

You are not gonna believe this, but the difficult part is over. Now we can create our network and pass these image Tensors as arguments for NST operation.

Load the Arbitrary Image Stylization Network

We need to import the tensorflow_hub library so that we can use the modules containing the pre-trained models. After importing tensorflow_hub, we can use the load function to load the Arbitrary Image Stylization module as shown below. Finally, as shown in the documentation, we can pass the content and style images as arguments in tf.constant object format. The module returns our stylized image in an array format.

All we have to do is to use this array and plot it with matplotlib. The below lines create a plot free from all the axis and large enough for you to review the image.

… And here is our stylized image:


Figure 8. Paul Hanaoka’s Photo after Neural Style Transfer

Figure 9 summarizes what we have done in this tutorial:

  Figure 9. A Neural Style Transfer Example made with Arbitrary Image Stylization Network


As you can see, with a minimal amount of code (we did not even train a model), we did a pretty good Neural Style Transfer on a random image we took from Unsplash using a painting from Vincent van Gogh. Try different photos and paintings to discover the capabilities of the Arbitrary Image Stylization network. Also, play around with max_dim size, you will see that the style transfer changes to a great extent.