Supervised machine learning problems can be broadly divided into two: regression problems and classification problems.
In the previous tutorials, we examined how to build a linear regression model with Tensorflow and Keras. In this tutorial, we turn our attention to classification problems. Classification problems make up a large chunk of machine learning problems, so it is critically important to understand how to build classifiers using machine learning algorithms or deep learning techniques.
Later in this tutorial, we will build a linear classifier using Tensorflow Keras. We’ll begin by brushing up on the theoretical concepts of linear classifiers before going ahead to build one.
By the end of the tutorial, you’d discover:
- What is a Linear Classifier?
- Types of Classification Problems
- The Workings of a Binary Classifier
- How the Performance of a Binary Classifier is Measured
- Exploratory Data Analysis (EDA)
- Checking for imbalanced dataset
- Checking for Correlation
- Data Preprocessing
- Building a Single Layer Perceptron for binary classification
- Building a Multilayer Perceptron for binary classification
What is a Linear Classifier?
To answer this, we need to first understand what a classifier is. A classifier is a model that predicts the class of an object given its properties. For instance, a model that determines whether an object is a cat or a dog is a classifier. In classification problems, the labels, called classes, are discrete values, rather than the continuous numbers found in regression problems.
Basically, a classifier splits the observations into their classes. While this splitting can sometimes be done with a straight line or hyperplane, some datasets have class boundaries that cannot be separated that way. A model whose class boundaries cannot be captured with a hyperplane is called a nonlinear classifier. A linear classifier, on the other hand, is a model that captures the class boundaries using straight lines or hyperplanes.
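To make this concrete, here is a minimal sketch of the decision rule a linear classifier applies, with made-up weights purely for illustration: an observation is assigned to the positive class whenever it falls on the positive side of the hyperplane w·x + b = 0.

import numpy as np

#hypothetical weights and bias defining the separating hyperplane w.x + b = 0
w = np.array([0.8, -1.5])
b = 0.2

def predict(x):
    #class 1 if the point lies on the positive side of the hyperplane, else class 0
    return int(np.dot(w, x) + b > 0)

print(predict(np.array([2.0, 0.5])))   #1 -> positive side of the boundary
print(predict(np.array([0.0, 1.0])))   #0 -> negative side of the boundary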
Types of Classification Problems
Classification problems can be divided into three types based on the label classes: binary classification, multiclass classification, and multilabel classification.
- Binary classification problem: This is a classification problem where the label contains only two classes. For instance, a model that predicts whether an individual has COVID-19 or not, or a model that determines whether a mail is spam or not.
- Multiclass classification problem: In this type of classification problem, the label contains more than two classes. For example, the popular iris dataset contains three classes (Iris setosa, Iris virginica, and Iris versicolor). Such a classification problem is called a multiclass classification problem.
- Multilabel classification problem: In this kind of classification problem, each observation can belong to more than one class. In photo recognition problems, there may be more than one object in a picture, say a dog and a house. The model would therefore predict more than one class for this photo. This is a typical multilabel classification problem.
In this tutorial, we shall build a binary classifier. Let’s understand how it works.
The Workings of a Binary Classifier
In supervised learning, the dataset comprises independent variables (called features) and dependent variables (called labels). For the linear regression problems we treated in the last tutorial, the labels are continuous numbers (any real number). In such cases, the model attempts to predict the exact number, and its success is judged by how close the prediction is to the correct number. Metrics such as root mean square error, R-squared, or mean absolute error are commonly used to check how well the model has performed.
For binary classification problems, the labels are two discrete values, 1 (yes) or 0 (no). The classifier predicts the probability of each class and returns the class with the highest probability. Logistic regression is typically used to compute these probabilities in a binary classification problem. But how does logistic regression work?
The Logistic Function
Logistic regression makes use of the logistic function (also called the sigmoid function) to produce a class output. The function is an S-shaped curve that takes any continuous number and maps it into the range of 0 to 1 (without ever being exactly equal to 0 or 1).
The function is given as

sigmoid(x) = 1 / (1 + e^(-x))

where x is the continuous number that is transformed into a number between 0 and 1. The graph typically takes this form:
Source: TowardsDataScience
The Logistic Regression Equation
In logistic regression, the independent variables (x) alongside some assigned weights (b) are used to predict the binary output (y). The linear part of the logistic regression equation is given below:

z = b0 + b1x1 + b2x2 + … + bnxn
Here, b0 is the bias and b1, …, bn are the weights of the independent variables (x). The weights show how the independent variables (x) are correlated with the dependent variable (y). Hence, a positive weight increases the probability of the positive class, while a negative weight decreases it.
Once the output z of this equation is found, the logistic function converts it into a probability.
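As a minimal sketch of these two steps in plain NumPy (the bias, weights, and observation below are made up for illustration and are not taken from this tutorial's dataset):

import numpy as np

def sigmoid(z):
    #maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

#hypothetical bias and weights for two features, for illustration only
b0 = -1.0
b = np.array([0.04, 0.3])
x = np.array([120.0, 2.5])   #one observation with two features

z = b0 + np.dot(b, x)   #the linear part: b0 + b1*x1 + b2*x2
p = sigmoid(z)          #the logistic function turns z into a probability
print(p)                #about 0.99 here; predict class 1 if p >= 0.5, else class 0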
How the Performance of a Binary Classifier is Measured
Having built your binary classifier, you need to check how well it is performing. There are a couple of metrics for measuring classifier performance. Let’s talk about the most popular ones.
- Accuracy
Accuracy is perhaps the simplest and most common metric. It is simply the ratio of the number of correct predictions to the total number of predictions made. For example, consider a dataset of 1000 samples with labels indicating whether a mail is spam or ham. If the model makes 850 correct predictions, the accuracy is simply 850 / 1000, which is 85%.
Using accuracy has some shortcomings, however. On an imbalanced dataset, accuracy can be misleading. An imbalanced dataset is one where the occurrence of one class far outweighs the occurrence of the other. In the earlier example, imagine that 850 of the labels belong to ham and 150 belong to spam. This is an imbalanced dataset, since 850 outweighs 150 by a large margin.
If we build a dummy model that predicts that every observation is ham without even checking the independent features, the model would still have an accuracy of 85%. Accuracy is therefore intrinsically not a good metric to use for such data. Furthermore, accuracy does not take into account the predicted probability of each class, which may also be a bottleneck when you wish to fine-tune the outputs of the model for better results.
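A quick sketch of the scenario above (an assumed 1000-sample spam/ham dataset, not this tutorial's data) shows how a useless model can still reach 85% accuracy:

import numpy as np
from sklearn.metrics import accuracy_score

#hypothetical labels: 850 ham (0) and 150 spam (1)
y_true = np.array([0] * 850 + [1] * 150)

#a dummy "model" that ignores the features and always predicts ham (0)
y_pred = np.zeros(1000, dtype=int)

print(accuracy_score(y_true, y_pred))   #0.85, even though the model never catches spam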
- Confusion Matrix
The confusion matrix is another popular performance measure for binary classification problems. The matrix is divided into four quadrants: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). Let’s see what the confusion matrix looks like, then explain what these terms mean.
Source: Packt
As seen above, the rows indicate the actual values and the columns indicate the predicted values. The four quadrants are the intersections of the actual and predicted values.
TN: This is when the model predicts that an observation is NOT a class and is correct. Say, the model predicts that an observation is not spam, and it is indeed not spam.
FN: This is when the model predicts that an observation is NOT a class but is incorrect. Say, the model predicts that an observation is not spam, but it is actually spam.
FP: This is when the model predicts that an observation is a class but is incorrect. Say, the model predicts that an observation is spam, but it is actually not spam.
TP: This is when the model predicts that an observation is a class and is correct. Say, the model predicts that an observation is spam, and it is indeed spam.
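As a small illustration (with made-up labels), scikit-learn's confusion_matrix() computes these four quantities directly; for binary labels it returns them in the order TN, FP, FN, TP when flattened:

from sklearn.metrics import confusion_matrix

#hypothetical true and predicted labels (1 = spam, 0 = ham)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)   #3 1 1 3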
The confusion matrix adds a lot of flexibility to performance measurement and gives rise to concepts such as precision and recall.
- Precision: Precision indicates how accurate the model’s positive predictions are. Mathematically, precision is given by Precision = TP / (TP + FP).
If the model’s positive predictions are all correct, the precision will be equal to 1. Precision alone is not a very good metric, especially for an imbalanced dataset, since it neglects the false negatives. It can, however, be useful when we place grave importance on the positive class. An example would be a spam (positive) / ham (negative) classifier. In this problem, the major concern is to correctly predict that spam is spam (TP) while not flagging ham as spam. Predicting that a mail is ham when it is actually spam (FN) may not have serious consequences, but predicting that a ham is spam (FP) can be very expensive. You have high precision when the TP is high and the FP is low.
Precision is typically combined with another metric called the recall.
- Recall: Recall is concerned with the rate of positive classes that were predicted correctly. It is also called the sensitivity or true positive rate. Mathematically, recall is given by Recall = TP / (TP + FN).
Recall is particularly important in situations where catching the positive class is critical, such as a cancer classifier. In such a situation, predicting that a patient has cancer when they do not (FP) may not be severe. However, predicting that a patient does not have cancer when they actually do (FN) can be very grave. In situations like this, it is critically important to check the recall of your model, because to get a high recall, the FN must be low.
- F1 score
The F1 score takes both recall and precision into account. Mathematically, the F1 score is given by F1 = 2 × (Precision × Recall) / (Precision + Recall).
If you are looking for a balance between recall and precision, you should go for the F1 score.
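For reference, here is a minimal sketch computing all three metrics with scikit-learn, reusing the made-up labels from the confusion-matrix example above:

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(precision_score(y_true, y_pred))   #TP / (TP + FP) = 3 / 4 = 0.75
print(recall_score(y_true, y_pred))      #TP / (TP + FN) = 3 / 4 = 0.75
print(f1_score(y_true, y_pred))          #2 * (0.75 * 0.75) / (0.75 + 0.75) = 0.75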
Building a Binary Classifier with Keras on Tensorflow
In this section, we will go ahead and build a binary classifier using Keras. The dataset used is from the National Institute of Diabetes and Digestive and Kidney Diseases. The dataset indicates whether or not a patient has diabetes based on diagnostic measurements such as age, glucose level, blood pressure, insulin level, and so on. We will begin by importing the data using the pandas read_csv() method. We will then check what the data frame looks like by printing the first five rows.
#import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

#read the dataset file
df = pd.read_csv('diabetes.csv')

#print the first five rows of the dataframe
print(df.head())
Output:
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0            6      148             72             35        0  33.6
1            1       85             66             29        0  26.6
2            8      183             64              0        0  23.3
3            1       89             66             23       94  28.1
4            0      137             40             35      168  43.1

   DiabetesPedigreeFunction  Age  Outcome
0                     0.627   50        1
1                     0.351   31        0
2                     0.672   32        1
3                     0.167   21        0
4                     2.288   33        1
Exploratory Data Analysis (EDA)
It is important to know the number of rows and columns in your dataset. That way, you’d be able to determine how large or small your data is.
df.shape
Output:
(768, 9)
So as shown above, the dataset has 768 rows with 9 columns (including the target column). This is thus a fairly small dataset. Let’s get some more information about each column.
#check the datatype of each column
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
Pregnancies                 768 non-null int64
Glucose                     768 non-null int64
BloodPressure               768 non-null int64
SkinThickness               768 non-null int64
Insulin                     768 non-null int64
BMI                         768 non-null float64
DiabetesPedigreeFunction    768 non-null float64
Age                         768 non-null int64
Outcome                     768 non-null int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
All columns are of type int64 except for BMI and DiabetesPedigreeFunction, which are of type float64. The info() method also reveals that all columns contain non-null values. We can confirm this by using the isnull() method.
#check for null values
df.isnull().sum()
Output:
Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64
Next, we use the describe() method to get the statistical summary for each column.
#print the statistical summary for each column
df.describe()
Output:
       Pregnancies     Glucose  BloodPressure  SkinThickness     Insulin  \
count   768.000000  768.000000     768.000000     768.000000  768.000000
mean      3.845052  120.894531      69.105469      20.536458   79.799479
std       3.369578   31.972618      19.355807      15.952218  115.244002
min       0.000000    0.000000       0.000000       0.000000    0.000000
25%       1.000000   99.000000      62.000000       0.000000    0.000000
50%       3.000000  117.000000      72.000000      23.000000   30.500000
75%       6.000000  140.250000      80.000000      32.000000  127.250000
max      17.000000  199.000000     122.000000      99.000000  846.000000

              BMI  DiabetesPedigreeFunction         Age     Outcome
count  768.000000                768.000000  768.000000  768.000000
mean    31.992578                  0.471876   33.240885    0.348958
std      7.884160                  0.331329   11.760232    0.476951
min      0.000000                  0.078000   21.000000    0.000000
25%     27.300000                  0.243750   24.000000    0.000000
50%     32.000000                  0.372500   29.000000    0.000000
75%     36.600000                  0.626250   41.000000    1.000000
max     67.100000                  2.420000   81.000000    1.000000
Checking if the data is imbalanced
When dealing with binary classification problems, this is an important step. If an imbalanced dataset is fed into a machine learning model, the model tends to perform poorly. Let’s see whether our data is imbalanced. We use seaborn’s countplot() method, which counts the occurrences of each class in a column and plots a simple bar graph.
#plot a bar plot that shows the number of each class labels
sns.countplot(df['Outcome'])
Output:
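In addition to the bar plot, you can print the raw class counts with value_counts(). Consistent with the Outcome mean of roughly 0.349 in the describe() output above, the dataset has 500 non-diabetic and 268 diabetic patients, so it is moderately imbalanced.

#print the number of samples in each class
print(df['Outcome'].value_counts())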
You can get further insight by plotting a bivariate graph of the columns against one another using seaborn’s pairplot() method.
#make a plot of the columns against one another
sns.pairplot(df, hue='Outcome')
Output:
From the plot, you’d notice that the classes are largely separable using a hyperplane (or line) to split the data.
Checking for Correlation
As part of exploring your data, you should check for correlation between the columns. This gives you insight into how one attribute affects another and how they are connected. Armed with this knowledge, you could drop or merge columns to reduce the dimensionality of the data. A strong correlation can also give you an idea of what to replace missing values with. Let’s check whether some columns are strongly correlated. We use the corr() method and then draw a heat map.
#check the correlation of each column
plt.figure(figsize=(10, 6))
sns.heatmap(df.corr(), cmap='Blues', annot=True)
Output:
Split the Data into Train and Test Data
As we gear up to build the deep learning model, we need to split the data into train and test data. Before that, the data needs to be split into its independent variables (X) and dependent variable (y). The independent variables are simply the data without the label column; hence, we drop that column. The dependent variable, on the other hand, is the label column alone.
Afterward, the X and y data are split into train and test datasets, with the test data taking 20% of the entire data. If you are wondering why the data is split into train and test sets: the test data is a portion of the data that is hidden from the model during training and is then used to measure how well the model predicts unseen data.
#split the data into features and labels
X = df.drop(['Outcome'], axis=1)
y = df['Outcome']

#further split the labels and features into train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
Checking for Outliers
As explained in the previous tutorial, outliers can affect how models perform. Thus, it is pivotal to check for them and deal with them. Here, we will draw a box plot for each continuous column to detect the presence of outliers.
#create a copy of the dataset
df1 = df.copy()

#create a figure with six subplots and a width spacing of 1.5
fig, ax = plt.subplots(2, 3)
fig.subplots_adjust(wspace=1.5)

#create a boxplot (on a log scale) for the continuous features
box_plot1 = sns.boxplot(y=np.log(df1[df1.columns[0]]), ax=ax[0][0])
box_plot2 = sns.boxplot(y=np.log(df1[df1.columns[1]]), ax=ax[0][1])
box_plot3 = sns.boxplot(y=np.log(df1[df1.columns[2]]), ax=ax[0][2])
box_plot6 = sns.boxplot(y=np.log(df1[df1.columns[5]]), ax=ax[1][0])
box_plot7 = sns.boxplot(y=np.log(df1[df1.columns[6]]), ax=ax[1][1])
box_plot8 = sns.boxplot(y=np.log(df1[df1.columns[7]]), ax=ax[1][2])
Output:
You’d notice that Glucose, BloodPressure, and BMI have sprinkles of outliers. The effect of these data points can be softened by standardizing the data.
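If you choose to standardize, a minimal sketch with scikit-learn's StandardScaler is shown below. Note that this step is optional here; the models that follow in this tutorial are trained on the raw, unscaled features.

from sklearn.preprocessing import StandardScaler

#fit the scaler on the training features only, then apply it to both splits,
#so that no information from the test set leaks into training
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)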
Building the Neural Network (a Single Layer Perceptron)
We will begin by building a simple neural network: one fully connected hidden layer with the same number of nodes as the independent variables (8). This is a reasonable way to begin building neural networks. The ReLU activation function is used for this hidden layer. The output layer is a single node that spits out the probability of the positive class; hence, a sigmoid activation function is used. This probability can easily be converted into a class value.
Furthermore, the common binary_crossentropy is used as the loss function; this is the preferred loss function for binary classification problems. The Adam optimizer is used for gradient descent optimization. Finally, accuracy, precision, and recall are set as the model’s metrics.
def create_model():
    '''The function creates a Perceptron using Keras'''
    model = Sequential()
    model.add(Dense(8, input_dim=len(X.columns), activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    return model

estimator = create_model()
estimator.compile(optimizer='adam',
                  metrics=['accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
                  loss='binary_crossentropy')
The model is then trained on the train dataset, specifying the test datasets as the validation data.
#train the model
history = estimator.fit(X_train, y_train, epochs=300, validation_data=(X_test, y_test))
Output:
Train on 614 samples, validate on 154 samples
Epoch 1/300
614/614 [==============================] - 3s 5ms/sample - loss: 0.6690 - acc: 0.6531 - precision_14: 0.0000e+00 - recall_14: 0.0000e+00 - val_loss: 0.6787 - val_acc: 0.6429 - val_precision_14: 0.0000e+00 - val_recall_14: 0.0000e+00
Epoch 2/300
614/614 [==============================] - 0s 168us/sample - loss: 0.6657 - acc: 0.6531 - precision_14: 0.0000e+00 - recall_14: 0.0000e+00 - val_loss: 0.6754 - val_acc: 0.6429 - val_precision_14: 0.0000e+00 - val_recall_14: 0.0000e+00
Epoch 3/300
614/614 [==============================] - 0s 173us/sample - loss: 0.6622 - acc: 0.6531 - precision_14: 0.0000e+00 - recall_14: 0.0000e+00 - val_loss: 0.6729 - val_acc: 0.6429 - val_precision_14: 0.0000e+00 - val_recall_14: 0.0000e+00
…
Epoch 298/300
614/614 [==============================] - 0s 139us/sample - loss: 0.4511 - acc: 0.7801 - precision_14: 0.7294 - recall_14: 0.5822 - val_loss: 0.4633 - val_acc: 0.8052 - val_precision_14: 0.7778 - val_recall_14: 0.6364
Epoch 299/300
614/614 [==============================] - 0s 172us/sample - loss: 0.4533 - acc: 0.7801 - precision_14: 0.7143 - recall_14: 0.6103 - val_loss: 0.4629 - val_acc: 0.7987 - val_precision_14: 0.7609 - val_recall_14: 0.6364
Epoch 300/300
614/614 [==============================] - 0s 181us/sample - loss: 0.4509 - acc: 0.7850 - precision_14: 0.7425 - recall_14: 0.5822 - val_loss: 0.4638 - val_acc: 0.8052 - val_precision_14: 0.7907 - val_recall_14: 0.6182
As seen above, at the end of the 300th epoch, the model had a training loss of 0.4509, an accuracy of 78.50%, a precision of 74.25%, and a recall of 58.22%. This is a pretty decent result for a very simple neural network.
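You can also evaluate the trained model on the held-out test set in one call, and convert the predicted probabilities into class labels with a 0.5 threshold, as sketched below.

#evaluate the trained model on the test data
loss, acc, precision, recall = estimator.evaluate(X_test, y_test)
print(loss, acc, precision, recall)

#convert the predicted probabilities into class labels using a 0.5 threshold
probabilities = estimator.predict(X_test)
predicted_classes = (probabilities > 0.5).astype(int)
print(predicted_classes[:10])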
Let’s visualize how the training process went.
#plot the loss and validation loss of the dataset
history_df = pd.DataFrame(history.history)
plt.plot(history_df['loss'], label='loss')
plt.plot(history_df['val_loss'], label='val_loss')
plt.legend()
Output:
Adding Some Hidden Layers (a Multilayer Perceptron)
Let’s tweak the neural network architecture by adding more layers with some dropout and see how the performance will be affected.
This time, the first hidden layer has 16 nodes with a ReLU activation function. The next layer has 12 nodes, followed by a 20% dropout to avoid overfitting. The layer after that has 3 nodes with a ReLU activation, while the final layer is a single node with a sigmoid activation.
The optimizer, metrics, and loss function remain the same as in the last architecture.
def create_model():
    '''The function creates a Perceptron using Keras'''
    model = Sequential()
    model.add(Dense(16, input_dim=len(X.columns), activation='relu'))
    model.add(Dense(12, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(3, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    return model

estimator = create_model()
estimator.compile(optimizer='adam',
                  metrics=['accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
                  loss='binary_crossentropy')
Now, we train the model on 300 epochs as well.
#train the model
history = estimator.fit(X_train, y_train, epochs=300, validation_data=(X_test, y_test))
Output:
Train on 614 samples, validate on 154 samples
Epoch 1/300
614/614 [==============================] - 2s 4ms/sample - loss: 0.7018 - acc: 0.4235 - precision_17: 0.3667 - recall_17: 0.9108 - val_loss: 0.6878 - val_acc: 0.7338 - val_precision_17: 0.6591 - val_recall_17: 0.5273
Epoch 2/300
614/614 [==============================] - 0s 156us/sample - loss: 0.6924 - acc: 0.5749 - precision_17: 0.4172 - recall_17: 0.5681 - val_loss: 0.6855 - val_acc: 0.6883 - val_precision_17: 0.6842 - val_recall_17: 0.2364
Epoch 3/300
614/614 [==============================] - 0s 175us/sample - loss: 0.6883 - acc: 0.6173 - precision_17: 0.4505 - recall_17: 0.4695 - val_loss: 0.6831 - val_acc: 0.6948 - val_precision_17: 0.7500 - val_recall_17: 0.2182
…
Epoch 298/300
614/614 [==============================] - 0s 200us/sample - loss: 0.4431 - acc: 0.7915 - precision_17: 0.7607 - recall_17: 0.5822 - val_loss: 0.4296 - val_acc: 0.7857 - val_precision_17: 0.7500 - val_recall_17: 0.6000
Epoch 299/300
614/614 [==============================] - 0s 167us/sample - loss: 0.4310 - acc: 0.8046 - precision_17: 0.7853 - recall_17: 0.6009 - val_loss: 0.4296 - val_acc: 0.7792 - val_precision_17: 0.7442 - val_recall_17: 0.5818
Epoch 300/300
614/614 [==============================] - 0s 178us/sample - loss: 0.4190 - acc: 0.8078 - precision_17: 0.7684 - recall_17: 0.6385 - val_loss: 0.4269 - val_acc: 0.7857 - val_precision_17: 0.7500 - val_recall_17: 0.6000
This time, the model has a training loss of 0.4190, an accuracy of 80.78%, a precision of 76.84%, and a recall of 63.85%. These results are slightly better than those of the previous model, but the improvement is not convincing.
It goes to show that for this dataset, increasing the number of layers does not necessarily improve the performance of the model.
Finally, we can visualize the training process with the code below.
#plot the loss and validation loss of the dataset
history_df = pd.DataFrame(history.history)
plt.plot(history_df['loss'], label='loss')
plt.plot(history_df['val_loss'], label='val_loss')
plt.legend()
Output:
Summary
In conclusion, you have discovered how to build a binary classifier using Keras. We started by carrying out some EDA, after which the data was preprocessed and fed into the neural network. For this dataset, we found that increasing the number of hidden layers in the neural network architecture does not have a significant effect on the performance of the model.
This is likely because the dataset is relatively small, which explains why a single-layer perceptron was able to capture the patterns in the data with a validation accuracy of over 80%.