In most real-life classification problems, datasets are not linearly separable. That is to say, the classes cannot be separated by a straight line. But a linear classifier built with the LinearClassifier class of Tensorflow's estimator API attempts to learn the data under the assumption that it can be classified with a straight line. Other popular machine learning algorithms, such as Support Vector Machines (SVMs), also rest on this assumption. While these models can produce impressive results, the reality remains that they will struggle to learn when hit with linearly non-separable data.
The question then is, how do we classify data whose features are not linearly separable? You guessed right. Kernels!
Using kernels allows you to make data that is not linearly separable, linearly separable. In the course of this tutorial, you'll learn how kernels work and exactly why they are such a good method to use. Furthermore, we'll build a Tensorflow classifier as a base model and then a second classifier using kernel methods.
Specifically, these are the things you'll learn by the end of this tutorial:
- The Problem of Classification in lower Dimensional Space
- What are Kernels and why Kernels
- Types of Kernel Methods
- Training a Kernel Classifier with Tensorflow.estimator
- Building a Baseline Linear Classifier
- Split Data into train and test data
- Creating the Feature Columns
- Defining the train input function and training the model
- Defining the Test input function and Evaluating the model
- Building the kernel classifier
- Improving the Performance of the Kernel Classifier
Let’s begin.
The Problem of Classification in lower Dimensional Space
When you build a classifier, its job is to predict the class of an object correctly. A logistic regression model is decent for classification problems in which the data points of the classes are not intertwined. However, if the data points are interwoven, the logistic regression model will struggle to capture the classes fully.
A way to solve this problem of non-linearly separable data is to increase the dimensionality of the data. In other words, a classifier can often separate the data easily once the dimension is increased from, say, 2 to 3. To understand this better, let's take an example.
This is an example of a dataset that can be classified with a straight line.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

#create some data and plot the graph
x = [1, 2, 3, 6, 7, 8]
y = [2, 4, 6, 8, 10, 12]
labels = [2, 2, 2, 1, 1, 1]
plt.scatter(x, y, c=labels)

#plot a line that splits the data into 2 classes
plt.plot([11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
Output:
What about non-linearly separable data?
#create some dataset
x = [0, 1, 2, 3, 4, 5, 6, 6, 7, 8, 9]
y = [6, 6, 5, 3, 3, 4, 4, 6, 8, 8, 9]
labels = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0]

#plot the graph
plt.scatter(x, y, c=labels)
Output:
You will observe that a straight line cannot cleanly solve this classification problem. We can, however, overturn this situation by increasing the dimension of the data.
For the data above, let's attempt to map the 2D data into 3D using the transformation function below, which sends each point (x, y) to (x², √2·xy, y²):
def transformation_function(x, y):
    """This function converts the 2D data into 3D"""
    data = np.c_[(x, y)]  #zip the x and y values into columns

    #check if the data has more than 2 observations
    if len(data) > 2:
        x1 = data[:, 0] ** 2
        x2 = np.sqrt(2) * data[:, 0] * data[:, 1]
        x3 = data[:, 1] ** 2
    else:
        #with exactly two points, data[0] and data[1] hold the first and second coordinates of both points
        x1 = data[0] ** 2
        x2 = np.sqrt(2) * data[0] * data[1]
        x3 = data[1] ** 2

    translated_data = np.array([x1, x2, x3])
    return translated_data
To check if this function really works, let’s print the dimension of the data before and after calling the function.
print(f'The shape of the data before transformation is {np.c_[(x, y)].shape}')

#call the transformation function on the data
data_3d = transformation_function(x, y)

#check the dimension of the data
print(f'The shape of the data after transformation is {data_3d.shape}')
Output:
The shape of the data before transformation is (11, 2)
The shape of the data after transformation is (3, 11)
As seen, the data was transformed to 3-dimensional space. We can go ahead and graph the 3D data.
# graph the 3D data in 3D space
%matplotlib notebook

#create a figure with a 3D subplot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(data_3d[0], data_3d[1], data_3d[2], c=labels)
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Z axis')
plt.show()
Output:
From the 3D figure above, you'd begin to see that the data can be classified linearly just by increasing the dimension by 1.
Problem solved, right? Not yet. Notice the amount of work needed to increase the dimension by just 1. Imagine we wanted to increase the dimension by 1,000. As the data gets larger and larger, it becomes computationally intensive to increase its dimensions. Even with a fast processor, it would take a long time for your model to train. In some cases, it can run out of memory. This is where the kernel method becomes useful.
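To get a feel for how quickly explicit expansion blows up, note that a full polynomial feature map of degree d over n input features produces "n + d choose d" monomials. Here is a quick illustration of that count (a sketch using scipy.special.comb, a standard binomial-coefficient helper; the sizes are arbitrary):

from scipy.special import comb

#number of monomials in an explicit polynomial feature map of degree d over n features
for n, d in [(2, 2), (12, 2), (12, 5), (100, 5)]:
    print(f'{n} features, degree {d}: {comb(n + d, d, exact=True)} dimensions')

With just 100 features and degree 5, the mapped space already has close to a hundred million dimensions.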
What are Kernels and why Kernels
Kernels provide a way of converting non-linearly separable data to linearly separable data. A kernel does not perform the transformation to a higher dimension per se. Instead, it computes, directly from the original data, the quantity the model needs from the higher-dimensional space in which the classes become linearly separable, irrespective of how many dimensions that space has.
The catch is that, just as in SVMs, the classifier only requires the inner products of the mapped vectors, and an inner product is a scalar. This implies that whether the function takes the data to the 3rd dimension, the 1,000th dimension, or even the 1,000,000th dimension, all the model needs from that higher vector space is the inner product, and that is exactly what the kernel returns. Whatever the number of dimensions, the inner product is still a single scalar. Kernels, therefore, help you calculate inner products in the higher vector space without ever constructing that space, which makes them both expressive and efficient in their operation.
Let’s see an example. First, we convert 2D data into 3D and compute the inner product.
#create data
a = [3, 5]
b = [7, 5]

#transform the data to 3D
data_transformed = transformation_function(a, b)

#carry out the inner product
print(np.dot(data_transformed[:, 0], data_transformed[:, 1]))
Output:
2116.0
Now, we apply the second-degree polynomial kernel to the data and carry out the inner product directly. Note that we didn't have to increase the dimension of the data.
#compute the polynomial kernel of the data and perform the dot operation
(np.dot(a, b)) ** 2
Output:
2116
As seen, the inner product is the same. Kernels can thus be seen as a way of arriving at the end without caring about the means.
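This equality is no accident of the particular numbers chosen. A quick check with a few random pairs, reusing the transformation_function defined earlier, shows it holds in general:

rng = np.random.RandomState(0)

#verify that phi(a) . phi(b) equals (a . b)^2 for several random 2D pairs
for _ in range(5):
    a = rng.randint(1, 10, size=2)
    b = rng.randint(1, 10, size=2)
    mapped = transformation_function(a, b)
    lhs = np.dot(mapped[:, 0], mapped[:, 1])  #inner product in the 3D space
    rhs = np.dot(a, b) ** 2                   #polynomial kernel in the 2D space
    print(lhs, rhs, np.isclose(lhs, rhs))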
In a nutshell, kernels allow you to benefit from a higher-dimensional space by returning the inner products of that space as though the dimension transformation had actually taken place. This process is computationally far less demanding, which is why it's advisable to employ kernels for non-linearly separable data.
Types of Kernel Methods
There are myriad kernel methods. Some of the common ones are listed below, with a NumPy sketch of a few of them after the list.
- Linear kernel: this is simply the inner product of both vectors, K(x, y) = x · y, where x and y are the two vectors.
- Polynomial kernel: K(x, y) = (x · y + c)^d, where d is the degree of the polynomial and c ≥ 0 is a constant. The second-degree kernel used in the example above is the special case c = 0, d = 2.
Other types of kernels include
- Exponential kernel
- Gaussian kernel
- Laplacian kernel
- Anova radial basis kernel
- Hyperbolic or sigmoid kernel
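To make a few of these concrete, here is a minimal NumPy sketch of the linear, polynomial, and Gaussian kernels (the function names and the bandwidth parameter sigma are our own choices for illustration):

import numpy as np

def linear_kernel(x, y):
    #simply the inner product of the two vectors
    return np.dot(x, y)

def polynomial_kernel(x, y, c=0, d=2):
    #(x . y + c) raised to the degree d
    return (np.dot(x, y) + c) ** d

def gaussian_kernel(x, y, sigma=1.0):
    #exp(-||x - y||^2 / (2 * sigma^2)), a similarity between 0 and 1
    diff = np.asarray(x) - np.asarray(y)
    return np.exp(-np.dot(diff, diff) / (2 * sigma ** 2))

a, b = [3, 5], [7, 5]
print(linear_kernel(a, b))       #46
print(polynomial_kernel(a, b))   #2116, matching the earlier example
print(gaussian_kernel(a, b))     #about 0.000335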
Training a Kernel Classifier with Tensorflow.estimator
In Tensorflow, there is a built-in class that can be used to compute a richer feature space: RandomFourierFeatureMapper, found in the tf.contrib.kernel_methods module. The random Fourier features it produces are largely an approximation of the Gaussian kernel. This class will be used in this tutorial to build the kernel classifier.
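To build intuition for what this mapper does, here is a minimal NumPy sketch of the random Fourier features idea (our own illustration, not TensorFlow's internal code). Each input x is mapped to sqrt(2/D) * cos(Wx + b), where the entries of W are Gaussian with a scale set by the stddev parameter; the inner product of two mapped points then approximates the Gaussian kernel value for those points:

import numpy as np

rng = np.random.RandomState(42)

def random_fourier_features(X, output_dim=2000, stddev=1.0):
    #random frequencies; their scale controls the kernel bandwidth
    W = rng.normal(scale=1.0 / stddev, size=(X.shape[1], output_dim))
    #random phase offsets in [0, 2*pi)
    b = rng.uniform(0.0, 2 * np.pi, size=output_dim)
    return np.sqrt(2.0 / output_dim) * np.cos(X @ W + b)

points = np.array([[3.0, 5.0],
                   [4.0, 5.0]])
z = random_fourier_features(points)

#inner product in the mapped space vs the exact Gaussian kernel value
approx = np.dot(z[0], z[1])
exact = np.exp(-np.sum((points[0] - points[1]) ** 2) / 2.0)
print(approx, exact)  #close, and closer still as output_dim grows

This is why the output dimension of the mapper matters: the larger it is, the better the mapped inner products approximate the true kernel, a point we will return to when tuning the kernel classifier.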
We will start by creating a baseline model using the Tensorflow LinearClassifier class. The model will simply classify whether or not a person has a credit card. Afterward, we will build and train a second model using the Gaussian kernel approximation in Tensorflow. Let's begin with the baseline model.
Building a Baseline Linear Classifier
The dataset obtained from the UCI Machine Learning Repository can be downloaded here. Here’s a brief description of the data features.
| Feature | Description |
| --- | --- |
| ID | Customer ID |
| Age | Customer's age in completed years |
| Experience | Years of professional experience |
| Income | Annual income of the customer ($000) |
| ZIP Code | Home address ZIP code |
| Family | Family size of the customer |
| CCAvg | Avg. spending on credit cards per month ($000) |
| Education | Education level. 1: Undergrad; 2: Graduate; 3: Advanced/Professional |
| Mortgage | Value of house mortgage if any ($000) |
| Personal Loan | Did this customer accept the personal loan offered in the last campaign? |
| Securities Account | Does the customer have a securities account with the bank? |
| CD Account | Does the customer have a certificate of deposit (CD) account with the bank? |
| Online | Does the customer use internet banking facilities? |
| CreditCard | Does the customer use a credit card issued by UniversalBank? |
We begin by importing the necessary libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
Let’s now import the dataset using the read_csv() method of pandas.
# load the dataset
df = pd.read_csv("Bank_Personal_Loan_Modelling.csv")

#print the first 5 rows of the data
print(df.head())
Output:
ID Age Experience Income ZIP Code Family CCAvg Education Mortgage \
0 1 25 1 49 91107 4 1.6 1 0
1 2 45 19 34 90089 3 1.5 1 0
2 3 39 15 11 94720 1 1.0 1 0
3 4 35 9 100 94112 1 2.7 2 0
4 5 35 8 45 91330 4 1.0 2 0
Personal Loan Securities Account CD Account Online CreditCard
0 0 1 0 0 0
1 0 1 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
4 0 0 0 0 1
We can check the number of observations and features of the dataset using the shape attribute of a dataframe.
#check the number of samples and features
df.shape
Output:
(5000, 14)
Next, we check if missing values exist in the data.
#check missing values in the data
df.isnull().sum()
Output:
ID 0
Age 0
Experience 0
Income 0
ZIP Code 0
Family 0
CCAvg 0
Education 0
Mortgage 0
Personal Loan 0
Securities Account 0
CD Account 0
Online 0
CreditCard 0
dtype: int64
It appears that the data is clean, with no missing values at all.
Going forward, we must separate the features from the target. In this particular dataset, CreditCard is the target variable while the other columns are the independent variables. However, we take the ID column out of the features: every entry in it is a unique number, so the column carries no pattern for the model to learn and must be removed.
#split the data into targets(y) and features (X)
target = df.CreditCard
features = df.drop(['ID', 'CreditCard'], axis=1)
Splitting Data into train and test data
Next, we will need to split the data into train and test data. The model is trained on the train data after which its performance is evaluated on the test data.
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

#split the data into train and test data
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)
Finally, we will need to standardize the data. This is an important step before feeding your data into the model: since each column is on a different scale, the model will find it difficult to train on the divergent values. Standardization rescales each feature to zero mean and unit variance using the StandardScaler class. The labels are also encoded using the LabelEncoder class.
#instantiate the Standard Scaler and Label Encoder class
scaler = StandardScaler()
encoder = LabelEncoder()

#encode the dependent variable (the labels are already 0 and 1, so this is effectively a formality)
target = encoder.fit_transform(target)

#standardize the independent features
X_train = scaler.fit_transform(X_train).astype(np.float32)
X_test = scaler.transform(X_test).astype(np.float32)

print(X_train.shape)
print(X_test.shape)
Note that, to avoid data leakage of any sort, the scaling parameters are learned from the train data only; those same parameters are then used to transform the test data. The printed shapes confirm the split:
(4000, 12)
(1000, 12)
Creating the Feature Columns
Real-world data can come in different forms: strings, images, videos, numeric values, categorical values, and so on. Tensorflow, however, works with tensors alone. This implies that the data fed into Tensorflow models must be converted to tensors.
There are various approaches to converting a column to a feature column, depending on the type of data the column holds. Since all the columns in our data hold numeric values, we can use the real_valued_column method to convert all the columns to feature columns.
#create a feature column
feature_columns = tf.contrib.layers.real_valued_column('x', dimension=12)
feature_columns
Output:
_RealValuedColumn(column_name='x', dimension=12, default_value=None, dtype=tf.float32, normalizer=None)
Instantiating the Model
After defining the feature columns, the model can then be instantiated. Recall that we are using the LinearClassifier for this example. We pass the feature column, number of classes in the target data and the model directory as arguments when calling the class.
#instantiate the linear classifier
estimator = tf.estimator.LinearClassifier(feature_columns=[feature_columns],
                                          n_classes=2,
                                          model_dir="base_model1")
Output:
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'base_model1', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x0000021D56338630>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
Defining the train input function and training the model
To pass the data into the TensorFlow model, you need to pass the features and target using a defined function. This function is called an input function.
The function takes parameters such as the features, target, batch size, number of epochs, and whether or not the data should be shuffled. The input function can be defined with Tensorflow's numpy_input_fn or pandas_input_fn. Here, we'll use numpy_input_fn since the train and test data are already NumPy arrays. We therefore define an input function for the train data with a batch size of 32, so the machine is not overworked. In addition, the shuffle argument is set to True so that the model does not learn the ordering of the train data verbatim.
#define the training input function
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'x': X_train},
    y=y_train,
    batch_size=32,
    num_epochs=None,
    shuffle=True,
)
Notice that num_epochs was set to None. This allows the model to see the data for as many iterations as the number of steps dictates. With a batch size of 32 and 2,000 steps, the model will consume 64,000 samples, that is, 16 passes over the 4,000 training rows.
It’s finally time to train the model. The model is trained using the train method of the estimator. We define a start and end time to check how long the model spends during training.
import time

#set the start timer
start_time = time.time()

#train the model on the training data
estimator.train(input_fn=train_input_fn, steps=2000)

#set the stop timer
end_time = time.time()
timetaken = end_time - start_time

print()
print(f"The model gets trained in {timetaken} seconds")
Output:
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Collocations handled automatically by placer.
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow_estimator\python\estimator\inputs\queues\feeding_queue_runner.py:62: QueueRunner.__init__ (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow_estimator\python\estimator\inputs\queues\feeding_functions.py:500: add_queue_runner (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow\contrib\layers\python\layers\feature_column.py:1901: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py:809: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Saving checkpoints for 0 into base_model1\model.ckpt.
INFO:tensorflow:loss = 22.18071, step = 1
INFO:tensorflow:global_step/sec: 280.272
INFO:tensorflow:loss = 19.401627, step = 101 (0.359 sec)
INFO:tensorflow:global_step/sec: 510.493
INFO:tensorflow:loss = 14.620773, step = 201 (0.196 sec)
INFO:tensorflow:global_step/sec: 521.133
INFO:tensorflow:loss = 22.846588, step = 301 (0.192 sec)
INFO:tensorflow:global_step/sec: 510.495
INFO:tensorflow:loss = 18.471878, step = 401 (0.197 sec)
INFO:tensorflow:global_step/sec: 546.763
INFO:tensorflow:loss = 19.982285, step = 501 (0.181 sec)
INFO:tensorflow:global_step/sec: 523.86
INFO:tensorflow:loss = 19.685211, step = 601 (0.191 sec)
INFO:tensorflow:global_step/sec: 483.367
INFO:tensorflow:loss = 15.895845, step = 701 (0.208 sec)
INFO:tensorflow:global_step/sec: 465.385
INFO:tensorflow:loss = 19.809559, step = 801 (0.215 sec)
INFO:tensorflow:global_step/sec: 483.369
INFO:tensorflow:loss = 18.505947, step = 901 (0.207 sec)
INFO:tensorflow:global_step/sec: 502.799
INFO:tensorflow:loss = 14.306513, step = 1001 (0.199 sec)
INFO:tensorflow:global_step/sec: 546.759
INFO:tensorflow:loss = 21.479156, step = 1101 (0.183 sec)
INFO:tensorflow:global_step/sec: 361.218
INFO:tensorflow:loss = 14.587541, step = 1201 (0.284 sec)
INFO:tensorflow:global_step/sec: 318.655
INFO:tensorflow:loss = 16.127178, step = 1301 (0.307 sec)
INFO:tensorflow:global_step/sec: 489.912
INFO:tensorflow:loss = 18.31077, step = 1401 (0.206 sec)
INFO:tensorflow:global_step/sec: 334.272
INFO:tensorflow:loss = 18.162086, step = 1501 (0.307 sec)
INFO:tensorflow:global_step/sec: 300.79
INFO:tensorflow:loss = 15.54518, step = 1601 (0.324 sec)
INFO:tensorflow:global_step/sec: 264.306
INFO:tensorflow:loss = 20.591423, step = 1701 (0.377 sec)
INFO:tensorflow:global_step/sec: 328.056
INFO:tensorflow:loss = 15.356109, step = 1801 (0.309 sec)
INFO:tensorflow:global_step/sec: 332.416
INFO:tensorflow:loss = 18.233725, step = 1901 (0.297 sec)
INFO:tensorflow:Saving checkpoints for 2000 into base_model1\model.ckpt.
INFO:tensorflow:Loss for final step: 18.70794.
The model gets trained in 9.365769863128662 seconds
Defining the Test input function and Evaluating the model
Before proceeding to evaluate the model, we must define another input function to take in the test data. This input function is defined with a batch size of 128 and 1 epoch, since the model only needs a single pass over the test set to check whether its predictions are right or wrong. The shuffle argument is set to False because there is no need to shuffle the data this time.
#define the test input function
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'x': X_test},
    y=y_test,
    batch_size=128,
    num_epochs=1,
    shuffle=False
)
Now we can see how well the model learns. We check its performance by evaluating the model on the test data.
#evaluate the model's performance on the test data
estimator.evaluate(input_fn=test_input_fn, steps=1)
Output:
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow\python\ops\metrics_impl.py:2002: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-12-12T17:01:44Z
INFO:tensorflow:Graph was finalized.
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from base_model1\model.ckpt-2000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/1]
INFO:tensorflow:Finished evaluation at 2020-12-12-17:01:46
INFO:tensorflow:Saving dict for global step 2000: accuracy = 0.7890625, accuracy_baseline = 0.7265625, auc = 0.6102919, auc_precision_recall = 0.4829452, average_loss = 0.52276087, global_step = 2000, label/mean = 0.2734375, loss = 66.91339, precision = 1.0, prediction/mean = 0.2957184, recall = 0.22857143
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 2000: base_model1\model.ckpt-2000
Out[20]:
{'accuracy': 0.7890625,
'accuracy_baseline': 0.7265625,
'auc': 0.6102919,
'auc_precision_recall': 0.4829452,
'average_loss': 0.52276087,
'label/mean': 0.2734375,
'loss': 66.91339,
'precision': 1.0,
'prediction/mean': 0.2957184,
'recall': 0.22857143,
'global_step': 2000}
As seen, the model has an accuracy of 78.9%, which is fairly okay. The loss of 66.9, however, seems quite high, and other metrics, such as the recall of 0.23, are low.
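The metrics dictionary only tells part of the story. One way to dig deeper, sketched below under the assumption that scikit-learn is available, is to collect the estimator's class predictions and pass them to classification_report:

from sklearn.metrics import classification_report

#an unlabeled input function for prediction
pred_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'x': X_test}, y=None, batch_size=128, num_epochs=1, shuffle=False)

#collect the predicted class for every test sample
predictions = [int(p['class_ids'][0]) for p in estimator.predict(input_fn=pred_input_fn)]

#per-class precision, recall and f1-score
print(classification_report(y_test, predictions))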
We would now turn our attention to the Kernel classifier. Let’s see if we can up the numbers with a kernel model.
Building the Kernel Classifier
The preprocessing steps for the kernel model are virtually the same as for the linear classifier. In fact, the train and test input functions remain unchanged. The major difference is in building the kernel model itself.
To build the kernel model, we must first define a kernel mapper. We use the RandomFourierFeatureMapper class from the tf.contrib.kernel_methods module to define the mapper. The mapper takes in the input dimension of the data as well as the desired output dimension. So if you wish to increase the data dimension to 200, output_dim is set to 200.
Once that’s done, the Kernel model can be built using the KernelLinearClassifier.
#define a kernel
kernel_mapper = tf.contrib.kernel_methods.RandomFourierFeatureMapper(
    input_dim=12, output_dim=5000, stddev=4.5, name='k_mapper1')

#map the kernels to the feature columns
kernel_mappers = {feature_columns: [kernel_mapper]}

#define an optimizer
optimizer = tf.train.FtrlOptimizer(learning_rate=50, l2_regularization_strength=0.001)

#instantiate the kernel classifier
kernel_estimator = tf.contrib.kernel_methods.KernelLinearClassifier(
    n_classes=3,
    optimizer=optimizer,
    kernel_mappers=kernel_mappers,
    model_dir="Kernel_model1")
Output:
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow\contrib\kernel_methods\python\kernel_estimators.py:305: multi_class_head (from tensorflow.contrib.learn.python.learn.estimators.head) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.contrib.estimator.*_head.
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow\contrib\learn\python\learn\estimators\estimator.py:1179: BaseEstimator.__init__ (from tensorflow.contrib.learn.python.learn.estimators.estimator) is deprecated and will be removed in a future version.
Instructions for updating:
Please replace uses of any Estimator from tf.contrib.learn with an Estimator from tf.estimator.*
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow\contrib\learn\python\learn\estimators\estimator.py:427: RunConfig.__init__ (from tensorflow.contrib.learn.python.learn.estimators.run_config) is deprecated and will be removed in a future version.
Instructions for updating:
When switching to tf.estimator.Estimator, use tf.estimator.RunConfig instead.
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_task_type': None, '_task_id': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x000002738D99C0B8>, '_master': '', '_num_ps_replicas': 0, '_num_worker_replicas': 0, '_environment': 'local', '_is_chief': True, '_evaluation_master': '', '_train_distribute': None, '_eval_distribute': None, '_device_fn': None, '_tf_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
}
, '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_secs': 600, '_log_step_count_steps': 100, '_protocol': None, '_session_config': None, '_save_checkpoints_steps': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_model_dir': 'Kernel_model1'}
The model can then be trained and evaluated. The training is done with the fit method of the instantiated kernel model, which similarly takes the input function and the number of steps as arguments. For a fair comparison, the same input function that was passed to the base model is passed to this one, and the model is again trained for 2,000 steps.
import time

start_time = time.time()

#train the kernel classifier
kernel_estimator.fit(input_fn=train_input_fn, steps=2000)

end_time = time.time()
timetaken = end_time - start_time
print(f"The model gets trained in {timetaken} seconds")
Output:
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from Kernel_model1\model.ckpt-2000
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 2000 into Kernel_model1\model.ckpt.
INFO:tensorflow:loss = 10.088472, step = 2001
INFO:tensorflow:global_step/sec: 205.505
INFO:tensorflow:loss = 5.561243, step = 2101 (0.490 sec)
INFO:tensorflow:global_step/sec: 250.77
INFO:tensorflow:loss = 1.7062492, step = 2201 (0.399 sec)
INFO:tensorflow:global_step/sec: 267.533
INFO:tensorflow:loss = 14.24986, step = 2301 (0.374 sec)
INFO:tensorflow:global_step/sec: 263.168
INFO:tensorflow:loss = 5.7791023, step = 2401 (0.380 sec)
INFO:tensorflow:global_step/sec: 275.64
INFO:tensorflow:loss = 1.8302681, step = 2501 (0.364 sec)
INFO:tensorflow:global_step/sec: 191.289
INFO:tensorflow:loss = 6.062792, step = 2601 (0.522 sec)
INFO:tensorflow:global_step/sec: 215.641
INFO:tensorflow:loss = 1.9746958, step = 2701 (0.465 sec)
INFO:tensorflow:global_step/sec: 268.869
INFO:tensorflow:loss = 3.9464078, step = 2801 (0.371 sec)
INFO:tensorflow:global_step/sec: 249.517
INFO:tensorflow:loss = 27.863808, step = 2901 (0.401 sec)
INFO:tensorflow:global_step/sec: 282.651
INFO:tensorflow:loss = 1.5290673, step = 3001 (0.354 sec)
INFO:tensorflow:global_step/sec: 265.404
INFO:tensorflow:loss = 14.24245, step = 3101 (0.378 sec)
INFO:tensorflow:global_step/sec: 271.895
INFO:tensorflow:loss = 10.880495, step = 3201 (0.367 sec)
INFO:tensorflow:global_step/sec: 277.167
INFO:tensorflow:loss = 22.52391, step = 3301 (0.361 sec)
INFO:tensorflow:global_step/sec: 272.635
INFO:tensorflow:loss = 19.375069, step = 3401 (0.368 sec)
INFO:tensorflow:global_step/sec: 249.52
INFO:tensorflow:loss = 3.5978289, step = 3501 (0.400 sec)
INFO:tensorflow:global_step/sec: 281.852
INFO:tensorflow:loss = 4.502479, step = 3601 (0.355 sec)
INFO:tensorflow:global_step/sec: 272.635
INFO:tensorflow:loss = 13.293575, step = 3701 (0.367 sec)
INFO:tensorflow:global_step/sec: 282.648
INFO:tensorflow:loss = 16.384726, step = 3801 (0.354 sec)
INFO:tensorflow:global_step/sec: 280.465
INFO:tensorflow:loss = 17.880123, step = 3901 (0.357 sec)
INFO:tensorflow:Saving checkpoints for 4000 into Kernel_model1\model.ckpt.
INFO:tensorflow:Loss for final step: 10.802145.
The model gets trained in 10.8771390914917 seconds
This time, the model gets trained in 10.9 seconds. This small increase is understandable since the kernel model is more sophisticated: recall that it works with inner products in a 5,000-dimensional feature space. Finally, we check the performance of the model by evaluating it on the test data.
# Evaluate the kernel classifier
eval_metrics = kernel_estimator.evaluate(input_fn=test_input_fn, steps=1)
Output:
INFO:tensorflow:Starting evaluation at 2020-12-12T17:55:02Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from Kernel_model1\model.ckpt-4000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/1]
INFO:tensorflow:Finished evaluation at 2020-12-12-17:55:03
INFO:tensorflow:Saving dict for global step 4000: accuracy = 0.78125, global_step = 4000, loss = 10.940855
The kernel model has managed to reduce the loss from 66.9 to 10.9 with the same number of training steps. This goes to show that it outperforms the base model we created earlier. We can, in fact, attempt to improve this number by tweaking some of the training parameters.
Improving the Performance of the Kernel Classifier
The kernel classifier is sensitive to the chosen stddev and output dimension. In the earlier model, the stddev was set to 4.5 and the output dimension to 5,000. A higher output dimension brings the inner product of two mapped vectors closer to the true kernel value, so increasing the dimension keeps improving the approximation until it saturates at some point.
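Because of this sensitivity, it can pay to compare a few combinations before settling on one. Below is a minimal sketch that reuses the classes and input functions defined above; the candidate values, the short 500-step runs, and the model_dir naming are our own choices:

#sweep a few (stddev, output_dim) combinations and compare evaluation losses
for stddev in [1.0, 4.5, 5.0]:
    for output_dim in [1000, 4000, 5000]:
        mapper = tf.contrib.kernel_methods.RandomFourierFeatureMapper(
            input_dim=12, output_dim=output_dim, stddev=stddev)
        model = tf.contrib.kernel_methods.KernelLinearClassifier(
            n_classes=3,
            optimizer=tf.train.FtrlOptimizer(learning_rate=50),
            kernel_mappers={feature_columns: [mapper]},
            model_dir=f"sweep_{stddev}_{output_dim}")
        #short runs are enough to rank the settings
        model.fit(input_fn=train_input_fn, steps=500)
        metrics = model.evaluate(input_fn=test_input_fn, steps=1)
        print(stddev, output_dim, metrics['loss'])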
Here, we will attempt to tweak these parameters, setting the stddev to 5 and the output dimension to 4,000. We will also raise the optimizer's L2 regularization strength to 0.01 to prevent overfitting.
#define a kernel
kernel_mapper = tf.contrib.kernel_methods.RandomFourierFeatureMapper(
    input_dim=12, output_dim=4000, stddev=5, name='k_mapper')

#map the kernels to the feature columns
kernel_mappers = {feature_columns: [kernel_mapper]}

#define an optimizer
optimizer = tf.train.FtrlOptimizer(learning_rate=50, l2_regularization_strength=0.01)

#instantiate the kernel classifier
kernel_estimator = tf.contrib.kernel_methods.KernelLinearClassifier(
    n_classes=3,
    optimizer=optimizer,
    kernel_mappers=kernel_mappers,
    model_dir="Kernel_model")
Output:
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow\contrib\kernel_methods\python\kernel_estimators.py:305: multi_class_head (from tensorflow.contrib.learn.python.learn.estimators.head) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.contrib.estimator.*_head.
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow\contrib\learn\python\learn\estimators\estimator.py:1179: BaseEstimator.__init__ (from tensorflow.contrib.learn.python.learn.estimators.estimator) is deprecated and will be removed in a future version.
Instructions for updating:
Please replace uses of any Estimator from tf.contrib.learn with an Estimator from tf.estimator.*
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow\contrib\learn\python\learn\estimators\estimator.py:427: RunConfig.__init__ (from tensorflow.contrib.learn.python.learn.estimators.run_config) is deprecated and will be removed in a future version.
Instructions for updating:
When switching to tf.estimator.Estimator, use tf.estimator.RunConfig instead.
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_task_type': None, '_task_id': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x0000021D599E5E48>, '_master': '', '_num_ps_replicas': 0, '_num_worker_replicas': 0, '_environment': 'local', '_is_chief': True, '_evaluation_master': '', '_train_distribute': None, '_eval_distribute': None, '_device_fn': None, '_tf_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
}
, '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_secs': 600, '_log_step_count_steps': 100, '_protocol': None, '_session_config': None, '_save_checkpoints_steps': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_model_dir': 'Kernel_model'}
After defining the parameters, let’s train the model.
start_time = time.time()

#train the kernel classifier
kernel_estimator.fit(input_fn=train_input_fn, steps=2000)

end_time = time.time()
timetaken = end_time - start_time
print(f"The model gets trained in {timetaken} seconds")
Output:
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow\contrib\learn\python\learn\estimators\head.py:677: ModelFnOps.__new__ (from tensorflow.contrib.learn.python.learn.estimators.model_fn) is deprecated and will be removed in a future version.
Instructions for updating:
When switching to tf.estimator.Estimator, use tf.estimator.EstimatorSpec. You can use the `estimator_spec` method to create an equivalent one.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into Kernel_model\model.ckpt.
INFO:tensorflow:loss = 1.0986123, step = 1
INFO:tensorflow:global_step/sec: 139.152
INFO:tensorflow:loss = 0.9671518, step = 101 (0.727 sec)
INFO:tensorflow:global_step/sec: 237.121
INFO:tensorflow:loss = 1.6124909, step = 201 (0.415 sec)
INFO:tensorflow:global_step/sec: 255.145
INFO:tensorflow:loss = 8.096902, step = 301 (0.390 sec)
INFO:tensorflow:global_step/sec: 267.797
INFO:tensorflow:loss = 2.1676967, step = 401 (0.373 sec)
INFO:tensorflow:global_step/sec: 271.93
INFO:tensorflow:loss = 2.9555616, step = 501 (0.368 sec)
INFO:tensorflow:global_step/sec: 239.576
INFO:tensorflow:loss = 4.1158886, step = 601 (0.422 sec)
INFO:tensorflow:global_step/sec: 257.063
INFO:tensorflow:loss = 1.1513939, step = 701 (0.386 sec)
INFO:tensorflow:global_step/sec: 275.379
INFO:tensorflow:loss = 1.4807884, step = 801 (0.366 sec)
INFO:tensorflow:global_step/sec: 259.215
INFO:tensorflow:loss = 3.0109026, step = 901 (0.383 sec)
INFO:tensorflow:global_step/sec: 176.894
INFO:tensorflow:loss = 3.5773954, step = 1001 (0.573 sec)
INFO:tensorflow:global_step/sec: 159.864
INFO:tensorflow:loss = 5.449258, step = 1101 (0.620 sec)
INFO:tensorflow:global_step/sec: 176.148
INFO:tensorflow:loss = 0.9399394, step = 1201 (0.563 sec)
INFO:tensorflow:global_step/sec: 198.512
INFO:tensorflow:loss = 6.211024, step = 1301 (0.506 sec)
INFO:tensorflow:global_step/sec: 188.923
INFO:tensorflow:loss = 10.430393, step = 1401 (0.532 sec)
INFO:tensorflow:global_step/sec: 172.681
INFO:tensorflow:loss = 2.431583, step = 1501 (0.577 sec)
INFO:tensorflow:global_step/sec: 180.112
INFO:tensorflow:loss = 1.4712481, step = 1601 (0.552 sec)
INFO:tensorflow:global_step/sec: 189.118
INFO:tensorflow:loss = 3.9613595, step = 1701 (0.532 sec)
INFO:tensorflow:global_step/sec: 209.762
INFO:tensorflow:loss = 3.3975558, step = 1801 (0.479 sec)
INFO:tensorflow:global_step/sec: 255.382
INFO:tensorflow:loss = 7.3193703, step = 1901 (0.387 sec)
INFO:tensorflow:Saving checkpoints for 2000 into Kernel_model\model.ckpt.
INFO:tensorflow:Loss for final step: 3.1488423.
The model gets trained in 12.914040088653564 seconds
Finally, we evaluate its performance.
# Evaluate the kernel classifier
eval_metrics = kernel_estimator.evaluate(input_fn=test_input_fn, steps=1)
Output:
INFO:tensorflow:Starting evaluation at 2020-12-12T17:02:21Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from Kernel_model\model.ckpt-2000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/1]
INFO:tensorflow:Finished evaluation at 2020-12-12-17:02:21
INFO:tensorflow:Saving dict for global step 2000: accuracy = 0.7890625, global_step = 2000, loss = 4.6506495
As seen, the loss has been reduced to 4.65. This is a decent improvement on the first kernel classifier, which recorded a loss of 10.94.
Conclusion
In this tutorial, we have seen how to build both a linear classifier and a kernel classifier with TensorFlow. We saw that the kernel betters the model's performance by implicitly taking the data into a higher-dimensional space and returning the inner products of that space. This approach has proven to work well for data that is not linearly separable, which is the case for most real-life data.
We then went ahead to build a kernel classifier and juxtaposed the result with the linear classifier.