Neural-Network-With-Tensorflow¶
TensorFlow is an open-source deep learning framework, maintained by Google, that makes it easy to train deep learning models.
This notebook deals with training an end-to-end neural network. We could train a neural network using only the numpy
library, but that is a long and tedious process, so to make it easier we will use the latest version of TensorFlow.
In this notebook we will build a simple hand-sign classifier using a 3-layer neural network.
# Check for tensorflow version
import tensorflow as tf
tf.__version__
'2.5.0'
# importing libraries we will need
import h5py
import numpy as np
import time
import matplotlib.pyplot as plt
Getting our data ready¶
For this notebook our dataset is stored in HDF5 files; let's load them with h5py so that we can use them.
train_dataset = h5py.File('/content/drive/MyDrive/dataset/train_signs.h5', "r")
test_dataset = h5py.File('/content/drive/MyDrive/dataset/test_signs.h5', "r")
Here we build TensorFlow Datasets from the arrays stored in the HDF5 files; we can use these in place of NumPy arrays to hold our data.
x_train = tf.data.Dataset.from_tensor_slices(train_dataset['train_set_x'])
y_train = tf.data.Dataset.from_tensor_slices(train_dataset['train_set_y'])
x_test = tf.data.Dataset.from_tensor_slices(test_dataset['test_set_x'])
y_test = tf.data.Dataset.from_tensor_slices(test_dataset['test_set_y'])
# Check for the type
type(x_train)
tensorflow.python.data.ops.dataset_ops.TensorSliceDataset
TensorFlow Datasets are generators: we can't access their contents directly, we have to iterate over them. Let's do this using Python's iter and next.
print(next(iter(x_train)))
tf.Tensor( [[[227 220 214] [227 221 215] [227 222 215] ... [232 230 224] [231 229 222] [230 229 221]] [[227 221 214] [227 221 215] [228 221 215] ... [232 230 224] [231 229 222] [231 229 221]] [[227 221 214] [227 221 214] [227 221 215] ... [232 230 224] [231 229 223] [230 229 221]] ... [[119 81 51] [124 85 55] [127 87 58] ... [210 211 211] [211 212 210] [210 211 210]] [[119 79 51] [124 84 55] [126 85 56] ... [210 211 210] [210 211 210] [209 210 209]] [[119 81 51] [123 83 55] [122 82 54] ... [209 210 210] [209 210 209] [208 209 209]]], shape=(64, 64, 3), dtype=uint8)
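Since matplotlib is already imported, we can also take a quick look at one of the raw 64 x 64 images (a small optional sketch, purely for visual inspection):
# Display one raw 64x64x3 hand-sign image from the training set
sample_image = next(iter(x_train)).numpy()
plt.imshow(sample_image)
plt.axis('off')
plt.show()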
# check for details in y_train
print(y_train.element_spec)
TensorSpec(shape=(), dtype=tf.int64, name=None)
Normalizing function¶
Now let's create a function that normalizes our images: it casts them to float tensors, scales the pixel values, and flattens each image into shape (64 x 64 x 3, 1) = (12288, 1).
To apply this function to each element we will use the map() method.
def normalize(image):
    # Cast to float32, scale the pixel values to [0, 1), and flatten to a (12288, 1) column vector
    image = tf.cast(image, tf.float32) / 256.0
    image = tf.reshape(image, [-1, 1])
    return image
# applying the function to each image
new_train = x_train.map(normalize)
new_test = x_test.map(normalize)
new_train.element_spec
TensorSpec(shape=(12288, 1), dtype=tf.float32, name=None)
x_train.element_spec
TensorSpec(shape=(64, 64, 3), dtype=tf.uint8, name=None)
print(next(iter(new_train)))
tf.Tensor( [[0.88671875] [0.859375 ] [0.8359375 ] ... [0.8125 ] [0.81640625] [0.81640625]], shape=(12288, 1), dtype=float32)
One Hot Encodings¶
In deep learning we often come across problems where the labels y are not just two classes (0, 1) but take several different values. To convert such labels into vectors of 0s and 1s we use one-hot encoding.
This is called "one hot" encoding because, in the converted representation, exactly one element of each column is "hot" (meaning set to 1).
def one_hot_matrix(label, depth=6):
    # Turn an integer label into a (depth, 1) one-hot column vector
    one_hot = tf.reshape(tf.one_hot(label, depth, axis=0), (depth, 1))
    return one_hot
new_y_test = y_test.map(one_hot_matrix)
new_y_train = y_train.map(one_hot_matrix)
print(next(iter(new_y_test)))
tf.Tensor( [[1.] [0.] [0.] [0.] [0.] [0.]], shape=(6, 1), dtype=float32)
Initializing the parameters¶
Now we'll initialize the parameters (weights and biases) of the network.
The initializer we are using is tf.keras.initializers.GlorotNormal(seed=1); passing a seed ensures the initializer always comes up with the same random values.
This initializer draws samples from a truncated normal distribution centered on 0, with stddev = sqrt(2 / (fan_in + fan_out)), where fan_in is the number of input units and fan_out is the number of output units in the weight tensor.
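As a quick illustration (a sketch, not part of the model itself), we can check with the first layer's dimensions that the seeded initializer is reproducible and that its spread roughly matches sqrt(2 / (fan_in + fan_out)):
# Sanity-check the seeded GlorotNormal initializer on the first layer's shape
fan_in, fan_out = 12288, 25  # W1 will have shape (25, 12288)
sample_a = tf.keras.initializers.GlorotNormal(seed=1)(shape=(fan_out, fan_in))
sample_b = tf.keras.initializers.GlorotNormal(seed=1)(shape=(fan_out, fan_in))
print(np.allclose(sample_a.numpy(), sample_b.numpy()))  # True: same seed gives the same values
print(np.std(sample_a.numpy()), np.sqrt(2.0 / (fan_in + fan_out)))  # both roughly 0.0127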
Initializing parameters to build a neural network with TensorFlow. The shapes are:
W1 : [25, 12288]
b1 : [25, 1]
W2 : [12, 25]
b2 : [12, 1]
W3 : [6, 12]
b3 : [6, 1]
def initialize_parameters():
initializer = tf.keras.initializers.GlorotNormal(seed=1)
W1 = tf.Variable(initializer(shape=(25, 12288)))
b1 = tf.Variable(initializer(shape=(25, 1)))
W2 = tf.Variable(initializer(shape=(12, 25)))
b2 = tf.Variable(initializer(shape=(12, 1)))
W3 = tf.Variable(initializer(shape=(6, 12)))
b3 = tf.Variable(initializer(shape=(6, 1)))
parameters = {"W1": W1,
"b1": b1,
"W2": W2,
"b2": b2,
"W3": W3,
"b3": b3}
return parameters
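As an optional sanity check (not needed for training), we can confirm that the variables come out with the shapes listed above:
# Print the shape of every parameter returned by initialize_parameters()
params = initialize_parameters()
for name, param in params.items():
    print(name, param.shape)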
The two most important steps:¶
- Implement forward propagation
- Retrieve the gradients and train the model
We are training our model with TensorFlow, and one benefit of doing so is that we do not have to implement backpropagation ourselves: TensorFlow takes care of it for us. This is also part of what makes TensorFlow a great deep learning framework.
We only need to implement forward propagation.
Here we'll use a TensorFlow decorator, @tf.function, which builds a computational graph to execute the function. @tf.function is polymorphic, which comes in very handy, as it can support arguments with different data types or shapes, and it lets us write plain Python, including data-dependent control flow statements.
When you use @tf.function to implement forward propagation, the computational graph is activated and keeps track of the operations. This is what later allows the gradients to be calculated with backpropagation.
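As a small aside (a toy function, not part of our model), this is what data-dependent control flow inside @tf.function looks like; AutoGraph converts the Python branch into graph operations:
@tf.function
def scale_if_positive(x):
    # AutoGraph turns this data-dependent Python branch into a graph-mode conditional
    if tf.reduce_sum(x) > 0:
        y = x * 2.0
    else:
        y = x
    return y

print(scale_if_positive(tf.constant([1.0, 2.0])))  # doubled
print(scale_if_positive(tf.constant([-1.0, -2.0, -3.0])))  # unchanged; a new shape triggers a retrace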
Implementing the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR
@tf.function
def forward_propagation(X, parameters):
# Retrieve the parameters from the dictionary "parameters"
W1 = parameters['W1']
b1 = parameters['b1']
W2 = parameters['W2']
b2 = parameters['b2']
W3 = parameters['W3']
b3 = parameters['b3']
Z1 = tf.math.add(tf.linalg.matmul(W1, X), b1)
A1 = tf.keras.activations.relu(Z1)
Z2 = tf.math.add(tf.linalg.matmul(W2, A1), b2)
A2 = tf.keras.activations.relu(Z2)
Z3 = tf.math.add(tf.linalg.matmul(W3, A2), b3)
return Z3
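A quick optional check (using freshly initialized parameters, so the numbers themselves are meaningless) that a single normalized example flows through the network and produces 6 logits:
# Push one (12288, 1) example through the untrained network; expect logits of shape (6, 1)
temp_params = initialize_parameters()
x0 = next(iter(new_train))
print(forward_propagation(x0, temp_params).shape)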
Computing the cost¶
Here again, the @tf.function decorator steps in and saves us time. All we need to do is specify how to compute the cost, and we can do so in one simple step by using:
tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true = ..., y_pred = ..., from_logits=True))
@tf.function
def compute_cost(logits, labels):
cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true = labels, y_pred = logits, from_logits=True))
return cost
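As a quick sanity check with made-up numbers (two examples, with logits and one-hot labels laid out as one column per example), compute_cost should return a small positive scalar:
# Dummy logits and one-hot labels for 2 examples, shape (6, 2)
dummy_logits = tf.constant([[ 2.0, -1.0],
                            [-1.0,  3.0],
                            [ 0.5, -0.5],
                            [-2.0,  0.0],
                            [ 0.0, -1.5],
                            [-1.0, -2.0]])
dummy_labels = tf.constant([[1., 0.],
                            [0., 1.],
                            [0., 0.],
                            [0., 0.],
                            [0., 0.],
                            [0., 0.]])
print(compute_cost(dummy_logits, dummy_labels))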
Training the model¶
Almost all of our functions are ready to use in the model; the one remaining choice is the optimizer, which we will set inside the model() function.
In this case we are using SGD - stochastic gradient descent.
The tape.gradient function allows us to retrieve the operations recorded for automatic differentiation inside the GradientTape block. Calling the optimizer method apply_gradients then applies the optimizer's update rules to each trainable parameter.
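Before the full training loop, here is a minimal sketch of that GradientTape / apply_gradients pattern on a toy scalar parameter (not the model above):
# Toy example: one SGD step on loss = (w - 3)^2 starting from w = 0
w = tf.Variable(0.0)
toy_optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
with tf.GradientTape() as tape:
    loss = (w - 3.0) ** 2  # operations inside the block are recorded on the tape
grads = tape.gradient(loss, [w])  # d(loss)/dw = 2 * (w - 3) = -6 at w = 0
toy_optimizer.apply_gradients(zip(grads, [w]))
print(w.numpy())  # 0.0 - 0.1 * (-6.0) = 0.6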
dataset = dataset.prefetch(8) - what this does is prevent a memory bottleneck that can occur when reading from disk. prefetch() sets aside some data and keeps it ready for when it's needed, overlapping the preprocessing of upcoming batches with the training step on the current one. This works because the iteration is streaming, so the data doesn't need to fit into memory.
def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.0001,
num_epochs = 1500, minibatch_size = 32, print_cost = True):
costs = [] # To keep track of the cost
# Initializing our parameters
parameters = initialize_parameters()
W1 = parameters['W1']
b1 = parameters['b1']
W2 = parameters['W2']
b2 = parameters['b2']
W3 = parameters['W3']
b3 = parameters['b3']
# Optimizer selection
optimizer = tf.keras.optimizers.SGD(learning_rate)
    X_train = X_train.batch(minibatch_size, drop_remainder=True).prefetch(8)  # batch the datasets and
    Y_train = Y_train.batch(minibatch_size, drop_remainder=True).prefetch(8)  # prefetch to avoid an I/O bottleneck
# Do the training loop
for epoch in range(num_epochs):
epoch_cost = 0.
for (minibatch_X, minibatch_Y) in zip(X_train, Y_train):
# Select a minibatch
with tf.GradientTape() as tape:
# 1. predict
Z3 = forward_propagation(minibatch_X, parameters)
# 2. loss
minibatch_cost = compute_cost(Z3, minibatch_Y)
trainable_variables = [W1, b1, W2, b2, W3, b3]
grads = tape.gradient(minibatch_cost, trainable_variables)
optimizer.apply_gradients(zip(grads, trainable_variables))
epoch_cost += minibatch_cost / minibatch_size
# Print the cost every epoch
if print_cost == True and epoch % 10 == 0:
print ("Cost after epoch %i: %f" % (epoch, epoch_cost))
if print_cost == True and epoch % 5 == 0:
costs.append(epoch_cost)
# Plot the cost
plt.plot(np.squeeze(costs))
plt.ylabel('cost')
plt.xlabel('iterations (per fives)')
plt.title("Learning rate =" + str(learning_rate))
plt.show()
# Save the parameters in a variable
print ("Parameters have been trained!")
return parameters
# training the model on the data set
model(new_train, new_y_train, new_test, new_y_test, num_epochs=200)
Cost after epoch 0: 0.742591 Cost after epoch 10: 0.614557 Cost after epoch 20: 0.598900 Cost after epoch 30: 0.588907 Cost after epoch 40: 0.579898 Cost after epoch 50: 0.570628 Cost after epoch 60: 0.560898 Cost after epoch 70: 0.550808 Cost after epoch 80: 0.540497 Cost after epoch 90: 0.488141 Cost after epoch 100: 0.478272 Cost after epoch 110: 0.472865 Cost after epoch 120: 0.468991 Cost after epoch 130: 0.466015 Cost after epoch 140: 0.463661 Cost after epoch 150: 0.461677 Cost after epoch 160: 0.459951 Cost after epoch 170: 0.458392 Cost after epoch 180: 0.456970 Cost after epoch 190: 0.455647
Parameters have been trained!
{'W1': <tf.Variable 'Variable:0' shape=(25, 12288) dtype=float32, numpy= array([[ 0.00159522, -0.00737918, 0.00893291, ..., -0.01227797, 0.01642201, 0.00506484], [ 0.02264025, 0.0067227 , 0.00795862, ..., 0.00284724, 0.01910819, 0.00122853], [-0.00173583, -0.00872451, -0.01410439, ..., -0.00733834, 0.02050859, -0.0268302 ], ..., [-0.00126929, 0.01729332, 0.02082342, ..., 0.01709594, 0.00429358, -0.00733263], [ 0.00268257, 0.00410495, 0.00936706, ..., 0.01222281, -0.02717606, 0.01498352], [-0.00145541, 0.02459595, 0.00339064, ..., -0.02478788, 0.02716016, -0.00306428]], dtype=float32)>, 'W2': <tf.Variable 'Variable:0' shape=(12, 25) dtype=float32, numpy= array([[ 0.03270398, -0.13031 , 0.16566683, -0.20850259, -0.2404858 , -0.10598166, -0.01016674, 0.12317106, -0.00411659, -0.3709333 , 0.45312327, -0.3642326 , 0.09766971, 0.18042909, -0.05753208, -0.13796303, -0.04518652, -0.15597364, -0.00236228, -0.05681378, 0.07734591, 0.01733258, -0.04763132, 0.31054643, 0.18095495], [ 0.27500606, 0.0652916 , 0.19277133, 0.00808901, -0.3506106 , -0.04379589, 0.00529772, 0.14074522, -0.22700673, -0.08254708, -0.10437232, -0.27877635, -0.22737738, -0.15467171, -0.30434558, 0.4284142 , 0.04013086, 0.14082551, 0.4080341 , 0.19127996, -0.08289494, 0.19833343, -0.18854786, 0.11045365, -0.10293514], [ 0.07370562, 0.12879197, -0.38048762, -0.1428371 , -0.16866724, -0.12560502, 0.08047906, -0.1422233 , -0.32914424, 0.11487048, 0.21897405, 0.1428981 , 0.4108542 , -0.02966296, -0.11487778, 0.28352287, 0.25715733, -0.12365948, 0.14695017, -0.39992067, -0.11544652, -0.11918075, -0.5031594 , -0.16646984, -0.04636551], [-0.11886354, 0.19529893, -0.13205235, -0.46206364, 0.07806109, -0.36992034, -0.06379854, 0.3715831 , 0.07556012, 0.51988 , -0.01714148, 0.35476214, 0.09361272, 0.17954445, 0.00514462, 0.04280851, 0.10517999, 0.03767022, -0.23309664, -0.23678231, -0.07444265, -0.30713868, -0.11694648, 0.32925925, -0.09511968], [ 0.15940417, 0.0393942 , 0.47869277, 0.22657531, 0.03725046, -0.51921755, -0.0173153 , -0.31578007, -0.21672027, 0.04122906, 0.04947521, -0.29094276, -0.03152777, 0.47902155, 0.31676555, 0.04739 , 0.0777043 , 0.31394583, -0.02500663, 0.10048122, -0.05332499, -0.34107792, -0.13928486, 0.124021 , -0.41300818], [-0.14994687, 0.03965308, -0.47870165, -0.07975383, 0.0975506 , -0.00232861, -0.26367775, -0.23967458, 0.2494654 , 0.22969168, -0.3077367 , 0.1017215 , 0.03053022, 0.26468748, -0.51858556, -0.08669728, 0.03128891, 0.28504834, 0.20724736, -0.14461055, -0.09631127, 0.2553377 , 0.0313108 , 0.28684515, 0.02228327], [-0.20329641, -0.2922766 , -0.03024991, 0.00603078, 0.34428513, 0.14932795, -0.42723438, 0.07875892, 0.06157893, -0.19437577, 0.03054013, -0.20949648, 0.2890019 , 0.03168807, 0.18291236, -0.1762907 , -0.2162296 , 0.02522451, -0.17976454, 0.20999095, 0.13074146, 0.12900151, -0.29620144, 0.39828372, 0.35581756], [-0.08132942, 0.0508789 , 0.03970909, -0.06884057, -0.07758211, 0.2122033 , 0.16169944, -0.05766107, -0.04837854, -0.23052695, 0.2551639 , -0.2933403 , -0.16104451, -0.11232601, -0.1305835 , 0.05021809, 0.18621859, -0.07786819, 0.10281897, -0.06372993, 0.41251048, -0.01803587, 0.04746069, 0.27628535, -0.21901166], [ 0.28539088, 0.2062927 , -0.38372174, 0.26297212, 0.23504944, 0.18105377, 0.25501856, -0.19114958, 0.3558071 , 0.00106932, -0.33252394, -0.09722907, -0.00984806, 0.22310142, -0.22939922, -0.02731948, 0.18572627, -0.00867934, 0.47467563, 0.00131025, 0.3148377 , -0.22662118, 0.12927507, 0.04265387, -0.45121887], [-0.23054188, -0.22334962, -0.18913192, 0.15417176, 
-0.07368276, -0.0554374 , 0.12214173, 0.3880139 , -0.01242276, 0.11768965, 0.26777858, -0.06251994, -0.12100054, -0.12495217, -0.03189994, -0.50085783, -0.09560107, -0.2402923 , 0.07087833, 0.03642716, -0.00494978, -0.36984688, 0.00878784, 0.24595839, -0.13239339], [ 0.3191285 , 0.02266271, -0.06669851, -0.33996752, 0.36436075, -0.29865557, -0.0511701 , -0.3724362 , 0.27359566, 0.20692119, 0.02171062, 0.10230298, -0.3980014 , 0.02363082, 0.13089393, 0.33540612, 0.08214815, 0.20031555, -0.08127788, -0.28784147, 0.17327178, -0.13266881, 0.28894275, -0.19869862, -0.03405774], [ 0.18820514, -0.20398362, -0.03503592, -0.36792815, -0.22963937, 0.23911734, -0.04237935, -0.0165515 , -0.05906205, 0.16423815, -0.3201712 , 0.15379828, 0.14842784, -0.24647985, -0.08833575, 0.13306315, 0.41101152, 0.3626319 , 0.335514 , 0.05405051, 0.21186371, 0.0197499 , 0.45979315, 0.04402938, 0.36662805]], dtype=float32)>, 'W3': <tf.Variable 'Variable:0' shape=(6, 12) dtype=float32, numpy= array([[ 0.04761663, -0.18691424, 0.23871914, -0.29910994, -0.3481434 , -0.14891662, -0.01468904, 0.17718892, -0.00528843, -0.53165394, 0.651227 , -0.52562135], [ 0.14137168, 0.25856522, -0.08242729, -0.24897729, -0.086944 , -0.23124097, -0.00362752, -0.08165746, 0.108671 , 0.02485008, -0.10594466, 0.4071277 ], [ 0.2588232 , 0.39531502, 0.09260609, 0.19811381, -0.02675303, -0.5141329 , -0.06277467, 0.0073774 , 0.2010778 , -0.3234572 , -0.1716854 , -0.21085015], [-0.39999744, -0.3245937 , -0.22211905, -0.4464757 , 0.6061089 , 0.06233858, 0.20539553, 0.5849541 , 0.27438077, -0.1188482 , 0.2764906 , -0.2822454 ], [ 0.15867612, -0.14753208, 0.10718007, 0.20221178, -0.5397338 , -0.19613965, -0.24190295, -0.17912963, 0.1174304 , -0.20150405, -0.45716155, 0.17953748], [ 0.31372803, 0.20475909, 0.5910115 , -0.070414 , -0.17219493, 0.39685246, 0.37256533, -0.17371103, 0.20850411, -0.5733746 , -0.1864096 , -0.19104952]], dtype=float32)>, 'b1': <tf.Variable 'Variable:0' shape=(25, 1) dtype=float32, numpy= array([[ 0.03964256], [-0.15545043], [ 0.19885883], [-0.24874453], [-0.2867674 ], [-0.12604605], [-0.01213097], [ 0.14784044], [-0.0041317 ], [-0.44089788], [ 0.5405422 ], [-0.43450323], [ 0.1176388 ], [ 0.21523888], [-0.06772597], [-0.16429298], [-0.0525962 ], [-0.18479516], [-0.00280251], [-0.06777475], [ 0.09226809], [ 0.02067652], [-0.05682073], [ 0.37065938], [ 0.21586621]], dtype=float32)>, 'b2': <tf.Variable 'Variable:0' shape=(12, 1) dtype=float32, numpy= array([[ 0.05586328], [-0.22080198], [ 0.28016466], [-0.35078028], [-0.407107 ], [-0.17678551], [-0.01738933], [ 0.20840582], [-0.00978938], [-0.6255955 ], [ 0.7664691 ], [-0.6127304 ]], dtype=float32)>, 'b3': <tf.Variable 'Variable:0' shape=(6, 1) dtype=float32, numpy= array([[ 0.07464747], [-0.31648076], [ 0.357774 ], [-0.48544684], [-0.5509958 ], [-0.24964482]], dtype=float32)>}
And with that, we are done building a 3-layer neural network with TensorFlow.