
Index

Basics of Keras

Elements of Neural Networks


In this video, I describe the elements of Keras' abstraction of neural networks using simple graphical representations.

Building Neural Networks with Keras


Here is a video in which we discuss the specific API calls provided by the Keras library. We work through a complete example of using the Keras API to perform line fitting.

The complete Jupyter notebook


Keras API

In [29]:
import numpy as np
import matplotlib.pyplot as pl
import tensorflow as tf

Layers

The Keras API is designed around the modular composition of layers.

Using layers, we can construct ever more complex neural networks.

Each layer is a function: $ f(x | \theta) $

  • $x$ is the input which is a tensor of some shape.

  • $\theta$ is a collection of model parameters.

  • The shapes of the input $x$ and of the layer output are important.

Each layer is callable.

layer(input_tensor_batch)

The input must be a batch of input tensors.

Why batch processing?

  • Training on a single observation at a time is too inefficient.
  • When we have billions of observations, batching becomes necessary.
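
For instance, a single observation still has to be wrapped into a batch of one before it is passed to a layer. Here is a minimal sketch (the layer and the input below are our own illustration, not part of the notebook):

import numpy as np
import tensorflow.keras.layers as layers

layer = layers.Dense(3)        # any layer will do
x = np.ones(10)                # a single observation of shape (10,)
x_batch = x[np.newaxis, :]     # wrap it into a batch of one: shape (1, 10)
layer(x_batch)                 # the output has shape (1, 3)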

Dense Layer

$$f(x | w, b) = w\cdot x + b$$

Input:

  • $x$ needs to have the shape $(n,)$
  • The model parameters are $(w, b)$ where

    • $w : (m, n)$ and
    • $b: (m,)$
  • They are usually randomly initialized.

  • The output shape is $(m,)$. We call $m$ the output dimensionality of the layer.
In [22]:
#
# The layers module contains all the built-in layer types.
#
import tensorflow.keras.layers as layers

#
# We can construct a dense layer and specify the
# output dimensionality.
#
dense = layers.Dense(5)

#
# Layers are callable.
#
# It's important to note that the input to a layer is
# a **batch** of input vectors.
#
dense(np.ones((4,10)))
Out[22]:
<tf.Tensor: shape=(4, 5), dtype=float32, numpy=
array([[ 0.9466287 , -1.0576787 ,  0.6624387 ,  0.55489326, -0.07569325],
       [ 0.9466287 , -1.0576787 ,  0.6624387 ,  0.55489326, -0.07569325],
       [ 0.9466287 , -1.0576787 ,  0.6624387 ,  0.55489326, -0.07569325],
       [ 0.9466287 , -1.0576787 ,  0.6624387 ,  0.55489326, -0.07569325]],
      dtype=float32)>
In [23]:
#
# Layer objects provide many methods and properties.
# Layer.weights gives us the list of tf.Variables which
# are the model parameters associated with the layer.
#
dense.weights
Out[23]:
[<tf.Variable 'dense_9/kernel:0' shape=(10, 5) dtype=float32, numpy=
 array([[ 0.6025072 , -0.43672186,  0.01523787, -0.05744964, -0.02783871],
        [ 0.3051511 ,  0.5362379 , -0.00778657,  0.1384775 ,  0.59329134],
        [-0.30592373, -0.12877333, -0.15314567,  0.36254793, -0.59227914],
        [ 0.26332307, -0.01757264,  0.48372084,  0.10010713,  0.06260192],
        [-0.590446  , -0.37000114,  0.40983242,  0.49618083,  0.53072447],
        [ 0.5794999 , -0.08634806, -0.47732228, -0.27232084, -0.05219626],
        [-0.19411871, -0.22270411,  0.4445173 , -0.45263886, -0.5300445 ],
        [-0.5467794 , -0.42953312,  0.08185697, -0.4732017 ,  0.21752769],
        [ 0.31189018, -0.3273468 ,  0.32280928,  0.42046827, -0.28268683],
        [ 0.5215251 ,  0.4250844 , -0.45728153,  0.29272258,  0.00520676]],
       dtype=float32)>,
 <tf.Variable 'dense_9/bias:0' shape=(5,) dtype=float32, numpy=array([0., 0., 0., 0., 0.], dtype=float32)>]
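
As a quick sanity check (our own addition, not part of the original notebook), we can reproduce the dense layer's output by hand from these weights. Keras stores the kernel with shape (input dimension, output dimension), so the computation is the input batch times the kernel, plus the bias:

x = np.ones((4, 10))
kernel, bias = dense.weights
manual = x @ kernel.numpy() + bias.numpy()
np.allclose(manual, dense(x).numpy())    # True: the layer computes a matrix product plus a bias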

Activation Layer

An activation function is just another name for an element-wise rescaling function.

$$ f: \mathbb{R}^n\to\mathbb{R}^n $$

Here are some popular activation functions:

  • Rectified linear unit (ReLU): $$f(x) = \left\{\begin{array}{ll} 0 & \mathrm{if}\ x \leq 0 \\ x & \mathrm{else} \end{array}\right.$$
  • Sigmoid: $$ f(x) = \frac{1}{1+e^{-x}}$$

Other frequently used activation functions include softmax and tanh.
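
As an illustration (our own example, not from the notebook), a ReLU activation layer zeroes out the negative entries of its input and passes the positive ones through unchanged, exactly as in the formula above:

relu = layers.Activation('relu')
relu(tf.constant([[-2.0, -0.5, 0.0, 0.5, 2.0]]))
# -> [[0. , 0. , 0. , 0.5, 2. ]]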

In [34]:
#
# We can build a sigmoid activation layer.
#
sigmoid = layers.Activation('sigmoid')

sigmoid(tf.zeros((3,2)))
Out[34]:
<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[0.5, 0.5],
       [0.5, 0.5],
       [0.5, 0.5]], dtype=float32)>
In [35]:
#
# Activation layers do not have any model parameters
#
sigmoid.weights
Out[35]:
[]

We will explore many other types of layers throughout the course.

Model

Keras bundles the layers, the loss function, and the optimizer into a single Model object.

Models are extremely easy to build using the sequential model API.

Using the sequential API, we specify a sequence of layers and add them to the Sequential model.

We need an input layer.

  • The input layer is just a placeholder that tells Keras the shape of the tensors to expect during training and prediction.
In [67]:
import tensorflow.keras.models as models

#
# This is all we need for linear regression.
#
model = models.Sequential([
    layers.Input(shape=(1,)),
    layers.Dense(1),
])

Before training, we need to pick the loss function and the optimizer.

We can also specify many other useful training-related properties.

In [68]:
import tensorflow.keras.losses as losses
import tensorflow.keras.optimizers as optimizers
import tensorflow.keras.metrics as metrics

model.compile(
    loss=losses.MeanSquaredError(),
    optimizer=optimizers.SGD(learning_rate=1e-5),
    metrics=[metrics.MeanAbsoluteError()],
)

Inspecting the model

A Keras model can be inspected with its summary method.

In [101]:
model.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_13 (Dense)             (None, 1)                 2         
=================================================================
Total params: 2
Trainable params: 2
Non-trainable params: 0
_________________________________________________________________

Training

Let's generate some data.

In [69]:
x_data = np.linspace(0, 10, 1000)
y_data = 3*x_data + np.random.randn(1000)*4
pl.plot(x_data, y_data, '.');

The model has a fit method that tunes the model parameters using a gradient-based training loop.

In [99]:
model.fit(x_data, y_data, epochs=10, batch_size=32, verbose=2)
Epoch 1/10
32/32 - 0s - loss: 17.1070 - mean_absolute_error: 3.3410
Epoch 2/10
32/32 - 0s - loss: 17.0865 - mean_absolute_error: 3.3386
Epoch 3/10
32/32 - 0s - loss: 17.0677 - mean_absolute_error: 3.3364
Epoch 4/10
32/32 - 0s - loss: 17.0491 - mean_absolute_error: 3.3341
Epoch 5/10
32/32 - 0s - loss: 17.0305 - mean_absolute_error: 3.3318
Epoch 6/10
32/32 - 0s - loss: 17.0142 - mean_absolute_error: 3.3299
Epoch 7/10
32/32 - 0s - loss: 16.9978 - mean_absolute_error: 3.3279
Epoch 8/10
32/32 - 0s - loss: 16.9812 - mean_absolute_error: 3.3261
Epoch 9/10
32/32 - 0s - loss: 16.9658 - mean_absolute_error: 3.3240
Epoch 10/10
32/32 - 0s - loss: 16.9514 - mean_absolute_error: 3.3222
Out[99]:
<tensorflow.python.keras.callbacks.History at 0x7fcdf46cc8e0>

Prediction

Okay, now the model is reasonably trained. Let's see how well it works.

model.predict performs prediction on a given batch of input tensors.

Note that model.predict returns a NumPy array.

In [100]:
y_pred = model.predict(x_data)

pl.plot(x_data, y_data, '.', color='#ccc');
pl.plot(x_data, np.squeeze(y_pred), linewidth=2);

Saving and loading the model

Keras can save the model to a file.

  1. Model architecture is serialized.
  2. Model parameters are serialized.
In [102]:
model.save('./model.h5')

Let's try to load the model.

In [103]:
restored_model = models.load_model('./model.h5')
restored_model.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_13 (Dense)             (None, 1)                 2         
=================================================================
Total params: 2
Trainable params: 2
Non-trainable params: 0
_________________________________________________________________
In [106]:
pl.plot(x_data, y_data, '.', color='#ccc');
pl.plot(x_data, restored_model.predict(x_data).squeeze());
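
As a final sanity check (our own addition), the restored model should make exactly the same predictions as the original, since both the architecture and the parameters were saved:

np.allclose(model.predict(x_data), restored_model.predict(x_data))   # True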