
Convolutional Networks

Notes on building the convolutional network

The Keras API

1 MNIST with Conv2D

In [1]:
import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.models as models
import tensorflow.keras.layers as layers
import tensorflow.keras.losses as losses
import tensorflow.keras.optimizers as optimizers
import tensorflow.keras.datasets as datasets

import numpy as np
import matplotlib.pyplot as pl
import matplotlib.patches as patches

Load the data

In [2]:
dataset = keras.datasets.mnist.load_data()
In [3]:
(x_train, y_train), (x_test, y_test) = dataset
x_train = x_train / 255.
x_test = x_test / 255.
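
A quick sanity check of the data shapes:

print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)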

Kernels and 2D Convolution (aka Filtering)

Let's examine just one of the samples.

In [4]:
x0 = x_train[0]
(c0, c1), (r0, r1) = (5, 10), (20, 25)  # column and row extents of a subregion
pl.imshow(x0, cmap='gray')
ax = pl.gca()
# Rectangle's anchor is (x, y) = (column, row)
ax.add_patch(patches.Rectangle((c0, r0), c1-c0, r1-r0,
                               facecolor='none',
                               edgecolor='red',
                               linewidth=5))
Out[4]:
<matplotlib.patches.Rectangle at 0x7fb5e8723eb0>

We will use this highlighted subregion as our kernel.

In [5]:
kernel = x0[r0:r1, c0:c1]
pl.imshow(kernel, cmap='gray');

Here is the function that performs 2D convolution.

$Y = \mathrm{convolve}(X, K)$ is given by:

$$Y[i,j] = \sum_{u=0}^{h_k-1} \sum_{v=0}^{w_k-1} X[i+u,\, j+v]\, K[u,v]$$

i.e., the sum of the elementwise product of $K$ with the $h_k \times w_k$ window of $X$ anchored at $(i,j)$. (Strictly, this is cross-correlation, which is what deep learning libraries call convolution.)
In [6]:
def conv2d(image, kernel):
    # numpy shapes are (rows, cols), i.e. (height, width)
    (h_i, w_i) = image.shape
    (h_k, w_k) = kernel.shape
    # 'valid' convolution: the output shrinks by the kernel size minus one
    h, w = (h_i - h_k + 1), (w_i - w_k + 1)
    result = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            region = image[i:i+h_k, j:j+w_k]
            result[i, j] = np.sum(region * kernel)  # elementwise product, summed
    return result
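
As a sanity check (assuming SciPy is available in the environment), this matches scipy.signal.correlate2d with mode='valid':

from scipy.signal import correlate2d

assert np.allclose(conv2d(x0, kernel),
                   correlate2d(x0, kernel, mode='valid'))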

This is the result of applying the kernel to the image. The region of strongest excitation is where the underlying image patch best matches the kernel.

In [7]:
result = conv2d(x0, kernel)
pl.imshow(result)
Out[7]:
<matplotlib.image.AxesImage at 0x7fb5e85f80a0>
In [8]:
fig = pl.figure(figsize=(15, 5))
for i in range(1, 6):
    x = x_train[i]
    pl.subplot(1, 5, i)
    pl.imshow(conv2d(x, kernel))

Using Keras Conv2D

We construct a conv2d layer.

  • filters is the number of kernels.

  • kernel_size is the spatial size (height, width) of each kernel.

In [9]:
conv2d = layers.Conv2D(filters=1, kernel_size=kernel.shape)  # note: rebinds the name conv2d (previously our NumPy function)

Keras Conv2D expects a batch of multi-channel images (e.g. RGB).

So, with the default channels-last layout, the input tensor shape should be:

(batch_size, height, width, channels).

In [10]:
output = conv2d(x0.reshape(1, 28, 28, 1))
output.shape
Out[10]:
TensorShape([1, 24, 24, 1])
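
The spatial output size of a 'valid' (unpadded) convolution is the input size minus the kernel size plus one:

$$28 - 5 + 1 = 24$$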

The Keras Conv2D layer comes with two sets of trainable parameters:

  1. Kernel
  2. Bias
In [11]:
(kernel_parameter, bias_parameter) = conv2d.get_weights()
print('kernel_parameter:', kernel_parameter.shape)
print('bias_parameter:', bias_parameter.shape)
kernel_parameter: (5, 5, 1, 1)
bias_parameter: (1,)
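
The kernel parameter has shape (kernel_height, kernel_width, input_channels, filters), hence (5, 5, 1, 1) here. A quick check of the total parameter count:

print(conv2d.count_params())  # 5*5*1*1 weights + 1 bias = 26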

Let's set these parameters manually, and see if we can reproduce the same output.

In [12]:
conv2d.set_weights([kernel.reshape(5, 5, 1, 1), np.array([0])])

output = conv2d(x0.reshape(1, 28, 28, 1))
In [13]:
pl.imshow(output.numpy().squeeze())
Out[13]:
<matplotlib.image.AxesImage at 0x7fb5dc6290a0>
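
As a quick check, the Keras output should agree (up to float32 precision) with result from the NumPy conv2d in In [7]:

print(np.allclose(output.numpy().squeeze(), result, atol=1e-4))  # True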

Convolution Preprocessing: Padding

Observation:

  1. Conv2D reduces the spatial dimensions at each application.
  2. We want to compose Conv2D layers into multi-layer networks.

Solution:

Pad the original image so that the output of Conv2D has the same spatial dimensions as the original image.

In [14]:
conv2d_padded = layers.Conv2D(filters=1, kernel_size=kernel.shape, padding='same')
print("x0.shape", x0.shape)
output = conv2d_padded(x0.reshape(1, 28, 28, 1))
print("output.shape", output.shape)
x0.shape (28, 28)
output.shape (1, 28, 28, 1)
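
For a 5x5 kernel, 'same' padding implicitly zero-pads (5-1)/2 = 2 pixels on each side; a minimal sketch of the arithmetic:

padded = np.pad(x0, 2)   # (28, 28) -> (32, 32)
print(padded.shape)      # 32 - 5 + 1 = 28, recovering the original size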

Convolution Postprocessing: Max Pooling

Pooling means aggregating a region of values into a single value; max pooling aggregates using the max function.

MaxPooling2D scans across the 2D image and takes the max over each region defined by pool_size.

The window slides over the image in steps given by strides (which defaults to pool_size), covering the entire image.
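
A minimal NumPy sketch of 2x2 max pooling (stride 2) on a 4x4 array:

a = np.arange(16).reshape(4, 4)
pooled = a.reshape(2, 2, 2, 2).max(axis=(1, 3))  # max over each 2x2 block
print(pooled)  # [[ 5  7]
               #  [13 15]]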

In [15]:
maxpooling = layers.MaxPooling2D(pool_size=(2,2))
In [16]:
output1 = conv2d(x0.reshape(1, 28, 28, 1)).numpy()
output2 = maxpooling(output1).numpy()
pl.subplot(1,2,1)
pl.imshow(output1.squeeze())

pl.subplot(1,2,2)
pl.imshow(output2.squeeze())
Out[16]:
<matplotlib.image.AxesImage at 0x7fb5dc602a60>

Putting it together into a network

We will use Conv2D / Max Pooling as feature construction, and perform classification using a dense layer followed by softmax.

  • We will allow a large number of kernels to be learned from the training data.
In [17]:
model = models.Sequential([
    layers.Input(shape=(28,28)),
    layers.Reshape((28,28,1)),                    # add channel axis: 28x28x1
    layers.Conv2D(32, (3,3), padding='same'),     # 28x28x32
    layers.MaxPooling2D((2,2)),                   # 14x14x32
    layers.Conv2D(16, (3,3), padding='same'),     # 14x14x16
    layers.MaxPooling2D((2,2)),                   # 7x7x16
    layers.Flatten(),                             # 784
    layers.Dense(10, activation='softmax'),       # class probabilities
])

model.compile(loss=losses.SparseCategoricalCrossentropy(),
              optimizer=optimizers.Adam(),
              metrics=['acc'])
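
The parameter counts can be computed by hand and checked against model.count_params():

conv1 = 3*3*1*32 + 32     # 320
conv2 = 3*3*32*16 + 16    # 4624
dense = 7*7*16*10 + 10    # 7850
print(conv1 + conv2 + dense, model.count_params())  # 12794 12794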
In [18]:
model.fit(x_train, y_train, epochs=5)
Epoch 1/5
1875/1875 [==============================] - 76s 40ms/step - loss: 0.4523 - acc: 0.8660
Epoch 2/5
1875/1875 [==============================] - 75s 40ms/step - loss: 0.0729 - acc: 0.9778
Epoch 3/5
1875/1875 [==============================] - 76s 40ms/step - loss: 0.0611 - acc: 0.9810
Epoch 4/5
1875/1875 [==============================] - 75s 40ms/step - loss: 0.0468 - acc: 0.9863
Epoch 5/5
1875/1875 [==============================] - 75s 40ms/step - loss: 0.0423 - acc: 0.9867
Out[18]:
<tensorflow.python.keras.callbacks.History at 0x7fb5dc548100>
In [19]:
model.evaluate(x_test, y_test)
313/313 [==============================] - 6s 18ms/step - loss: 0.0614 - acc: 0.9821
Out[19]:
[0.061403539031744, 0.9821000099182129]
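
To classify new images, take the argmax over the softmax outputs; a minimal sketch:

probs = model.predict(x_test[:5])  # (5, 10) class probabilities
preds = probs.argmax(axis=1)       # predicted digit for each image
print(preds, y_test[:5])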