import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import models, layers, losses, optimizers
import numpy as np
import matplotlib.pyplot as pl
import matplotlib.patches as patches
Load the data
dataset = keras.datasets.mnist.load_data()
(x_train, y_train), (x_test, y_test) = dataset
x_train = x_train / 255.
x_test = x_test / 255.
Let's examine just one of the samples.
x0 = x_train[0]
(c0, c1), (r0, r1) = (5, 10), (20, 25)
pl.imshow(x0, cmap='gray')
ax = pl.gca()
ax.add_patch(patches.Rectangle((c0, r0), c1-c0, r1-r0,
                               facecolor='none',
                               edgecolor='red',
                               linewidth=5))
We define the kernel as the 5x5 subregion inside the red rectangle.
kernel = x0[r0:r1, c0:c1]
pl.imshow(kernel, cmap='gray');
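A possible source of confusion in the cells above: NumPy indexes an image as image[row, column] with half-open ranges, whereas matplotlib's Rectangle is anchored at (x, y) = (column, row). The minimal NumPy sketch below uses a synthetic array (value 100*row + column, an illustrative stand-in for x0) to show which cells x0[r0:r1, c0:c1] selects. Note that imshow places pixel centres at integer coordinates, so the drawn outline sits half a pixel to the right of and below the sliced block.

```python
import numpy as np

# Synthetic 28x28 "image" whose value encodes its position: 100*row + col
rows, cols = np.indices((28, 28))
image = 100 * rows + cols

(c0, c1), (r0, r1) = (5, 10), (20, 25)

# Slicing is image[row_range, col_range]; both ranges are half-open
patch = image[r0:r1, c0:c1]

# Rectangle((c0, r0), width, height) takes x = column and y = row
width, height = c1 - c0, r1 - r0

print(patch.shape)                 # (5, 5)
print(patch[0, 0], patch[-1, -1])  # 2005 2409 -> rows 20..24, columns 5..9
```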
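Here is a function that performs 2D convolution. Strictly speaking it computes a cross-correlation, since the kernel is not flipped; Keras Conv2D computes the same operation.
$Y = \mathrm{convolve}(X, K)$, for a kernel $K$ of size $w_k \times h_k$, is given as:
$$Y[i,j] = \left\langle X[i:i+w_k,\ j:j+h_k],\ K \right\rangle = \sum_{a=0}^{w_k-1} \sum_{b=0}^{h_k-1} X[i+a,\ j+b]\, K[a,b]$$
where $\langle \cdot, \cdot \rangle$ is the sum of elementwise products.
def conv2d(image, kernel):
    # Valid (unpadded) 2D cross-correlation of a single-channel image with a kernel
    (w_i, h_i) = image.shape
    (w_k, h_k) = kernel.shape
    # The kernel fits in (w_i - w_k + 1) x (h_i - h_k + 1) positions
    w, h = (w_i-w_k+1), (h_i-h_k+1)
    result = np.zeros((w, h))
    for i in range(w):
        for j in range(h):
            # Window of the image under the kernel at offset (i, j)
            region = image[i:i+w_k, j:j+h_k]
            # Elementwise product, summed: the inner product <region, K>
            result[i,j] = np.sum(region * kernel)
    return result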
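The double loop is easy to read but slow in Python. As a hedged sketch (assuming NumPy 1.20 or later, which provides sliding_window_view), the same cross-correlation can be vectorized; conv2d_loop and conv2d_vectorized are illustrative names, not part of the original notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_loop(image, kernel):
    """Reference loop version, same logic as conv2d above."""
    (w_i, h_i), (w_k, h_k) = image.shape, kernel.shape
    result = np.zeros((w_i - w_k + 1, h_i - h_k + 1))
    for i in range(result.shape[0]):
        for j in range(result.shape[1]):
            result[i, j] = np.sum(image[i:i + w_k, j:j + h_k] * kernel)
    return result

def conv2d_vectorized(image, kernel):
    # windows[i, j] is the patch image[i:i+w_k, j:j+h_k]
    windows = sliding_window_view(image, kernel.shape)
    # Multiply every window by the kernel and sum over the window axes
    return np.einsum('ijkl,kl->ij', windows, kernel)

rng = np.random.default_rng(0)
image = rng.random((28, 28))
kernel = rng.random((5, 5))
a = conv2d_loop(image, kernel)
b = conv2d_vectorized(image, kernel)
print(a.shape, np.allclose(a, b))  # (24, 24) True
```

The vectorized form avoids Python-level loops; sliding_window_view returns a read-only view, so no copy of the windows is made until einsum reduces them.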
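This is the result of applying the kernel to the image. The response is large wherever the image locally resembles the kernel, including at the kernel's own location (row 20, column 5 of the output), which shows up as a region of strong excitation.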
result = conv2d(x0, kernel)
pl.imshow(result)
# Apply the same kernel to the next five training digits
fig = pl.figure(figsize=(15, 5))
for i in range(1, 6):
    x = x_train[i]
    pl.subplot(1, 5, i)
    pl.imshow(conv2d(x, kernel))
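The region of strong excitation can be checked on a controlled example. In the sketch below (synthetic data, not MNIST; conv2d is re-implemented with NumPy so the block runs on its own) a positive 5x5 pattern is pasted into a blank image at row 20, column 5. Its response to itself peaks exactly there, with a value equal to the sum of the squared kernel entries. On real digits the raw response is not normalized, so bright strokes elsewhere can also score highly.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(image, kernel):
    # Valid cross-correlation: sum of elementwise products over each window
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum('ijkl,kl->ij', windows, kernel)

rng = np.random.default_rng(1)
pattern = rng.random((5, 5)) + 0.1      # strictly positive 5x5 pattern

# Blank 28x28 image with the pattern pasted at row 20, column 5
image = np.zeros((28, 28))
image[20:25, 5:10] = pattern

response = conv2d(image, pattern)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(response), response.shape))
print(peak)                                            # (20, 5)
print(np.isclose(response[peak], np.sum(pattern**2)))  # True
```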
We construct a Conv2D layer.
filters is the number of kernels (and hence of output channels).
kernel_size is the spatial size (height, width) of each kernel.
conv2d = layers.Conv2D(filters=1, kernel_size=kernel.shape)
Keras Conv2D expects a batch of multi-channel images (an RGB image has 3 channels; an MNIST digit has 1).
With the default channels_last data format, the input tensor shape should be:
(batch_size, height, width, channels).
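Since x0 has shape (28, 28), it needs a batch axis in front and a channel axis at the end before Keras will accept it. A small NumPy sketch (zero arrays stand in for the data) showing that reshape(1, 28, 28, 1) is equivalent to inserting the two axes with np.newaxis:

```python
import numpy as np

x0 = np.zeros((28, 28))          # stand-in for one 28x28 grayscale digit

# Add a batch axis in front and a channel axis at the end
batch = x0.reshape(1, 28, 28, 1)
same = x0[np.newaxis, :, :, np.newaxis]
print(batch.shape, np.array_equal(batch, same))  # (1, 28, 28, 1) True

# A batch of N digits of shape (N, 28, 28) only needs the channel axis
x_many = np.zeros((64, 28, 28))
print(x_many[..., np.newaxis].shape)             # (64, 28, 28, 1)
```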
output = conv2d(x0.reshape(1, 28, 28, 1))
output.shape
A Keras Conv2D layer has two parameter tensors, the kernel and the bias:
(kernel_parameter, bias_parameter) = conv2d.get_weights()
print('kernel_parameter:', kernel_parameter.shape)
print('bias_parameter:', bias_parameter.shape)
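The parameter shapes follow a simple rule. The helper conv2d_param_shapes below is hypothetical (not part of Keras); it assumes the channels_last layout, where Keras stores the kernel as (kernel_height, kernel_width, in_channels, filters) and the bias as one value per filter.

```python
def conv2d_param_shapes(kernel_size, in_channels, filters):
    """Kernel shape, bias shape and parameter count of a Conv2D layer (channels_last)."""
    kh, kw = kernel_size
    kernel_shape = (kh, kw, in_channels, filters)
    bias_shape = (filters,)
    n_params = kh * kw * in_channels * filters + filters
    return kernel_shape, bias_shape, n_params

# The layer above: 5x5 kernel, 1 input channel (grayscale), 1 filter
print(conv2d_param_shapes((5, 5), in_channels=1, filters=1))
# ((5, 5, 1, 1), (1,), 26)
```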
Let's set these parameters manually and check whether we reproduce the output of our own conv2d function.
conv2d.set_weights([kernel.reshape(5, 5, 1, 1), np.array([0])])
output = conv2d(x0.reshape(1, 28, 28, 1))
pl.imshow(output.numpy().squeeze())
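To see why kernel.reshape(5, 5, 1, 1) with a zero bias reproduces our function, here is a NumPy sketch of what a Conv2D layer computes under its defaults (strides 1, padding='valid', channels_last, no activation). conv2d_multichannel is an illustrative name, not a Keras function; for each filter it sums over the kernel window and all input channels, then adds the bias.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_multichannel(images, kernel, bias):
    """Valid, stride-1 cross-correlation with a Keras-style kernel.

    images: (batch, H, W, C_in); kernel: (kh, kw, C_in, C_out); bias: (C_out,)
    returns: (batch, H - kh + 1, W - kw + 1, C_out)
    """
    kh, kw = kernel.shape[:2]
    # windows: (batch, H', W', C_in, kh, kw)
    windows = sliding_window_view(images, (kh, kw), axis=(1, 2))
    # Sum over window rows/cols (k, l) and input channels (c) for each filter (o)
    return np.einsum('bijckl,klco->bijo', windows, kernel) + bias

def conv2d(image, kernel):
    # Single-channel version, as in the notebook's conv2d function
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum('ijkl,kl->ij', windows, kernel)

rng = np.random.default_rng(2)
x0 = rng.random((28, 28))
kernel = x0[20:25, 5:10]

out = conv2d_multichannel(x0.reshape(1, 28, 28, 1),
                          kernel.reshape(5, 5, 1, 1), np.zeros(1))
print(out.shape)                                       # (1, 24, 24, 1)
print(np.allclose(out.squeeze(), conv2d(x0, kernel)))  # True
```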
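Observation: the output is 24x24, smaller than the 28x28 input, because a 5x5 kernel fits in only 28 - 5 + 1 = 24 positions along each axis.
Solution: pad the original image so that the output of Conv2D has the same dimensions as the original image (padding='same').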
conv2d_padded = layers.Conv2D(filters=1, kernel_size=kernel.shape, padding='same')
print("x0.shape", x0.shape)
output = conv2d_padded(x0.reshape(1, 28, 28, 1))
print("output.shape", output.shape)
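padding='same' can be mimicked by zero-padding before a valid correlation. A hedged NumPy sketch for stride 1: the helper same_pad is hypothetical and assumes TensorFlow's convention of adding k - 1 padding cells per axis, with any odd leftover placed at the bottom/right.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def same_pad(image, kernel_shape):
    """Zero-pad so that a stride-1 valid correlation returns image.shape.

    Assumes TensorFlow's convention: when the total padding is odd,
    the extra row/column goes at the bottom/right.
    """
    pads = []
    for k in kernel_shape:
        total = k - 1
        before = total // 2
        pads.append((before, total - before))
    return np.pad(image, pads)   # constant mode, filled with zeros

def conv2d(image, kernel):
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum('ijkl,kl->ij', windows, kernel)

x0 = np.random.default_rng(3).random((28, 28))
kernel = np.ones((5, 5))
print(conv2d(x0, kernel).shape)                          # (24, 24)
print(conv2d(same_pad(x0, kernel.shape), kernel).shape)  # (28, 28)
print(same_pad(x0, (4, 4)).shape)                        # (31, 31): 1 before, 2 after
```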
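Pooling means aggregating values over a local region; max pooling aggregates with the max function.
MaxPooling2D scans through the 2D image and takes the maximum within each window of size pool_size.
The window moves across the image in steps given by strides; when strides is not set, it defaults to pool_size, so the windows tile the image without overlapping.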
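A loop-based NumPy sketch of max pooling, assuming 'valid' padding (the MaxPooling2D default), so windows that would run past the edge are dropped. max_pool2d is an illustrative name; strides falls back to pool_size as in Keras.

```python
import numpy as np

def max_pool2d(image, pool_size=(2, 2), strides=None):
    """Max pooling over a 2D array with 'valid' padding."""
    ph, pw = pool_size
    sh, sw = strides if strides is not None else pool_size
    # Number of window positions that fit entirely inside the image
    out_h = (image.shape[0] - ph) // sh + 1
    out_w = (image.shape[1] - pw) // sw + 1
    out = np.empty((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = image[i * sh:i * sh + ph, j * sw:j * sw + pw]
            out[i, j] = window.max()
    return out

x = np.arange(16).reshape(4, 4)
print(max_pool2d(x))
# [[ 5  7]
#  [13 15]]
print(max_pool2d(np.zeros((24, 24))).shape)   # (12, 12)
```

With strides=(1, 1) the windows overlap instead, and the same 4x4 input yields a 3x3 output.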
maxpooling = layers.MaxPooling2D(pool_size=(2,2))
output1 = conv2d(x0.reshape(1, 28, 28, 1)).numpy()  # Conv2D output: (1, 24, 24, 1)
output2 = maxpooling(output1).numpy()               # after 2x2 pooling: (1, 12, 12, 1)
pl.subplot(1,2,1)
pl.imshow(output1.squeeze())
pl.subplot(1,2,2)
pl.imshow(output2.squeeze())
We use Conv2D and max-pooling layers for feature construction, and perform classification with a dense layer with softmax activation.
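Before training, it helps to trace tensor shapes through this architecture. The plain-Python sketch below computes them by hand, assuming stride-1 'same' convolutions with 3x3 kernels and 2x2 pooling with stride 2, as in the model definition; the helper names are illustrative. The parameter total should agree with what model.summary() reports.

```python
def conv_same(shape, filters):
    """'same' padding, stride 1: spatial size unchanged, channels -> filters."""
    h, w, _ = shape
    return (h, w, filters)

def pool(shape, p=2):
    """'valid' max pooling with stride equal to the pool size."""
    h, w, c = shape
    return ((h - p) // p + 1, (w - p) // p + 1, c)

shape = (28, 28, 1)                      # after Reshape((28, 28, 1))
params = 0
for filters in (32, 16):
    in_ch = shape[2]
    shape = conv_same(shape, filters)
    params += 3 * 3 * in_ch * filters + filters   # 3x3 kernel weights + bias
    print('Conv2D      ->', shape)
    shape = pool(shape)
    print('MaxPooling  ->', shape)

flat = shape[0] * shape[1] * shape[2]
params += flat * 10 + 10                 # Dense(10) weights + bias
print('Flatten     ->', flat)            # 784
print('total trainable parameters:', params)  # 12794
```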
model = models.Sequential([
    layers.Input(shape=(28,28)),
    layers.Reshape((28,28,1)),
    layers.Conv2D(32, (3,3), padding='same'),
    layers.MaxPooling2D((2,2)),  # 14x14
    layers.Conv2D(16, (3,3), padding='same'),
    layers.MaxPooling2D((2,2)),  # 7x7
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])
model.compile(loss=losses.SparseCategoricalCrossentropy(),
              optimizer=optimizers.Adam(),
              metrics=['acc'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
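The loss and metric can be written out explicitly. With a softmax output layer, SparseCategoricalCrossentropy takes integer labels and averages -log p[true class] over the batch, and 'acc' (which Keras resolves to sparse categorical accuracy for this loss) counts how often the argmax matches the label. A minimal NumPy sketch with made-up logits; the function names are illustrative, not Keras APIs.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability; each row then sums to 1
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(y_true, probs):
    # y_true holds integer class ids; take each row's probability of its true class
    return -np.log(probs[np.arange(len(y_true)), y_true]).mean()

def sparse_accuracy(y_true, probs):
    # Fraction of rows whose most probable class equals the label
    return (probs.argmax(axis=1) == y_true).mean()

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.3, 3.0]])
y = np.array([0, 1])
p = softmax(logits)
print(sparse_categorical_crossentropy(y, p))
print(sparse_accuracy(y, p))   # 0.5
```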