A Simple Visualization of Convolutional Neural Networks

This tutorial walks through a simple visualization of convolutional neural network weights.

In the first half of this tutorial, we define an extremely simple CNN model and hand-specify a few filter weights to use as its convolution kernels.

In the second half, we use the FashionMNIST dataset, define a CNN with two convolutional layers, train it to above 85% accuracy, and then visualize its convolution kernels.

1. Visualizing a simple convolutional network

1.1 Visualizing a convolutional layer with hand-specified filters

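In the exercise below, we hand-define several filters resembling the Sobel operator and assign them to an extremely simple convolutional neural network. We then visualize the outputs (that is, the feature maps) of the convolutional layer's 4 filters.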

Load the target image

import cv2
import matplotlib.pyplot as plt
%matplotlib inline

img_path = 'images/udacity_sdc.png'
bgr_img = cv2.imread(img_path)

gray_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2GRAY)
gray_img = gray_img.astype("float32")/255

plt.imshow(gray_img, cmap='gray')
plt.show()

Define the filters by hand

import numpy as np

filter_vals = np.array([[-1, -1, 1, 1], [-1, -1, 1, 1], [-1, -1, 1, 1], [-1, -1, 1, 1]])

# Derive more filters by negating and transposing the base one
filter_1 = filter_vals
filter_2 = -filter_1
filter_3 = filter_1.T
filter_4 = -filter_3
filters = np.array([filter_1, filter_2, filter_3, filter_4])

fig = plt.figure(figsize=(10, 5))
for i in range(4):
    ax = fig.add_subplot(1, 4, i+1, xticks=[], yticks=[])
    ax.imshow(filters[i], cmap='gray')
    ax.set_title('Filter %s' % str(i+1))
    width, height = filters[i].shape
    for x in range(width):
        for y in range(height):
            ax.annotate(str(filters[i][x][y]), xy=(y,x),
                       horizontalalignment='center',
                       verticalalignment='center', 
                       color='white' if filters[i][x][y] < 0 else 'black')
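
To see what these hand-defined filters respond to, here is a minimal pure-Python sketch (no PyTorch) of the sliding-window sum of products that a convolutional layer computes. The helper name `correlate2d_valid` and the toy 4x6 image are assumptions made for illustration; they are not part of the original exercise.

```python
def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and sum the elementwise products."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k_h) for j in range(k_w))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Same values as filter_vals above: negative on the left, positive on the right.
filter_1 = [[-1, -1, 1, 1] for _ in range(4)]
filter_2 = [[-v for v in row] for row in filter_1]   # negated
filter_3 = [list(col) for col in zip(*filter_1)]     # transposed

# Hypothetical toy image: dark (0) on the left, bright (1) on the right.
toy_image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

resp_1 = correlate2d_valid(toy_image, filter_1)
resp_2 = correlate2d_valid(toy_image, filter_2)
resp_3 = correlate2d_valid(toy_image, filter_3)
print(resp_1)  # [[4, 8, 4]]
print(resp_2)  # [[-4, -8, -4]]
print(resp_3)  # [[0, 0, 0]]
```

The first filter responds most strongly where its window is centered on the dark-to-bright vertical edge, the negated filter gives the opposite sign, and the transposed filter, which looks for horizontal edges, gives no response on this image.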

Define a simple convolutional neural network

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, weight):
        super(Net, self).__init__()
        k_height, k_width = weight.shape[2:]
        self.conv = nn.Conv2d(1, 4, kernel_size=(k_height, k_width), bias=False)
        self.conv.weight = torch.nn.Parameter(weight)
        self.pool = nn.MaxPool2d(4,4)
        
    def forward(self, x):
        conv_x = self.conv(x)
        activated_x = F.relu(conv_x)
        pooled_x = self.pool(activated_x)
        
        return conv_x, activated_x, pooled_x
    
# filters has shape (4, 4, 4)
# weight is expanded to (4, 1, 4, 4); the added dimension of size 1 is the single input channel
weight = torch.from_numpy(filters).unsqueeze(1).type(torch.FloatTensor)
model = Net(weight)

print('Filters shape: ', filters.shape)
print('weights shape: ', weight.shape)
print(model)

Filters shape:  (4, 4, 4)
weights shape:  torch.Size([4, 1, 4, 4])
Net(
  (conv): Conv2d(1, 4, kernel_size=(4, 4), stride=(1, 1), bias=False)
  (pool): MaxPool2d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
)

Visualize the convolution output

Define a helper function, viz_layer, that visualizes the output of a given layer.

def viz_layer(layer, n_filters=4):
    fig = plt.figure(figsize=(20, 20))
    
    for i in range(n_filters):
        ax = fig.add_subplot(1, n_filters, i+1, xticks=[], yticks=[])
        ax.imshow(np.squeeze(layer[0,i].data.numpy()), cmap='gray')
        ax.set_title('Output %s' % str(i+1))

# Show the original image
plt.imshow(gray_img, cmap='gray')
# Show the filters (convolution kernels)
fig = plt.figure(figsize=(12, 6))
fig.subplots_adjust(left=0, right=1.5, bottom=0.8, top=1, hspace=0.05, wspace=0.05)
for i in range(4):
    ax = fig.add_subplot(1, 4, i+1, xticks=[], yticks=[])
    ax.imshow(filters[i], cmap='gray')
    ax.set_title('Filter %s' % str(i+1))
    
# Add a batch dimension and a channel dimension to gray_img, and convert it to a tensor
gray_img_tensor = torch.from_numpy(gray_img).unsqueeze(0).unsqueeze(1)
print(gray_img.shape)
print(gray_img_tensor.shape)

# Pass the input image through the model to get the outputs
conv_layer, activated_layer, pooled_layer = model(gray_img_tensor)

# Visualize the convolution output
viz_layer(conv_layer)

(213, 320)
torch.Size([1, 1, 213, 320])
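
The sizes of the feature maps can be worked out from the standard output-size formula for convolution and pooling, floor((n + 2 * padding - kernel) / stride) + 1 along each axis. The sketch below applies it to the image shape printed above; the helper name `out_size` is an assumption made for illustration.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output length along one axis: floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

image_hw = (213, 320)  # shape of gray_img printed above

# 4x4 convolution, stride 1, no padding (the conv layer in Net)
conv_hw = tuple(out_size(n, kernel=4) for n in image_hw)
# 4x4 max pooling with stride 4 (the pool layer in Net)
pool_hw = tuple(out_size(n, kernel=4, stride=4) for n in conv_hw)

print(conv_hw)  # (210, 317)
print(pool_hw)  # (52, 79)
```

ReLU does not change the spatial size, so the activated output has the same height and width as the convolution output.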

# Visualize the output after the convolution and the activation function
viz_layer(activated_layer)

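1.2 Visualizing the pooling layer with hand-specified filters

Next, we visualize the output after the pooling layer.

# Visualize the output after the pooling layer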
viz_layer(pooled_layer)
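
To make the difference between the three visualized outputs concrete, here is a small pure-Python sketch of what ReLU and max pooling do to a feature map. The helper names `relu` and `max_pool2d` and the toy values are assumptions made for illustration; the sketch pools over 2x2 windows for brevity, whereas the model above uses 4x4.

```python
def relu(feature_map):
    """Replace every negative response with zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool2d(feature_map, size):
    """Non-overlapping max pooling (stride == size); leftover rows/columns are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hypothetical 4x4 convolution output with positive and negative responses.
toy_map = [
    [ 4, -8,  2,  0],
    [-1,  3, -5,  6],
    [ 0, -2, -3, -1],
    [ 7,  1, -4, -6],
]

activated = relu(toy_map)
pooled = max_pool2d(activated, size=2)
print(activated)  # [[4, 0, 2, 0], [0, 3, 0, 6], [0, 0, 0, 0], [7, 1, 0, 0]]
print(pooled)     # [[4, 6], [7, 0]]
```

This is why the activated output looks like the convolution output with its dark (negative) regions flattened to a single level, and why the pooled output looks like a coarser version that keeps only the strongest response in each window.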

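2. Visualizing a multi-layer convolutional network

In the exercise below, we define a somewhat more complex neural network, train it on the FashionMNIST dataset to above 85% accuracy, and then analyze the network visually.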

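2.1 Loading the FashionMNIST dataset

FashionMNIST can be seen as an upgrade of the MNIST dataset. The digit patterns in MNIST are by now relatively simple, arguably too simple to serve as a target dataset for deep neural network models. FashionMNIST replaces the image content with fashion items while keeping the image format unchanged, so it can be used almost exactly like MNIST while being a better test of a model's ability to learn patterns in the data.

The FashionMNIST classes: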

0: T-shirt/top
1: Trouser
2: Pullover
3: Dress
4: Coat
5: Sandal
6: Shirt
7: Sneaker
8: Bag
9: Ankle boot

Load the FashionMNIST dataset

import torch
import torchvision

from torchvision.datasets import FashionMNIST
from torch.utils.data import DataLoader
from torchvision import transforms

data_transform = transforms.ToTensor()

train_data = FashionMNIST(root='./data', train=True,
                         download=False, transform=data_transform)
test_data = FashionMNIST(root='./data', train=False,
                         download=False, transform=data_transform)

# Print out some stats about the training and test data
print('Train data, number of images: ', len(train_data))
print('Test data, number of images: ', len(test_data))

Train data, number of images:  60000
Test data, number of images:  10000

Create the data loaders

batch_size = 20

train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=True)

# specify the image classes
classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

Visualize a sample of the dataset

import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy()

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(batch_size):
    ax = fig.add_subplot(2, batch_size//2, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title(classes[labels[idx]])

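2.2 Training the multi-layer convolutional model

Define the model

Below we define a model with two convolutional layers; the added dropout helps, to some extent, to prevent overfitting.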

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.activation_l = nn.ReLU()
        
        self.fc = nn.Linear(32 * 7 * 7, 24)
        self.out = nn.Linear(24, 10)
        self.dropout = nn.Dropout(p=0.5)
        self.activation_out = nn.Softmax(dim=1)
        
    def forward(self, x):
        x = self.activation_l(self.conv1(x))
        x = self.pool1(x)
        x = self.activation_l(self.conv2(x))
        x = self.pool2(x)
        
        x = x.view(x.size(0), -1)
        x = self.activation_l(self.fc(x))
        x = self.dropout(x)
        x = self.activation_out(self.out(x))
        
        return x
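
The `32 * 7 * 7` input size of the first fully connected layer follows from the layer settings: each 3x3 convolution with padding=1 keeps the spatial size, and each 2x2 pooling halves it, so a 28x28 FashionMNIST image shrinks to 7x7 with 32 channels. The sketch below checks this arithmetic and counts the parameters per layer; the helper names `conv2d_params` and `linear_params` are assumptions made for illustration.

```python
def conv2d_params(in_ch, out_ch, k):
    """Weights (in_ch * out_ch * k * k) plus one bias per output channel."""
    return in_ch * out_ch * k * k + out_ch

def linear_params(in_features, out_features):
    """Weights plus one bias per output feature."""
    return in_features * out_features + out_features

side = 28           # FashionMNIST images are 28x28
side = side // 2    # after pool1 (conv1 with padding=1 keeps 28)
side = side // 2    # after pool2 (conv2 with padding=1 keeps 14)
flat_features = 32 * side * side

params = {
    'conv1': conv2d_params(1, 16, 3),
    'conv2': conv2d_params(16, 32, 3),
    'fc': linear_params(flat_features, 24),
    'out': linear_params(24, 10),
}
total = sum(params.values())

print(flat_features)  # 1568
print(params)         # {'conv1': 160, 'conv2': 4640, 'fc': 37656, 'out': 250}
print(total)          # 42706
```

Most of the parameters sit in the first fully connected layer, not in the convolution kernels that are visualized later.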

Train the model

import torch.optim as optim

# Instantiate the network before building the optimizer
net = Net()

criterion = nn.CrossEntropyLoss()

optimizer = optim.Adam(net.parameters())

def train(n_epochs):
    for epoch in range(n_epochs):
        running_loss = 0.0
        for batch_i, data in enumerate(train_loader):
            inputs, labels = data
            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            
            if batch_i % 1000 == 999:
                print('Epoch: {}, Batch: {}, Avg. Loss: {}'.format(epoch + 1, batch_i+1, running_loss/1000))
                running_loss = 0.0
                
    print('Finished Training')
    
n_epochs = 10

train(n_epochs)

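import os

model_dir = 'saved_models/'
model_name = 'model_best.pt'

# Create the target directory if it does not exist yet, then save the weights
os.makedirs(model_dir, exist_ok=True)
torch.save(net.state_dict(), model_dir+model_name)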
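
One detail of this setup is worth noting: nn.CrossEntropyLoss applies log-softmax to its input internally, while the model defined above already ends in a Softmax layer, so softmax is effectively applied twice. The pure-Python sketch below illustrates the consequence for a 10-class problem. The function name `cross_entropy` and the example score vectors are assumptions made for illustration, not code from the original exercise.

```python
import math

def cross_entropy(scores, target):
    """Cross-entropy of one sample: -log(softmax(scores)[target])."""
    log_sum_exp = math.log(sum(math.exp(s) for s in scores))
    return log_sum_exp - scores[target]

# Raw logits: a confident, correct prediction drives the loss towards 0.
logits = [20.0] + [0.0] * 9
loss_from_logits = cross_entropy(logits, target=0)

# Probabilities, as produced by a Softmax output layer: even a perfect
# one-hot output cannot push the loss below log(1 + 9/e).
probs = [1.0] + [0.0] * 9
loss_from_probs = cross_entropy(probs, target=0)

print(loss_from_logits)
print(loss_from_probs)
```

With probabilities as input, the loss for 10 classes is bounded below by log(1 + 9/e), roughly 1.46. The model still trains, as the accuracy reported later shows, but the loss values are not comparable with those of a network that outputs raw logits, and removing the final Softmax during training may be worth trying.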

Load the trained model

net = Net()

net.load_state_dict(torch.load('saved_models/model_best.pt'))

print(net)

Net(
  (conv1): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (activation_l): ReLU()
  (fc): Linear(in_features=1568, out_features=24, bias=True)
  (out): Linear(in_features=24, out_features=10, bias=True)
  (dropout): Dropout(p=0.5)
  (activation_out): Softmax()
)

Test the model on the test dataset

test_loss = torch.zeros(1)
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

print(class_correct)
print(test_loss)

[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
tensor([ 0.])

net.eval()

criterion = torch.nn.CrossEntropyLoss()

for batch_i, data in enumerate(test_loader):
    inputs, labels = data
    output = net(inputs)
    loss = criterion(output, labels)
    
    # update average test loss 
    test_loss = test_loss + ( (torch.ones(1) / (batch_i+1)) * (loss.data - test_loss) )
    
    _, predicted = torch.max(output.data, 1)
    
    correct = np.squeeze(predicted.eq(labels.data.view_as(predicted)))
    
    for i in range(len(labels)):
        label = labels.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1
        
print('Test Loss: {:.6f}\n'.format(test_loss.numpy()[0]))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            classes[i], 100 * class_correct[i] / class_total[i],
            np.sum(class_correct[i]), np.sum(class_total[i])))
    else:
        print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))

        
print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))

Test Loss: 2.362950

Test Accuracy of T-shirt/top: 85% (850/1000)
Test Accuracy of Trouser: 96% (963/1000)
Test Accuracy of Pullover: 84% (842/1000)
Test Accuracy of Dress: 91% (911/1000)
Test Accuracy of  Coat: 85% (856/1000)
Test Accuracy of Sandal: 98% (989/1000)
Test Accuracy of Shirt: 49% (495/1000)
Test Accuracy of Sneaker: 94% (948/1000)
Test Accuracy of   Bag: 97% (978/1000)
Test Accuracy of Ankle boot: 93% (930/1000)

Test Accuracy (Overall): 87% (8762/10000)
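
The `test_loss` update in the loop above is an incremental mean: after each batch it moves the current average towards the new loss by 1/(batch_i+1) of the difference. The sketch below checks that this update gives the same result as an ordinary mean and re-derives the overall accuracy from the per-class counts printed above. The function name `running_mean` and the per-batch loss values are assumptions made for illustration.

```python
def running_mean(values):
    """Incremental mean: the same update rule as test_loss in the loop above."""
    mean = 0.0
    for i, value in enumerate(values):
        mean = mean + (value - mean) / (i + 1)
    return mean

# Hypothetical per-batch losses, chosen only to illustrate the update rule.
batch_losses = [2.0, 1.0, 3.0, 2.0]
mean_loss = running_mean(batch_losses)
plain_mean = sum(batch_losses) / len(batch_losses)

# Per-class correct counts copied from the printed results above.
class_correct = [850, 963, 842, 911, 856, 989, 495, 948, 978, 930]
overall_pct = 100.0 * sum(class_correct) / 10000

print(mean_loss, plain_mean)   # 2.0 2.0
print(sum(class_correct))      # 8762
print('%2d%%' % overall_pct)   # 87%
```

The `%2d` format truncates, so the printed 87% corresponds to 87.62%. The per-class figures also show that Shirt, at 49%, is by far the hardest class for this model.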

2.3 Feature visualization

The model has been trained and reaches 87% accuracy on the test data, so let's move on to visualization.

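The strategy is to extract the parameters of each convolutional layer from the model, treat them as standalone filters, and apply them with OpenCV's filter2D function to an image sampled from the test set. We then observe the effect each filter has on the image and try to explain what kind of filtering it performs on the original.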
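
As the OpenCV documentation describes it, filter2D computes a correlation (the kernel is not flipped), anchors the kernel at its center by default, returns an output of the same size as the input, and by default extrapolates border pixels by reflection (BORDER_REFLECT_101). The pure-Python sketch below mimics that behavior for small kernels so the operation can be inspected without OpenCV. The names `reflect_101` and `filter2d_like` and the toy image are assumptions made for illustration, and the sketch is an approximation of filter2D, not its implementation.

```python
def reflect_101(i, n):
    """Mirror an out-of-range index back into [0, n) without repeating the edge pixel.

    Handles a single reflection only, which is enough for kernels smaller than the image.
    """
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i

def filter2d_like(image, kernel):
    """Same-size correlation with the kernel anchored at its center."""
    h, w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    a_r, a_c = k_h // 2, k_w // 2
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            acc = 0.0
            for i in range(k_h):
                for j in range(k_w):
                    rr = reflect_101(r + i - a_r, h)
                    cc = reflect_101(c + j - a_c, w)
                    acc += image[rr][cc] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

# Hypothetical toy image with a vertical edge, and a horizontal-gradient kernel.
toy_image = [[0, 0, 1, 1] for _ in range(3)]
gradient_kernel = [[-1, 0, 1] for _ in range(3)]

filtered = filter2d_like(toy_image, gradient_kernel)
print(filtered)  # three rows of [0.0, 3.0, 3.0, 0.0]
```

nn.Conv2d also computes a cross-correlation, so applying a learned kernel this way is close to what the first layer does, with two differences: conv1 pads with zeros (not reflected pixels) and adds a bias term.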

Draw a single image from the dataset

dataiter = iter(test_loader)
images, labels = next(dataiter)
images = images.numpy()

idx = 15
img = np.squeeze(images[idx])

import cv2
plt.imshow(img, cmap='gray')


Visualize the first-layer convolution kernels

weights = net.conv1.weight.data
w = weights.numpy()
print(w.shape)

fig = plt.figure(figsize=(30, 10))
columns = 4 * 2
row = 4
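for i in range(0, columns * row):
    fig.add_subplot(row, columns, i+1)
    if ((i%2)==0):
        # even slots: the kernel weights themselves
        plt.imshow(w[int(i/2)][0], cmap='gray')
    else:
        # odd slots: the sampled image filtered with the preceding kernel
        c = cv2.filter2D(img, -1, w[int((i-1)/2)][0])
        plt.imshow(c, cmap='gray')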
plt.show()

(16, 1, 3, 3)

Visualize the second-layer convolution kernels

weights = net.conv2.weight.data
w = weights.numpy()
print(w.shape)

fig = plt.figure(figsize=(30, 20))
columns = 4 * 2
row = 8
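for i in range(0, columns * row):
    fig.add_subplot(row, columns, i+1)
    if ((i%2)==0):
        # even slots: input-channel slice 0 of the kernel (each conv2 kernel has 16 slices)
        plt.imshow(w[int(i/2)][0], cmap='gray')
    else:
        # odd slots: the sampled image filtered with that slice
        c = cv2.filter2D(img, -1, w[int((i-1)/2)][0])
        plt.imshow(c, cmap='gray')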
plt.show()

(32, 16, 3, 3)

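We can see that some kernels act as edge detectors, and that different kernels are sensitive to different orientations, different textures, or, put another way, different image content.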

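This approach, in which a person interprets the convolutions subjectively, feels somewhat thin; it is perhaps best regarded as a simple way of visualizing a neural network. Besides the convolution kernels, the fully connected layers can be visualized as well.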

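As for visualizing fully connected layers, some tutorials describe doing so by visualizing the distances between the "embedding vectors" of individual samples from similar classes; the embedding vectors produced by the fully connected layer may first need to be reduced in dimensionality with t-SNE before plotting. If I come across related material later, I will add it to this article.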

Afterword

This article is based on an exercise from the Udacity Computer Vision Nanodegree. Link to the official source code:

https://github.com/udacity/CVND_Exercises/tree/master/1_5_CNN_Layers