第二次作业:卷积神经网络 part 1

第一部分 视频学习

  • 深度学习的数学基础

  1. 机器学习三要素:模型,策略,算法

    • 模型:对要学习问题映射的假设(问题建模,确定假设空间)
    • 策略:从假设空间中学习/选择最优模型的准则(确定目标函数)
    • 算法:根据目标函数求解最优模型的具体计算方法(求解模型参数)
  2. 概率/函数形式的统一

  3. 欠拟合&过拟合

    • 欠拟合:训练集的一般性质尚未被学习器学好(训练误差大)
    • 过拟合:学习器把训练集特点当作样本的一般特点(训练误差小,测试误差大)
    • 解决方案 欠拟合:提高模型复杂度;过拟合:降低模型复杂度
  • 卷积神经网络

  1. 卷积神经网络的基本应用:分类、检索、检测、分割、人脸识别、图像生成、自动驾驶等

  2. 卷积神经网络通过局部关联和参数共享解决了过拟合问题。

  3. 卷积层--Convolutional Layer

    • 卷积是对两个实变函数的一种数学操作

    • 基本概念:input--输入、kernel/filter--卷积核/滤波器、weights--权重、reception field--感受野、feature map--特征图、depth/channel--深度、output--输出、padding

    • 输出的特征图大小:(N-F)/stride+1·······未加padding (N+padding*2-F)/stride+1········加padding时

  4. 池化层--Pooling Layer

    • 保留了主要特征的同时减少参数和计算量,防止过拟合,提高模型泛化能力
    • Pooling类型:最大值池化(Max Pooling)、平均池化(Average Pooling)
  5. 卷积神经网络典型结构:AlexNet、VGG、GoogleNet、ResNet

  • 结合pytorch代码讲解ResNet

    残差学习

第二部分 代码练习

  • MNIST数据集分类

    主要代码

    1. 加载数据 MNIST

      input_size  = 28*28   # MNIST上的图像尺寸是 28x28
      output_size = 10      # 类别为 0 到 9 的数字,因此为十类
      
      train_loader = torch.utils.data.DataLoader(
          datasets.MNIST('./data', train=True, download=True,
              transform=transforms.Compose(
                  [transforms.ToTensor(),
                   transforms.Normalize((0.1307,), (0.3081,))])),
          batch_size=64, shuffle=True)
      test_loader = torch.utils.data.DataLoader(
          datasets.MNIST('./data', train=False, transform=transforms.Compose([
                   transforms.ToTensor(),
                 transforms.Normalize((0.1307,), (0.3081,))])),
          batch_size=1000, shuffle=True)
      显示数据集中的部分图像
       plt.figure(figsize=(8, 5))
      for i in range(20):
          plt.subplot(4, 5, i + 1)
          image, _ = train_loader.dataset.__getitem__(i)
          plt.imshow(image.squeeze().numpy(),'gray')
          plt.axis('off');
      ![](https://img2020.cnblogs.com/blog/1814284/202008/1814284-20200801132223030-192065762.png)
      
      
      
    2. 创建网络

      ​ 定义网络时,需要继承nn.Module,并实现它的forward方法,把网络中具有可学习参数的层放在构造函数init中。只要在nn.Module的子类中定义了forward函数,backward函数就会自动被实现(利用autograd)。

    3. 在小型全连接网络上训练(Fully-connected network)

     n_hidden = 8 # number of hidden units
     
     model_fnn = FC2Layer(input_size, n_hidden, output_size)
     model_fnn.to(device)
     optimizer = optim.SGD(model_fnn.parameters(), lr=0.01, momentum=0.5)
     print('Number of parameters: 
     {}'.format(get_n_params(model_fnn)))
    
     train(model_fnn)
     test(model_fnn)
    

    1. 在卷积神经网络上训练
      需要注意的是,上在定义的CNN和全连接网络,拥有相同数量的模型参数
    n_features = 6 # number of feature maps
    
    model_cnn = CNN(input_size, n_features, output_size)
    model_cnn.to(device)
    optimizer = optim.SGD(model_cnn.parameters(), lr=0.01, momentum=0.5)
    print('Number of parameters: {}'.format(get_n_params(model_cnn)))
    
    train(model_cnn)
    test(model_cnn)
    

    1. 打乱像素顺序再次在两个网络上训练与测试
    #在全连接网络上训练与测试:
    perm = torch.randperm(784)
    n_hidden = 8 # number of hidden units
    
    model_fnn = FC2Layer(input_size, n_hidden, output_size)
    model_fnn.to(device)
    optimizer = optim.SGD(model_fnn.parameters(), lr=0.01, momentum=0.5)
    print('Number of parameters: {}'.format(get_n_params(model_fnn)))
    
    train_perm(model_fnn, perm)
    test_perm(model_fnn, perm)
    

    !

    #在卷积神经网络上训练与测试
    perm = torch.randperm(784)
    n_features = 6 # number of feature maps
    
    model_cnn = CNN(input_size, n_features, output_size)
    model_cnn.to(device)
    optimizer = optim.SGD(model_cnn.parameters(), lr=0.01, momentum=0.5)
    print('Number of parameters: {}'.format(get_n_params(model_cnn)))
    
    train_perm(model_cnn, perm)
    test_perm(model_cnn, perm)
    

  • CIFAR10 数据集分类

    主要代码

    images, labels = iter(trainloader).next()
    imshow(torchvision.utils.make_grid(images))
    # 展示第一行图像的标签
    for j in range(8):
        print(classes[labels[j]])
    

    定义网络,损失函数和优化器,训练网络

    从测试集中取出8张图片,把图片输入模型,看看CNN把这些图片识别成什么:

    outputs = net(images.to(device))
    _, predicted = torch.max(outputs, 1)
    # 展示预测的结果
    for j in range(8):
        print(classes[predicted[j]])
    

    可以看到,有几个都识别错了~ 现在看看网络在整个数据集上的表现:

    correct = 0
    total = 0
    
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    
    print('Accuracy of the network on the 10000 test images: %d %%' % (
        100 * correct / total))
    

  • 使用 VGG16 对 CIFAR10 分类

    主要代码

    测试验证准确率

  • 使用VGG模型迁移学习进行猫狗大战

    主要代码

    1.数据处理

    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    vgg_format = transforms.Compose([
                    transforms.CenterCrop(224),
                    transforms.ToTensor(),
                    normalize,
                ])
    
    data_dir = './dogscats'
    dsets = {x: datasets.ImageFolder(os.path.join(data_dir, x), vgg_format)
             for x in ['train', 'valid']}
    
    dset_sizes = {x: len(dsets[x]) for x in ['train', 'valid']}
    dset_classes = dsets['train'].classes
    

    显示图片

    def imshow(inp, title=None):
    #   Imshow for Tensor.
        inp = inp.numpy().transpose((1, 2, 0))
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
        inp = np.clip(std * inp + mean, 0,1)
        plt.imshow(inp)
        if title is not None:
            plt.title(title)
        plt.pause(0.001)  # pause a bit so that plots are updated
    
    # 显示 labels_try 的5张图片,即valid里第一个batch的5张图片
    out = torchvision.utils.make_grid(inputs_try)
    imshow(out, title=[dset_classes[x] for x in labels_try])
    

    创建VGG模型

    !wget https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json
    
    model_vgg = models.vgg16(pretrained=True)
    
    with open('./imagenet_class_index.json') as f:
        class_dict = json.load(f)
    dic_imagenet = [class_dict[str(i)][1] for i in range(len(class_dict))]
    
    inputs_try , labels_try = inputs_try.to(device), labels_try.to(device)
    model_vgg = model_vgg.to(device)
    outputs_try = model_vgg(inputs_try)
    print(outputs_try)
    print(outputs_try.shape)
    

    修改最后一层

    print(model_vgg)
    
    model_vgg_new = model_vgg;
    
    for param in model_vgg_new.parameters():
        param.requires_grad = False
    model_vgg_new.classifier._modules['6'] = nn.Linear(4096, 2)
    model_vgg_new.classifier._modules['7'] = torch.nn.LogSoftmax(dim = 1)
    
    model_vgg_new = model_vgg_new.to(device)
    print(model_vgg_new.classifier)
    

    训练并测试

    train_model(model_vgg_new,loader_train,size=dset_sizes['train'], epochs=1, 
                optimizer=optimizer_vgg)
    

可视化模型预测结果

# 单次可视化显示的图片个数
n_view = 8
correct = np.where(predictions==all_classes)[0]
from numpy.random import random, permutation
idx = permutation(correct)[:n_view]
print('random correct idx: ', idx)
loader_correct = torch.utils.data.DataLoader([dsets['valid'][x] for x in idx],
                  batch_size = n_view,shuffle=True)
for data in loader_correct:
    inputs_cor,labels_cor = data
# Make a grid from batch
out = torchvision.utils.make_grid(inputs_cor)
imshow(out, title=[l.item() for l in labels_cor])

原文地址:https://www.cnblogs.com/lixinhh/p/13414199.html