A First Neural Network: A Concise Example

The following content comes from the book Deep Learning with Python (《Python深度学习》) and is recorded here purely as study notes.

A first neural network example: using the Python library Keras to learn to classify handwritten digits

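In machine learning, a category in a classification problem is called a class. A data point is called a sample. The class associated with a given sample is called its label.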

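The task is to classify grayscale images of handwritten digits (28 × 28 pixels) into 10 categories (0 through 9). We use the MNIST dataset, which contains 60,000 training images and 10,000 test images. MNIST comes preloaded in Keras as a set of four NumPy arrays.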

from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Loading the MNIST dataset in Keras

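train_images and train_labels form the training set, the data the model learns from. The model is then tested on the test set, test_images and test_labels. The images are encoded as NumPy arrays, and the labels are an array of digits ranging from 0 to 9. Images and labels have a one-to-one correspondence.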

>>> train_images.shape
(60000, 28, 28)
>>> len(train_labels)
60000
>>> train_labels
array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

The training data

>>> test_images.shape
(10000, 28, 28)
>>> len(test_labels)
10000
>>> test_labels
array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

The test data

The workflow is as follows:

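First, we feed the training data (train_images and train_labels) to the neural network. Next, the network learns to associate images with labels. Finally, we ask the network to produce predictions for test_images and check whether those predictions match the labels in test_labels. Let's build the network.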

from keras import models 
from keras import layers 
network = models.Sequential() 
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,))) 
network.add(layers.Dense(10, activation='softmax'))

The network architecture
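To make the size of this architecture concrete, the following sketch counts its trainable parameters by hand. It assumes the usual dense-layer layout of one weight per input-output pair plus one bias per output unit. The helper name `dense_param_count` is hypothetical and not part of Keras.

```python
def dense_param_count(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


hidden = dense_param_count(28 * 28, 512)  # first Dense layer (784 inputs)
output = dense_param_count(512, 10)       # 10-way softmax layer
total = hidden + output
print(hidden, output, total)  # prints: 401920 5130 407050
```

Almost all of the parameters sit in the first layer, because every one of the 784 input pixels is connected to every one of the 512 hidden units.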

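The core building block of a neural network is the layer, a data-processing module that you can think of as a filter for data. Some data goes in, and it comes out in a more useful form. Most of deep learning consists of chaining together simple layers that implement a form of progressive data distillation. A deep learning model is like a sieve for data processing, made of a succession of increasingly refined data filters (the layers).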

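The network in this example consists of two Dense layers, which are densely connected (also called fully connected) neural layers. The second layer is a 10-way softmax layer, which returns an array of 10 probability scores that sum to 1. Each score is the probability that the current digit image belongs to one of the 10 digit classes.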
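The softmax behaviour described here can be sketched in a few lines of plain Python. This is only an illustration of the formula, not how Keras implements the layer, and the raw scores below are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the 10 digit classes (invented for illustration).
scores = [1.0, 0.5, -2.0, 0.0, 3.0, 0.1, -1.0, 2.0, 0.3, -0.5]
probs = softmax(scores)
predicted_digit = probs.index(max(probs))  # class with the highest probability
print(predicted_digit)  # prints: 4
```

The predicted class is simply the index of the largest probability, which is how the 10 outputs are turned into a single digit.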

To make the network ready for training, we need to pick three more things as part of the compilation step:

A loss function: how the network measures its performance on the training data, and therefore how it steers itself in the right direction.

An optimizer: the mechanism through which the network updates itself based on the training data and the loss function.

Metrics to monitor during training and testing: here we only care about accuracy, the fraction of images that are classified correctly.

network.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

The compilation step
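As a rough illustration of what the loss and the metric measure, here is a plain-Python sketch of categorical crossentropy for a single sample and of accuracy over a handful of predictions. The function names and the numbers are illustrative assumptions. Keras computes these over whole batches with its own implementation.

```python
import math


def categorical_crossentropy(one_hot_target, predicted_probs):
    """Loss for one sample: -sum(target * log(prediction))."""
    eps = 1e-12  # guard against log(0)
    return -sum(t * math.log(p + eps)
                for t, p in zip(one_hot_target, predicted_probs))


def accuracy(true_classes, predicted_classes):
    """Fraction of samples whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(true_classes, predicted_classes))
    return correct / len(true_classes)


# Made-up 3-class example in which the true class is index 1.
target = [0.0, 1.0, 0.0]
confident = categorical_crossentropy(target, [0.05, 0.90, 0.05])
unsure = categorical_crossentropy(target, [0.40, 0.30, 0.30])
print(confident < unsure)                    # prints: True
print(accuracy([7, 2, 1, 0], [7, 2, 1, 4]))  # prints: 0.75
```

The loss rewards putting high probability on the correct class, while accuracy only checks whether the top-scoring class is right. That is why the two numbers are reported separately during training.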

Before training, we preprocess the data by reshaping it into the shape the network expects and scaling it so that all values lie in the [0, 1] interval.

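For instance, the training images were previously stored in a uint8 array of shape (60000, 28, 28) with values in the [0, 255] interval. We transform it into a float32 array of shape (60000, 28 * 28) with values between 0 and 1.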

train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

Preparing the image data
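The effect of the reshape and scaling on a single image can be seen on a tiny made-up example. This sketch uses plain Python lists instead of NumPy so that it runs without any dependencies. The 2 × 3 "image" is invented for illustration.

```python
# A made-up 2 x 3 "image" with uint8-style pixel values in [0, 255].
image = [
    [0, 128, 255],
    [64, 32, 16],
]

# Flatten row by row (the effect of reshape) and scale each pixel into [0, 1].
flat = [pixel / 255 for row in image for pixel in row]

print(len(flat))             # prints: 6
print(min(flat), max(flat))  # prints: 0.0 1.0
```

For MNIST the same idea turns each 28 × 28 image into a vector of 784 values between 0 and 1.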

from keras.utils import to_categorical
train_labels = to_categorical(train_labels) 
test_labels = to_categorical(test_labels)

Preparing the labels
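The categorical encoding applied to the labels is a one-hot encoding: each digit becomes a vector of 10 values with a single 1 at the position of the digit. A minimal sketch of the idea follows. The `one_hot` helper is hypothetical and only mimics the result for a single label.

```python
def one_hot(label: int, num_classes: int = 10):
    """Return a list of zeros with a single 1.0 at position `label`."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


# The first three training labels shown earlier are 5, 0 and 4.
encoded = [one_hot(label) for label in [5, 0, 4]]
print(encoded[0])
# prints: [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

This format matches the 10 probabilities produced by the softmax layer, which is what the categorical_crossentropy loss compares against.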

We are now ready to train the network, which in Keras is done by calling the network's fit method: we fit the model to its training data.

network.fit(train_images, train_labels, epochs=5, batch_size=128) 
# Output:
#Epoch 1/5 
#60000/60000 [=============================] - 9s - loss: 0.2524 - acc: 0.9273
#Epoch 2/5 
#51328/60000 [=======================>.....] - ETA: 1s - loss: 0.1035 - acc: 0.9692

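Two quantities are displayed during training: the loss of the network over the training data (loss) and the accuracy of the network over the training data (acc). We quickly reach an accuracy of 0.989 (98.9%) on the training data.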
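The epochs and batch_size arguments passed to fit determine how many weight updates the network performs, since the data is processed in mini-batches with one update per batch. The arithmetic for this example can be sketched as follows. The variable names are illustrative.

```python
import math

num_samples = 60000  # training images
batch_size = 128
epochs = 5

# The last batch of each epoch is smaller when the sizes do not divide evenly.
steps_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, last_batch_size, total_updates)  # prints: 469 96 2345
```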

test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)
# Output: test_acc: 0.9785

Checking the model's performance on the test set

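The test-set accuracy is 97.8%, noticeably lower than the 98.9% training-set accuracy. This gap between training accuracy and test accuracy is caused by overfitting: machine learning models tend to perform worse on new data than on their training data. That concludes the first example. You have just seen how to build and train a neural network that classifies handwritten digits in fewer than 20 lines of Python code.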