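9 - Introduction to Artificial Neural Networks with Keras

Theory

Neurons

Artificial neural networks (ANNs) are at the core of deep learning.
Logical computations with neurons
- McCulloch and Pitts proposed a very simple model of the biological neuron, which later became known as an artificial neuron. It has one or more binary (on/off) inputs and one binary output, and it activates its output when more than a certain number of its inputs are active.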
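The logical computations mentioned above can be sketched with a few lines of NumPy. This is only an illustration: the function name `mcp_neuron` and the convention that the neuron fires when at least `threshold` inputs are active are assumptions made for this example, not part of the original notes.

```python
import numpy as np

def mcp_neuron(inputs, threshold):
    """Binary neuron (hypothetical helper): outputs 1 when at least `threshold` inputs are active."""
    return int(np.sum(inputs) >= threshold)

def logical_and(a, b):
    # Both inputs must be active for the neuron to fire
    return mcp_neuron([a, b], threshold=2)

def logical_or(a, b):
    # A single active input is enough for the neuron to fire
    return mcp_neuron([a, b], threshold=1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", logical_and(a, b), "OR:", logical_or(a, b))
```

Changing only the threshold turns the same neuron from an AND gate into an OR gate, which is the sense in which networks of such neurons can compute logical propositions.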
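Perceptrons
- The perceptron is one of the simplest ANN architectures, invented in 1957 by Frank Rosenblatt. It is based on a slightly different artificial neuron called a threshold logic unit (TLU), sometimes also called a linear threshold unit (LTU). The inputs and output are numbers (rather than binary on/off values), and each input connection is associated with a weight. The TLU computes a weighted sum of its inputs:
\[ z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n = x^T w \]
- It then applies a step function to that sum and outputs the result:
\[ h_w(x) = \operatorname{step}(z), \quad \text{where } z = x^T w \]
- The most common step function used in perceptrons is the Heaviside step function; sometimes the sign function is used instead.
Common step functions used in perceptrons (assuming threshold = 0):
\[ \operatorname{heaviside}(z) = \left\{\begin{matrix} 0 & \text{if } z < 0 \\ 1 & \text{if } z \ge 0 \end{matrix}\right. \qquad \operatorname{sign}(z) = \left\{\begin{matrix} -1 & \text{if } z < 0 \\ 0 & \text{if } z = 0 \\ +1 & \text{if } z > 0 \end{matrix}\right. \]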
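- A single TLU can be used for simple linear binary classification. It computes a linear combination of the inputs, and if the result exceeds a threshold, it outputs the positive class; otherwise it outputs the negative class. A perceptron is composed of a single layer of TLUs, with each TLU connected to all the inputs. When all the neurons in a layer are connected to every neuron in the previous layer (i.e., its input neurons), the layer is called a fully connected layer, or a dense layer. The inputs of the perceptron are fed to special passthrough neurons called input neurons: they output whatever input they are fed. All the input neurons form the input layer. In addition, an extra bias feature is generally added (x₀ = 1): it is typically represented using a special type of neuron called a bias neuron, which always outputs 1.
Computing the outputs of a fully connected layer:
\[ h_{W,b}(X) = \phi(XW + b) \]
Here X is the matrix of input features (one row per instance), W holds the connection weights, b is the bias vector, and \phi is the activation function (a step function when the neurons are TLUs).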
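- Hebbian learning
  - "Cells that fire together, wire together." That is, the connection weight between two neurons increases when they fire at the same time. This rule later became known as Hebb's rule. More specifically, the perceptron is fed one training instance at a time, and for each instance it makes its predictions. For every output neuron that produced a wrong prediction, it reinforces the connection weights from the inputs that would have contributed to the correct prediction.
Perceptron learning rule:
\[ w_{i,j}^{(\text{next step})} = w_{i,j} + \eta \, (y_j - \hat{y}_j) \, x_i \]
In this equation:
  - w_{i,j} is the connection weight between the i-th input neuron and the j-th output neuron
  - x_i is the i-th input value of the current training instance
  - \hat{y}_j is the output of the j-th output neuron for the current training instance
  - y_j is the target output of the j-th output neuron for the current training instance
  - \eta is the learning rate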
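- The decision boundary of each output neuron is linear, so perceptrons are incapable of learning complex patterns (just like logistic regression classifiers). However, if the training instances are linearly separable, Rosenblatt demonstrated that this algorithm converges to a solution; this is called the perceptron convergence theorem. The perceptron learning algorithm strongly resembles stochastic gradient descent. Unlike logistic regression classifiers, perceptrons do not output a class probability; they make predictions based on a hard threshold. This is one reason to prefer logistic regression over perceptrons.
- It turns out that some of the limitations of perceptrons can be eliminated by stacking multiple perceptrons. The resulting ANN is called a multilayer perceptron (MLP).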
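The perceptron learning rule above can be tried out on a tiny linearly separable problem. The following is a minimal sketch, not a reference implementation: the helper name `train_perceptron`, the zero initialization, the learning rate of 1.0, and the choice of the logical AND dataset are all assumptions made for illustration. The bias is handled as a separate term `b`, which is equivalent to a weight on the constant bias feature x₀ = 1.

```python
import numpy as np

def heaviside(z):
    return (z >= 0).astype(int)

def train_perceptron(X, y, eta=1.0, n_epochs=20):
    """Train a single TLU with the perceptron learning rule (illustrative helper)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        for x_i, y_i in zip(X, y):
            y_hat = heaviside(x_i @ w + b)
            # w_i <- w_i + eta * (y - y_hat) * x_i; no change when the prediction is correct
            w += eta * (y_i - y_hat) * x_i
            b += eta * (y_i - y_hat)
    return w, b

# Logical AND is linearly separable, so the rule is guaranteed to converge
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w, b = train_perceptron(X, y)
predictions = heaviside(X @ w + b)
print(predictions)  # [0 0 0 1]
```

Replacing the targets with XOR (`[0, 1, 1, 0]`) would never converge, since no single linear decision boundary separates those classes; that is the limitation that stacking perceptrons into an MLP addresses.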

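Multilayer perceptrons and backpropagation

An MLP is composed of one (passthrough) input layer, one or more layers of TLUs called hidden layers, and one final layer of TLUs called the output layer. Every layer except the output layer includes a bias neuron and is fully connected to the next layer. The signal flows only in one direction (from the inputs to the outputs), so this architecture is an example of a feedforward neural network (FNN).
When an ANN contains a deep stack of hidden layers, it is called a deep neural network (DNN).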
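The backpropagation algorithm
- It is gradient descent combined with an efficient technique for computing the gradients automatically: in just two passes through the network (one forward, one backward), the backpropagation algorithm computes the gradient of the network's error with regard to every single model parameter. In other words, it finds out how each connection weight and each bias term should be tweaked to reduce the error. Once it has these gradients, it performs a regular gradient descent step, and the whole process is repeated until the network converges to a solution.
- It handles one mini-batch at a time and goes through the full training set multiple times. Each pass is called an epoch.
- Each mini-batch is passed to the network's input layer, which sends it to the first hidden layer. The algorithm then computes the output of all the neurons in this layer (for every instance in the mini-batch). The result is passed on to the next layer, its output is computed and passed to the next layer, and so on until we get the output of the last layer, the output layer. This is the forward pass: it is exactly like making predictions, except that all intermediate results are preserved, since they are needed for the backward pass.
- Next, the algorithm measures the network's output error (it uses a loss function that compares the desired output with the actual output of the network and returns some measure of the error).
- It then computes how much each output connection contributed to the error. This is done analytically by applying the chain rule, which makes this step fast and precise.
- The algorithm then measures how much of these error contributions came from each connection in the layer below, again using the chain rule, and it keeps working backward until it reaches the input layer. This reverse pass efficiently measures the error gradient across all the connection weights in the network by propagating the error gradient backward through the network.
- Finally, the algorithm performs a gradient descent step to tweak all the connection weights in the network, using the error gradients it just computed.
- In summary: for each training instance, the backpropagation algorithm first makes a prediction (forward pass) and measures the error, then goes through each layer in reverse to measure the error contribution from each connection (reverse pass), and finally tweaks the connection weights to reduce the error (gradient descent step).
- It is important to initialize all the hidden layers' connection weights randomly, or else training will fail: if all the neurons in a layer start with identical weights, backpropagation updates them identically and they remain identical, so random initialization is needed to break the symmetry.
- For this algorithm to work properly, its authors made a key change to the MLP's architecture: they replaced the step function with the logistic (sigmoid) function, σ(z) = 1 / (1 + exp(-z)). This was essential because the step function contains only flat segments, so there is no gradient to work with (gradient descent cannot move on a flat surface), whereas the logistic function has a well-defined nonzero derivative everywhere, allowing gradient descent to make some progress at every step.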
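The forward pass, backward pass, and gradient descent step described above can be sketched for a tiny MLP in plain NumPy. This is a simplified illustration under stated assumptions, not how Keras implements it: the network has one logistic hidden layer and a linear output, the loss is the MSE, and the names `forward`, `backward`, and `mse_loss`, the layer sizes, and the random data are all hypothetical choices made for this example. A finite-difference estimate is used to sanity-check one of the backpropagated gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    """Forward pass: returns the hidden activations (kept for the backward pass) and the outputs."""
    W1, b1, W2, b2 = params
    a1 = sigmoid(X @ W1 + b1)   # hidden layer with logistic activation
    y_hat = a1 @ W2 + b2        # linear output layer
    return a1, y_hat

def mse_loss(params, X, y):
    _, y_hat = forward(params, X)
    return np.mean((y_hat - y) ** 2)

def backward(params, X, y):
    """Backward pass: apply the chain rule from the output layer back to the first layer."""
    W1, b1, W2, b2 = params
    a1, y_hat = forward(params, X)
    d_y_hat = 2.0 * (y_hat - y) / y.size   # dLoss/dy_hat for the MSE
    dW2 = a1.T @ d_y_hat
    db2 = d_y_hat.sum(axis=0)
    d_a1 = d_y_hat @ W2.T                  # error propagated back to the hidden layer
    d_z1 = d_a1 * a1 * (1.0 - a1)          # derivative of the logistic function
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0)
    return [dW1, db1, dW2, db2]

rng = np.random.default_rng(42)
X = rng.normal(size=(8, 3))                # mini-batch of 8 instances with 3 features
y = rng.normal(size=(8, 1))
# Random weights (to break symmetry) and zero biases
params = [rng.normal(size=(3, 4)), np.zeros(4), rng.normal(size=(4, 1)), np.zeros(1)]

grads = backward(params, X, y)

# Sanity check: compare one backprop gradient with a finite-difference estimate
eps = 1e-6
W1_plus, W1_minus = params[0].copy(), params[0].copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric_grad = (mse_loss([W1_plus] + params[1:], X, y)
                - mse_loss([W1_minus] + params[1:], X, y)) / (2 * eps)
grad_error = abs(numeric_grad - grads[0][0, 0])

# One gradient descent step with learning rate eta
eta = 0.01
loss_before = mse_loss(params, X, y)
params = [p - eta * g for p, g in zip(params, grads)]
loss_after = mse_loss(params, X, y)
print(grad_error, loss_before, loss_after)
```

Note that the finite-difference check needs two forward passes per parameter, whereas backpropagation obtains every gradient from a single forward and a single backward pass.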

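Typical regression MLP architecture

Hyperparameter	Typical value
# input neurons	One per input feature (e.g., 28 x 28 = 784 for MNIST)
# hidden layers	Depends on the problem, but typically 1 to 5
# neurons per hidden layer	Depends on the problem, but typically 10 to 100
# output neurons	1 per prediction dimension
Hidden activation	ReLU (or SELU)
Output activation	None, or ReLU/softplus (if positive outputs) or logistic/tanh (if bounded outputs)
Loss function	MSE or MAE/Huber (if outliers)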
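To see why the table suggests MAE or Huber when outliers are present, the three losses can be compared on a small example. This is a hedged NumPy sketch: the function names, the Huber threshold `delta=1.0`, and the toy values are assumptions made for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def huber(y_true, y_pred, delta=1.0):
    """Quadratic for errors up to delta, linear beyond it."""
    error = np.abs(y_true - y_pred)
    quadratic = 0.5 * error ** 2
    linear = delta * (error - 0.5 * delta)
    return np.mean(np.where(error <= delta, quadratic, linear))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 14.0])   # the last prediction is off by 10 (an outlier)

print("MSE:  ", mse(y_true, y_pred))    # 25.125
print("MAE:  ", mae(y_true, y_pred))    # 2.75
print("Huber:", huber(y_true, y_pred))  # 2.4375
```

The single outlier dominates the MSE because its error is squared, while MAE and Huber grow only linearly with it.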

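Typical classification MLP architecture

Hyperparameter	Binary classification	Multilabel binary classification	Multiclass classification
Input and hidden layers	Same as regression	Same as regression	Same as regression
# output neurons	1	1 per label	1 per class
Output layer activation	Logistic	Logistic	Softmax
Loss function	Cross entropy	Cross entropy	Cross entropy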
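For the multiclass column, the softmax activation and the cross-entropy loss can be written out in a few lines. This is a minimal NumPy sketch under assumptions: the function names and the toy logits are hypothetical, and labels are given as integer class ids, which mirrors the sparse labels used later with `sparse_categorical_crossentropy`.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise maximum for numerical stability; the result is unchanged
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_cross_entropy(class_ids, probas, eps=1e-12):
    """Mean negative log-probability assigned to the true class of each instance."""
    picked = probas[np.arange(len(class_ids)), class_ids]
    return -np.mean(np.log(picked + eps))

logits = np.array([[2.0, 1.0, 0.1],     # confident in class 0
                   [0.0, 0.0, 0.0]])    # no preference: uniform probabilities
labels = np.array([0, 2])

probas = softmax(logits)
loss = sparse_cross_entropy(labels, probas)
print(probas.round(3))
print(loss)
```

Each row of `probas` sums to 1, which is what makes softmax suitable when the classes are mutually exclusive; for multilabel problems the logistic function is applied to each output independently instead.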

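Implementing MLPs with Keras

Building an image classifier using the Sequential API
- Load the dataset using Keras
- Create the model using the Sequential API
- Compile the model
- Train and evaluate the model
- Use the model to make predictions
Building complex models using the Functional API
Using the Subclassing API to build dynamic models
Saving and restoring models
Using callbacks
Using TensorBoard for visualization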

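Fine-tuning neural network hyperparameters

The flexibility of neural networks is also one of their main drawbacks: there are many hyperparameters to tweak. One option is simply to try many combinations of hyperparameters and see which one works best on the validation set. To do this, we need to wrap the Keras model in an object that mimics a regular Scikit-Learn regressor.
Since there are many hyperparameters, it is preferable to use a randomized search rather than a grid search.
Randomized search is not hard to use, and it works well for many fairly simple problems. When training is slow, however, it explores only a tiny portion of the hyperparameter space. You can partially alleviate this by assisting the search manually: first run a quick random search using wide ranges of hyperparameter values, then run another search using smaller ranges centered on the best values found during the first run, and so on. Fortunately, there are now many techniques that explore a search space more efficiently than random search.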
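The coarse-to-fine procedure described above can be sketched without training any model. In the following illustration everything is an assumption made for the example: `validation_error` is a hypothetical stand-in for "train a model with this learning rate and measure its validation error" (here a toy function whose minimum is at 1e-2), and the search ranges and the factor of 3 used to narrow the second search are arbitrary.

```python
import numpy as np

def validation_error(learning_rate):
    """Hypothetical stand-in for training a model and measuring its validation error."""
    return (np.log10(learning_rate) + 2.0) ** 2   # toy objective, minimum at 1e-2

def random_search(low, high, n_iter, rng):
    """Sample learning rates log-uniformly in [low, high] and return the best one found."""
    candidates = 10 ** rng.uniform(np.log10(low), np.log10(high), size=n_iter)
    errors = np.array([validation_error(lr) for lr in candidates])
    best = np.argmin(errors)
    return candidates[best], errors[best]

rng = np.random.default_rng(42)

# Stage 1: quick search over a wide range
coarse_lr, coarse_err = random_search(1e-5, 1.0, n_iter=20, rng=rng)

# Stage 2: narrower range centered (in log scale) on the best value from stage 1
fine_lr, fine_err = random_search(coarse_lr / 3, coarse_lr * 3, n_iter=20, rng=rng)

print("coarse:", coarse_lr, coarse_err)
print("fine:  ", fine_lr, fine_err)
```

Sampling in log scale matters for hyperparameters such as the learning rate, whose plausible values span several orders of magnitude; a uniform sample over [1e-5, 1] would almost never try values below 1e-2.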
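Number of hidden layers
- Compared with shallow networks, deep networks can model complex functions using fewer neurons, which lets them reach better performance with the same amount of training data.
- Lower hidden layers model low-level structures, intermediate hidden layers combine these to model intermediate-level structures, and the highest hidden layers and the output layer combine these intermediate structures to model high-level structures. This hierarchical architecture not only helps DNNs converge faster to a good solution, it also improves their ability to generalize to new data.
Transfer learning
- If you have already trained a model to recognize faces in pictures and you now want to train a new neural network to recognize hairstyles, you can kickstart training by reusing the lower layers of the first network. Instead of randomly initializing the weights and biases of the first few layers of the new network, you can initialize them to the values of the weights and biases of the lower layers of the first network. This way the network does not have to learn from scratch all the low-level structures that occur in most pictures; it only has to learn the higher-level structures.
Number of neurons per hidden layer
- It used to be common to size the hidden layers to form a pyramid, with fewer and fewer neurons at each layer, the rationale being that many low-level features can coalesce into far fewer high-level features. This practice has largely been abandoned, because using the same number of neurons in all hidden layers seems to perform just as well in most cases, or even better; in addition, there is only one hyperparameter to tune instead of one per layer. That said, depending on the dataset, it can sometimes help to make the first hidden layer bigger than the others.
- It is often simpler and more efficient to pick a model with more layers and neurons than you actually need, then use early stopping and other regularization techniques to prevent it from overfitting. In general, you will gain more by increasing the number of layers than by increasing the number of neurons per layer.
Other hyperparameters
- Learning rate
- Optimizer
- Batch size
- Activation function
- Number of iterations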

Code

Setup

import sys
assert sys.version_info >= (3, 5)

import sklearn
assert sklearn.__version__ >= '0.20'

try:
    %tensorflow_version 2.x
except Exception as e:
    pass

import tensorflow as tf
assert tf.__version__ >= '2.0'

import numpy as np
import os

np.random.seed(42)

%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

PROJECT_ROOT_DIR = '.'
CHAPTER_ID = 'ann'
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, 'images', CHAPTER_ID)
os.makedirs(IMAGES_PATH, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension='png', resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + '.' + fig_extension)
    print('Saving figure', fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

Perceptron

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]  # petal length, petal width
y = (iris.target == 0).astype(int)  # np.int was removed in NumPy 1.24; the builtin int is equivalent

per_clf = Perceptron(max_iter=1000, tol=1e-3, random_state=42)
per_clf.fit(X, y)

y_pred = per_clf.predict([[2, 0.5]])

y_pred  # array([1])

a = -per_clf.coef_[0][0] / per_clf.coef_[0][1]
b = -per_clf.intercept_ / per_clf.coef_[0][1]

axes = [0, 5, 0, 2]

x0, x1 = np.meshgrid(np.linspace(axes[0], axes[1], 500).reshape(-1, 1), np.linspace(axes[2], axes[3], 200).reshape(-1, 1))
X_new = np.c_[x0.ravel(), x1.ravel()]
y_predict = per_clf.predict(X_new)
zz = y_predict.reshape(x0.shape)

plt.figure(figsize=(10, 4))
plt.plot(X[y==0, 0], X[y==0, 1], 'bs', label='Not Iris-Setosa')
plt.plot(X[y==1, 0], X[y==1, 1], 'yo', label='Iris-Setosa')

plt.plot([axes[0], axes[1]], [a * axes[0] + b, a * axes[1] + b], 'k-', linewidth=3)
from matplotlib.colors import ListedColormap
custom_cmap = ListedColormap(['#9898ff', '#fafab0'])

plt.contourf(x0, x1, zz, cmap=custom_cmap)
plt.xlabel('Petal length', fontsize=14)
plt.ylabel('Petal width', fontsize=14)
plt.legend(loc='lower right', fontsize=14)
plt.axis(axes)

save_fig('perceptron_iris_plot')
plt.show()

Activation functions

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def derivative(f, z, eps=0.000001):
    return (f(z + eps) - f(z - eps)) / (2 * eps)

z = np.linspace(-5, 5, 200)

plt.figure(figsize=(11, 4))

plt.subplot(121)
plt.plot(z, np.sign(z), 'r--', linewidth=1, label='Step')
plt.plot(z, sigmoid(z), 'g--', linewidth=2, label='Sigmoid')
plt.plot(z, np.tanh(z), 'b-', linewidth=2, label='Tanh')
plt.plot(z, relu(z), 'm-.', linewidth=2, label='ReLU')
plt.grid(True)
plt.legend(loc='center right', fontsize=14)
plt.title('Activation functions', fontsize=14)
plt.axis([-5, 5, -1.2, 1.2])

plt.subplot(122)
plt.plot(z, derivative(np.sign, z), 'r-', linewidth=1, label='Step')
plt.plot(0, 0, 'ro', markersize=5)
plt.plot(0, 0, 'rx', markersize=10)
plt.plot(z, derivative(sigmoid, z), 'g--', linewidth=2, label='Sigmoid')
plt.plot(z, derivative(np.tanh, z), 'b-', linewidth=2, label='Tanh')
plt.plot(z, derivative(relu, z), 'm-.', linewidth=2, label='ReLU')
plt.grid(True)
plt.title('Derivatives', fontsize=14)
plt.axis([-5, 5, -0.2, 1.2])

save_fig('activation_function_plot')
plt.show()

def heaviside(z):
    return (z >= 0).astype(z.dtype)

def mlp_xor(x1, x2, activation=heaviside):
    return activation(-activation(x1 + x2 - 1.5) + activation(x1 + x2 - 0.5) - 0.5)

x1s = np.linspace(-0.2, 1.2, 100)
x2s = np.linspace(-0.2, 1.2, 100)
x1, x2 = np.meshgrid(x1s, x2s)

z1 = mlp_xor(x1, x2, activation=heaviside)
z2 = mlp_xor(x1, x2, activation=sigmoid)

plt.figure(figsize=(10, 4))

plt.subplot(121)
plt.contourf(x1, x2, z1)
plt.plot([0, 1], [0, 1], 'gs', markersize=20)
plt.plot([0, 1], [1, 0], 'y^', markersize=20)
plt.title('Activation function: heaviside', fontsize=14)
plt.grid(True)

plt.subplot(122)
plt.contourf(x1, x2, z2)
plt.plot([0, 1], [0, 1], 'gs', markersize=20)
plt.plot([0, 1], [1, 0], 'y^', markersize=20)
plt.title('Activation function: sigmoid', fontsize=14)
plt.grid(True)

Building an image classifier with Keras

import tensorflow as tf
from tensorflow import keras

tf.__version__  # '2.6.2'

keras.__version__  # '2.6.0'

fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

X_train_full.shape  # (60000, 28, 28)

X_train_full.dtype  # dtype('uint8')

X_valid, X_train = X_train_full[:5000] / 255., X_train_full[5000:] / 255.
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test / 255.

plt.imshow(X_train[0], cmap='binary')
plt.axis('off')
plt.show()

y_train  # array([4, 0, 7, ..., 3, 0, 5], dtype=uint8)

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

class_names[y_train[0]]  # 'Coat'

X_valid.shape  # (5000, 28, 28)

X_test.shape  # (10000, 28, 28)

n_rows = 4
n_cols = 10
plt.figure(figsize=(n_cols * 1.2, n_rows * 1.2))
for row in range(n_rows):
    for col in range(n_cols):
        index = n_cols * row + col
        plt.subplot(n_rows, n_cols, index + 1)
        plt.imshow(X_train[index], cmap='binary', interpolation='nearest')
        plt.axis('off')
        plt.title(class_names[y_train[index]], fontsize=12)
plt.subplots_adjust(wspace=0.2, hspace=0.5)
save_fig('fashion_mnist_plot', tight_layout=False)
plt.show()

model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[28, 28]))
model.add(keras.layers.Dense(300, activation='relu'))
model.add(keras.layers.Dense(100, activation='relu'))
model.add(keras.layers.Dense(10, activation='softmax'))

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

model.layers
'''
[<keras.layers.core.Flatten at 0x27354379358>,
 <keras.layers.core.Dense at 0x273543794e0>,
 <keras.layers.core.Dense at 0x27354379828>,
 <keras.layers.core.Dense at 0x27354379b38>]
'''

model.summary()
'''
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 300)               235500    
_________________________________________________________________
dense_1 (Dense)              (None, 100)               30100     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1010      
=================================================================
Total params: 266,610
Trainable params: 266,610
Non-trainable params: 0
_________________________________________________________________
'''

keras.utils.plot_model(model, 'my_fashion_mnist_model.png', show_shapes=True)

hidden1 = model.layers[1]
hidden1.name  # 'dense'

model.get_layer(hidden1.name) is hidden1  # True

weights, biases = hidden1.get_weights()
weights
'''
array([[ 0.02448617, -0.00877795, -0.02189048, ..., -0.02766046,
         0.03859074, -0.06889391],
       [ 0.00476504, -0.03105379, -0.0586676 , ...,  0.00602964,
        -0.02763776, -0.04165364],
       [-0.06189284, -0.06901957,  0.07102345, ..., -0.04238207,
         0.07121518, -0.07331658],
       ...,
       [-0.03048757,  0.02155137, -0.05400612, ..., -0.00113463,
         0.00228987,  0.05581069],
       [ 0.07061854, -0.06960931,  0.07038955, ..., -0.00384101,
         0.00034875,  0.02878492],
       [-0.06022581,  0.01577859, -0.02585464, ..., -0.00527829,
         0.00272203, -0.06793761]], dtype=float32)
'''

weights.shape  # (784, 300)

biases
'''
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
'''

biases.shape  # (300,)

model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

history = model.fit(X_train, y_train, epochs=30, validation_data=(X_valid, y_valid))

history.params  # {'verbose': 1, 'epochs': 30, 'steps': 1719}

print(history.epoch)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]

history.history.keys()  # dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])

import pandas as pd

pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.gca().set_ylim(0, 1)
save_fig('keras_learning_curves_plot')
plt.show()

model.evaluate(X_test, y_test)  # [0.3360535800457001, 0.883400022983551]

X_new = X_test[:3]
y_proba = model.predict(X_new)
y_proba.round(2)
'''
array([[0.  , 0.  , 0.  , 0.  , 0.  , 0.01, 0.  , 0.03, 0.  , 0.96],
       [0.  , 0.  , 0.99, 0.  , 0.01, 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.  , 1.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ]],
      dtype=float32)
'''

y_pred = np.argmax(model.predict(X_new), axis=-1)
y_pred  # array([9, 2, 1], dtype=int64)

np.array(class_names)[y_pred]  # array(['Ankle boot', 'Pullover', 'Trouser'], dtype='<U11')

y_new = y_test[:3]
y_new  # array([9, 2, 1], dtype=uint8)

plt.figure(figsize=(7.2, 2.4))
for index, image in enumerate(X_new):
    plt.subplot(1, 3, index + 1)
    plt.imshow(image, cmap='binary', interpolation='nearest')
    plt.axis('off')
    plt.title(class_names[y_test[index]], fontsize=12)
# wspace and hspace set the horizontal and vertical spacing between subplots
plt.subplots_adjust(wspace=0.2, hspace=0.5)
save_fig('fashion_mnist_images_plot', tight_layout=False)
plt.show()

Regression MLP

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()

X_train_full, X_test, y_train_full, y_test = train_test_split(housing.data, housing.target, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full, random_state=42)

scaler = StandardScaler()
# Fit the scaler on the training set only, then reuse its statistics for the other sets
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)

np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation='relu', input_shape=X_train.shape[1:]),
    keras.layers.Dense(1)
])
model.compile(loss='mean_squared_error', optimizer=keras.optimizers.SGD(learning_rate=1e-3))
history = model.fit(X_train, y_train, epochs=20, validation_data=(X_valid, y_valid))
mse_test = model.evaluate(X_test, y_test)
X_new = X_test[:3]
y_pred = model.predict(X_new)

plt.plot(pd.DataFrame(history.history))
plt.grid(True)
plt.gca().set_ylim(0, 1)
plt.show()

y_pred  # array([[0.5947634], [1.6204587], [3.604278 ]], dtype=float32)

mse_test  # 0.42627349495887756

Functional API

np.random.seed(42)
tf.random.set_seed(42)

input_ = keras.layers.Input(shape=X_train.shape[1:])
hidden1 = keras.layers.Dense(30, activation='relu')(input_)
hidden2 = keras.layers.Dense(30, activation='relu')(hidden1)
concat = keras.layers.concatenate([input_, hidden2])
output = keras.layers.Dense(1)(concat)
model = keras.models.Model(inputs=[input_], outputs=[output])

model.summary()
'''
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_2 (InputLayer)            [(None, 8)]          0                                            
__________________________________________________________________________________________________
dense_10 (Dense)                (None, 30)           270         input_2[0][0]                    
__________________________________________________________________________________________________
dense_11 (Dense)                (None, 30)           930         dense_10[0][0]                   
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 38)           0           input_2[0][0]                    
                                                                 dense_11[0][0]                   
__________________________________________________________________________________________________
dense_12 (Dense)                (None, 1)            39          concatenate_1[0][0]              
==================================================================================================
Total params: 1,239
Trainable params: 1,239
Non-trainable params: 0
__________________________________________________________________________________________________
'''

model.compile(loss='mean_squared_error', optimizer=keras.optimizers.SGD(learning_rate=1e-3))
history = model.fit(X_train, y_train, epochs=20, validation_data=(X_valid, y_valid))
mse_test = model.evaluate(X_test, y_test)
y_pred = model.predict(X_new)

np.random.seed(42)
tf.random.set_seed(42)

input_A = keras.layers.Input(shape=[5], name='wide_input')
input_B = keras.layers.Input(shape=[6], name='deep_input')
hidden1 = keras.layers.Dense(30, activation='relu')(input_B)
hidden2 = keras.layers.Dense(30, activation='relu')(hidden1)
concat = keras.layers.concatenate([input_A, hidden2])
output = keras.layers.Dense(1, name='output')(concat)
model = keras.models.Model(inputs=[input_A, input_B], outputs=[output])

model.compile(loss='mse', optimizer=keras.optimizers.SGD(learning_rate=1e-3))
X_train_A, X_train_B = X_train[:, :5], X_train[:, 2:]
X_valid_A, X_valid_B = X_valid[:, :5], X_valid[:, 2:]
X_test_A, X_test_B = X_test[:, :5], X_test[:, 2:]
X_new_A, X_new_B = X_test_A[:3], X_test_B[:3]

history = model.fit((X_train_A, X_train_B), y_train, epochs=20, validation_data=((X_valid_A, X_valid_B), y_valid))
mse_test = model.evaluate((X_test_A, X_test_B), y_test)
y_pred = model.predict((X_new_A, X_new_B))

np.random.seed(42)
tf.random.set_seed(42)

input_A = keras.layers.Input(shape=[5], name='wide_input')
input_B = keras.layers.Input(shape=[6], name='deep_input')
hidden1 = keras.layers.Dense(30, activation='relu')(input_B)
hidden2 = keras.layers.Dense(30, activation='relu')(hidden1)
concat = keras.layers.concatenate([input_A, hidden2])
output = keras.layers.Dense(1, name='main_output')(concat)
# Auxiliary output
aux_output = keras.layers.Dense(1, name='aux_output')(hidden2)
model = keras.models.Model(inputs=[input_A, input_B], outputs=[output, aux_output])

model.compile(loss=['mse', 'mse'], loss_weights=[0.9, 0.1], optimizer=keras.optimizers.SGD(learning_rate=1e-3))
history = model.fit([X_train_A, X_train_B], [y_train, y_train], epochs=20, validation_data=([X_valid_A, X_valid_B], [y_valid, y_valid]))

total_loss, main_loss, aux_loss = model.evaluate(
    [X_test_A, X_test_B], [y_test, y_test]
)
y_pred_main, y_pred_aux = model.predict([X_new_A, X_new_B])

Subclassing API

class WideAndDeepModel(keras.models.Model):
    def __init__(self, units=30, activation='relu', **kwargs):
        super().__init__(**kwargs)
        self.hidden1 = keras.layers.Dense(units, activation=activation)
        self.hidden2 = keras.layers.Dense(units, activation=activation)
        self.main_output = keras.layers.Dense(1)
        self.aux_output = keras.layers.Dense(1)
    
    def call(self, inputs):
        input_A, input_B = inputs
        hidden1 = self.hidden1(input_B)
        hidden2 = self.hidden2(hidden1)
        concat = keras.layers.concatenate([input_A, hidden2])
        main_output = self.main_output(concat)
        aux_output = self.aux_output(hidden2)
        return main_output, aux_output

model = WideAndDeepModel(30, activation='relu')

model.compile(loss='mse', loss_weights=[0.9, 0.1], optimizer=keras.optimizers.SGD(learning_rate=1e-3))
history = model.fit((X_train_A, X_train_B), (y_train, y_train), epochs=10, validation_data=((X_valid_A, X_valid_B), (y_valid, y_valid)))
total_loss, main_loss, aux_loss = model.evaluate((X_test_A, X_test_B), (y_test, y_test))
y_pred_main, y_pred_aux = model.predict((X_new_A, X_new_B))

Saving and loading models

np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation='relu', input_shape=[8]),
    keras.layers.Dense(30, activation='relu'),
    keras.layers.Dense(1)
])

model.compile(loss='mse', optimizer=keras.optimizers.SGD(learning_rate=1e-3))
history = model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))
mse_test = model.evaluate(X_test, y_test)

model.save('my_keras_model.h5')

model = keras.models.load_model('my_keras_model.h5')

model.predict(X_new)  # array([[0.4736449], [1.6739202], [3.142792 ]], dtype=float32)

model.save_weights('my_keras_weights.ckpt')

model.load_weights('my_keras_weights.ckpt')

Using callbacks during training

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation='relu', input_shape=[8]),
    keras.layers.Dense(30, activation='relu'),
    keras.layers.Dense(1)
])

model.compile(loss='mse', optimizer=keras.optimizers.SGD(learning_rate=1e-3))
# With save_best_only=True, the model is saved only when its performance on the validation set is the best so far
checkpoint_cb = keras.callbacks.ModelCheckpoint('my_keras_model.h5', save_best_only=True)
history = model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid), callbacks=[checkpoint_cb])
model = keras.models.load_model('my_keras_model.h5')
mse_test = model.evaluate(X_test, y_test)

# Early stopping
model.compile(loss='mse', optimizer=keras.optimizers.SGD(learning_rate=1e-3))
# Training is interrupted when there is no progress on the validation set for `patience` epochs
early_stopping_cb = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
# epochs must exceed patience, otherwise early stopping can never trigger
history = model.fit(X_train, y_train, epochs=100, validation_data=(X_valid, y_valid), callbacks=[checkpoint_cb, early_stopping_cb])
mse_test = model.evaluate(X_test, y_test)

class PrintValTrainRatioCallback(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs):
        # Print the ratio of validation loss to training loss after each epoch
        print('\nval/train: {:.2f}'.format(logs['val_loss'] / logs['loss']))

val_train_ratio_cb = PrintValTrainRatioCallback()
history = model.fit(X_train, y_train, epochs=1, validation_data=(X_valid, y_valid), callbacks=[val_train_ratio_cb])  # val/train: 1.10

TensorBoard

root_logdir = os.path.join(os.curdir, 'my_logs')

def get_run_logdir():
    import time
    run_id = time.strftime('run_%Y_%m_%d-%H_%M_%S')
    return os.path.join(root_logdir, run_id)

run_logdir = get_run_logdir()
run_logdir  # '.\my_logs\run_2021_11_15-16_43_40'

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation='relu', input_shape=[8]),
    keras.layers.Dense(30, activation='relu'),
    keras.layers.Dense(1)
])
model.compile(loss='mse', optimizer=keras.optimizers.SGD(learning_rate=1e-3))

tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)
history = model.fit(X_train, y_train, epochs=30, validation_data=(X_valid, y_valid), callbacks=[checkpoint_cb, tensorboard_cb])

%load_ext tensorboard
%tensorboard --logdir=./my_logs --port=6006

run_logdir2 = get_run_logdir()
run_logdir2  # '.\my_logs\run_2021_11_15-16_49_46'

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation='relu', input_shape=[8]),
    keras.layers.Dense(30, activation='relu'),
    keras.layers.Dense(1)
])
model.compile(loss='mse', optimizer=keras.optimizers.SGD(learning_rate=0.05))

tensorboard_cb = keras.callbacks.TensorBoard(run_logdir2)
history = model.fit(X_train, y_train, epochs=30, validation_data=(X_valid, y_valid), callbacks=[checkpoint_cb, tensorboard_cb])

help(keras.callbacks.TensorBoard.__init__)
'''
Help on function __init__ in module keras.callbacks:

__init__(self, log_dir='logs', histogram_freq=0, write_graph=True, write_images=False, write_steps_per_second=False, update_freq='epoch', profile_batch=2, embeddings_freq=0, embeddings_metadata=None, **kwargs)
    Initialize self.  See help(type(self)) for accurate signature.
'''

Fine-tuning hyperparameters

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

def build_model(n_hidden=1, n_neurons=30, learning_rate=3e-3, input_shape=[8]):
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    for layer in range(n_hidden):
        model.add(keras.layers.Dense(n_neurons, activation='relu'))
    model.add(keras.layers.Dense(1))
    optimizer = keras.optimizers.SGD(learning_rate=learning_rate)
    model.compile(loss='mse', optimizer=optimizer)
    return model

keras_reg = keras.wrappers.scikit_learn.KerasRegressor(build_model)

keras_reg.fit(X_train, y_train, epochs=100, validation_data=(X_valid, y_valid), callbacks=[keras.callbacks.EarlyStopping(patience=10)])

mse_test = keras_reg.score(X_test, y_test)
mse_test  # -0.37181994318962097

y_pred = keras_reg.predict(X_new)

np.random.seed(42)
tf.random.set_seed(42)

# Randomized search
from scipy.stats import reciprocal
from sklearn.model_selection import RandomizedSearchCV

param_distribs = {
    'n_hidden': [0, 1, 2, 3],
    'n_neurons': np.arange(1, 100).tolist(),
    'learning_rate': reciprocal(3e-4, 3e-2).rvs(1000).tolist(),
}
rnd_search_cv = RandomizedSearchCV(keras_reg, param_distribs, n_iter=10, cv=3, verbose=2)  # verbose controls how much progress output is printed
rnd_search_cv.fit(X_train, y_train, epochs=100, validation_data=(X_valid, y_valid), callbacks=[keras.callbacks.EarlyStopping(patience=10)])

rnd_search_cv.best_params_  # {'n_neurons': 68, 'n_hidden': 3, 'learning_rate': 0.02681276837118844}

rnd_search_cv.best_score_  # -0.28945433100064594

rnd_search_cv.best_estimator_  # <keras.wrappers.scikit_learn.KerasRegressor at 0x2734f6425c0>

rnd_search_cv.score(X_test, y_test)  # -0.45435237884521484

model = rnd_search_cv.best_estimator_.model
model  # <keras.engine.sequential.Sequential at 0x27355180ac8>

model.evaluate(X_test, y_test)  # 0.45435237884521484