11 Custom Models and Training with TensorFlow

Theory

A Quick Tour of TensorFlow

  • TensorFlow 2.0 (beta) was released in June 2019, making TensorFlow significantly easier to use
  • TensorFlow is a powerful library for numerical computation
    • Its core is very similar to NumPy, but with GPU support
    • It supports distributed computing (across multiple devices and servers)
    • It includes a kind of just-in-time (JIT) compiler that lets it optimize computations for speed and memory usage. It works by extracting the computation graph from a Python function, then optimizing it (e.g., by pruning unused nodes), and finally running it efficiently (e.g., by automatically running independent operations in parallel)
    • Computation graphs can be exported to a portable format, so you can train a TensorFlow model in one environment (e.g., using Python on Linux) and run it in another (e.g., using Java on an Android device)
    • It implements automatic differentiation and provides some excellent optimizers, so you can easily minimize all sorts of loss functions
  • At the lowest level, each TensorFlow operation is implemented using highly efficient C++ code. Many operations have multiple implementations, called kernels: each kernel is dedicated to a specific device type, such as CPUs, GPUs, or even TPUs (tensor processing units). As you may know, GPUs can dramatically speed up computations by splitting them into many smaller chunks and running them in parallel across many GPU threads. TPUs are even faster: they are custom ASIC chips built specifically for deep learning operations

Using TensorFlow like NumPy

  • TensorFlow's API revolves around tensors, which flow from operation to operation, hence the name TensorFlow. A tensor is very similar to a NumPy ndarray: it is usually a multidimensional array, but it can also hold a scalar
  • Tensors and operations
    • The @ operator was added in Python 3.5 for matrix multiplication; it is equivalent to calling the tf.matmul() function
    • The tf.transpose() function does not do exactly the same thing as NumPy's T attribute: in TensorFlow, a new tensor is created with its own copy of the transposed data, while in NumPy, t.T is just a transposed view on the same data. Similarly, the tf.reduce_sum() operation is named this way because its GPU kernel (i.e., GPU implementation) uses a reduce algorithm that does not guarantee the order in which the elements are added: because 32-bit floats have limited precision, the result may change ever so slightly every time you call this operation. The same is true of tf.reduce_mean() (but of course tf.reduce_max() is deterministic)
  • Tensors and NumPy
    • Tensors play nicely with NumPy: you can create a tensor from a NumPy array and vice versa. You can even apply TensorFlow operations to NumPy arrays and NumPy operations to tensors
    • NumPy uses 64-bit precision by default, while TensorFlow uses 32-bit. This is because 32-bit precision is generally more than enough for neural networks, and it runs faster and uses less RAM. So when you create a tensor from a NumPy array, make sure to set dtype=tf.float32
  • Type conversions
    • Type conversions can significantly hurt performance, and they can easily go unnoticed when done automatically. To avoid this, TensorFlow does not perform any type conversions automatically: it raises an exception if you execute an operation on tensors with incompatible types. For example, you cannot add a float tensor and an integer tensor, and you cannot even add a 32-bit float and a 64-bit float
  • Variables
    • We cannot use regular tensors to implement the weights in a neural network, since they need to be tweaked by backpropagation; what we need is a tf.Variable
    • In practice you will rarely have to create variables manually, since Keras provides the add_weight() method, and model parameters are generally updated directly by the optimizers, so you will rarely need to update variables manually
  • Other data structures
    • Sparse tensors (tf.SparseTensor)
      • Efficiently represent tensors that contain mostly zeros
    • Tensor arrays (tf.TensorArray)
      • Lists of tensors. They have a fixed size by default, but can optionally be made dynamic. All the tensors they contain must have the same shape and data type
    • Ragged tensors (tf.RaggedTensor)
      • Represent static lists of lists of tensors, where every tensor has the same shape and data type
    • String tensors
      • Regular tensors of type tf.string. These represent byte strings, not Unicode strings, so if you create a string tensor using a Unicode string (a regular Python 3 string such as 'café'), it gets encoded to UTF-8 automatically (e.g., b'caf\xc3\xa9'). Alternatively, you can represent Unicode strings using tensors of type tf.int32, where each item represents a Unicode code point (e.g., [99, 97, 102, 233]). tf.string is atomic, meaning that its length does not appear in the tensor's shape. Once you convert it to a Unicode tensor (i.e., a tensor of type tf.int32 holding Unicode code points), the length appears in the shape
    • Sets
      • Represented as regular tensors (or sparse tensors). For example, tf.constant([[1, 2], [3, 4]]) represents the two sets {1, 2} and {3, 4}. More generally, each set is represented by a vector in the tensor's last axis.
    • Queues
      • Store tensors across multiple steps. TensorFlow offers various kinds of queues: basic first-in, first-out (FIFO) queues (FIFOQueue), queues that can prioritize some items (PriorityQueue), queues that shuffle their items (RandomShuffleQueue), and queues that batch items of different shapes by padding (PaddingFIFOQueue). A minimal sketch follows this list.
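
A minimal FIFOQueue sketch (the other queue types expose the same enqueue/dequeue API; the values here are just illustrative):

q = tf.queue.FIFOQueue(3, [tf.int32, tf.string], shapes=[[], []])
q.enqueue([10, b'windy'])
q.enqueue([15, b'sunny'])
q.size()  # <tf.Tensor: shape=(), dtype=int32, numpy=2>
q.dequeue()  # an [int32, string] pair, first in, first out: 10, b'windy'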

Custom Models and Training Algorithms

  • Custom loss functions
    • Just create a function that takes the labels and predictions as arguments, and use TensorFlow operations to compute the loss for each instance

Huber loss combines the MSE and the MAE, taking the best of both (it is also known as the Smooth Mean Absolute Error loss). The idea: use the MSE when the error is close to 0 and the MAE when the error is large:

\[ J_{\text{Huber}} = \frac{1}{N}\sum^{N}_{i=1}\left[\mathbb{I}_{|y_i-\hat y_i|\le\delta}\,\frac{(y_i-\hat y_i)^2}{2} + \mathbb{I}_{|y_i-\hat y_i|>\delta}\left(\delta\,|y_i-\hat y_i|-\frac{1}{2}\delta^2\right)\right] \]

  • Saving and loading models that contain custom components
    • When loading a model containing custom objects, you need to map the names to the objects. Unfortunately, when you save a model, the threshold is not saved, which means you must specify it when loading the model. You can solve this by creating a subclass of the keras.losses.Loss class and implementing its get_config() method
    • The Keras API currently only specifies how to use subclassing to define layers, models, callbacks, and regularizers. If you build other components (such as losses, metrics, initializers, or constraints) using subclassing, they may not be portable to other Keras implementations
    • The get_config() method returns a dictionary mapping each hyperparameter name to its value. It first calls the parent class's get_config() method, then adds the new hyperparameters to this dictionary
    • When you save the model, Keras calls the loss instance's get_config() method and saves the config as JSON in the HDF5 file. When you load the model, it calls the from_config() class method on the HuberLoss class: this method is implemented by the base class (Loss); it creates an instance of the class, passing **config to the constructor
  • Custom activation functions, initializers, regularizers, and constraints
    • If a function has hyperparameters that need to be saved along with the model, you will want to subclass the appropriate class
    • You must implement the call() method for losses, layers (including activation functions), and models, or the __call__() method for regularizers, initializers, and constraints
  • Custom metrics
    • Losses and metrics are conceptually not the same thing: losses (e.g., cross entropy) are used by gradient descent to train a model, so they must be differentiable (at least where they are evaluated), and their gradients should not be 0 everywhere. Plus, it's fine if they are not easily interpretable by humans. In contrast, metrics (e.g., accuracy) are used to evaluate a model: they must be more easily interpretable, and they can be non-differentiable or have 0 gradients everywhere
    • Streaming metrics (or stateful metrics): metrics that are updated batch after batch. Some metrics (such as precision) cannot simply be averaged over batches; in those cases there is no choice but to implement a streaming metric
  • Custom layers
    • You may occasionally want to build an architecture that contains an exotic layer for which TensorFlow does not provide a default implementation; in that case you will need to create a custom layer. Or you may simply want to build a very repetitive architecture containing identical blocks of layers repeated many times, in which case it is convenient to treat each block of layers as a single layer
    • Some layers have no weights; if you want to create a custom layer without any weights, the simplest option is to write a function and wrap it in a keras.layers.Lambda layer
    • To build a custom stateful layer (i.e., a layer with weights), you need to create a subclass of the keras.layers.Layer class
  • Custom models
    • We create the layers in the constructor and use them in the call() method. The model can then be used like any other model (you can compile it, fit it, evaluate it, and use it to make predictions)
  • Losses and metrics based on model internals
    • You may want to define losses based on other parts of your model, such as the weights or activations of its hidden layers. This can be useful for regularization, or to monitor some internal aspect of your model. To define a custom loss based on model internals, compute it from any part of the model you want, then pass the result to the add_loss() method.
    • The loss associated with the auxiliary output is called the reconstruction loss: we encourage the model to preserve as much information as possible through the hidden layers, even information that is not directly useful for the regression task itself. In practice, this loss sometimes improves generalization (it is a regularization loss)
    • You can add a custom metric based on model internals by computing it in any way you want, as long as the result is the output of a metric object
  • Computing gradients using autodiff
    • A neural network typically has tens of thousands of parameters, so finding the partial derivatives analytically by hand would be a nearly impossible task. One solution is to compute an approximation of each partial derivative by measuring how much the function's output changes when you tweak the corresponding parameter. This works well and is easy to implement, but it is only an approximation, and, importantly, f() must be called at least once per parameter, which makes this approach intractable for large neural networks. Instead we should use autodiff, and TensorFlow makes this pretty simple
    • Most of the time a gradient tape is used to compute the gradients of a single value (usually the loss) with regard to a set of values (usually the model parameters). This is where reverse-mode autodiff shines, as it only needs one forward pass and one backward pass to get all the gradients at once.
  • Custom training loops
    • The Wide & Deep paper uses two different optimizers: one for the wide path and one for the deep path. Since the fit() method only uses one optimizer (the one specified when compiling the model), implementing that paper requires writing your own custom loop
    • Unless you really need the extra flexibility, you should prefer the fit() method over implementing your own training loop, especially when working in a team

TensorFlow Functions and Graphs

  • Use tf.function() to convert a Python function into a TensorFlow function, or apply tf.function as a decorator
  • TensorFlow optimizes the computation graph, pruning unused nodes and simplifying expressions. Once the optimized graph is ready, the TF function executes its operations efficiently, in the appropriate order (and in parallel when it can). Consequently a TF function usually runs much faster than the original Python function, especially for complex computations. Most of the time you don't really need to know more than that: when you want to boost a Python function, just transform it into a TF function
  • When you write a custom loss function, a custom metric, a custom layer, or any other custom function and use it in a Keras model, Keras automatically converts your function into a TF function; there is no need to call tf.function()
  • By default, a TF function generates a new graph for every unique set of input shapes and data types and caches it for subsequent calls. This is how TF functions handle polymorphism (i.e., varying argument types and shapes). However, this is only true for tensor arguments: if you pass Python numerical values to a TF function, a new graph is generated for every distinct value
  • If you call a TF function many times with different Python numerical values, many graphs will be generated, slowing down your program and consuming a lot of RAM (you must delete the TF function to release it). Python values should be reserved for arguments that will have few unique values, such as hyperparameters like the number of neurons per layer; this lets TensorFlow better optimize each variant of your model
  • AutoGraph and tracing
    • How TensorFlow generates graphs
      • The first step is called AutoGraph: TensorFlow analyzes the function's source code, and AutoGraph outputs an upgraded version of the function in which all the control flow statements are replaced by the appropriate TensorFlow operations
      • Next, TensorFlow calls this "upgraded" function, but instead of passing arguments, it passes symbolic tensors—tensors without any actual value, only a name, a data type, and a shape. The function runs in graph mode, meaning that each TensorFlow operation adds a node to the graph to represent itself and its output tensors (as opposed to the regular mode, called eager execution or eager mode). In graph mode, TF operations do not perform any computations
  • TF function rules
    • If you call any external library, including NumPy or even the standard library, the call will run only during tracing; it will not be part of the graph. Indeed, a TensorFlow graph can only include TensorFlow constructs (tensors, operations, variables, datasets, and so on)
      • If you define a TF function f(x) that returns np.random.rand(), a random number is generated only when the function is traced, so f(tf.constant(2.)) and f(tf.constant(2.)) will return the same random number, but f(tf.constant([2., 3.])) will return a different one. If you replace np.random.rand() with tf.random.uniform([]), a new random number is generated on every call, since the operation is part of the graph (see the sketch after this list)
      • If your non-TensorFlow code has side effects (such as logging something or updating a Python counter), you should not expect those side effects to occur every time you call the TF function, since they only happen when the function is traced
      • You can wrap arbitrary Python code in a tf.py_function() operation, but doing so hinders performance, since TensorFlow cannot do any graph optimization on that code. It also reduces portability, as the graph will only run on platforms where Python (and the right libraries) is available
    • You can call other Python functions or TF functions, and they should follow the same rules, since TensorFlow captures their operations in the computation graph. Note that these other functions do not need to be decorated with @tf.function
    • If the function creates a TensorFlow variable (or any other stateful TensorFlow object, such as a dataset or a queue), it must do so upon the very first call, and only then, or you will get an exception. It is usually preferable to create variables outside of the TF function (e.g., in the build() method of a custom layer). If you want to assign a new value to a variable, make sure you call its assign() method rather than using the = operator
    • The source code of your Python function should be available to TensorFlow
    • TensorFlow will only capture for loops that iterate over a tensor or a dataset, so make sure you use for i in tf.range(x) rather than for i in range(x), or the loop will not be captured in the graph
    • As always, for performance reasons, you should prefer a vectorized implementation over loops whenever possible
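
A minimal sketch of the rules above (assuming TF 2.x defaults): np.random.rand() runs only while tracing, tf.random.uniform([]) becomes a graph op, and variables are created outside the TF function and updated via assign():

@tf.function
def f(x):
    return np.random.rand()  # runs only during tracing; the value is frozen into the graph

f(tf.constant(2.)), f(tf.constant(3.))  # same 'random' number: both calls reuse the scalar-float32 trace

@tf.function
def g(x):
    return x + tf.random.uniform([])  # a graph op: a fresh random number on every call

v = tf.Variable(1.)  # stateful objects belong outside the TF function

@tf.function
def increment(c=1.):
    return v.assign_add(c)  # use assign()/assign_add(), never v = v + c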

Code

Imports

import sys
assert sys.version_info >= (3, 5)

import sklearn
assert sklearn.__version__ >= '0.20'

try:
    %tensorflow_version 2.x
except Exception as e:
    pass

import tensorflow as tf
from tensorflow import keras
assert tf.__version__ >= '2.4'

import numpy as np
import os

np.random.seed(42)
tf.random.set_seed(42)

%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

PROJECT_ROOT_DIR = '.'
CHAPTER_ID = 'deep'
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, 'images', CHAPTER_ID)
os.makedirs(IMAGES_PATH, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension='png', resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + '.' + fig_extension)
    print('Saving figure', fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

Tensors and operations

# Tensors
tf.constant([[1., 2., 3.], [4., 5., 6.]])
'''
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>
'''

tf.constant(42)  # <tf.Tensor: shape=(), dtype=int32, numpy=42>

t = tf.constant([[1., 2., 3.], [4., 5., 6.]])
t
'''
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>
'''

t.shape  # TensorShape([2, 3])

t.dtype  # tf.float32

# Indexing
t[:, 1:]
'''
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[2., 3.],
       [5., 6.]], dtype=float32)>
'''

# t[..., 1] is equivalent to t[:, 1]; for a 3D tensor, t[..., 1] is equivalent to t[:, :, 1]
# tf.newaxis works like np.newaxis: both add a new dimension.
t[..., 1, tf.newaxis]
'''
<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[2.],
       [5.]], dtype=float32)>
'''

t[..., 1]  # <tf.Tensor: shape=(2,), dtype=float32, numpy=array([2., 5.], dtype=float32)>

# Operations
t + 10
'''
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[11., 12., 13.],
       [14., 15., 16.]], dtype=float32)>
'''

tf.square(t)
'''
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)>
'''

t @ tf.transpose(t)
'''
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[14., 32.],
       [32., 77.]], dtype=float32)>
'''

# Using keras.backend
from tensorflow import keras

K = keras.backend
K.square(K.transpose(t)) + 10
'''
<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[11., 26.],
       [14., 35.],
       [19., 46.]], dtype=float32)>
'''

Converting between tensors and NumPy

a = np.array([2., 4., 5.])
tf.constant(a)  # <tf.Tensor: shape=(3,), dtype=float64, numpy=array([2., 4., 5.])>
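
As noted in the theory section, NumPy defaults to 64-bit floats, so set the dtype explicitly when creating a tensor from a NumPy array:

tf.constant(a, dtype=tf.float32)  # <tf.Tensor: shape=(3,), dtype=float32, numpy=array([2., 4., 5.], dtype=float32)>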

t.numpy()
'''
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)
'''

np.array(t)
'''
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)
'''

tf.square(a)  # <tf.Tensor: shape=(3,), dtype=float64, numpy=array([ 4., 16., 25.])>

np.square(t)
'''
array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)
'''

Type conflicts

try:
    tf.constant(2.0) + tf.constant(40)
except tf.errors.InvalidArgumentError as e:
    print(e)
'''
cannot compute AddV2 as input #1(zero-based) was expected to be a float tensor but is a int32 tensor [Op:AddV2]
'''

try:
    tf.constant(2.0) + tf.constant(40., dtype=tf.float64)
except tf.errors.InvalidArgumentError as e:
    print(e)
'''
cannot compute AddV2 as input #1(zero-based) was expected to be a float tensor but is a double tensor [Op:AddV2]
'''

t2 = tf.constant(40., dtype=tf.float64)
tf.constant(2.0) + tf.cast(t2, tf.float32)  # <tf.Tensor: shape=(), dtype=float32, numpy=42.0>

String tensors

tf.constant(b'hello world')  # <tf.Tensor: shape=(), dtype=string, numpy=b'hello world'>

tf.constant("café")  # <tf.Tensor: shape=(), dtype=string, numpy=b'caf\xc3\xa9'>

# Unicode code points
u = tf.constant([ord(c) for c in 'café'])
u  # <tf.Tensor: shape=(4,), dtype=int32, numpy=array([ 99,  97, 102, 233])>

b = tf.strings.unicode_encode(u, 'UTF-8')
tf.strings.length(b, unit='UTF8_CHAR')  # <tf.Tensor: shape=(), dtype=int32, numpy=4>

tf.strings.unicode_decode(b, 'UTF-8')  # <tf.Tensor: shape=(4,), dtype=int32, numpy=array([ 99,  97, 102, 233])>

String array tensors

p = tf.constant(['Café', 'Coffee', 'caffé', '咖啡'])

tf.strings.length(p, unit='UTF8_CHAR')  # <tf.Tensor: shape=(4,), dtype=int32, numpy=array([4, 6, 5, 2])>

r = tf.strings.unicode_decode(p, 'UTF8')
r  # <tf.RaggedTensor [[67, 97, 102, 233], [67, 111, 102, 102, 101, 101], [99, 97, 102, 102, 233], [21654, 21857]]>

Ragged tensors

print(r[1])  # tf.Tensor([ 67 111 102 102 101 101], shape=(6,), dtype=int32)

print(r[1: 3])  # <tf.RaggedTensor [[67, 111, 102, 102, 101, 101], [99, 97, 102, 102, 233]]>

r2 = tf.ragged.constant([[65, 66], [], [67]])
print(tf.concat([r, r2], axis=0))
'''
<tf.RaggedTensor [[67, 97, 102, 233], [67, 111, 102, 102, 101, 101], [99, 97, 102, 102, 233], [21654, 21857], [65, 66], [], [67]]>
'''

r3 = tf.ragged.constant([[68, 69, 70], 
                         [67, 111, 102, 102, 101, 101, 71], 
                         [99, 97, 102, 102, 232], 
                         [21654, 21857, 72, 73]])
print(tf.concat([r, r3], axis=1))
'''
<tf.RaggedTensor [[67, 97, 102, 233, 68, 69, 70], [67, 111, 102, 102, 101, 101, 67, 111, 102, 102, 101, 101, 71], [99, 97, 102, 102, 233, 99, 97, 102, 102, 232], [21654, 21857, 21654, 21857, 72, 73]]>
'''

tf.strings.unicode_encode(r3, "UTF-8")
'''
<tf.Tensor: shape=(4,), dtype=string, numpy=
array([b'DEF', b'CoffeeG', b'caff\xc3\xa8', b'\xe5\x92\x96\xe5\x95\xa1HI'],
      dtype=object)>
'''

r.to_tensor()
'''
<tf.Tensor: shape=(4, 6), dtype=int32, numpy=
array([[   67,    97,   102,   233,     0,     0],
       [   67,   111,   102,   102,   101,   101],
       [   99,    97,   102,   102,   233,     0],
       [21654, 21857,     0,     0,     0,     0]])>
'''

Sparse tensors

s = tf.SparseTensor(indices=[[0, 1], [1, 0], [2, 3]], values=[1., 2., 3.], dense_shape=[3, 4])
print(s)
'''
SparseTensor(indices=tf.Tensor(
[[0 1]
 [1 0]
 [2 3]], shape=(3, 2), dtype=int64), values=tf.Tensor([1. 2. 3.], shape=(3,), dtype=float32), dense_shape=tf.Tensor([3 4], shape=(2,), dtype=int64))
'''

tf.sparse.to_dense(s)
'''
<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[0., 1., 0., 0.],
       [2., 0., 0., 0.],
       [0., 0., 0., 3.]], dtype=float32)>
'''

s2 = s * 2.0

try:
    s3 = s + 1
except TypeError as e:
    print(e)  # unsupported operand type(s) for +: 'SparseTensor' and 'int'

s4 = tf.constant([[10., 20.], [30., 40.], [50., 60.], [70., 80.]])
tf.sparse.sparse_dense_matmul(s, s4)
'''
<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[ 30.,  40.],
       [ 20.,  40.],
       [210., 240.]], dtype=float32)>
'''

s5 = tf.SparseTensor(indices=[[0, 2], [0, 1]], values=[1., 2.], dense_shape=[3, 4])
print(s5)
'''
SparseTensor(indices=tf.Tensor(
[[0 2]
 [0 1]], shape=(2, 2), dtype=int64), values=tf.Tensor([1. 2.], shape=(2,), dtype=float32), dense_shape=tf.Tensor([3 4], shape=(2,), dtype=int64))
'''

try:
    tf.sparse.to_dense(s5)
except tf.errors.InvalidArgumentError as e:
    print(e)
'''
indices[1] = [0,1] is out of order. Many sparse ops require sorted indices.
    Use `tf.sparse.reorder` to create a correctly ordered copy.

 [Op:SparseToDense]
'''

s6 = tf.sparse.reorder(s5)
tf.sparse.to_dense(s6)
'''
<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[0., 2., 1., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]], dtype=float32)>
'''

Sets

set1 = tf.constant([[2, 3, 5, 7], [7, 9, 0, 0]])
set2 = tf.constant([[4, 5, 6], [9, 10, 0]])
tf.sparse.to_dense(tf.sets.union(set1, set2))
'''
<tf.Tensor: shape=(2, 6), dtype=int32, numpy=
array([[ 2,  3,  4,  5,  6,  7],
       [ 0,  7,  9, 10,  0,  0]])>
'''

tf.sparse.to_dense(tf.sets.difference(set1, set2))
'''
<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[2, 3, 7],
       [7, 0, 0]])>
'''

tf.sparse.to_dense(tf.sets.intersection(set1, set2))
'''
<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[5, 0],
       [0, 9]])>
'''

Variables

v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])

v.assign(2 * v)
'''
<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]], dtype=float32)>
'''

v[0, 1].assign(42)
'''
<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2., 42.,  6.],
       [ 8., 10., 12.]], dtype=float32)>
'''

v[:, 2].assign([0., 1.])
'''
<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2., 42.,  0.],
       [ 8., 10.,  1.]], dtype=float32)>
'''

try:
    v[1] = [2., 8., 9.]
except TypeError as e:
    print(e)  # 'ResourceVariable' object does not support item assignment

v.scatter_nd_update(indices=[[0, 0], [1, 2]], updates=[100., 200.])
'''
<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[100.,  42.,   0.],
       [  8.,  10., 200.]], dtype=float32)>
'''

sparse_delta = tf.IndexedSlices(values=[[1., 2., 3.], [4., 5., 6.]], indices=[1, 0])
v.scatter_update(sparse_delta)
'''
<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[4., 5., 6.],
       [1., 2., 3.]], dtype=float32)>
'''

Tensor arrays

array = tf.TensorArray(dtype=tf.float32, size=3)
array = array.write(0, tf.constant([1., 2.]))
array = array.write(1, tf.constant([2., 10.]))
array = array.write(2, tf.constant([5., 7.]))

array.read(1)  # <tf.Tensor: shape=(2,), dtype=float32, numpy=array([ 2., 10.], dtype=float32)>
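
# By default a TensorArray has clear_after_read=True, so reading an item replaces it with zeros (note row 1 in the stack below)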

array.stack()
'''
<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[1., 2.],
       [0., 0.],
       [5., 7.]], dtype=float32)>
'''

mean, variance = tf.nn.moments(array.stack(), axes=0)
mean  # <tf.Tensor: shape=(2,), dtype=float32, numpy=array([2., 3.], dtype=float32)>

variance  # <tf.Tensor: shape=(2,), dtype=float32, numpy=array([4.6666665, 8.666667 ], dtype=float32)>

Custom loss functions

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target.reshape(-1, 1), random_state=42
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_valid_scaled = scaler.transform(X_valid)
X_test_scaled = scaler.transform(X_test)

def huber_fn(y_true, y_pred):
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)

plt.figure(figsize=(8, 3.5))
z = np.linspace(-4, 4, 200)
plt.plot(z, huber_fn(0, z), 'b-', linewidth=2, label='huber($z$)')
plt.plot(z, z**2 / 2, 'b:', linewidth=1, label=r'$\frac{1}{2}z^2$')
plt.plot([-1, -1], [0, huber_fn(0., -1.)], 'r--')
plt.plot([1, 1], [0, huber_fn(0., 1.)], 'r--')
plt.gca().axhline(y=0, color='k')
plt.gca().axvline(x=0, color='k')
plt.axis([-4, 4, 0, 4])
plt.grid(True)
plt.xlabel('$z$')
plt.legend(fontsize=14)
plt.title('Huber loss', fontsize=14)
plt.show()

input_shape = X_train.shape[1:]

model = keras.models.Sequential([
    keras.layers.Dense(30, activation='selu', kernel_initializer='lecun_normal', input_shape=input_shape),
    keras.layers.Dense(1)
])

model.compile(loss=huber_fn, optimizer='nadam', metrics=['mae'])

model.fit(X_train_scaled, y_train, epochs=2, validation_data=(X_valid_scaled, y_valid))

Saving/loading models that contain custom objects

model.save('my_model_with_a_custom_loss.h5')

model = keras.models.load_model('my_model_with_a_custom_loss.h5', custom_objects={'huber_fn': huber_fn})

model.fit(X_train_scaled, y_train, epochs=2, validation_data=(X_valid_scaled, y_valid))

def create_huber(threshold=1.0):
    def huber_fn(y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < threshold
        squared_loss = tf.square(error) / 2
        linear_loss = threshold * tf.abs(error) - threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    return huber_fn

model.compile(loss=create_huber(2.0), optimizer='nadam', metrics=['mae'])

model.fit(X_train_scaled, y_train, epochs=2, validation_data=(X_valid_scaled, y_valid))

model.save('my_model_with_a_custom_loss_threshold_2.h5')

model = keras.models.load_model('my_model_with_a_custom_loss_threshold_2.h5', custom_objects={'huber_fn': create_huber(2.0)})

model.fit(X_train_scaled, y_train, epochs=2, validation_data=(X_valid_scaled, y_valid))

class HuberLoss(keras.losses.Loss):
    def __init__(self, threshold=1.0, **kwargs):
        self.threshold = threshold
        super().__init__(**kwargs)
    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < self.threshold
        squared_loss = tf.square(error) / 2
        linear_loss = self.threshold * tf.abs(error) - self.threshold ** 2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, 'threshold': self.threshold}

model = keras.models.Sequential([
    keras.layers.Dense(30, activation='selu', kernel_initializer='lecun_normal', input_shape=input_shape),
    keras.layers.Dense(1)
])

model.compile(loss=HuberLoss(2.), optimizer='nadam', metrics=['mae'])

model.fit(X_train_scaled, y_train, epochs=2, validation_data=(X_valid_scaled, y_valid))

model.save('my_model_with_a_custom_loss.h5')

model = keras.models.load_model('my_model_with_a_custom_loss.h5', custom_objects={'HuberLoss': HuberLoss})

model.fit(X_train_scaled, y_train, epochs=2, validation_data=(X_valid_scaled, y_valid))

model.loss.threshold  # 2.0

Other custom functions

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

def my_softplus(z):
    return tf.math.log(tf.exp(z) + 1.0)  # tf.nn.softplus(z)

def my_glorot_initializer(shape, dtype=tf.float32):
    '''
    tf.random.normal(
    shape, mean=0.0, stddev=1.0, dtype=tf.dtypes.float32, seed=None, name=None)
    '''
    stddev = tf.sqrt(2. / (shape[0] + shape[1]))
    return tf.random.normal(shape, stddev=stddev, dtype=dtype)

def my_l1_regularizer(weights):
    return tf.reduce_sum(tf.abs(0.01 * weights))

def my_positive_weights(weights):
    return tf.where(weights < 0., tf.zeros_like(weights), weights)  # tf.nn.relu(weights)

layer = keras.layers.Dense(1, activation=my_softplus, 
                           kernel_initializer=my_glorot_initializer, 
                           kernel_regularizer=my_l1_regularizer, 
                           kernel_constraint=my_positive_weights)

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation='selu', kernel_initializer='lecun_normal', input_shape=input_shape),
    keras.layers.Dense(1, activation=my_softplus, 
                       kernel_initializer=my_glorot_initializer, 
                       kernel_regularizer=my_l1_regularizer, 
                       kernel_constraint=my_positive_weights)
])

model.compile(loss='mse', optimizer='nadam', metrics=['mae'])

model.fit(X_train_scaled, y_train, epochs=2, validation_data=(X_valid_scaled, y_valid))

model.save('my_model_with_many_custom_parts.h5')

model = keras.models.load_model('my_model_with_many_custom_parts.h5', 
                               custom_objects={
                                   'my_l1_regularizer': my_l1_regularizer,
                                   'my_positive_weights': my_positive_weights,
                                   'my_glorot_initializer': my_glorot_initializer,
                                   'my_softplus': my_softplus
                               })

class MyL1Regularizer(keras.regularizers.Regularizer):
    def __init__(self, factor):
        self.factor = factor
    def __call__(self, weights):
        return tf.reduce_sum(tf.abs(self.factor * weights))
    def get_config(self):
        return {'factor': self.factor}

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation='selu', kernel_initializer='lecun_normal', input_shape=input_shape),
    keras.layers.Dense(1, activation=my_softplus, 
                       kernel_regularizer=MyL1Regularizer(0.01),  
                       kernel_constraint=my_positive_weights, 
                       kernel_initializer=my_glorot_initializer)
])

model.compile(loss='mse', optimizer='nadam', metrics=['mae'])

model.fit(X_train_scaled, y_train, epochs=2, validation_data=(X_valid_scaled, y_valid))

model.save('my_model_with_many_custom_parts.h5')

model = keras.models.load_model('my_model_with_many_custom_parts.h5', custom_objects={
    'MyL1Regularizer': MyL1Regularizer,
    'my_positive_weights': my_positive_weights,
    'my_glorot_initializer': my_glorot_initializer,
    'my_softplus': my_softplus
})

Custom metrics

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation='selu', kernel_initializer='lecun_normal', input_shape=input_shape),
    keras.layers.Dense(1)
])

model.compile(loss='mse', optimizer='nadam', metrics=[create_huber(2.0)])

model.fit(X_train_scaled, y_train, epochs=2)

model.compile(loss=create_huber(2.0), optimizer='nadam', metrics=[create_huber(2.0)])

# If you use the same function as the loss and as a metric, you may be surprised to see slightly different results.
# This is generally due to floating-point precision errors: even though the mathematical equations are equivalent,
# the operations do not run in the same order, which can lead to tiny differences.
sample_weight = np.random.rand(len(y_train))
history = model.fit(X_train_scaled, y_train, epochs=2, sample_weight=sample_weight)

# If you do the math, you will find that loss = metric * mean of the sample weights (plus some floating-point precision error)
history.history['loss'][0], history.history['huber_fn'][0] * sample_weight.mean()  # (0.11749906837940216, 0.11906625573138947)

Streaming metrics

precision = keras.metrics.Precision()
precision([0, 1, 1, 1, 0, 1, 0, 1], [1, 1, 0, 1, 0, 1, 0, 1])  # <tf.Tensor: shape=(), dtype=float32, numpy=0.8>

precision([0, 1, 0, 0, 1, 0, 1, 1], [1, 0, 1, 1, 0, 0, 0, 0])  # <tf.Tensor: shape=(), dtype=float32, numpy=0.5>
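
# Streaming metrics are cumulative: batch 1 yields TP=4, FP=1 (precision 0.8); batch 2 adds TP=0, FP=3,
# so the cumulative precision is 4 / (4 + 4) = 0.5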

precision.result()  # <tf.Tensor: shape=(), dtype=float32, numpy=0.5>

precision.variables
'''
[<tf.Variable 'true_positives:0' shape=(1,) dtype=float32, numpy=array([4.], dtype=float32)>,
 <tf.Variable 'false_positives:0' shape=(1,) dtype=float32, numpy=array([4.], dtype=float32)>]
'''

precision.reset_states()

# Creating a streaming metric
class HuberMetric(keras.metrics.Metric):
    def __init__(self, threshold=1.0, **kwargs):
        super().__init__(**kwargs)
        self.threshold = threshold
        self.huber_fn = create_huber(threshold)
        self.total = self.add_weight('total', initializer='zeros')
        self.count = self.add_weight('count', initializer='zeros')
    def update_state(self, y_true, y_pred, sample_weight=None):
        metric = self.huber_fn(y_true, y_pred)
        self.total.assign_add(tf.reduce_sum(metric))
        self.count.assign_add(tf.cast(tf.size(y_true), tf.float32))
    def result(self):
        return self.total / self.count
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, 'threshold': self.threshold}

m = HuberMetric(2.)
# total = 2 * |10 - 2| - 2²/2 = 14
# count = 1
# result = 14 / 1 = 14
m(tf.constant([[2.]]), tf.constant([[10.]]))  # <tf.Tensor: shape=(), dtype=float32, numpy=14.0>

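# Second call, still with threshold 2: error 1 → 1²/2 = 0.5 (small); error 4.25 → 2 * 4.25 - 2²/2 = 6.5 (large)
# total = 14 + 0.5 + 6.5 = 21, count = 1 + 2 = 3, result = 21 / 3 = 7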
m(tf.constant([[0.], [5.]]), tf.constant([[1.], [9.25]]))
m.result()  # <tf.Tensor: shape=(), dtype=float32, numpy=7.0>

m.variables
'''
[<tf.Variable 'total:0' shape=() dtype=float32, numpy=21.0>,
 <tf.Variable 'count:0' shape=() dtype=float32, numpy=3.0>]
'''

m.reset_states()
m.variables
'''
[<tf.Variable 'total:0' shape=() dtype=float32, numpy=0.0>,
 <tf.Variable 'count:0' shape=() dtype=float32, numpy=0.0>]
'''

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation='selu', kernel_initializer='lecun_normal', input_shape=input_shape),
    keras.layers.Dense(1),
])

model.compile(loss=create_huber(2.0), optimizer='nadam', metrics=[HuberMetric(2.0)])

model.fit(X_train_scaled.astype(np.float32), y_train.astype(np.float32), epochs=2)

model.save('my_model_with_a_custom_loss.h5')

model = keras.models.load_model('my_model_with_a_custom_loss.h5', custom_objects={
    'huber_fn': create_huber(2.0),
    'HuberMetric': HuberMetric
})

model.fit(X_train_scaled.astype(np.float32), y_train.astype(np.float32), epochs=2)

model.metrics[-1].threshold  # 2.0

class HuberMetric(keras.metrics.Mean):
    def __init__(self, threshold=1.0, name='HuberMetric', dtype=None):
        self.threshold = threshold
        self.huber_fn = create_huber(threshold)
        super().__init__(name=name, dtype=dtype)
    def update_state(self, y_true, y_pred, sample_weight=None):
        metric = self.huber_fn(y_true, y_pred)
        super(HuberMetric, self).update_state(metric, sample_weight)
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, 'threshold': self.threshold}

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation='selu', kernel_initializer='lecun_normal', input_shape=input_shape),
    keras.layers.Dense(1),
])

model.compile(loss=keras.losses.Huber(2.0), optimizer='nadam', weighted_metrics=[HuberMetric(2.0)])

sample_weight = np.random.rand(len(y_train))
history = model.fit(X_train_scaled.astype(np.float32), y_train.astype(np.float32), epochs=2, sample_weight=sample_weight)

history.history['loss'][0], history.history['HuberMetric'][0] * sample_weight.mean()  # (0.44554394483566284, 0.44554404180100277)

model.save('my_model_with_a_custom_metric_v2.h5')

model = keras.models.load_model('my_model_with_a_custom_metric_v2.h5', custom_objects={
    'HuberMetric': HuberMetric
})

model.fit(X_train_scaled.astype(np.float32), y_train.astype(np.float32), epochs=2)

model.metrics[-1].threshold  # 2.0

Custom layers

exponential_layer = keras.layers.Lambda(lambda x: tf.exp(x))

exponential_layer([-1., 0., 1.])  # <tf.Tensor: shape=(3,), dtype=float32, numpy=array([0.36787945, 1.        , 2.7182817 ], dtype=float32)>

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation='relu', input_shape=input_shape),
    keras.layers.Dense(1),
    exponential_layer
])
model.compile(loss='mse', optimizer='sgd')
model.fit(X_train_scaled, y_train, epochs=5, validation_data=(X_valid_scaled, y_valid))
model.evaluate(X_test_scaled, y_test)  # 0.3586341142654419

class MyDense(keras.layers.Layer):
    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = keras.activations.get(activation)
    
    def build(self, batch_input_shape):
        self.kernel = self.add_weight(
            name='kernel', shape=[batch_input_shape[-1], self.units],
            initializer='glorot_normal')
        self.bias = self.add_weight(
            name='bias', shape=[self.units], initializer='zeros')
        super().build(batch_input_shape)  # must be at the end
        
    def call(self, X):
        return self.activation(X @ self.kernel + self.bias)
    
    def compute_output_shape(self, batch_input_shape):
        return tf.TensorShape(batch_input_shape.as_list()[:-1] + [self.units])
    
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, 'units': self.units, 'activation': keras.activations.serialize(self.activation)}

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    MyDense(30, activation='relu', input_shape=input_shape),
    MyDense(1)
])

model.compile(loss='mse', optimizer='nadam')
model.fit(X_train_scaled, y_train, epochs=2, validation_data=(X_valid_scaled, y_valid))
model.evaluate(X_test_scaled, y_test)  # 0.5473727583885193

model.save('my_model_with_a_custom_layer.h5')

model = keras.models.load_model('my_model_with_a_custom_layer.h5', custom_objects={
    'MyDense': MyDense
})

class MyMultiLayer(keras.layers.Layer):
    def call(self, X):
        X1, X2 = X
        print('X1.shape:', X1.shape, 'X2.shape:', X2.shape)
        return X1 + X2, X1 * X2
    
    def compute_output_shape(self, batch_input_shape):
        batch_input_shape1, batch_input_shape2 = batch_input_shape
        return [batch_input_shape1, batch_input_shape2]

inputs1 = keras.layers.Input(shape=[2])
inputs2 = keras.layers.Input(shape=[2])
outputs1, outputs2 = MyMultiLayer()((inputs1, inputs2))
'''
X1.shape: (None, 2) X2.shape: (None, 2)
'''

def split_data(data):
    columns_count = data.shape[-1]
    half = columns_count // 2
    return data[:, :half], data[:, half:]

X_train_scaled_A, X_train_scaled_B = split_data(X_train_scaled)
X_valid_scaled_A, X_valid_scaled_B = split_data(X_valid_scaled)
X_test_scaled_A, X_test_scaled_B = split_data(X_test_scaled)

X_train_scaled_A.shape, X_train_scaled_B.shape  # ((11610, 4), (11610, 4))

outputs1, outputs2 = MyMultiLayer()((X_train_scaled_A, X_train_scaled_B))
'''
X1.shape: (11610, 4) X2.shape: (11610, 4)
'''

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

input_A = keras.layers.Input(shape=X_train_scaled_A.shape[-1])
input_B = keras.layers.Input(shape=X_train_scaled_B.shape[-1])
hidden_A, hidden_B = MyMultiLayer()((input_A, input_B))
hidden_A = keras.layers.Dense(30, activation='selu')(hidden_A)
hidden_B = keras.layers.Dense(30, activation='selu')(hidden_B)
concat = keras.layers.Concatenate()((hidden_A, hidden_B))
output = keras.layers.Dense(1)(concat)
model = keras.models.Model(inputs=[input_A, input_B], outputs=[output])  # X1.shape: (None, 4) X2.shape: (None, 4)

model.compile(loss='mse', optimizer='nadam')

model.fit((X_train_scaled_A, X_train_scaled_B), y_train, epochs=2, validation_data=((X_valid_scaled_A, X_valid_scaled_B), y_valid))

class AddGaussianNoise(keras.layers.Layer):
    def __init__(self, stddev, **kwargs):
        super().__init__(**kwargs)
        self.stddev = stddev
    
    def call(self, X, training=None):
        if training:
            noise = tf.random.normal(tf.shape(X), stddev=self.stddev)
            return X + noise
        else:
            return X
        
    def compute_output_shape(self, batch_input_shape):
        return batch_input_shape

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    AddGaussianNoise(stddev=1.0),
    keras.layers.Dense(30, activation='selu'),
    keras.layers.Dense(1)
])

model.compile(loss='mse', optimizer='nadam')
model.fit(X_train_scaled, y_train, epochs=2, validation_data=(X_valid_scaled, y_valid))
model.evaluate(X_test_scaled, y_test)  # 0.7559615969657898

Custom models

X_new_scaled = X_test_scaled

class ResidualBlock(keras.layers.Layer):
    def __init__(self, n_layers, n_neurons, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [keras.layers.Dense(n_neurons, activation='elu', kernel_initializer='he_normal') for _ in range(n_layers)]
        
    def call(self, inputs):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        return inputs + Z

class ResidualRegressor(keras.models.Model):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden1 = keras.layers.Dense(30, activation='elu', kernel_initializer='he_normal')
        self.block1 = ResidualBlock(2, 30)
        self.block2 = ResidualBlock(2, 30)
        self.out = keras.layers.Dense(output_dim)
        
    def call(self, inputs):
        Z = self.hidden1(inputs)
        for _ in range(1 + 3):
            Z = self.block1(Z)
        Z = self.block2(Z)
        return self.out(Z)

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = ResidualRegressor(1)
model.compile(loss='mse', optimizer='nadam')
history = model.fit(X_train_scaled, y_train, epochs=5)
score = model.evaluate(X_test_scaled, y_test)
y_pred = model.predict(X_new_scaled)

model.save('my_custom_model.ckpt')

model = keras.models.load_model('my_custom_model.ckpt')

history = model.fit(X_train_scaled, y_train, epochs=5)

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

block1 = ResidualBlock(2, 30)
model = keras.models.Sequential([
    keras.layers.Dense(30, activation='elu', kernel_initializer='he_normal'),
    block1, block1, block1, block1,
    ResidualBlock(2, 30),
    keras.layers.Dense(1)
])

model.compile(loss='mse', optimizer='nadam')
history = model.fit(X_train_scaled, y_train, epochs=5)
score = model.evaluate(X_test_scaled, y_test)
y_pred = model.predict(X_new_scaled)

Losses and metrics based on model internals

class ReconstructingRegressor(keras.models.Model):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [keras.layers.Dense(30, activation='selu', kernel_initializer='lecun_normal') for _ in range(5)]
        self.out = keras.layers.Dense(output_dim)
        self.reconstruct = keras.layers.Dense(8)  # 8 = the number of input features in this dataset
        self.reconstruction_mean = keras.metrics.Mean(name='reconstruction_error')
        
    def call(self, inputs, training=None):
        Z = inputs 
        for layer in self.hidden:
            Z = layer(Z)
        reconstruction = self.reconstruct(Z)
        recon_loss = tf.reduce_mean(tf.square(reconstruction - inputs))
        self.add_loss(0.05 * recon_loss)
        if training:
            result = self.reconstruction_mean(recon_loss)
            self.add_metric(result)
        return self.out(Z)

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = ReconstructingRegressor(1)
model.compile(loss='mse', optimizer='nadam')
history = model.fit(X_train_scaled, y_train, epochs=2)
y_pred = model.predict(X_test_scaled)

Computing gradients using autodiff

def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

w1, w2 = 5, 3
eps = 1e-6
(f(w1 + eps, w2) - f(w1, w2)) / eps  # 36.000003007075065

(f(w1, w2 + eps) - f(w1, w2)) / eps  # 10.000000003174137

w1, w2 = tf.Variable(5.), tf.Variable(3.)
with tf.GradientTape() as tape:
    z = f(w1, w2)
    
gradient = tape.gradient(z, [w1, w2])

gradient
'''
[<tf.Tensor: shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=10.0>]
'''

with tf.GradientTape() as tape:
    z = f(w1, w2)
    
dz_dw1 = tape.gradient(z, w1)
try:
    dz_dw2 = tape.gradient(z, w2)
except RuntimeError as e:
    print(e)  # A non-persistent GradientTape can only be used to compute one set of gradients (or jacobians)

with tf.GradientTape(persistent=True) as tape:
    z = f(w1, w2)

dz_dw1 = tape.gradient(z, w1)
dz_dw2 = tape.gradient(z, w2)
del tape

dz_dw1, dz_dw2
'''
(<tf.Tensor: shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=10.0>)
'''

c1, c2 = tf.constant(5.), tf.constant(3.)
with tf.GradientTape() as tape:
    z = f(c1, c2)
gradients = tape.gradient(z, [c1, c2])
gradients  # [None, None]

with tf.GradientTape() as tape:
    tape.watch(c1)
    tape.watch(c2)
    z = f(c1, c2)

gradients = tape.gradient(z, [c1, c2])
gradients
'''
[<tf.Tensor: shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=10.0>]
'''

with tf.GradientTape() as tape:
    z1 = f(w1, w2 + 2.)
    z2 = f(w1, w2 + 5.)
    z3 = f(w1, w2 + 7.)
    
# If you try to compute the gradients of a vector, TensorFlow computes the gradients of the vector's sum
'''
z1 = 3 * w1 ** 2 + 2 * w1 * (w2 + 2)
z2 = 3 * w1 ** 2 + 2 * w1 * (w2 + 5)
z3 = 3 * w1 ** 2 + 2 * w1 * (w2 + 7)

z1 + z2 + z3 = 9 * w1 ** 2 + 6 * w1 * w2 + 28 * w1

d(z1 + z2 + z3)/dw1 = 18 * w1 + 6 * w2 + 28 = 18 * 5 + 6 * 3 + 28 = 136
d(z1 + z2 + z3)/dw2 = 6 * w1 = 6 * 5 = 30
'''
tape.gradient([z1, z2, z3], [w1, w2])
'''
[<tf.Tensor: shape=(), dtype=float32, numpy=136.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=30.0>]
'''

with tf.GradientTape(persistent=True) as tape:
    z1 = f(w1, w2 + 2.)
    z2 = f(w1, w2 + 5.)
    z3 = f(w1, w2 + 7.)

tf.reduce_sum(tf.stack([tape.gradient(z, [w1, w2]) for z in (z1, z2, z3)]), axis=0)
del tape

with tf.GradientTape(persistent=True) as hessian_tape:
    with tf.GradientTape() as jacobian_tape:
        z = f(w1, w2)
    jacobians = jacobian_tape.gradient(z, [w1, w2])
hessians = [hessian_tape.gradient(jacobian, [w1, w2]) for jacobian in jacobians]
del hessian_tape

jacobians
'''
[<tf.Tensor: shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=10.0>]
'''

hessians
'''
[[<tf.Tensor: shape=(), dtype=float32, numpy=6.0>,
  <tf.Tensor: shape=(), dtype=float32, numpy=2.0>],
 [<tf.Tensor: shape=(), dtype=float32, numpy=2.0>, None]]
'''

def f(w1, w2):
    return 3 * w1 ** 2 + tf.stop_gradient(2 * w1 * w2)

with tf.GradientTape() as tape:
    z = f(w1, w2)
    
tape.gradient(z, [w1, w2])  # [<tf.Tensor: shape=(), dtype=float32, numpy=30.0>, None]

x = tf.Variable(100.)
with tf.GradientTape() as tape:
    z = my_softplus(x)

tape.gradient(z, [x])  # [<tf.Tensor: shape=(), dtype=float32, numpy=nan>]

tf.math.log(tf.exp(tf.constant(30., dtype=tf.float32)) + 1.)  # <tf.Tensor: shape=(), dtype=float32, numpy=30.0>

x = tf.Variable([100.])
with tf.GradientTape() as tape:
    z = my_softplus(x)

tape.gradient(z, [x])  # [<tf.Tensor: shape=(1,), dtype=float32, numpy=array([nan], dtype=float32)>]

@tf.custom_gradient
def my_better_softplus(z):
    exp = tf.exp(z)
    def my_softplus_gradients(grad):
        return grad / (1 + 1 / exp)
    return tf.math.log(exp + 1), my_softplus_gradients

def my_better_softplus(z):
    return tf.where(z > 30., z, tf.math.log(tf.exp(z) + 1.))

x = tf.Variable([1000.])
with tf.GradientTape() as tape:
    z = my_better_softplus(x)

z, tape.gradient(z, [x])
'''
(<tf.Tensor: shape=(1,), dtype=float32, numpy=array([1000.], dtype=float32)>,
 [<tf.Tensor: shape=(1,), dtype=float32, numpy=array([nan], dtype=float32)>])
'''
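
Note: this tf.where version returns the correct value (1000.0) but a nan gradient, because tf.where backpropagates through both branches and tf.exp(1000.) overflows in the branch that was not selected. The @tf.custom_gradient version above (which this redefinition shadows) has the opposite problem: a correct gradient but an overflowing value. A robust version combines both ideas; the sketch below uses the hypothetical name my_robust_softplus (in practice, tf.math.softplus() is the numerically stable built-in):

def my_robust_softplus(z):
    # clamping the input keeps the unstable branch finite in both the forward and the backward pass
    return tf.where(z > 30., z, tf.math.log(tf.exp(tf.minimum(z, 30.)) + 1.))

x = tf.Variable([1000.])
with tf.GradientTape() as tape:
    z = my_robust_softplus(x)

z, tape.gradient(z, [x])  # value 1000.0 and gradient 1.0, both finite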

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

l2_reg = keras.regularizers.l2(0.05)
model = keras.models.Sequential([
    keras.layers.Dense(30, activation='elu', kernel_initializer='he_normal', kernel_regularizer=l2_reg),
    keras.layers.Dense(1, kernel_regularizer=l2_reg)
])

def random_batch(X, y, batch_size=32):
    idx = np.random.randint(len(X), size=batch_size)
    return X[idx], y[idx]

def print_status_bar(iteration, total, loss, metrics=None):
    metrics = ' - '.join(['{}: {:.4f}'.format(m.name, m.result()) for m in [loss] + (metrics or [])])
    end = '' if iteration < total else '\n'
    print('\r{}/{} - '.format(iteration, total) + metrics, end=end)

import time

mean_loss = keras.metrics.Mean(name='loss')
mean_square = keras.metrics.Mean(name='mean_square')
for i in range(1, 50 + 1):
    loss = 1 / i
    mean_loss(loss)
    mean_square(i ** 2)
    print_status_bar(i, 50, mean_loss, [mean_square])
    time.sleep(0.05)

def progress_bar(iteration, total, size=30):
    running = iteration < total
    c = '>' if running else '='
    p = (size - 1) * iteration // total
    fmt = '{{:-{}d}} / {{}} [{{}}]'.format(len(str(total)))
    params = [iteration, total, '=' * p + c + '.' * (size - p - 1)]
    return fmt.format(*params)

progress_bar(3500, 10000, size=6)  # ' 3500 / 10000 [=>....]'

def print_status_bar(iteration, total, loss, metrics=None, size=30):
    metrics = ' - '.join(['{}: {:.4f}'.format(m.name, m.result()) for m in [loss] + (metrics or [])])
    end = '' if iteration < total else '\n'
    print('\r{} - {}'.format(progress_bar(iteration, total), metrics), end=end)

mean_loss = keras.metrics.Mean(name='loss')
mean_square = keras.metrics.Mean(name='mean_square')
for i in range(1, 50 + 1):
    loss = 1 / i
    mean_loss(loss)
    mean_square(i ** 2)
    print_status_bar(i, 50, mean_loss, [mean_square])
    time.sleep(0.05)

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

n_epochs = 5
batch_size = 32
n_steps = len(X_train) // batch_size
optimizer = keras.optimizers.Nadam(learning_rate=0.01)
loss_fn = keras.losses.mean_squared_error
mean_loss = keras.metrics.Mean()
metrics = [keras.metrics.MeanAbsoluteError()]

for epoch in range(1, n_epochs + 1):
    print('Epoch {} / {}'.format(epoch, n_epochs))
    for step in range(1, n_steps + 1):
        X_batch, y_batch = random_batch(X_train_scaled, y_train)
        with tf.GradientTape() as tape:
            y_pred = model(X_batch)
            main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
            loss = tf.add_n([main_loss] + model.losses)
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        for variable in model.variables:
            if variable.constraint is not None:
                variable.assign(variable.constraint(variable))
        mean_loss(loss)
        for metric in metrics:
            metric(y_batch, y_pred)
        print_status_bar(step * batch_size, len(y_train), mean_loss, metrics)
    print_status_bar(len(y_train), len(y_train), mean_loss, metrics)
    for metric in [mean_loss] + metrics:
        metric.reset_states()

try:
    from tqdm.notebook import trange
    from collections import OrderedDict
    with trange(1, n_epochs + 1, desc='All epochs') as epochs:
        for epoch in epochs:
            with trange(1, n_steps + 1, desc='Epoch {} / {}'.format(epoch, n_epochs)) as steps:
                for step in steps:
                    X_batch, y_batch = random_batch(X_train_scaled, y_train)
                    with tf.GradientTape() as tape:
                        y_pred = model(X_batch)
                        main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
                        loss = tf.add_n([main_loss] + model.losses)
                    gradients = tape.gradient(loss, model.trainable_variables)
                    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
                    for variable in model.variables:
                        if variable.constraint is not None:
                            variable.assign(variable.constraint(variable))
                    status = OrderedDict()
                    mean_loss(loss)
                    status['loss'] = mean_loss.result().numpy()
                    for metric in metrics:
                        metric(y_batch, y_pred)
                        status[metric.name] = metric.result().numpy()
                    steps.set_postfix(status)
                for metric in [mean_loss] + metrics:
                    metric.reset_states()
except Exception as e:
    print(e)

TensorFlow Functions

def cube(x):
    return x ** 3

cube(2)  # 8

cube(tf.constant(2.))  # <tf.Tensor: shape=(), dtype=float32, numpy=8.0>

tf_cube = tf.function(cube)
tf_cube  # <tensorflow.python.eager.def_function.Function at 0x1fa41b8d908>

tf_cube(2)  # <tf.Tensor: shape=(), dtype=int32, numpy=8>

tf_cube(tf.constant(2.))  # <tf.Tensor: shape=(), dtype=float32, numpy=8.0>

TF Functions and Concrete Functions

concrete_function = tf_cube.get_concrete_function(tf.constant(2.))
concrete_function.graph  # <tensorflow.python.framework.func_graph.FuncGraph at 0x1fa41bb5eb8>

concrete_function(tf.constant(2.))  # <tf.Tensor: shape=(), dtype=float32, numpy=8.0>

concrete_function is tf_cube.get_concrete_function(tf.constant(2.0))  # True

Exploring function definitions and graphs

concrete_function.graph  # <tensorflow.python.framework.func_graph.FuncGraph at 0x1fa41bb5eb8>

ops = concrete_function.graph.get_operations()
ops
'''
[<tf.Operation 'x' type=Placeholder>,
 <tf.Operation 'pow/y' type=Const>,
 <tf.Operation 'pow' type=Pow>,
 <tf.Operation 'Identity' type=Identity>]
'''

pow_op = ops[2]
list(pow_op.inputs)
'''
[<tf.Tensor 'x:0' shape=() dtype=float32>,
 <tf.Tensor 'pow/y:0' shape=() dtype=float32>]
'''

pow_op.outputs  # [<tf.Tensor 'pow:0' shape=() dtype=float32>]

concrete_function.graph.get_operation_by_name('x')  # <tf.Operation 'x' type=Placeholder>

concrete_function.graph.get_tensor_by_name('Identity:0')  # <tf.Tensor 'Identity:0' shape=() dtype=float32>

concrete_function.function_def.signature
'''
name: "__inference_cube_1067234"
input_arg {
  name: "x"
  type: DT_FLOAT
}
output_arg {
  name: "identity"
  type: DT_FLOAT
}
'''

How TF functions trace Python functions to extract their computation graphs

@tf.function
def tf_cube(x):
    print('print:', x)
    return x ** 3

result = tf_cube(tf.constant(2.0))  # print: Tensor("x:0", shape=(), dtype=float32)

result  # <tf.Tensor: shape=(), dtype=float32, numpy=8.0>

result = tf_cube(2.)
result = tf_cube(3.)
result = tf_cube(tf.constant([[1, 2]]))
result = tf_cube(tf.constant([[3, 4], [5, 6]]))
result = tf_cube(tf.constant([[7, 8], [9, 10], [11, 12]]))
'''
print: 2.0
print: 3.0
print: Tensor("x:0", shape=(1, 2), dtype=int32)
print: Tensor("x:0", shape=(2, 2), dtype=int32)
WARNING:tensorflow:5 out of the last 5 calls to <function tf_cube at 0x000001A61F4BCD90> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
print: Tensor("x:0", shape=(3, 2), dtype=int32)
WARNING:tensorflow:6 out of the last 6 calls to <function tf_cube at 0x000001A61F4BCD90> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
'''

@tf.function(input_signature=[tf.TensorSpec([None, 28, 28], tf.float32)])
def shrink(images):
    print('Tracing', images)
    return images[:, ::2, ::2]

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

img_batch_1 = tf.random.uniform(shape=[100, 28, 28])
img_batch_2 = tf.random.uniform(shape=[50, 28, 28])
preprocessed_images = shrink(img_batch_1)
preprocessed_images = shrink(img_batch_2)
'''
Tracing Tensor("images:0", shape=(None, 28, 28), dtype=float32)
'''

img_batch_3 = tf.random.uniform(shape=[2, 2, 2])
try:
    preprocessed_images = shrink(img_batch_3)
except Exception as e:
    print(e)
'''
Python inputs incompatible with input_signature:
  inputs: (
    tf.Tensor(
[[[0.7413678  0.62854624]
  [0.01738465 0.3431449 ]]

 [[0.51063764 0.3777541 ]
  [0.07321596 0.02137029]]], shape=(2, 2, 2), dtype=float32))
  input_signature: (
    TensorSpec(shape=(None, 28, 28), dtype=tf.float32, name=None))
'''

Capturing control flow with AutoGraph

@tf.function
def add_10(x):
    for i in range(10):
        x += 1
    return x

add_10(tf.constant(5))  # <tf.Tensor: shape=(), dtype=int32, numpy=15>

add_10.get_concrete_function(tf.constant(5)).graph.get_operations()
'''
[<tf.Operation 'x' type=Placeholder>,
 <tf.Operation 'add/y' type=Const>,
 <tf.Operation 'add' type=AddV2>,
 <tf.Operation 'add_1/y' type=Const>,
 <tf.Operation 'add_1' type=AddV2>,
 <tf.Operation 'add_2/y' type=Const>,
 <tf.Operation 'add_2' type=AddV2>,
 <tf.Operation 'add_3/y' type=Const>,
 <tf.Operation 'add_3' type=AddV2>,
 <tf.Operation 'add_4/y' type=Const>,
 <tf.Operation 'add_4' type=AddV2>,
 <tf.Operation 'add_5/y' type=Const>,
 <tf.Operation 'add_5' type=AddV2>,
 <tf.Operation 'add_6/y' type=Const>,
 <tf.Operation 'add_6' type=AddV2>,
 <tf.Operation 'add_7/y' type=Const>,
 <tf.Operation 'add_7' type=AddV2>,
 <tf.Operation 'add_8/y' type=Const>,
 <tf.Operation 'add_8' type=AddV2>,
 <tf.Operation 'add_9/y' type=Const>,
 <tf.Operation 'add_9' type=AddV2>,
 <tf.Operation 'Identity' type=Identity>]
'''

@tf.function
def add_10(x):
    condition = lambda i, x: tf.less(i, 10)
    body = lambda i, x: (tf.add(i, 1), tf.add(x, 1))
    final_i, final_x = tf.while_loop(condition, body, [tf.constant(0), x])
    return final_x

add_10(tf.constant(5))  # <tf.Tensor: shape=(), dtype=int32, numpy=15>

add_10.get_concrete_function(tf.constant(5)).graph.get_operations()
'''
[<tf.Operation 'x' type=Placeholder>,
 <tf.Operation 'Const' type=Const>,
 <tf.Operation 'while/maximum_iterations' type=Const>,
 <tf.Operation 'while/loop_counter' type=Const>,
 <tf.Operation 'while' type=StatelessWhile>,
 <tf.Operation 'Identity' type=Identity>]
'''

@tf.function
def add_10(x):
    for i in tf.range(10):
        x += 1
    return x

add_10.get_concrete_function(tf.constant(0)).graph.get_operations()
'''
[<tf.Operation 'x' type=Placeholder>,
 <tf.Operation 'range/start' type=Const>,
 <tf.Operation 'range/limit' type=Const>,
 <tf.Operation 'range/delta' type=Const>,
 <tf.Operation 'range' type=Range>,
 <tf.Operation 'sub' type=Sub>,
 <tf.Operation 'floordiv' type=FloorDiv>,
 <tf.Operation 'mod' type=FloorMod>,
 <tf.Operation 'zeros_like' type=Const>,
 <tf.Operation 'NotEqual' type=NotEqual>,
 <tf.Operation 'Cast' type=Cast>,
 <tf.Operation 'add' type=AddV2>,
 <tf.Operation 'zeros_like_1' type=Const>,
 <tf.Operation 'Maximum' type=Maximum>,
 <tf.Operation 'while/maximum_iterations' type=Const>,
 <tf.Operation 'while/loop_counter' type=Const>,
 <tf.Operation 'while' type=StatelessWhile>,
 <tf.Operation 'Identity' type=Identity>]
'''

Handling variables and other resources in TF functions

counter = tf.Variable(0)

@tf.function
def increment(counter, c=1):
    return counter.assign_add(c)

increment(counter)
increment(counter)
'''
<tf.Tensor: shape=(), dtype=int32, numpy=2>
'''

function_def = increment.get_concrete_function(counter).function_def
function_def.signature.input_arg[0]
'''
name: "counter"
type: DT_RESOURCE
'''

counter = tf.Variable(0)

@tf.function
def increment(c=1):
    return counter.assign_add(c)

increment()
increment()  # <tf.Tensor: shape=(), dtype=int32, numpy=2>

function_def = increment.get_concrete_function().function_def
function_def.signature.input_arg[0]
'''
name: "assignaddvariableop_resource"
type: DT_RESOURCE
'''

class Counter:
    def __init__(self):
        self.counter = tf.Variable(0)
        
    @tf.function
    def increment(self, c=1):
        return self.counter.assign_add(c)

c = Counter()
c.increment()
c.increment()  # <tf.Tensor: shape=(), dtype=int32, numpy=2>

@tf.function
def add_10(x):
    for i in tf.range(10):
        x += 1
    return x

print(tf.autograph.to_code(add_10.python_function))
'''
def tf__add(x):
    with ag__.FunctionScope('add_10', 'fscope', ag__.ConversionOptions(recursive=True, user_requested=True, optional_features=(), internal_convert_user_code=True)) as fscope:
        do_return = False
        retval_ = ag__.UndefinedReturnValue()

        def get_state():
            return (x,)

        def set_state(vars_):
            nonlocal x
            (x,) = vars_

        def loop_body(itr):
            nonlocal x
            i = itr
            x = ag__.ld(x)
            x += 1
        i = ag__.Undefined('i')
        ag__.for_stmt(ag__.converted_call(ag__.ld(tf).range, (10,), None, fscope), None, loop_body, get_state, set_state, ('x',), {'iterate_names': 'i'})
        try:
            do_return = True
            retval_ = ag__.ld(x)
        except:
            do_return = False
            raise
        return fscope.ret(retval_, do_return)
'''

def display_tf_code(func):
    from IPython.display import display, Markdown
    if hasattr(func, 'python_function'):
        func = func.python_function
    code = tf.autograph.to_code(func)
    display(Markdown('```python\n{}\n```'.format(code)))

display_tf_code(add_10)
'''
def tf__add(x):
    with ag__.FunctionScope('add_10', 'fscope', ag__.ConversionOptions(recursive=True, user_requested=True, optional_features=(), internal_convert_user_code=True)) as fscope:
        do_return = False
        retval_ = ag__.UndefinedReturnValue()

        def get_state():
            return (x,)

        def set_state(vars_):
            nonlocal x
            (x,) = vars_

        def loop_body(itr):
            nonlocal x
            i = itr
            x = ag__.ld(x)
            x += 1
        i = ag__.Undefined('i')
        ag__.for_stmt(ag__.converted_call(ag__.ld(tf).range, (10,), None, fscope), None, loop_body, get_state, set_state, ('x',), {'iterate_names': 'i'})
        try:
            do_return = True
            retval_ = ag__.ld(x)
        except:
            do_return = False
            raise
        return fscope.ret(retval_, do_return)
'''

Using TF functions with Keras

# Custom loss function
def my_mse(y_true, y_pred):
    print('Tracing loss my_mse()')
    return tf.reduce_mean(tf.square(y_pred - y_true))

# Custom metric function
def my_mae(y_true, y_pred):
    print('Tracing metric my_mae()')
    return tf.reduce_mean(tf.abs(y_pred - y_true))

# Custom layer
class MyDense(keras.layers.Layer):
    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = keras.activations.get(activation)
        
    def build(self, input_shape):
        self.kernel = self.add_weight(name='kernel', shape=(input_shape[1], self.units), 
                                    initializer='uniform', trainable=True)
        self.biases = self.add_weight(name='bias', shape=(self.units,), initializer='zeros', trainable=True)
        super().build(input_shape)
        
    def call(self, X):
        print('Tracing MyDense.call()')
        return self.activation(X @ self.kernel + self.biases)

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

# Custom model
class MyModel(keras.models.Model):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.hidden1 = MyDense(30, activation='relu')
        self.hidden2 = MyDense(30, activation='relu')
        self.output_ = MyDense(1)
        
    def call(self, inputs):
        print('Tracing MyModel.call()')
        hidden1 = self.hidden1(inputs)
        hidden2 = self.hidden2(hidden1)
        concat = keras.layers.concatenate([inputs, hidden2])
        output = self.output_(concat)
        return output
    
model = MyModel()

model.compile(loss=my_mse, optimizer='nadam', metrics=[my_mae])

model.fit(X_train_scaled, y_train, epochs=2, validation_data=(X_valid_scaled, y_valid))
model.evaluate(X_test_scaled, y_test)  # [0.4163525402545929, 0.4639028012752533]

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = MyModel(dynamic=True)

model.compile(loss=my_mse, optimizer='nadam', metrics=[my_mae])

# With dynamic=True the model runs eagerly, so the custom code is called at every iteration
# (without it, the code would only run during tracing); tiny datasets keep the output short
model.fit(X_test_scaled[:64], y_train[:64], epochs=1, validation_data=(X_valid_scaled[:64], y_valid[:64]), verbose=0)
model.evaluate(X_test_scaled[:64], y_test[:64], verbose=0)
'''
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
[5.507431983947754, 2.055328845977783]
'''

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = MyModel()

model.compile(loss=my_mse, optimizer='nadam', metrics=[my_mae], run_eagerly=True)

model.fit(X_test_scaled[:64], y_train[:64], epochs=1, validation_data=(X_valid_scaled[:64], y_valid[:64]), verbose=0)
model.evaluate(X_test_scaled[:64], y_test[:64], verbose=0)
'''
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
[5.507431983947754, 2.055328845977783]
'''

Custom optimizers

class MyMomentumOptimizer(keras.optimizers.Optimizer):
    def __init__(self, learning_rate=0.001, momentum=0.9, name='MyMomentumOptimizer', **kwargs):
        """Call super().__init__() and use _set_hyper() to store hyperparameters"""
        super().__init__(name, **kwargs)
        self._set_hyper('learning_rate', kwargs.get('lr', learning_rate))
        self._set_hyper('decay', self._initial_decay)
        self._set_hyper('momentum', momentum)
    
    def _create_slots(self, var_list):
        """For each model variable, create the optimizer variable associated with it.
        TensorFlow calls these optimizer variables "slots".
        For momentum optimization, we need one momentum slot per model variable.
        """
        for var in var_list:
            self.add_slot(var, 'momentum')
            
    @tf.function
    def _resource_apply_dense(self, grad, var):
        """Update the slots and perform one optimization step for one model variable
        """
        var_dtype = var.dtype.base_dtype
        lr_t = self._decayed_lr(var_dtype)
        momentum_var = self.get_slot(var, 'momentum')
        momentum_hyper = self._get_hyper('momentum', var_dtype)
        momentum_var.assign(momentum_var * momentum_hyper - (1. - momentum_hyper) * grad)
        var.assign_add(momentum_var * lr_t)
        
    def _resource_apply_sparse(self, grad, var):
        raise NotImplementedError
        
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, 'learning_rate': self._serialize_hyperparameter('learning_rate'),
               'decay': self._serialize_hyperparameter('decay'), 
               'momentum': self._serialize_hyperparameter('momentum')}

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([keras.layers.Dense(1, input_shape=[8])])
model.compile(loss='mse', optimizer=MyMomentumOptimizer())
model.fit(X_train_scaled, y_train, epochs=5)
Original (Chinese) source: https://www.cnblogs.com/lotuslaw/p/15567248.html