TensorFlow: Eager essentials [translation]

Eager essentials

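TensorFlow's eager execution is an imperative programming environment that evaluates operations immediately: operations return concrete values instead of building a computational graph to run later. This makes it easy to get started with TensorFlow and to debug models, and it also reduces boilerplate.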

Eager execution is a flexible platform for machine learning research and experimentation. It provides:

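An intuitive interface: structure your code naturally and use Python data structures. Iterate quickly on small models and small datasets.
Easier debugging: call ops directly to inspect running models and test changes. Use standard Python debugging tools for immediate error reporting.
Natural control flow: use Python control flow instead of graph control flow, which simplifies the specification of dynamic models.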

Setup and basic usage

from __future__ import absolute_import, division, print_function, unicode_literals

!pip install -q tensorflow-gpu==2.0.0-beta1
import tensorflow as tf

import cProfile

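In TensorFlow 2.0, eager execution is enabled by default.

tf.executing_eagerly()  # returns True when eager execution is enabled

With eager execution enabled, you can run TensorFlow operations and get their results back immediately: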

x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m))  # hello, [[4.]]

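Enabling eager execution changes how TensorFlow operations behave: they now evaluate immediately and return their values to Python. tf.Tensor objects refer to concrete values rather than symbolic handles to nodes in a computational graph. Because there is no computational graph to build and run later in a session, it is easy to inspect results with print() or a debugger. Evaluating, printing, and checking tensor values does not break the flow for computing gradients.

Eager execution works well with NumPy. NumPy operations accept tf.Tensor arguments. TensorFlow math operations convert Python objects and NumPy arrays to tf.Tensor objects. The tf.Tensor.numpy method returns the object's value as a NumPy ndarray.

Eager execution also supports broadcasting and operator overloading: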

a = tf.constant([[1,2],
                 [3,4]
])
print(a)  # tf.Tensor holding a 2x2 matrix: shape=(2, 2), dtype=int32

b = tf.add(a,1)
print(b)  # broadcasting: [[2, 3], [4, 5]]


print(a*b) # operator overloading 

import numpy as np
c = np.multiply(a,b)  # use numpy values
print(c)

print(a.numpy())  # tensor->numpy

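Dynamic control flow

A major benefit of eager execution is that the full functionality of the host language is available while your model is executing. For example, it is easy to write fizzbuzz: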

def fizzbuzz(max_num):
  counter = tf.constant(0)
  max_num = tf.convert_to_tensor(max_num)
  for num in range(1, max_num.numpy()+1):
    num = tf.constant(num)
    if int(num % 3) == 0 and int(num % 5) == 0:
      print('FizzBuzz')
    elif int(num % 3) == 0:
      print('Fizz')
    elif int(num % 5) == 0:
      print('Buzz')
    else:
      print(num.numpy())
    counter += 1

fizzbuzz(15)  # 1 2 Fizz

Eager training

Computing gradients

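Automatic differentiation is very useful for implementing machine learning algorithms such as backpropagation for training neural networks. During eager execution, use tf.GradientTape to trace operations so that gradients can be computed later.

You can use tf.GradientTape to train and to compute gradients in eager mode. It is especially useful in complicated training loops.

Since different operations can occur during each call, all forward-pass operations are recorded to a "tape". To compute the gradient, the tape is played backwards and then discarded. A particular tf.GradientTape can compute only one gradient; subsequent calls raise a runtime error (unless the tape was created with persistent=True).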
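The tape behaviour described above can be sketched as follows. This is a minimal illustration, not taken from the original post: the variable name w and the toy loss w * w are assumptions chosen for the example. It records one forward computation, computes its gradient once, and shows that a second gradient call on the same non-persistent tape fails.

```python
import tensorflow as tf

# Illustrative variable; trainable tf.Variable objects are watched automatically.
w = tf.Variable([[1.0]])

with tf.GradientTape() as tape:
    loss = w * w  # the forward pass is recorded on the tape

# First call: d(loss)/dw = 2 * w
grad = tape.gradient(loss, w)
print(grad.numpy())

# A non-persistent tape releases its resources after the first gradient() call,
# so asking for a gradient again raises a RuntimeError.
try:
    tape.gradient(loss, w)
    second_call_failed = False
except RuntimeError as err:
    second_call_failed = True
    print("Second call failed:", err)
```

If several gradients are needed from one recording, creating the tape with tf.GradientTape(persistent=True) allows repeated gradient() calls.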

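Train a model

The following example creates a multi-layer model that classifies the standard MNIST handwritten digits. It demonstrates the optimizer and layer APIs used to build trainable graphs in an eager execution environment.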

# Fetch and format the mnist data
(mnist_images, mnist_labels), _ = tf.keras.datasets.mnist.load_data()

dataset = tf.data.Dataset.from_tensor_slices(
  (tf.cast(mnist_images[...,tf.newaxis]/255, tf.float32),
   tf.cast(mnist_labels,tf.int64)))
dataset = dataset.shuffle(1000).batch(32)
# Build the model
mnist_model = tf.keras.Sequential([
  tf.keras.layers.Conv2D(16,[3,3], activation='relu',
                         input_shape=(None, None, 1)),
  tf.keras.layers.Conv2D(16,[3,3], activation='relu'),
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(10)
])
# Even without training, call the model and inspect the output in eager execution:
for images,labels in dataset.take(1):
  print("Logits: ", mnist_model(images[0:1]).numpy())

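While Keras models have a built-in training loop (the fit method), sometimes you need more customization. Here is an example of a training loop implemented with eager execution: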

optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

loss_history = []

def train_step(images, labels):
  with tf.GradientTape() as tape:
    logits = mnist_model(images, training=True)
    
    # Add asserts to check the shape of the output.
    tf.debugging.assert_equal(logits.shape, (32, 10))
    
    loss_value = loss_object(labels, logits)

  loss_history.append(loss_value.numpy().mean())
  grads = tape.gradient(loss_value, mnist_model.trainable_variables)
  optimizer.apply_gradients(zip(grads, mnist_model.trainable_variables))

def train():
  for epoch in range(3):
    for (batch, (images, labels)) in enumerate(dataset):
      train_step(images, labels)
    print ('Epoch {} finished'.format(epoch))

train() # Epoch 0 finished;Epoch 1 finished ...

import matplotlib.pyplot as plt

plt.plot(loss_history)
plt.xlabel('Batch #')
plt.ylabel('Loss [entropy]')

Variables and optimizers

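tf.Variable objects store mutable tf.Tensor-like values that are accessed during training, which makes automatic differentiation easier. A model's parameters can be encapsulated in a class as variables.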
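As a small sketch of what "mutable" means here (the names v and y are illustrative, not from the original), a tf.Variable can be updated in place with assign and assign_add, while an operation on it produces a new tensor and leaves the stored value untouched:

```python
import tensorflow as tf

v = tf.Variable(1.0)   # mutable state owned by a Python object
v.assign(3.0)          # overwrite the stored value in place
v.assign_add(0.5)      # increment the stored value in place
print(v.numpy())

y = v * 2.0            # an op on the variable returns a new tf.Tensor
print(y.numpy())
print(v.numpy())       # the variable itself is unchanged by the multiplication
```

An optimizer's apply_gradients call performs the same kind of in-place update on each trainable variable.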

Using tf.Variable together with tf.GradientTape encapsulates model parameters better. For example, an automatic differentiation example can be rewritten as:

class Model(tf.keras.Model):
  def __init__(self):
    super(Model, self).__init__()
    self.W = tf.Variable(5., name='weight')
    self.B = tf.Variable(10., name='bias')
  def call(self, inputs):
    return inputs * self.W + self.B

# A toy dataset of points around 3 * x + 2
NUM_EXAMPLES = 2000
training_inputs = tf.random.normal([NUM_EXAMPLES])
noise = tf.random.normal([NUM_EXAMPLES])
training_outputs = training_inputs * 3 + 2 + noise

# The loss function to be optimized
def loss(model, inputs, targets):
  error = model(inputs) - targets
  return tf.reduce_mean(tf.square(error))

def grad(model, inputs, targets):
  with tf.GradientTape() as tape:
    loss_value = loss(model, inputs, targets)
  return tape.gradient(loss_value, [model.W, model.B])

# Define:
# 1. A model.
# 2. Derivatives of a loss function with respect to model parameters.
# 3. A strategy for updating the variables based on the derivatives.
model = Model()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

print("Initial loss: {:.3f}".format(loss(model, training_inputs, training_outputs)))

# Training loop
for i in range(300):
  grads = grad(model, training_inputs, training_outputs)
  optimizer.apply_gradients(zip(grads, [model.W, model.B]))
  if i % 20 == 0:
    print("Loss at step {:03d}: {:.3f}".format(i, loss(model, training_inputs, training_outputs)))

print("Final loss: {:.3f}".format(loss(model, training_inputs, training_outputs)))
print("W = {}, B = {}".format(model.W.numpy(), model.B.numpy()))

Use objects for state during eager execution

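With TF 1.x graph execution, program state (such as variables) is stored in global collections and its lifetime is managed by the tf.Session object. In contrast, during eager execution the lifetime of a state object is determined by the lifetime of its corresponding Python object.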

Variables are objects

During eager execution, a variable persists until the last reference to the object is removed, and is then deleted.

if tf.test.is_gpu_available():
  with tf.device("gpu:0"):
    print("GPU enabled")
    v = tf.Variable(tf.random.normal([1000, 1000]))
    v = None  # v no longer takes up GPU memory

Object-based saving

This section is an abbreviated version of the guide to training checkpoints.

tf.train.Checkpoint can save and restore tf.Variables to and from checkpoints:

(Saving and restoring variables)

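# First create a variable and a checkpoint object that tracks it
x = tf.Variable(10.)
checkpoint = tf.train.Checkpoint(x=x)
x.assign(2.)   # Assign a new value to x, then save
checkpoint_path = './ckpt/'
checkpoint.save(checkpoint_path)  # Note the prefix is './ckpt/', not './ckpt'.
# With './ckpt/' the files are written inside the ./ckpt/ directory and are named '-1.*'.
# With './ckpt' they would be written to the current directory as 'ckpt-1.*'.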

x.assign(11.)  # Change the variable after saving.

# Restore values from the checkpoint
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_path))

print(x)  # =><tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.0>

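To save and load models, tf.train.Checkpoint stores the internal state of objects, without requiring hidden variables. To record the state of a model, an optimizer, and a global step, pass them to a tf.train.Checkpoint:

(Saving and restoring a model)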

# save and restore model
import os 

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16,[3,3],activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10)
])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
checkpoint_dir = 'path/to/model_dir'
if not os.path.exists(checkpoint_dir):
    os.makedirs(checkpoint_dir)
checkpoint_prefix = os.path.join(checkpoint_dir,'ckpt')
# print(checkpoint_prefix)  # path/to/model_dir/ckpt
root = tf.train.Checkpoint(optimizer=optimizer,model=model)

root.save(checkpoint_prefix)  # writes path/to/model_dir/ckpt-1.*
root.restore(tf.train.latest_checkpoint(checkpoint_dir))  # restore the variables

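Note: In many training loops, variables are created after tf.train.Checkpoint.restore is called. These variables are restored as soon as they are created, and assertions are available to ensure that a checkpoint has been fully loaded. See the guide to training checkpoints for details.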
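The assertion mentioned in the note can be sketched like this. It is a minimal example under stated assumptions: the temporary directory, the prefix name, and the single variable x are illustrative, and only the simple case where the variable already exists at restore time is shown. The status object returned by restore exposes assert_consumed, which raises an error if some saved value was not matched to an object.

```python
import os
import tempfile

import tensorflow as tf

# Temporary directory used only for this sketch.
ckpt_dir = tempfile.mkdtemp()
prefix = os.path.join(ckpt_dir, "ckpt")

# Save a checkpoint that tracks one variable holding 2.0.
saved = tf.train.Checkpoint(x=tf.Variable(2.0))
save_path = saved.save(prefix)

# Restore into a fresh variable and assert that every saved value was matched.
x = tf.Variable(0.0)
status = tf.train.Checkpoint(x=x).restore(save_path)
status.assert_consumed()  # raises if part of the checkpoint was left unused
print(x.numpy())
```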

Advanced automatic differentiation topics

Recommended related reading: https://www.cnblogs.com/richqian/p/4549590.html

https://www.cnblogs.com/richqian/p/4534356.html

https://www.jianshu.com/p/fe2e7f0e89e5

Dynamic models

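tf.GradientTape can also be used in dynamic models. The following backtracking line search example looks like ordinary NumPy code, except that it computes gradients and is differentiable despite the complex control flow: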

def line_search_step(fn, init_x, rate=1.0):
  with tf.GradientTape() as tape:
    # Variables are automatically recorded, but manually watch a tensor
    tape.watch(init_x)
    value = fn(init_x)
  grad = tape.gradient(value, init_x)
  grad_norm = tf.reduce_sum(grad * grad)
  init_value = value
  x = init_x  # fall back to the starting point if the loop body never runs
  while value > init_value - rate * grad_norm:
    x = init_x - rate * grad
    value = fn(x)
    rate /= 2.0
  return x, value
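The comment about watching tensors in the function above can be illustrated with a short sketch (the names x, v, and the toy function y = x * x are assumptions for this example). A plain tensor is ignored by the tape unless tape.watch is called on it, while a trainable tf.Variable is recorded automatically.

```python
import tensorflow as tf

x = tf.constant(3.0)   # plain tensor: not tracked unless watched
v = tf.Variable(3.0)   # trainable variable: tracked automatically

with tf.GradientTape() as tape:
    y = x * x
unwatched = tape.gradient(y, x)      # None, because x was never watched

with tf.GradientTape() as tape:
    tape.watch(x)
    y = x * x
watched = tape.gradient(y, x)        # dy/dx = 2 * x

with tf.GradientTape() as tape:
    y = v * v
from_variable = tape.gradient(y, v)  # dy/dv = 2 * v, no watch needed

print(unwatched, watched.numpy(), from_variable.numpy())
```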

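Custom gradients

Custom gradients are an easy way to override gradients. Within the forward function, define the gradient with respect to the inputs, outputs, or intermediate results. For example, here is an easy way to clip the norm of the gradients in the backward pass: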

@tf.custom_gradient
def clip_gradient_by_norm(x, norm):
  y = tf.identity(x)
  def grad_fn(dresult):
    return [tf.clip_by_norm(dresult, norm), None]
  return y, grad_fn

Custom gradients are commonly used to provide a numerically stable gradient for a sequence of operations:
def log1pexp(x):
  return tf.math.log(1 + tf.exp(x))

def grad_log1pexp(x):
  with tf.GradientTape() as tape:
    tape.watch(x)
    value = log1pexp(x)
  return tape.gradient(value, x)

# The gradient computation works fine at x = 0.
grad_log1pexp(tf.constant(0.)).numpy()
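The post stops before showing where the naive version breaks. As a hedged sketch (the helper names naive_log1pexp, stable_log1pexp, and gradient_at are introduced here for illustration), the following compares the automatically derived gradient with a version whose gradient is written analytically via tf.custom_gradient. At x = 100 the float32 value of exp(x) overflows to infinity, so the automatic gradient becomes nan, while the analytic form stays finite.

```python
import math

import tensorflow as tf


def naive_log1pexp(x):
    # Gradient is derived automatically, op by op.
    return tf.math.log(1 + tf.exp(x))


@tf.custom_gradient
def stable_log1pexp(x):
    e = tf.exp(x)

    def grad(dy):
        # Analytically simplified derivative: 1 - 1 / (1 + e^x)
        return dy * (1 - 1 / (1 + e))

    return tf.math.log(1 + e), grad


def gradient_at(fn, value):
    # Evaluate d fn / dx at a single float32 point.
    x = tf.constant(value)
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = fn(x)
    return float(tape.gradient(y, x).numpy())


naive = gradient_at(naive_log1pexp, 100.0)
stable = gradient_at(stable_log1pexp, 100.0)
print(naive, stable)
```

Only the gradient is stabilized in this sketch; the forward value at x = 100 still overflows in float32.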

Performance

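During eager execution, computation is automatically offloaded to GPUs. If you want control over where a computation runs, you can enclose it in a tf.device('/gpu:0') block (or the CPU equivalent):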

import time

def measure(x, steps):
  # TensorFlow initializes a GPU the first time it's used, exclude from timing.
  tf.matmul(x, x)
  start = time.time()
  for i in range(steps):
    x = tf.matmul(x, x)
  # tf.matmul can return before completing the matrix multiplication
  # (e.g., can return after enqueing the operation on a CUDA stream).
  # The x.numpy() call below will ensure that all enqueued operations
  # have completed (and will also copy the result to host memory,
  # so we're including a little more than just the matmul operation
  # time).
  _ = x.numpy()
  end = time.time()
  return end - start

# shape = (1000, 1000)
shape = (50, 50)  # On my machine only 50 seems to work; above 100 the Jupyter notebook crashes. I also still don't know how to check GPU utilization.
steps = 200
print("Time to multiply a {} matrix by itself {} times:".format(shape, steps))

# Run on CPU:
with tf.device("/cpu:0"):
  print("CPU: {} secs".format(measure(tf.random.normal(shape), steps)))

# Run on GPU, if available:
if tf.test.is_gpu_available():
  with tf.device("/gpu:0"):
    print("GPU: {} secs".format(measure(tf.random.normal(shape), steps)))
else:
  print("GPU: not found")

A tf.Tensor object can be copied to a different device to execute its operations:

if tf.test.is_gpu_available():
  x = tf.random.normal([10, 10])

  x_gpu0 = x.gpu()
  x_cpu = x.cpu()

  _ = tf.matmul(x_cpu, x_cpu)    # Runs on CPU
  _ = tf.matmul(x_gpu0, x_gpu0)  # Runs on GPU:0
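The .gpu() and .cpu() tensor methods used above work with the TensorFlow version pinned in this post, but to my knowledge they were deprecated in later releases in favour of tf.identity inside a tf.device block. A minimal sketch of that alternative follows; it assumes TensorFlow 2.1 or newer, where tf.config.list_physical_devices is available, and the GPU branch only runs when a GPU is present.

```python
import tensorflow as tf

x = tf.random.normal([10, 10])

with tf.device("/cpu:0"):
    x_cpu = tf.identity(x)           # place a copy of the tensor on the CPU
    _ = tf.matmul(x_cpu, x_cpu)      # runs on CPU
print(x_cpu.device)

# Assumption: TF 2.1+ (earlier versions expose this under tf.config.experimental).
if tf.config.list_physical_devices("GPU"):
    with tf.device("/gpu:0"):
        x_gpu0 = tf.identity(x)      # place a copy of the tensor on GPU:0
        _ = tf.matmul(x_gpu0, x_gpu0)  # runs on GPU:0
    print(x_gpu0.device)
```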