cs20_3-3

1. tf.data

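Advantages over feed_dict and placeholders: data operations such as shuffle/batch/repeat/map are built into TensorFlow, so the pipeline is efficient and fast, and because it is a high-level API it is also convenient to use.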
tf.data.Dataset

Print a dataset's output types and shapes as a sanity check

print(xxxdataset.output_types)   # >> (tf.float32, tf.float32)
print(xxxdataset.output_shapes)  # >> (TensorShape([]), TensorShape([]))

There are many input formats. Based on my experience with large video and image datasets, I recommend tf.data.TFRecordDataset(filenames).

Some basic data operations

# Prepare the data
dataset = tf.data.TFRecordDataset([file1, file2, file3, ...])
# Data operations
dataset = dataset.shuffle(1000)
dataset = dataset.repeat(100)
dataset = dataset.batch(128)
dataset = dataset.map(lambda x: tf.one_hot(x, 10)) # convert to one-hot encoding (illustrative; assumes x holds integer class labels)
# Fetch the data
iterator = dataset.make_one_shot_iterator() # one way to get an iterator; a more general one is shown later
X, Y = iterator.get_next() # if batch() was applied above, each call returns one batch; otherwise a single sample (x, y)
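
The order of repeat and batch in the pipeline above changes what the batches look like. Here is a minimal plain-Python sketch of that behaviour. It is a toy model, not TensorFlow code: the helper functions `repeat` and `batch` are made up for illustration, shuffle is left out to keep the output deterministic, and `batch` is assumed to keep a smaller final batch, as dataset.batch does by default.

```python
def repeat(elements, count):
    """Concatenate `count` passes over the data (models dataset.repeat(count))."""
    return [x for _ in range(count) for x in elements]


def batch(elements, batch_size):
    """Group consecutive elements; the last group may be smaller
    (models dataset.batch(batch_size) without dropping the remainder)."""
    return [elements[i:i + batch_size]
            for i in range(0, len(elements), batch_size)]


samples = list(range(10))  # 10 toy samples standing in for one epoch of data

# repeat -> batch: batches are cut from the concatenated stream,
# so a batch can straddle the boundary between two epochs.
repeat_then_batch = batch(repeat(samples, 2), 4)

# batch -> repeat: each epoch is batched on its own,
# so every epoch ends with a short batch.
batch_then_repeat = repeat(batch(samples, 4), 2)

print([len(b) for b in repeat_then_batch])
print([len(b) for b in batch_then_repeat])
print(repeat_then_batch[2])  # the batch that mixes the end of epoch 1 with the start of epoch 2
```

If batches must never mix epochs, put batch before repeat. If every batch should be full, repeat first, as the pipeline above does.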

According to the course notes, tf.data is more efficient than feed_dict and placeholders.

A very useful programming practice

iterator = tf.data.Iterator.from_structure(train_data.output_types,
                                           train_data.output_shapes)
img, label = iterator.get_next()

train_init = iterator.make_initializer(train_data)  # initializer for train_data
test_init = iterator.make_initializer(test_data)  # initializer for test_data

# ...
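sess.run(train_init) # img, label now draw from the training set
# ...
sess.run(test_init) # img, label now draw from the test set

# The img, label = iterator.get_next() at the top causes no name clash: whichever initializer was run last decides which dataset they read from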
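
To make the idea concrete, here is a plain-Python analogy of the pattern above. It is only a sketch: `ReinitializableIterator` is a hypothetical toy class, not the TensorFlow implementation, and the sample data is made up. In TensorFlow, get_next() is called once to create tensors that are then evaluated repeatedly, whereas here it is simply called each time a sample is needed.

```python
class ReinitializableIterator:
    """Toy stand-in for an iterator built with Iterator.from_structure:
    it is not bound to any dataset until an initializer is run."""

    def __init__(self):
        self._source = None

    def make_initializer(self, dataset):
        # Returns a callable that plays the role of the init op.
        def init():
            self._source = iter(dataset)
        return init

    def get_next(self):
        if self._source is None:
            raise RuntimeError("run an initializer first")
        return next(self._source)


# Made-up (img, label) pairs for illustration.
train_data = [("train_img_0", 0), ("train_img_1", 1)]
test_data = [("test_img_0", 1)]

iterator = ReinitializableIterator()
train_init = iterator.make_initializer(train_data)
test_init = iterator.make_initializer(test_data)

train_init()                      # like sess.run(train_init)
first_train = iterator.get_next()

test_init()                       # like sess.run(test_init)
first_test = iterator.get_next()

train_init()                      # running it again restarts the training data
restarted = iterator.get_next()

print(first_train, first_test, restarted)
```

The same get_next handle serves both datasets, so the model code is written once and only the initializer decides where the data comes from.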

2. Optimizer quick notes

Applying custom modifications to the gradients

# create an optimizer.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# compute the gradients for a list of variables.
grads_and_vars = optimizer.compute_gradients(loss, <list of variables>)

# grads_and_vars is a list of tuples (gradient, variable).  Do whatever you
# need to the 'gradient' part, for example, subtract each of them by 1.
subtracted_grads_and_vars = [(gv[0] - 1.0, gv[1]) for gv in grads_and_vars]

# ask the optimizer to apply the subtracted gradients.
optimizer.apply_gradients(subtracted_grads_and_vars)
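
The compute, modify, apply split can be sketched without TensorFlow. The following is a toy model under stated assumptions: the functions `compute_gradients` and `apply_gradients` are hand-written stand-ins for the optimizer methods, the loss is a made-up quadratic, and the modification is gradient clipping to [-1, 1] instead of the subtract-by-1 example above.

```python
def compute_gradients(w, target=3.0):
    """Gradient of loss = (w - target)**2 with respect to w, returned as a
    list of (gradient, variable value) pairs, mirroring grads_and_vars."""
    return [(2.0 * (w - target), w)]


def apply_gradients(grads_and_vars, learning_rate=0.1):
    """Plain gradient-descent update for the single variable."""
    (grad, w), = grads_and_vars
    return w - learning_rate * grad


w = 10.0
for _ in range(200):
    grads_and_vars = compute_gradients(w)
    # Modify the gradients between the two steps: clip each one to [-1, 1].
    clipped = [(max(-1.0, min(1.0, g)), v) for g, v in grads_and_vars]
    w = apply_gradients(clipped)

print(w)  # close to the minimiser w = 3
```

Far from the minimum the clipped gradient caps each step at 0.1, and once the raw gradient falls inside [-1, 1] the updates are ordinary gradient descent.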

Keeping certain tensors out of the gradient computation
```
tf.stop_gradient(input, name=None)
```
- Example use cases:
  - When you train a GAN (Generative Adversarial Network) where no backprop should happen through the adversarial example generation process.
  - The EM algorithm where the M-step should not involve backpropagation through the output of the E-step
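
What stop_gradient does can be shown on a tiny example. This is a sketch only: the `Dual` class and this `stop_gradient` function are hypothetical, and they use forward-mode differentiation with dual numbers, whereas TensorFlow uses reverse-mode backpropagation. The effect on the derivative is the same: the forward value is unchanged, but the wrapped term is treated as a constant.

```python
class Dual:
    """A value together with its derivative with respect to the input x."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __mul__(self, other):
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)


def stop_gradient(d):
    """Same forward value, but treated as a constant when differentiating."""
    return Dual(d.value, 0.0)


x = Dual(3.0, 1.0)               # the input itself, so dx/dx = 1

full = x * x                     # d(x * x)/dx = 2x
blocked = x * stop_gradient(x)   # second factor is a constant, so the derivative is x

print(full.value, full.deriv)
print(blocked.value, blocked.deriv)
```

Both expressions evaluate to 9 at x = 3, but the derivative drops from 6 to 3 once one factor is wrapped in stop_gradient.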

Manually computing partial derivatives for some y = f(x):
```
tf.gradients(
    ys,
    xs,
    grad_ys=None,
    name='gradients',
    colocate_gradients_with_ops=False,
    gate_gradients=False,
    aggregation_method=None,
    stop_gradients=None
)
```
- Example use case:

  Technical detail: This is especially useful when training only parts of a model. For example, we can use tf.gradients() to take the derivative G of the loss w.r.t. the middle layer. Then we use an optimizer to minimize the difference between the middle layer output M and M + G. This only updates the lower half of the network. (In other words, freeze some layers and train only the others, as in fine-tuning.)
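
As a rough illustration of taking the gradient with respect to a middle layer, here is a plain-Python sketch on a two-weight "network". Everything in it is an assumption made for the example: the scalar weights `w1` and `w2`, the squared-error loss, the input values, and the learning rate. The update shown pushes G back to the lower weight with the chain rule, which is one simple way to use G and not necessarily the exact M versus M + G construction described above.

```python
x, target = 2.0, 1.0
w1, w2 = 0.5, 3.0   # lower-layer and upper-layer weights (made-up values)


def upper(m):
    """Upper half of the network plus the loss, as a function of the middle output."""
    return (w2 * m - target) ** 2


m = w1 * x                          # middle-layer output M
g = 2.0 * (w2 * m - target) * w2    # G = d(loss)/dM, the role played by tf.gradients(loss, M)

# Finite-difference check that G really is the derivative of the loss w.r.t. M.
eps = 1e-6
g_numeric = (upper(m + eps) - upper(m - eps)) / (2 * eps)

# Update only the lower half: d(loss)/d(w1) = G * dM/dw1 = G * x, and w2 stays frozen.
loss_before = upper(m)
w1_new = w1 - 0.01 * g * x
loss_after = upper(w1_new * x)

print(g, g_numeric)
print(loss_before, loss_after)
```

The upper weight never changes, yet the loss drops, which is the behaviour wanted when part of a network is frozen.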

Zhihu: HaoZhang's Zhihu

GitHub: HaoZhang's GitHub

Gmail: njuhaozhang@gmail.com