cs20_4-1

1. Eager execution quick notes

  1. You can use most regular Python constructs, instead of being limited to tf.xxx

    • is compatible with Python debugging tools: pdb.set_trace()

    • provides immediate error reporting, with no need to sess.run() an op just to surface the error

    • permits use of Python data structures, though I find tf.data quite good, so this point remains to be seen

    • enables you to use and differentiate through Python control flow, which is promising, since tf.cond is cumbersome and arguably redundant by design

    • Eager is supported at least since TF 1.8; you only need to add a few lines:

      import tensorflow as tf
      import tensorflow.contrib.eager as tfe
      tfe.enable_eager_execution() # Call this at program start-up
      
    • A minimal example:

      x = [[2.]]  # No need for placeholders!
      m = tf.matmul(x, x)
      
      print(m)  # No sessions!
      # tf.Tensor([[4.]], shape=(1, 1), dtype=float32)
      

2. Two examples

  • LinearRegression in eager mode (a minimal sketch follows this list)
  • word2vec in eager mode
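
  • A minimal sketch of linear regression in eager mode (assuming TF 1.8+ with tf.contrib.eager; the toy data and variable names here are made up for illustration):

    import tensorflow as tf
    import tensorflow.contrib.eager as tfe

    tfe.enable_eager_execution()

    # toy data: y ~ 3x + 2 plus a little noise
    xs = tf.random_normal([100])
    ys = 3.0 * xs + 2.0 + tf.random_normal([100], stddev=0.1)

    w = tfe.Variable(0.0)
    b = tfe.Variable(0.0)

    def loss(x, y):
        return tf.reduce_mean(tf.square(y - (w * x + b)))  # plain Python function

    grad_fn = tfe.implicit_gradients(loss)  # see section 3 below

    optimizer = tf.train.GradientDescentOptimizer(0.1)
    for step in range(100):
        optimizer.apply_gradients(grad_fn(xs, ys))  # no Session, no feed_dict

    print(w.numpy(), b.numpy())  # should approach 3.0 and 2.0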

3. Some features of eager

  1. gradient

    • Example 1:

      def square(x):
        return x ** 2
      
      grad = tfe.gradients_function(square) # feels more like obtaining a set of derivative functions
      
      print(square(3.))    # tf.Tensor(9., shape=(), dtype=float32)
      print(grad(3.))      # [tf.Tensor(6., shape=(), dtype=float32)]
      
    • Example 2:

      x = tfe.Variable(2.0) # the recommended way to define variables in eager mode
      def loss(y):
        return (y - x ** 2) ** 2 # write the formula in plain Python
      
      grad = tfe.implicit_gradients(loss) # implicitly obtains the partial-derivative functions w.r.t. the variables (my analogy)
      
      print(loss(7.))  # tf.Tensor(9., shape=(), dtype=float32)
      print(grad(7.))  # [(<tf.Tensor: -24.0, shape=(), dtype=float32>,
                       #   <tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.0>)]
      
    • Interestingly, even when eager mode is not enabled, the following gradient-related functions are still usable (a small sketch of value_and_gradients_function follows this list)

      tfe.gradients_function()
      tfe.value_and_gradients_function() # returns the value in addition to the gradient
      tfe.implicit_gradients()
      tfe.implicit_value_and_gradients()
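
    • A small sketch of tfe.value_and_gradients_function (assuming eager mode is enabled as above; square is the toy function from example 1):

      def square(x):
          return x ** 2

      val_grad = tfe.value_and_gradients_function(square)
      value, grads = val_grad(3.0)
      print(value)  # the value square(3.) = 9.0
      print(grads)  # a list holding the gradient d(square)/dx at 3.0, i.e. 6.0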
      

4. Many concepts, using word2vec as the example

  1. NCE Loss
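
    • A minimal sketch of NCE loss as used in word2vec (assuming TF 1.x graph mode; the sizes VOCAB_SIZE, EMBED_SIZE, BATCH_SIZE, NUM_SAMPLED and the variable names are placeholders for illustration). NCE replaces the huge softmax over the whole vocabulary with a binary classification against a handful of sampled negatives:

      import tensorflow as tf

      VOCAB_SIZE, EMBED_SIZE, BATCH_SIZE, NUM_SAMPLED = 10000, 128, 32, 64

      center_words = tf.placeholder(tf.int32, [BATCH_SIZE])
      target_words = tf.placeholder(tf.int32, [BATCH_SIZE, 1])

      # the embedding matrix is the weight matrix we actually care about
      embed_matrix = tf.Variable(tf.random_uniform([VOCAB_SIZE, EMBED_SIZE], -1.0, 1.0))
      embed = tf.nn.embedding_lookup(embed_matrix, center_words)

      # separate weights/biases for the NCE classifier
      nce_weight = tf.Variable(tf.truncated_normal([VOCAB_SIZE, EMBED_SIZE],
                                                   stddev=1.0 / EMBED_SIZE ** 0.5))
      nce_bias = tf.Variable(tf.zeros([VOCAB_SIZE]))

      loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight,
                                           biases=nce_bias,
                                           labels=target_words,
                                           inputs=embed,
                                           num_sampled=NUM_SAMPLED,
                                           num_classes=VOCAB_SIZE))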

  2. Understanding word2vec & embeddings

  3. Building and training a network in an OOP style

    • pipeline:

      Phase 1: assemble your graph

      1. Import data (either with tf.data or with placeholders)

      2. Define the weights

      3. Define the inference model

      4. Define loss function

      5. Define optimizer

      Phase 2: execute the computation

      1. Initialize all model variables for the first time.

      2. Initialize iterator / feed in the training data.

      3. Execute the inference model on the training data, so it calculates for each training input example the output with the current model parameters.

      4. Compute the cost

      5. Adjust the model parameters to minimize/maximize the cost depending on the model.

    • template code:

      
      class SkipGramModel:
          """ Build the graph for word2vec model """
          def __init__(self, params):
              pass
      
          def _import_data(self):
              """ Step 1: import data """
              pass
      
          def _create_embedding(self):
              """ Step 2: in word2vec, it's actually the weights that we care about """
              pass
      
          def _create_loss(self):
              """ Step 3 + 4: define the inference + the loss function """
              pass
      
          def _create_optimizer(self):
              """ Step 5: define optimizer """
              pass
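
    • A minimal sketch of Phase 2 (execute the computation), assuming the SkipGramModel above has built its graph and exposes hypothetical iterator, loss and optimizer attributes:

      import tensorflow as tf

      def train_model(model, num_train_steps):
          saver = tf.train.Saver()  # checkpointing is covered in section 5.2 below
          with tf.Session() as sess:
              sess.run(tf.global_variables_initializer())  # Phase 2, step 1
              sess.run(model.iterator.initializer)         # Phase 2, step 2: feed in the training data
              for step in range(num_train_steps):
                  # Phase 2, steps 3-5: run inference, compute the cost, adjust the parameters
                  loss_batch, _ = sess.run([model.loss, model.optimizer])
                  if (step + 1) % 1000 == 0:
                      saver.save(sess, 'checkpoints/skip-gram', global_step=step)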
      
  4. Visualization techniques: t-SNE, plus tf.summary, tf.train.Saver and saver.restore; a bit involved, revisit after going through it once more

  5. name scope: its core function is that tensors in the same name scope are grouped together in the computation graph (collapsed into one super node), making the graph cleaner and easier to read (a small sketch follows this item)

    • Three kinds of edges in TensorBoard: (1) solid grey arrows, representing data flow; (2) solid orange arrows, representing reference/influence relationships, e.g. the optimizer node affects w and b through backprop; (3) dotted arrows, representing dependency/control relationships, e.g. the weights node only works after init has run
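
    • A minimal sketch of tf.name_scope (assuming TF 1.x graph mode): ops created inside the scope get the scope prefix and collapse into one expandable node in TensorBoard's graph view; the scope and op names are illustrative.

      import tensorflow as tf

      with tf.name_scope('data'):
          a = tf.constant(2, name='a')  # becomes "data/a"
          b = tf.constant(3, name='b')  # becomes "data/b"

      with tf.name_scope('operations'):
          c = tf.add(a, b, name='add')  # becomes "operations/add"

      print(c.name)  # >> operations/add:0
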
  6. Variable scope: its function differs from name scope; it is mainly for variable sharing and better code reuse, while also providing name-scope-like prefixing. An example:

    def fully_connected(x, output_dim, scope):
        with tf.variable_scope(scope) as scope:  # variables below can be shared and reused
            w = tf.get_variable("weights", [x.shape[1], output_dim],
                                initializer=tf.random_normal_initializer())
            b = tf.get_variable("biases", [output_dim],
                                initializer=tf.constant_initializer(0.0))
            return tf.matmul(x, w) + b

    def two_hidden_layers(x):
        h1 = fully_connected(x, 50, 'h1')
        h2 = fully_connected(h1, 10, 'h2')
        return h2

    with tf.variable_scope('two_layers') as scope:
        logits1 = two_hidden_layers(x1)
        scope.reuse_variables()  # enable variable sharing: call two_hidden_layers again with a different input
        logits2 = two_hidden_layers(x2)
    

5. Some other points:

  1. Graph collections

    • Accessing variables: tf.get_collection(key, scope=None) // key is the name of the collection, scope is the scope of the variables

    • By default, all variables are placed in the collection tf.GraphKeys.GLOBAL_VARIABLES: tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='my_scope')

    • By default, all variables with trainable=True are collected in tf.GraphKeys.TRAINABLE_VARIABLES

    • Create a custom collection with tf.add_to_collection(name, value); any op can be added, it does not have to be a variable (see the sketch at the end of this list)

    • Some default system behaviors: there are more predefined graph keys under tf.GraphKeys, and TensorFlow attaches default behaviors to some of them.

      e.g. tf.train.Optimizer subclasses default to optimizing the variables collected under tf.GraphKeys.TRAINABLE_VARIABLES if none is specified, but it is also possible to pass an explicit list of variables.

      tf.GraphKeys:
      
      GLOBAL_VARIABLES / LOCAL_VARIABLES / MODEL_VARIABLES / TRAINABLE_VARIABLES / 
      SUMMARIES / QUEUE_RUNNERS / MOVING_AVERAGE_VARIABLES / REGULARIZATION_LOSSES
      
      ref-link: https://www.tensorflow.org/api_docs/python/tf/GraphKeys
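
    • A minimal sketch of a custom collection (assuming TF 1.x graph mode; the collection name 'my_losses' is made up for illustration):

      import tensorflow as tf

      w = tf.Variable(1.0, name='w')
      reg = tf.nn.l2_loss(w)
      tf.add_to_collection('my_losses', reg)  # any op can go in, not only variables

      print(tf.get_collection('my_losses'))                       # [the l2 loss tensor]
      print(tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES))  # [w]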
      
  2. Manage experiments:

    • motivation: (1) be able to stop training at any point, save a checkpoint, and resume from it next time; (2) make experiment results reproducible, one key being to control the random factor in our models

    • tf.train.Saver()

      • tf.train.Saver() saves all variables by default

      • saver.restore() by default restores the most recently saved model info

      • tf.train.Saver.save(
            sess,
            save_path,
            global_step=None,
            latest_filename=None,
            meta_graph_suffix='meta',
            write_meta_graph=True,
            write_state=True
        )
        
        # an example

        # define model
        # create a saver object
        saver = tf.train.Saver()

        # launch a session to execute the computation
        with tf.Session() as sess:
            # actual training loop
            for step in range(training_steps):
                sess.run([optimizer])
                if (step + 1) % 1000 == 0:
                    # the checkpoint file name is automatically built as model_name-global_step
                    saver.save(sess, 'checkpoint_directory/model_name',
                               global_step=global_step)
        
      • Key question: saver.save() by default saves all of the model's variables, and saver.restore() by default restores all of their values. So I wonder: before restoring the variable values, shouldn't the graph be rebuilt first? Who rebuilds the graph? [Answer: the user still has to build the graph themselves first, then restore the variables.] The remaining question is how this rebuilt graph can be kept consistent with the graph that existed when the saver was created, or whether simply re-running the graph-construction code is enough (a minimal restore sketch follows the links below).

      • The question above still needs to be resolved!!!

      • One answer to this question: https://www.jianshu.com/p/8850127ed25d

        https://cv-tricks.com/tensorflow-tutorial/save-restore-tensorflow-models-quick-complete-tutorial/
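
      • A minimal sketch of the restore flow discussed above (assuming the graph-building code run here is identical to the code used when saving, i.e. same variable names and shapes; 'checkpoint_directory' and the stand-in variable are placeholders):

        import tensorflow as tf

        # 1. rebuild the same graph first -- the saver only restores variable values
        w = tf.Variable(tf.zeros([10]), name='weights')  # stand-in for the real model-building code

        # 2. then restore the saved values into it
        saver = tf.train.Saver()
        with tf.Session() as sess:
            ckpt = tf.train.get_checkpoint_state('checkpoint_directory')
            if ckpt and ckpt.model_checkpoint_path:
                saver.restore(sess, ckpt.model_checkpoint_path)
            else:
                sess.run(tf.global_variables_initializer())  # no checkpoint yet: start fresh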

      • Specifying which variables to save:

        v1 = tf.Variable(..., name='v1') 
        v2 = tf.Variable(..., name='v2') 
        
        # pass the variables as a dict: 
        saver = tf.train.Saver({'v1': v1, 'v2': v2}) 
        
        # pass them as a list
        saver = tf.train.Saver([v1, v2]) 
        
        # passing a list is equivalent to passing a dict with the variable op names
        # as keys (i.e. the dict and list approaches are essentially the same)
        saver = tf.train.Saver({v.op.name: v for v in [v1, v2]})
        
    • tf.summary

      • Personally I think there is no need to use matplotlib for visualization anymore; tf.summary is powerful and convenient enough (just two extra commands: tensorboard, ssh -L)

      • Example:

        # at the very start of the program, open a writer
        writer = tf.summary.FileWriter('graphs/word2vec/lr' + str(self.lr), sess.graph)
        
        def _create_summaries(self):
            with tf.name_scope("summaries"):
                tf.summary.scalar("loss", self.loss)
                tf.summary.scalar("accuracy", self.accuracy)
                tf.summary.histogram("histogram loss", self.loss)
                # because you have several summaries, we should merge them all
                # into one op to make it easier to manage
                self.summary_op = tf.summary.merge_all()
        
        # fetch this step's loss result together with the summary
        loss_batch, _, summary = sess.run([model.loss, model.optimizer, model.summary_op], feed_dict=feed_dict)
        # write this step's summary to the writer
        writer.add_summary(summary, global_step=step)
        
        # once no more summaries need to be written, flush everything in the writer to disk
        writer.close()
        
      • Its most powerful use so far: visually monitoring the tuning process. For example, run training twice (learning rates 0.5 and 1.0), then plot both loss curves in one chart from the summaries to get an intuitive feel for the hyperparameters

      • Another feature: displaying images (just add a writer and a summary, plus the two commands tensorboard and ssh -L), which can replace matplotlib for showing images: tf.summary.image(name, tensor, max_outputs=3, collections=None); a small sketch follows this list

      • https://www.tensorflow.org/guide/summaries_and_tensorboard?hl=zh-cn
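
      • A minimal sketch of tf.summary.image (assuming a batch of 28x28 grayscale images shaped [batch, height, width, channels]; the random batch and names are illustrative):

        import tensorflow as tf

        images = tf.random_uniform([16, 28, 28, 1])  # stand-in for a real image batch
        img_summary = tf.summary.image('input_images', images, max_outputs=3)

        with tf.Session() as sess:
            writer = tf.summary.FileWriter('graphs/images', sess.graph)
            summary = sess.run(img_summary)
            writer.add_summary(summary, global_step=0)
            writer.close()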

    • Control randomization

      • motivation: control TensorFlow’s random state to get stable results for your experiments

      • (1) Set random seed at operation level:

        c = tf.random_uniform([], -10, 10, seed=2)
        d = tf.random_uniform([], -10, 10, seed=2)

        with tf.Session() as sess:
            print(sess.run(c))  # >> 3.57493
            print(sess.run(d))  # >> 3.57493

        ###########################################
        # a few cases:

        # 1.
        c = tf.random_uniform([], -10, 10, seed=2)
        with tf.Session() as sess:
            print(sess.run(c))  # >> 3.57493, each freshly created session restarts the op's sequence
        with tf.Session() as sess:
            print(sess.run(c))  # >> 3.57493, again right after session creation

        # 2.
        c = tf.random_uniform([], -10, 10, seed=2)
        with tf.Session() as sess:
            print(sess.run(c))  # >> 3.57493, session just created
            print(sess.run(c))  # >> -5.97319, within one session the op's sequence advances, so the value differs
        
      • (2) Set random seed at graph level with tf.Graph.seed

        • The meaning of a seed at this level, tf.set_random_seed(seed): it ensures that other graphs produce the same random numbers as the current graph; it does not care about op-level randomness

        • Example:

          # a.py
          import tensorflow as tf

          tf.set_random_seed(2)  # graph-level seed only, so other graphs can reproduce this graph's random numbers
          c = tf.random_uniform([], -10, 10)  # no op-level seed
          d = tf.random_uniform([], -10, 10)

          with tf.Session() as sess:
              print(sess.run(c))  # -4.00752
              print(sess.run(d))  # -2.98339, op-level randomness is not forced to match!

          # b.py
          import tensorflow as tf

          tf.set_random_seed(2)  # graph-level seed only, so other graphs can reproduce this graph's random numbers
          c = tf.random_uniform([], -10, 10)  # no op-level seed
          d = tf.random_uniform([], -10, 10)

          with tf.Session() as sess:
              print(sess.run(c))  # -4.00752, identical to the graph in a.py
              print(sess.run(d))  # -2.98339, identical to the graph in a.py
          
  3. Autodiff

    • tf.gradients(), for example:

      tf.gradients(ys, xs, grad_ys=None, name='gradients',
                   colocate_gradients_with_ops=False,
                   gate_gradients=False,
                   aggregation_method=None)
      # tf.gradients lets you take derivatives w.r.t. specific variables by hand,
      # which is useful for freezing some layers while training only the others
      # (see the sketch after this block)

      #
      x = tf.Variable(2.0)
      y = 2.0 * (x ** 3)  # single-variable function: dy/dx
      grad_y = tf.gradients(y, x)
      with tf.Session() as sess:
          sess.run(x.initializer)
          print(sess.run(grad_y))  # >> [24.0]

      #
      x = tf.Variable(2.0)
      y = 2.0 * (x ** 3)
      z = 3.0 + y ** 2  # multivariable function: two partial derivatives, dz/dx and dz/dy
      grad_z = tf.gradients(z, [x, y])
      with tf.Session() as sess:
          sess.run(x.initializer)
          print(sess.run(grad_z))  # >> [768.0, 32.0]
      # 768 is the gradient of z with respect to x, 32 with respect to y
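
    • A minimal sketch of "freeze some layers, train others" with tf.gradients (assuming TF 1.x graph mode; the variable names and shapes are made up for illustration):

      import tensorflow as tf

      x = tf.placeholder(tf.float32, [None, 4])
      w1 = tf.Variable(tf.random_normal([4, 8]), name='frozen_w')     # keep fixed
      w2 = tf.Variable(tf.random_normal([8, 1]), name='trainable_w')  # update only this one
      y = tf.matmul(tf.matmul(x, w1), w2)
      loss = tf.reduce_mean(tf.square(y))

      optimizer = tf.train.GradientDescentOptimizer(0.01)
      grads = tf.gradients(loss, [w2])                        # gradient w.r.t. w2 only
      train_op = optimizer.apply_gradients(list(zip(grads, [w2])))
      # equivalently: optimizer.minimize(loss, var_list=[w2])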
      
Original source: https://www.cnblogs.com/LS1314/p/10371162.html