cs20_4-1

1. Eager execution quick notes

  1. You can use most regular Python constructs, instead of being limited to tf.xxx

    • is compatible with Python debugging tools: pdb.set_trace()

    • provides immediate error reporting, with no need to sess.run() an op just to surface the error

    • permits use of Python data structures, though I find tf.data quite good, so this point remains to be seen

    • enables you to use and differentiate through Python control flow, which is promising, since tf.cond is cumbersome and arguably redundant by design

    • Eager is supported at least since TF 1.8; you only need to add a few lines:

      import tensorflow as tf
      import tensorflow.contrib.eager as tfe
      tfe.enable_eager_execution() # Call this at program start-up
      
    • A minimal example:

      x = [[2.]]  # No need for placeholders!
      m = tf.matmul(x, x)
      
      print(m)  # No sessions!
      # tf.Tensor([[4.]], shape=(1, 1), dtype=float32)
      

2. Two examples

  • LinearRegression in eager mode (a minimal sketch follows this list)
  • word2vec in eager mode
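
  • A minimal sketch of linear regression in eager mode (assuming TF 1.8+ with tf.contrib.eager; the toy data and variable names here are made up for illustration):

    import tensorflow as tf
    import tensorflow.contrib.eager as tfe

    tfe.enable_eager_execution()

    # toy data: y ~ 3x + 2 plus a little noise
    xs = tf.random_normal([100])
    ys = 3.0 * xs + 2.0 + tf.random_normal([100], stddev=0.1)

    w = tfe.Variable(0.0)
    b = tfe.Variable(0.0)

    def loss(x, y):
        return tf.reduce_mean(tf.square(y - (w * x + b)))  # plain Python function

    grad_fn = tfe.implicit_gradients(loss)  # see section 3 below

    optimizer = tf.train.GradientDescentOptimizer(0.1)
    for step in range(100):
        optimizer.apply_gradients(grad_fn(xs, ys))  # no Session, no feed_dict

    print(w.numpy(), b.numpy())  # should approach 3.0 and 2.0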

3. Some features of eager

  1. gradient

    • Example 1:

      def square(x):
        return x ** 2
      
      grad = tfe.gradients_function(square) # feels more like obtaining a set of derivative functions
      
      print(square(3.))    # tf.Tensor(9., shape=(), dtype=float32)
      print(grad(3.))      # [tf.Tensor(6., shape=(), dtype=float32)]
      
    • Example 2:

      x = tfe.Variable(2.0) # the recommended way to define variables in eager mode
      def loss(y):
        return (y - x ** 2) ** 2 # write the formula in plain Python
      
      grad = tfe.implicit_gradients(loss) # implicitly obtains the partial-derivative functions w.r.t. the variables (my analogy)
      
      print(loss(7.))  # tf.Tensor(9., shape=(), dtype=float32)
      print(grad(7.))  # [(<tf.Tensor: -24.0, shape=(), dtype=float32>,
                       #   <tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.0>)]
      
    • Interestingly, even when eager mode is not enabled, the following gradient-related functions are still usable (a small sketch of value_and_gradients_function follows this list)

      tfe.gradients_function()
      tfe.value_and_gradients_function() # returns the value in addition to the gradient
      tfe.implicit_gradients()
      tfe.implicit_value_and_gradients()
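
    • A small sketch of tfe.value_and_gradients_function (assuming eager mode is enabled as above; square is the toy function from example 1):

      def square(x):
          return x ** 2

      val_grad = tfe.value_and_gradients_function(square)
      value, grads = val_grad(3.0)
      print(value)  # the value square(3.) = 9.0
      print(grads)  # a list holding the gradient d(square)/dx at 3.0, i.e. 6.0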
      

4. Many concepts, using word2vec as the example

  1. NCE Loss
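
    • A minimal sketch of NCE loss as used in word2vec (assuming TF 1.x graph mode; the sizes VOCAB_SIZE, EMBED_SIZE, BATCH_SIZE, NUM_SAMPLED and the variable names are placeholders for illustration). NCE replaces the huge softmax over the whole vocabulary with a binary classification against a handful of sampled negatives:

      import tensorflow as tf

      VOCAB_SIZE, EMBED_SIZE, BATCH_SIZE, NUM_SAMPLED = 10000, 128, 32, 64

      center_words = tf.placeholder(tf.int32, [BATCH_SIZE])
      target_words = tf.placeholder(tf.int32, [BATCH_SIZE, 1])

      # the embedding matrix is the weight matrix we actually care about
      embed_matrix = tf.Variable(tf.random_uniform([VOCAB_SIZE, EMBED_SIZE], -1.0, 1.0))
      embed = tf.nn.embedding_lookup(embed_matrix, center_words)

      # separate weights/biases for the NCE classifier
      nce_weight = tf.Variable(tf.truncated_normal([VOCAB_SIZE, EMBED_SIZE],
                                                   stddev=1.0 / EMBED_SIZE ** 0.5))
      nce_bias = tf.Variable(tf.zeros([VOCAB_SIZE]))

      loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight,
                                           biases=nce_bias,
                                           labels=target_words,
                                           inputs=embed,
                                           num_sampled=NUM_SAMPLED,
                                           num_classes=VOCAB_SIZE))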

  2. Understanding word2vec & embeddings

  3. Building and training a network in an OOP style

    • pipeline:

      Phase 1: assemble your graph

      1. Import data (either with tf.data or with placeholders)

      2. Define the weights

      3. Define the inference model

      4. Define loss function

      5. Define optimizer

      Phase 2: execute the computation

      1. Initialize all model variables for the first time.

      2. Initialize iterator / feed in the training data.

      3. Execute the inference model on the training data, so it calculates for each training input example the output with the current model parameters.

      4. Compute the cost

      5. Adjust the model parameters to minimize/maximize the cost depending on the model.

    • template code:

      
      class SkipGramModel:
          """ Build the graph for word2vec model """
          def __init__(self, params):
              pass
      
          def _import_data(self):
              """ Step 1: import data """
              pass
      
          def _create_embedding(self):
              """ Step 2: in word2vec, it's actually the weights that we care about """
              pass
      
          def _create_loss(self):
              """ Step 3 + 4: define the inference + the loss function """
              pass
      
          def _create_optimizer(self):
              """ Step 5: define optimizer """
              pass
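
    • A minimal sketch of Phase 2 (execute the computation), assuming the SkipGramModel above has built its graph and exposes hypothetical iterator, loss and optimizer attributes:

      import tensorflow as tf

      def train_model(model, num_train_steps):
          saver = tf.train.Saver()  # checkpointing is covered in section 5.2 below
          with tf.Session() as sess:
              sess.run(tf.global_variables_initializer())  # Phase 2, step 1
              sess.run(model.iterator.initializer)         # Phase 2, step 2: feed in the training data
              for step in range(num_train_steps):
                  # Phase 2, steps 3-5: run inference, compute the cost, adjust the parameters
                  loss_batch, _ = sess.run([model.loss, model.optimizer])
                  if (step + 1) % 1000 == 0:
                      saver.save(sess, 'checkpoints/skip-gram', global_step=step)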
      
  4. Visualization techniques: t-SNE, plus tf.summary, tf.train.Saver and saver.restore; a bit involved, revisit after going through it once more

  5. name scope: its core function is that tensors in the same name scope are grouped together in the computation graph (collapsed into one super node), making the graph cleaner and easier to read (a small sketch follows this item)

    • Three kinds of edges in TensorBoard: (1) solid grey arrows, representing data flow; (2) solid orange arrows, representing reference/influence relationships, e.g. the optimizer node affects w and b through backprop; (3) dotted arrows, representing dependency/control relationships, e.g. the weights node only works after init has run
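
    • A minimal sketch of tf.name_scope (assuming TF 1.x graph mode): ops created inside the scope get the scope prefix and collapse into one expandable node in TensorBoard's graph view; the scope and op names are illustrative.

      import tensorflow as tf

      with tf.name_scope('data'):
          a = tf.constant(2, name='a')  # becomes "data/a"
          b = tf.constant(3, name='b')  # becomes "data/b"

      with tf.name_scope('operations'):
          c = tf.add(a, b, name='add')  # becomes "operations/add"

      print(c.name)  # >> operations/add:0
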
  6. Variable scope: its function differs from name scope; it is mainly for variable sharing and better code reuse, while also providing name-scope-like prefixing. An example:

    def fully_connected(x, output_dim, scope):
        with tf.variable_scope(scope) as scope:  # variables below can be shared and reused
            w = tf.get_variable("weights", [x.shape[1], output_dim],
                                initializer=tf.random_normal_initializer())
            b = tf.get_variable("biases", [output_dim],
                                initializer=tf.constant_initializer(0.0))
            return tf.matmul(x, w) + b

    def two_hidden_layers(x):
        h1 = fully_connected(x, 50, 'h1')
        h2 = fully_connected(h1, 10, 'h2')
        return h2

    with tf.variable_scope('two_layers') as scope:
        logits1 = two_hidden_layers(x1)
        scope.reuse_variables()  # enable variable sharing: call two_hidden_layers again with a different input
        logits2 = two_hidden_layers(x2)
    

5. Some other points:

  1. Graph collections

    • Accessing variables: tf.get_collection(key, scope=None) // key is the name of the collection, scope is the scope of the variables

    • By default, all variables are placed in the collection tf.GraphKeys.GLOBAL_VARIABLES: tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='my_scope')

    • By default, all variables with trainable=True are collected in tf.GraphKeys.TRAINABLE_VARIABLES

    • Create a custom collection with tf.add_to_collection(name, value); any op can be added, it does not have to be a variable (see the sketch at the end of this list)

    • Some default system behaviors: there are more predefined graph keys under tf.GraphKeys, and TensorFlow attaches default behaviors to some of them.

      e.g. tf.train.Optimizer subclasses default to optimizing the variables collected under tf.GraphKeys.TRAINABLE_VARIABLES if none is specified, but it is also possible to pass an explicit list of variables.

      tf.GraphKeys:
      
      GLOBAL_VARIABLES / LOCAL_VARIABLES / MODEL_VARIABLES / TRAINABLE_VARIABLES / 
      SUMMARIES / QUEUE_RUNNERS / MOVING_AVERAGE_VARIABLES / REGULARIZATION_LOSSES
      
      ref-link: https://www.tensorflow.org/api_docs/python/tf/GraphKeys
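
    • A minimal sketch of a custom collection (assuming TF 1.x graph mode; the collection name 'my_losses' is made up for illustration):

      import tensorflow as tf

      w = tf.Variable(1.0, name='w')
      reg = tf.nn.l2_loss(w)
      tf.add_to_collection('my_losses', reg)  # any op can go in, not only variables

      print(tf.get_collection('my_losses'))                       # [the l2 loss tensor]
      print(tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES))  # [w]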
      
  2. Manage experiments:

    • motivation: (1) be able to stop training at any point, save a checkpoint, and resume from it next time; (2) make experiment results reproducible, one key being to control the random factor in our models

    • tf.train.Saver()

      • tf.train.Saver() saves all variables by default

      • saver.restore() by default restores the most recently saved model info

      • tf.train.Saver.save(
            sess,
            save_path,
            global_step=None,
            latest_filename=None,
            meta_graph_suffix='meta',
            write_meta_graph=True,
            write_state=True
        )
        
        # an example

        # define model
        # create a saver object
        saver = tf.train.Saver()

        # launch a session to execute the computation
        with tf.Session() as sess:
            # actual training loop
            for step in range(training_steps):
                sess.run([optimizer])
                if (step + 1) % 1000 == 0:
                    # the checkpoint file name is automatically built as model_name-global_step
                    saver.save(sess, 'checkpoint_directory/model_name',
                               global_step=global_step)
        
      • Key question: saver.save() by default saves all of the model's variables, and saver.restore() by default restores all of their values. So I wonder: before restoring the variable values, shouldn't the graph be rebuilt first? Who rebuilds the graph? [Answer: the user still has to build the graph themselves first, then restore the variables.] The remaining question is how this rebuilt graph can be kept consistent with the graph that existed when the saver was created, or whether simply re-running the graph-construction code is enough (a minimal restore sketch follows the links below).

      • The question above still needs to be resolved!!!

      • One answer to this question: https://www.jianshu.com/p/8850127ed25d

        https://cv-tricks.com/tensorflow-tutorial/save-restore-tensorflow-models-quick-complete-tutorial/
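
      • A minimal sketch of the restore flow discussed above (assuming the graph-building code run here is identical to the code used when saving, i.e. same variable names and shapes; 'checkpoint_directory' and the stand-in variable are placeholders):

        import tensorflow as tf

        # 1. rebuild the same graph first -- the saver only restores variable values
        w = tf.Variable(tf.zeros([10]), name='weights')  # stand-in for the real model-building code

        # 2. then restore the saved values into it
        saver = tf.train.Saver()
        with tf.Session() as sess:
            ckpt = tf.train.get_checkpoint_state('checkpoint_directory')
            if ckpt and ckpt.model_checkpoint_path:
                saver.restore(sess, ckpt.model_checkpoint_path)
            else:
                sess.run(tf.global_variables_initializer())  # no checkpoint yet: start fresh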

      • Specifying which variables to save:

        v1 = tf.Variable(..., name='v1') 
        v2 = tf.Variable(..., name='v2') 
        
        # pass the variables as a dict: 
        saver = tf.train.Saver({'v1': v1, 'v2': v2}) 
        
        # pass them as a list
        saver = tf.train.Saver([v1, v2]) 
        
        # passing a list is equivalent to passing a dict with the variable op names
        # as keys (i.e. the dict and list approaches are essentially the same)
        saver = tf.train.Saver({v.op.name: v for v in [v1, v2]})
        
    • tf.summary

      • Personally I think there is no need to use matplotlib for visualization anymore; tf.summary is powerful and convenient enough (just two extra commands: tensorboard, ssh -L)

      • Example:

        # at the very start of the program, open a writer
        writer = tf.summary.FileWriter('graphs/word2vec/lr' + str(self.lr), sess.graph)
        
        def _create_summaries(self):
            with tf.name_scope("summaries"):
                tf.summary.scalar("loss", self.loss)
                tf.summary.scalar("accuracy", self.accuracy)
                tf.summary.histogram("histogram loss", self.loss)
                # because you have several summaries, we should merge them all
                # into one op to make it easier to manage
                self.summary_op = tf.summary.merge_all()
        
        # fetch this step's loss result together with the summary
        loss_batch, _, summary = sess.run([model.loss, model.optimizer, model.summary_op], feed_dict=feed_dict)
        # write this step's summary to the writer
        writer.add_summary(summary, global_step=step)
        
        # once no more summaries need to be written, flush everything in the writer to disk
        writer.close()
        
      • Its most powerful use so far: visually monitoring the tuning process. For example, run training twice (learning rates 0.5 and 1.0), then plot both loss curves in one chart from the summaries to get an intuitive feel for the hyperparameters

      • Another feature: displaying images (just add a writer and a summary, plus the two commands tensorboard and ssh -L), which can replace matplotlib for showing images: tf.summary.image(name, tensor, max_outputs=3, collections=None); a small sketch follows this list

      • https://www.tensorflow.org/guide/summaries_and_tensorboard?hl=zh-cn
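
      • A minimal sketch of tf.summary.image (assuming a batch of 28x28 grayscale images shaped [batch, height, width, channels]; the random batch and names are illustrative):

        import tensorflow as tf

        images = tf.random_uniform([16, 28, 28, 1])  # stand-in for a real image batch
        img_summary = tf.summary.image('input_images', images, max_outputs=3)

        with tf.Session() as sess:
            writer = tf.summary.FileWriter('graphs/images', sess.graph)
            summary = sess.run(img_summary)
            writer.add_summary(summary, global_step=0)
            writer.close()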

    • Control randomization

      • motivation: control TensorFlow’s random state to get stable results for your experiments

      • (1) Set random seed at operation level:

        c = tf.random_uniform([], -10, 10, seed=2)
        d = tf.random_uniform([], -10, 10, seed=2)

        with tf.Session() as sess:
            print(sess.run(c))  # >> 3.57493
            print(sess.run(d))  # >> 3.57493

        ###########################################
        # a few cases:

        # 1.
        c = tf.random_uniform([], -10, 10, seed=2)
        with tf.Session() as sess:
            print(sess.run(c))  # >> 3.57493, each freshly created session restarts the op's sequence
        with tf.Session() as sess:
            print(sess.run(c))  # >> 3.57493, again right after session creation

        # 2.
        c = tf.random_uniform([], -10, 10, seed=2)
        with tf.Session() as sess:
            print(sess.run(c))  # >> 3.57493, session just created
            print(sess.run(c))  # >> -5.97319, within one session the op's sequence advances, so the value differs
        
      • (2) Set random seed at graph level with tf.Graph.seed

        • The meaning of a seed at this level, tf.set_random_seed(seed): it ensures that other graphs produce the same random numbers as the current graph; it does not care about op-level randomness

        • Example:

          # a.py
          import tensorflow as tf

          tf.set_random_seed(2)  # graph-level seed only, so other graphs can reproduce this graph's random numbers
          c = tf.random_uniform([], -10, 10)  # no op-level seed
          d = tf.random_uniform([], -10, 10)

          with tf.Session() as sess:
              print(sess.run(c))  # -4.00752
              print(sess.run(d))  # -2.98339, op-level randomness is not forced to match!

          # b.py
          import tensorflow as tf

          tf.set_random_seed(2)  # graph-level seed only, so other graphs can reproduce this graph's random numbers
          c = tf.random_uniform([], -10, 10)  # no op-level seed
          d = tf.random_uniform([], -10, 10)

          with tf.Session() as sess:
              print(sess.run(c))  # -4.00752, identical to the graph in a.py
              print(sess.run(d))  # -2.98339, identical to the graph in a.py
          
  3. Autodiff

    • tf.gradients(), for example:

      tf.gradients(ys, xs, grad_ys=None, name='gradients',
                   colocate_gradients_with_ops=False,
                   gate_gradients=False,
                   aggregation_method=None)
      # tf.gradients lets you take derivatives w.r.t. specific variables by hand,
      # which is useful for freezing some layers while training only the others
      # (see the sketch after this block)

      #
      x = tf.Variable(2.0)
      y = 2.0 * (x ** 3)  # single-variable function: dy/dx
      grad_y = tf.gradients(y, x)
      with tf.Session() as sess:
          sess.run(x.initializer)
          print(sess.run(grad_y))  # >> [24.0]

      #
      x = tf.Variable(2.0)
      y = 2.0 * (x ** 3)
      z = 3.0 + y ** 2  # multivariable function: two partial derivatives, dz/dx and dz/dy
      grad_z = tf.gradients(z, [x, y])
      with tf.Session() as sess:
          sess.run(x.initializer)
          print(sess.run(grad_z))  # >> [768.0, 32.0]
      # 768 is the gradient of z with respect to x, 32 with respect to y
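
    • A minimal sketch of "freeze some layers, train others" with tf.gradients (assuming TF 1.x graph mode; the variable names and shapes are made up for illustration):

      import tensorflow as tf

      x = tf.placeholder(tf.float32, [None, 4])
      w1 = tf.Variable(tf.random_normal([4, 8]), name='frozen_w')     # keep fixed
      w2 = tf.Variable(tf.random_normal([8, 1]), name='trainable_w')  # update only this one
      y = tf.matmul(tf.matmul(x, w1), w2)
      loss = tf.reduce_mean(tf.square(y))

      optimizer = tf.train.GradientDescentOptimizer(0.01)
      grads = tf.gradients(loss, [w2])                        # gradient w.r.t. w2 only
      train_op = optimizer.apply_gradients(list(zip(grads, [w2])))
      # equivalently: optimizer.minimize(loss, var_list=[w2])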
      
Original source: https://www.cnblogs.com/LS1314/p/10371162.html