TensorFlow as I Understand It

1. Straight to the point

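To be honest, I'm only a beginner and don't really understand TensorFlow in all its depth; I just wanted an artsy title. I'm writing this post mainly to record what you need to master in TensorFlow to get a few projects done. First, a rant about this supposedly hottest deep learning framework: it really is unfriendly. I've seen plenty of PhD-level experts complain about problems that pop up out of nowhere, and I have suffered from them too, all the more so as someone who spent a long time on no-brainer Keras. So personally I think TensorFlow ought to be feeling the pressure from PyTorch and MXNet. I've run the PyTorch examples as well, and that went pretty smoothly, haha. But how could life be complete without falling into TensorFlow's pits? OK, from here on I'll write tf for TensorFlow.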

2. Main text

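The rant continues. Many of the tf tutorials online, including the official ones and all sorts of blog posts, are riddled with problems. Probably the versions move too fast: a lot of the material no longer applies, nobody updates it in time, and some of the APIs have been deprecated. I also want to complain about the MNIST dataset. It feels like anything gets you to 95%+ accuracy, so you can't tell where your code is wrong. The same network on a face database doesn't work at all, and an RBM I once implemented worked on MNIST but not on other datasets (admittedly some parameter tuning may be needed). My point is that you can't validate a model by its MNIST results alone. Likewise, you can't tell whether you know tf just from having run the no-brainer examples in a tutorial. For example, the tutorial says to write the cross-entropy loss as cross_entropy = -tf.reduce_sum(y*tf.log(outputs)). Just when you're happily thinking how concise and correct that is, you switch datasets and it breaks. It turns out a 0 showed up inside the log, and only then do you wonder what kind of tutorial this is. Other things to watch: preprocess your data properly (for instance, tf puts the image channel count in the last dimension), and pay attention to how the weights are initialized. That's about it. And since you'll want to train over several runs, you also need to know how to save and load a model. With that, the picture is fairly complete.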
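To make the log(0) problem concrete, here is a minimal NumPy sketch (not tf code, and not the tutorial's implementation) comparing the naive formula with a version that works on the logits through a log-softmax. The function names and the extreme logits are assumptions chosen only to trigger the failure.

```python
import numpy as np

def naive_cross_entropy(logits, labels):
    """Softmax first, then -sum(y * log(p)), like the tutorial-style formula."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    # Silence the warnings so the non-finite result is visible instead.
    with np.errstate(divide="ignore", invalid="ignore"):
        return -np.sum(labels * np.log(probs))

def stable_cross_entropy(logits, labels):
    """Same loss computed from the logits via log-softmax; never evaluates log(0)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.sum(labels * log_probs)

# Hypothetical, deliberately extreme logits: the true class gets probability 0.
logits = np.array([[1000.0, 0.0, -1000.0]])
labels = np.array([[0.0, 1.0, 0.0]])

naive = naive_cross_entropy(logits, labels)
stable = stable_cross_entropy(logits, labels)
print("naive:", naive, "stable:", stable)
```

The naive version underflows to a probability of exactly 0 and produces a non-finite loss, while the logits-based version stays finite. That is the reason for preferring a loss that takes logits directly, such as tf.nn.softmax_cross_entropy_with_logits.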

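At this point you can probably just about run a model, but it may feel very hard to manage, and you won't know whether what you wrote is correct. This is when you need to manage your variables and encapsulate your code, which is where class and tensorboard come in. First, let me cite an article (http://www.jianshu.com/p/e112012a4b2d) which says that tf code should have a good structure, for example split into operations, models and a few other parts. Reading how other people write theirs on github gives you a first impression. Honestly, writing tf code that looks about like other people's github code is already good enough; that is my ultimate goal, after all. Along the way I found that the code structure also calls for object-oriented thinking: define the network as a class, encapsulate it, and calling it becomes convenient. That is what I mean by code structure, encapsulation and class.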

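Next comes visualization. A clear visualization lets you see whether your code has problems, and this is where the tensorboard I mentioned comes in. To use tensorboard you have to manage your variables with name_scope and variable_scope; otherwise the graph you see is a dense mess. With scopes, the graph shows individual nodes, each containing some variables. The code below uses name_scope to display a simple neural network built from fully-connected layers; the code is partly taken from (http://www.jianshu.com/p/e112012a4b2d):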

#!/usr/bin/python
# -*- coding:utf-8 -*-

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# The four helpers below are not used by main(); they are kept from the cited code.
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

def fc_layer(inputs, in_size, out_size, activation_func=None):
    with tf.name_scope('layer'): # repeated scope names get a suffix automatically
        with tf.name_scope('weights'):
            Weights = tf.Variable(tf.random_normal([in_size, out_size]), name='W') # without name= it defaults to 'Variable'
        with tf.name_scope('biases'):
            biases = tf.Variable(tf.zeros([1, out_size]) + 0.1, name='b')
        with tf.name_scope('Wx_plus_b'):
            Wx_plus_b = tf.matmul(inputs, Weights) + biases
        if activation_func is None:
            outputs = Wx_plus_b
        else:
            outputs = activation_func(Wx_plus_b)
        return outputs

def main():
    mnist = input_data.read_data_sets('/tmp/data', one_hot=True, fake_data=False)

    with tf.variable_scope('inputs'):
        x = tf.placeholder(tf.float32, [None, 784], name='x_input')
        y = tf.placeholder(tf.float32, [None, 10], name='y_input')

    fc1 = fc_layer(x, 784, 100, activation_func=tf.nn.relu)
    fc2 = fc_layer(fc1, 100, 100, activation_func=tf.nn.relu)
    logits = fc_layer(fc2, 100, 10)
    outputs = tf.nn.softmax(logits)

    with tf.name_scope('loss'):
        # cross_entropy1 = -tf.reduce_sum(y*tf.log(outputs)) # the argument of log can be 0
        cross_entropy2 = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))

    with tf.name_scope('train'):
        train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy2)

    sess = tf.Session()
    sess.run(tf.global_variables_initializer())

    # Write the graph definition so tensorboard can display it.
    writer = tf.summary.FileWriter("logs/", sess.graph)

if __name__ == '__main__':
    main()

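This code only builds the network and has no training part, because my little laptop can't even handle training on MNIST. First, a few details that seem trivial but matter a lot. My environment is anaconda3 on win8 with tensorflow 1.0. The init in sess.run(init) has now been replaced by tf.global_variables_initializer() as shown above, and the statement that saves the graph has become writer = tf.summary.FileWriter("logs/", sess.graph). Run the script with python first, then run tensorboard --logdir=logs (no quotes), and open the URL it gives you to see the visualization. Now look closely at fc_layer: the name_scope "layer" is reused on every call, so tf adds a suffix automatically and the scopes become layer, layer_1, layer_2 and so on. That way no variable names collide, which is rather neat. Finally, cross_entropy2 is the fix I mentioned for the 0 inside the cross-entropy log.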

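By now you'll have noticed that this isn't split into modules at all. True, but there's no rush. First let's look at the other scope, variable_scope, and what it does. Well, it's about the same: it also defines a scope. The difference is that variable_scope works together with get_variable, so under different variable_scope names we can use the same variable name (such as 'weights') without any conflict, because the variable_scope prefix differs. name_scope does not combine with get_variable this way, since get_variable ignores it, and that is why I find most people use variable_scope.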

----------------------------manual divider----------------------

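OK, having got this far, you can visualize a nice graph showing what your network looks like. But let's not rush into training yet, because we don't want to put the whole pipeline (reading data, defining the network, training) into a single file as above. So we split it into three parts: the network is written as a class, data reading goes into its own file, and the main function (file) ties the whole flow together. Splitting things this way makes the code more reusable and reflects so-called object-oriented programming. This seems to be the mainstream way to write it.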
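As a rough idea of what the data-reading part might contain, here is a minimal NumPy sketch. The function name sample_batch, the array shapes and the use of a seeded generator are all assumptions for illustration, not the layout of any particular project.

```python
import numpy as np

def sample_batch(x, y, batch_size, rng):
    """Draw batch_size distinct row indices and return the matching rows of x and y."""
    idx = rng.choice(x.shape[0], size=batch_size, replace=False)
    return x[idx], y[idx]

# Hypothetical toy data: 40 small "images" and per-pixel 5-channel targets.
# Every value in sample i equals i, so alignment between x and y is easy to check.
ids = np.arange(40, dtype=np.float32).reshape(40, 1, 1, 1)
x_all = ids * np.ones((1, 8, 8, 3), dtype=np.float32)
y_all = ids * np.ones((1, 8, 8, 5), dtype=np.float32)

rng = np.random.default_rng(0)
x_batch, y_batch = sample_batch(x_all, y_all, batch_size=16, rng=rng)
print(x_batch.shape, y_batch.shape)
```

Keeping this in its own file means the network class only ever sees ready-made arrays, and the sampling strategy can change without touching the model.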

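So next I'll use an fcn (Fully Convolutional Network) as an example. An fcn uses a stack of convolutions and transposed convolutions to turn an input image into a two-dimensional feature map, with no fc layers. The output might be a pixel-level classification or something else, depending on what you need. I'll skip the convolutions at the front. How the transposed convolutions at the back work took me a long time to understand. The animations at https://github.com/vdumoulin/conv_arithmetic are said to be authoritative; they come from Montreal, after all. The stride of these operations confused me a lot at first, but what I eventually understood is that the stride=2 of a transposed convolution describes the stride=2 of its corresponding convolution. So the transposed convolution amounts to inserting zeros between the input elements and then convolving, which doubles the size of the feature map, exactly mirroring how a stride=2 convolution halves it. In tf, padding and stride are set according to this. As for so-called bilinear upsampling, it means making the transposed convolution behave like bilinear interpolation: you construct a kernel initialized by bilinear interpolation, so that the convolution produces a bilinearly interpolated result. On top of that, the kernel can be made trainable by creating it with get_variable, or left untrainable as a constant. Right, that seems fine, so here is the code: a single network class that builds the network, trains it, saves and loads the model, and so on. Take a look if you need it! It basically follows what I described earlier, and the improved code looks a bit more polished.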
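The zero-insertion view of transposed convolution described above can be checked in one dimension with plain NumPy. This is only a sketch under stated assumptions: the helper names are made up, the kernel size of 4 is a choice commonly paired with stride 2 (not the size used in the class), and cropping one sample per side is what matches SAME-style padding for this particular size.

```python
from math import ceil
import numpy as np

def bilinear_kernel_1d(size):
    """1-D bilinear weights, using the same centre formula as the 2-D init in the class."""
    f = ceil(size / 2.0)
    c = (2 * f - 1 - f % 2) / (2.0 * f)
    return np.array([1 - abs(i / f - c) for i in range(size)])

def transposed_conv_1d(x, kernel, stride=2):
    """Scatter form: each input sample adds a scaled copy of the kernel at offset i*stride."""
    out = np.zeros((len(x) - 1) * stride + len(kernel))
    for i, v in enumerate(x):
        out[i * stride:i * stride + len(kernel)] += v * kernel
    return out

def zero_insert_then_conv(x, kernel, stride=2):
    """Insert (stride - 1) zeros between samples, then apply an ordinary full convolution."""
    dilated = np.zeros((len(x) - 1) * stride + 1)
    dilated[::stride] = x
    return np.convolve(dilated, kernel, mode="full")

x = np.array([1.0, 2.0, 3.0])
kernel = bilinear_kernel_1d(4)           # [0.25, 0.75, 0.75, 0.25]
scatter = transposed_conv_1d(x, kernel)
via_zeros = zero_insert_then_conv(x, kernel)
upsampled = scatter[1:-1]                # crop to twice the input length
print(kernel, upsampled)
```

Both routes give the same array, and the interior of the cropped result rises in equal steps between the original samples, which is the linear-interpolation effect the bilinear kernel is meant to produce. The full network class follows.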

#!/usr/bin/python
# -*- coding:utf-8 -*-

import inspect
import os

import numpy as np
import tensorflow as tf
import time
from math import ceil
import random

VGG_MEAN = [103.939, 116.779, 123.68]


class FullyConvNet:
    def __init__(self, batch_size=16, loss_type='mse', optimizer='momentum', vgg16_npy_path=None):
        # if vgg16_npy_path is None:
        #     path = inspect.getfile(Vgg16)
        #     path = os.path.abspath(os.path.join(path, os.pardir))
        #     path = os.path.join(path, "vgg16.npy")
        #     vgg16_npy_path = path
        #     print (path)

        # self.data_dict = np.load(vgg16_npy_path, encoding='latin1').item()
        # print("npy file loaded")
        self.batch_size = batch_size
        self.loss_type = loss_type
        self.optimizer = optimizer
        self.learning_rate = 0.01

        self.build()
        self.sess = tf.Session()

        self.sess.run(tf.global_variables_initializer())
        writer = tf.summary.FileWriter("logs/", self.sess.graph)
        self.saver = tf.train.Saver()
        # Make sure the checkpoint directory used by fit()/forward() exists.
        os.makedirs('tmp', exist_ok=True)

    def build(self):

        # start_time = time.time()
        # print("build model started")
        # rgb_scaled = rgb * 255.0

        # # Convert RGB to BGR
        # red, green, blue = tf.split(rgb_scaled, 3, 3)
        # assert red.get_shape().as_list()[1:] == [224, 224, 1]
        # assert green.get_shape().as_list()[1:] == [224, 224, 1]
        # assert blue.get_shape().as_list()[1:] == [224, 224, 1]
        # bgr = tf.concat([
        #     blue - VGG_MEAN[0],
        #     green - VGG_MEAN[1],
        #     red - VGG_MEAN[2],
        # ], 3)
        # assert bgr.get_shape().as_list()[1:] == [224, 224, 3]


        # The batch dimension is fixed, so every feed must hold exactly batch_size samples.
        with tf.variable_scope('inputs'):
            self.x = tf.placeholder(tf.float32, [self.batch_size, 224, 224, 3], name='x_input')
            self.y = tf.placeholder(tf.float32, [self.batch_size, 224, 224, 5], name='y_input')

        self.conv1_1 = self.conv_layer(self.x, 16, [3, 3], 'conv1')
        self.pool1 = self.max_pool(self.conv1_1, 'pool1')
        # self.conv2_1 = self.conv_layer(self.pool1, 64, [3, 3], "conv2")
        # self.pool2 = self.max_pool(self.conv2_1, 'pool2')
        # pool1 is 112x112; the stride-2 transposed convolution brings it back to 224x224.
        self.deconv1 = self.deconv_layer(self.pool1, 5, [3, 3], [self.batch_size, 224, 224, 5], 'deconv1')
        # self.pool2 = self.max_pool(self.deconv1, 'pool2')

        with tf.variable_scope('loss'):
            if self.loss_type == 'mse':
                self.loss = tf.reduce_mean(tf.squared_difference(self.y, self.deconv1, name='mse'))

            else:
                # Placeholder branch: only mse is implemented so far.
                self.loss = tf.reduce_mean(tf.squared_difference(self.y, self.deconv1, name='mse'))

        with tf.variable_scope('train'):
            if self.optimizer == 'momentum':
                self.train_step = tf.train.MomentumOptimizer(self.learning_rate, 0.9).minimize(self.loss)

            else:
                self.train_step = tf.train.GradientDescentOptimizer(self.learning_rate).minimize(self.loss)


    def fit(self, x_train, y_train, x_test, y_test, iterations=2):
        train_num = x_train.shape[0]
        # print (sample_index)
        # print (x_train[sample_index,:].shape)

        # tf.all_variables()
        old_cost = 100
        for i in range(iterations):
            # Draw a random mini-batch of distinct training samples.
            sample_index = random.sample(range(0, train_num), self.batch_size)
            _, cost = self.sess.run([self.train_step, self.loss], feed_dict={self.x:x_train[sample_index,:], self.y:y_train[sample_index,:]})
            # x_test / y_test must contain exactly batch_size samples (fixed placeholder shape).
            test_cost = self.sess.run([self.loss], feed_dict={self.x:x_test, self.y:y_test})
            # Checkpoint only when the training cost improves.
            if cost < old_cost:
                print ('iteration, training_cost, testing_cost: ', i, cost, test_cost)
                old_cost = cost
                save_path = self.saver.save(self.sess, 'tmp/model.ckpt')
                # self.saver.restore(self.sess, 'tmp/model.ckpt')
                cost = self.sess.run([self.loss], feed_dict={self.x:x_train[sample_index,:], self.y:y_train[sample_index,:]})
                print (cost)

    def forward(self, x, y):
        self.saver.restore(self.sess, 'tmp/model.ckpt')
        # Feed exactly batch_size samples; a hard-coded slice of 10 only worked when batch_size == 10.
        n = self.batch_size
        cost = self.sess.run([self.loss], feed_dict={self.x:x[0:n], self.y:y[0:n]})
        print (cost)
        output = self.sess.run([self.deconv1], feed_dict={self.x:x[0:n]})
        print (output[0].shape)
        return output[0]

    def max_pool(self, bottom, name):
        with tf.variable_scope(name):
            return tf.nn.max_pool(bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    def conv_layer(self, bottom, out_size, kernel_shape, name):
        with tf.variable_scope(name):
            in_size = bottom.get_shape().as_list()[-1]
            # print (bottom.get_shape().as_list())
            weights = tf.get_variable('weights', shape=[kernel_shape[0], kernel_shape[1], in_size, out_size], initializer=tf.random_normal_initializer())
            # bias = tf.get_variable('bias', [1, n_l1], initializer=b_initializer, collections=c_names)
            conv = tf.nn.conv2d(bottom, weights, [1, 1, 1, 1], padding='SAME')
            # conv_biases = tf.nn.bias_add(conv, bias)
            relu = tf.nn.relu(conv)
            return relu

    def deconv_layer(self, bottom, out_size, kernel_shape, output_shape, name):
        with tf.variable_scope(name):
            in_size = bottom.get_shape().as_list()[-1]
            # print (in_size)
            # conv2d_transpose expects the filter as [height, width, out_channels, in_channels].
            f_shape = [kernel_shape[0], kernel_shape[1], out_size, in_size]
            weights = self.bilinear_interpolation_init(f_shape)
            # in_shape = bottom.get_shape().as_list()[0]
            deconv = tf.nn.conv2d_transpose(bottom, weights, output_shape=output_shape, strides=[1,2,2,1], padding="SAME")
            return deconv


    def bilinear_interpolation_init(self, f_shape):
        width = f_shape[0]
        height = f_shape[1]  # was f_shape[0]; only equivalent for square kernels
        f = ceil(width/2.0)
        c = (2 * f - 1 - f % 2) / (2.0 * f)
        bilinear = np.zeros([f_shape[0], f_shape[1]])
        for x in range(width):
            for y in range(height):
                value = (1 - abs(x / f - c)) * (1 - abs(y / f - c))
                bilinear[x, y] = value
        weights = np.zeros(f_shape)
        # Only the matching (i, i) channel pairs get the bilinear kernel; the rest stay zero.
        for i in range(f_shape[2]):
            weights[:, :, i, i] = bilinear

        init = tf.constant_initializer(value=weights, dtype=tf.float32)
        # Created with get_variable, so the upsampling filter stays trainable.
        var = tf.get_variable(name="up_filter", initializer=init, shape=weights.shape)
        return var

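That is the model part of the code, which you could call the neural net model. All the training steps are written there, so the main file just calls this class's methods, which keeps the main function very short. You can then write a separate utils file for image preprocessing, plotting and similar tasks. With these three files you can basically handle a project.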
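As an illustration of what such a utils file might hold, here is a small NumPy sketch of two preprocessing steps: moving channels to the last axis, as tf expects, and the RGB-to-BGR mean subtraction that appears commented out in build(). The function names are hypothetical, and the assumed input is RGB in the range [0, 1].

```python
import numpy as np

VGG_MEAN = [103.939, 116.779, 123.68]  # per-channel means in BGR order, as in the model file

def to_channels_last(images_nchw):
    """Reorder [batch, channels, height, width] into tf's [batch, height, width, channels]."""
    return np.transpose(images_nchw, (0, 2, 3, 1))

def rgb_to_centered_bgr(images_rgb01):
    """Scale [0, 1] RGB to [0, 255], reverse the channel order to BGR, subtract the means."""
    scaled = images_rgb01 * 255.0
    bgr = scaled[..., ::-1]
    return bgr - np.array(VGG_MEAN, dtype=scaled.dtype)

# Hypothetical input: two all-white 4x5 RGB images stored channels-first.
batch_nchw = np.ones((2, 3, 4, 5))
batch_nhwc = to_channels_last(batch_nchw)
centered = rgb_to_centered_bgr(batch_nhwc)
print(batch_nhwc.shape, centered[0, 0, 0])
```

Doing this outside the graph keeps the model class free of data-format special cases; whether mean subtraction is appropriate depends on the weights and data you actually use.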

3. To be continued

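That's all for now; I'll add whatever I missed later. Having written this much, I realize my writing really isn't great, not even as good as a poetry-writing AI. Sometimes I think it's perfectly normal for AI to do better than people: the models are designed by a lot of very smart people, yet the learning process is still partly a black box and keeps producing abilities we don't expect (AlphaGo, for example). You could say it is built on humans and yet surpasses them, which is pretty scary. I'm out. Auf Wiedersehen.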