Setting Up Theano and the Keras Framework on Win10

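There are all sorts of guides online for installing Theano on 64-bit Windows 7. I tried many of them and ran into one problem after another. Because I had no prior experience with MinGW and similar tools, installation was a struggle. After repeated attempts, the procedure below finally worked for me.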

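The process is actually simple. First, the prerequisites:

Win10 (32-bit and 64-bit both work; be sure to download the installers that match your system)
VS2010 (it does not have to be VS2010; I simply happened to have it. The VS compiler is probably only needed when setting up GPU programming.)
Anaconda (download it from the official site; after the page opens, the download links take a moment to appear. I chose it because it bundles Python together with the two required libraries, numpy and scipy, plus a number of others, which saves effort compared with installing everything yourself. Any version will do; if you want Python 3.4, download the corresponding Anaconda3. This tutorial uses Anaconda2, i.e. the Python 2.7 version. The installation process is the same either way.)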

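Installation steps:

1. Uninstall previous versions.
    Uninstall any standalone Python installation you already have. I had installed Python 2.7 directly when I was learning Python, so I removed it first, because Anaconda ships its own Python.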

2. Install Anaconda.
    This is very easy. I installed to D:\Anaconda2. One thing to watch out for: the install path must not contain spaces! I learned this the hard way.
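
After installing, the no-spaces rule can be checked from Python itself. This is only a small sketch; the helper name `path_has_space` is made up for this illustration.

```python
from __future__ import print_function
import sys


def path_has_space(path):
    """Return True if the given path contains a space character."""
    return " " in path


# sys.executable is the full path of the running interpreter,
# e.g. D:\Anaconda2\python.exe for the install directory used here.
print("Interpreter:", sys.executable)
if path_has_space(sys.executable):
    print("Warning: the install path contains a space")
else:
    print("OK: no spaces in the install path")
```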

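3. Install MinGW.
    Other tutorials tell you to add D:\Anaconda2\MinGW\bin;D:\Anaconda2\MinGW\x86_64-w64-mingw32\lib; to the path environment variable, but you will find that there is no MinGW directory under D:\Anaconda2 at all. The best approach is therefore to install it with a command; there is no need to download mingw-setup.exe or anything similar yourself.
How to install:

Open CMD (note: this is the Windows command prompt, not the Python interpreter. conda is a command that runs under Windows, so typing it inside Python gives a syntax error).
Type conda install mingw libpython and press Enter. Installation progress is displayed, and it finishes after a short wait. The D:\Anaconda2\MinGW directory now exists.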
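
To confirm that the compiler is reachable after this step, the PATH can be searched from Python. The sketch below assumes the compiler Theano looks for is named `g++` (`g++.exe` on Windows); the helper `find_on_path` is hypothetical and written by hand so that it runs on both Python 2.7 and Python 3.

```python
from __future__ import print_function
import os


def find_on_path(program, path_value=None, extensions=("", ".exe")):
    """Return the first file matching `program` on PATH, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue
        for ext in extensions:
            candidate = os.path.join(directory, program + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


# Prints None if no g++ executable is visible from this process.
print("g++ found at:", find_on_path("g++"))
```

If this prints None in a freshly opened CMD window, the PATH entries set in the next step are the first thing to check.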

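4. Configure environment variables.

Edit the path variable under user variables (create it if it does not exist, though it usually does) and append D:\Anaconda2;D:\Anaconda2\Scripts; to it. Do not drop the semicolons. D:\Anaconda2 is my Anaconda install directory; substitute your own.
Create a new user variable named pythonpath with the value D:\Anaconda2\Lib\site-packages\theano; . This points to the directory where Theano will be installed. Theano is not installed yet, but that is fine; set the variable now.
Open cmd. The window shows a path, in my case C:\Users\Administrator>. Go to the corresponding directory on your machine and create a text file there named .theanorc.txt (note that the name contains two "."), then edit it so that it contains the following: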
[global]
openmp=False
[blas]
ldflags=
[gcc]
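cxxflags=-ID:\Anaconda2\MinGW
The path after -I lies under your Anaconda install directory. Be sure to get it right, otherwise MinGW will not be found.
It is best to restart the computer at this point.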
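
The configuration file above can also be generated for a different install directory. This is a minimal sketch, assuming MinGW sits in a `MinGW` folder directly under the Anaconda directory as in this tutorial; `render_theanorc` is a hypothetical helper, and the script only prints the text rather than writing the file.

```python
from __future__ import print_function
import os


def render_theanorc(anaconda_dir):
    """Build the .theanorc.txt text for a given Anaconda directory."""
    mingw_dir = anaconda_dir.rstrip("\\/") + "\\MinGW"
    lines = [
        "[global]",
        "openmp=False",
        "[blas]",
        "ldflags=",
        "[gcc]",
        "cxxflags=-I" + mingw_dir,
    ]
    return "\n".join(lines) + "\n"


# The file belongs in the user's home directory (C:\Users\Administrator here).
target = os.path.join(os.path.expanduser("~"), ".theanorc.txt")
print("Would write to:", target)
print(render_theanorc("D:\\Anaconda2"))
```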

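5. Install Theano.
There is no need to download a zip archive manually; installing with a command is simplest.

Open CMD the same way as when installing MinGW; do not enter Python.
Type pip install theano and press Enter. A pleasing download progress bar appears. The package is small, so installation is quick.
- On my machine the pip command was not recognized at this point: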
- ```
Unable to create process using '""
```
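- For now I used python -m pip install theano instead.
In cmd, type python to enter the Python interpreter, then type import theano and press Enter. This takes a while.
Next, type theano.test(). This prints another long stream of output; as long as there are no errors, the installation succeeded. During my install it was still printing after quite a while, so I quit with Ctrl+C. (In fact, I found that some error messages are harmless: Theano still works normally, including theano.function(). So if you still get errors no matter how you configure things, you can ignore them for now and simply run a program to try it out, for example some convolution code.)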

6. Using the GPU

    My computer has an AMD graphics card, which CUDA does not support, so using the GPU is out of the question.

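7. The Keras deep learning framework

Open CMD the same way as when installing MinGW; do not enter Python.
Type pip install keras and press Enter; the download progress bar appears.

    Again pip was not recognized, so I used python -m pip install keras instead.

        Note: the pip command is recognized in Anaconda Prompt, so both pip commands above can also be run there directly, with the same result.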

8. Small examples

    1. Theano test

from __future__ import print_function
"""
Created on Tue Aug 16 14:05:45 2016

@author: Administrator
"""

r"""
This tutorial introduces logistic regression using Theano and stochastic
gradient descent.

Logistic regression is a probabilistic, linear classifier. It is parametrized
by a weight matrix :math:`W` and a bias vector :math:`b`. Classification is
done by projecting data points onto a set of hyperplanes, the distance to
which is used to determine a class membership probability.

Mathematically, this can be written as:

.. math::
  P(Y=i|x, W,b) &= softmax_i(W x + b) \\
                &= \frac {e^{W_i x + b_i}} {\sum_j e^{W_j x + b_j}}


The output of the model or prediction is then done by taking the argmax of
the vector whose i'th element is P(Y=i|x).

.. math::

  y_{pred} = argmax_i P(Y=i|x,W,b)


This tutorial presents a stochastic gradient descent optimization method
suitable for large datasets.


References:

    - textbooks: "Pattern Recognition and Machine Learning" -
                 Christopher M. Bishop, section 4.3.2

"""



__docformat__ = 'restructuredtext en'

import six.moves.cPickle as pickle
import gzip
import os
import sys
import timeit

import numpy

import theano
import theano.tensor as T


class LogisticRegression(object):
    """Multi-class Logistic Regression Class

    The logistic regression is fully described by a weight matrix :math:`W`
    and bias vector :math:`b`. Classification is done by projecting data
    points onto a set of hyperplanes, the distance to which is used to
    determine a class membership probability.
    """

    def __init__(self, input, n_in, n_out):
        """ Initialize the parameters of the logistic regression

        :type input: theano.tensor.TensorType
        :param input: symbolic variable that describes the input of the
                      architecture (one minibatch)

        :type n_in: int
        :param n_in: number of input units, the dimension of the space in
                     which the datapoints lie

        :type n_out: int
        :param n_out: number of output units, the dimension of the space in
                      which the labels lie

        """
        # start-snippet-1
        # initialize with 0 the weights W as a matrix of shape (n_in, n_out)
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name='W',
            borrow=True
        )
        # initialize the biases b as a vector of n_out 0s
        self.b = theano.shared(
            value=numpy.zeros(
                (n_out,),
                dtype=theano.config.floatX
            ),
            name='b',
            borrow=True
        )

        # symbolic expression for computing the matrix of class-membership
        # probabilities
        # Where:
        # W is a matrix where column-k represent the separation hyperplane for
        # class-k
        # x is a matrix where row-j  represents input training sample-j
        # b is a vector where element-k represent the free parameter of
        # hyperplane-k
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

        # symbolic description of how to compute prediction as class whose
        # probability is maximal
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)
        # end-snippet-1

        # parameters of the model
        self.params = [self.W, self.b]

        # keep track of model input
        self.input = input

    def negative_log_likelihood(self, y):
        r"""Return the mean of the negative log-likelihood of the prediction
        of this model under a given target distribution.

        .. math::

            \frac{1}{|\mathcal{D}|} \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) =
            \frac{1}{|\mathcal{D}|} \sum_{i=0}^{|\mathcal{D}|}
                \log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\
            \ell (\theta=\{W,b\}, \mathcal{D})

        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label

        Note: we use the mean instead of the sum so that
              the learning rate is less dependent on the batch size
        """
        # start-snippet-2
        # y.shape[0] is (symbolically) the number of rows in y, i.e.,
        # number of examples (call it n) in the minibatch
        # T.arange(y.shape[0]) is a symbolic vector which will contain
        # [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of
        # Log-Probabilities (call it LP) with one row per example and
        # one column per class LP[T.arange(y.shape[0]),y] is a vector
        # v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ...,
        # LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is
        # the mean (across minibatch examples) of the elements in v,
        # i.e., the mean log-likelihood across the minibatch.
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
        # end-snippet-2

    def errors(self, y):
        """Return a float representing the number of errors in the minibatch
        over the total number of examples of the minibatch ; zero one
        loss over the size of the minibatch

        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label
        """

        # check if y has same dimension of y_pred
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type)
            )
        # check if y is of the correct datatype
        if y.dtype.startswith('int'):
            # the T.neq operator returns a vector of 0s and 1s, where 1
            # represents a mistake in prediction
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()


def load_data(dataset):
    ''' Loads the dataset

    :type dataset: string
    :param dataset: the path to the dataset (here MNIST)
    '''

    #############
    # LOAD DATA #
    #############

    # Download the MNIST dataset if it is not present
    data_dir, data_file = os.path.split(dataset)
    if data_dir == "" and not os.path.isfile(dataset):
        # Check if dataset is in the data directory.
        new_path = os.path.join(
            os.path.split(__file__)[0],
            "..",
            "data",
            dataset
        )
        if os.path.isfile(new_path) or data_file == 'mnist.pkl.gz':
            dataset = new_path

    if (not os.path.isfile(dataset)) and data_file == 'mnist.pkl.gz':
        from six.moves import urllib
        origin = (
            'http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz'
        )
        print('Downloading data from %s' % origin)
        urllib.request.urlretrieve(origin, dataset)

    print('... loading data')

    # Load the dataset
    with gzip.open(dataset, 'rb') as f:
        try:
            # Python 3 needs the encoding to read a pickle written by Python 2
            train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
        except TypeError:
            # Python 2's cPickle.load does not accept an `encoding` argument
            train_set, valid_set, test_set = pickle.load(f)
    # train_set, valid_set, test_set format: tuple(input, target)
    # input is a numpy.ndarray of 2 dimensions (a matrix)
    # where each row corresponds to an example. target is a
    # numpy.ndarray of 1 dimension (vector) that has the same length as
    # the number of rows in the input. It should give the target
    # to the example with the same index in the input.

    def shared_dataset(data_xy, borrow=True):
        """ Function that loads the dataset into shared variables

        The reason we store our dataset in shared variables is to allow
        Theano to copy it into the GPU memory (when code is run on GPU).
        Since copying data into the GPU is slow, copying a minibatch everytime
        is needed (the default behaviour if the data is not in a shared
        variable) would lead to a large decrease in performance.
        """
        data_x, data_y = data_xy
        shared_x = theano.shared(numpy.asarray(data_x,
                                               dtype=theano.config.floatX),
                                 borrow=borrow)
        shared_y = theano.shared(numpy.asarray(data_y,
                                               dtype=theano.config.floatX),
                                 borrow=borrow)
        # When storing data on the GPU it has to be stored as floats
        # therefore we will store the labels as ``floatX`` as well
        # (``shared_y`` does exactly that). But during our computations
        # we need them as ints (we use labels as index, and if they are
        # floats it doesn't make sense) therefore instead of returning
        # ``shared_y`` we will have to cast it to int. This little hack
        # lets us get around this issue
        return shared_x, T.cast(shared_y, 'int32')

    test_set_x, test_set_y = shared_dataset(test_set)
    valid_set_x, valid_set_y = shared_dataset(valid_set)
    train_set_x, train_set_y = shared_dataset(train_set)

    rval = [(train_set_x, train_set_y), (valid_set_x, valid_set_y),
            (test_set_x, test_set_y)]
    return rval


def sgd_optimization_mnist(learning_rate=0.13, n_epochs=1000,
                           dataset='mnist.pkl.gz',
                           batch_size=600):
    """
    Demonstrate stochastic gradient descent optimization of a log-linear
    model

    This is demonstrated on MNIST.

    :type learning_rate: float
    :param learning_rate: learning rate used (factor for the stochastic
                          gradient)

    :type n_epochs: int
    :param n_epochs: maximal number of epochs to run the optimizer

    :type dataset: string
    :param dataset: the path of the MNIST dataset file from
                 http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz

    """
    datasets = load_data(dataset)

    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # compute number of minibatches for training, validation and testing
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] // batch_size
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] // batch_size
    n_test_batches = test_set_x.get_value(borrow=True).shape[0] // batch_size

    ######################
    # BUILD ACTUAL MODEL #
    ######################
    print('... building the model')

    # allocate symbolic variables for the data
    index = T.lscalar()  # index to a [mini]batch

    # generate symbolic variables for input (x and y represent a
    # minibatch)
    x = T.matrix('x')  # data, presented as rasterized images
    y = T.ivector('y')  # labels, presented as 1D vector of [int] labels

    # construct the logistic regression class
    # Each MNIST image has size 28*28
    classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)

    # the cost we minimize during training is the negative log likelihood of
    # the model in symbolic format
    cost = classifier.negative_log_likelihood(y)

    # compiling a Theano function that computes the mistakes that are made by
    # the model on a minibatch
    test_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    validate_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: valid_set_x[index * batch_size: (index + 1) * batch_size],
            y: valid_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    # compute the gradient of cost with respect to theta = (W,b)
    g_W = T.grad(cost=cost, wrt=classifier.W)
    g_b = T.grad(cost=cost, wrt=classifier.b)

    # start-snippet-3
    # specify how to update the parameters of the model as a list of
    # (variable, update expression) pairs.
    updates = [(classifier.W, classifier.W - learning_rate * g_W),
               (classifier.b, classifier.b - learning_rate * g_b)]

    # compiling a Theano function `train_model` that returns the cost, but in
    # the same time updates the parameter of the model based on the rules
    # defined in `updates`
    train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
    # end-snippet-3

    ###############
    # TRAIN MODEL #
    ###############
    print('... training the model')
    # early-stopping parameters
    patience = 5000  # look at this many examples regardless
    patience_increase = 2  # wait this much longer when a new best is
                                  # found
    improvement_threshold = 0.995  # a relative improvement of this much is
                                  # considered significant
    validation_frequency = min(n_train_batches, patience // 2)
                                  # go through this many
                                  # minibatches before checking the network
                                  # on the validation set; in this case we
                                  # check every epoch

    best_validation_loss = numpy.inf
    test_score = 0.
    start_time = timeit.default_timer()

    done_looping = False
    epoch = 0
    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in range(n_train_batches):

            minibatch_avg_cost = train_model(minibatch_index)
            # iteration number
            iter = (epoch - 1) * n_train_batches + minibatch_index

            if (iter + 1) % validation_frequency == 0:
                # compute zero-one loss on validation set
                validation_losses = [validate_model(i)
                                     for i in range(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)

                print(
                    'epoch %i, minibatch %i/%i, validation error %f %%' %
                    (
                        epoch,
                        minibatch_index + 1,
                        n_train_batches,
                        this_validation_loss * 100.
                    )
                )

                # if we got the best validation score until now
                if this_validation_loss < best_validation_loss:
                    #improve patience if loss improvement is good enough
                    if this_validation_loss < best_validation_loss * \
                       improvement_threshold:
                        patience = max(patience, iter * patience_increase)

                    best_validation_loss = this_validation_loss
                    # test it on the test set

                    test_losses = [test_model(i)
                                   for i in range(n_test_batches)]
                    test_score = numpy.mean(test_losses)

                    print(
                        (
                            '     epoch %i, minibatch %i/%i, test error of'
                            ' best model %f %%'
                        ) %
                        (
                            epoch,
                            minibatch_index + 1,
                            n_train_batches,
                            test_score * 100.
                        )
                    )

                    # save the best model
                    with open('best_model.pkl', 'wb') as f:
                        pickle.dump(classifier, f)

            if patience <= iter:
                done_looping = True
                break

    end_time = timeit.default_timer()
    print(
        (
            'Optimization complete with best validation score of %f %%, '
            'with test performance %f %%'
        )
        % (best_validation_loss * 100., test_score * 100.)
    )
    print('The code run for %d epochs, with %f epochs/sec' % (
        epoch, 1. * epoch / (end_time - start_time)))
    print(('The code for file ' +
           os.path.split(__file__)[1] +
           ' ran for %.1fs' % ((end_time - start_time))), file=sys.stderr)


def predict():
    """
    An example of how to load a trained model and use it
    to predict labels.
    """

    # load the saved model (pickles must be opened in binary mode)
    with open('best_model.pkl', 'rb') as f:
        classifier = pickle.load(f)

    # compile a predictor function
    predict_model = theano.function(
        inputs=[classifier.input],
        outputs=classifier.y_pred)

    # We can test it on some examples from the test set
    dataset='mnist.pkl.gz'
    datasets = load_data(dataset)
    test_set_x, test_set_y = datasets[2]
    test_set_x = test_set_x.get_value()

    predicted_values = predict_model(test_set_x[:10])
    print("Predicted values for the first 10 examples in test set:")
    print(predicted_values)


if __name__ == '__main__':
    sgd_optimization_mnist()
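
The softmax, negative log-likelihood and zero-one error used in the listing can be worked through by hand on a tiny batch. The following is a plain-Python sketch, not the Theano implementation; the scores and labels are made-up values chosen only for illustration.

```python
from __future__ import print_function
import math


def softmax(scores):
    """Softmax of one row of scores, shifted by the max for numerical stability."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(prob_rows, labels):
    """Mean of -log p[i][labels[i]], mirroring LP[arange(n), y] in the listing."""
    picked = [math.log(row[label]) for row, label in zip(prob_rows, labels)]
    return -sum(picked) / len(picked)


def error_rate(prob_rows, labels):
    """Fraction of rows whose argmax differs from the label (zero-one loss)."""
    wrong = 0
    for row, label in zip(prob_rows, labels):
        predicted = row.index(max(row))
        wrong += int(predicted != label)
    return wrong / float(len(labels))


# Hypothetical scores (W x + b) for a minibatch of 2 examples and 3 classes.
scores = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
labels = [0, 2]
probs = [softmax(row) for row in scores]
print("probabilities:", probs)
print("mean NLL:", negative_log_likelihood(probs, labels))
print("error rate:", error_rate(probs, labels))
```

Each row of probabilities sums to 1, and the loss only looks at the probability assigned to the correct class of each example.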

    2. Keras test

'''Trains a simple convnet on the MNIST dataset.
Gets to 99.25% test accuracy after 12 epochs
(there is still a lot of margin for parameter tuning).
16 seconds per epoch on a GRID K520 GPU.
'''

from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils

batch_size = 128
nb_classes = 10
nb_epoch = 12

# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
nb_pool = 2
# convolution kernel size
nb_conv = 3

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()

model.add(Convolution2D(nb_filters, nb_conv, nb_conv,
                        border_mode='valid',
                        input_shape=(1, img_rows, img_cols)))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, nb_conv, nb_conv))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(nb_pool, nb_pool)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])

model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          verbose=1, validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
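
It can help to trace how the image size shrinks through the layers before Flatten. The arithmetic below is a sketch that assumes both convolutions use 'valid' borders with stride 1 and that pooling uses a stride equal to the pool size (the second convolution and the pooling layer in the listing do not set these explicitly); the helper names are made up for this example.

```python
from __future__ import print_function


def conv_valid(size, kernel):
    """Output length of a stride-1 convolution with no padding ('valid')."""
    return size - kernel + 1


def pool(size, pool_size):
    """Output length of non-overlapping max pooling."""
    return size // pool_size


img_rows, nb_filters, nb_conv, nb_pool = 28, 32, 3, 2

after_conv1 = conv_valid(img_rows, nb_conv)     # 28 -> 26
after_conv2 = conv_valid(after_conv1, nb_conv)  # 26 -> 24
after_pool = pool(after_conv2, nb_pool)         # 24 -> 12
flattened = nb_filters * after_pool * after_pool

print("feature map side:", after_conv1, after_conv2, after_pool)
print("inputs to Dense(128):", flattened)
```

Under these assumptions the Flatten layer hands 32 * 12 * 12 values per image to the first Dense layer.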
