1. Configuration

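The config module contains several attributes that modify Theano's behavior. Many of these attributes are examined when the theano module is imported, and several are assumed to be read-only. As a rule, the attributes in the config module should not be modified inside user code.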

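Theano's code comes with default values for these attributes, but you can override them from your .theanorc file, and override those values in turn with the THEANO_FLAGS environment variable.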

The order of precedence, from highest to lowest, is:

an assignment to theano.config.<property>
an assignment in THEANO_FLAGS
an assignment in the .theanorc file (or the file indicated in THEANORC)
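
The precedence order can be pictured as a series of dictionary updates, applied from the lowest-priority source to the highest. The following is a minimal sketch in plain Python, not Theano's actual implementation: the helpers parse_flags and resolve_config are hypothetical names, and the default values are chosen only for illustration. It assumes the usual conventions that .theanorc keeps its settings in a [global] section and that THEANO_FLAGS is a comma-separated list of key=value pairs.

```python
import configparser


def parse_flags(flags):
    """Parse a THEANO_FLAGS-style string such as 'floatX=float32,device=cpu'."""
    result = {}
    for item in flags.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        result[key.strip()] = value.strip()
    return result


def resolve_config(defaults, rc_text="", flags="", assignments=None):
    """Merge the sources, lowest precedence first (hypothetical helper)."""
    resolved = dict(defaults)
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep names such as floatX case-sensitive
    parser.read_string(rc_text)
    if parser.has_section("global"):
        resolved.update(parser["global"])   # .theanorc beats the defaults
    resolved.update(parse_flags(flags))     # THEANO_FLAGS beats .theanorc
    resolved.update(assignments or {})      # direct assignment beats everything
    return resolved


# Illustrative values only.
defaults = {"floatX": "float64", "mode": "FAST_RUN", "device": "cpu"}
rc_text = "[global]\nfloatX = float32\nmode = FAST_COMPILE\n"

config = resolve_config(defaults, rc_text, flags="mode=DebugMode")
print(config)
```

Here floatX comes from the rc text, mode comes from the flags string because it outranks the rc text, and device keeps its default because nothing overrides it.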

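You can print the current configuration at any time by printing theano.config. For example, to see a list of all active configuration variables, type this at the command line: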

python -c 'import theano; print(theano.config)' | less

For more detail, see Configuration in the library documentation.

2. Exercise

Consider the logistic regression:

import numpy
import theano
import theano.tensor as T
rng = numpy.random

N = 400
feats = 784
D = (rng.randn(N, feats).astype(theano.config.floatX),
     rng.randint(size=N, low=0, high=2).astype(theano.config.floatX))
training_steps = 10000

# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
x.tag.test_value = D[0]
y.tag.test_value = D[1]
#print "Initial model:"
#print w.get_value(), b.get_value()

# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w)-b)) # Probability of having a one
prediction = p_1 > 0.5 # The prediction that is done: 0 or 1
xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1) # Cross-entropy
cost = xent.mean() + 0.01*(w**2).sum() # The cost to optimize
gw,gb = T.grad(cost, [w,b])

# Compile expressions to functions
train = theano.function(
            inputs=[x,y],
            outputs=[prediction, xent],
            updates={w:w-0.01*gw, b:b-0.01*gb},
            name = "train")
predict = theano.function(inputs=[x], outputs=prediction,
            name = "predict")

# Inspect the compiled graph to see whether CPU or GPU BLAS ops were used.
# print(...) with a single argument runs under both Python 2 and Python 3.
if any([node.op.__class__.__name__ in ['Gemv', 'CGemv', 'Gemm', 'CGemm']
        for node in train.maker.fgraph.toposort()]):
    print('Used the cpu')
elif any([node.op.__class__.__name__ in ['GpuGemm', 'GpuGemv']
          for node in train.maker.fgraph.toposort()]):
    print('Used the gpu')
else:
    print('ERROR, not able to tell if theano used the cpu or the gpu')
    print(train.maker.fgraph.toposort())

for i in range(training_steps):
    pred, err = train(D[0], D[1])
#print "Final model:"
#print w.get_value(), b.get_value()

print("target values for D")
print(D[1])

print("prediction on D")
print(predict(D[0]))

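Modify this example so that it runs on the CPU (the default) with floatX=float32, and time the execution with the command time python file.py (the time command is not available on Windows 8). Save your code, as it will be useful later on.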

Note:

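Apply the Theano flag floatX=float32 (through theano.config.floatX) in your code.
Cast inputs before storing them into a shared variable.
Circumvent the automatic cast of int32 with float32 to float64:
- Insert a manual cast in your code, or use [u]int{8,16}.
- Insert a manual cast around the mean operator (this involves a division by the length, which is an int64).
- Note that a new casting mechanism is being developed.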
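
The upcasting described in the note can be seen with NumPy alone. The sketch below uses NumPy's type promotion rules as a stand-in; it is an assumption that they mirror what Theano does for these particular type combinations, so treat it as an illustration of the idea rather than a statement about Theano's internals. The array names are made up for the example.

```python
import numpy as np

float32 = np.dtype("float32")

# Which dtype results from combining each integer type with float32?
for int_name in ["int8", "uint8", "int16", "uint16", "int32", "int64"]:
    combined = np.result_type(np.dtype(int_name), float32)
    print("%-6s with float32 -> %s" % (int_name, combined))

# A mean divides a sum by a length; here the length is stored as int64.
sums = np.array([2.0, 4.0], dtype="float32")
lengths = np.array([4, 4], dtype="int64")

ratio = sums / lengths                          # promoted to float64
ratio32 = (sums / lengths).astype("float32")    # manual cast back to float32
print(ratio.dtype, ratio32.dtype)
```

The 8-bit and 16-bit integer types fit into float32 without loss, which is why the note suggests them; int32 and int64 do not, so the result widens to float64 unless a cast is inserted by hand.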

Solution:

#!/usr/bin/env python
# Theano tutorial
# Solution to Exercise in section 'Configuration Settings and Compiling Modes'

from __future__ import print_function
import numpy
import theano
import theano.tensor as tt

theano.config.floatX = 'float32'

rng = numpy.random

N = 400
feats = 784
D = (rng.randn(N, feats).astype(theano.config.floatX),
     rng.randint(size=N, low=0, high=2).astype(theano.config.floatX))
training_steps = 10000

# Declare Theano symbolic variables
x = tt.matrix("x")
y = tt.vector("y")
w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
x.tag.test_value = D[0]
y.tag.test_value = D[1]
#print "Initial model:"
#print w.get_value(), b.get_value()

# Construct Theano expression graph
p_1 = 1 / (1 + tt.exp(-tt.dot(x, w) - b))  # Probability of having a one
prediction = p_1 > 0.5  # The prediction that is done: 0 or 1
xent = -y * tt.log(p_1) - (1 - y) * tt.log(1 - p_1)  # Cross-entropy
# Cast the mean back to float32: dividing by the int64 length would upcast it.
cost = (tt.cast(xent.mean(), 'float32') +
        0.01 * (w ** 2).sum())  # The cost to optimize
gw, gb = tt.grad(cost, [w, b])

# Compile expressions to functions
train = theano.function(
            inputs=[x, y],
            outputs=[prediction, xent],
            updates={w: w - 0.01 * gw, b: b - 0.01 * gb},
            name="train")
predict = theano.function(inputs=[x], outputs=prediction,
            name="predict")

if any([node.op.__class__.__name__ in ['Gemv', 'CGemv', 'Gemm', 'CGemm']
        for node in train.maker.fgraph.toposort()]):
    print('Used the cpu')
elif any([node.op.__class__.__name__ in ['GpuGemm', 'GpuGemv']
          for node in train.maker.fgraph.toposort()]):
    print('Used the gpu')
else:
    print('ERROR, not able to tell if theano used the cpu or the gpu')
    print(train.maker.fgraph.toposort())

for i in range(training_steps):
    pred, err = train(D[0], D[1])
#print "Final model:"
#print w.get_value(), b.get_value()

print("target values for D")
print(D[1])

print("prediction on D")
print(predict(D[0]))

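3. Modes

Every time theano.function is called, the symbolic relationships between the input and output Theano variables are optimized and compiled. How this compilation is carried out is controlled by the value of the mode parameter.

Theano defines the following modes by name: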

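'FAST_COMPILE': Apply just a few graph optimizations and use only Python implementations.
'FAST_RUN': Apply all optimizations and use C implementations where possible.
'DebugMode': Verify the correctness of all optimizations, and compare the C and Python implementations. This mode can take much longer than the other modes, but it can identify several kinds of problems.
'ProfileMode' (deprecated): Same optimizations as FAST_RUN, but print some profiling information.

The default mode is FAST_RUN, but it can be changed through the configuration variable config.mode, which in turn can be overridden by passing the mode keyword argument to theano.function.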

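Short name	Full constructor	What does it do?
`FAST_COMPILE`	`compile.mode.Mode(linker='py', optimizer='fast_compile')`	Python implementations only, quick and cheap graph transformations
`FAST_RUN`	`compile.mode.Mode(linker='cvm', optimizer='fast_run')`	C implementations where available, all available graph transformations
`DebugMode`	`compile.debugmode.DebugMode()`	Both implementations where available, all available graph transformations
`ProfileMode`	`compile.profilemode.ProfileMode()`	Deprecated. C implementations where available, all available graph transformations, prints profile information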

Note: For debugging purposes there is also a MonitorMode, which can be used to step through the execution of a function. See the debugging FAQ for details.

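4. Linkers

A mode is composed of two things: an optimizer and a linker. Some modes, such as ProfileMode and DebugMode, add logic around the optimizer and the linker. ProfileMode and DebugMode use their own linker.

You can select which linker to use with the Theano flag config.linker. Here is a table comparing the different linkers: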

linker	gc [1]	Raise error by op	Overhead	Definition
cvm	yes	yes	"++"	As c|py, but the runtime algorithm that executes the code is in C
cvm_nogc	no	yes	"+"	As cvm, but without gc
c|py [2]	yes	yes	"+++"	Try C code; if none exists for an op, use Python
c|py_nogc	no	yes	"++"	As c|py, but without gc
c	no	yes	"+"	Use only C code (raise an error if none is available for an op)
py	yes	yes	"+++"	Use only Python code
ProfileMode	no	no	"++++"	(Deprecated) Compute some extra profiling information
DebugMode	no	yes	VERY HIGH	Make many checks on what Theano computes

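[1] Garbage collection of intermediate results during computation. Without it, the memory used by the ops is kept between calls to the Theano function, which avoids reallocating memory and lowers the overhead (making calls faster).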

[2] Default.
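
The difference between the py, c and c|py rows of the table can be sketched as a lookup with a fallback. This is a plain-Python illustration, not Theano code: select_thunk and the two registries are hypothetical, and ordinary Python lambdas stand in for compiled C routines.

```python
def select_thunk(op_name, linker, c_impls, py_impls):
    """Pick an implementation for one op under a given linker name (illustrative)."""
    if linker == "py":
        return "py", py_impls[op_name]
    if linker in ("c", "c|py"):
        if op_name in c_impls:
            return "c", c_impls[op_name]
        if linker == "c":
            # The pure C linker refuses ops that have no C code.
            raise NotImplementedError("no C code for op %r" % op_name)
        # c|py falls back to the Python implementation.
        return "py", py_impls[op_name]
    raise ValueError("unknown linker %r" % linker)


py_impls = {"double": lambda v: 2 * v, "negate": lambda v: -v}
c_impls = {"double": lambda v: v + v}  # stand-in for a compiled C routine

for linker in ("py", "c|py"):
    for op in ("double", "negate"):
        kind, fn = select_thunk(op, linker, c_impls, py_impls)
        print(linker, op, kind, fn(21))
```

Under c|py the op without C code quietly runs in Python, while the same request under c raises an error, which matches the definitions in the table.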

For more detail, see Mode in the library documentation.

5. Using DebugMode

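Although you should normally use the FAST_RUN or FAST_COMPILE mode, it is useful to run your code under DebugMode first (available via mode='DebugMode'), especially when you are defining new kinds of expressions or new optimizations. DebugMode is designed to run several self-checks and assertions that help diagnose programming errors leading to incorrect output. Note that DebugMode is much slower than FAST_RUN or FAST_COMPILE, so use it only during development (not when you launch 1000 processes on a cluster).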

DebugMode is used as follows:

import theano
import theano.tensor as T

x = T.dvector('x')

f = theano.function([x], 10 * x, mode='DebugMode')

f([5])
f([0])
f([7])

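If any problem is detected, DebugMode raises an exception describing what went wrong, either at call time (f([5])) or at compile time (f = theano.function([x], 10 * x, mode='DebugMode')). These exceptions should not be ignored; talk to your local Theano guru, or email the users list if you cannot make the exception go away.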

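Some kinds of errors can only be detected for certain combinations of input values. In the example above, there is no way to guarantee that a future call such as f([-1]) will not cause a problem. DebugMode is not a silver bullet (the phrase is familiar in software engineering from Fred Brooks's essay "No Silver Bullet").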
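
The point about input-dependent errors can be illustrated without Theano. The sketch below imitates one idea the mode list attributes to DebugMode, comparing two implementations of the same computation on every call. Everything in it is hypothetical: checked, MismatchError and the deliberately buggy implementation are invented for the example and are not part of Theano.

```python
class MismatchError(Exception):
    """Raised when the two implementations disagree (illustrative)."""


def checked(reference_impl, fast_impl, tol=1e-6):
    """Wrap fast_impl so that every call is compared against reference_impl."""
    def wrapper(values):
        expected = reference_impl(values)
        actual = fast_impl(values)
        for e, a in zip(expected, actual):
            if abs(e - a) > tol:
                raise MismatchError("expected %r, got %r for input %r"
                                    % (expected, actual, values))
        return actual
    return wrapper


def reference_scale(values):
    return [10 * v for v in values]


def buggy_scale(values):
    # Deliberate bug: the result is wrong only for negative inputs.
    return [10 * abs(v) for v in values]


f = checked(reference_scale, buggy_scale)
print(f([5]))
print(f([0]))
print(f([7]))

try:
    f([-1])
except MismatchError as exc:
    print("detected:", exc)
```

The first three calls pass every check, and the bug only surfaces once a negative input arrives, so a clean run on a handful of inputs says nothing about the inputs that were never tried.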

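If you instantiate DebugMode through its constructor (see DebugMode) rather than with the keyword 'DebugMode', you can configure its behavior through constructor arguments. The keyword version of DebugMode (the one you get with mode='DebugMode') is quite strict.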

For more detail, see DebugMode in the library documentation.

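6. ProfileMode

Note: ProfileMode is deprecated. Use config.profile instead.

Besides checking for errors, another important task is to profile your code. For this Theano uses a special mode called ProfileMode, which is passed as an argument to theano.function. Using ProfileMode is a three-step process.

Note: To make this the default, set the Theano flag config.mode to ProfileMode. In that case, the profiling information is printed to standard output automatically when the Python process exits.

The memory profile of the output of each apply node can be enabled with the Theano flag config.ProfileMode.profile_memory.

For more detail, see ProfileMode in the library documentation.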

7. Creating a ProfileMode Instance

First, create a ProfileMode instance:

>>> import theano
>>> from theano import ProfileMode
>>> profmode = ProfileMode(optimizer='fast_run', linker=theano.gof.OpWiseCLinker())

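The ProfileMode constructor takes an optimizer and a linker as input. Which optimizer and linker to use depends on the application. For example, a user who wants to profile only the Python implementations should use gof.PerformLinker (or "py" for short). On the other hand, a user who wants to profile a graph using C implementations should use gof.OpWiseCLinker (or "c|py"). To test the speed of your code, we recommend the fast_run optimizer together with the gof.OpWiseCLinker linker.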

8. Compiling your Graph with ProfileMode

Once the ProfileMode instance is created, simply compile your graph as you normally would, specifying the mode parameter:

>>> # with functions
>>> f = theano.function([input1,input2],[output1], mode=profmode)

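9. Retrieving Timing Information

Once your graph is compiled, simply run the program or operation you want to profile, then call profmode.print_summary(). This gives you the timing information you need, showing where your graph spends most of its time. This is best shown through an example, so let's continue with the logistic regression example.

Compiling the module with ProfileMode and calling profmode.print_summary() generates the following output: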

"""
ProfileMode.print_summary()
---------------------------

local_time 0.0749197006226 (Time spent running thunks)
Apply-wise summary: <fraction of local_time spent at this position> (<Apply position>, <Apply Op name>)
        0.069   15      _dot22
        0.064   1       _dot22
        0.053   0       InplaceDimShuffle{x,0}
        0.049   2       InplaceDimShuffle{1,0}
        0.049   10      mul
        0.049   6       Elemwise{ScalarSigmoid{output_types_preference=<theano.scalar.basic.transfer_type object at 0x171e650>}}[(0, 0)]
        0.049   3       InplaceDimShuffle{x}
        0.049   4       InplaceDimShuffle{x,x}
        0.048   14      Sum{0}
        0.047   7       sub
        0.046   17      mul
        0.045   9       sqr
        0.045   8       Elemwise{sub}
        0.045   16      Sum
        0.044   18      mul
   ... (remaining 6 Apply instances account for 0.25 of the runtime)
Op-wise summary: <fraction of local_time spent on this kind of Op> <Op name>
        0.139   * mul
        0.134   * _dot22
        0.092   * sub
        0.085   * Elemwise{Sub{output_types_preference=<theano.scalar.basic.transfer_type object at 0x1779f10>}}[(0, 0)]
        0.053   * InplaceDimShuffle{x,0}
        0.049   * InplaceDimShuffle{1,0}
        0.049   * Elemwise{ScalarSigmoid{output_types_preference=<theano.scalar.basic.transfer_type object at 0x171e650>}}[(0, 0)]
        0.049   * InplaceDimShuffle{x}
        0.049   * InplaceDimShuffle{x,x}
        0.048   * Sum{0}
        0.045   * sqr
        0.045   * Sum
        0.043   * Sum{1}
        0.042   * Elemwise{Mul{output_types_preference=<theano.scalar.basic.transfer_type object at 0x17a0f50>}}[(0, 1)]
        0.041   * Elemwise{Add{output_types_preference=<theano.scala
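
The summary above reports two views of the same measurements: time per Apply node, and time grouped by the kind of Op. How the second view follows from the first can be sketched in a few lines of plain Python. This is not how ProfileMode is implemented; summarize is a hypothetical helper, and the timings are invented numbers chosen to make the arithmetic easy to follow, not measurements.

```python
from collections import defaultdict


def summarize(apply_times):
    """apply_times: list of (apply position, op name, seconds) tuples."""
    local_time = sum(seconds for _, _, seconds in apply_times)

    # Apply-wise view: fraction of the total spent at each position.
    apply_wise = sorted(
        ((seconds / local_time, position, op)
         for position, op, seconds in apply_times),
        reverse=True)

    # Op-wise view: add up the time of every Apply node that uses the same Op.
    per_op = defaultdict(float)
    for _, op, seconds in apply_times:
        per_op[op] += seconds
    op_wise = sorted(((total / local_time, op) for op, total in per_op.items()),
                     reverse=True)
    return local_time, apply_wise, op_wise


# Invented timings, for illustration only.
timings = [(0, "dot", 0.25), (1, "mul", 0.125), (2, "dot", 0.5), (3, "mul", 0.125)]
local_time, apply_wise, op_wise = summarize(timings)

print("local_time", local_time)
print("Apply-wise summary:")
for fraction, position, op in apply_wise:
    print("    %.3f   %d   %s" % (fraction, position, op))
print("Op-wise summary:")
for fraction, op in op_wise:
    print("    %.3f   * %s" % (fraction, op))
```

Reading the Op-wise view first is usually the quicker way to spot which kind of operation dominates; the Apply-wise view then shows where in the graph it occurs.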

References:

[1] Official site: http://deeplearning.net/software/theano/tutorial/modes.html