Performance tuning in Python

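A colleague on my team gave a short talk on Python this afternoon. I thought it was quite good, so I have excerpted the Performance Tuning part here for reference.

First, keep in mind Donald Knuth's famous line: premature optimization is the root of all evil.

Performance tuning breaks down into the following steps: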

Find bottlenecks
Use better algorithms
Use faster tools
Write optimized code
Write your own Python module
Parallelize the computation

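The first step is to find where the program's performance bottlenecks are. Python ships tools for this, such as profile and cProfile.

For a more detailed treatment, see this article on the Python profilers: http://kb.cnblogs.com/a/2337112/

profile is a pure-Python module, while cProfile is a Python extension written in C. cProfile is used here.

(Note: cProfile may need to be installed separately: sudo apt-get install python-profiler.)

Here is the program to be profiled, profiler_demo.py: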

#!/usr/bin/env python

above_limit = 10000001
def func1():  # explicit loop over a lazy xrange
    s = 0
    for i in xrange(above_limit):
        s += i

def func2():  # sum() over a fully built list
    s = sum(range(above_limit))

def func3():  # sum() over a lazy xrange
    s = sum(xrange(above_limit))

func1()
func2()
func3()

Running python -m cProfile profiler_demo.py on the command line executes the program and prints its profiling results at the same time.

liuhao@liuhao-Lenovo:~/program/python$ python -m cProfile profiler_demo.py 
         8 function calls in 1.083 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.101    0.101 profiler_demo.py:12(func3)
        1    0.000    0.000    1.083    1.083 profiler_demo.py:3(<module>)
        1    0.377    0.377    0.377    0.377 profiler_demo.py:4(func1)
        1    0.153    0.153    0.604    0.604 profiler_demo.py:9(func2)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.289    0.289    0.289    0.289 {range}
        2    0.263    0.131    0.263    0.131 {sum}

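A brief explanation of the results:

1. The first line shows that the script made 8 function calls in total and used 1.083 CPU seconds;

2. ncalls is the number of calls; tottime is the time spent in the function itself, excluding the functions it calls; cumtime is the cumulative time of the function including everything it calls;

3. cProfile is built on lsprof, which also shows up in the output (see the _lsprof.Profiler entry);

4. From the results, func3 is the fastest (cumtime 0.101 s, versus 0.377 s for func1 and 0.604 s for func2);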
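The command-line report above is ordered by name. As a rough sketch, the same profiler can also be driven from inside a script, with the report sorted by cumulative time so that the most expensive entries come first. This is not from the talk: it assumes Python 3 (so range stands in for xrange), and the names LIMIT, loop_sum, builtin_sum and profile_call are made up for the illustration.

```python
import cProfile
import io
import pstats

LIMIT = 1000001  # hypothetical value, smaller than the original to keep the run short


def loop_sum():
    # Explicit loop, like func1 in profiler_demo.py
    s = 0
    for i in range(LIMIT):
        s += i
    return s


def builtin_sum():
    # Built-in sum over a lazy range, like func3 in profiler_demo.py
    return sum(range(LIMIT))


def profile_call(func):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and print at most the top 5 entries
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    for f in (loop_sum, builtin_sum):
        value, report = profile_call(f)
        print(f.__name__, value)
        print(report)
```

Capturing the report in a string rather than printing it directly makes it easy to log or compare between runs.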

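The second step is to choose a better algorithm. When the data set is large, don't pick an O(N^2) algorithm if an O(N log N) one is available; and don't write sum(xrange(101)) when (1+100)*100/2 will do;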
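To make the last point concrete, here is a small sketch comparing the O(N) summation with the O(1) closed form. It assumes Python 3 (range instead of xrange, and // for integer division); the function names are made up for the illustration.

```python
import timeit


def sum_by_iteration(n):
    # O(n): visits every integer from 0 to n
    return sum(range(n + 1))


def sum_by_formula(n):
    # O(1): arithmetic series, (first + last) * count / 2
    return (1 + n) * n // 2


if __name__ == "__main__":
    n = 1000000
    assert sum_by_iteration(n) == sum_by_formula(n)
    for f in (sum_by_iteration, sum_by_formula):
        # Time 10 calls of each variant; absolute numbers depend on the machine
        elapsed = timeit.timeit(lambda: f(n), number=10)
        print(f.__name__, elapsed)
```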

Here is another example that uses a Python decorator to trade space for time when computing the Fibonacci sequence:

The program:

def fib_nocache(n):  # plain recursion, recomputes the same values many times
    if n == 0 or n == 1:
        return 1
    return fib_nocache(n-2) + fib_nocache(n-1)

def cache(func):  # decorator: remember results keyed by argument
    c = {}
    def _(n):
        r = c.get(n)
        if r is None:  # not cached yet: compute and store
            r = c[n] = func(n)
        return r
    return _

@cache
def fib_cache(n):  # same recursion, but each n is computed only once
    if n == 0 or n == 1:
        return 1
    return fib_cache(n-2) + fib_cache(n-1)

fib_nocache(32)
fib_cache(32)

The profiling results are as follows:

         7049317 function calls (69 primitive calls) in 2.305 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    2.305    2.305 profiler_fib.py:1(<module>)
7049155/1    2.305    0.000    2.305    2.305 profiler_fib.py:1(fib_nocache)
     33/1    0.000    0.000    0.000    0.000 profiler_fib.py:15(fib_cache)
        1    0.000    0.000    0.000    0.000 profiler_fib.py:6(cache)
     63/1    0.000    0.000    0.000    0.000 profiler_fib.py:8(_)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       63    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}

I had not come across decorators before, so seeing this kind of pattern was quite useful.
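For comparison, Python 3.2 and later ship a ready-made memoizing decorator, functools.lru_cache. The sketch below is not from the talk; it assumes Python 3 and reuses the same Fibonacci definition (with fib(0) = fib(1) = 1). The name fib_lru is made up for the illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # unbounded cache, like the dict in the hand-written decorator
def fib_lru(n):
    if n == 0 or n == 1:
        return 1
    return fib_lru(n - 2) + fib_lru(n - 1)


if __name__ == "__main__":
    print(fib_lru(32))
    # cache_info() reports hits, misses and the current cache size
    print(fib_lru.cache_info())
```

One small difference: the hand-written version treats a cached None as a miss, which does not matter here because the function never returns None.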

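The third step is to use better tools. This mainly means idiomatic Python, for example:

xrange() is faster than range(), since it does not build the whole list up front;
itertools.imap() is faster than map(), for the same reason;
dict.iteritems() is faster than dict.items(), which copies all pairs into a list;
for i, item in enumerate(seq) is faster than for i in range(len(seq)), since it avoids indexing seq[i] on every iteration;
etc.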
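A small sketch of the enumerate idiom from the list above. It is written for Python 3, where xrange, itertools.imap and dict.iteritems no longer exist because range, map and dict.items() already avoid building intermediate lists. The function names and the sample data are made up for the illustration.

```python
import timeit


def pairs_with_range(seq):
    # Index-based loop: one extra seq[i] lookup per iteration
    pairs = []
    for i in range(len(seq)):
        pairs.append((i, seq[i]))
    return pairs


def pairs_with_enumerate(seq):
    # enumerate yields (index, item) directly
    pairs = []
    for i, item in enumerate(seq):
        pairs.append((i, item))
    return pairs


if __name__ == "__main__":
    data = list("performance") * 1000
    assert pairs_with_range(data) == pairs_with_enumerate(data)
    for f in (pairs_with_range, pairs_with_enumerate):
        # Absolute timings depend on the machine and interpreter version
        print(f.__name__, timeit.timeit(lambda: f(data), number=100))
```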

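The fourth step is to write better code, combining good algorithms with good tools;

Steps five and six, rewriting the critical modules in C and turning a serial program into a parallel one, each deserve a blog post of their own. I will write them up when I have time.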