Analyzing Large Projects Written in Python (Volatility and Cuckoo)

Until now I had only used Python for simple scripts, which were essentially no different from bat batch files.

But Python can also be used to write large projects, for example:

Volatility: https://code.google.com/p/volatility/

Cuckoo: http://cuckoosandbox.org/index.html

1. The ctypes library

http://docs.python.org/2/library/ctypes.html

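Python is called a "glue language" (http://en.wikipedia.org/wiki/Glue_language#Glue_languages) because it can easily work together with code written in other languages, such as C.

In addition, most of the Windows API is provided through DLLs, so if Python needs to call the Windows API, it has to be able to work with DLLs.

ctypes is the library that meets both of these needs.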

ctypes is a foreign function library for Python. It provides C compatible data types, and allows calling functions in DLLs or shared libraries. It can be used to wrap these libraries in pure Python.
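The C-compatible data types mentioned in the quote can be tried without loading any DLL. The following is a minimal sketch in Python 3 (the article's own snippets use Python 2 syntax); the Point struct and the add callback are made-up examples, not code from Volatility or Cuckoo.

```python
import ctypes

# A C-compatible struct, equivalent to: struct Point { int x; int y; };
class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

p = Point(3, 4)
print(p.x, p.y)  # 3 4: fields read back as Python ints

# The layout follows C rules: two C ints, no padding needed between them.
print(ctypes.sizeof(Point) == 2 * ctypes.sizeof(ctypes.c_int))  # True

# pointer() builds a real C pointer to the struct's memory.
pp = ctypes.pointer(p)
pp.contents.x = 10   # write through the pointer...
print(p.x)           # 10: ...and the original struct sees the change

# A C function-pointer type int (*)(int, int) backed by a Python callable.
ADD_FUNC = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)
add = ADD_FUNC(lambda a, b: a + b)  # keep a reference while it is in use
print(add(2, 3))  # 5: the call goes through the C calling convention
```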

>>> import ctypes
>>> print ctypes.windll.kernel32
<WinDLL 'kernel32', handle 77090000 at 29eb208>

Checking the module list of the IDLE process in Process Hacker, we find:

kernel32.dll, 0x77090000, 1.12 MB, Windows NT BASE API Client DLL

How is the output of "print ctypes.windll.kernel32" produced? It comes from the __repr__ method of CDLL:

class CDLL(object):
    def __repr__(self):
        return "<%s '%s', handle %x at %x>" % \
               (self.__class__.__name__, self._name,
                (self._handle & (_sys.maxint*2 + 1)),
                id(self) & (_sys.maxint*2 + 1))
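The expression _sys.maxint*2 + 1 builds a bit mask of all ones as wide as the platform's signed word, so value & mask reinterprets a negative integer as its unsigned counterpart and "%x" prints it without a minus sign; presumably this guards against a handle or id() value that would be negative when read as a signed number. A minimal Python 3 sketch of the same trick (Python 3 has sys.maxsize instead of sys.maxint); FakeDLL is a hypothetical class that only imitates the output format:

```python
import sys

# sys.maxsize * 2 + 1 is all one bits, as wide as Py_ssize_t
# (pointer-sized on common platforms), e.g. 0xffffffffffffffff on 64-bit.
MASK = sys.maxsize * 2 + 1

def as_unsigned(value):
    """Reinterpret a possibly negative int as an unsigned machine word."""
    return value & MASK

class FakeDLL(object):
    """Hypothetical stand-in that mimics the CDLL.__repr__ format."""
    def __init__(self, name, handle):
        self._name = name
        self._handle = handle

    def __repr__(self):
        return "<%s '%s', handle %x at %x>" % (
            self.__class__.__name__, self._name,
            as_unsigned(self._handle), as_unsigned(id(self)))

print("%x" % -1)               # -1: a negative number keeps its sign
print("%x" % as_unsigned(-1))  # all f's: the same bits read as unsigned
print(FakeDLL("kernel32", 0x77090000))
```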

id(self) is the address of the object itself, while self._handle is a member of ctypes.windll.kernel32 that holds the base address of the module. It is initialized by the following code:

class CDLL(object):
    def __init__(self, name, mode=DEFAULT_MODE, handle=None,
                 use_errno=False,
                 use_last_error=False):
        ......
        if handle is None:
            self._handle = _dlopen(self._name, mode)
        else:
            self._handle = handle

When the program accesses the kernel32 attribute of windll and that attribute does not exist yet, Python calls the special method __getattr__, which creates it:

class LibraryLoader(object):
    def __init__(self, dlltype):
        self._dlltype = dlltype

    def __getattr__(self, name):
        if name[0] == '_':
            raise AttributeError(name)
        dll = self._dlltype(name)
        setattr(self, name, dll)
        return dll

    def __getitem__(self, name):
        return getattr(self, name)

    def LoadLibrary(self, name):
        return self._dlltype(name)

windll itself is a LibraryLoader object, constructed with the WinDLL class as its argument; WinDLL inherits from CDLL:

if _os.name in ("nt", "ce"):
    windll = LibraryLoader(WinDLL)
    oledll = LibraryLoader(OleDLL)

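To recap: windll is a LibraryLoader object whose _dlltype is the WinDLL class (which inherits from CDLL). kernel32 is an attribute that this object creates dynamically (through the special method __getattr__), and its value is a WinDLL instance. While that instance is being initialized, _dlopen is called to open the corresponding module, and the module's base address (the handle) is stored in kernel32._handle. Because __getattr__ also calls setattr, the WinDLL instance is cached on windll, so later accesses to windll.kernel32 return the same object without calling __getattr__ again.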
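The lazy-loading pattern can be reproduced without Windows. This is a minimal Python 3 sketch, assuming hypothetical FakeLibrary and LazyLoader classes that mirror the shape of WinDLL and LibraryLoader but load nothing:

```python
class FakeLibrary(object):
    """Hypothetical stand-in for WinDLL/CDLL: it only remembers its name."""
    instances = 0

    def __init__(self, name):
        FakeLibrary.instances += 1
        self._name = name

class LazyLoader(object):
    """Same shape as ctypes.LibraryLoader, minus the real DLL loading."""
    def __init__(self, dlltype):
        self._dlltype = dlltype

    def __getattr__(self, name):
        # Reached only when normal lookup fails, i.e. on the first access.
        if name.startswith('_'):
            # Refuse underscore names so they are never treated as libraries.
            raise AttributeError(name)
        lib = self._dlltype(name)
        setattr(self, name, lib)  # cache it: later lookups find it directly
        return lib

loader = LazyLoader(FakeLibrary)
first = loader.kernel32    # triggers __getattr__
second = loader.kernel32   # plain instance attribute now
print(first is second, FakeLibrary.instances)  # True 1

try:
    loader._private
except AttributeError:
    print("names starting with '_' are rejected")
```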

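This process shows how flexible and thoroughly "object-oriented" Python is. In my experience, with a language like this, high-level design becomes especially important, and design patterns become correspondingly more valuable.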

2. Python's data model

Reference: http://docs.python.org/2/reference/datamodel.html

Every object has an identity, a type and a value. An object’s identity never changes once it has been created; you may think of it as the object’s address in memory. The ‘is’ operator compares the identity of two objects; the id() function returns an integer representing its identity (currently implemented as its address). An object’s type is also unchangeable. [1] An object’s type determines the operations that the object supports (e.g., “does it have a length?”) and also defines the possible values for objects of that type. The type() function returns an object’s type (which is an object itself). The value of some objects can change. Objects whose value can change are said to be mutable; objects whose value is unchangeable once they are created are called immutable. (The value of an immutable container object that contains a reference to a mutable object can change when the latter’s value is changed; however the container is still considered immutable, because the collection of objects it contains cannot be changed. So, immutability is not strictly the same as having an unchangeable value, it is more subtle.) An object’s mutability is determined by its type; for instance, numbers, strings and tuples are immutable, while dictionaries and lists are mutable.

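This passage is important. A brief translation:

Every Python object has an identity, a type, and a value.

1. Identity: [tells you whether two objects are the same object]

Once a Python object has been created, its identity never changes; you can think of it as the object's address in memory. The is operator checks whether two objects have the same identity, and the id() function returns an integer representing an object's identity (in CPython, its memory address).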
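A small Python 3 sketch contrasting identity (is, id()) with equality (==):

```python
a = [1, 2, 3]
b = a            # b is another name for the same object
c = [1, 2, 3]    # c is a different object with an equal value

print(a is b, id(a) == id(b))  # True True: same identity
print(a is c, a == c)          # False True: different identity, equal value
```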

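2. Type: [the abstract characteristics shared by all objects of that kind]

A Python object's type cannot change either. The type determines which operations the object supports and what values objects of that type can have. The type() function returns an object's type; the return value is itself an object that represents the type.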
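A short Python 3 sketch showing that a type is itself an object, and that the type decides what an object supports (the "does it have a length?" example from the quote):

```python
x = 4.84
t = type(x)

print(t)                     # <class 'float'>
print(t is float)            # True: the type is itself an object...
print(type(t) is type)       # True: ...whose own type is `type`
print(isinstance(x, float))  # True

# The type decides which operations are supported.
print(hasattr(float, "__len__"))  # False: floats have no len()
print(hasattr(list, "__len__"))   # True: lists do
```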

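3. Value: [the actual information the object carries]

The value of some Python objects can change, while the value of others cannot. Objects whose value can change are called mutable; objects whose value cannot change once they are created are called immutable.

Whether a Python object's value can change (its mutability) is determined by its type.

For example, numbers, strings and tuples are immutable types, while dictionaries and lists are mutable types.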
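The difference between mutating an object and rebinding a name, plus the immutable-container subtlety mentioned in the quoted passage, can be sketched like this in Python 3:

```python
nums = [1, 2]
before = id(nums)
nums.append(3)             # lists are mutable: changed in place
print(id(nums) == before)  # True: same object, new value

n = 1
before = id(n)
n += 1                     # ints are immutable: n now names a new object
print(id(n) == before)     # False

# An immutable container may refer to a mutable object.
t = (1, nums)
nums.append(4)
print(t)  # (1, [1, 2, 3, 4]): still the same two objects, but the value changed

try:
    t[0] = 99
except TypeError:
    print("tuples do not support item assignment")
```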

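3. Python's special methods

Python defines a number of special methods (their names begin and end with double underscores); user-defined classes can override their default implementations.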

3.1 __new__ and __init__

Reference: http://stackoverflow.com/questions/674304/pythons-use-of-new-and-init

http://bbs.csdn.net/topics/340028226

__new__() is intended mainly to allow subclasses of immutable types (like int, str, or tuple) to customize instance creation. It is also commonly overridden in custom metaclasses in order to customize class creation.

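__new__ is called before __init__. It exists mainly so that subclasses of immutable built-in types can customize how their instances are created. Why?

As mentioned above, some built-in types are immutable: once they have been given a value, it can never change. So where is that value assigned: in __new__ or in __init__?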

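class square_of(float):
    def __new__(cls, val):
        print "invoked __new__(%s, %f)" % (cls.__name__, val)
        # The float's value is fixed here, when the instance is created
        return float.__new__(cls, val*val)

    def __init__(self, val):
        print "invoked __init__(%s, %f)" % (self.__class__.__name__, self)
        # Too late: this call cannot change the value. Python 2.7 only emits
        # a DeprecationWarning (hidden by default); Python 3 raises TypeError.
        float.__init__(self, val*val)

a = square_of(2.2)
print a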

>>> ================================ RESTART ================================
>>> 
invoked __new__(square_of, 2.200000)
invoked __init__(square_of, 4.840000)
4.84

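Clearly the value is assigned in float.__new__: __init__ already prints the squared value 4.840000. By the time __init__ runs, the die has been cast.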

So why provide __new__ at all? Consider what we would have to do to define a new immutable data type of our own.

The solution given at http://en.wikipedia.org/wiki/Immutable_object#Python is:

class Immutable(object):
    """An immutable class with a single attribute 'value'."""
    def __setattr__(self, *args):
        raise TypeError("can't modify immutable instance")
    __delattr__ = __setattr__
    def __init__(self, value):
        # we can no longer use self.value = value to store the instance data
        # so we must explicitly call the superclass
        super(Immutable, self).__setattr__('value', value)

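However, this is not a strictly immutable implementation. The attribute is stored by calling the base class object's __setattr__, and that method remains available to any caller, so the protection is only superficial: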

# Uses the Immutable class defined above
b = Immutable(2.2)
super(Immutable, b).__setattr__('value', 4.4)
print b.value

>>> ================================ RESTART ================================
>>> 
4.4

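The value changed because the interface was never actually sealed off.

So how should a custom immutable type be implemented? ~~The answer is to derive the new immutable type from a built-in immutable type.~~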

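__new__ may create and return an object of a type other than cls, even though it is expected to return an instance of cls.

If __new__ returns an instance of cls (or of a subclass of cls), __init__ is then called on it; otherwise, __init__ is not called.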
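A minimal Python 3 sketch of that rule, using a hypothetical class Weird whose __new__ returns a plain string for negative inputs:

```python
class Weird(object):
    """Hypothetical class whose __new__ does not always return a Weird."""
    def __new__(cls, value):
        if value < 0:
            # Return an object that is NOT an instance of cls...
            return "negative: %d" % value
        return super(Weird, cls).__new__(cls)

    def __init__(self, value):
        # ...in which case Python never calls this __init__.
        print("__init__ called with", value)
        self.value = value

w = Weird(5)    # prints: __init__ called with 5
s = Weird(-5)   # prints nothing: __new__ returned a str
print(type(w).__name__, repr(s))  # Weird 'negative: -5'
```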

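3.2 __del__

__del__ is the counterpart of __init__; the pair resembles a constructor and a destructor. Unlike a C++ destructor, however, __del__ runs only when the object is about to be destroyed (in CPython, when its reference count drops to zero), and it is not guaranteed to run for objects that still exist when the interpreter exits.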
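A minimal Python 3 sketch of when __del__ runs, assuming CPython's reference counting (other implementations may delay finalization); Resource and log are hypothetical names. Note that del removes a name, not the object:

```python
import gc

log = []

class Resource(object):
    """Hypothetical class that records when it is created and destroyed."""
    def __init__(self, name):
        self.name = name
        log.append("init " + name)

    def __del__(self):
        log.append("del " + self.name)

r = Resource("a")
alias = r
del r          # alias still refers to the object, so __del__ does not run
print(log)     # ['init a']
del alias      # last reference gone: CPython calls __del__ right away
gc.collect()   # other implementations may need a collection first
print(log)     # ['init a', 'del a']
```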

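3.3 __repr__ and __str__

__repr__ returns the "official" string representation of an object, used by repr() and by the interactive interpreter. It is aimed at developers and, where possible, should look like a valid Python expression that would recreate the object; when that is not possible, the convention is a string of the form <...some useful description...>, which is exactly what CDLL.__repr__ returns. __str__ returns the "informal", human-readable representation used by str() and print. If a class defines no __str__, __repr__ is used instead.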
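A Python 3 sketch contrasting the two methods on a hypothetical Temperature class, plus a hypothetical OnlyRepr class that shows the fallback:

```python
class Temperature(object):
    """Hypothetical example class."""
    def __init__(self, degrees):
        self.degrees = degrees

    def __repr__(self):
        # Unambiguous; here it is a valid expression that rebuilds the object
        return "Temperature(%r)" % self.degrees

    def __str__(self):
        # Friendly text for end users
        return "%.1f degrees" % self.degrees

t = Temperature(21.5)
print(repr(t))   # Temperature(21.5)
print(t)         # 21.5 degrees: print() uses __str__
print([t])       # [Temperature(21.5)]: containers show items via __repr__
print(eval(repr(t)).degrees)  # 21.5: the repr round-trips in this case

class OnlyRepr(object):
    def __repr__(self):
        return "OnlyRepr()"

print(str(OnlyRepr()))  # OnlyRepr(): str() falls back to __repr__
```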

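3.4 __getattr__, __setattr__ and __delattr__

These are the hooks for attribute access, as in the expression below. __getattr__ is called only when normal attribute lookup fails (as with windll.kernel32 above), while __setattr__ and __delattr__ are called for every assignment and deletion: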

object.attribute

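3.5 __getitem__, __setitem__ and __delitem__

These are the hooks for subscript access, used by dictionaries as well as lists and other containers: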

self[key]
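A minimal Python 3 sketch of the three item hooks, using a hypothetical case-insensitive mapping. The LibraryLoader.__getitem__ shown in section 1 relies on the same mechanism, so windll['kernel32'] behaves like windll.kernel32:

```python
class CaseInsensitiveDict(object):
    """Hypothetical mapping whose string keys ignore case."""
    def __init__(self):
        self._data = {}

    def __getitem__(self, key):         # self[key]
        return self._data[key.lower()]

    def __setitem__(self, key, value):  # self[key] = value
        self._data[key.lower()] = value

    def __delitem__(self, key):         # del self[key]
        del self._data[key.lower()]

    def __len__(self):
        return len(self._data)

d = CaseInsensitiveDict()
d["Kernel32"] = 0x77090000
print(hex(d["KERNEL32"]))  # 0x77090000
del d["kernel32"]
print(len(d))              # 0
```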