Revisiting the collections module: defaultdict() and namedtuple()

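defaultdict() and namedtuple() are two very practical extension types in the collections module. defaultdict is a subclass of the built-in dict type, and namedtuple() produces subclasses of the built-in tuple type. Both add useful features on top of the type they extend, and both are very handy in the right situations.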

defaultdict()

Definition and purpose

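It returns a dictionary-like object that differs from a plain dict in two main ways:

You supply a factory (such as list or int) that produces the value for any key that does not exist yet, which in effect fixes the type of the values.
You never have to worry about a key without a value: looking up a missing key always yields a default value.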

Example

from collections import defaultdict

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]

d = defaultdict(list)

for k, v in s:
    d[k].append(v)

print(list(d.items()))

d_2 = {}

for k, v in s:
    d_2.setdefault(k, []).append(v)

print(list(d_2.items()))

d_3 = {}

for k, v in s:
    # A plain dict has no default: the first lookup of 'yellow' raises KeyError
    d_3[k].append(v)

print(d_3.items())

Output (key order may differ; since Python 3.7 a dict keeps insertion order, so 'yellow' comes first):

[('red', [1]), ('blue', [2, 4]), ('yellow', [1, 3])]
[('red', [1]), ('blue', [2, 4]), ('yellow', [1, 3])]
Traceback (most recent call last):
  File "C:/Users/Administrator/Desktop/Python Scripts/collection_eg.py", line 22, in <module>
    d_3[k].append(v)
KeyError: 'yellow'

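d = defaultdict(list) creates a defaultdict (think of it as a dict) whose default values are lists. The comparison with d_3 shows that d[k] works on a defaultdict right away, even while d is still empty, whereas a plain dict raises KeyError. What happens behind the scenes is the same as the setdefault() handling shown for d_2.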
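To make the missing-key behaviour concrete, here is a minimal sketch using only the standard library. The variable names are illustrative. It suggests that indexing creates the key, while a membership test or get() does not.

```python
from collections import defaultdict

d = defaultdict(list)

# Indexing a missing key calls list() and stores the new empty list under it.
d['yellow'].append(1)

# Membership tests and get() do not call the factory, so no key is created.
has_blue = 'blue' in d
got_blue = d.get('blue')

print(dict(d))             # {'yellow': [1]}
print(has_blue, got_blue)  # False None

# Merely reading d['red'] is enough to insert it with an empty list.
_ = d['red']
print(sorted(d))           # ['red', 'yellow']
```

One consequence worth keeping in mind: reading a key by index changes the dictionary, so get() or `in` is the safer choice when you only want to look.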

Summary

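defaultdict is mainly useful when building a dictionary from data. When you need to group or accumulate data into a dictionary and every value should be of the same type, consider defaultdict.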
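The factory does not have to be list. As a rough sketch with made-up sample data, int gives a counter and set gives duplicate-free grouping:

```python
from collections import defaultdict

words = ['red', 'blue', 'red', 'green', 'blue', 'red']

# int() returns 0, so every new key starts out as a zero counter.
counts = defaultdict(int)
for w in words:
    counts[w] += 1

# set() groups values per key while dropping duplicates.
pairs = [('a', 1), ('b', 2), ('a', 1), ('a', 3)]
groups = defaultdict(set)
for k, v in pairs:
    groups[k].add(v)

sorted_groups = {k: sorted(v) for k, v in groups.items()}

print(dict(counts))    # {'red': 3, 'blue': 2, 'green': 1}
print(sorted_groups)   # {'a': [1, 3], 'b': [2]}
```

For pure counting, collections.Counter is usually the more direct tool; defaultdict(int) is shown here only to illustrate the factory idea.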

namedtuple()

Definition and purpose

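namedtuple() is a factory function that creates subclasses of tuple. Compared with a plain tuple, these classes offer more features: instances behave like tuples, but their fields can also be accessed as attributes. Such an object resembles a class instance with data attributes, except that the attributes are read-only.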

Example

>>> from collections import namedtuple
>>> TPoint = namedtuple('TPoint', ['x', 'y'])
>>> p = TPoint(x=10, y=10)
>>> p
TPoint(x=10, y=10)
>>> p.x
10
>>> p.y
10
>>> p[0]
10
>>> type(p)
<class '__main__.TPoint'>
>>> for i in p:
...     print(i)
...
10
10

TPoint = namedtuple('TPoint', ['x', 'y']) creates a TPoint type with the fields x and y.

The example shows that the fields of p can be read as p.x and p.y, and that p can also be indexed and iterated over with for, exactly like a tuple.
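The tuple side of a namedtuple can be sketched a little further. The values below are arbitrary and chosen to differ so the two fields are easy to tell apart:

```python
from collections import namedtuple

TPoint = namedtuple('TPoint', ['x', 'y'])
p = TPoint(x=10, y=20)

# Unpacking works exactly as with a plain tuple.
x, y = p

# An instance compares equal to a plain tuple holding the same items.
same_as_tuple = (p == (10, 20))

print(x, y)                  # 10 20
print(same_as_tuple)         # True
print(len(p), p[1])          # 2 20
print(isinstance(p, tuple))  # True
```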

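Passing verbose=True prints the full definition of the generated class. Note that this parameter was removed in Python 3.7, so the call below only works on older versions.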

TPoint = namedtuple('TPoint', ['x', 'y'], verbose=True)
from builtins import property as _property, tuple as _tuple
from operator import itemgetter as _itemgetter
from collections import OrderedDict

class TPoint(tuple):
    'TPoint(x, y)'

    __slots__ = ()

    _fields = ('x', 'y')

    def __new__(_cls, x, y):
        'Create new instance of TPoint(x, y)'
        return _tuple.__new__(_cls, (x, y))

    @classmethod
    def _make(cls, iterable, new=tuple.__new__, len=len):
        'Make a new TPoint object from a sequence or iterable'
        result = new(cls, iterable)
        if len(result) != 2:
            raise TypeError('Expected 2 arguments, got %d' % len(result))
        return result

    def __repr__(self):
        'Return a nicely formatted representation string'
        return self.__class__.__name__ + '(x=%r, y=%r)' % self

    def _asdict(self):
        'Return a new OrderedDict which maps field names to their values'
        return OrderedDict(zip(self._fields, self))

    __dict__ = property(_asdict)

    def _replace(_self, **kwds):
        'Return a new TPoint object replacing specified fields with new values'
        result = _self._make(map(kwds.pop, ('x', 'y'), _self))
        if kwds:
            raise ValueError('Got unexpected field names: %r' % list(kwds))
        return result

    def __getnewargs__(self):
        'Return self as a plain tuple.  Used by copy and pickle.'
        return tuple(self)

    x = _property(_itemgetter(0), doc='Alias for field number 0')

    y = _property(_itemgetter(1), doc='Alias for field number 1')

This listing shows the methods that a namedtuple class provides. It is also plain to see that the generated class inherits directly from tuple.
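On Python versions without verbose, much of the same information can still be read off the generated class. The following is a small sketch of that kind of inspection, not a replacement for the full listing:

```python
from collections import namedtuple

TPoint = namedtuple('TPoint', ['x', 'y'])

# The method resolution order confirms the direct tuple base class.
print(TPoint.__mro__[1])   # <class 'tuple'>

# Field names, in positional order.
print(TPoint._fields)      # ('x', 'y')

# Empty __slots__: instances carry no per-instance attribute dictionary.
print(TPoint.__slots__)    # ()

# The class docstring summarises the signature.
print(TPoint.__doc__)      # TPoint(x, y)
```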

Several important methods:

1. Build a namedtuple instance from existing data:

>>> TPoint = namedtuple('TPoint', ['x', 'y'])
>>> t = [11, 22]
>>> p = TPoint._make(t)
>>> p
TPoint(x=11, y=22)

>>>

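2. Instances of a class created by namedtuple are read-only. To change a value, call _replace, which returns a new instance with the given fields replaced and leaves the original untouched.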

>>> p
TPoint(x=11, y=22)
>>> p.y
22
>>> p.y = 33
Traceback (most recent call last):
  File "<pyshell#18>", line 1, in <module>
    p.y = 33
AttributeError: can't set attribute
>>> p._replace(y=33)
TPoint(x=11, y=33)

3. Convert dictionary data to a namedtuple instance:

>>> d = {'x': 44, 'y': 55}
>>> dp = TPoint(**d)
>>> dp
TPoint(x=44, y=55)
>>>
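The three methods above can be chained into one round trip. This is a minimal sketch; _asdict() is included as the inverse of the dictionary conversion, and dict() is applied to its result only so the printed form is the same across Python versions.

```python
from collections import namedtuple

TPoint = namedtuple('TPoint', ['x', 'y'])

p = TPoint._make([11, 22])   # build an instance from any iterable
q = p._replace(y=33)         # returns a new instance; p is left unchanged
d = dict(q._asdict())        # mapping of field name to value
r = TPoint(**d)              # and back to a namedtuple

print(p)        # TPoint(x=11, y=22)
print(q)        # TPoint(x=11, y=33)
print(d)        # {'x': 11, 'y': 33}
print(r == q)   # True
```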

namedtuple is most often used for processing rows that come from a CSV file or a database, combining map() with the _make() method of the generated class.

EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')

import csv
# In Python 3 the csv module expects a text-mode file opened with newline=""
with open("employees.csv", newline="") as f:
    for emp in map(EmployeeRecord._make, csv.reader(f)):
        print(emp.name, emp.title)

# SQLite database
import sqlite3
conn = sqlite3.connect('/companydata')
cursor = conn.cursor()
cursor.execute('SELECT name, age, title, department, paygrade FROM employees')
for emp in map(EmployeeRecord._make, cursor.fetchall()):
    print(emp.name, emp.title)
	
# MySQL database (requires the third-party mysql-connector-python package)
import mysql.connector
from collections import namedtuple
user = 'herbert'
pwd = '######'
host = '127.0.0.1'
db = 'world'
cnx = mysql.connector.connect(user=user, password=pwd, host=host, database=db)
# Create a cursor before running the query
cur = cnx.cursor()
cur.execute("SELECT Name, CountryCode, District, Population FROM CITY where CountryCode = 'CHN' AND Population > 500000")
CityRecord = namedtuple('City', 'Name, Country, District, Population')
for city in map(CityRecord._make, cur.fetchall()):
    print(city.Name, city.Population)
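The snippets above depend on external files and servers. For trying the pattern out, here is a self-contained sketch that uses in-memory data instead. The sample rows, the table layout, and the names csv_emps and db_emps are invented for illustration and are not part of the original examples.

```python
import csv
import io
import sqlite3
from collections import namedtuple

EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')

# Hypothetical sample rows standing in for employees.csv.
csv_text = "Alice,30,Engineer,R&D,5\nBob,41,Manager,Sales,7\n"
csv_emps = [EmployeeRecord._make(row) for row in csv.reader(io.StringIO(csv_text))]

# In-memory SQLite database holding the same hypothetical rows.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE employees '
             '(name TEXT, age INTEGER, title TEXT, department TEXT, paygrade INTEGER)')
conn.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)',
                 [('Alice', 30, 'Engineer', 'R&D', 5),
                  ('Bob', 41, 'Manager', 'Sales', 7)])
cursor = conn.execute('SELECT name, age, title, department, paygrade '
                      'FROM employees ORDER BY name')
db_emps = [EmployeeRecord._make(row) for row in cursor]
conn.close()

for emp in csv_emps:
    print(emp.name, emp.title, repr(emp.age))   # CSV fields arrive as strings
for emp in db_emps:
    print(emp.name, emp.title, repr(emp.age))   # SQLite returns typed values
```

The two loops print the same names and titles, but the age is the string '30' from CSV and the integer 30 from SQLite, so CSV rows may need explicit conversion before numeric use.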