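Iterators and Generators

Delegating Iteration

Problem:

You have built a custom container object that internally holds a list, tuple, or some other iterable. You want iteration to work directly on the new container.

Solution:

Typically, all you need to do is define an __iter__() method that delegates iteration to the object held inside the container. For example: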

class Node(object):
    def __init__(self, values):
        self._values = values
        self._children = []

    def __repr__(self):
        return 'Node ({!r})'.format(self._values)

    def add_child(self, node):
        self._children.append(node)

    def __iter__(self):
        return iter(self._children)

if __name__ == "__main__":
    root = Node(0)
    child1 = Node(1)
    child2 = Node(2)
    root.add_child(child1)
    root.add_child(child2)

    for ch in root:
        print(ch)

Running the code above produces:

Node (1)
Node (2)

In the code above, the __iter__() method simply forwards the iteration request to the internal _children attribute.
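To make the delegation concrete, the sketch below drives the iteration by hand with iter() and next(). The Box class is a hypothetical stand-in for Node and is not part of the original example; the point is only that the container itself never needs a __next__() method.

```python
class Box:
    """Hypothetical container that delegates iteration to an internal list."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        # Hand back the list's own iterator; Box needs no __next__ of its own.
        return iter(self._items)


box = Box(['a', 'b'])
it = iter(box)          # calls box.__iter__()
first = next(it)        # first item of the internal list
second = next(it)       # second item
remaining = list(it)    # nothing left once the list iterator is exhausted
```

Because iter() returns a fresh list iterator on every call, the same Box can be looped over any number of times.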

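Creating New Iteration Patterns with Generators

Problem:

You want to implement a custom iteration pattern that differs from the usual built-in functions such as range() and reversed().

Solution:

If you want a new iteration pattern, define it with a generator function. Here is a generator that produces floating-point numbers over a range: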

def frange(start, stop, increment):
    x = start
    while x < stop:
        yield x
        x += increment

for i in frange(0, 4, 0.5):
    print(i)

Running the code above produces:

0
0.5
1.0
1.5
2.0
2.5
3.0
3.5

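A function needs just one yield statement to become a generator. Unlike a normal function, a generator runs only in response to iteration. The following experiment shows how such a function works under the hood: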

def countdown(n):
    print("Starting to count from", n)
    while n > 0:
        # At yield the function hands back the current value and pauses;
        # the next call to next() resumes execution right after the yield
        yield n
        n -= 1
    print("Done!")

# Calling the function only creates the generator; no body code runs yet
c = countdown(3)
print(c)
# next() pulls values out one at a time; once the generator is exhausted,
# a further next() raises StopIteration
print(next(c))
print(next(c))
print(next(c))

# This call raises StopIteration
print(next(c))

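Running the code above produces (the first line is the output of print(c); the address varies from run to run):

<generator object countdown at 0x...>
Starting to count from 3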
3
2
1
Done!
Traceback (most recent call last):
  File "/Users/demon/PycharmProjects/cookbook/迭代器与生成器/迭代器.py", line 57, in <module>
    print(next(c))
StopIteration

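Summary:

The key feature of a generator function is that it runs only in response to next() operations during iteration. Once the generator function returns, iteration stops. The for statement normally used for iteration takes care of these details, so you usually don't need to worry about them.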
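As a rough illustration of what a for loop handles for you, the sketch below spells out the same steps with iter(), next(), and a StopIteration handler. The consume() helper is a hypothetical name introduced only for this example.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1


def consume(iterable):
    """Collect items roughly the way a for loop would, calling next() directly."""
    items = []
    it = iter(iterable)
    while True:
        try:
            items.append(next(it))
        except StopIteration:
            # The generator function returned, so iteration is over.
            break
    return items


manual = consume(countdown(3))
with_for = [n for n in countdown(3)]
```

Both approaches collect the same values; the for statement simply hides the exception handling.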

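Implementing the Iterator Protocol

Problem:

You are building a custom object that should support iteration, and you want an easy way to implement the iterator protocol.

Solution:

By far the easiest way to implement iteration on an object is to use a generator function. Here a Node class represents a tree structure, and you might want a generator that traverses the nodes depth-first. For example: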

class Node(object):
    def __init__(self, value):
        self._value = value
        self._children = []

    def __repr__(self):
        return "Node({!r})".format(self._value)

    def add_child(self, node):
        self._children.append(node)

    def __iter__(self):
        return iter(self._children)

    def depth_first(self):
        # Yield this node first, then everything below each child
        yield self
        for c in self:
            yield from c.depth_first()

if __name__ == "__main__":
    root = Node(0)
    child1 = Node(1)
    child2 = Node(2)
    root.add_child(child1)
    root.add_child(child2)
    child1.add_child(Node(3))
    child1.add_child(Node(4))
    child2.add_child(Node(5))

    for ch in root.depth_first():
        print(ch)

Running the code above produces:

Node(0)
Node(1)
Node(3)
Node(4)
Node(2)
Node(5)
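For contrast, here is a sketch of the same traversal written without a generator, as a class that implements __iter__() and __next__() itself. TreeNode and DepthFirstIterator are hypothetical names for this illustration, and the explicit stack is one possible design among several; all of that bookkeeping is what the generator version handles implicitly.

```python
class TreeNode:
    """Hypothetical tree node with a public children list."""

    def __init__(self, value):
        self.value = value
        self.children = []

    def add_child(self, node):
        self.children.append(node)
        return node

    def depth_first(self):
        return DepthFirstIterator(self)


class DepthFirstIterator:
    """Depth-first traversal that tracks its position in an explicit stack."""

    def __init__(self, start_node):
        self._stack = [start_node]

    def __iter__(self):
        return self

    def __next__(self):
        if not self._stack:
            raise StopIteration
        node = self._stack.pop()
        # Push children in reverse so the leftmost child is popped first.
        self._stack.extend(reversed(node.children))
        return node


root = TreeNode(0)
child1 = root.add_child(TreeNode(1))
child2 = root.add_child(TreeNode(2))
child1.add_child(TreeNode(3))
child1.add_child(TreeNode(4))
child2.add_child(TreeNode(5))

order = [node.value for node in root.depth_first()]
```

The visiting order matches the generator-based depth_first() above, but the iterator class has to maintain its own state between calls.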

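Iterating in Reverse

Problem:

You want to iterate over a sequence in reverse, and to support reversed() on a custom class.

Solution:

Use the built-in reversed() function. A custom class can support it by implementing a __reversed__() method. For example: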

a = [1, 2, 3, 4]
# reversed(a) is equivalent to a.__reversed__()
for x in reversed(a):
    print("Built-in reversed:", x)

print("-" * 30)

class Countdown(object):
    def __init__(self, start):
        self.start = start

    # Forward iterator, written as a generator
    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1

    # Reverse iterator, called by reversed()
    def __reversed__(self):
        n = 1
        # Use <= so that the final value, self.start, is included
        while n <= self.start:
            yield n
            n += 1

# reversed() uses the custom __reversed__() method
for rr in reversed(Countdown(10)):
    print("Reverse:", rr)

print('*' * 30)

# Forward iteration
for rr in Countdown(10):
    print("Forward:", rr)

Running the code above produces:

Built-in reversed: 4
Built-in reversed: 3
Built-in reversed: 2
Built-in reversed: 1
------------------------------
Reverse: 1
Reverse: 2
Reverse: 3
Reverse: 4
Reverse: 5
Reverse: 6
Reverse: 7
Reverse: 8
Reverse: 9
Reverse: 10
******************************
Forward: 10
Forward: 9
Forward: 8
Forward: 7
Forward: 6
Forward: 5
Forward: 4
Forward: 3
Forward: 2
Forward: 1

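Summary:

Defining a reverse iterator makes the code more efficient, because the data no longer has to be copied into a list first and then iterated in reverse.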
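The sketch below shows the alternative that __reversed__() avoids. A plain generator has neither a known length nor a __reversed__() method, so reversed() rejects it, and the fallback is to materialize every item in a list first. The variable names are illustrative only.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1


# reversed() needs a sequence or an object that defines __reversed__().
try:
    reversed(countdown(3))
    generator_reversible = True
except TypeError:
    generator_reversible = False

# Fallback: copy everything into a list, which costs memory proportional
# to the number of items, and only then iterate in reverse.
via_list = list(reversed(list(countdown(3))))
```

For a handful of items the copy is harmless; for a large data stream it can be the difference between constant and linear memory use.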

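Generator Functions with Extra State

Problem:

You want to define a generator function, but it involves extra state that you would like to expose to the user.

Solution:

If you want a generator to expose extra state to the user, remember that you can simply implement it as a class and put the generator code in the __iter__() method. For example: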

from collections import deque

class linehistory:
    def __init__(self, lines, histlen=3):
        self.lines = lines
        # Bounded queue that keeps only the last histlen lines (3 by default)
        self.history = deque(maxlen=histlen)

    def __iter__(self):
        # Number the lines starting from 1
        for lineno, line in enumerate(self.lines, 1):
            # Record the line number and text before yielding the line
            self.history.append((lineno, line))
            yield line

    def clear(self):
        self.history.clear()

# Assumes a text file named somefile.txt exists in the working directory
with open('somefile.txt') as f:
    # Wrap the file object
    lines = linehistory(f)
    for line in lines:
        # Check whether the current line contains 'python'
        if 'python' in line:
            # Print the recorded history, which ends with the matching line
            for lineno, hline in lines.history:
                print('{}:{}'.format(lineno, hline), end='')
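The following sketch exercises the same idea on an in-memory list, so it runs without somefile.txt. LineHistory is a renamed copy of the class above, and the sample data is made up for illustration. It also shows one consequence of this design: the object is iterable but is not itself an iterator, so next() only works after calling iter().

```python
from collections import deque


class LineHistory:
    """Renamed copy of the class above, kept here so the sketch is self-contained."""

    def __init__(self, lines, histlen=3):
        self.lines = lines
        self.history = deque(maxlen=histlen)

    def __iter__(self):
        for lineno, line in enumerate(self.lines, 1):
            self.history.append((lineno, line))
            yield line


# Made-up sample data standing in for the lines of a file.
sample = ['alpha\n', 'beta\n', 'python one\n', 'gamma\n']
lines = LineHistory(sample, histlen=2)

snapshots = []
for line in lines:
    if 'python' in line:
        # The history is ordinary instance state, readable in mid-iteration.
        snapshots.append(list(lines.history))

# The class defines __iter__() but not __next__(), so next() fails directly.
try:
    next(lines)
    next_works_directly = True
except TypeError:
    next_works_directly = False

it = iter(lines)
first = next(it)
```

With histlen=2, the snapshot taken at the matching line holds that line and the one before it.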

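Slicing an Iterator

Problem:

You want to take a slice of the data produced by an iterator, but the normal slicing operator doesn't work.

Solution:

The itertools.islice() function is designed for slicing iterators and generators. For example: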

def count(n):
    while True:
        yield n
        n += 1

c = count(0)
# The next line would raise TypeError: a generator does not support subscripting
# print(c[10:20])

# The right way
import itertools
for x in itertools.islice(c, 10, 20):
    print(x)

Running the code above produces:

10
11
12
13
14
15
16
17
18
19
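One detail worth illustrating is that islice() consumes the underlying iterator: the items it skips and the items it returns are gone afterwards, because an iterator cannot be rewound. The sketch below is a minimal demonstration with illustrative variable names.

```python
import itertools


def count(n):
    while True:
        yield n
        n += 1


c = count(0)
# Items 0-9 are read and discarded, items 10-19 are returned.
first_slice = list(itertools.islice(c, 10, 20))
# The generator has moved on, so the next value continues after the slice.
next_value = next(c)

# A stop of None means "skip the first N items, then take everything";
# the outer islice keeps this example finite.
after_skip = list(itertools.islice(itertools.islice(count(0), 3, None), 2))
```

If the data has to be visited again later, convert it to a list first.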

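Iterating Over Index-Value Pairs of a Sequence

Problem:

You want to iterate over a sequence while keeping track of the index of the element being processed.

Solution:

The built-in enumerate() function handles this nicely: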

my_list = ['a', 'b', 'c']

# Start counting from 1 instead of the default 0
for index, value in enumerate(my_list, 1):
    print(index, value)

# Useful when you want line numbers in error messages while reading a file
def parse_date(filename):
    with open(filename, 'rt') as f:
        for lineno, line in enumerate(f, 1):
            field = line.split()
            try:
                count = int(field[1])
            except ValueError as e:
                print('Line {}: Parse error: {}'.format(lineno, e))


# Pitfall: when enumerating a sequence of tuples, the inner tuple
# must be unpacked with its own parentheses
data = [(1, 2), (3, 4), (5, 6), (7, 8)]

# Correct
for index, (x, y) in enumerate(data, 1):
    print(index, (x, y))

# Wrong: each item is a 2-tuple (index, (x, y)), so unpacking into
# three names raises ValueError
# for index, x, y in enumerate(data, 1):
#     print(index, x, y)

Running the code above produces:

1 a
2 b
3 c
1 (1, 2)
2 (3, 4)
3 (5, 6)
4 (7, 8)
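Another common use of the running counter is to record where things occur. The sketch below maps each word to the line numbers it appears on; the sample text and variable names are made up for illustration.

```python
from collections import defaultdict

# Made-up sample lines standing in for the contents of a file.
text = [
    "the quick brown fox",
    "jumps over the lazy dog",
]

word_lines = defaultdict(list)
for lineno, line in enumerate(text, 1):
    for word in line.lower().split():
        # enumerate() supplies the line number; no manual counter is needed.
        word_lines[word].append(lineno)
```

Compared with incrementing a counter variable by hand, enumerate() keeps the loop body free of bookkeeping and removes the risk of forgetting the increment.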

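Iterating Over Multiple Sequences Simultaneously

Problem:

You want to iterate over several sequences at once, taking one element from each sequence on every step.

Solution:

To iterate over more than one sequence at a time, use the zip() function. For example: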

xpts = [1, 5, 4, 2, 10, 7]
ypts = [101, 78, 37, 15, 62, 99]

print('zip result:', list(zip(xpts, ypts)))

for x, y in zip(xpts, ypts):
    print(x, y)

Running the code above produces:

zip result: [(1, 101), (5, 78), (4, 37), (2, 15), (10, 62), (7, 99)]
1 101
5 78
4 37
2 15
10 62
7 99

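zip(a, b) creates an iterator that produces tuples (x, y), where x comes from a and y comes from b. Iteration stops as soon as one of the inputs is exhausted, so the result is as long as the shortest input.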

a = [1, 2, 3]
b = ['w', 'x', 'y', 'z']

# zip() stops at the shorter input, so 'z' is dropped; see below for how to keep it
for i in zip(a, b):
    print("zip:", i)

# zip_longest() keeps going until the longest input is exhausted
from itertools import zip_longest

for i in zip_longest(a, b):
    print("zip_longest:", i)

# Missing values are filled with fillvalue instead of the default None
for i in zip_longest(a, b, fillvalue=0):
    print("zip_longest with fillvalue:", i)

Running the code above produces:

zip: (1, 'w')
zip: (2, 'x')
zip: (3, 'y')
zip_longest: (1, 'w')
zip_longest: (2, 'x')
zip_longest: (3, 'y')
zip_longest: (None, 'z')
zip_longest with fillvalue: (1, 'w')
zip_longest with fillvalue: (2, 'x')
zip_longest with fillvalue: (3, 'y')
zip_longest with fillvalue: (0, 'z')
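A few further uses of zip() can be sketched briefly: pairing keys with values to build a dictionary, combining more than two inputs, and the fact that the result is a one-shot iterator. The sample data below is invented for the illustration.

```python
# Pair column names with values to build a dictionary.
headers = ['name', 'shares', 'price']
values = ['ACME', 100, 490.1]
record = dict(zip(headers, values))

# zip() accepts any number of inputs, not just two.
a = [1, 2, 3]
b = [10, 11, 12]
c = ['x', 'y', 'z']
triples = list(zip(a, b, c))

# zip() returns an iterator, so it can be consumed only once.
pairs = zip(a, b)
first_pass = list(pairs)
second_pass = list(pairs)   # empty: the iterator is already exhausted
```

When the paired data has to be reused, store the result of list(zip(...)) rather than the zip object itself.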

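Iterating Over Items in Separate Containers

Problem:

You need to perform the same operation on many objects, but the objects live in different containers, and you'd like to avoid repeated loops without losing readability.

Solution:

The itertools.chain() function simplifies this task. It takes any number of iterables as arguments and returns an iterator that hides the details of moving from one container to the next. To illustrate, consider this example: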

from itertools import chain

a = [1, 2, 3, 4]
b = ['x', 'y', 'z']
for x in chain(a, b):
    print("chain result:", x)

# Two sets of items kept in separate containers
active_items = set()
inactive_items = set()

# A single loop covers both sets
for item in chain(active_items, inactive_items):
    pass

# The chain() version above is more elegant than two separate loops like these
for item in active_items:
    pass

for item in inactive_items:
    pass

Running the code above produces:

chain result: 1
chain result: 2
chain result: 3
chain result: 4
chain result: x
chain result: y
chain result: z
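To illustrate why chain() is often preferable to concatenating the containers first, the sketch below compares it with the + operator. Concatenation builds a brand-new sequence and requires both operands to be of the same type, whereas chain() accepts any mix of iterables and copies nothing. The variable names are illustrative.

```python
from itertools import chain

a = [1, 2, 3, 4]
b = ('x', 'y', 'z')   # a tuple this time, not a list

# list + tuple is not allowed, so concatenation fails here.
try:
    combined = a + b
    concat_ok = True
except TypeError:
    concat_ok = False

# chain() only needs each argument to be iterable.
chained = list(chain(a, b))
```

For large inputs, the memory saved by not building the combined sequence can be significant.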

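Flattening a Nested Sequence

Problem:

You want to flatten a nested sequence into a single flat list of values.

Solution:

This is easily solved with a recursive generator function that uses a yield from statement. For example: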

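# Iterable lives in collections.abc; importing it from collections
# no longer works on Python 3.10 and later
from collections.abc import Iterable

# Recursive generator: whenever an element is itself iterable, recurse into it
def flatten(items, ignore_type=(str, bytes)):
    for x in items:
        # Recurse only into iterables that are not str or bytes
        if isinstance(x, Iterable) and not isinstance(x, ignore_type):
            # Pass ignore_type along so nested levels use the same setting
            yield from flatten(x, ignore_type)
        else:
            yield x

items = [1, 2, [3, 4, [5, 6], 7], 8]

for x in flatten(items):
    print(x)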

Running the code above produces:

1
2
3
4
5
6
7
8

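In the code above, isinstance(x, Iterable) checks whether an element is iterable. If it is, yield from delegates to the recursive flatten() call and yields every value it produces. The end result is a single flat sequence with no nesting. The extra ignore_type parameter and the check not isinstance(x, ignore_type) keep strings and bytes from being treated as iterables, so they are not expanded into individual characters.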
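The effect of the string guard is easiest to see on a list of names. The sketch below repeats flatten() in a form that forwards ignore_type to the recursive call, and the sample data is made up for illustration.

```python
from collections.abc import Iterable


def flatten(items, ignore_type=(str, bytes)):
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, ignore_type):
            # Forward ignore_type so nested levels apply the same rule.
            yield from flatten(x, ignore_type)
        else:
            yield x


names = ['Dave', 'Paula', ['Thomas', 'Lewis']]
flat_names = list(flatten(names))

# Tuples and lists can be mixed freely; bytes are left whole as well.
mixed = list(flatten([b'raw', (1, [2, 'ab']), 3]))
```

Without the guard, a string would be entered as an iterable of characters, and because each one-character string is again an iterable containing itself, the recursion would never bottom out.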

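Iterating in Sorted Order Over Merged Sorted Iterables

Problem:

You have several sorted sequences and want to iterate over a single sorted sequence that merges all of them.

Solution:

The heapq.merge() function does exactly this. For example: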

import heapq

a = [1, 4, 7, 10]
b = [2, 5, 6, 11]

for i in heapq.merge(a, b):
    print(i)

Running the code above produces:

1
2
4
5
6
7
10
11

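Note:

One point worth stressing is that heapq.merge() requires every input sequence to be sorted already. In particular, it does not read all of the data into a heap up front, does not pre-sort anything, and does not check that the inputs are actually sorted. It simply looks at the front item of each input, emits the smallest, and repeats until every input is exhausted.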
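Both properties in the note can be checked with a small sketch. Because merge() reads its inputs lazily, it works even on endless iterators as long as only part of the result is consumed; and because it never validates its inputs, unsorted data silently produces an unsorted result. The variable names are illustrative.

```python
import heapq
from itertools import count, islice

# Two endless sorted streams: 0, 2, 4, ... and 1, 3, 5, ...
evens = count(0, 2)
odds = count(1, 2)

# merge() pulls items only on demand, so taking the first six terminates.
first_six = list(islice(heapq.merge(evens, odds), 6))

# Unsorted inputs are accepted without complaint, but the output is
# then not sorted either.
merged_unsorted = list(heapq.merge([3, 1, 2], [0, 5, 4]))
is_sorted = merged_unsorted == sorted(merged_unsorted)
```

If the inputs may be unsorted, sort each one first, or fall back to sorted() on the combined data when it fits in memory.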