如何实现迭代器可迭代对象 (2.1)

如何实现可迭代对象和迭代器对象

什么是可迭代对象和迭代器对象

区分一下容器的概念

容器是用来储存元素的一种数据结构，它支持隶属测试，容器将所有数据保存在内存中，在Python中典型的容器有：

list， deque, …
set，frozesets，…
dict, defaultdict, OrderedDict, Counter, …
tuple, namedtuple, …
str

迭代器

for _ in x: pass
这里x是可迭代对象

迭代器本质上是一个产生值的工厂，每次向迭代器请求下一个值，迭代器都会进行计算出相应的值并返回。

那么什么是迭代器呢？任何具有__next__()方法的对象都是迭代器，对迭代器调用next()方法可以获取下一个值。而至于它使如何产生这个值的，跟它能否成为一个迭代器并没有关系。

使用方法：

# for _ in x 实际调用也是用iter， iter(x) next(iter)

iter1 = iter(x)
print(next(iter1))
print(next(iter1))
print(next(iter1))

from itertools import count
counter = count(start=13)
next(counter)
next(counter)

“用时访问”的策略

“用时访问”的策略也就是说用到的时候，才生成对象，而不是提前生成对象放在内存里

举个例子：

for i in range(100000):
    print(i**i)

这个程序先生成了长度为100000的列表，然后再慢慢进行计算，这样不仅生成列表有延迟，也会占有大量内存。解决方法就是可迭代对象

自己实现可迭代对象和迭代器对象

自己写的在https://www.tianqi.com/ 获取城市温度的一个爬虫

import requests

def getWeather(city):

    # 根据城市生成url
    url = 'https://www.tianqi.com/'
    url += city
    
    try:
        print('try reading from %s' % url)
        # 增加header 模拟浏览器访问
        headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '}
        response = requests.get(url, headers = headers, timeout = 30)
        response.raise_for_status()
        response.encoding = response.apparent_encoding
        print('finished reading from web')
        
        # 使用美丽汤找到天气数据
        from bs4 import BeautifulSoup
        soup = BeautifulSoup(response.text, features='html.parser')
        weather = soup.find(attrs={'class' : 'weatherbox'}).find('span')
        return weather.text
    except:
        pass

# 打印结果
print(getWeather('shanghai'))
print(getWeather('beijing'))

output

try reading from https://www.tianqi.com/shanghai
finished reading from web
小雨转多云6 ~ 10℃
try reading from https://www.tianqi.com/beijing
finished reading from web
多云-4 ~ 7℃

实现上面爬虫的可迭代

from collections import Iterable, Iterator
import requests

class WeatherIterator(Iterator):
    """
        一个迭代器对象， 
        返回城市的天气，
        只有在用的时候才会开始爬虫，不需要先爬虫才能迭代
    """
    
    def __init__(self, cities):
        self.cities = cities
        self.index = 0
        
    def __next__(self): # 继承Iterator后， 需要实现的方法
        if self.index >= len(self.cities):
            raise StopIteration
        else:
            temp_city = self.cities[self.index]
            self.index += 1
            return self.get_weather(temp_city)
        
    def get_weather(self, city):
            url = 'https://www.tianqi.com/'
            url += city
            try:
                headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '}
                response = requests.get(url, headers = headers, timeout = 30)
                response.raise_for_status()
                response.encoding = response.apparent_encoding
                from bs4 import BeautifulSoup
                soup = BeautifulSoup(response.text, features='html.parser')
                weather = soup.find(attrs={'class' : 'weatherbox'}).find('span')
                return ('%s: %s' % (city, weather.text))
            except:
                pass
            
class WeatherIterable(Iterable):
    """
        一个可迭代对象
    """
    def __init__(self, cities):
        self.cities = cities
    def __iter__(self):
        return WeatherIterator(self.cities)

使用效果

>>> cities = ['shanghai', 'beijing', 'nanjing']
>>> for i in WeatherIterable(cities):
>>> 	print(i)
shanghai: 多云8 ~ 15℃

beijing: 多云-4 ~ 3℃
nanjing: 大雨4 ~ 8℃
```

如何实现 迭代器 可迭代对象 (2.1)