Heibanke Crawler Challenge: Level 1

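Analysis: the start page contains a number that, appended to the base URL, gives the address of the next page. Jump to that page, read the next number from it, and repeat until a page no longer contains a number. That page is the end of the level.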
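
The analysis above boils down to a loop: fetch a page, extract the next number, build the next URL, and stop when no number is found. Below is a minimal offline sketch of that loop, assuming the wording described in the code further down ("数字" on the start page, "数字是" on later pages). The HTTP request is replaced by a dictionary of made-up page texts; `FAKE_PAGES`, `follow_chain` and the page wording are hypothetical, not taken from the real site. This lets the control flow be checked without network access.

```python
import re
from typing import Callable, List

BASE = 'http://www.heibanke.com/lesson/crawler_ex00/'

# Hypothetical page texts standing in for the real HTML (not copied from the site).
FAKE_PAGES = {
    BASE: '请在网址后面输入数字11111',
    BASE + '11111': '下一个你需要输入的数字是22222',
    BASE + '22222': '恭喜你闯关成功',
}


def follow_chain(start: str, fetch: Callable[[str], str]) -> List[str]:
    """Follow the page numbers from `start` and return every URL reached, in order."""
    visited = []
    url = start
    pattern = r'数字([0-9]{5})'  # start page: the digits follow "数字" directly
    while True:
        match = re.search(pattern, fetch(url))
        if match is None:  # no number on this page, so it is the last one
            return visited
        url = BASE + match.group(1)
        visited.append(url)
        pattern = r'数字是([0-9]{5})'  # later pages: "数字是" comes before the digits


print(follow_chain(BASE, FAKE_PAGES.__getitem__))
```

In a real run, `fetch` could be `lambda u: requests.get(u, timeout=10).text`. Because the loop stops on a missing match instead of on an exception, a network error propagates rather than being mistaken for the last page.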

Challenge URL: http://www.heibanke.com/lesson/crawler_ex00/

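In the code below, Go1 handles the start page and Go2 handles every page after it. Note the difference between the two.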

import re
import datetime
import requests

BASE_URL = 'http://www.heibanke.com/lesson/crawler_ex00/'
HEADERS = {'authorization': 'Client-ID c94869b36aa272dd62dfaeefed769d4115fb3189a9d1ec88ed457207747be626'}


def Go1(url, i):
    # Start page: the number follows "数字" directly, e.g. "数字12345"
    html = requests.get(url=url, headers=HEADERS, timeout=10)
    text = html.text
    number = re.findall(r'数字([0-9]{5})', text)
    url = url + number[0]  # url is the base URL here, so this appends the number
    print(url + '     ' + str(i))
    return url


def Go2(url, i):
    # Later pages: this is where they differ from the start page.
    # Their source has an extra "是" before the number, e.g. "数字是12345"
    html = requests.get(url=url, headers=HEADERS, timeout=10)
    text = html.text
    number = re.findall(r'数字是([0-9]{5})', text)
    url = BASE_URL + number[0]  # IndexError on the final page, which has no number
    print(url + '     ' + str(i))
    return url


def main():
    i = 1
    url = BASE_URL
    begin_time = datetime.datetime.now()
    url = Go1(url, i)
    while True:
        i = i + 1
        try:
            url = Go2(url, i)
        except IndexError:
            # No number on this page, so the previous URL was the last one
            print('Final page URL: ' + url)
            print('Elapsed time: ' + str(datetime.datetime.now() - begin_time))
            break


if __name__ == '__main__':
    main()

"""
Result:
http://www.heibanke.com/lesson/crawler_ex00/65392     1
http://www.heibanke.com/lesson/crawler_ex00/36133     2
http://www.heibanke.com/lesson/crawler_ex00/72324     3
http://www.heibanke.com/lesson/crawler_ex00/57633     4
http://www.heibanke.com/lesson/crawler_ex00/91251     5
http://www.heibanke.com/lesson/crawler_ex00/87016     6
http://www.heibanke.com/lesson/crawler_ex00/77055     7
http://www.heibanke.com/lesson/crawler_ex00/30366     8
http://www.heibanke.com/lesson/crawler_ex00/83679     9
http://www.heibanke.com/lesson/crawler_ex00/31388     10
http://www.heibanke.com/lesson/crawler_ex00/99446     11
http://www.heibanke.com/lesson/crawler_ex00/69428     12
http://www.heibanke.com/lesson/crawler_ex00/34798     13
http://www.heibanke.com/lesson/crawler_ex00/16780     14
http://www.heibanke.com/lesson/crawler_ex00/36499     15
http://www.heibanke.com/lesson/crawler_ex00/21070     16
http://www.heibanke.com/lesson/crawler_ex00/96749     17
http://www.heibanke.com/lesson/crawler_ex00/71822     18
http://www.heibanke.com/lesson/crawler_ex00/48739     19
http://www.heibanke.com/lesson/crawler_ex00/62816     20
http://www.heibanke.com/lesson/crawler_ex00/80182     21
http://www.heibanke.com/lesson/crawler_ex00/68171     22
http://www.heibanke.com/lesson/crawler_ex00/45458     23
http://www.heibanke.com/lesson/crawler_ex00/56056     24
http://www.heibanke.com/lesson/crawler_ex00/87450     25
http://www.heibanke.com/lesson/crawler_ex00/52695     26
http://www.heibanke.com/lesson/crawler_ex00/36675     27
http://www.heibanke.com/lesson/crawler_ex00/25997     28
http://www.heibanke.com/lesson/crawler_ex00/73222     29
http://www.heibanke.com/lesson/crawler_ex00/93891     30
http://www.heibanke.com/lesson/crawler_ex00/29052     31
http://www.heibanke.com/lesson/crawler_ex00/72996     32
http://www.heibanke.com/lesson/crawler_ex00/73999     33
http://www.heibanke.com/lesson/crawler_ex00/23814     34
http://www.heibanke.com/lesson/crawler_ex00/98084     35
http://www.heibanke.com/lesson/crawler_ex00/51103     36
http://www.heibanke.com/lesson/crawler_ex00/39603     37
http://www.heibanke.com/lesson/crawler_ex00/34316     38
http://www.heibanke.com/lesson/crawler_ex00/55719     39
http://www.heibanke.com/lesson/crawler_ex00/53685     40
http://www.heibanke.com/lesson/crawler_ex00/77771     41
http://www.heibanke.com/lesson/crawler_ex00/69187     42
http://www.heibanke.com/lesson/crawler_ex00/89677     43
http://www.heibanke.com/lesson/crawler_ex00/71935     44
http://www.heibanke.com/lesson/crawler_ex00/98538     45
http://www.heibanke.com/lesson/crawler_ex00/79152     46
http://www.heibanke.com/lesson/crawler_ex00/70999     47
http://www.heibanke.com/lesson/crawler_ex00/35102     48
http://www.heibanke.com/lesson/crawler_ex00/75956     49
http://www.heibanke.com/lesson/crawler_ex00/19122     50
Final page URL: http://www.heibanke.com/lesson/crawler_ex00/19122
Elapsed time: 0:01:40.219459
"""
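
The regular expressions in Go1 and Go2 differ only by the character 是. Making that character optional with `?` would let a single pattern serve both page types. The check below compares the three patterns on two made-up sample strings; `start_text` and `later_text` are hypothetical wordings that are assumed to follow the comment in Go2, not text copied from the site.

```python
import re

# Hypothetical sample wordings (assumed, not taken from the real pages)
start_text = '请在网址后面输入数字12345'
later_text = '下一个你需要输入的数字是67890'

GO1_PATTERN = r'数字([0-9]{5})'    # pattern used in Go1
GO2_PATTERN = r'数字是([0-9]{5})'  # pattern used in Go2
COMBINED = r'数字是?([0-9]{5})'    # "是?" makes the character optional

for name, pattern in [('Go1', GO1_PATTERN), ('Go2', GO2_PATTERN), ('combined', COMBINED)]:
    print(name, re.findall(pattern, start_text), re.findall(pattern, later_text))
# Go1 ['12345'] []
# Go2 [] ['67890']
# combined ['12345'] ['67890']
```

With the combined pattern, Go1 and Go2 could share one function. The only other difference between them, how the next URL is built, disappears as well, since on the start page `url` already equals the base URL.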
Original post: https://www.cnblogs.com/yinbiao/p/8145547.html