Python Web Scraping, Part 1: Scraping Jokes from Qiushibaike (糗事百科)

Goals

Scrape the jokes posted on Qiushibaike
Show one joke each time Enter is pressed
Ask for the number of pages to read; pressing 'Q' or 'q' quits
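The "one joke per Enter, quit on 'q' or 'Q'" behavior can be sketched on its own, separate from any scraping. The function name show_items and its read_input and write parameters are hypothetical; they are introduced here only so that the loop can be tried without a keyboard.

```python
def show_items(items, read_input=input, write=print):
    """Show items one at a time; stop early when the user types 'q' or 'Q'.

    Returns the number of items that were shown.
    """
    shown = 0
    for item in items:
        write(item)
        shown += 1
        # A bare Enter (empty string) or any other text continues;
        # only 'q' or 'Q' ends the loop.
        if read_input().strip().lower() == 'q':
            break
    return shown
```

Called as show_items(list_of_jokes), it reads from the keyboard through input(). Lowercasing the reply before comparing is what makes both 'Q' and 'q' work.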

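Approach

Target site: Qiushibaike
Fetch each page with requests (see the official requests tutorial)
Parse the page with the bs4 module and extract the content (see the official bs4 tutorial)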
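The parsing step can be tried offline before touching the network. The sketch below runs bs4 on a small hand-written HTML string. The markup in SAMPLE_HTML is made up for illustration and the real page may be laid out differently. It also assumes Python's built-in 'html.parser' backend rather than html5lib, so that no extra package is needed. The helper name extract_jokes is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not copied from the real site.
SAMPLE_HTML = """
<div class="content">
<span>First joke.</span>
</div>
<div class="content">
<span>Second joke.</span>
</div>
"""


def extract_jokes(html):
    """Return the text of every <div class="content">, with newlines removed."""
    soup = BeautifulSoup(html, 'html.parser')
    jokes = []
    # class_ (with a trailing underscore) is used because class is a Python keyword.
    for div in soup.find_all('div', class_='content'):
        jokes.append(div.get_text().replace('\n', ''))
    return jokes
```

With the sample above, extract_jokes(SAMPLE_HTML) returns a list with one string per joke. For a real page, the same function can be given response.text.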

Code:

import requests
from bs4 import BeautifulSoup


def get_content(pages):
    """Return a list of jokes from the first `pages` pages."""
    # Browser-like User-Agent header
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) '
                             'AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/54.0.2840.87 Safari/537.36'}
    content_list = []
    for page in range(1, pages + 1):  # number of pages to read
        url = 'http://www.qiushibaike.com/text/page/' + str(page) + '/?s=4928950'
        response = requests.get(url, headers=headers)  # fetch the page
        html = response.text
        soup = BeautifulSoup(html, 'html5lib')  # parse the page (needs the html5lib package)
        jokes = soup.find_all('div', class_='content')
        for each in jokes:
            each_joke = each.get_text()
            joke = each_joke.replace('\n', '')  # strip newlines
            content_list.append(joke)
    return content_list  # list of jokes


if __name__ == "__main__":
    answer = input("How many pages do you want to read?\n"
                   "If you want to quit, just press 'q'.\n")  # number of pages to read
    if answer.strip().lower() != 'q':  # quit before converting to int
        print()  # blank line for readability
        for paragraph in get_content(int(answer)):
            print(paragraph)
            user_input = input()  # press Enter for the next joke
            if user_input.lower() == 'q':  # 'q' or 'Q' quits
                break

Result:

[Screenshot: program output]

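References:

Python Web Scraping in Practice, Part 1: Scraping Jokes from Qiushibaike (Python爬虫实战一之爬取糗事百科段子)

http://www.jianshu.com/p/19c846daccb3

Jingmi's web scraping tutorial: https://cuiqingcai.com/990.html

Joke scraping reference: http://www.jianshu.com/p/0e7d1c80b8c3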