Crawling Baidu Baike

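import re
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'http://baike.baidu.com/'

def main():
    # Download the raw HTML of the entry page.
    response = urllib.request.urlopen('http://baike.baidu.com/view/284853.htm').read()
    # 'html.parser' is the parser that ships with Python's standard library.
    soup = BeautifulSoup(response, 'html.parser')
    # Match every tag whose href attribute contains 'view'.
    for each in soup.find_all(href=re.compile('view')):
        # urljoin resolves relative hrefs against the site root and avoids a doubled slash.
        print(each.text, '->', urljoin(BASE_URL, each['href']))

if __name__ == '__main__':
    main()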
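
The link-extraction step can also be tried offline, without a network request or Beautiful Soup. The following is a minimal sketch using only the standard library; the names `ViewLinkParser`, `extract_view_links`, and `SAMPLE_HTML` are hypothetical, and the sample markup is invented for illustration rather than taken from a real Baidu Baike page. Unlike the script above, which matches any tag carrying an `href`, this sketch only looks at `<a>` tags.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = 'http://baike.baidu.com/'  # site root used to resolve relative links


class ViewLinkParser(HTMLParser):
    """Collect (text, absolute URL) pairs for <a> tags whose href contains 'view'."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, url) pairs
        self._href = None    # href of the matching <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get('href')
        if tag == 'a' and href and re.search('view', href):
            self._href = href
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching <a> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            text = ''.join(self._text).strip()
            self.links.append((text, urljoin(BASE_URL, self._href)))
            self._href = None


def extract_view_links(html):
    """Return the (text, url) pairs found in an HTML string."""
    parser = ViewLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical sample markup, not real Baidu Baike output.
SAMPLE_HTML = '''
<a href="/view/1.htm">First entry</a>
<a href="/item/2.htm">Skipped</a>
<a href="http://baike.baidu.com/view/3.htm">Third <b>entry</b></a>
'''

for text, url in extract_view_links(SAMPLE_HTML):
    print(text, '->', url)
```

Both relative and absolute `href` values come out as full URLs because `urljoin` leaves an already absolute URL unchanged.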

Original post: https://www.cnblogs.com/themost/p/6701757.html
