Python Web Scraping: BeautifulSoup Usage in Detail

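BeautifulSoup
Chinese documentation: https://www.crummy.com/software/BeautifulSoup/bs3/documentation.zh.html; https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/#
A library for parsing web pages. It is efficient to work with and can replace regular expressions for many extraction tasks.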

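1. Basic usage

        from bs4 import BeautifulSoup
        soup = BeautifulSoup(html, 'lxml')    # html holds the page source as a string
        print(soup.prettify())    # pretty-print the parsed document
        print(soup.title.string)    # text inside the <title> tag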
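
The snippet above assumes an `html` variable already exists. A minimal self-contained sketch follows; the sample markup is made up for illustration, and it uses Python's built-in `html.parser` so that it runs without `lxml` installed (swap in `'lxml'` if you have it).

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used only for illustration.
html = """
<html><head><title>Sample Page</title></head>
<body><p class="intro" name="first">Hello, <b>world</b>!</p></body></html>
"""

# "html.parser" ships with Python; "lxml" is an optional, faster alternative.
soup = BeautifulSoup(html, "html.parser")

title_text = soup.title.string   # text inside <title>
print(soup.prettify())           # indented view of the parsed tree
print(title_text)
```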

2. Tag selectors

Selecting elements:

        from bs4 import BeautifulSoup
        soup = BeautifulSoup(html, 'lxml')
        print(soup.title)
        print(soup.head)    # the <head> tag
        print(soup.p)    # only the first <p> tag is matched
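
Attribute-style access returns only the first matching tag, and it returns `None` when no such tag exists. A short sketch of both cases, with made-up markup and the built-in `html.parser` as an assumption:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two <p> tags and no <table>.
html = "<html><head><title>T</title></head><body><p>first</p><p>second</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

first_p = soup.p              # only the first <p>
missing = soup.table          # no <table> in the document, so this is None
all_p = soup.find_all("p")    # use find_all() when every match is needed

print(first_p, missing, len(all_p))
```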

3. Getting the tag name
        print(soup.title.name)

4. Getting attributes
        print(soup.p.attrs['name'])    # equivalent to the line below
        print(soup.p['name'])
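
The two forms above read the same attribute dictionary. The sketch below, on made-up markup, also shows two details worth knowing: `class` comes back as a list because it can hold several values, and `.get()` avoids a `KeyError` for a missing attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical tag with a single-valued and a multi-valued attribute.
html = '<p name="intro" class="lead big">Hello</p>'
soup = BeautifulSoup(html, "html.parser")
p = soup.p

name_via_attrs = p.attrs["name"]   # explicit dictionary access
name_via_index = p["name"]         # shorthand for the same lookup
classes = p["class"]               # multi-valued attribute, returned as a list
missing = p.get("id")              # None instead of raising KeyError

print(name_via_attrs, name_via_index, classes, missing)
```
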
5. Getting the content
        print(soup.p.string)
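
One caveat with `.string`: it is `None` when the tag holds more than one child node. A small sketch with made-up markup, using `get_text()` as the fallback for mixed content:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <p> mixes text and a tag, the second wraps a single tag.
html = "<div><p>Hello, <b>world</b>!</p><p><b>only</b></p></div>"
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("p")

mixed_string = first.string      # None: the tag has several children
single_string = second.string    # follows the single child down to its text
full_text = first.get_text()     # concatenates all text inside the tag

print(mixed_string, single_string, full_text)
```
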
6. Nested selection
        print(soup.head.title.string)
7. Child and descendant nodes

        from bs4 import BeautifulSoup
        soup = BeautifulSoup(html, 'lxml')
        print(soup.p.contents)    # direct children, returned as a list

        print(soup.p.children)    # direct children, as an iterator
        for i, child in enumerate(soup.p.children):
            print(i, child)

        print(soup.p.descendants)    # all descendants, as a generator
        for i, child in enumerate(soup.p.descendants):    # enumerate yields (index, item) pairs
            print(i, child)
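
To make the difference between children and descendants concrete, here is a sketch on a small made-up paragraph. Text nodes count as nodes too, which is why the totals may be larger than the number of tags.

```python
from bs4 import BeautifulSoup

# Hypothetical paragraph: text, a nested <a><span> pair, then more text.
html = '<p>One <a href="#"><span>two</span></a> three</p>'
soup = BeautifulSoup(html, "html.parser")
p = soup.p

direct = list(p.children)        # text node, <a>, text node
all_nodes = list(p.descendants)  # also walks into <a> and <span>

print(len(p.contents), len(direct), len(all_nodes))
for i, node in enumerate(all_nodes):
    print(i, repr(node))
```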

8. Parent and ancestor nodes
        print(soup.a.parent)    # parent node
        print(soup.a.parents)    # ancestor nodes, as a generator
9. Sibling nodes
        print(soup.a.next_siblings)    # siblings after the tag, as a generator
        print(soup.a.previous_siblings)    # siblings before the tag, as a generator
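
Printing `.parents` or the sibling attributes directly only shows a generator object, so in practice you iterate over them. A sketch covering sections 8 and 9 on made-up markup (written without whitespace between tags so that the siblings are all tags rather than text nodes):

```python
from bs4 import BeautifulSoup

# Hypothetical markup; <b> has one sibling on each side.
html = "<div><p><a>first</a><b>second</b><i>third</i></p></div>"
soup = BeautifulSoup(html, "html.parser")
b = soup.b

parent_name = b.parent.name                          # direct parent
ancestor_names = [t.name for t in b.parents]         # walks up to the document root
next_names = [t.name for t in b.next_siblings]       # siblings after <b>
prev_names = [t.name for t in b.previous_siblings]   # siblings before <b>

print(parent_name, ancestor_names, next_names, prev_names)
```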

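10. Standard selectors

        find_all(name, attrs, text)    # returns every matching element
        find(name, attrs, text)    # returns the first matching element
        find_parents()    # returns all matching ancestors
        find_parent()    # returns the nearest matching ancestor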
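
A sketch of these methods on a made-up list. It assumes Beautiful Soup 4.4.0 or later, where the text filter is spelled `string=` (older code uses `text=`); the markup and variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration.
html = """
<div class="panel">
  <ul class="list" id="list-1">
    <li class="element">Foo</li>
    <li class="element">Bar</li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

items = soup.find_all("li")                        # filter by tag name
by_attrs = soup.find_all(attrs={"id": "list-1"})   # filter by attribute dictionary
by_class = soup.find_all("li", class_="element")   # class_ avoids the Python keyword
by_text = soup.find_all(string="Foo")              # returns matching text nodes, not tags
first = soup.find("li")                            # first match only

nearest_ul = first.find_parent("ul")               # closest matching ancestor
ancestors = [t.name for t in first.find_parents()] # every ancestor, nearest first

print(len(items), by_attrs[0]["class"], by_text, first.string, ancestors)
```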

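11. CSS selectors
Pass a CSS selector string directly to select() to make the selection.

        from bs4 import BeautifulSoup
        soup = BeautifulSoup(html, 'lxml')
        print(soup.select('.panel .panel-heading'))    # elements with class panel-heading inside .panel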
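
A runnable sketch of `select()` on made-up markup. `select()` always returns a list, while `select_one()` returns just the first match or `None`. The class names and ids here are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration.
html = """
<div class="panel">
  <div class="panel-heading"><h4>Hello</h4></div>
  <div class="panel-body">
    <ul class="list" id="list-1"><li class="element">Foo</li><li class="element">Bar</li></ul>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

headings = soup.select(".panel .panel-heading")   # descendant combinator on classes
items = soup.select("#list-1 .element")           # id selector, then class
first_item = soup.select_one("ul li")             # first match only

print(len(headings), [li.get_text() for li in items], first_item.get_text())
```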