Beautiful Soup 库基础知识

1.安装

cmd------->>pip install beautifulsoup4

2.安装测试。

import requests                        # 导入requests库
from bs4 import BeautifulSoup          # 导入美味汤库

r = requests.get("http://python123.io/ws/demo.html")
print(r.status_code)                  # 测试是否连接正常
# print(r.text)                       # 全部文本信息

demo = r.text                              # 赋值,方便后期处理

soup = BeautifulSoup(demo,"html.parser")   # 开始煲汤   demo 为解析对象。   html.parser 为解析方式

print(soup.prettify())                     # 友好显示结果

煲汤过程可以总结为:

from bs4 import BeautifulSoup                         # B and S 大写 
soup = BeautifulSoup ("<p>date</p>","html.parser")    # <p>date</p> 解析对象   "html.parser" 解析器 

3. BeautifulSoup 的基本元素。

import requests                     #
r = requests.get("http://python123.io/ws/demo.html")
demo = r.text

from bs4 import BeautifulSoup
soup = BeautifulSoup(demo,"html.parser")

# tag
print(soup.a)
print(soup.p)
print(soup.a.prettify())   # 标签内容的友好显示
print(soup.p.prettify())   # 标签内容的友好显示

# name
print(soup.a.name)
print(soup.p.name)

# string
print(soup.a.string)
print(soup.p.string)

# 属性
print(soup.a.attrs)
print(soup.p.attrs)

3.1 Tag标签

import requests                    from bs4 import BeautifulSoup
r = requests.get("http://python123.io/ws/demo.html")
demo = r.text 

soup = BeautifulSoup(demo,"html.parser")
tag = soup.a              
print(tag)      #a tag               # <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>
# 标签。最基本的信息组织单元,别用 <> 和 </> 表明开头和结尾。

3.2 标签的名字

3.3 标签的属性 (不懂有什么用)

3.4 标签的字符串

print(soup.a.string)                                              # Basic Python
print(soup.p.string)                                              # The demo python introduces several python courses.
print(type(soup.p.string))                                        # <class 'bs4.element.NavigableString'>

3.5  注释。

demo,"html.parser"
原文地址:https://www.cnblogs.com/hanbb/p/7223276.html