Python: Beautiful Soup

1. Installation

Download: https://pypi.python.org/pypi/beautifulsoup4/4.5.3

Install with pip; the package name is beautifulsoup4:

pip install beautifulsoup4
Collecting beautifulsoup4
  Downloading beautifulsoup4-4.5.3-py3-none-any.whl (85kB)
    100% |████████████████████████████████| 92kB 460kB/s
Installing collected packages: beautifulsoup4
Successfully installed beautifulsoup4-4.5.3

To check that the installation succeeded, make sure this import runs without error: from bs4 import BeautifulSoup
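The import check can also be scripted. The following is a minimal sketch, not part of the original post: the variable name installed_version is made up for illustration, and the only library detail relied on is that the bs4 package exposes a __version__ string.

```python
# Minimal installation check: try the import and report the version.
# `installed_version` is an illustrative name, not something bs4 defines.
try:
    import bs4
    installed_version = bs4.__version__
except ImportError:
    installed_version = None

if installed_version is None:
    print("beautifulsoup4 is not installed; run: pip install beautifulsoup4")
else:
    print("beautifulsoup4 version:", installed_version)
```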

2. Example:

from bs4 import BeautifulSoup
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
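# Parse the string; naming the parser explicitly avoids a "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')
# To parse a local file instead (assumes index.html exists):
# soup = BeautifulSoup(open('index.html'), 'html.parser')
print(soup.prettify())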
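Beautiful Soup accepts an open file handle as well as a string. Here is a small sketch of the file-based route, under the assumption that the page is saved locally. The original refers to an index.html that is not shown, so this sketch writes a shortened stand-in file to a temporary directory first. The html.parser choice and the variable names are illustrative.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Shortened stand-in for the sample document (illustrative only)
html = ("<html><head><title>The Dormouse's story</title></head>"
        "<body><p class=\"title\"><b>The Dormouse's story</b></p></body></html>")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.html")
    # Write the stand-in file so that the example is self-contained
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    # Pass the open file handle to BeautifulSoup; the with-block closes it afterwards
    with open(path, encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")

title_text = soup.title.string
print(title_text)
```

Using a with-block means the file is closed as soon as parsing finishes, which a bare open() call inside the constructor does not guarantee.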

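3. Introduction to Beautiful Soup

Beautiful Soup converts a complex HTML document into a tree structure in which every node is a Python object. All of these objects fall into four types: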

  • Tag
  • NavigableString
  • BeautifulSoup
  • Comment
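
Using the soup object from section 2, tags and their attributes can be accessed like this:

print(soup.title)           # first <title> tag
print(soup.head)            # first <head> tag
print(soup.a)               # first <a> tag
print(soup.p)               # first <p> tag
print(soup.name)            # name of the BeautifulSoup object itself: [document]
print(soup.head.name)       # tag name: head
print(soup.p.attrs)         # all attributes of the first <p> as a dict
print(soup.p.get('class'))  # value of a single attribute
soup.p['class'] = "newClass"  # attributes can be modified in place
print(soup.p)
print(soup.p.string)        # text inside the tag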
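To make the four object types concrete, here is a short sketch that pulls one instance of each out of a trimmed version of the sample document and prints its type. The trimmed HTML string and the variable names are illustrative; the classes themselves are imported from bs4.element.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Trimmed version of the sample document (illustrative only)
html = ('<p class="title" name="dromouse"><b>The Dormouse\'s story</b></p>'
        '<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>')
soup = BeautifulSoup(html, "html.parser")

tag = soup.p             # Tag: an HTML element
text = soup.p.string     # NavigableString: the text inside a tag
comment = soup.a.string  # Comment: a special kind of NavigableString

kinds = [type(obj).__name__ for obj in (tag, text, soup, comment)]
print(kinds)

# A Comment prints like ordinary text, so check its type before treating it as content
if isinstance(comment, Comment):
    print("comment text:", repr(str(comment)))
```

Because Comment is a subclass of NavigableString, .string on the first link returns the comment body without the `<!-- -->` markers. That is easy to mistake for real page text unless the type is checked.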
Original article: https://www.cnblogs.com/emma-zhu/p/6796802.html