Web Scraping Basics: Practice Exercises

Extract the text of the h1 tag

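import requests
from bs4 import BeautifulSoup

# Local test page served by the JetBrains IDE built-in web server (port 63342)
newsurl = 'http://localhost:63342/bd/hello.html?_ijt=5sr6o6jh2ruvb3na6c6tkbh9nl'
res = requests.get(newsurl)
res.encoding = 'utf-8'  # decode the response body as UTF-8
soup = BeautifulSoup(res.text, 'html.parser')
print(soup.h1.text)  # text of the first <h1> on the page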
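The same h1 lookup can be tried without a running server by parsing an HTML string directly. This is a minimal sketch, assuming `bs4` is installed; the sample markup and the helper name `h1_text` are hypothetical. Note that `soup.h1` is `None` when the page has no `<h1>`, so the helper checks for that before reading the text.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page standing in for the local hello.html
SAMPLE_HTML = """
<html><body>
  <h1>  Campus News  </h1>
  <p>Some paragraph</p>
</body></html>
"""


def h1_text(html):
    """Return the stripped text of the first <h1>, or None if there is none."""
    soup = BeautifulSoup(html, 'html.parser')
    heading = soup.h1  # first <h1> tag, or None if the page has none
    if heading is None:
        return None
    return heading.get_text(strip=True)  # strip surrounding whitespace


print(h1_text(SAMPLE_HTML))
print(h1_text('<p>no heading</p>'))
```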

Extract the link (href) of the first a tag

print(soup.a.attrs.get('href'))  # .get() returns None if the first <a> has no href
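`soup.a` only returns the first link on the page. To collect every link, `find_all('a', href=True)` skips `<a>` tags that have no `href`, and `urllib.parse.urljoin` turns relative paths into absolute URLs. This is a sketch only; the sample markup, the base URL, and the `all_links` helper are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup: one absolute link, one relative link, one anchor without href
SAMPLE_HTML = """
<a href="http://example.com/a.html">A</a>
<a href="/news/b.html">B</a>
<a name="top">no link</a>
"""
BASE_URL = 'http://example.com/index.html'  # hypothetical URL of the page


def all_links(html, base_url):
    """Return absolute URLs for every <a> that carries an href attribute."""
    soup = BeautifulSoup(html, 'html.parser')
    # href=True keeps only tags that actually have an href attribute
    return [urljoin(base_url, a['href']) for a in soup.find_all('a', href=True)]


print(all_links(SAMPLE_HTML, BASE_URL))
```

Using `a['href']` inside the comprehension is safe here because `href=True` has already filtered out tags without the attribute; elsewhere, `.get('href')` avoids a `KeyError`.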

Extract the full text of every li tag

for li in soup.select('li'):  # every <li> element on the page
    print(li.text)
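Printing `.text` keeps the whitespace and line breaks inside each `<li>`. When the items should be stored rather than printed, `get_text(' ', strip=True)` in a list comprehension is a compact alternative: it strips each text fragment and joins them with a space, so nested tags such as `<b>` do not glue words together. A sketch on hypothetical markup; `li_texts` is an illustrative name.

```python
from bs4 import BeautifulSoup

# Hypothetical list markup, including a nested tag and an empty item
SAMPLE_HTML = """
<ul>
  <li> First item </li>
  <li>Second <b>bold</b> item</li>
  <li>   </li>
</ul>
"""


def li_texts(html):
    """Return the cleaned text of each non-empty <li>."""
    soup = BeautifulSoup(html, 'html.parser')
    # separator ' ' keeps words from nested tags apart; strip=True trims each fragment
    texts = [li.get_text(' ', strip=True) for li in soup.select('li')]
    return [t for t in texts if t]  # drop items that contain only whitespace


print(li_texts(SAMPLE_HTML))
```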

Extract the title, link, publish time, and source of one news item

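# The indices below depend on the markup of the test page
title = soup.select('.news-list-title')[0].text               # title of the first news item
link = soup.select('li')[1].a.attrs['href']                   # link inside the second <li>
pub_time = soup.select('.news-list-info')[0].contents[0].text  # publish time
source = soup.select('.news-list-info')[0].contents[1].text    # source
print(title, link, pub_time, source)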
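Instead of hard-coding `[0]` and `[1]`, every news item can be processed in a loop. This is a minimal sketch, assuming each item is an `<li>` containing an `<a>` link, a `.news-list-title` element, and a `.news-list-info` element whose two `<span>` children hold the publish time and the source. That markup and the `parse_news` helper are hypothetical and would need adapting to the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical news-list markup modelled on the class names used above
SAMPLE_HTML = """
<ul>
  <li><a href="/news/1.html">
    <div class="news-list-title">Title one</div>
    <div class="news-list-info"><span>2018-03-29</span><span>Campus office</span></div>
  </a></li>
  <li><a href="/news/2.html">
    <div class="news-list-title">Title two</div>
    <div class="news-list-info"><span>2018-03-30</span><span>Library</span></div>
  </a></li>
  <li>Navigation entry, not a news item</li>
</ul>
"""


def parse_news(html):
    """Return a dict with title, link, time and source for each news <li>."""
    soup = BeautifulSoup(html, 'html.parser')
    items = []
    for li in soup.select('li'):
        title = li.select_one('.news-list-title')
        info = li.select_one('.news-list-info')
        if li.a is None or title is None or info is None:
            continue  # skip <li> elements that are not news items
        spans = info.find_all('span')  # assumed order: [publish time, source]
        items.append({
            'title': title.get_text(strip=True),
            'link': li.a.get('href'),
            'time': spans[0].get_text(strip=True) if len(spans) > 0 else None,
            'source': spans[1].get_text(strip=True) if len(spans) > 1 else None,
        })
    return items


for item in parse_news(SAMPLE_HTML):
    print(item)
```

Looking up each field relative to its own `<li>` keeps the title, link, time, and source of one item together, which is harder to guarantee when separate page-wide `select()` lists are indexed independently.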