Scraping rental listings from 5i5j (我爱我家) with BeautifulSoup

I had never been very comfortable with BeautifulSoup, and some friends and colleagues happened to be apartment-hunting, so I figured I would write a small crawler to pull the listing data myself. Most of it was written while following along with a book, and there isn't much to explain, so here is the code.
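The core pattern the crawler relies on is chaining BeautifulSoup's find_all over nested tags and then reading attributes off the matched nodes. A minimal sketch of that pattern, using the stdlib html.parser backend and a toy HTML snippet (the class names mirror the listing markup used below, but the snippet itself is illustrative):

```python
from bs4 import BeautifulSoup

# Toy markup shaped like one listing entry on the search page
html = """
<ul class="pList">
  <li><div class="listCon"><h3><a href="/zufang/123.html">SomeEstate</a></h3></div></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
for ul in soup.find_all("ul", {"class": "pList"}):
    for li in ul.find_all("li"):
        con = li.find("div", {"class": "listCon"})
        name = con.h3.a.string        # text of the <a> tag
        href = con.h3.a["href"]       # attribute lookup
        print(name, href)             # SomeEstate /zufang/123.html
```

The same navigation (tag.h3.a.string, tag["href"]) is what the full crawler below does against the live pages.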

# coding=utf-8
import time

import requests
import pymysql
from bs4 import BeautifulSoup

db = pymysql.connect(host="127.0.0.1", port=3306, user="root", passwd="root",
                     db="woaiwojia", charset="utf8")
cursor = db.cursor()

for num in range(1, 81):
    url = "https://sh.5i5j.com/zufang/o8r1u1n" + str(num) + "/"
    time.sleep(10)  # throttle requests so we don't hammer the site
    fanlist = BeautifulSoup(requests.get(url).text, "lxml")
    for ul in fanlist.find_all("ul", {"class": "pList"}):
        for li in ul.find_all("li"):
            for div in li.find_all("div", {"class": "listCon"}):
                xiaoqu = div.h3.a.string
                # fetch the listing's detail page for the agent contact info
                detailUrl = "https://sh.5i5j.com" + div.h3.a.attrs["href"]
                detail = BeautifulSoup(requests.get(detailUrl).text, "lxml")
                for div1 in div.find_all("div", {"class": "listX"}):
                    ps = div1.find_all("p")
                    area, community, hot = ps[0].text, ps[1].text, ps[2].text
                    price = div1.find_all("div", {"class": "jia"})[0].p.strong.string
                    for uldiv in detail.find_all("div", {"id": "housebroker"}):
                        for ul2 in uldiv.find_all("ul"):
                            lxrphone = ul2.h3.string + ul2.label.string
                            sql = ("insert into zufang(area,xiaoqu,community,hot,price,lxrphone) "
                                   "values (%s,%s,%s,%s,%s,%s)")
                            try:
                                # parameterized query: pymysql escapes the values
                                cursor.execute(sql, (area, xiaoqu, community, hot, price, lxrphone))
                                db.commit()
                            except pymysql.MySQLError as e:
                                db.rollback()
                                print("insert failed:", e)

cursor.close()
db.close()
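The INSERT assumes a zufang table already exists in the woaiwojia database; its schema is not shown in the post. A sketch of the table and the parameterized insert pattern, run here against sqlite3 so it works without a MySQL server (the column names follow the code above, the column types are my assumptions; with pymysql the placeholder is %s instead of ?):

```python
import sqlite3  # stand-in for pymysql; the cursor API has the same shape

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Assumed schema matching the columns the crawler inserts
cur.execute("""CREATE TABLE zufang (
    area TEXT, xiaoqu TEXT, community TEXT,
    hot TEXT, price TEXT, lxrphone TEXT)""")

row = ("2室1厅", "SomeEstate", "Pudong", "hot", "5000", "Agent 13800000000")
# Placeholders let the driver escape the values, so quotes in
# listing text cannot break the statement
cur.execute("INSERT INTO zufang VALUES (?, ?, ?, ?, ?, ?)", row)
conn.commit()

cur.execute("SELECT price FROM zufang")
print(cur.fetchone()[0])  # 5000
```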

If you have any questions or suggestions, feel free to leave a comment.

Original post: https://www.cnblogs.com/zhendiao/p/9333004.html