Ajax in Practice: Scraping Weibo

Reposted from: 静觅 » [Python3网络爬虫开发实战] 6.3 - Ajax结果提取 (Ajax result extraction)

Notes on a few things the code in that article does well:

from urllib.parse import urlencode

import requests

base_url = 'https://m.weibo.cn/api/container/getIndex?'

headers = {
    'Host': 'm.weibo.cn',
    'Referer': 'https://m.weibo.cn/u/2830678474',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}


def get_page(page):
    params = {
        'type': 'uid',
        'value': '2830678474',
        'containerid': '1076032830678474',
        'page': page
    }

    # Keep the URL split into a base path and a parameter dict, and let urlencode build the query string
    url = base_url + urlencode(params)
    try:
        response = requests.get(url, headers=headers)
        # Check the status code and handle only the normal case; anything else falls through and returns None
        if response.status_code == 200:
            # The body is JSON, so call response.json() directly instead of json.loads(response.content)
            return response.json()
    except requests.ConnectionError as e:
        print('Error', e.args)
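To make the urlencode step concrete, here is a minimal sketch that only builds the URL and sends no request. The helper name build_url is hypothetical; base_url and the parameter values are the ones used in get_page.

```python
from urllib.parse import urlencode

base_url = 'https://m.weibo.cn/api/container/getIndex?'


def build_url(page):
    """Build the request URL for one page (hypothetical helper, no network access)."""
    params = {
        'type': 'uid',
        'value': '2830678474',
        'containerid': '1076032830678474',
        'page': page,  # urlencode converts non-string values with str()
    }
    # urlencode joins the pairs as key=value&key=value and percent-encodes where needed
    return base_url + urlencode(params)


if __name__ == '__main__':
    print(build_url(2))
```

requests can also do this serialization itself if the dict is passed as requests.get(url, params=params); calling urlencode explicitly mainly makes the final URL easy to print and inspect.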

My own code:

import requests

headers = {
    "Referer": "https://m.weibo.cn/u/2830678474?sudaref=cuiqingcai.com&display=0&retcode=6102",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
    "X-XSRF-TOKEN": "609539"
}

base_url = "https://m.weibo.cn/api/container/getIndex?sudaref=cuiqingcai.com&display=0&retcode=6102&type=uid&value=2830678474&containerid=1076032830678474"
url = base_url
while True:
    response = requests.get(url, headers=headers)
    try:
        # Decode the JSON body once and reuse it
        data = response.json()["data"]
    except (ValueError, KeyError):
        # Stop if the body is not JSON or has no "data" field
        break
    # Print this page before checking for the next one, so the last page is not skipped
    for card in data.get("cards", []):
        # Some cards carry no post; skip those instead of catching every exception
        if "mblog" in card:
            print(card["mblog"]["text"])
    # Paging follows since_id; a missing since_id is treated as the end of the list
    since_id = data.get("cardlistInfo", {}).get("since_id")
    if since_id is None:
        break
    url = base_url + "&since_id=" + str(since_id)
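The parsing step inside the loop can be checked without touching the network by isolating it as a pure function. The following is only a sketch: parse_page and the sample payload are hypothetical, and the field names (data, cards, mblog, text, cardlistInfo, since_id) are simply the ones the code above reads, not a documented API contract.

```python
def parse_page(payload):
    """Return (texts, since_id) for one decoded getIndex response (hypothetical helper)."""
    data = payload.get("data") or {}
    # Keep only cards that actually carry a post
    texts = [card["mblog"]["text"] for card in data.get("cards", []) if "mblog" in card]
    # since_id is None when the response does not offer a next page
    since_id = (data.get("cardlistInfo") or {}).get("since_id")
    return texts, since_id


if __name__ == "__main__":
    # Made-up payload with the same shape as the fields read above
    sample = {
        "data": {
            "cardlistInfo": {"since_id": 123},
            "cards": [
                {"mblog": {"text": "first"}},
                {"card_type": 11},  # a card without a post
                {"mblog": {"text": "second"}},
            ],
        }
    }
    texts, since_id = parse_page(sample)
    print(texts, since_id)
```

With this split, the loop only has to fetch, call parse_page, print the texts, and stop when since_id is None.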

Sample output:

每当我颓废的时候,看看这个视频,我就浑身充满了斗志!为了我和我老婆的小米之家!我可以!我能行!加油! <a data-url="http://t.cn/A6hrPmIS" href="https://m.weibo.cn/p/index?containerid=2304444475185156522026&url_type=39&object_type=video&pos=1&luicode=10000011&lfid=1076032830678474" data-hide=""><span class='url-icon'><img style=' 1rem;height: 1rem' src='https://h5.sinaimg.cn/upload/2015/09/25/3/timeline_card_small_video_default.png'></span><span class="surl-text">崔庆才丨静觅的微博视频</span></a> 
<span class="url-icon"><img alt=[doge] src="//h5.sinaimg.cn/m/emoticon/icon/others/d_doge-861403219c.png" style="1em; height:1em;" /></span> 
转发微博
今天我和我老婆都是健康饮食的好仔仔。<span class="url-icon"><img alt=[馋嘴] src="//h5.sinaimg.cn/m/emoticon/icon/default/d_chanzui-01ee2388fd.png" style="1em; height:1em;" /></span> 
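As the output shows, the text field is an HTML fragment rather than plain text. One possible way to clean it up, using only the standard library, is sketched below. The names strip_html and _TextExtractor are hypothetical, and keeping the alt text of emoticon images is an assumption about what is useful here, not something the script above does.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, plus the alt text of <img> tags (emoticons)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Emoticons arrive as <img alt=[doge] ...>; keep the alt text as a stand-in
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(alt)

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    """Return the visible text of a Weibo HTML fragment (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any buffered text
    return "".join(parser.parts).strip()


if __name__ == "__main__":
    print(strip_html('<span class="url-icon"><img alt=[doge] src="//example.invalid/doge.png" /></span> '))
```

In the loop, print(strip_html(card["mblog"]["text"])) would then print the cleaned text instead of the raw markup.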
Original post: https://www.cnblogs.com/waws1314/p/12501707.html