Scraping item images and details from a Meilishuo shop with Python (web crawler)

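To fetch the pages we use requests. Most sites now load their content dynamically, so we opened the Firefox developer tools (F12) and looked for the URL behind the dynamic loading: http://www.meilishuo.com/aj/shop_list/goods?frame=1&page=0&shop_id=1001072849. Only the frame parameter changes from one request to the next, so requesting this URL with successive frame values is all we need. The request headers also carry an nt parameter whose value changes; our guess is that it varies over time and expires after a while. The remaining question is how to obtain the first nt value. It turns out to be embedded in the page returned by http://www.meilishuo.com/shop/1001072849, which solves the problem.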
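The nt lookup described above can be tried offline before hitting the site. The sketch below is only an illustration: SAMPLE_HTML is a made-up excerpt (the real page markup may differ), and extract_nt and build_ajax_headers are hypothetical helper names, not part of any library.

```python
import re

# Hypothetical excerpt of the shop page; the real markup may differ.
SAMPLE_HTML = '<script>var cfg = {"uid":"0","nt":"abc123XYZ","v":"1"};</script>'

# Same idea as the pattern used in the script: capture the value of the "nt" key.
NT_PATTERN = re.compile(r'"nt":"(.+?)"')


def extract_nt(html):
    """Return the nt token embedded in the page, or None if it is absent."""
    match = NT_PATTERN.search(html)
    return match.group(1) if match else None


def build_ajax_headers(nt, shop_id):
    """Build a minimal header dict for the AJAX request, carrying the nt token."""
    return {
        'X-Requested-With': 'XMLHttpRequest',
        'Referer': 'http://www.meilishuo.com/shop/' + shop_id,
        'nt': nt,
    }


nt = extract_nt(SAMPLE_HTML)
headers = build_ajax_headers(nt, '1001072849')
print(headers['nt'])
```

Returning None instead of raising lets the caller decide what to do when the page layout changes and the token can no longer be found.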

# -*- coding: utf-8 -*-
import json
import re

import requests


if __name__ == "__main__":
    # Reuse one session so cookies set by the shop page are sent with the AJAX requests.
    session = requests.Session()
    search_header = {
        'Host': 'www.meilishuo.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
        'Accept-Encoding': 'gzip, deflate',
        'X-Requested-With': 'XMLHttpRequest',  # marks the request as AJAX
        'Referer': 'http://www.meilishuo.com/shop/1001072849',
        'Connection': 'keep-alive',
    }
    response = session.get('http://www.meilishuo.com/shop/1001072849?frm=rate_to_shop')

    # The nt token is embedded in the static shop page.
    info = re.search(r'"nt":"(.+?)",', response.text)
    search_header['nt'] = info.group(1)  # add nt to the request headers
    # The item summary is embedded in the static page as JSON.
    info1 = re.search(r'<script>Meilishuo\.config\.poster0 = (.+?);fml\.vars\.notFluid = true;</script>', response.text)
    b = json.loads(info1.group(1))
    totalNum = b['totalNum']
    page = int(totalNum) // 20  # page count; the script assumes 20 items per frame
    for i in range(page + 1):
        a = session.get('http://www.meilishuo.com/aj/shop_list/goods?frame=' + str(i) + '&page=0&shop_id=1001072849', headers=search_header)
        print(a.headers)
        j_a = json.loads(a.text)
        print(len(j_a['tInfo']))
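The loop above requests totalNum / 20 + 1 frames, which is one request more than needed whenever totalNum is an exact multiple of 20. A small sketch of the arithmetic with ceiling division follows. The page size of 20 is taken from the division in the script and is an assumption about the site, and frame_count and frame_urls are hypothetical helper names.

```python
def frame_count(total_items, per_frame=20):
    """Number of frames needed to cover total_items (ceiling division)."""
    if total_items <= 0:
        return 0
    return (total_items + per_frame - 1) // per_frame


def frame_urls(shop_id, total_items, per_frame=20):
    """Build one AJAX URL per frame, following the URL shape used in the script."""
    base = 'http://www.meilishuo.com/aj/shop_list/goods'
    return [
        '{0}?frame={1}&page=0&shop_id={2}'.format(base, frame, shop_id)
        for frame in range(frame_count(total_items, per_frame))
    ]


urls = frame_urls('1001072849', 45)
print(len(urls))  # 45 items at 20 per frame need 3 frames
```

With 40 items this yields 2 frames, whereas 40 / 20 + 1 would issue a third request that returns nothing new.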

To be continued. The next step is to take each item's image URL and save the image to a local file.

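# Runs inside the frame loop, where j_a holds the parsed AJAX response.
for key in j_a['tInfo']:
    r = requests.get(key['goods_img'])
    # The title becomes the file name, so replace path separators that would break open().
    filename = key['goods_title'].replace('/', '_') + ".jpg"
    with open(filename, "wb") as image_file:
        image_file.write(r.content)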
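Because the file name comes from goods_title, characters such as : or ? in a title can still make open() fail on some systems. Below is a minimal sketch of a more thorough sanitizer. safe_filename is a hypothetical helper, and the set of characters it replaces is an assumption based on what Windows file names reject.

```python
import re

# Characters rejected in Windows file names, plus ASCII control characters.
_UNSAFE = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title, ext='.jpg', max_length=100):
    """Turn an item title into a file name that is safe on common file systems."""
    # Replace unsafe characters, then trim leading and trailing spaces and dots.
    cleaned = _UNSAFE.sub('_', title).strip(' .')
    if not cleaned:
        cleaned = 'untitled'
    # Truncate long titles so the name stays well under typical length limits.
    return cleaned[:max_length] + ext


print(safe_filename('Dress A/B: "summer" edition'))
```

Two items with the same title would still overwrite each other, so appending an item id to the name may be worth considering if the response provides one.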