Web Crawler Notes

import requests
url = 'https://www.sogou.com/web'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'
}

Making the request parameter dynamic

wd = input('enter a key word:')

query is the URL parameter that carries the keyword the user typed in; it is passed in the query string, e.g. https://www.sogou.com/web?query=人民币

param = {
    'query': wd
}
response = requests.get(url, headers=headers, params=param)  # UA spoofing via the custom User-Agent header
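To see what the params argument ends up doing to the URL, here is a minimal sketch that builds the same query string by hand with the standard library. It makes no network request. The keyword 人民币 is only a stand-in for whatever the user types, and the names base_url, query_string and full_url are made up for this illustration; requests does its own encoding internally, which should produce an equivalent result.

```python
from urllib.parse import urlencode

base_url = 'https://www.sogou.com/web'
param = {'query': '人民币'}  # example keyword, standing in for the user's input

# urlencode percent-encodes the UTF-8 bytes of each value
query_string = urlencode(param)
full_url = base_url + '?' + query_string

print(full_url)
```

Passing the dict through params instead of concatenating strings yourself means non-ASCII keywords and special characters get escaped for you.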

Set the encoding used to decode the response data

response.encoding = 'utf-8'
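Why this matters can be shown without touching the network. The sketch below assumes the server sends UTF-8 bytes while the client decodes them as ISO-8859-1, which is what requests falls back to for text responses that declare no charset. Whether sogou.com actually omits the charset is not something these notes verify; the variable names are made up for the illustration.

```python
raw = '人民币'.encode('utf-8')  # bytes as a server might send them

# Decoding with the wrong charset does not raise an error, it just yields garbage
wrong = raw.decode('iso-8859-1')
# Decoding with the charset the bytes were written in recovers the text
right = raw.decode('utf-8')

print(repr(wrong))
print(right)
```

Setting response.encoding before reading response.text makes requests use the second path instead of the first.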

page_text = response.text
fileName = wd+'.html'
with open(fileName, 'w', encoding='utf-8') as fp:
    fp.write(page_text)
print(fileName, 'scraped successfully!')

Original post: https://www.cnblogs.com/ciquankun/p/13329082.html