2020.2.13

爬取自己需要的数据

网址：https://www.yt1998.com/priceHistory.html

通过网址的分析能够看出这个网页的数据是通过post的方式请求的服务器，来看一下form的请求表单

这里能够看到form表单对应的数据如下所示

market对应亳州市场，编号1

代码如下：

#爬取的近三年的药材的价格的数据
import requests
import json
#基于控制台获取到输入的待翻译词语
content = input("请输入：")
#设定请求的URL
url = 'https://www.yt1998.com/price/historyPriceQ!getHistoryPrice.do'
#这里有一个反爬的措施，translate_o?这个_o删除即刻
#url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
#建立post的表单，并且将浏览器拷贝下来的表单修改成最基本的字典的格式
post_form = {
'ycnam': '板蓝根',
'guige': '统',
'chandi': '东北',
'market': '1'
}
#提交post请求
response = requests.post(url,data=post_form)
#接受到相应的结果
trans_json = response.text
#json字符串转化成python的字典格式
trans_dict = json.loads(trans_json)
#result = trans_dict['translateResult'][0][0]
#打印翻译的结果
print("药材价格")
print(trans_dict)
print()

能够爬取到的数据如下展示：