Simple data scraping with Python

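This attempt failed. Even with headers and parameters identical to what Firefox showed, it did not work: the request returns a page, but the page contains no data. I tried disabling JavaScript, after which a cookie appeared (with JavaScript enabled there is no cookie), and scraped with that cookie, still without success. Checking again some time later, the cookie's content had not changed either. It is a bit discouraging, but I am posting this anyway as a small task to come back to.

If any experts happen to pass by, I would appreciate your advice.

The scripts for the other two sites both work. I will post that code directly in a while and skip the process.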


Target site: http://www.geomag.bgs.ac.uk/data_service/models_compass/igrf_form.shtml

First, handle the conversion from a date to decimal years. This is a rough conversion that is only accurate to the day.

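def date2dy(year, month, day):
    """Convert a calendar date to a decimal year, accurate to the day."""
    months = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    oneyear = 365
    # Gregorian leap year: divisible by 4, except centuries not divisible by 400
    if year % 400 == 0 or (year % 4 == 0 and year % 100 != 0):
        months[1] = 29
        oneyear = 366

    # Days in the full months before `month` (the list is 0-indexed),
    # plus the days already elapsed in the current month
    days = sum(months[:month - 1]) + day - 1
    # Divide by the actual length of this year rather than always by 366
    return year + days / oneyear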
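
As a cross-check for the conversion above, the same value can be obtained by letting the standard library count the days. This is only a sketch of an alternative; the name decimal_year is made up for illustration and is not part of the script in this post.

```python
from datetime import date


def decimal_year(d):
    """Return the date d as a decimal year, accurate to the day."""
    start = date(d.year, 1, 1)
    next_start = date(d.year + 1, 1, 1)
    elapsed = (d - start).days          # days since 1 January of that year
    length = (next_start - start).days  # 365 or 366
    return d.year + elapsed / length


# 1 December 2016 is 335 days into a 366-day year
print(round(decimal_year(date(2016, 12, 1)), 2))  # 2016.92
```

Rounded to two decimals this gives the 2016.92 used as the date parameter in the request further down.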

The first small goal is to fetch the data for 2016-12-01.

Open the Firefox developer tools (F12) and switch to the Network tab.

Submit the form and the request appears in the list.

The useful information is the request headers, the request URL, and the parameters. Copy them into a program and try.
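
One way to check that the copied parameters really match what the browser sent is to encode them the way an HTML form post does, then compare the result with the request body and the Content-Length shown in Firefox. Below is a minimal sketch using urlencode from the standard library. The form dict is an illustrative subset with placeholder values, not the full parameter list.

```python
from urllib.parse import urlencode

# Illustrative subset of the form fields; a real check would use all of them
form = {
    'name': '-',
    'date': '2016.92',
    'place': '',
}

# application/x-www-form-urlencoded body, as a browser form submission builds it
body = urlencode(form)
print(body)
# Number of bytes in the body; this is what Content-Length should report
print(len(body.encode('ascii')))
```

If the byte count for the full parameter set differs from the Content-Length the browser reported, the two request bodies are not the same, and the difference is worth tracking down field by field.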

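I spent a bit over a day on this part and still could not fetch the data. I'm so bad at this.jpg

I will post the code for now. If any experts pass by, advice is welcome.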

#!/usr/bin/python

import os
import sys

import requests

web_url = 'http://www.geomag.bgs.ac.uk/data_service/models_compass/igrf_form.shtml'
request_url = 'http://www.geomag.bgs.ac.uk/cgi-bin/igrfsynth'
# Save the raw response next to the script; os.path.join avoids the '\d' escape in the old path
filepath = os.path.join(sys.path[0], 'data_igrf_raw_.html')
headers = {
    # Host and Content-Length are left out: requests derives them from the URL and the body
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:53.0) Gecko/20100101 Firefox/53.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
    'Accept-Encoding': 'gzip, deflate',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Referer': web_url,
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1'
}
payload = {
    'name': '-',  # your name and email address
    'coord': '1',  # '1': Geodetic '2': Geocentric
    'date': '2016.92',  # decimal years
    'alt': '150',  # Altitude
    'place': '',
    'degmin': 'y',  # Position Coordinates: 'y': In Degrees and Minutes 'n': In Decimal Degrees
    'latd': '60',  # latitude degrees (degrees negative for south)
    'latm': '0',  # latitude minutes
    'lond': '120',  # longitude degrees (degrees negative for west)
    'lonm': '0',  # longitude minutes
    'tot': 'y',  # Total Intensity(F)
    'dec': 'y',  # Declination(D)
    'inc': 'y',  # Inclination(I)
    'hor': 'y',  # Horizontal Intensity(H)
    'nor': 'y',  # North Component (X)
    'eas': 'y',  # East Component (Y)
    'ver': 'y',  # Vertical Component (Z)
    'map': '0',  # Include a Map of the Location: '0': NO '1': YES
    'sv': 'n'  # Secular Variation (rate of change): set to 'y' if needed
}
r = requests.post(request_url, data=payload, headers=headers)
# The with block closes the file even if writing fails
with open(filepath, 'w', encoding='utf-8') as fid:
    fid.write(r.text)
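
One thing the script above does not do is load the form page first, so any cookie the server sets on that page is never sent back with the POST. The sketch below shows one way to carry cookies from the form page into the POST automatically. It uses only the standard library (urllib and http.cookiejar) instead of requests, the helper names make_opener, build_post and fetch are made up for illustration, and whether this is enough to make this particular site return data is untested.

```python
import http.cookiejar
import urllib.parse
import urllib.request

FORM_URL = 'http://www.geomag.bgs.ac.uk/data_service/models_compass/igrf_form.shtml'
POST_URL = 'http://www.geomag.bgs.ac.uk/cgi-bin/igrfsynth'


def make_opener():
    """Return an opener that stores cookies and resends them, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_post(form):
    """Build the POST request for the form fields without sending it."""
    body = urllib.parse.urlencode(form).encode('ascii')
    return urllib.request.Request(
        POST_URL,
        data=body,  # a request with data is sent as POST
        headers={
            'Referer': FORM_URL,
            'Content-Type': 'application/x-www-form-urlencoded',
        },
    )


def fetch(form, timeout=30):
    """Load the form page first, then post the form with the same cookies."""
    opener, jar = make_opener()
    opener.open(FORM_URL, timeout=timeout).close()  # collects any cookies set here
    with opener.open(build_post(form), timeout=timeout) as resp:
        return resp.read().decode('utf-8', errors='replace')
```

Calling fetch(payload) with the same payload dict as above would return the response HTML as a string, which can then be written to a file as before.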
Original post: https://www.cnblogs.com/ippfcox/p/6947080.html