Analyzing POST requests and JSON

Finding the login POST URL

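Look in the form element for the URL given by its action attribute
- The POST data is a dict whose keys are the name attributes of the input tags and whose values are the real username and password; the POST URL is the URL given by action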
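
As a rough illustration of the point above, the sketch below reads the action URL and the input names out of a form using only the standard library. The HTML, the field names and the credentials are all made up for the example, and it assumes the page contains a single form.

```python
from html.parser import HTMLParser

# Hypothetical login page; a real one would come from response.content.decode().
SAMPLE_HTML = """
<form action="/login" method="post">
  <input type="text" name="email">
  <input type="password" name="password">
  <input type="hidden" name="csrf" value="abc123">
  <input type="submit" value="Log in">
</form>
"""


class FormParser(HTMLParser):
    """Collect the action URL and every named input (assumes one form)."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("name"):
            # Keep any preset value (e.g. hidden fields); others start empty.
            self.fields[attrs["name"]] = attrs.get("value") or ""


parser = FormParser()
parser.feed(SAMPLE_HTML)

post_url = parser.action
post_data = dict(parser.fields)
post_data["email"] = "user@example.com"  # placeholder credentials
post_data["password"] = "secret"

print(post_url)
print(post_data)
```

The resulting post_data is the kind of dict that would be passed as data= to requests.post, with post_url joined onto the site's base address.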
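Capture the traffic and look for the login URL
- Tick the Preserve log checkbox so the request is still listed after the page redirects
- Find the POST data and work out the parameters
  - The parameters do not change: use them as they are, for example when the password is not dynamically encrypted
  - The parameters change
    - The parameter is in the current response
    - The parameter is generated by JS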
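
For the case where a changing parameter is already in the current response, a minimal sketch is to pull it out of the page source with a regular expression before building the POST data. The page source, the field name token and the helper extract_hidden_value are assumptions for the example; the pattern also assumes the name attribute comes directly before value.

```python
import re

# Hypothetical page source; in practice this is the decoded body of the
# GET request for the login page.
page_source = '<input type="hidden" name="token" value="9f8e7d6c">'


def extract_hidden_value(html, name):
    """Return the value of the input called `name`, or None if absent.

    Assumes the attributes appear in the order name="..." value="...".
    """
    pattern = r'name="{}"\s+value="([^"]*)"'.format(re.escape(name))
    match = re.search(pattern, html)
    return match.group(1) if match else None


token = extract_hidden_value(page_source, "token")
post_data = {"username": "user", "password": "secret", "token": token}
print(post_data)
```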

Locating the JS you need

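Select the button that triggers the JS event, open Event Listeners, and find where the JS is located
Use Search all files in Chrome to search for a keyword from the URL
Set breakpoints to see what the JS does, then do the same thing in Python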
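
What "doing the same thing in Python" looks like depends entirely on the site. As one hypothetical case, suppose the breakpoint shows the JS sending md5(password + salt) instead of the plain password; the helper name js_style_md5, the salt and the field names below are invented for the example.

```python
import hashlib


def js_style_md5(text):
    """Hex MD5 digest of a UTF-8 string, as typical JS md5 helpers return."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical values read off at the breakpoint.
password = "secret"
salt = "1514736000"

post_data = {
    "username": "user",
    "password": js_style_md5(password + salt),
    "salt": salt,
}
print(post_data)
```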

Installing third-party modules (used to retry page requests)

pip install retrying
Download the source archive, unpack it, change into the unpacked directory, and run python setup.py install
To install a ***.whl file: pip install ***.whl

-----------------------------------------------------------

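1. requests.utils.dict_from_cookiejar converts a cookie object into a dict
1.1. Cookies can be passed as a dict: requests.get(url, cookies={})
2. SSL certificate verification for requests (verify=False skips it)
        response = requests.get("https://www.12306.cn/mormhweb/", verify=False)
3. Setting a timeout (in seconds)
        response = requests.get(url, timeout=1)
4. Using the status code to check that the request succeeded
        assert response.status_code == 200
The example below puts the timeout and the status-code check to work in one script, with retrying added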

# coding=utf-8
import requests
from retrying import retry

headers={"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36"}

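# Retry up to 3 times if the call raises (timeout, bad status code, ...)
@retry(stop_max_attempt_number=3)
def _parse_url(url, method, data, proxies):
    print("*" * 20)  # one line per attempt, so retries are visible
    if method == "POST":
        response = requests.post(url, data=data, headers=headers, timeout=3, proxies=proxies)
    else:
        response = requests.get(url, headers=headers, timeout=3, proxies=proxies)
    # A non-200 status raises AssertionError, which triggers a retry
    assert response.status_code == 200
    return response.content.decode()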


def parse_url(url, method="GET", data=None, proxies=None):
    # Return the decoded page, or None once every attempt has failed
    try:
        html_str = _parse_url(url, method, data, proxies)
    except Exception:
        html_str = None

    return html_str

if __name__ == '__main__':
    url = "http://www.baidu.com"  # requests needs the scheme in the URL
    print(parse_url(url))
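
To make the retry behaviour above easier to picture, here is a minimal standard-library sketch of the same idea. It is not the retrying library's implementation; simple_retry and flaky are names invented for the example, and flaky stands in for a request that fails twice before succeeding.

```python
import functools


def simple_retry(max_attempts):
    """Call the wrapped function up to max_attempts times; re-raise the last error."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise
        return wrapper
    return decorator


calls = []


@simple_retry(max_attempts=3)
def flaky():
    """Stand-in for a request: fails on the first two calls."""
    calls.append(1)
    assert len(calls) >= 3
    return "ok"


result = flaky()
print(result, len(calls))
```

Because a failed assert raises an exception like any other error, the status-code check inside the decorated function is what lets a bad response count as a failed attempt.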

--------------------

In [1]: import requests

In [2]: response = requests.get('http://www.baidu.com')

In [3]: response.cookies
Out[3]: <RequestsCookieJar[Cookie(version=0, name='BDORZ', value='27315', port=None, port_specified=False, domain='.baidu.com', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=1544757805, discard=False, comment=None, comment_url=None, rest={}, rfc2109=False)]>

In [4]: requests.utils.dict_from_cookiejar(response.cookies)
Out[4]: {'BDORZ': '27315'}

In [5]: requests.utils.cookiejar_from_dict({'BDORZ': '27313'})
Out[5]: <RequestsCookieJar[Cookie(version=0, name='BDORZ', value='27313', port=None, port_specified=False, domain='', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False)]>

In [6]: requests.utils.quote('http://tieba.baidu.com/f?kw=李颜')
Out[6]: 'http%3A//tieba.baidu.com/f%3Fkw%3D%E6%9D%8E%E9%A2%9C'

In [7]: requests.utils.unquote('http%3A//tieba.baidu.com/f%3Fkw%3D%E6%9D%8E%E9%A2%9C')
Out[7]: 'http://tieba.baidu.com/f?kw=李颜'
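
The session above quotes the whole URL, which also escapes the ? and = characters. In practice it is usually only the query value that needs encoding. A small sketch using the standard library's urllib.parse, which, as far as I know, is where requests.utils gets quote and unquote from:

```python
from urllib.parse import quote, unquote, urlencode

keyword = "李颜"

# Percent-encode only the query value, leaving the URL structure readable.
encoded = quote(keyword)
url = "http://tieba.baidu.com/f?" + urlencode({"kw": keyword})

print(encoded)
print(url)
print(unquote(encoded) == keyword)
```

Passing params={"kw": keyword} to requests.get should achieve the same encoding without building the string by hand.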

Points to note when using JSON

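Strings in JSON are always wrapped in double quotes
- If they are not double-quoted
  - eval: converts simple strings into Python types (only for input you trust)
  - replace: swap the single quotes for double quotes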
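
A short sketch of both workarounds on a made-up string. It uses ast.literal_eval in place of eval, a deliberate substitution: literal_eval only accepts Python literals, so it cannot run arbitrary code. The replace approach is only safe when the values themselves contain no quote characters.

```python
import ast
import json

raw = "{'name': 'tieba', 'pages': 3}"  # single quotes: not valid JSON

# json.loads rejects single-quoted strings.
try:
    json.loads(raw)
    parsed_directly = True
except json.JSONDecodeError:
    parsed_directly = False

# Option 1: treat the text as a Python literal (safer stand-in for eval).
from_literal = ast.literal_eval(raw)

# Option 2: swap the quotes, then parse as JSON.
from_replace = json.loads(raw.replace("'", '"'))

print(parsed_directly, from_literal, from_replace)
```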
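Writing several JSON strings into one file means the file is no longer a single JSON string, so it cannot be read back directly
- Write one JSON string per line and read the file line by line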
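
The one-JSON-string-per-line approach can be sketched as follows. The records and the file name items.txt are placeholders, and the file goes into a temporary directory so the example leaves nothing behind in the working directory.

```python
import json
import os
import tempfile

records = [{"id": 1, "title": "first"}, {"id": 2, "title": "第二"}]

path = os.path.join(tempfile.mkdtemp(), "items.txt")

# Write: one JSON string per line.
with open(path, "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Read: parse each non-empty line on its own.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(loaded == records)
```

json.dumps never emits a raw newline inside a string value, so each record is guaranteed to stay on a single line.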
