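Common Modules (Part 2)

I. Serialization

1. What are serialization and deserialization

Serialization converts a data type held in memory into content in a specific format; deserialization is the reverse.

In-memory data type ---> serialization ---> specific format (json or pickle)

Specific format (json or pickle) ---> deserialization ---> in-memory data type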

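2. Why serialize

The serialized result, i.e. content in a specific format, has two uses:

1. Storage ==> saving state to disk

2. Transfer to other platforms ==> cross-platform data exchange

Note:

The format for use 1:

may be a language-specific format ==> pickle, which only Python can read

The format for use 2:

should be a general-purpose format that every language can read ==> json

3. How to serialize and deserialize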
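As a first look at both formats, the sketch below round-trips the same data through json and pickle. The `record` dictionary and the variable names are made up for illustration.

```python
import json
import pickle

# Hypothetical sample data, used only for this illustration
record = {"name": "egon", "age": 18, "tags": ["a", "b"]}

# json: a text format that other languages can read
json_text = json.dumps(record)
restored_from_json = json.loads(json_text)

# pickle: a binary format that only Python understands
pickle_bytes = pickle.dumps(record)
restored_from_pickle = pickle.loads(pickle_bytes)

print(type(json_text), type(pickle_bytes))  # <class 'str'> <class 'bytes'>
print(restored_from_json == record, restored_from_pickle == record)  # True True
```

Both round trips give back an equal dictionary; the difference lies in the intermediate format (str versus bytes) and in who else can read it.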

II. json

Serializing with json always produces a str in JSON format.

1 Serialization

Serialization: Python data type ---> json serialization ---> string ---> written to a json file

import json
res = json.dumps(True)
print(res, type(res))  # true <class 'str'>

res = json.dumps([1, 'aaa', True, False])
print(res, type(res))  # [1, "aaa", true, false] <class 'str'>
File operations:

import json

# json.dumps returns a str that you write to the file yourself:
#   res = json.dumps('aaa'); f = open(...); f.write(res)
# json.dump serializes and writes in one step:
with open('a.txt', 'wt', encoding='utf-8') as f:
    json.dump('haha', f)  # calls f.write() internally

# The long way to write the serialized result to a file
json_res = json.dumps([1, 'aaa', True, False])
# print(json_res, type(json_res))  # [1, "aaa", true, false] <class 'str'>
with open('test.json', mode='wt', encoding='utf-8') as f:
    f.write(json_res)

# The short way to write the serialized result to a file
with open('test.json', mode='wt', encoding='utf-8') as f:
    json.dump([1, 'aaa', True, False], f)

2 Deserialization

Deserialization: json file ---> string ---> json deserialization ---> data type of Python or another language

import json

# json.loads takes a string that you read from the file yourself:
#   f = open(...); text = f.read(); json.loads(text)

Example:

import json

# Serialize
with open(r'a.json', 'wt', encoding='utf-8') as f:
    json.dump('haha', f)

# Deserialize
with open(r'a.json', 'rt', encoding='utf-8') as f:
    res = json.load(f)
    print(res, type(res))  # haha <class 'str'>

File operations

# The long way to read a JSON string from a file and deserialize it
with open('test.json', mode='rt', encoding='utf-8') as f:
    json_res = f.read()
    l = json.loads(json_res)
    print(l, type(l))

# The short way to read a JSON string from a file and deserialize it
with open('test.json', mode='rt', encoding='utf-8') as f:
    l = json.load(f)
    print(l, type(l))

JSON covers the data types common to all languages, so it cannot serialize Python-specific types such as set.

json.dumps({1, 2, 3, 4, 5})  # raises TypeError
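One common workaround, not covered in these notes, is to pass a `default` function to `json.dumps` that converts unsupported objects into something JSON can hold. The helper name `encode_set` below is hypothetical; this is only a sketch of the idea.

```python
import json

def encode_set(obj):
    """Hypothetical helper: turn a set into a sorted list so json can write it."""
    if isinstance(obj, set):
        return sorted(obj)
    # Anything else is still unsupported, so report it the way json would
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

text = json.dumps({"ids": {3, 1, 2}}, default=encode_set)
print(text)  # {"ids": [1, 2, 3]}

# JSON has no set type, so the conversion back must be done by hand
restored = json.loads(text)
restored["ids"] = set(restored["ids"])
print(restored)  # {'ids': {1, 2, 3}}
```

The set survives only as a list inside the JSON text, so the reader has to know which fields to turn back into sets.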

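Important: strings inside JSON data use double quotes; single quotes are not allowed.

Be clear about the JSON format and do not confuse it with Python syntax.

l = json.loads('[1, "aaa", true, false]')
print(l)  # [1, 'aaa', True, False]

l = json.loads("[1,1.3,true,'aaa', true, false]")  # raises json.JSONDecodeError: strings inside JSON must use double quotes
print(l)  # never reached

When a Python tuple is converted to JSON it is written as an array, i.e. tuple ---> list.

tuple1 = ('1', 2, 3, 'aa')
res = json.dumps(tuple1)
print(res, type(res))  # ["1", 2, 3, "aa"] <class 'str'>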

For reference:

From Python 3.6 onward, json.loads also accepts bytes input; Python 3.5 and earlier do not support this.

l = json.loads(b'[1, "aaa", true, false]')  # b'...' holding JSON data
print(l, type(l))  # [1, 'aaa', True, False] <class 'list'>

with open('test.json', mode='rb') as f:
    l = json.load(f)

res = json.dumps({'name': '哈哈哈'}, ensure_ascii=False)  # ensure_ascii defaults to True, in which case the Chinese characters are serialized as "\u54c8\u54c8\u54c8"
print(res, type(res))  # {"name": "哈哈哈"} <class 'str'>

res = json.loads('{"name": "\u54c8\u54c8\u54c8"}')
print(res, type(res))  # {'name': '哈哈哈'} <class 'dict'>

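Monkey patching: a way of patching a program at run time by replacing attributes of a module or class.

Apply the monkey patch at the program's entry point.

import json
import ujson  # third-party package, must be installed separately

def monkey_patch_json():
    json.__name__ = 'ujson'
    json.dumps = ujson.dumps
    json.loads = ujson.loads

monkey_patch_json()  # run once in the entry file

# Later code keeps calling the json API unchanged
# json.dumps()
# json.dumps()
# json.dumps()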

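III. pickle

pickle is a serialization module that ships with Python; its serialized result is bytes.

Advantages: it supports almost every Python data type.

bytes data can be stored directly, and pickle is fast to store and load.

Disadvantage:

It only works in Python, so it cannot be used to exchange data with other languages.

Serialization and deserialization

pickle serializes Python data to bytes.

import pickle
res = pickle.dumps({1, 2, 3, 4, 5})
print(res, type(res))
# With pickle protocol 3 the output is (the exact bytes depend on the protocol):
# b'\x80\x03cbuiltins\nset\nq\x00]q\x01(K\x01K\x02K\x03K\x04K\x05e\x85q\x02Rq\x03.' <class 'bytes'>

s = pickle.loads(res)
print(s, type(s))  # {1, 2, 3, 4, 5} <class 'set'>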
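To make the contrast with json concrete, the sketch below pickles a few values that JSON cannot represent directly. The sample `data` dictionary is made up for illustration.

```python
import datetime
import pickle

# Hypothetical sample data with types that JSON has no equivalent for
data = {
    "ids": {1, 2, 3},                   # set
    "point": (1.5, 2.5),                # tuple (json would turn it into a list)
    "raw": b"\x00\x01",                 # bytes
    "when": datetime.date(2020, 1, 1),  # date object
}

blob = pickle.dumps(data)
restored = pickle.loads(blob)

print(type(blob))               # <class 'bytes'>
print(restored == data)         # True
print(type(restored["point"]))  # <class 'tuple'>
```

Unlike the tuple-to-list conversion that json performs, the tuple comes back as a tuple.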

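pickle compatibility between Python 2 and Python 3

# coding:utf-8
import pickle

with open('a.pkl', mode='wb') as f:
    # Part 1: making a dump performed in Python 3 readable by Python 2
    # Python 2 does not support protocol > 2, while Python 3 defaults to a
    # higher protocol (4 since Python 3.8),
    # so a dump done in Python 3 should specify protocol=2
    pickle.dump('你好啊', f, protocol=2)

with open('a.pkl', mode='rb') as f:
    # Part 2: only then can Python 2 deserialize the file
    res = pickle.load(f)
    print(res)

pickle shares the problem of every language-specific serialization format: it only works in Python, and different Python versions may be incompatible with each other. For that reason, use pickle only for data that is not important.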

IV. configparser

configparser: the configuration file module

'''aa.ini'''

# comment 1
; comment 2

[section1]
k1 = v1
k2:v2
user=egon
age=18
is_admin=true
salary=31

[section2]
k1 = v1

# coding:utf-8
'''Configuration file'''
import configparser

# Create the parser and read the configuration file
config = configparser.ConfigParser()
res = config.read('aa.ini')  # 'aa.ini': path of the file
print(res, type(res))  # ['aa.ini'] <class 'list'>

# List all section titles
res = config.sections()
print(res)  # ['section1', 'section2']

# List the keys of every key=value pair under section1
options = config.options('section1')
print(options)  # ['k1', 'k2', 'user', 'age', 'is_admin', 'salary']

# List the (key, value) pairs under section1
item_list = config.items('section1')  # returns a list of tuples
print(item_list)  # [('k1', 'v1'), ('k2', 'v2'), ('user', 'egon'), ('age', '18'), ('is_admin', 'true'), ('salary', '31')]

# Get the value of user under section1 => str
val = config.get('section1', 'user')
print(val)  # egon

# Get the value of age under section1 => int
val1 = config.getint('section1', 'age')
print(val1, type(val1))  # 18 <class 'int'>

# Get the value of is_admin under section1 => bool
val2 = config.getboolean('section1', 'is_admin')
print(val2)  # True

# Get the value of salary under section1 => float
val3 = config.getfloat('section1', 'salary')
print(val3)  # 31.0

Modifying

import configparser

config = configparser.ConfigParser()
config.read('a.cfg', encoding='utf-8')

# Remove the whole section2
config.remove_section('section2')

# Remove options k1 and k2 from section1
config.remove_option('section1', 'k1')
config.remove_option('section1', 'k2')

# Check whether a section exists
print(config.has_section('section1'))

# Check whether section1 has the option user
print(config.has_option('section1', 'user'))

# Add a section
config.add_section('egon')

# Under section egon, add name=egon and age=18
config.set('egon', 'name', 'egon')
config.set('egon', 'age', '18')  # values must be strings; passing the int 18 raises TypeError

# Finally write the changes back to the file to complete the modification
with open('a.cfg', 'w', encoding='utf-8') as f:
    config.write(f)
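The reading and modifying calls can also be tried without touching the disk. The sketch below is an illustration under stated assumptions: the `INI_TEXT` content, the `timeout` option and the `database` section are invented, and `read_string`, the `fallback` keyword and an in-memory `io.StringIO` target stand in for the file-based calls used in these notes.

```python
import configparser
import io

# Hypothetical configuration text, used instead of a file on disk
INI_TEXT = """
[section1]
user = egon
age = 18
is_admin = true
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

age = config.getint("section1", "age")
is_admin = config.getboolean("section1", "is_admin")
# 'timeout' is not defined, so the fallback value is returned instead of an error
timeout = config.getint("section1", "timeout", fallback=30)
has_db = config.has_section("database")

# Modify a value (it must be a string) and write the result to memory
config.set("section1", "age", "19")
buffer = io.StringIO()
config.write(buffer)

print(age, is_admin, timeout, has_db)  # 18 True 30 False
print(buffer.getvalue())
```

Supplying `fallback` avoids a separate `has_option` check when an option is allowed to be missing.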

Worked example:

The project's configuration file is parsed with configparser.

import configparser

# Generate the mysql configuration
'''
conf_obj = configparser.ConfigParser()

# conf_obj['section title'] = {dictionary of options}
conf_obj['MYSQL'] = {'HOST': '127.0.0.1',
                     'PORT': '3306',
                     'USER': 'tank',
                     'PASSWORD': '123456',
                     }

with open('mysql.ini', 'w') as f:
    conf_obj.write(f)
'''

# Resulting file:
'''
[MYSQL]
host = 127.0.0.1
port = 3306
user = tank
password = 123456
'''


# Check the mysql configuration
conf_obj = configparser.ConfigParser()
# Read the mysql.ini configuration file
conf_obj.read('mysql.ini')

title = conf_obj.sections()

if 'MYSQL' in title:

    ini_user = conf_obj['MYSQL']['USER']
    ini_pwd = conf_obj['MYSQL']['PASSWORD']

    if ini_user == 'tank' and ini_pwd == '123456':
        print('mysql connection succeeded!')

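V. hashlib

1. What is a hash

A hash is a family of algorithms that take input content and compute a hash value from it.

Properties of a hash value:

1. The same input always produces the same hash value

2. The content cannot be recovered from the hash value

3. With a given hash algorithm, the hash value has a fixed length no matter how large the input is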
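Properties 1 and 3 are easy to observe directly. The following is a minimal sketch with made-up inputs, using md5 as in the rest of these notes.

```python
import hashlib

small = hashlib.md5(b"a").hexdigest()              # 1 byte of input
large = hashlib.md5(b"a" * 1_000_000).hexdigest()  # 1,000,000 bytes of input
again = hashlib.md5(b"a").hexdigest()              # the first input once more

print(small == again)          # True: same input, same hash value
print(small == large)          # False: different input
print(len(small), len(large))  # 32 32: md5 hex digests always have 32 characters
```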

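2. Uses

Use 1: properties 1 and 2: transmitting and verifying passwords as hash values instead of plaintext

Use 2: properties 1 and 3: verifying file integrity

3. How to use it

hashlib is a hashing module (often loosely called an encryption module) with many built-in algorithms.

1 Using hashlib

# The input must be bytes

import hashlib
m = hashlib.md5()
m.update('hello'.encode('utf-8'))  # update() only accepts bytes, so the str must be encoded first
res = m.hexdigest()
print(res)  # 5d41402abc4b2a76b9719d911017c592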

Feeding the data to the hash object in several pieces or all at once gives the same final result. This confirms property 1: the same input always produces the same hash value.

import hashlib
m = hashlib.md5('hello'.encode('utf-8'))
m.update('world'.encode('utf-8'))
res = m.hexdigest()  # hash of 'helloworld'
print(res)

import hashlib
m = hashlib.md5()
m.update('hello'.encode('utf-8'))
m.update('world'.encode('utf-8'))
res = m.hexdigest()  # hash of 'helloworld'
print(res)

# Output
'''
fc5e038d38a57032085441e7fe7010b0
fc5e038d38a57032085441e7fe7010b0
'''

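2 Simulating a dictionary attack

import hashlib
user_pwd = 'beffc46e29d93a4b0ad3765c242d6fc8'
# Simulate a dictionary attack
# Build a list of candidate passwords
password = [
    'h123an',
    '123han',
    'ha123n',
    '12han3',
    'han123',
    'ha12n3',
]

dic = {}
for str_num in password:
    m = hashlib.md5(str_num.encode('utf-8'))
    dic[str_num] = m.hexdigest()  # build the dictionary of password -> hash

# Look up the captured hash to recover the password
for k, v in dic.items():
    if v == user_pwd:
        print('Attack succeeded, the plaintext password is %s' % k)  # Attack succeeded, the plaintext password is han123
        break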

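3 Salting

Salting raises the cost of a dictionary attack and so makes the hashed data safer.

# Raise the cost of a dictionary attack ===> add salt
import hashlib

m = hashlib.md5()
m.update('han123'.encode('utf-8'))
# Add the salt
str1 = '天王盖地虎'
m.update(str1.encode('utf-8'))
res = m.hexdigest()  # hash of 'han123天王盖地虎' (password followed by salt)
print(res)  # 5c62dc89fd9440ab7a2b3ecee7e904ad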

Worked example:

import hashlib

def pwd_md5(pwd):
    md5_obj = hashlib.md5()
    md5_obj.update(pwd.encode('utf-8'))  # update() must be given bytes
    # Create the salt
    sa1 = '哈哈'
    # Add the salt
    md5_obj.update(sa1.encode('utf-8'))  # update() must be given bytes
    # Get the resulting hex digest string
    res1 = md5_obj.hexdigest()

    print(res1)
    return res1

pwd = input('Enter the user password: ').strip()
user_str2 = f'tank:{pwd_md5(pwd)}'
with open('user.txt', 'w', encoding='utf-8') as f:
    f.write(user_str2)
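The example above uses one fixed salt and a single md5 pass. A commonly used alternative, which is not what these notes implement, is a random salt per user combined with a deliberately slow key-derivation function. The sketch below uses the standard library's `hashlib.pbkdf2_hmac`; the function names `hash_password` and `verify_password` and the iteration count are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=100_000):
    """Hypothetical helper: derive a digest from a password and a salt."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for every new password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest

def verify_password(password, salt, expected, iterations=100_000):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)

salt, digest = hash_password("han123")
print(verify_password("han123", salt, digest))  # True
print(verify_password("123han", salt, digest))  # False
```

The salt has to be stored next to the digest, for example in the same line of `user.txt`, because verification needs it again.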

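Speeding up the check

You can sample data at several positions (for example at 1/4, 2/4, 3/4 and 4/4 of the data's size), concatenate the samples and hash them. This makes the integrity check faster and is typically used to check whether videos, text files or application packages are complete and untampered. Note that sampling only detects changes that touch the sampled bytes.

Worked example

File integrity check

# Steps:
'''
1. Open the unmodified file 'a.txt', compute its md5 value and save it
2. Later, open 'a.txt' again, compute its md5 value and compare it with the saved md5 value
3. If the two values match, the file is intact
'''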
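The three steps can be sketched as follows. This is a minimal illustration: the helper name `file_md5`, the chunk size and the temporary file are assumptions, and the file is read in chunks so that it never has to fit in memory at once.

```python
import hashlib
import os
import tempfile

def file_md5(path, chunk_size=8192):
    """Hypothetical helper: md5 of a whole file, read chunk by chunk."""
    md5_obj = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5_obj.update(chunk)
    return md5_obj.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")
    with open(path, "wb") as f:
        f.write(b"haha\n" * 3)

    saved = file_md5(path)              # step 1: hash the original and save it
    unchanged = file_md5(path) == saved  # steps 2-3: hash again and compare

    with open(path, "ab") as f:         # simulate a modification
        f.write(b"tampered")
    changed = file_md5(path) != saved

print(unchanged, changed)  # True True
```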

Integrity check for large files

'''
c.txt
haha
haha
haha
'''

'''
c1.txt
haha
haha
haha
'''

'''Script to run'''
import hashlib
import os

def get_file_md5(file_path):
    '''
    :param file_path: path of the file
    :return: md5 hex digest of the sampled bytes
    '''
    # 1. Get the file size (an int) with os.path.getsize
    file_size = os.path.getsize(file_path)
    # 2. Pick four sample positions in the file
    # 2.1) start of the file
    offset1 = 0
    # 2.2) one third of the way through
    offset2 = file_size // 3
    # 2.3) two thirds of the way through
    offset3 = (file_size // 3) * 2
    # 2.4) the last 10 bytes (clamped so small files do not give a negative offset)
    offset4 = max(file_size - 10, 0)

    # get_data_list: the four offsets; 10 bytes are read at each of them
    get_data_list = [offset1, offset2, offset3, offset4]
    # hash object
    md5_obj = hashlib.md5()
    with open(file_path, 'rb') as f:
        # loop over the four positions
        for offset in get_data_list:
            # move the file cursor to the position
            f.seek(offset)
            # read 10 bytes
            read_data = f.read(10)
            # feed the sampled bytes into the md5 object
            md5_obj.update(read_data)

    return md5_obj.hexdigest()

# hash of the original file
file1_md5 = get_file_md5("/Users/tophan/2020_python/day22/c.txt")
# hash of the modified file
file2_md5 = get_file_md5("/Users/tophan/2020_python/day22/c1.txt")

# compare the hash of the modified file with the hash of the original file
print(file1_md5 == file2_md5)  # True

4 The hmac module (for reference)

Python's hmac module takes a key that we supply and combines it with the content in a further processing step before hashing:

# For the final hmac result to be the same, two things must hold:
# 1: the initial key passed to hmac.new is the same
# 2: however many times update is called, the accumulated content is the same

# Variant 1
import hmac
h1 = hmac.new('hello'.encode('utf-8'), digestmod='md5')
h1.update('world'.encode('utf-8'))

print(h1.hexdigest())  # 0e2564b7e100f034341ea477c23f283b

# Variant 2
import hmac
h2 = hmac.new('hello'.encode('utf-8'), digestmod='md5')
h2.update('w'.encode('utf-8'))
h2.update('orld'.encode('utf-8'))

print(h2.hexdigest())  # 0e2564b7e100f034341ea477c23f283b
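A typical use of hmac is to sign a message with a shared key and verify it later. The sketch below is illustrative only: the helper `sign`, the sample messages and the choice of sha256 instead of md5 are assumptions, and `hmac.compare_digest` is used so that the comparison takes constant time.

```python
import hashlib
import hmac

key = b"hello"  # shared secret, same value as in the example above

def sign(message: bytes) -> str:
    """Hypothetical helper: hex HMAC-SHA256 tag of a message under `key`."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

tag = sign(b"world")

ok = hmac.compare_digest(tag, sign(b"world"))        # same key, same content
tampered = hmac.compare_digest(tag, sign(b"w0rld"))  # content changed
other_key_tag = hmac.new(b"HELLO", b"world", hashlib.sha256).hexdigest()

print(ok)                    # True
print(tampered)              # False
print(tag == other_key_tag)  # False: a different key gives a different tag
```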

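VI. subprocess

subprocess is a module for child processes; it lets Python code send commands to the operating system's shell.

# Running system commands
'''
Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
Calling Popen sends the command to the local operating system's shell
and returns an object that holds the normal output and the error output.
'''
import subprocess

obj = subprocess.Popen('echo 123 ; ls / ; ls /root', shell=True,
                       stdout=subprocess.PIPE,  # normal output goes into this pipe
                       stderr=subprocess.PIPE,  # error output goes into this pipe
                       )

# print(obj)  # <subprocess.Popen object at 0x10f7fad68>

# Read what the stdout pipe holds
str_true = obj.stdout.read()  # the data read is bytes
print(str_true.decode('utf-8'))
# Output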
'''
123
Applications
Library
Network
System
Users
Volumes
bin
cores
dev
etc
home
installer.failurerequests
net
private
sbin
tmp
usr
var
'''

# Read what the stderr pipe holds
str_error = obj.stderr.read()
res = str_error.decode('utf-8')
print(res)  # ls: /root: No such file or directory
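For simple cases, `subprocess.run` (with `capture_output` and `text`, available from Python 3.7) wraps the same steps and also reports the exit code. The sketch below is an illustration rather than a replacement for the Popen example: to stay portable it runs the current Python interpreter as the child process, and the child's one-line script is made up.

```python
import subprocess
import sys

# Hypothetical child command: print to stdout, write to stderr, exit with code 3
child_code = "import sys; print('123'); sys.stderr.write('oops'); sys.exit(3)"

result = subprocess.run(
    [sys.executable, "-c", child_code],  # argument list, so no shell is needed
    capture_output=True,                 # collect stdout and stderr
    text=True,                           # decode bytes to str
)

print(result.stdout.strip())  # 123
print(result.stderr)          # oops
print(result.returncode)      # 3
```

Passing the command as a list avoids `shell=True`, which matters when any part of the command comes from user input.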

For reference:

# coding:utf-8
import subprocess

'''
sh-3.2# ls /Users/egon/Desktop |grep txt$
mysql.txt
tt.txt
事物.txt
'''

res1 = subprocess.Popen('ls /Users/jieli/Desktop', shell=True, stdout=subprocess.PIPE)
res = subprocess.Popen('grep txt$', shell=True, stdin=res1.stdout,
                       stdout=subprocess.PIPE)

print(res.stdout.read().decode('utf-8'))


# Equivalent to the above, but the version above has the advantage that one data stream can feed another: for example, the output of a web crawler could be handed to grep
res1 = subprocess.Popen('ls /Users/jieli/Desktop |grep txt$', shell=True, stdout=subprocess.PIPE)
print(res1.stdout.read().decode('utf-8'))


# On Windows:
# dir | findstr 'test*'
# dir | findstr 'txt$'
import subprocess
res1 = subprocess.Popen(r'dir C:\Users\Administrator\PycharmProjects\test\函数备课', shell=True, stdout=subprocess.PIPE)
res = subprocess.Popen('findstr test*', shell=True, stdin=res1.stdout,
                       stdout=subprocess.PIPE)

print(res.stdout.read().decode('gbk'))  # the output is bytes in the system's default encoding; on a Chinese-locale Windows system decode it with gbk

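VII. xml module (for reference)

XML is a format for exchanging data between different languages or programs, much like JSON, though JSON is simpler to use. Before JSON existed, XML was the only choice, and to this day many systems at traditional companies, for example in the financial industry, still expose mainly XML interfaces.

The XML format looks like this; the data structure is expressed with <> nodes: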

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

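XML is supported in every language; in Python you can work with it using the following module:

# print(root.iter('year'))  # searches the whole document
# print(root.find('country'))  # searches root's direct children, returns the first match only
# print(root.findall('country'))  # searches root's direct children, returns all matches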

import xml.etree.ElementTree as ET
 
tree = ET.parse("xmltest.xml")
root = tree.getroot()
print(root.tag)
 
# Walk the XML document
for child in root:
    print('========>', child.tag, child.attrib, child.attrib['name'])
    for i in child:
        print(i.tag, i.attrib, i.text)

# Walk the year nodes only
for node in root.iter('year'):
    print(node.tag, node.text)
#---------------------------------------
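The difference between `iter`, `find` and `findall` can be checked on a small in-memory document. The sketch below parses a shortened, made-up version of the sample XML with `ET.fromstring`, so no file is needed.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the sample document
XML_TEXT = """<data>
    <country name="Liechtenstein"><rank>2</rank><year>2008</year></country>
    <country name="Panama"><rank>69</rank><year>2011</year></country>
</data>"""

root = ET.fromstring(XML_TEXT)

first = root.find("country")                                  # first direct child only
names = [c.attrib["name"] for c in root.findall("country")]   # all direct children
years = [int(node.text) for node in root.iter("year")]        # whole-document search
direct_year = root.find("year")                               # year is not a direct child of root

print(first.attrib["name"])  # Liechtenstein
print(names)                 # ['Liechtenstein', 'Panama']
print(years)                 # [2008, 2011]
print(direct_year)           # None
```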

import xml.etree.ElementTree as ET
 
tree = ET.parse("xmltest.xml")
root = tree.getroot()
 
# Modify
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set('updated', 'yes')
    node.set('version', '1.0')
tree.write('test.xml')


# Remove nodes
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)

tree.write('output.xml')

# Append a year2 node inside each country
import xml.etree.ElementTree as ET
tree = ET.parse("a.xml")
root = tree.getroot()
for country in root.findall('country'):
    for year in country.findall('year'):
        if int(year.text) > 2000:
            year2 = ET.Element('year2')
            year2.text = '新年'
            year2.attrib = {'update': 'yes'}
            country.append(year2)  # add a child node under country

tree.write('a.xml.swap')

Creating an XML document yourself

import xml.etree.ElementTree as ET


new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml, "name", attrib={"enrolled": "yes"})
age = ET.SubElement(name, "age", attrib={"checked": "no"})
sex = ET.SubElement(name, "sex")
sex.text = '33'
name2 = ET.SubElement(new_xml, "name", attrib={"enrolled": "no"})
age = ET.SubElement(name2, "age")
age.text = '19'

et = ET.ElementTree(new_xml)  # build the document object
et.write("test.xml", encoding="utf-8", xml_declaration=True)

ET.dump(new_xml)  # print the generated XML

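VIII. shelve module

shelve is simpler than pickle: it has a single open function that returns a dict-like object that can be read and written. Keys must be strings, while values can be any Python data type that pickle supports.

import shelve

f = shelve.open(r'sheve.txt')
# Store the records (needed at least on the first run, otherwise the lookup below raises KeyError)
f['stu1_info'] = {'name': 'egon', 'age': 18, 'hobby': ['piao', 'smoking', 'drinking']}
f['stu2_info'] = {'name': 'gangdan', 'age': 53}
f['school_info'] = {'website': 'http://www.pypy.org', 'city': 'beijing'}

print(f['stu1_info']['hobby'])
f.close()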
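One behaviour worth knowing is that changing a value fetched from a shelf in place is not saved unless the shelf is opened with `writeback=True`. The sketch below illustrates this under stated assumptions: the file name `demo_shelf` and the sample record are made up, and a temporary directory is used so nothing is left on disk.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_shelf")  # hypothetical shelf file name

    with shelve.open(path) as db:
        db["stu1_info"] = {"name": "egon", "hobby": ["reading"]}
        # This mutates a temporary copy that is never written back
        db["stu1_info"]["hobby"].append("lost")
        hobbies_without_writeback = db["stu1_info"]["hobby"]

    with shelve.open(path, writeback=True) as db:
        # With writeback=True the fetched object is cached and saved on close
        db["stu1_info"]["hobby"].append("kept")

    with shelve.open(path) as db:
        final_hobbies = db["stu1_info"]["hobby"]

print(hobbies_without_writeback)  # ['reading']
print(final_hobbies)              # ['reading', 'kept']
```

Without `writeback=True`, the alternative is to fetch the value, modify it, and assign it back to the same key.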