Regular expressions and the hashlib module

I. Regular expressions

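1. What is a regular expression?
A regular expression (regex) combines symbols that carry special meaning into a pattern that describes characters or strings.
In other words, a regex is a rule that describes a class of text. Python has regex support built in, provided by the re module.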

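2. Common matching patterns (metacharacters)
import re
\w   matches a letter, digit, or underscore
\W   matches any character that is not a letter, digit, or underscore
\s   matches any whitespace character; for ASCII text, equivalent to [ \t\n\r\f\v]
\S   matches any non-whitespace character
\d   matches any digit; for ASCII text, equivalent to [0-9]
\D   matches any non-digit
\n   matches a newline
\t   matches a tab
^: matches at the start of the string
# print(re.findall('^alex','1alex3alex'))  # [] because the string starts with '1'
$: matches at the end of the string
# print(re.findall('henry$','hery_li is the best henry'))  # ['henry']
.: matches any single character except a newline; when the re.DOTALL flag is given, it also matches a newline.
[]: matches a single character taken from a set or range that you define, e.g. [abc] or [0-9]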
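As a quick check of the character classes and anchors listed above, the sketch below runs a few of them through re.findall. The sample string `text` is made up for illustration and does not come from the notes.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "user_1 paid 42 yuan\ton day 7"

words = re.findall(r"\w+", text)          # runs of letters, digits, underscores
digits = re.findall(r"\d+", text)         # runs of digits
spaces = re.findall(r"\s", text)          # every single whitespace character
at_start = re.findall(r"^user", text)     # only matches at the start of the string
at_end = re.findall(r"7$", text)          # only matches at the end of the string
vowels = re.findall(r"[aeiou]", "regex")  # one character from a custom set

print(words)     # ['user_1', 'paid', '42', 'yuan', 'on', 'day', '7']
print(digits)    # ['1', '42', '7']
print(spaces)    # [' ', ' ', ' ', '\t', ' ', ' ']
print(at_start)  # ['user']
print(at_end)    # ['7']
print(vowels)    # ['e', 'e']
```

Raw strings (the r prefix) are used for the patterns so that the backslashes reach the regex engine unchanged.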

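Repetition (quantifiers): none of these can be used on its own; each applies to the character (or group) immediately to its left
?: the item on the left occurs 0 or 1 times
*: the item on the left occurs 0 or more times
+: the item on the left occurs 1 or more times
{n,m}: the item on the left occurs between n and m times
{n}: the item on the left occurs exactly n times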
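To see how the quantifiers differ, this small sketch applies each of them to the same made-up sample string (`samples` is an illustrative name, not from the notes).

```python
import re

# Hypothetical sample: an 'a' followed by zero to four 'b' characters.
samples = "a ab abb abbb abbbb"

zero_or_one = re.findall(r"ab?", samples)      # 'a' plus at most one 'b'
zero_or_more = re.findall(r"ab*", samples)     # 'a' plus any number of 'b'
one_or_more = re.findall(r"ab+", samples)      # 'a' plus at least one 'b'
two_to_three = re.findall(r"ab{2,3}", samples) # 'a' plus two or three 'b'
exactly_two = re.findall(r"ab{2}", samples)    # 'a' plus exactly two 'b'

print(zero_or_one)   # ['a', 'ab', 'ab', 'ab', 'ab']
print(zero_or_more)  # ['a', 'ab', 'abb', 'abbb', 'abbbb']
print(one_or_more)   # ['ab', 'abb', 'abbb', 'abbbb']
print(two_to_three)  # ['abb', 'abbb', 'abbb']
print(exactly_two)   # ['abb', 'abb', 'abb']
```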

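Combinations
.*: matches any characters, zero or more times; greedy, consuming as many characters as possible
.*?: matches any characters, zero or more times; non-greedy, consuming as few characters as possible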
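The difference between greedy and non-greedy matching is easiest to see side by side. The markup string below is a made-up example.

```python
import re

# Hypothetical sample with two tagged items on one line.
html = "<b>one</b><b>two</b>"

greedy = re.findall(r"<b>.*</b>", html)  # .* runs to the LAST </b>
lazy = re.findall(r"<b>.*?</b>", html)   # .*? stops at the FIRST </b>

print(greedy)  # ['<b>one</b><b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```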

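Additional notes:
a|b: matches a or b
(): grouping
print(re.findall('compan(y|ies)','Too many companies have gone bankrupt, and the next one is my company'))
Result: ['ies', 'y'] (when the pattern contains a group, findall returns only the group's content)

print(re.findall('compan(?:y|ies)','Too many companies have gone bankrupt, and the next one is my company'))
Result: ['companies', 'company'] (?: makes the group non-capturing, so findall returns the whole match)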

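Escaping
'a\\c' is first processed by Python's string syntax and becomes a\c. Python's regex matching is carried out by an engine written in C, and a\c is what gets handed to that engine, so this pattern does not match a literal backslash.
'a\\\\c' is first turned into a\\c by Python's string syntax; the regex engine then reads a\\c as: a, a literal backslash, c.
print(re.findall('a\\\\c',r'a\c aac'))
Result: ['a\\c'], i.e. the text a\c; in the printed list the first backslash is only the escape character

print(re.findall(r'a\\c',r'a\c aac'))
The r prefix marks a raw string: it tells the Python interpreter to leave the backslashes untouched and pass a\\c straight to the regex engine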

# re.I (re.IGNORECASE) makes the match case-insensitive
print(re.findall('henry','henry is star HENRY is good boy Henry like is Irene',re.I))
Result: ['henry', 'HENRY', 'Henry']

msg='''
my name is henry
irene like henry
wendy like henry
'''
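print(re.findall('henry$',msg))
Result: ['henry'] (by default $ matches only at the end of the whole string)
print(re.findall('henry$',msg,re.M))
Result: ['henry', 'henry', 'henry'] (with re.M, $ also matches at the end of every line)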
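The metacharacter list mentions that the dot matches a newline only when re.DOTALL is set. A minimal sketch of that flag, using a made-up two-line string:

```python
import re

# Hypothetical sample: 'a' and 'b' separated by a newline.
text = "a\nb"

default = re.findall(r"a.b", text)            # '.' refuses to match '\n'
dotall = re.findall(r"a.b", text, re.DOTALL)  # '.' now matches '\n' as well

print(default)  # []
print(dotall)   # ['a\nb']
```

re.S is a shorter alias for re.DOTALL, in the same way that re.I and re.M are short for re.IGNORECASE and re.MULTILINE.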

# Other functions in the re module
res=re.findall('href="(.*?)"','<href="https://www.hao123.com/3.mp3">酷酷的滕<href="https://www.hao123.com/1.mp3">')
print(res)
Result: ['https://www.hao123.com/3.mp3', 'https://www.hao123.com/1.mp3']

re.search()
res=re.search('href="(.*?)"','<href="https://www.hao123.com/3.mp3">酷酷的滕<href="https://www.hao123.com/1.mp3">')
print(res)
Result: <_sre.SRE_Match object; span=(1, 36), match='href="https://www.hao123.com/3.mp3"'> (Python 3.7 and later print re.Match instead of _sre.SRE_Match)
re.search() stops at the first successful match and does not scan any further
print(res.group())   # same as res.group(0): returns the whole match
print(res.group(0))
Result (both calls): href="https://www.hao123.com/3.mp3"

res=re.search('(href)="(.*?)"','<href="https://www.hao123.com/3.mp3">酷酷的滕<href="https://www.hao123.com/1.mp3">')
print(res.group(0))
print(res.group(1))
print(res.group(2))
Result:
href="https://www.hao123.com/3.mp3"
href
https://www.hao123.com/3.mp3

re.match('ab','ab123') is equivalent to re.search('^ab','ab123')
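A short sketch of the difference: re.match only tries the start of the string, while re.search scans forward until it finds a match. The sample strings are illustrative.

```python
import re

# 'ab' is not at the start here, so match fails but search succeeds.
no_match = re.match("ab", "123ab")
found = re.search("ab", "123ab")

# When 'ab' is at the start, match and an anchored search agree.
m1 = re.match("ab", "ab123")
m2 = re.search("^ab", "ab123")

print(no_match)      # None
print(found.span())  # (3, 5)
print(m1.group())    # ab
print(m2.group())    # ab
```

Because a failed match returns None, real code usually checks the result before calling .group() on it.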

pattern=re.compile('henry')
print(pattern.findall('henry is henry is henry'))
print(pattern.search('henry is henry is henry'))
print(pattern.match('henry is henry is henry'))

# Extract every number from an arithmetic expression
# Expected: ['1', '2', '60', '-40.35', '5', '-40', '3']
msg="1-2*(60+(-40.35/5)-(-40*3))"
print(re.findall(r'\D?(-?\d+\.?\d*)',msg))

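II. The hashlib module

1. What is a hash?
A hash is an algorithm that takes input content and computes a hash value from it.
2. Three key properties of a hash
2.1 The same input always produces the same hash value
2.2 As long as the hash algorithm stays the same, the hash value has a fixed length no matter how large the input is
2.3 A hash is one-way: the original content cannot be recovered from the hash value
3. Why use a hash?
Properties 1 + 2 => file integrity checking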
import hashlib
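m=hashlib.md5() # hash factory; md5 is the hash algorithm; initial data can be passed inside ()
# m.update(...) must be given bytes, not str
m.update('你好'.encode('utf-8'))
m.update('美女'.encode('utf-8')) # raw material
print(m.hexdigest()) # product; result: 57d60e88a20ae60d228989070c7333ff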

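m=hashlib.md5()
m.update('你好美女'.encode('utf-8')) # the same bytes as the two update() calls above, passed in one go
print(m.hexdigest()) # result: 57d60e88a20ae60d228989070c7333ff, identical to the digest above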
print(len(m.hexdigest())) #32

m=hashlib.md5()
m.update(b'abcbdsakdaksdksjd')
print(len(m.hexdigest())) #32

m=hashlib.sha512()
m.update(b'abcbdsakdaksdksjd')
print(len(m.hexdigest())) #128

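import hashlib
with open(r'E:python-li课堂day17hash',mode='rb') as f:
    m=hashlib.md5()
    for line in f:
        m.update(line) # feed the file to the hash one line at a time
    print(m.hexdigest())
The approach above is generally not used, since hashing an entire large file is slow; instead, a few randomly chosen segments of the file are hashed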
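One way to hash only a few segments might look like the sketch below. It rests on assumptions the notes do not state: the helper name `sampled_md5`, its `samples` and `chunk_size` parameters, and the choice of fixed, evenly spaced offsets are all hypothetical. Fixed offsets are used instead of fresh random ones because both sides of a comparison have to sample the same positions for the digests to be comparable.

```python
import hashlib
import os


def sampled_md5(path, samples=4, chunk_size=1024):
    """Hash the file size plus a few evenly spaced chunks (hypothetical helper)."""
    size = os.path.getsize(path)
    m = hashlib.md5()
    # Include the size so that a truncated or extended file is noticed.
    m.update(str(size).encode("utf-8"))
    with open(path, "rb") as f:
        for i in range(samples):
            f.seek(size * i // samples)   # jump to the start of the i-th segment
            m.update(f.read(chunk_size))  # hash at most chunk_size bytes from there
    return m.hexdigest()
```

The trade-off is speed against coverage: a change that falls entirely outside the sampled segments and leaves the size unchanged goes undetected.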

import hashlib
pwd=input('password>>:').strip()
m=hashlib.md5()
m.update('从前从前有个人爱你很久'.encode('utf-8'))
m.update(pwd.encode('utf-8'))
m.update('但偏偏风渐渐把距离吹得好远'.encode('utf-8'))
print(m.hexdigest())
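The snippet above wraps the password in two fixed salt strings before hashing. A minimal sketch of how such a digest could later be used to check a login attempt follows. The names `salted_md5`, `check`, and `stored`, and the sample password, are hypothetical; hmac.compare_digest is used only as a timing-safe way to compare the two hex strings.

```python
import hashlib
import hmac

# The same two salt strings used in the notes, encoded once up front.
SALT_HEAD = "从前从前有个人爱你很久".encode("utf-8")
SALT_TAIL = "但偏偏风渐渐把距离吹得好远".encode("utf-8")


def salted_md5(pwd):
    """Return the md5 hex digest of salt + password + salt."""
    m = hashlib.md5()
    m.update(SALT_HEAD)
    m.update(pwd.encode("utf-8"))
    m.update(SALT_TAIL)
    return m.hexdigest()


# Digest that would be saved at registration time (hypothetical password).
stored = salted_md5("s3cret")


def check(pwd):
    """Compare the digest of a login attempt with the stored digest."""
    return hmac.compare_digest(salted_md5(pwd), stored)


print(check("s3cret"))  # True
print(check("guess"))   # False
```

Only the digest is stored, never the password itself, which is where the one-way property of a hash comes in.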