Python Automation Development

Contents of this section

1. shutil module

2. shelve module

3. xml processing module

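I. shutil module

A high-level module for working with files, directories, and archives.

1. shutil.copyfileobj(fsrc, fdst[, length])

Copies the contents of one file object into another. The optional length is the buffer size used for each read. Copying starts at the current position of fsrc, so a source that has already been partly read is only partly copied.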

import shutil

# Both arguments are file objects, so the files must be opened first
with open("access.log.1") as f1, open("access02", "w") as f2:
    shutil.copyfileobj(f1, f2)

2. shutil.copyfile(src, dst)

Copies the contents of a file, without permissions or other metadata.

shutil.copyfile("access.log.1", "access03")   # the arguments are file names

3. shutil.copymode(src, dst)
Copies only the permission bits. The contents, owner, and group of dst are left unchanged.

shutil.copymode("access.log.1", "access02")  # dst must already exist

4. shutil.copystat(src, dst)
Copies status information: mode bits, atime, mtime, and flags.

shutil.copystat("access.log.1", "access02")  # dst must already exist

5. shutil.copy(src, dst)
Copies the file contents and the permission bits.

shutil.copy("access.log.1", "access03")  # contents + permissions

6. shutil.copy2(src, dst)
Copies the file contents and the status information.

shutil.copy2("access.log.1", "access04")  # contents + status information
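The difference between copyfile, copy, and copy2 is easiest to see by checking what survives the copy. The following is a minimal sketch that works in a temporary directory; the file names and the fixed timestamp are made up for illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "access.log.1")   # hypothetical file name
    with open(src, "w") as f:
        f.write("GET /index.html 200\n")
    # Give the source an old, recognisable modification time
    os.utime(src, (1_000_000_000, 1_000_000_000))

    plain = shutil.copyfile(src, os.path.join(tmp, "plain"))        # contents only
    with_mode = shutil.copy(src, os.path.join(tmp, "with_mode"))    # contents + mode
    with_stat = shutil.copy2(src, os.path.join(tmp, "with_stat"))   # contents + stat

    with open(with_stat) as f:
        same_contents = f.read() == "GET /index.html 200\n"
    # Only copy2 carries the modification time over to the new file
    mtime_kept_by_copy2 = int(os.stat(with_stat).st_mtime) == 1_000_000_000
    mtime_kept_by_copy = int(os.stat(with_mode).st_mtime) == 1_000_000_000

print(same_contents, mtime_kept_by_copy2, mtime_kept_by_copy)
```

All three functions return the destination path, which is why the results can be assigned directly.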

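7. shutil.copytree(src, dst, symlinks=False, ignore=None)

Recursively copies the contents of the directory src into a new directory dst. The src directory itself is not nested inside dst, so dst may be given the same name as src. By default dst must not already exist. ignore takes a callable such as the one returned by shutil.ignore_patterns(*patterns).

shutil.copytree(r"E:\s16\day5", "day5", ignore=shutil.ignore_patterns("access*", "*.py"))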
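To see which files ignore_patterns actually filters out, here is a small sketch that builds a throwaway directory tree and copies it. The directory and file names are assumptions chosen to mirror the call above.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "day5")
    os.makedirs(os.path.join(src, "logs"))
    for name in ("notes.txt", "access.log", "script.py", os.path.join("logs", "app.txt")):
        with open(os.path.join(src, name), "w") as f:
            f.write("x")

    dst = os.path.join(tmp, "day5_copy")   # must not exist yet
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns("access*", "*.py"))

    # Collect every copied file as a path relative to dst
    copied = sorted(
        os.path.relpath(os.path.join(root, name), dst)
        for root, _dirs, files in os.walk(dst)
        for name in files
    )

print(copied)
```

The patterns are glob-style and are applied to the names in every directory level, so access.log and script.py are skipped while the logs subdirectory is still copied.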

8. shutil.rmtree(path[, ignore_errors[, onerror]])

Recursively deletes a directory tree.

shutil.rmtree(r'E:\s16\day6\day5')

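9. shutil.move(src, dst)

Recursively moves a file or directory.

src may be a file or a directory and must exist. If dst does not exist, src is moved and renamed to dst. If dst is an existing directory, src is moved inside it.

shutil.move(r'E:\s16\day6\day5', r'E:\s16\day7')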

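10. shutil.make_archive(base_name, format, ...)

Creates an archive (for example zip or tar) and returns its path.

base_name: the name of the archive file without the format extension; it may include a path. A bare name is saved in the current directory, otherwise the archive is saved at the given path, for example:

    www                 => saved in the current directory
    /Users/wupeiqi/www  => saved in /Users/wupeiqi/

format: archive type: "zip", "tar", "bztar", or "gztar"
root_dir: the directory to archive (defaults to the current directory)
owner: owner, defaults to the current user
group: group, defaults to the current group
logger: used for logging, usually a logging.Logger object

res = shutil.make_archive("www", 'zip', root_dir=r'E:\s16\day7')
print(res)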
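The round trip can be tried without touching real data. This sketch archives a temporary directory and unpacks it again with shutil.unpack_archive; the directory and file names are invented for the example.

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    root_dir = os.path.join(tmp, "day7")
    os.makedirs(root_dir)
    with open(os.path.join(root_dir, "a.log"), "w") as f:
        f.write("hello")

    # base_name has no extension; make_archive appends ".zip" itself
    archive = shutil.make_archive(os.path.join(tmp, "www"), "zip", root_dir=root_dir)
    with zipfile.ZipFile(archive) as z:
        members = z.namelist()

    # unpack_archive picks the format from the file extension
    out_dir = os.path.join(tmp, "unpacked")
    shutil.unpack_archive(archive, out_dir)
    with open(os.path.join(out_dir, "a.log")) as f:
        restored = f.read()

print(os.path.basename(archive), members, restored)
```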

shutil handles archives by calling the zipfile and tarfile modules:

import zipfile

# Compress
with zipfile.ZipFile('laxi.zip', 'w') as z:
    z.write('a.log')
    z.write('data.data')

# Extract
with zipfile.ZipFile('laxi.zip', 'r') as z:
    z.extractall()

Compressing and extracting with zipfile

import tarfile

# Compress
with tarfile.open('your.tar', 'w') as tar:
    tar.add('/Users/wupeiqi/PycharmProjects/bbs2.zip', arcname='bbs2.zip')
    tar.add('/Users/wupeiqi/PycharmProjects/cmdb.zip', arcname='cmdb.zip')

# Extract
with tarfile.open('your.tar', 'r') as tar:
    tar.extractall()  # pass path=... to choose the destination directory

Compressing and extracting with tarfile

II. shelve module

shelve is a simple key-value module that persists in-memory data to a file. It can store any Python object that pickle supports.
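A short sketch of how this looks in practice: a shelf behaves like a dictionary whose contents survive after the file is closed. The file name and the stored values are assumptions made up for the example.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_shelf")   # hypothetical file name

    # Write: keys must be strings, values can be any picklable object
    with shelve.open(path) as db:
        db["user"] = {"name": "alex", "age": 22}
        db["scores"] = [90, 85, 77]

    # Read the data back through a fresh handle
    with shelve.open(path) as db:
        keys = sorted(db.keys())
        user = db["user"]
        scores = db["scores"]

print(keys, user, scores)
```

Note that mutating a stored object in place (for example db["scores"].append(1)) is not written back unless the shelf is opened with writeback=True or the value is reassigned.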

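III. xml processing module

XML is a format for exchanging data between different languages and programs. It serves much the same purpose as JSON, although JSON is simpler to use.

Many traditional companies, for example in the financial industry, still expose system interfaces that are primarily XML.

An XML document looks like this; nested <> elements define the structure of the data: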

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

XML data file

XML is supported in virtually every language. In Python it can be handled with the following module:

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()
print(root.tag)

# Walk the whole document
for child in root:
    print(child.tag, child.attrib)
    for i in child:
        print(i.tag, i.text)

# Iterate over the year nodes only
for node in root.iter('year'):
    print(node.tag, node.text)

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()

# Modify: increment every year and mark the node as updated
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set("updated", "yes")

tree.write("xmltest.xml")


# Delete nodes: remove every country whose rank is greater than 50
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)

tree.write('output.xml')

Modifying and deleting XML content
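The same modify-and-delete steps can be tried without an xmltest.xml file on disk. This sketch parses a trimmed-down copy of the sample data from a string with ET.fromstring; the shortened document is an assumption made to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# A trimmed-down copy of the sample document, inlined for the sketch
XML = """<data>
    <country name="Liechtenstein"><rank>2</rank><year>2008</year></country>
    <country name="Singapore"><rank>5</rank><year>2011</year></country>
    <country name="Panama"><rank>69</rank><year>2011</year></country>
</data>"""

root = ET.fromstring(XML)

# Modify: bump each year by one
for node in root.iter("year"):
    node.text = str(int(node.text) + 1)

# Delete: findall() returns a list, so removing while looping over it is safe
for country in root.findall("country"):
    if int(country.find("rank").text) > 50:
        root.remove(country)

remaining = [(c.get("name"), c.find("year").text) for c in root.findall("country")]
print(remaining)
```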

import xml.etree.ElementTree as ET


new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml, "name", attrib={"enrolled": "yes"})
age = ET.SubElement(name, "age", attrib={"checked": "no"})
sex = ET.SubElement(name, "sex")
sex.text = '33'
name2 = ET.SubElement(new_xml, "name", attrib={"enrolled": "no"})
age = ET.SubElement(name2, "age")
age.text = '19'

et = ET.ElementTree(new_xml)  # build the document object
et.write("test.xml", encoding="utf-8", xml_declaration=True)

ET.dump(new_xml)  # print the generated XML

Creating an XML document

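IV. configparser module

Used to create and modify common configuration files. The module was called ConfigParser in Python 2 and was renamed to configparser in Python 3.x.

Many programs use a configuration file format like the following: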

[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes
 
[bitbucket.org]
User = hg
 
[topsecret.server.com]
Port = 50022
ForwardX11 = no

import configparser

config = configparser.ConfigParser()
config["DEFAULT"] = {'ServerAliveInterval': '45',
                     'Compression': 'yes',
                     'CompressionLevel': '9'}

config['bitbucket.org'] = {}
config['bitbucket.org']['User'] = 'hg'
config['topsecret.server.com'] = {}
topsecret = config['topsecret.server.com']
topsecret['Port'] = '50022'     # mutates the parser
topsecret['ForwardX11'] = 'no'  # same here
config['DEFAULT']['ForwardX11'] = 'yes'
with open('example.ini', 'w') as configfile:
    config.write(configfile)

Creating a file with configparser

>>> import configparser
>>> config = configparser.ConfigParser()
>>> config.sections()
[]
>>> config.read('example.ini')
['example.ini']
>>> config.sections()
['bitbucket.org', 'topsecret.server.com']
>>> 'bitbucket.org' in config
True
>>> 'bytebong.com' in config
False
>>> config['bitbucket.org']['User']
'hg'
>>> config['DEFAULT']['Compression']
'yes'
>>> topsecret = config['topsecret.server.com']
>>> topsecret['ForwardX11']
'no'
>>> topsecret['Port']
'50022'
>>> for key in config['bitbucket.org']: print(key)
...
user
compressionlevel
serveraliveinterval
compression
forwardx11
>>> config['bitbucket.org']['ForwardX11']
'yes'

Reading a file with configparser

[section1]
k1 = v1
k2:v2
  
[section2]
k1 = v1
 
import configparser

config = configparser.ConfigParser()
config.read('i.cfg')

# ########## Read ##########
#secs = config.sections()
#print(secs)
#options = config.options('section2')
#print(options)

#item_list = config.items('section2')
#print(item_list)

#val = config.get('section1', 'k1')
#val = config.getint('section1', 'k1')   # only works if the value is an integer

# ########## Modify ##########
#removed = config.remove_section('section1')
#with open('i.cfg', 'w') as f: config.write(f)

#exists = config.has_section('wupeiqi')
#config.add_section('wupeiqi')
#with open('i.cfg', 'w') as f: config.write(f)


#config.set('section2', 'k1', '11111')   # values must be strings in Python 3
#with open('i.cfg', 'w') as f: config.write(f)

#config.remove_option('section2', 'k1')
#with open('i.cfg', 'w') as f: config.write(f)

Adding, deleting, modifying, and querying with configparser
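The operations listed above can be run end to end without an i.cfg file by loading the configuration from a string. This is a sketch under stated assumptions: the section2 value is changed to 10 so that getint has an integer to parse, and the output is written to an in-memory buffer instead of a file.

```python
import configparser
import io

INI = """
[section1]
k1 = v1
k2:v2

[section2]
k1 = 10
"""

config = configparser.ConfigParser()
config.read_string(INI)

# Read
sections = config.sections()
k2 = config.get("section1", "k2")            # "k2:v2" is parsed like "k2 = v2"
k1_as_int = config.getint("section2", "k1")

# Modify
config.set("section2", "k1", "11111")        # values must be strings
config.add_section("wupeiqi")
config.remove_section("section1")
removed = config.remove_option("section2", "missing")   # False: no such option

# Write the result to a buffer instead of a file
buffer = io.StringIO()
config.write(buffer)
result = buffer.getvalue()
print(result)
```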

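V. hashlib module

Provides cryptographic hash functions. In Python 3.x it replaces the old md5 and sha modules and mainly offers the SHA1, SHA224, SHA256, SHA384, SHA512, and MD5 algorithms. Hashing is one-way, so strictly speaking it is not encryption.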

import hashlib

m = hashlib.md5()
m.update(b'Hello')
m.update(b' World')
print('md5:', m.hexdigest())

m2 = hashlib.md5()
m2.update(b'Hello World')
print('md5:', m2.hexdigest())  # hash as a hexadecimal string
print('md5 (raw bytes):', m2.digest())     # hash as raw bytes

sh1 = hashlib.sha1()
sh1.update(b'admin')
print('sha1:', sh1.hexdigest())

sh256 = hashlib.sha256()
sh256.update(b'xxx')
print('sha256:', sh256.hexdigest())

sh384 = hashlib.sha384()
sh384.update(b'xxx')
print('sha384:', sh384.hexdigest())

sh512 = hashlib.sha512()
sh512.update(b'xxx')
print('sha512:', sh512.hexdigest())
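Because update() can be called repeatedly, large inputs can be hashed piece by piece instead of being loaded into memory at once. The helper below is a hypothetical sketch of that pattern; sha256_of_stream and the chunk size are names and values chosen for illustration, not part of hashlib.

```python
import hashlib
import io

def sha256_of_stream(stream, chunk_size=4096):
    """Hash a binary stream chunk by chunk (hypothetical helper)."""
    h = hashlib.sha256()
    # iter() with a sentinel keeps reading until read() returns b""
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

data = b"Hello World" * 1000
chunked = sha256_of_stream(io.BytesIO(data), chunk_size=64)
one_shot = hashlib.sha256(data).hexdigest()
print(chunked == one_shot)
```

The same function works on a file opened in binary mode, since it only relies on read().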

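Python also has an hmac module, which combines a key with the message internally and then hashes the result.

HMAC (Hash-based Message Authentication Code) is an authentication mechanism built on message authentication codes (MAC).

With HMAC, the two parties verify that a message is genuine by means of a shared secret key K that is mixed into the digest.

It is typically used to authenticate messages in network communication. The two sides must agree on the key in advance, much like a secret passphrase. The sender computes a digest from the key and the message and sends it along with the message.

The receiver computes the digest again from the key and the received plaintext and compares it with the sender's value. If the two match, the message is authentic and the sender is legitimate.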

import hmac
# digestmod is required since Python 3.8; SHA-256 is chosen here
h = hmac.new('天王盖地虎'.encode('utf-8'), '宝塔镇河妖'.encode('utf-8'), digestmod='sha256')
print(h.hexdigest())
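The sender and receiver steps described above can be sketched as a pair of functions. The key, the messages, and the names sign and verify are assumptions made up for this example.

```python
import hmac

KEY = b"shared-secret"   # hypothetical key agreed on in advance

def sign(message: bytes) -> str:
    """Sender side: compute the digest from the key and the message."""
    return hmac.new(KEY, message, digestmod="sha256").hexdigest()

def verify(message: bytes, signature: str) -> bool:
    """Receiver side: recompute the digest and compare it with the received one."""
    # compare_digest compares in constant time to avoid timing leaks
    return hmac.compare_digest(sign(message), signature)

message = b"transfer 100"
signature = sign(message)

ok = verify(message, signature)
tampered = verify(b"transfer 900", signature)
print(ok, tampered)
```

Using hmac.compare_digest rather than == is the usual choice here, because a plain string comparison can leak how many leading characters matched.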

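VI. subprocess module

This module spawns new processes, connects to their input/output/error pipes, and obtains their return codes. Common examples:

# run executes a command and covers most needs

>>> subprocess.run(['ls', '-l'])

>>> subprocess.run("ls -l", shell=True)

# Execute a command and return its exit status (0 or non-zero)

>>> retcode = subprocess.call(["ls", "-l"])

# Execute a command; return normally if the exit status is 0, otherwise raise an exception

>>> subprocess.check_call(["ls", "-l"])

# Takes the command as a string and returns a tuple: the first element is the exit status, the second is the command output

>>> subprocess.getstatusoutput('ls /bin/ls')
(0, '/bin/ls')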

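# All of the functions above are wrappers around subprocess.Popen

poll()
Check if child process has terminated. Returns returncode, or None if the process is still running.

wait()
Wait for child process to terminate. Returns returncode attribute.

terminate() terminates the process that was started

communicate() sends input (if any), reads the output, and waits for the process to finish

stdin standard input

stdout standard output

stderr standard error

pid

The process ID of the child process.

# Example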
>>> p = subprocess.Popen("df -h|grep disk",stdin=subprocess.PIPE,stdout=subprocess.PIPE,shell=True)
>>> p.stdout.read()
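For most cases the same result can be had from subprocess.run with captured output. The sketch below uses sys.executable as the child command instead of ls or df, purely so that it runs on any platform; that substitution is an assumption of the example.

```python
import subprocess
import sys

# Run a child Python process and capture what it prints
result = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    capture_output=True,   # collect stdout and stderr (Python 3.7+)
    text=True,             # decode bytes to str
    check=True,            # raise CalledProcessError on a non-zero exit status
)
output = result.stdout.strip()

# With check=True a failing command raises instead of returning
try:
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"], check=True)
    failed_code = None
except subprocess.CalledProcessError as exc:
    failed_code = exc.returncode

print(result.returncode, output, failed_code)
```

Passing the command as a list avoids shell=True, which matters when any part of the command comes from user input.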

Calling subprocess.run(...) is the recommended approach and covers most cases.

If you need more complex interaction with the system, you can use subprocess.Popen() directly. The syntax is as follows:

p = subprocess.Popen(r"find / -size +1000000 -exec ls -shl {} \;", shell=True, stdout=subprocess.PIPE)
print(p.stdout.read())

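args: the command to run, as a string or a sequence (e.g. list or tuple)
bufsize: buffering. 0 = unbuffered, 1 = line-buffered, any other positive value = buffer size, negative = system default
stdin, stdout, stderr: the standard input, output, and error handles of the child process
preexec_fn: Unix only; a callable object that is invoked in the child process just before the command is executed
close_fds: on Windows, if close_fds is True the child process does not inherit the parent's input, output, and error pipes,
so before Python 3.7 close_fds could not be True while also redirecting the child's standard input, output, or error (stdin, stdout, stderr).
shell: if True, the command is executed through the shell
cwd: sets the working directory of the child process
env: environment variables for the child process. If env is None, they are inherited from the parent process.
universal_newlines: line endings differ between systems; True -> the streams are opened in text mode and line endings are normalized to \n
startupinfo and creationflags: Windows only;
they are passed to the underlying CreateProcess() function to set properties of the child process, such as the appearance of the main window or the process priority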

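Commands typed at a terminal fall into two kinds:

Commands that produce output immediately, e.g. ifconfig
Commands that enter an environment and then wait for further input, e.g. python

Example of a command that requires interaction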

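import subprocess
import sys

# text=True makes the pipes accept and return str instead of bytes
obj = subprocess.Popen([sys.executable], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE, text=True)
obj.stdin.write('print(1)\n')
obj.stdin.write('print(2)\n')
obj.stdin.write('print(3)\n')
obj.stdin.write('print(4)\n')

# communicate() closes stdin, waits for the child, and returns (stdout, stderr)
out_error_list = obj.communicate(timeout=10)
print(out_error_list)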

Using subprocess to enter the sudo password automatically

import subprocess

def mypass():
    mypass = '123'  # or get the password from anywhere
    return mypass

echo = subprocess.Popen(['echo', mypass()],
                        stdout=subprocess.PIPE,
                        )

# sudo -S reads the password from stdin, which is connected to echo's stdout
sudo = subprocess.Popen(['sudo', '-S', 'iptables', '-L'],
                        stdin=echo.stdout,
                        stdout=subprocess.PIPE,
                        )

end_of_pipe = sudo.stdout

print("Password ok\nIptables Chains %s" % end_of_pipe.read().decode())

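VII. re module

Common regular expression symbols

'.'     Matches any single character except \n by default; with the DOTALL flag it matches any character, including newlines
'^'     Matches the start of the string; with the MULTILINE flag it also matches at the start of each line, e.g. re.search(r"^a", "\nabc\neee", flags=re.MULTILINE)
'$'     Matches the end of the string; with MULTILINE it also matches at the end of each line, e.g. re.search("foo$", "bfoo\nsdfsf", flags=re.MULTILINE).group()
'*'     Matches the preceding character 0 or more times, re.findall("ab*", "cabb3abcbbac") gives ['abb', 'ab', 'a']
'+'     Matches the preceding character 1 or more times, re.findall("ab+", "ab+cd+abb+bba") gives ['ab', 'abb']
'?'     Matches the preceding character 0 or 1 times
'{m}'   Matches the preceding character exactly m times
'{n,m}' Matches the preceding character n to m times, re.findall("ab{1,3}", "abb abc abbcbbb") gives ['abb', 'ab', 'abb']
'|'     Matches the pattern on the left or on the right of |, re.search("abc|ABC", "ABCBabcCD").group() gives 'ABC'
'(...)' Grouping, re.search("(abc){2}a(123|456)c", "abcabca456c").group() gives 'abcabca456c'


'\A'    Matches only at the start of the string, re.search(r"\Aabc", "alexabc") finds no match
'\Z'    Matches only at the end of the string, similar to $
'\d'    Matches a digit 0-9
'\D'    Matches a non-digit
'\w'    Matches [A-Za-z0-9_]
'\W'    Matches anything not in [A-Za-z0-9_]
'\s'    Matches whitespace such as \t, \n, \r; re.search(r"\s+", "ab\tc1\n3").group() gives '\t'

'(?P<name>...)' Named group, re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})", "371481199306143242").groupdict() gives {'province': '3714', 'city': '81', 'birthday': '1993'}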
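A few of the entries above are easier to trust once they are run. The following sketch checks the MULTILINE and DOTALL flags and the named-group example; the variable names are chosen only for illustration.

```python
import re

# MULTILINE lets ^ match right after each newline
multiline = re.search(r"^a", "\nabc\neee", flags=re.MULTILINE).group()

# Without DOTALL, '.' stops at a newline; with it, the newline is matched too
no_dotall = re.search(r"a.c", "a\nc")
with_dotall = re.search(r"a.c", "a\nc", flags=re.DOTALL).group()

# Named groups can be read back as a dict
id_pattern = r"(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})"
parts = re.search(id_pattern, "371481199306143242").groupdict()

print(multiline, no_dotall, repr(with_dotall), parts)
```

Writing patterns as raw strings (r"...") keeps backslashes such as \d and \s from being interpreted by Python before the regex engine sees them.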

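Most commonly used matching functions

re.match    matches from the start of the string
re.search   matches anywhere in the string
re.findall  returns all matches as the elements of a list
re.split    splits the string, using the matches as separators
re.sub      matches and replaces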
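The five functions can be compared side by side on one input. This is a small sketch; the sample string is made up for the example.

```python
import re

text = "alex22jack23rain31"

at_start = re.match(r"[a-z]+", text).group()     # anchored at position 0
first_number = re.search(r"\d+", text).group()   # first match anywhere
all_numbers = re.findall(r"\d+", text)           # every match, as a list
names = re.split(r"\d+", text)                   # digits act as separators
masked = re.sub(r"\d+", "|", text, count=2)      # replace the first two matches only

print(at_start, first_number, all_numbers, names, masked)
```

Note that re.split leaves an empty string at the end here, because the input finishes with a separator.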