Learning Python: XML processing, YAML processing, hashlib, and the subprocess module

XML processing module

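XML is a format for exchanging data between different languages or programs, much like JSON. JSON is simpler to use, but many systems at traditional companies, for example in the financial industry, still expose mainly XML interfaces.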

The XML format looks like this; the data structure is expressed through nested <> nodes:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E">test</neighbor>
    </country>
</data>


XML is supported in virtually every language. In Python you can work with it using the following module:

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()
print(root.tag)

print('-' * 50)
# Walk the whole XML document
for child in root:
    print(child.tag, child.attrib)
    for i in child:
        print(i.tag, i.text, i.attrib)

print('-' * 50)
# Iterate over the year nodes only
for node in root.iter('year'):
    print(node.tag, node.text)


Output:

data
--------------------------------------------------
country {'name': 'Liechtenstein'}
rank 2 {'updated': 'yes'}
year 2008 {}
gdppc 141100 {}
neighbor None {'direction': 'E', 'name': 'Austria'}
neighbor None {'direction': 'W', 'name': 'Switzerland'}
country {'name': 'Singapore'}
rank 5 {'updated': 'yes'}
year 2011 {}
gdppc 59900 {}
neighbor None {'direction': 'N', 'name': 'Malaysia'}
country {'name': 'Panama'}
rank 69 {'updated': 'yes'}
year 2011 {}
gdppc 13600 {}
neighbor None {'direction': 'W', 'name': 'Costa Rica'}
neighbor test {'direction': 'E', 'name': 'Colombia'}
--------------------------------------------------
year 2008
year 2011
year 2011
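The traversal above reads xmltest.xml from disk. As a minimal sketch that needs no file, the snippet below parses a shortened copy of the same document from a string and pulls out individual values with find(), findtext(), findall() and get(). The summarize() helper and the XML_TEXT constant are names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# A trimmed-down copy of the sample document, kept in a string so that
# no file on disk is needed.
XML_TEXT = """<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <neighbor name="Costa Rica" direction="W"/>
    </country>
</data>
"""


def summarize(xml_text):
    """Return {country name: (rank, year, [neighbor names])}."""
    root = ET.fromstring(xml_text)  # returns the root Element directly
    summary = {}
    for country in root.findall("country"):    # direct children named "country"
        rank = int(country.find("rank").text)   # first matching child element
        year = int(country.findtext("year"))    # shortcut for find("year").text
        neighbors = [n.get("name") for n in country.findall("neighbor")]
        summary[country.get("name")] = (rank, year, neighbors)
    return summary


result = summarize(XML_TEXT)
for name, info in result.items():
    print(name, info)
```

ET.fromstring() returns the root element itself, whereas ET.parse() returns a tree object on which getroot() has to be called.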


Modifying and deleting XML content

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()

# Modify: add one to every year and mark the node as updated
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set("updated", "yes")

tree.write("xmltest.xml")

# Delete: remove every country whose rank is greater than 50
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)

tree.write('output.xml')


Creating an XML document yourself

import xml.etree.ElementTree as ET


new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml, "name", attrib={"enrolled": "yes"})
age = ET.SubElement(name, "age", attrib={"checked": "no"})
sex = ET.SubElement(name, "sex")
sex.text = '33'
name2 = ET.SubElement(new_xml, "name", attrib={"enrolled": "no"})
age = ET.SubElement(name2, "age")
age.text = '19'

et = ET.ElementTree(new_xml)  # build the document object
et.write("test.xml", encoding="utf-8", xml_declaration=True)

ET.dump(new_xml)  # print the generated XML


PyYAML module

Python can also handle the YAML document format easily, though you need to install a third-party module first. Reference documentation: http://pyyaml.org/wiki/PyYAMLDocumentation

hashlib module

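Used for hashing (message digest) operations, which are often loosely described as encryption. In Python 3.x it replaces the md5 and sha modules and mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512 and MD5 algorithms.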

import hashlib

m = hashlib.md5()
m.update(b"Hello")
m.update(b"It's me")
print(m.digest())
m.update(b"It's been a long time since last time we ...")

print(m.digest())  # hash as raw bytes
print(m.hexdigest())  # hash as a hexadecimal string
print(len(m.hexdigest()))  # length of the hexadecimal string

Output:
b']\xde\xb4{/\x92Z\xd0\xbf$\x9cR\xe3Br\x8a'
b'\xa0\xe9\x89E\x03\xcb\x9f\x1a\x14\xaa\x07?<\xae\xfa\xa5'
a0e9894503cb9f1a14aa073f3caefaa5
32
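Successive update() calls accumulate, so the digest after several calls is the digest of the concatenated input. The short sketch below checks that property; md5_hex() is a helper name invented for this illustration, while hashlib.new() is the standard way to pick an algorithm by name.

```python
import hashlib


def md5_hex(*chunks):
    """Feed each bytes chunk to a single md5 object and return the hex digest."""
    m = hashlib.md5()
    for chunk in chunks:
        m.update(chunk)
    return m.hexdigest()


# Two update() calls versus one call on the concatenated bytes
incremental = md5_hex(b"Hello", b"It's me")
one_shot = hashlib.md5(b"HelloIt's me").hexdigest()
print(incremental == one_shot)

# hashlib.new() selects the algorithm by its name
sha = hashlib.new("sha256", b"HelloIt's me").hexdigest()
print(len(sha))  # a SHA-256 hex digest is 64 characters long
```

This is why a large file can be hashed piece by piece instead of being loaded into memory at once.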


import hashlib

# In Python 3, update() only accepts bytes, hence b'admin' rather than 'admin'.
# The variable is named h to avoid shadowing the built-in hash().

# ######## md5 ########

h = hashlib.md5()
h.update(b'admin')
print(h.hexdigest())

# ######## sha1 ########

h = hashlib.sha1()
h.update(b'admin')
print(h.hexdigest())

# ######## sha256 ########

h = hashlib.sha256()
h.update(b'admin')
print(h.hexdigest())

# ######## sha384 ########

h = hashlib.sha384()
h.update(b'admin')
print(h.hexdigest())

# ######## sha512 ########

h = hashlib.sha512()
h.update(b'admin')
print(h.hexdigest())


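Python also has an hmac module, which internally combines the key and the message we supply before hashing them.

A hash-based message authentication code, HMAC for short, is an authentication mechanism built on the message authentication code (MAC). When HMAC is used, the two communicating parties judge whether a message is genuine by checking a value computed from the message and a shared secret key K.

It is typically used to authenticate messages in network communication. The two sides must first agree on a key, much like a secret passphrase. The sender computes a digest of the message with the key and sends it along. The receiver computes the digest again from the key plus the plaintext message and compares the result with the sender's value. If they are equal, both the integrity of the message and the legitimacy of the sender are confirmed.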
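The exchange described above can be sketched with the standard hmac module. The key and message values, and the sign() and verify() helper names, are placeholders chosen for this illustration; SHA-256 is an arbitrary choice of digest.

```python
import hashlib
import hmac


def sign(key, message):
    """Sender side: compute the HMAC tag of the message under the shared key."""
    return hmac.new(key, message, digestmod=hashlib.sha256).hexdigest()


def verify(key, message, tag):
    """Receiver side: recompute the tag and compare it in constant time."""
    expected = sign(key, message)
    return hmac.compare_digest(expected, tag)


shared_key = b"agreed-in-advance"    # hypothetical key both sides already hold
message = b"transfer 100 to bob"     # hypothetical message
tag = sign(shared_key, message)

print(verify(shared_key, message, tag))                  # genuine message
print(verify(shared_key, b"transfer 900 to bob", tag))   # tampered message
print(verify(b"wrong-key", message, tag))                # sender without the key
```

hmac.compare_digest() is used instead of == so that the comparison time does not reveal how many leading characters matched.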

One more note: hashlib cannot hash Unicode strings (str) directly; they must be encoded to bytes first, as follows:

import hashlib

a = hashlib.md5()
a.update('崔'.encode(encoding='utf-8'))  # encode the str to bytes before hashing
print(a.digest())

subprocess module

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. This module intends to replace several older modules and functions:

os.system
os.spawn*

The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle. For more advanced use cases, the underlying Popen interface can be used directly.

The run() function was added in Python 3.5; if you need to retain compatibility with older versions, see the Older high-level API section.

subprocess.run(args, *, stdin=None, input=None, stdout=None, stderr=None, shell=False, timeout=None, check=False)

Run the command described by args. Wait for command to complete, then return a CompletedProcess instance.

The arguments shown above are merely the most common ones, described below in Frequently Used Arguments (hence the use of keyword-only notation in the abbreviated signature). The full function signature is largely the same as that of the Popen constructor - apart from timeout, input and check, all the arguments to this function are passed through to that interface.

This does not capture stdout or stderr by default. To do so, pass PIPE for the stdout and/or stderr arguments.

The timeout argument is passed to Popen.communicate(). If the timeout expires, the child process will be killed and waited for. The TimeoutExpired exception will be re-raised after the child process has terminated.

The input argument is passed to Popen.communicate() and thus to the subprocess’s stdin. If used it must be a byte sequence, or a string if universal_newlines=True. When used, the internal Popen object is automatically created with stdin=PIPE, and the stdin argument may not be used as well.

If check is True, and the process exits with a non-zero exit code, a CalledProcessError exception will be raised. Attributes of that exception hold the arguments, the exit code, and stdout and stderr if they were captured.
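The arguments described above can be tried out in one short sketch. To stay independent of the operating system's commands, it launches the current Python interpreter (sys.executable) as the child process; the one-line child programs are made up for this illustration.

```python
import subprocess
import sys

PY = sys.executable  # run the same interpreter as the child process

# stdout=PIPE captures the output; universal_newlines=True returns str, not bytes
done = subprocess.run([PY, "-c", "print('hello')"],
                      stdout=subprocess.PIPE, universal_newlines=True)
print(done.returncode, repr(done.stdout))

# input= is sent to the child's stdin
echoed = subprocess.run([PY, "-c", "import sys; print(sys.stdin.read().upper())"],
                        input="abc", stdout=subprocess.PIPE,
                        universal_newlines=True)
print(repr(echoed.stdout))

# check=True turns a non-zero exit code into CalledProcessError
failed_code = None
try:
    subprocess.run([PY, "-c", "import sys; sys.exit(3)"], check=True)
except subprocess.CalledProcessError as exc:
    failed_code = exc.returncode
print(failed_code)

# timeout= kills the child and raises TimeoutExpired once the limit passes
timed_out = False
try:
    subprocess.run([PY, "-c", "import time; time.sleep(30)"], timeout=1)
except subprocess.TimeoutExpired:
    timed_out = True
print(timed_out)
```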

Examples of commonly used subprocess functions

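# Run a command and return its exit status (0 or non-zero)
>>> retcode = subprocess.call(["ls", "-l"])

# Run a command; return normally if the exit status is 0, otherwise raise an exception
>>> subprocess.check_call(["ls", "-l"])
0

# Take the command as a string and return a tuple: (exit status, command output)
>>> subprocess.getstatusoutput('ls /bin/ls')
(0, '/bin/ls')

# Take the command as a string and return its output
>>> subprocess.getoutput('ls /bin/ls')
'/bin/ls'

# Run a command and return its output (returned, not printed); here it is stored in res
>>> res = subprocess.check_output(['ls', '-l'])
>>> res
b'total 0\ndrwxr-xr-x 12 alex staff 408 Nov 2 11:05 OldBoyCRM\n'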
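# All of the functions above are wrappers around subprocess.Popen, whose objects offer:
poll()
Check if child process has terminated. Returns returncode

wait()
Wait for child process to terminate. Returns returncode attribute.

terminate() Terminate the process that was started.
communicate() Send optional input, read stdout and stderr until end of file, and wait for the process to finish.

stdin Standard input

stdout Standard output

stderr Standard error

pid
The process ID of the child process.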
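A minimal sketch of these methods and attributes in action follows. The child processes are again the current Python interpreter running throwaway one-liners, which is an assumption made to keep the example portable.

```python
import subprocess
import sys

# A long-running child that we stop ourselves
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
print(child.pid)                    # process ID of the child

first_poll = child.poll()           # None while the child is still running
child.terminate()                   # ask the child to stop
exit_code = child.wait(timeout=10)  # block until it has exited
print(first_poll, exit_code, child.poll())

# A short-lived child whose output we collect with communicate()
quick = subprocess.Popen([sys.executable, "-c", "print('done')"],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = quick.communicate()      # reads both pipes and waits for exit
print(out, err, quick.returncode)
```

After terminate(), the exit code depends on the platform (on POSIX it is the negative signal number), so code that must be portable should not rely on a specific value.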

# Example
>>> p = subprocess.Popen("df -h|grep disk",stdin=subprocess.PIPE,stdout=subprocess.PIPE,shell=True)
>>> p.stdout.read()
b'/dev/disk1 465Gi 64Gi 400Gi 14% 16901472 104938142 14% /\n'

>>> subprocess.run(["ls", "-l"])  # doesn't capture output
CompletedProcess(args=['ls', '-l'], returncode=0)

>>> subprocess.run("exit 1", shell=True, check=True)
Traceback (most recent call last):
  ...
subprocess.CalledProcessError: Command 'exit 1' returned non-zero exit status 1

>>> subprocess.run(["ls", "-l", "/dev/null"], stdout=subprocess.PIPE)
CompletedProcess(args=['ls', '-l', '/dev/null'], returncode=0,
stdout=b'crw-rw-rw- 1 root root 1, 3 Jan 23 16:23 /dev/null\n')

Calling subprocess.run(...) is the recommended approach and covers most needs, but if you need more complex interaction with the system you can use subprocess.Popen() directly, as follows:

# The trailing \; ends the -exec clause; it is escaped so the shell passes it on to find
p = subprocess.Popen("find / -size +1000000 -exec ls -shl {} \\;",shell=True,stdout=subprocess.PIPE)
print(p.stdout.read())

Available parameters:

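args: the command to run, either a string or a sequence (such as a list or tuple)
bufsize: buffering. 0 means unbuffered, 1 means line buffered, any other positive value is the buffer size, and a negative value means the system default
stdin, stdout, stderr: the program's standard input, standard output and standard error handles
preexec_fn: Unix only; a callable object that is called in the child process just before the child is executed
close_fds: on Windows, if close_fds is set to True, the newly created child process does not inherit the parent's input, output and error pipes.
For that reason close_fds could not be set to True while also redirecting the child's stdin, stdout and stderr (this restriction applied before Python 3.7).
shell: as described above; if True, the command is executed through the shell
cwd: sets the current working directory of the child process
env: the environment variables for the child process. If env = None, the child inherits the parent's environment.
universal_newlines: line endings differ between systems; True means they are all normalized to \n and the streams are opened in text mode
startupinfo and creationflags: Windows only
They are passed to the underlying CreateProcess() function and set properties of the child process such as the appearance of the main window and the process priority.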
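As a small sketch of three of these parameters, the snippet below starts a child with its own working directory (cwd), an extra environment variable (env) and text-mode output (universal_newlines). DEMO_FLAG is a variable name made up for this illustration, and the child is the current Python interpreter.

```python
import os
import subprocess
import sys
import tempfile

# The child prints its working directory and one environment variable
CHILD_CODE = "import os; print(os.getcwd()); print(os.environ.get('DEMO_FLAG'))"

# Start from the parent's environment and add one variable
child_env = dict(os.environ)
child_env["DEMO_FLAG"] = "on"   # hypothetical variable name

with tempfile.TemporaryDirectory() as workdir:
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        cwd=workdir,               # child starts in this directory
        env=child_env,             # child sees exactly this environment
        stdout=subprocess.PIPE,
        universal_newlines=True,   # decode output to str, newlines as \n
    )
    output, _ = proc.communicate()
    reported_cwd, reported_flag = output.splitlines()
    # realpath() smooths over symlinked temp directories
    same_dir = os.path.realpath(reported_cwd) == os.path.realpath(workdir)

print(same_dir, reported_flag, proc.returncode)
```

Copying os.environ before adding to it keeps variables such as PATH available to the child; passing a dict with only the new variable would replace the whole environment.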

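Commands typed into a terminal fall into two kinds:

Commands that produce their output right away, e.g. ifconfig
Commands that enter an environment and then wait for further input, e.g. python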

Example of a command that requires interaction

import subprocess

obj = subprocess.Popen(["python"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# The pipes are binary by default, so each statement is written as bytes
obj.stdin.write(b'print(1)\n')
obj.stdin.write(b'print(2)\n')
obj.stdin.write(b'print(3)\n')
obj.stdin.write(b'print(4)\n')

# communicate() closes stdin, waits for the child and returns (stdout, stderr)
out_error_list = obj.communicate(timeout=10)
print(out_error_list)
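When all of the input is known in advance, it can also be handed over in a single communicate(input=...) call instead of separate stdin.write() calls. The sketch below assumes the child is the current interpreter (sys.executable) started with "-", which tells it to read its program from standard input.

```python
import subprocess
import sys

interpreter = subprocess.Popen(
    [sys.executable, "-"],        # "-" makes the interpreter read code from stdin
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,      # exchange str instead of bytes
)

script = "print(1)\nprint(2)\nprint(3)\nprint(4)\n"

# Sends the script, closes stdin, then collects stdout and stderr
out, err = interpreter.communicate(input=script, timeout=10)
print(out.split())
```

Letting communicate() do the writing also avoids the deadlock that can occur when a child fills its output pipe while the parent is still blocked writing to stdin.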

Using subprocess to enter the sudo password automatically

import subprocess

def mypass():
    mypass = '123'  # or get the password from anywhere
    return mypass

# echo writes the password to its stdout
echo = subprocess.Popen(['echo', mypass()],
                        stdout=subprocess.PIPE,
                        )

# sudo -S reads the password from stdin, which is connected to echo's stdout
sudo = subprocess.Popen(['sudo', '-S', 'iptables', '-L'],
                        stdin=echo.stdout,
                        stdout=subprocess.PIPE,
                        )

end_of_pipe = sudo.stdout

print("Password ok\nIptables Chains %s" % end_of_pipe.read().decode())
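The sudo example works by connecting one process's stdout to another's stdin. The same wiring is shown below as a sketch that needs neither sudo nor a password: both stages are the current Python interpreter running made-up one-liners, with the first producing lines and the second upper-casing whatever it reads.

```python
import subprocess
import sys

# First stage: writes three lines to its stdout
producer = subprocess.Popen(
    [sys.executable, "-c", "print('alpha'); print('beta'); print('gamma')"],
    stdout=subprocess.PIPE,
)

# Second stage: reads from the first stage's stdout and upper-cases it
consumer = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
)

# Close the parent's copy so the producer notices if the consumer exits early
producer.stdout.close()

result, _ = consumer.communicate()
producer.wait()
print(result.split())
```

For the password case specifically, subprocess.run() with the input argument is another way to feed a single value to a child's stdin without a separate echo process.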