[Python Road, Day 7] Fundamentals

Today's contents:

Modules

configparser

xml

shutil

zipfile

tarfile

subprocess

Object-oriented programming (Part 1)

I. Modules

The previous post covered several commonly used modules (os, hashlib, sys, re). This post continues with the remaining ones.

1. The configparser module

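configparser is part of Python's standard library and handles INI-style configuration files. The format resembles a MySQL configuration file: a file contains several sections, and each section contains several key-value options, for example: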

[mysqld]
basedir = /usr/local/mysql
datadir = /data/mysql
socket = /data/mysql/mysql.sock

[client]
host = localhost
port = 3306
socket = /data/mysql/mysqld.sock

Assume the configuration file is named my.cnf.

(1) Get all sections: the sections() method

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Author: DBQ(Du Baoqiang)


import configparser

config = configparser.ConfigParser()   # create a parser object (a bit like getting a logger in the logging module)
config.read('my.cnf',encoding='utf-8')  # read the configuration file as UTF-8

result = config.sections() # sections() returns all section names as a list
print(result)


# Result
['mysqld', 'client']
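To try sections() without creating my.cnf on disk, ConfigParser.read_string() can parse the same content from a string. A minimal sketch; the SAMPLE string simply mirrors the my.cnf shown above:

```python
import configparser

# Same content as the my.cnf example above, kept in a string for convenience
SAMPLE = """
[mysqld]
basedir = /usr/local/mysql
datadir = /data/mysql
socket = /data/mysql/mysql.sock

[client]
host = localhost
port = 3306
socket = /data/mysql/mysqld.sock
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)     # parse from a string instead of a file

print(config.sections())       # ['mysqld', 'client']
print('mysqld' in config)      # True: ConfigParser also supports the mapping protocol
print(config['client']['port'])  # values are always returned as strings
```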

(2) Get all key-value pairs in a section: the items() method

import configparser

config = configparser.ConfigParser()
config.read('my.cnf',encoding='utf-8')

# items() returns all key-value pairs of the given section as a list of (key, value) tuples
result = config.items('mysqld')
print(result)

# Result:
[('basedir', '/usr/local/mysql'), ('datadir', '/data/mysql'), ('socket', '/data/mysql/mysql.sock')]

(3) Get all keys in a section: the options() method

import configparser

config = configparser.ConfigParser()
config.read('my.cnf',encoding='utf-8')

# options() returns all keys of the given section as a list
result = config.options('mysqld')
print(result)

# Result:
['basedir', 'datadir', 'socket']

(4) Get the value of a key in a section: the get() method

import configparser

config = configparser.ConfigParser()
config.read('my.cnf',encoding='utf-8')
# get(): the socket key in the mysqld section
result = config.get('mysqld','socket')
print(result)

# Output:
/data/mysql/mysql.sock
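get() always returns a string. ConfigParser also provides getint(), getfloat() and getboolean(), and every getter accepts a keyword-only fallback that is returned when the option is missing. A short sketch: the [client] values mirror my.cnf, and the option name `timeout` is hypothetical and deliberately absent:

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[client]
host = localhost
port = 3306
""")

port = config.getint('client', 'port')     # converted to int
host = config.get('client', 'host')        # always a str
# 'timeout' is a hypothetical option that does not exist, so the fallback is used
timeout = config.getint('client', 'timeout', fallback=30)

print(port, type(port).__name__)   # 3306 int
print(host, timeout)               # localhost 30
```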

(5) Check for, add, and remove a section: has_section(), add_section(), remove_section()

import configparser

config = configparser.ConfigParser()
config.read('my.cnf',encoding='utf-8')

# Check whether the given section exists; returns a boolean, True if it exists
result = config.has_section('mysqld')

print(result)
# Result: True

#######################################
# Add the section mysqldump
config.add_section('mysqldump')
with open('my.cnf','w') as f:  # changes are made in memory; write() is needed to persist them to the file
    config.write(f)
result = config.sections()
print(result)

# Check the result with sections():
['mysqld', 'client', 'mysqldump']


#######################################
# Remove the section mysqldump
config.remove_section('mysqldump')  # use remove_section()
with open('my.cnf','w') as f:  # again, changes stay in memory until write() saves them to the file
    config.write(f)
result = config.sections()
print(result)

# Check the result after removal with sections():
['mysqld', 'client']

(6) Check, add, and remove key-value pairs in a section

# Check whether the socket option exists in the mysqld section
import configparser

config = configparser.ConfigParser()
config.read('my.cnf',encoding='utf-8')
# has_option() returns a boolean, True if the option exists
result = config.has_option('mysqld','socket')
print(result)
# Output:
True


#####################################
# Add the key innodb_file_per_table with value 1 to mysqld
# using the set() method (values must be strings)
config.set('mysqld','innodb_file_per_table','1')
with open('my.cnf','w') as f:  # again, write the in-memory data to the file
    config.write(f)
result = config.options('mysqld')
print(result)

# Output:
['basedir', 'datadir', 'socket', 'innodb_file_per_table']



#####################################
# Remove the innodb_file_per_table key from mysqld
# using the remove_option() method
config.remove_option('mysqld','innodb_file_per_table')
with open('my.cnf','w') as f:  # write the in-memory data to the file
    config.write(f)
result = config.options('mysqld')
print(result)

# Output:
['basedir', 'datadir', 'socket']

configparser makes working with INI configuration files convenient. Under the hood it still opens the file with open() and parses the contents.
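The full edit cycle can be checked without touching my.cnf: the sketch below applies the same add/set/remove operations and writes to an io.StringIO object standing in for the file, then parses the written text again. For a real file, replace the buffer with `with open('my.cnf', 'w') as f: config.write(f)`.

```python
import configparser
import io

config = configparser.ConfigParser()
config.read_string("""
[mysqld]
basedir = /usr/local/mysql
socket = /data/mysql/mysql.sock
""")

config.add_section('mysqldump')                     # new, empty section
config.set('mysqld', 'innodb_file_per_table', '1')  # values must be strings
config.remove_option('mysqld', 'socket')

buffer = io.StringIO()          # stand-in for open('my.cnf', 'w')
config.write(buffer)
text = buffer.getvalue()
print(text)

# Re-read what was written to confirm the changes were persisted
check = configparser.ConfigParser()
check.read_string(text)
print(check.sections())         # ['mysqld', 'mysqldump']
print(check.options('mysqld'))  # ['basedir', 'innodb_file_per_table']
```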

2. The xml module

XML is widely used for data exchange on the Internet and is also a common format for storing application data. Here is an XML example:

<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
        <title>Planet Python</title>
        <link>http://planet.python.org/</link>
        <language>en</language>
        <description>Planet Python - http://planet.python.org/</description>
        <item>
            <title>Steve Holden: Python for Data Analysis</title>
            <guid>http://holdenweb.blogspot.com/...-data-analysis.html</guid>
            <link>http://holdenweb.blogspot.com/...-data-analysis.html</link>
            <description>...</description>
            <pubDate>Mon, 19 Nov 2012 02:13:51 +0000</pubDate>
        </item>
        <item>
            <title>Vasudev Ram: The Python Data model (for v2 and v3)</title>
            <guid>http://jugad2.blogspot.com/...-data-model.html</guid>
            <link>http://jugad2.blogspot.com/...-data-model.html</link>
            <description>...</description>
            <pubDate>Sun, 18 Nov 2012 22:06:47 +0000</pubDate>
        </item>
        <item>
            <title>Python Diary: Been playing around with Object Databases</title>
            <guid>http://www.pythondiary.com/...-object-databases.html</guid>
            <link>http://www.pythondiary.com/...-object-databases.html</link>
            <description>...</description>
            <pubDate>Sun, 18 Nov 2012 20:40:29 +0000</pubDate>
        </item>
        ...
    </channel>
</rss>

(1) Parsing XML

Use XML() to parse a string into an Element object

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Author: DBQ(Du Baoqiang)

from xml.etree import ElementTree as ET

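# Read the file into a string
with open('example.xml','r',encoding='utf-8') as f:
    result_xml = f.read()
# Parse the string into an Element; root is the root node of the document
root = ET.XML(result_xml)
print(root)
# Result: an Element object is returned
<Element 'rss' at 0x1018772c8>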

Use xml.etree.ElementTree.parse() to parse a whole XML file into an ElementTree (document) object; its getroot() method returns the root Element.

tree = ET.parse('example.xml')
root = tree.getroot()
print(root)

# Output:
<Element 'rss' at 0x1019772c8>
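ET.XML() and ET.fromstring() are equivalent ways to parse a string, and both return the root Element directly. A minimal sketch using a trimmed-down version of the RSS sample, held in a string so that no file is needed:

```python
from xml.etree import ElementTree as ET

# A trimmed version of example.xml
RSS = """<?xml version="1.0"?>
<rss version="2.0">
    <channel>
        <title>Planet Python</title>
        <link>http://planet.python.org/</link>
        <item><title>First</title></item>
        <item><title>Second</title></item>
    </channel>
</rss>"""

root = ET.fromstring(RSS)        # same as ET.XML(RSS); returns the root Element
print(root.tag, root.attrib)     # rss {'version': '2.0'}

channel = root.find('channel')   # first direct child named 'channel'
print(channel.findtext('title')) # Planet Python
titles = [item.findtext('title') for item in channel.findall('item')]
print(titles)                    # ['First', 'Second']
```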

(2) Manipulating XML

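XML consists of nodes nested inside nodes, so every node (an Element object) provides the following functionality (an annotated excerpt of the Element class from the standard library source):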

class Element:
    """An XML element.

    This class is the reference implementation of the Element interface.

    An element's length is its number of subelements.  That means if you
    want to check if an element is truly empty, you should check BOTH
    its length AND its text attribute.

    The element tag, attribute names, and attribute values can be either
    bytes or strings.

    *tag* is the element name.  *attrib* is an optional dictionary containing
    element attributes. *extra* are additional element attributes given as
    keyword arguments.

    Example form:
        <tag attrib>text<child/>...</tag>tail

    """

    # Tag name of the current node
    tag = None
    """The element's name."""

    # Attributes of the current node

    attrib = None
    """Dictionary of the element's attributes."""

    # Text content of the current node
    text = None
    """
    Text before first subelement. This is either a string or the value None.
    Note that if there is no text, this attribute may be either
    None or the empty string, depending on the parser.

    """

    tail = None
    """
    Text after this element's end tag, but before the next sibling element's
    start tag.  This is either a string or the value None.  Note that if there
    was no text, this attribute may be either None or an empty string,
    depending on the parser.

    """

    def __init__(self, tag, attrib={}, **extra):
        if not isinstance(attrib, dict):
            raise TypeError("attrib must be dict, not %s" % (
                attrib.__class__.__name__,))
        attrib = attrib.copy()
        attrib.update(extra)
        self.tag = tag
        self.attrib = attrib
        self._children = []

    def __repr__(self):
        return "<%s %r at %#x>" % (self.__class__.__name__, self.tag, id(self))

    def makeelement(self, tag, attrib):
        # Create a new node
        """Create a new element with the same type.

        *tag* is a string containing the element name.
        *attrib* is a dictionary containing the element attributes.

        Do not call this method, use the SubElement factory function instead.

        """
        return self.__class__(tag, attrib)

    def copy(self):
        """Return copy of current element.

        This creates a shallow copy. Subelements will be shared with the
        original tree.

        """
        elem = self.makeelement(self.tag, self.attrib)
        elem.text = self.text
        elem.tail = self.tail
        elem[:] = self
        return elem

    def __len__(self):
        return len(self._children)

    def __bool__(self):
        warnings.warn(
            "The behavior of this method will change in future versions.  "
            "Use specific 'len(elem)' or 'elem is not None' test instead.",
            FutureWarning, stacklevel=2
            )
        return len(self._children) != 0 # emulate old behaviour, for now

    def __getitem__(self, index):
        return self._children[index]

    def __setitem__(self, index, element):
        # if isinstance(index, slice):
        #     for elt in element:
        #         assert iselement(elt)
        # else:
        #     assert iselement(element)
        self._children[index] = element

    def __delitem__(self, index):
        del self._children[index]

    def append(self, subelement):
        # Append a child node to the current node
        """Add *subelement* to the end of this element.

        The new element will appear in document order after the last existing
        subelement (or directly after the text, if it's the first subelement),
        but before the end tag for this element.

        """
        self._assert_is_element(subelement)
        self._children.append(subelement)

    def extend(self, elements):
        # Extend the current node with n child nodes
        """Append subelements from a sequence.

        *elements* is a sequence with zero or more elements.

        """
        for element in elements:
            self._assert_is_element(element)
        self._children.extend(elements)

    def insert(self, index, subelement):
        # Insert a child node at the given position among the current node's children
        """Insert *subelement* at position *index*."""
        self._assert_is_element(subelement)
        self._children.insert(index, subelement)

    def _assert_is_element(self, e):
        # Need to refer to the actual Python implementation, not the
        # shadowing C implementation.
        if not isinstance(e, _Element_Py):
            raise TypeError('expected an Element, not %s' % type(e).__name__)

    def remove(self, subelement):
        # Remove a node from the current node's children
        """Remove matching subelement.

        Unlike the find methods, this method compares elements based on
        identity, NOT ON tag value or contents.  To remove subelements by
        other means, the easiest way is to use a list comprehension to
        select what elements to keep, and then use slice assignment to update
        the parent element.

        ValueError is raised if a matching element could not be found.

        """
        # assert iselement(element)
        self._children.remove(subelement)

    def getchildren(self):
        # Get all child nodes (deprecated)
        """(Deprecated) Return all subelements.

        Elements are returned in document order.

        """
        warnings.warn(
            "This method will be removed in future versions.  "
            "Use 'list(elem)' or iteration over elem instead.",
            DeprecationWarning, stacklevel=2
            )
        return self._children

    def find(self, path, namespaces=None):
        # Get the first matching child node
        """Find first matching element by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return the first matching element, or None if no element was found.

        """
        return ElementPath.find(self, path, namespaces)

    def findtext(self, path, default=None, namespaces=None):
        # Get the text of the first matching child node
        """Find text for first matching element by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *default* is the value to return if the element was not found,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return text content of first matching element, or default value if
        none was found.  Note that if an element is found having no text
        content, the empty string is returned.

        """
        return ElementPath.findtext(self, path, default, namespaces)

    def findall(self, path, namespaces=None):
        # Get all matching child nodes
        """Find all matching subelements by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Returns list containing all matching elements in document order.

        """
        return ElementPath.findall(self, path, namespaces)

    def iterfind(self, path, namespaces=None):
        # Get all matching nodes as an iterator (usable in a for loop)
        """Find all matching subelements by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return an iterable yielding all matching elements in document order.

        """
        return ElementPath.iterfind(self, path, namespaces)

    def clear(self):
        # Clear the node
        """Reset element.

        This function removes all subelements, clears all attributes, and sets
        the text and tail attributes to None.

        """
        self.attrib.clear()
        self._children = []
        self.text = self.tail = None

    def get(self, key, default=None):
        # Get an attribute value of the current node
        """Get element attribute.

        Equivalent to attrib.get, but some implementations may handle this a
        bit more efficiently.  *key* is what attribute to look for, and
        *default* is what to return if the attribute was not found.

        Returns a string containing the attribute value, or the default if
        attribute was not found.

        """
        return self.attrib.get(key, default)

    def set(self, key, value):
        # Set an attribute value on the current node
        """Set element attribute.

        Equivalent to attrib[key] = value, but some implementations may handle
        this a bit more efficiently.  *key* is what attribute to set, and
        *value* is the attribute value to set it to.

        """
        self.attrib[key] = value

    def keys(self):
        # Get the keys of all attributes of the current node

        """Get list of attribute names.

        Names are returned in an arbitrary order, just like an ordinary
        Python dict.  Equivalent to attrib.keys()

        """
        return self.attrib.keys()

    def items(self):
        # Get all attributes of the current node as (key, value) pairs
        """Get element attributes as a sequence.

        The attributes are returned in arbitrary order.  Equivalent to
        attrib.items().

        Return a list of (name, value) tuples.

        """
        return self.attrib.items()

    def iter(self, tag=None):
        # Find all nodes with the given tag in the current node and its descendants; returns an iterator (usable in a for loop)
        """Create tree iterator.

        The iterator loops over the element and all subelements in document
        order, returning all elements with a matching tag.

        If the tree structure is modified during iteration, new or removed
        elements may or may not be included.  To get a stable set, use the
        list() function on the iterator, and loop over the resulting list.

        *tag* is what tags to look for (default is to return all elements)

        Return an iterator containing all the matching elements.

        """
        if tag == "*":
            tag = None
        if tag is None or self.tag == tag:
            yield self
        for e in self._children:
            yield from e.iter(tag)

    # compatibility
    def getiterator(self, tag=None):
        # Change for a DeprecationWarning in 1.4
        warnings.warn(
            "This method will be removed in future versions.  "
            "Use 'elem.iter()' or 'list(elem.iter())' instead.",
            PendingDeprecationWarning, stacklevel=2
        )
        return list(self.iter(tag))

    def itertext(self):
        # Iterate over the text of the current node and all of its descendants; returns an iterator (usable in a for loop)
        """Create text iterator.

        The iterator loops over the element and all subelements in document
        order, returning all inner text.

        """
        tag = self.tag
        if not isinstance(tag, str) and tag is not None:
            return
        if self.text:
            yield self.text
        for e in self:
            yield from e.itertext()
            if e.tail:
                yield e.tail

Element (node) functionality

Iterate over the whole XML document:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Author: DBQ(Du Baoqiang)

from xml.etree import ElementTree as ET

# Parse an XML string
# result_xml = open('example.xml','r').read()
# root = ET.XML(result_xml)

# Parse the XML file directly
tree = ET.parse('example.xml')
# Get the root node
root = tree.getroot()

# Operations
# Top-level tag
print(root.tag)

# Iterate over the second level of the document
for i in root:
    print(i.tag,i.attrib)
    # Iterate over the third level
    for j in i:
        print(j.tag,j.attrib)
        # Tags and text of the fourth level
        for c in j:
            print(c.tag,c.text)


Iterate over specific nodes in the XML document:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Author: DBQ(Du Baoqiang)

from xml.etree import ElementTree as ET

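# Parse an XML string
# result_xml = open('example.xml','r').read()
# root = ET.XML(result_xml)

# Parse the XML file directly
tree = ET.parse('example.xml')
# Get the root node
root = tree.getroot()

# Operations
# Top-level tag
print(root.tag)

# Iterate over the second level of the document
for i in root:
    print(i.tag,i.attrib)
    # Recursively iterate over every link node below i; print its tag and content (the .text attribute)
    for j in i.iter('link'):
        print(j.tag,j.text)

# Output: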
rss
channel {}
link http://planet.python.org/
link http://holdenweb.blogspot.com/...-data-analysis.html
link http://jugad2.blogspot.com/...-data-model.html
link http://www.pythondiary.com/...-object-databases.html

Iterating over the content of the link nodes
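The difference between iter() and findall() matters here: iter() walks the element itself and all of its descendants, while findall('link') only looks at direct children (an ElementPath expression such as './/link' searches at any depth). A small sketch on an in-memory document:

```python
from xml.etree import ElementTree as ET

root = ET.fromstring(
    "<channel>"
    "<link>a</link>"
    "<item><link>b</link></item>"
    "<item><link>c</link></item>"
    "</channel>"
)

direct = [e.text for e in root.findall('link')]       # direct children only
anywhere = [e.text for e in root.findall('.//link')]  # any depth, via ElementPath
walked = [e.text for e in root.iter('link')]          # recursive tree walk

print(direct)    # ['a']
print(anywhere)  # ['a', 'b', 'c']
print(walked)    # ['a', 'b', 'c']
```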

Modifying nodes

As with configparser, modifications are made in memory and do not change the original XML file. To keep them, write the in-memory data back to an XML file.

Again there are two approaches: modify and save after parsing a string, or modify and save after parsing the file directly.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Author: DBQ(Du Baoqiang)

from xml.etree import ElementTree as ET

# Parse an XML string
with open('example.xml','r',encoding='utf-8') as f:
    result_xml = f.read()
# Get the root node
root = ET.XML(result_xml)

# Top-level tag
print('Top-level tag: %s'%root.tag)

# Loop over all 'item' nodes

for i in root:
    for j in i.iter('item'):
        # Replace the node's text:
        j.text = 'http://www.jd.com'

        # Add attributes:
        j.set('linkA','https://www.dbq168.com')
        j.set('linkB','http://www.sina.com.cn')

        # Delete an existing attribute:
        del j.attrib['linkB']
        print(type(j.attrib))
        print(j.attrib)
# Save the in-memory changes to a new file, example_new.xml
tree = ET.ElementTree(root)
tree.write('example_new.xml',encoding='utf-8')

Modifying and saving after parsing a string

from xml.etree import ElementTree as ET
# Parse the XML file directly
tree = ET.parse('example.xml')
# Get the root node
root = tree.getroot()

# Top-level tag
print('Top-level tag: %s'%root.tag)

for i in root:
    for j in i.iter('item'):
        # Replace the node's text:
        j.text = 'http://www.jd.com'

        # Add attributes:
        j.set('linkA','https://www.dbq168.com')
        j.set('linkB','http://www.sina.com.cn')

        # Delete an existing attribute:
        del j.attrib['linkB']
        print(type(j.attrib))
        print(j.attrib)

# Save to example_new2.xml
tree.write('example_new2.xml',encoding='utf-8')

Modifying and saving after parsing the file
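Because tree.write() only persists what is in memory, it can help to inspect the result with ET.tostring() before writing a file. The sketch below edits the text and attributes of every link inside an item and prints the serialized result; the input URLs and the attribute name `rel` are made up for illustration:

```python
from xml.etree import ElementTree as ET

root = ET.fromstring(
    "<rss><channel>"
    "<item><link>http://example.com/1</link></item>"
    "<item><link>http://example.com/2</link></item>"
    "</channel></rss>"
)

for link in root.iter('link'):
    link.text = 'http://www.jd.com'    # replace the element's text
    link.set('rel', 'alternate')       # add an attribute (hypothetical name)
    link.attrib.pop('missing', None)   # delete a key that may be absent without a KeyError

xml_text = ET.tostring(root, encoding='unicode')  # str instead of bytes
print(xml_text)
```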

Deleting nodes: remove the language node under every node

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Author: DBQ(Du Baoqiang)

from xml.etree import ElementTree as ET

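# Parse an XML string
with open('example.xml','r',encoding='utf-8') as f:
    result_xml = f.read()
# Get the root node
root = ET.XML(result_xml)

# Top-level tag
print('Top-level tag: %s'%root.tag)

# Iterate: remove the language children of each second-level node
for i in root:
    for j in i.findall('language'):
        i.remove(j)

# Save the file:
tree = ET.ElementTree(root)
tree.write('example_new.xml',encoding='utf-8')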

Deleting nodes and saving after parsing a string

from xml.etree import ElementTree as ET
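# Parse the XML file directly
tree = ET.parse('example.xml')
# Get the root node
root = tree.getroot()

# Top-level tag
print('Top-level tag: %s'%root.tag)

# Iterate: remove the language children of each second-level node
for i in root:
    for j in i.findall('language'):
        i.remove(j)

# Save the file:

tree.write('example_new2.xml',encoding='utf-8')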

Deleting nodes and saving after parsing the file
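Element.remove() only deletes direct children, and ElementTree elements do not keep a reference to their parent. To delete nodes deeper in the tree, iterate over the parents instead. A sketch that removes every guid from every item; the document is a cut-down stand-in for example.xml:

```python
from xml.etree import ElementTree as ET

root = ET.fromstring(
    "<rss><channel>"
    "<item><title>A</title><guid>g1</guid></item>"
    "<item><title>B</title><guid>g2</guid></item>"
    "</channel></rss>"
)

# Iterate over the parents, not the targets: remove() must be called on the parent.
# list() takes a stable snapshot because the tree is modified inside the loop.
for item in list(root.iter('item')):
    for guid in item.findall('guid'):   # findall returns a list, so removing is safe
        item.remove(guid)

print(ET.tostring(root, encoding='unicode'))
```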

(3) Creating an XML document

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Author: DBQ(Du Baoqiang)

from xml.etree import ElementTree as ET

# Create the root node
root = ET.Element('company')

# Create the child nodes: the ceo
ceo = ET.Element('manager',{'ceo':'xiaomage'})

# Create the coo
coo = ET.Element('manager',{'coo':'xiaoyang'})

# Create the cto
cto = ET.Element('manager',{'cto':'laowang'})

# Create departments under the cto
dev = ET.Element('tech',{'dev':'xiaoli'})
ops = ET.Element('tech',{'ops':'xiaoqiang'})

cto.append(dev)
cto.append(ops)

# Add the manager nodes to root
root.append(ceo)
root.append(coo)
root.append(cto)

tree = ET.ElementTree(root)
tree.write('company.xml',encoding='utf-8',short_empty_elements=False)

# Output (no indentation by default):
<company><manager ceo="xiaomage"></manager><manager coo="xiaoyang"></manager><manager cto="laowang"><tech dev="xiaoli"></tech><tech ops="xiaoqiang"></tech></manager></company>

Method 1: ET.Element() plus append()

from xml.etree import ElementTree as ET

# Create the root node
root = ET.Element('company')

# Create the executives
ceo = root.makeelement('manager',{'ceo':'xiaomage'})
coo = root.makeelement('manager',{'coo':'xiaoyang'})
cto = root.makeelement('manager',{'cto':'laowang'})

# Create the tech departments
dev = cto.makeelement('tech',{'dev':'xiaoli'})
ops = cto.makeelement('tech',{'ops':'xiaoqiang'})

cto.append(dev)
cto.append(ops)

# Add the departments to the root node
root.append(ceo)
root.append(coo)
root.append(cto)

tree = ET.ElementTree(root)
tree.write('company_v2.xml',encoding='utf-8',short_empty_elements=False)

# Output (no indentation by default):
<company><manager ceo="xiaomage"></manager><manager coo="xiaoyang"></manager><manager cto="laowang"><tech dev="xiaoli"></tech><tech ops="xiaoqiang"></tech></manager></company>

Method 2: makeelement() plus append()

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Author: DBQ(Du Baoqiang)

from xml.etree import ElementTree as ET

# Create the root node
root = ET.Element('company')

# Create the executives (SubElement creates each child and attaches it to the parent)
ceo = ET.SubElement(root,'manager',attrib={'ceo':'xiaomage'})
coo = ET.SubElement(root,'manager',attrib={'coo':'xiaoyang'})
cto = ET.SubElement(root,'manager',attrib={'cto':'laowang'})
# Create the tech departments; they are already children of cto, so no append() is needed
dev = ET.SubElement(cto,'tech',{'dev':'xiaoli'})
ops = ET.SubElement(cto,'tech',{'ops':'xiaoqiang'})


tree = ET.ElementTree(root)  # build the document (ElementTree) object
tree.write('company_v3.xml',encoding='utf-8',short_empty_elements=False)

# Output (no indentation by default):
<company><manager ceo="xiaomage"></manager><manager coo="xiaoyang"></manager><manager cto="laowang"><tech dev="xiaoli"></tech><tech ops="xiaoqiang"></tech></manager></company>

Method 3: ET.SubElement()

By default the saved XML has no indentation, which makes it hard to read. Indentation can be added as follows:

from xml.etree import ElementTree as ET
from xml.dom import minidom

def prettify(elem):
    '''
    Convert an element to a string and add indentation
    :param elem: the Element to serialize
    :return: the indented XML as a string
    '''
    rough_string = ET.tostring(elem,'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent='    ')

# Create the root node
root = ET.Element('company')

# Create the child nodes: the ceo
ceo = ET.Element('manager',{'ceo':'xiaomage'})

# Create the coo
coo = ET.Element('manager',{'coo':'xiaoyang'})

# Create the cto
cto = ET.Element('manager',{'cto':'laowang'})

# Create departments under the cto
dev = ET.Element('tech',{'dev':'xiaoli'})
ops = ET.Element('tech',{'ops':'xiaoqiang'})

cto.append(dev)
cto.append(ops)

# Add the manager nodes to root
root.append(ceo)
root.append(coo)
root.append(cto)

string = prettify(root)

with open('company.xml','w',encoding='utf-8') as f:
    f.write(string)

# Output:
<?xml version="1.0" ?>
<company>
    <manager ceo="xiaomage"/>
    <manager coo="xiaoyang"/>
    <manager cto="laowang">
        <tech dev="xiaoli"/>
        <tech ops="xiaoqiang"/>
    </manager>
</company>

Adding indentation (method 1)

from xml.etree import ElementTree as ET
from xml.dom import minidom

def prettify(elem):
    '''
    Convert an element to a string and add indentation
    :param elem: the Element to serialize
    :return: the indented XML as a string
    '''
    rough_string = ET.tostring(elem,'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent='    ')

# Create the root node
root = ET.Element('company')

# Create the executives
ceo = root.makeelement('manager',{'ceo':'xiaomage'})
coo = root.makeelement('manager',{'coo':'xiaoyang'})
cto = root.makeelement('manager',{'cto':'laowang'})

# Create the tech departments
dev = cto.makeelement('tech',{'dev':'xiaoli'})
ops = cto.makeelement('tech',{'ops':'xiaoqiang'})

cto.append(dev)
cto.append(ops)

# Add the departments to the root node
root.append(ceo)
root.append(coo)
root.append(cto)

# Call the function to convert the tree into an indented string
string = prettify(root)
# Then open the file with open() and write the converted string
with open('company_v2.xml','w',encoding='utf-8') as f:
    f.write(string)

All three creation methods produce the same result!
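On Python 3.9 and later, ElementTree has its own ET.indent() function, which indents a tree in place and avoids the round trip through minidom. A sketch that rebuilds the same company tree with SubElement():

```python
from xml.etree import ElementTree as ET   # ET.indent() requires Python 3.9+

root = ET.Element('company')
ET.SubElement(root, 'manager', {'ceo': 'xiaomage'})
ET.SubElement(root, 'manager', {'coo': 'xiaoyang'})
cto = ET.SubElement(root, 'manager', {'cto': 'laowang'})
ET.SubElement(cto, 'tech', {'dev': 'xiaoli'})
ET.SubElement(cto, 'tech', {'ops': 'xiaoqiang'})

tree = ET.ElementTree(root)
ET.indent(tree, space='    ')   # modifies the tree in place by adding whitespace text/tails

text = ET.tostring(root, encoding='unicode')
print(text)
# To save it: tree.write('company.xml', encoding='utf-8', xml_declaration=True)
```

Unlike the minidom approach, this keeps working with ElementTree objects throughout, so the indented tree can still be written with tree.write().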

(4) Namespaces

from xml.etree import ElementTree as ET
from xml.dom import minidom

def prettify(elem):
    '''
    Convert an element to a string and add indentation
    :param elem: the Element to serialize
    :return: the indented XML as a string
    '''
    rough_string = ET.tostring(elem,'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent='    ')


ET.register_namespace('com','http://www.dbq168.com')

# Namespaced tags and attributes are written as {URI}name
root = ET.Element('{http://www.dbq168.com}STUFF')
body = ET.SubElement(root,'{http://www.dbq168.com}MORE_STUFF', attrib={'{http://www.dbq168.com}hhh':'123'})
body.text = 'STUFF EVERYWHERE!'

string = prettify(root)
with open('test.xml','w',encoding='utf-8') as f:
    f.write(string)

Namespaces
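Reading namespaced XML works the other way round: tags come back in the {URI}localname form, and find()/findall() accept a prefix-to-URI mapping so short prefixes can be used in paths. A sketch using the same namespace URI as the example above:

```python
from xml.etree import ElementTree as ET

NS = {'com': 'http://www.dbq168.com'}   # prefix -> URI mapping used in lookups

doc = """<com:STUFF xmlns:com="http://www.dbq168.com">
    <com:MORE_STUFF com:hhh="123">STUFF EVERYWHERE!</com:MORE_STUFF>
</com:STUFF>"""

root = ET.fromstring(doc)
print(root.tag)                                # {http://www.dbq168.com}STUFF

body = root.find('com:MORE_STUFF', NS)         # prefix resolved through NS
print(body.text)                               # STUFF EVERYWHERE!
print(body.get('{http://www.dbq168.com}hhh'))  # attributes use {URI}name too: 123
```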


3. The shutil module

shutil provides high-level file operations: copying, moving, and deleting files and directories, and handling archives.

Operations:

shutil.copyfileobj(fsrc, fdst[, length]) copies the contents of one file object to another; length is the buffer size used for each chunk

import shutil
with open('data.xml','r') as fsrc, open('data_new.xml','w') as fdst:
    shutil.copyfileobj(fsrc, fdst)

# This creates data_new.xml with exactly the same content as data.xml, similar to cat data.xml > data_new.xml in the shell

shutil.copyfile(src, dst) copies a file's contents directly (no metadata). If dst already exists it is overwritten, so be careful.

import shutil
import os

shutil.copyfile('data.xml','data_new2.xml')

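result = os.listdir(os.path.dirname(os.path.abspath(__file__)))  # abspath avoids an empty dirname for relative paths
for i in result:
    print(i)
# After running, a new file appears in the current directory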
data_new2.xml

shutil.copymode(src, dst) copies only the permission bits; the owner, group, and file contents are unchanged.

# First check the file permissions:
os.system('ls -l data.xml')
-rw-r--r--  1 daniel  staff  690 Jun 20 19:15 data.xml

# Check the permissions of data_new2.xml
os.system('ls -l data_new2.xml')
-rw-------  1 daniel  everyone  690 Jun 21 11:40 data_new2.xml

# Now copy the permissions
shutil.copymode('data.xml','data_new2.xml')
os.system('ls -l data_new2.xml')

# Result: the permissions now match data.xml, but owner and group are unchanged
-rw-r--r--  1 daniel  everyone  690 Jun 21 11:40 data_new2.xml

shutil.copystat(src, dst) copies status information: permission bits, last access time (atime), last modification time (mtime), and flags

# First look at the status information of both files:
print(os.stat('data.xml'))
print(os.stat('data_new2.xml'))
# Output:
os.stat_result(st_mode=33188, st_ino=7289084, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=690, st_atime=1466480420, st_mtime=1466421339, st_ctime=1466421339)
os.stat_result(st_mode=33188, st_ino=7295271, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=12, st_size=690, st_atime=1466480420, st_mtime=1466480420, st_ctime=1466480490)


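# Copy the status information of data.xml to data_new2.xml, then check again:
shutil.copystat('data.xml','data_new2.xml')
print(os.stat('data.xml'))
print(os.stat('data_new2.xml'))
# Output: mtime is now synchronized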
os.stat_result(st_mode=33188, st_ino=7289084, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=690, st_atime=1466480420, st_mtime=1466421339, st_ctime=1466421339)
os.stat_result(st_mode=33188, st_ino=7295271, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=12, st_size=690, st_atime=1466480420, st_mtime=1466421339, st_ctime=1466480937)

shutil.copy(src, dst) copies the file contents and permission bits

shutil.copy('data.xml','data2.xml')

print(os.stat('data.xml'))
print(os.stat('data2.xml'))


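# Result: the permission bits (st_mode) match; uid and gid also match here only because the current user created the copy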
os.stat_result(st_mode=33188, st_ino=7289084, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=690, st_atime=1466481121, st_mtime=1466421339, st_ctime=1466421339)
os.stat_result(st_mode=33188, st_ino=7295817, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=690, st_atime=1466481121, st_mtime=1466481121, st_ctime=1466481121)

shutil.copy2(src, dst) copies the file contents plus its status information (like copystat)

shutil.copy2('data.xml','data3.xml')

print(os.stat('data.xml'))
print(os.stat('data3.xml'))

# Output: atime and mtime match
os.stat_result(st_mode=33188, st_ino=7289084, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=690, st_atime=1466481242, st_mtime=1466421339, st_ctime=1466421339)
os.stat_result(st_mode=33188, st_ino=7295871, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=690, st_atime=1466481242, st_mtime=1466421339, st_ctime=1466481242)
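The differences between copyfile(), copy() and copy2() can be checked without reading ls output. The sketch below works in a temporary directory (the file names are illustrative), gives the source a distinctive mode and a fixed modification time, and records what each function carries over. POSIX permission semantics are assumed.

```python
import os
import shutil
import stat
import tempfile

FIXED_MTIME = 1_000_000_000   # arbitrary timestamp used to detect copied metadata
MODE = 0o604                  # unusual permission bits, unlikely to come from the umask

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'data.xml')
    with open(src, 'w') as f:
        f.write('<data/>')
    os.chmod(src, MODE)
    os.utime(src, (FIXED_MTIME, FIXED_MTIME))   # set atime and mtime

    results = {}
    for func in (shutil.copyfile, shutil.copy, shutil.copy2):
        dst = os.path.join(tmp, func.__name__ + '.xml')
        func(src, dst)
        st = os.stat(dst)
        results[func.__name__] = (stat.S_IMODE(st.st_mode) == MODE,  # mode copied?
                                  int(st.st_mtime) == FIXED_MTIME)  # mtime copied?

for name, (mode_copied, mtime_copied) in results.items():
    print(f'{name:9} mode copied: {mode_copied}, mtime copied: {mtime_copied}')
```

Expected pattern: copyfile() copies neither, copy() copies the mode only, and copy2() copies both.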

shutil.ignore_patterns(*patterns) creates a function that ignores files matching the given glob patterns; pass it as the ignore argument of copytree().

It is used together with shutil.copytree(src, dst, symlinks=False, ignore=None).

import shutil
import os
# First see what files dir1 contains:
result = os.listdir('dir1')
for i in result:
    print(i)

__init__.py
company.xml
company_v2.xml
company_v3.xml
data.tar
data.xml
data.zip
data2.xml
data3.xml
data_new
data_new.xml
data_new2.xml
example.xml
example_new.xlm
example_new.xml
example_new2.xml
my.cnf
test_popen
test_popen_2


# Exclude files with the .xml suffix when copying
shutil.copytree('dir1','dir2', ignore=shutil.ignore_patterns('*.xml'))
# Note: the destination dir2 must not exist yet, otherwise an error is raised (Python 3.8+ accepts dirs_exist_ok=True)
result = os.listdir('dir2')
for i in result:
    print(i)

# Result:
__init__.py
data.tar
data.zip
data_new
example_new.xlm
my.cnf
test_popen
test_popen_2
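A self-contained version of the copytree() call above, built in a temporary directory so it can be rerun freely (the file names are made up). It also shows that the ignore function is applied inside subdirectories, not just at the top level:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'dir1')
    os.makedirs(os.path.join(src, 'sub'))
    for name in ('a.xml', 'b.txt', os.path.join('sub', 'c.xml'), os.path.join('sub', 'd.mp3')):
        with open(os.path.join(src, name), 'w') as f:
            f.write('x')

    dst = os.path.join(tmp, 'dir2')   # must not exist yet (unless dirs_exist_ok=True)
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns('*.xml'))

    # Collect every copied file as a path relative to dst, using '/' separators
    copied = sorted(
        os.path.relpath(os.path.join(dirpath, fn), dst).replace(os.sep, '/')
        for dirpath, _, files in os.walk(dst)
        for fn in files
    )

print(copied)   # ['b.txt', 'sub/d.mp3']
```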

# symlinks defaults to False. Suppose dir1 contains a symbolic link test.py:
>>> os.system('ls -l dir1')
total 4
-rw-r--r-- 1 root root    0 Jun 21 12:09 1.mp3
-rw-r--r-- 1 root root    0 Jun 21 12:09 1.txt
-rw-r--r-- 1 root root    0 Jun 21 12:09 1.xml
drwxr-xr-x 3 root root 4096 Jun 21 13:27 2
-rw-r--r-- 1 root root    0 Jun 21 12:09 2.mp3
-rw-r--r-- 1 root root    0 Jun 21 12:09 2.txt
-rw-r--r-- 1 root root    0 Jun 21 12:09 2.xml
-rw-r--r-- 1 root root    0 Jun 21 12:09 3.mp3
-rw-r--r-- 1 root root    0 Jun 21 12:09 3.txt
-rw-r--r-- 1 root root    0 Jun 21 12:09 3.xml
lrwxrwxrwx 1 root root   12 Jun 21 13:27 test.py -> /tmp/test.py

# Copy the directory tree with the default settings (symlinks=False)
>>> shutil.copytree('dir1','dir2',ignore=shutil.ignore_patterns('*.xml'))
'dir2'

# Check dir2
>>> os.system('ls -l dir2')
total 4
-rw-r--r-- 1 root root    0 Jun 21 12:09 1.mp3
-rw-r--r-- 1 root root    0 Jun 21 12:09 1.txt
drwxr-xr-x 3 root root 4096 Jun 21 13:27 2
-rw-r--r-- 1 root root    0 Jun 21 12:09 2.mp3
-rw-r--r-- 1 root root    0 Jun 21 12:09 2.txt
-rw-r--r-- 1 root root    0 Jun 21 12:09 3.mp3
-rw-r--r-- 1 root root    0 Jun 21 12:09 3.txt
-rw-r--r-- 1 root root    0 Jun 16 10:59 test.py   # the symlink's target was copied as a regular file



# Use the symlinks parameter
>>> shutil.copytree('dir1','dir3',symlinks=True,ignore=shutil.ignore_patterns('*.xml'))
'dir3'
>>> os.system('ls -l dir3')
total 4
-rw-r--r-- 1 root root    0 Jun 21 12:09 1.mp3
-rw-r--r-- 1 root root    0 Jun 21 12:09 1.txt
drwxr-xr-x 3 root root 4096 Jun 21 13:27 2
-rw-r--r-- 1 root root    0 Jun 21 12:09 2.mp3
-rw-r--r-- 1 root root    0 Jun 21 12:09 2.txt
-rw-r--r-- 1 root root    0 Jun 21 12:09 3.mp3
-rw-r--r-- 1 root root    0 Jun 21 12:09 3.txt
lrwxrwxrwx 1 root root   12 Jun 21 13:27 test.py -> /tmp/test.py   # the symlink itself was copied

shutil.rmtree(path, ignore_errors=False, onerror=None) recursively deletes a directory tree

>>> shutil.rmtree('dir3')
>>> os.listdir('.')
['dir2', 'dir1']
# dir3 has been deleted

shutil.move(src, dst) recursively moves (renames) a file or directory, like the mv command in the shell

>>> os.listdir('.')
['dir2', 'dir1']
>>> shutil.move('dir2','/tmp/dir_2')
'/tmp/dir_2'
>>> os.listdir('.')
['dir1']
>>> os.listdir('/tmp')
['dir_2', 'test.py', '1_bak', 'hsperfdata_root']

shutil.make_archive(base_name, format, root_dir=None, base_dir=None, verbose=0,dry_run=0, owner=None, group=None, logger=None)

Creates an archive file and returns its path.

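base_name: name of the archive file without the extension; it may include a path (absolute or relative). Without a path, the archive is saved in the current directory.

format: archive type, e.g. zip, tar, bztar, gztar

root_dir: directory that becomes the root of the archive (paths inside the archive are relative to it); defaults to the current directory

base_dir: directory to archive, relative to root_dir; defaults to root_dir itself

owner: owner recorded in tar archives; defaults to the current user

group: group recorded in tar archives; defaults to the current group

logger: used for logging, usually a logging.Logger object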

>>> shutil.make_archive('dir1','gztar',base_dir='dir1')
'dir1.tar.gz'
# Check the archive that was created
>>> os.listdir('.')
['dir1.tar.gz', 'dir1']

# Specify an absolute path
>>> shutil.make_archive('/tmp/dir1','gztar',base_dir='dir1')
'/tmp/dir1.tar.gz'
>>> os.listdir('/tmp')
['dir_2', 'dir1.tar.gz', 'test.py', '1_bak', 'hsperfdata_root']
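The roles of root_dir and base_dir are easiest to see by listing the archive: member names are relative to root_dir and start with base_dir. A sketch in a temporary directory (all names are illustrative) that also reads the archive back with tarfile and unpacks it with shutil.unpack_archive():

```python
import os
import shutil
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    project = os.path.join(tmp, 'project')
    os.makedirs(os.path.join(project, 'dir1'))
    with open(os.path.join(project, 'dir1', 'my.cnf'), 'w') as f:
        f.write('[client]\nport = 3306\n')

    # base_name has no extension; make_archive appends '.tar.gz' for 'gztar'
    archive = shutil.make_archive(os.path.join(tmp, 'backup'), 'gztar',
                                  root_dir=project, base_dir='dir1')

    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())

    shutil.unpack_archive(archive, os.path.join(tmp, 'restored'))
    restored = os.path.exists(os.path.join(tmp, 'restored', 'dir1', 'my.cnf'))

print(os.path.basename(archive))  # backup.tar.gz
print(names)                      # ['dir1', 'dir1/my.cnf']
print(restored)                   # True
```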

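The zipfile module handles zip archives: it can create and extract them and add or list their contents. ZipFile is the main class; each archived file, together with its metadata (header information), is represented as a ZipInfo object that is a member of the ZipFile.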

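The tarfile module reads and writes tar archives and also supports gzip- and bzip2-compressed tar files (tar.gz and tar.bz2). Its main entry point is tarfile.open(), which opens an archive and returns a TarFile instance.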

其实shutil对压缩包的处理就是调用ZipFile 和 TarFile两个模块来处理的。

import zipfile

#Compress
zip_file = zipfile.ZipFile('test.zip','w')

zip_file.write('s2_xml.py')
zip_file.write('s3_xml2.py')
zip_file.close()
#After running the code above, a test.zip file is created in the current directory



#Extract:
zf = zipfile.ZipFile('test.zip','r')
zf.extractall()
zf.close()

#After manually deleting s2_xml.py and s3_xml2.py, running the code above restores both files
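Two details are worth knowing here: ZipFile writes members uncompressed by default (ZIP_STORED), and it can be used as a context manager so close() is never forgotten. This minimal sketch assumes a temporary directory and a generated stand-in for s2_xml.py; it writes with ZIP_DEFLATED, deletes the original, and restores it with extractall().

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 's2_xml.py')
    with open(src, 'wb') as f:
        f.write(b'print("xml")\n' * 50)  # 650 bytes of repetitive, easily compressed text

    zip_path = os.path.join(tmp, 'test.zip')
    # ZIP_DEFLATED compresses members; the default, ZIP_STORED, only stores them.
    # arcname sets the name inside the archive (without the temporary directory prefix).
    with zipfile.ZipFile(zip_path, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(src, arcname='s2_xml.py')

    os.remove(src)  # simulate deleting the original file

    out_dir = os.path.join(tmp, 'out')
    with zipfile.ZipFile(zip_path, 'r') as zf:
        info = zf.getinfo('s2_xml.py')  # ZipInfo: metadata for one member
        zf.extractall(out_dir)          # extract everything into out_dir

    restored = os.path.isfile(os.path.join(out_dir, 's2_xml.py'))

print(restored)                                             # True
print(info.file_size, info.compress_size < info.file_size)  # 650 True
```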

To extract a single file:

#To extract just one file, you first need to know what the archive contains.
zf = zipfile.ZipFile('test.zip','r')
#namelist() returns the names of the files in the archive as a list
result = zf.namelist()
for i in result:
    print(i)

#Output of the for loop:
s2_xml.py
s3_xml2.py

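#To extract only s2_xml.py, first delete that file from the current directory,
#then extract it with extract(), and close the archive afterwards
zf.extract('s2_xml.py')
zf.close()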
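Besides extract(), ZipFile.read() returns a member's content as bytes without writing anything to disk, and extract() accepts a target directory and returns the path it wrote. The sketch below assumes an in-memory archive (io.BytesIO) with stand-in contents for s2_xml.py and s3_xml2.py, so it needs no existing files.

```python
import io
import os
import tempfile
import zipfile

# Build a small archive in memory so the sketch needs no files on disk
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('s2_xml.py', 'print("s2")\n')
    zf.writestr('s3_xml2.py', 'print("s3")\n')

with zipfile.ZipFile(buf, 'r') as zf:
    names = zf.namelist()             # member names, in the order they were added
    s2_source = zf.read('s2_xml.py')  # one member's content as bytes; nothing written to disk

with tempfile.TemporaryDirectory() as tmp, zipfile.ZipFile(buf, 'r') as zf:
    # extract() takes an optional target directory and returns the extracted file's path
    extracted = zf.extract('s3_xml2.py', path=tmp)
    extracted_ok = os.path.isfile(extracted)

print(names)         # ['s2_xml.py', 's3_xml2.py']
print(s2_source)     # b'print("s2")\n'
print(extracted_ok)  # True
```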

tarfile

import tarfile

#Create the archive

tar_file = tarfile.open('test.tar','w')
tar_file.add('s3_xml2.py',arcname='s3_xml2.py')
tar_file.add('s4_xml.py',arcname='wobugaosuni.py')
tar_file.close()

#arcname is the file's name inside the archive; it can differ from the source file's name

#Delete the two original files, then extract
tar_file = tarfile.open('test.tar','r')
tar_file.extractall()
tar_file.close()

#Note that s4_xml.py was stored in the archive as wobugaosuni.py, so the extracted file also has that name

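#To extract a single file from the archive, first check which files it contains: getnames() returns a list of member names. Then extract the one you need with extract()
tar_file = tarfile.open('test.tar','r')

print(tar_file.getnames())   #['s3_xml2.py', 'wobugaosuni.py']
tar_file.extract('s3_xml2.py')

tar_file.close()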
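The "further compression" mentioned above is selected through the mode string: 'w:gz' writes a gzip-compressed tar, and plain 'r' auto-detects compression when reading. This minimal sketch assumes a temporary directory and a generated stand-in for s4_xml.py, and uses TarFile as a context manager.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 's4_xml.py')
    with open(src, 'w') as f:
        f.write('print("s4")\n')

    tar_path = os.path.join(tmp, 'test.tar.gz')
    # 'w:gz' writes a gzip-compressed tar; arcname renames the member inside the archive
    with tarfile.open(tar_path, 'w:gz') as tf:
        tf.add(src, arcname='wobugaosuni.py')

    # Mode 'r' transparently detects the gzip compression when reading
    with tarfile.open(tar_path, 'r') as tf:
        names = tf.getnames()
        tf.extract('wobugaosuni.py', path=os.path.join(tmp, 'out'))

    extracted = os.path.isfile(os.path.join(tmp, 'out', 'wobugaosuni.py'))

print(names)      # ['wobugaosuni.py']
print(extracted)  # True
```

On Python 3.12 and later, extract() and extractall() also accept a filter argument (for example filter='data') that guards against malicious archives; it is left out here so the sketch also runs on older versions.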

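4. The subprocess module

The subprocess module lets you spawn new processes, connect to their input, output and error pipes, and obtain their return codes.

The recommended way to start a subprocess is through the convenience functions. For more advanced use cases that they cannot handle, use the underlying Popen interface.

subprocess.call(args, *, stdin=None, stdout=None, stderr=None, shell=False) runs the command described by args, waits for it to complete, then returns its exit status (returncode).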

import subprocess

result = subprocess.call(['ls','-l'])  #shell defaults to False, so a command with arguments must be passed as a list
print(result)
#Alternatively, set shell=True and pass the command as a single string, just as you would type it in a shell:
result = subprocess.call('ls -l',shell=True)
print(result)
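Since call() returns the exit status rather than the output, a non-zero status is the only sign of failure. The sketch below is a portable variant, assuming the current interpreter (sys.executable) as the child program instead of ls so it runs on any platform; it shows a successful child and one that exits with status 3.

```python
import subprocess
import sys

# The child's output goes straight to the terminal; call() only returns the exit status
ok = subprocess.call([sys.executable, '-c', 'print("hello")'])
# sys.exit(3) in the child becomes return code 3 in the parent
failed = subprocess.call([sys.executable, '-c', 'import sys; sys.exit(3)'])

print(ok, failed)  # 0 3 (printed after the child's own "hello" line)
```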

subprocess.check_call(args, *, stdin=None, stdout=None, stderr=None, shell=False) runs the command with arguments and waits for it to complete. If the exit status is 0 it returns 0; otherwise it raises CalledProcessError.

result = subprocess.check_call('ls -l',shell=True)
#On success, check_call() returns 0; the command's own output goes straight to the terminal
print(result)

#Running a nonexistent command raises an exception:
result1 = subprocess.check_call('sb',shell=True)
print(result1)

.....
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'sb' returned non-zero exit status 127
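In practice the exception is usually caught and inspected. This minimal sketch assumes sys.executable as a stand-in failing command (exiting with status 2) and a hypothetical program name sb-no-such-program. It also shows that without shell=True a missing program never starts, so Python raises FileNotFoundError; the status 127 above is reported by the shell.

```python
import subprocess
import sys

status = None
failed_cmd = None
try:
    subprocess.check_call([sys.executable, '-c', 'import sys; sys.exit(2)'])
except subprocess.CalledProcessError as exc:
    status = exc.returncode  # the child's exit status
    failed_cmd = exc.cmd     # the args that were run

missing = False
try:
    # Hypothetical program name that should not exist on the system
    subprocess.check_call(['sb-no-such-program'])
except FileNotFoundError:
    missing = True

print(status, missing)  # 2 True
```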

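subprocess.check_output(args, *, stdin=None, stderr=None, shell=False, universal_newlines=False)

Runs the command with arguments and returns its output as a byte string (or as str when universal_newlines=True). If the exit status is non-zero, it raises CalledProcessError.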

>>> subprocess.check_output(['ls','-l'])
b'total 8\ndrwxr-xr-x 3 root root 4096 Jun 21 13:27 dir1\n-rw-r--r-- 1 root root  276 Jun 21 13:55 dir1.tar.gz\n'

#A non-zero exit status raises an exception
>>> subprocess.check_output(['ls','-l','/sb'])
ls: cannot access /sb: No such file or directory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/subprocess.py", line 620, in check_output
    raise CalledProcessError(retcode, process.args, output=output)
subprocess.CalledProcessError: Command '['ls', '-l', '/sb']' returned non-zero exit status 2
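To get text instead of bytes, pass universal_newlines=True; to capture the child's error messages as well, redirect stderr into the same output with stderr=subprocess.STDOUT. The sketch assumes sys.executable as the child program so it is portable; text=True is the newer alias for universal_newlines (Python 3.7+).

```python
import subprocess
import sys

# universal_newlines=True decodes the captured output to str
out = subprocess.check_output(
    [sys.executable, '-c', 'print("line1"); print("line2")'],
    universal_newlines=True)
print(out.splitlines())  # ['line1', 'line2']

# stderr=subprocess.STDOUT merges the child's stderr into the captured output
merged = subprocess.check_output(
    [sys.executable, '-c', 'import sys; sys.stderr.write("oops\\n")'],
    stderr=subprocess.STDOUT, universal_newlines=True)
print(merged.strip())  # oops
```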

class subprocess.Popen(args, bufsize=-1, executable=None, stdin=None, stdout=None, stderr=None, preexec_fn=None, close_fds=True, shell=False, cwd=None, env=None, universal_newlines=False, startupinfo=None, creationflags=0)

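subprocess.Popen(): the Popen constructor accepts a large number of optional arguments. For most typical use cases, many of them can safely be left at their default values. Commonly used arguments:

args: the command to run. With the default shell=False, args should be a sequence such as a list or tuple (a plain string only works if it names the program with no arguments).
bufsize: buffering for the pipe file objects: 0 means unbuffered, 1 means line buffered (text mode only), any other positive value is the buffer size, and a negative value means the system default, which usually means full buffering. The default is -1 since Python 3.3.1 (it was 0 in earlier versions).
stdin, stdout, stderr: the file handles for the program's standard input, standard output and standard error. Valid values are PIPE, an existing file descriptor (a positive integer), an existing file object, and None. PIPE means a new pipe to the child should be created. With the default None, no redirection occurs and the child inherits the parent's file handles. In addition, stderr can be STDOUT, which means the child's standard error is captured into the same file as its standard output.
shell: defaults to False; specifies whether to run the program through the shell. If True, it is recommended to pass args as a string rather than a sequence.
preexec_fn: if set to a callable object, it is called in the child process just before the child program is executed (POSIX only).
close_fds: if true, all file descriptors except 0, 1 and 2 are closed before the child process is executed (POSIX only). On Windows, if true, the child process inherits no handles. Note that on Windows you cannot set close_fds to true and also redirect the standard handles through stdin, stdout or stderr.
cwd: if not None, the child's current directory is changed to cwd before it runs. How a relative program path is resolved is platform-dependent; on POSIX it is looked up relative to cwd.
env: if not None, a mapping that defines the environment variables for the new process; these are used instead of the default behavior of inheriting the current process's environment.
universal_newlines: if true, the file objects stdout and stderr are opened as text files in universal newlines mode. Lines may end with Unix-style '\n', old Macintosh-style '\r' or Windows-style '\r\n'; all of these are seen by the Python program as '\n'.
startupinfo: if given, a STARTUPINFO object that is passed to the underlying CreateProcess function. creationflags, if given, can be CREATE_NEW_CONSOLE or CREATE_NEW_PROCESS_GROUP (Windows only).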
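The cwd and env arguments from the list above can be checked by having the child report its own working directory and environment. This is a minimal sketch under assumptions: the child is the current interpreter (sys.executable), the working directory is a temporary directory, and DEMO_VAR is a hypothetical variable name. Note that env replaces the whole environment, so the sketch starts from a copy of os.environ.

```python
import os
import subprocess
import sys
import tempfile

# Child program: print its working directory, then the hypothetical DEMO_VAR variable
child_code = 'import os; print(os.getcwd()); print(os.environ.get("DEMO_VAR"))'

with tempfile.TemporaryDirectory() as tmp:
    # Copy the parent's environment and add DEMO_VAR; a dict holding only
    # DEMO_VAR would drop PATH and every other inherited variable.
    env = dict(os.environ, DEMO_VAR='hello')
    proc = subprocess.Popen([sys.executable, '-c', child_code],
                            cwd=tmp, env=env,
                            stdout=subprocess.PIPE, universal_newlines=True)
    out = proc.stdout.read()
    proc.stdout.close()
    proc.wait()
    child_cwd, child_var = out.splitlines()
    # samefile() copes with symlinked temp paths (e.g. /var vs /private/var on macOS)
    same_dir = os.path.samefile(child_cwd, tmp)

print(child_var, same_dir)  # hello True
```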

Example: start a python3 interpreter as a child process and have it print hello world:

import subprocess

#Connect the child's standard input, output and error to pipes
obj = subprocess.Popen('python3',stdin=subprocess.PIPE,stdout=subprocess.PIPE,stderr=subprocess.PIPE,universal_newlines=True)

#Then write to stdin
obj.stdin.write('print("hello world")\n')
obj.stdin.write('print("hello world again")\n')
obj.stdin.close()

#Read standard output, then close it
out = obj.stdout.read()
obj.stdout.close()

#Read standard error, then close it
err = obj.stderr.read()
obj.stderr.close()

#Wait for the interpreter to exit
obj.wait()

#Print
print(out)
print(err)
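The same pipe mechanism can chain two processes, the equivalent of a shell pipeline: the first process's stdout becomes the second one's stdin. This minimal sketch assumes two small Python child programs (run via sys.executable) standing in for a producer and a grep-like filter.

```python
import subprocess
import sys

# Producer: prints three lines to its stdout pipe
producer = subprocess.Popen(
    [sys.executable, '-c', 'print("apple"); print("banana"); print("cherry")'],
    stdout=subprocess.PIPE)
# Consumer: reads stdin and keeps only lines containing "an" (like grep an)
consumer = subprocess.Popen(
    [sys.executable, '-c',
     'import sys; sys.stdout.writelines(l for l in sys.stdin if "an" in l)'],
    stdin=producer.stdout, stdout=subprocess.PIPE, universal_newlines=True)
# Close the parent's copy so only the consumer holds the read end of the pipe
producer.stdout.close()

out = consumer.stdout.read()
consumer.stdout.close()
consumer.wait()
producer.wait()

print(out)  # banana
```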

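Popen.communicate(input=None)
Interact with the process: send data to its standard input, read data from standard output and standard error until end-of-file is reached, and wait for the process to terminate. The optional input argument is the data to send to the child process (bytes, or str when universal_newlines=True), or None if no data should be sent to the child.

communicate() returns a tuple (stdoutdata, stderrdata).

Note that if you want to send data to the process's standard input, you need to create the Popen object with stdin=PIPE. Similarly, to get anything other than None in the result tuple, you need to pass stdout=PIPE and/or stderr=PIPE as well.

Note: the data read is buffered in memory, so do not use this method if the data size is large or unlimited.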
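With communicate(), the earlier hello world example shrinks to a single call that writes the input, closes stdin, drains both pipes and waits, which also avoids the deadlocks that manual read/write sequences can run into. The sketch assumes sys.executable in place of a literal python3 command; the timeout argument is available since Python 3.3.

```python
import subprocess
import sys

# With stdin not a terminal, the interpreter runs whatever script arrives on stdin
proc = subprocess.Popen([sys.executable],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE, universal_newlines=True)
# Sends the input, closes stdin, reads stdout/stderr to EOF, then waits for exit
out, err = proc.communicate(
    input='print("hello world")\nprint("hello world again")\n', timeout=30)

print(out.splitlines())  # ['hello world', 'hello world again']
print(proc.returncode)   # 0
```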

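#!/usr/bin/env python
# -*- coding: utf-8 -*-

from subprocess import Popen, PIPE
from tempfile import TemporaryFile
import json
import os
import time

data = []

def get_all_mountpoint():
    #Run `df -P` (POSIX output format); universal_newlines=True makes communicate()
    #return str rather than bytes under Python 3, so startswith('/') below works
    raw_data = Popen(['df', '-P'], stdout=PIPE, stderr=PIPE,
                     universal_newlines=True).communicate()[0].splitlines()
    mountpoints = []
    for line in raw_data:
        if line:
            #The 6th column is the mount point; the header row ("Mounted") does not start with '/'
            element = line.split()[5]
            if element.startswith('/'):
                mountpoints.append(element)
    return mountpoints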

II. Object-Oriented Programming (Part 1)

See the linked post.