The xml Module

1. Introduction to XML

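XML is a markup format for exchanging data between different languages or programs. It serves much the same purpose as JSON, though JSON is simpler to use. In the dark ages before JSON existed, XML was the only real choice, and many systems at traditional companies, especially in the financial industry, still expose mainly XML interfaces.
Files in this format are less common today, but they have not gone away, so a basic understanding is worth keeping in reserve.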

XML file format

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

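2. Basic CRUD Operations on XML (Create, Read, Update, Delete)

Every operation below starts with the same two steps, parsing the file and getting its root element: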

import xml.etree.ElementTree as ET
tree = ET.parse('a.xml')  # parse the file into an element tree
root = tree.getroot()  # get the root element of the tree
print(root)

# Iterating over root yields its direct children:
for i in root:
    print(i)
# <Element 'country' at 0x00000196B51191D8>
# <Element 'country' at 0x00000196B5124B88>
# <Element 'country' at 0x00000196B5124D18>

All create, read, update, and delete operations start from this root element.
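As a quick illustration of what the root element gives you, here is a minimal sketch. It assumes a trimmed-down copy of the sample document held in a string (the name SAMPLE is made up for this example) so that it runs without a.xml on disk; ET.fromstring returns the root element directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened copy of the sample document (assumption for this sketch).
SAMPLE = """<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
    </country>
</data>"""

root = ET.fromstring(SAMPLE)  # returns the root Element, no tree.getroot() needed

print(root.tag)  # data

# Each direct child is itself an Element with tag, attrib, and text.
country_names = [child.attrib["name"] for child in root]
print(country_names)  # ['Liechtenstein', 'Singapore']

# Children can also be reached by index, like a list.
first_rank = root[0][0]
print(first_rank.tag, first_rank.attrib, first_rank.text)
```

For a file on disk, ET.parse('a.xml').getroot() gives the same kind of object.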

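2.1. Read: iter, find, and findall
1) Search the whole document and find every year tag

print(root.iter('year'))  # prints an iterator object, not the elements
print(list(root.iter('year')))  # consume the iterator to see the elements

2) Find only the first match among the direct children of root and return it

print(root.find('country'))

3) Search the direct children of root and return all matches as a list

print(root.findall('country'))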
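The difference between the three lookups is easiest to see on a tag that is not a direct child of the root. The sketch below assumes a shortened copy of the sample document in a string (SAMPLE is a made-up name) and compares what each call returns for year.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened copy of the sample document (assumption for this sketch).
SAMPLE = """<data>
    <country name="Liechtenstein"><year>2008</year></country>
    <country name="Singapore"><year>2011</year></country>
</data>"""

root = ET.fromstring(SAMPLE)

# findall() with a bare tag name only looks at direct children of root,
# and <year> sits one level deeper, so nothing is found.
direct_years = root.findall("year")
print(direct_years)  # []

# iter() walks the whole subtree, so it finds every <year>.
all_years = [y.text for y in root.iter("year")]
print(all_years)  # ['2008', '2011']

# A path lets findall() reach one level down explicitly.
nested_years = [y.text for y in root.findall("country/year")]
print(nested_years)  # ['2008', '2011']

# find() returns None when nothing matches, so check before using the result.
missing = root.find("capital")
print(missing)  # None
```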

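Exercises

Once a tag has been found, its contents are available as well: tag, attrib, text

1) Find all rank tags, along with their attrib and text (a list comprehension is convenient here)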

print([i for i in root.iter('rank')])
#[<Element 'rank' at 0x000001367D0D49F8>, <Element 'rank' at 0x000001367D0D4BD8>, <Element 'rank' at 0x000001367D0D4D68>]
print([i.attrib for i in root.iter('rank')])
#[{'updated': 'yes'}, {'updated': 'yes'}, {'updated': 'yes'}]
print([i.text for i in root.iter('rank')])  # ['2', '5', '69']

2) Find the first neighbor tag of the second country and print its attributes

print(root.findall('country')[1].find('neighbor').attrib)
# {'name': 'Malaysia', 'direction': 'N'}
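Picking a country by position is fragile if the file is reordered. ElementTree's limited XPath support includes attribute predicates of the form [@attrib='value'], so the same lookup can be done by name. A minimal sketch, assuming a copy of the sample data reduced to its neighbor tags (SAMPLE is a made-up name):

```python
import xml.etree.ElementTree as ET

# Hypothetical copy of the sample data, reduced to the neighbor tags.
SAMPLE = """<data>
    <country name="Liechtenstein">
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>"""

root = ET.fromstring(SAMPLE)

# Select the country by its name attribute instead of by index.
singapore = root.find("country[@name='Singapore']")
first_neighbor = singapore.find("neighbor").get("name")
print(first_neighbor)  # Malaysia

# ".//" searches all descendants; the predicate filters on an attribute value.
eastern = [n.get("name") for n in root.findall(".//neighbor[@direction='E']")]
print(eastern)  # ['Austria', 'Colombia']
```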

2.2. Create: append

import xml.etree.ElementTree as ET
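tree = ET.parse('a.xml')  # parse the file into an element tree
root = tree.getroot()  # get the root element of the tree

For every country whose year is later than 2010, add a month tag with the attribute name="month" and the text 30days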

for country in root.findall('country'):
    for year in country.findall('year'):
        if int(year.text) > 2010:
            month = ET.Element('month')
            month.text = '30days'
            month.attrib = {'name': 'month'}
            country.append(month)
tree.write('b.xml')
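append always puts the new element last. If the position matters, Element.insert(index, element) and ET.SubElement are two alternatives. A small sketch on a single, made-up country element (the season tag is purely illustrative and not part of the sample file):

```python
import xml.etree.ElementTree as ET

# Hypothetical single-country fragment (assumption for this sketch).
country = ET.fromstring(
    '<country name="Singapore"><rank>5</rank><year>2011</year></country>'
)

# Build the element first; attributes can be passed at construction time.
month = ET.Element("month", attrib={"name": "month"})
month.text = "30days"

# insert() places the element at a given index among the children.
country.insert(1, month)

# SubElement() creates the element and appends it to the parent in one call.
season = ET.SubElement(country, "season")  # "season" is an illustrative tag
season.text = "summer"

tags = [child.tag for child in country]
print(tags)  # ['rank', 'month', 'year', 'season']
```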

2.3. Update: set

import xml.etree.ElementTree as ET
tree = ET.parse('a.xml')  # parse the file into an element tree
root = tree.getroot()  # get the root element of the tree
# Update the text and the attributes of every year tag
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set('updated', 'yes')
    node.set('version', '1.0')
tree.write('test.xml')
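To check the result without opening the output file, the attributes can be read back with Element.get, which also accepts a default for attributes that are absent. A minimal sketch on a made-up two-element document:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document (assumption for this sketch).
root = ET.fromstring(
    "<data><year>2008</year><year updated='no'>2011</year></data>"
)

for node in root.iter("year"):
    node.text = str(int(node.text) + 1)  # text is always a string
    node.set("updated", "yes")  # adds the attribute or overwrites it

# get() returns the attribute value, or the default if it is missing.
years = [
    (n.text, n.get("updated"), n.get("version", "n/a"))
    for n in root.iter("year")
]
print(years)  # [('2009', 'yes', 'n/a'), ('2012', 'yes', 'n/a')]
```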

2.4. Delete: remove

import xml.etree.ElementTree as ET
tree = ET.parse('a.xml')  # parse the file into an element tree
root = tree.getroot()  # get the root element of the tree

# Remove every country tag whose rank is greater than 50
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)

tree.write('output.xml')
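The loop above is safe because findall returns a list, so elements are removed from root while iterating over a separate snapshot. Modifying an element while iterating over it directly can lead to problems, much like with a Python list. The sketch below makes the snapshot explicit with list(root), using a made-up miniature document, and uses findtext with a default so a country without a rank does not raise an error.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document (assumption for this sketch).
SAMPLE = """<data>
    <country name="Liechtenstein"><rank>2</rank></country>
    <country name="Panama"><rank>69</rank></country>
    <country name="Singapore"><rank>5</rank></country>
</data>"""

root = ET.fromstring(SAMPLE)

# list(root) copies the current children, so removing from root
# does not disturb the loop.
for country in list(root):
    rank = int(country.findtext("rank", default="0"))
    if rank > 50:
        root.remove(country)

remaining = [c.get("name") for c in root]
print(remaining)  # ['Liechtenstein', 'Singapore']
```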

3. Creating an XML Document from Scratch

import xml.etree.ElementTree as ET

new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml, "name", attrib={"enrolled": "yes"})
age = ET.SubElement(name, "age", attrib={"checked": "no"})
sex = ET.SubElement(name, "sex")
sex.text = '33'
name2 = ET.SubElement(new_xml, "name", attrib={"enrolled": "no"})
age = ET.SubElement(name2, "age")
age.text = '19'

et = ET.ElementTree(new_xml)  # wrap the root element in a document object
et.write("test.xml", encoding="utf-8", xml_declaration=True)

ET.dump(new_xml)  # print the generated XML to the console
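ET.dump is meant for debugging output. When the XML is needed as a string, for example to send it over a network, ET.tostring is the usual route. A minimal sketch with a smaller, made-up tree, round-tripped through ET.fromstring to confirm nothing was lost:

```python
import xml.etree.ElementTree as ET

# Smaller, illustrative tree (not the exact one built above).
namelist = ET.Element("namelist")
person = ET.SubElement(namelist, "name", attrib={"enrolled": "yes"})
age = ET.SubElement(person, "age")
age.text = "19"

# encoding="unicode" returns a str; other encodings return bytes.
xml_text = ET.tostring(namelist, encoding="unicode")
print(xml_text)
# <namelist><name enrolled="yes"><age>19</age></name></namelist>

# Parse the string back to check that the structure survived.
reparsed = ET.fromstring(xml_text)
print(reparsed.find("name/age").text)  # 19
```

The output is a single line with no indentation; on Python 3.9 and later, ET.indent(namelist) can be called before serializing if readable output matters.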