How do you parse a simple XML document?

Requirement:
XML is a very widely used markup language that provides a uniform way to describe an application's structured data, for example:

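<domain type='kvm'>
  <name>centos_x86_6.4</name>
  <!-- name: letters and digits only, no spaces -->
  <uuid>b9dcdd92-9b9b-14d6-3938-1982a9746a12</uuid>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc-1.2'>hvm</type>
  </os>
</domain>
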
How do you parse an XML file like this in Python?

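Approach:
Use xml.etree.ElementTree from the standard library. Its parse function reads an XML document (given as a file name or an open file object) and returns an ElementTree object.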
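
A minimal sketch of this approach. It assumes a trimmed-down copy of the domain XML embedded as a string (`SAMPLE_XML` is a name made up for this illustration) so that it runs without kvm.xml on disk; `io.StringIO` stands in for the open file.

```python
import io
from xml.etree.ElementTree import parse

# Hypothetical, trimmed-down stand-in for kvm.xml so the example is self-contained.
SAMPLE_XML = """<domain type='kvm'>
  <name>centos_x86_6.4</name>
  <memory unit='KiB'>2097152</memory>
  <vcpu placement='static'>1</vcpu>
</domain>"""

# parse() accepts a file name or a file-like object and returns an ElementTree.
tree = parse(io.StringIO(SAMPLE_XML))
root = tree.getroot()

print(root.tag)                # domain
print(root.attrib)             # {'type': 'kvm'}
print(root.findtext('name'))   # centos_x86_6.4
```

With the real file, `parse('kvm.xml')` should behave the same way, since parse also accepts a path.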

Code:

kvm.xml:
<domain type='kvm'>
  <name>centos_x86_6.4</name>
  <!-- name: letters and digits only, no spaces -->
  <uuid>b9dcdd92-9b9b-14d6-3938-1982a9746a12</uuid>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc-1.2'>hvm</type>
    <!-- type: full virtualization or paravirtualization; hvm means full virtualization -->
    <boot dev='hd'/>
    <!-- boot: the device to boot from. The dev attribute takes one of the values
         "fd" (floppy), "hd" (hard disk), "cdrom" or "network". The element can be
         repeated with different values to form an ordered list of boot devices. -->
  </os>
  <!-- processor features -->
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features> 
  <clock offset='localtime'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <!-- devices the guest needs -->
    <emulator>/bin/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <!-- path to the disk image; in this example it shows up in the guest as an IDE device -->
      <source file='/home/template_make/centos_x86_6.4.img'>
        <seclabel model='selinux' relabel='no'/>
      </source>
      <target dev='hda' bus='ide'/>
      <alias name='ide0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/home/template_make/CentOS-6.4-x86_64-bin-DVD1.iso'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <!-- how the virtual machine connects to the network -->
      <mac address='52:54:00:78:f9:5a'/>
      <source bridge='br0'/>
      <target dev='vnet27'/>
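      <!-- Use virtio. With the default drivers the disk runs in IDE mode and the NIC is an
           emulated RTL8139 at 100 Mbit/s full duplex; with the virtio driver the NIC runs
           in 1000 Mbit/s mode. This model setting affects only the NIC; the disks above
           still use bus='ide'. -->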
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <!-- log in over VNC; the port number is assigned automatically and can be queried with: virsh vncdisplay domainId -->
    <graphics type='vnc' port='5915' autoport='yes' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>unconfined_u:system_r:svirt_t:s0:c362,c396</label>
    <imagelabel>unconfined_u:object_r:svirt_image_t:s0:c362,c396</imagelabel>
  </seclabel>
</domain>

=========================================================================================
In [1]: from xml.etree.ElementTree import parse

In [3]: f = open('kvm.xml')

In [4]: et = parse(f)

In [5]: root = et.getroot()

In [6]: root
Out[6]: <Element 'domain' at 0x7f88ac6448b8>

In [7]: root.tag
Out[7]: 'domain'

In [8]: root.attrib
Out[8]: {'type': 'kvm'}

In [9]: root.text
Out[9]: '\n  '

In [10]: root.text.strip()
Out[10]: ''
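
To make the difference between tag, attrib and text concrete, here is a small sketch that reads the memory size. It assumes a shortened copy of the document embedded as a string; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for kvm.xml (an assumption made for this illustration).
root = ET.fromstring("""<domain type='kvm'>
  <name>centos_x86_6.4</name>
  <memory unit='KiB'>2097152</memory>
</domain>""")

memory = root.find('memory')

# .text is always a string, so convert it before doing arithmetic.
memory_kib = int(memory.text)
# .get() reads one attribute and returns None when it is absent.
unit = memory.get('unit')

print(memory.tag, memory_kib, unit)   # memory 2097152 KiB
print(memory_kib // 1024, 'MiB')      # 2048 MiB

# root.text is only the whitespace between <domain> and its first child.
print(repr(root.text))                # '\n  '
```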

In [11]: root.getchildren  # the method that returns the child elements (not called here)
Out[11]: <function Element.getchildren()>

In [12]: root.getchildren()
/usr/bin/ipython:1: DeprecationWarning: This method will be removed in future versions.  Use 'list(elem)' or iteration over elem instead.
  #!/usr/local/python3/bin/python3.7
Out[12]:
[<Element 'name' at 0x7f88ac644b88>,
 <Element 'uuid' at 0x7f88ac6445e8>,
 <Element 'memory' at 0x7f88ac6449f8>,
 <Element 'currentMemory' at 0x7f88ac6446d8>,
 <Element 'vcpu' at 0x7f88ac644548>,
 <Element 'os' at 0x7f88ac644728>,
 <Element 'features' at 0x7f88ac644f48>,
 <Element 'clock' at 0x7f88ac644098>,
 <Element 'on_poweroff' at 0x7f88ac6444a8>,
 <Element 'on_reboot' at 0x7f88ac6440e8>,
 <Element 'on_crash' at 0x7f88ac644638>,
 <Element 'devices' at 0x7f88ac644f98>,
 <Element 'seclabel' at 0x7f88ac9f6ea8>]
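
As the DeprecationWarning says, getchildren() is on its way out (it was removed in Python 3.9). A short sketch of the replacements, using a reduced document as a stand-in for kvm.xml:

```python
import xml.etree.ElementTree as ET

# Reduced stand-in for kvm.xml (an assumption made for this illustration).
root = ET.fromstring(
    "<domain type='kvm'>"
    "<name>centos_x86_6.4</name>"
    "<uuid>b9dcdd92-9b9b-14d6-3938-1982a9746a12</uuid>"
    "<vcpu placement='static'>1</vcpu>"
    "</domain>"
)

# An Element behaves like a sequence of its direct children.
children = list(root)                  # replaces root.getchildren()
tags = [child.tag for child in root]   # plain iteration works too

print(tags)        # ['name', 'uuid', 'vcpu']
print(len(root))   # 3
print(root[0].tag) # name
```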

In [13]: for child in root:
    ...:     print(child.get('name'))  # get() looks up an attribute; no child has a 'name' attribute
    ...:
None
None
None
None
None
None
None
None
None
None
None
None
None

In [14]: for child in root:
    ...:     print(child)
    ...:
    ...:
<Element 'name' at 0x7f88ac644b88>
<Element 'uuid' at 0x7f88ac6445e8>
<Element 'memory' at 0x7f88ac6449f8>
<Element 'currentMemory' at 0x7f88ac6446d8>
<Element 'vcpu' at 0x7f88ac644548>
<Element 'os' at 0x7f88ac644728>
<Element 'features' at 0x7f88ac644f48>
<Element 'clock' at 0x7f88ac644098>
<Element 'on_poweroff' at 0x7f88ac6444a8>
<Element 'on_reboot' at 0x7f88ac6440e8>
<Element 'on_crash' at 0x7f88ac644638>
<Element 'devices' at 0x7f88ac644f98>
<Element 'seclabel' at 0x7f88ac9f6ea8>

In [15]: for child in root:
    ...:     print(child.name)
    ...:
    ...:
    ...:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-e8a3aa266c61> in <module>
      1 for child in root:
----> 2     print(child.name)
      3
      4
      5

AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'name'

In [16]: for child in root:
    ...:     print(child.get())
    ...:
    ...:
    ...:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-18e15cbfadbf> in <module>
      1 for child in root:
----> 2     print(child.get())
      3
      4
      5

TypeError: get() missing required argument 'key' (pos 1)
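
The two errors above come from treating get() and .name as ways to read the element name. A sketch of what works instead, on a reduced stand-in document: the element name is in .tag, and get() always needs an attribute name (plus an optional default).

```python
import xml.etree.ElementTree as ET

# Reduced stand-in for kvm.xml (an assumption made for this illustration).
root = ET.fromstring("""<domain type='kvm'>
  <name>centos_x86_6.4</name>
  <memory unit='KiB'>2097152</memory>
  <vcpu placement='static'>1</vcpu>
</domain>""")

summary = []
for child in root:
    # child.tag is the element name; child.get(key) reads one attribute
    # and returns None (or the given default) when it is missing.
    summary.append((child.tag, child.get('unit')))
    print(child.tag, child.get('unit', 'n/a'), child.attrib)

print(summary)   # [('name', None), ('memory', 'KiB'), ('vcpu', None)]
```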

In [17]: root.find('Element')  # no child is tagged 'Element', so find() returns None and nothing is echoed

In [18]: root.find('devices')  # find the first matching direct child
Out[18]: <Element 'devices' at 0x7f88ac644f98>

In [19]: root.findall('country')
Out[19]: []

In [20]: root.findall('devices')  # find all direct children tagged 'devices'
Out[20]: [<Element 'devices' at 0x7f88ac644f98>]

In [21]: root.iterfind('devices')  # same search, but returns a generator
Out[21]: <generator object prepare_child.<locals>.select at 0x7f88aca1e9a8>

In [22]: for e in root.iterfind('devices'):print(e)
<Element 'devices' at 0x7f88ac644f98>

In [23]: root.findall('disk')  # empty: disk is a grandchild, and a bare tag only matches direct children
Out[23]: []
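
The empty result for 'disk' follows from find, findall and iterfind matching a bare tag against direct children only. A sketch of how a path reaches one level deeper, using a reduced stand-in document:

```python
import xml.etree.ElementTree as ET

# Reduced stand-in for kvm.xml (an assumption made for this illustration).
root = ET.fromstring("""<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'><target dev='hda' bus='ide'/></disk>
    <disk type='file' device='cdrom'><target dev='hdc' bus='ide'/></disk>
  </devices>
</domain>""")

# A bare tag is matched against the direct children of root only.
print(root.find('disk'))            # None
print(root.findall('disk'))         # []

# A path with '/' walks down one level per step.
disks = root.findall('devices/disk')
devices = [d.get('device') for d in disks]
print(devices)                      # ['disk', 'cdrom']

# find() returns just the first match (or None).
first_disk = root.find('devices/disk')
print(first_disk.get('device'))     # disk
```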

In [24]: root.iter()  # iterate over every element in the tree: root itself plus all descendants
Out[24]: <_elementtree._element_iterator at 0x7f88ac5864c0>

In [25]: list(root.iter())
Out[25]:
[<Element 'domain' at 0x7f88ac6448b8>,
 <Element 'name' at 0x7f88ac644b88>,
 <Element 'uuid' at 0x7f88ac6445e8>,
 <Element 'memory' at 0x7f88ac6449f8>,
 <Element 'currentMemory' at 0x7f88ac6446d8>,
 <Element 'vcpu' at 0x7f88ac644548>,
 <Element 'os' at 0x7f88ac644728>,
 <Element 'type' at 0x7f88ac644318>,
 <Element 'boot' at 0x7f88ac644a48>,
 <Element 'features' at 0x7f88ac644f48>,
 <Element 'acpi' at 0x7f88ac644278>,
 <Element 'apic' at 0x7f88ac644908>,
 <Element 'pae' at 0x7f88ac644db8>,
 <Element 'clock' at 0x7f88ac644098>,
 <Element 'timer' at 0x7f88ac644e08>,
 <Element 'timer' at 0x7f88ac6441d8>,
 <Element 'on_poweroff' at 0x7f88ac6444a8>,
 <Element 'on_reboot' at 0x7f88ac6440e8>,
 <Element 'on_crash' at 0x7f88ac644638>,
 <Element 'devices' at 0x7f88ac644f98>,
 <Element 'emulator' at 0x7f88ac644cc8>,
 <Element 'disk' at 0x7f88ac644e58>,
 <Element 'driver' at 0x7f88adac1ea8>,
 <Element 'source' at 0x7f88adac1318>,
 <Element 'seclabel' at 0x7f88adac1d68>,
 <Element 'target' at 0x7f88accd29a8>,
 <Element 'alias' at 0x7f88accd2cc8>,
 <Element 'address' at 0x7f88accd2458>,
 <Element 'disk' at 0x7f88accd2db8>,
 <Element 'driver' at 0x7f88acc91e08>,
 <Element 'source' at 0x7f88acc914a8>,
 <Element 'target' at 0x7f88acc91408>,
 <Element 'readonly' at 0x7f88acc91db8>,
 <Element 'alias' at 0x7f88acc91d68>,
 <Element 'address' at 0x7f88acc915e8>,
 <Element 'controller' at 0x7f88adaaaae8>,
 <Element 'alias' at 0x7f88adaaa728>,
 <Element 'address' at 0x7f88adaaa408>,
 <Element 'controller' at 0x7f88adaaac78>,
 <Element 'alias' at 0x7f88adaaa4f8>,
 <Element 'address' at 0x7f88aca04138>,
 <Element 'interface' at 0x7f88aca04188>,
 <Element 'mac' at 0x7f88aca04228>,
 <Element 'source' at 0x7f88adb59728>,
 <Element 'target' at 0x7f88adb594f8>,
 <Element 'model' at 0x7f88adb59ea8>,
 <Element 'alias' at 0x7f88adb59ae8>,
 <Element 'address' at 0x7f88adb59a98>,
 <Element 'input' at 0x7f88adb64458>,
 <Element 'graphics' at 0x7f88adb64b88>,
 <Element 'listen' at 0x7f88adb64408>,
 <Element 'video' at 0x7f88adb64098>,
 <Element 'model' at 0x7f88adb64db8>,
 <Element 'alias' at 0x7f88adb647c8>,
 <Element 'address' at 0x7f88adb64f48>,
 <Element 'memballoon' at 0x7f88adb64958>,
 <Element 'alias' at 0x7f88adb64048>,
 <Element 'address' at 0x7f88adb64138>,
 <Element 'seclabel' at 0x7f88ac9f6ea8>,
 <Element 'label' at 0x7f88ac9f6d68>,
 <Element 'imagelabel' at 0x7f88ac9f6f48>]

In [26]: root.iter('disk')
Out[26]: <_elementtree._element_iterator at 0x7f88ac590ca8>

In [27]: list(root.iter('disk'))
Out[27]: [<Element 'disk' at 0x7f88ac644e58>, <Element 'disk' at 0x7f88accd2db8>]
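
Since iter('disk') finds the disks at any depth, it is a convenient starting point for pulling out their settings. A sketch under the assumption of a reduced document; `child_attr` is a hypothetical helper written for this example, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Reduced stand-in for kvm.xml (an assumption made for this illustration).
root = ET.fromstring("""<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/home/template_make/centos_x86_6.4.img'/>
      <target dev='hda' bus='ide'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/home/template_make/CentOS-6.4-x86_64-bin-DVD1.iso'/>
      <target dev='hdc' bus='ide'/>
    </disk>
  </devices>
</domain>""")


def child_attr(parent, tag, name):
    """Hypothetical helper: attribute `name` of child `tag`, or None if missing."""
    child = parent.find(tag)
    return child.get(name) if child is not None else None


rows = []
for disk in root.iter('disk'):   # matches disk elements at any depth
    rows.append((
        disk.get('device'),
        child_attr(disk, 'driver', 'type'),
        child_attr(disk, 'target', 'dev'),
        child_attr(disk, 'source', 'file'),
    ))

for row in rows:
    print(row)
```

The helper compares against None explicitly because an Element without children is falsy, so `if child:` would wrongly skip empty elements such as `<source .../>`.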

In [28]: root.findall('emulator/*')  # empty: emulator is not a direct child of root
Out[28]: []

In [29]: root.findall('devices/*')  # all direct children of the child element devices
Out[29]:
[<Element 'emulator' at 0x7f88ac644cc8>,
 <Element 'disk' at 0x7f88ac644e58>,
 <Element 'disk' at 0x7f88accd2db8>,
 <Element 'controller' at 0x7f88adaaaae8>,
 <Element 'controller' at 0x7f88adaaac78>,
 <Element 'interface' at 0x7f88aca04188>,
 <Element 'input' at 0x7f88adb64458>,
 <Element 'graphics' at 0x7f88adb64b88>,
 <Element 'video' at 0x7f88adb64098>,
 <Element 'memballoon' at 0x7f88adb64958>]

In [30]: root.findall('.//video')  # .// matches descendants at any depth, not only direct children of root
Out[30]: [<Element 'video' at 0x7f88adb64098>]

In [31]: root.findall('.//video/..')  # the parent of each matched descendant
Out[31]: [<Element 'devices' at 0x7f88ac644f98>]
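
The '..' step is the only built-in way to get at a parent, because an Element keeps no reference to it. A sketch on a reduced stand-in document, which also builds a child-to-parent mapping (`parent_map` is just a name chosen here) for cases where repeated lookups are needed:

```python
import xml.etree.ElementTree as ET

# Reduced stand-in for kvm.xml (an assumption made for this illustration).
root = ET.fromstring("""<domain type='kvm'>
  <devices>
    <interface type='bridge'>
      <mac address='52:54:00:78:f9:5a'/>
    </interface>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
    </video>
  </devices>
</domain>""")

# './/' searches the whole subtree; '..' then steps up to the parent.
model_parents = [e.tag for e in root.findall('.//model/..')]
print(model_parents)            # ['video']

# Elements do not store their parent, so build a lookup table once.
parent_map = {child: parent for parent in root.iter() for child in parent}

mac = root.find('.//mac')
print(parent_map[mac].tag)               # interface
print(parent_map[parent_map[mac]].tag)   # devices
```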

In [32]: root.findall('vcps[@placement]')  # typo in the tag (vcps instead of vcpu), so nothing matches
Out[32]: []

In [33]: root.findall('vcpu[@placement]')  # vcpu children that have a placement attribute
Out[33]: [<Element 'vcpu' at 0x7f88ac644548>]

In [35]: root.findall('vcpu[@placement="static"]')  # vcpu children whose placement attribute equals "static"
Out[35]: [<Element 'vcpu' at 0x7f88ac644548>]

In [36]: root.findall('os[type]')  # os children that themselves have a child element named type
Out[36]: [<Element 'os' at 0x7f88ac644728>]

In [37]: root.findall('os[type="hvm"]')  # os children whose type child has the text "hvm"
Out[37]: [<Element 'os' at 0x7f88ac644728>]
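
Predicates become more useful when combined with a longer path. A sketch, again on a reduced stand-in document, that picks the CD-ROM image path by attribute value and checks the virtualization type by child text:

```python
import xml.etree.ElementTree as ET

# Reduced stand-in for kvm.xml (an assumption made for this illustration).
root = ET.fromstring("""<domain type='kvm'>
  <os>
    <type arch='x86_64' machine='pc-1.2'>hvm</type>
    <boot dev='hd'/>
  </os>
  <devices>
    <disk type='file' device='disk'>
      <source file='/home/template_make/centos_x86_6.4.img'/>
    </disk>
    <disk type='file' device='cdrom'>
      <source file='/home/template_make/CentOS-6.4-x86_64-bin-DVD1.iso'/>
    </disk>
  </devices>
</domain>""")

# [@attr='value'] keeps only elements whose attribute has that value.
cdrom = root.find(".//disk[@device='cdrom']")
iso_path = cdrom.find('source').get('file')
print(iso_path)

# [tag='text'] keeps only elements with a child <tag> whose text is 'text'.
hvm_os = root.findall("os[type='hvm']")
xen_os = root.findall("os[type='xen']")
print(len(hvm_os), len(xen_os))   # 1 0

boot_dev = hvm_os[0].find('boot').get('dev')
print(boot_dev)                   # hd
```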

In [38]: root.findall('name')
Out[38]: [<Element 'name' at 0x7f88ac644b88>]

In [39]: root.findall('name[1]')  # the first matching element (positions start at 1)
Out[39]: [<Element 'name' at 0x7f88ac644b88>]

In [40]: root.findall('name[2]')
Out[40]: []

In [41]: root.findall('name[last()]')  # the last matching element
Out[41]: [<Element 'name' at 0x7f88ac644b88>]

In [42]: root.findall('name[last()-1]')  # the second-to-last matching element
Out[42]: []
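
Because the document has only one name element, the position predicates above cannot show much. The sketch below applies them to the two timer elements instead, on a reduced stand-in document; `names` is a small helper defined only for this example.

```python
import xml.etree.ElementTree as ET

# Reduced stand-in for kvm.xml (an assumption made for this illustration).
root = ET.fromstring("""<domain type='kvm'>
  <clock offset='localtime'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>
</domain>""")


def names(path):
    """Helper for this example: name attributes of all elements matching path."""
    return [timer.get('name') for timer in root.findall(path)]


first = names('clock/timer[1]')                # positions are 1-based
second = names('clock/timer[2]')
last = names('clock/timer[last()]')
second_to_last = names('clock/timer[last()-1]')
out_of_range = names('clock/timer[3]')         # no third timer

print(first, second)            # ['pit'] ['rtc']
print(last, second_to_last)     # ['rtc'] ['pit']
print(out_of_range)             # []
```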

Original article: https://www.cnblogs.com/Richardo-M-Q/p/13338090.html