Python Basics: Reading XML

For a detailed introduction to working with XML files in Python, see: https://www.jb51.net/article/50812.htm

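Structurally, XML closely resembles the familiar HTML (HyperText Markup Language). However, HTML was designed to display data, so its focus is how the data looks. XML was designed to transport and store data, so its focus is what the data contains.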

Features:

1. Made up of tag pairs: <TEST></TEST>

2. Tags can have attributes: <TEST Loop="1"></TEST>

3. Tags can contain data: <TEST>CPU</TEST>

4. Tags can contain child tags (forming a hierarchy)
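
As a rough illustration of the four features above, the sketch below parses a small in-memory document with xml.dom.minidom. The sample string and the NAME child tag are assumptions made up for this example; they do not come from abc.xml.

```python
from xml.dom.minidom import parseString

# Hypothetical sample combining the four features listed above.
SAMPLE = '<TEST Loop="1"><NAME>CPU</NAME></TEST>'

doc = parseString(SAMPLE)
test = doc.documentElement                    # 1. the <TEST>...</TEST> tag pair
loop = test.getAttribute("Loop")              # 2. an attribute on the tag
name = test.getElementsByTagName("NAME")[0]   # 4. a nested child tag
text = name.firstChild.nodeValue              # 3. data embedded in a tag

print(test.tagName, loop, name.tagName, text)
```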

Reading XML with Python

import xml.dom.minidom

Open an XML file: xml.dom.minidom.parse()

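Every node has nodeName, nodeValue, and nodeType. nodeName is the node's name. nodeValue is the node's value; it carries data for text nodes (and a few other types such as attributes and comments) but is None for element nodes. In the example further down, catalog is of type ELEMENT_NODE.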

The following node types currently exist:

'ATTRIBUTE_NODE'
'CDATA_SECTION_NODE'
'COMMENT_NODE'
'DOCUMENT_FRAGMENT_NODE'
'DOCUMENT_NODE'
'DOCUMENT_TYPE_NODE'
'ELEMENT_NODE'
'ENTITY_NODE'
'ENTITY_REFERENCE_NODE'
'NOTATION_NODE'
'PROCESSING_INSTRUCTION_NODE'
'TEXT_NODE'
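
To make the difference between an element node and a text node concrete, here is a minimal sketch that inspects the three properties on both. The one-line document is an inline stand-in chosen for this example rather than the abc.xml file.

```python
from xml.dom.minidom import parseString

doc = parseString("<maxid>4</maxid>")
element = doc.documentElement   # the <maxid> element
text = element.firstChild       # the text node holding "4"

# Element node: nodeValue is None; the content lives in a child text node.
element_info = (element.nodeName, element.nodeValue, element.nodeType)
# Text node: nodeName is the fixed string "#text"; nodeValue holds the data.
text_info = (text.nodeName, text.nodeValue, text.nodeType)

print(element_info)
print(text_info)
```

This is why the later examples read a tag's content through childNodes[0].nodeValue instead of calling nodeValue on the element itself.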

For example, take the following XML:

abc.xml

<?xml version="1.0" encoding="utf-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='123456'>
        <caption>Python</caption>
        <item id="4">
            <caption>测试</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>


Read the root node:

from xml.dom.minidom import parse


def read_xml_root_node(xml_path):
    dom = parse(xml_path)
    root = dom.documentElement
    return root


if __name__ == "__main__":
    root_node = read_xml_root_node("abc.xml")
    print(root_node.nodeName)
    print(root_node.nodeType)


Output:

catalog
1

Why is the printed type 1, and what does 1 stand for? See nodeType: 1 is the value of the ELEMENT_NODE constant.
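
One possible way to turn a numeric nodeType back into a readable name is to build a reverse lookup from the constants defined on xml.dom.Node. The helper name node_type_name is hypothetical and introduced only for this sketch.

```python
from xml.dom import Node

# Reverse lookup: numeric nodeType -> constant name, built from the
# *_NODE constants that xml.dom.Node defines.
NODE_TYPE_NAMES = {
    getattr(Node, name): name
    for name in dir(Node)
    if name.endswith("_NODE")
}


def node_type_name(node_type):
    """Return the constant name for a numeric nodeType (hypothetical helper)."""
    return NODE_TYPE_NAMES.get(node_type, "UNKNOWN")


print(node_type_name(1))
print(node_type_name(3))
```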

Get child nodes and their values:

from xml.dom.minidom import parse


def read_xml_root_node(xml_path):
    dom = parse(xml_path)
    root = dom.documentElement
    return root


def read_child_label(node, label_name):
    child = node.getElementsByTagName(label_name)
    return child


if __name__ == "__main__":
    root_node = read_xml_root_node("abc.xml")
    print(root_node.nodeName)
    print(root_node.nodeType)
    child_nodes = read_child_label(root_node, "maxid")
    for child_node in child_nodes:
        print(child_node.nodeName)
        print(child_node.nodeType)
        print(child_node.childNodes[0].nodeValue)


Output:

catalog
1
maxid
1
4
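
One detail worth keeping in mind: getElementsByTagName searches all descendants, not only direct children. The sketch below contrasts it with a filter over childNodes. It uses a trimmed, ASCII-only inline copy of abc.xml so it runs without the file, and direct_children is a hypothetical helper written for this example.

```python
from xml.dom.minidom import parseString

# Trimmed inline copy of abc.xml (an assumption, so no file is needed).
XML = """<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd="123456">
        <caption>Python</caption>
        <item id="4"><caption>Test</caption></item>
    </login>
    <item id="2"><caption>Zope</caption></item>
</catalog>"""


def direct_children(node, tag_name):
    """Return only the element children of node whose tag is tag_name."""
    return [
        child for child in node.childNodes
        if child.nodeType == child.ELEMENT_NODE and child.tagName == tag_name
    ]


root = parseString(XML).documentElement
all_ids = [n.getAttribute("id") for n in root.getElementsByTagName("item")]
top_ids = [n.getAttribute("id") for n in direct_children(root, "item")]

print(all_ids)  # includes the <item> nested under <login>
print(top_ids)  # only the <item> directly under <catalog>
```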

Get tag attributes:

from xml.dom.minidom import parse


def read_xml_root_node(xml_path):
    dom = parse(xml_path)
    root = dom.documentElement
    return root


def read_child_label(node, label_name):
    child = node.getElementsByTagName(label_name)
    return child


def read_attribute(node, attr_name):
    attribute = node.getAttribute(attr_name)
    return attribute


if __name__ == "__main__":
    root_node = read_xml_root_node("abc.xml")
    print(root_node.nodeName)
    print(root_node.nodeType)
    child_nodes_login = read_child_label(root_node, "login")
    for child_node in child_nodes_login:
        attr_username = read_attribute(child_node, "username")
        print(attr_username)


Output:

catalog
1
pytest
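
A small follow-up sketch on attributes: in minidom, getAttribute returns an empty string when the attribute is absent, so hasAttribute is needed to tell a missing attribute from an empty one. The email attribute name is hypothetical and deliberately not present in the sample.

```python
from xml.dom.minidom import parseString

login = parseString(
    '<login username="pytest" passwd="123456"/>'
).documentElement

# A missing attribute yields "" and does not raise an error.
missing = login.getAttribute("email")    # "email" is a made-up name
has_email = login.hasAttribute("email")

# attributes is a NamedNodeMap; items() gives (name, value) pairs.
all_attrs = dict(login.attributes.items())

print(repr(missing), has_email)
print(all_attrs)
```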

Another module for reading XML, which can iterate over the child tags under a given tag:

from xml.etree import ElementTree as ET


tree = ET.parse("abc.xml")

# <item> elements nested under <login>
for item in tree.findall("./login/item"):
    # Iterating an element yields its direct children.
    # (Element.getchildren() was removed in Python 3.9.)
    for child in item:
        print(child.tag, ":", child.text)

# <item> elements directly under the root
for item in tree.findall("./item"):
    for child in item:
        print(child.tag, ":", child.text)


Output:

caption : 测试
caption : Zope
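
ElementTree can also read attributes and filter by them. The following is a minimal sketch, again using a trimmed, ASCII-only inline copy of abc.xml (an assumption, so the example runs without the file). It relies on Element.get and on the limited XPath subset that find supports.

```python
import xml.etree.ElementTree as ET

# Trimmed inline copy of abc.xml (an assumption, so no file is needed).
XML = """<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd="123456">
        <caption>Python</caption>
        <item id="4"><caption>Test</caption></item>
    </login>
    <item id="2"><caption>Zope</caption></item>
</catalog>"""

root = ET.fromstring(XML)

# Element.get reads one attribute (None if it is missing).
username = root.find("login").get("username")

# An attribute predicate selects the <item> whose id is "2".
item2 = root.find("./item[@id='2']")
item2_caption = item2.find("caption").text

# iter() walks all descendants in document order.
captions = [c.text for c in root.iter("caption")]

print(username, item2_caption, captions)
```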