Reading XML Files in Java

"XML Through Java's Eyes: Reading Files"

1.XML

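XML stands for Extensible Markup Language. It was designed to transport and store data.

XML is nothing more than plain text, so any software that can handle plain text can handle XML.

XML has no predefined tags; you define your own.

Because XML data is stored as plain text, it offers a way of storing data that is independent of software and hardware. This makes it easier to create data that different applications can share, and it simplifies data exchange between otherwise incompatible systems.

XML file format: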

<?xml version="1.0" encoding="UTF-8" ?>
<rootNode>
    <childNode1 attr="attrVal">
        <subChild>...</subChild>
    </childNode1>
    <childNode2 attr="attrVal"></childNode2>
</rootNode>

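XML is case-sensitive. Every element must have a closing tag, and tags must be properly nested.

An XML document must have exactly one element that is the parent of all other elements. This element is called the root element.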
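The rules above (case-sensitive names, closed tags, proper nesting, a single root) can be checked by handing a string to the JDK's own parser and seeing whether it raises an error. The following is only a minimal sketch: the class name WellFormedCheck and the sample strings are made up for illustration, and it checks well-formedness only, not validity against a schema.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original article
public class WellFormedCheck {

    // Returns true if the parser accepts the text as well-formed XML
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            db.setErrorHandler(new DefaultHandler());
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser reports a well-formedness violation as a SAXException
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b></b></a>"));  // properly nested
        System.out.println(isWellFormed("<a><b></a></b>"));  // wrong nesting
        System.out.println(isWellFormed("<a><B></b></a>"));  // case mismatch
        System.out.println(isWellFormed("<a/><b/>"));        // two root elements
    }
}
```

Only the first of the four sample strings should be accepted.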

XML has five predefined entity references: &lt; (<), &gt; (>), &amp; (&), &apos; ('), and &quot; (").
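When XML is written by hand or assembled from strings, these five characters have to be replaced by their entity references. A minimal sketch of such a replacement follows; the class XmlEscape and its escape method are hypothetical names for illustration, and XML libraries normally do this for you when they serialize a document.

```java
// Hypothetical helper class, not part of the original article
public class XmlEscape {

    // Replaces the five characters that have predefined XML entity references
    public static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: Tom &amp; Jerry &lt;&quot;quoted&quot;&gt;
        System.out.println(escape("Tom & Jerry <\"quoted\">"));
    }
}
```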

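An XML element is everything from (and including) its start tag to (and including) its end tag.

As in HTML, an XML element can carry attributes in its start tag. Attributes provide additional information about the element. Attribute values must be quoted.

Attributes versus elements: metadata (data about data) should be stored as attributes, while the data itself should be stored as elements.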

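2.Reading XML Files in Java

This article walks through four ways of reading XML files in Java:

1.Parsing XML files with DOM

DOM (Document Object Model) defines a standard way of accessing and manipulating documents; the XML DOM does the same for XML documents.

DOM views an XML document as a tree, and every element can be reached through that tree. You can modify or delete their content and create new elements. Elements, their text, and their attributes are all treated as nodes.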
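Besides reading, the DOM tree can be changed in memory. The sketch below creates, modifies, and deletes nodes using only the JDK's org.w3c.dom API. The class name DomEditSketch, the helper addBook, and the extra book entry ("WEB", "Learning XML") are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class for illustration, not part of the original article
public class DomEditSketch {

    // Parses XML held in a string into a DOM tree
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Appends <book category="..."><title>...</title></book> to the root element
    public static Element addBook(Document doc, String category, String title) {
        Element book = doc.createElement("book");
        book.setAttribute("category", category);
        Element titleEl = doc.createElement("title");
        titleEl.setTextContent(title);
        book.appendChild(titleEl);
        doc.getDocumentElement().appendChild(book);
        return book;
    }

    public static void main(String[] args) {
        Document doc = parse("<bookstore><book category=\"COOKING\">"
                + "<title>Everyday Italian</title></book></bookstore>");

        // Create: append a second, made-up book
        addBook(doc, "WEB", "Learning XML");

        // Modify: change an attribute of the first book
        Element first = (Element) doc.getElementsByTagName("book").item(0);
        first.setAttribute("category", "FOOD");

        // Delete: remove the first book's <title> child
        first.removeChild(first.getFirstChild());

        System.out.println(doc.getElementsByTagName("book").getLength());  // 2
        System.out.println(first.getAttribute("category"));                // FOOD
        System.out.println(doc.getElementsByTagName("title").getLength()); // 1
    }
}
```

These changes only affect the in-memory tree; writing them back to a file is a separate step that this sketch leaves out.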

<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book category="COOKING" id="123">
  <title lang="en">Everyday Italian</title> 
  <author>Giada De Laurentiis</author> 
  <year>2005</year> 
  <price>30.00</price> 
</book>
<book category="CHILDREN">
  <title lang="en">Harry Potter</title> 
  <author>J K. Rowling</author> 
  <year>2005</year> 
  <price>29.99</price> 
</book>
</bookstore>

books.xml

import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DomTest {
    public static void main(String[] args) {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse("books.xml");

            // Fetch every <book> element in the document by tag name
            NodeList bookList = doc.getElementsByTagName("book");
            // 2 nodes, one per book
            System.out.println("There are " + bookList.getLength() + " books in total.");
            
            // Get the document's root element
            Element root = doc.getDocumentElement();
            NodeList bookList1 = root.getChildNodes();
            System.out.println("The root has " + bookList1.getLength() + " child nodes");
            /*  5 nodes, because whitespace text is a node too
                <bookstore>
                1.(text)
                2.(<book>...</book>)
                3.(text)
                4.(<book>...</book>)
                5.(text)
                </bookstore>
            */
            int index = 0;
            for (int i = 0; i < bookList1.getLength(); i++) {
                // item(i) returns the i-th node
                Node node = bookList1.item(i);
                
                // Only handle nodes named book
                if ("book".equals(node.getNodeName())) {
                    System.out.println("--- Book " + (++index) + " ---");
                    
                    // getAttributes() returns all attributes of the node
                    NamedNodeMap attrs = node.getAttributes();
                    System.out.print("All attributes: ");
                    for (int j = 0; j < attrs.getLength(); j++) {
                        Node attr = attrs.item(j);
                        if (j != 0) System.out.print(",");
                        System.out.print(attr.getNodeName() + "(" + attr.getNodeValue() + ")");
                    }
                    System.out.println();
                    
                    // An attribute value can also be read directly by name,
                    // after casting the node to Element
                    Element book = (Element)node;
                    System.out.println("category attribute: " + book.getAttribute("category"));
                    
                    // Walk the child nodes of this book
                    NodeList childNodes = node.getChildNodes();
                    for (int k = 0; k < childNodes.getLength(); k++) {
                        // Keep element nodes only and skip text nodes
                        Node childNode = childNodes.item(k); 
                        if (childNode.getNodeType() == Node.ELEMENT_NODE) {
                            System.out.print(childNode.getNodeName() + ":");
                            // The first child holds the value only for <node>text</node>;
                            // an empty element has no first child, so guard against null
                            Node firstChild = childNode.getFirstChild();
                            System.out.print(firstChild == null ? null : firstChild.getNodeValue());
                            // getTextContent() returns all text inside the element
                            System.out.println("(" + childNode.getTextContent() + ")");
                        }
                    }
                }
            }
            
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

DomTest.java
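The comment in DomTest.java about "5 nodes" is a common stumbling block: the whitespace between tags shows up as text nodes. One way to deal with it is a small helper that returns only the element children. This is a sketch under the assumption that whitespace text nodes should simply be skipped; the class DomChildrenSketch and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class for illustration, not part of the original article
public class DomChildrenSketch {

    // Parses XML held in a string and returns its root element
    public static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the direct children of parent that are elements, skipping text nodes
    public static List<Element> elementChildren(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Element root = parseRoot(
                "<bookstore>\n  <book id=\"1\"/>\n  <book id=\"2\"/>\n</bookstore>");
        System.out.println(root.getChildNodes().getLength()); // 5: text, book, text, book, text
        System.out.println(elementChildren(root).size());     // 2: the two book elements
    }
}
```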

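2.Parsing XML files with SAX

SAX, short for Simple API for XML, is both an interface and a software package. It is an alternative to DOM for parsing XML. Unlike DOM, SAX reads the document sequentially from start to end and reports each piece (start tag, text, end tag) to a handler through callbacks as it goes. Because the application only inspects the data as it is read, the whole document does not need to be held in memory, which is a major advantage when parsing large documents.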
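Before the full example, here is a stripped-down sketch of the callback style: the handler only counts start tags and keeps nothing else, so no document tree is ever built. The class name SaxCountSketch and the method countElements are made up for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration, not part of the original article
public class SaxCountSketch {

    // Counts how often each element name occurs, in order of first appearance
    public static Map<String, Integer> countElements(String xml) {
        final Map<String, Integer> counts = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // Called once per start tag, in document order
                counts.merge(qName, 1, Integer::sum);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String xml = "<bookstore>"
                + "<book><title>Everyday Italian</title></book>"
                + "<book><title>Harry Potter</title></book>"
                + "</bookstore>";
        // prints: {bookstore=1, book=2, title=2}
        System.out.println(countElements(xml));
    }
}
```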

package com.immoc.samtest;

public class Book {
    private String id;
    private String category;
    private String title;
    private String author;
    private String year;
    private String price;
    
    
    @Override
    public String toString() {
        return "id=" + id + "\ncategory=" + category + "\ntitle=" + title
                + "\nauthor=" + author + "\nyear=" + year + "\nprice=" + price;
    }
    
    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }
    public String getCategory() {
        return category;
    }
    public void setCategory(String category) {
        this.category = category;
    }
    public String getTitle() {
        return title;
    }
    public void setTitle(String title) {
        this.title = title;
    }
    public String getAuthor() {
        return author;
    }
    public void setAuthor(String author) {
        this.author = author;
    }
    public String getYear() {
        return year;
    }
    public void setYear(String year) {
        this.year = year;
    }
    public String getPrice() {
        return price;
    }
    public void setPrice(String price) {
        this.price = price;
    }
}

Book.java

package com.immoc.samtest;

import java.util.ArrayList;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXParserHandler extends DefaultHandler {
    
    private Book book = null;
    private final ArrayList<Book> bookList = new ArrayList<>();
    
    // Running count of the books seen so far
    private int bookIndex = 0;
    
    // Text of the current element; the parser may deliver one run of text
    // through several characters() calls, so it is accumulated here
    private final StringBuilder nodeValue = new StringBuilder();
    
    public ArrayList<Book> getBookList() {
        return bookList;
    }
    
    /**
     * Called once when the document starts
     */
    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        System.out.println("SAX parsing started");
    }
    /**
     * Called once when the document ends
     */
    @Override
    public void endDocument() throws SAXException {
        super.endDocument();
        System.out.println("SAX parsing finished");
    }
    /**
     * Called at the start tag of every element
     */
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes)
            throws SAXException {
        super.startElement(uri, localName, qName, attributes);
        // Discard text collected before this start tag
        nodeValue.setLength(0);
        // A <book> start tag begins a new Book object
        if ("book".equals(qName)) {
            System.out.println("Book " + (++bookIndex) + ":");
            book = new Book();
            for (int i = 0; i < attributes.getLength(); i++) {
                System.out.print("attribute: " + attributes.getQName(i));
                System.out.println("=" + attributes.getValue(i));
                if ("category".equals(attributes.getQName(i))) {
                    book.setCategory(attributes.getValue(i));
                } else if ("id".equals(attributes.getQName(i))) {
                    book.setId(attributes.getValue(i));
                }
            }
        } else if (!"bookstore".equals(qName)) {
            System.out.print("node name: " + qName);
        }
    }
    /**
     * Called at the end tag of every element
     */
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, qName);
        String value = nodeValue.toString().trim();
        if ("book".equals(qName)) {
            System.out.println("Finished this book");
            bookList.add(book);
            book = null;
        } else if (!"bookstore".equals(qName)) {
            System.out.println("=" + value);
            if ("title".equals(qName)) {
                book.setTitle(value);
            } else if ("author".equals(qName)) {
                book.setAuthor(value);
            } else if ("year".equals(qName)) {
                book.setYear(value);
            } else if ("price".equals(qName)) {
                book.setPrice(value);
            }
        }
        nodeValue.setLength(0);
    }
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        super.characters(ch, start, length);
        // Append rather than overwrite, since text can arrive in chunks
        nodeValue.append(ch, start, length);
    }
                    
}

SAXParserHandler.java

package com.immoc.samtest;

import java.io.IOException;
import java.util.ArrayList;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;

public class SAXTest {    
    
    public static void main(String[] args) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            SAXParser parser = factory.newSAXParser();
            SAXParserHandler handler = new SAXParserHandler();
            parser.parse("books.xml", handler);
            ArrayList<Book> bookList = handler.getBookList();
            for (Book book: bookList) {
                System.out.println("-------");
                System.out.println(book);
            }
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    
    
}

SAXTest.java

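3.Parsing XML files with JDOM

JDOM is simpler to use, but it is an external library, so its JAR file has to be added to the project. Download: http://www.jdom.org/dist/binary/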

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

import org.jdom2.Attribute;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

public class JDOMTest {
    
    public static void main(String[] args) {
        SAXBuilder saxBuilder = new SAXBuilder();
        // try-with-resources closes the stream even if parsing fails
        try (InputStream in = new FileInputStream("xmls/books.xml")) {
            Document document = saxBuilder.build(in);
            // Get the root element of the XML file
            Element element = document.getRootElement();
            List<Element> bookList = element.getChildren();
            for (Element book: bookList) {
                // A single attribute value can also be read directly by name:
                // book.getAttributeValue("id");
                List<Attribute> attrList = book.getAttributes();
                for (Attribute attr: attrList) {
                    System.out.println("attribute " + attr.getName() + "=" + attr.getValue());
                }
                List<Element> childList = book.getChildren();
                for (Element child: childList) {
                    System.out.println("child " + child.getName() + "=" + child.getValue());
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (JDOMException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    
}

JDOMTest.java

4.Parsing XML files with DOM4J

Download: http://www.dom4j.org/dom4j-1.6.1/

import java.io.File;
import java.util.Iterator;
import java.util.List;

import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class DOM4JTest {
    
    public static void main(String[] args) {
        SAXReader reader = new SAXReader();
        Document document;
        try {
            document = reader.read(new File("xmls/books.xml"));

            Element bookStore = document.getRootElement();
            Iterator<?> it = bookStore.elementIterator();
            while (it.hasNext()) {
                System.out.println("Start of book >>>");
                Element book = (Element)it.next();
                List<Attribute> bookAttrs = book.attributes();
                for (Attribute attr: bookAttrs) {
                    System.out.println("attribute " + attr.getName() + "=" + attr.getValue());
                }
                Iterator<?> childIt = book.elementIterator();
                while (childIt.hasNext()) {
                    Element bookChild = (Element) childIt.next();
                    System.out.println("node " + bookChild.getName() + "=" + bookChild.getStringValue());
                }
                System.out.println("End of book >>>");
            }
            
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }
    
    
}

DOM4JTest.java