Java Basics (13): Advanced File Techniques

Advanced File Techniques

I. Handling Common File Types

(1) Properties Files

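A properties file is simple: each line holds one property. A property is a key-value pair, with the key and value separated by = or : (whitespace alone also works as a separator).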

#ready to work
name = tang
age = 22
phone = 110

Java provides a dedicated class, java.util.Properties, for this kind of file. Its main methods are:

public synchronized void load(InputStream inStream) throws IOException
public synchronized void load(Reader reader) throws IOException
public String getProperty(String key)
public String getProperty(String key, String defaultValue)

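// Requires: import java.io.*; import java.util.Properties;
Properties props = new Properties();
// try-with-resources closes the reader once loading is done
try (Reader reader = new FileReader("tang.properties")) {
    props.load(reader);
    System.out.println("The name is " + props.getProperty("name"));
} catch (IOException e) {
    e.printStackTrace();
}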

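Advantages: Properties ignores whitespace before a key and around the separator, skips blank lines, and treats lines starting with # or ! as comments.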
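To see these parsing rules without creating a file, the sketch below loads properties from a StringReader, since load(Reader) accepts any Reader. The class name PropertiesDemo and the sample lines are illustrative assumptions. The example shows both comment styles being skipped, the : separator, whitespace around separators being ignored, and getProperty falling back to a default value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesDemo {
    // Parses properties text held in a String
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader does not really fail; wrap to keep callers simple
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String text = "#ready to work\n"
                + "! this line is a comment too\n"
                + "\n"
                + "name = tang\n"
                + "age:22\n";
        Properties props = parse(text);
        System.out.println(props.getProperty("name"));          // tang
        System.out.println(props.getProperty("age"));           // 22
        System.out.println(props.getProperty("phone", "none")); // none (the default)
        System.out.println(props.size());                       // 2
    }
}
```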

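(2) Compressed Files

The Java SDK supports two formats, gzip and zip. A gzip file compresses a single file, while a zip file can contain multiple files.

Let's look at gzip first: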

java.util.zip.GZIPOutputStream
java.util.zip.GZIPInputStream

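Both are decorator classes, subclasses of OutputStream and InputStream respectively. Wrapping an existing output stream in a GZIPOutputStream compresses whatever is written to it, and wrapping an existing input stream in a GZIPInputStream decompresses whatever is read from it.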

// Requires: import java.io.*; import java.util.zip.*;
public static void gzip(String fileName) throws IOException {
    String gzipFileName = fileName + ".gz";
    // try-with-resources closes both streams, even if copying fails;
    // closing the GZIPOutputStream also writes the gzip trailer
    try (InputStream in = new BufferedInputStream(new FileInputStream(fileName));
         OutputStream out = new GZIPOutputStream(
                 new BufferedOutputStream(new FileOutputStream(gzipFileName)))) {
        copy(in, out);
    }
}

public static void unGzip(String gzipFileName, String fileName) throws IOException {
    try (InputStream in = new GZIPInputStream(new BufferedInputStream(
                 new FileInputStream(gzipFileName)));
         OutputStream out = new BufferedOutputStream(
                 new FileOutputStream(fileName))) {
        copy(in, out);
    }
}

// Copies all bytes from in to out. It deliberately does not close either stream,
// because the caller owns them. The zip code below writes several entries into
// one ZipOutputStream, which must stay open between entries.
public static void copy(InputStream in, OutputStream out) throws IOException {
    byte[] buf = new byte[4096];
    int count;
    while ((count = in.read(buf)) != -1) {
        out.write(buf, 0, count);
    }
}
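Because the gzip classes are plain decorators, they also work on in-memory streams. The sketch below compresses a byte array into a ByteArrayOutputStream and decompresses it again with a copy loop like the one above. This version leaves both streams open. The class name GzipDemo and the sample data are assumptions, and the code assumes Java 11 or later for String.repeat. Note that the GZIPOutputStream must be closed (or finish()ed) before the compressed bytes are complete.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {
    // Moves all bytes from in to out without closing either stream
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        int count;
        while ((count = in.read(buf)) != -1) {
            out.write(buf, 0, count);
        }
    }

    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer (CRC and size)
        try (OutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        }
        return bytes.toByteArray();
    }

    static byte[] decompress(byte[] gzipped) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            copy(in, out);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello gzip ".repeat(100).getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        byte[] unpacked = decompress(packed);
        System.out.println(original.length + " -> " + packed.length + " bytes");
        System.out.println(Arrays.equals(original, unpacked)); // true
    }
}
```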

The zip format allows one archive to contain multiple files. The main classes in the Java API are:

java.util.zip.ZipOutputStream
java.util.zip.ZipInputStream

ZipOutputStream can write multiple files. Its key method is:

// Must be called before writing each file; it starts a new compressed entry (ZipEntry)
public void putNextEntry(ZipEntry e) throws IOException

// Each entry has a name: its path relative to the root of the archive. A name ending in '/' denotes a directory
public ZipEntry(String name)

// Requires: import java.io.*; import java.util.zip.*;
/**
 * Compresses a file or a directory.
 * @param inFile  the input, either a file or a directory
 * @param zipFile the zip file to write
 */
public static void zip(File inFile, File zipFile) throws IOException {
    // Throw if the input does not exist
    if (!inFile.exists())
        throw new FileNotFoundException(inFile.getAbsolutePath());
    inFile = inFile.getCanonicalFile();
    String rootPath = inFile.getParent();
    // Make sure the root path ends with a separator so that the relative
    // paths computed below do not start with one
    if (!rootPath.endsWith(File.separator)) {
        rootPath += File.separator;
    }
    // try-with-resources closes (and finishes) the archive even on failure
    try (ZipOutputStream zipOut = new ZipOutputStream(new BufferedOutputStream(
            new FileOutputStream(zipFile)))) {
        addFileToZipOut(inFile, zipOut, rootPath);
    }
}

public static void addFileToZipOut(File inFile, ZipOutputStream out, String rootPath)
        throws IOException {
    // Zip entry names always use '/', whatever the platform separator is
    String relativePath = inFile.getCanonicalPath()
            .substring(rootPath.length())
            .replace(File.separatorChar, '/');
    if (inFile.isFile()) {
        out.putNextEntry(new ZipEntry(relativePath));
        try (InputStream in = new BufferedInputStream(new FileInputStream(inFile))) {
            copy(in, out); // copy() must leave out open for the next entry
        }
        out.closeEntry();
    } else {
        // A directory: add an entry whose name ends with '/', then recurse
        out.putNextEntry(new ZipEntry(relativePath + "/"));
        out.closeEntry();
        File[] children = inFile.listFiles();
        if (children != null) {
            for (File f : children) {
                addFileToZipOut(f, out, rootPath);
            }
        }
    }
}
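Reading an archive with ZipInputStream is the mirror image. getNextEntry() positions the stream at the next entry, and read() then returns that entry's bytes until the entry is exhausted. The sketch below builds a two-entry zip in memory with putNextEntry/closeEntry and then reads it back. The class name ZipDemo and the entry names are assumptions, and the code assumes Java 9 or later for readAllBytes.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipDemo {
    // Writes each map entry (name -> text) as one zip entry
    static byte[] zip(Map<String, String> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> f : files.entrySet()) {
                out.putNextEntry(new ZipEntry(f.getKey()));
                out.write(f.getValue().getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Reads every entry back into an insertion-ordered map of name -> text
    static Map<String, String> unzip(byte[] data) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                // read() reports end-of-stream at the end of the current entry
                byte[] content = in.readAllBytes();
                result.put(entry.getName(), new String(content, StandardCharsets.UTF_8));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("docs/a.txt", "first file");
        files.put("docs/b.txt", "second file");
        System.out.println(unzip(zip(files))); // {docs/a.txt=first file, docs/b.txt=second file}
    }
}
```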

II. Random Access Files (RandomAccessFile)

(1) Usage

RandomAccessFile constructors:

public RandomAccessFile(String name, String mode) throws FileNotFoundException
public RandomAccessFile(File file, String mode) throws FileNotFoundException

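Here name is a file path and file is a File object; mode is the access mode:

1) r: read-only

2) rw: read and write

3) rws: read and write; in addition, every update to the file's content or metadata is written synchronously to the storage device.

4) rwd: read and write; in addition, every update to the file's content (but not its metadata) is written synchronously to the storage device.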

The class has byte-oriented read/write methods similar to those of InputStream/OutputStream, and it also implements the DataInput/DataOutput interfaces:

public int read() throws IOException
public int read(byte b[]) throws IOException
public final int readInt() throws IOException
public final void writeInt(int v) throws IOException
public void write(byte b[]) throws IOException

There are also two readFully methods:

public final void readFully(byte b[]) throws IOException
public final void readFully(byte b[], int off, int len) throws IOException

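Unlike the corresponding read methods, they guarantee that exactly the requested number of bytes is read. If the end of the file is reached first, they throw an EOFException.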
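The difference between read and readFully shows up at the end of a file. In this sketch, a 4-byte file (one int written with writeInt) is read into an 8-byte array. read() simply returns how many bytes it got, while readFully throws EOFException. The class name ReadFullyDemo and the use of a temporary file are assumptions.

```java
import java.io.EOFException;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class ReadFullyDemo {
    // Creates a temp file holding exactly one int (4 bytes, big-endian)
    static File createIntFile(int value) throws IOException {
        File f = File.createTempFile("raf", ".bin");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.writeInt(value);
        }
        return f;
    }

    // read() returns however many bytes it got, possibly fewer than requested
    static int plainRead(File f, int n) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            return raf.read(new byte[n]);
        }
    }

    // readFully() either fills the whole array or throws EOFException
    static boolean readFullyOk(File f, int n) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            raf.readFully(new byte[n]);
            return true;
        } catch (EOFException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = createIntFile(42);
        System.out.println(plainRead(f, 8));   // 4
        System.out.println(readFullyOk(f, 4)); // true
        System.out.println(readFullyOk(f, 8)); // false (EOFException)
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            System.out.println(raf.readInt()); // 42
        }
    }
}
```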

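RandomAccessFile keeps an internal file pointer that marks the current read/write position, and every read/write operation advances it automatically. Unlike streams, RandomAccessFile lets you both query and change this pointer: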

// Returns the current pointer
public native long getFilePointer() throws IOException
// Moves the current pointer to pos
public native void seek(long pos) throws IOException

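Skip n bytes of input:

public int skipBytes(int n) throws IOException // moves the pointer; returns the number of bytes actually skipped (it never goes past the end of the file)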

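Get the file length in bytes:

public native long length() throws IOException

Change the file length:

// If the current length is less than newLength, the file is extended; if it is greater, the file
// is truncated, and a file pointer beyond newLength is moved back to newLength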
public native void setLength(long newLength) throws IOException
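The pointer-related methods can be combined to update data in place. The sketch below writes three ints, uses seek to overwrite the second one, and then moves the pointer to the end before calling setLength, so the pointer adjustment is visible. Each int takes 4 bytes, so the int at index i starts at offset 4 * i. The class name PointerDemo and this record layout are assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class PointerDemo {
    // Returns {pointer after writes, length after truncation, pointer after truncation, int at index 1}
    static long[] run(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.setLength(0);                        // start from an empty file
            raf.writeInt(10);
            raf.writeInt(20);
            raf.writeInt(30);
            long afterWrites = raf.getFilePointer(); // 12: each writeInt advanced it by 4

            raf.seek(4);                             // jump to the int at index 1
            raf.writeInt(99);                        // overwrite 20 in place

            raf.seek(raf.length());                  // pointer at the end (12)
            raf.setLength(8);                        // truncate to the first two ints
            long length = raf.length();              // 8
            long pointer = raf.getFilePointer();     // 8: moved back because it was past newLength

            raf.seek(4);
            long second = raf.readInt();             // 99
            return new long[] {afterWrites, length, pointer, second};
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("pointer", ".bin");
        f.deleteOnExit();
        System.out.println(java.util.Arrays.toString(run(f))); // [12, 8, 8, 99]
    }
}
```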

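Avoid the following two methods. writeBytes discards the high 8 bits of every char. readLine reads one byte at a time and turns each byte into a char as if the file were Latin-1. As a result, neither handles non-ASCII text such as Chinese correctly:

public final void writeBytes(String s) throws IOException
public final String readLine() throws IOException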
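A quick way to see the problem with writeBytes, together with one common alternative, is sketched below. writeBytes keeps only the low 8 bits of each char, so the character 中 (U+4E2D) turns into the single byte 0x2D. Encoding the string explicitly as UTF-8 and writing the bytes keeps it intact. DataOutput.writeUTF is another option, but it writes a 2-byte length prefix followed by modified UTF-8. The class name WriteBytesDemo is an assumption.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class WriteBytesDemo {
    // Writes s with writeBytes (lossy for non-Latin-1 text) and returns the raw file bytes
    static byte[] viaWriteBytes(File f, String s) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.setLength(0);
            raf.writeBytes(s); // discards the high 8 bits of every char
            return readAll(raf);
        }
    }

    // Writes s encoded as UTF-8 and returns the raw file bytes
    static byte[] viaUtf8(File f, String s) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.setLength(0);
            raf.write(s.getBytes(StandardCharsets.UTF_8));
            return readAll(raf);
        }
    }

    private static byte[] readAll(RandomAccessFile raf) throws IOException {
        byte[] data = new byte[(int) raf.length()];
        raf.seek(0);
        raf.readFully(data);
        return data;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("bytes", ".txt");
        f.deleteOnExit();
        String s = "\u4e2d"; // the character 中
        System.out.println(viaWriteBytes(f, s).length); // 1 (the byte 0x2D, i.e. '-')
        System.out.println(viaUtf8(f, s).length);       // 3
    }
}
```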

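III. Memory-Mapped Files

Memory-mapped files are not a concept introduced by Java. They are a feature provided by the operating system, and most operating systems support it.

(1) Basic Concepts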

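A memory-mapped file is a file mapped into memory: the file corresponds to a byte array in memory, operations on the file become operations on that array, and changes to the array are reflected directly in the file. The mapping can cover the whole file or only part of it.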

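Characteristics of memory-mapped files:

1) The mapped memory is the operating system's own (kernel) file cache mapped into the process, so the data is copied only once instead of being copied again into a user buffer. This is usually more efficient than ordinary reads and writes.

2) The mapping can be shared by several programs: a change one program makes to the memory is visible to the others, which makes it especially suitable for communication between programs.

The operating system itself generally uses memory mapping when it loads executable files.

Limitation of memory mapping:

Memory is allocated in whole pages, so mapping small files wastes memory.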

(2) Usage

A memory-mapped file is obtained through a FileInputStream, FileOutputStream, or RandomAccessFile, each of which has the method:

public FileChannel getChannel()

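FileChannel has the method below. map needs a suitable channel: READ_ONLY requires a channel open for reading, while READ_WRITE and PRIVATE require one open for both reading and writing (such as a RandomAccessFile opened with "rw"). In practice, therefore, a FileOutputStream's channel cannot be mapped.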

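/**
 * Maps a region of this channel's file into memory. The result is a MappedByteBuffer,
 * which represents the byte array in memory.
 * If the region extends beyond the end of the file, the JDK implementation grows the
 * file in READ_WRITE mode (the specification leaves this behavior unspecified).
 * @param mode the mapping mode:
 *   READ_ONLY: read-only
 *   READ_WRITE: read and write
 *   PRIVATE: private (copy-on-write) mode; changes are not written to the file and are not seen by other programs
 * @param position the offset in the file at which the mapping starts
 * @param size the length of the mapped region
 * @return the mapped buffer. Once the mapping is established the file can be closed, and
 *         later reads and writes go through the MappedByteBuffer
 */
public abstract MappedByteBuffer map(MapMode mode, long position,
                                     long size) throws IOException;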
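The sketch below puts these pieces together. It maps a file in READ_WRITE mode through RandomAccessFile.getChannel(), writes an int at offset 0 and a long at offset 8 using absolute puts, and calls force(). It then reopens the file with a plain RandomAccessFile to confirm the data reached it; both sides use big-endian order by default. The class name MmapDemo, the temporary file, and the 1024-byte size are assumptions. The code sizes the file with setLength before mapping rather than relying on map() to grow it.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapDemo {
    static final int SIZE = 1024;

    // Maps the whole file read-write and writes two values through the buffer
    static void writeMapped(File f, int i, long l) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw");
             FileChannel channel = raf.getChannel()) {
            raf.setLength(SIZE); // size the file explicitly before mapping
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, SIZE);
            buf.putInt(0, i);    // absolute put: bytes 0..3
            buf.putLong(8, l);   // absolute put: bytes 8..15
            buf.force();         // push the changes to the storage device
        }
    }

    // Reads the values back with ordinary file I/O (no mapping involved)
    static long[] readPlain(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            int i = raf.readInt();
            raf.seek(8);
            long l = raf.readLong();
            return new long[] {i, l, raf.length()};
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("mmap", ".bin");
        f.deleteOnExit();
        writeMapped(f, 42, 123456789L);
        System.out.println(java.util.Arrays.toString(readPlain(f))); // [42, 123456789, 1024]
    }
}
```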

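MappedByteBuffer is a subclass of ByteBuffer. A ByteBuffer can be thought of as wrapping a byte array whose length cannot change; for a memory-mapped file, that length is the size passed to map. A basic property of ByteBuffer is position, the current read/write position. The related methods are:

public final int position() // returns the current read/write position
public final Buffer position(int newPosition)  // changes the current read/write position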

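ByteBuffer has many methods that read or write data at the current position:

public abstract byte get()  // reads one byte at the current position
public ByteBuffer get(byte[] dst)  // reads dst.length bytes from the current position into dst
public abstract int getInt() // reads an int at the current position
public final ByteBuffer put(byte[] src)  // writes the byte array src at the current position
public abstract ByteBuffer putLong(long value) // writes value at the current position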

All of these advance position after reading or writing. A matching set of methods takes an explicit index instead:

public abstract int getInt(int index)  // reads an int at index
public abstract double getDouble(int index) 
public abstract ByteBuffer putDouble(int index, double value)
public abstract ByteBuffer putLong(int index, long value)

These methods do not change the current read/write position.
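The difference between the two families is easiest to see on an ordinary heap buffer created with ByteBuffer.allocate; MappedByteBuffer inherits the same methods. In this sketch, relative puts move position forward by the size of each value, while absolute gets and puts leave it unchanged. The class name BufferDemo is an assumption.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BufferDemo {
    // Records the position after each step
    static int[] positions() {
        ByteBuffer buf = ByteBuffer.allocate(16); // capacity is fixed at 16 bytes
        int[] seen = new int[4];
        seen[0] = buf.position();  // 0: a new buffer starts at 0
        buf.putInt(7);             // relative put: 4 bytes, position += 4
        seen[1] = buf.position();  // 4
        buf.putLong(8L);           // relative put: 8 bytes, position += 8
        seen[2] = buf.position();  // 12
        buf.getInt(0);             // absolute get: reads bytes 0..3, position untouched
        buf.putDouble(4, 2.5);     // absolute put: overwrites bytes 4..11, position untouched
        seen[3] = buf.position();  // still 12
        return seen;
    }

    // Writes with chained relative puts, then reads back with absolute gets
    static String readBack() {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putInt(7).putLong(8L); // relative puts return the buffer, so they chain
        return buf.getInt(0) + "," + buf.getLong(4) + "," + buf.position();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(positions())); // [0, 4, 12, 12]
        System.out.println(readBack());                   // 7,8,12
    }
}
```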

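MappedByteBuffer also defines a few methods of its own:

// Tells whether the buffer's content is likely resident in physical memory; only a hint
public final boolean isLoaded()
// Makes a best effort to load the file content into physical memory
public final MappedByteBuffer load() 
// Forces changes made to the buffer to be written to the storage device
public final MappedByteBuffer force()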

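IV. The Standard Serialization Mechanism

Serialization converts an object into a byte stream; deserialization converts a byte stream back into an object.

(1) Basic Usage

To make a class serializable, it only needs to implement java.io.Serializable, a marker interface that declares no methods.

Objects of classes that implement Serializable can then be written and read with ObjectOutputStream and ObjectInputStream.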

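ObjectOutputStream is a subclass of OutputStream that implements the ObjectOutput interface, a subinterface of DataOutput that adds the method:

public void writeObject(Object obj) throws IOException

It converts the object obj into bytes and writes them to the stream.

The core method of ObjectInputStream is:

public Object readObject() throws ClassNotFoundException, IOException

It reads bytes from the stream and converts them back into an object.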

public static void writeStudents(List<Student> students)
        throws IOException {
    // try-with-resources closes the stream even if writing fails
    try (ObjectOutputStream out = new ObjectOutputStream(
            new BufferedOutputStream(new FileOutputStream("students.dat")))) {
        out.writeInt(students.size()); // write the count first
        for (Student s : students) {
            out.writeObject(s);
        }
    }
}

public static List<Student> readStudents() throws IOException,
        ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new BufferedInputStream(
            new FileInputStream("students.dat")))) {
        int size = in.readInt(); // read the count, then that many objects
        List<Student> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add((Student) in.readObject());
        }
        return list;
    }
}
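The same round trip can be tried without touching the file system by swapping FileOutputStream/FileInputStream for byte-array streams. The original does not show the Student class, so the one below is a minimal stand-in with assumed fields. The class name SerialDemo is also an assumption, and the code assumes Java 9 or later for List.of.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class SerialDemo {
    // Minimal serializable stand-in for Student; the fields are assumptions
    static class Student implements Serializable {
        String name;
        int age;
        Student(String name, int age) { this.name = name; this.age = age; }
        @Override public String toString() { return name + "(" + age + ")"; }
    }

    static byte[] write(List<Student> students) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(students.size()); // count first, as in writeStudents
            for (Student s : students) {
                out.writeObject(s);
            }
        }
        return bytes.toByteArray();
    }

    static List<Student> read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int size = in.readInt();
            List<Student> list = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                list.add((Student) in.readObject());
            }
            return list;
        }
    }

    public static void main(String[] args) throws Exception {
        List<Student> back = read(write(List.of(new Student("tom", 18), new Student("jim", 17))));
        System.out.println(back); // [tom(18), jim(17)]
    }
}
```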

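(2) Custom Serialization

There are two main mechanisms for customizing serialization:

1) The transient keyword

Fields declared transient are not saved by Java's default serialization mechanism, but they can still be saved manually in a custom writeObject method.

2) Implementing writeObject and readObject

writeObject must be declared as:

private void writeObject(java.io.ObjectOutputStream s) throws java.io.IOException

For example, ArrayList has: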

private void writeObject(java.io.ObjectOutputStream s)
    throws java.io.IOException{
    // Write out element count, and any hidden stuff
    int expectedModCount = modCount;
    // This must be called (once) even if every field in the class is transient
    s.defaultWriteObject();
    // Write out size as capacity for behavioural compatibility with clone()
    s.writeInt(size);
    // Write out all elements in the proper order.
    for (int i=0; i<size; i++) {
        s.writeObject(elementData[i]);
    }
    if (modCount != expectedModCount) {
        throw new ConcurrentModificationException();
    }
}

readObject must be declared as:

private void readObject(java.io.ObjectInputStream s) throws java.io.IOException, ClassNotFoundException

The corresponding method in ArrayList:

private void readObject(java.io.ObjectInputStream s)
    throws java.io.IOException, ClassNotFoundException {
    elementData = EMPTY_ELEMENTDATA;
    // Read in size, and any hidden stuff
    s.defaultReadObject();
    // Read in capacity
    s.readInt(); // ignored
    if (size > 0) {
        // be like clone(), allocate array based upon size not capacity
        ensureCapacityInternal(size);
        Object[] a = elementData;
        // Read in all elements in the proper order.
        for (int i=0; i<size; i++) {
            a[i] = s.readObject();
        }
    }
}
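A smaller example follows the same pattern as ArrayList. The hypothetical TagList class below keeps its tags in a transient array that default serialization would skip. writeObject calls defaultWriteObject and then writes only the elements in use, and readObject reads them back in the same order. The field initializer is not run during deserialization, so readObject has to allocate the array itself. The class and field names are assumptions for illustration.

```java
import java.io.*;

public class CustomSerialDemo {
    // Hypothetical class: 'tags' is transient, but saved and restored by hand
    static class TagList implements Serializable {
        private static final long serialVersionUID = 1L;
        String owner;
        transient String[] tags = new String[10]; // capacity 10, partly unused
        int size;

        void add(String tag) { tags[size++] = tag; }

        private void writeObject(ObjectOutputStream s) throws IOException {
            s.defaultWriteObject();     // writes owner and size
            for (int i = 0; i < size; i++) {
                s.writeObject(tags[i]); // only the elements in use
            }
        }

        private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
            s.defaultReadObject();      // restores owner and size
            tags = new String[Math.max(size, 10)]; // the initializer above did not run
            for (int i = 0; i < size; i++) {
                tags[i] = (String) s.readObject();
            }
        }
    }

    static TagList roundTrip(TagList t) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(t);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (TagList) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        TagList t = new TagList();
        t.owner = "tom";
        t.add("java");
        t.add("io");
        TagList copy = roundTrip(t);
        System.out.println(copy.owner + " " + copy.size + " " + copy.tags[0] + " " + copy.tags[1]); // tom 2 java io
    }
}
```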

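(3) How It Works

The basic logic of writeObject:

1) If the object does not implement Serializable, it throws a NotSerializableException.

2) Every object written gets a handle (a sequence number). If the same object has already been written, later writes store only a reference to that handle, which correctly handles shared references and circular references.

3) If the class defines its own writeObject method, that custom method is called.

4) It relies on reflection.

The basic logic of readObject:

1) It does not call the constructors of the serializable class; only the no-argument constructor of the nearest non-serializable superclass (such as Object) is run.

2) It acts as a standalone constructor of its own, initializing the object from the byte stream, again through reflection.

3) Type information referenced in the stream is loaded dynamically while parsing. If a class cannot be found, ClassNotFoundException is thrown.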
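Two of these points can be checked directly. In the sketch below, a node that points to itself is serialized and comes back with its cycle intact. Two fields referring to the same node still refer to one object after deserialization. A static counter shows that Node's constructor is not run by readObject. The class names IdentityDemo, Node, and Pair are assumptions.

```java
import java.io.*;

public class IdentityDemo {
    static int constructed = 0; // counts calls to Node's constructor

    static class Node implements Serializable {
        String name;
        Node next;
        Node(String name) { this.name = name; constructed++; }
    }

    // Two fields that may refer to the same Node
    static class Pair implements Serializable {
        Node first, second;
    }

    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Node loop = new Node("loop");
        loop.next = loop;   // circular reference
        Pair p = new Pair();
        p.first = loop;
        p.second = loop;    // shared reference

        int before = constructed;
        Pair copy = roundTrip(p);
        System.out.println(copy.first == copy.second);     // true: one shared object
        System.out.println(copy.first.next == copy.first); // true: cycle preserved
        System.out.println(constructed - before);          // 0: Node() was not called
    }
}
```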

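(4) Versioning

The problem: objects serialized to a file are persisted and never change on their own, while our code keeps evolving. If a class definition changes, deserialization can fail.

The solution: Java automatically assigns each class a version number (serialVersionUID) computed from the class's structure. If the class definition changes, the version number changes and no longer matches the one recorded in the stream, so deserialization throws java.io.InvalidClassException.

However, computing the version automatically costs some performance, and the computed value is sensitive to compiler details. To keep better control, we therefore usually declare the version number ourselves. Note that a version number generated by an IDE does not update itself when the class changes.

If the version numbers match but the fields do not:

1) A field was removed (present in the stream but not in the class definition): the value in the stream is ignored;

2) A field was added (present in the class definition but not in the stream): it is set to its default value;

3) A field's type changed: InvalidClassException is thrown.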
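Declaring the version number yourself looks like the sketch below. The field must be named serialVersionUID and be static, final, and of type long. ObjectStreamClass.lookup reports the value the serialization mechanism will actually use. The class names VersionDemo, VersionedStudent, and UnversionedStudent are assumptions.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class VersionDemo {
    static class VersionedStudent implements Serializable {
        // Explicit version: stays 1L until we deliberately change it,
        // even if fields are later added or removed
        private static final long serialVersionUID = 1L;
        String name;
        int age;
    }

    // No explicit version: one is computed from the class's structure
    static class UnversionedStudent implements Serializable {
        String name;
    }

    static long versionOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(versionOf(VersionedStudent.class));   // 1
        System.out.println(versionOf(UnversionedStudent.class)); // a computed hash value
    }
}
```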

V. Serialization with Jackson

(1) Basic Usage

1. JSON

Student student = new Student("tom", 18, 80.9d);
// ObjectMapper is thread-safe once it has been configured
ObjectMapper mapper = new ObjectMapper();
mapper.enable(SerializationFeature.INDENT_OUTPUT);
// By default, public fields and properties with public getters are serialized
String str = mapper.writeValueAsString(student);
System.out.println(str);
mapper.writeValue(new File("student.json"), student);

Deserializing it again:

ObjectMapper mapper = new ObjectMapper();
// By default, the target class must have a no-argument constructor
Student s = mapper.readValue(new File("student.json"), Student.class);
System.out.println(s.toString());

Other important methods:

public byte[] writeValueAsBytes(Object value)
public void writeValue(OutputStream out, Object value)
public void writeValue(Writer w, Object value)
public void writeValue(File resultFile, Object value)

public <T> T readValue(InputStream src, Class<T> valueType)
public <T> T readValue(Reader src, Class<T> valueType)
public <T> T readValue(String content, Class<T> valueType)
public <T> T readValue(byte[] src, Class<T> valueType)

2. XML

Serializing to XML works just like JSON, using XmlMapper from the jackson-dataformat-xml module:

Student student = new Student("tom", 18, 80.9d);
ObjectMapper mapper = new XmlMapper();
mapper.enable(SerializationFeature.INDENT_OUTPUT);
String str = mapper.writeValueAsString(student);
mapper.writeValue(new File("student.xml"), student);
System.out.println(str);

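3. MessagePack

MessagePack is a binary encoding of JSON-like data that is more compact and efficient. Because it is binary, it cannot be written out as a String. MessagePackFactory comes from the msgpack-jackson library: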

Student student = new Student("jim", 18, 80.9d);
ObjectMapper mapper = new ObjectMapper(new MessagePackFactory());
byte[] bytes = mapper.writeValueAsBytes(student);
mapper.writeValue(new File("student.msgpack"), student);

Student s = mapper.readValue(new File("student.msgpack"), Student.class);
System.out.println(s.toString());

4. Container Objects

List<Student> students = Arrays.asList(
        new Student("tom", 18, 80.9d), new Student("jerry", 17, 67.5d));
ObjectMapper mapper = new ObjectMapper();
mapper.enable(SerializationFeature.INDENT_OUTPUT);
String str = mapper.writeValueAsString(students);
mapper.writeValue(new File("students.json"), students);
System.out.println(str);
// Deserialization is different: an anonymous TypeReference subclass carries the
// generic type List<Student>, which would otherwise be lost to type erasure
List<Student> list = mapper.readValue(new File("students.json"),
        new TypeReference<List<Student>>() {});
System.out.println(list.toString());

(2) Custom Serialization

1. Ignoring Fields

// Applies to a field, getter, or setter
@JsonIgnore
double score;

// Applies to a class: names the fields to ignore
@JsonIgnoreProperties("score")
public class Student {

2. References to the Same Object

Problem:

Dog price = new Dog("Price");
Person jim = new Person("Jim", price, price);
ObjectMapper mapper = new ObjectMapper();
mapper.enable(SerializationFeature.INDENT_OUTPUT);
String value = mapper.writeValueAsString(jim);
System.out.println(value);
Person person = mapper.readValue(value, Person.class);
if (person.getFirst() == person.getSecond()) {
    System.out.println("Same");
} else {
    System.out.println("Different"); // prints Different: first and second now refer to two separate Dog objects
}

Solution:

@JsonIdentityInfo(
        generator = ObjectIdGenerators.IntSequenceGenerator.class,
        property = "id"
)
public class Dog {

3. Circular References

Problem:

Parent parent = new Parent();
parent.name = "Father";
Child child = new Child();
child.name = "Child";
parent.child = child;
child.parent = parent;
ObjectMapper mapper = new ObjectMapper();
mapper.enable(SerializationFeature.INDENT_OUTPUT);
String s = mapper.writeValueAsString(parent); // fails with infinite recursion: a StackOverflowError, which Jackson 2.x wraps in a JsonMappingException
System.out.println(s);

Solution:

public class Parent {
    public String name;
    @JsonManagedReference // marks the forward (managed) side of the reference
    public Child child;
}
public class Child {
    public String name;
    @JsonBackReference // marks the back reference: omitted when serializing, restored when deserializing
    public Parent parent;
}

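4. Ignoring Unknown Fields During Deserialization

Problem: unlike Java's standard serialization, Jackson by default throws an UnrecognizedPropertyException when it meets an unknown field during deserialization.

Solution: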

mapper.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);

Or:

@JsonIgnoreProperties(ignoreUnknown=true)
public class Student {
//...
}

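5. Inheritance and Polymorphism

Jackson does not handle inheritance and polymorphism automatically. When a ShapeManager holding Circle and Square objects is serialized, no type information is written. On deserialization, Jackson therefore sees only the declared element type Shape and cannot tell which subclass to create: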

public class Shape {
}

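public class Circle extends Shape {
    private int r;
    public Circle() {
    }
    public Circle(int r) {
        this.r = r;
    }
    // A public getter makes the private field visible to Jackson
    public int getR() {
        return r;
    }
}

public class Square extends Shape {
    private int l;
    public Square() {
    }
    public Square(int l) {
        this.l = l;
    }
    public int getL() {
        return l;
    }
}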

public class ShapeManager {
    private List<Shape> shapeList;
    public List<Shape> getShapeList() {
        return shapeList;
    }
    public void setShapeList(List<Shape> shapeList) {
        this.shapeList = shapeList;
    }
}

Solution: record the subtype name in a "type" property and register the subtypes on the base class:

@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.PROPERTY, property = "type")
@JsonSubTypes({
        @JsonSubTypes.Type(value = Circle.class, name = "circle"),
        @JsonSubTypes.Type(value = Square.class, name = "square")
})
public class Shape {
}

6. Renaming Fields

@JsonProperty("studentName") // changes the key used in the output
String name;

// For XML: changes the name of the root element
@JsonRootName("student")
public class Student {

7. Formatting Dates

By default, a Date is serialized as a long integer (milliseconds since the epoch). To change this:

@JsonFormat(pattern="yyyy-MM-dd HH:mm:ss", timezone="GMT+8")
public Date date = new Date();

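8. Configuring Constructors

During deserialization, if the class has no no-argument constructor, Jackson throws an exception. To fix this, mark a constructor with @JsonCreator and bind each parameter to a property with @JsonProperty: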

@JsonCreator
public Student(
        @JsonProperty("name") String name,
        @JsonProperty("age") int age,
        @JsonProperty("score") double score) {
    this.name = name;
    this.age = age;
    this.score = score;
}

Simple is important！
