数据的I/O序列化操作

序列化是将对象转化为字节流的方法，序列化目的有：

1> 进程间通信；

2> 数据持久性存储。

RPC（Remote Procedure Call Protocol）——远程过程调用协议，它是一种通过网络从远程计算机程序上请求服务，而不需要了解底层网络技术的协议。RPC协议假定某些传输协议的存在，如TCP或UDP，为通信程序之间携带信息数据。在OSI网络通信模型中，RPC跨越了传输层和应用层。RPC使得开发包括网络分布式多程序在内的应用程序更加容易。

Hadoop采用RPC来实现进程间的通信。Generally，RPC的序列化机制有以下特点：

1> 紧凑：紧凑的格式可以利用带宽，加快传输速度；

2> 快速：能减少序列化和反序列化的开销，这会有效减少进程间通信的时间；

3> 可扩展：可以逐步改变，是Client与Server端直接相关的。例如，可以随时加入一个新的参数方法调用；

4> 互操作性：支持不同语言编写的Client和Server端交换Data。

在Hadoop中，序列化处于核心地位。因为无论是存储文件还是在计算中传输数据，都需要执行序列化的过程。序列化与反序列化的速度，序列化后的data大小等都会影响数据传输的速度，以致影响计算的效率。Hadoop并没有采用Java的序列化机制，而是重新写了一个序列化机制Writable（具有紧凑、快速但不易扩展，亦不利于不同语言的互操作），并允许对自己定义的类加入序列化与反序列化方法. 当要在进程间传递对象或持久化对象的时候，就需要序列化对象成字节流，反之当要将接收到或从磁盘读取的字节流转换为对象，就要进行反序列化。Writable是Hadoop的序列化格式，Hadoop定义了这样一个Writable接口。

public interface Writable {

// Serialize the fields of this object to out.

// @param(out) DataOuput (to serialize this object into). @throws IOException

void write(DataOutput out) throws IOException;

// Deserialize the fields of this object from in. For efficiency, implementations should attempt to re-use storage in the existing object where possible.

// @param(in) DataInput (to deseriablize this object from). @throws IOException

void readFields(DataInput in) throws IOException;

}

Writable是Hadoop的核心，Hadoop通过它定义了Hadoop中基本的数据类型及其操作。Generally，无论是上传下载data还是运行MapReduce程序，都需使用Writable类。

//WritableComparable can be compared to each other, typically via Comparator. Any type which is to be used as a key in the Hadoop Map-Reduce framework should implement this interface.

public interface WritableComparable<T> extends Writable, Comparable<T> { }

看看一个WritableComparable的具体实例：

/** A WritableComparable for ints. */

public class IntWritable implements WritableComparable {

private int value;

public IntWritable() {}

public IntWritable(int value) { set(value); }

/** Set the value of this IntWritable. */

public void set(int value) { this.value = value; }

/** Return the value of this IntWritable. */

public int get() { return value; }

public void readFields(DataInput in) throws IOException { value = in.readInt(); }

public void write(DataOutput out) throws IOException { out.writeInt(value); }

/** Returns true if o is a IntWritable with the same value. */

public boolean equals(Object o) {

if (!(o instanceof IntWritable))

{ return false; }

IntWritable other = (IntWritable)o;

return this.value == other.value;

}

public int hashCode() { return value; }

/** Compares two IntWritables. */

public int compareTo(Object o) {

int thisValue = this.value;

int thatValue = ((IntWritable)o).value;

return (thisValue<thatValue ? -1 : (thisValue==thatValue ? 0 : 1));

}

public String toString() { return Integer.toString(value); }

/** A Comparator optimized for IntWritable. */

public static class Comparator extends WritableComparator {

public Comparator() { super(IntWritable.class); }

public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {

int thisValue = readInt(b1, s1);

int thatValue = readInt(b2, s2);

return (thisValue<thatValue ? -1 : (thisValue==thatValue ? 0 : 1));

}

// register this comparator

static { WritableComparator.define(IntWritable.class, new Comparator()); }

}

代码中的static块调用WritableComparator的static方法define()用来注册上面这个Comparator，就是将其加入WritableComparator的comparators成员中，comparators是HashMap类型且是static的。这样，就告诉WritableComparator，当我使用WritableComparator.get（IntWritable.class）方法的时候，你返回我注册的这个Comparator给我【对IntWritable来说就是IntWritable.Comparator】，然后我就可以使用comparator.compare(byte[] b1, int s1, int l1,byte[] b2, int s2, int l2)来比较b1和b2，而不需要将它们反序列化成对象。comparator.compare(byte[] b1, int s1, int l1,byte[] b2, int s2, int l2)中的readInt()是从WritableComparator继承来的，它将IntWritable的value从byte数组中通过移位转换出来。

相关调用如下：

//params byte[] b1, byte[] b2  
RawComparator<IntWritable>comparator = WritableComparator.get(IntWritable.class);  
comparator.compare(b1,0,b1.length,b2,0,b2.length);

注意，当comparators中没有注册要比较的类的Comparator，则会返回一个默认的Comparator，然后使用这个默认Comparator的compare(byte[] b1, int s1, int l1,byte[] b2, int s2, int l2)方法比较b1、b2的时候还是要序列化成对象的，详见后面细讲WritableComparator。

另外关于WritableComparator类定义如下（上面用到过）:

 1 public class WritableComparator implements RawComparator {
 2 
 3   private static HashMap<Class, WritableComparator> comparators =
 4     new HashMap<Class, WritableComparator>(); // registry
 5 
 6   /** Get a comparator for a {@link WritableComparable} implementation. */
 7   public static synchronized WritableComparator get(Class<? extends WritableComparable> c) {
 8     WritableComparator comparator = comparators.get(c);
 9     if (comparator == null)
10       comparator = new WritableComparator(c, true);
11     return comparator;
12   }
13 
14   /** Register an optimized comparator for a {@link WritableComparable}
15    * implementation. */
16   public static synchronized void define(Class c,
17                                          WritableComparator comparator) {
18     comparators.put(c, comparator);
19   }
20   .......
21 }