Hadoop Serialization

Open question:

Hadoop serialization can reuse objects, but where does that reuse actually happen?

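Introduction to Hadoop serialization
Hadoop serialization in detail
1. The core of Hadoop serialization
2. Comparison interfaces in Hadoop serialization
3. The ObjectWritable class
References
1. Comparable and Comparator
2. ConcurrentHashMap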

Introduction to Hadoop serialization

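Java's built-in serialization works by calling the writeObject method on an ObjectOutputStream. Hadoop's serialization instead has each object write itself to a stream through its own write method. Hadoop serialization can also reuse objects, which reduces system overhead because fewer objects have to be created.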

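Hadoop serialization in detail

1. The core of Hadoop serialization

The core of Hadoop serialization is the Writable interface: any object whose class implements it can be serialized. Writable has two methods: write, which serializes the object's fields to a stream, and readFields, which reads them back from a stream.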

public interface Writable {
  /** 
   * Serialize the fields of this object to <code>out</code>.
   * 
   * @param out <code>DataOutput</code> to serialize this object into.
   * @throws IOException
   */
  void write(DataOutput out) throws IOException;

  /** 
   * Deserialize the fields of this object from <code>in</code>. 
   * 
   * <p>For efficiency, implementations should attempt to re-use storage in the 
   * existing object where possible.</p>
   * 
   * @param in <code>DataInput</code> to deserialize this object from.
   * @throws IOException
   */
  void readFields(DataInput in) throws IOException;
}
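
As a rough illustration of the readFields contract above, and of one place where object reuse can happen, the sketch below keeps a single instance and calls readFields on it once per record instead of allocating a new object each time. It is not Hadoop code: SimpleWritable is a local stand-in that mirrors the Writable interface so the example runs without Hadoop on the classpath, and PointWritable and WritableReuseDemo are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in mirroring org.apache.hadoop.io.Writable (assumption: same two methods).
interface SimpleWritable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical record type, used only for illustration.
class PointWritable implements SimpleWritable {
  int x;
  int y;

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(x);
    out.writeInt(y);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // Overwrite the existing fields rather than creating a new object.
    x = in.readInt();
    y = in.readInt();
  }
}

class WritableReuseDemo {
  // Serializes the given (x, y) pairs back to back into one byte array.
  static byte[] serialize(int[][] points) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      PointWritable p = new PointWritable();
      for (int[] xy : points) {
        p.x = xy[0];
        p.y = xy[1];
        p.write(out);
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reads `count` records into ONE reused instance and sums their x fields.
  static int sumX(byte[] data, int count) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      PointWritable reused = new PointWritable();
      int sum = 0;
      for (int i = 0; i < count; i++) {
        reused.readFields(in); // same object, new contents
        sum += reused.x;
      }
      return sum;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] data = serialize(new int[][] {{1, 2}, {3, 4}, {5, 6}});
    System.out.println(data.length);    // 24 (three records, two 4-byte ints each)
    System.out.println(sumX(data, 3));  // 9
  }
}
```

One consequence of this pattern is that a caller who needs to keep a record beyond the next readFields call has to copy it, because the reused instance is overwritten on every read.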

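2. Comparison interfaces in Hadoop serialization

The important comparison types in Hadoop are the interfaces WritableComparable and RawComparator and the class WritableComparator. WritableComparable is defined as follows: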

public interface WritableComparable<T> extends Writable, Comparable<T> {
}

This interface extends both Writable and Comparable, so every serializable type that implements WritableComparable also implements the compareTo method. IntWritable is one example:

public class IntWritable implements WritableComparable<IntWritable> {
  private int value;  // the wrapped int (other members omitted from this excerpt)

  /** Compares two IntWritables. */
  @Override
  public int compareTo(IntWritable o) {
    int thisValue = this.value;
    int thatValue = o.value;
    return (thisValue<thatValue ? -1 : (thisValue==thatValue ? 0 : 1));
  }
}

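RawComparator extends the Comparator interface and adds a compare method that reads directly from the serialized bytes of two objects and compares them, which avoids creating objects just for the comparison.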

public interface RawComparator<T> extends Comparator<T> {

  /**
   * Compare two objects in binary.
   * b1[s1:l1] is the first object, and b2[s2:l2] is the second object.
   * 
   * @param b1 The first byte array.
   * @param s1 The position index in b1. The object under comparison's starting index.
   * @param l1 The length of the object in b1.
   * @param b2 The second byte array.
   * @param s2 The position index in b2. The object under comparison's starting index.
   * @param l2 The length of the object under comparison in b2.
   * @return An integer result of the comparison.
   */
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);

}
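
To make the byte-level comparison concrete, here is a minimal sketch that compares two serialized ints without turning them into objects. It assumes the values were written with DataOutput.writeInt (4 bytes, big-endian); RawIntCompareDemo and its encode helper are hypothetical names, and the local readInt is a stand-in for the WritableComparator.readInt helper used in the IntWritable excerpt.

```java
import java.nio.ByteBuffer;

class RawIntCompareDemo {
  // Decodes a big-endian int starting at `start` (stand-in for WritableComparator.readInt).
  static int readInt(byte[] bytes, int start) {
    return ((bytes[start] & 0xff) << 24)
        | ((bytes[start + 1] & 0xff) << 16)
        | ((bytes[start + 2] & 0xff) << 8)
        | (bytes[start + 3] & 0xff);
  }

  // Same parameter shape as RawComparator.compare: each value occupies b[s .. s + l).
  static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    return Integer.compare(readInt(b1, s1), readInt(b2, s2));
  }

  // Encodes an int as 4 big-endian bytes, the layout DataOutput.writeInt produces.
  static byte[] encode(int value) {
    return ByteBuffer.allocate(4).putInt(value).array();
  }

  public static void main(String[] args) {
    byte[] a = encode(-5);
    byte[] b = encode(7);
    System.out.println(compare(a, 0, 4, b, 0, 4) < 0);  // true
  }
}
```

Decoding to int before comparing matters here: comparing the raw bytes lexicographically would order negative values after positive ones, because the sign bit is set in the first byte.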

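WritableComparator is a general-purpose implementation of RawComparator for WritableComparable classes. It serves two purposes:

a. It provides a default implementation of the raw compare method, which deserializes the two objects to be compared from their bytes and then calls compare on the resulting objects.

b. It acts as a factory for RawComparator instances.

Each fixed-length type contains a static nested class that extends WritableComparator and overrides compare. That comparator is then registered through the define method into a ConcurrentHashMap held by the WritableComparator class. The following excerpt from IntWritable shows the pattern: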

@Override
public String toString() {
  return Integer.toString(value);
}

/** A Comparator optimized for IntWritable. */ 
public static class Comparator extends WritableComparator {
  public Comparator() {
    super(IntWritable.class);
  }
 
  @Override
  public int compare(byte[] b1, int s1, int l1,
                     byte[] b2, int s2, int l2) {
    int thisValue = readInt(b1, s1);
    int thatValue = readInt(b2, s2);
    return (thisValue<thatValue ? -1 : (thisValue==thatValue ? 0 : 1));
  }
}

static {                                        // register this comparator
  WritableComparator.define(IntWritable.class, new Comparator());
}
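
The register-then-look-up pattern can be sketched in isolation as follows. This is a simplified model, not the WritableComparator source: ComparatorRegistryDemo is a hypothetical class, it stores plain java.util.Comparator instances, and its fallback to natural ordering is an assumption that stands in for the default comparator described in point a.

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentHashMap;

class ComparatorRegistryDemo {
  // Thread-safe map from a type to the comparator registered for it.
  private static final ConcurrentHashMap<Class<?>, Comparator<?>> REGISTRY =
      new ConcurrentHashMap<>();

  // Registers an optimized comparator for a type (typically called from a static block).
  static <T> void define(Class<T> type, Comparator<T> comparator) {
    REGISTRY.put(type, comparator);
  }

  // Factory method: returns the registered comparator, or a generic fallback.
  @SuppressWarnings("unchecked")
  static <T extends Comparable<T>> Comparator<T> get(Class<T> type) {
    Comparator<T> registered = (Comparator<T>) REGISTRY.get(type);
    if (registered != null) {
      return registered;
    }
    return Comparator.naturalOrder(); // assumption: stand-in for the default comparator
  }

  public static void main(String[] args) {
    Comparator<String> byLength = Comparator.comparingInt(String::length);
    define(String.class, byLength);
    System.out.println(get(String.class) == byLength);         // true
    System.out.println(get(Integer.class).compare(1, 2) < 0);  // true
  }
}
```

Using a ConcurrentHashMap lets static initializers of many classes register comparators from different threads without external locking.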

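3. The ObjectWritable class

ObjectWritable wraps all of the serializable types and implements serialization and deserialization for them. When writing an object, it dispatches on declaredClass as follows:

1. If declaredClass is an array type, writeObject() is called on each element of the array.

2. If declaredClass is the ArrayPrimitiveWritable type, the write method of that array wrapper is called.

3. If declaredClass is a primitive type, the write*** method matching that type is called.

4. If declaredClass is an enum type, the name of the enum constant is written.

5. If declaredClass is a Writable type, the class name of the object instance is written, followed by the instance's own write call.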
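
The dispatch described in the steps above can be sketched roughly as below. This is a simplified model under stated assumptions, not the ObjectWritable implementation: TypeDispatchDemo is a hypothetical class, DataOutput.writeUTF stands in for Hadoop's own string encoding, only int and boolean are handled among the primitives, and the ArrayPrimitiveWritable and Writable branches are left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Array;

class TypeDispatchDemo {
  enum Color { RED, GREEN }

  // Writes a type tag, then the value, choosing the encoding from declaredClass.
  static void writeObject(DataOutput out, Object instance, Class<?> declaredClass)
      throws IOException {
    out.writeUTF(declaredClass.getName());
    if (declaredClass.isArray()) {
      // Arrays: write the length, then recurse on every element.
      int length = Array.getLength(instance);
      out.writeInt(length);
      for (int i = 0; i < length; i++) {
        writeObject(out, Array.get(instance, i), declaredClass.getComponentType());
      }
    } else if (declaredClass == String.class) {
      out.writeUTF((String) instance);
    } else if (declaredClass == Integer.TYPE) {
      out.writeInt((Integer) instance);
    } else if (declaredClass == Boolean.TYPE) {
      out.writeBoolean((Boolean) instance);
    } else if (declaredClass.isEnum()) {
      // Enums: only the constant's name is written.
      out.writeUTF(((Enum<?>) instance).name());
    } else {
      throw new IllegalArgumentException("unsupported type: " + declaredClass);
    }
  }

  // Convenience wrapper that returns the serialized bytes.
  static byte[] toBytes(Object instance, Class<?> declaredClass) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      writeObject(new DataOutputStream(bytes), instance, declaredClass);
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(toBytes(7, int.class).length);
    System.out.println(toBytes(new int[] {1, 2}, int[].class).length);
    System.out.println(toBytes(Color.RED, Color.class).length);
  }
}
```

Writing the type tag before the value is what would let a reader pick the matching branch again when deserializing.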

References

1. Comparable and Comparator

http://www.cnblogs.com/sunflower627/p/3158042.html

2. ConcurrentHashMap

http://ifeve.com/concurrenthashmap/