Hadoop's Serialization Mechanism

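Serialization is the process of converting a structured object into a byte stream.

Deserialization is the reverse process: converting a byte stream back into a structured object.
Hadoop does not rely on Java's built-in serialization interface (java.io.Serializable). It provides its own reworked mechanism so that data can be transferred efficiently.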

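Characteristics of the serialization format:

1. Compact: makes efficient use of storage space.
2. Fast: little extra overhead when reading and writing data.
3. Extensible: data written in an older format can still be read transparently.
4. Interoperable: supports interaction between multiple languages.
(Hadoop's serialization format is Writable.
For example, a Java long corresponds to LongWritable.)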
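
To make the "compact" point concrete, here is a minimal sketch that compares the number of bytes produced for one long value in two ways: written directly through DataOutput.writeLong (which is how LongWritable writes its value), and written as a boxed Long through standard Java serialization. The class and method names (SerializedSizeDemo, rawSize, javaSerializedSize) are made up for this illustration, and the sketch uses only java.io so that it runs without Hadoop on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// Hypothetical demo class, not part of Hadoop.
class SerializedSizeDemo {

    // Size of a long written as raw bytes through DataOutput.
    static int rawSize(long value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(value);
        }
        return bytes.size();
    }

    // Size of the same value, boxed and written with standard Java serialization.
    static int javaSerializedSize(long value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(Long.valueOf(value));
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("DataOutput.writeLong: " + rawSize(42L) + " bytes");
        System.out.println("ObjectOutputStream:   " + javaSerializedSize(42L) + " bytes");
    }
}
```

The raw form is exactly 8 bytes. The Java-serialized form is larger because it also carries a stream header and a description of the class.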

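The two main roles of serialization in a distributed environment:

Inter-process communication and persistent storage.
In Hadoop, this includes communication between nodes.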
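
As a rough illustration of the persistent-storage role, the sketch below writes a small record to a temporary file and reads it back. It uses only java.io and java.nio.file, not Hadoop's own file formats; the class name PersistDemo and the record layout (a string key followed by a long count) are assumptions made for this example. The same write and read calls would work on a socket stream, which is the inter-process communication case.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of Hadoop.
class PersistDemo {

    // Serialize a (key, count) record to a file.
    static void save(Path file, String key, long count) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeUTF(key);
            out.writeLong(count);
        }
    }

    // Deserialize the record, reading the fields in the order they were written.
    static String load(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            String key = in.readUTF();
            long count = in.readLong();
            return key + "=" + count;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("record", ".bin");
        try {
            save(file, "hadoop", 3L);
            System.out.println(load(file)); // prints "hadoop=3"
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```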

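Custom Writable Types

Sometimes the Writable classes that ship with Hadoop (such as LongWritable and Text) are not enough, and you need to define your own serializable type.
First, define a class, for example DataBean, that implements the Writable interface and its write() and readFields() methods.
Note: readFields() must read the fields in exactly the same order and with the same data types that write() used, otherwise the object cannot be deserialized correctly!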

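import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class DataBean implements Writable {

    private String telNo;
    private long upPayLoad;
    private long downPayLoad;
    private long totalPayLoad;

    // Serialization: write the fields in a fixed order
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(telNo);
        out.writeLong(upPayLoad);
        out.writeLong(downPayLoad);
        out.writeLong(totalPayLoad);
    }

    // Deserialization: read the fields in the same order and with the same types
    @Override
    public void readFields(DataInput in) throws IOException {
        this.telNo = in.readUTF();
        this.upPayLoad = in.readLong();
        this.downPayLoad = in.readLong();
        this.totalPayLoad = in.readLong();
    }

    // getters and setters...
}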
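
One way to check a custom type like this is to serialize it to a byte array and read it back. The sketch below does that with a trimmed copy of DataBean. So that it compiles without Hadoop on the classpath, it declares a local stand-in interface with the same two methods as org.apache.hadoop.io.Writable; in a real job you would import Hadoop's interface instead. The helper class RoundTripDemo, the three-argument constructor, and the sample values are assumptions made for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Local stand-in mirroring the two methods of org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

class DataBean implements Writable {
    String telNo;
    long upPayLoad;
    long downPayLoad;
    long totalPayLoad;

    // No-arg constructor: Hadoop creates instances reflectively
    // and then fills them in by calling readFields().
    DataBean() { }

    // Hypothetical convenience constructor for this demo.
    DataBean(String telNo, long upPayLoad, long downPayLoad) {
        this.telNo = telNo;
        this.upPayLoad = upPayLoad;
        this.downPayLoad = downPayLoad;
        this.totalPayLoad = upPayLoad + downPayLoad;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(telNo);
        out.writeLong(upPayLoad);
        out.writeLong(downPayLoad);
        out.writeLong(totalPayLoad);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        telNo = in.readUTF();
        upPayLoad = in.readLong();
        downPayLoad = in.readLong();
        totalPayLoad = in.readLong();
    }
}

// Hypothetical helper, not part of Hadoop.
class RoundTripDemo {

    // Turn any Writable into a byte array.
    static byte[] serialize(Writable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        }
        return bytes.toByteArray();
    }

    // Rebuild a DataBean from the bytes produced by serialize().
    static DataBean deserialize(byte[] data) throws IOException {
        DataBean bean = new DataBean();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            bean.readFields(in);
        }
        return bean;
    }

    public static void main(String[] args) throws IOException {
        DataBean original = new DataBean("13800000000", 100L, 200L);
        DataBean copy = deserialize(serialize(original));
        System.out.println(copy.telNo + " " + copy.totalPayLoad); // prints "13800000000 300"
    }
}
```

If readFields() read the long values before the string, the same bytes would be interpreted with the wrong types, which is why the order in the two methods has to match.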