Hadoop: Serialization

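Serialization:

Object serialization encodes an object as a byte stream and rebuilds the object from that byte stream.

Encoding an object as a byte stream is called serializing the object; rebuilding the object from the byte stream is called deserializing it.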

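Serialization has three main uses:

1. As a persistence format.

2. As a data format for communication.

3. As a copy or clone mechanism.

Distributed processing mainly relies on the first two: the persistence format and the communication data format.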
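
The third use, copying, is the one Hadoop relies on least, so a short illustration may help. The sketch below uses plain Java serialization rather than Hadoop; the class name DeepCopyDemo and the method deepCopy are hypothetical names chosen for this example only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical demo class, not part of the original post.
class DeepCopyDemo {

    // Serializes the object to a byte array, then rebuilds a new object from it.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T source)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(source);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<String> original = new ArrayList<>(Arrays.asList("tom", "man"));
        ArrayList<String> copy = deepCopy(original);
        copy.add("13");
        System.out.println(original.size()); // 2: the original is untouched
        System.out.println(copy.size());     // 3
    }
}
```

The copy is an independent object graph, so changing it does not affect the original.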

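The Hadoop serialization mechanism:

Hadoop serializes an object into a stream by calling the object's write method, and deserializes it by calling the object's readFields method.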

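Differences between Java serialization and Hadoop serialization:

Java: deserialization keeps creating new objects.

Hadoop: deserialization can reuse an object, that is, successive records can be deserialized into the same object.

This reduces Java object allocation and garbage collection and improves application efficiency.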
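
Object reuse can be sketched without Hadoop on the classpath. In the example below, SimpleWritable is a hypothetical stand-in that declares the same two methods as org.apache.hadoop.io.Writable; PointRecord and ReuseDemo are likewise made up for illustration. It is a minimal sketch of the idea, not Hadoop's own code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in with the same two methods as Hadoop's Writable.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical record type used only for this illustration.
class PointRecord implements SimpleWritable {
    int x;
    int y;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Overwrites the fields of this instance; no new object is created.
        x = in.readInt();
        y = in.readInt();
    }
}

class ReuseDemo {

    // Writes `count` records (i, 2*i) back to back into one byte array.
    static byte[] writePoints(int count) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        PointRecord p = new PointRecord();
        for (int i = 0; i < count; i++) {
            p.x = i;
            p.y = 2 * i;
            p.write(out);
        }
        out.flush();
        return buffer.toByteArray();
    }

    // Reads every record into one reused instance and sums the y values.
    static int sumY(byte[] bytes, int count) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        PointRecord reused = new PointRecord();
        int sum = 0;
        for (int i = 0; i < count; i++) {
            reused.readFields(in);
            sum += reused.y;
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = writePoints(3);
        System.out.println(bytes.length);   // 3 records * 8 bytes = 24
        System.out.println(sumY(bytes, 3)); // 0 + 2 + 4 = 6
    }
}
```

However many records are read, sumY allocates a single PointRecord. Standard Java deserialization with ObjectInputStream.readObject would instead return a new object for each record.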

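Features of the Hadoop serialization mechanism

1. Compact: bandwidth is the scarcest resource in a Hadoop cluster, so a compact serialization format makes the best use of it.

2. Fast: serialization is used heavily in communication, so the overhead of serializing and deserializing must be kept low.

3. Extensible: the format can be upgraded as the communication protocol is upgraded.

4. Interoperable: it supports communication between programs written in different languages.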
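
To make the compactness point concrete, the sketch below serializes the same two fields twice: once with standard Java serialization and once by writing the fields directly to a DataOutputStream, which is the style a Writable's write method uses. SizeDemo and its nested Person class are hypothetical. writeUTF is used for brevity and is not the encoding Hadoop's Text uses, so the byte count illustrates the idea rather than Hadoop's exact wire format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class, not part of the original post.
class SizeDemo {

    // Hypothetical record type with one string and one int field.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Standard Java serialization: the stream also carries class metadata.
    static byte[] javaSerialize(Person p) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(p);
        }
        return buffer.toByteArray();
    }

    // Field-by-field encoding: only the field values reach the stream.
    static byte[] compactSerialize(Person p) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(p.name); // 2-byte length prefix plus the encoded characters
            out.writeInt(p.age);  // 4 bytes
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Person tom = new Person("tom", 13);
        System.out.println(compactSerialize(tom).length); // 2 + 3 + 4 = 9
        System.out.println(javaSerialize(tom).length > compactSerialize(tom).length); // true
    }
}
```

The Java-serialized form is larger because it embeds a stream header and a description of the class alongside the field values.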

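Hadoop Writable mechanism

Hadoop implements its serialization mechanism through the Writable interface.

The interface provides two methods, write and readFields.

Hadoop also includes several other important serialization-related types: WritableComparable, RawComparator, and WritableComparator.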
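
RawComparator is worth a brief illustration because it compares records in their serialized form, without deserializing them into objects first. The sketch below mirrors the parameter shape of RawComparator's compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) for a single 4-byte int. RawIntCompareDemo and its helper methods are hypothetical and do not depend on Hadoop.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical demo class, not part of the original post.
class RawIntCompareDemo {

    // Decodes a big-endian 4-byte int at `start`, the layout DataOutput.writeInt produces.
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
                | ((bytes[start + 1] & 0xff) << 16)
                | ((bytes[start + 2] & 0xff) << 8)
                | (bytes[start + 3] & 0xff);
    }

    // Compares two serialized ints straight from their byte ranges.
    // The lengths l1 and l2 are unused here because an int is always 4 bytes.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    // Serializes one int the way a Writable's write method would.
    static byte[] toBytes(int value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] a = toBytes(-5);
        byte[] b = toBytes(300);
        System.out.println(compare(a, 0, a.length, b, 0, b.length) < 0); // true: -5 < 300
    }
}
```

Skipping object creation during comparison matters most when many serialized keys have to be sorted.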

Writable

The class PersonWritable implements WritableComparable, so it must implement the write and readFields methods, along with compareTo.

The code comes from 私塾在线, in the TestCompression project.

Code:

package com.test;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class PersonWritable implements WritableComparable<PersonWritable> {

	Text name = new Text();
	Text sex = new Text();
	IntWritable age = new IntWritable();

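	// Writable types need a no-argument constructor so that an instance
	// can be created first and then filled in by readFields.
	public PersonWritable() {
		set("tom", "man", 12);
	}

	// Updates the existing field objects instead of allocating new ones.
	public void set(String name, String sex, int age) {
		this.name.set(name);
		this.sex.set(sex);
		this.age.set(age);
	}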

	public PersonWritable(String name, String sex, int age) {
		set(name, sex, age);
	}

	@Override
	public String toString() {
		return "PersonWritable [name=" + name.toString() + ", sex="
				+ sex.toString() + ", age=" + age.get() + "]";
	}

	@Override
	public int hashCode() {
		final int prime = 31;
		int result = 1;
		result = prime * result + ((age == null) ? 0 : age.hashCode());
		result = prime * result + ((name == null) ? 0 : name.hashCode());
		result = prime * result + ((sex == null) ? 0 : sex.hashCode());
		return result;
	}

	@Override
	public boolean equals(Object obj) {
		if (this == obj)
			return true;
		if (obj == null)
			return false;
		if (getClass() != obj.getClass())
			return false;
		PersonWritable other = (PersonWritable) obj;
		if (age == null) {
			if (other.age != null)
				return false;
		} else if (!age.equals(other.age))
			return false;
		if (name == null) {
			if (other.name != null)
				return false;
		} else if (!name.equals(other.name))
			return false;
		if (sex == null) {
			if (other.sex != null)
				return false;
		} else if (!sex.equals(other.sex))
			return false;
		return true;
	}

	// Fields must be read in the same order in which write() emits them.
	@Override
	public void readFields(DataInput in) throws IOException {
		name.readFields(in);
		sex.readFields(in);
		age.readFields(in);
	}

	// Serializes the fields in a fixed order: name, sex, age.
	@Override
	public void write(DataOutput out) throws IOException {
		name.write(out);
		sex.write(out);
		age.write(out);
	}

	@Override
	public int compareTo(PersonWritable o) {
		// Order by name, then sex, then age.
		int result = name.compareTo(o.name);
		if (result != 0) {
			return result;
		}

		result = sex.compareTo(o.sex);
		if (result != 0) {
			return result;
		}

		return age.compareTo(o.age);
	}

}

Serialization and deserialization utility class:

package com.test.myselfwritable;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class HadoopSerializationUtil {


	// Serializes a Writable into a byte array.
	public static byte[] serialize(Writable writable) throws IOException {
		// In-memory buffer that collects the serialized bytes.
		ByteArrayOutputStream out = new ByteArrayOutputStream();
		// DataOutputStream supplies the DataOutput methods that write() expects.
		DataOutputStream dataout = new DataOutputStream(out);
		// Let the object write its own fields.
		writable.write(dataout);
		dataout.close();
		return out.toByteArray();
	}

	// Fills an existing Writable from a byte array produced by serialize().
	public static void deserialize(Writable writable, byte[] bytes)
			throws IOException {
		ByteArrayInputStream in = new ByteArrayInputStream(bytes);
		DataInputStream datain = new DataInputStream(in);
		// Overwrite the object's fields with the values read from the stream.
		writable.readFields(datain);
		datain.close();
	}

}

Test class:

package com.test;

import org.apache.hadoop.util.StringUtils;

import com.test.myselfwritable.HadoopSerializationUtil;

public class Test {

	public static void main(String[] args) throws Exception {

		// Test 1: serialize an object and print the bytes as hex.
		System.out.println("test1");

		PersonWritable personWritable = new PersonWritable("tom", "man", 13);
		byte[] result = HadoopSerializationUtil.serialize(personWritable);
		System.out.println(StringUtils.byteToHexString(result));

		// Test 2: deserialize the bytes into another object and print it.
		System.out.println("test2");

		PersonWritable personWritable1 = new PersonWritable();
		HadoopSerializationUtil.deserialize(personWritable1, result);

		System.out.println(personWritable1.toString());

	}
}

Source:

《Hadoop 技术内幕》 (Hadoop Internals)