Big Data Applications: A Hands-On Benchmark Tutorial for Optimizing HBase Insert Performance

Introduction:

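Anyone who uses HBase eventually runs into performance tuning. This article looks at one part of that problem: how client-side parameter settings affect bulk insert performance. Measured data is more convincing than theory, so the author has designed a benchmark for HBase insert performance. Run it on your own servers and let the results give you a conclusion you can trust.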

I. Client-Side Tuning Parameters

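1. Put List Size
HBase's Put supports inserting a single row per call as well as inserting a batch of rows passed as a list. The list size is the number of Put objects submitted in one call.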

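2. AutoFlush
AutoFlush controls whether each Put call is submitted to the HBase server immediately. The default is true, so every call is submitted as it is made. Combined with single-row inserts, this causes more I/O and therefore lower performance.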

3. Write Buffer Size
Write Buffer Size takes effect only when AutoFlush is false. The default is 2 MB: once the buffered data exceeds 2 MB, the client automatically submits it to the server.

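4. WAL
WAL stands for Write Ahead Log and controls whether HBase writes a log entry before applying an insert. It is enabled by default. Disabling it improves performance, but if the system fails (for example, the Region Server handling the inserts goes down), data may be lost.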
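The interaction between AutoFlush and Write Buffer Size can be illustrated with a toy model. The sketch below is plain Java with no HBase dependency. `WriteBufferModel` is a hypothetical class that only counts how often a client would submit data under the two rules described above: submit on every put when AutoFlush is true, otherwise submit once the buffered bytes exceed the buffer size. It is not HBase's implementation, and the 512 bytes per put is an assumed payload (four 128-byte values) that ignores row key and metadata overhead.

```java
/** Toy model of a client-side write buffer; a hypothetical helper, not HBase code. */
final class WriteBufferModel {
    private final boolean autoFlush;
    private final long bufferSizeBytes;
    private long bufferedBytes = 0;
    private long flushCount = 0;

    WriteBufferModel(boolean autoFlush, long bufferSizeBytes) {
        this.autoFlush = autoFlush;
        this.bufferSizeBytes = bufferSizeBytes;
    }

    /** Buffers one put, then submits if AutoFlush is on or the buffer is over its limit. */
    void put(long sizeBytes) {
        bufferedBytes += sizeBytes;
        if (autoFlush || bufferedBytes > bufferSizeBytes) {
            flush();
        }
    }

    /** Submits whatever is buffered; an empty buffer does not count as a submission. */
    void flush() {
        if (bufferedBytes > 0) {
            bufferedBytes = 0;
            flushCount++;
        }
    }

    long flushCount() { return flushCount; }

    long bufferedBytes() { return bufferedBytes; }

    public static void main(String[] args) {
        int rows = 20000;
        long putSize = 512; // assumed payload per put, see the note above
        long[] bufferSizes = {2L * 1024 * 1024, 24L * 1024 * 1024};
        for (boolean autoFlush : new boolean[] {true, false}) {
            for (long size : bufferSizes) {
                WriteBufferModel model = new WriteBufferModel(autoFlush, size);
                for (int i = 0; i < rows; i++) {
                    model.put(putSize);
                }
                System.out.println("autoFlush=" + autoFlush + ", buffer=" + size
                        + ", submissions=" + model.flushCount()
                        + ", still buffered=" + model.bufferedBytes() + " bytes");
            }
        }
    }
}
```

Under these assumptions, a buffer larger than the whole data set never submits on its own, so a benchmark that turns AutoFlush off needs an explicit flush before it stops the clock.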

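Parameter	Default	Description
JVM Heap Size		Varies by platform; set it yourself
AutoFlush	True	Submits every put by default
Put List Size	1	Supports single-row and batch puts
Write Buffer Size	2 MB	Used together with AutoFlush
Write Ahead Log	True	Enabled by default; must be turned off manually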
…
…
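To make the effect of Put List Size concrete, the following sketch splits a list of rows into fixed-size batches and counts how many put calls each list size would need. It is plain Java with no HBase dependency. `PutBatching`, `partition` and `callsNeeded` are hypothetical names chosen for illustration; in a real client each batch would correspond to one put call that receives a list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that shows how list size determines the number of put calls. */
final class PutBatching {
    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    /** Number of put calls needed for rowCount rows: ceil(rowCount / batchSize). */
    static long callsNeeded(long rowCount, int batchSize) {
        if (rowCount < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("rowCount must be >= 0 and batchSize > 0");
        }
        return (rowCount + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 1; i <= 2500; i++) {
            rows.add(i);
        }
        List<List<Integer>> batches = partition(rows, 1000);
        System.out.println("batches for 2500 rows, list size 1000: " + batches.size());
        System.out.println("calls for 20000 rows, list size 1: " + callsNeeded(20000, 1));
        System.out.println("calls for 20000 rows, list size 1000: " + callsNeeded(20000, 1000));
    }
}
```

Note that `partition` keeps a trailing partial batch, so a row count that is not a multiple of the list size still has every row submitted.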

II. Source Code

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

/*
 * ------- Tuning case notes ------------
 * 1. Parameter 1: AutoFlush           on by default; must be turned off manually
 * 2. Parameter 2: Put list size       single-row and batch puts are both supported
 * 3. Parameter 3: JVM heap size       default depends on the platform; set it manually
 * 4. Parameter 4: Write Buffer Size   default 2 MB
 * 5. Parameter 5: Write Ahead Log     on by default; must be turned off manually
 */

public class TestInsert {
    static Configuration hbaseConfig = null;

    public static void main(String[] args) throws Exception {
        Configuration HBASE_CONFIG = new Configuration();
        HBASE_CONFIG.set("hbase.master", "192.168.230.133:60000");
        HBASE_CONFIG.set("hbase.zookeeper.quorum", "192.168.230.133");
        HBASE_CONFIG.set("hbase.zookeeper.property.clientPort", "2181");
        hbaseConfig = HBaseConfiguration.create(HBASE_CONFIG);
        // WAL off, AutoFlush off, write buffer = 24 MB
        insert(false, false, 1024 * 1024 * 24);
        // WAL off, AutoFlush on, write buffer left at its default (0 = do not set)
        insert(false, true, 0);
        // Defaults: WAL on, AutoFlush on, default write buffer
        insert(true, true, 0);
    }

    private static void insert(boolean wal, boolean autoFlush, long writeBuffer)
            throws IOException {
        String tableName = "etltest";

        // Recreate the table so that every run starts from an empty table.
        HBaseAdmin hAdmin = new HBaseAdmin(hbaseConfig);
        if (hAdmin.tableExists(tableName)) {
            hAdmin.disableTable(tableName);
            hAdmin.deleteTable(tableName);
        }

        HTableDescriptor t = new HTableDescriptor(tableName);
        t.addFamily(new HColumnDescriptor("f1"));
        t.addFamily(new HColumnDescriptor("f2"));
        t.addFamily(new HColumnDescriptor("f3"));
        t.addFamily(new HColumnDescriptor("f4"));
        hAdmin.createTable(t);
        hAdmin.close();
        System.out.println("table created");

        HTable table = new HTable(hbaseConfig, tableName);
        table.setAutoFlush(autoFlush);
        if (writeBuffer != 0) {
            table.setWriteBufferSize(writeBuffer);
        }

        List<Put> lp = new ArrayList<Put>();
        long start = System.currentTimeMillis();
        System.out.println("start time = " + start);

        int count = 20000;
        byte[] buffer = new byte[128];
        Random r = new Random();
        for (int i = 1; i <= count; ++i) {
            Put p = new Put(Bytes.toBytes(String.format("row %d", i)));
            r.nextBytes(buffer);
            p.add(Bytes.toBytes("f1"), null, buffer);
            p.add(Bytes.toBytes("f2"), null, buffer);
            p.add(Bytes.toBytes("f3"), null, buffer);
            p.add(Bytes.toBytes("f4"), null, buffer);
            p.setWriteToWAL(wal);
            lp.add(p);
            // Submit the list every 1000 rows (Put List Size = 1000).
            if (i % 1000 == 0) {
                table.put(lp);
                lp.clear();
            }
        }
        // Submit any rows left over when count is not a multiple of 1000.
        if (!lp.isEmpty()) {
            table.put(lp);
            lp.clear();
        }
        // With AutoFlush off, rows may still sit in the client-side buffer;
        // flush them so that they are included in the measured time.
        table.flushCommits();
        long end = System.currentTimeMillis();
        table.close();

        System.out.println("WAL=" + wal + ",autoFlush=" + autoFlush
                + ",buffer=" + writeBuffer + ",count=" + count);
        System.out.println("insert complete, total time = " + (end - start) * 1.0 / 1000 + "s");
    }
}
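The program above reports elapsed time only. When runs with different row counts are compared, rows per second is often easier to read. The helper below is a minimal sketch of that arithmetic in plain Java; `InsertTiming` and its methods are hypothetical names, and the numbers in `main` are placeholders, not measurements.

```java
/** Hypothetical helper for turning elapsed milliseconds into seconds and throughput. */
final class InsertTiming {
    /** Elapsed time in seconds between two System.currentTimeMillis() readings. */
    static double elapsedSeconds(long startMillis, long endMillis) {
        if (endMillis < startMillis) {
            throw new IllegalArgumentException("end must not be before start");
        }
        return (endMillis - startMillis) / 1000.0;
    }

    /** Average number of rows inserted per second. */
    static double rowsPerSecond(long rows, long elapsedMillis) {
        if (rows < 0 || elapsedMillis <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and elapsedMillis > 0");
        }
        return rows * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // Placeholder readings for illustration only; substitute your own measurements.
        long start = 1000;
        long end = 5000;
        long rows = 20000;
        System.out.println("elapsed = " + elapsedSeconds(start, end) + "s");
        System.out.println("throughput = " + rowsPerSecond(rows, end - start) + " rows/s");
    }
}
```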

III. Cluster Configuration

3.1 Server hardware configuration

No.	Node name	CPU	Memory	Disk	Bandwidth
1	HMaster
2	HRegionServer1
3	HRegionServer2
4	…
5
6
7

3.2 Client hardware configuration

Device	Node name
CPU
Memory
Disk
Bandwidth

IV. Test Report

Rows	JVM heap	AutoFlush	Put List Size	Write Buffer Size	WAL	Elapsed time
1000	512m	false	1000	1024*1024*24	false
2000
5000
10000
20000
50000
100000
200000
500000
1000000
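One way to fill in the report is to have the benchmark print each result in the same tab-separated layout as the table above. The sketch below is plain Java under stated assumptions: `ReportRow`, `format` and `measureSeconds` are hypothetical names, and `standInWorkload` is a stand-in for the real insert run, so the times it prints say nothing about HBase.

```java
import java.util.Locale;

/** Hypothetical helper that times a workload and formats one report row. */
final class ReportRow {
    /** Formats one tab-separated row in the column order of the test report. */
    static String format(long rows, String jvmHeap, boolean autoFlush, int putListSize,
                         long writeBufferBytes, boolean wal, double elapsedSeconds) {
        return String.format(Locale.ROOT, "%d\t%s\t%b\t%d\t%d\t%b\t%.3f",
                rows, jvmHeap, autoFlush, putListSize, writeBufferBytes, wal, elapsedSeconds);
    }

    /** Runs the workload once and returns the wall-clock time in seconds. */
    static double measureSeconds(Runnable workload) {
        long start = System.nanoTime();
        workload.run();
        return (System.nanoTime() - start) / 1e9;
    }

    /** Stand-in for the real insert run; it only burns a little CPU time. */
    static long standInWorkload(long rows) {
        long checksum = 0;
        for (long i = 0; i < rows; i++) {
            checksum += i % 7;
        }
        return checksum;
    }

    public static void main(String[] args) {
        long[] rowCounts = {1000, 2000, 5000};
        long writeBufferBytes = 1024L * 1024 * 24;
        for (long rows : rowCounts) {
            double seconds = measureSeconds(() -> standInWorkload(rows));
            System.out.println(format(rows, "512m", false, 1000, writeBufferBytes, false, seconds));
        }
    }
}
```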