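HBase Java API Operations (DML)

1 Environment Setup

Create a Maven project and add the following dependencies to pom.xml, then click the "Load Maven Changes" icon in the upper-right corner of the IDE to download them.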

    <dependencies>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>1.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>1.3.1</version>
        </dependency>
    </dependencies>


2 Adding Data

    // 1 Add data
    @Test
    public void putData() throws IOException {
        // 1 Get the configuration
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop102,hadoop103,hadoop104");

        // 2 Create the connection
        Connection connection = ConnectionFactory.createConnection(conf);

        // 3 Get the table object
        Table table = connection.getTable(TableName.valueOf("student"));

        // 4 Build the Put object
        Put put = new Put(Bytes.toBytes("row1"));
        put.addColumn(Bytes.toBytes("info1"), Bytes.toBytes("name"), Bytes.toBytes("zhangsan"));

        // 5 Write the data to the table
        table.put(put);

        // 6 Release resources
        table.close();
        connection.close();
    }


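First, obtain a Configuration object and add the connection settings to it. HBaseConfiguration is in the org.apache.hadoop.hbase package, and Configuration is in the org.apache.hadoop.conf package. Settings are added with the Configuration object's set() method. This example uses a fully distributed HBase cluster, so hbase.zookeeper.quorum lists the ZooKeeper hosts through which the client locates the cluster.

        // 1 Get the configuration
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop102,hadoop103,hadoop104");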

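Next, create a Connection object, which is used for DML operations on tables. The static createConnection() method of the ConnectionFactory class takes a Configuration object and returns a Connection. Both ConnectionFactory and Connection are in the org.apache.hadoop.hbase.client package. createConnection() is declared to throw IOException, so the calling method must either catch it or declare it; here the test method declares throws IOException.

        // 2 Create the connection
        Connection connection = ConnectionFactory.createConnection(conf);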

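To add data to a table, first obtain the table to operate on as a Table object. The connection's getTable() method returns a Table and takes a TableName argument, which is obtained from the static TableName.valueOf() method. Table is in the org.apache.hadoop.hbase.client package and TableName is in org.apache.hadoop.hbase.

        // 3 Get the table object
        Table table = connection.getTable(TableName.valueOf("student"));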

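With the Table object in hand, the next step is to build the data to be written, which is represented by a Put object. Instantiate a Put and then add the cell data to it. The Put constructor takes the row key as a byte[] array; the static Bytes.toBytes() method converts a string to a byte[] array. The addColumn() method sets the remaining information, namely the column family, column name and value, also as byte[] arrays. Put is in the org.apache.hadoop.hbase.client package and Bytes is in org.apache.hadoop.hbase.util.

        // 4 Build the Put object
        Put put = new Put(Bytes.toBytes("row1"));
        put.addColumn(Bytes.toBytes("info1"), Bytes.toBytes("name"), Bytes.toBytes("zhangsan"));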
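
To make the byte[] conversion concrete, here is a small standard-library sketch of the round trip that Bytes.toBytes(String) and Bytes.toString(byte[]) perform; both use UTF-8. The class Utf8RoundTrip and its encode() and decode() helpers are hypothetical stand-ins written for this illustration, not part of the HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for the String <-> byte[] conversions used above.
// HBase's Bytes.toBytes(String) and Bytes.toString(byte[]) also use UTF-8.
class Utf8RoundTrip {
    // Encode a string as UTF-8 bytes, as a row key, family, column name or value would be.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a string.
    static String decode(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] rowKey = encode("row1");
        System.out.println(Arrays.toString(rowKey)); // [114, 111, 119, 49]
        System.out.println(decode(rowKey));          // row1
    }
}
```

Since HBase stores only bytes, the reader of a value has to know how it was encoded; a string written as UTF-8 must be decoded as UTF-8.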

Once both the Table and the Put objects are ready, write the data by passing the Put to the Table object's put() method.

        // 5 Write the data to the table
        table.put(put);

Finally, release the resources.

        // 6 Release resources
        table.close();
        connection.close();
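
Because Connection and Table both implement java.io.Closeable, the explicit close() calls can be replaced with try-with-resources, which also closes them when an exception is thrown midway. The sketch below shows only the closing order, using a hypothetical stand-in resource (FakeResource is not an HBase class): resources are closed in the reverse of the order in which they were opened, so the table is closed before the connection.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in sketch: FakeResource is a hypothetical AutoCloseable, not an HBase type.
class CloseOrderDemo {
    static class FakeResource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        FakeResource(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    // Opens a "connection", then a "table", uses them, and returns the event log.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (FakeResource connection = new FakeResource("connection", log);
             FakeResource table = new FakeResource("table", log)) {
            log.add("put");
        }
        return log;
    }

    public static void main(String[] args) {
        // [open connection, open table, put, close table, close connection]
        System.out.println(run());
    }
}
```

With the real API the same shape would be a try header that declares the Connection from ConnectionFactory.createConnection(conf) first and the Table from connection.getTable(...) second, with the put in the body.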

3 Reading Data (get)

    // 2 Read data (get)
    @Test
    public void getData() throws IOException {
        // 1 Get the configuration
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop102,hadoop103,hadoop104");

        // 2 Create the connection
        Connection connection = ConnectionFactory.createConnection(conf);

        // 3 Get the table object
        Table table = connection.getTable(TableName.valueOf("student"));

        // 4 Build the Get object
        Get get = new Get(Bytes.toBytes("row1"));
        get.addColumn(Bytes.toBytes("info1"), Bytes.toBytes("name"));

        // 5 Fetch the Result and parse it
        Result result = table.get(get);
        Cell[] cells = result.rawCells();
        for (Cell cell : cells) {
            System.out.println("RowKey: " + Bytes.toString(CellUtil.cloneRow(cell))
                    + " ColumnFamily: " + Bytes.toString(CellUtil.cloneFamily(cell))
                    + " ColumnName: " + Bytes.toString(CellUtil.cloneQualifier(cell))
                    + " Value: " + Bytes.toString(CellUtil.cloneValue(cell)));
        }

        // 6 Release resources
        table.close();
        connection.close();
    }


There are two ways to read data, get and scan; get is covered first. The first three steps are the same as when adding data.

        // 1 Get the configuration
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop102,hadoop103,hadoop104");

        // 2 Create the connection
        Connection connection = ConnectionFactory.createConnection(conf);

        // 3 Get the table object
        Table table = connection.getTable(TableName.valueOf("student"));

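As before, once the Table object is available, the next step is to build the object that describes the data to read, which for this operation is a Get. The Get constructor and the addColumn() method work like their Put counterparts, except that addColumn() takes only the column family and column name, with no value. Get is in the org.apache.hadoop.hbase.client package.

        // 4 Build the Get object
        Get get = new Get(Bytes.toBytes("row1"));
        get.addColumn(Bytes.toBytes("info1"), Bytes.toBytes("name"));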

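With the Table and Get objects ready, call the Table object's get() method to obtain a Result, which holds the requested data and then has to be parsed. A cell is an HBase concept: the unit uniquely identified by {row key, column family, column name, timestamp}. Data with the same row key, column family and column name can hold different values at different timestamps (HBase's multi-version mechanism). The Result object's rawCells() method returns a Cell[] array containing the cells matched by the Get. By default a Get returns only the newest version of each column, so here the array holds a single Cell; older versions are returned only if more are requested (for example with get.setMaxVersions() in HBase 1.x) and the column family still retains them.

The code iterates over the Cell[] array and prints each cell's row key, column family, column name and value. The static CellUtil methods cloneRow(), cloneFamily(), cloneQualifier() and cloneValue() return these four parts as byte[] arrays, and the static Bytes.toString() method converts each byte[] array to a string. Result is in the org.apache.hadoop.hbase.client package; the Cell interface and the CellUtil class are in org.apache.hadoop.hbase.

        // 5 Fetch the Result and parse it
        Result result = table.get(get);
        Cell[] cells = result.rawCells();
        for (Cell cell : cells) {
            System.out.println("RowKey: " + Bytes.toString(CellUtil.cloneRow(cell))
                    + " ColumnFamily: " + Bytes.toString(CellUtil.cloneFamily(cell))
                    + " ColumnName: " + Bytes.toString(CellUtil.cloneQualifier(cell))
                    + " Value: " + Bytes.toString(CellUtil.cloneValue(cell)));
        }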
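
The cell coordinates described above can be pictured as a nested sorted map. The following sketch is a simplified in-memory model, not HBase code: the class VersionedCells, its methods and the second value "lisi" are hypothetical, and the model keeps every version, whereas a real table retains only as many as the column family allows. It shows that several timestamps can exist under the same row key, column family and column name, and that a read sees the newest one first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of versioned cells; hypothetical, not part of the HBase API.
class VersionedCells {
    // "row/family:column" -> (timestamp -> value), timestamps sorted newest first.
    private final Map<String, TreeMap<Long, String>> cells = new TreeMap<>();

    private static String coordinate(String row, String family, String column) {
        return row + "/" + family + ":" + column;
    }

    // Stores a value under the full coordinate {row, family, column, timestamp}.
    void put(String row, String family, String column, long timestamp, String value) {
        cells.computeIfAbsent(coordinate(row, family, column),
                k -> new TreeMap<Long, String>(Collections.reverseOrder()))
             .put(timestamp, value);
    }

    // Returns up to maxVersions values for one column, newest first.
    List<String> get(String row, String family, String column, int maxVersions) {
        List<String> out = new ArrayList<>();
        TreeMap<Long, String> versions = cells.get(coordinate(row, family, column));
        if (versions == null) {
            return out;
        }
        for (String value : versions.values()) {
            if (out.size() == maxVersions) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCells table = new VersionedCells();
        table.put("row1", "info1", "name", 1L, "zhangsan");
        table.put("row1", "info1", "name", 2L, "lisi");
        System.out.println(table.get("row1", "info1", "name", 1)); // [lisi]
        System.out.println(table.get("row1", "info1", "name", 5)); // [lisi, zhangsan]
    }
}
```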

Finally, release the resources.

        // 6 Release resources
        table.close();
        connection.close();

4 Reading Data (scan)

    // 3 Read data (scan)
    @Test
    public void scanData() throws IOException {
        // 1 Get the configuration
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop102,hadoop103,hadoop104");

        // 2 Create the connection
        Connection connection = ConnectionFactory.createConnection(conf);

        // 3 Get the table object
        Table table = connection.getTable(TableName.valueOf("student"));

        // 4 Build the Scan object and open a scanner
        Scan scan = new Scan(Bytes.toBytes("row1"), Bytes.toBytes("row4"));
        ResultScanner scanner = table.getScanner(scan);

        // 5 Parse the scan results
        for (Result result : scanner) {
            Cell[] cells = result.rawCells();
            for (Cell cell : cells) {
                System.out.println("RowKey: " + Bytes.toString(CellUtil.cloneRow(cell))
                        + " ColumnFamily: " + Bytes.toString(CellUtil.cloneFamily(cell))
                        + " ColumnName: " + Bytes.toString(CellUtil.cloneQualifier(cell))
                        + " Value: " + Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }

        // 6 Release resources (the scanner first, then the table and connection)
        scanner.close();
        table.close();
        connection.close();
    }


First, get the configuration, create the connection, and get the table object.

        // 1 Get the configuration
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop102,hadoop103,hadoop104");

        // 2 Create the connection
        Connection connection = ConnectionFactory.createConnection(conf);

        // 3 Get the table object
        Table table = connection.getTable(TableName.valueOf("student"));

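Create a Scan object. The constructor takes the start and stop row keys of the range to read. Rows are ordered lexicographically by row key, and the range is half-open: the start row is included and the stop row is excluded. Pass the Scan to the Table object's getScanner() method to obtain a ResultScanner, which gives access to all the rows that match the scan. The Scan class and the ResultScanner interface are both in the org.apache.hadoop.hbase.client package.

        // 4 Build the Scan object and open a scanner
        Scan scan = new Scan(Bytes.toBytes("row1"), Bytes.toBytes("row4"));
        ResultScanner scanner = table.getScanner(scan);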
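
Two details of the row-key range are easy to get wrong: keys are compared byte by byte rather than numerically, and the stop row is excluded. The sketch below models this with a TreeMap from the standard library. It is not HBase code; the class ScanRangeDemo and the sample keys (including "row10") are hypothetical. For ASCII keys like these, Java's String ordering matches the byte-wise ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Stand-in for a table sorted by row key; hypothetical, not part of the HBase API.
class ScanRangeDemo {
    // Returns the row keys in [startRow, stopRow): start inclusive, stop exclusive.
    static List<String> scan(TreeMap<String, String> table, String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (String key : new String[] {"row1", "row2", "row3", "row4", "row10"}) {
            table.put(key, "value-of-" + key);
        }
        // "row10" sorts between "row1" and "row2", and "row4" is excluded.
        System.out.println(scan(table, "row1", "row4")); // [row1, row10, row2, row3]
    }
}
```

If numeric order matters, a common approach is to zero-pad the numeric part of the key (for example "row01", "row10") so that lexicographic and numeric order agree.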

A ResultScanner yields many Result objects, one per row, and each Result is handled exactly as described above. In short, reading with a Get returns one Result, reading with a Scan returns any number of Results, and in both cases a Result is processed through its Cell objects.

        // 5 Parse the scan results
        for (Result result : scanner) {
            Cell[] cells = result.rawCells();
            for (Cell cell : cells) {
                System.out.println("RowKey: " + Bytes.toString(CellUtil.cloneRow(cell))
                        + " ColumnFamily: " + Bytes.toString(CellUtil.cloneFamily(cell))
                        + " ColumnName: " + Bytes.toString(CellUtil.cloneQualifier(cell))
                        + " Value: " + Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }

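Finally, release the resources. A ResultScanner holds resources of its own, so it is closed as well, before the table and the connection.

        // 6 Release resources (the scanner first, then the table and connection)
        scanner.close();
        table.close();
        connection.close();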

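Before running the program, add a few rows to the student table.

Running the program then prints one line per cell for every row whose key falls in the scanned range.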

5 Deleting Data

    // 4 Delete data
    @Test
    public void deleteData() throws IOException {
        // 1 Get the configuration
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop102,hadoop103,hadoop104");

        // 2 Create the connection
        Connection connection = ConnectionFactory.createConnection(conf);

        // 3 Get the table object
        Table table = connection.getTable(TableName.valueOf("student"));

        // 4 Build the Delete object
        Delete delete = new Delete(Bytes.toBytes("row4"));
        // delete.addColumns(Bytes.toBytes("info1"), Bytes.toBytes("name"));  // delete all versions of the given column
        // delete.addFamily(Bytes.toBytes("info1"));  // delete every column in the given column family

        // 5 Delete the data from the table
        table.delete(delete);

        // 6 Release resources
        table.close();
        connection.close();
    }


First, get the configuration, create the connection, and get the table object.

        // 1 Get the configuration
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop102,hadoop103,hadoop104");

        // 2 Create the connection
        Connection connection = ConnectionFactory.createConnection(conf);

        // 3 Get the table object
        Table table = connection.getTable(TableName.valueOf("student"));

Next, create a Delete object, passing the row key to its constructor. Delete is in the org.apache.hadoop.hbase.client package.

        // 4 Build the Delete object
        Delete delete = new Delete(Bytes.toBytes("row4"));

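Perform the deletion by passing the Delete to the Table object's delete() method. As written, this deletes the whole row with row key row4 from the student table.

        // 5 Delete the data from the table
        table.delete(delete);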

Finally, release the resources.

        // 6 Release resources
        table.close();
        connection.close();

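In addition, when building the Delete object in step 4, the deletion can be narrowed: the addColumns() method deletes all versions of the column with the given column family and column name, and the addFamily() method deletes every column in the given column family.

        // delete.addColumns(Bytes.toBytes("info1"), Bytes.toBytes("name"));  // delete all versions of the given column
        // delete.addFamily(Bytes.toBytes("info1"));  // delete every column in the given column family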
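
The three granularities (whole row, one column family, one column) can be compared on a small in-memory model. This is a sketch only: RowModel and its methods are hypothetical, and the extra family info2 and the age and city columns are invented for the example. The model also removes data immediately, whereas HBase records a delete marker and removes the data physically during a later major compaction.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified model of one row: family -> (column -> value). Hypothetical, not HBase code.
class RowModel {
    final Map<String, Map<String, String>> families = new TreeMap<>();

    void put(String family, String column, String value) {
        families.computeIfAbsent(family, k -> new TreeMap<>()).put(column, value);
    }

    // Like new Delete(row) with no further calls: everything in the row goes.
    void deleteRow() {
        families.clear();
    }

    // Like delete.addFamily(family): every column in that family goes.
    void deleteFamily(String family) {
        families.remove(family);
    }

    // Like delete.addColumns(family, column): only that column goes.
    void deleteColumn(String family, String column) {
        Map<String, String> columns = families.get(family);
        if (columns != null) {
            columns.remove(column);
            if (columns.isEmpty()) {
                families.remove(family);
            }
        }
    }

    public static void main(String[] args) {
        RowModel row = new RowModel();
        row.put("info1", "name", "zhangsan");
        row.put("info1", "age", "20");
        row.put("info2", "city", "beijing");
        row.deleteColumn("info1", "name");
        System.out.println(row.families); // {info1={age=20}, info2={city=beijing}}
        row.deleteFamily("info2");
        System.out.println(row.families); // {info1={age=20}}
        row.deleteRow();
        System.out.println(row.families); // {}
    }
}
```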

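6 Modifying Data

In HBase, modifying data amounts to putting it again, so the code is the same as for adding data. The new value is written as a new version with a newer timestamp under the same row key, column family and column name; how many older versions are kept depends on the column family's VERSIONS setting.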

Reference:
Atguigu (尚硅谷) HBase tutorial (quick start with the HBase framework)