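Working with HBase from Python 2.7.12

Prerequisites: HBase and Python 2.7 are already installed.

Side note: it is best to create a virtual environment first; everything below is run inside one. Be aware that pip run through sudo, as in the logs below, installs into the system Python (the logs show /usr/local/lib/python2.7/dist-packages). To install into the virtual environment instead, run pip install without sudo while the environment is active.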

(ma) hadoop@master:/usr/local/pycharm/bin$ sudo pip install thrift
[sudo] password for hadoop:
The directory '/home/hadoop/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/hadoop/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting thrift
  Downloading thrift-0.10.0.zip (87kB)
    100% |████████████████████████████████| 92kB 415kB/s
Requirement already satisfied: six>=1.7.2 in /usr/local/lib/python2.7/dist-packages (from thrift)
Installing collected packages: thrift
  Running setup.py install for thrift ... done
Successfully installed thrift-0.10.0
 
(ma) hadoop@master:/usr/local/pycharm/bin$ sudo pip install hbase-thrift
[sudo] password for hadoop:
The directory '/home/hadoop/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/hadoop/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting hbase-thrift
  Downloading hbase-thrift-0.20.4.tar.gz
Requirement already satisfied: Thrift in /usr/local/lib/python2.7/dist-packages (from hbase-thrift)
Requirement already satisfied: six>=1.7.2 in /usr/local/lib/python2.7/dist-packages (from Thrift->hbase-thrift)
Installing collected packages: hbase-thrift
  Running setup.py install for hbase-thrift ... done
Successfully installed hbase-thrift-0.20.4


Start the Thrift server with hbase-daemon.sh from HBase's bin directory:
hadoop@master:/opt/Hadoop/hbase-1.3.1/bin$ ./hbase-daemon.sh start thrift
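
Before running the scripts below, it can help to confirm that the Thrift server is actually listening. The following is a minimal sketch using only the standard library. The helper name thrift_port_open is made up for this illustration, and localhost:9090 is simply the host and port that the scripts in this post connect to.

```python
import socket


def thrift_port_open(host='localhost', port=9090, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, or timed out: nothing is accepting connections there.
        return False
    conn.close()
    return True
```

Calling thrift_port_open() right after starting the daemon only shows that something accepts connections on that port; it does not prove that the process is the HBase Thrift server.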
Start PyCharm.
Launch it from inside the virtual environment; started from another environment, the program may fail to run.
(ma) hadoop@master:/usr/local/pycharm/bin$ ./pycharm.sh


Reference: http://www.cnblogs.com/hitandrew/archive/2013/01/21/2870419.html (some of the examples in that document do not run correctly)

Create an HBase table:

from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

from hbase import Hbase
from hbase.ttypes import ColumnDescriptor

# Connect to the HBase Thrift server (default port 9090).
sock = TSocket.TSocket('localhost', 9090)
transport = TTransport.TBufferedTransport(sock)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)
transport.open()

# Create table 'test' with one column family 'cf' that keeps a single version.
contents = ColumnDescriptor(name='cf:', maxVersions=1)
client.createTable('test', [contents])

print client.getTableNames()

transport.close()


Output:
/usr/bin/python2.7 /home/py/PycharmProjects/ThirdTest/testThrift.py
['member', 'test']

Process finished with exit code 0


In the HBase shell, running list shows the newly created test table.
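
Running the creation script a second time calls createTable on a table that already exists, which the Thrift server is expected to reject with an AlreadyExists error. A small guard avoids that. This is only a sketch: ensure_table is a hypothetical helper name, and client is assumed to be an Hbase.Client connected as in the script above.

```python
def ensure_table(client, table_name, column_descriptors):
    """Create table_name only if it is missing; return True if it was created."""
    if table_name in client.getTableNames():
        return False
    client.createTable(table_name, column_descriptors)
    return True
```

With the objects from the script above, the call would be ensure_table(client, 'test', [contents]). The check and the creation are two separate requests, so another client could still create the table in between; for a single-user test setup that is usually acceptable.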

Insert data:

from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

from hbase import Hbase
from hbase.ttypes import Mutation

# Connect to the HBase Thrift server.
sock = TSocket.TSocket('localhost', 9090)
transport = TTransport.TBufferedTransport(sock)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)
transport.open()

# Write value "1" to column cf:a of row 'row-key1'.
row = 'row-key1'
mutations = [Mutation(column="cf:a", value="1")]
client.mutateRow('test', row, mutations)

transport.close()

In the HBase shell, scan 'test' shows the row that was just inserted:

hbase(main):001:0> scan 'test'
ROW                   COLUMN+CELL                                               
 row-key1             column=cf:a, timestamp=1506406128150, value=1             
1 row(s) in 0.3570 seconds
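
The script above writes a single column. Since mutateRow takes a list of mutations, several columns of one row can go into the same call. The sketch below builds that list from a dict. The names build_mutations and put_row are hypothetical, and the namedtuple fallback is only a stand-in so the snippet also runs where hbase-thrift is not installed.

```python
from collections import namedtuple

try:
    from hbase.ttypes import Mutation  # the real class when hbase-thrift is installed
except ImportError:
    # Stand-in with the same two fields, only so this sketch runs without hbase-thrift.
    Mutation = namedtuple('Mutation', ['column', 'value'])


def build_mutations(family, values):
    """Turn {'a': '1'} into [Mutation(column='cf:a', value='1')] for family 'cf'."""
    return [Mutation(column='%s:%s' % (family, qualifier), value=value)
            for qualifier, value in sorted(values.items())]


def put_row(client, table_name, row_key, family, values):
    """Write several columns of one row with a single mutateRow call."""
    client.mutateRow(table_name, row_key, build_mutations(family, values))
```

With a connected client, put_row(client, 'test', 'row-key1', 'cf', {'a': '1', 'b': '2'}) would write cf:a and cf:b in one request.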


Fetch a single row:

from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

from hbase import Hbase

# Connect to the HBase Thrift server.
sock = TSocket.TSocket('localhost', 9090)
transport = TTransport.TBufferedTransport(sock)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)
transport.open()

tableName = 'test'
rowKey = 'row-key1'

# getRow returns a list of TRowResult objects.
result = client.getRow(tableName, rowKey)
print result
for r in result:
    print 'the row is ', r.row
    print 'the values is ', r.columns.get('cf:a').value

transport.close()



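Output (the value is 2 rather than 1, presumably because the row was written again with value="2" before this run; note the later timestamp):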

/usr/bin/python2.7 /home/py/PycharmProjects/ThirdTest/getOneRow.py
[TRowResult(columns={'cf:a': TCell(timestamp=1506406612641, value='2')}, row='row-key1')]
the row is  row-key1
the values is  2
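
In the loop above, r.columns.get('cf:a') returns None when the column is absent, so the following .value raises AttributeError. One way around this is to flatten the result into plain dicts first. The sketch below assumes only the shape visible in the printed output (objects with row and columns, cells with value); rows_to_dict and cell_value are hypothetical names, and the two namedtuples are stand-ins for the demo, not the real Thrift classes.

```python
from collections import namedtuple


def rows_to_dict(results):
    """Map each row key to a plain {column: value} dict."""
    return dict((r.row, dict((col, cell.value) for col, cell in r.columns.items()))
                for r in results)


def cell_value(results, row_key, column, default=None):
    """Look up one value without failing when the row or column is absent."""
    return rows_to_dict(results).get(row_key, {}).get(column, default)


# Stand-ins shaped like the objects printed above, used only for this demo.
TCell = namedtuple('TCell', ['value', 'timestamp'])
TRowResult = namedtuple('TRowResult', ['row', 'columns'])

sample = [TRowResult(row='row-key1',
                     columns={'cf:a': TCell(value='2', timestamp=1506406612641)})]
print(rows_to_dict(sample))                            # {'row-key1': {'cf:a': '2'}}
print(cell_value(sample, 'row-key1', 'cf:b', 'n/a'))   # n/a
```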


Query multiple rows:

from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

from hbase import Hbase

# Connect to the HBase Thrift server.
sock = TSocket.TSocket('localhost', 9090)
transport = TTransport.TBufferedTransport(sock)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)
transport.open()

tableName = 'test'
# Empty start and stop rows scan the whole table; an empty column list selects all columns.
scanner_id = client.scannerOpenWithStop(tableName, '', '', [])

# Fetch at most 10 rows from the scanner.
result2 = client.scannerGetList(scanner_id, 10)
print result2

# Release the server-side scanner, then close the connection.
client.scannerClose(scanner_id)
transport.close()

Output:

/usr/bin/python2.7 /home/py/PycharmProjects/ThirdTest/getMultiRow.py
[TRowResult(columns={'cf:a': TCell(timestamp=1506406612641, value='2')}, row='row-key1'), TRowResult(columns={'cf:a': TCell(timestamp=1506406650902, value='2')}, row='row-key2')]
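
The scan above asks for at most 10 rows in a single scannerGetList call, so a larger table would be cut off. A generator can keep fetching batches until the scanner is drained and close the scanner afterwards. This is a sketch under two assumptions: scan_rows is a hypothetical name, and an empty list from scannerGetList is taken to mean that no rows are left.

```python
def scan_rows(client, table_name, start_row='', stop_row='', columns=None, batch_size=10):
    """Yield rows from start_row up to stop_row, fetching batch_size rows per call."""
    scanner_id = client.scannerOpenWithStop(table_name, start_row, stop_row, columns or [])
    try:
        while True:
            batch = client.scannerGetList(scanner_id, batch_size)
            if not batch:  # assumed: an empty list means the scanner is exhausted
                break
            for row in batch:
                yield row
    finally:
        # Runs when the generator finishes or is closed, so the scanner is released.
        client.scannerClose(scanner_id)
```

With a connected client, iterating over scan_rows(client, 'test') would visit every row of the table without holding more than one batch in memory at a time.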

Original article: https://www.cnblogs.com/herosoft/p/8134173.html