Connecting to Hive from Python 3.7 with PyHive (tested and working)

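When writing Python crawlers, you often run into the question of where to store the data. For large volumes of data, Hive is a good storage option.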

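So how do you connect to Hive from Python? There are many tutorials online, but most of them do not work well. I tested PyHive myself and it works.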

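Requirements: a working Hive environment and Python 3 or later. HiveServer2 must be installed in the Hive environment.

(HiveServer is an optional service that allows remote clients to submit requests to Hive and retrieve results using a variety of programming languages. HiveServer is built on Apache Thrift (http://thrift.apache.org/), so it is sometimes called the Thrift Server. This can cause confusion, because the newer service, HiveServer2, is also built on Thrift. Since the introduction of HiveServer2, HiveServer has also been known as HiveServer1.)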
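Before trying to connect, it can help to confirm that the HiveServer2 port is reachable from the client machine. The following is a minimal sketch using only the standard library; the helper name is_port_open and the 3-second timeout are assumptions made for illustration, and 10000 is the HiveServer2 default port. It only checks that something accepts TCP connections on that port, not that the service behind it is actually HiveServer2.

```python
import socket


def is_port_open(host, port=10000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, host unreachable, timeout, or name resolution failure
        return False
```

For example, is_port_open('10.8.13.120') would check the host used in the connection example in this post.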

Install the required packages

pip install sasl

pip install thrift

pip install thrift-sasl

pip install PyHive
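To confirm that the four packages were installed into the interpreter you are actually using, a quick check like the one below may be useful. It is a sketch, not part of PyHive: the helper name missing_modules is hypothetical, and the list assumes the usual import names (thrift-sasl is imported as thrift_sasl, PyHive as pyhive).

```python
import importlib.util

# Import names of the four packages installed above
# (assumed mapping: thrift-sasl -> thrift_sasl, PyHive -> pyhive)
REQUIRED_MODULES = ["sasl", "thrift", "thrift_sasl", "pyhive"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names from `names` that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported as not installed, rerun the corresponding pip install command with the same Python that runs your script.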

Connect to Hive. Note that the port here is the HiveServer2 port, which defaults to 10000.

from pyhive import hive

# Connect to HiveServer2 (default port 10000)
conn = hive.Connection(host='10.8.13.120', port=10000, username='hdfs', database='default')
cursor = conn.cursor()
cursor.execute('show tables')

# Print every row returned by the query
for result in cursor.fetchall():
    print(result)
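The snippet above never closes the cursor or the connection. One possible way to tidy that up is sketched below. The helper names run_query and show_hive_tables are hypothetical, and the default arguments simply repeat the values used in this post. run_query relies only on the standard DB-API cursor methods (cursor, execute, fetchall, close), which PyHive connections provide.

```python
from contextlib import closing


def run_query(conn, sql):
    """Execute sql on a DB-API 2.0 connection and return all rows as a list."""
    # closing() makes sure cursor.close() is called even if execute() raises
    with closing(conn.cursor()) as cursor:
        cursor.execute(sql)
        return cursor.fetchall()


def show_hive_tables(host, port=10000, username='hdfs', database='default'):
    """Connect to HiveServer2, run 'show tables', and always close the connection."""
    # Imported here so that run_query can be used without PyHive installed
    from pyhive import hive

    conn = hive.Connection(host=host, port=port, username=username, database=database)
    try:
        return run_query(conn, 'show tables')
    finally:
        conn.close()
```

With a reachable HiveServer2, calling show_hive_tables('10.8.13.120') should return the same rows that the loop above prints.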

For Windows, see https://ask.hellobi.com/blog/ysfyb/18251

Note: PyHive has problems on Windows that currently cannot be resolved, so on Windows use impala instead.