Hive Operation Notes

Clearing a Hive table (deleting all of its data)

  insert overwrite table lorry.bigdata select * from lorry.bigdata where 1=0;
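  If you script this cleanup, a small helper can generate the statement and reject unexpected table names before they reach Hive. This is a minimal sketch: the function name `clear_table_sql` is hypothetical, and it assumes an unpartitioned table named as `table` or `db.table` with plain identifiers.

```python
import re

# Accepts `table` or `db.table` made of plain identifiers (no backtick quoting).
_TABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def clear_table_sql(table: str) -> str:
    """Return a statement that empties `table` but keeps its definition.

    WHERE 1=0 is never true, so the SELECT yields no rows and
    INSERT OVERWRITE replaces the existing data with an empty result.
    Assumes an unpartitioned table.
    """
    if not _TABLE_NAME.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return f"insert overwrite table {table} select * from {table} where 1=0;"


if __name__ == "__main__":
    print(clear_table_sql("lorry.bigdata"))
```

  For a partitioned table, Hive expects a PARTITION clause on INSERT OVERWRITE, so the statement above does not apply as written. On a managed (non-external) table, `truncate table lorry.bigdata;` is usually the more direct alternative.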

Hive simple mode (fetch tasks)

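  When a Hive SELECT runs in fetch mode (SELECT <column_name> FROM <table> [WHERE ...] [LIMIT ...]), it does not go through MapReduce. Does that mean it escapes resource control entirely? You can observe this in the YARN ResourceManager: a query such as SELECT COUNT(*) ... shows up in the application list, whereas a plain SELECT * FROM ... without aggregate functions does not appear there at all. So fetch-mode queries (plain queries) are not allocated resources through YARN. But does that also mean such SELECTs cannot be tracked or managed?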
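  Whether fetch is used is controlled by hive.fetch.task.conversion, which has three options (as configured in Cloudera):
  1. none: fetch is disabled, and every query is executed as MapReduce.
  2. minimal (original description: "SELECT STAR, FILTER on partition columns, LIMIT only"): applies to SELECT * queries whose WHERE clause filters only on partition columns, and to queries with LIMIT; satisfying either the partition-column WHERE condition or the LIMIT is enough.
  3. more (original description: "SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)"): compared with minimal, the WHERE filter is no longer required to be on partition columns. So if you want fetch mode, setting it to more is generally sufficient.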
 
  In practice I did not notice any difference between minimal and more, so for normal fetch use set it to more; to disable fetch (always use MapReduce), set it to none.
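  To compare the settings on your own queries, one option is to set the mode for the current session and run EXPLAIN on the query. The hypothetical helper below only generates that HiveQL; it does not talk to Hive. In the resulting plan, a query converted to a fetch task typically shows just a Stage-0 Fetch Operator, while a query that runs as MapReduce also shows a map-reduce stage. Exact plan wording varies by Hive version, so treat this as a sketch.

```python
# Values of hive.fetch.task.conversion described above.
FETCH_MODES = ("none", "minimal", "more")


def fetch_check_script(query: str, mode: str = "more") -> str:
    """Return HiveQL that sets the fetch mode for this session, then EXPLAINs `query`."""
    if mode not in FETCH_MODES:
        raise ValueError(f"mode must be one of {FETCH_MODES}, got {mode!r}")
    # Drop surrounding whitespace and any trailing semicolon so exactly one is emitted.
    body = query.strip().rstrip(";").strip()
    return f"set hive.fetch.task.conversion={mode};\nexplain {body};"


if __name__ == "__main__":
    for m in FETCH_MODES:
        print(fetch_check_script("select * from lorry.bigdata limit 10", m))
        print()
```

  Paste each generated pair into a beeline session. A `set` issued this way only affects the current session, so the cluster-wide default configured in Cloudera is left unchanged.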
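
Connection refused

  Could not open connection to jdbc:hive2://localhost:10000/default: java.net.ConnectException: Connection refused (state=08S01,code=0)
  It turned out the IP was wrong. The machine running hive2service (41) and the machine running the Thrift service (42) were separate, and I had run beeline on the hive2service machine against localhost. After changing the beeline connection URL to point to the machine running the Thrift service, the problem was solved.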
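  "Connection refused" means nothing accepted the TCP connection on that host and port, so a quick reachability check before launching beeline can tell you whether you are pointing at the right machine. This is a minimal sketch using only the standard library; the function name `port_open` is hypothetical, and port 10000 is taken from the URL in the error above.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (nothing listening), timeouts,
        # and DNS resolution failures.
        return False


if __name__ == "__main__":
    # On the hive2service machine in the case above, this check against
    # localhost would be expected to fail.
    print(port_open("localhost", 10000))
```

  Run the same check against the Thrift service machine (for example `port_open("<thrift-host>", 10000)`, with `<thrift-host>` as a placeholder); whichever host returns True is the one to put in the JDBC URL.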

beeline connection strings

  !connect jdbc:hive2://10.1.108.65:10000/default;principal=hive/slave1@BD.COM;auth=kerberos;kerberosAuthType=fromSubject;
  beeline -u "jdbc:hive2://10.1.108.65:10000/default;principal=hive/slave1@BD.COM;auth=kerberos"
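  Both strings follow the same shape: jdbc:hive2://host:port/database, then session parameters as ;key=value pairs. Since a single wrong separator is easy to miss, a small hypothetical helper can assemble the URL from parts. This is a sketch; the function name and defaults are assumptions, and the host, principal, and parameters are copied from the examples above.

```python
def hive_jdbc_url(host: str, port: int = 10000, database: str = "default", **params: str) -> str:
    """Build jdbc:hive2://host:port/database followed by ;key=value session parameters.

    Parameters are appended in the order they are passed (kwargs keep insertion order).
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    for key, value in params.items():
        url += f";{key}={value}"
    return url


if __name__ == "__main__":
    url = hive_jdbc_url("10.1.108.65", principal="hive/slave1@BD.COM", auth="kerberos")
    # Quote the URL: an unquoted ';' would end the shell command.
    print(f'beeline -u "{url}"')
```

  The quoting matters for the `beeline -u` form: without double quotes, the shell would treat each `;` as a command separator and beeline would receive only the part before the first one.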