005 - Hive overview, computation principles and model

Computation principles and model

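    The fundamental ideas behind optimization:

        Filter data as early and as aggressively as possible, so that every stage handles less data.

        Reduce the number of MapReduce jobs.

        Solve data skew problems.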
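Data skew is the least obvious of these three ideas. When one GROUP BY key dominates, hash partitioning sends every row for that key to a single reducer, and the whole job waits on it. Hive's `hive.groupby.skewindata=true` setting addresses this by splitting the aggregation into two jobs: the first spreads map output randomly across reducers and pre-aggregates, the second merges the partial results by the real key. The Python sketch below simulates that idea in memory. It is an illustration under stated assumptions, not Hive's implementation: the reducer count, the CRC32 partitioner, the round-robin "salt" (used instead of a random one so the output is reproducible), and all function names are hypothetical.

```python
import zlib
from collections import Counter

NUM_REDUCERS = 4  # hypothetical reducer count for the simulation


def partition(key, n=NUM_REDUCERS):
    """Shuffle stand-in (assumption): choose a reducer by a deterministic hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % n


def merge(counters):
    """Combine per-reducer outputs into one {key: count} dict."""
    result = Counter()
    for c in counters:
        result.update(c)  # Counter.update adds counts
    return dict(result)


def one_stage(rows, n=NUM_REDUCERS):
    """Plain GROUP BY key / count(*): every row of a key hits the same reducer."""
    reducers = [Counter() for _ in range(n)]
    for key in rows:
        reducers[partition(key, n)][key] += 1
    loads = [sum(r.values()) for r in reducers]
    return merge(reducers), loads


def two_stage(rows, n=NUM_REDUCERS):
    """Skew-tolerant plan: spread rows first, then merge partial counts by key."""
    # Job 1: round-robin assignment (a deterministic "salt") plus partial counts.
    partials = [Counter() for _ in range(n)]
    for i, key in enumerate(rows):
        partials[i % n][key] += 1
    job1_loads = [sum(p.values()) for p in partials]
    # Job 2: shuffle the partial (key, count) records by the real key and sum them.
    finals = [Counter() for _ in range(n)]
    job2_loads = [0] * n
    for p in partials:
        for key, count in p.items():
            r = partition(key, n)
            finals[r][key] += count
            job2_loads[r] += 1  # each partial record is one input row for job 2
    return merge(finals), job1_loads, job2_loads


if __name__ == "__main__":
    rows = ["hot"] * 1000 + ["a", "b", "c"] * 10  # one heavily skewed key
    _, loads = one_stage(rows)
    _, job1_loads, job2_loads = two_stage(rows)
    print("one job, reducer loads:", loads)
    print("job 1 loads:", job1_loads)  # [258, 258, 257, 257]
    print("job 2 loads:", job2_loads)
```

Both plans produce the same counts. The difference is that no reducer in job 1 receives all 1,000 "hot" rows, and job 2 sees at most one partial record per key from each job-1 reducer. This also shows the tension with "reduce the number of jobs": the skew fix buys balance at the cost of an extra job.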

Hive overview

Hive system architecture
    Metastore: Derby, MySQL
    HDFS: /user/hive/warehouse
    MapReduce
Hive configuration files: hive-env.sh, hive-site.xml, hive-log4j.properties
Hive command line: hive --config
    Hive shell: quit, exit, reset, set, add/list/delete FILES, !<command>, dfs <command>, HQL statements, source <file>
    Hive services: hive --service cli, hive --service hiveserver, hive --service metastore, hive --service hwi, hive --service jar
HiveQL
    Syntax keywords: show databases, show partitions, show tables, create table, load data [local] inpath, select * from, desc, alter/drop, limit, as, case when then, union, like, group by, having, order by, sort by, cluster by
    Data types
        Primitive types: tinyint, smallint, int, bigint, float, double, boolean, string, timestamp, binary
        Complex types: array, map, struct
    Tables
        Managed (internal) tables
        External tables: HDFS, HBase, Cassandra, DynamoDB
    Queries: single-table queries, inner joins, outer joins, semi joins, map joins, subqueries, views
Table design: one table per day, one partition per day, distributing data across buckets
Hive optimization: table partitions, table buckets, table compression, indexes (bitmap indexes), execution plans, controlling the number of mappers and reducers
Access methods: Hive shell, Java JDBC API, Thrift client, RHive
User-defined functions: UDF (user-defined functions), UDAF (user-defined aggregate functions)
Hive security
    Authentication: hive.files.umask.value, hive.metastore.authorization.storage.checks, hive.metastore.execute.setugi
    Authorization: hive.security.authorization.enabled, hive.security.authorization.createtable.owner.grants, hive.security.authorization.createtable.user.grants
    Permission model: User, Group, Role
Web console: HWI, port 9999
Software integration: ZooKeeper, Thrift, Oozie, HCatalog, AWS
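The partitioning and bucketing entries above decide where a row lands on HDFS. A partitioned table stores each partition value in its own subdirectory under the table directory (for example `dt=2015-01-01`), and a bucketed table splits each partition into a fixed number of files by hashing the bucketing column. The Python sketch below computes that location for one row. The database, table, and column names are hypothetical; the bucket file name `000002_0` follows a common Hive convention but can differ by version and execution engine; and the hash is a simplified stand-in, since Hive's real bucket hash depends on the column type and the table's bucketing version.

```python
import zlib

WAREHOUSE = "/user/hive/warehouse"  # default value of hive.metastore.warehouse.dir


def bucket_id(value, num_buckets):
    """Simplified bucket assignment: non-negative 32-bit hash modulo bucket count.

    Assumption: ints hash to themselves and other values use CRC32; Hive's real
    hash depends on the column type and the table's bucketing version.
    """
    h = value if isinstance(value, int) else zlib.crc32(str(value).encode("utf-8"))
    return (h & 0x7FFFFFFF) % num_buckets  # mask the sign bit, like Java's & Integer.MAX_VALUE


def table_dir(db, table):
    """Tables in the default database sit directly under the warehouse root."""
    if db == "default":
        return f"{WAREHOUSE}/{table}"
    return f"{WAREHOUSE}/{db}.db/{table}"


def row_location(db, table, partition, bucket_value, num_buckets):
    """Return the file a row would land in: table dir / partition dirs / bucket file."""
    part_path = "/".join(f"{col}={val}" for col, val in partition.items())
    bucket_file = f"{bucket_id(bucket_value, num_buckets):06d}_0"  # hypothetical naming, e.g. 000002_0
    return f"{table_dir(db, table)}/{part_path}/{bucket_file}"


if __name__ == "__main__":
    # Hypothetical table logs.page_views, partitioned by dt, 8 buckets on an int id.
    print(row_location("logs", "page_views", {"dt": "2015-01-01"}, 42, 8))
    # /user/hive/warehouse/logs.db/page_views/dt=2015-01-01/000002_0
```

Because each `dt` value is its own directory, a query filtered on `dt` only has to read the matching subdirectory (partition pruning), which is the "filter early" idea applied at the storage level. Buckets add a second level of organization that joins and sampling can exploit by working on matching files rather than whole partitions.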

Illustrated example

[Figure: Hive and MapReduce]
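The diagram pairs Hive with MapReduce: Hive compiles a HiveQL query into one or more MapReduce jobs. As a rough illustration of that mapping, a query such as `SELECT word, count(*) FROM docs GROUP BY word` becomes a map phase that emits `(word, 1)` pairs, a shuffle that groups pairs by key, and a reduce phase that sums each group. The Python sketch below simulates those phases in memory. The table `docs`, the column `word`, and every function name are hypothetical. The optional `combine` step only approximates Hive's map-side aggregation (`hive.map.aggr`) and is not Hive's actual operator.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Mapper for one input split: emit (word, 1) for every row."""
    return [(word, 1) for word in split]


def combine(pairs):
    """Optional map-side partial aggregation: fewer records enter the shuffle."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def shuffle(pairs):
    """Group all values by key, as the MapReduce shuffle does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: count(*) per key becomes a sum of the emitted values."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(splits, map_side_aggr=True):
    """Run map -> (combine) -> shuffle -> reduce; also report shuffled record count."""
    mapped = [map_phase(s) for s in splits]  # one mapper per input split
    if map_side_aggr:
        mapped = [combine(pairs) for pairs in mapped]
    shuffled = list(chain.from_iterable(mapped))
    return reduce_phase(shuffle(shuffled)), len(shuffled)


if __name__ == "__main__":
    splits = [["a", "b", "a"], ["b", "c"]]  # two hypothetical input splits
    print(group_by_count(splits, map_side_aggr=False))  # ({'a': 2, 'b': 2, 'c': 1}, 5)
    print(group_by_count(splits, map_side_aggr=True))   # ({'a': 2, 'b': 2, 'c': 1}, 4)
```

The result is the same either way; map-side aggregation only shrinks the number of records crossing the shuffle, which is the "less data at every stage" principle from the start of this note.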