简介:
hive是建立在hadoop之上的数据仓库,一般用于对大型数据集的读写和管理,存在hive里的数据实际上就是存在HDFS上,都是以文件的形式存在,不能进行读写操作,所以我们需要元数据或者说叫schem来对hdfs上的数据进行管理。那元数据表之间有没有什么关联呢?答案是肯定的。hive默认元数据表是存储在derby中的,但是derby是单session的,所以我们一般会修改会mysql
那么该如何启用mysql来管理元数据呢?
1 <configuration>
2 <property>
3 <name>javax.jdo.option.ConnectionURL</name>
4 <value>jdbc:mysql://hadoop001:3306/ruoze_d6?createDatabaseIfNotExist=true&characterEncoding=UTF-8</value>
5 </property>
6 <property>
7 <name>javax.jdo.option.ConnectionDriverName</name>
8 <value>com.mysql.jdbc.Driver</value>
9 </property>
10 <property>
11 <name>javax.jdo.option.ConnectionUserName</name>
12 <value>root</value>
13 </property>
14 <property>
15 <name>javax.jdo.option.ConnectionPassword</name>
16 <value>123456</value>
17 </property>
18 </configuration>
以上的配置就会启用MYSQL管理元数据
第4行的配置是配置了mysql里的数据库名字叫ruoze_d6,第11行和第16行式配置了MySQL的登录用户名和密码,并且ruoze_d6这个库不需要在mysql中特别建立
mysql> use ruoze_d6;
Database changed
mysql>
1 mysql> show tables;
2 +---------------------------+
3 | Tables_in_ruoze_d6 |
4 +---------------------------+
5 | bucketing_cols |
6 | cds |
7 | columns_v2 |
8 | database_params |
9 | dbs |
10 | func_ru |
11 | funcs |
12 | global_privs |
13 | groupinfor |
14 | idxs |
15 | index_params |
16 | makedata_job |
17 | part_col_privs |
18 | part_col_stats |
19 | part_privs |
20 | partition_key_vals |
21 | partition_keys |
22 | partition_params |
23 | partitions |
24 | roles |
25 | sd_params |
26 | sds |
27 | sequence_table |
28 | serde_params |
29 | serdes |
30 | skewed_col_names |
31 | skewed_col_value_loc_map |
32 | skewed_string_list |
33 | skewed_string_list_values |
34 | skewed_values |
35 | sort_cols |
36 | tab_col_stats |
37 | table_params |
38 | tbl_col_privs |
39 | tbl_privs |
40 | tbls |
41 | version |
42 | |
43 +---------------------------+
44 37 rows in set (0.00 sec)
这里一共有37张表, 我们捡主次分析一下
mysql> select * from version ;
+--------+----------------+----------------------------------------+
| VER_ID | SCHEMA_VERSION | VERSION_COMMENT |
+--------+----------------+----------------------------------------+
| 11 | 1.1.0 | Set by MetaStore hadoop@172.16.202.233 |
+--------+----------------+----------------------------------------+
1 row in set (0.00 sec)
- 第一列是ID主键;第二列是hive的版本,第三列是版本说明,并且这张表里只有一条数据,且只能有一条数据,如果这张表被删除,当启动Hive-Cli时候,就会报错”Table ‘hive.version’ doesn’t exist”。
- 但是前提示关闭某个参数,如果那个参数开着,那么你如果删除了这张表或者说清空这张表,他都会自动建立,那个参数我忘记是啥了,回头想起来会来补上
mysql> select * from DBS G;
*************************** 1. row ***************************
DB_ID: 1
DESC: Default Hive database
DB_LOCATION_URI: hdfs://hadoop001:9000/user/hive/warehouse
NAME: default
OWNER_NAME: public
OWNER_TYPE: ROLE
*************************** 2. row ***************************
DB_ID: 6
DESC: NULL
DB_LOCATION_URI: hdfs://hadoop001:9000/user/hive/warehouse/hadoop_g6.db
NAME: hadoop_g6
OWNER_NAME: hadoop
OWNER_TYPE: USER
*************************** 3. row ***************************
DB_ID: 11
DESC: NULL
DB_LOCATION_URI: hdfs://hadoop001:9000/user/hive/warehouse/ruoze_d6.db
NAME: ruoze_d6
OWNER_NAME: hadoop
OWNER_TYPE: USER
3 rows in set (0.00 sec)
列名 |
解释 |
DB_ID
|
数据库ID
|
DESC
|
数据库描述
|
DB_LOCATION_URI
|
数据库HDFS路径
|
NAME
|
数据库名
|
OWNER_NAME
|
数据库所有者用户名
|
OWNER_TYPE
|
所有者角色
|
-
database_params(hive数据库相关的元数据表)
mysql> desc database_params;
+-------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| DB_ID | bigint(20) | NO | PRI | NULL | |
| PARAM_KEY | varchar(180) | NO | PRI | NULL | |
| PARAM_VALUE | varchar(4000) | YES | | NULL | |
+-------------+---------------+------+-----+---------+-------+
- 说明:该表存储数据库的相关参数,在CREATE DATABASE时候用 WITH DBPROPERTIES (property_name=property_value, …)指定的参数
字段 |
说明 |
示例 |
DB_ID
|
数据库ID |
11 |
PARAM_KEY
|
参数名 |
createby |
PARAM_VALUE
|
参数值 |
root |
mysql> select * from TBLS G;
*************************** 1. row ***************************
TBL_ID: 37
CREATE_TIME: 1555494334
DB_ID: 1
LAST_ACCESS_TIME: 0
OWNER: hadoop
RETENTION: 0
SD_ID: 37
TBL_NAME: makedata_job
TBL_TYPE: MANAGED_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
- 说明:该表中存储Hive表、视图、索引表的基本信息。
TBL_ID
|
表ID |
CREATE_TIME
|
创建时间 |
DB_ID
|
数据库ID |
LAST_ACCESS_TIME
|
上次访问时间 |
OWNER
|
所有者 |
RETENTION
|
保留字段 |
SD_ID
|
序列化配置信息(对应SDS表中的SD_ID) |
TBL_NAME
|
表名 |
TBL_TYPE
|
表类型 |
VIEW_EXPANDED_TEXT
|
视图详细的HQL语句 |
VIEW_ORIGINAL_TEXT
|
视图原始的HQL语句 |
|
|
-
table_params(Hive表和视图相关的元数据表)
mysql> select * from table_params;
+--------+-----------------------+-------------+
| TBL_ID | PARAM_KEY | PARAM_VALUE |
+--------+-----------------------+-------------+
| 37 | COLUMN_STATS_ACCURATE | true |
| 37 | numFiles | 5 |
| 37 | numRows | 0 |
| 37 | rawDataSize | 0 |
| 37 | totalSize | 2921282 |
| 37 | transient_lastDdlTime | 1555551458 |
| 42 | EXTERNAL | TRUE |
| 42 | transient_lastDdlTime | 1555555620 |
| 46 | COLUMN_STATS_ACCURATE | true |
| 46 | numFiles | 1 |
| 46 | numRows | 500000 |
| 46 | rawDataSize | 72051224 |
| 46 | totalSize | 30284817 |
| 46 | transient_lastDdlTime | 1555557177 |
| 51 | EXTERNAL | TRUE |
| 51 | transient_lastDdlTime | 1555772013 |
| 52 | COLUMN_STATS_ACCURATE | true |
| 52 | numFiles | 1 |
| 52 | numRows | 500000 |
| 52 | rawDataSize | 67551224 |
| 52 | totalSize | 75265591 |
| 52 | transient_lastDdlTime | 1555772485 |
| 56 | COLUMN_STATS_ACCURATE | true |
| 56 | numFiles | 1 |
| 56 | numRows | 500000 |
| 56 | rawDataSize | 64051224 |
| 56 | totalSize | 64641768 |
| 56 | transient_lastDdlTime | 1555773864 |
| 66 | COLUMN_STATS_ACCURATE | true |
| 66 | numFiles | 1 |
| 66 | numRows | 500000 |
| 66 | rawDataSize | 359000000 |
| 66 | totalSize | 17782969 |
| 66 | transient_lastDdlTime | 1555775575 |
| 67 | COLUMN_STATS_ACCURATE | true |
| 67 | numFiles | 1 |
| 67 | numRows | 500000 |
| 67 | orc.compress | NONE |
| 67 | rawDataSize | 359000000 |
| 67 | totalSize | 53967047 |
| 67 | transient_lastDdlTime | 1555775880 |
| 68 | COLUMN_STATS_ACCURATE | true |
| 68 | numFiles | 1 |
| 68 | numRows | 500000 |
| 68 | rawDataSize | 4000000 |
| 68 | totalSize | 61117546 |
| 68 | transient_lastDdlTime | 1555776185 |
| 69 | COLUMN_STATS_ACCURATE | true |
| 69 | numFiles | 1 |
| 69 | numRows | 500000 |
| 69 | rawDataSize | 4000000 |
| 69 | totalSize | 16854027 |
| 69 | transient_lastDdlTime | 1555776356 |
| 71 | COLUMN_STATS_ACCURATE | true |
| 71 | numFiles | 1 |
| 71 | numRows | 1 |
| 71 | rawDataSize | 0 |
| 71 | totalSize | 1 |
| 71 | transient_lastDdlTime | 1555809751 |
| 76 | transient_lastDdlTime | 1555836141 |
| 77 | COLUMN_STATS_ACCURATE | true |
| 77 | numFiles | 1 |
| 77 | numRows | 0 |
| 77 | rawDataSize | 0 |
| 77 | totalSize | 366 |
| 77 | transient_lastDdlTime | 1555837173 |
+--------+-----------------------+-------------+
字段 |
dec |
TBL_ID
|
表ID(对应TBLS中的TBL_ID) |
PARAM_KEY
|
属性名 |
PARAM_VALUES
|
属性值 |
-
TBL_PRIVS 该表存储表/视图的授权信息(不做详细说明)
mysql> desc TBL_PRIVS;
+----------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| TBL_GRANT_ID | bigint(20) | NO | PRI | NULL | |
| CREATE_TIME | int(11) | NO | | NULL | |
| GRANT_OPTION | smallint(6) | NO | | NULL | |
| GRANTOR | varchar(128) | YES | | NULL | |
| GRANTOR_TYPE | varchar(128) | YES | | NULL | |
| PRINCIPAL_NAME | varchar(128) | YES | | NULL | |
| PRINCIPAL_TYPE | varchar(128) | YES | | NULL | |
| TBL_PRIV | varchar(128) | YES | | NULL | |
| TBL_ID | bigint(20) | YES | MUL | NULL | |
+----------------+--------------+------+-----+---------+-------+
9 rows in set (0.01 sec)
TBL_ID对应TBLS中的TBL_ID
mysql> desc sds;
+---------------------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------------+---------------+------+-----+---------+-------+
| SD_ID | bigint(20) | NO | PRI | NULL | |
| CD_ID | bigint(20) | YES | MUL | NULL | |
| INPUT_FORMAT | varchar(4000) | YES | | NULL | |
| IS_COMPRESSED | bit(1) | NO | | NULL | |
| IS_STOREDASSUBDIRECTORIES | bit(1) | NO | | NULL | |
| LOCATION | varchar(4000) | YES | | NULL | |
| NUM_BUCKETS | int(11) | NO | | NULL | |
| OUTPUT_FORMAT | varchar(4000) | YES | | NULL | |
| SERDE_ID | bigint(20) | YES | MUL | NULL | |
+---------------------------+---------------+------+-----+---------+-------+
SD_ID
|
|
CD_ID
|
字段信息ID |
INPUT_FORMAT
|
文件输入格式 |
IS_COMPRESSED
|
是否压缩 |
IS_STOREDASSUBDIRECTORIES
|
是否以子目录存储 |
LOCATION
|
HDFS路径 |
NUM_BUCKETS
|
分桶数量 |
OUTPUT_FORMAT
|
文件输出格式 |
SERDE_ID
|
序列化类ID |
字段 |
说明 |
-
SD_PARAMS(Hive文件存储信息相关的元数据表)
mysql> desc SD_PARAMS;
+-------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| SD_ID | bigint(20) | NO | PRI | NULL | |
| PARAM_KEY | varchar(256) | NO | PRI | NULL | |
| PARAM_VALUE | varchar(4000) | YES | | NULL | |
+-------------+---------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
- 说明:该表存储Hive存储的属性信息,在创建表时候使用
STORED BY ‘storage.handler.class.name’ [WITH SERDEPROPERTIES (…)指定。
-
serdes(Hive文件存储信息相关的元数据表)
mysql> select * from serdes;
+----------+------+-------------------------------------------------------------+
| SERDE_ID | NAME | SLIB |
+----------+------+-------------------------------------------------------------+
| 37 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| 42 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| 43 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| 46 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| 51 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| 52 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| 56 | NULL | org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe |
| 66 | NULL | org.apache.hadoop.hive.ql.io.orc.OrcSerde |
| 67 | NULL | org.apache.hadoop.hive.ql.io.orc.OrcSerde |
| 68 | NULL | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe |
| 69 | NULL | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe |
| 71 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| 76 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| 77 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
+----------+------+-------------------------------------------------------------+
14 rows in set (0.00 sec)
字段 |
字段说明 |
SERDE_ID
|
序列化类配置ID(对应SDS的SERDE_ID )
|
NAME
|
序列化类别名 |
SLIB
|
序列化类 |
-
serde_params(Hive文件存储信息相关的元数据表)
mysql> select * from serde_params;
+----------+----------------------+-------------+
| SERDE_ID | PARAM_KEY | PARAM_VALUE |
+----------+----------------------+-------------+
| 37 | field.delim | |
| 37 | serialization.format | |
| 42 | field.delim | |
| 42 | serialization.format | |
| 43 | field.delim | |
| 43 | serialization.format | |
| 46 | serialization.format | 1 |
| 51 | field.delim | |
| 51 | serialization.format | |
| 52 | serialization.format | 1 |
| 56 | serialization.format | 1 |
| 66 | serialization.format | 1 |
| 67 | serialization.format | 1 |
| 68 | serialization.format | 1 |
| 69 | serialization.format | 1 |
| 71 | serialization.format | 1 |
| 76 | field.delim | |
| 76 | serialization.format | |
| 77 | field.delim | |
| 77 | serialization.format | |
+----------+----------------------+-------------+
20 rows in set (0.00 sec)
- 说明:该表存储序列化的一些属性、格式信息,比如:行、列分隔符
字段 |
字段说明 |
SERDE_ID
|
序列化类配置ID(对应SDS的SERDE_ID ) |
PARAM_KEY
|
属性名 |
PARAM_VALUE
|
属性值 |
mysql> select * from columns_v2;
+-------+---------+-------------+--------------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+--------------+-------------+
| 37 | NULL | ip | varchar(20) | 4 |
| 37 | NULL | levelnm | varchar(6) | 2 |
| 37 | NULL | region | varchar(6) | 1 |
| 37 | NULL | time_random | varchar(20) | 3 |
| 37 | NULL | traffic | varchar(12) | 7 |
| 37 | NULL | urlid | varchar(100) | 6 |
| 37 | NULL | urlnm | varchar(6) | 0 |
| 37 | NULL | urlym | varchar(20) | 5 |
| 42 | NULL | cdn | string | 0 |
| 42 | NULL | domain | string | 5 |
| 42 | NULL | ip | string | 4 |
| 42 | NULL | level | string | 2 |
| 42 | NULL | region | string | 1 |
| 42 | NULL | time | string | 3 |
| 42 | NULL | traffic | bigint | 7 |
| 42 | NULL | url | string | 6 |
+-------+---------+-------------+--------------+-------------+
17 rows in set (0.00 sec)
字段 |
字段说明 |
CD_ID
|
字段信息ID(对应表SDS的CD_ID) |
COMMENT
|
字段注释 |
COLUMN_NAME
|
字段名 |
TYPE_NAME
|
字段类型 |
INTEGER_IDX
|
字段顺序 |
-
partitions(Hive表分区相关的元数据表)
mysql> select * from partitions ;
+---------+-------------+------------------+--------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |
+---------+-------------+------------------+--------------+-------+--------+
| 21 | 1555555926 | 0 | day=20190418 | 43 | 42 |
+---------+-------------+------------------+--------------+-------+--------+
1 row in set (0.00 sec)
字段 |
字段说明 |
PART_ID
|
分区ID |
CREATE_TIME
|
分区创建时间 |
LAST_ACCESS_TIME
|
最后一次访问时间 |
PART_NAME
|
分区名称 |
SD_ID
|
分区存储ID |
TBL_ID
|
表ID |
-
partition_keys(Hive表分区相关的元数据表)
mysql> select * from partition_keys;
+--------+--------------+-----------+-----------+-------------+
| TBL_ID | PKEY_COMMENT | PKEY_NAME | PKEY_TYPE | INTEGER_IDX |
+--------+--------------+-----------+-----------+-------------+
| 42 | NULL | day | string | 0 |
+--------+--------------+-----------+-----------+-------------+
1 row in set (0.00 sec)
字段名称 |
字段说明 |
TBL_ID
|
表ID |
PKEY_COMMENT
|
分区字段说明 |
PKEY_NAME
|
分区字段名称 |
PKEY_TYPE
|
分区字段类型 |
INTEGER_IDX
|
分区字段顺序 |
-
partition_key_vals(Hive表分区相关的元数据表)
mysql> select * from partition_key_vals;
+---------+--------------+-------------+
| PART_ID | PART_KEY_VAL | INTEGER_IDX |
+---------+--------------+-------------+
| 21 | 20190418 | 0 |
+---------+--------------+-------------+
1 row in set (0.00 sec)
字段 |
字段说明 |
PART_ID
|
分区ID |
PART_KEY_VAL
|
分区字段值 |
INTEGER_IDX
|
分区字段值顺序 |
-
partition_params(Hive表分区相关的元数据表)
mysql> select * from partition_params;
+---------+-----------------------+-------------+
| PART_ID | PARAM_KEY | PARAM_VALUE |
+---------+-----------------------+-------------+
| 21 | COLUMN_STATS_ACCURATE | true |
| 21 | numFiles | 1 |
| 21 | totalSize | 29975501 |
| 21 | transient_lastDdlTime | 1555556171 |
+---------+-----------------------+-------------+
4 rows in set (0.00 sec)
字段 |
字段说明 |
PART_ID
|
分区ID |
PARAM_KEY
|
分区属性名 |
PARAM_VALUE
|
分区属性值 |
![](https://img2018.cnblogs.com/blog/1326937/201904/1326937-20190423161606388-1928437758.png)
此图转载于https://mp.weixin.qq.com/s/c2C4SYaj-GUP6hTkPNV_hQ
参考博客:https://mp.weixin.qq.com/s/c2C4SYaj-GUP6hTkPNV_hQ