【Hive】数据类型实战

Hive中的数据类型可以分为两种，分别是基本数据类型和复杂数据类型，本篇将通过一个例子来说明：

1 数据样本

[hadoop@strong ~]$ vim employee 
Michael|Montreal,Toronto|Male,30|DB:80|Product:Developer^DLead
Will|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Shelley|New York|Female,27|Python:80|Test:Lead,COE:Architect
Lucy|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead

2 创建表

0: jdbc:hive2://localhost:10000/hive> create table employee(
. . . . . . . . . . . . . . . . . . > name string,
. . . . . . . . . . . . . . . . . . > work_place array<string>,
. . . . . . . . . . . . . . . . . . > sex_age struct<sex:string,age:int>,
. . . . . . . . . . . . . . . . . . > skills_score map<string,int>,
. . . . . . . . . . . . . . . . . . > depart_title map<string,array<string>>)
. . . . . . . . . . . . . . . . . . > row format delimited
. . . . . . . . . . . . . . . . . . > fields terminated by '|'
. . . . . . . . . . . . . . . . . . > collection items terminated by ','
. . . . . . . . . . . . . . . . . . > map keys terminated by ':';
No rows affected (3.454 seconds)

3 查看表创建

0: jdbc:hive2://localhost:10000/hive> !table employee
+------------+--------------+-------------+-------------+----------+-----------+-------------+------------+----------------------------+-----------------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | TABLE_TYPE  | REMARKS  | TYPE_CAT  | TYPE_SCHEM  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  |
+------------+--------------+-------------+-------------+----------+-----------+-------------+------------+----------------------------+-----------------+
|            | hive         | employee    | TABLE       | NULL     | NULL      | NULL        | NULL       | NULL                       | NULL            |
+------------+--------------+-------------+-------------+----------+-----------+-------------+------------+----------------------------+-----------------+
0: jdbc:hive2://localhost:10000/hive>

4 加载数据

0: jdbc:hive2://localhost:10000/hive> load data local inpath '/home/hadoop/employee' overwrite into table employee;
No rows affected (3.957 seconds)

5 查询数据

1）查询整个数组与单个数组列

0: jdbc:hive2://localhost:10000/hive> select work_place from employee;
+-------------------------+
|       work_place        |
+-------------------------+
| ["Montreal","Toronto"]  |
| ["Montreal"]            |
| ["New York"]            |
| ["Vancouver"]           |
+-------------------------+
4 rows selected (0.431 seconds)
0: jdbc:hive2://localhost:10000/hive> select work_place[0] as c1,work_place[1] as c2,work_place[2] from employee; 
+------------+----------+-------+
|     c1     |    c2    |  _c2  |
+------------+----------+-------+
| Montreal   | Toronto  | NULL  |
| Montreal   | NULL     | NULL  |
| New York   | NULL     | NULL  |
| Vancouver  | NULL     | NULL  |
+------------+----------+-------+
4 rows selected (0.775 seconds)

2）查询整个结构体和单个结构体列

0: jdbc:hive2://localhost:10000/hive> select sex_age from employee;
+----------------------------+
|          sex_age           |
+----------------------------+
| {"sex":"Male","age":30}    |
| {"sex":"Male","age":35}    |
| {"sex":"Female","age":27}  |
| {"sex":"Female","age":57}  |
+----------------------------+
4 rows selected (0.895 seconds)
0: jdbc:hive2://localhost:10000/hive> select sex_age.sex ,sex_age.age from employee;
+---------+------+
|   sex   | age  |
+---------+------+
| Male    | 30   |
| Male    | 35   |
| Female  | 27   |
| Female  | 57   |
+---------+------+
4 rows selected (0.825 seconds)

3）查询整个map和单个map列

0: jdbc:hive2://localhost:10000/hive> select skills_score from employee;
+-----------------------+
|     skills_score      |
+-----------------------+
| {"DB":80}             |
| {"Perl":85}           |
| {"Python":80}         |
| {"Sales":89,"HR":94}  |
+-----------------------+
4 rows selected (1.011 seconds)
0: jdbc:hive2://localhost:10000/hive> select name,skills_score['DB'] as db,skills_score['Perl'] as perl,
. . . . . . . . . . . . . . . . . . > skills_score['Python'] as python,skills_score['Sales'] as sales,skills_score['HR'] as hr
. . . . . . . . . . . . . . . . . . > from employee;
+----------+-------+-------+---------+--------+-------+
|   name   |  db   | perl  | python  | sales  |  hr   |
+----------+-------+-------+---------+--------+-------+
| michael  | 80    | NULL  | NULL    | NULL   | NULL  |
| Will     | NULL  | 85    | NULL    | NULL   | NULL  |
| Shelley  | NULL  | NULL  | 80      | NULL   | NULL  |
| Lucy     | NULL  | NULL  | NULL    | 89     | 94    |
+----------+-------+-------+---------+--------+-------+
4 rows selected (0.576 seconds)

4）查询组合数据类型

0: jdbc:hive2://localhost:10000/hive> select depart_title from employee;
+----------------------------------------+
|              depart_title              |
+----------------------------------------+
| {"Product":["Developer","Lead"]}       |
| {"Product":["Lead"],"Test":["Lead"]}   |
| {"Test":["Lead"],"COE":["Architect"]}  |
| {"Sales":["Lead"]}                     |
+----------------------------------------+
4 rows selected (0.511 seconds)
0: jdbc:hive2://localhost:10000/hive> select name,
. . . . . . . . . . . . . . . . . . > depart_title['Product'] as product,
. . . . . . . . . . . . . . . . . . > depart_title['Test'] as test,
. . . . . . . . . . . . . . . . . . > depart_title['COE'] as coe,
. . . . . . . . . . . . . . . . . . > depart_title['Sales'] as sales 
. . . . . . . . . . . . . . . . . . > from employee;
+----------+-----------------------+-----------+----------------+-----------+
|   name   |        product        |   test    |      coe       |   sales   |
+----------+-----------------------+-----------+----------------+-----------+
| Michael  | ["Developer","Lead"]  | NULL      | NULL           | NULL      |
| Will     | ["Lead"]              | ["Lead"]  | NULL           | NULL      |
| Shelley  | NULL                  | ["Lead"]  | ["Architect"]  | NULL      |
| Lucy     | NULL                  | NULL      | NULL           | ["Lead"]  |
+----------+-----------------------+-----------+----------------+-----------+
4 rows selected (0.524 seconds)
0: jdbc:hive2://localhost:10000/hive> select name,depart_title['Product'][0] as prd_c1,
. . . . . . . . . . . . . . . . . . > depart_title['Test'][0] as tst_c1
. . . . . . . . . . . . . . . . . . > from employee;
+----------+------------+---------+
|   name   |   prd_c1   | tst_c1  |
+----------+------------+---------+
| Michael  | Developer  | NULL    |
| Will     | Lead       | Lead    |
| Shelley  | NULL       | Lead    |
| Lucy     | NULL       | NULL    |
+----------+------------+---------+
4 rows selected (0.469 seconds)

注：Hive中默认的分隔符如下：

Row delimiter：Ctrl + A 或 ^A（创建表时用01）；
Collection item delimiter：Ctrl + B 或 ^B（创建表时用02）；
Map key delimiter：Ctrl + C 或 ^C（创建表时用03）；
嵌套Map的Array：Ctrl + D 或 ^D（创建表时用04）；