A first look at the Hive cluster, with query comparisons

Install MySQL on slave3 to serve as the Hive metastore.

[root@slave3 hadoop]# yum search mysql

[root@slave3 hadoop]# yum install mysql-server.x86_64

MySQL installed successfully:

Installed:

mysql-server.x86_64 0:5.1.73-5.el6_6

Dependency Installed:

mysql.x86_64 0:5.1.73-5.el6_6 perl.x86_64 4:5.10.1-136.el6_6.1 perl-DBD-MySQL.x86_64 0:4.013-3.el6

perl-DBI.x86_64 0:1.609-4.el6 perl-Module-Pluggable.x86_64 1:3.90-136.el6_6.1 perl-Pod-Escapes.x86_64 1:1.04-136.el6_6.1

perl-Pod-Simple.x86_64 1:3.13-136.el6_6.1 perl-libs.x86_64 4:5.10.1-136.el6_6.1 perl-version.x86_64 3:0.77-136.el6_6.1

Dependency Updated:

mysql-libs.x86_64 0:5.1.73-5.el6_6

Complete!

[root@slave3 hadoop]# service mysqld start

Start MySQL.

Change the MySQL login password:

mysql> SET PASSWORD = PASSWORD('root');

Query OK, 0 rows affected (0.00 sec)

Node hardware info (the commands below were run on slave4)

Memory

[hadoop@slave4 ~]$ free -m
             total       used       free     shared    buffers     cached
Mem:          1866       1798         68          0          7       1500
-/+ buffers/cache:        290       1575
Swap:         3999          0       3999

Units are MB.
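A quick sketch (not part of the original post) of how to read this output: the kernel counts buffers and page cache as "used", so the memory actually available to applications is free + buffers + cached, which is what the "-/+ buffers/cache" line reports (up to MB rounding).

```python
# Parse the `free -m` sample above and recover the "-/+ buffers/cache"
# numbers. The kernel's buffers/cache are reclaimable, so they count
# toward available memory.
free_output = """\
             total       used       free     shared    buffers     cached
Mem:          1866       1798         68          0          7       1500
-/+ buffers/cache:        290       1575
Swap:         3999          0       3999"""

mem = free_output.splitlines()[1].split()          # the "Mem:" row
total, used, free, shared, buffers, cached = map(int, mem[1:])

available = free + buffers + cached    # what applications can still get
really_used = used - buffers - cached  # memory truly consumed

print(available)    # 1575, matching the -/+ buffers/cache line
print(really_used)  # 291 (free -m shows 290 due to MB rounding)
```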

Detailed memory info:

[hadoop@slave4 ~]$ cat /proc/meminfo
MemTotal: 1911400 kB
MemFree: 66904 kB
Buffers: 7308 kB
Cached: 1539760 kB
SwapCached: 0 kB
Active: 135492 kB
Inactive: 1603924 kB
Active(anon): 93060 kB
Inactive(anon): 99604 kB
Active(file): 42432 kB
Inactive(file): 1504320 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 4095992 kB
SwapFree: 4095992 kB
Dirty: 48 kB
Writeback: 0 kB
AnonPages: 192444 kB
Mapped: 15264 kB
Shmem: 220 kB
Slab: 73204 kB
SReclaimable: 17448 kB
SUnreclaim: 55756 kB
KernelStack: 1264 kB
PageTables: 2504 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 5051692 kB
Committed_AS: 292260 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 540012 kB
VmallocChunk: 34359184520 kB
HardwareCorrupted: 0 kB
AnonHugePages: 159744 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 9792 kB
DirectMap2M: 2076672 kB

Disks

View the disks and partition layout:

[hadoop@slave4 ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 232.9G 0 disk
├─sda1                         8:1    0   500M  0 part /boot
└─sda2                         8:2    0 232.4G  0 part
  ├─vg_slave4-lv_root (dm-0) 253:0    0    50G  0 lvm  /
  ├─vg_slave4-lv_swap (dm-1) 253:1    0   3.9G  0 lvm  [SWAP]
  └─vg_slave4-lv_home (dm-2) 253:2    0 178.5G  0 lvm  /home

[root@slave4 hadoop]# fdisk -l

Disk /dev/sda: 250.1 GB, 250058268160 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x2c63be37

Device Boot Start End Blocks Id System
/dev/sda1 * 1 64 512000 83 Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2 64 30402 243684352 8e Linux LVM

Disk /dev/mapper/vg_slave4-lv_root: 53.7 GB, 53687091200 bytes
255 heads, 63 sectors/track, 6527 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/vg_slave4-lv_swap: 4194 MB, 4194304000 bytes
255 heads, 63 sectors/track, 509 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/vg_slave4-lv_home: 191.7 GB, 191650332672 bytes
255 heads, 63 sectors/track, 23300 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
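A small cross-check (my addition, not part of the original output): fdisk reports partition sizes in 1 KiB blocks, while lsblk reports binary GiB, so converting sda2's block count should reproduce lsblk's 232.4G figure. The "250.1 GB" in the fdisk header is decimal gigabytes of the same disk.

```python
# Reconcile the fdisk and lsblk size figures for /dev/sda.
sda2_blocks = 243684352            # sda2 from `fdisk -l`, in 1 KiB blocks
sda2_gib = sda2_blocks / 1024 / 1024

print(round(sda2_gib, 1))          # 232.4, matching lsblk's sda2 size

disk_bytes = 250058268160          # whole disk, from the fdisk header
print(round(disk_bytes / 1024**3, 1))  # 232.9 GiB (lsblk, binary units)
print(round(disk_bytes / 1000**3, 1))  # 250.1 GB (fdisk, decimal units)
```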

HDFS datanode report

[hadoop@slave4 ~]$ hadoop dfsadmin -report
Configured Capacity: 985770651648 (918.07 GB)
Present Capacity: 934353133568 (870.18 GB)
DFS Remaining: 590443302912 (549.89 GB)
DFS Used: 343909830656 (320.29 GB)
DFS Used%: 36.81%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
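The summary numbers hang together in a specific way, worth sanity-checking (this sketch is my addition): Present Capacity = DFS Used + DFS Remaining, and DFS Used% is computed against Present Capacity, not Configured Capacity; the difference between Configured and Present is space consumed by non-HDFS files.

```python
# Verify the dfsadmin -report summary arithmetic.
configured = 985770651648   # Configured Capacity, bytes
present    = 934353133568   # Present Capacity
used       = 343909830656   # DFS Used
remaining  = 590443302912   # DFS Remaining

assert used + remaining == present     # Present = Used + Remaining

non_dfs = configured - present         # space held by non-HDFS data
used_pct = used / present * 100        # percentage is of *Present* capacity

print(round(used_pct, 2))  # 36.81, as reported
```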


Datanodes available: 7 (7 total, 0 dead)

Name: 192.168.2.6:50010
Decommission Status : Normal
Configured Capacity: 188643102720 (175.69 GB)
DFS Used: 75069222912 (69.91 GB)
Non DFS Used: 9779052544 (9.11 GB)
DFS Remaining: 103794827264(96.67 GB)
DFS Used%: 39.79%
DFS Remaining%: 55.02%
Last contact: Sun Jul 05 10:52:17 CST 2015

Name: 192.168.2.13:50010
Decommission Status : Normal
Configured Capacity: 21277569024 (19.82 GB)
DFS Used: 136781824 (130.45 MB)
Non DFS Used: 1261211648 (1.17 GB)
DFS Remaining: 19879575552(18.51 GB)
DFS Used%: 0.64%
DFS Remaining%: 93.43%
Last contact: Sun Jul 05 10:52:17 CST 2015

Name: 192.168.2.9:50010
Decommission Status : Normal
Configured Capacity: 188643102720 (175.69 GB)
DFS Used: 58468474880 (54.45 GB)
Non DFS Used: 9779011584 (9.11 GB)
DFS Remaining: 120395616256(112.13 GB)
DFS Used%: 30.99%
DFS Remaining%: 63.82%
Last contact: Sun Jul 05 10:52:18 CST 2015

Name: 192.168.2.10:50010
Decommission Status : Normal
Configured Capacity: 188643102720 (175.69 GB)
DFS Used: 74225582080 (69.13 GB)
Non DFS Used: 9778978816 (9.11 GB)
DFS Remaining: 104638541824(97.45 GB)
DFS Used%: 39.35%
DFS Remaining%: 55.47%
Last contact: Sun Jul 05 10:52:17 CST 2015

Name: 192.168.2.11:50010
Decommission Status : Normal
Configured Capacity: 188643102720 (175.69 GB)
DFS Used: 63778144256 (59.4 GB)
Non DFS Used: 9779015680 (9.11 GB)
DFS Remaining: 115085942784(107.18 GB)
DFS Used%: 33.81%
DFS Remaining%: 61.01%
Last contact: Sun Jul 05 10:52:15 CST 2015

Name: 192.168.2.12:50010
Decommission Status : Normal
Configured Capacity: 21277569024 (19.82 GB)
DFS Used: 966615040 (921.84 MB)
Non DFS Used: 1261236224 (1.17 GB)
DFS Remaining: 19049717760(17.74 GB)
DFS Used%: 4.54%
DFS Remaining%: 89.53%
Last contact: Sun Jul 05 10:52:17 CST 2015

Name: 192.168.2.7:50010
Decommission Status : Normal
Configured Capacity: 188643102720 (175.69 GB)
DFS Used: 71265009664 (66.37 GB)
Non DFS Used: 9779011584 (9.11 GB)
DFS Remaining: 107599081472(100.21 GB)
DFS Used%: 37.78%
DFS Remaining%: 57.04%
Last contact: Sun Jul 05 10:52:17 CST 2015

After installing Hive, edit its configuration files.
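For reference, the metastore section of hive-site.xml would look roughly like this. The property names are standard Hive settings; the host, user, and password values are taken from this post's setup (slave3, root/root), and the parameter is spelled with its correct name, createDatabaseIfNotExist:

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://slave3:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>root</value>
</property>
```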

[hadoop@slave3 bin]$ ./hive
15/07/06 10:58:20 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.

Logging initialized using configuration in file:/opt/hadoop/hive-0.12.0/conf/hive-log4j.properties
hive> set javax.jdo.option.ConnectionURL;
javax.jdo.option.ConnectionURL=jdbc:mysql://slave3:3306/hive?createDatabaseIfNoExist=true
hive>

hive> show tables;

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

hive>

The cause was a missing MySQL JDBC driver.

Even after adding the driver, it still failed:

2015-07-06 11:04:16,786 WARN bonecp.BoneCPConfig (BoneCPConfig.java:sanitize(1537)) - Max Connections < 1. Setting to 20

2015-07-06 11:04:16,911 ERROR Datastore.Schema (Log4JLogger.java:error(125)) - Failed initialising database.

Unable to open a test connection to the given database. JDBC url = jdbc:mysql://slave3:3306/hive?createDatabaseIfNoExist=true, username = root. Terminating connection pool. Original Exception:

java.sql.SQLException: Access denied for user 'root'@'slave3' (using password: YES)

Hive cannot connect to MySQL.

https://hadooptutorial.info/unable-open-test-connection-given-database/

Following that article I tracked down the problem: I had simply written something wrong. (Note also that the JDBC URL above misspells the parameter name: it should be createDatabaseIfNotExist=true, not createDatabaseIfNoExist. For an "Access denied" error like the one above, the usual fix is to grant the user access from the metastore host, e.g. GRANT ALL ON hive.* TO 'root'@'slave3' IDENTIFIED BY 'root'; in MySQL.)

Next comes the data-analysis phase with the NCDC dataset.

Create the table following the procedure from my earlier blog post:

http://blog.csdn.net/mrcharles/article/details/46514359

create table ncdc (
  year string,
  month string,
  data string,
  time string,
  air string,
  a string,
  b string,
  c string,
  d string,
  e string,
  f string,
  g string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

The NCDC data was first run through a simple Java program that converts the spaces to tab characters.
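The original preprocessing was done in Java (the code is not shown in the post); this is an equivalent Python sketch, assuming the transformation is simply collapsing runs of spaces into single tabs. The sample line is hypothetical.

```python
# Sketch of the space-to-tab preprocessing the post describes.
import re

def spaces_to_tabs(line: str) -> str:
    """Collapse one or more spaces into a single tab, dropping the newline."""
    return re.sub(r" +", "\t", line.rstrip("\n"))

# Hypothetical NCDC-style record: year month day hour temperature ...
sample = "1901 01 01 06 -5 9 1017 8"
print(spaces_to_tabs(sample))  # fields separated by tabs, ready for Hive
```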

  1. Load the data into Hive

load data local inpath '/opt/software/ncdc/summary' into table ncdc;

hive> load data local inpath '/opt/hadoop/hadoopDATA/summary' into table ncdc;

Copying data from file:/opt/hadoop/hadoopDATA/summary

Copying file: file:/opt/hadoop/hadoopDATA/summary

Loading data to table default.ncdc

Table default.ncdc stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 17013314, raw_data_size: 0]

OK

Time taken: 2.231 seconds

FAILED: RuntimeException java.net.UnknownHostException: unknown host: node1

The error above appeared when querying.

The fix was to correct the configuration files: they had been copied verbatim from another machine, so the HDFS URL (for Hadoop 1.x, typically fs.default.name in core-site.xml) still pointed at the old host, node1.

  2. Query the data

We can query each year's average, maximum, and minimum temperature, use grouping functions, and so on; the operations are much like MySQL's.

select year,avg(air) from ncdc group by year;
hive> select year,avg(air) from ncdc group by year;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_201507050950_0001, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201507050950_0001
Kill Command = /opt/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201507050950_0001
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2015-07-06 13:05:52,921 Stage-1 map = 0%, reduce = 0%
2015-07-06 13:05:59,965 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 7.02 sec
2015-07-06 13:06:00,972 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 7.02 sec
2015-07-06 13:06:01,978 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 7.02 sec
2015-07-06 13:06:02,985 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 11.62 sec
2015-07-06 13:06:03,992 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 11.62 sec
2015-07-06 13:06:04,998 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 11.62 sec
2015-07-06 13:06:06,005 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 11.62 sec
2015-07-06 13:06:07,012 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 11.62 sec
2015-07-06 13:06:08,019 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 11.62 sec
2015-07-06 13:06:09,025 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 11.62 sec
2015-07-06 13:06:10,033 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 14.53 sec
2015-07-06 13:06:11,040 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 14.53 sec
2015-07-06 13:06:12,046 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 14.53 sec
MapReduce Total cumulative CPU time: 14 seconds 530 msec
Ended Job = job_201507050950_0001
MapReduce Jobs Launched:
Job 0: Map: 2 Reduce: 1 Cumulative CPU: 14.53 sec HDFS Read: 51040494 HDFS Write: 537 SUCCESS
Total MapReduce CPU Time Spent: 14 seconds 530 msec
OK
1901 45.16831683168317
1902 21.659558263518658
1903 -17.67699115044248
1904 33.32224247948952
1905 43.3322664228014
1906 47.0834855681403
1907 28.09189090243456
1908 28.80607441154138
1909 25.24907112526539
1910 29.00013071895425
1911 28.088644112247575
1912 16.801145236855803
1913 8.191569568197396
1914 26.378301131816624
1915 2.811635615498914
1916 21.42393787117405
1917 22.895140080045742
1918 27.712506047411708
1919 23.67520250849229
1920 43.508667830133795
1921 31.834957020057306
1922 -44.03716409376787
1923 26.79247747159462
Time taken: 33.348 seconds, Fetched: 23 row(s)
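What this Hive job computes can be mirrored locally, as a sketch (my addition, illustrative only): group the tab-separated records by the first field (year) and average the fifth field (air), exactly what GROUP BY year with avg(air) does. The sample rows are hypothetical.

```python
# Local equivalent of: select year, avg(air) from ncdc group by year;
from collections import defaultdict

def avg_air_by_year(lines):
    """Group tab-separated records by field 0 (year) and average field 4 (air)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        year, air = fields[0], float(fields[4])
        sums[year] += air
        counts[year] += 1
    return {y: sums[y] / counts[y] for y in sums}

records = [                       # hypothetical rows in the table's layout
    "1901\t01\t01\t06\t45\t9\t9\t9\t9\t9\t9\t9",
    "1901\t01\t01\t13\t46\t9\t9\t9\t9\t9\t9\t9",
]
print(avg_air_by_year(records))   # {'1901': 45.5}
```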

select year,max(air) from ncdc group by year;
Result:

hive> select year,max(air) from ncdc group by year;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_201507050950_0002, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201507050950_0002
Kill Command = /opt/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201507050950_0002
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2015-07-06 13:07:28,648 Stage-1 map = 0%, reduce = 0%
2015-07-06 13:07:34,675 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 6.57 sec
2015-07-06 13:07:35,681 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 6.57 sec
2015-07-06 13:07:36,687 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 6.57 sec
2015-07-06 13:07:37,693 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 6.57 sec
2015-07-06 13:07:38,699 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 11.26 sec
2015-07-06 13:07:39,705 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 11.26 sec
2015-07-06 13:07:40,711 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 11.26 sec
2015-07-06 13:07:41,716 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 11.26 sec
2015-07-06 13:07:42,722 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 11.26 sec
2015-07-06 13:07:43,727 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 11.26 sec
2015-07-06 13:07:44,734 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 14.04 sec
2015-07-06 13:07:45,740 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 14.04 sec
2015-07-06 13:07:46,746 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 14.04 sec
MapReduce Total cumulative CPU time: 14 seconds 40 msec
Ended Job = job_201507050950_0002
MapReduce Jobs Launched:
Job 0: Map: 2 Reduce: 1 Cumulative CPU: 14.04 sec HDFS Read: 51040494 HDFS Write: 184 SUCCESS
Total MapReduce CPU Time Spent: 14 seconds 40 msec
OK
1901 94
1902 94
1903 94
1904 94
1905 94
1906 94
1907 94
1908 94
1909 94
1910 94
1911 94
1912 94
1913 94
1914 94
1915 94
1916 94
1917 94
1918 94
1919 94
1920 94
1921 94
1922 94
1923 94
Time taken: 26.002 seconds, Fetched: 23 row(s)

select count(*) from ncdc;

hive> select count(*) from ncdc;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_201507050950_0004, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201507050950_0004
Kill Command = /opt/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201507050950_0004
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2015-07-06 13:08:56,762 Stage-1 map = 0%, reduce = 0%
2015-07-06 13:09:03,803 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 6.26 sec
2015-07-06 13:09:04,809 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 10.21 sec
2015-07-06 13:09:05,816 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 10.21 sec
2015-07-06 13:09:06,825 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 10.21 sec
2015-07-06 13:09:07,831 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 10.21 sec
2015-07-06 13:09:08,838 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 10.21 sec
2015-07-06 13:09:09,844 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 10.21 sec
2015-07-06 13:09:10,850 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 10.21 sec
2015-07-06 13:09:11,857 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 10.21 sec
2015-07-06 13:09:12,866 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 13.14 sec
2015-07-06 13:09:13,872 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 13.14 sec
2015-07-06 13:09:14,878 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 13.14 sec
MapReduce Total cumulative CPU time: 13 seconds 140 msec
Ended Job = job_201507050950_0004
MapReduce Jobs Launched:
Job 0: Map: 2 Reduce: 1 Cumulative CPU: 13.14 sec HDFS Read: 51040494 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 13 seconds 140 msec
OK
1006038
Time taken: 28.984 seconds, Fetched: 1 row(s)

Original post: https://www.cnblogs.com/mrcharles/p/11879902.html