[Original] Some tips for remote cluster access and management, and for checking jobs and files

  Honestly, this is the kind of thing a lazy person ends up figuring out. If you are willing to go to the lab or the machine room in winter instead of staying in your dorm or at home, there is no remote-access problem at all. But there are always inconvenient moments, and then a remote command line is all you have for seeing and doing everything. The first step of working remotely is configuring SSH access to the cluster:

Remote access to the cluster via SSH

  There are two prerequisites:

  1. At least one machine in the cluster needs port forwarding configured on the public-facing router
  2. The machine behind that port forwarding needs the SSH service running; SSH listens on port 22 (a quick check is sketched below this list)
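
  A quick way to verify the second prerequisite on the forwarded machine, i.e. that sshd is up and listening on 22 (the systemd unit name varies by distribution, so treat this as a sketch):

    # check that the SSH daemon is running (the unit may be named "ssh" or "sshd")
    systemctl status sshd
    # check that something is listening on TCP port 22
    ss -ltn | grep ':22'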

  For example, I usually set up the port forwarding for master (the NameNode/JobTracker). On the router the rule looks like: external port: aa, IP: masterIP, internal port: 22. After that, from anywhere you can reach the cluster with something like: ssh -p aa hadoop@<public IP>, where hadoop is the user name on master. In other words, requests arriving at the router's public IP on port aa are forwarded to master's port 22, and you log in as the user hadoop.
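
  To avoid retyping the port and user every time, you can also drop an entry into ~/.ssh/config on your personal machine. A minimal sketch, where PUBLIC_IP is a placeholder for the router's public IP and aa stands for the actual forwarded port number:

    # ~/.ssh/config on your personal machine (values are placeholders)
    Host cluster
        HostName PUBLIC_IP    # the router's public IP address
        Port     aa           # replace with the real external port forwarded to master:22
        User     hadoop

  After this, ssh cluster is all you need to type.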

  Of course, if you have ever set up a Hadoop cluster, you already know how to configure passwordless SSH login. Append the public key of the personal computer you usually work from to ~/.ssh/authorized_keys on master; after that a single command logs you into the cluster without typing any password, and once you are on master, master already has passwordless login to the other cluster machines.
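
  A minimal sketch of that key setup, assuming the cluster alias from the config snippet above (ssh-copy-id simply appends your public key to the remote ~/.ssh/authorized_keys):

    # on your personal machine
    ssh-keygen -t rsa     # generate a key pair if you do not have one yet
    ssh-copy-id cluster   # append ~/.ssh/id_rsa.pub to master's ~/.ssh/authorized_keys
    ssh cluster           # should now log in without asking for a password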

Some handy tools for cluster management

  1. pdsh
    This one is a real gem: from a single machine it runs a shell command across all the machines in the cluster.
    $ pdsh -h
    Usage: pdsh [-options] command ...
    -S                return largest of remote command return values
    -h                output usage menu and quit
    -V                output version information and quit
    -q                list the option settings and quit
    -b                disable ^C status feature (batch mode)
    -d                enable extra debug information from ^C status
    -l user           execute remote commands as user
    -t seconds        set connect timeout (default is 10 sec)
    -u seconds        set command timeout (no default)
    -f n              use fanout of n nodes
    -w host,host,...  set target node list on command line
    -x host,host,...  set node exclusion list on command line
    -R name           set rcmd module to name
    -M name,...       select one or more misc modules to initialize first
    -N                disable hostname: labels on output lines
    -L                list info on all loaded modules and exit
    -g query,...      target nodes using genders query
    -X query,...      exclude nodes using genders query
    -F file           use alternate genders file `file'
    -i                request alternate or canonical hostnames if applicable
    -a                target all nodes except those with "pdsh_all_skip" attribute
    -A                target all nodes listed in genders database
    available rcmd modules: ssh,rsh,exec (default: rsh)
    pdsh -w ssh:brix-[00-09],lbt,gbt uptime

    The command above runs uptime on every machine from brix-00 through brix-09 plus lbt and gbt, and prints the results on the current machine. Note that I have aliased pdsh here; with a stock setup (the default rcmd module is rsh) running it would normally fail, and redefining pdsh as below fixes that:

    alias pdsh='PDSH_RCMD_TYPE=ssh pdsh'

    The output then looks like this:

    gbt:  17:33:21 up  2:31,  1 user,  load average: 0.00, 0.01, 0.05
    lbt:  17:33:18 up  2:27,  2 users,  load average: 0.00, 0.02, 0.05
    brix-02:  17:33:21 up  2:31,  0 users,  load average: 0.00, 0.01, 0.05
    brix-01:  17:33:21 up  2:31,  0 users,  load average: 0.03, 0.02, 0.05
    brix-00:  17:33:21 up  2:33,  4 users,  load average: 0.08, 0.05, 0.09
    brix-03:  17:33:20 up  2:31,  0 users,  load average: 0.00, 0.01, 0.05
    brix-04:  17:33:21 up  2:31,  0 users,  load average: 0.01, 0.04, 0.05
    brix-08:  17:33:21 up  2:31,  0 users,  load average: 0.04, 0.06, 0.05
    brix-09:  17:33:20 up  2:31,  0 users,  load average: 0.10, 0.06, 0.06
    brix-07:  17:33:21 up  2:31,  0 users,  load average: 0.03, 0.06, 0.05
    brix-05:  17:33:21 up  2:31,  0 users,  load average: 0.08, 0.04, 0.05
    brix-06:  17:33:21 up  2:31,  0 users,  load average: 0.05, 0.04, 0.05
  2. pdsh + scp
    pdsh -w ssh:brix-[00-09],lbt,gbt scp brix-00:~/HadoopInstall/test.txt ~/HadoopInstall/

    The example above copies test.txt from brix-00 to every listed machine (brix-00 through brix-09, lbt, and gbt): each target node runs scp and pulls the file from brix-00 into its own ~/HadoopInstall/ directory.
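
    For pushing a file out to many nodes, the pdsh package also typically ships a companion copy tool, pdcp; a sketch, assuming it is installed both locally and on the target nodes (pdcp needs its binary on the remote side as well):

    # push a local file from the current machine to the same path on every node
    pdcp -R ssh -w brix-[00-09],lbt,gbt ~/HadoopInstall/test.txt ~/HadoopInstall/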

  3. scp
    The example above already showed how scp can be combined with pdsh, so I will not go into detail here; scp's usage summary is pasted below, followed by a few common invocations.
    usage: scp [-12346BCpqrv] [-c cipher] [-F ssh_config] [-i identity_file]
               [-l limit] [-o ssh_option] [-P port] [-S program]
               [[user@]host1:]file1 ... [[user@]host2:]file2
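
    A few typical invocations for reference (the hosts and paths below are made-up placeholders):

    # copy a local file into hadoop's home directory on master
    scp ~/HadoopInstall/test.txt hadoop@master:~/
    # copy a remote directory recursively to the current machine
    scp -r hadoop@brix-00:~/HadoopInstall/logs ./logs
    # go through the router's forwarded port from outside (note: scp uses uppercase -P)
    scp -P aa ~/job.jar hadoop@PUBLIC_IP:~/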

Checking the health and block distribution of an HDFS file from the command line

$ hadoop fsck /ftTest/totalWiki  -files -blocks -locations
Warning: $HADOOP_HOME is deprecated.

FSCK started by hadoop from /192.168.1.230 for path /ftTest/totalWiki at Wed Nov 18 17:42:27 CST 2015
/ftTest/totalWiki 3259108351 bytes, 25 block(s):  OK
0. blk_-3539743872639772968_1003 len=134217728 repl=3 [192.168.1.66:50010, 192.168.1.63:50010, 192.168.1.235:50010]
1. blk_-7700661535252568451_1003 len=134217728 repl=3 [192.168.1.231:50010, 192.168.1.232:50010, 192.168.1.238:50010]
2. blk_-3214646852454192434_1003 len=134217728 repl=3 [192.168.1.237:50010, 192.168.1.236:50010, 192.168.1.238:50010]
3. blk_-8860437510624268282_1003 len=134217728 repl=3 [192.168.1.63:50010, 192.168.1.239:50010, 192.168.1.235:50010]
4. blk_-1765246693355320434_1003 len=134217728 repl=3 [192.168.1.239:50010, 192.168.1.66:50010, 192.168.1.232:50010]
5. blk_9063781070378080202_1003 len=134217728 repl=3 [192.168.1.238:50010, 192.168.1.66:50010, 192.168.1.234:50010]
6. blk_8687961040692226467_1003 len=134217728 repl=3 [192.168.1.234:50010, 192.168.1.237:50010, 192.168.1.239:50010]
7. blk_-5717347662754027031_1003 len=134217728 repl=3 [192.168.1.236:50010, 192.168.1.232:50010, 192.168.1.63:50010]
8. blk_-5624359065285533759_1003 len=134217728 repl=3 [192.168.1.238:50010, 192.168.1.66:50010, 192.168.1.231:50010]
9. blk_622948206607478459_1003 len=134217728 repl=3 [192.168.1.66:50010, 192.168.1.63:50010, 192.168.1.236:50010]
10. blk_-4154428280295153090_1003 len=134217728 repl=3 [192.168.1.232:50010, 192.168.1.235:50010, 192.168.1.63:50010]
11. blk_6638201995439663469_1003 len=134217728 repl=3 [192.168.1.238:50010, 192.168.1.63:50010, 192.168.1.237:50010]
12. blk_-3282418422086241856_1003 len=134217728 repl=3 [192.168.1.238:50010, 192.168.1.66:50010, 192.168.1.233:50010]
13. blk_2802846523093904336_1003 len=134217728 repl=3 [192.168.1.66:50010, 192.168.1.239:50010, 192.168.1.237:50010]
14. blk_-7425405918846384842_1003 len=134217728 repl=3 [192.168.1.239:50010, 192.168.1.66:50010, 192.168.1.234:50010]
15. blk_-8997936298966969491_1003 len=134217728 repl=3 [192.168.1.237:50010, 192.168.1.235:50010, 192.168.1.238:50010]
16. blk_-827035362476515573_1003 len=134217728 repl=3 [192.168.1.239:50010, 192.168.1.63:50010, 192.168.1.235:50010]
17. blk_-5734389503841877028_1003 len=134217728 repl=3 [192.168.1.231:50010, 192.168.1.235:50010, 192.168.1.66:50010]
18. blk_1446125973144404377_1003 len=134217728 repl=3 [192.168.1.66:50010, 192.168.1.238:50010, 192.168.1.235:50010]
19. blk_-7161959344923757995_1003 len=134217728 repl=3 [192.168.1.66:50010, 192.168.1.238:50010, 192.168.1.234:50010]
20. blk_-2171786920309180709_1003 len=134217728 repl=3 [192.168.1.63:50010, 192.168.1.66:50010, 192.168.1.237:50010]
21. blk_7184760167274632839_1003 len=134217728 repl=3 [192.168.1.238:50010, 192.168.1.66:50010, 192.168.1.233:50010]
22. blk_1315507788295151463_1003 len=134217728 repl=3 [192.168.1.63:50010, 192.168.1.239:50010, 192.168.1.233:50010]
23. blk_5923416026032542888_1003 len=134217728 repl=3 [192.168.1.238:50010, 192.168.1.239:50010, 192.168.1.236:50010]
24. blk_-8960096699099874150_1003 len=37882879 repl=3 [192.168.1.234:50010, 192.168.1.233:50010, 192.168.1.63:50010]

Status: HEALTHY
 Total size:    3259108351 B
 Total dirs:    0
 Total files:    1
 Total blocks (validated):    25 (avg. block size 130364334 B)
 Minimally replicated blocks:    25 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    3.0
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        11
 Number of racks:        2
FSCK ended at Wed Nov 18 17:42:27 CST 2015 in 3 milliseconds


The filesystem under path '/ftTest/totalWiki' is HEALTHY

The output above shows the file's block distribution along with some health statistics. To also see which rack each block replica sits on, add -racks like this:

$ hadoop fsck /ftTest/totalWiki  -files -blocks -locations -racks
Warning: $HADOOP_HOME is deprecated.

FSCK started by hadoop from /192.168.1.230 for path /ftTest/totalWiki at Wed Nov 18 17:43:08 CST 2015
/ftTest/totalWiki 3259108351 bytes, 25 block(s):  OK
0. blk_-3539743872639772968_1003 len=134217728 repl=3 [/rack2/192.168.1.66:50010, /rack2/192.168.1.63:50010, /rack1/192.168.1.235:50010]
1. blk_-7700661535252568451_1003 len=134217728 repl=3 [/rack1/192.168.1.231:50010, /rack1/192.168.1.232:50010, /rack2/192.168.1.238:50010]
2. blk_-3214646852454192434_1003 len=134217728 repl=3 [/rack1/192.168.1.237:50010, /rack1/192.168.1.236:50010, /rack2/192.168.1.238:50010]
3. blk_-8860437510624268282_1003 len=134217728 repl=3 [/rack2/192.168.1.63:50010, /rack2/192.168.1.239:50010, /rack1/192.168.1.235:50010]
4. blk_-1765246693355320434_1003 len=134217728 repl=3 [/rack2/192.168.1.239:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.232:50010]
5. blk_9063781070378080202_1003 len=134217728 repl=3 [/rack2/192.168.1.238:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.234:50010]
6. blk_8687961040692226467_1003 len=134217728 repl=3 [/rack1/192.168.1.234:50010, /rack1/192.168.1.237:50010, /rack2/192.168.1.239:50010]
7. blk_-5717347662754027031_1003 len=134217728 repl=3 [/rack1/192.168.1.236:50010, /rack1/192.168.1.232:50010, /rack2/192.168.1.63:50010]
8. blk_-5624359065285533759_1003 len=134217728 repl=3 [/rack2/192.168.1.238:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.231:50010]
9. blk_622948206607478459_1003 len=134217728 repl=3 [/rack2/192.168.1.66:50010, /rack2/192.168.1.63:50010, /rack1/192.168.1.236:50010]
10. blk_-4154428280295153090_1003 len=134217728 repl=3 [/rack1/192.168.1.232:50010, /rack1/192.168.1.235:50010, /rack2/192.168.1.63:50010]
11. blk_6638201995439663469_1003 len=134217728 repl=3 [/rack2/192.168.1.238:50010, /rack2/192.168.1.63:50010, /rack1/192.168.1.237:50010]
12. blk_-3282418422086241856_1003 len=134217728 repl=3 [/rack2/192.168.1.238:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.233:50010]
13. blk_2802846523093904336_1003 len=134217728 repl=3 [/rack2/192.168.1.66:50010, /rack2/192.168.1.239:50010, /rack1/192.168.1.237:50010]
14. blk_-7425405918846384842_1003 len=134217728 repl=3 [/rack2/192.168.1.239:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.234:50010]
15. blk_-8997936298966969491_1003 len=134217728 repl=3 [/rack1/192.168.1.237:50010, /rack1/192.168.1.235:50010, /rack2/192.168.1.238:50010]
16. blk_-827035362476515573_1003 len=134217728 repl=3 [/rack2/192.168.1.239:50010, /rack2/192.168.1.63:50010, /rack1/192.168.1.235:50010]
17. blk_-5734389503841877028_1003 len=134217728 repl=3 [/rack1/192.168.1.231:50010, /rack1/192.168.1.235:50010, /rack2/192.168.1.66:50010]
18. blk_1446125973144404377_1003 len=134217728 repl=3 [/rack2/192.168.1.66:50010, /rack2/192.168.1.238:50010, /rack1/192.168.1.235:50010]
19. blk_-7161959344923757995_1003 len=134217728 repl=3 [/rack2/192.168.1.66:50010, /rack2/192.168.1.238:50010, /rack1/192.168.1.234:50010]
20. blk_-2171786920309180709_1003 len=134217728 repl=3 [/rack2/192.168.1.63:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.237:50010]
21. blk_7184760167274632839_1003 len=134217728 repl=3 [/rack2/192.168.1.238:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.233:50010]
22. blk_1315507788295151463_1003 len=134217728 repl=3 [/rack2/192.168.1.63:50010, /rack2/192.168.1.239:50010, /rack1/192.168.1.233:50010]
23. blk_5923416026032542888_1003 len=134217728 repl=3 [/rack2/192.168.1.238:50010, /rack2/192.168.1.239:50010, /rack1/192.168.1.236:50010]
24. blk_-8960096699099874150_1003 len=37882879 repl=3 [/rack1/192.168.1.234:50010, /rack1/192.168.1.233:50010, /rack2/192.168.1.63:50010]
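
fsck accepts directories as well as single files, so the same switches work for a whole tree or even the entire namespace; a sketch (a full-namespace scan can be slow on a large cluster):

# summary only, for an entire directory tree
hadoop fsck /ftTest
# full block report for everything in HDFS, paged through less
hadoop fsck / -files -blocks -locations | less
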
Original post: https://www.cnblogs.com/gslyyq/p/4975313.html