Apache Hadoop Log Aggregation Configuration: A Hands-On Example


　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　Author: Yin Zhengjie (尹正杰)

I. Preparation

1>. Set up a fully distributed cluster

　　Recommended reading:
　　　　https://www.cnblogs.com/yinzhengjie2020/p/12424192.html

2>. Configure the history server

　　Recommended reading:
　　　　https://www.cnblogs.com/yinzhengjie2020/p/12430965.html

II. Configuring log aggregation step by step

1>. What log aggregation does

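　　Recall the previous post on configuring the history server. Its screenshots showed that once a job finishes, its data is stored on the HDFS cluster. If you do not specify a log storage path, it ends up under the "/tmp" directory on HDFS (aggregated logs go to "/tmp/logs", the default value of yarn.nodemanager.remote-app-log-dir).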

　　In short, whether the job is Spark, Flink, or MapReduce, once the application (for example "application_1584002509171_0001") finishes, its run logs are uploaded to HDFS. That is what log aggregation means.

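　　With log aggregation in place, every Gateway host that can reach the HDFS cluster can fetch the logs. This makes it easy to inspect how a job ran, which helps both operations work and development debugging.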

　　Note:
　　　　Enabling log aggregation requires restarting the NodeManager, ResourceManager, and JobHistoryServer.
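
　　To make the idea concrete, here is a minimal sketch of where an application's aggregated logs land in HDFS. It assumes the stock Hadoop 2.x layout, <remote-app-log-dir>/<user>/<suffix>/<application id>, with the defaults "/tmp/logs" and "logs". The helper name aggregated_log_dir is hypothetical, and newer Hadoop releases may use a different layout.

```shell
# Hypothetical helper: build the HDFS directory that holds an application's
# aggregated logs, assuming the stock Hadoop 2.x layout
#   <remote-app-log-dir>/<user>/<suffix>/<application id>
aggregated_log_dir() {
    local user="$1" app_id="$2"
    local root="${3:-/tmp/logs}"   # default of yarn.nodemanager.remote-app-log-dir
    local suffix="${4:-logs}"      # default of yarn.nodemanager.remote-app-log-dir-suffix
    printf '%s/%s/%s/%s\n' "${root%/}" "$user" "$suffix" "$app_id"
}

aggregated_log_dir root application_1584002509171_0001
```

　　On a host with an HDFS client, passing that path to hdfs dfs -ls should list one log file per NodeManager that ran containers for the application.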

2>. Enable log aggregation

[root@hadoop101.yinzhengjie.org.cn ~]# vim ${HADOOP_HOME}/etc/hadoop/yarn-site.xml 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# cat ${HADOOP_HOME}/etc/hadoop/yarn-site.xml 
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<configuration>

<!-- Site specific YARN configuration properties -->

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>How reducers fetch map output data</description>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop106.yinzhengjie.org.cn</value>
        <description>Hostname of the YARN ResourceManager</description>
    </property>


    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
        <description>Whether log aggregation is enabled. The default is false (disabled); set it to true to turn log aggregation on.</description>
    </property>

    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
        <description>How long to keep aggregated logs before deleting them, in seconds. The default of "-1" disables deletion. Note that setting this too small will spam the NameNode.</description>
    </property>


    <property>
        <name>yarn.log-aggregation.retain-check-interval-seconds</name>
        <value>3600</value>
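        <description>How long to wait between aggregated log retention checks, in seconds. If set to 0 or a negative value, it is computed as one tenth of the aggregated log retention time. Note that setting this too small will spam the NameNode.</description>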
    </property>

</configuration>
[root@hadoop101.yinzhengjie.org.cn ~]#
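
　　The two retention values are easier to reason about in everyday units. The sketch below converts the retention period to days and applies the fallback rule quoted in the property description (a check interval of 0 or below becomes one tenth of the retention time). The function names are hypothetical and exist only for this illustration.

```shell
# Hypothetical helpers for reasoning about the retention settings.

# Convert a retention period in seconds to whole days.
retain_days() {
    echo $(( $1 / 86400 ))
}

# Effective check interval: a value of 0 or below falls back to one tenth
# of the retention time, as the property description states.
effective_check_interval() {
    local retain="$1" interval="$2"
    if [ "$interval" -le 0 ]; then
        echo $(( retain / 10 ))
    else
        echo "$interval"
    fi
}

retain_days 604800                      # 604800 s is 7 days
effective_check_interval 604800 3600    # an explicit positive value is used as-is
effective_check_interval 604800 -1      # falls back to 604800 / 10
```

　　With the values configured in this post, logs are kept for 7 days and the retention check runs once an hour.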

[root@hadoop101.yinzhengjie.org.cn ~]# rsync-hadoop.sh ${HADOOP_HOME}/etc/hadoop/yarn-site.xml
******* [hadoop102.yinzhengjie.org.cn] node starts synchronizing [/yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/yarn-site.xml] *******
Command executed successfully
******* [hadoop103.yinzhengjie.org.cn] node starts synchronizing [/yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/yarn-site.xml] *******
Command executed successfully
******* [hadoop104.yinzhengjie.org.cn] node starts synchronizing [/yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/yarn-site.xml] *******
Command executed successfully
******* [hadoop105.yinzhengjie.org.cn] node starts synchronizing [/yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/yarn-site.xml] *******
Command executed successfully
******* [hadoop106.yinzhengjie.org.cn] node starts synchronizing [/yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/yarn-site.xml] *******
Command executed successfully
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]#
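
　　rsync-hadoop.sh is the author's own helper script, and its contents are not shown in this post. As a rough idea of what such a script might do, the dry-run sketch below only prints one rsync command per target host. The function name, the hard-coded host range, and the rsync flags are all assumptions, not the real script.

```shell
# Hypothetical dry run: print (do not execute) one rsync command per host.
# The host range 102-106 mirrors the transcript; the real script may differ.
print_sync_commands() {
    local file="$1" n
    for n in 102 103 104 105 106; do
        printf 'rsync -a %s root@hadoop%s.yinzhengjie.org.cn:%s\n' "$file" "$n" "$file"
    done
}

print_sync_commands /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/yarn-site.xml
```

　　Printing the commands first is a cheap way to review which hosts and paths would be touched before anything is copied.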


3>. Restart the YARN and HistoryServer services

[root@hadoop101.yinzhengjie.org.cn ~]# ansible all -m shell -a 'jps'
hadoop102.yinzhengjie.org.cn | SUCCESS | rc=0 >>
8737 NodeManager
9460 Jps
8198 DataNode

hadoop104.yinzhengjie.org.cn | SUCCESS | rc=0 >>
8182 DataNode
8648 NodeManager
9372 Jps

hadoop105.yinzhengjie.org.cn | SUCCESS | rc=0 >>
8456 SecondaryNameNode
9406 Jps

hadoop103.yinzhengjie.org.cn | SUCCESS | rc=0 >>
8245 DataNode
8935 NodeManager
9751 Jps

hadoop101.yinzhengjie.org.cn | SUCCESS | rc=0 >>
15664 JobHistoryServer
16685 Jps
13214 NameNode

hadoop106.yinzhengjie.org.cn | SUCCESS | rc=0 >>
12427 ResourceManager
12893 JobHistoryServer
13438 Jps

[root@hadoop101.yinzhengjie.org.cn ~]#


[root@hadoop101.yinzhengjie.org.cn ~]# ansible rm -m shell -a 'stop-yarn.sh'
hadoop106.yinzhengjie.org.cn | SUCCESS | rc=0 >>
stopping yarn daemons
stopping resourcemanager
hadoop102.yinzhengjie.org.cn: stopping nodemanager
hadoop102.yinzhengjie.org.cn: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
hadoop104.yinzhengjie.org.cn: stopping nodemanager
hadoop104.yinzhengjie.org.cn: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
hadoop103.yinzhengjie.org.cn: stopping nodemanager
hadoop103.yinzhengjie.org.cn: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop

[root@hadoop101.yinzhengjie.org.cn ~]#


[root@hadoop101.yinzhengjie.org.cn ~]# ansible rm -m shell -a 'mr-jobhistory-daemon.sh stop historyserver'
hadoop106.yinzhengjie.org.cn | SUCCESS | rc=0 >>
stopping historyserver

[root@hadoop101.yinzhengjie.org.cn ~]#


[root@hadoop101.yinzhengjie.org.cn ~]# ansible nn -m shell -a 'mr-jobhistory-daemon.sh stop historyserver'
hadoop101.yinzhengjie.org.cn | SUCCESS | rc=0 >>
stopping historyserver

[root@hadoop101.yinzhengjie.org.cn ~]#


[root@hadoop101.yinzhengjie.org.cn ~]# ansible all -m shell -a 'jps'
hadoop101.yinzhengjie.org.cn | SUCCESS | rc=0 >>
17104 Jps
13214 NameNode

hadoop103.yinzhengjie.org.cn | SUCCESS | rc=0 >>
8245 DataNode
9918 Jps

hadoop104.yinzhengjie.org.cn | SUCCESS | rc=0 >>
8182 DataNode
9534 Jps

hadoop105.yinzhengjie.org.cn | SUCCESS | rc=0 >>
9543 Jps
8456 SecondaryNameNode

hadoop102.yinzhengjie.org.cn | SUCCESS | rc=0 >>
8198 DataNode
9623 Jps

hadoop106.yinzhengjie.org.cn | SUCCESS | rc=0 >>
13782 Jps

[root@hadoop101.yinzhengjie.org.cn ~]#


[root@hadoop101.yinzhengjie.org.cn ~]# ansible rm -m shell -a 'start-yarn.sh'
hadoop106.yinzhengjie.org.cn | SUCCESS | rc=0 >>
starting yarn daemons
starting resourcemanager, logging to /yinzhengjie/softwares/hadoop-2.10.0/logs/yarn-root-resourcemanager-hadoop106.yinzhengjie.org.cn.out
hadoop102.yinzhengjie.org.cn: starting nodemanager, logging to /yinzhengjie/softwares/hadoop-2.10.0/logs/yarn-root-nodemanager-hadoop102.yinzhengjie.org.cn.out
hadoop103.yinzhengjie.org.cn: starting nodemanager, logging to /yinzhengjie/softwares/hadoop-2.10.0/logs/yarn-root-nodemanager-hadoop103.yinzhengjie.org.cn.out
hadoop104.yinzhengjie.org.cn: starting nodemanager, logging to /yinzhengjie/softwares/hadoop-2.10.0/logs/yarn-root-nodemanager-hadoop104.yinzhengjie.org.cn.out

[root@hadoop101.yinzhengjie.org.cn ~]#


[root@hadoop101.yinzhengjie.org.cn ~]# ansible nn -m shell -a 'mr-jobhistory-daemon.sh start historyserver'
hadoop101.yinzhengjie.org.cn | SUCCESS | rc=0 >>
starting historyserver, logging to /yinzhengjie/softwares/hadoop-2.10.0/logs/mapred-root-historyserver-hadoop101.yinzhengjie.org.cn.out

[root@hadoop101.yinzhengjie.org.cn ~]#


[root@hadoop101.yinzhengjie.org.cn ~]# ansible rm -m shell -a 'mr-jobhistory-daemon.sh start historyserver'
hadoop106.yinzhengjie.org.cn | SUCCESS | rc=0 >>
starting historyserver, logging to /yinzhengjie/softwares/hadoop-2.10.0/logs/mapred-root-historyserver-hadoop106.yinzhengjie.org.cn.out

[root@hadoop101.yinzhengjie.org.cn ~]#


[root@hadoop101.yinzhengjie.org.cn ~]# ansible all -m shell -a 'jps'
hadoop101.yinzhengjie.org.cn | SUCCESS | rc=0 >>
17393 JobHistoryServer
17590 Jps
13214 NameNode

hadoop103.yinzhengjie.org.cn | SUCCESS | rc=0 >>
9969 NodeManager
8245 DataNode
10221 Jps

hadoop105.yinzhengjie.org.cn | SUCCESS | rc=0 >>
9670 Jps
8456 SecondaryNameNode

hadoop104.yinzhengjie.org.cn | SUCCESS | rc=0 >>
9584 NodeManager
8182 DataNode
9836 Jps

hadoop102.yinzhengjie.org.cn | SUCCESS | rc=0 >>
8198 DataNode
9926 Jps
9672 NodeManager

hadoop106.yinzhengjie.org.cn | SUCCESS | rc=0 >>
13889 ResourceManager
14285 JobHistoryServer
14398 Jps

[root@hadoop101.yinzhengjie.org.cn ~]#
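
　　Reading jps output by eye works for six hosts, but the check is easy to script. The sketch below is a hypothetical helper (has_daemons is not part of Hadoop) that reads jps output on stdin and succeeds only if every named daemon is present. The sample input is copied from the hadoop106 section of the transcript above.

```shell
# Hypothetical helper: succeed only if every named daemon appears in the
# jps output read from stdin (one "<pid> <MainClass>" pair per line).
has_daemons() {
    local out d
    out="$(cat)"
    for d in "$@"; do
        printf '%s\n' "$out" |
            awk -v name="$d" '$2 == name { found = 1 } END { exit !found }' || return 1
    done
}

# Sample input copied from the hadoop106 section of the transcript.
sample='13889 ResourceManager
14285 JobHistoryServer
14398 Jps'

if printf '%s\n' "$sample" | has_daemons ResourceManager JobHistoryServer; then
    echo "hadoop106: expected daemons are running"
fi
```

　　On a live node the same check would read real output, for example jps | has_daemons NodeManager DataNode on a worker.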


4>. Run the wordcount example

[root@hadoop101.yinzhengjie.org.cn ~]# hdfs dfs -ls /
Found 4 items
drwxr-xr-x   - root supergroup          0 2020-03-12 16:20 /inputDir
drwxr-xr-x   - root supergroup          0 2020-03-12 16:54 /outputDir
drwxrwx---   - root supergroup          0 2020-03-12 15:40 /tmp
drwxrwx---   - root supergroup          0 2020-03-12 16:51 /yinzhengjie
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hdfs dfs -rm -r /outputDir
Deleted /outputDir
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hdfs dfs -ls /
Found 3 items
drwxr-xr-x   - root supergroup          0 2020-03-12 16:20 /inputDir
drwxrwx---   - root supergroup          0 2020-03-12 15:40 /tmp
drwxrwx---   - root supergroup          0 2020-03-12 16:51 /yinzhengjie
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]#

　　Note: hdfs dfs -rm -r /outputDir deletes the old output directory. A MapReduce job fails at submission if its output directory already exists.

[root@hadoop101.yinzhengjie.org.cn ~]# hdfs dfs -ls /
Found 3 items
drwxr-xr-x   - root supergroup          0 2020-03-12 16:20 /inputDir
drwxrwx---   - root supergroup          0 2020-03-12 15:40 /tmp
drwxrwx---   - root supergroup          0 2020-03-12 16:51 /yinzhengjie
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hdfs dfs -ls /inputDir
Found 1 items
-rw-r--r--   3 root supergroup         60 2020-03-12 16:20 /inputDir/wc.txt
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hdfs dfs -ls /inputDir/wc.txt
-rw-r--r--   3 root supergroup         60 2020-03-12 16:20 /inputDir/wc.txt
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hdfs dfs -cat /inputDir/wc.txt
yinzhengjie 18 bigdata
bigdata java python
java golang java
[root@hadoop101.yinzhengjie.org.cn ~]#

　　Note: hdfs dfs -cat /inputDir/wc.txt prints the test data that the job reads as input.
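
　　Before submitting the job, it can help to know what result to expect. The sketch below reproduces the word count locally from the three lines of wc.txt shown above, using only standard text tools. The function name wordcount_local is hypothetical, and this is a local approximation of the example job, not how MapReduce computes it.

```shell
# Count words locally: one word per line, sort in byte order, count runs,
# then print "<word><TAB><count>".
wordcount_local() {
    tr -s ' ' '\n' | LC_ALL=C sort | uniq -c | awk '{ printf "%s\t%s\n", $2, $1 }'
}

printf '%s\n' \
    'yinzhengjie 18 bigdata' \
    'bigdata java python' \
    'java golang java' | wordcount_local
```

　　This yields six distinct words (18, bigdata, golang, java, python, yinzhengjie), with bigdata counted twice and java three times, which agrees with the "Reduce output records=6" counter in the job output that follows.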

[root@hadoop101.yinzhengjie.org.cn ~]# hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.0.jar wordcount /inputDir /outputDir
20/03/12 19:20:38 INFO client.RMProxy: Connecting to ResourceManager at hadoop106.yinzhengjie.org.cn/172.200.4.106:8032
20/03/12 19:20:39 INFO input.FileInputFormat: Total input files to process : 1
20/03/12 19:20:39 INFO mapreduce.JobSubmitter: number of splits:1
20/03/12 19:20:39 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
20/03/12 19:20:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1584011863930_0001
20/03/12 19:20:39 INFO conf.Configuration: resource-types.xml not found
20/03/12 19:20:39 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
20/03/12 19:20:39 INFO resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
20/03/12 19:20:39 INFO resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
20/03/12 19:20:39 INFO impl.YarnClientImpl: Submitted application application_1584011863930_0001
20/03/12 19:20:39 INFO mapreduce.Job: The url to track the job: http://hadoop106.yinzhengjie.org.cn:8088/proxy/application_1584011863930_0001/
20/03/12 19:20:39 INFO mapreduce.Job: Running job: job_1584011863930_0001
20/03/12 19:20:47 INFO mapreduce.Job: Job job_1584011863930_0001 running in uber mode : false
20/03/12 19:20:47 INFO mapreduce.Job:  map 0% reduce 0%
20/03/12 19:20:52 INFO mapreduce.Job:  map 100% reduce 0%
20/03/12 19:20:57 INFO mapreduce.Job:  map 100% reduce 100%
20/03/12 19:20:57 INFO mapreduce.Job: Job job_1584011863930_0001 completed successfully
20/03/12 19:20:57 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=84
        FILE: Number of bytes written=411077
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=181
        HDFS: Number of bytes written=54
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=2745
        Total time spent by all reduces in occupied slots (ms)=2295
        Total time spent by all map tasks (ms)=2745
        Total time spent by all reduce tasks (ms)=2295
        Total vcore-milliseconds taken by all map tasks=2745
        Total vcore-milliseconds taken by all reduce tasks=2295
        Total megabyte-milliseconds taken by all map tasks=2810880
        Total megabyte-milliseconds taken by all reduce tasks=2350080
    Map-Reduce Framework
        Map input records=3
        Map output records=9
        Map output bytes=96
        Map output materialized bytes=84
        Input split bytes=121
        Combine input records=9
        Combine output records=6
        Reduce input groups=6
        Reduce shuffle bytes=84
        Reduce input records=6
        Reduce output records=6
        Spilled Records=12
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=159
        CPU time spent (ms)=1020
        Physical memory (bytes) snapshot=501035008
        Virtual memory (bytes) snapshot=4323725312
        Total committed heap usage (bytes)=290455552
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=60
    File Output Format Counters 
        Bytes Written=54
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hdfs dfs -ls /
Found 4 items
drwxr-xr-x   - root supergroup          0 2020-03-12 19:20 /inputDir
drwxr-xr-x   - root supergroup          0 2020-03-12 19:20 /outputDir
drwxrwx---   - root supergroup          0 2020-03-12 19:20 /tmp
drwxrwx---   - root supergroup          0 2020-03-12 16:51 /yinzhengjie
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]#
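
　　The job output above prints both a job ID (job_1584011863930_0001) and an application ID (application_1584011863930_0001). The two differ only in their prefix, and the yarn logs command expects the application ID. The following sketch does the conversion; job_to_app_id is a hypothetical helper name.

```shell
# Map a MapReduce job ID to its YARN application ID: both share the cluster
# timestamp and sequence number and differ only in the prefix.
job_to_app_id() {
    case "$1" in
        job_*) printf 'application_%s\n' "${1#job_}" ;;
        *)     echo "not a job ID: $1" >&2; return 1 ;;
    esac
}

app_id="$(job_to_app_id job_1584011863930_0001)"
echo "$app_id"

# On a cluster node with log aggregation enabled, the aggregated logs can
# then be printed with:
#   yarn logs -applicationId "$app_id"
```

　　The yarn logs call is left as a comment so that the snippet runs anywhere. It is the command-line counterpart of browsing the logs in the WebUI, as the next steps do.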


5>. Click "log" to view the logs

6>. View the log details in the WebUI as shown in the figure below; click "here" to see the full log

7>. View the full log