HDP cluster troubleshooting notes

2019-04-23 14:16:21,769 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(359)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journal/hnscluster/current/edits_inprogress_0000000000554042931 while determining its valid length. Position was 815104
java.io.IOException: Can't scan a pre-transactional edit log.
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:4974)
at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:245)
at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:355)
at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:551)

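Symptom: the JournalNode writes the WARN above to its log, and Ambari raises an alert that the JournalNode web UI is not accessible. The exception suggests that the edits_inprogress file on that JournalNode is damaged.
Fix:
On the affected node, move the JournalNode edits directory (/hadoop/hdfs/journal/hnscluster/current) to a backup location.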
Copy the edits directory (/hadoop/hdfs/journal/hnscluster/current) from a healthy JournalNode to this node.
Start the JournalNode, or restart HDFS.
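
The move-and-copy steps above can be sketched as two small shell functions. This is only a sketch under stated assumptions: it is run as root on the affected host with the JournalNode stopped, the healthy host name is supplied by you, and the hdfs:hadoop owner is the usual HDP default rather than something confirmed for this cluster. The function names are made up for illustration.

```shell
# Move the damaged edits directory aside (rather than deleting it)
# and print the backup location.
backup_journal_dir() {
    backup_dir="$1.bak.$(date +%Y%m%d%H%M%S)"
    mv "$1" "$backup_dir" && echo "$backup_dir"
}

# Copy the edits directory from a healthy JournalNode into the same
# parent directory on this node, preserving modes and timestamps.
# $1 = healthy JournalNode host (supplied by the operator)
# $2 = path of the edits directory, identical on both hosts
copy_journal_dir() {
    scp -rp "$1:$2" "$(dirname "$2")/" &&
        chown -R hdfs:hadoop "$2"   # assumed owner/group; check the healthy node
}
```

A possible invocation, with jn-healthy.example.com as a placeholder host name:
backup_journal_dir /hadoop/hdfs/journal/hnscluster/current
copy_journal_dir jn-healthy.example.com /hadoop/hdfs/journal/hnscluster/current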


Under-replicated blocks
Fix:
List the files that have under-replicated blocks:
hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' > /tmp/under_replicated_files
Then set the replication factor of each file in a loop:
while IFS= read -r hdfsfile; do echo "Fixing $hdfsfile :" ; hadoop fs -setrep 3 "$hdfsfile"; done < /tmp/under_replicated_files
The output looks like this:
Fixing /user/hdfs/.staging/job_1547173493660_0405/job.jar :
Replication 3 set: /user/hdfs/.staging/job_1547173493660_0405/job.jar
Fixing /user/hdfs/.staging/job_1547173493660_0405/job.split :
Replication 3 set: /user/hdfs/.staging/job_1547173493660_0405/job.split
Fixing /user/hdfs/.staging/job_1547173493660_0481/job.jar :
Replication 3 set: /user/hdfs/.staging/job_1547173493660_0481/job.jar
Fixing /user/hdfs/.staging/job_1547173493660_0481/job.split :
Replication 3 set: /user/hdfs/.staging/job_1547173493660_0481/job.split
Fixing /user/hdfs/.staging/job_1547173493660_0483/job.jar :
Replication 3 set: /user/hdfs/.staging/job_1547173493660_0483/job.jar
Fixing /user/hdfs/.staging/job_1547173493660_0483/job.split :
Replication 3 set: /user/hdfs/.staging/job_1547173493660_0483/job.split
Fixing /user/hdfs/.staging/job_1547197402450_0021/job.jar :
Replication 3 set: /user/hdfs/.staging/job_1547197402450_0021/job.jar
Fixing /user/hdfs/.staging/job_1547197402450_0021/job.split :
Replication 3 set: /user/hdfs/.staging/job_1547197402450_0021/job.split
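
The two commands above can also be chained without the temporary file. The following is a minimal sketch, assuming fsck prints one "Under replicated" line per affected block, so a file with several such blocks would otherwise be listed more than once. The function names and the REPLICATION and HADOOP_CMD variables are illustrative and not part of Hadoop.

```shell
# Read `hdfs fsck /` output on stdin and print each affected path once.
under_replicated_paths() {
    grep 'Under replicated' | awk -F':' '{print $1}' | sort -u
}

# Set the replication factor for every path read from stdin.
# REPLICATION defaults to 3; HADOOP_CMD defaults to the hadoop binary.
fix_replication() {
    while IFS= read -r hdfsfile; do
        echo "Fixing $hdfsfile :"
        "${HADOOP_CMD:-hadoop}" fs -setrep "${REPLICATION:-3}" "$hdfsfile"
    done
}
```

Usage:
hdfs fsck / | under_replicated_paths | fix_replication
Running hdfs fsck / again afterwards shows whether any under-replicated blocks remain. Reading the paths line by line also keeps file names that contain spaces intact.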

Original article: https://www.cnblogs.com/shanhua-fu/p/10914269.html