HDFS Data Balancing

Author: 尹正杰

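I. Overview of HDFS Data Balancing

Over time, the data stored in HDFS can become unevenly distributed, with some DataNodes holding more blocks than others. In extreme cases, the nodes holding more data are read from and written to too frequently, while the nodes holding less data are underutilized.

Adding new nodes to the cluster also unbalances the HDFS data distribution. Hadoop does not automatically move existing data to the new nodes to even out the distribution across the cluster's DataNodes; it simply starts using the new DataNodes to store new data.

Hadoop does not aim for a perfectly balanced cluster, a state that is hard to reach in a cluster with a continuous flow of data. Instead, Hadoop considers the cluster balanced when the space utilization of every DataNode differs from the overall space utilization of the cluster by less than a certain percentage. That percentage is a configurable threshold, which gives data balancing some flexibility.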
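The balance criterion described above can be written as a one-line comparison. The following is a minimal sketch, not Hadoop code: the helper name is_balanced and its argument order are made up for illustration, and treating a node that sits exactly on the threshold as balanced is an assumption.

```shell
# is_balanced NODE_PCT CLUSTER_PCT THRESHOLD
# Exit status 0 when the node is within THRESHOLD percentage points of the
# cluster-wide utilization, 1 otherwise. (Hypothetical helper, not part of Hadoop.)
is_balanced() {
  awk -v node="$1" -v cluster="$2" -v threshold="$3" 'BEGIN {
    diff = node - cluster
    if (diff < 0) diff = -diff
    # Assumption: a node sitting exactly on the threshold counts as balanced.
    if (diff <= threshold) exit 0
    exit 1
  }'
}

# Example: cluster at 40% overall, threshold of 10 percentage points.
is_balanced 45 40 10 && echo "45% node: balanced"
is_balanced 55 40 10 || echo "55% node: needs rebalancing"
```

awk is used only because the shell itself has no floating-point arithmetic; the utilization figures would normally come from the cluster report.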

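Hadoop provides a useful tool, the balancer, for rebalancing the block distribution of the cluster so that all DataNodes end up storing roughly the same amount of data.

Tip:
    Running the HDFS balancer periodically on the cluster is good practice.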

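II. Causes of HDFS Data Imbalance

HDFS does not guarantee that data is distributed evenly across the DataNodes in a cluster. For example, when a new node is added to the cluster, all new blocks may be allocated to that node, which leaves the data distribution unbalanced.

When the NameNode allocates blocks to DataNodes, it applies the following criteria to decide which DataNodes receive the new blocks:
    (1) Distribute data uniformly across the DataNodes in the cluster;
    (2) Keep one replica of the block on the node that is writing it;
    (3) Place one replica on the same rack as the node writing the block, to minimize cross-rack network I/O;
    (4) Spread replicas across racks, for redundancy and so that the cluster keeps running after the loss of an entire rack.

When the percentage of space used on a given DataNode is only slightly above or below the average percentage of space used by the DataNodes in the cluster, Hadoop considers the cluster balanced. How much counts as "slightly above or below" is defined by a threshold parameter.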

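III. Running the Balancer to Balance HDFS Data

1>. How the HDFS balancer works

The HDFS balancer is a tool shipped with Hadoop that moves blocks from over-utilized DataNodes to under-utilized DataNodes, thereby balancing the data across the cluster's DataNodes.

The figure below illustrates how the HDFS balancer works. Initially, Rack 1 and Rack 2 hold data blocks. The new rack (Rack 3) holds no data, and only newly added data is placed there.

This means that adding nodes leaves the cluster unbalanced. Existing data has to be moved from the existing DataNodes to the new, empty DataNodes (or new data has to be written directly to the new nodes as it arrives).

When the balancer runs, Hadoop moves blocks from their current locations to nodes with more free space, until all nodes end up with roughly the same space utilization.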
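To make the idea of "roughly the same space utilization" concrete, here is a small sketch that computes how much data each node would have to give up or take in to land exactly on the cluster average. It only illustrates the arithmetic: the function name rebalance_plan, the input format, and the node names are made up, and the real balancer works iteratively under per-iteration limits and placement rules rather than from a single plan like this.

```shell
# rebalance_plan: reads "NAME CAPACITY USED" lines (same unit for both numbers,
# capacities assumed to be positive) and prints, per node: name, current
# utilization, and the amount of data it would have to shed (+) or receive (-)
# to sit exactly at the cluster-wide average utilization.
rebalance_plan() {
  awk '
    { name[NR] = $1; cap[NR] = $2; used[NR] = $3
      total_cap += $2; total_used += $3 }
    END {
      if (total_cap <= 0) exit 1
      avg = total_used / total_cap          # cluster-wide utilization (0..1)
      for (i = 1; i <= NR; i++)
        printf "%s %.1f%% %+.0f\n", name[i], used[i] * 100 / cap[i], used[i] - avg * cap[i]
    }'
}

# Two loaded nodes plus one freshly added, empty node (illustrative numbers).
printf 'dn1 100 80\ndn2 100 40\ndn3 100 0\n' | rebalance_plan
```

With the sample input the cluster average is 40%, so the fullest node would shed data, the empty node would receive the same amount, and the node already at 40% would stay as it is.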

2>. Ways to run the balancer

The balancer can be invoked through the "start-balancer.sh" script or by running the "hdfs balancer" command. The usage of the balancer command is shown below:

[root@hadoop101.yinzhengjie.com ~]# hdfs balancer --help
Usage: hdfs balancer
    [-policy <policy>]    the balancing policy: datanode or blockpool
    [-threshold <threshold>]    Percentage of disk capacity
    [-exclude [-f <hosts-file> | <comma-separated list of hosts>]]    Excludes the specified datanodes.
    [-include [-f <hosts-file> | <comma-separated list of hosts>]]    Includes only the specified datanodes.
    [-source [-f <hosts-file> | <comma-separated list of hosts>]]    Pick only the specified datanodes as source nodes.
    [-blockpools <comma-separated list of blockpool ids>]    The balancer will only run on blockpools included in this list.
    [-idleiterations <idleiterations>]    Number of consecutive idle iterations (-1 for Infinite) before exit.
    [-runDuringUpgrade]    Whether to run the balancer during an ongoing HDFS upgrade.This is usually not desired since it will not affect used space on over-utilized machines.

Generic options supported are:
-conf <configuration file>        specify an application configuration file
-D <property=value>               define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>  specify a ResourceManager
-files <file1,...>                specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>               specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>          specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]

[root@hadoop101.yinzhengjie.com ~]#

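3>. Setting an appropriate threshold for the balancer

The threshold parameter is the percentage by which the DFS utilization of each DataNode may deviate from the average DFS utilization of the cluster. A node that exceeds the threshold in either direction (higher or lower) will be rebalanced.

As the example below shows, the balancer command can be run without any arguments, in which case the balancer uses the default threshold of 10%. This means the balancer moves blocks from over-utilized nodes to under-utilized nodes until the disk utilization of every DataNode is within plus or minus 10 percentage points of the average disk utilization of the cluster.

Sometimes you may want a different threshold. For example, when free space in the cluster is running low and you want to keep the storage used on each DataNode within a tighter range than the default 10% threshold, you can specify it like this: "hdfs balancer -threshold 5".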
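The effect of tightening the threshold is easy to see by printing the utilization range a DataNode may occupy without being rebalanced. This is a rough sketch only; balance_band is a hypothetical helper, and clamping the range to 0..100 is an assumption made for readability.

```shell
# balance_band AVG_PCT THRESHOLD
# Prints the lower and upper utilization bounds (in percent) within which a
# DataNode is left alone, clamped to the 0..100 range.
balance_band() {
  awk -v avg="$1" -v t="$2" 'BEGIN {
    lo = avg - t; if (lo < 0)   lo = 0
    hi = avg + t; if (hi > 100) hi = 100
    printf "%.1f %.1f\n", lo, hi
  }'
}

# Same cluster average (40%), default threshold versus a tighter one.
balance_band 40 10
balance_band 40 5
```

A narrower band means more nodes fall outside it, which is why a smaller threshold makes the balancer do more work.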

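When the balancer runs, it looks at two key HDFS usage values for the cluster:
    (1) Average DFS used percentage:
            The average percentage of DFS used in the cluster is calculated as: "Average DFS Used = (DFS Used * 100) / Present Capacity"
    (2) DFS used percentage per node:
            This metric shows the percentage of DFS used on each individual node.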
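The formula for the average DFS used percentage can be checked by hand with a few lines of shell. The sketch below assumes both inputs are given in the same unit (for example bytes); the function name avg_dfs_used is made up for this example.

```shell
# avg_dfs_used DFS_USED PRESENT_CAPACITY
# Implements: Average DFS Used = (DFS Used * 100) / Present Capacity
# Both arguments must use the same unit. Fails if the capacity is not positive.
avg_dfs_used() {
  awk -v used="$1" -v cap="$2" 'BEGIN {
    if (cap <= 0) exit 1
    printf "%.2f\n", used * 100 / cap
  }'
}

# Example: 250 units used out of a present capacity of 1000 units.
avg_dfs_used 250 1000
```

Each node's own DFS used percentage is then compared against this average, plus or minus the threshold.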

[root@hadoop101.yinzhengjie.com ~]# hdfs balancer    # With no arguments, the balancer uses the default threshold (10%).
20/08/20 18:59:50 INFO balancer.Balancer: namenodes  = [hdfs://hadoop101.yinzhengjie.com:9000]
20/08/20 18:59:50 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
20/08/20 18:59:50 INFO balancer.Balancer: included nodes = []
20/08/20 18:59:50 INFO balancer.Balancer: excluded nodes = []
20/08/20 18:59:50 INFO balancer.Balancer: source nodes = []
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
20/08/20 18:59:51 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
20/08/20 18:59:51 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
20/08/20 18:59:51 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
20/08/20 18:59:51 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
20/08/20 18:59:51 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
20/08/20 18:59:51 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
20/08/20 18:59:51 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
20/08/20 18:59:51 INFO balancer.Balancer: dfs.blocksize = 536870912 (default=134217728)
20/08/20 18:59:51 INFO net.NetworkTopology: Adding a new node: /rack001/172.200.6.102:50010
20/08/20 18:59:51 INFO net.NetworkTopology: Adding a new node: /rack002/172.200.6.104:50010
20/08/20 18:59:51 INFO net.NetworkTopology: Adding a new node: /rack002/172.200.6.103:50010
20/08/20 18:59:51 INFO balancer.Balancer: 0 over-utilized: []
20/08/20 18:59:51 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
Aug 20, 2020 6:59:51 PM           0                  0 B                 0 B                0 B
Aug 20, 2020 6:59:51 PM  Balancing took 764.0 milliseconds
[root@hadoop101.yinzhengjie.com ~]#

[root@hadoop101.yinzhengjie.com ~]# hdfs balancer -threshold 5    # My cluster is small and lightly used, so even this small percentage did not trigger any rebalancing.
20/08/20 19:20:57 INFO balancer.Balancer: Using a threshold of 5.0
20/08/20 19:20:57 INFO balancer.Balancer: namenodes  = [hdfs://hadoop101.yinzhengjie.com:9000]
20/08/20 19:20:57 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 5.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
20/08/20 19:20:57 INFO balancer.Balancer: included nodes = []
20/08/20 19:20:57 INFO balancer.Balancer: excluded nodes = []
20/08/20 19:20:57 INFO balancer.Balancer: source nodes = []
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
20/08/20 19:20:58 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
20/08/20 19:20:58 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
20/08/20 19:20:58 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
20/08/20 19:20:58 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
20/08/20 19:20:58 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
20/08/20 19:20:58 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
20/08/20 19:20:58 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
20/08/20 19:20:58 INFO balancer.Balancer: dfs.blocksize = 536870912 (default=134217728)
20/08/20 19:20:58 INFO net.NetworkTopology: Adding a new node: /rack001/172.200.6.102:50010
20/08/20 19:20:58 INFO net.NetworkTopology: Adding a new node: /rack002/172.200.6.103:50010
20/08/20 19:20:58 INFO net.NetworkTopology: Adding a new node: /rack002/172.200.6.104:50010
20/08/20 19:20:58 INFO balancer.Balancer: 0 over-utilized: []
20/08/20 19:20:58 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
Aug 20, 2020 7:20:58 PM           0                  0 B                 0 B                0 B
Aug 20, 2020 7:20:58 PM  Balancing took 750.0 milliseconds
[root@hadoop101.yinzhengjie.com ~]#

[root@hadoop101.yinzhengjie.com ~]# ll -h
total 374M
-rw-r--r-- 1 root root 374M Aug 10 15:42 hadoop-2.10.0.tar.gz
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# nohup hdfs balancer -threshold 5 > ~/ballancer-stdout.log 2> ~/ballancer-stderr.log &    # In production, it is advisable to run the balancer in the background.
[1] 9066
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# jobs    # The jobs command lists background tasks. This is a test cluster, so the run finished almost immediately (the point here is just to show the usage).
[1]+  Done                    nohup hdfs balancer -threshold 5 > ~/ballancer-stdout.log 2> ~/ballancer-stderr.log
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# 
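[root@hadoop101.yinzhengjie.com ~]# ll    # Check the logs to see whether the cluster is balanced; the balancer exiting on its own also indicates that the cluster has reached a balanced state.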
total 382936
-rw-r--r-- 1 root root      1813 Aug 20 19:26 ballancer-stderr.log
-rw-r--r-- 1 root root       287 Aug 20 19:26 ballancer-stdout.log
-rw-r--r-- 1 root root 392115733 Aug 10 15:42 hadoop-2.10.0.tar.gz
[root@hadoop101.yinzhengjie.com ~]#

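4>. Adjusting the balancer bandwidth

Ideally, the balancer should be run while the cluster is relatively idle, when its overhead is usually low. The balancer bandwidth can be tuned to set the number of bytes per second that each DataNode in the cluster may use for rebalancing.

Before raising the bandwidth, make sure the network has enough capacity to spare. The speed of the NIC can be checked with tools such as "ethtool", as shown below.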

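[root@hadoop101.yinzhengjie.com ~]# ethtool bond0    # The NIC in my laptop runs at 1000 Mb/s, so the balancer bandwidth can be set to 10% of that, i.e. 100 Mb/s (about 12.5 MB/s). Server NICs are faster (mostly 10 GbE), where 10% is about 1 Gb/s (roughly 125 MB/s).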
Settings for bond0:
    Supported ports: [ ]
    Supported link modes:   Not reported
    Supported pause frame use: No
    Supports auto-negotiation: No
    Supported FEC modes: Not reported
    Advertised link modes:  Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Advertised FEC modes: Not reported
    Speed: 1000Mb/s
    Duplex: Full
    Port: Other
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: off
    Link detected: yes
[root@hadoop101.yinzhengjie.com ~]#

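The default bandwidth is 10 MB/s. Raising it lets the balancer finish its work sooner. The bandwidth can be raised to roughly 10% of the network speed without any noticeable impact on the cluster's workload.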
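One detail that is easy to trip over when applying the 10% rule: NIC speeds are reported in megabits per second, whereas the balancer bandwidth is expressed in bytes per second. The sketch below does the conversion; balancer_bandwidth is a hypothetical helper, and the 10% default simply mirrors the rule of thumb above.

```shell
# balancer_bandwidth LINK_MBPS [PERCENT]
# Converts a link speed in Mb/s (as reported by ethtool) into the number of
# bytes per second that PERCENT of the link represents. PERCENT defaults to 10.
balancer_bandwidth() {
  awk -v mbps="$1" -v pct="${2:-10}" 'BEGIN {
    # 1 Mb/s = 1,000,000 bits/s; 8 bits per byte.
    printf "%d\n", mbps * 1000000 / 8 * pct / 100
  }'
}

balancer_bandwidth 1000     # 10% of a 1000 Mb/s link, in bytes per second
balancer_bandwidth 10000    # 10% of a 10 GbE link, in bytes per second
```

The resulting number is in the unit that "hdfs dfsadmin -setBalancerBandwidth" expects.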

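You can either change the "dfs.datanode.balance.bandwidthPerSec" property in "hdfs-site.xml" (default 10 MB) or set the network bandwidth used by the balancer with the hdfs dfsadmin command, as shown below.

Tip:
    If the balancer is going to run for a long time, you can schedule it to run with different bandwidths during peak and off-peak hours: low bandwidth during peak hours and higher bandwidth when the cluster is less busy. Only one balancer job can run at a time, so the peak-hours balancer job is stopped when the off-peak job starts.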

[root@hadoop101.yinzhengjie.com ~]# hdfs dfsadmin -help setBalancerBandwidth 
-setBalancerBandwidth <bandwidth>:
    Changes the network bandwidth used by each datanode during
    HDFS block balancing.

        <bandwidth> is the maximum number of bytes per second
        that will be used by each datanode. This value overrides
        the dfs.balance.bandwidthPerSec parameter.

        --- NOTE: The new value is not persistent on the DataNode.---

[root@hadoop101.yinzhengjie.com ~]#

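[root@hadoop101.yinzhengjie.com ~]# hdfs dfsadmin -setBalancerBandwidth 104857600    # Sets the bandwidth to 100 MB/s (104857600 bytes per second). This value overrides the "dfs.datanode.balance.bandwidthPerSec" property (avoid the older "dfs.balance.bandwidthPerSec" name, which is deprecated in Hadoop 3.x).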
Balancer bandwidth is set to 104857600
[root@hadoop101.yinzhengjie.com ~]#
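The earlier tip about peak and off-peak hours can also be approached by changing the bandwidth on a schedule, since -setBalancerBandwidth takes effect on the DataNodes at runtime. This is an alternative to running two separate balancer jobs, not the method described in the tip. Everything specific in this sketch is an assumption: the helper name bandwidth_for_hour, the 08:00 to 20:00 peak window, and the two bandwidth values should be adapted to the actual cluster.

```shell
# bandwidth_for_hour HOUR
# Prints the balancer bandwidth (bytes per second) to use for the given hour
# of the day (0-23, a leading zero as printed by `date +%H` is accepted).
bandwidth_for_hour() {
  hour=${1#0}        # strip one leading zero so "08" is handled as 8
  hour=${hour:-0}    # "0" becomes empty after stripping; treat it as midnight
  if [ "$hour" -ge 8 ] && [ "$hour" -lt 20 ]; then
    echo 10485760    # assumed peak hours: 10 MB/s
  else
    echo 104857600   # assumed off-peak hours: 100 MB/s
  fi
}

# Intended use, for example from an hourly cron job (not executed here):
#   hdfs dfsadmin -setBalancerBandwidth "$(bandwidth_for_hour "$(date +%H)")"
bandwidth_for_hour 14
bandwidth_for_hour 02
```

Because the value set this way is not persistent on the DataNode, a restarted DataNode falls back to the value from "hdfs-site.xml" until the next scheduled call.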

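5>. Caveats when using the balancer

(1) The default policy balances storage at the DataNode level; the balancer does not balance data across the individual storage volumes within a DataNode.
(2) The balancer rebalances a DataNode only when the DFS used percentage of that DataNode differs from the average DFS used percentage of the cluster by more than the specified threshold, in either direction. Otherwise it leaves the cluster alone.
(3) How long the balancer runs depends on the size of the cluster and on how unbalanced the data is. The first balancer run, a run after a long gap between scheduled runs, or a run after adding a batch of DataNodes will take a long time (typically days; with data volumes at the PB scale or approaching the EB scale, balancing may take more than a month).
(4) On a cluster with frequent writes and deletes, the cluster may never reach a fully balanced state; the balancer just keeps moving data from one node to another.
(5) It is best to run the balancer right after adding new nodes to the cluster. If many nodes are added at once, the balancer needs a while to finish its work.
(6) How do you choose the threshold? That is easy: just pick the lowest DFS used percentage among the nodes of the cluster. There is no need to spend a lot of time studying the DFS used percentage of every node; the "hdfs dfsadmin -report" command is enough to find the right threshold. The smaller the threshold, the more work the balancer has to do and the more balanced the cluster becomes.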
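For point (6), the per-node DFS used percentages can be pulled out of the report with a short filter instead of being read by eye. The sketch below assumes the report prints a "Name:" line for each DataNode followed later by a "DFS Used%:" line, and that the cluster-wide summary comes before the first "Name:" line; check this against the output of your Hadoop version. node_usage is a hypothetical helper name.

```shell
# node_usage: reads `hdfs dfsadmin -report` output on stdin and prints one
# "NODE DFS_USED_PERCENT" line per DataNode. The cluster-wide "DFS Used%"
# line in the summary is skipped because no "Name:" line has been seen yet.
node_usage() {
  awk '
    /^Name:/      { node = $2 }
    /^DFS Used%:/ { if (node != "") { pct = $3; sub(/%$/, "", pct); print node, pct } }
  '
}

# Intended use on a live cluster, lowest utilization first (not executed here):
#   hdfs dfsadmin -report | node_usage | sort -k2,2n
```

Sorting the result numerically puts the least-used node on the first line, which is the value point (6) asks for.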