HDFS Federation

This guide provides an overview of the HDFS Federation feature and how to configure and manage the federated cluster.
这篇文档包好了hdfs federation特点的概述和如何配置并且管理federation集群。
Background（背景）

HDFS has two main layers:
HDFS有两种主要功能：
•    Namespace
o    Consists of directories, files and blocks.
o    It supports all the namespace related file system operations such as create, delete, modify and list files and directories.
        HDFS的命名空间包含目录、文件和块。
        命名空间支持对HDFS中的目录、文件和块做类似文件系统的创建、修改、删除、列表文件和目录等基本操作。
•    Block Storage Service, which has two parts:
o    Block Management (performed in the Namenode)
    Provides Datanode cluster membership by handling registrations, and periodic heart beats.
    Processes block reports and maintains location of blocks.
    Supports block related operations such as create, delete, modify and get block location.
    Manages replica placement, block replication for under replicated blocks, and deletes blocks that are over replicated.
o    Storage - is provided by Datanodes by storing blocks on the local file system and allowing read/write access.
•    在namenode中的块的管理
o    提供datanode集群的注册、心跳检测等功能。
o    处理块的报告信息和维护块的位置信息。
o    支持块相关的操作，如创建、删除、修改、获取块的位置信息。
o    管理块的冗余信息、创建副本、删除多余的副本等。
•    存储：datanode提供本地文件系统上块的存储、读写、访问等。
The prior HDFS architecture allows only a single namespace for the entire cluster. In that configuration, a single Namenode manages the namespace. HDFS Federation addresses this limitation by adding support for multiple Namenodes/namespaces to HDFS.
以前的HDFS框架整个集群只允许有一个namenode，一个namenode管理所有的命名空间，HDFS联邦通过增加多个namenode来打破这种限制。
Multiple Namenodes/Namespaces
In order to scale the name service horizontally, federation uses multiple independent Namenodes/namespaces. The Namenodes are federated; the Namenodes are independent and do not require coordination with each other. The Datanodes are used as common storage for blocks by all the Namenodes. Each Datanode registers with all the Namenodes in the cluster. Datanodes send periodic heartbeats and block reports. They also handle commands from the Namenodes.
    为了水平扩展名称服务，联邦使用多个独立的namenodes/namespaces。所有的namenodes是联邦的，因此，单个namenode是独立的，不需要和其它namenode协调合作。datanode作为统一的块存储设备被所有namenode节点使用。每一个datanode节点都在所有的namenode进行注册。datanode发送心跳信息、块报告到所有namenode,同时执行所有namenode发来的命令。
Users may use ViewFs to create personalized namespace views. ViewFs is analogous to client side mount tables in some Unix/Linux systems.
用户可以应用viewfs创建一个个性化的namespace，viewfs类似于在某些linux/unix系统中的客户端安装表。

Block Pool
A Block Pool is a set of blocks that belong to a single namespace. Datanodes store blocks for all the block pools in the cluster. Each Block Pool is managed independently. This allows a namespace to generate Block IDs for new blocks without the need for coordination with the other namespaces. A Namenode failure does not prevent the Datanode from serving other Namenodes in the cluster.
　一个块池就是属于一个namespace的一组块。datanodes存储集群中所有的块池，它独立于其它块池进行管理。这允许namespace在不与其它namespace交互的情况下生成块的ID，有故障的namenode不影响datanode继续为集群中的其它namenode服务。
A Namespace and its block pool together are called Namespace Volume. It is a self-contained unit of management. When a Namenode/namespace is deleted, the corresponding block pool at the Datanodes is deleted. Each namespace volume is upgraded as a unit, during cluster upgrade.
一个namespace和它的blockpool一起叫做namespace volume，这是一个自己的管理单位，当一个namenode被删除，那么在datanode上的相应的block pool也会被删除。在集群进行升级的时候，每一个namespace volume独立的进行升级。
ClusterID
A ClusterID identifier is used to identify all the nodes in the cluster. When a Namenode is formatted, this identifier is either provided or auto generated. This ID should be used for formatting the other Namenodes into the cluster.
增加一个新的ClusterID标识来在集群中所有的节点。当一个namenode被格式化的时候，这个标识被指定或自动生成，这个ID会用于格式化集群中的其它namenode。
Key Benefits
•    Namespace Scalability - Federation adds namespace horizontal scaling. Large deployments or deployments using lot of small files benefit from namespace scaling by allowing more Namenodes to be added to the cluster.
•    Performance - File system throughput is not limited by a single Namenode. Adding more Namenodes to the cluster scales the file system read/write throughput.
•    Isolation - A single Namenode offers no isolation in a multi user environment. For example, an experimental application can overload the Namenode and slow down production critical applications. By using multiple Namenodes, different categories of applications and users can be isolated to different namespaces.
namespace的可扩展性：HDFS的水平扩展，但是命名空间不能扩展，通过在集群中增加namenode来扩展namespace,以达到大规模部署或者解决有很多小文件的情况。
　　Performance（性能）：在之前的框架中，单个namenode文件系统的吞吐量是有限制的，增加更多的namenode能增大文件系统读写操作的吞吐量。
　　Isolation（隔离）：一个单一的namenode不能对多用户环境进行隔离，一个实验性的应用程序会加大namenode的负载，减慢关键的生产应用程序，在多个namenode情况下，不同类型的程序和用户可以通过不同的namespace来进行隔离。
Federation Configuration
Federation configuration is backward compatible and allows existing single Namenode configurations to work without any change. The new configuration is designed such that all the nodes in the cluster have the same configuration without the need for deploying different configurations based on the type of the node in the cluster.
Federation adds a new NameServiceID abstraction. A Namenode and its corresponding secondary/backup/checkpointer nodes all belong to a NameServiceId. In order to support a single configuration file, the Namenode and secondary/backup/checkpointer configuration parameters are suffixed with the NameServiceID.
    联邦的配置是向后兼容的，允许在不改变任何配置的情况下让当前运行的单节点环境转换成联邦环境。新的配置方案确保了在集群环境中的所有节点的配置文件都是相同的，没有必要因为节点的不同而配置不同的文件。
　　在联邦环境下引入了一个新的概念叫NameServiceID，namenode和secondary/backup/checkpointer都属于这个，为了支持单文件配置， Namenode和secondary/backup/checkpointer的配置参数都以NameServiceID为后缀加到同一个配置文件中。
Configuration:
Step 1: Add the dfs.nameservices parameter to your configuration and configure it with a list of comma separated NameServiceIDs. This will be used by the Datanodes to determine the Namenodes in the cluster.
第一步：把dfs.federation.nameservices配置参数加到配置文件，配置以逗号分隔的所有NameServiceID，这将用于让datanode识别在集群中的所有namenode。
Step 2: For each Namenode and Secondary Namenode/BackupNode/Checkpointer add the following configuration parameters suffixed with the corresponding NameServiceID into the common configuration file:
第二步：对于每一个Namenode和Secondary Namenode/BackupNode/Checkpointer 增加以NameServiceID为后缀的下列配置项：
Daemon    Configuration Parameter
Namenode    dfs.namenode.rpc-address
dfs.namenode.servicerpc-address
dfs.namenode.http-address
dfs.namenode.https-address
dfs.namenode.keytab.file
dfs.namenode.name.dir
dfs.namenode.edits.dir
dfs.namenode.checkpoint.dir
dfs.namenode.checkpoint.edits.dir
Secondary Namenode    dfs.namenode.secondary.http-address
dfs.secondary.namenode.keytab.file
BackupNode    dfs.namenode.backup.address
dfs.secondary.namenode.keytab.file
Here is an example configuration with two Namenodes:
下面是两个namenode的配置示例：
<configuration>
<property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>nn-host1:rpc-port</value>
</property>
<property>
    <name>dfs.namenode.http-address.ns1</name>
    <value>nn-host1:http-port</value>
</property>
<property>
    <name>dfs.namenode.secondaryhttp-address.ns1</name>
    <value>snn-host1:http-port</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>nn-host2:rpc-port</value>
</property>
<property>
    <name>dfs.namenode.http-address.ns2</name>
    <value>nn-host2:http-port</value>
</property>
<property>
    <name>dfs.namenode.secondaryhttp-address.ns2</name>
    <value>snn-host2:http-port</value>
</property>

.... Other common configuration ...
</configuration>
Formatting Namenodes
Step 1: Format a Namenode using the following command:
第一步：使用下面的命令格式化namenode。
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format [-clusterId <cluster_id>]
Choose a unique cluster_id which will not conflict other clusters in your environment. If a cluster_id is not provided, then a unique one is auto generated.
选择一个唯一的cluster_id，不能和当前环境中的其它集群相同，如果不提供这个参数，则系统会给一个默认值。
Step 2: Format additional Namenodes using the following command:
第二步：使用如下命令格式化其它的namenode.
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format -clusterId <cluster_id>
Note that the cluster_id in step 2 must be same as that of the cluster_id in step 1. If they are different, the additional Namenodes will not be part of the federated cluster.
注意，cluster_id必须和上一步中的相同，如果不同，其它的namenode将不是联邦的一部份。
Upgrading from an older release and configuring federation
Older releases only support a single Namenode. Upgrade the cluster to newer release in order to enable federation During upgrade you can provide a ClusterID as follows:
老版本只支持一个namenode，在集群升级到更新的版本可以使用federation你可以指定一个cluster id，如下：
[hdfs]$ $HADOOP_PREFIX/bin/hdfs start namenode --config $HADOOP_CONF_DIR -upgrade -clusterId <cluster_ID>
If cluster_id is not provided, it is auto generated.
如果cluster_id不指定，它将自定产生。
Adding a new Namenode to an existing HDFS cluster
Perform the following steps:
•    Add dfs.nameservices to the configuration.
•    Update the configuration with the NameServiceID suffix. Configuration key names changed post release 0.20. You must use the new configuration parameter names in order to use federation.
•    Add the new Namenode related config to the configuration file.
•    Propagate the configuration file to the all the nodes in the cluster.
•    Start the new Namenode and Secondary/Backup.
•    Refresh the Datanodes to pickup the newly added Namenode by running the following command against all the Datanodes in the cluster:
•    [hdfs]$ $HADOOP_PREFIX/bin/hdfs dfsadmin -refreshNameNodes <datanode_host_name>:<datanode_rpc_port>

按如下步骤操作：
    增加dfs.nameservices配置项到配置文件。
　　用NameServiceID为后缀更新其它配置项，从0.2版本，配置项的Key发生了变化，你需要使用新的配置项名称。
　　增加一个新的namenode配置信息到配置文件。
　　把配置文件分发到集群中所有的节点。
　　启动新的Namenode,Secondary/Backup.
　　通过运行下面的命令来更新datanode以识别新的namenode：
    [hdfs]$ $HADOOP_PREFIX/bin/hdfs dfsadmin -refreshNameNodes <datanode_host_name>:<datanode_rpc_port>
Managing the cluster
Starting and stopping cluster
To start the cluster run the following command:
启动集群运行如下命令：
[hdfs]$ $HADOOP_PREFIX/sbin/start-dfs.sh
To stop the cluster run the following command:
停止集群运行如下命令：
[hdfs]$ $HADOOP_PREFIX/sbin/stop-dfs.sh
These commands can be run from any node where the HDFS configuration is available. The command uses the configuration to determine the Namenodes in the cluster and then starts the Namenode process on those nodes. The Datanodes are started on the nodes specified in the slaves file. The script can be used as a reference for building your own scripts to start and stop the cluster.
这些命令可以在有配置文件的任何一个节点上执行即可。这个命令使用配置文件来确定集群中的namenodes，然后启动这些节点。只启动在slaves文件中的datanode节点。你可以编写自己的脚本来启动和停止集群。
Balancer
The Balancer has been changed to work with multiple Namenodes. The Balancer can be run using the command:
Balancer 已经针对多个namenodes进行了更改，以保证能够平衡集群环境。可以运行如下命令：
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh start balancer [-policy <policy>]
The policy parameter can be any of the following:
policy参数可以是：
•    datanode - this is the default policy. This balances the storage at the Datanode level. This is similar to balancing policy from prior releases.
•    blockpool - this balances the storage at the block pool level which also balances at the Datanode level.
node 这是默认策略，这是在datanode一级进行存储平衡，和以前的版本一样。
blockpool 这是在blockpool一级的存储平衡，这将同时平衡blockpool和datanode的存储。
Note that Balancer only balances the data and does not balance the namespace. For the complete command usage, see balancer.
注意：Balancer 只平衡数据，不平衡namesapce。完整的命令说明，参考balancer。
Decommissioning
Decommissioning is similar to prior releases. The nodes that need to be decomissioned are added to the exclude file at all of the Namenodes. Each Namenode decommissions its Block Pool. When all the Namenodes finish decommissioning a Datanode, the Datanode is considered decommissioned.
退役和之前的版本很像，需要退役的节点增加到namenodes的exclude文件，每一个namenode退役它自己的block pool。当所有的namenode结束了一个datanode的退役，datanode才被认为已退役。
Step 1: To distribute an exclude file to all the Namenodes, use the following command:
第一步：使用下面的命令分发exclude文件到所有namenodes：
[hdfs]$ $HADOOP_PREFIX/sbin/distribute-exclude.sh <exclude_file>
Step 2: Refresh all the Namenodes to pick up the new exclude file:
第二步：更新所有namenodes去获取exclude文件。
[hdfs]$ $HADOOP_PREFIX/sbin/refresh-namenodes.sh
The above command uses HDFS configuration to determine the configured Namenodes in the cluster and refreshes them to pick up the new exclude file.
上面的命令使用HDFS配置文件去确定集群环境中的所有namenode，然后更新所有namenodes去获取exclude文件.
Cluster Web Console
Similar to the Namenode status web page, when using federation a Cluster Web Console is available to monitor the federated cluster at http://<any_nn_host:port>/dfsclusterhealth.jsp. Any Namenode in the cluster can be used to access this web page.
和namenode状态web页面一样，联邦环境增加了一个web控制台来监控联邦集群。地址是：http://<any_nn_host:port>/dfsclusterhealth.jsp.集群中的任何一个namenode都可以访问到这个页面。
The Cluster Web Console provides the following information:
•    A cluster summary that shows the number of files, number of blocks, total configured storage capacity, and the available and used storage for the entire cluster.
•    A list of Namenodes and a summary that includes the number of files, blocks, missing blocks, and live and dead data nodes for each Namenode. It also provides a link to access each Namenode’s web UI.
•    The decommissioning status of Datanodes.
这个页面提供如下一些信息：
1.    整个集群的文件数、块数、总的配置的存储容量、可用和已用的存储的信息。
2.    提供一个所有namenodes的列表及其所包含的文件数、块数、丢失的块数、在线和离线的datanode。提供一个方便的连接到namenodes Web界面的URL。
3.    同时也提供datanode的退役状态。