Notes on basic HDFS configuration

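Chapter 5: HDFS
I. Working with HDFS
1. Web Console: port 50070
2. Command line: two kinds of commands (hdfs dfs for file operations, hdfs dfsadmin for administration)
3. Java API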

II. How HDFS transfers data (draw a diagram): fairly important
1. How data upload works (the process)
2. How data download works (the process)

Memory for caching metadata (NameNode heap): 1000 MB by default
	Directory: /root/training/hadoop-2.7.3/etc/hadoop
   File: hadoop-env.sh
	# The maximum amount of heap to use, in MB. Default is 1000.
	#export HADOOP_HEAPSIZE=
	#export HADOOP_NAMENODE_INIT_HEAPSIZE=""	
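
As a hedged illustration of the setting above, the commented-out line in hadoop-env.sh could be enabled as follows. The value 2000 is an assumed example, not taken from these notes; in Hadoop 2.x, HADOOP_HEAPSIZE is read as a number of megabytes and applies to the daemons started through this file.

```shell
# Sketch for hadoop-env.sh: raise the daemon heap from the 1000 MB default.
# 2000 is an assumed, illustrative value; size it to the host's memory.
export HADOOP_HEAPSIZE=2000
```

The daemons have to be restarted before a change in hadoop-env.sh takes effect.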

III. Advanced HDFS features
1. Trash (recycle bin)
-rmr: deletes a directory, including its subdirectories (deprecated; use 'rm -r' instead)
hdfs dfs -rmr /bbb
Log:
17/12/08 20:32:10 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /bbb

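	(*) By default, the HDFS trash is disabled (fs.trash.interval is 0)
	(*) To enable the trash, set this parameter in core-site.xml (the value is in minutes; 1440 = 24 hours)
		 In essence, a delete then becomes a cut (Ctrl+X): the data is moved into the trash directory instead of being removed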
	
		<property>
		   <name>fs.trash.interval</name>
		   <value>1440</value>
		</property>
		
		Log:
		hdfs dfs -rmr /folder1
		rmr: DEPRECATED: Please use 'rm -r' instead.
		17/12/11 21:05:57 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
		Moved: 'hdfs://bigdata11:9000/folder1' to trash at: hdfs://bigdata11:9000/user/root/.Trash/Current			
	(*) Restore: simply copy the data back out of the trash with cp
	     hdfs dfs -cp /user/root/.Trash/Current/input/data.txt /input
		 
		 Empty the trash: hdfs dfs -expunge
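
The restore command above depends on knowing where a deleted path ends up. A minimal sketch, assuming the default layout visible in the log (/user/<user>/.Trash/Current followed by the original absolute path); trash_path is a hypothetical helper name, not an HDFS command:

```shell
# Build the HDFS trash location of a deleted path.
# Assumes the default layout seen in the log: /user/<user>/.Trash/Current
trash_path() {
    user="$1"        # HDFS user who ran the delete, e.g. root
    original="$2"    # absolute path before deletion, e.g. /input/data.txt
    printf '%s\n' "/user/${user}/.Trash/Current${original}"
}

# Hypothetical usage against a live cluster (not run here):
#   hdfs dfs -cp "$(trash_path root /input/data.txt)" /input
```

Only data still under Current, or under an older checkpoint that has not yet expired, can be restored this way.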
		 
	(*) Aside: Oracle Database also has a recycle bin
			SQL> select * from tab;

			TNAME                          TABTYPE  CLUSTERID
			------------------------------ ------- ----------
			BIN$WBSNMvxJpWvgUAB/AQBygg==$0 TABLE
			BONUS                          TABLE
			DEPT                           TABLE
			EMP                            TABLE
			RESULT                         TABLE
			SALGRADE                       TABLE

			6 rows selected.

			SQL> -- drop table mydemo1;
			SQL> show recyclebin;
			ORIGINAL NAME    RECYCLEBIN NAME                OBJECT TYPE  DROP TIME
			---------------- ------------------------------ ------------ -------------------
			MYDEMO1          BIN$WBSNMvxJpWvgUAB/AQBygg==$0 TABLE        2017-09-01:06:56:15
			SQL> select count(*) from mydemo1;
			select count(*) from mydemo1
								 *
			ERROR at line 1:
			ORA-00942: table or view does not exist


			SQL> select count(*) from BIN$WBSNMvxJpWvgUAB/AQBygg==$0;
			select count(*) from BIN$WBSNMvxJpWvgUAB/AQBygg==$0
													*
			ERROR at line 1:
			ORA-00933: SQL command not properly ended


			SQL> select count(*) from "BIN$WBSNMvxJpWvgUAB/AQBygg==$0";

			  COUNT(*)
			----------
					30

			SQL> flashback table mydemo1 to before drop;

			Flashback complete.

			SQL> show recyclebin;
			SQL> select count(*) from mydemo1;

			  COUNT(*)
			----------
					30
	
2. Snapshots (snapshot): backup  ---> in general, using snapshots is not recommended

	(*) By default, HDFS snapshots are disabled
	(*) Step 1: the administrator enables snapshots on a directory
		[-allowSnapshot <snapshotDir>]
		[-disallowSnapshot <snapshotDir>]	

		hdfs dfsadmin -allowSnapshot /mydir1
	
	(*) Step 2: create the snapshot with the ordinary HDFS file commands
		[-createSnapshot <snapshotDir> [<snapshotName>]]
		[-deleteSnapshot <snapshotDir> <snapshotName>]	
		[-renameSnapshot <snapshotDir> <oldName> <newName>]	
		
		hdfs dfs -createSnapshot /mydir1 mydir1_backup_01
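		Log: Created snapshot /mydir1/.snapshot/mydir1_backup_01
		In essence: a read-only, point-in-time view of the directory is exposed under its hidden .snapshot directory; the NameNode records metadata and the data blocks themselves are not copied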
		
	(*) Continuing the experiment
		hdfs dfs -put student02.txt /mydir1
		hdfs dfs -createSnapshot /mydir1 mydir1_backup_02
		
		Compare the snapshots: hdfs snapshotDiff /mydir1 mydir1_backup_01 mydir1_backup_02
		Difference between snapshot mydir1_backup_01 and snapshot mydir1_backup_02 under directory /mydir1:
		M       .
		+       ./student02.txt
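
In the report above, the first column is the change marker (M for modified, + for created). A small sketch of how the added paths could be pulled out of such a report; added_paths is a hypothetical helper name, and the filter assumes the two-column layout shown here:

```shell
# List the paths that a snapshot diff report marks as added.
# Reads report text on stdin; only lines whose first field is "+" are kept.
# Note: awk splits on whitespace, so paths containing spaces are truncated.
added_paths() {
    awk '$1 == "+" { print $2 }'
}

# Hypothetical usage against a live cluster (not run here):
#   hdfs snapshotDiff /mydir1 mydir1_backup_01 mydir1_backup_02 | added_paths
```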
		
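3. Quotas (quota): (1) Name quota: limits the number of files and directories that can exist under a directory
                             Usable count: N-1, because the directory itself counts toward the quota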
				[-setQuota <quota> <dirname>...<dirname>]
				[-clrQuota <dirname>...<dirname>]

				hdfs dfs -mkdir /quota1
				Set the name quota of this directory to 3
				hdfs dfsadmin -setQuota 3 /quota1
				
				When we put the third file:
				hdfs dfs -put data.txt /quota1
				put: The NameSpace quota (directories and files) of directory /quota1 is exceeded: quota=3 file count=4
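
The "quota=3 file count=4" in the error follows from the N-1 rule: the directory itself is counted, so a third file would bring the count to four. A minimal sketch of that arithmetic; name_quota_allows is a hypothetical helper, not part of HDFS:

```shell
# Would creating one more file or directory stay within the name quota?
# The directory holding the quota is counted too, so usage = 1 + existing entries.
name_quota_allows() {
    quota="$1"       # value passed to -setQuota
    existing="$2"    # files and directories already under the directory
    [ $((1 + existing + 1)) -le "$quota" ]
}

name_quota_allows 3 1 && echo "second file fits"
name_quota_allows 3 2 || echo "third file exceeds quota"
```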
				
				
              (2) Space quota: limits the total size of the files under a directory
				[-setSpaceQuota <quota> [-storageType <storagetype>] <dirname>...<dirname>]
				[-clrSpaceQuota [-storageType <storagetype>] <dirname>...<dirname>]
				
				hdfs dfs -mkdir /quota2
				Set the space quota of this directory to 10M
				hdfs dfsadmin -setSpaceQuota 10M /quota2
				
				The correct setting: hdfs dfsadmin -setSpaceQuota 130M /quota2
				
				With the 10M quota, putting even a file smaller than 10M fails:
				Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.DSQuotaExceededException): The DiskSpace quota of /quota2 is exceeded: quota = 10485760 B = 10 MB but diskspace consumed = 134217728 B = 128 MB
				
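				Note: even though the file is smaller than 128M, the quota check charges a full 128M block when the block is allocated (on disk the file still occupies only its real size)
				Remember: a space quota must be at least one full block (128M by default) multiplied by the replication factor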
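
The lower bound on a space quota can be worked out before setting it. A minimal sketch, assuming the default 128 MB block size and, for this environment, a replication factor of 1 (which is what the 128 MB figure in the error suggests); min_space_quota_bytes is a hypothetical helper:

```shell
# Smallest space quota that lets a single block be allocated.
# HDFS charges a full block times the replication factor up front.
min_space_quota_bytes() {
    block_mb="$1"       # dfs.blocksize in MB, 128 by default in Hadoop 2.x
    replication="$2"    # dfs.replication; 1 is assumed for this setup
    echo $((block_mb * 1024 * 1024 * replication))
}

min_space_quota_bytes 128 1   # the 134217728 B reported in the error above
min_space_quota_bytes 128 3   # a cluster keeping three replicas
```

On a cluster with three replicas, the 130M quota used above would therefore still be too small.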

	
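4. HDFS safe mode: safemode  ---> HDFS is read-only
    Command: hdfs dfsadmin -safemode get|wait|leave|enter
	Purpose: check the replication of the data blocks; blocks with too few replicas are then copied to other DataNodes (horizontal replication)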
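
Because HDFS is read-only in safe mode, a script that writes data may want to check the state first. A minimal sketch, assuming the report from the get subcommand contains the phrase "Safe mode is ON" (as Hadoop 2.x prints); is_safemode_on is a hypothetical helper:

```shell
# Interpret the text printed by `hdfs dfsadmin -safemode get`.
# Returns success (0) when the report says safe mode is ON.
is_safemode_on() {
    case "$1" in
        *"Safe mode is ON"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical usage against a live cluster (not run here):
#   if is_safemode_on "$(hdfs dfsadmin -safemode get)"; then
#       hdfs dfsadmin -safemode wait   # block until the NameNode leaves safe mode
#   fi
```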

5. HDFS clusters: a first look
		Two main functions of a cluster: load balancing and high availability (failover)

               (1) NameNode Federation ----> HDFS
			   
               (2) HA: HDFS, Yarn, HBase, Storm, Spark ---> all of them require ZooKeeper
Original article: https://www.cnblogs.com/notes-study/p/8435683.html