HDFS Command-Line Tools

  I. Practicing the commands listed by hadoop fs -help
[hadoop@h201 hadoop-2.6.0-cdh5.5.2]$ hadoop fs -help
Usage: hadoop fs [generic options]
        [-appendToFile <localsrc> ... <dst>]
        [-cat [-ignoreCrc] <src> ...]
        [-checksum <src> ...]
        [-chgrp [-R] GROUP PATH...]
        [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
        [-chown [-R] [OWNER][:[GROUP]] PATH...]
        [-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
        [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-count [-q] [-h] [-v] <path> ...]
        [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
        [-createSnapshot <snapshotDir> [<snapshotName>]]
        [-deleteSnapshot <snapshotDir> <snapshotName>]
        [-df [-h] [<path> ...]]
        [-du [-s] [-h] <path> ...]
        [-expunge]
        [-find <path> ... <expression> ...]
        [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-getfacl [-R] <path>]
        [-getfattr [-R] {-n name | -d} [-e en] <path>]
        [-getmerge [-nl] <src> <localdst>]
        [-help [cmd ...]]
        [-ls [-d] [-h] [-R] [<path> ...]]
        [-mkdir [-p] <path> ...]
        [-moveFromLocal <localsrc> ... <dst>]
        [-moveToLocal <src> <localdst>]
        [-mv <src> ... <dst>]
        [-put [-f] [-p] [-l] <localsrc> ... <dst>]
        [-renameSnapshot <snapshotDir> <oldName> <newName>]
        [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
        [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
        [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
        [-setfattr {-n name [-v value] | -x name} <path>]
        [-setrep [-R] [-w] <rep> <path> ...]
        [-stat [format] <path> ...]
        [-tail [-f] <file>]
        [-test -[defsz] <path>]
        [-text [-ignoreCrc] <src> ...]
        [-touchz <path> ...]
        [-usage [cmd ...]]

-appendToFile <localsrc> ... <dst> :
  Appends the contents of all the given local files to the given dst file. The dst
  file will be created if it does not exist. If <localSrc> is -, then the input is
  read from stdin.

-cat [-ignoreCrc] <src> ... :
  Fetch all files that match the file pattern <src> and display their content on
  stdout.

-checksum <src> ... :
  Dump checksum information for files that match the file pattern <src> to stdout.
  Note that this requires a round-trip to a datanode storing each block of the
  file, and thus is not efficient to run on a large number of files. The checksum
  of a file depends on its content, block size and the checksum algorithm and
  parameters used for creating the file.

-chgrp [-R] GROUP PATH... :
  This is equivalent to -chown ... :GROUP ...

-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH... :
  Changes permissions of a file. This works similar to the shell's chmod command
  with a few exceptions.

  -R           modifies the files recursively. This is the only option currently
               supported.
  <MODE>       Mode is the same as mode used for the shell's command. The only
               letters recognized are 'rwxXt', e.g. +t,a+r,g-w,+rwx,o=r.
  <OCTALMODE>  Mode specifed in 3 or 4 digits. If 4 digits, the first may be 1 or
               0 to turn the sticky bit on or off, respectively. Unlike the
               shell command, it is not possible to specify only part of the
               mode, e.g. 754 is same as u=rwx,g=rx,o=r.

  If none of 'augo' is specified, 'a' is assumed and unlike the shell command, no
  umask is applied.

-chown [-R] [OWNER][:[GROUP]] PATH... :
  Changes owner and group of a file. This is similar to the shell's chown command
  with a few exceptions.

  -R  modifies the files recursively. This is the only option currently
      supported.

  If only the owner or group is specified, then only the owner or group is
  modified. The owner and group names may only consist of digits, alphabet, and
  any of [-_./@a-zA-Z0-9]. The names are case sensitive.

  WARNING: Avoid using '.' to separate user name and group though Linux allows it.
  If user names have dots in them and you are using local file system, you might
  see surprising results since the shell command 'chown' is used for local files.

-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst> :
  Identical to the -put command.

-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst> :
  Identical to the -get command.

-count [-q] [-h] [-v] <path> ... :
  Count the number of directories, files and bytes under the paths
  that match the specified file pattern. The output columns are:
  DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
  or, with the -q option:
  QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA
        DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
  The -h option shows file sizes in human readable format.
  The -v option displays a header line.

-cp [-f] [-p | -p[topax]] <src> ... <dst> :
  Copy files that match the file pattern <src> to a destination. When copying
  multiple files, the destination must be a directory. Passing -p preserves status
  [topax] (timestamps, ownership, permission, ACLs, XAttr). If -p is specified
  with no <arg>, then preserves timestamps, ownership, permission. If -pa is
  specified, then preserves permission also because ACL is a super-set of
  permission. Passing -f overwrites the destination if it already exists. raw
  namespace extended attributes are preserved if (1) they are supported (HDFS
  only) and, (2) all of the source and target pathnames are in the /.reserved/raw
  hierarchy. raw namespace xattr preservation is determined solely by the presence
  (or absence) of the /.reserved/raw prefix and not by the -p option.

-createSnapshot <snapshotDir> [<snapshotName>] :
  Create a snapshot on a directory

-deleteSnapshot <snapshotDir> <snapshotName> :
  Delete a snapshot from a directory

-df [-h] [<path> ...] :
  Shows the capacity, free and used space of the filesystem. If the filesystem has
  multiple partitions, and no path to a particular partition is specified, then
  the status of the root partitions will be shown.

  -h  Formats the sizes of files in a human-readable fashion rather than a number
      of bytes.

-du [-s] [-h] <path> ... :
  Show the amount of space, in bytes, used by the files that match the specified
  file pattern. The following flags are optional:

  -s  Rather than showing the size of each individual file that matches the
      pattern, shows the total (summary) size.
  -h  Formats the sizes of files in a human-readable fashion rather than a number
      of bytes.

  Note that, even without the -s option, this only shows size summaries one level
  deep into a directory.

  The output is in the form
        size    disk space consumed     name(full path)

-expunge :
  Empty the Trash

-find <path> ... <expression> ... :
  Finds all files that match the specified expression and
  applies selected actions to them. If no <path> is specified
  then defaults to the current working directory. If no
  expression is specified then defaults to -print.

  The following primary expressions are recognised:
    -name pattern
    -iname pattern
      Evaluates as true if the basename of the file matches the
      pattern using standard file system globbing.
      If -iname is used then the match is case insensitive.

    -print
    -print0
      Always evaluates to true. Causes the current pathname to be
      written to standard output followed by a newline. If the -print0
      expression is used then an ASCII NULL character is appended rather
      than a newline.

  The following operators are recognised:
    expression -a expression
    expression -and expression
    expression expression
      Logical AND operator for joining two expressions. Returns
      true if both child expressions return true. Implied by the
      juxtaposition of two expressions and so does not need to be
      explicitly specified. The second expression will not be
      applied if the first fails.

-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst> :
  Copy files that match the file pattern <src> to the local name. <src> is kept.
  When copying multiple files, the destination must be a directory. Passing -p
  preserves access and modification times, ownership and the mode.

-getfacl [-R] <path> :
  Displays the Access Control Lists (ACLs) of files and directories. If a
  directory has a default ACL, then getfacl also displays the default ACL.

  -R      List the ACLs of all files and directories recursively.
  <path>  File or directory to list.

-getfattr [-R] {-n name | -d} [-e en] <path> :
  Displays the extended attribute names and values (if any) for a file or
  directory.

  -R             Recursively list the attributes for all files and directories.
  -n name        Dump the named extended attribute value.
  -d             Dump all extended attribute values associated with pathname.
  -e <encoding>  Encode values after retrieving them.Valid encodings are "text",
                 "hex", and "base64". Values encoded as text strings are enclosed
                 in double quotes ("), and values encoded as hexadecimal and
                 base64 are prefixed with 0x and 0s, respectively.
  <path>         The file or directory.

-getmerge [-nl] <src> <localdst> :
  Get all the files in the directories that match the source file pattern and
  merge and sort them to only one file on local fs. <src> is kept.

  -nl  Add a newline character at the end of each file.

-help [cmd ...] :
  Displays help for given command or all commands if none is specified.

-ls [-d] [-h] [-R] [<path> ...] :
  List the contents that match the specified file pattern. If path is not
  specified, the contents of /user/<currentUser> will be listed. Directory entries
  are of the form:
        permissions - userId groupId sizeOfDirectory(in bytes)
  modificationDate(yyyy-MM-dd HH:mm) directoryName

  and file entries are of the form:
        permissions numberOfReplicas userId groupId sizeOfFile(in bytes)
  modificationDate(yyyy-MM-dd HH:mm) fileName

  -d  Directories are listed as plain files.
  -h  Formats the sizes of files in a human-readable fashion rather than a number
      of bytes.
  -R  Recursively list the contents of directories.

-mkdir [-p] <path> ... :
  Create a directory in specified location.

  -p  Do not fail if the directory already exists

-moveFromLocal <localsrc> ... <dst> :
  Same as -put, except that the source is deleted after it's copied.

-moveToLocal <src> <localdst> :
  Not implemented yet

-mv <src> ... <dst> :
  Move files that match the specified file pattern <src> to a destination <dst>.
  When moving multiple files, the destination must be a directory.

-put [-f] [-p] [-l] <localsrc> ... <dst> :
  Copy files from the local file system into fs. Copying fails if the file already
  exists, unless the -f flag is given.
  Flags:

  -p  Preserves access and modification times, ownership and the mode.
  -f  Overwrites the destination if it already exists.
  -l  Allow DataNode to lazily persist the file to disk. Forces
      replication factor of 1. This flag will result in reduced
      durability. Use with care.

-renameSnapshot <snapshotDir> <oldName> <newName> :
  Rename a snapshot from oldName to newName

-rm [-f] [-r|-R] [-skipTrash] <src> ... :
  Delete all files that match the specified file pattern. Equivalent to the Unix
  command "rm <src>"

  -skipTrash  option bypasses trash, if enabled, and immediately deletes <src>
  -f          If the file does not exist, do not display a diagnostic message or
              modify the exit status to reflect an error.
  -[rR]       Recursively deletes directories

-rmdir [--ignore-fail-on-non-empty] <dir> ... :
  Removes the directory entry specified by each directory argument, provided it is
  empty.

-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>] :
  Sets Access Control Lists (ACLs) of files and directories.
  Options:

  -b          Remove all but the base ACL entries. The entries for user, group
              and others are retained for compatibility with permission bits.
  -k          Remove the default ACL.
  -R          Apply operations to all files and directories recursively.
  -m          Modify ACL. New entries are added to the ACL, and existing entries
              are retained.
  -x          Remove specified ACL entries. Other ACL entries are retained.
  --set       Fully replace the ACL, discarding all existing entries. The
              <acl_spec> must include entries for user, group, and others for
              compatibility with permission bits.
  <acl_spec>  Comma separated list of ACL entries.
  <path>      File or directory to modify.

-setfattr {-n name [-v value] | -x name} <path> :
  Sets an extended attribute name and value for a file or directory.

  -n name   The extended attribute name.
  -v value  The extended attribute value. There are three different encoding
            methods for the value. If the argument is enclosed in double quotes,
            then the value is the string inside the quotes. If the argument is
            prefixed with 0x or 0X, then it is taken as a hexadecimal number. If
            the argument begins with 0s or 0S, then it is taken as a base64
            encoding.
  -x name   Remove the extended attribute.
  <path>    The file or directory.

-setrep [-R] [-w] <rep> <path> ... :
  Set the replication level of a file. If <path> is a directory then the command
  recursively changes the replication factor of all files under the directory tree
  rooted at <path>.

  -w  It requests that the command waits for the replication to complete. This
      can potentially take a very long time.
  -R  It is accepted for backwards compatibility. It has no effect.

-stat [format] <path> ... :
  Print statistics about the file/directory at <path> in the specified format.
  Format accepts filesize in blocks (%b), group name of owner(%g), filename (%n),
  block size (%o), replication (%r), user name of owner(%u), modification date
  (%y, %Y)

-tail [-f] <file> :
  Show the last 1KB of the file.

  -f  Shows appended data as the file grows.

-test -[defsz] <path> :
  Answer various questions about <path>, with result via exit status.
    -d  return 0 if <path> is a directory.
    -e  return 0 if <path> exists.
    -f  return 0 if <path> is a file.
    -s  return 0 if file <path> is greater than zero bytes in size.
    -z  return 0 if file <path> is zero bytes in size, else return 1.

-text [-ignoreCrc] <src> ... :
  Takes a source file and outputs the file in text format.
  The allowed formats are zip and TextRecordInputStream and Avro.

-touchz <path> ... :
  Creates a file of zero length at <path> with current time as the timestamp of
  that <path>. An error is returned if the file exists with non-zero length

-usage [cmd ...] :
  Displays the usage for given command or all commands if none is specified.

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]


1. ls and lsr
[hadoop@h201 hadoop-2.6.0-cdh5.5.2]$ hadoop fs -ls /
18/01/28 22:44:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2018-01-28 19:44 /user
[hadoop@h201 hadoop-2.6.0-cdh5.5.2]$ hadoop fs -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
18/01/28 22:44:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
drwxr-xr-x   - hadoop supergroup          0 2018-01-28 19:44 /user
drwxr-xr-x   - hadoop supergroup          0 2018-01-28 19:44 /user/hadoop
drwxr-xr-x   - hadoop supergroup          0 2018-01-28 19:46 /user/hadoop/test
-rw-r--r--   2 hadoop supergroup         37 2018-01-28 19:46 /user/hadoop/test/lijieran.txt
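
The listing format is regular enough to post-process in a script. The following is a minimal sketch, assuming the eight-column layout shown in the transcript and paths without spaces; the function name hdfs_ls_files is hypothetical. It keeps only file entries and prints their replica count, size, and path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hadoop fs -ls -R` output on stdin and print
# "<replicas> <size> <path>" for regular files only.
# Assumes the 8-column layout shown in the transcript (no spaces in paths).
hdfs_ls_files() {
  # File entries have a permission string starting with "-";
  # directories start with "d" and show "-" in the replica column.
  awk 'NF >= 8 && $1 ~ /^-/ { print $2, $5, $8 }'
}

# Example with two lines copied from the listing.
printf '%s\n' \
  'drwxr-xr-x   - hadoop supergroup          0 2018-01-28 19:46 /user/hadoop/test' \
  '-rw-r--r--   2 hadoop supergroup         37 2018-01-28 19:46 /user/hadoop/test/lijieran.txt' \
  | hdfs_ls_files
```

On a live cluster the same function could be fed directly, for example: hadoop fs -ls -R / | hdfs_ls_files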

2. mkdir

hadoop fs -mkdir [-p] <path> ...
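
A script often needs to make sure a directory exists before writing to it. This is a sketch only, combining -mkdir -p with -test -d from the help text; the function name ensure_hdfs_dir and the path in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an HDFS directory (and any missing parents)
# and confirm that it exists afterwards.
ensure_hdfs_dir() {
  local dir="$1"
  # -p: do not fail if the directory already exists.
  hadoop fs -mkdir -p "$dir" || return 1
  # -test -d returns exit status 0 when the path is a directory.
  hadoop fs -test -d "$dir"
}

# Usage (needs a running cluster), with an illustrative path:
#   ensure_hdfs_dir /user/hadoop/test/2018/01 && echo "directory ready"
```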

3. df, count, and du

[hadoop@h201 hadoop-2.6.0-cdh5.5.2]$ hadoop fs -df /
18/01/28 22:49:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Filesystem               Size   Used    Available  Use%
hdfs://h201:9000  37248688128  94208  27015729152    0%

hadoop fs -count report/archive/RefreshActiveStatus/quote/2019-07-22

The output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME. Sample rows for this path and two similar ones:

1            2             280507 report/archive/RefreshActiveStatus/quote/2019-07-22
1            2               1368 report/archive/RefreshActiveStatus/instrument/2019-07-22
1            2          116935428 report/archive/RefreshSummaryCurrency/instrument/2019-07-22

[hadoop@h201 hadoop-2.6.0-cdh5.5.2]$ hadoop fs -du /user/hadoop
18/01/28 22:50:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
37  74  /user/hadoop/test
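
According to the help text, du prints the size followed by the disk space consumed. In this transcript the second number is the first multiplied by the replica count that ls reports for the file. The sketch below simply reproduces that arithmetic; reading the second column as size times replication is my interpretation of these two numbers, not something the output states.

```shell
#!/usr/bin/env bash
# Disk space consumed = logical size x replication factor.
# Values are taken from the transcripts in this article.
size_bytes=37      # first column of the du output
replication=2      # replica count shown by ls for lijieran.txt
echo $(( size_bytes * replication ))   # prints 74
```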

4. put, get, and getmerge (directories)

get

[hadoop@h201 hadoop-2.6.0-cdh5.5.2]$ hadoop fs -get /user/hadoop/test/lijieran.txt /home/hadoop

[hadoop@h201 ~]$ pwd
/home/hadoop
[hadoop@h201 ~]$ ls
hadoop-2.6.0-cdh5.5.2  hadoop-native-64-2.6.0.tar  lijieran.txt

put

[hadoop@h201 hadoop-2.6.0-cdh5.5.2]$ hadoop fs -put /home/hadoop/lijieran.txt /user/hadoop/test

Scripts usually spell out the full HDFS URI of the destination:

[hadoop@h201 hadoop-2.6.0-cdh5.5.2]$ hadoop fs -put /home/hadoop/lijieran.txt hdfs://192.168.121.132:9000/user/hadoop/test
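
In a script it can help to build the URI in one place. The following is a rough sketch; hdfs_uri and upload are hypothetical helper names, and -f (overwrite) is taken from the help text for -put.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a fully qualified HDFS URI from its parts.
hdfs_uri() {
  local host="$1" port="$2" path="$3"
  printf 'hdfs://%s:%s%s\n' "$host" "$port" "$path"
}

# Hypothetical helper: upload a local file, overwriting the destination (-f).
upload() {
  local src="$1" dst="$2"
  hadoop fs -put -f "$src" "$dst"
}

# Usage (needs a running cluster), with the address used in this article:
#   upload /home/hadoop/lijieran.txt "$(hdfs_uri 192.168.121.132 9000 /user/hadoop/test)"
hdfs_uri 192.168.121.132 9000 /user/hadoop/test
```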

Here is how the replicated data looks on a slave node (DataNode):

[hadoop@h202 subdir0]$ pwd
/home/hadoop/hadoop-2.6.0-cdh5.5.2/dfs/data/current/BP-1169871882-192.168.121.132-1516635548925/current/finalized/subdir0/subdir0

[hadoop@h202 subdir0]$ ls -l
total 12
-rw-rw-r-- 1 hadoop hadoop  37 Jan 28 19:46 blk_1073741825
-rw-rw-r-- 1 hadoop hadoop  11 Jan 28 19:46 blk_1073741825_1001.meta

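blk_1073741825 is the data file; blk_1073741825_1001.meta is its checksum (verification) file.

The block size defaults to 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x; a file larger than one block is split across additional block files.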
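
The number of block files a file needs follows from a ceiling division. A minimal sketch, with the block size passed as an argument so it works for either default; blocks_needed is a hypothetical name.

```shell
#!/usr/bin/env bash
# Number of blocks a file occupies: ceil(file_size / block_size).
# An empty file has no blocks.
blocks_needed() {
  local size="$1" block_size="$2"
  echo $(( (size + block_size - 1) / block_size ))
}

MB=$(( 1024 * 1024 ))
blocks_needed 37 $(( 64 * MB ))                # the 37-byte file above: 1 block
blocks_needed $(( 200 * MB )) $(( 64 * MB ))   # a 200 MB file: 4 blocks
```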

5. rm and rmr

rm deletes files; rmr deletes a directory recursively and is the older spelling of rm -r.

[hadoop@h201 hadoop-2.6.0-cdh5.5.2]$ hadoop fs -rm /user/hadoop/test/lijieran.txt

[hadoop@h201 hadoop-2.6.0-cdh5.5.2]$ hadoop fs -rmr /user/hadoop/test
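
Deletion in a script is safer with an existence check first. This is a sketch under the assumption that -test -e and -rm -r behave as described in the help text; remove_if_exists is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recursively delete an HDFS path only if it exists.
remove_if_exists() {
  local path="$1"
  # -test -e returns exit status 0 when the path exists.
  if hadoop fs -test -e "$path"; then
    # -rm -r deletes directories recursively.
    hadoop fs -rm -r "$path"
  else
    echo "skip: $path does not exist"
  fi
}

# Usage (needs a running cluster):
#   remove_if_exists /user/hadoop/test
```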

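6. fsck

Checks the health of files in HDFS. The check itself is carried out by the master (NameNode); the command connects to it over HTTP, as the output below shows.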

[hadoop@h201 hadoop-2.6.0-cdh5.5.2]$ hadoop fsck /user/hadoop
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

18/01/28 23:15:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://h201:50070
FSCK started by hadoop (auth:SIMPLE) from /192.168.121.132 for path /user/hadoop at Sun Jan 28 23:15:58 CST 2018
.Status: HEALTHY
 Total size:    37 B
 Total dirs:    2
 Total files:   1
 Total symlinks:                0
 Total blocks (validated):      1 (avg. block size 37 B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    2
 Average block replication:     2.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          2
 Number of racks:               1
FSCK ended at Sun Jan 28 23:15:58 CST 2018 in 2 milliseconds


The filesystem under path '/user/hadoop' is HEALTHY
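
For monitoring, the status line can be checked from a script. A minimal sketch, assuming the summary contains the text "Status: HEALTHY" as in the transcript; fsck_is_healthy is a hypothetical name, and hdfs fsck is the replacement command suggested by the deprecation notice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read fsck output on stdin and succeed only if the
# summary reports a HEALTHY status.
fsck_is_healthy() {
  grep -q 'Status: HEALTHY'
}

# Usage (needs a running cluster):
#   hdfs fsck /user/hadoop | fsck_is_healthy && echo "OK"
printf '%s\n' '.Status: HEALTHY' ' Total size:    37 B' | fsck_is_healthy && echo "healthy"
```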

7. dfsadmin

[hadoop@h201 hadoop-2.6.0-cdh5.5.2]$ hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

18/01/28 23:19:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 37248688128 (34.69 GB)
Present Capacity: 27015819264 (25.16 GB)
DFS Remaining: 27015725056 (25.16 GB)
DFS Used: 94208 (92 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.121.131:50010 (h202)
Hostname: h202
Decommission Status : Normal
Configured Capacity: 18624344064 (17.35 GB)
DFS Used: 49152 (48 KB)
Non DFS Used: 5170069504 (4.82 GB)
DFS Remaining: 13454225408 (12.53 GB)
DFS Used%: 0.00%
DFS Remaining%: 72.24%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jan 28 23:19:04 CST 2018


Name: 192.168.121.130:50010 (h203)
Hostname: h203
Decommission Status : Normal
Configured Capacity: 18624344064 (17.35 GB)
DFS Used: 45056 (44 KB)
Non DFS Used: 5062799360 (4.72 GB)
DFS Remaining: 13561499648 (12.63 GB)
DFS Used%: 0.00%
DFS Remaining%: 72.82%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jan 28 23:19:01 CST 2018
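
The figures in this report are consistent with each other: subtracting the non-DFS usage of both DataNodes from the configured capacity gives the reported present capacity. The sketch below only checks that arithmetic for the numbers shown here; that the relation holds in general is an assumption on my part.

```shell
#!/usr/bin/env bash
# Figures copied from the report above (bytes).
configured=37248688128
non_dfs_h202=5170069504
non_dfs_h203=5062799360

# Capacity available to HDFS = configured capacity - non-DFS usage.
present=$(( configured - non_dfs_h202 - non_dfs_h203 ))
echo "$present"   # prints 27015819264, the reported Present Capacity
```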

Web UI: http://192.168.121.132:50070

 

II. Parameter configuration

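1. HDFS hdfs-site.xml parameters

The names below are the Hadoop 1.x property names. Hadoop 2.x still accepts them as deprecated aliases (for example, dfs.block.size for dfs.blocksize).

  • dfs.name.dir
      • Where the NameNode stores its metadata.
      • Default: ${hadoop.tmp.dir}/dfs/name, with hadoop.tmp.dir taken from core-site.xml.
  • dfs.block.size
      • Block size for new files, in bytes. The default is 64 MB; 128 MB is recommended. Set it on every node, including clients.
      • Default: 67108864
  • dfs.data.dir
      • Where a DataNode stores blocks on local disk. May be a comma-separated list of directories, which the DataNode writes to in round-robin order. Each DataNode may use a value different from the others.
      • Default: ${hadoop.tmp.dir}/dfs/data
  • dfs.namenode.handler.count
      • Number of NameNode threads that handle RPC requests from DataNodes.
      • Recommended: 10% of the number of DataNodes, typically between 10 and 200.
      • If set too low, the DataNode logs report "connection refused" during data transfers.
      • Set on the NameNode.
      • Default: 10
  • dfs.datanode.handler.count
      • Number of threads a DataNode uses for RPC requests to the NameNode.
      • Depends on how busy the system is.
      • Too small a value degrades performance or even causes errors.
      • Set on the DataNodes.
      • Default: 3
  • dfs.datanode.max.xcievers
      • Number of data-transfer connections a DataNode can handle concurrently.
      • Default: 256
      • Recommended: 4096
  • dfs.permissions
      • If true, permissions are checked; otherwise they are not, and anyone can access any file.
      • Set on the NameNode.
      • Default: true
  • dfs.datanode.du.reserved
      • Space on each volume that HDFS may not use, in bytes.
      • Set on every DataNode.
      • Default: 0
      • Recommended: 10737418240 (10 GB); tune it to the MapReduce workload.
  • dfs.datanode.failed.volumes.tolerated
      • Number of failed disks a DataNode tolerates. Beyond this number the DataNode goes offline and all blocks on it are re-replicated.
      • Default: 0; usually raised on nodes with multiple disks.
  • dfs.replication
      • Number of replicas created for each block when a file is written.
      • Default: 3, which is also the recommended value.
      • Set on the client.
      • Usually also set on the DataNodes.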
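
To show how such settings end up in hdfs-site.xml, here is a small sketch that prints a configuration fragment. The property names are the ones used in this article and the values are its recommended ones; the helper name property is hypothetical, and the output is a fragment to review, not a complete production file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one Hadoop <property> element.
property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Emit an hdfs-site.xml fragment using values recommended in this article.
{
  echo '<configuration>'
  property dfs.block.size $(( 128 * 1024 * 1024 ))   # 128 MB in bytes
  property dfs.replication 3
  property dfs.datanode.du.reserved 10737418240      # 10 GB in bytes
  property dfs.datanode.max.xcievers 4096
  echo '</configuration>'
}
```

Redirecting the output to a file and comparing it with the existing hdfs-site.xml is safer than overwriting the live configuration.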

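2. HDFS core-site.xml parameters

  • fs.default.name
      • Name of the file system, usually the NameNode's hostname and port.
      • Must be set on every machine that accesses the cluster, including the cluster nodes themselves.
      • Example: hdfs://<your_namenode>:9000/
  • fs.checkpoint.dir
      • Comma-separated list of directories where the SecondaryNameNode stores checkpoint images.
      • If more than one directory is listed, the data is written to all of them.
      • Set on the SecondaryNameNode.
      • Default: ${hadoop.tmp.dir}/dfs/namesecondary
  • hadoop.tmp.dir
      • Temporary files for HDFS and the local disk.
      • Default: /tmp/hadoop-${user.name}. Set on all nodes.
  • fs.trash.interval
      • A deleted file is first moved to the .Trash directory under the user's home directory instead of being removed immediately.
      • It is removed for good after the number of minutes set by this parameter.
      • Default: 0 (trash disabled); 1440 (one day) is recommended.
  • io.file.buffer.size
      • Buffer size used when reading and writing data; it should be a multiple of the hardware page size.
      • Default: 4096; 65536 (64 KB) is recommended.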
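
Two of these values are easy to derive rather than hard-code. The sketch below converts a retention period in days to the minutes that fs.trash.interval expects, and checks that a candidate io.file.buffer.size is a whole multiple of the page size. The 4096 fallback is an assumption for systems where getconf is not available.

```shell
#!/usr/bin/env bash
# fs.trash.interval is expressed in minutes.
days=1
trash_interval=$(( days * 24 * 60 ))
echo "fs.trash.interval=$trash_interval"      # 1440 for one day

# Check that a candidate io.file.buffer.size is a multiple of the page size.
# 4096 is assumed as a fallback when getconf is unavailable.
page_size=$(getconf PAGESIZE 2>/dev/null || echo 4096)
buffer_size=65536
if [ $(( buffer_size % page_size )) -eq 0 ]; then
  echo "io.file.buffer.size=$buffer_size is a multiple of $page_size"
fi
```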
Original source: https://www.cnblogs.com/jieran/p/8372871.html