Accessing HDFS with Hadoop WebHDFS

                                   Author: 尹正杰 (Yin Zhengjie)

Copyright notice: this is an original work. Reproduction is not permitted; violators will be held legally liable.

 

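  WebHDFS and HttpFS are both HTTP/HTTPS REST interfaces to Hadoop. They let us read and write HDFS data and run several HDFS administrative commands. Both can be embedded in programs and scripts, or driven from command-line tools such as curl or wget.

  A raw WebHDFS REST request targets one specific NameNode and does not fail over by itself in a high-availability (HA) NameNode setup, whereas HttpFS, acting as a gateway, supports HA.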

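I. WebHDFS overview

  Applications running inside a Hadoop cluster use Hadoop's native client to work with HDFS data. Sometimes, however, HDFS has to be reached from outside the cluster in order to process, store, and retrieve HDFS data.

  An application that speaks the native HDFS protocol needs Hadoop installed on the server it runs on, together with the Java dependencies the application requires.

  Hadoop's WebHDFS provides a full set of HTTP REST APIs. REST is an architectural style for building large-scale web services, and here it lets applications access and use HDFS remotely. Besides making external access easier, WebHDFS is also useful when working with two Hadoop clusters that run different Hadoop versions.

  Because WebHDFS is a REST API, it does not depend on the MapReduce or HDFS version, so it works across both clusters. For example, it can be used when the DistCp utility copies data between the two clusters.

  With WebHDFS, a remote client does not need Hadoop installed; well-known tools such as curl and wget are enough to reach HDFS data. WebHDFS connects directly to the Hadoop cluster to carry out HDFS operations.

  WebHDFS uses the basic HTTP methods GET, PUT, POST, and DELETE to operate on the HDFS file system remotely.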
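  As a rough illustration of how operations map onto HTTP methods, the Python sketch below builds request URLs of the form http://<host>:<port>/webhdfs/v1/<path>?op=<OP>. The operation-to-method table follows the WebHDFS documentation; the helper name webhdfs_request is hypothetical and not part of any Hadoop library, and it only builds the request without sending it.

```python
from urllib.parse import quote, urlencode

# HTTP method used by a few common WebHDFS operations (per the WebHDFS REST docs).
HTTP_METHOD = {
    "OPEN": "GET",
    "GETFILESTATUS": "GET",
    "LISTSTATUS": "GET",
    "GETCONTENTSUMMARY": "GET",
    "MKDIRS": "PUT",
    "CREATE": "PUT",
    "APPEND": "POST",
    "DELETE": "DELETE",
}


def webhdfs_request(host, port, path, op, params=None):
    """Return (http_method, url) for one WebHDFS operation.

    Hypothetical helper for illustration: it builds the request only.
    """
    op = op.upper()
    query = urlencode({"op": op, **(params or {})})
    url = f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"
    return HTTP_METHOD[op], url


if __name__ == "__main__":
    # Host and port are the ones used in this article's cluster.
    print(webhdfs_request("hadoop101.yinzhengjie.com", 50070,
                          "/yinzhengjie", "liststatus"))
```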

  Recommended reading:
    https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html

  Note:
    If Kerberos authentication is enabled on your HDFS cluster, pay attention to the following parameters (set in hdfs-site.xml):
      dfs.web.authentication.kerberos.principal
      dfs.web.authentication.kerberos.keytab

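II. Accessing HDFS through the WebHDFS REST API with the HDFS command-line tool

  Using WebHDFS is simple: replace the HDFS file system URI (hdfs://) with a webhdfs:// URI that points at the NameNode's HTTP port. The following cases show how.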
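  The URI rewrite can be sketched in Python as follows. The function name to_webhdfs_uri is hypothetical, and the default port 50070 is an assumption taken from this article's Hadoop 2.x cluster; other clusters may use a different NameNode HTTP port (see dfs.namenode.http-address).

```python
from urllib.parse import urlsplit, urlunsplit


def to_webhdfs_uri(hdfs_uri, http_port=50070):
    """Rewrite hdfs://host:rpc_port/path as webhdfs://host:http_port/path.

    http_port is an assumption: 50070 is the NameNode HTTP port in this
    article's cluster, not a universal value.
    """
    parts = urlsplit(hdfs_uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"not an hdfs:// URI: {hdfs_uri!r}")
    netloc = f"{parts.hostname}:{http_port}"
    return urlunsplit(("webhdfs", netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(to_webhdfs_uri("hdfs://hadoop101.yinzhengjie.com:9000/yinzhengjie"))
```

  The resulting URI can be passed to hdfs dfs in place of the hdfs:// one.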

1>.List all files and directories under the HDFS directory "/yinzhengjie"

[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /        #No file system is named on the command line, so the default one defined by the "fs.defaultFS" property in "core-site.xml" is used.
Found 4 items
drwxr-xr-x   - root admingroup          0 2020-08-21 16:40 /bigdata
drwxr-xr-x   - root admingroup          0 2020-08-20 19:26 /system
drwx------   - root admingroup          0 2020-08-14 19:19 /user
drwxr-xr-x   - root admingroup          0 2020-08-21 18:42 /yinzhengjie
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls hdfs://hadoop101.yinzhengjie.com:9000/yinzhengjie
Found 3 items
-rw-r--r--   3 root admingroup        371 2020-08-21 16:45 hdfs://hadoop101.yinzhengjie.com:9000/yinzhengjie/hosts
-rw-r--r--   3 root admingroup         69 2020-08-14 23:22 hdfs://hadoop101.yinzhengjie.com:9000/yinzhengjie/wc.txt.gz
drwxr-xr-x   - root admingroup          0 2020-08-14 23:13 hdfs://hadoop101.yinzhengjie.com:9000/yinzhengjie/yum.repos.d
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/            
Found 3 items
-rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
-rw-r--r--   3 root admingroup         69 2020-08-14 23:22 /yinzhengjie/wc.txt.gz
drwxr-xr-x   - root admingroup          0 2020-08-14 23:13 /yinzhengjie/yum.repos.d
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie        #Access HDFS over the webhdfs scheme
Found 3 items
-rw-r--r--   3 root admingroup        371 2020-08-21 16:45 webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/hosts
-rw-r--r--   3 root admingroup         69 2020-08-14 23:22 webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/wc.txt.gz
drwxr-xr-x   - root admingroup          0 2020-08-14 23:13 webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/yum.repos.d
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# 

2>.Upload a local file to the HDFS cluster

[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
Found 3 items
-rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
-rw-r--r--   3 root admingroup         69 2020-08-14 23:22 /yinzhengjie/wc.txt.gz
drwxr-xr-x   - root admingroup          0 2020-08-14 23:13 /yinzhengjie/yum.repos.d
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -put /etc/fstab webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/fstab        #Upload the local file "/etc/fstab" to the HDFS directory "/yinzhengjie/"
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
Found 4 items
-rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
-rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
-rw-r--r--   3 root admingroup         69 2020-08-14 23:22 /yinzhengjie/wc.txt.gz
drwxr-xr-x   - root admingroup          0 2020-08-14 23:13 /yinzhengjie/yum.repos.d
[root@hadoop105.yinzhengjie.com ~]# 

3>.Download files or directories from the HDFS file system

[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
Found 4 items
-rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
-rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
-rw-r--r--   3 root admingroup         69 2020-08-14 23:22 /yinzhengjie/wc.txt.gz
drwxr-xr-x   - root admingroup          0 2020-08-14 23:13 /yinzhengjie/yum.repos.d
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# ll
total 0
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -get webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/yum.repos.d      #Download a directory
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# ll
total 0
drwxr-xr-x 2 root root 229 Aug 31 14:32 yum.repos.d
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# ll yum.repos.d/
total 40
-rw-r--r-- 1 root root 1664 Aug 31 14:32 CentOS-Base.repo
-rw-r--r-- 1 root root 1309 Aug 31 14:32 CentOS-CR.repo
-rw-r--r-- 1 root root  649 Aug 31 14:32 CentOS-Debuginfo.repo
-rw-r--r-- 1 root root  314 Aug 31 14:32 CentOS-fasttrack.repo
-rw-r--r-- 1 root root  630 Aug 31 14:32 CentOS-Media.repo
-rw-r--r-- 1 root root 1331 Aug 31 14:32 CentOS-Sources.repo
-rw-r--r-- 1 root root 5701 Aug 31 14:32 CentOS-Vault.repo
-rw-r--r-- 1 root root  951 Aug 31 14:32 epel.repo
-rw-r--r-- 1 root root 1050 Aug 31 14:32 epel-testing.repo
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
Found 4 items
-rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
-rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
-rw-r--r--   3 root admingroup         69 2020-08-14 23:22 /yinzhengjie/wc.txt.gz
drwxr-xr-x   - root admingroup          0 2020-08-14 23:13 /yinzhengjie/yum.repos.d
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# ll
total 0
drwxr-xr-x 2 root root 229 Aug 31 14:32 yum.repos.d
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -get webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/wc.txt.gz       #Download a file
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# ll
total 4
-rw-r--r-- 1 root root  69 Aug 31 14:33 wc.txt.gz
drwxr-xr-x 2 root root 229 Aug 31 14:32 yum.repos.d
[root@hadoop105.yinzhengjie.com ~]# 

4>.Delete files or directories from the HDFS file system

[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
Found 4 items
-rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
-rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
-rw-r--r--   3 root admingroup         69 2020-08-14 23:22 /yinzhengjie/wc.txt.gz
drwxr-xr-x   - root admingroup          0 2020-08-14 23:13 /yinzhengjie/yum.repos.d
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -rm -r webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/yum.repos.d        #Delete a directory
20/08/31 14:38:12 INFO fs.TrashPolicyDefault: Moved: 'webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/yum.repos.d' to trash at: webhdfs://hadoop101.yinzhengjie.com:50070/user/root/.Trash/Current/yinzhengjie/yum.repos.d
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
Found 3 items
-rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
-rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
-rw-r--r--   3 root admingroup         69 2020-08-14 23:22 /yinzhengjie/wc.txt.gz
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
Found 3 items
-rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
-rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
-rw-r--r--   3 root admingroup         69 2020-08-14 23:22 /yinzhengjie/wc.txt.gz
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -rm webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/wc.txt.gz            #Delete a file
20/08/31 14:38:28 INFO fs.TrashPolicyDefault: Moved: 'webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/wc.txt.gz' to trash at: webhdfs://hadoop101.yinzhengjie.com:50070/user/root/.Trash/Current/yinzhengjie/wc.txt.gz
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
Found 2 items
-rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
-rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# 

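5>.Other operations

  With the four cases above as a foundation, exploring the remaining operations on your own should be straightforward. Usage is essentially the same as the hdfs dfs tool I covered earlier; the only change is to use the webhdfs scheme in place of hdfs.

  Recommended reading:
    https://www.cnblogs.com/yinzhengjie2020/p/13296680.html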

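III. Accessing HDFS through the WebHDFS REST API with curl

  WebHDFS is a fairly comprehensive tool that offers many commands for accessing and working with HDFS data. This part shows how to call the WebHDFS REST API with curl.

  I will not cover curl itself in depth here; for the basics, see my notes linked below or other references online. Commonly used curl options:
    -A/--user-agent <string>:
      Set the User-Agent sent to the server

    -e/--referer <URL>:
      Referrer URL

    --cacert <file>:
      CA certificate (SSL)

    -k/--insecure:
      Allow SSL connections without verifying the certificate

    --compressed:
      Request a compressed response

    -H/--header <line>:
      Pass a custom header to the server

    -i:
      Show the response headers together with the body

    -I/--head:
      Show only the response headers

    -D/--dump-header <file>:
      Write the response headers to the given file

    --basic:
      Use HTTP Basic authentication

    -u/--user <user[:password]>:
      Set the user name and password for the server

    -L:
      On a 3xx response, re-send the request to the new location

    -O:
      Save the file locally under the file name taken from the URL

    -o <file>:
      Save the output to the given file

    --limit-rate <rate>:
      Limit the transfer rate

    -0/--http1.0:
      The digit zero; use HTTP 1.0

    -v/--verbose:
      More detailed output

    -C:
      Resume an interrupted transfer

    -c/--cookie-jar <file name>:
      Write the cookies received to the given file

    -x/--proxy <proxyhost[:port]>:
      Use the given proxy server

    -X/--request <command>:
      Send the given request method to the server

    -U/--proxy-user <user:password>:
      User name and password for the proxy server

    -T:
      Upload the given local file (for example to an FTP server)

    --data/-d:
      Send the given data in a POST request

    -b name=data:
      Send cookies to the server, such as values received earlier via Set-Cookie

  Recommended reading:
    https://www.cnblogs.com/yinzhengjie/p/7719804.html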

1>.Read a file in HDFS (this case reads "/yinzhengjie/hosts")

[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
Found 2 items
-rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
-rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# curl -i -L "http://hadoop101.yinzhengjie.com:50070/webhdfs/v1/yinzhengjie/hosts?op=OPEN&user.name=yinzhengjie"      #op selects the operation; user.name sets the user making the request
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Mon, 31 Aug 2020 07:39:16 GMT
Date: Mon, 31 Aug 2020 07:39:16 GMT
Pragma: no-cache
Expires: Mon, 31 Aug 2020 07:39:16 GMT
Date: Mon, 31 Aug 2020 07:39:16 GMT
Pragma: no-cache
Content-Type: application/octet-stream
X-FRAME-OPTIONS: SAMEORIGIN
Set-Cookie: hadoop.auth="u=yinzhengjie&p=yinzhengjie&t=simple&e=1598895556829&s=ak8QrD/3I7HowelGDzH9uvnDeAGBihJhCbCm0wVqS2M="; Path=/; HttpOnly
Location: http://hadoop104.yinzhengjie.com:50075/webhdfs/v1/yinzhengjie/hosts?op=OPEN&user.name=yinzhengjie&namenoderpcaddress=hadoop101.yinzhengjie.com:9000&offset=0
Content-Length: 0

HTTP/1.1 200 OK
Access-Control-Allow-Methods: GET
Access-Control-Allow-Origin: *
Content-Type: application/octet-stream
Connection: close
Content-Length: 371

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

#Hadoop 2.x
172.200.6.101 hadoop101.yinzhengjie.com
172.200.6.102 hadoop102.yinzhengjie.com
172.200.6.103 hadoop103.yinzhengjie.com
172.200.6.104 hadoop104.yinzhengjie.com
172.200.6.105 hadoop105.yinzhengjie.com
[root@hadoop105.yinzhengjie.com ~]# 
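
  The transcript above shows that OPEN is answered in two hops: the NameNode replies with 307 and a Location header that points at a DataNode, and curl -L then fetches the data from there. As a small sketch, the Python below takes that Location header apart; the function name describe_redirect is hypothetical, and the header value is copied from the transcript.

```python
from urllib.parse import parse_qs, urlsplit

# Location header copied from the 307 response in the transcript.
LOCATION = (
    "http://hadoop104.yinzhengjie.com:50075/webhdfs/v1/yinzhengjie/hosts"
    "?op=OPEN&user.name=yinzhengjie"
    "&namenoderpcaddress=hadoop101.yinzhengjie.com:9000&offset=0"
)


def describe_redirect(location):
    """Split the Location header of a WebHDFS 307 response into its parts."""
    parts = urlsplit(location)
    query = parse_qs(parts.query)
    return {
        "datanode": f"{parts.hostname}:{parts.port}",
        # Strip the fixed REST prefix to recover the HDFS path.
        "path": parts.path[len("/webhdfs/v1"):],
        "op": query["op"][0],
        "namenode_rpc": query["namenoderpcaddress"][0],
        "offset": int(query["offset"][0]),
    }


if __name__ == "__main__":
    for key, value in describe_redirect(LOCATION).items():
        print(f"{key}: {value}")
```

  One practical consequence is that the client has to be able to resolve and reach the DataNode host named in Location, not only the NameNode.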

2>.Check the status of an HDFS directory

[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /
Found 4 items
drwxr-xr-x   - root admingroup          0 2020-08-21 16:40 /bigdata
drwxr-xr-x   - root admingroup          0 2020-08-20 19:26 /system
drwx------   - root admingroup          0 2020-08-14 19:19 /user
drwxr-xr-x   - root admingroup          0 2020-08-31 14:38 /yinzhengjie
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
Found 2 items
-rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
-rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# curl -i -L "http://hadoop101.yinzhengjie.com:50070/webhdfs/v1/yinzhengjie?op=LISTSTATUS"        #Show the status of the "/yinzhengjie" directory
HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Mon, 31 Aug 2020 07:51:31 GMT
Date: Mon, 31 Aug 2020 07:51:31 GMT
Pragma: no-cache
Expires: Mon, 31 Aug 2020 07:51:31 GMT
Date: Mon, 31 Aug 2020 07:51:31 GMT
Pragma: no-cache
Content-Type: application/json
X-FRAME-OPTIONS: SAMEORIGIN
Transfer-Encoding: chunked

{"FileStatuses":{"FileStatus":[
{"accessTime":1598855175268,"blockSize":536870912,"childrenNum":0,"fileId":16489,"group":"admingroup","length":490,"modificationTime":1598855175823,"owner":"root","pathSuffix":"fstab","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1598859477240,"blockSize":536870912,"childrenNum":0,"fileId":16484,"group":"admingroup","length":371,"modificationTime":1597999554986,"owner":"root","pathSuffix":"hosts","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"}]}}
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# 
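
  To relate the LISTSTATUS response to the familiar hdfs dfs -ls listing, here is a minimal Python sketch that turns the JSON into ls-style fields. The function names are hypothetical, and SAMPLE is an abbreviated copy of the response above with only the fields the sketch reads.

```python
import json

# Abbreviated copy of the LISTSTATUS body shown in the transcript.
SAMPLE = (
    '{"FileStatuses":{"FileStatus":['
    '{"group":"admingroup","length":490,"owner":"root",'
    '"pathSuffix":"fstab","permission":"644","type":"FILE"},'
    '{"group":"admingroup","length":371,"owner":"root",'
    '"pathSuffix":"hosts","permission":"644","type":"FILE"}]}}'
)


def permission_string(octal, is_dir):
    """Turn an octal permission such as "644" into "-rw-r--r--".

    Only the last three digits are used, so a leading sticky-bit digit is ignored.
    """
    bits = "".join(
        ("r" if int(d) & 4 else "-")
        + ("w" if int(d) & 2 else "-")
        + ("x" if int(d) & 1 else "-")
        for d in octal[-3:]
    )
    return ("d" if is_dir else "-") + bits


def list_status_rows(body, parent):
    """Return (permissions, owner, group, length, full_path) for each entry."""
    rows = []
    for st in json.loads(body)["FileStatuses"]["FileStatus"]:
        rows.append((
            permission_string(st["permission"], st["type"] == "DIRECTORY"),
            st["owner"],
            st["group"],
            st["length"],
            f'{parent.rstrip("/")}/{st["pathSuffix"]}',
        ))
    return rows


if __name__ == "__main__":
    for row in list_status_rows(SAMPLE, "/yinzhengjie"):
        print(*row)
```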

3>.Check the status of an HDFS file

[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
Found 2 items
-rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
-rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# curl -i -L "http://hadoop101.yinzhengjie.com:50070/webhdfs/v1/yinzhengjie/hosts?op=GETFILESTATUS" ;echo       #Show the status of the "/yinzhengjie/hosts" file
HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Mon, 31 Aug 2020 07:58:53 GMT
Date: Mon, 31 Aug 2020 07:58:53 GMT
Pragma: no-cache
Expires: Mon, 31 Aug 2020 07:58:53 GMT
Date: Mon, 31 Aug 2020 07:58:53 GMT
Pragma: no-cache
Content-Type: application/json
X-FRAME-OPTIONS: SAMEORIGIN
Transfer-Encoding: chunked

{"FileStatus":{"accessTime":1598859477240,"blockSize":536870912,"childrenNum":0,"fileId":16484,"group":"admingroup","length":371,"modificationTime":1597999554986,"owner":"root","pathSuffix":"","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"}}
[root@hadoop105.yinzhengjie.com ~]# 
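
  The accessTime and modificationTime fields are milliseconds since the Unix epoch. A short Python sketch for decoding them follows; the function name file_times is hypothetical, and BODY is copied from the response above. The result is in UTC, while hdfs dfs -ls prints local time (the 16:45 in the listing suggests this cluster runs at UTC+8, which is my assumption).

```python
import json
from datetime import datetime, timezone

# GETFILESTATUS body copied from the transcript.
BODY = (
    '{"FileStatus":{"accessTime":1598859477240,"blockSize":536870912,'
    '"childrenNum":0,"fileId":16484,"group":"admingroup","length":371,'
    '"modificationTime":1597999554986,"owner":"root","pathSuffix":"",'
    '"permission":"644","replication":3,"storagePolicy":0,"type":"FILE"}}'
)


def file_times(body):
    """Return (access_time, modification_time) as timezone-aware UTC datetimes."""
    status = json.loads(body)["FileStatus"]

    def to_utc(millis):
        # WebHDFS reports epoch milliseconds; datetime expects seconds.
        return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)

    return to_utc(status["accessTime"]), to_utc(status["modificationTime"])


if __name__ == "__main__":
    accessed, modified = file_times(BODY)
    print("accessed:", accessed.isoformat())
    print("modified:", modified.isoformat())
```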

4>.Create a directory

[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /
Found 4 items
drwxr-xr-x   - root admingroup          0 2020-08-21 16:40 /bigdata
drwxr-xr-x   - root admingroup          0 2020-08-20 19:26 /system
drwx------   - root admingroup          0 2020-08-14 19:19 /user
drwxr-xr-x   - root admingroup          0 2020-08-31 16:17 /yinzhengjie
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
Found 2 items
-rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
-rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
[root@hadoop105.yinzhengjie.com ~]# 
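[root@hadoop105.yinzhengjie.com ~]# curl -i -X PUT "http://hadoop101.yinzhengjie.com:50070/webhdfs/v1/yinzhengjie/webHDFS?user.name=root&op=MKDIRS&permissions=751" ;echo     #Create the directory "/yinzhengjie/webHDFS". The documented parameter name is "permission"; "permissions" as typed here is not applied, which is why the listing below shows the default rwxr-xr-x.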
HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Mon, 31 Aug 2020 08:14:10 GMT
Date: Mon, 31 Aug 2020 08:14:10 GMT
Pragma: no-cache
Expires: Mon, 31 Aug 2020 08:14:10 GMT
Date: Mon, 31 Aug 2020 08:14:10 GMT
Pragma: no-cache
Content-Type: application/json
X-FRAME-OPTIONS: SAMEORIGIN
Set-Cookie: hadoop.auth="u=root&p=root&t=simple&e=1598897650918&s=rp1JdtIpaV59fm8TFisjCUMH3ARerDWzI4oL+jCezrs="; Path=/; HttpOnly
Transfer-Encoding: chunked

{"boolean":true}
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
Found 3 items
-rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
-rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
drwxr-xr-x   - root admingroup          0 2020-08-31 16:14 /yinzhengjie/webHDFS
[root@hadoop105.yinzhengjie.com ~]# 
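
  For reference, a minimal Python sketch that builds the MKDIRS URL with the parameter name used in the WebHDFS documentation (permission, an octal string). The function name mkdirs_url is hypothetical; the sketch only builds the URL and does not send the request.

```python
from urllib.parse import urlencode


def mkdirs_url(host, port, path, user, permission="755"):
    """Build the URL for a WebHDFS MKDIRS request (to be sent with HTTP PUT)."""
    # Accept three or four octal digits, e.g. "751" or "1777".
    if not (3 <= len(permission) <= 4 and all(c in "01234567" for c in permission)):
        raise ValueError(f"permission must be 3 or 4 octal digits: {permission!r}")
    query = urlencode({"op": "MKDIRS", "user.name": user, "permission": permission})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"


if __name__ == "__main__":
    print(mkdirs_url("hadoop101.yinzhengjie.com", 50070,
                     "/yinzhengjie/webHDFS", "root", "751"))
```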

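5>.Create a file and write data to it

  I am using Hadoop 2.10.0. Creating a file and appending to an existing file with the methods in the official WebHDFS documentation both failed for me. Each of the two documented methods requires two HTTP requests, and despite several attempts I could not get a file created. If you have made it work, I would appreciate your advice.

  Reference links:
    https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Create_and_Write_to_a_File
    https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Append_to_a_File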
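
  For readers who want to experiment, here is a Python sketch of the two-request flow as the official documentation describes it: a PUT without data to the NameNode, which should answer 307 with a Location header, followed by a PUT with the file content to that location, which should answer 201 Created. It has not been verified against the cluster in this article. The function names are hypothetical, and the client must be able to resolve and reach the DataNode host returned in Location.

```python
import http.client
from urllib.parse import urlencode, urlsplit


def split_location(location):
    """Return (host, port, path_and_query) from a redirect Location header."""
    parts = urlsplit(location)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return parts.hostname, parts.port or 80, target


def create_file(namenode, port, hdfs_path, data, user, overwrite=False):
    """Two-step CREATE; returns the final HTTP status (201 expected per the docs)."""
    query = urlencode({
        "op": "CREATE",
        "user.name": user,
        "overwrite": "true" if overwrite else "false",
    })
    # Step 1: ask the NameNode where to write; no file data is sent yet.
    nn = http.client.HTTPConnection(namenode, port, timeout=30)
    nn.request("PUT", f"/webhdfs/v1{hdfs_path}?{query}")
    first = nn.getresponse()
    first.read()
    location = first.getheader("Location")
    nn.close()
    if first.status != 307 or not location:
        raise RuntimeError(f"expected a 307 redirect, got {first.status}")
    # Step 2: send the file content to the DataNode named in Location.
    host, dn_port, target = split_location(location)
    dn = http.client.HTTPConnection(host, dn_port, timeout=30)
    dn.request("PUT", target, body=data)
    status = dn.getresponse().status
    dn.close()
    return status
```

  A call would look like create_file("hadoop101.yinzhengjie.com", 50070, "/yinzhengjie/demo.txt", b"hello\n", "root"), where demo.txt is a made-up file name.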

6>.Delete a directory or file

[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
Found 3 items
-rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
-rw-r--r--   3 root admingroup        371 2020-08-31 18:07 /yinzhengjie/hosts
drwxr-xr-x   - root admingroup          0 2020-08-31 18:07 /yinzhengjie/webHDFS
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# curl -i -X DELETE  "http://hadoop101.yinzhengjie.com:50070/webhdfs/v1/yinzhengjie/webHDFS?op=DELETE&user.name=root";echo     #Delete a directory
HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Mon, 31 Aug 2020 10:07:56 GMT
Date: Mon, 31 Aug 2020 10:07:56 GMT
Pragma: no-cache
Expires: Mon, 31 Aug 2020 10:07:56 GMT
Date: Mon, 31 Aug 2020 10:07:56 GMT
Pragma: no-cache
Content-Type: application/json
X-FRAME-OPTIONS: SAMEORIGIN
Set-Cookie: hadoop.auth="u=root&p=root&t=simple&e=1598904476157&s=4aHgz6EwyJfdmjlwOtkXs+8Je94BybNxDUYoon7FIWE="; Path=/; HttpOnly
Transfer-Encoding: chunked

{"boolean":true}
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
Found 2 items
-rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
-rw-r--r--   3 root admingroup        371 2020-08-31 18:07 /yinzhengjie/hosts
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
Found 2 items
-rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
-rw-r--r--   3 root admingroup        371 2020-08-31 18:07 /yinzhengjie/hosts
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# curl -i -X DELETE  "http://hadoop101.yinzhengjie.com:50070/webhdfs/v1/yinzhengjie/fstab?op=DELETE&user.name=root";echo       #Delete a file
HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Mon, 31 Aug 2020 10:08:52 GMT
Date: Mon, 31 Aug 2020 10:08:52 GMT
Pragma: no-cache
Expires: Mon, 31 Aug 2020 10:08:52 GMT
Date: Mon, 31 Aug 2020 10:08:52 GMT
Pragma: no-cache
Content-Type: application/json
X-FRAME-OPTIONS: SAMEORIGIN
Set-Cookie: hadoop.auth="u=root&p=root&t=simple&e=1598904532486&s=MCjvGp705lVZcZx7hc5UCeERNoRDGC5rsW5E/USXi6c="; Path=/; HttpOnly
Transfer-Encoding: chunked

{"boolean":true}
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
Found 1 items
-rw-r--r--   3 root admingroup        371 2020-08-31 18:07 /yinzhengjie/hosts
[root@hadoop105.yinzhengjie.com ~]# 
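
  A small Python sketch of the same DELETE call follows, with hypothetical function names. The recursive parameter comes from the WebHDFS documentation and is needed for non-empty directories; the webHDFS directory deleted above was empty. Unlike the hdfs dfs -rm runs earlier, the REST responses here show no "Moved ... to trash" message, so the deletion appears to bypass the trash.

```python
import json
from urllib.parse import urlencode


def delete_url(host, port, path, user, recursive=False):
    """Build the URL for a WebHDFS DELETE request (to be sent with HTTP DELETE)."""
    query = urlencode({
        "op": "DELETE",
        "user.name": user,
        "recursive": "true" if recursive else "false",
    })
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"


def deleted(body):
    """Return True when the response body is {"boolean": true}."""
    return json.loads(body).get("boolean") is True


if __name__ == "__main__":
    print(delete_url("hadoop101.yinzhengjie.com", 50070,
                     "/yinzhengjie/webHDFS", "root", recursive=True))
    print(deleted('{"boolean":true}'))
```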

7>.Check directory quotas

[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -count -h -v -q /yinzhengjie
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf            none             inf            1            2                742 /yinzhengjie
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# curl -i   "http://hadoop101.yinzhengjie.com:50070/webhdfs/v1/yinzhengjie?op=GETCONTENTSUMMARY" ;echo 
HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Mon, 31 Aug 2020 10:30:13 GMT
Date: Mon, 31 Aug 2020 10:30:13 GMT
Pragma: no-cache
Expires: Mon, 31 Aug 2020 10:30:13 GMT
Date: Mon, 31 Aug 2020 10:30:13 GMT
Pragma: no-cache
Content-Type: application/json
X-FRAME-OPTIONS: SAMEORIGIN
Transfer-Encoding: chunked

{"ContentSummary":{"directoryCount":1,"fileCount":2,"length":742,"quota":-1,"spaceConsumed":29631,"spaceQuota":-1,"typeQuota":{}}}
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfsadmin -setSpaceQuota 10g /yinzhengjie/
[root@hadoop105.yinzhengjie.com ~]# hdfs dfsadmin -setQuota 50 /yinzhengjie/
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfs -count -h -v -q /yinzhengjie
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
          50              47            10 G          10.0 G            1            2                742 /yinzhengjie
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# curl -i   "http://hadoop101.yinzhengjie.com:50070/webhdfs/v1/yinzhengjie?op=GETCONTENTSUMMARY" ;echo 
HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Mon, 31 Aug 2020 10:30:52 GMT
Date: Mon, 31 Aug 2020 10:30:52 GMT
Pragma: no-cache
Expires: Mon, 31 Aug 2020 10:30:52 GMT
Date: Mon, 31 Aug 2020 10:30:52 GMT
Pragma: no-cache
Content-Type: application/json
X-FRAME-OPTIONS: SAMEORIGIN
Transfer-Encoding: chunked

{"ContentSummary":{"directoryCount":1,"fileCount":2,"length":742,"quota":50,"spaceConsumed":29631,"spaceQuota":10737418240,"typeQuota":{}}}
[root@hadoop105.yinzhengjie.com ~]# 
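
  The REM_QUOTA and REM_SPACE_QUOTA columns of hdfs dfs -count -q can be derived from the GETCONTENTSUMMARY response. The Python sketch below does that, treating -1 as "no quota set"; the function name remaining_quota is hypothetical, and the two sample bodies are copied from the transcript.

```python
import json

# Response bodies copied from the transcript, before and after setting quotas.
BEFORE = ('{"ContentSummary":{"directoryCount":1,"fileCount":2,"length":742,'
          '"quota":-1,"spaceConsumed":29631,"spaceQuota":-1,"typeQuota":{}}}')
AFTER = ('{"ContentSummary":{"directoryCount":1,"fileCount":2,"length":742,'
         '"quota":50,"spaceConsumed":29631,"spaceQuota":10737418240,"typeQuota":{}}}')


def remaining_quota(body):
    """Return (remaining_names, remaining_bytes); None means no quota is set."""
    cs = json.loads(body)["ContentSummary"]
    # The name quota counts directories and files together.
    used_names = cs["directoryCount"] + cs["fileCount"]
    rem_names = None if cs["quota"] < 0 else cs["quota"] - used_names
    rem_space = None if cs["spaceQuota"] < 0 else cs["spaceQuota"] - cs["spaceConsumed"]
    return rem_names, rem_space


if __name__ == "__main__":
    print(remaining_quota(BEFORE))
    print(remaining_quota(AFTER))
```

  For the second response the remaining name quota works out to 50 - (1 + 2) = 47, which matches the REM_QUOTA column printed by hdfs dfs -count above.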

8>.Other operations

  Recommended reading:
    https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html

Original article: https://www.cnblogs.com/yinzhengjie2020/p/13352498.html