hadoop-hdfs (Part 3)

HDFS Concepts

1 Data blocks*

An HDFS data block is 64 MB by default (128 MB from Hadoop 2.x onward) and is managed separately from metadata.

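Advantages:

      Blocks are deliberately large, so seek time is a small fraction of transfer time and read time is governed almost entirely by the transfer rate.

      Management is simpler: it is easy to compute the remaining space and to replicate blocks for redundancy (three replicas by default).

      Blocks are managed separately from metadata, so a block itself carries no file attributes.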
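
The block-size argument above can be checked with some rough arithmetic. The following is a minimal sketch: the 10 ms seek time and 100 MB/s transfer rate are assumed round numbers for illustration, and BlockMath and its method names are made up for this example.

```java
// Back-of-the-envelope block arithmetic. The seek time and transfer rate
// used in main() are illustrative assumptions, not measurements.
class BlockMath {
    /** Fraction of the total read time spent seeking, for a single block. */
    static double seekFraction(double seekMs, double transferMBps, double blockMB) {
        double transferMs = blockMB / transferMBps * 1000.0;
        return seekMs / (seekMs + transferMs);
    }

    /** Number of blocks a file occupies (the last block may be partial). */
    static long blocksNeeded(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    /** Raw bytes stored across the cluster for a given replication factor. */
    static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.printf("seek share, 64 MB block: %.2f%%%n", 100 * seekFraction(10, 100, 64));
        System.out.printf("seek share, 4 KB block:  %.2f%%%n", 100 * seekFraction(10, 100, 4.0 / 1024));
        System.out.println("blocks for a 200 MB file: " + blocksNeeded(200 * mb, 64 * mb));
        System.out.println("raw MB with 3 replicas:   " + rawBytes(200 * mb, 3) / mb);
    }
}
```

With these assumed numbers a 64 MB block spends under 2% of its read time seeking, while a 4 KB block spends nearly all of it seeking.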

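2 NameNode and DataNode*

NameNode:

1 Manages the filesystem namespace

2 Maintains the filesystem tree and the metadata for its files and directories, persisted on local disk as the namespace image file and the edit log file

3 Keeps the metadata for every block

4 Manages multiple DataNodes

Backup strategies: also write the metadata to a remote disk, or run two NameNodes at the same time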

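DataNode:

1 The worker node of the filesystem

2 Periodically sends its block list to the NameNode

3 Stores and retrieves blocks as directed by the NameNode and by clients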
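
The division of labor above (DataNodes report their blocks, the NameNode keeps the block metadata) can be pictured with a toy in-memory model. This is only a sketch to fix the idea: ToyNameNode, receiveBlockReport and locate are hypothetical names, not Hadoop classes or APIs, and the real NameNode does far more.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: the names here are hypothetical, not Hadoop APIs.
class ToyNameNode {
    // block ID -> DataNodes currently reporting a replica of that block
    private final Map<Long, Set<String>> blockLocations = new HashMap<>();
    // DataNode -> block IDs from its most recent report
    private final Map<String, Set<Long>> lastReport = new HashMap<>();

    /** A DataNode's periodic block report replaces whatever it reported before. */
    void receiveBlockReport(String dataNode, Collection<Long> blockIds) {
        Set<Long> previous = lastReport.getOrDefault(dataNode, Collections.emptySet());
        for (Long id : previous) {
            Set<String> holders = blockLocations.get(id);
            if (holders != null) {
                holders.remove(dataNode);
                if (holders.isEmpty()) {
                    blockLocations.remove(id);
                }
            }
        }
        Set<Long> current = new HashSet<>(blockIds);
        lastReport.put(dataNode, current);
        for (Long id : current) {
            blockLocations.computeIfAbsent(id, k -> new TreeSet<>()).add(dataNode);
        }
    }

    /** DataNodes holding a replica of the block, according to the latest reports. */
    Set<String> locate(long blockId) {
        return blockLocations.getOrDefault(blockId, Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        nn.receiveBlockReport("dn1", Arrays.asList(1L, 2L));
        nn.receiveBlockReport("dn2", Arrays.asList(2L, 3L));
        System.out.println("block 2 -> " + nn.locate(2L));
        nn.receiveBlockReport("dn1", Arrays.asList(1L)); // dn1 no longer reports block 2
        System.out.println("block 2 -> " + nn.locate(2L));
    }
}
```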

3 External interfaces

Thrift: the interface Hadoop exposes to callers written in languages other than Java

HTTP: web-based monitoring

FTP: file transfer

4 Java interface

1 Reading with the URL API

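    // Imports: java.io.IOException, java.io.InputStream, java.net.URL,
    // org.apache.hadoop.fs.FsUrlStreamHandlerFactory, org.junit.Test
    @Test
    public void input1() throws IOException {
        // Register the hdfs:// scheme handler; the JVM allows this call only once.
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
        // try-with-resources closes the stream even if reading fails.
        try (InputStream in = new URL("hdfs://192.168.1.100:9000/user/sunfan/input/file1.txt").openStream()) {
            byte[] buff = new byte[1024];
            int len;
            while (-1 != (len = in.read(buff))) {
                // Write raw bytes; casting each byte to char mangles multi-byte characters.
                System.out.write(buff, 0, len);
            }
            System.out.flush();
        }
    }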

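2 Reading with the FileSystem API: using the FSDataInputStream stream (its seek method repositions the read to any absolute offset, unlike InputStream's skip, which only moves forward by a relative amount) *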

    // Imports: java.io.IOException, java.net.URI, org.apache.hadoop.conf.Configuration,
    // org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.FSDataInputStream,
    // org.apache.hadoop.fs.Path, org.junit.Test
    @Test
    public void input2() throws IOException {
        String uri = "hdfs://192.168.1.100:9000/user/sunfan/input/file1.txt";
        FileSystem fs = FileSystem.get(URI.create(uri), new Configuration());
        try (FSDataInputStream in = fs.open(new Path(uri))) {
            byte[] buff = new byte[1024];
            int len;
            // First pass: print the whole file.
            while (-1 != (len = in.read(buff))) {
                System.out.write(buff, 0, len);
            }
            // Jump back to absolute byte offset 3 and print from there to the end.
            in.seek(3);
            while (-1 != (len = in.read(buff))) {
                System.out.write(buff, 0, len);
            }
            System.out.flush();
        }
    }
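
The seek/skip difference can be tried without a cluster. The sketch below is a local stand-in, not HDFS code: it assumes java.io.RandomAccessFile.seek is a fair analogue of FSDataInputStream.seek (both take an absolute byte offset), and SeekVsSkip is a name made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Local stand-in: RandomAccessFile.seek plays the role of FSDataInputStream.seek.
class SeekVsSkip {
    /** Reads from the current position to the end of the file as ASCII text. */
    static String readRest(RandomAccessFile f) throws IOException {
        byte[] rest = new byte[(int) (f.length() - f.getFilePointer())];
        f.readFully(rest);
        return new String(rest, StandardCharsets.US_ASCII);
    }

    /** Returns { text read after seek(3), text read after two skip(3) calls }. */
    static String[] demo() {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        try {
            Path tmp = Files.createTempFile("seek-demo", ".txt");
            try {
                Files.write(tmp, data);
                String afterSeek;
                try (RandomAccessFile f = new RandomAccessFile(tmp.toFile(), "r")) {
                    readRest(f);   // read to the end of the file first
                    f.seek(3);     // absolute: jump back to byte offset 3
                    afterSeek = readRest(f);
                }
                InputStream in = new ByteArrayInputStream(data);
                in.skip(3);        // relative: offset 0 -> 3
                in.skip(3);        // relative: offset 3 -> 6, and there is no way back
                byte[] buf = new byte[16];
                int n = in.read(buf);
                String afterSkip = new String(buf, 0, n, StandardCharsets.US_ASCII);
                return new String[] { afterSeek, afterSkip };
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println("after seek(3):         " + r[0]); // 3456789
        System.out.println("after skip(3) + skip(3): " + r[1]); // 6789
    }
}
```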

Writing data: FSDataOutputStream

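    @Test
    public void out3() throws IOException {
        String uri2 = "hdfs://192.168.1.100:9000/user/sunfan/input/file3.txt";
        FileSystem fs = FileSystem.get(URI.create(uri2), new Configuration());
        // create() makes the file (overwriting an existing one) and returns a stream to it.
        try (FSDataOutputStream out = fs.create(new Path(uri2))) {
            System.out.println(fs.exists(new Path(uri2)));
            out.write(97); // writes the single byte 'a'
        } // closing the stream flushes the data; the original never closed it
    }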

Copying a local file: note that Progressable is implemented here to print a progress indicator, and IOUtils.copyBytes performs the copy

    @Test
    public void out4() throws IOException {
        long start = System.currentTimeMillis();
        // Backslashes in a Java string literal must be escaped.
        FileInputStream in = new FileInputStream("C:\\Users\\sunfan\\Desktop\\copy.pdf");
        String uri2 = "hdfs://192.168.1.100:9000/user/sunfan/input/file3.txt";
        FileSystem fs = FileSystem.get(URI.create(uri2), new Configuration());
        // Hadoop calls progress() periodically while data is being written.
        FSDataOutputStream out = fs.create(new Path(uri2), new Progressable() {
            public void progress() {
                System.out.print(".");
            }
        });
        // The final argument (true) closes both streams when the copy finishes.
        IOUtils.copyBytes(in, out, 4096, true);
        System.out.println(System.currentTimeMillis() - start);
    }
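
To see the shape of the callback pattern used above without a cluster, here is a rough local analogue. It is a sketch under assumptions: Progress is a hypothetical stand-in for org.apache.hadoop.util.Progressable (which likewise has a single progress() method), and calling it once per buffer is a choice made for this example; Hadoop decides for itself when to invoke the real callback.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Local analogue only: Progress is a hypothetical stand-in for Hadoop's Progressable.
class ProgressCopy {
    interface Progress {
        void progress();
    }

    /** Copies in to out and calls progress.progress() after each buffer is written. */
    static long copy(InputStream in, OutputStream out, int bufferSize, Progress progress)
            throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
            progress.progress();
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[10000]);
        OutputStream out = new ByteArrayOutputStream();
        // Print one dot per buffer, like the progress indicator in the HDFS copy.
        long copied = copy(in, out, 4096, () -> System.out.print("."));
        System.out.println();
        System.out.println("bytes copied: " + copied);
    }
}
```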

Reading file details: fs.getFileStatus returns a FileStatus

    @Test
    public void showFilesystem() throws IOException {
        String dir = "hdfs://192.168.1.100:9000/user/sunfan/input";
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.1.100:9000"), new Configuration());
        FileStatus status = fs.getFileStatus(new Path(dir));
        System.out.println(status.getPermission());
    }

Listing files: fs.listStatus returns an array of FileStatus

    @Test
    public void showFilesystem2() throws IOException {
        String dir = "hdfs://192.168.1.100:9000/user/sunfan/input";
        FileSystem fs = FileSystem.get(URI.create(dir), new Configuration());
        // One FileStatus per entry directly under dir.
        FileStatus[] status = fs.listStatus(new Path(dir));
        for (FileStatus fileStatus : status) {
            System.out.println(fileStatus.getPath());
        }
    }

Matching files with a pattern: fs.globStatus (the pattern is a glob with wildcards, not a full regular expression)

    @Test
    public void showFilesystem3() throws IOException {
        String dir = "hdfs://192.168.1.100:9000/user/sunfan/input";
        FileSystem fs = FileSystem.get(URI.create(dir), new Configuration());
        // "*" matches every entry directly under dir.
        FileStatus[] status = fs.globStatus(new Path(dir + "/*"));
        for (FileStatus fileStatus : status) {
            System.out.println(fileStatus.getPath());
        }
    }
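
Glob patterns can be tried locally before running them against HDFS. The sketch below uses the JDK's own glob matcher (java.nio.file.PathMatcher) as a stand-in. Its syntax is similar to, but not identical to, what fs.globStatus accepts, so treat it as a way to get a feel for wildcards rather than a reference for Hadoop's rules. GlobDemo and the file names are made up for the example.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Stand-in using the JDK glob matcher; Hadoop's glob syntax is similar but not identical.
class GlobDemo {
    /** True if the single file name matches the glob pattern. */
    static boolean matches(String glob, String name) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(name));
    }

    public static void main(String[] args) {
        String[][] cases = {
            { "*.txt", "file1.txt" },             // * matches any run of characters
            { "file?.txt", "file1.txt" },         // ? matches exactly one character
            { "file?.txt", "file10.txt" },        // two characters where ? allows one
            { "file[1-3].txt", "file3.txt" },     // character range
            { "{file1,file3}.txt", "file2.txt" }  // alternation
        };
        for (String[] c : cases) {
            System.out.println(c[0] + " vs " + c[1] + " -> " + matches(c[0], c[1]));
        }
    }
}
```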

Original article: https://www.cnblogs.com/sunfan1988/p/4296495.html