Java Network Programming: URL and URI

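A URL (Uniform Resource Locator) uniquely identifies the location of a resource on the Internet. URLs are the most common kind of URI (Uniform Resource Identifier). A URI can identify a resource by its network location (as a URL does), or by its name, number, or some other property, so every URL is a URI but not every URI is a URL. For example, suppose a host on the network has the DNS name www.search.com. That site can be reached over HTTP, FTP, or HTTPS, and http://www.search.com is one URL for it. The same resource may therefore be reachable through several URLs.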
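
To make the "every URL is a URI, but not every URI is a URL" point concrete, here is a minimal sketch. The class and method names (UriVersusUrl, convertsToUrl) are made up for illustration, and the ISBN URN is just a sample identifier. It relies on the fact that the JDK has no protocol handler for the urn scheme, so converting such a URI to a URL fails.

```java
import java.net.MalformedURLException;
import java.net.URI;

class UriVersusUrl {
    // Returns true if java.net can turn the URI into a URL,
    // i.e. the URI is absolute and a protocol handler exists for its scheme.
    static boolean convertsToUrl(URI uri) {
        try {
            uri.toURL();
            return true;
        } catch (MalformedURLException | IllegalArgumentException e) {
            // MalformedURLException: unknown protocol
            // IllegalArgumentException: the URI is not absolute
            return false;
        }
    }

    public static void main(String[] args) {
        URI locator = URI.create("http://www.search.com/"); // identifies by location
        URI name = URI.create("urn:isbn:0451450523");       // identifies by name only
        System.out.println(convertsToUrl(locator)); // true
        System.out.println(convertsToUrl(name));    // false
    }
}
```

Neither call touches the network; toURL() only checks that the scheme has a handler.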

The URL class

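public static void main(String[] args) throws Exception {// constructing URLs
        // from a complete string
        URL u1 = new URL("http://www.cnblogs.com/zumengjie/p/14959200.html");
        // from protocol, host, and file (the port is left unspecified)
        URL u2 = new URL("http","www.cnblogs.com","/zumengjie/p/14959200.html");
        // from protocol, host, port, and file
        URL u3 = new URL("http","www.cnblogs.com",80,"/zumengjie/p/14959200.html");
        // a relative URL resolved against the base URL u3
        URL u4 = new URL(u3,"14945701.html");
}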
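
The last constructor, new URL(u3, "14945701.html"), resolves a relative reference against a base URL. The sketch below shows what that kind of resolution produces. It uses URI.resolve() as a stand-in because it needs no checked exceptions; as far as I know both follow the same rules for simple cases like these, but treat that as an assumption. RelativeResolution and resolve are illustrative names.

```java
import java.net.URI;

class RelativeResolution {
    // Resolves a relative reference against a base, the way a browser
    // resolves a link found inside the base document.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.cnblogs.com:80/zumengjie/p/14959200.html";
        // A bare file name replaces the last path segment.
        System.out.println(resolve(base, "14945701.html"));
        // http://www.cnblogs.com:80/zumengjie/p/14945701.html

        // A leading slash replaces the whole path.
        System.out.println(resolve(base, "/index.html"));
        // http://www.cnblogs.com:80/index.html
    }
}
```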
public static void main(String[] args) {// open a stream from the URL and read its content (assuming the content is text)
        try {
            URL u1 = new URL("https://www.cnblogs.com/zumengjie/p/14959200.html");
            try (BufferedReader br = new BufferedReader(new InputStreamReader(u1.openStream()))) {
                String s = null;
                while ((s = br.readLine()) != null) {
                    System.out.println(s);
                }
            }
        } catch (Exception e) {
            System.out.println(e);
        }
}
public static void main(String[] args) {// obtain a URLConnection, with or without a proxy; this object exposes more details than the URL itself
        try {
            URL u1 = new URL("https://www.cnblogs.com/zumengjie/p/14959200.html");
            URLConnection oc = u1.openConnection();// use u1.openConnection(Proxy proxy) to go through a proxy
            //oc.getOutputStream();
            //oc.getContentEncoding();
            //oc.getContentType();
            //....
            InputStream stream = oc.getInputStream();
            try (BufferedReader br = new BufferedReader(new InputStreamReader(stream))) {
                String s = null;
                while ((s = br.readLine()) != null) {
                    System.out.println(s);
                }
            }
        } catch (Exception e) {
            System.out.println(e);
        }
}
public static void main(String[] args) {
// getContent() looks for the Content-type field in the header of the data returned by the server. If the server sent no MIME header
// or sent an unfamiliar Content-type, getContent() returns some kind of InputStream. Otherwise it returns an object of the matching Java type.
        try {
            URL u1 = new URL("https://img-pre.ivsky.com/img/tupian/pre/201910/01/dongman_meinv-004.jpg");
            Object content = u1.getContent();
            System.out.println(content.getClass().getName());// sun.awt.image.URLImageSource
            URL u2 = new URL("https://www.cnblogs.com/zumengjie/p/14959200.html");
            Object content2 = u2.getContent();
            System.out.println(content2.getClass().getName());// sun.net.www.protocol.http.HttpURLConnection$HttpInputStream
        } catch (Exception e) {
            System.out.println(e);
        }
}

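Decomposing a URL

A URL consists of the following five parts: the scheme (also called the protocol), the authority, the path, the fragment identifier (also called the ref, i.e. an anchor within the page), and the query string (also called the parameters).

For example, in the URL https://www.cnblogs.com/zumengjie/p/14959200.html?1=1#top, the scheme is https, the authority is www.cnblogs.com, the path is /zumengjie/p/14959200.html, the query string is 1=1, and the fragment identifier is top. The fragment identifier and the query string are optional.

The authority can be further divided into user info, host, and port. For example, in the URL http://admin@www.blackstar.com:8080/, the authority is admin@www.blackstar.com:8080, which contains the user info admin, the host www.blackstar.com, and the port 8080.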

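public static void main(String[] args) {// reading the individual components
        try {
            URL u1 = new URL("http://admin@www.blackstar.com:8080/aa/?1=1#top");
            System.out.println(u1.getProtocol());// the scheme: http
            System.out.println(u1.getHost());// the host name: www.blackstar.com
            System.out.println(u1.getPort());// the port: 8080; returns -1 if the URL does not specify one
            System.out.println(u1.getDefaultPort());// the default port of the protocol: 80
            System.out.println(u1.getFile());// the path plus the query string: /aa/?1=1
            System.out.println(u1.getPath());// the path only, without the query string: /aa/
            System.out.println(u1.getRef());// the fragment identifier (anchor): top
            System.out.println(u1.getQuery());// the query string (parameters): 1=1
            System.out.println(u1.getUserInfo());// the user info between the scheme and the host: admin; most URLs have none
            System.out.println(u1.getAuthority());// everything between the scheme and the path: admin@www.blackstar.com:8080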
        } catch (Exception e) {
            System.out.println(e);
        }
}
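
Because getPort() returns -1 when the URL has no explicit port, code that needs the port a connection would actually use has to fall back to getDefaultPort(). A small sketch of that fallback follows; EffectivePort, parse, and effectivePort are hypothetical helper names, not part of java.net.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class EffectivePort {
    // Wraps the checked exception so the helper is easy to call in examples.
    static URL parse(String text) {
        try {
            return URI.create(text).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // The explicit port if the URL has one, otherwise the protocol's default.
    static int effectivePort(URL url) {
        int port = url.getPort();
        return port != -1 ? port : url.getDefaultPort();
    }

    public static void main(String[] args) {
        System.out.println(effectivePort(parse("http://admin@www.blackstar.com:8080/aa/?1=1#top"))); // 8080
        System.out.println(effectivePort(parse("http://www.cnblogs.com/")));                        // 80
        System.out.println(effectivePort(parse("https://www.cnblogs.com/")));                       // 443
    }
}
```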

Equality and comparison

public static void main(String[] args) {// two URLs are equal if they resolve to the same host and have the same protocol, port, path, query string, and fragment; only the user info may differ
        try {
            URL u1 = new URL("https://admin@127.0.0.1/zumengjie/p/14897556.html?1=1#a5");
            URL u2 = new URL("https://users@localhost/zumengjie/p/14897556.html?1=1#a5");
            System.out.println(u2.equals(u1));
        } catch (Exception e) {
            System.out.println(e);
        }
}
// URL.equals() may perform blocking I/O, because comparing hosts can require a DNS lookup! Avoid it where possible.
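
One common way to sidestep the blocking lookup is to compare URI objects instead, since URI.equals() is a pure string comparison. The sketch below (UriEquality and sameUri are illustrative names) also shows the trade-off: URI treats 127.0.0.1 and localhost as different hosts, and it does not consider an omitted port equal to the default port.

```java
import java.net.URI;

class UriEquality {
    // Compares two URIs component by component without any network access.
    static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        // Scheme and host are compared case-insensitively.
        System.out.println(sameUri("HTTPS://WWW.CNBLOGS.COM/zumengjie/",
                                   "https://www.cnblogs.com/zumengjie/")); // true
        // No DNS resolution: different host strings are simply different.
        System.out.println(sameUri("https://admin@127.0.0.1/zumengjie/",
                                   "https://users@localhost/zumengjie/")); // false
        // An omitted port is not the same as an explicit default port.
        System.out.println(sameUri("https://www.cnblogs.com/",
                                   "https://www.cnblogs.com:443/"));       // false
    }
}
```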

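The URI class

A URL object represents an application-layer protocol for retrieving something from the network, whereas a URI object is purely for parsing and manipulating strings. The URI class has no network retrieval capability. Although the URL class has some string-parsing methods, such as getFile() and getRef(), many of them are flawed and do not behave exactly as the relevant specifications require. In general, if you want to download the content at a URL, use the URL class; if you want to use a URL for identification rather than retrieval (for example, to represent an XML namespace), use the URI class. When you need both, you can convert a URI to a URL with toURL(), and a URL to a URI with toURI().

Constructing a URI does not check whether the host or path actually exists.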
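
A short sketch of both points: the conversion in each direction, and the fact that nothing is looked up along the way. The host no-such-host.invalid is a made-up name under the reserved .invalid top-level domain, so it cannot exist, yet construction and conversion still succeed. UriUrlConversion and roundTrip are illustrative names.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class UriUrlConversion {
    // Converts URI -> URL -> URI. Neither step contacts the network.
    static URI roundTrip(URI uri) {
        try {
            URL url = uri.toURL(); // fails only for unknown schemes or relative URIs
            return url.toURI();    // fails only if the URL is not valid URI syntax
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://no-such-host.invalid/missing/page.html");
        System.out.println(uri.getHost());                 // no-such-host.invalid
        System.out.println(roundTrip(uri).equals(uri));    // true
    }
}
```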

public static void main(String[] args) throws Exception {
        
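        // five constructors
        // note: several calls below pass "#a3" as the fragment; the '#' then becomes part of the
        // fragment value and is percent-encoded as %23 in the resulting URI (pass "a3" for a plain anchor)
        // from a complete string (the query string must come before the fragment)
        URI u1 = new URI("https://www.cnblogs.com/zumengjie/p/14897556.html?1=1#a3");
        // scheme, scheme-specific part (here host + path), fragment
        URI u2 = new URI("https","//www.cnblogs.com/zumengjie/p/14897556.html","#a3");
        // scheme, host, path, fragment
        URI u3 = new URI("https","www.cnblogs.com","/zumengjie/p/14897556.html","#a3");
        // scheme, authority, path, query, fragment
        URI u4 = new URI("https","www.cnblogs.com","/zumengjie/p/14897556.html","1=1","#a3");
        // scheme, user info, host, port, path, query, fragment
        URI u5 = new URI("https","user:dfsn","www.cnblogs.com",80,"/zumengjie/p/14897556.html","1=1","#a3");
        
        
        // creating a URI through the static factory method
        URI u6 = URI.create("https://www.cnblogs.com/zumengjie/p/14897556.html?1=1#a3");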
    
    }
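
Two behaviours of these constructors are easy to miss, so here is a small sketch of them. The single-string constructor rejects illegal characters such as a space, while the multi-argument constructors percent-encode them for you. UriConstruction, parses, and build are illustrative names, and the path "/a b.html" is a made-up example.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriConstruction {
    // True if the text is accepted by the single-string constructor.
    static boolean parses(String text) {
        try {
            new URI(text);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Builds a URI from components; illegal characters are quoted automatically.
    static String build(String scheme, String host, String path, String fragment) {
        try {
            return new URI(scheme, host, path, fragment).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("https://www.cnblogs.com/a b.html"));   // false
        System.out.println(parses("https://www.cnblogs.com/a%20b.html")); // true
        System.out.println(build("https", "www.cnblogs.com", "/a b.html", "a3"));
        // https://www.cnblogs.com/a%20b.html#a3
        System.out.println(build("https", "www.cnblogs.com", "/a b.html", "#a3"));
        // https://www.cnblogs.com/a%20b.html#%23a3
    }
}
```

URI.create() behaves like the single-string constructor but throws the unchecked IllegalArgumentException instead, which is convenient for constants known to be valid.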

Parsing the parts of a URI

public static void main(String[] args) throws Exception {
        URI u1 = new URI("https", "user:dfsn", "www.cnblogs.com", 80, "/zumengjie/p/14897556.html", "1=1", "#a3");
        System.out.println("----" + u1.getScheme());// https
        System.out.println("----" + u1.getSchemeSpecificPart());// //user:dfsn@www.cnblogs.com:80/zumengjie/p/14897556.html?1=1
        System.out.println("----" + u1.getRawSchemeSpecificPart());// //user:dfsn@www.cnblogs.com:80/zumengjie/p/14897556.html?1=1
        System.out.println("----" + u1.getFragment());// #a3
        System.out.println("----" + u1.getRawFragment());// %23a3
        System.out.println("----" + u1.isAbsolute());// true; it would be false if the scheme passed to the constructor were null
        System.out.println("----" + u1.isOpaque());// false: the URI is hierarchical, not opaque
        System.out.println("=========================");
        // If the URI is hierarchical (not opaque), like the one created above, its individual components can be read. The following methods return decoded values; for example, the # character appears decoded.
        System.out.println("----" + u1.getAuthority());// user:dfsn@www.cnblogs.com:80
        System.out.println("----" + u1.getFragment());// #a3
        System.out.println("----" + u1.getHost());// www.cnblogs.com
        System.out.println("----" + u1.getPath());// /zumengjie/p/14897556.html
        System.out.println("----" + u1.getPort());// 80; -1 would mean the port was omitted
        System.out.println("----" + u1.getQuery());// 1=1
        System.out.println("----" + u1.getUserInfo());// user:dfsn
        // The following methods return the raw values, still encoded. The # character is encoded as %23.
        System.out.println("=======================");
        System.out.println("----" + u1.getRawAuthority());//user:dfsn@www.cnblogs.com:80
        System.out.println("----" + u1.getRawFragment());//%23a3
        System.out.println("----" + u1.getRawPath());// /zumengjie/p/14897556.html
        System.out.println("----" + u1.getRawQuery());//1=1
        System.out.println("----" + u1.getRawUserInfo());//user:dfsn
    }
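
For contrast, here is a sketch of an opaque URI, where only the scheme, the scheme-specific part, and the fragment are available and the hierarchical getters return null. The mailto address is a made-up example, and OpaqueUri and describe are illustrative names.

```java
import java.net.URI;

class OpaqueUri {
    // Summarizes a URI according to whether it is opaque or hierarchical.
    static String describe(URI uri) {
        return uri.isOpaque()
                ? "opaque:" + uri.getSchemeSpecificPart()
                : "hierarchical:" + uri.getPath();
    }

    public static void main(String[] args) {
        URI mail = URI.create("mailto:user@example.com");
        System.out.println(describe(mail));  // opaque:user@example.com
        System.out.println(mail.getHost());  // null
        System.out.println(mail.getPath());  // null
        System.out.println(describe(URI.create("https://www.cnblogs.com/zumengjie/")));
        // hierarchical:/zumengjie/
    }
}
```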

Decoding a URI

public static void main(String[] args) throws Exception {
        URI u1 = new URI("https://image.baidu.com/search/detail?ct=503316480&z=0&ipn=d&word=灵主图片&hs=0&pn=5&spn=0&di=440&pi=0&rn=1&tn=baiduimagedetail&is=0%2C0&ie=utf-8&oe=utf-8&cl=2&lm=-1&cs=2250253212%2C2891258082&os=3911047135%2C2284680691&simid=3285346737%2C265045608&adpicid=0&lpn=0&ln=30&fr=ala&fm=&sme=&cg=&bdtype=0&oriquery=%E7%81%B5%E4%B8%BB%E5%9B%BE%E7%89%87&objurl=https%3A%2F%2Fgimg2.baidu.com%2Fimage_search%2Fsrc%3Dhttp%3A%2F%2Fpic1.win4000.com%2Fpic%2F0%2F5d%2Fa38c5786c7_250_300.jpg%26refer%3Dhttp%3A%2F%2Fpic1.win4000.com%26app%3D2002%26size%3Df9999%2C10000%26q%3Da80%26n%3D0%26g%3D0n%26fmt%3Djpeg%3Fsec%3D1628343782%26t%3Dad65ecffa0156eb065679753e862e31b&fromurl=ippr_z2C%24qAzdH3FAzdH3Fooo_z%26e3Botg9aaa_z%26e3Bv54AzdH3F4pAzdH3Fi7w3twg2i7zitstg2zi7_z%26e3Bip4s&gsm=1&islist=&querylist=");    
        System.out.println(u1.toString());// printed exactly as given
        System.out.println(u1.toASCIIString());// non-ASCII characters are percent-encoded, so the result is pure US-ASCII
    }
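
The long search URL makes the difference hard to spot, so here is the same idea on a shortened, hypothetical version of it with a single word parameter. The four Chinese characters are written as Unicode escapes only so that the source file compiles under any encoding. AsciiForm and toAscii are illustrative names.

```java
import java.net.URI;

class AsciiForm {
    // Percent-encodes every non-ASCII character as UTF-8 octets.
    static String toAscii(String text) {
        return URI.create(text).toASCIIString();
    }

    public static void main(String[] args) {
        // The same four characters as the word parameter in the search URL above.
        String url = "https://image.baidu.com/search/detail?word=\u7075\u4e3b\u56fe\u7247";
        System.out.println(toAscii(url));
        // https://image.baidu.com/search/detail?word=%E7%81%B5%E4%B8%BB%E5%9B%BE%E7%89%87

        // Existing percent escapes are left alone, not encoded a second time.
        System.out.println(toAscii("https://image.baidu.com/search/detail?is=0%2C0"));
        // https://image.baidu.com/search/detail?is=0%2C0
    }
}
```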

URLEncoder

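The URLEncoder.encode() method URL-encodes a string. Every character that is not a letter or digit is converted to a % sequence, except for the space, underscore, hyphen, period, and asterisk. All non-ASCII characters are encoded as well. The space is converted to a plus sign, while the tilde, single quote, exclamation mark, and parentheses are converted to percent escapes even though they do not strictly need to be.

Although this method lets you specify the character set, it is best to always choose UTF-8. Compared with any other encoding you might pick, UTF-8 is more compatible with the IRI specification, the URL class, modern web browsers, and other software.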

public static void main(String[] args) throws Exception {
        System.out.println(URLEncoder.encode("This string has spaces","UTF-8"));
        System.out.println(URLEncoder.encode("This*string*has*spaces","UTF-8"));
        System.out.println(URLEncoder.encode("This%string%has%spaces","UTF-8"));
        System.out.println(URLEncoder.encode("This+string+has+spaces","UTF-8"));
        System.out.println(URLEncoder.encode("This/string/has/spaces","UTF-8"));
        System.out.println(URLEncoder.encode("This\"string\"has\"spaces","UTF-8"));
        System.out.println(URLEncoder.encode("This:string:has:spaces","UTF-8"));
        System.out.println(URLEncoder.encode("This~string~has~spaces","UTF-8"));
        System.out.println(URLEncoder.encode("This(string)has(spaces)","UTF-8"));
        System.out.println(URLEncoder.encode("This.string.has.spaces","UTF-8"));
        System.out.println(URLEncoder.encode("This=string=has=spaces","UTF-8"));
        System.out.println(URLEncoder.encode("This&string&has&spaces","UTF-8"));
        System.out.println(URLEncoder.encode("天下熙熙皆为利来,天下攘攘皆为利往.","UTF-8"));
    }
This+string+has+spaces
This*string*has*spaces
This%25string%25has%25spaces
This%2Bstring%2Bhas%2Bspaces
This%2Fstring%2Fhas%2Fspaces
This%22string%22has%22spaces
This%3Astring%3Ahas%3Aspaces
This%7Estring%7Ehas%7Espaces
This%28string%29has%28spaces%29
This.string.has.spaces
This%3Dstring%3Dhas%3Dspaces
This%26string%26has%26spaces
%E5%A4%A9%E4%B8%8B%E7%86%99%E7%86%99%E7%9A%86%E4%B8%BA%E5%88%A9%E6%9D%A5%2C%E5%A4%A9%E4%B8%8B%E6%94%98%E6%94%98%E7%9A%86%E4%B8%BA%E5%88%A9%E5%BE%80.
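
Since encode() also escapes /, :, &, and =, it cannot be applied to a whole URL. The usual pattern is to encode each parameter name and value separately and then join them. A minimal sketch follows; QueryBuilder and buildQuery are illustrative names, the parameter names are made up, and the Charset overload of encode() assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class QueryBuilder {
    // Encodes each name and value on its own, then joins the pairs with '&'.
    static String buildQuery(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            joiner.add(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("word", "This string has spaces");
        params.put("filter", "a&b=c");
        System.out.println(buildQuery(params));
        // word=This+string+has+spaces&filter=a%26b%3Dc
    }
}
```

The & and = inside the value are escaped, so they can no longer be mistaken for the separators between parameters.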

URLDecoder

    public static void main(String[] args) throws Exception {
        System.out.println(URLEncoder.encode("天下熙熙皆为利来,天下攘攘皆为利往.","UTF-8"));
        System.out.println(URLDecoder.decode("%E5%A4%A9%E4%B8%8B%E7%86%99%E7%86%99%E7%9A%86%E4%B8%BA%E5%88%A9%E6%9D%A5%2C%E5%A4%A9%E4%B8%8B%E6%94%98%E6%94%98%E7%9A%86%E4%B8%BA%E5%88%A9%E5%BE%80.", "UTF-8"));
    }
%E5%A4%A9%E4%B8%8B%E7%86%99%E7%86%99%E7%9A%86%E4%B8%BA%E5%88%A9%E6%9D%A5%2C%E5%A4%A9%E4%B8%8B%E6%94%98%E6%94%98%E7%9A%86%E4%B8%BA%E5%88%A9%E5%BE%80.
天下熙熙皆为利来,天下攘攘皆为利往.
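
Decoding has the mirror-image pitfall: split the query string on & and = first, and only then decode each piece, otherwise an escaped %26 or %3D would be mistaken for a separator. A minimal sketch, with QueryParser and parseQuery as illustrative names and the Charset overload of decode() assuming Java 10 or later:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryParser {
    // Splits a raw query string into pairs, then decodes each name and value.
    static Map<String, String> parseQuery(String rawQuery) {
        Map<String, String> result = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return result;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            result.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> params = parseQuery("word=This+string+has+spaces&filter=a%26b%3Dc");
        System.out.println(params.get("word"));   // This string has spaces
        System.out.println(params.get("filter")); // a&b=c
    }
}
```

If a name repeats, this sketch keeps only the last value; a real parser would probably collect the values in a list.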

 

Original article: https://www.cnblogs.com/zumengjie/p/14963644.html