60 Network Programming (Part 2): URL

Understanding URI, URL, and URN

For details, see: https://blog.51cto.com/xoyabc/1905492

URI: Uniform Resource Identifier

URL: Uniform Resource Locator

URN: Uniform Resource Name

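Their relationship: URI is the general concept, and both URLs and URNs are kinds of URI. A URL identifies a resource by its location (including the protocol used to reach it), while a URN identifies a resource by a persistent name, regardless of where it is stored.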
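To see this difference in Java, here is a minimal sketch that checks strings with both java.net.URI and java.net.URL. The class UriVsUrl and its two helpers are hypothetical, written only for this illustration, and the ISBN URN is just a sample value. java.net.URI only checks syntax, so it accepts a URN; new URL(...) additionally needs a protocol handler for the scheme, and the JDK has none for urn, so it throws MalformedURLException.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical demo: every URL string is also a URI,
// but a URN such as "urn:isbn:..." is a URI that java.net.URL rejects.
public class UriVsUrl {

    // True if java.net.URL accepts the string, i.e. the JDK has a
    // protocol handler for its scheme (http, https, file, jar, ...).
    public static boolean isUrl(String s) {
        try {
            new URL(s);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    // True if the string is syntactically valid for java.net.URI.
    public static boolean isUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://www.cnblogs.com/Scorpicat/category/1596649.html", // a URL (and a URI)
            "urn:isbn:0451450523"                                       // a URN (and a URI)
        };
        for (String s : samples) {
            System.out.println(s + " -> URI: " + isUri(s) + ", URL: " + isUrl(s));
        }
    }
}
```

On a standard JDK, main reports "URI: true, URL: true" for the https address and "URI: true, URL: false" for the URN.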

URL

The class we use most often when learning Java network programming is URL (java.net.URL).

A complete URL consists of a protocol, host, port, path, parameters (the query string after ?), and an anchor (the fragment after #).
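The cnblogs address used in this post has no port, query, or anchor, so here is a sketch with a made-up URL that contains every component (www.example.com, the user name, and the parameters are hypothetical). java.net.URL only hands back the raw query string through getQuery(); the queryParams helper that splits it into key/value pairs is an assumption of this example, not part of the URL API, and it deliberately skips percent-decoding.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example: a URL containing every component listed above.
public class UrlParts {

    // Splits the raw query string ("a=1&b=2") into key/value pairs, keeping order.
    // Simplification: values are NOT percent-decoded, and a key without '='
    // maps to an empty string.
    public static Map<String, String> queryParams(URL url) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = url.getQuery();
        if (query == null || query.isEmpty()) {
            return params; // no '?' part in the URL
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq >= 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            } else {
                params.put(pair, "");
            }
        }
        return params;
    }

    public static void main(String[] args) throws MalformedURLException {
        URL url = new URL("https://user@www.example.com:8080/docs/index.html?name=java&page=2#section1");
        System.out.println("protocol = " + url.getProtocol()); // https
        System.out.println("userInfo = " + url.getUserInfo()); // user
        System.out.println("host     = " + url.getHost());     // www.example.com
        System.out.println("port     = " + url.getPort());     // 8080
        System.out.println("path     = " + url.getPath());     // /docs/index.html
        System.out.println("query    = " + url.getQuery());    // name=java&page=2
        System.out.println("ref      = " + url.getRef());      // section1 (the anchor)
        System.out.println("params   = " + queryParams(url));  // {name=java, page=2}
    }
}
```

Note that getFile() would return the path plus the query (/docs/index.html?name=java&page=2), while getPath() returns only the path.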

Test code:

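package _20191213;
import java.net.MalformedURLException;
import java.net.URL;
/**
 * URL test class: prints the components that java.net.URL parses out of an address.
 * @author TEDU
 *
 */
public class URLTest {
	public static void main(String[] args) throws MalformedURLException {
		URL url = new URL("https://www.cnblogs.com/Scorpicat/category/1596649.html");
		System.out.println(url.getProtocol());    // protocol (scheme)
		System.out.println(url.getFile());        // path plus query string, if any
		System.out.println(url.getAuthority());   // [userinfo@]host[:port]
		System.out.println(url.getDefaultPort()); // default port of the protocol
		System.out.println(url.getPort());        // explicit port, or -1 if none is given
		System.out.println(url.getQuery());       // part after '?', or null
		System.out.println(url.getHost());        // host name
		System.out.println(url.getRef());         // anchor after '#', or null
		System.out.println(url.getUserInfo());    // user info before '@', or null
	}
}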


Output:

https
/Scorpicat/category/1596649.html
www.cnblogs.com
443
-1
null
www.cnblogs.com
null
null
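In this output, getPort() prints -1 because the address does not specify a port, while getDefaultPort() prints 443, the default for https; getQuery(), getRef(), and getUserInfo() print null because those parts are absent. When the port a connection would actually use is needed, a small helper can fall back to the default. The class EffectivePort and its method are hypothetical, not part of java.net.URL.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper: the port a connection to this URL would actually use.
public class EffectivePort {

    // getPort() is -1 when the URL has no explicit port, so fall back to
    // getDefaultPort(), which comes from the protocol (443 for https, 80 for http).
    public static int effectivePort(URL url) {
        int port = url.getPort();
        return port != -1 ? port : url.getDefaultPort();
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(effectivePort(new URL("https://www.cnblogs.com/Scorpicat/category/1596649.html"))); // 443
        System.out.println(effectivePort(new URL("http://www.example.com:8080/index.html")));                 // 8080
        System.out.println(effectivePort(new URL("http://www.example.com/")));                                // 80
    }
}
```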

Fetching a web page's data with URL and IO streams

Running the program creates a file named web.txt that contains the page data from the target address.

package _20191213;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DownloadAWebPage {
	public static void main(String[] args) throws IOException {
		// Target address
		URL url = new URL("https://gy.anjuke.com/?pi=navi-tencent-qq-mz");
		// Stream steps: choose the source, choose the streams, read, close.
		// try-with-resources closes both streams even if an exception is thrown.
		// The page is read as UTF-8, and web.txt is written as UTF-8 too,
		// so the result does not depend on the platform's default charset.
		try (BufferedReader br = new BufferedReader(
				new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
			 BufferedWriter bw = Files.newBufferedWriter(Paths.get("web.txt"), StandardCharsets.UTF_8)) {
			String line;
			// readLine() strips line terminators, so newLine() adds them back
			while ((line = br.readLine()) != null) {
				System.out.println(line);
				bw.write(line);
				bw.newLine();
			}
		}
	}
}
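url.openStream() is not specific to HTTP: it works for any protocol the JDK has a handler for, including file:. That makes it possible to try the same reading logic offline, as in the sketch below. The class UrlReader, its readAll method, and the temporary test file are hypothetical; the code also assumes the content is UTF-8, as the program above does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: reads everything a URL points to into one String.
public class UrlReader {

    public static String readAll(URL url, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(url.openStream(), charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                // readLine() drops line terminators; normalize every line ending to '\n'
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary "page" and read it back through a file: URL.
        Path tmp = Files.createTempFile("page", ".html");
        Files.write(tmp, "<html>\n<body>\u4f60\u597d</body>\n</html>\n".getBytes(StandardCharsets.UTF_8));
        URL url = tmp.toUri().toURL(); // e.g. file:/tmp/page123.html
        System.out.print(readAll(url, StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```

Passing the anjuke address from DownloadAWebPage to readAll would fetch the page the same way, as long as the network is reachable.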


Original post: https://www.cnblogs.com/Scorpicat/p/12035835.html