I once used the following program to connect to NetEase (163.com) and fetch the HTML of its home page:
try {
    String urlPath = "http://www.163.com/";
    URL url = new URL(urlPath);
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setRequestMethod("GET");
    connection.connect();

    int responseCode = connection.getResponseCode();
    if (responseCode == HttpURLConnection.HTTP_OK) {
        // Backslashes in a Java string literal must be escaped.
        File dir = new File("D:\\logs\\");
        if (!dir.exists()) {
            dir.mkdirs();
        }
        File file = new File(dir, "163.txt");

        // try-with-resources closes both streams even if the copy fails.
        try (InputStream inputStream = connection.getInputStream();
             FileOutputStream fos = new FileOutputStream(file)) {
            byte[] buf = new byte[1024 * 8];
            int len;
            while ((len = inputStream.read(buf)) != -1) {
                fos.write(buf, 0, len);
            }
        }
    } else {
        System.out.println("download file failed because responseCode=" + responseCode);
    }
} catch (Exception e) {
    e.printStackTrace();
}
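However, the download logic never ran. Execution went into the else branch instead, because the response code was 503.
503 means Service Unavailable, that is, the server is not ready to handle the request. Yet NetEase opened normally in my browser, so I considered two possibilities:
1. NetEase uses an anti-crawler mechanism, and browser information has to be added to the request headers to get past it.
2. The 503 did not necessarily come from NetEase; an intermediate hop on the route could have returned it just as well.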
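One way to gather evidence for the second possibility is to print the response headers that came back with the 503. This is only a sketch: the class ResponseHeaderDump and its format method are hypothetical helpers, not part of the original program, and the timeouts are arbitrary. Headers such as Server or Via may hint at who produced the response, but they are not proof.

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original program.
final class ResponseHeaderDump {

    // Turns the map returned by URLConnection.getHeaderFields() into printable lines.
    // For HTTP connections the status line is stored under a null key.
    static List<String> format(Map<String, List<String>> headers) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            String name = (entry.getKey() == null) ? "(status)" : entry.getKey();
            lines.add(name + ": " + String.join(", ", entry.getValue()));
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        HttpURLConnection connection =
                (HttpURLConnection) new URL("http://www.163.com/").openConnection();
        connection.setRequestMethod("GET");
        connection.setConnectTimeout(5000); // arbitrary 5 second limits
        connection.setReadTimeout(5000);
        // getHeaderFields() sends the request if needed and also works for error responses.
        for (String line : format(connection.getHeaderFields())) {
            System.out.println(line);
        }
        connection.disconnect();
    }
}
```

If the printed headers name a gateway or proxy product rather than the target site's server, that points toward something on the route having answered.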
Testing showed that adding browser information to the request headers made no difference.
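For reference, adding browser information to the headers can be sketched roughly as follows. The original post does not show which headers were tried, so the class BrowserHeaderSketch, the User-Agent value and the Accept header here are all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of the original program.
final class BrowserHeaderSketch {

    // Illustrative value only; any real browser's User-Agent string could be used.
    static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";

    // Opens (but does not yet connect) a GET connection carrying browser-like headers.
    static HttpURLConnection open(URL url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        // Request headers must be set before connect() or getResponseCode() is called.
        connection.setRequestProperty("User-Agent", USER_AGENT);
        connection.setRequestProperty("Accept", "text/html");
        return connection;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection connection = open(new URL("http://www.163.com/"));
        // No network traffic has happened yet; this only shows what would be sent.
        System.out.println(connection.getRequestProperty("User-Agent"));
    }
}
```

Calling getResponseCode() on the returned connection would then send the request with these headers attached.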
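Then it occurred to me that the browser is configured to use a proxy. How could HttpURLConnection reach the Internet without one? So I added the proxy settings at the start of the code: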
// Set the proxy
System.setProperty("http.proxyHost", "pkg.proxy.prod.jp.local");
System.setProperty("http.proxyPort", "10080");
After that, the test passed without trouble.
The complete code follows for reference:
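The system properties above apply to every HTTP connection in the JVM. As an alternative that the original post does not use, java.net.Proxy lets a proxy be attached to a single connection. The sketch below reuses the proxy host and port from the post; the class PerConnectionProxySketch and its methods are hypothetical names.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

// Hypothetical helper, not part of the original program.
final class PerConnectionProxySketch {

    // Describes an HTTP proxy listening at the given host and port.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    // Opens (but does not yet connect) a connection that will be routed through the proxy.
    static HttpURLConnection open(URL url, Proxy proxy) throws IOException {
        return (HttpURLConnection) url.openConnection(proxy);
    }

    public static void main(String[] args) throws IOException {
        Proxy proxy = httpProxy("pkg.proxy.prod.jp.local", 10080);
        HttpURLConnection connection = open(new URL("http://www.163.com/"), proxy);
        connection.setRequestMethod("GET");
        // Nothing has been sent yet; getResponseCode() would trigger the request.
        System.out.println("Will connect through " + proxy.address());
    }
}
```

This keeps the proxy choice local to one request, which may matter if other code in the same process needs direct connections.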
package urlconn;

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class DownloadFileTest {
    public static void main(String[] args) {
        try {
            // Set the proxy
            System.setProperty("http.proxyHost", "pkg.proxy.prod.jp.local");
            System.setProperty("http.proxyPort", "10080");

            String urlPath = "http://www.163.com/";
            URL url = new URL(urlPath);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");
            connection.connect();

            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Backslashes in a Java string literal must be escaped.
                File dir = new File("D:\\logs\\");
                if (!dir.exists()) {
                    dir.mkdirs();
                }
                File file = new File(dir, "163.txt");

                // try-with-resources closes both streams even if the copy fails.
                try (InputStream inputStream = connection.getInputStream();
                     FileOutputStream fos = new FileOutputStream(file)) {
                    byte[] buf = new byte[1024 * 8];
                    int len;
                    while ((len = inputStream.read(buf)) != -1) {
                        fos.write(buf, 0, len);
                    }
                }
            } else {
                System.out.println("download file failed because responseCode=" + responseCode);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
--2020-03-03--