Web Crawler Primer Series (3): HttpClient

    The previous article covered the Java code for setting request headers and fetching a web page with Jsoup.

    This article covers the Java implementation of setting request headers and fetching a web page with HttpClient.

    First, download HttpClient from the official site. Version 4.5.5 is used here:

   http://mirror.bit.edu.cn/apache//httpcomponents/httpclient/binary/httpcomponents-client-4.5.5-bin.zip

    Add the following jars to the project:

commons-logging-1.2.jar
httpclient-4.5.5.jar
httpcore-4.4.9.jar
httpmime-4.5.5.jar
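
   If the project uses Maven, the same libraries can be pulled in as dependencies instead of manual jars. A sketch, assuming standard Maven coordinates (the `httpclient` artifact transitively brings in `httpcore` and `commons-logging`; `httpmime` is declared separately):

```xml
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.5</version>
</dependency>
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpmime</artifactId>
    <version>4.5.5</version>
</dependency>
```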

   Create a new class, httpClientConnection, and write the following code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.entity.ContentType;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class httpClientConnection {
    public static void main(String[] args) {
        CloseableHttpClient httpclient = HttpClients.createDefault();
        CloseableHttpResponse responseGet = null;
        try {
            // Execute the request with the GET method
            HttpGet httpGet = new HttpGet("http://www.cnblogs.com/szw-blog/p/8565944.html");
            // Set a request header so the crawler looks like a regular browser
            httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
            // Get the full server response
            responseGet = httpclient.execute(httpGet);
            System.out.println(responseGet.getStatusLine());
            // Get the response body (not including the HTTP headers)
            HttpEntity entity = responseGet.getEntity();
            if (entity != null) {
                // Determine the response charset; fall back to UTF-8 if none is declared
                ContentType contentType = ContentType.getOrDefault(entity);
                Charset charset = contentType.getCharset();
                if (charset == null) {
                    charset = StandardCharsets.UTF_8;
                }
                InputStream is = entity.getContent();
                // Wrap the InputStream in a BufferedReader so the content can be read line by line
                BufferedReader br = new BufferedReader(new InputStreamReader(is, charset));
                String line;
                while ((line = br.readLine()) != null) {
                    System.out.println(line);
                }
                br.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Close the response and client, guarding against nulls
            try {
                if (responseGet != null) {
                    responseGet.close();
                }
                httpclient.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
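
As an aside, on JDK 11 and later the built-in java.net.http.HttpClient can attach the same kind of request headers without any extra jars. A minimal sketch (the class name, User-Agent string, and Accept-Language value here are just illustrative choices, not part of the original article); note that nothing is sent over the network, the request is only built and inspected:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class JdkHeaderDemo {
    public static void main(String[] args) {
        // Build a GET request with custom headers; no connection is made yet
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://www.cnblogs.com/szw-blog/p/8565944.html"))
                .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
                .header("Accept-Language", "zh-CN,zh;q=0.9")
                .GET()
                .build();
        // Inspect the headers attached to the request
        System.out.println(request.headers().firstValue("User-Agent").orElse(""));
    }
}
```

Sending the request would then be a single call to `HttpClient.newHttpClient().send(request, ...)`.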

Running it prints the response status line followed by the page source.

That is the simple Java code for fetching a web page with HttpClient.

Original post: https://www.cnblogs.com/szw-blog/p/8569925.html