Web Crawler Primer Series (3): HttpClient

    The previous article covered the Java code for setting request headers and fetching a web page with Jsoup.

    This article covers the Java implementation of setting request headers and fetching a web page with HttpClient.

    First, download HttpClient from the official site. Version 4.5.5 is used here:

   http://mirror.bit.edu.cn/apache//httpcomponents/httpclient/binary/httpcomponents-client-4.5.5-bin.zip

    Add the following jars to the project:

commons-logging-1.2.jar
httpclient-4.5.5.jar
httpcore-4.4.9.jar
httpmime-4.5.5.jar
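
   If the project uses Maven, the same libraries can be pulled in as dependencies instead of manual jars. A sketch, assuming standard Maven coordinates (the `httpclient` artifact transitively brings in `httpcore` and `commons-logging`; `httpmime` is declared separately):

```xml
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.5</version>
</dependency>
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpmime</artifactId>
    <version>4.5.5</version>
</dependency>
```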

   Create a new class, httpClientConnection, and write the following code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.entity.ContentType;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class httpClientConnection {
    public static void main(String[] args) {
        CloseableHttpClient httpclient = HttpClients.createDefault();
        CloseableHttpResponse responseGet = null;
        try {
            // Execute the request with the GET method
            HttpGet httpGet = new HttpGet("http://www.cnblogs.com/szw-blog/p/8565944.html");
            // Set a request header so the crawler looks like a regular browser
            httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
            // Get the full server response
            responseGet = httpclient.execute(httpGet);
            System.out.println(responseGet.getStatusLine());
            // Get the response body (not including the HTTP headers)
            HttpEntity entity = responseGet.getEntity();
            if (entity != null) {
                // Determine the response charset; fall back to UTF-8 if none is declared
                ContentType contentType = ContentType.getOrDefault(entity);
                Charset charset = contentType.getCharset();
                if (charset == null) {
                    charset = StandardCharsets.UTF_8;
                }
                InputStream is = entity.getContent();
                // Wrap the InputStream in a BufferedReader so the content can be read line by line
                BufferedReader br = new BufferedReader(new InputStreamReader(is, charset));
                String line;
                while ((line = br.readLine()) != null) {
                    System.out.println(line);
                }
                br.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Close the response and client, guarding against nulls
            try {
                if (responseGet != null) {
                    responseGet.close();
                }
                httpclient.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
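
As an aside, on JDK 11 and later the built-in java.net.http.HttpClient can attach the same kind of request headers without any extra jars. A minimal sketch (the class name, User-Agent string, and Accept-Language value here are just illustrative choices, not part of the original article); note that nothing is sent over the network, the request is only built and inspected:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class JdkHeaderDemo {
    public static void main(String[] args) {
        // Build a GET request with custom headers; no connection is made yet
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://www.cnblogs.com/szw-blog/p/8565944.html"))
                .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
                .header("Accept-Language", "zh-CN,zh;q=0.9")
                .GET()
                .build();
        // Inspect the headers attached to the request
        System.out.println(request.headers().firstValue("User-Agent").orElse(""));
    }
}
```

Sending the request would then be a single call to `HttpClient.newHttpClient().send(request, ...)`.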

Running it prints the response status line followed by the page source.

That is the simple Java code for fetching a web page with HttpClient.

Original post: https://www.cnblogs.com/szw-blog/p/8569925.html