HttpClient Basics Tutorial

1. Key HttpClient resources

Official site: http://hc.apache.org/

API: http://hc.apache.org/httpcomponents-client-4.3.x/httpclient/apidocs/index.html

Tutorial: http://hc.apache.org/httpcomponents-client-4.3.x/tutorial/html/index.html (PDF version: http://hc.apache.org/httpcomponents-client-4.3.x/tutorial/pdf/httpclient-tutorial.pdf)

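2. There are two versions of HttpClient

org.apache.http.impl.client.HttpClients and org.apache.commons.httpclient.HttpClient

The former belongs to Apache HttpComponents HttpClient (4.x). The latter is the older Commons HttpClient, which has been retired and is no longer supported by Apache.

In general, using HttpClient requires two jars on the classpath: httpclient.jar and httpcore.jar.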

3. Basic steps for network access with HttpClient

(1) Obtain the Response object with a GET request.

CloseableHttpClient httpClient = HttpClients.createDefault();
HttpGet httpGet = new HttpGet("http://www.baidu.com/");
CloseableHttpResponse response = httpClient.execute(httpGet);

Note: the URL must include the http:// prefix; otherwise a "Target host is null" exception is thrown.
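Why the prefix matters becomes clearer when the string is parsed as a URI. The sketch below uses only java.net.URI from the JDK, not HttpClient itself, and rests on the assumption that HttpClient takes the target host from the host component of the request URI. The names SchemeCheck and hasTargetHost are made up for this illustration.

```java
import java.net.URI;

// Illustrative only: SchemeCheck and hasTargetHost are hypothetical names,
// not part of HttpClient.
class SchemeCheck {

    // True if the string parses as an absolute URI that has a host component.
    static boolean hasTargetHost(String url) {
        URI uri = URI.create(url);
        return uri.isAbsolute() && uri.getHost() != null;
    }

    public static void main(String[] args) {
        // Without a scheme, the whole string is parsed as a relative path,
        // so there is no host component at all.
        System.out.println(URI.create("www.baidu.com/").getHost());        // null
        System.out.println(URI.create("http://www.baidu.com/").getHost()); // www.baidu.com

        System.out.println(hasTargetHost("www.baidu.com/"));        // false
        System.out.println(hasTargetHost("http://www.baidu.com/")); // true
    }
}
```

A check like this could be run on user-supplied URLs before they are passed to new HttpGet(...), so that a missing scheme is reported with a clearer message.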

(2) Get the Entity from the Response object.

HttpEntity entity = response.getEntity();  

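Note: HttpClient wraps the body of a Response, as well as the body sent with a POST/PUT Request, in an HttpEntity object. Methods such as entity.getContentType() and entity.getContentLength() return information about the body, but the most important method is getContent(), which returns an InputStream.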

(3) Obtain an InputStream from the Entity, then process the returned content.

InputStream is = entity.getContent();
Scanner sc = new Scanner(is);
// String filename = path.substring(path.lastIndexOf('/')+1);
String filename = "2.txt";
PrintWriter os = new PrintWriter(filename);
while (sc.hasNextLine()) {
    // nextLine() strips the line terminator, so println() puts it back
    os.println(sc.nextLine());
}
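One detail worth knowing about this step: Scanner.nextLine() returns each line without its line terminator, so a Scanner-based copy has to add the line breaks back itself, and it also decodes the bytes as text along the way. When the goal is simply to save the body as-is, copying the raw bytes sidesteps both points. The following is a minimal sketch using only the JDK. A ByteArrayInputStream stands in for entity.getContent(), and the class name, method names and temp file are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Scanner;

// Illustrative only: BodyCopyDemo and its methods are hypothetical names.
class BodyCopyDemo {

    // Concatenates lines the way a nextLine()/write() loop would: no separators.
    static String joinWithScanner(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Scanner sc = new Scanner(in, "UTF-8")) {
            while (sc.hasNextLine()) {
                sb.append(sc.nextLine()); // the line terminator is already stripped
            }
        }
        return sb.toString();
    }

    // Copies the stream to a file byte for byte, overwriting any existing file.
    static void saveBytes(InputStream in, Path target) throws IOException {
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the bytes that entity.getContent() would deliver.
        byte[] body = "<html>\n<body>hi</body>\n</html>\n".getBytes(StandardCharsets.UTF_8);

        // All three lines end up on a single line.
        System.out.println(joinWithScanner(new ByteArrayInputStream(body)));

        // The byte copy keeps the content unchanged.
        Path out = Files.createTempFile("page", ".html");
        saveBytes(new ByteArrayInputStream(body), out);
        System.out.println(Files.size(out) == body.length); // true
    }
}
```

With a real response, saveBytes(entity.getContent(), Paths.get("2.txt")) would be the corresponding call, under the same assumptions.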

The complete code for downloading a web page with HttpClient is as follows:

package com.ljh.test;

import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.util.Scanner;

import org.apache.http.HttpEntity;
import org.apache.http.HttpStatus;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class DownloadWebPage {

    public static void downloadPagebyGetMethod() throws IOException {
        HttpGet httpGet = new HttpGet("http://www.baidu.com/");

        // 1. Execute the HttpGet to obtain the response object.
        //    try-with-resources closes the client and the response in every case,
        //    including non-200 responses and exceptions.
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {

            if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK) {
                return;
            }

            // 2. Get the entity of the response.
            HttpEntity entity = response.getEntity();
            if (entity == null) {
                return; // no body to save
            }

            // 3. Get the InputStream and process the content.
            // String filename = path.substring(path.lastIndexOf('/')+1);
            String filename = "2.txt";
            try (InputStream is = entity.getContent();
                 Scanner sc = new Scanner(is);
                 PrintWriter os = new PrintWriter(filename)) {
                while (sc.hasNextLine()) {
                    // nextLine() strips the line terminator, so println() puts it back
                    os.println(sc.nextLine());
                }
            }
        }
    }

    public static void main(String[] args) {
        try {
            downloadPagebyGetMethod();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

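Note: simply replacing HttpGet with HttpPost gives unexpected results. Baidu returns a 302 status (a redirect), and Sina refuses access. Most websites probably do not accept a plain POST to their pages. It may also matter that, by default, HttpClient follows redirects automatically only for GET and HEAD requests, so a 302 in response to a POST is handed back to the caller as-is.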