Heritrix 3.1.0 源码解析(二十二)

本文继续分析Heritrix3.1.0系统的源码,其实本人感觉接下来待分析的问题不是一两篇文章能够澄清,本人不能因为迫于表述而乱了问题本身的章法,接下来的分析的Heritrix3.1.0系统封装HttpClient组件可能要分几篇文章来解析

我们知道,Heritrix3.1.0系统是通过封装HttpClient组件(里面封装了Socket)来与服务器通信的,Socket的输出流写入数据,输入流接收数据

那么Heritrix3.1.0系统是怎样封装Httpclient(Heritrix3.1.0系统是采用的以前的Apache版本)组件的呢?

我们可以看到,在FetchHTTP处理器里面有一段静态代码块,用于注册Socket工厂,分别用于HTTP通信与HTTPS通信协议(基于TCP协议通信,至于两者的关系本文就不再分析了,不懂的读者可以参考网络通信方面的教程)

/**
     * 注册http和https协议
     */
    static {
        Protocol.registerProtocol("http", new Protocol("http",
                new HeritrixProtocolSocketFactory(), 80));
        try {
            ProtocolSocketFactory psf = new HeritrixSSLProtocolSocketFactory();
            Protocol p = new Protocol("https", psf, 443); 
            Protocol.registerProtocol("https", p);
        } catch (KeyManagementException e) {
            e.printStackTrace();
        } catch (KeyStoreException e) {
            e.printStackTrace();
        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
        }
    }

上面的两个类HeritrixProtocolSocketFactory和HeritrixSSLProtocolSocketFactory都实现了HttpClient组件的ProtocolSocketFactory接口,用于创建客户端Socket对象(HeritrixSSLProtocolSocketFactory类间接实现了ProtocolSocketFactory接口)

ProtocolSocketFactory接口定义了创建SOCKET对象的方法(package org.apache.commons.httpclient.protocol)

/**
 * A factory for creating Sockets.
 * 
 * <p>Both {@link java.lang.Object#equals(java.lang.Object) Object.equals()} and 
 * {@link java.lang.Object#hashCode() Object.hashCode()} should be overridden appropriately.  
 * Protocol socket factories are used to uniquely identify <code>Protocol</code>s and 
 * <code>HostConfiguration</code>s, and <code>equals()</code> and <code>hashCode()</code> are 
 * required for the correct operation of some connection managers.</p>
 * 
 * @see Protocol
 * 
 * @author Michael Becke
 * @author <a href="mailto:mbowler@GargoyleSoftware.com">Mike Bowler</a>
 * 
 * @since 2.0
 */
public interface ProtocolSocketFactory {

    /**
     * Gets a new socket connection to the given host.
     * 
     * @param host the host name/IP
     * @param port the port on the host
     * @param localAddress the local host name/IP to bind the socket to
     * @param localPort the port on the local machine
     * 
     * @return Socket a new socket
     * 
     * @throws IOException if an I/O error occurs while creating the socket
     * @throws UnknownHostException if the IP address of the host cannot be
     * determined
     */
    Socket createSocket(
        String host, 
        int port, 
        InetAddress localAddress, 
        int localPort
    ) throws IOException, UnknownHostException;

    /**
     * Gets a new socket connection to the given host.
     * 
     * @param host the host name/IP
     * @param port the port on the host
     * @param localAddress the local host name/IP to bind the socket to
     * @param localPort the port on the local machine
     * @param params {@link HttpConnectionParams Http connection parameters}
     * 
     * @return Socket a new socket
     * 
     * @throws IOException if an I/O error occurs while creating the socket
     * @throws UnknownHostException if the IP address of the host cannot be
     * determined
     * @throws ConnectTimeoutException if socket cannot be connected within the
     *  given time limit
     * 
     * @since 3.0
     */
    Socket createSocket(
        String host, 
        int port, 
        InetAddress localAddress, 
        int localPort,
        HttpConnectionParams params
    ) throws IOException, UnknownHostException, ConnectTimeoutException;

    /**
     * Gets a new socket connection to the given host.
     *
     * @param host the host name/IP
     * @param port the port on the host
     *
     * @return Socket a new socket
     *
     * @throws IOException if an I/O error occurs while creating the socket
     * @throws UnknownHostException if the IP address of the host cannot be
     * determined
     */
    Socket createSocket(
        String host, 
        int port
    ) throws IOException, UnknownHostException;

}

HeritrixProtocolSocketFactory类实现了上面的ProtocolSocketFactory接口(用于HTTP通信)

public class HeritrixProtocolSocketFactory implements ProtocolSocketFactory {
    /**
     * Constructor.
     */
    public HeritrixProtocolSocketFactory() {
        super();
    }
    @Override
    public Socket createSocket(String host, int port, InetAddress localAddress,
            int localPort) throws IOException, UnknownHostException {
        // TODO Auto-generated method stub
        return new Socket(host, port, localAddress, localPort);
    }
    @Override
    public Socket createSocket(String host, int port, InetAddress localAddress,
            int localPort, HttpConnectionParams params) throws IOException,
            UnknownHostException, ConnectTimeoutException {
        // TODO Auto-generated method stub
        // Below code is from the DefaultSSLProtocolSocketFactory#createSocket
        // method only it has workarounds to deal with pre-1.4 JVMs.  I've
        // cut these out.
        if (params == null) {
            throw new IllegalArgumentException("Parameters may not be null");
        }
        Socket socket = null;
        int timeout = params.getConnectionTimeout();
        if (timeout == 0) {
            socket = createSocket(host, port, localAddress, localPort);
        } else {
            socket = new Socket();
            
            InetAddress hostAddress;
            Thread current = Thread.currentThread();
            if (current instanceof HostResolver) {
                HostResolver resolver = (HostResolver)current;
                hostAddress = resolver.resolve(host);
            } else {
                hostAddress = null;
            }
            InetSocketAddress address = (hostAddress != null)?
                    new InetSocketAddress(hostAddress, port):
                    new InetSocketAddress(host, port);
            socket.bind(new InetSocketAddress(localAddress, localPort));
            try {
                socket.connect(address, timeout);
            } catch (SocketTimeoutException e) {
                // Add timeout info. to the exception.
                throw new SocketTimeoutException(e.getMessage() +
                    ": timeout set at " + Integer.toString(timeout) + "ms.");
            }
            assert socket.isConnected(): "Socket not connected " + host;
        }
        return socket;
    }
    @Override
    public Socket createSocket(String host, int port) throws IOException,
            UnknownHostException {
        // TODO Auto-generated method stub
        return new Socket(host, port);
    }
    /**
     * All instances of DefaultProtocolSocketFactory are the same.
     * @param obj Object to compare.
     * @return True if equal
     */
    public boolean equals(Object obj) {
        return ((obj != null) &&
            obj.getClass().equals(HeritrixProtocolSocketFactory.class));
    }

    /**
     * All instances of DefaultProtocolSocketFactory have the same hash code.
     * @return Hash code for this object.
     */
    public int hashCode() {
        return HeritrixProtocolSocketFactory.class.hashCode();
    }

}

HeritrixSSLProtocolSocketFactory类通过SecureProtocolSocketFactory实现SecureProtocolSocketFactory接口(间接实现了ProtocolSocketFactory接口)用于HTTPS通信

SecureProtocolSocketFactory接口方法如下

/**
 * A ProtocolSocketFactory that is secure.
 * 
 * @see org.apache.commons.httpclient.protocol.ProtocolSocketFactory
 * 
 * @author Michael Becke
 * @author <a href="mailto:mbowler@GargoyleSoftware.com">Mike Bowler</a>
 * @since 2.0
 */
public interface SecureProtocolSocketFactory extends ProtocolSocketFactory {

    /**
     * Returns a socket connected to the given host that is layered over an
     * existing socket.  Used primarily for creating secure sockets through
     * proxies.
     * 
     * @param socket the existing socket 
     * @param host the host name/IP
     * @param port the port on the host
     * @param autoClose a flag for closing the underling socket when the created
     * socket is closed
     * 
     * @return Socket a new socket
     * 
     * @throws IOException if an I/O error occurs while creating the socket
     * @throws UnknownHostException if the IP address of the host cannot be
     * determined
     */
    Socket createSocket(
        Socket socket, 
        String host, 
        int port, 
        boolean autoClose
    ) throws IOException, UnknownHostException;              

}

HeritrixSSLProtocolSocketFactory类实现上面的SecureProtocolSocketFactory接口

/**
 * Implementation of the commons-httpclient SSLProtocolSocketFactory so we
 * can return SSLSockets whose trust manager is
 * {@link org.archive.httpclient.ConfigurableX509TrustManager}.
 * 
 * We also go to the heritrix cache to get IPs to use making connection.
 * To this, we have dependency on {@link HeritrixProtocolSocketFactory};
 * its assumed this class and it are used together.
 * See {@link HeritrixProtocolSocketFactory#getHostAddress(ServerCache,String)}.
 *
 * @author stack
 * @version $Id: HeritrixSSLProtocolSocketFactory.java 6637 2009-11-10 21:03:27Z gojomo $
 * @see org.archive.httpclient.ConfigurableX509TrustManager
 */
public class HeritrixSSLProtocolSocketFactory implements SecureProtocolSocketFactory {
    // static final String SERVER_CACHE_KEY = "heritrix.server.cache";
    static final String SSL_FACTORY_KEY = "heritrix.ssl.factory";
    /***
     * Socket factory with default trust manager installed.
     */
    private SSLSocketFactory sslDefaultFactory = null;
    
    /**
     * Shutdown constructor.
     * @throws KeyManagementException
     * @throws KeyStoreException
     * @throws NoSuchAlgorithmException
     */
    public HeritrixSSLProtocolSocketFactory()
    throws KeyManagementException, KeyStoreException, NoSuchAlgorithmException{
        // Get an SSL context and initialize it.
        SSLContext context = SSLContext.getInstance("SSL");

        // I tried to get the default KeyManagers but doesn't work unless you
        // point at a physical keystore. Passing null seems to do the right
        // thing so we'll go w/ that.
        context.init(null, new TrustManager[] {
            new ConfigurableX509TrustManager(
                ConfigurableX509TrustManager.DEFAULT)}, null);
        this.sslDefaultFactory = context.getSocketFactory();
    }
    @Override
    public Socket createSocket(String host, int port, InetAddress clientHost,
        int clientPort)
    throws IOException, UnknownHostException {
        return this.sslDefaultFactory.createSocket(host, port,
            clientHost, clientPort);
    }
    @Override
    public Socket createSocket(String host, int port)
    throws IOException, UnknownHostException {
        return this.sslDefaultFactory.createSocket(host, port);
    }
    @Override
    public synchronized Socket createSocket(String host, int port,
        InetAddress localAddress, int localPort, HttpConnectionParams params)
    throws IOException, UnknownHostException {
        // Below code is from the DefaultSSLProtocolSocketFactory#createSocket
        // method only it has workarounds to deal with pre-1.4 JVMs.  I've
        // cut these out.
        if (params == null) {
            throw new IllegalArgumentException("Parameters may not be null");
        }
        Socket socket = null;
        int timeout = params.getConnectionTimeout();
        if (timeout == 0) {
            socket = createSocket(host, port, localAddress, localPort);
        } else {
            SSLSocketFactory factory = (SSLSocketFactory)params.
                getParameter(SSL_FACTORY_KEY);//SSL_FACTORY_KEY
            SSLSocketFactory f = (factory != null)? factory: this.sslDefaultFactory;
            socket = f.createSocket();
            
            Thread current = Thread.currentThread();
            InetAddress hostAddress;
            if (current instanceof HostResolver) {
                HostResolver resolver = (HostResolver)current;
                hostAddress = resolver.resolve(host);
            } else {
                hostAddress = null;
            }
            InetSocketAddress address = (hostAddress != null)?
                    new InetSocketAddress(hostAddress, port):
                    new InetSocketAddress(host, port);
            socket.bind(new InetSocketAddress(localAddress, localPort));
            try {
                socket.connect(address, timeout);
            } catch (SocketTimeoutException e) {
                // Add timeout info. to the exception.
                throw new SocketTimeoutException(e.getMessage() +
                    ": timeout set at " + Integer.toString(timeout) + "ms.");
            }
            assert socket.isConnected(): "Socket not connected " + host;
        }
        return socket;
    }
    @Override
    public Socket createSocket(Socket socket, String host, int port,
        boolean autoClose)
    throws IOException, UnknownHostException {
        return this.sslDefaultFactory.createSocket(socket, host,
            port, autoClose);
    }
    
    public boolean equals(Object obj) {
        return ((obj != null) && obj.getClass().
            equals(HeritrixSSLProtocolSocketFactory.class));
    }

    public int hashCode() {
        return HeritrixSSLProtocolSocketFactory.class.hashCode();
    }
}

HTTPS通信的SOCKET对象是通过SSLSocketFactory sslDefaultFactory(SSLSocket工厂)对象创建的,为了创建SSLSocketFactory sslDefaultFactory对象

Heritrix3.1.0系统定义了X509TrustManager接口的实现类ConfigurableX509TrustManager(用于SSL通信,自动接收证书)

/**
 * A configurable trust manager built on X509TrustManager.
 *
 * If set to 'open' trust, the default, will get us into sites for whom we do
 * not have the CA or any of intermediary CAs that go to make up the cert chain
 * of trust.  Will also get us past selfsigned and expired certs.  'loose'
 * trust will get us into sites w/ valid certs even if they are just
 * selfsigned.  'normal' is any valid cert not including selfsigned.  'strict'
 * means cert must be valid and the cert DN must match server name.
 *
 * <p>Based on pointers in
 * <a href="http://jakarta.apache.org/commons/httpclient/sslguide.html">SSL
 * Guide</a>,
 * and readings done in <a
 * href="http://java.sun.com/j2se/1.4.2/docs/guide/security/jsse/JSSERefGuide.html#Introduction">JSSE
 * Guide</a>.
 *
 * <p>TODO: Move to an ssl subpackage when we have other classes other than
 * just this one.
 *
 * @author stack
 * @version $Id: ConfigurableX509TrustManager.java 6637 2009-11-10 21:03:27Z gojomo $
 */
public class ConfigurableX509TrustManager implements X509TrustManager
{
    /**
     * Logging instance.
     */
    protected static Logger logger = Logger.getLogger(
        "org.archive.httpclient.ConfigurableX509TrustManager");

    public static enum TrustLevel { 
        /**
         * Trust anything given us.
         *
         * Default setting.
         *
         * <p>See <a href="http://javaalmanac.com/egs/javax.net.ssl/TrustAll.html">
         *  e502. Disabling Certificate Validation in an HTTPS Connection</a> from
         * the java almanac for how to trust all.
         */
        OPEN,

        /**
         * Trust any valid cert including self-signed certificates.
         */
        LOOSE,
    
        /**
         * Normal jsse behavior.
         *
         * Seemingly any certificate that supplies valid chain of trust.
         */
        NORMAL,
    
        /**
         * Strict trust.
         *
         * Ensure server has same name as cert DN.
         */
        STRICT,
    }

    /**
     * Default setting for trust level.
     */
    public final static TrustLevel DEFAULT = TrustLevel.OPEN;

    /**
     * Trust level.
     */
    private TrustLevel trustLevel = DEFAULT;


    /**
     * An instance of the SUNX509TrustManager that we adapt variously
     * depending upon passed configuration.
     *
     * We have it do all the work we don't want to.
     */
    private X509TrustManager standardTrustManager = null;


    public ConfigurableX509TrustManager()
    throws NoSuchAlgorithmException, KeyStoreException {
        this(DEFAULT);
    }

    /**
     * Constructor.
     *
     * @param level Level of trust to effect.
     *
     * @throws NoSuchAlgorithmException
     * @throws KeyStoreException
     */
    public ConfigurableX509TrustManager(TrustLevel level)
    throws NoSuchAlgorithmException, KeyStoreException {
        super();
        TrustManagerFactory factory = TrustManagerFactory.
            getInstance(TrustManagerFactory.getDefaultAlgorithm());

        // Pass in a null (Trust) KeyStore.  Null says use the 'default'
        // 'trust' keystore (KeyStore class is used to hold keys and to hold
        // 'trusts' (certs)). See 'X509TrustManager Interface' in this doc:
        // http://java.sun.com
        // /j2se/1.4.2/docs/guide/security/jsse/JSSERefGuide.html#Introduction
        factory.init((KeyStore)null);
        TrustManager[] trustmanagers = factory.getTrustManagers();
        if (trustmanagers.length == 0) {
            throw new NoSuchAlgorithmException(TrustManagerFactory.
                getDefaultAlgorithm() + " trust manager not supported");
        }
        this.standardTrustManager = (X509TrustManager)trustmanagers[0];

        this.trustLevel = level;
    }
    @Override
    public void checkClientTrusted(X509Certificate[] certificates, String type)
    throws CertificateException {
        if (this.trustLevel.equals(TrustLevel.OPEN)) {
            return;
        }

        this.standardTrustManager.checkClientTrusted(certificates, type);
    }
    @Override
    public void checkServerTrusted(X509Certificate[] certificates, String type)
    throws CertificateException {
        if (this.trustLevel.equals(TrustLevel.OPEN)) {
            return;
        }

        try {
            this.standardTrustManager.checkServerTrusted(certificates, type);
            if (this.trustLevel.equals(TrustLevel.STRICT)) {
                logger.severe(TrustLevel.STRICT + " not implemented.");
            }
        } catch (CertificateException e) {
            if (this.trustLevel.equals(TrustLevel.LOOSE) &&
                certificates != null && certificates.length == 1)
            {
                    // If only one cert and its valid and it caused a
                    // CertificateException, assume its selfsigned.
                    X509Certificate certificate = certificates[0];
                    certificate.checkValidity();
            } else {
                // If we got to here, then we're probably NORMAL. Rethrow.
                throw e;
            }
        }
    }
    @Override
    public X509Certificate[] getAcceptedIssuers() {
        return this.standardTrustManager.getAcceptedIssuers();
    }
}

---------------------------------------------------------------------------

本系列Heritrix 3.1.0 源码解析系本人原创

转载请注明出处 博客园 刺猬的温驯

本文链接 http://www.cnblogs.com/chenying99/archive/2013/04/25/3042207.html

原文地址:https://www.cnblogs.com/chenying99/p/3042207.html