Using the Selenium Library

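Selenium is a complete testing system for web applications. It covers test recording (Selenium IDE), writing and running tests (Selenium Remote Control), and parallel test execution (Selenium Grid). Its core, Selenium Core, is based on JsUnit and written entirely in JavaScript, so it can run in any browser that supports JavaScript and can simulate browser-side JS events such as scrolling the page or clicking.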

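Selenium is an automated testing tool that drives a real browser and supports many different browsers. In web crawlers it is mainly used to handle pages whose content is rendered by JavaScript.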

1. Basic usage of selenium

When writing a crawler in Python, the part of selenium you mainly use is webdriver. The following commands show which browsers selenium.webdriver supports:

from selenium import webdriver

help(webdriver)

The output is:

Help on package selenium.webdriver in selenium:

NAME
    selenium.webdriver

DESCRIPTION
    # Licensed to the Software Freedom Conservancy (SFC) under one
    # or more contributor license agreements.  See the NOTICE file
    # distributed with this work for additional information
    # regarding copyright ownership.  The SFC licenses this file
    # to you under the Apache License, Version 2.0 (the
    # "License"); you may not use this file except in compliance
    # with the License.  You may obtain a copy of the License at
    #
    #   http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing,
    # software distributed under the License is distributed on an
    # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    # KIND, either express or implied.  See the License for the
    # specific language governing permissions and limitations
    # under the License.

PACKAGE CONTENTS
    android (package)
    blackberry (package)
    chrome (package)
    common (package)
    edge (package)
    firefox (package)
    ie (package)
    opera (package)
    phantomjs (package)
    remote (package)
    safari (package)
    support (package)
    webkitgtk (package)

VERSION
    3.14.0

As the PACKAGE CONTENTS list shows, essentially all common browsers are supported.

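One package worth singling out is PhantomJS. PhantomJS is a WebKit-based, server-side JavaScript API that supports the Web without needing a browser window. It is fast and natively supports web standards such as DOM handling, CSS selectors, and JSON. PhantomJS can be used for page automation, network monitoring, page screenshots, and headless testing.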

Declaring a browser object

from selenium import webdriver

browser = webdriver.Chrome()   # Google Chrome
browser = webdriver.Firefox()  # Firefox

Visiting a page

from selenium import webdriver

browser = webdriver.Chrome()

browser.get("http://www.baidu.com")
print(browser.page_source)
browser.close()

When the code above runs, it opens Chrome automatically, loads the Baidu home page, prints the page's source, and then closes the browser.
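
The open, print, close sequence above can be wrapped so that the browser is shut down even when loading the page fails. This is only a minimal sketch: the helper name fetch_page_source is hypothetical, and browser is assumed to be any webdriver instance such as webdriver.Chrome(). It calls quit() instead of close(), because quit() also ends the driver process while close() only closes the current window.

```python
def fetch_page_source(browser, url):
    """Load url in the given browser and return the page source.

    `browser` is assumed to be a selenium webdriver instance,
    for example webdriver.Chrome(). The helper name is hypothetical.
    """
    try:
        browser.get(url)            # navigate to the page
        return browser.page_source  # source of the page as currently loaded
    finally:
        browser.quit()              # always release the browser and its driver
```

With selenium installed, fetch_page_source(webdriver.Chrome(), "http://www.baidu.com") should return the same text that the print call above displays.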

Finding elements

from selenium import webdriver
browser = webdriver.Chrome()

Locate an element by id:
browser.find_element_by_id()

Locate an element by name:
browser.find_element_by_name()

Locate an element by tag_name (the HTML tag):
browser.find_element_by_tag_name()

Locate an element by class_name:
browser.find_element_by_class_name()

Locate an element by CSS selector:
browser.find_element_by_css_selector("input[id='app']")

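Locate an element by XPath:
XPath is a language for locating nodes in an XML document. HTML is not strictly XML, but the page the browser has parsed forms the same kind of tree, so XPath syntax can be used to locate page elements.

Suppose the page source is as follows: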

<html>
    <body>
        <form id="login">
            <input name="username" type="text" />
            <input name="password" type="password" />
            <input type="submit" value="login" />
        </form>
    </body>
</html>

browser.find_element_by_xpath("//form[@id='login']")  # the login form in the page above
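
To try XPath expressions against the login form without starting a browser, one option is Python's standard xml.etree.ElementTree module. This is a stand-in for illustration, not Selenium itself, and ElementTree only supports a subset of XPath. The expressions below are written relative to the root element (".//..."); in Selenium the equivalent absolute form, such as "//input[@name='username']", would be passed to find_element_by_xpath.

```python
import xml.etree.ElementTree as ET

# The login form from above, with a proper closing </html> tag so it parses as XML.
PAGE = """
<html>
    <body>
        <form id="login">
            <input name="username" type="text" />
            <input name="password" type="password" />
            <input type="submit" value="login" />
        </form>
    </body>
</html>
"""

root = ET.fromstring(PAGE.strip())

# Locate the form by its id attribute.
form = root.find(".//form[@id='login']")

# Locate the username field by its name attribute.
username = root.find(".//input[@name='username']")

# Locate the submit button by its type attribute.
submit = root.find(".//input[@type='submit']")

print(form.tag, username.get("type"), submit.get("value"))
```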

Locate by link text: find_element_by_link_text("text_value") or find_element_by_partial_link_text()
This applies to text links that appear on the page.

browser.find_element_by_link_text("登录").click()  # click the "登录" (log in) link
browser.find_element_by_partial_link_text("登").click()  # matches on only part of the link text
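
All of the find_element_by_* methods above can also be expressed through the generic browser.find_element(by, value) call, which is the form that newer Selenium 4 releases keep after dropping the find_element_by_* helpers. The sketch below assumes a hypothetical helper named locate and a hypothetical STRATEGIES table. The strategy strings are the values of the constants on selenium.webdriver.common.by.By, written out literally so the snippet has no imports.

```python
# Locator strategy strings; these match the constants on
# selenium.webdriver.common.by.By (By.ID == "id", and so on).
STRATEGIES = {
    "id": "id",
    "name": "name",
    "tag_name": "tag name",
    "class_name": "class name",
    "css": "css selector",
    "xpath": "xpath",
    "link_text": "link text",
    "partial_link_text": "partial link text",
}


def locate(browser, strategy, value):
    """Find one element through the generic find_element(by, value) call.

    `browser` is assumed to be a selenium webdriver instance; the helper
    and the STRATEGIES table are illustrative names, not part of selenium.
    """
    if strategy not in STRATEGIES:
        raise ValueError("unknown locator strategy: %r" % strategy)
    return browser.find_element(STRATEGIES[strategy], value)
```

For example, locate(browser, "link_text", "登录").click() would be expected to behave like the find_element_by_link_text call above.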