Basic Usage of Selenium

For installation, see https://www.cnblogs.com/lfri/p/10542797.html

Basic usage

An example that runs a Baidu search automatically:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.baidu.com")

input = driver.find_element_by_css_selector('#kw')
input.send_keys("武汉")

button = driver.find_element_by_css_selector('#su')
button.click()

The logic is simple: type "武汉" (Wuhan) into the search box, then click the search button.

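Selenium provides quite a few methods for locating elements on the page. Note that these find_element_by_* helpers belong to the Selenium 3 API and were removed in Selenium 4.3; on newer versions use the find_element(By.XXX, ...) form described further down.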

find_element_by_id
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector

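To get every matching element instead of only the first one, add an s after element, for example find_elements_by_tag_name. The plural methods return a list.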

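If find_element_by_xxx_xxx feels too long, you can write it this way instead:

from selenium.webdriver.common.by import By

driver.find_elements(By.ID, 'xxx')

The By class defines the following locator strategies:

ID = "id"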
XPATH = "xpath"
LINK_TEXT = "link text"
PARTIAL_LINK_TEXT = "partial link text"
NAME = "name"
TAG_NAME = "tag name"
CLASS_NAME = "class name"
CSS_SELECTOR = "css selector"
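
To make the relationship between the two spellings concrete, here is a minimal sketch that maps a find_element_by_xxx name to the matching locator string from the list above. The helper to_locator and the BY dictionary are hypothetical names made up for this illustration; they are not part of selenium, and the code runs without a browser.

```python
# Locator strings copied from the By constants listed above.
BY = {
    "id": "id",
    "xpath": "xpath",
    "link_text": "link text",
    "partial_link_text": "partial link text",
    "name": "name",
    "tag_name": "tag name",
    "class_name": "class name",
    "css_selector": "css selector",
}


def to_locator(legacy_name, value):
    """Turn ('find_element_by_css_selector', '#kw') into ('css selector', '#kw').

    Hypothetical helper for illustration only; it is not a selenium API.
    """
    # Check the plural prefix first, then the singular one.
    for prefix in ("find_elements_by_", "find_element_by_"):
        if legacy_name.startswith(prefix):
            suffix = legacy_name[len(prefix):]
            return BY[suffix], value
    raise ValueError("not a find_element(s)_by_* name: " + legacy_name)


locator = to_locator("find_element_by_css_selector", "#kw")
print(locator)
```

Since find_element and find_elements take a strategy and a value, the resulting pair could then be passed along as driver.find_element(*locator).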

Example:

<html>
<body>
 <form id="loginForm">
  <input name="username" type="text" />
  <input name="password" type="password" />
  <input class="login" name="continue" type="submit" value="Login" />
 </form>
</body>
</html>

I have hosted this page on GitHub at https://rogn.top/selenium.html

Get the form by id

login_form = driver.find_element_by_id('loginForm')

Get the input boxes by name

username = driver.find_element_by_name('username')
password = driver.find_element_by_name('password')

Get the form by xpath (all three expressions select the same form)

login_form = driver.find_element_by_xpath("/html/body/form[1]")
login_form = driver.find_element_by_xpath("//form[1]")
login_form = driver.find_element_by_xpath("//form[@id='loginForm']")

Get an input box by tag name (this returns the first input on the page)

input1 = driver.find_element_by_tag_name('input')

Get an element by class

login = driver.find_element_by_class_name('login')
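
To check what these locators select without launching a browser, the sketch below runs comparable queries against the sample page using Python's standard xml.etree.ElementTree. This is a stand-in, not Selenium itself: ElementTree only supports a subset of XPath, and the page is embedded as a string (with the closing </html> tag) on the assumption that it is well-formed.

```python
import xml.etree.ElementTree as ET

# The sample login page from above, embedded as a string.
PAGE = """<html>
<body>
 <form id="loginForm">
  <input name="username" type="text" />
  <input name="password" type="password" />
  <input class="login" name="continue" type="submit" value="Login" />
 </form>
</body>
</html>"""

root = ET.fromstring(PAGE)

# Counterpart of find_element_by_id('loginForm')
login_form = root.find(".//form[@id='loginForm']")
# Counterpart of find_element_by_name('username')
username = root.find(".//input[@name='username']")
# Counterpart of find_elements_by_tag_name('input'): every match, as a list
inputs = root.findall(".//input")
# Counterpart of find_element_by_class_name('login')
login = root.find(".//*[@class='login']")

print(login_form.tag, username.get("type"), len(inputs), login.get("value"))
# prints: form text 3 Login
```

The singular find returns only the first match, while findall returns all three inputs, which mirrors the difference between find_element_by_xxx and find_elements_by_xxx.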

Crawler-related features

Once you have a browser object via driver = webdriver.Chrome(), you can read the following from it.

Get the current URL

driver.current_url

Get the cookies

driver.get_cookies()
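
get_cookies() returns a list of dictionaries, one per cookie, each with at least a name and a value key. For crawling it is often handier to flatten that list into a plain name-to-value mapping. A minimal sketch follows; the helper cookies_to_dict and the sample cookie data are made up for illustration and do not come from a real session.

```python
def cookies_to_dict(cookies):
    """Flatten a get_cookies()-style list into {name: value}."""
    return {cookie["name"]: cookie["value"] for cookie in cookies}


# Made-up data in the shape that driver.get_cookies() returns.
sample_cookies = [
    {"name": "sessionid", "value": "abc123", "domain": "example.com", "path": "/"},
    {"name": "lang", "value": "zh", "domain": "example.com", "path": "/"},
]

cookie_map = cookies_to_dict(sample_cookies)
print(cookie_map)
```

A mapping like this can be handed to an HTTP client that accepts a cookie dictionary, so that later requests reuse the session the browser established.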

Get the page source

driver.page_source
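
page_source is an ordinary string of HTML, so any HTML parser can take over from there. As a small sketch, the code below uses the standard library's html.parser to collect the name of every input tag. The class InputNames is a hypothetical helper, and the page_source string here is a hand-written stand-in for what the driver would return.

```python
from html.parser import HTMLParser


class InputNames(HTMLParser):
    """Collect the name attribute of every <input> tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs
        if tag == "input":
            name = dict(attrs).get("name")
            if name is not None:
                self.names.append(name)


# Stand-in for driver.page_source
page_source = """<html><body><form id="loginForm">
<input name="username" type="text" />
<input name="password" type="password" />
<input class="login" name="continue" type="submit" value="Login" />
</form></body></html>"""

parser = InputNames()
parser.feed(page_source)
print(parser.names)
```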

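Get an element's visible text (for the content typed into an input box, read get_attribute('value') instead)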

input.text

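Headless browser

I originally wanted to use PhantomJS, but got the following warning: "warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless". It means Selenium has dropped PhantomJS and recommends the headless mode of Chrome or Firefox instead.

One option is to stay on an older Selenium version; the other is to switch to headless mode. Here we use headless Chrome.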

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('log-level=3')  # raise the log level to cut console noise
#driver = webdriver.PhantomJS()
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://rogn.top/selenium.html")

print(driver.page_source)
driver.quit()  # end the session and shut down the browser and driver process

References:

1. python爬虫09 | 上来，自己动！这就是 selenium 的牛逼之处

2. selenium使用报错:UserWarning: Selenium support for PhantomJS has been deprecated