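Crawler code for scraping a web page

Case 1: Parse the 传智播客 (itcast) official website and list every subject open to applicants with a college degree or higher.

Crawler workflow:
1. Determine the home page URL.
2. Send the request and fetch the data.
3. Parse the data.
4. Release resources.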
*/
public class Demo01 {
    public static void main(String[] args) throws Exception {
        //1. Determine the home page URL.
        String indexUrl = "http://www.itcast.cn";

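        //2. Send the request and fetch the data.
        // An HttpClient object plays the role of the browser and can send GET or POST requests.
        // No separate client is required here: Jsoup.connect(...).get() in step 3 sends the GET request itself.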
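        // A minimal sketch of sending the request explicitly, in case the raw HTML is wanted
        // before parsing. Assumption: JDK 11+ and its built-in java.net.http.HttpClient, which
        // may not be the HttpClient library the note above refers to. The names httpClient,
        // getRequest, response and rawHtml are illustrative only.
        java.net.http.HttpClient httpClient = java.net.http.HttpClient.newBuilder()
                .followRedirects(java.net.http.HttpClient.Redirect.NORMAL) // follow an http -> https redirect, if any
                .build();
        java.net.http.HttpRequest getRequest = java.net.http.HttpRequest
                .newBuilder(java.net.URI.create(indexUrl))
                .GET()
                .build();
        java.net.http.HttpResponse<String> response =
                httpClient.send(getRequest, java.net.http.HttpResponse.BodyHandlers.ofString());
        // rawHtml could be handed to Jsoup.parse(rawHtml, indexUrl) instead of letting
        // Jsoup.connect(indexUrl).get() fetch the page a second time.
        String rawHtml = response.body();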

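        //3. Parse the data.
        // Jsoup.connect(...).get() sends a GET request and parses the response into a Document.
        Document document = Jsoup.connect(indexUrl).get();
        // Select the course links from the Document (DOM) object.
        Elements lessNamesElements = document.select(".ulon > li > a"); // child combinator: matches direct children only
        // Print the results.
        for (Element lessNamesElement : lessNamesElements) {
            //System.out.println(lessNamesElement.html()); // inner HTML of the element, so nested tags are kept
            System.out.println(lessNamesElement.text()); // text content only, with all tags stripped
        }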

        //4. Release resources.