Perl HTML::LinkExtor module (1)

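use strict;
use warnings;
use LWP::Simple;
use HTML::LinkExtor;

# Fetch the page; get() returns undef on failure
my $html = get("http://www.baidu.com");
die "Could not fetch the page\n" unless defined $html;

# Pass a reference to the callback (\&check), not a call to it (&check)
my $link = HTML::LinkExtor->new(\&check);
$link->parse($html);
$link->eof;

# Called once for each tag that carries links
sub check {
    my ($tag, %links) = @_;
    print "$tag\n";
    foreach my $key (keys %links) {
        print "$key -> $links{$key}\n";
    }
}
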
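# $tag is the tag type, such as a, link, img, or script
# %links is a hash: each key is a link attribute name, each value is that attribute's URL
# For example, for an a tag the key in %links is href and the value is the URL given in href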
# link
# href -> /favicon.ico
# link
# href -> /content-search.xml
# link
# href -> //www.baidu.com/img/baidu.svg
# link
# href -> //s1.bdstatic.com
# link
# href -> //t1.baidu.com
# link
# href -> //t2.baidu.com
# link
# href -> //t3.baidu.com
# link
# href -> //t10.baidu.com
# link
# href -> //t11.baidu.com
# link
# href -> //t12.baidu.com
# link
# href -> //b1.bdstatic.com
# img
# src -> //www.baidu.com/img/bd_logo1.png

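This code prints the name of every link-bearing tag in the page, together with each of its link attributes and the URL it points to.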

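What if we only want to print the img addresses? In that case we can check $tag to see which kind of tag the callback received, and extract data only from the tags we care about.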
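As a rough illustration of that idea, here is a minimal sketch that keeps only img tags. To stay runnable without network access it parses a small inline HTML string; that sample markup, the callback name collect_img, and the @images array are all assumptions made for this example and are not taken from the page fetched above.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical sample markup, used here instead of a live page
my $html = <<'HTML';
<html><head><link rel="icon" href="/favicon.ico"></head>
<body>
<a href="/about">About</a>
<img src="/img/logo.png">
<img src="/img/banner.png">
</body></html>
HTML

my @images;    # collected img src values

# Callback: skip every tag except img, then record its src attribute
sub collect_img {
    my ($tag, %links) = @_;
    return unless $tag eq 'img';
    push @images, $links{src} if defined $links{src};
}

my $extor = HTML::LinkExtor->new(\&collect_img);
$extor->parse($html);
$extor->eof;

print "$_\n" for @images;
```

The same callback could be passed to the parser in the earlier listing in place of check; the only change is the early return for tags other than img.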

For details, see: Perl HTML::LinkExtor module (2)