KingPaper Teaches You How to Write a Scraper

A quick introduction to the functions used

file_get_contents();

preg_match();
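To see what these two functions give you before touching a real site, here is a minimal sketch that runs preg_match() on a hard-coded string instead of a fetched page. The sample HTML and the helper name demo_match are made up for illustration; the pattern is the one used later in this tutorial.

```php
<?php
// demo_match() is a hypothetical helper, not part of PHP.
// It returns the matched <div> block, or null when nothing matches.
function demo_match(string $html): ?string
{
    // Same pattern as in the tutorial: the article <div> containing one child element.
    $regex = "/(<div class=\"content\" id=\"article\".*?><.*?>.*?<\/.*?><\/div>)/ism";

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    // On a match, $t[0] holds the whole match and $t[1] the first capture group.
    if (preg_match($regex, $html, $t) === 1) {
        return $t[0];
    }
    return null;
}

// Made-up sample standing in for what file_get_contents() would return.
$sample = '<html><body><div class="content" id="article"><p>Hello, world</p></div></body></html>';
$block = demo_match($sample);
```

Because the whole pattern sits inside one pair of parentheses, $t[0] and $t[1] contain the same text here.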

First, fetch the page you want to scrape and extract its content

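$str = file_get_contents("http://www.baidu.com"); // fetch the page; the http:// scheme is required, otherwise PHP looks for a local file

$regex = "/(<div class=\"content\" id=\"article\".*?><.*?>.*?<\/.*?><\/div>)/ism"; // regular expression

preg_match($regex,$str,$t); // run the match; the results land in the array $t

print_r($t); // print the array to inspect its contents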
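file_get_contents() returns false when the request fails, and by default it can wait a long time on a slow server. A minimal sketch of a more defensive fetch follows, assuming a 10-second timeout is acceptable; the helper name fetch_page and the user-agent string are invented for this example.

```php
<?php
// fetch_page() is a hypothetical helper: returns the body as a string, or null on failure.
function fetch_page(string $url, int $timeoutSeconds = 10): ?string
{
    // These options apply to http:// and https:// URLs; other wrappers ignore them.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'example-scraper/0.1', // assumed value, pick your own
        ],
    ]);

    // @ silences the warning PHP raises on failure; we check the return value instead.
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}
```

A call such as fetch_page("http://www.baidu.com") can then be checked against null before the result is handed to preg_match().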

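// Connect to the database (note: the mysql_* extension was removed in PHP 7.0; on newer versions use mysqli or PDO)

mysql_connect("localhost","root","root");

mysql_select_db("test");

// Escape and quote the matched HTML so the statement stays valid; NULL assumes id is an AUTO_INCREMENT column
mysql_query("insert into test(id,content) values(NULL,'" . mysql_real_escape_string($t[0]) . "')");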
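On current PHP versions the same insert can be written with PDO and a prepared statement, which also takes care of quoting the scraped HTML. This is a sketch and not the original author's code: it assumes the table test(id, content) from above with an auto-incrementing id, and the helper name save_article is invented.

```php
<?php
// save_article() is a hypothetical helper: inserts one row and returns its new id.
function save_article(PDO $pdo, string $content): int
{
    // The placeholder keeps quotes inside the HTML from breaking the SQL.
    $stmt = $pdo->prepare('INSERT INTO test (content) VALUES (:content)');
    $stmt->execute([':content' => $content]);

    return (int) $pdo->lastInsertId();
}

// Usage, assuming the same local MySQL credentials as in the tutorial:
// $pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'root', 'root',
//                [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// save_article($pdo, $t[0]);
```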

To scrape more than one page, wrap the fetch-and-match logic in a function

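function getcontent($url){

$str=file_get_contents($url);

if ($str === false) return null; // the fetch failed

$regex = "/(<div class=\"content\" id=\"article\".*?><.*?>.*?<\/.*?><\/div>)/ism";

// $t[0] only exists when preg_match finds a match, so return null otherwise

return preg_match($regex,$str,$t) === 1 ? $t[0] : null;

}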

Then put the URLs into an array

$str="http://www.baidu.com/1.html\nhttp://www.baidu.com/2.html\nhttp://www.baidu.com/3.html\nhttp://www.baidu.com/5.html";

$arr = explode("\n",$str);

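foreach($arr as $val) {

$content=getcontent(trim($val));

if (!$content) continue; // skip pages that could not be fetched or did not match

mysql_query("insert into test(id,content) values(NULL,'" . mysql_real_escape_string($content) . "')");

}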
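When the URL list is typed by hand or read from a file, entries often carry stray whitespace, blank lines, or no scheme at all. Here is a minimal sketch of a helper that cleans such a list before the loop runs. The name parse_url_list is invented, and prefixing http:// to bare hosts is an assumption that may not suit every site.

```php
<?php
// parse_url_list() is a hypothetical helper: turns a newline-separated string into clean URLs.
function parse_url_list(string $raw): array
{
    $urls = [];

    // \R matches any line ending (\n, \r\n or \r).
    foreach (preg_split('/\R/', $raw) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // ignore blank lines
        }
        if (!preg_match('#^https?://#i', $line)) {
            $line = 'http://' . $line; // assumption: bare hosts are plain HTTP
        }
        $urls[] = $line;
    }

    return $urls;
}
```

The returned array can be used in place of the result of explode("\n", $str) in the loop above.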

You are welcome to visit my website, 夕越网 (Xiyue): http://www.xiyue369.com

Having chosen to go independent, you have to stand out from the crowd and become one of the best.