Scraping a Tmall detail page with PHP and curl

The code is as follows:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://s.click.taobao.com/t?e=m%3D2%26s%3DItfnIoWePBscQipKwQzePOeEDrYVVa64LKpWJ%2Bin0XJRAdhuF14FMco7venWsqMa5x%2BIUlGKNpXfihkA92r7Zcnjyd38oaEmvvt5KfsX9OP1aKQAlGlgSeMqBIqCftrB');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$res = curl_exec($ch);
curl_close($ch);
// Header lines end with \r\n, so \r? keeps the carriage return out of the captured URL.
preg_match('/^Location: (?P<location>.*?)\r?$/m', $res, $match);
var_dump($match);

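CURLOPT_HEADER includes the response headers in the returned output.

CURLOPT_FOLLOWLOCATION makes curl keep following redirects and fetch each new page.

CURLOPT_RETURNTRANSFER controls whether the result is printed straight to the page or returned as a string that can be assigned to a variable.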
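When CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION are both enabled, the returned string contains the header block of every hop, while preg_match stops at the first Location line. A minimal sketch of collecting all of them is shown here; the helper name extract_locations and the shortened sample string are made up for illustration, and the pattern would also pick up a body line that happens to start with "Location:".

```php
<?php
// Hypothetical helper: collect every Location header from a raw response
// that contains the header blocks of all redirect hops.
function extract_locations(string $rawResponse): array
{
    // /m lets ^ and $ match at line boundaries, /i ignores header-name case,
    // and \r? keeps the trailing carriage return out of the captured URL.
    preg_match_all('/^Location:[ \t]*(.*?)\r?$/mi', $rawResponse, $matches);
    return $matches[1];
}

// Shortened sample shaped like a curl response with headers enabled.
$sampleResponse = "HTTP/1.1 302 Found\r\n"
    . "Location: http://jump.taobao.com/jump?target=a\r\n\r\n"
    . "HTTP/1.1 302 Found\r\n"
    . "Location: http://detail.tmall.com/item.htm?id=15670523848&tbpm=1\r\n\r\n"
    . "HTTP/1.1 200 OK\r\n\r\n<html></html>";

$locations = extract_locations($sampleResponse);
print_r($locations);
```

The last element of the returned array is the final redirect target, which is usually the URL of interest when resolving a short or affiliate link.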

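I used curl to fetch the Tmall detail page http://detail.tmall.com/item.htm?id=15670523848, and it often went through 9 redirects before the page came back, which is hard to believe: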
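In the trace that follows, each round hands out fresh cookies and the tbpm parameter climbs from 1 to 3. That pattern suggests, though the post does not confirm it, that the site retries because curl is not sending the cookies back. The sketch below turns on curl's in-memory cookie engine, caps the number of redirects, and reports how many were followed. The helper names, the redirect limit, and the timeout are assumptions, and whether cookies actually shorten the chain on this site is untested.

```php
<?php
// Hypothetical helper: build the option set for a redirect-following fetch.
function build_fetch_options(string $url, int $maxRedirects = 10): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => $maxRedirects, // give up instead of looping forever
        CURLOPT_COOKIEFILE     => '',            // empty string enables the in-memory cookie engine
        CURLOPT_TIMEOUT        => 15,            // seconds; an arbitrary choice
    ];
}

// Hypothetical helper: fetch a page and report how many redirects were followed.
function fetch_with_redirect_info(string $url, int $maxRedirects = 10): array
{
    $ch = curl_init();
    curl_setopt_array($ch, build_fetch_options($url, $maxRedirects));
    $body = curl_exec($ch);

    // The handle is released when $ch goes out of scope at the end of the function.
    return [
        'ok'            => $body !== false,
        'error'         => curl_error($ch),
        'redirects'     => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
        'effective_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'body'          => $body === false ? '' : $body,
    ];
}

// Usage (needs network access, so it is left commented out):
// $result = fetch_with_redirect_info('http://detail.tmall.com/item.htm?id=15670523848');
// echo $result['redirects'], ' redirects, final URL: ', $result['effective_url'], "\n";
```

Reading CURLINFO_REDIRECT_COUNT avoids having to count header blocks by hand, and CURLOPT_MAXREDIRS makes the request fail with an error rather than follow an endless loop.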

HTTP/1.1 302 Found
Server: Tengine
Date: Fri, 29 Nov 2013 04:16:49 GMT
Content-Type: text/html
Content-Length: 260
Connection: keep-alive
at_bucketid: sbucket_-1
X-Bucket-Id: -1
Location: http://jump.taobao.com/jump?target=http%3a%2f%2fdetail.tmall.com%2fitem.htm%3fid%3d15670523848%26tbpm%3d1
Cache-Control: 

HTTP/1.1 302 Found
Date: Fri, 29 Nov 2013 04:16:49 GMT
Content-Type: text/html
Content-Length: 260
Connection: close
Set-Cookie: _tb_token_=P0MPHCP5TfrL;domain=.taobao.com;Path=/;HttpOnly
Set-Cookie: cookie2=ed9f56e828c5d24c4a7d656bc23631db;domain=.taobao.com;Path=/;HttpOnly
Set-Cookie: t=5f0679cd08a7143456da77263771b8a9;domain=.taobao.com;Expires=Thu, 27-Feb-2014 04:16:49 GMT;Path=/
P3P: CP='CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR'
Location: http://pass.tmall.com/add?_tb_token_=P0MPHCP5TfrL&cookie2=ed9f56e828c5d24c4a7d656bc23631db&t=5f0679cd08a7143456da77263771b8a9&target=http%3a%2f%2fdetail.tmall.com%2fitem.htm%3fid%3d15670523848%26tbpm%3d1&pacc=emBUzSxUwN3zo8xNNtW6PQ==&opi=27.189.36.232&tmsc=1385698609459713

HTTP/1.1 302 Found
Date: Fri, 29 Nov 2013 04:16:49 GMT
Content-Type: text/html
Content-Length: 260
Connection: close
P3P: CP='CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR'
Set-Cookie: _tb_token_=P0MPHCP5TfrL;domain=.tmall.com;Path=/
Set-Cookie: cookie2=ed9f56e828c5d24c4a7d656bc23631db;domain=.tmall.com;Path=/
Set-Cookie: t=5f0679cd08a7143456da77263771b8a9;domain=.tmall.com;Path=/
Location: http://detail.tmall.com/item.htm?id=15670523848&tbpm=1

HTTP/1.1 302 Found
Server: Tengine
Date: Fri, 29 Nov 2013 04:16:49 GMT
Content-Type: text/html
Content-Length: 260
Connection: keep-alive
at_bucketid: sbucket_-1
X-Bucket-Id: -1
Location: http://jump.taobao.com/jump?target=http%3a%2f%2fdetail.tmall.com%2fitem.htm%3fid%3d15670523848%26tbpm%3d2
Cache-Control: 

HTTP/1.1 302 Found
Date: Fri, 29 Nov 2013 04:16:49 GMT
Content-Type: text/html
Content-Length: 260
Connection: close
Set-Cookie: _tb_token_=dmG3YeXiTDZl;domain=.taobao.com;Path=/;HttpOnly
Set-Cookie: cookie2=d4c7645aa3ac3e1e7e160e6f172cf82c;domain=.taobao.com;Path=/;HttpOnly
Set-Cookie: t=6d359ffa66dfa0c2cb8cd32d455718de;domain=.taobao.com;Expires=Thu, 27-Feb-2014 04:16:49 GMT;Path=/
P3P: CP='CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR'
Location: http://pass.tmall.com/add?_tb_token_=dmG3YeXiTDZl&cookie2=d4c7645aa3ac3e1e7e160e6f172cf82c&t=6d359ffa66dfa0c2cb8cd32d455718de&target=http%3a%2f%2fdetail.tmall.com%2fitem.htm%3fid%3d15670523848%26tbpm%3d2&pacc=ZBXxk-2FEQEv_91OVA1_mg==&opi=27.189.36.232&tmsc=1385698609736322

HTTP/1.1 302 Found
Date: Fri, 29 Nov 2013 04:16:49 GMT
Content-Type: text/html
Content-Length: 260
Connection: close
P3P: CP='CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR'
Set-Cookie: _tb_token_=dmG3YeXiTDZl;domain=.tmall.com;Path=/
Set-Cookie: cookie2=d4c7645aa3ac3e1e7e160e6f172cf82c;domain=.tmall.com;Path=/
Set-Cookie: t=6d359ffa66dfa0c2cb8cd32d455718de;domain=.tmall.com;Path=/
Location: http://detail.tmall.com/item.htm?id=15670523848&tbpm=2

HTTP/1.1 302 Found
Server: Tengine
Date: Fri, 29 Nov 2013 04:16:49 GMT
Content-Type: text/html
Content-Length: 260
Connection: keep-alive
at_bucketid: sbucket_-1
X-Bucket-Id: -1
Location: http://jump.taobao.com/jump?target=http%3a%2f%2fdetail.tmall.com%2fitem.htm%3fid%3d15670523848%26tbpm%3d3
Cache-Control: 

HTTP/1.1 302 Found
Date: Fri, 29 Nov 2013 04:16:49 GMT
Content-Type: text/html
Content-Length: 260
Connection: close
Set-Cookie: _tb_token_=kCsYFWOg1JIp;domain=.taobao.com;Path=/;HttpOnly
Set-Cookie: cookie2=b5f8f3695f410dde27a42d6d251933ff;domain=.taobao.com;Path=/;HttpOnly
Set-Cookie: t=2290940aff64002f214e1a12480b41dd;domain=.taobao.com;Expires=Thu, 27-Feb-2014 04:16:49 GMT;Path=/
P3P: CP='CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR'
Location: http://pass.tmall.com/add?_tb_token_=kCsYFWOg1JIp&cookie2=b5f8f3695f410dde27a42d6d251933ff&t=2290940aff64002f214e1a12480b41dd&target=http%3a%2f%2fdetail.tmall.com%2fitem.htm%3fid%3d15670523848%26tbpm%3d3&pacc=eOoBCHB4T2s-kmCmd9qMcg==&opi=27.189.36.232&tmsc=1385698609985332

HTTP/1.1 302 Found
Date: Fri, 29 Nov 2013 04:16:50 GMT
Content-Type: text/html
Content-Length: 260
Connection: close
P3P: CP='CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR'
Set-Cookie: _tb_token_=kCsYFWOg1JIp;domain=.tmall.com;Path=/
Set-Cookie: cookie2=b5f8f3695f410dde27a42d6d251933ff;domain=.tmall.com;Path=/
Set-Cookie: t=2290940aff64002f214e1a12480b41dd;domain=.tmall.com;Path=/
Location: http://detail.tmall.com/item.htm?id=15670523848&tbpm=3

HTTP/1.1 200 OK
Server: Tengine
Date: Fri, 29 Nov 2013 04:16:50 GMT
Content-Type: text/html;charset=GBK
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
at_bucketid: sbucket_-1
X-Bucket-Id: -1
Cache-Control: max-age=1
At_Autype: 4_63237005
At_Cat: item_50013957
X-Category: /cat/50011397
At_Nick: %E7%A5%A5%E7%A5%AF%E7%A6%8F%E7%8F%A0%E5%AE%9D%E6%97%97%E8%88%B0%E5%BA%97
At_Itemid: 15670523848
At_Isb: 1
At_Pgty: 2
At_Cat: 50013957
At_Brid: 86306948
At_Prid: 222598572
At_Autype: 0_63237005
At_Auid: 15670523848
Content-Language: zh-CN
X-Cache: HIT TCP_MEM_HIT dirn:-2:-2
Via: wagbridge010238184026.cm4:8888
Age: 1330
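A dump like the one above is easier to read when reduced to one entry per hop. The following is a rough sketch under the assumption that the header blocks are separated by blank lines; the function name summarize_hops and the shortened sample are invented for illustration.

```php
<?php
// Hypothetical helper: turn concatenated header blocks into one entry per hop.
function summarize_hops(string $rawHeaders): array
{
    $hops = [];
    // Blocks are separated by a blank line (CRLF CRLF on the wire, LF LF in pasted text).
    foreach (preg_split('/\r?\n\r?\n/', trim($rawHeaders)) as $block) {
        // Skip anything that does not start with an HTTP status line (for example the body).
        if (!preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $block, $status)) {
            continue;
        }
        $location = null;
        if (preg_match('/^Location:[ \t]*(.*?)\r?$/mi', $block, $m)) {
            $location = $m[1];
        }
        $hops[] = ['status' => (int) $status[1], 'location' => $location];
    }
    return $hops;
}

// Shortened sample in the same shape as the dump.
$sampleHeaders = "HTTP/1.1 302 Found\nServer: Tengine\n"
    . "Location: http://jump.taobao.com/jump?target=x\n\n"
    . "HTTP/1.1 302 Found\nConnection: close\n"
    . "Location: http://detail.tmall.com/item.htm?id=15670523848&tbpm=1\n\n"
    . "HTTP/1.1 200 OK\nServer: Tengine\n";

$hops = summarize_hops($sampleHeaders);
foreach ($hops as $i => $hop) {
    printf("%d. %d %s\n", $i + 1, $hop['status'], $hop['location'] ?? '(final response)');
}
```

Counting the entries whose status is 302 gives the number of redirects in a trace.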

curl really is quite powerful.

Original article: https://www.cnblogs.com/wangtongphp/p/3449585.html