PhantomJS

Scrape JavaScript pages with PhantomJS

PhantomJS: http://www.phantomjs.org/



PhantomJS is a headless, command-line browser based on WebKit. It executes JavaScript and can be used for testing web applications, web scraping, page capture, PDF conversion, SVG rendering, and many other use cases.



A PhantomJS script is a plain JavaScript file that looks like this:


console.log('Hello, world!');
phantom.exit();






It is a good tool for scraping dynamic pages that are built with JavaScript/AJAX. There are two ways to extract data from a site. People familiar with JavaScript can write a script against the PhantomJS API and scrape the pages directly. Others can use PhantomJS with a simple script that opens the pages and writes their rendered contents to a pipe or to files, then parse and scrape the result with other tools or programming languages.
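Both approaches can be sketched in one small script. This is a minimal sketch, not a definitive implementation: the file name dump.js, the optional "links" argument, and the function names collectLinks and main are assumptions made for illustration. The PhantomJS modules it relies on are webpage (page.open, page.evaluate, page.content) and system (system.args).

```javascript
// dump.js (hypothetical file name). Run as: phantomjs dump.js <url> [links]
// Without "links" it prints the rendered HTML so another tool can parse it;
// with "links" it scrapes the link targets directly inside the page.

// Executed inside the page by page.evaluate, so it may only use the DOM
// and must not reference variables from the outer script.
function collectLinks() {
    var anchors = document.querySelectorAll('a[href]');
    var hrefs = [];
    for (var i = 0; i < anchors.length; i++) {
        hrefs.push(anchors[i].href);
    }
    return hrefs;
}

function main() {
    var system = require('system');
    var page = require('webpage').create();

    // system.args[0] is the script name; the URL is the first real argument.
    if (system.args.length < 2) {
        console.log('Usage: phantomjs dump.js <url> [links]');
        phantom.exit(1);
        return;
    }
    var url = system.args[1];
    var mode = system.args[2]; // "links" is an illustrative flag, not a PhantomJS option

    page.open(url, function (status) {
        if (status !== 'success') {
            console.log('Failed to load ' + url);
            phantom.exit(1);
            return;
        }
        if (mode === 'links') {
            // Approach 1: scrape directly with JavaScript in the page context.
            console.log(page.evaluate(collectLinks).join('\n'));
        } else {
            // Approach 2: dump the HTML as it stands after scripts have run.
            console.log(page.content);
        }
        phantom.exit(0);
    });
}

// Only start when running under PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    main();
}
```

For example, phantomjs dump.js http://www.phantomjs.org/ > page.html saves the rendered page for a separate parser, while adding links as a second argument prints one link per line. The callback runs once the page has loaded, so content fetched by AJAX after that point may need a short delay (for instance with setTimeout) before it is read.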



More examples are available in the QuickStart guide: http://code.google.com/p/phantomjs/wiki/QuickStart

Original article: https://www.cnblogs.com/lexus/p/2390376.html