Anemone

An easy-to-use Ruby web spider framework

What is it?

Anemone is a Ruby library that makes it quick and painless to write programs that spider a website. It provides a simple DSL for performing actions on every page of a site, skipping certain URLs, and calculating the shortest path to a given page on a site.
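Finding the shortest path to a page comes down to breadth-first search over the site's link graph: the first time BFS reaches a page, it has reached it in the fewest clicks from the start page. The plain-Ruby sketch below does not use Anemone. It assumes the site has already been crawled into a hypothetical `site` hash (URL => outgoing links), and the `shortest_path` method is illustrative only. Anemone's own implementation may differ.

```ruby
# Sketch: shortest click path between two pages via breadth-first search.
# Assumption: +links+ is a hypothetical Hash of URL => array of linked URLs,
# standing in for pages that a crawler has already fetched.
def shortest_path(links, root, target)
  parent = { root => nil }   # how each URL was first reached (root has no parent)
  queue  = [root]
  until queue.empty?
    url = queue.shift
    if url == target
      # Walk the parent pointers back to the root to rebuild the path.
      path = []
      while url
        path.unshift(url)
        url = parent[url]
      end
      return path
    end
    links.fetch(url, []).each do |link|
      next if parent.key?(link)   # already reached in as few or fewer clicks
      parent[link] = url
      queue << link
    end
  end
  nil   # target is not reachable from root
end

site = {
  "/"       => ["/about", "/blog"],
  "/about"  => ["/team"],
  "/blog"   => ["/blog/1", "/about"],
  "/blog/1" => ["/team"],
  "/team"   => []
}

p shortest_path(site, "/", "/team")  # => ["/", "/about", "/team"]
```

Because BFS explores pages in order of click depth, the longer route through `/blog/1` is never chosen.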

The multi-threaded design makes Anemone fast. The API makes it simple. And the expressiveness of Ruby makes it powerful.
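One common way to structure a multi-threaded crawler is a shared URL queue drained by several worker threads. The sketch below is not Anemone's internal code. It assumes a hypothetical in-memory `site` hash in place of real HTTP requests, and `threaded_crawl` is an illustrative name.

```ruby
require "thread"   # Queue and Mutex; a no-op on modern Rubies where they are core

# Sketch: several worker threads share one URL queue.
# Assumption: +site+ is a hypothetical Hash of URL => linked URLs standing in
# for HTTP fetches, so the example runs without network access.
def threaded_crawl(site, root, thread_count: 4)
  queue   = Queue.new
  seen    = { root => true }   # URLs already enqueued
  visited = []                 # URLs actually processed
  lock    = Mutex.new
  pending = 1                  # URLs enqueued but not yet finished
  queue << root

  workers = Array.new(thread_count) do
    Thread.new do
      while (url = queue.pop) != :done
        links = site.fetch(url, [])   # stand-in for fetching and parsing a page
        lock.synchronize do
          visited << url
          links.each do |link|
            next if seen[link]
            seen[link] = true
            pending += 1
            queue << link
          end
          pending -= 1
          # No work left anywhere: send one stop token per worker.
          thread_count.times { queue << :done } if pending.zero?
        end
      end
    end
  end
  workers.each(&:join)
  visited.sort
end

site = {
  "/"  => ["/a", "/b"],
  "/a" => ["/c", "/"],
  "/b" => ["/c"],
  "/c" => []
}

p threaded_crawl(site, "/")  # => ["/", "/a", "/b", "/c"]
```

The mutex-protected `pending` counter is what lets the workers know the crawl is finished: an empty queue alone is not enough, because another worker may still be about to enqueue new links.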

What's new?

  • 02/17/2011 - Version 0.6.0 released. Added support for proxies, HTTP Basic Auth, and HTTP read timeouts. Fixed a bug that double-encoded links and a bug that raised an error on a read timeout.
  • 09/01/2010 - GitHub Issue Tracker - The Anemone project issue tracker has moved from Lighthouse to GitHub Issues.
  • 09/01/2010 - Version 0.5.0 released. Added Redis and MongoDB page storage engines, and the skip_query_strings option.

Where do I get it?

$ gem install anemone

You can also browse the code on GitHub.

How do I use it?

To get the most out of Anemone, read through the technical information and examples, as well as the RDoc documentation.

You can use Anemone to write tasks to gather useful statistics on your websites. Just point Anemone at a URL, and it will crawl every page in that domain. You can also tell Anemone to skip pages that match certain regular expressions. Using blocks, you tell Anemone what code to run on every page, or after it's done crawling.

For example, to print the URL of every page on a site:

require 'anemone'

Anemone.crawl("http://www.example.com/") do |anemone|
  anemone.on_every_page do |page|
    puts page.url
  end
end
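The crawl behavior described earlier (staying within one domain, skipping URLs that match regular expressions, and running blocks on every page and after the crawl) can be illustrated without network access. The class below is a hypothetical, single-threaded stand-in: `MiniSpider` and the `site` hash are assumptions for illustration, and although its method names echo Anemone's callbacks, it is not the gem's implementation. In this sketch, skip patterns are matched against the URL path.

```ruby
require "uri"

# Hypothetical stand-in for a block-based crawler DSL (not Anemone's code).
# +site+ maps absolute URLs to the absolute URLs they link to.
class MiniSpider
  def initialize(site, root)
    @site          = site
    @root          = URI(root)
    @page_blocks   = []
    @skip_patterns = []
    @after_blocks  = []
  end

  # Register a block to run once for every crawled page.
  def on_every_page(&block)
    @page_blocks << block
  end

  # Register regular expressions; links whose path matches are not followed.
  def skip_links_like(*patterns)
    @skip_patterns.concat(patterns)
  end

  # Register a block to run once with all crawled URLs.
  def after_crawl(&block)
    @after_blocks << block
  end

  def run
    seen    = { @root.to_s => true }
    queue   = [@root.to_s]
    crawled = []
    until queue.empty?
      url = queue.shift
      crawled << url
      @page_blocks.each { |b| b.call(url) }
      @site.fetch(url, []).each do |link|
        next if seen[link]
        uri = URI(link)
        next unless uri.host == @root.host                      # stay on one domain
        next if @skip_patterns.any? { |re| re.match?(uri.path) } # skip matching paths
        seen[link] = true
        queue << link
      end
    end
    @after_blocks.each { |b| b.call(crawled) }
    crawled
  end
end

site = {
  "http://www.example.com/" => ["http://www.example.com/about",
                                "http://www.example.com/private/x",
                                "http://other.example.org/"],
  "http://www.example.com/about" => ["http://www.example.com/"]
}

spider = MiniSpider.new(site, "http://www.example.com/")
spider.skip_links_like(%r{^/private/})
spider.on_every_page { |url| puts url }
spider.after_crawl { |urls| puts "#{urls.size} pages crawled" }
spider.run
# Prints:
# http://www.example.com/
# http://www.example.com/about
# 2 pages crawled
```

Registering blocks first and running them later is what makes this style read like a small DSL: configuration and behavior are declared up front, and `run` applies them during the crawl.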

Anemone also comes with a command-line frontend for several web-spider tasks. Just run 'anemone' on the command line. The source for several example programs is in the lib/anemone/cli directory of the project.

Who wrote it?

Anemone is written and maintained by Chris Kite. Development is sponsored by Vertive, Inc., the creator of Offers.com. The Anemone logo was created by Ismael Ayala.

Anemone is free to use under the terms of the MIT License.

I have a problem or a suggestion!

Check out the Anemone issue tracker, or contact the author.

Original article: https://www.cnblogs.com/lexus/p/2429088.html