whalebot C/C++ web crawler

Summary

Whalebot is an open-source web crawler. It is intended to be simple, fast, and memory-efficient. It was created as a targeted spider, but you may also use it as a general-purpose one.

Current release 0.02

Current state: bold items are done, normal items are TODO.

If something is broken or you have an idea, please visit http://groups.google.com/group/whalebot

Usages

  • It was used to collect papers on a target topic from http://citeseerx.ist.psu.edu for my master's degree work
  • Candidates for the logo were collected using whalebot
  • Eating our own dog food (links for the URL parsing benchmark)

Features

  • Simple configuration from command line and text files
  • Start/Stop/Resume fetching sessions