Artifact 0cf41c29b4d083f4889857c4b7a5c9052051b3ea:

Wiki page [crawl] by evilotto 2010-10-28 21:55:07.
D 2010-10-28T21:55:07
L crawl
U evilotto
W 457
<b>crawl</b> is a web crawler.  In its current form it is mostly suitable for generating load on a web server.

  *  adjustable load crawling
  *  selecting which discovered URLs to crawl by pattern
  *  broken link report, listing each broken link and the pages it was found on
  *  tries to honor robots.txt files
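
As a rough illustration of how the features above could fit together, here is a minimal Python sketch. It is not taken from crawl's source: the names <b>INCLUDE_PATTERNS</b>, <b>should_crawl</b>, <b>broken_link_report</b>, the <b>crawl-sketch</b> user agent and the example.com URLs are all assumptions made for this example, and crawl's real pattern syntax and report format may differ. The robots.txt rules are parsed from a literal list so the sketch runs without network access.

```python
import re
from collections import defaultdict
from urllib.robotparser import RobotFileParser

# Hypothetical include patterns; crawl's own pattern syntax may differ.
INCLUDE_PATTERNS = [re.compile(r"^https?://example\.com/docs/")]


def should_crawl(url, robots, agent="crawl-sketch"):
    """Return True if url matches an include pattern and robots.txt allows it."""
    if not any(p.search(url) for p in INCLUDE_PATTERNS):
        return False
    return robots.can_fetch(agent, url)


def broken_link_report(results):
    """Map each broken link to the sorted list of pages that contain it.

    results is an iterable of (page, link, status) tuples. A status of
    400 or above, or None for a failed request, is treated as broken.
    """
    report = defaultdict(set)
    for page, link, status in results:
        if status is None or status >= 400:
            report[link].add(page)
    return {link: sorted(pages) for link, pages in report.items()}


# Parse robots.txt rules from text instead of fetching them.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /docs/private/"])
```

A URL is queued only if it passes both the pattern check and the robots.txt check, and the report is keyed by broken link so that every page referencing it is listed together.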

  *  does not honor nofollow meta tags or link attributes
  *  doesn't do anything with crawled pages
Z 881a7225e7da3b12be5dbb6e7e67a29c