Overview
Artifact ID: | 0cf41c29b4d083f4889857c4b7a5c9052051b3ea |
---|---|
Page Name: | crawl |
Date: | 2010-10-28 21:55:07 |
Original User: | evilotto |
Content
crawl is a web crawler. In its current form it is mostly suitable for generating load on a web server.
Features:
- adjustable crawl load
- pattern-based selection of which discovered URLs to crawl
- broken link report, listing each broken link and the pages it was found on
- tries to honor robots.txt files
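
The following is a minimal Python sketch of how three of the features above could work; it is not taken from crawl's source. The pattern syntax (regular expressions), the `INCLUDE`/`EXCLUDE` lists, the function names, the `"crawl"` user-agent string, and the shape of the `results` tuples are all assumptions made for illustration. The robots.txt check uses Python's standard `urllib.robotparser` module.

```python
import re
from collections import defaultdict
from urllib.robotparser import RobotFileParser

# Hypothetical patterns; crawl's real pattern syntax is not documented here.
INCLUDE = [re.compile(r"^https?://example\.com/")]
EXCLUDE = [re.compile(r"\.(?:png|jpg|gif)$")]


def should_crawl(url):
    """Return True if url matches an include pattern and no exclude pattern."""
    if any(p.search(url) for p in EXCLUDE):
        return False
    return any(p.search(url) for p in INCLUDE)


def allowed_by_robots(robots_lines, url, agent="crawl"):
    """Check url against the already-fetched lines of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


def broken_link_report(results):
    """Map each broken link to the sorted list of pages that reference it.

    results is an iterable of (page_url, link_url, status) tuples, where
    status is an HTTP status code, or None if the request failed outright.
    """
    report = defaultdict(set)
    for page, link, status in results:
        if status is None or status >= 400:
            report[link].add(page)
    return {link: sorted(pages) for link, pages in report.items()}
```

Keying the report by the broken link rather than by the referring page means a link that is broken on many pages appears once, with all of its referrers listed together.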
Missing:
- does not honor nofollow meta tags or link attributes
- doesn't do anything with crawled pages
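
For reference, honoring nofollow would roughly mean skipping links on pages whose robots meta tag contains `nofollow`, and skipping individual links marked `rel="nofollow"`. The sketch below shows one way this could be done with Python's standard `html.parser` module; the class and function names are hypothetical and nothing here reflects crawl's actual code.

```python
from html.parser import HTMLParser


class NofollowAwareLinkParser(HTMLParser):
    """Collect links and record page-level and link-level nofollow hints."""

    def __init__(self):
        super().__init__()
        self.page_nofollow = False
        self.links = []  # list of (href, followable) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # e.g. <meta name="robots" content="noindex, nofollow">
            content = (attrs.get("content") or "").lower()
            if "nofollow" in [token.strip() for token in content.split(",")]:
                self.page_nofollow = True
        elif tag == "a" and attrs.get("href"):
            # e.g. <a rel="nofollow" href="...">
            rel = (attrs.get("rel") or "").lower().split()
            self.links.append((attrs["href"], "nofollow" not in rel))


def followable_links(html):
    """Return the hrefs a nofollow-respecting crawler would be allowed to follow."""
    parser = NofollowAwareLinkParser()
    parser.feed(html)
    parser.close()
    if parser.page_nofollow:
        return []
    return [href for href, followable in parser.links if followable]
```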