You are reading this document because you found a reference to it in your web logs. I am taking this opportunity to explain why we believe that the robots.txt exclusion does not apply to us.

The first version of the script that produced the log entry in question, and led you here, did indeed respect robots.txt. However, after further study and after consulting http://www.robotstxt.org/orig.html, I am of the opinion that our script does not qualify as a robot as defined therein. It does not "traverse many pages in the World Wide Web by recursively retrieving linked pages." There are no recursive link-harvesting functions in our scripts. Should we decide to use this technique, we will start respecting robots.txt. Until then, fetching a robots.txt from every site is in itself a waste of resources.

One possible legitimate reason for respecting robots.txt, even for a non-recursive program such as ours, is the concern that many pages will be fetched at a rate faster than a human with a browser and a mouse would fetch them. As of today, Tue Feb 3 2009, we have taken steps to ensure that no more than 10 pages per minute will be retrieved from any single domain name*.

Comments are welcome. I am happy to debate this and may change our policy should there be sufficient opposition. However, I would also push for a change in the definition, and for a clear statement of what counts as a "robot".

Aaron Flin
af(at)kilomonkey.com

* There are a handful of domains for which there are more than 50,000 URLs. In these cases, we may not be completely successful. If there is a problem, please let us know and we will do whatever is necessary to accommodate you.
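
For readers curious how a cap like "no more than 10 pages per minute from any single domain name" can be enforced, here is a minimal sketch. It is not our actual script. The class name DomainRateLimiter, the method try_acquire, and the choice of a sliding 60-second window keyed on the URL's host name are all assumptions made for illustration.

```python
import collections
from urllib.parse import urlsplit


class DomainRateLimiter:
    """Hypothetical limiter: at most max_requests fetches per window per host."""

    def __init__(self, max_requests=10, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Host name -> timestamps of recent fetches, oldest first.
        self._history = collections.defaultdict(collections.deque)

    def try_acquire(self, url, now):
        """Record a fetch and return 0.0 if allowed, else seconds to wait."""
        # Assumption: "domain name" is taken to mean the URL's host name.
        host = urlsplit(url).hostname or ""
        stamps = self._history[host]
        # Forget fetches that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) < self.max_requests:
            stamps.append(now)
            return 0.0
        # The window is full; a slot frees up when the oldest fetch expires.
        return self.window_seconds - (now - stamps[0])
```

A caller would pass time.monotonic() as now, sleep for whatever try_acquire returns, and then ask again before fetching. Keying on the host name means that www.example.com and example.com are counted separately. A stricter version would group hosts by their registered domain.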