From: Glenn Nieuwenhuyse

> wget -T 1 -t 1 -r --reject="robots.*" [...]
> 
> I would expect this not to download the robots.txt file, but still it
> does.

   Perhaps because "robots.txt" is a special case: it is not selected
by following links, so it is unaffected by the --reject option.

   A search for "robot" in the manual should reveal this:

      http://www.gnu.org/software/wget/manual/wget.html

robots = on/off
     Specify whether the norobots convention is respected by Wget,
     "on" by default. This switch controls both the /robots.txt and
     the nofollow aspect of the spec. See Robot Exclusion, for more
     details about this. Be sure you know what you are doing before
     turning this off.

So, adding "-e robots=off" to your command might help.
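
   As an untested sketch, with the rest of your command left elided as
you posted it, that would look something like this:

      wget -T 1 -t 1 -r -e robots=off --reject="robots.*" [...]

   The "-e" option executes its argument as if it were a line in a
".wgetrc" startup file, so the same setting could presumably be made
permanent there instead. The file location below is an assumption (the
usual per-user default):

```
# ~/.wgetrc (assumed location of the per-user startup file)
# Same effect as "-e robots=off" on the command line.
robots = off
```

   Putting it on the command line keeps the change limited to the one
job; putting it in the startup file applies it to every run, which the
manual's warning suggests is seldom what you want.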

------------------------------------------------------------------------

   Steven M. Schweda               [EMAIL PROTECTED]
   382 South Warwick Street        (+1) 651-699-9818
   Saint Paul  MN  55105-2547
