Using wget 1.8.2:

    $ wget --page-requisites http://news.com.com

...fails to retrieve most of the files that are required to properly render the HTML document, because they are forbidden by http://news.com.com/robots.txt .

I think that use of --page-requisites implies that wget is being used as a "save this entire web page as..." utility for later human viewing, rather than as a text-indexing spider that wants to analyze the content but not the presentation. So I believe that wget should ignore robots.txt when --page-requisites is specified.

If you agree then I'll try to write a patch & send it to you this week... please let me know if you agree or disagree. Thanks!

--- the gory bits:

"wget -d --page-requisites http://news.com.com" says:

    appending "http://news.com.com/i/hdrs/ne/y_fd.gif" to urlpos.

etc., but then later says:

    Deciding whether to enqueue "http://news.com.com/i/hdrs/ne/y_fd.gif".
    Rejecting path i/hdrs/ne/y_fd.gif because of rule `i/'.
    Not following http://news.com.com/i/hdrs/ne/y_fd.gif because robots.txt forbids it.
    Decided NOT to load it.
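
To show the exclusion logic in isolation, here is a small sketch -- not wget's code, just Python's standard robotparser fed a guessed rule. The "Disallow: /i/" line is my assumption, inferred from the "rule `i/'" debug message above, not copied from the actual news.com.com/robots.txt. It reproduces the same decision for the same URL:

    from urllib.robotparser import RobotFileParser

    # Hypothetical rules, inferred from the debug output; the real
    # news.com.com/robots.txt may differ.
    rules = [
        "User-agent: *",
        "Disallow: /i/",
    ]

    rp = RobotFileParser()
    rp.parse(rules)

    # The same inline image that --page-requisites queued and then dropped:
    url = "http://news.com.com/i/hdrs/ne/y_fd.gif"
    print(rp.can_fetch("Wget/1.8.2", url))  # False -> the page requisite is skipped

The patch I have in mind would just make wget skip that check when --page-requisites is in effect, as described above.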
...fails to retrieve most of the files that are required to properly render the HTML document, because they are forbidden by http://news.com.com/robots.txt . I think that use of --page-requisites implies that wget is being used as a "save this entire web page as..." utility for later human viewing, rather than a text indexing spider that wants to analyze the content but not the presentation. So I believe that wget should ignore robots.txt when --page-requisites is specified. If you agree then I'll try to write a patch & send it to you this week... please let me know if you agree or disagree. Thanks! --- the gory bits: "wget -d --page-requisites http://news.com.com" says: appending "http://news.com.com/i/hdrs/ne/y_fd.gif" to urlpos. etc., but then later says: Deciding whether to enqueue "http://news.com.com/i/hdrs/ne/y_fd.gif". Rejecting path i/hdrs/ne/y_fd.gif because of rule `i/'. Not following http://news.com.com/i/hdrs/ne/y_fd.gif because robots.txt forbids it. Decided NOT to load it.