I'd suggest "wget" for spidering sites. It can be told to ignore
robots.txt files, and it is good for mirroring sites which you
suspect may be taken down. Windows and Unix versions are available.
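A minimal sketch of such an invocation, assuming GNU wget; the flags
shown are one reasonable combination, not the poster's exact command:

    # Mirror the site recursively; -e robots=off tells wget to ignore
    # robots.txt, --mirror enables recursion plus timestamping.
    wget -e robots=off --mirror --convert-links --page-requisites \
         --wait=1 http://www.whitehouse.gov/

--wait=1 adds a one-second pause between requests, which keeps the
crawl polite even though robots.txt is being ignored.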
http://shock-awe.info/archive/000965.php
FB
From: "Declan McCullagh" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, December 10, 2003 9:41 AM
Subject: Re: whitehouse.gov/robots.txt
> This robots.txt issue was exaggerated by leftist critics of the
> administration.
Can somebody with a webspider crawl these documents, and put them up
on the web?
http://www.whitehouse.gov/robots.txt
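The paths a crawler would need are listed right in that file; a
sketch of extracting them, assuming curl and awk, with the output
file name purely illustrative:

    # Turn each Disallow rule in robots.txt into a full URL.
    curl -s http://www.whitehouse.gov/robots.txt \
      | awk '/^Disallow:/ {print "http://www.whitehouse.gov" $2}' \
      > urls.txt
    # Then fetch the whole list, ignoring robots.txt.
    wget -e robots=off -i urls.txt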
-- Eugen* Leitl <http://leitl.org>
______________________________________________
ICBM: 48.07078, 11.61144    http://www.leitl.org
8B29F6BE: 09
This robots.txt issue was exaggerated by leftist critics of the
administration. (This is not a general defense of the White House,
just a statement of fact.) The Bush WH.gov server has a special Iraq
section where press releases, speeches, etc. are reposted in a
different HTML template. The WH onl
On Wed, Dec 10, 2003 at 12:56:24PM +0100, Eugen Leitl wrote:
> Can somebody with a webspider crawl these documents, and put them up
> on the web?
>
> http://www.whitehouse.gov/robots.txt
All or nearly all of them are duplicates of the same documents
elsewhere in the directory tree; "X/text/" and
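That duplicate claim is easy to spot-check by fetching a blocked page
and its counterpart and diffing them; a sketch assuming curl, with
both paths made up for illustration:

    # Fetch the /text/ copy and the regular copy of the same page,
    # then diff them; differences should be template markup only.
    curl -s http://www.whitehouse.gov/iraq/text/example.html > a.html
    curl -s http://www.whitehouse.gov/iraq/example.html > b.html
    diff a.html b.html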