Re: whitehouse.gov/robots.txt

2003-12-11 Thread Major Variola (ret)
I'd suggest "wget" for spidering sites. It can be told to ignore robots.txt files, and it is good for mirroring sites which you suspect may be taken down. Win/Unix versions available.
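As a rough sketch (option names are from GNU wget and may differ slightly between versions), a mirroring run that ignores robots.txt would look something like:

    wget -e robots=off --mirror --convert-links --wait=1 http://www.whitehouse.gov/

Here -e robots=off tells wget to disregard robots.txt, --mirror turns on recursion with timestamping, --convert-links rewrites links so the copy browses locally, and --wait=1 pauses a second between requests.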

Re: whitehouse.gov/robots.txt

2003-12-10 Thread FB`
http://shock-awe.info/archive/000965.php FB` From: "Declan McCullagh" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Wednesday, December 10, 2003 9:41 AM Subject: Re: whitehouse.gov/robots.txt > This robots.txt issue was exaggerated by leftist critics of the > administ

whitehouse.gov/robots.txt

2003-12-10 Thread Eugen Leitl
Can somebody with a webspider crawl these documents, and put them up on the web? http://www.whitehouse.gov/robots.txt -- Eugen* Leitl http://leitl.org ICBM: 48.07078, 11.61144 8B29F6BE: 09
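A sketch of where such a crawl would start (not an actual crawler, just a way to list the paths in question): pull the Disallow entries out of robots.txt and hand them to a fetcher, e.g.

    wget -q -O - http://www.whitehouse.gov/robots.txt | grep -i '^Disallow:' | awk '{print $2}' > paths.txt

Each line of paths.txt is then a path under http://www.whitehouse.gov/ that the site asks spiders to skip, which is exactly the set of documents being asked about.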

Re: whitehouse.gov/robots.txt

2003-12-10 Thread Declan McCullagh
This robots.txt issue was exaggerated by leftist critics of the administration. (This is not a general defense of the White House, just a statement of fact.) The Bush WH.gov server has a special Iraq section where press releases, speeches, etc. are reposted in a different HTML template. The WH onl

Re: whitehouse.gov/robots.txt

2003-12-10 Thread Anatoly Vorobey
On Wed, Dec 10, 2003 at 12:56:24PM +0100, Eugen Leitl wrote: > Can somebody with a webspider crawl these documents, and put it up > on the web? > > http://www.whitehouse.gov/robots.txt All or nearly all of them are duplicates of the same documents elsewhere in the directory tree; "X/text/" and