Re: Mirroring a site on the Internet Archive

Micah Cowan Fri, 07 Dec 2007 21:24:10 -0800

Josh Williams wrote:
> On 12/7/07, Brian <[EMAIL PROTECTED]> wrote:
>> For the life of me, I cannot convince wget to download an old copy of a
>> website from the Internet Archive. I think the url within a url is somehow
>> messing it up..
>>
>>  wget -e robots=off --base=http://web.archive.org/web/19990125085924/http://gnu.org/ \
>>    -r -Gbase http://web.archive.org/web/19990125085924/http://gnu.org/
>>
>> How can I get this to work?
> 
> We've seen this issue a lot. IIRC, the --base option does no good in
> this instance because the problem is actually a parsing error.


There's no parsing error. The Archive uses JavaScript to rewrite the URLs
so that, in a browser, they refer to archived pages; without JavaScript
(as with wget), the links still point at the original site.
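
A minimal sketch of that mapping, in Python, assuming an archived URL is
formed by prefixing http://web.archive.org/web/<timestamp>/ to the
original URL (the pattern visible in Brian's command). The function name
to_archive_url is hypothetical, and this only illustrates the idea; it is
not the Archive's actual JavaScript:

```python
from urllib.parse import urljoin

# Assumed prefix, taken from the URL in the question; not an official API.
ARCHIVE_PREFIX = "http://web.archive.org/web/"


def to_archive_url(href: str, page_url: str, timestamp: str) -> str:
    """Map a link found on an archived page to its archived counterpart.

    href      -- the link as written in the page (absolute or relative)
    page_url  -- the ORIGINAL (live-site) URL of the page containing href
    timestamp -- the snapshot timestamp, e.g. "19990125085924"
    """
    # Resolve the link against the original page. Without this rewrite,
    # this live-site URL is what a JavaScript-less client like wget follows.
    original = urljoin(page_url, href)
    # Hypothetical rewrite: put the archive prefix and timestamp in front.
    return f"{ARCHIVE_PREFIX}{timestamp}/{original}"


if __name__ == "__main__":
    print(to_archive_url("/software/", "http://gnu.org/", "19990125085924"))
```

For example, the link "/software/" on the 1999 gnu.org front page maps to
http://web.archive.org/web/19990125085924/http://gnu.org/software/,
whereas wget, seeing the unrewritten page, would head for the live
http://gnu.org/software/ instead.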

Note that the Archive's terms of service prohibit the use of automated
crawlers, and in particular the making of personal copies.

--base doesn't help here because it isn't intended to override _real_
bases; it only specifies the base for relative links that wget reads
from an input file (given with -i/--input-file).
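
To see why a base URL can't rescue links that point at the original
site, here is a deliberately simplified model in Python. It assumes the
archived page's links are absolute URLs to the live site; the name
resolve_against_base is hypothetical, and this is not wget's actual
resolution code (which follows the full URL rules). A relative link picks
up the base, while an absolute one ignores it:

```python
from urllib.parse import urlsplit

# Base URL taken from the --base= argument in the question.
BASE = "http://web.archive.org/web/19990125085924/http://gnu.org/"


def resolve_against_base(link: str, base: str = BASE) -> str:
    """Simplified model of resolving one link read from an input file.

    Only absolute links (with a scheme) and plain relative paths such as
    "software/" are modelled; other forms are rejected, not guessed at.
    """
    if urlsplit(link).scheme:
        # An absolute link already names its own host, so the base is
        # ignored: it keeps pointing at the original site.
        return link
    if link.startswith(("/", "./", "../")):
        raise ValueError("root- and dot-relative links are not modelled here")
    # Keep the base up to and including its last "/", then append the link.
    return base[: base.rfind("/") + 1] + link


if __name__ == "__main__":
    print(resolve_against_base("software/"))
    print(resolve_against_base("http://gnu.org/software/"))
```

The first call lands inside the archive; the second still points at the
live gnu.org, which is exactly the case --base was never meant to handle.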

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/