Josh Williams wrote:
> On 12/7/07, Brian <[EMAIL PROTECTED]> wrote:
>> For the life of me, I cannot convince wget to download an old copy of a
>> website from the Internet Archive. I think the url within a url is somehow
>> messing it up..
>>
>> wget -e robots=off --base=
>> http://web.archive.org/web/19990125085924/http://gnu.org/
>> -r -Gbase
>> http://web.archive.org/web/19990125085924/http://gnu.org/
>>
>> How can I get this to work?
>
> We've seen this issue a lot. IIRC, the --base option does no good in
> this instance because the problem is actually a parsing error.
It's not a parsing error. The Archive uses JavaScript to rewrite the URLs so
that, in a browser, they point to archived pages; without JavaScript, they
still point at the original links.

Note that the Archive's terms of service prohibit the use of automated
crawlers, and in particular the making of personal copies.

--base doesn't work for this because it isn't intended to override _real_
bases. It specifies the base for relative links that wget reads from input
files.

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/