Re: Mirroring a site on the Internet Archive

Brian Fri, 07 Dec 2007 22:05:44 -0800

Thanks for the comments. Here's the solution.

http://groups.google.com/group/comp.os.linux.help/msg/5b086b3500985efe


On Dec 7, 2007 10:19 PM, Micah Cowan <[EMAIL PROTECTED]> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Josh Williams wrote:
> > On 12/7/07, Brian <[EMAIL PROTECTED]> wrote:
> >> For the life of me, I cannot convince wget to download an old copy of a
> >> website from the Internet Archive. I think the URL within a URL is somehow
> >> messing it up.
> >>
> >>  wget -e robots=off \
> >>    --base=http://web.archive.org/web/19990125085924/http://gnu.org/ \
> >>    -r -Gbase \
> >>    http://web.archive.org/web/19990125085924/http://gnu.org/
> >>
> >> How can I get this to work?
> >
> > We've seen this issue a lot. IIRC, the --base option does no good in
> > this instance because the problem is actually a parsing error.
>
> No parsing error. Archive uses JavaScript to reset the URLs to refer to
> archive pages in a browser, but without JavaScript they're pointing at
> the original links.
>
> Note that the archive terms of service prohibit the use of automated
> crawlers, and in particular making personal copies.
>
> - --base doesn't work for this because it's not intended to override
> _real_ bases, but to specify the base for relative links that wget reads
> from input files.
>
> - --
> Micah J. Cowan
> Programmer, musician, typesetting enthusiast, gamer...
> http://micah.cowan.name/
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFHWila7M8hyUobTrERAjYsAJ43y4F+/eoqik1itAsZjm2d0BnwFgCfUkKn
> 4du9KE4ozn1CGOROS3xeTKg=
> =Uqws
> -----END PGP SIGNATURE-----
>
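
For anyone reading this later, the link rewriting Micah describes can be
sketched roughly as below. This is only an illustration in Python, not the
solution linked above and not the archive's actual script: the function name
to_archive_url, the ARCHIVE_PREFIX constant, and the default original_base are
all hypothetical, with the snapshot prefix copied from the command quoted in
this thread. It resolves each link against the original site and then points
it at the archived snapshot.

```python
from urllib.parse import urljoin

# Assumed snapshot prefix, copied from the wget command quoted in this thread.
ARCHIVE_PREFIX = "http://web.archive.org/web/19990125085924/"


def to_archive_url(link, original_base="http://gnu.org/"):
    """Return an archive-snapshot URL for a link found in an archived page.

    Hypothetical helper: it imitates the idea of rewriting original links so
    that they refer to archived copies instead of the live site.
    """
    if link.startswith(ARCHIVE_PREFIX):
        # Already refers to the snapshot; leave it unchanged.
        return link
    # Resolve relative links against the original site's base URL.
    absolute = urljoin(original_base, link)
    # Prepend the snapshot prefix so the link refers to the archived copy.
    return ARCHIVE_PREFIX + absolute


if __name__ == "__main__":
    for href in ("/software/", "http://www.fsf.org/", "philosophy/"):
        print(to_archive_url(href))
```

A tool that does not run JavaScript would need a step like this applied to
every link it extracts before following it. Please also keep in mind Micah's
note about the archive's terms of service.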
