Anders Rosendal asked:
> Could you make an option to only fetch from other hosts what is directly
> referenced from the orig page?
Have you tried the "--page-requisites" (a.k.a. "-p") command line option?
The info documentation says this:
Actually, to download a single page and all its requisites (even
if they exist on separate websites), and make sure the lot
displays properly locally, this author likes to use a few options
in addition to `-p':

     wget -E -H -k -K -nh -p http://SITE/DOCUMENT

In one case you'll need to add a couple more options. If DOCUMENT
is a `<FRAMESET>' page, the "one more hop" that `-p' gives you
won't be enough--you'll get the `<FRAME>' pages that are
referenced, but you won't get _their_ requisites. Therefore, in
this case you'll need to add `-r -l1' to the commandline. The `-r
-l1' will recurse from the `<FRAMESET>' page to the `<FRAME>'
pages, and the `-p' will get their requisites. If you're already
using a recursion level of 1 or more, you'll need to up it by one.
In the future, `-p' may be made smarter so that it'll do "two
more hops" in the case of a `<FRAMESET>' page.
To finish off this topic, it's worth knowing that Wget's idea of an
external document link is any URL specified in an `<A>' tag, an
`<AREA>' tag, or a `<LINK>' tag other than `<LINK REL="stylesheet">'.
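
For the frame case described in the manual text, the combined command would presumably look like the sketch below. It only puts together the options quoted above and adds nothing new. http://SITE/DOCUMENT is the manual's own placeholder, and the variable names are mine. The script prints the command rather than running it, since I have not checked that every Wget version accepts all of these options (`-nh' in particular is copied as-is from the manual text).

```shell
#!/bin/sh
# Placeholder URL taken from the manual; replace with the real page.
url="http://SITE/DOCUMENT"

# Options the manual's author uses for one page plus its requisites.
base_opts="-E -H -k -K -nh -p"

# Extra options for the frame case: recurse one level so that the
# requisites of each referenced frame page are fetched as well.
frame_opts="-r -l1"

# Assemble the command line as text and print it; nothing is downloaded.
cmd="wget $base_opts $frame_opts $url"
printf '%s\n' "$cmd"
```

Once the printed line looks right for your page, paste it into the shell with a real URL in place of the placeholder. If you already recurse with some `-l' value, raise that value by one instead of adding `-l1', as the manual text says.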