Micha <[EMAIL PROTECTED]> writes:
> Imagine fetching and caching a huge wiki section automatically,
> so that it is on your laptop when you finally, on a plane or in a train,
> find the time to read it.
> I believe that for now I can only try something like
> fetching from the same host, starting from page bla with recursion depth x.
> (That is, if I don't want to script it; but anyway I'm talking about
> wwwoffle features here.)
> However, *many* upload, feedback or edit links will be
> fetched too, and generally a lot of material I'm not interested in.
> And maybe I want to cache the desired pages (say, medical or science) for
> a long time, with no cleanup, so cluttering the cache is even worse.
>
> Recursive fetching could be enhanced by wildcard matching.
> Usually, wiki content pages (and, as a general rule, other sites too) have
> specific subdomain/subfolder URLs. Wildcards could solve the
> problem, and I could do something like fetch everything starting from
> page 'insulin', recursion depth 10, staying on the same host, following only
> links to other wiki knowledge content, and only if not already cached.
This already exists in the DontGet section of the configuration
file. There is an option that allows pages to be marked so that they are
not fetched recursively (but they will still be fetched for non-recursive
requests).
You can specify that only pages on the same host are to be fetched, and
there is a set of configuration options to ignore the pages that you don't
want.
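As a rough sketch, a DontGet section along the following lines could
exclude a wiki's edit and upload pages entirely and keep its special pages
out of recursive fetches only. The wiki URLs are made up, and the per-URL
option name "get-recursive" is an assumption on my part, so check the
configuration documentation for your version for the exact syntax:

 DontGet
 {
  # Never fetch the wiki's edit or upload pages (illustrative URLs).
  http://wiki.example.org/edit/*
  http://wiki.example.org/upload/*

  # Skip special pages during recursive fetches only; they can still be
  # requested individually (assumed option name: get-recursive).
  <http://wiki.example.org/Special:*> get-recursive = no
 }

With entries like these, a recursive fetch starting from the 'insulin' page
would skip the matching URLs while still following the ordinary content
links.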
--
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop [EMAIL PROTECTED]
http://www.gedanken.demon.co.uk/
WWWOFFLE users page:
http://www.gedanken.demon.co.uk/wwwoffle/version-2.9/user.html