Micha <[EMAIL PROTECTED]> writes:
> > check if the URLs no longer exist you would need to check the HTTP
> > status value.
>
> Am I right to expect that there's no cache entry (new file or modification)
> for a 404, not even if the domain is already cached?
No, that is incorrect; the 404 error page is cached by WWWOFFLE.
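
As a rough sketch (in Python, purely for illustration: the file name
urls.txt and the proxy address localhost:8080 are my assumptions, not
anything WWWOFFLE dictates), you could check the current HTTP status of
each URL in your list like this:

    #!/usr/bin/env python3
    # Sketch: read a list of URLs and report the HTTP status of each, so
    # that URLs which now return 404 can be spotted.  Assumes "urls.txt"
    # holds one URL per line; routing the requests through a proxy on
    # localhost:8080 is an assumption and can be dropped.
    import urllib.request, urllib.error

    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": "http://localhost:8080"}))

    with open("urls.txt") as f:
        for url in (line.strip() for line in f):
            if not url:
                continue
            try:
                status = opener.open(url, timeout=30).getcode()
            except urllib.error.HTTPError as e:
                status = e.code        # e.g. 404 for pages that no longer exist
            except (urllib.error.URLError, OSError):
                status = None          # network failure, not an HTTP status
            print(status, url)

Anything that comes back as 404 is a candidate for dropping; a URL that
fails with a network error has not necessarily gone away, so you would not
want to treat it the same.
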
> > The problem that you would get is that lots of pages would have
> > changed and you would need to get new images and things for them. You
> > would end up with lots more in the cache than you had before and no
> > way of knowing what had changed and what had stayed the same.
>
> My first thought is to feed this list of URLs (only the HTTP pages,
> not the sub-content URLs) to a WWWOFFLE building a cache from
> scratch; that is, if it worked, I could delete the old cache anyway.
> Assuming I exclude some stuff (which I would copy over literally),
> we are talking about less than 2 GB, which should be affordable
> in one night, over a few hours.
How would you work out which pages are worth keeping and which are not?
If you could do that, then the method that you suggest (feeding the list
of URLs to a fresh WWWOFFLE) would work quite well.
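
A minimal sketch of that approach, again assuming a file urls.txt and a
fresh WWWOFFLE listening as a proxy on localhost:8080 (both are
assumptions): filter the list down to the main pages and request each one
through the new proxy, so that only pages which still exist get fetched
into the new cache.

    #!/usr/bin/env python3
    # Sketch of the "feed the list to a fresh WWWOFFLE" idea: keep only
    # the page URLs (drop obvious sub-content such as images and
    # stylesheets) and request each one through the new proxy.  The file
    # name and proxy address are assumptions.
    import urllib.request, urllib.error

    SKIP = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": "http://localhost:8080"}))

    with open("urls.txt") as f:
        for url in (line.strip() for line in f):
            if not url or url.lower().endswith(SKIP):
                continue               # only the main pages, not sub-content
            try:
                opener.open(url, timeout=60).read()   # fetch via the proxy
            except (urllib.error.URLError, OSError) as e:
                print("failed:", url, e)

Whatever the new proxy fetches successfully ends up in the fresh cache,
and the pages that no longer exist simply never get stored there.
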
> To me the problem seems rather to be that I would end up with a lot of
> main pages whose related links are lost (because they no longer exist),
> so the cache would be much smaller. But that's the goal anyway.
--
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop [EMAIL PROTECTED]
http://www.gedanken.demon.co.uk/
WWWOFFLE users page:
http://www.gedanken.demon.co.uk/wwwoffle/version-2.9/user.html