Micha <[EMAIL PROTECTED]> writes:

> Indeed there were some strange occurrences in the past,
> but I didn't notice the problem with redirections until now.
> And of course I set keep-cache-if-not-found = yes for all pages; I
> definitely use wwwoffle as an archive, too.
> 
> > The purpose of the option is to stop another URL replacing the one
> > that is in the cache if it has been removed or replaced with a
> > redirection.  While it is possible to make this option apply to all
> > pages in the cache, I think that the original purpose of it was to
> > preserve specific archives of pages from selected hosts.
> 
> But what would be a straightforward approach, once the setting is
> applied to all pages?
> 
> Often the reason for redirects is that the address has changed, and
> there's some chance that even the redirect will disappear
> later.

Just as often the redirect will take you to a "page not found" page, or
to a search page that lets you look for the content that has gone
missing.  Many different pages may then be redirected to the same
single page, which is totally different from a redirection that gives
a new URL for the same information as the old page.

> Shouldn't a clean proxy cache represent the facts 'out there'
> as accurately as possible, and adapt the cache to the new situation?

Yes, a clean proxy would remove the redirection page and then fetch
the page that it was redirected to.  It would not move the old page to
the new location since, to be clean, it would need to be sure that the
content really had moved.

WWWOFFLE is not a clean proxy in this sense: it keeps pages that it
should delete and doesn't refresh pages that it should refresh.  It does
these things for your benefit, to allow you to access the pages when
offline and to minimise the traffic when online.
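The keep-cache-if-not-found behaviour, as described in this thread, can be modelled in a few lines of Python. This is a sketch under stated assumptions, not WWWOFFLE's code: the Response class, the store() function and the dictionary cache are hypothetical, and "error or redirect" is approximated as a status of 400 or more, or any response carrying a redirect location.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Response:
    status: int
    body: str = ""
    location: Optional[str] = None  # set for redirects (hypothetical model)


def store(cache: Dict[str, Response], url: str, new: Response,
          keep_cache_if_not_found: bool) -> Response:
    """Return what the cache holds for `url` after a fetch (hypothetical model)."""
    old = cache.get(url)
    is_error_or_redirect = new.status >= 400 or new.location is not None
    if (keep_cache_if_not_found and old is not None and old.status == 200
            and is_error_or_redirect):
        return old  # the archived copy survives the error or redirect
    cache[url] = new  # otherwise the server's answer replaces the old copy
    return new


URL = "http://archive.example.test/article.html"
for keep in (True, False):
    cache = {URL: Response(200, "archived article")}
    kept = store(cache, URL, Response(404), keep_cache_if_not_found=keep)
    print(keep, kept.status)
# True 200
# False 404
```

With the option on, a good cached page is never overwritten by an error or a redirect; with it off, the cache simply follows the server.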

> If a cached page suddenly gets redirected, the proxy could move the
> cached page to the redirected address (creating an entry there, but
> of course only if this new address isn't already cached) and then
> cache the redirect. In case the new (redirection) address is already
> cached, we should assume that the cached old version and the
> redirection address are, and always were, independent (for
> example, a site is dissolved completely and redirects for some time
> to an alternate one before it disappears completely) and the current
> standard behaviour could be applied (i.e., keep the cached version
> without any change to the cache).

The problem is the case that I described above, where many pages
redirect to the same place.  The first page that WWWOFFLE sees
redirected to this location would get moved since the new location did
not previously exist.  The second page that WWWOFFLE sees redirected
would not get moved since the new location does now exist.  The second
page then gets redirected to the content of the first page, which is
just totally wrong.
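The proposed rule and the failure described above can be reproduced with a small Python model. Everything in it is hypothetical: the Response class, proposed_refresh() and the site.example.test URLs are invented for illustration and are not part of WWWOFFLE. Two cached articles are removed and the server redirects both to the same "not found" page.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Response:
    status: int
    body: str = ""
    location: Optional[str] = None  # redirect target, if any (hypothetical model)


def proposed_refresh(cache: Dict[str, Response], url: str, new: Response) -> None:
    """The rule proposed in this thread (hypothetical model, not WWWOFFLE code)."""
    old = cache.get(url)
    target = new.location
    if target is None or old is None or old.status != 200:
        cache[url] = new      # not a redirect of a good cached page: store as usual
    elif target not in cache:
        cache[target] = old   # assume the content moved to the new address
        cache[url] = new      # and cache the redirect under the old address
    # else: the target is already cached, so the two are assumed independent
    # and the cache is left unchanged (the keep-cache-if-not-found behaviour).


NOT_FOUND = "http://site.example.test/not-found.html"
cache = {
    "http://site.example.test/a.html": Response(200, "article A"),
    "http://site.example.test/b.html": Response(200, "article B"),
}
# Both articles are removed and the server redirects each to the same page.
for page in ("http://site.example.test/a.html", "http://site.example.test/b.html"):
    proposed_refresh(cache, page, Response(302, location=NOT_FOUND))

print(cache[NOT_FOUND].body)  # article A
```

The first redirect moves article A to the shared "not found" address; the second finds that address already cached and changes nothing. From then on the cached copy of the "not found" page is article A, so any request redirected there from the cache, b.html included, is shown the wrong content.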

-- 
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop                             [EMAIL PROTECTED]
                                      http://www.gedanken.demon.co.uk/

WWWOFFLE users page:
        http://www.gedanken.demon.co.uk/wwwoffle/version-2.8/user.html
