Hi!

Some thoughts on extending WWWOffle. I have a lot of static documents in
my cache. In this case I don't really use WWWOffle as a cache, but as a
document database.

One could make external copies of such pages using wget or something
similar, but this is impractical. Simply leaving them in the cache is a
much better solution.

However, there are some problems with this approach. First, when online,
it causes a needless delay while WWWOffle asks the server for a newer
version (especially over a busy modem line), whereas I only want the
version that's already in the cache. Second, and more important, the
server's page hierarchy may have changed in the meantime. The document may
now reside at a different place, or may have become unavailable. Third, I
don't want these pages to get purged. It would be cumbersome to add them
to the config file with an infinite lifetime each time, so instead I
don't purge at all.

There isn't much missing to turn WWWOffle into a real document store. The
only thing that would have to be added is some kind of versioning. For
some pages, one could choose to permanently store a copy. This would be
done through a web interface. WWWOffle would still act as a cache for
such pages, but would also offer access to the frozen copy by a URL like
"http://localhost:8080/store/page=http...;version=...", which one could
then bookmark or save somewhere. One could also get a list of available
versions, or specify whether this URL should be queried for newer versions
at all. Another possibility would be an option to automatically store all
new versions, so you would get a version history. (This could even be
used for backups!)
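To make this a bit more concrete, here is a minimal sketch in Python of
what such a versioned store might look like. It is not WWWOffle code:
`DocumentStore`, `freeze` and `parse_store_url` are hypothetical names,
versions are assumed to be numbered 1, 2, 3, ..., everything is kept in
memory rather than on disk, and a new version is assumed to be recorded
only when the page content actually differs from the newest stored copy.

```python
class DocumentStore:
    """Hypothetical frozen-copy store: maps a page URL to its stored versions."""

    def __init__(self):
        # URL -> list of page bodies; version n is stored at index n - 1.
        self._versions = {}

    def freeze(self, url, body):
        """Store a copy of `body` for `url` and return its version number.

        If `body` is identical to the newest stored version, nothing new is
        stored and the existing version number is returned, so calling this
        on every fetch yields a history of distinct versions only.
        """
        versions = self._versions.setdefault(url, [])
        if not versions or versions[-1] != body:
            versions.append(body)
        return len(versions)

    def list_versions(self, url):
        """Return the available version numbers for `url` (empty if none)."""
        return list(range(1, len(self._versions.get(url, [])) + 1))

    def get(self, url, version):
        """Return the frozen copy of `url` with the given 1-based version."""
        versions = self._versions.get(url, [])
        if not 1 <= version <= len(versions):
            raise KeyError(f"no version {version} stored for {url}")
        return versions[version - 1]


def parse_store_url(store_url):
    """Split '.../store/page=<url>;version=<n>' into (<url>, <n>).

    The page URL may itself contain ';', so the version is taken from the
    last ';version=' occurrence.
    """
    _, sep, rest = store_url.partition("/store/page=")
    if not sep:
        raise ValueError(f"not a store URL: {store_url}")
    page, sep, version = rest.rpartition(";version=")
    if not sep:
        raise ValueError(f"missing version in: {store_url}")
    return page, int(version)


store = DocumentStore()
url = "http://example.com/doc.html"
store.freeze(url, b"first")   # version 1
store.freeze(url, b"first")   # unchanged, still version 1
store.freeze(url, b"second")  # version 2
page, v = parse_store_url("http://localhost:8080/store/page=" + url + ";version=2")
print(store.get(page, v))  # b'second'
```

Calling `freeze` on every fetch of a selected page would give the
automatic version history mentioned above, while `list_versions` would
back the "list of available versions" page.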

Of course it wouldn't be feasible for the user to request this on a
page-by-page basis. Rather, some method for efficiently choosing the pages
to put under version control would be required. The existing recursive
retrieval methods could be used for this, this time operating only on the
pages that are already in the cache (optionally fetching ones that are
still missing). Alternatively, one could specify a wildcard, as in the
config file. WWWOffle could then present a page listing the affected URLs
for confirmation.
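The wildcard variant could work roughly like the following sketch, again
in Python and purely illustrative. The cached URLs are assumed to be
available as a plain list, `select_for_versioning` and
`confirmation_text` are hypothetical names, and the patterns use
shell-style wildcards via Python's `fnmatch` module, which is an
assumption: WWWOffle's own config file wildcard syntax may differ.

```python
from fnmatch import fnmatchcase


def select_for_versioning(cached_urls, patterns):
    """Return the cached URLs matching any wildcard pattern, sorted."""
    return sorted(
        url for url in cached_urls
        if any(fnmatchcase(url, pattern) for pattern in patterns)
    )


def confirmation_text(urls):
    """Build a plain-text listing of the selected URLs for the user to confirm."""
    lines = ["The following cached pages would be placed under version control:"]
    lines += ["  " + url for url in urls]
    lines.append(f"{len(urls)} page(s) selected. Confirm?")
    return "\n".join(lines)


cached = [
    "http://example.com/docs/intro.html",
    "http://example.com/docs/api.html",
    "http://example.com/news/today.html",
    "http://other.org/docs/intro.html",
]
selected = select_for_versioning(cached, ["http://example.com/docs/*"])
print(confirmation_text(selected))
```

One caveat of reusing shell-style matching: `fnmatch` treats `?` and
`[...]` as special, so a pattern containing a literal query string would
need escaping, which a real implementation would have to handle.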

Does this sound like a good idea?


bye

