Volker Wysk <v@volker> wrote:
> Some thoughts on extending WWWOffle. I have a lot of non-changing
> documents in my cache. I don't really use WWWOffle as a cache in this
> case, but as a document database.

> There isn't much missing to turn WWWOffle into a real document store. The
> only thing that would have to be added is some kind of versioning. For
> some pages, one could choose to permanently store a copy. This would be
> accomplished through a web interface. WWWOffle would still act as a cache
> for such pages, offering access to the frozen copy by a URL like
> "http://localhost:8080/store/page=http...;version=...", which one could
> then bookmark or save somewhere. One could also get a list of available
> versions, or specify if this URL should be queried for newer versions at
> all. Another possibility would be the option to automatically store all
> new versions, so you would get a version history. (Could even be used for
> backups!)

Would you want this if it meant that you couldn't have images in pages
or any working links to other pages?

For example you request the page:

http://localhost:8080/store/page=http://www.foo/;version=2001-03-10-12:00:00

This is to see what that particular page (http://www.foo/) looked like
at midday last Saturday (supposing that you visited the page then).
Now this page will have links in it to other pages and it will also
have images in it.
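
As a rough sketch of what handling such a URL might involve (nothing
like this exists in WWWOFFLE; the function name and the exact
"page=...;version=..." layout are assumptions taken from the example
above), the request path could be split into the original URL and a
timestamp like this:

```python
from datetime import datetime
from typing import NamedTuple

# Assumed layout, copied from the example URL; not an existing WWWOFFLE feature.
STORE_PREFIX = "/store/page="
VERSION_MARK = ";version="
TIME_FORMAT = "%Y-%m-%d-%H:%M:%S"


class StoredRef(NamedTuple):
    url: str           # the original URL of the page
    version: datetime  # the time at which the copy was stored


def parse_store_path(path: str) -> StoredRef:
    """Split a hypothetical /store/ request path into URL and version."""
    if not path.startswith(STORE_PREFIX):
        raise ValueError("not a store path: " + path)
    rest = path[len(STORE_PREFIX):]
    # Split on the last ";version=" so that a ';' inside the page URL survives.
    url, sep, stamp = rest.rpartition(VERSION_MARK)
    if not sep:
        raise ValueError("missing version: " + path)
    return StoredRef(url, datetime.strptime(stamp, TIME_FORMAT))
```

Splitting on the last ";version=" is a design choice for the sketch;
a real implementation would probably also need to escape the embedded
URL.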

Suppose that the images in the page are important and you need to have
the correct versions to make the page worth keeping.  There might be
two images, http://www.foo/image1.gif dated 2001-03-09-12:00:00 and
http://www.foo/image2.gif dated 2001-03-10-12:00:00.

Obviously you would want the stored page to be modified so that it
links to the two stored images:

http://localhost:8080/store/page=http://www.foo/image1.gif;version=2001-03-09-12:00:00
http://localhost:8080/store/page=http://www.foo/image2.gif;version=2001-03-10-12:00:00

This works: the page displays the images.  The image1.gif is the one
that was fetched the day before (the 9th) and the image2.gif is the
one that was fetched on the 10th.
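
The rewriting described above amounts to picking, for each image, the
newest stored version that is not later than the page itself.  A
minimal sketch, assuming each URL has a sorted list of stored
timestamps (the function names are made up for illustration):

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional, Sequence

STORE_BASE = "http://localhost:8080/store/"  # prefix taken from the example URLs
TIME_FORMAT = "%Y-%m-%d-%H:%M:%S"


def version_as_of(versions: Sequence[datetime], when: datetime) -> Optional[datetime]:
    """Return the newest stored version at or before 'when', or None.

    'versions' must be sorted in ascending order.
    """
    index = bisect_right(versions, when)
    return versions[index - 1] if index else None


def store_url(url: str, version: datetime) -> str:
    """Build the rewritten link for a particular stored version."""
    return f"{STORE_BASE}page={url};version={version.strftime(TIME_FORMAT)}"
```

With the page dated 2001-03-10-12:00:00, image1.gif resolves to its
copy from the 9th and image2.gif to its copy from the 10th, which
gives the two links shown above.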


But would you want to introduce a dependency on a page from the 9th
when you actually viewed it on the 10th as well?  This implies a
relationship between pages cached on different days.  You can't delete
the pages from the 9th without breaking the pages for the 10th.  You
would need to store information for the pages of the 9th saying that
they are used on the 10th or keep both copies.
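
The bookkeeping for the first option might look something like the
following sketch.  It is purely illustrative: the class and method
names are invented, and a stored item is identified here by a
(URL, version) pair.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Stored = Tuple[str, str]  # (original URL, version string); an assumed key format


class DependencyIndex:
    """Records which stored pages rely on which stored resources."""

    def __init__(self) -> None:
        self._used_by: Dict[Stored, Set[Stored]] = defaultdict(set)

    def add(self, page: Stored, resource: Stored) -> None:
        """Note that 'page' links to 'resource'."""
        self._used_by[resource].add(page)

    def remove_page(self, page: Stored) -> None:
        """Forget every dependency that 'page' created."""
        for users in self._used_by.values():
            users.discard(page)

    def can_delete(self, resource: Stored) -> bool:
        """A resource is safe to delete only if no stored page uses it."""
        return not self._used_by.get(resource)
```

The image from the 9th could then be purged only after the page from
the 10th that uses it has been removed.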

An alternative is to rewrite the image links so that the first one is

http://localhost:8080/store/page=http://www.foo/image1.gif;version=2001-03-10-12:00:00

The absence of this version in the cache could then cause a prompt to
select an appropriate version, defaulting to the previous one.


Any links in this page that were not followed on the 10th are useless
since they will lead to pages that were not stored on the 10th.  Any
link that was cached on a different date may be misleading if you are
using this as an historical archive.


I think that you will end up with a collection of pages, all cached at
different times and therefore all only weakly interconnected.  The
current situation is similar, but they are connected by the fact that
they are all the latest downloaded versions.

The most useful way of archiving documents this way is to download all
of them at the same time and create a snapshot.  With the proposed
changes to WWWOFFLE to allow different versions of the same page to be
stored you will end up with a number of very small snapshots.  Each
one may only be a single page with its images.  The increase in
dynamic content means that any collection of pages gathered over an
extended period becomes distorted as the epoch changes for the
different pages in a snapshot.



Rigo Wenning <[EMAIL PROTECTED]> wrote:
> I don't think Andrew can do this by himself, so this might be a
> larger thing to initiate in a university project. Is there anyone
> on the list who can do student projects and is willing to
> promote the idea of a document management-system?

I like this idea.  There are a lot of things that users want me to add
to WWWOFFLE.  This is one of them, but the scale of it means that it
is not something that is likely to happen.


-- 
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop                             [EMAIL PROTECTED]
                                      http://www.gedanken.demon.co.uk/

WWWOFFLE users page:
        http://www.gedanken.demon.co.uk/wwwoffle/version-2.6/user.html
