On 1/9/07, Johan Compagner <[EMAIL PROTECTED]> wrote:
>
> >
> > 4> SecondLevelCacheStore sets an IPageVersionManager. That manager is
> > almost completely a dummy, except that it increases a page counter when
> > the page is changed, just like the current one does. Then we generate
> > the version id but nothing more. And when we try to generate an older
> > page, the version manager will return null. Then it will go to the
> > SecondLevel and its IPageStore and read the right version in from disk.
>
> I think we should try to keep the changes mechanism (though in a
> different form), and save changes for versions rather than whole
> pages; serializing pages is pretty expensive, and as it is an operation
> we'll be performing every request, we should optimize where we can.
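For reference, a rough sketch of the "dummy" version manager described above. The class and method names are simplified assumptions for illustration, not the actual IPageVersionManager interface: it only bumps a counter when the page changes, and asking for an older version returns null so the lookup falls through to the second-level store's IPageStore.

import java.io.Serializable;

// Simplified illustration; names and signatures are assumptions, not Wicket's API.
public class CountingVersionManagerSketch implements Serializable {

    private int currentVersion = 0;

    // Called whenever the page is touched/changed during a request:
    // only the counter is bumped, no undo information is recorded.
    public void pageChanged() {
        currentVersion++;
    }

    public int getCurrentVersionNumber() {
        return currentVersion;
    }

    // Older versions are never reconstructed in memory here; returning null
    // tells the caller (the second-level cache store) to read that version
    // from disk through its IPageStore instead.
    public Object getVersion(int versionNumber) {
        return null;
    }
}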



> So then when I go 4 versions back, I need to read in 4 changesets from
> disk and then roll them back one by one to get the page?

Yep. You can read them all first and then roll them back at once, like we
do now. But the main argument for this is that we'll write much more
often than read, so writing is what we should optimize. Writing just the
changes will be much more efficient.
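Roughly like this, as a sketch with made-up ChangeSet/VersionStore abstractions (not real Wicket types): load the changesets for the intermediate versions from disk first, then undo them in one go, newest first.

import java.util.ArrayList;
import java.util.List;

// Illustration only; the types below are placeholders for whatever the
// second-level store actually persists.
public class RollbackSketch {

    // One saved changeset: the undo records written for a single version.
    interface ChangeSet {
        void undo(Object page);
    }

    // The on-disk store that can read a changeset back by version number.
    interface VersionStore {
        ChangeSet loadChangeSet(int version);
    }

    public Object rollback(Object latestPage, int currentVersion,
                           int targetVersion, VersionStore store) {
        // Read all changesets from disk first, newest first ...
        List<ChangeSet> sets = new ArrayList<ChangeSet>();
        for (int v = currentVersion; v > targetVersion; v--) {
            sets.add(store.loadChangeSet(v));
        }
        // ... then roll them back in one pass (newest to oldest).
        for (ChangeSet set : sets) {
            set.undo(latestPage);
        }
        return latestPage;
    }
}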

> And if you were on version 10 and you rolled back to 6, you will never
> be able to get 7 back.

This is how it should be (and was) in the first place. I applied a
little patch yesterday that actually deletes those higher versions
(together with a patch that only writes a version when that version
hasn't been written before).

> Or all undo objects should also be able to redo...

What do you mean? If a certain undo shouldn't be done, it shouldn't be
recorded in the first place, right?

> Not that this is a big problem, because normally this can't really
> happen (except when you just refresh or something in the browser on an
> old version and then go forward again to a newer page).

> But this still won't save you from having to store the whole page and
> not only the version changes. For example, this can happen I think:
>
> PageA gets an event.
> onClick does something and can touch the page, so PageA gets a new change.
> PageA was the active page in the pagemap.
> But then onClick also does setResponsePage(PageB);
> now PageA must be saved to disk or else you lose it.
>
> So now you have saved the "previous" version as a version changes dump
> and the latest version as the page itself.
>
> Then when you go back to PageA version 0, we need to read in the latest
> page, then all the version pieces, then roll back all of it.
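A sketch of that write path, with made-up types just to pin the scenario down (none of this is the actual Wicket API): when the response page switches away from the touched PageA, its accumulated changes go out as a dump and the latest page itself is stored as well.

// Illustration only: the store interface, fields and method below are
// hypothetical, meant to show when PageA would have to be flushed.
public class FlushOnPageSwitchSketch {

    interface PageStore {
        void storeChangeSet(Object changes, int version);  // "previous" version as a changes dump
        void storePage(Object page, int version);          // latest version as the page itself
    }

    private Object activePage;       // e.g. PageA
    private Object pendingChanges;   // undo records recorded during onClick
    private int version;
    private final PageStore store;

    public FlushOnPageSwitchSketch(PageStore store) {
        this.store = store;
    }

    public void setResponsePage(Object newPage) {
        if (activePage != null && activePage != newPage && pendingChanges != null) {
            // PageA is no longer the response page; persist it now or lose it.
            store.storeChangeSet(pendingChanges, version - 1);
            store.storePage(activePage, version);
            pendingChanges = null;
        }
        activePage = newPage;
    }
}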

When we roll back, we should invalidate newer versions of the page
relative to the version that is rolled back to anyway. I think this was
always the idea, and it was how the page version manager was implemented.
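The invalidation itself is cheap; something along these lines (the file naming convention below is made up, purely for illustration):

import java.io.File;

// Illustration: after rolling back to targetVersion, drop everything that
// was stored for newer versions so they can never be navigated to again.
public class InvalidateNewerVersionsSketch {

    public void invalidateAbove(File pageDir, int pageId,
                                int targetVersion, int latestVersion) {
        for (int v = targetVersion + 1; v <= latestVersion; v++) {
            // Hypothetical naming scheme: <pageId>-version-<v>.ser
            File f = new File(pageDir, pageId + "-version-" + v + ".ser");
            if (f.exists()) {
                f.delete();
            }
        }
    }
}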

> Don't know if this much more complex situation will result in much
> better performance. I don't think that it costs so much performance,
> and you have to save something anyway... That's a file open and a file
> save (not that much data, but still data), so what will really be the
> gain? The overhead of the file save itself is there in both situations.

Partially it is just fixing something that is broken now. We could
decide to write full versions rather than one full page and a series
of changes. However, as this is something we do for every request
(though we might do it async, but still it's for every request), it
makes sense to optimize as best we can. From testing on my machine,
a typical save of a page with one or two versions costs between 5 and 20
milliseconds (which obviously increases currently as more versions get
added to the page; on my system here it is easily up to 50 milliseconds
after 10 clicks, and that is on a warmed-up system where the
introspection caches are filled properly; the first saves are up to 500
milliseconds!). This would be 1 - 2 milliseconds for just writing the
changes. On a system with heavy load, I believe this can make quite
a difference. Note that it's not only processing time, but also the
space it takes in the FS (minor) and the time it keeps the filesystem busy.
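That kind of difference is easy to reproduce with a toy measurement like the one below. The FakePage/Change classes are stand-ins, not Wicket classes, and the sizes are invented; the point is only the relative cost of serializing a page that drags its whole history along versus serializing just the newest change.

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SerializationCostSketch {

    // Pretend undo record, roughly a couple of KB.
    static class Change implements Serializable {
        byte[] data = new byte[2 * 1024];
    }

    // Pretend page: some component state plus a history that grows per click.
    static class FakePage implements Serializable {
        byte[] state = new byte[20 * 1024];
        List<Change> history = new ArrayList<Change>();
    }

    static int serializedSize(Object o) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(o);
        oos.close();
        return bos.size();
    }

    public static void main(String[] args) throws Exception {
        FakePage page = new FakePage();
        for (int click = 1; click <= 10; click++) {
            Change change = new Change();
            page.history.add(change);
            long t0 = System.nanoTime();
            int whole = serializedSize(page);    // current approach: whole page + history
            long t1 = System.nanoTime();
            int delta = serializedSize(change);  // proposed: just the new change
            long t2 = System.nanoTime();
            System.out.println("click " + click
                + ": whole page " + whole + " bytes (" + (t1 - t0) / 1000 + " us)"
                + ", change only " + delta + " bytes (" + (t2 - t1) / 1000 + " us)");
        }
    }
}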

> And do remember that most clustering solutions do exactly the same, but
> then for the complete session. We do it only for changed pages!

The fact that other solutions are not optimized doesn't mean we don't have to :)

> I think we should only do this when it is really shown to be a bottleneck!

Typically I would go for the premature optimization argument. But in
this case the penalty is pretty obvious and easy to measure.

> The solution I propose and like is because of 2 things; the first one
> is minor:
> 1> that we save much less, just the page at that time, not the complete
> version history.
> 2> that the page in mem is much smaller, again not the complete history
> (because that is already on disk).
>
> Especially the last point is nice to have. The drawback is that when
> the back button is used, the page will always come from disk, which is
> now not the case, because most of the time the page can reconstruct its
> previous version itself in mem.

That doesn't have to be the case, as we don't have to write
immediately. We could have an overflow mechanism with a combined rule
so that it doesn't write until we have, say, 5 versions, as long as
there is enough available memory. I'm starting to wonder whether it
wouldn't be better to use an actual cache manager for this - like
EHCache - so that we don't have to write this ourselves and have
something that is easy to configure and stuff.
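The combined rule could be as simple as the sketch below; the buffer size and memory threshold are invented numbers, and something like EHCache would give us roughly this overflow-to-disk behaviour out of the box instead of us maintaining it ourselves.

import java.util.LinkedList;

// Sketch: keep recent versions in memory and only spill them to the
// second-level store once we buffer more than maxBuffered of them or
// the JVM starts running low on free memory.
public class OverflowBufferSketch {

    interface PageStore {
        void store(Object version);   // write one buffered version to disk
    }

    private final LinkedList<Object> buffered = new LinkedList<Object>();
    private final int maxBuffered;     // e.g. 5 versions
    private final long minFreeBytes;   // e.g. 32 * 1024 * 1024
    private final PageStore store;

    public OverflowBufferSketch(int maxBuffered, long minFreeBytes, PageStore store) {
        this.maxBuffered = maxBuffered;
        this.minFreeBytes = minFreeBytes;
        this.store = store;
    }

    public synchronized void add(Object version) {
        buffered.addLast(version);
        // Combined rule: spill oldest-first when the count is exceeded
        // or free memory drops below the threshold.
        while (buffered.size() > maxBuffered
                || (!buffered.isEmpty() && Runtime.getRuntime().freeMemory() < minFreeBytes)) {
            store.store(buffered.removeFirst());
        }
    }
}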

Furthermore, you are right that most of the time we don't get it from
disk now, but the price (up till yesterday something I didn't notice
tbh) is that all those changes will stay in memory! And worse, the
page grows larger and larger, but is still saved as a whole on every
request. And again, we should optimize the writing, not the reading.

Eelco
