Eddie Epstein wrote:
>>> Copying the behavior would be appropriate, unless there is some other way to
>>> easily distinguish pre-existing FS.
>> To my mind, the place to keep track of something like that
>> is the serialization code.  It has to iterate over the whole
>> CAS anyway and can do that kind of tracking.  It seems wrong
>> to put that kind of requirement on the heap implementation.
> 
> 
> Doing this in the serialization code will not work. There is no way for this
> to efficiently detect which existing FS have had feature values changed.
> More importantly, it eliminates the ability to track CAS changes for
> colocated annotators, something that has been repeatedly asked for to improve
> debugging and to track provenance.

Now wait a minute.  The current heap implementation can't
do that either.  All we were talking about was to know which
FSs were *added* since the CAS was serialized.  That is
something you can do now by remembering the top heap position,
and I am planning to support this with the new heap impl as
well.  Knowing what FSs were *modified* is an entirely different
proposition.
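To make the added-vs-modified distinction concrete, here is a minimal Java sketch of a toy int-array heap. The class and method names (`MarkedHeap`, `mark`, `isNewSinceMark`, `setFeature`) are hypothetical and not the UIMA `CASImpl` API. Detecting *added* FSs only needs the remembered top position; detecting *modified* FSs requires every feature setter to journal writes to addresses below that mark, so in this sketch that bookkeeping falls to the heap rather than to the serializer.

```java
import java.util.Arrays;
import java.util.BitSet;

/**
 * Hypothetical, greatly simplified int-array heap (not the UIMA CASImpl API).
 * Each FS occupies a contiguous run of int cells; its address is its first cell.
 */
class MarkedHeap {
    private int[] cells = new int[16];
    private int top = 1;                           // cell 0 reserved so 0 can mean "null"
    private int mark = 1;                          // top position remembered at last serialization
    private final BitSet modified = new BitSet();  // pre-existing FS addresses written since mark

    /** Allocates an FS with the given number of feature slots and returns its address. */
    int allocate(int numFeatures) {
        int addr = top;
        top += numFeatures;
        if (top > cells.length) {
            cells = Arrays.copyOf(cells, Math.max(top, cells.length * 2));
        }
        return addr;
    }

    /** Called when the CAS is serialized: everything below the current top is now pre-existing. */
    void mark() {
        mark = top;
        modified.clear();
    }

    /** Additions are cheap to detect: compare the address with the remembered top. */
    boolean isNewSinceMark(int addr) {
        return addr >= mark;
    }

    /** Modifications are not: the setter itself must record writes to pre-existing FSs. */
    void setFeature(int addr, int slot, int value) {
        cells[addr + slot] = value;
        if (addr < mark) {
            modified.set(addr);
        }
    }

    int getFeature(int addr, int slot) {
        return cells[addr + slot];
    }

    boolean isModifiedSinceMark(int addr) {
        return modified.get(addr);
    }

    public static void main(String[] args) {
        MarkedHeap heap = new MarkedHeap();
        int a = heap.allocate(2);
        heap.mark();                 // e.g. right after serializing the CAS
        int b = heap.allocate(3);    // added after the mark
        heap.setFeature(a, 1, 42);   // change a pre-existing FS
        System.out.println(heap.isNewSinceMark(a) + " " + heap.isNewSinceMark(b));           // false true
        System.out.println(heap.isModifiedSinceMark(a) + " " + heap.isModifiedSinceMark(b)); // true false
    }
}
```

Writes to FSs created after the mark are not journaled, since a new FS has to be sent in full anyway.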

> 
>>> Given no warning against doing this from an application, the fact that it
>>> works and that it is fairly intuitive to do so means that there are likely
>>> existing UIMA applications doing it. Of course we all are willing to break
>>> existing user code when it gets in the way of some neat improvement :)
>> So you agree that maintaining this behavior is not a requirement?
> 
> 
> No, not without further discussion.

Maybe we should call for a vote?

> 
>>> Blob serialization, like the binary serialization used between C++ and Java,
>>> leaves the Java CAS with a string heap rather than a string list. It would
>>> be easy to change blob deserialization to recreate a string list instead,
>>> and measure the performance difference.
>> I'll take your word for it, though I still don't see what this
>> has to do with what we were talking about.  In the new heap I'm
>> thinking about, there will be no such thing as a String heap or
>> list.  Strings will just be referenced directly from the objects
>> representing FSs.
>>
> 
> It sounds like you have no concern for binary serialization performance.

I don't know what makes you say that.  That is not the
impression I wanted to give, at least ;-)  I'll admit
it's not my primary concern.  To repeat: I simply do not
understand what you mean to show by your string heap vs.
string list test.  I'm not unwilling, just intellectually
incapable.
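For anyone else trying to follow the string heap vs. string list question, here is a minimal Java sketch of the two layouts being contrasted. The names (`StringStorageSketch`, `StringTable`, `TokenFS`) are hypothetical and nothing here is measured. In the first layout, FS cells hold int indexes into one shared string table; in the second, FS objects reference their `String` values directly. With the table, a serializer can emit every string by walking a single list; with direct references it has to visit the FSs to collect them. Whether that difference matters in practice is the kind of question the measurement proposed above could answer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch contrasting two ways an FS can hold a String-valued feature. */
class StringStorageSketch {

    /** Table style: an FS cell holds an int index into a shared string table. */
    static final class StringTable {
        private final List<String> strings = new ArrayList<>();

        StringTable() {
            strings.add(null); // index 0 stands for a null string
        }

        int add(String s) {
            strings.add(s);
            return strings.size() - 1;
        }

        String get(int ref) {
            return strings.get(ref);
        }

        /** A serializer can emit every string by walking this one list. */
        List<String> all() {
            return strings.subList(1, strings.size());
        }
    }

    /** Direct style: the FS is a Java object holding its String reference. */
    static final class TokenFS {
        final String coveredText; // can be garbage-collected along with the FS
        TokenFS(String coveredText) {
            this.coveredText = coveredText;
        }
    }

    public static void main(String[] args) {
        StringTable table = new StringTable();
        int[] heapCells = { table.add("hello"), table.add("world") }; // FS slots hold ints
        System.out.println(table.get(heapCells[1]));                   // world

        List<TokenFS> fss = List.of(new TokenFS("hello"), new TokenFS("world"));
        // Without a table, a serializer must walk the FS objects to collect their strings.
        List<String> collected = new ArrayList<>();
        for (TokenFS fs : fss) {
            collected.add(fs.coveredText);
        }
        System.out.println(collected.equals(table.all()));             // true
    }
}
```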

> Changing the heap design to enable garbage collection at the expense of
> seriously degrading performance for existing users that are strongly
> dependent on efficient CAS serialization does not sound viable.

I agree completely.  If this turns out to seriously degrade
performance for *any* important scenario, it's out.  However,
I'm not sure it will degrade performance, not even for binary
serialization.  Otherwise I wouldn't be suggesting this.

--Thilo

> 
> How about re-implementing the heap as a pluggable component so that the
> existing design would still be available?
> 
> Eddie
> 
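A rough Java sketch of what a pluggable heap could look like: a hypothetical `FSHeap` interface (not an existing UIMA API) with two interchangeable implementations, one modeled loosely on a single growable int array and one storing a small array per FS. Code written against the interface runs unchanged on either. A real CAS heap would need much more (typed features, strings, indexes, serialization hooks); this only shows the shape of the abstraction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical pluggable heap contract; names are illustrative, not UIMA APIs. */
interface FSHeap {
    int createFS(int typeCode, int numFeatures); // returns an opaque FS handle
    int getTypeCode(int fs);
    int getFeature(int fs, int slot);
    void setFeature(int fs, int slot, int value);
}

/** Int-array layout: all FSs packed into one growable array; nothing is reclaimed. */
final class IntArrayHeap implements FSHeap {
    private int[] cells = new int[64];
    private int top = 1; // 0 is reserved as the null handle

    public int createFS(int typeCode, int numFeatures) {
        int fs = top;
        top += 1 + numFeatures; // one header cell for the type code
        if (top > cells.length) {
            cells = Arrays.copyOf(cells, Math.max(top, 2 * cells.length));
        }
        cells[fs] = typeCode;
        return fs;
    }
    public int getTypeCode(int fs) { return cells[fs]; }
    public int getFeature(int fs, int slot) { return cells[fs + 1 + slot]; }
    public void setFeature(int fs, int slot, int value) { cells[fs + 1 + slot] = value; }
}

/** Alternative layout: one small int array per FS; handles index into a list. */
final class PerFSArrayHeap implements FSHeap {
    private final List<int[]> fss = new ArrayList<>(List.of(new int[0])); // entry 0 = null handle

    public int createFS(int typeCode, int numFeatures) {
        int[] fs = new int[1 + numFeatures];
        fs[0] = typeCode;
        fss.add(fs);
        return fss.size() - 1;
    }
    public int getTypeCode(int fs) { return fss.get(fs)[0]; }
    public int getFeature(int fs, int slot) { return fss.get(fs)[1 + slot]; }
    public void setFeature(int fs, int slot, int value) { fss.get(fs)[1 + slot] = value; }
}

class PluggableHeapDemo {
    /** Client code sees only FSHeap, so either implementation can be plugged in. */
    static int roundTrip(FSHeap heap) {
        int fs = heap.createFS(7, 2);
        heap.setFeature(fs, 1, 99);
        return heap.getTypeCode(fs) * 1000 + heap.getFeature(fs, 1);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new IntArrayHeap()));   // 7099
        System.out.println(roundTrip(new PerFSArrayHeap())); // 7099
    }
}
```

The choice of implementation could then be made once, when the CAS is created, leaving existing users on the int-array layout.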
