Re: New CAS heap impl?

Thilo Goetz Mon, 22 Oct 2007 02:24:53 -0700

Eddie Epstein wrote:
> On 10/19/07, Thilo Goetz <[EMAIL PROTECTED]> wrote:
>>> As far as I know, the main requirement for delta CAS is that it is easy
>>> (i.e. cheap) to know,
>>>  1. which FS were created in the current call
>>>  2. which preexisting FS were deleted from the index
>>>  3. when setting a feature value, if the containing FS was preexisting
>> None of these are particularly easy to do now, and they
>> won't be any easier or harder when I'm done ;-)  As I said,
>> there will still be unique IDs, and as long as you don't
>> refer to the heap directly, my changes should not affect
>> this design.
> 
> 
> With the current design, the top of the FS heap position on calling process
> is used to identify new versus preexisting FS during or after the call: just
> compare any FS address to that position to know if it is new or not.


I can copy this behavior in the new implementation, but
do we really want to rely on this and make it part of the
design of the CAS and its heap?  Currently, this is a property
of the implementation, but not something I ever considered
to be part of the external contract of the CAS implementation.

It only works because the heap doesn't do any garbage collection,
and consequently no heap compaction.  It's not like that because
I thought that was a particularly good idea, but simply because
it would have been difficult to implement.  So it's a restriction
of the implementation, and not something that necessarily has to
be preserved in the future.
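
To make the point concrete, here is a toy sketch of the high-water-mark
trick. It is not the CAS heap; ToyHeap, HighWaterMarkDemo and isNew are
made-up names for illustration. The check is only correct as long as
addresses are handed out in increasing order and never move, which is
exactly what a compacting heap would no longer guarantee.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: an append-only heap of ints, not the UIMA CAS heap.
class ToyHeap {
    private final List<Integer> cells = new ArrayList<>();

    // Allocate one cell and return its address (its index in the heap).
    int allocate(int value) {
        cells.add(value);
        return cells.size() - 1;
    }

    // Address that the next allocation will receive.
    int top() {
        return cells.size();
    }
}

class HighWaterMarkDemo {
    // An address is "new" if it was allocated at or above the recorded mark.
    // This relies on addresses never being reused or moved.
    static boolean isNew(int address, int mark) {
        return address >= mark;
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        int preexisting = heap.allocate(1);
        int mark = heap.top();          // recorded on entry to the call
        int created = heap.allocate(2); // allocated during the call
        System.out.println(isNew(preexisting, mark)); // false
        System.out.println(isNew(created, mark));     // true
    }
}
```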

> 
>>> Another thing to keep in mind for calls to remote services is the
>>> requirement that any FS references in the client are still valid after
>>> making a call.
>>>
>>> As for impact on binary serialization performance, an easy experiment would
>>> be to modify binary serialization to end up with a string list instead of a
>>> string heap, using a scenario that had a lot of strings in the CAS. This
>>> would give a good idea of the extra overhead of creating individual FS
>>> objects.
>> I must admit that I don't understand what you mean.
>>
>> For both paragraphs?
> 
> Consider the following code:
>         AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(specifier);
>         CAS cas = ae.newCAS();
>         cas.setDocumentText("some text");
>         AnnotationFS fs = cas.createAnnotation(cas.getAnnotationType(), 0, 4);
>         ae.process(cas);
>         System.out.println(fs.getCoveredText());
> 
> Preexisting fs in the client must be valid after a process call, no?

No.  I've been over this with Adam on one of the OASIS calls, too.
It happens to work in the current implementation, but nowhere do
we guarantee this or suggest that this should work.  To the contrary,
we always tell people not to keep FS references across process calls.
The design I am planning on may break this code.  I will guarantee
that int IDs of FSs are constant for serialization/deserialization,
but I won't necessarily keep the objects around.  So if the CAS was
sent over the wire, the object may no longer be valid.  If the
deployment is all local, it will continue to work (unless the FS
has been deleted by one of the annotators).
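
Roughly, the pattern that stays safe looks like the toy sketch below:
hold on to the int ID and look the FS up again after the call, instead
of holding on to the object. ToyCas, ToyFs, resolve and roundTrip are
hypothetical names for illustration only, not the UIMA API, and
roundTrip merely stands in for serialization and deserialization.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical classes, not the UIMA API.
class ToyFs {
    final int id;
    final String text;

    ToyFs(int id, String text) {
        this.id = id;
        this.text = text;
    }
}

class ToyCas {
    private final Map<Integer, ToyFs> byId = new HashMap<>();
    private int nextId = 1;

    ToyFs create(String text) {
        ToyFs fs = new ToyFs(nextId++, text);
        byId.put(fs.id, fs);
        return fs;
    }

    // Look up the current object for an ID; null if no such FS exists.
    ToyFs resolve(int id) {
        return byId.get(id);
    }

    // Simulates a trip over the wire: same IDs, freshly created objects.
    ToyCas roundTrip() {
        ToyCas copy = new ToyCas();
        for (ToyFs fs : byId.values()) {
            copy.byId.put(fs.id, new ToyFs(fs.id, fs.text));
        }
        copy.nextId = nextId;
        return copy;
    }
}

class StableIdDemo {
    public static void main(String[] args) {
        ToyCas cas = new ToyCas();
        ToyFs before = cas.create("some");
        int id = before.id;                 // keep the ID, not the object
        ToyCas after = cas.roundTrip();     // stands in for a remote call
        ToyFs resolved = after.resolve(id); // re-resolve after the call
        System.out.println(resolved.text);      // some
        System.out.println(resolved == before); // false
    }
}
```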

> 
> For the 2nd paragraph, I was referring to binary blob serialization.
> 
> Eddie
> 

It was the second paragraph that I didn't understand.

--Thilo
