Hi Thilo,

On 10/29/07, Thilo Goetz <[EMAIL PROTECTED]> wrote:
>
> Hi Eddie,
>
> Eddie Epstein wrote:
> >> It doesn't seem intuitive to me that an object reference whose
> >> underlying object may have been serialized, sent over the network
> >> or to C++, modified, serialized again and sent back is guaranteed
> >> to still be valid afterwards.  It makes sense that this should
> >> work when all annotators are local.  I don't think it makes sense
> >> to guarantee this behavior in general.
> >>
> > The fact that this works for services and C++ annotators is not by
> > accident, it is because a lot of effort was put in to make it work.
>
> I know, I wrote the first version of that code (together with Oli
> Suhre).
>
> > At issue here is the vision for UIMA with regards how much flexibility
> > to have in deploying annotators without affecting application behavior.
> >
> > A strong point for UIMA, particularly with the OASIS standards
> > work, is that UIMA annotators can be externalized and implemented
> > in any language. It would be nice if the Apache UIMA implementation
> > would not penalize applications for using those annotators.
> >
> > Eddie
> >
>
> I can see that this point is very important to you.  I would
> have thought that the original point we were debating was pretty
> minor, and with proper documentation, should cause no problems
> for anyone.  However, I understand you see things differently.
>
> It will be interesting to see what repercussions the OASIS
> standard has on such issues.  For example, indexing as we use
> it today in Apache UIMA is not part of the standard atm.  So indexing
> information is lost in translation.  This means that potentially,
> when a flow includes a call to an OASIS-compatible annotator, indexing
> info and thus annotation iteration will change.  Now maybe we
> will want to change the way indexing works in Apache UIMA in
> response to this, but I don't see how we can do this while staying
> backward compatible.  I'd be interested to know what your take
> is on this issue, as you're one of the authors of the initial OASIS
> submission.  (Not to mention type priorities, but I'll be glad to
> see them go ;-)
>
> --Thilo
>
>
Well, it should be expected that such a change, reimplementing FS
storage, would have more ramifications than what is immediately obvious.
And yes, having spent much time now working on flexible and scalable
deployment options for UIMA annotators, I am quite keen on having
consistent behavior for co-located and remote configurations.

With regard to indexing information in the Cas XMI format, what is passed
is a list of the FSs that are indexed [in each view]. Today, without delta
Cas, the indexes are fully rebuilt when the Cas is returned. All sorted
indexes will retain the same iteration order. Non-sorted indexes may have a
different order, but that has always been documented: "A bag index simply
stores everything, without any guaranteed order."
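
To illustrate why a full rebuild preserves the iteration order of a sorted
index, here is a minimal sketch in plain Java. It does not use the UIMA API:
the Span class, the ORDER comparator (begin ascending, then end descending)
and the rebuild method are hypothetical stand-ins, and the example assumes
that no two entries compare as equal under the sort key.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class SortedIndexRebuild {
    // Hypothetical stand-in for an indexed FS with two sort keys.
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    // Assumed sort order for this sketch: begin ascending, then end descending.
    static final Comparator<Span> ORDER = (x, y) -> {
        if (x.begin != y.begin) return Integer.compare(x.begin, y.begin);
        return Integer.compare(y.end, x.end);
    };

    // Rebuilds a sorted index from the indexed members, in whatever
    // order they happen to arrive after a round trip.
    static List<Span> rebuild(Collection<Span> members) {
        List<Span> index = new ArrayList<>(members);
        index.sort(ORDER);
        return index;
    }

    public static void main(String[] args) {
        List<Span> members = new ArrayList<>();
        members.add(new Span(0, 10));
        members.add(new Span(0, 4));
        members.add(new Span(5, 9));
        members.add(new Span(12, 20));

        List<Span> before = rebuild(members);

        // Simulate losing the original order in transit.
        List<Span> shuffled = new ArrayList<>(members);
        Collections.shuffle(shuffled, new Random(42));
        List<Span> after = rebuild(shuffled);

        // The order depends only on the sort keys, not on arrival order.
        System.out.println("same iteration order: " + before.equals(after));
    }
}
```

A bag index has no comparator to fall back on, so in this model its order
after a rebuild would simply be whatever order the members arrived in.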

The only potential change in behavior that I am aware of has to do with
adding an FS to the index repository multiple times: "... all FSs that are
committed are entered, even if they are duplicates of already existing FSs."
So yes, that would be a change in behavior, as there would only be a single
instance of each FS in the index upon return from a remote component. Is
this the difference you were referring to, or is there more?
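
The duplicate case can be sketched in plain Java as follows. This is a
simplified model, not the UIMA implementation: the FS class and the
roundTrip method are hypothetical, and the round trip is assumed to carry
only the membership of the index (which FSs are indexed), not how many
times each one was added.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BagIndexRoundTrip {
    // Hypothetical stand-in for a feature structure; compared by identity.
    static final class FS {
        final int id;
        FS(int id) { this.id = id; }
    }

    // Models a remote call: only the set of indexed FSs is transmitted,
    // and the index is rebuilt from that set on return.
    static List<FS> roundTrip(List<FS> bagIndex) {
        Set<FS> indexedMembers = new LinkedHashSet<>(bagIndex); // membership only
        return new ArrayList<>(indexedMembers);                  // rebuilt index
    }

    public static void main(String[] args) {
        FS a = new FS(1);
        FS b = new FS(2);

        // Locally, the same FS can be entered into the index twice.
        List<FS> bag = new ArrayList<>();
        bag.add(a);
        bag.add(b);
        bag.add(a);

        List<FS> rebuilt = roundTrip(bag);
        System.out.println("entries before: " + bag.size()
                + ", entries after: " + rebuilt.size());
    }
}
```

In this model an iterator over the local index visits the duplicated FS
twice, while an iterator over the rebuilt index visits it once.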

Eddie
