Hi Thilo,

On 10/29/07, Thilo Goetz <[EMAIL PROTECTED]> wrote:
>
> Hi Eddie,
>
> Eddie Epstein wrote:
> >> It doesn't seem intuitive to me that an object reference whose
> >> underlying object may have been serialized, sent over the network
> >> or to C++, modified, serialized again and sent back is guaranteed
> >> to still be valid afterwards. It makes sense that this should
> >> work when all annotators are local. I don't think it makes sense
> >> to guarantee this behavior in general.
> >>
> > The fact that this works for services and C++ annotators is not by
> > accident, it is because a lot of effort was put in to make it work.
>
> I know, I wrote the first version of that code (together with Oli
> Suhre).
>
> > At issue here is the vision for UIMA with regards how much flexibility
> > to have in deploying annotators without affecting application behavior.
> >
> > A strong point point for UIMA, particularly with the OASIS standards
> > work, is that UIMA annotators can be externalized and implemented
> > in any language. It would be nice if the Apache UIMA implementation
> > would not penalize applications for using those annotators.
> >
> > Eddie
> >
>
> I can see that this point is very important to you. I would
> have thought that the original point we were debating was pretty
> minor, and with proper documentation, should cause no problems
> for anyone. However, I understand you see things differently.
>
> It will be interesting to see what repercussions the OASIS
> standard has on such issues. For example, indexing as we use
> it today in Apache UIMA is not part of the standard atm. So indexing
> information is lost in translation. This means that potentially,
> when a flow includes a call to a OASIS compatible annotator, indexing
> info and thus annotation iteration will change. Now maybe we
> will want to change the way indexing works in Apache UIMA in
> response to this, but I don't see how we can do this while staying
> backward compatible. I'd be interested to know what your take
> is on this issue, as you're one of the authors of the initial OASIS
> submission. (Not to mention type priorities, but I'll be glad to
> see them go ;-)
>
> --Thilo
>
>

Well, it should be expected that such a change, reimplementing FS storage, would have more ramifications than what is immediately obvious. And yes, having spent much time now working on flexible and scalable deployment options for UIMA annotators, I am quite keen on having consistent behavior for co-located and remote configurations.

With regard to indexing information in the Cas XMI format, what is passed is a list of the FSs that are indexed [in each view]. Today, without delta Cas, the indexes are fully rebuilt when the Cas is returned. All sorted indexes will retain the same iteration order. Non-sorted indexes may have a different order, but that has always been documented:

"A bag index simply stores everything, without any guaranteed order."

The only potential change in behavior that I am aware of has to do with adding an FS to the index repository multiple times:

"... all FSs that are commited are entered, even if they are duplicates of already existing FSs."

So yes, that would be a change in behavior, as there would only be a single instance of each FS in the index upon return from a remote component. Is this the difference you were referring to, or is there more?

Eddie
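
P.S. To make the duplicate-entry case concrete, here is a rough sketch in plain Java. It does not use the UIMA API at all: the `Fs` class, `serializeIndexedFs`, `rebuildBagIndex` and `sortedView` are hypothetical stand-ins that only model the behavior described above, on the assumption that what crosses the wire is one entry per indexed FS and that the index is rebuilt from that list on return.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class IndexRoundTripSketch {

    /** Hypothetical stand-in for a feature structure: an id plus a sort key. */
    static final class Fs {
        final int id;
        final int begin;
        Fs(int id, int begin) { this.id = id; this.begin = begin; }
    }

    /** Models the serialized form: each indexed FS is listed once (assumption). */
    static Set<Fs> serializeIndexedFs(List<Fs> bagIndex) {
        // Identity-based equality, so adding the same FS twice collapses to one entry.
        return new LinkedHashSet<>(bagIndex);
    }

    /** Models the full index rebuild when the Cas comes back. */
    static List<Fs> rebuildBagIndex(Set<Fs> indexedFs) {
        return new ArrayList<>(indexedFs);
    }

    /** Models a sorted index: iteration order is fixed by the sort key. */
    static List<Fs> sortedView(List<Fs> index) {
        List<Fs> copy = new ArrayList<>(index);
        copy.sort(Comparator.comparingInt((Fs fs) -> fs.begin)
                            .thenComparingInt(fs -> fs.id));
        return copy;
    }

    public static void main(String[] args) {
        Fs a = new Fs(1, 0);
        Fs b = new Fs(2, 5);

        // Local bag index: the same FS "a" was added twice.
        List<Fs> local = new ArrayList<>(List.of(b, a, a));

        // Round trip through the modeled remote call.
        List<Fs> rebuilt = rebuildBagIndex(serializeIndexedFs(local));

        System.out.println("local bag size:   " + local.size());
        System.out.println("rebuilt bag size: " + rebuilt.size());
        System.out.println("first in sorted order has id " + sortedView(rebuilt).get(0).id);
    }
}
```

In this model the bag index goes from three entries to two after the round trip, while the sorted view still orders the remaining FSs by their key.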
