On 10/30/07, Thilo Goetz <[EMAIL PROTECTED]> wrote:
>
> Eddie Epstein wrote:
> [...]
> > With regards indexing information in the Cas XMI format, what is passed
> > is a list of FS that are indexed [in each view]. Today, without delta Cas,
> > the indexes are fully rebuilt when the Cas is returned. All sorted indexes
> > will retain the same iteration order. Non-sorted indexes may have a
> > different order, but that has always been documented: "A bag index simply
> > stores everything, without any guaranteed order."
> >
> > The only potential change in behavior that I am aware of has to do with
> > adding an FS to the index repository multiple times: "... all FSs that are
> > commited are entered, even if they are duplicates of already existing FSs."
> > So yes, that would be a change in behavior, as there would only be a single
> > instance of each FS in the index upon return from a remote component. Is
> > this the difference you were referring to, or is there more?
> >
> > Eddie
>
> Our current XMI implementation and what is on the
> table for OASIS are two different things. Let me
> quote from the standard proposal:
>
> <quote>
> Currently the Apache UIMA Component Metadata Descriptor includes the
> following elements that are not part of the proposed UIMA Specification.
>
> 1. Indexes: Defines the structure of indexes through which the analytic
> will access data. In some sense the actual indexing design is an Apache
> UIMA issue and so this may be an extension to the descriptor schema that
> is specific to Apache UIMA. However if we think of the index definitions
> as a component declaring the key features that it is going to use to query
> the data, we can make a case that this should be a UIMA standard, so that
> any framework could optimize based on this information.
>
> 2. Type Priorities: These are closely related to the index definitions and
> should probably be combined with them rather than represented as a
> separate element
> </quote>
>
> Maybe I'm wrong, but I think this has consequences for Apache
> UIMA flows that use OASIS compliant services, as indexing
> information is lost. In Apache UIMA, you explicitly need to
> add FSs to indexes (or not). This distinction is lost if
> indexes are not part of the spec.
>
> --Thilo

We are talking about two different things. Section 5.3.4.2 of the spec
describes how views are to be encoded in the XMI representation of the CAS.
The list of view members is exactly the same information as that in the
earlier XCAS format, indicating which FS have been added to the index
repository for each view. This data preserves the same indexing information
currently in the Vinci and SOAP service interfaces as well as in the JNI
interface for C++ annotators.

The second issue you raise is how indexes themselves (and the related type
priorities) are to be covered by the spec, specifically in the component
metadata descriptors. Index definitions are not in the spec, but they are
not needed to guarantee that the indexing information in Apache UIMA is
preserved after calling XMI services.

Eddie
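
To make the duplicate-entry point from the quoted text concrete, here is a
minimal Java sketch. It is a toy model and does not use the Apache UIMA API:
the class ViewMembersSketch, its two methods, and the use of integer ids to
stand for FSs are all hypothetical. It assumes only what is stated above,
namely that the view member list names each indexed FS once and that the
index is rebuilt from that list on return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical toy model; none of these names come from Apache UIMA.
final class ViewMembersSketch {

    // Assumption: the serialized view lists each indexed FS id exactly once,
    // so repeated adds of the same FS collapse to a single member.
    static List<Integer> serializeViewMembers(List<Integer> indexedFsIds) {
        return new ArrayList<>(new LinkedHashSet<>(indexedFsIds));
    }

    // Assumption: the receiver rebuilds the index by adding each listed
    // member once. A real bag index guarantees no particular order; the
    // order kept here is an artifact of using a list in this model.
    static List<Integer> rebuildIndex(List<Integer> viewMembers) {
        return new ArrayList<>(viewMembers);
    }

    public static void main(String[] args) {
        // FS 2 was added to the index repository twice before the remote call.
        List<Integer> before = Arrays.asList(1, 2, 2, 3);
        List<Integer> members = serializeViewMembers(before);
        List<Integer> after = rebuildIndex(members);

        System.out.println("entries before call: " + before.size());
        System.out.println("view members sent:   " + members.size());
        System.out.println("entries after call:  " + after.size());
    }
}
```

In this model the index holds four entries before the call and three after
it, which is the one behavior change discussed above. Which FSs are indexed
in the view is unchanged, and no index definition is needed for that.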
