On 12/13/06, Thilo Goetz <[EMAIL PROTECTED]> wrote:
> Let's disregard the question of backward compatibility for a moment. In an ideal world, where we could design the APIs the way we wanted, what would we do?

Let's consider what the proposed UIMA spec has to say. First note that the spec is centered around the XMI representation of the CAS. Java APIs are considered a way to get at that information. When the spec first introduces the CAS it says that a CAS is just a collection of objects (i.e., FeatureStructures). Views are not introduced until much later, as a way to organize that collection of objects.

Thinking about this from an XMI point of view, what you send to an annotator (e.g., a web service) is an XMI document with a list of objects in it. Ignoring the Java API for a moment, it's perfectly valid for an annotator service to read and write these objects directly. No notion of a view is necessary. This is considered a critical enabler of interoperability with simple XML-based implementations such as WebFountain.

Possible conclusion for the Java API: We should provide a way to get all of the FS from the CAS and add new FS to the CAS, as part of the Base CAS API. Note that we currently jump through hoops in XMI/XCAS serialization because we don't have such access.

Now as for Views, a View is a subset of objects from the CAS. A special kind of view called an "Anchored View" is associated with a Sofa, and has the constraint that all annotations in the Anchored View refer to that Sofa. This feature is important for multi-Sofa applications that need to keep their annotations organized. Here is example XMI from the spec (pp. 40-41):

    <ex:Quotation xmi:id="1"
        text="If we begin in certainties, we shall end with doubts; but if we begin with doubts and are patient with them, we shall end in certainties."
        author="Francis Bacon"/>
    <cas:SofaReference xmi:id="2" sofaObject="1" sofaFeature="text"/>
    <ex:Pronoun xmi:id="4" sofa="2" begin="3" end="5" lemma="6"/>
    <ex:Pronoun xmi:id="5" sofa="2" begin="29" end="31" lemma="6"/>
    <ex:Lemma xmi:id="6" base="I" person="1" number="p"/>
    <uima:AnchoredView sofa="2" members="3 4 5"/>
    <ex:Quotation xmi:id="7"
        text="The only limit to our realization of tomorrow will be our doubts of today."
        author="Franklin D. Roosevelt"/>
    <cas:SofaReference xmi:id="8" sofaObject="7" sofaFeature="text"/>
    <ex:Pronoun xmi:id="9" sofa="8" begin="18" end="21" lemma="11"/>
    <ex:Pronoun xmi:id="10" sofa="8" begin="54" end="57" lemma="11"/>
    <ex:Lemma xmi:id="11" base="my" person="1" number="p"/>
    <cas:AnchoredView sofa="7" members="9 10 11"/>

Note that although this example doesn't show it, a FeatureStructure that was not an annotation (did not link to a Sofa) could be shared between the views. Views are not disjoint subsets of the set of objects in the CAS.

In our implementation, we have indexes. These aren't part of the spec at all. They're considered an implementation feature, albeit an important one. We currently have indexes only over specific views. There are no indexes over the entire CAS. This is part of the confusion. I'm trying to say that objects live in the CAS, not views; but since you can only access them through indexes and indexes are only on views, it seems useless to create an object off the base CAS.

Thilo advocates addressing this issue by eliminating the capability to create objects off the base CAS. But there is an alternative solution, which is to add the ability to retrieve objects from the base CAS. The latter may be more in sync with the proposed UIMA spec.

I think the reason we don't have a way to retrieve objects from the base CAS is that our built-in annotation index, sorted by location within the Sofa, doesn't make sense if not segregated by Sofa. But other kinds of indexes may very well be useful for the entire CAS.
Perhaps even bag indexes separated by type are useful. (Note we have suggested in the past that we might implement such bag indexes by default in a future version.)

Sorry for the length of this post; I hope it is helpful. I think the summary from my point of view is that, in light of the UIMA spec, views ought to be an optional way to interact with a CAS. If we want to make them required, we should take that up with the OASIS TC and see if we can get buy-in there before we try to implement anything.

-Adam
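
P.S. To make the alternative a bit more concrete, here is a rough Java sketch of "objects live in the CAS, views are optional subsets". Everything in it is made up for illustration (ToyFs, ToyView, ToyCas, addObject, allObjects, byType); none of it is the UIMA API or a proposal for actual signatures. It only models three ideas from above: the base CAS can add and retrieve objects directly, it can keep a bag index per type over the whole CAS, and a non-annotation object can be a member of more than one view.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a FeatureStructure: just an id and a type name.
final class ToyFs {
    final int id;
    final String type;
    ToyFs(int id, String type) { this.id = id; this.type = type; }
}

// Hypothetical view: an optional, named subset of the objects in the CAS.
final class ToyView {
    final String name;
    private final Set<ToyFs> members = new LinkedHashSet<>();
    ToyView(String name) { this.name = name; }
    void addMember(ToyFs fs) { members.add(fs); }
    boolean contains(ToyFs fs) { return members.contains(fs); }
    Set<ToyFs> members() { return Collections.unmodifiableSet(members); }
}

// Hypothetical base CAS: objects are created and retrieved here, not in a view.
final class ToyCas {
    private final List<ToyFs> objects = new ArrayList<>();
    // Unordered (bag) index over the entire CAS, keyed by type name.
    private final Map<String, List<ToyFs>> bagIndexByType = new HashMap<>();
    private int nextId = 1;

    ToyFs addObject(String type) {
        ToyFs fs = new ToyFs(nextId++, type);
        objects.add(fs);
        bagIndexByType.computeIfAbsent(type, t -> new ArrayList<>()).add(fs);
        return fs;
    }

    // Every object in the CAS, whether or not it belongs to any view.
    List<ToyFs> allObjects() { return Collections.unmodifiableList(objects); }

    List<ToyFs> byType(String type) {
        return Collections.unmodifiableList(
            bagIndexByType.getOrDefault(type, Collections.emptyList()));
    }
}

final class CasSketch {
    public static void main(String[] args) {
        ToyCas cas = new ToyCas();
        ToyFs pronoun1 = cas.addObject("ex:Pronoun");
        ToyFs pronoun2 = cas.addObject("ex:Pronoun");
        ToyFs lemma = cas.addObject("ex:Lemma");

        // Views are layered on top; the lemma is shared by both of them.
        ToyView first = new ToyView("first");
        ToyView second = new ToyView("second");
        first.addMember(pronoun1);
        first.addMember(lemma);
        second.addMember(pronoun2);
        second.addMember(lemma);

        System.out.println(cas.allObjects().size());         // objects in the base CAS
        System.out.println(cas.byType("ex:Pronoun").size()); // pronouns across the whole CAS
        System.out.println(first.contains(lemma) && second.contains(lemma));
    }
}
```

An annotator that never touches ToyView still works against ToyCas alone, which is the sense in which views would be optional. A location-sorted annotation index is deliberately left out, since that one only makes sense per Sofa.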