On 12/13/06, Thilo Goetz <[EMAIL PROTECTED]> wrote:
> Let's disregard the question of backward compatibility for a moment.  In
> an ideal world, where we could design the APIs the way we wanted, what
> would we do?


Let's consider what the proposed UIMA spec has to say.  First note
that the spec is centered around the XMI representation of the CAS.
Java APIs are considered a way to get at that information.

When the spec first introduces the CAS it says that a CAS is just a
collection of objects (i.e., FeatureStructures).  Views are not
introduced until much later, as a way to organize that collection of
objects.

Thinking about this from an XMI point of view, what you send to an
annotator (e.g., web service) is an XMI document with a list of
objects in it.  Ignoring the Java API for a moment, it's perfectly
valid for an annotator service to read and write these objects
directly.  No notion of a view is necessary.  This is considered a
critical enabler of interoperability with simple XML-based
implementations such as WebFountain.

Possible conclusion for the Java API:  We should provide a way to get
all of the FS from the CAS and add new FS to the CAS, as part of the
Base CAS API.  Note that we currently jump through hoops in XMI/XCAS
serialization because we don't have such access.
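
To make that concrete, here is a rough Java sketch of what such base
CAS access could look like.  It is not the existing UIMA API: the names
BaseCas, FeatureStructure, add and getAll are all invented for
illustration, and a real design would need to deal with types, ids and
references between objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT the Apache UIMA API: every name here is invented.
public class BaseCasSketch {

    /** A feature structure: a type name plus feature/value pairs. */
    static final class FeatureStructure {
        final String type;
        final Map<String, Object> features = new LinkedHashMap<>();

        FeatureStructure(String type) {
            this.type = type;
        }

        /** Sets one feature and returns this object so calls can be chained. */
        FeatureStructure set(String feature, Object value) {
            features.put(feature, value);
            return this;
        }
    }

    /** The base CAS: just a collection of objects, with no notion of a view. */
    static final class BaseCas {
        private final List<FeatureStructure> objects = new ArrayList<>();

        /** Adds a new feature structure directly to the CAS. */
        void add(FeatureStructure fs) {
            objects.add(fs);
        }

        /** Returns every feature structure in the CAS, in insertion order. */
        List<FeatureStructure> getAll() {
            return Collections.unmodifiableList(objects);
        }
    }

    public static void main(String[] args) {
        BaseCas cas = new BaseCas();
        cas.add(new FeatureStructure("ex:Quotation").set("author", "Francis Bacon"));
        cas.add(new FeatureStructure("ex:Lemma").set("base", "I"));

        // A serializer could walk every object without going through a view.
        for (FeatureStructure fs : cas.getAll()) {
            System.out.println(fs.type + " " + fs.features);
        }
    }
}
```

With something like getAll() available, an XMI serializer would just
iterate over the collection instead of reconstructing it from
per-view indexes.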



Now as for Views, a View is a subset of objects from the CAS.  A
special kind of view called an "Anchored View" is associated with a
Sofa, and has the constraint that all annotations in the Anchored View
refer to that Sofa.  This feature is important for multi-Sofa
applications that need to keep their annotations organized.  Here is
example XMI from the spec (pp.40-41):


 <ex:Quotation xmi:id="1"
   text="If we begin in certainties, we shall end with doubts; but if
we begin with doubts and are patient with them, we shall end in
certainties."
   author="Francis Bacon"/>

 <cas:SofaReference xmi:id="2" sofaObject="1" sofaFeature="text"/>

 <ex:Pronoun xmi:id="4" sofa="2" begin="3" end="5" lemma="6"/>
 <ex:Pronoun xmi:id="5" sofa="2" begin="29" end="31" lemma="6"/>
 <ex:Lemma xmi:id="6" base="I" person="1" number="p"/>

 <uima:AnchoredView sofa="2" members="3 4 5"/>

 <ex:Quotation xmi:id="7"
   text="The only limit to our realization of tomorrow will be our
doubts of today."
   author="Franklin D. Roosevelt"/>

 <cas:SofaReference xmi:id="8" sofaObject="7" sofaFeature="text"/>

 <ex:Pronoun xmi:id="9" sofa="8" begin="18" end="21" lemma="11"/>
 <ex:Pronoun xmi:id="10" sofa="8" begin="54" end="57" lemma="11"/>
 <ex:Lemma xmi:id="11" base="my" person="1" number="p"/>

 <cas:AnchoredView sofa="7" members="9 10 11"/>



Note that although this example doesn't show it, a FeatureStructure
that is not an annotation (i.e., does not link to a Sofa) could be
shared between the views.  Views need not be disjoint subsets of the
set of objects in the CAS.
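
As a rough illustration of views as overlapping subsets, here is a
small Java sketch.  Again this is not the UIMA API: Fs, View and
AnchoredView are hypothetical names, and Sofas are reduced to integer
ids.  The anchored view rejects annotations that refer to a different
Sofa, while an object with no Sofa can be a member of several views.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch, NOT the Apache UIMA API: every name here is invented.
public class ViewSketch {

    /** Any object in the CAS; sofaId is null when the object is not an annotation. */
    static class Fs {
        final int id;
        final Integer sofaId;

        Fs(int id, Integer sofaId) {
            this.id = id;
            this.sofaId = sofaId;
        }
    }

    /** A view is only a subset of the objects that live in the CAS. */
    static class View {
        final Set<Fs> members = new LinkedHashSet<>();

        void add(Fs fs) {
            members.add(fs);
        }
    }

    /** An anchored view: every annotation it holds must refer to its Sofa. */
    static class AnchoredView extends View {
        final int sofaId;

        AnchoredView(int sofaId) {
            this.sofaId = sofaId;
        }

        @Override
        void add(Fs fs) {
            // Non-annotations (sofaId == null) are always allowed.
            if (fs.sofaId != null && fs.sofaId != sofaId) {
                throw new IllegalArgumentException(
                        "annotation " + fs.id + " refers to Sofa " + fs.sofaId
                                + ", not Sofa " + sofaId);
            }
            super.add(fs);
        }
    }

    public static void main(String[] args) {
        AnchoredView first = new AnchoredView(2);
        AnchoredView second = new AnchoredView(8);

        first.add(new Fs(4, 2));   // annotation on Sofa 2
        second.add(new Fs(9, 8));  // annotation on Sofa 8

        // A feature structure with no Sofa can belong to both views.
        Fs shared = new Fs(100, null);
        first.add(shared);
        second.add(shared);

        System.out.println("shared in both views: "
                + (first.members.contains(shared) && second.members.contains(shared)));
    }
}
```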


In our implementation, we have indexes.  These aren't part of the spec
at all.  They're considered an implementation feature, albeit an
important one.  We currently have indexes only over specific views.
There are no indexes over the entire CAS.

This is part of the confusion.  I'm trying to say that objects live in
the CAS, not views; but since you can only access them through indexes
and indexes are only on views, it seems useless to create an object
off the base CAS.

Thilo advocates addressing this issue by eliminating the capability to
create objects off the base CAS.  But there is an alternative
solution, which is to add the ability to retrieve objects from the
base CAS.  The latter may be more in sync with the proposed UIMA spec.

I think the reason we don't have a way to retrieve objects from the
base CAS is that our built-in annotation index, sorted by location
within the Sofa, doesn't make sense if not segregated by Sofa.  But
other kinds of indexes may very well be useful for the entire CAS.
Perhaps even bag indexes separated by type are useful.  (Note we have
suggested in the past that we might implement such bag indexes by
default in a future version.)
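
For what it's worth, a bag index by type over the whole CAS could be
as simple as the following Java sketch.  This is not how our indexes
are implemented; TypeBagIndex and Fs are hypothetical names, and the
sketch ignores type inheritance, which a real index would have to
handle.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT the Apache UIMA API: every name here is invented.
public class TypeBagIndexSketch {

    /** Minimal stand-in for a feature structure: an id and a type name. */
    static final class Fs {
        final int id;
        final String type;

        Fs(int id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    /** An unordered (bag) index over the entire CAS, bucketed by exact type name. */
    static final class TypeBagIndex {
        private final Map<String, List<Fs>> byType = new HashMap<>();

        /** Files the object under its type; no Sofa or view is involved. */
        void add(Fs fs) {
            byType.computeIfAbsent(fs.type, t -> new ArrayList<>()).add(fs);
        }

        /** Returns all objects of the given type, or an empty list if there are none. */
        List<Fs> get(String type) {
            return Collections.unmodifiableList(
                    byType.getOrDefault(type, Collections.emptyList()));
        }
    }

    public static void main(String[] args) {
        TypeBagIndex index = new TypeBagIndex();
        index.add(new Fs(4, "ex:Pronoun"));
        index.add(new Fs(9, "ex:Pronoun"));
        index.add(new Fs(6, "ex:Lemma"));

        // Pronouns anchored to different Sofas land in the same bucket.
        System.out.println("pronouns: " + index.get("ex:Pronoun").size());
        System.out.println("lemmas: " + index.get("ex:Lemma").size());
    }
}
```

Since a bag index has no ordering, it sidesteps the problem that
sorting by location only makes sense within a single Sofa.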

Sorry for the length of this post; I hope it is helpful.  I think the
summary from my point of view is that in light of the UIMA spec, views
ought to be an optional way to interact with a CAS.  If we want to
make them required, we should take that up with the OASIS TC and see
if we can get buy-in there, before we try to implement anything.

-Adam
