Adam Lally wrote:
On 1/8/07, Marshall Schor <[EMAIL PROTECTED]> wrote:
Here's a short proposal / straw-person that might address many of the
concerns raised.

1) Drop the JCas interface - only have the CAS interface.  (one less
interface; process(...) method doesn't need variants depending on kind
of API interface passed in being CAS / JCas).


Hmmm... I'm curious how it's possible to merge CAS and JCas,
especially if we need to do it without sacrificing any performance.
For example, JCas has somewhat expensive initialization where it tries
to load classes for each type in the type system.  We don't currently
pay that cost if JCas is never used.  But if JCas and CAS are merged,
how do we know if the user needs to have these classes loaded?  That's
only one particular issue.

I haven't worked out all the details, but this is how it might work. Let's assume the framework might know, somehow, if JCas was being used by a particular component in the chain of annotators running "locally" (not remote), in Java. Then the framework could do a getJCas() call before calling that component's process method, to load up the "generators". Otherwise, it could just never do the getJCas() call, and you would never pay the overhead penalty of loading the JCas cover classes.

Of course, I haven't figured out how to tell whether a component is using the JCas... That might take a bit of thought. :-) The hard thing to handle would be which Java cover objects the generator(s) would produce: before JCas is set up, these mostly produce generic objects for feature structures; after JCas is set up, they produce the JCas cover objects.

My guess is we'd have to have some new metadata indicating JCas was being used. This would move the indication from where it is now (discoverable by looking at the argument type of the process method) to the component metadata, making it explicit.
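
A minimal sketch of that idea, not UIMA code: every name here (LazyCas, LazyComponent, usesJCas(), LazyPipeline) is a hypothetical stand-in, and the usesJCas() flag plays the role of the new component metadata. The framework only triggers the expensive getJCas() setup when a component in the chain declares that it needs it.

```java
// Hypothetical stand-in for a CAS; NOT the real UIMA class.
class LazyCas {
    private Object jcas;      // stands in for the JCas and its cover-class generators
    int jcasInitCount = 0;    // counts how often the expensive setup actually ran

    // Mimics getJCas(): the costly initialization happens at most once.
    Object getJCas() {
        if (jcas == null) {
            jcas = new Object();
            jcasInitCount++;
        }
        return jcas;
    }

    boolean isJCasInitialized() {
        return jcas != null;
    }
}

// Hypothetical component contract; usesJCas() is the assumed metadata flag.
interface LazyComponent {
    boolean usesJCas();
    void process(LazyCas cas);
}

class FlaggedAnnotator implements LazyComponent {
    private final boolean usesJCas;

    FlaggedAnnotator(boolean usesJCas) {
        this.usesJCas = usesJCas;
    }

    public boolean usesJCas() {
        return usesJCas;
    }

    public void process(LazyCas cas) {
        // Analysis logic would go here.
    }
}

class LazyPipeline {
    // Calls getJCas() before process() only for components that declare JCas use.
    static void run(LazyCas cas, LazyComponent... components) {
        for (LazyComponent c : components) {
            if (c.usesJCas()) {
                cas.getJCas();
            }
            c.process(cas);
        }
    }
}
```

With a chain of components that all report usesJCas() == false, the setup never runs, so the overhead is only paid when some component asks for it.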


The actual JCas "new Xxx(aJCas)" constructor to make new instances could easily be made to work if you passed a CasImpl object to the constructor: it could call getJCas() on that. This is just a one-field dereference and you have the JCas. Or, if the framework knew, it could pass the JCasImpl to the process method (since it would implement the new CAS interface); this would avoid the single dereference (my guess from previous measurement attempts is that the performance difference is not measurable...).
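
Roughly, the constructor idea could look like the sketch below. These are hypothetical miniature classes (MiniCasImpl, MiniJCas, XxxCover), not the real CasImpl or JCas cover classes; the point is only that a second constructor can accept the CAS object and reach the JCas through a single field access.

```java
// Hypothetical stand-in for the JCas; NOT the real UIMA class.
class MiniJCas {
    final MiniCasImpl cas;

    MiniJCas(MiniCasImpl cas) {
        this.cas = cas;
    }
}

// Hypothetical stand-in for CasImpl, holding its JCas in one field.
class MiniCasImpl {
    private MiniJCas jcas;

    // After the first call this is a single field dereference.
    MiniJCas getJCas() {
        if (jcas == null) {
            jcas = new MiniJCas(this);
        }
        return jcas;
    }
}

// Hypothetical cover class, playing the role of "Xxx" above.
class XxxCover {
    final MiniJCas jcas;

    // Existing style: new Xxx(aJCas)
    XxxCover(MiniJCas jcas) {
        this.jcas = jcas;
    }

    // Assumed additional constructor: accept the CAS and look up its JCas.
    XxxCover(MiniCasImpl cas) {
        this(cas.getJCas());
    }
}
```

Both constructors end up bound to the same JCas instance, so existing code that passes a JCas and new code that passes the CAS would behave the same way in this sketch.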


Implementation challenges aside, I have some mixed feelings about
this.  In some ways it makes things simpler, but in other ways it
seems like CAS and JCas are different ways of thinking and that we
shouldn't try to hide that.  Maybe this is another area where we could
benefit from asking some users what they think.

They are different approaches to creating and accessing CAS Feature Structures,
but in the end, they do the same kinds of things with Feature Structures.




2) Put all the user-facing methods for the CAS into the CAS interface
(this includes the view stuff and the sofa stuff).  Conceptually, this
interface contains all the methods needed by a user using the CAS.

3) Add a new interface called "CasViewSelector".  This interface has
just the 2-3 methods that select a view.  This interface is passed to
process methods.


Thilo mentioned a very similar idea I think.  This does solve the
issue of the "base CAS" with all its unsupported operations.  However
it has the drawback of not really matching the names that the UIMA
specification proposal uses.  The spec says that CASes are what's
passed between analytics and that CASes contain views.  We'd still be
calling a "CAS" what the spec calls a "View".  That just makes me
nervous about rushing to implement this.

Actually, forgetting about the spec for a second, what do we say in
our documentation is the thing that carries the analysis data between
annotators?  If we still say that's called a "CAS", and that the thing
that's serialized and sent between remote annotators is still a "CAS",
then this just doesn't seem consistent with this suggested naming of
interfaces.

What this proposal is doing is using the same word, CAS, for all of these
things.   CAS is what is passed.  CASes contain views, and you can get
one (or more) views.  Each view has things you can do - represented by
the CAS Interface.
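
To make the naming concrete, here is a rough sketch of how item 3 might look. All names (ViewCas, CasViewSelectorSketch, InMemorySelector, ViewUsingAnnotator) and both method signatures are assumptions for illustration, not a worked-out API: the selector is what gets passed to process, and each view it hands back is used through the single user-facing interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the user-facing CAS interface; one instance per view.
interface ViewCas {
    String getViewName();
}

// Hypothetical selector with just the view-selection methods.
interface CasViewSelectorSketch {
    ViewCas createView(String name);
    ViewCas getView(String name);
}

// Simple in-memory implementation, for illustration only.
class InMemorySelector implements CasViewSelectorSketch {
    private final Map<String, ViewCas> views = new LinkedHashMap<>();

    public ViewCas createView(String name) {
        if (views.containsKey(name)) {
            throw new IllegalArgumentException("view already exists: " + name);
        }
        ViewCas view = () -> name;   // each view just reports its own name here
        views.put(name, view);
        return view;
    }

    public ViewCas getView(String name) {
        ViewCas view = views.get(name);
        if (view == null) {
            throw new IllegalArgumentException("no such view: " + name);
        }
        return view;
    }
}

// A component receives the selector and picks the view it wants to work on.
class ViewUsingAnnotator {
    String process(CasViewSelectorSketch selector) {
        ViewCas view = selector.getView("text");
        return view.getViewName();
    }
}
```

In this sketch the selector exposes no feature-structure operations of its own, which is how it would sidestep the "base CAS" with unsupported operations.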

-Marshall
