On 5/10/07, Marshall Schor <[EMAIL PROTECTED]> wrote:
While working on the class-loader switching code, we have revisited an
issue with the way JCas objects work with respect to views.

Currently, for each view, there is a separate set of xxx_Type objects and a
separate set of "cached" cover objects (which are identical to the other
views' objects, except that their _Type ref points to the instance for
this view).

This is there only (as far as we can see) to support
aJCasObject.addToIndexes() and removeFromIndexes(), which use this
information to pick the right "view" to use (remember that indexes are
held per view).

Besides inefficiency (replication of objects per view), there is another
side-effect.  JCas Objects can, themselves, be extended by users to hold
additional information, other than what's in the CAS.  The current
design would create new versions of these objects per view, so that
iterators over different views would get different instances.  So
information set into one JCas object in one view would not be "visible"
to instances obtained by iterating using a different view's index.  This
could be a documented "feature", or it could be a "bug".
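The per-view behavior can be made concrete with a toy model in plain Java. This is a sketch under stated assumptions, not UIMA code: View, TypeInfo and Cover are hypothetical stand-ins for a CAS view, an xxx_Type object and a JCas cover object, and feature structures are reduced to integer addresses. Each view caches its own cover, so a user-added field set through one view's cover is not visible through another view's cover for the same feature structure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified model of the per-view design; not the UIMA API.
class View {
    final String name;
    final List<Integer> index = new ArrayList<>();          // FS addresses indexed in this view
    final Map<Integer, Cover> coverCache = new HashMap<>(); // per-view cache of cover objects
    final TypeInfo typeInfo = new TypeInfo(this);           // one "_Type" object per view

    View(String name) { this.name = name; }

    // Returns this view's cached cover for the FS at addr, creating it on first use.
    Cover cover(int addr) {
        return coverCache.computeIfAbsent(addr, a -> new Cover(a, typeInfo));
    }
}

// Stand-in for an xxx_Type object: here its only job is to remember its view.
class TypeInfo {
    final View view;
    TypeInfo(View view) { this.view = view; }
}

// Stand-in for a JCas cover object that a user has extended with extra Java state.
class Cover {
    final int addr;          // the underlying CAS feature structure
    final TypeInfo jcasType; // points at the _Type instance of one particular view
    String userNote;         // user-added, non-CAS information

    Cover(int addr, TypeInfo jcasType) {
        this.addr = addr;
        this.jcasType = jcasType;
    }

    // addToIndexes() picks the view by following the _Type reference.
    void addToIndexes() { jcasType.view.index.add(addr); }
}
```

For example, covering address 42 through two views yields two distinct Cover objects; setting userNote on one leaves it null on the other, and calling addToIndexes() on the first adds 42 only to the first view's index.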

Because current users seem to often use the aJCasObject.addToIndexes()
method, I want to retain that method, while getting the efficiencies and
fixing the "bug" (if we consider it a bug) above.  To do this, we could
make this work as before *for sofa-unaware annotators, only* as
follows:  Change the impl of addToIndexes and removeFromIndexes to
reference the "current-view".
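A minimal sketch of that proposal, again with hypothetical classes rather than the real JCas API: cover objects are cached once per JCas, and addToIndexes()/removeFromIndexes() consult whatever the current view is at call time. The sketch assumes that the framework sets the current view before calling a sofa-unaware annotator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "index into the current view"; not the UIMA API.
class View {
    final String name;
    final List<Integer> index = new ArrayList<>(); // FS addresses indexed in this view
    View(String name) { this.name = name; }
}

class JCasModel {
    final Map<Integer, Cover> coverCache = new HashMap<>(); // one cover per FS, shared by all views
    View currentView;                                       // assumed to be set by the framework

    JCasModel(View initial) { this.currentView = initial; }

    Cover cover(int addr) {
        return coverCache.computeIfAbsent(addr, a -> new Cover(a, this));
    }
}

class Cover {
    final int addr;
    final JCasModel jcas;
    String userNote; // user-added state, now the same no matter which view reaches it

    Cover(int addr, JCasModel jcas) {
        this.addr = addr;
        this.jcas = jcas;
    }

    // Both methods resolve the target view at call time, not at creation time.
    void addToIndexes() { jcas.currentView.index.add(addr); }
    void removeFromIndexes() { jcas.currentView.index.remove(Integer.valueOf(addr)); }
}
```

Because the cover is shared, extra Java state set on it is visible however it is reached. The trade-off is that the target view now depends on mutable JCas state rather than on the object itself.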


It makes me very nervous to put in changes that intentionally break
compatibility with existing annotators.  One of UIMA's main goals is
to make it easier to integrate analytics into applications.  We don't
want application developers to be concerned that if they update their
UIMA version, suddenly components that used to work will no longer
work.  Sometimes that means we're stuck with a suboptimal design for
something, but that's life.  If we *really* want to stop supporting
this method, deprecate it first and then wait a few years for the 3.0
release to come out, and think about doing it then. ;)

It seems the argument here is that there aren't very many multi-sofa
annotators, so breaking them (the ones that use JCas anyway) is not
that big a deal.  I'm just not sure how to judge that.

With that general comment out of the way, let me consider this
specific issue.  I think the code that would break is this:

JCas someView = baseJCas.getView(name);
MyAnnotation annot = new MyAnnotation(someView, begin, end);
annot.addToIndexes();

It would no longer add the annotation to someView.  Instead it would
try to add it to the current view (which I think is the "initial view"
in this case).

But wait... for types derived from annotations, we already know what
view it should be indexed in.  Just follow the annotation's Sofa
reference and you will find the right view.  It's not valid to index
it in any other view, and in fact that results in an exception.

So instead of indexing in the "current view" (which might fail), for
annotations you could always index in the "correct view". :)  This
should not break any annotators.
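The Sofa-based routing could look roughly like this. It is a hypothetical model: Cas, View, FeatureStructure and Annotation here are simplified stand-ins, not the UIMA classes with similar names. An annotation remembers the Sofa of the view it was created in, and addToIndexes() follows that reference; only non-annotation objects fall back to the current view.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of Sofa-based index routing; not the UIMA API.
class View {
    final Cas cas;
    final String sofaId;
    final List<Object> index = new ArrayList<>();
    View(Cas cas, String sofaId) { this.cas = cas; this.sofaId = sofaId; }
}

class Cas {
    private final Map<String, View> views = new HashMap<>();
    final View initialView;
    View currentView;

    Cas() {
        initialView = getView("_InitialView");
        currentView = initialView;
    }

    // One View per Sofa, created on first request.
    View getView(String sofaId) {
        return views.computeIfAbsent(sofaId, id -> new View(this, id));
    }
}

class FeatureStructure {
    final Cas cas;
    FeatureStructure(Cas cas) { this.cas = cas; }

    // Non-annotations carry no Sofa, so they use the current view.
    View homeView() { return cas.currentView; }

    void addToIndexes() { homeView().index.add(this); }
}

class Annotation extends FeatureStructure {
    final String sofaId; // the annotation's Sofa reference
    final int begin, end;

    Annotation(View view, int begin, int end) {
        super(view.cas);
        this.sofaId = view.sofaId;
        this.begin = begin;
        this.end = end;
    }

    // The only valid view for an annotation is the one its Sofa belongs to.
    @Override
    View homeView() { return cas.getView(sofaId); }
}
```

Under this model, the three-line pattern quoted above (create an annotation on someView, then call addToIndexes()) still indexes it in someView, even while the current view is the initial view.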

That leaves non-annotation types (and only those with custom indexes
defined for them matter).  For those we could:

(a) Decide we don't care about breaking this.  At this point the
number of affected annotators might be zero, but we can't be sure.  I
still don't like it on general principles.

(b) Optimize what we can get away with.  Only create extra _Type
objects for non-annotation types for which custom indexes are defined.
That has the downside of probably making the JCas code ugly and more
prone to bugs.

Next thought... what if instead of the separate _Type objects, you
maintained in the JCasImpl a map from JCas object to "home view".  As
above, you only need to do this for non-annotation objects that have
indexes defined for them (AND if the home view is not the initial
view, since that could be the default).
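A rough sketch of such a home-view map, assuming views are identified by name and that the JCas can record each cover object when it is created. HomeViewMap and recordCreation are hypothetical names, and "_InitialView" is used only as the name of the default view. IdentityHashMap keys on object identity, which matches mapping each JCas object (not each equal-looking one) to its view.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical per-JCas "home view" table; not part of the UIMA API.
class HomeViewMap {
    private final String initialView;
    // Each JCas cover object maps to the view it was created in.
    private final Map<Object, String> homeViews = new IdentityHashMap<>();

    HomeViewMap(String initialView) { this.initialView = initialView; }

    // Only non-annotation objects with custom indexes whose home is not the
    // initial view get an entry; annotations are routed via their Sofa instead.
    void recordCreation(Object fs, String view, boolean isAnnotation, boolean hasCustomIndexes) {
        if (!isAnnotation && hasCustomIndexes && !view.equals(initialView)) {
            homeViews.put(fs, view);
        }
    }

    // The view addToIndexes()/removeFromIndexes() would use; no entry means the initial view.
    String homeView(Object fs) { return homeViews.getOrDefault(fs, initialView); }

    int size() { return homeViews.size(); }

    // Entries would need to be dropped when the CAS is reset, or the map grows without bound.
    void clear() { homeViews.clear(); }
}
```

In this sketch the map stays empty in the common case, since only non-annotation objects with custom indexes created outside the initial view get an entry. The cost is an extra lookup on every addToIndexes() and removeFromIndexes() call.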

That has different performance characteristics, which may or may not be
better (it depends on how many instances get created, I think), but
maybe it is a cleaner way to stay compatible.

Hope that helped,
 -Adam
