Re: DocumentAnnotation and type-merging

Marshall Schor Tue, 19 Dec 2006 11:08:28 -0800

Adam Lally wrote:

On 12/18/06, Marshall Schor <[EMAIL PROTECTED]> wrote:

I think what we have is annotator A might have a version of types for it
(T/A) and annotator B might have a version of types for it (T/B).  The
"assembly" of A and B has a process whereby T/A and T/B are "merged", and a
new T/A&B is created.  This seems different from the Xerces example.  It's
not necessarily a "newer" versus "older" thing, for UIMA assemblies.
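
To make the T/A, T/B and T/A&B idea concrete, here is a minimal sketch of a
merge that takes the union of features per type. It is not the UIMA merge
code: the class TypeMergeSketch, the method merge, and the features
"sourceUri" and "author" are hypothetical names used only for illustration,
and a type system is modelled as a plain map from type name to feature names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TypeMergeSketch {

    /**
     * Merges two simplified type systems (type name -> feature names)
     * by taking the union of the features declared for each type.
     * Assumption: features with the same name are compatible.
     */
    static Map<String, Set<String>> merge(Map<String, Set<String>> a,
                                          Map<String, Set<String>> b) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (Map<String, Set<String>> typeSystem : List.of(a, b)) {
            for (Map.Entry<String, Set<String>> e : typeSystem.entrySet()) {
                // TreeSet keeps feature names sorted, so output is stable.
                merged.computeIfAbsent(e.getKey(), k -> new TreeSet<>())
                      .addAll(e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        String docAnn = "uima.tcas.DocumentAnnotation";
        // T/A and T/B each extend DocumentAnnotation with a different
        // (hypothetical) feature.
        Map<String, Set<String>> tA = Map.of(docAnn, Set.of("language", "sourceUri"));
        Map<String, Set<String>> tB = Map.of(docAnn, Set.of("language", "author"));
        // T/A&B carries the features of both.
        System.out.println(merge(tA, tB));
    }
}
```

Neither input is "newer" than the other here; the merged type is simply a
superset of both, which is the sense in which this differs from picking one
of two library versions.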


To me it's very similar, the difference being that in most cases it
would be easier to merge T/A and T/B than it would be to merge Xerces
versions X and Y.  But in both cases there are additional options
other than merging:
1) If one version is newer than another and backwards compatible then
you can just use the newer one.  I think this can be common in UIMA,
where someone has extended an existing type.

True, but I think many times the extension is not backward compatible
(e.g. multiple components extending the DocumentAnnotation differently).

2) You can use separate ClassLoaders for the two components so that
they can each use their own version of the class.  This would also
mean that they can't share the JCAS cover objects directly, but would
each need their own instance.  But that seems doable, and perhaps
simpler than the automatic merge and more "normal" in the Java world.

Interesting... But this would reduce the effect of an optimization we have
now when running multiple components in the same JVM (the JCas class
instances are reused).  Each having their own instance would imply more
Java objects.

I agree this is a good goal.  It seems achievable with some "automation"
introduced into the assembly step.
> This seems particularly important for applications that host arbitrary
> UIMA analytics.  End users want to grab the latest, greatest annotator
> and drop it in.  This should work smoothly, or UIMA isn't meeting one
> of its most important goals.
I agree.  Perhaps we should figure out what (if anything) is inhibiting
this, and see if it can be addressed.  One concept might be to require JCas
source/class files to be packaged in a particular way, and to improve the
"merge" logic to cover more cases (and report on the cases where it fails
and a "manual" merge step might be needed).  I think in most pragmatic
cases it will work fine "automatically".
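
As an illustration of "report on the cases where it fails", here is a small
sketch that lists the features two components declare with different range
types, i.e. the cases that would need a manual merge. The class
MergeConflictSketch, the method conflicts, and the feature "sourceUri" are
hypothetical, and each type is reduced to a map from feature name to range
type name; this is an assumption about what a failure report could look
like, not a description of existing behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MergeConflictSketch {

    /**
     * Returns one message per feature that both sides declare
     * with different range types. An empty list means the two
     * declarations can be merged automatically.
     */
    static List<String> conflicts(Map<String, String> a, Map<String, String> b) {
        List<String> report = new ArrayList<>();
        // TreeMap gives a deterministic, alphabetical report order.
        for (Map.Entry<String, String> e : new TreeMap<>(a).entrySet()) {
            String other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                report.add(e.getKey() + ": " + e.getValue() + " vs " + other);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Two components extend the same type; "sourceUri" is hypothetical.
        Map<String, String> fromA = Map.of(
            "language", "uima.cas.String",
            "sourceUri", "uima.cas.String");
        Map<String, String> fromB = Map.of(
            "language", "uima.cas.String",
            "sourceUri", "uima.cas.StringArray");
        // Prints: [sourceUri: uima.cas.String vs uima.cas.StringArray]
        System.out.println(conflicts(fromA, fromB));
    }
}
```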


What about the following adjustment to my original proposal:

We explicitly document that JCAS and "feature extension" are not
compatible in the *current* version of UIMA, but that we may figure
out some way to make them compatible in the future.

I think this might be too strong a statement. If you're willing to run
JCasGen and compile the results as part of an assembly process, it seems
there is a way to make JCas compatible up to a point with feature
extension, using the fact that JCasGen will merge sources to handle keeping
hand-done customizations (in most cases).

How about: We explicitly document that if "feature extension" is used with
JCas, you need to

(a) package type systems and the JCas generated classes
separately from other packagings, and

(b) be willing to re-run JCasGen when
your type package is combined with others, and

(c) if hand-modifications are done to the generated JCas classes, then that
version of the source must be available when running JCasGen and the
"merge" option must be used.

Limitation: if multiple versions of the JCasGen generated sources were
independently hand-modified, then the assembler has to hand-merge the
resulting source for those types to pick up all the hand-modifications into
the result.

-Marshall
