This has been stewing for a long time, so it's time to get it out in
the open here.

Users have difficulty figuring out how to use FeatureStructures that
are not derived from annotations (and not intended to just be
subordinate objects referenced from annotations).  I have personally
had to help several who wanted to create such an FS, add it to the
CAS, and get it back out later, but couldn't figure out how to
proceed.  The answer of course is that they need to define a custom
index, even if they don't care about the sort order.  But instead, a
common workaround is to make the type inherit from Annotation and
simply ignore the begin and end features.  In practice that's a lot
easier than dealing with a custom index.

The most common use case for this is when the object is a "singleton",
for example a DocumentMetaData object, in which case there's another
not-so-nice solution: add features to DocumentAnnotation.  That
hinders interoperability, though, so it would be nice to give users
another, easy way to do this.  However, I think the issue is more
general than singletons, and could apply anytime the user wants to
just add and retrieve FS from the CAS without caring about their
ordering.

I think it's a weakness of UIMA that we make this so difficult to do,
and that we should try to improve this.

I'm open to whatever designs people can come up with to address this.

Eddie, Marshall, and I had a proposal quite some time ago that we were
never able to achieve consensus on, although I'm not sure it was 100%
understood what we were proposing at the time.

The basic idea is that CAS.addFsToIndexes(FS) and
FSIndexRepository.addFS(FS) should *always* add the FS to an index.
If no appropriate index exists, we just create a bag index.  The FS
can then be retrieved by using FSIndexRepository.getAllIndexedFS(Type).
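
To make the proposed semantics concrete, here is a minimal toy model in
plain Java.  It is not UIMA code: ToyFs, ToyIndexRepository, and their
methods are hypothetical stand-ins, and type inheritance is ignored.  It
only illustrates the idea that adding an FS whose type has no defined
index creates a bag index on demand instead of dropping the FS.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AutoBagIndexSketch {

    /** Hypothetical stand-in for a feature structure: a type name plus a payload. */
    public static final class ToyFs {
        public final String typeName;
        public final String payload;

        public ToyFs(String typeName, String payload) {
            this.typeName = typeName;
            this.payload = payload;
        }
    }

    /** Hypothetical stand-in for an index repository (not the UIMA class). */
    public static final class ToyIndexRepository {
        // One unordered "bag" per type name. A real repository would also hold
        // the sorted and set indexes declared in a descriptor.
        private final Map<String, List<ToyFs>> bags = new HashMap<>();

        /** Proposed behavior: always index the FS, creating a bag if none exists. */
        public void addFs(ToyFs fs) {
            bags.computeIfAbsent(fs.typeName, k -> new ArrayList<>()).add(fs);
        }

        /** Returns every indexed FS of exactly this type (subtypes ignored here). */
        public List<ToyFs> getAllIndexedFs(String typeName) {
            return Collections.unmodifiableList(
                    bags.getOrDefault(typeName, Collections.emptyList()));
        }
    }

    public static void main(String[] args) {
        ToyIndexRepository repo = new ToyIndexRepository();
        // No index was declared for this type, yet the add is not ignored.
        repo.addFs(new ToyFs("DocumentMetaData", "source=example"));

        for (ToyFs fs : repo.getAllIndexedFs("DocumentMetaData")) {
            System.out.println(fs.typeName + ": " + fs.payload);
        }
    }
}
```

If the user later cares about ordering, the on-demand bag would simply
be superseded by whatever sorted index they define for that type.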

The thinking is that if an annotator bothered to try to add something
to the indexes, there was a reason for it, and it's a whole lot better
to respect that than to just silently ignore it.

Note that this doesn't cause any loss in performance if an annotator
never adds an FS to the indexes.  We still support subordinate FS that
are linked off of other FS but never indexed.

I've heard of a rare case where users might have "optional" indexes.
The idea is that an annotator might call addFsToIndexes "just in case"
some downstream component might actually care about such an index.
Then when used in an application that doesn't require that index, the
index is not defined in the descriptor, making the addFsToIndexes call
a no-op.  That's the only case that would suffer a performance impact
as a result of implementing this proposal.  But I think this is easily
addressed via a configuration parameter.  (And I think I heard that in
the one annotator that did this, it has already been replaced by a
configuration parameter anyway, since that provided even better
performance than checking whether the index exists for every FS that
was created.)
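
As a rough sketch of that fix, again in plain Java rather than real UIMA
code: ToyAnnotator and its indexOptionalFs flag are hypothetical names
standing in for an annotator and its configuration parameter.  The flag
is read once, so the per-FS cost when indexing is off is a single
boolean test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OptionalIndexingSketch {

    /** Hypothetical annotator whose "optional" indexing is driven by a parameter. */
    public static final class ToyAnnotator {
        // Stand-in for a configuration parameter read once at initialization.
        private final boolean indexOptionalFs;
        // Stand-in for the index repository: records what was indexed.
        private final List<String> indexed = new ArrayList<>();

        public ToyAnnotator(boolean indexOptionalFs) {
            this.indexOptionalFs = indexOptionalFs;
        }

        /** Creates one FS per item and indexes it only if configured to do so. */
        public void process(List<String> items) {
            for (String item : items) {
                String fs = "FS(" + item + ")"; // the FS is always created
                if (indexOptionalFs) {
                    indexed.add(fs);            // the add happens only on request
                }
            }
        }

        public List<String> indexedFs() {
            return indexed;
        }
    }

    public static void main(String[] args) {
        ToyAnnotator withIndex = new ToyAnnotator(true);
        ToyAnnotator withoutIndex = new ToyAnnotator(false);
        List<String> items = Arrays.asList("a", "b", "c");

        withIndex.process(items);
        withoutIndex.process(items);

        System.out.println("indexing on:  " + withIndex.indexedFs().size());
        System.out.println("indexing off: " + withoutIndex.indexedFs().size());
    }
}
```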

In summary I think this design has a lot of nice properties.  It makes
it very easy to add things to the CAS and get them out again, if you
don't care about order.  At a later point, if you start to care about
the sort order, you can go back and easily define a sorted index.
There's almost no effect of this proposal on existing code, except for
the one rare "optional index" case described above, which had its own
performance problems anyway and has an easy fix.

-Adam
