Re: CAS Views and Sofas simplification

Adam Lally Thu, 28 Dec 2006 07:43:07 -0800

More later (time to take stock of where we are again, I think...) but first:


On 12/27/06, Thilo Goetz <[EMAIL PROTECTED]> wrote:

Adam Lally wrote:
> On 12/22/06, Marshall Schor <[EMAIL PROTECTED]> wrote:
<snip>
>> > Also, we have some uses of non-annotation indexes that are segregated
>> > by Sofa (say, a Lemma index that's particular to a Sofa, where there's
>> > actually no explicit link from the Lemma to the Sofa).  A filtering
>> > approach wouldn't work there,
>> It could be made to work by adding a feature to the Lemma type which was
>> a sofa reference.  But maybe that's asking too much of the user?
>
> I'm not sure what is right here... this is a reasonable idea.  But I
> think in the absence of a clear sense of what is best I lean towards
> staying closer to where we currently are, which is to have views
> where the user explicitly decides which view to index things in.

The whole point of those views, I thought, was to be able to segregate
the data.  So if you want lemmas for a certain view to be separate from
the lemmas for different views, you should be able to achieve that with
a lemma index that is specific to that view.


So you're agreeing with me, I think. (I'm the "> >" and the "> >> >" :)

If you want to share
lemmas from two views, share the index between the views.  That's my
mental model of how things should work.  I like this better than adding
sofa references for the following reasons:

a) more space efficient, as there are no extra sofa references
b) more time efficient, as you don't need to check the sofa references
at indexing time
c) no more complicated, as the user needs to reference something, the
view or the sofa.
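
For readers following along, the two designs compared in points (a) to (c) can be sketched roughly as below. This is only an illustration in plain Java: Lemma, PerViewIndexes and FilteredIndex are hypothetical stand-ins, not UIMA classes, and views and sofas are identified by plain strings purely for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Lemma feature structure (not a UIMA class).
final class Lemma {
    final String text;
    final String sofaId; // only needed by the filtering design; null otherwise

    Lemma(String text, String sofaId) {
        this.text = text;
        this.sofaId = sofaId;
    }
}

// Design A: each view owns its own index, so a Lemma needs no sofa reference.
final class PerViewIndexes {
    private final Map<String, List<Lemma>> byView = new HashMap<>();

    void add(String viewName, Lemma lemma) {
        byView.computeIfAbsent(viewName, k -> new ArrayList<>()).add(lemma);
    }

    // Sharing lemmas between two views means both names map to the same index.
    void share(String existingView, String otherView) {
        byView.put(otherView, byView.computeIfAbsent(existingView, k -> new ArrayList<>()));
    }

    List<Lemma> lemmas(String viewName) {
        return byView.getOrDefault(viewName, new ArrayList<>());
    }
}

// Design B: one global index; every Lemma carries a sofa reference that
// has to be checked whenever lemmas for a single sofa are wanted.
final class FilteredIndex {
    private final List<Lemma> all = new ArrayList<>();

    void add(Lemma lemma) {
        all.add(lemma);
    }

    List<Lemma> lemmas(String sofaId) {
        List<Lemma> out = new ArrayList<>();
        for (Lemma l : all) {
            if (sofaId.equals(l.sofaId)) {
                out.add(l);
            }
        }
        return out;
    }
}
```

In design A the user names a view when indexing; in design B the user names a sofa when creating the Lemma. Either way something has to be referenced, which is point (c).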

This is how I would have done annotations as well.  Maybe there are
considerations that I'm not aware of, but I see no benefit to each
annotation knowing what sofa it references.


Well, I think the main reason we did this is so that we could
implement Annotation.getCoveredText().
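
To make that concrete, here is a minimal sketch of why a covered-text accessor wants the sofa reference on the annotation itself. TextSofa and SpanAnnotation are hypothetical names for illustration only; this is not the actual UIMA implementation.

```java
// Hypothetical stand-in for a text Sofa (not a UIMA class).
final class TextSofa {
    final String id;
    final String text;

    TextSofa(String id, String text) {
        this.id = id;
        this.text = text;
    }
}

// Hypothetical annotation that keeps a reference to the sofa it annotates.
final class SpanAnnotation {
    final TextSofa sofa;
    final int begin; // inclusive character offset into sofa.text
    final int end;   // exclusive character offset into sofa.text

    SpanAnnotation(TextSofa sofa, int begin, int end) {
        this.sofa = sofa;
        this.begin = begin;
        this.end = end;
    }

    // Needs no view or index argument, because the annotation knows its text.
    String getCoveredText() {
        return sofa.text.substring(begin, end);
    }
}
```

Without the sofa field, the caller would have to supply the text (or the view) on every call.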

Also we have the use case where we're doing translation and we have
Annotations in the translated text that point back to corresponding
Annotations in the original text.  So if you're walking the Annotation
index in the translated text and follow references that get you to
another Annotation, how are you supposed to know which Sofa the
Annotation you're looking at is supposed to be annotating?
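
A rough sketch of that situation, again with made-up classes (DocText and LinkedAnnotation are assumptions for illustration, not UIMA types): the annotation on the translated text holds a reference to its counterpart on the original text, and the counterpart's own sofa reference is what tells you which text its offsets apply to.

```java
// Hypothetical stand-in for a Sofa holding one version of the document text.
final class DocText {
    final String name;
    final String text;

    DocText(String name, String text) {
        this.name = name;
        this.text = text;
    }
}

// Hypothetical annotation with a sofa reference and an optional link
// back to the corresponding annotation on another sofa.
final class LinkedAnnotation {
    final DocText sofa;
    final int begin;
    final int end;
    final LinkedAnnotation source; // null when there is no counterpart

    LinkedAnnotation(DocText sofa, int begin, int end, LinkedAnnotation source) {
        this.sofa = sofa;
        this.begin = begin;
        this.end = end;
        this.source = source;
    }

    // Offsets are resolved against this annotation's own sofa.
    String coveredText() {
        return sofa.text.substring(begin, end);
    }
}
```

If the annotation reached through the link had no sofa reference, its offsets would be ambiguous as soon as you left the index you started from.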

To me, just looking at this from a data modeling perspective, the
purpose of an Annotation is to indicate some span of text, so it makes
sense to model it with a reference to that text.  But I suppose other
interpretations are possible.

Of course that would make a view-less approach from the global CAS that
much harder...


Impossible, I think.  We need to answer the question: are views a
fundamental way of interacting with the CAS (*any* CAS implementation
now or in the future, including raw XML manipulation) or not?

The UIMA spec proposal says not, and there's at least one vocal
proponent of that approach (Dan Gruhl).  We of course could decide to
become vocal proponents of the other approach, but it's not just
ourselves we need to convince.

-Adam
