More later (time to take stock of where we are again, I think...) but first:
On 12/27/06, Thilo Goetz <[EMAIL PROTECTED]> wrote:
Adam Lally wrote:
> On 12/22/06, Marshall Schor <[EMAIL PROTECTED]> wrote:
> <snip>
>> > Also, we have some uses of non-annotation indexes that are segregated
>> > by Sofa (say, a Lemma index that's particular to a Sofa, where there's
>> > actually no explicit link from the Lemma to the Sofa). A filtering
>> > approach wouldn't work there,
>> It could be made to work by adding a feature to the Lemma type which was
>> a sofa reference. But maybe that's asking too much of the user?
>
> I'm not sure what is right here... this is a reasonable idea. But I
> think in the absence of a clear sense of what is best I lean towards
> staying closer to where we currently are, which is to have views
> where the user explicitly decides which view to index things in.

The whole point of those views, I thought, was to be able to segregate the data. So if you want lemmas for a certain view to be separate from the lemmas for different views, you should be able to achieve that with a lemma index that is specific to that view.
So you're agreeing with me, I think. (I'm the "> >" and the "> >> >" :)
If you want to share lemmas from two views, share the index between the views. That's my mental model of how things should work. I like this better than adding sofa references for the following reasons:
a) more space efficient, as there are no extra sofa references
b) more time efficient, as you don't need to check the sofa references at indexing time
c) no more complicated, as the user needs to reference something either way, the view or the sofa.

This is how I would have done annotations as well. Maybe there are considerations that I'm not aware of, but I see no benefit to each annotation knowing what sofa it references.
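To make the two options above concrete, here is a minimal toy sketch in Python. It is not the UIMA API: `Lemma`, `View`, `lemma_index`, and `lemmas_for` are hypothetical names invented for illustration, and a plain list stands in for an index. It only contrasts a per-view index with a single index filtered by a sofa reference.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Lemma:
    text: str
    # Hypothetical sofa reference; only the filtering approach needs it.
    sofa_id: Optional[str] = None


@dataclass
class View:
    """Hypothetical view that owns (or shares) a lemma index."""
    sofa_id: str
    lemma_index: list = field(default_factory=list)


# Approach 1: one index per view. The user picks the view when indexing,
# so the lemma itself carries no link to the sofa.
english = View("english")
german = View("german")
english.lemma_index.append(Lemma("run"))
german.lemma_index.append(Lemma("laufen"))

# Sharing lemmas between two views means sharing the same index object.
shared = []
view_a = View("a", shared)
view_b = View("b", shared)
view_a.lemma_index.append(Lemma("common"))

# Approach 2: one global index. Each lemma stores a sofa reference,
# and per-sofa results are obtained by filtering on it.
global_index = [Lemma("run", "english"), Lemma("laufen", "german")]


def lemmas_for(sofa_id):
    """Return the lemma texts whose sofa reference matches sofa_id."""
    return [lemma.text for lemma in global_index if lemma.sofa_id == sofa_id]


per_view = [lemma.text for lemma in english.lemma_index]
filtered = lemmas_for("english")
```

Both approaches return the same lemmas for the English sofa; the difference is where the association lives (in the choice of index versus in a field on every lemma).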
Well, I think the main reason we did this is so that we could implement Annotation.getCoveredText(). Also we have the use case where we're doing translation and we have Annotations in the translated text that point back to corresponding Annotations in the original text. So if you're walking the Annotation index in the translated text and follow references that get you to another Annotation, how are you supposed to know which Sofa the Annotation you're looking at is supposed to be annotating? To me, just looking at this from a data modeling perspective, the purpose of an Annotation is to indicate some span of text, so it makes sense to model it with a reference to that text. But I suppose other interpretations are possible.
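A toy Python sketch of that argument, under loud assumptions: `Sofa`, `Annotation`, `source`, and `get_covered_text` here are hypothetical stand-ins written for this illustration, not the UIMA classes. It shows why a covered-text lookup needs the annotation to know its sofa, and why following a reference from the translated text into the original text only works if the target annotation carries that information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sofa:
    sofa_id: str
    text: str


@dataclass
class Annotation:
    sofa: Sofa          # the text this annotation is a span of
    begin: int
    end: int
    # Hypothetical link back to the corresponding annotation in another sofa.
    source: Optional["Annotation"] = None

    def get_covered_text(self):
        """Slice the span out of the sofa this annotation refers to."""
        return self.sofa.text[self.begin:self.end]


original = Sofa("original", "Das Haus ist alt")
translated = Sofa("translated", "The house is old")

haus = Annotation(original, 4, 8)
house = Annotation(translated, 4, 9, source=haus)

# Walking the translated text and following the reference lands on an
# annotation of a different sofa; its own sofa field resolves the span.
covered = house.get_covered_text()
covered_source = house.source.get_covered_text()
```

Without the `sofa` field, the offsets 4 and 8 on `haus` would be ambiguous once it is reached through `house.source`, since nothing would say which text they index into.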
Of course that would make a view-less approach from the global CAS that much harder...
Impossible, I think. We need to answer the question: are views a fundamental way of interacting with the CAS (*any* CAS implementation now or in the future, including raw XML manipulation) or not? The UIMA spec proposal says not, and there's at least one vocal proponent of that approach (Dan Gruhl). We of course could decide to become vocal proponents of the other approach, but it's not just ourselves we need to convince.

-Adam