Thilo Goetz wrote:
> Marshall Schor wrote:
>   
>> Thilo Goetz wrote:
>>     
>>> Marshall Schor wrote:
>>>   
>>>       
>>>> Thilo Goetz wrote:
>>>>     
>>>>         
>>>>> Adam Lally wrote:
>>>>>   
>>>>>       
>>>>>           
>>>>>> On Mon, Aug 10, 2009 at 5:32 PM, Marshall Schor<m...@schor.com> wrote:
>>>>>>     
>>>>>>         
>>>>>>             
>>>>>>> Adam Lally wrote:
>>>>>>>       
>>>>>>>           
>>>>>>>               
>>>>>>>> On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann<kottm...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>         
>>>>>>>>             
>>>>>>>>                 
>>>>>>>>> Marshall Schor wrote:
>>>>>>>>>
>>>>>>>>>           
>>>>>>>>>               
>>>>>>>>>                   
>>>>>>>>>> The generification of FSIndex currently specifies one type, <T 
>>>>>>>>>> extends
>>>>>>>>>> FeatureStructure> that is the type of item being returned.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The contains and find methods have arguments of type 
>>>>>>>>>> FeatureStructure.
>>>>>>>>>> These could be changed to take type "T".
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>             
>>>>>>>>>>                 
>>>>>>>>>>                     
>>>>>>>>> No I do not think that they could be changed to take type T.
>>>>>>>>> Lets take the case of the contains method.
>>>>>>>>> The javadoc says:
>>>>>>>>> "Check if the index contains an element equal to the given feature 
>>>>>>>>> structure
>>>>>>>>> according to the
>>>>>>>>> ordering of the index. Note that this is in general not the same as 
>>>>>>>>> feature
>>>>>>>>> structure identity."
>>>>>>>>> and it for the param fs it says "The FS we're looking for.". There is 
>>>>>>>>> no
>>>>>>>>> place where
>>>>>>>>> it says that contains can only be called for FSes which have the type 
>>>>>>>>> of the
>>>>>>>>> index.
>>>>>>>>>
>>>>>>>>> The change of the parameter from FeatureStructure to T would also 
>>>>>>>>> change
>>>>>>>>> the contract of the method a little, because then it would not be 
>>>>>>>>> possible
>>>>>>>>> anymore
>>>>>>>>> to pass a FeatureStructure which has not type T.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>           
>>>>>>>>>               
>>>>>>>>>                   
>>>>>>>> I agree.  It's sometimes useful to call FSIterator.moveTo method and
>>>>>>>> pass an FS of a Type other than the one that the index was defined
>>>>>>>> over, as part of implementing something like a subiterator.
>>>>>>>>
>>>>>>>>         
>>>>>>>>             
>>>>>>>>                 
>>>>>>> I agree with the case where it's the "bag" index being used, because
>>>>>>> that uses a test which works on all feature structures.
>>>>>>>
>>>>>>> However, for Set and Sorted, the implication of passing a FS which is
>>>>>>> not in the type hierarchy is, according to the JavaDocs, "undefined".
>>>>>>> This is because the code assumes the layout of the features and their
>>>>>>> values is appropriate for the type.  In other words, if the type of some
>>>>>>> key was a string, it might take a value from the main int heap and use
>>>>>>> it as an index into the string array - and if the int heap object was
>>>>>>> not the right type, it could pull an arbitrary value from that slot, and
>>>>>>> end up throwing an array index out of bounds exception.  When I looked,
>>>>>>> it didn't appear to me that the code checked for any kind of type
>>>>>>> subsumption before proceeding... (but I may have missed it...)
>>>>>>>
>>>>>>> It could turn out that the data (whatever is being pulled) would just
>>>>>>> happen to "match", even though the types are different.
>>>>>>>
>>>>>>> Because this is an undefined operation that could throw various kinds of
>>>>>>> runtime errors, or return an equal match where none really exists, I
>>>>>>> think it should not be allowed, for set and sorted indexes.
>>>>>>>
>>>>>>>       
>>>>>>>           
>>>>>>>               
>>>>>> In the case of the AnnotationIndex, the object that you pass to
>>>>>> FSIterator.moveTo must be a subtype of Annotation (else you would get
>>>>>> all the weird effects that you describe).  But it is still valid for a
>>>>>> user to do:
>>>>>> FSIndex<AnnotType1> index = cas.getAnnotationIndex(annotType1);
>>>>>> index.moveTo(annotType2);
>>>>>>
>>>>>> where annotType1 and annotType2 are both subtypes of 
>>>>>> uima.tcas.Annotation.
>>>>>>
>>>>>> In general, the object that you pass to moveTo() must be a subtype of
>>>>>> the type that was in the index definition (in the user's descriptor,
>>>>>> or for the case of the built-in AnnotationIndex,
>>>>>> uima.tcas.Annotation).
>>>>>>
>>>>>>   -Adam
>>>>>>     
>>>>>>         
>>>>>>             
>>>>> One concrete example of Adam's point: suppose you have a sentenceFS
>>>>> and a tokenIterator.  Then tokenIterator.moveTo(sentenceFS) will
>>>>> position the token iterator at the first word in the sentence (modulo
>>>>> some subtleties that are beside the point here).  Very useful.
>>>>>   
>>>>>       
>>>>>           
>>>> Yes.  This works because the "index" being iterated over is the general
>>>> annotation index (the one that's built-it) - and the presumeption is
>>>> that "token" and "sentenceFS" are both subtypes of AnnotationFS.
>>>>
>>>> Is there any reason *not* to add a check to the various methods in
>>>> indexing that take one or more Feature Structures, to see if they are
>>>> being passed a subtype of the type being indexed (except for bag indexes)?
>>>>     
>>>>         
>>> Sure.  Suppose B < A, and we have an index on B, but the ordering
>>> relation is defined in terms of features that B inherits from
>>>            A.
>>> Then I can use an A to position iterators on the B index.  This
>>> works now, and I don't see why we should prohibit it.
>>>   
>>>       
>> Right, I knew I was missing some detail.  So the test for validity is
>> not a simple subsumes test, but rather a check to see if the feature
>> structure slots used in the keys used for testing by the index are in
>> the same position in the feature structure layout as the type being
>> passed in.  Correct? 
>>     
>
> Essentially, yes.
>
>   
>> If so, is there any reason not to add this as a check?
>>     
>
> No reason other than that the relevant information is
> thrown away after the index has been generated, and it
> would be a medium size effort to implement.  I haven't
> looked at this in a while, but the distributed nature
> of the index implementation (one index object for every
> type subsumed by the index type) is not going to make
> this any easier.  Or maybe they share that part, I don't
> recall.
>   

OK - sounds like this won't be on the fix list for 2.3.0 ;-) 

And, given the value of being able to do this in selected cases, it
doesn't sound right, either, to change the arguments to find, moveTo,
etc from FeatureStructure.
So let's leave this as is. 

Thanks for the detailed, clarifying, discussions :-) .

-Marshall

Reply via email to