Marshall Schor wrote:
> 
> Thilo Goetz wrote:
>> Marshall Schor wrote:
>>   
>>> Thilo Goetz wrote:
>>>     
>>>> See the Jira issue for the cause of the problem.  More
>>>> comments below.
>>>>
>>>> Marshall Schor wrote:
>>>>   
>>>>       
>>>>> So, there may be 2 things to look at here - the actual error, described
>>>>> above, and the more philosophical question on the behavior of moveTo -
>>>>> this seems to require a sorting order if the item "moved to" is not
>>>>> present in the index.  Perhaps this needs to be documented better.  And
>>>>>     
>>>>>         
>>>> I'm not sure I understand your point about moveTo().  It requires the
>>>> index to be sorted to make any sense (and the BagIndex moveTo() is broken,
>>>> but that's a different issue
>>>>       
>>> Will you be fixing this too?
>>>     
>> We enter the realm of philosophy again.  What's the right
>> behavior for moveTo() when the underlying index isn't sorted?
>> In particular, what should happen when no proper element
>> is found?  The javadocs say:
>>
>> Note that any operation like find() or FSIterator.moveTo() will not produce
>> useful results on bag indexes, since bag indexes do not honor comparators.
>> Only use a bag index if you want very fast adding and will have to iterate
>> over the whole index anyway.
>>   
> I like systems where user errors are reported :-).  If find() and
> moveTo() don't work on bag indexes, I would prefer they throw an
> exception, perhaps like UnsupportedOperationException or our equivalent
> in UIMA.

Fine with me.
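
For illustration, a minimal sketch of what that could look like. The
class BagIteratorSketch is a hypothetical stand-in, not the actual UIMA
bag index or iterator, and the exception type is just the one suggested
above:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a bag index; NOT the UIMA API.
class BagIteratorSketch<T> {
    private final List<T> items = new ArrayList<>();

    // Adding stays cheap: no comparator is consulted.
    void add(T item) {
        items.add(item);
    }

    int size() {
        return items.size();
    }

    // A bag has no ordering relation, so there is no meaningful
    // position to move to. Report the misuse instead of guessing.
    void moveTo(T item) {
        throw new UnsupportedOperationException(
                "moveTo() is not supported on bag indexes");
    }

    public static void main(String[] args) {
        BagIteratorSketch<String> bag = new BagIteratorSketch<>();
        bag.add("a");
        try {
            bag.moveTo("a");
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```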

>>   
>>>> ).  moveTo(fs) will position the iterator such
>>>> that any element "to the left" is smaller than fs, and all elements at the
>>>> moved-to position and "to the right" of it are greater than or equal to
>>>> fs.  It doesn't matter if the item "moved to" is in the index or not.
>>>> Remember that equality here is defined with respect to the sort order of
>>>> the index, it is not feature structure identity.  
>>>>       
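
The positioning rule quoted above can be sketched as a lower-bound
search over a sorted list.  This is only an illustration: MoveToSketch
and its moveTo() helper are hypothetical names, a plain List stands in
for the index, and returning the list size when every element is
smaller is this sketch's own convention, not a statement about what
the UIMA iterator does in that case.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of the moveTo(fs) positioning rule.
class MoveToSketch {

    // Returns the first position whose element is >= key under cmp.
    // Everything to the left of that position is smaller than key.
    // Returns sorted.size() if every element is smaller than key.
    static <T> int moveTo(List<T> sorted, T key, Comparator<? super T> cmp) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cmp.compare(sorted.get(mid), key) < 0) {
                lo = mid + 1;   // element at mid is smaller: go right
            } else {
                hi = mid;       // element at mid is >= key: keep it in range
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        List<Integer> index = Arrays.asList(10, 20, 20, 40);
        Comparator<Integer> cmp = Comparator.naturalOrder();
        System.out.println(moveTo(index, 20, cmp)); // 1: first of the "equal" elements
        System.out.println(moveTo(index, 30, cmp)); // 3: 30 is absent, lands on 40
        System.out.println(moveTo(index, 50, cmp)); // 4: past the last element
    }
}
```

Note that the key 30 is not in the list at all, and the search still
yields a well-defined position.
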
>>> Yes, this is something that is unexpected (to me), and I did forget this. 
>>>     
>>>> All this is documented,
>>>> but maybe not as clearly as it could be.
>>>>
>>>>   
>>>>       
>>>>> what if no sorting order was defined for the set index?
>>>>>     
>>>>>         
>>>> Every set index has a sort order.  
>>>>       
>>> This is the part that seems confusing, because our docs say that set
>>> indexes do not enforce ordering, and the common definition for Sets does
>>>     
>> Where did you find that? The javadocs say that set indexes are
>> not guaranteed to be sorted.  That's different from saying there's
>> no ordering relation on the members.  How else would we determine
>> equality?
>>   
> Just by testing the key values for equality, not for order.

Equality here is a notion derived from the partial order
defined on the index.  You could define equality separately,
but that would mean introducing a new notion into the index
definitions.  I don't think we want that, or at least I don't.
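
To make that concrete, here is a small sketch using java.util.TreeSet,
which also derives set membership from its comparator rather than from
object identity.  The Fs class and the single "begin" key are
hypothetical, chosen only for the example; this is not how UIMA set
indexes are implemented.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical illustration: equality derived from the ordering relation.
class SetEqualitySketch {

    // Stand-in for a feature structure with one key and one non-key feature.
    static final class Fs {
        final int begin;
        final String label;

        Fs(int begin, String label) {
            this.begin = begin;
            this.label = label;
        }
    }

    // The index definition: a single key, "begin".
    static final Comparator<Fs> BY_BEGIN =
            Comparator.comparingInt((Fs fs) -> fs.begin);

    // Two elements are "equal" for the index iff the comparator returns 0.
    static boolean equalForIndex(Fs a, Fs b) {
        return BY_BEGIN.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        Fs first = new Fs(5, "first");
        Fs second = new Fs(5, "second");   // distinct object, same key value

        TreeSet<Fs> set = new TreeSet<>(BY_BEGIN);
        System.out.println(set.add(first));               // true
        System.out.println(set.add(second));              // false: a duplicate for this index
        System.out.println(equalForIndex(first, second)); // true
        System.out.println(first == second);              // false: not the same object
    }
}
```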

>> Maybe we should remove this text, because at this time, set indexes
>> are sorted, and that's not likely to change (I was thinking of hash
>> based sets when I wrote that; still, you'll need a notion of equality,
>> no matter how you implement your sets, yet they don't need to be
>> sorted).
>>
>>   
>>> not have an ordering concept.  Yet our docs say that the sort order for
>>> sets is used to determine "equality" among candidates in the set:  from
>>> section 2.4.1.7:
>>>
>>> An index may define one or more /keys/. These keys determine the sort
>>> order of the feature structures within a sorted index, and determine
>>> equality for set indexes.
>>>     
>> That is incorrect.  It should say "0 or more keys".  Though whether we
>> should alert users to this fact, when even UIMA developers have trouble
>> with it, is doubtful.
>>
>>   
> I think some of our users could be better at remembering these details
> than I am :-)  I think this should be fixed - it's just a typo IMHO.
>>> Perhaps this should also say something about the use of the sort order
>>> in "moveTo(fs)" for sets?
>>>     
>> In our current implementation, set indexes are sorted indexes
>> without the duplicates (duplicates with respect to the ordering
>> relation of that index, of course).  If we commit to this and
>> stop waffling about how set indexes may not be sorted, then we
>> can just say that sorted and set indexes behave the same way.
>>   
> My preference is to keep the original definitions - leaving (perhaps
> unrealistically small) room for alternative implementations in the future.

Sure, but how do you propose we improve the documentation, then?

> 
> -Marshall
>>   
>>>> If that sort order is empty, it means
>>>> that all FSs are equal for that index.  That in turn means that this
>>>> index will contain at most 1 FS at any time.  It also means that moveTo()
>>>> will always position the iterator at that one element, if it exists.
>>>>
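
The empty-sort-order case can be sketched the same way.  A comparator
with no keys treats every pair as equal, so a set built on it holds at
most one element, and a lookup for any value lands on that element.
EmptyKeySketch and noKeys() are hypothetical names, and TreeSet is
only an analogy for the set index here.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical illustration of a set index with zero keys.
class EmptyKeySketch {

    // With no keys to compare, every pair of elements is "equal".
    static <T> Comparator<T> noKeys() {
        return (a, b) -> 0;
    }

    public static void main(String[] args) {
        Comparator<String> cmp = noKeys();
        TreeSet<String> index = new TreeSet<>(cmp);

        System.out.println(index.add("a"));    // true
        System.out.println(index.add("b"));    // false: "equal" to the element already there
        System.out.println(index.size());      // 1

        // Looking up any value finds the one element, if it exists.
        System.out.println(index.ceiling("anything")); // a
    }
}
```
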
>>>> Did that help at all?
>>>>   
>>>>       
>>> Yes, thanks for the clarifications.
>>>
>>> -Marshall
>>>     
>>>> --Thilo
>>>>
>>>>
>>>>
>>>>   
>>>>       
>>
>>   
