Thilo Goetz wrote:
> Marshall Schor wrote:
>   
>> Thilo Goetz wrote:
>>     
>>> See the Jira issue for the cause of the problem.  More
>>> comments below.
>>>
>>> Marshall Schor wrote:
>>>   
>>>       
>>>> So, there may be 2 things to look at here - the actual error, described
>>>> above, and the more philosophical question on the behavior of moveTo -
>>>> this seems to require a sorting order if the item "moved to" is not
>>>> present in the index.  Perhaps this needs to be documented better.  And
>>>>     
>>>>         
>>> I'm not sure I understand your point about moveTo().  It requires the
>>> index to be sorted to make any sense (and the BagIndex moveTo() is broken,
>>> but that's a different issue
>>>       
>> Will you be fixing this too?
>>     
>
> We enter the realm of philosophy again.  What's the right
> behavior for moveTo() when the underlying index isn't sorted?
> In particular, what should happen when no proper element
> is found?  The javadocs say:
>
> Note that any operation like find() or FSIterator.moveTo() will not produce
> useful results on bag indexes, since bag indexes do not honor comparators.
> Only use a bag index if you want very fast adding and will have to iterate
> over the whole index anyway.
>   
I like systems where user errors are reported :-).  If find() and
moveTo() don't work on bag indexes, I would prefer they throw an
exception, perhaps like UnsupportedOperationException or our equivalent
in UIMA.
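
To make the suggestion concrete, here is a rough sketch in plain Java.
BagIndexSketch and its methods are made up for illustration; this is not
the UIMA bag index implementation, only the fail-fast behavior I have in
mind.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an unsorted (bag) index; NOT the UIMA API.
final class BagIndexSketch<T> {
    private final List<T> items = new ArrayList<>();

    // Adding is cheap because no ordering is maintained.
    void add(T item) {
        items.add(item);
    }

    int size() {
        return items.size();
    }

    // A bag has no ordering relation, so there is no meaningful position
    // to move to; report the misuse instead of returning something arbitrary.
    void moveTo(T item) {
        throw new UnsupportedOperationException(
            "moveTo() is not supported on bag indexes");
    }

    // Same reasoning for find(): without a comparator there is nothing to match on.
    T find(T template) {
        throw new UnsupportedOperationException(
            "find() is not supported on bag indexes");
    }
}
```

A caller who picks the wrong index kind would then see the error on the
first call, rather than silently getting a useless iterator position.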
>   
>>> ).  moveTo(fs) will position the iterator such
>>> that any element "to the left" is smaller than fs, and all elements at the
>>> moved-to position and "to the right" of it are greater than or equal to
>>> fs.  It doesn't matter if the item "moved to" is in the index or not.
>>> Remember that equality here is defined with respect to the sort order of
>>> the index, it is not feature structure identity.  
>>>       
>> Yes, this is something that is unexpected (to me), and I did forget this. 
>>     
>>> All this is documented,
>>> but maybe not as clearly as it could be.
>>>
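
For the record, the moveTo(fs) positioning described above can be sketched
as a lower-bound search over a sorted list. This is plain Java, not the
UIMA implementation; MoveToSketch, the int[] {begin, end} stand-in for a
feature structure, and the "sort by begin only" comparator are all
assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of the moveTo() contract; NOT the UIMA API.
final class MoveToSketch {

    // Returns the first position whose element is >= target under cmp,
    // or sorted.size() if every element is smaller than target.
    static <T> int moveTo(List<T> sorted, T target, Comparator<? super T> cmp) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cmp.compare(sorted.get(mid), target) < 0) {
                lo = mid + 1;   // element is smaller: it stays "to the left"
            } else {
                hi = mid;       // element is >= target: candidate position
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Stand-ins for feature structures: {begin, end}, sorted by begin only.
        List<int[]> index = Arrays.asList(
            new int[] {0, 5}, new int[] {10, 12},
            new int[] {10, 20}, new int[] {30, 31});
        Comparator<int[]> byBegin = Comparator.comparingInt(fs -> fs[0]);

        // {10, 99} is not in the index, but it is "equal" to the elements
        // with begin == 10 as far as this index's sort order is concerned.
        System.out.println(moveTo(index, new int[] {10, 99}, byBegin)); // 1
        System.out.println(moveTo(index, new int[] {7, 8}, byBegin));   // 1
        System.out.println(moveTo(index, new int[] {40, 41}, byBegin)); // 4
    }
}
```

The first call is the case that surprised me: the argument is not in the
index at all, yet the iterator lands on the first element that compares as
equal under the index's sort order.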
>>>   
>>>       
>>>> what if no sorting order was defined for the set index?
>>>>     
>>>>         
>>> Every set index has a sort order.  
>>>       
>> This is the part that seems confusing, because our docs say that set
>> indexes do not enforce ordering, and the common definition for Sets does
>>     
>
> Where did you find that? The javadocs say that set indexes are
> not guaranteed to be sorted.  That's different from saying there's
> no ordering relation on the members.  How else would we determine
> equality?
>   
Just by testing the key values for equality, not for order.
> Maybe we should remove this text, because at this time, set indexes
> are sorted, and that's not likely to change (I was thinking of hash
> based sets when I wrote that; still, you'll need a notion of equality,
> no matter how you implement your sets, yet they don't need to be
> sorted).
>
>   
>> not have an ordering concept.  Yet our docs say that the sort order for
>> sets is used to determine "equality" among candidates in the set:  from
>> section 2.4.1.7:
>>
>> An index may define one or more /keys/. These keys determine the sort
>> order of the feature structures within a sorted index, and determine
>> equality for set indexes.
>>     
>
> That is incorrect.  It should say "0 or more keys".  Though whether we
> should alert users to this fact, when even UIMA developers have trouble
> with it, is doubtful.
>
>   
I think some of our users could be better at remembering these details
than I am :-)  I think this should be fixed - it's just a typo IMHO.
>> Perhaps this should also say something about the use of the sort order
>> in "moveTo(fs)" for sets?
>>     
>
> In our current implementation, set indexes are sorted indexes
> without the duplicates (duplicates with respect to the ordering
> relation of that index, of course).  If we commit to this and
> stop waffling about how set indexes may not be sorted, then we
> can just say that sorted and set indexes behave the same way.
>   
My preference is to keep the original definitions - leaving (perhaps
unrealistically small) room for alternative implementations in the future.
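
As a rough illustration of why I'd like to keep that room: the two
approaches below give the same set membership, but only the first one
needs an ordering on the keys. This is plain Java with made-up names
(SetIndexSketch, a String[] {key, payload} stand-in for a feature
structure); it is not how UIMA stores its indexes.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical comparison of two ways to implement a set index; NOT the UIMA API.
final class SetIndexSketch {

    // Sorted variant: two entries are duplicates if they compare as 0
    // under the key comparator. Iteration order follows the keys.
    static int sizeWithSortedSet(String[][] fss) {
        TreeSet<String[]> index =
            new TreeSet<>(Comparator.comparing((String[] fs) -> fs[0]));
        for (String[] fs : fss) {
            index.add(fs);   // ignored if an entry with an equal key exists
        }
        return index.size();
    }

    // Hash variant: only key equality is tested; no order is maintained,
    // so a moveTo()-style positioning would have nothing to work with.
    static int sizeWithHashSet(String[][] fss) {
        Map<String, String[]> index = new HashMap<>();
        for (String[] fs : fss) {
            index.putIfAbsent(fs[0], fs);   // keeps the first entry per key
        }
        return index.size();
    }

    public static void main(String[] args) {
        String[][] fss = { {"a", "1"}, {"b", "2"}, {"a", "3"} };
        System.out.println(sizeWithSortedSet(fss)); // 2
        System.out.println(sizeWithHashSet(fss));   // 2
    }
}
```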

-Marshall
>   
>>> If that sort order is empty, it means
>>> that all FSs are equal for that index.  That in turn means that this
>>> index will contain at most 1 FS at any time.  It also means that moveTo()
>>> will always position the iterator at that one element, if it exists.
>>>
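
The empty-sort-order case can be reproduced with an ordinary sorted set
whose comparator treats everything as equal. Again plain Java with made-up
names (EmptyKeySetSketch), standing in for a set index with zero keys; it
is not UIMA code.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical model of a set index with an empty sort order; NOT the UIMA API.
final class EmptyKeySetSketch {

    // With no keys to compare, every pair of elements compares as equal.
    static TreeSet<String> newIndex() {
        Comparator<String> allEqual = (a, b) -> 0;
        return new TreeSet<>(allEqual);
    }

    public static void main(String[] args) {
        TreeSet<String> index = newIndex();
        index.add("first");
        index.add("second");   // rejected: "equal" to the element already present
        index.add("third");    // rejected for the same reason

        System.out.println(index.size());               // 1
        // Any moveTo()-style lookup lands on that single element.
        System.out.println(index.ceiling("anything"));  // first
    }
}
```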
>>> Did that help at all?
>>>   
>>>       
>> Yes, thanks for the clarifications.
>>
>> -Marshall
>>     
>>> --Thilo
>>>
>>>
>>>
>>>   
>>>       
>
>
>   
