Marshall Schor wrote:
> 
> Thilo Goetz wrote:
>> See the Jira issue for the cause of the problem.  More
>> comments below.
>>
>> Marshall Schor wrote:
>>   
>>> So, there may be two things to look at here: the actual error, described
>>> above, and the more philosophical question of how moveTo() behaves, which
>>> seems to require a sorting order if the item "moved to" is not
>>> present in the index.  Perhaps this needs to be documented better.  And
>>>     
>> I'm not sure I understand your point about moveTo().  It requires the
>> index to be sorted to make any sense (and the BagIndex moveTo() is broken,
>> but that's a different issue
> Will you be fixing this too?

We enter the realm of philosophy again.  What's the right
behavior for moveTo() when the underlying index isn't sorted?
In particular, what should happen when no proper element
is found?  The javadocs say:

Note that any operation like find() or FSIterator.moveTo() will not produce
useful results on bag indexes, since bag indexes do not honor comparators. Only
use a bag index if you want very fast adding and will have to iterate over the
whole index anyway.

>> ).  moveTo(fs) will position the iterator such
>> that any element "to the left" is smaller than fs, and all elements at the
>> moved-to position and "to the right" of it are greater than or equal to
>> fs.  It doesn't matter if the item "moved to" is in the index or not.
>> Remember that equality here is defined with respect to the sort order of
>> the index; it is not feature structure identity.
> Yes, this is something that is unexpected (to me), and I had forgotten it.
>> All this is documented,
>> but maybe not as clearly as it could be.
>>
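For readers who find the moveTo(fs) contract easier to see in code, here is a minimal Java sketch of it as a lower-bound binary search over a list that is already sorted by the index's comparator. It is not the UIMA API: MoveToSketch, Token and BY_BEGIN are hypothetical names, the "sort key" is just a begin offset, and returning size() to mean "no such element" is an assumption of the sketch. The probe is not in the index and is not identical to any member; only its key decides where the iterator lands.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical model of a sorted index; not the UIMA API.
 * "Feature structures" are plain Token objects, and the index's
 * sort order is the comparator passed in.
 */
public class MoveToSketch {

    /** Stand-in for a feature structure; 'id' distinguishes identity. */
    static final class Token {
        final int begin;
        final String id;
        Token(int begin, String id) { this.begin = begin; this.id = id; }
    }

    /** The index's (hypothetical) sort key: ascending begin offset. */
    static final Comparator<Token> BY_BEGIN = Comparator.comparingInt(t -> t.begin);

    /**
     * Returns the position p such that every element before p compares
     * less than key and every element at or after p compares greater
     * than or equal to key (a "lower bound"). Assumes 'sorted' is
     * ordered by 'cmp'. p == sorted.size() means "past the end".
     */
    static <T> int moveTo(List<T> sorted, T key, Comparator<? super T> cmp) {
        int lo = 0, hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cmp.compare(sorted.get(mid), key) < 0) {
                lo = mid + 1;   // mid is strictly smaller: answer lies to the right
            } else {
                hi = mid;       // mid is >= key: answer is here or to the left
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        List<Token> index = new ArrayList<>();
        index.add(new Token(0, "a"));
        index.add(new Token(5, "b"));
        index.add(new Token(5, "c"));
        index.add(new Token(9, "d"));

        // The probe is not in the index; only its sort key matters.
        Token probe = new Token(5, "probe");
        int pos = moveTo(index, probe, BY_BEGIN);
        System.out.println(pos + " " + index.get(pos).id); // prints "1 b"

        // A key larger than every element moves past the end.
        System.out.println(moveTo(index, new Token(42, "x"), BY_BEGIN)); // prints 4
    }
}
```

The two elements with begin 5 are "equal" for this index even though they are distinct objects, which is why the probe lands on the first of them.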
>>   
>>> what if no sorting order was defined for the set index?
>>>     
>> Every set index has a sort order.  
> This is the part that seems confusing, because our docs say that set
> indexes do not enforce ordering, and the common definition for sets does

Where did you find that?  The javadocs say that set indexes are
not guaranteed to be sorted.  That's different from saying there's
no ordering relation on the members.  How else would we determine
equality?

Maybe we should remove this text, because at this time set indexes
are sorted, and that's not likely to change.  (I was thinking of
hash-based sets when I wrote that; still, you need a notion of
equality no matter how you implement your sets, even though they
don't need to be sorted.)

> not have an ordering concept.  Yet our docs say that the sort order for
> sets is used to determine "equality" among candidates in the set.  From
> section 2.4.1.7:
> 
> An index may define one or more /keys/. These keys determine the sort
> order of the feature structures within a sorted index, and determine
> equality for set indexes.

That is incorrect.  It should say "0 or more keys".  Though it is
doubtful whether we should alert users to this fact, if even UIMA
developers have trouble with it.

> 
> Perhaps this should also say something about the use of the sort order
> in "moveTo(fs)" for sets?

In our current implementation, set indexes are sorted indexes
without the duplicates (duplicates with respect to the ordering
relation of that index, of course).  If we commit to this and
stop waffling about how set indexes may not be sorted, then we
can just say that sorted and set indexes behave the same way.
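To make the "sorted index without duplicates" reading concrete, here is a hypothetical Java sketch (again not UIMA code; SetIndexSketch, Token and BY_BEGIN are made-up names) that uses java.util.TreeSet with a comparator as the set index. TreeSet decides membership by compare(a, b) == 0 rather than by equals(), which mirrors the point that set-index equality comes from the ordering relation. Its ceiling(key) method returns the least element greater than or equal to the key, i.e. the element a moveTo(fs) on the corresponding sorted index would land on.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical model, not the UIMA API: a "set index" as a sorted
 * index that drops elements comparing equal (compare == 0) to one
 * already present. Identity (the id field) plays no role.
 */
public class SetIndexSketch {

    static final class Token {
        final int begin;
        final String id;
        Token(int begin, String id) { this.begin = begin; this.id = id; }
        @Override public String toString() { return id + "@" + begin; }
    }

    static final Comparator<Token> BY_BEGIN = Comparator.comparingInt(t -> t.begin);

    /** Sorted index: keeps every element, including key-duplicates. */
    static List<Token> sortedIndex(List<Token> fss) {
        List<Token> copy = new ArrayList<>(fss);
        copy.sort(BY_BEGIN); // List.sort is stable, so key-duplicates keep insertion order
        return copy;
    }

    /** Set index: TreeSet treats compare(a, b) == 0 as "already present". */
    static TreeSet<Token> setIndex(List<Token> fss) {
        TreeSet<Token> set = new TreeSet<>(BY_BEGIN);
        set.addAll(fss); // later key-duplicates are silently rejected
        return set;
    }

    public static void main(String[] args) {
        List<Token> fss = List.of(
            new Token(5, "b"), new Token(0, "a"), new Token(5, "c"), new Token(9, "d"));

        System.out.println(sortedIndex(fss)); // prints [a@0, b@5, c@5, d@9]
        System.out.println(setIndex(fss));    // prints [a@0, b@5, d@9]

        // moveTo-style lookup on the set: least element >= probe.
        Token probe = new Token(6, "probe");
        System.out.println(setIndex(fss).ceiling(probe)); // prints d@9
    }
}
```

Which of two key-equal FSs the set keeps (here, the first one added) is a property of TreeSet, not a claim about UIMA's set index; the point is only that the same ordering relation drives both sorting and duplicate detection.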

> 
>> If that sort order is empty, it means
>> that all FSs are equal for that index.  That in turn means that this
>> index will contain at most 1 FS at any time.  It also means that moveTo()
>> will always position the iterator at that one element, if it exists.
>>
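The empty-key case can be sketched the same way, under the same hypothetical TreeSet model (EmptyKeySketch, Token and NO_KEYS are illustrative names, not UIMA API): with no keys the comparator returns 0 for every pair, so the set index accepts only its first element, and every ceiling() lookup, whatever the probe, returns that element.

```java
import java.util.Comparator;
import java.util.TreeSet;

/**
 * Hypothetical model, not the UIMA API: a set index whose sort order
 * has no keys. With no keys, the comparator reports every pair as equal.
 */
public class EmptyKeySketch {

    static final class Token {
        final int begin;
        final String id;
        Token(int begin, String id) { this.begin = begin; this.id = id; }
    }

    /** No keys: every pair of elements compares equal. */
    static final Comparator<Token> NO_KEYS = (a, b) -> 0;

    public static void main(String[] args) {
        TreeSet<Token> set = new TreeSet<>(NO_KEYS);
        System.out.println(set.add(new Token(0, "a")));  // prints true: index was empty
        System.out.println(set.add(new Token(7, "b")));  // prints false: "equal" to a
        System.out.println(set.size());                  // prints 1

        // moveTo-style lookup: whatever the probe, it lands on the single element.
        System.out.println(set.ceiling(new Token(99, "probe")).id); // prints a
    }
}
```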
>> Did that help at all?
>>   
> Yes, thanks for the clarifications.
> 
> -Marshall
>> --Thilo
>>
>>
>>
>>   
