Marshall Schor wrote:
>
> Thilo Goetz wrote:
>> See the Jira issue for the cause of the problem. More
>> comments below.
>>
>> Marshall Schor wrote:
>>
>>> So, there may be 2 things to look at here - the actual error, described
>>> above, and the more philosophical question on the behavior of moveTo -
>>> this seems to require a sorting order if the item "moved to" is not
>>> present in the index. Perhaps this needs to be documented better. And
>>>
>> I'm not sure I understand your point about moveTo(). It requires the
>> index to be sorted to make any sense (and the BagIndex moveTo() is broken,
>> but that's a different issue
> Will you be fixing this too?

We enter the realm of philosophy again. What's the right
behavior for moveTo() when the underlying index isn't sorted?
In particular, what should happen when no proper element is
found? The javadocs say:

  Note that any operation like find() or FSIterator.moveTo() will
  not produce useful results on bag indexes, since bag indexes do
  not honor comparators. Only use a bag index if you want very fast
  adding and will have to iterate over the whole index anyway.

>> ). moveTo(fs) will position the iterator such
>> that any element "to the left" is smaller than fs, and all elements at the
>> moved-to position and "to the right" of it are greater than or equal to
>> fs. It doesn't matter if the item "moved to" is in the index or not.
>> Remember that equality here is defined with respect to the sort order of
>> the index, it is not feature structure identity.
> Yes, this is something that is unexpected (to me), and I did forget this.
>> All this is documented,
>> but maybe not as clearly as it could be.
>>
>>
>>> what if no sorting order was defined for the set index?
>>>
>> Every set index has a sort order.
> This is the part that seems confusing, because our docs say that set
> indexes do not enforce ordering, and the common definition for Sets does

Where did you find that? The javadocs say that set indexes are not
guaranteed to be sorted. That's different from saying there's no
ordering relation on the members. How else would we determine equality?
Maybe we should remove this text, because at this time, set indexes are
sorted, and that's not likely to change (I was thinking of hash based
sets when I wrote that; still, you'll need a notion of equality, no
matter how you implement your sets, yet they don't need to be sorted).

> not have an ordering concept. Yet our docs say that the sort order for
> sets is used to determine "equality" among candidates in the set: from
> section 2.4.1.7:
>
> An index may define one or more /keys/.
> These keys determine the sort
> order of the feature structures within a sorted index, and determine
> equality for set indexes.

That is incorrect. It should say "0 or more keys". Though whether we
should alert users to this fact, when even UIMA developers have trouble
with it, is doubtful.

>
> Perhaps this should also say something about the use of the sort order
> in "moveTo(fs)" for sets?

In our current implementation, set indexes are sorted indexes without
the duplicates (duplicates with respect to the ordering relation of
that index, of course). If we commit to this and stop waffling about
how set indexes may not be sorted, then we can just say that sorted
and set indexes behave the same way.

>
>> If that sort order is empty, it means
>> that all FSs are equal for that index. That in turn means that this
>> index will contain at most 1 FS at any time. It also means that moveTo()
>> will always position the iterator at that one element, if it exists.
>>
>> Did that help at all?
>>
> Yes, thanks for the clarifications.
>
> -Marshall
>> --Thilo
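
For illustration, the semantics discussed in this thread can be modelled with plain Java collections. This is only a sketch under stated assumptions, not the UIMA implementation and not the UIMA API: the class `MoveToSketch`, the type `Span`, and the helpers `moveTo` and `setIndex` are hypothetical names made up for the example. It shows moveTo(fs) landing on the first element that is greater than or equal to fs under the index's ordering (whether or not fs is in the index), a set index dropping duplicates with respect to that ordering, and an index with 0 keys holding at most one element.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

final class MoveToSketch {

    /** Hypothetical stand-in for a feature structure; not a UIMA type. */
    static final class Span {
        final int begin;
        final String label;
        Span(int begin, String label) { this.begin = begin; this.label = label; }
    }

    /**
     * Models moveTo(fs) on a sorted index: returns the leftmost position whose
     * element is >= fs under 'order'. Everything to the left is smaller than fs.
     * A result equal to sorted.size() means every element is smaller than fs.
     */
    static <T> int moveTo(List<T> sorted, T fs, Comparator<? super T> order) {
        int lo = 0, hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (order.compare(sorted.get(mid), fs) < 0) {
                lo = mid + 1;   // element is smaller than fs: look right
            } else {
                hi = mid;       // element is >= fs: candidate, look left
            }
        }
        return lo;
    }

    /** Models a set index: a sorted collection without duplicates under 'order'. */
    static <T> TreeSet<T> setIndex(Comparator<? super T> order, List<T> items) {
        TreeSet<T> set = new TreeSet<>(order);
        for (T item : items) {
            set.add(item);      // rejected if 'order' says it equals an existing member
        }
        return set;
    }

    public static void main(String[] args) {
        // One key: the begin offset. Labels play no part in ordering or equality.
        Comparator<Span> byBegin = Comparator.comparingInt((Span s) -> s.begin);
        List<Span> sortedIndex = Arrays.asList(
            new Span(10, "a"), new Span(20, "b"), new Span(20, "c"), new Span(30, "d"));

        // The probe need not be in the index; equality is by key, not identity.
        System.out.println(moveTo(sortedIndex, new Span(15, "probe"), byBegin)); // 1
        System.out.println(moveTo(sortedIndex, new Span(20, "probe"), byBegin)); // 1

        // Set index: "c" equals "b" under byBegin, so only three members remain.
        System.out.println(setIndex(byBegin, sortedIndex).size());               // 3

        // 0 keys: all elements compare equal, so the set holds at most one,
        // and any probe is positioned at that one element.
        Comparator<Span> noKeys = (x, y) -> 0;
        TreeSet<Span> single = setIndex(noKeys, sortedIndex);
        System.out.println(single.size());                                       // 1
        System.out.println(single.ceiling(new Span(99, "probe")).label);         // a
    }
}
```

In this model a bag index would correspond to a plain unsorted list with no comparator, which is why a binary-search style moveTo() has nothing meaningful to do there.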