Thilo Goetz wrote:
> Marshall Schor wrote:
>> Thilo Goetz wrote:
>>> See the Jira issue for the cause of the problem. More
>>> comments below.
>>>
>>> Marshall Schor wrote:
>>>> So, there may be 2 things to look at here - the actual error, described
>>>> above, and the more philosophical question on the behavior of moveTo -
>>>> this seems to require a sorting order if the item "moved to" is not
>>>> present in the index. Perhaps this needs to be documented better. And
>>>
>>> I'm not sure I understand your point about moveTo(). It requires the
>>> index to be sorted to make any sense (and the BagIndex moveTo() is broken,
>>> but that's a different issue
>>
>> Will you be fixing this too?
>
> We enter the realm of philosophy again. What's the right
> behavior for moveTo() when the underlying index isn't sorted?
> In particular, what should happen when no proper element
> is found? The javadocs say:
>
> Note that any operation like find() or FSIterator.moveTo() will not produce
> useful results on bag indexes, since bag indexes do not honor comparators.
> Only use a bag index if you want very fast adding and will have to iterate
> over the whole index anyway.

I like systems where user errors are reported :-). If find() and moveTo() don't work on bag indexes, I would prefer they throw an exception, perhaps like UnsupportedOperationException or our equivalent in UIMA.

>>> ). moveTo(fs) will position the iterator such
>>> that any element "to the left" is smaller than fs, and all elements at the
>>> moved-to position and "to the right" of it are greater than or equal to
>>> fs. It doesn't matter if the item "moved to" is in the index or not.
>>> Remember that equality here is defined with respect to the sort order of
>>> the index, it is not feature structure identity.
>>
>> Yes, this is something that is unexpected (to me), and I did forget this.
>>
>>> All this is documented,
>>> but maybe not as clearly as it could be.
>>>
>>>> what if no sorting order was defined for the set index?
>>>
>>> Every set index has a sort order.
>>
>> This is the part that seems confusing, because our docs say that set
>> indexes do not enforce ordering, and the common definition for Sets does
>
> Where did you find that? The javadocs say that set indexes are
> not guaranteed to be sorted. That's different from saying there's
> no ordering relation on the members. How else would we determine
> equality?

Just by testing the key values for equality, not for order.

> Maybe we should remove this text, because at this time, set indexes
> are sorted, and that's not likely to change (I was thinking of hash
> based sets when I wrote that; still, you'll need a notion of equality,
> no matter how you implement your sets, yet they don't need to be
> sorted).
>
>> not have an ordering concept. Yet our docs say that the sort order for
>> sets is used to determine "equality" among candidates in the set: from
>> section 2.4.1.7:
>>
>> An index may define one or more /keys/. These keys determine the sort
>> order of the feature structures within a sorted index, and determine
>> equality for set indexes.
>
> That is incorrect. It should say "0 or more keys". Though whether we
> should alert users to this fact, if even UIMA developers have trouble
> with this, is doubtful.

I think some of our users could be better at remembering these details than I am :-) I think this should be fixed - it's just a typo IMHO.

>> Perhaps this should also say something about the use of the sort order
>> in "moveTo(fs)" for sets?
>
> In our current implementation, set indexes are sorted indexes
> without the duplicates (duplicates with respect to the ordering
> relation of that index, of course). If we commit to this and
> stop waffling about how set indexes may not be sorted, then we
> can just say that sorted and set indexes behave the same way.
>

My preference is to keep the original definitions - leaving (perhaps unrealistically small) room for alternative implementations in the future.

-Marshall

>>> If that sort order is empty, it means
>>> that all FSs are equal for that index. That in turn means that this
>>> index will contain at most 1 FS at any time. It also means that moveTo()
>>> will always position the iterator at that one element, if it exists.
>>>
>>> Did that help at all?
>>
>> Yes, thanks for the clarifications.
>>
>> -Marshall
>>
>>> --Thilo
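
For readers following the thread, here is a minimal sketch of the two behaviors described above: the moveTo() positioning rule, and a set index whose key list is empty. It uses plain java.util collections, not the UIMA API. The class IndexSketch and its methods are hypothetical names made up for illustration, and keeping the existing element when a duplicate is added is an assumption of the sketch, not something stated in the thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a sorted or set index; NOT the UIMA API.
class IndexSketch<T> {
    private final List<T> items = new ArrayList<>();
    private final Comparator<T> keys;  // the ordering relation of the index
    private final boolean isSet;       // true: no duplicates w.r.t. keys

    IndexSketch(Comparator<T> keys, boolean isSet) {
        this.keys = keys;
        this.isSet = isSet;
    }

    // Returns the position such that every element to the left is smaller
    // than x, and every element at or to the right is greater than or
    // equal to x. x itself does not have to be in the index.
    int moveTo(T x) {
        int lo = 0;
        int hi = items.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (keys.compare(items.get(mid), x) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    void add(T x) {
        int pos = moveTo(x);
        boolean equalUnderKeys =
            pos < items.size() && keys.compare(items.get(pos), x) == 0;
        if (isSet && equalUnderKeys) {
            return; // assumption: the sketch keeps the existing element
        }
        items.add(pos, x);
    }

    int size() {
        return items.size();
    }

    public static void main(String[] args) {
        IndexSketch<Integer> sorted =
            new IndexSketch<>(Comparator.<Integer>naturalOrder(), false);
        for (int v : new int[] {10, 30, 30, 50}) {
            sorted.add(v);
        }
        System.out.println(sorted.moveTo(30)); // 30 is in the index
        System.out.println(sorted.moveTo(40)); // 40 is not in the index

        // Empty key list: the comparator treats all elements as equal.
        Comparator<String> noKeys = (a, b) -> 0;
        IndexSketch<String> set = new IndexSketch<>(noKeys, true);
        set.add("a");
        set.add("b");
        System.out.println(set.size());
    }
}
```

In this sketch, "equal" always means "compares as 0 under the index's keys", never object identity, which is the point that caused the confusion earlier in the thread.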