Marshall Schor wrote:
>
> Thilo Goetz wrote:
>> Marshall Schor wrote:
>>
>>> Thilo Goetz wrote:
>>>
>>>> See the Jira issue for the cause of the problem.  More
>>>> comments below.
>>>>
>>>> Marshall Schor wrote:
>>>>
>>>>> So, there may be 2 things to look at here - the actual error, described
>>>>> above, and the more philosophical question on the behavior of moveTo -
>>>>> this seems to require a sorting order if the item "moved to" is not
>>>>> present in the index.  Perhaps this needs to be documented better.  And
>>>>>
>>>> I'm not sure I understand your point about moveTo().  It requires the
>>>> index to be sorted to make any sense (and the BagIndex moveTo() is broken,
>>>> but that's a different issue
>>>>
>>> Will you be fixing this too?
>>>
>> We enter the realm of philosophy again.  What's the right
>> behavior for moveTo() when the underlying index isn't sorted?
>> In particular, what should happen when no proper element
>> is found?  The javadocs say:
>>
>> Note that any operation like find() or FSIterator.moveTo() will not produce
>> useful results on bag indexes, since bag indexes do not honor comparators.
>> Only use a bag index if you want very fast adding and will have to iterate
>> over the whole index anyway.
>>
> I like systems where user errors are reported :-).  If find() and
> moveTo() don't work on bag indexes, I would prefer they throw an
> exception, perhaps like UnsupportedOperationException or our equivalent
> in UIMA.

Fine with me.

>>
>>>> ).  moveTo(fs) will position the iterator such
>>>> that any element "to the left" is smaller than fs, and all elements at the
>>>> moved-to position and "to the right" of it are greater than or equal to
>>>> fs.  It doesn't matter if the item "moved to" is in the index or not.
>>>> Remember that equality here is defined with respect to the sort order of
>>>> the index, it is not feature structure identity.
>>>>
>>> Yes, this is something that is unexpected (to me), and I did forget this.
>>>
>>>> All this is documented,
>>>> but maybe not as clearly as it could be.
>>>>
>>>>> what if no sorting order was defined for the set index?
>>>>>
>>>> Every set index has a sort order.
>>>>
>>> This is the part that seems confusing, because our docs say that set
>>> indexes do not enforce ordering, and the common definition for Sets does
>>>
>> Where did you find that?  The javadocs say that set indexes are
>> not guaranteed to be sorted.  That's different from saying there's
>> no ordering relation on the members.  How else would we determine
>> equality?
>>
> Just by testing the key values for equality, not for order.

Equality here is a notion derived from the partial order defined on
the index.  You could define equality separately, but that would mean
introducing a new notion into the index definitions.  I don't think
we want that, or at least I don't.

>> Maybe we should remove this text, because at this time, set indexes
>> are sorted, and that's not likely to change (I was thinking of hash
>> based sets when I wrote that; still, you'll need a notion of equality,
>> no matter how you implement your sets, yet they don't need to be
>> sorted).
>>
>>> not have an ordering concept.  Yet our docs say that the sort order for
>>> sets is used to determine "equality" among candidates in the set: from
>>> section 2.4.1.7:
>>>
>>> An index may define one or more /keys/. These keys determine the sort
>>> order of the feature structures within a sorted index, and determine
>>> equality for set indexes.
>>>
>> That is incorrect.  It should say "0 or more keys".  Though if we should
>> alert users to this fact if even UIMA developers have trouble with this
>> is doubtful.
>>
> I think some of our users could be better at remembering these details
> than I am :-)

I think this should be fixed - it's just a typo IMHO.

>>> Perhaps this should also say something about the use of the sort order
>>> in "moveTo(fs)" for sets?
>>>
>> In our current implementation, set indexes are sorted indexes
>> without the duplicates (duplicates with respect to the ordering
>> relation of that index, of course).  If we commit to this and
>> stop waffling about how set indexes may not be sorted, then we
>> can just say that sorted and set indexes behave the same way.
>>
> My preference is to keep the original definitions - leaving (perhaps
> unrealistically small) room for alternative implementations in the future.

Sure, but how do you propose we improve the documentation, then?

>
> -Marshall
>>
>>>> If that sort order is empty, it means
>>>> that all FSs are equal for that index.  That in turn means that this
>>>> index will contain at most 1 FS at any time.  It also means that moveTo()
>>>> will always position the iterator at that one element, if it exists.
>>>>
>>>> Did that help at all?
>>>>
>>> Yes, thanks for the clarifications.
>>>
>>> -Marshall
>>>
>>>> --Thilo
>>>>
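
To make the moveTo() and set-index semantics discussed in this thread concrete, here is a minimal sketch in plain Java. It is not UIMA code and does not use the UIMA index or FSIterator API: the class MoveToSketch and its methods moveTo, addToSet and noKeys are hypothetical names made up for illustration, and a sorted java.util.List stands in for an index. It assumes the behavior described in the thread: moveTo positions at the first element that is greater than or equal to the argument under the index's ordering, a set index rejects elements that compare equal under that ordering, and an ordering with zero keys treats all elements as equal.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: plain Java collections, not the UIMA index or FSIterator API.
final class MoveToSketch {

    // Returns the first position whose element is >= key under cmp, or
    // elements.size() if there is none. 'elements' must already be sorted by cmp.
    // The key itself does not have to be present in the list.
    static <T> int moveTo(List<T> elements, T key, Comparator<? super T> cmp) {
        int lo = 0;
        int hi = elements.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cmp.compare(elements.get(mid), key) < 0) {
                lo = mid + 1;   // everything up to mid is smaller than key
            } else {
                hi = mid;       // mid is >= key, so it is a candidate position
            }
        }
        return lo;
    }

    // Set-style add: insert in sorted position unless some element already
    // compares equal under cmp (equality by ordering, not by object identity).
    static <T> boolean addToSet(List<T> elements, T item, Comparator<? super T> cmp) {
        int pos = moveTo(elements, item, cmp);
        if (pos < elements.size() && cmp.compare(elements.get(pos), item) == 0) {
            return false;
        }
        elements.add(pos, item);
        return true;
    }

    // An ordering with zero keys: every pair of elements compares as equal.
    static <T> Comparator<T> noKeys() {
        return (a, b) -> 0;
    }

    public static void main(String[] args) {
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        List<String> set = new ArrayList<>();
        addToSet(set, "aa", byLength);
        addToSet(set, "bbbb", byLength);
        addToSet(set, "cc", byLength);   // rejected: equal to "aa" under byLength
        System.out.println(set);                            // [aa, bbbb]
        System.out.println(moveTo(set, "xxx", byLength));   // 1

        List<String> single = new ArrayList<>();
        addToSet(single, "a", MoveToSketch.<String>noKeys());
        addToSet(single, "b", MoveToSketch.<String>noKeys()); // rejected: all equal
        System.out.println(single.size());                  // 1
    }
}
```

In this sketch "cc" is rejected even though it is a different string from "aa", which mirrors the point that equality is defined by the index's ordering rather than by identity. With noKeys() the list never holds more than one element and moveTo always returns position 0, matching the "at most 1 FS" case described above.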