On Wed, Mar 16, 2011 at 6:08 PM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
> Also... if lucene is already capable of sorting on multi-valued field by
> choosing the largest value.... largest vs. smallest is presumably just
> arbitrary there, there is presumably no performance implication to choosing
> the smallest instead of the largest. It just chooses the largest, according
> to Yonik.

It's a little more complicated than that. It's not so much an explicit
feature in lucene, but just what naturally happens when building the
field cache via uninverting an indexed field.

It's pretty much this:

  for every term in the field:
    for every document that matches that term:
      value[document] = term

And since terms are iterated from smallest to largest (and no, you can't
reverse this) larger values end up overwriting smaller values.
There's no simple patch to pick the smallest rather than the largest.

In the past, lucene used to try and detect this multi-valued case by
checking the number of values set in the whole array. This was unreliable
though and the check was discarded.

-Yonik
http://lucidimagination.com
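
To make the overwrite behaviour described above concrete, here is a minimal Python sketch of the uninverting loop. It is not Lucene code: the `uninvert` function, the `postings` dictionary (term -> list of matching document ids) and the toy data are all assumptions made up for illustration.

```python
def uninvert(postings, num_docs):
    """Build a per-document value array from a term -> doc-id-list mapping.

    Hypothetical stand-in for field cache construction, not a Lucene API.
    """
    values = [None] * num_docs
    # Visit terms in ascending sorted order, smallest first.
    for term in sorted(postings):
        for doc in postings[term]:
            # A later (larger) term overwrites any earlier (smaller) one,
            # so a multi-valued document ends up holding its largest term.
            values[doc] = term
    return values


# Toy index: document 0 has two values ("apple" and "pear"),
# documents 1 and 2 have one value each.
postings = {"pear": [0], "apple": [0, 1], "banana": [2]}
values = uninvert(postings, 3)
print(values)  # ['pear', 'apple', 'banana']
```

Document 0 ends up with "pear" only because "pear" is visited after "apple". In this sketch nothing records that "apple" was ever assigned, which is why picking the smallest value instead would need a different loop rather than a one-line change.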