On Wed, Mar 16, 2011 at 6:08 PM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
> Also... if lucene is already capable of sorting on multi-valued field by
> choosing the largest value.... largest vs. smallest is presumably just
> arbitrary there, there is presumably no performance implication to choosing
> the smallest instead of the largest. It just chooses the largest, according
> to Yonik.

It's a little more complicated than that.
It's not so much an explicit feature in lucene, but just what
naturally happens when building the field cache via uninverting an
indexed field.

It's pretty much this:

for every term in the field:
  for every document that matches that term:
    value[document] = term

And since terms are iterated from smallest to largest (and no, you
can't reverse this), larger values end up overwriting smaller values.
There's no simple patch to pick the smallest rather than the largest.
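
To make the overwrite concrete, here is a toy Python model of that loop.
It is only a sketch, not lucene's actual field cache code: the `uninvert`
function, the `postings` dict, and the sample terms and document ids are
all made up for illustration, and a plain dict stands in for the
per-document value array.

```python
def uninvert(postings):
    """Build a doc -> term map from a term -> doc ids map (toy model)."""
    value = {}
    # Visit terms in sorted order, mimicking smallest-to-largest iteration.
    for term in sorted(postings):
        for doc in postings[term]:
            # A later (larger) term silently replaces an earlier one.
            value[doc] = term
    return value


# Hypothetical index: documents 1 and 2 each have two values in the field.
postings = {
    "apple": [1, 2],
    "mango": [2],
    "zebra": [1, 3],
}

field_cache = uninvert(postings)
print(field_cache)  # {1: 'zebra', 2: 'mango', 3: 'zebra'}
```

Document 1 has both "apple" and "zebra", but only "zebra" survives
because it is the last term visited for that document.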

In the past, lucene tried to detect this multi-valued case by
checking the number of values set in the whole array.  This was
unreliable, though, and the check was discarded.

-Yonik
http://lucidimagination.com
