Re: Repeatability of results

2012-04-04 Thread Chris Hostetter
: OK this could make sense (floating point math is frustrating!).
:
: But, Lucene generally scores one document at a time, so in theory just
: changing its docid shouldn't alter the order of float operations.

i haven't thought this through, but couldn't scorer re-ordering in BooleanScorer2 poss

Re: Repeatability of results

2012-04-04 Thread Marvin Humphrey
On Wed, Apr 4, 2012 at 4:18 PM, Michael McCandless wrote:
> On Wed, Apr 4, 2012 at 6:15 PM, Alan Bawden wrote:
>> The key observation is that the differences in scores we see are always
>> down around the sixth decimal place -- down where 32-bit floating point
>> loses precision.
>8 snip 8<-
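
As a rough illustration of the point quoted above (this is not Lucene's scoring code), the sketch below shows that adding the same 32-bit float terms in a different order can give a different result. The helpers `to_f32` and `sum_f32` are hypothetical names introduced only for this example, and the magnitudes are deliberately exaggerated so the effect is obvious; the differences described in the thread sit around the sixth decimal place.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Add values left to right, rounding to 32-bit precision after each step.

    Hypothetical helper: it imitates a chain of 32-bit float additions,
    not any particular scorer.
    """
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total


# Same three terms, two different orders of addition.
forward = sum_f32([1.0e8, -1.0e8, 1.0])    # (1e8 + -1e8) + 1
reordered = sum_f32([-1.0e8, 1.0, 1.0e8])  # (-1e8 + 1) + 1e8

print(forward, reordered)  # prints "1.0 0.0"
```

In the second ordering the 1.0 is lost because 32-bit floats near 1e8 are spaced 8 apart, so -1e8 + 1 rounds back to -1e8. If a scorer combined its sub-scores in a different order from one index to the next, the same kind of rounding would be enough to explain small score differences.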

Re: Repeatability of results

2012-04-04 Thread Michael McCandless
On Wed, Apr 4, 2012 at 6:15 PM, Alan Bawden wrote:
> So I sat down to try to make a small test case that exhibited this
> behavior, and while I was working on that I thought of a possible
> explanation for what we are seeing.  If you agree that my explanation is
> what's going on here, then Benson

Re: Repeatability of results

2012-04-04 Thread Alan Bawden
So I sat down to try to make a small test case that exhibited this behavior, and while I was working on that I thought of a possible explanation for what we are seeing. If you agree that my explanation is what's going on here, then Benson and I can stop working on making a test case, and move on t

Re: Repeatability of results

2012-04-02 Thread Benson Margulies
On Mon, Apr 2, 2012 at 5:33 PM, Michael McCandless wrote:
> Hmm that's odd.
>
> If the scores were identical I'd expect different sort order, since we
> tie-break by internal docID.
>
> But if the scores are different... the insertion order shouldn't
> matter.  And, the score should not change as

Re: Repeatability of results

2012-04-02 Thread Michael McCandless
Hmm that's odd. If the scores were identical I'd expect different sort order, since we tie-break by internal docID. But if the scores are different... the insertion order shouldn't matter. And, the score should not change as a function of insertion order... Do you have a small test case? Mike

Repeatability of results

2012-04-02 Thread Benson Margulies
We've observed something that, in some ways, is not surprising. If you take a set of documents that are close in 'score' to some query, and shuffle them in different orders and then see what results you get in what order from the reference query, the scores will vary according to the insertio