Re: Same index is ranking differently on 2 machines

Allistair Crossley Wed, 09 Mar 2011 13:48:33 -0800

That's what I think, glad I am not going mad.

I've spent half a day comparing the config files, checking out from SVN again 
and ensuring the databases are identical. I cannot see what else I can do to 
make them equivalent. Both servers check out directly from SVN, so I am 
convinced the files are the same. The database is definitely the same.


Not sure what you mean about having identical indices - that's my problem - I 
don't - or do you mean something else I've missed? But yes everything else you 
mention is identical, I am as certain as I can be. 

I too think there must be a difference I have missed but I have run out of 
ideas for what to check!

Frustrating :)

On Mar 9, 2011, at 4:38 PM, Jonathan Rochkind wrote:

> Yes, but the identical index with the identical solrconfig.xml and the 
> identical query and the identical version of Solr on two different machines 
> should produce identical results.
> 
> So it's a legitimate question why it's not.  But perhaps queryNorm isn't 
> enough to answer that. Sorry, it's out of my league to try and figure it 
> out.
> 
> But are you absolutely sure you have identical indexes, identical 
> solrconfig.xml, identical queries, and identical versions of Solr and any 
> other installed Java libraries... on both machines?  One of these being 
> different seems more likely than a bug in Solr, although that's possible.
> 
> On 3/9/2011 4:34 PM, Jayendra Patil wrote:
>> queryNorm is just a normalizing factor and is the same value across
>> all the results for a query, to just make the scores comparable.
>> So even if it varies in different environments, you should not be worried about it.
>> 
>> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html#formula_queryNorm
>> -
>> Definition - queryNorm(q) is just a normalizing factor used to make
>> scores between queries comparable. This factor does not affect
>> document ranking (since all ranked documents are multiplied by the
>> same factor), but rather just attempts to make scores from different
>> queries (or even different indexes) comparable.
>> 
>> Regards,
>> Jayendra
>> 
>> On Wed, Mar 9, 2011 at 4:22 PM, Allistair Crossley <a...@roxxor.co.uk> wrote:
>>> Hi,
>>> 
>>> I am seeing an issue I do not understand and hope that someone can shed 
>>> some light on this. The issue is that for a particular search we are seeing 
>>> a particular result rank in position 3 on one machine and position 8 on the 
>>> production machine. The position 3 is our desired and roughly expected 
>>> ranking.
>>> 
>>> I have a local machine with solr and a version deployed on a production 
>>> server. My local machine's solr and the production version are both checked 
>>> out from our project's SVN trunk. They are identical files except for the 
>>> data files (not in SVN) and database connection settings.
>>> 
>>> The index is populated exclusively via data import handler queries to a 
>>> database.
>>> 
>>> I have exported the production database as-is to my local development 
>>> machine so that my local machine and production have access to the very 
>>> same data.
>>> 
>>> I execute a full-import on both.
>>> 
>>> Still, I see a different position for this document that should surely rank 
>>> in the same location, all else being equal.
>>> 
>>> I diffed the debugQuery output from the two machines to see how the scores 
>>> were being computed. See the appendix at the foot of this email.
>>> 
>>> As far as I can tell every single query normalisation block of the debug is 
>>> marginally different, e.g.
>>> 
>>> -        0.021368012 = queryNorm (local)
>>> +        0.009944122 = queryNorm (production)
>>> 
>>> This leads to a final score of about 2.29 (local) versus 1.07 (production), 
>>> which is enough to skew the results from correct to incorrect (in terms of 
>>> what we expect to see).
>>> 
>>> -2.286596 = final score (local)
>>> +1.0651637 = final score (production)
>>> 
>>> I cannot explain this difference. The database is the same. The 
>>> configuration is the same. I have fully imported from scratch on both 
>>> servers. What am I missing?
>>> 
>>> Thank you for your time
>>> 
>>> Allistair
>>> 
>>> ----- snip
>>> 
>>> APPENDIX - debugQuery=on DIFF
>>> 
>>> --- untitled
>>> +++ (clipboard)
>>> @@ -1,51 +1,49 @@
>>> -<str name="L12411p">
>>> +<str name="L12411">
>>> 
>>> -2.286596 = (MATCH) sum of:
>>> -  1.6891675 = (MATCH) sum of:
>>> -    1.3198489 = (MATCH) max plus 0.01 times others of:
>>> -      0.023022119 = (MATCH) weight(text:dubai^0.1 in 1551), product of:
>>> -        0.011795795 = queryWeight(text:dubai^0.1), product of:
>>> -          0.1 = boost
>>> +1.0651637 = (MATCH) sum of:
>>> +  0.7871359 = (MATCH) sum of:
>>> +    0.6151879 = (MATCH) max plus 0.01 times others of:
>>> +      0.10713901 = (MATCH) weight(text:dubai in 1551), product of:
>>> +        0.05489459 = queryWeight(text:dubai), product of:
>>>           5.520305 = idf(docFreq=65, maxDocs=6063)
>>> -          0.021368012 = queryNorm
>>> +          0.009944122 = queryNorm
>>>         1.9517226 = (MATCH) fieldWeight(text:dubai in 1551), product of:
>>>           1.4142135 = tf(termFreq(text:dubai)=2)
>>>           5.520305 = idf(docFreq=65, maxDocs=6063)
>>>           0.25 = fieldNorm(field=text, doc=1551)
>>> -      1.3196187 = (MATCH) weight(profile:dubai^2.0 in 1551), product of:
>>> -        0.32609802 = queryWeight(profile:dubai^2.0), product of:
>>> +      0.6141165 = (MATCH) weight(profile:dubai^2.0 in 1551), product of:
>>> +        0.15175761 = queryWeight(profile:dubai^2.0), product of:
>>>           2.0 = boost
>>>           7.6305184 = idf(docFreq=7, maxDocs=6063)
>>> -          0.021368012 = queryNorm
>>> +          0.009944122 = queryNorm
>>>         4.0466933 = (MATCH) fieldWeight(profile:dubai in 1551), product of:
>>>           1.4142135 = tf(termFreq(profile:dubai)=2)
>>>           7.6305184 = idf(docFreq=7, maxDocs=6063)
>>>           0.375 = fieldNorm(field=profile, doc=1551)
>>> -    0.36931866 = (MATCH) max plus 0.01 times others of:
>>> -      0.0018293816 = (MATCH) weight(text:product^0.1 in 1551), product of:
>>> -        0.003954251 = queryWeight(text:product^0.1), product of:
>>> -          0.1 = boost
>>> +    0.17194802 = (MATCH) max plus 0.01 times others of:
>>> +      0.00851347 = (MATCH) weight(text:product in 1551), product of:
>>> +        0.018402064 = queryWeight(text:product), product of:
>>>           1.8505468 = idf(docFreq=2589, maxDocs=6063)
>>> -          0.021368012 = queryNorm
>>> +          0.009944122 = queryNorm
>>>         0.4626367 = (MATCH) fieldWeight(text:product in 1551), product of:
>>>           1.0 = tf(termFreq(text:product)=1)
>>>           1.8505468 = idf(docFreq=2589, maxDocs=6063)
>>>           0.25 = fieldNorm(field=text, doc=1551)
>>> -      0.36930037 = (MATCH) weight(profile:product^2.0 in 1551), product of:
>>> -        0.1725098 = queryWeight(profile:product^2.0), product of:
>>> +      0.17186289 = (MATCH) weight(profile:product^2.0 in 1551), product of:
>>> +        0.08028162 = queryWeight(profile:product^2.0), product of:
>>>           2.0 = boost
>>>           4.036637 = idf(docFreq=290, maxDocs=6063)
>>> -          0.021368012 = queryNorm
>>> +          0.009944122 = queryNorm
>>>         2.14075 = (MATCH) fieldWeight(profile:product in 1551), product of:
>>>           1.4142135 = tf(termFreq(profile:product)=2)
>>>           4.036637 = idf(docFreq=290, maxDocs=6063)
>>>           0.375 = fieldNorm(field=profile, doc=1551)
>>> -  0.59742856 = (MATCH) max plus 0.01 times others of:
>>> -    0.59742856 = weight(profile:"dubai product"~10^0.5 in 1551), product of:
>>> -      0.12465195 = queryWeight(profile:"dubai product"~10^0.5), product of:
>>> +  0.27802786 = (MATCH) max plus 0.01 times others of:
>>> +    0.27802786 = weight(profile:"dubai product"~10^0.5 in 1551), product of:
>>> +      0.05800981 = queryWeight(profile:"dubai product"~10^0.5), product of:
>>>         0.5 = boost
>>>         11.667155 = idf(profile: dubai=7 product=290)
>>> -        0.021368012 = queryNorm
>>> +        0.009944122 = queryNorm
>>>       4.7927732 = fieldWeight(profile:"dubai product" in 1551), product of:
>>>         1.0954452 = tf(phraseFreq=1.2)
>>>         11.667155 = idf(profile: dubai=7 product=290)
>>> 
>>> 
>>> 
>>>
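
The arithmetic in the appendix can be checked by hand, which may help narrow down where the two machines diverge. The explain output lists each term weight as queryWeight (boost * idf * queryNorm) times fieldWeight (tf * idf * fieldNorm). Below is a minimal Python sketch that recomputes two of the "dubai" clauses from the numbers in the diff. The helper names are made up for illustration and are not Solr or Lucene APIs, and treating the missing boost line on production as a boost of 1.0 is an assumption read off the diff (text:dubai^0.1 versus text:dubai).

```python
import math

# All numbers are copied from the debugQuery appendix (doc 1551).
# Helper names are hypothetical; they are not Solr/Lucene APIs.
QUERY_NORM = {"local": 0.021368012, "production": 0.009944122}


def field_weight(tf, idf, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as listed under "fieldWeight(...), product of:"
    return tf * idf * field_norm


def term_weight(boost, tf, idf, field_norm, query_norm):
    # weight = queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm
    query_weight = boost * idf * query_norm
    return query_weight * field_weight(tf, idf, field_norm)


TF_2 = math.sqrt(2)  # tf(termFreq=2), shown as 1.4142135 in the explain output

# profile:dubai^2.0 carries the same boost on both machines.
profile_local = term_weight(2.0, TF_2, 7.6305184, 0.375, QUERY_NORM["local"])
profile_prod = term_weight(2.0, TF_2, 7.6305184, 0.375, QUERY_NORM["production"])

# text:dubai is boosted ^0.1 locally; production shows no boost line
# (assumed here to mean a boost of 1.0).
text_local = term_weight(0.1, TF_2, 5.520305, 0.25, QUERY_NORM["local"])
text_prod = term_weight(1.0, TF_2, 5.520305, 0.25, QUERY_NORM["production"])

norm_ratio = QUERY_NORM["local"] / QUERY_NORM["production"]

print(f"profile:dubai local={profile_local:.7f} production={profile_prod:.7f}")
print(f"text:dubai    local={text_local:.7f} production={text_prod:.7f}")
print(f"queryNorm ratio     = {norm_ratio:.4f}")
print(f"profile:dubai ratio = {profile_local / profile_prod:.4f}")
print(f"text:dubai ratio    = {text_local / text_prod:.4f}")
```

In this sketch the profile:dubai weights differ between the machines by exactly the queryNorm ratio, which is the uniform scaling that does not change ranking. The text:dubai weights differ by one tenth of that ratio, because the boost is not the same on both sides. That points at the per-field boosts in the parsed query rather than at queryNorm itself. One place worth comparing might be the query-field boosts each machine actually applies (for instance a dismax qf setting), though the thread does not confirm that as the cause.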
