Robert,

> So this is lossy: basically you can think of there being only 256
> possible values. So when you increased the number of terms only
> slightly by changing your analysis, this happened to bump you over the
> edge, rounding you up to the next value.
>
> More information:
> http://lucene.apache.org/core/3_6_0/scoring.html
> http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Similarity.html
Thanks - this was extremely helpful! I had read both sources before, but I didn't grasp the magnitude of the lossiness until your pointer and your mention of the edge case. To help out anybody else who might run into this, I hacked together a little harness to demonstrate:

---
fieldLength: 160, computeNorm: 0.07905694, floatToByte315: 109, byte315ToFloat: 0.078125
fieldLength: 161, computeNorm: 0.07881104, floatToByte315: 109, byte315ToFloat: 0.078125
fieldLength: 162, computeNorm: 0.07856742, floatToByte315: 109, byte315ToFloat: 0.078125
fieldLength: 163, computeNorm: 0.07832605, floatToByte315: 109, byte315ToFloat: 0.078125
fieldLength: 164, computeNorm: 0.07808688, floatToByte315: 108, byte315ToFloat: 0.0625
fieldLength: 165, computeNorm: 0.077849895, floatToByte315: 108, byte315ToFloat: 0.0625
fieldLength: 166, computeNorm: 0.07761505, floatToByte315: 108, byte315ToFloat: 0.0625
---

So my takeaway is that the scores that vary significantly are caused by:

1) a field whose length sits right on this boundary between the two analyzer chains

2) the fact that we might be searching for matches from 50+ values against a field with 150+ values, so the overall score is repeatedly impacted by the otherwise typically insignificant change in fieldNorm value

Thanks again,
Aaron
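For anyone who wants to reproduce the table above without pulling in the Lucene jar, here is a standalone sketch of such a harness. It reimplements what I believe are the Lucene 3.6 `SmallFloat.floatToByte315` / `SmallFloat.byte315ToFloat` routines (3 mantissa bits, zero-exponent point 15) and `DefaultSimilarity`'s length norm (`1/sqrt(numTerms)`, boosts omitted); the class name `NormHarness` is just for illustration:

```java
// Standalone sketch: reimplements Lucene 3.6's SmallFloat.floatToByte315 /
// byte315ToFloat and DefaultSimilarity's lengthNorm (1/sqrt(numTerms)),
// so the quantization boundary can be reproduced without the Lucene jar.
public class NormHarness {

    // Encode a float into one byte: 3 mantissa bits, zero-exponent point 15.
    // Only 256 distinct values survive the round trip, hence the lossiness.
    public static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // too small: underflow
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // too large: saturate
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Decode the byte back to a float; many nearby inputs decode identically.
    public static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // DefaultSimilarity's length norm, with index-time boosts omitted.
    public static float computeNorm(int fieldLength) {
        return (float) (1.0 / Math.sqrt(fieldLength));
    }

    public static void main(String[] args) {
        // Walk across the boundary observed between fieldLength 163 and 164.
        for (int len = 160; len <= 166; len++) {
            float norm = computeNorm(len);
            byte encoded = floatToByte315(norm);
            System.out.println("fieldLength: " + len
                    + ", computeNorm: " + norm
                    + ", floatToByte315: " + encoded
                    + ", byte315ToFloat: " + byte315ToFloat(encoded));
        }
    }
}
```

Running it shows the same cliff: lengths 160-163 encode to byte 109 (decoding to 0.078125), while 164-166 drop to byte 108 (decoding to 0.0625) - a 20% swing in fieldNorm from adding a single term.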