Robert,

> So this is lossy: basically you can think of there being only 256
> possible values. So when you increased the number of terms only
> slightly by changing your analysis, this happened to bump you over the
> edge, rounding you up to the next value.
>
> More information:
> http://lucene.apache.org/core/3_6_0/scoring.html
> http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Similarity.html
Thanks - this was extremely helpful! I had read both sources before, but I didn't grasp the magnitude of the lossiness until your pointer and your mention of the edge case. To help out anybody else who might run into this, I hacked together a little harness to demonstrate:

---
fieldLength: 160, computeNorm: 0.07905694, floatToByte315: 109, byte315ToFloat: 0.078125
fieldLength: 161, computeNorm: 0.07881104, floatToByte315: 109, byte315ToFloat: 0.078125
fieldLength: 162, computeNorm: 0.07856742, floatToByte315: 109, byte315ToFloat: 0.078125
fieldLength: 163, computeNorm: 0.07832605, floatToByte315: 109, byte315ToFloat: 0.078125
fieldLength: 164, computeNorm: 0.07808688, floatToByte315: 108, byte315ToFloat: 0.0625
fieldLength: 165, computeNorm: 0.077849895, floatToByte315: 108, byte315ToFloat: 0.0625
fieldLength: 166, computeNorm: 0.07761505, floatToByte315: 108, byte315ToFloat: 0.0625
---

So my takeaway is that the scores that vary significantly are caused by:

1) a field whose length sits right on this boundary between the two analyzer chains

2) the fact that we might be searching for matches from 50+ values against a field with 150+ values, so the overall score is repeatedly impacted by the otherwise typically insignificant change in fieldNorm value

Thanks again,
Aaron
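For anyone who wants to reproduce the table above without pulling in the Lucene jar, here is a standalone sketch of such a harness. It reimplements what I believe are the Lucene 3.6 `SmallFloat.floatToByte315` / `SmallFloat.byte315ToFloat` routines (3 mantissa bits, zero-exponent point 15) and `DefaultSimilarity`'s length norm (`1/sqrt(numTerms)`, boosts omitted); the class name `NormHarness` is just for illustration:

```java
// Standalone sketch: reimplements Lucene 3.6's SmallFloat.floatToByte315 /
// byte315ToFloat and DefaultSimilarity's lengthNorm (1/sqrt(numTerms)),
// so the quantization boundary can be reproduced without the Lucene jar.
public class NormHarness {

    // Encode a float into one byte: 3 mantissa bits, zero-exponent point 15.
    // Only 256 distinct values survive the round trip, hence the lossiness.
    public static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // too small: underflow
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // too large: saturate
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Decode the byte back to a float; many nearby inputs decode identically.
    public static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // DefaultSimilarity's length norm, with index-time boosts omitted.
    public static float computeNorm(int fieldLength) {
        return (float) (1.0 / Math.sqrt(fieldLength));
    }

    public static void main(String[] args) {
        // Walk across the boundary observed between fieldLength 163 and 164.
        for (int len = 160; len <= 166; len++) {
            float norm = computeNorm(len);
            byte encoded = floatToByte315(norm);
            System.out.println("fieldLength: " + len
                    + ", computeNorm: " + norm
                    + ", floatToByte315: " + encoded
                    + ", byte315ToFloat: " + byte315ToFloat(encoded));
        }
    }
}
```

Running it shows the same cliff: lengths 160-163 encode to byte 109 (decoding to 0.078125), while 164-166 drop to byte 108 (decoding to 0.0625) - a 20% swing in fieldNorm from adding a single term.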