[ https://issues.apache.org/jira/browse/LUCENE-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan Ernst resolved LUCENE-5005.
--------------------------------
    Resolution: Not A Problem

Lucene encodes norms using 8 bits, so precision can be lost when encoding. This is explained here: https://lucene.apache.org/core/5_1_0/core/org/apache/lucene/search/similarities/DefaultSimilarity

> Length norm value of DefaultSimilarity for a few terms
> ------------------------------------------------------
>
>                 Key: LUCENE-5005
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5005
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>   Affects Versions: 4.0
>            Reporter: Shingo Sasaki
>            Priority: Minor
>
> The lengthNorm method of DefaultSimilarity is as follows:
> {noformat}
>   public float lengthNorm(FieldInvertState state) {
>     final int numTerms;
>     if (discountOverlaps)
>       numTerms = state.getLength() - state.getNumOverlap();
>     else
>       numTerms = state.getLength();
>     return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));
>   }
> {noformat}
> The return value is determined by (1.0 / Math.sqrt(numTerms)).
> Its type is float, but the value is encoded into a single byte by
> SmallFloat.floatToByte315.
> ||term count||1/sqrt(numTerms)||1/sqrt(numTerms) after byte encoding||
> |1| 1.000000| 1.0000|
> |2| 0.707107| 0.6250|
> |3| 0.577350| 0.5000|
> |4| 0.500000| 0.5000|
> |5| 0.447214| 0.4375|
> The length norm of 3 terms is the same as that of 4 terms.
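
The values in the quoted table can be reproduced without a Lucene dependency. What follows is a minimal sketch, assuming the bit manipulation below matches SmallFloat.floatToByte315 and SmallFloat.byte315ToFloat as shipped in Lucene 4.x. The class name NormEncodingDemo and the helper roundTrip are hypothetical, and the two encoding methods are re-implemented here for illustration rather than imported from the library.

```java
import java.util.Locale;

// Hypothetical demo class; not part of Lucene.
class NormEncodingDemo {

    // Assumed to mirror SmallFloat.floatToByte315: drop the low 21 bits of the
    // float, then rebase the remainder so that it fits into a single byte.
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            // Zero and negative inputs map to 0; tiny positives map to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: largest code (0xFF)
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Assumed to mirror SmallFloat.byte315ToFloat: the inverse of the mapping above.
    static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Computes 1/sqrt(numTerms) with boost 1, then encodes and decodes it.
    static float roundTrip(int numTerms) {
        float norm = (float) (1.0 / Math.sqrt(numTerms));
        return byte315ToFloat(floatToByte315(norm));
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            float exact = (float) (1.0 / Math.sqrt(n));
            System.out.println(String.format(Locale.ROOT,
                "%d terms: exact=%.6f decoded=%.4f", n, exact, roundTrip(n)));
        }
    }
}
```

Under that assumption, the decoded column should read 1.0000, 0.6250, 0.5000, 0.5000, 0.4375, matching the table in the report: the norms for 3 and 4 terms fall into the same byte code, which is the precision loss the resolution comment describes.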