Shingo Sasaki created LUCENE-5005:
-------------------------------------

             Summary: Length norm value of DefaultSimilarity for a few terms
                 Key: LUCENE-5005
                 URL: https://issues.apache.org/jira/browse/LUCENE-5005
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/search
    Affects Versions: 4.0
            Reporter: Shingo Sasaki
            Priority: Minor
The lengthNorm method of DefaultSimilarity is as follows:

{noformat}
public float lengthNorm(FieldInvertState state) {
  final int numTerms;
  // optionally ignore overlapping tokens (position increment of 0)
  if (discountOverlaps)
    numTerms = state.getLength() - state.getNumOverlap();
  else
    numTerms = state.getLength();
  return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));
}
{noformat}

The return value is determined by (1.0 / Math.sqrt(numTerms)). It is computed as a float, but it is then encoded into a single byte by SmallFloat.floatToByte315, which keeps only three significant bits and truncates the rest:

||term count||1/sqrt(numTerms)||1/sqrt(numTerms) after byte encoding||
|1|1.000000|1.0000|
|2|0.707107|0.6250|
|3|0.577350|0.5000|
|4|0.500000|0.5000|
|5|0.447214|0.4375|

Because 0.577350 truncates down to 0.5, the length norm of 3 terms is the same as that of 4 terms.
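To reproduce the table without a Lucene dependency, here is a small standalone sketch. floatToByte315 and byte315ToFloat below are local re-implementations written to mirror SmallFloat's truncating 3-significant-bit encoding (an assumption, not Lucene's own class), and the field boost is assumed to be 1.0. The class name LengthNormTable and its helper methods are hypothetical.

```java
/**
 * Standalone sketch (hypothetical class): reproduces the length-norm table.
 * floatToByte315/byte315ToFloat are local re-implementations modeled on
 * Lucene's SmallFloat (assumption: 3 significant bits, zero exponent 15).
 */
public class LengthNormTable {

    // Exponent offset between a float and the small float, in the
    // shifted (>> 21) representation used below.
    private static final int FZERO = (63 - 15) << 3;

    /** Encodes a float into one byte, truncating to 3 significant bits. */
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        // Keep sign, 8 exponent bits and the top 2 explicit mantissa bits.
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= FZERO) {
            // Underflow: zero/negative maps to 0, tiny positives to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= FZERO + 0x100) {
            return -1; // overflow: clamp to the largest encodable value
        }
        return (byte) (smallfloat - FZERO);
    }

    /** Decodes a byte produced by floatToByte315 back into a float. */
    static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** Raw length norm for numTerms terms, assuming a boost of 1.0. */
    static float rawNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Length norm after the byte round trip, i.e. the value scoring sees. */
    static float decodedNorm(int numTerms) {
        return byte315ToFloat(floatToByte315(rawNorm(numTerms)));
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.printf(java.util.Locale.ROOT, "%d %.6f %.4f%n",
                    n, rawNorm(n), decodedNorm(n));
        }
    }
}
```

Running main prints the same five rows as the table; 3 and 4 terms both decode to 0.5000 because only the implicit leading bit and two mantissa bits survive the shift by 21.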