Shingo Sasaki created LUCENE-5005:
-------------------------------------

             Summary: Length norm value of DefaultSimilarity for a few terms
                 Key: LUCENE-5005
                 URL: https://issues.apache.org/jira/browse/LUCENE-5005
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/search
    Affects Versions: 4.0
            Reporter: Shingo Sasaki
            Priority: Minor


The lengthNorm method of DefaultSimilarity is as follows:

{noformat}
  public float lengthNorm(FieldInvertState state) {
    final int numTerms;
    if (discountOverlaps)
      numTerms = state.getLength() - state.getNumOverlap();
    else
      numTerms = state.getLength();
    return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));
  }
{noformat}

The return value is determined by (1.0 / Math.sqrt(numTerms)), scaled by the field boost.
Its type is float, but the value is encoded into a single byte by
SmallFloat.floatToByte315, so precision is lost.

||term count||1/sqrt(numTerms)||1/sqrt(numTerms) after byte encoding||
|1|     1.000000|       1.0000|
|2|     0.707107|       0.6250|
|3|     0.577350|       0.5000|
|4|     0.500000|       0.5000|
|5|     0.447214|       0.4375|

The length norm of 3 terms is the same as that of 4 terms.
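
The collision can be reproduced without a Lucene dependency using the sketch below. Note that floatToByte315 and byte315ToFloat here are local re-implementations that, as far as I understand, mirror the corresponding SmallFloat methods in Lucene 4.x; they should be checked against the actual source. The class name LengthNormDemo and the helpers rawNorm and storedNorm are hypothetical names used only for illustration, and the field boost is assumed to be 1.0.

```java
// Sketch only: local copies of the byte encoding, assumed to match
// org.apache.lucene.util.SmallFloat in Lucene 4.x (verify against the source).
class LengthNormDemo {

    // Assumed equivalent of SmallFloat.floatToByte315.
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            // Underflow: zero or negative maps to 0, tiny positives to 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            // Overflow: clamp to the largest encodable value.
            return -1;
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Assumed equivalent of SmallFloat.byte315ToFloat.
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Hypothetical helper: the length norm before encoding, with boost 1.0.
    static float rawNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Hypothetical helper: the norm after a round trip through one byte.
    static float storedNorm(int numTerms) {
        return byte315ToFloat(floatToByte315(rawNorm(numTerms)));
    }

    public static void main(String[] args) {
        // Prints term count, raw norm, and decoded norm, one row per count.
        for (int n = 1; n <= 5; n++) {
            System.out.printf("%d\t%.6f\t%.4f%n", n, rawNorm(n), storedNorm(n));
        }
    }
}
```

Under these assumptions, 3 and 4 terms are encoded to the same byte, so storedNorm(3) and storedNorm(4) are equal, matching the table.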

