The BM25 similarity computes the normalized length as the number of tokens,
by default ignoring synonyms (tokens stacked at the same position).
Then it encodes this length as an 8-bit integer in the index using this
logic:
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/SmallFl
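To give a feel for what such an encoding does, here is a rough Python sketch of a float-like 8-bit length encoding. It is written from my reading of the linked utility and may differ in detail from the current source, so treat the link as authoritative. The function names (`long_to_int4`, `int_to_byte4`, `byte4_to_int`) are hypothetical Python names chosen for this illustration, not Lucene API.

```python
def long_to_int4(i: int) -> int:
    """Float-like encoding: 3 mantissa bits plus a shift in the upper bits."""
    if i < 0:
        raise ValueError("only non-negative values are supported")
    num_bits = i.bit_length()
    if num_bits < 4:
        return i  # tiny values are stored as-is
    shift = num_bits - 4
    encoded = (i >> shift) & 0x07   # keep 4 significant bits, drop the implicit leading 1
    encoded |= (shift + 1) << 3     # store shift + 1; 0 is reserved for the tiny-value case
    return encoded


def int4_to_long(e: int) -> int:
    """Inverse of long_to_int4, returning the lower bound of the encoded bucket."""
    bits = e & 0x07
    shift = (e >> 3) - 1
    if shift == -1:
        return bits
    return (bits | 0x08) << shift   # restore the implicit leading 1


# Assumption: the largest length to support is a signed 32-bit integer.
MAX_INT4 = long_to_int4(2**31 - 1)
# Codes left unused by the float-like scheme are spent on exact small values.
NUM_FREE_VALUES = 255 - MAX_INT4


def int_to_byte4(length: int) -> int:
    """Encode a document length into a single unsigned byte (0..255)."""
    if length < 0:
        raise ValueError("only non-negative values are supported")
    if length < NUM_FREE_VALUES:
        return length
    return NUM_FREE_VALUES + long_to_int4(length - NUM_FREE_VALUES)


def byte4_to_int(code: int) -> int:
    """Decode a byte produced by int_to_byte4 back to an approximate length."""
    if code < NUM_FREE_VALUES:
        return code
    return NUM_FREE_VALUES + int4_to_long(code - NUM_FREE_VALUES)


if __name__ == "__main__":
    for n in (5, 39, 41, 100, 1000):
        print(n, "->", byte4_to_int(int_to_byte4(n)))
```

In this sketch, short lengths survive the round trip exactly, while longer ones are rounded down to the start of their bucket, so the scoring sees an approximate length rather than the exact token count.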
During indexing, an inverted index is built from the terms of the documents,
and statistics such as term frequency and document frequency are stored. If I
understand correctly, the exact document length is not stored in the index, in
order to reduce its size. Instead, a normalized length is stored for each document.
However, for