I gather that "term" is the proper technical term in both the Vector Space
Model (TF-IDF) and BM25 similarity, so it may simply be a question of where
Lucene draws the boundary between VSM processing and everything else, such
as the source of documents and queries.
-- Jack Krupansky
On Wed, Apr 20, 2016
My understanding is that a Term consists of a "token" and a field. So the
documentation makes sense to me: return the count of tokens in a field, for
example. But a couple of the references you cited don't match that
definition, such as the number of tokens in a collection.
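
To make the distinction concrete, here is a minimal sketch in plain Python.
It does not use Lucene's API: the Term tuple, the whitespace tokenizer, and
the sample document are all assumptions made up for illustration. It treats
a term as a (field, token text) pair and contrasts "number of tokens in a
field" with "frequency of a term".

```python
from collections import Counter
from typing import NamedTuple


class Term(NamedTuple):
    """Hypothetical stand-in for a term: a field name plus token text."""
    field: str
    text: str


def whitespace_tokens(value):
    # Assumed analyzer: lowercase, then split on whitespace.
    return value.lower().split()


def field_length(doc, field):
    # Number of tokens in one field of one document (repeats counted).
    return len(whitespace_tokens(doc.get(field, "")))


def term_frequencies(doc):
    # Occurrences of each (field, text) pair within the document.
    counts = Counter()
    for field, value in doc.items():
        for token in whitespace_tokens(value):
            counts[Term(field, token)] += 1
    return counts


# Made-up sample document with two fields.
doc = {"title": "the quick fox", "body": "the fox saw the other fox"}
freqs = term_frequencies(doc)
```

Under these assumptions, field_length(doc, "body") counts six tokens, while
the body field holds only four distinct terms, and "fox" in title is a
different term from "fox" in body.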
Looking at the Lucene Similarity Javadoc, I see some references to tokens,
but I am wondering if that is intentional or whether those should really be
references to terms.
For example:
lengthNorm - computed when the document is added to the index in
accordance with the number