For any further analysis of a result set, such as clustering, the TermVectorComponent returns the list of terms with their corresponding tf and idf. However, this list can be huge for each document, and most of the terms may have a low tf or too high a df. It might also be useful to compare the relative DF in the result set against the relative DF in the whole collection in order to improve the facets (that is, show only those terms whose relative DF in the query results is higher than in the full collection).
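To make the relative-DF idea concrete, here is a rough client-side sketch in Python. It is not part of Solr: the function name `distinctive_terms`, the `min_lift` parameter, and the sample counts are all made up for illustration, and it assumes the per-term document frequencies for the result set and for the collection have already been fetched somehow.

```python
def distinctive_terms(result_df, n_result, collection_df, n_collection, min_lift=1.0):
    """Keep terms whose relative DF in the result set exceeds the
    relative DF in the full collection (hypothetical helper, not a Solr API).

    result_df / collection_df: dict mapping term -> document frequency.
    n_result / n_collection: number of documents in each set.
    """
    kept = []
    for term, df_q in result_df.items():
        df_c = collection_df.get(term, 0)
        if df_c == 0 or n_result == 0:
            continue  # no baseline to compare against
        # lift = (df_q / n_result) / (df_c / n_collection)
        lift = (df_q * n_collection) / (df_c * n_result)
        if lift > min_lift:
            kept.append((term, lift))
    # Most over-represented terms first.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


# Made-up counts: 10 matching documents out of a 1000-document collection.
sample_result_df = {"solr": 8, "the": 10, "lucene": 5}
sample_collection_df = {"solr": 100, "the": 1000, "lucene": 50}
facet_terms = distinctive_terms(sample_result_df, 10, sample_collection_df, 1000)
```

With these sample numbers, "the" appears in every document of both sets, so its lift is exactly 1 and it is dropped, while "lucene" and "solr" are kept.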
To do this, it would be interesting if the TermVectorComponent could sort the results by one of these options:

* tf
* DF
* tf/df (to simplify) or tf*idf, where idf is computed as log(total_docs/df)

and truncate the list to a given number of words or at a given value. Or maybe there is another way to do this?

Joan
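As a rough illustration of the sorting and truncation described above, here is a small Python sketch that post-processes term statistics on the client side. It is only an assumption of how this could look, not an existing TermVectorComponent feature: `rank_terms`, its parameters, and the sample numbers are hypothetical.

```python
import math


def rank_terms(term_stats, total_docs, score="tfidf", top_n=None, min_score=None):
    """Sort terms by a chosen score and truncate the list
    (hypothetical helper, not a Solr API).

    term_stats: dict mapping term -> (tf, df) for one document.
    score: one of "tf", "df", "tf/df", "tfidf".
    """
    scorers = {
        "tf": lambda tf, df: tf,
        "df": lambda tf, df: df,
        "tf/df": lambda tf, df: tf / df,
        # idf computed as log(total_docs / df), as proposed in the post
        "tfidf": lambda tf, df: tf * math.log(total_docs / df),
    }
    fn = scorers[score]
    scored = [(term, fn(tf, df)) for term, (tf, df) in term_stats.items() if df > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    if min_score is not None:
        scored = [pair for pair in scored if pair[1] >= min_score]  # cut at a value
    if top_n is not None:
        scored = scored[:top_n]  # cut at a number of words
    return scored


# Made-up statistics for one document in a 1000-document collection.
sample_stats = {"solr": (5, 10), "the": (20, 1000), "facet": (3, 5)}
top_by_tfidf = rank_terms(sample_stats, total_docs=1000, score="tfidf", top_n=2)
top_by_ratio = rank_terms(sample_stats, total_docs=1000, score="tf/df")
```

With these sample numbers, "the" gets a tf*idf of 0 because it occurs in every document, so truncating to two terms leaves "solr" and "facet".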