It's probably also reasonable to have a way of retrieving the
TermVector (in the Lucene sense) as part of this component.
Thus, the component could retrieve/append:
1. The TV (terms + TF)
2. TV + offset + position (depending on what was set in the schema)
3. #2 + IDF
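A rough sketch of what the three response levels above could contain, assuming the document has already been analyzed into (term, start_offset, end_offset) tokens. `term_vector` and its `level` parameter are hypothetical names used only for illustration. This is not a Solr or Lucene API. The IDF is the textbook ln(N / df), which is an assumption and not necessarily what Lucene's scoring uses:

```python
import math

def term_vector(tokens, level=1, doc_freqs=None, num_docs=None):
    """Hypothetical helper: build a Lucene-style term vector for one document.

    tokens: list of (term, start_offset, end_offset) in document order
            (assumed to come from an analyzer; supplied by hand here).
    level 1: terms + TF
    level 2: level 1 + positions + offsets
    level 3: level 2 + IDF (needs doc_freqs and num_docs from the index)
    """
    tv = {}
    for position, (term, start, end) in enumerate(tokens):
        entry = tv.setdefault(term, {"tf": 0})
        entry["tf"] += 1
        if level >= 2:
            entry.setdefault("positions", []).append(position)
            entry.setdefault("offsets", []).append((start, end))
    if level >= 3:
        # Assumed textbook IDF: ln(total docs / docs containing the term).
        for term, entry in tv.items():
            entry["idf"] = math.log(num_docs / doc_freqs[term])
    return tv

tokens = [("solr", 0, 4), ("term", 5, 9), ("vector", 10, 16), ("term", 17, 21)]
print(term_vector(tokens, level=2)["term"])
```

With the sample tokens, level 2 yields `{'tf': 2, 'positions': [1, 3], 'offsets': [(5, 9), (17, 21)]}` for "term". Whether positions and offsets exist in practice would depend on what the schema stores for the field.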
-Grant
On Jul 24, 2008, at 1:05 PM, Shalin Shekhar Mangar wrote:
+1
On Thu, Jul 24, 2008 at 4:32 PM, Asharaf S <[EMAIL PROTECTED]> wrote:
I was wondering whether the following feature, if added, would add
some value to the SOLR framework.
What is needed?
A component that can return the TF-IDF vector for any given document
in the SOLR index.
Query: a document number, or a query identifying a document
Response: a map from every term in the selected document to its
TF-IDF value
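A minimal sketch of the requested response, assuming the corpus is available as a dict from a document id to its already-analyzed terms. `tfidf_vector` is a hypothetical name. Raw term count is used as TF and ln(N / df) as IDF, which is one common TF-IDF variant and is not necessarily the one a SOLR component would expose. A real component would read document frequencies from the index rather than recompute them on every request:

```python
import math
from collections import Counter

def tfidf_vector(doc_id, corpus):
    """Hypothetical helper: return {term: tf-idf} for corpus[doc_id].

    corpus: dict mapping doc id -> list of analyzed terms (assumed input).
    tf  = raw count of the term in the selected document
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(corpus)
    df = Counter()
    for terms in corpus.values():
        df.update(set(terms))  # count each term once per document
    tf = Counter(corpus[doc_id])
    return {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}

corpus = {
    1: ["solr", "lucene", "solr"],
    2: ["lucene", "index"],
    3: ["machine", "learning"],
}
print(tfidf_vector(1, corpus))
```

For document 1, "solr" gets 2 * ln(3) because it appears twice and only in that document. "lucene" gets ln(3/2) because it also appears in document 2.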
Why is it needed?
Most machine learning algorithms work on a TF-IDF representation of
documents, so adding a request handler that provides the TF-IDF
representation would pave the way for incorporating learning
paradigms into the SOLR framework.
thanks,
Asharaf
--
Regards,
Shalin Shekhar Mangar.
--------------------------
Grant Ingersoll
http://www.lucidimagination.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ