It's probably also reasonable to have a way of retrieving the TermVector (in the Lucene sense) as part of this component.

Thus, the component could retrieve/append:
1. The TV (terms + TF)
2. TV + offset + position (depending on what was set in the schema)
3. #2 + IDF
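
As a rough illustration of those three levels of detail, here is a minimal Python sketch. It is not Lucene or Solr code: the function name `term_vector`, its flags, the regex tokenizer, and the plain-dict response shape are all assumptions made up for this example.

```python
import re
from collections import defaultdict


def term_vector(text, with_positions=False, idf=None):
    """Return {term: info} for one document (hypothetical response shape).

    Level 1: info holds only "tf".
    Level 2: with_positions=True adds "positions" and character "offsets".
    Level 3: passing an idf lookup ({term: idf}) also adds "idf".
    """
    vector = defaultdict(lambda: {"tf": 0})
    # A naive lowercase word tokenizer stands in for the real analyzer.
    for position, match in enumerate(re.finditer(r"\w+", text.lower())):
        entry = vector[match.group()]
        entry["tf"] += 1
        if with_positions:
            entry.setdefault("positions", []).append(position)
            entry.setdefault("offsets", []).append((match.start(), match.end()))
    if idf is not None:
        for term, entry in vector.items():
            # Terms missing from the lookup get None rather than a guess.
            entry["idf"] = idf.get(term)
    return dict(vector)
```

Calling `term_vector(text)` gives option 1, adding `with_positions=True` gives option 2, and also passing an `idf` dictionary gives option 3. In a real component, whether positions and offsets are available would depend on how the field was configured in the schema.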

-Grant

On Jul 24, 2008, at 1:05 PM, Shalin Shekhar Mangar wrote:

+1

On Thu, Jul 24, 2008 at 4:32 PM, Asharaf S <[EMAIL PROTECTED]> wrote:

I was wondering whether the following feature, if added, would add some
value to the SOLR framework.

What is needed? A component that can return the TF-IDF vector for any given
document in the SOLR index.
Query: a document number, or a query identifying a document
Response: a map from each term in the selected document to its TF-IDF value
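
To make the requested response concrete, here is a minimal Python sketch over a tiny in-memory corpus. The function name `tfidf_vector`, the corpus layout, and the smoothed IDF variant `1 + ln(N / (df + 1))` are assumptions chosen for illustration; the actual component might weight terms differently.

```python
import math
from collections import Counter


def tfidf_vector(doc_id, docs):
    """Map each term of docs[doc_id] to tf * idf.

    docs: {doc_id: list of already-analyzed terms} (hypothetical layout).
    idf is assumed to be 1 + ln(N / (df + 1)), where N is the number of
    documents and df is the number of documents containing the term.
    """
    num_docs = len(docs)
    # Document frequency: count each term at most once per document.
    doc_freq = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))
    # Term frequency within the selected document only.
    term_freq = Counter(docs[doc_id])
    return {
        term: tf * (1.0 + math.log(num_docs / (doc_freq[term] + 1)))
        for term, tf in term_freq.items()
    }
```

For example, with `docs = {1: ["solr", "search", "solr"], 2: ["lucene", "search"], 3: ["solr", "index"]}`, calling `tfidf_vector(1, docs)` returns one weight per distinct term of document 1, which is the "map of term vs. TF-IDF value" described above.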


Why is it needed?

Most machine learning algorithms work on a TF-IDF representation of documents, so adding a request handler that provides the TF-IDF representation would pave the way for incorporating learning paradigms into the SOLR framework.


thanks,


Asharaf




--
Regards,
Shalin Shekhar Mangar.

--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ






