Hey guys, I think it is only fair to give you some feedback. I managed to implement the BM25+ <http://en.wikipedia.org/wiki/Okapi_BM25> term score on Mahout. It was straightforward, using the current TFIDF implementation as an example.
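For readers unfamiliar with the formula, here is a minimal standalone sketch of a BM25 term score. It is not the Mahout code described in this message: the class name Bm25Sketch, the parameter values k1 = 1.2 and b = 0.75, the avgDocLength argument, and the non-negative idf variant log(1 + (N - df + 0.5) / (df + 0.5)) are all assumptions made for illustration, and it shows plain Okapi BM25 rather than the BM25+ variant.

```java
// Illustrative only: a standalone Okapi BM25 term score, not Mahout code.
final class Bm25Sketch {
    // Commonly used defaults; assumed here, not taken from the post.
    static final double K1 = 1.2;
    static final double B = 0.75;

    // Non-negative idf variant: log(1 + (N - df + 0.5) / (df + 0.5)).
    static double idf(int df, int numDocs) {
        return Math.log(1.0 + (numDocs - df + 0.5) / (df + 0.5));
    }

    // tf: term frequency in the document, df: number of documents containing the term,
    // docLength: document length in terms, avgDocLength: mean length over the corpus,
    // numDocs: corpus size.
    static double score(int tf, int df, int docLength, double avgDocLength, int numDocs) {
        // Length normalization: documents longer than average are penalized.
        double norm = 1.0 - B + B * (docLength / avgDocLength);
        return idf(df, numDocs) * (tf * (K1 + 1.0)) / (tf + K1 * norm);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: term appears 3 times in a 120-term document,
        // in 10 of 1000 documents, with an average document length of 100.
        System.out.println(score(3, 10, 120, 100.0, 1000));
    }
}
```

Unlike plain tf-idf, the score saturates as tf grows and depends on the average document length, so a port needs that corpus-level statistic in addition to df and the document count.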
Basically what I did was implement the interface org.apache.mahout.vectorizer.Weight, create a BM25Converter and BM25PartialVectorReducer similar to TFIDFConverter <https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/vectorizer/tfidf/TFIDFConverter.html> and TFIDFPartialVectorReducer <https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/vectorizer/tfidf/TFIDFPartialVectorReducer.html> respectively.

cheers
Arian

Arian Pasquali
http://about.me/arianpasquali

2014-09-24 14:14 GMT+01:00 Arian Pasquali <ar...@arianpasquali.com>:

> Yes,
> I'm studying his work <http://nlp.uned.es/~jperezi/Lucene-BM25/> and the
> current mahout's tfidf code.
> Trying to understand how I would port that to mr.
> I ll try to share something if I succeed.
>
> Arian Pasquali
> http://about.me/arianpasquali
>
> 2014-09-24 5:12 GMT+01:00 Suneel Marthi <suneel.mar...@gmail.com>:
>
>> Lucene 4.x supports okapi-bm25. So it should be easy to implement.
>>
>> On Tue, Sep 23, 2014 at 11:57 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>
>> > Should be pretty easy. I haven't heard of anyone doing it.
>> >
>> > Sent from my iPhone
>> >
>> > > On Sep 23, 2014, at 18:53, Arian Pasquali <ar...@arianpasquali.com> wrote:
>> > >
>> > > Hi,
>> > > I was wondering if would be possible to support bm25 term weighting
>> > > extending Mahout's tf-idf implementation.
>> > >
>> > > I was curious to know if anyone here has already tried to do so.
>> > > If not, what would be your suggestion for such implementation on Mahout?
>> > >
>> > >
>> > > Arian Pasquali
>> > > http://about.me/arianpasquali