Hey guys,
I think it is only fair to give you some feedback.
I managed to implement BM25+ <http://en.wikipedia.org/wiki/Okapi_BM25> term
scoring on Mahout.
It was straightforward, using the current TFIDF implementation as an example.

Basically, what I did was implement the interface
org.apache.mahout.vectorizer.Weight and create a BM25Converter and a
BM25PartialVectorReducer, modeled on TFIDFConverter
<https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/vectorizer/tfidf/TFIDFConverter.html>
and
TFIDFPartialVectorReducer
<https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/vectorizer/tfidf/TFIDFPartialVectorReducer.html>
respectively.
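
A minimal sketch of how the Weight side of this could look, under stated
assumptions rather than the actual code: the Weight interface below is a
local stand-in assumed to mirror the shape of
org.apache.mahout.vectorizer.Weight (check the signature in your Mahout
version), and the class name BM25PlusWeight, the parameters k1, b and delta,
and passing the average document length through the constructor are all
hypothetical choices made for illustration. The formula is one common
statement of BM25+.

```java
// Local stand-in, ASSUMED to mirror org.apache.mahout.vectorizer.Weight.
interface Weight {
  double calculate(int tf, int df, int length, int numDocs);
}

// Hypothetical BM25+ weight; names and parameters are illustrative only.
final class BM25PlusWeight implements Weight {
  private final double k1;           // term-frequency saturation
  private final double b;            // strength of length normalization
  private final double delta;        // BM25+ lower bound for matched terms
  private final double avgDocLength; // corpus average, computed beforehand

  BM25PlusWeight(double k1, double b, double delta, double avgDocLength) {
    if (avgDocLength <= 0) {
      throw new IllegalArgumentException("avgDocLength must be positive");
    }
    this.k1 = k1;
    this.b = b;
    this.delta = delta;
    this.avgDocLength = avgDocLength;
  }

  @Override
  public double calculate(int tf, int df, int length, int numDocs) {
    // A term absent from the document (or the corpus) gets no weight.
    if (tf <= 0 || df <= 0) {
      return 0.0;
    }
    // IDF variant assumed here: log((N + 1) / df).
    double idf = Math.log((numDocs + 1.0) / df);
    // Length normalization relative to the average document length.
    double norm = k1 * (1.0 - b + b * length / avgDocLength);
    double tfPart = tf * (k1 + 1.0) / (tf + norm);
    return idf * (tfPart + delta);
  }
}
```

The average document length is not part of the calculate() arguments in
this sketch, so it would have to be computed in an earlier pass and handed
to the reducer, for example through the job configuration.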

cheers
Arian

Arian Pasquali
http://about.me/arianpasquali

2014-09-24 14:14 GMT+01:00 Arian Pasquali <ar...@arianpasquali.com>:

> Yes,
> I'm studying his work <http://nlp.uned.es/~jperezi/Lucene-BM25/> and
> Mahout's current tfidf code,
> trying to understand how I would port that to MapReduce.
> I'll try to share something if I succeed.
>
> Arian Pasquali
> http://about.me/arianpasquali
>
> 2014-09-24 5:12 GMT+01:00 Suneel Marthi <suneel.mar...@gmail.com>:
>
>> Lucene 4.x supports okapi-bm25. So it should be easy to implement.
>>
>> On Tue, Sep 23, 2014 at 11:57 PM, Ted Dunning <ted.dunn...@gmail.com>
>> wrote:
>>
>> > Should be pretty easy. I haven't heard of anyone doing it.
>> >
>> > Sent from my iPhone
>> >
>> > > On Sep 23, 2014, at 18:53, Arian Pasquali <ar...@arianpasquali.com> wrote:
>> > >
>> > > Hi,
>> > > I was wondering if it would be possible to support BM25 term weighting
>> > > by extending Mahout's tf-idf implementation.
>> > >
>> > > I was curious to know if anyone here has already tried to do so.
>> > > If not, what would you suggest for such an implementation on
>> > > Mahout?
>> > >
>> > >
>> > > Arian Pasquali
>> > > http://about.me/arianpasquali
>> >
>>
>
>
