Re: word weights using BM25

2014-10-01 Thread Arian Pasquali
suggestion? best Arian Pasquali http://about.me/arianpasquali 2014-10-01 13:10 GMT+01:00 Suneel Marthi : > How did you implement BM25PartialVectorReducer and BM25Converter? The > present implementations for TFIDFConverter and Reducer are MR. > Mahout is not accepting any new MapRedu

Re: word weights using BM25

2014-10-01 Thread Arian Pasquali
Hi Ted, My dataset is a collection of documents in German, and I can say that the scores seem better compared to my TF-IDF scores. The results make more sense now, especially for my bi-grams. Arian Pasquali http://about.me/arianpasquali 2014-10-01 13:09 GMT+01:00 Ted Dunning : > Thanks so much

Re: word weights using BM25

2014-10-01 Thread Arian Pasquali
adoc/org/apache/mahout/vectorizer/tfidf/TFIDFPartialVectorReducer.html> respectively. Cheers, Arian Arian Pasquali http://about.me/arianpasquali 2014-09-24 14:14 GMT+01:00 Arian Pasquali : > Yes, > I'm studying his work <http://nlp.uned.es/~jperezi/Lucene-BM25/> and the > cu

Re: word weights using BM25

2014-09-24 Thread Arian Pasquali
Yes, I'm studying his work <http://nlp.uned.es/~jperezi/Lucene-BM25/> and Mahout's current TF-IDF code, trying to understand how I would port that to MapReduce. I'll try to share something if I succeed. Arian Pasquali http://about.me/arianpasquali 2014-09-24 5:12 GMT+01:

word weights using BM25

2014-09-23 Thread Arian Pasquali
Hi, I was wondering if it would be possible to support BM25 term weighting by extending Mahout's TF-IDF implementation. I was curious to know if anyone here has already tried to do so. If not, what would be your suggestion for such an implementation in Mahout? Arian Pasquali http://abo
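
For readers unfamiliar with the weighting asked about here, the following is a minimal Python sketch of the standard Okapi BM25 term weight. It is not Mahout code and not the implementation discussed in this thread: the function name `bm25_weight`, its parameter names, and the defaults k1 = 1.2 and b = 0.75 are assumptions chosen for illustration, and the IDF uses the non-negative `log(1 + ...)` variant.

```python
import math


def bm25_weight(tf, df, num_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Illustrative BM25 weight of one term in one document.

    tf          -- occurrences of the term in the document
    df          -- number of documents containing the term
    num_docs    -- total number of documents in the collection
    doc_len     -- length of this document (in terms)
    avg_doc_len -- average document length in the collection
    k1, b       -- assumed defaults: tf saturation and length normalization
    """
    if tf <= 0 or df <= 0:
        return 0.0
    # Inverse document frequency; the "1 +" keeps the value non-negative.
    idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
    # Length normalization: longer-than-average documents are penalized.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    # Saturating term-frequency component, bounded above by (k1 + 1).
    return idf * tf * (k1 + 1.0) / (tf + norm)
```

Unlike plain TF-IDF, the term-frequency part saturates (it can never exceed k1 + 1 times the IDF), and the per-document weight depends on the document length and the collection's average length, so a port of a TF-IDF pipeline would need those two extra statistics.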

Re: Clusterdump output format

2014-07-30 Thread Arian Pasquali
Actually, I'm having the same situation with Mahout 0.9. The TEXT dump works fine, printing the clusters and their representative points. When I change the output format to CSV, it dumps an empty file :/ I'm stuck here too. Arian Pasquali On Wed, Jul 30, 2014 at 3:38 AM -0700, "Oisin Boydell"