Re: Regarding ItemBased Recommendation Results

2013-04-01 Thread Sebastian Schelter
It could also be due to the way in which the Pearson correlation is calculated in both implementations. The distributed implementation centers all item vectors, scales them to unit length and computes dot products afterwards. The single machine implementation centers only based on the common inte
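
To illustrate why the two centering strategies can disagree, here is a minimal Python sketch, not Mahout code. The first function follows the description above: center each item vector on its own mean, scale it to unit length, then take the dot product. The snippet is cut off before it finishes describing the single-machine variant, so the second function is only an assumption about what is meant: a textbook Pearson correlation computed over the users who rated both items. All function names and the toy data are hypothetical.

```python
import math


def similarity_centered_unit_dot(item_a, item_b):
    """Center each item vector on its own mean, scale to unit length, dot product."""
    def normalize(item):
        mean = sum(item.values()) / len(item)
        centered = {user: rating - mean for user, rating in item.items()}
        norm = math.sqrt(sum(v * v for v in centered.values()))
        if norm == 0.0:
            return {user: 0.0 for user in centered}
        return {user: v / norm for user, v in centered.items()}

    a, b = normalize(item_a), normalize(item_b)
    # Only users present in both vectors contribute to the dot product.
    return sum(a[user] * b[user] for user in a.keys() & b.keys())


def pearson_co_rated(item_a, item_b):
    """ASSUMPTION: Pearson correlation restricted to users who rated both items."""
    common = item_a.keys() & item_b.keys()
    if len(common) < 2:
        return 0.0
    mean_a = sum(item_a[u] for u in common) / len(common)
    mean_b = sum(item_b[u] for u in common) / len(common)
    num = sum((item_a[u] - mean_a) * (item_b[u] - mean_b) for u in common)
    den = math.sqrt(sum((item_a[u] - mean_a) ** 2 for u in common)) * math.sqrt(
        sum((item_b[u] - mean_b) ** 2 for u in common)
    )
    return num / den if den else 0.0


# Hypothetical toy ratings: user id -> rating.
item_a = {"u1": 5.0, "u2": 3.0, "u3": 4.0}
item_b = {"u1": 4.0, "u2": 2.0, "u4": 1.0}

global_sim = similarity_centered_unit_dot(item_a, item_b)
co_rated_sim = pearson_co_rated(item_a, item_b)
```

On this toy data the two values differ noticeably (roughly 0.65 versus a value very close to 1.0), because ratings from users who rated only one of the items shift the mean and the length of each vector in the first variant but are ignored in the second.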

Re: Regarding ItemBased Recommendation Results

2013-04-01 Thread Phoenix Bai
Raju, like Sebastian said, it is probably due to the default sampling restriction of the Hadoop-based implementation. maxPrefsPerUserInItemSimilarity", "max number of preferences to consider per user in the " + "item similarity computation phase, users with more preferences will be sampled d
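
As a rough illustration of what a per-user preference cap does, here is a small Python sketch. It is not Mahout's sampling code and makes no claim about how Mahout chooses which preferences to keep. The function name `cap_preferences`, the uniform random sampling, and the toy data are assumptions for illustration only. The parameter name echoes the option quoted above.

```python
import random


def cap_preferences(prefs_by_user, max_prefs_per_user, seed=None):
    """Keep at most max_prefs_per_user preferences per user (uniform sampling assumed)."""
    rng = random.Random(seed)
    capped = {}
    for user, prefs in prefs_by_user.items():
        if len(prefs) > max_prefs_per_user:
            # Users above the cap lose some preferences before similarities are computed.
            capped[user] = rng.sample(prefs, max_prefs_per_user)
        else:
            capped[user] = list(prefs)
    return capped


# Hypothetical data: user id -> list of (item id, rating) pairs.
prefs = {
    "heavy_user": [("item%d" % i, 4.0) for i in range(10)],
    "light_user": [("item1", 5.0), ("item2", 3.0)],
}
capped = cap_preferences(prefs, max_prefs_per_user=3, seed=42)
```

Because heavy users contribute only a sample of their preferences, item similarities computed after such a cap can differ from those computed on the full data, which is one way results from two implementations can diverge.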

* Using lucene as pre processing for document clustering *

2013-04-01 Thread Rajesh Nikam
Hello, I want to cluster documents. Lucene looks like a great help for preprocessing, e.g. using 'StandardAnalyzer' to remove stop words, apply stemming, etc. Mahout requires input in its vector format, and it provides the following props for this, which I have used. org.apache.mahout.utils.vectors.lucene.

Re: Parallel GenericRecommenderIRStatsEvaluator?

2013-04-01 Thread Sean Owen
No, it was just never written back in the day, I suppose. The way it is structured now, it creates a test split for each user, which is also slow and may strain memory limits, as that's N data models in memory. You could take a crack at a patch. When I rewrote this aspect in a separate

Parallel GenericRecommenderIRStatsEvaluator?

2013-04-01 Thread Gabor Bernat
Hello, is there any good reason why the *GenericRecommenderIRStatsEvaluator* does not support parallel (multi-CPU) evaluation? Today it is quite common to have CPUs with more than one core, and IR evaluation on any reasonably sized data set takes forever to finish. I'm thinking if we could paralleliz