Question about Pearson Correlation in non-Taste mode

2013-11-26 Thread Amit Nithian
Hi all, Apologies if this is a repeat question, as I just joined the list, but I have a question about the way that metrics like Cosine and Pearson are calculated in Hadoop "mode" (i.e. non-Taste). As far as I understand, the vectors used for computing pairwise item similarity in Taste are based on

Re: Install Mahout on Eclipse Kepler on Windows

2013-11-26 Thread Amit Nithian
I haven't tried Mahout on Windows in a while, but the Eclipse part should just be a matter of running mvn eclipse:eclipse and importing the project(s) into your workspace. Then you should be able to use the LocalJobRunner to do some basic testing in Eclipse. Cheers Amit On Tue, Nov 26, 20

Re: Question about Pearson Correlation in non-Taste mode

2013-11-27 Thread Amit Nithian
Thanks Sebastian! Is there a particular reason for that? On Nov 27, 2013 7:47 AM, "Sebastian Schelter" wrote: > Hi Amit, > > You are right, the non-co-rated items are not filtered out in the > distributed implementation. > > --sebastian > > > On 26.11.2013

Re: Question about Pearson Correlation in non-Taste mode

2013-11-27 Thread Amit Nithian
arity between two vectors of different length which essentially is going on here I think? Thanks again Amit On Nov 27, 2013 9:06 AM, "Sebastian Schelter" wrote: > Yes, it is due to the parallel algorithm which only looks at co-ratings > from a given user. > > > On 27.11.

Re: Question about Pearson Correlation in non-Taste mode

2013-11-27 Thread Amit Nithian
test. > > The distributed code doesn't look at vectors of different lengths, but > simply assumes non-existent ratings as zero. > > --sebastian > > On 27.11.2013 16:09, Amit Nithian wrote: > > Comparing this against the non distributed (taste) gives different > answe
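The effect of treating missing ratings as zero, rather than restricting to co-rated positions, can be sketched in plain Java. This is not Mahout's code: the class and method names are hypothetical, zero is assumed to stand for "no rating", and the textbook Pearson formula is applied to whole arrays, so the centering may differ from what the distributed implementation actually does. The vectors are the [5 - 4] and [4 5 2] example mentioned elsewhere in this thread.

```java
public class PearsonSketch {

    // Textbook Pearson correlation over two equal-length arrays.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double num = 0.0, sx = 0.0, sy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            num += dx * dy;
            sx += dx * dx;
            sy += dy * dy;
        }
        double denom = Math.sqrt(sx * sy);
        return denom == 0.0 ? 0.0 : num / denom;
    }

    // Pearson restricted to positions where both entries are non-zero
    // (assumption: 0.0 encodes a missing rating).
    static double pearsonCoRated(double[] x, double[] y) {
        int count = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] != 0.0 && y[i] != 0.0) count++;
        }
        double[] cx = new double[count];
        double[] cy = new double[count];
        int k = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] != 0.0 && y[i] != 0.0) {
                cx[k] = x[i];
                cy[k] = y[i];
                k++;
            }
        }
        return pearson(cx, cy);
    }

    public static void main(String[] args) {
        double[] a = {5, 0, 4}; // [5 - 4], missing rating stored as 0
        double[] b = {4, 5, 2}; // [4 5 2]
        System.out.println("co-rated only: " + pearsonCoRated(a, b));
        System.out.println("zero-filled:   " + pearson(a, b));
    }
}
```

Under these assumptions the co-rated variant gives a correlation of about 1.0, while the zero-filled variant gives roughly -0.62, so the two approaches can disagree even on the sign.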

Re: Question about Pearson Correlation in non-Taste mode

2013-11-29 Thread Amit Nithian
r it's very possible I am completely misunderstanding something :-). Thanks again! Amit On Wed, Nov 27, 2013 at 8:23 AM, Amit Nithian wrote: > Hey Sebastian, > > Thanks again. Actually I'm glad that I am talking to you as it's your > paper and presentation I have quest

Re: Question about Pearson Correlation in non-Taste mode

2013-11-29 Thread Amit Nithian
e and the expression m_x e . y can be computed > (at least in Mahout) in map-reduce idiom as > > y.aggregate(Functions.PLUS, Functions.mult(m_x)) > > > > > On Fri, Nov 29, 2013 at 9:31 PM, Amit Nithian wrote: > > > Okay so I rethought my question and realized

Re: Question about Pearson Correlation in non-Taste mode

2013-11-30 Thread Amit Nithian
orm(1) / vector.getNumNonZeroElements(); Which looks like it's taking the sum and dividing by the number of defined elements. Which would make my [5 - 4] average be 4.5. Thanks again Amit On Fri, Nov 29, 2013 at 10:34 PM, Ted Dunning wrote: > On Fri, Nov 29, 2013 at 10:16 PM, Amit Nithian wrote: > > >
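The averaging described in the message above (sum divided by the number of defined elements) can be illustrated with a small plain-Java sketch. It does not use Mahout's Vector class; the class and method names are hypothetical, and zero is assumed to mean "no rating".

```java
public class SparseMeanSketch {

    // L1 norm divided by the number of non-zero entries,
    // i.e. the mean over defined ratings only.
    static double meanOfDefined(double[] v) {
        double l1 = 0.0;
        int nonZero = 0;
        for (double x : v) {
            if (x != 0.0) {
                l1 += Math.abs(x);
                nonZero++;
            }
        }
        return nonZero == 0 ? 0.0 : l1 / nonZero;
    }

    // Mean over every position, counting missing ratings as zero.
    static double meanOverAll(double[] v) {
        if (v.length == 0) return 0.0;
        double sum = 0.0;
        for (double x : v) {
            sum += x;
        }
        return sum / v.length;
    }

    public static void main(String[] args) {
        double[] ratings = {5, 0, 4}; // [5 - 4]
        System.out.println(meanOfDefined(ratings)); // prints 4.5
        System.out.println(meanOverAll(ratings));   // prints 3.0
    }
}
```

For [5 - 4] the mean over defined elements is 4.5, matching the figure in the message, whereas counting the missing rating as zero would give 3.0.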

Re: Question about Pearson Correlation in non-Taste mode

2013-12-01 Thread Amit Nithian
Thanks guys! So the real question is not so much what's the average of the vector with the missing rating (although yes that was a question) but what's the average of the vector with all the ratings specified but the second rating that is not shared with the first user: [5 - 4] vs [4 5 2]. If we a

Re: Clustering without Hadoop

2013-12-01 Thread Amit Nithian
When you say without Hadoop, does that include local mode? You can run these examples in local mode, which doesn't require a cluster, for testing and poking around. Everything then runs in a single JVM. On Dec 1, 2013 9:18 PM, "Shan Lu" wrote: > Hi, > > I am working on a very simple k-means clusterin

Re: Question about Pearson Correlation in non-Taste mode

2013-12-06 Thread Amit Nithian
glikelihood ratio test > give much better results. The current implementation of Mahout's > distributed item-based recommender is clearly designed and tuned for the > latter use case. > > I hope that answers your question. > > --sebastian > > On 01.12.2013 18:10, Amit Nithian w