The sequential and mapreduce implementations do not produce the same results because the sequential implementation runs canopy once, while the mapreduce implementation runs it twice: once in each mapper and then again in the reducer, over the canopy centers emitted by the mappers. This is documented at https://cwiki.apache.org/confluence/display/MAHOUT/Canopy+Clustering (see #10).
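
To make the one-pass versus two-pass difference concrete, here is a rough Python sketch. It is not Mahout code: the function names, the toy 1-D data, the fixed two-way split, and the use of the first point as the canopy center are all assumptions made for illustration. T3/T4 default to T1/T2, as in the setup described in the thread.

```python
from math import dist


def centroid(members):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(axis) / len(members) for axis in zip(*members))


def canopy_centers(points, t1, t2):
    """One canopy pass (simplified sketch, not Mahout's implementation).

    A point joins every canopy whose center is closer than t1. If it is
    closer than t2 to some center it is "strongly bound" and does not
    start a canopy of its own. Returns the centroid of each canopy.
    """
    assert t1 > t2, "canopy clustering expects T1 > T2"
    canopies = []  # list of (center, members)
    for p in points:
        strongly_bound = False
        for center, members in canopies:
            d = dist(p, center)
            if d < t1:
                members.append(p)
            if d < t2:
                strongly_bound = True
        if not strongly_bound:
            canopies.append((p, [p]))
    return [centroid(members) for _, members in canopies]


def sequential(points, t1, t2):
    """Sequential case: a single canopy pass over all of the data."""
    return canopy_centers(points, t1, t2)


def mapreduce(splits, t1, t2, t3=None, t4=None):
    """MapReduce case: one canopy pass per split ("mapper"), then a
    second pass ("reducer") over the centers the mappers produced."""
    t3 = t1 if t3 is None else t3
    t4 = t2 if t4 is None else t4
    mapper_output = []
    for split in splits:
        mapper_output.extend(canopy_centers(split, t1, t2))
    return canopy_centers(mapper_output, t3, t4)


# Hypothetical toy data: four evenly spaced 1-D points.
POINTS = [(0.0,), (2.0,), (4.0,), (6.0,)]
SPLITS = [POINTS[:2], POINTS[2:]]  # pretend there are two mappers

if __name__ == "__main__":
    print("sequential:", sequential(POINTS, t1=3.0, t2=1.5))
    print("mapreduce: ", mapreduce(SPLITS, t1=3.0, t2=1.5))
```

On this toy input the sequential pass ends with four canopies, while the two-pass version ends with two, because the reducer merges mapper centers that fall within T4 of each other. That is only an illustration of the mechanism; how large the difference is on real data depends on the thresholds and on how the input is split.
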

-----Original Message-----
From: Paritosh Ranjan [mailto:[email protected]]
Sent: Sunday, October 02, 2011 8:59 PM
To: [email protected]
Subject: Re: Difference in results : Clustering : sequential and MapReduce

The sequential algorithm finds more/better clusters than the mapreduce one. There's not a huge difference, but the standalone one is better for sure.

Thanks and Regards,
Paritosh

On 03-10-2011 01:47, Konstantin Shmakov wrote:
> I'd assume that distributed and sequential algorithms shouldn't produce
> identical results. To start with, they differ in initial setup:
> -- In distributed algorithm each mapper deals with subset of data and starts
> by picking up a random point, so N random points are picked up by N mappers
> to start with.
> -- In sequential algorithm 1 mapper deals with all data and starts by
> picking up 1 random point.
> But for the data with real clusters both algorithms should produce similar
> results. How different are the results in your case?
>
> Thanks
> --Konstantin
>
> On Sun, Oct 2, 2011 at 1:36 AM, Paritosh Ranjan<[email protected]> wrote:
>
>> Even run() of CanopyDriver, which takes only T1 and T2 is giving different
>> results for sequential and mapreduce.
>> This is preventing me from scaling up, as I need to run mapreduce on hadoop
>> to scale.
>>
>> Is anyone having any idea of this problem?
>>
>> On 02-10-2011 00:27, Paritosh Ranjan wrote:
>>
>>> Hi,
>>>
>>> I am able to cluster correctly sequentially, using CanopyDriver.
>>>
>>> However, the same dataset, when processed as a MapReduce job, where ( t1 =
>>> t3 and t2 = t4 and t1>t2) is not working. I am getting errors like Canopies
>>> are empty.
>>>
>>> I also tried to reduce the values of t3 and t4. But reducing it either has
>>> no effect or gives meaningless results.
>>>
>>> Am I doing something wrong? or is there a bug somewhere?
>>>
>>> I feel that both, sequential and MapReduce should give similar results.
>>> But, It is not happening.
>>>
>>> Thanks and Regards,
>>> Paritosh
