Re: Help regarding ClusterOutputPostProcessor

2012-04-29 Thread praneet mhatre
Great, that helps! I'll just go ahead with the output file then and see what kind of results I get. Thank you! On Fri, Apr 27, 2012 at 12:36 PM, Jeff Eastman wrote: > I think the answer to this question lies in how Dirichlet works: During > each iteration, all points are assigned to clusters bas

Re: Help regarding ClusterOutputPostProcessor

2012-04-27 Thread Jeff Eastman
I think the answer to this question lies in how Dirichlet works: During each iteration, all points are assigned to clusters based upon a probabilistic assignment using a multinomial sampling of the cluster pdfs times a Dirichlet distribution mixture (see DirichletClusteringPolicy.select() for e
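The assignment step described above can be illustrated with a small sketch. It assumes only what the message states: each cluster's pdf at the point is multiplied by that cluster's mixture weight, and one multinomial draw picks the cluster. The class and method names (DirichletAssignmentSketch, sampleCluster) are hypothetical; this is not Mahout's DirichletClusteringPolicy.select() code. It shows why the same point can land in different clusters on different draws.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical sketch: choose a cluster for one point by a single multinomial
 * draw over (mixture weight * cluster pdf at the point). Not Mahout's code.
 */
final class DirichletAssignmentSketch {

    /** Returns the index of the sampled cluster. */
    static int sampleCluster(double[] mixture, double[] pdfAtPoint, Random rng) {
        if (mixture.length != pdfAtPoint.length) {
            throw new IllegalArgumentException("mixture and pdf arrays must have the same length");
        }
        // Unnormalized probabilities: mixture weight times the cluster's pdf at the point.
        double[] weights = new double[mixture.length];
        double total = 0.0;
        for (int k = 0; k < mixture.length; k++) {
            weights[k] = mixture[k] * pdfAtPoint[k];
            total += weights[k];
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("all weights are zero");
        }
        // Draw u in [0, total) and walk the cumulative sum: one multinomial draw.
        double u = rng.nextDouble() * total;
        double cumulative = 0.0;
        for (int k = 0; k < weights.length; k++) {
            cumulative += weights[k];
            if (u < cumulative) {
                return k;
            }
        }
        return weights.length - 1; // guard against floating-point round-off
    }

    public static void main(String[] args) {
        double[] mixture = {0.5, 0.3, 0.2};   // made-up mixture weights
        double[] pdf = {0.10, 0.40, 0.05};    // made-up pdf values at one point
        Random rng = new Random(42);
        int[] counts = new int[3];
        for (int i = 0; i < 10_000; i++) {
            counts[sampleCluster(mixture, pdf, rng)]++;
        }
        // The same point, with the same pdfs, is spread across all three clusters.
        System.out.println(Arrays.toString(counts));
    }
}
```

Because each iteration redraws every assignment, a count taken in one pass and a point listing produced in another pass need not agree exactly.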

Re: Help regarding ClusterOutputPostProcessor

2012-04-27 Thread Paritosh Ranjan
To answer: "I was wondering if the clusteredPoints directory contains the correct point assignment and if I could just use that for the purpose of my project." I would say "Yes". If you read the comments in the issue, you will find that "The number of members printed by the clusterdum

Re: Help regarding ClusterOutputPostProcessor

2012-04-26 Thread praneet mhatre
Hi, I had a look at the JIRA and it looks like the issue is still unresolved. I wanted to know if the suggestion that the postprocessor may be at fault has been verified. I am using Dirichlet clustering for a project of mine and I also noticed the mismatch between the number of points actually prese
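One way to confirm such a mismatch is to tally the point-to-cluster assignments yourself and compare them with the n= value reported for each cluster. The sketch below assumes the assignments and the reported counts have already been read into in-memory maps; the class and method names (ClusterCountCheck, countByCluster, mismatches) and the sample data are hypothetical, and reading the actual clusteredPoints or clusterdumper output is left out.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical consistency check: count points per cluster from a
 * pointId -> clusterId map and compare with the reported n= values.
 */
final class ClusterCountCheck {

    /** Counts how many points were assigned to each cluster id. */
    static Map<Integer, Integer> countByCluster(Map<String, Integer> assignment) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int clusterId : assignment.values()) {
            counts.merge(clusterId, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns clusterId -> {reported n, tallied count} for every disagreement. */
    static Map<Integer, int[]> mismatches(Map<Integer, Integer> reportedN,
                                          Map<Integer, Integer> tallied) {
        // Check the union of ids so clusters missing from either side are reported too.
        TreeSet<Integer> ids = new TreeSet<>(reportedN.keySet());
        ids.addAll(tallied.keySet());
        Map<Integer, int[]> out = new TreeMap<>();
        for (int id : ids) {
            int expected = reportedN.getOrDefault(id, 0);
            int actual = tallied.getOrDefault(id, 0);
            if (expected != actual) {
                out.put(id, new int[] {expected, actual});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up example data.
        Map<String, Integer> assignment = Map.of("p1", 0, "p2", 0, "p3", 1, "p4", 1, "p5", 1);
        Map<Integer, Integer> reported = Map.of(0, 3, 1, 3);
        mismatches(reported, countByCluster(assignment)).forEach((id, v) ->
            System.out.println("cluster " + id + ": reported n=" + v[0] + ", found " + v[1]));
    }
}
```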

Re: Help regarding ClusterOutputPostProcessor

2012-01-30 Thread gaurav redkar
Hello. As Jeff mentioned, I created a JIRA issue. Kindly check out MAHOUT-966 and share your inputs. Thanks, Gaurav On Wed, Jan 25, 2012 at 8:51 PM, Jeff Eastman wrote: > Mean Shift accumulates the pointIds of every point assigned to a cluster,

Re: Help regarding ClusterOutputPostProcessor

2012-01-25 Thread Jeff Eastman
Mean Shift accumulates the pointIds of every point assigned to a cluster, so I would expect n= to be correct in the cluster dumper output. It is most likely that the postprocessor is misbehaving. Please create a JIRA and attach your dataset and we will take a look at it. It would also be useful for

Re: Help regarding ClusterOutputPostProcessor

2012-01-25 Thread gaurav redkar
Hello, I was able to rectify the aforementioned problem after I implemented a custom partitioner instead of using the default hash partitioner. I have another issue, though. After running the post processor, the number of points that each cluster contains does not match the number of points each
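Why a custom partitioner can matter here can be sketched without Hadoop on the classpath. Hadoop's default HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, and an IntWritable's hash code is its value, so sparse cluster ids can all collide on one reducer. The sketch assumes one reducer per cluster and contrasts that formula with a dense id-to-index mapping. The names (ClusterPartitioningSketch, hashPartition, densePartitions) and cluster ids are hypothetical; the actual partitioner used in this thread is not shown.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (no Hadoop dependency): default hash partitioning of
 * cluster ids versus giving each distinct cluster id its own partition.
 */
final class ClusterPartitioningSketch {

    /** Same formula as Hadoop's default HashPartitioner for an int-valued key. */
    static int hashPartition(int clusterId, int numReducers) {
        // Integer.hashCode(x) == x, like IntWritable.hashCode().
        return (Integer.hashCode(clusterId) & Integer.MAX_VALUE) % numReducers;
    }

    /** Maps each distinct cluster id to a dense partition index 0..k-1. */
    static Map<Integer, Integer> densePartitions(List<Integer> clusterIds) {
        Map<Integer, Integer> partitionOf = new HashMap<>();
        for (int id : clusterIds) {
            partitionOf.putIfAbsent(id, partitionOf.size());
        }
        return partitionOf;
    }

    public static void main(String[] args) {
        // Made-up sparse cluster ids; one reducer per cluster.
        List<Integer> ids = List.of(3, 7, 11, 19);
        int reducers = ids.size();
        Map<Integer, Integer> dense = densePartitions(ids);
        for (int id : ids) {
            System.out.println("cluster " + id
                + " -> hash partition " + hashPartition(id, reducers)
                + ", dense partition " + dense.get(id));
        }
    }
}
```

With these ids every cluster hashes to partition 3, leaving three reducers idle, while the dense mapping sends each cluster to a different reducer.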

Re: Help regarding ClusterOutputPostProcessor

2012-01-06 Thread Paritosh Ranjan
ClusterOutputPostProcessorDriver has options to run either sequentially or in a mapreduce way. If the clustering was done sequentially, then ClusterOutputPostProcessor should be run sequentially, and if the clustering was done in a mapreduce way, then run the ClusterOutputPostProcessor with option map

Re: Help regarding ClusterOutputPostProcessor

2012-01-06 Thread Lance Norskog
Apache mail throws away all attachments. If you think that this is a bug, please file a JIRA. If you can change ClusterOutputPostProcessorTest to test for this scenario, please contribute it. With this it is possible to single-step map-reduce jobs inside your IDE. Sometimes these directory manipul