Re: Naïve k-means using hadoop

2013-03-27 Thread Mark Miller
On Mar 27, 2013, at 12:47 PM, Ted Dunning wrote: > And, of course, due credit should be given here. The advanced clustering algorithms in Crunch were lifted from the new stuff in Mahout pretty much step for step. The Mahout group would have loved to have contributions from the Cloudera

Re: Naïve k-means using hadoop

2013-03-27 Thread Ted Dunning
Spark would be an excellent choice for the iterative sort of k-means. It could be good for sketch-based algorithms as well, but the difference would be much less pronounced. On Wed, Mar 27, 2013 at 3:39 PM, Charles Earl wrote: > I would think also that starting with centers in some in-memory Hadoop platform like Spark would also be a valid approach.
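For contrast, the sketch-based idea mentioned here can be illustrated in a few lines of plain Python: a single pass over the data builds a small weighted summary, so there is little iteration for an in-memory platform to speed up. (This toy is only meant to show the shape of the idea; it is not Mahout's streaming k-means, and the distance threshold is arbitrary.)

```python
import math

def one_pass_sketch(points, threshold=1.0):
    # summary holds (centroid, weight) pairs. A point folds into the nearest
    # centroid when it is close enough; otherwise it opens a new centroid.
    summary = []
    for p in points:
        if summary:
            i = min(range(len(summary)), key=lambda i: math.dist(summary[i][0], p))
            c, w = summary[i]
            if math.dist(c, p) <= threshold:
                # Weighted mean of the old centroid and the new point.
                summary[i] = (tuple((cj * w + pj) / (w + 1) for cj, pj in zip(c, p)),
                              w + 1)
                continue
        summary.append((p, 1))
    return summary

print(one_pass_sketch([(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0), (0.0, 0.1)]))
```

A second, cheap clustering of the few weighted summary points would then produce the final k centers; that last step is small enough to run anywhere.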

Re: Naïve k-means using hadoop

2013-03-27 Thread Ted Dunning
And, of course, due credit should be given here. The advanced clustering algorithms in Crunch were lifted from the new stuff in Mahout pretty much step for step. The Mahout group would have loved to have contributions from the Cloudera guys instead of re-implementation, but you can't legislate taste.

Re: Naïve k-means using hadoop

2013-03-27 Thread Charles Earl
I would think also that starting with centers in some in-memory Hadoop platform like Spark would also be a valid approach. I think the Spark demo assumes that the data set is cached, vs. just the centers. C On Mar 27, 2013, at 9:24 AM, Bertrand Dechoux wrote: > And there is also Cascading ;) : http://www.cascading.org/

Re: Naïve k-means using hadoop

2013-03-27 Thread Bertrand Dechoux
And there is also Cascading ;) : http://www.cascading.org/ But like Crunch, this is Hadoop. Both are 'only' higher-level APIs for MapReduce. As for the number of reducers, you will have to do the math yourself, but I highly doubt that more than one reducer is needed (imho). But you can indeed distribute

Re: Naïve k-means using hadoop

2013-03-27 Thread Yaron Gonen
Thanks! *Bertrand*: I don't like the idea of using a single reducer. A better way for me is to write all the output of all the reducers to the same directory, and then distribute all the files. I know about Mahout of course, but I want to implement it myself. I will look at the documentation though

Re: Naïve k-means using hadoop

2013-03-27 Thread Harsh J
If you're also a fan of doing things the better way, you can also check out some Apache Crunch (http://crunch.apache.org) ways of doing this via https://github.com/cloudera/ml (blog post: http://blog.cloudera.com/blog/2013/03/cloudera_ml_data_science_tools/). On Wed, Mar 27, 2013 at 3:29 PM, Yaron Gonen wrote:

Re: Naïve k-means using hadoop

2013-03-27 Thread Bertrand Dechoux
Of course, you should check out Mahout, at least the documentation, even if you really want to implement it by yourself. https://cwiki.apache.org/MAHOUT/k-means-clustering.html Regards Bertrand On Wed, Mar 27, 2013 at 1:34 PM, Bertrand Dechoux wrote: > Actually for the first step, the client could create a file with the centers

Re: Naïve k-means using hadoop

2013-03-27 Thread Bertrand Dechoux
Actually for the first step, the client could create a file with the centers and then put it on HDFS and use it with the distributed cache. A single reducer might be enough, and in that case its only responsibility is to create the file with the updated centers. You can then use this new file again in the next iteration
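A sketch of that control flow in plain Python, with a local file standing in for the centers file on HDFS and the distributed cache (the paths, the step function, and the convergence threshold are all illustrative, not from the thread):

```python
import json
import math
import os
import tempfile

def lloyd_step(vectors, centers):
    # One full MapReduce job's worth of work: assign each vector to its
    # nearest center, then average each group into a new center.
    groups = {i: [] for i in range(len(centers))}
    for v in vectors:
        groups[min(groups, key=lambda i: math.dist(centers[i], v))].append(v)
    return [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in groups.items()]

def run_driver(vectors, centers, max_iter=20, eps=1e-6):
    workdir = tempfile.mkdtemp()
    for it in range(1, max_iter + 1):
        path = os.path.join(workdir, f"centers-{it}.json")
        with open(path, "w") as f:   # the single reducer writes the new file
            json.dump(centers, f)
        with open(path) as f:        # mappers read it back via the "cache"
            cached = [tuple(c) for c in json.load(f)]
        new_centers = lloyd_step(vectors, cached)
        # Stop when no center moved more than eps.
        if all(math.dist(a, b) <= eps for a, b in zip(centers, new_centers)):
            return new_centers, it
        centers = new_centers
    return centers, max_iter
```

For example, `run_driver([(0.0, 0.0), (0.2, 0.0), (4.0, 4.0), (4.2, 4.0)], [(0.0, 0.0), (1.0, 1.0)])` converges in two iterations; on Hadoop each loop body would instead launch a job, with the client deciding whether to go around again.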

Naïve k-means using hadoop

2013-03-27 Thread Yaron Gonen
Hi, I'd like to implement k-means by myself, in the following naive way. Given a large set of vectors:
1. Generate k random centers from the set.
2. Mapper reads all the centers and a split of the vector set, and emits for each vector the closest center as a key.
3. Reducer calculates the new centers
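The numbered steps above can be sketched as a local simulation of one map/reduce pass (plain Python, nothing Hadoop-specific; the helper names are made up):

```python
def closest(centers, v):
    # Mapper side: index of the nearest center by squared Euclidean distance.
    return min(range(len(centers)),
               key=lambda i: sum((c - x) ** 2 for c, x in zip(centers[i], v)))

def kmeans_pass(vectors, centers):
    # Map phase: emit (closest-center-index, vector) pairs, grouped by key.
    groups = {}
    for v in vectors:
        groups.setdefault(closest(centers, v), []).append(v)
    # Reduce phase: the reducer for each key averages its vectors.
    new_centers = list(centers)  # a center that attracted no vectors stays put
    for i, vs in groups.items():
        new_centers[i] = tuple(sum(col) / len(vs) for col in zip(*vs))
    return new_centers

vectors = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
centers = [(0.0, 0.0), (5.0, 5.0)]   # step 1 would pick these at random
print(kmeans_pass(vectors, centers))
```

On Hadoop, the map phase would run once per input split and the reduce phase once per key; the loop over passes is driven from the client, as the replies above discuss.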