Re: Dealing with kmean and meanshift output

2010-04-12 Thread Jeff Eastman
adam35413 wrote: I did some testing and determined that both a local copy of the results files AND an HDFS copy have to exist. From looking at the code, it appears that ClusterDump checks if the files exist locally at Line 129, but then tries to read the files from HDFS around Line 135. If the files

Re: DistributedRowMatrix mult problem

2010-04-12 Thread Jake Mannix
Hi Young, The problem is one of documentation, and poor naming of the method: DistributedRowMatrix.times(DistributedRowMatrix m) should be called DistributedRowMatrix.transposeTimes(DistributedRowMatrix m), as it computes a.transpose().times(b), not a.times(b). See the javadocs for the inte
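To make the difference concrete, here is a small plain-Java sketch that does not use Mahout at all. The class name TransposeTimesDemo and both helper methods are made up for illustration; they only mimic the two products being discussed (a.times(b) versus a.transpose().times(b)) on ordinary int arrays. Note that the transposed product can be built row by row, as a sum of outer products of matching rows of a and b, which is an assumption about why a row-oriented matrix might favor it, not a statement about Mahout's actual implementation.

```java
import java.util.Arrays;

public class TransposeTimesDemo {

    // Ordinary product: result[i][j] = sum over k of a[i][k] * b[k][j].
    // Requires a's column count to equal b's row count.
    static int[][] times(int[][] a, int[][] b) {
        int[][] result = new int[a.length][b[0].length];
        for (int i = 0; i < a.length; i++) {
            for (int k = 0; k < b.length; k++) {
                for (int j = 0; j < b[0].length; j++) {
                    result[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return result;
    }

    // Transposed product: result[i][j] = sum over k of a[k][i] * b[k][j],
    // i.e. a.transpose().times(b). Requires a and b to have the same row count.
    // Each row index k contributes the outer product of a[k] and b[k].
    static int[][] transposeTimes(int[][] a, int[][] b) {
        int[][] result = new int[a[0].length][b[0].length];
        for (int k = 0; k < a.length; k++) {
            for (int i = 0; i < a[0].length; i++) {
                for (int j = 0; j < b[0].length; j++) {
                    result[i][j] += a[k][i] * b[k][j];
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] a = {{1, 2}, {3, 4}};
        int[][] b = {{5, 6}, {7, 8}};
        // a * b
        System.out.println(Arrays.deepToString(times(a, b)));          // [[19, 22], [43, 50]]
        // a^T * b, which differs from a * b unless a is symmetric
        System.out.println(Arrays.deepToString(transposeTimes(a, b))); // [[26, 30], [38, 44]]
    }
}
```

If a result looks wrong after calling times on a DistributedRowMatrix, comparing it against a.transpose().times(b) computed by hand on a small matrix like this one is a quick way to check which product was actually returned.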

Re: Dealing with kmean and meanshift output

2010-04-12 Thread adam35413
I did some testing and determined that both a local copy of the results files AND an HDFS copy have to exist. From looking at the code, it appears that ClusterDump checks if the files exist locally at Line 129, but then tries to read the files from HDFS around Line 135. If the files don't exist at bot

DistributedRowMatrix mult problem.

2010-04-12 Thread Young Y. Kim
I'm trying to test DistributedRowMatrix in Eclipse for matrix calculation in Hadoop. A = [[85,68,30,15,50,34], [53,38,19,70,90,29], [20,83,19,38,82,34], [67,50,68,86,64,53], [84,71,30,85,82,73], [2,43,54,50,66,31]] DistributedRowMatrix m = DistributedRowMatrix(path,...); and check the values of m

Re: Current state of (dense) matrix multiplication?

2010-04-12 Thread Vimal Mathew
In my opinion the most important take-away from MapReduce is that you don't move the data, but move the "computation towards the data". So I implemented a persistent storage, and moved most of the critical computations directly into the storage (I did play around with Hama before this). Right now

Re: Current state of (dense) matrix multiplication?

2010-04-12 Thread Vimal Mathew
The naive matrix-multiplication algorithm is highly parallelizable if you have the data available locally at all the nodes. The persistent storage issue was one of the first problems that I tried solving (HDFS is just wrong for the access patterns in matrix algorithms). I can't compete with Matlab

DistributedRowMatrix mult problem

2010-04-12 Thread Young Y. Kim
I'm trying to test DistributedRowMatrix in Eclipse for matrix calculation in Hadoop. A = [[85,68,30,15,50,34], [53,38,19,70,90,29], [20,83,19,38,82,34], [67,50,68,86,64,53], [84,71,30,85,82,73], [2,43,54,50,66,31]] DistributedRowMatrix m = DistributedRowMatrix(path,...); and check the values of m