I have a 6k x 10k matrix T and I need to compute T'*T, which should be 10k x 10k. I want to do this using Mahout's DistributedRowMatrix, but I found that Hadoop runs the job with only one mapper, which is very slow.
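
To be concrete about the computation: T'*T is the sum, over the rows t of T, of the outer product t'*t, so in principle each mapper could handle its own subset of rows and the partial sums could be added up afterwards. Here is a minimal single-machine sketch in plain Java of what I mean. The class and method names are only for illustration; they are not Mahout or Hadoop API.

```java
// Illustrative only: GramSketch and gram are hypothetical names, not part of Mahout or Hadoop.
class GramSketch {
    // Computes T' * T by accumulating the outer product of each row with itself.
    static double[][] gram(double[][] t) {
        int cols = t.length == 0 ? 0 : t[0].length;
        double[][] g = new double[cols][cols];
        for (double[] row : t) {              // rows are independent, so they could be spread over mappers
            for (int i = 0; i < cols; i++) {
                if (row[i] == 0.0) continue;  // skip zero entries; helps when rows are sparse
                for (int j = 0; j < cols; j++) {
                    g[i][j] += row[i] * row[j];
                }
            }
        }
        return g;
    }

    public static void main(String[] args) {
        double[][] g = gram(new double[][] {{1, 2}, {3, 4}});
        // prints [[10.0, 14.0], [14.0, 20.0]]
        System.out.println(java.util.Arrays.deepToString(g));
    }
}
```

Since every row contributes independently, I would expect the work to spread across many mappers, which is why the single mapper surprises me.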

I dug into the source code of DistributedRowMatrix and found that its input format is CompositeInputFormat.class, whose getSplits method sets mapred.min.split.size to Long.MAX_VALUE. As far as I can tell, this keeps each input file from being split, which would explain why I get only one mapper.

So my question is: is DistributedRowMatrix only a demo showing that matrix multiplication can be done with MapReduce, without practical value? Is there any way to do matrix multiplication quickly using Hadoop?

Thanks for your time and sorry for my broken English.
