No problem. It was a big headache for my team as well. One of us has already reimplemented it from scratch, as seen in this pending PR for our project: https://github.com/hail-is/hail/pull/1895
Hopefully you find that useful. We'll try to PR that into Spark at some point.

Best,
John

Sent from my iPhone

> On Jun 14, 2017, at 8:28 PM, Anthony Thomas <ahtho...@eng.ucsd.edu> wrote:
>
> Interesting, thanks! That probably also explains why there seems to be a ton
> of shuffle for this operation. So what's the best option for truly scalable
> matrix multiplication on Spark, then - implementing from scratch using the
> coordinate matrix ((i, j), k) format?
>
>> On Wed, Jun 14, 2017 at 4:29 PM, John Compitello <jo...@broadinstitute.org> wrote:
>>
>> Hey Anthony,
>>
>> You're the first person besides myself I've seen mention this. BlockMatrix
>> multiply is not the best method. As far as my team and I can tell, the
>> memory problem stems from the fact that when Spark tries to compute block
>> (i, j) of the product, it manifests all of row i from matrix 1 and all of
>> column j from matrix 2 in memory at once on one executor. It then combines
>> them with a functional reduce, creating one additional block for each
>> pair. So you end up manifesting 3n + log n matrix blocks in memory at
>> once, which is why it performs so poorly.
>>
>>> On Jun 14, 2017, at 7:07 PM, Anthony Thomas <ahtho...@eng.ucsd.edu> wrote:
>>>
>>> I've been experimenting with MLlib's BlockMatrix for distributed matrix
>>> multiplication but consistently run into problems with executors being
>>> killed due to memory constraints. The linked gist (here) has a short
>>> example of multiplying a 25,000 x 25,000 square matrix, taking
>>> approximately 5 GB on disk, with a vector (also stored as a BlockMatrix).
>>> I am running this on a 3-node (1 master + 2 workers) cluster on Amazon
>>> EMR using the m4.xlarge instance type. Each instance has 16 GB of RAM
>>> and 4 CPUs. The gist has detailed information about the Spark
>>> environment.
>>>
>>> I have tried reducing the block size of the matrix, increasing the
>>> number of partitions in the underlying RDD, increasing
>>> defaultParallelism, and increasing spark.yarn.executor.memoryOverhead
>>> (up to 3 GB) - all without success. The input matrix should fit
>>> comfortably in distributed memory and the resulting matrix should be
>>> quite small (25,000 x 1), so I'm confused as to why Spark seems to want
>>> so much memory for this operation, and why Spark isn't spilling to disk
>>> here if it wants more memory. The job does eventually complete
>>> successfully, but for larger matrices stages have to be repeated several
>>> times, which leads to long run times. I don't encounter any issues if I
>>> reduce the matrix size to about 3 GB. Can anyone with experience using
>>> MLlib's matrix operators suggest what settings to look at, or what the
>>> hard constraints on memory for BlockMatrix multiplication are?
>>>
>>> Thanks,
>>>
>>> Anthony
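For readers following the thread: John's description of the memory pattern can be sketched in a single-process NumPy analogue. This is an illustrative model of the access pattern he describes (gather all of row i of A and column j of B, multiply pairwise, then reduce), not Spark's actual implementation; the function and block-dictionary layout here are hypothetical.

```python
import numpy as np

def block_multiply(a_blocks, b_blocks, n):
    """Multiply two n x n grids of dense blocks, mimicking the pattern
    described above: for each output block (i, j), manifest all n blocks
    of row i of A and all n blocks of column j of B at once (2n blocks),
    form the n pairwise products (n more blocks), then reduce them.
    a_blocks / b_blocks map (block_row, block_col) -> ndarray."""
    c_blocks = {}
    for i in range(n):
        for j in range(n):
            # Peak per-output-block footprint: 2n inputs + n products.
            row = [a_blocks[(i, k)] for k in range(n)]
            col = [b_blocks[(k, j)] for k in range(n)]
            products = [r @ c for r, c in zip(row, col)]
            c_blocks[(i, j)] = sum(products)  # the reduce step
    return c_blocks
```

The comment counts make the complaint concrete: nothing in this pattern streams or spills per-pair, so the per-block working set grows linearly with the number of blocks per row.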
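Anthony's follow-up asks about implementing multiplication over the coordinate ((i, j), value) format. A minimal plain-Python sketch of the join-and-reduce strategy such an implementation would use, with dicts standing in for RDDs (in real Spark this would be `join` on the inner index followed by `reduceByKey`; all names here are illustrative):

```python
from collections import defaultdict

def coo_multiply(a, b):
    """Multiply sparse matrices stored as {(row, col): value} dicts.
    Strategy: re-key A's entries by their column k, re-key B's entries by
    their row k, 'join' on k, emit partial products keyed by (i, j), and
    sum them - the same dataflow as join + reduceByKey on coordinate RDDs."""
    a_by_k = defaultdict(list)          # k -> [(i, value), ...]
    for (i, k), v in a.items():
        a_by_k[k].append((i, v))
    b_by_k = defaultdict(list)          # k -> [(j, value), ...]
    for (k, j), v in b.items():
        b_by_k[k].append((j, v))
    c = defaultdict(float)
    for k in a_by_k.keys() & b_by_k.keys():   # the join on the inner index
        for i, av in a_by_k[k]:
            for j, bv in b_by_k[k]:
                c[(i, j)] += av * bv          # the reduce-by-key step
    return dict(c)
```

The trade-off versus BlockMatrix is the one implied in the thread: no block ever needs its whole row or column co-resident, but every scalar partial product flows through the shuffle, so it only wins when the matrices are genuinely sparse.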