No problem. It was a big headache for my team as well. One of us has already 
reimplemented it from scratch, as seen in this pending PR for our project: 
https://github.com/hail-is/hail/pull/1895

Hopefully you find it useful. We hope to PR it into Spark at some point.
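
If you want to roll your own in the meantime, the coordinate ((i, j), value) 
approach you mention below is basically a join on the shared dimension plus a 
reduceByKey. A rough, untested sketch (illustrative only - the PR above takes 
its own approach):

    import org.apache.spark.rdd.RDD

    // A stored as ((i, k), a_ik), B stored as ((k, j), b_kj)
    def multiplyCoordinate(a: RDD[((Long, Long), Double)],
                           b: RDD[((Long, Long), Double)]): RDD[((Long, Long), Double)] = {
      val aByK = a.map { case ((i, k), v) => (k, (i, v)) }   // key A's entries by the shared dimension k
      val bByK = b.map { case ((k, j), v) => (k, (j, v)) }   // key B's entries by k as well
      aByK.join(bByK)                                        // pair up entries that share k
          .map { case (_, ((i, av), (j, bv))) => ((i, j), av * bv) }
          .reduceByKey(_ + _)                                // sum partial products into C(i, j)
    }

This shuffles entry-sized records instead of whole block rows/columns, though 
for fully dense matrices the join itself gets very large, so it mainly pays 
off when at least one side is sparse.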

Best,

John


> On Jun 14, 2017, at 8:28 PM, Anthony Thomas <ahtho...@eng.ucsd.edu> wrote:
> 
> Interesting, thanks! That probably also explains why there seems to be a ton 
> of shuffle for this operation. So what's the best option for truly scalable 
> matrix multiplication on Spark, then - implementing it from scratch using the 
> coordinate ((i, j), value) format?
> 
>> On Wed, Jun 14, 2017 at 4:29 PM, John Compitello <jo...@broadinstitute.org> 
>> wrote:
>> Hey Anthony,
>> 
>> You're the first person besides myself I've seen mention this. BlockMatrix 
>> multiply is not the best method. As far as my team and I can tell, the 
>> memory problem stems from the fact that when Spark tries to compute block 
>> (i, j) of the product, it manifests all of row i from matrix 1 and all of 
>> column j from matrix 2 in memory at once on one executor. After doing that, 
>> it combines them with a functional reduce, creating one additional block for 
>> each pair. So you end up manifesting roughly 3n + log n matrix blocks in 
>> memory at once (where n is the number of blocks per row/column), which is 
>> why it sucks so much. 
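>> 
>> Roughly, the shape of the computation for one output block is something like 
>> this (my paraphrase of the pattern, not the actual Spark source):
>> 
>>     import breeze.linalg.DenseMatrix
>> 
>>     // For output block C(i, j): all n blocks of A's row i and all n blocks
>>     // of B's column j have already been pulled onto a single executor.
>>     def outputBlock(aRow: Seq[DenseMatrix[Double]],
>>                     bCol: Seq[DenseMatrix[Double]]): DenseMatrix[Double] = {
>>       val partials = aRow.zip(bCol).map { case (a, b) => a * b }  // n partial products
>>       partials.reduce(_ + _)                                      // the reduce creates ~log n temporaries
>>     }
>>     // => about 2n input blocks + n partials + log n reduce temporaries live at once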
>> 
>> 
>>> On Jun 14, 2017, at 7:07 PM, Anthony Thomas <ahtho...@eng.ucsd.edu> wrote:
>>> 
>>> I've been experimenting with MLlib's BlockMatrix for distributed matrix 
>>> multiplication but consistently run into problems with executors being 
>>> killed due to memory constraints. The linked gist (here) has a short example 
>>> of multiplying a 25,000 x 25,000 square matrix (approximately 5 GB on disk) 
>>> by a vector (also stored as a BlockMatrix). I am running this on a 3-node 
>>> (1 master + 2 workers) cluster on Amazon EMR using the m4.xlarge instance 
>>> type. Each instance has 16 GB of RAM and 4 vCPUs. The gist has detailed 
>>> information about the Spark environment.
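>>> 
>>> For reference, since the gist isn't pasted inline, the setup is roughly the 
>>> following (paraphrased - paths, block sizes, and the SparkContext sc are 
>>> placeholders):
>>> 
>>>     import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}
>>> 
>>>     // (i, j, value) triples for the 25,000 x 25,000 matrix and the 25,000 x 1 vector
>>>     val matEntries = sc.textFile("s3://bucket/matrix.csv").map { line =>
>>>       val Array(i, j, v) = line.split(","); MatrixEntry(i.toLong, j.toLong, v.toDouble)
>>>     }
>>>     val vecEntries = sc.textFile("s3://bucket/vector.csv").map { line =>
>>>       val Array(i, v) = line.split(","); MatrixEntry(i.toLong, 0L, v.toDouble)
>>>     }
>>> 
>>>     val coordMat = new CoordinateMatrix(matEntries, 25000L, 25000L)
>>>     val A = coordMat.toBlockMatrix(1024, 1024).cache()
>>>     val x = new CoordinateMatrix(vecEntries, 25000L, 1L).toBlockMatrix(1024, 1).cache()
>>>     val result = A.multiply(x)   // executors get killed during this step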
>>> 
>>> I have tried reducing the block size of the matrix, increasing the number 
>>> of partitions in the underlying RDD, increasing defaultParallelism, and 
>>> increasing spark.yarn.executor.memoryOverhead (up to 3 GB) - all without 
>>> success; example values are shown below. The input matrix should fit 
>>> comfortably in distributed memory and the resulting matrix should be quite 
>>> small (25,000 x 1), so I'm confused about why Spark seems to want so much 
>>> memory for this operation and why it isn't spilling to disk when it needs 
>>> more. The job does eventually complete successfully, but for larger 
>>> matrices stages have to be repeated several times, which leads to long run 
>>> times. I don't encounter any issues if I reduce the matrix size to about 
>>> 3 GB. Can anyone with experience using MLlib's matrix operators suggest 
>>> what settings to look at, or what the hard constraints on memory for 
>>> BlockMatrix multiplication are?
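>>> 
>>> For concreteness, the settings mentioned above look roughly like this 
>>> (example values only - none of them fixed it):
>>> 
>>>     import org.apache.spark.SparkConf
>>>     import org.apache.spark.mllib.linalg.distributed.BlockMatrix
>>> 
>>>     val conf = new SparkConf()
>>>       .set("spark.default.parallelism", "200")            // raise default parallelism
>>>       .set("spark.yarn.executor.memoryOverhead", "3072")  // up to 3 GB of off-heap overhead
>>> 
>>>     // smaller blocks, plus more partitions in the underlying block RDD
>>>     val smallBlocks   = coordMat.toBlockMatrix(512, 512)  // coordMat as in the setup above
>>>     val repartitioned = new BlockMatrix(smallBlocks.blocks.repartition(200), 512, 512)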
>>> 
>>> Thanks,
>>> 
>>> Anthony
> 
