Hey all, 

I’ve found myself in a position where I need to do a relatively large matrix 
multiply (at least, compared to what I normally have to do). I’m looking to 
multiply a 100k by 500k dense matrix by its transpose to yield a 100k by 100k 
matrix. I’m trying to do this on Google Cloud, so I don’t have any real limits 
on cluster size or memory. However, I have no idea where to begin as far as 
number of cores / number of partitions / how big to make the block size for 
best performance. Is there anywhere Spark users collect optimal 
configurations for these methods relative to input data size? Does anyone 
have any suggestions? I've tried throwing 900 cores at a 100k by 100k matrix 
multiply with 1000 by 1000 blocks, and that seemed to hang forever and 
eventually fail. 
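
In case it helps to see what I mean, below is a stripped-down sketch of the 
kind of BlockMatrix job I'm describing, scaled down to toy dimensions. The 
object name, the random input data, and the 1024 by 1024 block size are just 
placeholders for illustration, not my actual job: 

import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

object GramMatrixSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("gram-matrix-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Toy stand-ins for the real 100k x 500k input.
    val numRows = 2000
    val numCols = 10000

    // Build an IndexedRowMatrix from randomly generated dense rows.
    val rows = sc.parallelize(0L until numRows.toLong, 100).map { i =>
      IndexedRow(i, Vectors.dense(Array.fill(numCols)(scala.util.Random.nextDouble())))
    }
    val a = new IndexedRowMatrix(rows)

    // Block size is the knob I'm asking about; 1024 x 1024 is just a starting point.
    val blockA = a.toBlockMatrix(1024, 1024).cache()
    blockA.validate() // fails fast on malformed blocks before the expensive multiply

    // A * A^T gives the (numRows x numRows) Gram matrix.
    val gram = blockA.multiply(blockA.transpose)
    println(s"result: ${gram.numRows()} x ${gram.numCols()}")

    spark.stop()
  }
}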

Thanks,

John