Re: Broadcast variable of size 1 GB fails with negative memory exception

2015-07-29 Thread Imran Rashid
Hi Mike, I dug into this a little more, and it turns out in this case there is a pretty trivial fix -- the problem you are seeing is just from integer overflow before casting to a long in SizeEstimator. I've opened https://issues.apache.org/jira/browse/SPARK-9437 for this. For now, I think your
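The overflow described above can be illustrated with a minimal Scala sketch. It is not the actual SizeEstimator code: the object and method names below are hypothetical, and the 8-byte element size simply assumes an array of doubles. When both operands are Int, the product wraps around before it is widened to Long; widening one operand first avoids this.

```scala
object OverflowSketch {
  // Hypothetical helper (not Spark's API): the multiplication is done in
  // 32-bit Int arithmetic, so the product wraps before .toLong is applied.
  def estimateBuggy(length: Int, elemBytes: Int): Long =
    (length * elemBytes).toLong

  // Widening one operand to Long first makes the whole product 64-bit.
  def estimateFixed(length: Int, elemBytes: Int): Long =
    length.toLong * elemBytes

  def main(args: Array[String]): Unit = {
    val length = 1 << 28                 // 2^28 elements
    println(estimateBuggy(length, 8))    // prints -2147483648
    println(estimateFixed(length, 8))    // prints 2147483648
  }
}
```

A negative size estimate of this kind is consistent with the "negative memory" symptom in the thread subject, although the exact code path is the one described in the JIRA linked above.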

Re: Broadcast variable of size 1 GB fails with negative memory exception

2015-07-29 Thread Mike Hynes
Hi Imran, Thanks to you and Shivaram for looking into this, and opening the JIRA/PR. I will update you once the PR is merged if there are any other problems that arise from the broadcast. Mike On 7/29/15, Imran Rashid iras...@cloudera.com wrote: Hi Mike, I dug into this a little more, and it

Re: Broadcast variable of size 1 GB fails with negative memory exception

2015-07-28 Thread Mike Hynes
Hi Imran, Thanks for your reply. I have double-checked the code I ran to generate an nxn matrix and nx1 vector for n = 2^27. There was unfortunately a bug in it, where instead of having typed 134,217,728 for n = 2^27, I included a third '7' by mistake, making the size 10x larger. However, even

Broadcast variable of size 1 GB fails with negative memory exception

2015-07-28 Thread Mike Hynes
Hello Devs, I am investigating how matrix vector multiplication can scale for an IndexedRowMatrix in mllib.linalg.distributed. Currently, I am broadcasting the vector to be multiplied on the right. The IndexedRowMatrix is stored across a cluster with up to 16 nodes, each with 200 GB of memory.

Re: Broadcast variable of size 1 GB fails with negative memory exception

2015-07-28 Thread Imran Rashid
Hi Mike, are you sure the size isn't off by 2x somehow? I just tried to reproduce with a simple test in BlockManagerSuite: test("large block") { store = makeBlockManager(4e9.toLong) val arr = new Array[Double](1 << 28) println(arr.size) val blockId = BlockId("rdd_3_10") val result =