Hi Mike,
I dug into this a little more, and it turns out in this case there is a
pretty trivial fix -- the problem you are seeing is just from integer
overflow before casting to a long in SizeEstimator. I've opened
https://issues.apache.org/jira/browse/SPARK-9437 for this.
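To illustrate the kind of overflow described above, here is a minimal sketch in plain Scala. It is not the actual SizeEstimator code: the object name, the helper names, and the choice of 2^28 elements of 8 bytes each are assumptions made only for illustration. It shows how multiplying two Int values wraps around before the result is widened to a Long, and how widening one operand first avoids that.

```scala
object OverflowSketch {
  // Hypothetical helper: the product is computed in 32-bit Int arithmetic
  // and only afterwards widened to Long, so it can wrap around.
  def sizeAsInt(length: Int, elemBytes: Int): Long =
    (length * elemBytes).toLong

  // Hypothetical helper: widening one operand first makes the whole
  // multiplication happen in 64-bit Long arithmetic.
  def sizeAsLong(length: Int, elemBytes: Int): Long =
    length.toLong * elemBytes

  def main(args: Array[String]): Unit = {
    val n = 1 << 28 // 2^28 elements (assumed size, for illustration)
    println(sizeAsInt(n, 8))  // 2^31 does not fit in an Int, so this is negative
    println(sizeAsLong(n, 8)) // 2147483648
  }
}
```

The only difference between the two helpers is where the widening to Long happens, which is why a fix of this kind can be small.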
For now, I think your
Hi Imran,
Thanks to you and Shivaram for looking into this, and opening the
JIRA/PR. I will update you once the PR is merged if there are any
other problems that arise from the broadcast.
Mike
On 7/29/15, Imran Rashid iras...@cloudera.com wrote:
Hi Mike,
I dug into this a little more, and it
Hi Imran,
Thanks for your reply. I have double-checked the code I ran to
generate an nxn matrix and nx1 vector for n = 2^27. Unfortunately
there was a bug in it: instead of typing 134,217,728 for n = 2^27,
I included a third '7' by mistake, making the size 10x larger.
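As a rough check of the sizes involved, the sketch below computes the raw payload of an nx1 vector of doubles for the intended n and for a mistyped n. The mistyped value 1,342,177,728 is an assumption (134,217,728 with one extra '7'); the message does not give the exact number. Object headers and any serialization overhead are ignored.

```scala
object VectorSizeSketch {
  val BytesPerDouble = 8L

  // Raw payload of n doubles, computed in Long arithmetic to avoid overflow.
  def payloadBytes(n: Long): Long = n * BytesPerDouble

  def main(args: Array[String]): Unit = {
    val intended = 1L << 27    // 134,217,728
    val mistyped = 1342177728L // assumed typo: one extra '7'

    println(payloadBytes(intended)) // 1073741824 bytes, i.e. 1 GiB
    println(payloadBytes(mistyped)) // 10737421824 bytes, roughly 10 GiB
  }
}
```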
However, even
Hello Devs,
I am investigating how matrix vector multiplication can scale for an
IndexedRowMatrix in mllib.linalg.distributed.
Currently, I am broadcasting the vector to be multiplied on the right.
The IndexedRowMatrix is stored across a cluster with up to 16 nodes,
each with 200 GB of memory.
Hi Mike,
are you sure the size isn't off by 2x somehow? I just tried to
reproduce with a simple test in BlockManagerSuite:
test("large block") {
store = makeBlockManager(4e9.toLong)
val arr = new Array[Double](1 << 28)
println(arr.size)
val blockId = BlockId("rdd_3_10")
val result =