Re: BlockMatrix multiplication

2015-07-14 Thread Rakesh Chalasani
BlockMatrix stores its data as key -> Matrix pairs, and multiply performs a reduceByKey operation that aggregates matrices per key. Since you said each block resides in a separate partition, reduceByKey might effectively be shuffling all of the data. A better way to go about this is to allow multiple b
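
To make the "key -> Matrix pairs, aggregated per key" idea concrete, here is a minimal sketch in plain Scala collections. It is not Spark's BlockMatrix code: the names BlockMultiplySketch, Block, BlockKey, times and plus are hypothetical, blocks are small dense Vector[Vector[Double]] values, and groupBy followed by reduce only stands in for what reduceByKey does on an RDD. Nothing here models partitions or shuffling.

```scala
object BlockMultiplySketch {
  // Hypothetical stand-in for an MLlib Matrix: a dense, row-major block.
  type Block = Vector[Vector[Double]]
  // (blockRowIndex, blockColIndex), mirroring the key -> Matrix layout.
  type BlockKey = (Int, Int)

  // Dense product of two conformant, non-empty blocks.
  def times(a: Block, b: Block): Block = {
    val bt = b.transpose
    a.map(row => bt.map(col => row.zip(col).map { case (x, y) => x * y }.sum))
  }

  // Element-wise sum of two blocks of equal shape.
  def plus(a: Block, b: Block): Block =
    a.zip(b).map { case (r1, r2) => r1.zip(r2).map { case (x, y) => x + y } }

  // Emit one partial product for every pair A(i, k), B(k, j), then sum the
  // partial products that share the result key (i, j). The final groupBy and
  // reduce play the role that reduceByKey plays on a keyed RDD.
  def multiply(a: Map[BlockKey, Block], b: Map[BlockKey, Block]): Map[BlockKey, Block] = {
    val partials: Seq[(BlockKey, Block)] = for {
      ((i, k), blockA) <- a.toSeq
      ((k2, j), blockB) <- b.toSeq
      if k == k2
    } yield ((i, j), times(blockA, blockB))

    partials.groupBy(_._1).map { case (key, group) =>
      key -> group.map(_._2).reduce(plus)
    }
  }
}
```

With 1x1 blocks, multiplying a 1x2 block row by a 2x1 block column yields a single result block whose value is the sum of the two partial products, which is exactly the per-key aggregation step described above.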

Re: BlockMatrix multiplication

2015-07-14 Thread Ulanov, Alexander
Hi Rakesh, Thanks for the suggestion. Each block of the original matrix is in a separate partition. Each block of the transposed matrix is also in a separate partition. The partition numbers are the same for the blocks that undergo multiplication. Each partition is on a separate worker. Basically, I want to

Re: BlockMatrix multiplication

2015-07-14 Thread Rakesh Chalasani
Hi Alexander: Aw, I missed the 'cogroup' on BlockMatrix multiply! I stand corrected. Check https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala#L361 BlockMatrix multiply uses a custom partiti

RE: BlockMatrix multiplication

2015-07-14 Thread Ulanov, Alexander
am missing something or using it wrong. Best regards, Alexander From: Rakesh Chalasani [mailto:vnit.rak...@gmail.com] Sent: Tuesday, July 14, 2015 9:05 AM To: Ulanov, Alexander Cc: dev@spark.apache.org Subject: Re: BlockMatrix multiplication Hi Alexander: Aw, I missed the 'cogrou

Re: BlockMatrix multiplication

2015-07-14 Thread Burak Yavuz
sing something or using it wrong. > > > > Best regards, Alexander > > > > *From:* Rakesh Chalasani [mailto:vnit.rak...@gmail.com] > *Sent:* Tuesday, July 14, 2015 9:05 AM > *To:* Ulanov, Alexander > *Cc:* dev@spark.apache.org > *Subject:* Re: BlockMatrix multiplicatio

RE: BlockMatrix multiplication

2015-07-14 Thread Ulanov, Alexander
From: Burak Yavuz [mailto:brk...@gmail.com] Sent: Tuesday, July 14, 2015 10:14 AM To: Ulanov, Alexander Cc: Rakesh Chalasani; dev@spark.apache.org Subject: Re: BlockMatrix multiplication Hi Alexander, From your example code, using the GridPartitioner, you will have 1 column, and 5 rows. When you

RE: BlockMatrix multiplication

2015-07-15 Thread Ulanov, Alexander
/ 1e9) Best regards, Alexander From: Ulanov, Alexander Sent: Tuesday, July 14, 2015 6:24 PM To: 'Burak Yavuz' Cc: Rakesh Chalasani; dev@spark.apache.org Subject: RE: BlockMatrix multiplication Hi Burak, Thank you for the explanation! I will try to make a diagonal block matrix and report y

Re: BlockMatrix multiplication

2015-07-15 Thread Burak Yavuz
> > bm.validate()
> > val t = System.nanoTime()
> > // multiply matrix with itself
> > val aa = bm.multiply(bm)
> > aa.validate()
> > println(rows + "x" + columns + ", block:" + blockSize + "\t" + (System.nanoTime() - t) / 1e9)

RE: BlockMatrix multiplication

2015-07-16 Thread Ulanov, Alexander
, Alexander Cc: Rakesh Chalasani; dev@spark.apache.org Subject: Re: BlockMatrix multiplication Hi Alexander, I just noticed the error in my logic. There will always be a shuffle due to the `cogroup`. `join` also uses cogroup, therefore a shuffle is inevitable. However, the reduceByKey will not cause a
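
The remark that `join` is built on `cogroup` can be illustrated with a small plain-Scala model. This is only a sketch under stated assumptions: CogroupSketch and its two methods are hypothetical names operating on in-memory Seq values, not Spark's PairRDDFunctions, and in-memory collections cannot show the shuffle itself. It only shows why every value for a key, from both sides, has to be gathered in one place before any pairing can happen.

```scala
object CogroupSketch {
  // Group both keyed collections by key. Each key maps to all of its left
  // values and all of its right values; either side may be empty.
  def cogroup[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Map[K, (Seq[V], Seq[W])] = {
    val l = left.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    val r = right.groupBy(_._1).map { case (k, kws) => k -> kws.map(_._2) }
    (l.keySet ++ r.keySet).iterator.map { k =>
      (k, (l.getOrElse(k, Seq.empty[V]), r.getOrElse(k, Seq.empty[W])))
    }.toMap
  }

  // Inner join expressed on top of cogroup: keep keys present on both sides
  // and pair every left value with every right value for that key.
  def join[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Map[K, Seq[(V, W)]] =
    cogroup(left, right).collect {
      case (k, (vs, ws)) if vs.nonEmpty && ws.nonEmpty =>
        k -> (for (v <- vs; w <- ws) yield (v, w))
    }
}
```

In this model the grouping step is the part that requires bringing matching keys together, which corresponds to the step the message above identifies as the unavoidable shuffle.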

Re: BlockMatrix multiplication

2015-07-17 Thread Burak Yavuz
mit a JIRA Issue related to the problem of block matrix > shuffling given the blocks co-location? > > > > Best regards, Alexander > > > > *From:* Burak Yavuz [mailto:brk...@gmail.com] > *Sent:* Wednesday, July 15, 2015 3:29 PM > > *To:* Ulanov, Alexander > *Cc:* Rakesh Ch