As we all know, a partition in Spark is exposed as an Iterator[T]. For some
purposes, I want to treat each partition not as an Iterator but as one whole
object, for example, turning an Iterator[Int] into a
breeze.linalg.DenseVector[Int]. Thus I use the 'mapPartitions' API to achieve
this; however, during the implementation ...
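
A minimal sketch of that conversion, assuming the input is an RDD[Int] named
rdd (the name is ours, not from the original post):

import breeze.linalg.DenseVector

// Materialize each partition's Iterator[Int] into a single
// DenseVector[Int]; the result has exactly one vector per partition.
val vectors = rdd.mapPartitions { iter =>
  Iterator.single(DenseVector(iter.toArray))
}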
Thank you for your reply.
Actually, we have already used this parameter. Our cluster is a
standalone cluster with 16 nodes, and every node has 16 cores. We have 256
pairs of matrices along with 256 tasks; when we set --total-executor-cores
to 64, each node can launch 4 tasks simultaneously, and each task ...
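
For what it's worth, on a standalone cluster --total-executor-cores
corresponds to the spark.cores.max property, so the same cap can be set in
code. A rough sketch (the app name is just an example):

import org.apache.spark.{SparkConf, SparkContext}

// Cap the total number of cores this application may use cluster-wide.
// With 16 nodes and a cap of 64, the standalone scheduler spreads the
// executors out, so each node runs about 4 tasks at a time.
val conf = new SparkConf()
  .setAppName("MatrixMultiply")
  .set("spark.cores.max", "64")
val sc = new SparkContext(conf)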
Hi, everyone,
I have come across a problem with increasing the concurrency. In a
program, after the shuffle write, each node should fetch 16 pairs of matrices
to do matrix multiplication, such as:
import breeze.linalg.{DenseMatrix => BDM}

pairs.map(t => {
  val b1 = t._2._1.asInstanceOf[BDM[Double]]
  val b2 = t._2._2.asInstanceOf[BDM[Double]]
  b1 * b2 // multiply each local pair of Breeze matrices
})
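
(Here pairs is presumably an RDD[(Int, (BDM[Double], BDM[Double]))] keyed by
block index, so each map call multiplies one local pair of matrices.)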
> You get better utilization when you're higher than 1.
>
> Aaron Davidson goes into this more somewhere in this talk --
> https://www.youtube.com/watch?v=dmL0N3qfSc8
>
> On Mon, Sep 22, 2014 at 11:52 PM, Nicholas Chammas <nicholas.chammas@> wrote:
>
>
We are now implementing a matrix multiplication algorithm on Spark that was
originally designed in the traditional MPI style; it assumes every core in
the grid computes in parallel.
Now in our development environment, each executor node has 16 cores, and I
assign 16 tasks to each executor node to ...
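
As a sketch of what that assignment might look like, reusing the pairs RDD
from the earlier snippet and the node/core counts from above (the variable
names are ours):

// One task per core: make the number of partitions (and hence the
// number of tasks per stage) equal the total core count of the grid.
val numNodes = 16
val coresPerNode = 16
val repartitioned = pairs.repartition(numNodes * coresPerNode)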