Re: Spark job concurrency problem

2015-05-05 Thread Imran Rashid
Can you give your entire spark-submit command? Are you missing "--executor-cores"? Also, if you intend to use all 6 nodes, you also need "--num-executors 6".
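For illustration only, an invocation along the lines Imran suggests might look like the following; the jar name, main class, cores per executor, and memory settings are placeholders, not values taken from this thread:

  spark-submit \
    --master yarn-cluster \
    --num-executors 6 \
    --executor-cores 4 \
    --executor-memory 4g \
    --class com.example.MyJob \
    my-job.jar

With settings like these, YARN launches 6 executors with 4 task slots each, so up to 24 of the repartitioned tasks can run concurrently across the cluster.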

Spark job concurrency problem

2015-05-04 Thread Xi Shen
Hi,

I have two small RDDs, each with about 600 records. In my code, I did

  val rdd1 = sc...cache()
  val rdd2 = sc...cache()
  val result = rdd1.cartesian(rdd2).*repartition*(num_cpu).map {
    case (a, b) => some_expensive_job(a, b)
  }

I ran my job on a YARN cluster with "--master yarn-cluster". I have 6 executors ...
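A self-contained sketch of the pattern described above is shown below; the input data, the num_cpu value, and the body of the expensive function are hypothetical stand-ins, since the actual job is not shown in the thread:

  import org.apache.spark.{SparkConf, SparkContext}

  object CartesianDemo {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("cartesian-demo")
      val sc = new SparkContext(conf)

      // Hypothetical stand-ins for the two ~600-record RDDs in the question.
      val rdd1 = sc.parallelize(1 to 600).cache()
      val rdd2 = sc.parallelize(1 to 600).cache()

      // Placeholder for the real per-pair work, which the thread does not show.
      def someExpensiveJob(a: Int, b: Int): Int = a * b

      // Target parallelism; ideally num-executors * executor-cores.
      val numCpu = 24

      // 600 x 600 = 360,000 pairs, spread over numCpu partitions so every core gets work.
      val result = rdd1.cartesian(rdd2)
        .repartition(numCpu)
        .map { case (a, b) => someExpensiveJob(a, b) }

      println(result.count())
      sc.stop()
    }
  }

The repartition call matters because cartesian of two small RDDs can otherwise produce only a handful of partitions, leaving most executor cores idle during the expensive map stage.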