Spark Standalone cluster. My program is running very slowly, and I suspect it is not doing parallel processing of the RDD. How can I force it to run in parallel? Is there any way to check whether it is being processed in parallel?
Regards,

Ningjun Wang
Consulting Software Engineer
LexisNexis
121 Chanlon Road
New Providence, NJ 07974-1541

-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Thursday, January 15, 2015 4:29 PM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: How to force parallel processing of RDD using multiple thread

What is your cluster manager? For example, on YARN you would specify --executor-cores. Read:
http://spark.apache.org/docs/latest/running-on-yarn.html

On Thu, Jan 15, 2015 at 8:54 PM, Wang, Ningjun (LNG-NPV) <ningjun.w...@lexisnexis.com> wrote:
> I have a standalone Spark cluster with only one node with 4 CPU cores.
> How can I force Spark to do parallel processing of my RDD using
> multiple threads? For example, I can do the following:
>
> spark-submit --master local[4]
>
> However, I really want to use the cluster as follows:
>
> spark-submit --master spark://10.125.21.15:7070
>
> In that case, how can I make sure the RDD is processed with multiple
> threads/cores?
>
> Thanks
> Ningjun
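[A sketch of one way to check this, for anyone finding this thread later. Spark runs one task per RDD partition per stage, so an RDD with a single partition will use only one core no matter how many the cluster offers. The snippet below assumes a working Spark installation and submission via spark-submit; the app name and numbers are illustrative, not from the thread. `rdd.partitions.length` and `repartition` are standard RDD API; `spark.cores.max` is the standalone-mode cap on cores an application may claim.]

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ParallelismCheck {
  def main(args: Array[String]): Unit = {
    // Master URL comes from spark-submit --master; spark.cores.max
    // (optional) caps how many cores this app takes on a standalone cluster.
    val conf = new SparkConf()
      .setAppName("parallelism-check")
      .set("spark.cores.max", "4")
    val sc = new SparkContext(conf)

    // Without an explicit partition count, parallelize uses
    // spark.default.parallelism to decide how many partitions to create.
    val rdd = sc.parallelize(1 to 1000000)
    println(s"partitions = ${rdd.partitions.length}")

    // One task per partition: forcing 4 partitions means up to
    // 4 tasks can run concurrently across the available cores.
    val rdd4 = rdd.repartition(4)
    println(s"partitions after repartition = ${rdd4.partitions.length}")

    sc.stop()
  }
}
```

To confirm parallelism empirically, watch the application's web UI (by default http://driver-host:4040) while a job runs: the Stages page shows how many tasks execute concurrently. The standalone master's UI (by default port 8080) shows whether the worker and its cores are actually registered with the application.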