Spark Standalone cluster. My program is running very slowly, and I suspect it is not doing parallel processing of the RDD. How can I force it to run in parallel? Is there any way to check whether it is being processed in parallel?
Regards,

Ningjun Wang
Consulting Software Engineer
LexisNexis
121 Chanlon Road
New Providence, NJ 07974-1541

-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Thursday, January 15, 2015 4:29 PM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: How to force parallel processing of RDD using multiple thread

What is your cluster manager? For example, on YARN you would specify --executor-cores. Read:
http://spark.apache.org/docs/latest/running-on-yarn.html

On Thu, Jan 15, 2015 at 8:54 PM, Wang, Ningjun (LNG-NPV) <ningjun.w...@lexisnexis.com> wrote:
> I have a standalone Spark cluster with only one node with 4 CPU cores.
> How can I force Spark to do parallel processing of my RDD using
> multiple threads? For example, I can do the following:
>
> spark-submit --master local[4]
>
> However, I really want to use the cluster as follows:
>
> spark-submit --master spark://10.125.21.15:7070
>
> In that case, how can I make sure the RDD is processed with multiple
> threads/cores?
>
> Thanks
> Ningjun
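[A sketch of one way to check this, for anyone finding this thread later. Spark runs one task per RDD partition per stage, so an RDD with a single partition will use only one core no matter how many the cluster offers. The snippet below assumes a working Spark installation and submission via spark-submit; the app name and numbers are illustrative, not from the thread. `rdd.partitions.length` and `repartition` are standard RDD API; `spark.cores.max` is the standalone-mode cap on cores an application may claim.]

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ParallelismCheck {
  def main(args: Array[String]): Unit = {
    // Master URL comes from spark-submit --master; spark.cores.max
    // (optional) caps how many cores this app takes on a standalone cluster.
    val conf = new SparkConf()
      .setAppName("parallelism-check")
      .set("spark.cores.max", "4")
    val sc = new SparkContext(conf)

    // Without an explicit partition count, parallelize uses
    // spark.default.parallelism to decide how many partitions to create.
    val rdd = sc.parallelize(1 to 1000000)
    println(s"partitions = ${rdd.partitions.length}")

    // One task per partition: forcing 4 partitions means up to
    // 4 tasks can run concurrently across the available cores.
    val rdd4 = rdd.repartition(4)
    println(s"partitions after repartition = ${rdd4.partitions.length}")

    sc.stop()
  }
}
```

To confirm parallelism empirically, watch the application's web UI (by default http://driver-host:4040) while a job runs: the Stages page shows how many tasks execute concurrently. The standalone master's UI (by default port 8080) shows whether the worker and its cores are actually registered with the application.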