Thank you all.
Mich, thanks for the clarification.

 

    On Sunday, 19 June 2016, 19:31, Mich Talebzadeh <mich.talebza...@gmail.com> 
wrote:
 

 Thanks Jonathan for your points
I am aware that yarn-client and yarn-cluster are both deprecated (they still
work in 1.6.1), hence the new nomenclature.
Bear in mind this is what I stated in my notes:

"- YARN Cluster Mode: the Spark driver runs inside an application master process
which is managed by YARN on the cluster, and the client can go away after
initiating the application. This is invoked with --master yarn and --deploy-mode
cluster.

- YARN Client Mode: the driver runs in the client process, and the application
master is only used for requesting resources from YARN.

- Unlike Spark standalone mode, in which the master’s address is specified
in the --master parameter, in YARN mode the ResourceManager’s address is picked
up from the Hadoop configuration. Thus, the --master parameter is yarn. This is
invoked with --deploy-mode client."

These are exactly from Spark document and I quote
"There are two deploy modes that can be used to launch Spark applications on 
YARN. In cluster mode, the Spark driver runs inside an application master 
process which is managed by YARN on the cluster, and the client can go away 
after initiating the application. 
In client mode, the driver runs in the client process, and the application 
master is only used for requesting resources from YARN.
Unlike Spark standalone and Mesos modes, in which the master’s address is 
specified in the --master parameter, in YARN mode the ResourceManager’s address 
is picked up from the Hadoop configuration. Thus, the --master parameter is 
yarn."
Cheers
Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 19 June 2016 at 19:09, Jonathan Kelly <jonathaka...@gmail.com> wrote:

Mich, what Jacek is saying is not that you implied that YARN relies on two 
masters. He's just clarifying that yarn-client and yarn-cluster modes are 
really both using the same (type of) master (simply "yarn"). In fact, if you 
specify "--master yarn-client" or "--master yarn-cluster", spark-submit will 
translate that into using a master URL of "yarn" and a deploy-mode of "client" 
or "cluster".
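That translation can be sketched as a dry run (echo prints the command instead
of submitting it, so no YARN cluster is needed; com.example.App and app.jar are
hypothetical placeholders):

```shell
# Deprecated spelling:   --master yarn-cluster
# Modern equivalent:     --master yarn --deploy-mode cluster
# The echo makes this a dry run; drop it to actually submit.
MASTER="yarn"
DEPLOY_MODE="cluster"
echo spark-submit --master "$MASTER" --deploy-mode "$DEPLOY_MODE" \
  --class com.example.App app.jar
```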

And thanks, Jacek, for the tips on the "less-common master URLs". I had no idea 
that was an option!

~ Jonathan
On Sun, Jun 19, 2016 at 4:13 AM Mich Talebzadeh <mich.talebza...@gmail.com> 
wrote:

Good points, but I am an experimentalist.
In Local mode I have this:
In local mode, --master local starts with one thread, equivalent to --master
local[1]. You can also start with more than one thread by specifying the number
of threads k in --master local[k], or use all available threads with --master
local[*], which in my case would be local[12].
The important thing about Local mode is that the number of JVMs spawned is
controlled by you, and you can start as many spark-submit jobs as you wish
within the constraints of the resources you have:
${SPARK_HOME}/bin/spark-submit \
    --packages com.databricks:spark-csv_2.11:1.3.0 \
    --driver-memory 2G \
    --num-executors 1 \
    --executor-memory 2G \
    --master local \
    --executor-cores 2 \
    --conf "spark.scheduler.mode=FIFO" \
    --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
    --jars /home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar \
    --class "${FILE_NAME}" \
    --conf "spark.ui.port=4040" \
    ${JAR_FILE} \
    >> ${LOG_FILE}
Now that does work fine, although some of those parameters are implicit (for
example scheduler.mode = FIFO or FAIR), and I can start different Spark jobs in
Local mode. Great for testing.
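As an aside, local[*] resolves to the number of logical cores the JVM sees,
which on Linux should generally match what nproc reports (this check is my own
sketch, not part of Spark):

```shell
# Print the number of logical CPUs available; this is the thread
# count that --master "local[*]" would typically resolve to here.
nproc
```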
With regard to your comments on Standalone 
Spark Standalone – a simple cluster manager included with Spark that makes it 
easy to set up a cluster.

s/simple/built-in
What is stated as "included" implies that, i.e. it comes as part of running
Spark in standalone mode.
Your other points on YARN cluster mode and YARN client mode
I'd say there's only one YARN master, i.e. --master yarn. You could
 however say where the driver runs, be it on your local machine where
 you executed spark-submit or on one node in a YARN cluster.
Yes, that is what I believe the text implied. I would be very surprised if YARN
as a resource manager relied on two masters :)

HTH







Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 19 June 2016 at 11:46, Jacek Laskowski <ja...@japila.pl> wrote:

On Sun, Jun 19, 2016 at 12:30 PM, Mich Talebzadeh
<mich.talebza...@gmail.com> wrote:

> Spark Local - Spark runs on the local host. This is the simplest set up and
> best suited for learners who want to understand different concepts of Spark
> and those performing unit testing.

There are also the less-common master URLs:

* local[n, maxRetries] or local[*, maxRetries] — local mode with n
threads and maxRetries number of failures.
* local-cluster[n, cores, memory] for simulating a Spark local cluster
with n workers, # cores per worker, and # memory per worker.
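For instance, a local-cluster master URL could look like this (shown as a dry
run via echo; the memory figure is per worker in MB, and the class and jar
names are placeholders):

```shell
# Dry run: simulate a cluster with 2 workers, 1 core per worker,
# and 1024 MB of memory per worker. Drop "echo" to actually submit.
echo spark-submit --master "local-cluster[2,1,1024]" \
  --class com.example.App app.jar
```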

As of Spark 2.0.0, you could also have your own scheduling system -
see https://issues.apache.org/jira/browse/SPARK-13904 - with the only
known implementation of the ExternalClusterManager contract in Spark
being YarnClusterManager, i.e. whenever you call Spark with --master
yarn.

> Spark Standalone – a simple cluster manager included with Spark that makes
> it easy to set up a cluster.

s/simple/built-in

> YARN Cluster Mode, the Spark driver runs inside an application master
> process which is managed by YARN on the cluster, and the client can go away
> after initiating the application. This is invoked with --master yarn and
> --deploy-mode cluster
>
> YARN Client Mode, the driver runs in the client process, and the application
> master is only used for requesting resources from YARN. Unlike Spark
> standalone mode, in which the master’s address is specified in the --master
> parameter, in YARN mode the ResourceManager’s address is picked up from the
> Hadoop configuration. Thus, the --master parameter is yarn. This is invoked
> with --deploy-mode client

I'd say there's only one YARN master, i.e. --master yarn. You could
however say where the driver runs, be it on your local machine where
you executed spark-submit or on one node in a YARN cluster.

The same applies to Spark Standalone and Mesos and is controlled by
--deploy-mode, i.e. client (default) or cluster.
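As a sketch, the same submission against a standalone master differs only in
--deploy-mode (dry run via echo; spark://master-host:7077 and the class/jar
names are placeholders):

```shell
# Client mode (the default): the driver runs where spark-submit runs.
echo spark-submit --master spark://master-host:7077 \
  --deploy-mode client --class com.example.App app.jar

# Cluster mode: the driver runs on one of the cluster's workers.
echo spark-submit --master spark://master-host:7077 \
  --deploy-mode cluster --class com.example.App app.jar
```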

Please update your notes accordingly ;-)

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski