Hi,

I am testing parallel execution of Spark applications.

I have a two-node Spark cluster, currently running two worker processes (one on each node) in YARN-client mode. The master has 12 cores and 24GB of RAM; the worker node has 2 cores and 4GB of RAM (an old 32-bit host). Both run RHEL 5, the master 64-bit and the worker 32-bit.

I submit identical jobs to the master via spark-shell, one from the master and the other from the worker node. However, whichever job comes first is executed while the other waits. I know this has nothing to do with Hadoop itself, as I can run a Spark job and an identical Hive job in parallel.

I have allocated the following resources in spark-env.sh:

export SPARK_EXECUTOR_CORES=3   ## Number of cores per executor (Default: 1)
export SPARK_EXECUTOR_MEMORY=3G ## Memory per executor (e.g. 1000M, 2G) (Default: 1G)
export SPARK_DRIVER_MEMORY=1G   ## Memory for the driver (e.g. 1000M, 2G) (Default: 512M)

The problem is that the first job goes and grabs all 12 cores despite the above settings. Is this behaviour expected in YARN-client mode, or is something wrong here?

The Spark GUI shows the jobs. The master's worker runs on the address ending in 216, whereas the other worker runs on 217, as shown below.

Thanks

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com
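[Editor's note: a hedged suggestion, not confirmed by the thread. In YARN mode, per-job executor sizing is normally taken from spark-submit/spark-shell flags or spark-defaults.conf; settings in spark-env.sh may not cap an individual job. One way to test whether the settings are being picked up is to pass the limits explicitly on the command line (flag values below are illustrative, chosen to fit two concurrent jobs on this cluster):]

```shell
# Launch the shell with explicit per-job resource caps instead of
# relying on spark-env.sh. These are standard spark-submit options;
# the specific values here are assumptions for this two-node cluster.
spark-shell --master yarn-client \
  --num-executors 2 \
  --executor-cores 3 \
  --executor-memory 3G \
  --driver-memory 1G
```

[If both jobs still cannot run concurrently after this, the waiting is more likely a YARN scheduler/queue capacity question than a Spark one.]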
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org