Re: Spark on YARN not utilizing all the YARN containers available

2018-10-10 Thread Gourav Sengupta
Hi Dillon, yes, we can see the number of executors that are running, but the question is more about understanding the relation between YARN containers, their persistence, and Spark executors. Regards, Gourav On Wed, Oct 10, 2018 at 6:38 AM Dillon Dukek wrote: > There is documentation here

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Dillon Dukek
There is documentation here http://spark.apache.org/docs/latest/running-on-yarn.html about running Spark on YARN. Like I said before, you can use either the logs from the application or the Spark UI to understand how many executors are running at any given time. I don't think I can help much

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Gourav Sengupta
Hi Dillon, I do think there is a setting available whereby, once YARN sets up the containers, they are not deallocated. I had used it previously in Hive, and it saves processing time otherwise spent allocating containers. That said, I am still trying to understand how do we determine

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Dillon Dukek
I'm still not sure exactly what you mean by saying that you have 6 YARN containers. YARN should just be aware of the total available resources in your cluster and then be able to launch containers based on the executor requirements you set when you submit your job. If you can, I think it
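
To illustrate how executor requirements could translate into a container request, here is a minimal sketch. It assumes the Spark 2.3 default overhead rule (the larger of 384 MB and 10% of executor memory) and assumes YARN rounds each request up to a multiple of a scheduler increment of 1024 MB. The function name and the increment value are illustrative, not taken from the thread, and the actual rounding depends on the scheduler configuration.

```python
import math


def executor_container_mb(executor_memory_mb,
                          overhead_factor=0.10,
                          min_overhead_mb=384,
                          yarn_increment_mb=1024):
    """Estimate the memory requested for, and granted to, one executor container.

    overhead_factor and min_overhead_mb mirror the Spark 2.3 defaults for
    spark.executor.memoryOverhead. yarn_increment_mb is an assumed scheduler
    rounding step, not a value reported in the thread.
    """
    overhead_mb = max(min_overhead_mb, int(executor_memory_mb * overhead_factor))
    requested_mb = executor_memory_mb + overhead_mb
    # Round the request up to the next multiple of the assumed increment.
    granted_mb = math.ceil(requested_mb / yarn_increment_mb) * yarn_increment_mb
    return requested_mb, granted_mb


# Default executor memory of 1 GB.
print(executor_container_mb(1024))
```

Under these assumptions, a container can be noticeably larger than the executor memory setting alone suggests, which affects how many containers fit on a node.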

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Gourav Sengupta
Hi, maybe I am not quite clear in my head on this one, but how do we know that 1 YARN container = 1 executor? Regards, Gourav Sengupta On Tue, Oct 9, 2018 at 8:53 PM Dillon Dukek wrote: > Can you send how you are launching your streaming process? Also what > environment is this cluster
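
As a rough sketch of the container-to-executor relation: Spark on YARN runs each executor in its own container, plus one container for the application master (which also hosts the driver in cluster mode). The helper below is hypothetical and only does that arithmetic; it assumes a single application master and no dynamic allocation.

```python
def yarn_containers_for_app(num_executors, am_containers=1):
    """Total YARN containers held by one Spark application.

    Assumes one container per executor plus one application master container.
    """
    return num_executors + am_containers


# Six executors, as in the original question.
print(yarn_containers_for_app(6))
```

So an application with 6 executors would normally show 7 containers in the YARN ResourceManager UI, one more than the executor count in the Spark UI.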

Re: Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread Dillon Dukek
Can you send how you are launching your streaming process? Also, what environment is this cluster running in (EMR, GCP, self-managed, etc.)? On Tue, Oct 9, 2018 at 10:21 AM kant kodali wrote: > Hi All, > > I am using Spark 2.3.1 and using YARN as a cluster manager. > > I currently got > > 1) 6

Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread kant kodali
Hi All, I am using Spark 2.3.1 and using YARN as a cluster manager. I currently got 1) 6 YARN containers (executors=6) with 4 executor cores for each container. 2) 6 Kafka partitions from one topic. 3) You can assume every other configuration is set to whatever the default values are. Spawned a
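
The numbers in this message can be put side by side with a small sketch. It assumes one Spark task per Kafka partition per batch, which is the default mapping for the Kafka sources in Spark 2.3. The message is truncated and does not say which streaming API was used, so this is an illustration and not a diagnosis. The function name is hypothetical.

```python
def streaming_concurrency(num_executors, cores_per_executor,
                          kafka_partitions, cpus_per_task=1):
    """Return (available task slots, slots a Kafka read stage can keep busy).

    Assumes one task per Kafka partition. cpus_per_task corresponds to
    spark.task.cpus, whose default is 1.
    """
    slots = num_executors * (cores_per_executor // cpus_per_task)
    busy = min(slots, kafka_partitions)
    return slots, busy


# 6 executors x 4 cores, reading a topic with 6 partitions.
print(streaming_concurrency(6, 4, 6))
```

With 24 slots and only 6 tasks per batch, the scheduler is free to place several tasks on the same executor, so some executors may sit idle unless the data is repartitioned or the topic has more partitions.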