There is documentation on running Spark on YARN here: http://spark.apache.org/docs/latest/running-on-yarn.html. Like I said before, you can use either the logs from the application or the Spark UI to understand how many executors are running at any given time. I don't think I can help much further without more information about the specific use case.
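For reference, a minimal sketch of checking the executor count from the command line, assuming log aggregation is enabled; the application id below is a placeholder (the real one is printed by spark-submit and shown in the YARN ResourceManager UI):

```shell
# Hypothetical application id; substitute the one for your job.
APP_ID=application_1234567890123_0001

# List running YARN applications to find the Spark job.
yarn application -list -appStates RUNNING

# Pull the aggregated logs and look for executor allocation messages.
yarn logs -applicationId "$APP_ID" | grep -i executor
```

The Executors tab of the Spark UI shows the same information interactively.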
On Tue, Oct 9, 2018 at 2:54 PM Gourav Sengupta <gourav.sengu...@gmail.com> wrote:

> Hi Dillon,
>
> I do think that there is a setting available where, once YARN sets up
> the containers, you do not deallocate them; I had used it previously in
> Hive, and it saves processing time in terms of allocating containers.
> That said, I am still trying to understand how we determine that one
> YARN container = one executor in Spark.
>
> Regards,
> Gourav
>
> On Tue, Oct 9, 2018 at 9:04 PM Dillon Dukek
> <dillon.du...@placed.com.invalid> wrote:
>
>> I'm still not sure exactly what you mean when you say that you have 6
>> YARN containers. YARN should just be aware of the total available
>> resources in your cluster and then be able to launch containers based
>> on the executor requirements you set when you submit your job. If you
>> can, I think it would be helpful to send me the command you're using
>> to launch your Spark process. You should also be able to use the logs
>> and/or the Spark UI to determine how many executors are running.
>>
>> On Tue, Oct 9, 2018 at 12:57 PM Gourav Sengupta
>> <gourav.sengu...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Maybe I am not quite clear in my head on this one, but how do we know
>>> that 1 YARN container = 1 executor?
>>>
>>> Regards,
>>> Gourav Sengupta
>>>
>>> On Tue, Oct 9, 2018 at 8:53 PM Dillon Dukek
>>> <dillon.du...@placed.com.invalid> wrote:
>>>
>>>> Can you send how you are launching your streaming process? Also,
>>>> what environment is this cluster running in (EMR, GCP, self-managed,
>>>> etc.)?
>>>>
>>>> On Tue, Oct 9, 2018 at 10:21 AM kant kodali <kanth...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am using Spark 2.3.1 with YARN as the cluster manager.
>>>>>
>>>>> I currently have:
>>>>>
>>>>> 1) 6 YARN containers (executors = 6) with 4 executor cores per
>>>>> container.
>>>>> 2) 6 Kafka partitions from one topic.
>>>>> 3) You can assume every other configuration is set to its default
>>>>> value.
>>>>>
>>>>> I spawned a simple streaming query and I see all the tasks get
>>>>> scheduled on one YARN container. Am I missing any config?
>>>>>
>>>>> Thanks!
>>>>
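For the setup described in the original message, a launch command might look roughly like the sketch below; the jar name, main class, and memory size are placeholders. With dynamic allocation disabled, Spark on YARN requests one container per executor (plus one container for the ApplicationMaster), so `--num-executors 6` is what makes "6 YARN containers" mean "6 executors":

```shell
# Hypothetical submit command; jar, class, and memory values are placeholders.
# One YARN container is granted per executor when dynamic allocation is off.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 6 \
  --executor-cores 4 \
  --executor-memory 4g \
  --conf spark.dynamicAllocation.enabled=false \
  --class com.example.StreamingApp \
  streaming-app.jar
```

With 6 Kafka partitions, the Structured Streaming Kafka source typically creates one task per topic partition per micro-batch, so with this layout one would normally expect the 6 tasks to spread across the 6 executors rather than land on a single container.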