So if the process you're communicating with from Spark isn't launched inside of its YARN container, then it shouldn't be an issue - although it sounds like you may have multiple resource managers on the same machine, which can sometimes lead to interesting/difficult states.
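For reference, a minimal PySpark sketch of the knobs discussed below - the overhead headroom and an RDD pipe call. The names and values here are illustrative only, not taken from Sameer's job:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("pipe-sketch")
            # JVM heap per executor
            .set("spark.executor.memory", "4g")
            # extra memory (MB) YARN reserves in each executor container beyond the heap;
            # per the thread, child processes started by the task count against this headroom
            .set("spark.yarn.executor.memoryOverhead", "2048"))
    sc = SparkContext(conf=conf)

    # each partition's elements are written to the external command's stdin, one per line,
    # and every line the command prints to stdout becomes an output record
    out = sc.parallelize(["a", "b", "c"]).pipe("cat")  # "cat" stands in for the real worker binary
    print(out.collect())
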
On Thu, Nov 24, 2016 at 1:27 PM, Sameer Choudhary <sameer2...@gmail.com> wrote:
> Ok, that makes sense for processes directly launched via fork or exec from
> the task.
>
> However, in my case the command that starts the docker daemon starts the new
> process. This process runs in a docker container. Will the container use
> memory from the YARN executor memory overhead as well? How will YARN know that
> the container launched by the docker daemon is linked to an executor?
>
> Best,
> Sameer
>
> On Thu, Nov 24, 2016 at 1:59 AM Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> YARN will kill your processes if the child processes you start via pipe
>> consume too much memory. You can configure the amount of memory Spark
>> leaves aside for other processes besides the JVM in the YARN containers
>> with spark.yarn.executor.memoryOverhead.
>>
>> On Wed, Nov 23, 2016 at 10:38 PM, Sameer Choudhary <sameer2...@gmail.com>
>> wrote:
>>
>> Hi,
>>
>> I am working on a Spark 1.6.2 application on a YARN-managed EMR cluster
>> that uses RDD's pipe method to process my data. I start a lightweight
>> daemon process that starts processes for each task via pipes. This is
>> to ensure that I don't run into
>> https://issues.apache.org/jira/browse/SPARK-671.
>>
>> I'm running into Spark job failures due to task failures across the
>> cluster. The following questions would, I think, help in
>> understanding the issue:
>>
>> - How does resource allocation in PySpark work? How do YARN and
>> Spark track the memory consumed by Python processes launched on the
>> worker nodes?
>>
>> - As an example, let's say Spark started n tasks on a worker node.
>> These n tasks start n processes via pipe. Memory for executors is
>> already reserved during application launch. As the processes run, their
>> memory footprint grows and eventually there is not enough memory on
>> the box. In this case, how will YARN and Spark behave? Will the
>> executors be killed, or will my processes be killed, eventually killing the
>> task? I think this could lead to cascading failures of tasks across the
>> cluster as retry attempts also fail, eventually leading to termination
>> of the Spark job. Is there a way to avoid this?
>>
>> - When we define the number of executors in my SparkConf, are they
>> distributed evenly across my nodes? One approach to get around this
>> problem would be to limit the number of executors on each host that
>> YARN can launch. Then we would manage the memory for the piped processes
>> outside of YARN. Is there a way to avoid this?
>>
>> Thanks,
>> Sameer
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>> --
>> Cell : 425-233-8271
>> Twitter: https://twitter.com/holdenkarau
>>

--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
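A side note on the last question in the quoted thread: as far as I know, Spark 1.6 on YARN has no setting that caps executors per host; YARN places containers based on each node's advertised memory and cores, so the practical knobs are executor count and size. A rough sketch, with illustrative values only:

    from pyspark import SparkConf

    # sizing executors so only a few fit per node, leaving the rest of the node's
    # memory for the piped processes (tune against yarn.nodemanager.resource.memory-mb)
    conf = (SparkConf()
            .set("spark.executor.instances", "10")               # total executors for the app
            .set("spark.executor.cores", "4")
            .set("spark.executor.memory", "8g")                   # JVM heap per executor
            .set("spark.yarn.executor.memoryOverhead", "4096"))   # MB outside the heap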