In the above setup my executors start one docker container per task. Some of
these containers grow in memory as data is piped through them. Eventually
there is not enough memory on the machine for the docker containers to run
(since YARN has already started its own containers), and everything starts
failing.
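One workaround I'm considering (my own idea, not something confirmed in this
thread) is to cap each per-task container's memory at launch so a runaway
worker can't exhaust the host; the image name and command below are
placeholders:

```shell
# Hypothetical: limit each per-task container to 512 MB of RAM and
# disable extra swap, so docker kills a runaway worker instead of
# the whole machine running out of memory.
docker run --rm --memory=512m --memory-swap=512m my-worker-image ./process
```

Of course this only moves the failure into the container; the limit still has
to be chosen so that all per-task containers plus the YARN containers fit on
the machine.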
So if the process you're communicating with from Spark isn't launched inside
its YARN container, then it shouldn't be an issue. That said, it sounds like
you may have multiple resource managers on the same machine, which can
sometimes lead to interesting/difficult states.
On Thu, Nov 24, 2016 at
Ok, that makes sense for processes directly launched via fork or exec from
the task.
However, in my case the daemon talks to the docker daemon, which starts the
new process, and that process runs in a docker container. Will the container's
memory count against the YARN executor memory overhead as well? How will YARN
know how much memory the container is using?
YARN will kill your processes if the child processes you start via pipe
consume too much memory. You can configure the amount of memory Spark
leaves aside for other processes besides the JVM in the YARN containers
with spark.yarn.executor.memoryOverhead.
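For example, to reserve 2 GB per executor container for non-JVM processes
(the application file name below is a placeholder):

```shell
# Reserve 2048 MB of each YARN executor container for processes
# outside the JVM heap (piped children, native libraries, etc.).
spark-submit \
  --master yarn \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  my_app.py
```

YARN then sizes the container as executor memory plus this overhead, and kills
the container only if the combined usage exceeds that total.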
On Wed, Nov 23, 2016 at 10:38 PM, Sameer
Hi,
I am working on a Spark 1.6.2 application on a YARN-managed EMR cluster
that uses RDD's pipe method to process my data. I start a lightweight
daemon process that starts processes for each task via pipes. This is
to ensure that I don't run into
https://issues.apache.org/jira/browse/SPARK-671.
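For context, RDD.pipe simply streams each record of a partition through an
external command's stdin/stdout. A minimal shell illustration of that
mechanism, with tr standing in for the real per-task worker (a placeholder):

```shell
# Each line on stdin is one record; the external command's stdout
# lines become the records of the resulting RDD partition.
printf 'foo\nbar\n' | tr 'a-z' 'A-Z'
# prints:
# FOO
# BAR
```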