Hey Adam,

I'm not sure I understand yet what you have in mind. My takeaway from
the logs is that the container was actually above its allotment of about
14G. Since 6G of that is for overhead, I assumed there would be plenty of
space for the Python workers, but there seem to be more of them than I'd
expect.

Does anyone know if that is actually the intended behavior, i.e. in this
case over 90 Python processes on a 2-core executor?
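
For reference, here's a quick back-of-the-envelope check of the numbers in
the log below (just a sketch; the 4 KB page size is an assumption on my
part, and the 14.5 GB limit is taken from the YARN message):

    # Rough sanity check of the container numbers (4 KB page size assumed).
    page_size = 4096      # bytes per page (assumption, not from the log)
    rss_pages = 1764956   # summed RSS of the pyspark.daemon processes
    daemon_gb = rss_pages * page_size / 2.0 ** 30
    print("pyspark.daemon RSS: %.1f GB" % daemon_gb)   # ~6.7 GB

    container_gb = 14.5   # physical memory limit from the YARN log
    # ~7.8 GB remain for the executor JVM, which is less than its 8G heap.
    print("left for the executor JVM: %.1f GB" % (container_gb - daemon_gb))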

Best,
-Sven


On Fri, Jan 23, 2015 at 10:04 PM, Adam Diaz <adam.h.d...@gmail.com> wrote:

> Yarn only has the ability to kill not checkpoint or sig suspend.  If you
> use too much memory it will simply kill tasks based upon the yarn config.
> https://issues.apache.org/jira/browse/YARN-2172
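
On the Spark side, the knob that feeds into that YARN limit is the executor
overhead: the container size YARN enforces is roughly spark.executor.memory
plus spark.yarn.executor.memoryOverhead. A minimal sketch, with values that
are purely illustrative (roughly matching the 14.5 GB container in the log):

    from pyspark import SparkConf, SparkContext

    # YARN kills the container once the whole process tree (JVM plus Python
    # workers) exceeds executor memory + overhead, so the overhead needs to
    # cover the pyspark.daemon processes. Values below are illustrative only.
    conf = (SparkConf()
            .set("spark.executor.memory", "8g")
            .set("spark.yarn.executor.memoryOverhead", "6656"))  # MB on Spark 1.x
    sc = SparkContext(conf=conf)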
>
>
> On Friday, January 23, 2015, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>
>> Hi Sven,
>>
>> What version of Spark are you running?  Recent versions have a change
>> that allows PySpark to share a pool of processes instead of starting a new
>> one for each task.
>>
>> -Sandy
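
For the archives: I believe the change Sandy is referring to is exposed via
spark.python.worker.reuse in newer releases. A minimal sketch of turning it
on explicitly, assuming that setting is available in the version in use:

    from pyspark import SparkConf, SparkContext

    # Keep a pool of long-lived Python workers instead of forking a fresh
    # pyspark.daemon child per task (setting available in recent 1.x
    # releases; reportedly on by default there).
    conf = SparkConf().set("spark.python.worker.reuse", "true")
    sc = SparkContext(conf=conf)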
>>
>> On Fri, Jan 23, 2015 at 9:36 AM, Sven Krasser <kras...@gmail.com> wrote:
>>
>>> Hey all,
>>>
>>> I am running into a problem where YARN kills containers for being over
>>> their memory allocation (which is about 8G for executors plus 6G for
>>> overhead), and I noticed that in those containers there are tons of
>>> pyspark.daemon processes hogging memory. Here's a snippet from a container
>>> with 97 pyspark.daemon processes. The total sum of RSS usage across all of
>>> these is 1,764,956 pages (i.e. 6.7GB on the system).
>>>
>>> Any ideas what's happening here and how I can get the number of
>>> pyspark.daemon processes back to a more reasonable count?
>>>
>>> 2015-01-23 15:36:53,654 INFO  [Reporter] yarn.YarnAllocationHandler 
>>> (Logging.scala:logInfo(59)) - Container marked as failed: 
>>> container_1421692415636_0052_01_000030. Exit status: 143. Diagnostics: 
>>> Container [pid=35211,containerID=container_1421692415636_0052_01_000030] is 
>>> running beyond physical memory limits. Current usage: 14.9 GB of 14.5 GB 
>>> physical memory used; 41.3 GB of 72.5 GB virtual memory used. Killing 
>>> container.
>>> Dump of the process-tree for container_1421692415636_0052_01_000030 :
>>>     |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
>>> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>>>     |- 54101 36625 36625 35211 (python) 78 1 332730368 16834 python -m 
>>> pyspark.daemon
>>>     |- 52140 36625 36625 35211 (python) 58 1 332730368 16837 python -m 
>>> pyspark.daemon
>>>     |- 36625 35228 36625 35211 (python) 65 604 331685888 17694 python -m 
>>> pyspark.daemon
>>>
>>>     [...]
>>>
>>>
>>> Full output here: https://gist.github.com/skrasser/e3e2ee8dede5ef6b082c
>>>
>>> Thank you!
>>> -Sven
>>>
>>> --
>>> krasser <http://sites.google.com/site/krasser/?utm_source=sig>
>>>
>>
>>


-- 
http://sites.google.com/site/krasser/?utm_source=sig
