Re: Mesos killing Spark Driver

2015-01-08 Thread Gerard Maas
Hi, This issue is on prod, running Marathon 0.6; we are currently testing 0.7.5 on Dev, but I have no results on this behavior yet. I saw your post while searching the Marathon group but didn't think it would apply to my case, as I don't see the NPE. The warning on version mismatch between M

Re: Mesos killing Spark Driver

2015-01-08 Thread Shijun Kong
Hi Gerard, What version of Marathon are you running? I ran into similar behavior some time back. My problem seemed to be a compatibility issue between Marathon and Mesos: https://github.com/mesosphere/marathon/issues/595 Regards, Shijun On Jan 8, 2015, at 9:28 AM, Gerard Maas mailto:gerard.m.

Re: Mesos killing Spark Driver

2015-01-08 Thread Gerard Maas
Hi again, I finally found a clue to this issue. It looks like Marathon is the one behind the job-killing spree. I still don't know *why*, but it looks like Marathon's task consolidation finds a discrepancy with Mesos and decides to kill the instance. INFO|2015-01-08 10:05:35,491|pool-1-thread
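Finding entries like the one quoted above in a large Marathon log can be scripted. A minimal Python sketch, assuming each line follows the pipe-delimited layout visible in the excerpt (level|timestamp|thread) with the free-text message as an assumed fourth field; the function names and the "kill" keyword filter are hypothetical choices, not anything Marathon defines.

```python
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple, Optional


class LogEntry(NamedTuple):
    level: str
    timestamp: datetime
    thread: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse 'LEVEL|yyyy-mm-dd HH:MM:SS,mmm|thread|message' (assumed layout)."""
    parts = line.rstrip("\n").split("|", 3)
    if len(parts) != 4:
        return None  # not in the expected four-field layout
    level, ts, thread, message = parts
    try:
        timestamp = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None  # timestamp field did not match the assumed format
    return LogEntry(level, timestamp, thread, message)


def matching_entries(lines: Iterable[str], keyword: str = "kill") -> Iterator[LogEntry]:
    """Yield parsed entries whose message contains keyword (case-insensitive)."""
    needle = keyword.lower()
    for line in lines:
        entry = parse_line(line)
        if entry is not None and needle in entry.message.lower():
            yield entry
```

Passing an open log file to matching_entries yields the candidate kill events in order, with parsed timestamps that can be lined up against the Mesos slave logs from the same time window.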

Re: Mesos killing Spark Driver

2014-12-01 Thread Gerard Maas
Thanks! I'll try that and report back once I have some interesting evidence. -kr, Gerard. On Tue, Dec 2, 2014 at 12:54 AM, Tim Chen wrote: > Hi Gerard, > > I see. What would help diagnose your problem is if you > could enable verbose logging (GLOG_v=1) before running the slave,

Re: Mesos killing Spark Driver

2014-12-01 Thread Tim Chen
Hi Gerard, I see. What would help diagnose your problem is if you could enable verbose logging (GLOG_v=1) before running the slave and share the slave logs when it happens. Tim On Mon, Dec 1, 2014 at 3:23 PM, Gerard Maas wrote: > Hi Tim, > > It's quite hard to reproduce. It j
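Tim's suggestion amounts to setting the glog environment variable GLOG_v=1 for the slave process before it starts. A minimal Python sketch, assuming the slave is launched from a script: the helper names (verbose_slave_env, launch_slave) are hypothetical, and the mesos-slave binary name plus the --master and --log_dir flags are assumptions about a typical install, so check them against your Mesos version.

```python
import os
import subprocess
from typing import Mapping, Optional


def verbose_slave_env(base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Copy the given environment (default: os.environ) and set glog verbosity to 1."""
    env = dict(os.environ if base_env is None else base_env)
    env["GLOG_v"] = "1"  # the verbosity level suggested in the thread
    return env


def launch_slave(master: str, log_dir: str) -> subprocess.Popen:
    """Start a slave with verbose logging (hypothetical launcher).

    The binary name and flags are assumptions; adjust them to your deployment.
    """
    cmd = ["mesos-slave", f"--master={master}", f"--log_dir={log_dir}"]
    return subprocess.Popen(cmd, env=verbose_slave_env())
```

For example, launch_slave("zk://host:2181/mesos", "/var/log/mesos") (placeholder values) would start the slave with its logs written under the given directory, ready to share the next time the driver is killed. Copying the environment instead of mutating os.environ keeps the setting scoped to the slave process.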

Re: Mesos killing Spark Driver

2014-12-01 Thread Gerard Maas
Hi Tim, It's quite hard to reproduce. It just "happens"... sometimes worse than others, mostly when the system is under load. We notice because the framework starts 'jumping' from one slave to another, but so far we have no clue why this is happening. What I'm currently looking for is some potential co

Re: Mesos killing Spark Driver

2014-12-01 Thread Jing Dong
When the executor dies, did you see any exceptions from Mesos or Spark? On 1 December 2014 at 22:43, Gerard Maas wrote: > Hi, > > Sorry if this has been discussed before. I'm new to the list. > > We are currently running our Spark + Spark Streaming jobs on Mesos, > submitting our jobs through Marat

Re: Mesos killing Spark Driver

2014-12-01 Thread Tim Chen
There are different reasons, but the most common is that the framework asks to kill the task. Can you provide some easy repro steps or artifacts? I've been working on Spark on Mesos these days and can help try this out. Tim On Mon, Dec 1, 2014 at 2:43 PM, Gerard Maas wrote: > Hi, > > Sorry if this h

Fwd: Mesos killing Spark Driver

2014-12-01 Thread Gerard Maas
Hi, Sorry if this has been discussed before. I'm new to the list. We are currently running our Spark + Spark Streaming jobs on Mesos, submitting our jobs through Marathon. We see with some regularity that the Spark Streaming driver gets killed by Mesos and then restarted on some other node by Ma