Hi,
This issue is on prod, running Marathon 0.6. We are currently testing
0.7.5 on dev, but I have no results on this behavior yet.
I saw your post while searching the Marathon group, but I didn't think it
applied to my case, as I don't see the NPE.
The warning on version mismatch between M…
Hi Gerard,
What version of Marathon are you running? I ran into similar behavior some time
back. My problem seemed to be a compatibility issue between Marathon and Mesos:
https://github.com/mesosphere/marathon/issues/595
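(If you're not sure which version you're on, newer Marathon releases report it
through the REST API; the host and port below are assumptions:)

  curl http://marathon-host:8080/v2/info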
Regards,
Shijun
On Jan 8, 2015, at 9:28 AM, Gerard Maas wrote:
Hi again,
I finally found a clue to this issue. It looks like Marathon is the one
behind the job-killing spree. I still don't know *why*, but it looks like
Marathon's task consolidation finds a discrepancy with Mesos and decides
to kill the instance.
INFO|2015-01-08 10:05:35,491|pool-1-thread…
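(One way to see the discrepancy is to compare Marathon's task list against the
Mesos master's state; the hosts, ports, and app id below are placeholders:)

  # Marathon's view of the app's tasks
  curl http://marathon-host:8080/v2/apps/<app-id>/tasks
  # Mesos master's view of the cluster state, including running tasks
  curl http://mesos-master:5050/state.json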
Thanks! I'll try that and report back once I have some interesting evidence.
-kr, Gerard.
On Tue, Dec 2, 2014 at 12:54 AM, Tim Chen wrote:
> Hi Gerard,
>
> I see. What would be helpful for diagnosing your problem is if you can
> enable verbose logging (GLOG_v=1) before running the slave, and share
> the slave logs when it happens.
Hi Gerard,
I see. What would be helpful for diagnosing your problem is if you can
enable verbose logging (GLOG_v=1) before running the slave, and share
the slave logs when it happens.
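Something like this (the master URL and work dir are just placeholders for
your own settings):

  # GLOG_v=1 turns on glog verbose logging in the Mesos slave
  export GLOG_v=1
  mesos-slave --master=zk://zk-host:2181/mesos --work_dir=/var/lib/mesos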
Tim
On Mon, Dec 1, 2014 at 3:23 PM, Gerard Maas wrote:
> Hi Tim,
>
> It's quite hard to reproduce. It just "happens"…
Hi Tim,
It's quite hard to reproduce. It just "happens"... sometimes worse than
others, mostly when the system is under load. We notice because the framework
starts 'jumping' from one slave to another, but so far we have no clue why
this is happening.
What I'm currently looking for is some potential co…
When the executor dies, did you see any exceptions from Mesos or Spark?
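(If the executor itself crashed, its output should still be in the slave's
sandbox; the default work dir is assumed below and the ids vary per run:)

  # Executor stderr in the Mesos sandbox; 'latest' points at the newest run
  less /var/lib/mesos/slaves/<slave-id>/frameworks/<framework-id>/executors/<executor-id>/runs/latest/stderr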
On 1 December 2014 at 22:43, Gerard Maas wrote:
> Hi,
>
> Sorry if this has been discussed before. I'm new to the list.
>
> We are currently running our Spark + Spark Streaming jobs on Mesos,
> submitting our jobs through Marathon.
There are different reasons, but most commonly it's when the framework asks
to kill the task.
Can you provide some easy repro steps/artifacts? I've been working on Spark
on Mesos these days and can help try this out.
Tim
On Mon, Dec 1, 2014 at 2:43 PM, Gerard Maas wrote:
> Hi,
>
> Sorry if this has been discussed before. I'm new to the list.
Hi,
Sorry if this has been discussed before. I'm new to the list.
We are currently running our Spark + Spark Streaming jobs on Mesos,
submitting our jobs through Marathon.
We see with some regularity that the Spark Streaming driver gets killed by
Mesos and then restarted on some other node by Marathon…
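(For context, we submit the driver to Marathon with something along these
lines; the app id, jar path, and class name are placeholders, not our real
config:)

  curl -X POST http://marathon-host:8080/v2/apps \
       -H 'Content-Type: application/json' \
       -d '{
             "id": "spark-streaming-driver",
             "cmd": "spark-submit --master mesos://zk://zk-host:2181/mesos --class com.example.StreamingJob /opt/jobs/streaming-job.jar",
             "cpus": 1.0,
             "mem": 1024,
             "instances": 1
           }'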