Re: java driver/shutdown call

2018-01-17 Thread Mohit Jaggi
ore robust? > > On Tue, Jan 16, 2018 at 8:41 PM Mohit Jaggi wrote: > >> I am trying to change Apache Aurora's code to call SHUTDOWN instead of >> KILL. SHUTDOWN seems to offer more robust termination than KILL. >> >> On Tue, Jan 16, 2018 at 6:40 PM, Benjamin Ma

Re: java driver/shutdown call

2018-01-16 Thread Mohit Jaggi
< >>> mazumdar.an...@gmail.com> wrote: >>> >>>> Yes; It's a newer interface that still allows you to switch between the >>>> v1 (new) and the old API. >>>> >>>> -anand >>>> >>>> On Fri, Jan 12, 20

Re: java driver/shutdown call

2018-01-16 Thread Mohit Jaggi
> On Fri, Jan 12, 2018 at 3:28 PM, Mohit Jaggi wrote: > >> Are you suggesting >> >> *send(new Call(METHOD, Param1, ...)) * >> >> instead of >> >> *driver.method(Param1, )* >> >> *?* >> >> On Fri, Jan 12, 2018 at 10:59 AM, A

Re: java driver/shutdown call

2018-01-12 Thread Mohit Jaggi
' call. We also have a V0Mesos class that uses the old > scheduler driver internally. > > -anand > > On Wed, Jan 10, 2018 at 2:53 PM, Mohit Jaggi wrote: > >> Thanks Vinod. Is there a V1SchedulerDriver.java file? I see >> https://github.com/apache/mesos/tree/72752fc6d

Re: java driver/shutdown call

2018-01-10 Thread Mohit Jaggi
is only available for v1 schedulers. > > On Fri, Jan 5, 2018 at 3:38 PM, Mohit Jaggi wrote: > >> Folks, >> I am trying to change Apache Aurora's code to call SHUTDOWN instead of >> KILL. However, it seems that the SchedulerDriver class in Mesos does not >>

java driver/shutdown call

2018-01-05 Thread Mohit Jaggi
Folks, I am trying to change Apache Aurora's code to call SHUTDOWN instead of KILL. However, it seems that the SchedulerDriver class in Mesos does not have a shutdownExecutor() call. https://github.com/apache/mesos/blob/72752fc6deb8ebcbfbd5448dc599ef3774339d31/src/java/src/org/apache/mesos/Schedul

Re: explain these replication logs?

2017-12-13 Thread Mohit Jaggi
that time? > > On Wed, Dec 13, 2017 at 9:27 AM, Mohit Jaggi wrote: > >> Folks, >> Can you help please? >> >> Mohit. >> -- Forwarded message -- >> From: Bill Farner >> Date: Wed, Dec 13, 2017 at 9:06 AM >> Subject: Re: expla

Fwd: explain these replication logs?

2017-12-13 Thread Mohit Jaggi
erested in your findings! On Tue, Dec 12, 2017 at 4:32 PM, Mohit Jaggi wrote: > For the same position I see two bursts of writes, one around 00:12:36 and > another 12 min earlier. Any idea what this means? > > ~/a/a/aurora-outage ❯❯❯ grep 67516183 cpp-repl-logs > Nov 8 00:12:36

Re: Multi-machine jobs

2017-12-03 Thread Mohit Jaggi
map-reduce or spark can work. On Sun, Dec 3, 2017 at 9:13 AM, Adam Sylvester wrote: > I have a use case where my Scheduler gets an externally-generated request > to produce an image. This is a CPU-intensive task that I can divide up > into, say, 20 largely independent jobs, and I have an applic

Re: orphan executor

2017-10-31 Thread Mohit Jaggi
there > isn't a complete lifecycle API for the executor. (This includes > healthiness, state updates, reconciliation, ability for scheduler to shut > it down, etc). > > On Tue, Oct 31, 2017 at 4:27 PM, Mohit Jaggi wrote: > >> Good question. >> - I don't know

Re: orphan executor

2017-10-31 Thread Mohit Jaggi
) knows this expected behavior of Thermos and can clean up > ones that get stuck after the task terminates. However, we currently don't > provide a great executor lifecycle API to enable schedulers to do this > (it's long overdue). > > On Tue, Oct 31, 2017 at 2:47 PM, M

Re: orphan executor

2017-10-31 Thread Mohit Jaggi
I command for 'exec'ing into the > container? > > On Tue, Oct 31, 2017 at 12:47 PM, Mohit Jaggi > wrote: > >> Yes. There is a fix available now in Aurora/Thermos to try and exit in >> such scenarios. But I am curious to know if Mesos agent has the >> functi

Re: orphan executor

2017-10-31 Thread Mohit Jaggi
t looks like an > agent termination is involved here as well? > > On Fri, Oct 27, 2017 at 3:09 PM, Mohit Jaggi wrote: > >> Here are some relevant logs. Aurora scheduler logs shows the task going >> from: >> INIT >> ->PENDING >> ->ASSIGNED >>

Re: orphan executor

2017-10-27 Thread Mohit Jaggi
n start 69 _start_new_thread(self.__bootstrap, ()) 70 thread.error: can't start new thread On Fri, Oct 27, 2017 at 2:25 PM, Vinod Kone wrote: > Can you share the agent and executor logs of an example orphaned executor? > That would help us diagnose the issue. > > On Fri

orphan executor

2017-10-27 Thread Mohit Jaggi
Folks, I often see orphaned executors in my cluster. These are cases where the framework was informed of task loss, and so has forgotten about them as expected, but the container (Docker) is still around. AFAIK, the Mesos agent is the only entity that has knowledge of these containers. How do I ensure