[jira] [Commented] (MESOS-5332) TASK_LOST on slave restart potentially due to executor race condition

2017-05-25, Benjamin Mahler (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025625#comment-16025625 ] Benjamin Mahler commented on MESOS-5332: In order to enable users who hit this situation to safely

2016-05-11, Anand Mazumdar (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280897#comment-15280897 ] Anand Mazumdar commented on MESOS-5332: [~StephanErb] That took some catching! Since we have

2016-05-11, Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279791#comment-15279791 ] Stephan Erb commented on MESOS-5332: I was able to assemble a reproducing example (using Aurora master

2016-05-07, Anand Mazumdar (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275412#comment-15275412 ] Anand Mazumdar commented on MESOS-5332: It doesn't. {{send}} just provides at-most-once semantics.
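To make the at-most-once point concrete, here is a minimal Python sketch under stated assumptions. It does not use or model libprocess or the Mesos executor protocol: FlakyChannel, send_at_most_once, send_at_least_once and the "reregister" message are hypothetical names, and the channel simply drops its first N delivery attempts to stand in for a peer that is briefly unreachable (for example, during an agent restart). A fire-and-forget send loses the message, while a sender that retries until it sees an acknowledgement gets it through.

```python
# Conceptual sketch only: this is NOT the libprocess API. FlakyChannel,
# send_at_most_once and send_at_least_once are hypothetical names.


class FlakyChannel:
    """Hypothetical transport that silently drops the first `drops` delivery
    attempts, e.g. while the receiving process is restarting."""

    def __init__(self, drops):
        self.remaining_drops = drops
        self.delivered = []

    def deliver(self, msg):
        """Return True if msg reached the receiver, False if it was dropped."""
        if self.remaining_drops > 0:
            self.remaining_drops -= 1
            return False
        self.delivered.append(msg)
        return True


def send_at_most_once(channel, msg):
    # Fire-and-forget: a single attempt whose outcome is ignored, so the
    # message arrives zero or one times and the sender never learns which.
    channel.deliver(msg)


def send_at_least_once(channel, msg, max_attempts=10):
    # Retry until delivery is confirmed. Here deliver()'s return value stands
    # in for an acknowledgement. Returns the number of attempts used, or None
    # if every attempt was dropped.
    for attempt in range(1, max_attempts + 1):
        if channel.deliver(msg):
            return attempt
    return None


if __name__ == "__main__":
    lossy = FlakyChannel(drops=1)
    send_at_most_once(lossy, "reregister")  # hypothetical message name
    print(lossy.delivered)  # prints []

    retrying = FlakyChannel(drops=1)
    print(send_at_least_once(retrying, "reregister"))  # prints 2
```

A real at-least-once scheme also has to cope with duplicates: if an acknowledgement is lost, the sender resends a message the receiver already has. The sketch sidesteps that by treating a successful deliver() as the acknowledgement.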

2016-05-07, Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275314#comment-15275314 ] Stephan Erb commented on MESOS-5332: The observation that it takes 5 seconds for a faulty executor to

2016-05-06, Anand Mazumdar (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274881#comment-15274881 ] Anand Mazumdar commented on MESOS-5332: [~bmahler] and I investigated this again today. We think

2016-05-06, Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273776#comment-15273776 ] Stephan Erb commented on MESOS-5332: All 7 killed executors have the same offending log messages. I

2016-05-05, Anand Mazumdar (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273336#comment-15273336 ] Anand Mazumdar commented on MESOS-5332: [~bmahler] and I went through the agent logs in more

2016-05-05, Anand Mazumdar (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273276#comment-15273276 ] Anand Mazumdar commented on MESOS-5332: Correction: Some of the ~9400 might have already been

2016-05-05, Anand Mazumdar (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273268#comment-15273268 ] Anand Mazumdar commented on MESOS-5332: [~StephanErb] Thanks for uploading the logs. Can you

2016-05-05, Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273171#comment-15273171 ] Stephan Erb commented on MESOS-5332: {code} $ ls

2016-05-05, Anand Mazumdar (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272952#comment-15272952 ] Anand Mazumdar commented on MESOS-5332: [~StephanErb] Thanks for reporting this issue after the

2016-05-05, Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272937#comment-15272937 ] Stephan Erb commented on MESOS-5332: [~vinodkone] we have talked about this issue before