Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1358#issuecomment-60327322
@tsliwowicz your fix seems good -- thanks for getting to the bottom of this!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user tsliwowicz commented on the pull request:
https://github.com/apache/spark/pull/1358#issuecomment-59724452
@mateiz - @KashiErez and I took a different route. The killer problem was
a System.exit(1) call in BlockManagerMasterActor, which was a huge
robustness issue
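The robustness point above can be sketched as follows. This is illustrative pseudocode, not the actual BlockManagerMasterActor implementation: the class, case class, and method names here are hypothetical stand-ins for the idea of replacing a stale registration instead of killing the whole process.

```scala
// Sketch, assuming the failure mode described above: a restarted executor
// re-registers its block manager under an ID that is already known.
// Instead of calling System.exit(1) on that path, drop the stale entry
// and accept the new registration, keeping the driver alive.
import scala.collection.mutable

case class BlockManagerInfo(executorId: String, maxMem: Long)

class BlockManagerRegistry {
  private val byExecutor = mutable.Map.empty[String, BlockManagerInfo]

  def register(info: BlockManagerInfo): Unit = {
    byExecutor.get(info.executorId) match {
      case Some(old) =>
        // Previously this situation triggered System.exit(1); replacing
        // the stale registration tolerates executor restarts instead.
        println(s"Replacing stale block manager for ${info.executorId}: $old")
        byExecutor(info.executorId) = info
      case None =>
        byExecutor(info.executorId) = info
    }
  }

  def lookup(executorId: String): Option[BlockManagerInfo] =
    byExecutor.get(executorId)
}
```

Under this sketch, a duplicate registration simply supersedes the old entry rather than terminating the JVM.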
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1358#issuecomment-55978678
I see, so maybe the problem is that an executor dies, and another is
launched on the same Mesos machine with the same executor ID, which then breaks
assumptions elsewhere
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1358#issuecomment-55978771
BTW the delta from the original pull request would be that we only
increment our counter when the old executor fails. If you want to implement
that, please create a JIRA
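The scheme discussed in this thread — executor IDs built from the slave ID plus a counter that is bumped only when the previous executor fails — might look roughly like this. The class and method names are hypothetical; this is a sketch of the idea, not the actual Spark/Mesos scheduler code.

```scala
// Sketch, assuming the counter-per-slave scheme described above:
// the first executor on a slave gets counter 0, and the counter is
// incremented only when that executor is reported failed, so a
// replacement executor never reuses the old ID.
import scala.collection.mutable

class ExecutorIdGenerator {
  // Per-slave counter; defaults to 0 for a slave we have not seen.
  private val counters = mutable.Map.empty[String, Int].withDefaultValue(0)

  // ID for the current executor on this slave.
  def idFor(slaveId: String): String = s"$slaveId/${counters(slaveId)}"

  // Called when the executor on `slaveId` is lost; the next executor
  // launched on that slave gets a fresh, never-before-used ID.
  def executorFailed(slaveId: String): Unit =
    counters(slaveId) += 1
}
```

Keeping the counter per slave preserves the "one live executor per slave" assumption while still distinguishing a restarted executor from the one it replaced.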
Github user brndnmtthws commented on the pull request:
https://github.com/apache/spark/pull/1358#issuecomment-55659794
It seems that this is a symptom of the following issue:
https://issues.apache.org/jira/browse/SPARK-3535
---
Github user brndnmtthws commented on the pull request:
https://github.com/apache/spark/pull/1358#issuecomment-55310679
Yep, also hitting this same problem. We're running Spark 1.0.2 and Mesos
0.20.0.
From a quick analysis, it looks like a bug in Spark.
---
Github user KashiErez commented on the pull request:
https://github.com/apache/spark/pull/1358#issuecomment-55237569
I have encountered this issue:
We have a Spark job running 24/7 on Mesos.
It happens every 1-3 days.
Here are 2 lines from my Driver log file:
Github user gmalouf commented on the pull request:
https://github.com/apache/spark/pull/1358#issuecomment-52836949
We've run into this issue a handful of times including once today - is it
possible the bug is in Mesos?
---
Github user drexin commented on the pull request:
https://github.com/apache/spark/pull/1358#issuecomment-50318921
@mateiz: You are right. I don't see how an executor could be started more
than once per slave, but it seems to happen sometimes (see the mailing list
entry). I will close
Github user drexin closed the pull request at:
https://github.com/apache/spark/pull/1358
---
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1358#issuecomment-50413581
Sure, if you find it, let me know.
---
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1358#issuecomment-50250085
Jenkins, test this please
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1358#issuecomment-50250198
QA tests have started for PR 1358. This patch merges cleanly. View
progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17229/consoleFull
---
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1358#discussion_r15436238
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala ---
@@ -250,7 +252,7 @@ private[spark] class
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1358#issuecomment-50250554
So I don't quite understand, how can multiple executors be launched for the
same Spark application on the same node right now? I thought we always reuse
our executor
Github user drexin commented on the pull request:
https://github.com/apache/spark/pull/1358#issuecomment-48706534
Hi Patrick,
the problem is described in [this mailing list
Github user drexin commented on the pull request:
https://github.com/apache/spark/pull/1358#issuecomment-48707209
Created a JIRA issue here: https://issues.apache.org/jira/browse/SPARK-2445
---
GitHub user drexin opened a pull request:
https://github.com/apache/spark/pull/1358
mesos executor ids now consist of the slave id and a counter to fix duplicate id problems
You can merge this pull request into a Git repository by running:
$ git pull
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1358#issuecomment-48630513
Can one of the admins verify this patch?
---