We may no longer need to track disassociation; IMHO we should use the *improved*
feature in akka 2.2.x called remote death watch, which lets us acknowledge
a remote death both in the case of a natural demise and in the case of an
accidental death. This was not the case with remote death watch in previous
akka releases.
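For concreteness, a minimal sketch of what watching a remote executor with it
could look like (ExecutorWatcher is an invented name for illustration, not
actual Spark code):

import akka.actor.{Actor, ActorRef, Terminated}

// With akka 2.2.x remote death watch, Terminated arrives both when the
// watched actor stops cleanly and when its node/JVM dies unexpectedly.
class ExecutorWatcher(executor: ActorRef) extends Actor {
  context.watch(executor)  // works across the remote boundary in 2.2.x

  def receive = {
    case Terminated(ref) =>
      // one code path for graceful shutdown and for accidental death
      println("executor " + ref + " is gone, cleaning up")
      context.stop(self)
  }
}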
Hey Prashant, do messages still get lost while we're disassociated? Or can you
set the timeouts high enough to prevent that?
Matei
We can set the timeouts high enough! The same as the connection timeout that we
already set.
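For reference, these are the kind of knobs I mean (setting names from the akka
2.2 remoting config, assuming I'm reading reference.conf right; the values here
are only illustrative, not recommendations):

akka.remote.transport-failure-detector.heartbeat-interval = 30 s
akka.remote.transport-failure-detector.acceptable-heartbeat-pause = 600 s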
Yes, so far they've been built on that assumption: not that Akka would
*guarantee* delivery, in the sense that as soon as the send() call returns you
know it's delivered, but that Akka would act the same way as a TCP socket,
allowing you to send a stream of messages in order and hear when the connection
breaks.
Unfortunately that change wasn't the silver bullet I was hoping for. Even with
1) ignoring DisassociatedEvent,
2) having the executor use ReliableProxy to send messages back to the driver
(sketched below), and
3) turning up akka.remote.watch-failure-detector.threshold=12,
there is a lot of weird behavior. First, there are a few
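Rough sketch of what 2) looks like on the executor side, assuming the akka 2.2
contrib ReliableProxy API (the actor and names here are invented for
illustration):

import scala.concurrent.duration._
import akka.actor.{Actor, ActorRef, Props}
import akka.contrib.pattern.ReliableProxy

class ExecutorSideSketch(driver: ActorRef) extends Actor {
  // the proxy retransmits unacknowledged messages every 100 ms, so the
  // executor -> driver stream can survive transient disassociations
  val proxy = context.actorOf(Props(classOf[ReliableProxy], driver, 100.millis))

  def receive = {
    case msg: AnyRef => proxy ! msg  // instead of driver ! msg
  }
}

Note that ReliableProxy only buffers unacknowledged messages in memory on the
sending side, so it papers over flaky links but not an actual JVM death on
either end.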
It’s true that Akka’s delivery guarantees are in general at-most-once, but if
you look at the text there it says that they differ by transport. In the
previous version, I’m quite sure that except maybe in very rare circumstances
or cases where we had a bug, Akka’s remote layer always kept
Sorry if my understanding is wrong. Maybe for this particular case it
might be something to do with the load/network, but, in general, are you
saying that we build these communication channels (block manager
communication, task events communication, etc.) assuming akka would take
care of it? I
I have things running (from the scala 2.10 branch) for over 3-4 hours now
without a problem, and my jobs write about the same amount of data as you
suggested. My cluster size is 7 nodes, and it is not *congested* for memory.
I am going to leave jobs running all night long. Meanwhile I'd encourage you
to try to spot the
I'm gonna try turning on more akka debugging messages as described at
http://akka.io/faq/
and
http://doc.akka.io/docs/akka/current/scala/testing.html#Tracing_Actor_Invocations
Unfortunately that will require a patch to Spark, but hopefully that will
give us more info to go on ...
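For anyone following along, the settings those pages describe are roughly
these (the receive = on flag is why it needs a Spark patch: akka only logs
messages for actors whose receive is wrapped in LoggingReceive):

akka {
  loglevel = "DEBUG"
  actor.debug {
    receive = on      # needs LoggingReceive around each actor's receive block
    autoreceive = on  # logs auto-received messages like PoisonPill and Kill
    lifecycle = on    # logs actor starts, stops, and restarts
  }
  remote {
    log-sent-messages = on
    log-received-messages = on
  }
}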
Can you apply this patch too and check the logs of the driver and worker?
diff --git
a/core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
b/core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
index b6f0ec9..ad0ebf7 100644
---
I am guessing something is wrong with using the DisassociatedEvent then.
Try applying something along the lines of this patch. This might cause the
executors to hang, so be prepared for that.
diff --git
a/core/src/main/scala/org/apache/spark/executor/StandaloneExecutorBackend.scala
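The gist of it is something like this (a sketch only, not the actual diff):
subscribe to the remoting lifecycle events and, instead of exiting when the
driver link drops, just log the DisassociatedEvent and keep the executor alive.

import akka.actor.Actor
import akka.remote.{DisassociatedEvent, RemotingLifecycleEvent}

class ExecutorBackendSketch extends Actor {
  override def preStart() {
    context.system.eventStream.subscribe(self, classOf[RemotingLifecycleEvent])
  }

  def receive = {
    case e: DisassociatedEvent =>
      // before: shut the executor down here; now we wait for the driver
      // to come back, which is why the executor can end up hanging
      println("driver connection lost: " + e)
  }
}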
Yeah, just causes them to hang.
The first deadLetters message shows up at about the same time. Oddly, after
it first happens, I keep getting some results trickling in from those
executors (maybe they were just queued up on the driver already, I
dunno), but then it just hangs. The stage has a
We've been testing out the 2.10 branch of Spark, and we're running into
some issues where akka disconnects from the executors after a while. We ran
some simple tests first, and all was well, so we started upgrading our
whole codebase to 2.10. Everything seemed to be working, but then we
noticed