Hi, Lukas,
Do you know how to modify the timeout settings for ZooKeeper in Giraph? I
see the session is established on the server with negotiated timeout = 60,
which is 600s. I think this is enough for the job, since the job gets aborted
within only a few minutes. I'm really confused here about why the server closed
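ZooKeeper reports the negotiated timeout in milliseconds, and it is not necessarily the value the client asked for: the server clamps the requested session timeout to its minSessionTimeout and maxSessionTimeout, which default to 2 * tickTime and 20 * tickTime when they are not set explicitly. The Python sketch below models that clamping so you can check what a given request can actually get. The function name and the example numbers (a 600 s request, tickTime=2000) are hypothetical; this is an illustration, not ZooKeeper's or Giraph's code.

```python
from typing import Optional


def negotiate_session_timeout(requested_ms: int,
                              tick_time_ms: int,
                              min_session_ms: Optional[int] = None,
                              max_session_ms: Optional[int] = None) -> int:
    """Model how a ZooKeeper server clamps a client's requested session timeout.

    Unset bounds fall back to 2 * tickTime (minimum) and 20 * tickTime (maximum).
    All values are in milliseconds, as in the "negotiated timeout" log line.
    """
    lower = min_session_ms if min_session_ms is not None else 2 * tick_time_ms
    upper = max_session_ms if max_session_ms is not None else 20 * tick_time_ms
    timeout = requested_ms
    if timeout < lower:   # raise too-small requests to the minimum first
        timeout = lower
    if timeout > upper:   # then cap too-large requests at the maximum
        timeout = upper
    return timeout


if __name__ == "__main__":
    # Hypothetical: the client asks for 600 s, the server runs with tickTime=2000
    # and no explicit maxSessionTimeout, so the request is capped at 40 s.
    negotiated = negotiate_session_timeout(600_000, tick_time_ms=2000)
    print(f"{negotiated} ms = {negotiated / 1000:.0f} s")  # 40000 ms = 40 s
```

In other words, raising only the client-side request has no effect beyond the server's maxSessionTimeout; the server-side bound has to allow the larger value too.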
Hi, Lukas,
I applied the patch to
giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java and
recompiled Giraph with "mvn compile", but I still get the same error:
14/04/07 16:51:26 INFO job.JobProgressTracker: Data from 8 workers - Compute superstep 2: 0 out of 4847571 vertices comput
Hi, Lukas,
Thank you, but when I tried to apply the patch, I got:
2014.04.07|09:25:47~/giraph/giraph-core/src> git apply --check NettyClient_Timeout.patch
error: patch failed: giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java:153
error: giraph-core/src/main/java/org/apache/g
BTW: This patch solves connection problems between workers, not with
ZooKeeper, but since your problem disappears when you don't send messages, the
ZooKeeper problems may be secondary.
On 5.4.2014 00:12, Lukas Nalezenec wrote:
Hi,
I had a similar issue; it was caused by long GC pauses. I patched
NettyClient so that when a reconnect fails, it sleeps for some time before the
next try. The patch is enclosed. Let me know if it works for you.
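A retry loop that sleeps between failed reconnect attempts can be sketched as follows. This is not the enclosed patch, which is described only as sleeping for some time before the next try; it is a hypothetical Python illustration in which `connect` stands in for whatever opens the connection, and the doubling delay (exponential backoff) is an added assumption. Passing `sleep` in as a parameter lets the loop be exercised without real waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_backoff(connect: Callable[[], T],
                         max_attempts: int = 5,
                         initial_sleep_s: float = 1.0,
                         max_sleep_s: float = 30.0,
                         sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `connect` until it succeeds, sleeping between failed attempts.

    The delay doubles after each failure, capped at max_sleep_s, so a peer
    stuck in a long GC pause gets time to recover before the next try.
    The last ConnectionError is re-raised once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_sleep_s
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            sleep(delay)
            delay = min(delay * 2, max_sleep_s)
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = []

    def flaky_connect() -> str:
        # Hypothetical peer that refuses the first two connection attempts.
        calls.append(1)
        if len(calls) < 3:
            raise ConnectionError("peer not responding")
        return "connected"

    waits = []
    print(connect_with_backoff(flaky_connect, sleep=waits.append))  # connected
    print(waits)  # [1.0, 2.0]
```

A fixed sleep would also match the description of the patch; backoff simply stops a worker from hammering a peer that stays unresponsive for a long pause.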
I would try tuning GC. You can also try to use
giraph.waitForRequestsConfirmation and giraph.maxN
Hi,
I have a ZooKeeper problem when running a Giraph program: the program
is aborted in superstep 2 with:
14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Opening socket connection to server compute-0-18.local/10.1.255.236:22181. Will not attempt to authenticate using SASL (unknown error)
14/04/04