Re: zookeeper problem in giraph..

2014-04-08 Thread Suijian Zhou
Hi, Lukas, Do you know how to modify the timeout settings for zookeeper in giraph? I see the session is established on the server with negotiated timeout = 60, which is 600s. I think this is enough for the job, as the job gets aborted after only a few minutes. Really confused here, why the server closed

Re: zookeeper problem in giraph..

2014-04-07 Thread Suijian Zhou
Hi, Lukas, Got the patch applied to giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java and recompiled giraph by "mvn compile", but still the same error: 14/04/07 16:51:26 INFO job.JobProgressTracker: Data from 8 workers - Compute superstep 2: 0 out of 4847571 vertices comput

Re: zookeeper problem in giraph..

2014-04-07 Thread Suijian Zhou
Hi, Lukas, Thank you, but when I tried to apply the patch, I got: 2014.04.07|09:25:47~/giraph/giraph-core/src> git apply --check NettyClient_Timeout.patch error: patch failed: giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java:153 error: giraph-core/src/main/java/org/apache/g

Re: zookeeper problem in giraph..

2014-04-04 Thread Lukas Nalezenec
BTW: This patch solves connection problems between workers, not with zookeeper, but as your problem disappears when you don't send messages, the zookeeper problems may be secondary. On 5.4.2014 00:12, Lukas Nalezenec wrote: Hi, I had similar issue, it was caused by long GC pauses. I patched Nett

Re: zookeeper problem in giraph..

2014-04-04 Thread Lukas Nalezenec
Hi, I had a similar issue, it was caused by long GC pauses. I patched NettyClient so that when a reconnect fails it sleeps for some time before the next try. Patch is enclosed. Let me know if it works for you. I would try tuning GC. You can also try to use giraph.waitForRequestsConfirmation and giraph.maxN
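
For readers without the attachment, the idea of sleeping between failed reconnect attempts can be sketched roughly as below. This is not the enclosed patch and not Giraph's NettyClient code: the class ReconnectWithSleep, the method connectWithRetry and its parameters are all hypothetical names made up for illustration, and the "connection" is just a callback that returns true on success.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; not the actual NettyClient patch.
public class ReconnectWithSleep {

    /**
     * Calls tryConnect until it succeeds or maxAttempts is reached,
     * sleeping waitMillis between failed attempts.
     * Returns the number of attempts used, or -1 if every attempt failed
     * or the thread was interrupted while sleeping.
     */
    public static int connectWithRetry(BooleanSupplier tryConnect,
                                       int maxAttempts,
                                       long waitMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryConnect.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    // Give the remote side (e.g. a worker stuck in GC) time to recover.
                    Thread.sleep(waitMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up.
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated connection that only succeeds on the third call.
        int used = connectWithRetry(() -> ++calls[0] >= 3, 5, 100L);
        System.out.println("connected after " + used + " attempts");
    }
}
```

The point of the pause is that an immediate retry against a worker that is in the middle of a long GC pause would most likely fail again, so a fixed wait gives it a chance to come back before the attempts run out.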

zookeeper problem in giraph..

2014-04-04 Thread Suijian Zhou
Hi, I have a zookeeper problem when running a giraph program: the program gets aborted in superstep 2 with: 14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Opening socket connection to server compute-0-18.local/10.1.255.236:22181. Will not attempt to authenticate using SASL (unknown error) 14/04/04