Hi, Lukas,

I got the patch applied to giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java and recompiled Giraph with "mvn compile", but I still get the same error:
14/04/07 16:51:26 INFO job.JobProgressTracker: Data from 8 workers - Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 64 partitions computed; min free memory on worker 5 - 270.76MB, average 394.74MB
14/04/07 16:51:27 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1453e2b3cca0009, likely server has closed socket, closing socket connection and attempting reconnect
14/04/07 16:51:29 INFO zookeeper.ClientCnxn: Opening socket connection to server compute-0-19.local/10.1.255.235:22181. Will not attempt to authenticate using SASL (unknown error)
14/04/07 16:51:29 WARN zookeeper.ClientCnxn: Session 0x1453e2b3cca0009 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/04/07 16:51:31 INFO zookeeper.ClientCnxn: Opening socket connection to server compute-0-19.local/10.1.255.235:22181. Will not attempt to authenticate using SASL (unknown error)

I tried to modify some parameters in ./giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java, such as DEFAULT_ZOOKEEPER_MAX_CLIENT_CNXNS, but it seems to have no effect. Any hints?

Best Regards,
Suijian

2014-04-07 9:34 GMT-05:00 Suijian Zhou <suijian.z...@gmail.com>:
> Hi, Lukas,
> Thank you, but when I tried to apply the patch, I got:
> 2014.04.07|09:25:47~/giraph/giraph-core/src> git apply --check NettyClient_Timeout.patch
> error: patch failed: giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java:153
> error: giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java: patch does not apply
>
> Could you send me the new patched NettyClient.java file directly? Thanks!
>
> Best Regards,
> Suijian
>
> 2014-04-04 17:12 GMT-05:00 Lukas Nalezenec <lukas.naleze...@firma.seznam.cz>:
>
>> Hi,
>>
>> I had a similar issue; it was caused by long GC pauses. I patched
>> NettyClient so that when a reconnect fails, it sleeps for some time before
>> the next try. The patch is enclosed. Let me know if it works for you.
>> I would try tuning GC. You can also try to use
>> giraph.waitForRequestsConfirmation and giraph.maxNumberOfOpenRequests.
>> I hope I am right.
>>
>> Regards
>> Lukas
>>
>> On 4.4.2014 22:49, Suijian Zhou wrote:
>>
>> Hi,
>> I have a ZooKeeper problem when running a Giraph program; the program
>> is aborted in superstep 2 as follows:
>> 14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Opening socket connection to server compute-0-18.local/10.1.255.236:22181. Will not attempt to authenticate using SASL (unknown error)
>> 14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Socket connection established to compute-0-18.local/10.1.255.236:22181, initiating session
>> 14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Session establishment complete on server compute-0-18.local/10.1.255.236:22181, sessionid = 0x1452e7c79910009, negotiated timeout = 600000
>> ......
>> 14/04/04 15:46:08 INFO job.JobProgressTracker: Data from 8 workers - Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 64 partitions computed; min free memory on worker 3 - 270.37MB, average 451.21MB
>> 14/04/04 15:46:13 INFO job.JobProgressTracker: Data from 8 workers - Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 64 partitions computed; min free memory on worker 6 - 249.25MB, average 404.02MB
>> 14/04/04 15:46:16 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1452e7c79910009, likely server has closed socket, closing socket connection and attempting reconnect
>> 14/04/04 15:46:17 INFO zookeeper.ClientCnxn: Opening socket connection to server compute-0-18.local/10.1.255.236:22181. Will not attempt to authenticate using SASL (unknown error)
>> 14/04/04 15:46:17 WARN zookeeper.ClientCnxn: Session 0x1452e7c79910009 for server null, unexpected error, closing socket connection and attempting reconnect
>> java.net.ConnectException: Connection refused
>>
>> Each rerun of the program leads to a different compute node reporting
>> the same error ("Unable to read additional data from server sessionid...").
>>
>> The code in superstep 2 is:
>>   if (getSuperstep() == 2) {
>>     for (IntWritable message : messages) {
>>       for (Edge<IntWritable, IntWritable> edge : vertex.getEdges()) {
>>         sendMessage(edge.getTargetVertexId(), message);
>>         //int abc=0;
>>       }
>>     }
>>   }
>>
>> I checked that if I replace the line
>> "sendMessage(edge.getTargetVertexId(), message);" with a meaningless
>> line like "int abc=0;", the program finishes successfully. It looks like a
>> ZooKeeper problem, but ZooKeeper seems to come with Giraph here, as I did
>> not install ZooKeeper separately. I tried to modify parameters in
>> GiraphConstants.java and recompile Giraph, but it does not seem to take
>> any effect: the screen output shows the parameters were not changed at
>> all. Any hints?
>>
>> Best Regards,
>> Suijian
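The superstep-2 loop in the original message calls sendMessage once for every (incoming message, out-edge) pair, so a vertex that received m messages and has d out-edges sends m × d messages in that single superstep. That fits the observation that the job finishes once sendMessage is replaced by a no-op, and it is compatible with Lukas's guess of long GC pauses, although the thread does not confirm the cause. The stdlib-only Java sketch that follows just counts those calls; the class name `Superstep2FanOut`, the four-vertex graph, and the incoming-message counts (chosen equal to each vertex's in-degree, as if every vertex had messaged all its neighbours in superstep 1) are illustrative assumptions, not data from this job.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: counts the sendMessage calls the quoted superstep-2 loop would make. */
public class Superstep2FanOut {

    /**
     * The loop sends one message per (incoming message, out-edge) pair,
     * so each vertex contributes incoming * outDegree sends.
     * Vertices missing from incomingMessages are treated as having received none.
     */
    static long countSends(Map<Integer, List<Integer>> outEdges,
                           Map<Integer, Integer> incomingMessages) {
        long total = 0;
        for (Map.Entry<Integer, List<Integer>> entry : outEdges.entrySet()) {
            int incoming = incomingMessages.getOrDefault(entry.getKey(), 0);
            total += (long) incoming * entry.getValue().size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical toy graph: vertex id -> target ids of its out-edges (9 edges total).
        Map<Integer, List<Integer>> outEdges = new HashMap<>();
        outEdges.put(1, Arrays.asList(2, 3, 4));
        outEdges.put(2, Arrays.asList(1, 3));
        outEdges.put(3, Arrays.asList(1));
        outEdges.put(4, Arrays.asList(1, 2, 3));

        // Hypothetical superstep-1 message counts, equal to each vertex's in-degree.
        Map<Integer, Integer> incoming = new HashMap<>();
        incoming.put(1, 3);
        incoming.put(2, 2);
        incoming.put(3, 3);
        incoming.put(4, 1);

        // 3*3 + 2*2 + 3*1 + 1*3 = 19 sends for a 9-edge graph.
        System.out.println("messages sent in superstep 2: "
                + countSends(outEdges, incoming));
    }
}
```

Under these assumptions it prints 19 messages for a graph with only 9 edges. In general the total is the sum over vertices of (messages received × out-degree), which can be far larger than the edge count when high-degree vertices also receive many messages.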
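Lukas's patch itself is not reproduced in this thread; he describes it as making NettyClient sleep for some time after a failed reconnect before the next try. The following is a minimal, stdlib-only Java sketch of that retry-with-sleep idea. It is not the patch and does not use Giraph's or Netty's API: the `Connector` interface, `connectWithRetry`, and the choice of a doubling pause capped at a maximum are hypothetical, made up for illustration.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of "sleep before the next reconnect try".
 * Not the actual NettyClient patch; Connector is an illustrative stand-in.
 */
public class ReconnectWithBackoff {

    /** Hypothetical stand-in for whatever opens the real connection. */
    interface Connector {
        void connect() throws IOException;
    }

    /**
     * Calls connector.connect() up to maxAttempts times. After each failure
     * except the last, sleeps before retrying; the pause doubles each time,
     * capped at maxSleepMs. Returns the attempt number that succeeded, or
     * rethrows the last IOException if every attempt failed.
     */
    static int connectWithRetry(Connector connector, int maxAttempts,
                                long initialSleepMs, long maxSleepMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long sleepMs = initialSleepMs;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                connector.connect();
                return attempt;
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Pause instead of retrying immediately, giving a peer that is,
                    // for example, stuck in a long GC pause time to recover.
                    Thread.sleep(sleepMs);
                    sleepMs = Math.min(sleepMs * 2, maxSleepMs);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated peer that refuses the first two connection attempts.
        AtomicInteger calls = new AtomicInteger();
        Connector flaky = () -> {
            if (calls.incrementAndGet() <= 2) {
                throw new IOException("Connection refused");
            }
        };
        int attempts = connectWithRetry(flaky, 5, 10, 100);
        System.out.println("connected after " + attempts + " attempts");
    }
}
```

Running main prints "connected after 3 attempts", since the simulated peer refuses the first two tries. Doubling the pause gives a slow peer progressively more time to come back, while the cap keeps any single wait bounded.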