Hi,

I have a ZooKeeper problem when running a Giraph program. The program is aborted in superstep 2 with the following output:

14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Opening socket connection to server compute-0-18.local/10.1.255.236:22181. Will not attempt to authenticate using SASL (unknown error)
14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Socket connection established to compute-0-18.local/10.1.255.236:22181, initiating session
14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Session establishment complete on server compute-0-18.local/10.1.255.236:22181, sessionid = 0x1452e7c79910009, negotiated timeout = 600000
......
14/04/04 15:46:08 INFO job.JobProgressTracker: Data from 8 workers - Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 64 partitions computed; min free memory on worker 3 - 270.37MB, average 451.21MB
14/04/04 15:46:13 INFO job.JobProgressTracker: Data from 8 workers - Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 64 partitions computed; min free memory on worker 6 - 249.25MB, average 404.02MB
14/04/04 15:46:16 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1452e7c79910009, likely server has closed socket, closing socket connection and attempting reconnect
14/04/04 15:46:17 INFO zookeeper.ClientCnxn: Opening socket connection to server compute-0-18.local/10.1.255.236:22181. Will not attempt to authenticate using SASL (unknown error)
14/04/04 15:46:17 WARN zookeeper.ClientCnxn: Session 0x1452e7c79910009 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused

On each rerun of the program, another compute node reports the same error ("Unable to read additional data from server sessionid...").

This is what superstep 2 does:

    if (getSuperstep() == 2) {
      for (IntWritable message : messages) {
        for (Edge<IntWritable, IntWritable> edge : vertex.getEdges()) {
          sendMessage(edge.getTargetVertexId(), message);
          //int abc=0;
        }
      }
    }

I checked that if I replace the line "sendMessage(edge.getTargetVertexId(), message);" with a meaningless line such as "int abc=0;", the program finishes successfully.

This looks like a ZooKeeper problem, but the ZooKeeper involved seems to come with Giraph, as I did not install ZooKeeper separately. I tried modifying parameters in GiraphConstants.java and recompiling Giraph, but this does not seem to take effect: the screen output shows that the parameters were not changed at all.

Any hints?

Best Regards,
Suijian