Hi, when I run a Giraph job against 1 GB of data, I get the exception below
after some time. Can somebody tell me what the issue is?
14/05/14 01:54:01 INFO job.JobProgressTracker: Data from 14 workers -
Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 196
partitions computed; min free memory on worker 6 - 0.81MB, average 11.56MB
14/05/14 01:54:03 INFO zookeeper.ClientCnxn: Unable to read additional
data from server sessionid 0x145f9cff031000f, likely server has closed
socket, closing socket connection and attempting reconnect
14/05/14 01:54:04 INFO zookeeper.ClientCnxn: Opening socket connection
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt
to authenticate using SASL (unknown error)
14/05/14 01:54:04 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f
for server null, unexpected error, closing socket connection and
attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:06 INFO zookeeper.ClientCnxn: Opening socket connection
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt
to authenticate using SASL (unknown error)
14/05/14 01:54:06 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f
for server null, unexpected error, closing socket connection and
attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:06 WARN zk.ZooKeeperExt: exists: Connection loss on
attempt 0, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for
/_hadoopBsp/job_201405140108_0003/_workerProgresses
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:360)
at
org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:08 INFO zookeeper.ClientCnxn: Opening socket connection
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt
to authenticate using SASL (unknown error)
14/05/14 01:54:08 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f
for server null, unexpected error, closing socket connection and
attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:09 INFO mapred.JobClient: map 93% reduce 0%
14/05/14 01:54:10 INFO zookeeper.ClientCnxn: Opening socket connection
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt
to authenticate using SASL (unknown error)
14/05/14 01:54:10 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f
for server null, unexpected error, closing socket connection and
attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:12 INFO zookeeper.ClientCnxn: Opening socket connection
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt
to authenticate using SASL (unknown error)
14/05/14 01:54:12 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f
for server null, unexpected error, closing socket connection and
attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:12 WARN zk.ZooKeeperExt: exists: Connection loss on
attempt 1, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for
/_hadoopBsp/job_201405140108_0003/_workerProgresses
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:360)
at
org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:13 INFO zookeeper.ClientCnxn: Opening socket connection
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt
to authenticate using SASL (unknown error)
14/05/14 01:54:13 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f
for server null, unexpected error, closing socket connection and
attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:15 INFO zookeeper.ClientCnxn: Opening socket connection
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt
to authenticate using SASL (unknown error)
14/05/14 01:54:15 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f
for server null, unexpected error, closing socket connection and
attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:16 INFO zookeeper.ClientCnxn: Opening socket connection
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt
to authenticate using SASL (unknown error)
14/05/14 01:54:16 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f
for server null, unexpected error, closing socket connection and
attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:18 INFO zookeeper.ClientCnxn: Opening socket connection
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt
to authenticate using SASL (unknown error)
14/05/14 01:54:18 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f
for server null, unexpected error, closing socket connection and
attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:18 WARN zk.ZooKeeperExt: exists: Connection loss on
attempt 2, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for
/_hadoopBsp/job_201405140108_0003/_workerProgresses
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:360)
at
org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:20 INFO zookeeper.ClientCnxn: Opening socket connection
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt
to authenticate using SASL (unknown error)
14/05/14 01:54:20 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f
for server null, unexpected error, closing socket connection and
attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:21 INFO zookeeper.ClientCnxn: Opening socket connection
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt
to authenticate using SASL (unknown error)
14/05/14 01:54:21 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f
for server null, unexpected error, closing socket connection and
attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:22 INFO zookeeper.ClientCnxn: Opening socket connection
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt
to authenticate using SASL (unknown error)
14/05/14 01:54:22 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f
for server null, unexpected error, closing socket connection and
attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:23 INFO job.JobProgressTracker: run: Exception occurred
java.lang.IllegalStateException: exists: Failed to check
/_hadoopBsp/job_201405140108_0003/_workerProgresses after 3 tries!
at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:369)
at
org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:24 INFO zookeeper.ClientCnxn: Opening socket connection
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt
to authenticate using SASL (unknown error)
14/05/14 01:54:24 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f
for server null, unexpected error, closing socket connection and
attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:24 WARN zk.ZooKeeperExt: createExt: Connection loss on
attempt 0, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for
/_hadoopBsp/job_201405140108_0003/_cleanedUpDir/client
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
at
org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:123)
at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:25 INFO zookeeper.ClientCnxn: Opening socket connection
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt
to authenticate using SASL (unknown error)
14/05/14 01:54:25 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f
for server null, unexpected error, closing socket connection and
attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:27 INFO zookeeper.ClientCnxn: Opening socket connection
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt
to authenticate using SASL (unknown error)
14/05/14 01:54:27 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f
for server null, unexpected error, closing socket connection and
attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:29 INFO mapred.JobClient: map 86% reduce 0%
14/05/14 01:54:30 INFO zookeeper.ClientCnxn: Opening socket connection
to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt
to authenticate using SASL (unknown error)
14/05/14 01:54:30 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f
for server null, unexpected error, closing socket connection and
attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:30 WARN zk.ZooKeeperExt: createExt: Connection loss on
attempt 1, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for
/_hadoopBsp/job_201405140108_0003/_cleanedUpDir/client
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
at
org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:123)
at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:30 INFO mapred.JobClient: Job complete:
job_201405140108_0003
14/05/14 01:54:30 INFO mapred.JobClient: Counters: 6
14/05/14 01:54:30 INFO mapred.JobClient: Job Counters
14/05/14 01:54:30 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=30036780
14/05/14 01:54:30 INFO mapred.JobClient: Total time spent by all
reduces waiting after reserving slots (ms)=0
14/05/14 01:54:30 INFO mapred.JobClient: Total time spent by all
maps waiting after reserving slots (ms)=0
14/05/14 01:54:30 INFO mapred.JobClient: Launched map tasks=15
14/05/14 01:54:30 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
14/05/14 01:54:30 INFO mapred.JobClient: Failed map tasks=1
Regards,
Arun