I think this is the key message.

0 out of 196 partitions computed; min free memory on worker 6 - 0.81MB, average 11.56MB

With less than 1 MB of free memory, a worker cannot make progress. Your workers are most likely running out of memory (OOM), which kills the job. Can you allocate more memory to your job?
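
If you want to spot this condition automatically, here is a minimal sketch that pulls the free-memory figures out of a progress line like the one above. The regular expression is based only on that single quoted line, so other Giraph versions may format it differently, and the function names and the 50 MB threshold are hypothetical choices of mine, not anything Giraph defines.

```python
import re

# Pattern for the memory part of a JobProgressTracker line. It is derived
# only from the one log line quoted in this thread (an assumption).
MEMORY_RE = re.compile(
    r"min free memory on worker (?P<worker>\d+) - (?P<min_mb>[\d.]+)MB, "
    r"average (?P<avg_mb>[\d.]+)MB"
)


def parse_free_memory(line):
    """Return (worker id, min free MB, average free MB), or None if absent."""
    match = MEMORY_RE.search(line)
    if match is None:
        return None
    return (
        int(match.group("worker")),
        float(match.group("min_mb")),
        float(match.group("avg_mb")),
    )


def is_memory_starved(line, threshold_mb=50.0):
    """True if the line reports a minimum free memory below threshold_mb.

    The 50 MB default is an arbitrary illustrative value.
    """
    parsed = parse_free_memory(line)
    return parsed is not None and parsed[1] < threshold_mb


line = (
    "0 out of 196 partitions computed; "
    "min free memory on worker 6 - 0.81MB, average 11.56MB"
)
print(parse_free_memory(line))
print(is_memory_starved(line))
```

Running something like this over the client output would flag the superstep where the workers first run short, before the ZooKeeper connection errors start to bury the real cause.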

On 5/14/14, 3:13 AM, Arun Kumar wrote:
Hi, when I run a Giraph job against 1 GB of data, I get the exception below after some time. Can somebody tell me what the issue is?

14/05/14 01:54:01 INFO job.JobProgressTracker: Data from 14 workers - Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 196 partitions computed; min free memory on worker 6 - 0.81MB, average 11.56MB
14/05/14 01:54:03 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x145f9cff031000f, likely server has closed socket, closing socket connection and attempting reconnect
14/05/14 01:54:04 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:04 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:06 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:06 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:06 WARN zk.ZooKeeperExt: exists: Connection loss on attempt 0, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201405140108_0003/_workerProgresses
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
    at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:360)
    at org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
    at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:08 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:08 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:09 INFO mapred.JobClient:  map 93% reduce 0%
14/05/14 01:54:10 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:10 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:12 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:12 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:12 WARN zk.ZooKeeperExt: exists: Connection loss on attempt 1, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201405140108_0003/_workerProgresses
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
    at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:360)
    at org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
    at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:13 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:13 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:15 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:15 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:16 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:16 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:18 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:18 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:18 WARN zk.ZooKeeperExt: exists: Connection loss on attempt 2, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201405140108_0003/_workerProgresses
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
    at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:360)
    at org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
    at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:20 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:20 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:21 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:21 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:22 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:22 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:23 INFO job.JobProgressTracker: run: Exception occurred
java.lang.IllegalStateException: exists: Failed to check /_hadoopBsp/job_201405140108_0003/_workerProgresses after 3 tries!
    at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:369)
    at org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
    at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:24 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:24 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:24 WARN zk.ZooKeeperExt: createExt: Connection loss on attempt 0, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201405140108_0003/_cleanedUpDir/client
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
    at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
    at org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:123)
    at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:25 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:25 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:27 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:27 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:29 INFO mapred.JobClient:  map 86% reduce 0%
14/05/14 01:54:30 INFO zookeeper.ClientCnxn: Opening socket connection to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to authenticate using SASL (unknown error)
14/05/14 01:54:30 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:30 WARN zk.ZooKeeperExt: createExt: Connection loss on attempt 1, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201405140108_0003/_cleanedUpDir/client
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
    at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
    at org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:123)
    at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:30 INFO mapred.JobClient: Job complete: job_201405140108_0003
14/05/14 01:54:30 INFO mapred.JobClient: Counters: 6
14/05/14 01:54:30 INFO mapred.JobClient:   Job Counters
14/05/14 01:54:30 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=30036780
14/05/14 01:54:30 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/05/14 01:54:30 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/05/14 01:54:30 INFO mapred.JobClient:     Launched map tasks=15
14/05/14 01:54:30 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/05/14 01:54:30 INFO mapred.JobClient:     Failed map tasks=1

Regards
Arun

