We did have this error a few times. This can happen due to GC pauses, so
I would check the worker for long GC pauses. Also, you can increase the
ZooKeeper session timeout; see:
/** ZooKeeper session millisecond timeout */
IntConfOption ZOOKEEPER_SESSION_TIMEOUT =
new IntConfOption("giraph.zkSessionMsecTimeout", MINUTES.toMillis(1),
"ZooKeeper session millisecond timeout");
Currently, the default is one minute, but in production we set that
number much, much higher (even greater than a day sometimes) to avoid
the disconnection.
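As a rough sketch of choosing a value for giraph.zkSessionMsecTimeout: the option above is an IntConfOption holding milliseconds, so whatever you pick has to fit in a Java int (at most Integer.MAX_VALUE ms, roughly 24.8 days). The class below converts a duration to milliseconds, rejects values that would overflow an int, and formats the result as a Hadoop "-D" generic option. It assumes the job is launched through a Tool-based runner (for example GiraphRunner via "hadoop jar") that honors -D; the class and helper names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

public class ZkTimeoutOption {
    // Giraph configuration key quoted in the reply above.
    static final String KEY = "giraph.zkSessionMsecTimeout";

    // Converts a duration to milliseconds and checks that it fits in an int,
    // because the option is declared as an IntConfOption.
    static int toIntMillis(long amount, TimeUnit unit) {
        long millis = unit.toMillis(amount);
        return Math.toIntExact(millis); // throws ArithmeticException on int overflow
    }

    // Hypothetical helper: formats the value as a Hadoop -D generic option
    // (assumes the launcher parses generic options, e.g. via ToolRunner).
    static String asGenericOption(int millis) {
        return "-D" + KEY + "=" + millis;
    }

    public static void main(String[] args) {
        int defaultMs = toIntMillis(1, TimeUnit.MINUTES); // the one-minute default
        int oneDayMs = toIntMillis(1, TimeUnit.DAYS);     // "greater than a day" territory
        System.out.println(defaultMs);
        System.out.println(asGenericOption(oneDayMs));
        try {
            toIntMillis(30, TimeUnit.DAYS); // 2,592,000,000 ms exceeds Integer.MAX_VALUE
        } catch (ArithmeticException e) {
            System.out.println("30 days does not fit in an int");
        }
    }
}
```

One caveat worth checking: a ZooKeeper server only grants session timeouts within its minSessionTimeout/maxSessionTimeout range (by default 2 and 20 times tickTime), so a very large requested value may be negotiated down unless the ensemble's maxSessionTimeout is raised as well.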
Hope that helps,
Avery
On 10/1/13 6:27 PM, Jyotirmoy Sundi wrote:
Hi,
I am able to run Apache Giraph successfully with around 500M pairs to
find connected components. It works well, but not always; the issue
seems to be a ZooKeeper timeout. Some of the clients (around 5-10 out
of 100) produce this error, and the master fails because of it. Do you
have any suggestions for this error? Any suggestions will be appreciated.
2013-10-02 01:20:43,651 WARN org.apache.giraph.bsp.BspService: process: Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent state:Disconnected type:None path:null
2013-10-02 01:20:44,035 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server had22.rsk.admobius.com/10.240.51.32:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-10-02 01:20:44,035 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to had22.rsk.admobius.com/10.240.51.32:2181, initiating session
2013-10-02 01:20:44,037 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x441604c97412331 has expired, closing socket connection
2013-10-02 01:20:44,037 WARN org.apache.giraph.bsp.BspService: process: Got unknown null path event WatchedEvent state:Expired type:None path:null
2013-10-02 01:20:44,038 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2013-10-02 01:21:20,046 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 250000 vertices at 1827.2925619484213 vertices/sec 1728790 edges at 12636.730317550928 edges/sec Memory (free/total/max) = 1745.60M / 2262.19M / 2730.69M
2013-10-02 01:21:24,788 INFO org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601 (v=261131, e=1808572)
2013-10-02 01:21:24,789 ERROR org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
java.lang.IllegalStateException: markInputSplitPathFinished: KeeperException on /_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601/_vertexInputSplitFinished
	at org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:168)
	at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:226)
	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601/_vertexInputSplitFinished
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
	at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
	at org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:159)
	... 9 more
--
Best Regards,
Jyotirmoy Sundi
Admobius
San Francisco, CA 94158
On Thu, Sep 26, 2013 at 6:08 PM, Jyotirmoy Sundi <sundi...@gmail.com> wrote:
Hi,
I got the connected components job working for 1B nodes, but when I run the
job again, it fails with the error below. Apart from this, the ZooKeeper data
directory is not cleared; for successful jobs, Giraph's data in ZooKeeper is
cleared.
The following errors seem to occur because the node tries to connect to
ZooKeeper with a session ID that has already been cleared, as seen in
"Client session timed out, have not heard from server in 68845ms for sessionid
0x3415cc6ce930059, closing socket connection and attempting reconnect". Any idea
whether increasing the session timeout would help?
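Avery's reply attributes disconnections like this one (68845 ms without hearing from the server) to GC pauses. One hedged way to look for long collections from inside a worker JVM is the standard java.lang.management API: sample each collector's cumulative collection time and flag any interval in which GC time grew by more than a chosen fraction of the session timeout. The class name, the 60000 ms timeout (the one-minute default mentioned above), and the 0.5 threshold are illustrative assumptions, not Giraph settings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPauseCheck {
    // Sum of accumulated collection time (ms) across all collectors in this JVM.
    // getCollectionTime() returns -1 when a collector does not report it; those are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // True if GC time accumulated between two samples exceeds the given
    // fraction of the ZooKeeper session timeout (all values in milliseconds).
    static boolean gcRisksSession(long beforeMs, long afterMs, long sessionTimeoutMs, double fraction) {
        return (afterMs - beforeMs) > sessionTimeoutMs * fraction;
    }

    public static void main(String[] args) throws InterruptedException {
        long sessionTimeoutMs = 60_000; // assumed: the one-minute default
        double fraction = 0.5;          // assumed alert threshold
        long before = totalGcMillis();
        Thread.sleep(1_000);            // a real worker would sample periodically while loading splits
        long after = totalGcMillis();
        System.out.println("GC ms in interval: " + (after - before)
                + ", risky: " + gcRisksSession(before, after, sessionTimeoutMs, fraction));
    }
}
```

Because getCollectionTime() is cumulative, a large delta could come from many short collections rather than one long pause; enabling GC logging on the workers (for example -verbose:gc) shows individual pause lengths.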
2013-09-27 00:57:11,748 WARN org.apache.giraph.bsp.BspService: process: Got unknown null path event WatchedEvent state:Expired type:None path:null
2013-09-27 00:57:11,748 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x3415cc6ce930059 has expired, closing socket connection
2013-09-27 00:57:11,748 WARN org.apache.giraph.worker.InputSplitsHandler: process: Problem with zookeeper, got event with path null, state Expired, event type None
2013-09-27 00:57:11,748 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2013-09-27 00:57:11,925 INFO org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/job_201309260044_0116/_vertexInputSplitDir/89 (v=258127, e=1792906)
2013-09-27 00:57:11,926 ERROR org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
java.lang.IllegalStateException: markInputSplitPathFinished: KeeperException on /_hadoopBsp/job_201309260044_0116/_vertexInputSplitDir/89/_vertexInputSplitFinished
	at org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:168)
	at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:226)
	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /_hadoopBsp/job_201309260044_0116/_vertexInputSplitDir/89/_vertexInputSplitFinished
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
	at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
	at org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:159)
	... 9 more
--
Best Regards,
Jyotirmoy Sundi
Data Engineer,
Admobius
San Francisco, CA 94158