Thanks a lot, Avery, for your response. I increased the ZooKeeper session timeout to 10 minutes and turned off input split locality by setting -Dgiraph.zkSessionMsecTimeout=600000 and -Dgiraph.useInputSplitLocality=false. Consecutive runs now complete without any errors.
Thanks,
Sundi

On Tue, Oct 1, 2013 at 10:18 PM, Avery Ching <ach...@apache.org> wrote:

> We did have this error a few times. This can happen due to GC pauses,
> so I would check the worker for long GC issues. Also, you can increase the
> ZooKeeper timeouts, see
>
>   /** ZooKeeper session millisecond timeout */
>   IntConfOption ZOOKEEPER_SESSION_TIMEOUT =
>       new IntConfOption("giraph.zkSessionMsecTimeout", MINUTES.toMillis(1),
>           "ZooKeeper session millisecond timeout");
>
> Currently, the default is one minute, but in production we set that number
> much, much higher (even greater than a day sometimes) to avoid the
> disconnection.
>
> Hope that helps,
> Avery
>
> On 10/1/13 6:27 PM, Jyotirmoy Sundi wrote:
>
> Hi,
> I am able to run Apache Giraph successfully with around 500M pairs to
> find connected components. It works great, but not always; the issue seems
> to be the ZooKeeper session timeout. Some of the clients (around 5-10
> out of 100) produce this error, and the master fails because of it. Do you
> have any suggestions for this error? Any suggestions will be appreciated.
>
> 2013-10-02 01:20:43,651 WARN org.apache.giraph.bsp.BspService: process: Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent state:Disconnected type:None path:null
> 2013-10-02 01:20:44,035 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server had22.rsk.admobius.com/10.240.51.32:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
> 2013-10-02 01:20:44,035 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to had22.rsk.admobius.com/10.240.51.32:2181, initiating session
> 2013-10-02 01:20:44,037 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x441604c97412331 has expired, closing socket connection
> 2013-10-02 01:20:44,037 WARN org.apache.giraph.bsp.BspService: process: Got unknown null path event WatchedEvent state:Expired type:None path:null
> 2013-10-02 01:20:44,038 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2013-10-02 01:21:20,046 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 250000 vertices at 1827.2925619484213 vertices/sec 1728790 edges at 12636.730317550928 edges/sec Memory (free/total/max) = 1745.60M / 2262.19M / 2730.69M
> 2013-10-02 01:21:24,788 INFO org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601 (v=261131, e=1808572)
> 2013-10-02 01:21:24,789 ERROR org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
> java.lang.IllegalStateException: markInputSplitPathFinished: KeeperException on /_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601/_vertexInputSplitFinished
>     at org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:168)
>     at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:226)
>     at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
>     at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
>     at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601/_vertexInputSplitFinished
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>     at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
>     at org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:159)
>     ... 9 more
>
> --
> Best Regards,
> Jyotirmoy Sundi
> Admobius
>
> San Francisco, CA 94158
>
> On Thu, Sep 26, 2013 at 6:08 PM, Jyotirmoy Sundi <sundi...@gmail.com> wrote:
>
>> Hi,
>>
>> I got the connected components job working for 1B nodes, but when I run the
>> job again, it fails with the error below. Apart from this, the data in the
>> ZooKeeper data directory is not cleared. For successful jobs, the Giraph
>> data in ZooKeeper is cleared.
>>
>> The following errors seem to occur because the node tries to connect to
>> ZooKeeper with a session ID that has already been cleared, as seen in
>>
>> "Client session timed out, have not heard from server in 68845ms for sessionid 0x3415cc6ce930059, closing socket connection and attempting reconnect"
>>
>> Any idea whether increasing the session timeout would help?
>>
>> 2013-09-27 00:57:11,748 WARN org.apache.giraph.bsp.BspService: process: Got unknown null path event WatchedEvent state:Expired type:None path:null
>> 2013-09-27 00:57:11,748 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x3415cc6ce930059 has expired, closing socket connection
>> 2013-09-27 00:57:11,748 WARN org.apache.giraph.worker.InputSplitsHandler: process: Problem with zookeeper, got event with path null, state Expired, event type None
>> 2013-09-27 00:57:11,748 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>> 2013-09-27 00:57:11,925 INFO org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/job_201309260044_0116/_vertexInputSplitDir/89 (v=258127, e=1792906)
>> 2013-09-27 00:57:11,926 ERROR org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
>> java.lang.IllegalStateException: markInputSplitPathFinished: KeeperException on /_hadoopBsp/job_201309260044_0116/_vertexInputSplitDir/89/_vertexInputSplitFinished
>>     at org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:168)
>>     at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:226)
>>     at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
>>     at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
>>     at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>     at java.lang.Thread.run(Thread.java:662)
>> Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /_hadoopBsp/job_201309260044_0116/_vertexInputSplitDir/89/_vertexInputSplitFinished
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>>     at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
>>     at org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:159)
>>     ... 9 more
>>
>> --
>> Best Regards,
>> Jyotirmoy Sundi
>> Data Engineer,
>> Admobius
>>
>> San Francisco, CA 94158
>
> --
> Best Regards,
> Jyotirmoy Sundi
> Data Engineer,
> Admobius
>
> San Francisco, CA 94158

--
Best Regards,
Jyotirmoy Sundi
Data Engineer,
Admobius
San Francisco, CA 94158
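
P.S. For anyone who finds this thread later, here is a minimal sketch of how the two -D options mentioned in this thread can be derived from a timeout given in minutes, using the same MINUTES.toMillis conversion as the snippet Avery quoted. The class name GiraphZkTimeoutFlags and the methods toFlags and buildFlags are hypothetical helpers written for illustration only; they are not part of Giraph. Only the two option names and the 10-minute value come from this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of Giraph): builds the -D flags discussed in this thread. */
public class GiraphZkTimeoutFlags {

    /** Joins key/value pairs into "-Dkey=value" tokens, preserving insertion order. */
    static String toFlags(Map<String, String> options) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : options.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("-D").append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    /** Converts the timeout from minutes to milliseconds and disables input split locality. */
    static String buildFlags(long timeoutMinutes) {
        Map<String, String> options = new LinkedHashMap<>();
        // Option name taken from the IntConfOption quoted in this thread.
        options.put("giraph.zkSessionMsecTimeout",
                Long.toString(TimeUnit.MINUTES.toMillis(timeoutMinutes)));
        // Second option exactly as reported in the follow-up message.
        options.put("giraph.useInputSplitLocality", "false");
        return toFlags(options);
    }

    public static void main(String[] args) {
        // 10 minutes is the value that worked in this thread; tune it for your cluster.
        System.out.println(buildFlags(10));
    }
}
```

For 10 minutes this prints -Dgiraph.zkSessionMsecTimeout=600000 -Dgiraph.useInputSplitLocality=false, which matches the settings reported above. Checking the workers for long GC pauses, as Avery suggested, is still worthwhile, since a larger timeout only hides the pauses.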