Thanks a lot, Avery, for your response. I increased the timeout to 10 minutes.
*changed:*
-Dgiraph.zkSessionMsecTimeout=600000 and
-Dgiraph.useInputSplitLocality=false
It now works for consecutive runs without any errors.
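600000 ms is exactly 10 minutes (10 x 60 x 1000). The sketch below derives that value with `TimeUnit`, the same conversion the quoted Giraph default uses (`MINUTES.toMillis(1)`). It also checks that the value fits in an `int`, since the option is declared as an `IntConfOption`. The class and method names are hypothetical. It only builds the two `-D` strings used above; how they are passed to the job (launcher, remaining arguments) is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class GiraphZkFlags {
  // Hypothetical helper: builds the two -D options from this message.
  // The session timeout is given in minutes and converted to milliseconds,
  // the unit giraph.zkSessionMsecTimeout expects.
  static List<String> zkFlags(long timeoutMinutes) {
    long millis = TimeUnit.MINUTES.toMillis(timeoutMinutes);
    // The option is an IntConfOption, so the value must fit in an int
    // (Integer.MAX_VALUE ms is roughly 24.8 days).
    if (millis > Integer.MAX_VALUE) {
      throw new IllegalArgumentException("timeout too large for an int: " + millis);
    }
    List<String> flags = new ArrayList<>();
    flags.add("-Dgiraph.zkSessionMsecTimeout=" + millis);
    flags.add("-Dgiraph.useInputSplitLocality=false");
    return flags;
  }

  public static void main(String[] args) {
    // 10 minutes, as in the reply above
    System.out.println(String.join(" ", zkFlags(10)));
  }
}
```

Converting from minutes rather than typing the millisecond literal makes it harder to be off by a factor of 60 or 1000 when raising the timeout further.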

Thanks
Sundi


On Tue, Oct 1, 2013 at 10:18 PM, Avery Ching <ach...@apache.org> wrote:

>  We did have this error a few times.  This can happen due to GC pauses,
> so I would check the worker for long GC issues.  Also, you can increase the
> ZooKeeper timeouts, see
>
>   /** ZooKeeper session millisecond timeout */
>   IntConfOption ZOOKEEPER_SESSION_TIMEOUT =
>       new IntConfOption("giraph.zkSessionMsecTimeout", MINUTES.toMillis(1),
>           "ZooKeeper session millisecond timeout");
>
> Currently, the default is one minute, but in production we set that number
> much, much higher (even greater than a day sometimes) to avoid the
> disconnection.
>
> Hope that helps,
> Avery
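Following up on the GC suggestion: one rough way to see whether a worker JVM spends long stretches in garbage collection is the standard `java.lang.management` API. The sketch samples the cumulative collection time reported by each `GarbageCollectorMXBean` and flags a sampling window whose GC time is a large share of the ZooKeeper session timeout. The class name and the one-third threshold are hypothetical choices. The figures are cumulative totals, not individual pause lengths; a GC log (e.g. `-verbose:gc`) gives per-pause detail.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class GcTimeCheck {
  // Total milliseconds spent in GC by all collectors since JVM start.
  // getCollectionTime() returns -1 when a collector does not report it; skip those.
  static long totalGcMillis() {
    long total = 0;
    List<GarbageCollectorMXBean> beans = ManagementFactory.getGarbageCollectorMXBeans();
    for (GarbageCollectorMXBean gc : beans) {
      long t = gc.getCollectionTime();
      if (t > 0) {
        total += t;
      }
    }
    return total;
  }

  // Hypothetical threshold: warn when GC time in one window reaches
  // a third of the ZooKeeper session timeout.
  static boolean gcThreatensSession(long gcMillisInWindow, long sessionTimeoutMillis) {
    return gcMillisInWindow * 3 >= sessionTimeoutMillis;
  }

  public static void main(String[] args) throws InterruptedException {
    long sessionTimeoutMillis = 60_000L; // Giraph's one-minute default quoted above
    long before = totalGcMillis();
    Thread.sleep(1_000L); // a real worker would sample periodically in a background thread
    long gcInWindow = totalGcMillis() - before;
    System.out.println("GC ms in window: " + gcInWindow
        + (gcThreatensSession(gcInWindow, sessionTimeoutMillis) ? " (WARNING)" : ""));
  }
}
```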
>
>
> On 10/1/13 6:27 PM, Jyotirmoy Sundi wrote:
>
> Hi,
> I am able to run Apache Giraph successfully to find connected components on
> around 500M pairs. It works great, but not always: the issue seems to be a
> ZooKeeper session timeout. Some of the clients (around 5-10 out of 100)
> produce this error, and the master fails because of it. Do you have any
> suggestions for this error? Any suggestions will be appreciated.
>
> 2013-10-02 01:20:43,651 WARN org.apache.giraph.bsp.BspService: process: 
> Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent 
> state:Disconnected type:None path:null
> 2013-10-02 01:20:44,035 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
> connection to server had22.rsk.admobius.com/10.240.51.32:2181. Will not 
> attempt to authenticate using SASL (Unable to locate a login configuration)
> 2013-10-02 01:20:44,035 INFO org.apache.zookeeper.ClientCnxn: Socket 
> connection established to had22.rsk.admobius.com/10.240.51.32:2181, 
> initiating session
> 2013-10-02 01:20:44,037 INFO org.apache.zookeeper.ClientCnxn: Unable to 
> reconnect to ZooKeeper service, session 0x441604c97412331 has expired, 
> closing socket connection
> 2013-10-02 01:20:44,037 WARN org.apache.giraph.bsp.BspService: process: Got 
> unknown null path event WatchedEvent state:Expired type:None path:null
> 2013-10-02 01:20:44,038 INFO org.apache.zookeeper.ClientCnxn: EventThread 
> shut down
> 2013-10-02 01:21:20,046 INFO 
> org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: 
> Loaded 250000 vertices at 1827.2925619484213 vertices/sec 1728790 edges at 
> 12636.730317550928 edges/sec Memory (free/total/max) = 1745.60M / 2262.19M / 
> 2730.69M
> 2013-10-02 01:21:24,788 INFO org.apache.giraph.worker.InputSplitsCallable: 
> loadFromInputSplit: Finished loading 
> /_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601 (v=261131, 
> e=1808572)
> 2013-10-02 01:21:24,789 ERROR org.apache.giraph.utils.LogStacktraceCallable: 
> Execution of callable failed
> java.lang.IllegalStateException: markInputSplitPathFinished: KeeperException 
> on 
> /_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601/_vertexInputSplitFinished
>       at 
> org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:168)
>       at 
> org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:226)
>       at 
> org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
>       at 
> org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
>       at 
> org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>       at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: 
> KeeperErrorCode = Session expired for 
> /_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601/_vertexInputSplitFinished
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>       at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>       at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
>       at 
> org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:159)
>       ... 9 more
>
>
>  --
>  Best Regards,
> Jyotirmoy Sundi
> Admobius
>
> San Francisco, CA 94158
>
>
> On Thu, Sep 26, 2013 at 6:08 PM, Jyotirmoy Sundi <sundi...@gmail.com> wrote:
>
>>  Hi,
>>
>>    I got connected components working for 1B nodes, but when I run the
>> job again, it fails with the error below. Apart from this, the data in
>> the ZooKeeper data directory is not cleared. For successful jobs, the data
>> Giraph writes to ZooKeeper is cleared.
>>
>> The following errors seem to occur because the node tries to connect to
>> ZooKeeper with a session ID that has already expired, as seen in
>>
>> "Client session timed out, have not heard from server in 68845ms for 
>> sessionid 0x3415cc6ce930059, closing socket connection and attempting 
>> reconnect". Any idea whether increasing the session timeout will help?
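A worked sketch of why a 68845 ms silence ends the session, and one caveat about raising the timeout. A ZooKeeper server clamps the session timeout a client requests into [minSessionTimeout, maxSessionTimeout]. Unless set in zoo.cfg, these default to 2 x tickTime and 20 x tickTime. With `tickTime=2000` (the value in ZooKeeper's sample zoo.cfg) that cap is 40 s, so a larger client-side request would be cut back. The server configuration used in this thread is not shown, so whether clamping applied here is unknown. The two-thirds client read timeout is my understanding of the ZooKeeper client and should be treated as an assumption. The class and method names are hypothetical.

```java
public class ZkTimeoutMath {
  // The server clamps the requested session timeout into [min, max];
  // by default min = 2 * tickTime and max = 20 * tickTime.
  static int negotiate(int requestedMillis, int minMillis, int maxMillis) {
    return Math.max(minMillis, Math.min(maxMillis, requestedMillis));
  }

  // Assumption: the client treats the connection as lost after roughly
  // two thirds of the negotiated timeout without hearing from the server.
  static int clientReadTimeout(int negotiatedMillis) {
    return negotiatedMillis * 2 / 3;
  }

  static void report(String label, int requested, int min, int max, int silentMillis) {
    int negotiated = negotiate(requested, min, max);
    System.out.println(label + ": requested=" + requested + "ms negotiated=" + negotiated
        + "ms clientReadTimeout=" + clientReadTimeout(negotiated)
        + "ms sessionLost=" + (silentMillis > negotiated));
  }

  public static void main(String[] args) {
    int tickTime = 2000;      // value used in ZooKeeper's sample zoo.cfg
    int silentMillis = 68845; // gap reported in the "have not heard from server" line
    // Giraph's one-minute default against the default server limits
    report("default", 60000, 2 * tickTime, 20 * tickTime, silentMillis);
    // 10 minutes requested, but still capped at 20 * tickTime
    report("10 min, default cap", 600000, 2 * tickTime, 20 * tickTime, silentMillis);
    // 10 minutes with a (hypothetical) server maxSessionTimeout raised to match
    report("10 min, raised cap", 600000, 2 * tickTime, 600000, silentMillis);
  }
}
```

If raising `giraph.zkSessionMsecTimeout` seems to have no effect against an external ZooKeeper, the server-side `maxSessionTimeout` is worth checking.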
>>
>> 2013-09-27 00:57:11,748 WARN org.apache.giraph.bsp.BspService: process: Got 
>> unknown null path event WatchedEvent state:Expired type:None path:null
>> 2013-09-27 00:57:11,748 INFO org.apache.zookeeper.ClientCnxn: Unable to 
>> reconnect to ZooKeeper service, session 0x3415cc6ce930059 has expired, 
>> closing socket connection
>> 2013-09-27 00:57:11,748 WARN org.apache.giraph.worker.InputSplitsHandler: 
>> process: Problem with zookeeper, got event with path null, state Expired, 
>> event type None
>> 2013-09-27 00:57:11,748 INFO org.apache.zookeeper.ClientCnxn: EventThread 
>> shut down
>> 2013-09-27 00:57:11,925 INFO org.apache.giraph.worker.InputSplitsCallable: 
>> loadFromInputSplit: Finished loading 
>> /_hadoopBsp/job_201309260044_0116/_vertexInputSplitDir/89 (v=258127, 
>> e=1792906)
>> 2013-09-27 00:57:11,926 ERROR org.apache.giraph.utils.LogStacktraceCallable: 
>> Execution of callable failed
>> java.lang.IllegalStateException: markInputSplitPathFinished: KeeperException 
>> on 
>> /_hadoopBsp/job_201309260044_0116/_vertexInputSplitDir/89/_vertexInputSplitFinished
>>      at 
>> org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:168)
>>      at 
>> org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:226)
>>      at 
>> org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
>>      at 
>> org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
>>      at 
>> org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>      at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>      at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>      at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>      at java.lang.Thread.run(Thread.java:662)
>> Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: 
>> KeeperErrorCode = Session expired for 
>> /_hadoopBsp/job_201309260044_0116/_vertexInputSplitDir/89/_vertexInputSplitFinished
>>      at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>>      at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>      at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>>      at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
>>      at 
>> org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:159)
>>      ... 9 more
>>
>>
>>  --
>>  Best Regards,
>> Jyotirmoy Sundi
>> Data Engineer,
>> Admobius
>>
>> San Francisco, CA 94158
>>
>


-- 
Best Regards,
Jyotirmoy Sundi
Data Engineer,
Admobius

San Francisco, CA 94158
