The servers are reserved for Apache Hama, so there is no other network
traffic. I tested it on three other PCs at another location but with the
same configuration and got the same errors :(

On Sun, 16.06.2013, 16:44, Chia-Hung Lin wrote:
> Have you checked whether the underlying network is busy when the error happens?
>
> I can't be sure, but the symptom suggests that heavy network traffic
> is causing the ZooKeeper connection loss.
>
>
>
> On 16 June 2013 20:22, Sascha Jonas <[email protected]>
> wrote:
>> Hey,
>>
>> I am using Apache Hama on a small cluster of two computers. It works
>> fine with a small number of supersteps, but every time I try a large
>> number of iterations, e.g. 10000, it crashes.
>>
>> Right now it has stopped working after 4600 supersteps. 8 of 16 tasks are
>> still running, while the log shows the following errors.
>>
>> I am using Apache Hama 0.6 and the built-in ZooKeeper. Should I switch to a
>> newer Hama or ZooKeeper version?
>>
>> 13/06/16 00:14:14 ERROR sync.ZKSyncClient: Error creating zk path /bsp/job_201306091733_0009/sync/4276
>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp/job_201306091733_0009/sync/4276
>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>         at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>>         at org.apache.hama.bsp.sync.ZKSyncClient.createZnode(ZKSyncClient.java:138)
>>         at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:290)
>>         at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:99)
>>         at org.apache.hama.bsp.BSPPeerImpl.enterBarrier(BSPPeerImpl.java:474)
>>         at org.apache.hama.bsp.BSPPeerImpl.sync(BSPPeerImpl.java:428)
>>         at de.distMLP.Base_MLP_Trainer.calculateAndWriteCost(Base_MLP_Trainer.java:90)
>>         at de.distMLP.Train_MultilayerPerceptron$MultilayerPerceptron_Trainer.bsp(Train_MultilayerPerceptron.java:57)
>>         at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:168)
>>         at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>>         at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1262)
>> 13/06/16 00:14:15 ERROR distMLP.Train_MultilayerPerceptron$MultilayerPerceptron_Trainer: org.apache.hama.bsp.sync.SyncException
>> org.apache.hama.bsp.sync.SyncException
>>         at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:137)
>>         at org.apache.hama.bsp.BSPPeerImpl.enterBarrier(BSPPeerImpl.java:474)
>>         at org.apache.hama.bsp.BSPPeerImpl.sync(BSPPeerImpl.java:428)
>>         at de.distMLP.Base_MLP_Trainer.calculateAndWriteCost(Base_MLP_Trainer.java:90)
>>         at de.distMLP.Train_MultilayerPerceptron$MultilayerPerceptron_Trainer.bsp(Train_MultilayerPerceptron.java:57)
>>         at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:168)
>>         at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>>         at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1262)
>>
>

