Hey Ted,

Could you check your ZooKeeper connection string and make sure that all of
the hostnames resolve correctly? When I've hit that error in the past, it
was because ZooKeeper failed to resolve a hostname (in my case, for an EC2
instance that had been deleted).
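
If it helps, here is a rough sketch of that check in Python. It assumes a
connection string of the form zk://host1:port1,host2:port2/path (the
zk:// prefix, credentials, and path are optional), it does not handle
IPv6 literals, and the hostnames in the usage note are made up. The
function names are mine, not part of Mesos or ZooKeeper.

```python
import socket

DEFAULT_ZK_PORT = 2181  # ZooKeeper's conventional client port


def parse_zk_hosts(conn):
    """Split a ZooKeeper connection string into (host, port) pairs."""
    # Drop an optional zk:// scheme.
    if conn.startswith("zk://"):
        conn = conn[len("zk://"):]
    # Drop an optional chroot path such as /mesos.
    hosts_part = conn.split("/", 1)[0]
    # Drop optional user:password@ credentials.
    if "@" in hosts_part:
        hosts_part = hosts_part.rsplit("@", 1)[1]
    pairs = []
    for entry in hosts_part.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.partition(":")
        pairs.append((host, int(port) if port else DEFAULT_ZK_PORT))
    return pairs


def unresolvable_hosts(conn, resolver=socket.getaddrinfo):
    """Return the hostnames in conn that fail name resolution."""
    bad = []
    for host, port in parse_zk_hosts(conn):
        try:
            resolver(host, port)
        except socket.gaierror:
            bad.append(host)
    return bad
```

Run unresolvable_hosts("zk://zk1.example.com:2181,zk2.example.com:2181/mesos")
on one of the affected slaves, with your own connection string; any
hostname it returns is one the slave cannot resolve.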

Thanks,
Tom


On Wed, Apr 9, 2014 at 7:09 PM, Ted Young <tyo...@guidewire.com> wrote:

>  (I'm running mesos 0.16.0 and marathon 0.4.0)
>
>
>
> Every day or two, I'm seeing the mesos slaves lose touch with the master
> and disconnect (causing all of the services running on all of the slaves to
> be redeployed and restarted). The only thing I'm seeing in the logs at
> these times (on the slaves) is something like:
>
>
>
> W0409 12:32:27.347270 22523 group.cpp:435] Timed out waiting to reconnect
> to ZooKeeper (sessionId=1446fc9b27d00b7)
>
> F0409 12:32:42.366143 22523 zookeeper.cpp:195] Failed to create ZooKeeper,
> zookeeper_init: No such file or directory [2]
>
>
>
> I'm not sure where to begin troubleshooting this. I will be upgrading to
> mesos 0.17.0 and marathon 0.4.1 in case that matters.
>
>
>
> Any pointers would be appreciated!
>
>
>
> ;ted
>
>
>
> __________________________________________________________
>
> *Ted M. Young*
> Guidewire Software - DevOps
>
> Tel: +1 650 357 5291
> tyo...@guidewire.com <yourem...@guidewire.com> | www.guidewire.com
>
> 1001 E. Hillsdale Blvd, Suite 800, Foster City, CA 94404
>
> Deliver insurance your way with flexible software products from Guidewire.
>
>
>
>
>
