Hello guys,
we are running a Mesos stack on CoreOS, with three ZooKeeper nodes.
We can start Docker containers with Marathon and all, that's fine, but
some of the Docker containers generate high network load while
communicating between nodes/containers, and I think that's the reason why the
ZooKeeper nodes go down.
Hi Martin,
how many ZooKeeper nodes do you have? Is your transaction log on a dedicated
disk? Approximately how many clients are connecting?
Have a look at
http://zookeeper.apache.org/doc/r3.2.2/zookeeperAdmin.html#sc_bestPractices
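If the transaction log shares a disk with the snapshots (or the OS), moving
it out is usually the first fix. A minimal zoo.cfg sketch, with example paths:

  # zoo.cfg -- put the transaction log on its own device
  dataDir=/var/lib/zookeeper      # snapshots
  dataLogDir=/zk-txnlog           # transaction log, dedicated disk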
Tomas
Hi,
there are three ZooKeeper nodes.
We've started our containers, and this time I was watching the ZooKeeper
nodes and their condition with the "stat" command.
It seems that ZooKeeper latency is not the issue; there were only about 8
connections, with a max latency of 134ms.
I'm still not sure what the real problem is.
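For reference, I'm checking each node roughly like this (hostname and the
default client port 2181 are assumptions):

  $ echo stat | nc zk-node 2181

which prints the client connections, the min/avg/max latency and the node's
mode (leader/follower).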
Hi Martin. Are these VMs or bare metal? Is ZK running on the same 3 nodes
as the Mesos cluster? Does your application also use ZooKeeper to manage
its own state? Are there any other services running on the machines, and
do Mesos and ZK have enough resources? And as Tomas asked: is your ZK
transaction log on a dedicated disk?
Hi guys,
these machines are relatively beefy: Dell PowerEdge R710 with 2x quad-core
Xeon, 144GB RAM, and CoreOS is deployed on bare metal.
- ZK is running on the same 3 nodes as the Mesos cluster
- our application is not using ZK
- nothing else is running on the stack, only 1 Mesos master, 3 Mesos slaves
and Marathon
Hi Martin,
do all 3 ZooKeeper nodes go down with the same error logs/cause? There should
be some info, since a single node failure should not take ZK down (quorum is
maintained), and the remaining nodes should at least log something from the
failure detector.
The original logs you posted are from after ZooKeeper was stopped.
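One quick check when a single node dies is whether the survivors still hold
a quorum; the "stat" output includes the node's mode (hostname and port here
are examples):

  $ echo stat | nc zk-node 2181 | grep Mode
  Mode: follower

If one of the remaining nodes reports "Mode: leader", the quorum was
maintained and the outage has another cause.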
Now I finally tracked down the real problem, and it's nothing related to
Mesos at all.
It was fleet on CoreOS stopping all containers on a node, because the node
was considered unresponsive from the CoreOS/etcd/fleet cluster's point of
view.
The high CPU/network load caused the problem, and fleet reacted by stopping
the containers.
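In case anyone hits the same thing: fleet considers an agent dead when it
fails to refresh its TTL in etcd in time, so under heavy load one possible
mitigation (a sketch, not something we've fully validated) is raising the
agent TTL in fleet's config:

  # /etc/fleet/fleet.conf -- 120s is an example value
  agent_ttl="120s"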