Hi,
I believe I've identified a couple of bugs, but would like to ask for
other opinions before raising them officially. These might also be more
related to Kubernetes than OpenShift.
First one:
We have a cluster running OpenShift Origin v1.2.1 with Kubernetes
v1.2.0-36-g4a3f9c5. This cluster has 3 masters and 4+ nodes.
We've noticed that if we take master01 offline for maintenance, DNS
lookups inside the cluster are affected. It only affects lookups to the
172.30.0.1 cluster IP, which is DNAT'ed to port 53 tcp/udp on the three
masters. Because the iptables rules load-balance using the statistic
module's random --probability match, roughly 1 in 3 DNS lookups fails,
as it is directed to the offline master. What's really needed here, I
think, is for the offline master to be removed from the service's
endpoints so that the iptables rules are regenerated and traffic stops
hitting a master that isn't working. I'd like to see health checks here
too.
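To make the failure mode concrete, here's a small Python sketch of how a chain of statistic-match rules (as generated by kube-proxy's iptables mode) splits traffic across three endpoints. This is a model of the mechanism as I understand it, not kube-proxy's actual code, and the endpoint names are illustrative:

```python
# Model of kube-proxy's iptables load balancing: with N endpoints, the
# first rule matches with probability 1/N, the next with 1/(N-1) of the
# remainder, and the last rule matches unconditionally. iptables has no
# health check, so a dead endpoint keeps its share of the traffic.
import random

ENDPOINTS = ["master01", "master02", "master03"]  # illustrative names

def pick_endpoint(rng: random.Random) -> str:
    """Walk the rules the way iptables would."""
    remaining = len(ENDPOINTS)
    for ep in ENDPOINTS:
        if remaining == 1 or rng.random() < 1.0 / remaining:
            return ep
        remaining -= 1
    return ENDPOINTS[-1]  # unreachable: the last rule always matches

def failure_rate(down: str, trials: int = 100_000) -> float:
    """Fraction of lookups DNAT'ed to a dead endpoint."""
    rng = random.Random(42)
    failures = sum(pick_endpoint(rng) == down for _ in range(trials))
    return failures / trials

if __name__ == "__main__":
    print(f"approx failure rate with master01 down: "
          f"{failure_rate('master01'):.2f}")
```

Run against a large number of trials, this converges on the ~1/3 failure rate we observe.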
Second one:
This cluster is running OpenShift Origin v1.3.0-alpha.2+3da4944 with
Kubernetes v1.3.0+57fb9ac, again with 3 masters and ~20 nodes.
The first bug also applies to this cluster, but the behaviour is
somewhat different due to the introduction of iptables "recent" module
rules (used for ClientIP session affinity). I'm not 100% clear on the
mechanics, but what seems to happen is that the first "recent" rule
always matches, so all traffic from internal pods to cluster IPs hits
the first endpoint. The result is that ~100% of DNS lookups against
service names from other pods fail while master01 is down.
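For comparison, here's a minimal model of what I believe the "recent" rules are meant to do (pin each source IP to its first randomly chosen endpoint via --set/--rcheck) against the behaviour we're actually seeing, where the first --rcheck rule appears to match for every source. Again, this is an assumption about the mechanism, not kube-proxy source, and the names are illustrative:

```python
# Intended vs. observed behaviour of the "recent"-based session affinity.
import random

ENDPOINTS = ["master01", "master02", "master03"]  # illustrative names
_affinity: dict = {}  # source IP -> pinned endpoint

def intended_pick(src_ip: str, rng: random.Random) -> str:
    """Intended affinity: the first packet from a source rolls the
    random choice (--set); later packets reuse it (--rcheck)."""
    if src_ip not in _affinity:
        _affinity[src_ip] = rng.choice(ENDPOINTS)
    return _affinity[src_ip]

def observed_pick(src_ip: str) -> str:
    """Observed behaviour: the first 'recent' rule always matches, so
    every client is sent to the first endpoint in the chain."""
    return ENDPOINTS[0]

def failure_rate(pick, clients, down: str) -> float:
    """Fraction of clients whose lookups hit the dead endpoint."""
    return sum(pick(c) == down for c in clients) / len(clients)

if __name__ == "__main__":
    clients = [f"10.1.0.{i}" for i in range(1, 101)]
    rng = random.Random(0)
    # Intended: roughly a third of clients are pinned to the dead master.
    print(failure_rate(lambda c: intended_pick(c, rng), clients, "master01"))
    # Observed: everyone lands on the dead master.
    print(failure_rate(observed_pick, clients, "master01"))  # → 1.0
```

Under the intended behaviour only the clients pinned to master01 would fail; what we see matches the second function, i.e. a 100% failure rate while master01 is down.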
If anyone has any information to share on this, I'd be grateful. I can
also provide further details if required.
Thanks
J
_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users