Hi,

I believe I've identified a couple of bugs, but I'd like to ask for other opinions before raising them officially. They may also be more related to Kubernetes than to OpenShift.


First one:

We have a cluster running OpenShift Origin v1.2.1 with Kubernetes v1.2.0-36-g4a3f9c5. This cluster has 3 masters and 4+ nodes.

We've noticed that if we take master01 offline for maintenance, DNS lookups inside the cluster are affected. It only affects lookups against the 172.30.0.1 cluster IP, which is translated to port 53 tcp/udp on the three masters. Because the iptables rules use the statistic (probability) module to spread traffic across the three endpoints, roughly 1 in 3 DNS lookups fails because it is directed to the offline master. What I think is really needed is for the offline master to be removed from the service's endpoints, so that kube-proxy regenerates the iptables rules and traffic stops being sent to a master that isn't serving. Health checks on these endpoints would be welcome too.
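To illustrate what I mean, this is roughly what we see on a node (the chain names, endpoint address and exact probabilities below are placeholders for illustration, not copied from the cluster):

    # the offline master is still listed as an endpoint of the kubernetes service
    oc get endpoints kubernetes -n default

    # dump the rules kube-proxy generated for the DNS cluster IP
    iptables-save | grep 172.30.0.1

    # the service chain picks one of the three endpoint chains at random,
    # so about a third of packets are still DNAT'd to the offline master:
    -A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.33333 -j KUBE-SEP-MASTER01
    -A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.50000 -j KUBE-SEP-MASTER02
    -A KUBE-SVC-EXAMPLE -j KUBE-SEP-MASTER03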


Second one:

This cluster is running OpenShift Origin v1.3.0-alpha.2+3da4944 with Kubernetes v1.3.0+57fb9ac. Again, 3 masters and roughly 20 nodes.

The first bug also applies to this cluster, but the behaviour is somewhat different due to the introduction of iptables "recent" module rules. I'm not 100% clear on how these behave, but what seems to happen is that the first "recent" rule is always matched, so all traffic from pods to the cluster IP always hits the first endpoint. The result is that close to 100% of DNS lookups against service names fail from pods while master01 is down, rather than 1 in 3.
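For reference, the rules in question look something like the following (again, chain names and the destination address are illustrative placeholders; the point is that the "recent" rules sit above the probability rules in the service chain, and each endpoint chain records the source address with --set before the DNAT):

    # session affinity: if this source was seen recently for an endpoint,
    # jump straight back to that endpoint, bypassing the random selection below
    -A KUBE-SVC-EXAMPLE -m recent --rcheck --seconds 10800 --reap --name KUBE-SEP-MASTER01 -j KUBE-SEP-MASTER01
    -A KUBE-SVC-EXAMPLE -m recent --rcheck --seconds 10800 --reap --name KUBE-SEP-MASTER02 -j KUBE-SEP-MASTER02
    -A KUBE-SVC-EXAMPLE -m recent --rcheck --seconds 10800 --reap --name KUBE-SEP-MASTER03 -j KUBE-SEP-MASTER03
    -A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.33333 -j KUBE-SEP-MASTER01
    ...
    # each endpoint chain marks the source before translating the destination
    -A KUBE-SEP-MASTER01 -p udp -m recent --set --name KUBE-SEP-MASTER01 -m udp -j DNAT --to-destination 10.0.0.1:8053

If I'm reading this right, once a pod's source address has been recorded against master01's chain, its lookups keep being sent back there for the affinity timeout, which would explain why we keep hitting the offline master every time rather than 1 in 3.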


If anyone has any information to share on this, I'd be grateful. I can also provide further details if required.


Thanks


J
