Re: [kubernetes-users] Cluster DNS: bottleneck with ~1000 outbound connections per second

2017-10-05 Thread Rodrigo Campos
Ohh, sorry. My bad, just ignore my past email :-)

On Thursday, October 5, 2017, Evan Jones wrote:
> My script *is* always looking up the same domain, and I believe it is
> cached by dnsmasq. I *think* the limit is the kernel NAT connection
> tracking, because each DNS

Re: [kubernetes-users] Cluster DNS: bottleneck with ~1000 outbound connections per second

2017-10-05 Thread 'Tim Hockin' via Kubernetes user discussion and Q&A
On Thu, Oct 5, 2017 at 1:29 PM, Evan Jones wrote:
> The sustained 1000 qps comes from an application making that many outbound
> connections. I agree that the application is very inefficient and shouldn't
> be doing a DNS lookup for every request it sends, but it's a

Re: [kubernetes-users] Cluster DNS: bottleneck with ~1000 outbound connections per second

2017-10-05 Thread duffie . cooley
This is a good read on the problem as well: https://rsmitty.github.io/KubeDNS-Tweaks/

Basically, it greatly reduces the number of calls by tweaking some kube-dns settings.

On Thursday, October 5, 2017 at 2:46:55 PM UTC-7, Evan Jones wrote:
> My script is always looking up the same domain, and I

Re: [kubernetes-users] Cluster DNS: bottleneck with ~1000 outbound connections per second

2017-10-05 Thread Evan Jones
My script *is* always looking up the same domain, and I believe it is cached by dnsmasq. I *think* the limit is the kernel NAT connection tracking, because each DNS query comes from a new ephemeral port, so it ends up using up all NAT mappings on the node running kube-dns. This is why dnsPolicy:
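
A rough way to see why a steady query rate can fill the table: if every query creates a fresh conntrack entry that lives until its timeout expires, the steady-state entry count is roughly rate times timeout (Little's law). The sketch below is only an illustration of that arithmetic. The 1000 qps figure is from this thread; the 30-second UDP timeout is an assumption (a common default for `net.netfilter.nf_conntrack_udp_timeout`), and the function name is hypothetical.

```python
def estimated_conntrack_entries(queries_per_second: float,
                                entry_timeout_seconds: float) -> int:
    """Estimate steady-state conntrack entries using Little's law.

    Assumes each DNS query uses a new ephemeral source port, so every
    query creates a new entry that stays in the table until it times out.
    """
    return int(queries_per_second * entry_timeout_seconds)


if __name__ == "__main__":
    qps = 1000          # sustained rate reported in the thread
    udp_timeout = 30    # ASSUMPTION: UDP conntrack timeout in seconds
    print(estimated_conntrack_entries(qps, udp_timeout))  # prints 30000
```

Whether that estimate exceeds the limit depends on the node's configured table size, which is not stated in these messages.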

Re: [kubernetes-users] Cluster DNS: bottleneck with ~1000 outbound connections per second

2017-10-05 Thread Rodrigo Campos
On Thu, Oct 05, 2017 at 04:29:21PM -0400, Evan Jones wrote:
> The sustained 1000 qps comes from an application making that many outbound
> connections. I agree that the application is very inefficient and shouldn't
> be doing a DNS lookup for every request it sends, but it's a python program
>

Re: [kubernetes-users] Cluster DNS: bottleneck with ~1000 outbound connections per second

2017-10-05 Thread 'Tim Hockin' via Kubernetes user discussion and Q&A
We had a proposal to avoid conntrack for DNS, but no real movement on it.

We have flags to adjust the conntrack table size. Kernel has params to tweak timeouts, which users can tweak.

Sustained 1000 QPS DNS seems artificial.

On Thu, Oct 5, 2017 at 10:47 AM, Evan Jones

[kubernetes-users] Cluster DNS: bottleneck with ~1000 outbound connections per second

2017-10-05 Thread Evan Jones
*TL;DR*: Kubernetes dnsPolicy: ClusterFirst can become a bottleneck with a high rate of outbound connections. It seems like the problem is filling the nf_conntrack table, causing client applications to fail to do DNS lookups. I resolved this problem by switching my application to dnsPolicy:
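
One way to check whether the nf_conntrack table on a node is close to full is to compare the current entry count with the configured maximum. The sketch below is not from the thread. It assumes a Linux node with the conntrack module loaded, where these two values are usually exposed under `/proc/sys/net/netfilter/`; the function name and the 90% warning threshold are hypothetical.

```python
from pathlib import Path

# ASSUMPTION: standard procfs locations on a Linux node with nf_conntrack loaded.
COUNT_PATH = "/proc/sys/net/netfilter/nf_conntrack_count"
MAX_PATH = "/proc/sys/net/netfilter/nf_conntrack_max"


def conntrack_usage(count_path: str = COUNT_PATH, max_path: str = MAX_PATH):
    """Return (current entries, table limit, fraction of the table in use)."""
    count = int(Path(count_path).read_text().strip())
    limit = int(Path(max_path).read_text().strip())
    return count, limit, count / limit


if __name__ == "__main__":
    try:
        count, limit, fraction = conntrack_usage()
    except FileNotFoundError:
        print("conntrack counters not found; is nf_conntrack loaded?")
    else:
        print(f"{count}/{limit} entries ({fraction:.0%} of the table)")
        if fraction > 0.9:  # hypothetical warning threshold
            print("warning: conntrack table is nearly full")
```

Running this on the node that hosts kube-dns while the load is applied would show whether the table is actually the limit described above.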