The dnsmasq query logging was insightful.  It looks like the lookups favor the 
second server in the list.
In my case, the second server is meant for quick offsite lookups, so the local 
lookups were failing against it.
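
For anyone wanting to check the same thing on their own hosts, here is a minimal sketch that tallies which upstream server dnsmasq forwards queries to. It assumes query logging is enabled and that forwarded queries appear as "dnsmasq[PID]: forwarded NAME to SERVER"; the function name is hypothetical, and the regular expression may need adjusting if your log lines look different.

```python
import re
from collections import Counter

# Assumed shape of a dnsmasq query-log line for a forwarded lookup:
#   "... dnsmasq[PID]: forwarded NAME to SERVER"
FORWARDED = re.compile(r"dnsmasq\[\d+\]: forwarded (\S+) to (\S+)")


def tally_upstreams(lines):
    """Count how many queries were forwarded to each upstream server."""
    counts = Counter()
    for line in lines:
        match = FORWARDED.search(line)
        if match:
            # group(1) is the queried name, group(2) the upstream server
            counts[match.group(2)] += 1
    return counts
```

Passing it the lines of the dnsmasq log (for example an open file object) returns a Counter keyed by upstream address, so a lopsided count toward one server shows up immediately. If one server does turn out to be favored, dnsmasq's strict-order option and its per-domain server=/domain/address entries are the settings I would look at first, though I have not verified either against this particular setup.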



From: Brigman, Larry
Sent: Thursday, February 22, 2018 3:57 PM
To: Clayton Coleman
Cc: users@lists.openshift.redhat.com
Subject: RE: DNS lookup failures









I hadn't tried that.  
I did turn on dnsmasq logging of queries to help pinpoint the problem.

One of the issues was outside of OpenShift: the DNS server wouldn't forward 
requests and would just time out.

Getting that out of the system was a multi-step process, plus restarting dnsmasq 
after changing the config files so it would pick up the correct DNS servers.





When it occurs again, I'll use dig against the local resolver where I'm getting 
the failure.

On a multi-node cluster, changing all the files is painful.





From: Clayton Coleman [ccole...@redhat.com]
Sent: Thursday, February 22, 2018 2:58 PM
To: Brigman, Larry
Cc: users@lists.openshift.redhat.com
Subject: Re: DNS lookup failures









Do you see errors when you try to dig the master DNS address?  Or if you dig 
the local dnsmasq?
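
A rough sketch of that comparison, for cases where scripting it is easier than running dig by hand: send the same A-record query to each resolver and compare the response codes. This is an illustration rather than a replacement for dig; it builds a bare RFC 1035 query over UDP with only the standard library, and the function names are hypothetical.

```python
import socket
import struct

# Common DNS response codes from RFC 1035
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet asking for an A record."""
    # Header: id, flags (recursion desired), 1 question, no other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    # Name terminator, then QTYPE 1 (A) and QCLASS 1 (IN)
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def response_code(packet):
    """Return the RCODE name found in a DNS response header."""
    flags = struct.unpack("!H", packet[2:4])[0]
    rcode = flags & 0x000F
    return RCODES.get(rcode, f"RCODE{rcode}")


def ask(server, name, timeout=2.0):
    """Send one query to `server` on UDP port 53 and report the outcome."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            packet, _ = sock.recvfrom(512)
        except socket.timeout:
            return "TIMEOUT"
        except OSError as exc:
            return f"ERROR: {exc}"
    return response_code(packet)
```

Calling ask() with the master's address and then with the local dnsmasq address, using the hostname that fails, shows whether one of them answers NXDOMAIN or times out while the other returns NOERROR.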




I wonder if we're caching a negative lookup or something similar.





On Wed, Feb 21, 2018 at 6:01 PM, Brigman, Larry 
<larry.brig...@arris.com> wrote:






I have been experiencing DNS lookup failures.  This is preventing production 
deployment of OpenShift.
I see it in two cases: lookup of a remote docker registry and lookup of an LDAP 
service.  Neither of these is local to the server(s) in question, but both are 
local to the internal DNS servers.
 
The LDAP case is easier for me to replicate, as I just need to attempt to log in.
 
Message:
Feb 20 11:21:16 lab-stack1 atomic-openshift-master-api: E0220 11:21:16.924930    2005 login.go:176] Error authenticating "XXXX" with provider "ldap": LDAP Result Code 200 "": dial tcp: lookup ldap.xxx.xxx on xxx.xxx.xxx.xxx:53: no such host
I obfuscated the user, provider name and host for security.
 
xxx.xxx.xxx.xxx:53 is the master node, which is running dnsmasq with the 
default configuration provided by the openshift-ansible installation.
 
The names resolve again if I log on to a host and run ‘host ldap.xxx.xxx’.  
Lookups then work for a while before reverting to failure.

 
oc version
oc v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO
 
Server https://lab-stack1.lab.c-cor.com:8443
openshift v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62
 
What are the next steps to try?  Using dig or host on the node in question 
always returns a valid lookup result.




_______________________________________________

users mailing list

users@lists.openshift.redhat.com

http://lists.openshift.redhat.com/openshiftmm/listinfo/users















