Benjamin Reed wrote:

johnm wrote:

So TriLUGers, weigh in please. :) Have you had problems resolving Google as of late?


I've had intermittent SERVFAIL issues with resolving www.google.com (you'll note that just "google.com" still resolves, it's weird).


Lasts for maybe 5-10 minutes and then pops back.

First off, let me say thanks to everyone who responded, I'm glad to know it's not just me. Also, after going away from the problem for a few hours and taking another look, I think I have a better understanding of what's going on. Here's the relevant portion of the output from `rndc dumpdb`:

; authauthority
l.google.com.           55996   NS      a.l.google.com.
                        55996   NS      b.l.google.com.
                        55996   NS      c.l.google.com.
                        55996   NS      d.l.google.com.
; authanswer
a.l.google.com.         55768   A       216.239.53.9
; authanswer
b.l.google.com.         55947   A       64.233.179.9
; authanswer
c.l.google.com.         55951   A       64.233.161.9
; authanswer
d.l.google.com.         55996   A       64.233.183.9

What this basically points out is what I suspected before. The authauthority (glue NS) records for l.google.com are getting refreshed every time BIND updates the www.l.google.com record (whose TTL is 5 minutes), but the l.google.com servers do not provide glue records with the IPs of these hosts, only their names. This is the start of the problem, as shown by this dig query:


[EMAIL PROTECTED] asjoyner]$ dig -t any www.l.google.com @a.l.google.com +norec

; <<>> DiG 9.2.1 <<>> -t any www.l.google.com @a.l.google.com +norec
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35141
;; flags: qr aa; QUERY: 1, ANSWER: 2, AUTHORITY: 4, ADDITIONAL: 0

;; QUESTION SECTION:
;www.l.google.com.              IN      ANY

;; ANSWER SECTION:
www.l.google.com.       300     IN      A       66.249.85.104
www.l.google.com.       300     IN      A       66.249.85.99

;; AUTHORITY SECTION:
l.google.com.           86400   IN      NS      a.l.google.com.
l.google.com.           86400   IN      NS      b.l.google.com.
l.google.com.           86400   IN      NS      c.l.google.com.
l.google.com.           86400   IN      NS      d.l.google.com.

;; Query time: 93 msec
;; SERVER: 216.239.53.9#53(a.l.google.com)
;; WHEN: Sat Apr  2 17:30:24 2005
;; MSG SIZE  rcvd: 130


The lookups will cycle through the various authoritative servers, but the cached addresses of those servers don't get refreshed along with the NS records. And once those glue records expire, which admittedly takes a day, the next query for a host under l.google.com will have to ask one of the authoritative servers, which unfortunately are hosts within that very domain. This is the manifestation of the problem. What server can it ask now? It knows that a.l.google.com is authoritative for l.google.com, and it needs to ask that server how to look up a.l.google.com itself, but it has no address to begin the query with. This causes the SERVFAIL error that Ben is describing above. I think that this will invalidate the NS records, perhaps negatively caching them for some short time? I can't seem to find any documentation on precisely what BIND 9 does with the associated records when it gets a SERVFAIL for NS records.
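
The timing argument can be sketched as a toy cache simulation. It assumes, and this is a simplification rather than BIND's actual cache logic, that the resolver first learns the NS set and the nameserver addresses from a google.com referral, that every l.google.com answer refreshes the NS set but carries no addresses, and that the address TTL is 86400 like the NS TTL. The function name and `glue_in_answers` parameter are hypothetical, and it does not model the 5-10 minute recovery Ben saw.

```python
DAY = 86400


def first_servfail(horizon, www_ttl=300, ns_ttl=DAY, addr_ttl=DAY,
                   glue_in_answers=False):
    """Return the first time (seconds) a www.l.google.com lookup fails, or None."""
    ns_expiry = ns_ttl        # NS set cached at t=0 from the parent referral
    addr_expiry = addr_ttl    # nameserver A records cached at t=0 (glue)
    www_expiry = 0            # www.l.google.com not cached yet
    for now in range(0, horizon, www_ttl):
        if now < www_expiry:
            continue  # answered from cache, no upstream query needed
        if now >= ns_expiry:
            # NS set expired: go back to the parent, whose referral has glue.
            ns_expiry = now + ns_ttl
            addr_expiry = now + addr_ttl
        if now >= addr_expiry:
            # NS set still cached, but no address for any in-zone server.
            return now
        # The answer refreshes www and the NS set in AUTHORITY...
        www_expiry = now + www_ttl
        ns_expiry = now + ns_ttl
        # ...but only refreshes the addresses if ADDITIONAL carries glue.
        if glue_in_answers:
            addr_expiry = now + addr_ttl
    return None
```

Under these assumptions `first_servfail(2 * DAY)` returns 86400: the NS set never ages out because each 5-minute refetch renews it, so the resolver never returns to the parent for fresh glue, while the addresses quietly expire after a day. With `glue_in_answers=True` it returns None.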

So the problem is that a.l.google.com (and its companions) aren't returning their IPs as glue records alongside the NS records returned on queries against www.l.google.com. We can prove that a.l.google.com is aware of the A record for its own IP address, since it returns it when queried for it directly, but I don't know why it's not returning that glue. I attempted to duplicate the behavior by setting "fetch-glue no; recursion no;" on a similarly configured server, but BIND 9.2.1 always seems to hand back those glue records (as it really should).
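
For comparison, here is a toy model of the behavior the servers "really should" have: when building a response, add an A record to ADDITIONAL for every NS target the server holds an address for. The zone contents are the addresses quoted above; the dictionary layout and `build_response` are hypothetical illustrations, not how BIND stores or serves data, and the query type is simplified to A rather than the ANY used in the dig example.

```python
# Toy zone for l.google.com built from the addresses quoted above.
ZONE = {
    ("www.l.google.com.", "A"): ["66.249.85.104", "66.249.85.99"],
    ("l.google.com.", "NS"): ["a.l.google.com.", "b.l.google.com.",
                              "c.l.google.com.", "d.l.google.com."],
    ("a.l.google.com.", "A"): ["216.239.53.9"],
    ("b.l.google.com.", "A"): ["64.233.179.9"],
    ("c.l.google.com.", "A"): ["64.233.161.9"],
    ("d.l.google.com.", "A"): ["64.233.183.9"],
}


def build_response(zone, origin, qname, qtype):
    """Return (answer, authority, additional) lists of (name, type, rdata)."""
    answer = [(qname, qtype, r) for r in zone.get((qname, qtype), [])]
    authority = [(origin, "NS", ns) for ns in zone.get((origin, "NS"), [])]
    additional = []
    # For each NS target the server holds an address for, include it,
    # so a resolver never has to look the nameserver up by itself.
    for _, _, ns in authority:
        for addr in zone.get((ns, "A"), []):
            additional.append((ns, "A", addr))
    return answer, authority, additional
```

The dig output above corresponds to `additional` coming back empty ("ADDITIONAL: 0") even though the server clearly holds the addresses.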

Thanks again to everyone who responded confirming I wasn't the only one to have seen this. I'll see what I can do about bringing it to the attention of someone at Google to get it fixed. :)

Aaron S. Joyner

--
TriLUG mailing list        : http://www.trilug.org/mailman/listinfo/trilug
TriLUG Organizational FAQ  : http://trilug.org/faq/
TriLUG Member Services FAQ : http://members.trilug.org/services_faq/
TriLUG PGP Keyring         : http://trilug.org/~chrish/trilug.asc
