Public bug reported:

Description of problem:
When ironic (undercloud) cannot get a reverse DNS entry for the IP assigned
to br-ctlplane, and does not even receive an NXDOMAIN response in time (e.g.
the DNS server is misconfigured or there are connectivity issues), all ironic
commands take very long to execute (the lookups time out, but the commands
still succeed).

[undercloud]: $ time ironic node-list
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
...
real    0m55.383s
user    0m0.248s
sys     0m0.043s

Version-Release number of selected component (if applicable):
Tested on OSP director 8

How reproducible (example with IP 10.100.100.1):
[undercloud]: $ ip a
...
7: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether <macaddr> brd ff:ff:ff:ff:ff:ff
    inet 10.100.100.1/24 brd 10.100.100.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
...

Configure your DNS server to not respond (even with NXDOMAIN) for
10.100.100.1:

[undercloud]: $ time host 10.100.100.1
;; connection timed out; no servers could be reached
real    0m14.005s
user    0m0.003s
sys     0m0.003s

[undercloud]: $ time dig -x 10.100.100.1
...
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 20304
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
...
;; connection timed out; no servers could be reached
real    0m21.007s
user    0m0.003s
sys     0m0.004s

[undercloud]: $ time nslookup 10.100.100.1                                
;; connection timed out; trying next origin
;; connection timed out; trying next origin
;; Got SERVFAIL reply from XYZ, trying next server
;; connection timed out; trying next origin
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached
real    0m50.008s
user    0m0.002s
sys     0m0.009s
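
The delay shown by host, dig and nslookup above can also be measured from
Python, which is closer to how a Python service would hit it. This is only a
minimal sketch: the function name time_reverse_lookup and its resolver
parameter are made up for illustration, and nothing here is taken from
ironic's code. It assumes the slow path is an ordinary blocking reverse
lookup through the system resolver.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, elapsed seconds) for one reverse lookup.

    `resolver` is a hypothetical hook so the function can be exercised
    without a real DNS server; by default the system resolver is used.
    """
    start = time.monotonic()
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError
        hostname = None
    return hostname, time.monotonic() - start
```

Calling time_reverse_lookup("10.100.100.1") on the undercloud should show
whether the blocking lookup alone accounts for the tens of seconds seen
above.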

Actual results:
Each ironic command can take 20-60 seconds in this case.

Expected results:
Ironic should have a mechanism to deal with this; commands should take
milliseconds rather than tens of seconds:
[undercloud]: $ time ironic node-list
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
...
real    0m0.393s
user    0m0.244s
sys     0m0.041s
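
One possible shape for such a mechanism is to bound the reverse lookup with
a deadline and fall back to the bare IP address when no answer arrives in
time. The sketch below is an assumption about what a fix could look like,
not ironic's actual behaviour or a proposed patch; the name
reverse_lookup_with_deadline, the 1 second default and the resolver
parameter are all hypothetical.

```python
import socket
import threading


def reverse_lookup_with_deadline(ip, timeout=1.0,
                                 resolver=socket.gethostbyaddr):
    """Return the reverse DNS name for ip, or ip itself on failure/timeout."""
    result = []

    def worker():
        try:
            # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
            result.append(resolver(ip)[0])
        except OSError:
            # NXDOMAIN, SERVFAIL and similar errors: leave result empty
            pass

    # A daemon thread keeps a stuck lookup from blocking interpreter exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)  # wait at most `timeout` seconds for an answer
    return result[0] if result else ip
```

The lookup itself keeps running in the background until the resolver gives
up, so this only caps how long the caller waits. Lowering the resolver's
own timeout and retry settings would be a complementary approach.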

Originally reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1328143

** Affects: ironic
     Importance: Undecided
         Status: New

** Affects: network-manager (Ubuntu)
     Importance: Undecided
         Status: Invalid

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1572201

Title:
  Long ironic timeouts because of ServFail DNS error
