Hi,

we recently had an incident involving one of our central recursors, which at the time ran unbound 1.7.3. What appeared to happen was that it suddenly became Really Slow at answering queries for non-existent names, i.e. at confirming that they really did not exist. "dig" (with default retry & timeout) would time out 2 or 3 times before I got an NXDOMAIN. It did not matter whether the zone in question was DNSSEC-signed or not.

We're graphing various performance data from "unbound-control stats", and while the normal daily peak query load is around 1,000 to 1,200 qps, the weekdays this incident lasted saw a load of only around 700 qps, so I can't rule out that users were impacted.

I'll have to admit that I didn't turn on more extensive logging to get more information about this incident. I'll also admit that I took the opportunity to upgrade to unbound 1.8.0, which it's running at the moment. This leads to a separate message about logging...

Prior to this incident, unbound had run continuously for 30-40 days, and the "cache_message" value had (according to the graph we plot) reached its ceiling about 14 days earlier, so the cache filling up is unlikely to be the trigger.

I know there isn't much to go on here, but does this match any other incidents?

Best regards,

- Håvard
