Hey team,

We run 8-node Unbound clusters as recursive resolvers. The setup forwards internal queries (using forward-zone) to a separate PowerDNS authoritative cluster.
Recently, we've had some connectivity issues to Cloudflare, which provides many of the external DNS services in our environment. When this happens, we see the requestlist balloon to around 1.5-2k entries as queries repeatedly time out. The problem is that this affects forward-zones as well: we lose resolution for internal queries whenever the requestlist backs up like this.

We're looking for suggestions on how to safeguard these internal forwards. We noticed that stub-zone may be the more appropriate stanza for our use case, but we're unsure whether it would bypass this requestlist queuing.

Any thoughts greatly welcome, thank you!

Our config is fairly simple:

    server:
        num-threads: 4

        # Best performance is a "power of 2 close to the num-threads value"
        msg-cache-slabs: 4
        rrset-cache-slabs: 4
        infra-cache-slabs: 4
        key-cache-slabs: 4

        # Use 1.125GB of a 4GB node to start, but real usage may be 2.5x this so
        # closer to 2.8G/4GB (~70%)
        #
        msg-cache-size: 384m
        # Should be 2x the msg cache
        rrset-cache-size: 768m

        # We have libevent! Use lots of ports.
        outgoing-range: 8192
        num-queries-per-thread: 4096

        # Use larger socket buffers for busy servers.
        so-rcvbuf: 8m
        so-sndbuf: 8m

        # Turn on port reuse
        so-reuseport: yes

        # This is needed to forward queries for private PTR records to upstream DNS servers
        unblock-lan-zones: yes

    forward-zone:
        name: "int.domain.tld"
        forward-addr: "10.10.5.5"
        # No caching in unbound
        forward-no-cache: "yes"
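
For reference, the stub-zone variant we're considering would look roughly like the sketch below. It is untested. It assumes 10.10.5.5 answers authoritatively for int.domain.tld, and that stub-no-cache behaves as the stub-zone counterpart of forward-no-cache in our Unbound version. Whether it actually sidesteps the requestlist backlog is exactly what we don't know.

```
# Untested sketch: would replace the forward-zone block above.
stub-zone:
    name: "int.domain.tld"
    # Assumption: this address is a PowerDNS authoritative server,
    # not a recursor, which is what stub-zone expects.
    stub-addr: "10.10.5.5"
    # Assumption: mirrors forward-no-cache, so answers are not cached in Unbound.
    stub-no-cache: yes
```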
