Hello squid users,

I'm trying to understand a strange problem with requests to edge.apple.com,
which I think may be related to IPv6 DNS resolution.

To set the scene - we operate a large (1,000+) fleet of Squid 3.5.25 caches.
Each runs on a separate LAN, connected to the internet via another upstream
proxy, accessed over a wide-area network.  Each local cache runs on a CentOS
6 box, incuding BIND for name resolution.  For DNS resolution, each local
CentOS server runs BIND, which is configured to resolve against a local
Microsoft DNS server, which then resolves internet queries using a
whole-of-WAN BIND service operated by the carrier.  The WAN does not support
IPv6, and CentOS does not have any v6 network interfaces configured.

Recently we became aware of a fault on a single cache serving requests for
edge.icloud.com.  Requests would time out with a TAG_NONE/503 written to the
log.  The error could be replicated with cURL at the CLI using this URL:
https://edge.icloud.com/perf.css.  This was a strange error, because at the
time it happened, it was possible to connect to edge.icloud.com on port 443.
The error was happening in just one site.

To isolate the fault we stripped the Squid config at the affected site right
back to the following:

        # Skeleton Squid 3.5.25 config
        shutdown_lifetime 2 seconds
        max_filedesc 16384
        coredump_dir /var/spool/squid
        dns_timeout 5 seconds
        error_directory /var/www/squid-errors
        logfile_rotate 0
        http_port 3128
        cache_dir ufs /var/spool/squid 8192 16 256
        maximum_object_size 536870912 bytes
        cache_replacement_policy heap LFUDA
        http_access allow localhost
        debug_options ALL,5

Here's the messages written to the log when fetching
https://edge.icloud.com/perf.css with curl:

2018/05/08 16:25:46.321 kid1| 14,3| ipcache.cc(362) ipcacheParse: 18 answers
for 'edge.icloud.com'
2018/05/08 16:25:46.322 kid1| 14,3| ipcache.cc(431) ipcacheParse:
edge.icloud.com #0 [2403:300:a50:105::f]
2018/05/08 16:25:46.322 kid1| 14,3| ipcache.cc(431) ipcacheParse:
edge.icloud.com #1 [2403:300:a50:105::9]
2018/05/08 16:25:46.322 kid1| 14,3| ipcache.cc(431) ipcacheParse:
edge.icloud.com #2 [2403:300:a50:100::e]
2018/05/08 16:25:46.322 kid1| 14,3| ipcache.cc(431) ipcacheParse:
edge.icloud.com #3 [2403:300:a50:101::5]
2018/05/08 16:25:46.322 kid1| 14,3| ipcache.cc(431) ipcacheParse:
edge.icloud.com #4 [2403:300:a50:104::e]
2018/05/08 16:25:46.322 kid1| 14,3| ipcache.cc(431) ipcacheParse:
edge.icloud.com #5 [2403:300:a50:104::9]
2018/05/08 16:25:46.322 kid1| 14,3| ipcache.cc(431) ipcacheParse:
edge.icloud.com #6 [2403:300:a50:104::5]
2018/05/08 16:25:46.322 kid1| 14,3| ipcache.cc(431) ipcacheParse:
edge.icloud.com #7 [2403:300:a50:101::6]
2018/05/08 16:25:46.322 kid1| 14,3| ipcache.cc(420) ipcacheParse:
edge.icloud.com #8 17.248.155.107
2018/05/08 16:25:46.322 kid1| 14,3| ipcache.cc(420) ipcacheParse:
edge.icloud.com #9 17.248.155.142
2018/05/08 16:25:46.322 kid1| 14,3| ipcache.cc(420) ipcacheParse:
edge.icloud.com #10 17.248.155.110
2018/05/08 16:25:46.322 kid1| 14,3| ipcache.cc(420) ipcacheParse:
edge.icloud.com #11 17.248.155.80
2018/05/08 16:25:46.322 kid1| 14,3| ipcache.cc(420) ipcacheParse:
edge.icloud.com #12 17.248.155.114
2018/05/08 16:25:46.322 kid1| 14,3| ipcache.cc(420) ipcacheParse:
edge.icloud.com #13 17.248.155.77
2018/05/08 16:25:46.322 kid1| 14,3| ipcache.cc(420) ipcacheParse:
edge.icloud.com #14 17.248.155.145
2018/05/08 16:25:46.322 kid1| 14,3| ipcache.cc(420) ipcacheParse:
edge.icloud.com #15 17.248.155.89
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(280) peerSelectDnsPaths:
Found sources for 'edge.icloud.com:443'
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(281) peerSelectDnsPaths:
always_direct = DENIED
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(282) peerSelectDnsPaths:
never_direct = DENIED
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
DIRECT = local=[::] remote=[2403:300:a50:105::f]:443 flags=1
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
DIRECT = local=[::] remote=[2403:300:a50:105::9]:443 flags=1
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
DIRECT = local=[::] remote=[2403:300:a50:100::e]:443 flags=1
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
DIRECT = local=[::] remote=[2403:300:a50:101::5]:443 flags=1
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
DIRECT = local=[::] remote=[2403:300:a50:104::e]:443 flags=1
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
DIRECT = local=[::] remote=[2403:300:a50:104::9]:443 flags=1
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
DIRECT = local=[::] remote=[2403:300:a50:104::5]:443 flags=1
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
DIRECT = local=[::] remote=[2403:300:a50:101::6]:443 flags=1
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
DIRECT = local=0.0.0.0 remote=17.248.155.107:443 flags=1
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
DIRECT = local=0.0.0.0 remote=17.248.155.142:443 flags=1
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
DIRECT = local=0.0.0.0 remote=17.248.155.110:443 flags=1
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
DIRECT = local=0.0.0.0 remote=17.248.155.80:443 flags=1
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
DIRECT = local=0.0.0.0 remote=17.248.155.114:443 flags=1
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
DIRECT = local=0.0.0.0 remote=17.248.155.77:443 flags=1
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
DIRECT = local=0.0.0.0 remote=17.248.155.145:443 flags=1
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
DIRECT = local=0.0.0.0 remote=17.248.155.89:443 flags=1
2018/05/08 16:25:46.322 kid1| 44,2| peer_select.cc(295) peerSelectDnsPaths:
timedout = 0

After adding "dns_v4_first on" to the config, this allowed the request to
proceed.  Great!

We're interested to know why this copy of Squid acted differently to all the
others.  One potential clue we noticed is that this particular LAN has a lot
of v6 trafflic flying around on it, compared to the rest of our networks.  I
understand Squid runs it's own DNS resolver.  Could something on the local
LAN be announcing itself and the resolver in Squid picking up on that,
switching around the order of DNS resolution so that it tries that v6
address first?

Anything else we should be looking for?  Guidance appreciated!

Cheers
Luke


_______________________________________________
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users

Reply via email to