Hi Alex,

On 1/31/19 2:14 AM, Alex Zorin via Unbound-users wrote:
> Hi Wouter,
> 
> Some background: The .pl ccTLD currently has some issues with 0x20[1] that 
> causes Unbound to go into capsforid fallback when it encounters one or two 
> nameservers used by pl.
> 
> That's not a problem in itself, but I believe Unbound may be occasionally 
> triggering a false positive failure during capsforid fallback. 

Thanks for the detailed report.  Yes, you are right, it should get
sorted.  I have made a fix, also in the code repository that sorts the
rrsets before comparison.

I hope this can solve your problem.

Best regards, Wouter


Index: iterator/iter_utils.c
===================================================================
--- iterator/iter_utils.c       (revision 5099)
+++ iterator/iter_utils.c       (working copy)
@@ -882,10 +882,34 @@
        return 1;
 }

+/** compare rrsets and sort canonically.  Compares rrset name, type, class.
+ */
+static int
+rrset_canonical_sort_cmp(const void* x, const void* y)
+{
+       struct ub_packed_rrset_key* rrx = (struct ub_packed_rrset_key*)x;
+       struct ub_packed_rrset_key* rry = (struct ub_packed_rrset_key*)y;
+       int r = dname_canonical_compare(rrx->rk.dname, rry->rk.dname);
+       if(r != 0)
+               return r;
+       if(rrx->rk.type != rry->rk.type) {
+               if(ntohs(rrx->rk.type) > ntohs(rry->rk.type))
+                       return 1;
+               else    return -1;
+       }
+       if(rrx->rk.rrset_class != rry->rk.rrset_class) {
+               if(ntohs(rrx->rk.rrset_class) > ntohs(rry->rk.rrset_class))
+                       return 1;
+               else    return -1;
+       }
+       return 0;
+}
+
 int
 reply_equal(struct reply_info* p, struct reply_info* q, struct
regional* region)
 {
        size_t i;
+       struct ub_packed_rrset_key** sorted_p, **sorted_q;
        if(p->flags != q->flags ||
                p->qdcount != q->qdcount ||
                /* do not check TTL, this may differ */
@@ -899,16 +923,40 @@
                p->ar_numrrsets != q->ar_numrrsets ||
                p->rrset_count != q->rrset_count)
                return 0;
+       /* sort the rrsets in the authority and additional sections before
+        * compare, the query and answer sections are ordered in the sequence
+        * they should have (eg. one after the other for aliases). */
+       sorted_p = (struct ub_packed_rrset_key**)regional_alloc_init(
+               region, p->rrsets, sizeof(*sorted_p)*p->rrset_count);
+       if(!sorted_p) return 0;
+       log_assert(p->an_numrrsets + p->ns_numrrsets + p->ar_numrrsets <=
+               p->rrset_count);
+       qsort(sorted_p + p->an_numrrsets, p->ns_numrrsets,
+               sizeof(*sorted_p), rrset_canonical_sort_cmp);
+       qsort(sorted_p + p->an_numrrsets + p->ns_numrrsets, p->ar_numrrsets,
+               sizeof(*sorted_p), rrset_canonical_sort_cmp);
+
+       sorted_q = (struct ub_packed_rrset_key**)regional_alloc_init(
+               region, q->rrsets, sizeof(*sorted_q)*q->rrset_count);
+       if(!sorted_q) return 0;
+       log_assert(q->an_numrrsets + q->ns_numrrsets + q->ar_numrrsets <=
+               q->rrset_count);
+       qsort(sorted_q + q->an_numrrsets, q->ns_numrrsets,
+               sizeof(*sorted_q), rrset_canonical_sort_cmp);
+       qsort(sorted_q + q->an_numrrsets + q->ns_numrrsets, q->ar_numrrsets,
+               sizeof(*sorted_q), rrset_canonical_sort_cmp);
+
+       /* compare the rrsets */
        for(i=0; i<p->rrset_count; i++) {
-               if(!rrset_equal(p->rrsets[i], q->rrsets[i])) {
-                       if(!rrset_canonical_equal(region, p->rrsets[i],
-                               q->rrsets[i])) {
+               if(!rrset_equal(sorted_p[i], sorted_q[i])) {
+                       if(!rrset_canonical_equal(region, sorted_p[i],
+                               sorted_q[i])) {
                                regional_free_all(region);
                                return 0;
                        }
-                       regional_free_all(region);
                }
        }
+       regional_free_all(region);
        return 1;
 }



> 
> I have attached:
> - 0001-capsforid-verbose-logging.patch: a verbose printf-debugging patch 
> affecting comparison routines used during capsforid fallback. (Against 
> Unbound 1.9.0rc1).
> - unbound-host.log.gz: the output of the unbound-host invocation below that 
> triggers this failure, with the above patch.
> - pl.pcap: the packet capture that correlates directly to the unbound-host 
> invocation.
> - unbound.conf: the conf file used for the unbound-host invocation.
> 
> This is the command that triggers the issue. Please keep in mind you may need 
> to try as many as 10+ times because due to the unlikely selection of the pl 
> nameserver(s) with the 0x20 issue that triggers the fallback in the first 
> place:
> 
>     $ unbound-host pie3aq.pl -t caa -d -vvv -C unbound.conf
> 
> My hypothesis for what is happening is that Unbound might not performing a 
> canonical sort of the additional RRs, which then triggers the capsforid fail:
> 
>   [1548894424] libunbound[5931:0] info: rrset 3 not equal
>   [1548894424] libunbound[5931:0] info: canonical basic compare, dname_len: 
> 16 vs 16
>   [1548894424] libunbound[5931:0] info: canonical basic compare, flags: 0 vs 0
>   [1548894424] libunbound[5931:0] info: canonical basic compare, type: 256 vs 
> 256
>   [1548894424] libunbound[5931:0] info: canonical basic compare, rrset_class: 
> 256 vs 256
>   [1548894424] libunbound[5931:0] info: canonical basic compare, ttl: 0 vs 0
>   [1548894424] libunbound[5931:0] info: canonical basic compare, count: 1 vs 1
>   [1548894424] libunbound[5931:0] info: canonical basic compare, rrsig_count: 
> 0 vs 0
>   [1548894424] libunbound[5931:0] info: canonical basic compare, trust: 1 vs 1
>   [1548894424] libunbound[5931:0] info: canonical basic compare, security: 0 
> vs 0
>   [1548894424] libunbound[5931:0] info: d vs d
>   [1548894424] libunbound[5931:0] info: n vs n
>   [1548894424] libunbound[5931:0] info: s vs s
>   [1548894424] libunbound[5931:0] info: 2 vs 1
>   [1548894424] libunbound[5931:0] info: rrset 3 not canonical equal
>   [1548894424] libunbound[5931:0] info: Capsforid fallback: getting different 
> replies, failed
> 
> In the above, we see that the label "dns1" being compared to "dns2" is what 
> ultimately triggers the capsforid failure.
> 
> In the pcap, I believe this refers to a comparison of DNS response 0x00006547 
> with DNS response 0x000006bd, where in the latter, the ordering of the 
> additional records is:
> 
>   dns2.pie3aq.pl: type A, class IN, addr 5.135.61.93
>   dns1.pie3aq.pl: type A, class IN, addr 51.75.34.178
> 
> and in the former, dns1 comes first.
> 
> My very shoddy reading of Unbound's code is that these additional records 
> should first undergo a canonical sort, in which case these messages should be 
> matching under capsforid.
> 
> I might be barking entirely up the wrong tree, but your guidance is very much 
> appreciated.
> 
> Thanks,
> Alex
> 
> --
> 
> 1. 
> https://lists.dns-oarc.net/pipermail/dns-operations/2019-January/018359.html
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to