Hi Jon,

On 12/9/17, Jon Loeliger <j...@netgate.com> wrote:
> On Sat, Dec 9, 2017 at 8:16 AM, Andrew 👽 Yourtchenko <ayour...@gmail.com>
> wrote:
>
>> Jon,
>>
>
> Hi Andrew,
>
> Thanks for taking a look at this issue!
>
>
>> on api trace: does the below work ? (even though the current scenario
>> is trivially reproducible, the api traces are very useful for tougher
>> cases, and save a lot of typing while storytelling).
>>
>> DBGvpp# api trace on
>>
>> ..... do the things ....
>>
>> DBGvpp# api trace save macip-trace
>> API trace saved to /tmp/macip-trace
>> DBGvpp# api trace custom-dump /tmp/macip-trace
>> SCRIPT: macip_acl_add_replace -1 count 1 count 1 \
>>   ipv4 permit \
>>     src mac 00:00:00:00:00:00 mask 00:00:00:00:00:00 \
>>     src ip 0.0.0.0/0, \
>>
>> SCRIPT: macip_acl_interface_add_del sw_if_index 0 acl_index 0 add
>> SCRIPT: macip_acl_add_replace 0 count 1 count 1 \
>>   ipv4 permit \
>>     src mac 00:00:00:00:00:00 mask 00:00:00:00:00:00 \
>>     src ip 0.0.0.0/0, \
>>
>
> I think that is the right sequence.
>
>
> Now, to the issue itself: it's exactly as I described, but with a twist:
>> vnet_set_input_acl_intfc(), which is used under the hood to assign the
>> inacl on the interfaces, is quite picky - if there is an existing
>> inacl applied, it just quietly does nothing. (@DaveB - this kinda
>> feels strange, I am not sure what the logic is behind doing that.)
>>
>> Anyway, rather than debating on why it behaves this way, and,
>> especially since we actually are deleting the tables in question, it's
>> better to unapply the inacls first, and then reapply them after the
>> tables have been recreated.
>>
>
> This solves half the problem for me.  It looks like I can properly
> turn around and remove this ACL from the interface now!
>
> But I still have doubts; or at least I don't understand why the
> three table indices are 3 after initial creation, and 0 after they
> are replaced.
>
> The result is in https://gerrit.fd.io/r/#/c/9772/ - you can verify
>> that it addresses the issue.
>>
>
> I've left a comment on the code there.

if by "the issue" in the comment you mean the index being first zero
and then three,
this is not really a problem - like I wrote in the log... Anyway,
since I was also curious
to know a bit more about "why", let me expand on this here:
if I add this diff to the code:

diff --git a/src/plugins/acl/acl.c b/src/plugins/acl/acl.c
index 286b2e96..5b4b1828 100644
--- a/src/plugins/acl/acl.c
+++ b/src/plugins/acl/acl.c
@@ -476,6 +476,12 @@ acl_classify_add_del_table_small
(vnet_classify_main_t * cm, u8 * mask,
                                         table_index, current_data_flag,
                                         current_data_offset, is_add,
                                         1 /* delete_chain */ );
+  if (table_index) {
+    if (is_add)
+      clib_warning ("CLASSIFY-SMALL-TABLE: New table index: %d for
mask %U\n", *table_index, format_hex_bytes, mask, mask_len);
+    else
+      clib_warning ("CLASSIFY-SMALL-TABLE: deleting index %d", *table_index);
+  }
   clib_mem_set_heap (oldheap);
   return ret;
 }

I notice that upon the first creation the indices that we get are 0,
1, 2, 3, with the "3" being the last table created and being the head
of the table list, then when we do replace, the tables get deleted
(called three times with the same argument since it is in the
ipv4/ipv6/l2 table index - I should probably fix that), and the next
round of tables being created returns the new indices in the order of
3,2,1,0.

So, the table #0 is created last, is the new head of the chain - and
consequently this is the new index we see.

This difference this order comes from the logic that the
vnet_classify_new_table() uses to allocate new tables - it uses the
pool_get_aligned(), which first tries to reallocate the newly freed
elements from the freelist before expanding the pool. The freelist is
a vector, with the latest freed element at the tail of it. And the new
allocations use that element. Because table #3 is the head of the
chain, when deleting the chain, it is freed last. So, in the new
allocation it becomes the first table to be allocated. So, we get the
indices in reverse order.
If we do the same operation again, we will see that this phenomenon is
(as expected) symmetric - this time index 0 is the head of the list,
so it is freed up last, and becomes the first one available for
allocation by the pool alloc. So, if we do two updates, the indices
get again in the order of 0,1,2,3, and the head of the table chain
becomes again 3.

>
> Despite what Gerrit thinks, this code does compile and run for me!
> So maybe just a "rebuild" request there will allow it to verify?

Actually, the faulty test case (which was supposed to detect this) was botched
in a more interesting way than I thought. It seems to be a copy paste
or a merge error of some sorts, and it succeeded when the bug that you
caught was present, and failed when it was fixed... After staring at
the logic of it for a couple of hours I changed it to something that
is hopefully more obvious, and patch#2 has the fix for both the code
and the test case(s) - and they now correctly catch the bug you found
- you can doublecheck that if you undo the patch to acl.c the test
case now fails as it should have been from the start, so that this bug
would not surface... Alas, everything is easy in retrospect :)

>
>
>
>> Now, going on to "how exactly did this slip through" - seems the macip
>> tests are quite a bit too lenient than they should be. I'll need to
>> address that as well, though probably I will split the dot1q/dot1ad
>> test cases out, and in the process refactor things a bit... so in the
>> interests of your time, maybe 9772 can go with just an actual code
>> fix.
>>
>
> I've not read through the test as they stand today.
> I'd like to understand the "3 vs. 0" issue before I am happy to
> Code Review +1 this patch.

Sure! hopefully my above explanation clarifies. Actually thanks for
pushing me to dig into this, since this also clarifies another
phenomenon which I had observed earlier and was not sure of its
origins - for the policy ACL you might also observe similar backflips
of the classifier tables. So, now I know the reason :-)

>
>
>> I've dumped the entire debugging process into a log, which turned out
>> be fairly long, so to avoid sending the walls of text to the list,
>> I've dumped it elsewhere:
>> http://stdio.be/blog/2017-12-09-Debugging-VPP-MACIP-ACLs/
>
>
> And, excellent!  I read through that in quite some detail.  And I
> understand
> the "3 vs 0" issue I was seeing now!
>
> The two pieces I missed were:  The "show inacl type l2" to see where
> the chain was starting, and then noticing that the chain had indeed been
> reversed once the starting point was known.
>
> So thanks for that excellent debug layout and explanation.
>
> And thanks for including the intermediate step of showing why simply
> updating the interface at the end wasn't sufficient.  I had done that,
> but hadn't yet gotten into the next function to see it was being ignored.
>
> Thanks again for catching this.
>
>
> Thanks for fixing this!
>
>
>>
>> --a
>
>
> So, with that, I've left  a comment and a Review -1 on the patch.
> And the patch didn't Verify.  So where do we go from here?
>
> I'm good with the patch, and we need to rebuild it.  So do we just
> re-build the same patch or re-submit it as a new patch?  I will
> either update or Review a-new to +1 it!

Cool! Feel free to play with the latest rev, I think we should be good now..

--a


>
> Thanks,
> jdl
>
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev

Reply via email to