Hi,

Finally I could reproduce the situation.

commit: d594711a5d79859a7d0bde83a516f7ab52051d9b
branch: stable/1710
git diff <https://paste.ubuntu.com/26213344/>
startup.conf <https://paste.ubuntu.com/26213820/>

vpp commands:
vppctl set interface l2 bridge TenGigabitEthernet6/0/0 1
vppctl set interface l2 bridge TenGigabitEthernet6/0/1 1
vppctl set int state TenGigabitEthernet6/0/0 up
vppctl set int state TenGigabitEthernet6/0/1 up
vppctl set acl-plugin session timeout udp idle 5
vppctl set acl-plugin session timeout tcp idle 10
vppctl set acl-plugin session timeout tcp transient 5

vat> acl_add_replace permit+reflect
acl_interface_add_del sw_if_index 1 add input acl 0
acl_interface_add_del sw_if_index 1 add output acl 0
acl_interface_add_del sw_if_index 2 add input acl 0
acl_interface_add_del sw_if_index 2 add output acl 0

Bizarre output of 'sh acl-plugin sessions':
output <https://paste.ubuntu.com/26214222/>
Look at thread #1, sw_if_index_2!

gdb backtrace <https://paste.ubuntu.com/26214237/>

T-rex was run as I said earlier.
Hardware Info <https://paste.ubuntu.com/26214263/>

Regards,
Khers

On Wed, Nov 8, 2017 at 8:48 PM, Andrew Yourtchenko <ayour...@gmail.com> wrote:

> Dear Khers,
>
> That is without applying the one liner change that I have proposed, right ?
>
> I would suggest to retry the reproduction on the same commit where you
> were previously able to reproduce it, and if it is reliably reproducible
> there - to apply that change and see if it addresses the issue. Then we can
> track if the latest commit fixed or merely masked it...
>
> --a
>
> On 8 Nov 2017, at 08:40, khers <s3m2e1.6s...@gmail.com> wrote:
>
> Dear Andrew
>
> Sorry for my delay, I get last revision of master (commit :
> e695cb4dbdb6f9424ac5a567799e67f791fad328 ), and
> segfault did not occur with the same environment and test scenario. I will
> try to reproduce the potential bug
> with running test with longer duration and more aggressive scenario.
>
> Regards,
> Khers
>
> On Wed, Oct 25, 2017 at 1:45 PM, Andrew 👽 Yourtchenko <ayour...@gmail.com> wrote:
>
>> Dear Khers,
>>
>> okay, cool!
>> When testing the debug image, you could save the full dump
>> and the .debs for all the artefacts so just in case I could grab the
>> entire set of info and was able to look at it in my environment.
>>
>> Meantime, I had an idea for another potential failure mode, whereby
>> the session would get checked while there is a session being freed,
>> potentially resulting in a reallocation of the free bitmap in the
>> pool.
>>
>> So before the reproduction in the debug build, give a shot to this
>> one-line change
>> in the release build and see if you still can reproduce the crash with
>> it:
>>
>> --- a/src/plugins/acl/fa_node.c
>> +++ b/src/plugins/acl/fa_node.c
>> @@ -609,6 +609,8 @@ acl_fa_verify_init_sessions (acl_main_t * am)
>>      for (wk = 0; wk < vec_len (am->per_worker_data); wk++) {
>>        acl_fa_per_worker_data_t *pw = &am->per_worker_data[wk];
>>        pool_alloc_aligned(pw->fa_sessions_pool, am->fa_conn_table_max_entries, CLIB_CACHE_LINE_BYTES);
>> +      /* preallocate the free bitmap */
>> +      clib_bitmap_validate(pool_header(pw->fa_sessions_pool)->free_bitmap, am->fa_conn_table_max_entries);
>>      }
>>
>> --a
>>
>> On 10/24/17, khers <s3m2e1.6s...@gmail.com> wrote:
>> > Dear Andrew
>> >
>> > I used latest version of master branch, I will replay the test with debug
>> > build to make more debug info ASAP.
>> > Vpp is running on Xeon E5-2600 series.
>> > I did other tests with two rx-queue and two worker, also with 4
>> > rx-queue and 4 worker, I got segmentation fault on the same function.
>> >
>> > I will send more info in few days.
>> >
>> > Regards,
>> > Khers
>> >
>> > On Oct 24, 2017 6:43 PM, "Andrew 👽 Yourtchenko" <ayour...@gmail.com>
>> > wrote:
>> >
>> >> Dear Khers,
>> >>
>> >> Thanks for the info!
>> >>
>> >> I tried with these configs in my local setup (I tried even to increase
>> >> the multi-cpu contention by specifying 4 rx-queues instead of 2), but
>> >> it works ok for me on the master. What is the version you are testing
>> >> with ?
>> >> I presume it is also the master, but just wanted to verify.
>> >>
>> >> To try to get more info about this happening: could you give a shot at
>> >> reproducing this on the debug build ? There are a few asserts that
>> >> would be handy to verify that they do hold true during your tests -
>> >> the location of the crash points to either the pool header being
>> >> corrupted by something (the asserts should catch that) or the pool
>> >> itself reallocated and memory used by something else (which should not
>> >> happen because the memory is preallocated during the initialisation
>> >> time - unless you change the max number of sessions after
>> >> initialisation).
>> >>
>> >> Also, could you tell a bit more about the hardware you are testing
>> >> with ? (cat /proc/cpuinfo)
>> >>
>> >> --a
>> >>
>> >> On 10/24/17, khers <s3m2e1.6s...@gmail.com> wrote:
>> >> > Dear Andrew
>> >> >
>> >> > Thanks for your attention.
>> >> > Trex config file <https://paste.ubuntu.com/25807801/>
>> >> > Trex scenario is default sfr.yaml.
>> >> > vpp: startup.conf <https://paste.ubuntu.com/25807840/>
>> >> > I changed size of acl_mheap to '(uword)2<<32' in acl.c
>> >> > vpp config:
>> >> > vppctl set interface l2 bridge TenGigabitEthernet86/0/0 1
>> >> > vppctl set interface l2 bridge TenGigabitEthernet86/0/1 1
>> >> >
>> >> > vppctl set int state TenGigabitEthernet86/0/0 up
>> >> > vppctl set int state TenGigabitEthernet86/0/1 up
>> >> >
>> >> > vppctl set acl-plugin session table hash-table-buckets 1000000
>> >> > vppctl set acl-plugin session table hash-table-memory 2147483648
>> >> >
>> >> > vppctl set acl-plugin session timeout udp idle 5
>> >> > vppctl set acl-plugin session timeout tcp idle 10
>> >> > vppctl set acl-plugin session timeout tcp transient 5
>> >> >
>> >> > Regards,
>> >> > Khers
>> >> >
>> >> >
>> >> > On Mon, Oct 23, 2017 at 7:52 PM, Andrew 👽 Yourtchenko <ayour...@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> could you share the exact TRex and VPP config files, so I could
>> >> >> recreate it locally to investigate further ?
>> >> >>
>> >> >> Thanks a lot!
>> >> >>
>> >> >> --a
>> >> >>
>> >> >> On 10/23/17, khers <s3m2e1.6s...@gmail.com> wrote:
>> >> >> > Dear folks
>> >> >> >
>> >> >> > I have bridged two interfaces and set permit+reflect acl on the input of
>> >> >> > interface one and deny rule on output of same interface as follow:
>> >> >> >
>> >> >> > acl_add_replace permit+reflect
>> >> >> > acl_add_replace deny
>> >> >> >
>> >> >> > acl_interface_add_del sw_if_index 1 add input acl 0
>> >> >> > acl_interface_add_del sw_if_index 1 add output acl 1
>> >> >> >
>> >> >> >
>> >> >> > after about 100 seconds of running Trex with sfr scenario I got sigsegv.
>> >> >> > this is gdb's backtrace <https://pastebin.com/VvZ9Z3Nf>.
>> >> >> >
>> >> >> > Trex :
>> >> >> > ./t-rex-64 -f cap2/sfr.yaml -m 5 -c 4
>> >> >> >
>> >> >> >
>> >> >> > Regards,
>> >> >> > Khers
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev