Hi Sharath, 

> On Mar 29, 2019, at 8:47 PM, Sharath Kumar 
> <sharathkumarboyanapa...@gmail.com> wrote:
> 
> Hi Florin, 
> 
> The patch doesn't fix any of the issues with epoll/select.
> 
> The usecase is as follows:
> 1. The main thread calls epoll_create.
> 2. One of the non-main threads calls epoll_ctl.
> 3. Another non-main thread calls epoll_wait.
> 
> All the 3 threads above operate on a single epoll fd.
> 
> Is this usecase supported? 

No, that usecase is not supported. 

> 
> How to register these non-main threads [2 and 3] as workers with vcl?
> I am a newbie to VPP, I have no idea about this. Can you give me some input 
> on this?
> 
> Would registering these non-main threads [2,3] as workers with vcl resolve my 
> problem? 
> Did you mean LDP doesn't support this kind of registration? 

There are several layers that help integrate applications with vpp’s host 
stack:
- VCL (VPPCOM library): it facilitates the interaction with the session layer 
in vpp by exposing a set of apis that are similar to POSIX socket apis. That 
means applications don’t have to interact with vpp’s binary api, don’t have to 
directly work with shared memory fifos and, more importantly, they get implicit 
support for async communication mechanisms like epoll/select. For performance 
reasons, VCL avoids locking as much as possible. As a result, it doesn’t allow 
sharing of sessions (or session handles/fds from the app's perspective) between 
app threads or processes (in case the apps fork). However, if they need more 
workers for performance reasons, applications can register their worker threads 
with vcl (see vppcom_worker_register). Sessions cannot be shared between 
workers, but each worker can have its own epoll/select loop. 
- VLS (VCL locked sessions): as the name suggests, it employs a set of locks 
that allow: 1) multi-threaded apps that have one dispatcher thread and N 
‘worker’ threads to transparently work with vcl. In this scenario, vcl “sees” 
only one worker. The expectation is that only the dispatcher thread (main 
thread) interacts with epoll/select. 2) multi-process apps to work with vcl, 
but for that it employs additional logic when applications fork. Every 
child/forked process is registered with vcl by vls, so vcl sees more workers. 
- LDP (LD_PRELOAD): this is a shim that intercepts network related syscalls and 
redirects them into vls. Its goal is to have applications work unchanged with 
vpp's host stack. Since there are no POSIX apis for registering workers with 
the kernel, ldp cannot register app workers with vls/vcl. 

As far as I can tell, you’re running ldp through dmm. Thus, to support your 
usecase, you’d probably have to change your app to directly work with vls or 
vcl. 
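
In case it helps, below is a minimal sketch of that last option: a 
multi-threaded app that talks to vcl directly, where every pthread registers 
itself as a worker and runs its own epoll loop. Treat it as an illustration 
only: the thread/function names are made up, error handling is dropped, and 
the exact vppcom_* signatures (in particular the wait-time argument of 
vppcom_epoll_wait) should be checked against vcl/vppcom.h for the version 
you're building against.

/* Sketch only: every app pthread registers itself as a vcl worker and
 * runs its own epoll loop on sessions it created itself. Sessions are
 * never shared between threads. Error handling omitted for brevity. */
#include <pthread.h>
#include <string.h>
#include <sys/epoll.h>
#include <vcl/vppcom.h>

static void *
worker_thread_fn (void *arg)
{
  /* Register this pthread as a vcl worker. The thread that called
   * vppcom_app_create is registered implicitly. */
  if (vppcom_worker_register () != VPPCOM_OK)
    return 0;

  /* Each worker gets its own epoll handle ... */
  int epfd = vppcom_epoll_create ();

  /* ... and its own sessions, e.g. a non-blocking tcp session that it
   * binds/listens or connects with the vppcom_session_* apis (omitted). */
  int sh = vppcom_session_create (VPPCOM_PROTO_TCP, 1 /* nonblocking */);

  struct epoll_event ev;
  memset (&ev, 0, sizeof (ev));
  ev.events = EPOLLIN;
  ev.data.u32 = sh;
  vppcom_epoll_ctl (epfd, EPOLL_CTL_ADD, sh, &ev);

  struct epoll_event events[16];
  while (1)
    {
      /* Last arg is the wait time; in vppcom.h it is a double, check the
       * units for your version. */
      int n = vppcom_epoll_wait (epfd, events, 16, 1.0);
      for (int i = 0; i < n; i++)
        {
          /* do io on events[i].data.u32 with vppcom_session_read/write */
        }
    }
  return 0;
}

int
main (void)
{
  vppcom_app_create ("multi-worker-app");

  pthread_t workers[2];
  for (int i = 0; i < 2; i++)
    pthread_create (&workers[i], 0, worker_thread_fn, 0);
  for (int i = 0; i < 2; i++)
    pthread_join (workers[i], 0);

  vppcom_app_destroy ();
  return 0;
}

The key point is that each thread owns its epoll fd and its sessions end to 
end; nothing crosses threads, which is exactly the constraint vcl imposes. 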

Hope this helps, 
Florin

> 
> Thanks, 
> Sharath. 
> 
> 
> On Sat 30 Mar, 2019, 1:27 AM Florin Coras <fcoras.li...@gmail.com> wrote:
> Just so I understand: does the patch not fix the epoll issues, or does it fix 
> those but not select, which apparently crashes in a different way? 
> 
> Second, what is your usecase/app? Are you actually trying to share 
> epoll/select between multiple threads? That is, multiple threads might want 
> to call epoll_wait/select at the same time? That is not supported. The 
> implicit assumption is that only the dispatcher thread calls those two 
> functions; the rest of the threads only do io work. 
> 
> If all the threads must handle async communication via epoll/select, then 
> they should register themselves as workers with vcl and get their own epoll 
> fd. LDP does not support that. 
> 
> Florin
> 
>> On Mar 29, 2019, at 12:13 PM, Sharath Kumar 
>> <sharathkumarboyanapa...@gmail.com> wrote:
>> 
>> No, it doesn't work. 
>> 
>> Attaching the applications being used.
>> 
>> "Select" also has similar kind of issue when called from non-main thread
>> 
>> Thread 9 "nstack_select" received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7fffd77fe700 (LWP 63170)]
>> 0x00007ffff4e1d032 in ldp_select_init_maps (original=0x7fffbc0008c0, 
>> resultb=0x7fffe002e514, libcb=0x7fffe002e544, vclb=0x7fffe002e52c, nfds=34, 
>> minbits=64, n_bytes=5, si_bits=0x7fffd77fdc20, 
>>     libc_bits=0x7fffd77fdc28) at 
>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:601
>> 601    clib_bitmap_validate (*vclb, minbits);
>> (gdb) bt
>> #0  0x00007ffff4e1d032 in ldp_select_init_maps (original=0x7fffbc0008c0, 
>> resultb=0x7fffe002e514, libcb=0x7fffe002e544, vclb=0x7fffe002e52c, nfds=34, 
>> minbits=64, n_bytes=5, si_bits=0x7fffd77fdc20, 
>>     libc_bits=0x7fffd77fdc28) at 
>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:601
>> #1  0x00007ffff4e1db47 in ldp_pselect (nfds=34, readfds=0x7fffbc0008c0, 
>> writefds=0x7fffbc000cd0, exceptfds=0x7fffbc0010e0, timeout=0x7fffd77fdcb0, 
>> sigmask=0x0)
>>     at 
>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:723
>> #2  0x00007ffff4e1e5d5 in select (nfds=34, readfds=0x7fffbc0008c0, 
>> writefds=0x7fffbc000cd0, exceptfds=0x7fffbc0010e0, timeout=0x7fffd77fdd20)
>>     at 
>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:857
>> #3  0x00007ffff7b4c42a in nstack_select_thread (arg=0x0) at 
>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/src/nSocket/nstack/event/select/nstack_select.c:651
>> #4  0x00007ffff78ed6ba in start_thread (arg=0x7fffd77fe700) at 
>> pthread_create.c:333
>> #5  0x00007ffff741b41d in clone () at 
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>> 
>> 
>> Before https://gerrit.fd.io/r/#/c/18597/, I had tried to fix the issue 
>> myself.
>> 
>> The changes below fixed the epoll_wait and epoll_ctl issues for me [they do 
>> not include the changes from https://gerrit.fd.io/r/#/c/18597/]:
>> 
>> diff --git a/src/vcl/vcl_locked.c b/src/vcl/vcl_locked.c
>> index fb19b5d..e6c891b 100644
>> --- a/src/vcl/vcl_locked.c
>> +++ b/src/vcl/vcl_locked.c
>> @@ -564,7 +564,10 @@ vls_attr (vls_handle_t vlsh, uint32_t op, void *buffer, 
>> uint32_t * buflen)
>>  
>>    if (!(vls = vls_get_w_dlock (vlsh)))
>>      return VPPCOM_EBADFD;
>> +
>> +  vls_mt_guard (0, VLS_MT_OP_XPOLL);
>>    rv = vppcom_session_attr (vls_to_sh_tu (vls), op, buffer, buflen);
>> +  vls_mt_unguard ();
>>    vls_get_and_unlock (vlsh);
>>    return rv;
>>  }
>> @@ -773,8 +776,10 @@ vls_epoll_ctl (vls_handle_t ep_vlsh, int op, 
>> vls_handle_t vlsh,
>>    vls_table_rlock ();
>>    ep_vls = vls_get_and_lock (ep_vlsh);
>>    vls = vls_get_and_lock (vlsh);
>> +  vls_mt_guard (0, VLS_MT_OP_XPOLL);
>>    ep_sh = vls_to_sh (ep_vls);
>>    sh = vls_to_sh (vls);
>> +  vls_mt_unguard ();
>>  
>>    if (PREDICT_FALSE (!vlsl->epoll_mp_check))
>>      vls_epoll_ctl_mp_checks (vls, op);
>> 
>> Thanks,
>> Sharath.
>> 
>> On Fri, Mar 29, 2019 at 9:15 PM Florin Coras <fcoras.li...@gmail.com> wrote:
>> Interesting. What application are you running and does this [1] fix the 
>> issue for you?
>> 
>> In short, many of vls’ apis check if the call is coming in on a new pthread 
>> and, if so, program vcl accordingly. The patch makes sure vls_attr does that 
>> as well.
>> 
>> Thanks, 
>> Florin
>> 
>> [1] https://gerrit.fd.io/r/#/c/18597/
>> 
>>> On Mar 29, 2019, at 4:29 AM, Dave Barach via Lists.Fd.Io 
>>> <dbarach=cisco....@lists.fd.io> wrote:
>>> 
>>> For whatever reason, the vls layer received an event notification which 
>>> didn’t end well. vcl_worker_get (wrk_index=4294967295) [aka 0xFFFFFFFF] 
>>> will never work.
>>>  
>>> I’ll let Florin comment further. He’s in the PDT time zone, so don’t expect 
>>> to hear from him for a few hours.
>>>  
>>> D.
>>>  
>>> From: vpp-dev@lists.fd.io On Behalf Of sharath kumar
>>> Sent: Friday, March 29, 2019 12:18 AM
>>> To: vpp-dev@lists.fd.io; csit-...@lists.fd.io
>>> Subject: [vpp-dev] multi-threaded application, "epoll_wait" and "epoll_ctl" 
>>> have "received signal SIGABRT, Aborted".
>>>  
>>> Hello all,
>>>  
>>> I am a newbie to VPP.
>>>  
>>> I am trying to run VPP with a multi-threaded application.
>>> "recv" works fine from non-main threads,
>>> whereas "epoll_wait" and "epoll_ctl" have "received signal SIGABRT, 
>>> Aborted".
>>>  
>>> Is this a known issue?
>>> Or am I doing something wrong?
>>>  
>>> Attaching backtraces for "epoll_wait" and "epoll_ctl":
>>>  
>>> Thread 9 "dmm_vcl_epoll" received signal SIGABRT, Aborted.
>>> [Switching to Thread 0x7fffd67fe700 (LWP 56234)]
>>> 0x00007ffff7349428 in __GI_raise (sig=sig@entry=6) at 
>>> ../sysdeps/unix/sysv/linux/raise.c:54
>>> 54          ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>>> (gdb) bt
>>> #0  0x00007ffff7349428 in __GI_raise (sig=sig@entry=6) at 
>>> ../sysdeps/unix/sysv/linux/raise.c:54
>>> #1  0x00007ffff734b02a in __GI_abort () at abort.c:89
>>> #2  0x00007ffff496d873 in os_panic () at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vppinfra/unix-misc.c:176
>>> #3  0x00007ffff48ce42c in debugger () at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vppinfra/error.c:84
>>> #4  0x00007ffff48ce864 in _clib_error (how_to_die=2, function_name=0x0, 
>>> line_number=0, fmt=0x7ffff4bfe0e0 "%s:%d (%s) assertion `%s' fails")
>>>     at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vppinfra/error.c:143
>>> #5  0x00007ffff4bcca7d in vcl_worker_get (wrk_index=4294967295) at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vcl_private.h:540
>>> #6  0x00007ffff4bccabe in vcl_worker_get_current () at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vcl_private.h:554
>>> #7  0x00007ffff4bd7c49 in vppcom_session_attr (session_handle=4278190080, 
>>> op=6, buffer=0x0, buflen=0x0) at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vppcom.c:2606
>>> #8  0x00007ffff4bfc7fd in vls_attr (vlsh=0, op=6, buffer=0x0, buflen=0x0) 
>>> at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vcl_locked.c:569
>>> #9  0x00007ffff4e21736 in ldp_epoll_pwait (epfd=32, events=0x7fffd67fad20, 
>>> maxevents=1024, timeout=100, sigmask=0x0) at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:2203
>>> #10 0x00007ffff4e21948 in epoll_wait (epfd=32, events=0x7fffd67fad20, 
>>> maxevents=1024, timeout=100) at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:2257
>>> #11 0x00007ffff4e13041 in dmm_vcl_epoll_thread (arg=0x0) at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/dmm_vcl_adpt.c:75
>>> #12 0x00007ffff78ed6ba in start_thread (arg=0x7fffd67fe700) at 
>>> pthread_create.c:333
>>> #13 0x00007ffff741b41d in clone () at 
>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>  
>>>  
>>>  
>>>  
>>> Thread 11 "vs_epoll" received signal SIGABRT, Aborted.
>>> 0x00007ffff7349428 in __GI_raise (sig=sig@entry=6) at 
>>> ../sysdeps/unix/sysv/linux/raise.c:54
>>> 54          ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>>> (gdb) bt
>>> #0  0x00007ffff7349428 in __GI_raise (sig=sig@entry=6) at 
>>> ../sysdeps/unix/sysv/linux/raise.c:54
>>> #1  0x00007ffff734b02a in __GI_abort () at abort.c:89
>>> #2  0x00007ffff496d873 in os_panic () at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vppinfra/unix-misc.c:176
>>> #3  0x00007ffff48ce42c in debugger () at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vppinfra/error.c:84
>>> #4  0x00007ffff48ce864 in _clib_error (how_to_die=2, function_name=0x0, 
>>> line_number=0, fmt=0x7ffff4bfe1a0 "%s:%d (%s) assertion `%s' fails")
>>>     at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vppinfra/error.c:143
>>> #5  0x00007ffff4bcca7d in vcl_worker_get (wrk_index=4294967295) at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vcl_private.h:540
>>> #6  0x00007ffff4bccabe in vcl_worker_get_current () at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vcl_private.h:554
>>> #7  0x00007ffff4bd597a in vppcom_epoll_ctl (vep_handle=4278190080, op=1, 
>>> session_handle=4278190082, event=0x7fffd4dfb3b0)
>>>     at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vppcom.c:2152
>>> #8  0x00007ffff4bfd061 in vls_epoll_ctl (ep_vlsh=0, op=1, vlsh=2, 
>>> event=0x7fffd4dfb3b0) at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/vcl_locked.c:787
>>> #9  0x00007ffff4e213b6 in epoll_ctl (epfd=32, op=1, fd=34, 
>>> event=0x7fffd4dfb3b0) at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/ldp.c:2118
>>> #10 0x00007ffff4e12f88 in vpphs_ep_ctl_ops (epFD=-1, proFD=34, ctl_ops=0, 
>>> events=0x7fffd5190078, pdata=0x7fffd53f01d0)
>>>     at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/stacks/vpp/vpp/src/vcl/dmm_vcl_adpt.c:48
>>> #11 0x00007ffff7b4d502 in nsep_epctl_triggle (epi=0x7fffd5190018, 
>>> info=0x7fffd53f01d0, triggle_ops=0) at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/src/nSocket/nstack/event/epoll/nstack_eventpoll.c:134
>>> #12 0x00007ffff7b4de31 in nsep_insert_node (ep=0x7fffd50bffa8, 
>>> event=0x7fffd4dfb5a0, fdInfo=0x7fffd53f01d0)
>>>     at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/src/nSocket/nstack/event/epoll/nstack_eventpoll.c:250
>>> #13 0x00007ffff7b4e480 in nsep_epctl_add (ep=0x7fffd50bffa8, fd=22, 
>>> events=0x7fffd4dfb5a0) at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/src/nSocket/nstack/event/epoll/nstack_eventpoll.c:294
>>> #14 0x00007ffff7b44db0 in nstack_epoll_ctl (epfd=21, op=1, fd=22, 
>>> event=0x7fffd4dfb630) at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/src/nSocket/nstack/nstack_socket.c:2499
>>> #15 0x0000000000401e65 in process_server_msg_thread (pArgv=<optimized out>) 
>>> at 
>>> /home/root1/sharath/2019/vpp_ver/19.04/dmm/app_example/perf-test/multi_tcp_epoll_app_Ser.c:369
>>> #16 0x00007ffff78ed6ba in start_thread (arg=0x7fffd4dff700) at 
>>> pthread_create.c:333
>>> #17 0x00007ffff741b41d in clone () at 
>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>  
>>> Thanks and Regards,
>>> Sharath.
>> 
>> <multi_tcp_epoll_app_Ser.c><multi_tcp_select_app_Ser.c>
> 
