Hi Stanislav, IMHO the ASSERT is a false negative, even though the load-balance has been put in the pool, its memory is still valid and can still be safely retrieved from a worker. If the load-balance was re-used before all workers were done with it, then we’d have problems, but that doesn’t happen given the speed mismatch of the workers and the main thread – to guarantee that case a call to vlib_worker_wait_one_loop() in the load-balance code would be sufficient.
If you want to stop the asserts firing in a debug image you can #ifdef CLIB_DEBUG conditionally hold the lock on pool_put. /neale From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> on behalf of Florin Coras via lists.fd.io <fcoras.lists=gmail....@lists.fd.io> Date: Thursday, 4 November 2021 at 22:19 To: Stanislav Zaikin <zsta...@gmail.com> Cc: Damjan Marion <dmar...@me.com>, vpp-dev <vpp-dev@lists.fd.io> Subject: Re: [vpp-dev] a free_bitmap may expand while pool_put Hi Stanislav, Yes, binary api messages are handled by main thread. If the issue is purely binary api related, then yes, checking for pool (vectors or bitmap) realloc with single writer should work. Note that the pool’s free_indices vector can also expand (not only the bitmap), so that should be checked as well. Regards, Florin On Nov 4, 2021, at 1:58 PM, Stanislav Zaikin <zsta...@gmail.com<mailto:zsta...@gmail.com>> wrote: Hi Damjan, Hi Florin, I may have given the wrong example, even though the IP API's marked as mt-safe (VL_API_IP_ROUTE_ADD_DEL/VL_API_IP_ROUTE_ADD_DEL_V2), they'd be called only from the main thread, right? So, only one pool_put will be called and it looks like the patch actually fixes the problem (I still don't like it). I liked the idea with a vec of freed elements, but we still need to put a worker barrier to perform such an operation. I thought about something more lightweight, like CAS for a newly allocated bitmap/vector. But I find myself extremely incompetent to write any lock-free solutions. On Thu, 4 Nov 2021 at 20:01, Florin Coras <fcoras.li...@gmail.com<mailto:fcoras.li...@gmail.com>> wrote: Agreed! Since pools are not thread safe, the whole load balance destroy scheme should probably be refactored. Maybe add, with locks, the indices to be freed to a vector and once enough accumulate, or if a new element is needed, grab the worker barrier and actually free them all. Regards, Florin On Nov 4, 2021, at 11:50 AM, Damjan Marion via lists.fd.io<http://lists.fd.io/> <dmarion=me....@lists.fd.io<mailto:dmarion=me....@lists.fd.io>> wrote: Dear Stanislav, It doesn’t look to me as a thread safe solution. i.e imagine 2 threads call pool_put_will expand on the same time, and there is just one free slot. Both will get negative answer, but 2nd put operation will actually expand. — Damjan On 04.11.2021., at 18:24, Stanislav Zaikin <zsta...@gmail.com<mailto:zsta...@gmail.com>> wrote: Hello folks, In a multi-threaded environment (in my case I have 2 workers) I observed a crash, and thanks to Neale, it turned out that free_bitmap may expand while doing pool_put. Let's say one thread is doing pool_put, while another thread is calling "pool_elt_at_index". I observed different addresses before and after checking "ASSERT (! pool_is_free (p, _e))" in that macro. I prepared a patch [0], but it's kind of ugly. We don't have asserts in release mode, so why should we care about it? On the other hand, 2 different threads can do 2 pool_puts simultaneously and we can lose one free element in the pool (and also additionally allocated bitmap). For me, it's a place where it would be nice to have an mt-safe vec. What do you think? [0] - https://gerrit.fd.io/r/c/vpp/+/34332 -- Best regards Stanislav Zaikin -- Best regards Stanislav Zaikin
-=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#20433): https://lists.fd.io/g/vpp-dev/message/20433 Mute This Topic: https://lists.fd.io/mt/86821639/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-