Hi Stanislav,

IMHO the ASSERT is a false positive: even though the load-balance has been put 
back in the pool, its memory is still valid and can still be safely retrieved 
from a worker. If the load-balance were re-used before all workers were done 
with it, then we'd have problems, but that doesn't happen given the speed 
mismatch between the workers and the main thread. To guarantee that case, a 
call to vlib_worker_wait_one_loop() in the load-balance code would be 
sufficient.

If you want to stop the asserts firing in a debug image, you can conditionally 
hold the lock around pool_put under #ifdef CLIB_DEBUG.

/neale

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> on behalf of Florin Coras via 
lists.fd.io <fcoras.lists=gmail....@lists.fd.io>
Date: Thursday, 4 November 2021 at 22:19
To: Stanislav Zaikin <zsta...@gmail.com>
Cc: Damjan Marion <dmar...@me.com>, vpp-dev <vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] a free_bitmap may expand while pool_put
Hi Stanislav,

Yes, binary api messages are handled by the main thread. If the issue is purely 
binary api related, then yes, checking for pool (vector or bitmap) realloc 
with a single writer should work.

Note that the pool’s free_indices vector can also expand (not only the bitmap), 
so that should be checked as well.

Regards,
Florin


On Nov 4, 2021, at 1:58 PM, Stanislav Zaikin <zsta...@gmail.com> wrote:

Hi Damjan,
Hi Florin,

I may have given the wrong example: even though the IP APIs are marked as 
mt-safe (VL_API_IP_ROUTE_ADD_DEL/VL_API_IP_ROUTE_ADD_DEL_V2), they'd be called 
only from the main thread, right? So only one pool_put will be called, and it 
looks like the patch actually fixes the problem (I still don't like it).

I liked the idea of a vec of freed elements, but we'd still need to take the 
worker barrier to perform such an operation. I was thinking of something more 
lightweight, like a CAS on a newly allocated bitmap/vector, but I don't feel 
competent enough to write any lock-free solution.


On Thu, 4 Nov 2021 at 20:01, Florin Coras <fcoras.li...@gmail.com> wrote:
Agreed! Since pools are not thread safe, the whole load balance destroy scheme 
should probably be refactored.

Maybe add, with locks, the indices to be freed to a vector and, once enough 
accumulate or a new element is needed, grab the worker barrier and actually 
free them all.

Regards,
Florin


On Nov 4, 2021, at 11:50 AM, Damjan Marion via lists.fd.io 
<dmarion=me....@lists.fd.io> wrote:


Dear Stanislav,

It doesn’t look to me like a thread-safe solution.

i.e. imagine 2 threads call pool_put_will_expand() at the same time when there 
is just one free slot left. Both will get a negative answer, but the 2nd put 
operation will actually expand.

—
Damjan


On 04.11.2021., at 18:24, Stanislav Zaikin <zsta...@gmail.com> wrote:

Hello folks,

In a multi-threaded environment (in my case with 2 workers) I observed a 
crash, and thanks to Neale, it turned out that the free_bitmap may expand 
while doing a pool_put.
Let's say one thread is doing pool_put while another thread is calling 
pool_elt_at_index. I observed different addresses before and after the 
"ASSERT (! pool_is_free (p, _e))" check in that macro.

I prepared a patch [0], but it's kind of ugly. We don't have asserts in release 
mode, so why should we care about it?

On the other hand, 2 different threads can do 2 pool_puts simultaneously, and 
we can lose one free element in the pool (and also the additionally allocated 
bitmap).

For me, it's a place where it would be nice to have an mt-safe vec. What do you 
think?

[0] - https://gerrit.fd.io/r/c/vpp/+/34332

--
Best regards
Stanislav Zaikin

--
Best regards
Stanislav Zaikin

View/Reply Online (#20433): https://lists.fd.io/g/vpp-dev/message/20433