Re: [vpp-dev] VPP and RCU?

2018-10-31 Thread Florin Coras


> On Oct 31, 2018, at 4:03 PM, Stephen Hemminger  wrote:
> 
> On Wed, 31 Oct 2018 00:24:36 -0700
> Florin Coras  wrote:
> 
>> No, reader-writer locks are hundreds of times slower. In fact, reader-writer
>> locks are slower than a normal spin lock.
> 
> I guess you meant that in general, and I can see how for scenarios with
> multiple writers and readers performance can be bad. But from your original
> message I assumed you’re mostly interested in concurrent read performance
> with few writes. For such scenarios I would expect our current, simple spin
> and rw lock implementations to not be that bad. If that’s provably not the
> case, we should definitely consider doing RCU.
> 
> Also worth noting that a common pattern in vpp is to have per-thread data
> structures and then entirely avoid locking. For lookups we typically use the
> bihash and that is thread safe.
>>> When you say 'per thread data structures', does it mean the data structures 
>>> will be duplicated for each data plane thread?  
>> 
>> No, we don’t duplicate the data. Instead, we rely on RSS hashing to pin 
>> flows to workers and then build per worker state. 
>> 
>> For scenarios when that doesn’t work, we handoff flows between workers. 
> 
> Ok, the tradeoff is that having a single worker is a bottleneck, and if a
> packet arrives on one core and is then processed on another core there is a
> cache miss.

Handoffs are not the common case. For things like L2, IP, tunneling protocols,
and connection-oriented transports like TCP, as long as the interface supports
RSS hashing, flows are uniformly distributed (statistically) across the
workers. So, from a pure forwarding perspective, we do get horizontal scaling
because we don’t need any inter-worker locking.

On the other hand, if the forwarding state needs to be updated frequently by
either a “control plane” or an “in-band” protocol, then you’re right to point
out that in some cases RCU is probably superior to what we currently use. The
bihash, and APIs that are mp-safe (e.g., ip route addition), would be some
counterexamples.

> Per the original discussion, a reader/writer lock, even uncontended, requires
> an atomic increment, and that increment is a locked instruction and a cache
> miss with multiple readers.
> 
> RCU has zero overhead for readers. The problem is pushed to the writer to
> deal with. It works fine for data structures like lists or trees that can be
> updated with a specific access pattern such that the reader always sees valid
> data.

Agreed with both points. 

> 
> The case I was thinking of is things like flow and routing tables.

For the former, you may be able to use the bihash since it uses a more 
sophisticated rw-lock scheme than the simple rw-locks from vppinfra. As for the 
latter, I see your point if the goal is to program the table from any worker. 

Florin


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11061): https://lists.fd.io/g/vpp-dev/message/11061
Mute This Topic: https://lists.fd.io/mt/27785182/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] VPP and RCU?

2018-10-31 Thread Stephen Hemminger
On Wed, 31 Oct 2018 00:24:36 -0700
Florin Coras  wrote:

> >>> No, reader-writer locks are hundreds of times slower. In fact,
> >>> reader-writer locks are slower than a normal spin lock.
> >>
> >> I guess you meant that in general, and I can see how for scenarios with
> >> multiple writers and readers performance can be bad. But from your original
> >> message I assumed you’re mostly interested in concurrent read performance
> >> with few writes. For such scenarios I would expect our current, simple spin
> >> and rw lock implementations to not be that bad. If that’s provably not the
> >> case, we should definitely consider doing RCU.
> >>
> >> Also worth noting that a common pattern in vpp is to have per-thread data
> >> structures and then entirely avoid locking. For lookups we typically use
> >> the bihash and that is thread safe.
> > When you say 'per thread data structures', does it mean the data structures 
> > will be duplicated for each data plane thread?  
> 
> No, we don’t duplicate the data. Instead, we rely on RSS hashing to pin flows 
> to workers and then build per worker state. 
> 
> For scenarios when that doesn’t work, we handoff flows between workers. 

Ok, the tradeoff is that having a single worker is a bottleneck, and if a
packet arrives on one core and is then processed on another core there is a
cache miss.

Per the original discussion, a reader/writer lock, even uncontended, requires
an atomic increment, and that increment is a locked instruction and a cache
miss with multiple readers.

RCU has zero overhead for readers. The problem is pushed to the writer to
deal with. It works fine for data structures like lists or trees that can be
updated with a specific access pattern such that the reader always sees valid
data.

The case I was thinking of is things like flow and routing tables.
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11059): https://lists.fd.io/g/vpp-dev/message/11059
Mute This Topic: https://lists.fd.io/mt/27785182/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] VPP and RCU?

2018-10-31 Thread Florin Coras


> On Oct 30, 2018, at 9:02 PM, Honnappa Nagarahalli  wrote:
> 
> 
> 
 
> Hi Stephen,
> 
> No, we don’t support RCU. Wouldn’t rw-locks be enough to support your
>> usecases?
> 
> Florin
> 
>> On Oct 29, 2018, at 12:40 PM, Stephen Hemminger  wrote:
>> 
>> Is it possible to do Read Copy Update with VPP? Either using
>> Userspace RCU (https://librcu.org) or manually. RCU is very
>> efficient way to handle read mostly tables and other dynamic cases such
>> as plugins.
>> 
>> The several things that are needed are non-preempt, atomic update
>> of a pointer and a mechanism to be sure all active threads have
>> gone through a quiescent period. I don't think VPP will preempt
>> one node for another so that is done. The atomic update is
>> relatively easy with basic barriers, either from FD.IO, DPDK, or native
>> compiler operations. But is there an API to have a quiescent period marker in
>> the main VPP vector scheduler?
>> 
>> Something like the QSBR model of Userspace RCU library.
>> (each thread calls rcu_quiescent_state() periodically) would be
>> ideal.
>> 
> 
 
>>> No, reader-writer locks are hundreds of times slower. In fact,
>>> reader-writer locks are slower than a normal spin lock.
>>
>> I guess you meant that in general, and I can see how for scenarios with
>> multiple writers and readers performance can be bad. But from your original
>> message I assumed you’re mostly interested in concurrent read performance
>> with few writes. For such scenarios I would expect our current, simple spin
>> and rw lock implementations to not be that bad. If that’s provably not the
>> case, we should definitely consider doing RCU.
>>
>> Also worth noting that a common pattern in vpp is to have per-thread data
>> structures and then entirely avoid locking. For lookups we typically use the
>> bihash and that is thread safe.
>
> When you say 'per thread data structures', does it mean the data structures
> will be duplicated for each data plane thread?

No, we don’t duplicate the data. Instead, we rely on RSS hashing to pin flows 
to workers and then build per worker state. 

For scenarios when that doesn’t work, we handoff flows between workers. 

Florin   

> 
>>> 
>>> Florin
>>> 
>>> 
>> 
>> https://www.researchgate.net/publication/247337469_RCU_vs_Locking_Performance_on_Different_CPUs
>> 
>> https://lwn.net/Articles/263130/ 
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11047): https://lists.fd.io/g/vpp-dev/message/11047
Mute This Topic: https://lists.fd.io/mt/27785182/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] VPP and RCU?

2018-10-30 Thread Honnappa Nagarahalli


> > >
> > >> Hi Stephen,
> > >>
> > >> No, we don’t support RCU. Wouldn’t rw-locks be enough to support your
> > >> usecases?
> > >>
> > >> Florin
> > >>
> > >>> On Oct 29, 2018, at 12:40 PM, Stephen Hemminger  wrote:
> > >>>
> > >>> Is it possible to do Read Copy Update with VPP? Either using
> > >>> Userspace RCU (https://librcu.org) or manually. RCU is a very
> > >>> efficient way to handle read-mostly tables and other dynamic cases
> > >>> such as plugins.
> > >>>
> > >>> The several things that are needed are non-preemption, atomic update
> > >>> of a pointer, and a mechanism to be sure all active threads have
> > >>> gone through a quiescent period. I don't think VPP will preempt
> > >>> one node for another, so that is done. The atomic update is
> > >>> relatively easy with basic barriers, either from FD.IO, DPDK, or
> > >>> native compiler operations. But is there an API to have a quiescent
> > >>> period marker in the main VPP vector scheduler?
> > >>>
> > >>> Something like the QSBR model of the Userspace RCU library
> > >>> (each thread calls rcu_quiescent_state() periodically) would be
> > >>> ideal.
> > >>
> > >
> > > No, reader-writer locks are hundreds of times slower. In fact,
> > > reader-writer locks are slower than a normal spin lock.
> > >
> >
> > I guess you meant that in general, and I can see how for scenarios with
> > multiple writers and readers performance can be bad. But from your original
> > message I assumed you’re mostly interested in concurrent read performance
> > with few writes. For such scenarios I would expect our current, simple spin
> > and rw lock implementations to not be that bad. If that’s provably not the
> > case, we should definitely consider doing RCU.
> >
> > Also worth noting that a common pattern in vpp is to have per-thread data
> > structures and then entirely avoid locking. For lookups we typically use
> > the bihash and that is thread safe.
When you say 'per thread data structures', does it mean the data structures 
will be duplicated for each data plane thread?

> >
> > Florin
> >
> >
> 
> https://www.researchgate.net/publication/247337469_RCU_vs_Locking_Performance_on_Different_CPUs
> 
> https://lwn.net/Articles/263130/
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11045): https://lists.fd.io/g/vpp-dev/message/11045
Mute This Topic: https://lists.fd.io/mt/27785182/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] VPP and RCU?

2018-10-30 Thread Stephen Hemminger
On Mon, 29 Oct 2018 14:58:07 -0700
Florin Coras  wrote:

> > On Oct 29, 2018, at 1:42 PM, Stephen Hemminger  wrote:
> > 
> > On Mon, 29 Oct 2018 13:20:27 -0700
> > Florin Coras  wrote:
> >   
> >> Hi Stephen, 
> >> 
> >> No, we don’t support RCU. Wouldn’t rw-locks be enough to support your 
> >> usecases?
> >> 
> >> Florin
> >>   
> >>> On Oct 29, 2018, at 12:40 PM, Stephen Hemminger  wrote:
> >>> 
> >>> Is it possible to do Read Copy Update with VPP? Either using Userspace 
> >>> RCU (https://librcu.org)
> >>> or manually. RCU is very efficient way to handle read mostly tables and 
> >>> other dynamic cases
> >>> such as plugins.
> >>> 
> >>> The several things that are needed are non-preempt, atomic update of a 
> >>> pointer and a mechanism to be sure
> >>> all active threads have gone through a quiescent period. I don't think 
> >>> VPP will preempt one node
> >>> for another so that is done. The atomic update is relatively
> >>> easy with basic barriers, either from FD.IO, DPDK, or native compiler 
> >>> operations. But
> >>> is there an API to have a quiescent period marker in the main VPP vector 
> >>> scheduler?
> >>> 
> >>> Something like the QSBR model of Userspace RCU library.
> >>> (each thread calls rcu_quiescent_state() periodically)
> >>> would be ideal.
> >>> 
> >>   
> > 
> > No, reader-writer locks are hundreds of times slower. In fact, reader-writer
> > locks are slower than a normal spin lock.
> >   
> 
> I guess you meant that in general, and I can see how for scenarios with 
> multiple writers and readers performance can be bad. But from your original 
> message I assumed you’re mostly interested in concurrent read performance 
> with few writes. For such scenarios I would expect our current, simple, spin 
> and rw lock implementations to not be that bad. If that’s provably not the 
> case, we should definitely consider doing RCU. 
> 
> Also worth noting that a common pattern in vpp is to have per thread data 
> structures and then entirely avoid locking. For lookups we typically use the 
> bihash and that is thread safe.  
> 
> Florin
> 
>

https://www.researchgate.net/publication/247337469_RCU_vs_Locking_Performance_on_Different_CPUs

https://lwn.net/Articles/263130/
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11038): https://lists.fd.io/g/vpp-dev/message/11038
Mute This Topic: https://lists.fd.io/mt/27785182/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] VPP and RCU?

2018-10-29 Thread Florin Coras


> On Oct 29, 2018, at 1:42 PM, Stephen Hemminger  wrote:
> 
> On Mon, 29 Oct 2018 13:20:27 -0700
> Florin Coras  wrote:
> 
>> Hi Stephen, 
>> 
>> No, we don’t support RCU. Wouldn’t rw-locks be enough to support your 
>> usecases?
>> 
>> Florin
>> 
>>> On Oct 29, 2018, at 12:40 PM, Stephen Hemminger  wrote:
>>> 
>>> Is it possible to do Read Copy Update with VPP? Either using Userspace RCU 
>>> (https://librcu.org)
>>> or manually. RCU is very efficient way to handle read mostly tables and 
>>> other dynamic cases
>>> such as plugins.
>>> 
>>> The several things that are needed are non-preempt, atomic update of a 
>>> pointer and a mechanism to be sure
>>> all active threads have gone through a quiescent period. I don't think VPP 
>>> will preempt one node
>>> for another so that is done. The atomic update is relatively
>>> easy with basic barriers, either from FD.IO, DPDK, or native compiler 
>>> operations. But
>>> is there an API to have a quiescent period marker in the main VPP vector 
>>> scheduler?
>>> 
>>> Something like the QSBR model of Userspace RCU library.
>>> (each thread calls rcu_quiescent_state() periodically)
>>> would be ideal.
>>> 
>> 
> 
> No, reader-writer locks are hundreds of times slower. In fact, reader-writer
> locks are slower than a normal spin lock.
> 

I guess you meant that in general, and I can see how for scenarios with 
multiple writers and readers performance can be bad. But from your original 
message I assumed you’re mostly interested in concurrent read performance with 
few writes. For such scenarios I would expect our current, simple spin and rw 
lock implementations to not be that bad. If that’s provably not the case, we 
should definitely consider doing RCU. 

Also worth noting that a common pattern in vpp is to have per thread data 
structures and then entirely avoid locking. For lookups we typically use the 
bihash and that is thread safe.  

Florin


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11027): https://lists.fd.io/g/vpp-dev/message/11027
Mute This Topic: https://lists.fd.io/mt/27785182/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] VPP and RCU?

2018-10-29 Thread Stephen Hemminger
On Mon, 29 Oct 2018 13:20:27 -0700
Florin Coras  wrote:

> Hi Stephen, 
> 
> No, we don’t support RCU. Wouldn’t rw-locks be enough to support your 
> usecases?
> 
> Florin
> 
> > On Oct 29, 2018, at 12:40 PM, Stephen Hemminger  wrote:
> > 
> > Is it possible to do Read Copy Update with VPP? Either using Userspace RCU 
> > (https://librcu.org)
> > or manually. RCU is very efficient way to handle read mostly tables and 
> > other dynamic cases
> > such as plugins.
> > 
> > The several things that are needed are non-preempt, atomic update of a 
> > pointer and a mechanism to be sure
> > all active threads have gone through a quiescent period. I don't think VPP 
> > will preempt one node
> > for another so that is done. The atomic update is relatively
> > easy with basic barriers, either from FD.IO, DPDK, or native compiler 
> > operations. But
> > is there an API to have a quiescent period marker in the main VPP vector 
> > scheduler?
> > 
> > Something like the QSBR model of Userspace RCU library.
> > (each thread calls rcu_quiescent_state() periodically)
> > would be ideal.
> > 
> 

No, reader-writer locks are hundreds of times slower. In fact, reader-writer
locks are slower than a normal spin lock.

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11026): https://lists.fd.io/g/vpp-dev/message/11026
Mute This Topic: https://lists.fd.io/mt/27785182/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] VPP and RCU?

2018-10-29 Thread Florin Coras
Hi Stephen, 

No, we don’t support RCU. Wouldn’t rw-locks be enough to support your usecases?

Florin

> On Oct 29, 2018, at 12:40 PM, Stephen Hemminger  wrote:
> 
> Is it possible to do Read Copy Update with VPP? Either using Userspace RCU 
> (https://librcu.org)
> or manually. RCU is very efficient way to handle read mostly tables and other 
> dynamic cases
> such as plugins.
> 
> The several things that are needed are non-preempt, atomic update of a 
> pointer and a mechanism to be sure
> all active threads have gone through a quiescent period. I don't think VPP 
> will preempt one node
> for another so that is done. The atomic update is relatively
> easy with basic barriers, either from FD.IO, DPDK, or native compiler 
> operations. But
> is there an API to have a quiescent period marker in the main VPP vector 
> scheduler?
> 
> Something like the QSBR model of Userspace RCU library.
> (each thread calls rcu_quiescent_state() periodically)
> would be ideal.
> 

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11024): https://lists.fd.io/g/vpp-dev/message/11024
Mute This Topic: https://lists.fd.io/mt/27785182/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] VPP and RCU?

2018-10-29 Thread Stephen Hemminger
Is it possible to do Read Copy Update with VPP? Either using Userspace RCU
(https://librcu.org) or manually. RCU is a very efficient way to handle
read-mostly tables and other dynamic cases such as plugins.

The several things that are needed are non-preemption, atomic update of a
pointer, and a mechanism to be sure all active threads have gone through a
quiescent period. I don't think VPP will preempt one node for another, so that
is done. The atomic update is relatively easy with basic barriers, either from
FD.IO, DPDK, or native compiler operations. But is there an API to have a
quiescent period marker in the main VPP vector scheduler?

Something like the QSBR model of the Userspace RCU library
(each thread calls rcu_quiescent_state() periodically)
would be ideal.

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11023): https://lists.fd.io/g/vpp-dev/message/11023
Mute This Topic: https://lists.fd.io/mt/27785182/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-