On Wed, Apr 21, 2021 at 09:36:11PM +0200, Alexander Bluhm wrote:
> Hi,
> 
> For a while now we have been running the network stack without the
> kernel lock, but with a network lock.  The latter is an exclusive
> sleeping rwlock.
> 
> It is possible to run the forwarding path in parallel on multiple
> cores.  I use ix(4) interfaces which provide one input queue for
> each CPU.  For that we have to start multiple softnet tasks and
> replace the exclusive lock with a shared lock.  This works for IP
> and IPv6 input and forwarding, but not for higher protocols.
> 
> So I implement a queue between IP and higher layers.  We had that
> before when we were using netlock for IP and kernel lock for TCP.
> Now we have shared lock for IP and exclusive lock for TCP.  By using
> a queue, we can upgrade the lock once for multiple packets.
> 
> As you can see here, forwarding performance doubles from 4.5x10^9
> to 9x10^9.  Left column is current, right column is with my diff.
> The other dots at 2x10^9 are with socket splicing which is not
> affected.
> http://bluhm.genua.de/perform/results/2021-04-21T10%3A50%3A37Z/gnuplot/forward.png
> 
> Here are all numbers with various network tests.
> http://bluhm.genua.de/perform/results/2021-04-21T10%3A50%3A37Z/perform.html
> TCP performance gets less deterministic due to the additional queue.
> 
> The kernel stack flame graph looks like this.  The machine uses 4 CPUs.
> http://bluhm.genua.de/files/kstack-multiqueue-forward.svg
> 
> Note the kernel lock around nd6_resolve().  I had to put it there
> as I have seen an MP related crash there.  This can be fixed
> independently of this diff.
> 
> We need more MP pressure to find such bugs and races.  I think now
> is a good time to give this diff broader testing and commit it.
> You need interfaces with multiple queues to see a difference.
> 
> ok?
> 
> bluhm
>

Hi.

Did you test your diff with ipsec(4) enabled? I'm asking because we
have this in net/pfkeyv2.c:

pfkeyv2_send(struct socket *so, void *message, int len)
{
    ....
                    ipsec_in_use++;
                    /*
                     * XXXSMP IPsec data structures are not ready to be
                     * accessed by multiple Network threads in parallel,
                     * so force all packets to be processed by the first
                     * one.
                     */
                    extern int nettaskqs;
                    nettaskqs = 1;
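
To illustrate why this matters for a diff that starts multiple softnet
tasks, here is a minimal user-space sketch. It assumes the softnet task
is picked by interface index modulo nettaskqs. The helpers
pick_softnet(), ipsec_enable() and demo() are hypothetical names for
illustration, not kernel code.

```c
#include <assert.h>

/* Hypothetical model, not kernel source: number of softnet task queues. */
static unsigned int nettaskqs = 4;

/* Assumed rule: pick a softnet task by interface index modulo nettaskqs. */
unsigned int
pick_softnet(unsigned int ifindex)
{
	return (ifindex % nettaskqs);
}

/* Mimics the side effect of the pfkeyv2_send() excerpt quoted above. */
void
ipsec_enable(void)
{
	nettaskqs = 1;
}

/* Small self-check showing the before/after behaviour. */
void
demo(void)
{
	assert(pick_softnet(5) == 1);	/* 5 % 4: work is spread over four tasks */
	ipsec_enable();
	assert(pick_softnet(5) == 0);	/* everything lands on the first task */
}
```

Under that assumption, once IPsec is in use every interface maps to the
first task again, so parallel forwarding would silently fall back to a
single thread.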
