On 14/04/20(Tue) 10:47, Claudio Jeker wrote:
> On Tue, Apr 14, 2020 at 10:08:54AM +0200, Martin Pieuchot wrote:
> > Thanks for all the inputs, updated diff below.
> > 
> > On 13/04/20(Mon) 03:20, Alexandr Nedvedicky wrote:
> > > On Sun, Apr 12, 2020 at 07:02:43PM +0200, Mark Kettenis wrote:
> > > > > From: "Theo de Raadt" <[email protected]>
> > > > > Date: Sun, 12 Apr 2020 10:28:59 -0600
> > > > > 
> > > > > > +     if ((p->p_flag & P_SYSTEM) &&
> > > > > > +         (strncmp(p->p_p->ps_comm, "softnet", 7) == 0))
> > > > > 
> > > > > Wow that is ugly.  
> > > > 
> > > > A better approach might be to store a pointer to the softnet task's
> > > > struct proc in a global variable and check that.  That is what we do
> > > > for the pagedaemon for example.
> > 
> > Diff below implements that by introducing the in_taskq() function, any
> > comment on that approach?
> > 
> > >     I'm not sure same thing would work for network task. Currently
> > >     there is a single instance of pagedaemon, however we hope to
> > >     have more network tasks running in parallel.
> > 
> > Let's concentrate on the present.  More work is required to have
> > multiple network tasks running in parallel anyway ;)
> > 
> > Diff below uses NET_RLOCK_IN_SOFTNET() with a corresponding KASSERT()
> > and converts all the places where tasks are enqueued on the "softnet"
> > taskq.
> > 
> > Any other comment or concern?
> 
> I'm very concerned about this. I think this is a complicated workaround
> that will haunt us later on. I don't really understand what goes on there
> and the names don't help me that much. It just makes this even more
> confusing.

Thanks for raising your concern.

In November 2017 we introduced a reader version of the NET_LOCK().
The first goal was to allow ifconfig(8) to not block on each and every
ioctl(2) it does.  This, combined with the conversion of various ioctls
to take a read lock improved the responsiveness of ifconfig(8) under
load.

This design has been working since then (6.2-current to 6.6-current) and
has allowed us to make progress by auditing parts of the ioctl(2) and,
later, sysctl(2) paths and marking them as "read only".

During this release cycle, after four releases working with the
above-mentioned design, a programming mistake was made.  This mistake
has, in my opinion, nothing to do with the design.  However, it made it
clear that we do not understand what the current architecture is.  So
the goal of this renaming is to state explicitly what the current locks
are for.

> Shouldn't we fix the network stack so that it actually can use read locks
> and with that multiple concurrent threads instead? Here the main problem
> seems to be that solock needs to finally grow up to a real individual lock.

Sure we can.  That work isn't orthogonal to this diff.

However, throwing this diff away and going back to a single NET_LOCK()
everywhere would, in a way, disregard the effort we've put into
auditing, refactoring and marking parts of the stack as "read only".

> Are there other issues with multiple softnet task that we know off?

The ones we know of are easy:

  - pf(4)
  - pfsync(4)

These could be deferred to a different context, or limited to a
single thread if pf(4) is ready:
  - ipsec(4)
  - socket delivery

But honestly, as long as signal handling and select/poll/kqueue aren't
KERNEL_LOCK()-free, trying to use multiple threads in the network stack
doesn't seem the best way forward to me.  The fact that socket delivery
is still somewhat serialized might hide bugs.  So going multi-threaded
before that seems like a way to make bugs harder to fix.

On top of that, I'd also argue that the priority should be pushing the
limits of the KERNEL_LOCK().  I don't see the point, for OpenBSD, a
general purpose OS, in having a fully parallel network stack while the
rest of the kernel is still big-locked.  So, IMHO, network hackers
would do better to concentrate on:

 - unix sockets
 - routing sockets
 - interrupt handlers of !IPL_MPSAFE drivers
 - ARP
 - pfsync, pipex, etc.
 - physical ioctl path


Is it clearer?
