On Fri, Apr 28, 2023 at 02:51:23PM +0300, Vitaliy Makkoveev wrote:
> > On 28 Apr 2023, at 14:03, Alexander Bluhm <[email protected]> wrote:
> >
> > That basically means we must never call one of the pool get or put
> > functions with kernel lock. This may be the case currently. But
> > at this stage while we are pushing locks around the network code,
> > I would like to keep it as it is.
> >
> > Allowing net interrupts during route pool get mutex gains nearly
> > nothing. It makes placing a kernel lock in our transition process
> > harder.
> >
> > bluhm
>
> This is not true. The IPL is not kerne lock or MP related. Each pool(9)
> uses its own mutex(9) or rwlock(9) for protection.
mpi@ told me the following background of IPL_MPFLOOR:
The reasoning is lock ordering with mutex and kernel lock. If you
take mutex sometimes with and sometimes without kernel lock the
following can happen.
- cpu 1 takes kernel lock
- cpu 1 takes mutex
- cpu 2 takes mutex without kernel lock
- cpu 2 is interrupted
- cpu 2 interrupt handler takes kernel lock
And here you have a deadlock. There are three solutions
1. always take mutex with kernel lock
2. always take mutex without kernel lock
3. set spl in mutex to IPL_MPFLOOR
3 works as all interrupt handlers with higher priority than IPL_MPFLOOR
do not take kernel lock. All other interrupts are blocked during
the critical section of the mutex.
When we totally unlock all IPL_SOFTTTY and IPL_NET interrupt handlers
IPL_MPFLOOR can be moved to lower priority.
Fo pool_init the same rules as for mtx_enter() apply as pool_get()
and pool_put() take the pool mutex internally.
As we were in a mixed kernel lock environment I have chosen solution
3 and added the IPL_MPFLOOR in rev 1.404 a year ago.
bluhm
> This depends on
> PR_RWLOCK flag passed to pool_init(9). The spl*(9) only disables
> interrupts with level less or equal priority on this CPU. Non system
> wide. On the uniprocessor machine you can???t spin in mutex(9) because it
> is the deadlock, so you only disable interrupts within mutex(9) protected
> critical section. But SMP machine is the different case. In the ancient
> MP times interrupts were enabled on boot cpu only, so spl*(9) really
> provided protection, but since interrupts were enabled on application
> CPUs too this stopped to work, because the interrupts you disabled on
> this cpu could happen on any other. IPL_MPFLOOR has very high level and
> I guess it was chosen mostly for insurance.
>
> We do rt allocation and release in the sockets, ioctl and softnet
> context, so I see no reason to disable interrupts with priority greater
> than IPL_SOFTNET.
>
>
> >
> > On Thu, Apr 27, 2023 at 02:18:11AM +0300, Vitaliy Makkoveev wrote:
> >> Index: sys/net/route.c
> >> ===================================================================
> >> RCS file: /cvs/src/sys/net/route.c,v
> >> retrieving revision 1.418
> >> diff -u -p -r1.418 route.c
> >> --- sys/net/route.c 26 Apr 2023 16:09:44 -0000 1.418
> >> +++ sys/net/route.c 26 Apr 2023 23:00:02 -0000
> >> @@ -176,7 +176,7 @@ route_init(void)
> >> {
> >> rtcounters = counters_alloc(rts_ncounters);
> >>
> >> - pool_init(&rtentry_pool, sizeof(struct rtentry), 0, IPL_MPFLOOR, 0,
> >> + pool_init(&rtentry_pool, sizeof(struct rtentry), 0, IPL_SOFTNET, 0,
> >> "rtentry", NULL);
> >>
> >> while (rt_hashjitter == 0)
> >
>