On Mon, Nov 28, 2016 at 4:29 PM, Nick Hudson <sk...@netbsd.org> wrote:
> On 08/13/16 14:27, Ryota Ozaki wrote:
>>
>> On Fri, Aug 12, 2016 at 11:04 PM, Nick Hudson <sk...@netbsd.org> wrote:
>>>
>>> On 07/28/16 10:03, Ryota Ozaki wrote:
>>>>
>>>> Module Name:    src
>>>> Committed By:   ozaki-r
>>>> Date:           Thu Jul 28 09:03:51 UTC 2016
>>>>
>>>> Modified Files:
>>>>          src/sys/netinet: if_arp.c in.c
>>>>          src/sys/netinet6: in6.c nd6_nbr.c
>>>>
>>>> Log Message:
>>>> Fix panic on adding/deleting IP addresses under network load
>>>>
>>>> Adding and deleting IP addresses aren't serialized with other network
>>>> operations, e.g., forwarding packets. So if we add or delete an IP
>>>> address under network load, a kernel panic may happen on manipulating
>>>> network-related shared objects such as rtentry and rtcache.
>>>>
>>>> To avoid such panics, we still need to hold softnet_lock in in_control
>>>> and in6_control, which are called via ioctl and perform network-related
>>>> operations including IP address additions/deletions.
>>>>
>>>> Fix PR kern/51356
>>>
>>>
>>> Hi,
>>>
>>> This contributes to the problems in
>>>
>>>      http://gnats.netbsd.org/49065
>>>
>>>      http://gnats.netbsd.org/50491
>>>
>>>      http://gnats.netbsd.org/51395
>>>
>>> where softnet_lock is held by something that sleeps, e.g. a USB transfer.
>>>
>>>      http://mail-index.netbsd.org/tech-net/2015/12/06/msg005443.html
>>>
>>> This patch
>>>
>>>      http://www.netbsd.org/~skrll/usb.softint.diff
>>>
>>> helps, but I'm not sure it deals with all the problems in the network
>>> stack.
>>> Is this something you intend to address?
>>
>> No. The commit prevents parallel accesses on shared data (rtentry, ifaddr,
>> etc.). The issue with USB transfers seems to be a deadlock between softints
>> of the network stack and USB interrupt processing.
>>
>> I think we can commit your patch if it fixes the PRs and doesn't break
>> anything.
>>
>> Of course we should get rid of softnet_lock at some point.
>>
>> Thanks,
>>    ozaki-r
>>
> This is hurting me again.
>
> Can you, or someone else at IIJ, explain the plan to allow NET_MPSAFE to
> be enabled by default?  Perhaps others can help if there are clear steps.

AFAIK, we need to do the following tasks:
(1) MP-ification of the routing table
(2) MP-ification of IPv6-specific stuff
    (nd_defrouter, nd_prefix, etc.)
(3) MP-ification of some components (needed by IIJ)
    (pppoe, bpf, vlan, ipsec, opencrypto and pfil.)
(4) Restructuring for MP-ification of bpf
    - Move bpf_mtap in drivers to softint (percpu if_input)
    - Prevent bpf_mtap from being called in if_start,
      which probably runs in interrupt context
    - Make the ieee80211 stack and wireless drivers
      run in softint
      - They call bpf_mtap variants
(5) Adding KERNEL_LOCK (and/or softnet_lock) to pr_input
    (input routine from Layer 3 to Layer 4) if needed
(6) Make statistical counters per-CPU
(7) Make some print functions MP-safe
    - Stop using static local buffers
(8) Adding KERNEL_LOCK (and/or softnet_lock) to
    non-MP-safe components (ALTQ, pf/ipf, tun/tap, and many others)
    (or MP-ifying them)
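
As a rough illustration of (6), here is a userland model of per-CPU
counters: each CPU increments only its own slot, so the hot path needs
no lock, and a reader sums all slots. Every name here (NCPU_MODEL,
stat_inc, stat_sum, ...) is hypothetical; in the kernel this would
presumably be built on percpu(9) rather than a plain array. A sketch
under those assumptions, not the actual implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userland model; none of these names are kernel APIs. */
#define NCPU_MODEL 4

enum stat_idx { STAT_IPACKETS, STAT_OPACKETS, STAT_NSTATS };

/*
 * One block of counters per CPU.  The 64-byte alignment (an assumed
 * cache-line size) keeps two CPUs from sharing a cache line.
 */
struct percpu_stats {
	_Alignas(64) uint64_t c[STAT_NSTATS];
};

static struct percpu_stats stats[NCPU_MODEL];

/* Hot path: touch only the slot of the CPU we are running on. */
static void
stat_inc(unsigned cpu, enum stat_idx i)
{
	assert(cpu < NCPU_MODEL && i < STAT_NSTATS);
	stats[cpu].c[i]++;
}

/* Slow path (e.g. a statistics read-out): sum over all CPUs. */
static uint64_t
stat_sum(enum stat_idx i)
{
	uint64_t total = 0;

	for (unsigned cpu = 0; cpu < NCPU_MODEL; cpu++)
		total += stats[cpu].c[i];
	return total;
}
```

The sum is only approximately consistent while other CPUs keep
counting, which is usually acceptable for statistics.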

I'm working on (1) and will propose a patch soon. We'll work on (2)-(7)
within (half of) a year. We don't have time to do (8).

(3) and (4) are not a must for enabling NET_MPSAFE; adding KERNEL_LOCK
instead is probably enough. (6) and (7) are also not critical and we can
do them later.

So it would be helpful for us (IIJ) if someone worked on (8). (Of course,
working on other tasks is also welcome.)
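
For anyone picking up (7), the pattern in question looks roughly like
the sketch below. A print helper that formats into a static local
buffer hands every caller the same storage, so two CPUs (or even two
calls in one expression) overwrite each other; passing the buffer in
from the caller avoids that. Both function names are made up for
illustration and are not existing kernel functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers for illustration only. */

/* Not MP-safe: every caller shares the one static buffer. */
static const char *
addr_sprintf_static(const uint8_t a[4])
{
	static char buf[16];	/* "255.255.255.255" plus NUL */

	snprintf(buf, sizeof(buf), "%u.%u.%u.%u", a[0], a[1], a[2], a[3]);
	return buf;
}

/* MP-safe: the caller owns the storage, typically on its own stack. */
static const char *
addr_sprintf_r(const uint8_t a[4], char *buf, size_t len)
{
	snprintf(buf, len, "%u.%u.%u.%u", a[0], a[1], a[2], a[3]);
	return buf;
}
```

With the static variant, a second call silently changes the string
returned by the first; with the caller-supplied buffer each result
stays intact.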

Regards,
  ozaki-r
