> On 30 May 2020, at 9:43 pm, Vitaliy Makkoveev <[email protected]>
> wrote:
>
>
>> On 30 May 2020, at 09:40, David Gwynne <[email protected]> wrote:
>>
>> On Mon, May 25, 2020 at 09:44:22AM +0200, Martin Pieuchot wrote:
>>> On 23/05/20(Sat) 15:38, Vitaliy Makkoveev wrote:
>>>>> On 23 May 2020, at 12:54, Martin Pieuchot <[email protected]> wrote:
>>>>> On 22/05/20(Fri) 13:25, Vitaliy Makkoveev wrote:
>>>>>> On Fri, May 22, 2020 at 07:57:13AM +1000, David Gwynne wrote:
>>>>>>> [...]
>>>>>>> can you try the following diff?
>>>>>>>
>>>>>>
>>>>>> I tested this diff and it works for me. But the problem I pointed is
>>>>>> about pipex(4) locking.
>>>>>>
>>>>>> pipex(4) requires NET_LOCK() be grabbed not only for underlaying
>>>>>> ip{,6}_output() but for itself too. But since pppac_start() has
>>>>>> unpredictable behavior I suggested to make it predictable [1].
>>>>>
>>>>> What needs the NET_LOCK() in their? We're talking about
>>>>> pipex_ppp_output(), right? Does it really need the NET_LOCK() or
>>>>> the KERNEL_LOCK() is what protects those data structures?
>>>>
>>>> Yes, about pipex_ppp_output() and pipex_output(). Except
>>>> ip{,6}_output() nothing requires NET_LOCK(). As David Gwynne pointed,
>>>> they can be replaced by ip{,6}_send().
>>>
>>> Locks protect data structures, you're talking about functions, which
>>> data structures are serialized by this lock? I'm questioning whether
>>> there is one.
>>>
>>>> [...]
>>>>> In case of pipex(4) is isn't clear that the NET_LOCK() is necessary.
>>>>
>>>> I guess, pipex(4) was wrapped by NET_LOCK() to protect it while it???s
>>>> accessed through `pr_input???. Is NET_LOCK() required for this case?
>>>
>>> pipex(4) like all the network stack has been wrapped in the NET_LOCK()
>>> because it was easy to do. That means it isn't a concious decision or
>>> design. The fact that pipex(4) code runs under the NET_LOCK() is a side
>>> effect of how the rest of the stack evolved. I'm questioning whether
>>> this lock is required there. In theory it shouldn't. What is the
>>> reality?
>>
>> pipex and pppx pre-date the NET_LOCK, which means you can assume
>> that any implicit locking was and is done by the KERNEL_LOCK. mpi is
>> asking the right questions here.
>>
>> As for the ifq maxlen difference between pppx and pppac, that's more
>> about when and how quickly they were written more than anything else.
>> The IFQ_SET_MAXLEN(&ifp->if_snd, 1) in pppx is because that's a way to
>> bypass transmit mitigation for pseudo/virtual interfaces. That was the
>> only way to do it historically. It is not an elegant hack to keep
>> hold of the NET_LOCK over a call to a start routine.
>>
>> As a rule of thumb, network interface drivers should not (maybe
>> cannot) rely on the NET_LOCK in their if_start handlers. To be
>> clear, they should not rely on it being held by the network stack
>> when if_start is called because sometimes the stack calls it without
>> holding NET_LOCK, and they should not take it because they might
>> be called by the stack when it is being held.
>>
>> Also, be aware that the ifq machinery makes sure that the start
>> routine is not called concurrently or recursively. You can queue
>> packets for transmission on an ifq from anywhere in the kernel at
>> any time, but only one cpu will run the start routine. Other cpus
>> can queue packets while another one is running if_start, but the
>> first one ends up responsible for trying to transmit it.
>>
>> ifqs also take the KERNEL_LOCK before calling if_start if the interface
>> is not marked as IFXF_MPSAFE.
>>
>> The summary is that pppx and pppac are not marked as mpsafe so their
>> start routines are called with KERNEL_LOCK held. Currently pppx
>> accidentally gets NET_LOCK because of the IFQ_SET_MAXLEN, but shouldn't
>> rely on it.
>>
>> Cheers,
>> dlg
>>
>
> Thanks for explanation.
> Will you commit diff you posted in this thread?
Yes, I'm doing that now.
Thanks for testing it btw.
dlg