On 17.03.22 16:24, Scott Reed wrote:
> 
> 
> On 3/16/22 11:35 AM, Jan Kiszka wrote:
>> On 16.03.22 10:58, Scott Reed wrote:
>>>
>>>
>>> On 3/15/22 9:42 AM, Scott Reed via Xenomai wrote:
>>>>
>>>>
>>>> On 3/15/22 7:32 AM, Jan Kiszka wrote:
>>>>> On 14.03.22 18:45, Scott Reed wrote:
>>>>>>
>>>>>>
>>>>>> On 3/11/22 2:13 PM, Scott Reed via Xenomai wrote:
>>>>>>>
>>>>>>> On 3/11/22 12:38 PM, Jan Kiszka wrote:
>>>>>>>> On 11.03.22 11:12, Scott Reed via Xenomai wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
>>>>>>>>> when trying to move to a newer kernel and I-pipe patch.
>>>>>>>>>
>>>>>>>>> The issue is that as soon as a PCIe MSI interrupt occurs, the system
>>>>>>>>> hangs with no message output on the serial console or in
>>>>>>>>> /var/log/messages.
>>>>>>>>>
>>>>>>>>> The platform I am working on is an "i.MX 6 Quad" and I am upgrading
>>>>>>>>> from a 4.14.62 kernel and I-pipe patch with Xenomai 3.0.7 to a 5.4.151
>>>>>>>>> kernel and I-pipe patch with Xenomai 3.2.1.
>>>>>>>>>
>>>>>>>>> Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe MSI
>>>>>>>>> interrupts to the CPU from, for example, an Altera Triple-Speed MAC.
>>>>>>>>>
>>>>>>>>> I have had a stable system running for some time with Linux 4.14.62 and
>>>>>>>>> Xenomai 3.0.7, although I did need to patch the PCIe driver [1]. Some
>>>>>>>>> time back, I also tried to move to 4.14.110 with I-pipe and saw the
>>>>>>>>> same scenario of my system hanging on the first PCIe MSI interrupt, so
>>>>>>>>> I backed out to 4.14.62. Now I am trying to move to 5.4.151, but see
>>>>>>>>> the same hang.
>>>>>>>>
>>>>>>>> What about 4.19.y-cip? Specifically because of
>>>>>>>> https://source.denx.de/Xenomai/ipipe-arm/-/commit/a1aab8ba3098e595f9fa8b23a011ce6d72f8699c.
>>>>>>>>
>>>>>>>> Actually, that commit is also missing from the last tagged 5.4 ipipe
>>>>>>>> version (ipipe-core-5.4.151-arm-4). So try ipipe/5.4.y head instead.
>>>>>>>
>>>>>>> To do a quick test, I just applied the change from the commit you
>>>>>>> referenced above to my 5.4.151 ipipe kernel and it unfortunately did not
>>>>>>> help (the hang still occurs with the first interrupt).
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Before I dive into analyzing the hang, I wanted to ask:
>>>>>>>>>
>>>>>>>>> What are other people's experiences with using PCIe MSI interrupts
>>>>>>>>> and I-pipe?
>>>>>>>>>
>>>>>>>>> I am thinking of trying 5.10.103 Dovetail to see if I still see
>>>>>>>>> the problem. Would this be recommended?
>>>>>>>>
>>>>>>>> If you can migrate your test with reasonable effort, yes, definitely.
>>>>>>>
>>>>>>> I will try to migrate my test to 5.10.103 Dovetail, in the hope that
>>>>>>> it will not be too much effort, and report back.
>>>>>>
>>>>>> I tried to migrate my test to 5.10.103 Dovetail and failed on the first
>>>>>> step, namely bringing up a standard (i.e. no Dovetail) 5.10.103 kernel
>>>>>> on my platform.
>>>>>>
>>>>>> The kernel boots without a problem, but the FEC Ethernet port on the
>>>>>> i.MX 6 is not working (cannot ping in or out).
>>>>>
>>>>> Do you have or did you have any custom patches on top?
>>>>
>>>> Only a patch to add the device tree include (dtsi) for our imx6 SOC:
>>>>      μQ7-962 - μQseven standard module with NXP i.MX 6 Processor
>>>>
>>>>>
>>>>>>
>>>>>> I looked at the trace with Wireshark and it looks like, when pinging
>>>>>> out, the ARP packet is corrupt and therefore fails. The ARP
>>>>>> packet is corrupt in that various bits appear to be flipped. For
>>>>>> example, the source MAC address should be
>>>>>>     00:09:cc:02:c1:b6
>>>>>> but is
>>>>>>     00:01:cc:02:01:36 or
>>>>>>     00:09:cc:02:c1:36
>>>>>> Wireshark also complains about the frame check sequence
>>>>>> ([FCS Status: Unverified]).
>>>>>>
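The bit flips described above can be made explicit by XOR-ing the expected
and observed MAC addresses. The following is a minimal Python sketch, not
part of the original thread; the helper names parse_mac and flipped_bits are
made up for illustration, and the addresses are the ones quoted above.

```python
def parse_mac(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into six raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def flipped_bits(expected, observed):
    """Return (byte_index, bit_index, direction) for every differing bit."""
    result = []
    pairs = zip(parse_mac(expected), parse_mac(observed))
    for index, (exp_byte, obs_byte) in enumerate(pairs):
        diff = exp_byte ^ obs_byte  # set bits mark where the bytes disagree
        for bit in range(7, -1, -1):
            if diff & (1 << bit):
                # Direction is relative to the expected (correct) address.
                direction = "1->0" if exp_byte & (1 << bit) else "0->1"
                result.append((index, bit, direction))
    return result


expected = "00:09:cc:02:c1:b6"
for observed in ("00:01:cc:02:01:36", "00:09:cc:02:c1:36"):
    print(observed, flipped_bits(expected, observed))
```

For the two corrupted addresses quoted above, every differing bit is a 1 in
the expected address that shows up as a 0 on the wire.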
>>>>>> I can provide Wireshark dumps if someone is interested, but for me
>>>>>> at this point I do not want to fight with getting a 5.10.x kernel
>>>>>> to work as I was pretty far along moving to a 5.4.x kernel with
>>>>>> ipipe before running into the original problem posted (with ipipe
>>>>>> my system freezes on the first PCIe MSI interrupt. Note: without
>>>>>> ipipe, I do not see any issues).
>>>>>>
>>>>>> As mentioned, I first saw this problem a while ago when trying
>>>>>> to move from 4.14.62+ipipe to 4.14.110+ipipe, and at that time
>>>>>> I backed down to 4.14.62+ipipe, which works.
>>>>>>
>>>>>> I guess my next strategy is to try to figure out what changed
>>>>>> between 4.14.62+ipipe and 4.14.110+ipipe which triggers/causes
>>>>>> the hang as I hope the delta between them is not too large.
>>>>>>
>>>>>> If anyone has other suggestions or tips, they are more than welcome.
>>>>>
>>>>> As I wrote before: try the latest 4.19-cip-ipipe first.
>>>>
>>>> OK. Will do.
>>>
>>> I was able to run my test (the one where the system hangs on the first
>>> PCIe MSI interrupt) on the latest 4.19-cip-ipipe (4.19.229) and
>>> unfortunately see the same behavior (system hangs).
>>>
>>> PCIe MSI interrupts work as expected on a "vanilla" 4.19.229 kernel,
>>> but when I add ipipe and Xenomai 3.2.1 to the kernel,
>>> the system hangs on the first PCIe MSI interrupt.
>>>
>>> As mentioned before, I first observed this behavior when moving from
>>> 4.14.62+ipipe to 4.14.110+ipipe, so I think my best bet is to dive
>>> into what changed in this time frame. My goal is still to move to
>>
>> Yes, that might be a way now to try to find the root cause. Problem: you
>> can't do bisection easily because of the merges with the I-pipe patch.
>> Therefore, it can be easier to actually debug where the system hangs, and
>> on what. With some traces from there, it can then be simpler again to
>> analyse the differences between the working and non-working 4.14 kernels.
>>
> 
> I have been able to get my test running on 4.14.110+ipipe without the
> system hanging on the first PCIe MSI interrupt. I have attached my
> patch (hopefully the attachment shows up correctly, but if not
> please let me know).
> 
> The fix is to replace the call to generic_handle_irq() in the PCIe MSI
> interrupt handler with a call to ipipe_handle_demuxed_irq().

Great to hear! Looks a lot like
https://source.denx.de/Xenomai/ipipe-noarch/-/commit/578e2cbf69ce8e22546423d403cda4a438d0751f

> 
> Actually, I had already made this patch on my 4.14.62 system
> in combination with a patch to make the PCIe driver an RTDM
> driver (see [1]) to address latency issues. As this patch was for
> a latency issue on 4.14.62 and not a hang, I did not immediately
> think about the ipipe part of the patch being the fix for the
> hang I was seeing when moving to 4.14.110+ipipe.
> 
> I will now check if the same/similar patch fixes my original
> hang on 5.4.151+ipipe.
> 
> Would it make sense to integrate this patch into the next ipipe release?
> 

Yep. Please prepare an official patch once done with testing. I will add
it to ipipe-noarch, and then the architecture trees (relevant for arm &
arm64) can pick it up.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux
