On 3/17/22 5:22 PM, Scott Reed via Xenomai wrote:
>
>
> On 3/17/22 4:44 PM, Jan Kiszka wrote:
>> On 17.03.22 16:24, Scott Reed wrote:
>>>
>>>
>>> On 3/16/22 11:35 AM, Jan Kiszka wrote:
>>>> On 16.03.22 10:58, Scott Reed wrote:
>>>>>
>>>>>
>>>>> On 3/15/22 9:42 AM, Scott Reed via Xenomai wrote:
>>>>>>
>>>>>>
>>>>>> On 3/15/22 7:32 AM, Jan Kiszka wrote:
>>>>>>> On 14.03.22 18:45, Scott Reed wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/11/22 2:13 PM, Scott Reed via Xenomai wrote:
>>>>>>>>>
>>>>>>>>> On 3/11/22 12:38 PM, Jan Kiszka wrote:
>>>>>>>>>> On 11.03.22 11:12, Scott Reed via Xenomai wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
>>>>>>>>>>> when trying to move to a newer kernel and I-pipe patch.
>>>>>>>>>>>
>>>>>>>>>>> The issue is that as soon as a PCIe MSI interrupt occurs, the system
>>>>>>>>>>> hangs with no message output on the serial console or in
>>>>>>>>>>> /var/log/messages.
>>>>>>>>>>>
>>>>>>>>>>> The platform I am working on is an "i.MX 6 Quad" and I am upgrading
>>>>>>>>>>> from a 4.14.62 kernel and I-pipe patch with Xenomai 3.07 to a
>>>>>>>>>>> 5.4.151 kernel and I-pipe patch with Xenomai 3.2.1.
>>>>>>>>>>>
>>>>>>>>>>> Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe MSI
>>>>>>>>>>> interrupts to the CPU from, for example, an Altera Triple-Speed MAC.
>>>>>>>>>>>
>>>>>>>>>>> I have had a stable system running for some time with Linux 4.14.62
>>>>>>>>>>> and Xenomai 3.07, although I did need to patch the PCIe driver [1].
>>>>>>>>>>> Some time back, I also tried to move to 4.14.110 with I-pipe and saw
>>>>>>>>>>> the same scenario of my system hanging on the first PCIe MSI
>>>>>>>>>>> interrupt, so I backed out to 4.14.62. Now I am trying to move to
>>>>>>>>>>> 5.4.151, but see the same hang.
>>>>>>>>>>
>>>>>>>>>> What about 4.19.y-cip? Specifically because of
>>>>>>>>>> https://source.denx.de/Xenomai/ipipe-arm/-/commit/a1aab8ba3098e595f9fa8b23a011ce6d72f8699c.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Actually, that commit is also missing from the last tagged 5.4 ipipe
>>>>>>>>>> version (ipipe-core-5.4.151-arm-4). So try ipipe/5.4.y head instead.
>>>>>>>>>
>>>>>>>>> To do a quick test, I just applied the change from the commit you
>>>>>>>>> referenced above to my 5.4.151 ipipe kernel and it unfortunately did
>>>>>>>>> not help (hang still occurs with first interrupt).
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Before I dive into analyzing the hang, I wanted to ask:
>>>>>>>>>>>
>>>>>>>>>>> What are other people's experiences with using PCIe MSI interrupts
>>>>>>>>>>> and I-pipe?
>>>>>>>>>>>
>>>>>>>>>>> I am thinking of trying 5.10.103 Dovetail to see if I still see
>>>>>>>>>>> the problem. Would this be recommended?
>>>>>>>>>>
>>>>>>>>>> If you can migrate your test with reasonable effort, yes, definitely.
>>>>>>>>>
>>>>>>>>> I will try to migrate my test to 5.10.103 Dovetail with the hopes that
>>>>>>>>> it will not be too much effort and report back.
>>>>>>>>
>>>>>>>> I tried to migrate my test to 5.10.103 Dovetail and failed on the first
>>>>>>>> step, namely bringing up a standard (i.e. no Dovetail) 5.10.103 kernel
>>>>>>>> on my platform.
>>>>>>>>
>>>>>>>> The kernel boots without a problem, but the FEC Ethernet port on the
>>>>>>>> i.MX 6 is not working (cannot ping in or out).
>>>>>>>
>>>>>>> Do you have or did you have any custom patches on top?
>>>>>>
>>>>>> Only a patch to add the device tree include (dtsi) for our imx6 SOC:
>>>>>> μQ7-962 - μQseven standard module with NXP i.MX 6 Processor
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> I looked at the trace with Wireshark and it looks like when pinging
>>>>>>>> out that the ARP packet is corrupt and therefore failing. The ARP
>>>>>>>> packet is corrupt in that it looks like various bits are flipped. For
>>>>>>>> example, the source MAC address should be
>>>>>>>> 00:09:cc:02:c1:b6
>>>>>>>> but is
>>>>>>>> 00:01:cc:02:01:36 or
>>>>>>>> 00:09:cc:02:c1:36
>>>>>>>> Wireshark also complains about the Frame check sequence
>>>>>>>> ([FCS Status: Unverified]).
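To make the bit flips reported above easier to see, the expected and
captured source MAC addresses can be XORed byte by byte. The following is
only a minimal Python sketch: the helper name flipped_bits is made up for
illustration, and the three addresses are the ones from the report.

```python
def flipped_bits(expected, observed):
    """Return (byte_index, xor_mask) for every byte that differs.

    Both arguments are colon-separated MAC address strings. A set bit in
    xor_mask marks a bit position whose value differs between the two.
    """
    exp = bytes.fromhex(expected.replace(":", ""))
    obs = bytes.fromhex(observed.replace(":", ""))
    if len(exp) != len(obs):
        raise ValueError("addresses must have the same length")
    return [(i, e ^ o) for i, (e, o) in enumerate(zip(exp, obs)) if e != o]


EXPECTED = "00:09:cc:02:c1:b6"

for observed in ("00:01:cc:02:01:36", "00:09:cc:02:c1:36"):
    diffs = flipped_bits(EXPECTED, observed)
    # Print the byte index and the XOR mask of each differing byte.
    print(observed, [(i, "0x%02x" % mask) for i, mask in diffs])
```

For these addresses, every differing bit is set in the expected MAC and
clear in the captured one (0x09 -> 0x01, 0xc1 -> 0x01, 0xb6 -> 0x36). The
sketch only locates the flipped bits; it does not say anything about their
cause.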
>>>>>>>>
>>>>>>>> I can provide Wireshark dumps if someone is interested, but for me
>>>>>>>> at this point I do not want to fight with getting a 5.10.x kernel
>>>>>>>> to work as I was pretty far along moving to a 5.4.x kernel with
>>>>>>>> ipipe before running into the original problem posted (with ipipe
>>>>>>>> my system freezes on the first PCIe MSI interrupt. Note: without
>>>>>>>> ipipe, I do not see any issues).
>>>>>>>>
>>>>>>>> As mentioned, I first saw this problem a while ago when trying
>>>>>>>> to move from 4.14.62+ipipe to 4.14.110+ipipe and at that time
>>>>>>>> then backed back down to 4.14.62+ipipe which works.
>>>>>>>>
>>>>>>>> I guess my next strategy is to try to figure out what changed
>>>>>>>> between 4.14.62+ipipe and 4.14.110+ipipe which triggers/causes
>>>>>>>> the hang as I hope the delta between them is not too large.
>>>>>>>>
>>>>>>>> If anyone has other suggestions or tips, they are more than welcome.
>>>>>>>
>>>>>>> As I wrote before: try the latest 4.19-cip-ipipe first.
>>>>>>
>>>>>> OK. Will do.
>>>>>
>>>>> I was able to run my test where the system hangs on the first
>>>>> PCIe MSI interrupt on the latest 4.19-cip-ipipe (4.19.229) and
>>>>> unfortunately see the same behavior (system hangs).
>>>>>
>>>>> PCIe MSI interrupts work as expected on a "vanilla" 4.19.229 kernel,
>>>>> but when I add ipipe and Xenomai 3.2.1 to the kernel, then
>>>>> the system hangs on the first PCIe MSI interrupt.
>>>>>
>>>>> As mentioned before, I first observed this behavior when moving from
>>>>> 4.14.62+ipipe to 4.14.110+ipipe so I think my best bet is to dive
>>>>> into what changed in this time frame. My goal is still to move to
>>>>
>>>> Yes, that might be a way now to try to find the root cause. Problem: you
>>>> can't do bisection easily because of the merges with the I-pipe patch.
>>>> Therefore, it can be easier to actually debug where the system hangs, on
>>>> what. With some traces from there, it can then be simpler again to
>>>> analyse the differences between the working and non-working 4.14 kernels.
>>>>
>>>
>>> I have been able to get my test running on 4.14.110+ipipe without the
>>> system hanging on the first PCIe MSI interrupt. I have attached my
>>> patch (hopefully the attachment shows up correctly, but if not
>>> please let me know).
>>>
>>> The fix is to replace, in the PCIe MSI interrupt handler, the call
>>> to generic_handle_irq() with ipipe_handle_demuxed_irq().
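To show where in a demultiplexing handler such a call sits, here is a toy
user-space model in Python. It is not kernel code and does not reproduce
the hang or any I-pipe behaviour. The names demux_msi, status, base_irq and
dispatch are hypothetical; dispatch stands in for whichever function the
real handler calls per decoded vector (generic_handle_irq() before the
patch, ipipe_handle_demuxed_irq() after it).

```python
def demux_msi(status, base_irq, dispatch):
    """Call dispatch(irq) once for every set bit in an MSI status word.

    status   -- integer whose set bits mark pending MSI vectors (assumed)
    base_irq -- IRQ number assigned to vector 0 (assumed)
    dispatch -- callable invoked with each decoded IRQ number
    Returns the list of IRQ numbers that were dispatched, lowest first.
    """
    handled = []
    bit = 0
    while status:
        if status & 1:
            irq = base_irq + bit
            dispatch(irq)  # the only line the described fix would change
            handled.append(irq)
        status >>= 1
        bit += 1
    return handled


if __name__ == "__main__":
    seen = []
    # Vectors 1 and 3 pending, vector 0 mapped to IRQ 100.
    demux_msi(0b1010, 100, seen.append)
    print(seen)
```

The demultiplexing loop itself stays the same; only the function passed
as dispatch differs between the two variants.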
>>
>> Great to hear! Looks a lot like
>> https://source.denx.de/Xenomai/ipipe-noarch/-/commit/578e2cbf69ce8e22546423d403cda4a438d0751f
>>
>>>
>>> Actually, I had already made this patch on my 4.14.62 system
>>> in combination with a patch to make the PCIe driver an RTDM
>>> driver (see [1]) to address latency issues. As this patch was for
>>> a latency issue on 4.14.62 and not a hang, I did not immediately
>>> think about the ipipe part of the patch being the fix for the
>>> hang I was seeing when moving to 4.14.110+ipipe.
>>>
>>> I will now check if the same/similar patch fixes my original
>>> hang on 5.4.151+ipipe.
>>>
>>> Would it make sense to integrate this patch into next ipipe release?
>>>
>>
>> Yep. Please prepare an official patch once done with testing. I will add
>> it to ipipe-noarch, and then the architecture trees (relevant for arm &
>> arm64) can pick it up.
>
> Will do. May take me a day or two to be able to submit an official patch
> as it will be my first official patch.
>
> I will submit (i.e. send to mailing list) the patch on ipipe-noarch:
> ipipe/master which looks like it is currently based on 5.4.179 once
> my testing is complete.
>
> If my understanding is not correct, please let me know.
In addition to testing the patch on 4.14.110 on my system, I have also
successfully tested the patch on 4.19.229 and 5.4.151.
I will then submit the patch to ipipe-noarch shortly. As this is the
first time I am submitting an official patch, if I did something
incorrectly, please let me know.
Scott
>
> Scott
>>
>> Jan
>>
>