On 05.03.21 12:29, Philippe Gerum wrote:
> 
> Jan Kiszka <jan.kis...@siemens.com> writes:
> 
>> On 05.03.21 10:34, Philippe Gerum wrote:
>>>
>>> Jan Kiszka <jan.kis...@siemens.com> writes:
>>>
>>>> On 01.03.21 17:53, Philippe Gerum wrote:
>>>>>
>>>>> Jan Kiszka <jan.kis...@siemens.com> writes:
>>>>>
>>>>>> On 25.02.21 15:18, Philippe Gerum wrote:
>>>>>>>
>>>>>>> Jan Kiszka <jan.kis...@siemens.com> writes:
>>>>>>>
>>>>>>>> On 25.02.21 14:54, Philippe Gerum wrote:
>>>>>>>>>
>>>>>>>>> Jan Kiszka <jan.kis...@siemens.com> writes:
>>>>>>>>>
>>>>>>>>>> On 24.02.21 12:35, Henning Schild via Xenomai wrote:
>>>>>>>>>>> On Wed, 24 Feb 2021 11:24:55 +0100,
>>>>>>>>>>> Henning Schild via Xenomai <xenomai@xenomai.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Wed, 10 Feb 2021 12:08:43 +0100,
>>>>>>>>>>>> Jan Kiszka via Xenomai <xenomai@xenomai.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 10.02.21 11:07, Bezdeka, Florian (T RDA IOT SES-DE) wrote:  
>>>>>>>>>>>>>> On Wed, 2021-02-10 at 09:15 +0100, Jan Kiszka via Xenomai wrote:
>>>>>>>>>>>>>>   
>>>>>>>>>>>>>>> On 10.02.21 07:22, xenomai--- via Xenomai wrote:    
>>>>>>>>>>>>>>>> Download URL:
>>>>>>>>>>>>>>>> https://xenomai.org/downloads/ipipe/v4.x/arm64/ipipe-core-4.19.165-cip41-arm64-09.patch
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Repository: https://git.xenomai.org/ipipe-arm64
>>>>>>>>>>>>>>>> Release tag: ipipe-core-4.19.165-cip41-arm64-09
>>>>>>>>>>>>>>>>    
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hmm, now we have the 5.4-arm64 issue also on 4.19:
>>>>>>>>>>>>>>> https://gitlab.denx.de/Xenomai/xenomai-images/-/jobs/219984
>>>>>>>>>>>>>>>    
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't know much about the things going on here, but found this
>>>>>>>>>>>>>> line in the log. Maybe a starting point...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2021-02-10T07:51:47 setsched.c:120, assertion failed: stats.msw == msw
>>>>>>>>>>>>>
>>>>>>>>>>>>> Exactly, that is causing the overall failure. And it was first
>>>>>>>>>>>>> seen with the newly added 5.4 kernel.
>>>>>>>>>>>>
>>>>>>>>>>>> Seeing the same on amd64 when testing on QEMU; real HW is fine.
>>>>>>>>>>>>
>>>>>>>>>>>> I managed to bisect it down to between 4.19.147-cip (good) and 4.19.150-cip (bad).
>>>>>>>>>>>>
>>>>>>>>>>>> Which also means that ipipe-core-4.19.152-cip37-x86-15 is affected.
>>>>>>>>>>>>
>>>>>>>>>>>> https://gitlab.denx.de/Xenomai/xenomai-images/-/jobs/200646
>>>>>>>>>>>> did not catch it, so maybe our configs differ.
>>>>>>>>>>
>>>>>>>>>> Have you already compared yours against the one in xenomai-images? That
>>>>>>>>>> would be useful.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Digging further, I found that 0f0b6099c45ff3e06d2487816cf1ff30d21835f6
>>>>>>>>>>> is likely causing the problem.
>>>>>>>>>>>
>>>>>>>>>>> ipipe-core-4.19.152-cip37-x86-15 <- bad
>>>>>>>>>>> revert 2b294ac325c7ce3f36854b74d0d1d89dc1d1d8b8
>>>>>>>>>>> revert 8579a0440381353e0a71dd6a4d4371be8457eac4 <- bad
>>>>>>>>>>> revert 0f0b6099c45ff3e06d2487816cf1ff30d <- good
>>>>>>>>>>>
>>>>>>>>>>> I think Jan or Philippe should take over from here.
>>>>>>>>>>
>>>>>>>>>> Thanks for bisecting, this is helpful!
>>>>>>>>>>
>>>>>>>>>> Philippe, any immediate idea why all that is failing now?
>>>>>>>>>
>>>>>>>>> Something may be going wrong with MAP_SHARED mappings wrt commit_vma()
>>>>>>>>> in Dovetail. I'm adding this to my debug queue.
>>>>>>>>>
>>>>>>>>
>>>>>>>> This is still I-pipe, not a Dovetail-related issue.
>>>>>>>
>>>>>>> This I-pipe release mimics what Dovetail does wrt mm pinning.
>>>>>>>
>>>>>>
>>>>>> Any news on this from your side?
>>>>>>
>>>>>
>>>>> No time slot for working on this yet. High multiplexing rate ATM.
>>>>>
>>>>
>>>> I reproduced the issue on qemu-arm64 (xenomai-images exposes it
>>>> directly), and I'm testing a fix.
>>>>
>>>> Brief summary:
>>>> Removal of un-COW support was a mistake. We will continue to require it
>>>> because it not only affects the child (which the argument for the removal
>>>> was targeting), but also prevents shared pages on an RT parent, even
>>>> locked ones, from suddenly becoming read-only.
>>>>
>>>> Expect some patches later today.
>>>
>>> The best fix is not to add that ugly code back, but rather to make the VMA
>>> commit code work with shared mappings.
>>>
>>
>> What exactly do you mean?
>>
>> We must prevent pages shared with the child from becoming read-only in
>> the parent. How can we do that other than by un-COWing?
>>
> 
> The issue is not with un-COWing, which is obviously the only thing to
> do, but rather with how and where it is done. The way it used to be done
> when copying the PTEs led to several conflicts and subtle breakages due
> to upstream changes over time. Hopefully, a better implementation is
> possible.
> 

Do you have one at hand, or can you give some guidance on how to write it?

Otherwise, I would suggest restoring the code to fix the regression and
cleaning up later. FWIW, I'll post my current 4.19 fix to the list in a
minute.

Jan
