On 2011-06-18 14:56, Gilles Chanteperdrix wrote:
> On 06/18/2011 02:10 PM, Jan Kiszka wrote:
>> On 2011-06-18 14:09, Gilles Chanteperdrix wrote:
>>> On 06/18/2011 12:21 PM, Jan Kiszka wrote:
>>>> On 2011-06-17 20:55, Gilles Chanteperdrix wrote:
>>>>> On 06/17/2011 07:03 PM, Jan Kiszka wrote:
>>>>>> On 2011-06-17 18:53, Gilles Chanteperdrix wrote:
>>>>>>> On 06/17/2011 04:38 PM, GIT version control wrote:
>>>>>>>> Module: xenomai-jki
>>>>>>>> Branch: for-upstream
>>>>>>>> Commit: 7203b1a66ca0825d5bcda1c3abab9ca048177914
>>>>>>>> URL:    
>>>>>>>> http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=7203b1a66ca0825d5bcda1c3abab9ca048177914
>>>>>>>>
>>>>>>>> Author: Jan Kiszka <jan.kis...@siemens.com>
>>>>>>>> Date:   Fri Jun 17 09:46:19 2011 +0200
>>>>>>>>
>>>>>>>> nucleus: Fix interrupt handler tails
>>>>>>>>
>>>>>>>> Our current interrupt handlers assume that they leave over the same 
>>>>>>>> task
>>>>>>>> and CPU they entered. But commit f6af9b831c broke this assumption:
>>>>>>>> xnpod_schedule invoked from the handler tail can now actually trigger a
>>>>>>>> domain migration, and that can also include a CPU migration. This 
>>>>>>>> causes
>>>>>>>> subtle corruptions as invalid xnstat_exectime_t objects may be restored
>>>>>>>> and - even worse - we may improperly flush XNHTICK of the old CPU,
>>>>>>>> leaving Linux timer-wise dead there (as happened to us).
>>>>>>>>
>>>>>>>> Fix this by moving XNHTICK replay and exectime accounting before the
>>>>>>>> scheduling point. Note that this introduces a tiny imprecision in the
>>>>>>>> accounting.
>>>>>>>
>>>>>>> I am not sure I understand why moving the XNHTICK replay is needed: if
>>>>>>> we switch to secondary mode, the HTICK is handled by xnpod_schedule
>>>>>>> anyway, or am I missing something?
>>>>>>
>>>>>> The replay can work on an invalid sched (after CPU migration in
>>>>>> secondary mode). We could reload the sched, but just moving the replay
>>>>>> is simpler.
>>>>>
>>>>> But does it not remove the purpose of this delayed replay?
>>>>
>>>> Hmm, yes, in the corner case of coalesced timed RT task wakeup and host
>>>> tick over a root thread. Well, then we actually have to reload sched and
>>>> keep the ordering to catch that as well.
>>>>
>>>>>
>>>>> Note that if you want to reload the sched, you also have to shut
>>>>> interrupts off, because upon return from xnpod_schedule after migration,
>>>>> interrupts are on.
>>>>
>>>> That would be another severe bug if we left an interrupt handler with
>>>> hard IRQs enabled - the interrupt tail code of ipipe would break.
>>>>
>>>> Fortunately, only xnpod_suspend_thread re-enables IRQs and returns.
>>>> xnpod_schedule also re-enables but then terminates the context (in
>>>> xnshadow_exit). So we are safe.
>>>
>>> I do not think we are, at least on platforms where context switches
>>> happen with irqs on.
>>
>> Can you sketch a problematic path?
> 
> On platforms with IPIPE_WANT_PREEMPTIBLE_SWITCH on, all context switches
> happens with irqs on. So, in particular, the context switch to a relaxed
> task happens with irqs on. In __xnpod_schedule, we then return from
> xnpod_switch_to with irqs on, and so return from __xnpod_schedule with
> irqs on.

"/* We are returning to xnshadow_relax via
    xnpod_suspend_thread, do nothing,
    xnpod_suspend_thread will re-enable interrupts. */"

Looks like this is outdated. I think we best fix this in
__xnpod_schedule by disabling irqs there instead of forcing otherwise
redundant disabling into all handler return paths.

> 
> Maybe in the irq handlers, we should skip the XNHTICK replay, when
> current_domain is root_domain.
> 

That would be against the purpose of the XNTICK replay (it only targets
that particular case). And it would still leave us with broken ipipe due
to enabled IRQs on return from the Xenomai handlers.

Jan

Attachment: signature.asc
Description: OpenPGP digital signature

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core

Reply via email to