On 05.11.2010 00:24, Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> On 04.11.2010 23:06, Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
>>>>>> At first sight, you are breaking things here more than cleaning them up.
>>>>> Still, it holds the SMP record for my test program and still runs with ftrace
>>>>> on (after 2 hours, whereas it previously failed after at most 23 minutes).
>>>> My version was indeed still buggy; I'm reworking it ATM.
>>>>
>>>>> If I get the gist of Jan's changes, they are (using the IPI to transfer 
>>>>> one bit of information: your cpu needs to reschedule):
>>>>>
>>>>> xnsched_set_resched:
>>>>> -      setbits((__sched__)->status, XNRESCHED);
>>>>>
>>>>> xnpod_schedule_handler:
>>>>> + xnsched_set_resched(sched);
>>>>>   
>>>>> If you (we?) decide to keep the debug checks, under what circumstances 
>>>>> would the current check trigger (in layman's language that I'll be able
>>>>> to understand)?
>>>> That's actually what /me is wondering as well. I do not yet see how you
>>>> can reliably detect a missed reschedule (that was the purpose of the
>>>> debug check), given the inherent race between signaling a resched and
>>>> processing the resched hints.
>>> The purpose of the debugging change is to detect a change of the
>>> scheduler state which was not followed by setting the XNRESCHED bit.
>>
>> But that is nucleus business, nothing skins can screw up (as long as
>> they do not misuse APIs).
> 
> Yes, but it happens that we modify the nucleus from time to time.
> 
>>
>>> Getting it to work is relatively simple: we add a "scheduler change set
>>> remotely" bit to the sched structure which is NOT part of the status
>>> bits, and set this bit when changing a remote sched (under nklock). In
>>> the debug check code, if the scheduler state changed and the XNRESCHED
>>> bit is not set, only consider this a bug if this new bit is not set. All
>>> this is compiled out if debugging is not enabled.
>>
>> I still see no benefit in this check. Where do you want to place the bit
>> set? Aren't those just the same locations where
>> xnsched_set_[self_]resched is already called today?
> 
> Well no, that would be another bit in the sched structure, which would
> allow us to manipulate the status bits from the local cpu. That
> supplementary bit would only be changed from a distant CPU, and would
> serve to detect the race which causes the false positive. The resched
> bits are set on the local cpu to get xnpod_schedule to trigger a
> rescheduling on the distant cpu. That extra bit would be set on the
> remote cpu's sched, and only when debugging is enabled.
> 
>>
>> But maybe you can provide some motivating bug scenarios, real ones of
>> the past or realistic ones of the future.
> 
> Of course. The bug is anything which changes the scheduler state but
> does not set the XNRESCHED bit. This happened when we started the SMP
> port. New scheduling policies would be good candidates for a revival of
> this bug.
> 

You don't gain any worthwhile check if you cannot make the
instrumentation required for stable detection simpler than the proper
solution to the problem itself. And that is what I'm still skeptical of.

Jan

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
