On 05.11.2010 00:24, Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> On 04.11.2010 23:06, Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
>>>>>> At first sight, here you are more breaking things than cleaning them.
>>>>> Still, it has the SMP record for my test program, still runs with ftrace
>>>>> on (after 2 hours, where it previously failed after maximum 23 minutes).
>>>> My version was indeed still buggy, I'm reworking it ATM.
>>>>
>>>>> If I get the gist of Jan's changes, they are (using the IPI to transfer
>>>>> one bit of information: your cpu needs to reschedule):
>>>>>
>>>>> xnsched_set_resched:
>>>>> - setbits((__sched__)->status, XNRESCHED);
>>>>>
>>>>> xnpod_schedule_handler:
>>>>> + xnsched_set_resched(sched);
>>>>>
>>>>> If you (we?) decide to keep the debug checks, under what circumstances
>>>>> would the current check trigger (in layman's language, that I'll be able
>>>>> to understand)?
>>>> That's actually what /me is wondering as well. I do not see yet how you
>>>> can reliably detect a missed reschedule (that was the purpose
>>>> of the debug check) given the racy nature between signaling resched and
>>>> processing the resched hints.
>>> The purpose of the debugging change is to detect a change of the
>>> scheduler state which was not followed by setting the XNRESCHED bit.
>>
>> But that is nucleus business, nothing skins can screw up (as long as
>> they do not misuse APIs).
>
> Yes, but it happens that we modify the nucleus from time to time.
>
>>
>>> Getting it to work is relatively simple: we add a "scheduler change set
>>> remotely" bit to the sched structure which is NOT in the status bit, set
>>> this bit when changing a remote sched (under nklock). In the debug check
>>> code, if the scheduler state changed, and the XNRESCHED bit is not set,
>>> only consider this a bug if this new bit is not set. All this is
>>> compiled out if the debug is not enabled.
>>
>> I still see no benefit in this check. Where do you want to place the bit
>> set? Isn't that just the same locations where
>> xnsched_set_[self_]resched already is today?
>
> Well no, that would be another bit in the sched structure which would
> allow us to manipulate the status bits from the local cpu. That
> supplementary bit would only be changed from a distant CPU, and serve to
> detect the race which causes the false positive. The resched bits are
> set on the local cpu to get xnpod_schedule to trigger a rescheduling on
> the distant cpu. That bit would be set on the remote cpu's sched. Only
> when debugging is enabled.
>
>>
>> But maybe you can provide some motivating bug scenarios, real ones of
>> the past or realistic ones of the future.
>
> Of course. The bug is anything which changes the scheduler state but
> does not set the XNRESCHED bit. This happened when we started the SMP
> port. New scheduling policies would be good candidates for a revival of
> this bug.
>
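For reference, the debug scheme proposed in the quoted text could be sketched roughly as below. This is only an illustration under assumptions, not nucleus code: struct sched_sketch, its fields, the helper names and the value given to XNRESCHED are all made up for the example, and the locking (nklock) and the IPI are left out entirely.

```c
#include <stdbool.h>

/* Stand-in value for the sketch; the real XNRESCHED is defined by the nucleus. */
#define XNRESCHED 0x1u

/* Hypothetical, heavily simplified per-CPU scheduler slot (not xnsched_t). */
struct sched_sketch {
    unsigned int status;      /* status bits, XNRESCHED among them */
    unsigned int state_seq;   /* bumped on every scheduler state change */
    unsigned int checked_seq; /* state_seq as seen by the last debug check */
    bool remote_change;       /* debug-only bit, kept outside the status bits */
};

/* Correct local change: the state changes and XNRESCHED is set. */
static void local_change(struct sched_sketch *s)
{
    s->state_seq++;
    s->status |= XNRESCHED;
}

/* The bug the check is after: the state changes, XNRESCHED is forgotten. */
static void buggy_local_change(struct sched_sketch *s)
{
    s->state_seq++;
}

/*
 * Change applied to another CPU's sched. The caller is assumed to hold the
 * big lock and to record the resched request on its own side; here we only
 * flag the target so that its debug check does not report a false positive.
 */
static void change_remote_sched(struct sched_sketch *target)
{
    target->state_seq++;
    target->remote_change = true;
}

/* Debug check: true means "state changed, nobody asked for a reschedule". */
static bool debug_check_missed_resched(struct sched_sketch *s)
{
    bool changed = s->state_seq != s->checked_seq;
    bool missed = changed && !(s->status & XNRESCHED) && !s->remote_change;

    /* Re-arm the check for the next round. */
    s->checked_seq = s->state_seq;
    s->remote_change = false;
    return missed;
}
```

In this sketch only buggy_local_change() makes the check fire; a change made through change_remote_sched() is excused by the extra bit. Whether the real code could keep that bit consistent without races is exactly the open question of this thread.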
You don't gain any worthwhile check if you cannot make the instrumentation
required for a stable detection simpler than the proper problem solution
itself. And this is what I'm still skeptical of.

Jan
_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core