Jan,

I'm very much in favor of providing a way to prevent Xenomai modules from using 
features which can result in deadlock, if there is a clean way to detect such a 
situation.

We used gettimeofday in one of our modules and it mostly worked great. But once 
in a great while the system would deadlock. Most calls to gettimeofday are 
benign and appear to work normally, which is why it is especially problematic. 
It would have saved some debug cycles if there was a kernel log message to warn 
us of our danger.

Or perhaps we could collect a blacklist of references which will produce 
warnings when linking a Xenomai module. All of these things are 'nice to have' 
but certainly not urgent matters.

Regards,
Kendall

> -----Original Message-----
> From: Xenomai <xenomai-boun...@xenomai.org> On Behalf Of Jan Kiszka
> via Xenomai
> Sent: Wednesday, December 19, 2018 4:45 AM
> To: Philippe Gerum <r...@xenomai.org>; Lange Norbert
> <norbert.la...@andritz.com>; Xenomai (xenomai@xenomai.org)
> <xenomai@xenomai.org>
> Subject: Re: Cobalt Preemption of kernel update_fast_timekeeper can cause
> deadlocks
> 
> On 19.12.18 13:09, Philippe Gerum via Xenomai wrote:
> > On 12/19/18 11:20 AM, Lange Norbert via Xenomai wrote:
> >> There is a deadlock issue that haunted me for several weeks, it is
> >> caused by the kernels update of the user-visible timekeeping
> >> structures used by the VDSO clock_gettime functions.
> >>
> >> The kernel regularly updates a Timestamp structure, which is
> >> accessible in user-mode, it does so by something akin to a rw-lock in
> update_fast_timekeeper.
> >>
> >> If cobalt preempts the core during holding the lock, any thread
> >> trying to read the time will continue to spin. (This alone is an issue 
> >> IMHO).
> >> If the cobalt thread itself now call the vDSO function as reader, it
> >> will spin on the lock and block the lock from getting released.
> >>
> >>
> >> Either the update_fast_timekeeper funtion should not be preemptible
> >> by cobalt, or the spin-lock on reading could fallback to the syscall after 
> >> a
> certain amount of retries.
> >>
> >> The later is probably easier to implement, but then could randomly
> demote cobalt threads.
> >> (on the other hand, this would be always a demotion on platforms
> >> without the vdso function)
> >>
> >
> > update_vsyscall() is locking the write-side. If the analysis is correct, 
> > this
> patch may help at the expense of a some cycles more spent uninterruptible:
> >
> > diff --git a/arch/x86/entry/vsyscall/vsyscall_gtod.c
> > b/arch/x86/entry/vsyscall/vsyscall_gtod.c
> > index 9fb89b6e88c3..e9baa57e8385 100644
> > --- a/arch/x86/entry/vsyscall/vsyscall_gtod.c
> > +++ b/arch/x86/entry/vsyscall/vsyscall_gtod.c
> > @@ -32,11 +32,14 @@ void update_vsyscall(struct timekeeper *tk)
> >   {
> >     int vclock_mode = tk->tkr_mono.clock->archdata.vclock_mode;
> >     struct vsyscall_gtod_data *vdata = &vsyscall_gtod_data;
> > +   unsigned long flags;
> >
> >     /* Mark the new vclock used. */
> >     BUILD_BUG_ON(VCLOCK_MAX >= 32);
> >     WRITE_ONCE(vclocks_used, READ_ONCE(vclocks_used) | (1 <<
> > vclock_mode));
> >
> > +   flags = hard_cond_local_irq_save();
> > +
> >     gtod_write_begin(vdata);
> >
> >     /* copy vsyscall data */
> > @@ -77,6 +80,8 @@ void update_vsyscall(struct timekeeper *tk)
> >
> >     gtod_write_end(vdata);
> >
> > +   hard_cond_local_irq_restore(flags);
> > +
> >     if (tk->tkr_mono.clock == &clocksource_tsc)
> >             ipipe_update_hostrt(tk);
> >   }
> >
> 
> This should rather be an application bug: An RT (Xenomai) thread is
> apparently using Linux gettimeofday & Co. (glibc) from RT context. That was
> never supported, we rather have RT services for that
> (CLOCK_HOST_REALTIME).
> 
> We may think about detecting such cases better, though. Norbert, are you
> using native/alchemy APIs?
> 
> Jan
> 
> --
> Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate
> Competence Center Embedded Linux

Reply via email to