> -----Original Message-----
> From: Jan Kiszka <jan.kis...@siemens.com>
> Sent: Freitag, 20. August 2021 08:37
> To: Lange Norbert <norbert.la...@andritz.com>; Xenomai
> (xenomai@xenomai.org) <xenomai@xenomai.org>
> Subject: Re: cobalt_assert_nrt should use __cobalt_pthread_kill
>
>
>
> CAUTION: External email. Do not click on links or open attachments unless you
> know the sender and that the content is safe.
>
> On 19.08.21 18:54, Lange Norbert wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jan Kiszka <jan.kis...@siemens.com>
> >> Sent: Donnerstag, 19. August 2021 17:42
> >> To: Lange Norbert <norbert.la...@andritz.com>; Xenomai
> >> (xenomai@xenomai.org) <xenomai@xenomai.org>
> >> Subject: Re: cobalt_assert_nrt should use __cobalt_pthread_kill
> >>
> >>
> >>
> >> CAUTION: External email. Do not click on links or open attachments
> >> unless you know the sender and that the content is safe.
> >>
> >> On 19.08.21 17:24, Lange Norbert wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Jan Kiszka <jan.kis...@siemens.com>
> >>>> Sent: Donnerstag, 19. August 2021 12:54
> >>>> To: Lange Norbert <norbert.la...@andritz.com>; Xenomai
> >>>> (xenomai@xenomai.org) <xenomai@xenomai.org>
> >>>> Subject: Re: cobalt_assert_nrt should use __cobalt_pthread_kill
> >>>>
> >>>>
> >>>>
> >>>> CAUTION: External email. Do not click on links or open attachments
> >>>> unless you know the sender and that the content is safe.
> >>>>
> >>>> On 19.08.21 11:56, Lange Norbert via Xenomai wrote:
> >>>>> Hello,
> >>>>>
> >>>>> I have some small slight issue with the cobalt_assert_nrt
> >>>>> function, incase a violation is detected the thread should get a
> >>>>> signal, but the implementation will implicitly get a signal during
> >>>>> the execution of
> >>>> pthread_kill, see:
> >>>>>
> >>>>>
> >>>>> #0  getpid () at ../sysdeps/unix/syscall-template.S:60
> >>>>> #1  0x00007fc1dc4fa0d6 in __pthread_kill (threadid=<optimized
> >>>>> out>,
> >>>>> signo=24) at ../sysdeps/unix/sysv/linux/pthread_kill.c:53
> >>>>> #2  0x00007fc1dc8b2470 in callAssertFunction () at
> >>>>> /home/lano/git/preload_checkers/src/pchecker.h:199
> >>>>> #3  malloc () at
> >>>>> /home/lano/git/preload_checkers/src/pchecker_heap_glibc.c:220
> >>>>> #4 <actual instrumented function>
> >>>>>
> >>>>> You see, the signal should happen with the pc of #2, not from the
> >>>> implementation of glibc (or whatever c library).
> >>>>> So the function should be changed to:
> >>>>>
> >>>>> void cobalt_assert_nrt(void)
> >>>>> {
> >>>>>             if (cobalt_should_warn())
> >>>>>                         __cobalt_pthread_kill(pthread_self(),
> >>>>> SIGDEBUG); }
> >>>>>
> >>>>> (or even replaced with the raw syscall ?)
> >>>>>
> >>>>
> >>>> Hmm, that's similar to an assert causing a lengthy trace, not
> >>>> failing directly at the place where the assert was raised:
> >>>>
> >>>> #0  0x00007ffff7a3918b in raise () from /lib64/libc.so.6
> >>>> #1  0x00007ffff7a3a585 in abort () from /lib64/libc.so.6
> >>>> #2  0x00007ffff7a3185a in __assert_fail_base () from
> >>>> /lib64/libc.so.6
> >>>> #3  0x00007ffff7a318d2 in __assert_fail () from /lib64/libc.so.6
> >>>> #4  0x0000000000400524 in main () at assert.c:5
> >>>>
> >>>> What is your practical problem with the current implementation? Do
> >>>> you expect a specific SIGDEBUG reason?
> >>>
> >>> A better stacktrace. (I actually cut the trace in the signal handler
> >>> in case of hitting an __assert_fail)
> >>
> >> The backtrace should still point to the right function that caused the
> migration.
> >> I miss cobalt_assert_nrt() in your backtrace though, but that should
> >> have nothing to do with how it is implemented. Are you actually using
> >> cobalt_assert_nrt() from libcobalt?
> >
> > Yes, but I dlsym it.
> > I would prefer if the cobalt_assert_nrt would be the start of the trace.
> >
>
> That it always does under normal constraints - please check your local setup,
> this is not a generic problem. It's your pchecker.h:199 which issues the 
> syscall
> directly, rather than calling cobalt_assert_nrt().
> Maybe that's because of lazy symbol resolution?

No, the start of the stacktrace is the first linux syscall resulting from the 
pthread_kill
Call (getpid() in my case)

Gdb has problems displaying the frame function, but it is cobalt_assert_nrt.
I believe this happens if you use '-fno-plt'

>
> >>
> >>> BTW, __cobalt_pthread_kill(pthread_self(), SIGDEBUG) doesn’t seem to
> >>> do a
> >> thing, doesn’t handle SIGDEBUG?
> >>>
> >>
> >> It only triggers the signal (in one way or another...). Handling is
> >> up to the application. If you don't handle that, the application is
> terminated, obviously.
> >
> > The application continues running. But I did not try with
> > __cobalt_pthread_kill(pthread_self(), SIGDEBUG) but
> XENOMAI_SYSCALL2(sc_cobalt_thread_kill, thread, sig).
> > Means the cobalt syscall is not handling the signal.
>
> A syscall does not handle signals.
>
> By calling the cobalt version of pthread_kill, you queue the signal for
> synchronous RT processing (sigwait).

I believe this does only happen for RT signals, the rest get delegated
To __STD(thread_kill) in __cobalt_pthread_kill ?

>
> >
> > So for to satisfy my OCD toggling off/on the modeswitch signals would
> > be correct I guess
> >
> > pthread_setmode_np(PTHREAD_WARNSW, 0, NULL);
> > pthread_kill(pthread_self(), SIGDEBUG); pthread_setmode_np(0,
> > PTHREAD_WARNSW, NULL);
> >
> > or even just using a linux syscall:
> >
> > getpid();
>
> A syscall will remain the source of the signal, no change on the origin of the
> backtrace.

The "origin" changes, see the part of the stacktrace you did not quote.
I believe this is just some semantic/language issue ith origin/source,
After using the raw linux getpid syscall I get the frame-ip from 
cobalt_assert_nrt
For the first line, instead of the getpid function.

>
> >
> > Point being that right now you trap alteast twice
>
> That is a different point. So far, you were complaining about getting a wrong
> backtrace which is not caused by triggering a SIGDEBUG twice. If you want to
> prevent a duplicate event, triggering a syscall only or disabling the warning 
> for
> the syscall itself can be options.

Well, the pthread_kill(pthread_self(), SIGDEBUG) will not cause my signal 
handler
to be called *by sending a signal*. At this point its ensured that 
PTHREAD_WARNSW
is enabled and the c library will trigger an mode switch at least once (atleast 
2 times with glibc-2.28)
before sending the signal.

it would be more direct to call abort() in case this should really be a 
"assert" (no way of returning).

Or send just one signal like with (void)DO_SYSCALL(SYS_getpid, 0)
That has just one guaranteed modeswitch + signal handler invocation,
And the frame ip points to cobalt_assert_nrt

> But I consider this really a minor issue.

Yeah I said as much, still like better diagnostics if I can get them.

Norbert
________________________________

This message and any attachments are solely for the use of the intended 
recipients. They may contain privileged and/or confidential information or 
other information protected from disclosure. If you are not an intended 
recipient, you are hereby notified that you received this email in error and 
that any review, dissemination, distribution or copying of this email and any 
attachment is strictly prohibited. If you have received this email in error, 
please contact the sender and delete the message and any attachment from your 
system.

ANDRITZ HYDRO GmbH


Rechtsform/ Legal form: Gesellschaft mit beschränkter Haftung / Corporation

Firmensitz/ Registered seat: Wien

Firmenbuchgericht/ Court of registry: Handelsgericht Wien

Firmenbuchnummer/ Company registration: FN 61833 g

DVR: 0605077

UID-Nr.: ATU14756806


Thank You
________________________________

Reply via email to