> -----Original Message-----
> From: Jan Kiszka <jan.kis...@siemens.com>
> Sent: Friday, 20 August 2021 08:37
> To: Lange Norbert <norbert.la...@andritz.com>; Xenomai
> (xenomai@xenomai.org) <xenomai@xenomai.org>
> Subject: Re: cobalt_assert_nrt should use __cobalt_pthread_kill
>
> On 19.08.21 18:54, Lange Norbert wrote:
> >
> >> -----Original Message-----
> >> From: Jan Kiszka <jan.kis...@siemens.com>
> >> Sent: Thursday, 19 August 2021 17:42
> >> To: Lange Norbert <norbert.la...@andritz.com>; Xenomai
> >> (xenomai@xenomai.org) <xenomai@xenomai.org>
> >> Subject: Re: cobalt_assert_nrt should use __cobalt_pthread_kill
> >>
> >> On 19.08.21 17:24, Lange Norbert wrote:
> >>>
> >>>> -----Original Message-----
> >>>> From: Jan Kiszka <jan.kis...@siemens.com>
> >>>> Sent: Thursday, 19 August 2021 12:54
> >>>> To: Lange Norbert <norbert.la...@andritz.com>; Xenomai
> >>>> (xenomai@xenomai.org) <xenomai@xenomai.org>
> >>>> Subject: Re: cobalt_assert_nrt should use __cobalt_pthread_kill
> >>>>
> >>>> On 19.08.21 11:56, Lange Norbert via Xenomai wrote:
> >>>>> Hello,
> >>>>>
> >>>>> I have a small issue with the cobalt_assert_nrt function. In case
> >>>>> a violation is detected, the thread should get a signal, but the
> >>>>> implementation will implicitly get a signal during the execution
> >>>>> of pthread_kill, see:
> >>>>>
> >>>>> #0 getpid () at ../sysdeps/unix/syscall-template.S:60
> >>>>> #1 0x00007fc1dc4fa0d6 in __pthread_kill (threadid=<optimized out>,
> >>>>>    signo=24) at ../sysdeps/unix/sysv/linux/pthread_kill.c:53
> >>>>> #2 0x00007fc1dc8b2470 in callAssertFunction () at
> >>>>>    /home/lano/git/preload_checkers/src/pchecker.h:199
> >>>>> #3 malloc () at
> >>>>>    /home/lano/git/preload_checkers/src/pchecker_heap_glibc.c:220
> >>>>> #4 <actual instrumented function>
> >>>>>
> >>>>> You see, the signal should happen with the pc of #2, not from the
> >>>>> implementation of glibc (or whatever C library).
> >>>>> So the function should be changed to:
> >>>>>
> >>>>> void cobalt_assert_nrt(void)
> >>>>> {
> >>>>>         if (cobalt_should_warn())
> >>>>>                 __cobalt_pthread_kill(pthread_self(), SIGDEBUG);
> >>>>> }
> >>>>>
> >>>>> (or even replaced with the raw syscall?)
> >>>>>
> >>>>
> >>>> Hmm, that's similar to an assert causing a lengthy trace, not
> >>>> failing directly at the place where the assert was raised:
> >>>>
> >>>> #0 0x00007ffff7a3918b in raise () from /lib64/libc.so.6
> >>>> #1 0x00007ffff7a3a585 in abort () from /lib64/libc.so.6
> >>>> #2 0x00007ffff7a3185a in __assert_fail_base () from /lib64/libc.so.6
> >>>> #3 0x00007ffff7a318d2 in __assert_fail () from /lib64/libc.so.6
> >>>> #4 0x0000000000400524 in main () at assert.c:5
> >>>>
> >>>> What is your practical problem with the current implementation? Do
> >>>> you expect a specific SIGDEBUG reason?
> >>>
> >>> A better stacktrace.
> >>> (I actually cut the trace in the signal handler in case of hitting
> >>> an __assert_fail)
> >>
> >> The backtrace should still point to the right function that caused
> >> the migration. I miss cobalt_assert_nrt() in your backtrace though,
> >> but that should have nothing to do with how it is implemented. Are
> >> you actually using cobalt_assert_nrt() from libcobalt?
> >
> > Yes, but I dlsym it.
> > I would prefer if cobalt_assert_nrt were the start of the trace.
> >
>
> That it always does under normal constraints - please check your local
> setup, this is not a generic problem. It's your pchecker.h:199 which
> issues the syscall directly, rather than calling cobalt_assert_nrt().
> Maybe that's because of lazy symbol resolution?

No, the start of the stacktrace is the first Linux syscall resulting from the
pthread_kill call (getpid() in my case).
GDB has problems displaying the frame function, but it is cobalt_assert_nrt.
I believe this happens if you use '-fno-plt'.

> >>
> >>> BTW, __cobalt_pthread_kill(pthread_self(), SIGDEBUG) doesn't seem to
> >>> do a thing, doesn't handle SIGDEBUG?
> >>>
> >>
> >> It only triggers the signal (in one way or another...). Handling is
> >> up to the application. If you don't handle that, the application is
> >> terminated, obviously.
> >
> > The application continues running. But I did not try with
> > __cobalt_pthread_kill(pthread_self(), SIGDEBUG) but
> > XENOMAI_SYSCALL2(sc_cobalt_thread_kill, thread, sig).
> > Means the cobalt syscall is not handling the signal.
>
> A syscall does not handle signals.
>
> By calling the cobalt version of pthread_kill, you queue the signal for
> synchronous RT processing (sigwait).

I believe this only happens for RT signals; the rest get delegated
to __STD(thread_kill) in __cobalt_pthread_kill?

> >
> > So to satisfy my OCD, toggling the modeswitch signals off/on would
> > be correct I guess
> >
> > pthread_setmode_np(PTHREAD_WARNSW, 0, NULL);
> > pthread_kill(pthread_self(), SIGDEBUG);
> > pthread_setmode_np(0, PTHREAD_WARNSW, NULL);
> >
> > or even just using a linux syscall:
> >
> > getpid();
>
> A syscall will remain the source of the signal, no change on the origin
> of the backtrace.

The "origin" changes, see the part of the stacktrace you did not quote.
I believe this is just some semantic/language issue with origin/source.
After using the raw Linux getpid syscall, the first line of the trace shows
the frame IP from cobalt_assert_nrt instead of the getpid function.

> >
> > Point being that right now you trap at least twice
>
> That is a different point. So far, you were complaining about getting a
> wrong backtrace which is not caused by triggering a SIGDEBUG twice.
> If you want to prevent a duplicate event, triggering a syscall only or
> disabling the warning for the syscall itself can be options.

Well, the pthread_kill(pthread_self(), SIGDEBUG) will not cause my signal handler
to be called *by sending a signal*. At this point it's ensured that PTHREAD_WARNSW
is enabled, and the C library will trigger a mode switch at least once
(at least 2 times with glibc-2.28) before sending the signal.

It would be more direct to call abort() in case this should really be an "assert"
(no way of returning). Or send just one signal, like with

(void)DO_SYSCALL(SYS_getpid, 0)

That has just one guaranteed mode switch + signal handler invocation,
and the frame IP points to cobalt_assert_nrt.

> But I consider this really a minor issue.

Yeah, I said as much; I'd still like better diagnostics if I can get them.

Norbert
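
PS: For reference, the two variants discussed in this thread can be sketched in
plain C roughly as below. This is only a sketch under assumptions, not the
libcobalt implementation: SIGDEBUG is taken to be SIGXCPU (which matches the
signo=24 in the first backtrace on x86), and the names debug_hits, on_sigdebug,
install_sigdebug_handler, assert_nrt_via_pthread_kill and
assert_nrt_via_raw_syscall are made up for illustration. It builds without
Xenomai (gcc -pthread); on plain Linux the raw getpid syscall raises nothing,
since the mode-switch warning only exists for a Cobalt thread with
PTHREAD_WARNSW set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: Xenomai maps SIGDEBUG to SIGXCPU; mirrored here so the
 * sketch compiles without the Xenomai headers. */
#ifndef SIGDEBUG
#define SIGDEBUG SIGXCPU
#endif

/* Hypothetical counter: number of times the handler has run. */
volatile sig_atomic_t debug_hits;

/* Hypothetical handler: only counts invocations. A real one would
 * typically dump a backtrace starting at the interrupted frame. */
void on_sigdebug(int sig)
{
	(void)sig;
	debug_hits++;
}

/* Hypothetical helper: install the counting handler for SIGDEBUG. */
int install_sigdebug_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigdebug;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGDEBUG, &sa, NULL);
}

/* Variant A: go through the C library's pthread_kill. Under Cobalt with
 * PTHREAD_WARNSW set, the Linux syscalls issued inside libc would already
 * trap before the explicit signal is sent. On plain Linux this simply
 * delivers one signal to the calling thread. */
int assert_nrt_via_pthread_kill(void)
{
	return pthread_kill(pthread_self(), SIGDEBUG);
}

/* Variant B: issue exactly one raw Linux syscall. Under Cobalt with
 * PTHREAD_WARNSW set, this alone is expected to cause one mode switch
 * and one SIGDEBUG, with the caller as the interrupted frame. On plain
 * Linux it just returns the PID and no signal is raised. */
long assert_nrt_via_raw_syscall(void)
{
	return syscall(SYS_getpid);
}
```

Calling install_sigdebug_handler() once and then either variant is enough to
compare how often the handler fires and which frame it reports.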