On 14.11.19 02:58, Jeff Webb via Xenomai wrote:
Lange Norbert via Xenomai wrote:
From: Jan Kiszka <jan.kiszka at siemens.com>
On 13.11.19 16:18, Lange Norbert via Xenomai wrote:
I am running into some bad issues with debugging. I can't really narrow
down when they happen, but usually when I run through GDB and want to
"break" (pause execution), it seems to be related to *other* Xenomai
programs running at the same time (as I said, it's hard to narrow down).
We have a gdb test case. Does it trigger for you as well when you run some
other program in parallel?
Also, could you provide the full kernel log? Enabling the I-pipe
tracer with panic dump could possibly be useful as well. But the most important
step would be to make this reproducible for a third party like me.
Currently the issue is gone, and I don't have time to research the cause.
Is panic dump a kernel configuration option?
I think one of my colleagues has experienced something similar.
He said that when one application was stopped at a breakpoint,
sem_timedwait calls in another application did not time out
until the stopped application was resumed. I will ask and see
if he can put together a reproducible test case. I know the
problem was repeatable at one point with the two applications
he was working with.
This particular behavior is solved in 3.1 by
https://gitlab.denx.de/Xenomai/xenomai/commit/9ebc2b6ea49406026e9e69d8fa490b3f8d8f0a24.
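To check whether a stalled timeout like the one described above still shows up (for example, on a build without that 3.1 commit), a minimal sketch along these lines could be run in one process while a second Xenomai application sits at a gdb breakpoint. It is plain POSIX C; it assumes the program is built against libcobalt (typically with the flags reported by `xeno-config`) so that `sem_timedwait` goes through Cobalt. The helper name `timedwait_overshoot_ms` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical helper: wait on a never-posted semaphore with an absolute
   CLOCK_REALTIME deadline timeout_ms in the future and return how many
   milliseconds later than requested sem_timedwait() actually returned.
   Returns -1 on bad input or if the wait did not end with ETIMEDOUT. */
long timedwait_overshoot_ms(long timeout_ms)
{
    sem_t sem;
    struct timespec t0, t1, deadline;
    int rc, err;

    if (timeout_ms < 0 || sem_init(&sem, 0, 0) != 0)
        return -1;

    /* Start the monotonic stopwatch before reading the wall clock, so the
       measured interval cannot normally be shorter than the timeout. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* Retry with the same absolute deadline if a signal interrupts us. */
    do {
        rc = sem_timedwait(&sem, &deadline);
    } while (rc == -1 && errno == EINTR);
    err = errno; /* save before other calls can modify errno */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    sem_destroy(&sem);

    /* Nobody posts the semaphore, so only a timeout is expected. */
    if (rc != -1 || err != ETIMEDOUT)
        return -1;

    long long ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                 + (t1.tv_nsec - t0.tv_nsec);
    return (long)(ns / 1000000LL) - timeout_ms;
}
```

Calling `timedwait_overshoot_ms(100)` in a loop and printing the result should normally give values close to zero; if the stall is present, the overshoot would be on the order of the time the other process spends stopped in gdb.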
I have personally experienced what seems (to me) to be a similar
issue involving signal handling where a signal handling thread
received a SIGINT via sigwait (other threads had SIGINT blocked),
and tried to set a global variable that should have caused the
other threads to terminate. However, the other threads would not
wake up from sem_timedwait calls (or even sleep calls) after the
SIGINT was received by the signal handling thread, so they did
not terminate properly. The same code worked fine under
Xenomai 2.6. I tried to create a standalone example to reproduce
this today, but I couldn't recreate the problem. I know it was very
reproducible when I was constructing a work-around for it.
Could it be that some fault puts the system into a bad state with
respect to signal handling (SIGINT/debugging) that persists until
a reboot?
Just trying to shed some light on the problem. I think there is
a bug here somewhere...
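The shutdown pattern described above (SIGINT blocked everywhere, one thread consuming it with `sigwait`, workers polling a global flag between `sem_timedwait` timeouts) can be sketched as a small stand-alone test. This is a hypothetical reconstruction, not the original application code; `run_shutdown_demo`, `stop_requested`, `MAX_WORKERS`, and the 100 ms poll interval are assumptions. On a correctly behaving system every worker notices the flag within one timeout period, so the function returns the number of workers; a hang in `pthread_join` would correspond to the failure described.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <time.h>

#define MAX_WORKERS 16

static atomic_int stop_requested; /* set once SIGINT has been received */
static sem_t work_sem;            /* never posted: workers only time out */

/* Worker: block in sem_timedwait() with a 100 ms deadline, then re-check
   the shutdown flag after every return. */
static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_requested)) {
        struct timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);
        ts.tv_nsec += 100 * 1000000L;
        if (ts.tv_nsec >= 1000000000L) {
            ts.tv_sec += 1;
            ts.tv_nsec -= 1000000000L;
        }
        (void)sem_timedwait(&work_sem, &ts); /* expected: ETIMEDOUT */
    }
    return NULL;
}

/* Signal thread: the only place SIGINT is consumed, via sigwait(). */
static void *signal_thread(void *arg)
{
    int sig;
    sigwait((const sigset_t *)arg, &sig);
    atomic_store(&stop_requested, 1);
    return NULL;
}

/* Start nworkers workers plus one signal thread, send SIGINT, and return
   how many workers were joined after shutdown (-1 on setup failure). */
int run_shutdown_demo(int nworkers)
{
    pthread_t workers[MAX_WORKERS], sigthr;
    sigset_t set, old;
    int created = 0, joined = 0;

    if (nworkers < 1 || nworkers > MAX_WORKERS || sem_init(&work_sem, 0, 0) != 0)
        return -1;
    atomic_store(&stop_requested, 0);

    /* Block SIGINT before creating threads so all of them inherit the mask. */
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    pthread_sigmask(SIG_BLOCK, &set, &old);

    if (pthread_create(&sigthr, NULL, signal_thread, &set) != 0) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        sem_destroy(&work_sem);
        return -1;
    }
    while (created < nworkers &&
           pthread_create(&workers[created], NULL, worker, NULL) == 0)
        created++;

    /* Stand-in for Ctrl-C: direct SIGINT at the signal thread. */
    pthread_kill(sigthr, SIGINT);
    pthread_join(sigthr, NULL);

    for (int i = 0; i < created; i++)
        if (pthread_join(workers[i], NULL) == 0)
            joined++;

    pthread_sigmask(SIG_SETMASK, &old, NULL);
    sem_destroy(&work_sem);
    return joined;
}
```

Blocking SIGINT before any thread is created matters here: new threads inherit the creator's signal mask, so `sigwait` in the dedicated thread is the only consumer. A process-directed SIGINT from Ctrl-C (instead of `pthread_kill`) should behave the same way, since it stays pending until that `sigwait` picks it up.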
Stand-alone test cases or test sequences are always welcome! Just please
also make sure to test with 3.1-rc, as the debugging code changed quite a bit there.
Jan
--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux