urcu/lttng (Userspace) and Xenomai
Hello,

I am trying to figure out whether Xenomai would work correctly with LTTng. I haven't yet figured out how the system manages buffers, but I am checking whether this would be generally applicable to Xenomai.

I'd like to know if anyone has already used LTTng-UST with Xenomai threads, and whether lttng/liburcu needs to be compiled specifically for Xenomai or needs any patches. (I haven't seen anything that indicates it would not work.)

## urcu flavours

liburcu has a few variants; LTTng uses the bulletproof one. Most others should be faster on average, but all of them might unlock a futex with a raw syscall.

Other flavours like qsbr could likely be faster if the futex syscall were replaced with a Cobalt mutex (it's very unlikely this path is executed). It would need some work to get this done (and to get LTTng to use it).

## sys_membarrier

Recent kernels and liburcu versions support this syscall, which supposedly allows removal of reader memory barriers. The syscall will somehow interrupt the threads (all *running threads* of the process), which implicitly causes a barrier for readers.

Q: I guess this will *not* interrupt Xenomai threads, as their shadow Linux thread is not *running*?

Q: x86_64 accesses are strictly ordered, do you actually need membarriers at all?

Kind regards,
Norbert

This message and any attachments are solely for the use of the intended recipients. They may contain privileged and/or confidential information or other information protected from disclosure. If you are not an intended recipient, you are hereby notified that you received this email in error and that any review, dissemination, distribution or copying of this email and any attachment is strictly prohibited. If you have received this email in error, please contact the sender and delete the message and any attachment from your system.
ANDRITZ HYDRO GmbH Rechtsform/ Legal form: Gesellschaft mit beschränkter Haftung / Corporation Firmensitz/ Registered seat: Wien Firmenbuchgericht/ Court of registry: Handelsgericht Wien Firmenbuchnummer/ Company registration: FN 61833 g DVR: 0605077 UID-Nr.: ATU14756806 Thank You
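Whether the running kernel offers the membarrier commands at all can be probed from userspace. The following is a minimal sketch, assuming a Linux system whose headers provide `linux/membarrier.h`; the helper names `membarrier_query`, `has_private_expedited` and `membarrier_register_private` are made up for illustration and are not part of liburcu or LTTng. It only reports what the kernel advertises and says nothing about how the call behaves towards Xenomai threads.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns the bitmask of membarrier commands the kernel supports,
 * or -1 if the syscall is unavailable (errno is set by syscall()). */
int membarrier_query(void)
{
    return (int)syscall(__NR_membarrier, MEMBARRIER_CMD_QUERY, 0);
}

/* Pure helper: does a mask returned by membarrier_query() include the
 * private expedited command? A negative mask means the query failed. */
int has_private_expedited(int mask)
{
    return mask >= 0 && (mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED) != 0;
}

/* A process has to register before it may issue the private expedited
 * command. Returns 0 on success, -1 on failure. */
int membarrier_register_private(void)
{
    return (int)syscall(__NR_membarrier,
                        MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0);
}
```

A caller would typically check `has_private_expedited(membarrier_query())` once at startup and fall back to ordinary reader-side barriers when it returns 0.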
Re: urcu/lttng (Userspace) and Xenomai
On 21.11.19 11:26, Lange Norbert via Xenomai wrote:
> I'd like to know if anyone has already used Lttng UST with xenomai threads, and if there is any need to compile lttng/liburcu for xenomai or using some patches.
> [...]
> Q: I guess this will *not* interrupt xenomai threads, as their shadow linux thread is not *running*?
> Q: x86_64 accesses are strictly ordered, do you actually need membarriers at all?

I didn't look into details of enabling userspace lttng yet, but I had a chat with Mathieu about this, maybe a year ago. He said back then that there is also a polling mode where a data collection thread is simply trying to obtain the trace output time-driven. Then the producer (including cobalt threads) would not need any syscall at all.

As I said, that was just a conceptual discussion. None of us actually looked into the implementation.

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
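The polling idea can be illustrated with a single-producer, single-consumer ring in shared memory. This is a minimal sketch using C11 atomics, not LTTng's actual ring buffer; `trace_ring`, `ring_push`, `ring_poll` and the slot count are hypothetical names and values chosen for the example. The producer side consists only of loads and stores, so a real-time thread calling it never enters the kernel; the consumer is expected to call `ring_poll` periodically from a timer-driven thread.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 8u /* hypothetical size; must be a power of two */

struct trace_ring {
    _Atomic uint32_t head; /* next slot to write, advanced by the producer */
    _Atomic uint32_t tail; /* next slot to read, advanced by the consumer */
    uint64_t slots[RING_SLOTS];
};

/* Producer side: no syscall, no lock. Returns false (event dropped)
 * when the consumer has not yet drained the ring. */
bool ring_push(struct trace_ring *r, uint64_t event)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SLOTS)
        return false;

    r->slots[head & (RING_SLOTS - 1)] = event;
    /* Release: the slot contents become visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: called time-driven. Copies up to max pending events
 * into out and returns how many were copied. */
size_t ring_poll(struct trace_ring *r, uint64_t *out, size_t max)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t n = 0;

    while (tail != head && n < max) {
        out[n++] = r->slots[tail & (RING_SLOTS - 1)];
        tail++;
    }
    /* Release: the slots are free for reuse only after they were copied. */
    atomic_store_explicit(&r->tail, tail, memory_order_release);
    return n;
}
```

Dropping events when the ring is full keeps the producer wait-free, which is the property that matters for a real-time thread; a blocking producer would reintroduce the need for a wake-up syscall.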
RE: urcu/lttng (Userspace) and Xenomai
> -----Original Message-----
> From: Jan Kiszka
> Sent: Thursday, 21 November 2019 14:46
> To: Lange Norbert; Xenomai (xenomai@xenomai.org)
> Subject: Re: urcu/lttng (Userspace) and Xenomai
>
> On 21.11.19 11:26, Lange Norbert via Xenomai wrote:
> > [...]
>
> I didn't look into details of enabling userspace lttng yet, but I had a chat
> with Mathieu about this, maybe a year ago. He said back then that there is
> also a polling mode where a data collection thread is simply trying to obtain
> the trace output time-driven.
I believe that's the "bulletproof" RCU mode that LTTng uses. I don't see any OS-level synchronization in the readers, only some atomic variables.

Is Mathieu an LTTng dev?

> Then the producer (including cobalt threads) would
> not need any syscall at all.

In the context of LTTng those are readers (of the shared RCU structures); writes would only happen if tracepoint providers are added or removed. But then I don't know how the buffers are managed; this appears to be system-wide, in another process.

The sys_membarrier syscall would be called by writers (not Xenomai threads) so that instructions like dmb (on ARM) around atomic accesses can additionally be removed for the readers. I think it's useless on x86_64, and the syscall itself would not do anything for running Xenomai threads. (You can only force the syscall but not disable it, without changing the sources, that is.)

> As I said, that was just a conceptual discussion.
> None of us actually looked into the implementation.

Hmm, I would like to test this soon. I still need a way to totally disable it in case something goes wrong, i.e. ugly macro magic.

Can you confirm that I am right about membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED) not blocking until the running Xenomai thread has had some sort of syscall synchronization?

Norbert
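The trade-off described in this message, a cheap reader-side barrier when writers issue membarrier versus a real fence otherwise, can be sketched with a compile-time switch. This is only an illustration of the asymmetric-barrier idea and of the kind of macro that could disable it; `READER_USES_MEMBARRIER`, `reader_barrier`, `reader_enter`, `reader_exit` and the single global flag are hypothetical and much simpler than what liburcu does (which keeps per-thread state).

```c
#include <stdatomic.h>

/* Hypothetical build switch: 1 if writers are known to issue
 * membarrier(), 0 to fall back to real fences in the readers. */
#ifndef READER_USES_MEMBARRIER
#define READER_USES_MEMBARRIER 0
#endif

/* Simplified: one global flag instead of per-thread reader state. */
atomic_int reader_active;

void reader_barrier(void)
{
#if READER_USES_MEMBARRIER
    /* Compiler barrier only; the writer's membarrier() call is assumed
     * to supply the hardware ordering on every running thread. */
    atomic_signal_fence(memory_order_seq_cst);
#else
    /* Full hardware fence, for example dmb on ARM. */
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

void reader_enter(void)
{
    atomic_store_explicit(&reader_active, 1, memory_order_relaxed);
    reader_barrier();
}

void reader_exit(void)
{
    reader_barrier();
    atomic_store_explicit(&reader_active, 0, memory_order_relaxed);
}
```

Building with `-DREADER_USES_MEMBARRIER=0` keeps the readers correct without relying on the syscall, at the cost of a fence on every enter and exit.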
Re: Deadlock during debugging
On 11/19/19 6:39 PM, Philippe Gerum wrote:
> On 11/19/19 5:46 PM, Philippe Gerum via Xenomai wrote:
>> On 11/18/19 4:13 PM, Lange Norbert via Xenomai wrote:
>>> Hello,
>>>
>>> Here's one of my deadlocks, the output seems interleaved from 2 concurrent dumps,
>>> I ran the crashlog through decode_stacktrace.sh.
>>>
>>
>> Ok, I can reproduce this one, including in a vm. The symptom can be either a
>> lockup, or recursive faults. I'm on it.
>>
>
> #0b9e81807 seems to have introduced a regression. Something there may be
> subtly at odds with the core scheduler logic. More later.

Eh, no. #0b9e81807 only exposed a very serious and longstanding issue in the scheduler core, which may cause a CPU to pick threads from a remote runqueue in rare circumstances. And that bug is definitely mine.

I need to review more code to make sure no more horror shows like this one are waiting for prime time. This bug affects all Xenomai series: 3.1, 3.0, 2.x.

--
Philippe.
Re: Deadlock during debugging
On 21.11.19 16:31, Philippe Gerum wrote:
> [...]
> Eh, no. #0b9e81807 only exposed a very serious and longstanding issue into the scheduler core, which may cause a CPU to pick threads from a remote runqueue in rare circumstances. And that bug is definitely mine.
>
> I need to review more code to make sure no more horror shows alike are waiting for prime time. This bug affects all Xenomai series, 3.1, 3.0, 2.x.

Uh... good that we surfaced this now. Curious to see the result!

Jan
Re: urcu/lttng (Userspace) and Xenomai
On 21.11.19 15:15, Lange Norbert wrote:
>> I didn't look into details of enabling userspace lttng yet, but I had a chat with Mathieu about this, maybe a year ago. He said back then that there is also a polling mode where a data collection thread is simply trying to obtain the trace output time-driven.
>
> I believe that's the "bulletproof" rcu mode that lttng uses. I don't see any OS-level synchronization in the readers, only some atomic variables. Mathieu is a lttng dev?

Mathieu Desnoyers is the creator of lttng. And of those nice urcu services and syscalls. Let's try to pull him in... :)

You could also try to place your questions on some lttng channel, I guess.

>> Then the producer (including cobalt threads) would not need any syscall at all.
>
> In the context of lttng those are readers (of the shared rcu structures), writes would only happen if tracepoint providers are added/removed.

Maybe Mathieu had a static (upfront to application start) tracepoint configuration in mind. That's what I would expect from an RT setup at least.

> But then I don't know how the buffers are managed, this appears to be system-wide in another process. The sys_membarrier syscall would be called by writers (not xenomai threads) to additionally allow instructions like dmb (for arm) around atomic accesses to be removed for the readers. I think it's useless for x86_64 and the syscall itself would not do anything for running xenomai threads. (you can only force the syscall but not disable it, without changing sources that is).

While an x86-only view can be ok for a concrete setup, it's better to develop a generic / portable solution that enables lttng for broader use in Xenomai applications.

>> As I said, that was just a conceptual discussion. None of us actually looked into the implementation.
>
> Hmm, would like to test this soon. Still need a way to totally disable it in-case something goes wrong.. ie ugly macro magic. Can you tell me that I am right about membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED) not blocking until the running xenomai thread had some sort of syscall synchronization?

I haven't looked into membarrier semantics in Xenomai context yet. The key question is if the Xenomai task switcher happens to provide the same information to that service as a normal Linux task switch would do. Maybe it's working, just slower, maybe it's stalling with CPUs that only switch between Linux idle and the Xenomai scheduler as a black box from Linux perspective.

Jan