urcu/lttng (Userspace) and Xenomai

2019-11-21 Thread Lange Norbert via Xenomai
Hello,

I am trying to figure out whether Xenomai would work correctly with LTTng.
I haven’t yet figured out how LTTng manages its buffers,
but I am checking whether it would be generally usable with Xenomai.

I’d like to know if anyone has already used LTTng UST with Xenomai threads,
and whether lttng/liburcu needs to be compiled specifically for Xenomai or
needs some patches.
(I haven’t seen anything that indicates it would not work.)

## urcu flavours
liburcu comes in a few flavours; LTTng uses the bulletproof one (urcu-bp). Most
others should be faster on average, but all of them might wake a futex waiter
with a raw syscall.

Other flavours like qsbr could likely be faster if the futex syscall were
replaced with a Cobalt mutex
(it’s very unlikely this path is executed). It would need some work to get this
done (and to get LTTng to use it).
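
The rarely taken wake-up path mentioned above can be sketched roughly as follows. This is a simplified illustration loosely modelled on the reader-side wake-up in liburcu, not the library's actual code: `gp_futex`, `futex_wake()` and `reader_wake_up_writer()` are hypothetical names, and the convention that -1 means "a writer is sleeping" is an assumption of the sketch. The fast path is a single atomic load; only the slow path issues a plain Linux futex(2) call, which is the kind of call that would take a Cobalt thread out of primary mode.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical stand-in for the futex word a grace-period writer sleeps on.
 * Assumed convention: -1 means "a writer is waiting", 0 means "nobody waits". */
static _Atomic int32_t gp_futex;

/* Raw futex(2) wake-up; glibc offers no wrapper, so syscall(2) is used. */
static long futex_wake(_Atomic int32_t *addr, int nr_waiters)
{
    assert(nr_waiters > 0);
    return syscall(SYS_futex, addr, FUTEX_WAKE, nr_waiters, NULL, NULL, 0);
}

/* Reader-side exit path. Returns 1 if the raw syscall was issued,
 * 0 if the syscall-free fast path was taken. */
static int reader_wake_up_writer(void)
{
    if (atomic_load(&gp_futex) != -1)
        return 0;                 /* common case: no waiter, no syscall */
    atomic_store(&gp_futex, 0);
    futex_wake(&gp_futex, 1);     /* rare case: plain Linux syscall */
    return 1;
}
```

Replacing `futex_wake()` with a Cobalt primitive would be the change discussed above; the sketch only shows where that call sits.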

## sys_membarrier
Recent kernels and liburcu versions support this syscall, which supposedly
allows the reader-side memory barriers to be removed.
The syscall somehow interrupts all *running threads* of the
process, which implicitly acts as a memory barrier for the readers.
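
A minimal sketch of the asymmetric-barrier idea described above, assuming Linux kernel headers that provide `MEMBARRIER_CMD_PRIVATE_EXPEDITED` (4.14 or later). The helper names `sys_membarrier()`, `barriers_init()`, `reader_barrier()` and `writer_barrier()` are made up for illustration and are not liburcu's API. Readers keep only a compiler barrier when the expedited command is available, and the writer pays for the full barrier on their behalf; otherwise both sides fall back to a regular fence.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/membarrier.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* -1: barriers_init() not called yet, 0: fallback fences, 1: membarrier. */
static int have_expedited = -1;

/* glibc has no membarrier() wrapper, so go through syscall(2). */
static int sys_membarrier(int cmd, unsigned int flags)
{
    return (int)syscall(SYS_membarrier, cmd, flags, 0);
}

/* Query support and register the process; returns 1 if usable, else 0. */
static int barriers_init(void)
{
    int mask = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);

    have_expedited = mask > 0 &&
        (mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED) &&
        sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) == 0;
    return have_expedited;
}

/* Reader side: only a compiler barrier when the writer can use membarrier. */
static inline void reader_barrier(void)
{
    assert(have_expedited >= 0);  /* barriers_init() must run first */
    if (have_expedited)
        atomic_signal_fence(memory_order_seq_cst);
    else
        atomic_thread_fence(memory_order_seq_cst);
}

/* Writer side: forces a barrier on the running threads of this process.
 * Returns 0 on success, -1 (with errno set) if the syscall failed. */
static inline int writer_barrier(void)
{
    assert(have_expedited >= 0);
    if (have_expedited)
        return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
    atomic_thread_fence(memory_order_seq_cst);
    return 0;
}
```

Whether the expedited command actually reaches a thread running in Xenomai primary mode is the open question here; the sketch does not answer it.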

Q: I guess this will *not* interrupt Xenomai threads, as their shadow Linux
thread is not *running*?
Q: x86_64 accesses are strictly ordered; do you actually need membarriers at
all?

Kind regards, Norbert


This message and any attachments are solely for the use of the intended 
recipients. They may contain privileged and/or confidential information or 
other information protected from disclosure. If you are not an intended 
recipient, you are hereby notified that you received this email in error and 
that any review, dissemination, distribution or copying of this email and any 
attachment is strictly prohibited. If you have received this email in error, 
please contact the sender and delete the message and any attachment from your 
system.

ANDRITZ HYDRO GmbH


Rechtsform/ Legal form: Gesellschaft mit beschränkter Haftung / Corporation

Firmensitz/ Registered seat: Wien

Firmenbuchgericht/ Court of registry: Handelsgericht Wien

Firmenbuchnummer/ Company registration: FN 61833 g

DVR: 0605077

UID-Nr.: ATU14756806


Thank You



Re: urcu/lttng (Userspace) and Xenomai

2019-11-21 Thread Jan Kiszka via Xenomai


I haven't looked into the details of enabling userspace LTTng yet, but I had a
chat with Mathieu about this, maybe a year ago. He said back then that
there is also a polling mode where a data collection thread simply
tries to fetch the trace output on a timer. Then the producers
(including Cobalt threads) would not need any syscall at all. As I said,
that was just a conceptual discussion. Neither of us actually looked into
the implementation.
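
To illustrate the polling idea conceptually: the sketch below is a generic single-producer/single-consumer ring buffer, not LTTng's actual ring buffer, and every name in it (`trace_ring`, `ring_push()`, `ring_drain()`, `RING_SLOTS`) is hypothetical. The producer only touches shared memory with atomic loads and stores, so it never enters the kernel; the consumer is assumed to call `ring_drain()` from a periodic loop instead of waiting to be woken up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 8u   /* power of two, so wrapping is a cheap mask */
static_assert((RING_SLOTS & (RING_SLOTS - 1)) == 0,
              "RING_SLOTS must be a power of two");

struct trace_ring {
    _Atomic uint32_t head;          /* next slot the producer writes */
    _Atomic uint32_t tail;          /* next slot the consumer reads  */
    uint32_t slots[RING_SLOTS];
};

/* Producer (e.g. a real-time thread): no syscall, never blocks.
 * Returns 1 on success, 0 if the ring is full and the event is dropped. */
static int ring_push(struct trace_ring *r, uint32_t event)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SLOTS)
        return 0;
    r->slots[head & (RING_SLOTS - 1)] = event;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer: called periodically; copies up to max events into out[]
 * and returns how many were copied. */
static size_t ring_drain(struct trace_ring *r, uint32_t *out, size_t max)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t n = 0;

    while (tail != head && n < max)
        out[n++] = r->slots[tail++ & (RING_SLOTS - 1)];
    atomic_store_explicit(&r->tail, tail, memory_order_release);
    return n;
}
```

Dropping events when the ring is full, rather than blocking, is a design choice of this sketch that keeps the producer bounded in time.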


Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



RE: urcu/lttng (Userspace) and Xenomai

2019-11-21 Thread Lange Norbert via Xenomai


> -----Original Message-----
> From: Jan Kiszka
> Sent: Thursday, 21 November 2019 14:46
> To: Lange Norbert; Xenomai (xenomai@xenomai.org)
> Subject: Re: urcu/lttng (Userspace) and Xenomai
>
> I didn't look into details of enabling userspace lttng yet, but I had a chat
> with Mathieu about this, maybe a year ago. He said back then that there is
> also a polling mode where a data collection thread is simply trying to obtain
> the trace output time-driven.

I believe that’s the "bulletproof" RCU mode that LTTng uses. I don’t see any
OS-level synchronization in the readers, only some atomic variables.
Is Mathieu an LTTng dev?

> Then the producer (including cobalt threads) would
> not need any syscall at all.

In the context of LTTng those are readers (of the shared RCU structures);
writes would only happen when tracepoint providers are added or removed.

Then again, I don’t know how the buffers are managed; this appears to
be handled system-wide in another process.

The sys_membarrier syscall would be called by writers (not Xenomai threads) so
that instructions like dmb (on ARM) around atomic accesses can additionally be
removed from the readers.
I think it's useless on x86_64, and the syscall itself would not do anything
for running Xenomai threads.
(You can only force the syscall, not disable it, without changing the sources,
that is.)

> As I said, that was just a conceptual discussion.
> None of us actually looked into the implementation.

Hmm, I would like to test this soon. I still need a way to disable it completely
in case something goes wrong, i.e. ugly macro magic.
Can you confirm that I am right that
membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED) does not block until
the running Xenomai thread has gone through some sort of syscall synchronization?

Norbert





Re: Deadlock during debugging

2019-11-21 Thread Philippe Gerum via Xenomai
On 11/19/19 6:39 PM, Philippe Gerum wrote:
> On 11/19/19 5:46 PM, Philippe Gerum via Xenomai wrote:
>> On 11/18/19 4:13 PM, Lange Norbert via Xenomai wrote:
>>> Hello,
>>>
>>> Here's one of my deadlocks, the output seems interleaved from 2 concurrent 
>>> dumps,
>>> I ran the crashlog through decode_stacktrace.sh.
>>>
>>
>> Ok, I can reproduce this one, including in a vm. The symptom can be either a 
>> lockup, or recursive faults. I'm on it.
>>
> 
> #0b9e81807 seems to have introduced a regression. Something there may be 
> subtly at odds with the core scheduler logic. More later.

Eh, no. #0b9e81807 only exposed a very serious and longstanding issue in the
scheduler core, which may cause a CPU to pick threads from a remote runqueue in
rare circumstances. And that bug is definitely mine. I need to review more code
to make sure no similar horror shows are waiting for prime time. This bug
affects all Xenomai series: 3.1, 3.0, 2.x.

-- 
Philippe.



Re: Deadlock during debugging

2019-11-21 Thread Jan Kiszka via Xenomai

On 21.11.19 16:31, Philippe Gerum wrote:
> On 11/19/19 6:39 PM, Philippe Gerum wrote:
>> On 11/19/19 5:46 PM, Philippe Gerum via Xenomai wrote:
>>> On 11/18/19 4:13 PM, Lange Norbert via Xenomai wrote:
>>>> Hello,
>>>>
>>>> Here's one of my deadlocks, the output seems interleaved from 2 concurrent
>>>> dumps,
>>>> I ran the crashlog through decode_stacktrace.sh.
>>>>
>>>
>>> Ok, I can reproduce this one, including in a vm. The symptom can be either a
>>> lockup, or recursive faults. I'm on it.
>>>
>>
>> #0b9e81807 seems to have introduced a regression. Something there may be
>> subtly at odds with the core scheduler logic. More later.
>
> Eh, no. #0b9e81807 only exposed a very serious and longstanding issue in the
> scheduler core, which may cause a CPU to pick threads from a remote runqueue in
> rare circumstances. And that bug is definitely mine. I need to review more code
> to make sure no similar horror shows are waiting for prime time. This bug
> affects all Xenomai series: 3.1, 3.0, 2.x.



Uh... good that we surfaced this now. Curious to see the result!

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



Re: urcu/lttng (Userspace) and Xenomai

2019-11-21 Thread Jan Kiszka via Xenomai

On 21.11.19 15:15, Lange Norbert wrote:

>> I didn't look into details of enabling userspace lttng yet, but I had a chat
>> with Mathieu about this, maybe a year ago. He said back then that there is
>> also a polling mode where a data collection thread is simply trying to obtain
>> the trace output time-driven.
>
> I believe that’s the "bulletproof" rcu mode that lttng uses. I don’t see any
> OS-level synchronization in the readers, only some atomic variables.
> Mathieu is a lttng dev?

Mathieu Desnoyers is the creator of lttng. And of those nice urcu
services and syscalls. Let's try to pull him in... :)

You could also try to place your questions on some lttng channel, I guess.

>> Then the producer (including cobalt threads) would
>> not need any syscall at all.
>
> In the context of lttng those are readers (of the shared rcu structures),
> writes would only happen if tracepoint providers are added/removed.

Maybe Mathieu had a static (upfront to application start) tracepoint
configuration in mind. That's what I would expect from an RT setup at least.

> But then I don’t know how the buffers are managed, this appears to
> be system-wide in another process.
>
> The sys_membarrier syscall would be called by writers (not xenomai threads) to
> additionally allow instructions like dmb (for arm) around atomic accesses to
> be removed for the readers.
> I think it's useless for x86_64 and the syscall itself would not do anything
> for running xenomai threads.
> (you can only force the syscall but not disable it, without changing sources
> that is).

While an x86-only view can be ok for a concrete setup, it's better to
develop a generic / portable solution that enables lttng for broader use
in Xenomai applications.

>> As I said, that was just a conceptual discussion.
>> None of us actually looked into the implementation.
>
> Hmm, would like to test this soon. Still need a way to totally disable it
> in-case something goes wrong.. ie ugly macro magic.
> Can you tell me that I am right about
> membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED) not blocking until
> the running xenomai thread had some sort of syscall synchronization?

I haven't looked into membarrier semantics in Xenomai context yet. The
key question is if the Xenomai task switcher happens to provide the same
information to that service as a normal Linux task switch would do.
Maybe it's working, just slower, maybe it's stalling with CPUs that only
switch between Linux idle and the Xenomai scheduler as a black box from
Linux perspective.


Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux