Re: urcu/lttng (Userspace) and Xenomai

2019-11-22 Thread Mathieu Desnoyers via Xenomai



- On Nov 21, 2019, at 11:19 AM, Jan Kiszka jan.kis...@siemens.com wrote:

> On 21.11.19 15:15, Lange Norbert wrote:
>>> -----Original Message-----
>>> From: Jan Kiszka
>>> Sent: Thursday, 21 November 2019 14:46
>>> To: Lange Norbert ; Xenomai
>>> (xenomai@xenomai.org) 
>>> Subject: Re: urcu/lttng (Userspace) and Xenomai
>>>
>>> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
>>> ATTACHMENTS.
>>>
>>>
>>> On 21.11.19 11:26, Lange Norbert via Xenomai wrote:
>>>> Hello,
>>>>
>>>> I am trying to figure out if Xenomai would work correctly with Lttng.
>>>> Currently I haven’t figured out how the system manages buffers, but I am
>>>> checking if this would be generally applicable to Xenomai.
>>>>
>>>> I’d like to know if anyone has already used Lttng UST with xenomai
>>>> threads, and if there is any need to compile lttng/liburcu for xenomai or
>>>> using some patches.
>>>> (I haven’t seen anything that indicates it would not work).
>>>>
>>>> ## urcu flavours
>>>> This has a few variants, lttng uses the bulletproof one. Most others
>>>> should be faster on average – but all of them might unlock a futex with a
>>>> raw syscall.
>>>>
>>>> Other flavours like qsbr could likely be faster if the futex syscall
>>>> would be replaced with a cobalt mutex (it’s very unlikely this path is
>>>> executed). Would need some work to get this done (and lttng to use it).
>>>>
>>>> ## sys_membarrier
>>>> recent kernels and liburcu versions support this syscall, which
>>>> supposedly allows removal of reader memory barriers.
>>>> The syscall will somehow interrupt the threads (all *running threads* of
>>>> the process), which implicitly causes a barrier for readers.
>>>>
>>>> Q: I guess this will *not* interrupt xenomai threads, as their shadow linux
>>>> thread is not *running*?
>>>> Q: x86_64 accesses are strictly ordered, do you actually need membarriers
>>>> at all?
>>>>
>>>
>>> I didn't look into details of enabling userspace lttng yet, but I had a chat
>>> with Mathieu about this, maybe a year ago. He said back then that there is also a
>>> polling mode where a data collection thread is simply trying to obtain the
>>> trace output time-driven.
>> 
>> I believe that’s the "bulletproof" rcu mode that lttng uses. I don’t see any
>> OS-level synchronization in the readers, only some atomic variables.
>> Mathieu is a lttng dev?
> 
> Mathieu Desnoyers is the creator of lttng. And of those nice urcu
> services and syscalls. Let's try to pull him in... :)

Hi Jan,

Thanks for putting me in contact. I've seen that Lange started an email
thread on lttng-dev. I will reply there for posterity. :)

Thanks!

Mathieu

> 
> You could also try to place your questions on some lttng channel, I guess.
> 
>> 
>>> Then the producer (including cobalt threads) would
>>> not need any syscall at all.
>> 
>> In the context of lttng those are readers (of the shared rcu structures),
>> writes would only happen if tracepoint providers are added/removed.
> 
> Maybe Mathieu had a static (upfront to application start) tracepoint
> configuration in mind. That's what I would expect from an RT setup at least.
> 
>> 
>> But then I don’t know how the buffers are managed, this appears to
>> be system-wide in another process.
>> 
>> The sys_membarrier syscall would be called by writers (not xenomai threads)
>> to additionally allow instructions like dmb (for arm) around atomic accesses
>> to be removed for the readers.
>> I think it's useless for x86_64 and the syscall itself would not do anything
>> for running xenomai threads.
>> (you can only force the syscall but not disable it, without changing sources
>> that is).
> 
> While an x86-only view can be ok for a concrete setup, it's better to
> develop a generic / portable solution that enables lttng for broader use
> in Xenomai applications.
> 
>> 
>>> As I said, that was just a conceptual discussion.
>>> None of us actually looked into the implementation.
>> 
>> Hmm, would like to test this soon. Still need a way to totally disable it
>> in case something goes wrong, i.e. ugly macro magic.
>> Can you tell me that I am right about
>> membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED) not blocking until
>> the running xenomai thread had some sort of syscall synchronization?
> 
> I haven't looked into membarrier semantics in Xenomai context yet. The
> key question is if the Xenomai task switcher happens to provide the same
> information to that service as a normal Linux task switch would do.
> Maybe it's working, just slower, maybe it's stalling with CPUs that only
> switch between Linux idle and the Xenomai scheduler as a black box from
> Linux perspective.
> 
> Jan
> 
> --
> Siemens AG, Corporate Technology, CT RDA IOT SES-DE
> Corporate Competence Center Embedded Linux

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com



Re: urcu/lttng (Userspace) and Xenomai

2019-11-21 Thread Jan Kiszka via Xenomai

On 21.11.19 15:15, Lange Norbert wrote:

>> -----Original Message-----
>> From: Jan Kiszka
>> Sent: Thursday, 21 November 2019 14:46
>> To: Lange Norbert ; Xenomai
>> (xenomai@xenomai.org)
>> Subject: Re: urcu/lttng (Userspace) and Xenomai
>>
>> On 21.11.19 11:26, Lange Norbert via Xenomai wrote:
>>> Hello,
>>>
>>> I am trying to figure out if Xenomai would work correctly with Lttng.
>>> Currently I haven’t figured out how the system manages buffers, but I am
>>> checking if this would be generally applicable to Xenomai.
>>>
>>> I’d like to know if anyone has already used Lttng UST with xenomai
>>> threads, and if there is any need to compile lttng/liburcu for xenomai or
>>> using some patches.
>>> (I haven’t seen anything that indicates it would not work).
>>>
>>> ## urcu flavours
>>> This has a few variants, lttng uses the bulletproof one. Most others
>>> should be faster on average – but all of them might unlock a futex with a
>>> raw syscall.
>>>
>>> Other flavours like qsbr could likely be faster if the futex syscall
>>> would be replaced with a cobalt mutex (it’s very unlikely this path is
>>> executed). Would need some work to get this done (and lttng to use it).
>>>
>>> ## sys_membarrier
>>> recent kernels and liburcu versions support this syscall, which
>>> supposedly allows removal of reader memory barriers.
>>> The syscall will somehow interrupt the threads (all *running threads* of
>>> the process), which implicitly causes a barrier for readers.
>>>
>>> Q: I guess this will *not* interrupt xenomai threads, as their shadow linux
>>> thread is not *running*?
>>> Q: x86_64 accesses are strictly ordered, do you actually need membarriers
>>> at all?
>>>
>>
>> I didn't look into details of enabling userspace lttng yet, but I had a chat
>> with Mathieu about this, maybe a year ago. He said back then that there is also a
>> polling mode where a data collection thread is simply trying to obtain the
>> trace output time-driven.
>
> I believe that’s the "bulletproof" rcu mode that lttng uses. I don’t see any
> OS-level synchronization in the readers, only some atomic variables.
> Mathieu is a lttng dev?

Mathieu Desnoyers is the creator of lttng. And of those nice urcu
services and syscalls. Let's try to pull him in... :)

You could also try to place your questions on some lttng channel, I guess.

>> Then the producer (including cobalt threads) would
>> not need any syscall at all.
>
> In the context of lttng those are readers (of the shared rcu structures),
> writes would only happen if tracepoint providers are added/removed.

Maybe Mathieu had a static (upfront to application start) tracepoint
configuration in mind. That's what I would expect from an RT setup at least.

> But then I don’t know how the buffers are managed, this appears to
> be system-wide in another process.
>
> The sys_membarrier syscall would be called by writers (not xenomai threads)
> to additionally allow instructions like dmb (for arm) around atomic accesses
> to be removed for the readers.
> I think it's useless for x86_64 and the syscall itself would not do anything
> for running xenomai threads.
> (you can only force the syscall but not disable it, without changing sources
> that is).

While an x86-only view can be ok for a concrete setup, it's better to
develop a generic / portable solution that enables lttng for broader use
in Xenomai applications.

>> As I said, that was just a conceptual discussion.
>> None of us actually looked into the implementation.
>
> Hmm, would like to test this soon. Still need a way to totally disable it
> in case something goes wrong, i.e. ugly macro magic.
> Can you tell me that I am right about
> membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED) not blocking until
> the running xenomai thread had some sort of syscall synchronization?


I haven't looked into membarrier semantics in Xenomai context yet. The 
key question is if the Xenomai task switcher happens to provide the same 
information to that service as a normal Linux task switch would do. 
Maybe it's working, just slower, maybe it's stalling with CPUs that only 
switch between Linux idle and the Xenomai scheduler as a black box from 
Linux perspective.


Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



RE: urcu/lttng (Userspace) and Xenomai

2019-11-21 Thread Lange Norbert via Xenomai


> -----Original Message-----
> From: Jan Kiszka
> Sent: Thursday, 21 November 2019 14:46
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: urcu/lttng (Userspace) and Xenomai
>
>
> On 21.11.19 11:26, Lange Norbert via Xenomai wrote:
> > Hello,
> >
> > I am trying to figure out if Xenomai would work correctly with Lttng.
> > Currently I haven’t figured out how the system manages buffers, but I am
> > checking if this would be generally applicable to Xenomai.
> >
> > I’d like to know if anyone has already used Lttng UST with xenomai
> > threads, and if there is any need to compile lttng/liburcu for xenomai or
> > using some patches.
> > (I haven’t seen anything that indicates it would not work).
> >
> > ## urcu flavours
> > This has a few variants, lttng uses the bulletproof one. Most others
> > should be faster on average – but all of them might unlock a futex with a
> > raw syscall.
> >
> > Other flavours like qsbr could likely be faster if the futex syscall
> > would be replaced with a cobalt mutex (it’s very unlikely this path is
> > executed). Would need some work to get this done (and lttng to use it).
> >
> > ## sys_membarrier
> > recent kernels and liburcu versions support this syscall, which
> > supposedly allows removal of reader memory barriers.
> > The syscall will somehow interrupt the threads (all *running threads* of
> > the process), which implicitly causes a barrier for readers.
> >
> > Q: I guess this will *not* interrupt xenomai threads, as their shadow linux
> > thread is not *running*?
> > Q: x86_64 accesses are strictly ordered, do you actually need membarriers
> > at all?
> >
>
> I didn't look into details of enabling userspace lttng yet, but I had a chat
> with Mathieu about this, maybe a year ago. He said back then that there is also a
> polling mode where a data collection thread is simply trying to obtain the
> trace output time-driven.

I believe that’s the "bulletproof" rcu mode that lttng uses. I don’t see any
OS-level synchronization in the readers, only some atomic variables.
Mathieu is a lttng dev?

> Then the producer (including cobalt threads) would
> not need any syscall at all.

In the context of lttng those are readers (of the shared rcu structures),
writes would only happen if tracepoint providers are added/removed.

But then I don’t know how the buffers are managed, this appears to
be system-wide in another process.

The sys_membarrier syscall would be called by writers (not xenomai threads)
to additionally allow instructions like dmb (for arm) around atomic accesses
to be removed for the readers.
I think it's useless for x86_64 and the syscall itself would not do anything
for running xenomai threads.
(you can only force the syscall but not disable it, without changing sources
that is).

> As I said, that was just a conceptual discussion.
> None of us actually looked into the implementation.

Hmm, would like to test this soon. Still need a way to totally disable it
in case something goes wrong, i.e. ugly macro magic.
Can you tell me that I am right about 
membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED) not blocking until
the running xenomai thread had some sort of syscall synchronization?
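
For reference, a minimal sketch of how a writer could probe for and issue that command, assuming kernel headers from Linux 4.14 or later. `sys_membarrier`, `membarrier_setup` and `writer_barrier` are hypothetical helper names, not liburcu API, and the sketch says nothing about how threads running in Cobalt primary mode are treated:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Go through syscall(2) directly rather than relying on a libc wrapper. */
static int sys_membarrier(int cmd, unsigned int flags)
{
    return (int)syscall(__NR_membarrier, cmd, flags);
}

/* Returns 1 if private expedited barriers can be used by this process,
 * 0 otherwise (old kernel, syscall filtered, registration refused). */
static int membarrier_setup(void)
{
    int mask = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);

    if (mask < 0 || !(mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED))
        return 0;
    /* The process has to register once before the first expedited call. */
    return sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) == 0;
}

/* Writer side: only valid after membarrier_setup() returned 1. */
static void writer_barrier(void)
{
    int ret = sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);

    assert(ret == 0); /* registered above, so a failure here is a bug */
    (void)ret;
}
```

If `membarrier_setup()` returns 0, the readers would have to keep their own fences, which would also be the obvious switch for disabling the mechanism altogether.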

Norbert


This message and any attachments are solely for the use of the intended 
recipients. They may contain privileged and/or confidential information or 
other information protected from disclosure. If you are not an intended 
recipient, you are hereby notified that you received this email in error and 
that any review, dissemination, distribution or copying of this email and any 
attachment is strictly prohibited. If you have received this email in error, 
please contact the sender and delete the message and any attachment from your 
system.

ANDRITZ HYDRO GmbH


Rechtsform/ Legal form: Gesellschaft mit beschränkter Haftung / Corporation

Firmensitz/ Registered seat: Wien

Firmenbuchgericht/ Court of registry: Handelsgericht Wien

Firmenbuchnummer/ Company registration: FN 61833 g

DVR: 0605077

UID-Nr.: ATU14756806


Thank You



Re: urcu/lttng (Userspace) and Xenomai

2019-11-21 Thread Jan Kiszka via Xenomai

On 21.11.19 11:26, Lange Norbert via Xenomai wrote:

> Hello,
>
> I am trying to figure out if Xenomai would work correctly with Lttng.
> Currently I haven’t figured out how the system manages buffers,
> but I am checking if this would be generally applicable to Xenomai.
>
> I’d like to know if anyone has already used Lttng UST with xenomai threads,
> and if there is any need to compile lttng/liburcu for xenomai or using some
> patches.
> (I haven’t seen anything that indicates it would not work).
>
> ## urcu flavours
> This has a few variants, lttng uses the bulletproof one. Most others should be
> faster on average – but all of them might unlock a futex with a raw syscall.
>
> Other flavours like qsbr could likely be faster if the futex syscall would be
> replaced with a cobalt mutex (it’s very unlikely this path is executed).
> Would need some work to get this done (and lttng to use it).
>
> ## sys_membarrier
> recent kernels and liburcu versions support this syscall, which supposedly
> allows removal of reader memory barriers.
> The syscall will somehow interrupt the threads (all *running threads* of the
> process), which implicitly causes a barrier for readers.
>
> Q: I guess this will *not* interrupt xenomai threads, as their shadow linux
> thread is not *running*?
> Q: x86_64 accesses are strictly ordered, do you actually need membarriers at
> all?



I didn't look into details of enabling userspace lttng yet, but I had a 
chat with Mathieu about this, maybe a year ago. He said back then that 
there is also a polling mode where a data collection thread is simply 
trying to obtain the trace output time-driven. Then the producer 
(including cobalt threads) would not need any syscall at all. As I said, 
that was just a conceptual discussion. None of us actually looked into 
the implementation.
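
To illustrate the idea only (this is not how lttng-ust lays out its buffers): a minimal single-producer/single-consumer ring in C11, where the producer uses nothing but atomics and a non-RT collector thread polls on a timer. `trace_ring`, `ring_push`, `ring_poll` and `RING_SLOTS` are made-up names for this sketch:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 256u /* hypothetical size, must be a power of two */
static_assert((RING_SLOTS & (RING_SLOTS - 1)) == 0, "RING_SLOTS must be 2^n");

struct trace_ring {
    _Atomic uint32_t head; /* next slot to write, advanced by the producer */
    _Atomic uint32_t tail; /* next slot to read, advanced by the consumer */
    uint64_t slots[RING_SLOTS];
};

/* Producer side: memory accesses and atomics only, no syscall.
 * Returns 0 on success, -1 if the ring is full (the event is dropped). */
static int ring_push(struct trace_ring *r, uint64_t event)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SLOTS)
        return -1;
    r->slots[head & (RING_SLOTS - 1)] = event;
    /* Release: the slot content becomes visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 0;
}

/* Consumer side: copies up to max events into out, returns the count. */
static size_t ring_poll(struct trace_ring *r, uint64_t *out, size_t max)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t n = 0;

    while (tail != head && n < max)
        out[n++] = r->slots[tail++ & (RING_SLOTS - 1)];
    /* Release: the slots are handed back only after they were copied. */
    atomic_store_explicit(&r->tail, tail, memory_order_release);
    return n;
}
```

The collector would call `ring_poll()` from a periodic loop; a full ring simply drops the event, so the producer never blocks.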


Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



urcu/lttng (Userspace) and Xenomai

2019-11-21 Thread Lange Norbert via Xenomai
Hello,

I am trying to figure out if Xenomai would work correctly with Lttng.
Currently I haven’t figured out how the system manages buffers,
but I am checking if this would be generally applicable to Xenomai.

I’d like to know if anyone has already used Lttng UST with xenomai threads,
and if there is any need to compile lttng/liburcu for xenomai or to apply
some patches.
(I haven’t seen anything that indicates it would not work).

## urcu flavours
This has a few variants, lttng uses the bulletproof one. Most others should be
faster on average – but all of them might unlock a futex with a raw syscall.

Other flavours like qsbr could likely be faster if the futex syscall would be
replaced with a cobalt mutex (it’s very unlikely this path is executed).
Would need some work to get this done (and lttng to use it).

## sys_membarrier
recent kernels and liburcu versions support this syscall, which supposedly
allows removal of reader memory barriers.
The syscall will somehow interrupt the threads (all *running threads* of the 
process), which implicitly causes a barrier for readers.

Q: I guess this will *not* interrupt xenomai threads, as their shadow linux 
thread is not *running*?
Q: x86_64 accesses are strictly ordered, do you actually need membarriers at 
all?
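
To make the second question concrete, here is a sketch of the access pattern in question, assuming a single reader and using made-up names (`reader_active`, `shared_ptr`, `reader_enter`, `reader_exit`, `writer_publish`), not liburcu's internals. The reader stores a flag and then loads the pointer. x86_64 keeps most accesses ordered, but a store followed by a load from a different location may still be reordered, so this particular pattern needs a full fence on the reader side unless the writer can force one remotely:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static _Atomic int reader_active; /* 1 while the reader is in a read-side section */
static _Atomic(int *) shared_ptr; /* pointer published by the writer */

/* Reader entry: announce the read-side section, then fetch the pointer. */
static int *reader_enter(void)
{
    atomic_store_explicit(&reader_active, 1, memory_order_relaxed);
    /* Full fence: orders the store above before the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&shared_ptr, memory_order_acquire);
}

static void reader_exit(void)
{
    assert(atomic_load_explicit(&reader_active, memory_order_relaxed) == 1);
    atomic_store_explicit(&reader_active, 0, memory_order_release);
}

/* Writer: publish a new pointer, then look for a reader.
 * Returns 1 if a reader may still be using the previous pointer. */
static int writer_publish(int *new_ptr)
{
    atomic_store_explicit(&shared_ptr, new_ptr, memory_order_release);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&reader_active, memory_order_acquire);
}
```

With both fences in place, either the reader sees the new pointer or the writer sees `reader_active == 1`. As far as I understand it, the sys_membarrier variant would replace the reader's fence with a compiler barrier and have the writer issue the syscall instead.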

Kind regards, Norbert

