RE: [PATCH] cobalt/thread: Export __xnthread_discard

2021-12-15 Thread Lange Norbert via Xenomai
That takes care of the issue, thanks for the quick help.

> -Original Message-
> From: Jan Kiszka 
> Sent: Mittwoch, 15. Dezember 2021 13:57
> To: Xenomai 
> Cc: Lange Norbert 
> Subject: [PATCH] cobalt/thread: Export __xnthread_discard
>
>
>
> CAUTION: External email. Do not click on links or open attachments unless you
> know the sender and that the content is safe.
>
> From: Jan Kiszka 
>
> It's needed by switchtest since f4dac53c04ae, and switchtest can be
> compiled as a module.
>
> Reported-by: Lange Norbert 
> Signed-off-by: Jan Kiszka 
> ---
>  kernel/cobalt/thread.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/cobalt/thread.c b/kernel/cobalt/thread.c
> index 2e1667d197..debc5077b1 100644
> --- a/kernel/cobalt/thread.c
> +++ b/kernel/cobalt/thread.c
> @@ -553,6 +553,7 @@ void __xnthread_discard(struct xnthread *thread)
> xnthread_deregister(thread);
> xnlock_put_irqrestore(&nklock, s);
>  }
> +EXPORT_SYMBOL_GPL(__xnthread_discard);
>
>  /**
>   * @fn void xnthread_init(struct xnthread *thread,const struct
> xnthread_init_attr *attr,struct xnsched_class *sched_class,const union
> xnsched_policy_param *sched_param)
> --
> 2.31.1





Build failure branch 3.1.x

2021-12-15 Thread Lange Norbert via Xenomai
Hello,

rebuilding from the 3.1.x branch yields an error:

ERROR: "__xnthread_discard" [drivers/xenomai/testing/xeno_switchtest.ko] 
undefined!

xeno_switchtest can't be compiled as a module because of this (it works as a built-in).

Mit besten Grüßen / Kind regards

NORBERT LANGE







RE: cobalt_assert_nrt should use __cobalt_pthread_kill

2021-08-20 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Freitag, 20. August 2021 11:15
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: cobalt_assert_nrt should use __cobalt_pthread_kill
>
>
>
> CAUTION: External email. Do not click on links or open attachments unless
> you know the sender and that the content is safe.
>
> On 20.08.21 10:52, Lange Norbert wrote:
> >
> >
> >> -Original Message-
> >> From: Jan Kiszka 
> >> Sent: Freitag, 20. August 2021 08:37
> >> To: Lange Norbert ; Xenomai
> >> (xenomai@xenomai.org) 
> >> Subject: Re: cobalt_assert_nrt should use __cobalt_pthread_kill
> >>
> >>
> >>
> >> CAUTION: External email. Do not click on links or open attachments
> >> unless you know the sender and that the content is safe.
> >>
> >> On 19.08.21 18:54, Lange Norbert wrote:
> >>>
> >>>
> >>>> -Original Message-
> >>>> From: Jan Kiszka 
> >>>> Sent: Donnerstag, 19. August 2021 17:42
> >>>> To: Lange Norbert ; Xenomai
> >>>> (xenomai@xenomai.org) 
> >>>> Subject: Re: cobalt_assert_nrt should use __cobalt_pthread_kill
> >>>>
> >>>>
> >>>>
> >>>> CAUTION: External email. Do not click on links or open attachments
> >>>> unless you know the sender and that the content is safe.
> >>>>
> >>>> On 19.08.21 17:24, Lange Norbert wrote:
> >>>>>
> >>>>>
> >>>>>> -Original Message-----
> >>>>>> From: Jan Kiszka 
> >>>>>> Sent: Donnerstag, 19. August 2021 12:54
> >>>>>> To: Lange Norbert ; Xenomai
> >>>>>> (xenomai@xenomai.org) 
> >>>>>> Subject: Re: cobalt_assert_nrt should use __cobalt_pthread_kill
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> CAUTION: External email. Do not click on links or open
> >>>>>> attachments unless you know the sender and that the content is
> safe.
> >>>>>>
> >>>>>> On 19.08.21 11:56, Lange Norbert via Xenomai wrote:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> I have some small slight issue with the cobalt_assert_nrt
> >>>>>>> function, incase a violation is detected the thread should get a
> >>>>>>> signal, but the implementation will implicitly get a signal
> >>>>>>> during the execution of
> >>>>>> pthread_kill, see:
> >>>>>>>
> >>>>>>>
> >>>>>>> #0  getpid () at ../sysdeps/unix/syscall-template.S:60
> >>>>>>> #1  0x7fc1dc4fa0d6 in __pthread_kill (threadid=<optimized out>,
> >>>>>>> signo=24) at ../sysdeps/unix/sysv/linux/pthread_kill.c:53
> >>>>>>> #2  0x7fc1dc8b2470 in callAssertFunction () at
> >>>>>>> /home/lano/git/preload_checkers/src/pchecker.h:199
> >>>>>>> #3  malloc () at
> >>>>>>> /home/lano/git/preload_checkers/src/pchecker_heap_glibc.c:220
> >>>>>>> #4 
> >>>>>>>
> >>>>>>> You see, the signal should happen with the pc of #2, not from
> >>>>>>> the
> >>>>>> implementation of glibc (or whatever c library).
> >>>>>>> So the function should be changed to:
> >>>>>>>
> >>>>>>> void cobalt_assert_nrt(void)
> >>>>>>> {
> >>>>>>> if (cobalt_should_warn())
> >>>>>>> __cobalt_pthread_kill(pthread_self(),
> >>>>>>> SIGDEBUG); }
> >>>>>>>
> >>>>>>> (or even replaced with the raw syscall ?)
> >>>>>>>
> >>>>>>
> >>>>>> Hmm, that's similar to an assert causing a lengthy trace, not
> >>>>>> failing directly at the place where the assert was raised:
> >>>>>>
> >>>>>> #0  0x77a3918b in raise () from /lib64/libc.so.6
> >>>>>> #1  0x77a3a585 in abort () from /lib64/libc.so.6
> >>>>>> #2  0x77a3185a in __assert_f

RE: cobalt_assert_nrt should use __cobalt_pthread_kill

2021-08-20 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Freitag, 20. August 2021 08:37
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: cobalt_assert_nrt should use __cobalt_pthread_kill
>
>
>
> CAUTION: External email. Do not click on links or open attachments unless you
> know the sender and that the content is safe.
>
> On 19.08.21 18:54, Lange Norbert wrote:
> >
> >
> >> -Original Message-
> >> From: Jan Kiszka 
> >> Sent: Donnerstag, 19. August 2021 17:42
> >> To: Lange Norbert ; Xenomai
> >> (xenomai@xenomai.org) 
> >> Subject: Re: cobalt_assert_nrt should use __cobalt_pthread_kill
> >>
> >>
> >>
> >> CAUTION: External email. Do not click on links or open attachments
> >> unless you know the sender and that the content is safe.
> >>
> >> On 19.08.21 17:24, Lange Norbert wrote:
> >>>
> >>>
> >>>> -Original Message-
> >>>> From: Jan Kiszka 
> >>>> Sent: Donnerstag, 19. August 2021 12:54
> >>>> To: Lange Norbert ; Xenomai
> >>>> (xenomai@xenomai.org) 
> >>>> Subject: Re: cobalt_assert_nrt should use __cobalt_pthread_kill
> >>>>
> >>>>
> >>>>
> >>>> CAUTION: External email. Do not click on links or open attachments
> >>>> unless you know the sender and that the content is safe.
> >>>>
> >>>> On 19.08.21 11:56, Lange Norbert via Xenomai wrote:
> >>>>> Hello,
> >>>>>
> >>>>> I have some small slight issue with the cobalt_assert_nrt
> >>>>> function, incase a violation is detected the thread should get a
> >>>>> signal, but the implementation will implicitly get a signal during
> >>>>> the execution of
> >>>> pthread_kill, see:
> >>>>>
> >>>>>
> >>>>> #0  getpid () at ../sysdeps/unix/syscall-template.S:60
> >>>>> #1  0x7fc1dc4fa0d6 in __pthread_kill (threadid=<optimized out>,
> >>>>> signo=24) at ../sysdeps/unix/sysv/linux/pthread_kill.c:53
> >>>>> #2  0x7fc1dc8b2470 in callAssertFunction () at
> >>>>> /home/lano/git/preload_checkers/src/pchecker.h:199
> >>>>> #3  malloc () at
> >>>>> /home/lano/git/preload_checkers/src/pchecker_heap_glibc.c:220
> >>>>> #4 
> >>>>>
> >>>>> You see, the signal should happen with the pc of #2, not from the
> >>>> implementation of glibc (or whatever c library).
> >>>>> So the function should be changed to:
> >>>>>
> >>>>> void cobalt_assert_nrt(void)
> >>>>> {
> >>>>> if (cobalt_should_warn())
> >>>>> __cobalt_pthread_kill(pthread_self(),
> >>>>> SIGDEBUG); }
> >>>>>
> >>>>> (or even replaced with the raw syscall ?)
> >>>>>
> >>>>
> >>>> Hmm, that's similar to an assert causing a lengthy trace, not
> >>>> failing directly at the place where the assert was raised:
> >>>>
> >>>> #0  0x77a3918b in raise () from /lib64/libc.so.6
> >>>> #1  0x77a3a585 in abort () from /lib64/libc.so.6
> >>>> #2  0x77a3185a in __assert_fail_base () from
> >>>> /lib64/libc.so.6
> >>>> #3  0x77a318d2 in __assert_fail () from /lib64/libc.so.6
> >>>> #4  0x00400524 in main () at assert.c:5
> >>>>
> >>>> What is your practical problem with the current implementation? Do
> >>>> you expect a specific SIGDEBUG reason?
> >>>
> >>> A better stacktrace. (I actually cut the trace in the signal handler
> >>> in case of hitting an __assert_fail)
> >>
> >> The backtrace should still point to the right function that caused the
> migration.
> >> I miss cobalt_assert_nrt() in your backtrace though, but that should
> >> have nothing to do with how it is implemented. Are you actually using
> >> cobalt_assert_nrt() from libcobalt?
> >
> > Yes, but I dlsym it.
> > I would prefer if the cobalt_assert_nrt would be the start of the trace.
> >
>
> That it always does under normal constraints - please check your local setup,
> this is not a generic problem. It's your pc

RE: cobalt_assert_nrt should use __cobalt_pthread_kill

2021-08-19 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Donnerstag, 19. August 2021 17:42
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: cobalt_assert_nrt should use __cobalt_pthread_kill
>
>
>
> CAUTION: External email. Do not click on links or open attachments unless you
> know the sender and that the content is safe.
>
> On 19.08.21 17:24, Lange Norbert wrote:
> >
> >
> >> -Original Message-
> >> From: Jan Kiszka 
> >> Sent: Donnerstag, 19. August 2021 12:54
> >> To: Lange Norbert ; Xenomai
> >> (xenomai@xenomai.org) 
> >> Subject: Re: cobalt_assert_nrt should use __cobalt_pthread_kill
> >>
> >>
> >>
> >> CAUTION: External email. Do not click on links or open attachments
> >> unless you know the sender and that the content is safe.
> >>
> >> On 19.08.21 11:56, Lange Norbert via Xenomai wrote:
> >>> Hello,
> >>>
> >>> I have some small slight issue with the cobalt_assert_nrt function,
> >>> incase a violation is detected the thread should get a signal, but
> >>> the implementation will implicitly get a signal during the execution
> >>> of
> >> pthread_kill, see:
> >>>
> >>>
> >>> #0  getpid () at ../sysdeps/unix/syscall-template.S:60
> >>> #1  0x7fc1dc4fa0d6 in __pthread_kill (threadid=<optimized out>,
> >>> signo=24) at ../sysdeps/unix/sysv/linux/pthread_kill.c:53
> >>> #2  0x7fc1dc8b2470 in callAssertFunction () at
> >>> /home/lano/git/preload_checkers/src/pchecker.h:199
> >>> #3  malloc () at
> >>> /home/lano/git/preload_checkers/src/pchecker_heap_glibc.c:220
> >>> #4 
> >>>
> >>> You see, the signal should happen with the pc of #2, not from the
> >> implementation of glibc (or whatever c library).
> >>> So the function should be changed to:
> >>>
> >>> void cobalt_assert_nrt(void)
> >>> {
> >>> if (cobalt_should_warn())
> >>> __cobalt_pthread_kill(pthread_self(),
> >>> SIGDEBUG); }
> >>>
> >>> (or even replaced with the raw syscall ?)
> >>>
> >>
> >> Hmm, that's similar to an assert causing a lengthy trace, not failing
> >> directly at the place where the assert was raised:
> >>
> >> #0  0x77a3918b in raise () from /lib64/libc.so.6
> >> #1  0x77a3a585 in abort () from /lib64/libc.so.6
> >> #2  0x77a3185a in __assert_fail_base () from /lib64/libc.so.6
> >> #3  0x77a318d2 in __assert_fail () from /lib64/libc.so.6
> >> #4  0x00400524 in main () at assert.c:5
> >>
> >> What is your practical problem with the current implementation? Do
> >> you expect a specific SIGDEBUG reason?
> >
> > A better stacktrace. (I actually cut the trace in the signal handler
> > in case of hitting an __assert_fail)
>
> The backtrace should still point to the right function that caused the 
> migration.
> I miss cobalt_assert_nrt() in your backtrace though, but that should have
> nothing to do with how it is implemented. Are you actually using
> cobalt_assert_nrt() from libcobalt?

Yes, but I dlsym() it.
I would prefer if cobalt_assert_nrt were the start of the trace.

>
> > BTW, __cobalt_pthread_kill(pthread_self(), SIGDEBUG) doesn’t seem to do a
> thing, doesn’t handle SIGDEBUG?
> >
>
> It only triggers the signal (in one way or another...). Handling is up to the
> application. If you don't handle that, the application is terminated, 
> obviously.

The application continues running. But I did not try
__cobalt_pthread_kill(pthread_self(), SIGDEBUG);
I used XENOMAI_SYSCALL2(sc_cobalt_thread_kill, thread, sig).
Which means the cobalt syscall is not handling the signal.

So to satisfy my OCD, toggling the mode-switch signals off/on would be
correct, I guess:

pthread_setmode_np(PTHREAD_WARNSW, 0, NULL);
pthread_kill(pthread_self(), SIGDEBUG);
pthread_setmode_np(0, PTHREAD_WARNSW, NULL);

or even just using a linux syscall:

getpid();

The point being that right now you trap at least twice.

I went ahead and tried using a single Linux syscall.

Stacktrace is now cleaner:

#0  0x7fc3997a1797 in cobalt_assert_nrt () from 
/opt/hipase2/buildroot-acpu/x86_64-buildroot-linux-gnu/sysroot/usr/xenomai/lib/libcobalt.so.2
#1  0x7fc3997c0470 in callAssertFunction () at 
/home/lano/git/preload_checkers/src/pchecker.h:199
#2  malloc () at /home/lano/git/preload_checkers/src/pchecker_heap_glibc.c:220
#3  
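
A minimal sketch of that single-syscall variant, assuming the libcobalt-internal
cobalt_should_warn() helper from the snippet quoted earlier (the actual patch
below is truncated):

#include <unistd.h>
#include <sys/syscall.h>

/* cobalt_should_warn() is the libcobalt-internal helper used in the
 * snippet quoted earlier in this thread. */
int cobalt_should_warn(void);

void cobalt_assert_nrt(void)
{
        if (cobalt_should_warn())
                /* Any plain Linux syscall forces the migration to secondary
                 * mode right here, so the SIGDEBUG backtrace starts at the
                 * caller instead of inside pthread_kill(). */
                syscall(SYS_getpid);
}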

diff --git a/lib/cobalt/internal

RE: cobalt_assert_nrt should use __cobalt_pthread_kill

2021-08-19 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Donnerstag, 19. August 2021 12:54
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: cobalt_assert_nrt should use __cobalt_pthread_kill
>
>
>
> CAUTION: External email. Do not click on links or open attachments unless you
> know the sender and that the content is safe.
>
> On 19.08.21 11:56, Lange Norbert via Xenomai wrote:
> > Hello,
> >
> > I have some small slight issue with the cobalt_assert_nrt function,
> > incase a violation is detected the thread should get a signal, but the
> > implementation will implicitly get a signal during the execution of
> pthread_kill, see:
> >
> >
> > #0  getpid () at ../sysdeps/unix/syscall-template.S:60
> > #1  0x7fc1dc4fa0d6 in __pthread_kill (threadid=<optimized out>,
> > signo=24) at ../sysdeps/unix/sysv/linux/pthread_kill.c:53
> > #2  0x7fc1dc8b2470 in callAssertFunction () at
> > /home/lano/git/preload_checkers/src/pchecker.h:199
> > #3  malloc () at
> > /home/lano/git/preload_checkers/src/pchecker_heap_glibc.c:220
> > #4 
> >
> > You see, the signal should happen with the pc of #2, not from the
> implementation of glibc (or whatever c library).
> > So the function should be changed to:
> >
> > void cobalt_assert_nrt(void)
> > {
> > if (cobalt_should_warn())
> > __cobalt_pthread_kill(pthread_self(),
> > SIGDEBUG); }
> >
> > (or even replaced with the raw syscall ?)
> >
>
> Hmm, that's similar to an assert causing a lengthy trace, not failing 
> directly at
> the place where the assert was raised:
>
> #0  0x77a3918b in raise () from /lib64/libc.so.6
> #1  0x77a3a585 in abort () from /lib64/libc.so.6
> #2  0x77a3185a in __assert_fail_base () from /lib64/libc.so.6
> #3  0x77a318d2 in __assert_fail () from /lib64/libc.so.6
> #4  0x00400524 in main () at assert.c:5
>
> What is your practical problem with the current implementation? Do you
> expect a specific SIGDEBUG reason?

A better stack trace. (I actually cut the trace in the signal handler in case of
hitting an __assert_fail.)
BTW, __cobalt_pthread_kill(pthread_self(), SIGDEBUG) doesn't seem to do a
thing; doesn't it handle SIGDEBUG?
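
(For reference, a minimal sketch of the kind of SIGDEBUG handler discussed here,
dumping a backtrace; it assumes SIGDEBUG resolves to SIGXCPU, i.e. signo 24 as in
the traces above, when the Xenomai headers are not included.)

#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <execinfo.h>

#ifndef SIGDEBUG
#define SIGDEBUG SIGXCPU   /* Xenomai delivers SIGDEBUG as SIGXCPU (signo 24) */
#endif

static void sigdebug_handler(int sig, siginfo_t *si, void *ctx)
{
        void *frames[32];
        int depth = backtrace(frames, 32);

        (void)sig; (void)si; (void)ctx;
        /* backtrace_symbols_fd() writes straight to the fd and avoids
         * malloc(), unlike backtrace_symbols(). */
        backtrace_symbols_fd(frames, depth, STDERR_FILENO);
}

static void install_sigdebug_handler(void)
{
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = sigdebug_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGDEBUG, &sa, NULL);
}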

Norbert





cobalt_assert_nrt should use __cobalt_pthread_kill

2021-08-19 Thread Lange Norbert via Xenomai
Hello,

I have a small issue with the cobalt_assert_nrt function:
in case a violation is detected the thread should get a signal,
but the implementation will implicitly get the signal during the execution of
pthread_kill, see:


#0  getpid () at ../sysdeps/unix/syscall-template.S:60
#1  0x7fc1dc4fa0d6 in __pthread_kill (threadid=<optimized out>, signo=24) 
at ../sysdeps/unix/sysv/linux/pthread_kill.c:53
#2  0x7fc1dc8b2470 in callAssertFunction () at 
/home/lano/git/preload_checkers/src/pchecker.h:199
#3  malloc () at /home/lano/git/preload_checkers/src/pchecker_heap_glibc.c:220
#4 

You see, the signal should be raised with the PC of #2, not from the
implementation of glibc (or whatever C library).
So the function should be changed to:

void cobalt_assert_nrt(void)
{
if (cobalt_should_warn())
__cobalt_pthread_kill(pthread_self(), SIGDEBUG);
}

(or even replaced with the raw syscall ?)

Regards,
Norbert





RE: kernel bug if rtnet device is accessed during unbind

2021-08-04 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Dienstag, 3. August 2021 18:04
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: kernel bug if rtnet device is accesses during unbind
>
>
>
> CAUTION: External email. Do not click on links or open attachments unless you
> know the sender and that the content is safe.
>
> On 03.08.21 13:18, Lange Norbert via Xenomai wrote:
> > Hello,
> >
> > There is some bigger kernel oops when an rtnet device is unbound from
> > linux but still accessible via ioctl.
> > Effect and backtrace depends on timing, usually the rt_igb module will
> > not decrease its reference count, and a following soft reboot might hang.
> >
> > To repoduce, for example with rt_igb (doubt its driver specific):
> >
> > echo ":01:00.0" > /sys/bus/pci/drivers/rt_igb/bind # rtifconfig
> > has to run in background echo ":01:00.0" >
> > /sys/bus/pci/drivers/rt_igb/unbind & rtifconfig rteth0 up
> >
>
> So, running one after the other (rtifconfig up first) will not trigger this? 
> Then it
> would sounds like a race between rtnet or the driver preventing the unbind
> and the ongoing ifup.

There is definitely some missing synchronization, and arguably things could
be improved in terms of supporting uevents.
What happens, as far as I can tell (the udev example is more explicit):
1) unbinding starts and deallocates (at least part of) the instance
2) a "remove rteth0" uevent is caught by udev and handled by running 'rtifconfig 
rteth0 up' (this was originally by accident)
3) rtifconfig still finds the rteth0 device, but then accesses invalid memory

I.e. rtifconfig was called *after* Linux broadcast the removal of rteth0.

This doesn't happen if commands are sent serially on the terminal or via script;
I guess the write blocks until the instance is completely removed.

FYI, everything is running on core 0 via an affinity mask, and it's dead easy to
reproduce.

Norbert





kernel bug if rtnet device is accessed during unbind

2021-08-03 Thread Lange Norbert via Xenomai
Hello,

There is a bigger kernel oops when an rtnet device is unbound from
Linux but still accessible via ioctl.
The effect and backtrace depend on timing; usually the rt_igb module will not
decrease its reference count, and a following soft reboot might hang.

To reproduce, for example with rt_igb (I doubt it's driver-specific):

echo ":01:00.0" > /sys/bus/pci/drivers/rt_igb/bind
# rtifconfig has to run in background
echo ":01:00.0" > /sys/bus/pci/drivers/rt_igb/unbind & rtifconfig rteth0 up

* kernel oops attached at the end of mail.

Background: I wanted to use udev to set the device up ASAP (and I missed the
ACTION filter):
ACTION=="add", SUBSYSTEM=="rtnet", KERNEL=="rteth0", RUN+="/sbin/rtifconfig %k 
up"

This rule does not work because it fires
before the device is hooked into the rtnet subsystem.
I believe that this ordering might be the cause of the kernel bug as well
(reachable via rtnet, while already unbound in Linux/sysfs).

* udev log is added at the end

  kernel oops

[  350.463476] RTnet: unregistered rteth0
[  350.467328] invalid opcode:  [#1] SMP
[  350.471350] CPU: 0 PID: 564 Comm: zsh Not tainted 5.4.133-xeno6-static #3
[  350.478146] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.28.22 09/30/2019
[  350.487458] I-pipe domain: Linux
[  350.490705] RIP: 0010:free_msi_irqs+0x170/0x1a0
[  350.495247] Code: 0f 84 e4 fe ff ff 31 ed eb 0f 83 c5 01 39 6b 14 0f 86 d4 
fe ff ff 8b 7b 10 01 ef e8 fa fa c8 ff 48 83 b8 80 00 00 00 00 74 e0 <0f> 0b 49 
8d b5 b0 00 00 00 e8 f2 93 c9 ff e9 d3 fe ff ff 48 8b 7d
[  350.514018] RSP: 0018:a6ad40077d30 EFLAGS: 00010286
[  350.519252] RAX: a32ab799d400 RBX: a32ab9b4b3c0 RCX: 
[  350.526392] RDX: a32aba52b478 RSI: a32aba52b680 RDI: 007c
[  350.533532] RBP:  R08: ac026f80 R09: 
[  350.540676] R10:  R11: ac026f88 R12: a32abb3a11c0
[  350.547817] R13: a32abb3a1000 R14: a6ad40077eb0 R15: a32ab6a26860
[  350.554960] FS:  7fa8dc65e640() GS:a32abba0() 
knlGS:
[  350.563058] CS:  0010 DS:  ES:  CR0: 80050033
[  350.568815] CR2: 0068a7e0 CR3: 000177135000 CR4: 003406f0
[  350.575957] Call Trace:
[  350.578421]  igb_reset_interrupt_capability+0x8a/0x90 [rt_igb]
[  350.584268]  igb_remove+0xbf/0x170 [rt_igb]
[  350.588458]  pci_device_remove+0x28/0x60
[  350.592391]  __device_release_driver+0x134/0x1e0
[  350.597016]  device_driver_detach+0x3c/0xa0
[  350.601205]  unbind_store+0x113/0x130
[  350.604877]  kernfs_fop_write+0xcb/0x1b0
[  350.608810]  vfs_write+0xa5/0x1d0
[  350.612134]  ksys_write+0x5f/0xe0
[  350.615461]  do_syscall_64+0x7a/0x3d0
[  350.619132]  ? ipipe_restore_root+0x47/0x70
[  350.623325]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  350.628386] RIP: 0033:0x7fa8dc7639c4
[  350.631970] Code: 15 d1 d4 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 
0f 1f 00 48 8d 05 f1 13 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 
f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
[  350.650742] RSP: 002b:7ffd1feafee8 EFLAGS: 0246 ORIG_RAX: 
0001
[  350.658322] RAX: ffda RBX: 000d RCX: 7fa8dc7639c4
[  350.665466] RDX: 000d RSI: 00581ce0 RDI: 0001
[  350.672611] RBP: 7fa8dc832760 R08: 7fa8dc65e640 R09: 000a
[  350.679754] R10:  R11: 0246 R12: 000d
[  350.686897] R13: 00581ce0 R14: 000d R15: 7fa8dc82d740
[  350.694042] Modules linked in: rt_igb rtpacket rtnet
[  350.699024] ---[ end trace 582d575b2ac29cad ]---
[  350.703651] RIP: 0010:free_msi_irqs+0x170/0x1a0
[  350.708188] Code: 0f 84 e4 fe ff ff 31 ed eb 0f 83 c5 01 39 6b 14 0f 86 d4 
fe ff ff 8b 7b 10 01 ef e8 fa fa c8 ff 48 83 b8 80 00 00 00 00 74 e0 <0f> 0b 49 
8d b5 b0 00 00 00 e8 f2 93 c9 ff e9 d3 fe ff ff 48 8b 7d
[  350.726964] RSP: 0018:a6ad40077d30 EFLAGS: 00010286
[  350.732195] RAX: a32ab799d400 RBX: a32ab9b4b3c0 RCX: 
[  350.739335] RDX: a32aba52b478 RSI: a32aba52b680 RDI: 007c
[  350.746478] RBP:  R08: ac026f80 R09: 
[  350.753620] R10:  R11: ac026f88 R12: a32abb3a11c0
[  350.760760] R13: a32abb3a1000 R14: a6ad40077eb0 R15: a32ab6a26860
[  350.767902] FS:  7fa8dc65e640() GS:a32abba0() 
knlGS:
[  350.776000] CS:  0010 DS:  ES:  CR0: 80050033
[  350.781755] CR2: 0068a7e0 CR3: 000177135000 CR4: 003406f0
[  350.788907] [ cut here ]
[  350.793534] kernel BUG at drivers/pci/msi.c:375!


  udev log after binding

13:51:27 systemd-udevd[424]: rteth0: Device is queued (SEQNUM=1575, ACTION=add)
13:51:27 systemd-udevd[424]: Validate module index
13:51:27 systemd-ude

Some build issues with dovetail + ipipe 4.19

2021-02-10 Thread Lange Norbert via Xenomai
Hello,


By accident I built 4.19.152-cip37-x86-15 with the wip/dovetail branch. I
understand that this should be supported one day?
In the hope that this is of use to you: I get the following errors during
the modpost step with this config:

ERROR: "__rtdm_nrtsig_execute" [drivers/xenomai/testing/xeno_switchtest.ko] 
undefined!
ERROR: "__rtdm_nrtsig_execute" [drivers/xenomai/net/drivers/igb/rt_igb.ko] 
undefined!
ERROR: "pipeline_read_wallclock" [drivers/xenomai/net/drivers/igb/rt_igb.ko] 
undefined!

Mit besten Grüßen / Kind regards

NORBERT LANGE

AT-RD3

ANDRITZ HYDRO GmbH
Eibesbrunnergasse 20
1120 Vienna / AUSTRIA
p: +43 50805 56684
norbert.la...@andritz.com
andritz.com






FW: is memory locking per core necessary?

2021-01-15 Thread Lange Norbert via Xenomai
I originally had the Xenomai thread tied to another CPU core, hence the subject.
The issue also happens if all threads are tied to Core #0.

So the question should read: is memory locking per *thread* necessary?

> -Original Message-
> From: Lange Norbert
> Sent: Freitag, 15. Jänner 2021 13:14
> To: Xenomai (xenomai@xenomai.org) 
> Subject: is memory locking per core necessary?
>
> Hello,
>
> I am trying to track down some spurios relaxes.
>
> What happens is that I:
> -   cobalt_init calls  mlockall(MCL_CURRENT | MCL_FUTURE);
> -   allocate and initialize some data in the heap area
> -   spawn the xenomai-thread
> -   wait for notification from the xenomai-thread (so that I know, any
> initialization there is done)
> -   call mlockall(MCL_CURRENT | MCL_FUTURE) once more
> -   notify the xenomai-thread (only now the real work starts)
>
> Then the Xenomai thread will have spurios relaxes when writing data in the 
> heap
> or bss (reads seem to be fine).
> I am a bit confused why this happens, adding a mlockall(MCL_CURRENT |
> MCL_FUTURE) in the Xenomai thread fixes the issue.
> Note that  the locking should happen even before spawning the threads.
>
> Message I get is:
> Jan 15 12:05:19 buildroot kernel: [Xenomai] switching realtime_packet to
> secondary mode after exception #14 from user-space at 0x40f3ed (pid 1383)
> Code is always a store to heap or bss.
>
> X86_64 + 4.19.152-cip37-xeno15-static + glibc 2.28
>
> /proc/pid/maps:
> 00429000-00435000 rw-p  00:00 0
> 014b5000-014f8000 rw-p  00:00 0  
> [heap]
>
> /proc/pid/smaps:
> 00429000-00435000 rw-p  00:00 0
> Size: 48 kB
> KernelPageSize:4 kB
> MMUPageSize:   4 kB
> Rss:  48 kB
> Pss:  48 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  0 kB
> Private_Clean: 0 kB
> Private_Dirty:48 kB
> Referenced:   48 kB
> Anonymous:48 kB
> LazyFree:  0 kB
> AnonHugePages: 0 kB
> ShmemPmdMapped:0 kB
> Shared_Hugetlb:0 kB
> Private_Hugetlb:   0 kB
> Swap:  0 kB
> SwapPss:   0 kB
> Locked:   48 kB
> THPeligible:0
> VmFlags: rd wr mr mw me lo ac
> 014b5000-014f8000 rw-p  00:00 0  
> [heap]
> Size:268 kB
> KernelPageSize:4 kB
> MMUPageSize:   4 kB
> Rss: 268 kB
> Pss: 268 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  0 kB
> Private_Clean: 0 kB
> Private_Dirty:   268 kB
> Referenced:  268 kB
> Anonymous:   268 kB
> LazyFree:  0 kB
> AnonHugePages: 0 kB
> ShmemPmdMapped:0 kB
> Shared_Hugetlb:0 kB
> Private_Hugetlb:   0 kB
> Swap:  0 kB
> SwapPss:   0 kB
> Locked:  268 kB
> THPeligible:0
> VmFlags: rd wr mr mw me lo ac
>
> Mit besten Grüßen / Kind regards
>
> NORBERT LANGE







is memory locking per core necessary?

2021-01-15 Thread Lange Norbert via Xenomai
Hello,

I am trying to track down some spurious relaxes.

What happens is that I:
-   cobalt_init calls  mlockall(MCL_CURRENT | MCL_FUTURE);
-   allocate and initialize some data in the heap area
-   spawn the xenomai-thread
-   wait for notification from the xenomai-thread (so that I know, any 
initialization there is done)
-   call mlockall(MCL_CURRENT | MCL_FUTURE) once more
-   notify the xenomai-thread (only now the real work starts)

Then the Xenomai thread will have spurious relaxes when writing data in the heap
or bss (reads seem to be fine).
I am a bit confused why this happens; adding a mlockall(MCL_CURRENT |
MCL_FUTURE) call in the Xenomai thread
fixes the issue.
Note that the locking should happen even before spawning the threads.
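
(A minimal sketch of the ordering described above, using plain POSIX calls only;
as reported, it is the extra mlockall() inside the Xenomai thread that actually
makes the relaxes go away.)

#include <sys/mman.h>
#include <pthread.h>

static void *xenomai_thread(void *arg)
{
        (void)arg;
        /* Workaround from this report: repeating the lock here is what
         * stops the spurious relaxes on heap/bss stores. */
        mlockall(MCL_CURRENT | MCL_FUTURE);
        /* ... real-time work ... */
        return NULL;
}

int main(void)
{
        pthread_t tid;

        /* Locked before any heap initialization and before the RT thread
         * exists (cobalt_init() already does this once). */
        if (mlockall(MCL_CURRENT | MCL_FUTURE))
                return 1;

        /* allocate and initialize heap/bss data here */

        pthread_create(&tid, NULL, xenomai_thread, NULL);
        pthread_join(tid, NULL);
        return 0;
}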

Message I get is:
Jan 15 12:05:19 buildroot kernel: [Xenomai] switching realtime_packet to 
secondary mode after exception #14 from user-space at 0x40f3ed (pid 1383)
Code is always a store to heap or bss.

X86_64 + 4.19.152-cip37-xeno15-static + glibc 2.28

/proc/pid/maps:
00429000-00435000 rw-p  00:00 0
014b5000-014f8000 rw-p  00:00 0  [heap]

/proc/pid/smaps:
00429000-00435000 rw-p  00:00 0
Size: 48 kB
KernelPageSize:4 kB
MMUPageSize:   4 kB
Rss:  48 kB
Pss:  48 kB
Shared_Clean:  0 kB
Shared_Dirty:  0 kB
Private_Clean: 0 kB
Private_Dirty:48 kB
Referenced:   48 kB
Anonymous:48 kB
LazyFree:  0 kB
AnonHugePages: 0 kB
ShmemPmdMapped:0 kB
Shared_Hugetlb:0 kB
Private_Hugetlb:   0 kB
Swap:  0 kB
SwapPss:   0 kB
Locked:   48 kB
THPeligible:0
VmFlags: rd wr mr mw me lo ac
014b5000-014f8000 rw-p  00:00 0  [heap]
Size:268 kB
KernelPageSize:4 kB
MMUPageSize:   4 kB
Rss: 268 kB
Pss: 268 kB
Shared_Clean:  0 kB
Shared_Dirty:  0 kB
Private_Clean: 0 kB
Private_Dirty:   268 kB
Referenced:  268 kB
Anonymous:   268 kB
LazyFree:  0 kB
AnonHugePages: 0 kB
ShmemPmdMapped:0 kB
Shared_Hugetlb:0 kB
Private_Hugetlb:   0 kB
Swap:  0 kB
SwapPss:   0 kB
Locked:  268 kB
THPeligible:0
VmFlags: rd wr mr mw me lo ac

Mit besten Grüßen / Kind regards

NORBERT LANGE







RE: rtnet not locating ethercat slaves

2020-10-16 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Xenomai  On Behalf Of John Ho via
> Xenomai
> Sent: Donnerstag, 15. Oktober 2020 16:37
> To: xenomai@xenomai.org
> Subject: rtnet not locating ethercat slaves
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> Hi all, thank you so much for the help so far with the setting up of xenomai,
> however I am still stuck with rtnet. Am i missing something to find the 6 
> different
> slaves on the robot?
> I thought that by modifying the tdma.conf, i will at least be able to get a
> response from the ip of the slave but till now i have no success.
> I have also tried changing the ipaddress, but decided to revert back to 
> default to
> see if i am making a mistake there.
> And my computer have a tendency to freeze after i start rtnet and check the
> rtroute.
>
> Will appreciate any help i can get with regards to this matter.
>
> Please do correct me if i am making a mistake.
>
> after loading into xenomai kernel, I do the following commands :
>
> sudo ifconfig eth0 down
>
> sudo rmmod e1000e
>
> cd /usr/xenomai
>
> sudo ./rtnet start
>
> waiting for slaves.
>
> sudo ./rtifconfig
>
> rteth0Medium: Ethernet  Hardware address: 1C:69:7A:0F:04:42
>   IP address: 10.0.0.1  Broadcast address: 10.255.255.255
>   UP BROADCAST  MTU: 1500
>
> rtlo  Medium: Local Loopback
>   IP address: 127.0.0.1
>   UP LOOPBACK RUNNING  MTU: 1500
>
> sudo ./rtroute
>
> Host Routing Table
> Hash Destination HW Address Device
> 00 0.0.0.0   00:00:00:00:00:00 rtlo
> 01 10.0.0.1   00:00:00:00:00:00 rtlo
> 01 127.0.0.1   00:00:00:00:00:00 rtlo
> 3F 10.255.255.255 FF:FF:FF:FF:FF:FF rteth0
>
> dmesg
> [  566.798184] TDMA: Failed to transmit sync frame!
>
> cat rtnet.conf
>
> # This file is usually located in /etc/rtnet.conf # Please adapt it 
> to your
> system.
> # This configuration file is used with the rtnet script.
>
> # RTnet installation path
> prefix="/usr/xenomai"
> exec_prefix="${prefix}"
> RTNET_MOD="/lib/modules/`uname -r`/kernel/drivers/xenomai/net"
> RTIFCONFIG="${exec_prefix}/sbin/rtifconfig"
> RTCFG="${exec_prefix}/sbin/rtcfg"
> TDMACFG="${exec_prefix}/sbin/tdmacfg"
>
> # Module suffix: ".o" for 2.4 kernels, ".ko" for later versions 
> MODULE_EXT=".ko"
>
>
>
> # RT-NIC driver
> RT_DRIVER="rt_e1000e"
> RT_DRIVER_OPTIONS=""
>
> # PCI addresses of RT-NICs to claim (format: :00:00.0)
> #   If both Linux and RTnet drivers for the same hardware are loaded, this
> #   list instructs the start script to rebind the given PCI devices,
> detaching
> #   from their Linux driver, attaching it to the RT driver above. Example:
> #   REBIND_RT_NICS=":00:19.0 :01:1d.1"
> REBIND_RT_NICS=":00:1f.6"
>
> # IP address and netmask of this station
> #   The TDMA_CONFIG file overrides these parameters for masters and backup
> #   masters. Leave blank if you do not use IP addresses or if this station
> is
> #   intended to retrieve its IP from the master based on its MAC address.
> IPADDR="10.0.0.1"
> NETMASK="255.255.255.0"
>
> # Start realtime loopback device ("yes" or "no") RT_LOOPBACK="yes"
>
> # Use the following RTnet protocol drivers RT_PROTOCOLS="udp packet"
>
> # Start capturing interface ("yes" or "no") RTCAP="yes"
>
>
>
> # Common RTcfg stage 2 config data (master mode only)
> #   The TDMA_CONFIG file overrides this parameter.
> STAGE_2_SRC=""
>
> # Stage 2 config data destination file (slave mode only) STAGE_2_DST=""
>
> # Command to be executed after stage 2 phase (slave mode only)
> STAGE_2_CMDS=""
>
> # TDMA mode of the station ("master" or "slave")
> #   Start backup masters in slave mode, it will then be switched to master
> #   mode automatically during startup.
> TDMA_MODE="master"
>
> # Master parameters
>
> # Simple setup: List of TDMA slaves
> #TDMA_SLAVES="10.0.0.2"
>
> # Simple setup: Cycle time in microsecond #TDMA_CYCLE="5000"
>
> # Simple setup: Offset in microsecond between TDMA slots
> #TDMA_OFFSET="200"
>
> # Advanced setup: Config file containing all TDMA station parameters
> #   To use this mode, uncomment the following line and disable the
> #   three master parameters above (SLAVES, CYCLE, and OFFSET).
> TDMA_CONFIG="${prefix}/etc/tdma.conf"
>
>
> cat tdma.conf
>
> #
> # Examplary TDMA configuration file
> #
>
> # Primary master
>
> master:
> ip 10.0.0.1
> cycle 5000
> slot 0 0
> slot 1 1000
>
>
> # Backup master
> #  Cycle is defined by the primary master
>
> backup-master:
> ip 10.0.0.2
> backup-offset 200
> slot 0 400
>
>
> # Slave A
> #  MAC is unknown, slave will be pre-configured to the given IP
>
> slave:
> ip 10.0.0.3
> slot 0 2000
> slot 1 2200 1/2
>
>
> # Slave B
> #  IP is assigned to the slave via its known MAC address
>
> slave:
> ip 10.0.0.4
> mac 00:12:34:56:AA:FF
> slot 0 2400
> slot 1 2200 2/2

RTnet TDMA is its own protocol, and your EtherCAT slaves aren't speaking it
(unless you implement it there, but that goes against the "pipe-through"
architecture of EtherCAT).
If you want to use

RE: [PATCH] libs: Add linking dependencies to libcopperplate and libsmokey

2020-09-15 Thread Lange Norbert via Xenomai
On Debian, dh_shlibdeps (dpkg-shlibdeps) should be able to diagnose this as well.

Norbert

> -Original Message-
> From: Xenomai  On Behalf Of Jan Kiszka
> via Xenomai
> Sent: Dienstag, 15. September 2020 12:39
> To: Vitaly Chikunov ; Xenomai 
> Subject: Re: [PATCH] libs: Add linking dependencies to libcopperplate and
> libsmokey
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 14.09.20 23:52, Vitaly Chikunov wrote:
> > ALT rpmbuild QA script detects that libsmokey and libcopperplate is
> > improperly linked (when built for Mercury core).
> >
> > Excerpt from the error message:
> >verify-elf: ERROR: ./usr/lib64/libsmokey.so.0.0.0: undefined symbol:
> get_mem_size
> >verify-elf: ERROR: ./usr/lib64/libcopperplate.so.0.0.0: undefined
> > symbol: get_mem_size
> >
>
> Could such a test be added to our regular build, or is it depending on rpm or
> altlinux?
>
> > Signed-off-by: Vitaly Chikunov 
> > ---
> >   lib/copperplate/Makefile.am | 3 ++-
> >   lib/smokey/Makefile.am  | 1 +
> >   2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/copperplate/Makefile.am b/lib/copperplate/Makefile.am
> > index f832d1c89..c7104eb02 100644
> > --- a/lib/copperplate/Makefile.am
> > +++ b/lib/copperplate/Makefile.am
> > @@ -2,7 +2,8 @@
> >   lib_LTLIBRARIES = libcopperplate.la
> >
> >   libcopperplate_la_LDFLAGS = @XENO_LIB_LDFLAGS@ -lpthread -lrt
> > -version-info 0:0:0 -libcopperplate_la_LIBADD =
> > +libcopperplate_la_LIBADD = @XENO_CORE_LDADD@
> > +
> >   noinst_LTLIBRARIES =
> >
> >   libcopperplate_la_SOURCES = \
> > diff --git a/lib/smokey/Makefile.am b/lib/smokey/Makefile.am index
> > 53c775c68..4ecae1f16 100644
> > --- a/lib/smokey/Makefile.am
> > +++ b/lib/smokey/Makefile.am
> > @@ -1,6 +1,7 @@
> >   lib_LTLIBRARIES = libsmokey.la
> >
> >   libsmokey_la_LDFLAGS = @XENO_LIB_LDFLAGS@ -version-info 0:0:0
> > +libsmokey_la_LIBADD = @XENO_CORE_LDADD@
> >
> >   libsmokey_la_SOURCES =  \
> >   helpers.c   \
> >
>
> Thanks, applied to next.
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate
> Competence Center Embedded Linux






RE: malloc and stl container

2020-07-27 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Xenomai  On Behalf Of Jan Holtz via
> Xenomai
> Sent: Montag, 20. Juli 2020 07:05
> To: xenomai@xenomai.org
> Subject: malloc and stl container
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
>Hello,
>i am using xenomai 3.0 on a cobalt x64 smp system with the alchemy
>skin.
>As far i know dynamic memory allocation can cause a context switch,
>which should be preventet.
>If i am right,malloc should not cause a CSW anymore at this xenomai
>version.

I don't see where you get this from. malloc might not cause a CSW every time,
but it has the potential to do so at *any* time.

>If malloc don't cause a CSW,  in which case the alchemy heap management
>services is suggested to use instead for dynamic memory allocation?
>I like to use stl containers like vector, etc.
>Is there a way to use this containers wthout adding values to
>containers seems to cause a CSW.

A generic malloc will always have properties that are unwanted for
real time. When do you need to resize vectors?
If you have a part of your system not running in RT, you can resize your vectors
there,
or you can pre-allocate them to be big enough so they won't ever call malloc.

>Can it work/wrapped with heap management services ?

The STL? Sure, you can use your own allocators (google: pool allocator), or 
something like boost::static_vector.
You still need to make sure you don’t cause allocations when in RT.
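
(To illustrate the pre-allocation idea in plain C terms; with the STL the
equivalent is reserve() or a pool allocator as mentioned above. The point is to
allocate the backing storage once outside the RT path and never grow it from RT
code:)

#include <stdlib.h>

struct sample { double value; long seq; };

/* Fixed-capacity buffer, sized for the worst case up front. */
static struct sample *samples;
static size_t sample_cap, sample_len;

/* Call before entering the RT path; the only place that allocates. */
int samples_init(size_t worst_case)
{
        samples = malloc(worst_case * sizeof(*samples));
        if (samples == NULL)
                return -1;
        sample_cap = worst_case;
        sample_len = 0;
        return 0;
}

/* Safe to call from the RT path: never allocates, never grows. */
int samples_push(struct sample s)
{
        if (sample_len == sample_cap)
                return -1;
        samples[sample_len++] = s;
        return 0;
}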

Norbert

>Hope you can help me to understand.
>Regards Jan





RE: Are there some methods that could limit how much CPU resources could be a single Xenomai process or thread?

2020-07-20 Thread Lange Norbert via Xenomai
Hello,

I am reconnecting the ML.


I am not aware of any good documentation for SCHED_TP,
but there is an example in smokey/sched-tp which I'd use as a starting point.

I don't think SCHED_TP will measurably affect latency, outside of course
the case where it is "by design" (the process needs to wait for its timeslice).
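
(A rough sketch of the application side, assuming the Xenomai 3 cobalt
extensions pthread_setschedparam_ex() and the sched_tp_partition field of
struct sched_param_ex as documented; the global time frame itself still has to
be installed separately, e.g. the way smokey/sched-tp does it:)

#include <pthread.h>    /* built against libcobalt: xeno-config --posix --cflags/--ldflags */
#include <sched.h>
#include <string.h>

/* Attach an already created thread to one partition of the TP frame. */
static int join_partition(pthread_t thread, int partition)
{
        struct sched_param_ex param;

        memset(&param, 0, sizeof(param));
        param.sched_priority = 10;              /* FIFO priority inside the partition */
        param.sched_tp_partition = partition;   /* window of the global time frame */

        return pthread_setschedparam_ex(thread, SCHED_TP, &param);
}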

Norbert

From: 孙世龙 sunshilong 
Sent: Samstag, 18. Juli 2020 07:51
To: Lange Norbert 
Cc: Meng, Fino 
Subject: Re: Are there some methods that could limit how much CPU resources 
could be a single Xenomai process or thread?

Hi, Norbert

Thank you for the clarification.

>You can do something similar with the  temporal partitioning scheduler 
>(SCHED_TP),
>cgroups uses a similar concept of "time-slices", but is less strict AFAIU
Does SCHED_TP enlarge the latency(compared to SCHED_FIFO)?

Do you have more information about SCHED_TP?  If the answer is yes,
could you please suggest some documents for me to go through?

I searched all the source code of the Xenomai project and checked
the help information from the Kconfig, but no useful information about
SCHED_TP was found.
I googled it, only found this:
The SCHED_TP policy divides the scheduling time into a recurring
global frame, which is itself divided into an arbitrary number of time
partitions. Only threads assigned to the current partition are deemed
runnable, and scheduled according to a FIFO-based rule within this
partition. When completed, the current partition is advanced
automatically to the next one by the scheduler, and the global time
frame recurs from the first partition defined, when the last partition
has ended.

Thank you for your attention to this matter.
Best Regards.
Sunshilong(孙世龙)
On Fri, Jul 17, 2020 at 6:44 PM Lange Norbert 
mailto:norbert.la...@andritz.com>> wrote:


> -Original Message-
> From: Xenomai 
> mailto:xenomai-boun...@xenomai.org>> On Behalf 
> Of ???
> sunshilong via Xenomai
> Sent: Freitag, 17. Juli 2020 12:18
> To: Meng, Fino mailto:fino.m...@intel.com>>
> Cc: Xenomai (xenomai@xenomai.org) 
> mailto:xenomai@xenomai.org>>
> Subject: Re: Are there some methods that could limit how much CPU
> resources could be a single Xenomai process or thread?
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> Hi, 孟祥夫
> Thank you for taking the time to respond to my question.
>
> >In my understanding cgroup's design is exclusionary with real-
> time/deterministic/time coordinate design.
> >The latency/jitter is already down to 20us level,  how it can endure cgroup's
> volatility.
> I don't hold much hope, either. But I am not sure whether it's impossible to
> achieve this goal or not.

You can do something similar with the  temporal partitioning scheduler 
(SCHED_TP),
cgroups uses a similar concept of "time-slices", but is less strict AFAIU

Norbert






RE: xenomai.supported_cpus not working as intended?

2020-07-14 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Montag, 13. Juli 2020 19:52
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: xenomai.supported_cpus not working as intended?
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 13.07.20 15:37, Lange Norbert via Xenomai wrote:
> > Hello,
> >
> > I am using Xenomai 3.1 and I tried once more to tie Linux to Core0, and RT
> to the remaining Cores.
> > It seems that both Linux and Xenomai favor Core0, as rtnet-stack rtnet-rtpc
> seem to always run on that.
> > Network drivers will process the IRQs on all cores, but realistally all are
> handles at Core0.
> >
> > I tries the commandline xenomai.supported_cpus=0xFFFE, but this doesn’t
> seem to work right.
> >
> > I get stuck Xenomai processes, and entries in dmesg (both with and
> without the isolcpus parameter).
> > Kernel cmdline is like this: irqaffinity=0 xenomai.smi=disabled
> > isolcpus=1-3 xenomai.supported_cpus=0xFFFE
> >
> > Running a Xenomai Process looks like this:
> >
> > # /usr/xenomai/bin/clocktest
> > [  191.759690] [Xenomai] thread clocktest[575] switched to non-rt CPU0,
> aborted.
> > == Testing built-in CLOCK_REALTIME (0)
> > CPU  ToD offset [us] ToD drift [us/s]  warps max delta [us]
> > ---   -- --
> >0  0.00.000  00.0
> >1   -1450249.3  -36.240  00.0
> >2   -1450249.8  -36.040  00.0
> >3   -1450249.4  -35.292  00.0
> >
> > And similar stuff happens when booting, Kernel dmesg is following.
> >
> > So I don’t see how xenomai.supported_cpus can be safely used, It more
> > or less works by chance for me (processes not getting stuck)
>
> IIRC, we had troubles with starting off RTDM kernel tasks (like RTnet
> uses) when supported_cpus is used. There is no good control over where a
> kernel task comes up, specifically when it started by a built-in driver.
>
> Or are you seeing issues even without RTnet in the picture?

I guess clocktest is not really RTnet. Or would I need a kernel with no traces 
of  RTnet?

>
> cpus_supported is one of those "would be nice but usually broken when
> needed" feature because we do not consistently test it.

I have repeatedly been told that locking Linux to some cores and RT to others is a good
idea.
Last time it was an MMC driver that ran into a timeout as it got preempted ( =
no I/O on the rootfs anymore).
That seems pretty much needed, rather than an optional goodie, to me.

I can bind Linux to Core0 and I can bind my RT task to other cores, but the RT drivers'
IRQs and the rtnet stack still remain there.
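
(For reference, the user-space half of that binding is just a CPU affinity mask
on the thread attributes; a sketch assuming cores 1-3 are the RT cores, matching
the isolcpus=1-3 setup quoted above:)

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

static int spawn_rt_thread(pthread_t *tid, void *(*fn)(void *), void *arg)
{
        pthread_attr_t attr;
        cpu_set_t cpus;
        int cpu, ret;

        CPU_ZERO(&cpus);
        for (cpu = 1; cpu <= 3; cpu++)          /* keep CPU0 for Linux and its IRQs */
                CPU_SET(cpu, &cpus);

        pthread_attr_init(&attr);
        ret = pthread_attr_setaffinity_np(&attr, sizeof(cpus), &cpus);
        if (ret == 0)
                ret = pthread_create(tid, &attr, fn, arg);
        pthread_attr_destroy(&attr);
        return ret;
}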

Norbert





xenomai.supported_cpus not working as intended?

2020-07-13 Thread Lange Norbert via Xenomai
Hello,

I am using Xenomai 3.1 and I tried once more to tie Linux to Core0 and RT to
the remaining cores.
It seems that both Linux and Xenomai favor Core0, as rtnet-stack and rtnet-rtpc
seem to always run on it.
Network drivers will process the IRQs on all cores, but realistically all are
handled on Core0.

I tried the command line xenomai.supported_cpus=0xFFFE, but this doesn't seem to
work right.

I get stuck Xenomai processes, and entries in dmesg (both with and without the 
isolcpus parameter).
Kernel cmdline is like this: irqaffinity=0 xenomai.smi=disabled isolcpus=1-3 
xenomai.supported_cpus=0xFFFE

Running a Xenomai Process looks like this:

# /usr/xenomai/bin/clocktest
[  191.759690] [Xenomai] thread clocktest[575] switched to non-rt CPU0, aborted.
== Testing built-in CLOCK_REALTIME (0)
CPU  ToD offset [us] ToD drift [us/s]  warps max delta [us]
---   -- --
  0  0.00.000  00.0
  1   -1450249.3  -36.240  00.0
  2   -1450249.8  -36.040  00.0
  3   -1450249.4  -35.292  00.0

And similar stuff happens when booting, Kernel dmesg is following.

So I don't see how xenomai.supported_cpus can be safely used;
it more or less works by chance for me (processes not getting stuck).

Kind regards,
Norbert

[2.284754] clocksource: tsc: mask: 0x max_cycles: 
0x16f8873b2b5, max_idle_ns: 440795270785 ns
[2.294855] clocksource: Switched to clocksource tsc
[2.307865] [Xenomai] scheduling class idle registered.
[2.313104] [Xenomai] scheduling class tp registered.
[2.318242] [Xenomai] scheduling class rt registered.
[2.323445] I-pipe: head domain Xenomai registered.
[2.330836] [ cut here ]
[2.335675] WARNING: CPU: 0 PID: 1 at kernel/xenomai/timer.c:509 
__xntimer_migrate.cold+0xc/0x14
[2.344470] Modules linked in:
[2.345470] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
4.19.124-cip27-xeno13-static #2
[2.345470] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.21.18 04/12/2019
[2.345470] I-pipe domain: Linux
[2.345470] RIP: 0010:__xntimer_migrate.cold+0xc/0x14
[2.345470] Code: 24 e8 ac 2f 5c 00 48 8b 3c 24 48 8d 50 f0 48 85 c0 48 0f 
45 c2 49 89 46 08 e9 42 ff ff ff 48 c7 c7 f0 2a e0 9d e8 bc fd f3 ff <0f> 0b e9 
80 f1 ff ff 90 e8 0b 45 69 00 41 55 49 89 f5 41 54 49 89
[2.345470] RSP: 0018:b6cf0004bd20 EFLAGS: 00010246
[2.345470] RAX: 0024 RBX: a038bba30450 RCX: 019c
[2.345470] RDX:  RSI:  RDI: 
[2.345470] RBP: a038bba30450 R08: 019c R09: 0001
[2.345470] R10: dc8dc5e993c0 R11: 0300 R12: a038bbab0380
[2.345470] R13: b6cf0004be18 R14:  R15: 
[2.345470] FS:  () GS:a038bba0() 
knlGS:
[2.345470] CS:  0010 DS:  ES:  CR0: 80050033
[2.345470] CR2: 0044ecd3 CR3: 00012f608000 CR4: 003406f0
[2.345470] Call Trace:
[2.345470]  __xntimer_init+0x5c/0x210
[2.345470]  xnsched_tp_init+0xa8/0xde
[2.345470]  xnsched_init+0x9d/0x2a0
[2.345470]  ? vsnprintf+0x2d7/0x480
[2.345470]  ? xnheap_set_name+0x63/0x90
[2.345470]  xnsched_init_all+0x2a/0x70
[2.345470]  ? xnclock_init+0x42/0x42
[2.345470]  xenomai_init+0x32e/0x452
[2.345470]  do_one_initcall+0x4c/0x1d6
[2.345470]  kernel_init_freeable+0x12a/0x1b2
[2.345470]  ? rest_init+0x9a/0x9a
[2.345470]  kernel_init+0xa/0xf2
[2.345470]  ret_from_fork+0x20/0x30
[2.345470] ---[ end trace a8efbd6dc99f41b6 ]---
[2.509099] [ cut here ]
[2.513797] WARNING: CPU: 0 PID: 1 at kernel/xenomai/timer.c:509 
__xntimer_migrate.cold+0xc/0x14
[2.519076] Modules linked in:
[2.519076] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 
4.19.124-cip27-xeno13-static #2
[2.519076] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.21.18 04/12/2019
[2.519076] I-pipe domain: Linux
[2.519076] RIP: 0010:__xntimer_migrate.cold+0xc/0x14
[2.519076] Code: 24 e8 ac 2f 5c 00 48 8b 3c 24 48 8d 50 f0 48 85 c0 48 0f 
45 c2 49 89 46 08 e9 42 ff ff ff 48 c7 c7 f0 2a e0 9d e8 bc fd f3 ff <0f> 0b e9 
80 f1 ff ff 90 e8 0b 45 69 00 41 55 49 89 f5 41 54 49 89
[2.519076] RSP: 0018:b6cf0004bcd0 EFLAGS: 00010246
[2.519076] RAX: 0024 RBX: a038bba31970 RCX: 01bc
[2.519076] RDX:  RSI:  RDI: 
[2.519076] RBP: a038bba31970 R08: 01bc R09: 0005
[2.519076] R10:  R11: 9e69960d R12: a038bbab0380
[2.519076] R13: a038bba30700 R14: a038bba

RE: FW: Xenomai with isolcpus and workqueue task

2020-07-13 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Alexander Frolov 
> Sent: Montag, 13. Juli 2020 13:09
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: FW: Xenomai with isolcpus and workqueue task
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 7/13/20 1:48 PM, Lange Norbert wrote:
> >
> >> -Original Message-
> >> From: Xenomai  On Behalf Of
> Alexander
> >> Frolov via Xenomai
> >> Sent: Montag, 13. Juli 2020 12:27
> >> To: xenomai@xenomai.org
> >> Subject: Re: FW: Xenomai with isolcpus and workqueue task
> >>
> >> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
> >>
> >>
> >> -Original Message-
>  From: Lange Norbert
>  Sent: Montag, 13. Juli 2020 10:34
>  To: Alexander Frolov 
>  Subject: RE: Xenomai with isolcpus and workqueue task
> 
> 
> 
> > -Original Message-
> > From: Xenomai  On Behalf Of
> >> Alexander
> > Frolov via Xenomai
> > Sent: Samstag, 11. Juli 2020 16:26
> > To: xenomai@xenomai.org
> > Subject: Xenomai with isolcpus and workqueue task
> >
> > NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
>  ATTACHMENTS.
> > Hi all!
> >
> > I am using Xenomai 3.1 with 4.19.124 I-pipe patchon a smp
> motherboard.
> > For my RT task I allocate few CPU cores with isolcpus option.
> > However, large latency spikes are noticed due to igb watchdog
> > activities (I am using common igb driver, not rt_igb).
> >
> > Looking into igb sources, it was understood that workqueue is used
> > for some tasks (afaiu, it is used to link status
> > monitoring)
> >
> > from igb_main.c
> > ...
> >  INIT_WORK(&adapter->reset_task, igb_reset_task);
> >  INIT_WORK(&adapter->watchdog_task, igb_watchdog_task); ...
> >
> > The Linux kernel scheduler runs this igb activities on isolated CPUs
> > disregarding isolcpus option, ruining real-time system behavior.
>  isolcpus does not mean the CPUs aren't used, it means they are
>  excluded from the normal CPU scheduler. No process will automatically
>  be moved from/to isolated CPUs, but you still need to make sure to
>  free them of any tasks.
>  Irq-handlers still run anywhere, and processes still can allow those
>  CPUs to be used.
> 
> > So the question, is it a correct way to use normal igb on Xenomai at
> > all or it is not recommended? What can be done to prohibit Linux
> > scheduler to allocate those tasks on isolated cores?
>  I use the normal igb and rt_igb concurrently, I doubt it is
>  recommended but possible ;)
> 
>  You should add irqaffinity=0 to the cmdline (CPU0 is apparently
>  always used for irqs), then check 'cat /proc/irq/*/smp_affinity'.
>  This keeps the other CPUs free from linux IRQs.
>  You can use some measures to bind Linux tasks to CPU0 aswell. One of:
> 
>  -   isolcpus (sets default affinity mask aswell)
>  -   set affinity early (like in Ramdisk)
>  -   Use cgroups (cset-shield)
> 
>  Only cgroups actually prohibit processes ignoring your defaults and
>  using other CPUs, I did not get around playing with this, and just use
> >> isolcpus.
>  But the most important part is to dont run RT on cores dealing with
>  Linux interrupts, some handlers/drivers don’t expects being
>  preempted, had the MMC driver bail because of a timeout.
> 
>  I haven’t solved moving the rtnet-stack, rtnet-rpc off CPU0, and the
>  rt_igb IRQs will use all CPUs.
> 
>  Norbert
> >> Thank you! Using an IRQ affinity feature to move handlers to specified
> cores
> >> is very practical, but in this case we experience problem with another
> >> artefakt of igb.
> >>
> >> Just as example of influence of igb activity (igb_watchdog_task) on CPU4
> >> (which is an isolated one).
> > Hmm, I dont have that task.
> > You can use taskset -p 0  to change affinity,
> > If not pretty but should work.
> >
> > (isolcpus doesn’t work the way you think, it only affects CPU migration)
>
>   Not sure, that I can find out , cause it can be
> kworker
>   thread which takes this task to execution. I can try to move all kworkers to
> general-
>   purposed cores, which looks a bit crazy.

Ok, I missed the general workqueue part. No, moving kworkers is likely dangerous.
Still, I wonder why it is picking core 4; maybe binding the IRQs to core #0 will
indirectly affect that.

>
> >
> >> # cat /proc/ipipe/trace/frozen | grep '\!'
> >> ...
> >> :  +func   -1231!  52.379  igb_rd32+0x0 [igb]
> >> (igb_update_stats+0x520 [igb])
> >> :  +func   -1145!  45.864  igb_rd32+0x0 [igb]
> >> (igb_update_stats+0x536 [igb])
> >> :  +func   -1099!  51.917  igb_rd32+0x0 [igb]
> >> (igb_update_stats+0x75a [igb])
> >> :  +func   -1047!  51.517  igb_rd32+0x0 [igb]
> >> (igb_update_stats+0x782 [igb])
> >> :  +func

RE: FW: Xenomai with isolcpus and workqueue task

2020-07-13 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Xenomai  On Behalf Of Alexander
> Frolov via Xenomai
> Sent: Montag, 13. Juli 2020 12:27
> To: xenomai@xenomai.org
> Subject: Re: FW: Xenomai with isolcpus and workqueue task
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> -Original Message-
> >> From: Lange Norbert
> >> Sent: Montag, 13. Juli 2020 10:34
> >> To: Alexander Frolov 
> >> Subject: RE: Xenomai with isolcpus and workqueue task
> >>
> >>
> >>
> >>> -Original Message-
> >>> From: Xenomai  On Behalf Of
> Alexander
> >>> Frolov via Xenomai
> >>> Sent: Samstag, 11. Juli 2020 16:26
> >>> To: xenomai@xenomai.org
> >>> Subject: Xenomai with isolcpus and workqueue task
> >>>
> >>> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> >> ATTACHMENTS.
> >>>
> >>> Hi all!
> >>>
> >>> I am using Xenomai 3.1 with 4.19.124 I-pipe patchon a smp motherboard.
> >>> For my RT task I allocate few CPU cores with isolcpus option.
> >>> However, large latency spikes are noticed due to igb watchdog
> >>> activities (I am using common igb driver, not rt_igb).
> >>>
> >>> Looking into igb sources, it was understood that workqueue is used
> >>> for some tasks (afaiu, it is used to link status
> >>> monitoring)
> >>>
> >>> from igb_main.c
> >>> ...
> >>> INIT_WORK(&adapter->reset_task, igb_reset_task);
> >>> INIT_WORK(&adapter->watchdog_task, igb_watchdog_task); ...
> >>>
> >>> The Linux kernel scheduler runs this igb activities on isolated CPUs
> >>> disregarding isolcpus option, ruining real-time system behavior.
> >> isolcpus does not mean the CPUs aren't used, it means they are
> >> excluded from the normal CPU scheduler. No process will automatically
> >> be moved from/to isolated CPUs, but you still need to make sure to
> >> free them of any tasks.
> >> Irq-handlers still run anywhere, and processes still can allow those
> >> CPUs to be used.
> >>
> >>> So the question, is it a correct way to use normal igb on Xenomai at
> >>> all or it is not recommended? What can be done to prohibit Linux
> >>> scheduler to allocate those tasks on isolated cores?
> >> I use the normal igb and rt_igb concurrently, I doubt it is
> >> recommended but possible ;)
> >>
> >> You should add irqaffinity=0 to the cmdline (CPU0 is apparently
> >> always used for irqs), then check 'cat /proc/irq/*/smp_affinity'.
> >> This keeps the other CPUs free from linux IRQs.
> >> You can use some measures to bind Linux tasks to CPU0 aswell. One of:
> >>
> >> -   isolcpus (sets default affinity mask aswell)
> >> -   set affinity early (like in Ramdisk)
> >> -   Use cgroups (cset-shield)
> >>
> >> Only cgroups actually prohibit processes ignoring your defaults and
> >> using other CPUs, I did not get around playing with this, and just use
> isolcpus.
> >>
> >> But the most important part is to dont run RT on cores dealing with
> >> Linux interrupts, some handlers/drivers don’t expects being
> >> preempted, had the MMC driver bail because of a timeout.
> >>
> >> I haven’t solved moving the rtnet-stack, rtnet-rpc off CPU0, and the
> >> rt_igb IRQs will use all CPUs.
> >>
> >> Norbert
> Thank you! Using an IRQ affinity feature to move handlers to specified cores
> is very practical, but in this case we experience problem with another
> artefakt of igb.
>
> Just as example of influence of igb activity (igb_watchdog_task) on CPU4
> (which is an isolated one).

Hmm, I don't have that task.
You can use taskset -p 0  to change affinity;
it's not pretty but should work.

(isolcpus doesn't work the way you think, it only affects CPU migration)
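
If you'd rather do it programmatically than via taskset, a minimal sketch
(the helper name is made up and error handling kept short) that uses
sched_setaffinity() to pin a stray Linux task onto core 0 could look like this:

/* Sketch: pin a plain Linux task to CPU0 so it stays off the RT cores.
 * The PID is whatever stray task you found; error handling is minimal. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

static int pin_to_cpu0(pid_t pid)
{
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(0, &set);                       /* allow CPU0 only */
        if (sched_setaffinity(pid, sizeof(set), &set)) {
                perror("sched_setaffinity");
                return -1;
        }
        return 0;
}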

>
> # cat /proc/ipipe/trace/frozen | grep '\!'
> ...
> :  +func   -1231!  52.379  igb_rd32+0x0 [igb]
> (igb_update_stats+0x520 [igb])
> :  +func   -1145!  45.864  igb_rd32+0x0 [igb]
> (igb_update_stats+0x536 [igb])
> :  +func   -1099!  51.917  igb_rd32+0x0 [igb]
> (igb_update_stats+0x75a [igb])
> :  +func   -1047!  51.517  igb_rd32+0x0 [igb]
> (igb_update_stats+0x782 [igb])
> :  +func-996!  51.988  igb_rd32+0x0 [igb]
> (igb_update_stats+0x54e [igb])
> :  +func-944!  51.436  igb_rd32+0x0 [igb]
> (igb_update_stats+0x564 [igb])
> :  +func-893!  52.569  igb_rd32+0x0 [igb]
> (igb_update_stats+0x57a [igb])
> :  +func-840!  52.529  igb_rd32+0x0 [igb]
> (igb_update_stats+0x590 [igb])
> :  +func-787!  52.018  igb_rd32+0x0 [igb]
> (igb_update_stats+0x5a6 [igb])
> :  +func-735!  52.058  igb_rd32+0x0 [igb]
> (igb_update_stats+0x5bc [igb])
> :  +func-683!  51.497  igb_rd32+0x0 [igb]
> (igb_update_stats+0x5d2 [igb])
> :  +func-632!  51.436  igb_rd32+0x0 [igb]
> (igb_update_stats+0x5e8 [igb])
> :  +func-580!  51.416  igb_rd32+0x0 [igb]
> (igb_update_stats+0x5fe [igb])
> :  +func-529!  52.038  igb_rd32+0x0 [igb]
> (igb_update_stats+0x614 [igb])
> :  +func-477!  52.058  igb_rd3

FW: Xenomai with isolcpus and workqueue task

2020-07-13 Thread Lange Norbert via Xenomai
(fwd to list)

> -Original Message-
> From: Lange Norbert
> Sent: Montag, 13. Juli 2020 10:34
> To: Alexander Frolov 
> Subject: RE: Xenomai with isolcpus and workqueue task
>
>
>
> > -Original Message-
> > From: Xenomai  On Behalf Of Alexander
> > Frolov via Xenomai
> > Sent: Samstag, 11. Juli 2020 16:26
> > To: xenomai@xenomai.org
> > Subject: Xenomai with isolcpus and workqueue task
> >
> > NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
> >
> >
> > Hi all!
> >
> > I am using Xenomai 3.1 with 4.19.124 I-pipe patchon a smp motherboard.
> > For my RT task I allocate few CPU cores with isolcpus option. However,
> > large latency spikes are noticed due to igb watchdog activities (I am
> > using common igb driver, not rt_igb).
> >
> > Looking into igb sources, it was understood that workqueue is used for
> > some tasks (afaiu, it is used to link status
> > monitoring)
> >
> > from igb_main.c
> > ...
> >INIT_WORK(&adapter->reset_task, igb_reset_task);
> >INIT_WORK(&adapter->watchdog_task, igb_watchdog_task); ...
> >
> > The Linux kernel scheduler runs this igb activities on isolated CPUs
> > disregarding isolcpus option, ruining real-time system behavior.
>
> isolcpus does not mean the CPUs aren't used, it means they are excluded
> from the normal CPU scheduler. No process will automatically be moved
> from/to isolated CPUs, but you still need to make sure to free them of any
> tasks.
> Irq-handlers still run anywhere, and processes still can allow those CPUs to
> be used.
>
> > So the question, is it a correct way to use normal igb on Xenomai at
> > all or it is not recommended? What can be done to prohibit Linux
> > scheduler to allocate those tasks on isolated cores?
>
> I use the normal igb and rt_igb concurrently, I doubt it is recommended but
> possible ;)
>
> You should add irqaffinity=0 to the cmdline (CPU0 is apparently always used
> for irqs), then check 'cat /proc/irq/*/smp_affinity'. This keeps the other
> CPUs free from linux IRQs.
> You can use some measures to bind Linux tasks to CPU0 aswell. One of:
>
> -   isolcpus (sets default affinity mask aswell)
> -   set affinity early (like in Ramdisk)
> -   Use cgroups (cset-shield)
>
> Only cgroups actually prohibit processes ignoring your defaults and using
> other CPUs, I did not get around playing with this, and just use isolcpus.
>
> But the most important part is to dont run RT on cores dealing with Linux
> interrupts, some handlers/drivers don’t expects being preempted, had the
> MMC driver bail because of a timeout.
>
> I haven’t solved moving the rtnet-stack, rtnet-rpc off CPU0, and the rt_igb
> IRQs will use all CPUs.
>
> Norbert





What is the purpose of corectl

2020-07-10 Thread Lange Norbert via Xenomai
Hello,

I just found the corectl tool (and the related syscall). After a stop, all 
threads previously using Cobalt services end up zombified,
apparently never getting reaped (the only useful thing to do then is to reboot, IMHO).

If this tool could shut down everything cleanly, I would run it before 
unloading modules like rtnet as a safety measure (unloading rtnet will just block 
if it is in use).


Mit besten Grüßen / Kind regards

NORBERT LANGE

AT-RD3

ANDRITZ HYDRO GmbH
Eibesbrunnergasse 20
1120 Vienna / AUSTRIA
p: +43 50805 56684
norbert.la...@andritz.com
andritz.com






RE: TSN support on xenomai

2020-06-16 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Xenomai  On Behalf Of Jan Kiszka
> via Xenomai
> Sent: Dienstag, 16. Juni 2020 10:58
> To: Stéphane Ancelot ; xenomai@xenomai.org
> Subject: Re: TSN support on xenomai
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 16.06.20 10:55, Stéphane Ancelot via Xenomai wrote:
> > Hi,
> >
> > This is natively supported in standard linux kernel
> >
>
> Right. What is missing it linking the Xenomai timebase with that of the kernel
> so that we would benefit from Linux doing the sync for us already.

But that means you can't use the synchronizing NIC with RTNet / Realtime 
traffic.

Regarding linking timebases, it would help (for various things)
if you could easily read those various timebases from the kernel. The rest can
happen in userspace.

I actually have three: Linux monotonic, Xenomai monotonic, and a PPS-synchronized 
IEEE 1588 clock.
I can't fold them into one, but I sometimes need to map between them.
I solved this by letting the NIC read out all three and providing them to (Xenomai) 
userspace.

Better would be a shared mmap (with 2 or more buffers), so that Linux tasks can 
asynchronously get that info too.
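
As an aside, the Linux-side read of the IEEE 1588 clock is simple enough; a rough
sketch for a plain (non-RT) program, where /dev/ptp0 is only an example for
whatever PHC belongs to the NIC, and FD_TO_CLOCKID is re-defined locally because
it is not exported in a uapi header:

/* Sketch: read the NIC's PTP hardware clock from a plain Linux (non-RT)
 * thread. /dev/ptp0 is an example path; pick the PHC of your NIC. */
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define CLOCKFD 3
#define FD_TO_CLOCKID(fd) ((~(clockid_t)(fd) << 3) | CLOCKFD)

int main(void)
{
        struct timespec ts;
        int fd = open("/dev/ptp0", O_RDONLY);

        if (fd < 0 || clock_gettime(FD_TO_CLOCKID(fd), &ts)) {
                perror("phc");
                return 1;
        }
        printf("phc: %lld.%09ld\n", (long long)ts.tv_sec, ts.tv_nsec);
        close(fd);
        return 0;
}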

Norbert






RE: Still getting Deadlocks with condition variables

2020-06-15 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Philippe Gerum 
> Sent: Montag, 15. Juni 2020 12:03
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) ;
> 'jan.kis...@siemens.com' 
> Subject: Re: Still getting Deadlocks with condition variables
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 6/15/20 11:06 AM, Lange Norbert wrote:
> >>
> >> This code does not take away any protection, on the contrary this
> >> ensures that PROT_EXEC is set for all stacks along with read and
> >> write access, which is glibc's default for the x86_64 architecture.
> >
> > I meant that it might have to do some non-atomic procedure, for
> > example when splitting up a continuous bigger mapping with the stack
> > in the middle, as the protection flags are now different.
> >
>
> We are talking about mprotect(), not mmap().

My bad, yes.

>
> >> The fault is likely due to mm code fixing up those protections for
> >> the relevant page(s). It looks like such pages are force faulted-in,
> >> which would explain the #PF, and the SIGXCPU notification as a
> >> consequence. These are minor faults in the MMU management sense, so
> >> this is transparent for common applications.
> >
> > I don’t know enough about the x86 (and don’t want to know), but this
> > needs some explanation. First, the DSOs don’t need executable stack
> > (build system did not care to add the .note.GNU-stack everywhere ), so
> > this specific issue can be worked around.
> >
> > -   I don’t understand why this is very timing sensitive, If page is marked 
> > to
> #PF (or removed)
> > Then this should fault predictable on the next access (I don’t
> > share data on stack that Linux threads could run into an #PF instead)
>
> Faulting is what it does. It is predictable and synchronous, you seem to be
> assuming that the fault is somehow async or proxied, it is not.

It does happen rather sparsely, affected by small changes in unrelated code;
loading a DSO (which requires an executable stack) might trigger a #PF or not.
Nothing but the respective RT thread is accessing its *private* stack, and the 
page fault happens at a callq.

If that stack access were to cause a #PF, then the code would run into it *every* 
time.
Yet it does not; it is rather hard to get this to reproduce.
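
A minimal test along those lines, to try to provoke it (only a sketch; the DSO
path, priority and period are placeholders, and it assumes the usual libcobalt
wrapping so pthread_create() with SCHED_FIFO yields a Cobalt thread):

/* Sketch of a reproducer: an RT thread with PTHREAD_WARNSW armed runs a
 * trivial periodic loop while the main (Linux) thread keeps loading and
 * unloading a DSO. "./libdemo.so" is a placeholder for a DSO that requires
 * an executable stack. */
#include <dlfcn.h>
#include <pthread.h>
#include <sched.h>
#include <time.h>

static void *rt_loop(void *arg)
{
        struct timespec period = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */

        (void)arg;
        pthread_setmode_np(0, PTHREAD_WARNSW, NULL);    /* SIGXCPU on demotion */
        for (;;)
                clock_nanosleep(CLOCK_MONOTONIC, 0, &period, NULL);
        return NULL;
}

int main(void)
{
        struct sched_param p = { .sched_priority = 80 };
        pthread_attr_t attr;
        pthread_t tid;

        pthread_attr_init(&attr);
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
        pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
        pthread_attr_setschedparam(&attr, &p);
        pthread_create(&tid, &attr, rt_loop, NULL);     /* Cobalt thread via libcobalt */

        for (;;) {                                      /* plain Linux side */
                void *h = dlopen("./libdemo.so", RTLD_NOW);
                if (h)
                        dlclose(h);
        }
        return 0;
}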

> > -   If that’s a non-atomic operation (perhaps only if the sparse tables need
> modification in a higher level), then I would expect
> > some sort of lazy locking (RCU?). Is this ending up in chaos as cores
> running Xenomai are "idle" for Linux, and pick up outdated data?
>
> I have no idea why you would bring up RCU in this picture, there is no
> convoluted aspect in what happens. There is no chaos, only a plain simple
> #PF event which unfortunately occurs as a result of running an apparently
> innocuous regular operation which is loading a DSO. The reason for the #PF
> can be explained, how it is dealt with is fine, the rt loop in your app just 
> does
> not like observing it for a legitimate reason.

I am asking a question: I assume page tables need to be reallocated under
some circumstances. I understand a #PF *has to happen* according to your 
explanation, and I don't know *where it happens if I don't observe the RT task 
switching*.

The faults are very timing sensitive.

So I could imagine the multilevel page-table map looking like this (I don't know 
how many levels x86 is using nowadays, but that's beside the point):
[first level] -> [second level] -> [stack mapping]

If the mprotect syscall changes just the *private* stack mapping, then the RT 
thread will always fault (not what I observe).
If the syscall modifies lower levels, then any thread can hit the #PF, and if 
it's not an RT thread then no #PF will be observed (by WARNSW).

This is my conjecture; if that's true, the question becomes:
-   under what circumstances can this appear?

>
> > -   Are/can such minor faults be handled in Xenomai? In other words is the
> WARNSW correct, or is
> > this actually just the check causing the issues?
> > Would it make sense to handle such minor faults in Xenomai (only
> demoting to Linux if necessary)?
> >
> >> Not for those of us who do not want the application code to run into
> >> any page fault unfortunately.
> >>
> >> Loading DSOs while the real-time system is running just proved to be
> >> a bad idea it seems (did not check how other *libc implementations
> >> behave on
> >> dlopen() though).
> >
> > glibc dlopens files on its own BTW, for nss plugins and encodings.
> > Practically than means you would need to check everything running (in
> > non-rt threads), for dlopen and various calls that could resolve names to
> uid/gid, do dns lookups, use iconv etc.
> >
>
> The glibc is fortunately not dlopening DSOs at every corner. You mention
> very specific features that would have to take place during the app init
> chores instead, or at the very least in a way which is synchronized with a
> quiescent state of the rt portion of the process.

RE: Still getting Deadlocks with condition variables

2020-06-15 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Philippe Gerum 
> Sent: Mittwoch, 10. Juni 2020 18:48
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) ;
> 'jan.kis...@siemens.com' 
> Subject: Re: Still getting Deadlocks with condition variables
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 6/9/20 7:10 PM, Lange Norbert wrote:
> >
> >
> >> -Original Message-
> >> From: Philippe Gerum 
> >> Sent: Montag, 8. Juni 2020 16:17
> >> To: Lange Norbert ; Xenomai
> >> (xenomai@xenomai.org) 
> >> Subject: Re: Still getting Deadlocks with condition variables
> >>
> >> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
> >>
> >>
> >> On 6/8/20 12:08 PM, Lange Norbert wrote:
> >>>
>  This kernel message tells a different story, thread pid 681 received
>  a #PF, maybe due to accessing its own stack (cond.c, line 316). This
>  may be a minor fault though, nothing invalid. Such fault is not
>  supposed to occur for Xenomai threads on x86, but that would be
>  another issue. Code-wise, I'm referring to the current state of the
>  master branch for lib/cobalt/cond.c, which seems to match your
> >> description.
> >>>
> >>> I dont know what you mean with minor fault, from the perspective of
> >> Linux?
> >>> A RT thread getting demoted to Linux is rather serious to me.
> >>>
> >>
> >> Minor from a MMU standpoint: the memory the CPU  dereferenced is
> valid
> >> but no page table entry currently maps it. So, #PF in this case seems to be
> a
> >> 'minor'
> >> fault in MMU lingo, but it is still not expected.
> >>
> >>> Also, the thing is that I would not know how a PF in the long running
> >>> thread, with locked memory, With the call being close to the thread
> >>> entry point in a wait-for-condvar-loop, never using more than an
> >>> insignificant> amount of stack at this time should be possible.>
> >>
> >> Except if mapping an executable segment via dlopen() comes into play,
> >> affecting the page table. Only an assumption at this stage.
> >>
> >>> On the other hand, the non-RT thread loads a DSO and is stuck
> somewhere
> >> after allocating memory.
> >>> My guess would be that the PF ends up at the wrong thread.
> >>>
> >>
> >> As Jan pointed out, #PF are synchronously taken, synchronously handled.
> I
> >> really don't see how #PF handling could ever wander.
> >>
> >
> 
>  You refer to an older post describing a lockup, but this post
>  describes an application crashing with a core dump. What made you
>  draw the conclusion that the same bug would be at work?
> >>>
> >>> Same bug, different PTHREAD_WARNSW setting is my guess.
> >>> The underlying issue that a unrelated signal ends up to a RT thread.
> >>>
>  Also, could you give some details
>  regarding the
>  following:
> 
>  - what do you mean by 'lockup' in this case? Can you still access the
>  board or is there some runaway real-time code locking out everything
>  else when this happens? My understanding is that this is no hard lock
>  up otherwise the watchdog would have triggered. If this is a softer
>  kind of lockup instead, what does /proc/xenomai/sched/stat tell you
>  about the thread states after the problem occurred?
> >>>
> >>> This was a post-mortem, no access to /proc/xenomai/sched/stat
> anymore.
> >>> lockup means deadlock (the thread getting the signal holds a mutex,
> >>> but is stuck), Coredump happens if PTHREAD_WARNSW is enabled
> (means
> >> it asserts out before).
> >>>
>  - did you determine that using the dynamic linker is required to
>  trigger the bug yet? Or could you observe it without such interaction
> with
> >> dl?
> >>>
> >>> AFAIK, always occurred at the stage where we load a "configuration",
> and
> >> load DSOs.
> >>>
> 
>  - what is the typical size of your Xenomai thread stack? It defaults
>  to 64k min with Xenomai 3.1.
> >>>
> >>> 1MB
> >>
> >> I would dig the following distinct issues:
> >>
> >> - why is #PF taken on an apparently innocuous instruction. dlopen(3)-
> >>> mmap(2) might be involved. With a simple test case, you could check the
> >> impact of loading/unloading DSOs on memory management for real-time
> >> threads running in parallel. Setting the WARNSW bit on for these threads
> >> would be required.
> >>
> >> - whether dealing with a signal adversely affects the wait-side of a
> Xenomai
> >> condvar. There is a specific trick to handle this in the Cobalt and 
> >> libcobalt
> >> code, which is the reason for the wait_prologue / wait_epilogue dance in
> the
> >> implementation IIRC. Understanding why that thread receives a signal in
> the
> >> first place would help too. According to your description, this may not be
> >> directly due to taking #PF, but may be an indirect consequence of that
> event
> >> on sibling threads (propagation of a debug condition of some sort, such as
> >> those detected by CONFIG_XENO_OPT_DEBUG_MUTEX*).
> >>
> >> At any rate, y

RE: Still getting Deadlocks with condition variables

2020-06-09 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Philippe Gerum 
> Sent: Montag, 8. Juni 2020 16:17
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: Still getting Deadlocks with condition variables
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 6/8/20 12:08 PM, Lange Norbert wrote:
> >
> >> This kernel message tells a different story, thread pid 681 received
> >> a #PF, maybe due to accessing its own stack (cond.c, line 316). This
> >> may be a minor fault though, nothing invalid. Such fault is not
> >> supposed to occur for Xenomai threads on x86, but that would be
> >> another issue. Code-wise, I'm referring to the current state of the
> >> master branch for lib/cobalt/cond.c, which seems to match your
> description.
> >
> > I dont know what you mean with minor fault, from the perspective of
> Linux?
> > A RT thread getting demoted to Linux is rather serious to me.
> >
>
> Minor from a MMU standpoint: the memory the CPU  dereferenced is valid
> but no page table entry currently maps it. So, #PF in this case seems to be a
> 'minor'
> fault in MMU lingo, but it is still not expected.
>
> > Also, the thing is that I would not know how a PF in the long running
> > thread, with locked memory, With the call being close to the thread
> > entry point in a wait-for-condvar-loop, never using more than an
> > insignificant> amount of stack at this time should be possible.>
>
> Except if mapping an executable segment via dlopen() comes into play,
> affecting the page table. Only an assumption at this stage.
>
> > On the other hand, the non-RT thread loads a DSO and is stuck somewhere
> after allocating memory.
> > My guess would be that the PF ends up at the wrong thread.
> >
>
> As Jan pointed out, #PF are synchronously taken, synchronously handled. I
> really don't see how #PF handling could ever wander.
>
> >>>
> >>
> >> You refer to an older post describing a lockup, but this post
> >> describes an application crashing with a core dump. What made you
> >> draw the conclusion that the same bug would be at work?
> >
> > Same bug, different PTHREAD_WARNSW setting is my guess.
> > The underlying issue that a unrelated signal ends up to a RT thread.
> >
> >> Also, could you give some details
> >> regarding the
> >> following:
> >>
> >> - what do you mean by 'lockup' in this case? Can you still access the
> >> board or is there some runaway real-time code locking out everything
> >> else when this happens? My understanding is that this is no hard lock
> >> up otherwise the watchdog would have triggered. If this is a softer
> >> kind of lockup instead, what does /proc/xenomai/sched/stat tell you
> >> about the thread states after the problem occurred?
> >
> > This was a post-mortem, no access to /proc/xenomai/sched/stat anymore.
> > lockup means deadlock (the thread getting the signal holds a mutex,
> > but is stuck), Coredump happens if PTHREAD_WARNSW is enabled (means
> it asserts out before).
> >
> >> - did you determine that using the dynamic linker is required to
> >> trigger the bug yet? Or could you observe it without such interaction with
> dl?
> >
> > AFAIK, always occurred at the stage where we load a "configuration", and
> load DSOs.
> >
> >>
> >> - what is the typical size of your Xenomai thread stack? It defaults
> >> to 64k min with Xenomai 3.1.
> >
> > 1MB
>
> I would dig the following distinct issues:
>
> - why is #PF taken on an apparently innocuous instruction. dlopen(3)-
> >mmap(2) might be involved. With a simple test case, you could check the
> impact of loading/unloading DSOs on memory management for real-time
> threads running in parallel. Setting the WARNSW bit on for these threads
> would be required.
>
> - whether dealing with a signal adversely affects the wait-side of a Xenomai
> condvar. There is a specific trick to handle this in the Cobalt and libcobalt
> code, which is the reason for the wait_prologue / wait_epilogue dance in the
> implementation IIRC. Understanding why that thread receives a signal in the
> first place would help too. According to your description, this may not be
> directly due to taking #PF, but may be an indirect consequence of that event
> on sibling threads (propagation of a debug condition of some sort, such as
> those detected by CONFIG_XENO_OPT_DEBUG_MUTEX*).
>
> At any rate, you may want to enable the function ftracer, enabling
> conditional snapshots, e.g. when SIGXCPU is sent by the cobalt core.
> Guesswork with such bug is unlikely to uncover every aspect of the issue,
> hard data would be required to go to the bottom of it. With a bit of luck, 
> that
> bug is not time-sensitive in a way that the overhead due to ftracing would
> paper over it.
>
> --
> Philippe.

This isn't exactly easy to reproduce; I managed to get something that often 
reproduces just now.
Tracing however hides the issue, as does disabling PTHREAD_WARNSW (it could be 
just that this changes the timing enough to make a difference).


I got a few ins

RE: Still getting Deadlocks with condition variables

2020-06-08 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Montag, 8. Juni 2020 12:09
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: Still getting Deadlocks with condition variables
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 08.06.20 11:48, Lange Norbert wrote:
> >
> >
> >> -Original Message-
> >> From: Jan Kiszka 
> >> Sent: Freitag, 5. Juni 2020 17:40
> >> To: Lange Norbert ; Xenomai
> >> (xenomai@xenomai.org) 
> >> Subject: Re: Still getting Deadlocks with condition variables
> >>
> >> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
> >>
> >>
> >> On 05.06.20 16:36, Lange Norbert via Xenomai wrote:
> >>> Hello,
> >>>
> >>> I brought this up once or twice at this ML [1], I am still getting
> >>> some occasional lockups. Now the first time without running under an
> >>> debugger,
> >>>
> >>> Harwdare is a TQMxE39M (Goldmont Atom)
> >>> Kernel: 4.19.124-cip27-xeno12-static x86_64 I-pipe Version: 12 Xenomai
> >>> Version: 3.1 Glibc Version 2.28
> >>>
> >>> What happens (as far as I understand it):
> >>>
> >>> The setup is an project with several cobalt threads (no "native" Linux
> >> thread as far as I can tell, apart maybe from the cobalt's printf thread).
> >>> They mostly sleep, and are triggered if work is available, the project
> >>> also can load DSOs (specialized maths) during configuration stage -
> >>> during this stages is when the exceptions occur
> >>>
> >>>
> >>> 1.   Linux Thread LWP 682 calls SYS_futex "wake"
> >>>
> >>> Code immediately before syscall, file x86_64/lowlevellock.S:
> >>> movl$0, (%rdi)
> >>> LOAD_FUTEX_WAKE (%esi)
> >>> movl$1, %edx/* Wake one thread.  */
> >>> movl$SYS_futex, %eax
> >>> syscall
> >>>
> >>> 2. Xenomai switches a cobalt thread to secondary, potentially because all
> >> threads are in primary:
> >>>
> >>> Jun 05 12:35:19 buildroot kernel: [Xenomai] switching dispatcher to
> >>> secondary mode after exception #14 from user-space at 0x7fd731299115
> >>> (pid 681)
> >>
> >> #14 mean page fault, fixable or real. What is at that address? What
> address
> >> was accessed by that instruction?
> >>
> >>>
> >>> Note that most threads are stuck waiting for a condvar in
> >> sc_cobalt_cond_wait_prologue (cond.c:313), LWP 681 is at the next
> >> instruction.
> >>>
> >>
> >> Stuck at what? Waiting for the condvar itsself or getting the enclosing
> mutex
> >> again? What are the states of the involved synchonization objects?
> >
> > All mutexes are free. There is one task (Thread 2) pulling the mutexes for
> the duration of signaling the condvars,
> > this task should never block outside of a sleep function giving it a 1ms 
> > cycle.
> > No deadlock is possible.
> >
> > What happens is that for some weird reason, Thread 1 got a sporadic
> wakeup (handling a PF fault from another thread?),
>
> PFs are synchronous, not proxied.
>
> As Philippe also pointed out, understanding that PF is the first step.
> Afterwards, we may look into the secondary issue, if there is still one,
> and that would be be behavior around the condvars after that PF.

As I told Philippe, there is no way I can imagine the thread holding the condvar 
running into a #PF.
This code is run several thousand times before the issue happens; it's 
basically the outer loop that just waits for work,
and stack is plentiful (1 MB).


The dlopen call triggers it, it's always stuck in the same position.

Thread 9 (LWP 682):
#0  __lll_unlock_wake () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:339
#1  0x7fd731275d65 in __pthread_mutex_unlock_usercnt (mutex=0x7fd7312f6968 
<_rtld_local+2312>, decr=1) at pthread_mutex_unlock.c:54
#2  0x7fd7312e0442 in _dl_open (file=, mode=-2147483647, 
caller_dlopen=0x460864 , nsid=-2, argc=7, 
argv=0x7fd728680f90, env=0x7ffc1e972b88) at dl-open.c:627
#3  0x7fd7312c72ac in dlopen_doit (a=a@entry=0x7fd7286811d0) at dlopen.c:66
#4  0x7fd73104211f in __GI__dl_catch_exception 
(exception=exception@entry=0x7fd728681170, operate=operate@entry=0x7fd7312c7250 
, args=args@entry=0x7fd7286811d0) at dl-error-skeleton.c:196
#5  0x7fd731042190 in __GI__dl_catch_error 
(objname=objname@entry=0x7fd72005c010, 
errstring=errstri

RE: Still getting Deadlocks with condition variables

2020-06-08 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Philippe Gerum 
> Sent: Sonntag, 7. Juni 2020 22:16
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: Still getting Deadlocks with condition variables
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 6/5/20 4:36 PM, Lange Norbert wrote:
> > Hello,
> >
> > I brought this up once or twice at this ML [1], I am still getting
> > some occasional lockups. Now the first time without running under an
> > debugger,
> >
> > Harwdare is a TQMxE39M (Goldmont Atom)
> > Kernel: 4.19.124-cip27-xeno12-static x86_64 I-pipe Version: 12 Xenomai
> > Version: 3.1 Glibc Version 2.28
> >
> > What happens (as far as I understand it):
> >
> > The setup is an project with several cobalt threads (no "native" Linux
> thread as far as I can tell, apart maybe from the cobalt's printf thread).
> > They mostly sleep, and are triggered if work is available, the project
> > also can load DSOs (specialized maths) during configuration stage -
> > during this stages is when the exceptions occur
> >
> >
> > 1.   Linux Thread LWP 682 calls SYS_futex "wake"
> >
> > Code immediately before syscall, file x86_64/lowlevellock.S:
> > movl$0, (%rdi)
> > LOAD_FUTEX_WAKE (%esi)
> > movl$1, %edx/* Wake one thread.  */
> > movl$SYS_futex, %eax
> > syscall
> >
> > 2. Xenomai switches a cobalt thread to secondary, potentially because all
> threads are in primary:
> >
> > Jun 05 12:35:19 buildroot kernel: [Xenomai] switching dispatcher to
> > secondary mode after exception #14 from user-space at 0x7fd731299115
> > (pid 681)
> >
>
> This kernel message tells a different story, thread pid 681 received a #PF,
> maybe due to accessing its own stack (cond.c, line 316). This may be a minor
> fault though, nothing invalid. Such fault is not supposed to occur for Xenomai
> threads on x86, but that would be another issue. Code-wise, I'm referring to
> the current state of the master branch for lib/cobalt/cond.c, which seems to
> match your description.

I don't know what you mean by minor fault, from the perspective of Linux?
An RT thread getting demoted to Linux is rather serious to me.

Also, I don't see how a #PF should be possible in the long-running thread:
memory is locked, the call is close to the thread entry point in a 
wait-for-condvar loop, and it never uses more than an insignificant
amount of stack at this point.

On the other hand, the non-RT thread loads a DSO and is stuck somewhere after 
allocating memory.
My guess would be that the #PF ends up at the wrong thread.

Note that both tasks are locked to the same CPU core.

>
> > Note that most threads are stuck waiting for a condvar in
> sc_cobalt_cond_wait_prologue (cond.c:313), LWP 681 is at the next
> instruction.
> >
> > 3. Xenomai gets XCPU signal -> coredump
> >
>
> More precisely, Xenomai is likely sending this signal to your application, 
> since
> it had to switch pid 681 to secondary mode for fixing up the #PF event.
> You may have set PTHREAD_WARNSW with pthread_setmode_np() for that
> thread.

Yes, I use PTHREAD_WARNSW; if I did not, chances are that the code would run
to sc_cobalt_cond_wait_epilogue without ever freeing the mutex, and the other 
thread trying to send a signal would never be able to acquire the mutex.
I.e. identical to my previous reports (where PTHREAD_WARNSW was disabled).
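
(For reference, arming the check per RT thread looks roughly like the sketch
below; this is not the actual project code. Cobalt then raises SIGXCPU at the
point of demotion, which is where the core dumps come from.)

/* Sketch (not the project code): arm the mode-switch check in an RT thread
 * and turn the resulting SIGXCPU into an abort(), so the core dump points
 * at the place where the thread left primary mode. */
#include <pthread.h>
#include <signal.h>
#include <stdlib.h>

static void on_sigxcpu(int sig)
{
        (void)sig;
        abort();                        /* dump core at the migration point */
}

static void arm_warnsw(void)
{
        signal(SIGXCPU, on_sigxcpu);    /* Cobalt signals SIGXCPU on demotion */
        pthread_setmode_np(0, PTHREAD_WARNSW, NULL);
}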

>
> > gdb) thread apply all bt 3
> >
> > Thread 9 (LWP 682):
> > #0  __lll_unlock_wake () at
> > ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:339
> > #1  0x7fd731275d65 in __pthread_mutex_unlock_usercnt
> > (mutex=0x7fd7312f6968 <_rtld_global+2312>, decr=1) at
> > pthread_mutex_unlock.c:54
> > #2  0x7fd7312e0442 in ?? () from
> > /home/lano/Downloads/bugcrash/lib64/ld-linux-x86-64.so.2
> > #3  0x7fd7312c72ac in ?? () from /lib/libdl.so.2
> > #4  0x7fd73104211f in _dl_catch_exception () from /lib/libc.so.6
> > #5  0x7fd731042190 in _dl_catch_error () from /lib/libc.so.6
> > #6  0x7fd7312c7975 in ?? () from /lib/libdl.so.2
> > #7  0x7fd7312c7327 in dlopen () from /lib/libdl.so.2 (More stack
> > frames follow...)
> >
> > Thread 8 (LWP 686):
> > #0  0x7fd731298d48 in __cobalt_clock_nanosleep (clock_id=0,
> > flags=0, rqtp=0x7fd727e3ad10, rmtp=0x0) at
> > /opt/hipase2/src/xenomai-3.1.0/lib/cobalt/clock.c:312
> > #1  0x7fd731298d81 in __cobalt_nanosleep (rqtp=,
> > rmtp=) at
> > /opt/hipase2/src/xenomai-3.1.0/lib/cobalt/clock.c:354
> > #2  0x00434590 in operator() (__closure=0x7fd720006fb8) at
> > ../../acpu.runner/asim/asim_com.cpp:685
> > (More stack frames follow...)
> >
> > Thread 7 (LWP 677):
> > #0  0x7fd73127b6c6 in __GI___nanosleep
> > (requested_time=requested_time@entry=0x7fd7312b1fb0 ,
> > remaining=remaining@entry=0x0) at
> > ../sysdeps/unix/sysv/linux/nanosleep.c:28
> > #1  0x7fd73129b746 in printer_loop (arg=) at
> > /opt/hipase2/src/xenomai-3.1.0/lib/cobalt/printf.c:635
> > #2  0x7fd7312720f7 in start_thread (arg=) at

RE: Still getting Deadlocks with condition variables

2020-06-08 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Freitag, 5. Juni 2020 17:40
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: Still getting Deadlocks with condition variables
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 05.06.20 16:36, Lange Norbert via Xenomai wrote:
> > Hello,
> >
> > I brought this up once or twice at this ML [1], I am still getting
> > some occasional lockups. Now the first time without running under an
> > debugger,
> >
> > Harwdare is a TQMxE39M (Goldmont Atom)
> > Kernel: 4.19.124-cip27-xeno12-static x86_64 I-pipe Version: 12 Xenomai
> > Version: 3.1 Glibc Version 2.28
> >
> > What happens (as far as I understand it):
> >
> > The setup is an project with several cobalt threads (no "native" Linux
> thread as far as I can tell, apart maybe from the cobalt's printf thread).
> > They mostly sleep, and are triggered if work is available, the project
> > also can load DSOs (specialized maths) during configuration stage -
> > during this stages is when the exceptions occur
> >
> >
> > 1.   Linux Thread LWP 682 calls SYS_futex "wake"
> >
> > Code immediately before syscall, file x86_64/lowlevellock.S:
> > movl$0, (%rdi)
> > LOAD_FUTEX_WAKE (%esi)
> > movl$1, %edx/* Wake one thread.  */
> > movl$SYS_futex, %eax
> > syscall
> >
> > 2. Xenomai switches a cobalt thread to secondary, potentially because all
> threads are in primary:
> >
> > Jun 05 12:35:19 buildroot kernel: [Xenomai] switching dispatcher to
> > secondary mode after exception #14 from user-space at 0x7fd731299115
> > (pid 681)
>
> #14 mean page fault, fixable or real. What is at that address? What address
> was accessed by that instruction?
>
> >
> > Note that most threads are stuck waiting for a condvar in
> sc_cobalt_cond_wait_prologue (cond.c:313), LWP 681 is at the next
> instruction.
> >
>
> Stuck at what? Waiting for the condvar itsself or getting the enclosing mutex
> again? What are the states of the involved synchonization objects?

All mutexes are free. There is one task (Thread 2) that takes the mutexes only for 
the duration of signaling the condvars;
this task should never block outside of a sleep function giving it a 1 ms cycle.
No deadlock is possible.

What happens is that for some weird reason Thread 1 got a sporadic wakeup 
(handling a #PF from another thread?),
acquires the mutex and then either gets demoted to Linux and causes an XCPU 
signal (if that check is enabled),
or is stuck at sc_cobalt_cond_wait_epilogue indefinitely.

Then Thread 2 will logically be stuck re-acquiring the mutex.

I have an alternative implementation using semaphores instead of condvars; I 
think I have never seen this issue crop up there.
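
To make the pattern concrete, it is essentially the classic scheme sketched
below (simplified, not the actual code; the names and the predicate are made up):

/* Simplified sketch of the signaling pattern described above. */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_ready = PTHREAD_COND_INITIALIZER;
static bool have_work;

/* Worker (e.g. Thread 1): sleeps until work is flagged. */
static void wait_for_work(void)
{
        pthread_mutex_lock(&lock);
        while (!have_work)
                pthread_cond_wait(&work_ready, &lock);
        have_work = false;
        pthread_mutex_unlock(&lock);
}

/* Cyclic task (Thread 2): holds the mutex only while signaling. */
static void post_work(void)
{
        pthread_mutex_lock(&lock);
        have_work = true;
        pthread_cond_signal(&work_ready);
        pthread_mutex_unlock(&lock);
}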

>
> Jan
>
> > 3. Xenomai gets XCPU signal -> coredump
> >
> > gdb) thread apply all bt 3
> >
> > Thread 9 (LWP 682):
> > #0  __lll_unlock_wake () at
> > ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:339
> > #1  0x7fd731275d65 in __pthread_mutex_unlock_usercnt
> > (mutex=0x7fd7312f6968 <_rtld_global+2312>, decr=1) at
> > pthread_mutex_unlock.c:54
> > #2  0x7fd7312e0442 in ?? () from
> > /home/lano/Downloads/bugcrash/lib64/ld-linux-x86-64.so.2
> > #3  0x7fd7312c72ac in ?? () from /lib/libdl.so.2
> > #4  0x7fd73104211f in _dl_catch_exception () from /lib/libc.so.6
> > #5  0x7fd731042190 in _dl_catch_error () from /lib/libc.so.6
> > #6  0x7fd7312c7975 in ?? () from /lib/libdl.so.2
> > #7  0x7fd7312c7327 in dlopen () from /lib/libdl.so.2 (More stack
> > frames follow...)
> >
> > Thread 8 (LWP 686):
> > #0  0x7fd731298d48 in __cobalt_clock_nanosleep (clock_id=0,
> > flags=0, rqtp=0x7fd727e3ad10, rmtp=0x0) at
> > /opt/hipase2/src/xenomai-3.1.0/lib/cobalt/clock.c:312
> > #1  0x7fd731298d81 in __cobalt_nanosleep (rqtp=,
> > rmtp=) at
> > /opt/hipase2/src/xenomai-3.1.0/lib/cobalt/clock.c:354
> > #2  0x00434590 in operator() (__closure=0x7fd720006fb8) at
> > ../../acpu.runner/asim/asim_com.cpp:685
> > (More stack frames follow...)
> >
> > Thread 7 (LWP 677):
> > #0  0x7fd73127b6c6 in __GI___nanosleep
> > (requested_time=requested_time@entry=0x7fd7312b1fb0 ,
> > remaining=remaining@entry=0x0) at
> > ../sysdeps/unix/sysv/linux/nanosleep.c:28
> > #1  0x7fd73129b746 in printer_loop (arg=) at
> > /opt/hipase2/src/xenomai-3.1.0/lib/cobalt/printf.c:635
> > #2  0x7fd7312720f7 in start_thread (arg=) at
> > pthread_create.c:486 (More stack 

Still getting Deadlocks with condition variables

2020-06-05 Thread Lange Norbert via Xenomai
Hello,

I brought this up once or twice on this ML [1]; I am still getting some 
occasional lockups, now for the first time without running under a debugger.

Hardware is a TQMxE39M (Goldmont Atom)
Kernel: 4.19.124-cip27-xeno12-static x86_64
I-pipe Version: 12
Xenomai Version: 3.1
Glibc Version 2.28

What happens (as far as I understand it):

The setup is a project with several Cobalt threads (no "native" Linux thread 
as far as I can tell, apart maybe from Cobalt's printf thread).
They mostly sleep and are triggered when work is available; the project can also 
load DSOs (specialized maths) during the configuration stage, and this stage 
is when the exceptions occur.


1.   Linux Thread LWP 682 calls SYS_futex "wake"

Code immediately before syscall, file x86_64/lowlevellock.S:
movl$0, (%rdi)
LOAD_FUTEX_WAKE (%esi)
movl$1, %edx/* Wake one thread.  */
movl$SYS_futex, %eax
syscall

2. Xenomai switches a cobalt thread to secondary, potentially because all 
threads are in primary:

Jun 05 12:35:19 buildroot kernel: [Xenomai] switching dispatcher to secondary 
mode after exception #14 from user-space at 0x7fd731299115 (pid 681)

Note that most threads are stuck waiting for a condvar in 
sc_cobalt_cond_wait_prologue (cond.c:313), LWP 681 is at the next instruction.

3. Xenomai gets XCPU signal -> coredump

gdb) thread apply all bt 3

Thread 9 (LWP 682):
#0  __lll_unlock_wake () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:339
#1  0x7fd731275d65 in __pthread_mutex_unlock_usercnt (mutex=0x7fd7312f6968 
<_rtld_global+2312>, decr=1) at pthread_mutex_unlock.c:54
#2  0x7fd7312e0442 in ?? () from 
/home/lano/Downloads/bugcrash/lib64/ld-linux-x86-64.so.2
#3  0x7fd7312c72ac in ?? () from /lib/libdl.so.2
#4  0x7fd73104211f in _dl_catch_exception () from /lib/libc.so.6
#5  0x7fd731042190 in _dl_catch_error () from /lib/libc.so.6
#6  0x7fd7312c7975 in ?? () from /lib/libdl.so.2
#7  0x7fd7312c7327 in dlopen () from /lib/libdl.so.2
(More stack frames follow...)

Thread 8 (LWP 686):
#0  0x7fd731298d48 in __cobalt_clock_nanosleep (clock_id=0, flags=0, 
rqtp=0x7fd727e3ad10, rmtp=0x0) at 
/opt/hipase2/src/xenomai-3.1.0/lib/cobalt/clock.c:312
#1  0x7fd731298d81 in __cobalt_nanosleep (rqtp=, 
rmtp=) at /opt/hipase2/src/xenomai-3.1.0/lib/cobalt/clock.c:354
#2  0x00434590 in operator() (__closure=0x7fd720006fb8) at 
../../acpu.runner/asim/asim_com.cpp:685
(More stack frames follow...)

Thread 7 (LWP 677):
#0  0x7fd73127b6c6 in __GI___nanosleep 
(requested_time=requested_time@entry=0x7fd7312b1fb0 , 
remaining=remaining@entry=0x0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1  0x7fd73129b746 in printer_loop (arg=) at 
/opt/hipase2/src/xenomai-3.1.0/lib/cobalt/printf.c:635
#2  0x7fd7312720f7 in start_thread (arg=) at 
pthread_create.c:486
(More stack frames follow...)

Thread 6 (LWP 685):
#0  0x7fd73129910a in __cobalt_pthread_cond_wait (cond=0x7fd72f269660, 
mutex=0x7fd72f269630) at /opt/hipase2/src/xenomai-3.1.0/lib/cobalt/cond.c:313
#1  0x0046377c in conditionvar_wait (pData=0x7fd72f269660, 
pMutex=0x7fd72f269630) at ../../alib/src/alib/posix/conditionvar.c:66
#2  0x0040a620 in HIPASE::Posix::CAlib_ConditionVariable::wait 
(this=0x7fd72f269660, lock=...) at 
../../alib/include/alib/alib_conditionvar_posix.h:67
(More stack frames follow...)

Thread 5 (LWP 684):
#0  0x7fd73129910a in __cobalt_pthread_cond_wait (cond=0x7fd72f267790, 
mutex=0x7fd72f267760) at /opt/hipase2/src/xenomai-3.1.0/lib/cobalt/cond.c:313
#1  0x0046377c in conditionvar_wait (pData=0x7fd72f267790, 
pMutex=0x7fd72f267760) at ../../alib/src/alib/posix/conditionvar.c:66
#2  0x0040a620 in HIPASE::Posix::CAlib_ConditionVariable::wait 
(this=0x7fd72f267790, lock=...) at 
../../alib/include/alib/alib_conditionvar_posix.h:67
(More stack frames follow...)

Thread 4 (LWP 680):
#0  0x7fd73129910a in __cobalt_pthread_cond_wait (cond=0xfeafa0 <(anonymous 
namespace)::m_MainTaskStart>, mutex=0xfeaf60 <(anonymous 
namespace)::m_TaskMutex>) at 
/opt/hipase2/src/xenomai-3.1.0/lib/cobalt/cond.c:313
#1  0x0046377c in conditionvar_wait (pData=0xfeafa0 <(anonymous 
namespace)::m_MainTaskStart>, pMutex=0xfeaf60 <(anonymous 
namespace)::m_TaskMutex>) at ../../alib/src/alib/posix/conditionvar.c:66
#2  0x0040a620 in HIPASE::Posix::CAlib_ConditionVariable::wait 
(this=0xfeafa0 <(anonymous namespace)::m_MainTaskStart>, lock=...) at 
../../alib/include/alib/alib_conditionvar_posix.h:67
(More stack frames follow...)

Thread 3 (LWP 683):
#0  0x7fd73129910a in __cobalt_pthread_cond_wait (cond=0x7fd72f2658c0, 
mutex=0x7fd72f265890) at /opt/hipase2/src/xenomai-3.1.0/lib/cobalt/cond.c:313
#1  0x0046377c in conditionvar_wait (pData=0x7fd72f2658c0, 
pMutex=0x7fd72f265890) at ../../alib/src/alib/posix/conditionvar.c:66
#2  0x0040a620 in HIPASE::Posix::CAlib_ConditionVariable::wait 
(this=0x7fd72f2658c0, lock=...) at 
../../alib/include/alib/alib_conditionvar

RE: Ipipe-Patch for kernel version 5.4.23

2020-03-02 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Montag, 2. März 2020 16:59
> To: Lange Norbert 
> Subject: Re: Ipipe-Patch for kernel version 5.4.23
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> Offlist by intention?

Nope, just Outlook hating me.

Norbert

>
> Jan
>
> On 02.03.20 16:26, Lange Norbert wrote:
> >
> >
> >> -Original Message-
> >> From: Xenomai  On Behalf Of Jan
> Kiszka
> >> via Xenomai
> >> Sent: Montag, 2. März 2020 13:38
> >> To: Ralf Moder ; xenomai@xenomai.org
> >> Subject: Re: Ipipe-Patch for kernel version 5.4.23
> >>
> >> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
> >>
> >>
> >> On 02.03.20 11:00, Ralf Moder via Xenomai wrote:
> >>> Hello everyone,
> >>> I need the kernel version 5.4.23 in an embedded linux project because
> of
> >> driver stuff but also the Xenomai extension. How big is the probability
> that an
> >> I-pipe patch for the 5.4 version will be released in the next 2 to 3 
> >> months?
> >>
> >> I would first of all say that it's >0. But beyond that, it depends on the
> >> architecture: x86 should be available on 5.4 in the next months with a
> >> probability >>50% as that will be my primary target for looking into how to
> >> move forward. ARM and ARM64 currently lacks a maintainer. If we are
> lucky,
> >> the base enabling via Philippe's dovetail queue will enable those more or
> less
> >> as well. But it will definitely take someone to take responsibility for 
> >> those
> >> archs because my colleagues and I currently cannot invest beyond
> integrating
> >> contributions. There is also ppc32 (which has a maintainer), but I suppose
> >> that this is not your target.
> >>
> >> Jan
> >
> > I still can't estimate the impact of swapping to dovetail, how much 
> > "ripples"
> this will cause.
> > I suppose the cobalt usermode interface will stay the same.
> >
> > -   Will drivers (rtnet) be affected?
> > -   AFAIU there finally will be shared clocks (monotonic, realtime) between
> cobalt and linux?
> >
> > I would of course welcome a 5.4 kernel, allowing me to drop several
> backported patches,
> > at the same time it's not a really good time to swap the basic building 
> > blocks
> for us.
> >
> > Norbert
> > 
> >
> > This message and any attachments are solely for the use of the intended
> recipients. They may contain privileged and/or confidential information or
> other information protected from disclosure. If you are not an intended
> recipient, you are hereby notified that you received this email in error and
> that any review, dissemination, distribution or copying of this email and any
> attachment is strictly prohibited. If you have received this email in error,
> please contact the sender and delete the message and any attachment from
> your system.
> >
> > ANDRITZ HYDRO GmbH
> >
> >
> > Rechtsform/ Legal form: Gesellschaft mit beschränkter Haftung /
> Corporation
> >
> > Firmensitz/ Registered seat: Wien
> >
> > Firmenbuchgericht/ Court of registry: Handelsgericht Wien
> >
> > Firmenbuchnummer/ Company registration: FN 61833 g
> >
> > DVR: 0605077
> >
> > UID-Nr.: ATU14756806
> >
> >
> > Thank You
> > 
> >





RE: Interrupt timeout

2020-02-25 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Dienstag, 25. Februar 2020 18:05
> To: Lange Norbert ; Greg Gallagher
> 
> Cc: Xenomai (xenomai@xenomai.org) 
> Subject: Re: Interrupt timeout
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 25.02.20 16:55, Lange Norbert wrote:
> >
> >
> >> -Original Message-
> >> From: Jan Kiszka 
> >> Sent: Dienstag, 25. Februar 2020 16:47
> >> To: Lange Norbert ; Greg Gallagher
> >> 
> >> Cc: Xenomai (xenomai@xenomai.org) 
> >> Subject: Re: Interrupt timeout
> >>
> >> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
> >>
> >>
> >> On 25.02.20 16:40, Lange Norbert via Xenomai wrote:
> >>>
> >>>
> >>>> -Original Message-
> >>>> From: Greg Gallagher 
> >>>> Sent: Dienstag, 25. Februar 2020 16:24
> >>>> To: Lange Norbert 
> >>>> Cc: Xenomai (xenomai@xenomai.org) ;
> Philippe
> >>>> Gerum (r...@xenomai.org) 
> >>>> Subject: Re: Interrupt timeout
> >>>>
> >>>> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> >> ATTACHMENTS.
> >>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> On Tue, Feb 25, 2020 at 8:57 AM Lange Norbert via Xenomai
> >>>>  wrote:
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> I hope you can give me some pointers to understand why this Bug
> >>>> happened.
> >>>>> It seems an interrupt got lost somehow, maybe some issue with
> >>>> leveltriggers?
> >>>>>
> >>>>> Note that I run on an Apollo Lake, which would normally use
> >>>>> PINCTRL_BROXTON, but that’s not fixed up for Xenomai yet. The
> system
> >>>>> works fine from eMMC otherwise, this bug occurred only once so far.
> >>>>>
> >>>>> 00:1b.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
> >>>>> Processor N4200/N3350/E3900 Series SDXC/MMC Host Controller (rev
> >> 0b)
> >>>>> Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom
> >> E3900
> >>>>> Series SDXC/MMC Host Controller Kernel driver in use: sdhci-pci
> Kernel
> >>>>> modules: sdhci_pci
> >>>>> 00:1c.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
> >>>>> Processor N4200/N3350/E3900 Series eMMC Controller (rev 0b)
> >>>>> Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom
> >> E3900
> >>>>> Series eMMC Controller Kernel driver in use: sdhci-pci Kernel
> modules:
> >>>>> sdhci_pci
> >>>>>
> >>>>> # cat /proc/interrupts
> >>>>>   CPU0   CPU1   CPU2   CPU3
> >>>>>  0: 50  0  0  0   IO-APIC2-edge 
> >>>>>  timer
> >>>>>  4:  0  0498  0   IO-APIC4-edge 
> >>>>>  ttyS0
> >>>>>  8:  0  0  0  0   IO-APIC8-edge 
> >>>>>  rtc0
> >>>>>  9:  0  0  0  0   IO-APIC
> >>>>> 9-fasteoi   acpi
> >>>>> 20: 20  0  0  0   IO-APIC   
> >>>>> 20-fasteoi   i801_smbus
> >>>>> 39:  0   1315  0  0   IO-APIC   
> >>>>> 39-fasteoi   mmc0
> >>>>>129:  0  1  0  0   PCI-MSI 
> >>>>> 1048576-edge  enp2s0
> >>>>>130:  0  0839  0   PCI-MSI 
> >>>>> 1048577-edge  enp2s0-rx-0
> >>>>>131:  0  0  0773   PCI-MSI 
> >>>>> 1048578-edge  enp2s0-rx-1
> >>>>>132:985  0  0  0   PCI-MSI 
> >>>>> 1048579-edge  enp2s0-tx-0
> >>>>>133:  0773  0  0   PCI-MSI 
> >>>>> 1048580-edge  enp2s0-tx-1
> >>>>>134:  0  0  0  1   PCI-MSI 
> >>>>> 1572864-edge  enp3s0
> >>>>

RE: Interrupt timeout

2020-02-25 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Dienstag, 25. Februar 2020 16:47
> To: Lange Norbert ; Greg Gallagher
> 
> Cc: Xenomai (xenomai@xenomai.org) 
> Subject: Re: Interrupt timeout
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 25.02.20 16:40, Lange Norbert via Xenomai wrote:
> >
> >
> >> -Original Message-
> >> From: Greg Gallagher 
> >> Sent: Dienstag, 25. Februar 2020 16:24
> >> To: Lange Norbert 
> >> Cc: Xenomai (xenomai@xenomai.org) ; Philippe
> >> Gerum (r...@xenomai.org) 
> >> Subject: Re: Interrupt timeout
> >>
> >> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
> >>
> >>
> >> Hi,
> >>
> >> On Tue, Feb 25, 2020 at 8:57 AM Lange Norbert via Xenomai
> >>  wrote:
> >>>
> >>> Hello,
> >>>
> >>> I hope you can give me some pointers to understand why this Bug
> >> happened.
> >>> It seems an interrupt got lost somehow, maybe some issue with
> >> leveltriggers?
> >>>
> >>> Note that I run on an Apollo Lake, which would normally use
> >>> PINCTRL_BROXTON, but that’s not fixed up for Xenomai yet. The system
> >>> works fine from eMMC otherwise, this bug occurred only once so far.
> >>>
> >>> 00:1b.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
> >>> Processor N4200/N3350/E3900 Series SDXC/MMC Host Controller (rev
> 0b)
> >>> Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom
> E3900
> >>> Series SDXC/MMC Host Controller Kernel driver in use: sdhci-pci Kernel
> >>> modules: sdhci_pci
> >>> 00:1c.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
> >>> Processor N4200/N3350/E3900 Series eMMC Controller (rev 0b)
> >>> Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom
> E3900
> >>> Series eMMC Controller Kernel driver in use: sdhci-pci Kernel modules:
> >>> sdhci_pci
> >>>
> >>> # cat /proc/interrupts
> >>>  CPU0   CPU1   CPU2   CPU3
> >>> 0: 50  0  0  0   IO-APIC2-edge
> >>>   timer
> >>> 4:  0  0498  0   IO-APIC4-edge
> >>>   ttyS0
> >>> 8:  0  0  0  0   IO-APIC8-edge
> >>>   rtc0
> >>> 9:  0  0  0  0   IO-APIC9-fasteoi 
> >>>   acpi
> >>>20: 20  0  0  0   IO-APIC   20-fasteoi 
> >>>   i801_smbus
> >>>39:  0   1315  0  0   IO-APIC   39-fasteoi 
> >>>   mmc0
> >>>   129:  0  1  0  0   PCI-MSI 1048576-edge 
> >>>  enp2s0
> >>>   130:  0  0839  0   PCI-MSI 1048577-edge 
> >>>  enp2s0-rx-0
> >>>   131:  0  0  0773   PCI-MSI 1048578-edge 
> >>>  enp2s0-rx-1
> >>>   132:985  0  0  0   PCI-MSI 1048579-edge 
> >>>  enp2s0-tx-0
> >>>   133:  0773  0  0   PCI-MSI 1048580-edge 
> >>>  enp2s0-tx-1
> >>>   134:  0  0  0  1   PCI-MSI 1572864-edge 
> >>>  enp3s0
> >>>   135:  11464  0  0  0   PCI-MSI 1572865-edge 
> >>>  enp3s0-rx-0
> >>>   136:  0781  0  0   PCI-MSI 1572866-edge 
> >>>  enp3s0-rx-1
> >>>   137:  0  0899  0   PCI-MSI 1572867-edge 
> >>>  enp3s0-tx-0
> >>>   138:  0  0  0   9701   PCI-MSI 1572868-edge 
> >>>  enp3s0-tx-1
> >>>   139:  1  0  0  0   PCI-MSI 2097152-edge 
> >>>  enp4s0
> >>>   140:  0   1985  0  0   PCI-MSI 2097153-edge 
> >>>  enp4s0-TxRx-
> 0
> >>>   141:  0  0774  0   PCI-MSI 2097154-edge 
> >>>  enp4s0-TxRx-1
> >>>   142:  0  0  0   1905   PCI-MSI 2097155-edge 
> >>>  enp4s0-TxRx-
> 2
> >>>   143:775  0  0  0   PC

RE: Interrupt timeout

2020-02-25 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Greg Gallagher 
> Sent: Dienstag, 25. Februar 2020 16:24
> To: Lange Norbert 
> Cc: Xenomai (xenomai@xenomai.org) ; Philippe
> Gerum (r...@xenomai.org) 
> Subject: Re: Interrupt timeout
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> Hi,
>
> On Tue, Feb 25, 2020 at 8:57 AM Lange Norbert via Xenomai
>  wrote:
> >
> > Hello,
> >
> > I hope you can give me some pointers to understand why this Bug
> happened.
> > It seems an interrupt got lost somehow, maybe some issue with
> leveltriggers?
> >
> > Note that I run on an Apollo Lake, which would normally use
> > PINCTRL_BROXTON, but that’s not fixed up for Xenomai yet. The system
> > works fine from eMMC otherwise, this bug occurred only once so far.
> >
> > 00:1b.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
> > Processor N4200/N3350/E3900 Series SDXC/MMC Host Controller (rev 0b)
> > Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900
> > Series SDXC/MMC Host Controller Kernel driver in use: sdhci-pci Kernel
> > modules: sdhci_pci
> > 00:1c.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
> > Processor N4200/N3350/E3900 Series eMMC Controller (rev 0b)
> > Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900
> > Series eMMC Controller Kernel driver in use: sdhci-pci Kernel modules:
> > sdhci_pci
> >
> > # cat /proc/interrupts
> > CPU0   CPU1   CPU2   CPU3
> >0: 50  0  0  0   IO-APIC2-edge  
> > timer
> >4:  0  0498  0   IO-APIC4-edge  
> > ttyS0
> >8:  0  0  0  0   IO-APIC8-edge  
> > rtc0
> >9:  0  0  0  0   IO-APIC9-fasteoi   
> > acpi
> >   20: 20  0  0  0   IO-APIC   20-fasteoi   
> > i801_smbus
> >   39:  0   1315  0  0   IO-APIC   39-fasteoi   
> > mmc0
> >  129:  0  1  0  0   PCI-MSI 1048576-edge
> >   enp2s0
> >  130:  0  0839  0   PCI-MSI 1048577-edge
> >   enp2s0-rx-0
> >  131:  0  0  0773   PCI-MSI 1048578-edge
> >   enp2s0-rx-1
> >  132:985  0  0  0   PCI-MSI 1048579-edge
> >   enp2s0-tx-0
> >  133:  0773  0  0   PCI-MSI 1048580-edge
> >   enp2s0-tx-1
> >  134:  0  0  0  1   PCI-MSI 1572864-edge
> >   enp3s0
> >  135:  11464  0  0  0   PCI-MSI 1572865-edge
> >   enp3s0-rx-0
> >  136:  0781  0  0   PCI-MSI 1572866-edge
> >   enp3s0-rx-1
> >  137:  0  0899  0   PCI-MSI 1572867-edge
> >   enp3s0-tx-0
> >  138:  0  0  0   9701   PCI-MSI 1572868-edge
> >   enp3s0-tx-1
> >  139:  1  0  0  0   PCI-MSI 2097152-edge
> >   enp4s0
> >  140:  0   1985  0  0   PCI-MSI 2097153-edge
> >   enp4s0-TxRx-0
> >  141:  0  0774  0   PCI-MSI 2097154-edge
> >   enp4s0-TxRx-1
> >  142:  0  0  0   1905   PCI-MSI 2097155-edge
> >   enp4s0-TxRx-2
> >  143:775  0  0  0   PCI-MSI 2097156-edge
> >   enp4s0-TxRx-3
> >  144:  0  0  97790  0   PCI-MSI 344064-edge 
> >  xhci_hcd
> >  NMI:  0  0  0  0   Non-maskable interrupts
> >  LOC:  47839 147583  13807  32602   Local timer interrupts
> >  SPU:  0  0  0  0   Spurious interrupts
> >  PMI:  0  0  0  0   Performance monitoring 
> > interrupts
> >  IWI:  0  0  0  0   IRQ work interrupts
> >  RTR:  0  0  0  0   APIC ICR read retries
> >  RES:  11955   6931   6567   4946   Rescheduling interrupts
> >  CAL:268223210217   Function call interrupts
> >  TLB: 63 57 61 50   TLB shootdowns
> >  TRM:  0  0  0  0   Thermal event interrupts
> >  THR:  0  0  0  0   Threshold APIC 
> &g

Interrupt timeout

2020-02-25 Thread Lange Norbert via Xenomai
Hello,

I hope you can give me some pointers to understand why this bug happened.
It seems an interrupt got lost somehow, maybe some issue with level triggers?

Note that I run on an Apollo Lake, which would normally use PINCTRL_BROXTON,
but that is not fixed up for Xenomai yet. The system otherwise works fine from eMMC;
this bug has occurred only once so far.

00:1b.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium Processor 
N4200/N3350/E3900 Series SDXC/MMC Host Controller (rev 0b)
Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series 
SDXC/MMC Host Controller
Kernel driver in use: sdhci-pci
Kernel modules: sdhci_pci
00:1c.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium Processor 
N4200/N3350/E3900 Series eMMC Controller (rev 0b)
Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series eMMC 
Controller
Kernel driver in use: sdhci-pci
Kernel modules: sdhci_pci

# cat /proc/interrupts
CPU0   CPU1   CPU2   CPU3
   0: 50  0  0  0   IO-APIC2-edge  timer
   4:  0  0498  0   IO-APIC4-edge  ttyS0
   8:  0  0  0  0   IO-APIC8-edge  rtc0
   9:  0  0  0  0   IO-APIC9-fasteoi   acpi
  20: 20  0  0  0   IO-APIC   20-fasteoi   
i801_smbus
  39:  0   1315  0  0   IO-APIC   39-fasteoi   mmc0
 129:  0  1  0  0   PCI-MSI 1048576-edge  
enp2s0
 130:  0  0839  0   PCI-MSI 1048577-edge  
enp2s0-rx-0
 131:  0  0  0773   PCI-MSI 1048578-edge  
enp2s0-rx-1
 132:985  0  0  0   PCI-MSI 1048579-edge  
enp2s0-tx-0
 133:  0773  0  0   PCI-MSI 1048580-edge  
enp2s0-tx-1
 134:  0  0  0  1   PCI-MSI 1572864-edge  
enp3s0
 135:  11464  0  0  0   PCI-MSI 1572865-edge  
enp3s0-rx-0
 136:  0781  0  0   PCI-MSI 1572866-edge  
enp3s0-rx-1
 137:  0  0899  0   PCI-MSI 1572867-edge  
enp3s0-tx-0
 138:  0  0  0   9701   PCI-MSI 1572868-edge  
enp3s0-tx-1
 139:  1  0  0  0   PCI-MSI 2097152-edge  
enp4s0
 140:  0   1985  0  0   PCI-MSI 2097153-edge  
enp4s0-TxRx-0
 141:  0  0774  0   PCI-MSI 2097154-edge  
enp4s0-TxRx-1
 142:  0  0  0   1905   PCI-MSI 2097155-edge  
enp4s0-TxRx-2
 143:775  0  0  0   PCI-MSI 2097156-edge  
enp4s0-TxRx-3
 144:  0  0  97790  0   PCI-MSI 344064-edge  
xhci_hcd
 NMI:  0  0  0  0   Non-maskable interrupts
 LOC:  47839 147583  13807  32602   Local timer interrupts
 SPU:  0  0  0  0   Spurious interrupts
 PMI:  0  0  0  0   Performance monitoring 
interrupts
 IWI:  0  0  0  0   IRQ work interrupts
 RTR:  0  0  0  0   APIC ICR read retries
 RES:  11955   6931   6567   4946   Rescheduling interrupts
 CAL:268223210217   Function call interrupts
 TLB: 63 57 61 50   TLB shootdowns
 TRM:  0  0  0  0   Thermal event interrupts
 THR:  0  0  0  0   Threshold APIC interrupts
 MCE:  0  0  0  0   Machine check exceptions
 MCP:  5  6  6  6   Machine check polls
 ERR:  0
 MIS:  0
 PIN:  0  0  0  0   Posted-interrupt 
notification event
 NPI:  0  0  0  0   Nested posted-interrupt 
event
 PIW:  0  0  0  0   Posted-interrupt wakeup 
event


[  238.245509] mmc0: Timeout waiting for hardware interrupt.
[  238.250944] mmc0: sdhci:  SDHCI REGISTER DUMP ===
[  238.257422] mmc0: sdhci: Sys addr:  0x2008 | Version:  0x1002
[  238.263900] mmc0: sdhci: Blk size:  0x7200 | Blk cnt:  0x
[  238.270373] mmc0: sdhci: Argument:  0x00118678 | Trn mode: 0x002b
[  238.276848] mmc0: sdhci: Present:   0x1fff | Host ctl: 0x003d
[  238.283323] mmc0: sdhci: Power: 0x000b | Blk gap:  0x0080
[  238.289797] mmc0: sdhci: Wake-up:   0x | Clock:0x0007
[  238.296265] mmc0: sdhci: Timeout:   0x0007 | Int stat: 0x0003
[  238.302734] mmc0: sdhci: Int enab:  0x03ff000b | Sig enab: 0x03ff000b
[  238.309206] mmc0: sdhci: ACmd stat: 0x | Slot int: 0x0

RE: [PATCH ipipe-noarch] ipipe: Disable rcuidle trace path when running over the head domain

2020-02-18 Thread Lange Norbert via Xenomai
Hello Jan,

Yes, this fixes the issue for me. What is the downside of not using *_rcuidle?
Is it a performance optimization?

Norbert Lange

> -Original Message-
> From: Jan Kiszka 
> Sent: Montag, 17. Februar 2020 19:06
> To: Philippe Gerum ; Xenomai
> 
> Cc: Lange Norbert 
> Subject: [PATCH ipipe-noarch] ipipe: Disable rcuidle trace path when running
> over the head domain
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> From: Jan Kiszka 
>
> We do not need the special handling of __DO_TRACE(..., rcuidle=1) when
> running over the head domain. In fact, we cannot use it because it
> switches to srcu which is incompatible with that context. It's safe to
> switch to normal RCU because no head domain caller of a trace_*_rcuidle
> tracepoints should do this from rcu-problematic paths, specifically
> idle.
>
> Ported from the dovetail queue.
>
> Signed-off-by: Jan Kiszka 
> ---
>
> Philippe, does this description make sense?
>
> Norbert, please test.
>
>  include/linux/tracepoint.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index e9de8ad0bad7..83df22a6f284 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  struct module;
>  struct tracepoint;
> @@ -212,7 +213,7 @@ static inline struct tracepoint
> *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> __DO_TRACE(&__tracepoint_##name,\
> TP_PROTO(data_proto),   \
> TP_ARGS(data_args), \
> -   TP_CONDITION(cond), 1); \
> +   TP_CONDITION(cond), ipipe_root_p);  \
> }
>  #else
>  #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto,
> data_args)
> --
> 2.16.4





RE: Still some lockups when enabling ftrace

2020-02-17 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Montag, 17. Februar 2020 18:16
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: Still some lockups when enabling ftrace
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 17.02.20 17:57, Jan Kiszka via Xenomai wrote:
> > On 17.02.20 17:50, Lange Norbert via Xenomai wrote:
> >> I managed to narrow it down to this:
> >>
> >> trace-cmd start  -e 'tlb:tlb_flush'
> >>
> >> Seems to bug the kernel even if no cobalt thread is running, only a
> >> rt_igb and the rtnet stack.
> >>
> >
> > OK, that was already my idea as well.
> >
> > That rcu_irq_enter_irqson must not be there. Trying to understand
> > right now how it can sneak in. What are your RCU configs?
> >
>
> Found it.
>
> Preconditions: CONFIG_PREEMPT_NONE/VOLUNTARY ->
> CONFIG_TREE_RCU
>
> Now switch_mm_irqs_off, which we also use for primary tasks, carries some
> trace_tlb_flush_rcuidle. The "_rcuidle" appendix makes the difference. Such
> tracepoints end up in tracepoint.h's __DO_TRACE in the rcuidle path that
> contains a rcu_irq_enter_irqson of this form:
>
> void rcu_irq_exit_irqson(void)
> {
> unsigned long flags;
>
> local_irq_save(flags);
> rcu_irq_exit();
> local_irq_restore(flags);
> }
>
> And those local_irq* bite us. With CONFIG_TINY_RCU, rcu_irq_exit_irqson is
> empty.
>
> I guess we need to skip the problematic tracepoints, either when I-pipe is on
> or when the caller is in the primary domain.

That rcuidle suffix caught my eye as well, though not because I have any real
insight into what happens in the system.

I asked some time ago about related topics (IPIs, MEMBARRIER_CMD_PRIVATE_EXPEDITED
in the context of LTTng), namely whether those are safe to use at all.
IPIs are still interrupts and will affect Cobalt AFAIK, and a lot of code assumes
that they complete very quickly (while they might be held off for a couple of
milliseconds by real-time tasks).

https://www.xenomai.org/pipermail/xenomai/2019-November/042080.html

Quite good you found it that fast anyway ;)

Thanks, Norbert






RE: Still some lockups when enabling ftrace

2020-02-17 Thread Lange Norbert via Xenomai
I managed to narrow it down to this:

trace-cmd start  -e 'tlb:tlb_flush'

It seems to break the kernel even if no Cobalt thread is running, only the rt_igb
driver and the RTnet stack.


> -Original Message-
> From: Xenomai  On Behalf Of Lange
> Norbert via Xenomai
> Sent: Montag, 17. Februar 2020 17:09
> To: Xenomai (xenomai@xenomai.org) 
> Subject: Still some lockups when enabling ftrace
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> Hello,
>
> Enabling traces still can lockup Xenomai, apparently if any cobalt thread is
> running.
>
> trace-cmd record -e all
>
> [11598.080137] I-pipe: Detected illicit call from head domain 'Xenomai'
> [11598.080137] into a regular Linux service
> [11598.091070] CPU: 3 PID: 948 Comm: sshd Not tainted 4.19.98-cip19-
> xeno10-static #1 [11598.098531] Hardware name: TQ-Group
> TQMxE39M/Type2 - Board Product Name, BIOS 5.12.30.21.18 04/12/2019
> [11598.107813] I-pipe domain: Xenomai [11598.111215] Call Trace:
> [11598.113666]  
> [11598.115690]  dump_stack+0x8c/0xc0
> [11598.119011]  ipipe_stall_root+0xc/0x30 [11598.122768]
> rcu_irq_enter_irqson+0x13/0x40 [11598.126947]
> switch_mm_irqs_off+0x441/0x4e0 [11598.131132]
> xnarch_switch_to+0x2b/0x70 [11598.134968]  ___xnsched_run+0x1b6/0x480
> [11598.138806]  xnintr_core_clock_handler+0x224/0x2f0
> [11598.143602]  dispatch_irq_head+0x95/0x130 [11598.147622]
> __ipipe_handle_irq+0x94/0x200 [11598.151730]
> apic_timer_interrupt+0x12/0x40 [11598.155914]   [11598.158021] RIP:
> 0010:copy_user_generic_unrolled+0xb1/0xc0
> [11598.163513] Code: 8b 06 4c 89 07 48 8d 76 08 48 8d 7f 08 ff c9 75 ee 21 d2 
> 74
> 10 89 d1 8a 06 88 07 48 ff c6 48 ff c7 ff c9 75 f2 31 c0 0f 01 ca  66 66 
> 2e 0f 1f
> 84 00 00 00 00 00 0f 1f 00 0f 2 [11598.182250] RSP: 0018:95ffc0947c50
> EFLAGS: 0246 ORIG_RAX: ff13 [11598.189809] RAX:
>  RBX: 001e RCX: 
> [11598.196934] RDX: 0006 RSI: 95ffc006f2f0 RDI:
> 7fff3aa91f7e [11598.204055] RBP: 90d2f952a400 R08:
> 726f63657220706f R09: 7fff3aa91f60 [11598.211179] R10:
> 095c R11: 90d2f3e1596c R12: 95ffc0947d40
> [11598.218306] R13: 95ffc006f000 R14: 95ffc0947d48 R15:
> 95ffc006f2d2 [11598.225447]  _copy_to_user+0x28/0x30 [11598.229027]
> copy_from_read_buf+0x97/0x150 [11598.233127]  n_tty_read+0x202/0x920
> [11598.236624]  ? set_fd_set.part.0+0x37/0x40 [11598.240720]  ?
> core_sys_select+0x278/0x2d0 [11598.244818]  ? rcu_all_qs+0x5/0x80
> [11598.248229]  ? _cond_resched+0x15/0x30 [11598.251985]  ?
> ldsem_down_read+0x38/0x200 [11598.255997]  ?
> do_wait_intr_irq+0x90/0x90 [11598.260004]  tty_read+0x83/0xf0
> [11598.263141]  __vfs_read+0x34/0x180 [11598.266540]  ?
> trace_buffer_unlock_commit_regs+0x37/0x90
> [11598.271853]  ? trace_event_buffer_commit+0x66/0x1d0
> [11598.276735]  vfs_read+0x9d/0x150
> [11598.279968]  ksys_read+0x57/0xd0
> [11598.283201]  do_syscall_64+0x78/0x3d0 [11598.286864]  ?
> __do_page_fault+0x206/0x400 [11598.290954]
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [11598.296000] RIP: 0033:0x7effc6e2c441
> [11598.299590] Code: Bad RIP value.
> [11598.302820] RSP: 002b:7fff3aa91f58 EFLAGS: 0246 ORIG_RAX:
>  [11598.310371] RAX: ffda RBX: 7effc6d1b2c0
> RCX: 00
>
>
> Mit besten Grüßen / Kind regards
>
> NORBERT LANGE
>
> AT-RD3
>
> ANDRITZ HYDRO GmbH
> Eibesbrunnergasse 20
> 1120 Vienna / AUSTRIA
> p: +43 50805 56684
> norbert.la...@andritz.com
> andritz.com
>

Still some lockups when enabling ftrace

2020-02-17 Thread Lange Norbert via Xenomai
Hello,

Enabling tracing can still lock up Xenomai, apparently whenever any Cobalt thread
is running.

trace-cmd record -e all

[11598.080137] I-pipe: Detected illicit call from head domain 'Xenomai'
[11598.080137] into a regular Linux service
[11598.091070] CPU: 3 PID: 948 Comm: sshd Not tainted 
4.19.98-cip19-xeno10-static #1
[11598.098531] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.21.18 04/12/2019
[11598.107813] I-pipe domain: Xenomai
[11598.111215] Call Trace:
[11598.113666]  
[11598.115690]  dump_stack+0x8c/0xc0
[11598.119011]  ipipe_stall_root+0xc/0x30
[11598.122768]  rcu_irq_enter_irqson+0x13/0x40
[11598.126947]  switch_mm_irqs_off+0x441/0x4e0
[11598.131132]  xnarch_switch_to+0x2b/0x70
[11598.134968]  ___xnsched_run+0x1b6/0x480
[11598.138806]  xnintr_core_clock_handler+0x224/0x2f0
[11598.143602]  dispatch_irq_head+0x95/0x130
[11598.147622]  __ipipe_handle_irq+0x94/0x200
[11598.151730]  apic_timer_interrupt+0x12/0x40
[11598.155914]  
[11598.158021] RIP: 0010:copy_user_generic_unrolled+0xb1/0xc0
[11598.163513] Code: 8b 06 4c 89 07 48 8d 76 08 48 8d 7f 08 ff c9 75 ee 21 d2 
74 10 89 d1 8a 06 88 07 48 ff c6 48 ff c7 ff c9 75 f2 31 c0 0f 01 ca  66 66 
2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 2
[11598.182250] RSP: 0018:95ffc0947c50 EFLAGS: 0246 ORIG_RAX: 
ff13
[11598.189809] RAX:  RBX: 001e RCX: 
[11598.196934] RDX: 0006 RSI: 95ffc006f2f0 RDI: 7fff3aa91f7e
[11598.204055] RBP: 90d2f952a400 R08: 726f63657220706f R09: 7fff3aa91f60
[11598.211179] R10: 095c R11: 90d2f3e1596c R12: 95ffc0947d40
[11598.218306] R13: 95ffc006f000 R14: 95ffc0947d48 R15: 95ffc006f2d2
[11598.225447]  _copy_to_user+0x28/0x30
[11598.229027]  copy_from_read_buf+0x97/0x150
[11598.233127]  n_tty_read+0x202/0x920
[11598.236624]  ? set_fd_set.part.0+0x37/0x40
[11598.240720]  ? core_sys_select+0x278/0x2d0
[11598.244818]  ? rcu_all_qs+0x5/0x80
[11598.248229]  ? _cond_resched+0x15/0x30
[11598.251985]  ? ldsem_down_read+0x38/0x200
[11598.255997]  ? do_wait_intr_irq+0x90/0x90
[11598.260004]  tty_read+0x83/0xf0
[11598.263141]  __vfs_read+0x34/0x180
[11598.266540]  ? trace_buffer_unlock_commit_regs+0x37/0x90
[11598.271853]  ? trace_event_buffer_commit+0x66/0x1d0
[11598.276735]  vfs_read+0x9d/0x150
[11598.279968]  ksys_read+0x57/0xd0
[11598.283201]  do_syscall_64+0x78/0x3d0
[11598.286864]  ? __do_page_fault+0x206/0x400
[11598.290954]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[11598.296000] RIP: 0033:0x7effc6e2c441
[11598.299590] Code: Bad RIP value.
[11598.302820] RSP: 002b:7fff3aa91f58 EFLAGS: 0246 ORIG_RAX: 

[11598.310371] RAX: ffda RBX: 7effc6d1b2c0 RCX: 00


Mit besten Grüßen / Kind regards

NORBERT LANGE

AT-RD3

ANDRITZ HYDRO GmbH
Eibesbrunnergasse 20
1120 Vienna / AUSTRIA
p: +43 50805 56684
norbert.la...@andritz.com
andritz.com




-- next part --
A non-text attachment was scrubbed...
Name: config-4.19.tar.xz
Type: application/octet-stream
Size: 20180 bytes
Desc: config-4.19.tar.xz
URL: 



RE: Installing RTnet

2020-02-11 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Xenomai  On Behalf Of Kloock,
> Lennard via Xenomai
> Sent: Montag, 10. Februar 2020 14:11
> To: xenomai@xenomai.org
> Subject: Installing RTnet
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> Hello all,
>
> i have succesfully installed Xenomai 3.0.5 with linux 4.9.90, but I am having
> trouble installing RTnet on an i218-lm.
>
> I followed these steps:
> 1) ifconfig eth0 down
> 2) rmmod e1000e
> 3) configured /usr/xenomai/etc/rtnet.conf according to
> https://gitlab.denx.de/Xenomai/xenomai/-/wikis/RTnet
> 4) modprobe rtpacket
> 5) modprobe rtnet
> 6) modprobe rt_e1000e
>
> But now if I try to run rtnet with  ./rtnet start I get the following errors:
>
> ioctl: No such device
> ioctl: No such device
> ioctl: No such device
> ioctl: No such device
> ioctl (add): No such device
> ioctl (add): No such device
> ioctl (add): No such device
> ioctl (add): No such device
> vnic0: ERROR while getting interface flags: No such device
> SIOCSIFADDR: No such device
> vnic0: ERROR while getting interface flags: No such device
> SIOCSIFNETMASK: No such device
> Waiting for all slaves...ioctl: No such device
> ioctl: No such device
>
> ./rtifconfig only show the local loopback but not rteth0
>
> Dmesg shows that rtnet and the driver are being loaded
>
> [  391.184595]
>*** RTnet for Xenomai v3.0.5 ***
>
> [  391.184596] RTnet: initialising real-time networking [  400.034853]
> rt_e1000e: Intel(R) PRO/1000 Network Driver - 1.5.1-k-rt [  400.034854]
> rt_e1000e: Copyright(c) 1999 - 2011 Intel Corporation.
>
> But with lspci -knn it shows that the NIC doesn't pick up the real time 
> driver:
>
> 00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection
> I219-LM [8086:156f] (rev 21)
> Subsystem: Gigabyte Technology Co., Ltd Ethernet Connection 
> I219-
> LM [1458:e000]
> Kernel modules: e1000e
>
> Did I do something wrong or is my I219-LM NIC not supported by RTnet?

Can't tell if your device is supported, but RTnet will never *replace* a Linux
driver automatically; you need to make sure the hardware is not bound to a driver,
or manually unbind/bind it to the RTnet driver.

# kick out the stock kernel module and load the RTnet drivers
modprobe -r e1000e
modprobe rtnet
modprobe rt_e1000e


# manually rebind; look in the e1000e driver's sysfs folder for the correct PCI device name
echo "00:1f.6" > /sys/bus/pci/drivers/e1000e/unbind
echo "00:1f.6" > /sys/bus/pci/drivers/rt_e1000e/bind

Norbert






RE: Cobalt deadlock for no apparent reason

2020-01-22 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Dienstag, 21. Jänner 2020 18:46
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: Cobalt deadlock for no apparent reason
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 20.01.20 19:03, Lange Norbert via Xenomai wrote:
> > Hello,
> >
> > I got a deadlock while running through gdbserver, this is an
> > implementation of a synchronized queue, Fup side waits via condition
> variable, main wants to push data, but main fails to acquire the mutex.
> > The mutex is an errorchecking type, without priority inheritance, and not
> used elsewhere.
> >
> >
> > The task are as following:
> >
> > CPU  PIDCLASS  TYPE  PRI   TIMEOUT   STAT   NAME
> >1  1686   rt cobalt  4   - Wt main
> >3  1690   rt cobalt  2   - Wt fup.medium
> >
> > main is stuck in this function
> >
> > int mutex_lock(struct mutex_data *pData) {
> >  pthread_t threadId = pthread_self();
> >  // assert(pthread_equal(threadId, pData->m_LockId) == 0);
> > ->int r = pthread_mutex_lock(&pData->m_Mutex);
> >  assert(r == 0);
> >  pData->m_LockId = threadId;
> >  return r;
> > }
> >
> > In libcobalt:
> >do
> >  ret = XENOMAI_SYSCALL1(sc_cobalt_mutex_lock, _mutex);
> >while (ret == -EINTR);
> >
> > fup.medium is stuck in:
> >
> > int conditionvar_wait(struct conditionvar_data *pData, struct
> > mutex_data *pMutex) {
> >  pthread_t sid = pthread_self();
> >  assert(pthread_equal(sid, pMutex->m_LockId) != 0);
> >  pMutex->m_LockId = 0;
> > ->int r = pthread_cond_wait(&pData->m_CondVar, &pMutex-
> >m_Mutex);
> >  assert(r == 0);
> >  pMutex->m_LockId = sid;
> >  return r;
> > }
> >
> > In libcobalt:
> >while (err == -EINTR)
> >  err = XENOMAI_SYSCALL2(sc_cobalt_cond_wait_epilogue, _cnd, _mx);
> >
>
> This is likely tricky to debug by just looking at things. Can you factor out a
> reproducer?

Well, the "no apparent reason" is key here; it's not easily reproducible either.
It might help if you could tell me how I can end up in this situation: AFAIK the
pthread_cond_wait call was interrupted, so when can that occur, for example?
It seems limited to running under a debugger (or at least happens a lot less
without one), and when the process dlopen's libraries this pauses the process,
for example. At that time fup.medium is supposed to be sitting in
pthread_cond_wait (which means the dlopens might cause spurious wakeups) and is
notified only after everything is ready to run.

So, if you have any idea how I could narrow it down, I might be able to build a
reproducer. Right now I am going to use a timed mutex to at least detect the
issue.
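A minimal sketch of what I have in mind, plain POSIX and with illustrative names
(not our actual code), wrapping the lock in pthread_mutex_timedlock so a stuck
mutex shows up instead of hanging silently:

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static int mutex_lock_watchdog(pthread_mutex_t *mutex, unsigned int timeout_s)
{
    struct timespec abs_timeout;

    /* pthread_mutex_timedlock() expects an absolute CLOCK_REALTIME timeout */
    clock_gettime(CLOCK_REALTIME, &abs_timeout);
    abs_timeout.tv_sec += timeout_s;

    int r = pthread_mutex_timedlock(mutex, &abs_timeout);
    if (r == ETIMEDOUT) {
        /* still not acquired after timeout_s seconds: suspected deadlock */
        fprintf(stderr, "mutex_lock_watchdog: timed out, suspecting deadlock\n");
        assert(0);
    }
    return r;
}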

Norbert





Cobalt deadlock for no apparent reason

2020-01-20 Thread Lange Norbert via Xenomai
Hello,

I got a deadlock while running through gdbserver. This is an implementation of
a synchronized queue: the fup side waits on a condition variable, main wants to
push data, but main fails to acquire the mutex.
The mutex is of the error-checking type, without priority inheritance, and is not
used elsewhere.
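For reference, the mutex setup boils down to something like this (a minimal
sketch of the attribute configuration, not the exact initialization code used
here):

#include <pthread.h>

/* Error-checking mutex, no priority inheritance (PTHREAD_PRIO_NONE). */
static int init_queue_mutex(pthread_mutex_t *mutex)
{
    pthread_mutexattr_t attr;
    int r;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_NONE);
    r = pthread_mutex_init(mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return r;
}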


The tasks are as follows:

CPU  PIDCLASS  TYPE  PRI   TIMEOUT   STAT   NAME
  1  1686   rt cobalt  4   - Wt main
  3  1690   rt cobalt  2   - Wt fup.medium

main is stuck in this function

int mutex_lock(struct mutex_data *pData)
{
    pthread_t threadId = pthread_self();
    // assert(pthread_equal(threadId, pData->m_LockId) == 0);
->  int r = pthread_mutex_lock(&pData->m_Mutex);
    assert(r == 0);
    pData->m_LockId = threadId;
    return r;
}

In libcobalt:
    do
        ret = XENOMAI_SYSCALL1(sc_cobalt_mutex_lock, _mutex);
    while (ret == -EINTR);

fup.medium is stuck in:

int conditionvar_wait(struct conditionvar_data *pData, struct mutex_data *pMutex)
{
    pthread_t sid = pthread_self();
    assert(pthread_equal(sid, pMutex->m_LockId) != 0);
    pMutex->m_LockId = 0;
->  int r = pthread_cond_wait(&pData->m_CondVar, &pMutex->m_Mutex);
    assert(r == 0);
    pMutex->m_LockId = sid;
    return r;
}

In libcobalt:
    while (err == -EINTR)
        err = XENOMAI_SYSCALL2(sc_cobalt_cond_wait_epilogue, _cnd, _mx);
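As an aside, the textbook pattern for tolerating spurious wakeups re-checks a
predicate in a loop around the wait; a minimal, self-contained sketch for
comparison (illustrative only, not the queue code above):

#include <pthread.h>
#include <stdbool.h>

struct waitq {
    pthread_mutex_t mtx;
    pthread_cond_t  cnd;
    bool            ready;  /* the predicate the waiter re-checks */
};

static void waitq_wait(struct waitq *q)
{
    pthread_mutex_lock(&q->mtx);
    while (!q->ready)                       /* tolerates spurious wakeups */
        pthread_cond_wait(&q->cnd, &q->mtx);
    q->ready = false;
    pthread_mutex_unlock(&q->mtx);
}

static void waitq_post(struct waitq *q)
{
    pthread_mutex_lock(&q->mtx);
    q->ready = true;
    pthread_cond_signal(&q->cnd);
    pthread_mutex_unlock(&q->mtx);
}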

Mit besten Grüßen / Kind regards

NORBERT LANGE

AT-RD3

ANDRITZ HYDRO GmbH
Eibesbrunnergasse 20
1120 Vienna / AUSTRIA
p: +43 50805 56684
norbert.la...@andritz.com
andritz.com






RE: 4.19.94-cip18-xeno10 regression: bugs out when changing drivers

2020-01-16 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Mittwoch, 15. Jänner 2020 23:07
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: 4.19.94-cip18-xeno10 regression: bugs out when changing
> drivers
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 15.01.20 18:12, Lange Norbert via Xenomai wrote:
> >
> > Reverting commit #0393b8720128 "ptp: fix the race between the release of
> ptp_clock and cdev" fixes the issue.
> > I don't see any locks or similar involved to it doesn't appear to be 
> > related to
> Xenomai at all.
> >
>
> It might be that this commit lacks some dependencies - the stable tree in
> unfortunately not as stable as one may expect.

And it passed through cip as well, not really reassuring :{

> If you can reproduce this with a vanilla stable, also with the latest one 
> (.96),
> please report upstream. If it should be -cip specific, i.e.
> not present in 4.19.95 (which would be a surprise in this area), please report
> there.

Nope, it needs a patch that is not even in master yet, but it should be soon:
https://lore.kernel.org/netdev/20200113130009.2938-1-vdro...@redhat.com/


Norbert





RE: 4.19.94-cip18-xeno10 regression: bugs out when changing drivers

2020-01-15 Thread Lange Norbert via Xenomai


Reverting commit #0393b8720128 "ptp: fix the race between the release of 
ptp_clock and cdev" fixes the issue.
I don't see any locks or similar involved, so it doesn't appear to be related to
Xenomai at all.

Norbert

> -Original Message-
> From: Xenomai  On Behalf Of Lange
> Norbert via Xenomai
> Sent: Mittwoch, 15. Jänner 2020 17:25
> To: Xenomai (xenomai@xenomai.org) 
> Subject: 4.19.94-cip18-xeno10 regression: bugs out when changing drivers
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> Hello,
>
> I had no problem with the previous 4.19.89 (and versions back to 4.14), but
> this kernel does not like unloading the linux eth driver.
> Will try with the plain linux kernel in the coming days, but any help is
> appreciated.
>
> Norbert
>
> ethpci=":01:00.0"
> echo "$ethpci" > /sys/bus/pci/devices/$ethpci/driver/unbind.
>
> [  199.590152] BUG: unable to handle kernel NULL pointer dereference at
> 
> [  199.597995] PGD 179717067 P4D 179717067 PUD 17896b067 PMD 0
> [  199.603670] Oops:  [#1] SMP NOPTI
> [  199.607344] CPU: 2 PID: 764 Comm: zsh Not tainted 4.19.94-cip18-xeno10-
> static #1
> [  199.614745] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product
> Name, BIOS 5.12.30.21.20 08/05/2019
> [  199.624059] I-pipe domain: Linux
> [  199.627300] RIP: 0010:strlen+0x0/0x20
> [  199.630972] Code: f6 82 e0 5e 31 8b 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 
> e0
> 5e 31 8b 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 
> 74 10
> 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
> [  199.649742] RSP: 0018:ad3ec06ffb20 EFLAGS: 00010246
> [  199.654975] RAX:  RBX:  RCX:
> 
> [  199.662118] RDX:  RSI:  RDI:
> 
> [  199.669258] RBP:  R08:  R09:
> 9756f996f018
> [  199.676402] R10:  R11: 9756fb006c00 R12:
> 9756f9916788
> [  199.683543] R13:  R14: 9756fa81e190 R15:
> 9756f8b66f20
> [  199.690683] FS:  00535558() GS:9756fbb0()
> knlGS:
> [  199.698780] CS:  0010 DS:  ES:  CR0: 80050033
> [  199.704535] CR2:  CR3: 00017aaa CR4:
> 003406e0
> [  199.711674] Call Trace:
> [  199.714136]  kernfs_name_hash+0x12/0x80
> [  199.717983]  kernfs_find_ns+0x35/0xd0
> [  199.721654]  kernfs_remove_by_name_ns+0x32/0x90
> [  199.726194]  remove_files.isra.0+0x30/0x70
> [  199.730301]  sysfs_remove_group+0x3d/0x80
> [  199.734321]  sysfs_remove_groups+0x29/0x40
> [  199.738428]  device_remove_attrs+0x42/0x80
> [  199.742534]  device_del+0x14f/0x360
> [  199.746036]  cdev_device_del+0x15/0x30
> [  199.749797]  posix_clock_unregister+0x21/0x50
> [  199.754165]  ptp_clock_unregister+0x6e/0x80
> [  199.758359]  igb_ptp_stop+0x1f/0x50
> [  199.761861]  igb_remove+0x37/0x110
> [  199.765272]  pci_device_remove+0x28/0x60
> [  199.769202]  device_release_driver_internal+0x162/0x220
> [  199.774437]  unbind_store+0xb1/0x170
> [  199.778024]  kernfs_fop_write+0x10b/0x190
> [  199.782042]  do_iter_write+0x140/0x180
> [  199.785801]  vfs_writev+0xa6/0xf0
> [  199.789127]  ? __alloc_fd+0x3d/0x140
> [  199.792711]  ? f_dupfd+0x66/0x79
> [  199.795949]  do_writev+0x5f/0x100
> [  199.799273]  do_syscall_64+0x78/0x3d0
> [  199.802944]  ? __do_page_fault+0x206/0x400
> [  199.807049]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  199.812106] RIP: 0033:0x4cc34c
> [  199.815172] Code: ed 01 48 29 d0 49 83 c5 10 49 8b 55 08 48 63 dd 48 29 c2 
> 49
> 01 45 00 49 89 55 08 49 63 7f 78 4c 89 e0 4c 89 ee 48 89 da 0f 05 <48> 89 c7 
> e8 cc
> 4e ff ff 49 39 c6 75 b7 49 8b 47 58 49 8b 57 60 48
> [  199.833943] RSP: 002b:7ffe32e417a0 EFLAGS: 0202 ORIG_RAX:
> 0014
> [  199.841521] RAX: ffda RBX: 0002 RCX:
> 004cc34c
> [  199.848663] RDX: 0002 RSI: 7ffe32e417b0 RDI:
> 0001
> [  199.855805] RBP: 0002 R08: 00523040 R09:
> 
> [  199.862949] R10:  R11: 0202 R12:
> 0014
> [  199.870091] R13: 7ffe32e417b0 R14: 000d R15:
> 00523040
> [  199.877237] Modules linked in: plusb usbnet mii
> [  199.881783] CR2: 
> [  199.885115] ---[ end trace 218fd81d1aa77ca4 ]---
> [  199.889741] RIP: 0010:strlen+0x0/0x20
> [  199.893413] Code: f6 82 e0 5e 31 8b 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 
> e0
> 5e 31 8b 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 0

4.19.94-cip18-xeno10 regression: bugs out when changing drivers

2020-01-15 Thread Lange Norbert via Xenomai
Hello,

I had no problem with the previous 4.19.89 (and versions back to 4.14), but 
this kernel does not like unloading the linux eth driver.
Will try with the plain linux kernel in the coming days, but any help is 
appreciated.

Norbert

ethpci=":01:00.0"
echo "$ethpci" > /sys/bus/pci/devices/$ethpci/driver/unbind

[  199.590152] BUG: unable to handle kernel NULL pointer dereference at 

[  199.597995] PGD 179717067 P4D 179717067 PUD 17896b067 PMD 0
[  199.603670] Oops:  [#1] SMP NOPTI
[  199.607344] CPU: 2 PID: 764 Comm: zsh Not tainted 
4.19.94-cip18-xeno10-static #1
[  199.614745] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.21.20 08/05/2019
[  199.624059] I-pipe domain: Linux
[  199.627300] RIP: 0010:strlen+0x0/0x20
[  199.630972] Code: f6 82 e0 5e 31 8b 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 
e0 5e 31 8b 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 
74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
[  199.649742] RSP: 0018:ad3ec06ffb20 EFLAGS: 00010246
[  199.654975] RAX:  RBX:  RCX: 
[  199.662118] RDX:  RSI:  RDI: 
[  199.669258] RBP:  R08:  R09: 9756f996f018
[  199.676402] R10:  R11: 9756fb006c00 R12: 9756f9916788
[  199.683543] R13:  R14: 9756fa81e190 R15: 9756f8b66f20
[  199.690683] FS:  00535558() GS:9756fbb0() 
knlGS:
[  199.698780] CS:  0010 DS:  ES:  CR0: 80050033
[  199.704535] CR2:  CR3: 00017aaa CR4: 003406e0
[  199.711674] Call Trace:
[  199.714136]  kernfs_name_hash+0x12/0x80
[  199.717983]  kernfs_find_ns+0x35/0xd0
[  199.721654]  kernfs_remove_by_name_ns+0x32/0x90
[  199.726194]  remove_files.isra.0+0x30/0x70
[  199.730301]  sysfs_remove_group+0x3d/0x80
[  199.734321]  sysfs_remove_groups+0x29/0x40
[  199.738428]  device_remove_attrs+0x42/0x80
[  199.742534]  device_del+0x14f/0x360
[  199.746036]  cdev_device_del+0x15/0x30
[  199.749797]  posix_clock_unregister+0x21/0x50
[  199.754165]  ptp_clock_unregister+0x6e/0x80
[  199.758359]  igb_ptp_stop+0x1f/0x50
[  199.761861]  igb_remove+0x37/0x110
[  199.765272]  pci_device_remove+0x28/0x60
[  199.769202]  device_release_driver_internal+0x162/0x220
[  199.774437]  unbind_store+0xb1/0x170
[  199.778024]  kernfs_fop_write+0x10b/0x190
[  199.782042]  do_iter_write+0x140/0x180
[  199.785801]  vfs_writev+0xa6/0xf0
[  199.789127]  ? __alloc_fd+0x3d/0x140
[  199.792711]  ? f_dupfd+0x66/0x79
[  199.795949]  do_writev+0x5f/0x100
[  199.799273]  do_syscall_64+0x78/0x3d0
[  199.802944]  ? __do_page_fault+0x206/0x400
[  199.807049]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  199.812106] RIP: 0033:0x4cc34c
[  199.815172] Code: ed 01 48 29 d0 49 83 c5 10 49 8b 55 08 48 63 dd 48 29 c2 
49 01 45 00 49 89 55 08 49 63 7f 78 4c 89 e0 4c 89 ee 48 89 da 0f 05 <48> 89 c7 
e8 cc 4e ff ff 49 39 c6 75 b7 49 8b 47 58 49 8b 57 60 48
[  199.833943] RSP: 002b:7ffe32e417a0 EFLAGS: 0202 ORIG_RAX: 
0014
[  199.841521] RAX: ffda RBX: 0002 RCX: 004cc34c
[  199.848663] RDX: 0002 RSI: 7ffe32e417b0 RDI: 0001
[  199.855805] RBP: 0002 R08: 00523040 R09: 
[  199.862949] R10:  R11: 0202 R12: 0014
[  199.870091] R13: 7ffe32e417b0 R14: 000d R15: 00523040
[  199.877237] Modules linked in: plusb usbnet mii
[  199.881783] CR2: 
[  199.885115] ---[ end trace 218fd81d1aa77ca4 ]---
[  199.889741] RIP: 0010:strlen+0x0/0x20
[  199.893413] Code: f6 82 e0 5e 31 8b 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 
e0 5e 31 8b 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 
74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
[  199.912189] RSP: 0018:ad3ec06ffb20 EFLAGS: 00010246
[  199.917424] RAX:  RBX:  RCX: 
[  199.924568] RDX:  RSI:  RDI: 
[  199.931711] RBP:  R08:  R09: 9756f996f018
[  199.938855] R10:  R11: 9756fb006c00 R12: 9756f9916788
[  199.946000] R13:  R14: 9756fa81e190 R15: 9756f8b66f20
[  199.953142] FS:  00535558() GS:9756fbb0() 
knlGS:
[  199.961237] CS:  0010 DS:  ES:  CR0: 80050033
[  199.966990] CR2:  CR3: 00017aaa CR4: 003406e0
0




Xenomai + conan + cmake: Building with less headaches

2020-01-09 Thread Lange Norbert via Xenomai
This is a scheme I am trying to cook up for our internal development, which
should produce production binaries and also offer them for local development.

It uses Xenomai 3.0.9. Some time after 3.1 is released, and once I am less
stressed with other work, I plan to clean up all the involved pieces, so
consider this a preview.

Conan is not dependent on CMake, and CMake is not dependent on Conan,
but the integration between the two is far better than with other options.

## Developing with conan

Conan takes care of the dependencies (and their dependencies). The basic idea
is that you run `conan install ...`, the dependencies end up somewhere, and the
cwd gets a script for your buildsystem to resolve them.

(Packaging for conan is harder; my attempt is at [2].)

## Developing with CMake

CMake is one of the supported buildsystems, and I rigged up a patchset to add
support for Xenomai (I would like to upstream this one day).
As said, this is not strictly required, but it eases integration.

CMake will take care of all those wrapper flags and the bootstrap code.

## Big picture

I attached one small project, ripped from Xenomai and augmented with
CMake + conan files. A small script "build.sh" should be called from
*outside* the source tree.

What this does is:

1)  Resolves the dependencies - in this case Xenomai.
    Conan will try to find a binary first; this means you build the Xenomai
    libraries the first time and reuse a cached version on later runs.

2)  Configures the project; CMake will pick up the dependencies.

3)  Builds.

For a developer this means he just needs to install a local toolchain
(or none at all if the system compiler is sufficient, e.g. for unit tests).
The packages can come from a variety of sources:

-   Existing open-source packages (like gtest, XML parsers, etc.).
    If your architecture fits, you won't even have to compile once.
-   Locally shared packages, i.e. production binaries, typically built by build servers.
-   Locally cached binaries: your own builds, but potentially shared across projects.

What I would want in the long term is to get [1] upstreamed (after a cleanup
or two); the source package is uploaded to [2].
Going even further, the conan packages could be built and uploaded for common
enough targets, so that building the attached example would work, for example,
on plain x86 Debian.

Regards, Norbert

[1] - https://github.com/nolange/cmake_xenomai
[2] - https://bintray.com/nolange79/conan/xenomai3%3Axenomai3



-- next part --
A non-text attachment was scrubbed...
Name: cyclictest.tar.xz
Type: application/octet-stream
Size: 21232 bytes
Desc: cyclictest.tar.xz
URL: 



RE: CONFIG_XENO_DRIVERS_NET_CHECKED not possible to enable

2019-12-17 Thread Lange Norbert via Xenomai
> -Original Message-
> From: Jan Kiszka 
> Sent: Dienstag, 17. Dezember 2019 07:53
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: CONFIG_XENO_DRIVERS_NET_CHECKED not possible to enable
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 16.12.19 18:49, Lange Norbert via Xenomai wrote:
> > Seems to me like a lot instances of XENO_DRIVERS_NET_CHECKED should
> be
> > renamed to XENO_DRIVERS_RTNET_CHECKED
> >
>
> Good catch, was always wrong. But as the higher-level config is called
> XENO_DRIVERS_NET, we should rather rename
> XENO_DRIVERS_RTNET_CHECKED.

Fixing and enabling the option results in multiple compile errors,
so it probably was never tested either.





RE: rtcap destroys packet contents

2019-12-17 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Dienstag, 17. Dezember 2019 08:10
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: rtcap destroys packet contents
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 16.12.19 13:36, Lange Norbert via Xenomai wrote:
> > Hi,
> >
> > I have such a setup, where I push/pull ethernet traffic from an Application
> every millisecond:
> >
> > [App] ---IDDP--> [RTSwitch] --ETH_P_ALL socket--> [rt_igp] [App]
> > <---IDDP-- [RTSwitch] <--ETH_P_ALL socket-- [rt_igp] [Tun]
> > <---tundev---/
> >
> >
> > Now I am sometimes missing packets that the other side should
> > definitely have sent. I wanted to use rtcap to inspect this issue, but
> unfortunately then I get messed up data so that I can't even finish some
> handshake procedures.
> >
> > Is there some incompatibility between RTCAP and ETH_P_ALL (which
> makes its own copies of rtskb's)?
> > Btw, I can't unload rtap/rtnet once loaded.
>
> There is at least no known incompatibility but it may simply have never been
> tested in that combination.
>
> Do you see corruption on packets sent from the rtcap machine or received by
> it?

I definitely see corruption from the "normal" reception over the raw packet 
socket (as soon as the rtcap module is active).
It seems like rtskbs are reused before or while they are copied to userspace.

Analyzing the packets from the rtcap fake device for corruption would be 
time-consuming.

Norbert





RE: [I-PIPE] ipipe-core-4.19.89-x86-9 released

2019-12-17 Thread Lange Norbert via Xenomai
Is there an easily digestible list of i-pipe changes (on top of the upstream 
Kernel)?

Norbert

> -Original Message-
> From: Xenomai  On Behalf Of xenomai---
> via Xenomai
> Sent: Dienstag, 17. Dezember 2019 09:47
> To: xenomai@xenomai.org
> Subject: [I-PIPE] ipipe-core-4.19.89-x86-9 released
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> Download URL: https://xenomai.org/downloads/ipipe/v4.x/x86/ipipe-core-
> 4.19.89-x86-9.patch
>
> Repository: https://git.xenomai.org/ipipe-x86 Release tag: ipipe-core-
> 4.19.89-x86-9






CONFIG_XENO_DRIVERS_NET_CHECKED not possible to enable

2019-12-16 Thread Lange Norbert via Xenomai
It seems to me like a lot of instances of XENO_DRIVERS_NET_CHECKED should be
renamed to XENO_DRIVERS_RTNET_CHECKED.

Mit besten Grüßen / Kind regards

NORBERT LANGE

AT-RD3

ANDRITZ HYDRO GmbH
Eibesbrunnergasse 20
1120 Vienna / AUSTRIA
p: +43 50805 56684
norbert.la...@andritz.com
andritz.com






RE: stalled head domain with 3.1rc4

2019-12-16 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Montag, 16. Dezember 2019 15:30
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: stalled head domain with 3.1rc4
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 16.12.19 14:45, Jan Kiszka wrote:
> > On 16.12.19 14:03, Lange Norbert wrote:
> >>> I still need to find the actually used pattern in the latest kernel.
> >>> It's not the one I suspected.
> >>>
> >>> Do I have your config for this setup already?
> >>
> >> Attached now
> >
> > Analyzing... not very different to mine /wrt tracing.
> >
> > Can you reproduce the issue by running only some of the Xenomai test
> > cases while turning on tracing?
> >
>
> Please retry without CONFIG_JUMP_LABEL. I think this brings in the
> unsupported dynamic (and we should make it depend on !IPIPE).

Yes, I can't reproduce it anymore. Thanks.






RE: stalled head domain with 3.1rc4

2019-12-16 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Montag, 16. Dezember 2019 13:57
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: stalled head domain with 3.1rc4
>
>
>
> On 16.12.19 13:50, Lange Norbert wrote:
> >
> >
> >> -Original Message-
> >> From: Jan Kiszka 
> >> Sent: Freitag, 13. Dezember 2019 14:44
> >> To: Lange Norbert ; Xenomai
> >> (xenomai@xenomai.org) 
> >> Subject: Re: stalled head domain with 3.1rc4
> >>
> >>
> >>
> >> On 13.12.19 14:35, Lange Norbert wrote:
> >>>
> >>>
> >>>> -Original Message-
> >>>> From: Jan Kiszka 
> >>>> Sent: Freitag, 13. Dezember 2019 14:13
> >>>> To: Lange Norbert ; Xenomai
> >>>> (xenomai@xenomai.org) 
> >>>> Subject: Re: stalled head domain with 3.1rc4
> >>>>
> >>>>
> >>>>
> >>>> On 13.12.19 13:25, Lange Norbert via Xenomai wrote:
> >>>>> Same thing with panic trace enabled (another, longer trace with 4000
> >>>>> samples attached)
> >>>>>
> >>>>> [  292.743618] I-pipe: Detected stalled head domain, probably caused
> by
> >> a
> >>>> bug.
> >>>>> [  292.743618] A critical section may have been left 
> >>>>> unterminated.
> >>>>> [  292.757195] CPU: 0 PID: 1159 Comm: trace-cmd Tainted: GW
> >>>> 4.19.84-xeno8-static #1
> >>>>> [  292.765986] Hardware name: TQ-Group TQMxE39M/Type2 - Board
> >>>> Product
> >>>>> Name, BIOS 5.12.30.21.20 08/05/2019 [  292.775304] I-pipe domain:
> >>>>> Linux [  292.778546] Call Trace:
> >>>>> [  292.781005]  
> >>>>> [  292.783034]  dump_stack+0x8c/0xc0
> >>>>> [  292.786363]  ipipe_root_only.cold+0x11/0x32 [  292.790560]
> >>>>> ipipe_stall_root+0xe/0x60 [  292.794322]
> >>>>> __ipipe_trap_prologue+0x11d/0x2f0 [  292.798782]  int3+0x45/0x70 [
> >>>>> 292.801592] RIP: 0010:xntimer_start+0x3a/0x330 [  292.806050] Code:
> 55
> >>>>> 49 89 d5 41 54 55 48 89 fd 53 48 83 ec 10 48 8b 47 70 4c 8b 37 48 63
> >>>>> 40 18 4d 8b a6 90 00 00 00 4c 03 24 c5 00 e3f [  292.824832] RSP:
> >>>>> 0018:97d43ac03e78 EFLAGS: 0082 [  292.830075] RAX:
> >>>>>  RBX: 00025090 RCX:  [
> >>>>> 292.837219] RDX:  RSI: 000c6130 RDI:
> >>>>> 97d43aeb0708 [  292.844367] RBP: 97d43aeb0708 R08:
> >>>>>  R09: 0027e6d0 [  292.851514] R10:
> >>>>> 0043f5344961 R11: 0043f5344961 R12: 97d43aebb020 [
> >>>>> 292.858658] R13:  R14: 9e03bca0 R15:
> >>>>> 000c6130 [  292.865804]  ? xntimer_start+0x3a/0x330 [
> >>>>> 292.869653]  program_htick_shot+0x8d/0x130 [  292.873761]
> >>>>> clockevents_program_event+0x88/0xe0
> >>>>> [  292.878392]  hrtimer_interrupt+0x140/0x230 [  292.882502]
> >>>>> smp_apic_timer_interrupt+0x46/0x110
> >>>>> [  292.887132]  __ipipe_do_sync_stage+0x15d/0x1c0 [  292.891592]
> >>>>> __ipipe_handle_irq+0xa0/0x220 [  292.895699]
> >>>>> ipipe_reschedule_interrupt+0x12/0x40
> >>>>> [  292.900412]  
> >>>>> [  292.902525] RIP: 0010:smp_call_function_many+0x1b6/0x250
> >>>>> [  292.907848] Code: e8 4f 23 6c 00 3b 05 5d 5f 01 01 89 c7 0f 83 c4
> >>>>> fe ff ff 48 63 c7 48 8b 0b 48 03 0c c5 00 e3 f1 9d 8b 41 18 a8 01 745
> >>>>> [  292.926626] RSP: 0018:ab24c0c9bb40 EFLAGS: 0202
> ORIG_RAX:
> >>>>> ff15 [  292.934210] RAX: 0003 RBX:
> >>>>> 97d43aeb4c00 RCX: 97d43b2b7ac0 [  292.941357] RDX:
> >>>>> 0001 RSI:  RDI: 0001 [
> >>>>> 292.948500] RBP: 9d017b70 R08: 97d43aeb4c08 R09:
> >>>>> 0002e248 [  292.955644] R10: 97d43aeb7780 R11:
> >>>>> 

RE: stalled head domain with 3.1rc4

2019-12-16 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Freitag, 13. Dezember 2019 14:44
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: stalled head domain with 3.1rc4
>
>
>
> On 13.12.19 14:35, Lange Norbert wrote:
> >
> >
> >> -Original Message-
> >> From: Jan Kiszka 
> >> Sent: Freitag, 13. Dezember 2019 14:13
> >> To: Lange Norbert ; Xenomai
> >> (xenomai@xenomai.org) 
> >> Subject: Re: stalled head domain with 3.1rc4
> >>
> >>
> >>
> >> On 13.12.19 13:25, Lange Norbert via Xenomai wrote:
> >>> Same thing with panic trace enabled (another, longer trace with 4000
> >>> samples attached)
> >>>
> >>> [  292.743618] I-pipe: Detected stalled head domain, probably caused by
> a
> >> bug.
> >>> [  292.743618] A critical section may have been left unterminated.
> >>> [  292.757195] CPU: 0 PID: 1159 Comm: trace-cmd Tainted: GW
> >> 4.19.84-xeno8-static #1
> >>> [  292.765986] Hardware name: TQ-Group TQMxE39M/Type2 - Board
> >> Product
> >>> Name, BIOS 5.12.30.21.20 08/05/2019 [  292.775304] I-pipe domain:
> >>> Linux [  292.778546] Call Trace:
> >>> [  292.781005]  
> >>> [  292.783034]  dump_stack+0x8c/0xc0
> >>> [  292.786363]  ipipe_root_only.cold+0x11/0x32 [  292.790560]
> >>> ipipe_stall_root+0xe/0x60 [  292.794322]
> >>> __ipipe_trap_prologue+0x11d/0x2f0 [  292.798782]  int3+0x45/0x70 [
> >>> 292.801592] RIP: 0010:xntimer_start+0x3a/0x330 [  292.806050] Code: 55
> >>> 49 89 d5 41 54 55 48 89 fd 53 48 83 ec 10 48 8b 47 70 4c 8b 37 48 63
> >>> 40 18 4d 8b a6 90 00 00 00 4c 03 24 c5 00 e3f [  292.824832] RSP:
> >>> 0018:97d43ac03e78 EFLAGS: 0082 [  292.830075] RAX:
> >>>  RBX: 00025090 RCX:  [
> >>> 292.837219] RDX:  RSI: 000c6130 RDI:
> >>> 97d43aeb0708 [  292.844367] RBP: 97d43aeb0708 R08:
> >>>  R09: 0027e6d0 [  292.851514] R10:
> >>> 0043f5344961 R11: 0043f5344961 R12: 97d43aebb020 [
> >>> 292.858658] R13:  R14: 9e03bca0 R15:
> >>> 000c6130 [  292.865804]  ? xntimer_start+0x3a/0x330 [
> >>> 292.869653]  program_htick_shot+0x8d/0x130 [  292.873761]
> >>> clockevents_program_event+0x88/0xe0
> >>> [  292.878392]  hrtimer_interrupt+0x140/0x230 [  292.882502]
> >>> smp_apic_timer_interrupt+0x46/0x110
> >>> [  292.887132]  __ipipe_do_sync_stage+0x15d/0x1c0 [  292.891592]
> >>> __ipipe_handle_irq+0xa0/0x220 [  292.895699]
> >>> ipipe_reschedule_interrupt+0x12/0x40
> >>> [  292.900412]  
> >>> [  292.902525] RIP: 0010:smp_call_function_many+0x1b6/0x250
> >>> [  292.907848] Code: e8 4f 23 6c 00 3b 05 5d 5f 01 01 89 c7 0f 83 c4
> >>> fe ff ff 48 63 c7 48 8b 0b 48 03 0c c5 00 e3 f1 9d 8b 41 18 a8 01 745
> >>> [  292.926626] RSP: 0018:ab24c0c9bb40 EFLAGS: 0202 ORIG_RAX:
> >>> ff15 [  292.934210] RAX: 0003 RBX:
> >>> 97d43aeb4c00 RCX: 97d43b2b7ac0 [  292.941357] RDX:
> >>> 0001 RSI:  RDI: 0001 [
> >>> 292.948500] RBP: 9d017b70 R08: 97d43aeb4c08 R09:
> >>> 0002e248 [  292.955644] R10: 97d43aeb7780 R11:
> >>> 97d43a003800 R12:  [  292.962789] R13:
> >>> 97d43aeb4c08 R14: 0004 R15: 0001 [
> >>> 292.969936]  ? optimize_nops.isra.0+0x90/0x90 [  292.974306]  ?
> >>> optimize_nops.isra.0+0x90/0x90 [  292.978673]  ?
> >>> xntimer_start+0x39/0x330 [  292.982519]  ? xntimer_start+0x3a/0x330 [
> >>> 292.986368]  on_each_cpu+0x28/0x50 [  292.989782]  ?
> >>> xntimer_start+0x39/0x330 [  292.993630]  text_poke_bp+0x68/0xde [
> >>> 292.997128]  ?
> >> trace_event_raw_event_cobalt_thread_suspend+0xe0/0xe0
> >>> [  293.003495]  __jump_label_transform.isra.0+0x102/0x150
> >>> [  293.008645]  arch_jump_label_transform+0x2e/0x40
> >>> [  293.013276]  __jump_label_update+0x67/0xa0 [  293.017382]
> >>> static_key_slow_inc_cpuslocked+0x75/0x80
> >>> [  293.022445]  static_key_slow_inc+0x16/0x20 [  293.026555]
>

rtcap destroys packet contents

2019-12-16 Thread Lange Norbert via Xenomai
Hi,

I have such a setup, where I push/pull ethernet traffic from an Application 
every millisecond:

[App] ---IDDP--> [RTSwitch] --ETH_P_ALL socket--> [rt_igp]
[App] <---IDDP-- [RTSwitch] <--ETH_P_ALL socket-- [rt_igp]
[Tun] <---tundev---/


Now I am sometimes missing packets that the other side should definitely have sent. I wanted to use rtcap to inspect this issue, but unfortunately I then get corrupted packet data, to the point that I can't even complete some handshake procedures.

Is there some incompatibility between RTCAP and ETH_P_ALL sockets (which make their own copies of the rtskbs)?
Btw, I can't unload rtcap/rtnet once they are loaded.
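
For reference, the ETH_P_ALL tap in the diagram above is opened roughly like this (a minimal sketch with trimmed error handling; the interface name is an assumption, and with libcobalt these calls end up on the RTnet packet socket implementation):

#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <net/if.h>
#include <netpacket/packet.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Sketch: raw packet socket bound to one RTnet interface, so every
 * frame passing rt_igp is also seen here. */
static int open_eth_p_all_tap(const char *ifname)
{
    struct sockaddr_ll sll;
    struct ifreq ifr;
    int fd;

    fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    if (ioctl(fd, SIOCGIFINDEX, &ifr) < 0)
        return -1;

    memset(&sll, 0, sizeof(sll));
    sll.sll_family = AF_PACKET;
    sll.sll_protocol = htons(ETH_P_ALL);
    sll.sll_ifindex = ifr.ifr_ifindex;

    if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0)
        return -1;

    return fd;
}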

Mit besten Grüßen / Kind regards

NORBERT LANGE

AT-RD3

ANDRITZ HYDRO GmbH
Eibesbrunnergasse 20
1120 Vienna / AUSTRIA
p: +43 50805 56684
norbert.la...@andritz.com
andritz.com







RE: stalled head domain with 3.1rc4

2019-12-13 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Freitag, 13. Dezember 2019 14:13
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: stalled head domain with 3.1rc4
>
>
>
> On 13.12.19 13:25, Lange Norbert via Xenomai wrote:
> > Same thing with panic trace enabled (another, longer trace with 4000
> > samples attached)
> >
> > [  292.743618] I-pipe: Detected stalled head domain, probably caused by a
> bug.
> > [  292.743618] A critical section may have been left unterminated.
> > [  292.757195] CPU: 0 PID: 1159 Comm: trace-cmd Tainted: GW
> 4.19.84-xeno8-static #1
> > [  292.765986] Hardware name: TQ-Group TQMxE39M/Type2 - Board
> Product
> > Name, BIOS 5.12.30.21.20 08/05/2019 [  292.775304] I-pipe domain:
> > Linux [  292.778546] Call Trace:
> > [  292.781005]  
> > [  292.783034]  dump_stack+0x8c/0xc0
> > [  292.786363]  ipipe_root_only.cold+0x11/0x32 [  292.790560]
> > ipipe_stall_root+0xe/0x60 [  292.794322]
> > __ipipe_trap_prologue+0x11d/0x2f0 [  292.798782]  int3+0x45/0x70 [
> > 292.801592] RIP: 0010:xntimer_start+0x3a/0x330 [  292.806050] Code: 55
> > 49 89 d5 41 54 55 48 89 fd 53 48 83 ec 10 48 8b 47 70 4c 8b 37 48 63
> > 40 18 4d 8b a6 90 00 00 00 4c 03 24 c5 00 e3f [  292.824832] RSP:
> > 0018:97d43ac03e78 EFLAGS: 0082 [  292.830075] RAX:
> >  RBX: 00025090 RCX:  [
> > 292.837219] RDX:  RSI: 000c6130 RDI:
> > 97d43aeb0708 [  292.844367] RBP: 97d43aeb0708 R08:
> >  R09: 0027e6d0 [  292.851514] R10:
> > 0043f5344961 R11: 0043f5344961 R12: 97d43aebb020 [
> > 292.858658] R13:  R14: 9e03bca0 R15:
> > 000c6130 [  292.865804]  ? xntimer_start+0x3a/0x330 [
> > 292.869653]  program_htick_shot+0x8d/0x130 [  292.873761]
> > clockevents_program_event+0x88/0xe0
> > [  292.878392]  hrtimer_interrupt+0x140/0x230 [  292.882502]
> > smp_apic_timer_interrupt+0x46/0x110
> > [  292.887132]  __ipipe_do_sync_stage+0x15d/0x1c0 [  292.891592]
> > __ipipe_handle_irq+0xa0/0x220 [  292.895699]
> > ipipe_reschedule_interrupt+0x12/0x40
> > [  292.900412]  
> > [  292.902525] RIP: 0010:smp_call_function_many+0x1b6/0x250
> > [  292.907848] Code: e8 4f 23 6c 00 3b 05 5d 5f 01 01 89 c7 0f 83 c4
> > fe ff ff 48 63 c7 48 8b 0b 48 03 0c c5 00 e3 f1 9d 8b 41 18 a8 01 745
> > [  292.926626] RSP: 0018:ab24c0c9bb40 EFLAGS: 0202 ORIG_RAX:
> > ff15 [  292.934210] RAX: 0003 RBX:
> > 97d43aeb4c00 RCX: 97d43b2b7ac0 [  292.941357] RDX:
> > 0001 RSI:  RDI: 0001 [
> > 292.948500] RBP: 9d017b70 R08: 97d43aeb4c08 R09:
> > 0002e248 [  292.955644] R10: 97d43aeb7780 R11:
> > 97d43a003800 R12:  [  292.962789] R13:
> > 97d43aeb4c08 R14: 0004 R15: 0001 [
> > 292.969936]  ? optimize_nops.isra.0+0x90/0x90 [  292.974306]  ?
> > optimize_nops.isra.0+0x90/0x90 [  292.978673]  ?
> > xntimer_start+0x39/0x330 [  292.982519]  ? xntimer_start+0x3a/0x330 [
> > 292.986368]  on_each_cpu+0x28/0x50 [  292.989782]  ?
> > xntimer_start+0x39/0x330 [  292.993630]  text_poke_bp+0x68/0xde [
> > 292.997128]  ?
> trace_event_raw_event_cobalt_thread_suspend+0xe0/0xe0
> > [  293.003495]  __jump_label_transform.isra.0+0x102/0x150
> > [  293.008645]  arch_jump_label_transform+0x2e/0x40
> > [  293.013276]  __jump_label_update+0x67/0xa0 [  293.017382]
> > static_key_slow_inc_cpuslocked+0x75/0x80
> > [  293.022445]  static_key_slow_inc+0x16/0x20 [  293.026555]
> > tracepoint_probe_register_prio+0x1f3/0x2a0
> > [  293.031790]  ?
> > trace_event_raw_event_cobalt_thread_suspend+0xe0/0xe0
> > [  293.038155]  __ftrace_event_enable_disable+0x6f/0x230
> > [  293.043217]  __ftrace_set_clr_event_nolock+0xe6/0x130
> > [  293.048280]  system_enable_write+0xaa/0xe0 [  293.052392]
> > do_iter_write+0x140/0x180 [  293.056151]  vfs_writev+0xa6/0xf0 [
> > 293.059484]  do_writev+0x5f/0x100 [  293.062813]
> > do_syscall_64+0x82/0x4e0 [  293.066489]
> > entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [  293.071554] RIP: 0033:0x45874c
> > [  293.074619] Code: ed 01 48 29 d0 49 83 c5 10 49 8b 55 08 48 63 dd
> > 48 29 c2 49 01 45 00 49 89 55 08 49 63 7f 78 4c 89 e0 4c 89 ee 48 898
> > [  293.093397] RSP: 002b:7ffc91a57a00 EFLAGS: 0202 ORIG_RAX:
> > 0014 [  293.100983] RAX: ffda RBX:
> > 

3.1rc4: rcu_sched self-detected stall on CPU

2019-12-13 Thread Lange Norbert via Xenomai
Got this stall when trying to reboot; apparently a Xenomai process can't be killed.

[  350.298889] rcu: INFO: rcu_sched self-detected stall on CPU
[  350.304621] rcu:2-: (20999 ticks this GP) 
idle=546/1/0x4002 softirq=9363/9363 fqs=5108
[  350.314280] rcu: (t=21000 jiffies g=26533 q=91)
[  350.319134] NMI backtrace for cpu 2
[  350.322716] CPU: 2 PID: 1 Comm: systemd Tainted: GW 
4.19.84-xeno8-static #1
[  350.331151] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.21.20 08/05/2019
[  350.340542] I-pipe domain: Linux
[  350.343855] Call Trace:
[  350.346398]  
[  350.348510]  dump_stack+0x8c/0xc0
[  350.351922]  nmi_cpu_backtrace.cold+0x14/0x53
[  350.356371]  ? lapic_can_unplug_cpu.cold+0x42/0x42
[  350.361255]  nmi_trigger_cpumask_backtrace+0x7a/0x87
[  350.366314]  rcu_dump_cpu_stacks+0x86/0xbe
[  350.370507]  rcu_check_callbacks.cold+0x1fb/0x363
[  350.375318]  update_process_times+0x41/0x80
[  350.379595]  tick_sched_handle+0x34/0x50
[  350.383610]  tick_sched_timer+0x38/0x80
[  350.387540]  __hrtimer_run_queues+0xfd/0x270
[  350.391900]  ? tick_sched_do_timer+0x60/0x60
[  350.396277]  hrtimer_interrupt+0x106/0x230
[  350.400489]  smp_apic_timer_interrupt+0x46/0x110
[  350.405204]  __ipipe_do_sync_stage+0x15d/0x1c0
[  350.409748]  __ipipe_handle_irq+0xa0/0x220
[  350.413953]  apic_timer_interrupt+0x12/0x40
[  350.418225]  
[  350.420419] RIP: 0010:smp_call_function_single+0xd5/0x120
[  350.425907] Code: 00 00 75 5d 48 8d 65 e8 5b 41 5c 41 5d 5d c3 48 8d 74 24 
20 4c 89 e2 4c 89 e9 e8 56 fe ff ff 8b 54 24 38 83 e2 01 74 0b f3 90 <8b> 54 24 
38 83 e2 01 75 f5 eb bf 89 7c 24 1c e8 87 eb 01 00 8b 7c
[  350.444761] RSP: 0018:a22840047b80 EFLAGS: 0202 ORIG_RAX: 
ff13
[  350.452471] RAX:  RBX: 0001 RCX: 000b
[  350.459687] RDX: 0001 RSI: 956bbb561548 RDI: 000a0040
[  350.466902] RBP: a22840047bf8 R08: 000a0040 R09: 0002e248
[  350.474119] R10: b2dc R11: 006468fd R12: 8e56c120
[  350.481338] R13: a22840047c08 R14: a22840047cb0 R15: a22840047ce0
[  350.488571]  ? perf_cgroup_attach+0x70/0x70
[  350.492878]  ? trace+0x59/0x8d
[  350.496026]  ? perf_cgroup_attach+0x70/0x70
[  350.500306]  ? smp_call_function_single+0x5/0x120
[  350.505109]  task_function_call+0x45/0x70
[  350.509208]  ? perf_cgroup_switch+0x170/0x170
[  350.513657]  perf_cgroup_attach+0x37/0x70
[  350.517757]  cgroup_migrate_execute+0x2c3/0x370
[  350.522392]  cgroup_attach_task+0x154/0x1f0
[  350.526707]  cgroup_procs_write+0xc7/0x100
[  350.530905]  cgroup_file_write+0x88/0x150
[  350.535012]  kernfs_fop_write+0x10b/0x190
[  350.539121]  __vfs_write+0x34/0x190
[  350.542711]  ? __vfs_write+0x5/0x190
[  350.546377]  ? rcu_all_qs+0x5/0x80
[  350.549867]  vfs_write+0xb6/0x190
[  350.553282]  ksys_write+0x57/0xd0
[  350.556696]  do_syscall_64+0x82/0x4e0
[  350.560473]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  350.565614] RIP: 0033:0x7f6e72efd4e4
[  350.569280] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bc 0f 1f 80 00 
00 00 00 48 8d 05 19 22 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 
f0 ff ff 77 54 c3 0f 1f 00 48 8d 64 24 d8 48 89 54 24 18
[  350.588133] RSP: 002b:7ffd434b8cb8 EFLAGS: 0246 ORIG_RAX: 
0001
[  350.595845] RAX: ffda RBX: 0005 RCX: 7f6e72efd4e4
[  350.603061] RDX: 0005 RSI: 7ffd434b8e8a RDI: 002a
[  350.610279] RBP: 7ffd434b8e8a R08:  R09: 0005
[  350.617495] R10:  R11: 0246 R12: 0005
[  350.624714] R13: 009abaf0 R14: 0005 R15: 7f6e72fc6760
[  413.301893] rcu: INFO: rcu_sched self-detected stall on CPU
[  413.307621] rcu:2-: (83670 ticks this GP) 
idle=546/1/0x4002 softirq=9363/9363 fqs=20382
[  413.317366] rcu: (t=84003 jiffies g=26533 q=170)
[  413.322307] NMI backtrace for cpu 2
[  413.325887] CPU: 2 PID: 1 Comm: systemd Tainted: GW 
4.19.84-xeno8-static #1
[  413.334321] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.21.20 08/05/2019
[  413.343712] I-pipe domain: Linux
[  413.347025] Call Trace:
[  413.349566]  
[  413.351677]  dump_stack+0x8c/0xc0
[  413.355089]  nmi_cpu_backtrace.cold+0x14/0x53
[  413.359535]  ? lapic_can_unplug_cpu.cold+0x42/0x42
[  413.364418]  nmi_trigger_cpumask_backtrace+0x7a/0x87
[  413.369480]  rcu_dump_cpu_stacks+0x86/0xbe
[  413.373672]  rcu_check_callbacks.cold+0x1fb/0x363
[  413.378482]  update_process_times+0x41/0x80
[  413.382761]  tick_sched_handle+0x34/0x50
[  413.386769]  tick_sched_timer+0x38/0x80
[  413.390698]  __hrtimer_run_queues+0xfd/0x270
[  413.395058]  ? tick_sched_do_timer+0x60/0x60
[  413.399434]  hrtimer_interrupt+0x106/0x230
[  413.403644]  smp_apic_timer_interrupt+0x46/0x110
[  413.408358]  __ipipe_

RE: stalled head domain with 3.1rc4

2019-12-13 Thread Lange Norbert via Xenomai
30 [ 1842.477308] Code: 55 49
> 89 d5 41 54 55 48 89 fd 53 48 83 ec 10 48 8b 47 70 4c 8b 37 48 63 40 18 4d 8b 
> a6
> 90 00 00 00 4c 03 24 c5 00 d3f [ 1842.496083] RSP: 0018:8fe9fba03e80
> EFLAGS: 0082 [ 1842.501324] RAX:  RBX:
> 00025090 RCX:  [ 1842.508468] RDX:
>  RSI: 0003b55f RDI: 8fe9fba305c8 [
> 1842.515609] RBP: 8fe9fba305c8 R08:  R09:
> 01acc52f873d [ 1842.522754] R10: 01acc52b974d R11:
> 01acc52b974d R12: 8fe9fba3aee0 [ 1842.529898] R13:
>  R14: b223bbe0 R15: 0003b55f [
> 1842.537044]  ? xntimer_start+0x3a/0x330 [ 1842.540889]  ?
> enqueue_hrtimer+0x36/0x90 [ 1842.544823]
> program_htick_shot+0x83/0x100 [ 1842.548931]
> clockevents_program_event+0x88/0xe0
> [ 1842.553561]  hrtimer_interrupt+0x140/0x230 [ 1842.557669]
> smp_apic_timer_interrupt+0x46/0x110
> [ 1842.562296]  __ipipe_do_sync_stage+0x130/0x180 [ 1842.566751]
> __ipipe_handle_irq+0x94/0x200 [ 1842.570860]
> apic_timer_interrupt+0x12/0x40 [ 1842.575054]   [ 1842.577163] RIP:
> 0010:smp_call_function_many+0x1b6/0x250
> [ 1842.582485] Code: e8 6f a0 6b 00 3b 05 dd 60 01 01 89 c7 0f 83 c4 fe ff ff 
> 48
> 63 c7 48 8b 0b 48 03 0c c5 00 d3 11 b2 8b 41 18 a8 01 745 [ 1842.601264] RSP:
> 0018:957380bbfba8 EFLAGS: 0202 ORIG_RAX: ff13 [
> 1842.608846] RAX: 0003 RBX: 8fe9fba34ac0 RCX:
> 8fe9fbbb8680 [ 1842.615989] RDX: 0001 RSI:
>  RDI: 0003 [ 1842.623133] RBP:
> b12179a0 R08: 8fe9fba34ac8 R09:  [ 1842.630276]
> R10: 000a R11: f000 R12:  [
> 1842.637417] R13: 8fe9fba34ac8 R14: 0004 R15:
> 0001 [ 1842.644565]  ? optimize_nops.isra.0+0x90/0x90 [
> 1842.648934]  ? smp_call_function_many+0x191/0x250
> [ 1842.653650]  ? optimize_nops.isra.0+0x90/0x90 [ 1842.658015]  ?
> xntimer_start+0x39/0x330 [ 1842.661859]  ? xntimer_start+0x3a/0x330 [
> 1842.665705]  on_each_cpu+0x28/0x50 [ 1842.669116]  ?
> xntimer_start+0x39/0x330 [ 1842.672959]  text_poke_bp+0x91/0xde [
> 1842.676460]  __jump_label_transform.isra.0+0x102/0x150
> [ 1842.681610]  arch_jump_label_transform+0x2e/0x40
> [ 1842.686239]  __jump_label_update+0x67/0xa0 [ 1842.690348]
> __static_key_slow_dec_cpuslocked+0x30/0x80
> [ 1842.695583]  static_key_slow_dec+0x23/0x50 [ 1842.699689]
> tracepoint_probe_unregister+0x176/0x1b0
> [ 1842.704661]  trace_event_reg+0x31/0xa0 [ 1842.708421]  ?
> mutex_lock+0x13/0x30 [ 1842.711921]
> __ftrace_event_enable_disable+0x120/0x230
> [ 1842.717072]  __ftrace_set_clr_event_nolock+0xe6/0x130
> [ 1842.722133]  system_enable_write+0xaa/0xe0 [ 1842.726240]
> __vfs_write+0x34/0x190 [ 1842.729739]  ? __check_heap_object+0x5/0x120
> [ 1842.734021]  ? __check_object_size+0x136/0x147 [ 1842.738474]  ?
> rcu_all_qs+0x5/0x80 [ 1842.741884]  vfs_write+0xb6/0x190 [ 1842.745210]
> ksys_write+0x57/0xd0 [ 1842.748537]  do_syscall_64+0x78/0x3c0 [
> 1842.752212]  ? __do_page_fault+0x207/0x400 [ 1842.756319]
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1842.761381] RIP: 0033:0x45f5d9
> [ 1842.76] Code: 89 d6 0f 05 c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 
> f8
> 4d 89 c2 48 89 f7 4d 89 c8 48 89 d6 4c 8b 4c 24 08 48 890 [ 1842.783220] RSP:
> 002b:7fff22863618 EFLAGS: 0246 ORIG_RAX: 0001 [
> 1842.790801] RAX: ffda RBX: 004013b0 RCX:
> 0045f5d9 [ 1842.797944] RDX: 0001 RSI:
> 7fff2286365f RDI: 0005 [ 1842.805086] RBP:
> 7fff228636c0 R08:  R09:  [
> 1842.812230] R10:  R11: 0246 R12:
> 7fff22863848 [ 1842.819372] R13: 7fff22863870 R14:
>  R15: 
>
>
> > -Original Message-
> > From: Xenomai  On Behalf Of Lange
> Norbert
> > via Xenomai
> > Sent: Freitag, 13. Dezember 2019 11:16
> > To: Xenomai (xenomai@xenomai.org) 
> > Subject: stalled head domain with 3.1rc4
> >
> >
> >
> > Just had a bug msg pop up. Its triggered by enabling tracing, while we
> > have 2 processes running, using IDDP, XDDP and RTNet (just packet
> > sockets, no ip stack).
> > Some points:
> >
> > -   trace-cmd stores in tmp, so shouldn't touch other filesystems than
> > tmpfs, sysfs
> >
> > -   upon starting this, our process complains about a 150ms hole in CPU
> time
> > (likely the time of the bug)
> >
> > -   it seems to happen only the first time after a boot
> >
> > -  

RE: stalled head domain with 3.1rc4

2019-12-13 Thread Lange Norbert via Xenomai
 d6 4c 8b 4c 24 08 48 890
> [ 1842.783220] RSP: 002b:7fff22863618 EFLAGS: 0246 ORIG_RAX:
> 0001
> [ 1842.790801] RAX: ffda RBX: 004013b0 RCX:
> 0045f5d9
> [ 1842.797944] RDX: 0001 RSI: 7fff2286365f RDI:
> 0005
> [ 1842.805086] RBP: 00007fff228636c0 R08:  R09:
> 
> [ 1842.812230] R10:  R11: 0246 R12:
> 7fff22863848
> [ 1842.819372] R13: 7fff22863870 R14:  R15:
> 
>
>
> > -Original Message-
> > From: Xenomai  On Behalf Of Lange
> > Norbert via Xenomai
> > Sent: Freitag, 13. Dezember 2019 11:16
> > To: Xenomai (xenomai@xenomai.org) 
> > Subject: stalled head domain with 3.1rc4
> >
> >
> >
> > Just had a bug msg pop up. Its triggered by enabling tracing, while we have
> 2
> > processes running, using IDDP, XDDP and RTNet (just packet sockets, no ip
> > stack).
> > Some points:
> >
> > -   trace-cmd stores in tmp, so shouldn't touch other filesystems than
> > tmpfs, sysfs
> >
> > -   upon starting this, our process complains about a 150ms hole in CPU
> time
> > (likely the time of the bug)
> >
> > -   it seems to happen only the first time after a boot
> >
> > -   running trace-cmd "dry" (without our processes) doesn't trigger the
> bug.
> > Neither when disabling active communication on our project (per
> millisecond
> > up to 15 eth packets in both directions via packet socket, using the new
> > send/recv_mmsg calls).
> >
> > -   system seems to continue stable afterwards
> >
> > -   a trace is attached, not after triggering the bug (then it would 
> > just
> > contain our project in error state) but showing or project with active
> > communication  (ie. trace-cmd started a second time after a bug)
> >
> >
> > # trace-cmd record -e 'cobalt*'
> > [  160.443596] I-pipe: Detected stalled head domain, probably caused by a
> > bug.
> > [  160.443596] A critical section may have been left unterminated.
> > [  160.457178] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.84-xeno8-
> > static #1
> > [  160.464323] Hardware name: TQ-Group TQMxE39M/Type2 - Board
> Product
> > Name, BIOS 5.12.30.21.20 08/05/2019
> > [  160.473640] I-pipe domain: Linux
> > [  160.476877] Call Trace:
> > [  160.479345]  dump_stack+0x8c/0xc0
> > [  160.482672]  ipipe_stall_root+0xc/0x30
> > [  160.486436]  __ipipe_trap_prologue+0x100/0x210
> > [  160.490894]  int3+0x45/0x70
> > [  160.493702] RIP: 0010:xnthread_resume+0x75/0x3a0
> > [  160.498329] Code: 0f eb 00 74 21 31 c0 ba 01 00 00 00 f0 0f b1 15 c5 0f 
> > eb 00
> > 85 c0 0f 85 db 02 00 00 4c 8b 2c 24 89 1d af 0f eb 00 4d0
> > [  160.517108] RSP: 0018:9934400a7dd8 EFLAGS: 0046
> > [  160.522349] RAX: 0001 RBX: 0001 RCX:
> > 7f37aa603700
> > [  160.529490] RDX: 0001 RSI: 0080 RDI:
> > 9934405dc240
> > [  160.536631] RBP: 9934405dc240 R08: 000f7df7 R09:
> > 9140f8cb2800
> > [  160.543774] R10: 03b3 R11: 000b8c4a R12:
> > 00025090
> > [  160.550918] R13: 0003 R14: 0080 R15:
> > 0080
> > [  160.558064]  ? xnthread_resume+0x75/0x3a0
> > [  160.562083]  ? xnthread_resume+0x1f/0x3a0
> > [  160.566104]  ipipe_migration_hook+0xda/0x1d0
> > [  160.570385]  complete_domain_migration+0x79/0xe0
> > [  160.575011]  __ipipe_switch_tail+0x39/0x50
> > [  160.579118]  __schedule+0x2d0/0x890
> > [  160.582615]  schedule_idle+0x28/0x40
> > [  160.586203]  do_idle+0x101/0x130
> > [  160.589440]  cpu_startup_entry+0x6f/0x80
> > [  160.593373]  start_secondary+0x169/0x1b0
> > [  160.597312]  secondary_startup_64+0xa4/0xb0
> >
> >
> >
> > Mit besten Grüßen / Kind regards
> >
> > NORBERT LANGE
> >
> > AT-RD3
> >
> > ANDRITZ HYDRO GmbH
> > Eibesbrunnergasse 20
> > 1120 Vienna / AUSTRIA
> > p: +43 50805 56684
> > norbert.la...@andritz.com<mailto:norbert.la...@andritz.com>
> > andritz.com<http://www.andritz.com/>
> >
> > 
> >
> > This message and any attachments are solely for the use of the intended
> > recipients. They may contain privileged and/or confidential information or
> > other informati

stalled head domain with 3.1rc4

2019-12-13 Thread Lange Norbert via Xenomai
Just had a bug message pop up. It's triggered by enabling tracing while we have 2 processes running, using IDDP, XDDP and RTnet (just packet sockets, no IP stack).
Some points:

-   trace-cmd stores in /tmp, so it shouldn't touch filesystems other than tmpfs and sysfs

-   upon starting this, our process complains about a 150 ms hole in CPU time (likely the time of the bug)

-   it seems to happen only the first time after a boot

-   running trace-cmd "dry" (without our processes) doesn't trigger the bug, and neither does running with active communication disabled in our project (which normally exchanges up to 15 Ethernet packets per millisecond in both directions via packet sockets, using the new send/recv_mmsg calls; a receive sketch follows this list)

-   the system seems to remain stable afterwards

-   a trace is attached, not taken right after triggering the bug (then it would just contain our project in its error state) but showing our project with active communication (i.e. trace-cmd started a second time after a bug)
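
The receive half of that per-millisecond exchange boils down to something like the following (sketch only; batch and frame sizes are assumptions, and MSG_DONTWAIT reflects that the cyclic task must not block here):

#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define PKT_BATCH 16    /* assumed: a bit more than the ~15 frames per cycle */
#define PKT_SIZE  1518

/* Drain up to PKT_BATCH frames from the packet socket in one syscall. */
static int drain_rx(int fd)
{
    static char bufs[PKT_BATCH][PKT_SIZE];  /* static only for brevity */
    struct mmsghdr msgs[PKT_BATCH];
    struct iovec iovs[PKT_BATCH];
    int i;

    memset(msgs, 0, sizeof(msgs));
    for (i = 0; i < PKT_BATCH; i++) {
        iovs[i].iov_base = bufs[i];
        iovs[i].iov_len = PKT_SIZE;
        msgs[i].msg_hdr.msg_iov = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    return recvmmsg(fd, msgs, PKT_BATCH, MSG_DONTWAIT, NULL);
}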


# trace-cmd record -e 'cobalt*'
[  160.443596] I-pipe: Detected stalled head domain, probably caused by a bug.
[  160.443596] A critical section may have been left unterminated.
[  160.457178] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.84-xeno8-static #1
[  160.464323] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.21.20 08/05/2019
[  160.473640] I-pipe domain: Linux
[  160.476877] Call Trace:
[  160.479345]  dump_stack+0x8c/0xc0
[  160.482672]  ipipe_stall_root+0xc/0x30
[  160.486436]  __ipipe_trap_prologue+0x100/0x210
[  160.490894]  int3+0x45/0x70
[  160.493702] RIP: 0010:xnthread_resume+0x75/0x3a0
[  160.498329] Code: 0f eb 00 74 21 31 c0 ba 01 00 00 00 f0 0f b1 15 c5 0f eb 
00 85 c0 0f 85 db 02 00 00 4c 8b 2c 24 89 1d af 0f eb 00 4d0
[  160.517108] RSP: 0018:9934400a7dd8 EFLAGS: 0046
[  160.522349] RAX: 0001 RBX: 0001 RCX: 7f37aa603700
[  160.529490] RDX: 0001 RSI: 0080 RDI: 9934405dc240
[  160.536631] RBP: 9934405dc240 R08: 000f7df7 R09: 9140f8cb2800
[  160.543774] R10: 03b3 R11: 000b8c4a R12: 00025090
[  160.550918] R13: 0003 R14: 0080 R15: 0080
[  160.558064]  ? xnthread_resume+0x75/0x3a0
[  160.562083]  ? xnthread_resume+0x1f/0x3a0
[  160.566104]  ipipe_migration_hook+0xda/0x1d0
[  160.570385]  complete_domain_migration+0x79/0xe0
[  160.575011]  __ipipe_switch_tail+0x39/0x50
[  160.579118]  __schedule+0x2d0/0x890
[  160.582615]  schedule_idle+0x28/0x40
[  160.586203]  do_idle+0x101/0x130
[  160.589440]  cpu_startup_entry+0x6f/0x80
[  160.593373]  start_secondary+0x169/0x1b0
[  160.597312]  secondary_startup_64+0xa4/0xb0



Mit besten Grüßen / Kind regards

NORBERT LANGE

AT-RD3

ANDRITZ HYDRO GmbH
Eibesbrunnergasse 20
1120 Vienna / AUSTRIA
p: +43 50805 56684
norbert.la...@andritz.com
andritz.com




-- next part --
A non-text attachment was scrubbed...
Name: trace.dat.xz
Type: application/octet-stream
Size: 2775472 bytes
Desc: trace.dat.xz
URL: 



Inspecting heap allocations?

2019-12-12 Thread Lange Norbert via Xenomai
Hello,

I have some circumstances where I run out of global heap, and then simple stuff like creating a mutex fails with ENOMEM.
My suspicion is an IDDP connection between 2 processes, where one process might send a lot of small packets before the other pulls them (I will try using a local pool).
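
Concretely, the local-pool idea on the receiving side would look roughly like this (a minimal sketch; the port number and pool size are made up, and IDDP_POOLSZ has to be set before bind()):

#include <rtdm/ipc.h>
#include <string.h>
#include <sys/socket.h>

/* Give this IDDP port its own rtskb pool so a backlog of unread
 * datagrams no longer eats into the Cobalt system heap. */
static int bind_iddp_with_local_pool(int port)
{
    struct sockaddr_ipc saddr;
    size_t poolsz = 256 * 1024;  /* assumed to cover the worst-case backlog */
    int s;

    s = socket(AF_RTIPC, SOCK_DGRAM, IPCPROTO_IDDP);
    if (s < 0)
        return -1;

    if (setsockopt(s, SOL_IDDP, IDDP_POOLSZ, &poolsz, sizeof(poolsz)) < 0)
        return -1;

    memset(&saddr, 0, sizeof(saddr));
    saddr.sipc_family = AF_RTIPC;
    saddr.sipc_port = port;

    if (bind(s, (struct sockaddr *)&saddr, sizeof(saddr)) < 0)
        return -1;

    return s;
}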

In the future, is there a way to dump the owners of heap objects?

Mit besten Grüßen / Kind regards

NORBERT LANGE

AT-RD3

ANDRITZ HYDRO GmbH
Eibesbrunnergasse 20
1120 Vienna / AUSTRIA
p: +43 50805 56684
norbert.la...@andritz.com
andritz.com






RE: Moving on

2019-12-11 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Xenomai  On Behalf Of Philippe
> Gerum via Xenomai
> Sent: Montag, 2. Dezember 2019 17:05
> To: Xenomai@xenomai.org
> Subject: Moving on
>
>
> It has been two years since I stepped down as Xenomai's lead maintainer.
> In the meantime, Jan took over and did a very good job in this role as
> expected.
>
> Since this transition period is now taking an end, I'm switching focus to the
> EVL project [1] I have been nurturing for some time, with the goal to make it
> production-grade next year. In a nutshell, EVL is about laying the groundwork
> for dual kernel systems to become first-class citizens of Linux. This work
> includes a SMP scalable real-time core which can be maintained with
> common kernel knowledge over the latest mainline kernel series, a standard
> driver model, and a single, compact API
>
> I will still review patches coming my way for the I-pipe ARM and arm64 trees
> for a few weeks, until the next maintainer is appointed for these
> architectures. I'll be around to help with Xenomai core issues if needed.

Doesn't seem like you will move too far away. Good luck with EVL (and with getting it upstreamed, if that's the aim).

Norbert





RE: Request: offer CLOCK_HOST_MONOTONIC for systemwide tracing

2019-11-27 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Mittwoch, 27. November 2019 18:06
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Cc: mathieu.desnoy...@efficios.com
> Subject: Re: Request: offer CLOCK_HOST_MONOTONIC for systemwide
> tracing
>
>
>
> On 27.11.19 17:20, Lange Norbert via Xenomai wrote:
> > Hello,
> >
> > Systemwide traces would require the same clock, so Linux processes use
> > CLOCK_MONOTONIC (they normally do by default), Kernel traces should
> do so too (ktime_get_mono_fast_ns).
> > Xenomai Processes/Thread have no reliable to produce matching traces.
> > The idea would be to configure the tracing frameworks to use the new
> CLOCK_HOST_MONOTONIC for those.
>
> Do we know what makes the two monotonic clock deviate? Does Linux apply
> the frequency modifications of the realtime clock on the monotonic one as
> well?

Yes, it does: it leaves out the jumps, but the rate is adjusted.

>
> > As concrete example, Lttng would use a clock plugin (only) for Xenomai.
> >
> > I looked at the kernel sources if I can add this myself quickly, but
> > seems to be involve some stuff like kevents which I know nothing about
> (killing the “quickly” part at the very least).
> >
> > The request would be that
> > - Similar to CLOCK_HOST_REALTIME, a CLOCK_HOST_MONOTONIC would
> result in timestamps that a very close to the plain Linux CLOCK_MONOTONIC.
> > - This should happen without syscalls and be very efficient
> > - In respect to being able to attach preloaded tracer dso’s (like
> > tracing malloc/free), the call should work correctly before cobalt is
> > initialized (could be worked around, but still..)
> >
> > Other alternatives (most not realistic):
> >
> > Calling the linux vdso functions might result in deadlocks [1].
> >
> > A “safe” variant like ktime_get_mono_fast_ns is not accessible per syscall
> or userspace vDSO.
> >
> > The normal xenomai CLOCK_MONOTONIC is not skew corrected to
> REALTIME
> >
> > CLOCK_MONOTONIC_RAW perhaps might be easily convertible between
> > domains, but likely problematic on several CPUs (unstable TSC, not in
> > sync between CPUs)
>
> We do not support such CPUs with Xenomai (and they get rarer these days).

You use a specific architectural timer like rdtsc on x86; the monotonic clock could have different/switchable sources.

> >
> > As a sidenote, recent Kernels (5.3?) have a cleaned up and unified
> > vDSO framework, Potentially could grab stuff from there.
>
> If we go for hardening the kernel's gtod data updates, I guess we could also
> switch to that as source. I now wonder why that path wasn't considered back
> then, or why it wasn't possible.

Well, they are paranoid about keeping the monotonic clock monotonic.
Look at the implementation for ktime_get_mono_fast_ns,
this has a proper implementation for "safe" access, but it might violate
the monotonic property now and then.
(which would be perfectly fine for a timestamp source)

>
> Thinking longer term, synchronizing the two clock sets not just for tracing
> would likely be beneficial - PTP/TSN...

Sure, but I would keep a clock that is as simple as Cobalt's clock is now (i.e. what should have been CLOCK_MONOTONIC_RAW, and arguably the default). For the durations of at most a couple of seconds that you typically need, everything else would be overkill.

For timestamps (and absolute waits) a synchronized clock might make sense.

Adding the interpolation values from ktime_get_mono_fast_ns to the Linux vDSO without the usual lock (using alternating buffers, like that function does) might be the best solution. AFAIU this time is the same CLOCK_MONOTONIC that is used for kernel traces (it is just not "bulletproof monotonic").
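
Conceptually, such a lock-free read would look like this (a deliberately simplified sketch, not the actual kernel layout; the writer-side update sequence and the retry check of the real seqcount latch are omitted):

#include <stdatomic.h>
#include <stdint.h>

/* The writer fills buf[(seq + 1) & 1] and then publishes the new seq with
 * a release store, so a reader gets a consistent snapshot without taking
 * a lock that a preempted writer could still be holding. */
struct mono_params {
    uint64_t cycle_last;
    uint64_t base_ns;
    uint32_t mult;
    uint32_t shift;
};

struct mono_latch {
    _Atomic uint32_t seq;
    struct mono_params buf[2];
};

static uint64_t mono_fast_ns(const struct mono_latch *l, uint64_t cycles)
{
    uint32_t seq = atomic_load_explicit(&l->seq, memory_order_acquire);
    const struct mono_params *p = &l->buf[seq & 1];

    return p->base_ns + (((cycles - p->cycle_last) * p->mult) >> p->shift);
}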


Norbert





Request: offer CLOCK_HOST_MONOTONIC for systemwide tracing

2019-11-27 Thread Lange Norbert via Xenomai
Hello,

System-wide traces require a common clock: Linux processes use CLOCK_MONOTONIC (they normally do by default), and kernel traces should do so too (ktime_get_mono_fast_ns).
Xenomai processes/threads currently have no reliable way to produce matching timestamps. The idea would be to configure the tracing frameworks to use the new CLOCK_HOST_MONOTONIC for those.
As a concrete example, LTTng would use a clock plugin (only) for Xenomai.
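
For illustration, the heart of such a plugin (a small shared object selected via the LTTNG_UST_CLOCK_PLUGIN environment variable) is just a 64-bit nanosecond read callback; the registration boilerplate from lttng-ust's clock-override example is omitted here, and since CLOCK_HOST_MONOTONIC does not exist yet, plain CLOCK_MONOTONIC stands in:

#include <stdint.h>
#include <time.h>

/* Called by the tracer for every event; under Cobalt this would have to
 * return the clock that matches the kernel-side trace timestamps. */
static uint64_t trace_clock_read64(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}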

I looked at the kernel sources to see whether I could add this myself quickly, but it seems to involve some stuff like kevents which I know nothing about (killing the "quickly" part at the very least).

The request would be that:
- Similar to CLOCK_HOST_REALTIME, a CLOCK_HOST_MONOTONIC would result in timestamps that are very close to the plain Linux CLOCK_MONOTONIC.
- This should happen without syscalls and be very efficient.
- With respect to attaching preloaded tracer DSOs (like tracing malloc/free), the call should work correctly before Cobalt is initialized (this could be worked around, but still...).

Other alternatives (most not realistic):

Calling the linux vdso functions might result in deadlocks [1].

A “safe” variant like ktime_get_mono_fast_ns is not accessible per syscall or 
userspace vDSO.

The normal xenomai CLOCK_MONOTONIC is not skew corrected to REALTIME

CLOCK_MONOTONIC_RAW perhaps might be easily convertible between domains, but 
likely
problematic on several CPUs (unstable TSC, not in sync between CPUs)

As a sidenote, recent Kernels (5.3?) have a cleaned up and unified vDSO 
framework,
Potentially could grab stuff from there.

[1] - https://www.xenomai.org/pipermail/xenomai/2018-December/040134.html






Impact and use of the membarrier syscall

2019-11-25 Thread Lange Norbert via Xenomai
Hello,

I looked at this syscall, as lttng makes use of it, and the last posts on this ML would imply that Xenomai uses IPIs as well.
This raises a few questions:

-  Could a Linux app (like lttng) that uses the membarrier syscall create IRQs/IPIs that ultimately affect real-time threads negatively?
I.e. should this kernel config option be recommended to be disabled for Xenomai?

-  Does the syscall do anything for running Xenomai threads (in particular, lttng makes use of MEMBARRIER_CMD_PRIVATE_EXPEDITED; see the sketch below)?
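
For reference, this is the call in question as issued from userspace (a sketch; there is no glibc wrapper, and recent kernels require the one-time registration before the expedited command may be used):

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Expedited private membarrier: IPIs only the CPUs currently running
 * threads of this process. */
static int private_expedited_barrier(void)
{
    static int registered;

    if (!registered) {
        if (syscall(__NR_membarrier,
                    MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) < 0)
            return -1;
        registered = 1;
    }

    return syscall(__NR_membarrier, MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
}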

Regards, Norbert





RE: urcu/lttng (Userspace) and Xenomai

2019-11-21 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Donnerstag, 21. November 2019 14:46
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: urcu/lttng (Userspace) and Xenomai
>
>
>
> On 21.11.19 11:26, Lange Norbert via Xenomai wrote:
> > Hello,
> >
> > I am trying to figure out if Xenomai would work correctly with Lttng.
> > Currently I haven’t figured out how the system manages buffers, but I am
> checking if this would be generally applicable to Xenomai.
> >
> > I’d like to know if anyone has already used Lttng UST with xenomai
> > threads, and if there is any need to compile lttng/liburcu for xenomai or
> using some patches.
> > (I haven’t seen anything that indicates it would not work).
> >
> > ## urcu flavours
> > This has a few variants, lttng uses the bulletproof one. Most others
> > should be faster on average – but all of them might unlock a futex with a
> raw syscall.
> >
> > Other flavours like qsbr could likely be faster if the futex sycall
> > would be replaced with a cobalt mutex (it’s very unlikely this path is
> executed). Would need some work to get this done (and lttng to use it).
> >
> > ## sys_membarrier
> > recent kernels and liburcu versions support this syscall, which
> > supposedly allows removal of reader memory barriers.
> > The syscall will somehow interrupt the threads (all *running threads* of
> the process), which implicitly causes a barrier for readers.
> >
> > Q: I guess this will *not* interrupt xenomai threads, as their shadow linux
> thread is not *running*?
> > Q: x86_64 accesses are strictly ordered, do you actually need membarriers
> at all?
> >
>
> I didn't look into details of enabling userspace lttng yet, but I had a chat 
> with
> Mathieu about this, maybe a year ago. He said back then that there is also a
> polling mode where a data collection thread is simply trying to obtain the
> trace output time-driven.

I believe that's the "bulletproof" RCU mode that lttng uses. I don't see any OS-level synchronization in the readers, only some atomic variables.
Is Mathieu an lttng dev?

> Then the producer (including cobalt threads) would
> not need any syscall at all.

In the context of lttng those are readers (of the shared RCU structures); writes would only happen if tracepoint providers are added/removed.

But then I don't know how the buffers are managed; this appears to be system-wide, in another process.

The sys_membarrier syscall would be called by writers (not Xenomai threads) to additionally allow instructions like dmb (on ARM) around atomic accesses to be removed on the reader side.
I think it's useless for x86_64, and the syscall itself would not do anything for running Xenomai threads.
(You can only force the syscall, not disable it, short of changing the sources.)

> As I said, that was just a conceptual discussion.
> None of us actually looked into the implementation.

Hmm, I would like to test this soon. I still need a way to totally disable it in case something goes wrong, i.e. ugly macro magic.
Can you confirm that I am right that membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED) does not block until the running Xenomai thread has reached some sort of syscall synchronization point?

Norbert





urcu/lttng (Userspace) and Xenomai

2019-11-21 Thread Lange Norbert via Xenomai
Hello,

I am trying to figure out if Xenomai would work correctly with Lttng. Currently 
I haven’t figured out how the system manages buffers,
but I am checking if this would be generally applicable to Xenomai.

I'd like to know if anyone has already used LTTng-UST with Xenomai threads, and whether there is any need to compile lttng/liburcu specifically for Xenomai or to apply some patches.
(I haven't seen anything that indicates it would not work.)

## urcu flavours
This has a few variants; lttng uses the bulletproof one. Most others should be faster on average, but all of them might unlock a futex with a raw syscall.

Other flavours like qsbr could likely be faster if the futex syscall were replaced with a Cobalt mutex (it's very unlikely this path is executed). That would need some work to get done (and lttng would have to use it).
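
For illustration, the read side that ends up on the real-time path is only this (sketch; symbol names as in recent liburcu releases, older ones spell it rcu_read_lock()/rcu_read_unlock() with <urcu-bp.h>):

#include <urcu/urcu-bp.h>

/* Reader fast path of the bulletproof flavour: no syscall involved, at
 * most a lazy self-registration the first time a thread enters it. */
static void reader_fast_path(void)
{
    urcu_bp_read_lock();
    /* ... dereference RCU-protected tracer state here ... */
    urcu_bp_read_unlock();
}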

## sys_membarrier
Recent kernels and liburcu versions support this syscall, which supposedly allows the reader-side memory barriers to be removed.
The syscall will somehow interrupt the threads (all *running* threads of the process), which implicitly acts as a barrier for the readers.

Q: I guess this will *not* interrupt Xenomai threads, as their shadow Linux thread is not *running*?
Q: x86_64 accesses are strictly ordered; do you actually need the barriers at all?

Kind regards, Norbert





RE: Deadlock during debugging

2019-11-19 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Dienstag, 19. November 2019 07:51
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: Deadlock during debugging
>
>
>
> On 18.11.19 18:31, Lange Norbert wrote:
> >
> >
> >> -Original Message-
> >> From: Jan Kiszka 
> >> Sent: Montag, 18. November 2019 18:22
> >> To: Lange Norbert ; Xenomai
> >> (xenomai@xenomai.org) 
> >> Subject: Re: Deadlock during debugging
> >>
> >>
> >>
> >> On 18.11.19 17:24, Lange Norbert via Xenomai wrote:
> >>> One more,
> >>>
> >>> Note that there seem to be quite different reports, from a recursive
> fault
> >> to some threads getting marked as "runaway".
> >>> I can reproduce the issue now easily, but its proprietary software I cant
> >> reach around.
> >>
> >> Understood. Will try to read something from the traces.
> >>
> >> This is a regression over 3.0 now, correct?
> >
> > No, can't say that. I had various recurring issues with 4.9, 4.14 and 4.19
> kernels,
> > aswell as 3.0 and and 3.1.
> > It's hard to narrow down and often just vanished after a while, and my only
> > gut-feeling is that condition variables are involved.
> > I also have a couple cobalt threads *not* pinned to a single cpu.
>
> I'm only talking about the crash during debug - one issue after the other.

So did I; the crashes produce different logs (likely just the aftermath of some memory corruption), and I have had these crashes occasionally for a while. So this does *not* appear to be a 3.1 regression, which is why I brought these points up.

Compiling the kernel with another gcc and pinning the threads did not help, BTW.

Norbert





RE: Deadlock during debugging

2019-11-18 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Montag, 18. November 2019 18:22
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: Deadlock during debugging
>
>
>
> On 18.11.19 17:24, Lange Norbert via Xenomai wrote:
> > One more,
> >
> > Note that there seem to be quite different reports, from a recursive fault
> to some threads getting marked as "runaway".
> > I can reproduce the issue now easily, but its proprietary software I cant
> reach around.
>
> Understood. Will try to read something from the traces.
>
> This is a regression over 3.0 now, correct?

No, I can't say that. I had various recurring issues with 4.9, 4.14 and 4.19 kernels, as well as with 3.0 and 3.1.
It's hard to narrow down and often just vanished after a while; my only gut feeling is that condition variables are involved.
I also have a couple of Cobalt threads *not* pinned to a single CPU.

At least I can now say it's a single app causing the issue, without rtnet in use or additional Cobalt applications running.
Since I can easily reproduce the issue, I will now try using Debian's gcc-8 to rule out toolchain trouble.
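
For completeness, pinning those threads at creation time is only this much (a sketch; that libcobalt's pthread_create() honours the affinity attribute is an assumption here, and the CPU number is arbitrary):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Build a pthread attribute that restricts the new (Cobalt) thread to
 * one CPU before it ever runs. */
static int pinned_attr(pthread_attr_t *attr, int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    pthread_attr_init(attr);
    return pthread_attr_setaffinity_np(attr, sizeof(set), &set);
}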

Norbert.






RE: Deadlock during debugging

2019-11-18 Thread Lange Norbert via Xenomai
One more,

Note that there seem to be quite different reports, from a recursive fault to some threads getting marked as "runaway".
I can now reproduce the issue easily, but it's proprietary software that I can't hand around.

Norbert

[  226.354729] I-pipe: Detected stalled head domain, probably caused by a bug.
[  226.354729] A critical section may have been left unterminated.
[  226.370156] CPU: 1 PID: 0 Comm: swapper/2 Tainted: GW 
4.19.84-xenod8-static #1
[  226.370160] CPU: 2 PID: 732 Comm: fup.fast Tainted: GW 
4.19.84-xenod8-static #1
[  226.378775] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.21.16 01/31/2019
[  226.387475] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.21.16 01/31/2019
[  226.396782] I-pipe domain: Linux
[  226.406089] I-pipe domain: Linux
[  226.409320] RIP: 0010:do_idle+0xaf/0x140
[  226.412549] Call Trace:
[  226.416476] Code: 85 92 00 00 00 e8 51 f5 04 00 e8 bc 65 03 00 e8 77 36 7c 
00 f0 80 4d 02 20 9c 58 f6 c4 02 74 7e e8 66 2d 07 00 48 85 c0 74 6b <0f> 0b e8 
0a 42 07 00 e8 45 68 03 00 9c 58 f6 c4 02 0f 85 79 ff ff
[  226.418936]  dump_stack+0x8c/0xc0
[  226.437687] RSP: 0018:932cc00afef8 EFLAGS: 00010002
[  226.441009]  ipipe_root_only.cold+0x11/0x32
[  226.446240]  ipipe_stall_root+0xe/0x60
[  226.450424] RAX: 0001 RBX: 0002 RCX: 000b
[  226.454182]  __ipipe_trap_prologue+0x2ae/0x2f0
[  226.461319] RDX: a3fc RSI: 8f63f99c8208 RDI: 
[  226.465767]  ? __ipipe_complete_domain_migration+0x40/0x40
[  226.472899] RBP: 8f63f815a7c0 R08:  R09: 0002e248
[  226.478386]  invalid_op+0x26/0x51
[  226.485518] R10: 00015800 R11: 003480cf3801 R12: 8f63f815a7c0
[  226.488839] RIP: 0010:xnthread_suspend+0x3ef/0x540
[  226.495973] R13:  R14:  R15: 
[  226.500766] Code: 58 12 00 00 4c 89 e7 e8 ef ca ff ff 41 83 8c 24 c4 11 00 
00 01 e9 82 fd ff ff 0f 0b 48 83 bf 58 12 00 00 00 0f 84 49 fc ff ff <0f> 0b 0f 
0b 9c 58 f6 c4 02 0f 84 85 fd ff ff fa bf 00 00 00 80 e8
[  226.507900] FS:  () GS:8f63f980() 
knlGS:
[  226.52] RSP: 0018:932cc083bd60 EFLAGS: 00010082
[  226.534755] CS:  0010 DS:  ES:  CR0: 80050033
[  226.539986] CR2: 7ff8dca27000 CR3: 000174c54000 CR4: 003406e0
[  226.545738] RAX: 932cc0617e30 RBX: 00025090 RCX: 
[  226.552870] Call Trace:
[  226.560005] RDX:  RSI: 0002 RDI: 932cc0616240
[  226.562461]  cpu_startup_entry+0x6f/0x80
[  226.569590] RBP: 932cc0617e08 R08: 932cc0617e08 R09: 0005cc88
[  226.573520]  start_secondary+0x169/0x1b0
[  226.580655] R10:  R11:  R12: 932cc0616240
[  226.584585]  secondary_startup_64+0xa4/0xb0
[  226.591716] R13:  R14:  R15: 932cc0617e08
[  226.595905] ---[ end trace aa5dc96dbf303c58 ]---
[  226.603042]  xnsynch_sleep_on+0x117/0x2d0
[  226.611670]  __cobalt_cond_wait_prologue+0x29f/0x950
[  226.616647]  ? __cobalt_cond_wait_prologue+0x950/0x950
[  226.621798]  CoBaLt_cond_wait_prologue+0x23/0x30
[  226.626425]  handle_head_syscall+0xe1/0x370
[  226.630618]  ipipe_fastcall_hook+0x14/0x20
[  226.634724]  ipipe_handle_syscall+0x57/0xe0
[  226.638920]  do_syscall_64+0x4b/0x500
[  226.642598]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  226.647660] RIP: 0033:0x77f9c134
[  226.651244] Code: 8b 73 04 49 89 dc e8 fb ef ff ff 48 89 de 48 8b 5c 24 10 
45 31 c0 b9 23 00 00 10 48 8d 54 24 44 45 31 d2 48 89 df 89 c8 0f 05 <8b> 7c 24 
2c 31 f6 49 89 c5 89 c5 e8 cc ef ff ff 4c 89 ff e8 74 e9
[  226.670014] RSP: 002b:7fffe1a6bb10 EFLAGS: 0246 ORIG_RAX: 
1023
[  226.677599] RAX: ffda RBX: 74d91c78 RCX: 77f9c134
[  226.684744] RDX: 7fffe1a6bb54 RSI: 74d91c48 RDI: 74d91c78
[  226.691885] RBP: 7fffe1a6bc30 R08:  R09: 
[  226.699027] R10:  R11: 0246 R12: 74d91c48
[  226.706166] R13:  R14: 0001 R15: 7fffe1a6bb60
[  226.713325] I-pipe tracer log (100 points):
[  226.717520]  |*+func0 ipipe_trace_panic_freeze+0x0 
(ipipe_root_only+0xcf)
[  226.726114]  |*+func0 ipipe_root_only+0x0 
(ipipe_stall_root+0xe)
[  226.733926]  |*+func   -1 ipipe_stall_root+0x0 
(__ipipe_trap_prologue+0x2ae)
[  226.742431]  |# func   -2 ipipe_trap_hook+0x0 
(__ipipe_notify_trap+0x98)
[  226.750590]  |# func   -3 __ipipe_notify_trap+0x0 
(__ipipe_trap_prologue+0x7f)
[  226.759268]  |# func   -3 __ipipe_trap_prologue+0x0 
(invalid_op+0x26)
[  226.767167]  |# func   -5 xnthread_suspend+0x0 
(xnsynch_sleep_on+0x117)
[  226.7752

RE: Deadlock during debugging

2019-11-18 Thread Lange Norbert via Xenomai
New crash, same thing with ipipe panic trace (the decoded log does not add 
information to the relevant parts).

Is the dump_stack function itself trashing the stack?

[  168.411205] [Xenomai] watchdog triggered on CPU #1 -- runaway thread 'main' 
signaled
[  209.176742] [ cut here ]
[  209.181381] xnthread_relax() failed for thread aboard_runner[790]
[  209.181389] BUG: Unhandled exception over domain Xenomai at 
0x7fed - switching to ROOT
[  209.196451] CPU: 0 PID: 790 Comm: aboard_runner Tainted: GW 
4.19.84-xenod8-static #1
[  209.205588] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.21.16 01/31/2019
[  209.214900] I-pipe domain: Linux
[  209.218137] Call Trace:
[  209.220593]  dump_stack+0x8c/0xc0
[  209.223919]  __ipipe_trap_prologue.cold+0x1f/0x5e
[  209.228629]  invalid_op+0x26/0x51
[  209.231952] RIP: 0010:xnthread_relax+0x46d/0x4a0
[  209.236576] Code: f6 83 c2 11 00 00 01 75 0e 48 8b 03 48 85 c0 74 33 8b 90 
c0 04 00 00 48 8d b3 5c 14 00 00 48 c7 c7 90 00 8b 9a e8 02 02 ef ff <0f> 0b e9 
42 fd ff ff 89 c6 48 c7 c7 c4 f8 a3 9a e8 2e 71 f3 ff e9
[  209.255347] RSP: 0018:9a0e4074fd90 EFLAGS: 00010286
[  209.260586] RAX:  RBX: 9a0e4065aa40 RCX: 000b
[  209.267728] RDX: 5129 RSI: 902a794791f8 RDI: 007800c0
[  209.274869] RBP: 9a0e4074fe68 R08: 007800c0 R09: 0002e248
[  209.282013] R10: 9bb72040 R11: 9bb3209c R12: 9bbfdc80
[  209.289157] R13: 902a76da8000 R14: 0001 R15: 0292
[  209.296299]  ? xnthread_prepare_wait+0x20/0x20
[  209.300752]  ? trace+0x59/0x8d
[  209.303814]  ? __cobalt_clock_nanosleep+0x540/0x540
[  209.308700]  handle_head_syscall+0x307/0x370
[  209.312979]  ipipe_fastcall_hook+0x14/0x20
[  209.317083]  ipipe_handle_syscall+0x57/0xe0
[  209.321280]  do_syscall_64+0x4b/0x500
[  209.324950]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  209.330011] RIP: 0033:0x77f9bd68
[  209.333598] Code: 89 fb bf 01 00 00 00 48 83 ec 18 48 8d 74 24 0c e8 bd f3 
ff ff b9 19 00 00 10 48 63 f5 48 63 fb 4d 89 ea 4c 89 e2 89 c8 0f 05 <8b> 7c 24 
0c 48 89 c3 31 f6 e8 9a f3 ff ff 48 83 c4 18 89 d8 f7 d8
[  209.352370] RSP: 002b:7fffe7d0 EFLAGS: 0246 ORIG_RAX: 
1019
[  209.359954] RAX: fe00 RBX: 0001 RCX: 77f9bd68
[  209.367098] RDX: 7fffe820 RSI: 0001 RDI: 0001
[  209.374237] RBP: 0001 R08: 0001 R09: 0014
[  209.381381] R10: 7fffe820 R11: 0246 R12: 7fffe820
[  209.388524] R13: 7fffe820 R14:  R15: 
[  209.395665] I-pipe tracer log (100 points):
[  209.399857]  | #func0 ipipe_trace_panic_freeze+0x0 
(__ipipe_trap_prologue+0x237)
[  209.409056]  | +func0 ipipe_root_only+0x0 
(ipipe_stall_root+0xe)
[  209.416862]  | +func   -1 ipipe_stall_root+0x0 
(__ipipe_trap_prologue+0x2ae)
[  209.425365]  |+ func   -2 ipipe_trap_hook+0x0 
(__ipipe_notify_trap+0x98)
[  209.433523]  |+ func   -3 __ipipe_notify_trap+0x0 
(__ipipe_trap_prologue+0x7f)
[  209.442199]  |+ func   -4 __ipipe_trap_prologue+0x0 
(invalid_op+0x26)
[  209.450097]  |+ end 0x8001 -5 
__ipipe_spin_unlock_irqrestore+0x4f (<>)
[  209.458425]  |# func   -6 __ipipe_spin_unlock_irqrestore+0x0 
(__ipipe_log_printk+0x69)
[  209.467797]  |+ begin   0x8001-10 __ipipe_spin_lock_irqsave+0x5e 
(<>)
[  209.475693]   + func  -10 __ipipe_spin_lock_irqsave+0x0 
(__ipipe_log_printk+0x22)
[  209.484630]   + func  -10 __ipipe_log_printk+0x0 
(__warn_printk+0x6c)
[  209.492525]  |+ end 0x8001-11 do_vprintk+0xf6 (<>)
[  209.499120]  |+ begin   0x8001-11 do_vprintk+0x106 (<>)
[  209.505799]   + func  -12 do_vprintk+0x0 (__warn_printk+0x6c)
[  209.513000]   + func  -12 vprintk+0x0 (__warn_printk+0x6c)
[  209.519939]  |+ end 0x8001-12 ipipe_raise_irq+0x70 (<>)
[  209.526969]  |+ func  -13 __ipipe_set_irq_pending+0x0 
(__ipipe_dispatch_irq+0xad)
[  209.535905]  |+ func  -14 __ipipe_dispatch_irq+0x0 
(ipipe_raise_irq+0x7e)
[  209.544148]  |+ begin   0x8001-14 ipipe_raise_irq+0x64 (<>)
[  209.551178]   + func  -15 ipipe_raise_irq+0x0 
(__ipipe_log_printk+0x84)
[  209.559250]  |+ end 0x8001-15 
__ipipe_spin_unlock_irqrestore+0x4f (<>)
[  209.567581]  |# func  -15 __ipipe_spin_unlock_irqrestore+0x0 
(__ipipe_log_printk+0x69)
[  209.576951]  |+ begin   0x8001-17 __ipipe_spin_lock_irqsave+0x5e 
(<>)
[  209.584847]   + func  -18 __ipipe_spin_lock_irqsave+0x0 
(__ipipe_log_printk+0x22)
[  2

Deadlock during debugging

2019-11-18 Thread Lange Norbert via Xenomai
Hello,

Here's one of my deadlocks; the output seems interleaved from two concurrent
dumps. I ran the crash log through decode_stacktrace.sh.

I got to this after enabling a breakpoint in GDB (execution did stop there),
setting another breakpoint and hitting continue.

[  135.414273] CPU: 1 PID: 0 Comm: swapper/2 Tainted: GW 
4.19.84-xeno8-static #1
[  135.414275] I-pipe: Detected stalled head domain, probably caused by a bug.
[  135.414275] A critical section may have been left unterminated.
[  135.414287] CPU: 2 PID: 798 Comm: fup.fast Tainted: GW 
4.19.84-xeno8-static #1
[  135.422810] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.21.16 01/31/2019
[  135.436373] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.21.16 01/31/2019
[  135.444984] I-pipe domain: Linux
[  135.454290] I-pipe domain: Linux
[  135.463598] RIP: 0010:rcu_nmi_exit+0x140/0x150
[  135.466825] Call Trace:
[  135.470057] Code: 45 89 f0 4c 89 f9 4c 89 e2 4c 89 ee ff d0 48 8b 03 48 85 
c0 75 e2 48 8b 45 08 4c 8d 78 fe e9 5b ff ff ff 0f 0b e9 ee fe ff ff <0f> 0b e9 
f8
[  135.474513]  dump_stack+0x8c/0xc0
[  135.476950] RSP: 0018:a3513bb03f18 EFLAGS: 00010046
[  135.495720]  ipipe_stall_root+0xc/0x30
[  135.504264]  __ipipe_trap_prologue+0x209/0x210
[  135.508011] RAX: 000573f4 RBX: 00019480 RCX: 001f
[  135.512458]  invalid_op+0x26/0x51
[  135.519592] RDX:  RSI: 50523fbe RDI: 0001
[  135.522914] RIP: 0010:xnthread_suspend+0x3d5/0x4e0
[  135.530050] RBP: a3513ba99480 R08: 0001 R09: 
[  135.534843] Code: 58 12 00 00 4c 89 e7 e8 f9 cf ff ff 41 83 8c 24 c4 11 00 
00 01 e9 92 fd ff ff 0f 0b 48 83 bf 58 12 00 00 00 0f 84 63 fc ff ff <0f> 0b 0f 
0b
[  135.541979] R10: a35139832440 R11: 0424 R12: 
[  135.560746] RSP: 0018:bddd0073fd60 EFLAGS: 00010082
[  135.567878] R13: 0022 R14:  R15: 
[  135.580241] FS:  () GS:a3513ba8() 
knlGS:
[  135.580246] RAX: bddd005fbe30 RBX: 00025090 RCX: 
[  135.588336] CS:  0010 DS:  ES:  CR0: 80050033
[  135.595477] RDX:  RSI: 0002 RDI: bddd005fa240
[  135.601225] CR2: 7f8899c36a10 CR3: 00017b31c000 CR4: 003406e0
[  135.608362] RBP: bddd005fbe08 R08: bddd005fbe08 R09: 
[  135.615500] Call Trace:
[  135.622637] R10:  R11:  R12: bddd005fa240
[  135.625085] ---[ end trace adb8b44963759cc1 ]---
[  135.632220] R13:  R14:  R15: bddd005fbe08
[  135.636851] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:941 
rcu_nmi_enter+0xe4/0xf0
[  135.643982]  xnsynch_sleep_on+0x102/0x260
[  135.651634] Modules linked in:
[  135.655649]  __cobalt_cond_wait_prologue+0x295/0x8c0
[  135.655653]  rt_igb
[  135.658713]  ? __cobalt_cond_wait_prologue+0x8c0/0x8c0
[  135.663677]  plusb
[  135.665781]  CoBaLt_cond_wait_prologue+0x23/0x30
[  135.670918]  usbnet
[  135.672936]  handle_head_syscall+0xe1/0x370
[  135.677555]  mii
[  135.679658]  ipipe_fastcall_hook+0x14/0x20
[  135.685687]  ipipe_handle_syscall+0x4a/0xa0
[  135.689784] CPU: 1 PID: 0 Comm: swapper/2 Tainted: GW 
4.19.84-xeno8-static #1
[  135.693971]  do_syscall_64+0x41/0x3d0
[  135.702495] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.21.16 01/31/2019
[  135.706160]  ? __ipipe_handle_irq+0xb7/0x200
[  135.715464] I-pipe domain: Linux
[  135.719738]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  135.722972] RIP: 0010:rcu_nmi_enter+0xe4/0xf0
[  135.728025] RIP: 0033:0x77f9c134
[  135.732386] Code: 48 85 c0 75 d9 48 8b 6b 08 eb 9a e8 b6 a9 ff ff 48 8b 6b 
08 41 bd 01 00 00 00 4c 8b 35 5d cb 23 01 4c 8d 7d 01 e9 72 ff ff ff <0f> 0b e9 
44
[  135.735966] Code: 8b 73 04 49 89 dc e8 fb ef ff ff 48 89 de 48 8b 5c 24 10 
45 31 c0 b9 23 00 00 10 48 8d 54 24 44 45 31 d2 48 89 df 89 c8 0f 05 <8b> 7c 24 
29
[  135.754730] RSP: 0018:a3513bb03f38 EFLAGS: 00010082
[  135.773496] RSP: 002b:7fffe82dab10 EFLAGS: 0246
[  135.778728]  ORIG_RAX: 1023
[  135.783954] RAX: 00019480 RBX: a3513ba99480 RCX: a3513ba9c008
[  135.787792] RAX: ffda RBX: 74127c78 RCX: 77f9c134
[  135.794927] RDX: a3513ba9c000 RSI: 0001 RDI: 1140
[  135.802065] RDX: 7fffe82dab54 RSI: 74127c48 RDI: 74127c78
[  135.809201] RBP: fffe R08: a3513ba9c228 R09: 0045
[  135.816337] RBP: 7fffe82dac30 R08:  R09: 
[  135.823470] R10: a35139832440 R11: 0424 R12: 9e7af080
[  135.830604] R10:  R11: 0246 R12: 74127c48
[  135.837738] R13: 00045000 R14: 

unable to handle kernel paging request

2019-11-15 Thread Lange Norbert via Xenomai
Hello,

How can I get to the bottom of bugs that lock up the system completely?
I have now hit this error for the third time.

[ 1643.652566] BUG: unable to handle kernel paging request at 00044540
[ 1643.659546] PGD 1750d1067 P4D 1750d1067 PUD 1775e7067 PMD 0
[ 1643.665237] Oops: 0010 [#1] SMP NOPTI
[ 1643.668911] CPU: 2 PID: 862 Comm: fup.medium Tainted: GW 
4.19.75-xeno7-static #1
[ 1643.677703] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.21.16 01/31/2019
[ 1643.687013] I-pipe domain: Linux
[ 1643.690250] RIP: 0086:0x1c000
[ 1643.693231] Code: Bad RIP value.
[ 1643.696468] RSP: 3bb0:0083 EFLAGS: 92103bb25090 
ORIG_RAX: 0046
[ 1643.705092] RAX: 92103bb30400 RBX: 92103bb30400 RCX: 92103bb31bb0
[ 1643.712235] RDX: 92103bb31d58 RSI: 8ab85272 RDI: 921039f24cc0
[ 1643.719378] RBP: 92103bb305c8 R08: 0002 R09: 92103bb31d58
[ 1643.726517] R10: 92103bb31bb0 R11: 4000 R12: 8bc3f3c0
[ 1643.733657] R13: 8ab8284b R14: 0001c000 R15: 0086
[ 1643.740799] FS:  7fffe81da700 GS:  
[ 1643.746033] Modules linked in: rt_igb plusb usbnet mii
[ 1643.751210] CR2: 00044540
[ 1643.754542] ---[ end trace 7dda9c557e28b024 ]---
[ 1643.759167] RIP: 0086:0x1c000
[ 1643.762146] Code: Bad RIP value.
[ 1643.765379] RSP: 3bb0:0083 EFLAGS: 92103bb25090 
ORIG_RAX: 0046
[ 1643.774003] RAX: 92103bb30400 RBX: 92103bb30400 RCX: 92103bb31bb0
[ 1643.781147] RDX: 92103bb31d58 RSI: 8ab85272 RDI: 921039f24cc0
[ 1643.788289] RBP: 92103bb305c8 R08: 0002 R09: 92103bb31d58
[ 1643.795427] R10: 92103bb31bb0 R11: 4000 R12: 8bc3f3c0
[ 1643.802569] R13: 8ab8284b R14: 0001c000 R15: 0086
[ 1643.809712] FS:  7fffe81da700() GS:92103bb0() 
knlGS:
[ 1643.817806] CS:  0010 DS:  ES:  CR0: 80050033
[ 1643.823558] CR2: 0001bfd6 CR3: 0001750e4000 CR4: 003406e0

Mit besten Grüßen / Kind regards

NORBERT LANGE

AT-RD3

ANDRITZ HYDRO GmbH
Eibesbrunnergasse 20
1120 Vienna / AUSTRIA
p: +43 50805 56684
norbert.la...@andritz.com
andritz.com






RE: RTnet sendmmsg and ENOBUFS

2019-11-15 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Freitag, 15. November 2019 14:36
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: RTnet sendmmsg and ENOBUFS
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 15.11.19 13:37, Lange Norbert wrote:
> >
> >
> >>
> >>>
> 
> > I suppose the receive path works similarly.
> >
> 
>  RX works by accepting a global-pool buffer (this is where incoming
>  packets first end up in) filled with data in exchange to an empty
>  rtskb from the
> >> socket
>  pool. That filled rtskb is put into the socket pool once the data
>  was transferred to userspace.
> >>>
> >>> I suppose all pools can exchange rtskb, so this is just a matter of
> >>> which pool
> >> size is limiting.
> >>> If I want to recvmmsg 100 messages, will I get at most 16 (socket
> >>> pool size), or will a single slot be used and exchanged with the drivers?
> >>
> >> One packet, one rtskb. So you have both the device and the socket
> >> pool as limiting factors.
> >
> > I guess this is different to the sendpath as the device "pushes up"
> > the rtskb's,
>
> Actually, commit 91b3302284fd "aligned" TX to RX path. But the documents
> still state something else. And I doubt now that this commit was going in the
> right direction.
>
> In fact, it introduced a way for competing transmitters to starve each other
> by exhausting the now shared device pool for TX.
>
> > And the recvmmsg call then picks up the packets that are available?
> > (having some trouble following the path in the kernel sources)
>
> The point of the ownership transfer on RX is that, when receiving only with
> one queue (RTnet can't handle more in fact, though it should by now...), we
> first need to parse the packet content in order to dispatch it. That makes the
> packet first owned by the device (formerly the RX pool), and once the actual
> owner is known, it is transferred - or dropped if the recipient has no free
> buffer.

Right, but if I open a packet socket with ETH_P_802_EX1, rt_stack_deliver will
try to match the incoming packets, so somewhere the Ethernet packet needs to be
dissected; is that rt_eth_type_trans?
That means the valid values for a packet socket would be ETH_P_802_2, ETH_P_802_3,
and the protocols >= 0x600 from 802.3.
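
For reference, this is the kind of per-type subscription I have in mind, as a
minimal sketch against the plain AF_PACKET API (the EtherType and interface
index are placeholders, and I am assuming RTnet's packet sockets take the usual
sockaddr_ll binding):

#include <netinet/in.h>        /* htons() */
#include <netpacket/packet.h>  /* struct sockaddr_ll */
#include <sys/socket.h>
#include <unistd.h>

/* Open a packet socket that only receives frames of the given EtherType
 * on the given interface index. */
static int open_packet_socket(unsigned short proto, int ifindex)
{
        int fd = socket(AF_PACKET, SOCK_DGRAM, htons(proto));
        if (fd < 0)
                return -1;

        struct sockaddr_ll sll = {
                .sll_family   = AF_PACKET,
                .sll_protocol = htons(proto),
                .sll_ifindex  = ifindex,
        };
        if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
                close(fd);
                return -1;
        }
        return fd;
}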

>
> >
> > We have some legacy application that ran on an imx28, and we have to
> > keep the protocols for now, They are almost exclusively UDP/IP4 but
> there's one message for distributing timestamps with a 802.2 SNAP packet.
> > (had some technical/implementation reasons, cant remember now).
> > Currently I use an ETH_P_ALL packet socket for that reason, but looking at
> the code it seems beneficial to drop that.
> >
> > -   there seems to be copies of rtskb made
> > -   the packets are not consumed, but kicked up further in the stack
>
> ETH_P_ALL is a kind of snooping mode, therefore the copies. You rather
> want to register on the desired (non-IP) type(s).

See above; it seems the only usable types are ETH_P_802_2 and ETH_P_802_3?

> > What I would want is a socket that simply drains everything, packets
> > in use are ETH_P_IP, ETH_P_ARP and whatever is necessary to
> send/receive those 802.2 SNAP packets.
>
> IP and ARP will be handled by RTnet, the other types is what you need
> packet sockets for.

I use neither IP nor ARP from RTnet, and I don't have a choice right now.
Our time sync is via an external wire, and I have to burst out everything at that
point.
This is not a cleanroom design.

Further, a TFTP server is hooked into the RT connection via a TUN device,
so packets not meant for the RT application (a UDP port range) go to that device.

> > I see ETH_P_802_EX1 used for such packets in examples, but I don’t see
> how your stack identifies such packets?
>
> rt_packet_bind -> rtdev_add_pack. Dispatching is done in rt_stack_deliver,
> in rt-kernel thread context.

Yes, and I fail to see anything supported outside ETH_P_802_2 and ETH_P_802_3
and the protocols from 802.3.
Nowhere will an Ethernet packet be dissected and classified as ETH_P_802_EX1 or
ETH_P_SNAP, so what packets would end up in such a socket?

> > -   I don’t know which ETH_P type to use, theres ETH_P_802_2,
> ETH_P_802_EX1, ETH_P_SNAP potentially others.
> > -   Those constants seems to be largely missing in xenomais sources, so I
> don’t know how you would test for a match.
>
> You can register any value you like with AF_PACKET, just like in Linux.
> Matching is not done by a switch-case, rather via a hashed list via the 
> values.

That's for faster lookup; the values are still compared afterwards.

>
> > -   Easiest for me would be a ETH_P_ALL_CONSUME type, buffers just end
> there when such a socket is open (feasible, would be accepted upstream?)
>
> I think you are better of with target subscriptions. Then all unknown packets
> will simply be dropped, and that much earlier.

Unknown packets need to be forwarded to the TUN device.
(To put it another way, I had a deep

RE: [PATCH] rtdm: Do not return an error from send/recvmmsg if there are packets

2019-11-15 Thread Lange Norbert via Xenomai
Hello,

Just for consideration: if you can pass both the error value and the number of
successfully sent packets out of the kernel function, perhaps you could return
the count (if > 0) and still set errno in case of a (real) error?

It would be somewhat different from Linux, but that would not be the only
difference (for instance, there is no way to block until all messages are
sent/received).
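
To illustrate from the caller's side, this is just a sketch of the suggested
semantics (not what the current code does); fd, msgvec and vlen stand in for
whatever the application prepared:

#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

/* Suggested semantics: sendmmsg() returns the number of datagrams actually
 * queued, and errno additionally tells why the batch was cut short. */
static int send_batch(int fd, struct mmsghdr *msgvec, unsigned int vlen)
{
        int sent = sendmmsg(fd, msgvec, vlen, MSG_DONTWAIT);

        if (sent < 0)
                return -errno;          /* nothing was queued at all */

        if ((unsigned int)sent < vlen)  /* partial success, errno says why */
                fprintf(stderr, "queued %d/%u datagrams, errno=%d\n",
                        sent, vlen, errno);

        return sent;
}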

Norbert

> -Original Message-
> From: Xenomai  On Behalf Of Jan Kiszka
> via Xenomai
> Sent: Freitag, 15. November 2019 10:43
> To: Philippe Gerum ; Xenomai
> 
> Subject: Re: [PATCH] rtdm: Do not return an error from send/recvmmsg if
> there are packets
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 15.11.19 10:39, Philippe Gerum wrote:
> > On 11/15/19 10:37 AM, Philippe Gerum via Xenomai wrote:
> >> On 11/14/19 7:59 PM, Jan Kiszka wrote:
> >>> From: Jan Kiszka 
> >>>
> >>> This is in line with Linux behavior.
> >>>
> >>> We likely still miss an equivalent to sk_err in recvmmsg, though.
> >>
> >> Ack. Tracing on early exit due to null vlen in sendmmsg() is still missing;
> having this might help in debugging issue(s) at call site.
> >>
> >
> > Likewise in recvmmsg().
> >
>
> Yes, I'll simply remove the "vlan == 0" optimization.
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate
> Competence Center Embedded Linux






RE: RTnet sendmmsg and ENOBUFS

2019-11-15 Thread Lange Norbert via Xenomai


>
> >
> >>
> >>> I suppose the receive path works similarly.
> >>>
> >>
> >> RX works by accepting a global-pool buffer (this is where incoming packets
> >> first end up in) filled with data in exchange to an empty rtskb from the
> socket
> >> pool. That filled rtskb is put into the socket pool once the data was
> >> transferred to userspace.
> >
> > I suppose all pools can exchange rtskb, so this is just a matter of which 
> > pool
> size is limiting.
> > If I want to recvmmsg 100 messages, will I get at most 16 (socket pool 
> > size),
> > or will a single slot be used and exchanged with the drivers?
>
> One packet, one rtskb. So you have both the device and the socket pool
> as limiting factors.

I guess this is different from the send path, as the device "pushes up" the rtskbs,
and the recvmmsg call then picks up the packets that are available?
(I am having some trouble following the path in the kernel sources.)

We have a legacy application that ran on an imx28, and we have to keep the
protocols for now.
They are almost exclusively UDP/IPv4, but there's one message for distributing
timestamps with an 802.2 SNAP packet
(there were some technical/implementation reasons, I can't remember them now).
Currently I use an ETH_P_ALL packet socket for that reason, but looking at the
code it seems beneficial to drop that:

-   there seem to be copies of the rtskb made
-   the packets are not consumed, but kicked up further in the stack

What I would want is a socket that simply drains everything; the packets in use
are ETH_P_IP, ETH_P_ARP and whatever is necessary to send/receive those 802.2
SNAP packets.

I see ETH_P_802_EX1 used for such packets in examples, but I don't see how your
stack identifies such packets?

-   I don't know which ETH_P type to use; there's ETH_P_802_2, ETH_P_802_EX1,
ETH_P_SNAP and potentially others.
-   Those constants seem to be largely missing in Xenomai's sources, so I don't
know how you would test for a match.
-   Easiest for me would be an ETH_P_ALL_CONSUME type: buffers just end up there
when such a socket is open (feasible? would it be accepted upstream?)

Regards, Norbert





RE: RTnet sendmmsg and ENOBUFS

2019-11-15 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Donnerstag, 14. November 2019 19:18
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Cc: Philippe Gerum (r...@xenomai.org) 
> Subject: Re: RTnet sendmmsg and ENOBUFS
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 14.11.19 18:55, Lange Norbert wrote:
> > So, for my setup socket_rtskbs is 16, the rt_igp driver rtskbs are 256TX +
> 256RX.
> >
> > As said, our software prepares packets before a timeslice, and would
> > aim to minimize systemcalls and interrupts, packets are sent over raw
> rtsockets.
> >
> > if  understand __rtdm_fd_sendmmsg and rt_packet_sendmsg correctly,
> > sendmsg will pick one socket_rtskbs, copies data from userspace and
> > then passes this rtskbs to rtdev_xmit.
> > I don’t see how a free buffers gets passed back, like README.pools
> > describes, I guess rtskb_acquire should somehow do this.
>
> The buffer returns (not necessarily the same one, though) when the packet
> was truly sent, and the driver ran its TX cleanup. If you submit many packets
> as a chunk, that may block them for a while.
>
> >
> > So in short, I am using only one socket_rtskbs temporarily, as the
> > function passes the buffer to the rtdev (rt_igp driver)?
>
> You are using as many rtskbs as it takes to get the data you passed down
> forwarded as packets to the NIC, and that as long as the NIC needs to get
> that data DMA'ed to the transmitter.

I was talking about the pools. The socket pool has 16 rtskbs, the device pool
has 512.
As I understand it, __rtdm_fd_sendmmsg picks one rtskb from the socket pool,
then exchanges this buffer with a free one from the device pool
(rtskb_acquire?).
So sendmmsg requires a single "slot" from the socket pool, then gets that "slot" back
when passing the rtskb down to the device.

In other words, I could successfully sendmmsg 100 messages as long as there is
one free slot in the socket pool and the device pool has enough free slots.

>
> > I suppose the receive path works similarly.
> >
>
> RX works by accepting a global-pool buffer (this is where incoming packets
> first end up in) filled with data in exchange to an empty rtskb from the 
> socket
> pool. That filled rtskb is put into the socket pool once the data was
> transferred to userspace.

I suppose all pools can exchange rtskbs, so this is just a matter of which pool
size is limiting.
If I want to recvmmsg 100 messages, will I get at most 16 (the socket pool size),
or will a single slot be used and exchanged with the driver's pool?

>
> >
> > Now if I would want to send nonblocking, ie. as much packets as are
> > possible, exhausting the rtskbs then I would expect the
> > EAGAIN/EWOULDBLOCK error and getting back the number of successfully
> queued packets (so I could  drop them and send the remaining later).
>
> I don't recall why anymore, but we decided to use a different error code in
> RTnet for this back then, possibly to differentiate this "should never ever
> happen in a deterministic network" from other errors.

Yes, I guess that makes sense in a lot of use cases. Mine is a bit different:
I use a service that just tunnels a TUN device to an RT packet socket, and once someone
connects to an IDDP socket for RT traffic, timeslices are used.

So the network is not always in "deterministic mode".

>
> >
> > According to the code in __rtdm_fd_sendmmsg, that’s not what happens,
> > ENOBUFS would be returned instead, And the amount of sent packets is
> lost forever.
> >
> > if (datagrams > 0 && (ret == 0 || ret == -EWOULDBLOCK)) {
> > /* NOTE: SO_ERROR should be honored for other errors. */
> > rtdm_fd_put(fd); return datagrams; }
> >
> > IMHO this condition would need to added:
> > ((flags & MSG_DONTWAIT) && ret == -ENOBUFS)
> >
> > (Recvmmsg possibly similarly, havent checked yet)
>
> sendmmsg was only added to Xenomai 3.1. There might be room for
> improvements, if not corrections. So, if we do not return the number of sent
> messages or signal an error where we should not (this is how I read the man
> page currently), this needs a patch...

Yes, it seems you either need to drop the number of transmitted messages (unlike the
Linux call)
or the error condition.
If you can pass both out of the kernel function, perhaps you could still set
errno in case of a (real) error?
(I really don't need it, but it's worth considering.)

Thanks,
Norbert



RE: RTnet sendmmsg and ENOBUFS

2019-11-14 Thread Lange Norbert via Xenomai
So, for my setup socket_rtskbs is 16, the rt_igp driver rtskbs are 256TX + 
256RX.

As said, our software prepares packets before a timeslice and aims to
minimize system calls and interrupts;
packets are sent over raw RT sockets.

If I understand __rtdm_fd_sendmmsg and rt_packet_sendmsg correctly,
sendmsg will pick one rtskb from the socket pool, copy data from userspace and
then pass this rtskb to rtdev_xmit.
I don't see how a free buffer gets passed back, like README.pools describes;
I guess rtskb_acquire should somehow do this.

So in short, am I using only one socket rtskb temporarily, as the function
passes the buffer on to the rtdev (rt_igp driver)?
I suppose the receive path works similarly.


Now if I want to send nonblocking, i.e. as many packets as possible,
exhausting the rtskbs, then I would expect the EAGAIN/EWOULDBLOCK error and to get
back the number of successfully queued packets (so I could drop them and send
the remaining ones later).

According to the code in __rtdm_fd_sendmmsg, that's not what happens; ENOBUFS
is returned instead,
and the number of sent packets is lost forever.

	if (datagrams > 0 && (ret == 0 || ret == -EWOULDBLOCK)) {
		/* NOTE: SO_ERROR should be honored for other errors. */
		rtdm_fd_put(fd);
		return datagrams;
	}

IMHO this condition would need to be added:
((flags & MSG_DONTWAIT) && ret == -ENOBUFS)

(recvmmsg possibly needs the same treatment, I haven't checked yet)

Thanks for the help,
Norbert

> -Original Message-
> From: Xenomai  On Behalf Of Lange
> Norbert via Xenomai
> Sent: Mittwoch, 13. November 2019 18:53
> To: Jan Kiszka ; Xenomai
> (xenomai@xenomai.org) 
> Subject: RE: RTnet sendmmsg and ENOBUFS
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> > -Original Message-
> > From: Jan Kiszka 
> > Sent: Mittwoch, 13. November 2019 18:39
> > To: Lange Norbert ; Xenomai
> > (xenomai@xenomai.org) 
> > Subject: Re: RTnet sendmmsg and ENOBUFS
> >
> > NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
> >
> >
> > On 13.11.19 16:10, Lange Norbert via Xenomai wrote:
> > > Hello,
> > >
> > > for one of our applications, we have (unfortunatly) a single ethernet
> > connection for Realtime and Nonrealtime.
> > >
> > > We solve this by sending timeslices with RT first, then filling up the
> > > remaining space. When stressing the limits (quite possibly beyond if
> > accounting for bugs), the sendmmsg call over a raw socket returns
> ENOBUFS
> > (even with a single small packet).
> > > I was expecting this call to just block until the resouces are available.
> >
> > Blocking would mean that the sites which make buffers available again had
> to
> > signal this. The original design idea was to avoid such overhead and rather
> > rely on the applications to schedule their submissions properly and
> > preallocate resources accordingly.
>
> Ok.
> In other words, this is the same behaviour as using MSG_DONTWAIT
> (with a different errno value)
>
> >
> > >
> > > Timeslices are 1 ms, so that could be around 12Kbyte total or ~190 60Byte
> > packets (theoretical max).
> > >
> > > What variables are involved (whats the xenomai buffer limits, are they
> > shared or per interface) and choices do I have?
> > >
> > > - I could send the packages nonblocking and wait or drop the remaining
> > > myself
> > > - I could deal with ENOBUFS the same way as EAGAIN (is there any
> > > difference actually)
> > > - I could raise the amount of internal buffer somehow
> >
> > Check kernel/drivers/net/doc/README.pools
> >
> > >
> > > Also while stresstesting I get these messages:
> > >
> > > [ 5572.044934] hard_start_xmit returned 16 [ 5572.054989]
> > > hard_start_xmit returned 16 [ 5572.064007] hard_start_xmit returned 16
> > > [ 5572.067893] hard_start_xmit returned 16 [ 5572.071739]
> > > hard_start_xmit returned 16 [ 5572.075586] hard_start_xmit returned 16
> > > [ 5575.096116] hard_start_xmit returned 16 [ 5579.377038]
> > > hard_start_xmit returned 16
> >
> > This likely comes from NETDEV_TX_BUSY signaled by the driver. Check the
> > one you use for reasons. May include "I don't have buffers left".
>
> Yes it does, I was afraid this would indicate some leaked buffers.
>
> Norbert
> 
>

RE: Xenomai crashes when braking into the debugger

2019-11-13 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Mittwoch, 13. November 2019 18:42
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: Xenomai crashes when braking into the debugger
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 13.11.19 16:18, Lange Norbert via Xenomai wrote:
> > Hello,
> >
> > I am running into some bad issues with debugging, can't really narrow
> > down when they happen, but usually when I run through GDB and want to
> > "break" (pause execution), it seems to be related to *other* Xenomai
> programs running at the same time (as said its hard to narrow down).
>
> We have a gdb test case. Does it trigger for you as well when you run some
> other program in parallel?
>
> Also, could you provide the kernel full log? Possibly, enabling the I-pipe
> tracer with panic dump could be useful as well. But the most important step
> would be to create reproducibility for a third party like me.

Currently the issue is gone, and I don't have time to research the cause.
Is the panic dump a kernel configuration option?

Norbert





RE: RTnet sendmmsg and ENOBUFS

2019-11-13 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Mittwoch, 13. November 2019 18:39
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: RTnet sendmmsg and ENOBUFS
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 13.11.19 16:10, Lange Norbert via Xenomai wrote:
> > Hello,
> >
> > for one of our applications, we have (unfortunatly) a single ethernet
> connection for Realtime and Nonrealtime.
> >
> > We solve this by sending timeslices with RT first, then filling up the
> > remaining space. When stressing the limits (quite possibly beyond if
> accounting for bugs), the sendmmsg call over a raw socket returns ENOBUFS
> (even with a single small packet).
> > I was expecting this call to just block until the resouces are available.
>
> Blocking would mean that the sites which make buffers available again had to
> signal this. The original design idea was to avoid such overhead and rather
> rely on the applications to schedule their submissions properly and
> preallocate resources accordingly.

Ok.
In other words, this is the same behaviour as using MSG_DONTWAIT
(with a different errno value)

>
> >
> > Timeslices are 1 ms, so that could be around 12Kbyte total or ~190 60Byte
> packets (theoretical max).
> >
> > What variables are involved (whats the xenomai buffer limits, are they
> shared or per interface) and choices do I have?
> >
> > - I could send the packages nonblocking and wait or drop the remaining
> > myself
> > - I could deal with ENOBUFS the same way as EAGAIN (is there any
> > difference actually)
> > - I could raise the amount of internal buffer somehow
>
> Check kernel/drivers/net/doc/README.pools
>
> >
> > Also while stresstesting I get these messages:
> >
> > [ 5572.044934] hard_start_xmit returned 16 [ 5572.054989]
> > hard_start_xmit returned 16 [ 5572.064007] hard_start_xmit returned 16
> > [ 5572.067893] hard_start_xmit returned 16 [ 5572.071739]
> > hard_start_xmit returned 16 [ 5572.075586] hard_start_xmit returned 16
> > [ 5575.096116] hard_start_xmit returned 16 [ 5579.377038]
> > hard_start_xmit returned 16
>
> This likely comes from NETDEV_TX_BUSY signaled by the driver. Check the
> one you use for reasons. May include "I don't have buffers left".

Yes it does, I was afraid this would indicate some leaked buffers.

Norbert





Xenomai crashes when braking into the debugger

2019-11-13 Thread Lange Norbert via Xenomai
Hello,

I am running into some bad issues with debugging,
can't really narrow down when they happen, but usually when I run through GDB 
and want to "break" (pause execution),
it seems to be related to *other* Xenomai programs running at the same time (as 
said its hard to narrow down).

Kind regards, Norbert Lange

[10352.719588] I-pipe: Detected stalled head domain, probably caused by a bug.
[10352.719588] A critical section may have been left unterminated.
[10352.733165] CPU: 2 PID: 12883 Comm: aboard_runner Tainted: GW
 4.19.75-xeno7-static #1
[10352.742389] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.21.16 01/31/2019
[10352.751702] I-pipe domain: Linux
[10352.754938] Call Trace:
[10352.757406]  dump_stack+0x82/0xb0
[10352.760735]  ipipe_stall_root+0xc/0x30
[10352.764497]  __ipipe_trap_prologue+0x209/0x210
[10352.768955]  page_fault+0x24/0x5b
[10352.772281] RIP: 0010:xnthread_suspend+0x13/0x4e0
[10352.776992] Code: f8 c3 a5 e8 1f ce f3 ff e9 e4 fe ff ff 66 2e 0f 1f 84 00 
00 00 00 00 e8 bb 0b 87 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 20  c
[10352.795762] RSP: :a45140797e10 EFLAGS: 00010082 ORIG_RAX: 

[10352.803343] RAX:  RBX: eab8 RCX: 
[10352.810485] RDX:  RSI: 0040 RDI: eaf8
[10352.817626] RBP: a4514060ab80 R08:  R09: 
[10352.824770] R10:  R11:  R12: 94477bb30780
[10352.831910] R13:  R14:  R15: 94477bb30400
[10352.839058]  ? __cobalt_clock_nanosleep+0x4b0/0x4b0
[10352.843946]  ? CoBaLt_clock_nanosleep+0x7f/0x100
[10352.848571]  stop_debugged_process+0x51/0x70
[10352.852850]  ipipe_trap_hook+0x2da/0x3f0
[10352.856785]  __ipipe_notify_trap+0x80/0xc0
[10352.860892]  __ipipe_trap_prologue+0x76/0x210
[10352.865259]  ? int3+0x29/0x70
[10352.868236]  int3+0x45/0x70
[10352.871040] RIP: 0033:0x409fb8
[10352.874102] Code: ff 15 34 47 95 00 c7 05 fe e2 99 00 00 00 00 00 48 8d 95 
78 fd ff ff 48 8d 85 70 fe ff ff 48 89 d6 48 89 c7 e8 08 14 00 00 cc  6
[10352.892874] RSP: 002b:7fffe890 EFLAGS: 0297
[10352.898111] RAX:  RBX:  RCX: 0002
[10352.905249] RDX:  RSI:  RDI: 
[10352.912389] RBP: 7fffebb0 R08: 0001 R09: 0015
[10352.919531] R10: 7fffe820 R11: 0246 R12: 00409170
[10352.926671] R13: 7fffec90 R14:  R15: 
[10352.933813]  ? int3+0x29/0x70
[10352.936872] BUG: Unhandled exception over domain Xenomai at 
0xa4b90f23 - switching to ROOT
[10352.945841] CPU: 2 PID: 12883 Comm: aboard_runner Tainted: GW
 4.19.75-xeno7-static #1
[10352.955065] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.21.16 01/31/2019
[10352.964374] I-pipe domain: Linux
[10352.967611] Call Trace:
[10352.970070]  dump_stack+0x82/0xb0
[10352.973393]  __ipipe_trap_prologue.cold+0x22/0x4e
[10352.978108]  page_fault+0x24/0x5b
[10352.981435] RIP: 0010:xnthread_suspend+0x13/0x4e0
[10352.986144] Code: f8 c3 a5 e8 1f ce f3 ff e9 e4 fe ff ff 66 2e 0f 1f 84 00 
00 00 00 00 e8 bb 0b 87 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 20  c
[10353.004916] RSP: :a45140797e10 EFLAGS: 00010082 ORIG_RAX: 

[10353.012497] RAX:  RBX: eab8 RCX: 
[10353.019637] RDX:  RSI: 0040 RDI: eaf8
[10353.026777] RBP: a4514060ab80 R08:  R09: 
[10353.033918] R10:  R11:  R12: 94477bb30780
[10353.041060] R13:  R14:  R15: 94477bb30400
[10353.048204]  ? __cobalt_clock_nanosleep+0x4b0/0x4b0
[10353.053087]  ? CoBaLt_clock_nanosleep+0x7f/0x100
[10353.057713]  stop_debugged_process+0x51/0x70
[10353.061991]  ipipe_trap_hook+0x2da/0x3f0
[10353.065921]  __ipipe_notify_trap+0x80/0xc0
[10353.070029]  __ipipe_trap_prologue+0x76/0x210
[10353.074393]  ? int3+0x29/0x70
[10353.077369]  int3+0x45/0x70
[10353.080171] RIP: 0033:0x409fb8
[10353.083232] Code: ff 15 34 47 95 00 c7 05 fe e2 99 00 00 00 00 00 48 8d 95 
78 fd ff ff 48 8d 85 70 fe ff ff 48 89 d6 48 89 c7 e8 08 14 00 00 cc  6
[10353.102002] RSP: 002b:7fffe890 EFLAGS: 0297
[10353.107238] RAX:  RBX:  RCX: 0002
[10353.114380] RDX:  RSI:  RDI: 
[10353.121522] RBP: 7fffebb0 R08: 0001 R09: 0015
[10353.128663] R10: 7fffe820 R11: 0246 R12: 00409170
[10353.135803] R13: 7fffec90 R14:  R15: 
[10353.142947]  ? int3+0x29/0x70
[10353.145931] BUG: unable to handle kernel paging request at fcba
[10353.152900] PGD 12600c067

RTnet sendmmsg and ENOBUFS

2019-11-13 Thread Lange Norbert via Xenomai
Hello,

For one of our applications, we have (unfortunately) a single Ethernet
connection for both real-time and non-real-time traffic.

We solve this by sending timeslices with RT traffic first, then filling up the
remaining space. When stressing the limits (quite possibly beyond them, if accounting
for bugs),
the sendmmsg call over a raw socket returns ENOBUFS (even with a single small
packet).
I was expecting this call to just block until the resources are available.

Timeslices are 1 ms, so that could be around 12 KByte total or ~190 60-byte
packets (theoretical max).

What variables are involved (what are the Xenomai buffer limits, are they shared
or per interface), and what choices do I have?

- I could send the packets nonblocking and wait or drop the remaining ones myself
- I could deal with ENOBUFS the same way as EAGAIN (is there any difference,
actually?); a rough sketch of that approach follows below
- I could raise the amount of internal buffers somehow
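
A minimal sketch of the second option, assuming a raw RTnet socket fd and a
prepared msgvec[] (the names are placeholders, and whether treating ENOBUFS like
EAGAIN is legitimate is exactly my question):

#define _GNU_SOURCE
#include <errno.h>
#include <sys/socket.h>

/* Send as much of the batch as the pools allow; anything not queued is
 * kept (or dropped) by the caller for the next timeslice. */
static int send_slice(int fd, struct mmsghdr *msgvec, unsigned int vlen)
{
        int sent = sendmmsg(fd, msgvec, vlen, MSG_DONTWAIT);

        if (sent >= 0)
                return sent;            /* datagrams actually queued */

        if (errno == EAGAIN || errno == EWOULDBLOCK || errno == ENOBUFS)
                return 0;               /* pools exhausted, retry or drop later */

        return -errno;                  /* a real error */
}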

Also, while stress testing I get these messages:

[ 5572.044934] hard_start_xmit returned 16
[ 5572.054989] hard_start_xmit returned 16
[ 5572.064007] hard_start_xmit returned 16
[ 5572.067893] hard_start_xmit returned 16
[ 5572.071739] hard_start_xmit returned 16
[ 5572.075586] hard_start_xmit returned 16
[ 5575.096116] hard_start_xmit returned 16
[ 5579.377038] hard_start_xmit returned 16

Kind regards, Norbert





RE: binding to iddp socket often blocks forever

2019-10-29 Thread Lange Norbert via Xenomai
Well, it seems like my boot args are related.
I tried to bind Linux to core #0 and leave the remaining cores for Xenomai,
using "isolcpus=1-3 xenomai.supported_cpus=0x6".

I get this warning when the deadlock happens:
[  106.882123] [Xenomai] thread new_project[798] switched to non-rt CPU0, 
aborted.

I am not clear on what happened; this is likely my non-RT thread suddenly running
on the non-Xenomai CPU, but I don't know how it happened.
This is not the first Xenomai/Cobalt function I am calling, and I don't understand
why it always fails at exactly that point.
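
A small diagnostic like this, dropped right before the bind() call, would show
where the thread actually runs and what its affinity mask looks like (just a
sketch using plain Linux calls, so only meaningful while the thread is still
relaxed):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

static void dump_placement(const char *tag)
{
        cpu_set_t cpus;

        CPU_ZERO(&cpus);
        sched_getaffinity(0, sizeof(cpus), &cpus);   /* 0 = calling thread */

        printf("%s: running on CPU %d, affinity:", tag, sched_getcpu());
        for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
                if (CPU_ISSET(cpu, &cpus))
                        printf(" %d", cpu);
        printf("\n");
}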

> -Original Message-
> From: Xenomai  On Behalf Of Lange
> Norbert via Xenomai
> Sent: Dienstag, 29. Oktober 2019 17:32
> To: Xenomai (xenomai@xenomai.org) 
> Subject: binding to iddp socket often blocks forever
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> Hello,
>
> I have some problems with Xenomai 3.1rc2, the following function will often
> block after printing "b".
> This does not happen always and never happens when I run the program
> through gdb, and I can't attach with gdb if its blocked.
> Using another port then -1 makes no difference, and there is a single XDDP
> connection active at the time (on port #0), but this should be the first and
> only IDDP socket open.
>
> static int s_bindRtIpcEndpoint(int protocol, czstring pLabel, const int 
> *pArgs,
> int *pErrno) {
> int ret;
> int __attribute__((cleanup(auto_closesocket_rt))) fd =
> __RT(socket)(AF_RTIPC, SOCK_DGRAM, protocol);
> TEST_LOG_ERRNO_ERROR_M_SCOPE(fd < 0, *pErrno, "socket", -1);
> printf("a\n");
> if (protocol == IPCPROTO_BUFP) {
> size_t bufsz = 32768; /* bytes */
> if (pArgs)
> bufsz = *pArgs;
> ret = setsockopt(fd, SOL_BUFP, BUFP_BUFSZ, &bufsz, sizeof(bufsz));
> TEST_LOG_ERRNO_ERROR_M_SCOPE(ret != 0, *pErrno, "setsockopt", -
> 1);
> }
>
> if (pLabel)
> setLabel(fd, protocol, pLabel);
>
> printf("b\n");
> // bind first endpoint
> struct sockaddr_ipc saddr = {AF_RTIPC, -1};
>
> ret = __RT(bind)(fd, (struct sockaddr *)&saddr, sizeof(saddr));
> printf("c\n");
> TEST_LOG_ERRNO_ERROR_M_SCOPE(ret != 0, *pErrno, "bind", -1);
> ret = fd;
> fd = -1; // NOLINT(clang-analyzer-deadcode.DeadStores)
> return ret;
> }
>
> The function is called like this:
>
> int tmp;
> s_bindRtIpcEndpoint(IPCPROTO_IDDP, NULL, NULL, &tmp);
>
> I really don't know how to tackle this, any hints?
>
> Mit besten Grüßen / Kind regards
>
> NORBERT LANGE
>
> AT-RD3
>
> ANDRITZ HYDRO GmbH
> Eibesbrunnergasse 20
> 1120 Vienna / AUSTRIA
> p: +43 50805 56684
> norbert.la...@andritz.com
> andritz.com
>
> 
>







binding to iddp socket often blocks forever

2019-10-29 Thread Lange Norbert via Xenomai
Hello,

I have some problems with Xenomai 3.1-rc2: the following function will often
block after printing "b".
This does not always happen, and it never happens when I run the program through
GDB; I also can't attach
with GDB once it is blocked.
Using another port than -1 makes no difference, and there is a single XDDP
connection active at the time
(on port #0), but this should be the first and only IDDP socket open.

static int s_bindRtIpcEndpoint(int protocol, czstring pLabel, const int *pArgs, int *pErrno)
{
    int ret;
    int __attribute__((cleanup(auto_closesocket_rt))) fd =
        __RT(socket)(AF_RTIPC, SOCK_DGRAM, protocol);
    TEST_LOG_ERRNO_ERROR_M_SCOPE(fd < 0, *pErrno, "socket", -1);
    printf("a\n");
    if (protocol == IPCPROTO_BUFP) {
        size_t bufsz = 32768; /* bytes */
        if (pArgs)
            bufsz = *pArgs;
        ret = setsockopt(fd, SOL_BUFP, BUFP_BUFSZ, &bufsz, sizeof(bufsz));
        TEST_LOG_ERRNO_ERROR_M_SCOPE(ret != 0, *pErrno, "setsockopt", -1);
    }

    if (pLabel)
        setLabel(fd, protocol, pLabel);

    printf("b\n");
    // bind first endpoint
    struct sockaddr_ipc saddr = {AF_RTIPC, -1};

    ret = __RT(bind)(fd, (struct sockaddr *)&saddr, sizeof(saddr));
    printf("c\n");
    TEST_LOG_ERRNO_ERROR_M_SCOPE(ret != 0, *pErrno, "bind", -1);
    ret = fd;
    fd = -1; // NOLINT(clang-analyzer-deadcode.DeadStores)
    return ret;
}

The function is called like this:

int tmp;
s_bindRtIpcEndpoint(IPCPROTO_IDDP, NULL, NULL, &tmp);

I really don't know how to tackle this, any hints?

Mit besten Grüßen / Kind regards

NORBERT LANGE

AT-RD3

ANDRITZ HYDRO GmbH
Eibesbrunnergasse 20
1120 Vienna / AUSTRIA
p: +43 50805 56684
norbert.la...@andritz.com
andritz.com







RE: Process load balancing among CPU

2019-10-24 Thread Lange Norbert via Xenomai
Nothing has an effect, as the Cobalt setup routine binds the main thread to a
single core.
See: https://xenomai.org/pipermail/xenomai/2019-August/041381.html

Every thread you spawn will inherit the same affinity mask (with a single bit
set), and all options (tunables, command line) only affect the mask *before* the
setup then picks the current core as the mask.
You will need to reset the affinity mask yourself.
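
For example, something along these lines after Cobalt's init (just a sketch,
assuming you want the calling non-RT thread to be eligible for all online CPUs
again; adjust the CPU set to your partitioning):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

static int reset_affinity(void)
{
        cpu_set_t cpus;
        long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

        CPU_ZERO(&cpus);
        for (long cpu = 0; cpu < ncpus && cpu < CPU_SETSIZE; cpu++)
                CPU_SET(cpu, &cpus);

        /* plain libc call; this thread is expected to stay non-RT */
        return pthread_setaffinity_np(pthread_self(), sizeof(cpus), &cpus);
}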

Kind regards, Norbert

> -Original Message-
> From: Xenomai  On Behalf Of Stéphane
> Ancelot via Xenomai
> Sent: Donnerstag, 24. Oktober 2019 16:01
> To: Jan Kiszka ; xenomai@xenomai.org
> Subject: Re: Process load balancing among CPU
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> Le 24/10/2019 à 12:28, Jan Kiszka a écrit :
> > On 24.10.19 12:21, Stéphane Ancelot wrote:
> >> Le 24/10/2019 à 12:13, Jan Kiszka a écrit :
> >>> On 24.10.19 10:00, Stéphane Ancelot via Xenomai wrote:
>  Hi,
> 
>  In our system, I noticed maybe a setup problem, or a xenomai
>  related setup problem regarding user processes.
> 
>  I have 2 user tasks that does not use RT (but xenomai shared memory
>  access), I noticed that if these both tasks were consuming 50% of
>  the CPU they were not migrated on other cpus.
> 
>  There is no CPU affinity setup in these programs , so I kept the
>  kernel scheduler doing its job.
> 
>  Is this a bad kernel setting or xenomai setting  to allow process
>  cpu migration for user processes ?
> >>> There is - by design - no automatic load balancing for Xenomai threads.
> >>> The common case is static schedules /wrt RT task for each CPU.
> >> This is not an RT thread, this is a userspace non rt task linking
> >> xenomai heap shared memory.
> >>
> >> having a look at this page, the documentation states  cpu_affinity
> >> Defaults to any online CPU.
> > As long as that non-RT task is in Linux mode and has an affinity mask
> > that is not reduced to the single core, Linux will happily
> > load-balance it. But if that task went through shadowing by Xenomai,
> > its affinity mask was likely reduced.
>
> A tunable should permit to change it, but it has no effect
>
>  cpu_set_t cpus;
>  CPU_ZERO(&cpus);
>  CPU_SET(0,&cpus);
>  CPU_SET(1,&cpus);
>  CPU_SET(2,&cpus);
>  CPU_SET(3,&cpus);
>
>  set_config_tunable(cpu_affinity,cpus);
>
> > Jan





RE: pthread_cond_wait() never returns?

2019-10-24 Thread Lange Norbert via Xenomai
Signaling the waiting thread happens when you unlock the corresponding mutex;
you need to lock/unlock the mutex around the pthread_cond_signal() call.
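
That is, the signaling loop in the quoted test program below would look roughly
like this (just a sketch of the suggested change, reusing the program's mutex
and cond):

for (int i = 0; i < 3; ++i)
{
        sleep(1);
        pthread_mutex_lock(&mutex);
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&mutex);
        printf("main: signal(i=%d)\n", i);
}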

Regards, Norbert

> -Original Message-
> From: Xenomai  On Behalf Of Jan Leupold
> via Xenomai
> Sent: Mittwoch, 23. Oktober 2019 17:52
> To: xenomai@xenomai.org
> Subject: pthread_cond_wait() never returns?
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> Hi all,
>
> I am trying to use pthread_cond_t in a Xenomai application. When compiling
> and linking with the flags from xeno-config, the condition variable stops
> working as expected. It seems as if pthread_cond_wait() would never
> return.
>
> Is this a known issue?
> (but most probably I think it's just my fault and I would be glad if someone
> could give me a hint)
>
> BTW, "cyclictest -M" (also uses pthread_cond_t to signal a refresh) does not
> work as expected either.
>
> Regards,
> Jan
>
> Using:
> 
> ipipe-arm, branched off at tag ipipe-core-4.19.33-arm-2 xenomai-3, branched
> off at 44065a107 (master April 2019) custom hardware, SAMA5D2 SoC
>
> The program:
> --
> #include <pthread.h>
> #include <stdio.h>
> #include <unistd.h>
>
> pthread_mutex_t mutex;
> pthread_cond_t cond;
>
> void*
> task(void* arg)
> {
> (void) arg;
> printf("task\n");
> for (int i = 0; i < 3; ++i)
> {
> pthread_mutex_lock(&mutex);
> pthread_cond_wait(&cond, &mutex);
> pthread_mutex_unlock(&mutex);
> printf("task: condition signaled (i=%d)\n", i);
> }
> printf("task: done\n");
> return NULL;
> }
>
> int
> main()
> {
> printf("main\n");
>
> pthread_mutex_init(&mutex, NULL);
> pthread_cond_init(&cond, NULL);
>
> pthread_t t;
> pthread_create(&t, NULL, task, NULL);
>
> for (int i = 0; i < 3; ++i)
> {
> sleep(1);
> pthread_cond_signal(&cond);
> printf("main: signal(i=%d)\n", i);
> }
>
> pthread_join(t, NULL);
>
> printf("main: done\n");
> return 0;
> }
>
> When compiling without Xenomai the output is:
>
> main
> task
> main: signal(i=0)
> task: condition signaled (i=0)
> main: signal(i=1)
> task: condition signaled (i=1)
> main: signal(i=2)
> task: condition signaled (i=2)
> task: done
> main: done
>
> When compiling with Xenomai the output is:
>
> main
> task
> main: signal(i=0)
> main: signal(i=1)
> main: signal(i=2)
> ^C
> (pthread_join() never returns)
>
> The difference in Makefile.am:
> --
>
> bin_PROGRAMS = test-condition-xeno test-condition-linux
>
> test_condition_linux_SOURCES = condition_c.c
> test_condition_linux_LDFLAGS = -pthread
>
> test_condition_xeno_SOURCES = condition_c.c
> test_condition_xeno_CFLAGS = $(shell xeno-config --skin=posix --cflags)
> test_condition_xeno_LDADD = $(shell xeno-config --skin=posix --ldflags)
>
> Output from --dump-config:
> --
> based on Xenomai/cobalt v3.1-devel -- #94e729d53 (2019-06-03 13:32:22
> +0200)
> CONFIG_MMU=1
> CONFIG_XENO_BUILD_ARGS=" '--build=x86_64-linux'
> '--host=arm-poky-linux-gnueabi' '--target=arm-poky-linux-gnueabi'
> '--prefix=/usr' '--exec_prefix=/usr' '--bindir=/usr/bin'
> '--sbindir=/usr/sbin' '--libexecdir=/usr/libexec' '--datadir=/usr/share'
> '--sysconfdir=/etc' '--sharedstatedir=/com' '--localstatedir=/var'
> '--libdir=/usr/lib' '--includedir=/usr/include/xenomai'
> '--oldincludedir=/usr/include/xenomai' '--infodir=/usr/share/info'
> '--mandir=/usr/share/man' '--disable-silent-rules'
> '--with-libtool-sysroot=/home/jan/yocto/aertronic2.0-io/build-
> betrieb/tmp/work/cortexa5t2hf-vfp-poky-linux-gnueabi/xenomai/3+git999-
> r0/recipe-sysroot'
> '--disable-static' '--disable-async-cancel' '--disable-clock-monotonic-raw'
> '--with-core=cobalt' '--enable-debug=symbols' '--disable-dlopen-libs'
> '--disable-doc-build' '--disable-fortify' '--disable-lores-clock'
> '--disable-pshared' '--disable-registry' '--disable-smp' '--disable-tls'
> 'build_alias=x86_64-linux' 'host_alias=arm-poky-linux-gnueabi'
> 'target_alias=arm-poky-linux-gnueabi' 'CC=arm-poky-linux-gnueabi-gcc -
> mthumb -mfpu=vfp -mfloat-abi=hard -mcpu=cortex-a5 -fstack-protector-
> strong
>  -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-
> security --sysroot=/home/jan/yocto/aertronic2.0-io/build-
> betrieb/tmp/work/cortexa5t2hf-vfp-poky-linux-gnueabi/xenomai/3+git999-
> r0/recipe-sysroot'
> 'CFLAGS= -O2 -pipe -g -feliminate-unused-debug-types
> -fmacro-prefix-map=/home/jan/yocto/aertronic2.0-io/build-
> betrieb/tmp/work/cortexa5t2hf-vfp-poky-linux-gnueabi/xenomai/3+git999-
> r0=/usr/src/debug/xenomai/3+git999-r0
>
> -fdebug-prefix-map=/home/jan/yocto/aertronic2.0-io/build-
> betrieb/tmp/work/cortexa5t2hf-vfp-poky-linux-gnueabi/xenomai/3+git999-
> r0=/usr/src/debug/xenomai/3+git999-r0
>
> -fdebug-prefix-map=/home/jan/yocto/aertronic2.0-io/build-
> betrieb/tmp/work/cortexa5t2hf-vfp-poky-linux-gnueabi/xenomai/3+git999-
> r0/recipe-sysroot=
>
> -fdebug-prefix-map=/home/jan/y

RE: running xenomai through scan-build or: some 100 issues with static code analysis

2019-10-14 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Montag, 14. Oktober 2019 12:09
> To: Lange Norbert ; Xenomai
> 
> Subject: Re: running xenomai through scan-build or: some 100 issues with
> static code analysis
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> Readding the list.
>
> On 14.10.19 11:36, Lange Norbert wrote:
> >> -Original Message-
> >> From: Xenomai  On Behalf Of Jan
> Kiszka
> >> via Xenomai
> >> Sent: Montag, 14. Oktober 2019 08:26
> >> To: Norbert Lange ; Xenomai
> >> 
> >> Subject: Re: running xenomai through scan-build or: some 100 issues
> >> with static code analysis
> >>
> >> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
> >>
> >>
> >> On 13.10.19 21:49, Norbert Lange via Xenomai wrote:
> >>> Hello,
> >>>
> >>> I did run some static analysis tools over xenomai 3.1rc2 userspace
> >>> libraries, and there seems to be alot of real issues.
> >>>
> >>> The tools are clangs builtin statical analysis and clang-tidy,
> >>> naturally there is some overlap in the reports.
> >>> clang-tidy would need to be configured to fit Xenomai's practices
> >>> (there is a ton of configurable checks), so this is more of an example.
> >>> The other, clang's statical analysis is more relevant as there are
> >>> very few false positives.
> >>>
> >>> Additionally to the checks, there is a directory failures, files that
> >>> cant be built with clang. Even if no one ships Xenomai built by that
> >>> compiler, fixing those should help, for being able to run those tools
> >>> and several IDE's and Editors already use clangd for code competition
> etc.
> >>>
> >>> I'd hope that such reports could be incorporated into the CI builds.
> >>> running the analysis on cross-builds is alot more daunting, but on
> >>> native builds its rather easy.
> >>
> >> This is generally a valuable thing. Unfortunately, it starts with some more
> >> work: modelling of functions and syscalls that clang has no insight into
> and,
> >> thus, throws false-positives around them. Quickly browsing through the
> >> report, I only saw one real finding so far, and that was a harmless
> "assigned
> >> but never used" warning. But I'm sure that there are a few more severe
> >> issues in that haystack.
> >
> > Did you look only at the tidy-report? The other, website based report has
> tons of valid issues,
> > and clang models POSIX/GNU functions - that’s enough for most of the
> code.
>
> No, I didn't look at the tidy reports, the other ones instead. And a lot
> of those I checked (might not have been representative) stumbled over
> the unknown behavior of Xenomai systems - naturally. Some may have also
> stumbled over missing noreturn annotations. Others simply because the
> graph didn't get that a path was not taken with the assumed input
> values. All normal for static analysis.
>
> >
> > eg.:
> >
> > utils/analogy/calibration_ni_m.c: use after free (your error() macro is
> broken).
>
> A nice example for improper modeling of clang: error[_at_line] will
> never return when passed a non-zero status.

It's actually worse than that: glibc uses a gcc extension, __builtin_va_arg_pack,
which clang does not support but which would be necessary to use the inline
declaration.

>
> The error macro is still wrong because it assumes that this will only
> happen for status < 0.

Yes, that's what I meant.

>
> > lib/copperplate/heapobj-heapmem.c: heapobj_init_array_private with
> size=0 has undefined behavior
> >
> > Every report I looked at seemed valid, those that are questionable, like
> heapobj_init_array_private
> > Are questionable because it's not clear if this function is allowed to be
> called with size=0.
> > Means it's questionable for everyone reading the code, and something an
> analyzer should throw up.
> > Its either a bug or some annotation would help (__builtin_expect(size != 0,
> 1) or assert(size != 0)).
> >
>
> This one indeed requires a closer look and likely proper error handline
> for size == 0.
>
> >>
> >> I was already considering to enable Coverity via our CI. It generally 
> >> works,
> it
> >> has proven to find real issues without too much modelling effort (though
> this
> >> case may be different because of all the custom syscalls), but since
> Synopsis
> >> bought it, the availability and quality of their public OSS service 
> >> massively
> >> degraded.
> >>
> >> So, looking into clang might be a more reliable alternative. I'm open for
> >> patches that pave the way. For CI, we may need a more recent source
> than
> >> clang-6 because that is what Travis provides us ATM.
> >
> > I dont have any experience with travis, but can't you just tell you want a
> recent Ubuntu/Debian?
>
> We cannot easily change that, there are only pre-defined Ubuntu images
> available. At most we can try a dist-upgrade during the test, but that
> may prolong the setup time too much.

There would be clang-8 available in 18.04 "updates" repository:
https://packages.ubuntu.com/search?suit

Re: running xenomai through scan-build or: some 100 issues

2019-10-14 Thread Lange Norbert via Xenomai
> -Original Message-
> From: Xenomai <xenomai-boun...@xenomai.org> On Behalf Of Jan Kiszka
> via Xenomai
> Sent: Montag, 14. Oktober 2019 08:26
> To: Norbert Lange <nolang...@gmail.com>; Xenomai <xenomai@xenomai.org>
> Subject: Re: running xenomai through scan-build or: some 100 issues
> with static code analysis
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR ATTACHMENTS.
>
>
> On 13.10.19 21:49, Norbert Lange via Xenomai wrote:
> > Hello,
> >
> > I did run some static analysis tools over xenomai 3.1rc2 userspace
> > libraries, and there seems to be alot of real issues.
> >
> > The tools are clangs builtin statical analysis and clang-tidy,
> > naturally there is some overlap in the reports.
> > clang-tidy would need to be configured to fit Xenomai's practices
> > (there is a ton of configurable checks), so this is more of an example.
> > The other, clang's statical analysis is more relevant as there are
> > very few false positives.
> >
> > Additionally to the checks, there is a directory failures, files
> > that cant be built with clang. Even if no one ships Xenomai built by
> > that compiler, fixing those should help, for being able to run those
> > tools and several IDE's and Editors already use clangd for code competition etc.
> >
> > I'd hope that such reports could be incorporated into the CI builds.
> > running the analysis on cross-builds is alot more daunting, but on
> > native builds its rather easy.
>
> This is generally a valuable thing. Unfortunately, it starts with some
> more work: modelling of functions and syscalls that clang has no insight
> into and, thus, throws false-positives around them. Quickly browsing
> through the report, I only saw one real finding so far, and that was a
> harmless "assigned but never used" warning. But I'm sure that there
> are a few more severe issues in that haystack.

Did you look only at the tidy-report? The other, website-based report has tons
of valid issues, and clang models POSIX/GNU functions - that's enough for most
of the code.

e.g.:

utils/analogy/calibration_ni_m.c: use after free (your error() macro is broken).
lib/copperplate/heapobj-heapmem.c: heapobj_init_array_private with size=0 has
undefined behavior

Every report I looked at seemed valid. Those that are questionable, like
heapobj_init_array_private, are questionable because it's not clear if this
function is allowed to be called with size=0.
That means it's questionable for everyone reading the code, and something an
analyzer should throw up.
It's either a bug, or some annotation would help (__builtin_expect(size != 0, 1)
or assert(size != 0)).
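
To make that concrete, a minimal sketch of the kind of annotation meant here
(illustration only, using a made-up stand-in rather than the real
heapobj_init_array_private()):

#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper, just to show the two annotation styles. */
static int init_array_sketch(void *mem, size_t size, size_t elems)
{
	(void)mem;
	(void)elems;
	assert(size != 0);	/* document and trap the precondition in debug builds */
	if (__builtin_expect(size == 0, 0))
		return -EINVAL;	/* or refuse the unclear input at runtime */
	return 0;
}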



>
> I was already considering to enable Coverity via our CI. It generally
> works, it has proven to find real issues without too much modelling
> effort (though this case may be different because of all the custom
> syscalls), but since Synopsis bought it, the availability and quality
> of their public OSS service massively degraded.
>
> So, looking into clang might be a more reliable alternative. I'm open
> for patches that pave the way. For CI, we may need a more recent
> source than clang-6 because that is what Travis provides us ATM.

I don't have any experience with travis, but can't you just tell it you want a
recent Ubuntu/Debian?

Norbert








xeno_heapcheck kernel module not loadable?

2019-10-11 Thread Lange Norbert via Xenomai
Hello,

I built the xeno_*test components as loadable modules, but it seems
xeno_heapcheck is either broken or can't be built as a module?
The version is Xenomai v3.1-rc2.

~# modprobe xeno_rtdmtest
~# modprobe xeno_heapcheck
[57980.549627] xeno_heapcheck: Unknown symbol xnthread_relax (err -2)
[57980.555854] xeno_heapcheck: Unknown symbol __rtdm_task_sleep (err -2)
[57980.562332] xeno_heapcheck: Unknown symbol xnheap_init (err -2)
[57980.568274] xeno_heapcheck: Unknown symbol xnthread_harden (err -2)
[57980.574576] xeno_heapcheck: Unknown symbol xnheap_alloc (err -2)
[57980.580609] xeno_heapcheck: Unknown symbol xnclock_core_read_monotonic (err 
-2)
[57980.587946] xeno_heapcheck: Unknown symbol xnheap_destroy (err -2)
[57980.594158] xeno_heapcheck: Unknown symbol rtdm_dev_unregister (err -2)
[57980.600804] xeno_heapcheck: Unknown symbol xnheap_free (err -2)
[57980.606747] xeno_heapcheck: Unknown symbol rtdm_dev_register (err -2)
[57980.618524] xeno_heapcheck: Unknown symbol xnthread_relax (err -2)
[57980.624749] xeno_heapcheck: Unknown symbol __rtdm_task_sleep (err -2)
[57980.631223] xeno_heapcheck: Unknown symbol xnheap_init (err -2)
[57980.637170] xeno_heapcheck: Unknown symbol xnthread_harden (err -2)
[57980.643468] xeno_heapcheck: Unknown symbol xnheap_alloc (err -2)
[57980.649508] xeno_heapcheck: Unknown symbol xnclock_core_read_monotonic (err 
-2)
[57980.656841] xeno_heapcheck: Unknown symbol xnheap_destroy (err -2)
[57980.663056] xeno_heapcheck: Unknown symbol rtdm_dev_unregister (err -2)
[57980.669696] xeno_heapcheck: Unknown symbol xnheap_free (err -2)
[57980.675669] xeno_heapcheck: Unknown symbol rtdm_dev_register (err -2)

Regards, Norbert





Script for namespacing a few intel drivers

2019-10-09 Thread Lange Norbert via Xenomai


I managed to build a kernel with statically linked igb, e1000 and e1000e for 
linux and rtnet, after running the below script to namespace those drivers (I 
only use [rt_]igb, but this driver needs symbols from e1000).
Seems to basically work, with some caveats that might only relate to my changes.

Even if you don't use both drivers at once, I am not sure if there aren't some
definitions leaking from the linux headers right now, so a cleanup might be
useful anyway.


Norbert

---
# run me in kernel/drivers/net/drivers

sedreplace() {
(
	TMPFILE=/tmp/functions.lst
	{
		# collect all igb_*/e1000_*/e1000e_* function and struct names
		grep -roh '\bigb_[[:alnum:]_]*[[:space:]]*(' | sed 's,[[:space:]]*(,,'
		grep -roh 'struct[[:space:]]*igb_[[:alnum:]_]*\b' | sed 's,struct[[:space:]]*,,'
		grep -roh '\be1000e*_[[:alnum:]_]*[[:space:]]*(' | sed 's,[[:space:]]*(,,'
		grep -roh 'struct[[:space:]]*e1000e*_[[:alnum:]_]*\b' | sed 's,struct[[:space:]]*,,'

		# plus any extra identifiers passed as arguments
		for ad; do echo $ad; done
	} | sort -u >$TMPFILE

	# emit one sed expression per identifier, prefixing it with rt_
	while read -r f; do
		printf -- '-e s,\\b%s\\b,rt_%s,g ' $f $f
	done < $TMPFILE
)
}

SEDREP=$(sedreplace igb_driver_name igb_driver_version e1000_driver_name e1000_driver_version e1000e_driver_name e1000e_driver_version)

sed $SEDREP -e 's,^#include "rt_e1000_,#include "e1000_,' -i $(find -name '*.c' -o -name '*.h')

Mit besten Grüßen / Kind regards

NORBERT LANGE

AT-DES

ANDRITZ HYDRO GmbH
Eibesbrunnergasse 20
1120 Vienna / AUSTRIA
p: +43 50805 56684
norbert.la...@andritz.com
andritz.com





RE: [PATCH] Allow RTNet to be builtin kernel

2019-10-09 Thread Lange Norbert via Xenomai
> -Original Message-
> From: Jan Kiszka 
> Sent: Mittwoch, 9. Oktober 2019 13:00
> To: Lange Norbert ; François Legal
> 
> Subject: Re: [PATCH] Allow RTNet to be builtin kernel
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 09.10.19 12:49, Lange Norbert wrote:
> >
> >
> >> -Original Message-
> >> From: Xenomai  On Behalf Of Jan
> Kiszka
> >> via Xenomai
> >> Sent: Freitag, 4. Oktober 2019 16:31
> >> To: François Legal 
> >> Cc: Xenomai 
> >> Subject: Re: [PATCH] Allow RTNet to be builtin kernel
> >>
> >> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
> >>
> >>
> >> On 04.10.19 16:22, François Legal wrote:
> >>> Le Vendredi, Octobre 04, 2019 15:36 CEST, Jan Kiszka
> >>  a écrit:
> >>>
>  On 04.10.19 13:10, Jan Kiszka wrote:
> > On 03.10.19 10:10, François Legal via Xenomai wrote:
> >> Subject: [PATCH] Allow RTNet to be builtin kernel
> >>
> >> Makefile is changed because when builtin, the modules get loaded
> in
> >> the link order. In the previous version protocols would get loaded
> >> before stackmgr which obviously fails.
> >>
> >> Signed-off-by: François LEGAL 
> >>
> >> ---
> >>
> >> This has been tested on Zynq 7000 Hardware (microzed
> >> board) with MACB/GEM RTNet driver, with linux 4.4.189 vanilla.
> >> Kernel boots up correctly, and the following messages get printed
> >> on the console during the boot :
> >>
> >> [   13.875451] sdhci-pltfm: SDHCI platform and OF driver helper
> >> [   13.881627] sdhci-arasan e010.sdhci: No vmmc regulator found
> >> [   13.887581] sdhci-arasan e010.sdhci: No vqmmc regulator
> found
> >> [   13.927585] mmc0: SDHCI controller on e010.sdhci
> >> [e010.sdhci] using ADMA
> >> [   13.945085] fpga_manager fpga0: Xilinx Zynq FPGA Manager
> >> registered
> >> [   13.951896]
> >> [   13.951896] *** RTnet for Xenomai v3.0.9 ***
> >> [   13.951896]
> >> [   13.959156] RTnet: initialising real-time networking
> >> [   13.964928] RTmac/TDMA: init time division multiple access control
> >> mechanism
> >> [   13.972074] RTmac: init realtime media access control
> >> [   13.977064] RTcfg: init real-time configuration distribution 
> >> protocol
> >> [   13.983752] initializing loopback...
> >> [   13.987295] RTnet: registered rtlo
> >> [   13.993238] RTnet: registered rteth0
> >> [   13.996832] libphy: MACB_mii_bus: probed
> >> [   14.001140] rt_macb: rteth0: Cadence GEM at 0xe000b000 irq 26
> >> (00:0a:35:00:01:22)
> >> [   14.008630] rt_macb: rteth0: attached PHY driver [Marvell 88E1510]
> >> (mii_bus:phy_addr=e000b000.ethernet-:00, irq=-1)
> >> [   14.019845] RTcap: real-time capturing interface
> >> [   14.026562] RTproxy attached to rteth0
> >> [   14.031129] rtnetproxy installed as "rtproxy"
> >> [   14.035693] NET: Registered protocol family 26
> >> [   14.040309] NET: Registered protocol family 17
> >> [   14.044686] can: controller area network core (rev 20120528 abi 9)
> >> [   14.050957] NET: Registered protocol family 29
> >> [   14.055329] can: raw protocol (rev 20120528)
> >> [   14.059683] can: broadcast manager protocol (rev 20120528 t)
> >>
> >> kernel/drivers/net/Kconfig|  1 -
> >> kernel/drivers/net/addons/Kconfig |  4 ++--
> >> kernel/drivers/net/stack/Makefile | 16 
> >> utils/net/rtnet.in | 144
> >> +++--
> >> 4 files changed, 105 insertions(+), 60 deletions(-)
> >>
> >> diff --git a/kernel/drivers/net/Kconfig
> >> b/kernel/drivers/net/Kconfig index ac3bced..49d2402 100644
> >> --- a/kernel/drivers/net/Kconfig
> >> +++ b/kernel/drivers/net/Kconfig
> >> @@ -1,7 +1,6 @@
> >> menu "RTnet"
> >>
> >> config XENO_DRIVERS_NET
> >> -depends on m
> >> select NET
> >> tristate "RTnet, TCP/IP socket interface"
> >>
> >> diff --git a/kernel/drivers/net/addons/Kconfig
> >> b/kernel/drivers/net/addons/Kconfig
> >> index baa6cbc..616ed40 100644
> >> --- a/kernel/drivers/net/addons/Kconfig
> >> +++ b/kernel/drivers/net/addons/Kconfig
> >> @@ -2,7 +2,7 @@ menu "Add-Ons"
> >> depends on XENO_DRIVERS_NET
> >>
> >> config XENO_DRIVERS_NET_ADDON_RTCAP
> >> -depends on XENO_DRIVERS_NET && m
> >> +depends on XENO_DRIVERS_NET
> >> select ETHERNET
> >> tristate "Real-Time Capturing Support"
> >> default n
> >> @@ -18,7 +18,7 @@ config XENO_DRIVERS_NET_ADDON_RTCAP
> >> For further information see Documentation/README.rtcap.
> >>
> >> config XENO_DRIVERS_NET_ADDON_PROXY
> >> -depends on XENO_DRIVERS_NET_RTIPV4 && m
> >> +depends on XENO_DRIVERS_NET_RTIPV4
> >> selec

RE: Static build of rtnet

2019-09-17 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Dienstag, 17. September 2019 09:42
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: Static build of rtnet
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 16.09.19 11:13, Lange Norbert via Xenomai wrote:
> > Hello,
> >
> > I havent tested this in a while, but building rtnet static will crash the 
> > kernel
> when this module initializes.
> > With the various fixes and cleanups in master/next (like rtdm_available) 
> > that
> might be worth a look?
> >
> > I would hope to build a static kernel one day, and so far there are 2
> roadblocks:
> >
> > -   rtnet (+ rtpacket) crashing when built statically
> >
> > -   symbol nameclashes with linux + rt drivers enabled (I could work on 
> > fixing
> that for rt_igb atleast)
> >
>
> Do you mean removing the "depends on m"?

Yes, ideally I would use a kernel without loadable modules, so kernel
upgrades/changes don't affect the rootfs (ideally read-only apart from a few
places).

> Possibly, that moves the
> initialization order in a way that causes troubles. I also just added another 
> case
> that exploits the module [1], but that would be solvable. More critical is
> understanding the crashes.

I did a quick test removing the "depends on m" about a year ago; I brought this
up now because it might fit in with the recent cleanups.

Regards, Norbert





Static build of rtnet

2019-09-16 Thread Lange Norbert via Xenomai
Hello,

I haven't tested this in a while, but building rtnet statically will crash the
kernel when this module initializes.
With the various fixes and cleanups in master/next (like rtdm_available) that
might be worth a look?

I would hope to build a static kernel one day, and so far there are 2
roadblocks:

-   rtnet (+ rtpacket) crashing when built statically

-   symbol name clashes with linux + rt drivers enabled (I could work on
    fixing that for rt_igb at least)



Mit besten Grüßen / Kind regards





RE: [PATCH 2/2] cobalt: switch hand over status to -ENODEV for non-RTDM fd

2019-08-30 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Donnerstag, 29. August 2019 16:52
> To: Lange Norbert ; Philippe Gerum
> ; Xenomai (xenomai@xenomai.org)
> 
> Subject: Re: [PATCH 2/2] cobalt: switch hand over status to -ENODEV for non-
> RTDM fd
>
> E-MAIL FROM A NON-ANDRITZ SOURCE: AS A SECURITY MEASURE, PLEASE
> EXERCISE CAUTION WITH E-MAIL CONTENT AND ANY LINKS OR
> ATTACHMENTS.
>
>
> On 29.08.19 16:12, Lange Norbert via Xenomai wrote:
> >
> > I ran into a rather big issue with linux filehandles I use Xenomai
> > master on ipipe-core-4.19.60-x86-5 with those patches, (can't be 100%
> > sure its not some kernel/userspace conflict, but I doubt it)
> >
> > What happens is that upon a __cobalt_close with a linux filehande, the
> > syscall sc_cobalt_close returns EBADF, but that means the libc close
> > will never be tried and filehandles are leaking like mad.
> >
>
> Ah, good catch. Looks like Philippe's patch was missing a change to
> rtdm_fd_close().

Yes, but his v3 works.

> Thanks a lot for testing pro-actively!

You are welcome. It's less benign than you might think, though:
Philippe's patches (allowing for device teardown) were requested by me.

How does Xenomai/cobalt handle kernel/userspace conflicts like these, BTW?
Is there some ABI variable that needs to be incremented and can detect
mismatches?
(If you use an old libcobalt on a new kernel module with this patchset, or
vice versa, it would result in leaks or other issues.)

Kind regards,
Norbert





RE: [PATCH 2/2] cobalt: switch hand over status to -ENODEV for non-RTDM fd

2019-08-29 Thread Lange Norbert via Xenomai

I ran into a rather big issue with linux filehandles.
I use Xenomai master on ipipe-core-4.19.60-x86-5 with those patches
(can't be 100% sure it's not some kernel/userspace conflict, but I doubt it).

What happens is that upon a __cobalt_close with a linux filehandle, the
syscall sc_cobalt_close returns EBADF, but that means the libc close will
never be tried and filehandles are leaking like mad.

test.c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* FILEIO_OPENOPTS comes from the original application and is not shown here. */

int fileread(const char *pFilename, int *pErrno)
{
int fd = open(pFilename, O_RDONLY | FILEIO_OPENOPTS);
bool isOk = true;

if (fd < 0) {
*pErrno = errno;
return -1;
}

char buffer[1024];

ssize_t result = read(fd, buffer, sizeof(buffer));

if (result < 0) {
*pErrno = errno;
isOk = false;
}

close(fd);
return isOk ? (int)(result) : -1;
}


int main(int argc, char **argv)
{
if (argc != 2)
return -1;

int err;

for (;;) {
if(fileread(argv[1], &err) < 0)
{
perror("read failed: ");
break;
}
usleep(100);
}
}
---

Kind regards, Norbert

> -Original Message-
> From: Xenomai  On Behalf Of Philippe
> Gerum via Xenomai
> Sent: Donnerstag, 20. Juni 2019 19:30
> To: xenomai@xenomai.org
> Subject: [PATCH 2/2] cobalt: switch hand over status to -ENODEV for non-
> RTDM fd
>
> E-MAIL FROM A NON-ANDRITZ SOURCE: AS A SECURITY MEASURE, PLEASE
> EXERCISE CAUTION WITH E-MAIL CONTENT AND ANY LINKS OR
> ATTACHMENTS.
>
>
> Having the RTDM core return -EBADF to indicate that it does not manage a
> file descriptor is a problem, as several drivers also raise this error to 
> notify
> userland about an aborted wait due to a connection being dismantled (e.g.
> RTnet). In this case, libcobalt ends up forwarding the aborted request to the
> glibc, which is wrong.
>
> Switch from -EBADF to -ENODEV to notify userland that RTDM does not
> manage a file descriptor, which cannot conflict with any sensible return from
> active I/O operations. This is also consistent with the status RTDM currently
> returns to notify that it cannot handle a device open request.
> ---
>  kernel/cobalt/rtdm/fd.c |  7 ---
>  lib/cobalt/rtdm.c   | 46 -
>  2 files changed, 27 insertions(+), 26 deletions(-)
>
> diff --git a/kernel/cobalt/rtdm/fd.c b/kernel/cobalt/rtdm/fd.c index
> f3b6444c3..75c4acf28 100644
> --- a/kernel/cobalt/rtdm/fd.c
> +++ b/kernel/cobalt/rtdm/fd.c
> @@ -204,8 +204,9 @@ int rtdm_fd_register(struct rtdm_fd *fd, int ufd)
>   * @param[in] ufd User-side file descriptor
>   * @param[in] magic Magic word for lookup validation
>   *
> - * @return Pointer to the RTDM file descriptor matching @a ufd, or
> - * ERR_PTR(-EBADF).
> + * @return Pointer to the RTDM file descriptor matching @a
> + * ufd. Otherwise ERR_PTR(-ENODEV) is returned if the use-space handle
> + * is either invalid or not managed by RTDM.
>   *
>   * @note The file descriptor returned must be later released by a call
>   * to rtdm_fd_put().
> @@ -221,7 +222,7 @@ struct rtdm_fd *rtdm_fd_get(int ufd, unsigned int
> magic)
> xnlock_get_irqsave(&fdtree_lock, s);
> fd = fetch_fd(p, ufd);
> if (fd == NULL || (magic != 0 && fd->magic != magic)) {
> -   fd = ERR_PTR(-EBADF);
> +   fd = ERR_PTR(-ENODEV);
> goto out;
> }
>
> diff --git a/lib/cobalt/rtdm.c b/lib/cobalt/rtdm.c index 176210ddc..506302d26
> 100644
> --- a/lib/cobalt/rtdm.c
> +++ b/lib/cobalt/rtdm.c
> @@ -123,7 +123,7 @@ COBALT_IMPL(int, close, (int fd))
>
> pthread_setcanceltype(oldtype, NULL);
>
> -   if (ret != -EBADF && ret != -ENOSYS)
> +   if (ret != -ENODEV && ret != -ENOSYS)
> return set_errno(ret);
>
> return __STD(close(fd));
> @@ -154,7 +154,7 @@ COBALT_IMPL(int, fcntl, (int fd, int cmd, ...))
>
> ret = XENOMAI_SYSCALL3(sc_cobalt_fcntl, fd, cmd, arg);
>
> -   if (ret != -EBADF && ret != -ENOSYS)
> +   if (ret != -ENODEV && ret != -ENOSYS)
> return set_errno(ret);
>
> return __STD(fcntl(fd, cmd, arg)); @@ -171,7 +171,7 @@
> COBALT_IMPL(int, ioctl, (int fd, unsigned int request, ...))
> va_end(ap);
>
> ret = do_ioctl(fd, request, arg);
> -   if (ret != -EBADF && ret != -ENOSYS)
> +   if (ret != -ENODEV && ret != -ENOSYS)
> return set_errno(ret);
>
> return __STD(ioctl(fd, request, arg)); @@ -187,7 +187,7 @@
> COBALT_IMPL(ssize_t, read, (int fd, void *buf, size_t nbyte))
>
> pthread_setcanceltype(oldtype, NULL);
>
> -   if (ret != -EBADF && ret != -ENOSYS)
> +   if (ret != -ENODEV && ret != -ENOSYS)
> return set_errno(ret);
>
> return __STD(read(fd, buf, nbyte)); @@ -203,7 +203,7 @@
> COBALT_IMPL(ssize_t, write, (int fd, const void *buf, size_t nbyte))
>
> pthread_setcanceltype(oldtype, NULL);
>
> -   if (ret != 

RE: affinity of main thread is bound to current core

2019-08-22 Thread Lange Norbert via Xenomai
> -Original Message-
> From: Philippe Gerum 
> Sent: Donnerstag, 22. August 2019 17:29
> To: Lange Norbert ; Jan Kiszka
> ; Xenomai (xenomai@xenomai.org)
> 
> Subject: Re: affinity of main thread is bound to current core
>
> E-MAIL FROM A NON-ANDRITZ SOURCE: AS A SECURITY MEASURE, PLEASE
> EXERCISE CAUTION WITH E-MAIL CONTENT AND ANY LINKS OR
> ATTACHMENTS.
>
>
> On 8/22/19 5:16 PM, Lange Norbert via Xenomai wrote:
> >> -Original Message-
> >> From: Jan Kiszka 
> >> Sent: Donnerstag, 22. August 2019 16:52
> >> To: Lange Norbert ; Xenomai
> >> (xenomai@xenomai.org) 
> >> Subject: Re: affinity of main thread is bound to current core
> >>
> >> E-MAIL FROM A NON-ANDRITZ SOURCE: AS A SECURITY MEASURE,
> PLEASE
> >> EXERCISE CAUTION WITH E-MAIL CONTENT AND ANY LINKS OR
> ATTACHMENTS.
> >>
> >>
> >> On 21.08.19 14:12, Lange Norbert via Xenomai wrote:
> >>> Hello,
> >>>
> >>> I use Xenomai master on ipipe-core-4.19.60-x86-5.
> >>> I start out with an affinity mask of 0xF, in the function
> >>> cobalt_init_2,
> >> pthread_setschedparam will get called, after the syscall
> >> sc_cobalt_thread_setschedparam_ex the affinity mask will contain a
> >> single CPU (supposedly the current one).
> >>>
> >>> All methods to control affinity are executed before this point
> >>> (cmdline args,
> >> /proc/xenomai/affinity), so there is no working way to control it.
> >>>
> >>
> >> Not sure I get the problem yet: "some-xenomai-app --cpu-affinity X"
> >> seems to work fine, so does setting /proc/xenomai/affinity. Can you
> >> describe more concretely what behaves unexpectedly, maybe with a test
> case?
> >
> > Well, it does not work for me.
> >
> > -- file test.c
> > #define _GNU_SOURCE
> > #include 
> >
> > #include 
> > #include 
> >
> > static void printaff()
> > {
> > cpu_set_t cpuset;
> > pthread_getaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
> > printf("Affinity: ");
> > for(unsigned i = 0; i < 256; ++i)
> > {
> > if(CPU_ISSET(i, &cpuset))
> > printf("%u,", i);
> > }
> > printf("\n");
> > }
> >
> > int main(int argc, char const *argv[]) { printaff(); return 0; }
> > --
> >
> > root@buildroot:~# cat /proc/xenomai/affinity 000f
> >
> > root@buildroot:~# /tmp/a.out
> > Affinity: 2,
> >
> > root@buildroot:~# /tmp/a.out app --cpu-affinity 0,1,2,3
> > Affinity: 3,
> >
> > I traced the point where affinity collapsed to the
> > sc_cobalt_thread_setschedparam_ex call.
> >
>
> Feature, note bug. CPU migration is at odds with (very) low-latency
> requirement, so cobalt pins the thread to one of the cores defined by --cpu-
> affinity, preferably the current one. There is no point in having more than a
> single core in the affinity mask for a real-time thread: we simply don't want
> that one to be involved in any load balancing strategy.

Why is all that trickery in the setup (reading /proc/xenomai/affinity,
parsing --cpu-affinity, and setting the resulting affinity) when
soon thereafter *the same setup* will call pthread_setschedparam,
destroying its previous work?

Is that a feature to warm up the CPU, aside from sending me on a
goose chase...

This is the C main thread; even the most unnecessary thread I start with
__STD(pthread_create) will inherit that mask.
Sorry, but I have a hard time figuring out a single argument for that "feature".

I would understand if you bound the thread to a core, potentially affecting the
affinity mask, as soon as the user explicitly kicked that thread into a
realtime priority.
Does the affinity mask even affect cobalt threads that don't get demoted
(that would be a precondition for low latency)?

Norbert





RE: affinity of main thread is bound to current core

2019-08-22 Thread Lange Norbert via Xenomai
> -Original Message-
> From: Jan Kiszka 
> Sent: Donnerstag, 22. August 2019 16:52
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: affinity of main thread is bound to current core
>
> E-MAIL FROM A NON-ANDRITZ SOURCE: AS A SECURITY MEASURE, PLEASE
> EXERCISE CAUTION WITH E-MAIL CONTENT AND ANY LINKS OR
> ATTACHMENTS.
>
>
> On 21.08.19 14:12, Lange Norbert via Xenomai wrote:
> > Hello,
> >
> > I use Xenomai master on ipipe-core-4.19.60-x86-5.
> > I start out with an affinity mask of 0xF, in the function cobalt_init_2,
> pthread_setschedparam will get called, after the syscall
> sc_cobalt_thread_setschedparam_ex the affinity mask will contain a single
> CPU (supposedly the current one).
> >
> > All methods to control affinity are executed before this point (cmdline 
> > args,
> /proc/xenomai/affinity), so there is no working way to control it.
> >
>
> Not sure I get the problem yet: "some-xenomai-app --cpu-affinity X" seems
> to work fine, so does setting /proc/xenomai/affinity. Can you describe more
> concretely what behaves unexpectedly, maybe with a test case?

Well, it does not work for me.

-- file test.c
#define _GNU_SOURCE
#include <sched.h>

#include <pthread.h>
#include <stdio.h>

static void printaff()
{
cpu_set_t cpuset;
pthread_getaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
printf("Affinity: ");
for(unsigned i = 0; i < 256; ++i)
{
if(CPU_ISSET(i, &cpuset))
printf("%u,", i);
}
printf("\n");
}

int main(int argc, char const *argv[])
{
printaff();
return 0;
}
--

root@buildroot:~# cat /proc/xenomai/affinity
000f

root@buildroot:~# /tmp/a.out
Affinity: 2,

root@buildroot:~# /tmp/a.out app --cpu-affinity 0,1,2,3
Affinity: 3,

I traced the point where affinity collapsed to the
sc_cobalt_thread_setschedparam_ex call.

Norbert





affinity of main thread is bound to current core

2019-08-21 Thread Lange Norbert via Xenomai
Hello,

I use Xenomai master on ipipe-core-4.19.60-x86-5.
I start out with an affinity mask of 0xF. In the function cobalt_init_2,
pthread_setschedparam will get called; after the syscall
sc_cobalt_thread_setschedparam_ex the affinity mask will contain a single CPU
(supposedly the current one).

All methods to control affinity are executed before this point (cmdline args, 
/proc/xenomai/affinity), so there is no working way to control it.

Kind regards, Norbert





RE: [PATCH] cobalt: remove call to sigprocmask

2019-07-12 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Xenomai  On Behalf Of Jan Kiszka
> via Xenomai
> Sent: Freitag, 12. Juli 2019 15:07
> To: Norbert Lange ; xenomai@xenomai.org
> Subject: Re: [PATCH] cobalt: remove call to sigprocmask
>
> E-MAIL FROM A NON-ANDRITZ SOURCE: AS A SECURITY MEASURE, PLEASE
> EXERCISE CAUTION WITH E-MAIL CONTENT AND ANY LINKS OR
> ATTACHMENTS.
>
>
> On 12.07.19 13:18, Norbert Lange via Xenomai wrote:
> > sigprocmask should not be used in multithreaded applications, doing so
> > is "unspecified".
>
> Is this more than cosmetic on Linux? Then it should be documented here
> because we shipped the change in 3.0.9, and if that may have caused
> regressions, users should know.

There is no practical issue to be expected, AFAIK with NPTL at least.
The only issue could be that libpthread is not pulled in (it might need to run
some setup), but that's ensured with libcobalt.
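
Side note, not part of the patch: in a multithreaded program the portable
replacement is pthread_sigmask(). A minimal sketch:

#include <pthread.h>
#include <signal.h>

/* Illustration only: adjust the signal mask the thread-safe way.
 * SIGUSR1 is just a placeholder signal. */
static void block_signal_example(void)
{
	sigset_t set;

	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);
	pthread_sigmask(SIG_BLOCK, &set, NULL);
}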

Sorry for the mess, I must have sent a stale patch in the first place (I always
get problems with whitespace issues).

Norbert







RE: ipipe 4.19: spurious APIC interrupt when setting rt_igp to up

2019-07-11 Thread Lange Norbert via Xenomai
> -Original Message-
> From: Jan Kiszka 
> Sent: Mittwoch, 10. Juli 2019 23:31
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) ; Philippe Gerum
> 
> Subject: Re: ipipe 4.19: spurious APIC interrupt when setting rt_igp to up
>
> E-MAIL FROM A NON-ANDRITZ SOURCE: AS A SECURITY MEASURE, PLEASE
> EXERCISE CAUTION WITH E-MAIL CONTENT AND ANY LINKS OR
> ATTACHMENTS.
>
>
> On 09.07.19 19:54, Jan Kiszka wrote:
> > On 09.07.19 18:33, Jan Kiszka wrote:
> >> On 09.07.19 18:21, Lange Norbert wrote:
> >>> Hello,
> >>>
> >>> maxcpus=1 still causes the spurious int, this time fully locking up.
> >>>
> >>> I attached the debug/irq directory after the cause.
>  Some things that might be relevant:
> >>> -   the SOC would use PINCTRL_BROXTON under linux, but this is disabled
> (not fixed up for Xenomai)
> >>> -   I have the regular igb driver in use, and am unbinding the network
> card prior to binding the rt_igp driver
> >>>
> >>
> >> Thanks. What's the interrupt number that Xenomai is using? Should be
> >> the same that the Linux driver is using as well.
> >
> > Found already: Should be IRQ 130-132 for device 00:03.0. If the
> > directory state was like that while Xenomai was still holding those
> > interrupts, the problem it that there are no vectors assigned to them.
> > Can you confirm that rt_igb was still loaded and the interface was up?
> >
> > Are those interrupts MSI or MSI-X? Can't read that from the logs.
> >
> > I probably need to get some rt_igb running somewhere...
> >
>
> Still no luck, even on a box with a igb-driven NIC (I350):
>
> [  667.928036] rt_igb :06:00.1: Intel(R) Gigabit Ethernet Network
> Connection [  667.928064] rt_igb :06:00.1: rteth0: (PCIe:5.0Gb/s:Width
> x4) 00:25:90:5d:10:19 [  667.928149] rt_igb :06:00.1: rteth0: PBA No:
> 010A00-000 [  667.928153] rt_igb :06:00.1: Using MSI-X interrupts. 1 rx
> queue(s), 1 tx queue(s) xeon-d:~ # cat /proc/xenomai/irq
>   IRQ CPU0...CPU15
>47:   0...   79 rteth0-TxRx-0
>
> I'm currently using the two attached patches on top of ipipe-core-4.19.57-
> x86-3.

With those 2 patches it's now fixed on my end. So far I used this:

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 6c279e065879..d503b875f086 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1099,7 +1099,8 @@ void ipipe_enable_irq(unsigned int irq)
 	ipipe_root_only();
 
 	raw_spin_lock_irqsave(&desc->lock, flags);
-	if (desc->istate & IPIPE_IRQS_NEEDS_STARTUP) {
+	if (desc->istate & IPIPE_IRQS_NEEDS_STARTUP &&
+	    !WARN_ON(irq_activate(desc))) {
 		desc->istate &= ~IPIPE_IRQS_NEEDS_STARTUP;
 		chip->irq_startup(&desc->irq_data);
 	}

>
> Did you cross-check if the running kernel contains the fix(es)?

Yes, the old one.
Thanks for the fix.

Norbert





RE: Best way to detect if a filedescriptor is a cobalt filedescriptor (/socket)

2019-07-10 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Mittwoch, 10. Juli 2019 08:13
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) ; Philippe Gerum
> 
> Subject: Re: Best way to detect if a filedescriptor is a cobalt filedescriptor
> (/socket)
>
> E-MAIL FROM A NON-ANDRITZ SOURCE: AS A SECURITY MEASURE, PLEASE
> EXERCISE CAUTION WITH E-MAIL CONTENT AND ANY LINKS OR
> ATTACHMENTS.
>
>
> On 09.07.19 16:49, Lange Norbert via Xenomai wrote:
> > Hi,
> >
> > I am opening a packetsocket, which is supposed to be realtime.
> > Unfortunatly if the rtpacket (rtnet?) module is not loaded, then this will
> just silently fall back to a linux packet socket. Then later demote thread
> during accesses.
> >
> > How would I be able to detect this early during startup? I could
> __STD(close) the descriptor and check the returncode for EBADF I suppose...
> >
>
> Yeah, looks this is some feature we lost while embedding the RTDM file
> descriptor range into the regular Linux space.

My scheme does not work either: __STD(close) seems to return 0 for the cobalt
fd, but doesn't seem to do anything beyond that.

Note: I am using Philippe's "cobalt: switch hand over status to -ENODEV for
non-RTDM fd" patch, so potentially it's a regression of this patch.

> We could either add a compile-time or runtime feature to libcobalt that
> permits to disable this silent fallback again or introduce alternative open 
> and
> socket implementations that do not expose this behavior. Spontaneously, I
> would be in favor or a runtime switch for the existing implementations.

I assumed that __RT(open) would call the __cobalt_open function (without
fallback), __STD(open) would call libc's open, and __wrap_open would be the
only function trying both.
Turns out that __wrap and __cobalt are identical, but I don't understand the
reasoning behind it.

Norbert




