Re: INTR-REMAP error with UDD driver

2019-11-13 Thread Jan Kiszka via Xenomai

On 14.11.19 06:05, Jeff Webb via Xenomai wrote:

I would like to revive this thread from several months ago:

https://xenomai.org/pipermail/xenomai/2019-March/040498.html

The issue is that on some hardware (a specific rack-mount PC with a PICMG 
daughtercard on a backplane containing PCI and PCIe slots) I get an INTR-REMAP 
error when trying to receive legacy (not MSI) interrupts from a custom 
FPGA-based PCI card using a UDD driver.  The card did work properly in one out 
of the five PCI slots on that machine, but UDD interrupts did not work in the 
other four slots.

Please review the original thread for more details about the specific error.

Here are a few more tidbits I have gathered:

- The UDD driver / userspace code works fine on the other hardware

- The UDD driver / userspace code works fine in one PCI slot out of five on 
this hardware.

- With another backplane model, but same processor card, the problem occurs in 
all four of the PCI slots.

- An almost identical pure-linux UIO version of the driver / userspace code 
works in all the cases I tested, even when the UDD version fails, and even with 
the same xenomai-patched kernel used for UDD testing.

In one of the previous posts in this thread a few months ago, Per Öberg 
mentioned experiencing something similar.  Based on the information that was 
shared, I tried my code with linux version 4.9.38, but it still failed.  This 
prompted me to try other linux / ipipe / xenomai combinations.  These are my 
findings:

Interrupts work:
xenomai-2.6.5   ipipe-core-3.18.20-x86-7.patch  (2016-07-05)
xenomai-3.0.9+  ipipe-core-3.18.20-x86-7.patch  (2016-07-05)
xenomai-3.0.9+  ipipe-core-4.1.18-x86-9.patch   (2017-05-25)

INTR-REMAP error:
xenomai-3.0.9+  ipipe-core-4.4.43-x86-6.patch   (2017-02-25)
xenomai-3.0.9+  ipipe-core-4.4.43-x86-7.patch   (2017-05-25)
xenomai-3.0.9+  ipipe-core-4.4.43-x86-8.patch   (2017-06-14)
xenomai-3.1-rc3 ipipe-core-4.4.196-cip38-x86-19.patch (2019-11-04)
xenomai-3.0.9+  ipipe-core-4.9.38-x86-4.patch   (2017-10-03)
xenomai-3.0.9   ipipe-core-4.14.132-x86-6.patch (2019-07-03)

The Xenomai 2.6.5 version of course does not use UDD, but uses the old 
pthread_intr_* userspace functions.

Hopefully this additional information can shed a little light on the matter.



This sounds like some RT interrupt enabling issue related to the IOAPIC 
in the x86 I-pipe patch. Please also test 4.19.


Are you using UDD_IRQ_CUSTOM or do you leave the interrupt registration 
to the UDD core?


And please share your kernel config.

BTW, interrupt remapping issues can be worked around by disabling the 
interrupt remapping feature (e.g. "intremap=off"). But that does not 
solve the underlying issue, of course.


Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



Re: Xenomai crashes when breaking into the debugger

2019-11-13 Thread Jan Kiszka via Xenomai

On 14.11.19 02:58, Jeff Webb via Xenomai wrote:

Lange Norbert via Xenomai wrote:

From: Jan Kiszka 
On 13.11.19 16:18, Lange Norbert via Xenomai wrote:

I am running into some bad issues with debugging, can't really narrow
down when they happen, but usually when I run through GDB and want to
"break" (pause execution), it seems to be related to *other* Xenomai
programs running at the same time (as I said, it's hard to narrow down).

We have a gdb test case. Does it trigger for you as well when you run some
other program in parallel?

Also, could you provide the kernel full log? Possibly, enabling the I-pipe
tracer with panic dump could be useful as well. But the most important step
would be to create reproducibility for a third party like me.


Currently the issue is gone, and I don't have time for researching the cause.
Is panic dump a kernel compilation config option?


I think one of my colleagues has experienced something similar.
He said that when one application was stopped at a breakpoint,
sem_timedwait calls in another application did not time out
until execution of the stopped program was resumed.  I will ask
and see if he can put together a reproducible test case.  I know
the problem was repeatable at one point with the two applications
he was working with.


This particular behavior is solved in 3.1 by 
https://gitlab.denx.de/Xenomai/xenomai/commit/9ebc2b6ea49406026e9e69d8fa490b3f8d8f0a24.




I have personally experienced what seems (to me) to be a similar
issue involving signal handling where a signal handling thread
received a SIGINT via sigwait (other threads had SIGINT blocked),
and tried to set a global variable that should have caused the
other threads to terminate.  The other threads had an issue where
they would not wake up from sem_timedwait calls (or even sleep
calls) after the SIGINT was received by the other thread, so they
would not terminate properly.  The same code worked fine under
Xenomai 2.6.  I tried to create a standalone example to reproduce
this today, but I couldn't recreate the problem.  I know it was very
reproducible when I was constructing a work-around for it.

Could it be that some fault occurs that causes subsequent bad
behavior with respect to signal handling (SIGINT/debugging) that
is fixed by a reboot?

Just trying to shed some light on the problem.  I think there is
a bug here somewhere...


Stand-alone test cases or test sequences are always welcome! Just please 
also make sure to test 3.1-rc, as the debugging code changed there quite a bit.


Jan




Re: Fosdem?

2019-11-13 Thread Pierre FICHEUX via Xenomai
It's a must-go.

One or 2 guys from Smile are Buildroot developers. The BR dev days meeting
is hosted by Google just after FOSDEM.

https://www.elinux.org/Buildroot:DeveloperDaysFOSDEM2019

A similar approach should work for us (?)


On Wed, 13 Nov 2019 at 09:12, Jan Leupold via Xenomai wrote:

> On 11.11.19 at 10:32, Jan Kiszka via Xenomai wrote:
> > On 11.11.19 10:20, Pierre FICHEUX via Xenomai wrote:
> >> Should be fine. IMHO the Xenomai project is a must but lacks communication
> >> and events (compared to PREEMPT_RT) :-)
> >>
> >
> > I don't disagree. And I welcome it when things are not only kicked off by
> > its maintainer.
> >
> > So: Who else from the community plans to go to Fosdem or would do so if we
> > organize a meetup?
>
> This would be a reason for us to mark Fosdem as a "must go" event!
>
> Jan
>
> >
> > Topics for discussion or potentially presentation that come to my mind:
> >
> >  - 3.1 summary
> >  - status and plans for CI and testing
> >  - Dovetail/Steely walk-through
> >  - future of RTnet (modernization needs, TSN, ...)
> >  - 
> >
> > Jan
> >
> >> On Sun, 10 Nov 2019 at 20:09, Jan Kiszka wrote:
> >>
> >>> On 10.11.19 19:31, Pierre FICHEUX via Xenomai wrote:
> >>>> Hi,
> >>>>
> >>>> Two years ago (?) there was a Xenomai meeting during FOSDEM, is there any
> >>>> plan for a new one in 2020?
> >>>
> >>> Thanks for bringing this up! There is no plan yet, but if folks indicate
> >>> interest, we could try to organize a meeting again, be it offsite as
> >>> last time or informally during the event.
> >>>
> >>> Jan
> >>>
> >
>
>
> --
> _
> R-S-I Elektrotechnik GmbH & Co. KG
> Woelkestrasse 11
> D-85301 Schweitenkirchen
> Fon: +49 8444 9204-0
> Fax: +49 8444 9204-50
> www.rsi-elektrotechnik.de
>
> _
> Amtsgericht Ingolstadt - GmbH: HRB 191328 - KG: HRA 170363
> Geschäftsführer: Dr.-Ing. Michael Sorg, Dipl.-Ing. Franz Sorg
> USt-IdNr.: DE 128592548
>
>
>

-- 

Pierre FICHEUX -/- CTO Smile ECS, France -\- pierre.fich...@smile.fr
 http://www.smile.fr
 https://smile.eu/fr/offres/embarque-iot
I would love to change the world, but they won't give me the source code


RE: INTR-REMAP error with UDD driver

2019-11-13 Thread Jeff Webb via Xenomai
I would like to revive this thread from several months ago:

https://xenomai.org/pipermail/xenomai/2019-March/040498.html

The issue is that on some hardware (a specific rack-mount PC with a PICMG 
daughtercard on a backplane containing PCI and PCIe slots) I get an INTR-REMAP 
error when trying to receive legacy (not MSI) interrupts from a custom 
FPGA-based PCI card using a UDD driver.  The card did work properly in one out 
of the five PCI slots on that machine, but UDD interrupts did not work in the 
other four slots.

Please review the original thread for more details about the specific error.

Here are a few more tidbits I have gathered:

- The UDD driver / userspace code works fine on the other hardware

- The UDD driver / userspace code works fine in one PCI slot out of five on 
this hardware.

- With another backplane model, but same processor card, the problem occurs in 
all four of the PCI slots.

- An almost identical pure-linux UIO version of the driver / userspace code 
works in all the cases I tested, even when the UDD version fails, and even with 
the same xenomai-patched kernel used for UDD testing.

In one of the previous posts in this thread a few months ago, Per Öberg 
mentioned experiencing something similar.  Based on the information that was 
shared, I tried my code with linux version 4.9.38, but it still failed.  This 
prompted me to try other linux / ipipe / xenomai combinations.  These are my 
findings:

Interrupts work:
xenomai-2.6.5   ipipe-core-3.18.20-x86-7.patch  (2016-07-05)
xenomai-3.0.9+  ipipe-core-3.18.20-x86-7.patch  (2016-07-05)
xenomai-3.0.9+  ipipe-core-4.1.18-x86-9.patch   (2017-05-25)

INTR-REMAP error:
xenomai-3.0.9+  ipipe-core-4.4.43-x86-6.patch   (2017-02-25)
xenomai-3.0.9+  ipipe-core-4.4.43-x86-7.patch   (2017-05-25)
xenomai-3.0.9+  ipipe-core-4.4.43-x86-8.patch   (2017-06-14)
xenomai-3.1-rc3 ipipe-core-4.4.196-cip38-x86-19.patch (2019-11-04)
xenomai-3.0.9+  ipipe-core-4.9.38-x86-4.patch   (2017-10-03)
xenomai-3.0.9   ipipe-core-4.14.132-x86-6.patch (2019-07-03)

The Xenomai 2.6.5 version of course does not use UDD, but uses the old 
pthread_intr_* userspace functions.

Hopefully this additional information can shed a little light on the matter.

Thanks in advance for any input you can provide,

-Jeff Webb




Xenomai crashes when breaking into the debugger

2019-11-13 Thread Jeff Webb via Xenomai
Jeff Webb wrote:
> I have personally experienced what seems (to me) to be a similar
> issue involving signal handling where a signal handling thread
> received a SIGINT via sigwait (other threads had SIGINT blocked),
> and tried to set a global variable that should have caused the
> other threads to terminate.  The other threads had an issue where
> they would not wake up from sem_timedwait calls (or even sleep
> calls) after the SIGINT was received by the other thread, so they
> would not terminate properly.

I forgot to mention that I solved this issue by creating the
signal handling thread using __STD(pthread_create) instead of the
cobalt version.  When this linux thread received a SIGINT, the
other threads continued to operate normally.  There seemed to be
a difference in behavior even though I also used SCHED_OTHER with
the cobalt pthread_create call.  This may be a clue...  If I can
reproduce the issue again with a simple test case, I will post
it.  I attempted to do this earlier today, but I couldn't
recreate the problem.

Thanks,

-Jeff




RE: Xenomai crashes when breaking into the debugger

2019-11-13 Thread Jeff Webb via Xenomai
Lange Norbert via Xenomai wrote:
> > From: Jan Kiszka 
> > On 13.11.19 16:18, Lange Norbert via Xenomai wrote:
> > > I am running into some bad issues with debugging, can't really narrow
> > > down when they happen, but usually when I run through GDB and want to
> > > "break" (pause execution), it seems to be related to *other* Xenomai
> > programs running at the same time (as said its hard to narrow down).
> >
> > We have a gdb test case. Does it trigger for you as well when you run some
> > other program in parallel?
> >
> > Also, could you provide the kernel full log? Possibly, enabling the I-pipe
> > tracer with panic dump could be useful as well. But the most important step
> > would be to create reproducibility for a third party like me.
>
> Currently the issue is gone, and I don't have time for researching the cause.
> Is panic dump a kernel compilation config option?

I think one of my colleagues has experienced something similar.
He said that when one application was stopped at a breakpoint,
sem_timedwait calls in another application did not time out
until execution of the stopped program was resumed.  I will ask
and see if he can put together a reproducible test case.  I know
the problem was repeatable at one point with the two applications
he was working with.
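For reference, sem_timedwait() takes an absolute CLOCK_REALTIME deadline rather than a relative timeout, so a timed wait is normally written as "now + timeout". Below is a minimal plain-POSIX sketch of that pattern; the helper names deadline_after_ms() and sem_wait_ms() are hypothetical, not Xenomai API, and whether the libcobalt-wrapped calls behave identically under a stopped debuggee is exactly the open question here, not something this sketch exercises:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Absolute CLOCK_REALTIME deadline `ms` milliseconds from now,
 * as sem_timedwait() expects; tv_nsec is kept within [0, 1e9). */
static struct timespec deadline_after_ms(long ms)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += ms / 1000;
	ts.tv_nsec += (ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec += 1;
		ts.tv_nsec -= 1000000000L;
	}
	return ts;
}

/* Wait on `sem` for at most `ms` milliseconds. Returns 0 if the semaphore
 * was taken, ETIMEDOUT on timeout, or another errno value on failure.
 * EINTR is retried against the same absolute deadline. */
static int sem_wait_ms(sem_t *sem, long ms)
{
	struct timespec ts = deadline_after_ms(ms);

	while (sem_timedwait(sem, &ts) != 0) {
		if (errno != EINTR)
			return errno;
	}
	return 0;
}
```

Because the deadline is absolute, retrying after EINTR does not extend the total wait.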

I have personally experienced what seems (to me) to be a similar
issue involving signal handling where a signal handling thread
received a SIGINT via sigwait (other threads had SIGINT blocked),
and tried to set a global variable that should have caused the
other threads to terminate.  The other threads had an issue where
they would not wake up from sem_timedwait calls (or even sleep
calls) after the SIGINT was received by the other thread, so they
would not terminate properly.  The same code worked fine under
Xenomai 2.6.  I tried to create a standalone example to reproduce
this today, but I couldn't recreate the problem.  I know it was very
reproducible when I was constructing a work-around for it.
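A minimal plain-POSIX sketch of the shutdown pattern described above, assuming a fixed set of worker threads that poll a shared flag between sleeps: SIGINT is blocked before any thread is created, so only the dedicated thread consumes it via sigwait(). All names are hypothetical. Per the follow-up in this thread, the workaround under Cobalt was to create the signal thread with __STD(pthread_create), i.e. as a regular Linux thread; this sketch uses no Xenomai wrappers at all:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <time.h>
#include <unistd.h>

#define NUM_WORKERS 3

static atomic_int stop_requested;  /* set once by the signal thread */
static atomic_int workers_exited;  /* workers that noticed the flag */

/* Worker: sleeps in short periods and re-checks the stop flag on each wake-up. */
static void *worker(void *arg)
{
	const struct timespec period = { 0, 10 * 1000000L };  /* 10 ms */

	(void)arg;
	while (!atomic_load(&stop_requested))
		nanosleep(&period, NULL);
	atomic_fetch_add(&workers_exited, 1);
	return NULL;
}

/* The only thread that ever consumes SIGINT, synchronously via sigwait(). */
static void *signal_thread(void *arg)
{
	const sigset_t *set = arg;
	int sig;

	if (sigwait(set, &sig) == 0 && sig == SIGINT)
		atomic_store(&stop_requested, 1);
	return NULL;
}

/* Blocks SIGINT in the calling thread before creating any thread, so every
 * new thread inherits the blocked mask; then raises SIGINT and joins all.
 * Returns the number of workers that terminated. */
static int run_shutdown_demo(void)
{
	static sigset_t set;
	pthread_t workers[NUM_WORKERS], sigthr;

	sigemptyset(&set);
	sigaddset(&set, SIGINT);
	pthread_sigmask(SIG_BLOCK, &set, NULL);

	for (int i = 0; i < NUM_WORKERS; i++)
		pthread_create(&workers[i], NULL, worker, NULL);
	pthread_create(&sigthr, NULL, signal_thread, &set);

	kill(getpid(), SIGINT);  /* stands in for Ctrl-C */

	pthread_join(sigthr, NULL);
	for (int i = 0; i < NUM_WORKERS; i++)
		pthread_join(workers[i], NULL);
	return atomic_load(&workers_exited);
}
```

On plain Linux all workers wake within one period and exit; the reported symptom is that, under Cobalt, they stayed asleep instead.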

Could it be that some fault occurs that causes subsequent bad
behavior with respect to signal handling (SIGINT/debugging) that
is fixed by a reboot?

Just trying to shed some light on the problem.  I think there is
a bug here somewhere...

Thanks,

-Jeff Webb




RE: Xenomai crashes when breaking into the debugger

2019-11-13 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Wednesday, 13 November 2019 18:42
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: Xenomai crashes when breaking into the debugger
>
>
> On 13.11.19 16:18, Lange Norbert via Xenomai wrote:
> > Hello,
> >
> > I am running into some bad issues with debugging, can't really narrow
> > down when they happen, but usually when I run through GDB and want to
> > "break" (pause execution), it seems to be related to *other* Xenomai
> > programs running at the same time (as I said, it's hard to narrow down).
>
> We have a gdb test case. Does it trigger for you as well when you run some
> other program in parallel?
>
> Also, could you provide the kernel full log? Possibly, enabling the I-pipe
> tracer with panic dump could be useful as well. But the most important step
> would be to create reproducibility for a third party like me.

Currently the issue is gone, and I don't have time for researching the cause.
Is panic dump a kernel compilation config option?

Norbert


This message and any attachments are solely for the use of the intended 
recipients. They may contain privileged and/or confidential information or 
other information protected from disclosure. If you are not an intended 
recipient, you are hereby notified that you received this email in error and 
that any review, dissemination, distribution or copying of this email and any 
attachment is strictly prohibited. If you have received this email in error, 
please contact the sender and delete the message and any attachment from your 
system.

ANDRITZ HYDRO GmbH


Rechtsform/ Legal form: Gesellschaft mit beschränkter Haftung / Corporation

Firmensitz/ Registered seat: Wien

Firmenbuchgericht/ Court of registry: Handelsgericht Wien

Firmenbuchnummer/ Company registration: FN 61833 g

DVR: 0605077

UID-Nr.: ATU14756806


Thank You



RE: RTnet sendmmsg and ENOBUFS

2019-11-13 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Wednesday, 13 November 2019 18:39
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: RTnet sendmmsg and ENOBUFS
>
>
> On 13.11.19 16:10, Lange Norbert via Xenomai wrote:
> > Hello,
> >
> > for one of our applications, we have (unfortunately) a single Ethernet
> > connection for real-time and non-real-time traffic.
> >
> > We solve this by sending timeslices with RT first, then filling up the
> > remaining space. When stressing the limits (quite possibly beyond if
> > accounting for bugs), the sendmmsg call over a raw socket returns ENOBUFS
> > (even with a single small packet).
> > I was expecting this call to just block until the resources are available.
>
> Blocking would mean that the sites which make buffers available again had to
> signal this. The original design idea was to avoid such overhead and rather
> rely on the applications to schedule their submissions properly and
> preallocate resources accordingly.

Ok.
In other words, this is the same behaviour as using MSG_DONTWAIT
(with a different errno value).

>
> >
> > Timeslices are 1 ms, so that could be around 12 KB total or ~190 60-byte
> > packets (theoretical max).
> >
> > What variables are involved (what are the Xenomai buffer limits, are they
> > shared or per interface) and what choices do I have?
> >
> > - I could send the packets nonblocking and wait or drop the remaining
> >   myself
> > - I could deal with ENOBUFS the same way as EAGAIN (is there any
> >   difference actually?)
> > - I could raise the amount of internal buffers somehow
>
> Check kernel/drivers/net/doc/README.pools
>
> >
> > Also while stress testing I get these messages:
> >
> > [ 5572.044934] hard_start_xmit returned 16
> > [ 5572.054989] hard_start_xmit returned 16
> > [ 5572.064007] hard_start_xmit returned 16
> > [ 5572.067893] hard_start_xmit returned 16
> > [ 5572.071739] hard_start_xmit returned 16
> > [ 5572.075586] hard_start_xmit returned 16
> > [ 5575.096116] hard_start_xmit returned 16
> > [ 5579.377038] hard_start_xmit returned 16
>
> This likely comes from NETDEV_TX_BUSY signaled by the driver. Check the
> one you use for reasons. May include "I don't have buffers left".

Yes, it does. I was afraid this would indicate some leaked buffers.

Norbert





Re: Xenomai crashes when breaking into the debugger

2019-11-13 Thread Jan Kiszka via Xenomai

On 13.11.19 16:18, Lange Norbert via Xenomai wrote:

Hello,

I am running into some bad issues with debugging, can't really narrow down
when they happen, but usually when I run through GDB and want to "break"
(pause execution), it seems to be related to *other* Xenomai programs running
at the same time (as I said, it's hard to narrow down).


We have a gdb test case. Does it trigger for you as well when you run 
some other program in parallel?


Also, could you provide the kernel full log? Possibly, enabling the 
I-pipe tracer with panic dump could be useful as well. But the most 
important step would be to create reproducibility for a third party like me.


Jan




Re: RTnet sendmmsg and ENOBUFS

2019-11-13 Thread Jan Kiszka via Xenomai

On 13.11.19 16:10, Lange Norbert via Xenomai wrote:

Hello,

for one of our applications, we have (unfortunately) a single Ethernet 
connection for real-time and non-real-time traffic.

We solve this by sending timeslices with RT first, then filling up the 
remaining space. When stressing the limits (quite possibly beyond if accounting 
for bugs), the sendmmsg call over a raw socket returns ENOBUFS (even with a 
single small packet).
I was expecting this call to just block until the resources are available.


Blocking would mean that the sites which make buffers available again 
had to signal this. The original design idea was to avoid such overhead 
and rather rely on the applications to schedule their submissions 
properly and preallocate resources accordingly.




Timeslices are 1 ms, so that could be around 12 KB total or ~190 60-byte 
packets (theoretical max).

What variables are involved (what are the Xenomai buffer limits, are they shared 
or per interface) and what choices do I have?

- I could send the packets nonblocking and wait or drop the remaining myself
- I could deal with ENOBUFS the same way as EAGAIN (is there any difference 
  actually?)
- I could raise the amount of internal buffers somehow


Check kernel/drivers/net/doc/README.pools



Also while stress testing I get these messages:

[ 5572.044934] hard_start_xmit returned 16
[ 5572.054989] hard_start_xmit returned 16
[ 5572.064007] hard_start_xmit returned 16
[ 5572.067893] hard_start_xmit returned 16
[ 5572.071739] hard_start_xmit returned 16
[ 5572.075586] hard_start_xmit returned 16
[ 5575.096116] hard_start_xmit returned 16
[ 5579.377038] hard_start_xmit returned 16


This likely comes from NETDEV_TX_BUSY signaled by the driver. Check the 
one you use for reasons. May include "I don't have buffers left".
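To illustrate what NETDEV_TX_BUSY (0x10, i.e. the 16 in the log) usually means, here is a hypothetical TX descriptor ring in plain C, not taken from any RTnet driver: the transmit hook refuses a packet when every descriptor is still in flight, and only the completion path frees slots again. If completions are lost or buffers leak, the ring stays full and every transmit attempt reports busy:

```c
#include <stdbool.h>
#include <stddef.h>

/* Return codes mirroring the Linux netdev_tx values (NETDEV_TX_BUSY == 0x10). */
enum { XMIT_OK = 0x00, XMIT_BUSY = 0x10 };

#define RING_SIZE 4

/* Hypothetical minimal TX ring: head = next slot to fill,
 * tail = next slot the hardware will complete. */
struct tx_ring {
	const void *slot[RING_SIZE];
	size_t head, tail, used;
};

/* start_xmit-style hook: refuse the packet when no descriptor is free,
 * so the caller has to retry later. */
static int ring_xmit(struct tx_ring *r, const void *pkt)
{
	if (r->used == RING_SIZE)
		return XMIT_BUSY;
	r->slot[r->head] = pkt;
	r->head = (r->head + 1) % RING_SIZE;
	r->used++;
	return XMIT_OK;
}

/* TX-completion path: reclaim the oldest descriptor. If this never runs
 * (or buffers leak), every later ring_xmit() returns XMIT_BUSY. */
static bool ring_complete(struct tx_ring *r)
{
	if (r->used == 0)
		return false;
	r->slot[r->tail] = NULL;
	r->tail = (r->tail + 1) % RING_SIZE;
	r->used--;
	return true;
}
```

A burst of busy returns that stops once traffic calms down points to a ring sized too small for the burst; busy returns that never stop point to slots that are never reclaimed.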


Jan




Xenomai crashes when breaking into the debugger

2019-11-13 Thread Lange Norbert via Xenomai
Hello,

I am running into some bad issues with debugging, can't really narrow down
when they happen, but usually when I run through GDB and want to "break"
(pause execution), it seems to be related to *other* Xenomai programs running
at the same time (as I said, it's hard to narrow down).

Kind regards, Norbert Lange

[10352.719588] I-pipe: Detected stalled head domain, probably caused by a bug.
[10352.719588] A critical section may have been left unterminated.
[10352.733165] CPU: 2 PID: 12883 Comm: aboard_runner Tainted: GW
 4.19.75-xeno7-static #1
[10352.742389] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.21.16 01/31/2019
[10352.751702] I-pipe domain: Linux
[10352.754938] Call Trace:
[10352.757406]  dump_stack+0x82/0xb0
[10352.760735]  ipipe_stall_root+0xc/0x30
[10352.764497]  __ipipe_trap_prologue+0x209/0x210
[10352.768955]  page_fault+0x24/0x5b
[10352.772281] RIP: 0010:xnthread_suspend+0x13/0x4e0
[10352.776992] Code: f8 c3 a5 e8 1f ce f3 ff e9 e4 fe ff ff 66 2e 0f 1f 84 00 
00 00 00 00 e8 bb 0b 87 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 20  c
[10352.795762] RSP: :a45140797e10 EFLAGS: 00010082 ORIG_RAX: 

[10352.803343] RAX:  RBX: eab8 RCX: 
[10352.810485] RDX:  RSI: 0040 RDI: eaf8
[10352.817626] RBP: a4514060ab80 R08:  R09: 
[10352.824770] R10:  R11:  R12: 94477bb30780
[10352.831910] R13:  R14:  R15: 94477bb30400
[10352.839058]  ? __cobalt_clock_nanosleep+0x4b0/0x4b0
[10352.843946]  ? CoBaLt_clock_nanosleep+0x7f/0x100
[10352.848571]  stop_debugged_process+0x51/0x70
[10352.852850]  ipipe_trap_hook+0x2da/0x3f0
[10352.856785]  __ipipe_notify_trap+0x80/0xc0
[10352.860892]  __ipipe_trap_prologue+0x76/0x210
[10352.865259]  ? int3+0x29/0x70
[10352.868236]  int3+0x45/0x70
[10352.871040] RIP: 0033:0x409fb8
[10352.874102] Code: ff 15 34 47 95 00 c7 05 fe e2 99 00 00 00 00 00 48 8d 95 
78 fd ff ff 48 8d 85 70 fe ff ff 48 89 d6 48 89 c7 e8 08 14 00 00 cc  6
[10352.892874] RSP: 002b:7fffe890 EFLAGS: 0297
[10352.898111] RAX:  RBX:  RCX: 0002
[10352.905249] RDX:  RSI:  RDI: 
[10352.912389] RBP: 7fffebb0 R08: 0001 R09: 0015
[10352.919531] R10: 7fffe820 R11: 0246 R12: 00409170
[10352.926671] R13: 7fffec90 R14:  R15: 
[10352.933813]  ? int3+0x29/0x70
[10352.936872] BUG: Unhandled exception over domain Xenomai at 
0xa4b90f23 - switching to ROOT
[10352.945841] CPU: 2 PID: 12883 Comm: aboard_runner Tainted: GW
 4.19.75-xeno7-static #1
[10352.955065] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
BIOS 5.12.30.21.16 01/31/2019
[10352.964374] I-pipe domain: Linux
[10352.967611] Call Trace:
[10352.970070]  dump_stack+0x82/0xb0
[10352.973393]  __ipipe_trap_prologue.cold+0x22/0x4e
[10352.978108]  page_fault+0x24/0x5b
[10352.981435] RIP: 0010:xnthread_suspend+0x13/0x4e0
[10352.986144] Code: f8 c3 a5 e8 1f ce f3 ff e9 e4 fe ff ff 66 2e 0f 1f 84 00 
00 00 00 00 e8 bb 0b 87 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 20  c
[10353.004916] RSP: :a45140797e10 EFLAGS: 00010082 ORIG_RAX: 

[10353.012497] RAX:  RBX: eab8 RCX: 
[10353.019637] RDX:  RSI: 0040 RDI: eaf8
[10353.026777] RBP: a4514060ab80 R08:  R09: 
[10353.033918] R10:  R11:  R12: 94477bb30780
[10353.041060] R13:  R14:  R15: 94477bb30400
[10353.048204]  ? __cobalt_clock_nanosleep+0x4b0/0x4b0
[10353.053087]  ? CoBaLt_clock_nanosleep+0x7f/0x100
[10353.057713]  stop_debugged_process+0x51/0x70
[10353.061991]  ipipe_trap_hook+0x2da/0x3f0
[10353.065921]  __ipipe_notify_trap+0x80/0xc0
[10353.070029]  __ipipe_trap_prologue+0x76/0x210
[10353.074393]  ? int3+0x29/0x70
[10353.077369]  int3+0x45/0x70
[10353.080171] RIP: 0033:0x409fb8
[10353.083232] Code: ff 15 34 47 95 00 c7 05 fe e2 99 00 00 00 00 00 48 8d 95 
78 fd ff ff 48 8d 85 70 fe ff ff 48 89 d6 48 89 c7 e8 08 14 00 00 cc  6
[10353.102002] RSP: 002b:7fffe890 EFLAGS: 0297
[10353.107238] RAX:  RBX:  RCX: 0002
[10353.114380] RDX:  RSI:  RDI: 
[10353.121522] RBP: 7fffebb0 R08: 0001 R09: 0015
[10353.128663] R10: 7fffe820 R11: 0246 R12: 00409170
[10353.135803] R13: 7fffec90 R14:  R15: 
[10353.142947]  ? int3+0x29/0x70
[10353.145931] BUG: unable to handle kernel paging request at fcba
[10353.152900] PGD 

RTnet sendmmsg and ENOBUFS

2019-11-13 Thread Lange Norbert via Xenomai
Hello,

for one of our applications, we have (unfortunately) a single Ethernet 
connection for real-time and non-real-time traffic.

We solve this by sending timeslices with RT first, then filling up the 
remaining space. When stressing the limits (quite possibly beyond if accounting 
for bugs), the sendmmsg call over a raw socket returns ENOBUFS (even with a 
single small packet).
I was expecting this call to just block until the resources are available.

Timeslices are 1 ms, so that could be around 12 KB total or ~190 60-byte 
packets (theoretical max).
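A quick sanity check of the per-slice budget, assuming a 100 Mbit/s link (which is what ~12 KB per 1 ms corresponds to). The helper name is hypothetical; the result depends on whether FCS, preamble/SFD and inter-frame gap (4 + 8 + 12 bytes on Ethernet) are counted per frame:

```c
/* Frames of `frame_bytes` that fit into one slice of `slice_us` microseconds
 * on a link of `link_bps` bits/s, with `overhead` extra bytes per frame
 * (e.g. 24 = FCS + preamble/SFD + inter-frame gap, if not already included). */
static unsigned long frames_per_slice(unsigned long long link_bps,
				      unsigned long slice_us,
				      unsigned int frame_bytes,
				      unsigned int overhead)
{
	/* bytes per second times slice length in seconds */
	unsigned long long slice_bytes = link_bps / 8 * slice_us / 1000000ULL;

	return (unsigned long)(slice_bytes / (frame_bytes + overhead));
}
```

Under these assumptions, frames_per_slice(100000000, 1000, 60, 0) gives 208 frames out of 12500 bytes, while counting 24 bytes of per-frame overhead gives 148.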

What variables are involved (what are the Xenomai buffer limits, are they shared 
or per interface) and what choices do I have?

- I could send the packets nonblocking and wait or drop the remaining myself
- I could deal with ENOBUFS the same way as EAGAIN (is there any difference 
  actually?)
- I could raise the amount of internal buffers somehow
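One way to implement the first two options, sketched on plain Linux: send the slice with sendmmsg() and treat ENOBUFS like EAGAIN, i.e. as "no budget left", returning how many messages went out so the caller can defer or drop the remainder. MSG_DONTWAIT is passed so a regular Linux socket fails instead of blocking, like the RTnet raw socket described in this thread; whether RTnet handles the flag identically is an assumption. The function name is hypothetical:

```c
#define _GNU_SOURCE            /* sendmmsg() and struct mmsghdr */
#include <errno.h>
#include <sys/socket.h>
#include <sys/uio.h>           /* struct iovec, used to fill msg_hdr */

/* Push as many of msgs[0..n) as the stack accepts right now.
 * ENOBUFS and EAGAIN mean "no buffers left in this slice": stop and report
 * how many went out. Returns the number sent, or -1 (errno set) otherwise. */
static int send_slice(int fd, struct mmsghdr *msgs, unsigned int n)
{
	unsigned int sent = 0;

	while (sent < n) {
		int rc = sendmmsg(fd, msgs + sent, n - sent, MSG_DONTWAIT);

		if (rc < 0) {
			if (errno == EINTR)
				continue;
			if (errno == ENOBUFS || errno == EAGAIN)
				break;  /* leave the rest to the caller */
			return -1;
		}
		if (rc == 0)
			break;          /* defensive: never spin */
		sent += (unsigned int)rc;
	}
	return (int)sent;
}
```

The caller can then decide per slice whether the unsent non-RT messages are carried over to the next slice or dropped.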

Also while stress testing I get these messages:

[ 5572.044934] hard_start_xmit returned 16
[ 5572.054989] hard_start_xmit returned 16
[ 5572.064007] hard_start_xmit returned 16
[ 5572.067893] hard_start_xmit returned 16
[ 5572.071739] hard_start_xmit returned 16
[ 5572.075586] hard_start_xmit returned 16
[ 5575.096116] hard_start_xmit returned 16
[ 5579.377038] hard_start_xmit returned 16

Kind regards, Norbert





Re: segmentation error when a task ends

2019-11-13 Thread Jan Kiszka via Xenomai

On 10.11.19 18:25, Philippe Gerum wrote:

On 11/7/19 11:12 AM, davy.regn...@univ-grenoble-alpes.fr wrote:

On 06.11.19 15:23, davy.regn...@univ-grenoble-alpes.fr wrote:

On 06.11.19 14:40, davy.regn...@univ-grenoble-alpes.fr wrote:

On 06.11.19 10:18, Davy via Xenomai wrote:


Hi,

Thanks for patching the shared sessions. It works.

Now I have another segmentation fault that appears when a task ends or
is deleted:



#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <alchemy/task.h>

RT_TASK task;

void foo(void *arg)
{
	int i = 3;

	(void)arg;
	while (i--) {
		printf("Hello !\n");
		rt_task_sleep(10);  /* delay in Alchemy clock ticks */
	}
}

int main(int argc, char *argv[])
{
	int n;

	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
		return 1;
	if ((n = rt_task_spawn(&task, NULL, 0, 99, T_JOINABLE, &foo,
			       NULL)) != 0) {
		rt_printf("rt_task_spawn error %d\n", n);
		return 1;
	}
	printf("Join task\n");
	rt_task_join(&task);
	return EXIT_SUCCESS;
}


$ sudo ./foo
Hello !
Join task
Hello !
Hello !
Segmentation fault



Works fine here. Could you use a debugger to find out where the
exception is thrown?



I get this:

(gdb) run
Starting program: /home/davy/Documents/Programmes_Test/test/foo
[Thread debugging using libthread_db enabled]
Using host libthread_db library
"/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x76db1700 (LWP 14620)]
[New Thread 0x77ff6700 (LWP 14621)]
Hello !
Join task
0"025.863| WARNING: [main] Xenomai compiled with full debug
enabled,
   very high latencies expected
[--enable-debug=full]
Hello !
Hello !

Thread 3 "task@1[14616]" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x77ff6700 (LWP 14621)]
0x77799cf5 in shavlh_link (avl=0x77e47148, holder=0x0,
dir=0)
at ../../include/boilerplate/avl-inner.h:107
107 ptrdiff_t offset = holder->link[avl_type2index(dir)].offset;



Backtrace? Values of the vars in question?


(gdb) backtrace
#0  0x77799cf5 in shavlh_link (avl=0x77e47148, holder=0x0,
dir=0) at ../../include/boilerplate/avl-inner.h:107
#1  0x77799fa7 in shavl_inorder (avl=0x77e47148, holder=0x0,
dir=1) at avl.c:55
#2  0x779b89a5 in shavl_next (avl=0x77e47148,
holder=0x77e4d1c8) at ../../include/boilerplate/avl-inner.h:332
#3  0x779b8fe5 in find_next_neighbour (ext=0x77e47128,
r=0x77e4d1c8) at heapobj-pshared.c:276
#4  0x779b90e0 in release_page_range (ext=0x77e47128,
page=0x77f425c8, size=1024) at heapobj-pshared.c:303
#5  0x779b9f84 in sheapmem_free (heap=0x77e46000,
block=0x77f425c8) at heapobj-pshared.c:594
#6  0x779bbb40 in xnfree (ptr=0x77f425c8) at
heapobj-pshared.c:1203
#7  0x779b46a5 in __threadobj_free (p=0x77f425c8) at
../../include/copperplate/threadobj.h:312
#8  0x779b46d4 in threadobj_free (thobj=0x77f426d0) at
../../include/copperplate/threadobj.h:317
#9  0x779b64d4 in finalize_thread (p=0x77f426d0) at
threadobj.c:1548
#10 0x7735f5f9 in __nptl_deallocate_tsd () at
pthread_create.c:291
#11 0x77360658 in __nptl_deallocate_tsd () at
pthread_create.c:449
#12 start_thread (arg=0x77ff6700) at pthread_create.c:469
#13 0x76e9ad0f in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:97
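Frames #9 to #12 suggest that finalize_thread() runs as a POSIX thread-specific-data destructor (called from __nptl_deallocate_tsd() once the task function has returned), which would explain why the crash only shows up when the task ends. A minimal plain-pthreads sketch of that mechanism, with hypothetical names, not copperplate code:

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t key;
static int destructor_runs;   /* read only after pthread_join() */

/* TSD destructor: invoked by the C library as the thread exits, after its
 * start routine has returned, for each key holding a non-NULL value. */
static void finalize(void *p)
{
	free(p);
	destructor_runs++;
}

static void *task_body(void *arg)
{
	(void)arg;
	pthread_setspecific(key, malloc(64));  /* per-thread state to tear down */
	return NULL;                           /* finalize() runs after this */
}

static int run_tsd_demo(void)
{
	pthread_t t;

	pthread_key_create(&key, finalize);
	pthread_create(&t, NULL, task_body, NULL);
	pthread_join(t, NULL);
	pthread_key_delete(key);
	return destructor_runs;
}
```

Any fault inside such a destructor, as in the shared-heap free path above, therefore surfaces in the exiting thread rather than in the task code itself.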



How did you configure Xenomai userspace (beyond --enable-debug=full)?


$ /usr/xenomai/bin/xeno-config --info
Xenomai version: Xenomai/cobalt v3.1-rc3
Linux XportNew 4.19.75-xenomai-3.1-rc3 #3 SMP PREEMPT Wed Nov 6 13:48:28
CET 2019 x86_64 GNU/Linux
Kernel parameters: BOOT_IMAGE=/boot/vmlinuz-4.19.75-xenomai-3.1-rc3
root=/dev/mapper/isw_bdaaiafbje_Volume11 ro dmraid=true quiet splash
nopat
crashkernel=384M-:128M
I-pipe release #7 detected
Cobalt core 3.1-rc3 detected
Compiler: gcc version 4.9.2 (Debian 4.9.2-10+deb8u1)
Build args: --with-core=cobalt --enable-smp --enable-pshared
--enable-debug=full



This is resolving the crash:

diff --git a/lib/boilerplate/avl.c b/lib/boilerplate/avl.c
index 3bf9bf1345..c13ec8a940 100644
--- a/lib/boilerplate/avl.c
+++ b/lib/boilerplate/avl.c
@@ -53,7 +53,7 @@ struct __AVL_T (avlh) * __AVL(inorder)(const struct __AVL_T(avl) * const avl,
 	} else {
 		for (;;) {
 			next = __AVLH(up)(avl, holder);
-			if (next == __AVL(anchor)(avl))
+			if (!next || next == __AVL(anchor)(avl))
 				return NULL;
 			if (holder->type != dir)
 				break;

But I don't feel like we have a stable setup since 1a79c31c9f8. There
could very well be further cases that now break under the new NULL
return value.
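The failing pattern is an in-order walk that climbs parent links and assumes it will always reach an anchor node; if the top-level "up" link can now be NULL, the next step dereferences it (holder=0x0 in the backtrace). A generic sketch with a hypothetical node type, not the boilerplate AVL structures, showing the kind of guard the patch adds:

```c
#include <stddef.h>

/* Hypothetical binary-tree node with parent links. */
struct node {
	int key;
	struct node *left, *right, *up;
};

/* In-order successor. Climbing from a right child must stop when the
 * parent link is NULL (we walked past the root) instead of dereferencing it. */
static struct node *inorder_next(struct node *n)
{
	if (n->right) {
		n = n->right;
		while (n->left)
			n = n->left;
		return n;
	}
	for (;;) {
		struct node *parent = n->up;

		if (!parent)             /* analogous to the added "!next ||" check */
			return NULL;
		if (parent->left == n)
			return parent;
		n = parent;
	}
}
```

Every other caller that climbs "up" links needs the same check, which is the audit requested below.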

Philippe, please double check all affected code paths.

Jan





Hi,

I still get an error if I call rt_task_join() after rt_task_delete() (like