RE: [ANNOUNCE] Xenomai v3.1-rc3 available

2019-11-14 Thread Jeff Webb via Xenomai
I am building a Xenomai cobalt kernel under Ubuntu 18.04 based on:

   Linux 4.19.75
   Xenomai 3.1-rc3
   ipipe-core-4.19.75-x86-7.patch

After running prepare-kernel.sh and "make olddefconfig" (using the unmodified
default .config), I get this warning:

   WARNING: unmet direct dependencies detected for PARAVIRT
     Depends on [n]: HYPERVISOR_GUEST [=y] && !IPIPE [=y]
     Selected by [m]:
     - HYPERV [=m] && X86 [=y] && ACPI [=y] && X86_LOCAL_APIC [=y] && HYPERVISOR_GUEST [=y]

I turned off HYPERVISOR_GUEST manually to fix the warning.  I hadn't seen this 
type of error before, but perhaps something could be done in the patch to make 
the default config consistent...  Just FYI.

Otherwise, everything went fine with the build process and initial testing.  I 
will be giving this build some more testing over the next week or so and will 
let you know if I find any issues.

Thanks,

-Jeff





Re: INTR-REMAP error with UDD driver

2019-11-14 Thread Jeff Webb via Xenomai
‐‐‐ Original Message ‐‐‐
On Thursday, November 14, 2019 3:41 PM, Jan Kiszka wrote:
> On 14.11.19 14:16, Jeff Webb wrote:
> > ‐‐‐ Original Message ‐‐‐
> > On Thursday, November 14, 2019 1:50 AM, Jan Kiszka jan.kis...@siemens.com 
> > wrote:
> >
> > > On 14.11.19 06:05, Jeff Webb via Xenomai wrote:
> > >
> > > > I would like to revive this thread from several months ago:
> > > > https://xenomai.org/pipermail/xenomai/2019-March/040498.html
> > > > The issue is that on some hardware (a specific rack-mount PC with a 
> > > > PICMG daughtercard on a backplane containing PCI and PCIe slots) I get 
> > > > an INTR-REMAP error when trying to receive legacy (not MSI) interrupts 
> > > > from a custom FPGA-based PCI card using a UDD driver. The card did work 
> > > > properly in one out of the five PCI slots on that machine, but UDD 
> > > > interrupts did not work in the other four slots.
> > > > Please review the original thread for more details about the specific 
> > > > error.
> > > > Here are a few more tidbits I have gathered:
> > > >
> > > > -   The UDD driver / userspace code works fine on the other hardware
> > > >
> > > > -   The UDD driver / userspace code works fine in one PCI slot out of 
> > > > five on this hardware.
> > > >
> > > > -   With another backplane model, but same processor card, the problem 
> > > > occurs in all four of the PCI slots.
> > > >
> > > > -   An almost identical pure-linux UIO version of the driver / 
> > > > userspace code works in all the cases I tested, even when the UDD 
> > > > version fails, and even with the same xenomai-patched kernel used for 
> > > > UDD testing.
> > > >
> > > >
> > > > In one of the previous posts in this thread a few months ago, Per Öberg 
> > > > mentioned experiencing something similar. Based on the information that 
> > > > was shared, I tried my code with linux version 4.9.38, but it still 
> > > > failed. This prompted me to try other linux / ipipe / xenomai 
> > > > combinations. These are my findings:
> > > > Interrupts work:
> > > > xenomai-2.6.5 ipipe-core-3.18.20-x86-7.patch (2016-07-05)
> > > > xenomai-3.0.9+ ipipe-core-3.18.20-x86-7.patch (2016-07-05)
> > > > xenomai-3.0.9+ ipipe-core-4.1.18-x86-9.patch (2017-05-25)
> > > > INTR-REMAP error:
> > > > xenomai-3.0.9+ ipipe-core-4.4.43-x86-6.patch (2017-02-25)
> > > > xenomai-3.0.9+ ipipe-core-4.4.43-x86-7.patch (2017-05-25)
> > > > xenomai-3.0.9+ ipipe-core-4.4.43-x86-8.patch (2017-06-14)
> > > > xenomai-3.1-rc3 ipipe-core-4.4.196-cip38-x86-19.patch (2019-11-04)
> > > > xenomai-3.0.9+ ipipe-core-4.9.38-x86-4.patch (2017-10-03)
> > > > xenomai-3.0.9 ipipe-core-4.14.132-x86-6.patch (2019-07-03)
> > > > The Xenomai 2.6.5 version of course does not use UDD, but uses the old 
> > > > pthread_intr_* userspace functions.
> > > > Hopefully this additional information can shed a little light on the 
> > > > matter.
> > >
> > > This sounds like some RT interrupt enabling issue related to the IOAPIC
> > > in the x86 I-pipe patch. Please also test 4.19.
> >
> > Ok, I will do this.

Today I had a chance to make a build based on the 
ipipe-core-4.19.75-x86-7.patch with xenomai-3.1-rc3.  I haven't had a chance to 
test it thoroughly, but I think it works.  I am sorry I didn't try this 
earlier, and thanks for the reminder to do so.  Since a 4.19 patch wasn't 
available back in early March when I first had the trouble, I got fixated on 
the fact that things worked with very old kernels and forgot to try 4.19 when I 
started looking into this again.  I will continue to test and let you know if I 
find any issues.

> > > Are you using UDD_IRQ_CUSTOM or do you leave the interrupt registration
> > > to the UDD core?
> >
> > I just tell UDD the IRQ number and let it register the interrupt.
> >
> > > And please share your kernel config.
> >
> > I attached one to my original post earlier this year -- you should be able 
> > to download it from the link in the mailing list archive. Let me know if 
> > you need something different. I started with the standard Ubuntu desktop 
> > kernel config and tweaked options from there, so there is a lot of stuff 
> > enabled, obviously.
> >
> > > BTW, interrupt remapping issues can be worked around by disabling the
> > > interrupt remapping feature (e.g. "intremap=off"). But that does not
> > > solve the underlying issue, of course.
> >
> > I can't remember if I tried this or not. I will give it a go. Obviously, it 
> > would be good to get this fixed in the patch, though.
>
> I've reproduced a problem with INTx/IOAPIC and intremap=on on 4.4 in
> KVM. If we are lucky, it's the same as yours. Will debug that tomorrow.

Thank you so much for looking into this, Jan.

-Jeff




Re: INTR-REMAP error with UDD driver

2019-11-14 Thread Jan Kiszka via Xenomai

On 14.11.19 14:16, Jeff Webb wrote:
> ‐‐‐ Original Message ‐‐‐
> On Thursday, November 14, 2019 1:50 AM, Jan Kiszka wrote:
>
> > On 14.11.19 06:05, Jeff Webb via Xenomai wrote:
> >
> > > I would like to revive this thread from several months ago:
> > > https://xenomai.org/pipermail/xenomai/2019-March/040498.html
> > > The issue is that on some hardware (a specific rack-mount PC with a
> > > PICMG daughtercard on a backplane containing PCI and PCIe slots) I get
> > > an INTR-REMAP error when trying to receive legacy (not MSI) interrupts
> > > from a custom FPGA-based PCI card using a UDD driver. The card did work
> > > properly in one out of the five PCI slots on that machine, but UDD
> > > interrupts did not work in the other four slots.
> > > Please review the original thread for more details about the specific
> > > error.
> > > Here are a few more tidbits I have gathered:
> > >
> > > -   The UDD driver / userspace code works fine on the other hardware
> > >
> > > -   The UDD driver / userspace code works fine in one PCI slot out of
> > >     five on this hardware.
> > >
> > > -   With another backplane model, but same processor card, the problem
> > >     occurs in all four of the PCI slots.
> > >
> > > -   An almost identical pure-linux UIO version of the driver /
> > >     userspace code works in all the cases I tested, even when the UDD
> > >     version fails, and even with the same xenomai-patched kernel used for
> > >     UDD testing.
> > >
> > > In one of the previous posts in this thread a few months ago, Per Öberg
> > > mentioned experiencing something similar. Based on the information that
> > > was shared, I tried my code with linux version 4.9.38, but it still
> > > failed. This prompted me to try other linux / ipipe / xenomai
> > > combinations. These are my findings:
> > > Interrupts work:
> > > xenomai-2.6.5 ipipe-core-3.18.20-x86-7.patch (2016-07-05)
> > > xenomai-3.0.9+ ipipe-core-3.18.20-x86-7.patch (2016-07-05)
> > > xenomai-3.0.9+ ipipe-core-4.1.18-x86-9.patch (2017-05-25)
> > > INTR-REMAP error:
> > > xenomai-3.0.9+ ipipe-core-4.4.43-x86-6.patch (2017-02-25)
> > > xenomai-3.0.9+ ipipe-core-4.4.43-x86-7.patch (2017-05-25)
> > > xenomai-3.0.9+ ipipe-core-4.4.43-x86-8.patch (2017-06-14)
> > > xenomai-3.1-rc3 ipipe-core-4.4.196-cip38-x86-19.patch (2019-11-04)
> > > xenomai-3.0.9+ ipipe-core-4.9.38-x86-4.patch (2017-10-03)
> > > xenomai-3.0.9 ipipe-core-4.14.132-x86-6.patch (2019-07-03)
> > > The Xenomai 2.6.5 version of course does not use UDD, but uses the old
> > > pthread_intr_* userspace functions.
> > > Hopefully this additional information can shed a little light on the
> > > matter.
> >
> > This sounds like some RT interrupt enabling issue related to the IOAPIC
> > in the x86 I-pipe patch. Please also test 4.19.
>
> Ok, I will do this.
>
> > Are you using UDD_IRQ_CUSTOM or do you leave the interrupt registration
> > to the UDD core?
>
> I just tell UDD the IRQ number and let it register the interrupt.
>
> > And please share your kernel config.
>
> I attached one to my original post earlier this year -- you should be able
> to download it from the link in the mailing list archive. Let me know if
> you need something different. I started with the standard Ubuntu desktop
> kernel config and tweaked options from there, so there is a lot of stuff
> enabled, obviously.
>
> > BTW, interrupt remapping issues can be worked around by disabling the
> > interrupt remapping feature (e.g. "intremap=off"). But that does not
> > solve the underlying issue, of course.
>
> I can't remember if I tried this or not. I will give it a go. Obviously, it
> would be good to get this fixed in the patch, though.

I've reproduced a problem with INTx/IOAPIC and intremap=on on 4.4 in
KVM. If we are lucky, it's the same as yours. Will debug that tomorrow.


Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



[PATCH] rtdm: Do not return an error from send/recvmmsg if there are packets

2019-11-14 Thread Jan Kiszka via Xenomai
From: Jan Kiszka 

This is in line with Linux behavior.

We likely still miss an equivalent to sk_err in recvmmsg, though.

Reported-by: Lange Norbert 
Signed-off-by: Jan Kiszka 
---
 kernel/cobalt/rtdm/fd.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/kernel/cobalt/rtdm/fd.c b/kernel/cobalt/rtdm/fd.c
index 0a1c6e44ed..5b2c3834da 100644
--- a/kernel/cobalt/rtdm/fd.c
+++ b/kernel/cobalt/rtdm/fd.c
@@ -734,14 +734,12 @@ int __rtdm_fd_recvmmsg(int ufd, void __user *u_msgvec, unsigned int vlen,
 		xnlock_put_irqrestore(&nklock, s);
 	}
 
-	if (datagrams > 0 &&
-	    (ret == 0 || ret == -ETIMEDOUT || ret == -EWOULDBLOCK)) {
-		/* NOTE: SO_ERROR should be honored for other errors. */
-		rtdm_fd_put(fd);
-		return datagrams;
-	}
 fail:
 	rtdm_fd_put(fd);
+
+	if (datagrams > 0)
+		ret = datagrams;
+
 out:
 	trace_cobalt_fd_recvmmsg_status(current, fd, ufd, ret);
 
@@ -826,13 +824,11 @@ int __rtdm_fd_sendmmsg(int ufd, void __user *u_msgvec, unsigned int vlen,
 		datagrams++;
 	}
 
-	if (datagrams > 0 && (ret == 0 || ret == -EWOULDBLOCK)) {
-		/* NOTE: SO_ERROR should be honored for other errors. */
-		rtdm_fd_put(fd);
-		return datagrams;
-	}
-
 	rtdm_fd_put(fd);
+
+	if (datagrams > 0)
+		ret = datagrams;
+
 out:
 	trace_cobalt_fd_sendmmsg_status(current, fd, ufd, ret);
 
-- 
2.16.4


-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



Re: RTnet sendmmsg and ENOBUFS

2019-11-14 Thread Jan Kiszka via Xenomai

On 14.11.19 19:18, Jan Kiszka wrote:
> On 14.11.19 18:55, Lange Norbert wrote:
> > According to the code in __rtdm_fd_sendmmsg, that’s not what happens:
> > ENOBUFS would be returned instead,
> > and the number of sent packets is lost forever.
> >
> > if (datagrams > 0 && (ret == 0 || ret == -EWOULDBLOCK)) {
> > 	/* NOTE: SO_ERROR should be honored for other errors. */
> > 	rtdm_fd_put(fd);
> > 	return datagrams;
> > }
> >
> > IMHO this condition would need to be added:
> > ((flags & MSG_DONTWAIT) && ret == -ENOBUFS)
> >
> > (Recvmmsg possibly similarly, haven't checked yet)
>
> sendmmsg was only added to Xenomai 3.1. There might be room for
> improvements, if not corrections. So, if we do not return the number of
> sent messages or signal an error where we should not (this is how I read
> the man page currently), this needs a patch...

The implementation of sendmmsg is wrong when comparing it to the man
page and the reference in the kernel (as this is Linux-only):

    /* We only return an error if no datagrams were able to be sent */

says the kernel, for example, and does

    if (datagrams != 0)
        return datagrams;

It also fails to trace on certain exits. Will write a patch.

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
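
The return-value rule Jan quotes from the Linux kernel can be sketched as a small stand-alone helper. This is only an illustration of the rule, not the code in kernel/cobalt/rtdm/fd.c; the name mmsg_result and its parameters are made up for this sketch.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper (not part of Xenomai or Linux): choose the value a
 * sendmmsg()/recvmmsg()-style call hands back to its caller.
 *
 * datagrams: number of messages fully processed before the loop stopped
 * ret:       0 on success, or a negative errno from the last attempt
 */
static int mmsg_result(int datagrams, int ret)
{
	assert(datagrams >= 0 && ret <= 0);

	/* Only report the error if nothing at all was transferred. */
	if (datagrams > 0)
		return datagrams;

	return ret;
}
```

Under this rule, a batch that stops on -ENOBUFS after three messages reports 3; the caller would only see -ENOBUFS from a call that could not transfer anything.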



Re: RTnet sendmmsg and ENOBUFS

2019-11-14 Thread Jan Kiszka via Xenomai

On 14.11.19 18:55, Lange Norbert wrote:
> So, for my setup socket_rtskbs is 16, the rt_igp driver rtskbs are 256TX +
> 256RX.
>
> As said, our software prepares packets before a timeslice and aims to
> minimize system calls and interrupts;
> packets are sent over raw rtsockets.
>
> If I understand __rtdm_fd_sendmmsg and rt_packet_sendmsg correctly,
> sendmsg will pick one of the socket_rtskbs, copy the data from userspace and
> then pass this rtskb to rtdev_xmit.
> I don’t see how a free buffer gets passed back, like README.pools describes;
> I guess rtskb_acquire should somehow do this.

The buffer returns (not necessarily the same one, though) when the
packet was truly sent, and the driver ran its TX cleanup. If you submit
many packets as a chunk, that may block them for a while.

> So in short, I am using only one socket_rtskbs temporarily, as the function
> passes the buffer to the rtdev (rt_igp driver)?

You are using as many rtskbs as it takes to get the data you passed down
forwarded as packets to the NIC, and that as long as the NIC needs to
get that data DMA'ed to the transmitter.

> I suppose the receive path works similarly.

RX works by accepting a global-pool buffer (this is where incoming
packets first end up) filled with data in exchange for an empty rtskb
from the socket pool. That filled rtskb is put into the socket pool once
the data was transferred to userspace.

> Now if I wanted to send nonblocking, i.e. as many packets as possible,
> exhausting the rtskbs, then I would expect the EAGAIN/EWOULDBLOCK error and
> to get back the number of successfully queued packets (so I could drop them
> and send the remaining ones later).

I don't recall why anymore, but we decided to use a different error code
in RTnet for this back then, possibly to differentiate this "should
never ever happen in a deterministic network" from other errors.

> According to the code in __rtdm_fd_sendmmsg, that’s not what happens:
> ENOBUFS would be returned instead,
> and the number of sent packets is lost forever.
>
> if (datagrams > 0 && (ret == 0 || ret == -EWOULDBLOCK)) {
> 	/* NOTE: SO_ERROR should be honored for other errors. */
> 	rtdm_fd_put(fd);
> 	return datagrams;
> }
>
> IMHO this condition would need to be added:
> ((flags & MSG_DONTWAIT) && ret == -ENOBUFS)
>
> (Recvmmsg possibly similarly, haven't checked yet)

sendmmsg was only added to Xenomai 3.1. There might be room for
improvements, if not corrections. So, if we do not return the number of
sent messages or signal an error where we should not (this is how I read
the man page currently), this needs a patch...


Jan
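
The TX buffer accounting described in this message can be pictured with a toy model. This is a sketch under simplifying assumptions, not RTnet code: the pool only counts buffers, and all names (toy_pool, toy_send, toy_tx_cleanup) are invented for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Toy model only: a socket pool that just counts rtskbs. */
struct toy_pool {
	int free_bufs;   /* buffers currently available to the socket */
	int in_flight;   /* buffers handed to the driver, not yet cleaned up */
};

static void toy_pool_init(struct toy_pool *p, int size)
{
	assert(size > 0);
	p->free_bufs = size;
	p->in_flight = 0;
}

/* Queue one packet: takes a buffer from the pool or fails with -ENOBUFS. */
static int toy_send(struct toy_pool *p)
{
	if (p->free_bufs == 0)
		return -ENOBUFS;

	p->free_bufs--;
	p->in_flight++;
	return 0;
}

/* Driver TX cleanup: up to 'done' transmitted buffers return to the pool. */
static int toy_tx_cleanup(struct toy_pool *p, int done)
{
	if (done > p->in_flight)
		done = p->in_flight;

	p->in_flight -= done;
	p->free_bufs += done;
	return done;
}
```

With a pool of 16, as in Norbert's setup, a burst of 17 sends before any TX cleanup fails on the last one in this model; sends succeed again once the cleanup has returned buffers.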


> > -Original Message-
> > From: Xenomai On Behalf Of Lange Norbert via Xenomai
> > Sent: Wednesday, 13 November 2019 18:53
> > To: Jan Kiszka; Xenomai (xenomai@xenomai.org)
> > Subject: RE: RTnet sendmmsg and ENOBUFS
> >
> > > -Original Message-
> > > From: Jan Kiszka
> > > Sent: Wednesday, 13 November 2019 18:39
> > > To: Lange Norbert; Xenomai (xenomai@xenomai.org)
> > > Subject: Re: RTnet sendmmsg and ENOBUFS
> > >
> > > On 13.11.19 16:10, Lange Norbert via Xenomai wrote:
> > > > Hello,
> > > >
> > > > for one of our applications, we have (unfortunately) a single Ethernet
> > > > connection for realtime and non-realtime traffic.
> > > >
> > > > We solve this by sending timeslices with RT first, then filling up the
> > > > remaining space. When stressing the limits (quite possibly beyond, if
> > > > accounting for bugs), the sendmmsg call over a raw socket returns
> > > > ENOBUFS (even with a single small packet).
> > > > I was expecting this call to just block until the resources are
> > > > available.
> > >
> > > Blocking would mean that the sites which make buffers available again
> > > had to signal this. The original design idea was to avoid such overhead
> > > and rather rely on the applications to schedule their submissions
> > > properly and preallocate resources accordingly.
> >
> > Ok.
> > In other words, this is the same behaviour as using MSG_DONTWAIT
> > (with a different errno value).
> >
> > > > Timeslices are 1 ms, so that could be around 12 Kbyte total or ~190
> > > > 60-byte packets (theoretical max).
> > > >
> > > > What variables are involved (what are the Xenomai buffer limits, are
> > > > they shared or per interface) and what choices do I have?
> > > >
> > > > - I could send the packets nonblocking and wait or drop the remaining
> > > >   ones myself
> > > > - I could deal with ENOBUFS the same way as EAGAIN (is there any
> > > >   difference actually?)
> > > > - I could raise the amount of internal buffers somehow
> > >
> > > Check kernel/drivers/net/doc/README.pools
> > >
> > > > Also, while stress testing I get these messages:
> > > >
> > > > [ 5572.044934] hard_start_xmit returned 16
> > > > [ 5572.054989] hard_start_xmit returned 16
> > > > [ 5572.064007] hard_start_xmit returned 16
> > > > [ 5572.067893] hard_start_xmit returned 16
> > > > [ 5572.071739] hard_start_xmit returned 16
> > > > [ 5572.075586] hard_start_xmit returned 16
> > > > [ 5575.096116] hard_start_xmit returned 16
> > > > [ 5579.377038] hard_start_xmit returned 16
> > >
> > > This likely comes from NETDEV_TX_BUSY signaled by the driver. Check the
> > > one you use for reasons. May include "I don't have buffers left".

RE: RTnet sendmmsg and ENOBUFS

2019-11-14 Thread Lange Norbert via Xenomai
So, for my setup socket_rtskbs is 16, the rt_igp driver rtskbs are 256TX + 
256RX.

As said, our software prepares packets before a timeslice and aims to
minimize system calls and interrupts;
packets are sent over raw rtsockets.

If I understand __rtdm_fd_sendmmsg and rt_packet_sendmsg correctly,
sendmsg will pick one of the socket_rtskbs, copy the data from userspace and
then pass this rtskb to rtdev_xmit.
I don’t see how a free buffer gets passed back, like README.pools describes;
I guess rtskb_acquire should somehow do this.

So in short, I am using only one socket_rtskbs temporarily, as the function 
passes
the buffer to the rtdev (rt_igp driver)?
I suppose the receive path works similarly.


Now if I wanted to send nonblocking, i.e. as many packets as possible,
exhausting the rtskbs, then I would expect the EAGAIN/EWOULDBLOCK error and
to get back the number of successfully queued packets (so I could drop them
and send the remaining ones later).

According to the code in __rtdm_fd_sendmmsg, that’s not what happens: ENOBUFS
would be returned instead,
and the number of sent packets is lost forever.

if (datagrams > 0 && (ret == 0 || ret == -EWOULDBLOCK)) {
	/* NOTE: SO_ERROR should be honored for other errors. */
	rtdm_fd_put(fd);
	return datagrams;
}

IMHO this condition would need to be added:
((flags & MSG_DONTWAIT) && ret == -ENOBUFS)

(Recvmmsg possibly similarly, haven't checked yet)

Thanks for the help,
Norbert
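
On the application side, the idea of treating ENOBUFS like EAGAIN while keeping track of what was already queued might look like the sketch below. It assumes a sendmmsg-like call that returns the number of queued messages whenever at least one was queued. To stay self-contained it goes through a hypothetical callback (batch_send_fn) instead of a real socket; send_batch, is_backpressure, fake_link and fake_send are likewise names invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for sendmmsg(): tries to queue 'count' messages
 * starting at index 'first'. Returns the number queued (> 0), or -1 with
 * errno set if none could be queued.
 */
typedef int (*batch_send_fn)(void *ctx, size_t first, size_t count);

/* True for the errno values that mean "no room right now, try later". */
static int is_backpressure(int err)
{
	return err == EAGAIN || err == EWOULDBLOCK || err == ENOBUFS;
}

/*
 * Push as much of the batch as the transport accepts right now.
 * Returns how many messages were queued; *err is 0 if all were queued,
 * otherwise the errno that stopped the loop.
 */
static size_t send_batch(batch_send_fn do_send, void *ctx, size_t total,
			 int *err)
{
	size_t done = 0;

	assert(do_send != NULL && err != NULL);
	*err = 0;

	while (done < total) {
		int n = do_send(ctx, done, total - done);

		if (n <= 0) {
			/* A zero return is treated like "would block". */
			*err = (n < 0) ? errno : EAGAIN;
			break;
		}
		done += (size_t)n;
	}
	return done;
}

/* Simulated transport for the demo: accepts at most 'room' messages. */
struct fake_link {
	size_t room;
};

static int fake_send(void *ctx, size_t first, size_t count)
{
	struct fake_link *link = ctx;
	size_t n = count < link->room ? count : link->room;

	(void)first;
	if (n == 0) {
		errno = ENOBUFS;
		return -1;
	}
	link->room -= n;
	return (int)n;
}
```

The caller can then check is_backpressure(err) and either drop the unsent remainder or carry it over to the next timeslice, starting from the returned count.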

> -Original Message-
> From: Xenomai On Behalf Of Lange Norbert via Xenomai
> Sent: Wednesday, 13 November 2019 18:53
> To: Jan Kiszka; Xenomai (xenomai@xenomai.org)
> Subject: RE: RTnet sendmmsg and ENOBUFS
>
> > -Original Message-
> > From: Jan Kiszka
> > Sent: Wednesday, 13 November 2019 18:39
> > To: Lange Norbert; Xenomai (xenomai@xenomai.org)
> > Subject: Re: RTnet sendmmsg and ENOBUFS
> >
> > On 13.11.19 16:10, Lange Norbert via Xenomai wrote:
> > > Hello,
> > >
> > > for one of our applications, we have (unfortunately) a single Ethernet
> > > connection for realtime and non-realtime traffic.
> > >
> > > We solve this by sending timeslices with RT first, then filling up the
> > > remaining space. When stressing the limits (quite possibly beyond, if
> > > accounting for bugs), the sendmmsg call over a raw socket returns
> > > ENOBUFS (even with a single small packet).
> > > I was expecting this call to just block until the resources are
> > > available.
> >
> > Blocking would mean that the sites which make buffers available again
> > had to signal this. The original design idea was to avoid such overhead
> > and rather rely on the applications to schedule their submissions
> > properly and preallocate resources accordingly.
>
> Ok.
> In other words, this is the same behaviour as using MSG_DONTWAIT
> (with a different errno value).
>
> > > Timeslices are 1 ms, so that could be around 12 Kbyte total or ~190
> > > 60-byte packets (theoretical max).
> > >
> > > What variables are involved (what are the Xenomai buffer limits, are
> > > they shared or per interface) and what choices do I have?
> > >
> > > - I could send the packets nonblocking and wait or drop the remaining
> > >   ones myself
> > > - I could deal with ENOBUFS the same way as EAGAIN (is there any
> > >   difference actually?)
> > > - I could raise the amount of internal buffers somehow
> >
> > Check kernel/drivers/net/doc/README.pools
> >
> > > Also, while stress testing I get these messages:
> > >
> > > [ 5572.044934] hard_start_xmit returned 16
> > > [ 5572.054989] hard_start_xmit returned 16
> > > [ 5572.064007] hard_start_xmit returned 16
> > > [ 5572.067893] hard_start_xmit returned 16
> > > [ 5572.071739] hard_start_xmit returned 16
> > > [ 5572.075586] hard_start_xmit returned 16
> > > [ 5575.096116] hard_start_xmit returned 16
> > > [ 5579.377038] hard_start_xmit returned 16
> >
> > This likely comes from NETDEV_TX_BUSY signaled by the driver. Check the
> > one you use for reasons. May include "I don't have buffers left".
>
> Yes it does, I was afraid this would indicate some leaked buffers.
>
> Norbert

Re: Xenomai crashes when breaking into the debugger

2019-11-14 Thread Jeff Webb via Xenomai
‐‐‐ Original Message ‐‐‐
On Thursday, November 14, 2019 1:43 AM, Jan Kiszka wrote:
> On 14.11.19 02:58, Jeff Webb via Xenomai wrote:
>
> > Lange Norbert via Xenomai wrote:
> >
> > > > From: Jan Kiszka 
> > > > On 13.11.19 16:18, Lange Norbert via Xenomai wrote:
> > > >
> > > > > I am running into some bad issues with debugging, can't really narrow
> > > > > down when they happen, but usually when I run through GDB and want to
> > > > > "break" (pause execution), it seems to be related to other Xenomai
> > > > > programs running at the same time (as said, it's hard to narrow down).
> > > >
> > > > We have a gdb test case. Does it trigger for you as well when you run
> > > > some other program in parallel?
> > > > Also, could you provide the full kernel log? Possibly, enabling the
> > > > I-pipe tracer with panic dump could be useful as well. But the most
> > > > important step would be to create reproducibility for a third party
> > > > like me.
> > >
> > > Currently the issue is gone, and I don't have time for researching the
> > > cause.
> > > Is panic dump a kernel compilation config?
> >
> > I think one of my colleagues has experienced something similar.
> > He said that when one application was stopped at a breakpoint,
> > it caused sem_timedwait calls in another application to not time
> > out until execution of the other program was resumed. I will ask
> > and see if he can put together a reproducible test case. I know
> > the problem was repeatable at one point with the two applications
> > he was working with.
>
> This particular behavior is solved in 3.1 by
> https://gitlab.denx.de/Xenomai/xenomai/commit/9ebc2b6ea49406026e9e69d8fa490b3f8d8f0a24.

That is great.  Thanks for pointing this out.

> > I have personally experienced what seems (to me) to be a similar
> > issue involving signal handling where a signal handling thread
> > received a SIGINT via sigwait (other threads had SIGINT blocked),
> > and tried to set a global variable that should have caused the
> > other threads to terminate. The other threads had an issue where
> > they would not wake up from sem_timedwait calls (or even sleep
> > calls) after the SIGINT was received by the other thread, so they
> > would not terminate properly. The same code worked fine under
> > Xenomai 2.6. I tried to create a standalone example to reproduce
> > this today, but I could not recreate the problem. I know it was very
> > reproducible when I was constructing a work-around for it.
> > Could it be that some fault occurs that causes subsequent bad
> > behavior with respect to signal handling (SIGINT/debugging) that
> > is fixed by a reboot?
> > Just trying to shed some light on the problem. I think there is
> > a bug here somewhere...
>
> Stand-alone test cases or test sequences are always welcome! Just please
> also make sure to test 3.1-rc, as the debugging code changed there quite a bit.

Also good to know.  Thanks again!

-Jeff



Re: INTR-REMAP error with UDD driver

2019-11-14 Thread Jeff Webb via Xenomai
‐‐‐ Original Message ‐‐‐
On Thursday, November 14, 2019 1:50 AM, Jan Kiszka wrote:
> On 14.11.19 06:05, Jeff Webb via Xenomai wrote:
> > I would like to revive this thread from several months ago:
> > https://xenomai.org/pipermail/xenomai/2019-March/040498.html
> > The issue is that on some hardware (a specific rack-mount PC with a PICMG 
> > daughtercard on a backplane containing PCI and PCIe slots) I get an 
> > INTR-REMAP error when trying to receive legacy (not MSI) interrupts from a 
> > custom FPGA-based PCI card using a UDD driver. The card did work properly 
> > in one out of the five PCI slots on that machine, but UDD interrupts did 
> > not work in the other four slots.
> > Please review the original thread for more details about the specific error.
> > Here are a few more tidbits I have gathered:
> >
> > -   The UDD driver / userspace code works fine on the other hardware
> >
> > -   The UDD driver / userspace code works fine in one PCI slot out of five 
> > on this hardware.
> >
> > -   With another backplane model, but same processor card, the problem 
> > occurs in all four of the PCI slots.
> >
> > -   An almost identical pure-linux UIO version of the driver / userspace 
> > code works in all the cases I tested, even when the UDD version fails, and 
> > even with the same xenomai-patched kernel used for UDD testing.
> >
> >
> > In one of the previous posts in this thread a few months ago, Per Öberg 
> > mentioned experiencing something similar. Based on the information that was 
> > shared, I tried my code with linux version 4.9.38, but it still failed. 
> > This prompted me to try other linux / ipipe / xenomai combinations. These 
> > are my findings:
> > Interrupts work:
> > xenomai-2.6.5 ipipe-core-3.18.20-x86-7.patch (2016-07-05)
> > xenomai-3.0.9+ ipipe-core-3.18.20-x86-7.patch (2016-07-05)
> > xenomai-3.0.9+ ipipe-core-4.1.18-x86-9.patch (2017-05-25)
> > INTR-REMAP error:
> > xenomai-3.0.9+ ipipe-core-4.4.43-x86-6.patch (2017-02-25)
> > xenomai-3.0.9+ ipipe-core-4.4.43-x86-7.patch (2017-05-25)
> > xenomai-3.0.9+ ipipe-core-4.4.43-x86-8.patch (2017-06-14)
> > xenomai-3.1-rc3 ipipe-core-4.4.196-cip38-x86-19.patch (2019-11-04)
> > xenomai-3.0.9+ ipipe-core-4.9.38-x86-4.patch (2017-10-03)
> > xenomai-3.0.9 ipipe-core-4.14.132-x86-6.patch (2019-07-03)
> > The Xenomai 2.6.5 version of course does not use UDD, but uses the old 
> > pthread_intr_* userspace functions.
> > Hopefully this additional information can shed a little light on the matter.
>
> This sounds like some RT interrupt enabling issue related to the IOAPIC
> in the x86 I-pipe patch. Please also test 4.19.

Ok, I will do this.

> Are you using UDD_IRQ_CUSTOM or do you leave the interrupt registration
> to the UDD core?

I just tell UDD the IRQ number and let it register the interrupt.

> And please share your kernel config.

I attached one to my original post earlier this year -- you should be able to 
download it from the link in the mailing list archive.  Let me know if you need 
something different.  I started with the standard Ubuntu desktop kernel config 
and tweaked options from there, so there is a lot of stuff enabled, obviously.

> BTW, interrupt remapping issues can be worked around by disabling the
> interrupt remapping feature (e.g. "intremap=off"). But that does not
> solve the underlying issue, of course.

I can't remember if I tried this or not.  I will give it a go.  Obviously, it 
would be good to get this fixed in the patch, though.

Thank you (and Per Öberg) for your help.

-Jeff