Re: [PATCH] testsuite/smokey: relax posix-clock time dependency

2018-12-20 Thread Jan Kiszka via Xenomai

On 19.11.18 14:52, Henning Schild wrote:

Am Mon, 19 Nov 2018 10:52:34 +0100
schrieb Philippe Gerum :


On 11/19/18 10:40 AM, Henning Schild wrote:

Am Fri, 16 Nov 2018 07:23:47 +0100
schrieb Jan Kiszka :
   

On 09.11.18 10:14, Henning Schild via Xenomai wrote:

The test assertion often triggered. This patch gives the thread a priority,
introduces a 25us margin, and prints the value in case we still fail.

Signed-off-by: Henning Schild 
---
   testsuite/smokey/posix-clock/posix-clock.c | 15 ++-
   1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/testsuite/smokey/posix-clock/posix-clock.c b/testsuite/smokey/posix-clock/posix-clock.c
index f672a9d52..36e0f5dea 100644
--- a/testsuite/smokey/posix-clock/posix-clock.c
+++ b/testsuite/smokey/posix-clock/posix-clock.c
@@ -417,8 +417,10 @@ static int clock_decrease_after_periodic_timer_first_tick(void)
 	diff = now.tv_sec * 10ULL + now.tv_nsec -
 		(timer.it_value.tv_sec * 10ULL + timer.it_value.tv_nsec);
-	if (!smokey_assert(diff < 10))
+	if (!smokey_assert(diff < 10ULL + 25000ULL))
 	{
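
For context, a standalone sketch of the kind of check this hunk relaxes:
nanosecond arithmetic on struct timespec plus the proposed 25 us margin. The
helper names and the MARGIN_NS constant are illustrative only, not taken from
posix-clock.c.

/* Illustrative only: elapsed nanoseconds since the programmed first tick,
 * accepted if the overshoot stays within a 25 us margin. */
#include <stdint.h>
#include <time.h>

#define MARGIN_NS 25000ULL	/* the 25 us margin proposed in this patch */

static uint64_t elapsed_ns(const struct timespec *now,
			   const struct timespec *first_tick)
{
	return (uint64_t)now->tv_sec * 1000000000ULL + now->tv_nsec -
		((uint64_t)first_tick->tv_sec * 1000000000ULL +
		 first_tick->tv_nsec);
}

static int within_bound(uint64_t diff, uint64_t expected_ns)
{
	return diff < expected_ns + MARGIN_NS;
}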


Philippe, is this margin also reasonable from your perspective? Or are we
risking missing a problem this way? I currently lack a feeling for it.


Good question. That is an estimate suitable for x86. While the margin should
not defeat the test, a test that fails 60% of the time is not too useful
either.
   


Did you calibrate the timer shot with the autotune utility before
running that test?


I also saw it failing after calibration, otherwise I would have added
the calibration to xeno-test.



This discussion has stalled. Philippe, if you have any further input on the value,
it could help resolve this.


Thanks,
Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



I-pipe 4.4.x-cip

2018-12-20 Thread Jan Kiszka via Xenomai

Hi all,

just a quick note that I started a 4.4-cip based I-pipe tree:

https://gitlab.denx.de/Xenomai/ipipe/tree/ipipe-4.4.y-cip

So far it's just a merge of ipipe-4.4.y @ipipe-core-4.4.166-x86-12 with the CIP 
tree @4.4.166-cip29. I'm not yet sure whether to switch 4.4 completely to -cip 
or have both running in parallel for some more time.


For now, I would ask everyone interested in this baseline, be it on x86 or
ARM, to try out the new branch.
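
One way to do that, assuming the clone URL inferred from the branch link above:

git clone -b ipipe-4.4.y-cip https://gitlab.denx.de/Xenomai/ipipe.git
cd ipipe
# configure and build as usual for your target, then add Xenomai on top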


Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



RE: Cobalt Preemption of kernel update_fast_timekeeper can cause deadlocks

2018-12-20 Thread Lange Norbert via Xenomai


> -----Original Message-----
> From: Jan Kiszka 
> Sent: Donnerstag, 20. Dezember 2018 14:33
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: Cobalt Preemption of kernel update_fast_timekeeper can cause
> deadlocks
>
> On 20.12.18 13:29, Lange Norbert via Xenomai wrote:
> >> On 19.12.18 19:26, Auel, Kendall via Xenomai wrote:
> >>> Jan,
> >>>
> >>> I'm very much in favor of providing a way to prevent Xenomai modules
> >> from using features which can result in deadlock, if there is a clean
> >> way to detect such a situation.
> >>>
> >>> We used gettimeofday in one of our modules and it mostly worked
> great.
> >> But once in a great while the system would deadlock. Most calls to
> >> gettimeofday are benign and appear to work normally, which is why it
> >> is especially problematic. It would have saved some debug cycles if
> >> there was a kernel log message to warn us of our danger.
> >>>
> >>> Or perhaps we could collect a blacklist of references which will
> >>> produce
> >> warnings when linking a Xenomai module. All of these things are 'nice
> >> to have' but certainly not urgent matters.
> >>
> >> We do have the infrastructure and a small use case for such RT traps
> already:
> >> If you use --mode-check on xeno-config, any usage of malloc and free
> >> from RT contexts will be detected and reported. These calls are evil
> >> as well because they tend to not trigger a syscall in the fast path
> >> and only fail on contention or empty-pool situations of the userspace
> allocator.
> >
> > There is still the issue that the cobalt kernel can interrupt the
> > linux kernel while holding a lock.
> > Consider the case that you have a 4 core CPU, several cobalt threads are
> bound to eg. Core 0 (legacy code assuming single core).
> >
> > 1) linux wants to update the timekeeper struct
> > 2) now cobalt preempts the linux kernel while holding the lock on Core
> > 0
> > 3) the cobalt threads run close to each other and thus Core 0 remains in
> cobalt domain for hundreds of ms.
> > 4) finally all cobalt threads (that are bound to core 0) idle and
> > linux can free the lock
> >
> > This means that all Linux threads on *any core* that try to call some
> *gettime functions (possibly others) will busywait on the lock.
> >
>
> You do not need to look at the GTOD lock to construct such delays: every
> Linux spinlock taken on one core that is then interrupted by RT workload for
> a longer period can delay other cores doing Linux stuff that needs that lock.
> That is a generic property of the co-kernel architecture - and the reason you
> should allow Linux to run every few ms, on *every* core.

You are right, I did not realize that.
Userspace usually does not spin on locks, so I consider those functions a lot
more critical; clock_gettime is also heavily used (especially for tracing).

Funnily enough, the Linux x86 vDSO handles clock_gettime(CLOCK_MONOTONIC) but not
clock_gettime(CLOCK_MONOTONIC_RAW).
It seems the common denominator would be to use rdtsc directly =/
(I know about the pitfalls, but our hardware should have a stable, invariant
TSC.)
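
A minimal sketch of that approach, assuming an invariant TSC as mentioned above.
The tsc_hz value is a placeholder; a real implementation has to calibrate it
(CPUID leaves 0x15/0x16, the kernel, or a timed loop), and the 128-bit multiply
avoids the overflow a plain 64-bit conversion would hit after a few seconds.

#include <stdint.h>
#include <x86intrin.h>

static uint64_t tsc_hz = 2100000000ULL;	/* placeholder calibration value */

static inline uint64_t tsc_ns(void)
{
	unsigned int aux;
	uint64_t ticks = __rdtscp(&aux);	/* RDTSCP waits for prior instructions */

	return (uint64_t)((unsigned __int128)ticks * 1000000000ULL / tsc_hz);
}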

>
> > That an RT thread (potentially just a temporarily promoted non-RT thread, or one not
> lazily demoted yet) can additionally deadlock the system sits just on top of
> this issue.
> >
> > Regarding what I am allowed to do:
> > AFAIK a thread started as a cobalt thread can freely switch between
> domains, typically around syscalls, and the switches are "lazy". What are the
> rules for a thread that needs to collect some data in RT (potentially using some
> RT mutexes with prio inheritance) while calling into DSOs that aren't compiled with
> the "cobalt wrappings" active (say a logging framework that uses libc's
> clock_gettime)?
> > Do I manually have to demote the thread somehow before calling DSO
> functions, or is it not allowed at all to use DSOs that were compiled without
> the "cobalt wrappings"?
>
> If you are calling into an "unknown" non-RT blob, dropping from RT may
> actually be required. We do not promote explicit mode switches because
> they are not needed if you control (wrap) all your code. This might be an
> exception.

The non-RT "blob" is the regular linux rootfs in my case, ie. libstdc++ and I 
plan
to use libnttg-ust and stuff like xml parsers.
I understand this as motivation to actually *have* the POSIX Skin (eases legacy 
code as well),
as soon as we can muster the time, then anything RT will be explicit and RT only

> >
> >> with posix, you are already
> >> redirected to the RT-safe implementations of those functions.
> >
> > In my case (posix skin, not "native" as I replied earlier), the call
> > came from another DSO which is unaffected by the link-time wrapping.
> > I would likely have to LD_PRELOAD a checker DSO, which seems more sane to
> > me, as the calls could originate from implicitly linked DSOs as well
> > (C++ runtime library).

[I-PIPE] ipipe-core-4.4.166-x86-12 released

2018-12-20 Thread xenomai--- via Xenomai
Download URL: 
https://xenomai.org/downloads/ipipe/v4.x/x86/ipipe-core-4.4.166-x86-12.patch

Repository: https://git.xenomai.org/ipipe-x86
Release tag: ipipe-core-4.4.166-x86-12



Re: Cobalt Preemption of kernel update_fast_timekeeper can cause deadlocks

2018-12-20 Thread Jan Kiszka via Xenomai

On 20.12.18 13:29, Lange Norbert via Xenomai wrote:

On 19.12.18 19:26, Auel, Kendall via Xenomai wrote:

Jan,

I'm very much in favor of providing a way to prevent Xenomai modules from using
features which can result in deadlock, if there is a clean way to detect such a
situation.


We used gettimeofday in one of our modules and it mostly worked great.
But once in a great while the system would deadlock. Most calls to
gettimeofday are benign and appear to work normally, which is why it is
especially problematic. It would have saved some debug cycles if there was a
kernel log message to warn us of our danger.


Or perhaps we could collect a blacklist of references which will produce
warnings when linking a Xenomai module. All of these things are 'nice to
have' but certainly not urgent matters.

We do have the infrastructure and a small use case for such RT traps already:
If you use --mode-check on xeno-config, any usage of malloc and free from
RT contexts will be detected and reported. These calls are evil as well
because they tend to not trigger a syscall in the fast path and only fail on
contention or empty-pool situations of the userspace allocator.


There is still the issue that the cobalt kernel can interrupt the linux kernel
while holding a lock.
Consider the case that you have a 4 core CPU, several cobalt threads are bound 
to eg. Core 0 (legacy code assuming single core).

1) linux wants to update the timekeeper struct
2) now cobalt preempts the linux kernel while holding the lock on Core 0
3) the cobalt threads run close to each other and thus Core 0 remains in cobalt 
domain for hundreds of ms.
4) finally all cobalt threads (that are bound to core 0) idle and linux can 
free the lock

This means that all Linux threads on *any core* that try to call some *gettime 
functions (possibly others) will busywait on the lock.



You do not need to look at the GTOD lock to construct such delays: every Linux 
spinlock taken on one core that is then interrupted by RT workload for a longer 
period can delay other cores doing Linux stuff that needs that lock. That is a 
generic property of the co-kernel architecture - and the reason you should allow 
Linux to run every few ms, on *every* core.
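
To make the scenario above concrete, a minimal sketch (illustrative only; the
usual libcobalt POSIX wrapping is assumed at build time) of the problematic
pattern: a SCHED_FIFO thread pinned to CPU 0 that busy-spins in primary mode
for a long stretch, so a Linux spinlock holder preempted on CPU 0 cannot
release its lock in the meantime.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <time.h>

static void *core0_hog(void *arg)
{
	struct timespec ts;
	uint64_t spins = 0;

	/* Back-to-back RT work, no sleeping: CPU 0 stays in the Xenomai
	 * domain for the whole loop. */
	while (++spins < 100000000ULL)
		clock_gettime(CLOCK_MONOTONIC, &ts);

	return NULL;
}

int main(void)
{
	struct sched_param prm = { .sched_priority = 90 };
	pthread_attr_t attr;
	pthread_t tid;
	cpu_set_t cpus;

	CPU_ZERO(&cpus);
	CPU_SET(0, &cpus);	/* "legacy code assuming single core" */

	pthread_attr_init(&attr);
	pthread_attr_setaffinity_np(&attr, sizeof(cpus), &cpus);
	pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
	pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
	pthread_attr_setschedparam(&attr, &prm);

	pthread_create(&tid, &attr, core0_hog, NULL);
	pthread_join(tid, NULL);
	return 0;
}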



That an RT thread (potentially just a temporarily promoted non-RT thread, or one not
lazily demoted yet) can additionally deadlock the system sits just on top of
this issue.

Regarding what I am allowed to do:
AFAIK a thread started as a cobalt thread can freely switch between domains, typically around
syscalls, and the switches are "lazy". What are the rules for a thread that needs to
collect some data in RT (potentially using some RT mutexes with prio inheritance) while calling into DSOs
that aren't compiled with the "cobalt wrappings" active (say a logging framework that
uses libc's clock_gettime)?
Do I manually have to demote the thread somehow before calling DSO functions, or is it not
allowed at all to use DSOs that were compiled without the "cobalt wrappings"?


If you are calling into an "unknown" non-RT blob, dropping from RT may actually
be required. We do not promote explicit mode switches because they are not
needed if you control (wrap) all your code. This might be an exception.





with posix, you are already
redirected to the RT-safe implementations of those functions.


In my case (posix skin, not "native" as I replied earlier), the call came from
another DSO which is unaffected by the link-time wrapping.
I would likely have to LD_PRELOAD a checker DSO, which seems more sane to me,
as the calls could originate from implicitly linked DSOs as well (C++ runtime
library).


Is the reason that the other DSOs are not caught at link-time generic or 
specific to your build? The former case should be documented if it exists.


Irrespective of that, I would definitely be interested in an LD_PRELOAD-based
checker that you can attach to an application easily, without the need to switch
to link-time wrapping (which is not needed with non-posix skins).
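
A rough sketch of such a checker, as an assumption rather than an existing tool:
interpose clock_gettime(), resolve the real symbol with dlsym(RTLD_NEXT), and
warn when the call happens from an RT context. The primary-mode test below is a
stub, since the exact libcobalt introspection call to use is left open here, and
a real checker would also need an RT-safe reporting path instead of fprintf().

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <time.h>

typedef int (*cg_fn)(clockid_t, struct timespec *);

static int running_in_primary_mode(void)
{
	return 0;	/* stub: replace with a real primary-mode query */
}

int clock_gettime(clockid_t clk, struct timespec *ts)
{
	static cg_fn real_cg;

	if (!real_cg)
		real_cg = (cg_fn)dlsym(RTLD_NEXT, "clock_gettime");

	if (running_in_primary_mode())
		fprintf(stderr, "checker: clock_gettime() from an RT context\n");

	return real_cg(clk, ts);
}

Built as a shared object (gcc -shared -fPIC checker.c -o checker.so -ldl) and
injected with LD_PRELOAD=./checker.so, this would also cover calls coming from
implicitly linked DSOs such as the C++ runtime.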


Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



RE: Cobalt Preemption of kernel update_fast_timekeeper can cause deadlocks

2018-12-20 Thread Lange Norbert via Xenomai
> On 19.12.18 19:26, Auel, Kendall via Xenomai wrote:
> > Jan,
> >
> > I'm very much in favor of providing a way to prevent Xenomai modules
> from using features which can result in deadlock, if there is a clean way to
> detect such a situation.
> >
> > We used gettimeofday in one of our modules and it mostly worked great.
> But once in a great while the system would deadlock. Most calls to
> gettimeofday are benign and appear to work normally, which is why it is
> especially problematic. It would have saved some debug cycles if there was a
> kernel log message to warn us of our danger.
> >
> > Or perhaps we could collect a blacklist of references which will produce
> warnings when linking a Xenomai module. All of these things are 'nice to
> have' but certainly not urgent matters.
>
> We do have the infrastructure and a small use case for such RT traps already:
> If you use --mode-check on xeno-config, any usage of malloc and free from
> RT contexts will be detected and reported. These calls are evil as well
> because they tend to not trigger a syscall in the fast path and only fail on
> contention or empty-pool situations of the userspace allocator.

There is still the issue that the cobalt kernel can interrupt the linux kernel
while holding a lock.
Consider the case that you have a 4 core CPU, several cobalt threads are bound 
to eg. Core 0 (legacy code assuming single core).

1) linux wants to update the timekeeper struct
2) now cobalt preempts the linux kernel while holding the lock on Core 0
3) the cobalt threads run close to each other and thus Core 0 remains in cobalt 
domain for hundreds of ms.
4) finally all cobalt threads (that are bound to core 0) idle and linux can 
free the lock

This means that all Linux threads on *any core* that try to call some *gettime 
functions (possibly others) will busywait on the lock.

That an RT thread (potentially just a temporarily promoted non-RT thread, or one not
lazily demoted yet) can additionally deadlock the system sits just on top of
this issue.

Regarding what I am allowed to do:
AFAIK a thread started as a cobalt thread can freely switch between domains,
typically around syscalls, and the switches are "lazy". What are the rules for a
thread that needs to collect some data in RT (potentially using some RT mutexes
with prio inheritance) while calling into DSOs that aren't compiled with the "cobalt
wrappings" active (say a logging framework that uses libc's clock_gettime)?
Do I manually have to demote the thread somehow before calling DSO functions,
or is it not allowed at all to use DSOs that were compiled without the "cobalt wrappings"?

> with posix, you are already
> redirected to the RT-safe implementations of those functions.

In my case (posix skin, not "native" as I replied earlier), the call came from
another DSO which is unaffected by the
link-time wrapping.
I would likely have to LD_PRELOAD a checker DSO, which seems more sane to me,
as the calls could originate from implicitly linked DSOs as well (C++ runtime
library).

Norbert




Re: Cobalt Preemption of kernel update_fast_timekeeper can cause deadlocks

2018-12-20 Thread Jan Kiszka via Xenomai

On 19.12.18 19:26, Auel, Kendall via Xenomai wrote:

Jan,

I'm very much in favor of providing a way to prevent Xenomai modules from using 
features which can result in deadlock, if there is a clean way to detect such a 
situation.

We used gettimeofday in one of our modules and it mostly worked great. But once 
in a great while the system would deadlock. Most calls to gettimeofday are 
benign and appear to work normally, which is why it is especially problematic. 
It would have saved some debug cycles if there was a kernel log message to warn 
us of our danger.

Or perhaps we could collect a blacklist of references which will produce 
warnings when linking a Xenomai module. All of these things are 'nice to have' 
but certainly not urgent matters.


We do have the infrastructure and a small use case for such RT traps already: If 
you use --mode-check on xeno-config, any usage of malloc and free from RT 
contexts will be detected and reported. These calls are evil as well because 
they tend to not trigger a syscall in the fast path and only fail on contention 
or empty-pool situations of the userspace allocator.


We could extend that mechanism for gettimeofday & Co. checks, but we need to 
limit that to non-posix applications: with posix, you are already redirected to 
the RT-safe implementations of those functions.
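
For illustration, a minimal POSIX-skin sketch of the kind of call the existing
--mode-check machinery is meant to catch. Build flags along the lines of
xeno-config --posix --mode-check --cflags/--ldflags are assumed; the program is
a sketch, not a test case from the tree.

#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

static void *rt_worker(void *arg)
{
	/* Heap allocation from an RT context: exactly what the mode check
	 * is supposed to detect and report. */
	void *buf = malloc(4096);

	free(buf);
	return NULL;
}

int main(void)
{
	struct sched_param prm = { .sched_priority = 80 };
	pthread_attr_t attr;
	pthread_t tid;

	pthread_attr_init(&attr);
	pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
	pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
	pthread_attr_setschedparam(&attr, &prm);

	pthread_create(&tid, &attr, rt_worker, NULL);
	pthread_join(tid, NULL);
	return 0;
}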


Patches welcome.

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



Re: Stack dump on ipipe 4.9.135-7 x86_64

2018-12-20 Thread Jan Kiszka via Xenomai

On 20.12.18 09:28, Mauro Salvini via Xenomai wrote:

Hi all,

I'm testing Xenomai 3 on an Intel Braswell board (Atom x5-E8000).
I'm using the ipipe kernel at the last commit from [1], branch ipipe-4.9.y,
64-bit build on Debian Stretch 9.6 64-bit.
The Xenomai library is from [2], branch stable/v3.0.x at commit bc53d03f (I
don't have the last two commits, but they seem unrelated). I tried both 32-bit
(mixed ABI) and 64-bit builds with the same result.

I launch:

xeno-test -l "dohell -s xxx -p yyy -m xxx 9" -T 9

After a variable time (from minutes to hours) from the latency test start, I
get a few overruns that I discovered are generated by a kernel stack
dump (the dmesg tail is attached to this mail). The latency test doesn't stop, and
after this stack dump it never reports other overruns or latency peaks
(it seems I need to reboot to reproduce the stack dump).

I read on this mailing list that in the latest patches much work has been done
on the FPU part; could this be related?

Glad to provide more info if you need it.


Thanks for reporting. Maybe we need your config later on, but I'm first of all 
looking at the code, see below.


In general, it's better to run the kernel with frame-pointers enabled to get 
more reliable backtraces, at least when an error occurs.
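
For reference, the kernel option involved; a minimal sketch, and the exact
option names depend on the kernel version and architecture:

# keep frame pointers so the unwinder produces reliable backtraces
CONFIG_FRAME_POINTER=y
# newer kernels select the unwinder explicitly instead:
# CONFIG_UNWINDER_FRAME_POINTER=y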




Thanks in advance, regards.

Mauro

[1] https://gitlab.denx.de/Xenomai/ipipe
[2] https://gitlab.denx.de/Xenomai/xenomai
-- next part --
[  233.205940] 
/home/build-user/develop/linux-ipipe-4.9.y/arch/x86/xenomai/include/asm/xenomai/fptest.h:43:
 Warning: Linux is compiled to use FPU in kernel-space.
For this reason, switchtest can not test using FPU in Linux 
kernel-space.
[  295.660454] [ cut here ]
[  295.660461] WARNING: CPU: 0 PID: 139 at 
/home/build-user/develop/linux-ipipe-4.9.y/arch/x86/include/asm/fpu/internal.h:502
 xnarch_leave_root+0x1a4/0x1b0


Henning, the kernel checks fpu->fpregs_active here and finds it off while 
Xenomai looks at fpu.fpstate_active - intentionally or by accident?


Jan


[  295.660465] Modules linked in: binfmt_misc msr iTCO_wdt iTCO_vendor_support 
coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel aes_x86_64 lrw snd_pcm gf128mul glue_helper 
i915 ablk_helper snd_timer cryptd snd soundcore pcspkr drm_kms_helper drm evdev 
lpc_ich mfd_core fb_sys_fops syscopyarea sysfillrect sysimgblt sg shpchp video 
button ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache sd_mod 
hid_generic usbhid hid crc32c_intel ahci igb i2c_i801 libahci i2c_smbus 
i2c_algo_bit xhci_pci dca ptp pps_core libata xhci_hcd sdhci_pci sdhci usbcore 
scsi_mod mmc_core fjes [last unloaded: rtnet]
[  295.660660] CPU: 0 PID: 139 Comm: jbd2/sda1-8 Tainted: GW   
4.9.135+ #1
[  295.660663] Hardware name: Default string Default string/Q7-BW, BIOS 
V1.20#KW050220A 03/16/2018
[  295.660666] I-pipe domain: Xenomai
[  295.660668]   823402c2  

[  295.660680]  829cd400 8206cb68 985636908040 
985636504080
[  295.660706]  00099f50  bc1580563040 
98567ac9b940
[  295.660718] Call Trace:
[  295.660721]  [] ? dump_stack+0xb5/0xd3
[  295.660724]  [] ? __warn+0xc8/0xf0
[  295.660727]  [] ? xnarch_leave_root+0x1a4/0x1b0
[  295.660729]  [] ? ___xnsched_run+0x3f6/0x4b0
[  295.660732]  [] ? timerfd_handler+0x36/0x50
[  295.660735]  [] ? xntimerq_insert+0x5/0xa0
[  295.660738]  [] ? xnclock_tick+0x1a9/0x2c0
[  295.660740]  [] ? xnintr_core_clock_handler+0x2fa/0x310
[  295.660743]  [] ? dispatch_irq_head+0x8a/0x120
[  295.660746]  [] ? __ipipe_handle_irq+0x85/0x1b0
[  295.660749]  [] ? apic_timer_interrupt+0x89/0xb0
[  295.660752]  [] ? continue_block+0x22/0x54 [crc32c_intel]
[  295.660754]  [] ? crc32c_pcl_intel_update+0x53/0x60 
[crc32c_intel]
[  295.660757]  [] ? 
jbd2_journal_commit_transaction+0x9e0/0x1850 [jbd2]
[  295.660760]  [] ? __switch_to_asm+0x40/0x70
[  295.660763]  [] ? try_to_del_timer_sync+0x4f/0x80
[  295.660766]  [] ? kjournald2+0xe2/0x290 [jbd2]
[  295.660768]  [] ? wake_up_atomic_t+0x30/0x30
[  295.660771]  [] ? commit_timeout+0x10/0x10 [jbd2]
[  295.660774]  [] ? kthread+0xf5/0x110
[  295.660776]  [] ? __switch_to_asm+0x40/0x70
[  295.660779]  [] ? kthread_park+0x60/0x60
[  295.660782]  [] ? ret_from_fork+0x55/0x60
[  295.660785] ---[ end trace b1bfa97fc17a203c ]---
[  705.768058] perf: interrupt took too long (2543 > 2500), lowering 
kernel.perf_event_max_sample_rate to 78500
[  865.772646] perf: interrupt took too long (3207 > 3178), lowering 
kernel.perf_event_max_sample_rate to 62250


--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



Re: Porting question about mips

2018-12-20 Thread Jan Kiszka via Xenomai

On 20.12.18 09:59, duanwujie wrote:

Hi, Phi and Jan:

         The correct handler is handle_percpu_irq !



Hi, Phi and Jan:

        I tried to port the I-pipe to the MIPS arch according to
https://gitlab.denx.de/Xenomai/xenomai/wikis/Porting_Xenomai_To_A_New_Arm_SOC#terminology



    Now it works on a single CPU, but SMP fails. I suspect an interrupt
controller problem. The flow handler is handle_irq_event_percpu, but the
article has no porting information about this handler.


        Is there any clue for this?



The first question is how your port fails on SMP: does it no longer get interrupts,
so that you believe it's interrupt-controller related?


I do not know the MIPS arch and how it compares to ARM (which that other 
document covers). Does it have something comparable to the per-CPU interrupts 
(SGI, PPI) of ARM?


Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



Re: Porting question about mips

2018-12-20 Thread duanwujie via Xenomai

Hi, Phi and Jan:

        The correct handler is handle_percpu_irq !



Hi, Phi and Jan:

        I tried to port the I-pipe to the MIPS arch according to
https://gitlab.denx.de/Xenomai/xenomai/wikis/Porting_Xenomai_To_A_New_Arm_SOC#terminology



    Now it works on a single CPU, but SMP fails. I suspect an interrupt
controller problem. The flow handler is handle_irq_event_percpu, but the
article has no porting information about this handler.


        Is there any clue for this?

duan.


--
Linx Software Corporation Sichuan Branch
Name: wujie duan
Tel: 028-65182023-6202
E-mail: wjd...@linx-info.com



Stack dump on ipipe 4.9.135-7 x86_64

2018-12-20 Thread Mauro Salvini via Xenomai
Hi all,

I'm testing Xenomai 3 on an Intel Braswell board (Atom x5-E8000).
I'm using the ipipe kernel at the last commit from [1], branch ipipe-4.9.y,
64-bit build on Debian Stretch 9.6 64-bit.
The Xenomai library is from [2], branch stable/v3.0.x at commit bc53d03f (I
don't have the last two commits, but they seem unrelated). I tried both 32-bit
(mixed ABI) and 64-bit builds with the same result.

I launch:

xeno-test -l "dohell -s xxx -p yyy -m xxx 9" -T 9

After a variable time (from minutes to hours) from the latency test start, I
get a few overruns that I discovered are generated by a kernel stack
dump (the dmesg tail is attached to this mail). The latency test doesn't stop, and
after this stack dump it never reports other overruns or latency peaks
(it seems I need to reboot to reproduce the stack dump).

I read on this mailing list that in the latest patches much work has been done
on the FPU part; could this be related?

Glad to provide more info if you need it.

Thanks in advance, regards.

Mauro

[1] https://gitlab.denx.de/Xenomai/ipipe
[2] https://gitlab.denx.de/Xenomai/xenomai
-- next part --
[  233.205940] 
/home/build-user/develop/linux-ipipe-4.9.y/arch/x86/xenomai/include/asm/xenomai/fptest.h:43:
 Warning: Linux is compiled to use FPU in kernel-space.
   For this reason, switchtest can not test using FPU in Linux 
kernel-space.
[  295.660454] [ cut here ]
[  295.660461] WARNING: CPU: 0 PID: 139 at 
/home/build-user/develop/linux-ipipe-4.9.y/arch/x86/include/asm/fpu/internal.h:502
 xnarch_leave_root+0x1a4/0x1b0
[  295.660465] Modules linked in: binfmt_misc msr iTCO_wdt iTCO_vendor_support 
coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel aes_x86_64 lrw snd_pcm gf128mul glue_helper 
i915 ablk_helper snd_timer cryptd snd soundcore pcspkr drm_kms_helper drm evdev 
lpc_ich mfd_core fb_sys_fops syscopyarea sysfillrect sysimgblt sg shpchp video 
button ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache sd_mod 
hid_generic usbhid hid crc32c_intel ahci igb i2c_i801 libahci i2c_smbus 
i2c_algo_bit xhci_pci dca ptp pps_core libata xhci_hcd sdhci_pci sdhci usbcore 
scsi_mod mmc_core fjes [last unloaded: rtnet]
[  295.660660] CPU: 0 PID: 139 Comm: jbd2/sda1-8 Tainted: GW   
4.9.135+ #1
[  295.660663] Hardware name: Default string Default string/Q7-BW, BIOS 
V1.20#KW050220A 03/16/2018
[  295.660666] I-pipe domain: Xenomai
[  295.660668]   823402c2  

[  295.660680]  829cd400 8206cb68 985636908040 
985636504080
[  295.660706]  00099f50  bc1580563040 
98567ac9b940
[  295.660718] Call Trace:
[  295.660721]  [] ? dump_stack+0xb5/0xd3
[  295.660724]  [] ? __warn+0xc8/0xf0
[  295.660727]  [] ? xnarch_leave_root+0x1a4/0x1b0
[  295.660729]  [] ? ___xnsched_run+0x3f6/0x4b0
[  295.660732]  [] ? timerfd_handler+0x36/0x50
[  295.660735]  [] ? xntimerq_insert+0x5/0xa0
[  295.660738]  [] ? xnclock_tick+0x1a9/0x2c0
[  295.660740]  [] ? xnintr_core_clock_handler+0x2fa/0x310
[  295.660743]  [] ? dispatch_irq_head+0x8a/0x120
[  295.660746]  [] ? __ipipe_handle_irq+0x85/0x1b0
[  295.660749]  [] ? apic_timer_interrupt+0x89/0xb0
[  295.660752]  [] ? continue_block+0x22/0x54 [crc32c_intel]
[  295.660754]  [] ? crc32c_pcl_intel_update+0x53/0x60 
[crc32c_intel]
[  295.660757]  [] ? 
jbd2_journal_commit_transaction+0x9e0/0x1850 [jbd2]
[  295.660760]  [] ? __switch_to_asm+0x40/0x70
[  295.660763]  [] ? try_to_del_timer_sync+0x4f/0x80
[  295.660766]  [] ? kjournald2+0xe2/0x290 [jbd2]
[  295.660768]  [] ? wake_up_atomic_t+0x30/0x30
[  295.660771]  [] ? commit_timeout+0x10/0x10 [jbd2]
[  295.660774]  [] ? kthread+0xf5/0x110
[  295.660776]  [] ? __switch_to_asm+0x40/0x70
[  295.660779]  [] ? kthread_park+0x60/0x60
[  295.660782]  [] ? ret_from_fork+0x55/0x60
[  295.660785] ---[ end trace b1bfa97fc17a203c ]---
[  705.768058] perf: interrupt took too long (2543 > 2500), lowering 
kernel.perf_event_max_sample_rate to 78500
[  865.772646] perf: interrupt took too long (3207 > 3178), lowering 
kernel.perf_event_max_sample_rate to 62250