Re: rt_task_self() return value

2019-11-19 Thread Mauro Salvini via Xenomai
On Tue, 2019-11-19 at 17:05 +0100, Jan Kiszka wrote:
> On 19.11.19 17:01, Mauro Salvini wrote:
> > On Tue, 2019-11-19 at 16:41 +0100, Jan Kiszka wrote:
> > > On 19.11.19 16:29, Mauro Salvini via Xenomai wrote:
> > > > Hi all,
> > > > 
> > > > I'm using Xenomai 3.0.9 in cobalt mode and I'm facing an issue
> > > > with the rt_task_self() return value.
> > > > 
> > > > Please see this simple code:
> > > > 
> > > > #include 
> > > > #include 
> > > > #include 
> > > > #include 
> > > > 
> > > > RT_TASK gNewTask;
> > > > 
> > > > static void Task(void *lpArgument)
> > > > {
> > > > rt_printf("Task created.\n");
> > > > gNewTask = *(rt_task_self());
> > > 
> > > This fills the internal data structure RT_TASK with the content
> > > of another one. Rather than "cloning", did you try using a
> > > reference as the API suggests? If that worked in Xenomai 2, it
> > > was luck.
> > > 
> > 
> > Hi Jan,
> > 
> > thanks for your answer.
> > Yes, I tried that, and it does not work either. Changing the code in
> > this way:
> > 
> > #include 
> > #include 
> > #include 
> > #include 
> > 
> > RT_TASK* gNewTask;
> > 
> > static void Task(void *lpArgument)
> > {
> >  rt_printf("Task created.\n");
> >  gNewTask = rt_task_self();
> > }
> > 
> > int main()
> > {
> >  RT_TASK tNewTask;
> > int err = rt_task_create(&tNewTask, "TEST", 16384, 10, T_JOINABLE);
> >  rt_printf(" task create: %d\n", err);
> >  err = rt_task_start(&tNewTask, Task, NULL);
> >  rt_printf(" task start: %d\n", err);
> >  
> >  rt_task_sleep(20);
> > 
> >  //err = rt_task_join(&tNewTask);
> >  err = rt_task_join(gNewTask);
> >  rt_printf(" task join : %d\n", err);
> > 
> >  return 0;
> > }
> > 
> > returns:
> > 
> > ./test
> >   task create: 0
> > Task created.
> >   task start: 0
> >   task join : -3
> > 
> 
> OK, then we have a regression. Let me check...
> 
> Jan
> 

Thank you Jan for the patch.

Regards
-- 
Mauro



Re: rt_task_self() return value

2019-11-19 Thread Mauro Salvini via Xenomai
On Tue, 2019-11-19 at 16:41 +0100, Jan Kiszka wrote:
> On 19.11.19 16:29, Mauro Salvini via Xenomai wrote:
> > Hi all,
> > 
> > I'm using Xenomai 3.0.9 in cobalt mode and I'm facing an issue with
> > the rt_task_self() return value.
> > 
> > Please see this simple code:
> > 
> > #include 
> > #include 
> > #include 
> > #include 
> > 
> > RT_TASK gNewTask;
> > 
> > static void Task(void *lpArgument)
> > {
> > rt_printf("Task created.\n");
> > gNewTask = *(rt_task_self());
> 
> This fills the internal data structure RT_TASK with the content of
> another one. Rather than "cloning", did you try using a reference as
> the API suggests? If that worked in Xenomai 2, it was luck.
> 

Hi Jan,

thanks for your answer.
Yes, I tried that, and it does not work either. Changing the code in this way:

#include 
#include 
#include 
#include 

RT_TASK* gNewTask;

static void Task(void *lpArgument)
{
rt_printf("Task created.\n");
gNewTask = rt_task_self();
}

int main()
{
RT_TASK tNewTask;
int err = rt_task_create(&tNewTask, "TEST", 16384, 10, T_JOINABLE);
rt_printf(" task create: %d\n", err);
err = rt_task_start(&tNewTask, Task, NULL);
rt_printf(" task start: %d\n", err);

rt_task_sleep(20);

//err = rt_task_join(&tNewTask);
err = rt_task_join(gNewTask);
rt_printf(" task join : %d\n", err);

return 0;
}

returns:

./test
 task create: 0
Task created.
 task start: 0
 task join : -3


Thanks, regards

-- Mauro



rt_task_self() return value

2019-11-19 Thread Mauro Salvini via Xenomai
Hi all,

I'm using Xenomai 3.0.9 in cobalt mode and I'm facing an issue with
the rt_task_self() return value.

Please see this simple code:

#include 
#include 
#include 
#include 

RT_TASK gNewTask;

static void Task(void *lpArgument)
{
rt_printf("Task created.\n");
gNewTask = *(rt_task_self());
}

int main()
{
RT_TASK tNewTask;
int err = rt_task_create(&tNewTask, "TEST", 16384, 10, T_JOINABLE);
rt_printf(" task create: %d\n", err);
err = rt_task_start(&tNewTask, Task, NULL);
rt_printf(" task start: %d\n", err);

rt_task_sleep(20);

//err = rt_task_join(&tNewTask);
err = rt_task_join(&gNewTask);
rt_printf(" task join : %d\n", err);

return 0;
}

Code is compiled using xeno-config --alchemy.

Passing the rt_task_self() return value to rt_task_join(), the latter
returns -ESRCH (no such process). Obviously, if I use the local
tNewTask variable, the return value is 0.

I'm pretty sure that this works without errors with Xenomai 2.

Is this the expected behavior with Xenomai 3?

Thanks in advance, regards
-- 
Mauro



Re: FPU warn on ipipe 4.9.146-8 i386

2019-01-14 Thread Mauro Salvini via Xenomai
On Mon, 2019-01-14 at 10:39 +0100, Henning Schild wrote:
> Am Fri, 11 Jan 2019 14:47:13 +0100
> schrieb Mauro Salvini :
> 
> > On Fri, 2019-01-11 at 10:40 +0100, Henning Schild wrote:
> > > Am Fri, 11 Jan 2019 09:57:50 +0100
> > > schrieb Mauro Salvini via Xenomai :
> > >   
> > > > Hi all,
> > > > 
> > > > I'm testing same hardware of [1], with kernel 4.9.146 from
> > > > ipipe-
> > > > 4.9.y
> > > > with [2] applied, compiled with ARCH=i386 and Xenomai 3.0.7.  
> > > 
> > > To be honest i386 is not really tested anymore, in fact in 4.14
> > > not
> > > even supported at the moment. If you can you should go for
> > > x86_64.
> > >   
> > 
> > Hi Henning,
> > 
> > Thank you. I'm trying i386 version due to legacy 32bit code that
> > uses
> > rtnet (which cannot be used with mixed ABI).
> 
> Ok, maybe something you might want to fix for the future.
> 
..snip..
> > > > 
> > > could you try this:
> > > 
> > > --- a/arch/x86/kernel/fpu/core.c
> > > +++ b/arch/x86/kernel/fpu/core.c
> > > @@ -426,6 +426,10 @@ void fpu__restore(struct fpu *fpu)
> > > /* Avoid __kernel_fpu_begin() right after
> > > fpregs_activate()
> > > */
> > > kernel_fpu_disable();
> > > trace_x86_fpu_before_restore(fpu);
> > > +   if (fpregs_activate(fpu)) {  
> > 
> > This instruction does not compile because fpregs_activate() returns
> > void; perhaps you meant "if (fpregs_active(fpu))"?
> > Given that fpregs_active() takes no arguments, I tried this:
> > 
> > if (fpu->fpregs_active)
> 
> I did not test what i wrote there, and your fix is what i meant.
> 
> > and the warning is not raised (not even the warning added by this patch).
> 
> In that case a similar patch should probably be included upstream. I
> will prepare a patch for that.
> 
> Henning

Hi Henning,

I don't know whether this concerns the FPU and this discussion, but I
did some tests, and even with the latest fixes the system freezes after
a variable time (from a few seconds to a few hours) after the latency
test starts.
No logs/warnings in /var/log/messages, and a hard reset is needed.

I'm investigating whether it also freezes with latency running alone.

Tell me whether it's worth continuing to test (considering that i386 is
going to be unsupported).

Thanks
Mauro

> 
> > > +   WARN_ON_FPU(fpu !=
> > > this_cpu_read_stable(fpu_fpregs_owner_ctx));
> > > +   fpregs_deactivate(fpu);
> > > +   }
> > > fpregs_activate(fpu);
> > > copy_kernel_to_fpregs(&fpu->state);
> > > trace_x86_fpu_after_restore(fpu);
> > > 
> > > This would not be a proper fix, especially if you end up seeing
> > > that
> > > warning ...
> > > 
> > > Henning
> > >   
> > > > 
> > > > 
> > > >   
> > > 
> > >   
> 
> 



Re: FPU warn on ipipe 4.9.146-8 i386

2019-01-11 Thread Mauro Salvini via Xenomai
On Fri, 2019-01-11 at 17:53 +0800, limingyu via Xenomai wrote:
> Hi, Mauro
> 
> Maybe you could refer to the discussion below, and try the latest
> stable xenomai code branch
> 
> v3.0.x/stable. 
> https://xenomai.org/pipermail/xenomai/2018-December/040086.html
> 
> And the latest xenomai code is here: 
> https://gitlab.denx.de/Xenomai/xenomai/tree/stable/v3.0.x
> 
> On 1/11/19 5:40 PM, Henning Schild via Xenomai wrote:
> > I got this dump in dmesg, sometimes just after latency starts,
> 
> 

Hi Limingyu,

Thank you, but I am already on top of v3.0.x/stable (I'm on the same
version used at
https://xenomai.org/pipermail/xenomai/2018-December/040142.html)

Regards

Mauro




Re: FPU warn on ipipe 4.9.146-8 i386

2019-01-11 Thread Mauro Salvini via Xenomai
On Fri, 2019-01-11 at 10:40 +0100, Henning Schild wrote:
> Am Fri, 11 Jan 2019 09:57:50 +0100
> schrieb Mauro Salvini via Xenomai :
> 
> > Hi all,
> > 
> > I'm testing same hardware of [1], with kernel 4.9.146 from ipipe-
> > 4.9.y
> > with [2] applied, compiled with ARCH=i386 and Xenomai 3.0.7.
> 
> To be honest i386 is not really tested anymore, in fact in 4.14 not
> even supported at the moment. If you can you should go for x86_64.
> 

Hi Henning,

Thank you. I'm trying the i386 version due to legacy 32-bit code that
uses rtnet (which cannot be used with a mixed ABI).

> > Launching
> > 
> > xeno-test -l "dohell -s xxx -p yyy -m xxx 9" -T 9
> > 
> > I got this dump in dmesg, sometimes just after latency starts,
> > sometimes after few seconds (side effect is a max latency value
> > increase):
> > 
> > [  167.914184] [ cut here ]
> > [  167.914208] WARNING: CPU: 0 PID: 606
> > at /home/build-ws/develop/linux-
> > 4.9.146/arch/x86/include/asm/fpu/internal.h:511
> > fpu__restore+0x1eb/0x2b0 [  167.914216] Modules linked in:
> > intel_rapl
> > intel_powerclamp iTCO_wdt iTCO_vendor_support coretemp kvm_intel
> > kvm
> > irqbypass crc32_pclmul aesni_intel xts aes_i586 lrw gf128mul
> > ablk_helper cryptd snd_pcm intel_cstate snd_timer evdev snd
> > soundcore
> > i915 pcspkr drm_kms_helper drm fb_sys_fops syscopyarea sysfillrect
> > sysimgblt shpchp video lpc_ich mfd_core button ip_tables x_tables
> > autofs4 ext4 crc16 jbd2 fscrypto mbcache hid_generic usbhid hid
> > mmc_block crc32c_intel i2c_i801 i2c_smbus igb i2c_algo_bit xhci_pci
> > ptp pps_core xhci_hcd sdhci_pci sdhci usbcore mmc_core fjes [last
> > unloaded: rtnet] [  167.914768] CPU: 0 PID: 606 Comm: dohell Not
> > tainted 4.9.146+ #1 [  167.914772] Hardware name: Default string
> > Default string/Q7-BW, BIOS V1.20#KW050220A 03/16/2018
> > [  167.914775]
> > I-pipe domain: Linux [  167.914778]  f42e5e44 daeffa2d 
> > db335030 dac1ff3b f42e5e74 dac59dea db34504c
> > [  167.914800]  
> > 025e db335030 01ff dac1ff3b 01ff f4291bc0 0246
> > [  167.914822]  f4291c00 f42e5e88 dac59efb 0009 
> > 
> > f42e5ea4 dac1ff3b [  167.914843] Call Trace:
> > [  167.914846]  [] dump_stack+0x9f/0xc2
> > [  167.914849]  [] ? fpu__restore+0x1eb/0x2b0
> > [  167.914865]  [] __warn+0xea/0x110
> > [  167.914868]  [] ? fpu__restore+0x1eb/0x2b0
> > [  167.914871]  [] warn_slowpath_null+0x2b/0x30
> > [  167.914874]  [] fpu__restore+0x1eb/0x2b0
> > [  167.914877]  [] __fpu__restore_sig+0x2ba/0x680
> > [  167.914879]  [] fpu__restore_sig+0x31/0x50
> > [  167.914882]  [] restore_sigcontext.isra.9+0xf2/0x110
> > [  167.914885]  [] sys_sigreturn+0xa9/0xc0
> > [  167.914888]  [] do_int80_syscall_32+0x85/0x190
> > [  167.914891]  [] entry_INT80_32+0x31/0x31
> > [  167.914898]
> > ---[ end trace e57344f10f300a76 ]---
> 
> I am not sure which path leads you there. But it could well be a
> state
> that was caused by the ipipe patch.
> 
> could you try this:
> 
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -426,6 +426,10 @@ void fpu__restore(struct fpu *fpu)
> /* Avoid __kernel_fpu_begin() right after fpregs_activate()
> */
> kernel_fpu_disable();
> trace_x86_fpu_before_restore(fpu);
> +   if (fpregs_activate(fpu)) {

This instruction does not compile because fpregs_activate() returns
void; perhaps you meant "if (fpregs_active(fpu))"?
Given that fpregs_active() takes no arguments, I tried this:

if (fpu->fpregs_active)

and the warning is not raised (not even the warning added by this patch).

> +   WARN_ON_FPU(fpu !=
> this_cpu_read_stable(fpu_fpregs_owner_ctx));
> +   fpregs_deactivate(fpu);
> +   }
> fpregs_activate(fpu);
> copy_kernel_to_fpregs(&fpu->state);
> trace_x86_fpu_after_restore(fpu);
> 
> This would not be a proper fix, especially if you end up seeing that
> warning ...
> 
> Henning
> 
> > I found discussion at [3], and applied patch at [4] that comes from
> > it, but result is the same.
> > 
> > Starting xeno-test without -l argument result is the same.
> > Launching dohell alone (with same arguments as when launched from
> > xeno- test -l), dump does not appear.
> > 
> > Could this be a Xenomai-related problem (though the stack does not
> > seem to concern Xenomai), or is it better to post it on LKML?
> > 
> > Thanks in advance, regards
> > 
> > Mauro
> > 
> > [1] https://xenomai.org/pipermail/xenomai/2018-December/040142.html
> > [2] https://xenomai.org/pipermail/xenomai/2019-January/040172.html
> > [3]
> > https://lore.kernel.org/lkml/20181120102635.ddv3fvavxajjlfqk@linutronix.de/
> > [4]
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-4.9.y&id=d3741e0390287056011950493a641524f49fa05a
> > 
> > 
> 
> 



Re: [PATCHv2] cobalt/x86: fix condition in eager fpu code for kernels < 4.14

2019-01-08 Thread Mauro Salvini via Xenomai
On Tue, 2019-01-08 at 18:51 +0100, Henning Schild wrote:
> Am Tue, 8 Jan 2019 15:17:11 +0100
> schrieb Mauro Salvini :
> 
> > On Mon, 2019-01-07 at 14:04 +0100, Henning Schild wrote:
> > > From: Henning Schild 
> > > 
> > > We should mark the current task as not owning the fpu anymore if
> > > it
> > > does
> > > actually own the fpu, not if the fpu itself is active.
> > > 
> > > Fixes cb52e6c7438fa
> > > 
> > > Reported-by: Mauro Salvini 
> > > Signed-off-by: Henning Schild 
> > > ---
> > >  kernel/cobalt/arch/x86/thread.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/cobalt/arch/x86/thread.c
> > > b/kernel/cobalt/arch/x86/thread.c
> > > index a09560075..ba807ac1e 100644
> > > --- a/kernel/cobalt/arch/x86/thread.c
> > > +++ b/kernel/cobalt/arch/x86/thread.c
> > > @@ -475,7 +475,7 @@ void xnarch_leave_root(struct xnthread *root)
> > >   switch_fpu_finish(&current->thread.fpu, smp_processor_id());
> > >  #else
> > >   /* mark current thread as not owning the FPU anymore */
> > > - if (&current->thread.fpu.fpstate_active)
> > > + if (fpregs_active())
> > > fpregs_deactivate(&current->thread.fpu);
> > >  #endif
> > >  }  
> > 
> > Hi all,
> > 
> > I launched xeno-test several times under the same conditions as
> > before, and I confirm that this patch fixes the bug.
> 
> Good to hear, thanks!
> 
> > A side note: wouldn't the #includes in
> > kernel/cobalt/arch/x86/thread.c at lines 47, 48 and 54 be redundant
> > with the same includes in
> > kernel/cobalt/arch/x86/include/asm/xenomai/thread.h
> > (which is indirectly included in thread.c)?
> 
> The latter does not have those includes. It is still very possible
> that there are a few too many but that is what you have inclusion
> guards
> for. And the code you are looking at 47,48 and 54 is all to support
> legacy/old kernels. You are in the IPIPE_X86_FPU_EAGER case.

Yes, sorry, I wanted to write
kernel/cobalt/arch/x86/include/asm/xenomai/wrappers.h

Anyway, it's no problem to have them in both files.

Regards
Mauro

> 
> Henning
> 
> > Thanks, regards
> > Mauro
> 
> 



Re: [PATCHv2] cobalt/x86: fix condition in eager fpu code for kernels < 4.14

2019-01-08 Thread Mauro Salvini via Xenomai
On Mon, 2019-01-07 at 14:04 +0100, Henning Schild wrote:
> From: Henning Schild 
> 
> We should mark the current task as not owning the fpu anymore if it
> does
> actually own the fpu, not if the fpu itself is active.
> 
> Fixes cb52e6c7438fa
> 
> Reported-by: Mauro Salvini 
> Signed-off-by: Henning Schild 
> ---
>  kernel/cobalt/arch/x86/thread.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/cobalt/arch/x86/thread.c
> b/kernel/cobalt/arch/x86/thread.c
> index a09560075..ba807ac1e 100644
> --- a/kernel/cobalt/arch/x86/thread.c
> +++ b/kernel/cobalt/arch/x86/thread.c
> @@ -475,7 +475,7 @@ void xnarch_leave_root(struct xnthread *root)
>   switch_fpu_finish(&current->thread.fpu, smp_processor_id());
>  #else
>   /* mark current thread as not owning the FPU anymore */
> - if (&current->thread.fpu.fpstate_active)
> + if (fpregs_active())
>   fpregs_deactivate(&current->thread.fpu);
>  #endif
>  }

Hi all,

I launched xeno-test several times under the same conditions as before,
and I confirm that this patch fixes the bug.

A side note: wouldn't the #includes in
kernel/cobalt/arch/x86/thread.c at lines 47, 48 and 54 be redundant
with the same includes in kernel/cobalt/arch/x86/include/asm/xenomai/thread.h
(which is indirectly included in thread.c)?

Thanks, regards
Mauro



Re: [PATCH] cobalt/x86: fix condition in eager fpu code for kernels < 4.14

2019-01-07 Thread Mauro Salvini via Xenomai
On Sat, 2018-12-22 at 12:42 +0100, Jan Kiszka via Xenomai wrote:
> On 21.12.18 14:44, Henning Schild via Xenomai wrote:
> > From: Henning Schild 
> > 
> > We should mark the current task as not owning the fpu anymore if it
> > does
> > actually own the fpu, not if the fpu itself is active.
> > 
> > Fixes cb52e6c7438fa
> > 
> > Reported-by: Mauro Salvini 
> > Signed-off-by: Henning Schild 
> > ---
> >   kernel/cobalt/arch/x86/thread.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/cobalt/arch/x86/thread.c
> > b/kernel/cobalt/arch/x86/thread.c
> > index a09560075..0c6054338 100644
> > --- a/kernel/cobalt/arch/x86/thread.c
> > +++ b/kernel/cobalt/arch/x86/thread.c
> > @@ -475,7 +475,7 @@ void xnarch_leave_root(struct xnthread *root)
> > >     switch_fpu_finish(&current->thread.fpu,
> > > smp_processor_id());
> >   #else
> >     /* mark current thread as not owning the FPU anymore */
> > > -   if (&current->thread.fpu.fpstate_active)
> > > +   if (&current->thread.fpu.fpregs_active)
> 
> Well, if you had used fpregs_active() here, you would also have
> resolved the second bug of that line (which I only spotted now):
> "if (&...)" is always true...

Hi,

I'm trying this fix, I'll report results.

Regards
Mauro

> 
> Jan
> 
> > >     fpregs_deactivate(&current->thread.fpu);
> >   #endif
> >   }
> > 
> 
> 



Re: Stack dump on ipipe 4.9.135-7 x86_64

2018-12-21 Thread Mauro Salvini via Xenomai
On Fri, 2018-12-21 at 16:11 +0100, Mauro Salvini via Xenomai wrote:
> On Fri, 2018-12-21 at 14:51 +0100, Henning Schild wrote:
> > Am Thu, 20 Dec 2018 10:10:29 +0100
> > schrieb Jan Kiszka :
> > 
> > > On 20.12.18 09:28, Mauro Salvini via Xenomai wrote:
> > > > Hi all,
> > > > 
> > > > I'm testing Xenomai 3 on an Intel Braswell board (Atom x5-
> > > > E8000).
> > > > I'm using ipipe kernel at last commit from [1], branch ipipe-
> > > > 4.9.y,
> > > > 64bit build on a Debian Stretch 9.6 64bit.
> > > > Xenomai library is from [2], branch stable/v3.0.x on commit
> > > > bc53d03f (I haven't two last commits but seems not related). I
> > > > tried both 32bit (mixed ABI) and 64bit builds with same
> > > > following
> > > > result.
> > > > 
> > > > I launch:
> > > > 
> > > > xeno-test -l "dohell -s xxx -p yyy -m xxx 9" -T 9
> > > > 
> > > > After a variable time (from minutes to hours) from latency test
> > > > start I get a few overruns that I discovered are generated by a
> > > > kernel stack dump (attached to mail the dmesg tail). Latency
> > > > test
> > > > doesn't stop, and after this stackdump never reports other
> > > > overruns
> > > > or latency peaks (seems I need to reboot to reproduce stack).
> > > > 
> > > > I read in this mailing list that on last patches much work has
> > > > done
> > > > on FPU part, should it be related?
> > > > 
> > > > Glad to give other infos if you need.  
> > > 
> > > Thanks for reporting. Maybe we need your config later on, but I'm
> > > first of all looking at the code, see below.
> > > 
> > > In general, it's better to run the kernel with frame-pointers
> > > enabled
> > > to get more reliable backtraces, at least when an error occurs.
> > > 
> > > > 
> > > > Thanks in advance, regards.
> > > > 
> > > > Mauro
> > > > 
> > > > [1] https://gitlab.denx.de/Xenomai/ipipe
> > > > [2] https://gitlab.denx.de/Xenomai/xenomai
> > > > -- next part --
> > > > [  233.205940] /home/build-user/develop/linux-ipipe-
> > > > 4.9.y/arch/x86/xenomai/include/asm/xenomai/fptest.h:43:
> > > > Warning: Linux is compiled to use FPU in kernel-space. For this
> > > > reason, switchtest can not test using FPU in Linux kernel-
> > > > space.
> > > > [  295.660454] [ cut here ]
> > > > [  295.660461]
> > > > WARNING: CPU: 0 PID: 139
> > > > at /home/build-user/develop/linux-ipipe-
> > > > 4.9.y/arch/x86/include/asm/fpu/internal.h:502
> > > > xnarch_leave_root+0x1a4/0x1b0  
> > > 
> > > Henning, the kernel checks fpu->fpregs_active here and finds it
> > > off
> > > while Xenomai looks at fpu.fpstate_active - intentionally or by
> > > accident?
> > 
> > That was by accident. In eager mode they should mostly be in sync,
> > unless when playing the nasty tricks ipipe has to play. I guess
> > there
> > is a path where we tried to "unown" a task that we unowned shortly
> > before. I did not try to find that path, i just sent a patch.
> > 
> > Mauro, since you can reproduce the problem you can probably tell if
> > the
> > patch fixes it.
> 
> Hi Henning,
> 
> I tried the patch but the result is the same (see attached stack dump).
> 
> I also attached my config in case it helps.
> Now I'm compiling a kernel with frame pointers enabled (as suggested
> by Jan).
> 

Hi all,

attached is the stack dump with frame pointers activated (it seems very
similar to the previous one, but I checked and CONFIG_FRAME_POINTER is
active in the config... did I make a mistake?).

Unfortunately, I will not be able to do other tests in the next few
weeks (season's holidays), but feel free to suggest tests to run when I
come back.

By the way, best wishes to all.

Mauro
-- next part --
[  145.774418] 
/home/build-user/develop/linux-ipipe-4.9.y/arch/x86/xenomai/include/asm/xenomai/fptest.h:43:
 Warning: Linux is compiled to use FPU in kernel-space.
   For this reason, switchtest can not test using FPU in Linux 
kernel-space.
[  181.969937] [ cut here ]
[  181.969944] WARNING: CPU: 0 PID: 139 at 
/home/build-user/develop/linux-ipipe-4.9.y/arch/x86/include/asm/fpu/internal.h:502
 xnarch_leave_root+0x15e/0x1

Re: Stack dump on ipipe 4.9.135-7 x86_64

2018-12-21 Thread Mauro Salvini via Xenomai
On Fri, 2018-12-21 at 14:51 +0100, Henning Schild wrote:
> Am Thu, 20 Dec 2018 10:10:29 +0100
> schrieb Jan Kiszka :
> 
> > On 20.12.18 09:28, Mauro Salvini via Xenomai wrote:
> > > Hi all,
> > > 
> > > I'm testing Xenomai 3 on an Intel Braswell board (Atom x5-E8000).
> > > I'm using ipipe kernel at last commit from [1], branch ipipe-
> > > 4.9.y,
> > > 64bit build on a Debian Stretch 9.6 64bit.
> > > Xenomai library is from [2], branch stable/v3.0.x on commit
> > > bc53d03f (I haven't two last commits but seems not related). I
> > > tried both 32bit (mixed ABI) and 64bit builds with same following
> > > result.
> > > 
> > > I launch:
> > > 
> > > xeno-test -l "dohell -s xxx -p yyy -m xxx 9" -T 9
> > > 
> > > After a variable time (from minutes to hours) from latency test
> > > start I get a few overruns that I discovered are generated by a
> > > kernel stack dump (attached to mail the dmesg tail). Latency test
> > > doesn't stop, and after this stackdump never reports other
> > > overruns
> > > or latency peaks (seems I need to reboot to reproduce stack).
> > > 
> > > I read in this mailing list that on last patches much work has
> > > done
> > > on FPU part, should it be related?
> > > 
> > > Glad to give other infos if you need.  
> > 
> > Thanks for reporting. Maybe we need your config later on, but I'm
> > first of all looking at the code, see below.
> > 
> > In general, it's better to run the kernel with frame-pointers
> > enabled
> > to get more reliable backtraces, at least when an error occurs.
> > 
> > > 
> > > Thanks in advance, regards.
> > > 
> > > Mauro
> > > 
> > > [1] https://gitlab.denx.de/Xenomai/ipipe
> > > [2] https://gitlab.denx.de/Xenomai/xenomai
> > > -- next part --
> > > [  233.205940] /home/build-user/develop/linux-ipipe-
> > > 4.9.y/arch/x86/xenomai/include/asm/xenomai/fptest.h:43:
> > > Warning: Linux is compiled to use FPU in kernel-space. For this
> > > reason, switchtest can not test using FPU in Linux kernel-space.
> > > [  295.660454] [ cut here ]
> > > [  295.660461]
> > > WARNING: CPU: 0 PID: 139
> > > at /home/build-user/develop/linux-ipipe-
> > > 4.9.y/arch/x86/include/asm/fpu/internal.h:502
> > > xnarch_leave_root+0x1a4/0x1b0  
> > 
> > Henning, the kernel checks fpu->fpregs_active here and finds it off
> > while Xenomai looks at fpu.fpstate_active - intentionally or by
> > accident?
> 
> That was by accident. In eager mode they should mostly be in sync,
> unless when playing the nasty tricks ipipe has to play. I guess there
> is a path where we tried to "unown" a task that we unowned shortly
> before. I did not try to find that path, i just sent a patch.
> 
> Mauro, since you can reproduce the problem you can probably tell if
> the
> patch fixes it.

Hi Henning,

I tried the patch but the result is the same (see attached stack dump).

I also attached my config in case it helps.
Now I'm compiling a kernel with frame pointers enabled (as suggested by
Jan).

Thanks in advance

Mauro

> 
> Henning
> 
> > Jan
> > 
> > > [  295.660465] Modules linked in: binfmt_misc msr iTCO_wdt
> > > iTCO_vendor_support coretemp kvm_intel kvm irqbypass
> > > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
> > > aes_x86_64 lrw snd_pcm gf128mul glue_helper i915 ablk_helper
> > > snd_timer cryptd snd soundcore pcspkr drm_kms_helper drm evdev
> > > lpc_ich mfd_core fb_sys_fops syscopyarea sysfillrect sysimgblt sg
> > > shpchp video button ip_tables x_tables autofs4 ext4 crc16 jbd2
> > > fscrypto mbcache sd_mod hid_generic usbhid hid crc32c_intel ahci
> > > igb i2c_i801 libahci i2c_smbus i2c_algo_bit xhci_pci dca ptp
> > > pps_core libata xhci_hcd sdhci_pci sdhci usbcore scsi_mod
> > > mmc_core
> > > fjes [last unloaded: rtnet] [  295.660660] CPU: 0 PID: 139 Comm:
> > > jbd2/sda1-8 Tainted: GW   4.9.135+ #1 [  295.660663]
> > > Hardware name: Default string Default string/Q7-BW, BIOS
> > > V1.20#KW050220A 03/16/2018 [  295.660666] I-pipe domain: Xenomai
> > > [  295.660668]   823402c2
> > > 
> > >  [  295.660680]  829cd400
> > > 8206cb68
> > > 985636908040 985636504080
> > > [  295.660706]  

Stack dump on ipipe 4.9.135-7 x86_64

2018-12-20 Thread Mauro Salvini via Xenomai
Hi all,

I'm testing Xenomai 3 on an Intel Braswell board (Atom x5-E8000).
I'm using ipipe kernel at last commit from [1], branch ipipe-4.9.y,
64bit build on a Debian Stretch 9.6 64bit.
The Xenomai library is from [2], branch stable/v3.0.x on commit bc53d03f
(I don't have the two last commits, but they seem unrelated). I tried
both 32-bit (mixed ABI) and 64-bit builds with the same result:

I launch:

xeno-test -l "dohell -s xxx -p yyy -m xxx 9" -T 9

After a variable time (from minutes to hours) after the latency test
starts, I get a few overruns that I discovered are generated by a
kernel stack dump (the dmesg tail is attached to the mail). The latency
test doesn't stop, and after this stack dump it never reports other
overruns or latency peaks (it seems I need to reboot to reproduce it).

I read in this mailing list that in the last patches much work has been
done on the FPU part; could it be related?

Glad to give more info if you need it.

Thanks in advance, regards.

Mauro

[1] https://gitlab.denx.de/Xenomai/ipipe
[2] https://gitlab.denx.de/Xenomai/xenomai
-- next part --
[  233.205940] 
/home/build-user/develop/linux-ipipe-4.9.y/arch/x86/xenomai/include/asm/xenomai/fptest.h:43:
 Warning: Linux is compiled to use FPU in kernel-space.
   For this reason, switchtest can not test using FPU in Linux 
kernel-space.
[  295.660454] [ cut here ]
[  295.660461] WARNING: CPU: 0 PID: 139 at 
/home/build-user/develop/linux-ipipe-4.9.y/arch/x86/include/asm/fpu/internal.h:502
 xnarch_leave_root+0x1a4/0x1b0
[  295.660465] Modules linked in: binfmt_misc msr iTCO_wdt iTCO_vendor_support 
coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel aes_x86_64 lrw snd_pcm gf128mul glue_helper 
i915 ablk_helper snd_timer cryptd snd soundcore pcspkr drm_kms_helper drm evdev 
lpc_ich mfd_core fb_sys_fops syscopyarea sysfillrect sysimgblt sg shpchp video 
button ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache sd_mod 
hid_generic usbhid hid crc32c_intel ahci igb i2c_i801 libahci i2c_smbus 
i2c_algo_bit xhci_pci dca ptp pps_core libata xhci_hcd sdhci_pci sdhci usbcore 
scsi_mod mmc_core fjes [last unloaded: rtnet]
[  295.660660] CPU: 0 PID: 139 Comm: jbd2/sda1-8 Tainted: GW   
4.9.135+ #1
[  295.660663] Hardware name: Default string Default string/Q7-BW, BIOS 
V1.20#KW050220A 03/16/2018
[  295.660666] I-pipe domain: Xenomai
[  295.660668]   823402c2  

[  295.660680]  829cd400 8206cb68 985636908040 
985636504080
[  295.660706]  00099f50  bc1580563040 
98567ac9b940
[  295.660718] Call Trace:
[  295.660721]  [] ? dump_stack+0xb5/0xd3
[  295.660724]  [] ? __warn+0xc8/0xf0
[  295.660727]  [] ? xnarch_leave_root+0x1a4/0x1b0
[  295.660729]  [] ? ___xnsched_run+0x3f6/0x4b0
[  295.660732]  [] ? timerfd_handler+0x36/0x50
[  295.660735]  [] ? xntimerq_insert+0x5/0xa0
[  295.660738]  [] ? xnclock_tick+0x1a9/0x2c0
[  295.660740]  [] ? xnintr_core_clock_handler+0x2fa/0x310
[  295.660743]  [] ? dispatch_irq_head+0x8a/0x120
[  295.660746]  [] ? __ipipe_handle_irq+0x85/0x1b0
[  295.660749]  [] ? apic_timer_interrupt+0x89/0xb0
[  295.660752]  [] ? continue_block+0x22/0x54 [crc32c_intel]
[  295.660754]  [] ? crc32c_pcl_intel_update+0x53/0x60 
[crc32c_intel]
[  295.660757]  [] ? 
jbd2_journal_commit_transaction+0x9e0/0x1850 [jbd2]
[  295.660760]  [] ? __switch_to_asm+0x40/0x70
[  295.660763]  [] ? try_to_del_timer_sync+0x4f/0x80
[  295.660766]  [] ? kjournald2+0xe2/0x290 [jbd2]
[  295.660768]  [] ? wake_up_atomic_t+0x30/0x30
[  295.660771]  [] ? commit_timeout+0x10/0x10 [jbd2]
[  295.660774]  [] ? kthread+0xf5/0x110
[  295.660776]  [] ? __switch_to_asm+0x40/0x70
[  295.660779]  [] ? kthread_park+0x60/0x60
[  295.660782]  [] ? ret_from_fork+0x55/0x60
[  295.660785] ---[ end trace b1bfa97fc17a203c ]---
[  705.768058] perf: interrupt took too long (2543 > 2500), lowering 
kernel.perf_event_max_sample_rate to 78500
[  865.772646] perf: interrupt took too long (3207 > 3178), lowering 
kernel.perf_event_max_sample_rate to 62250


Re: rtping differences between master and slaves

2018-11-23 Thread Mauro Salvini via Xenomai
On Fri, 2018-11-23 at 13:55 +0100, Jan Kiszka wrote:
> On 23.11.18 13:49, Mauro Salvini wrote:
> > On Fri, 2018-11-23 at 12:49 +0100, Jan Kiszka wrote:
> > > On 23.11.18 11:51, Mauro Salvini via Xenomai wrote:
> > > > Hi all,
> > > > 
> > > > I'm trying RTNet (an old version: 0.9.13) on Xenomai (yet
> > > > another old version, 2.5.6: unfortunately I cannot move to
> > > > newer versions).
> > > > 
> > > > My setup has 3 identical nodes on isolated RT network: cycle
> > > > time
> > > > of
> > > > 5000us, master with slot offset 0, slave A with slot offset
> > > > 200us
> > > > and
> > > > slave B with slot offset 400us; network is configured using
> > > > rtnet
> > > > utility script.
> > > > 
> > > > Using rtping I observe two different behaviors on slaves and
> > > > master.
> > > > 
> > > > For example, pinging slave A from B:
> > > > 
> > > > ...
> > > > 64 bytes from 10.0.43.91: icmp_seq=1 time=.7 us
> > > > 64 bytes from 10.0.43.91: icmp_seq=2 time=7846.3 us
> > > > 64 bytes from 10.0.43.91: icmp_seq=3 time=7966.6 us
> > > > 64 bytes from 10.0.43.91: icmp_seq=4 time=8096.6 us
> > > > ...
> > > > 64 bytes from 10.0.43.91: icmp_seq=15 time=9461.7 us
> > > > 64 bytes from 10.0.43.91: icmp_seq=16 time=9604.2 us
> > > > 64 bytes from 10.0.43.91: icmp_seq=17 time=4719.5 us
> > > > 64 bytes from 10.0.43.91: icmp_seq=18 time=4844.7 us
> > > > 64 bytes from 10.0.43.91: icmp_seq=19 time=4968.3 us
> > > > 64 bytes from 10.0.43.91: icmp_seq=20 time=5124.4 us
> > > > ...
> > > > 64 bytes from 10.0.43.91: icmp_seq=53 time=9215.4 us
> > > > 64 bytes from 10.0.43.91: icmp_seq=54 time=9331.5 us
> > > > 64 bytes from 10.0.43.91: icmp_seq=55 time=9461.0 us
> > > > 64 bytes from 10.0.43.91: icmp_seq=56 time=9585.3 us
> > > > 64 bytes from 10.0.43.91: icmp_seq=57 time=4712.5 us
> > > > 64 bytes from 10.0.43.91: icmp_seq=58 time=4847.0 us
> > > > 64 bytes from 10.0.43.91: icmp_seq=59 time=4967.2 us
> > > > ...
> > > > 
> > > > so the time varies between a minimum equal to the time distance
> > > > between the two slots and a maximum equal to that distance plus
> > > > the cycle duration. This makes sense if the Linux timer used to
> > > > generate the ping request in rtping.c slowly drifts with respect
> > > > to the real-time timer that governs the tx slot, so the
> > > > round-trip time changes between requests depending on where the
> > > > request falls within the TDMA cycle. The behavior is the same in
> > > > the other direction, and when pinging the master from A or B.
> > > > 
> > > > On the master I instead observe this (pinging either slave A or
> > > > B):
> > > > 
> > > > ...
> > > > 64 bytes from 10.0.43.89: icmp_seq=1 time=2434.1 us
> > > > 64 bytes from 10.0.43.89: icmp_seq=2 time=2443.1 us
> > > > 64 bytes from 10.0.43.89: icmp_seq=3 time=2438.7 us
> > > > 64 bytes from 10.0.43.89: icmp_seq=4 time=2450.0 us
> > > > 64 bytes from 10.0.43.89: icmp_seq=5 time=2447.9 us
> > > > 64 bytes from 10.0.43.89: icmp_seq=6 time=2450.6 us
> > > > 64 bytes from 10.0.43.89: icmp_seq=7 time=2428.5 us
> > > > 64 bytes from 10.0.43.89: icmp_seq=8 time=2442.8 us
> > > > 64 bytes from 10.0.43.89: icmp_seq=9 time=2442.3 us
> > > > ...
> > > > 
> > > > the ping duration is constant within a single rtping execution
> > > > (it changes between different executions).
> > > > 
> > > > So I'm puzzled by this difference, and I wonder whether it is
> > > > normal or whether it could indicate a problem.
> > > > 
> > > 
> > > As you are running RTmac/TDMA: The latency is affected by when
> > > during
> > > some cycle
> > > you issue the ICMP request. This is not synchronized with the
> > > cycle,
> > > so you will
> > > see random shifts there already. Furthermore, if station A has a
> > > time
> > > slot
> > > before station B and A issues the ping, B may have a chance to
> > > reply
> > > in the same
> > > cycle. If you change roles, this is surely not possible, thus you
> > > get
> > > different
> > > round trip times.
> > 
> > Thanks Jan for the quick answer.
> > 
> > Yes, I understood your explanation, which perfectly clarifies the
> > rtping behavior between A and B.
> > 
> > About the constant ping duration when the master pings a slave: is
> > it due to the fact that on the master the ICMP requests are
> > synchronized with the cycle?
> 
> I don't recall the details, but chances are high that, because the
> master drives 
> the TDMA cycle, its rtping loop happens to remain in sync with that
> cycle.

Ok, thank you very much Jan.

Regards.

Mauro




rtping differences between master and slaves

2018-11-23 Thread Mauro Salvini via Xenomai
Hi all,

I'm trying RTNet (an old version: 0.9.13) on Xenomai (yet another old
version, 2.5.6; unfortunately I cannot move to a newer version).

My setup has 3 identical nodes on an isolated RT network: a cycle time
of 5000us, the master with slot offset 0, slave A with slot offset 200us
and slave B with slot offset 400us; the network is configured using the
rtnet utility script.

Using rtping I observe two different behaviors on slaves and master.

For example, pinging slave A from B:

...
64 bytes from 10.0.43.91: icmp_seq=1 time=.7 us
64 bytes from 10.0.43.91: icmp_seq=2 time=7846.3 us
64 bytes from 10.0.43.91: icmp_seq=3 time=7966.6 us
64 bytes from 10.0.43.91: icmp_seq=4 time=8096.6 us
...
64 bytes from 10.0.43.91: icmp_seq=15 time=9461.7 us
64 bytes from 10.0.43.91: icmp_seq=16 time=9604.2 us
64 bytes from 10.0.43.91: icmp_seq=17 time=4719.5 us
64 bytes from 10.0.43.91: icmp_seq=18 time=4844.7 us
64 bytes from 10.0.43.91: icmp_seq=19 time=4968.3 us
64 bytes from 10.0.43.91: icmp_seq=20 time=5124.4 us
...
64 bytes from 10.0.43.91: icmp_seq=53 time=9215.4 us
64 bytes from 10.0.43.91: icmp_seq=54 time=9331.5 us
64 bytes from 10.0.43.91: icmp_seq=55 time=9461.0 us
64 bytes from 10.0.43.91: icmp_seq=56 time=9585.3 us
64 bytes from 10.0.43.91: icmp_seq=57 time=4712.5 us
64 bytes from 10.0.43.91: icmp_seq=58 time=4847.0 us
64 bytes from 10.0.43.91: icmp_seq=59 time=4967.2 us
...

so the time varies between a minimum equal to the time distance between
the two slots and a maximum equal to that distance plus the cycle
duration. This makes sense if the Linux timer used to generate the ping
request in rtping.c slowly drifts with respect to the real-time timer
that governs the tx slot, so the round-trip time changes between
requests depending on where the request falls within the TDMA cycle. The
behavior is the same in the other direction, and when pinging the master
from A or B.

On the master I instead observe this (pinging either slave A or B):

...
64 bytes from 10.0.43.89: icmp_seq=1 time=2434.1 us
64 bytes from 10.0.43.89: icmp_seq=2 time=2443.1 us
64 bytes from 10.0.43.89: icmp_seq=3 time=2438.7 us
64 bytes from 10.0.43.89: icmp_seq=4 time=2450.0 us
64 bytes from 10.0.43.89: icmp_seq=5 time=2447.9 us
64 bytes from 10.0.43.89: icmp_seq=6 time=2450.6 us
64 bytes from 10.0.43.89: icmp_seq=7 time=2428.5 us
64 bytes from 10.0.43.89: icmp_seq=8 time=2442.8 us
64 bytes from 10.0.43.89: icmp_seq=9 time=2442.3 us
...

the ping duration is constant within a single rtping execution (it
changes between different executions).

So I'm puzzled by this difference, and I wonder whether it is normal or
whether it could indicate a problem.

Thanks in advance, regards

Mauro



[PATCH] copperplate/clockobj: prevent warning with -Wconversion

2018-11-05 Thread Mauro Salvini via Xenomai
Make the cast explicit to avoid a warning when user code is compiled
with -Wconversion.

Signed-off-by: Mauro Salvini 
---
 include/copperplate/clockobj.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/copperplate/clockobj.h b/include/copperplate/clockobj.h
index 8568794d9..24c748557 100644
--- a/include/copperplate/clockobj.h
+++ b/include/copperplate/clockobj.h
@@ -136,7 +136,7 @@ void clockobj_ns_to_timespec(ticks_t ns, struct timespec *ts)
 {
unsigned long rem;
 
-   ts->tv_sec = cobalt_divrem_billion(ns, &rem);
+   ts->tv_sec = (time_t)cobalt_divrem_billion(ns, &rem);
ts->tv_nsec = rem;
 }
 
-- 
2.11.0