On update to 3.1, select() starts returning errno == EADV

2021-06-06 Thread steve freyder via Xenomai

Hello Xenomai group,

We're in the process of moving from Xenomai 3.0.7/4.1.18 to 
3.1/4.14.85.  Testing one of our apps we found that select() was 
returning -1 with errno == EADV.  This app was written before we knew 
about the recommendation from Philippe contained in this message:



https://xenomai.org/pipermail/xenomai/2019-August/041435.html


We're not mixing Linux/Xenomai descriptors in the select set, the fd's 
(4 of them in this case) are all pointing to Linux FIFOs 
(/dev/somename{1..4}), and we'll ultimately start using __STD(select) 
like the above msg says we should (until this was found we were unaware 
of the importance of using __STD(select)).


But my reason for posting is my confusion about the EADV return.  The 
message says:


>>> Switch from -EBADF to -EADV to notify userland that RTDM does not
>>> manage a file descriptor, which cannot conflict with any sensible
>>> error code the Cobalt core or any RTDM driver may return.

But also says:

> We still want to receive -EBADF on wrong fildes appearing in select()
> descriptors.

Taking both of those into account, I looked at the code in 
lib/cobalt/select.c which implements the glibc fallback:


    if (err == -EBADF || err == -EPERM || err == -ENOSYS)
        return __STD(select(__nfds, __readfds,
                            __writefds, __exceptfds, __timeout));

In the specific case of our app, had the above code included a check for 
-EADV, things would operate as they did under 3.0.7.  However, the 
statement "still want to receive -EBADF on wrong fildes appearing in 
select" suggests that the check for -EADV shouldn't be necessary.  There 
must be something at a lower level that is returning -EADV instead of 
-EBADF when faced with a Linux FIFO fd, or, if -EADV is appropriate then 
it seems adding a check for -EADV along with -EBADF, -EPERM, and -ENOSYS 
is also appropriate if backwards compatibility is desired for select().
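
To make the second option concrete, here is a minimal sketch of the fallback test with -EADV added. The helper name select_should_fall_back() is hypothetical and is not part of libcobalt; it only mirrors the condition quoted above from lib/cobalt/select.c, and whether -EADV really belongs in that list is exactly the open question.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of libcobalt: returns true when an error
 * from the Cobalt select() path should be retried through glibc's select().
 */
static bool select_should_fall_back(int err)
{
	switch (err) {
	case -EBADF:	/* the three codes checked by the 3.1 wrapper */
	case -EPERM:
	case -ENOSYS:
	case -EADV:	/* assumed addition: fd is not managed by RTDM */
		return true;
	default:
		return false;
	}
}
```

With such a check in place, a select set made only of Linux FIFO descriptors would reach __STD(select) as it did under 3.0.7.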



Thank you,

Best regards,

Steve






Re: ipipe 5.4.107 / 5.4.93 build issues on arm32

2021-04-14 Thread steve freyder via Xenomai

On 4/14/2021 4:50 PM, Greg Gallagher via Xenomai wrote:

On Wed, Apr 14, 2021 at 4:52 PM Thomas Petazzoni <
thomas.petazz...@bootlin.com> wrote:


On Wed, 14 Apr 2021 14:41:29 -0400
Greg Gallagher  wrote:


Ipipe 5.4 and xenomai 3.1 are compatible, this is my mistake, I’ll fix up
and generate a new patch once the latency fix is done.
FWIW, building straight from the ipipe-arm repo on the ipipe/5.4.y branch
has been working for me.

I'm afraid the HEAD of ipipe/5.4.y as of commit
ffaf274ca4cc117c2dc3f9b2ee8ea6218b50995a doesn't build for me. I'm on
this commit + I have prepared the kernel using Xenomai 3.1
./scripts/prepare-kernel.sh --linux=/path/to/kernel + kernel configured
with sama5_defconfig.

And same build failure:

In file included from kernel/cpu.c:23:0:
./include/linux/stop_machine.h: In function ‘stop_machine_cpuslocked’:
./include/linux/stop_machine.h:150:2: error: implicit declaration of
function ‘hard_irq_enable’; did you mean ‘hard_irq_disable’?
[-Werror=implicit-function-declaration]
   hard_irq_enable();
   ^~~
   hard_irq_disable

Best regards,

Thomas
--
Thomas Petazzoni, co-owner and CEO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


I'll dig into this tonight, I usually test with multi_v7_defconfig and it
seems to build fine with ipipe 5.4 and Xenomai 3.1.  I'll test with
sama5_defconfig.

Thanks

Greg


Greg,


We are also having issues with ARM32 (armv7-a) with this build genre 
(5.4.107 + Xenomai 3.1 stable) and the e1000e driver.  The build 
completes without compile/link errors, but when booting we're getting a 
kernel stack dump (upon an ifup of the e1000e interface) that's 
indicating a problem with enabling interrupts at a point where 
interrupts are not supposed to be enabled.  We're not seeing a lot of 
changes to the e1000e tree since 4.14.85 (last release we know of where 
the driver was working properly), so we suspected it might be something 
in Linux or ipipe.  We know that e1000e seems to work properly with 
kernel 5.4.107 (non-Xenomai/ipipe build).



Do you have access to an ARMv7 system that has an e1000e NIC?  If so, 
could you add the e1000e driver to your modules build and see whether 
you get the same stack trace we're getting?  Below is the original post 
I made about this (on 03/15/2021).  We suspect that e1000e doesn't work 
on any armv7 5.4.107 + Xenomai 3.1/stable build.  Naturally we'll be 
happy to test any patches related to these issues.



Best regards,

Steve



Greetings Xenomai list,


We are seeing the following stack trace when we 'ifup' the e1000e 
adapter.  4.19 exhibits same failure, 4.14.96 also fails but with a 
different stack trace.  Last working version is 4.14.85.


Thanks in advance for any assist,

Steve



[   47.635779] [ cut here ]
[   47.637083] ip (658) used greatest stack depth: 3960 bytes left
[   47.640416] WARNING: CPU: 0 PID: 0 at kernel/ipipe/core.c:1968 
__ipipe_spin_unlock_debug+0x50/0x5c

[   47.655297] Modules linked in: xeno_gpio_mxc e1000e
[   47.660198] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
5.4.93-00202-g1070d76ae3f5-dirty #7

[   47.668386] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[   47.674923] I-pipe domain: Linux
[   47.678159] Backtrace:
[   47.680620] [<80992f10>] (dump_backtrace) from [<8099328c>] 
(show_stack+0x20/0x24)

[   47.688199]  r7:81238180 r6:0080 r5: r4:811a66f0
[   47.693867] [<8099326c>] (show_stack) from [<8099cac0>] 
(dump_stack+0xf4/0x128)
[   47.701184] [<8099c9cc>] (dump_stack) from [<8012ba1c>] 
(__warn+0xd0/0x108)
[   47.708154]  r9: r8:0009 r7:07b0 r6:801e1448 
r5:0009 r4:80c02be0
[   47.715905] [<8012b94c>] (__warn) from [<809942b8>] 
(warn_slowpath_fmt+0x6c/0xc4)

[   47.723396]  r7:801e1448 r6:07b0 r5:80c02be0 r4:
[   47.729065] [<80994250>] (warn_slowpath_fmt) from [<801e1448>] 
(__ipipe_spin_unlock_debug+0x50/0x5c)

[   47.738206]  r8: r7: r6:9d6c r5:bf54b4c0 r4:bd354580
[   47.744917] [<801e13f8>] (__ipipe_spin_unlock_debug) from 
[<801a2d90>] (mod_timer+0x190/0x400)
[   47.753537] [<801a2c00>] (mod_timer) from [<7f01a5b8>] 
(e1000_msix_other+0xac/0xb8 [e1000e])
[   47.761984]  r10:80e748d4 r9:0001 r8:81101d78 r7:0137 
r6:bd3549e4 r5:8104

[   47.769822]  r4:bd354000
[   47.772365] [<7f01a50c>] (e1000_msix_other [e1000e]) from 
[<801859b8>] (__handle_irq_event_percpu+0x64/0x240)

[   47.782289]  r7:0137 r6: r5:bd23e870 r4:bc09aa80
[   47.787957] [<80185954>] (__handle_irq_event_percpu) from 
[<80185bd0>] (handle_irq_event_percpu+0x3c/0x98)
[   47.797618]  r10:80e748d4 r9:0001 r8:81106fe8 r7:c0faf65c 
r6:bd23e870 r5:bd23e870

[   47.805456]  r4:bd23e800
[   47.807999] [<80185b94>] (handle_irq_event_percpu) from [<80185c74>] 
(handle_irq_event+0x48/0x6c)

[   47.816878] 

Re: 回复: Shared Memory by using RT_HEAP

2021-04-01 Thread steve freyder via Xenomai
RT tasks can create/map standard Linux shared memory using either 
mmap(), or shm_open().  However, if you need dynamic memory management 
inside that shared memory, you'd have to add that, along with any 
required locking, so in that case you're better off using rt_heap_xxx() 
routines.


But, if all you need is a fixed-size memory block shared by multiple 
processes (like using H_SINGLE with rt_heap_create() would do), then 
these calls will do the job.  Normal Linux (non-Xenomai) processes can 
also map to these shared memory blocks using the same calls.
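
A minimal sketch of the fixed-size case, using only the standard POSIX calls shm_open(), ftruncate() and mmap(). The helper name map_shared_block() and the object name in the usage note are made up for illustration. Every process that calls it with the same name and size sees the same memory. Since these are regular Linux calls, it is probably best to do the mapping during initialization rather than in a time-critical loop.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a fixed-size POSIX shared memory block, creating it if needed.
 * Returns NULL on failure, with errno set by the failing call.
 */
static void *map_shared_block(const char *name, size_t size)
{
	void *p;
	int fd = shm_open(name, O_CREAT | O_RDWR, 0660);

	if (fd < 0)
		return NULL;

	/* Size the object; harmless if another process already did so. */
	if (ftruncate(fd, (off_t)size) < 0) {
		close(fd);
		return NULL;
	}

	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after the descriptor is closed */

	return p == MAP_FAILED ? NULL : p;
}
```

For example, two programs could each call map_shared_block("/myblock", 4096) and exchange data through the returned pointer; any locking between them is still up to the application.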


Steve


On 4/1/2021 10:02 AM, Peter Laurich via Xenomai wrote:
No other way that I am aware of. I was troubled by the --session and 
--mem-pool-size switches for a long time as well.


Peter

On 4/1/2021 10:44 AM, zy wrote:

Hi Peter,

You solved what has troubled me for a long time. Thank you for your 
reply.

This is the first time I have learned about the concept of a session.
In addition, I would like to know whether there is any other method.

Best regards,
Zhou Yang

-- Original message --
*From:* "Peter Laurich"
*Date:* 2021-04-01 8:43
*To:* "xenomai"
*Subject:* Re: Shared Memory by using RT_HEAP

Hello,

I suspect that your problem may be the same as I had earlier. If you are
sharing heaps then you have more than one program that needs to have the
same Xenomai session. Here is how I spawn two programs that share the
same heap and other Xenomai objects:

    ./initSys  --session=akamina --mem-pool-size=20M &
    ./sacqScan --session=akamina --mem-pool-size=20M &

where initSys and sacqScan are two of my programs, I have chosen to name
the session akamina and I wanted a memory pool size of 20M (rather than
the default of 1M).

Good luck with your upgrade. I have found the upgrade from 2 to be
incredibly painful and I am now stuck a little further down the path
than you. I found that I lost functionality that I relied on (e.g.
exchanging events and messages between kernel drivers and user-space
programs) that is not easy to replace. I looked at staying with Xenomai
2 but I do need a more recent kernel so I am stuck.

Good luck.

Peter


On 3/31/2021 9:25 PM, zy via Xenomai wrote:
> Hi all,
>
> On Xenomai-2.6.5, I use rt_heap_create with the flag "H_SHARED" to create
> an rt_heap in one process and call rt_heap_bind to get this rt_heap by the
> same name in another process. I use the rt_heap as shared memory between
> processes. But on Xenomai-3.1, I find this method does not work. When
> calling rt_heap_bind with "TM_INFINITE", the function doesn't return; with
> "TM_NONBLOCK", it returns -11. I searched this mailing list and found that
> "--enable-pshared" should be configured before installation, and I did
> that, but it doesn't work.
> Can anyone give me a detailed solution? Or can shared memory not be
> created by RT_HEAP on Xenomai-3?
>
> Best regards,
> Zhou Yang


Kernel 5.4 xenomai 3.1 -stable, e1000e, NIC crash on ifup

2021-03-15 Thread steve freyder via Xenomai

Greetings Xenomai list,


We are seeing the following stack trace when we 'ifup' the e1000e 
adapter.  4.19 exhibits same failure, 4.14.96 also fails but with a 
different stack trace.  Last working version is 4.14.85.


Thanks in advance for any assist,

Steve



[   47.635779] [ cut here ]
[   47.637083] ip (658) used greatest stack depth: 3960 bytes left
[   47.640416] WARNING: CPU: 0 PID: 0 at kernel/ipipe/core.c:1968 
__ipipe_spin_unlock_debug+0x50/0x5c

[   47.655297] Modules linked in: xeno_gpio_mxc e1000e
[   47.660198] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
5.4.93-00202-g1070d76ae3f5-dirty #7

[   47.668386] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[   47.674923] I-pipe domain: Linux
[   47.678159] Backtrace:
[   47.680620] [<80992f10>] (dump_backtrace) from [<8099328c>] 
(show_stack+0x20/0x24)

[   47.688199]  r7:81238180 r6:0080 r5: r4:811a66f0
[   47.693867] [<8099326c>] (show_stack) from [<8099cac0>] 
(dump_stack+0xf4/0x128)
[   47.701184] [<8099c9cc>] (dump_stack) from [<8012ba1c>] 
(__warn+0xd0/0x108)
[   47.708154]  r9: r8:0009 r7:07b0 r6:801e1448 
r5:0009 r4:80c02be0
[   47.715905] [<8012b94c>] (__warn) from [<809942b8>] 
(warn_slowpath_fmt+0x6c/0xc4)

[   47.723396]  r7:801e1448 r6:07b0 r5:80c02be0 r4:
[   47.729065] [<80994250>] (warn_slowpath_fmt) from [<801e1448>] 
(__ipipe_spin_unlock_debug+0x50/0x5c)

[   47.738206]  r8: r7: r6:9d6c r5:bf54b4c0 r4:bd354580
[   47.744917] [<801e13f8>] (__ipipe_spin_unlock_debug) from 
[<801a2d90>] (mod_timer+0x190/0x400)
[   47.753537] [<801a2c00>] (mod_timer) from [<7f01a5b8>] 
(e1000_msix_other+0xac/0xb8 [e1000e])
[   47.761984]  r10:80e748d4 r9:0001 r8:81101d78 r7:0137 
r6:bd3549e4 r5:8104

[   47.769822]  r4:bd354000
[   47.772365] [<7f01a50c>] (e1000_msix_other [e1000e]) from 
[<801859b8>] (__handle_irq_event_percpu+0x64/0x240)

[   47.782289]  r7:0137 r6: r5:bd23e870 r4:bc09aa80
[   47.787957] [<80185954>] (__handle_irq_event_percpu) from 
[<80185bd0>] (handle_irq_event_percpu+0x3c/0x98)
[   47.797618]  r10:80e748d4 r9:0001 r8:81106fe8 r7:c0faf65c 
r6:bd23e870 r5:bd23e870

[   47.805456]  r4:bd23e800
[   47.807999] [<80185b94>] (handle_irq_event_percpu) from [<80185c74>] 
(handle_irq_event+0x48/0x6c)

[   47.816878]  r5:bd23e870 r4:bd23e800
[   47.820463] [<80185c2c>] (handle_irq_event) from [<8018ba84>] 
(handle_edge_irq+0xb4/0x1f8)

[   47.828735]  r7:c0faf65c r6:80e8a410 r5:bd23e870 r4:bd23e800
[   47.834403] [<8018b9d0>] (handle_edge_irq) from [<801849d8>] 
(generic_handle_irq+0x30/0x44)
[   47.842762]  r9:0001 r8:0830 r7: r6: 
r5:be412e58 r4:0003
[   47.850513] [<801849a8>] (generic_handle_irq) from [<8057dca8>] 
(dw_handle_msi_irq+0x9c/0xf4)
[   47.859046] [<8057dc0c>] (dw_handle_msi_irq) from [<8057dd34>] 
(dw_chained_msi_isr+0x34/0x80)
[   47.867579]  r9:8116cad4 r8:8111e8a8 r7:001a r6:bee36414 
r5:89dc r4:bee36400
[   47.875331] [<8057dd00>] (dw_chained_msi_isr) from [<801e2dd0>] 
(__ipipe_dispatch_irq+0x154/0x228)

[   47.884297]  r7:001a r6:0001 r5:0019 r4:
[   47.889965] [<801e2c7c>] (__ipipe_dispatch_irq) from [<801028dc>] 
(__ipipe_grab_irq+0x54/0xb4)
[   47.898585]  r9:8116cad4 r8:f4000100 r7:bf3b7ae0 r6:81101eb8 
r5:bf3b5ae0 r4:0019
[   47.906337] [<80102888>] (__ipipe_grab_irq) from [<80102a8c>] 
(gic_handle_irq+0x58/0xb0)

[   47.914437]  r7:811072f8 r6:03ff r5:f400010c r4:81101eb8
[   47.920105] [<80102a34>] (gic_handle_irq) from [<80101c64>] 
(__irq_svc+0x84/0x90)

[   47.927595] Exception stack(0x81101eb8 to 0x81101f00)
[   47.932657] 1ea0: 8000 
[   47.940842] 1ec0: bf3b5ae0  80e8fae0  81106b68 
81106ba8 811abf50 80bfbb3c
[   47.949027] 1ee0: 80e748d4 81101f1c 81101f08 81101f08 801e26f8 
801e26fc 60070013 
[   47.957214]  r9:8110 r8:811abf50 r7:81101eec r6: 
r5:60070013 r4:801e26fc
[   47.964966] [<801e269c>] (ipipe_unstall_root) from [<8010a2d8>] 
(arch_cpu_idle+0x98/0xb0)

[   47.973151]  r5: r4:80e8fae0
[   47.976735] [<8010a240>] (arch_cpu_idle) from [<809a4a20>] 
(default_idle_call+0x44/0x50)

[   47.984833]  r5: r4:8110
[   47.988419] [<809a49dc>] (default_idle_call) from [<8016059c>] 
(do_idle+0xd8/0x1e4)
[   47.996083] [<801604c4>] (do_idle) from [<8016097c>] 
(cpu_startup_entry+0x28/0x2c)
[   48.003661]  r9:0001 r8:811b1000 r7: r6:81106b40 
r5:811b1000 r4:00ce
[   48.011413] [<80160954>] (cpu_startup_entry) from [<8099cc6c>] 
(rest_init+0x9c/0xbc)
[   48.019165] [<8099cbd0>] (rest_init) from [<80e00b44>] 
(arch_call_rest_init+0x18/0x1c)

[   48.027089]  r5:811b1000 r4:80e748d4
[   48.030675] [<80e00b2c>] (arch_call_rest_init) from [<80e00fa0>] 
(start_kernel+0x3e0/0x4a4)

[   48.039035] [<80e00bc0>] (start_kernel) from [<>] (0x0)
[   48.044963] ---[ end t

Re: [Xenomai] bad syscall <0x197>

2021-02-24 Thread steve freyder via Xenomai




On 2/24/2021 2:52 PM, Greg Gallagher wrote:



On Wed, Feb 24, 2021 at 3:45 PM steve freyder via Xenomai 
<xenomai@xenomai.org> wrote:


Greetings Xenomai list,


I have a Linux OE 4.14.85 build with Xenomai 3.1 -next branch, and am
seeing this:

root@ICB-G3L:~ # uname -a
Linux ICB-G3L 4.14.85_C01571-15S01A01.000.zimg+35a84af5b7 #1 SMP Mon Feb 22 20:57:38 UTC 2021 armv7l GNU/Linux
root@ICB-G3L:~ # smokey --run=posix_mutex
[  694.433129] [Xenomai] bad syscall <0x197>
posix_mutex OK

I found this thread:

https://www.mail-archive.com/xenomai@xenomai.org/msg17931.html

But I can't get a clear picture of the proper resolution of this issue,
other than that it's related to a non-existent syscall, perhaps the glibc
version, and/or __real_usleep().  I'd like to get this working on either
4.14 or 4.19.  Is there a simple workaround for this?

Thanks in advance,

Steve




I forgot all about this, I'll take a look tonight and see what's 
involved.  What gcc version are you using?


Thanks

Greg


Greg,

The compiler is gcc 9.3.0 using Yocto Dunfell.  glibc 2.31...




[Xenomai] bad syscall <0x197>

2021-02-24 Thread steve freyder via Xenomai

Greetings Xenomai list,


I have a Linux OE 4.14.85 build with Xenomai 3.1 -next branch, and am 
seeing this:


root@ICB-G3L:~ # uname -a
Linux ICB-G3L 4.14.85_C01571-15S01A01.000.zimg+35a84af5b7 #1 SMP Mon Feb 22 20:57:38 UTC 2021 armv7l GNU/Linux
root@ICB-G3L:~ # smokey --run=posix_mutex
[  694.433129] [Xenomai] bad syscall <0x197>
posix_mutex OK

I found this thread:

https://www.mail-archive.com/xenomai@xenomai.org/msg17931.html

But I can't get a clear picture of the proper resolution of this issue, 
other than that it's related to a non-existent syscall, perhaps the glibc 
version, and/or __real_usleep().  I'd like to get this working on either 
4.14 or 4.19.  Is there a simple workaround for this?
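
For reference, 0x197 is 407 in decimal. As far as I know, 407 is the number assigned to clock_nanosleep_time64 on 32-bit ARM in kernels that implement the time64 syscalls, which would fit the "non-existent syscall, perhaps glibc version" reading on a 4.14 kernel. Treat that mapping as an assumption to check against your kernel's syscall table. The small sketch below only pulls the number out of the log line so it can be looked up. The helper name bad_syscall_number() is made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the syscall number from a "[Xenomai] bad syscall <0x...>" log
 * line. Returns -1 if the line does not contain such a message.
 */
static long bad_syscall_number(const char *line)
{
	const char *p = strstr(line, "bad syscall <");
	unsigned long nr;

	if (p == NULL || sscanf(p, "bad syscall <%lx>", &nr) != 1)
		return -1;

	return (long)nr;
}
```

Feeding it the line from the console output above yields 407, which can then be compared with the syscall table of the kernel in use.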


Thanks in advance,

Steve






Re: Mode Switch

2021-01-03 Thread steve freyder via Xenomai
OK, so then my theory about it having to do with launching a printf 
helper must be wrong (maybe that only happens on the original thread).  
I don't recall whether you had a flush call after your printf though...


On 1/3/2021 4:22 PM, Leandro Bucci wrote:
I understand, but it's strange because if I do a printf inside the 
task I always have MSW = 2.  Yes maybe Philippe can help.  Thank you too


On Sun, 3 Jan 2021, 23:16, steve freyder wrote:


Might need some help from Philippe on this one but my thinking
says that thread creation happens in secondary mode, so there's
gotta be at least *one* mode switch on the way to becoming a
cobalt thread running in primary mode, perhaps the second one has
to do with launching the background printf() helper thread?



On 1/3/2021 4:08 PM, Leandro Bucci wrote:

But in the task I don't do any printf, how is it possible that
MSW = 2?

On Sun, 3 Jan 2021, 23:00, steve freyder <st...@freyder.net> wrote:

Each time I would do something like this:


printf(...) ;

fflush(stdout) ;

rt_task_sleep(1e9/5) ;

rt_task_inquire(...) ;


msw incremented by 1, csw would increment by 2.


On 1/3/2021 2:29 PM, Leandro Bucci via Xenomai wrote:

Hi, I have a strange behavior regarding the "mode switch".
In the attached code, the task should never switch to the Linux domain, 
but
instead I have a value of MSW = 2.
How is it possible?
Even if I do a printf in the task I always get MSW = 2.
I can't understand where the problem is.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <alchemy/task.h>

RT_TASK task;
RT_TASK_INFO info;

/* Task body: record this task's own statistics (including msw) in info. */
void task_body(void *arg)
{
    rt_task_inquire(NULL, &info);
}

int main(void)
{
    int err;

    err = rt_task_create(&task, "mytask", 0, 1, 0);
    if (err != 0) {
        fprintf(stderr, "failed to create task\n");
        exit(EXIT_FAILURE);
    }

    err = rt_task_start(&task, &task_body, NULL);
    if (err != 0) {
        fprintf(stderr, "failed to start task\n");
        exit(EXIT_FAILURE);
    }

    sleep(5); /* sleep for 5 seconds so the task has time to run */

    printf("mode switch = %d\n", (int)(info.stat.msw));

    exit(EXIT_SUCCESS);
}




Re: Mode Switch

2021-01-03 Thread steve freyder via Xenomai
Might need some help from Philippe on this one but my thinking says that 
thread creation happens in secondary mode, so there's gotta be at least 
*one* mode switch on the way to becoming a cobalt thread running in 
primary mode, perhaps the second one has to do with launching the 
background printf() helper thread?




On 1/3/2021 4:08 PM, Leandro Bucci wrote:

But in the task I don't do any printf, how is it possible that MSW = 2?

On Sun, 3 Jan 2021, 23:00, steve freyder wrote:


Each time I would do something like this:


printf(...) ;

fflush(stdout) ;

rt_task_sleep(1e9/5) ;

rt_task_inquire(...) ;


msw incremented by 1, csw would increment by 2.


On 1/3/2021 2:29 PM, Leandro Bucci via Xenomai wrote:

Hi, I have a strange behavior regarding the "mode switch".
In the attached code, the task should never switch to the Linux domain, but
instead I have a value of MSW = 2.
How is it possible?
Even if I do a printf in the task I always get MSW = 2.
I can't understand where the problem is.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <alchemy/task.h>

RT_TASK task;
RT_TASK_INFO info;

/* Task body: record this task's own statistics (including msw) in info. */
void task_body(void *arg)
{
    rt_task_inquire(NULL, &info);
}

int main(void)
{
    int err;

    err = rt_task_create(&task, "mytask", 0, 1, 0);
    if (err != 0) {
        fprintf(stderr, "failed to create task\n");
        exit(EXIT_FAILURE);
    }

    err = rt_task_start(&task, &task_body, NULL);
    if (err != 0) {
        fprintf(stderr, "failed to start task\n");
        exit(EXIT_FAILURE);
    }

    sleep(5); /* sleep for 5 seconds so the task has time to run */

    printf("mode switch = %d\n", (int)(info.stat.msw));

    exit(EXIT_SUCCESS);
}




Re: Mode Switch

2021-01-03 Thread steve freyder via Xenomai

Each time I would do something like this:


printf(...) ;

fflush(stdout) ;

rt_task_sleep(1e9/5) ;

rt_task_inquire(...) ;


msw incremented by 1, csw would increment by 2.


On 1/3/2021 2:29 PM, Leandro Bucci via Xenomai wrote:

Hi, I have a strange behavior regarding the "mode switch".
In the attached code, the task should never switch to the Linux domain, but
instead I have a value of MSW = 2.
How is it possible?
Even if I do a printf in the task I always get MSW = 2.
I can't understand where the problem is.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <alchemy/task.h>

RT_TASK task;
RT_TASK_INFO info;

/* Task body: record this task's own statistics (including msw) in info. */
void task_body(void *arg)
{
    rt_task_inquire(NULL, &info);
}

int main(void)
{
    int err;

    err = rt_task_create(&task, "mytask", 0, 1, 0);
    if (err != 0) {
        fprintf(stderr, "failed to create task\n");
        exit(EXIT_FAILURE);
    }

    err = rt_task_start(&task, &task_body, NULL);
    if (err != 0) {
        fprintf(stderr, "failed to start task\n");
        exit(EXIT_FAILURE);
    }

    sleep(5); /* sleep for 5 seconds so the task has time to run */

    printf("mode switch = %d\n", (int)(info.stat.msw));

    exit(EXIT_SUCCESS);
}


Re: domain switch

2021-01-02 Thread steve freyder via Xenomai

Right.


AKA, "mode switch", a switch from "primary mode" to "secondary mode", or 
vice versa.


One place you can find that information is in:

/proc/xenomai/sched/acct

There are two fields, MSW and CSW, which count mode/context switches 
per-process. Reading them requires an open, a read loop to locate the 
desired pid and extract the desired information, then either a rewind or 
a close/reopen to do it again, all of which will almost surely generate 
more mode/context switching.
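
The open/read-loop/extract sequence could look roughly like the sketch below. It assumes the file being read starts with a header row of whitespace-separated column names that include PID, MSW and CSW, followed by one row per thread. That layout is an assumption to verify against the actual output on your system, and if the header is missing the function simply returns -1. The function names are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_COLS 16

/* Index of column `name` in a whitespace-separated header line, or -1. */
static int column_index(const char *header, const char *name)
{
	char buf[256];
	char *tok;
	int idx = 0;

	snprintf(buf, sizeof(buf), "%s", header);
	for (tok = strtok(buf, " \t\n"); tok; tok = strtok(NULL, " \t\n"), idx++)
		if (strcmp(tok, name) == 0)
			return idx;
	return -1;
}

/*
 * Scan the table on `fp` and fetch the MSW and CSW counters for `pid`.
 * Returns 0 on success, -1 if the header or the pid row is not found.
 */
static int read_switch_counts(FILE *fp, long pid,
			      unsigned long *msw, unsigned long *csw)
{
	char line[256];
	int pid_col, msw_col, csw_col;

	if (fgets(line, sizeof(line), fp) == NULL)
		return -1;
	pid_col = column_index(line, "PID");
	msw_col = column_index(line, "MSW");
	csw_col = column_index(line, "CSW");
	if (pid_col < 0 || msw_col < 0 || csw_col < 0)
		return -1;

	while (fgets(line, sizeof(line), fp) != NULL) {
		char *cols[MAX_COLS];
		char *tok;
		int n = 0;

		/* Split the row into fields; extra fields are ignored. */
		for (tok = strtok(line, " \t\n"); tok && n < MAX_COLS;
		     tok = strtok(NULL, " \t\n"))
			cols[n++] = tok;
		if (n <= pid_col || n <= msw_col || n <= csw_col)
			continue;
		if (strtol(cols[pid_col], NULL, 10) != pid)
			continue;
		*msw = strtoul(cols[msw_col], NULL, 10);
		*csw = strtoul(cols[csw_col], NULL, 10);
		return 0;
	}
	return -1;
}
```

A caller would fopen() the /proc file, call read_switch_counts() with the pid of interest, and rewind() or reopen the file before sampling again.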



On 1/2/2021 2:54 PM, Leandro Bucci via Xenomai wrote:

Hi, I wanted to know if there was a way to count the number of times a
domain switch happens.
For example the printf () function causes a domain switch, right?


Re: When the linux process could recieve the message transport through XDDP?

2020-05-12 Thread steve freyder via Xenomai

On 5/12/2020 1:37 AM, 孙世龙 wrote:

Hi Steve,
     Thank you for the clarification.

    >>Will Adeos wake up the related Linux process at once when the head
    >>domain writes something to the XDDP node? Or does the Linux process
    >>have to wait to be scheduled by the Linux kernel, in which case it
    >>may wait a long time if the processor is busy?
    >The Linux process cannot wake up immediately, otherwise what would
    >be the point of having an RT co-kernel if it didn't preempt the
    >non-RT environment.
     Yes, I couldn't agree more.
     Imagine this scenario: your processor has 4 cores and you have only
     one RT process, which has a single thread. So at least 3 cores can
     be used for the Linux processes. What I am interested in is whether
     *the Linux process waiting to receive a message through XDDP has to
     compete for CPU resources with so many other Linux processes*.
    >If you're going to build code that makes a NRT process a subordinate
    >of an RT process, you're going to have to employ some non-standard
    >logic and get creative about how you deal with the inverted priority
    >relationship that implicitly exists between the RT/NRT schedulers
    >involved in that scenario.
     What do you mean by "employ some non-standard logic"? Could you
     explain that more clearly?


     Thank you for your attention.
     I look forward to hearing from you.

On Tue, May 12, 2020 at 11:43 AM, steve freyder <st...@freyder.net> 
wrote:


On 5/11/2020 9:16 PM, 孙世龙 via Xenomai wrote:

Hi,
  I am using XDDP to transport messages between the head domain and the
Linux domain (I mean there are two processes: one is an RT process, the
other is a Linux process). I wonder when the Linux process can receive the
message.
  *Will Adeos wake up the related Linux process at once when the head
domain writes something to the XDDP node?* Or does the Linux process have
to wait to be scheduled by the Linux kernel, in which case it may wait a
long time if the processor is busy?

  Thank you for your attention.
  Looking forward to hearing from you.


The Linux process cannot wake up immediately, otherwise what would
be the point of having an RT co-kernel if it didn't preempt the
non-RT environment.


If the RT process wishes to give the non-RT process an opportunity
to run in order to receive the just-sent message, it had better
relinquish the CPU.  Inserting an rt_task_sleep() call (or equivalent)
might be one (inelegant?) option, but certainly nothing in iPipe
or anything at that layer is going to magically solve that problem
for you.


If you're going to build code that makes a NRT process a
subordinate of an RT process, you're going to have to employ some
non-standard logic and get creative about how you deal with the
inverted priority relationship that implicitly exists between the
RT/NRT schedulers involved in that scenario.  XDDP is a
bidirectional data relay, not a scheduler.


Regards,

Steve


You're right, and I shouldn't have ignored the scenario that you pointed 
out where there are multiple CPUs available.  Certainly if there is more 
than one CPU, then it's possible for the Linux process on the NRT side 
of the XDDP socket to wake up immediately when the RT side sends data on 
the socket assuming that the NRT process is waiting for read completion 
and that it's the highest priority process on the Linux side.  To the 
extent that you can ensure those conditions are met, you'll be able to 
ensure the kind of performance you're looking for.
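
As a rough sketch of those two conditions on the Linux side: the process raises itself to a SCHED_FIFO priority and then blocks in read() until the RT side sends something. Both helper names are made up for illustration. The descriptor is assumed to come from opening the XDDP pseudo-device (for instance /dev/rtp0, where the minor number 0 is only an example), and raising the priority normally needs root or an equivalent capability.

```c
#include <errno.h>
#include <sched.h>
#include <unistd.h>

/* Ask Linux to run the calling process as SCHED_FIFO at `prio`. */
static int make_fifo_priority(int prio)
{
	struct sched_param sp = { .sched_priority = prio };

	return sched_setscheduler(0, SCHED_FIFO, &sp);
}

/* Block in read() until a message arrives, retrying after signals. */
static ssize_t wait_for_message(int fd, void *buf, size_t len)
{
	ssize_t n;

	do {
		n = read(fd, buf, len);
	} while (n < 0 && errno == EINTR);

	return n;
}
```

This only removes competition from lower-priority Linux tasks. If the RT side runs on the same CPU, it still preempts the reader, as discussed above.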





Re: When the linux process could recieve the message transport through XDDP?

2020-05-11 Thread steve freyder via Xenomai

On 5/11/2020 9:16 PM, 孙世龙 via Xenomai wrote:

Hi,
  I am using XDDP to transport messages between the head domain and the
Linux domain (I mean there are two processes: one is an RT process, the
other is a Linux process). I wonder when the Linux process can receive the
message.
  *Will Adeos wake up the related Linux process at once when the head
domain writes something to the XDDP node?* Or does the Linux process have
to wait to be scheduled by the Linux kernel, in which case it may wait a
long time if the processor is busy?

  Thank you for your attention.
  Looking forward to hearing from you.


The Linux process cannot wake up immediately, otherwise what would be 
the point of having an RT co-kernel if it didn't preempt the non-RT 
environment.



If the RT process wishes to give the non-RT process an opportunity to 
run in order to receive the just-sent message, it had better relinquish 
the CPU.  Inserting an rt_task_sleep() call (or equivalent) might be one 
(inelegant?) option, but certainly nothing in iPipe or anything at that 
layer is going to magically solve that problem for you.



If you're going to build code that makes a NRT process a subordinate of 
an RT process, you're going to have to employ some non-standard logic 
and get creative about how you deal with the inverted priority 
relationship that implicitly exists between the RT/NRT schedulers 
involved in that scenario.  XDDP is a bidirectional data relay, not a 
scheduler.



Regards,

Steve




Re: rt_task_bind blocking: inter-process ?

2020-02-21 Thread steve freyder via Xenomai

On 2/21/2020 10:10 AM, LOCHE Daniel via Xenomai wrote:

On 20/02/2020 à 22:36, steve freyder via Xenomai wrote:

On 2/20/2020 7:29 AM, LOCHE Daniel via Xenomai wrote:

Hello everyone,

I am currently refactoring my application to go from a multi-threaded 
design to a multi-process one (C++11).
I am using Xenomai 3.0.5 on Linux Mint 18.04 (on an Intel 8250U 
computer), 90% alchemy library (10% posix).
(Whether processes or tasks, let's call all of those "tasks" hereafter 
for easier understanding.)


*Context:* Basically, during the setup phase of the execution, I was 
previously reading a list of "task" parameters from a file and launching 
the corresponding number of rt_task_create() / launch() calls from my 
main process.
All my RT_TASK links to the threads were stored in the main thread 
to manage them.


Now I have to go through a fork() for every task, followed by a 
rt_task_shadow(). OK.
But to get the link to the RT_TASK of every Xenomai process created 
into my main process, I have to use rt_task_bind().
The issue is here: whatever arguments I give to rt_task_bind(), it 
blocks as long as the searched task name exists... I don't know why.


Here is some sample code below. What is wrong exactly? Is it possible 
to use rt_task_bind with inter-process tasks? Why does "TM_NONBLOCK" 
do nothing? Thanks for your help.


Daniel

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <alchemy/task.h>

int main ()

{
    const char* taskNames[3] = {"task0", "task1", "task2"};
    for (int i = 0; i < 3; ++i)
    {
        pid_t pid = fork();
        if (pid) // parent process
        {
            sleep(1); // wait a little before going through another fork.
        } else { // child process
            RT_TASK _t;
            rt_task_shadow(&_t, taskNames[i], 50, 0);
            rt_printf("Success creating task %s", taskNames[i]);
            sleep(15);        /* placeholder, should do real-time stuff
                                 here, then end process. */
            exit(EXIT_SUCCESS);
        }
    }

    // Only "main" process goes here.

    RT_TASK _t;
    for (int i = 0; i < 3; ++i)
    {
        // <== HERE. TM_NONBLOCK, TM_INFINITE or a time value changes
        // nothing; execution never gets past this call.
        int ret = rt_task_bind(&_t, taskNames[i], TM_NONBLOCK);

        if (ret) printf("Error : could not bind task. Error #%d", ret);
        else printf("Bind to task %s done.", taskNames[i]);
    }
    /* Do stuff here on the bound tasks */
    exit(EXIT_SUCCESS);

}


Daniel,


Are you using the --shared-registry and --session=foo switches when 
you run your program?  I don't believe there's any other way to get 
the registry (which is needed to register your task names so that 
rt_task_bind() can find them) shared across processes. The 
--shared-registry might be a default, but for sure the --session= 
switch is not a default - it defaults to an "anonymous session" which 
means that every process has its own registry - shared with nobody.



Also, calling fork() might be part of the problem, and if you're 
having rt_task_bind() problems then you definitely need to be 
checking the return code from your rt_task_shadow() calls.  Part of 
what happens there is putting the task name into the registry, so if 
that fails, your rt_task_bind() call will also have problems. Why it 
is hanging instead of returning -EAGAIN when you're using TM_NONBLOCK 
and the task doesn't exist, I don't know.



What you're attempting to do here does work if you:


1) disable auto-init using xeno-config --no-auto-init in your build 
flags


2) call xenomai_init() explicitly in the child processes before you 
call rt_task_shadow()


3) don't call xenomai_init() in the parent task until *after* you 
have fork()'ed the child processes



I made a copy of your program and modified it to use the no-auto-init 
option on the xeno-config invocation, attached.



Steve



-- next part --
#include 
#include 
#include 
#include 
#include 
#include 
#include 

/*
* make sure everybody is in the same session and that
* we have registry sharing.
*/
int do_xeno_init(void)  {
   char *args[] = {
      "program",
      "--shared-registry",
      "--session=test",
      NULL
   } ;
   char **argv = args ;
   int argc = (sizeof args/sizeof args[0])-1 ; /* exclude NULL */

   xenomai_init(&argc,(char * const **)&argv) ;
   return 0 ;
}
#define XENO_INIT() do_xeno_init()

int main (int argc, char **argv)

{
   static char* taskNames[3] = {"task0", "task1", "task2"};

   setvbuf(stdout,NULL,_IOLBF,4096) ;

   for (int i = 0 ; i < 3 ; ++i)
   {
      pid_t pid = fork() ;

      if (pid < 0)  {
         perror("fork") ;
         continue ;
      }
      else if (pid)    // fork OK, we're 
Re: rt_task_bind blocking: inter-process ?

2020-02-20 Thread steve freyder via Xenomai

On 2/20/2020 7:29 AM, LOCHE Daniel via Xenomai wrote:

Hello everyone,

I am currently refactoring my application to go from a multi-thread one 
to a multi-process one (C++11).
I am using Xenomai 3.0.5 on Linux Mint 18.04 (on an Intel 8250U 
computer), 90% alchemy library (10% posix).
(either processes or tasks, let's call all of those "tasks" hereafter 
for easier understanding)


Context: Basically, during the setup phase of the execution, 
I was previously reading a list of "task" parameters from a file
and launching the corresponding number of rt_task_create() / 
launch() calls from my main process.
All my RT_TASK links to the threads were stored in the main thread to 
manage them.


Now, I have to go through a fork() for every task, followed by a 
rt_task_shadow(). OK.
But to get in my main process the link to the RT_TASK of every xenomai 
process created, I have to use rt_task_bind().
The issue is here: whatever arguments I give to rt_task_bind(), it blocks 
as long as the searched task name exists... don't know why.


Here is sample code below. What exactly is wrong? Is it 
possible to use rt_task_bind() with inter-process tasks? Why does 
TM_NONBLOCK do nothing? Thanks for your help.


Daniel

#include 
#include 
#include 
#include 

int main ()
{
    const char* taskNames[3] = {"task0", "task1", "task2"};
    for (int i = 0; i < 3; ++i)
    {
        pid_t pid = fork();
        if (pid) // parent process
        {
            sleep(1); // wait a little before going through another fork.
        } else { // child process
            RT_TASK _t;
            rt_task_shadow(&_t, taskNames[i], 50, 0);
            rt_printf("Success creating task %s", taskNames[i]);
            sleep(15); /* placeholder, should do real-time stuff here, then end process. */
            exit(EXIT_SUCCESS);
        }
    }

    // Only "main" process goes here.

    RT_TASK _t;
    for (int i = 0; i < 3; ++i)
    {
        // <== HERE. TM_NONBLOCK, TM_INFINITE or a time value changes
        // nothing; execution never gets past this call.
        int ret = rt_task_bind(&_t, taskNames[i], TM_NONBLOCK);

        if (ret) printf("Error : could not bind task. Error #%d", ret);
        else printf("Bind to task %s done.", taskNames[i]);
    }
    /* Do stuff here on the bound tasks */
    exit(EXIT_SUCCESS);
}


Daniel,


Are you using the --shared-registry and --session=foo switches when you 
run your program?  I don't believe there's any other way to get the 
registry (which is needed to register your task names so that 
rt_task_bind() can find them) shared across processes.  The 
--shared-registry might be a default, but for sure the --session= switch 
is not a default - it defaults to an "anonymous session" which means 
that every process has its own registry - shared with nobody.



Also, calling fork() might be part of the problem, and if you're having 
rt_task_bind() problems then you definitely need to be checking the 
return code from your rt_task_shadow() calls.  Part of what happens 
there is putting the task name into the registry, so if that fails, your 
rt_task_bind() call will also have problems. Why it is hanging instead 
of returning -EAGAIN when you're using TM_NONBLOCK and the task doesn't 
exist, I don't know.



What you're attempting to do here does work if you:


1) disable auto-init using xeno-config --no-auto-init in your build flags

2) call xenomai_init() explicitly in the child processes before you call 
rt_task_shadow()


3) don't call xenomai_init() in the parent task until *after* you have 
fork()'ed the child processes



I made a copy of your program and modified it to use the no-auto-init 
option on the xeno-config invocation, attached.



Steve



-- next part --
#include 
#include 
#include 
#include 
#include 
#include 
#include 

/*
* make sure everybody is in the same session and that
* we have registry sharing.
*/
int do_xeno_init(void)  {
   char *args[] = {
      "program",
      "--shared-registry",
      "--session=test",
      NULL
   } ;
   char **argv = args ;
   int argc = (sizeof args/sizeof args[0])-1 ; /* exclude NULL */

   xenomai_init(&argc,(char * const **)&argv) ;
   return 0 ; /* the function is declared int, so return a value */
}
#define XENO_INIT() do_xeno_init()

int main (int argc, char **argv)
{
   static char* taskNames[3] = {"task0", "task1", "task2"};

   setvbuf(stdout,NULL,_IOLBF,4096) ;

   for (int i = 0 ; i < 3 ; ++i)
   {
      pid_t pid = fork() ;

      if (pid < 0)  {
         perror("fork") ;
         continue ;
      }
      else if (pid) // fork OK, we're parent
      {
         //sleep(1); // wait a little before going through another fork.
      } else { // child process
         RT_TASK _t;
         int ret = 0 ;

         printf("calling xenomai_init()\n") ;
         XENO_INIT() ;
         /* getpid() returns pid_t, so print it as an int */
         rt_printf("process for %s running, pid %d\n",
                   taskNames[i],(int)getpid()) ;
         ret = rt

Re: xenomai app running under gdb hangs on 3.1/arm32

2020-02-19 Thread steve freyder via Xenomai

On 2/19/2020 9:36 AM, Jan Kiszka wrote:

On 19.02.20 15:58, steve freyder via Xenomai wrote:

On 2/19/2020 4:02 AM, Jan Kiszka wrote:

On 19.02.20 09:56, Jan Kiszka via Xenomai wrote:

On 18.02.20 23:33, steve freyder via Xenomai wrote:

Hello again Xenomai group,


Was testing gdb on xenomai 3.1, ran into this (Xenomai 3.1 on 
armv7-a, iMX6-dual-lite)


root@g3l-21:~ # gdb /usr/bin/rtcanrecv
GNU gdb (GDB) 8.3.1
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "arm-emac-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/rtcanrecv...
(No debugging symbols found in /usr/bin/rtcanrecv)
(gdb) break main
Breakpoint 1 at 0x10d50
(gdb) run --trace=10
Starting program: /usr/bin/rtcanrecv --trace=10
warning: Unable to find libthread_db matching inferior's thread 
library, thread debugging will not be available.

--  cold init from program
--  cobalt->init()
--  connected to Cobalt
--  memory locked
--  memory heaps mapped
[New LWP 1931]
--  boilerplate->init()

-- copperplate->init()


[ at this point the program is hung, so press Control-C ]

^C
Thread 1 "rtcanrecv" received signal SIGINT, Interrupt.
0x76fca4a0 in __cobalt_pthread_mutex_lock (mutex=0x484c20c8 
) at 
/home/sdf/xenobuild/imx-xenomai/xenomai-3.1/lib/cobalt/mutex.c:375
375 ret = 
XENOMAI_SYSCALL1(sc_cobalt_mutex_lock, _mutex);

(gdb)

So this appears to be hung taking the mutex on "heapmem_main".  
I'm not running any other Xenomai programs on this system when 
this happens so I presume it's locked by another thread in this 
program but we haven't even made it to main() yet.



This is a stock 3.1 build using the stock /usr/bin/rtcanrecv test 
program.



Any thoughts?



Trying to reproduce via qemu-armhf target of xenomai-images. Are 
you on kernel 4.19.55?




It's not reproducible in that qemu target with xeno_can_virt 
devices. Can you specify the differences in your config?


Jan


Jan,


My kernel is 4.14.85.  Below is my Xenomai configuration from 
--dump-config, and attached is the config.gz from the Linux kernel.


Could you try getting closer with your setup to what we support (4.14 
is EOL, e.g., compiler is very old - all that might be unrelated, but 
we can't test them all)?


If there are major technical hurdles for that, I need at least some 
gdb backtrace from the hanging bootup to give further hints where to 
look at or actually try to reproduce here.


My problem: ARM (and ARM64) is really best-effort for me, and we 
currently lack someone in the Xenomai community to look after it 
(them) after Philippe stepped back.


Jan






based on Xenomai/cobalt v3.1
CONFIG_MMU=1
CONFIG_SMP=1
CONFIG_XENO_BUILD_ARGS=" 'CFLAGS=-march=armv7-a -g -O0' 
'--with-core=cobalt' '--enable-smp' '--enable-pshared' 
'--host=arm-emac-linux-gnueabi' 'host_alias=arm-emac-linux-gnueabi' 
'CC=arm-emac-linux-gnueabi-gcc  -march=armv7-a -mfpu=neon 
-mfloat-abi=softfp 
--sysroot=/opt/emac/5.1/sysroots/armv7a-neon-emac-linux-gnueabi' 
'LDFLAGS=-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed' 'CPPFLAGS=' 
'CPP=arm-emac-linux-gnueabi-gcc -E  -march=armv7-a -mfpu=neon 
-mfloat-abi=softfp 
--sysroot=/opt/emac/5.1/sysroots/armv7a-neon-emac-linux-gnueabi' 
'PKG_CONFIG_PATH=/opt/emac/5.1/sysroots/armv7a-neon-emac-linux-gnueabi/usr/lib/pkgconfig'" 


CONFIG_XENO_BUILD_STRING="x86_64-pc-linux-gnu"
CONFIG_XENO_COBALT=1
CONFIG_XENO_COMPILER="gcc version 5.3.0 (GCC) "
CONFIG_XENO_DEFAULT_PERIOD=100
CONFIG_XENO_FORTIFY=1
CONFIG_XENO_HEAPMEM=1
CONFIG_XENO_HOST_STRING="arm-emac-linux-gnueabi"
CONFIG_XENO_LORES_CLOCK_DISABLED=1
CONFIG_XENO_PREFIX="/usr/xenomai"
CONFIG_XENO_PSHARED=1
CONFIG_XENO_RAW_CLOCK_ENABLED=1
CONFIG_XENO_REVISION_LEVEL=
CONFIG_XENO_SANITY=1
CONFIG_XENO_TLS_MODEL="initial-exec"
CONFIG_XENO_UAPI_LEVEL=15
CONFIG_XENO_VERSION_MAJOR=3
CONFIG_XENO_VERSION_MINOR=1
CONFIG_XENO_VERSION_STRING="3.1"
---
CONFIG_XENO_ASYNC_CANCEL is OFF
CONFIG_XENO_COPPERPLATE_CLOCK_RESTRICTED is OFF
CONFIG_XENO_DEBUG is OFF
CONFIG_XENO_DEBUG_FULL is OFF
CONFIG_

Re: xenomai app running under gdb hangs on 3.1/arm32

2020-02-19 Thread steve freyder via Xenomai

On 2/19/2020 4:02 AM, Jan Kiszka wrote:

On 19.02.20 09:56, Jan Kiszka via Xenomai wrote:

On 18.02.20 23:33, steve freyder via Xenomai wrote:

Hello again Xenomai group,


Was testing gdb on xenomai 3.1, ran into this (Xenomai 3.1 on 
armv7-a, iMX6-dual-lite)


root@g3l-21:~ # gdb /usr/bin/rtcanrecv
GNU gdb (GDB) 8.3.1
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "arm-emac-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
 <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/rtcanrecv...
(No debugging symbols found in /usr/bin/rtcanrecv)
(gdb) break main
Breakpoint 1 at 0x10d50
(gdb) run --trace=10
Starting program: /usr/bin/rtcanrecv --trace=10
warning: Unable to find libthread_db matching inferior's thread 
library, thread debugging will not be available.

--  cold init from program
--  cobalt->init()
--  connected to Cobalt
--  memory locked
--  memory heaps mapped
[New LWP 1931]
--  boilerplate->init()

-- copperplate->init()


[ at this point the program is hung, so press Control-C ]

^C
Thread 1 "rtcanrecv" received signal SIGINT, Interrupt.
0x76fca4a0 in __cobalt_pthread_mutex_lock (mutex=0x484c20c8 
) at 
/home/sdf/xenobuild/imx-xenomai/xenomai-3.1/lib/cobalt/mutex.c:375
375 ret = XENOMAI_SYSCALL1(sc_cobalt_mutex_lock, 
_mutex);

(gdb)

So this appears to be hung taking the mutex on "heapmem_main".  I'm 
not running any other Xenomai programs on this system when this 
happens so I presume it's locked by another thread in this program 
but we haven't even made it to main() yet.



This is a stock 3.1 build using the stock /usr/bin/rtcanrecv test 
program.



Any thoughts?



Trying to reproduce via qemu-armhf target of xenomai-images. Are you 
on kernel 4.19.55?




It's not reproducible in that qemu target with xeno_can_virt devices. 
Can you specify the differences in your config?


Jan


Jan,


My kernel is 4.14.85.  Below is my Xenomai configuration from 
--dump-config, and attached is the config.gz from the Linux kernel.





based on Xenomai/cobalt v3.1
CONFIG_MMU=1
CONFIG_SMP=1
CONFIG_XENO_BUILD_ARGS=" 'CFLAGS=-march=armv7-a -g -O0' 
'--with-core=cobalt' '--enable-smp' '--enable-pshared' 
'--host=arm-emac-linux-gnueabi' 'host_alias=arm-emac-linux-gnueabi' 
'CC=arm-emac-linux-gnueabi-gcc  -march=armv7-a -mfpu=neon 
-mfloat-abi=softfp 
--sysroot=/opt/emac/5.1/sysroots/armv7a-neon-emac-linux-gnueabi' 
'LDFLAGS=-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed' 'CPPFLAGS=' 
'CPP=arm-emac-linux-gnueabi-gcc -E  -march=armv7-a -mfpu=neon  
-mfloat-abi=softfp 
--sysroot=/opt/emac/5.1/sysroots/armv7a-neon-emac-linux-gnueabi' 
'PKG_CONFIG_PATH=/opt/emac/5.1/sysroots/armv7a-neon-emac-linux-gnueabi/usr/lib/pkgconfig'"

CONFIG_XENO_BUILD_STRING="x86_64-pc-linux-gnu"
CONFIG_XENO_COBALT=1
CONFIG_XENO_COMPILER="gcc version 5.3.0 (GCC) "
CONFIG_XENO_DEFAULT_PERIOD=100
CONFIG_XENO_FORTIFY=1
CONFIG_XENO_HEAPMEM=1
CONFIG_XENO_HOST_STRING="arm-emac-linux-gnueabi"
CONFIG_XENO_LORES_CLOCK_DISABLED=1
CONFIG_XENO_PREFIX="/usr/xenomai"
CONFIG_XENO_PSHARED=1
CONFIG_XENO_RAW_CLOCK_ENABLED=1
CONFIG_XENO_REVISION_LEVEL=
CONFIG_XENO_SANITY=1
CONFIG_XENO_TLS_MODEL="initial-exec"
CONFIG_XENO_UAPI_LEVEL=15
CONFIG_XENO_VERSION_MAJOR=3
CONFIG_XENO_VERSION_MINOR=1
CONFIG_XENO_VERSION_STRING="3.1"
---
CONFIG_XENO_ASYNC_CANCEL is OFF
CONFIG_XENO_COPPERPLATE_CLOCK_RESTRICTED is OFF
CONFIG_XENO_DEBUG is OFF
CONFIG_XENO_DEBUG_FULL is OFF
CONFIG_XENO_LAZY_SETSCHED is OFF
CONFIG_XENO_LIBS_DLOPEN is OFF
CONFIG_XENO_MERCURY is OFF
CONFIG_XENO_REGISTRY is OFF
CONFIG_XENO_REGISTRY_ROOT is OFF
CONFIG_XENO_TLSF is OFF
CONFIG_XENO_VALGRIND_API is OFF
CONFIG_XENO_WORKAROUND_CONDVAR_PI is OFF
CONFIG_XENO_X86_VSYSCALL is OFF
---
PTHREAD_STACK_DEFAULT=65536
AUTOMATIC_BOOTSTRAP=1

-- next part --
A non-text attachment was scrubbed...
Name: config.gz
Type: application/x-gzip
Size: 25269 bytes
Desc: not available
URL: 
<http://xenomai.org/pipermail/xenomai/attachments/20200219/5405fc9a/attachment.bin>


xenomai app running under gdb hangs on 3.1/arm32

2020-02-18 Thread steve freyder via Xenomai

Hello again Xenomai group,


Was testing gdb on xenomai 3.1, ran into this (Xenomai 3.1 on armv7-a, 
iMX6-dual-lite)


root@g3l-21:~ # gdb /usr/bin/rtcanrecv
GNU gdb (GDB) 8.3.1
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "arm-emac-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/rtcanrecv...
(No debugging symbols found in /usr/bin/rtcanrecv)
(gdb) break main
Breakpoint 1 at 0x10d50
(gdb) run --trace=10
Starting program: /usr/bin/rtcanrecv --trace=10
warning: Unable to find libthread_db matching inferior's thread library, 
thread debugging will not be available.

--  cold init from program
--  cobalt->init()
--  connected to Cobalt
--  memory locked
--  memory heaps mapped
[New LWP 1931]
--  boilerplate->init()

-- copperplate->init()


[ at this point the program is hung, so press Control-C ]

^C
Thread 1 "rtcanrecv" received signal SIGINT, Interrupt.
0x76fca4a0 in __cobalt_pthread_mutex_lock (mutex=0x484c20c8 
) at 
/home/sdf/xenobuild/imx-xenomai/xenomai-3.1/lib/cobalt/mutex.c:375
375 ret = XENOMAI_SYSCALL1(sc_cobalt_mutex_lock, 
_mutex);

(gdb)

So this appears to be hung taking the mutex on "heapmem_main".  I'm not 
running any other Xenomai programs on this system when this happens so I 
presume it's locked by another thread in this program but we haven't 
even made it to main() yet.



This is a stock 3.1 build using the stock /usr/bin/rtcanrecv test program.


Any thoughts?


Thanks,

Steve





3.1 rt_imx_uart.c regression

2020-02-09 Thread steve freyder via Xenomai

Hello Xenomai group.


We appear to have a regression in the rt_imx_uart driver in Xenomai 
3.1.  The problem didn't exist in 3.0.7, and I believe it did exist in 
3.0.10.  Between those two I did not try to narrow it down.  I 
discovered this when connecting a UART on a 3.1 system to a UART on a 
3.0.7 system, both being IMX6 Dual-Lite SOM's, and the cross-connected 
UARTs would not intercommunicate (100% framing errors).  A simple TX/RX 
loopback test on either system was successful, but the cross-connect 
(like cross-link.c might use) was unsuccessful with a mixed 3.0.7 <--> 
3.1 environment.



I traced this to the setting (or not setting) of the UCR2_SRST bit in 
the rt_imx_uart_set_config() routine.  The IMX6 TRM defines this bit as 
SRST_B (B as in 'bar', as in inverted), so setting this bit means *no 
software reset*, and not setting it results in a software reset, which 
leaves the UART clock divisors at their defaults, and the resulting 
rate is 80 MHz.  The user-selected (via ioctl()) baud rate and other 
parameters therefore have no effect.
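
To make the inverted polarity of SRST_B concrete, here is a small 
standalone sketch of how the UCR2 value is composed.  The bit positions 
are assumptions taken from the mainline Linux i.MX serial driver rather 
than from this thread, and compose_ucr2() and ucr2_triggers_reset() are 
hypothetical helpers for illustration, not functions in rt_imx_uart.c:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed from the mainline Linux i.MX serial driver. */
#define UCR2_IRTS (1u << 14) /* ignore the RTS pin */
#define UCR2_WS   (1u << 5)  /* word size: set = 8 bits, clear = 7 bits */
#define UCR2_SRST (1u << 0)  /* SRST_B, active low: 0 = reset, 1 = no reset */

/* Hypothetical helper mirroring the patched logic: SRST_B is always set. */
static uint32_t compose_ucr2(bool eight_bits)
{
    uint32_t ucr2 = UCR2_SRST | UCR2_IRTS;

    if (eight_bits)
        ucr2 |= UCR2_WS;
    return ucr2;
}

/* True if writing this value to UCR2 would request a software reset. */
static bool ucr2_triggers_reset(uint32_t ucr2)
{
    return (ucr2 & UCR2_SRST) == 0;
}
```

With these assumed bit values, the unpatched combination 
UCR2_WS | UCR2_IRTS leaves bit 0 clear, which is the case that requests 
a reset.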



The patch below fixes this problem.  I hope I have the proper format.


There may be another "benign" issue with this driver, but I would defer 
that to someone more familiar with it.  Specifically, it's with the 
routine rt_imx_uart_setup_ufcr(), whose function appears to be 
duplicated in rt_imx_uart_set_config().  In the driver init, 
rt_imx_uart_setup_ufcr() is called after the call to 
rt_imx_uart_set_config(), but in another place it is not called after 
the set_config() call.  I think rt_imx_uart_setup_ufcr() should be 
removed along with the call to it during initialization but the patch 
below does not include that mod.



Finally, the fact that this made it through testing brings up the test 
methodology for UART driver validation as concerns "character timing".  
With this kind of bug, simple loopback tests probably pass.  In a short 
test program that I built, I had code to read the nanosecond clock 
before initiating transmission and again after reception completed, 
and to display the delta time.  Using that, the problem 
showed itself immediately (the delta ns for transmission of a single 
character was between 40K and 100K, much too fast given my 
ioctl(SET_CONFIG) had requested 9600 baud).  I wonder if it's worth 
adding a similar check to the testing logic.
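
A rough sketch of such a check, assuming 8N1 framing (1 start + 8 data + 
1 stop = 10 bits per character; the framing is not stated above).  The 
helper names are hypothetical and this is not code from the Xenomai test 
suite:

```c
#include <stdbool.h>
#include <stdint.h>

/* Expected wire time of one character in ns: frame bits / baud rate. */
static uint64_t char_time_ns(uint32_t baud, uint32_t bits_per_frame)
{
    return (uint64_t)bits_per_frame * 1000000000ull / baud;
}

/*
 * A measured send-to-receive delta cannot be shorter than the wire time
 * of the character; if it is, the UART is running faster than configured.
 */
static bool timing_plausible(uint64_t measured_ns, uint32_t baud,
                             uint32_t bits_per_frame)
{
    return measured_ns >= char_time_ns(baud, bits_per_frame);
}
```

At 9600 baud with 10 bits per character the expected time is a little 
over 1 ms, so a measured delta of 40K to 100K ns fails the check 
immediately.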



Best regards,

Steve



==

diff --git a/kernel/drivers/serial/rt_imx_uart.c b/kernel/drivers/serial/rt_imx_uart.c
index 2b27b7d..6894c1c 100644
--- a/kernel/drivers/serial/rt_imx_uart.c
+++ b/kernel/drivers/serial/rt_imx_uart.c
@@ -703,9 +703,9 @@ static int rt_imx_uart_set_config(struct rt_imx_uart_ctx *ctx,
    uint64_t tdiv64;

    if (ctx->config.data_bits == RTSER_8_BITS)
-   ucr2 = UCR2_WS | UCR2_IRTS;
+   ucr2 = UCR2_WS | UCR2_SRST | UCR2_IRTS;
    else
-   ucr2 = UCR2_IRTS;
+   ucr2 = UCR2_SRST | UCR2_IRTS;

    if (ctx->config.handshake == RTSER_RTSCTS_HAND) {
    if (port->have_rtscts) {

==




Re: undefined reference to `rt_task_create

2019-12-05 Thread steve freyder via Xenomai

On 12/5/2019 1:22 PM, Jan Kiszka via Xenomai wrote:

On 05.12.19 08:50, Electronic Dept. - Vbm Group via Xenomai wrote:

Hello Jan,
thank you for your quick answer. Adding -lalchemy manually, I get the
following:

There is more missing. What does "xeno-config --skin=alchemy --ldflags"
report for you?

Jan


gcc -o cyclic_test cyclic_test.c -I/usr/xenomai/include/cobalt
-I/usr/xenomai/include -march=armv7-a -mfpu=vfp3 -D_GNU_SOURCE
-D_REENTRANT -fasynchronous-unwind-tables -D__COBALT__
-Wl,--no-as-needed -Wl,@/usr/xenomai/lib/modechk.wrappers
/usr/xenomai/lib/xenomai/bootstrap.o -Wl,--wrap=main
-Wl,--dynamic-list=/usr/xenomai/lib/dynlist.ld -L/usr/xenomai/lib
-lcobalt -lalchemy -lmodechk -lpthread -lrt -march=armv7-a -mfpu=vfp3
/tmp/cc8rjnqk.o: In function `rt_timer_read':
cyclic_test.c:(.text+0xe): undefined reference to `clockobj_get_time'
/usr/xenomai/lib/libalchemy.so: undefined reference to
`threadobj_set_schedparam'
/usr/xenomai/lib/libalchemy.so: undefined reference to `syncobj_wait_drain'
/usr/xenomai/lib/libalchemy.so: undefined reference to `timerobj_destroy'
/usr/xenomai/lib/libalchemy.so: undefined reference to `threadobj_uninit'
/usr/xenomai/lib/libalchemy.so: undefined reference to `pvcluster_init'
/usr/xenomai/lib/libalchemy.so: undefined reference to `threadobj_stat'
/usr/xenomai/lib/libalchemy.so: undefined reference to `threadobj_unblock'
/usr/xenomai/lib/libalchemy.so: undefined reference to `semobj_destroy'
/usr/xenomai/lib/libalchemy.so: undefined reference to
`threadobj_wait_period'
/usr/xenomai/lib/libalchemy.so: undefined reference to `syncobj_wait_grant'
/usr/xenomai/lib/libalchemy.so: undefined reference to
`threadobj_notify_entry'
/usr/xenomai/lib/libalchemy.so: undefined reference to `threadobj_cancel'
/usr/xenomai/lib/libalchemy.so: undefined reference to `syncobj_unlock'
/usr/xenomai/lib/libalchemy.so: undefined reference to
`pvsyncluster_addobj'
/usr/xenomai/lib/libalchemy.so: undefined reference to `syncobj_lock'
/usr/xenomai/lib/libalchemy.so: undefined reference to `threadobj_tskey'
/usr/xenomai/lib/libalchemy.so: undefined reference to `__threadobj_alloc'
/usr/xenomai/lib/libalchemy.so: undefined reference to
`__syncobj_broadcast_drain'
/usr/xenomai/lib/libalchemy.so: undefined reference to `pvsyncluster_init'
/usr/xenomai/lib/libalchemy.so: undefined reference to `threadobj_init'
/usr/xenomai/lib/libalchemy.so: undefined reference to `clockobj_init'
/usr/xenomai/lib/libalchemy.so: undefined reference to `timerobj_init'
/usr/xenomai/lib/libalchemy.so: undefined reference to `timerobj_start'
/usr/xenomai/lib/libalchemy.so: undefined reference to `eventobj_inquire'
/usr/xenomai/lib/libalchemy.so: undefined reference to `threadobj_suspend'
/usr/xenomai/lib/libalchemy.so: undefined reference to
`pvsyncluster_delobj'
/usr/xenomai/lib/libalchemy.so: undefined reference to `semobj_getvalue'
/usr/xenomai/lib/libalchemy.so: undefined reference to `semobj_broadcast'
/usr/xenomai/lib/libalchemy.so: undefined reference to `eventobj_uninit'
/usr/xenomai/lib/libalchemy.so: undefined reference to `pvcluster_addobj'
/usr/xenomai/lib/libalchemy.so: undefined reference to `threadobj_spin'
/usr/xenomai/lib/libalchemy.so: undefined reference to `eventobj_init'
/usr/xenomai/lib/libalchemy.so: undefined reference to `syncobj_init'
/usr/xenomai/lib/libalchemy.so: undefined reference to `syncobj_grant_one'
/usr/xenomai/lib/libalchemy.so: undefined reference to `semobj_post'
/usr/xenomai/lib/libalchemy.so: undefined reference to `semobj_init'
/usr/xenomai/lib/libalchemy.so: undefined reference to `timerobj_stop'
/usr/xenomai/lib/libalchemy.so: undefined reference to
`clockobj_convert_clocks'
/usr/xenomai/lib/libalchemy.so: undefined reference to `eventobj_wait'
/usr/xenomai/lib/libalchemy.so: undefined reference to
`copperplate_create_thread'
/usr/xenomai/lib/libalchemy.so: undefined reference to `syncobj_uninit'
/usr/xenomai/lib/libalchemy.so: undefined reference to `semobj_wait'
/usr/xenomai/lib/libalchemy.so: undefined reference to `eventobj_destroy'
/usr/xenomai/lib/libalchemy.so: undefined reference to `threadobj_set_mode'
/usr/xenomai/lib/libalchemy.so: undefined reference to `syncobj_destroy'
/usr/xenomai/lib/libalchemy.so: undefined reference to `threadobj_prologue'
/usr/xenomai/lib/libalchemy.so: undefined reference to
`__syncobj_broadcast_grant'
/usr/xenomai/lib/libalchemy.so: undefined reference to `syncobj_peek_drain'
/usr/xenomai/lib/libalchemy.so: undefined reference to
`threadobj_set_periodic'
/usr/xenomai/lib/libalchemy.so: undefined reference to `threadobj_sleep'
/usr/xenomai/lib/libalchemy.so: undefined reference to `threadobj_resume'
/usr/xenomai/lib/libalchemy.so: undefined reference to `syncobj_peek_grant'
/usr/xenomai/lib/libalchemy.so: undefined reference to `pvcluster_delobj'
/usr/xenomai/lib/libalchemy.so: undefined reference to
`pvsyncluster_findobj'
/usr/xenomai/lib/libalchemy.so: undefined reference to `eventobj_clear'
/usr/xenomai/lib/libalchemy.so: undefined referenc

Re: rt_task_inquire() equivalent for POSIX ?

2019-11-07 Thread Steve Freyder via Xenomai

On 11/6/2019 3:04 PM, Pierre FICHEUX via Xenomai wrote:

Hi,

Is there a way to get the Xenomai thread "pid" with POSIX API ? I didn't
see anything in the "thread management" section.

thx


Hi,

This file:

https://xenomai.org/documentation/xenomai-3/pdf/xeno3prm.pdf

makes several references to pthread_self() which implies that's the way 
to do what you are looking for here.


However (to your point), it doesn't explicitly document (in section 6.60 
Thread management) pthread_self() itself, whereas this file:


https://xenomai.org/documentation/xenomai-2.6/pdf/posix-api.pdf

in section 3.12.3 documents pthread_self() along with pthread_create() 
and the other "POSIX thread management" functions.


Perhaps when the 3.0 documentation was written, there was a decision 
made to not document any of the "standard POSIX thread management" 
functions unless there was also an RT variant of the function (sitting 
behind a wrapper) which had behaviour beyond what one could expect to 
find in the standard (non RT) POSIX documentation.  Perhaps there's a 
statement to that effect somewhere in the Xenomai-3 documentation, and 
if not, maybe there should be.


Regards,
Steve




--print-buffer-syncdelay, lib/cobalt/init.c:cobalt_help()

2019-05-02 Thread Steve Freyder via Xenomai

Greetings,

I had a dev complain about losing stdio output in a Xenomai app, and
had never looked at rt_printf() and friends before.  When I tried to
use --print-buffer-syncdelay=1 to reduce the delay to 1 ms, my
programs started complaining about an invalid switch.  It turned out
that cobalt_help() was out of sync with the actual switch name
definition, --print-sync-delay.

Here's a lame attempt at a patch.

Regards,
Steve



lib/cobalt:

diff -Naur a/init.c b/init.c
--- a/init.c	2018-06-15 10:26:19.692062099 -0500
+++ b/init.c	2019-05-02 11:16:07.894454047 -0500
@@ -349,7 +349,7 @@
 fprintf(stderr, "--main-prio=main thread priority\n");
 fprintf(stderr, "--print-buffer-size=   size of a print relay buffer (16k)\n");
 fprintf(stderr, "--print-buffer-count= number of print relay buffers (4)\n");
-fprintf(stderr, "--print-buffer-syncdelay= max delay of output synchronization (100 ms)\n");
+fprintf(stderr, "--print-sync-delay=   max delay of output synchronization (100 ms)\n");
 }

 static struct setup_descriptor cobalt_interface = {





Re: having problems with daemonizing

2019-04-26 Thread Steve Freyder via Xenomai

On 4/26/2019 4:18 PM, Lowell Gilbert via Xenomai wrote:

Hi.

I have an application working successfully with Xenomai 3.0.8 on a 4.14
kernel. I use Yocto to build the system; when I tried to move to a newer
version of Yocto, my application hung on trying to become a daemon. This
is happening with the daemon() call (which is what I've used up to now)
and with fork().

I built a test application so that I could confirm that this problem
only occurs when I link (and wrap) with Xenomai. However, Xenomai
doesn't seem to do anything significant with fork, so I'm puzzled about
why this might be happening. I am not using libdaemon.

Here are the changes that I thought might be significant:
| newer (nonworking setup)  | older (working) |
| gcc-cross-arm-8.2.0       | 7.3.0           |
| glibc-2.28                | 2.26            |
| glib-2.0-1_2.58.0         | 1_2.52.3-r0     |
| binutils-cross-arm-2.31.1 | 2.29.1          |
| coreutils-8.30            | 8.27            |

Does anything jump out as a candidate for causing problems with a fork()
call? Is there anything else I should be considering?

Thanks.

Be well.


I can tell you that I have a hang issue due to fork() in a Xenomai
program if, after the fork(), I don't do an exec().  I believe
the hang is related to registry access, and the fact that the
Unix domain socket connecting to sysregd that is inherited by
the forked process (which has FD_CLOEXEC set) hasn't yet gotten
closed (no exec() yet so no action on FD_CLOEXEC flags yet).
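
The close-on-exec behaviour described here can be seen with plain POSIX
calls.  This is only a sketch: a socketpair() stands in for the real
sysregd connection, and cloexec_fd_survives_fork() is a hypothetical
helper name, not anything in Xenomai:

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if a SOCK_CLOEXEC socket is still open in a fork()'ed child
 * that has not called exec(), 0 if it is not, -1 on error.
 */
static int cloexec_fd_survives_fork(void)
{
    int sv[2];
    int status;
    pid_t pid;

    /* Stand-in for the registry socket: created with close-on-exec set. */
    if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, sv) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: no exec() has happened, so FD_CLOEXEC has not acted. */
        int flags = fcntl(sv[0], F_GETFD);
        _exit(flags >= 0 && (flags & FD_CLOEXEC) ? 0 : 1);
    }

    close(sv[0]);
    close(sv[1]);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The flag only takes effect at exec(), so a child that never calls exec()
keeps the descriptor open until it closes it itself or exits.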

If you are running into the same problem, and you don't require
registry access, you should see the problem go away if you throw
the --no-registry switch on the command line that invokes your
program.  That's not a real fix, but it's perhaps a way to know
if you're seeing a related problem.

In my case, the way I see the "hang" is via an attempt to list
the contents of /run/xenomai using find:

root:~ # find /run/xenomai

If I run a program XX that uses the registry, that does a fork() call
and then does not exec(), and while that program is running, I
execute the above find command, it will hang part way through the
listing.  If I kill program XX, the listing continues (un-hangs).

If I run a program that uses the registry, that does a fork() and
then an exec(), no such hang occurs during the find command.

Philippe made the change to fix this originally by adding SOCK_CLOEXEC
to the socket() call in sysreg.c, and it did fix it, but I realized
much later that it fixes it only if you actually call exec(), which in
my code I always do.  More recently one of our developers had some
code that didn't exec(), which was the first time I saw this hang.

Philippe, I had it on my list to ask you about this but it hasn't
bitten me lately and I forgot until I saw this msg about fork().

I think daemonizing in its canonical form of fork(), let the forked
process take over, and then exit() in the parent, is problematic when
you have a wrapped main() where the wrappers already initialized the
sysreg mechanism but the process that was done for is now gone, and
the fork()'ed process has no idea it has a sysreg socket in hand.

Perhaps the better answer when daemonizing is to use --no-init and then
have the fork()'ed process make a manual xenomai_init() call?

HTH,

Regards,
Steve





Re: rt_dev_send() stalls periodic task

2019-04-22 Thread Steve Freyder via Xenomai

On 4/22/2019 5:56 PM, C Smith wrote:
Please don't think the cross-link.c app config has the magic answer. 
Changing RX timeouts to prevent TX stalls would be an open-loop hack 
that might fail if the serial traffic jitters differently.  The most 
suspicious difference between the two apps is that cross-link.c 
behaves very regularly in terms of timing, because it receives its own 
transmissions.  My app receives packets from another computer 
asynchronously (full duplex). So there is likely a random timing 
anomaly causing the problem.  Any properly working UART and driver 
should be able to handle this, National Semiconductor would say. But I 
have a strange serial port, and it is tricking the xeno_16550A driver.


Jan alluded to a buffer-filling race condition in his comment.
I also fear that Receive interrupt handlers are somehow clearing 
Transmit interrupts?


I didn't look at the stack trace when my app hung at startup due to 
that infinite RX config. BTW there was no serial traffic whatsoever 
during this hang. I could try it again and look at the stack...
Also, it would be too hard to rewrite my apps to send single bytes. 
It's two computers, rigid packet protocols & CRCs etc. Lots and lots of 
code. That's why I did that test app.


-C Smith

Anything that can prove that there's no hardware problem causing
lost TX interrupts is good.  I remember you said that you had three
serial ports, and one never failed like this, the other two *do*
fail like this.  Isn't that true?  That seems important.



Re: rt_dev_send() stalls periodic task

2019-04-22 Thread Steve Freyder via Xenomai

On 4/22/2019 2:51 PM, Steve Freyder via Xenomai wrote:

On 4/22/2019 1:45 AM, Jan Kiszka wrote:

On 22.04.19 08:40, C Smith via Xenomai wrote:

Thanks for your insight, Steve. I didn't realize rt_dev_write() doesn't
actually stall until it is called many times and the 4K TX buffer gets
full. (Is that right, Jan?)
If that is the case, sure, I could find a way to check the TX buffer fill
level to prevent my app from stalling.

I rewrote the xeno_16550A driver RTSER_RTIOC_GET_STATUS ioctl to return to
userspace the contents of the IIR and the IER too.
I'm getting IIR = 0b 0001 0100, so the source of the latest interrupt is an
RX (not surprising, as I'm doing full duplex) and there is no THRE
interrupt pending.
So regardless of the ultimate cause, this state will never empty the TX
buffer.

I think my only choice is to try something I had to do once before on a
similarly misbehaving serial port: I'll rewrite the xeno_16550A interrupt
handlers to redundantly check for data pending in the TX buffer whenever
any interrupt like an RX interrupt happens. I do have bidirectional traffic
after all, so the driver will wake up frequently and keep the TX data
transmitting.

Interestingly enough, the stall problem did not occur when I used the sample
serial code provided by Xenomai: cross-link.c. I also rewrote cross-link.c
to send a 72 byte packet and receive on the same port (I installed a
physical loopback device on the serial port). No stalls for 12+ hours with
packets streaming at 100 Hz.
The only difference in the serial configuration between that cross-link.c
app and my app was:
struct rtser_config :
 .rx_timeout = RTSER_DEF_TIMEOUT  // infinite, no stall for many hours in cross-link.c
versus:
 .rx_timeout = 50   // 500us, stalls within an hour in my app
I don't know why an RX setting affects TX behavior. I also can't use
RTSER_DEF_TIMEOUT in my application or it dies when it starts up - no clue
why.  But I did try setting
 .rx_timeout = 500   // 5 ms, my app doesn't stall for several hours
and though that did not cause the serial to stall in my app for several
hours of testing, it is just open-loop finger-crossing, and not a real
solution.
I need the TX interrupts to fire reliably. So I think I must rewrite that
interrupt handler, as above.



I think we have a race between rt_16550_write filling the software queue that
the tx interrupt is supposed to write out and the latter already firing,
consuming that event without seeing the queue filled. I'll think about a better
algorithm tomorrow, one that can possibly get rid of some interrupt events as
well.


Jan


Greetings again,

If cross-link.c is not stalling, but the CSmith application hangs on
startup when using similar settings to what cross-link.c is using, it
tells me that understanding why this "hang on startup" is happening
would be a good idea.  I know this has happened to me when I got an
event from a UART that my code did not handle, and because I did not
handle it, the event continued to fire over and over - a hang. I
theorized that perhaps there's an issue with there being stale data
or a data overrun condition that exists when the app starts up that's
causing this hang.  In either case, it sounds as though the difference
in settings between CSmith app and cross-link.c might be a key factor.

I went back to the previous email trail, and if I interpreted it
correctly, the overall data rate is only about 80% of 115Kbaud. This
suggests that every time there is a write, the 4K software buffer in
the driver should be completely empty - as should the TX FIFO. The
only time that won't be true is when the transmit processing got
stalled (by loss of interrupt, or whatever).

I would be interested to see what happens if the CSmith app
were to be modified to write one byte at a time, with no delay
between rt_dev_write() calls.

Finally, some searching shows that back when the original National
Semiconductor 16550[A] UARTs were first being "cloned" by other
vendors, National created a program called "COMTEST" that was
designed to reveal the "misbehaviour" of those competing chips by
doing extensive testing of the timing and other characteristics and
how it deviated from "the real thing".  I wonder if anyone in this
group knows where a copy of that program (or a more modern version)
might exist?

Regards,
Steve



Apologies, I said "hangs on startup" but the original statement was
"dies on startup".  So the theory was that if that were fixed, and
the timeout was RTSER_DEF_TIMEOUT like it is in cross-link.c, that
this might solve the problem.




Re: rt_dev_send() stalls periodic task

2019-04-22 Thread Steve Freyder via Xenomai

On 4/22/2019 1:45 AM, Jan Kiszka wrote:

On 22.04.19 08:40, C Smith via Xenomai wrote:

Thanks for your insight, Steve. I didn't realize rt_dev_write() doesn't
actually stall until it is called many times and the 4K TX buffer gets
full. (Is that right, Jan?)
If that is the case, sure, I could find a way to check the TX buffer fill
level to prevent my app from stalling.

I rewrote the xeno_16550A driver RTSER_RTIOC_GET_STATUS ioctl to return to
userspace the contents of the IIR and the IER too.
I'm getting IIR = 0b 0001 0100, so the source of the latest interrupt is an
RX (not surprising, as I'm doing full duplex) and there is no THRE
interrupt pending.
So regardless of the ultimate cause, this state will never empty the TX
buffer.

I think my only choice is to try something I had to do once before on a
similarly misbehaving serial port: I'll rewrite the xeno_16550A interrupt
handlers to redundantly check for data pending in the TX buffer whenever
any interrupt like an RX interrupt happens. I do have bidirectional traffic
after all, so the driver will wake up frequently and keep the TX data
transmitting.

Interestingly enough, the stall problem did not occur when I used the sample
serial code provided by Xenomai: cross-link.c. I also rewrote cross-link.c
to send a 72 byte packet and receive on the same port (I installed a
physical loopback device on the serial port). No stalls for 12+ hours with
packets streaming at 100 Hz.
The only difference in the serial configuration between that cross-link.c
app and my app was:
struct rtser_config :
 .rx_timeout = RTSER_DEF_TIMEOUT  // infinite, no stall for many hours in cross-link.c
versus:
 .rx_timeout = 50   // 500us, stalls within an hour in my app
I don't know why an RX setting affects TX behavior. I also can't use
RTSER_DEF_TIMEOUT in my application or it dies when it starts up - no clue
why.  But I did try setting
 .rx_timeout = 500   // 5 ms, my app doesn't stall for several hours
and though that did not cause the serial to stall in my app for several
hours of testing, it is just open-loop finger-crossing, and not a real
solution.
I need the TX interrupts to fire reliably. So I think I must rewrite that
interrupt handler, as above.



I think we have a race between rt_16550_write filling the software queue that
the tx interrupt is supposed to write out and the latter already firing,
consuming that event without seeing the queue filled. I'll think about a better
algorithm tomorrow, one that can possibly get rid of some interrupt events as
well.
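
[ Editorial sketch, not the xeno_16550A code: one common way to avoid the
  kind of race described above is to keep a single invariant, "the THRE
  interrupt is armed whenever the software queue is non-empty", and to
  update both the queue and the enable bit under the same lock.  The
  user-space model below only illustrates that invariant.  Every name in
  it (sim_port, sim_write, sim_tx_irq) is hypothetical, the lock is
  implied rather than shown, and it assumes the UART raises a TX interrupt
  when THRE is enabled while the holding register is empty, which some
  16550 clones may not do. ]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_TXBUF_SIZE 4096u   /* power of two, like the 4K software buffer */
#define SIM_FIFO_DEPTH 16u     /* 16550A hardware TX FIFO depth */

/* Hypothetical model of one port; not a structure from the real driver. */
struct sim_port {
    unsigned char buf[SIM_TXBUF_SIZE];
    size_t head, tail;         /* free-running counters, masked on access */
    bool thre_irq_enabled;     /* models the IER.THRE enable bit */
    size_t sent;               /* bytes handed to the "hardware" so far */
};

static size_t sim_pending(const struct sim_port *p)
{
    return p->head - p->tail;
}

/* Writer side.  In a real driver this body would run with the port lock
 * held, so the interrupt handler cannot observe the queue half-updated. */
size_t sim_write(struct sim_port *p, const unsigned char *data, size_t len)
{
    size_t n = 0;

    assert(p != NULL && data != NULL);
    while (n < len && sim_pending(p) < SIM_TXBUF_SIZE)
        p->buf[p->head++ & (SIM_TXBUF_SIZE - 1)] = data[n++];

    /* Re-arm unconditionally after queueing, even if a TX interrupt
     * fired and disarmed itself a moment before we got here. */
    if (sim_pending(p) > 0)
        p->thre_irq_enabled = true;
    return n;
}

/* Interrupt side: move up to one FIFO's worth, disarm only when empty. */
void sim_tx_irq(struct sim_port *p)
{
    if (!p->thre_irq_enabled)
        return;
    for (size_t i = 0; i < SIM_FIFO_DEPTH && sim_pending(p) > 0; i++) {
        (void)p->buf[p->tail++ & (SIM_TXBUF_SIZE - 1)]; /* "write to THR" */
        p->sent++;
    }
    if (sim_pending(p) == 0)
        p->thre_irq_enabled = false;
}
```

[ With a 72 byte packet queued, five calls to sim_tx_irq() drain it
  (four full FIFO loads of 16 plus one of 8) and leave the interrupt
  disarmed. ]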


Jan


Greetings again,

If cross-link.c is not stalling, but the CSmith application hangs on
startup when using similar settings to what cross-link.c is using, it
tells me that understanding why this "hang on startup" is happening
would be a good idea.  I know this has happened to me when I got an
event from a UART that my code did not handle, and because I did not
handle it, the event continued to fire over and over - a hang. I
theorized that perhaps there's an issue with there being stale data
or a data overrun condition that exists when the app starts up that's
causing this hang.  In either case, it sounds as though the difference
in settings between CSmith app and cross-link.c might be a key factor.

I went back to the previous email trail, and if I interpreted it
correctly, the overall data rate is only about 80% of 115Kbaud. This
suggests that every time there is a write, the 4K software buffer in
the driver should be completely empty - as should the TX FIFO. The
only time that won't be true is when the transmit processing got
stalled (by loss of interrupt, or whatever).
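
[ Editorial sketch of the arithmetic behind this argument.  It assumes
  8N1 framing (10 bits on the wire per byte) and uses the 72 byte,
  100 Hz figures quoted in this thread; the "about 80%" estimate above
  came from the earlier email trail and may be based on other numbers.
  The function names are hypothetical. ]

```c
#include <assert.h>
#include <stdint.h>

/* Microseconds needed to shift `bytes` out at `baud`, with
 * `bits_per_char` bits on the wire per byte (10 for 8N1). */
uint64_t wire_time_us(uint32_t bytes, uint32_t bits_per_char, uint32_t baud)
{
    assert(baud > 0);
    return ((uint64_t)bytes * bits_per_char * 1000000u) / baud;
}

/* Line utilisation in whole percent for one packet every `period_us`. */
unsigned line_load_percent(uint32_t bytes, uint32_t bits_per_char,
                           uint32_t baud, uint32_t period_us)
{
    assert(period_us > 0);
    return (unsigned)(wire_time_us(bytes, bits_per_char, baud) * 100u
                      / period_us);
}
```

[ wire_time_us(72, 10, 115200) is 6250, so under these assumptions a
  packet takes 6.25 ms to leave the wire and the 10 ms period leaves the
  line about 62% loaded.  Either way the load is below 100%, which is all
  the argument needs: the previous packet should be gone before the next
  write. ]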

I would be interested to see what happens if the CSmith app
were to be modified to write one byte at a time, with no delay
between rt_dev_write() calls.

Finally, some searching shows that back when the original National
Semiconductor 16550[A] UARTs were first being "cloned" by other
vendors, National created a program called "COMTEST" that was
designed to reveal the "misbehaviour" of those competing chips by
doing extensive testing of the timing and other characteristics and
how it deviated from "the real thing".  I wonder if anyone in this
group knows where a copy of that program (or a more modern version)
might exist?

Regards,
Steve




Re: rt_dev_send() stalls periodic task

2019-04-21 Thread Steve Freyder via Xenomai

On 4/20/2019 11:33 PM, C Smith via Xenomai wrote:

Per your suggestion, I added code to call this ioctl, right after the
rt_dev_write() :
rt_dev_ioctl(fd_tty[1], RTSER_RTIOC_GET_STATUS, &serial_status);
I let the transmit stall again, then attached with gdb, which allows me
to step forward to the ioctl:
serial_status.line_status was 96 decimal, or 0110 0000 binary
which means both transmit holding and transmit shift registers were empty,
thus nothing was queued up in the UART for transmission.
The return value of rt_dev_write() was only 8, after a 72 byte packet was
submitted to rt_dev_write().
So your theory that the TX interrupt got lost seems correct.
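
[ Editorial sketch: decoding the line status value reported above with
  the standard 16550 LSR bit positions.  96 is 0x60, that is THRE (0x20)
  plus TEMT (0x40).  The helper names and the idea of comparing against a
  count of bytes still queued in software are hypothetical, not part of
  the RTDM serial API. ]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 16550 Line Status Register bits. */
#define LSR_DR   0x01u  /* receive data ready */
#define LSR_OE   0x02u  /* receive overrun error */
#define LSR_THRE 0x20u  /* transmit holding register empty */
#define LSR_TEMT 0x40u  /* transmitter (holding + shift) empty */

/* True when nothing at all is left in the UART's transmit path. */
bool lsr_tx_idle(uint8_t lsr)
{
    return (lsr & (LSR_THRE | LSR_TEMT)) == (LSR_THRE | LSR_TEMT);
}

/* A stall looks like: hardware idle while software still holds data.
 * `sw_pending` is an assumed count of bytes waiting in the driver. */
bool tx_looks_stalled(uint8_t lsr, size_t sw_pending)
{
    return lsr_tx_idle(lsr) && sw_pending > 0;
}
```

[ lsr_tx_idle(96) is true, and no receive or error bits are set in that
  value, which matches the reading that the UART had nothing queued. ]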

First, why does rt_dev_write() wait until all bytes are transmitted ?
Shouldn't it be effectively "non blocking" ?

Second, how might I generate another UART TX interrupt to keep the
transmission going?
Can we modify the serial driver at a low level to check the LSR vs the
bytes in the buffer, and force transmission until the buffer is empty?

thanks,
-C Smith


[ pls excuse the intrusion on your thread, I experienced this same
  problem years ago, 16550A hardware, bare metal, perhaps I could
  add a couple of thoughts ]

I would point out that Philippe made some changes to the 3.0.x iMX
UART driver circa 2019/04/01, in what sounds like the same functional
area.  Granted that's different hardware, but it appears to be a
descendant of this driver, so if those changes were good for the iMX
driver, maybe they're good for this one too.

I got curious about "tx_timeout", and why it doesn't help in this
situation, so I looked at the code.  The driver rightly assumes that
the hardware is going to produce a TX interrupt when the FIFO trigger
level is reached.  The TX interrupt handler will pull more bytes from
the (4K) software transmit buffer to fill the TX FIFO, set the
IER.THRE interrupt enable, and return.  If the TX interrupt doesn't
fire, that process of emptying the software FIFO into the hardware TX
FIFO stops, and there's no timeout-based provision for restoring the
flow of output, so it's only a matter of time before the software FIFO
overflows, and at that point your writes start to stall.

I might argue that since you are in nonblocking mode, the driver
write routine should be doing this check before attempting to put
anything in the software buffer:

    if (userwritelen > freebytesinsoftwarebuffer) {
        return -EWOULDBLOCK;
    }

With obvious issues for user buffers larger than 4K in NB mode.

That'd keep your task from hanging, but the output is still going to
stop shortly after losing a THRE interrupt.
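
[ Editorial sketch of the all-or-nothing check suggested above, written
  as a stand-alone ring buffer rather than as a driver patch.  The type
  and function names (txbuf, txbuf_write_nonblock) are hypothetical, and
  a real driver would do this with the port lock held. ]

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TXBUF_SIZE 4096u   /* power of two, like the 4K software buffer */

struct txbuf {
    unsigned char data[TXBUF_SIZE];
    size_t head, tail;     /* free-running counters, masked on access */
};

static size_t txbuf_free(const struct txbuf *b)
{
    return TXBUF_SIZE - (b->head - b->tail);
}

/* Queue the whole request or nothing.  Returns the byte count on
 * success, or -EWOULDBLOCK when the request does not fit right now.
 * A request larger than TXBUF_SIZE can never fit, which is the
 * "obvious issue" for oversized user buffers in non-blocking mode. */
long txbuf_write_nonblock(struct txbuf *b, const void *src, size_t len)
{
    const unsigned char *s = src;

    assert(b != NULL && (src != NULL || len == 0));
    if (len > txbuf_free(b))
        return -EWOULDBLOCK;
    for (size_t i = 0; i < len; i++)
        b->data[b->head++ & (TXBUF_SIZE - 1)] = s[i];
    return (long)len;
}
```

[ With nothing draining the buffer, 56 writes of 72 bytes succeed
  (4032 bytes) and the 57th returns -EWOULDBLOCK instead of blocking the
  caller, which is the behaviour argued for above. ]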

BTW, if you truly *are* losing an interrupt, the IER.THRE bit should
be equal to 1 when you look at it in the debugger.  If the IER.THRE
bit is 0, then it means that the driver made the mistake, OR perhaps
that there's a timing problem where the CPU *tried* to set IER.THRE
but the chip wasn't ready and never heard the request.  As I
remember it there's a software copy of the last requested output
state of the IER kept in the per-port context structure, so you could
look there to see what the driver last attempted to write to the IER.
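
[ Editorial sketch of the reasoning in the paragraph above as a small
  decision function.  Bit 1 (0x02) of the 16550 IER is the THRE
  interrupt enable.  The enum, the function, and the notion of a
  "shadow" value passed in by the caller are hypothetical names for
  illustration; how the real driver stores its IER copy is not shown
  here. ]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IER_THRE 0x02u  /* 16550 IER bit 1: enable THR-empty interrupt */

enum tx_stall_cause {
    TX_NOT_STALLED,           /* hardware busy or nothing left to send */
    TX_IRQ_LOST,              /* enable bit set in chip, IRQ never came */
    TX_IER_WRITE_NOT_LATCHED, /* driver asked for it, chip shows 0 */
    TX_DRIVER_DID_NOT_ARM     /* neither shadow nor chip has the bit */
};

/* ier_hw:     IER value read back from the UART
 * ier_shadow: last IER value the driver intended to write
 * tx_idle:    LSR shows THRE and TEMT both set
 * sw_pending: bytes still waiting in the software TX buffer */
enum tx_stall_cause classify_tx_stall(uint8_t ier_hw, uint8_t ier_shadow,
                                      bool tx_idle, size_t sw_pending)
{
    if (!tx_idle || sw_pending == 0)
        return TX_NOT_STALLED;
    if (ier_hw & IER_THRE)
        return TX_IRQ_LOST;
    if (ier_shadow & IER_THRE)
        return TX_IER_WRITE_NOT_LATCHED;
    return TX_DRIVER_DID_NOT_ARM;
}
```

[ The three stalled outcomes correspond to the three cases named above:
  a truly lost interrupt, a write the chip never heard, and a driver
  mistake. ]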

I remember you mentioned that you have three such UARTs in your
system, and that one (COM1?) is not having this problem, but the
other two *are*.  I think I would be interested in how the hardware
related to COM1 differs from that of the others.  Are they all on the
motherboard?  Maybe the IRQ assignments are what make the
difference.

Finally, you could run a test where you let the port be handled by
Linux, and exercise it with:

strace dd if=/dev/zero bs=75 of=/dev/ttyXX

to see if you have the same problem with output stopping (eg, dropping
a THRE interrupt).  Keep an eye on your dmesg output while you're
running the test, Linux might have code to detect a dropped transmit
interrupt based on a timer, and if that happens, it should be logged
via printk() and show up in dmesg.

HTH

Regards,
Steve



Re: Possible Cobalt mqueue issue

2019-02-25 Thread Steve Freyder via Xenomai




On 2/25/2019 11:15 AM, Jan Kiszka wrote:

On 25.02.19 17:53, Steve Freyder via Xenomai wrote:

Greetings again,

Recently I have converted my codebase from using Alchemy-based queues 
(rt_queue_xx) to Cobalt (Posix) mqueues for all inter-process 
communication, and using rt_queue queues only for communication 
between threads in the same process.


This is running on Xenomai 3.0.7 built from -next (our vendor does 
the Xenomai/kernel builds):


Was that really 3.0.7, not 3.0.8 or latest stable-3.0.x? There is e.g. 
https://gitlab.denx.de/Xenomai/xenomai/commit/4924717ec5cbc694afc1b91ba7d525b80901d44d 
since 3.0.7, and your backtrace kind of looks familiar wrt that.


Jan



Linux g3l-36 4.1.18_C01571-15S01-00.002.zimg+83fdace666 #4 SMP Tue 
Aug 21 11:22:31 CDT 2018 armv7l GNU/Linux

Now that I see that, you're absolutely right and I do apologize for 
bothering you with this.  After looking at that patch (from October 
2018) I do remember when you originally posted it and thinking at the 
time that I needed to pick it up in a future build.  The build from Aug 
21 2018 was 3.0.7 and didn't include your 4924717e fix.  I guess it is 
time for me to move to a stable-3.0.9 build!


Thanks,
Steve





Possible Cobalt mqueue issue

2019-02-25 Thread Steve Freyder via Xenomai

Greetings again,

Recently I have converted my codebase from using Alchemy-based queues 
(rt_queue_xx) to Cobalt (Posix) mqueues for all inter-process 
communication, and using rt_queue queues only for communication between 
threads in the same process.


This is running on Xenomai 3.0.7 built from -next (our vendor does the 
Xenomai/kernel builds):


Linux g3l-36 4.1.18_C01571-15S01-00.002.zimg+83fdace666 #4 SMP Tue Aug 
21 11:22:31 CDT 2018 armv7l GNU/Linux


This happened as my main process was starting during boot.  I have not 
been able to reproduce this, but I thought maybe the output would be useful.


Thanks in advance,
Best regards,
Steve

=

[   13.056376] I-pipe: Detected stalled head domain, probably caused by 
a bug.

[   13.056376] A critical section may have been left unterminated.
[   13.069983] CPU: 1 PID: 1259 Comm: g2ld-main Not tainted 
4.1.18_C01571-15S01-00.002.zimg+83fdace666 #4

[   13.079309] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[   13.085854] Backtrace:
[   13.088362] [<80014a64>] (dump_backtrace) from [<80014c9c>] 
(show_stack+0x20/0x24)

[   13.095948]  r7: r6:0080 r5: r4:80b85c94
[   13.101747] [<80014c7c>] (show_stack) from [<806b679c>] 
(dump_stack+0xa0/0xc4)
[   13.109003] [<806b66fc>] (dump_stack) from [<800ab000>] 
(ipipe_root_only+0x11c/0x188)
[   13.116848]  r9:80c49380 r8: r7:80c49380 r6:80b38e6c 
r5:600b0113 r4:809afba4
[   13.124758] [<800aaee4>] (ipipe_root_only) from [<80021bc0>] 
(do_page_fault+0x2fc/0x4a8)
[   13.132864]  r10:bc34da84 r9:bba5 r8:0004 r7:bc34da40 
r6:0817 r5:bb291d58

[   13.140845]  r4:600b0093 r3:
[   13.144498] [<800218c4>] (do_page_fault) from [<800093ec>] 
(do_DataAbort+0x44/0x1b4)
[   13.152257]  r10:80b3da3c r9:80b38e6c r8:0004 r7:bb291d58 
r6:800218c4 r5:0817

[   13.160238]  r4:80b3dbbc
[   13.162822] [<800093a8>] (do_DataAbort) from [<80015838>] 
(__dabt_svc+0x58/0x80)

[   13.170237] Exception stack(0xbb291d58 to 0xbb291da0)
[   13.175309] 1d40:  00100100
[   13.183510] 1d60:   bb8efc00  809ae9a0 
0001  80b38e6c
[   13.191710] 1d80: 80b3da3c bb291dd4 bb291d58 bb291da0 800eee1c 
801107dc 600b0093 
[   13.199902]  r10:80b3da3c r9:80b38e6c r8: r7:bb291d8c 
r6: r5:600b0093

[   13.207884]  r4:801107dc
[   13.210475] [<801106dc>] (mq_unref_inner) from [<80110ab8>] 
(mq_unref+0x78/0xd8)
[   13.217886]  r10: r9:80b3da3c r8:80b38e6c r7:80c49380 
r6:809ae9a0 r5:bb8efc00

[   13.225867]  r4:
[   13.228449] [<80110a40>] (mq_unref) from [<80110b3c>] 
(mqd_close+0x24/0x28)

[   13.235426]  r7:80c49380 r6:809ae9a0 r5:600b0013 r4:bb8efc00
[   13.241218] [<80110b18>] (mqd_close) from [<80106c58>] 
(__put_fd+0x35c/0x3b8)

[   13.248369]  r5:600b0013 r4:bda592c4
[   13.252020] [<801068fc>] (__put_fd) from [<80107a84>] 
(rtdm_fd_close+0x190/0x2f0)
[   13.259518]  r10:bda592c4 r9:80b3da3c r8:86860b0b r7:80b38e6c 
r6:809ae9a0 r5:0003

[   13.267499]  r4:bda594c8
[   13.270086] [<801078f4>] (rtdm_fd_close) from [<801119a8>] 
(__cobalt_mq_open+0x730/0xc4c)
[   13.278278]  r10:ffef r9:80b3da3c r8:80b38e6c r7:809ae9a0 
r6:bb8efc00 r5:0042

[   13.286259]  r4:
[   13.288844] [<80111278>] (__cobalt_mq_open) from [<80111f54>] 
(CoBaLt_mq_open+0x90/0xa0)
[   13.296949]  r10:80111ec4 r9:80c5c300 r8:80c5c300 r7:c0943808 
r6: r5:7ee16b4c

[   13.304930]  r4:0042
[   13.307517] [<80111ec4>] (CoBaLt_mq_open) from [<8011ebb8>] 
(handle_head_syscall+0xf8/0x3a4)

[   13.315969]  r6:0001 r5:0001 r4:bb291fb0
[   13.320691] [<8011eac0>] (handle_head_syscall) from [<8011f498>] 
(ipipe_fastcall_hook+0x20/0x28)
[   13.329493]  r10:0006 r9:bb29 r8:80010928 r7:000f0042 
r6:7ee16b4c r5:

[   13.337474]  r4:7ee16b8c
[   13.340065] [<8011f478>] (ipipe_fastcall_hook) from [<80010808>] 
(local_restart+0x20/0x44)
[   13.348359] Unable to handle kernel NULL pointer dereference at 
virtual address 0004

[   13.356468] pgd = bb2fc000
[   13.359192] [0004] *pgd=4da75831, *pte=, *ppte=
[   13.365562] Internal error: Oops: 817 [#1] SMP ARM
[   13.370371] Modules linked in: xeno_imx_uart xeno_gpio_mxc rt_e1000e 
rtnet fec e1000e
[   13.378412] CPU: 1 PID: 1259 Comm: g2ld-main Not tainted 
4.1.18_C01571-15S01-00.002.zimg+83fdace666 #4

[   13.387735] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[   13.394281] task: bba5 ti: bb29 task.ti: bb29
[   13.399699] PC is at mq_unref_inner+0x100/0x364
[   13.404254] LR is at xnsynch_flush+0x154/0x2a0
[   13.408717] pc : [<801107dc>]lr : [<800eee1c>]psr: 600b0093
[   13.408717] sp : bb291da0  ip : bb291d58  fp : bb291dd4
[   13.420213] r10: 80b3da3c  r9 : 80b38e6c  r8 : 
[   13.425454] r7 : 0001  r6 : 809ae9a0  r5 :   r4 : bb8efc00
[   13.432000] r3 :   r2 :   r1 : 

Re: RTCan missing frames

2019-02-12 Thread Steve Freyder via Xenomai



On 2/12/2019 11:53 AM, Wolfgang Grandegger via Xenomai wrote:


Am 12.02.19 um 18:24 schrieb Johannes Holtz:

Am 12.02.19 um 18:05 schrieb Wolfgang Grandegger:

Am 12.02.19 um 16:19 schrieb Johannes Holtz:

Am 12.02.19 um 16:00 schrieb Wolfgang Grandegger:

Hello Johannes,

Am 12.02.19 um 15:38 schrieb Johannes Holtz:

Am 12.02.19 um 12:24 schrieb Wolfgang Grandegger:

Hello,

Am 12.02.19 um 11:42 schrieb Johannes Holtz:

Am 11.02.19 um 17:46 schrieb Wolfgang Grandegger:

... snip ...

I suggest writing a simple test program to demonstrate the issue. It
should just open the socket and try to receive messages... just the
necessary stuff. First with a blocking recv() and then non-blocking.

What hardware and software are you using (arch, board, linux, xenomai)?

?


Wolfgang.

The source code is attached:

compiled with -I/opt/xenomai/include -D_GNU_SOURCE -D_REENTRANT
-D__XENO__ -lrtdm -L/opt/xenomai/lib -lxenomai -lpthread -lrt
-lnative

can frames sent by rtcansend

Test 1: blocking:

ID:0 DLC:2hex:  81 00   <-- NMT request
ID:709 DLC:1hex:  00<-- answer node #9
ID:708 DLC:1hex:  00<-- answer node #8
ID:703 DLC:1hex:  00<-- answer node #3
ID:705 DLC:0hex: <-- here it gets weird ! DLC == 0
ID:70400 DLC:1hex:  01
ID:70600 DLC:1hex:  01
ID:70100 DLC:1hex:  01
ID:70200 DLC:1hex:  01
ID:101 DLC:1hex:  08

This means that you can receive messages from the CAN bus.


ID:53220 DLC:124 out of bounds. abort.

But that's weird.


Test 2: non blocking:

ID:0 DLC:2hex:  81 00<-- NMT request
ID:709 DLC:1hex:  00 <-- answer node #9
ID:708 DLC:1hex:  00 <-- answer node #8
ID:703 DLC:1hex:  00 <-- answer node #3
ID:705 DLC:0hex:  <-- same issue DLC is 0
ID:70600 DLC:1hex:  01
ID:70400 DLC:1hex:  01
ID:70200 DLC:1hex:  01
ID:70100 DLC:1hex:  01
ID:101 DLC:1hex:  08
ID:53220 DLC:124 out of bounds. abort.

Looks identical.


Also, I found another possible error source, though I don't know whether
it would produce this kind of error.

While reviewing all settings, I noticed that I made a mistake with the
RXBUF_SIZE, which is set to 8096 instead of 8192. I must have been asleep
when writing this. I'm going to rebuild this module.

Let's try to understand why rt_dev_recv() does return bogus dlc.

What hardware and software are you using (arch, board, can controller,
linux, xenomai)?

Wolfgang.

I modified the program a little to send its own requests via one single
socket. The output looks like this.

send succeeded
ID:708 DLC:1hex:  00
ID:709 DLC:1hex:  00
ID:703 DLC:1hex:  00
ID:706 DLC:1hex:  00
ID:705 DLC:7hex:  00 00 01 01 00 09 07
send succeeded
send succeeded
send succeeded
send succeeded
send succeeded

And a parallel rtcanrecv output  like this:

root@machinectrl:~# rtcanrecv rtcan0
#0: (1) <0x000> [2] 81 00
#1: (1) <0x708> [1] 00
#2: (1) <0x709> [1] 00
#3: (1) <0x703> [1] 00
#4: (0) <0x706> [0]
#5: (0) <0x500> [1] 01
#6: (0) <0x400> [1] 01
#7: (0) <0x100> [1] 01
#8: (0) <0x200> [1] 01
#9: (0) <0x000> [1] 08
#10: (24) <0x220> [124] 00 00 82 00 00 00 01 08 01 00 00 00 f0 84
04 08

Why is "addr.can_ifindex = 24"? Something is broken in your build.

In the xenomai build? how do I find out what? it all compiled well.

I patched the driver before building the kernel with
0001-rtcan-ems_pci-driver-update-to-support-more-devices.patch.gz

Can you show me that patch.

See attachment.

I used the ipipe patch from the stable release,

and then build it normally.

Might the RX buf size that I changed a problem?

What exactly did you change? I think "rtcanrecv" does not use it.

Wolfgang

The kernel option CONFIG_XENO_DRIVERS_CAN_RXBUF_SIZE. At first I set it
to a wrong value, 8096, but even after I changed it to 8192 it didn't
change anything, and now I set it back to 1024 with no improvement either.

Ah, a value of 8096 does cause trouble, e.g. with expressions like the
following:

 sock->recv_tail = (sock->recv_tail + cpy_size) &
 (RTCAN_RXBUF_SIZE - 1);

Could you please make a *clean* rebuild from scratch with a value 2^N,
either 8192 or 1024.

Wolfgang.



#if (RTCAN_RXBUF_SIZE)&((RTCAN_RXBUF_SIZE)-1)
#error RTCAN_RXBUF_SIZE is not a power of 2!
#endif
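
[ Editorial sketch showing why the mask expression quoted above needs a
  power-of-two size.  8096 - 1 is 8095 (0x1F9F), which has bits 5 and 6
  clear, so masking silently drops those bits from the index instead of
  wrapping it.  The helper names are hypothetical; only the
  "& (size - 1)" idiom comes from the quoted code. ]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same test as the preprocessor check above, usable at run time. */
bool is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Wrap by masking: only correct when `size` is a power of two. */
size_t wrap_mask(size_t index, size_t size)
{
    assert(size > 0);
    return index & (size - 1);
}

/* Wrap by modulo: correct for any non-zero size, but slower. */
size_t wrap_mod(size_t index, size_t size)
{
    assert(size > 0);
    return index % size;
}
```

[ wrap_mask(100, 8096) gives 4 while wrap_mod(100, 8096) gives 100, so
  with 8096 the receive tail can jump to an unrelated offset well before
  the end of the buffer.  With 8192 or 1024 both forms agree. ]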