Re: [Xenomai-core] Analogy/mite

2011-12-09 Thread Anders Blomdell

On 12/08/2011 05:19 PM, Anders Blomdell wrote:

On 12/07/2011 08:58 AM, Anders Blomdell wrote:

On 12/06/2011 11:47 PM, Alexis Berlemont wrote:

Hi

On Thu, Dec 1, 2011 at 4:03 PM, Anders Blomdell
anders.blomd...@control.lth.se wrote:

On 11/30/2011 07:03 PM, Anders Blomdell wrote:


Hi, just found that

echo :06:01.0 > /sys/bus/pci/drivers/analogy_mite/unbind

does not do the same thing as

analogy_config -r analogyN

in fact it leaves the system in a state where using the driver results
in a kernel OOPS.

Will try to look into it further tomorrow...


OK seems like we have some interrupt cleanup problem, the following
command
sequence:



OK thank you for the report. I did not have time to look at it yet but
that will be done soon.

Is it blocking for you?

Yes, and even worse is this problem:

# /usr/local/sbin/analogy_config analogy0 analogy_ni_pcimio 6,1
# /usr/local/sbin/analogy_config -r analogy0
# cat /proc/xenomai/irq
Killed

I was looking into it last week, but have been at a workshop since Monday; will
get back to this tomorrow.

Seems like somebody is stomping on
dev->transfer.irq_desc.rtdm_desc.flags between attach and detach (flags
and all fields in its vicinity are zeroed out), hence the interrupt is
never removed from the interrupt handler tables, wreaking havoc with the
entire kernel.


Found the guilty party: a4l_cleanup_transfer, which zeroes out all the 
interrupt data, just before the interrupt should be detached. Somebody 
is being overzealous about keeping memory shiningly clean. We need to 
keep the useful dirt.


--- xenomai-2.6.0/ksrc/drivers/analogy/transfer.c.orig	2011-12-09 11:22:06.961999598 +0100
+++ xenomai-2.6.0/ksrc/drivers/analogy/transfer.c	2011-12-09 11:22:29.723999243 +0100
@@ -92,8 +92,6 @@
 		rtdm_free(tsf->subds);
 	}
 
-	memset(tsf, 0, sizeof(a4l_trf_t));
-
 	return 0;
 }
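
For readers who want to see the failure mode in isolation, here is a minimal user-space sketch of it. Everything in it is hypothetical (the *_sketch names, IRQ_REGISTERED, handler_table); it is not the Analogy or RTDM code, and it assumes only what is described above: detach consults a flags field, and cleanup may have zeroed that field first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the Analogy or RTDM types. */
#define IRQ_REGISTERED 0x1u
#define MAX_IRQ 32

struct irq_desc_sketch {
	unsigned int flags;	/* IRQ_REGISTERED once attached */
	unsigned int irq;
};

struct transfer_sketch {
	struct irq_desc_sketch irq_desc;
	void *subds;
};

/* Global table standing in for the interrupt handler tables. */
static struct irq_desc_sketch *handler_table[MAX_IRQ];

static int sketch_attach(struct transfer_sketch *tsf, unsigned int irq)
{
	assert(irq < MAX_IRQ);
	tsf->irq_desc.irq = irq;
	tsf->irq_desc.flags |= IRQ_REGISTERED;
	handler_table[irq] = &tsf->irq_desc;
	return 0;
}

/* Detach trusts the flags field: if it was cleared, nothing is removed. */
static int sketch_detach(struct transfer_sketch *tsf)
{
	if (!(tsf->irq_desc.flags & IRQ_REGISTERED))
		return -1;	/* looks as if no interrupt was ever attached */
	handler_table[tsf->irq_desc.irq] = NULL;
	tsf->irq_desc.flags &= ~IRQ_REGISTERED;
	return 0;
}

/* Cleanup, with or without the memset that the patch removes. */
static void sketch_cleanup(struct transfer_sketch *tsf, int zero_all)
{
	tsf->subds = NULL;
	if (zero_all)
		memset(tsf, 0, sizeof(*tsf));
}

/* Non-zero if the table still holds a handler entry for this irq. */
static int sketch_stale_handler(unsigned int irq)
{
	return handler_table[irq] != NULL;
}
```

With zero_all set, the later detach reports that nothing was attached and the table keeps pointing at the descriptor; with zero_all clear, the entry is removed as expected.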


/Anders


--
Anders Blomdell  Email: anders.blomd...@control.lth.se
Department of Automatic Control
Lund University  Phone:+46 46 222 4625
P.O. Box 118 Fax:  +46 46 138118
SE-221 00 Lund, Sweden


___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Analogy/mite

2011-12-09 Thread Anders Blomdell

On 11/30/2011 07:03 PM, Anders Blomdell wrote:

Hi, just found that

echo :06:01.0 > /sys/bus/pci/drivers/analogy_mite/unbind

does not do the same thing as

analogy_config -r analogyN

in fact it leaves the system in a state where using the driver results
in a kernel OOPS.

Will try to look into it further tomorrow...
Well, took quite some time to track down the 'analogy_config -r' bug 
(which was responsible for the kernel OOPS [i.e. after fixing it I have 
not got any OOPSes]).


So back to the original problem, does anybody foresee that a call to 
a4l_ioctl_devcfg(cxt, NULL) from the mite driver would give any problems 
(apart from getting the context pointer from the data structures the 
mite driver has handy)? It is probably not kosher to do ioctl on a 
driver that is not open, but...


/Anders



Re: [Xenomai-core] Analogy/mite

2011-12-09 Thread Anders Blomdell

On 12/09/2011 02:35 PM, Anders Blomdell wrote:

On 11/30/2011 07:03 PM, Anders Blomdell wrote:

Hi, just found that

echo :06:01.0 > /sys/bus/pci/drivers/analogy_mite/unbind

does not do the same thing as

analogy_config -r analogyN

in fact it leaves the system in a state where using the driver results
in a kernel OOPS.

Will try to look into it further tomorrow...

Well, took quite some time to track down the 'analogy_config -r' bug
(which was responsible for the kernel OOPS [i.e. after fixing it I have
not got any OOPSes]).

So back to the original problem, does anybody foresee that a call to
a4l_ioctl_devcfg(cxt, NULL) from the mite driver would give any problems
(apart from getting the context pointer from the data structures the
mite driver has handy)? It is probably not kosher to do ioctl on a
driver that is not open, but...
Attached is a hack (as can be gleaned from the EXPORT_SYMBOL_GPL). If
the basic assumption holds that it's OK to call a4l_ioctl_devcfg(...) during
unbind, I could rewrite the logic to pass down a pointer to
a4l_ioctl_devcfg and avoid the export.
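
A rough user-space sketch of what "passing down a pointer" could look like follows. All names in it (cxt_sketch, detach_fn, mite_sketch and the *_sketch functions) are made up for illustration and do not exist in Xenomai; the idea is only that the core hands the mite layer a function pointer at setup time, so the mite module does not need an exported symbol.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context type; not the real a4l_cxt_t. */
struct cxt_sketch {
	int attached;
};

/* Signature modelled on a (cxt, arg) ioctl-style handler. */
typedef int (*detach_fn)(struct cxt_sketch *cxt, void *arg);

struct mite_sketch {
	struct cxt_sketch *cxt;	/* context recorded at setup time */
	detach_fn detach;	/* callback handed down by the core */
};

/* Stand-in for the core's devcfg handler: a NULL arg means "detach". */
static int core_devcfg_sketch(struct cxt_sketch *cxt, void *arg)
{
	if (cxt == NULL)
		return -1;
	cxt->attached = (arg != NULL);
	return 0;
}

/* The board driver passes both the context and the callback down. */
static void mite_setup_sketch(struct mite_sketch *mite,
			      struct cxt_sketch *cxt, detach_fn detach)
{
	assert(mite != NULL);
	mite->cxt = cxt;
	mite->detach = detach;
}

/* Called on unbind: detach through the callback if one was recorded. */
static int mite_remove_sketch(struct mite_sketch *mite)
{
	int ret = 0;

	if (mite->detach != NULL && mite->cxt != NULL)
		ret = mite->detach(mite->cxt, NULL);
	mite->cxt = NULL;
	mite->detach = NULL;
	return ret;
}
```

In this sketch the remove path checks both the callback and the context before calling, so an unbind of a device that was never configured is a no-op.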


/Anders



diff -ur xenomai-2.6.0.orig/include/analogy/device.h xenomai-2.6.0/include/analogy/device.h
--- xenomai-2.6.0.orig/include/analogy/device.h	2011-12-09 16:37:46.777999756 +0100
+++ xenomai-2.6.0/include/analogy/device.h	2011-12-09 16:41:18.660003797 +0100
@@ -43,9 +43,9 @@
 	/* Device specific flags */
 	unsigned long flags;
 
-	/* Driver assigned to this device thanks to attaching
-	   procedure */
+	/* Fields assigned to this device in attaching procedure */
 	a4l_drv_t *driver;
+	a4l_cxt_t *cxt;
 
 	/* Hidden description stuff */
 	struct list_head subdvsq;
diff -ur xenomai-2.6.0.orig/ksrc/drivers/analogy/device.c xenomai-2.6.0/ksrc/drivers/analogy/device.c
--- xenomai-2.6.0.orig/ksrc/drivers/analogy/device.c	2011-12-09 16:37:48.497999755 +0100
+++ xenomai-2.6.0/ksrc/drivers/analogy/device.c	2011-12-09 16:42:23.163001790 +0100
@@ -291,6 +291,7 @@
 	a4l_dev_t *dev = a4l_get_dev(cxt);
 
 	dev->driver = drv;
+	dev->cxt = cxt;
 
 	if (drv->privdata_size == 0)
 		__a4l_dbg(1, core_dbg,
@@ -331,6 +332,7 @@
 	if (ret != 0 && dev->priv != NULL) {
 		rtdm_free(dev->priv);
 		dev->driver = NULL;
+		dev->cxt = NULL;
 	}
 
 	return ret;
@@ -360,6 +362,7 @@
 	/* Free the private field */
 	rtdm_free(dev->priv);
 	dev->driver = NULL;
+	dev->cxt = NULL;
 
 out_release_driver:
 	return ret;
@@ -455,6 +458,7 @@
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(a4l_ioctl_devcfg);
 
 int a4l_ioctl_devinfo(a4l_cxt_t * cxt, void *arg)
 {
diff -ur xenomai-2.6.0.orig/ksrc/drivers/analogy/national_instruments/mite.c xenomai-2.6.0/ksrc/drivers/analogy/national_instruments/mite.c
--- xenomai-2.6.0.orig/ksrc/drivers/analogy/national_instruments/mite.c	2011-12-09 16:37:48.49755 +0100
+++ xenomai-2.6.0/ksrc/drivers/analogy/national_instruments/mite.c	2011-12-09 16:43:04.147002142 +0100
@@ -103,6 +103,9 @@
 			list_entry(this, struct mite_struct, list);
 
 		if(mite->pcidev == dev) {
+			if (mite->a4ldev) {
+				a4l_ioctl_devcfg(mite->a4ldev->cxt, NULL);
+			}
 			list_del(this);
 			kfree(mite);
 			break;
@@ -117,7 +120,8 @@
 	.remove = mite_remove,
 };
 
-int a4l_mite_setup(struct mite_struct *mite, int use_iodwbsr_1)
+int a4l_mite_setup(struct mite_struct *mite, int use_iodwbsr_1,
+		   struct a4l_device *a4ldev)
 {
 	unsigned long length;
 	resource_size_t addr;
@@ -232,6 +236,7 @@
 	}
 
 	mite->used = 1;
+	mite->a4ldev = a4ldev;
 
 	return 0;
 }
@@ -255,6 +260,7 @@
 		pci_release_regions( mite->pcidev );
 
 	mite->used = 0;
+	mite->a4ldev = NULL;
 }
 
 void a4l_mite_list_devices(void)
diff -ur xenomai-2.6.0.orig/ksrc/drivers/analogy/national_instruments/mite.h xenomai-2.6.0/ksrc/drivers/analogy/national_instruments/mite.h
--- xenomai-2.6.0.orig/ksrc/drivers/analogy/national_instruments/mite.h	2011-12-09 16:37:48.49755 +0100
+++ xenomai-2.6.0/ksrc/drivers/analogy/national_instruments/mite.h	2011-12-09 16:38:33.976999742 +0100
@@ -70,6 +70,7 @@
 	void *mite_io_addr;
 	resource_size_t daq_phys_addr;
 	void *daq_io_addr;
+	struct a4l_device *a4ldev;
 };
 
 static inline
@@ -115,7 +116,8 @@
 	return mite->pcidev->device;
 };
 
-int a4l_mite_setup(struct mite_struct *mite, int use_iodwbsr_1);
+int a4l_mite_setup(struct mite_struct *mite, int use_iodwbsr_1,
+		   struct a4l_device *a4ldev);
 void a4l_mite_unsetup(struct mite_struct *mite);
 void a4l_mite_list_devices(void);
 struct mite_struct * a4l_mite_find_device(int bus,
diff -ur xenomai-2.6.0.orig/ksrc/drivers/analogy/national_instruments/ni_660x.c xenomai-2.6.0/ksrc/drivers/analogy/national_instruments/ni_660x.c
--- xenomai-2.6.0.orig/ksrc/drivers/analogy/national_instruments/ni_660x.c	2011-12-09 16:37:48.500999755 +0100
+++ xenomai-2.6.0/ksrc/drivers/analogy/national_instruments

Re: [Xenomai-core] Analogy/mite

2011-12-09 Thread Anders Blomdell

On 12/09/2011 04:51 PM, Anders Blomdell wrote:

On 12/09/2011 02:35 PM, Anders Blomdell wrote:

On 11/30/2011 07:03 PM, Anders Blomdell wrote:

Hi, just found that

echo :06:01.0 > /sys/bus/pci/drivers/analogy_mite/unbind

does not do the same thing as

analogy_config -r analogyN

in fact it leaves the system in a state where using the driver results
in a kernel OOPS.

Will try to look into it further tomorrow...

Well, took quite some time to track down the 'analogy_config -r' bug
(which was responsible for the kernel OOPS [i.e. after fixing it I have
not got any OOPSes]).

So back to the original problem, does anybody foresee that a call to
a4l_ioctl_devcfg(cxt, NULL) from the mite driver would give any problems
(apart from getting the context pointer from the data structures the
mite driver has handy)? It is probably not kosher to do ioctl on a
driver that is not open, but...

Attached is a hack (as can be gleaned from the EXPORT_SYMBOL_GPL). If
the basic assumption holds that it's OK to call a4l_ioctl_devcfg(...) during
unbind, I could rewrite the logic to pass down a pointer to
a4l_ioctl_devcfg and avoid the export.


Sloppy me, should of course be:

-			if (mite->a4ldev) {
+			if (mite->a4ldev && mite->a4ldev->cxt) {
+				a4l_ioctl_devcfg(mite->a4ldev->cxt, NULL);
+			}




Re: [Xenomai-core] Analogy/mite

2011-12-08 Thread Anders Blomdell

On 12/07/2011 08:58 AM, Anders Blomdell wrote:

On 12/06/2011 11:47 PM, Alexis Berlemont wrote:

Hi

On Thu, Dec 1, 2011 at 4:03 PM, Anders Blomdell
anders.blomd...@control.lth.se wrote:

On 11/30/2011 07:03 PM, Anders Blomdell wrote:


Hi, just found that

echo :06:01.0 > /sys/bus/pci/drivers/analogy_mite/unbind

does not do the same thing as

analogy_config -r analogyN

in fact it leaves the system in a state where using the driver results
in a kernel OOPS.

Will try to look into it further tomorrow...


OK seems like we have some interrupt cleanup problem, the following
command
sequence:



OK thank you for the report. I did not have time to look at it yet but
that will be done soon.

Is it blocking for you?

Yes, and even worse is this problem:

# /usr/local/sbin/analogy_config analogy0 analogy_ni_pcimio 6,1
# /usr/local/sbin/analogy_config -r analogy0
# cat /proc/xenomai/irq
Killed

I was looking into it last week, but have been at a workshop since Monday; will
get back to this tomorrow.

Seems like somebody is stomping on
dev->transfer.irq_desc.rtdm_desc.flags between attach and detach (flags
and all fields in its vicinity are zeroed out), hence the interrupt is
never removed from the interrupt handler tables, wreaking havoc with the
entire kernel.







Alexis.


modprobe xeno_native
modprobe analogy_ni_pcimio
sleep 1
/usr/local/sbin/analogy_config analogy0 analogy_ni_pcimio 6,1
/usr/local/sbin/analogy_config -r analogy0
rmmod analogy_ni_pcimio
rmmod analogy_ni_mio
rmmod analogy_ni_tio
rmmod analogy_8255
rmmod analogy_ni_mite
rmmod xeno_analogy

sleep 2

modprobe xeno_native
modprobe analogy_ni_pcimio
sleep 1
/usr/local/sbin/analogy_config analogy0 analogy_ni_pcimio 6,1

Gives:

[ 412.623639] Analogy: MITE: Available NI device IDs: 0x70af
[ 413.648335] Analogy: analogy_ni_pcimio: pcimio_attach: found pci-6221
board
[ 413.676105] Analogy: analogy_ni_pcimio: pcimio_attach: found irq 22
[ 413.682385] BUG: unable to handle kernel paging request at f8bc4bf4
[ 413.683367] IP: [f8846efe] xnintr_attach+0x6e/0xfe [xeno_nucleus]
[ 413.683367] *pdpt = 00aca001 *pde = 31ca5067 *pte =

[ 413.683367] Oops:  [#1] SMP
[ 413.683367] last sysfs file: /sys/bus/pci/drivers/analogy_mite/uevent
[ 413.683367] Modules linked in: analogy_ni_pcimio analogy_ni_mio
analogy_ni_tio analogy_8255 analogy_ni_mite xeno_analogy xeno_native nfs
fscache snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq
snd_seq_device snd_pcm snd_timer snd soundcore rt_e1000 rt_e1000_new
rtnet
xeno_rtdm nfsd lockd nfs_acl auth_rpcgss xeno_nucleus snd_page_alloc
ppdev
iTCO_wdt iTCO_vendor_support microcode sunrpc exportfs i2c_i801 pcspkr
serio_raw e1000e parport_pc parport uinput ipv6 firewire_ohci
firewire_core
ata_generic pata_acpi crc_itu_t pata_marvell i915 drm_kms_helper drm
i2c_algo_bit i2c_core video [last unloaded: xeno_analogy]
[ 413.683367]
[ 413.683367] Pid: 1579, comm: analogy_config Not tainted
2.6.38.8.xenomai.2.6.0.rtnet.26db745.2030.1211 #1 /DG965SS
[ 413.683367] EIP: 0060:[f8846efe] EFLAGS: 00010286 CPU: 1
[ 413.683367] EIP is at xnintr_attach+0x6e/0xfe [xeno_nucleus]
[ 413.683367] EAX: f8bc4be4 EBX: f87d2be4 ECX: 0001 EDX: 0003
[ 413.683367] ESI: f885b840 EDI: fff0 EBP: f169ddf4 ESP: f169dde0
[ 413.683367] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 413.683367] Process analogy_config (pid: 1579, ti=f169c000
task=f40925e0
task.ti=f169c000)
[ 413.683367] I-pipe domain Linux
[ 413.683367] Stack:
[ 413.683367] 205bde08 0001 f87d2be4  0001 f169de10
f89a0c91 f87cea28
[ 413.683367]  0001 f87d2bd8  f169de28 f87ceb64
0001 f87d134f
[ 413.683367] f87d2bd8 f87d2bb8 f169de44 f87cf727 0001 f87d2bb8
0016 f87d2bb8
[ 413.683367] Call Trace:
[ 413.683367] [f89a0c91] rtdm_irq_request+0x37/0x5a [xeno_rtdm]
[ 413.683367] [f87cea28] ? a4l_handle_irq+0x0/0x1f [xeno_analogy]
[ 413.683367] [f87ceb64] __a4l_request_irq+0x38/0x3e [xeno_analogy]
[ 413.683367] [f87cf727] a4l_request_irq+0x67/0xad [xeno_analogy]
[ 413.683367] [f86b1593] pcimio_attach+0x4e0/0x53e [analogy_ni_pcimio]
[ 413.683367] [f87cde93] a4l_assign_driver+0x73/0x100 [xeno_analogy]
[ 413.683367] [f87cdfd9] a4l_device_attach+0x59/0x6e [xeno_analogy]
[ 413.683367] [f87ce0d7] a4l_ioctl_devcfg+0xbd/0xf6 [xeno_analogy]
[ 413.683367] [f87cf943] a4l_ioctl+0x1e/0x20 [xeno_analogy]
[ 413.683367] [f899fa5a] __rt_dev_ioctl+0x4d/0x104 [xeno_rtdm]
[ 413.683367] [c07c35b6] ? do_page_fault+0x2f7/0x322
[ 413.683367] [f89a1a85] sys_rtdm_ioctl+0x2e/0x30 [xeno_rtdm]
[ 413.683367] [f8851414] losyscall_event+0xb1/0x174 [xeno_nucleus]
[ 413.683367] [c04887ab] __ipipe_dispatch_event+0xcb/0x17a
[ 413.683367] [f8851363] ? losyscall_event+0x0/0x174 [xeno_nucleus]
[ 413.683367] [c0415b32] __ipipe_syscall_root+0x50/0xc9
[ 413.683367] [c07c0a21] system_call+0x2d/0x53
[ 413.683367] Code: 00 e8 73 ff ff ff 8b 4b 10 f7 c1 00 00 01 00 89
45 f0
0f 85 92 00 00 00 8b 73 14 c1 e6 06 81 c6 c0 b2 85 f8 8b 46 24 85 c0
74 25
8b

Re: [Xenomai-core] Analogy/mite

2011-12-07 Thread Anders Blomdell

On 12/06/2011 11:47 PM, Alexis Berlemont wrote:

Hi

On Thu, Dec 1, 2011 at 4:03 PM, Anders Blomdell
anders.blomd...@control.lth.se  wrote:

On 11/30/2011 07:03 PM, Anders Blomdell wrote:


Hi, just found that

echo :06:01.0 > /sys/bus/pci/drivers/analogy_mite/unbind

does not do the same thing as

analogy_config -r analogyN

in fact it leaves the system in a state where using the driver results
in a kernel OOPS.

Will try to look into it further tomorrow...


OK seems like we have some interrupt cleanup problem, the following command
sequence:



OK thank you for the report. I did not have time to look at it yet but
that will be done soon.

Is it blocking for you?

Yes, and even worse is this problem:

# /usr/local/sbin/analogy_config analogy0 analogy_ni_pcimio 6,1
# /usr/local/sbin/analogy_config -r analogy0
# cat /proc/xenomai/irq
Killed

I was looking into it last week, but have been at a workshop since Monday; will
get back to this tomorrow.




Alexis.


modprobe xeno_native
modprobe analogy_ni_pcimio
sleep 1
/usr/local/sbin/analogy_config analogy0 analogy_ni_pcimio 6,1
/usr/local/sbin/analogy_config -r analogy0
rmmod analogy_ni_pcimio
rmmod analogy_ni_mio
rmmod analogy_ni_tio
rmmod analogy_8255
rmmod analogy_ni_mite
rmmod xeno_analogy

sleep 2

modprobe xeno_native
modprobe analogy_ni_pcimio
sleep 1
/usr/local/sbin/analogy_config analogy0 analogy_ni_pcimio 6,1

Gives:

[  412.623639] Analogy: MITE: Available NI device IDs: 0x70af
[  413.648335] Analogy: analogy_ni_pcimio: pcimio_attach: found pci-6221
board
[  413.676105] Analogy: analogy_ni_pcimio: pcimio_attach: found irq 22
[  413.682385] BUG: unable to handle kernel paging request at f8bc4bf4
[  413.683367] IP: [f8846efe] xnintr_attach+0x6e/0xfe [xeno_nucleus]
[  413.683367] *pdpt = 00aca001 *pde = 31ca5067 *pte =

[  413.683367] Oops:  [#1] SMP
[  413.683367] last sysfs file: /sys/bus/pci/drivers/analogy_mite/uevent
[  413.683367] Modules linked in: analogy_ni_pcimio analogy_ni_mio
analogy_ni_tio analogy_8255 analogy_ni_mite xeno_analogy xeno_native nfs
fscache snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq
snd_seq_device snd_pcm snd_timer snd soundcore rt_e1000 rt_e1000_new rtnet
xeno_rtdm nfsd lockd nfs_acl auth_rpcgss xeno_nucleus snd_page_alloc ppdev
iTCO_wdt iTCO_vendor_support microcode sunrpc exportfs i2c_i801 pcspkr
serio_raw e1000e parport_pc parport uinput ipv6 firewire_ohci firewire_core
ata_generic pata_acpi crc_itu_t pata_marvell i915 drm_kms_helper drm
i2c_algo_bit i2c_core video [last unloaded: xeno_analogy]
[  413.683367]
[  413.683367] Pid: 1579, comm: analogy_config Not tainted
2.6.38.8.xenomai.2.6.0.rtnet.26db745.2030.1211 #1 /DG965SS
[  413.683367] EIP: 0060:[f8846efe] EFLAGS: 00010286 CPU: 1
[  413.683367] EIP is at xnintr_attach+0x6e/0xfe [xeno_nucleus]
[  413.683367] EAX: f8bc4be4 EBX: f87d2be4 ECX: 0001 EDX: 0003
[  413.683367] ESI: f885b840 EDI: fff0 EBP: f169ddf4 ESP: f169dde0
[  413.683367]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  413.683367] Process analogy_config (pid: 1579, ti=f169c000 task=f40925e0
task.ti=f169c000)
[  413.683367] I-pipe domain Linux
[  413.683367] Stack:
[  413.683367]  205bde08 0001 f87d2be4  0001 f169de10
f89a0c91 f87cea28
[  413.683367]   0001 f87d2bd8  f169de28 f87ceb64
0001 f87d134f
[  413.683367]  f87d2bd8 f87d2bb8 f169de44 f87cf727 0001 f87d2bb8
0016 f87d2bb8
[  413.683367] Call Trace:
[  413.683367]  [f89a0c91] rtdm_irq_request+0x37/0x5a [xeno_rtdm]
[  413.683367]  [f87cea28] ? a4l_handle_irq+0x0/0x1f [xeno_analogy]
[  413.683367]  [f87ceb64] __a4l_request_irq+0x38/0x3e [xeno_analogy]
[  413.683367]  [f87cf727] a4l_request_irq+0x67/0xad [xeno_analogy]
[  413.683367]  [f86b1593] pcimio_attach+0x4e0/0x53e [analogy_ni_pcimio]
[  413.683367]  [f87cde93] a4l_assign_driver+0x73/0x100 [xeno_analogy]
[  413.683367]  [f87cdfd9] a4l_device_attach+0x59/0x6e [xeno_analogy]
[  413.683367]  [f87ce0d7] a4l_ioctl_devcfg+0xbd/0xf6 [xeno_analogy]
[  413.683367]  [f87cf943] a4l_ioctl+0x1e/0x20 [xeno_analogy]
[  413.683367]  [f899fa5a] __rt_dev_ioctl+0x4d/0x104 [xeno_rtdm]
[  413.683367]  [c07c35b6] ? do_page_fault+0x2f7/0x322
[  413.683367]  [f89a1a85] sys_rtdm_ioctl+0x2e/0x30 [xeno_rtdm]
[  413.683367]  [f8851414] losyscall_event+0xb1/0x174 [xeno_nucleus]
[  413.683367]  [c04887ab] __ipipe_dispatch_event+0xcb/0x17a
[  413.683367]  [f8851363] ? losyscall_event+0x0/0x174 [xeno_nucleus]
[  413.683367]  [c0415b32] __ipipe_syscall_root+0x50/0xc9
[  413.683367]  [c07c0a21] system_call+0x2d/0x53
[  413.683367] Code: 00 e8 73 ff ff ff 8b 4b 10 f7 c1 00 00 01 00 89 45 f0
0f 85 92 00 00 00 8b 73 14 c1 e6 06 81 c6 c0 b2 85 f8 8b 46 24 85 c0 74 25
8b  50 10 89 ce 21 d6 83 e6 01 74 73 8b 73 18 39 70 18 75 6b 31
[  413.683367] EIP: [f8846efe] xnintr_attach+0x6e/0xfe [xeno_nucleus]
SS:ESP 0068:f169dde0
[  413.683367] CR2: f8bc4bf4






/Anders





Re: [Xenomai-core] Analogy/mite

2011-12-01 Thread Anders Blomdell

On 11/30/2011 07:03 PM, Anders Blomdell wrote:

Hi, just found that

echo :06:01.0 > /sys/bus/pci/drivers/analogy_mite/unbind

does not do the same thing as

analogy_config -r analogyN

in fact it leaves the system in a state where using the driver results
in a kernel OOPS.

Will try to look into it further tomorrow...
OK seems like we have some interrupt cleanup problem, the following 
command sequence:


modprobe xeno_native
modprobe analogy_ni_pcimio
sleep 1
/usr/local/sbin/analogy_config analogy0 analogy_ni_pcimio 6,1
/usr/local/sbin/analogy_config -r analogy0
rmmod analogy_ni_pcimio
rmmod analogy_ni_mio
rmmod analogy_ni_tio
rmmod analogy_8255
rmmod analogy_ni_mite
rmmod xeno_analogy

sleep 2

modprobe xeno_native
modprobe analogy_ni_pcimio
sleep 1
/usr/local/sbin/analogy_config analogy0 analogy_ni_pcimio 6,1

Gives:

[  412.623639] Analogy: MITE: Available NI device IDs: 0x70af
[  413.648335] Analogy: analogy_ni_pcimio: pcimio_attach: found pci-6221 
board

[  413.676105] Analogy: analogy_ni_pcimio: pcimio_attach: found irq 22
[  413.682385] BUG: unable to handle kernel paging request at f8bc4bf4
[  413.683367] IP: [f8846efe] xnintr_attach+0x6e/0xfe [xeno_nucleus]
[  413.683367] *pdpt = 00aca001 *pde = 31ca5067 *pte = 


[  413.683367] Oops:  [#1] SMP
[  413.683367] last sysfs file: /sys/bus/pci/drivers/analogy_mite/uevent
[  413.683367] Modules linked in: analogy_ni_pcimio analogy_ni_mio 
analogy_ni_tio analogy_8255 analogy_ni_mite xeno_analogy xeno_native nfs 
fscache snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq 
snd_seq_device snd_pcm snd_timer snd soundcore rt_e1000 rt_e1000_new 
rtnet xeno_rtdm nfsd lockd nfs_acl auth_rpcgss xeno_nucleus 
snd_page_alloc ppdev iTCO_wdt iTCO_vendor_support microcode sunrpc 
exportfs i2c_i801 pcspkr serio_raw e1000e parport_pc parport uinput ipv6 
firewire_ohci firewire_core ata_generic pata_acpi crc_itu_t pata_marvell 
i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: 
xeno_analogy]

[  413.683367]
[  413.683367] Pid: 1579, comm: analogy_config Not tainted 
2.6.38.8.xenomai.2.6.0.rtnet.26db745.2030.1211 #1 
/DG965SS

[  413.683367] EIP: 0060:[f8846efe] EFLAGS: 00010286 CPU: 1
[  413.683367] EIP is at xnintr_attach+0x6e/0xfe [xeno_nucleus]
[  413.683367] EAX: f8bc4be4 EBX: f87d2be4 ECX: 0001 EDX: 0003
[  413.683367] ESI: f885b840 EDI: fff0 EBP: f169ddf4 ESP: f169dde0
[  413.683367]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  413.683367] Process analogy_config (pid: 1579, ti=f169c000 
task=f40925e0 task.ti=f169c000)

[  413.683367] I-pipe domain Linux
[  413.683367] Stack:
[  413.683367]  205bde08 0001 f87d2be4  0001 f169de10 
f89a0c91 f87cea28
[  413.683367]   0001 f87d2bd8  f169de28 f87ceb64 
0001 f87d134f
[  413.683367]  f87d2bd8 f87d2bb8 f169de44 f87cf727 0001 f87d2bb8 
0016 f87d2bb8

[  413.683367] Call Trace:
[  413.683367]  [f89a0c91] rtdm_irq_request+0x37/0x5a [xeno_rtdm]
[  413.683367]  [f87cea28] ? a4l_handle_irq+0x0/0x1f [xeno_analogy]
[  413.683367]  [f87ceb64] __a4l_request_irq+0x38/0x3e [xeno_analogy]
[  413.683367]  [f87cf727] a4l_request_irq+0x67/0xad [xeno_analogy]
[  413.683367]  [f86b1593] pcimio_attach+0x4e0/0x53e [analogy_ni_pcimio]
[  413.683367]  [f87cde93] a4l_assign_driver+0x73/0x100 [xeno_analogy]
[  413.683367]  [f87cdfd9] a4l_device_attach+0x59/0x6e [xeno_analogy]
[  413.683367]  [f87ce0d7] a4l_ioctl_devcfg+0xbd/0xf6 [xeno_analogy]
[  413.683367]  [f87cf943] a4l_ioctl+0x1e/0x20 [xeno_analogy]
[  413.683367]  [f899fa5a] __rt_dev_ioctl+0x4d/0x104 [xeno_rtdm]
[  413.683367]  [c07c35b6] ? do_page_fault+0x2f7/0x322
[  413.683367]  [f89a1a85] sys_rtdm_ioctl+0x2e/0x30 [xeno_rtdm]
[  413.683367]  [f8851414] losyscall_event+0xb1/0x174 [xeno_nucleus]
[  413.683367]  [c04887ab] __ipipe_dispatch_event+0xcb/0x17a
[  413.683367]  [f8851363] ? losyscall_event+0x0/0x174 [xeno_nucleus]
[  413.683367]  [c0415b32] __ipipe_syscall_root+0x50/0xc9
[  413.683367]  [c07c0a21] system_call+0x2d/0x53
[  413.683367] Code: 00 e8 73 ff ff ff 8b 4b 10 f7 c1 00 00 01 00 89 45 
f0 0f 85 92 00 00 00 8b 73 14 c1 e6 06 81 c6 c0 b2 85 f8 8b 46 24 85 c0 
74 25 8b 50 10 89 ce 21 d6 83 e6 01 74 73 8b 73 18 39 70 18 75 6b 31
[  413.683367] EIP: [f8846efe] xnintr_attach+0x6e/0xfe [xeno_nucleus] 
SS:ESP 0068:f169dde0

[  413.683367] CR2: f8bc4bf4





/Anders





[Xenomai-core] Analogy/mite

2011-11-30 Thread Anders Blomdell

Hi, just found that

  echo :06:01.0 > /sys/bus/pci/drivers/analogy_mite/unbind

does not do the same thing as

  analogy_config -r analogyN

in fact it leaves the system in a state where using the driver results 
in a kernel OOPS.


Will try to look into it further tomorrow...

/Anders


Re: [Xenomai-core] Problems with gcc 4.6.0 (rt_task_shadow fails with ENOSYS)

2011-07-08 Thread Anders Blomdell

On 07/08/2011 02:41 PM, Gilles Chanteperdrix wrote:

On 07/07/2011 11:47 PM, Anders Blomdell wrote:

When compiling kernel 2.6.37.3 and xenomai 2.5.6 with gcc version 4.6.0
20110530 (Red Hat 4.6.0-9) (GCC), programs fail with -ENOSYS in
rt_task_shadow. If compiled with gcc version 4.5.1 20100924 (Red Hat
4.5.1-4) (GCC) everything works as expected.


Could you send us the disassembly of the two functions?
Which functions? Print[fk] debugging got me to suspect the
syscall/skin_mux interface, but I'm a bit at a loss as to exactly where the
code ends up.


Regards

Anders




Re: [Xenomai-core] Problems with gcc 4.6.0 (rt_task_shadow fails with ENOSYS)

2011-07-08 Thread Anders Blomdell

On 07/08/2011 04:44 PM, Gilles Chanteperdrix wrote:

On 07/08/2011 04:06 PM, Anders Blomdell wrote:

On 07/08/2011 02:41 PM, Gilles Chanteperdrix wrote:

On 07/07/2011 11:47 PM, Anders Blomdell wrote:

When compiling kernel 2.6.37.3 and xenomai 2.5.6 with gcc version 4.6.0
20110530 (Red Hat 4.6.0-9) (GCC), programs fail with -ENOSYS in
rt_task_shadow. If compiled with gcc version 4.5.1 20100924 (Red Hat
4.5.1-4) (GCC) everything works as expected.


Could you send us the disassembly of the two functions?

Which functions? Print[fk] debugging got me to suspect the
syscall/skin_mux interface, but I'm a bit at a loss as to exactly where the
code ends up.


The two rt_task_shadow, the one which works, and the one which does not.

Ok, attached the two routines taken from respective libnative.so.3

Will try to recompile with gcc-4.6.1 as well.

/Anders


83a0 <rt_task_shadow>:
83a0:   55                      push   %ebp
83a1:   57                      push   %edi
83a2:   56                      push   %esi
83a3:   53                      push   %ebx
83a4:   e8 c0 a1 ff ff          call   2569 <__i686.get_pc_thunk.bx>
83a9:   81 c3 93 56 00 00       add    $0x5693,%ebx
83af:   81 ec ac 08 00 00       sub    $0x8ac,%esp
83b5:   8b b4 24 c0 08 00 00    mov    0x8c0(%esp),%esi
83bc:   e8 3b 9d ff ff          call   20fc <xeno_fault_stack@plt>
83c1:   85 f6                   test   %esi,%esi
83c3:   8d 84 24 90 08 00 00    lea    0x890(%esp),%eax
83ca:   c7 44 24 04 00 00 00    movl   $0x0,0x4(%esp)
83d1:   00
83d2:   0f 44 f0                cmove  %eax,%esi
83d5:   c7 04 24 01 00 00 00    movl   $0x1,(%esp)
83dc:   e8 fb 9e ff ff          call   22dc <pthread_setcanceltype@plt>
83e1:   e8 c6 9e ff ff          call   22ac <xeno_sigshadow_install_once@plt>
83e6:   8b 84 24 c4 08 00 00    mov    0x8c4(%esp),%eax
83ed:   89 b4 24 78 08 00 00    mov    %esi,0x878(%esp)
83f4:   89 84 24 7c 08 00 00    mov    %eax,0x87c(%esp)
83fb:   8b 84 24 c8 08 00 00    mov    0x8c8(%esp),%eax
8402:   89 84 24 80 08 00 00    mov    %eax,0x880(%esp)
8409:   8b 84 24 cc 08 00 00    mov    0x8cc(%esp),%eax
8410:   89 84 24 84 08 00 00    mov    %eax,0x884(%esp)
8417:   e8 a0 9e ff ff          call   22bc <pthread_self@plt>
841c:   89 84 24 88 08 00 00    mov    %eax,0x888(%esp)
8423:   e8 34 9d ff ff          call   215c <xeno_init_current_mode@plt>
8428:   b9 f4 ff ff ff          mov    $0xfffffff4,%ecx
842d:   85 c0                   test   %eax,%eax
842f:   89 84 24 8c 08 00 00    mov    %eax,0x88c(%esp)
8436:   0f 84 bd 00 00 00       je     84f9 <rt_task_shadow+0x159>
843c:   8d 83 00 aa ff ff       lea    -0x5600(%ebx),%eax
8442:   89 84 24 74 08 00 00    mov    %eax,0x874(%esp)
8449:   89 ac 24 70 08 00 00    mov    %ebp,0x870(%esp)
8450:   8b bb e8 ff ff ff       mov    -0x18(%ebx),%edi
8456:   8d 84 24 70 08 00 00    lea    0x870(%esp),%eax
845d:   89 84 24 98 08 00 00    mov    %eax,0x898(%esp)
8464:   8d ac 24 78 08 00 00    lea    0x878(%esp),%ebp
846b:   90                      nop
846c:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
8470:   8b 07                   mov    (%edi),%eax
8472:   31 c9                   xor    %ecx,%ecx
8474:   c7 44 24 28 00 00 00    movl   $0x0,0x28(%esp)
847b:   00
847c:   0d 2b 02 00 00          or     $0x22b,%eax
8481:   89 84 24 9c 08 00 00    mov    %eax,0x89c(%esp)
8488:   89 e8                   mov    %ebp,%eax
848a:   53                      push   %ebx
848b:   89 c3                   mov    %eax,%ebx
848d:   8b 84 24 9c 08 00 00    mov    0x89c(%esp),%eax
8494:   55                      push   %ebp
8495:   8b ac 24 98 08 00 00    mov    0x898(%esp),%ebp
849c:   cd 80                   int    $0x80
849e:   5d                      pop    %ebp
849f:   5b                      pop    %ebx
84a0:   89 c1                   mov    %eax,%ecx
84a2:   8b 44 24 28             mov    0x28(%esp),%eax
84a6:   85 c0                   test   %eax,%eax
84a8:   74 1a                   je     84c4 <rt_task_shadow+0x124>
84aa:   8d 44 24 28             lea    0x28(%esp),%eax
84ae:   89 4c 24 08             mov    %ecx,0x8(%esp)
84b2:   c7 44 24 04 ab ff ff    movl   $0xffffffab,0x4(%esp)
84b9:   ff
84ba:   89 04 24                mov    %eax,(%esp)
84bd:   e8 aa

[Xenomai-core] Problems with gcc 4.6.0 (rt_task_shadow fails with ENOSYS)

2011-07-07 Thread Anders Blomdell
When compiling kernel 2.6.37.3 and xenomai 2.5.6 with gcc version 4.6.0 
20110530 (Red Hat 4.6.0-9) (GCC), programs fail with -ENOSYS in 
rt_task_shadow. If compiled with gcc version 4.5.1 20100924 (Red Hat 
4.5.1-4) (GCC) everything works as expected.


Regards

Anders Blomdell




Re: [Xenomai-core] Duplicate symbols in analogy

2011-03-15 Thread Anders Blomdell
On 2011-03-14 20.29, Anders Blomdell wrote:
 I think it would make sense to change the name conflicts between analogy and
 comedi (range_unknown is one of them), to make it possible to have comedi and
 analogy to coexist on the same machine, anybody in support of this?
Anybody against, then? IMHO it's a bad idea to have name conflicts with drivers
in the kernel (even if they are still in the staging area). What prefix should I
add to all modified exported symbols? Would this make sense (a4ld == Analogy
for Linux Driver):

  mite_unsetup -> a4ld_mite_unsetup
  etc...

Regards

Anders



[Xenomai-core] NI analog card shows wrong IRQ number after reboot

2011-03-14 Thread Anders Blomdell
This is because pci_enable_device (mite.c) is called in mite_setup instead of
mite_init. The downside is that interrupt conflicts can only be found AFTER
the driver has been started with analogy_config, which is often too late
(since interrupt conflicts will bring down the machine).

Would it be a good idea to call pci_enable_device in mite_init as well, or will
that break something else?

Regards

Anders Blomdell




Re: [Xenomai-core] NI analog card shows wrong IRQ number after reboot

2011-03-14 Thread Anders Blomdell
On 2011-03-14 19.33, Anders Blomdell wrote:
 Which is due to the fact that pci_enable_device (mite.c) is called at mite_setup
 instead of mite_init. The bad thing with this, is that interrupt conflicts can
 only be found AFTER the driver has been started with analogy_config, which is
 often too late (since interrupt conflicts will bring down the machine).
 
 Would it be a good idea to pci_enable_device in mite_init as well, or will that
 break something else?

Many other kernel drivers seem to call pci_enable_device from the probe
function, and doing so does give the card its proper IRQ:

--- ksrc/drivers/analogy/national_instruments/mite.c.orig   2011-02-16 15:26:01.0 +0100
+++ ksrc/drivers/analogy/national_instruments/mite.c        2011-03-14 19:38:18.572674136 +0100
@@ -80,6 +80,7 @@
        }
 
        list_add(&mite->list, &mite_devices);
+       pci_enable_device(mite->pcidev);
 
        return 0;
 }
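
The patch above ignores the return value of pci_enable_device, which returns 0
on success and a negative errno on failure. A more defensive variant would
check it. The following is only a user-space sketch of that idea: struct
pci_dev and pci_enable_device are mocked here so the example builds outside the
kernel, and mite_sketch, mite_probe_sketch and the IRQ value 17 are
hypothetical names and values, not taken from mite.c.

```c
#include <stdio.h>

/* Mock of the kernel type, just enough for the sketch. */
struct pci_dev {
	int irq;
	int fail_enable;	/* test knob: force the mock to fail */
};

/* Mock, NOT the kernel function: returns 0 or a negative error code. */
static int pci_enable_device(struct pci_dev *dev)
{
	if (dev->fail_enable)
		return -5;	/* mimics a negative errno such as -EIO */
	dev->irq = 17;		/* arbitrary: the IRQ is only valid once enabled */
	return 0;
}

/* Hypothetical stand-in for the driver's per-board structure. */
struct mite_sketch {
	struct pci_dev *pcidev;
};

/* Enable the device at probe time and propagate any failure. */
static int mite_probe_sketch(struct mite_sketch *mite)
{
	int err = pci_enable_device(mite->pcidev);

	if (err) {
		fprintf(stderr, "mite: cannot enable device (%d)\n", err);
		return err;
	}
	return 0;
}
```

With the check in place, a board that cannot be enabled is rejected at probe
time instead of showing up later with a bogus IRQ.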

Regards

Anders




[Xenomai-core] Duplicate symbols in analogy

2011-03-14 Thread Anders Blomdell
I think it would make sense to resolve the name conflicts between analogy and
comedi (range_unknown is one of them), so that comedi and analogy can
coexist on the same machine. Anybody in support of this?

Regards

Anders Blomdell



Re: [Xenomai-core] Problem with gcc-4.5.1

2010-12-08 Thread Anders Blomdell
On 2010-12-08 09.50, Gilles Chanteperdrix wrote:
 Anders Blomdell wrote:
 On 2010-12-07 21.21, Gilles Chanteperdrix wrote:
 Anders Blomdell wrote:
 On 12/07/2010 01:09 PM, Gilles Chanteperdrix wrote:
   Anders Blomdell wrote:
   On 12/07/2010 12:51 PM, Gilles Chanteperdrix wrote:
   Anders Blomdell wrote:
   When compiling Xenomai on Fedora-14 with gcc-4.5.1 [version 4.5.1
   20100924 (Red Hat 4.5.1-4)], the loading of xeno_nucleus fails 
 with the
   attached kernel OOPS, a notable difference between the 4.5.1 compiled
   version and a working one built with gcc-4.4.4 on the same system 
 with
   the same configuration, sis tthat __rthal_x86_nodiv_ullimd is not
   inlined, is this anybody has seen before?
   No, that is new, we need to see the disassembly of 
 __rthal_x86_nodiv_ullimd
  
   objdump -S:
  
   static inline __attribute__((const)) unsigned long long
   __rthal_x86_nodiv_ullimd(const unsigned long long op,
   const unsigned long long frac,
   unsigned integ)
   {
 e7a8:55  push   %ebp
 e7a9:89 e5   mov%esp,%ebp
 e7ab:57  push   %edi
 e7ac:56  push   %esi
 e7ad:53  push   %ebx
 e7ae:83 ec 10sub$0x10,%esp
 e7b1:8d 7d 08lea0x8(%ebp),%edi
 e7b4:e8 fc ff ff ff  call 
 e7b5__rthal_x86_nodiv_ullimd+0xd
 e7b9:8b 1f   mov(%edi),%ebx
 e7bb:8b 4f 04mov0x4(%edi),%ecx
  register unsigned rm __asm__(esi);
  register unsigned rh __asm__(edi);
  unsigned fracl, frach, opl, oph;
  register unsigned long long t;
  
  __rthal_u64tou32(op, oph, opl);
 e7be:89 45 e8mov%eax,-0x18(%ebp)
  __rthal_u64tou32(frac, frach, fracl);
 e7c1:89 5d f0mov%ebx,-0x10(%ebp)
  register unsigned rm __asm__(esi);
  register unsigned rh __asm__(edi);
  unsigned fracl, frach, opl, oph;
  register unsigned long long t;
  
  __rthal_u64tou32(op, oph, opl);
 e7c4:89 55 e4mov%edx,-0x1c(%ebp)
  __rthal_u64tou32(frac, frach, fracl);
 e7c7:89 4d ecmov%ecx,-0x14(%ebp)
  
  __asm__ (mov %[oph], %%eax\n\t
 e7ca:8b 45 e4mov-0x1c(%ebp),%eax
 e7cd:f7 65 ecmull   -0x14(%ebp)
 e7d0:89 c6   mov%eax,%esi
 e7d2:89 d7   mov%edx,%edi
 e7d4:8b 45 e8mov-0x18(%ebp),%eax
 e7d7:f7 65 f0mull   -0x10(%ebp)
 e7da:89 d1   mov%edx,%ecx
 e7dc:d1 e0   shl%eax
 e7de:83 d1 00adc$0x0,%ecx
 e7e1:83 d6 00adc$0x0,%esi
 e7e4:83 d7 00adc$0x0,%edi
 e7e7:8b 45 e4mov-0x1c(%ebp),%eax
 e7ea:f7 65 f0mull   -0x10(%ebp)
 e7ed:01 c1   add%eax,%ecx
 e7ef:11 d6   adc%edx,%esi
 e7f1:83 d7 00adc$0x0,%edi
 e7f4:8b 45 e8mov-0x18(%ebp),%eax
 e7f7:f7 65 ecmull   -0x14(%ebp)
 e7fa:01 c1   add%eax,%ecx
 e7fc:11 d6   adc%edx,%esi
 e7fe:83 d7 00adc$0x0,%edi
 e801:8b 45 e8mov-0x18(%ebp),%eax
 e804:f7 67 08mull   0x8(%edi)
  
   Problem is here: edi is used by gcc as if it contained an address
   whereas it is used by the assembly for the computation. Should be marked
   early clobber. So,
  
   in include/asm-x86/arith_32.h, replace:
  
   : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t)
  
   with:
  
   : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t)
  
  

 No cigar (:-()
 Ok. Maybe we can try something less radical, such as:

 : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t)

 This is incorrect, but we can hope for the best...
 As previously said, changing the optimization from -Os to anything else for
 xeno_nucleus (see patch in mail dated 'Tue, 07 Dec 2010 17:20:37 +0100'), 
 solved
 that issue (incorrect code + hope for the best - spurious disasters). Rather
 compile time errors than runtime errors.
 
 We are not going to decide instead of the user what optimization level
 to use, if he wants to use -Os, we have to make it work for -Os. If this
 one does not work, we have other things to try.
Then start with something that you believe is correct. I *WILL NOT* test
something which you think is incorrect.

/Anders

-- 
Anders

[Xenomai-core] Problem with gcc-4.5.1

2010-12-07 Thread Anders Blomdell
When compiling Xenomai on Fedora-14 with gcc-4.5.1 [version 4.5.1 
20100924 (Red Hat 4.5.1-4)], the loading of xeno_nucleus fails with the 
attached kernel OOPS. A notable difference between the 4.5.1 compiled 
version and a working one built with gcc-4.4.4 on the same system with 
the same configuration is that __rthal_x86_nodiv_ullimd is not 
inlined. Is this something anybody has seen before?



Regards

Anders Blomdell


BUG: unable to handle kernel NULL pointer dereference at 0008
IP: [fbf25804] __rthal_x86_nodiv_ullimd+0x5c/0x74 [xeno_nucleus]
*pdpt = 01d91001 *pde =  
Oops:  [#1] SMP 
last sysfs file: /sys/module/microcode/initstate
Modules linked in: xeno_nucleus(+) e1000 snd_timer snd e1000e soundcore 
iTCO_wdt i2c_i801 serio_raw iTCO_vendor_support snd_page_alloc microcode(+) 
pcspkr pata_acpi firewire_ohci ata_generic firewire_core crc_itu_t pata_marvell 
nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core

Pid: 519, comm: modprobe Not tainted 2.6.35.7_xenomai-2.5.5.2_rtnet-39f7fcf #1 
DP35DP/

EIP: 0060:[fbf25804] EFLAGS: 00010246 CPU: 0
EIP is at __rthal_x86_nodiv_ullimd+0x5c/0x74 [xeno_nucleus]
EAX:  EBX: b36c048c ECX:  EDX: 
ESI:  EDI:  EBP: c1ef1f34 ESP: c1ef1f18
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process modprobe (pid: 519, ti=c1ef task=c1e04080 task.ti=c1ef)
I-pipe domain Linux
Stack:
  665d7cba b36c048c  b36c048c 665d7cba c1ef1f54
0 fbf25887 b36c048c 665d7cba 0002 0001 0004 665d7cba c1ef1f60
0 fbf25a3f  c1ef1f84 fbcaf215  1194d800 0001 3b9aca00
Call Trace:
[fbf25887] ? xnarch_ns_to_tsc+0x34/0x4a [xeno_nucleus]
[fbf25a3f] ? xnarch_calibrate_sched+0x1a/0xf2 [xeno_nucleus]
[fbcaf215] ? __xeno_sys_init+0x189/0x2fd [xeno_nucleus]
[fbcaf08c] ? __xeno_sys_init+0x0/0x2fd [xeno_nucleus]
[c0401263] ? do_one_initcall+0x62/0x16f
[c046843c] ? sys_init_module+0x7f/0x19d
[c040299d] ? sysenter_do_call+0x12/0x16
Code: f0 89 d1 d1 e0 83 d1 00 83 d6 00 83 d7 00 8b 45 e4 f7 65 f0 01 c1 11 d6 
83 d7 00 8b 45 e8 f7 65 ec 01 c1 11 d6 83 d7 00 8b 45 e8 f7 67 08 01 f0 11 d7 
8b 55 e4 0f af 57 08 01 fa 83 c4 10 5b 5e 
EIP: [fbf25804] __rthal_x86_nodiv_ullimd+0x5c/0x74 [xeno_nucleus] SS:ESP 
0068:c1ef1f18
CR2: 0008


Re: [Xenomai-core] Problem with gcc-4.5.1

2010-12-07 Thread Anders Blomdell

On 12/07/2010 12:51 PM, Gilles Chanteperdrix wrote:

Anders Blomdell wrote:

When compiling Xenomai on Fedora-14 with gcc-4.5.1 [version 4.5.1
20100924 (Red Hat 4.5.1-4)], the loading of xeno_nucleus fails with the
attached kernel OOPS. A notable difference between the 4.5.1 compiled
version and a working one built with gcc-4.4.4 on the same system with
the same configuration is that __rthal_x86_nodiv_ullimd is not
inlined. Is this something anybody has seen before?


No, that is new, we need to see the disassembly of __rthal_x86_nodiv_ullimd


objdump -S:

static inline __attribute__((const)) unsigned long long
__rthal_x86_nodiv_ullimd(const unsigned long long op,
                         const unsigned long long frac,
                         unsigned integ)
{
    e7a8:   55                      push   %ebp
    e7a9:   89 e5                   mov    %esp,%ebp
    e7ab:   57                      push   %edi
    e7ac:   56                      push   %esi
    e7ad:   53                      push   %ebx
    e7ae:   83 ec 10                sub    $0x10,%esp
    e7b1:   8d 7d 08                lea    0x8(%ebp),%edi
    e7b4:   e8 fc ff ff ff          call   e7b5 <__rthal_x86_nodiv_ullimd+0xd>
    e7b9:   8b 1f                   mov    (%edi),%ebx
    e7bb:   8b 4f 04                mov    0x4(%edi),%ecx
        register unsigned rm __asm__("esi");
        register unsigned rh __asm__("edi");
        unsigned fracl, frach, opl, oph;
        register unsigned long long t;

        __rthal_u64tou32(op, oph, opl);
    e7be:   89 45 e8                mov    %eax,-0x18(%ebp)
        __rthal_u64tou32(frac, frach, fracl);
    e7c1:   89 5d f0                mov    %ebx,-0x10(%ebp)
        register unsigned rm __asm__("esi");
        register unsigned rh __asm__("edi");
        unsigned fracl, frach, opl, oph;
        register unsigned long long t;

        __rthal_u64tou32(op, oph, opl);
    e7c4:   89 55 e4                mov    %edx,-0x1c(%ebp)
        __rthal_u64tou32(frac, frach, fracl);
    e7c7:   89 4d ec                mov    %ecx,-0x14(%ebp)

        __asm__ ("mov %[oph], %%eax\n\t"
    e7ca:   8b 45 e4                mov    -0x1c(%ebp),%eax
    e7cd:   f7 65 ec                mull   -0x14(%ebp)
    e7d0:   89 c6                   mov    %eax,%esi
    e7d2:   89 d7                   mov    %edx,%edi
    e7d4:   8b 45 e8                mov    -0x18(%ebp),%eax
    e7d7:   f7 65 f0                mull   -0x10(%ebp)
    e7da:   89 d1                   mov    %edx,%ecx
    e7dc:   d1 e0                   shl    %eax
    e7de:   83 d1 00                adc    $0x0,%ecx
    e7e1:   83 d6 00                adc    $0x0,%esi
    e7e4:   83 d7 00                adc    $0x0,%edi
    e7e7:   8b 45 e4                mov    -0x1c(%ebp),%eax
    e7ea:   f7 65 f0                mull   -0x10(%ebp)
    e7ed:   01 c1                   add    %eax,%ecx
    e7ef:   11 d6                   adc    %edx,%esi
    e7f1:   83 d7 00                adc    $0x0,%edi
    e7f4:   8b 45 e8                mov    -0x18(%ebp),%eax
    e7f7:   f7 65 ec                mull   -0x14(%ebp)
    e7fa:   01 c1                   add    %eax,%ecx
    e7fc:   11 d6                   adc    %edx,%esi
    e7fe:   83 d7 00                adc    $0x0,%edi
    e801:   8b 45 e8                mov    -0x18(%ebp),%eax
    e804:   f7 67 08                mull   0x8(%edi)
    e807:   01 f0                   add    %esi,%eax
    e809:   11 d7                   adc    %edx,%edi
    e80b:   8b 55 e4                mov    -0x1c(%ebp),%edx
    e80e:   0f af 57 08             imul   0x8(%edi),%edx
    e812:   01 fa                   add    %edi,%edx
                 : [opl]"m"(opl), [oph]"m"(oph),
                   [fracl]"m"(fracl), [frach]"m"(frach), [integ]"m"(integ)
                 : "cc");

        return t;
}
    e814:   83 c4 10                add    $0x10,%esp
    e817:   5b                      pop    %ebx
    e818:   5e                      pop    %esi
    e819:   5f                      pop    %edi
    e81a:   5d                      pop    %ebp
    e81b:   c3                      ret


But as I said, in the working version, the code seems to be inlined 
everywhere. Should I send the two object modules as well (probably as a 
private message)?
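
For anyone following the disassembly, the arithmetic it performs can be
sketched in portable C. This is my reading of the listing, not the Xenomai
source: I assume frac is a 64-bit binary fraction, so the result is
op * integ plus the high 64 bits of op * frac. The shl/adc rounding step seen
in the assembly is left out, and the names mul64_high and nodiv_ullimd_sketch
are made up.

```c
#include <stdint.h>

/* High 64 bits of the 128-bit product a * b, built from 32-bit halves. */
static uint64_t mul64_high(uint64_t a, uint64_t b)
{
	uint64_t al = (uint32_t)a, ah = a >> 32;
	uint64_t bl = (uint32_t)b, bh = b >> 32;
	uint64_t ll = al * bl;
	uint64_t lh = al * bh;
	uint64_t hl = ah * bl;
	uint64_t hh = ah * bh;
	/* Middle column: three terms below 2^32 each, so no 64-bit overflow. */
	uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;

	return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}

/* op * (integ + frac / 2^64), truncated, computed without any division. */
static uint64_t nodiv_ullimd_sketch(uint64_t op, uint64_t frac, uint32_t integ)
{
	return mul64_high(op, frac) + (uint64_t)integ * op;
}
```

For example, with op = 10, integ = 3 and frac = 2^63 (that is, one half), the
sketch yields 35, i.e. 10 * 3.5.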


/Anders



Re: [Xenomai-core] Problem with gcc-4.5.1

2010-12-07 Thread Anders Blomdell


On 12/07/2010 01:09 PM, Gilles Chanteperdrix wrote:
 Anders Blomdell wrote:
 On 12/07/2010 12:51 PM, Gilles Chanteperdrix wrote:
 Anders Blomdell wrote:
 When compiling Xenomai on Fedora-14 with gcc-4.5.1 [version 4.5.1
 20100924 (Red Hat 4.5.1-4)], the loading of xeno_nucleus fails 
with the

 attached kernel OOPS, a notable difference between the 4.5.1 compiled
 version and a working one built with gcc-4.4.4 on the same system with
 the same configuration, sis tthat __rthal_x86_nodiv_ullimd is not
 inlined, is this anybody has seen before?
 No, that is new, we need to see the disassembly of 
__rthal_x86_nodiv_ullimd


 objdump -S:

 static inline __attribute__((const)) unsigned long long
 __rthal_x86_nodiv_ullimd(const unsigned long long op,
 const unsigned long long frac,
 unsigned integ)
 {
   e7a8:55  push   %ebp
   e7a9:89 e5   mov%esp,%ebp
   e7ab:57  push   %edi
   e7ac:56  push   %esi
   e7ad:53  push   %ebx
   e7ae:83 ec 10sub$0x10,%esp
   e7b1:8d 7d 08lea0x8(%ebp),%edi
   e7b4:	e8 fc ff ff ff   	call 
e7b5__rthal_x86_nodiv_ullimd+0xd

   e7b9:8b 1f   mov(%edi),%ebx
   e7bb:8b 4f 04mov0x4(%edi),%ecx
register unsigned rm __asm__(esi);
register unsigned rh __asm__(edi);
unsigned fracl, frach, opl, oph;
register unsigned long long t;

__rthal_u64tou32(op, oph, opl);
   e7be:89 45 e8mov%eax,-0x18(%ebp)
__rthal_u64tou32(frac, frach, fracl);
   e7c1:89 5d f0mov%ebx,-0x10(%ebp)
register unsigned rm __asm__(esi);
register unsigned rh __asm__(edi);
unsigned fracl, frach, opl, oph;
register unsigned long long t;

__rthal_u64tou32(op, oph, opl);
   e7c4:89 55 e4mov%edx,-0x1c(%ebp)
__rthal_u64tou32(frac, frach, fracl);
   e7c7:89 4d ecmov%ecx,-0x14(%ebp)

__asm__ (mov %[oph], %%eax\n\t
   e7ca:8b 45 e4mov-0x1c(%ebp),%eax
   e7cd:f7 65 ecmull   -0x14(%ebp)
   e7d0:89 c6   mov%eax,%esi
   e7d2:89 d7   mov%edx,%edi
   e7d4:8b 45 e8mov-0x18(%ebp),%eax
   e7d7:f7 65 f0mull   -0x10(%ebp)
   e7da:89 d1   mov%edx,%ecx
   e7dc:d1 e0   shl%eax
   e7de:83 d1 00adc$0x0,%ecx
   e7e1:83 d6 00adc$0x0,%esi
   e7e4:83 d7 00adc$0x0,%edi
   e7e7:8b 45 e4mov-0x1c(%ebp),%eax
   e7ea:f7 65 f0mull   -0x10(%ebp)
   e7ed:01 c1   add%eax,%ecx
   e7ef:11 d6   adc%edx,%esi
   e7f1:83 d7 00adc$0x0,%edi
   e7f4:8b 45 e8mov-0x18(%ebp),%eax
   e7f7:f7 65 ecmull   -0x14(%ebp)
   e7fa:01 c1   add%eax,%ecx
   e7fc:11 d6   adc%edx,%esi
   e7fe:83 d7 00adc$0x0,%edi
   e801:8b 45 e8mov-0x18(%ebp),%eax
   e804:f7 67 08mull   0x8(%edi)

 Problem is here: edi is used by gcc as if it contained an address
 whereas it is used by the assembly for the computation. Should be marked
 early clobber. So,

 in include/asm-x86/arith_32.h, replace:

 : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t)

 with:

 : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t)



No cigar (:-()

arch/x86/include/asm/xenomai/arith_32.h: In function ‘__rthal_x86_nodiv_ullimd’:
arch/x86/include/asm/xenomai/arith_32.h:154:2: error: can't find a register in class ‘DIREG’ while reloading ‘asm’
arch/x86/include/asm/xenomai/arith_32.h:154:2: error: ‘asm’ operand has impossible constraints


Forcing compilation with an optimization level other than -Os seems to work.
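
For reference, the early-clobber marking discussed above is the `&` constraint
modifier in GCC extended asm. It tells the compiler that an output is written
before all inputs have been consumed, so the output must not share a register
with any input. The toy function below is only an illustration of that
mechanism on x86 (add_twice is a made-up name, and this is not the constraint
list from arith_32.h); on other architectures it falls back to plain C.

```c
/* Returns a + 2 * b. */
static unsigned add_twice(unsigned a, unsigned b)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned out;

	/*
	 * "out" is written by the first instruction, while "b" is still
	 * read by the next two. Without the '&', the compiler would be
	 * free to put "out" and "b" in the same register.
	 */
	__asm__("mov %1, %0\n\t"
		"add %2, %0\n\t"
		"add %2, %0"
		: "=&r"(out)
		: "r"(a), "r"(b)
		: "cc");
	return out;
#else
	return a + 2 * b;
#endif
}
```

The compile error above is the flip side of this: once an output pinned to a
specific register is marked early clobber, the compiler has fewer registers
left for the inputs and may fail to find one.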

 But us I said, in the working version, the code seems to be inlined
 everywhere. Should I send the two object modules as well (probably as a
 private message?).

 The code should work the same whatever gcc decides regarding inlining.
 Whether we like gcc decision is a different issue.
Agreed

 Note that there is an
 option to get gcc to go back to the old behaviour (inlining as the
 source command).
What option is that?

/Anders


Re: [Xenomai-core] Problem with gcc-4.5.1

2010-12-07 Thread Anders Blomdell

On 12/07/2010 03:14 PM, Anders Blomdell wrote:


On 12/07/2010 01:09 PM, Gilles Chanteperdrix wrote:
  Anders Blomdell wrote:
  On 12/07/2010 12:51 PM, Gilles Chanteperdrix wrote:
  Anders Blomdell wrote:
  When compiling Xenomai on Fedora-14 with gcc-4.5.1 [version 4.5.1
  20100924 (Red Hat 4.5.1-4)], the loading of xeno_nucleus fails
with the
  attached kernel OOPS, a notable difference between the 4.5.1 compiled
  version and a working one built with gcc-4.4.4 on the same system
with
  the same configuration, sis tthat __rthal_x86_nodiv_ullimd is not
  inlined, is this anybody has seen before?
  No, that is new, we need to see the disassembly of
__rthal_x86_nodiv_ullimd
 
  objdump -S:
 
  static inline __attribute__((const)) unsigned long long
  __rthal_x86_nodiv_ullimd(const unsigned long long op,
  const unsigned long long frac,
  unsigned integ)
  {
  e7a8: 55 push %ebp
  e7a9: 89 e5 mov %esp,%ebp
  e7ab: 57 push %edi
  e7ac: 56 push %esi
  e7ad: 53 push %ebx
  e7ae: 83 ec 10 sub $0x10,%esp
  e7b1: 8d 7d 08 lea 0x8(%ebp),%edi
  e7b4: e8 fc ff ff ff call e7b5__rthal_x86_nodiv_ullimd+0xd
  e7b9: 8b 1f mov (%edi),%ebx
  e7bb: 8b 4f 04 mov 0x4(%edi),%ecx
  register unsigned rm __asm__(esi);
  register unsigned rh __asm__(edi);
  unsigned fracl, frach, opl, oph;
  register unsigned long long t;
 
  __rthal_u64tou32(op, oph, opl);
  e7be: 89 45 e8 mov %eax,-0x18(%ebp)
  __rthal_u64tou32(frac, frach, fracl);
  e7c1: 89 5d f0 mov %ebx,-0x10(%ebp)
  register unsigned rm __asm__(esi);
  register unsigned rh __asm__(edi);
  unsigned fracl, frach, opl, oph;
  register unsigned long long t;
 
  __rthal_u64tou32(op, oph, opl);
  e7c4: 89 55 e4 mov %edx,-0x1c(%ebp)
  __rthal_u64tou32(frac, frach, fracl);
  e7c7: 89 4d ec mov %ecx,-0x14(%ebp)
 
  __asm__ (mov %[oph], %%eax\n\t
  e7ca: 8b 45 e4 mov -0x1c(%ebp),%eax
  e7cd: f7 65 ec mull -0x14(%ebp)
  e7d0: 89 c6 mov %eax,%esi
  e7d2: 89 d7 mov %edx,%edi
  e7d4: 8b 45 e8 mov -0x18(%ebp),%eax
  e7d7: f7 65 f0 mull -0x10(%ebp)
  e7da: 89 d1 mov %edx,%ecx
  e7dc: d1 e0 shl %eax
  e7de: 83 d1 00 adc $0x0,%ecx
  e7e1: 83 d6 00 adc $0x0,%esi
  e7e4: 83 d7 00 adc $0x0,%edi
  e7e7: 8b 45 e4 mov -0x1c(%ebp),%eax
  e7ea: f7 65 f0 mull -0x10(%ebp)
  e7ed: 01 c1 add %eax,%ecx
  e7ef: 11 d6 adc %edx,%esi
  e7f1: 83 d7 00 adc $0x0,%edi
  e7f4: 8b 45 e8 mov -0x18(%ebp),%eax
  e7f7: f7 65 ec mull -0x14(%ebp)
  e7fa: 01 c1 add %eax,%ecx
  e7fc: 11 d6 adc %edx,%esi
  e7fe: 83 d7 00 adc $0x0,%edi
  e801: 8b 45 e8 mov -0x18(%ebp),%eax
  e804: f7 67 08 mull 0x8(%edi)
 
  Problem is here: edi is used by gcc as if it contained an address
  whereas it is used by the assembly for the computation. Should be marked
  early clobber. So,
 
  in include/asm-x86/arith_32.h, replace:
 
  : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t)
 
  with:
 
  : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t)
 
 

No cigar (:-()

arch/x86/include/asm/xenomai/arith_32.h: In function ‘__rthal_x86_nodiv_ullimd’:
arch/x86/include/asm/xenomai/arith_32.h:154:2: error: can't find a register in class ‘DIREG’ while reloading ‘asm’
arch/x86/include/asm/xenomai/arith_32.h:154:2: error: ‘asm’ operand has impossible constraints

Forcing compilation with an optimization level other than -Os seems to work.


A patch that makes the code compile and generates modules that load is attached.


  But us I said, in the working version, the code seems to be inlined
  everywhere. Should I send the two object modules as well (probably as a
  private message?).
 
  The code should work the same whatever gcc decides regarding inlining.
  Whether we like gcc decision is a different issue.
Agreed

  Note that there is an
  option to get gcc to go back to the old behaviour (inlining as the
  source command).
What option is that?

/Anders





--- a/include/asm-x86/arith_32.h	2010-05-18 20:31:15.0 +0200
+++ b/include/asm-x86/arith_32.h	2010-12-07 13:22:32.0 +0100
@@ -179,8 +179,8 @@
 		 mov %[oph], %%edx\n\t
 		 imul %[integ], %%edx\n\t
 		 add %[rh], %%edx\n\t
-		 : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t)
+		 : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t)
 		 : [opl]m(opl), [oph]m(oph),
 		   [fracl]m(fracl), [frach]m(frach), [integ]m(integ)
 		 : cc);

--- a/ksrc/nucleus/Makefile	2010-05-18 20:31:16.0 +0200
+++ b/ksrc/nucleus/Makefile	2010-12-07 16:09:46.0 +0100
@@ -21,7 +21,7 @@
 # exist on initcalls defined by other object files.
 xeno_nucleus-y += module.o
 
-EXTRA_CFLAGS += -D__IN_XENOMAI__ -Iinclude/xenomai
+EXTRA_CFLAGS += -D__IN_XENOMAI__ -Iinclude/xenomai -O3
 
 else
 


Re: [Xenomai-core] Problem with gcc-4.5.1

2010-12-07 Thread Anders Blomdell
On 2010-12-07 21.21, Gilles Chanteperdrix wrote:
 Anders Blomdell wrote:
 On 12/07/2010 01:09 PM, Gilles Chanteperdrix wrote:
   Anders Blomdell wrote:
   On 12/07/2010 12:51 PM, Gilles Chanteperdrix wrote:
   Anders Blomdell wrote:
   When compiling Xenomai on Fedora-14 with gcc-4.5.1 [version 4.5.1
   20100924 (Red Hat 4.5.1-4)], the loading of xeno_nucleus fails 
 with the
   attached kernel OOPS, a notable difference between the 4.5.1 compiled
   version and a working one built with gcc-4.4.4 on the same system with
   the same configuration, sis tthat __rthal_x86_nodiv_ullimd is not
   inlined, is this anybody has seen before?
   No, that is new, we need to see the disassembly of 
 __rthal_x86_nodiv_ullimd
  
   objdump -S:
  
   static inline __attribute__((const)) unsigned long long
   __rthal_x86_nodiv_ullimd(const unsigned long long op,
 const unsigned long long frac,
 unsigned integ)
   {
 e7a8:  55  push   %ebp
 e7a9:  89 e5   mov%esp,%ebp
 e7ab:  57  push   %edi
 e7ac:  56  push   %esi
 e7ad:  53  push   %ebx
 e7ae:  83 ec 10sub$0x10,%esp
 e7b1:  8d 7d 08lea0x8(%ebp),%edi
 e7b4:  e8 fc ff ff ff  call 
 e7b5__rthal_x86_nodiv_ullimd+0xd
 e7b9:  8b 1f   mov(%edi),%ebx
 e7bb:  8b 4f 04mov0x4(%edi),%ecx
register unsigned rm __asm__(esi);
register unsigned rh __asm__(edi);
unsigned fracl, frach, opl, oph;
register unsigned long long t;
  
__rthal_u64tou32(op, oph, opl);
 e7be:  89 45 e8mov%eax,-0x18(%ebp)
__rthal_u64tou32(frac, frach, fracl);
 e7c1:  89 5d f0mov%ebx,-0x10(%ebp)
register unsigned rm __asm__(esi);
register unsigned rh __asm__(edi);
unsigned fracl, frach, opl, oph;
register unsigned long long t;
  
__rthal_u64tou32(op, oph, opl);
 e7c4:  89 55 e4mov%edx,-0x1c(%ebp)
__rthal_u64tou32(frac, frach, fracl);
 e7c7:  89 4d ecmov%ecx,-0x14(%ebp)
  
__asm__ (mov %[oph], %%eax\n\t
 e7ca:  8b 45 e4mov-0x1c(%ebp),%eax
 e7cd:  f7 65 ecmull   -0x14(%ebp)
 e7d0:  89 c6   mov%eax,%esi
 e7d2:  89 d7   mov%edx,%edi
 e7d4:  8b 45 e8mov-0x18(%ebp),%eax
 e7d7:  f7 65 f0mull   -0x10(%ebp)
 e7da:  89 d1   mov%edx,%ecx
 e7dc:  d1 e0   shl%eax
 e7de:  83 d1 00adc$0x0,%ecx
 e7e1:  83 d6 00adc$0x0,%esi
 e7e4:  83 d7 00adc$0x0,%edi
 e7e7:  8b 45 e4mov-0x1c(%ebp),%eax
 e7ea:  f7 65 f0mull   -0x10(%ebp)
 e7ed:  01 c1   add%eax,%ecx
 e7ef:  11 d6   adc%edx,%esi
 e7f1:  83 d7 00adc$0x0,%edi
 e7f4:  8b 45 e8mov-0x18(%ebp),%eax
 e7f7:  f7 65 ecmull   -0x14(%ebp)
 e7fa:  01 c1   add%eax,%ecx
 e7fc:  11 d6   adc%edx,%esi
 e7fe:  83 d7 00adc$0x0,%edi
 e801:  8b 45 e8mov-0x18(%ebp),%eax
 e804:  f7 67 08mull   0x8(%edi)
  
   Problem is here: edi is used by gcc as if it contained an address
   whereas it is used by the assembly for the computation. Should be marked
   early clobber. So,
  
   in include/asm-x86/arith_32.h, replace:
  
   : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t)
  
   with:
  
   : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t)
  
  

 No cigar (:-()
 
 Ok. Maybe we can try something less radical, such as:
 
 : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t)
 
 This is incorrect, but we can hope for the best...
As previously said, changing the optimization from -Os to anything else for
xeno_nucleus (see patch in mail dated 'Tue, 07 Dec 2010 17:20:37 +0100') solved
that issue (incorrect code + hope for the best -> spurious disasters). I would
rather have compile-time errors than runtime errors.

/Anders



Re: [Xenomai-core] Potential problem with rt_eepro100

2010-11-06 Thread Anders Blomdell

Gilles Chanteperdrix wrote:

Anders Blomdell wrote:

Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

Am 05.11.2010 00:24, Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

Am 04.11.2010 23:06, Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

At first sight, here you are more breaking things than cleaning them.
Still, it has the SMP record for my test program, still runs with ftrace 
on (after 2 hours, where it previously failed after maximum 23 minutes).

My version was indeed still buggy, I'm reworking it ATM.

If I get the gist of Jan's changes, they are (using the IPI to transfer 
one bit of information: your cpu needs to reschedule):


xnsched_set_resched:
-  setbits((__sched__)->status, XNRESCHED);

xnpod_schedule_handler:
+   xnsched_set_resched(sched);
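
To make the idea concrete, here is a small user-space model of "the IPI carries
one bit". It is a sketch under my own assumptions, not nucleus code: the array
of per-CPU structures, the function names and the direct call standing in for
the IPI are all invented for illustration. The point is that a CPU never writes
another CPU's status word; it only records which CPUs to kick, and the IPI
handler on the target sets the target's own bit.

```c
#include <stdint.h>

#define NCPUS     4
#define XNRESCHED 0x1u

struct cpu_sched {
	unsigned status;	/* only ever written by the owning CPU */
	uint32_t resched;	/* mask of remote CPUs that need an IPI */
};

static struct cpu_sched scheds[NCPUS];

/* Called on CPU "self" when CPU "target" needs to reschedule. */
static void set_resched(int self, int target)
{
	scheds[self].status |= XNRESCHED;
	if (self != target)
		scheds[self].resched |= UINT32_C(1) << target;
}

/* IPI handler, runs on "cpu": the IPI itself means "you must reschedule". */
static void schedule_handler(int cpu)
{
	scheds[cpu].status |= XNRESCHED;
}

/* Called on CPU "self" before rescheduling; kicks the remote CPUs. */
static int test_resched(int self)
{
	int resched = (scheds[self].status & XNRESCHED) != 0;

	for (int cpu = 0; cpu < NCPUS; cpu++)
		if (scheds[self].resched & (UINT32_C(1) << cpu))
			schedule_handler(cpu);	/* stands in for sending the IPI */
	scheds[self].resched = 0;
	scheds[self].status &= ~XNRESCHED;
	return resched;
}
```

In this model, after set_resched(0, 2) the status word of CPU 2 is untouched
until CPU 0 runs test_resched(0), which is the moment the IPI would be sent.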

If you (we?) decide to keep the debug checks, under what circumstances 
would the current check trigger (in layman's language, that I'll be able 
to understand)?

That's actually what /me is wondering as well. I do not see yet how you
can reliably detect a missed reschedule (that was the purpose
of the debug check) given the racy nature between signaling resched and
processing the resched hints.

The purpose of the debugging change is to detect a change of the
scheduler state which was not followed by setting the XNRESCHED bit.

But that is nucleus business, nothing skins can screw up (as long as
they do not misuse APIs).

Yes, but it happens that we modify the nucleus from time to time.


Getting it to work is relatively simple: we add a "scheduler change set
remotely" bit to the sched structure which is NOT in the status bits, and set
this bit when changing a remote sched (under nklock). In the debug check
code, if the scheduler state changed and the XNRESCHED bit is not set,
only consider this a bug if this new bit is not set. All this is
compiled out if debugging is not enabled.
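
The check described above can be modelled in a few lines of user-space C. This
is only a sketch of the logic as I understand it: the structure, the function
names and the state_changed argument are hypothetical, and the real code would
run under nklock and be compiled out without debugging.

```c
#include <stdbool.h>

#define XNRESCHED 0x1u

struct sched_sketch {
	unsigned status;			/* local status bits */
	bool debug_resched_from_remote;		/* set by other CPUs only */
};

/* A remote CPU changed this scheduler's state (assumed: lock held). */
static void mark_remote_resched(struct sched_sketch *target)
{
	target->debug_resched_from_remote = true;
}

/*
 * Debug check on the local CPU: a state change without XNRESCHED is
 * only reported as a bug if no remote CPU announced the change.
 * The remote flag is consumed by the check.
 */
static bool missed_resched_is_bug(struct sched_sketch *s, bool state_changed)
{
	bool remote = s->debug_resched_from_remote;

	s->debug_resched_from_remote = false;
	return state_changed && !(s->status & XNRESCHED) && !remote;
}
```

The extra flag is what filters out the false positive: the window between a
remote change and the arrival of the IPI no longer looks like a missed
reschedule.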

I still see no benefit in this check. Where do you want to place the bit
set? Aren't those just the same locations where
xnsched_set_[self_]resched already is today?

Well no, that would be another bit in the sched structure which would
allow us to manipulate the status bits from the local cpu. That
supplementary bit would only be changed from a distant CPU, and serve to
detect the race which causes the false positive. The resched bits are
set on the local cpu to get xnpod_schedule to trigger a rescheduling on
the distant cpu. That bit would be set on the remote cpu's sched. Only
when debugging is enabled.


But maybe you can provide some motivating bug scenarios, real ones of
the past or realistic ones of the future.

Of course. The bug is anything which changes the scheduler state but
does not set the XNRESCHED bit. This happened when we started the SMP
port. New scheduling policies would be good candidates for a revival of
this bug.


You don't gain any worthwhile check if you cannot make the
instrumentation required for a stable detection simpler than the proper
problem solution itself. And this is what I'm still skeptical of.
The solution is simple, but finding the problem without the 
instrumentation is way harder than with the instrumentation, so the 
instrumentation is worth something.


Reproducing the false positive is surprisingly easy with a simple
dual-cpu semaphore ping-pong test. So, here is the (tested) patch, 
using a ridiculously long variable name to illustrate what I was 
thinking about:


diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index cf4..454b8e8 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -108,6 +108,9 @@ typedef struct xnsched {
        struct xnthread *gktarget;
 #endif
 
+#ifdef CONFIG_XENO_OPT_DEBUG_NUCLEUS
+       int debug_resched_from_remote;
+#endif
 } xnsched_t;
 
 union xnsched_policy_param;
@@ -185,6 +188,8 @@ static inline int xnsched_resched_p(struct xnsched *sched)
   xnsched_t *current_sched = xnpod_current_sched();                    \
   __setbits(current_sched->status, XNRESCHED);                         \
   if (current_sched != (__sched__)){                                   \
+         if (XENO_DEBUG(NUCLEUS))                                      \
+                 __sched__->debug_resched_from_remote = 1;             \
       xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched);  \
   }                                                                    \
 } while (0)
diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 4cb707a..50b0f49 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -2177,6 +2177,10 @@ static inline int __xnpod_test_resched(struct xnsched *sched)
                xnarch_cpus_clear(sched->resched);
        }
 #endif
+       if (XENO_DEBUG(NUCLEUS) && sched->debug_resched_from_remote) {
+               sched->debug_resched_from_remote = 0;
+               resched = 1;
+       }
        clrbits(sched->status, XNRESCHED);
        return resched;
 }


I am still uncertain.
Will only work if all is done under nklock, otherwise two

Re: [Xenomai-core] Potential problem with rt_eepro100

2010-11-05 Thread Anders Blomdell

Gilles Chanteperdrix wrote:

Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

Am 05.11.2010 00:25, Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

Am 04.11.2010 23:08, Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

rework. Safer for now is likely to revert 56ff4329ff, keeping nucleus
debugging off.

That is not enough.

It is, I've reviewed the code today.

The fallouts I am talking about are:
47dac49c71e89b684203e854d1b0172ecacbc555

Not related.


38f2ca83a8e63cc94eaa911ff1c0940c884b5078

An optimization.


5e7cfa5c25672e4478a721eadbd6f6c5b4f88a2f

That fall out of that commit is fixed in my series.


This commit was followed by several others to fix
the fix. You know how things are, someone proposes a fix, which fixes
things for him, but it breaks in the other people configurations (one of
the fallouts was a complete revamp of include/asm-arm/atomic.h for
instance).


I've pushed a series that reverts that commit, then fixes and cleans up
on top of it. Just pushed if you want to take a look. We can find some
alternative debugging mechanism independently (though I'm curious to see
it - it still makes no sense to me).

Since the fix is simply a modification to what we have currently. I
would prefer if we did not remove it. In fact, I think it would be
simpler if we started from what we currently have than reverting past
patches.

Look at the series, it goes step by step to an IMHO clean state. We can
pull out the debugging check removal, though, if you prefer to work on
top of the existing code.
From my point of view, Anders is looking for something that works, so, 
following the rule that a minimal set of changes minimizes the chance
of introducing new bugs while cleaning, I would go for the minimal set 
of changes, such as:


The tested one (on SMP, and UP with and without unlocked ctx switch):

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index df56417..cf4 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -165,28 +165,27 @@ struct xnsched_class {
 #endif /* CONFIG_SMP */
 
 /* Test all resched flags from the given scheduler mask. */
-static inline int xnsched_resched_p(struct xnsched *sched)
+static inline int xnsched_remote_resched_p(struct xnsched *sched)
 {
-       return testbits(sched->status, XNRESCHED);
+       return !xnarch_cpus_empty(sched->resched);
 }
 
-static inline int xnsched_self_resched_p(struct xnsched *sched)
+static inline int xnsched_resched_p(struct xnsched *sched)
 {
        return testbits(sched->status, XNRESCHED);
 }
 
 /* Set self resched flag for the given scheduler. */
 #define xnsched_set_self_resched(__sched__) do {               \
-  setbits((__sched__)->status, XNRESCHED);                     \
+  __setbits((__sched__)->status, XNRESCHED);                   \
 } while (0)
 
 /* Set specific resched flag into the local scheduler mask. */
 #define xnsched_set_resched(__sched__) do {                            \
   xnsched_t *current_sched = xnpod_current_sched();                    \
-  setbits(current_sched->status, XNRESCHED);                           \
+  __setbits(current_sched->status, XNRESCHED);                         \
   if (current_sched != (__sched__)){                                   \
       xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched);  \
-      setbits((__sched__)->status, XNRESCHED);                         \
   }                                                                    \
 } while (0)
 
diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 862838c..4cb707a 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -276,18 +276,16 @@ EXPORT_SYMBOL_GPL(xnpod_fatal_helper);
 
 void xnpod_schedule_handler(void) /* Called with hw interrupts off. */
 {
-       xnsched_t *sched;
+       xnsched_t *sched = xnpod_current_sched();
 
        trace_mark(xn_nucleus, sched_remote, MARK_NOARGS);
 #if defined(CONFIG_SMP) && defined(CONFIG_XENO_OPT_PRIOCPL)
-       sched = xnpod_current_sched();
        if (testbits(sched->status, XNRPICK)) {
                clrbits(sched->status, XNRPICK);
                xnshadow_rpi_check();
        }
-#else
-       (void)sched;
 #endif /* CONFIG_SMP && CONFIG_XENO_OPT_PRIOCPL */
+       xnsched_set_self_resched(sched);
        xnpod_schedule();
 }
 
@@ -2174,7 +2172,7 @@ static inline int __xnpod_test_resched(struct xnsched *sched)
        int resched = testbits(sched->status, XNRESCHED);
 #ifdef CONFIG_SMP
        /* Send resched IPI to remote CPU(s). */
-       if (unlikely(xnsched_resched_p(sched))) {
+       if (unlikely(xnsched_remote_resched_p(sched))) {
                xnarch_send_ipi(sched->resched);
                xnarch_cpus_clear(sched->resched);
        }
diff --git a/ksrc/nucleus/timer.c b/ksrc/nucleus/timer.c
index 1fe3331..a0ac627 100644
--- a/ksrc/nucleus/timer.c
+++ b/ksrc/nucleus/timer.c
@@ -97,7 +97,7 @@ void xntimer_next_local_shot(xnsched_t *sched)
        __clrbits(sched->status, XNHDEFER);
        timer = aplink2timer(h);
  

Re: [Xenomai-core] Potential problem with rt_eepro100

2010-11-05 Thread Anders Blomdell

Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

Am 05.11.2010 00:24, Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

Am 04.11.2010 23:06, Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

At first sight, here you are more breaking things than cleaning them.
Still, it has the SMP record for my test program, still runs with ftrace 
on (after 2 hours, where it previously failed after maximum 23 minutes).

My version was indeed still buggy, I'm reworking it ATM.

If I get the gist of Jan's changes, they are (using the IPI to transfer 
one bit of information: your cpu needs to reschedule):


xnsched_set_resched:
-  setbits((__sched__)->status, XNRESCHED);

xnpod_schedule_handler:
+   xnsched_set_resched(sched);

If you (we?) decide to keep the debug checks, under what circumstances 
would the current check trigger (in layman's language, that I'll be able 
to understand)?

That's actually what /me is wondering as well. I do not see yet how you
can reliably detect a missed reschedule (that was the purpose
of the debug check) given the racy nature between signaling resched and
processing the resched hints.

The purpose of the debugging change is to detect a change of the
scheduler state which was not followed by setting the XNRESCHED bit.

But that is nucleus business, nothing skins can screw up (as long as
they do not misuse APIs).

Yes, but it happens that we modify the nucleus from time to time.


Getting it to work is relatively simple: we add a "scheduler change set
remotely" bit to the sched structure which is NOT in the status bits, and
set this bit when changing a remote sched (under nklock). In the debug check
code, if the scheduler state changed, and the XNRESCHED bit is not set,
only consider this a bug if this new bit is not set. All this is
compiled out if the debug is not enabled.

I still see no benefit in this check. Where do you want to place the bit
set? Aren't those just the same locations where
xnsched_set_[self_]resched already is today?

Well no, that would be another bit in the sched structure which would
allow us to manipulate the status bits from the local cpu. That
supplementary bit would only be changed from a distant CPU, and serve to
detect the race which causes the false positive. The resched bits are
set on the local cpu to get xnpod_schedule to trigger a rescheduling on
the distant cpu. That bit would be set on the remote cpu's sched. Only
when debugging is enabled.


But maybe you can provide some motivating bug scenarios, real ones of
the past or realistic ones of the future.

Of course. The bug is anything which changes the scheduler state but
does not set the XNRESCHED bit. This happened when we started the SMP
port. New scheduling policies would be good candidates for a revival of
this bug.


You don't gain any worthwhile check if you cannot make the
instrumentation required for a stable detection simpler than the proper
problem solution itself. And this is what I'm still skeptical of.


The solution is simple, but finding the problem without the 
instrumentation is way harder than with the instrumentation, so the 
instrumentation is worth something.


Reproducing the false positive is surprisingly easy with a simple
dual-cpu semaphore ping-pong test. So, here is the (tested) patch, 
using a ridiculously long variable name to illustrate what I was 
thinking about:


diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index cf4..454b8e8 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -108,6 +108,9 @@ typedef struct xnsched {
 	struct xnthread *gktarget;
 #endif
 
+#ifdef CONFIG_XENO_OPT_DEBUG_NUCLEUS
+	int debug_resched_from_remote;
+#endif
 } xnsched_t;
 
 union xnsched_policy_param;
@@ -185,6 +188,8 @@ static inline int xnsched_resched_p(struct xnsched *sched)
   xnsched_t *current_sched = xnpod_current_sched();		\
   __setbits(current_sched->status, XNRESCHED);			\
   if (current_sched != (__sched__)){				\
+      if (XENO_DEBUG(NUCLEUS))					\
+	      __sched__->debug_resched_from_remote = 1;		\
       xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched);	\
   }								\
 } while (0)
diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 4cb707a..50b0f49 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -2177,6 +2177,10 @@ static inline int __xnpod_test_resched(struct xnsched *sched)
 		xnarch_cpus_clear(sched->resched);
 	}
 #endif
+	if (XENO_DEBUG(NUCLEUS) && sched->debug_resched_from_remote) {
+		sched->debug_resched_from_remote = 0;
+		resched = 1;
+	}
 	clrbits(sched->status, XNRESCHED);
 	return resched;
 }


I am still uncertain.
Will only work if all is done under nklock, otherwise two almost 
simultaneous xnsched_resched_p from different 

Re: [Xenomai-core] Potential problem with rt_eepro100

2010-11-04 Thread Anders Blomdell

Jan Kiszka wrote:

Am 04.11.2010 01:13, Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

Am 04.11.2010 00:56, Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

Am 04.11.2010 00:44, Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

Am 04.11.2010 00:18, Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

Am 04.11.2010 00:11, Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

Am 03.11.2010 23:11, Jan Kiszka wrote:

Am 03.11.2010 23:03, Jan Kiszka wrote:

But we do not always use atomic ops for manipulating status bits (but
we do in other cases where there is no need - different story). This may
fix the race:

Err, nonsense. As we manipulate xnsched::status also outside of nklock
protection, we must _always_ use atomic ops.

This screams for a cleanup: local-only bits like XNHTICK or XNINIRQ
should be pushed in a separate status word that can then be safely
modified non-atomically.

Second try to fix and clean up the sched status bits. Anders, please
test.

Jan

diff --git a/include/nucleus/pod.h b/include/nucleus/pod.h
index 01ff0a7..5987a1f 100644
--- a/include/nucleus/pod.h
+++ b/include/nucleus/pod.h
@@ -277,12 +277,10 @@ static inline void xnpod_schedule(void)
 	 * context is active, or if we are caught in the middle of a
 	 * unlocked context switch.
 	 */
-#if XENO_DEBUG(NUCLEUS)
 	if (testbits(sched->status, XNKCOUT|XNINIRQ|XNSWLOCK))
 		return;
-#else /* !XENO_DEBUG(NUCLEUS) */
-	if (testbits(sched->status,
-		     XNKCOUT|XNINIRQ|XNSWLOCK|XNRESCHED) != XNRESCHED)
+#if !XENO_DEBUG(NUCLEUS)
+	if (!sched->resched)
 		return;
 #endif /* !XENO_DEBUG(NUCLEUS) */

Having only one test was really nice here, maybe we simply need a
barrier before reading the status?
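
To make the "only one test" remark concrete: the removed non-debug branch folds "XNRESCHED is set" and "none of the blocking bits is set" into a single masked compare. The sketch below illustrates that idiom in isolation. It assumes testbits() is a plain mask-and; the DEMO_* bit values and the function name are made up for illustration and are not the nucleus definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the real XN* flags are defined elsewhere. */
#define DEMO_KCOUT   0x1u
#define DEMO_INIRQ   0x2u
#define DEMO_SWLOCK  0x4u
#define DEMO_RESCHED 0x8u

/*
 * One compare answers two questions at once:
 *  - is the resched bit set, and
 *  - are all the "do not schedule now" bits clear?
 * Any other combination makes the masked value differ from DEMO_RESCHED.
 */
static bool demo_should_schedule(unsigned status)
{
	const unsigned mask =
		DEMO_KCOUT | DEMO_INIRQ | DEMO_SWLOCK | DEMO_RESCHED;

	return (status & mask) == DEMO_RESCHED;
}
```

Splitting the check into two tests, as the patch above does, gives up this single-compare fast path, which is the cost being pointed out here.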


I agree - but the alternative is letting all modifications of
xnsched::status use atomic bitops (that's required when folding all bits
into a single word). And that should be much more costly, specifically
on SMP.

What about issuing a barrier before testing the status?


The problem is not about reading but writing the status concurrently,
thus it's not about the code you see above.

The bits are modified under nklock, which implies a barrier when
unlocked. Furthermore, an IPI is guaranteed to be received on the remote
CPU after this barrier, so, a barrier should be enough to see the
modifications which have been made remotely.

Check nucleus/intr.c for tons of unprotected status modifications.

Ok. Then maybe, we should reconsider the original decision to start
fiddling with the XNRESCHED bit remotely.

...which removed complexity and fixed a race? Let's better review the
checks done in xnpod_schedule vs. its callers, I bet there is more to
save (IOW: remove the need to test for sched->resched).

Not that much complexity... and the race was a false positive in debug
code, no big deal. At least it worked, and it has done so for a long
time. No atomic needed, no barrier, only one test in xnpod_schedule. And
a nice invariant: sched->status is always accessed on the local cpu.
What else?


Take a step back and look at the root cause for this issue again. Unlocked

if need-resched
__xnpod_schedule

is inherently racy and will always be (not only for the remote
reschedule case BTW). So we either have to accept this and remove the
debugging check from the scheduler or push the check back to
__xnpod_schedule where it once came from. When this is cleaned up, we
can look into the remote resched protocol again.
Probably being daft here; why not stop fiddling with remote CPU status 
bits and always do a reschedule on IPI irq's?
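
The idea in this question (leave the remote CPU's status word alone and let the IPI itself carry the "please reschedule" information) can be sketched in plain C. This is a minimal single-threaded model and not Xenomai code: every demo_* name, the fixed CPU count and the bit value are assumptions made for illustration.

```c
#include <assert.h>

#define DEMO_NR_CPUS 4
#define DEMO_RESCHED 0x1u

struct demo_sched {
	unsigned status;   /* only ever written by the owning CPU */
	unsigned resched;  /* mask of remote CPUs that need an IPI */
};

static struct demo_sched demo_scheds[DEMO_NR_CPUS];

/* Runs on CPU 'self': request a reschedule of CPU 'target'. */
static void demo_set_resched(int self, int target)
{
	demo_scheds[self].status |= DEMO_RESCHED;
	if (self != target)
		demo_scheds[self].resched |= 1u << target;
	/* Note: demo_scheds[target].status is not touched here. */
}

/* Runs on CPU 'self': fetch and clear the mask of CPUs to kick. */
static unsigned demo_take_ipi_mask(int self)
{
	unsigned mask = demo_scheds[self].resched;

	demo_scheds[self].resched = 0;
	return mask;
}

/* Runs on the CPU that received the IPI: always mark itself. */
static void demo_schedule_handler(int self)
{
	demo_scheds[self].status |= DEMO_RESCHED;
}
```

With this shape each status word has exactly one writer, which is the invariant argued for earlier in the thread; the price is that every resched IPI leads to a scheduler pass on the target.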


/Anders


___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Potential problem with rt_eepro100

2010-11-04 Thread Anders Blomdell

Jan Kiszka wrote:

Am 04.11.2010 10:26, Jan Kiszka wrote:

Am 04.11.2010 10:16, Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

Take a step back and look at the root cause for this issue again. Unlocked

if need-resched
__xnpod_schedule

is inherently racy and will always be (not only for the remote
reschedule case BTW).

Ok, let us examine what may happen with this code if we only set the
XNRESCHED bit on the local cpu. First, other bits than XNRESCHED do not
matter, because they can not change under our feet. So, we have two
cases for this race:
1- we see the XNRESCHED bit, but it has been cleared once nklock is
locked in __xnpod_schedule.
2- we do not see the XNRESCHED bit, but it gets set right after we test it.

1 is not a problem.

Yes, as long as we remove the debug check from the scheduler code (or
fix it somehow). The scheduling code already catches this race.


2 is not a problem, because anything which sets the XNRESCHED bit (it may
only be an interrupt in fact) will cause xnpod_schedule to be called
right after that.

So no, no race here provided that we only set the XNRESCHED bit on the
local cpu.

So we either have to accept this and remove the
debugging check from the scheduler or push the check back to
__xnpod_schedule where it once came from. When this is cleaned up, we
can look into the remote resched protocol again.

The problem of the debug check is that it checks whether the scheduler
state is modified without the XNRESCHED bit being set. And this is the
problem, because yes, in that case, we have a race: the scheduler state
may be modified before the XNRESCHED bit is set by an IPI.

If we want to fix the debug check, we have to have a special bit, not in
the sched->status flag, only for the purpose of debugging. Or remove the
debug check.

Exactly my point. Is there any benefit in keeping the debug check? The
code to make it work may end up as complex as the logic it verifies,
at least that's my current feeling.



This would be the radical approach of removing the check (and cleaning
up some bits). If it's acceptable, I would split it up properly.

diff --git a/include/nucleus/pod.h b/include/nucleus/pod.h
index 01ff0a7..71f8311 100644
--- a/include/nucleus/pod.h
+++ b/include/nucleus/pod.h
@@ -277,14 +277,9 @@ static inline void xnpod_schedule(void)
 	 * context is active, or if we are caught in the middle of a
 	 * unlocked context switch.
 	 */
-#if XENO_DEBUG(NUCLEUS)
-	if (testbits(sched->status, XNKCOUT|XNINIRQ|XNSWLOCK))
-		return;
-#else /* !XENO_DEBUG(NUCLEUS) */
 	if (testbits(sched->status,
 		     XNKCOUT|XNINIRQ|XNSWLOCK|XNRESCHED) != XNRESCHED)
 		return;
-#endif /* !XENO_DEBUG(NUCLEUS) */
 
 	__xnpod_schedule(sched);
 }
diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index df56417..c832b91 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -177,17 +177,16 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
 
 /* Set self resched flag for the given scheduler. */
 #define xnsched_set_self_resched(__sched__) do {		\
-  setbits((__sched__)->status, XNRESCHED);			\
+	__setbits((__sched__)->status, XNRESCHED);		\
 } while (0)
 
 /* Set specific resched flag into the local scheduler mask. */
 #define xnsched_set_resched(__sched__) do {			\
-  xnsched_t *current_sched = xnpod_current_sched();		\
-  setbits(current_sched->status, XNRESCHED);			\
-  if (current_sched != (__sched__)){				\
-      xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched);	\
-      setbits((__sched__)->status, XNRESCHED);			\
-  }								\
+	xnsched_t *current_sched = xnpod_current_sched();	\
+	__setbits(current_sched->status, XNRESCHED);		\
+	if (current_sched != (__sched__))			\
+		xnarch_cpu_set(xnsched_cpu(__sched__),		\
+			       current_sched->resched);		\
 } while (0)
 
 void xnsched_zombie_hooks(struct xnthread *thread);
diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 9e135f3..87dc136 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -284,10 +284,11 @@ void xnpod_schedule_handler(void) /* Called with hw interrupts off. */
 	trace_xn_nucleus_sched_remote(sched);
 #if defined(CONFIG_SMP) && defined(CONFIG_XENO_OPT_PRIOCPL)
 	if (testbits(sched->status, XNRPICK)) {
-		clrbits(sched->status, XNRPICK);
+		__clrbits(sched->status, XNRPICK);
 		xnshadow_rpi_check();
 	}
 #endif /* CONFIG_SMP && CONFIG_XENO_OPT_PRIOCPL */
+	xnsched_set_resched(sched);
 	xnpod_schedule();
 }
 
@@ -2162,21 +2163,21 

Re: [Xenomai-core] Potential problem with rt_eepro100

2010-11-04 Thread Anders Blomdell

Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

Am 04.11.2010 10:26, Jan Kiszka wrote:

Am 04.11.2010 10:16, Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

Take a step back and look at the root cause for this issue again. Unlocked

if need-resched
__xnpod_schedule

is inherently racy and will always be (not only for the remote
reschedule case BTW).

Ok, let us examine what may happen with this code if we only set the
XNRESCHED bit on the local cpu. First, other bits than XNRESCHED do not
matter, because they can not change under our feet. So, we have two
cases for this race:
1- we see the XNRESCHED bit, but it has been cleared once nklock is
locked in __xnpod_schedule.
2- we do not see the XNRESCHED bit, but it gets set right after we test it.

1 is not a problem.

Yes, as long as we remove the debug check from the scheduler code (or
fix it somehow). The scheduling code already catches this race.


2 is not a problem, because anything which sets the XNRESCHED bit (it may
only be an interrupt in fact) will cause xnpod_schedule to be called
right after that.

So no, no race here provided that we only set the XNRESCHED bit on the
local cpu.

So we either have to accept this and remove the
debugging check from the scheduler or push the check back to
__xnpod_schedule where it once came from. When this is cleaned up, we
can look into the remote resched protocol again.

The problem of the debug check is that it checks whether the scheduler
state is modified without the XNRESCHED bit being set. And this is the
problem, because yes, in that case, we have a race: the scheduler state
may be modified before the XNRESCHED bit is set by an IPI.

If we want to fix the debug check, we have to have a special bit, not in
the sched->status flag, only for the purpose of debugging. Or remove the
debug check.

Exactly my point. Is there any benefit in keeping the debug check? The
code to make it work may end up as complex as the logic it verifies,
at least that's my current feeling.


This would be the radical approach of removing the check (and cleaning
up some bits). If it's acceptable, I would split it up properly.


This debug check saved our asses when debugging SMP issues, and I
suspect it may help debugging skin issues. So, I think we should try and
keep it.


At first sight, here you are more breaking things than cleaning them.
Still, it has the SMP record for my test program, still runs with ftrace 
on (after 2 hours, where it previously failed after maximum 23 minutes).


If I get the gist of Jan's changes, they are (using the IPI to transfer 
one bit of information: your cpu needs to reschedule):


xnsched_set_resched:
-  setbits((__sched__)->status, XNRESCHED);

xnpod_schedule_handler:
+   xnsched_set_resched(sched);

If you (we?) decide to keep the debug checks, under what circumstances 
would the current check trigger (in layman's language, that I'll be able 
to understand)?


/Anders




Re: [Xenomai-core] Potential problem with rt_eepro100

2010-11-04 Thread Anders Blomdell

Jan Kiszka wrote:

Am 04.11.2010 14:18, Anders Blomdell wrote:

Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

Am 04.11.2010 10:26, Jan Kiszka wrote:

Am 04.11.2010 10:16, Gilles Chanteperdrix wrote:

Jan Kiszka wrote:

Take a step back and look at the root cause for this issue again. Unlocked

if need-resched
__xnpod_schedule

is inherently racy and will always be (not only for the remote
reschedule case BTW).

Ok, let us examine what may happen with this code if we only set the
XNRESCHED bit on the local cpu. First, other bits than XNRESCHED do not
matter, because they can not change under our feet. So, we have two
cases for this race:
1- we see the XNRESCHED bit, but it has been cleared once nklock is
locked in __xnpod_schedule.
2- we do not see the XNRESCHED bit, but it gets set right after we test it.

1 is not a problem.

Yes, as long as we remove the debug check from the scheduler code (or
fix it somehow). The scheduling code already catches this race.


2 is not a problem, because anything which sets the XNRESCHED bit (it may
only be an interrupt in fact) will cause xnpod_schedule to be called
right after that.

So no, no race here provided that we only set the XNRESCHED bit on the
local cpu.

So we either have to accept this and remove the
debugging check from the scheduler or push the check back to
__xnpod_schedule where it once came from. When this is cleaned up, we
can look into the remote resched protocol again.

The problem of the debug check is that it checks whether the scheduler
state is modified without the XNRESCHED bit being set. And this is the
problem, because yes, in that case, we have a race: the scheduler state
may be modified before the XNRESCHED bit is set by an IPI.

If we want to fix the debug check, we have to have a special bit, not in
the sched->status flag, only for the purpose of debugging. Or remove the
debug check.

Exactly my point. Is there any benefit in keeping the debug check? The
code to make it work may end up as complex as the logic it verifies,
at least that's my current feeling.


This would be the radical approach of removing the check (and cleaning
up some bits). If it's acceptable, I would split it up properly.

This debug check saved our asses when debugging SMP issues, and I
suspect it may help debugging skin issues. So, I think we should try and
keep it.


At first sight, here you are more breaking things than cleaning them.
Still, it has the SMP record for my test program, still runs with ftrace 
on (after 2 hours, where it previously failed after maximum 23 minutes).


My version was indeed still buggy, I'm reworking it ATM.
Is there any reason why the two changes below would fail? (I need to get
things working real soon now.)


If I get the gist of Jan's changes, they are (using the IPI to transfer 
one bit of information: your cpu needs to reschedule):


xnsched_set_resched:
-  setbits((__sched__)->status, XNRESCHED);

xnpod_schedule_handler:
+   xnsched_set_resched(sched);

If you (we?) decide to keep the debug checks, under what circumstances 
would the current check trigger (in layman's language, that I'll be able 
to understand)?


That's actually what /me is wondering as well. I do not see yet how you
can reliably detect a missed reschedule (that was the purpose
of the debug check) given the racy nature between signaling resched and
processing the resched hints.
The only thing I can think of are atomic set/clear on an independent 
variable.
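
One way to read "atomic set/clear on an independent variable" is a flag kept outside the status word, set by whichever CPU changes a remote scheduler and consumed with an atomic exchange on the owning CPU. The sketch below uses C11 atomics purely for illustration; kernel code would use its own atomic primitives, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-CPU debug state, separate from the status bits. */
struct demo_sched_debug {
	atomic_int changed_from_remote;
};

/* Called by a remote CPU after it changed this scheduler's state. */
static void demo_mark_remote_change(struct demo_sched_debug *target)
{
	atomic_store(&target->changed_from_remote, 1);
}

/*
 * Called by the owning CPU in its debug check. The exchange reads and
 * clears the flag in one indivisible step, so a set that arrives
 * concurrently is either seen now or left in place for the next check.
 */
static bool demo_test_and_clear_remote_change(struct demo_sched_debug *self)
{
	return atomic_exchange(&self->changed_from_remote, 0) != 0;
}
```

Because the flag lives in its own variable, the status word can keep being modified non-atomically by its owner, at the cost of one extra atomic operation per check when debugging is enabled.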


/Anders



Re: [Xenomai-core] Potential problem with rt_eepro100

2010-11-03 Thread Anders Blomdell

Anders Blomdell wrote:

Jan Kiszka wrote:

Am 01.11.2010 17:55, Anders Blomdell wrote:

Jan Kiszka wrote:

Am 28.10.2010 11:34, Anders Blomdell wrote:

Jan Kiszka wrote:

Am 28.10.2010 09:34, Anders Blomdell wrote:

Anders Blomdell wrote:

Anders Blomdell wrote:

Hi,

I'm trying to use rt_eepro100 for sending raw ethernet packets, but I'm
occasionally experiencing weird behaviour.

Versions of things:

  linux-2.6.34.5
  xenomai-2.5.5.2
  rtnet-39f7fcf

The test program runs on two computers with Intel Corporation
82557/8/9/0/1 Ethernet Pro 100 (rev 08) controller, where one computer
acts as a mirror sending back packets received from the ethernet (only
those two computers on the network), and the other sends packets and
measures roundtrip time. Most packets come back in approximately 100
us, but occasionally the reception times out (once in about 10
packets or more), but the packets get immediately received when
reception is retried, which might indicate a race between
rt_dev_recvmsg and interrupt, but I might miss something obvious.

Changing one of the ethernet cards to an Intel Corporation 82541PI
Gigabit Ethernet Controller (rev 05), while keeping everything else
constant, changes behavior somewhat; after receiving a few 10
packets, reception stops entirely (-EAGAIN is returned), while
transmission proceeds as it should (and mirror returns packets).

Any suggestions on what to try?
Since the problem disappears with 'maxcpus=1', I suspect I have an SMP
issue (machine is a Core2 Quad), so I'll move to xenomai-core.
(original message can be found at
http://sourceforge.net/mailarchive/message.php?msg_name=4CC82C8D.3080808%40control.lth.se)

Xenomai-core gurus: which is the correct way to debug SMP issues?
Can I run I-pipe-tracer and expect to be able to save at least 150 us of
traces for all cpus? Any hints/suggestions/insights are welcome...

The i-pipe tracer unfortunately only saves traces for the CPU that
triggered the freeze. To have a full picture, you may want to try my
ftrace port I posted recently for 2.6.35.

2.6.35.7 ?


Exactly.

Finally managed to get the ftrace to work
(one possible bug: had to manually copy
include/xenomai/trace/xn_nucleus.h to
include/xenomai/trace/events/xn_nucleus.h), and it looks like it can be
very useful...

But I don't think it will give much info at the moment, since no
xenomai/ipipe interrupt activity shows up, and adding that is far above
my league :-(


You could use the function tracer, provided you are able to stop the
trace quickly enough on error.


My current theory is that the problem occurs when something like this
takes place:

  CPU-i  CPU-j  CPU-k  CPU-l

rt_dev_sendmsg
xmit_irq
rt_dev_recvmsg  recv_irq


Can't follow. What races here, and what will go wrong then?

That's the good question. Find attached:

1. .config (so you can check for stupid mistakes)
2. console log
3. latest version of test program
4. tail of ftrace dump

These are the xenomai tasks running when the test program is active:

CPU  PID    CLASS  PRI  TIMEOUT   TIMEBASE   STAT   NAME
  0  0      idle    -1  -         master     R      ROOT/0
  1  0      idle    -1  -         master     R      ROOT/1
  2  0      idle    -1  -         master     R      ROOT/2
  3  0      idle    -1  -         master     R      ROOT/3
  0  0      rt      98  -         master     W      rtnet-stack
  0  0      rt       0  -         master     W      rtnet-rtpc
  0  29901  rt      50  -         master            raw_test
  0  29906  rt       0  -         master     X      reporter



The lines of interest from the trace are probably:

[003]  2061.347855: xn_nucleus_thread_resume: thread=f9bf7b00
  thread_name=rtnet-stack mask=2

[003]  2061.347862: xn_nucleus_sched: status=200
[000]  2061.347866: xn_nucleus_sched_remote: status=0

since this is the only place where a packet gets delayed, and the only 
place in the trace where sched_remote reports a status=0.
Since the cpu that has rtnet-stack and hence should be resumed is doing 
heavy I/O at the time of fault; could it be that 
send_ipi/schedule_handler needs barriers to make sure that decisions are 
made on the right status?


/Anders




Re: [Xenomai-core] Potential problem with rt_eepro100

2010-11-03 Thread Anders Blomdell
On 2010-11-03 12.55, Jan Kiszka wrote:
 Am 03.11.2010 12:50, Jan Kiszka wrote:
 Am 03.11.2010 12:44, Anders Blomdell wrote:
 Anders Blomdell wrote:
 Jan Kiszka wrote:
 Am 01.11.2010 17:55, Anders Blomdell wrote:
 Jan Kiszka wrote:
 Am 28.10.2010 11:34, Anders Blomdell wrote:
 Jan Kiszka wrote:
 Am 28.10.2010 09:34, Anders Blomdell wrote:
 Anders Blomdell wrote:
 Anders Blomdell wrote:
 Hi,

 I'm trying to use rt_eepro100 for sending raw ethernet packets, but I'm
 occasionally experiencing weird behaviour.

 Versions of things:

   linux-2.6.34.5
   xenomai-2.5.5.2
   rtnet-39f7fcf

 The test program runs on two computers with Intel Corporation
 82557/8/9/0/1 Ethernet Pro 100 (rev 08) controller, where one computer
 acts as a mirror sending back packets received from the ethernet (only
 those two computers on the network), and the other sends packets and
 measures roundtrip time. Most packets come back in approximately 100
 us, but occasionally the reception times out (once in about 10
 packets or more), but the packets get immediately received when
 reception is retried, which might indicate a race between
 rt_dev_recvmsg and interrupt, but I might miss something obvious.
 Changing one of the ethernet cards to an Intel Corporation 82541PI
 Gigabit Ethernet Controller (rev 05), while keeping everything else
 constant, changes behavior somewhat; after receiving a few 10
 packets, reception stops entirely (-EAGAIN is returned), while
 transmission proceeds as it should (and mirror returns packets).

 Any suggestions on what to try?
 Since the problem disappears with 'maxcpus=1', I suspect I have an SMP
 issue (machine is a Core2 Quad), so I'll move to xenomai-core.
 (original message can be found at
 http://sourceforge.net/mailarchive/message.php?msg_name=4CC82C8D.3080808%40control.lth.se)

 Xenomai-core gurus: which is the correct way to debug SMP issues?
 Can I run I-pipe-tracer and expect to be able to save at least 150 us of
 traces for all cpus? Any hints/suggestions/insights are welcome...
 The i-pipe tracer unfortunately only saves traces for the CPU that
 triggered the freeze. To have a full picture, you may want to try my
 ftrace port I posted recently for 2.6.35.
 2.6.35.7 ?

 Exactly.
 Finally managed to get the ftrace to work
 (one possible bug: had to manually copy
 include/xenomai/trace/xn_nucleus.h to
 include/xenomai/trace/events/xn_nucleus.h), and it looks like it can be
 very useful...

 But I don't think it will give much info at the moment, since no
 xenomai/ipipe interrupt activity shows up, and adding that is far above
 my league :-(

 You could use the function tracer, provided you are able to stop the
 trace quickly enough on error.

 My current theory is that the problem occurs when something like this
 takes place:

   CPU-i  CPU-j  CPU-k  CPU-l

 rt_dev_sendmsg
 xmit_irq
 rt_dev_recvmsg  recv_irq

 Can't follow. What races here, and what will go wrong then?
 That's the good question. Find attached:

 1. .config (so you can check for stupid mistakes)
 2. console log
 3. latest version of test program
 4. tail of ftrace dump

 These are the xenomai tasks running when the test program is active:

 CPU  PID    CLASS  PRI  TIMEOUT   TIMEBASE   STAT   NAME
   0  0      idle    -1  -         master     R      ROOT/0
   1  0      idle    -1  -         master     R      ROOT/1
   2  0      idle    -1  -         master     R      ROOT/2
   3  0      idle    -1  -         master     R      ROOT/3
   0  0      rt      98  -         master     W      rtnet-stack
   0  0      rt       0  -         master     W      rtnet-rtpc
   0  29901  rt      50  -         master            raw_test
   0  29906  rt       0  -         master     X      reporter



 The lines of interest from the trace are probably:

 [003]  2061.347855: xn_nucleus_thread_resume: thread=f9bf7b00   
   thread_name=rtnet-stack mask=2
 [003]  2061.347862: xn_nucleus_sched: status=200
 [000]  2061.347866: xn_nucleus_sched_remote: status=0

 since this is the only place where a packet gets delayed, and the only
 place in the trace where sched_remote reports a status=0.
 Since the cpu that has rtnet-stack and hence should be resumed is doing
 heavy I/O at the time of fault; could it be that
 send_ipi/schedule_handler needs barriers to make sure that decisions are
 made on the right status?

 That was my first idea as well - but we should run all relevant code
 under nklock here. But please correct me if I miss something.
Wouldn't we need a write-barrier before the send_ipi regardless of what locks we
hold, otherwise no guarantees that the memory write reaches the target cpu
before the interrupt does?
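
The ordering concern raised here can be illustrated with a small C11 model: the sender writes the hint, then raises the "interrupt" with release semantics, and the handler consumes it with acquire semantics before looking at the hint. This is only a sketch of the general publish-then-notify pattern under the assumption of a single sender; the atomic flag stands in for the IPI line, and none of the demo_* names exist in Xenomai. Whether real IPI delivery on a given architecture already provides this ordering is exactly the open question in the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_RESCHED 0x1UL

/* Hypothetical model: a plain status word plus a flag modelling the IPI. */
struct demo_sched {
	unsigned long status;
	atomic_bool ipi_pending;
};

/* Sender side (single sender assumed): write the hint, then notify. */
static void demo_kick_remote(struct demo_sched *remote)
{
	remote->status |= DEMO_RESCHED;
	/* Release: the status write above is ordered before the flag. */
	atomic_store_explicit(&remote->ipi_pending, true,
			      memory_order_release);
}

/* Receiver side: returns true if a reschedule hint was delivered. */
static bool demo_ipi_handler(struct demo_sched *self)
{
	bool need;

	/* Acquire: pairs with the release store in demo_kick_remote(). */
	if (!atomic_exchange_explicit(&self->ipi_pending, false,
				      memory_order_acquire))
		return false;

	need = (self->status & DEMO_RESCHED) != 0;
	self->status &= ~DEMO_RESCHED;
	return need;
}
```

If the notification were raised without the release ordering, the handler could in principle run and still observe the old status, which would look like the status=0 seen in the sched_remote trace line.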

 
 Mmmh -- not everything. The inlined XNRESCHED entry test in
 xnpod_schedule runs outside nklock. But doesn't releasing nklock imply a
 memory write barrier? Let me meditate...
Wouldn't

Re: [Xenomai-core] Potential problem with rt_eepro100

2010-11-03 Thread Anders Blomdell

Jan Kiszka wrote:

additional barrier. Can you check this?

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index df56417..66b52ad 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -187,6 +187,7 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
   if (current_sched != (__sched__)){				\
       xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched);	\
       setbits((__sched__)->status, XNRESCHED);			\
+      xnarch_memory_barrier();					\
   }								\
 } while (0)


In progress, if nothing breaks before, I'll report status tomorrow morning.


Mmmh -- not everything. The inlined XNRESCHED entry test in
xnpod_schedule runs outside nklock. But doesn't releasing nklock imply a
memory write barrier? Let me meditate...

Wouldn't we need a read barrier then (but maybe the irq-handling takes care of
that, not familiar with the code yet)?


A read barrier is not required here as we do not need to order load
operations with respect to each other in the reschedule IRQ handler.

Only if taking the interrupt is equivalent to:

  read interrupts status
  memory_read_barrier
  execute handler

processor manuals should have the answer to this (or it might already be 
in the code)...



You can always help: there is a lot boring^Winteresting tracepoint
conversion waiting in Xenomai, see the few already converted nucleus
tracepoints.

As soon as I have my system running, I'll put some effort into this.

/Anders




Re: [Xenomai-core] Potential problem with rt_eepro100

2010-11-03 Thread Anders Blomdell

Anders Blomdell wrote:

Jan Kiszka wrote:

additional barrier. Can you check this?

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index df56417..66b52ad 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -187,6 +187,7 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
   if (current_sched != (__sched__)){				\
       xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched);	\
       setbits((__sched__)->status, XNRESCHED);			\
+      xnarch_memory_barrier();					\
   }								\
 } while (0)


In progress, if nothing breaks before, I'll report status tomorrow morning.
It still breaks (in approximately the same way). I'm currently putting a 
barrier in the other macro doing a RESCHED, also adding some tracing to 
see if a read barrier is needed.


Interesting side-note:

Hard disk accesses seem to get really slow after the error has occurred (kernel 
install progresses at 2-3 modules per second), while lots 
of idle time is reported on all CPUs, weird...


/Anders



Re: [Xenomai-core] Potential problem with rt_eepro100

2010-11-03 Thread Anders Blomdell

Anders Blomdell wrote:

Anders Blomdell wrote:

Jan Kiszka wrote:

additional barrier. Can you check this?

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index df56417..66b52ad 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -187,6 +187,7 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
   if (current_sched != (__sched__)){\
   xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched);\
   setbits((__sched__)->status, XNRESCHED);\
+  xnarch_memory_barrier();\
   }\
 } while (0)


In progress, if nothing breaks before, I'll report status tomorrow 
morning.
It still breaks (in approximately the same way). I'm currently putting a 
barrier in the other macro doing a RESCHED, also adding some tracing to 
see if a read barrier is needed.
Nope, no luck there either. Will start interesting tracepoint 
adding/conversion :-(


Any reason why xn_nucleus_sched_remote should ever report status = 0?

/Anders




Re: [Xenomai-core] Potential problem with rt_eepro100

2010-11-03 Thread Anders Blomdell

Jan Kiszka wrote:

Am 03.11.2010 17:46, Anders Blomdell wrote:

Anders Blomdell wrote:

Anders Blomdell wrote:

Jan Kiszka wrote:

additional barrier. Can you check this?

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index df56417..66b52ad 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -187,6 +187,7 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
   if (current_sched != (__sched__)){\
   xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched);\
   setbits((__sched__)->status, XNRESCHED);\
+  xnarch_memory_barrier();\
   }\
 } while (0)
In progress, if nothing breaks before, I'll report status tomorrow 
morning.
It still breaks (in approximately the same way). I'm currently putting a 
barrier in the other macro doing a RESCHED, also adding some tracing to 
see if a read barrier is needed.
Nope, no luck there either. Will start interesting tracepoint 
adding/conversion :-(


Strange. But it was too easy anyway...


Any reason why xn_nucleus_sched_remote should ever report status = 0?


Really don't know yet. You could trigger on this state and call
ftrace_stop() then. Provided you had the function tracer enabled, that
should give a nice picture of what happened before.


Isn't there a race between these two (still waiting for compilation to 
finish)?


static inline int __xnpod_test_resched(struct xnsched *sched)
{
    int resched = testbits(sched->status, XNRESCHED);
#ifdef CONFIG_SMP
    /* Send resched IPI to remote CPU(s). */
    if (unlikely(xnsched_resched_p(sched))) {
        xnarch_send_ipi(sched->resched);
        xnarch_cpus_clear(sched->resched);
    }
#endif
    clrbits(sched->status, XNRESCHED);
    return resched;
}

#define xnsched_set_resched(__sched__) do {   \
  xnsched_t *current_sched = xnpod_current_sched();   \
  setbits(current_sched->status, XNRESCHED);  \
  if (current_sched != (__sched__)) { \
  xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \
  setbits((__sched__)->status, XNRESCHED);\
  xnarch_memory_barrier();\
  }   \
} while (0)

I would suggest (if I have got all the macros right):

static inline int __xnpod_test_resched(struct xnsched *sched)
{
    int resched = testbits(sched->status, XNRESCHED);
    if (unlikely(resched)) {
#ifdef CONFIG_SMP
        /* Send resched IPI to remote CPU(s). */
        xnarch_send_ipi(sched->resched);
        xnarch_cpus_clear(sched->resched);
#endif
        clrbits(sched->status, XNRESCHED);
    }
    return resched;
}
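
For comparison only, and not something proposed in this thread: testing and clearing the flag can also be done as one atomic read-modify-write, so that a bit set by another CPU between the test and the clear cannot be wiped out unobserved. The sketch below uses C11 atomics; `struct model_sched`, `MODEL_XNRESCHED` and both functions are hypothetical names, and the real nucleus protects these fields differently (nklock).

```c
#include <stdatomic.h>

#define MODEL_XNRESCHED 0x1u   /* illustrative bit value only */

/* Hypothetical model of a scheduler status word shared between CPUs. */
struct model_sched {
    atomic_uint status;
};

/* What a remote CPU would do to request a reschedule. */
static void model_set_resched(struct model_sched *s)
{
    atomic_fetch_or(&s->status, MODEL_XNRESCHED);
}

/* Test and clear in one atomic step; returns 1 if the bit was set. */
static int model_test_and_clear_resched(struct model_sched *s)
{
    unsigned int old = atomic_fetch_and(&s->status, ~MODEL_XNRESCHED);
    return (old & MODEL_XNRESCHED) != 0;
}
```

A request that arrives after the fetch-and simply leaves the bit set for the next call, instead of being lost.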

/Anders




Re: [Xenomai-core] Potential problem with rt_eepro100

2010-11-01 Thread Anders Blomdell

Jan Kiszka wrote:

Am 28.10.2010 11:34, Anders Blomdell wrote:

Jan Kiszka wrote:

Am 28.10.2010 09:34, Anders Blomdell wrote:

Anders Blomdell wrote:

Anders Blomdell wrote:

Hi,

I'm trying to use rt_eepro100 for sending raw Ethernet packets, but I'm
experiencing occasional weird behaviour.

Versions of things:

  linux-2.6.34.5
  xenomai-2.5.5.2
  rtnet-39f7fcf

The test program runs on two computers with an Intel Corporation
82557/8/9/0/1 Ethernet Pro 100 (rev 08) controller, where one computer
acts as a mirror sending back packets received from the Ethernet (only
those two computers are on the network), and the other sends packets and
measures round-trip time. Most packets come back in approximately 100
us, but occasionally the reception times out (once in about 10
packets or more), but the packets get received immediately when
reception is retried, which might indicate a race between rt_dev_recvmsg
and the interrupt, but I might be missing something obvious.

Changing one of the Ethernet cards to an Intel Corporation 82541PI
Gigabit Ethernet Controller (rev 05), while keeping everything else
constant, changes the behaviour somewhat; after receiving a few 10
packets, reception stops entirely (-EAGAIN is returned), while
transmission proceeds as it should (and the mirror returns packets).

Any suggestions on what to try?

Since the problem disappears with 'maxcpus=1', I suspect I have an SMP
issue (the machine is a Core2 Quad), so I'll move to xenomai-core.
(original message can be found at
http://sourceforge.net/mailarchive/message.php?msg_name=4CC82C8D.3080808%40control.lth.se
)

Xenomai-core gurus: which is the correct way to debug SMP issues?
Can I run the I-pipe tracer and expect to be able to save at least 150 us of
traces for all CPUs? Any hints/suggestions/insights are welcome...

The I-pipe tracer unfortunately only saves traces for the CPU that
triggered the freeze. To get the full picture, you may want to try my
ftrace port I posted recently for 2.6.35.

2.6.35.7 ?



Exactly.

Finally managed to get the ftrace to work
(one possible bug: had to manually copy 
include/xenomai/trace/xn_nucleus.h to 
include/xenomai/trace/events/xn_nucleus.h), and it looks like it can be 
very useful...


But I don't think it will give much info at the moment, since no 
xenomai/ipipe interrupt activity shows up, and adding that is far above 
my league :-(


My current theory is that the problem occurs when something like this 
takes place:


  CPU-i CPU-j   CPU-k   CPU-l

rt_dev_sendmsg
xmit_irq
rt_dev_recvmsg  recv_irq

So now I'll try to instrument the code to see if the assumption holds. 
Stay tuned...


Regards

Anders





Re: [Xenomai-core] Potential problem with rt_eepro100

2010-10-29 Thread Anders Blomdell
On 2010-10-29 20.06, Jan Kiszka wrote:
 Am 29.10.2010 19:42, Anders Blomdell wrote:
 Jan Kiszka wrote:

 Please provide the full kernel log, ideally also with the I-pipe tracer
 (with panic tracing) enabled.
 Will reconfigure/recompile and do that, with full kernel log do you
 mean all
 bootup info?

 That's best to avoid missing some detail or doing QA ping-pong.

 Full trace attached (finally...)

 
 You have to switch off CONFIG_DMA_API_DEBUG, it's incompatible with Xenomai.
Thanks, will continue with this on monday (build in progress).

With your ftrace port, how does one freeze all cpu's at the same time?


Regards

Anders

-- 
Anders Blomdell  Email: anders.blomd...@control.lth.se
Department of Automatic Control
Lund University  Phone:+46 46 222 4625
P.O. Box 118 Fax:  +46 46 138118
SE-221 00 Lund, Sweden



Re: [Xenomai-core] [RTnet-users] Potential problem with rt_eepro100

2010-10-28 Thread Anders Blomdell
Anders Blomdell wrote:
 Anders Blomdell wrote:
 Hi,

 I'm trying to use rt_eepro100 for sending raw Ethernet packets, but I'm
 experiencing occasional weird behaviour.

 Versions of things:

   linux-2.6.34.5
   xenomai-2.5.5.2
   rtnet-39f7fcf

 The test program runs on two computers with an Intel Corporation
 82557/8/9/0/1 Ethernet Pro 100 (rev 08) controller, where one computer
 acts as a mirror sending back packets received from the Ethernet (only
 those two computers are on the network), and the other sends packets and
 measures round-trip time. Most packets come back in approximately 100
 us, but occasionally the reception times out (once in about 10
 packets or more), but the packets get received immediately when
 reception is retried, which might indicate a race between rt_dev_recvmsg
 and the interrupt, but I might be missing something obvious.

 Changing one of the Ethernet cards to an Intel Corporation 82541PI
 Gigabit Ethernet Controller (rev 05), while keeping everything else
 constant, changes the behaviour somewhat; after receiving a few 10
 packets, reception stops entirely (-EAGAIN is returned), while
 transmission proceeds as it should (and the mirror returns packets).

 Any suggestions on what to try?

Since the problem disappears with 'maxcpus=1', I suspect I have an SMP
issue (the machine is a Core2 Quad), so I'll move to xenomai-core.
(original message can be found at
http://sourceforge.net/mailarchive/message.php?msg_name=4CC82C8D.3080808%40control.lth.se
)

Xenomai-core gurus: which is the correct way to debug SMP issues?
Can I run the I-pipe tracer and expect to be able to save at least 150 us of
traces for all CPUs? Any hints/suggestions/insights are welcome...

Regards

Anders Blomdell



Re: [Xenomai-core] [RTnet-users] Potential problem with rt_eepro100

2010-10-28 Thread Anders Blomdell
Jan Kiszka wrote:
 Am 28.10.2010 09:34, Anders Blomdell wrote:
 Anders Blomdell wrote:
 Anders Blomdell wrote:
 Hi,

 I'm trying to use rt_eepro100 for sending raw Ethernet packets, but I'm
 experiencing occasional weird behaviour.

 Versions of things:

   linux-2.6.34.5
   xenomai-2.5.5.2
   rtnet-39f7fcf

 The test program runs on two computers with an Intel Corporation
 82557/8/9/0/1 Ethernet Pro 100 (rev 08) controller, where one computer
 acts as a mirror sending back packets received from the Ethernet (only
 those two computers are on the network), and the other sends packets and
 measures round-trip time. Most packets come back in approximately 100
 us, but occasionally the reception times out (once in about 10
 packets or more), but the packets get received immediately when
 reception is retried, which might indicate a race between rt_dev_recvmsg
 and the interrupt, but I might be missing something obvious.

 Changing one of the Ethernet cards to an Intel Corporation 82541PI
 Gigabit Ethernet Controller (rev 05), while keeping everything else
 constant, changes the behaviour somewhat; after receiving a few 10
 packets, reception stops entirely (-EAGAIN is returned), while
 transmission proceeds as it should (and the mirror returns packets).

 Any suggestions on what to try?

 Since the problem disappears with 'maxcpus=1', I suspect I have an SMP
 issue (the machine is a Core2 Quad), so I'll move to xenomai-core.
 (original message can be found at
 http://sourceforge.net/mailarchive/message.php?msg_name=4CC82C8D.3080808%40control.lth.se
 )

 Xenomai-core gurus: which is the correct way to debug SMP issues?
 Can I run the I-pipe tracer and expect to be able to save at least 150 us of
 traces for all CPUs? Any hints/suggestions/insights are welcome...

 The I-pipe tracer unfortunately only saves traces for the CPU that
 triggered the freeze. To get the full picture, you may want to try my
 ftrace port I posted recently for 2.6.35.
2.6.35.7 ?

/Anders



Re: [Xenomai-core] Potential problem with rt_eepro100

2010-10-28 Thread Anders Blomdell
Jan Kiszka wrote:
 Am 28.10.2010 11:34, Anders Blomdell wrote:
 Jan Kiszka wrote:
 Am 28.10.2010 09:34, Anders Blomdell wrote:
 Anders Blomdell wrote:
 Anders Blomdell wrote:
 Hi,

 I'm trying to use rt_eepro100 for sending raw Ethernet packets, but I'm
 experiencing occasional weird behaviour.

 Versions of things:

   linux-2.6.34.5
   xenomai-2.5.5.2
   rtnet-39f7fcf

 The test program runs on two computers with an Intel Corporation
 82557/8/9/0/1 Ethernet Pro 100 (rev 08) controller, where one computer
 acts as a mirror sending back packets received from the Ethernet (only
 those two computers are on the network), and the other sends packets and
 measures round-trip time. Most packets come back in approximately 100
 us, but occasionally the reception times out (once in about 10
 packets or more), but the packets get received immediately when
 reception is retried, which might indicate a race between rt_dev_recvmsg
 and the interrupt, but I might be missing something obvious.

 Changing one of the Ethernet cards to an Intel Corporation 82541PI
 Gigabit Ethernet Controller (rev 05), while keeping everything else
 constant, changes the behaviour somewhat; after receiving a few 10
 packets, reception stops entirely (-EAGAIN is returned), while
 transmission proceeds as it should (and the mirror returns packets).

 Any suggestions on what to try?

 Since the problem disappears with 'maxcpus=1', I suspect I have an SMP
 issue (the machine is a Core2 Quad), so I'll move to xenomai-core.
 (original message can be found at
 http://sourceforge.net/mailarchive/message.php?msg_name=4CC82C8D.3080808%40control.lth.se
 )

 Xenomai-core gurus: which is the correct way to debug SMP issues?
 Can I run the I-pipe tracer and expect to be able to save at least 150 us of
 traces for all CPUs? Any hints/suggestions/insights are welcome...

 The I-pipe tracer unfortunately only saves traces for the CPU that
 triggered the freeze. To get the full picture, you may want to try my
 ftrace port I posted recently for 2.6.35.
 2.6.35.7 ?
Well, 2.6.35.7/xenomai/rtnet without the ftrace patch freezes after approx. 
8000 rounds (16000 packets). Time to freshen up serial port console 
debugging I guess (under the assumption that this is the same bug, but 
easier to reproduce).

/Anders



Re: [Xenomai-core] Comedi drivers in Xenomai porting/integration status ?

2009-02-19 Thread Anders Blomdell
Alexis Berlemont wrote:
 Hi,
 
 That was the reason why I was really surprised to find Comedi
 integrated into the mainline kernel. What strikes me more is that
 Comedi seems to be left as is. Do you think it will be cleaned up or
 reworked?
 Without rework Comedi will not make into mainline (I wouldn't call the
 staging corner mainline). And when reading this
 http://permalink.gmane.org/gmane.linux.kernel/793476, it is probably the
 best time now to propose interface changes and contribute back
 improvements made for the RTDM rework.
 
 How would you proceed ? Maybe, the first step would be to ask on the
 Comedi mailing-list if someone is interested in discussing on the API
 rework. Maybe, someone will answer this time. 
If it is more informative than the mail from 06-04-09 and the presentation.txt
there is definitely a chance :-), I read through it then, found the goals
reasonably sound, and nothing to test, so I waited for some working code to show
up (having too much at my hands already), that time might have come now.

Features I would like to see in a Comedi/RTDM framework are:

1. Drivers should work in Linux, Xenomai (and possibly RTAI and/or RT-Linux)
2. It should be possible to write drivers that live in user-space (serial2002
driver is a big HACK).
3. Stackable drivers (e.g. put a force sensor driver on top of an analog input 
card).
4. A comedilib compatibility library would be nice (but not necessary)

If all these pieces are in place, I'm more than happy to test/migrate the
drivers I use in my labs (JR3, NI M6221, DaqBoard 2000, ACPI 3106, serial)

Best regards

Anders Blomdell

-- 
Anders Blomdell  Email: anders.blomd...@control.lth.se
Department of Automatic Control
Lund University  Phone:+46 46 222 4625
P.O. Box 118 Fax:  +46 46 138118
SE-221 00 Lund, Sweden



Re: [Xenomai-core] ns vs. tsc as internal timer base

2006-06-13 Thread Anders Blomdell

Jan Kiszka wrote:

Hi,

To avoid losing the optimisation again in ns_to_tsc, I thought about
basing the whole internal timer arithmetics on nanoseconds instead of
TSCs as it is now. 
Good idea, makes it simpler to adapt to laptop frequency scaling and deep ACPI 
sleep, i.e. sync Xenomai time to the ACPI timer.
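
To illustrate why a nanosecond base is independent of the tick source, here is a rough sketch of converting between nanoseconds and ticks of an arbitrary counter. The function names and the `NSEC_PER_SEC_DEMO` macro are hypothetical; this is not the nucleus' ns_to_tsc, and it assumes the counter frequency stays below roughly 18 GHz so the intermediate products fit in 64 bits.

```c
#include <stdint.h>

#define NSEC_PER_SEC_DEMO UINT64_C(1000000000)

/* Convert nanoseconds to ticks of a counter running at freq_hz.
 * Splitting into whole seconds and remainder avoids overflowing
 * 64 bits for large ns values. */
static uint64_t demo_ns_to_ticks(uint64_t ns, uint64_t freq_hz)
{
    uint64_t secs = ns / NSEC_PER_SEC_DEMO;
    uint64_t rem  = ns % NSEC_PER_SEC_DEMO;
    return secs * freq_hz + (rem * freq_hz) / NSEC_PER_SEC_DEMO;
}

/* Convert ticks of a counter running at freq_hz back to nanoseconds. */
static uint64_t demo_ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
    uint64_t secs = ticks / freq_hz;
    uint64_t rem  = ticks % freq_hz;
    return secs * NSEC_PER_SEC_DEMO + (rem * NSEC_PER_SEC_DEMO) / freq_hz;
}
```

With deadlines kept in nanoseconds, only the frequency passed to the conversion changes when the clock source does, for example 3579545 Hz for the ACPI PM timer.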


/Anders

--
Anders Blomdell  Email: [EMAIL PROTECTED]
Department of Automatic Control
Lund University  Phone:+46 46 222 4625
P.O. Box 118 Fax:  +46 46 138118
SE-221 00 Lund, Sweden



Re: [Xenomai-core] Re: [PATCH] Shared interrupts (ready to merge)

2006-02-22 Thread Anders Blomdell

Dmitry Adamushko wrote:


  For RTDM I'm now almost determined to rework the API in a way that only
  HANDLED/UNHANDLED (or what ever their names will be) get exported, any
  additional guru features will remain excluded as long as we have no
  clean usage policy for them.

Good. Then let's go for

HANDLED, UNHANDLED - we may consider them even as 2 scalar values

+

NOENABLE, CHAINED  - additional bits.

They are not encouraged to be used with shared interrupts
(explained in docs + debug messages when XENO_OPT_DEBUG is on).

Any ISR on the shared irq line should understand that it's
just one among the equals. That said, it should not do anything
that may affect other ISRs and not require any special treatment
(like CHAINED or NOENABLE).
If it wants it indeed, then don't declare itself as SHARED.

We may come back to the topic about possible return values of ISRs
a bit later maybe having got more feedback (hm.. hopefully)
on shared irq support.


 
  But the latter one is not only about enabling the line, but
  on some archs - about .end-ing it too (sending EOI).
 
  And to support HANDLED_NOENABLE properly, those 2 have to be
  decoupled, i.e.
  EOI should always be sent from xnintr_shirq_handler().
  But the one returning HANDLED_NOENABLE is likely to leave the interrupt
  asserted, hence we can't EOI at this point (unless NO_ENABLE means
  DISABLE).
 
 I guess this is what Dmitry meant: explicitly call disable() if one or
 more ISRs returned NOENABLE - at least on archs where end != enable.
 Will this work? We could then keep on using the existing IRQ-enable API
 from bottom-half IRQ tasks.

Almost.

Let's consider the following only as another way of doing some things;
I don't propose to implement it, it's just to illustrate my thoughts.
So one may simply ski-skip-skip it :o)

Let's keep in mind that what is behind .end-ing the IRQ line depends on 
archs.

Sometimes end == enable (EOI was sent on .ack step), while in other cases
end == send_EOI [+ re-enabling].

But anyway, all ISRs are running with a given IRQ line disabled.

Supported values : HANDLED, UNHANDLED, PROPAGATE.

nucleus :: xnintr_irq_handler()
{
    ret = 0;

    ...

    for each isr in isr_list[ IRQ ]
    {
        temp = isr->handler();

        if (temp > ret)
            ret = temp;
    }

    if (ret == PROPAGATE)
    {
        // quite dangerous with shared interrupts, be sure you understand
        // what you are doing!

        xnarch_chain_irq(irq); // will be .end-ed in Linux domain
    }
    else
    {
        // HANDLED or UNHANDLED

        xnarch_end_irq();
    }

    ...

}
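
The handler sketched above can be written as compilable C for experimentation. This is only an illustration under assumptions: the enum values, `struct isr_entry`, `dispatch_shared_irq()` and the three sample handlers are hypothetical names, and the function returns the action the nucleus would take instead of calling xnarch_chain_irq() or xnarch_end_irq().

```c
#include <stddef.h>

/* Scalar return values, ordered so that the strongest request wins. */
enum isr_result { ISR_UNHANDLED = 0, ISR_HANDLED = 1, ISR_PROPAGATE = 2 };

typedef enum isr_result (*isr_fn)(void *cookie);

struct isr_entry {
    isr_fn handler;
    void *cookie;
};

enum irq_action { IRQ_ACTION_END, IRQ_ACTION_CHAIN };

/* Run every ISR on the line and keep the maximum return value. */
static enum irq_action dispatch_shared_irq(const struct isr_entry *list, size_t n)
{
    enum isr_result ret = ISR_UNHANDLED;

    for (size_t i = 0; i < n; i++) {
        enum isr_result temp = list[i].handler(list[i].cookie);
        if (temp > ret)
            ret = temp;
    }
    /* PROPAGATE: leave .end-ing to the Linux domain; otherwise end here. */
    return ret == ISR_PROPAGATE ? IRQ_ACTION_CHAIN : IRQ_ACTION_END;
}

/* Sample handlers, for demonstration only. */
static enum isr_result isr_unhandled(void *cookie) { (void)cookie; return ISR_UNHANDLED; }
static enum isr_result isr_handled(void *cookie)   { (void)cookie; return ISR_HANDLED; }
static enum isr_result isr_propagate(void *cookie) { (void)cookie; return ISR_PROPAGATE; }
```

Because the values are scalars compared with `>`, a single PROPAGATE on the line overrides any number of HANDLED results, which is why the comment in the pseudo-code calls it dangerous on shared lines.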

ENABLE or NOENABLE is missing? Nop.

let's say we have 2 rt-ISRs :

isr1 : HANDLED
isr2 : HANDLED + WISH

WISH == I want the IRQ line to remain disabled until later
(e.g. bottom half in rt_task will enable it)

How may isr2 express this WISH? Simply, xnarch_irq_disable/enable() should
support an internal counter that allows them to be called in a nested way.

So e.g. if there are 2 consecutive calls to disable_irq(), then
2 calls to enable_irq() are needed to really enable the IRQ line.

This way, the WISH is only about directly calling xnarch_irq_disable() 
in isr2, and there is no need for ENABLE or NOENABLE flags.
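
A minimal sketch of the nested disable/enable counter described above, assuming the caller already serialises access to the line state (for example by running with interrupts off). `struct irq_line` and both functions are hypothetical; a real port would mask or unmask the interrupt controller where `masked` is toggled.

```c
#include <stdbool.h>

/* Hypothetical per-line state. */
struct irq_line {
    unsigned int disable_depth;  /* number of outstanding disable requests */
    bool masked;                 /* models the hardware mask bit */
};

/* Mask the line on the first disable; later calls only count. */
static void irq_line_disable(struct irq_line *l)
{
    if (l->disable_depth++ == 0)
        l->masked = true;
}

/* Unmask the line only when the last disable has been undone. */
static void irq_line_enable(struct irq_line *l)
{
    if (l->disable_depth == 0)
        return;                  /* unbalanced enable: ignore it */
    if (--l->disable_depth == 0)
        l->masked = false;
}
```

With this scheme isr2 calls the disable function itself and its bottom half calls enable later; the generic end-of-interrupt path stays identical for every handler on the line.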

This way, PROPAGATE really means NOEND - the IRQ will be .end-ed in the 
Linux domain;

while WISH==NOENABLE - has nothing to do with sending EOI, but only with
enabling/disabling the given IRQ line.

Yes, if one ISR (or a few) defers the IRQ line enabling until later, it 
will affect all other ISRs => all interrupts are temporarily not accepted
on this line.
This scenario is possible under Linux, but should be used with even more
care in real-time system. At least, this way HANDLED_NOENABLE case works
and doesn't lead to lost interrupts on some archs.

Moreover, it avoids the need for ENABLE flag even in non-shared 
interrupt case.

Looks clean enough to me, i.e. no objections...

--
Anders




Re: [Xenomai-core] Re: [PATCH] Shared interrupts (ready to merge)

2006-02-21 Thread Anders Blomdell

Dmitry Adamushko wrote:


N.B. Amongst other things, some thoughts about CHAINED with shared 
interrupts.



On 20/02/06, Anders Blomdell wrote:




A number of questions arise:

1. What happens if one of the shared handlers leaves the interrupt
asserted,
returns NOENABLE|HANDLED and another return only HANDLED?

2. What happens if one returns PROPAGATE and another returns HANDLED?


Yep, each ISR may return a different value and all of them are
accumulated in the s variable ( s |= intr->isr(intr); ).

So the loop may end up with s which contains all of the possible bits:

(e.g.

isr1 - HANDLED | ENABLE
isr2 - HANDLED (don't want the irq to be enabled)
isr3 - CHAINED

)

s = HANDLED | ENABLE | CHAINED;

Then CHAINED will be ignored because of the following code :
 
+if (s & XN_ISR_ENABLE)
+   xnarch_end_irq(irq);
+else if (s & XN_ISR_CHAINED)(*)
+   xnarch_chain_irq(irq);

Which is the worst way possible of prioritizing them, if a Linux interrupt is
active when we get there with ENABLE|CHAINED, interrupts will be enabled with
the Linux interrupt still asserted - the IRQ-handlers will be called once more,
probably returning ENABLE|CHAINED again - infinite loop...
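
The effect of the branch order on OR-accumulated flags can be checked with a small stand-alone model. The `DEMO_ISR_*` bit values and both functions are hypothetical and only mirror the structure of the quoted patch; the second ordering is shown for contrast and is not a change proposed in this thread.

```c
/* Illustrative bit values only; the real XN_ISR_* constants may differ. */
#define DEMO_ISR_HANDLED 0x1
#define DEMO_ISR_ENABLE  0x2
#define DEMO_ISR_CHAINED 0x4

enum demo_action { DEMO_NONE, DEMO_END_IRQ, DEMO_CHAIN_IRQ };

/* Order used in the quoted patch: ENABLE wins, CHAINED is ignored. */
static enum demo_action demo_enable_first(int s)
{
    if (s & DEMO_ISR_ENABLE)
        return DEMO_END_IRQ;
    else if (s & DEMO_ISR_CHAINED)
        return DEMO_CHAIN_IRQ;
    return DEMO_NONE;
}

/* Opposite order, for contrast: CHAINED wins, the line is left to Linux. */
static enum demo_action demo_chained_first(int s)
{
    if (s & DEMO_ISR_CHAINED)
        return DEMO_CHAIN_IRQ;
    else if (s & DEMO_ISR_ENABLE)
        return DEMO_END_IRQ;
    return DEMO_NONE;
}
```

Feeding both functions the accumulated value HANDLED | ENABLE | CHAINED from the isr1/isr2/isr3 example shows the first one ending the IRQ while a Linux handler may still have its interrupt asserted, which is the situation described above.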



the current code in the CVS does not contain the else in (*), so that 
ENABLE | CHAINED is possible, though it's a wrong combination.


This said, we suppose that one knows what he is doing.

In the case of a single ISR per line, it's not that difficult to 
achieve. But if there are a few ISRs, then one should analyze and take 
into account all possible return values of all the ISRs, as each of them 
may affect others (e.g. if one returns CHAINED when another - HANDLED | 
ENABLE).

Which is somewhat contrary to the concept of shared interrupts, if we have to
take care of the global picture, why make them shared in the first place?
(I like the concept of shared interrupts, but it is important that the framework
gives a separation of concerns)

So my feeling is that CHAINED should not be used by drivers which 
registered their ISRs as SHARED.

Well, CHAINED should not be used by drivers which return ENABLE (and are of
course hence incompatible with true realtime IRQ's)


Moreover, I actually see the only scenario of CHAINED (I provided it 
before) :


all ISRs in the primary domain have reported UNHANDLED  =>  nucleus 
propagates the interrupt down the pipeline with xnarch_chain_irq().
This call actually returns 1 upon successful propagation (some domain 
down the pipeline was interested in this irq) and 0 otherwise.


Upon 0, this is a spurious irq (none of domains was interested in its 
handling).


ok, let's suppose now :

we have 2 ISRs on the same shared line :

isr1 : HANDLED (will be enabled by rt task. Note, rt task must call 
xnarch_end_irq() and not just xnarch_enable_irq()! )


isr2 : CHAINED

So HANDLED | CHAINED is ok for the single ISR on the line, but it may 
lead to HANDLED | CHAINED | ENABLE in a case of the shared line.


rt task that works jointly with isr1 just calls xnarch_end_irq() at some 
moment of time and some ISR in the linux domain does the same later  =>  
the line is .end-ed 2 times.


ISR should never return CHAINED as to indicate _only_ that it is not 
interested in this irq, but ~HANDLED or NOINT (if we'll support it) instead.


If the ISR nevertheless wants to propagate the IRQ to the Linux domain 
_explicitly_, it _must not_ register itself as SHARED, i.e. it _must_ be 
the only ISR on this line, otherwise that may lead to the IRQ line being 
.end-ed twice (lost interrupts in some cases).



#define UNHANDLED 0
#define HANDLED_ENABLE 1
#define HANDLED_NOENABLE 2
#define PROPAGATE 3 



Yep, I'd agree with you. Moreover, PROPAGATE should not be used for 
shared interrupts.

My feeling is that it should be considered an error to attach an RT IRQ handler
to a line that has a Linux IRQ handler (this should be possible to check, since
/proc/interrupts contains the relevant information), unless a Linux IRQ-mask
function is installed. This IRQ-mask function should then be called:

  1. each time domains are switched
  2. each time an interrupt is generated

The IRQ-mask function should look something like:

unsigned int rt_irq_mask(struct ipipe_domain *ipd, unsigned int irq)
{
  int result = 0;
  static int enabled = true;
  int enable = enabled;

  if (irq >= 0) {
    // Interrupt has occurred, we are about to run IRQ handlers
    if (disable_early) {
      enable = false;
    }
    if (for_linux(irq)) {
      result = XN_ISR_CHAINED;
    }
  } else if (ipd == ipipe_root_domain) {
    // Entering Linux
    enable = true;
  } else {
    // Other domain, block Linux interrupts
    enable = false;
  }
  if (enable != enabled) {
    enabled = enable;
    if (enable) {
      // Enable Linux interrupts by unmasking appropriate
      // device registers (and possibly entire IRQs)
    } else {
      // Disable Linux interrupts

Re: [Xenomai-core] Re: [PATCH] Shared interrupts (ready to merge)

2006-02-21 Thread Anders Blomdell

Dmitry Adamushko wrote:


  Good point, leaves us with 2 possible return values for shared handlers:
 
HANDLED
NOT_HANDLED
 
  i.e. shared handlers should never defer the end'ing of the interrupt (which
  makes sense, since this would affect the other [shared] handlers).

HANDLED_NOENABLE could be supported too. 
Yes, but it breaks decoupling between shared handlers; interrupts will be 
deferred for all [shared] handlers until it is properly ended.


There would be no need in reinventing the wheel, just do it the way
Linux does it.
But it's about some additional re-designing of the current codebase
(e.g. nested calling for irq_enable/disable())
I'm not sure we do need it for something else rather than irq sharing 
code but it affects the rest of the code.


And we have a kind of wrong concept :

XN_ISR_ENABLE (or NOENABLE) corresponds to xnarch_end_irq().

Agree



But the latter one is not only about enabling the line, but
on some archs - about .end-ing it too (sending EOI).

And to support HANDLED_NOENABLE properly, those 2 have to be decoupled, i.e.
EOI should always be sent from xnintr_shirq_handler().
But the one returning HANDLED_NOENABLE is likely to leave the interrupt 
asserted, hence we can't EOI at this point (unless NO_ENABLE means DISABLE).




  Yes, should. And this "should" is best handled by
 
  a) Documenting the potential conflict in the same place when describing
  the return values
 
  b) Placing some debug warning in the nucleus' IRQ trampoline function to
  bail out (once per line) when running into such situation
 
  But I'm against any further runtime restrictions, especially as most
  drivers will never return anything else than NOT_HANDLED or HANDLED.
  Actually, this was the reason why I tried to separate the NO_ENABLE and
  PROPAGATE features as *additional* bits from HANDLED and
  NOT_HANDLED/UNHANDLED/NOINT. But I acknowledge that having all valid bit
  combinations present as constants can be more helpful for the user. We
  just have to draw some line between the standard values and the
  additional guru return codes (documentation: don't use NO_ENABLE or
  PROPAGATE unless you understand their side-effects and pitfalls
  precisely).


I agree with you on the PROPAGATE case, but NO_ENABLE, which, as pointed out
above, should (IMHO and at least in theory) only mean "keep the IRQ line
disabled" (and have nothing to do with .end-ing the IRQ line), would be
better supported.

But this is, again as was pointed out above, about some redesigning of the
current code => some overhead that likely affects non-shared-aware code too.


So on one hand,

I'm ready to re-work code with :

HANDLED and UNHANDLED (or NOINT)

+ 2 additional bits : NOENABLE and PROPAGATE.

and document it like you suggested: "don't use NO_ENABLE or
PROPAGATE with shared interrupts
unless you understand their side-effects and pitfalls precisely";

on the other hand,

I'd say that I'm almost ready to vote against merging the irq sharing
code at all as it looks to be a rather partial solution.

I vote for (even though I'm the one who complains the most), BUT I think it is
important to keep the rules for using it simple (that's why I worry about the
plethora of return-flags).



--

Regards

Anders

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Re: [PATCH] Shared interrupts (ready to merge)

2006-02-21 Thread Anders Blomdell

Jan Kiszka wrote:

Anders Blomdell wrote:


Dmitry Adamushko wrote:


 Good point, leaves us with 2 possible return values for shared
handlers:

   HANDLED
   NOT_HANDLED

 i.e. shared handlers should never defer the end'ing of the
interrupt (which
 makes sense, since this would affect the other [shared] handlers).

HANDLED_NOENABLE could be supported too.


Yes, but it breaks decoupling between shared handlers; interrupts will
be deferred for all [shared] handlers until it is properly ended.



There would be no need to reinvent
the wheel, just do it the way Linux does it.
But that means some additional redesign of the current codebase
(e.g. nested calling for irq_enable/disable()).
I'm not sure we need it for anything other than the irq sharing
code, but it affects the rest of the code.

And we have a kind of wrong concept :

XN_ISR_ENABLE (or NOENABLE) corresponds to xnarch_end_irq().


Agree



But the latter is not only about enabling the line, but
on some archs also about .end-ing it (sending EOI).

And to support HANDLED_NOENABLE properly, those 2 have to be
decoupled, i.e.
EOI should always be sent from xnintr_shirq_handler().


But the one returning HANDLED_NOENABLE is likely to leave the interrupt
asserted, hence we can't EOI at this point (unless NO_ENABLE means
DISABLE).



I guess this is what Dmitry meant: explicitly call disable() if one or
more ISRs returned NOENABLE - at least on archs where end != enable.
Will this work? We could then keep on using the existing IRQ-enable API
from bottom-half IRQ tasks. But what about NOENABLE+PROPAGATE? Will this
special case still mean NOT to end the ISR (as Linux will do)?

Bah, we are running in circles, I'm afraid. I think it's better to call
NOENABLE NOEOI, which will indeed mean to not end this line (as it is
the current situation anyway, isn't it?), and leave the user with what
(s)he can do with such a feature. We found out that there are trillions
of ways to shoot oneself into the foot with NOENABLE and PROPAGATE, and
we cannot prevent most of them. So let's stop trying, at least for this
patch!
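
A minimal sketch of the "explicitly disable, then end" idea discussed above, using a toy model of an interrupt line. Everything here is an assumption made for illustration: line_state, line_disable(), line_eoi() and finish_shared_irq() are hypothetical names, the flag values are invented, and none of this is the actual Xenomai or Adeos code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values, for illustration only. */
#define ISR_HANDLED  0x1
#define ISR_NOENABLE 0x2

/* Toy model of one IRQ line at the interrupt controller. */
struct line_state {
    bool masked;    /* true while the line is kept disabled */
    int  eoi_count; /* number of times the line has been .end-ed */
};

static void line_disable(struct line_state *l) { l->masked = true; }
static void line_eoi(struct line_state *l)     { l->eoi_count++; }

/*
 * Called once after all shared handlers have run; s is the OR of their
 * return values. If any handler asked for NOENABLE, mask the line first,
 * then always send the EOI, so that "end" and "enable" are decoupled.
 */
static void finish_shared_irq(struct line_state *l, int s)
{
    assert(l != NULL);
    if (s & ISR_NOENABLE)
        line_disable(l);
    line_eoi(l);
}
```

With this split, a bottom-half task would later only need to unmask the line; whether that is acceptable for NOENABLE combined with PROPAGATE is exactly the open question raised above.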



 Yes, should. And this "should" is best handled by

 a) Documenting the potential conflict in the same place when
 describing the return values

 b) Placing some debug warning in the nucleus' IRQ trampoline
 function to bail out (once per line) when running into such a situation

 But I'm against any further runtime restrictions, especially as most
 drivers will never return anything else than NOT_HANDLED or HANDLED.
 Actually, this was the reason why I tried to separate the NO_ENABLE
 and PROPAGATE features as *additional* bits from HANDLED and
 NOT_HANDLED/UNHANDLED/NOINT. But I acknowledge that having all
 valid bit combinations present as constants can be more helpful for the
 user. We just have to draw some line between the standard values and the
 additional guru return codes (documentation: don't use NO_ENABLE or
 PROPAGATE unless you understand their side-effects and pitfalls
 precisely).

I agree with you on the PROPAGATE case, but NO_ENABLE, which, as pointed out
above, should (IMHO and at least in theory) only mean "keep the IRQ line
disabled" (and have nothing to do with .end-ing the IRQ line), would be
better supported.
But this is, again as was pointed out above, about some redesigning of
the current code => some overhead that likely affects non-shared-aware
code too.


So on one hand,

I'm ready to re-work code with :

HANDLED and UNHANDLED (or NOINT)

+ 2 additional bits : NOENABLE and PROPAGATE.

and document it like you suggested: "don't use NO_ENABLE or
PROPAGATE with shared interrupts
unless you understand their side-effects and pitfalls precisely";

on the other hand,

I'd say that I'm almost ready to vote against merging the irq sharing
code at all as it looks to be a rather partial solution.


I vote for (even though I'm the one who complains the most), BUT I think
it is important to keep the rules for using it simple (that's why I
worry about the plethora of return-flags).




And I'm with you here: My original proposal (2 base-states + 2 bits)
created 8 expressible states while your version only knows 4 states -
those which make the most sense (and 2 of them are still the ones recommended
for the masses).

For RTDM I'm now almost determined to rework the API in a way that only
HANDLED/UNHANDLED (or whatever their names will be) get exported; any
additional guru features will remain excluded as long as we have no
clean usage policy for them.

You have my vote for this.

--
Anders



Re: [Xenomai-core] Re: [PATCH] Shared interrupts (ready to merge)

2006-02-21 Thread Anders Blomdell

Dmitry Adamushko wrote:


On 21/02/06, Anders Blomdell wrote:


Dmitry Adamushko wrote:
 
  N.B. Amongst other things, some thoughts about CHAINED with shared
  interrupts.
 
 
  On 20/02/06, Anders Blomdell wrote:
 
 
 
  A number of questions arise:
 
  1. What happens if one of the shared handlers leaves the
  interrupt asserted, returns NOENABLE|HANDLED and another returns
  only HANDLED?
 
  2. What happens if one returns PROPAGATE and another returns
  HANDLED?
 
 
  Yep, each ISR may return a different value and all of them are
  accumulated in the s variable ( s |= intr->isr(intr); ).
 
  So the loop may end up with s which contains all of the
  possible bits:
 
  (e.g.
 
  isr1 -> HANDLED | ENABLE
  isr2 -> HANDLED (don't want the irq to be enabled)
  isr3 -> CHAINED
 
  )
 
  s = HANDLED | ENABLE | CHAINED;
 
  Then CHAINED will be ignored because of the following code :
 
  +if (s & XN_ISR_ENABLE)
  +   xnarch_end_irq(irq);
  +else if (s & XN_ISR_CHAINED)    (*)
  +   xnarch_chain_irq(irq);

Which is the worst possible way of prioritizing them: if a Linux
interrupt is active when we get there with ENABLE|CHAINED, interrupts will be
enabled with the Linux interrupt still asserted - the IRQ handlers will be
called once more, probably returning ENABLE|CHAINED again - infinite loop...

 
  the current code in the CVS does not contain the else in (*), so that
  ENABLE | CHAINED is possible, though it's a wrong combination.
 
  This said, we suppose that one knows what he is doing.
 
  In the case of a single ISR per line, it's not that difficult to
  achieve. But if there are a few ISRs, then one should analyze and
  take into account all possible return values of all the ISRs, as each
  of them may affect others (e.g. if one returns CHAINED when another
  returns HANDLED | ENABLE).

Which is somewhat contrary to the concept of shared interrupts: if
we have to take care of the global picture, why make them shared in the
first place?
(I like the concept of shared interrupts, but it is important that
the framework gives a separation of concerns.)


Unfortunately, it looks to me that the current picture (even with your
scalar values) requires the user who develops a handler for a given IRQ to
take into account the possible return values of other ISRs.


As I pointed out, the situation when 2 ISRs return HANDLED_NOENABLE may 
lead to problems on some archs.

Good point, leaves us with 2 possible return values for shared handlers:

  HANDLED
  NOT_HANDLED

i.e. shared handlers should never defer the end'ing of the interrupt (which 
makes sense, since this would affect the other [shared] handlers).


--
Anders



Re: [Xenomai-core] Re: [PATCH] Shared interrupts (ready to merge)

2006-02-21 Thread Anders Blomdell

Dmitry Adamushko wrote:


N.B. Amongst other things, some thoughts about CHAINED with shared 
interrupts.



On 20/02/06, Anders Blomdell wrote:




A number of questions arise:

1. What happens if one of the shared handlers leaves the interrupt
asserted, returns NOENABLE|HANDLED and another returns only HANDLED?

2. What happens if one returns PROPAGATE and another returns HANDLED?


Yep, each ISR may return a different value and all of them are
accumulated in the s variable ( s |= intr->isr(intr); ).

So the loop may end up with s which contains all of the possible bits:

(e.g.

isr1 -> HANDLED | ENABLE
isr2 -> HANDLED (don't want the irq to be enabled)
isr3 -> CHAINED

)

s = HANDLED | ENABLE | CHAINED;

Then CHAINED will be ignored because of the following code :
 
+if (s & XN_ISR_ENABLE)
+   xnarch_end_irq(irq);
+else if (s & XN_ISR_CHAINED)    (*)
+   xnarch_chain_irq(irq);

Which is the worst possible way of prioritizing them: if a Linux interrupt is
active when we get there with ENABLE|CHAINED, interrupts will be enabled with
the Linux interrupt still asserted - the IRQ handlers will be called once more,
probably returning ENABLE|CHAINED again - infinite loop...
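
To make the accumulation problem concrete, here is a small sketch of OR-ing the return values of three handlers and then applying the same if / else-if ordering as the quoted patch. The bit values, the isr1..isr3 stand-ins and the helper names accumulate() and decide() are assumptions for this illustration; they are not taken from the Xenomai sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values, chosen only for this illustration. */
#define XN_ISR_HANDLED 0x1
#define XN_ISR_ENABLE  0x2
#define XN_ISR_CHAINED 0x4

typedef int (*isr_fn)(void);

enum line_action { ACT_NONE, ACT_END_IRQ, ACT_CHAIN_IRQ };

/* Stand-ins for the three handlers of the example above. */
static int isr1(void) { return XN_ISR_HANDLED | XN_ISR_ENABLE; }
static int isr2(void) { return XN_ISR_HANDLED; } /* wants the line left disabled */
static int isr3(void) { return XN_ISR_CHAINED; }

/* OR together the return values of every handler on the line. */
static int accumulate(const isr_fn *handlers, size_t n)
{
    int s = 0;
    assert(handlers != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s |= handlers[i]();
    return s;
}

/* Same ordering as the quoted patch: ENABLE wins over CHAINED. */
static enum line_action decide(int s)
{
    if (s & XN_ISR_ENABLE)
        return ACT_END_IRQ;
    else if (s & XN_ISR_CHAINED)
        return ACT_CHAIN_IRQ;
    return ACT_NONE;
}
```

Running accumulate() over isr1, isr2 and isr3 yields HANDLED | ENABLE | CHAINED, and decide() then picks "end the IRQ": isr2's wish to keep the line disabled and isr3's request to chain are both silently lost.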



the current code in the CVS does not contain the else in (*), so that
ENABLE | CHAINED is possible, though it's a wrong combination.


This said, we suppose that one knows what he is doing.

In the case of a single ISR per line, it's not that difficult to
achieve. But if there are a few ISRs, then one should analyze and take
into account all possible return values of all the ISRs, as each of them
may affect others (e.g. if one returns CHAINED when another returns HANDLED |
ENABLE).

Which is somewhat contrary to the concept of shared interrupts, if we have to
take care of the global picture, why make them shared in the first place?
(I like the concept of shared interrupts, but it is important that the framework
gives a separation of concerns)

So my feeling is that CHAINED should not be used by drivers which 
registered their ISRs as SHARED.

Well, CHAINED should not be used by drivers which return ENABLE (and are of
course hence incompatible with true realtime IRQ's)


Moreover, I actually see the only scenario of CHAINED (I provided it 
before) :


all ISRs in the primary domain have reported UNHANDLED  =>  nucleus
propagates the interrupt down the pipeline with xnarch_chain_irq().
This call actually returns 1 upon successful propagation (some domain
down the pipeline was interested in this irq) and 0 otherwise.


Upon 0, this is a spurious irq (none of the domains was interested in
handling it).


ok, let's suppose now :

we have 2 ISRs on the same shared line :

isr1 : HANDLED (will be enabled by rt task. Note, rt task must call 
xnarch_end_irq() and not just xnarch_enable_irq()! )


isr2 : CHAINED

So HANDLED | CHAINED is ok for the single ISR on the line, but it may 
lead to HANDLED | CHAINED | ENABLE in a case of the shared line.


rt task that works jointly with isr1 just calls xnarch_end_irq() at some
moment of time and some ISR in the linux domain does the same later  =>
the line is .end-ed 2 times.


An ISR should never return CHAINED to indicate _only_ that it is not
interested in this irq; it should return ~HANDLED or NOINT (if we'll support
it) instead.


If the ISR nevertheless wants to propagate the IRQ to the Linux domain 
_explicitly_, it _must not_ register itself as SHARED, i.e. it _must_ be 
the only ISR on this line, otherwise that may lead to the IRQ line being 
.end-ed twice (lost interrupts in some cases).



#define UNHANDLED 0
#define HANDLED_ENABLE 1
#define HANDLED_NOENABLE 2
#define PROPAGATE 3 



Yep, I'd agree with you. Moreover, PROPAGATE should not be used for 
shared interrupts.

My feeling is that it should be considered an error to attach a RT IRQ handler
to a line that has a Linux IRQ handler (this should be possible to check, since
/proc/interrupts contains the relevant information), unless a Linux IRQ-mask
function is installed. This IRQ-mask function should then be called:

  1. each time domains are switched
  2. each time an interrupt is generated

The IRQ-mask function should look something like:

unsigned int rt_irq_mask(struct ipipe_domain *ipd, unsigned int irq)
{
  int result = 0;
  static int enabled = true;
  int enable = enabled;

  if (irq >= 0) {
    // Interrupt has occurred, we are about to run IRQ handlers
    if (disable_early) {
      enable = false;
    }
    if (for_linux(irq)) {
      result = XN_ISR_CHAINED;
    }
  } else if (ipd == ipipe_root_domain) {
    // Entering Linux
    enable = true;
  } else {
    // Other domain, block Linux interrupts
    enable = false;
  }
  if (enable != enabled) {
    enabled = enable;
    if (enable) {
      // Enable Linux interrupts by unmasking appropriate
      // device registers (and possibly entire IRQs)
    } else {
      // Disable Linux interrupts


Re: [Xenomai-core] Re: [PATCH] Shared interrupts (ready to merge)

2006-02-20 Thread Anders Blomdell

Jan Kiszka wrote:

Hi Dmitry,

Dmitry Adamushko wrote:


Hi Jan,

let's make yet another revision of the bits :

new XN_ISR_HANDLED  == old XN_ISR_HANDLED + old XN_ISR_NO_ENABLE

ok.

new XN_ISR_NOENABLE == ~ old XN_ISR_ENABLE

ok.

new XN_ISR_PROPAGATE == XN_ISR_CHAINED

ok.




Just to make sure that you understand my weird ideas: each of the three
new XN_ISR_xxx above should be encoded with an individual bit



new XN_ISR_NOINT == ?

does it suppose the interrupt line to be .end-ed (enabled) and irq not to be
propagated? Should be so, I guess, if it's different from 5). Then nucleus
ignores implicit IRQ enable for 5) as well as for 3).

Do we really need that NOINT then, as it seems to be the same as ~HANDLED?

or NOINT == 0 and then it's a scalar value, not a bit.

So one may consider HANDLED == 1 and NOINT == 0 as really scalar values

and

NOENABLE and PROPAGATE as additional bits (used only if needed).




My idea is to urge the user specifying one of the base return types
(HANDLED or NOINT) + any of the two additional bits (NOENABLE and
PROPAGATE).

For correct drivers NOINT could be 0 indeed, but to check that the user
picked a new constant we may want to set NOINT != 0. With the old API,
return 0 expressed HANDLED + ~ENABLE. With the new one
the user signals no interest and the nucleus may raise a warning that a
spurious IRQ occurred. So I would add a debug bit for NOINT here to
optionally (on OPT_XENO_DEBUG) detect old-style usage (return 0).
Moreover, we gain freedom to move bits in the future when every state is
encoded via constants. Or am I too paranoid here?
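
A rough sketch of how a non-zero NOINT would let a debug build spot old-style "return 0" handlers. The bit assignments, the isr_check enum and check_isr_return() are hypothetical and only illustrate the idea; they are not the encoding that was eventually chosen for the nucleus.

```c
#include <assert.h>

/* Hypothetical encodings for illustration; not the real Xenomai values. */
#define XN_ISR_NOINT     0x1 /* base state: not our interrupt */
#define XN_ISR_HANDLED   0x2 /* base state: interrupt handled */
#define XN_ISR_NOENABLE  0x4 /* additional bit */
#define XN_ISR_PROPAGATE 0x8 /* additional bit */
#define XN_ISR_ALL_BITS  0xf

enum isr_check { ISR_OK, ISR_MISSING_BASE, ISR_AMBIGUOUS };

/*
 * Debug-time validation of a handler's return value: exactly one of the
 * two base states must be present. A plain "return 0;" written against
 * the old API carries no base state and is therefore detected.
 */
static enum isr_check check_isr_return(int ret)
{
    int base = ret & (XN_ISR_NOINT | XN_ISR_HANDLED);

    assert((ret & ~XN_ISR_ALL_BITS) == 0); /* no undefined bits */
    if (base == 0)
        return ISR_MISSING_BASE;
    if (base == (XN_ISR_NOINT | XN_ISR_HANDLED))
        return ISR_AMBIGUOUS;
    return ISR_OK;
}
```

In a real debug build the non-OK results would presumably trigger a one-time warning per line rather than being returned to a caller.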
After reading the above discussion (of which I understand very little), and 
looking at (what I believe to be) the relevant code:


+intr = shirq->handlers;
+
+while (intr)
+{
+    s |= intr->isr(intr);
+    ++intr->hits;
+    intr = intr->next;
+}
+xnintr_shirq_unlock(shirq);
+
+--sched->inesting;
+
+if (s & XN_ISR_ENABLE)
+   xnarch_end_irq(irq);
+else if (s & XN_ISR_CHAINED)
+   xnarch_chain_irq(irq);

A number of questions arise:

1. What happens if one of the shared handlers leaves the interrupt asserted,
returns NOENABLE|HANDLED and another returns only HANDLED?


2. What happens if one returns PROPAGATE and another returns HANDLED?

As far as I can tell, after all RT handlers have run, the following scenarios
are possible:


1. The interrupt is deasserted (i.e. it was a RT interrupt)
2. The interrupt is still asserted, it will be deasserted later
   by some RT task (i.e. it was a RT interrupt)
3. The interrupt is still asserted and will be deasserted
   by the Linux IRQ handler.

IMHO that leads to the conclusion that the IRQ handlers should return a scalar

#define UNHANDLED 0
#define HANDLED_ENABLE 1
#define HANDLED_NOENABLE 2
#define PROPAGATE 3

and the loop should be

s = UNHANDLED;
while (intr) {
  tmp = intr->isr(intr);
  if (tmp > s) { s = tmp; }
  intr = intr->next;
}
if (s == PROPAGATE) {
  xnarch_chain_irq(irq);
} else if (s == HANDLED_ENABLE) {
  xnarch_end_irq(irq);
}

To be really honest, I think that PROPAGATE should be excluded from the RT
IRQ handlers, since with the current scheme all RT handlers have to test if the
IRQ was a Linux interrupt (otherwise the system will only work when the handler
that returns PROPAGATE is installed)
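
The scalar scheme above boils down to "the highest value returned by any handler wins". Below is a self-contained sketch of that loop; struct intr is reduced to the three fields the loop touches, and run_shared() plus the isr_* helpers are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar return codes, ordered so that the numerically larger one wins. */
enum { UNHANDLED = 0, HANDLED_ENABLE = 1, HANDLED_NOENABLE = 2, PROPAGATE = 3 };

/* Reduced stand-in for the per-handler descriptor on a shared line. */
struct intr {
    int (*isr)(struct intr *self);
    unsigned long hits;
    struct intr *next;
};

/* Run every handler on the line and keep the maximum return value. */
static int run_shared(struct intr *intr)
{
    int s = UNHANDLED;

    while (intr) {
        int tmp = intr->isr(intr);
        assert(tmp >= UNHANDLED && tmp <= PROPAGATE);
        if (tmp > s)
            s = tmp;
        ++intr->hits;
        intr = intr->next;
    }
    return s;
}

/* Example handlers, one per interesting return code. */
static int isr_enable(struct intr *self)    { (void)self; return HANDLED_ENABLE; }
static int isr_noenable(struct intr *self)  { (void)self; return HANDLED_NOENABLE; }
static int isr_propagate(struct intr *self) { (void)self; return PROPAGATE; }
```

The caller would then chain the IRQ when the result is PROPAGATE, end it when the result is HANDLED_ENABLE, and leave the line alone otherwise, as in the loop above.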


--

Anders



Re: [Xenomai-core] More on Shared interrupts

2006-02-10 Thread Anders Blomdell

Jan Kiszka wrote:

Anders Blomdell wrote:


For the last few days, I have tried to figure out a good way to share
interrupts between RT and non-RT domains. This has included looking
through Dmitry's patch, correcting bugs and testing what is possible in
my specific case. I'll therefore try to summarize at least a few of my
thoughts.

1. When looking through Dmitry's patch I get the impression that the
iack handler has very little to do with each interrupt (the test
'prev->iack != intr->iack' is a dead giveaway), but is more of a
domain-specific function (or perhaps even just a placeholder for the
hijacked Linux ack-function).


2. Somewhat inspired by the figure in Life with Adeos, I have
identified the following cases:

 irq K  | --- | ---o|   // Linux only
 ...
 irq L  | ---o| |   // RT-only
 ...
 irq M  | ---o--- | ---o|   // Shared between domains
 ...
 irq N  | ---o---o--- | |   // Shared inside single domain
 ...
 irq O  | ---o---o--- | ---o|   // Shared between and inside single domain

Xenomai currently handles the K & L cases, Dmitry's patch addresses the N
case; with edge-triggered interrupts the M (and O after Dmitry's patch)
case(s) might be handled by returning RT_INTR_CHAINED | RT_INTR_ENABLE
from the interrupt handler; for level-triggered interrupts the M and O
cases can't be handled.



I guess you mean it the other way around: for the edge-triggered
cross-domain case we would actually have to loop over both the RT and
the Linux handlers until we are sure, that the IRQ line was released once.

I obviously have misunderstood edge triggered :-(


Luckily, I never saw such a scenario that was unavoidable (it hits you
with ISA hardware, which tends to have nice IRQ jumpers or other means -
it's just that you often cannot separate several controllers on the same
extension card IRQ-wise).



If one looks more closely at the K case (Linux-only interrupt), it works
as follows: when an interrupt occurs, the call to irq_end is postponed until
the Linux interrupt handler has run, i.e. further interrupts are disabled.
This can be seen as a lazy version of Philippe's idea of disabling all
non-RT interrupts until the RT-domain is idle, i.e. the interrupt is
disabled only if it indeed occurs.

If this idea should be generalized to the M (and O) case(s), one can't
rely on postponing the irq_end call (since the interrupt is still needed
in the RT-domain), but has to rely on some function that disables all
non-RT hardware that generates interrupts on that irq-line; such a
function naturally has to have intimate knowledge of all hardware that
can generate interrupts in order to be able to disable those interrupt
sources that are non-RT.

If we then take Jan's observation about the many (Linux-only) interrupts
present in an ordinary PC and add it to Philippe's idea of disabling all
non-RT interrupts while executing in the RT-domain, I think that the
following is a workable (and fairly efficient) way of handling this:

Add hardware-dependent enable/disable functions, where the enable is
called just before normal execution in a domain starts (i.e. when
playing back interrupts, the disable is still in effect), and disable is
called when normal domain execution ends. This effectively handles
the K case above, with the added benefit that NO non-RT interrupts will
occur during RT execution.

In the 8259 case, the disable function could look something like:

 void domain_irq_disable(uint irqmask) {
   if ((irqmask & 0xff00) != 0xff00) {
     irqmask &= ~0x0004; // Cascaded interrupt is still needed
     outb(irqmask >> 8, PIC_SLAVE_IMR);
   }
   outb(irqmask, PIC_MASTER_IMR);
 }
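
The mask arithmetic in the 8259 function above can be checked without touching any I/O port. The following sketch separates the computation from the outb() calls; struct pic_masks, compute_pic_masks() and mask_irq() are hypothetical helpers introduced only for this illustration, and a set bit is assumed to mean "IRQ masked", as in the 8259 interrupt mask register.

```c
#include <assert.h>
#include <stdint.h>

/* Values that would be written to the two 8259 mask registers. */
struct pic_masks {
    uint8_t master;      /* IRQ 0..7  */
    uint8_t slave;       /* IRQ 8..15 */
    int     write_slave; /* non-zero if the slave IMR needs to be written */
};

/* Return irqmask with the given IRQ (0..15) additionally masked. */
static uint16_t mask_irq(uint16_t irqmask, unsigned irq)
{
    assert(irq < 16);
    return (uint16_t)(irqmask | (1u << irq));
}

/*
 * Same logic as domain_irq_disable(): if at least one slave IRQ stays
 * unmasked, the cascade input (IRQ 2 on the master) must stay unmasked too.
 */
static struct pic_masks compute_pic_masks(uint16_t irqmask)
{
    struct pic_masks m;

    m.write_slave = (irqmask & 0xff00) != 0xff00;
    if (m.write_slave)
        irqmask &= (uint16_t)~0x0004; /* cascaded interrupt is still needed */
    m.master = (uint8_t)(irqmask & 0xff);
    m.slave  = (uint8_t)(irqmask >> 8);
    return m;
}
```

Note the parentheses around the mask test: without them, != binds tighter than & in C and the condition does not test what the comment describes.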

If we should extend this to handle the M (and O) case(s), the disable
function could look like:

 void domain_irq_disable(uint irqmask, shared_irq_t *shared[]) {
   int i;

   for (i = 0 ; i < MAX_IRQ ; i++) {
     if (shared[i]) {
       shared_irq_t *next = shared[i];
       irqmask &= ~(1 << i);
       while (next) {
         next->disable();
         next = next->next;
       }



This obviously means that all non-RT IRQ handlers sharing a line with
the RT domain would have to be registered in that shared[]-list. This
gets close to my old suggestion. Just raises the question how to
organise these interface, both on the RT and the Linux side.



     }
   }
   if ((irqmask & 0xff00) != 0xff00) {
     irqmask &= ~0x0004; // Cascaded interrupt is still needed
     outb(irqmask >> 8, PIC_SLAVE_IMR);
   }
   outb(irqmask, PIC_MASTER_IMR);
 }

An obvious optimization of the above scheme, is to never call the
disable (or enable) function for the RT-domain, since there all
interrupt processing is protected by the hardware.



Another point is to avoid looping over disable handlers for IRQs of
the K case. Otherwise, too many device-specific disable handlers would have
to be implemented even if only a single Linux device hogs a RT IRQ.

You only have to spin over those IRQs that are actually shared across domains
(probably just a few in most

Re: [Xenomai-core] [Combo-PATCH] Shared interrupts (final)

2006-02-09 Thread Anders Blomdell

Philippe Gerum wrote:

Jan Kiszka wrote:


Wolfgang Grandegger wrote:


Hello,

Dmitry Adamushko wrote:


Hi,

this is the final set of patches against the SVN trunk of 2006-02-03.

It addresses mostly remarks concerning naming (XN_ISR_ISA ->
XN_ISR_EDGE), a few cleanups and updated comments.

Functionally, the support for shared interrupts (a few flags) to the




Not directly your fault: the increasing number of return flags for IRQ
handlers makes me worry whether they are used correctly. I can figure out
what they mean (not yet that clearly from the docs), but does someone
else understand all this:

- RT_INTR_HANDLED



ISR says it has handled the IRQ, and does not want any propagation to 
take place down the pipeline. IOW, the IRQ processing stops there.
This says that the interrupt will be ->end'ed at some later time (perhaps in the 
interrupt handler task)



- RT_INTR_CHAINED



ISR says it wants the IRQ to be propagated down the pipeline. Nothing is 
said about the fact that the last ISR did or did not handle the IRQ 
locally; this is irrelevant.
This says that the interrupt will eventually be ->end'ed by some later stage in 
the pipeline.



- RT_INTR_ENABLE



ISR requests the interrupt dispatcher to re-enable the IRQ line upon 
return (cumulable with HANDLED/CHAINED).

This says that the interrupt will be ->end'ed when this interrupt handler 
returns.




- RT_INTR_NOINT



This new one comes from Dmitry's patch for shared IRQ support AFAICS. It 
would mean to continue processing the chain of handlers because the last 
ISR invoked was not concerned by the outstanding IRQ.

Sounds like RT_INTR_CHAINED, except that it's for the current pipeline stage?

Now for the quiz question (powerpc arch):

  1. Assume an edge triggered interrupt
  2. The RT-handler returns RT_INTR_CHAINED | RT_INTR_ENABLE (i.e. shared
     interrupt, but no problem since it's edge-triggered)
  3. Interrupt gets ->end'ed right after RT-handler has returned
  4. The Linux interrupt handler eventually runs its ->end() handler:
       local_irq_save_hw(flags);
       if (!(irq_desc[irq].status & (IRQ_DISABLED | IRQ_INPROGRESS)))
         ipipe_irq_unlock(irq);
       // Next interrupt occurs here!
       __ipipe_std_irq_dtype[irq].end(irq);
       local_irq_restore_hw(flags);


Wouldn't this lead to a lost interrupt? Or am I overly paranoid?
My distinct feeling is that the return value should be a scalar and not a set!
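
One way to read the "scalar, not a set" remark is as an enumeration in which
every legal outcome has its own name, so a handler cannot ask for ->end both
now and later. The following is only a sketch under that assumption; all
demo_ names are hypothetical and not part of the Xenomai API.

```c
#include <stdbool.h>

/* Hypothetical scalar dispositions; a handler returns exactly one. */
enum demo_isr_result {
    DEMO_ISR_NOINT,          /* not our interrupt, try the next handler   */
    DEMO_ISR_HANDLED,        /* handled, ->end is issued later by someone */
    DEMO_ISR_HANDLED_ENABLE, /* handled, ->end right after the handler    */
    DEMO_ISR_CHAINED         /* propagate, a later stage issues ->end     */
};

/* True if the dispatcher itself must end the IRQ when the handler returns. */
static bool demo_end_on_return(enum demo_isr_result r)
{
    return r == DEMO_ISR_HANDLED_ENABLE;
}

/* True if the IRQ must be passed on to the next pipeline stage. */
static bool demo_propagate(enum demo_isr_result r)
{
    return r == DEMO_ISR_CHAINED;
}
```

With such a type no single return value makes both predicates true, which is
the property the flag set cannot guarantee.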

...

I would vote for the (already scheduled?) extension to register an
optimised IRQ trampoline in case there is actually no sharing taking
place. This would also make the if (irq == XNARCH_TIMER_IRQ) path
obsolete.



I support that. Shared interrupts should be handled properly by Xeno 
since such - I'd say last resort - configuration could be needed; this 
said, we should not see this as the rule but rather as the exception, 
since this is basically required to solve some underlying hw limitations 
wrt interrupt management, and definitely has a significant cost on 
processing each shared IRQ wrt determinism.


Incidentally, there is an interesting optimization on the project's todo 
list 

Is this todo list accessible anywhere?

 that would allow non-RT interrupts to be masked at IC level when
the Xenomai domain is active. We could do that on any arch with 
civilized interrupt management, and that would prevent any asynchronous 
diversion from the critical code when Xenomai is running RT tasks 
(kernel or user-space). Think of this as some hw-controlled interrupt 
shield. Since this feature requires being able to individually mask each 
interrupt source at IC level, there should be no point in sharing fully 
vectored interrupts in such a configuration anyway. This fact also 
argues for having the shared IRQ support as a build-time option.


--
Anders Blomdell

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


[Xenomai-core] More on Shared interrupts

2006-02-09 Thread Anders Blomdell
For the last few days, I have tried to figure out a good way to share interrupts 
between RT and non-RT domains. This has included looking through Dmitry's patch, 
correcting bugs and testing what is possible in my specific case. I'll therefore 
try to summarize at least a few of my thoughts.


1. When looking through Dmitry's patch I get the impression that the iack 
handler has very little to do with each interrupt (the test 'prev->iack != 
intr->iack' is a dead giveaway), but is more of a domain-specific function (or 
perhaps even just a placeholder for the hijacked Linux ack-function).



2. Somewhat inspired by the figure in Life with Adeos, I have identified the 
following cases:


  irq K  | ---         | ---o |   // Linux only
  ...
  irq L  | ---o        |      |   // RT-only
  ...
  irq M  | ---o---     | ---o |   // Shared between domains
  ...
  irq N  | ---o---o--- |      |   // Shared inside single domain
  ...
  irq O  | ---o---o--- | ---o |   // Shared between and inside single domain

Xenomai currently handles the K & L cases, and Dmitry's patch addresses the N 
case. With edge-triggered interrupts, the M (and, after Dmitry's patch, O) 
case(s) might be handled by returning RT_INTR_CHAINED | RT_INTR_ENABLE from the 
interrupt handler; for level-triggered interrupts, the M and O cases can't be 
handled.
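
The five cases can be told apart by counting the handlers each domain has
registered on a line. A minimal sketch, assuming "shared inside single domain"
refers to several handlers in the RT domain (as in the figure); the demo_ names
are made up for illustration:

```c
/* Hypothetical classification of an IRQ line by handler counts per domain. */
enum demo_irq_case {
    DEMO_IRQ_UNUSED,
    DEMO_IRQ_K, /* Linux only */
    DEMO_IRQ_L, /* RT only */
    DEMO_IRQ_M, /* shared between domains */
    DEMO_IRQ_N, /* shared inside the RT domain */
    DEMO_IRQ_O  /* shared between and inside domains */
};

static enum demo_irq_case demo_classify(unsigned rt_handlers,
                                        unsigned linux_handlers)
{
    if (rt_handlers == 0)
        return linux_handlers ? DEMO_IRQ_K : DEMO_IRQ_UNUSED;
    if (linux_handlers == 0)
        return rt_handlers == 1 ? DEMO_IRQ_L : DEMO_IRQ_N;
    return rt_handlers == 1 ? DEMO_IRQ_M : DEMO_IRQ_O;
}
```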


If one looks more closely at the K case (Linux-only interrupt), it works as 
follows: when an interrupt occurs, the call to irq_end is postponed until the 
Linux interrupt handler has run, i.e. further interrupts are disabled. This can 
be seen as a lazy version of Philippe's idea of disabling all non-RT interrupts 
until the RT-domain is idle, i.e. the interrupt is disabled only if it indeed 
occurs.


If this idea should be generalized to the M (and O) case(s), one can't rely on 
postponing the irq_end call (since the interrupt is still needed in the 
RT-domain), but has to rely on some function that disables all non-RT hardware 
that generates interrupts on that irq-line; such a function naturally has to 
have intimate knowledge of all hardware that can generate interrupts in order to 
be able to disable those interrupt sources that are non-RT.


If we then take Jan's observation about the many (Linux-only) interrupts present 
in an ordinary PC and add it to Philippe's idea of disabling all non-RT 
interrupts while executing in the RT-domain, I think that the following is a 
workable (and fairly efficient) way of handling this:


Add hardware-dependent enable/disable functions, where enable is called just 
before normal execution in a domain starts (i.e. when playing back interrupts, 
the disable is still in effect), and disable is called when normal domain 
execution ends. This effectively handles the K case above, with the added 
benefit that NO non-RT interrupts will occur during RT execution.


In the 8259 case, the disable function could look something like:

  void domain_irq_disable(uint irqmask) {
    if ((irqmask & 0xff00) != 0xff00) {
      irqmask &= ~0x0004; // Cascaded interrupt is still needed
      outb(irqmask >> 8, PIC_SLAVE_IMR);
    }
    outb(irqmask, PIC_MASTER_IMR);
  }

If we should extend this to handle the M (and O) case(s), the disable function 
could look like:


  void domain_irq_disable(uint irqmask, shared_irq_t *shared[]) {
    int i;

    for (i = 0 ; i < MAX_IRQ ; i++) {
      if (shared[i]) {
        shared_irq_t *next = shared[i];
        irqmask &= ~(1 << i);
        while (next) {
          next->disable();
          next = next->next;
        }
      }
    }
    if ((irqmask & 0xff00) != 0xff00) {
      irqmask &= ~0x0004; // Cascaded interrupt is still needed
      outb(irqmask >> 8, PIC_SLAVE_IMR);
    }
    outb(irqmask, PIC_MASTER_IMR);
  }

An obvious optimization of the above scheme is to never call the disable (or 
enable) function for the RT-domain, since there all interrupt processing is 
protected by the hardware.
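
To make the effect on the two 8259 mask registers concrete, here is a
user-space sketch of the same scheme with outb() replaced by a stub that just
records the written values. Everything prefixed demo_ is hypothetical; 0x21 and
0xa1 are the conventional PC ports for the master and slave IMR, and a set bit
in irqmask is taken to mean "mask this IRQ".

```c
#include <stdint.h>

#define DEMO_MAX_IRQ 16
#define DEMO_PIC_MASTER_IMR 0x21
#define DEMO_PIC_SLAVE_IMR  0xa1

/* Stand-in for outb(): remembers the last value written to each IMR. */
static uint8_t demo_master_imr, demo_slave_imr;
static void demo_outb(uint8_t value, uint16_t port)
{
    if (port == DEMO_PIC_MASTER_IMR)
        demo_master_imr = value;
    else if (port == DEMO_PIC_SLAVE_IMR)
        demo_slave_imr = value;
}

typedef struct demo_shared_irq {
    void (*disable)(void);        /* device-specific "stop interrupting" */
    struct demo_shared_irq *next;
} demo_shared_irq_t;

static int demo_uart_disabled;
static void demo_uart_disable(void) { demo_uart_disabled++; }

static void demo_domain_irq_disable(unsigned irqmask,
                                    demo_shared_irq_t *shared[])
{
    for (int i = 0; i < DEMO_MAX_IRQ; i++) {
        /* Shared lines stay unmasked; silence the non-RT devices instead. */
        for (demo_shared_irq_t *n = shared[i]; n; n = n->next) {
            irqmask &= ~(1u << i);
            n->disable();
        }
    }
    if ((irqmask & 0xff00) != 0xff00) {
        irqmask &= ~0x0004u; /* keep the cascade (IRQ2) open */
        demo_outb((uint8_t)(irqmask >> 8), DEMO_PIC_SLAVE_IMR);
    }
    demo_outb((uint8_t)(irqmask & 0xff), DEMO_PIC_MASTER_IMR);
}
```

For example, masking all 16 lines while IRQ 10 is shared with a single non-RT
device leaves IRQ 10 and the cascade open (both registers read 0xfb) and calls
the device's disable hook once.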


Comments, anyone?

--

Anders




[Xenomai-core] [PATCH] Slow is faster arch/ppc/syslib/open_pic.c

2006-02-07 Thread Anders Blomdell
When trying to run Xenomai on PowerPC with OpenPIC, I have (finally) found that 
interrupt latency is much improved with the following patch:




--- arch/ppc/syslib/open_pic.c~ 2006-01-08 03:15:24.0 +0100
+++ arch/ppc/syslib/open_pic.c  2006-02-07 16:56:14.0 +0100
@@ -820,7 +820,7 @@
  */
 static void openpic_ack_irq(unsigned int irq_nr)
 {
-#ifdef __SLOW_VERSION__
+#if defined(__SLOW_VERSION__) || defined(CONFIG_IPIPE)
openpic_disable_irq(irq_nr);
openpic_eoi();
 #else
@@ -831,7 +831,7 @@

 static void openpic_end_irq(unsigned int irq_nr)
 {
-#ifdef __SLOW_VERSION__
+#if defined(__SLOW_VERSION__) || defined(CONFIG_IPIPE)
        if (!(irq_desc[irq_nr].status & (IRQ_DISABLED|IRQ_INPROGRESS))
            && irq_desc[irq_nr].action)
                openpic_enable_irq(irq_nr);



The reason for this is that the fast version doesn't call openpic_eoi until the 
interrupt is ended, which means that all RT interrupts are delayed by a pending 
Linux interrupt.


--

Regards

Anders Blomdell



[Xenomai-core] [BUG] problems with adeos-ipipe-2.6.14-ppc-1.2-00.patch

2006-02-06 Thread Anders Blomdell

When trying to patch with the latest version of this patch, I get:

patching file include/asm-ppc/ipipe.h
Hunk #1 FAILED at 1.
Hunk #2 FAILED at 149.
Hunk #3 FAILED at 160.
Hunk #4 FAILED at 195.

Problem seems to be at line 4168 in the patch, where it says

@@ -0,1 +1,179 @@

but the old [working] patch said

@@ -0,0 +1,178 @@

Seems like the patch was created against a not totally clean distribution.
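
In unified diff format a hunk that creates a file has an empty old range,
written "-0,0"; a header with "-0,1" claims that one line already exists in the
old file, which is why the hunks are rejected on a clean tree. A small sketch
of that check (the function name is made up for illustration):

```c
#include <stdio.h>
#include <stdbool.h>

/* Parse "@@ -a,b +c,d @@" and report whether the hunk describes the
 * creation of a new file, i.e. an empty old range "-0,0". */
static bool demo_hunk_creates_file(const char *header)
{
    unsigned old_start, old_count, new_start, new_count;

    if (sscanf(header, "@@ -%u,%u +%u,%u @@",
               &old_start, &old_count, &new_start, &new_count) != 4)
        return false;
    return old_start == 0 && old_count == 0;
}
```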

--

Regards

Anders Blomdell






Re: [Xenomai-core] some results on my laptop

2006-02-03 Thread Anders Blomdell

Jan Kiszka wrote:

Jan Kiszka wrote:


...
What about other time sources on x86? Which systems already have HPET
these days, and does this source not suffer from frequency scaling? I
once read that HPET is quite easy to program, is this true? IOW, would
it be worth considering to add this to the HAL?



There are actually only few registers:

http://www.intel.com/hardwaredesign/hpetspec_1.pdf

Even a replacement for the TSC is available (Main Counter), but I
guess that some effort will be required to replace all direct usages of
rdtsc in the current Xenomai code, right?
And unfortunately they aren't guaranteed to survive S3 sleep, which laptops 
spend a lot of time in (around 50% when doing control at 100 Hz).


--
Anders



[Xenomai-core] [BUG] version mismatch

2006-02-01 Thread Anders Blomdell

in ksrc/arch/powerpc/patches/adeos-ipipe-2.6.14-ppc-1.2-00.patch:

  #define IPIPE_ARCH_STRING "1.1-02"

shouldn't this be

  #define IPIPE_ARCH_STRING "1.2-00"

--

Anders Blomdell



[Xenomai-core] Are XN_ISR_CHAINED and XN_ISR_ENABLE mutually exclusive?

2006-02-01 Thread Anders Blomdell
While looking into how to implement sharing of interrupts between realtime and 
non-realtime domains (and applying Wolfgang Grandegger's patch 
[https://mail.gna.org/public/xenomai-core/2006-01/msg00233.html], which is 
necessary to make XN_ISR_ENABLE work at all on the PowerPC platform), I'm 
beginning to think that XN_ISR_CHAINED and XN_ISR_ENABLE are mutually exclusive, 
since if both are set, desc->handler->end will be called twice:


  1. When the realtime isr handler returns
  2. When the Linux domain calls it in __do_IRQ

In the solution I have in mind at the moment, I will:

  1. Add an extra iend handler argument to xnintr_init
  2. If XN_ISR_ENABLE is returned from the isr handler,
     replace desc->handler->end with the user supplied
     iend handler.

This way I hope to be able to handle interrupts shared between the realtime and 
non-realtime domains, without having the realtime domain wait for all 
non-realtime interrupts to finish. This is the scenario I'm thinking of:


  1. A non-RT interrupt occurs
  2. The (RT) isr handler detects the non-RT interrupt,
     disables further non-RT interrupts on that irq-vector, replaces
     desc->handler->end with the user supplied iend handler, and
     returns XN_ISR_CHAINED | XN_ISR_ENABLE.
  3. RT interrupts are serviced by the (RT) isr handler, which
     returns XN_ISR_ENABLE
  4. The Linux domain gets a chance to run the chained interrupt,
     and eventually calls desc->handler->end (supplied iend handler)
  5. The iend handler reenables non-RT interrupts.
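
The five steps can be walked through in a toy model. This is only a sketch of
the proposal, not existing Xenomai code: the demo_ names and the flag values
are hypothetical, and the "line" structure merely stands in for the IRQ
descriptor and its ->end hook.

```c
#include <stdbool.h>

#define DEMO_ISR_CHAINED 0x1 /* hypothetical values, not Xenomai's */
#define DEMO_ISR_ENABLE  0x2

struct demo_line {
    bool nrt_source_enabled;         /* device-level enable of the non-RT source */
    bool linux_pending;              /* IRQ has been chained to the Linux domain */
    void (*end)(struct demo_line *); /* plays the role of desc->handler->end */
};

static void demo_default_end(struct demo_line *l) { (void)l; }

/* User-supplied iend handler (step 5): re-enable the non-RT source. */
static void demo_iend(struct demo_line *l)
{
    l->nrt_source_enabled = true;
}

/* RT isr handler (steps 2 and 3). */
static int demo_rt_isr(struct demo_line *l, bool from_nrt_device)
{
    if (from_nrt_device) {
        l->nrt_source_enabled = false; /* silence the non-RT device */
        l->end = demo_iend;            /* Linux will call this later */
        l->linux_pending = true;
        return DEMO_ISR_CHAINED | DEMO_ISR_ENABLE;
    }
    return DEMO_ISR_ENABLE;            /* pure RT interrupt */
}

/* Linux domain (step 4): run the chained IRQ, then call ->end. */
static void demo_linux_run(struct demo_line *l)
{
    if (l->linux_pending) {
        l->linux_pending = false;
        l->end(l);
        l->end = demo_default_end;
    }
}
```

The point of the model is that RT interrupts keep being serviced between steps
2 and 4, while the non-RT source stays silent until Linux has run.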

Comments on the above are most welcome!

--

Anders Blomdell



Re: [Xenomai-core] Are XN_ISR_CHAINED and XN_ISR_ENABLE mutually exclusive?

2006-02-01 Thread Anders Blomdell

Anders Blomdell wrote:

Jan Kiszka wrote:


Anders Blomdell wrote:


While looking into how to implement sharing of interrupts between
realtime and non-realtime domains (and applying Wolfgang Grandegger's
patch [https://mail.gna.org/public/xenomai-core/2006-01/msg00233.html],
which is necessary to make XN_ISR_ENABLE work at all on the PowerPC
platform), I'm beginning to think that XN_ISR_CHAINED and XN_ISR_ENABLE
are mutually exclusive, since if both are set, desc->handler->end will
be called twice:

 1. When the realtime isr handler returns
 2. When the Linux domain calls it in __do_IRQ




Yes, those bits are semantically exclusive. Actually, I think passing
both bits could even cause deadlocks if the RT-IRQ is raised again
before the non-RT handler got a chance to clear the IRQ source in 
hardware.


My impression as well, but it's nowhere documented, nor enforced in the 
code.
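
Enforcing it could be as small as a check in the dispatcher. A sketch, with
hypothetical flag values and names (the real XN_ISR_* values are not used
here):

```c
#include <stdbool.h>

#define DEMO_ISR_HANDLED 0x1 /* hypothetical bit values */
#define DEMO_ISR_CHAINED 0x2
#define DEMO_ISR_ENABLE  0x4

/* Reject a return value that asks for ->end both now (ENABLE) and
 * later in the pipeline (CHAINED). */
static bool demo_isr_flags_valid(int flags)
{
    const int both = DEMO_ISR_CHAINED | DEMO_ISR_ENABLE;

    return (flags & both) != both;
}
```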






In the solution I have in mind at the moment, I will:

 1. Add an extra iend handler argument to xnintr_init
 2. If XN_ISR_ENABLE is returned from the isr handler,
replace desc->handler->end with the user supplied
iend handler.

Hereby I hope to be able to handle interrupts shared between realtime
and non-realtime domain, without having the realtime domain wait for all
non-realtime interrupts to finish. This is the scenario I'm thinking of:

 1. A non-RT interrupt occurs
 2. The (RT) isr handler detects the non-RT interrupt,
disables further non-RT interrupts on that irq-vector, replaces




This remains vague to me. How precisely will you disable? I guess at
hardware level, i.e. in a (non-RT) device-specific way: switch off the
bit in some hardware register that says this device can produce IRQs,
right?


Yes.





desc->handler->end with the user supplied iend handler,
returns XN_ISR_CHAINED | XN_ISR_ENABLE.
 3. RT interrupts are serviced by the (RT) isr handler,
returns XN_ISR_ENABLE
 4. The Linux domain gets a chance to run the chained interrupt,
and eventually calls desc->handler->end (supplied iend handler)
 5. The iend handler reenables non-RT interrupts.




Then this would switch on that bit again? Note that this may require to
synchronise the hardware access with parts of the non-RT driver.


If the non-RT driver sets that bit in its ISR routine, yes. I have the 
(overly optimistic?) view that the non-RT ISR only does whatever is 
necessary to clear the interrupt and leaves the enable/disable bits 
untouched.
Or perhaps the whole concept is of no interest to others, and I should put this 
arbitration in the platform-specific part (arch/ppc/platform/prpmc800.c), 
consider the Harrier chip a cascaded interrupt controller, and handle it as such?


--

Anders Blomdell




[Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Anders Blomdell
On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the following if 
the interrupt handler takes too long (i.e. the next interrupt gets generated 
before the previous one has finished):


[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60


Any ideas of where to look?

Regards

Anders Blomdell





[Xenomai-core] [BUG?] dead code in ipipe_grab_irq

2006-01-30 Thread Anders Blomdell
In the following code (ppc), shouldn't 'first' be either declared static or 
deleted? To me it looks like 'first' is always equal to one when the else clause 
is evaluated.


asmlinkage int __ipipe_grab_irq(struct pt_regs *regs)
{
    extern int ppc_spurious_interrupts;
    ipipe_declare_cpuid;
    int irq, first = 1;

    if ((irq = ppc_md.get_irq(regs)) >= 0) {
        __ipipe_handle_irq(irq, regs);
        first = 0;
    } else if (irq != -2 && first)
        ppc_spurious_interrupts++;

    ipipe_load_cpuid();

    return (ipipe_percpu_domain[cpuid] == ipipe_root_domain &&
            !test_bit(IPIPE_STALL_FLAG,
                      &ipipe_root_domain->cpudata[cpuid].status));
}
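
Reduced to the branch in question, the observation can be checked directly:
'first' is only cleared inside the 'if' arm, so the 'else if' always sees 1 and
the test collapses to irq != -2. The two functions below are an illustrative
model with made-up names, not the kernel code itself.

```c
/* Model of the original branch: returns 1 if the spurious counter
 * would be incremented for this irq value. */
static int demo_spurious_original(int irq)
{
    int first = 1, spurious = 0;

    if (irq >= 0)
        first = 0;
    else if (irq != -2 && first)
        spurious++;
    return spurious;
}

/* Same behaviour with the variable removed. */
static int demo_spurious_simplified(int irq)
{
    return (irq < 0 && irq != -2) ? 1 : 0;
}
```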


Regards

Anders Blomdell





Re: [Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Anders Blomdell

Jan Kiszka wrote:

Anders Blomdell wrote:


On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the
following if the interrupt handler takes too long (i.e. next interrupt
gets generated before the previous one has finished)

[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60




I think some probably important information is missing above this
back-trace. 

You are so right!

 What does the kernel state before these lines?

[   42.346643] BUG: spinlock recursion on CPU#0, swapper/0
[   42.415438]  lock: c01c943c, .magic: dead4ead, .owner: swapper/0, 
.owner_cpu: 0
[   42.511681] Call trace:
[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60
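
For readers unfamiliar with the message: the kernel's spinlock debugging
reports "recursion" when a lock is taken by the context that already owns it.
In the trace, __ipipe_ack_irq appears to be entered from an interrupt that
arrived while __do_IRQ was running on the same CPU; that reading is an
assumption. A toy user-space model of the owner check, with hypothetical names:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy debug lock: remembers its owner so re-acquisition can be detected. */
struct demo_lock {
    bool locked;
    const void *owner; /* identifies the task or CPU holding the lock */
};

enum demo_lock_status { DEMO_LOCK_OK, DEMO_LOCK_RECURSION, DEMO_LOCK_BUSY };

static enum demo_lock_status demo_lock_acquire(struct demo_lock *l,
                                               const void *self)
{
    if (l->locked)
        return l->owner == self ? DEMO_LOCK_RECURSION : DEMO_LOCK_BUSY;
    l->locked = true;
    l->owner = self;
    return DEMO_LOCK_OK;
}

static void demo_lock_release(struct demo_lock *l)
{
    l->locked = false;
    l->owner = NULL;
}
```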


It might be that the problem is related to the fact that the interrupt is a 
shared one (Harrier chip, Functional Exception) that is used for both 
message-passing (should be RT) and UART (Linux, i.e. non-RT). My current IRQ 
handler always pends the interrupt to the Linux domain (RTDM_IRQ_PROPAGATE), 
because all other attempts (RTDM_IRQ_ENABLE when it wasn't a UART interrupt) 
have left the interrupts turned off.


What I believe should be done is:

  1. When UART interrupt is received, disable further non-RT interrupts
 on this IRQ-line, pend interrupt to Linux.
  2. Handle RT interrupts on this IRQ line
  3. When Linux has finished the pended interrupt, reenable non-RT interrupts.

but I have neither been able to achieve this, nor to verify that it is the right 
thing to do...


Regards

Anders Blomdell




Re: [Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Anders Blomdell

Jan Kiszka wrote:

Anders Blomdell wrote:


Jan Kiszka wrote:


Anders Blomdell wrote:



On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the
following if the interrupt handler takes too long (i.e. next interrupt
gets generated before the previous one has finished)

[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60




I think some probably important information is missing above this
back-trace. 


You are so right!



What does the kernel state before these lines?


[   42.346643] BUG: spinlock recursion on CPU#0, swapper/0
[   42.415438]  lock: c01c943c, .magic: dead4ead, .owner: swapper/0,
.owner_cpu: 0
[   42.511681] Call trace:
[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60


It might be that the problem is related to the fact that the interrupt
is a shared one (Harrier chip, Functional Exception), that is used for
both message-passing (should be RT) and UART (Linux, i.e. non-RT), my
current IRQ handler always pends the interrupt to the linux domain
(RTDM_IRQ_PROPAGATE), because all other attempts (RTDM_IRQ_ENABLE when
it wasn't a UART interrupt) has left the interrupts turned off.

What I believe should be done, is

 1. When UART interrupt is received, disable further non-RT interrupts
on this IRQ-line, pend interrupt to Linux.
 2. Handle RT interrupts on this IRQ line
 3. When Linux has finished the pended interrupt, reenable non-RT
interrupts.

but I have neither been able to achieve this, nor to verify that it is
the right thing to do...



Your approach is basically what I proposed some years back on rtai-dev
for handling unresolvable shared RT/NRT IRQs. I once successfully tested
such a setup with two network cards, one RT, the other Linux.

So when you are really doomed and cannot change the IRQ line of your RT
device, this is a kind of emergency workaround. Not nice and generic
(you have to write the stub for disabling the NRT IRQ source), but it
should work.

I'm doomed, the interrupts live in the same chip...
The problem is that I have not found any good place to reenable the non-RT 
interrupts.



Anyway, I do not understand what made your spinlock recurse. This shared
IRQ scenario should only cause indeterminism to the RT driver (by
blocking the line until the Linux handler can release it), but it must
not trigger this bug.

OK, seems like I have two problems then; I'll try to hunt them down.


/Anders



[Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Anders Blomdell
On a PrPMC800 (PPC 7410 processor) withe Xenomai-2.1-rc2, I get the following if 
the interrupt handler takes too long (i.e. next interrupt gets generated before 
the previous one has finished)


[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60


Any ideas of where to look?

Regards

Anders Blomdell





[Xenomai-core] [BUG?] dead code in ipipe_grab_irq

2006-01-30 Thread Anders Blomdell
In the following code (ppc), shouldn't first be either declared static or 
deleted? To me it looks like first is always equal to one when the else clause 
is evaluated.


asmlinkage int __ipipe_grab_irq(struct pt_regs *regs)
{
        extern int ppc_spurious_interrupts;
        ipipe_declare_cpuid;
        int irq, first = 1;

        if ((irq = ppc_md.get_irq(regs)) >= 0) {
                __ipipe_handle_irq(irq, regs);
                first = 0;
        } else if (irq != -2 && first)
                ppc_spurious_interrupts++;

        ipipe_load_cpuid();

        return (ipipe_percpu_domain[cpuid] == ipipe_root_domain &&
                !test_bit(IPIPE_STALL_FLAG,
                          &ipipe_root_domain->cpudata[cpuid].status));
}
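
To make the question about first concrete, here is a minimal userspace sketch
(not kernel code; grab_auto, grab_static and replay are made-up names, and the
IRQ values are arbitrary). It replays the same sequence of get_irq() results
through an automatic and a static first and counts what each variant would
report as spurious:

```c
#include <stddef.h>

/* Same shape as the quoted function: 'first' is an automatic variable,
 * so it is set to 1 again on every call. */
static void grab_auto(int irq, int *spurious)
{
        int first = 1;

        if (irq >= 0)
                first = 0;              /* lost when the function returns */
        else if (irq != -2 && first)    /* 'first' is always 1 here */
                (*spurious)++;
}

/* Variant with a static 'first': once a valid IRQ has been seen,
 * it stays 0 for all later calls. */
static void grab_static(int irq, int *spurious)
{
        static int first = 1;

        if (irq >= 0)
                first = 0;
        else if (irq != -2 && first)
                (*spurious)++;
}

/* Feed the same sequence of (hypothetical) get_irq() results to both. */
static void replay(const int *irqs, size_t n,
                   int *spurious_auto, int *spurious_static)
{
        size_t i;

        for (i = 0; i < n; i++) {
                grab_auto(irqs[i], spurious_auto);
                grab_static(irqs[i], spurious_static);
        }
}
```

Replaying the results -1, 5, -1, -2, -1 counts three spurious interrupts with
the automatic variable and one with the static one, so in the automatic case
the test on first changes nothing. Which behaviour was intended I cannot tell
from the code alone.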


Regards

Anders Blomdell





Re: [Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Anders Blomdell

Jan Kiszka wrote:

Anders Blomdell wrote:


On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the
following if the interrupt handler takes too long (i.e. the next interrupt
gets generated before the previous one has finished):

[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60




I think some probably important information is missing above this
back-trace. 

You are so right!

 What does the kernel state before these lines?

[   42.346643] BUG: spinlock recursion on CPU#0, swapper/0
[   42.415438]  lock: c01c943c, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 0
[   42.511681] Call trace:
[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60
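
As a reading aid for the "spinlock recursion" line: my understanding is that
the lock debugging complains when a CPU tries to take a lock it already owns.
The following userspace model is only a sketch under that assumption (dbg_lock,
dbg_lock_acquire and nested_irq are hypothetical names, not the kernel's
implementation). It shows how a second interrupt entry on the same CPU, arriving
before the first one has released the lock, would be reported:

```c
#include <stddef.h>

/* Hypothetical debug lock that remembers which CPU holds it. */
struct dbg_lock {
        int locked;
        int owner_cpu;
};

enum lock_result { LOCK_OK, LOCK_RECURSION, LOCK_CONTENDED };

/* Try to take the lock; report recursion if this CPU already owns it. */
static enum lock_result dbg_lock_acquire(struct dbg_lock *l, int cpu)
{
        if (l->locked)
                return l->owner_cpu == cpu ? LOCK_RECURSION : LOCK_CONTENDED;
        l->locked = 1;
        l->owner_cpu = cpu;
        return LOCK_OK;
}

static void dbg_lock_release(struct dbg_lock *l)
{
        l->locked = 0;
        l->owner_cpu = -1;
}

/* Models an interrupt path that takes the lock and is interrupted again
 * 'depth' times on the same CPU before it gets to release it. */
static enum lock_result nested_irq(struct dbg_lock *l, int cpu, int depth)
{
        enum lock_result r = dbg_lock_acquire(l, cpu);

        if (r != LOCK_OK)
                return r;               /* inner entry: lock already ours */
        if (depth > 0)
                r = nested_irq(l, cpu, depth - 1);
        dbg_lock_release(l);
        return r;
}
```

With depth 0 the model returns LOCK_OK; with depth 1 the inner entry returns
LOCK_RECURSION, which is at least consistent with a trace where
__ipipe_ack_irq reaches _raw_spin_lock while __do_IRQ is still on the stack.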


The problem might be related to the fact that the interrupt is a shared one
(Harrier chip, Functional Exception) that is used for both message-passing
(should be RT) and UART (Linux, i.e. non-RT). My current IRQ handler always
pends the interrupt to the Linux domain (RTDM_IRQ_PROPAGATE), because all other
attempts (RTDM_IRQ_ENABLE when it wasn't a UART interrupt) have left the
interrupts turned off.


What I believe should be done is:

  1. When UART interrupt is received, disable further non-RT interrupts
 on this IRQ-line, pend interrupt to Linux.
  2. Handle RT interrupts on this IRQ line
  3. When Linux has finished the pended interrupt, reenable non-RT interrupts.

but I have neither been able to achieve this, nor to verify that it is the right 
thing to do...
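
The three steps above can be sketched as a small state machine. This is a
userspace model only, under my own assumptions: shared_line, SRC_RT, SRC_NRT,
rt_handler and linux_handler are hypothetical names, the "enabled" field stands
in for whatever mask register the chip offers, and no RTDM call is used.

```c
#include <stdbool.h>

/* Hypothetical model of one chip with two sources on a single IRQ line. */
enum { SRC_RT = 1 << 0, SRC_NRT = 1 << 1 };

struct shared_line {
        unsigned pending;       /* sources currently requesting service */
        unsigned enabled;       /* sources allowed to assert the line */
        bool linux_pended;      /* an interrupt is waiting for Linux */
        int rt_handled;         /* RT events served so far */
        int nrt_handled;        /* non-RT events served so far */
};

/* The line is asserted while any enabled source is pending. */
static bool line_asserted(const struct shared_line *l)
{
        return (l->pending & l->enabled) != 0;
}

/* Steps 1 and 2: handler running in the RT domain. */
static void rt_handler(struct shared_line *l)
{
        unsigned active = l->pending & l->enabled;

        if (active & SRC_NRT) {
                l->enabled &= ~SRC_NRT; /* step 1: mask the non-RT source */
                l->linux_pended = true; /* ... and pend it to Linux */
        }
        if (active & SRC_RT) {
                l->pending &= ~SRC_RT;  /* step 2: serve the RT source */
                l->rt_handled++;
        }
}

/* Step 3: handler running later in the Linux domain. */
static void linux_handler(struct shared_line *l)
{
        if (!l->linux_pended)
                return;
        l->pending &= ~SRC_NRT;         /* serve the non-RT source */
        l->nrt_handled++;
        l->linux_pended = false;
        l->enabled |= SRC_NRT;          /* re-enable non-RT interrupts */
}
```

In this model the line is quiet after rt_handler even though the UART event is
still unserved, so further RT events get through before Linux has run. The open
question from above remains where, in a real driver, the last line of
linux_handler would have to live.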


Regards

Anders Blomdell




[Xenomai-core] [PATCH] Fix to RTDM open problems

2006-01-27 Thread Anders Blomdell

When RTDM is exposed to code like this:

  device1 = rt_dev_open(some_device, O_RDWR);
  device2 = rt_dev_open(some_device, O_RDWR);

I get a SEGFAULT, which I attribute to a missing assignment to context_ptr in
the case when the device is already busy. The lack of this assignment leads to
a segfault in cleanup_instance.



--- xenomai-2.1-rc2/ksrc/skins/rtdm/core.c~ 2006-01-07 18:08:34.0 +0100
+++ xenomai-2.1-rc2/ksrc/skins/rtdm/core.c  2006-01-27 11:14:43.0 +0100
@@ -136,6 +136,8 @@

         if (context->device) {
                 xnlock_put_irqrestore(&rt_dev_lock, s);
+
+                *context_ptr = NULL;
                 return -EBUSY;
         }
         context->device = device;
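
The failure mode can be shown with a small userspace model. This is a sketch
under my own assumptions, not the RTDM code: create_instance, dev_open and the
ctx structure are simplified stand-ins, and stack_garbage plays the role of
whatever an unassigned pointer happens to contain. The 'fixed' flag selects
whether the assignment from the patch is performed.

```c
#include <errno.h>
#include <stddef.h>

struct ctx { int in_use; };

static struct ctx the_ctx;          /* the single device instance */
static struct ctx stack_garbage;    /* models an unassigned pointer value */
static int bad_cleanups;            /* cleanups that got a garbage pointer */

/* Hands out the instance, or fails with -EBUSY if it is already taken.
 * Without the fix, *context_ptr is left untouched on the error path. */
static int create_instance(struct ctx **context_ptr, int fixed)
{
        if (the_ctx.in_use) {
                if (fixed)
                        *context_ptr = NULL;    /* the assignment the patch adds */
                return -EBUSY;
        }
        the_ctx.in_use = 1;
        *context_ptr = &the_ctx;
        return 0;
}

/* Releases a context; a NULL pointer means there is nothing to clean up. */
static void cleanup_instance(struct ctx *context)
{
        if (!context)
                return;
        if (context == &stack_garbage) {
                bad_cleanups++;         /* in real code: a wild dereference */
                return;
        }
        context->in_use = 0;
}

/* Open path: on failure, clean up whatever create_instance left behind. */
static int dev_open(int fixed)
{
        struct ctx *context = &stack_garbage;
        int ret = create_instance(&context, fixed);

        if (ret < 0)
                cleanup_instance(context);
        return ret;
}
```

In this model a second dev_open(0) returns -EBUSY and hands the garbage pointer
to cleanup_instance, while dev_open(1) returns -EBUSY with a NULL context and
cleanup does nothing.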


[Xenomai-core] [BUG] Missing DESTDIR?

2006-01-26 Thread Anders Blomdell

In a lot of the Makefile.in files in 2.1-rc2 there are lines like:

  test -z $(somedir) || $(mkdir_p) $(DESTDIR)$(somedir)

Shouldn't they read:

  test -z $(DESTDIR)$(somedir) || $(mkdir_p) $(DESTDIR)$(somedir)


Best regards

Anders Blomdell

