Re: [Xenomai-core] Analogy/mite
On 12/08/2011 05:19 PM, Anders Blomdell wrote:
> On 12/07/2011 08:58 AM, Anders Blomdell wrote:
>> On 12/06/2011 11:47 PM, Alexis Berlemont wrote:
>>> Hi
>>>
>>> On Thu, Dec 1, 2011 at 4:03 PM, Anders Blomdell
>>> anders.blomd...@control.lth.se wrote:
>>>> On 11/30/2011 07:03 PM, Anders Blomdell wrote:
>>>>> Hi,
>>>>>
>>>>> just found that
>>>>>
>>>>>   echo :06:01.0 > /sys/bus/pci/drivers/analogy_mite/unbind
>>>>>
>>>>> does not do the same thing as
>>>>>
>>>>>   analogy_config -r analogyN
>>>>>
>>>>> In fact, it leaves the system in a state where using the driver
>>>>> results in a kernel OOPS. Will try to look into it further tomorrow...
>>>>
>>>> OK, seems like we have some interrupt cleanup problem, the following
>>>> command sequence:
>>>
>>> OK, thank you for the report. I did not have time to look at it yet but
>>> that will be done soon. Is it blocking for you?
>>
>> Yes, and even worse is this problem:
>>
>>   # /usr/local/sbin/analogy_config analogy0 analogy_ni_pcimio 6,1
>>   # /usr/local/sbin/analogy_config -r analogy0
>>   # cat /proc/xenomai/irq
>>   Killed
>>
>> I was looking into it last week, but have been at a workshop since
>> Monday; will get back to this tomorrow.
>
> Seems like somebody is stomping out
> dev->transfer.irq_desc.rtdm_desc.flags between attach and detach (flags
> and all fields in its vicinity are zeroed out), hence the interrupt is
> never removed from the interrupt handler tables, wreaking havoc with the
> entire kernel.

Found the guilty party: a4l_cleanup_transfer, which zeroes out all the interrupt data just before the interrupt should be detached. Somebody is being overzealous about keeping memory shiningly clean. We need to keep the useful dirt.

--- xenomai-2.6.0/ksrc/drivers/analogy/transfer.c.orig 2011-12-09 11:22:06.961999598 +0100
+++ xenomai-2.6.0/ksrc/drivers/analogy/transfer.c 2011-12-09 11:22:29.723999243 +0100
@@ -92,8 +92,6 @@
 		rtdm_free(tsf->subds);
 	}
-	memset(tsf, 0, sizeof(a4l_trf_t));
-
 	return 0;
 }

/Anders

--
Anders Blomdell                  Email: anders.blomd...@control.lth.se
Department of Automatic Control
Lund University                  Phone: +46 46 222 4625
P.O. Box 118                     Fax:   +46 46 138118
SE-221 00 Lund, Sweden

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
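For readers who want to see the ordering problem in isolation, here is a minimal user-space sketch in C. It is not the Analogy code: the structures, the IRQ_REGISTERED flag, the handler table and all function names are made up for illustration. It only models the situation described above, where a cleanup routine wipes the descriptor that the later detach step still needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_IRQS 32
#define IRQ_REGISTERED 0x1u /* hypothetical flag standing in for the real descriptor flags */

struct irq_desc_model { unsigned int irq; unsigned int flags; };
struct transfer_model { struct irq_desc_model irq_desc; int nb_subd; };

/* Stand-in for the interrupt handler tables. */
static struct irq_desc_model *irq_table[MAX_IRQS];

static int model_irq_attach(struct transfer_model *tsf, unsigned int irq)
{
	if (irq >= MAX_IRQS || irq_table[irq] != NULL)
		return -1;
	tsf->irq_desc.irq = irq;
	tsf->irq_desc.flags = IRQ_REGISTERED;
	irq_table[irq] = &tsf->irq_desc;
	return 0;
}

static int model_irq_detach(struct transfer_model *tsf)
{
	/* With the flags wiped, detach believes nothing was registered. */
	if (!(tsf->irq_desc.flags & IRQ_REGISTERED))
		return -1;
	irq_table[tsf->irq_desc.irq] = NULL;
	tsf->irq_desc.flags = 0;
	return 0;
}

static void model_cleanup_transfer(struct transfer_model *tsf, bool wipe)
{
	tsf->nb_subd = 0;
	if (wipe)
		memset(tsf, 0, sizeof(*tsf)); /* the overzealous step */
}

/* Returns true if a stale entry is left in the handler table. */
bool stale_after_remove(bool wipe)
{
	struct transfer_model tsf = {0};
	bool stale;

	model_irq_attach(&tsf, 22);
	model_cleanup_transfer(&tsf, wipe); /* cleanup runs before detach */
	model_irq_detach(&tsf);
	stale = irq_table[22] != NULL;
	irq_table[22] = NULL; /* reset the model for the next run */
	return stale;
}
```

Calling stale_after_remove(true) models the behaviour before the patch, and stale_after_remove(false) the behaviour with the memset removed.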
Re: [Xenomai-core] Analogy/mite
On 11/30/2011 07:03 PM, Anders Blomdell wrote:
> Hi,
>
> just found that
>
>   echo :06:01.0 > /sys/bus/pci/drivers/analogy_mite/unbind
>
> does not do the same thing as
>
>   analogy_config -r analogyN
>
> In fact, it leaves the system in a state where using the driver results
> in a kernel OOPS. Will try to look into it further tomorrow...

Well, it took quite some time to track down the 'analogy_config -r' bug (which was responsible for the kernel OOPS, i.e. after fixing it I have not had any OOPSes).

So back to the original problem: does anybody foresee that a call to a4l_ioctl_devcfg(cxt, NULL) from the mite driver would cause any problems (apart from getting the context pointer from the data structures the mite driver has handy)? It is probably not kosher to do an ioctl on a driver that is not open, but...

/Anders
Re: [Xenomai-core] Analogy/mite
On 12/09/2011 02:35 PM, Anders Blomdell wrote:
> On 11/30/2011 07:03 PM, Anders Blomdell wrote:
>> Hi,
>>
>> just found that
>>
>>   echo :06:01.0 > /sys/bus/pci/drivers/analogy_mite/unbind
>>
>> does not do the same thing as
>>
>>   analogy_config -r analogyN
>>
>> In fact, it leaves the system in a state where using the driver results
>> in a kernel OOPS. Will try to look into it further tomorrow...
>
> Well, it took quite some time to track down the 'analogy_config -r' bug
> (which was responsible for the kernel OOPS, i.e. after fixing it I have
> not had any OOPSes).
>
> So back to the original problem: does anybody foresee that a call to
> a4l_ioctl_devcfg(cxt, NULL) from the mite driver would cause any problems
> (apart from getting the context pointer from the data structures the mite
> driver has handy)? It is probably not kosher to do an ioctl on a driver
> that is not open, but...

Attached is a hack (as can be gleaned from the EXPORT_SYMBOL_GPL). If the basic assumption holds that it is OK to call a4l_ioctl_devcfg(...) during unbind, I could rewrite the logic to pass down a pointer to a4l_ioctl_devcfg to avoid this.

/Anders

diff -ur xenomai-2.6.0.orig/include/analogy/device.h xenomai-2.6.0/include/analogy/device.h
--- xenomai-2.6.0.orig/include/analogy/device.h 2011-12-09 16:37:46.777999756 +0100
+++ xenomai-2.6.0/include/analogy/device.h 2011-12-09 16:41:18.660003797 +0100
@@ -43,9 +43,9 @@
 	/* Device specific flags */
 	unsigned long flags;
-	/* Driver assigned to this device thanks to attaching
-	   procedure */
+	/* Fields assigned to this device in attaching procedure */
 	a4l_drv_t *driver;
+	a4l_cxt_t *cxt;
 	/* Hidden description stuff */
 	struct list_head subdvsq;
diff -ur xenomai-2.6.0.orig/ksrc/drivers/analogy/device.c xenomai-2.6.0/ksrc/drivers/analogy/device.c
--- xenomai-2.6.0.orig/ksrc/drivers/analogy/device.c 2011-12-09 16:37:48.497999755 +0100
+++ xenomai-2.6.0/ksrc/drivers/analogy/device.c 2011-12-09 16:42:23.163001790 +0100
@@ -291,6 +291,7 @@
 	a4l_dev_t *dev = a4l_get_dev(cxt);
 	dev->driver = drv;
+	dev->cxt = cxt;
 	if (drv->privdata_size == 0)
 		__a4l_dbg(1, core_dbg,
@@ -331,6 +332,7 @@
 	if (ret != 0 && dev->priv != NULL) {
 		rtdm_free(dev->priv);
 		dev->driver = NULL;
+		dev->cxt = NULL;
 	}
 	return ret;
@@ -360,6 +362,7 @@
 	/* Free the private field */
 	rtdm_free(dev->priv);
 	dev->driver = NULL;
+	dev->cxt = NULL;
 out_release_driver:
 	return ret;
@@ -455,6 +458,7 @@
 	return ret;
 }
+EXPORT_SYMBOL_GPL(a4l_ioctl_devcfg);
 int a4l_ioctl_devinfo(a4l_cxt_t * cxt, void *arg)
 {
diff -ur xenomai-2.6.0.orig/ksrc/drivers/analogy/national_instruments/mite.c xenomai-2.6.0/ksrc/drivers/analogy/national_instruments/mite.c
--- xenomai-2.6.0.orig/ksrc/drivers/analogy/national_instruments/mite.c 2011-12-09 16:37:48.49755 +0100
+++ xenomai-2.6.0/ksrc/drivers/analogy/national_instruments/mite.c 2011-12-09 16:43:04.147002142 +0100
@@ -103,6 +103,9 @@
 			list_entry(this, struct mite_struct, list);
 		if(mite->pcidev == dev) {
+			if (mite->a4ldev) {
+				a4l_ioctl_devcfg(mite->a4ldev->cxt, NULL);
+			}
 			list_del(this);
 			kfree(mite);
 			break;
@@ -117,7 +120,8 @@
 	.remove = mite_remove,
 };
-int a4l_mite_setup(struct mite_struct *mite, int use_iodwbsr_1)
+int a4l_mite_setup(struct mite_struct *mite, int use_iodwbsr_1,
+		   struct a4l_device *a4ldev)
 {
 	unsigned long length;
 	resource_size_t addr;
@@ -232,6 +236,7 @@
 	}
 	mite->used = 1;
+	mite->a4ldev = a4ldev;
 	return 0;
 }
@@ -255,6 +260,7 @@
 	pci_release_regions( mite->pcidev );
 	mite->used = 0;
+	mite->a4ldev = NULL;
 }
 void a4l_mite_list_devices(void)
diff -ur xenomai-2.6.0.orig/ksrc/drivers/analogy/national_instruments/mite.h xenomai-2.6.0/ksrc/drivers/analogy/national_instruments/mite.h
--- xenomai-2.6.0.orig/ksrc/drivers/analogy/national_instruments/mite.h 2011-12-09 16:37:48.49755 +0100
+++ xenomai-2.6.0/ksrc/drivers/analogy/national_instruments/mite.h 2011-12-09 16:38:33.976999742 +0100
@@ -70,6 +70,7 @@
 	void *mite_io_addr;
 	resource_size_t daq_phys_addr;
 	void *daq_io_addr;
+	struct a4l_device *a4ldev;
 };
 static inline
@@ -115,7 +116,8 @@
 	return mite->pcidev->device;
 };
-int a4l_mite_setup(struct mite_struct *mite, int use_iodwbsr_1);
+int a4l_mite_setup(struct mite_struct *mite, int use_iodwbsr_1,
+		   struct a4l_device *a4ldev);
 void a4l_mite_unsetup(struct mite_struct *mite);
 void a4l_mite_list_devices(void);
 struct mite_struct * a4l_mite_find_device(int bus,
diff -ur xenomai-2.6.0.orig/ksrc/drivers/analogy/national_instruments/ni_660x.c xenomai-2.6.0/ksrc/drivers/analogy/national_instruments/ni_660x.c
--- xenomai-2.6.0.orig/ksrc/drivers/analogy/national_instruments/ni_660x.c 2011-12-09 16:37:48.500999755 +0100
+++ xenomai-2.6.0/ksrc/drivers/analogy/national_instruments
Re: [Xenomai-core] Analogy/mite
On 12/09/2011 04:51 PM, Anders Blomdell wrote:
> On 12/09/2011 02:35 PM, Anders Blomdell wrote:
>> On 11/30/2011 07:03 PM, Anders Blomdell wrote:
>>> Hi,
>>>
>>> just found that
>>>
>>>   echo :06:01.0 > /sys/bus/pci/drivers/analogy_mite/unbind
>>>
>>> does not do the same thing as
>>>
>>>   analogy_config -r analogyN
>>>
>>> In fact, it leaves the system in a state where using the driver results
>>> in a kernel OOPS. Will try to look into it further tomorrow...
>>
>> Well, it took quite some time to track down the 'analogy_config -r' bug
>> (which was responsible for the kernel OOPS, i.e. after fixing it I have
>> not had any OOPSes).
>>
>> So back to the original problem: does anybody foresee that a call to
>> a4l_ioctl_devcfg(cxt, NULL) from the mite driver would cause any
>> problems (apart from getting the context pointer from the data
>> structures the mite driver has handy)? It is probably not kosher to do
>> an ioctl on a driver that is not open, but...
>
> Attached is a hack (as can be gleaned from the EXPORT_SYMBOL_GPL). If the
> basic assumption holds that it is OK to call a4l_ioctl_devcfg(...) during
> unbind, I could rewrite the logic to pass down a pointer to
> a4l_ioctl_devcfg to avoid this.

Sloppy me, should of course be:

> +			if (mite->a4ldev) {

			if (mite->a4ldev && mite->a4ldev->cxt) {

> +				a4l_ioctl_devcfg(mite->a4ldev->cxt, NULL);
> +			}
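As a rough illustration of the two ideas in this sub-thread (guarding both pointers before the call, and passing the detach routine down as a function pointer instead of exporting the symbol), here is a small user-space sketch in C. All types and names below are hypothetical stand-ins, not the Analogy or mite structures, and the callback signature is only assumed to resemble a4l_ioctl_devcfg(cxt, NULL).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the context and device structures. */
struct cxt_model { int id; };
struct dev_model { struct cxt_model *cxt; };

/* Callback type assumed to resemble a devcfg-style detach call. */
typedef int (*devcfg_fn)(struct cxt_model *cxt, void *arg);

struct mite_model {
	struct dev_model *a4ldev; /* set at setup time, cleared at unsetup */
	devcfg_fn detach;         /* passed down instead of an exported symbol */
};

static int detach_calls;

/* Test double that only counts how often it is invoked. */
int counting_detach(struct cxt_model *cxt, void *arg)
{
	(void)cxt;
	(void)arg;
	detach_calls++;
	return 0;
}

int detach_call_count(void)
{
	return detach_calls;
}

/* Model of the remove path: detach only if the device is attached. */
int mite_model_remove(struct mite_model *mite)
{
	if (mite->a4ldev && mite->a4ldev->cxt && mite->detach)
		return mite->detach(mite->a4ldev->cxt, NULL);
	return 0; /* nothing attached, nothing to undo */
}
```

With this shape the lower-level module never needs to link against the higher-level one; whether that is acceptable for the real driver is exactly the open question above.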
Re: [Xenomai-core] Analogy/mite
On 12/07/2011 08:58 AM, Anders Blomdell wrote:
> On 12/06/2011 11:47 PM, Alexis Berlemont wrote:
>> Hi
>>
>> On Thu, Dec 1, 2011 at 4:03 PM, Anders Blomdell
>> anders.blomd...@control.lth.se wrote:
>>> On 11/30/2011 07:03 PM, Anders Blomdell wrote:
>>>> Hi,
>>>>
>>>> just found that
>>>>
>>>>   echo :06:01.0 > /sys/bus/pci/drivers/analogy_mite/unbind
>>>>
>>>> does not do the same thing as
>>>>
>>>>   analogy_config -r analogyN
>>>>
>>>> In fact, it leaves the system in a state where using the driver
>>>> results in a kernel OOPS. Will try to look into it further tomorrow...
>>>
>>> OK, seems like we have some interrupt cleanup problem, the following
>>> command sequence:
>>
>> OK, thank you for the report. I did not have time to look at it yet but
>> that will be done soon. Is it blocking for you?
>
> Yes, and even worse is this problem:
>
>   # /usr/local/sbin/analogy_config analogy0 analogy_ni_pcimio 6,1
>   # /usr/local/sbin/analogy_config -r analogy0
>   # cat /proc/xenomai/irq
>   Killed
>
> I was looking into it last week, but have been at a workshop since
> Monday; will get back to this tomorrow.

Seems like somebody is stomping out dev->transfer.irq_desc.rtdm_desc.flags between attach and detach (flags and all fields in its vicinity are zeroed out), hence the interrupt is never removed from the interrupt handler tables, wreaking havoc with the entire kernel.

>> Alexis.
Re: [Xenomai-core] Analogy/mite
On 12/06/2011 11:47 PM, Alexis Berlemont wrote:
> Hi
>
> On Thu, Dec 1, 2011 at 4:03 PM, Anders Blomdell
> anders.blomd...@control.lth.se wrote:
>> On 11/30/2011 07:03 PM, Anders Blomdell wrote:
>>> Hi,
>>>
>>> just found that
>>>
>>>   echo :06:01.0 > /sys/bus/pci/drivers/analogy_mite/unbind
>>>
>>> does not do the same thing as
>>>
>>>   analogy_config -r analogyN
>>>
>>> In fact, it leaves the system in a state where using the driver results
>>> in a kernel OOPS. Will try to look into it further tomorrow...
>>
>> OK, seems like we have some interrupt cleanup problem, the following
>> command sequence:
>
> OK, thank you for the report. I did not have time to look at it yet but
> that will be done soon. Is it blocking for you?

Yes, and even worse is this problem:

  # /usr/local/sbin/analogy_config analogy0 analogy_ni_pcimio 6,1
  # /usr/local/sbin/analogy_config -r analogy0
  # cat /proc/xenomai/irq
  Killed

I was looking into it last week, but have been at a workshop since Monday; will get back to this tomorrow.

> Alexis.

/Anders
Re: [Xenomai-core] Analogy/mite
On 11/30/2011 07:03 PM, Anders Blomdell wrote:
> Hi,
>
> just found that
>
>   echo :06:01.0 > /sys/bus/pci/drivers/analogy_mite/unbind
>
> does not do the same thing as
>
>   analogy_config -r analogyN
>
> in fact it leaves the system in a state where using the driver results
> in a kernel OOPS. Will try to look into it further tomorrow...

OK seems like we have some interrupt cleanup problem, the following command sequence:

  modprobe xeno_native
  modprobe analogy_ni_pcimio
  sleep 1
  /usr/local/sbin/analogy_config analogy0 analogy_ni_pcimio 6,1
  /usr/local/sbin/analogy_config -r analogy0
  rmmod analogy_ni_pcimio
  rmmod analogy_ni_mio
  rmmod analogy_ni_tio
  rmmod analogy_8255
  rmmod analogy_ni_mite
  rmmod xeno_analogy
  sleep 2
  modprobe xeno_native
  modprobe analogy_ni_pcimio
  sleep 1
  /usr/local/sbin/analogy_config analogy0 analogy_ni_pcimio 6,1

Gives:

[ 412.623639] Analogy: MITE: Available NI device IDs: 0x70af
[ 413.648335] Analogy: analogy_ni_pcimio: pcimio_attach: found pci-6221 board
[ 413.676105] Analogy: analogy_ni_pcimio: pcimio_attach: found irq 22
[ 413.682385] BUG: unable to handle kernel paging request at f8bc4bf4
[ 413.683367] IP: [f8846efe] xnintr_attach+0x6e/0xfe [xeno_nucleus]
[ 413.683367] *pdpt = 00aca001 *pde = 31ca5067 *pte =
[ 413.683367] Oops: [#1] SMP
[ 413.683367] last sysfs file: /sys/bus/pci/drivers/analogy_mite/uevent
[ 413.683367] Modules linked in: analogy_ni_pcimio analogy_ni_mio analogy_ni_tio analogy_8255 analogy_ni_mite xeno_analogy xeno_native nfs fscache snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore rt_e1000 rt_e1000_new rtnet xeno_rtdm nfsd lockd nfs_acl auth_rpcgss xeno_nucleus snd_page_alloc ppdev iTCO_wdt iTCO_vendor_support microcode sunrpc exportfs i2c_i801 pcspkr serio_raw e1000e parport_pc parport uinput ipv6 firewire_ohci firewire_core ata_generic pata_acpi crc_itu_t pata_marvell i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: xeno_analogy]
[ 413.683367]
[ 413.683367] Pid: 1579, comm: analogy_config Not tainted 2.6.38.8.xenomai.2.6.0.rtnet.26db745.2030.1211 #1 /DG965SS
[ 413.683367] EIP: 0060:[f8846efe] EFLAGS: 00010286 CPU: 1
[ 413.683367] EIP is at xnintr_attach+0x6e/0xfe [xeno_nucleus]
[ 413.683367] EAX: f8bc4be4 EBX: f87d2be4 ECX: 0001 EDX: 0003
[ 413.683367] ESI: f885b840 EDI: fff0 EBP: f169ddf4 ESP: f169dde0
[ 413.683367] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 413.683367] Process analogy_config (pid: 1579, ti=f169c000 task=f40925e0 task.ti=f169c000)
[ 413.683367] I-pipe domain Linux
[ 413.683367] Stack:
[ 413.683367] 205bde08 0001 f87d2be4 0001 f169de10 f89a0c91 f87cea28
[ 413.683367] 0001 f87d2bd8 f169de28 f87ceb64 0001 f87d134f
[ 413.683367] f87d2bd8 f87d2bb8 f169de44 f87cf727 0001 f87d2bb8 0016 f87d2bb8
[ 413.683367] Call Trace:
[ 413.683367] [f89a0c91] rtdm_irq_request+0x37/0x5a [xeno_rtdm]
[ 413.683367] [f87cea28] ? a4l_handle_irq+0x0/0x1f [xeno_analogy]
[ 413.683367] [f87ceb64] __a4l_request_irq+0x38/0x3e [xeno_analogy]
[ 413.683367] [f87cf727] a4l_request_irq+0x67/0xad [xeno_analogy]
[ 413.683367] [f86b1593] pcimio_attach+0x4e0/0x53e [analogy_ni_pcimio]
[ 413.683367] [f87cde93] a4l_assign_driver+0x73/0x100 [xeno_analogy]
[ 413.683367] [f87cdfd9] a4l_device_attach+0x59/0x6e [xeno_analogy]
[ 413.683367] [f87ce0d7] a4l_ioctl_devcfg+0xbd/0xf6 [xeno_analogy]
[ 413.683367] [f87cf943] a4l_ioctl+0x1e/0x20 [xeno_analogy]
[ 413.683367] [f899fa5a] __rt_dev_ioctl+0x4d/0x104 [xeno_rtdm]
[ 413.683367] [c07c35b6] ? do_page_fault+0x2f7/0x322
[ 413.683367] [f89a1a85] sys_rtdm_ioctl+0x2e/0x30 [xeno_rtdm]
[ 413.683367] [f8851414] losyscall_event+0xb1/0x174 [xeno_nucleus]
[ 413.683367] [c04887ab] __ipipe_dispatch_event+0xcb/0x17a
[ 413.683367] [f8851363] ? losyscall_event+0x0/0x174 [xeno_nucleus]
[ 413.683367] [c0415b32] __ipipe_syscall_root+0x50/0xc9
[ 413.683367] [c07c0a21] system_call+0x2d/0x53
[ 413.683367] Code: 00 e8 73 ff ff ff 8b 4b 10 f7 c1 00 00 01 00 89 45 f0 0f 85 92 00 00 00 8b 73 14 c1 e6 06 81 c6 c0 b2 85 f8 8b 46 24 85 c0 74 25 8b 50 10 89 ce 21 d6 83 e6 01 74 73 8b 73 18 39 70 18 75 6b 31
[ 413.683367] EIP: [f8846efe] xnintr_attach+0x6e/0xfe [xeno_nucleus] SS:ESP 0068:f169dde0
[ 413.683367] CR2: f8bc4bf4

/Anders
[Xenomai-core] Analogy/mite
Hi,

just found that

  echo :06:01.0 > /sys/bus/pci/drivers/analogy_mite/unbind

does not do the same thing as

  analogy_config -r analogyN

in fact it leaves the system in a state where using the driver results in a kernel OOPS. Will try to look into it further tomorrow...

/Anders
Re: [Xenomai-core] Problems with gcc 4.6.0 (rt_task_shadow fails with ENOSYS)
On 07/08/2011 02:41 PM, Gilles Chanteperdrix wrote:
> On 07/07/2011 11:47 PM, Anders Blomdell wrote:
>> When compiling kernel 2.6.37.3 and xenomai 2.5.6 with gcc version 4.6.0
>> 20110530 (Red Hat 4.6.0-9) (GCC), programs fail with -ENOSYS in
>> rt_task_shadow. If compiled with gcc version 4.5.1 20100924 (Red Hat
>> 4.5.1-4) (GCC) everything works as expected.
>
> Could you send us the disassembly of the two functions?

Which functions? Print[fk] debugging got me to suspect the syscall/skin_mux interface, but I'm a bit at a loss as to exactly where the code ends up.

Regards

Anders
Re: [Xenomai-core] Problems with gcc 4.6.0 (rt_task_shadow fails with ENOSYS)
On 07/08/2011 04:44 PM, Gilles Chanteperdrix wrote:
> On 07/08/2011 04:06 PM, Anders Blomdell wrote:
>> On 07/08/2011 02:41 PM, Gilles Chanteperdrix wrote:
>>> On 07/07/2011 11:47 PM, Anders Blomdell wrote:
>>>> When compiling kernel 2.6.37.3 and xenomai 2.5.6 with gcc version
>>>> 4.6.0 20110530 (Red Hat 4.6.0-9) (GCC), programs fail with -ENOSYS in
>>>> rt_task_shadow. If compiled with gcc version 4.5.1 20100924 (Red Hat
>>>> 4.5.1-4) (GCC) everything works as expected.
>>>
>>> Could you send us the disassembly of the two functions?
>>
>> Which functions? Print[fk] debugging got me to suspect the
>> syscall/skin_mux interface, but I'm a bit at loss of exactly where the
>> code ends up.
>
> The two rt_task_shadow, the one which works, and the one which does not.

Ok, attached the two routines taken from respective libnative.so.3

Will try to recompile with gcc-4.6.1 as well.

/Anders

83a0 rt_task_shadow:
    83a0: 55                      push   %ebp
    83a1: 57                      push   %edi
    83a2: 56                      push   %esi
    83a3: 53                      push   %ebx
    83a4: e8 c0 a1 ff ff          call   2569 __i686.get_pc_thunk.bx
    83a9: 81 c3 93 56 00 00       add    $0x5693,%ebx
    83af: 81 ec ac 08 00 00       sub    $0x8ac,%esp
    83b5: 8b b4 24 c0 08 00 00    mov    0x8c0(%esp),%esi
    83bc: e8 3b 9d ff ff          call   20fc xeno_fault_stack@plt
    83c1: 85 f6                   test   %esi,%esi
    83c3: 8d 84 24 90 08 00 00    lea    0x890(%esp),%eax
    83ca: c7 44 24 04 00 00 00    movl   $0x0,0x4(%esp)
    83d1: 00
    83d2: 0f 44 f0                cmove  %eax,%esi
    83d5: c7 04 24 01 00 00 00    movl   $0x1,(%esp)
    83dc: e8 fb 9e ff ff          call   22dc pthread_setcanceltype@plt
    83e1: e8 c6 9e ff ff          call   22ac xeno_sigshadow_install_once@plt
    83e6: 8b 84 24 c4 08 00 00    mov    0x8c4(%esp),%eax
    83ed: 89 b4 24 78 08 00 00    mov    %esi,0x878(%esp)
    83f4: 89 84 24 7c 08 00 00    mov    %eax,0x87c(%esp)
    83fb: 8b 84 24 c8 08 00 00    mov    0x8c8(%esp),%eax
    8402: 89 84 24 80 08 00 00    mov    %eax,0x880(%esp)
    8409: 8b 84 24 cc 08 00 00    mov    0x8cc(%esp),%eax
    8410: 89 84 24 84 08 00 00    mov    %eax,0x884(%esp)
    8417: e8 a0 9e ff ff          call   22bc pthread_self@plt
    841c: 89 84 24 88 08 00 00    mov    %eax,0x888(%esp)
    8423: e8 34 9d ff ff          call   215c xeno_init_current_mode@plt
    8428: b9 f4 ff ff ff          mov    $0xfff4,%ecx
    842d: 85 c0                   test   %eax,%eax
    842f: 89 84 24 8c 08 00 00    mov    %eax,0x88c(%esp)
    8436: 0f 84 bd 00 00 00       je     84f9 rt_task_shadow+0x159
    843c: 8d 83 00 aa ff ff       lea    -0x5600(%ebx),%eax
    8442: 89 84 24 74 08 00 00    mov    %eax,0x874(%esp)
    8449: 89 ac 24 70 08 00 00    mov    %ebp,0x870(%esp)
    8450: 8b bb e8 ff ff ff       mov    -0x18(%ebx),%edi
    8456: 8d 84 24 70 08 00 00    lea    0x870(%esp),%eax
    845d: 89 84 24 98 08 00 00    mov    %eax,0x898(%esp)
    8464: 8d ac 24 78 08 00 00    lea    0x878(%esp),%ebp
    846b: 90                      nop
    846c: 8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
    8470: 8b 07                   mov    (%edi),%eax
    8472: 31 c9                   xor    %ecx,%ecx
    8474: c7 44 24 28 00 00 00    movl   $0x0,0x28(%esp)
    847b: 00
    847c: 0d 2b 02 00 00          or     $0x22b,%eax
    8481: 89 84 24 9c 08 00 00    mov    %eax,0x89c(%esp)
    8488: 89 e8                   mov    %ebp,%eax
    848a: 53                      push   %ebx
    848b: 89 c3                   mov    %eax,%ebx
    848d: 8b 84 24 9c 08 00 00    mov    0x89c(%esp),%eax
    8494: 55                      push   %ebp
    8495: 8b ac 24 98 08 00 00    mov    0x898(%esp),%ebp
    849c: cd 80                   int    $0x80
    849e: 5d                      pop    %ebp
    849f: 5b                      pop    %ebx
    84a0: 89 c1                   mov    %eax,%ecx
    84a2: 8b 44 24 28             mov    0x28(%esp),%eax
    84a6: 85 c0                   test   %eax,%eax
    84a8: 74 1a                   je     84c4 rt_task_shadow+0x124
    84aa: 8d 44 24 28             lea    0x28(%esp),%eax
    84ae: 89 4c 24 08             mov    %ecx,0x8(%esp)
    84b2: c7 44 24 04 ab ff ff    movl   $0xffab,0x4(%esp)
    84b9: ff
    84ba: 89 04 24                mov    %eax,(%esp)
    84bd: e8 aa
[Xenomai-core] Problems with gcc 4.6.0 (rt_task_shadow fails with ENOSYS)
When compiling kernel 2.6.37.3 and xenomai 2.5.6 with gcc version 4.6.0 20110530 (Red Hat 4.6.0-9) (GCC), programs fail with -ENOSYS in rt_task_shadow. If compiled with gcc version 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) everything works as expected.

Regards

Anders Blomdell
Re: [Xenomai-core] Duplicate symbols in analogy
On 2011-03-14 20.29, Anders Blomdell wrote:
> I think it would make sense to change the name conflicts between analogy
> and comedi (range_unknown is one of them), to make it possible for comedi
> and analogy to coexist on the same machine. Anybody in support of this?

Anybody against, then? IMHO it's a bad idea to have name conflicts with drivers in the kernel (even if they are still in the staging area).

What prefix should I add to all modified exported symbols? Would this make sense (a4ld == Analogy for Linux Driver):

  mite_unsetup -> a4ld_mite_unsetup

etc...

Regards

Anders
[Xenomai-core] NI analog card shows wrong IRQ number after reboot
Which is due to the fact that pci_enable_device (mite.c) is called at mite_setup instead of mite_init. The bad thing with this is that interrupt conflicts can only be found AFTER the driver has been started with analogy_config, which is often too late (since interrupt conflicts will bring down the machine).

Would it be a good idea to call pci_enable_device in mite_init as well, or will that break something else?

Regards

Anders Blomdell
Re: [Xenomai-core] NI analog card shows wrong IRQ number after reboot
On 2011-03-14 19.33, Anders Blomdell wrote:
> Which is due to the fact that pci_enable_device (mite.c) is called at
> mite_setup instead of mite_init. The bad thing with this is that
> interrupt conflicts can only be found AFTER the driver has been started
> with analogy_config, which is often too late (since interrupt conflicts
> will bring down the machine).
>
> Would it be a good idea to call pci_enable_device in mite_init as well,
> or will that break something else?

Many other kernel drivers seem to call pci_enable_device from the probe function, and this does give the card its proper IRQ:

--- ksrc/drivers/analogy/national_instruments/mite.c.orig 2011-02-16 15:26:01.0 +0100
+++ ksrc/drivers/analogy/national_instruments/mite.c 2011-03-14 19:38:18.572674136 +0100
@@ -80,6 +80,7 @@
 	}
 	list_add(&mite->list, &mite_devices);
+	pci_enable_device(mite->pcidev);
 	return 0;
 }

Regards

Anders
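One thing the one-line patch leaves open is what to do when enabling the device fails. Below is a small user-space model in C of a probe routine that enables the device first and only registers it on success. It is a sketch under stated assumptions: fake_pci_dev, fake_pci_enable_device and the list handling are invented for illustration, and the only property borrowed from the kernel is that the real pci_enable_device() returns 0 on success and a negative errno on failure.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a PCI device; enable_error simulates a failure. */
struct fake_pci_dev { int enable_error; int enabled; };

struct probe_entry {
	struct fake_pci_dev *pcidev;
	struct probe_entry *next;
};

static struct probe_entry *probe_list; /* models the driver's device list */

/* Model of an enable call: 0 on success, negative error code on failure. */
static int fake_pci_enable_device(struct fake_pci_dev *pdev)
{
	if (pdev->enable_error)
		return pdev->enable_error;
	pdev->enabled = 1;
	return 0;
}

/* Probe model: enable first, and register the device only on success. */
int model_probe(struct fake_pci_dev *pdev)
{
	struct probe_entry *entry = calloc(1, sizeof(*entry));
	int ret;

	if (!entry)
		return -12; /* stands in for -ENOMEM */
	entry->pcidev = pdev;

	ret = fake_pci_enable_device(pdev);
	if (ret) {
		free(entry); /* unwind: do not list a device we could not enable */
		return ret;
	}
	entry->next = probe_list;
	probe_list = entry;
	return 0;
}

/* Number of devices currently registered in the model list. */
int model_list_length(void)
{
	int n = 0;
	for (struct probe_entry *e = probe_list; e; e = e->next)
		n++;
	return n;
}
```

Whether the real probe function should fail outright or merely log the error is a design decision for the driver maintainers; the model simply shows the fail-early variant.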
[Xenomai-core] Duplicate symbols in analogy
I think it would make sense to resolve the name conflicts between analogy and comedi (range_unknown is one of them), so that comedi and analogy can coexist on the same machine. Anybody in support of this? Regards Anders Blomdell -- Anders Blomdell Email: anders.blomd...@control.lth.se Department of Automatic Control Lund University Phone:+46 46 222 4625 P.O. Box 118 Fax: +46 46 138118 SE-221 00 Lund, Sweden
Re: [Xenomai-core] Problem with gcc-4.5.1
On 2010-12-08 09.50, Gilles Chanteperdrix wrote: Anders Blomdell wrote: On 2010-12-07 21.21, Gilles Chanteperdrix wrote: Anders Blomdell wrote: On 12/07/2010 01:09 PM, Gilles Chanteperdrix wrote: Anders Blomdell wrote: On 12/07/2010 12:51 PM, Gilles Chanteperdrix wrote: Anders Blomdell wrote: When compiling Xenomai on Fedora-14 with gcc-4.5.1 [version 4.5.1 20100924 (Red Hat 4.5.1-4)], the loading of xeno_nucleus fails with the attached kernel OOPS, a notable difference between the 4.5.1 compiled version and a working one built with gcc-4.4.4 on the same system with the same configuration, sis tthat __rthal_x86_nodiv_ullimd is not inlined, is this anybody has seen before? No, that is new, we need to see the disassembly of __rthal_x86_nodiv_ullimd objdump -S: static inline __attribute__((const)) unsigned long long __rthal_x86_nodiv_ullimd(const unsigned long long op, const unsigned long long frac, unsigned integ) { e7a8:55 push %ebp e7a9:89 e5 mov%esp,%ebp e7ab:57 push %edi e7ac:56 push %esi e7ad:53 push %ebx e7ae:83 ec 10sub$0x10,%esp e7b1:8d 7d 08lea0x8(%ebp),%edi e7b4:e8 fc ff ff ff call e7b5__rthal_x86_nodiv_ullimd+0xd e7b9:8b 1f mov(%edi),%ebx e7bb:8b 4f 04mov0x4(%edi),%ecx register unsigned rm __asm__(esi); register unsigned rh __asm__(edi); unsigned fracl, frach, opl, oph; register unsigned long long t; __rthal_u64tou32(op, oph, opl); e7be:89 45 e8mov%eax,-0x18(%ebp) __rthal_u64tou32(frac, frach, fracl); e7c1:89 5d f0mov%ebx,-0x10(%ebp) register unsigned rm __asm__(esi); register unsigned rh __asm__(edi); unsigned fracl, frach, opl, oph; register unsigned long long t; __rthal_u64tou32(op, oph, opl); e7c4:89 55 e4mov%edx,-0x1c(%ebp) __rthal_u64tou32(frac, frach, fracl); e7c7:89 4d ecmov%ecx,-0x14(%ebp) __asm__ (mov %[oph], %%eax\n\t e7ca:8b 45 e4mov-0x1c(%ebp),%eax e7cd:f7 65 ecmull -0x14(%ebp) e7d0:89 c6 mov%eax,%esi e7d2:89 d7 mov%edx,%edi e7d4:8b 45 e8mov-0x18(%ebp),%eax e7d7:f7 65 f0mull -0x10(%ebp) e7da:89 d1 mov%edx,%ecx e7dc:d1 e0 shl%eax e7de:83 d1 
00 adc $0x0,%ecx
e7e1: 83 d6 00 adc $0x0,%esi
e7e4: 83 d7 00 adc $0x0,%edi
e7e7: 8b 45 e4 mov -0x1c(%ebp),%eax
e7ea: f7 65 f0 mull -0x10(%ebp)
e7ed: 01 c1 add %eax,%ecx
e7ef: 11 d6 adc %edx,%esi
e7f1: 83 d7 00 adc $0x0,%edi
e7f4: 8b 45 e8 mov -0x18(%ebp),%eax
e7f7: f7 65 ec mull -0x14(%ebp)
e7fa: 01 c1 add %eax,%ecx
e7fc: 11 d6 adc %edx,%esi
e7fe: 83 d7 00 adc $0x0,%edi
e801: 8b 45 e8 mov -0x18(%ebp),%eax
e804: f7 67 08 mull 0x8(%edi)

Problem is here: edi is used by gcc as if it contained an address whereas it is used by the assembly for the computation. Should be marked early clobber. So, in include/asm-x86/arith_32.h, replace: : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t) with: : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t)

No cigar (:-()

Ok. Maybe we can try something less radical, such as: : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t) This is incorrect, but we can hope for the best...

As previously said, changing the optimization from -Os to anything else for xeno_nucleus (see patch in mail dated 'Tue, 07 Dec 2010 17:20:37 +0100'), solved that issue (incorrect code + hope for the best -> spurious disasters). Rather compile time errors than runtime errors.

We are not going to decide instead of the user what optimization level to use; if he wants to use -Os, we have to make it work for -Os. If this one does not work, we have other things to try.

Then start with something that you believe is correct, I *WILL NOT* test something which you think is incorrect. /Anders -- Anders
[Xenomai-core] Problem with gcc-4.5.1
When compiling Xenomai on Fedora-14 with gcc-4.5.1 [version 4.5.1 20100924 (Red Hat 4.5.1-4)], the loading of xeno_nucleus fails with the attached kernel OOPS. A notable difference between the 4.5.1 compiled version and a working one built with gcc-4.4.4 on the same system with the same configuration is that __rthal_x86_nodiv_ullimd is not inlined. Is this something anybody has seen before? Regards Anders Blomdell -- Anders Blomdell Email: anders.blomd...@control.lth.se Department of Automatic Control Lund University Phone:+46 46 222 4625 P.O. Box 118 Fax: +46 46 138118 SE-221 00 Lund, Sweden

BUG: unable to handle kernel NULL pointer dereference at 0008
IP: [fbf25804] __rthal_x86_nodiv_ullimd+0x5c/0x74 [xeno_nucleus]
*pdpt = 01d91001 *pde =
Oops: [#1] SMP
last sysfs file: /sys/module/microcode/initstate
Modules linked in: xeno_nucleus(+) e1000 snd_timer snd e1000e soundcore iTCO_wdt i2c_i801 serio_raw iTCO_vendor_support snd_page_alloc microcode(+) pcspkr pata_acpi firewire_ohci ata_generic firewire_core crc_itu_t pata_marvell nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core
Pid: 519, comm: modprobe Not tainted 2.6.35.7_xenomai-2.5.5.2_rtnet-39f7fcf #1 DP35DP/
EIP: 0060:[fbf25804] EFLAGS: 00010246 CPU: 0
EIP is at __rthal_x86_nodiv_ullimd+0x5c/0x74 [xeno_nucleus]
EAX: EBX: b36c048c ECX: EDX:
ESI: EDI: EBP: c1ef1f34 ESP: c1ef1f18
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process modprobe (pid: 519, ti=c1ef task=c1e04080 task.ti=c1ef)
I-pipe domain Linux
Stack: 665d7cba b36c048c b36c048c 665d7cba c1ef1f54 0 fbf25887 b36c048c 665d7cba 0002 0001 0004 665d7cba c1ef1f60 0 fbf25a3f c1ef1f84 fbcaf215 1194d800 0001 3b9aca00
Call Trace:
[fbf25887] ? xnarch_ns_to_tsc+0x34/0x4a [xeno_nucleus]
[fbf25a3f] ? xnarch_calibrate_sched+0x1a/0xf2 [xeno_nucleus]
[fbcaf215] ? __xeno_sys_init+0x189/0x2fd [xeno_nucleus]
[fbcaf08c] ? __xeno_sys_init+0x0/0x2fd [xeno_nucleus]
[c0401263] ? do_one_initcall+0x62/0x16f
[c046843c] ? sys_init_module+0x7f/0x19d
[c040299d] ? sysenter_do_call+0x12/0x16
Code: f0 89 d1 d1 e0 83 d1 00 83 d6 00 83 d7 00 8b 45 e4 f7 65 f0 01 c1 11 d6 83 d7 00 8b 45 e8 f7 65 ec 01 c1 11 d6 83 d7 00 8b 45 e8 f7 67 08 01 f0 11 d7 8b 55 e4 0f af 57 08 01 fa 83 c4 10 5b 5e
EIP: [fbf25804] __rthal_x86_nodiv_ullimd+0x5c/0x74 [xeno_nucleus] SS:ESP 0068:c1ef1f18
CR2: 0008
Re: [Xenomai-core] Problem with gcc-4.5.1
On 12/07/2010 12:51 PM, Gilles Chanteperdrix wrote: Anders Blomdell wrote: When compiling Xenomai on Fedora-14 with gcc-4.5.1 [version 4.5.1 20100924 (Red Hat 4.5.1-4)], the loading of xeno_nucleus fails with the attached kernel OOPS. A notable difference between the 4.5.1 compiled version and a working one built with gcc-4.4.4 on the same system with the same configuration is that __rthal_x86_nodiv_ullimd is not inlined. Is this something anybody has seen before?

No, that is new, we need to see the disassembly of __rthal_x86_nodiv_ullimd

objdump -S:

static inline __attribute__((const)) unsigned long long
__rthal_x86_nodiv_ullimd(const unsigned long long op, const unsigned long long frac, unsigned integ)
{
    e7a8: 55              push %ebp
    e7a9: 89 e5           mov %esp,%ebp
    e7ab: 57              push %edi
    e7ac: 56              push %esi
    e7ad: 53              push %ebx
    e7ae: 83 ec 10        sub $0x10,%esp
    e7b1: 8d 7d 08        lea 0x8(%ebp),%edi
    e7b4: e8 fc ff ff ff  call e7b5 <__rthal_x86_nodiv_ullimd+0xd>
    e7b9: 8b 1f           mov (%edi),%ebx
    e7bb: 8b 4f 04        mov 0x4(%edi),%ecx
	register unsigned rm __asm__("esi");
	register unsigned rh __asm__("edi");
	unsigned fracl, frach, opl, oph;
	register unsigned long long t;

	__rthal_u64tou32(op, oph, opl);
    e7be: 89 45 e8        mov %eax,-0x18(%ebp)
	__rthal_u64tou32(frac, frach, fracl);
    e7c1: 89 5d f0        mov %ebx,-0x10(%ebp)
	register unsigned rm __asm__("esi");
	register unsigned rh __asm__("edi");
	unsigned fracl, frach, opl, oph;
	register unsigned long long t;

	__rthal_u64tou32(op, oph, opl);
    e7c4: 89 55 e4        mov %edx,-0x1c(%ebp)
	__rthal_u64tou32(frac, frach, fracl);
    e7c7: 89 4d ec        mov %ecx,-0x14(%ebp)
	__asm__ ("mov %[oph], %%eax\n\t"
    e7ca: 8b 45 e4        mov -0x1c(%ebp),%eax
    e7cd: f7 65 ec        mull -0x14(%ebp)
    e7d0: 89 c6           mov %eax,%esi
    e7d2: 89 d7           mov %edx,%edi
    e7d4: 8b 45 e8        mov -0x18(%ebp),%eax
    e7d7: f7 65 f0        mull -0x10(%ebp)
    e7da: 89 d1           mov %edx,%ecx
    e7dc: d1 e0           shl %eax
    e7de: 83 d1 00        adc $0x0,%ecx
    e7e1: 83 d6 00        adc $0x0,%esi
    e7e4: 83 d7 00        adc $0x0,%edi
    e7e7: 8b 45 e4        mov -0x1c(%ebp),%eax
    e7ea: f7 65 f0        mull -0x10(%ebp)
    e7ed: 01 c1           add %eax,%ecx
    e7ef: 11 d6           adc %edx,%esi
    e7f1: 83 d7 00        adc $0x0,%edi
    e7f4: 8b 45 e8        mov -0x18(%ebp),%eax
    e7f7: f7 65 ec        mull -0x14(%ebp)
    e7fa: 01 c1           add %eax,%ecx
    e7fc: 11 d6           adc %edx,%esi
    e7fe: 83 d7 00        adc $0x0,%edi
    e801: 8b 45 e8        mov -0x18(%ebp),%eax
    e804: f7 67 08        mull 0x8(%edi)
    e807: 01 f0           add %esi,%eax
    e809: 11 d7           adc %edx,%edi
    e80b: 8b 55 e4        mov -0x1c(%ebp),%edx
    e80e: 0f af 57 08     imul 0x8(%edi),%edx
    e812: 01 fa           add %edi,%edx
		 : [opl]"m"(opl), [oph]"m"(oph), [fracl]"m"(fracl), [frach]"m"(frach), [integ]"m"(integ)
		 : "cc");
	return t;
}
    e814: 83 c4 10        add $0x10,%esp
    e817: 5b              pop %ebx
    e818: 5e              pop %esi
    e819: 5f              pop %edi
    e81a: 5d              pop %ebp
    e81b: c3              ret

But as I said, in the working version, the code seems to be inlined everywhere. Should I send the two object modules as well (probably as a private message?). /Anders -- Anders Blomdell Email: anders.blomd...@control.lth.se Department of Automatic Control Lund University Phone:+46 46 222 4625 P.O. Box 118 Fax: +46 46 138118 SE-221 00 Lund, Sweden
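For readers who want to follow what the inline assembly in the listing above is meant to compute, here is a portable C sketch. It is an assumption reconstructed from the instruction sequence (four 32x32 partial products, a rounding carry taken from bit 31 of the lowest product, then op * integ added on top); the name ullimd_portable is made up for this illustration and it is not the Xenomai implementation.

```c
#include <stdint.h>

/*
 * Hypothetical portable model of what the disassembly appears to compute:
 *   result = op * integ + ((op * frac) >> 64)
 * with a rounding carry taken from bit 31 of the lowest partial product,
 * mirroring the "shl %eax; adc $0x0,..." sequence in the listing.
 */
uint64_t ullimd_portable(uint64_t op, uint64_t frac, uint32_t integ)
{
    uint32_t opl = (uint32_t)op, oph = (uint32_t)(op >> 32);
    uint32_t fracl = (uint32_t)frac, frach = (uint32_t)(frac >> 32);

    uint64_t ll = (uint64_t)opl * fracl;   /* bits   0..63  of the product */
    uint64_t lh = (uint64_t)opl * frach;   /* bits  32..95  */
    uint64_t hl = (uint64_t)oph * fracl;   /* bits  32..95  */
    uint64_t hh = (uint64_t)oph * frach;   /* bits  64..127 */

    /* Accumulate bits 32..63; only the carry out of this word is kept. */
    uint64_t mid = (ll >> 32) + ((ll >> 31) & 1) + (uint32_t)lh + (uint32_t)hl;

    /* Bits 64..127 of op * frac. */
    uint64_t high = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);

    return high + op * integ;              /* wraps modulo 2^64, like the asm */
}
```

With frac = 2^63 (one half) and integ = 1 this multiplies by 1.5, so an input of 10 gives 15. A model like this can serve as a reference when comparing the output of differently compiled versions of the assembly routine.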
Re: [Xenomai-core] Problem with gcc-4.5.1
On 12/07/2010 01:09 PM, Gilles Chanteperdrix wrote: Anders Blomdell wrote: On 12/07/2010 12:51 PM, Gilles Chanteperdrix wrote: Anders Blomdell wrote: When compiling Xenomai on Fedora-14 with gcc-4.5.1 [version 4.5.1 20100924 (Red Hat 4.5.1-4)], the loading of xeno_nucleus fails with the attached kernel OOPS, a notable difference between the 4.5.1 compiled version and a working one built with gcc-4.4.4 on the same system with the same configuration, sis tthat __rthal_x86_nodiv_ullimd is not inlined, is this anybody has seen before? No, that is new, we need to see the disassembly of __rthal_x86_nodiv_ullimd objdump -S: static inline __attribute__((const)) unsigned long long __rthal_x86_nodiv_ullimd(const unsigned long long op, const unsigned long long frac, unsigned integ) { e7a8:55 push %ebp e7a9:89 e5 mov%esp,%ebp e7ab:57 push %edi e7ac:56 push %esi e7ad:53 push %ebx e7ae:83 ec 10sub$0x10,%esp e7b1:8d 7d 08lea0x8(%ebp),%edi e7b4: e8 fc ff ff ff call e7b5__rthal_x86_nodiv_ullimd+0xd e7b9:8b 1f mov(%edi),%ebx e7bb:8b 4f 04mov0x4(%edi),%ecx register unsigned rm __asm__(esi); register unsigned rh __asm__(edi); unsigned fracl, frach, opl, oph; register unsigned long long t; __rthal_u64tou32(op, oph, opl); e7be:89 45 e8mov%eax,-0x18(%ebp) __rthal_u64tou32(frac, frach, fracl); e7c1:89 5d f0mov%ebx,-0x10(%ebp) register unsigned rm __asm__(esi); register unsigned rh __asm__(edi); unsigned fracl, frach, opl, oph; register unsigned long long t; __rthal_u64tou32(op, oph, opl); e7c4:89 55 e4mov%edx,-0x1c(%ebp) __rthal_u64tou32(frac, frach, fracl); e7c7:89 4d ecmov%ecx,-0x14(%ebp) __asm__ (mov %[oph], %%eax\n\t e7ca:8b 45 e4mov-0x1c(%ebp),%eax e7cd:f7 65 ecmull -0x14(%ebp) e7d0:89 c6 mov%eax,%esi e7d2:89 d7 mov%edx,%edi e7d4:8b 45 e8mov-0x18(%ebp),%eax e7d7:f7 65 f0mull -0x10(%ebp) e7da:89 d1 mov%edx,%ecx e7dc:d1 e0 shl%eax e7de:83 d1 00adc$0x0,%ecx e7e1:83 d6 00adc$0x0,%esi e7e4:83 d7 00adc$0x0,%edi e7e7:8b 45 e4mov-0x1c(%ebp),%eax e7ea:f7 65 f0mull -0x10(%ebp) e7ed:01 c1 
add %eax,%ecx
e7ef: 11 d6 adc %edx,%esi
e7f1: 83 d7 00 adc $0x0,%edi
e7f4: 8b 45 e8 mov -0x18(%ebp),%eax
e7f7: f7 65 ec mull -0x14(%ebp)
e7fa: 01 c1 add %eax,%ecx
e7fc: 11 d6 adc %edx,%esi
e7fe: 83 d7 00 adc $0x0,%edi
e801: 8b 45 e8 mov -0x18(%ebp),%eax
e804: f7 67 08 mull 0x8(%edi)

Problem is here: edi is used by gcc as if it contained an address whereas it is used by the assembly for the computation. Should be marked early clobber. So, in include/asm-x86/arith_32.h, replace: : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t) with: : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t)

No cigar (:-()

arch/x86/include/asm/xenomai/arith_32.h: In function ‘__rthal_x86_nodiv_ullimd’:
arch/x86/include/asm/xenomai/arith_32.h:154:2: error: can't find a register in class ‘DIREG’ while reloading ‘asm’
arch/x86/include/asm/xenomai/arith_32.h:154:2: error: ‘asm’ operand has impossible constraints

Forcing compilation with an optimization level other than -Os seems to work.

But us I said, in the working version, the code seems to be inlined everywhere. Should I send the two object modules as well (probably as a private message?).

The code should work the same whatever gcc decides regarding inlining. Whether we like gcc's decision is a different issue.

Agreed

Note that there is an option to get gcc to go back to the old behaviour (inlining as the source commands).

What option is that? /Anders -- Anders Blomdell Email: anders.blomd...@control.lth.se Department of Automatic Control Lund University Phone:+46 46 222 4625 P.O. Box 118 Fax: +46 46 138118 SE-221 00 Lund, Sweden
Re: [Xenomai-core] Problem with gcc-4.5.1
On 12/07/2010 03:14 PM, Anders Blomdell wrote: On 12/07/2010 01:09 PM, Gilles Chanteperdrix wrote: Anders Blomdell wrote: On 12/07/2010 12:51 PM, Gilles Chanteperdrix wrote: Anders Blomdell wrote: When compiling Xenomai on Fedora-14 with gcc-4.5.1 [version 4.5.1 20100924 (Red Hat 4.5.1-4)], the loading of xeno_nucleus fails with the attached kernel OOPS, a notable difference between the 4.5.1 compiled version and a working one built with gcc-4.4.4 on the same system with the same configuration, sis tthat __rthal_x86_nodiv_ullimd is not inlined, is this anybody has seen before? No, that is new, we need to see the disassembly of __rthal_x86_nodiv_ullimd objdump -S: static inline __attribute__((const)) unsigned long long __rthal_x86_nodiv_ullimd(const unsigned long long op, const unsigned long long frac, unsigned integ) { e7a8: 55 push %ebp e7a9: 89 e5 mov %esp,%ebp e7ab: 57 push %edi e7ac: 56 push %esi e7ad: 53 push %ebx e7ae: 83 ec 10 sub $0x10,%esp e7b1: 8d 7d 08 lea 0x8(%ebp),%edi e7b4: e8 fc ff ff ff call e7b5__rthal_x86_nodiv_ullimd+0xd e7b9: 8b 1f mov (%edi),%ebx e7bb: 8b 4f 04 mov 0x4(%edi),%ecx register unsigned rm __asm__(esi); register unsigned rh __asm__(edi); unsigned fracl, frach, opl, oph; register unsigned long long t; __rthal_u64tou32(op, oph, opl); e7be: 89 45 e8 mov %eax,-0x18(%ebp) __rthal_u64tou32(frac, frach, fracl); e7c1: 89 5d f0 mov %ebx,-0x10(%ebp) register unsigned rm __asm__(esi); register unsigned rh __asm__(edi); unsigned fracl, frach, opl, oph; register unsigned long long t; __rthal_u64tou32(op, oph, opl); e7c4: 89 55 e4 mov %edx,-0x1c(%ebp) __rthal_u64tou32(frac, frach, fracl); e7c7: 89 4d ec mov %ecx,-0x14(%ebp) __asm__ (mov %[oph], %%eax\n\t e7ca: 8b 45 e4 mov -0x1c(%ebp),%eax e7cd: f7 65 ec mull -0x14(%ebp) e7d0: 89 c6 mov %eax,%esi e7d2: 89 d7 mov %edx,%edi e7d4: 8b 45 e8 mov -0x18(%ebp),%eax e7d7: f7 65 f0 mull -0x10(%ebp) e7da: 89 d1 mov %edx,%ecx e7dc: d1 e0 shl %eax e7de: 83 d1 00 adc $0x0,%ecx e7e1: 83 d6 00 adc $0x0,%esi e7e4: 
83 d7 00 adc $0x0,%edi
e7e7: 8b 45 e4 mov -0x1c(%ebp),%eax
e7ea: f7 65 f0 mull -0x10(%ebp)
e7ed: 01 c1 add %eax,%ecx
e7ef: 11 d6 adc %edx,%esi
e7f1: 83 d7 00 adc $0x0,%edi
e7f4: 8b 45 e8 mov -0x18(%ebp),%eax
e7f7: f7 65 ec mull -0x14(%ebp)
e7fa: 01 c1 add %eax,%ecx
e7fc: 11 d6 adc %edx,%esi
e7fe: 83 d7 00 adc $0x0,%edi
e801: 8b 45 e8 mov -0x18(%ebp),%eax
e804: f7 67 08 mull 0x8(%edi)

Problem is here: edi is used by gcc as if it contained an address whereas it is used by the assembly for the computation. Should be marked early clobber. So, in include/asm-x86/arith_32.h, replace: : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t) with: : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t)

No cigar (:-()

arch/x86/include/asm/xenomai/arith_32.h: In function ‘__rthal_x86_nodiv_ullimd’:
arch/x86/include/asm/xenomai/arith_32.h:154:2: error: can't find a register in class ‘DIREG’ while reloading ‘asm’
arch/x86/include/asm/xenomai/arith_32.h:154:2: error: ‘asm’ operand has impossible constraints

Forcing compilation with optimizations besides -Os seems to work.

A patch that makes the code compile and generates modules that load is attached.

But us I said, in the working version, the code seems to be inlined everywhere. Should I send the two object modules as well (probably as a private message?).

The code should work the same whatever gcc decides regarding inlining. Whether we like gcc's decision is a different issue.

Agreed

Note that there is an option to get gcc to go back to the old behaviour (inlining as the source commands).

What option is that? /Anders -- Anders Blomdell Email: anders.blomd...@control.lth.se Department of Automatic Control Lund University Phone:+46 46 222 4625 P.O. 
Box 118 Fax: +46 46 138118 SE-221 00 Lund, Sweden

--- a/include/asm-x86/arith_32.h 2010-05-18 20:31:15.0 +0200
+++ b/include/asm-x86/arith_32.h 2010-12-07 13:22:32.0 +0100
@@ -179,8 +179,8 @@
 mov %[oph], %%edx\n\t
 imul %[integ], %%edx\n\t
 add %[rh], %%edx\n\t
- : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t)
+ : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t)
 : [opl]m(opl), [oph]m(oph), [fracl]m(fracl), [frach]m(frach), [integ]m(integ)
 : cc);
--- a/ksrc/nucleus/Makefile 2010-05-18 20:31:16.0 +0200
+++ b/ksrc/nucleus/Makefile 2010-12-07 16:09:46.0 +0100
@@ -21,7 +21,7 @@
 # exist on initcalls defined by other object files.
 xeno_nucleus-y += module.o
-EXTRA_CFLAGS += -D__IN_XENOMAI__ -Iinclude/xenomai
+EXTRA_CFLAGS += -D__IN_XENOMAI__ -Iinclude/xenomai -O3
 else
Re: [Xenomai-core] Problem with gcc-4.5.1
On 2010-12-07 21.21, Gilles Chanteperdrix wrote: Anders Blomdell wrote: On 12/07/2010 01:09 PM, Gilles Chanteperdrix wrote: Anders Blomdell wrote: On 12/07/2010 12:51 PM, Gilles Chanteperdrix wrote: Anders Blomdell wrote: When compiling Xenomai on Fedora-14 with gcc-4.5.1 [version 4.5.1 20100924 (Red Hat 4.5.1-4)], the loading of xeno_nucleus fails with the attached kernel OOPS, a notable difference between the 4.5.1 compiled version and a working one built with gcc-4.4.4 on the same system with the same configuration, sis tthat __rthal_x86_nodiv_ullimd is not inlined, is this anybody has seen before? No, that is new, we need to see the disassembly of __rthal_x86_nodiv_ullimd objdump -S: static inline __attribute__((const)) unsigned long long __rthal_x86_nodiv_ullimd(const unsigned long long op, const unsigned long long frac, unsigned integ) { e7a8: 55 push %ebp e7a9: 89 e5 mov%esp,%ebp e7ab: 57 push %edi e7ac: 56 push %esi e7ad: 53 push %ebx e7ae: 83 ec 10sub$0x10,%esp e7b1: 8d 7d 08lea0x8(%ebp),%edi e7b4: e8 fc ff ff ff call e7b5__rthal_x86_nodiv_ullimd+0xd e7b9: 8b 1f mov(%edi),%ebx e7bb: 8b 4f 04mov0x4(%edi),%ecx register unsigned rm __asm__(esi); register unsigned rh __asm__(edi); unsigned fracl, frach, opl, oph; register unsigned long long t; __rthal_u64tou32(op, oph, opl); e7be: 89 45 e8mov%eax,-0x18(%ebp) __rthal_u64tou32(frac, frach, fracl); e7c1: 89 5d f0mov%ebx,-0x10(%ebp) register unsigned rm __asm__(esi); register unsigned rh __asm__(edi); unsigned fracl, frach, opl, oph; register unsigned long long t; __rthal_u64tou32(op, oph, opl); e7c4: 89 55 e4mov%edx,-0x1c(%ebp) __rthal_u64tou32(frac, frach, fracl); e7c7: 89 4d ecmov%ecx,-0x14(%ebp) __asm__ (mov %[oph], %%eax\n\t e7ca: 8b 45 e4mov-0x1c(%ebp),%eax e7cd: f7 65 ecmull -0x14(%ebp) e7d0: 89 c6 mov%eax,%esi e7d2: 89 d7 mov%edx,%edi e7d4: 8b 45 e8mov-0x18(%ebp),%eax e7d7: f7 65 f0mull -0x10(%ebp) e7da: 89 d1 mov%edx,%ecx e7dc: d1 e0 shl%eax e7de: 83 d1 00adc$0x0,%ecx e7e1: 83 d6 00adc$0x0,%esi e7e4: 83 
d7 00 adc $0x0,%edi
e7e7: 8b 45 e4 mov -0x1c(%ebp),%eax
e7ea: f7 65 f0 mull -0x10(%ebp)
e7ed: 01 c1 add %eax,%ecx
e7ef: 11 d6 adc %edx,%esi
e7f1: 83 d7 00 adc $0x0,%edi
e7f4: 8b 45 e8 mov -0x18(%ebp),%eax
e7f7: f7 65 ec mull -0x14(%ebp)
e7fa: 01 c1 add %eax,%ecx
e7fc: 11 d6 adc %edx,%esi
e7fe: 83 d7 00 adc $0x0,%edi
e801: 8b 45 e8 mov -0x18(%ebp),%eax
e804: f7 67 08 mull 0x8(%edi)

Problem is here: edi is used by gcc as if it contained an address whereas it is used by the assembly for the computation. Should be marked early clobber. So, in include/asm-x86/arith_32.h, replace: : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t) with: : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t)

No cigar (:-()

Ok. Maybe we can try something less radical, such as: : [rl]=c(rl), [rm]=S(rm), [rh]=D(rh), =A(t) This is incorrect, but we can hope for the best...

As previously said, changing the optimization from -Os to anything else for xeno_nucleus (see patch in mail dated 'Tue, 07 Dec 2010 17:20:37 +0100'), solved that issue (incorrect code + hope for the best -> spurious disasters). Rather compile time errors than runtime errors. /Anders -- Anders Blomdell Email: anders.blomd...@control.lth.se Department of Automatic Control Lund University Phone:+46 46 222 4625 P.O. Box 118 Fax: +46 46 138118 SE-221 00 Lund, Sweden
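As a side note on the "early clobber" remark quoted above, the following is a small sketch of what the `&` constraint modifier is for. It is deliberately unrelated to the arith_32.h code: add_early_clobber is a hypothetical name, the x86 branch assumes a GCC-compatible compiler, and other targets fall back to plain C.

```c
/* Hypothetical illustration of the "&" (early clobber) constraint modifier. */
unsigned add_early_clobber(unsigned a, unsigned b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned out;
    /*
     * The first instruction writes %0 while %2 is still needed by the
     * second one. "=&r" tells the compiler that %0 is written before all
     * inputs are consumed, so it must not share a register with any of
     * them. With a plain "=r" the compiler would be free to place `out`
     * and `b` in the same register, and the mov would destroy `b`.
     */
    __asm__("movl %1, %0\n\t"
            "addl %2, %0"
            : "=&r"(out)
            : "r"(a), "r"(b)
            : "cc");
    return out;
#else
    return a + b; /* portable fallback, no inline assembly */
#endif
}
```

The situation in the listing is of the same kind: an output register (edi) is written by the assembly while the compiler still relies on it to address a memory input operand.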
Re: [Xenomai-core] Potential problem with rt_eepro100
Gilles Chanteperdrix wrote: Anders Blomdell wrote: Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 05.11.2010 00:24, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 04.11.2010 23:06, Gilles Chanteperdrix wrote: Jan Kiszka wrote:

At first sight, here you are more breaking things than cleaning them.

Still, it has the SMP record for my test program, still runs with ftrace on (after 2 hours, where it previously failed after maximum 23 minutes).

My version was indeed still buggy, I'm reworking it ATM.

If I get the gist of Jan's changes, they are (using the IPI to transfer one bit of information: your cpu needs to reschedule): xnsched_set_resched: - setbits((__sched__)->status, XNRESCHED); xnpod_schedule_handler: + xnsched_set_resched(sched); If you (we?) decide to keep the debug checks, under what circumstances would the current check trigger (in layman's language, that I'll be able to understand)?

That's actually what /me is wondering as well. I do not see yet how you can reliably detect a missed reschedule (that was the purpose of the debug check) given the racy nature between signaling resched and processing the resched hints.

The purpose of the debugging change is to detect a change of the scheduler state which was not followed by setting the XNRESCHED bit.

But that is nucleus business, nothing skins can screw up (as long as they do not misuse APIs).

Yes, but it happens that we modify the nucleus from time to time.

Getting it to work is relatively simple: we add a "scheduler change set remotely" bit to the sched structure which is NOT in the status bit, set this bit when changing a remote sched (under nklock). In the debug check code, if the scheduler state changed, and the XNRESCHED bit is not set, only consider this a bug if this new bit is not set. All this is compiled out if the debug is not enabled.

I still see no benefit in this check. Where do you want to place the bit set? Aren't those just the same locations where xnsched_set_[self_]resched already is today?

Well no, that would be another bit in the sched structure which would allow us to manipulate the status bits from the local cpu. That supplementary bit would only be changed from a distant CPU, and serve to detect the race which causes the false positive. The resched bits are set on the local cpu to get xnpod_schedule to trigger a rescheduling on the distant cpu. That bit would be set on the remote cpu's sched. Only when debugging is enabled.

But maybe you can provide some motivating bug scenarios, real ones of the past or realistic ones of the future.

Of course. The bug is anything which changes the scheduler state but does not set the XNRESCHED bit. This happened when we started the SMP port. New scheduling policies would be good candidates for a revival of this bug.

You don't gain any worthwhile check if you cannot make the instrumentation required for a stable detection simpler than the proper problem solution itself. And this is what I'm still skeptical of.

The solution is simple, but finding the problem without the instrumentation is way harder than with the instrumentation, so the instrumentation is worth something. Reproducing the false positive is surprisingly easy with a simple dual-cpu semaphore ping-pong test.
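The debug-flag idea discussed above can be modelled in user space roughly as follows. This is a single-threaded sketch under stated assumptions: struct sched_model, model_set_resched, model_missed_resched and the value given to XNRESCHED are all invented for the illustration, and the locking (nklock) and IPI delivery of the real nucleus are left out entirely.

```c
#include <stdbool.h>

#define XNRESCHED 0x1u /* model value only, not the real nucleus constant */

/* Hypothetical, stripped-down stand-in for the per-CPU scheduler slot. */
struct sched_model {
    unsigned status;                /* status bits, touched only by the owning CPU */
    unsigned resched_mask;          /* remote CPUs that still need a resched IPI */
    int cpu;                        /* index of the CPU owning this slot */
    bool debug_resched_from_remote; /* set only by a remote CPU, debug builds only */
};

/* Ask for a reschedule of `target` from the CPU that owns `local`. */
void model_set_resched(struct sched_model *local, struct sched_model *target)
{
    local->status |= XNRESCHED;
    if (local != target) {
        /* Remember that a remote CPU changed target's scheduler state. */
        target->debug_resched_from_remote = true;
        local->resched_mask |= 1u << target->cpu;
    }
}

/*
 * Debug check: a scheduler state change is reported as a bug only if
 * neither the local XNRESCHED bit nor the remote marker accounts for it.
 */
bool model_missed_resched(const struct sched_model *s, bool state_changed)
{
    return state_changed
        && !(s->status & XNRESCHED)
        && !s->debug_resched_from_remote;
}
```

In this model, a state change on CPU 1 that was requested from CPU 0 no longer trips the check on CPU 1, because the extra flag records the remote request even though CPU 1's own XNRESCHED bit is still clear.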
So, here is the (tested) patch, using a ridiculously long variable name to illustrate what I was thinking about:

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index cf4..454b8e8 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -108,6 +108,9 @@ typedef struct xnsched {
 	struct xnthread *gktarget;
 #endif
+#ifdef CONFIG_XENO_OPT_DEBUG_NUCLEUS
+	int debug_resched_from_remote;
+#endif
 } xnsched_t;
 union xnsched_policy_param;
@@ -185,6 +188,8 @@ static inline int xnsched_resched_p(struct xnsched *sched)
 	xnsched_t *current_sched = xnpod_current_sched(); \
 	__setbits(current_sched->status, XNRESCHED); \
 	if (current_sched != (__sched__)) { \
+		if (XENO_DEBUG(NUCLEUS)) \
+			__sched__->debug_resched_from_remote = 1; \
 		xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \
 	} \
 } while (0)
diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 4cb707a..50b0f49 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -2177,6 +2177,10 @@ static inline int __xnpod_test_resched(struct xnsched *sched)
 		xnarch_cpus_clear(sched->resched);
 	}
 #endif
+	if (XENO_DEBUG(NUCLEUS) && sched->debug_resched_from_remote) {
+		sched->debug_resched_from_remote = 0;
+		resched = 1;
+	}
 	clrbits(sched->status, XNRESCHED);
 	return resched;
 }

I am still uncertain. Will only work if all is done under nklock, otherwise two
Re: [Xenomai-core] Potential problem with rt_eepro100
Gilles Chanteperdrix wrote: Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 05.11.2010 00:25, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 04.11.2010 23:08, Gilles Chanteperdrix wrote: Jan Kiszka wrote: rework. Safer for now is likely to revert 56ff4329ff, keeping nucleus debugging off. That is not enough. It is, I've reviewed the code today. The fallouts I am talking about are: 47dac49c71e89b684203e854d1b0172ecacbc555 Not related. 38f2ca83a8e63cc94eaa911ff1c0940c884b5078 An optimization. 5e7cfa5c25672e4478a721eadbd6f6c5b4f88a2f That fall out of that commit is fixed in my series. This commit was followed by several others to fix the fix. You know how things are, someone proposes a fix, which fixes things for him, but it breaks in the other people configurations (one of the fallouts was a complete revamp of include/asm-arm/atomic.h for instance). I've pushed a series that reverts that commit, then fixes and cleans up on top of it. Just pushed if you want to take a look. We can find some alternative debugging mechanism independently (though I'm curious to see it - it still makes no sense to me). Since the fix is simply a modification to what we have currently. I would prefer if we did not remove it. In fact, I think it would be simpler if we started from what we currently have than reverting past patches. Look at the series, it goes step by step to an IMHO clean state. We can pull out the debugging check removal, though, if you prefer to work on top of the existing code. 
From my point of view, Anders is looking for something that works, so following the rule that a minimal set of changes minimizes the chances of introducing new bugs while cleaning, I would go for the minimal set of changes, such as:

The tested one (on SMP, and UP with and without unlocked ctx switch):

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index df56417..cf4 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -165,28 +165,27 @@ struct xnsched_class {
 #endif /* CONFIG_SMP */
 /* Test all resched flags from the given scheduler mask. */
-static inline int xnsched_resched_p(struct xnsched *sched)
+static inline int xnsched_remote_resched_p(struct xnsched *sched)
 {
-	return testbits(sched->status, XNRESCHED);
+	return !xnarch_cpus_empty(sched->resched);
 }
-static inline int xnsched_self_resched_p(struct xnsched *sched)
+static inline int xnsched_resched_p(struct xnsched *sched)
 {
 	return testbits(sched->status, XNRESCHED);
 }
 /* Set self resched flag for the given scheduler. */
 #define xnsched_set_self_resched(__sched__) do { \
-	setbits((__sched__)->status, XNRESCHED); \
+	__setbits((__sched__)->status, XNRESCHED); \
 } while (0)
 /* Set specific resched flag into the local scheduler mask. */
 #define xnsched_set_resched(__sched__) do { \
 	xnsched_t *current_sched = xnpod_current_sched(); \
-	setbits(current_sched->status, XNRESCHED); \
+	__setbits(current_sched->status, XNRESCHED); \
 	if (current_sched != (__sched__)) { \
 		xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \
-		setbits((__sched__)->status, XNRESCHED); \
 	} \
 } while (0)
diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 862838c..4cb707a 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -276,18 +276,16 @@ EXPORT_SYMBOL_GPL(xnpod_fatal_helper);
 void xnpod_schedule_handler(void) /* Called with hw interrupts off. */
 {
-	xnsched_t *sched;
+	xnsched_t *sched = xnpod_current_sched();
 	trace_mark(xn_nucleus, sched_remote, MARK_NOARGS);
 #if defined(CONFIG_SMP) && defined(CONFIG_XENO_OPT_PRIOCPL)
-	sched = xnpod_current_sched();
 	if (testbits(sched->status, XNRPICK)) {
 		clrbits(sched->status, XNRPICK);
 		xnshadow_rpi_check();
 	}
-#else
-	(void)sched;
 #endif /* CONFIG_SMP && CONFIG_XENO_OPT_PRIOCPL */
+	xnsched_set_self_resched(sched);
 	xnpod_schedule();
 }
@@ -2174,7 +2172,7 @@ static inline int __xnpod_test_resched(struct xnsched *sched)
 	int resched = testbits(sched->status, XNRESCHED);
 #ifdef CONFIG_SMP
 	/* Send resched IPI to remote CPU(s). */
-	if (unlikely(xnsched_resched_p(sched))) {
+	if (unlikely(xnsched_remote_resched_p(sched))) {
 		xnarch_send_ipi(sched->resched);
 		xnarch_cpus_clear(sched->resched);
 	}
diff --git a/ksrc/nucleus/timer.c b/ksrc/nucleus/timer.c
index 1fe3331..a0ac627 100644
--- a/ksrc/nucleus/timer.c
+++ b/ksrc/nucleus/timer.c
@@ -97,7 +97,7 @@ void xntimer_next_local_shot(xnsched_t *sched)
 	__clrbits(sched->status, XNHDEFER);
 	timer = aplink2timer(h);
Re: [Xenomai-core] Potential problem with rt_eepro100
Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 05.11.2010 00:24, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 04.11.2010 23:06, Gilles Chanteperdrix wrote: Jan Kiszka wrote:

At first sight, here you are more breaking things than cleaning them.

Still, it has the SMP record for my test program, still runs with ftrace on (after 2 hours, where it previously failed after maximum 23 minutes).

My version was indeed still buggy, I'm reworking it ATM.

If I get the gist of Jan's changes, they are (using the IPI to transfer one bit of information: your cpu needs to reschedule): xnsched_set_resched: - setbits((__sched__)->status, XNRESCHED); xnpod_schedule_handler: + xnsched_set_resched(sched); If you (we?) decide to keep the debug checks, under what circumstances would the current check trigger (in layman's language, that I'll be able to understand)?

That's actually what /me is wondering as well. I do not see yet how you can reliably detect a missed reschedule (that was the purpose of the debug check) given the racy nature between signaling resched and processing the resched hints.

The purpose of the debugging change is to detect a change of the scheduler state which was not followed by setting the XNRESCHED bit.

But that is nucleus business, nothing skins can screw up (as long as they do not misuse APIs).

Yes, but it happens that we modify the nucleus from time to time.

Getting it to work is relatively simple: we add a "scheduler change set remotely" bit to the sched structure which is NOT in the status bit, set this bit when changing a remote sched (under nklock). In the debug check code, if the scheduler state changed, and the XNRESCHED bit is not set, only consider this a bug if this new bit is not set. All this is compiled out if the debug is not enabled.

I still see no benefit in this check. Where do you want to place the bit set? Aren't those just the same locations where xnsched_set_[self_]resched already is today?

Well no, that would be another bit in the sched structure which would allow us to manipulate the status bits from the local cpu. That supplementary bit would only be changed from a distant CPU, and serve to detect the race which causes the false positive. The resched bits are set on the local cpu to get xnpod_schedule to trigger a rescheduling on the distant cpu. That bit would be set on the remote cpu's sched. Only when debugging is enabled.

But maybe you can provide some motivating bug scenarios, real ones of the past or realistic ones of the future.

Of course. The bug is anything which changes the scheduler state but does not set the XNRESCHED bit. This happened when we started the SMP port. New scheduling policies would be good candidates for a revival of this bug.

You don't gain any worthwhile check if you cannot make the instrumentation required for a stable detection simpler than the proper problem solution itself. And this is what I'm still skeptical of.

The solution is simple, but finding the problem without the instrumentation is way harder than with the instrumentation, so the instrumentation is worth something. Reproducing the false positive is surprisingly easy with a simple dual-cpu semaphore ping-pong test.
So, here is the (tested) patch, using a ridiculously long variable name to illustrate what I was thinking about:

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index cf4..454b8e8 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -108,6 +108,9 @@ typedef struct xnsched {
 	struct xnthread *gktarget;
 #endif

+#ifdef CONFIG_XENO_OPT_DEBUG_NUCLEUS
+	int debug_resched_from_remote;
+#endif
 } xnsched_t;

 union xnsched_policy_param;
@@ -185,6 +188,8 @@ static inline int xnsched_resched_p(struct xnsched *sched)
   xnsched_t *current_sched = xnpod_current_sched();			\
   __setbits(current_sched->status, XNRESCHED);				\
   if (current_sched != (__sched__)){					\
+	if (XENO_DEBUG(NUCLEUS))					\
+		__sched__->debug_resched_from_remote = 1;		\
       xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched);	\
   }									\
 } while (0)
diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 4cb707a..50b0f49 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -2177,6 +2177,10 @@ static inline int __xnpod_test_resched(struct xnsched *sched)
 		xnarch_cpus_clear(sched->resched);
 	}
 #endif
+	if (XENO_DEBUG(NUCLEUS) && sched->debug_resched_from_remote) {
+		sched->debug_resched_from_remote = 0;
+		resched = 1;
+	}
 	clrbits(sched->status, XNRESCHED);
 	return resched;
 }

I am still uncertain. Will only work if all is done under nklock, otherwise two almost simultaneous xnsched_resched_p from different
Re: [Xenomai-core] Potential problem with rt_eepro100
Jan Kiszka wrote: Am 04.11.2010 01:13, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 04.11.2010 00:56, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 04.11.2010 00:44, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 04.11.2010 00:18, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 04.11.2010 00:11, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 03.11.2010 23:11, Jan Kiszka wrote: Am 03.11.2010 23:03, Jan Kiszka wrote: But we not not always use atomic ops for manipulating status bits (but we do in other cases where this is no need - different story). This may fix the race: Err, nonsense. As we manipulate xnsched::status also outside of nklock protection, we must _always_ use atomic ops. This screams for a cleanup: local-only bits like XNHTICK or XNINIRQ should be pushed in a separate status word that can then be safely modified non-atomically. Second try to fix and clean up the sched status bits. Anders, please test. Jan diff --git a/include/nucleus/pod.h b/include/nucleus/pod.h index 01ff0a7..5987a1f 100644 --- a/include/nucleus/pod.h +++ b/include/nucleus/pod.h @@ -277,12 +277,10 @@ static inline void xnpod_schedule(void) * context is active, or if we are caught in the middle of a * unlocked context switch. */ -#if XENO_DEBUG(NUCLEUS) if (testbits(sched-status, XNKCOUT|XNINIRQ|XNSWLOCK)) return; -#else /* !XENO_DEBUG(NUCLEUS) */ - if (testbits(sched-status, -XNKCOUT|XNINIRQ|XNSWLOCK|XNRESCHED) != XNRESCHED) +#if !XENO_DEBUG(NUCLEUS) + if (!sched-resched) return; #endif /* !XENO_DEBUG(NUCLEUS) */ Having only one test was really nice here, maybe we simply read a barrier before reading the status? I agree - but the alternative is letting all modifications of xnsched::status use atomic bitops (that's required when folding all bits into a single word). And that should be much more costly, specifically on SMP. What about issuing a barrier before testing the status? 
The problem is not about reading but writing the status concurrently, thus it's not about the code you see above.

The bits are modified under nklock, which implies a barrier when unlocked. Furthermore, an IPI is guaranteed to be received on the remote CPU after this barrier, so a barrier should be enough to see the modifications which have been made remotely.

Check nucleus/intr.c for tons of unprotected status modifications.

Ok. Then maybe we should reconsider the original decision to start fiddling with the XNRESCHED bit remotely.

...which removed complexity and fixed a race? Let's better review the checks done in xnpod_schedule vs. its callers, I bet there is more to save (IOW: remove the need to test for sched->resched).

Not that much complexity... and the race was a false positive in debug code, no big deal. At least it worked, and it has done so for a long time. No atomic needed, no barrier, only one test in xnpod_schedule. And a nice invariant: sched->status is always accessed on the local cpu. What else?

Take a step back and look at the root cause for this issue again. Unlocked if need-resched __xnpod_schedule is inherently racy and will always be (not only for the remote reschedule case BTW). So we either have to accept this and remove the debugging check from the scheduler or push the check back to __xnpod_schedule where it once came from. When this is cleaned up, we can look into the remote resched protocol again.

Probably being daft here; why not stop fiddling with remote CPU status bits and always do a reschedule on IPI irq's?

/Anders
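The cleanup proposed earlier in this message (pushing local-only bits such as XNHTICK or XNINIRQ into a separate word, so that only the shared word needs atomic operations) might look roughly like the sketch below. It is a standalone illustration using C11 atomics, not the Xenomai implementation: the struct, the flag values and all demo_* names are assumptions made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag values; the real Xenomai constants differ. */
#define DEMO_RESCHED    0x1u  /* shared word: may be set from another CPU */
#define DEMO_INIRQ      0x1u  /* local word: owner CPU only */
#define DEMO_HTICK      0x2u  /* local word: owner CPU only */
#define DEMO_LOCAL_MASK (DEMO_INIRQ | DEMO_HTICK)

struct demo_sched {
	atomic_uint status;   /* shared bits, atomic read-modify-write only */
	unsigned int lflags;  /* local bits, touched by the owning CPU only */
};

/* Shared word: any CPU may call this, so use an atomic OR. */
static inline void demo_set_status(struct demo_sched *s, unsigned int bits)
{
	atomic_fetch_or(&s->status, bits);
}

/* Atomically clear 'bits' and return those that were set beforehand. */
static inline unsigned int demo_test_and_clear_status(struct demo_sched *s,
						      unsigned int bits)
{
	return atomic_fetch_and(&s->status, ~bits) & bits;
}

/* Local word: plain read-modify-write, valid on the owning CPU only. */
static inline void demo_set_lflags(struct demo_sched *s, unsigned int bits)
{
	assert((bits & ~DEMO_LOCAL_MASK) == 0); /* only local flags allowed */
	s->lflags |= bits;
}

static inline void demo_clear_lflags(struct demo_sched *s, unsigned int bits)
{
	assert((bits & ~DEMO_LOCAL_MASK) == 0);
	s->lflags &= ~bits;
}
```

The point of the split is that the hot, local-only paths (IRQ entry and exit, tick handling) stay free of atomic instructions, while anything another CPU may touch goes through the shared word.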
Re: [Xenomai-core] Potential problem with rt_eepro100
Jan Kiszka wrote: Am 04.11.2010 10:26, Jan Kiszka wrote: Am 04.11.2010 10:16, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Take a step back and look at the root cause for this issue again. Unlocked if need-resched __xnpod_schedule is inherently racy and will always be (not only for the remote reschedule case BTW). Ok, let us examine what may happen with this code if we only set the XNRESCHED bit on the local cpu. First, other bits than XNRESCHED do not matter, because they can not change under our feet. So, we have two cases for this race: 1- we see the XNRESCHED bit, but it has been cleared once nklock is locked in __xnpod_schedule. 2- we do not see the XNRESCHED bit, but it get set right after we test it. 1 is not a problem. Yes, as long as we remove the debug check from the scheduler code (or fix it somehow). The scheduling code already catches this race. 2 is not a problem, because anything which sets the XNRESCHED (it may only be an interrupt in fact) bit will cause xnpod_schedule to be called right after that. So no, no race here provided that we only set the XNRESCHED bit on the local cpu. So we either have to accept this and remove the debugging check from the scheduler or push the check back to __xnpod_schedule where it once came from. When this it cleaned up, we can look into the remote resched protocol again. The problem of the debug check is that it checks whether the scheduler state is modified without the XNRESCHED bit being set. And this is the problem, because yes, in that case, we have a race: the scheduler state may be modified before the XNRESCHED bit is set by an IPI. If we want to fix the debug check, we have to have a special bit, on in the sched-status flag, only for the purpose of debugging. Or remove the debug check. Exactly my point. Is there any benefit in keeping the debug check? The code to make it work may end up as complex as the logic it verifies, at least that's my current feeling. 
This would be the radical approach of removing the check (and cleaning up some bits). If it's acceptable, I would split it up properly.

diff --git a/include/nucleus/pod.h b/include/nucleus/pod.h
index 01ff0a7..71f8311 100644
--- a/include/nucleus/pod.h
+++ b/include/nucleus/pod.h
@@ -277,14 +277,9 @@ static inline void xnpod_schedule(void)
 	 * context is active, or if we are caught in the middle of a
 	 * unlocked context switch.
 	 */
-#if XENO_DEBUG(NUCLEUS)
-	if (testbits(sched->status, XNKCOUT|XNINIRQ|XNSWLOCK))
-		return;
-#else /* !XENO_DEBUG(NUCLEUS) */
 	if (testbits(sched->status,
 		     XNKCOUT|XNINIRQ|XNSWLOCK|XNRESCHED) != XNRESCHED)
 		return;
-#endif /* !XENO_DEBUG(NUCLEUS) */

 	__xnpod_schedule(sched);
 }
diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index df56417..c832b91 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -177,17 +177,16 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)

 /* Set self resched flag for the given scheduler. */
 #define xnsched_set_self_resched(__sched__) do {		\
-  setbits((__sched__)->status, XNRESCHED);			\
+  __setbits((__sched__)->status, XNRESCHED);			\
 } while (0)

 /* Set specific resched flag into the local scheduler mask. */
 #define xnsched_set_resched(__sched__) do {			\
-  xnsched_t *current_sched = xnpod_current_sched();		\
-  setbits(current_sched->status, XNRESCHED);			\
-  if (current_sched != (__sched__)){				\
-      xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched);	\
-      setbits((__sched__)->status, XNRESCHED);			\
-  }								\
+  xnsched_t *current_sched = xnpod_current_sched();		\
+  __setbits(current_sched->status, XNRESCHED);			\
+  if (current_sched != (__sched__))				\
+      xnarch_cpu_set(xnsched_cpu(__sched__),			\
+		     current_sched->resched);			\
 } while (0)

 void xnsched_zombie_hooks(struct xnthread *thread);
diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 9e135f3..87dc136 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -284,10 +284,11 @@ void xnpod_schedule_handler(void) /* Called with hw interrupts off. */

 	trace_xn_nucleus_sched_remote(sched);
 #if defined(CONFIG_SMP) && defined(CONFIG_XENO_OPT_PRIOCPL)
 	if (testbits(sched->status, XNRPICK)) {
-		clrbits(sched->status, XNRPICK);
+		__clrbits(sched->status, XNRPICK);
 		xnshadow_rpi_check();
 	}
 #endif /* CONFIG_SMP && CONFIG_XENO_OPT_PRIOCPL */
+	xnsched_set_resched(sched);
 	xnpod_schedule();
 }
@@ -2162,21 +2163,21
Re: [Xenomai-core] Potential problem with rt_eepro100
Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 04.11.2010 10:26, Jan Kiszka wrote: Am 04.11.2010 10:16, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Take a step back and look at the root cause for this issue again. Unlocked if need-resched __xnpod_schedule is inherently racy and will always be (not only for the remote reschedule case BTW). Ok, let us examine what may happen with this code if we only set the XNRESCHED bit on the local cpu. First, other bits than XNRESCHED do not matter, because they can not change under our feet. So, we have two cases for this race: 1- we see the XNRESCHED bit, but it has been cleared once nklock is locked in __xnpod_schedule. 2- we do not see the XNRESCHED bit, but it get set right after we test it. 1 is not a problem. Yes, as long as we remove the debug check from the scheduler code (or fix it somehow). The scheduling code already catches this race. 2 is not a problem, because anything which sets the XNRESCHED (it may only be an interrupt in fact) bit will cause xnpod_schedule to be called right after that. So no, no race here provided that we only set the XNRESCHED bit on the local cpu. So we either have to accept this and remove the debugging check from the scheduler or push the check back to __xnpod_schedule where it once came from. When this it cleaned up, we can look into the remote resched protocol again. The problem of the debug check is that it checks whether the scheduler state is modified without the XNRESCHED bit being set. And this is the problem, because yes, in that case, we have a race: the scheduler state may be modified before the XNRESCHED bit is set by an IPI. If we want to fix the debug check, we have to have a special bit, on in the sched-status flag, only for the purpose of debugging. Or remove the debug check. Exactly my point. Is there any benefit in keeping the debug check? The code to make it work may end up as complex as the logic it verifies, at least that's my current feeling. 
This would be the radical approach of removing the check (and cleaning up some bits). If it's acceptable, I would split it up properly.

This debug check saved our asses when debugging SMP issues, and I suspect it may help debugging skin issues. So, I think we should try and keep it. At first sight, here you are more breaking things than cleaning them.

Still, it has the SMP record for my test program, still runs with ftrace on (after 2 hours, where it previously failed after maximum 23 minutes).

If I get the gist of Jan's changes, they are (using the IPI to transfer one bit of information: your cpu needs to reschedule):

xnsched_set_resched:
-      setbits((__sched__)->status, XNRESCHED);

xnpod_schedule_handler:
+	xnsched_set_resched(sched);

If you (we?) decide to keep the debug checks, under what circumstances would the current check trigger (in layman's language, that I'll be able to understand)?

/Anders
Re: [Xenomai-core] Potential problem with rt_eepro100
Jan Kiszka wrote: Am 04.11.2010 14:18, Anders Blomdell wrote: Gilles Chanteperdrix wrote: Jan Kiszka wrote: Am 04.11.2010 10:26, Jan Kiszka wrote: Am 04.11.2010 10:16, Gilles Chanteperdrix wrote: Jan Kiszka wrote: Take a step back and look at the root cause for this issue again. Unlocked if need-resched __xnpod_schedule is inherently racy and will always be (not only for the remote reschedule case BTW). Ok, let us examine what may happen with this code if we only set the XNRESCHED bit on the local cpu. First, other bits than XNRESCHED do not matter, because they can not change under our feet. So, we have two cases for this race: 1- we see the XNRESCHED bit, but it has been cleared once nklock is locked in __xnpod_schedule. 2- we do not see the XNRESCHED bit, but it get set right after we test it. 1 is not a problem. Yes, as long as we remove the debug check from the scheduler code (or fix it somehow). The scheduling code already catches this race. 2 is not a problem, because anything which sets the XNRESCHED (it may only be an interrupt in fact) bit will cause xnpod_schedule to be called right after that. So no, no race here provided that we only set the XNRESCHED bit on the local cpu. So we either have to accept this and remove the debugging check from the scheduler or push the check back to __xnpod_schedule where it once came from. When this it cleaned up, we can look into the remote resched protocol again. The problem of the debug check is that it checks whether the scheduler state is modified without the XNRESCHED bit being set. And this is the problem, because yes, in that case, we have a race: the scheduler state may be modified before the XNRESCHED bit is set by an IPI. If we want to fix the debug check, we have to have a special bit, on in the sched-status flag, only for the purpose of debugging. Or remove the debug check. Exactly my point. Is there any benefit in keeping the debug check? 
The code to make it work may end up as complex as the logic it verifies, at least that's my current feeling.

This would be the radical approach of removing the check (and cleaning up some bits). If it's acceptable, I would split it up properly.

This debug check saved our asses when debugging SMP issues, and I suspect it may help debugging skin issues. So, I think we should try and keep it. At first sight, here you are more breaking things than cleaning them.

Still, it has the SMP record for my test program, still runs with ftrace on (after 2 hours, where it previously failed after maximum 23 minutes).

My version was indeed still buggy, I'm reworking it ATM.

Any reason why the two changes below would fail? (I need to get things working real soon now.)

If I get the gist of Jan's changes, they are (using the IPI to transfer one bit of information: your cpu needs to reschedule):

xnsched_set_resched:
-      setbits((__sched__)->status, XNRESCHED);

xnpod_schedule_handler:
+	xnsched_set_resched(sched);

If you (we?) decide to keep the debug checks, under what circumstances would the current check trigger (in layman's language, that I'll be able to understand)?

That's actually what /me is wondering as well. I do not see yet how you can reliably detect a missed reschedule (that was the purpose of the debug check) given the racy nature between signaling resched and processing the resched hints.

The only thing I can think of is atomic set/clear on an independent variable.

/Anders
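The "atomic set/clear on an independent variable" idea could be sketched as follows. This is only an illustration under assumptions: the struct, the remote_hint field and the demo_* functions are hypothetical names, C11 atomics stand in for whatever primitives the nucleus would use, and the check is reduced to a single boolean "state changed" input.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_RESCHED 0x1u  /* hypothetical value, not the Xenomai constant */

struct demo_sched {
	unsigned int status;      /* status word, written by the owning CPU only */
	atomic_bool remote_hint;  /* hypothetical: "a remote CPU changed my state" */
};

/* Called by a remote CPU right after it modified this scheduler's state. */
static inline void demo_mark_remote_change(struct demo_sched *target)
{
	assert(target);
	atomic_store(&target->remote_hint, true);
}

/*
 * Debug check on the owning CPU. Reports a missed reschedule only if the
 * scheduler state changed while neither the local resched bit nor the
 * remote hint was set. The hint is consumed (cleared) atomically.
 */
static inline bool demo_missed_resched(struct demo_sched *self,
				       bool state_changed)
{
	bool remote = atomic_exchange(&self->remote_hint, false);

	if (!state_changed)
		return false;
	return !(self->status & DEMO_RESCHED) && !remote;
}
```

Because the hint lives outside the status word, the status bits can stay local-only, and the extra variable can be compiled out together with the debug check.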
Re: [Xenomai-core] Potential problem with rt_eepro100
Anders Blomdell wrote: Jan Kiszka wrote: Am 01.11.2010 17:55, Anders Blomdell wrote: Jan Kiszka wrote: Am 28.10.2010 11:34, Anders Blomdell wrote: Jan Kiszka wrote: Am 28.10.2010 09:34, Anders Blomdell wrote: Anders Blomdell wrote: Anders Blomdell wrote: Hi, I'm trying to use rt_eepro100, for sending raw ethernet packets, but I'm experincing occasionally weird behaviour. Versions of things: linux-2.6.34.5 xenomai-2.5.5.2 rtnet-39f7fcf The testprogram runs on two computers with Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 08) controller, where one computer acts as a mirror sending back packets received from the ethernet (only those two computers on the network), and the other sends packets and measures roundtrip time. Most packets comes back in approximately 100 us, but occasionally the reception times out (once in about 10 packets or more), but the packets gets immediately received when reception is retried, which might indicate a race between rt_dev_recvmsg and interrupt, but I might miss something obvious. Changing one of the ethernet cards to a Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05), while keeping everything else constant, changes behavior somewhat; after receiving a few 10 packets, reception stops entirely (-EAGAIN is returned), while transmission proceeds as it should (and mirror returns packets). Any suggestions on what to try? Since the problem disappears with 'maxcpus=1', I suspect I have a SMP issue (machine is a Core2 Quad), so I'll move to xenomai-core. (original message can be found at http://sourceforge.net/mailarchive/message.php?msg_name=4CC82C8D.3080808%40control.lth.se ) Xenomai-core gurus: which is the corrrect way to debug SMP issues? Can I run I-pipe-tracer and expect to be able save at least 150 us of traces for all cpus? Any hints/suggestions/insigths are welcome... The i-pipe tracer unfortunately only saves traces for a the CPU that triggered the freeze. 
To have a full picture, you may want to try my ftrace port I posted recently for 2.6.35.

2.6.35.7 ?

Exactly.

Finally managed to get the ftrace to work (one possible bug: had to manually copy include/xenomai/trace/xn_nucleus.h to include/xenomai/trace/events/xn_nucleus.h), and it looks like it can be very useful... But I don't think it will give much info at the moment, since no xenomai/ipipe interrupt activity shows up, and adding that is far above my league :-(

You could use the function tracer, provided you are able to stop the trace quickly enough on error.

My current theory is that the problem occurs when something like this takes place:

CPU-i           CPU-j     CPU-k           CPU-l
rt_dev_sendmsg  xmit_irq  rt_dev_recvmsg  recv_irq

Can't follow. What races here, and what will go wrong then?

That's the good question. Find attached:

1. .config (so you can check for stupid mistakes)
2. console log
3. latest version of test program
4. tail of ftrace dump

These are the xenomai tasks running when the test program is active:

CPU  PID    CLASS  PRI  TIMEOUT  TIMEBASE  STAT  NAME
0    0      idle   -1   -        master    R     ROOT/0
1    0      idle   -1   -        master    R     ROOT/1
2    0      idle   -1   -        master    R     ROOT/2
3    0      idle   -1   -        master    R     ROOT/3
0    0      rt     98   -        master    W     rtnet-stack
0    0      rt     0    -        master    W     rtnet-rtpc
0    29901  rt     50   -        master          raw_test
0    29906  rt     0    -        master    X     reporter

The lines of interest from the trace are probably:

[003] 2061.347855: xn_nucleus_thread_resume: thread=f9bf7b00 thread_name=rtnet-stack mask=2
[003] 2061.347862: xn_nucleus_sched: status=200
[000] 2061.347866: xn_nucleus_sched_remote: status=0

since this is the only place where a packet gets delayed, and the only place in the trace where sched_remote reports a status=0

Since the cpu that has rtnet-stack and hence should be resumed is doing heavy I/O at the time of fault; could it be that send_ipi/schedule_handler needs barriers to make sure that decisions are made on the right status?

/Anders
Re: [Xenomai-core] Potential problem with rt_eepro100
On 2010-11-03 12.55, Jan Kiszka wrote: Am 03.11.2010 12:50, Jan Kiszka wrote: Am 03.11.2010 12:44, Anders Blomdell wrote: Anders Blomdell wrote: Jan Kiszka wrote: Am 01.11.2010 17:55, Anders Blomdell wrote: Jan Kiszka wrote: Am 28.10.2010 11:34, Anders Blomdell wrote: Jan Kiszka wrote: Am 28.10.2010 09:34, Anders Blomdell wrote: Anders Blomdell wrote: Anders Blomdell wrote: Hi, I'm trying to use rt_eepro100, for sending raw ethernet packets, but I'm experincing occasionally weird behaviour. Versions of things: linux-2.6.34.5 xenomai-2.5.5.2 rtnet-39f7fcf The testprogram runs on two computers with Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 08) controller, where one computer acts as a mirror sending back packets received from the ethernet (only those two computers on the network), and the other sends packets and measures roundtrip time. Most packets comes back in approximately 100 us, but occasionally the reception times out (once in about 10 packets or more), but the packets gets immediately received when reception is retried, which might indicate a race between rt_dev_recvmsg and interrupt, but I might miss something obvious. Changing one of the ethernet cards to a Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05), while keeping everything else constant, changes behavior somewhat; after receiving a few 10 packets, reception stops entirely (-EAGAIN is returned), while transmission proceeds as it should (and mirror returns packets). Any suggestions on what to try? Since the problem disappears with 'maxcpus=1', I suspect I have a SMP issue (machine is a Core2 Quad), so I'll move to xenomai-core. (original message can be found at http://sourceforge.net/mailarchive/message.php?msg_name=4CC82C8D.3080808%40control.lth.se ) Xenomai-core gurus: which is the corrrect way to debug SMP issues? Can I run I-pipe-tracer and expect to be able save at least 150 us of traces for all cpus? Any hints/suggestions/insigths are welcome... 
The i-pipe tracer unfortunately only saves traces for the CPU that triggered the freeze. To have a full picture, you may want to try my ftrace port I posted recently for 2.6.35.

2.6.35.7 ?

Exactly.

Finally managed to get the ftrace to work (one possible bug: had to manually copy include/xenomai/trace/xn_nucleus.h to include/xenomai/trace/events/xn_nucleus.h), and it looks like it can be very useful... But I don't think it will give much info at the moment, since no xenomai/ipipe interrupt activity shows up, and adding that is far above my league :-(

You could use the function tracer, provided you are able to stop the trace quickly enough on error.

My current theory is that the problem occurs when something like this takes place:

CPU-i           CPU-j     CPU-k           CPU-l
rt_dev_sendmsg  xmit_irq  rt_dev_recvmsg  recv_irq

Can't follow. What races here, and what will go wrong then?

That's the good question. Find attached:

1. .config (so you can check for stupid mistakes)
2. console log
3. latest version of test program
4. tail of ftrace dump

These are the xenomai tasks running when the test program is active:

CPU  PID    CLASS  PRI  TIMEOUT  TIMEBASE  STAT  NAME
0    0      idle   -1   -        master    R     ROOT/0
1    0      idle   -1   -        master    R     ROOT/1
2    0      idle   -1   -        master    R     ROOT/2
3    0      idle   -1   -        master    R     ROOT/3
0    0      rt     98   -        master    W     rtnet-stack
0    0      rt     0    -        master    W     rtnet-rtpc
0    29901  rt     50   -        master          raw_test
0    29906  rt     0    -        master    X     reporter

The lines of interest from the trace are probably:

[003] 2061.347855: xn_nucleus_thread_resume: thread=f9bf7b00 thread_name=rtnet-stack mask=2
[003] 2061.347862: xn_nucleus_sched: status=200
[000] 2061.347866: xn_nucleus_sched_remote: status=0

since this is the only place where a packet gets delayed, and the only place in the trace where sched_remote reports a status=0

Since the cpu that has rtnet-stack and hence should be resumed is doing heavy I/O at the time of fault; could it be that send_ipi/schedule_handler needs barriers to make sure that decisions are made on the right status?
That was my first idea as well - but we should run all relevant code under nklock here. But please correct me if I miss something. Wouldn't we need a write-barrier before the send_ipi regardless of what locks we hold, otherwise no guarantees that the memory write reaches the target cpu before the interrupt does? Mmmh -- not everything. The inlined XNRESCHED entry test in xnpod_schedule runs outside nklock. But doesn't releasing nklock imply a memory write barrier? Let me meditate... Wouldn't
Re: [Xenomai-core] Potential problem with rt_eepro100
Jan Kiszka wrote:

additional barrier. Can you check this?

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index df56417..66b52ad 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -187,6 +187,7 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
   if (current_sched != (__sched__)){					\
       xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched);	\
       setbits((__sched__)->status, XNRESCHED);				\
+      xnarch_memory_barrier();						\
   }									\
 } while (0)

In progress, if nothing breaks before, I'll report status tomorrow morning.

Mmmh -- not everything. The inlined XNRESCHED entry test in xnpod_schedule runs outside nklock. But doesn't releasing nklock imply a memory write barrier? Let me meditate...

Wouldn't we need a read barrier then (but maybe the irq-handling takes care of that, not familiar with the code yet)?

A read barrier is not required here as we do not need to order load operations wrt. each other in the reschedule IRQ handler.

Only if taking the interrupt is equivalent to:

  read interrupts status
  memory_read_barrier
  execute handler

processor manuals should have the answer to this (or it might already be in the code)...

You can always help: there is a lot of boring^Winteresting tracepoint conversion waiting in Xenomai, see the few already converted nucleus tracepoints.

As soon as I have my system running, I'll put some effort into this.

/Anders
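The ordering question debated here (status write, then barrier, then IPI on the sender; interrupt, then barrier, then status read on the receiver) can be modelled in portable C11 as below. This is a sketch under loud assumptions: a real IPI is not a memory location, so the ipi_pending flag is only a stand-in for the interrupt line, and the demo_* names and the flag value are invented for the example. Whether real hardware already provides this ordering when delivering an interrupt is exactly what the thread leaves open.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_RESCHED 0x1u  /* hypothetical value, not the Xenomai constant */

struct demo_cpu {
	atomic_uint status;      /* scheduler status word of this CPU */
	atomic_int ipi_pending;  /* stand-in for the interrupt line */
};

/* Sender side: publish the status bit, then raise the "IPI". */
static void demo_kick(struct demo_cpu *target)
{
	assert(target);
	atomic_fetch_or_explicit(&target->status, DEMO_RESCHED,
				 memory_order_relaxed);
	/* Write barrier: the status update must be visible before the IPI. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&target->ipi_pending, 1, memory_order_relaxed);
}

/* Receiver side: returns the status seen by the handler, 0 if no IPI. */
static unsigned int demo_ipi_handler(struct demo_cpu *self)
{
	if (!atomic_exchange_explicit(&self->ipi_pending, 0,
				      memory_order_relaxed))
		return 0;
	/* Read barrier pairing with the sender's release fence. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&self->status, memory_order_relaxed);
}
```

With the two fences paired like this, a handler that observes the pending flag is guaranteed to also observe the status bit, which is the property a handler reporting status=0 would be violating.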
Re: [Xenomai-core] Potential problem with rt_eepro100
Anders Blomdell wrote:

Jan Kiszka wrote:

additional barrier. Can you check this?

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index df56417..66b52ad 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -187,6 +187,7 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
   if (current_sched != (__sched__)){					\
       xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched);	\
       setbits((__sched__)->status, XNRESCHED);				\
+      xnarch_memory_barrier();						\
   }									\
 } while (0)

In progress, if nothing breaks before, I'll report status tomorrow morning.

It still breaks (in approximately the same way). I'm currently putting a barrier in the other macro doing a RESCHED, also adding some tracing to see if a read barrier is needed.

Interesting side-note: Harddisk accesses seem to get real slow after the error has occurred (kernel installs progress with 2-3 modules installed per second), while lots of idle time is reported on all cpu's, weird...

/Anders
Re: [Xenomai-core] Potential problem with rt_eepro100
Anders Blomdell wrote:

Anders Blomdell wrote:

Jan Kiszka wrote:

additional barrier. Can you check this?

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index df56417..66b52ad 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -187,6 +187,7 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
   if (current_sched != (__sched__)){					\
       xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched);	\
       setbits((__sched__)->status, XNRESCHED);				\
+      xnarch_memory_barrier();						\
   }									\
 } while (0)

In progress, if nothing breaks before, I'll report status tomorrow morning.

It still breaks (in approximately the same way). I'm currently putting a barrier in the other macro doing a RESCHED, also adding some tracing to see if a read barrier is needed.

Nope, no luck there either. Will start interesting tracepoint adding/conversion :-(

Any reason why xn_nucleus_sched_remote should ever report status = 0?

/Anders
Re: [Xenomai-core] Potential problem with rt_eepro100
Jan Kiszka wrote:

Am 03.11.2010 17:46, Anders Blomdell wrote:

Anders Blomdell wrote:

Anders Blomdell wrote:

Jan Kiszka wrote:

additional barrier. Can you check this?

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index df56417..66b52ad 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -187,6 +187,7 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
   if (current_sched != (__sched__)){					\
       xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched);	\
       setbits((__sched__)->status, XNRESCHED);				\
+      xnarch_memory_barrier();						\
   }									\
 } while (0)

In progress, if nothing breaks before, I'll report status tomorrow morning.

It still breaks (in approximately the same way). I'm currently putting a barrier in the other macro doing a RESCHED, also adding some tracing to see if a read barrier is needed.

Nope, no luck there either. Will start interesting tracepoint adding/conversion :-(

Strange. But it was too easy anyway...

Any reason why xn_nucleus_sched_remote should ever report status = 0?

Really don't know yet. You could trigger on this state and call ftrace_stop() then. Provided you had the function tracer enabled, that should give a nice picture of what happened before.

Isn't there a race between these two (still waiting for compilation to be finished)?

static inline int __xnpod_test_resched(struct xnsched *sched)
{
	int resched = testbits(sched->status, XNRESCHED);
#ifdef CONFIG_SMP
	/* Send resched IPI to remote CPU(s). */
	if (unlikely(xnsched_resched_p(sched))) {
		xnarch_send_ipi(sched->resched);
		xnarch_cpus_clear(sched->resched);
	}
#endif
	clrbits(sched->status, XNRESCHED);
	return resched;
}

#define xnsched_set_resched(__sched__) do {				\
  xnsched_t *current_sched = xnpod_current_sched();			\
  setbits(current_sched->status, XNRESCHED);				\
  if (current_sched != (__sched__)) {					\
      xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched);	\
      setbits((__sched__)->status, XNRESCHED);				\
      xnarch_memory_barrier();						\
  }									\
} while (0)

I would suggest (if I have got all the macros right):

static inline int __xnpod_test_resched(struct xnsched *sched)
{
	int resched = testbits(sched->status, XNRESCHED);
	if (unlikely(resched)) {
#ifdef CONFIG_SMP
		/* Send resched IPI to remote CPU(s). */
		xnarch_send_ipi(sched->resched);
		xnarch_cpus_clear(sched->resched);
#endif
		clrbits(sched->status, XNRESCHED);
	}
	return resched;
}

/Anders
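To make the behaviour of the suggested __xnpod_test_resched variant easier to see outside the kernel, here is a standalone model of it. Everything in it is a stand-in and an assumption: the cpumask is reduced to a plain bitmask, xnarch_send_ipi is replaced by a stub that records what it was asked to send, and the demo_* names and flag value are made up.

```c
#include <assert.h>

#define DEMO_RESCHED 0x1u  /* hypothetical value, not the Xenomai constant */

struct demo_sched {
	unsigned int status;    /* status bits of the local scheduler */
	unsigned int resched;   /* bitmask of remote CPUs to kick */
	unsigned int ipi_sent;  /* what the stub below has "sent" so far */
};

/* Stub standing in for xnarch_send_ipi(): just records the mask. */
static void demo_send_ipi(struct demo_sched *s, unsigned int mask)
{
	assert(s);
	s->ipi_sent |= mask;
}

/*
 * Model of the suggested variant: the remote mask is flushed and the
 * local bit cleared only when the local resched bit was actually set.
 */
static int demo_test_resched(struct demo_sched *s)
{
	int resched = (s->status & DEMO_RESCHED) != 0;

	if (resched) {
		demo_send_ipi(s, s->resched);
		s->resched = 0;
		s->status &= ~DEMO_RESCHED;
	}
	return resched;
}
```

One consequence visible in the model: a remote mask that is populated without the local bit being set stays pending until the next call that does see the bit. That relies on xnsched_set_resched always setting the local XNRESCHED bit together with the mask, as the macro quoted above does.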
Re: [Xenomai-core] Potential problem with rt_eepro100
Jan Kiszka wrote: Am 28.10.2010 11:34, Anders Blomdell wrote: Jan Kiszka wrote: Am 28.10.2010 09:34, Anders Blomdell wrote: Anders Blomdell wrote: Anders Blomdell wrote:

Hi, I'm trying to use rt_eepro100 for sending raw ethernet packets, but I'm experiencing occasional weird behaviour.

Versions of things:
linux-2.6.34.5
xenomai-2.5.5.2
rtnet-39f7fcf

The test program runs on two computers with Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 08) controllers, where one computer acts as a mirror sending back packets received from the ethernet (only those two computers are on the network), and the other sends packets and measures roundtrip time. Most packets come back in approximately 100 us, but occasionally the reception times out (once in about 10 packets or more), and the packets get received immediately when reception is retried, which might indicate a race between rt_dev_recvmsg and the interrupt, but I might be missing something obvious.

Changing one of the ethernet cards to an Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05), while keeping everything else constant, changes the behaviour somewhat; after receiving a few tens of packets, reception stops entirely (-EAGAIN is returned), while transmission proceeds as it should (and the mirror returns packets).

Any suggestions on what to try?

Since the problem disappears with 'maxcpus=1', I suspect I have an SMP issue (the machine is a Core2 Quad), so I'll move to xenomai-core. (The original message can be found at http://sourceforge.net/mailarchive/message.php?msg_name=4CC82C8D.3080808%40control.lth.se )

Xenomai-core gurus: which is the correct way to debug SMP issues? Can I run the I-pipe tracer and expect to be able to save at least 150 us of traces for all CPUs? Any hints/suggestions/insights are welcome...

The i-pipe tracer unfortunately only saves traces for the CPU that triggered the freeze. To get the full picture, you may want to try my ftrace port I posted recently for 2.6.35.

2.6.35.7 ?

Exactly.

Finally managed to get ftrace to work (one possible bug: I had to manually copy include/xenomai/trace/xn_nucleus.h to include/xenomai/trace/events/xn_nucleus.h), and it looks like it can be very useful... But I don't think it will give much info at the moment, since no xenomai/ipipe interrupt activity shows up, and adding that is far out of my league :-(

My current theory is that the problem occurs when something like this takes place:

CPU-i            CPU-j      CPU-k            CPU-l
rt_dev_sendmsg   xmit_irq   rt_dev_recvmsg   recv_irq

So now I'll try to instrument the code to see if the assumption holds. Stay tuned...

Regards
Anders
Re: [Xenomai-core] Potential problem with rt_eepro100
On 2010-10-29 20.06, Jan Kiszka wrote: Am 29.10.2010 19:42, Anders Blomdell wrote: Jan Kiszka wrote:

Please provide the full kernel log, ideally also with the I-pipe tracer (with panic tracing) enabled.

Will reconfigure/recompile and do that. By full kernel log, do you mean all bootup info?

That's best to avoid missing some detail or doing Q&A ping-pong.

Full trace attached (finally...)

You have to switch off CONFIG_DMA_API_DEBUG, it's incompatible with Xenomai.

Thanks, will continue with this on Monday (build in progress). With your ftrace port, how does one freeze all CPUs at the same time?

Regards
Anders
Re: [Xenomai-core] [RTnet-users] Potential problem with rt_eepro100
Anders Blomdell wrote: Anders Blomdell wrote:

Hi, I'm trying to use rt_eepro100 for sending raw ethernet packets, but I'm experiencing occasional weird behaviour.

Versions of things:
linux-2.6.34.5
xenomai-2.5.5.2
rtnet-39f7fcf

The test program runs on two computers with Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 08) controllers, where one computer acts as a mirror sending back packets received from the ethernet (only those two computers are on the network), and the other sends packets and measures roundtrip time. Most packets come back in approximately 100 us, but occasionally the reception times out (once in about 10 packets or more), and the packets get received immediately when reception is retried, which might indicate a race between rt_dev_recvmsg and the interrupt, but I might be missing something obvious.

Changing one of the ethernet cards to an Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05), while keeping everything else constant, changes the behaviour somewhat; after receiving a few tens of packets, reception stops entirely (-EAGAIN is returned), while transmission proceeds as it should (and the mirror returns packets).

Any suggestions on what to try?

Since the problem disappears with 'maxcpus=1', I suspect I have an SMP issue (the machine is a Core2 Quad), so I'll move to xenomai-core. (The original message can be found at http://sourceforge.net/mailarchive/message.php?msg_name=4CC82C8D.3080808%40control.lth.se )

Xenomai-core gurus: which is the correct way to debug SMP issues? Can I run the I-pipe tracer and expect to be able to save at least 150 us of traces for all CPUs? Any hints/suggestions/insights are welcome...

Regards
Anders Blomdell
Re: [Xenomai-core] [RTnet-users] Potential problem with rt_eepro100
Jan Kiszka wrote: Am 28.10.2010 09:34, Anders Blomdell wrote: Anders Blomdell wrote: Anders Blomdell wrote:

Hi, I'm trying to use rt_eepro100 for sending raw ethernet packets, but I'm experiencing occasional weird behaviour.

Versions of things:
linux-2.6.34.5
xenomai-2.5.5.2
rtnet-39f7fcf

The test program runs on two computers with Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 08) controllers, where one computer acts as a mirror sending back packets received from the ethernet (only those two computers are on the network), and the other sends packets and measures roundtrip time. Most packets come back in approximately 100 us, but occasionally the reception times out (once in about 10 packets or more), and the packets get received immediately when reception is retried, which might indicate a race between rt_dev_recvmsg and the interrupt, but I might be missing something obvious.

Changing one of the ethernet cards to an Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05), while keeping everything else constant, changes the behaviour somewhat; after receiving a few tens of packets, reception stops entirely (-EAGAIN is returned), while transmission proceeds as it should (and the mirror returns packets).

Any suggestions on what to try?

Since the problem disappears with 'maxcpus=1', I suspect I have an SMP issue (the machine is a Core2 Quad), so I'll move to xenomai-core. (The original message can be found at http://sourceforge.net/mailarchive/message.php?msg_name=4CC82C8D.3080808%40control.lth.se )

Xenomai-core gurus: which is the correct way to debug SMP issues? Can I run the I-pipe tracer and expect to be able to save at least 150 us of traces for all CPUs? Any hints/suggestions/insights are welcome...

The i-pipe tracer unfortunately only saves traces for the CPU that triggered the freeze. To get the full picture, you may want to try my ftrace port I posted recently for 2.6.35.

2.6.35.7 ?

/Anders
Re: [Xenomai-core] Potential problem with rt_eepro100
Jan Kiszka wrote: Am 28.10.2010 11:34, Anders Blomdell wrote: Jan Kiszka wrote: Am 28.10.2010 09:34, Anders Blomdell wrote: Anders Blomdell wrote: Anders Blomdell wrote:

Hi, I'm trying to use rt_eepro100 for sending raw ethernet packets, but I'm experiencing occasional weird behaviour.

Versions of things:
linux-2.6.34.5
xenomai-2.5.5.2
rtnet-39f7fcf

The test program runs on two computers with Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 08) controllers, where one computer acts as a mirror sending back packets received from the ethernet (only those two computers are on the network), and the other sends packets and measures roundtrip time. Most packets come back in approximately 100 us, but occasionally the reception times out (once in about 10 packets or more), and the packets get received immediately when reception is retried, which might indicate a race between rt_dev_recvmsg and the interrupt, but I might be missing something obvious.

Changing one of the ethernet cards to an Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05), while keeping everything else constant, changes the behaviour somewhat; after receiving a few tens of packets, reception stops entirely (-EAGAIN is returned), while transmission proceeds as it should (and the mirror returns packets).

Any suggestions on what to try?

Since the problem disappears with 'maxcpus=1', I suspect I have an SMP issue (the machine is a Core2 Quad), so I'll move to xenomai-core. (The original message can be found at http://sourceforge.net/mailarchive/message.php?msg_name=4CC82C8D.3080808%40control.lth.se )

Xenomai-core gurus: which is the correct way to debug SMP issues? Can I run the I-pipe tracer and expect to be able to save at least 150 us of traces for all CPUs? Any hints/suggestions/insights are welcome...

The i-pipe tracer unfortunately only saves traces for the CPU that triggered the freeze. To get the full picture, you may want to try my ftrace port I posted recently for 2.6.35.

2.6.35.7 ?

Well, 2.6.35.7/xenomai/rtnet without the ftrace patch freezes after approx 8000 rounds (16000 packets). Time to freshen up on serial port console debugging, I guess (under the assumption that this is the same bug, but easier to reproduce).

/Anders
Re: [Xenomai-core] Comedi drivers in Xenomai porting/integration status ?
Alexis Berlemont wrote:

Hi,

That was the reason why I was really surprised to find Comedi integrated into the mainline kernel. What strikes me more is that Comedi seems to be left as is. Do you think it will be cleaned up or reworked?

Without rework Comedi will not make it into mainline (I wouldn't call the staging corner mainline). And when reading this http://permalink.gmane.org/gmane.linux.kernel/793476, it is probably the best time now to propose interface changes and contribute back improvements made for the RTDM rework.

How would you proceed? Maybe the first step would be to ask on the Comedi mailing list if someone is interested in discussing the API rework. Maybe someone will answer this time.

If it is more informative than the mail from 06-04-09 and the presentation.txt, there is definitely a chance :-) I read through it then, found the goals reasonably sound, and nothing to test, so I waited for some working code to show up (having too much on my hands already); that time might have come now.

Features I would like to see in a Comedi/RTDM framework are:

1. Drivers should work in Linux, Xenomai (and possibly RTAI and/or RT-Linux)
2. It should be possible to write drivers that live in user-space (the serial2002 driver is a big HACK).
3. Stackable drivers (e.g. put a force sensor driver on top of an analog input card).
4. A comedilib compatibility library would be nice (but not necessary)

If all these pieces are in place, I'm more than happy to test/migrate the drivers I use in my labs (JR3, NI M6221, DaqBoard 2000, ACPI 3106, serial).

Best regards
Anders Blomdell
Re: [Xenomai-core] ns vs. tsc as internal timer base
Jan Kiszka wrote:

Hi, To avoid losing the optimisation again in ns_to_tsc, I thought about basing the whole internal timer arithmetic on nanoseconds instead of TSCs as it is now.

Good idea, it makes it simpler to adapt to laptop frequency scaling and deep ACPI sleep, i.e. sync Xenomai time to the ACPI timer.

/Anders
Re: [Xenomai-core] Re: [PATCH] Shared interrupts (ready to merge)
Dmitry Adamushko wrote:

For RTDM I'm now almost determined to rework the API in a way that only HANDLED/UNHANDLED (or whatever their names will be) get exported; any additional guru features will remain excluded as long as we have no clean usage policy for them.

Good. Then let's go for HANDLED, UNHANDLED - we may even consider them as 2 scalar values - + NOENABLE, CHAINED - additional bits. They are not encouraged to be used with shared interrupts (explained in docs + debug messages when XENO_OPT_DEBUG is on). Any ISR on the shared irq line should understand that it's just one among equals. That said, it should not do anything that may affect other ISRs and should not require any special treatment (like CHAINED or NOENABLE). If it does indeed want that, then it should not declare itself as SHARED. We may come back to the topic of possible return values of ISRs a bit later, maybe having got more feedback (hm.. hopefully) on shared irq support.

But the latter one is not only about enabling the line, but on some archs - about .end-ing it too (sending EOI). And to support HANDLED_NOENABLE properly, those 2 have to be decoupled, i.e. EOI should always be sent from xnintr_shirq_handler().

But the one returning HANDLED_NOENABLE is likely to leave the interrupt asserted, hence we can't EOI at this point (unless NO_ENABLE means DISABLE).

I guess this is what Dmitry meant: explicitly call disable() if one or more ISRs returned NOENABLE - at least on archs where end != enable. Will this work? We could then keep on using the existing IRQ-enable API from bottom-half IRQ tasks.

Almost. Let's consider the following only as another way of doing some things; I don't propose to implement it, it's just to illustrate my thoughts. So one may simply ski-skip-skip it :o)

Let's keep in mind that what is behind .end-ing the IRQ line depends on the arch. Sometimes end == enable (EOI was sent on the .ack step), while in other cases end == send_EOI [+ re-enabling]. But anyway, all ISRs are running with the given IRQ line disabled.

Supported values : HANDLED, UNHANDLED, PROPAGATE.

nucleus :: xnintr_irq_handler()
{
	ret = 0;
	...
	for each isr in isr_list[ IRQ ] {
		temp = isr->handler();
		if (temp > ret)
			ret = temp;
	}

	if (ret == PROPAGATE) {
		// quite dangerous with shared interrupts, be sure you understand
		// what you are doing!
		xnarch_chain_irq(irq); // will be .end-ed in Linux domain
	} else {
		// HANDLED or UNHANDLED
		xnarch_end_irq();
	}
	...
}

ENABLE or NOENABLE is missing? Nope.

Let's say we have 2 rt-ISRs :

isr1 : HANDLED
isr2 : HANDLED + WISH

WISH == I want the IRQ line to remain disabled until later (e.g. a bottom half in an rt_task will enable it)

How may isr2 express this WISH? Simply, xnarch_irq_disable/enable() should support an internal counter that allows them to be called in a nested way. So e.g. if there are 2 consecutive calls to disable_irq(), then 2 calls to enable_irq() are needed to really enable the IRQ line.

This way, the WISH is only about directly calling xnarch_irq_disable() in isr2 and there is no need for ENABLE or NOENABLE flags.

This way, PROPAGATE really means NOEND - the IRQ will be .end-ed in the Linux domain; while WISH==NOENABLE has nothing to do with sending EOI, but only with enabling/disabling the given IRQ line.

Yes, if one isr (or a few) defers the IRQ line enabling until later, it will affect all other ISRs => all interrupts are temporarily not accepted on this line. This scenario is possible under Linux, but should be used with even more care in a real-time system. At least, this way the HANDLED_NOENABLE case works and doesn't lead to lost interrupts on some archs.

Moreover, it avoids the need for the ENABLE flag even in the non-shared interrupt case.

Looks clean enough to me, i.e. no objections...

-- Anders
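The nested disable/enable idea and the "strongest return value wins" loop described above can be sketched as a small user-space model. Everything here is an assumption made for illustration: `struct irq_line`, `line_disable()`, `line_enable()` and `run_handlers()` are hypothetical names, the numeric ordering of the return codes is only chosen so that PROPAGATE compares highest, and ending the IRQ is modelled as dropping the one disable level the dispatcher itself took.

```c
#include <assert.h>

/* Return codes, ordered so that the strongest one wins (assumed values). */
enum isr_ret { UNHANDLED = 0, HANDLED = 1, PROPAGATE = 2 };

/* Hypothetical model of one IRQ line with a nesting disable counter. */
struct irq_line {
    int disable_depth; /* 0 means the line is enabled */
    int ended;         /* times the line was .end-ed by the dispatcher */
    int chained;       /* times the IRQ was passed on to Linux */
};

static void line_disable(struct irq_line *l) { l->disable_depth++; }

static void line_enable(struct irq_line *l)
{
    assert(l->disable_depth > 0); /* an unbalanced enable is a bug */
    l->disable_depth--;
}

static int line_is_enabled(const struct irq_line *l)
{
    return l->disable_depth == 0;
}

typedef enum isr_ret (*isr_fn)(struct irq_line *);

/* Run all handlers with the line disabled and act on the highest result. */
static enum isr_ret run_handlers(struct irq_line *l, const isr_fn *isrs, int n)
{
    enum isr_ret ret = UNHANDLED;

    line_disable(l); /* handlers always run with the line disabled */
    for (int i = 0; i < n; i++) {
        enum isr_ret temp = isrs[i](l);
        if (temp > ret)
            ret = temp;
    }
    if (ret == PROPAGATE) {
        l->chained++; /* left for the Linux domain to .end */
    } else {
        l->ended++;
        line_enable(l); /* drops only the dispatcher's own level */
    }
    return ret;
}

/* isr1: plain HANDLED. */
static enum isr_ret isr_plain(struct irq_line *l) { (void)l; return HANDLED; }

/* isr2: HANDLED plus the WISH, expressed as one extra nested disable. */
static enum isr_ret isr_wish(struct irq_line *l) { line_disable(l); return HANDLED; }
```

With `isr_plain` and `isr_wish` on the same line, the line stays disabled after `run_handlers()` returns and becomes enabled again only after one matching `line_enable()` call, which is the role the bottom half plays in the description above.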
Re: [Xenomai-core] Re: [PATCH] Shared interrupts (ready to merge)
Dmitry Adamushko wrote:

N.B. Amongst other things, some thoughts about CHAINED with shared interrupts.

On 20/02/06, Anders Blomdell [EMAIL PROTECTED] wrote:

A number of questions arise:

1. What happens if one of the shared handlers leaves the interrupt asserted, returns NOENABLE|HANDLED and another returns only HANDLED?
2. What happens if one returns PROPAGATE and another returns HANDLED?

Yep, each ISR may return a different value and all of them are accumulated in the s variable ( s |= intr->isr(intr); ). So the loop may end up with s which contains all of the possible bits, e.g.

isr1 -> HANDLED | ENABLE
isr2 -> HANDLED (don't want the irq to be enabled)
isr3 -> CHAINED

s = HANDLED | ENABLE | CHAINED;

Then CHAINED will be ignored because of the following code :

+if (s & XN_ISR_ENABLE)
+	xnarch_end_irq(irq);
+else if (s & XN_ISR_CHAINED)    (*)
+	xnarch_chain_irq(irq);

Which is the worst way possible of prioritizing them: if a Linux interrupt is active when we get there with ENABLE|CHAINED, interrupts will be enabled with the Linux interrupt still asserted -> the IRQ handlers will be called once more, probably returning ENABLE|CHAINED again -> infinite loop...

The current code in CVS does not contain the else in (*), so that ENABLE | CHAINED is possible, though it's a wrong combination. This said, we suppose that one knows what he is doing. In the case of a single ISR per line, it's not that difficult to achieve. But if there are a few ISRs, then one should analyze and take into account all possible return values of all the ISRs, as each of them may affect others (e.g. if one returns CHAINED when another returns HANDLED | ENABLE).

Which is somewhat contrary to the concept of shared interrupts; if we have to take care of the global picture, why make them shared in the first place? (I like the concept of shared interrupts, but it is important that the framework gives a separation of concerns)

So my feeling is that CHAINED should not be used by drivers which registered their ISRs as SHARED.

Well, CHAINED should not be used by drivers which return ENABLE (and are of course hence incompatible with true realtime IRQs)

Moreover, I actually see only one scenario for CHAINED (I provided it before) : all ISRs in the primary domain have reported UNHANDLED => the nucleus propagates the interrupt down the pipeline with xnarch_chain_irq(). This call actually returns 1 upon successful propagation (some domain down the pipeline was interested in this irq) and 0 otherwise. Upon 0, this is a spurious irq (none of the domains was interested in handling it).

OK, let's suppose now that we have 2 ISRs on the same shared line :

isr1 : HANDLED (will be enabled by the rt task. Note, the rt task must call xnarch_end_irq() and not just xnarch_enable_irq()! )
isr2 : CHAINED

So HANDLED | CHAINED is OK for a single ISR on the line, but it may lead to HANDLED | CHAINED | ENABLE in the case of a shared line. The rt task that works jointly with isr1 just calls xnarch_end_irq() at some moment in time and some ISR in the Linux domain does the same later => the line is .end-ed 2 times.

An ISR should never return CHAINED to indicate _only_ that it is not interested in this irq, but ~HANDLED or NOINT (if we'll support it) instead. If the ISR nevertheless wants to propagate the IRQ to the Linux domain _explicitly_, it _must not_ register itself as SHARED, i.e. it _must_ be the only ISR on this line, otherwise that may lead to the IRQ line being .end-ed twice (lost interrupts in some cases).

#define UNHANDLED 0
#define HANDLED_ENABLE 1
#define HANDLED_NOENABLE 2
#define PROPAGATE 3

Yep, I'd agree with you. Moreover, PROPAGATE should not be used for shared interrupts.
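The accumulation problem discussed above can be reproduced in a few lines of user-space C. This is only a sketch: the flag values, `enum line_action`, `accumulate()` and `dispatch_patch_order()` are assumptions made for illustration (the real constants live in the nucleus headers and may differ), and the dispatch function merely mirrors the if/else-if order of the quoted patch fragment.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values are assumptions, chosen only so that the bits do not overlap. */
#define XN_ISR_HANDLED 0x1
#define XN_ISR_ENABLE  0x2
#define XN_ISR_CHAINED 0x4

enum line_action { ACT_NONE, ACT_END, ACT_CHAIN };

/* OR together what every handler on the line returned, as in s |= isr(). */
static int accumulate(const int *rets, int n)
{
    int s = 0;

    assert(rets != NULL || n == 0);
    for (int i = 0; i < n; i++)
        s |= rets[i];
    return s;
}

/* Dispatch in the order of the quoted patch: ENABLE first, else CHAINED. */
static enum line_action dispatch_patch_order(int s)
{
    if (s & XN_ISR_ENABLE)
        return ACT_END;
    else if (s & XN_ISR_CHAINED)
        return ACT_CHAIN;
    return ACT_NONE;
}
```

Feeding it the three return values from the isr1/isr2/isr3 example yields all three bits set, and the dispatch ends the line without chaining, so isr3's request is silently dropped.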
My feeling is that it should be considered an error to attach an RT IRQ handler to a line that has a Linux IRQ handler (this should be possible to check, since /proc/interrupts contains the relevant information), unless a Linux IRQ-mask function is installed. This IRQ-mask function should then be called:

1. each time domains are switched
2. each time an interrupt is generated

The IRQ-mask function should look something like:

unsigned int rt_irq_mask(struct ipipe_domain *ipd, unsigned int irq)
{
	int result = 0;
	static int enabled = true;
	int enable = enabled;

	if (irq >= 0) {
		// Interrupt has occurred, we are about to run IRQ handlers
		if (disable_early) {
			enable = false;
		}
		if (for_linux(irq)) {
			result = XN_ISR_CHAINED;
		}
	} else if (ipd == ipipe_root_domain) {
		// Entering Linux
		enable = true;
	} else {
		// Other domain, block Linux interrupts
		enable = false;
	}
	if (enable != enabled) {
		enabled = enable;
		if (enable) {
			// Enable Linux interrupts by unmasking appropriate
			// device registers (and possibly entire IRQs)
		} else {
			// Disable Linux interrupts
Re: [Xenomai-core] Re: [PATCH] Shared interrupts (ready to merge)
Dmitry Adamushko wrote:

Good point, leaves us with 2 possible return values for shared handlers:

HANDLED
NOT_HANDLED

i.e. shared handlers should never defer the end'ing of the interrupt (which makes sense, since this would affect the other [shared] handlers).

HANDLED_NOENABLE could be supported too.

Yes, but it breaks decoupling between shared handlers; interrupts will be deferred for all [shared] handlers until it is properly ended.

There would be no need to reinvent the wheel, just do it the way Linux does it. But it's about some additional redesigning of the current codebase (e.g. nested calling for irq_enable/disable()). I'm not sure we need it for anything other than the irq sharing code, but it affects the rest of the code. And we have a kind of wrong concept : XN_ISR_ENABLE (or NOENABLE) corresponds to xnarch_end_irq().

Agree

But the latter one is not only about enabling the line, but on some archs - about .end-ing it too (sending EOI). And to support HANDLED_NOENABLE properly, those 2 have to be decoupled, i.e. EOI should always be sent from xnintr_shirq_handler().

But the one returning HANDLED_NOENABLE is likely to leave the interrupt asserted, hence we can't EOI at this point (unless NO_ENABLE means DISABLE).

Yes, should. And this "should" is best handled by

a) Documenting the potential conflict in the same place when describing the return values
b) Placing some debug warning in the nucleus' IRQ trampoline function to bail out (once per line) when running into such a situation

But I'm against any further runtime restrictions, especially as most drivers will never return anything other than NOT_HANDLED or HANDLED. Actually, this was the reason why I tried to separate the NO_ENABLE and PROPAGATE features as *additional* bits from HANDLED and NOT_HANDLED/UNHANDLED/NOINT. But I acknowledge that having all valid bit combinations present as constants can be more helpful for the user. We just have to draw some line between the standard values and the additional guru return codes (documentation: don't use NO_ENABLE or PROPAGATE unless you understand their side-effects and pitfalls precisely).

I agree with you on the PROPAGATE case, but NO_ENABLE, which, as pointed out above, should (IMHO and, at least, in theory) only mean keep the IRQ line disabled (and have nothing to do with .end-ing the IRQ line), would be better supported. But this is, again as was pointed out above, about some redesigning of the current code => some overhead that likely affects non-shared-aware code too.

So on one hand, I'm ready to rework the code with : HANDLED and UNHANDLED (or NOINT) + 2 additional bits : NOENABLE and PROPAGATE, and document it like you suggested: don't use NO_ENABLE or PROPAGATE with shared interrupts unless you understand their side-effects and pitfalls precisely; on the other hand, I'd say that I'm almost ready to vote against merging the irq sharing code at all, as it looks to be a rather partial solution.

I vote for (even though I'm the one who complains the most), BUT I think it is important to keep the rules for using it simple (that's why I worry about the plethora of return flags).

-- Regards Anders
Re: [Xenomai-core] Re: [PATCH] Shared interrupts (ready to merge)
Jan Kiszka wrote: Anders Blomdell wrote: Dmitry Adamushko wrote: Good point, leaves us with 2 possible return values for shared handlers: HANDLED NOT_HANDLED i.e. shared handlers should never defer the end'ing of the interrupt (which makes sense, since this would affect the other [shared] handlers). HANDLED_NOEBNABLE could be supported too. Yes, but it breaks decoupling between shared handlers; interrupts will be deferred for all [shared] handlers until it is properly ended. There would be no need in reenventing a wheel, just do it the way Linux does it. But it's about some additional re-designing of the current codebase (e.g. nested calling for irq_enable/disable()) I'm not sure we do need it for something else rather than irq sharing code but it affects the rest of the code. And we have a kind of wrong concept : XN_ISR_ENABLE (or NOENABLE) corresponds to xnarch_end_irq(). Agree But the later one is not only about enabling the line, but on some archs - about .end-ing it too (sending EOI). And to support HANDLED_NOENABLE properly, those 2 have to be decoupled, i.e. EOI should always be sent from xnintr_shirq_handler(). But the one returning HANDLED_NOENABLE is likely to leave the interrupt asserted, hence we can't EOI at this point (unless NO_ENABLE means DISABLE). I guess this is what Dmitry meant: explicitly call disable() if one or more ISRs returned NOENABLE - at least on archs where end != enable. Will this work? We could then keep on using the existing IRQ-enable API from bottom-half IRQ tasks. But what about NOENABLE+PROPAGATE? Will this special case still mean NOT to end the ISR (as Linux will do)? Bah, we are running in circles, I'm afraid. I think it's better to call NOENABLE NOEOI, which will indeed mean to not end this line (as it is the current situation anyway, isn't it?), and leave the user with what (s)he can do with such a feature. 
We found out that there are trillions of ways to shoot oneself into the foot with NOENABLE and PROPAGATE, and we cannot prevent most of them. So let's stop trying, at least for this patch! Yes, should. And this should is best be handled by a) Documenting the potential conflict in the same place when describing the return values b) Placing some debug warning in the nucleus' IRQ trampoline function to bail out (once per line) when running into such situation But I'm against any further runtime restrictions, especially as most drivers will never return anything else than NOT_HANDLED or HANDLED. Actually, this was the reason why I tried to separate the NO_ENABLE and PROPAGATE features as *additional* bits from HANDLED and NOT_HANDLED/UNHANDLED/NOINT. But I acknowledge that having all valid bit combination present as constants can be more helpful for the user. We just have to draw some line between the standard values and the additional gurus return codes (documentation: don't use NO_ENABLE or PROPAGATE unless you understand their side-effects and pitfalls precisely). I agree with you on PROPAGATE case, but NO_ENABLE that, as pointed out above, should (IMHO and at least, in theory) only mean keep the IRQ line disabled (and have nothing to do with .end-ing the IRQ line) would be better supported. But this is, again as was pointed out above, about some redesigning of the current code = some overhead that likely affects non-shared aware code too. So on one hand, I'm ready to re-work code with : HANDLED and UNHANDLED (or NOINT) + 2 additional bits : NOENABLE and PROPAGATE. and document it like you suggested don't use NO_ENABLE or PROPAGATE with shared interrupts unless you understand their side-effects and pitfalls precisely; on the other hand, I'd say that I'm almost ready to vote against merging the irq sharing code at all as it looks to be a rather partial solution. 
I vote for (even though I'm the one who complains the most), BUT I think it is important to keep the rules for using it simple (that's why I worry about the plethora of return-flags). And I'm with you here: My original proposal (2 base-states + 2 bits) created 8 expressible states while your version only knows 4 states - those which make the most sense (and 2 of them are still the ones recommended for the masses). For RTDM I'm now almost determined to rework the API in a way that only HANDLED/UNHANDLED (or whatever their names will be) get exported; any additional guru features will remain excluded as long as we have no clean usage policy for them. You have my vote for this. -- Anders
Re: [Xenomai-core] Re: [PATCH] Shared interrupts (ready to merge)
Dmitry Adamushko wrote:

On 21/02/06, Anders Blomdell wrote:

Dmitry Adamushko wrote:

N.B. Amongst other things, some thoughts about CHAINED with shared interrupts.

On 20/02/06, Anders Blomdell wrote:

A number of questions arise:
1. What happens if one of the shared handlers leaves the interrupt asserted, returns NOENABLE|HANDLED and another returns only HANDLED?
2. What happens if one returns PROPAGATE and another returns HANDLED?

Yep, each ISR may return a different value and all of them are accumulated in the s variable ( s |= intr->isr(intr); ). So the loop may end up with s which contains all of the possible bits, e.g.

  isr1 - HANDLED | ENABLE
  isr2 - HANDLED (don't want the irq to be enabled)
  isr3 - CHAINED

  s = HANDLED | ENABLE | CHAINED;

Then CHAINED will be ignored because of the following code:

  +if (s & XN_ISR_ENABLE)
  +    xnarch_end_irq(irq);
  +else if (s & XN_ISR_CHAINED)    (*)
  +    xnarch_chain_irq(irq);

Which is the worst way possible of prioritizing them: if a Linux interrupt is active when we get there with ENABLE|CHAINED, interrupts will be enabled with the Linux interrupt still asserted - the IRQ handlers will be called once more, probably returning ENABLE|CHAINED again - infinite loop...

The current code in the CVS does not contain the else in (*), so that ENABLE | CHAINED is possible, though it's a wrong combination. This said, we suppose that one knows what he is doing. In the case of a single ISR per line, it's not that difficult to achieve. But if there are a few ISRs, then one should analyze and take into account all possible return values of all the ISRs, as each of them may affect others (e.g. if one returns CHAINED when another - HANDLED | ENABLE).

Which is somewhat contrary to the concept of shared interrupts: if we have to take care of the global picture, why make them shared in the first place?
(I like the concept of shared interrupts, but it is important that the framework gives a separation of concerns) Unfortunately, it looks to me that the current picture (even with your scalar values) requires from the user who develops a given IRQ to take into account the possible return values of other ISRs. As I pointed out, the situation when 2 ISRs return HANDLED_NOENABLE may lead to problems on some archs. Good point, leaves us with 2 possible return values for shared handlers: HANDLED NOT_HANDLED i.e. shared handlers should never defer the end'ing of the interrupt (which makes sense, since this would affect the other [shared] handlers). -- Anders
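To make the coupling concrete, here is a minimal sketch of the OR-accumulation discussed above. The bit values and the names `ISR_HANDLED`, `ISR_ENABLE`, `ISR_CHAINED`, `accumulate()` and `is_conflicting()` are made up for illustration; they are not the real XN_ISR_* constants or nucleus functions. It only shows that the combined value can carry ENABLE and CHAINED together even though no single handler asked for both.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit values only; the real XN_ISR_* constants may differ. */
#define ISR_HANDLED 0x1u
#define ISR_ENABLE  0x2u
#define ISR_CHAINED 0x4u

typedef unsigned (*isr_fn)(void);

/* OR together the return values of every handler on one shared line,
 * in the same way as the "s |= intr->isr(intr)" loop quoted above. */
unsigned accumulate(const isr_fn *handlers, size_t n)
{
    unsigned s = 0;

    assert(handlers != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s |= handlers[i]();
    return s;
}

/* Nonzero when the value asks both to end the line now and to chain it. */
int is_conflicting(unsigned s)
{
    return (s & ISR_ENABLE) && (s & ISR_CHAINED);
}

/* The three example handlers from the discussion. */
unsigned isr1(void) { return ISR_HANDLED | ISR_ENABLE; }
unsigned isr2(void) { return ISR_HANDLED; }
unsigned isr3(void) { return ISR_CHAINED; }
```

Each handler on its own returns a harmless value; only the accumulated result is conflicting, which is why a driver author cannot reason about a handler in isolation under this scheme.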
Re: [Xenomai-core] Re: [PATCH] Shared interrupts (ready to merge)
Dmitry Adamushko wrote:

N.B. Amongst other things, some thoughts about CHAINED with shared interrupts.

On 20/02/06, Anders Blomdell wrote:

A number of questions arise:
1. What happens if one of the shared handlers leaves the interrupt asserted, returns NOENABLE|HANDLED and another returns only HANDLED?
2. What happens if one returns PROPAGATE and another returns HANDLED?

Yep, each ISR may return a different value and all of them are accumulated in the s variable ( s |= intr->isr(intr); ). So the loop may end up with s which contains all of the possible bits, e.g.

  isr1 - HANDLED | ENABLE
  isr2 - HANDLED (don't want the irq to be enabled)
  isr3 - CHAINED

  s = HANDLED | ENABLE | CHAINED;

Then CHAINED will be ignored because of the following code:

  +if (s & XN_ISR_ENABLE)
  +    xnarch_end_irq(irq);
  +else if (s & XN_ISR_CHAINED)    (*)
  +    xnarch_chain_irq(irq);

Which is the worst way possible of prioritizing them: if a Linux interrupt is active when we get there with ENABLE|CHAINED, interrupts will be enabled with the Linux interrupt still asserted - the IRQ handlers will be called once more, probably returning ENABLE|CHAINED again - infinite loop...

The current code in the CVS does not contain the else in (*), so that ENABLE | CHAINED is possible, though it's a wrong combination. This said, we suppose that one knows what he is doing. In the case of a single ISR per line, it's not that difficult to achieve. But if there are a few ISRs, then one should analyze and take into account all possible return values of all the ISRs, as each of them may affect others (e.g. if one returns CHAINED when another - HANDLED | ENABLE).

Which is somewhat contrary to the concept of shared interrupts: if we have to take care of the global picture, why make them shared in the first place? (I like the concept of shared interrupts, but it is important that the framework gives a separation of concerns)

So my feeling is that CHAINED should not be used by drivers which registered their ISRs as SHARED.

Well, CHAINED should not be used by drivers which return ENABLE (and are of course hence incompatible with true realtime IRQ's)

Moreover, I actually see the only scenario of CHAINED (I provided it before): all ISRs in the primary domain have reported UNHANDLED => nucleus propagates the interrupt down the pipeline with xnarch_chain_irq(). This call actually returns 1 upon successful propagation (some domain down the pipeline was interested in this irq) and 0 otherwise. Upon 0, this is a spurious irq (none of the domains was interested in its handling).

ok, let's suppose now: we have 2 ISRs on the same shared line:

  isr1 : HANDLED (will be enabled by rt task. Note, rt task must call xnarch_end_irq() and not just xnarch_enable_irq()!)
  isr2 : CHAINED

So HANDLED | CHAINED is ok for the single ISR on the line, but it may lead to HANDLED | CHAINED | ENABLE in the case of a shared line. The rt task that works jointly with isr1 just calls xnarch_end_irq() at some moment of time and some ISR in the linux domain does the same later => the line is .end-ed 2 times.

An ISR should never return CHAINED to indicate _only_ that it is not interested in this irq, but ~HANDLED or NOINT (if we'll support it) instead. If the ISR nevertheless wants to propagate the IRQ to the Linux domain _explicitly_, it _must not_ register itself as SHARED, i.e. it _must_ be the only ISR on this line, otherwise that may lead to the IRQ line being .end-ed twice (lost interrupts in some cases).

  #define UNHANDLED 0
  #define HANDLED_ENABLE 1
  #define HANDLED_NOENABLE 2
  #define PROPAGATE 3

Yep, I'd agree with you. Moreover, PROPAGATE should not be used for shared interrupts.
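The "only scenario of CHAINED" described above (propagate only when every primary-domain ISR reported UNHANDLED, and treat a failed propagation as spurious) could be sketched roughly as follows. All names here (`dispatch_shared`, `chain_fn`, the enum values) are hypothetical; `chain_fn` merely stands in for a call with the return convention attributed to xnarch_chain_irq() in the text (1 if some domain down the pipeline took the IRQ, 0 otherwise).

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative return values for a shared handler. */
enum isr_ret { ISR_UNHANDLED = 0, ISR_HANDLED = 1 };

/* What finally happened to the interrupt. */
enum irq_outcome { IRQ_HANDLED_RT, IRQ_PROPAGATED, IRQ_SPURIOUS };

typedef enum isr_ret (*isr_fn)(void);

/* Stand-in for the propagation call: 1 = some domain was interested, 0 = none. */
typedef int (*chain_fn)(void);

enum irq_outcome dispatch_shared(const isr_fn *handlers, size_t n, chain_fn chain)
{
    int handled = 0;

    assert(chain != NULL);
    /* Every handler on the shared line runs; none can cut the others off. */
    for (size_t i = 0; i < n; i++)
        if (handlers[i]() == ISR_HANDLED)
            handled = 1;

    if (handled)
        return IRQ_HANDLED_RT;

    /* Nobody in the primary domain wanted it: the dispatcher, not a driver,
     * decides to propagate. */
    return chain() ? IRQ_PROPAGATED : IRQ_SPURIOUS;
}

/* Sample handlers and propagation stubs for experimentation. */
enum isr_ret not_mine(void) { return ISR_UNHANDLED; }
enum isr_ret mine(void) { return ISR_HANDLED; }
int chain_taken(void) { return 1; }
int chain_ignored(void) { return 0; }
```

The point of the design is that propagation becomes a property of the line as a whole, so no individual shared handler needs to return CHAINED at all.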
My feeling is that it should be considered an error to attach a RT IRQ handler to a line that has a Linux IRQ handler (this should be possible to check, since /proc/interrupts contains the relevant information), unless a Linux IRQ-mask function is installed. This IRQ-mask function should then be called:

1. each time domains are switched
2. each time an interrupt is generated

The IRQ-mask function should look something like:

  unsigned int rt_irq_mask(struct ipipe_domain *ipd, unsigned int irq)
  {
      int result = 0;
      static int enabled = true;
      int enable = enabled;

      if (irq >= 0) {
          // Interrupt has occurred, we are about to run IRQ handlers
          if (disable_early) { enable = false; }
          if (for_linux(irq)) { result = XN_ISR_CHAINED; }
      } else if (ipd == ipipe_root_domain) {
          // Entering Linux
          enable = true;
      } else {
          // Other domain, block Linux interrupts
          enable = false;
      }
      if (enable != enabled) {
          enabled = enable;
          if (enable) {
              // Enable Linux interrupts by unmasking appropriate
              // device registers (and possibly entire IRQ's)
          } else {
              // Disable Linux interrupts
Re: [Xenomai-core] Re: [PATCH] Shared interrupts (ready to merge)
Dmitry Adamushko wrote: Good point, leaves us with 2 possible return values for shared handlers: HANDLED NOT_HANDLED i.e. shared handlers should never defer the end'ing of the interrupt (which makes sense, since this would affect the other [shared] handlers). HANDLED_NOEBNABLE could be supported too. Yes, but it breaks decoupling between shared handlers; interrupts will be deferred for all [shared] handlers until it is properly ended. There would be no need in reenventing a wheel, just do it the way Linux does it. But it's about some additional re-designing of the current codebase (e.g. nested calling for irq_enable/disable()) I'm not sure we do need it for something else rather than irq sharing code but it affects the rest of the code. And we have a kind of wrong concept : XN_ISR_ENABLE (or NOENABLE) corresponds to xnarch_end_irq(). Agree But the later one is not only about enabling the line, but on some archs - about .end-ing it too (sending EOI). And to support HANDLED_NOENABLE properly, those 2 have to be decoupled, i.e. EOI should always be sent from xnintr_shirq_handler(). But the one returning HANDLED_NOENABLE is likely to leave the interrupt asserted, hence we can't EOI at this point (unless NO_ENABLE means DISABLE). Yes, should. And this should is best be handled by a) Documenting the potential conflict in the same place when describing the return values b) Placing some debug warning in the nucleus' IRQ trampoline function to bail out (once per line) when running into such situation But I'm against any further runtime restrictions, especially as most drivers will never return anything else than NOT_HANDLED or HANDLED. Actually, this was the reason why I tried to separate the NO_ENABLE and PROPAGATE features as *additional* bits from HANDLED and NOT_HANDLED/UNHANDLED/NOINT. But I acknowledge that having all valid bit combination present as constants can be more helpful for the user. 
We just have to draw some line between the standard values and the additional gurus return codes (documentation: don't use NO_ENABLE or PROPAGATE unless you understand their side-effects and pitfalls precisely). I agree with you on PROPAGATE case, but NO_ENABLE that, as pointed out above, should (IMHO and at least, in theory) only mean keep the IRQ line disabled (and have nothing to do with .end-ing the IRQ line) would be better supported. But this is, again as was pointed out above, about some redesigning of the current code = some overhead that likely affects non-shared aware code too. So on one hand, I'm ready to re-work code with : HANDLED and UNHANDLED (or NOINT) + 2 additional bits : NOENABLE and PROPAGATE. and document it like you suggested don't use NO_ENABLE or PROPAGATE with shared interrupts unless you understand their side-effects and pitfalls precisely; on the other hand, I'd say that I'm almost ready to vote against merging the irq sharing code at all as it looks to be a rather partial solution. I vote for (even though I'm the one who complains the most), BUT I think it is important to keep the rules for using it simple (that's why I worry about the plethora of return-flags). -- Regards Anders
Re: [Xenomai-core] Re: [PATCH] Shared interrupts (ready to merge)
Jan Kiszka wrote:

Hi Dmitry,

Dmitry Adamushko wrote:

Hi Jan,

let's make yet another revision of the bits:

new XN_ISR_HANDLED == old XN_ISR_HANDLED + old XN_ISR_NO_ENABLE

ok.

new XN_ISR_NOENABLE == ~ old XN_ISR_ENABLE

ok.

new XN_ISR_PROPAGATE == XN_ISR_CHAINED

ok.

Just to make sure that you understand my weird ideas: each of the three new XN_ISR_xxx above should be encoded with an individual bit

new XN_ISR_NOINT == ?

does it suppose the interrupt line to be .end-ed (enabled) and irq not to be propagated? Should be so, I guess, if it's different from 5). Then nucleus ignores implicit IRQ enable for 5) as well as for 3). Do we really need that NOINT then, as it seems to be the same as ~HANDLED? or NOINT == 0 and then it's a scalar value, not a bit. So one may consider HANDLED == 1 and NOINT == 0 as really scalar values and NOENABLE and PROPAGATE as additional bits (used only if needed).

My idea is to urge the user to specify one of the base return types (HANDLED or NOINT) + any of the two additional bits (NOENABLE and PROPAGATE). For correct drivers NOINT could be 0 indeed, but to check that the user picked a new constant we may want to set NOINT != 0. With the old API, return 0 expressed HANDLED + ~ENABLE. With the new one the user signals no interest and the nucleus may raise a warning that a spurious IRQ occurred. So I would add a debug bit for NOINT here to optionally (on OPT_XENO_DEBUG) detect old-style usage (return 0). Moreover, we gain freedom to move bits in the future when every state is encoded via constants. Or am I too paranoid here?

After reading the above discussion (of which I understand very little), and looking at (what I believe to be) the relevant code:

  +intr = shirq->handlers;
  +
  +while (intr)
  +{
  +    s |= intr->isr(intr);
  +    ++intr->hits;
  +    intr = intr->next;
  +}
  +xnintr_shirq_unlock(shirq);
  +
  +--sched->inesting;
  +
  +if (s & XN_ISR_ENABLE)
  +    xnarch_end_irq(irq);
  +else if (s & XN_ISR_CHAINED)
  +    xnarch_chain_irq(irq);

A number of questions arise:

1. What happens if one of the shared handlers leaves the interrupt asserted, returns NOENABLE|HANDLED and another returns only HANDLED?
2. What happens if one returns PROPAGATE and another returns HANDLED?

As far as I can tell, after all RT handlers have run, the following scenarios are possible:

1. The interrupt is deasserted (i.e. it was a RT interrupt)
2. The interrupt is still asserted, it will be deasserted later by some RT task (i.e. it was a RT interrupt)
3. The interrupt is still asserted and will be deasserted by the Linux IRQ handler.

IMHO that leads to the conclusion that the IRQ handlers should return a scalar

  #define UNHANDLED 0
  #define HANDLED_ENABLE 1
  #define HANDLED_NOENABLE 2
  #define PROPAGATE 3

and the loop should be

  s = UNHANDLED;
  while (intr) {
      tmp = intr->isr(intr);
      if (tmp > s) { s = tmp; }
      intr = intr->next;
  }
  if (s == PROPAGATE) {
      xnarch_chain_irq(irq);
  } else if (s == HANDLED_ENABLE) {
      xnarch_end_irq(irq);
  }

To be really honest, I think that PROPAGATE should be excluded from the RT IRQ-handlers, since with the current scheme all RT-handlers have to test if the IRQ was a Linux interrupt (otherwise the system will only work when the handler that returns PROPAGATE is installed)

-- Anders
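For anyone who wants to play with the scalar proposal, below is a self-contained sketch of it. The four return codes are the ones from the message; everything else (`struct shared_isr`, `resolve_shared()`, `action_for()`, the `LINE_*` names and the sample handlers) is hypothetical scaffolding, and the real xnarch_end_irq()/xnarch_chain_irq() calls are replaced by a returned action so the code can run outside a kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar return codes from the proposal, ordered by priority. */
enum isr_result {
    UNHANDLED        = 0,
    HANDLED_ENABLE   = 1,
    HANDLED_NOENABLE = 2,
    PROPAGATE        = 3
};

/* Hypothetical stand-in for one registered handler on a shared line. */
struct shared_isr {
    enum isr_result (*isr)(void *cookie);
    void *cookie;
    struct shared_isr *next;
};

/* What the dispatcher would do with the line afterwards (made-up names). */
enum line_action { LINE_LEAVE, LINE_END, LINE_CHAIN };

/* Run every handler and keep the highest-priority result. */
enum isr_result resolve_shared(const struct shared_isr *intr)
{
    enum isr_result s = UNHANDLED;

    for (; intr != NULL; intr = intr->next) {
        enum isr_result tmp = intr->isr(intr->cookie);

        assert(tmp <= PROPAGATE);
        if (tmp > s)
            s = tmp;
    }
    return s;
}

/* Map the resolved scalar onto the single action taken for the line. */
enum line_action action_for(enum isr_result s)
{
    if (s == PROPAGATE)
        return LINE_CHAIN;      /* would call xnarch_chain_irq(irq) */
    if (s == HANDLED_ENABLE)
        return LINE_END;        /* would call xnarch_end_irq(irq) */
    return LINE_LEAVE;          /* UNHANDLED or HANDLED_NOENABLE */
}

/* Sample handlers for experimentation. */
enum isr_result isr_enable(void *c)    { (void)c; return HANDLED_ENABLE; }
enum isr_result isr_noenable(void *c)  { (void)c; return HANDLED_NOENABLE; }
enum isr_result isr_propagate(void *c) { (void)c; return PROPAGATE; }
```

Because the result is a maximum rather than a union, a handler that wants the line kept disabled always wins over one that would end it, and exactly one action is taken per interrupt.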
Re: [Xenomai-core] More on Shared interrupts
Jan Kiszka wrote:

Anders Blomdell wrote:

For the last few days, I have tried to figure out a good way to share interrupts between RT and non-RT domains. This has included looking through Dmitry's patch, correcting bugs and testing what is possible in my specific case. I'll therefore try to summarize at least a few of my thoughts.

1. When looking through Dmitry's patch I get the impression that the iack handler has very little to do with each interrupt (the test 'prev->iack != intr->iack' is a dead giveaway), but is more of a domain-specific function (or perhaps even just a placeholder for the hijacked Linux ack-function).

2. Somewhat inspired by the figure in Life with Adeos, I have identified the following cases:

  irq K  | ---         | ---o|   // Linux only
  ...
  irq L  | ---o|       |         // RT-only
  ...
  irq M  | ---o---     | ---o|   // Shared between domains
  ...
  irq N  | ---o---o--- |         // Shared inside single domain
  ...
  irq O  | ---o---o--- | ---o|   // Shared between and inside single domain

Xenomai currently handles the K & L cases, Dmitry's patch addresses the N case, with edge triggered interrupts the M (and O after Dmitry's patch) case(s) might be handled by returning RT_INTR_CHAINED | RT_INTR_ENABLE from the interrupt handler, for level triggered interrupts the M and O cases can't be handled.

I guess you mean it the other way around: for the edge-triggered cross-domain case we would actually have to loop over both the RT and the Linux handlers until we are sure that the IRQ line was released once.

I obviously have misunderstood edge triggered :-(

Luckily, I never saw such a scenario which was unavoidable (it hits you with ISA hardware which tends to have nice IRQ jumpers or other means - it's just that you often cannot divide several controllers on the same extension card IRQ-wise apart).

If one looks more closely at the K case (Linux only interrupt), it works as follows: when an interrupt occurs, the call to irq_end is postponed until the Linux interrupt handler has run, i.e. further interrupts are disabled. This can be seen as a lazy version of Philippe's idea of disabling all non-RT interrupts until the RT-domain is idle, i.e. the interrupt is disabled only if it indeed occurs.

If this idea should be generalized to the M (and O) case(s), one can't rely on postponing the irq_end call (since the interrupt is still needed in the RT-domain), but has to rely on some function that disables all non-RT hardware that generates interrupts on that irq-line; such a function naturally has to have intimate knowledge of all hardware that can generate interrupts in order to be able to disable those interrupt sources that are non-RT.

If we then take Jan's observation about the many (Linux-only) interrupts present in an ordinary PC and add it to Philippe's idea of disabling all non-RT interrupts while executing in the RT-domain, I think that the following is a workable (and fairly efficient) way of handling this:

Add hardware dependent enable/disable functions, where the enable is called just before normal execution in a domain starts (i.e. when playing back interrupts, the disable is still in effect), and disable is called when normal domain execution ends. This does effectively handle the K case above, with the added benefit that NO non-RT interrupts will occur during RT execution.

In the 8259 case, the disable function could look something like:

  domain_irq_disable(uint irqmask)
  {
      if ((irqmask & 0xff00) != 0xff00) {
          irqmask &= ~0x0004; // Cascaded interrupt is still needed
          outb(irqmask >> 8, PIC_SLAVE_IMR);
      }
      outb(irqmask, PIC_MASTER_IMR);
  }

If we should extend this to handle the M (and O) case(s), the disable function could look like:

  domain_irq_disable(uint irqmask, shared_irq_t *shared[])
  {
      int i;
      for (i = 0 ; i < MAX_IRQ ; i++) {
          if (shared[i]) {
              shared_irq_t *next = shared[i];
              irqmask &= ~(1<<i);
              while (next) {
                  next->disable();
                  next = next->next;
              }

This obviously means that all non-RT IRQ handlers sharing a line with the RT domain would have to be registered in that shared[]-list. This gets close to my old suggestion. Just raises the question how to organise these interfaces, both on the RT and the Linux side.

          }
      }
      if ((irqmask & 0xff00) != 0xff00) {
          irqmask &= ~0x0004; // Cascaded interrupt is still needed
          outb(irqmask >> 8, PIC_SLAVE_IMR);
      }
      outb(irqmask, PIC_MASTER_IMR);
  }

An obvious optimization of the above scheme is to never call the disable (or enable) function for the RT-domain, since there all interrupt processing is protected by the hardware.

Another point is to avoid looping over disable handlers for IRQs of the K case. Otherwise, too many device-specific disable handlers would have to be implemented even if only a single Linux device hogs a RT IRQ.

You only have to spin over those IRQs that are actually shared across domains (probably just a few in most
Re: [Xenomai-core] [Combo-PATCH] Shared interrupts (final)
Philippe Gerum wrote:

Jan Kiszka wrote:

Wolfgang Grandegger wrote:

Hello,

Dmitry Adamushko wrote:

Hi, this is the final set of patches against the SVN trunk of 2006-02-03. It addresses mostly remarks concerning naming (XN_ISR_ISA -> XN_ISR_EDGE), a few cleanups and updated comments. Functionally, the support for shared interrupts (a few flags) to the

Not directly your fault: the increasing number of return flags for IRQ handlers makes me worry whether they are used correctly. I can figure out what they mean (not yet that clearly from the docs), but does someone else understand all this:

- RT_INTR_HANDLED

ISR says it has handled the IRQ, and does not want any propagation to take place down the pipeline. IOW, the IRQ processing stops there.

This says that the interrupt will be ->end'ed at some later time (perhaps in the interrupt handler task)

- RT_INTR_CHAINED

ISR says it wants the IRQ to be propagated down the pipeline. Nothing is said about the fact that the last ISR did or did not handle the IRQ locally; this is irrelevant.

This says that the interrupt will eventually be ->end'ed by some later stage in the pipeline.

- RT_INTR_ENABLE

ISR requests the interrupt dispatcher to re-enable the IRQ line upon return (cumulable with HANDLED/CHAINED).

This says that the interrupt will be ->end'ed when this interrupt handler returns.

- RT_INTR_NOINT

This new one comes from Dmitry's patch for shared IRQ support AFAICS. It would mean to continue processing the chain of handlers because the last ISR invoked was not concerned by the outstanding IRQ.

Sounds like RT_INTR_CHAINED, except that it's for the current pipeline stage?

Now for the quiz question (powerpc arch):

1. Assume an edge triggered interrupt
2. The RT-handler returns RT_INTR_ENABLE | RT_INTR_CHAINED (i.e. shared interrupt, but no problem since it's edge-triggered)
3. Interrupt gets ->end'ed right after RT-handler has returned
4. Linux interrupt handler eventually starts its ->end() handler:

  local_irq_save_hw(flags);
  if (!(irq_desc[irq].status & (IRQ_DISABLED | IRQ_INPROGRESS)))
      ipipe_irq_unlock(irq);
  // Next interrupt occurs here!
  __ipipe_std_irq_dtype[irq].end(irq);
  local_irq_restore_hw(flags);

Wouldn't this lead to a lost interrupt? Or am I overly paranoid? My distinct feeling is that the return value should be a scalar and not a set!

...

I would vote for the (already scheduled?) extension to register an optimised IRQ trampoline in case there is actually no sharing taking place. This would also make the if (irq == XNARCH_TIMER_IRQ) path obsolete.

I support that. Shared interrupts should be handled properly by Xeno since such - I'd say last resort - configuration could be needed; this said, we should not see this as the rule but rather as the exception, since this is basically required to solve some underlying hw limitations wrt interrupt management, and definitely has a significant cost on processing each shared IRQ wrt determinism. Incidentally, there is an interesting optimization on the project's todo list

Is this todo list accessible anywhere?

that would allow non-RT interrupts to be masked at IC level when the Xenomai domain is active. We could do that on any arch with civilized interrupt management, and that would prevent any asynchronous diversion from the critical code when Xenomai is running RT tasks (kernel or user-space). Think of this as some hw-controlled interrupt shield. Since this feature requires being able to individually mask each interrupt source at IC level, there should be no point in sharing fully vectored interrupts in such a configuration anyway. This fact also pleads for having the shared IRQ support as a build-time option.

-- Anders Blomdell
[Xenomai-core] More on Shared interrupts
For the last few days, I have tried to figure out a good way to share interrupts between RT and non-RT domains. This has included looking through Dmitry's patch, correcting bugs and testing what is possible in my specific case. I'll therefore try to summarize at least a few of my thoughts.

1. When looking through Dmitry's patch I get the impression that the iack handler has very little to do with each interrupt (the test 'prev->iack != intr->iack' is a dead giveaway), but is more of a domain-specific function (or perhaps even just a placeholder for the hijacked Linux ack-function).

2. Somewhat inspired by the figure in Life with Adeos, I have identified the following cases:

  irq K  | ---         | ---o|   // Linux only
  ...
  irq L  | ---o|       |         // RT-only
  ...
  irq M  | ---o---     | ---o|   // Shared between domains
  ...
  irq N  | ---o---o--- |         // Shared inside single domain
  ...
  irq O  | ---o---o--- | ---o|   // Shared between and inside single domain

Xenomai currently handles the K & L cases, Dmitry's patch addresses the N case, with edge triggered interrupts the M (and O after Dmitry's patch) case(s) might be handled by returning RT_INTR_CHAINED | RT_INTR_ENABLE from the interrupt handler, for level triggered interrupts the M and O cases can't be handled.

If one looks more closely at the K case (Linux only interrupt), it works as follows: when an interrupt occurs, the call to irq_end is postponed until the Linux interrupt handler has run, i.e. further interrupts are disabled. This can be seen as a lazy version of Philippe's idea of disabling all non-RT interrupts until the RT-domain is idle, i.e. the interrupt is disabled only if it indeed occurs.

If this idea should be generalized to the M (and O) case(s), one can't rely on postponing the irq_end call (since the interrupt is still needed in the RT-domain), but has to rely on some function that disables all non-RT hardware that generates interrupts on that irq-line; such a function naturally has to have intimate knowledge of all hardware that can generate interrupts in order to be able to disable those interrupt sources that are non-RT.

If we then take Jan's observation about the many (Linux-only) interrupts present in an ordinary PC and add it to Philippe's idea of disabling all non-RT interrupts while executing in the RT-domain, I think that the following is a workable (and fairly efficient) way of handling this:

Add hardware dependent enable/disable functions, where the enable is called just before normal execution in a domain starts (i.e. when playing back interrupts, the disable is still in effect), and disable is called when normal domain execution ends. This does effectively handle the K case above, with the added benefit that NO non-RT interrupts will occur during RT execution.

In the 8259 case, the disable function could look something like:

  domain_irq_disable(uint irqmask)
  {
      if ((irqmask & 0xff00) != 0xff00) {
          irqmask &= ~0x0004; // Cascaded interrupt is still needed
          outb(irqmask >> 8, PIC_SLAVE_IMR);
      }
      outb(irqmask, PIC_MASTER_IMR);
  }

If we should extend this to handle the M (and O) case(s), the disable function could look like:

  domain_irq_disable(uint irqmask, shared_irq_t *shared[])
  {
      int i;
      for (i = 0 ; i < MAX_IRQ ; i++) {
          if (shared[i]) {
              shared_irq_t *next = shared[i];
              irqmask &= ~(1<<i);
              while (next) {
                  next->disable();
                  next = next->next;
              }
          }
      }
      if ((irqmask & 0xff00) != 0xff00) {
          irqmask &= ~0x0004; // Cascaded interrupt is still needed
          outb(irqmask >> 8, PIC_SLAVE_IMR);
      }
      outb(irqmask, PIC_MASTER_IMR);
  }

An obvious optimization of the above scheme is to never call the disable (or enable) function for the RT-domain, since there all interrupt processing is protected by the hardware.

Comments, anyone?

-- Anders
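The mask arithmetic of the disable function above can be tried out in user space with a sketch along these lines. It assumes the usual 8259 convention that a set bit in the interrupt mask register masks the line; the port writes are replaced by a returned `struct pic_imr`, and the names `struct shared_irq`, `domain_irq_disable_plan()` and `mark_disabled()` are made up for the example rather than taken from any existing code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IRQ 16

/* One non-RT device sitting on a line shared with the RT domain
 * (hypothetical type, modelled on the shared_irq_t list above). */
struct shared_irq {
    void (*disable)(struct shared_irq *self); /* silence the device itself */
    int disabled;                              /* set by the sample callback */
    struct shared_irq *next;
};

/* The values that would be written to the two 8259 mask registers. */
struct pic_imr {
    uint8_t master;
    uint8_t slave;
    int write_slave;   /* 0 when the slave register is left untouched */
};

struct pic_imr domain_irq_disable_plan(uint16_t irqmask,
                                       struct shared_irq *shared[MAX_IRQ])
{
    struct pic_imr out = { 0, 0, 0 };

    for (int i = 0; shared != NULL && i < MAX_IRQ; i++) {
        if (shared[i] == NULL)
            continue;
        /* The RT domain still needs this line, so keep it unmasked
         * and quiet the non-RT devices at the source instead. */
        irqmask = (uint16_t)(irqmask & ~(1u << i));
        for (struct shared_irq *dev = shared[i]; dev != NULL; dev = dev->next)
            dev->disable(dev);
    }

    if ((irqmask & 0xff00u) != 0xff00u) {
        /* Some slave line stays open: the cascade (IRQ2) must stay open too. */
        irqmask = (uint16_t)(irqmask & ~0x0004u);
        out.slave = (uint8_t)(irqmask >> 8);
        out.write_slave = 1;
    }
    out.master = (uint8_t)(irqmask & 0xffu);
    return out;
}

/* Sample device callback: just records that it was asked to go quiet. */
void mark_disabled(struct shared_irq *self) { self->disabled = 1; }
```

In a real implementation the two bytes would be written with outb() to the slave and master mask registers, as in the pseudocode; returning them instead keeps the sketch testable without I/O privileges.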
Re: [Xenomai-core] [Combo-PATCH] Shared interrupts (final)
Philippe Gerum wrote: Jan Kiszka wrote: Wolfgang Grandegger wrote: Hello, Dmitry Adamushko wrote: Hi, this is the final set of patches against the SVN trunk of 2006-02-03. It addresses mostly remarks concerning naming (XN_ISR_ISA -> XN_ISR_EDGE), a few cleanups and updated comments. Functionally, the support for shared interrupts (a few flags) to the

Not directly your fault: the increasing number of return flags for IRQ handlers makes me worry whether they are used correctly. I can figure out what they mean (not yet that clearly from the docs), but does someone else understand all this:

- RT_INTR_HANDLED
  ISR says it has handled the IRQ, and does not want any propagation to take place down the pipeline. IOW, the IRQ processing stops there.
  This says that the interrupt will be ->end'ed at some later time (perhaps in the interrupt handler task)
- RT_INTR_CHAINED
  ISR says it wants the IRQ to be propagated down the pipeline. Nothing is said about the fact that the last ISR did or did not handle the IRQ locally; this is irrelevant.
  This says that the interrupt will eventually be ->end'ed by some later stage in the pipeline.
- RT_INTR_ENABLE
  ISR requests the interrupt dispatcher to re-enable the IRQ line upon return (cumulable with HANDLED/CHAINED).
  This says that the interrupt will be ->end'ed when this interrupt handler returns.
- RT_INTR_NOINT
  This new one comes from Dmitry's patch for shared IRQ support AFAICS. It would mean to continue processing the chain of handlers because the last ISR invoked was not concerned by the outstanding IRQ.
  Sounds like RT_INTR_CHAINED, except that it's for the current pipeline stage?

Now for the quiz question (powerpc arch):

1. Assume an edge triggered interrupt
2. The RT-handler returns RT_INTR_CHAINED | RT_INTR_ENABLE (i.e. shared interrupt, but no problem since it's edge-triggered)
3. Interrupt gets ->end'ed right after RT-handler has returned
4. The Linux interrupt handler eventually starts its ->end() handler:

local_irq_save_hw(flags);
if (!(irq_desc[irq].status & (IRQ_DISABLED | IRQ_INPROGRESS)))
    ipipe_irq_unlock(irq);
// Next interrupt occurs here!
__ipipe_std_irq_dtype[irq].end(irq);
local_irq_restore_hw(flags);

Wouldn't this lead to a lost interrupt? Or am I overly paranoid? My distinct feeling is that the return value should be a scalar and not a set!

... I would vote for the (already scheduled?) extension to register an optimised IRQ trampoline in case there is actually no sharing taking place. This would also make the if (irq == XNARCH_TIMER_IRQ) path obsolete.

I support that. Shared interrupts should be handled properly by Xeno since such - I'd say last resort - configuration could be needed; this said, we should not see this as the rule but rather as the exception, since this is basically required to solve some underlying hw limitations wrt interrupt management, and definitely has a significant cost on processing each shared IRQ wrt determinism. Incidentally, there is an interesting optimization on the project's todo list

Is this todo list accessible anywhere?

that would allow non-RT interrupts to be masked at IC level when the Xenomai domain is active. We could do that on any arch with civilized interrupt management, and that would prevent any asynchronous diversion from the critical code when Xenomai is running RT tasks (kernel or user-space). Think of this as some hw-controlled interrupt shield. Since this feature requires to be able to individually mask each interrupt source at IC level, there should be no point in sharing fully vectored interrupts in such a configuration anyway. This fact also pleads for having the shared IRQ support as a build-time option.

-- Anders Blomdell
[Xenomai-core] More on Shared interrupts
For the last few days, I have tried to figure out a good way to share interrupts between RT and non-RT domains. This has included looking through Dmitry's patch, correcting bugs and testing what is possible in my specific case. I'll therefore try to summarize at least a few of my thoughts.

1. When looking through Dmitry's patch I get the impression that the iack handler has very little to do with each interrupt (the test 'prev->iack != intr->iack' is a dead giveaway), but is more of a domain-specific function (or perhaps even just a placeholder for the hijacked Linux ack-function).

2. Somewhat inspired by the figure in Life with Adeos, I have identified the following cases:

irq K | --- | ---o| // Linux only
...
irq L | ---o| | // RT-only
...
irq M | ---o--- | ---o| // Shared between domains
...
irq N | ---o---o--- | | // Shared inside single domain
...
irq O | ---o---o--- | ---o| // Shared between and inside single domain

Xenomai currently handles the K & L cases, and Dmitry's patch addresses the N case. With edge triggered interrupts the M (and O after Dmitry's patch) case(s) might be handled by returning RT_INTR_CHAINED | RT_INTR_ENABLE from the interrupt handler; for level triggered interrupts the M and O cases can't be handled.

If one looks more closely at the K case (Linux only interrupt), it works as follows: when an interrupt occurs, the call to irq_end is postponed until the Linux interrupt handler has run, i.e. further interrupts are disabled. This can be seen as a lazy version of Philippe's idea of disabling all non-RT interrupts until the RT-domain is idle, i.e. the interrupt is disabled only if it indeed occurs.
If this idea should be generalized to the M (and O) case(s), one can't rely on postponing the irq_end call (since the interrupt is still needed in the RT-domain), but has to rely on some function that disables all non-RT hardware that generates interrupts on that irq-line; such a function naturally has to have intimate knowledge of all hardware that can generate interrupts in order to be able to disable those interrupt sources that are non-RT. If we then take Jan's observation about the many (Linux-only) interrupts present in an ordinary PC and add it to Philippe's idea of disabling all non-RT interrupts while executing in the RT-domain, I think that the following is a workable (and fairly efficient) way of handling this: Add hardware dependent enable/disable functions, where the enable is called just before normal execution in a domain starts (i.e. when playing back interrupts, the disable is still in effect), and disable is called when normal domain execution end. This does effectively handle the K case above, with the added benefit that NO non-RT interrupts will occur during RT execution. 
In the 8259 case, the disable function could look something like:

void domain_irq_disable(uint irqmask)
{
    if ((irqmask & 0xff00) != 0xff00) {
        irqmask &= ~0x0004; // Cascaded interrupt is still needed
        outb(irqmask >> 8, PIC_SLAVE_IMR);
    }
    outb(irqmask, PIC_MASTER_IMR);
}

If we should extend this to handle the M (and O) case(s), the disable function could look like:

void domain_irq_disable(uint irqmask, shared_irq_t *shared[])
{
    int i;

    for (i = 0; i < MAX_IRQ; i++) {
        if (shared[i]) {
            shared_irq_t *next = shared[i];

            irqmask &= ~(1 << i);
            while (next) {
                next->disable();
                next = next->next;
            }
        }
    }
    if ((irqmask & 0xff00) != 0xff00) {
        irqmask &= ~0x0004; // Cascaded interrupt is still needed
        outb(irqmask >> 8, PIC_SLAVE_IMR);
    }
    outb(irqmask, PIC_MASTER_IMR);
}

An obvious optimization of the above scheme is to never call the disable (or enable) function for the RT-domain, since there all interrupt processing is protected by the hardware. Comments, anyone?

-- Anders
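The mask arithmetic in the 8259 example can be tried out in user space without touching any I/O port. What follows is only a minimal sketch under stated assumptions: `struct pic_imr` and `pic_imr_for_mask()` are hypothetical names that do not exist in Xenomai or the kernel, and a set bit n in `irqmask` is assumed to mean "IRQ n is masked", as in the 8259 interrupt mask register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the bytes that would go to the two 8259 IMRs. */
struct pic_imr {
    uint8_t master;      /* value for PIC_MASTER_IMR */
    uint8_t slave;       /* value for PIC_SLAVE_IMR */
    int     write_slave; /* non-zero if the slave IMR would be written */
};

/* Assumption: bit n set in irqmask means IRQ n is to be masked. */
static struct pic_imr pic_imr_for_mask(unsigned int irqmask)
{
    struct pic_imr imr = { 0, 0, 0 };

    if ((irqmask & 0xff00) != 0xff00) {
        /* At least one slave IRQ (8-15) stays enabled, so the
         * cascade input (IRQ 2) on the master must stay open too. */
        irqmask &= ~0x0004u;
        imr.slave = (uint8_t)(irqmask >> 8);
        imr.write_slave = 1;
    }
    imr.master = (uint8_t)(irqmask & 0xff);
    return imr;
}
```

With everything masked (0xffff) the slave is left alone and the master gets 0xff; with only IRQ 8 left enabled (0xfeff) the slave gets 0xfe and the master gets 0xfb, i.e. the cascade bit is cleared.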
[Xenomai-core] [PATCH] Slow is faster arch/ppc/syslib/open_pic.c
When trying to run Xenomai on PowerPC with OpenPIC, I have (finally) found that interrupt latency is much improved with the following patch:

--- arch/ppc/syslib/open_pic.c~ 2006-01-08 03:15:24.0 +0100
+++ arch/ppc/syslib/open_pic.c 2006-02-07 16:56:14.0 +0100
@@ -820,7 +820,7 @@
  */
 static void openpic_ack_irq(unsigned int irq_nr)
 {
-#ifdef __SLOW_VERSION__
+#if defined(__SLOW_VERSION__) || defined(CONFIG_IPIPE)
     openpic_disable_irq(irq_nr);
     openpic_eoi();
 #else
@@ -831,7 +831,7 @@
 
 static void openpic_end_irq(unsigned int irq_nr)
 {
-#ifdef __SLOW_VERSION__
+#if defined(__SLOW_VERSION__) || defined(CONFIG_IPIPE)
     if (!(irq_desc[irq_nr].status & (IRQ_DISABLED|IRQ_INPROGRESS))
         && irq_desc[irq_nr].action)
         openpic_enable_irq(irq_nr);

The reason for this is that the fast version doesn't call openpic_eoi until the interrupt is ended, which means that all RT-interrupts are delayed by a pending Linux interrupt.

-- Regards Anders Blomdell
[Xenomai-core] [BUG] problems with adeos-ipipe-2.6.14-ppc-1.2-00.patch
When trying to patch with the latest version of this patch, I get:

patching file include/asm-ppc/ipipe.h
Hunk #1 FAILED at 1.
Hunk #2 FAILED at 149.
Hunk #3 FAILED at 160.
Hunk #4 FAILED at 195.

The problem seems to be at line 4168 in the patch, where it says

@@ -0,1 +1,179 @@

but the old [working] patch said

@@ -0,0 +1,178 @@

Seems like the patch was created against a not totally clean distribution.

-- Regards Anders Blomdell
Re: [Xenomai-core] some results on my laptop
Jan Kiszka wrote: Jan Kiszka wrote: ... What about other time sources on x86? Which systems already have HPET these days, and does this source not suffer from frequency scaling? I once read that HPET is quite easy to program, is this true? IOW, would it be worth considering to add this to the HAL? There are actually only few registers: http://www.intel.com/hardwaredesign/hpetspec_1.pdf Even a replacement for the TSC is available (Main Counter), but I guess that some effort will be required to replace all direct usages of rdtsc in the current Xenomai code, right?

And unfortunately they aren't guaranteed to survive S3 sleep, which laptops spend a lot of time in (around 50% when doing control at 100 Hz).

-- Anders
[Xenomai-core] [BUG] version mismatch
in ksrc/arch/powerpc/patches/adeos-ipipe-2.6.14-ppc-1.2-00.patch:

#define IPIPE_ARCH_STRING "1.1-02"

shouldn't this be

#define IPIPE_ARCH_STRING "1.2-00"

-- Anders Blomdell
[Xenomai-core] Are XN_ISR_CHAINED and XN_ISR_ENABLE mutually exclusive?
While looking into how to implement sharing of interrupts between realtime and non-realtime domains (and applying Wolfgang Grandegger's patch [https://mail.gna.org/public/xenomai-core/2006-01/msg00233.html], which is necessary to make XN_ISR_ENABLE work at all on the PowerPC platform), I'm beginning to think that XN_ISR_CHAINED and XN_ISR_ENABLE are mutually exclusive, since if both are set, desc->handler->end will be called twice:

1. When the realtime isr handler returns
2. When the Linux domain calls it in __do_IRQ

In the solution I have in mind at the moment, I will:

1. Add an extra iend handler argument to xnintr_init
2. If XN_ISR_ENABLE is returned from the isr handler, replace desc->handler->end with the user supplied iend handler.

Hereby I hope to be able to handle interrupts shared between realtime and non-realtime domain, without having the realtime domain wait for all non-realtime interrupts to finish.

This is the scenario I'm thinking of:

1. A non-RT interrupt occurs
2. The (RT) isr handler detects the non-RT interrupt, disables further non-RT interrupts on that irq-vector, replaces desc->handler->end with the user supplied iend handler, returns XN_ISR_CHAINED | XN_ISR_ENABLE.
3. RT interrupts are serviced by the (RT) isr handler, returns XN_ISR_ENABLE
4. The Linux domain gets a chance to run the chained interrupt, and eventually calls desc->handler->end (supplied iend handler)
5. The iend handler reenables non-RT interrupts.

Comments on the above are most welcome!

-- Anders Blomdell
Re: [Xenomai-core] Are XN_ISR_CHAINED and XN_ISR_ENABLE mutually exclusive?
Anders Blomdell wrote: Jan Kiszka wrote: Anders Blomdell wrote: While looking into how to implement sharing of interrupts between realtime and non-realtime domains (and applying Wolfgang Grandegger's patch [https://mail.gna.org/public/xenomai-core/2006-01/msg00233.html], which is necessary to make XN_ISR_ENABLE work at all on the PowerPC platform), I'm beginning to think that XN_ISR_CHAINED and XN_ISR_ENABLE are mutually exclusive, since if both are set, desc->handler->end will be called twice: 1. When the realtime isr handler returns 2. When the Linux domain calls it in __do_IRQ

Yes, those bits are semantically exclusive. Actually, I think passing both bits could even cause deadlocks if the RT-IRQ is raised again before the non-RT handler got a chance to clear the IRQ source in hardware.

My impression as well, but it's nowhere documented, nor enforced in the code.

In the solution I have in mind at the moment, I will: 1. Add an extra iend handler argument to xnintr_init 2. If XN_ISR_ENABLE is returned from the isr handler, replace desc->handler->end with the user supplied iend handler. Hereby I hope to be able to handle interrupts shared between realtime and non-realtime domain, without having the realtime domain wait for all non-realtime interrupts to finish. This is the scenario I'm thinking of: 1. A non-RT interrupt occurs 2. The (RT) isr handler detects the non-RT interrupt, disables further non-RT interrupts on that irq-vector, replaces

This remains vague to me. How precisely will you disable? I guess at hardware level, i.e. in a (non-RT) device-specific way: switch off the bit in some hardware register that says this device can produce IRQs, right?

Yes.

desc->handler->end with the user supplied iend handler, returns XN_ISR_CHAINED | XN_ISR_ENABLE. 3. RT interrupts are serviced by the (RT) isr handler, returns XN_ISR_ENABLE 4. The Linux domain gets a chance to run the chained interrupt, and eventually calls desc->handler->end (supplied iend handler) 5. The iend handler reenables non-RT interrupts.

Then this would switch on that bit again? Note that this may require to synchronise the hardware access with parts of the non-RT driver.

If the non-RT driver sets that bit in its ISR routine, yes. I have the (overly optimistic?) view that the non-RT ISR only does whatever is necessary to clear the interrupt and leaves the enable/disable bits untouched.

Or perhaps the whole concept is of no interest to others, and I should put this arbitration in the platform specific part (arch/ppc/platform/prpmc800.c) and consider the harrier chip as a cascaded interrupt controller, and handle it as such?

-- Anders Blomdell
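Since the exclusivity of the two bits is said to be neither documented nor enforced, a check along the following lines could make the rule explicit. This is a minimal sketch, not Xenomai code: the `SKETCH_ISR_*` values are made up for illustration (the real XN_ISR_* constants live in the Xenomai headers and may differ), and `isr_flags_valid()` is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical bit values, for illustration only. */
#define SKETCH_ISR_HANDLED 0x1
#define SKETCH_ISR_CHAINED 0x2
#define SKETCH_ISR_ENABLE  0x4

/* Returns 1 if an ISR return value ends the IRQ line at most once, else 0. */
static int isr_flags_valid(int flags)
{
    const int both = SKETCH_ISR_CHAINED | SKETCH_ISR_ENABLE;

    /* CHAINED leaves the ->end call to the Linux domain, ENABLE ends the
     * line as soon as the RT handler returns; both together would end
     * the same interrupt twice. */
    return (flags & both) != both;
}
```

A dispatcher could reject (or at least warn about) a handler return value for which `isr_flags_valid()` yields 0, instead of silently calling the end handler twice.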
[Xenomai-core] [BUG] Interrupt problem on powerpc
On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the following if the interrupt handler takes too long (i.e. next interrupt gets generated before the previous one has finished)

[ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
[ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
[ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
[ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
[ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
[ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
[ 42.923029] [] 0x0
[ 42.959695] [c0038348] __do_IRQ+0x134/0x164
[ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
[ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
[ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
[ 43.411145] [c0006524] default_idle+0x10/0x60

Any ideas of where to look?

Regards Anders Blomdell
[Xenomai-core] [BUG?] dead code in ipipe_grab_irq
In the following code (ppc), shouldn't first be either declared static or deleted? To me it looks like first is always equal to one when the else clause is evaluated.

asmlinkage int __ipipe_grab_irq(struct pt_regs *regs)
{
    extern int ppc_spurious_interrupts;
    ipipe_declare_cpuid;
    int irq, first = 1;

    if ((irq = ppc_md.get_irq(regs)) >= 0) {
        __ipipe_handle_irq(irq, regs);
        first = 0;
    } else if (irq != -2 && first)
        ppc_spurious_interrupts++;

    ipipe_load_cpuid();

    return (ipipe_percpu_domain[cpuid] == ipipe_root_domain &&
            !test_bit(IPIPE_STALL_FLAG, &ipipe_root_domain->cpudata[cpuid].status));
}

Regards Anders Blomdell
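The observation about first can be checked with a reduced model of the branch structure. This is only a sketch: `spurious_count` stands in for ppc_spurious_interrupts, the value returned by ppc_md.get_irq() is passed in as a plain argument, and both function names are made up for the comparison.

```c
#include <assert.h>

static int spurious_count; /* stand-in for ppc_spurious_interrupts */

/* Same if/else shape as the quoted function, with the local 'first'. */
static void grab_irq_with_first(int irq)
{
    int first = 1;

    if (irq >= 0)
        first = 0; /* only reached when the else branch is skipped */
    else if (irq != -2 && first)
        spurious_count++;
}

/* The same logic with 'first' removed. */
static void grab_irq_without_first(int irq)
{
    if (irq < 0 && irq != -2)
        spurious_count++;
}
```

Because first is a fresh local on every call and is only cleared in the branch that excludes the else clause, both variants count the same interrupts as spurious.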
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Jan Kiszka wrote: Anders Blomdell wrote: On a PrPMC800 (PPC 7410 processor) withe Xenomai-2.1-rc2, I get the following if the interrupt handler takes too long (i.e. next interrupt gets generated before the previous one has finished) [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4 [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184 [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130 [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268 [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4 [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc [ 42.923029] [] 0x0 [ 42.959695] [c0038348] __do_IRQ+0x134/0x164 [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44 [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228 [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4 [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268 [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4 [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc [ 43.411145] [c0006524] default_idle+0x10/0x60 I think some probably important information is missing above this back-trace. You are so right! What does the kernel state before these lines? 
[ 42.346643] BUG: spinlock recursion on CPU#0, swapper/0
[ 42.415438] lock: c01c943c, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 0
[ 42.511681] Call trace:
[ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
[ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
[ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
[ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
[ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
[ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
[ 42.923029] [] 0x0
[ 42.959695] [c0038348] __do_IRQ+0x134/0x164
[ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
[ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
[ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
[ 43.411145] [c0006524] default_idle+0x10/0x60

It might be that the problem is related to the fact that the interrupt is a shared one (Harrier chip, Functional Exception) that is used for both message-passing (should be RT) and UART (Linux, i.e. non-RT). My current IRQ handler always pends the interrupt to the Linux domain (RTDM_IRQ_PROPAGATE), because all other attempts (RTDM_IRQ_ENABLE when it wasn't a UART interrupt) have left the interrupts turned off. What I believe should be done is:

1. When UART interrupt is received, disable further non-RT interrupts on this IRQ-line, pend interrupt to Linux.
2. Handle RT interrupts on this IRQ line
3. When Linux has finished the pended interrupt, reenable non-RT interrupts.

but I have neither been able to achieve this, nor to verify that it is the right thing to do...

Regards Anders Blomdell
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Jan Kiszka wrote: Anders Blomdell wrote: Jan Kiszka wrote: Anders Blomdell wrote: On a PrPMC800 (PPC 7410 processor) withe Xenomai-2.1-rc2, I get the following if the interrupt handler takes too long (i.e. next interrupt gets generated before the previous one has finished) [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4 [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184 [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130 [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268 [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4 [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc [ 42.923029] [] 0x0 [ 42.959695] [c0038348] __do_IRQ+0x134/0x164 [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44 [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228 [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4 [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268 [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4 [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc [ 43.411145] [c0006524] default_idle+0x10/0x60 I think some probably important information is missing above this back-trace. You are so right! What does the kernel state before these lines? 
[ 42.346643] BUG: spinlock recursion on CPU#0, swapper/0 [ 42.415438] lock: c01c943c, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 0 [ 42.511681] Call trace: [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4 [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184 [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130 [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268 [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4 [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc [ 42.923029] [] 0x0 [ 42.959695] [c0038348] __do_IRQ+0x134/0x164 [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44 [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228 [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4 [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268 [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4 [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc [ 43.411145] [c0006524] default_idle+0x10/0x60 It might be that the problem is related to the fact that the interrupt is a shared one (Harrier chip, Functional Exception), that is used for both message-passing (should be RT) and UART (Linux, i.e. non-RT), my current IRQ handler always pends the interrupt to the linux domain (RTDM_IRQ_PROPAGATE), because all other attempts (RTDM_IRQ_ENABLE when it wasn't a UART interrupt) has left the interrupts turned off. What I believe should be done, is 1. When UART interrupt is received, disable further non-RT interrupts on this IRQ-line, pend interrupt to Linux. 2. Handle RT interrupts on this IRQ line 3. When Linux has finished the pended interrupt, reenable non-RT interrupts. but I have neither been able to achieve this, nor to verify that it is the right thing to do... Your approach is basically what I proposed some years back on rtai-dev for handling unresolvable shared RT/NRT IRQs. I once successfully tested such a setup with two network cards, one RT, the other Linux. 
So when you are really doomed and cannot change the IRQ line of your RT device, this is a kind of emergency workaround. Not nice and generic (you have to write the stub for disabling the NRT IRQ source), but it should work.

I'm doomed, the interrupts live in the same chip... The problem is that I have not found any good place to reenable the non-RT interrupts.

Anyway, I do not understand what made your spinlock recurse. This shared IRQ scenario should only cause indeterminism to the RT driver (by blocking the line until the Linux handler can release it), but it must not trigger this bug.

OK, seems like I have two problems then, I'll try to hunt it down

/Anders
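The three-step workaround discussed in this thread can be modelled as a small state machine. This is a sketch only: `struct shared_line`, `rt_isr()` and `linux_irq_done()` are hypothetical names, no real Xenomai, RTDM or I-pipe call is used, and the device-level enable bit is reduced to a plain integer.

```c
#include <assert.h>

/* Hypothetical model of one IRQ line shared by an RT and a non-RT source. */
struct shared_line {
    int nrt_source_enabled; /* device-level enable bit of the non-RT source */
    int pended_to_linux;    /* an interrupt is waiting for the Linux handler */
    int rt_handled;         /* number of RT interrupts serviced */
};

enum source { SRC_RT, SRC_NRT };

/* Steps 1 and 2: the RT-domain handler for the shared line. */
static void rt_isr(struct shared_line *l, enum source src)
{
    if (src == SRC_NRT) {
        if (!l->nrt_source_enabled)
            return;                /* masked at the device, cannot fire */
        l->nrt_source_enabled = 0; /* silence the non-RT source */
        l->pended_to_linux = 1;    /* propagate to the Linux domain */
    } else {
        l->rt_handled++;           /* RT work stays possible meanwhile */
    }
}

/* Step 3: run once Linux has finished the pended interrupt. */
static void linux_irq_done(struct shared_line *l)
{
    l->pended_to_linux = 0;
    l->nrt_source_enabled = 1;     /* non-RT source may interrupt again */
}
```

The open question raised above, where step 3 can be hooked in, is exactly the point at which something like `linux_irq_done()` would have to be called; the sketch does not answer that.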
[Xenomai-core] [PATCH] Fix to RTDM open problems
When RTDM is exposed to code like this:

device1 = rt_dev_open(some_device, O_RDWR);
device2 = rt_dev_open(some_device, O_RDWR);

I get a SEGFAULT, which I attribute to a missing assignment to context_ptr in the case when the device is already busy; the lack of this assignment leads to a segfault in cleanup_instance.

--- xenomai-2.1-rc2/ksrc/skins/rtdm/core.c~ 2006-01-07 18:08:34.0 +0100
+++ xenomai-2.1-rc2/ksrc/skins/rtdm/core.c 2006-01-27 11:14:43.0 +0100
@@ -136,6 +136,8 @@
 
     if (context->device) {
         xnlock_put_irqrestore(&rt_dev_lock, s);
+
+        *context_ptr = NULL;
         return -EBUSY;
     }
     context->device = device;
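The pattern behind this fix, defining an output pointer on every return path so that later cleanup never sees garbage, can be shown in isolation. This is a sketch under assumptions: `struct context`, `claim_context()` and `cleanup()` are simplified stand-ins, not the RTDM functions, and whether the real cleanup_instance accepts a NULL context is not stated in the report.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct context { const void *device; }; /* hypothetical, much reduced */

/* Claim ctx for device; *context_ptr is defined on every return path. */
static int claim_context(struct context *ctx, const void *device,
                         struct context **context_ptr)
{
    if (ctx->device) {
        *context_ptr = NULL; /* the kind of assignment the patch adds */
        return -EBUSY;
    }
    ctx->device = device;
    *context_ptr = ctx;
    return 0;
}

/* Cleanup that tolerates a NULL context (an assumption of this sketch). */
static void cleanup(struct context *ctx)
{
    if (ctx)
        ctx->device = NULL;
}
```

Without the assignment in the busy branch, the caller's pointer would keep whatever value it had before the call, and a subsequent cleanup would dereference it.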
[Xenomai-core] [BUG] Missing DESTDIR?
in a lot of the Makefile.in files in 2.1-rc2 there are lines like:

test -z $(somedir) || $(mkdir_p) $(DESTDIR)$(somedir)

shouldn't they read:

test -z $(DESTDIR)$(somedir) || $(mkdir_p) $(DESTDIR)$(somedir)

Best regards Anders Blomdell