On Wednesday, 5 October 2016, at 12:39:04, Jan Kiszka wrote:
> On 2016-10-04 17:36, Leopold Palomo-Avellaneda wrote:
> > On Monday, 3 October 2016, at 18:12:12, Leopold Palomo-Avellaneda wrote:
> >> Hi,
> >>
> >> I have been making some tests and I have arrived to the conclusion that
> >> the PC that I would like to install Xenomai and RTNET doesn't like it.
> >>
> >> It's a PC with a motherboard Gigabyte Q170M-D3H-CF. I'm running 4.1.18
> >> with xenomai 3.0.3. AFAIK, the xenomai tests works. However, when I try
> >> to run RTNET, I got crashes:
> >>
> >> BUG: unable to handle kernel paging request at 00007f47ea0ef878
> >> IP: [<ffffffffa0231580>] rt_udp_ioctl+0x50/0x74 [rtudp]
> >> PGD 458887067 PUD 4590a1067 PMD 45921f067 PTE 8000000438863867
> >> Oops: 0001 [#1] PREEMPT SMP
> >> Modules linked in: rt_igb rt_loopback rtcfg rtudp rtipv4 rtmac rtpacket rtnet e100 mii ctr ccm binfmt_misc nfsd
> >> CPU: 4 PID: 6773 Comm: LWRJointPositio Not tainted 4.1.18-xenomai-3.0.3 #1
> >> Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Q170M-D3H-CF, BIOS F1 10/13/2015
> >> task: ffff880459a26010 ti: ffff880459a38000 task.ti: ffff880459a38000
> >> RIP: 0010:[<ffffffffa0231580>] [<ffffffffa0231580>] rt_udp_ioctl+0x50/0x74 [rtudp]
> >> RSP: 0018:ffff880459a3be08 EFLAGS: 00010246
> >> RAX: 00007f47ea0ef870 RBX: ffff880458d59400 RCX: ffff880458d59440
> >> RDX: 0000000000000000 RSI: 0000000040100022 RDI: ffff880458d59400
> >> RBP: 0000000000000003 R08: ffff880460297420 R09: 000000000000004e
> >> R10: 00000000000000dc R11: ffff880459a3bdc0 R12: ffff880459a26010
> >> R13: ffffc90001f05008 R14: 0000000040100022 R15: ffffffff81b85ec0
> >> FS: 00007f47ea0f0700(0000) GS:ffff880460200000(0000) knlGS:0000000000000000
> >> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> >> CR2: 00007f47ea0ef878 CR3: 000000045890c000 CR4: 00000000003406e0
> >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >> I-pipe domain Linux
> >> Stack:
> >> ffffffffa0231535 ffffffff8116fb70 ffff880459a265c0 00007f47ea0ef870
> >> ffff8804599975d0 0000000000000010 ffff880459a3beb8 ffff880459a3be48
> >> 0000000000000002 ffff880459a26010 00007f47ea0ef870 ffff880459a26010
> >> Call Trace:
> >> [<ffffffffa0231535>] ? rt_udp_ioctl+0x5/0x74 [rtudp]
> >> [<ffffffff8116fb70>] ? rtdm_fd_ioctl+0x100/0x270
> >> [<ffffffff81174b40>] ? CoBaLt_fcntl+0x20/0x20
> >> [<ffffffff81174b40>] ? CoBaLt_fcntl+0x20/0x20
> >> [<ffffffff81174b50>] ? CoBaLt_ioctl+0x10/0x20
> >> [<ffffffff81174b45>] ? CoBaLt_ioctl+0x5/0x20
> >> [<ffffffff8118450a>] ? ipipe_syscall_hook+0x11a/0x360
> >> [<ffffffff81108da7>] ? __ipipe_notify_syscall+0xe7/0x1d0
> >> [<ffffffff81107185>] ? __ipipe_restore_root_nosync+0x5/0x30
> >> [<ffffffff8158fb34>] ? pipeline_syscall+0x9/0x16
> >> Code: 23 00 10 40 75 15 8b 50 08 48 8b 30 48 89 cf 48 83 c4 08 e9 a3 fd ff ff 0f 1f 00 48 89 c2 48 83 c4 08 e9 5
> >> RIP [<ffffffffa0231580>] rt_udp_ioctl+0x50/0x74 [rtudp]
> >> RSP <ffff880459a3be08>
> >> CR2: 00007f47ea0ef878
> >> ---[ end trace 085d23e71de3ae4b ]---
> >>
> >> The funny (or ugly thing) is that, same kernel (I'm using debian
> >> packages) and almost the same Xenomai (compiled in each machine but with
> >> the same configure options) works in another similar box, with the same
> >> network cards (rt_igb). My application doesn't crash.
> >>
> >> I also have tested another network card (rt_e1000_new) with the same core
> >> dump.
> >>
> >> So, any idea how can I find some light in this? I don't know if it's a
> >> rtnet issue of a combination of kernel and hardware issue.
> >
> > digging more in this I have found some interesting data. Although I though
> > that previous message was equal to all the crashes is not true. I have
> > much more messages with this error:
> >
> > BUG: unable to handle kernel paging request at 00007ffda8577680
> > IP: [<ffffffff812fe5c8>] strncmp+0x8/0x50
> > PGD 4589e3067 PUD 45c719067 PMD 459a88067 PTE 8000000453c52867
> > Oops: 0001 [#1] SMP
> > Modules linked in: rt_loopback rtcfg rtudp rtipv4 rtmac rtpacket ctr ccm binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc joydev rt_e1000e rt_e1000 hid_generic usbhid nls_utf8 nls_cp437 snd_hda_codec_hdmi vfat fat ppdev snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal rt_e1000_new coretemp rt_igb rt_eepro100 kvm_intel rtnet kvm crct10dif_pclmul crc32_pclmul arc4 snd_hda_intel aesni_intel snd_hda_controller aes_x86_64 snd_hda_codec lrw snd_hda_core gf128mul snd_hwdep glue_helper snd_pcm ablk_helper cryptd snd_timer i915 snd evdev soundcore pcspkr efivars serio_raw i2c_i801 drm_kms_helper drm wmi battery i2c_algo_bit parport_pc video parport shpchp tpm_infineon tpm_tis tpm button ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 rfkill fuse autofs4 ext4 crc16 mbcache jbd2 sg sd_mod crc32c_intel ahci libahci xhci_pci libata xhci_hcd e100 mii scsi_mod usbcore usb_common fan thermal_sys i2c_hid hid i2c_core
> > CPU: 7 PID: 1047 Comm: slaveinfo_rt Not tainted 4.1.18-xenomai-3.0.3 #2
> > Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Q170M-D3H-CF, BIOS F1 10/13/2015
> > task: ffff88045b0faaa0 ti: ffff88045b44c000 task.ti: ffff88045b44c000
> > RIP: 0010:[<ffffffff812fe5c8>] [<ffffffff812fe5c8>] strncmp+0x8/0x50
> > RSP: 0018:ffff88045b44fda0 EFLAGS: 00010202
> > RAX: ffffc90001f07008 RBX: ffffffffa0366740 RCX: 0000000000000072
> > RDX: 0000000000000010 RSI: 00007ffda8577680 RDI: ffff880459aaa004
> > RBP: ffff880459aaa000 R08: ffff880460597420 R09: 0000000000000056
> > R10: 00000000000000dc R11: ffff88045b44fdc0 R12: 00007ffda8577680
> > R13: 00007ffda8577680 R14: 0000000040180021 R15: ffffffff81b832c0
> > FS: 00007f3094175740(0000) GS:ffff880460500000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 00007ffda8577680 CR3: 000000045a12a000 CR4: 00000000003406e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > I-pipe domain Linux
> > Stack:
> > ffffffffa035f151 0000000000052f08 0000000000000000 00007ffda8577680
> > ffffffffa035f621 ffff880459a17000 0000000040180021 ffff88045b0faaa0
> > ffffffffa03627be ffff880459a17000 0000000000000003 ffff88045b0faaa0
> > Call Trace:
> > [<ffffffffa035f151>] ? __rtdev_get_by_name+0x31/0x60 [rtnet]
> > [<ffffffffa035f621>] ? rtdev_get_by_name+0x51/0xd0 [rtnet]
> > [<ffffffffa03627be>] ? rt_socket_if_ioctl+0x2e/0x2f0 [rtnet]
> > [<ffffffff8116505c>] ? rtdm_fd_ioctl+0xfc/0x220
> > [<ffffffff81169d10>] ? CoBaLt_fcntl+0x20/0x20
> > [<ffffffff81169d10>] ? CoBaLt_fcntl+0x20/0x20
> > [<ffffffff81169d20>] ? CoBaLt_ioctl+0x10/0x20
> > [<ffffffff81169d15>] ? CoBaLt_ioctl+0x5/0x20
> > [<ffffffff8117932a>] ? ipipe_syscall_hook+0x25a/0x330
> > [<ffffffff81100097>] ? __ipipe_notify_syscall+0xe7/0x1d0
> > [<ffffffff811e7845>] ? fput+0x5/0x90
> > [<ffffffff81567cf4>] ? pipeline_syscall+0x9/0x16
> >
> > it shows that the crash is produced by __rtdev_get_by_name called from
> > rtdev_get_by_name called from rt_socket_if_ioctl ... with a strncmp
> >
> > that function is defined kernel/drivers/net/stack/rtdev.c
> >
> > static struct rtnet_device *__rtdev_get_by_name(const char *name)
> > {
> >     int i;
> >     struct rtnet_device *rtdev;
> >
> >     for (i = 0; i < MAX_RT_DEVICES; i++) {
> >         rtdev = rtnet_devices[i];
> >         if ((rtdev != NULL) && (strncmp(rtdev->name, name, IFNAMSIZ) == 0))
> >             return rtdev;
> >     }
> >     return NULL;
> > }
> >
> > however I couldn't understand why this function crashes in this box and
> > not in the other box that I have tested. I will update BIOS and see what
> > happen.
> >
> > In any case, any help will be appreciated.
>
> Instrument the code with printk to retrieve which parameters are in
> which state before they are evaluated (and cause the crash). That's the
> general answer that almost always applies if you don't see the cause.

I tried to do that. I simply added a printk to show the values of i and
rtdev->name. However, after that the box crashed with hundreds of messages,
so I couldn't see any valuable data. I guess that something deeper is
failing here. In any case, it seems strange to me that the same code works
on one box and causes a kernel crash on another, running the same user
application with the same kernel and the same Xenomai version.

> In this case, I would say that kernel space is accessing an invalid
> userspace pointer (00007ffda8577680). That can happen with nasty RTnet,
> because it lacks safe userspace address accesses. So, userspace bugs
> quickly become kernel crashes. Long-pending to-do...

Well, I have done another test. I used a simple program, not written by me,
just an example that uses raw sockets:

https://gist.github.com/austinmarton/1922600

I compiled it with:

gcc -I/usr/xenomai/include/cobalt -I/usr/xenomai/include -D_GNU_SOURCE \
    -D_REENTRANT -D__COBALT__ -D__COBALT_WRAP__ sendRaw.c \
    -Wl,@/usr/xenomai/lib/cobalt.wrappers /usr/xenomai/lib/xenomai/bootstrap.o \
    -Wl,--wrap=main -Wl,--dynamic-list=/usr/xenomai/lib/dynlist.ld \
    -L/usr/xenomai/lib -lcobalt -lpthread -lrt -o sendRaw

And it crashes in the same way:

BUG: unable to handle kernel paging request at 00007ffe9c534390
[ 5122.346329] IP: [<ffffffff812fe5c8>] strncmp+0x8/0x50
[ 5122.346341] PGD 45caee067 PUD 45add6067 PMD 45a75d067 PTE 800000044e767867
[ 5122.346357] Oops: 0001 [#1] SMP
[ 5122.346365] Modules linked in: rt_igb rt_loopback rtcfg rtudp rtipv4 rtmac rtpacket rtnet ptp pps_core dca ctr ccm snd_hda_codec_hdmi binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc joydev hid_generic nls_utf8 x86_pkg_temp_thermal nls_cp437 coretemp usbhid vfat snd_hda_codec_realtek kvm_intel ppdev fat snd_hda_codec_generic evdev kvm crct10dif_pclmul crc32_pclmul snd_hda_intel aesni_intel snd_hda_controller aes_x86_64 snd_hda_codec lrw snd_hda_core gf128mul glue_helper ablk_helper snd_hwdep cryptd i915 snd_pcm snd_timer snd drm_kms_helper serio_raw efivars pcspkr soundcore drm arc4 shpchp i2c_algo_bit i2c_i801 parport_pc battery parport wmi video tpm_tis tpm button ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 rfkill fuse autofs4 ext4 crc16 mbcache jbd2 sg sd_mod
[ 5122.346552] crc32c_intel psmouse ahci libahci xhci_pci libata xhci_hcd e100 mii scsi_mod usbcore usb_common fan thermal_sys i2c_hid hid i2c_core [last unloaded: e1000e]
[ 5122.346591] CPU: 5 PID: 1517 Comm: sendRaw Not tainted 4.1.18-xenomai-3.0.3 #1
[ 5122.346604] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Q170M-D3H, BIOS F2 01/11/2016
[ 5122.346622] task: ffff88045885e960 ti: ffff880458a68000 task.ti: ffff880458a68000
[ 5122.346639] RIP: 0010:[<ffffffff812fe5c8>] [<ffffffff812fe5c8>] strncmp+0x8/0x50
[ 5122.346653] RSP: 0018:ffff880458a6bda0 EFLAGS: 00010202
[ 5122.346663] RAX: ffffc90001f02008 RBX: ffffffffa0493740 RCX: 0000000000000072
[ 5122.346676] RDX: 0000000000000010 RSI: 00007ffe9c534390 RDI: ffff88045cafb804
[ 5122.346688] RBP: ffff88045cafb800 R08: ffff880460397420 R09: 000000000000004e
[ 5122.346700] R10: 00000000000000dc R11: ffff880458a6bdc0 R12: 00007ffe9c534390
[ 5122.346713] R13: 00007ffe9c534390 R14: 0000000000008933 R15: ffffffff81b832c0
[ 5122.346725] FS: 00007fd66ac08740(0000) GS:ffff880460300000(0000) knlGS:0000000000000000
[ 5122.346739] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 5122.346750] CR2: 00007ffe9c534390 CR3: 000000045890c000 CR4: 00000000003406e0
[ 5122.346762] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5122.346775] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5122.346787] I-pipe domain Linux
[ 5122.346793] Stack:
[ 5122.346797] ffffffffa048c151 0000000000052f08 0000000000000000 00007ffe9c534390
[ 5122.346813] ffffffffa048c621 ffff8804599a8a00 0000000000008933 ffff88045885e960
[ 5122.346829] ffffffffa048f7be ffff8804599a8a00 0000000000000003 ffff88045885e960
[ 5122.346844] Call Trace:
[ 5122.346851] [<ffffffffa048c151>] ? __rtdev_get_by_name+0x31/0x60 [rtnet]
[ 5122.346864] [<ffffffffa048c621>] ? rtdev_get_by_name+0x51/0xd0 [rtnet]
[ 5122.346876] [<ffffffffa048f7be>] ? rt_socket_if_ioctl+0x2e/0x2f0 [rtnet]
[ 5122.346890] [<ffffffff8116505c>] ? rtdm_fd_ioctl+0xfc/0x220
[ 5122.346901] [<ffffffff81169d10>] ? CoBaLt_fcntl+0x20/0x20
[ 5122.346911] [<ffffffff81169d10>] ? CoBaLt_fcntl+0x20/0x20
[ 5122.346921] [<ffffffff81169d20>] ? CoBaLt_ioctl+0x10/0x20
[ 5122.346931] [<ffffffff81169d15>] ? CoBaLt_ioctl+0x5/0x20
[ 5122.346941] [<ffffffff8117932a>] ? ipipe_syscall_hook+0x25a/0x330

Looking into it, I think the problem is in using the ioctl command to
select the device. I have tried (and I can repeat it if necessary) using
both the POSIX layer and the Native (alchemy) layer.

Any idea?

Leopold

--
--
Linux User 152692  GPG: 05F4A7A949A2D9AA
Catalonia
-------------------------------------
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
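
A note on reading the oops values above. The following is only a sketch in plain user-space C, not RTnet or Xenomai code, and all helper names in it are made up for illustration. It decodes four numbers that appear in the logs: the page-fault error code after "Oops:", the faulting address in CR2, the CR4 value, and the ioctl request seen in R14. It assumes the standard x86-64 conventions: error-code bits 0/1/2 mean present/write/user, CR4 bits 20 and 21 are SMEP and SMAP, user addresses sit below 0x0000800000000000 with 4-level paging, and ioctl numbers use the generic nr/type/size/dir bit layout. With those assumptions, the logs would be consistent with kernel code reading a mapped user page directly on a CPU that has SMAP enabled, which fits the remark about missing safe userspace accesses. It does not prove that this is the cause, and whether the other box has SMAP would need to be checked there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits (the hex number printed after "Oops:"). */
#define PF_PROT  (1u << 0)  /* 1 = page was present (protection fault) */
#define PF_WRITE (1u << 1)  /* 1 = write access, 0 = read access */
#define PF_USER  (1u << 2)  /* 1 = CPU was in user mode when it faulted */

/* x86 CR4 feature bits. */
#define CR4_SMEP (UINT64_C(1) << 20)
#define CR4_SMAP (UINT64_C(1) << 21)

/* Lower canonical half of the address space: user space on x86-64
 * with 4-level paging (an assumption about this kernel's config). */
static bool is_user_address(uint64_t addr)
{
    return addr < UINT64_C(0x0000800000000000);
}

static bool smap_enabled(uint64_t cr4)
{
    return (cr4 & CR4_SMAP) != 0;
}

/* True when the fault is a kernel-mode read of a present user page. */
static bool kernel_read_of_present_user_page(unsigned err, uint64_t cr2)
{
    return (err & PF_PROT) && !(err & PF_WRITE) && !(err & PF_USER)
        && is_user_address(cr2);
}

/* Generic ioctl request layout: nr[0..7] type[8..15] size[16..29] dir[30..31]. */
static unsigned ioc_nr(uint32_t req)   { return req & 0xffu; }
static unsigned ioc_type(uint32_t req) { return (req >> 8) & 0xffu; }
static unsigned ioc_size(uint32_t req) { return (req >> 16) & 0x3fffu; }
static unsigned ioc_dir(uint32_t req)  { return (req >> 30) & 0x3u; }

/* Checks the decoders against the values copied from the logs above. */
static void check_values_from_oops(void)
{
    /* "Oops: 0001", CR2 from the sendRaw crash, CR4 from all three logs. */
    assert(kernel_read_of_present_user_page(0x0001u, UINT64_C(0x00007ffe9c534390)));
    assert(smap_enabled(UINT64_C(0x00000000003406e0)));

    /* R14 in the sendRaw crash: 0x8933, the value of SIOCGIFINDEX. */
    assert(ioc_type(0x8933u) == 0x89 && ioc_nr(0x8933u) == 0x33);

    /* R14 in the slaveinfo_rt crash: dir=1 (write), 24-byte argument, nr 0x21. */
    assert(ioc_dir(0x40180021u) == 1);
    assert(ioc_size(0x40180021u) == 24);
    assert(ioc_nr(0x40180021u) == 0x21);
}
```

Calling check_values_from_oops() from a small main() is enough to exercise it. In the strncmp crashes, RDI (ffff88045cafb804) is a kernel address and RSI (the same value as CR2) is a user address, which matches strncmp(rtdev->name, name, IFNAMSIZ) being handed the user's name pointer with RDX = 0x10.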