For the "Source file is more recent than executable" message, could this simply be because I copied the kernel source to the lab and then ran the gdb commands as shown? If so, the newly copied files would have a newer timestamp than the kernel/tipc.ko files. (The kernel is actually built on a separate build machine, not on the test lab machine.)
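As far as I know, GDB prints this warning purely from a timestamp comparison: the source file's modification time is later than that of the file the symbols were read from. A plain copy stamps the copied sources with the time of the copy, so freshly copied sources would be enough to trigger it unless timestamps were preserved (for example with `cp -p`). A minimal Python sketch of that check follows; the function name is made up for illustration, and the temporary files only stand in for net/tipc/msg.h and tipc.ko.

```python
import os
import tempfile


def source_newer_than_binary(source_path, binary_path):
    """Return True if the source file was modified after the binary."""
    return os.stat(source_path).st_mtime > os.stat(binary_path).st_mtime


if __name__ == "__main__":
    # Stand-ins for the real files: a "module" built earlier and a
    # "source" file that was copied to the lab later.
    with tempfile.TemporaryDirectory() as tmp:
        binary = os.path.join(tmp, "tipc.ko")
        source = os.path.join(tmp, "msg.h")
        for path in (binary, source):
            with open(path, "w") as handle:
                handle.write("placeholder\n")

        # Source copied after the build: this is the case that warns.
        os.utime(binary, (1_000_000, 1_000_000))
        os.utime(source, (2_000_000, 2_000_000))
        print(source_newer_than_binary(source, binary))  # True

        # Timestamps preserved on copy: no warning expected.
        os.utime(source, (1_000_000, 1_000_000))
        print(source_newer_than_binary(source, binary))  # False
```

If the check returns True for the real msg.h and tipc.ko even though the source content is identical to what was built, the warning is harmless; it only matters when the source text actually differs from what was compiled.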
Or could I get that message for another reason?

-----Original Message-----
From: Jon Maloy [mailto:jon.ma...@ericsson.com]
Sent: February-22-17 2:11 PM
To: Butler, Peter <pbut...@sonusnet.com>; tipc-discussion@lists.sourceforge.net
Subject: RE: TIPC Oops in tipc_sk_recv

> -----Original Message-----
> From: Butler, Peter [mailto:pbut...@sonusnet.com]
> Sent: Wednesday, February 22, 2017 01:04 PM
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discuss...@lists.sourceforge.net
> Cc: Butler, Peter <pbut...@sonusnet.com>
> Subject: RE: TIPC Oops in tipc_sk_recv
>
> I took a stab at it this way - not sure if I am doing this correctly or not.
>
> [root@myVMslot12 ~]# gdb /boot/vmlinuz-4.4.0 /proc/kcore
> GNU gdb (GDB) Fedora (7.3.50.20110722-13.fc16)
> Copyright (C) 2011 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> BFD: /boot/vmlinuz-4.4.0: Warning: Ignoring section flag IMAGE_SCN_MEM_NOT_PAGED in section .bss
> BFD: /boot/vmlinuz-4.4.0: Warning: Ignoring section flag IMAGE_SCN_MEM_NOT_PAGED in section .bss
> Reading symbols from /boot/vmlinuz-4.4.0...(no debugging symbols found)...done.
>
> warning: core file may not match specified executable file.
> [New process 1]
> Core was generated by `BOOT_IMAGE=/vmlinuz-4.4.0 root=UUID=b419f9ff-80ce-459e-855c-614d86a48105 ro rd.'.
> #0 0x0000000000000000 in ?? ()
> (gdb) file /lib/modules/4.4.0/kernel/net/tipc/tipc.ko
> warning: core file may not match specified executable file.
> Reading symbols from /lib/modules/4.4.0/kernel/net/tipc/tipc.ko...done.
> (gdb) list *(tipc_sk_rcv+0x238)
> 0x14898 is in tipc_sk_rcv (net/tipc/msg.h:131).
> warning: Source file is more recent than executable.

Seems like you didn't rebuild after you updated the source file? Try again just to make sure.

> 126     return (struct tipc_msg *)skb->data;
> 127 }
> 128
> 129 static inline u32 msg_word(struct tipc_msg *m, u32 pos)
> 130 {
> 131     return ntohl(m->hdr[pos]);

If this is correct, you are receiving a corrupt buffer where the data pointer is invalid. This is typical if the buffer already has been released.

///jon

> 132 }
> 133
> 134 static inline void msg_set_word(struct tipc_msg *m, u32 w, u32 val)
> 135 {
>
> -----Original Message-----
> From: Butler, Peter
> Sent: February-22-17 12:45 PM
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discuss...@lists.sourceforge.net
> Cc: Butler, Peter <pbut...@sonusnet.com>
> Subject: RE: TIPC Oops in tipc_sk_recv
>
> Hi Jon
>
> Thanks for the info.
>
> One thing I should clarify. Although we are running the 4.4.0 kernel, we had backported a number of post-4.4.0 TIPC patches into our 4.4.0 kernel. As such, the offset in question (tipc_sk_rcv+0x238) will not match that in the vanilla 4.4.0 source.
>
> Should I post the entire socket.c file to this list for your review? Or is there an easy way for me to do a similar listing using our actual tipc.ko file here in the lab?
>
> Peter
>
> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: February-22-17 12:29 PM
> To: Butler, Peter <pbut...@sonusnet.com>; tipc-discuss...@lists.sourceforge.net
> Subject: RE: TIPC Oops in tipc_sk_recv
>
> Hi Peter,
> Very hard to make any suggestions on how to reproduce this. What I can see is that it is a STREAM message being sent from a node local socket, i.e., it doesn't go via any interface.
> The crash seems to happen when the receiving socket is owned by the user, and while we are instead adding the message to the backlog queue:
>
> Reading symbols from net/tipc/tipc.ko...done.
> (gdb) list *(tipc_sk_rcv+0x238)
> 0x13d78 is in tipc_sk_rcv (./arch/x86/include/asm/atomic.h:214).
> 209 static __always_inline int __atomic_add_unless(atomic_t *v, int a, int u)
> 210 {
> 211     int c, old;
> 212     c = atomic_read(v);
> 213     for (;;) {
> 214         if (unlikely(c == (u)))
> 215             break;
> 216         old = atomic_cmpxchg((v), c, c + (a));
> 217         if (likely(old == c))
> 218             break;
>
> This is about what I can get out of it at the moment. Maybe you should try a high-load test between two local sockets (try the benchmark demo from tipcutils) and see what you can achieve.
>
> BR
> ///jon
>
> > -----Original Message-----
> > From: Butler, Peter [mailto:pbut...@sonusnet.com]
> > Sent: Wednesday, February 22, 2017 10:40 AM
> > To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discuss...@lists.sourceforge.net
> > Cc: Butler, Peter <pbut...@sonusnet.com>
> > Subject: RE: TIPC Oops in tipc_sk_recv
> >
> > If you have any suggestions as to procedures/tricks you think might trigger this bug I can certainly attempt to do so in the lab. Obviously we can't attempt to reproduce it on the customer's (live) system.
> >
> > -----Original Message-----
> > From: Butler, Peter
> > Sent: February-21-17 3:39 PM
> > To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discuss...@lists.sourceforge.net
> > Cc: Butler, Peter <pbut...@sonusnet.com>
> > Subject: RE: TIPC Oops in tipc_sk_recv
> >
> > Unfortunately this occurred on a customer system so it is not readily reproducible. We have not seen this occur in our lab.
> >
> > For what it's worth, it occurred while the process was in TASK_UNINTERRUPTIBLE.
> > As such, the kernel could not actually kill off the associated process despite the Oops, and the process remained forever frozen in the 'D' state and the card had to be rebooted.
> >
> > -----Original Message-----
> > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > Sent: February-21-17 3:36 PM
> > To: Butler, Peter <pbut...@sonusnet.com>; tipc-discuss...@lists.sourceforge.net
> > Subject: RE: TIPC Oops in tipc_sk_recv
> >
> > Hi Peter,
> > I don't think this is any known bug. Is it repeatable?
> >
> > ///jon
> >
> > > -----Original Message-----
> > > From: Butler, Peter [mailto:pbut...@sonusnet.com]
> > > Sent: Tuesday, February 21, 2017 12:14 PM
> > > To: tipc-discussion@lists.sourceforge.net
> > > Cc: Butler, Peter <pbut...@sonusnet.com>
> > > Subject: [tipc-discussion] TIPC Oops in tipc_sk_recv
> > >
> > > This was with kernel 4.4.0, however I don't see any fix specifically related to this in any subsequent 4.4.x kernel...
> > >
> > > BUG: unable to handle kernel NULL pointer dereference at 00000000000000d8
> > > IP: [<ffffffffa0148868>] tipc_sk_rcv+0x238/0x4d0 [tipc]
> > > PGD 34f4c0067 PUD 34ed95067 PMD 0
> > > Oops: 0000 [#1] SMP
> > > Modules linked in: nf_log_ipv4 nf_log_common xt_LOG sctp libcrc32c e1000e tipc udp_tunnel ip6_udp_tunnel iTCO_wdt 8021q garp xt_physdev br_netfilter bridge stp llc nf_conntrack_ipv4 ipmiq_drv(O) nf_defrag_ipv4 sio_mmc(O) ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack event_drv(O) ip6table_filter lockd ip6_tables pt_timer_info(O) ddi(O) grace usb_storage ixgbe igb iTCO_vendor_support i2c_algo_bit ptp i2c_i801 pps_core lpc_ich i2c_core intel_ips mfd_core pcspkr ioatdma sunrpc dca tpm_tis mdio tpm [last unloaded: iTCO_wdt]
> > > CPU: 2 PID: 12144 Comm: dinamo Tainted: G O 4.4.0 #23
> > > Hardware name: PT AMC124/Base Board Product Name, BIOS LGNAJFIP.PTI.0012.P15 01/15/2014
> > > task: ffff880036ad8000 ti: ffff880036900000 task.ti: ffff880036900000
> > > RIP: 0010:[<ffffffffa0148868>] [<ffffffffa0148868>] tipc_sk_rcv+0x238/0x4d0 [tipc]
> > > RSP: 0018:ffff880036903bb8 EFLAGS: 00010292
> > > RAX: 0000000000000000 RBX: ffff88034def3970 RCX: 0000000000000001
> > > RDX: 0000000000000101 RSI: 0000000000000292 RDI: ffff88034def3984
> > > RBP: ffff880036903c28 R08: 0000000000000101 R09: 0000000000000004
> > > R10: 0000000000000001 R11: 0000000000000000 R12: ffff880036903d28
> > > R13: 00000000bd1fd8b2 R14: ffff88034def3840 R15: ffff880036903d3c
> > > FS: 00007f1e86299740(0000) GS:ffff88035fc40000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00000000000000d8 CR3: 0000000036835000 CR4: 00000000000006e0
> > > Stack:
> > > 000000000000009b ffff880036903d28 0000000000000018 ffff88034def38c8
> > > ffffffff81ce6240 ffff8802b9bdba00 ffff880036903ca8 ffffffffa013bd7e
> > > ffff8802b99d5ee8 ffff880036903c60 0000000000000000 ffff88003693cb00
> > > Call Trace:
> > > [<ffffffffa013bd7e>] ? tipc_msg_build+0xde/0x4f0 [tipc]
> > > [<ffffffffa014358f>] tipc_node_xmit+0x11f/0x150 [tipc]
> > > [<ffffffffa01470ba>] __tipc_send_stream+0x16a/0x300 [tipc]
> > > [<ffffffff81625eb5>] ? tcp_sendmsg+0x4d5/0xb00
> > > [<ffffffffa0147292>] tipc_send_stream+0x42/0x70 [tipc]
> > > [<ffffffff815bcf77>] sock_sendmsg+0x47/0x50
> > > [<ffffffff815bd03f>] sock_write_iter+0x7f/0xd0
> > > [<ffffffff811d799a>] __vfs_write+0xaa/0xe0
> > > [<ffffffff811d8b16>] vfs_write+0xb6/0x1a0
> > > [<ffffffff811d8e3f>] SyS_write+0x4f/0xb0
> > > [<ffffffff816de6d7>] entry_SYSCALL_64_fastpath+0x12/0x6a
> > > Code: 89 de 4c 89 f7 e8 29 d3 ff ff 48 8b 7d a8 e8 60 59 59 e1 49 8d 9e 30 01 00 00 49 3b 9e 30 01 00 00 74 30 48 89 df e8 b8 b6 47 e1 <48> 8b 90 d8 00 00 00 48 8b 7d b0 44 89 e9 48 89 c6 48 89 45 c0
> > > RIP [<ffffffffa0148868>] tipc_sk_rcv+0x238/0x4d0 [tipc]
> > > RSP <ffff880036903bb8>
> > > CR2: 00000000000000d8
> > > ---[ end trace 1c2d69738941d565 ]---
> > >
> > > ------------------------------------------------------------------------------
> > > Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> > > _______________________________________________
> > > tipc-discussion mailing list
> > > tipc-discussion@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/tipc-discussion
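
The numbers in the Oops can be cross-checked with a little arithmetic. The Python sketch below uses only values copied from the Oops and from the gdb output in this thread. The helper names are made up for illustration, and the decoder handles only the single instruction form seen here (a REX.W MOV load with a 32-bit displacement), so treat it as a worked example and not a general disassembler. It decodes the bytes starting at the <48> marker in the "Code:" line and checks that the faulting address in CR2 equals RAX plus the displacement.

```python
# Values copied from the Oops report in this thread.
FAULT_IP = 0xFFFFFFFFA0148868   # RIP, reported as tipc_sk_rcv+0x238
FUNC_OFFSET = 0x238
RAX = 0x0
CR2 = 0xD8
# Faulting instruction bytes, starting at the <48> marker in "Code:".
FAULT_INSN = bytes.fromhex("488b90d8000000")
# Address gdb printed for tipc_sk_rcv+0x238 in the lab tipc.ko.
KO_ADDR = 0x14898

# Register numbering used by x86-64 ModRM fields when no REX extension bit is set.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_disp32(insn):
    """Decode 'REX.W MOV r64, [r64 + disp32]' into (dest, base, disp)."""
    if len(insn) != 7 or insn[0] != 0x48 or insn[1] != 0x8B:
        raise ValueError("not a 7-byte REX.W 8B /r instruction")
    modrm = insn[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only the plain base + disp32 form is handled")
    disp = int.from_bytes(insn[3:7], "little", signed=True)
    return REGS[reg], REGS[rm], disp


def function_start(address, offset):
    """Address of the function entry given an address inside it."""
    return address - offset


if __name__ == "__main__":
    dest, base, disp = decode_mov_disp32(FAULT_INSN)
    print(f"mov {disp:#x}(%{base}),%{dest}")        # mov 0xd8(%rax),%rdx
    print(hex(RAX + disp), RAX + disp == CR2)       # 0xd8 True
    print(hex(function_start(FAULT_IP, FUNC_OFFSET)))  # 0xffffffffa0148630
    print(hex(function_start(KO_ADDR, FUNC_OFFSET)))   # 0x14660
```

In other words, the faulting instruction is a load at offset 0xd8 from a register that held NULL, which matches the "NULL pointer dereference at 00000000000000d8" line. Which structure that pointer belongs to is not something this arithmetic can tell you; that still has to come from the gdb listing of the matching tipc.ko.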