Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-09-04 Thread Shaun Crampton
On 03/09/2015 13:10, "Eric Dumazet" wrote: >On Thu, 2015-09-03 at 10:09 +, Shaun Crampton wrote: >> >... >> >> Is there anything I can do on a running system to help figure this >>out? >> >> Some sort of kernel equivalent to pmap to find out what module or >>device >> >> owns that chunk of

Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-09-03 Thread Eric Dumazet
On Thu, 2015-09-03 at 10:09 +, Shaun Crampton wrote: > >... > >> Is there anything I can do on a running system to help figure this out? > >> Some sort of kernel equivalent to pmap to find out what module or device > >> owns that chunk of memory? > > > >Hmm, perhaps /proc/kallsyms could point

Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-09-03 Thread Shaun Crampton
>... >> Is there anything I can do on a running system to help figure this out? >> Some sort of kernel equivalent to pmap to find out what module or device >> owns that chunk of memory? > >Hmm, perhaps /proc/kallsyms could point to something. 0xa0087d81 >and 0xa008772b could be
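
The /proc/kallsyms suggestion above amounts to finding the nearest symbol at or below the faulting address and reading off its module tag. A minimal Python sketch of that lookup follows. The sample table, the symbol names `example_rx` / `example_tx`, the module name `example_mod`, and the full 64-bit address are all hypothetical and not taken from the thread. On a real system the text would come from /proc/kallsyms, which typically shows zeroed addresses unless read as root.

```python
import bisect

def parse_kallsyms(text):
    """Parse kallsyms-style lines: '<hex addr> <type> <name> [module]'."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        addr = int(parts[0], 16)
        name = parts[2]
        # Built-in kernel symbols have no trailing [module] field.
        module = parts[3].strip("[]") if len(parts) > 3 else None
        symbols.append((addr, name, module))
    symbols.sort(key=lambda s: s[0])
    return symbols

def resolve(symbols, addr):
    """Return (name, offset, module) for the nearest symbol at or below addr."""
    addrs = [s[0] for s in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # address lies below every known symbol
    base, name, module = symbols[i]
    return name, addr - base, module

# Hypothetical sample data in kallsyms format (not from the thread).
SAMPLE = (
    "ffffffff81600000 T ip_rcv_finish\n"
    "ffffffffa0087000 t example_rx\t[example_mod]\n"
    "ffffffffa0087700 t example_tx\t[example_mod]\n"
)

if __name__ == "__main__":
    syms = parse_kallsyms(SAMPLE)
    # Assumed full-width form of an address like the ones quoted above.
    print(resolve(syms, 0xffffffffa008772b))
```

A nearest-symbol match is only a hint: an address in memory that no module exports symbols for still resolves to whichever symbol happens to sit below it, so a large offset is a sign the match is meaningless.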

Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-09-03 Thread Daniel Borkmann
On 09/03/2015 10:13 AM, Shaun Crampton wrote: ... Is there anything I can do on a running system to help figure this out? Some sort of kernel equivalent to pmap to find out what module or device owns that chunk of memory? Hmm, perhaps /proc/kallsyms could point to something. 0xa0087d81

Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-09-03 Thread Shaun Crampton
>Looking at this one, I am still puzzled where 0xa008772b and >0xa008772b come from ... some driver, bridge ...? Is there anything I can do on a running system to help figure this out? Some sort of kernel equivalent to pmap to find out what module or device owns that chunk of

Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-09-02 Thread Daniel Borkmann
On 09/02/2015 06:39 PM, Shaun Crampton wrote: Make sure you backported commit 10e2eb878f3ca07ac2f05fa5ca5e6c4c9174a27a ("udp: fix dst races with multicast early demux") I just tried the latest CoreOS alpha, which had that patch. Sadly, I saw just as many reboots. Here's a sample of the

Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-09-02 Thread Shaun Crampton
> Make sure you backported commit > 10e2eb878f3ca07ac2f05fa5ca5e6c4c9174a27a > ("udp: fix dst races with multicast early demux") I just tried the latest CoreOS alpha, which had that patch. Sadly, I saw just as many reboots. Here's a sample of the different types of Oopses I see (I've put the

Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-08-27 Thread Michael Marineau
On Thu, Aug 27, 2015 at 9:40 AM, David Miller wrote: > From: Michael Marineau > Date: Thu, 27 Aug 2015 09:16:06 -0700 > >> On Thu, Aug 27, 2015 at 6:00 AM, Eric Dumazet wrote: >>> Make sure you backported commit >>> 10e2eb878f3ca07ac2f05fa5ca5e6c4c9174a27a >>> ("udp: fix dst races with

Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-08-27 Thread David Miller
From: Michael Marineau Date: Thu, 27 Aug 2015 09:16:06 -0700 > On Thu, Aug 27, 2015 at 6:00 AM, Eric Dumazet wrote: >> Make sure you backported commit >> 10e2eb878f3ca07ac2f05fa5ca5e6c4c9174a27a >> ("udp: fix dst races with multicast early demux") > > Oh, interesting. Looks like that patch

Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-08-27 Thread Michael Marineau
On Thu, Aug 27, 2015 at 9:30 AM, Eric Dumazet wrote: > On Thu, 2015-08-27 at 09:16 -0700, Michael Marineau wrote: > >> >> Oh, interesting. Looks like that patch didn't get CC'd to stable >> though, is there a reason for that or just oversight? > > We never CC stable for networking patches. > >

Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-08-27 Thread Eric Dumazet
On Thu, 2015-08-27 at 09:16 -0700, Michael Marineau wrote: > > Oh, interesting. Looks like that patch didn't get CC'd to stable > though, is there a reason for that or just oversight? We never CC stable for networking patches. David Miller prefers to take care of this himself. ( this is in

Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-08-27 Thread Michael Marineau
On Thu, Aug 27, 2015 at 6:00 AM, Eric Dumazet wrote: > On Wed, 2015-08-26 at 13:54 -0700, Michael Marineau wrote: >> On Wed, Aug 26, 2015 at 4:49 AM, Chuck Ebbert wrote: >> > On Wed, 26 Aug 2015 08:46:59 + >> > Shaun Crampton wrote: >> > >> >> Testing our app at scale on Google's GCE,

Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-08-27 Thread Eric Dumazet
On Wed, 2015-08-26 at 13:54 -0700, Michael Marineau wrote: > On Wed, Aug 26, 2015 at 4:49 AM, Chuck Ebbert wrote: > > On Wed, 26 Aug 2015 08:46:59 + > > Shaun Crampton wrote: > > > >> Testing our app at scale on Google's GCE, running ~1000 CoreOS hosts: over > >> approximately 1 hour, I see

Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-08-26 Thread Michael Marineau
On Wed, Aug 26, 2015 at 4:49 AM, Chuck Ebbert wrote: > On Wed, 26 Aug 2015 08:46:59 + > Shaun Crampton wrote: > >> Testing our app at scale on Google's GCE, running ~1000 CoreOS hosts: over >> approximately 1 hour, I see about 1 in 50 hosts hit one of the Oopses >> below and then reboot (I'm

Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-08-26 Thread Shaun Crampton
>And the kernel thinks it's >outside of any normal text section, so it does not try to dump any >code from before the instruction pointer. > > 0: 48 8b 88 40 03 00 00 mov 0x340(%rax),%rcx > 7: e8 1d dd dd ff callq 0xff29 > c: 5d pop

Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-08-26 Thread Chuck Ebbert
On Wed, 26 Aug 2015 08:46:59 + Shaun Crampton wrote: > Testing our app at scale on Google's GCE, running ~1000 CoreOS hosts: over > approximately 1 hour, I see about 1 in 50 hosts hit one of the Oopses > below and then reboot (I'm not sure if the different oopses are related to > each

ip_rcv_finish() NULL pointer and possibly related Oopses

2015-08-26 Thread Shaun Crampton
Please CC me in any responses, thanks. Testing our app at scale on Google's GCE, running ~1000 CoreOS hosts: over approximately 1 hour, I see about 1 in 50 hosts hit one of the Oopses below and then reboot (I'm not sure if the different oopses are related to each other). The app is Project
