Re: Latest net-next from GIT panic

2017-09-21 Thread Eric Dumazet
On Thu, 2017-09-21 at 15:18 +0200, Paweł Staszewski wrote: > ok, after adding the patch everything has been working for about 1 hour of normal > traffic with all bgp sessions connected and about 600k prefixes in the kernel. Great, I am going to submit an official patch, uniting skb_dst_force() and skb_dst_f
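The unification Eric mentions concerns taking a reference on a dst only while it is still alive (the bisect log in this thread also names dst_hold_safe()). Below is a minimal userspace sketch of that "increment only if not zero" pattern, assuming C11 atomics. `struct obj`, `obj_hold_safe()`, `struct pkt` and `pkt_dst_force()` are hypothetical stand-ins, not the kernel's types or functions. The force step turns a borrowed pointer into a counted one and drops it if the entry is already dying, instead of resurrecting it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted entry; NOT the kernel's struct dst_entry. */
struct obj {
	atomic_int refcnt;
};

/*
 * Take a reference only if the count is still non-zero. Once it has
 * reached zero the object may already be on its way to being freed,
 * so it must never be brought back from 0 to 1.
 */
static bool obj_hold_safe(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value and we retry. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Hypothetical packet holding either a borrowed (noref) or a counted pointer. */
struct pkt {
	struct obj *dst;
	bool dst_noref; /* true: pointer borrowed without holding a reference */
};

/* Convert a borrowed pointer into a counted one, or drop it if the entry is dying. */
static void pkt_dst_force(struct pkt *p)
{
	if (p->dst && p->dst_noref) {
		if (!obj_hold_safe(p->dst))
			p->dst = NULL;
		p->dst_noref = false;
	}
}
```

The design point is that a failed hold is visible to the caller, which then treats the entry as gone rather than using memory that may be freed concurrently.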

Re: Latest net-next from GIT panic

2017-09-21 Thread Paweł Staszewski
On 2017-09-21 at 13:31, Paweł Staszewski wrote: On 2017-09-21 at 13:03, Eric Dumazet wrote: OK, we have two problems here: 1) We need to unify skb_dst_force() (for the net tree). 2) Vlan devices should try to correctly handle IFF_XMIT_DST_RELEASE from the lower device. This will considerably
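For point 2), as we understand it, the transmit path in net/core/dev.c either drops the skb's dst early when the device has IFF_XMIT_DST_RELEASE set in priv_flags, or calls skb_dst_force() to keep a reference. A vlan device without the flag therefore pays for a refcount operation on every packet. The sketch below shows one reading of "handle the flag from the lower device": the stacked device inherits the flag when its lower device has it. All names and the flag value are hypothetical, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit; the kernel's real IFF_XMIT_DST_RELEASE value is not reproduced here. */
#define SKETCH_XMIT_DST_RELEASE (1u << 0)

/* Hypothetical device: a physical NIC has lower == NULL, a vlan points at its NIC. */
struct sketch_dev {
	unsigned int priv_flags;
	const struct sketch_dev *lower;
};

/* Let a stacked (e.g. vlan) device release the dst at xmit only when its lower device does. */
static void sketch_inherit_dst_release(struct sketch_dev *upper)
{
	if (upper->lower && (upper->lower->priv_flags & SKETCH_XMIT_DST_RELEASE))
		upper->priv_flags |= SKETCH_XMIT_DST_RELEASE;
	else
		upper->priv_flags &= ~SKETCH_XMIT_DST_RELEASE;
}

enum sketch_dst_action { SKETCH_DST_DROP, SKETCH_DST_FORCE };

/* Models the per-packet choice at transmit time: cheap drop, or take a real reference. */
static enum sketch_dst_action sketch_xmit_dst_action(const struct sketch_dev *dev)
{
	return (dev->priv_flags & SKETCH_XMIT_DST_RELEASE) ? SKETCH_DST_DROP : SKETCH_DST_FORCE;
}
```

Clearing the flag when the lower device lacks it keeps the conservative behavior: the dst is only released early if every layer below agrees it is not needed.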

Re: Latest net-next from GIT panic

2017-09-21 Thread Paweł Staszewski
On 2017-09-21 at 13:03, Eric Dumazet wrote: OK, we have two problems here: 1) We need to unify skb_dst_force() (for the net tree). 2) Vlan devices should try to correctly handle IFF_XMIT_DST_RELEASE from the lower device. This will considerably help your performance. For 1), this is what I had i

Re: Latest net-next from GIT panic

2017-09-21 Thread Paweł Staszewski
On 2017-09-21 at 13:12, Paweł Staszewski wrote: On 2017-09-21 at 13:03, Eric Dumazet wrote: On Thu, 2017-09-21 at 11:06 +0200, Paweł Staszewski wrote: On 2017-09-21 at 03:17, Eric Dumazet wrote: On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote: Thanks very much Pawel for the fee

Re: Latest net-next from GIT panic

2017-09-21 Thread Paweł Staszewski
On 2017-09-21 at 13:03, Eric Dumazet wrote: On Thu, 2017-09-21 at 11:06 +0200, Paweł Staszewski wrote: On 2017-09-21 at 03:17, Eric Dumazet wrote: On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote: Thanks very much Pawel for the feedback. I was looking into the code (specifically IPv4

Re: Latest net-next from GIT panic

2017-09-21 Thread Eric Dumazet
On Thu, 2017-09-21 at 11:06 +0200, Paweł Staszewski wrote: > > On 2017-09-21 at 03:17, Eric Dumazet wrote: > > On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote: > >>> Thanks very much Pawel for the feedback. > >>> > >>> I was looking into the code (specifically IPv4 part) and found that in > >

Re: Latest net-next from GIT panic

2017-09-21 Thread Paweł Staszewski
On 2017-09-21 at 03:17, Eric Dumazet wrote: On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote: Thanks very much Pawel for the feedback. I was looking into the code (specifically IPv4 part) and found that in free_fib_info_rcu(), we call free_nh_exceptions() without holding the fnhe_lock. I

Re: Latest net-next from GIT panic

2017-09-20 Thread Eric Dumazet
On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote: > > Thanks very much Pawel for the feedback. > > > > I was looking into the code (specifically IPv4 part) and found that in > > free_fib_info_rcu(), we call free_nh_exceptions() without holding the > > fnhe_lock. I am wondering if that could cause

Re: Latest net-next from GIT panic

2017-09-20 Thread Wei Wang
> Thanks very much Pawel for the feedback. > > I was looking into the code (specifically IPv4 part) and found that in > free_fib_info_rcu(), we call free_nh_exceptions() without holding the > fnhe_lock. I am wondering if that could cause some race condition on > fnhe->fnhe_rth_input/output so a dou
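Wei's hypothesis is a double free when two paths release the same cached route pointer (fnhe_rth_input/output) without serializing on fnhe_lock. The sketch below deliberately swaps in a different technique from the lock, an atomic exchange, to show how a cached pointer can be claimed and released exactly once even when two releasers race. `struct cache_slot`, `slot_release()` and `release_count` are hypothetical; this is not code from the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical cache slot, loosely modeled on a per-exception cached route pointer. */
struct cache_slot {
	_Atomic(int *) cached;
};

/* Counts real frees; for demonstration only. */
static atomic_int release_count;

/*
 * Whoever swaps the non-NULL pointer out of the slot owns it and is the
 * only caller that frees it; a racing caller gets NULL and does nothing.
 */
static void slot_release(struct cache_slot *s)
{
	int *old = atomic_exchange(&s->cached, NULL);

	if (old) {
		free(old);
		atomic_fetch_add(&release_count, 1);
	}
}
```

With a plain load followed by a store of NULL, both racers could read the same non-NULL pointer and both free it; the single atomic exchange closes that window.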

Re: Latest net-next from GIT panic

2017-09-20 Thread Wei Wang
>>> bisected again and same result: >>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit >>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911 >>> Author: Wei Wang >>> Date: Sat Jun 17 10:42:32 2017 -0700 >>> >>> ipv4: mark DST_NOGC and remove the operation of dst_free() >>> >>
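As we read the title of the bisected commit ("ipv4: mark DST_NOGC and remove the operation of dst_free()"), IPv4 route entries are released through their reference count rather than through dst_free() and a deferred collection pass. A generic sketch of release-on-last-put follows; `struct rc_obj`, `rc_put()` and `rc_free()` are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object with its own destructor. */
struct rc_obj {
	atomic_int refcnt;
	void (*destroy)(struct rc_obj *);
};

static void rc_free(struct rc_obj *o)
{
	free(o);
}

/* Drop one reference; the caller that takes the count to zero destroys the object at once. */
static bool rc_put(struct rc_obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0); /* a put without a matching hold is a bug */
	if (old == 1) {
		o->destroy(o);
		return true;
	}
	return false;
}
```

In this model nothing delays the free, so an unbalanced put or a hold taken after the count reached zero turns directly into a use-after-free.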

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
On 2017-09-20 at 23:25, Paweł Staszewski wrote: On 2017-09-20 at 23:24, Paweł Staszewski wrote: On 2017-09-20 at 23:10, Paweł Staszewski wrote: On 2017-09-20 at 21:23, Paweł Staszewski wrote: On 2017-09-20 at 21:13, Paweł Staszewski wrote: On 2017-09-20 at 20:36,

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
On 2017-09-20 at 23:24, Paweł Staszewski wrote: On 2017-09-20 at 23:10, Paweł Staszewski wrote: On 2017-09-20 at 21:23, Paweł Staszewski wrote: On 2017-09-20 at 21:13, Paweł Staszewski wrote: On 2017-09-20 at 20:36, Cong Wang wrote: On Wed, Sep 20, 2017 at 11:30 AM, Er

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
On 2017-09-20 at 23:10, Paweł Staszewski wrote: On 2017-09-20 at 21:23, Paweł Staszewski wrote: On 2017-09-20 at 21:13, Paweł Staszewski wrote: On 2017-09-20 at 20:36, Cong Wang wrote: On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet wrote: On Wed, 2017-09-20 at 11:22 -0700

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
On 2017-09-20 at 21:23, Paweł Staszewski wrote: On 2017-09-20 at 21:13, Paweł Staszewski wrote: On 2017-09-20 at 20:36, Cong Wang wrote: On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet wrote: On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote: but dmesg at this time shows nothi

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
On 2017-09-20 at 21:13, Paweł Staszewski wrote: On 2017-09-20 at 20:36, Cong Wang wrote: On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet wrote: On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote: but dmesg at this time shows nothing about interfaces or flaps. This is very odd. We o

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
On 2017-09-20 at 20:36, Cong Wang wrote: On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet wrote: On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote: but dmesg at this time shows nothing about interfaces or flaps. This is very odd. We only free netdevice in free_netdev() and it is only cal

Re: Latest net-next from GIT panic

2017-09-20 Thread Cong Wang
On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet wrote: > On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote: >> but dmesg at this time shows nothing about interfaces or flaps. >> >> This is very odd. >> >> We only free netdevice in free_netdev() and it is only called when >> we unregister a netdevi

Re: Latest net-next from GIT panic

2017-09-20 Thread Eric Dumazet
On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote: > but dmesg at this time shows nothing about interfaces or flaps. > > This is very odd. > > We only free netdevice in free_netdev() and it is only called when > we unregister a netdevice. Otherwise pcpu_refcnt is impossible > to be NULL. If the

Re: Latest net-next from GIT panic

2017-09-20 Thread Cong Wang
On Wed, Sep 20, 2017 at 10:55 AM, Paweł Staszewski wrote: > > > On 2017-09-20 at 19:50, Cong Wang wrote: > > On Wed, Sep 20, 2017 at 6:11 AM, Eric Dumazet > wrote: > > Sorry for top-posting, but this is to give context to Wei, since Pawel > used a top posting way to report his bisection. > > W

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
On 2017-09-20 at 19:46, Wei Wang wrote: This is why I suggested to replace the BUG() in another mail So : diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..220cd12456754876edf2d3ef13195e82d70d5c74 100644 --- a/include/linux/n

Re: Latest net-next from GIT panic

2017-09-20 Thread Eric Dumazet
On Wed, 2017-09-20 at 10:50 -0700, Cong Wang wrote: > On Wed, Sep 20, 2017 at 6:11 AM, Eric Dumazet wrote: > > Sorry for top-posting, but this is to give context to Wei, since Pawel > > used a top posting way to report his bisection. > > > > Wei, can you take a look at Pawel report ? > > > > Crash

Re: Latest net-next from GIT panic

2017-09-20 Thread Cong Wang
On Wed, Sep 20, 2017 at 6:11 AM, Eric Dumazet wrote: > Sorry for top-posting, but this is to give context to Wei, since Pawel > used a top posting way to report his bisection. > > Wei, can you take a look at Pawel report ? > > Crash happens in dst_destroy() at following : > > if (dst->dev) >

Re: Latest net-next from GIT panic

2017-09-20 Thread Wei Wang
>> This is why I suggested to replace the BUG() in another mail >> >> So : >> >> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >> index >> f535779d9dc1dfe36934c2abba4e43d053ac5d6f..220cd12456754876edf2d3ef13195e82d70d5c74 >> 100644 >> --- a/include/linux/netdevice.h >> +++ b/in

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
On 2017-09-20 at 16:40, Eric Dumazet wrote: On Wed, 2017-09-20 at 16:03 +0200, Paweł Staszewski wrote: Not much more after adding this patch https://bugzilla.kernel.org/attachment.cgi?id=258529 This is why I suggested to replace the BUG() in another mail So : diff --git a/include/linux/n

Re: Latest net-next from GIT panic

2017-09-20 Thread Eric Dumazet
On Wed, 2017-09-20 at 16:03 +0200, Paweł Staszewski wrote: > Not much more after adding this patch > > https://bugzilla.kernel.org/attachment.cgi?id=258529 > This is why I suggested to replace the BUG() in another mail So : diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h ind

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
Not much more after adding this patch https://bugzilla.kernel.org/attachment.cgi?id=258529 On 2017-09-20 at 15:44, Eric Dumazet wrote: On Wed, 2017-09-20 at 15:39 +0200, Paweł Staszewski wrote: On 2017-09-20 at 15:34, Eric Dumazet wrote: Could you try this debug patch ? diff --git a/

Re: Latest net-next from GIT panic

2017-09-20 Thread Eric Dumazet
On Wed, 2017-09-20 at 15:39 +0200, Paweł Staszewski wrote: > > On 2017-09-20 at 15:34, Eric Dumazet wrote: > > Could you try this debug patch ? > > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > > index > > f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
On 2017-09-20 at 15:34, Eric Dumazet wrote: Could you try this debug patch ? diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82 100644 --- a/include/linux/netdevice.h +++ b/include/lin

Re: Latest net-next from GIT panic

2017-09-20 Thread Eric Dumazet
On Wed, 2017-09-20 at 06:34 -0700, Eric Dumazet wrote: > Could you try this debug patch ? > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index > f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82 > 100644 > --- a/include/linux/netdevice.h

Re: Latest net-next from GIT panic

2017-09-20 Thread Eric Dumazet
Could you try this debug patch ? diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3331,7 +3331,14 @@ void netdev_r

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
Yes, sorry for top-posting too. Configuration: Ethernet devices: lspci | grep Etherne 02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01) 02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01) 04:00.0 Ethernet controller: In

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
So far, the bisect path was: git bisect start # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag 'pinctrl-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075 # good: [e24dd9ee5399747b71c1d982a484fc760179

Re: Latest net-next from GIT panic

2017-09-20 Thread Eric Dumazet
Sorry for top-posting, but this is to give context to Wei, since Pawel used a top posting way to report his bisection. Wei, can you take a look at Pawel report ? Crash happens in dst_destroy() at following : if (dst->dev) dev_put(dst->dev); <> dst->dev is not NULL, but netdev->pcpu_refcnt
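Eric's analysis points at dev_put() in dst_destroy() running on a net_device whose pcpu_refcnt is already NULL, which per Cong Wang only happens after free_netdev(). Below is a userspace model of a per-CPU device refcount with a defensive put that reports the condition instead of dereferencing NULL. `struct sketch_netdev`, `sketch_dev_put()`, `sketch_dev_refcnt()` and `SKETCH_NR_CPUS` are hypothetical. In the kernel the device memory itself would also be freed at that point, so this only illustrates the check, not a safe fix.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_NR_CPUS 4 /* hypothetical CPU count */

/* Hypothetical device with a per-CPU reference counter. */
struct sketch_netdev {
	const char *name;
	int *pcpu_refcnt; /* NULL once the device has been freed */
};

/* Release one reference on the given CPU; returns -1 instead of crashing if the counter is gone. */
static int sketch_dev_put(struct sketch_netdev *dev, int cpu)
{
	if (dev->pcpu_refcnt == NULL) {
		fprintf(stderr, "dev_put on freed device %s\n", dev->name);
		return -1;
	}
	dev->pcpu_refcnt[cpu]--;
	return 0;
}

/* Total references = sum over CPUs; a single CPU's slot may go negative. */
static long sketch_dev_refcnt(const struct sketch_netdev *dev)
{
	long sum = 0;

	for (int i = 0; i < SKETCH_NR_CPUS; i++)
		sum += dev->pcpu_refcnt[i];
	return sum;
}
```

A hold taken on one CPU and released on another leaves one slot positive and another negative; only the sum is meaningful, which is why a per-CPU counter is cheap to update but costly to read.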

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
hmm But after b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit commit b838d5e1c5b6e57b10ec8af2268824041e3ea911 Author: Wei Wang Date:   Sat Jun 17 10:42:32 2017 -0700     ipv4: mark DST_NOGC and remove the operation of dst_free()     With the previous preparation patches, we a

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
And the last one: git bisect good Bisecting: 1 revision left to test after this (roughly 1 step) [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for insertion into fib6 tree With this one I have the same kernel panic as always: git bisect bad Bisecting: 0 revisions left to test after

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
Almost there. Bisecting: 6 revisions left to test after this (roughly 3 steps) [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe() properly On 2017-09-20 at 13:02, Paweł Staszewski wrote: OK, resumed, and so far: Panic: # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f]

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
OK, resumed, and so far: Panic: # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using stack larger than 1024. git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f No panic: # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch 'udp-reduce-cache-pressure' git b

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
So far bisected and marked: git bisect start # bad: [07dd6cc1fff160143e82cf5df78c1db0b6e03355] Linux 4.13.2 git bisect bad 07dd6cc1fff160143e82cf5df78c1db0b6e03355 # good: [5d7d2e03e0f01a992e3521b180c3d3e67905f269] Linux 4.12.13 git bisect good 5d7d2e03e0f01a992e3521b180c3d3e67905f269 # good: [6
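git bisect is essentially a binary search between a known-good and a known-bad commit, which is why 1787 remaining revisions are estimated at roughly 11 steps (log2(1787) is about 10.8). The following is a minimal sketch over a linear history, assuming every commit before the culprit is good and every commit from it onward is bad. Real git bisect works on the commit graph and handles merges, which this does not. `first_bad()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * bad[i] is the outcome of testing commit i (oldest first).
 * Preconditions: n >= 1 and bad[n - 1] is true (the commit marked with
 * `git bisect bad`); the commit marked good lies just before index 0.
 * Returns the index of the first bad commit; *tests counts build/boot cycles.
 */
static size_t first_bad(const bool *bad, size_t n, int *tests)
{
	size_t lo = 0, hi = n - 1; /* the culprit is somewhere in [lo, hi] */

	assert(n >= 1 && bad[n - 1]);
	*tests = 0;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		(*tests)++;
		if (bad[mid])
			hi = mid;     /* like `git bisect bad`: culprit is mid or older */
		else
			lo = mid + 1; /* like `git bisect good`: culprit is newer than mid */
	}
	return lo;
}
```

Each test halves the candidate range, so a range of 1000 commits needs at most 10 build/boot cycles. A single mis-marked step (for example, marking a commit good after missing a different panic) sends the search into the wrong half.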

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
OK, the kernel crashed with a different panic that I didn't catch when I was doing the bisect, and now my bisection is broken :) git bisect good Bisecting: 1787 revisions left to test after this (roughly 11 steps) error: Your local changes to the following files would be overwritten by checkout:     Do

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
OK, looks like the bisection is ending. The latest bisected kernel with no kernel panic is 4.12.0+ (from next), but with this warning: [  309.030019] NETDEV WATCHDOG: enp4s0f0 (ixgbe): transmit queue 0 timed out [  309.030034] [ cut here ] [  309.030040] WARNING: CPU: 35 PI

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
Tried to make a video from IPMI :) with these results: https://bugzilla.kernel.org/attachment.cgi?id=258521 I caught two more lines where it starts, panic from 4.13.2. Now I will try to do some bisection. On 2017-09-20 at 09:58, Paweł Staszewski wrote: Hi, I will try bisecting tonight. On

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski
Hi, I will try bisecting tonight. On 2017-09-20 at 05:24, Eric Dumazet wrote: On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote: Just checked kernel 4.13.2 and it has the same problem. Just after all 6 bgp sessions start and the kernel starts to learn routes, it panics. https://bugzilla.kernel.org

Re: Latest net-next from GIT panic

2017-09-19 Thread Eric Dumazet
On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote: > Just checked kernel 4.13.2 and it has the same problem > > Just after all 6 bgp sessions start and the kernel starts to learn routes, > it panics. > > https://bugzilla.kernel.org/attachment.cgi?id=258509 > Unfortunately we do not have enough informat

Re: Latest net-next from GIT panic

2017-09-19 Thread Paweł Staszewski
The latest working kernel with the same configuration and kernel config is 4.12.13. There is no panic after routes from all 6x bgp sessions are learned: ip r | wc -l 653112 On 2017-09-20 at 02:06, Paweł Staszewski wrote: Just checked kernel 4.13.2 and it has the same problem. Just after starting all 6 bgp session

Re: Latest net-next from GIT panic

2017-09-19 Thread Paweł Staszewski
Just checked kernel 4.13.2 and it has the same problem. Just after all 6 bgp sessions start and the kernel starts to learn routes, it panics. https://bugzilla.kernel.org/attachment.cgi?id=258509 On 2017-09-20 at 02:01, Paweł Staszewski wrote: Some information about the environment: Server is acting as an ip

Re: Latest net-next from GIT panic

2017-09-19 Thread Paweł Staszewski
Some information about the environment: The server is acting as an ip router with bgp. There are 6x bgp sessions, each with a full bgp table of ~600k prefixes. It looks like the panic appears after the bgp sessions are connected, not because of traffic, since at the time the panic occurred there was almost no traffic.

Re: Latest net-next from GIT panic

2017-09-19 Thread Paweł Staszewski
Added a few more screenshots from kernels 4.14-rc1 (net-next) and 4.14-rc1 (linux-next): https://bugzilla.kernel.org/show_bug.cgi?id=197005 On 2017-09-20 at 00:35, Paweł Staszewski wrote: Just tried the latest net-next git and found a kernel panic. Link to bugzilla below. https://bugzilla.kernel.o

Latest net-next from GIT panic

2017-09-19 Thread Paweł Staszewski
Just tried the latest net-next git and found a kernel panic. Link to bugzilla below. https://bugzilla.kernel.org/attachment.cgi?id=258499