On Thu, 2017-09-21 at 15:18 +0200, Paweł Staszewski wrote:
> OK, after adding the patch everything has been working for about 1 hour of normal
> traffic with all BGP sessions connected and about 600k prefixes in the kernel.
Great, I am going to submit an official patch, uniting skb_dst_force()
and skb_dst_f
On 2017-09-21 at 13:31, Paweł Staszewski wrote:
On 2017-09-21 at 13:03, Eric Dumazet wrote:
OK we have two problems here
1) We need to unify skb_dst_force() ( for net tree )
2) Vlan devices should try to correctly handle IFF_XMIT_DST_RELEASE from
lower device. This will considerably
On 2017-09-21 at 13:03, Eric Dumazet wrote:
OK we have two problems here
1) We need to unify skb_dst_force() ( for net tree )
2) Vlan devices should try to correctly handle IFF_XMIT_DST_RELEASE from
lower device. This will considerably help your performance.
For 1), this is what I had i
On 2017-09-21 at 13:12, Paweł Staszewski wrote:
On 2017-09-21 at 13:03, Eric Dumazet wrote:
On Thu, 2017-09-21 at 11:06 +0200, Paweł Staszewski wrote:
On 2017-09-21 at 03:17, Eric Dumazet wrote:
On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote:
Thanks very much Pawel for the fee
On 2017-09-21 at 13:03, Eric Dumazet wrote:
On Thu, 2017-09-21 at 11:06 +0200, Paweł Staszewski wrote:
On 2017-09-21 at 03:17, Eric Dumazet wrote:
On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote:
Thanks very much Pawel for the feedback.
I was looking into the code (specifically IPv4
On Thu, 2017-09-21 at 11:06 +0200, Paweł Staszewski wrote:
>
> On 2017-09-21 at 03:17, Eric Dumazet wrote:
> > On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote:
> >>> Thanks very much Pawel for the feedback.
> >>>
> >>> I was looking into the code (specifically IPv4 part) and found that in
> >
On 2017-09-21 at 03:17, Eric Dumazet wrote:
On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote:
Thanks very much Pawel for the feedback.
I was looking into the code (specifically IPv4 part) and found that in
free_fib_info_rcu(), we call free_nh_exceptions() without holding the
fnhe_lock. I
On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote:
> > Thanks very much Pawel for the feedback.
> >
> > I was looking into the code (specifically IPv4 part) and found that in
> > free_fib_info_rcu(), we call free_nh_exceptions() without holding the
> > fnhe_lock. I am wondering if that could cause
> Thanks very much Pawel for the feedback.
>
> I was looking into the code (specifically IPv4 part) and found that in
> free_fib_info_rcu(), we call free_nh_exceptions() without holding the
> fnhe_lock. I am wondering if that could cause some race condition on
> fnhe->fnhe_rth_input/output so a dou
>>> bisected again and same result:
>>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
>>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>> Author: Wei Wang
>>> Date: Sat Jun 17 10:42:32 2017 -0700
>>>
>>> ipv4: mark DST_NOGC and remove the operation of dst_free()
>>>
>>
On 2017-09-20 at 23:25, Paweł Staszewski wrote:
On 2017-09-20 at 23:24, Paweł Staszewski wrote:
On 2017-09-20 at 23:10, Paweł Staszewski wrote:
On 2017-09-20 at 21:23, Paweł Staszewski wrote:
On 2017-09-20 at 21:13, Paweł Staszewski wrote:
On 2017-09-20 at 20:36,
On 2017-09-20 at 23:24, Paweł Staszewski wrote:
On 2017-09-20 at 23:10, Paweł Staszewski wrote:
On 2017-09-20 at 21:23, Paweł Staszewski wrote:
On 2017-09-20 at 21:13, Paweł Staszewski wrote:
On 2017-09-20 at 20:36, Cong Wang wrote:
On Wed, Sep 20, 2017 at 11:30 AM, Er
On 2017-09-20 at 23:10, Paweł Staszewski wrote:
On 2017-09-20 at 21:23, Paweł Staszewski wrote:
On 2017-09-20 at 21:13, Paweł Staszewski wrote:
On 2017-09-20 at 20:36, Cong Wang wrote:
On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet wrote:
On Wed, 2017-09-20 at 11:22 -0700
On 2017-09-20 at 21:23, Paweł Staszewski wrote:
On 2017-09-20 at 21:13, Paweł Staszewski wrote:
On 2017-09-20 at 20:36, Cong Wang wrote:
On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet wrote:
On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
but dmesg at this time shows nothi
On 2017-09-20 at 21:13, Paweł Staszewski wrote:
On 2017-09-20 at 20:36, Cong Wang wrote:
On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet wrote:
On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
but dmesg at this time shows nothing about interfaces or flaps.
This is very odd.
We o
On 2017-09-20 at 20:36, Cong Wang wrote:
On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet wrote:
On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
but dmesg at this time shows nothing about interfaces or flaps.
This is very odd.
We only free netdevice in free_netdev() and it is only cal
On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet wrote:
> On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
>> but dmesg at this time shows nothing about interfaces or flaps.
>>
>> This is very odd.
>>
>> We only free netdevice in free_netdev() and it is only called when
>> we unregister a netdevi
On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
> but dmesg at this time shows nothing about interfaces or flaps.
>
> This is very odd.
>
> We only free netdevice in free_netdev() and it is only called when
> we unregister a netdevice. Otherwise pcpu_refcnt is impossible
> to be NULL.
If the
On Wed, Sep 20, 2017 at 10:55 AM, Paweł Staszewski wrote:
>
>
> On 2017-09-20 at 19:50, Cong Wang wrote:
>
> On Wed, Sep 20, 2017 at 6:11 AM, Eric Dumazet wrote:
>
> Sorry for top-posting, but this is to give context to Wei, since Pawel
> used a top posting way to report his bisection.
>
> W
On 2017-09-20 at 19:46, Wei Wang wrote:
This is why I suggested to replace the BUG() in another mail
So :
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index
f535779d9dc1dfe36934c2abba4e43d053ac5d6f..220cd12456754876edf2d3ef13195e82d70d5c74
100644
--- a/include/linux/n
On Wed, 2017-09-20 at 10:50 -0700, Cong Wang wrote:
> On Wed, Sep 20, 2017 at 6:11 AM, Eric Dumazet wrote:
> > Sorry for top-posting, but this is to give context to Wei, since Pawel
> > used a top posting way to report his bisection.
> >
> > Wei, can you take a look at Pawel report ?
> >
> > Crash
On Wed, Sep 20, 2017 at 6:11 AM, Eric Dumazet wrote:
> Sorry for top-posting, but this is to give context to Wei, since Pawel
> used a top posting way to report his bisection.
>
> Wei, can you take a look at Pawel report ?
>
> Crash happens in dst_destroy() at following :
>
> if (dst->dev)
>
>> This is why I suggested to replace the BUG() in another mail
>>
>> So :
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index
>> f535779d9dc1dfe36934c2abba4e43d053ac5d6f..220cd12456754876edf2d3ef13195e82d70d5c74
>> 100644
>> --- a/include/linux/netdevice.h
>> +++ b/in
On 2017-09-20 at 16:40, Eric Dumazet wrote:
On Wed, 2017-09-20 at 16:03 +0200, Paweł Staszewski wrote:
Not much more after adding this patch
https://bugzilla.kernel.org/attachment.cgi?id=258529
This is why I suggested to replace the BUG() in another mail
So :
diff --git a/include/linux/n
On Wed, 2017-09-20 at 16:03 +0200, Paweł Staszewski wrote:
> Not much more after adding this patch
>
> https://bugzilla.kernel.org/attachment.cgi?id=258529
>
This is why I suggested to replace the BUG() in another mail
So :
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
ind
Not much more after adding this patch
https://bugzilla.kernel.org/attachment.cgi?id=258529
On 2017-09-20 at 15:44, Eric Dumazet wrote:
On Wed, 2017-09-20 at 15:39 +0200, Paweł Staszewski wrote:
On 2017-09-20 at 15:34, Eric Dumazet wrote:
Could you try this debug patch ?
diff --git a/
On Wed, 2017-09-20 at 15:39 +0200, Paweł Staszewski wrote:
>
> On 2017-09-20 at 15:34, Eric Dumazet wrote:
> > Could you try this debug patch ?
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index
> > f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048
On 2017-09-20 at 15:34, Eric Dumazet wrote:
Could you try this debug patch ?
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index
f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82
100644
--- a/include/linux/netdevice.h
+++ b/include/lin
On Wed, 2017-09-20 at 06:34 -0700, Eric Dumazet wrote:
> Could you try this debug patch ?
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index
> f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82
> 100644
> --- a/include/linux/netdevice.h
Could you try this debug patch ?
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index
f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82
100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3331,7 +3331,14 @@ void netdev_r
Yes sorry for top-posting also.
Configuration:
Ethernet devices:
lspci | grep Etherne
02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network
Connection (rev 01)
02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network
Connection (rev 01)
04:00.0 Ethernet controller: In
So far the bisect path was:
git bisect start
# bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag
'pinctrl-v4.13-1' of
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
# good: [e24dd9ee5399747b71c1d982a484fc760179
Sorry for top-posting, but this is to give context to Wei, since Pawel
used a top posting way to report his bisection.
Wei, can you take a look at Pawel's report?
The crash happens in dst_destroy() at the following:
if (dst->dev)
dev_put(dst->dev); <>
dst->dev is not NULL, but netdev->pcpu_refcnt
hmm
But after
b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
Author: Wei Wang
Date: Sat Jun 17 10:42:32 2017 -0700
ipv4: mark DST_NOGC and remove the operation of dst_free()
With the previous preparation patches, we a
And the last one
git bisect good
Bisecting: 1 revision left to test after this (roughly 1 step)
[1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for
insertion into fib6 tree
With this one I get the same kernel panic as always
git bisect bad
Bisecting: 0 revisions left to test after
Almost there
Bisecting: 6 revisions left to test after this (roughly 3 steps)
[ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe()
properly
On 2017-09-20 at 13:02, Paweł Staszewski wrote:
OK, resumed, and so far:
Panic:
# bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f]
OK, resumed, and so far:
Panic:
# bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using
stack larger than 1024.
git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
No panic:
# good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch
'udp-reduce-cache-pressure'
git b
So far bisected and marked:
git bisect start
# bad: [07dd6cc1fff160143e82cf5df78c1db0b6e03355] Linux 4.13.2
git bisect bad 07dd6cc1fff160143e82cf5df78c1db0b6e03355
# good: [5d7d2e03e0f01a992e3521b180c3d3e67905f269] Linux 4.12.13
git bisect good 5d7d2e03e0f01a992e3521b180c3d3e67905f269
# good: [6
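The bisection workflow used in these messages can be tried out on a throwaway repository. What follows is a minimal sketch, not Paweł's actual session: the temporary repository, the file names, and the `BROKEN` marker standing in for "this kernel panics" are all hypothetical. Only the `git bisect` subcommands themselves are real.

```shell
#!/bin/sh
# Toy git-bisect session. Everything here (repo, files, the BROKEN marker
# that stands in for "kernel panics") is a hypothetical illustration.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.invalid"
git config user.name "Demo"

# Build a linear history of 10 commits; commit 7 introduces the regression.
i=1
while [ "$i" -le 10 ]; do
    echo "$i" > version.txt
    if [ "$i" -eq 7 ]; then
        : > BROKEN
    fi
    git add -A
    git commit -q -m "commit $i"
    i=$((i + 1))
done

good=$(git rev-list --max-parents=0 HEAD)  # root commit, known good
bad=$(git rev-parse HEAD)                  # tip of history, known bad

# Mark the endpoints, then let git drive the search: the test command
# exits 0 for a good revision and 1 for a bad one.
git bisect start "$bad" "$good" >/dev/null
git bisect run sh -c 'test ! -e BROKEN' >/dev/null

# When the search finishes, refs/bisect/bad names the first bad commit.
subject=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $subject"

git bisect reset >/dev/null 2>&1
```

On a real kernel tree the test step is usually manual (build, boot, reproduce, then `git bisect good` or `git bisect bad`), as in the logs in this thread, because a panic on the machine under test cannot be reported through a simple exit code.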
OK, the kernel crashed with a different panic that I didn't catch while I was
bisecting, and now my bisection is broken :)
git bisect good
Bisecting: 1787 revisions left to test after this (roughly 11 steps)
error: Your local changes to the following files would be overwritten by
checkout:
Do
OK, it looks like the bisection is ending.
The latest bisected kernel with no kernel panic is 4.12.0+ (from
next), which shows only this warning:
[ 309.030019] NETDEV WATCHDOG: enp4s0f0 (ixgbe): transmit queue 0 timed out
[ 309.030034] [ cut here ]
[ 309.030040] WARNING: CPU: 35 PI
Trying to make a video from IPMI :)
with these results:
https://bugzilla.kernel.org/attachment.cgi?id=258521
Caught two more lines where it starts - panic from 4.13.2.
Now I will try to do some bisection.
On 2017-09-20 at 09:58, Paweł Staszewski wrote:
Hi
Will try bisecting tonight
W
Hi
Will try bisecting tonight
On 2017-09-20 at 05:24, Eric Dumazet wrote:
On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote:
Just checked kernel 4.13.2 and it has the same problem.
Right after all 6 BGP sessions start and the kernel begins to learn routes,
it panics.
https://bugzilla.kernel.org
On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote:
> Just checked kernel 4.13.2 and it has the same problem.
>
> Right after all 6 BGP sessions start and the kernel begins to learn routes,
> it panics.
>
> https://bugzilla.kernel.org/attachment.cgi?id=258509
>
Unfortunately we have not enough informat
The latest working kernel with the same configuration and kernel config is 4.12.13.
There is no panic after routes from all 6x bgp sessions are learned.
ip r | wc -l
653112
On 2017-09-20 at 02:06, Paweł Staszewski wrote:
Just checked kernel 4.13.2 and it has the same problem.
Just after start all 6 bgp session
Just checked kernel 4.13.2 and it has the same problem.
Right after all 6 BGP sessions start and the kernel begins to learn routes,
it panics.
https://bugzilla.kernel.org/attachment.cgi?id=258509
On 2017-09-20 at 02:01, Paweł Staszewski wrote:
Some information about the environment:
Server is acting as a ip
Some information about the environment:
The server is acting as an IP router with BGP.
There are 6 BGP sessions, each with a full BGP table of ~600k prefixes.
It looks like the panic appears after the BGP sessions are connected,
not because of traffic, since there was almost no traffic at the time
the panic occurred.
Added a few more screenshots from kernels 4.14-rc1 (net-next) and
4.14-rc1 (linux-next)
https://bugzilla.kernel.org/show_bug.cgi?id=197005
On 2017-09-20 at 00:35, Paweł Staszewski wrote:
Just tried the latest net-next git and hit a kernel panic.
Bugzilla link below.
https://bugzilla.kernel.o
Just tried the latest net-next git and hit a kernel panic.
Bugzilla link below.
https://bugzilla.kernel.org/attachment.cgi?id=258499