Re: Multicast routing + sch_fq not working since 4.20 (bisected)

2021-03-01 Thread Andre Tomt
On 01.03.2021 18:23, Eric Dumazet wrote: On Mon, Mar 1, 2021 at 6:19 PM Eric Dumazet wrote: On Mon, Mar 1, 2021 at 6:15 PM Andre Tomt wrote: TLDR; Multicast routing (at least IPv4) in combination with sch_fq is not working since kernel 4.20-rc1 and up to and including 5.12-rc1. Other

Multicast routing + sch_fq not working since 4.20 (bisected)

2021-03-01 Thread Andre Tomt
TLDR; Multicast routing (at least IPv4) in combination with sch_fq is not working since kernel 4.20-rc1 and up to and including 5.12-rc1. Other tested qdisc schedulers work fine (pfifo_fast, fq_codel, cake). Hello all, I've been chasing an issue with multicast routing the past few days where not

Re: [net PATCH] net: tls, correctly account for copied bytes with multiple sk_msgs

2019-06-11 Thread Andre Tomt
make sense to me as in my testing corruption occurs even when sendfile is always returning that it sent all the bytes requested. So all this resending(?) likely happens within the kernel. The fix does appear to work just fine however. Tested-by: Andre Tomt To reproduce this do multiple copies

Re: kTLS broken somewhere between 4.18 and 5.0

2019-05-30 Thread Andre Tomt
On 07.05.2019 16:45, John Fastabend wrote: Andre Tomt wrote: On 14.04.2019 22:40, John Fastabend wrote: On 4/13/19 6:56 PM, Andre Tomt wrote: On 13.04.2019 17:34, Steinar H. Gunderson wrote: Hi, I've been using kTLS for a while, with my video reflector Cubemap (https://git.sesse.n

Re: kTLS broken somewhere between 4.18 and 5.0

2019-05-02 Thread Andre Tomt
On 14.04.2019 22:40, John Fastabend wrote: On 4/13/19 6:56 PM, Andre Tomt wrote: On 13.04.2019 17:34, Steinar H. Gunderson wrote: Hi, I've been using kTLS for a while, with my video reflector Cubemap (https://git.sesse.net/?p=cubemap). After I upgraded my server from 4.18.11 to

Re: kTLS broken somewhere between 4.18 and 5.0

2019-04-13 Thread Andre Tomt
On 13.04.2019 17:34, Steinar H. Gunderson wrote: Hi, I've been using kTLS for a while, with my video reflector Cubemap (https://git.sesse.net/?p=cubemap). After I upgraded my server from 4.18.11 to 5.0.6, seemingly I've started seeing corruption. The data sent with send() (HTTP headers, HLS play

Re: hw csum failure + conntrack with more debugging information

2018-12-13 Thread Andre Tomt
On 18.11.2018 02:12, Eric Dumazet wrote: On Sat, Nov 17, 2018 at 3:18 PM Andre Tomt wrote: I added Cong Wang's hw csum failure debug patch to my 4.19.2 tree and got a splat with a bit more information. [47273.905616] p0xe0

Re: hw csum failure + conntrack with more debugging information

2018-11-21 Thread Andre Tomt
On 18.11.2018 02:12, Eric Dumazet wrote: Please try this patch, we suspect mlx4 support for CHECKSUM_COMPLETE is wrong. (Only IPv4 handled, but I suspect a similar fix is needed for IPv6) Not conclusive, but.. Have not seen any splats or other weirdness since applying this patch 3 days ago.

Re: hw csum failure + conntrack with more debugging information

2018-11-18 Thread Andre Tomt
On 18.11.2018 02:12, Eric Dumazet wrote: On Sat, Nov 17, 2018 at 3:18 PM Andre Tomt wrote: I added Cong Wang's hw csum failure debug patch to my 4.19.2 tree and got a splat with a bit more information. [47273.905616] p0xe0

hw csum failure + conntrack with more debugging information

2018-11-17 Thread Andre Tomt
I added Cong Wang's hw csum failure debug patch to my 4.19.2 tree and got a splat with a bit more information. [47273.905616] p0xe0: hw csum failure [47273.905642] dev features: 0x000860c000114bb3 [47273.905663] skb len=44 data_len=0 gso_size=0 gso_type=0 ip_summed=2 csum=0, csum_complete_sw=0

Re: Fw: [Bug 201423] New: eth0: hw csum failure

2018-11-03 Thread Andre Tomt
On 31.10.2018 05:08, Andre Tomt wrote: On 30.10.2018 12:04, Andre Tomt wrote: On 30.10.2018 11:58, Andre Tomt wrote: On 27.10.2018 23:41, Andre Tomt wrote: On 26.10.2018 13:45, Andre Tomt wrote: On 25.10.2018 19:38, Eric Dumazet wrote: On 10/24/2018 12:41 PM, Andre Tomt wrote: It

Re: Fw: [Bug 201423] New: eth0: hw csum failure

2018-10-30 Thread Andre Tomt
On 30.10.2018 12:04, Andre Tomt wrote: On 30.10.2018 11:58, Andre Tomt wrote: On 27.10.2018 23:41, Andre Tomt wrote: On 26.10.2018 13:45, Andre Tomt wrote: On 25.10.2018 19:38, Eric Dumazet wrote: On 10/24/2018 12:41 PM, Andre Tomt wrote: It eventually showed up again with mlx4, on

Re: Fw: [Bug 201423] New: eth0: hw csum failure

2018-10-30 Thread Andre Tomt
On 30.10.2018 11:58, Andre Tomt wrote: On 27.10.2018 23:41, Andre Tomt wrote: On 26.10.2018 13:45, Andre Tomt wrote: On 25.10.2018 19:38, Eric Dumazet wrote: On 10/24/2018 12:41 PM, Andre Tomt wrote: It eventually showed up again with mlx4, on 4.18.16 + fix and also on 4.19. I still do

Re: Fw: [Bug 201423] New: eth0: hw csum failure

2018-10-30 Thread Andre Tomt
On 27.10.2018 23:41, Andre Tomt wrote: On 26.10.2018 13:45, Andre Tomt wrote: On 25.10.2018 19:38, Eric Dumazet wrote: On 10/24/2018 12:41 PM, Andre Tomt wrote: It eventually showed up again with mlx4, on 4.18.16 + fix and also on 4.19. I still do not have a useful packet capture. It is

Re: Fw: [Bug 201423] New: eth0: hw csum failure

2018-10-27 Thread Andre Tomt
On 26.10.2018 13:45, Andre Tomt wrote: On 25.10.2018 19:38, Eric Dumazet wrote: On 10/24/2018 12:41 PM, Andre Tomt wrote: It eventually showed up again with mlx4, on 4.18.16 + fix and also on 4.19. I still do not have a useful packet capture. It is running a torrent client serving up

Re: Fw: [Bug 201423] New: eth0: hw csum failure

2018-10-26 Thread Andre Tomt
On 26.10.2018 14:59, Eric Dumazet wrote: On Fri, Oct 26, 2018 at 5:38 AM Andre Tomt wrote: And it tripped again with that commit; however on another box with a much more complicated setup (VRFs, sch_cake, ifb, conntrack/nat, 6in4 tunnel, VF device on mlx4) [ 8197.348260] wanib: hw csum

Re: Fw: [Bug 201423] New: eth0: hw csum failure

2018-10-26 Thread Andre Tomt
On 26.10.2018 13:45, Andre Tomt wrote: On 25.10.2018 19:38, Eric Dumazet wrote: On 10/24/2018 12:41 PM, Andre Tomt wrote: It eventually showed up again with mlx4, on 4.18.16 + fix and also on 4.19. I still do not have a useful packet capture. It is running a torrent client serving up

Re: Fw: [Bug 201423] New: eth0: hw csum failure

2018-10-26 Thread Andre Tomt
On 25.10.2018 19:38, Eric Dumazet wrote: On 10/24/2018 12:41 PM, Andre Tomt wrote: It eventually showed up again with mlx4, on 4.18.16 + fix and also on 4.19. I still do not have a useful packet capture. It is running a torrent client serving up various linux distributions. Have you

Re: Fw: [Bug 201423] New: eth0: hw csum failure

2018-10-24 Thread Andre Tomt
On 21.10.2018 15:34, Andre Tomt wrote: On 20.10.2018 00:25, Eric Dumazet wrote: On 10/19/2018 02:58 PM, Eric Dumazet wrote: On 10/16/2018 06:00 AM, Eric Dumazet wrote: On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt wrote: I've seen similar on several systems with mlx4 cards when using 4

Re: Fw: [Bug 201423] New: eth0: hw csum failure

2018-10-21 Thread Andre Tomt
On 20.10.2018 00:25, Eric Dumazet wrote: On 10/19/2018 02:58 PM, Eric Dumazet wrote: On 10/16/2018 06:00 AM, Eric Dumazet wrote: On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt wrote: I've seen similar on several systems with mlx4 cards when using 4.18.x - that is hw csum failure follow

Re: Fw: [Bug 201423] New: eth0: hw csum failure

2018-10-15 Thread Andre Tomt
On 15.10.2018 17:41, Eric Dumazet wrote: On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger Something is changed between 4.17.12 and 4.18, after bisecting the problem I got the following first bad commit: commit 88078d98d1bb085d72af8437707279e203524fa5 Author: Eric Dumazet Date: Wed Apr 18 11

[PATCH net] net/tls: Fix connection stall on partial tls record

2018-05-06 Thread Andre Tomt
In the case of writing a partial tls record we forgot to clear the ctx->in_tcp_sendpages flag, causing some connections to stall. Fixes: c212d2c7fc47 ("net/tls: Don't recursively call push_record during tls_write_space callbacks") Signed-off-by: Andre Tomt --- net/tls/tls_ma

Re: [PATCH net] net/tls: Don't recursively call push_record during tls_write_space callbacks

2018-05-05 Thread Andre Tomt
done with the send loop. Reported-by: Andre Tomt Signed-off-by: Dave Watson Unfortunately it seems like this patch has a bug, while it fixed the kernel crashing it is causing some connections in my testbed to stall. Making sure ctx->in_tcp_sendpages is also cleared before the return ret

Re: kTLS in combination with mlx4 is very unstable

2018-05-01 Thread Andre Tomt
On 1 May 2018 18:09, Dave Watson wrote: On 04/24/18 10:01 AM, Dave Watson wrote: On 04/22/18 11:21 PM, Andre Tomt wrote: The kernel seems to get increasingly unstable as I load it up with client connections. At about 9Gbps and 700 connections, it is okay at least for a while - it might run

Re: kTLS in combination with mlx4 is very unstable

2018-04-23 Thread Andre Tomt
On 22 April 2018 23:21, Andre Tomt wrote: Hello! kTLS looks fun, so I decided to play with it. It is quite spiffy - however with mlx4 I get kernel crashes I'm not seeing when testing on ixgbe. For testing I'm using a git build of the "stream reflector" cubemap[1] conf

kTLS in combination with mlx4 is very unstable

2018-04-22 Thread Andre Tomt
Hello! kTLS looks fun, so I decided to play with it. It is quite spiffy - however with mlx4 I get kernel crashes I'm not seeing when testing on ixgbe. For testing I'm using a git build of the "stream reflector" cubemap[1] configured with kTLS and 8 worker threads running on 4 physical cores,

Re: Soft lockup issue in Linux 4.1.9

2015-10-01 Thread Andre Tomt
On 1 Oct 2015 13:52, Eric Dumazet wrote: On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte wrote: On 10/01/15 13:29, Eric Dumazet wrote: commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af Author: Eric Dumazet Date: Thu Aug 13 15:44:51 2015 -0700 inet: fix potential deadlock in reqsk

Re: [RFC][PATCH 2/9] deadlock prevention core

2006-08-19 Thread Andre Tomt
Rik van Riel wrote: Andrew Morton wrote: - We expect that the lots-of-dirty-anon-memory-over-swap-over-network scenario might still cause deadlocks. I assert that this can be solved by putting swap on local disks. Peter asserts that this isn't acceptable due to disk unreliability. I