Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-16 Thread Thomas Rosenstein via Bloat
On 16 Nov 2020, at 13:34, Jesper Dangaard Brouer wrote: On Wed, 04 Nov 2020 16:23:12 +0100 Thomas Rosenstein via Bloat wrote: [...] I have multiple routers which connect to multiple upstream providers, I have noticed a high latency shift in icmp (and generally all connection) if I run b2

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-16 Thread Thomas Rosenstein via Bloat
On 16 Nov 2020, at 12:56, Jesper Dangaard Brouer wrote: On Fri, 13 Nov 2020 07:31:26 +0100 "Thomas Rosenstein" wrote: On 12 Nov 2020, at 16:42, Jesper Dangaard Brouer wrote: On Thu, 12 Nov 2020 14:42:59 +0100 "Thomas Rosenstein" wrote: Notice "Adaptive" setting is on. My long-shot the

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-12 Thread Thomas Rosenstein via Bloat
On 12 Nov 2020, at 16:42, Jesper Dangaard Brouer wrote: On Thu, 12 Nov 2020 14:42:59 +0100 "Thomas Rosenstein" wrote: Notice "Adaptive" setting is on. My long-shot theory(2) is that this adaptive algorithm in the driver code can guess wrong (due to not taking TSO into account) and cause i

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-12 Thread Thomas Rosenstein via Bloat
On 12 Nov 2020, at 14:31, Jesper Dangaard Brouer wrote: On Thu, 12 Nov 2020 12:26:20 +0100 "Thomas Rosenstein" wrote: Long-shot theory(2): The NIC driver IRQ interrupt coalesce adaptive-IRQ (adj via ethtool -C) is wrong. The reason it is wrong is because of TSO frames, due pac

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-12 Thread Thomas Rosenstein via Bloat
[...] Jesper Dangaard Brouer wrote: On Sat, 07 Nov 2020 14:00:04 +0100 Thomas Rosenstein via Bloat wrote: Here's an extract from the ethtool https://pastebin.com/cabpWGFz just in case there's something hidden. Yes, there is something hiding in the data from ethtool_stats.pl[1]: (10G Mellan

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-09 Thread Thomas Rosenstein via Bloat
On 9 Nov 2020, at 12:40, Jesper Dangaard Brouer wrote: On Mon, 09 Nov 2020 11:09:33 +0100 "Thomas Rosenstein" wrote: Could you also provide ethtool_stats for the TX interface? Notice that the tool[1] ethtool_stats.pl supports monitoring several interfaces at the same time, e.g. run: ethtoo

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-09 Thread Thomas Rosenstein via Bloat
On 9 Nov 2020, at 12:40, Jesper Dangaard Brouer wrote: On Mon, 09 Nov 2020 11:09:33 +0100 "Thomas Rosenstein" wrote: On 9 Nov 2020, at 9:24, Jesper Dangaard Brouer wrote: On Sat, 07 Nov 2020 14:00:04 +0100 Thomas Rosenstein via Bloat wrote: Here's an extract from th

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-09 Thread Thomas Rosenstein via Bloat
On 9 Nov 2020, at 12:40, Jesper Dangaard Brouer wrote: On Mon, 09 Nov 2020 11:09:33 +0100 "Thomas Rosenstein" wrote: On 9 Nov 2020, at 9:24, Jesper Dangaard Brouer wrote: On Sat, 07 Nov 2020 14:00:04 +0100 Thomas Rosenstein via Bloat wrote: Here's an extract from th

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-09 Thread Thomas Rosenstein via Bloat
On 9 Nov 2020, at 9:24, Jesper Dangaard Brouer wrote: On Sat, 07 Nov 2020 14:00:04 +0100 Thomas Rosenstein via Bloat wrote: Here's an extract from the ethtool https://pastebin.com/cabpWGFz just in case there's something hidden. Yes, there is something hiding in the

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-07 Thread Thomas Rosenstein via Bloat
On 7 Nov 2020, at 17:46, Jesper Dangaard Brouer wrote: Did the latency issue happen during this perf recording? (it is obviously important that you record during the issue) Yes, it was recorded during the transfer, while it exhibited the issue. I let the transfer run some time, then did

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-07 Thread Thomas Rosenstein via Bloat
On 7 Nov 2020, at 13:37, Thomas Rosenstein wrote: I have also tried to reproduce the issue with the kernel on a virtual hyper-v machine, there I don't have any adverse effects. But it's not 100% the same, since MASQ happens on it .. will restructure a bit to get a similar representation I

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-07 Thread Thomas Rosenstein via Bloat
On 7 Nov 2020, at 13:40, Jan Ceuleers wrote: On 07/11/2020 13:37, Thomas Rosenstein via Bloat wrote: Nonetheless, I tested it and no difference :) The relevant test would be to run your experiment while IPv6 is completely disabled in the kernel. Tested with ipv6 full off (via ipv6

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-07 Thread Thomas Rosenstein via Bloat
On 7 Nov 2020, at 13:40, Jan Ceuleers wrote: On 07/11/2020 13:37, Thomas Rosenstein via Bloat wrote: Nonetheless, I tested it and no difference :) The relevant test would be to run your experiment while IPv6 is completely disabled in the kerne

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-07 Thread Thomas Rosenstein via Bloat
On 6 Nov 2020, at 21:19, Jesper Dangaard Brouer wrote: On Fri, 06 Nov 2020 18:04:49 +0100 "Thomas Rosenstein" wrote: On 6 Nov 2020, at 15:13, Jesper Dangaard Brouer wrote: I'm using ping on IPv4, but I'll try to see if IPv6 makes any difference! I think you misunderstand me. I'm not as

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-06 Thread Thomas Rosenstein via Bloat
On 6 Nov 2020, at 15:13, Jesper Dangaard Brouer wrote: On Fri, 6 Nov 2020 13:53:58 +0100 Jesper Dangaard Brouer wrote: [...] Could this be related to netlink? I have gobgpd running on these routers, which injects routes via netlink. But the churn rate during the tests is very minimal, may

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-06 Thread Thomas Rosenstein via Bloat
On 6 Nov 2020, at 12:45, Toke Høiland-Jørgensen wrote: "Thomas Rosenstein" writes: On 6 Nov 2020, at 12:18, Jesper Dangaard Brouer wrote: On Fri, 06 Nov 2020 10:18:10 +0100 "Thomas Rosenstein" wrote: I just tested 5.9.4 seems to also fix it partly, I have long stretches where it looks

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-06 Thread Thomas Rosenstein via Bloat
l Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer On Wed, 04 Nov 2020 16:23:12 +0100 Thomas Rosenstein via Bloat wrote: General Info: Routers are connected between each other with 10G Mellanox Connect-X cards via 10G SFP+ DAC cables via a 10G Switch from fs.com La

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-06 Thread Thomas Rosenstein via Bloat
On 5 Nov 2020, at 14:33, Jesper Dangaard Brouer wrote: On Thu, 05 Nov 2020 13:22:10 +0100 Thomas Rosenstein via Bloat wrote: On 5 Nov 2020, at 12:21, Toke Høiland-Jørgensen wrote: "Thomas Rosenstein" writes: If so, this sounds more like a driver issue, or maybe something

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-06 Thread Thomas Rosenstein via Bloat
On 5 Nov 2020, at 14:33, Jesper Dangaard Brouer wrote: On Thu, 05 Nov 2020 13:22:10 +0100 Thomas Rosenstein via Bloat wrote: On 5 Nov 2020, at 12:21, Toke Høiland-Jørgensen wrote: "Thomas Rosenstein" writes: If so, this sounds more like a driver issue, or maybe something

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-05 Thread Thomas Rosenstein via Bloat
On 5 Nov 2020, at 13:38, Toke Høiland-Jørgensen wrote: "Thomas Rosenstein" writes: On 5 Nov 2020, at 12:21, Toke Høiland-Jørgensen wrote: "Thomas Rosenstein" writes: If so, this sounds more like a driver issue, or maybe something to do with scheduling. Does it only happen with ICMP? Yo

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-05 Thread Thomas Rosenstein via Bloat
On 5 Nov 2020, at 12:21, Toke Høiland-Jørgensen wrote: "Thomas Rosenstein" writes: If so, this sounds more like a driver issue, or maybe something to do with scheduling. Does it only happen with ICMP? You could try this tool for a userspace UDP measurement: It happens with all packets, t

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-05 Thread Thomas Rosenstein via Bloat
On 5 Nov 2020, at 1:10, Toke Høiland-Jørgensen wrote: "Thomas Rosenstein" writes: On 4 Nov 2020, at 17:10, Toke Høiland-Jørgensen wrote: Thomas Rosenstein via Bloat writes: Hi all, I'm coming from the lartc mailing list, here's the original text: = I have mu

Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-04 Thread Thomas Rosenstein via Bloat
On 4 Nov 2020, at 17:10, Toke Høiland-Jørgensen wrote: Thomas Rosenstein via Bloat writes: Hi all, I'm coming from the lartc mailing list, here's the original text: = I have multiple routers which connect to multiple upstream providers, I have noticed a high latency shi

[Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-04 Thread Thomas Rosenstein via Bloat
Hi all, I'm coming from the lartc mailing list, here's the original text: = I have multiple routers which connect to multiple upstream providers, I have noticed a high latency shift in icmp (and generally all connection) if I run b2 upload-file --threads 40 (and I can reproduce this) Wh