Re: [PATCH v3 4/4] virtio_net: disable cb aggressively

2021-05-26 Thread Eric Dumazet
On 5/26/21 10:24 AM, Michael S. Tsirkin wrote: > There are currently two cases where we poll TX vq not in response to a > callback: start xmit and rx napi. We currently do this with callbacks > enabled which can cause extra interrupts from the card. Used not to be > a big issue as we run with

Re: [Bloat] virtio_net: BQL?

2021-05-19 Thread Eric Dumazet
On 5/18/21 1:00 AM, Stephen Hemminger wrote: > > The Azure network driver (netvsc) also does not have BQL. Several years ago > I tried adding it but it benchmarked worse and there is the added complexity > of handling the accelerated networking VF path. > Note that NIC with many TX queues

[PATCH net-next] virtio-net: restrict build_skb() use to some arches

2021-04-20 Thread Eric Dumazet
From: Eric Dumazet build_skb() is supposed to be followed by skb_reserve(skb, NET_IP_ALIGN), so that IP headers are word-aligned. (Best practice is to reserve NET_IP_ALIGN+NET_SKB_PAD, but the NET_SKB_PAD part is only a performance optimization if tunnel encaps are added.) Unfortunately
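A note on the alignment arithmetic behind this snippet: a 14-byte Ethernet header leaves the IP header at offset 14, which is not a multiple of 4, so reserving two extra bytes moves it to offset 16. The sketch below only illustrates that arithmetic in user-space C. The `SKETCH_` constants and helper names are hypothetical, and the value 2 for NET_IP_ALIGN is an assumption (it is the common default; some architectures define it as 0).

```c
#include <stddef.h>

/* Illustrative constants, not taken from kernel headers. */
#define SKETCH_ETH_HLEN     14  /* Ethernet header length */
#define SKETCH_NET_IP_ALIGN 2   /* assumed value of NET_IP_ALIGN */

/* Offset of the IP header from an aligned buffer start when
 * `reserve` bytes of headroom precede the Ethernet header. */
static size_t ip_header_offset(size_t reserve)
{
    return reserve + SKETCH_ETH_HLEN;
}

/* Returns 1 when the IP header starts on a 4-byte boundary. */
static int ip_header_is_word_aligned(size_t reserve)
{
    return ip_header_offset(reserve) % 4 == 0;
}
```

With no reservation the IP header lands at offset 14; with the assumed 2-byte reservation it lands at offset 16.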

Re: [PATCH net-next] virtio-net: fix use-after-free in page_to_skb()

2021-04-20 Thread Eric Dumazet
On 4/20/21 7:51 PM, Guenter Roeck wrote: > > sh does indeed fail, with the same symptoms as before, but so far I was not > able to track it down to a specific commit. The alpha failure is different, > though. It is a NULL pointer access. > > Anyway, testing ... > > The patch below does

[PATCH net-next] virtio-net: fix use-after-free in page_to_skb()

2021-04-20 Thread Eric Dumazet
From: Eric Dumazet KASAN/syzbot had 4 reports, one of them being: BUG: KASAN: slab-out-of-bounds in memcpy include/linux/fortify-string.h:191 [inline] BUG: KASAN: slab-out-of-bounds in page_to_skb+0x5cf/0xb70 drivers/net/virtio_net.c:480 Read of size 12 at addr ffff888014a5f800 by task

Re: [net-next, v2] virtio-net: page_to_skb() use build_skb when there's sufficient tailroom

2021-04-20 Thread Eric Dumazet
On 4/20/21 6:46 AM, Guenter Roeck wrote: > On Wed, Apr 14, 2021 at 09:52:21AM +0800, Xuan Zhuo wrote: >> In page_to_skb(), if we have enough tailroom to save skb_shared_info, we >> can use build_skb to create skb directly. No need to alloc for >> additional space. And it can save a 'frags

Re: [PATCH net-next v3] virtio-net: page_to_skb() use build_skb when there's sufficient tailroom

2021-04-20 Thread Eric Dumazet
On 4/16/21 11:16 AM, Xuan Zhuo wrote: > In page_to_skb(), if we have enough tailroom to save skb_shared_info, we > can use build_skb to create skb directly. No need to alloc for > additional space. And it can save a 'frags slot', which is very friendly > to GRO. > > Here, if the payload of the

Re: [PATCH net-next] virtio-net: page_to_skb() use build_skb when there's sufficient tailroom

2021-04-07 Thread Eric Dumazet
On 4/7/21 7:49 AM, Xuan Zhuo wrote: > In page_to_skb(), if we have enough tailroom to save skb_shared_info, we > can use build_skb to create skb directly. No need to alloc for > additional space. And it can save a 'frags slot', which is very friendly > to GRO. > > Here, if the payload of the

[PATCH net] virtio_net: Do not pull payload in skb->head

2021-04-02 Thread Eric Dumazet
From: Eric Dumazet Xuan Zhuo reported that commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") brought a ~10% performance drop. The reason for the performance drop was that GRO was forced to chain sk_buff (using skb_shinfo(skb)->frag_list), which use

Re: [PATCH bpf-next v3 3/3] xsk: build skb by page

2021-01-21 Thread Eric Dumazet
On 1/21/21 2:47 PM, Xuan Zhuo wrote: > This patch is used to construct skb based on page to save memory copy > overhead. > > This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the > network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to > directly construct skb.

Re: [External] Re: [PATCH] mm: proc: add Sock to /proc/meminfo

2020-10-13 Thread Eric Dumazet
On 10/12/20 11:53 AM, Muchun Song wrote: > We are not complaining about TCP using too much memory, but how do > we know that TCP uses a lot of memory. When I first faced this problem, > I did not know who was using the 25GB of memory, and it is not shown in > /proc/meminfo. > If we can know the

Re: [External] Re: [PATCH] mm: proc: add Sock to /proc/meminfo

2020-10-12 Thread Eric Dumazet
On 10/12/20 10:39 AM, Muchun Song wrote: > On Mon, Oct 12, 2020 at 3:42 PM Eric Dumazet wrote: >> >> On Mon, Oct 12, 2020 at 6:22 AM Muchun Song wrote: >>> >>> On Mon, Oct 12, 2020 at 2:39 AM Cong Wang wrote: >>>> >>>>

Re: [PATCH] virtio: Work around frames incorrectly marked as gso

2020-02-13 Thread Eric Dumazet
On 2/13/20 2:00 AM, Michael S. Tsirkin wrote: > On Wed, Feb 12, 2020 at 05:38:09PM +0000, Anton Ivanov wrote: >> >> >> On 11/02/2020 10:37, Michael S. Tsirkin wrote: >>> On Tue, Feb 11, 2020 at 07:42:37AM +0000, Anton Ivanov wrote: On 11/02/2020 02:51, Jason Wang wrote: > > On

Re: [PATCH] vsock/virtio: add support for MSG_PEEK

2019-09-27 Thread Eric Dumazet
On 9/27/19 1:55 AM, Stefano Garzarella wrote: > Good catch! > > Maybe we can solve in this way: > > list_for_each_entry(pkt, &vvs->rx_queue, list) { > size_t off = pkt->off; > > if (total == len) > break; > > while (total <

Re: [PATCH] vsock/virtio: add support for MSG_PEEK

2019-09-26 Thread Eric Dumazet
On 9/26/19 11:23 AM, Matias Ezequiel Vara Larsen wrote: > This patch adds support for MSG_PEEK. In such a case, packets are not > removed from the rx_queue and credit updates are not sent. > > Signed-off-by: Matias Ezequiel Vara Larsen > --- > net/vmw_vsock/virtio_transport_common.c | 50 >

Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-19 Thread Eric Dumazet
On 04/19/2018 09:12 AM, Mikulas Patocka wrote: > > > These bugs are hard to reproduce because vmalloc falls back to kmalloc > only if memory is fragmented. > This sentence is wrong. because kvmalloc() falls back to vmalloc() ...

Re: [PATCH] net: don't use kvzalloc for DMA memory

2018-04-18 Thread Eric Dumazet
On 04/18/2018 09:44 AM, Mikulas Patocka wrote: > > > On Wed, 18 Apr 2018, Eric Dumazet wrote: > >> >> >> On 04/18/2018 07:34 AM, Mikulas Patocka wrote: >>> The patch 74d332c13b21 changes alloc_netdev_mqs to use vzalloc if kzalloc >>

Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi

2017-08-22 Thread Eric Dumazet
On Tue, 2017-08-22 at 11:01 -0700, David Miller wrote: > From: "Michael S. Tsirkin" > Date: Tue, 22 Aug 2017 20:55:56 +0300 > > > Which reminds me that skb_linearize in net core seems to be > > fundamentally racy - I suspect that if skb is cloned, and someone is > > trying to

[PATCH net-next] virtio_net: add gro capability

2015-07-31 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com Straightforward patch to add GRO processing to virtio_net. napi_complete_done() usage allows more aggressive aggregation, opted-in by setting /sys/class/net/xxx/gro_flush_timeout Tested: Setting /sys/class/net/xxx/gro_flush_timeout to 1000 nsec, Rick

Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.

2015-01-27 Thread Eric Dumazet
On Tue, 2015-01-27 at 09:26 -0500, Vlad Yasevich wrote: That's what I originally wanted to do, but had to move and grow txflags thus skb_shinfo ended up growing. I wanted to avoid that, so stole an skb flag. I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid from

Re: [PATCH] virtio_net: fix use after free

2014-10-31 Thread Eric Dumazet
On Fri, 2014-10-31 at 14:07 +0800, Jason Wang wrote: Since they are called before the possible free_old_xmit_skbs(), skb won't get freed at this time. Oh right, I forgot there is no completion handler yet, timer based or whatever. Thanks.

Re: [PATCH v2 net 1/2] drivers/net: Disable UFO through virtio

2014-10-30 Thread Eric Dumazet
On Thu, 2014-10-30 at 18:27 +0000, Ben Hutchings wrote: + { + static bool warned; + + if (!warned) { + warned = true; + netdev_warn(tun->dev, +

Re: [PATCH v2 net 1/2] drivers/net: Disable UFO through virtio

2014-10-30 Thread Eric Dumazet
On Thu, 2014-10-30 at 22:20 +0000, Ben Hutchings wrote: Could do. I'm trying to make small fixes that are suitable for stable. Oh right, makes sense ;) ___ Virtualization mailing list Virtualization@lists.linux-foundation.org

Re: [PATCH] virtio_net: fix use after free

2014-10-30 Thread Eric Dumazet
On Wed, 2014-10-15 at 16:23 +0300, Michael S. Tsirkin wrote: commit 0b725a2ca61bedc33a2a63d0451d528b268cf975 net: Remove ndo_xmit_flush netdev operation, use signalling instead. added code that looks at skb->xmit_more after the skb has been put in TX VQ. Since some paths process the ring

Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()

2014-10-15 Thread Eric Dumazet
On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote: Accumulate the sent packets and sent bytes in local variables and perform a single u64_stats_update_begin/end() after. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang
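The optimization described in this snippet (accumulate in local variables, then publish once) can be sketched in user-space C as below. This is a simplified illustration under stated assumptions, not the patch itself: `struct sketch_stats`, the `sketch_update_*` helpers, and the `sections` counter are hypothetical stand-ins for the kernel's u64_stats_sync machinery.

```c
#include <stdint.h>

/* Hypothetical stand-in for per-cpu stats guarded by a sequence counter. */
struct sketch_stats {
    unsigned int seq;       /* even: stable, odd: update in progress */
    uint64_t tx_packets;
    uint64_t tx_bytes;
    unsigned int sections;  /* number of write sections, for illustration */
};

static void sketch_update_begin(struct sketch_stats *s)
{
    s->seq++;
    s->sections++;
}

static void sketch_update_end(struct sketch_stats *s)
{
    s->seq++;
}

/* One write section per freed packet. */
static void account_per_packet(struct sketch_stats *s,
                               const unsigned int *lens, int n)
{
    for (int i = 0; i < n; i++) {
        sketch_update_begin(s);
        s->tx_packets++;
        s->tx_bytes += lens[i];
        sketch_update_end(s);
    }
}

/* Accumulate in locals, then publish in a single write section. */
static void account_batched(struct sketch_stats *s,
                            const unsigned int *lens, int n)
{
    uint64_t packets = 0, bytes = 0;

    for (int i = 0; i < n; i++) {
        packets++;
        bytes += lens[i];
    }
    sketch_update_begin(s);
    s->tx_packets += packets;
    s->tx_bytes += bytes;
    sketch_update_end(s);
}
```

Both variants produce the same totals; the batched one opens a single write section regardless of how many packets were freed.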

Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt

2014-10-15 Thread Eric Dumazet
On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote: ... +static int free_old_xmit_skbs(struct send_queue *sq, int budget) +{ + struct sk_buff *skb; + unsigned int len; + struct virtnet_info *vi = sq->vq->vdev->priv; + struct virtnet_stats *stats = this_cpu_ptr(vi->stats); +

Re: [PATCH net-next RFC 3/3] virtio-net: conditionally enable tx interrupt

2014-10-11 Thread Eric Dumazet
On Sat, 2014-10-11 at 15:16 +0800, Jason Wang wrote: In the past, we freed transmitted packets in ndo_start_xmit() to get better performance. One side effect is that skb_orphan() needs to be called in ndo_start_xmit(), which in fact makes sk_wmem_alloc inaccurate. For TCP protocol,

Re: [PATCH net-next] virtio-net: rx busy polling support

2014-07-15 Thread Eric Dumazet
On Tue, 2014-07-15 at 17:41 +0800, Jason Wang wrote: Add basic support for rx busy polling. 1 byte netperf tcp_rr on mlx4 shows 116% improvement: the transaction rate was increased from 9151.94 to 19787.37. This is a misleading changelog. You forgot to describe how you allowed busy polling,

Re: [PULL 2/2] vhost: replace rcu with mutex

2014-06-03 Thread Eric Dumazet
On Tue, 2014-06-03 at 14:48 +0200, Paolo Bonzini wrote: On 02/06/2014 23:58, Eric Dumazet wrote: This looks dubious What about using kfree_rcu() instead ? It would lead to unbound allocation from userspace. Look at how we did this in commit c3059477fce2d956a0bb3e04357324780c5d8eeb

Re: [PULL 2/2] vhost: replace rcu with mutex

2014-06-02 Thread Eric Dumazet
On Tue, 2014-06-03 at 00:30 +0300, Michael S. Tsirkin wrote: All memory accesses are done under some VQ mutex. So lock/unlock all VQs is a faster equivalent of synchronize_rcu() for memory access changes. Some guests cause a lot of these changes, so it's helpful to make them faster.

Re: [PATCH net-next v3 5/5] virtio-net: initial rx sysfs support, export mergeable rx buffer size

2014-01-16 Thread Eric Dumazet
On Thu, 2014-01-16 at 09:27 -0800, Michael Dalton wrote: Sorry, just realized - I think disabling NAPI is necessary but not sufficient. There is also the issue that refill_work() could be scheduled. If refill_work() executes, it will re-enable NAPI. We'd need to cancel the vi->refill delayed

Re: [PATCH net-next v3 4/5] net-sysfs: add support for device-specific rx queue sysfs attributes

2014-01-16 Thread Eric Dumazet
On Thu, 2014-01-16 at 11:51 -0800, Michael Dalton wrote: On Thu, Jan 16, 2014 Ben Hutchings bhutchi...@solarflare.com wrote: It's simpler but we don't know if it's faster (and I don't believe that matters for the current usage). If one of these functions starts to be used in the data

Re: [PATCH net-next v4 5/6] lib: Ensure EWMA does not store wrong intermediate values

2014-01-16 Thread Eric Dumazet
On Thu, 2014-01-16 at 11:52 -0800, Michael Dalton wrote: To ensure ewma_read() without a lock returns a valid but possibly out of date average, modify ewma_add() by using ACCESS_ONCE to prevent intermediate wrong values from being written to avg->internal. Suggested-by: Eric Dumazet eric.duma

Re: [PATCH net-next 2/3] virtio-net: use per-receive queue page frag alloc for mergeable bufs

2014-01-08 Thread Eric Dumazet
On Wed, 2014-01-08 at 19:21 +0200, Michael S. Tsirkin wrote: Basically yes, we could start dropping packets immediately once GFP_ATOMIC allocations fail and repost the buffer to host, and hope memory is available by the time we get the next interrupt. But we wanted host to have visibility

Re: [PATCH net-next v2 1/4] net: allow 0 order atomic page alloc in skb_page_frag_refill

2014-01-08 Thread Eric Dumazet
On Wed, 2014-01-08 at 20:08 +0200, Michael S. Tsirkin wrote: Eric said we also need a patch to add __GFP_NORETRY, right? Probably before this one in series. Nope, this __GFP_NORETRY has nothing to do with this. I am not yet convinced we want it. This needs mm guys advice, as its a tradeoff

Re: [PATCH net-next v2 3/4] virtio-net: auto-tune mergeable rx buffer size for improved performance

2014-01-08 Thread Eric Dumazet
On Wed, 2014-01-08 at 10:28 -0800, Michael Dalton wrote: Hi Jason, On Tue, Jan 7, 2014 at 10:23 PM, Jason Wang jasow...@redhat.com wrote: What's the reason that this extra space is not accounted for truesize? The initial rationale was that this extra space is due to internal fragmentation

Re: [PATCH net-next v2 1/4] net: allow 0 order atomic page alloc in skb_page_frag_refill

2014-01-08 Thread Eric Dumazet
On Wed, 2014-01-08 at 21:18 +0200, Michael S. Tsirkin wrote: On Wed, Jan 08, 2014 at 10:26:03AM -0800, Eric Dumazet wrote: On Wed, 2014-01-08 at 20:08 +0200, Michael S. Tsirkin wrote: Eric said we also need a patch to add __GFP_NORETRY, right? Probably before this one in series

Re: [PATCH net-next v2 1/4] net: allow 0 order atomic page alloc in skb_page_frag_refill

2014-01-08 Thread Eric Dumazet
On Wed, 2014-01-08 at 16:54 -0500, Debabrata Banerjee wrote: Actually I have more data on this: Could you please stop polluting this thread ? 1. __GFP_NORETRY really does help and should go into stable tree. Not at all. You are free to patch your kernel if you want. It helps you

Re: [PATCH net-next 1/3] net: allow 0 order atomic page alloc in skb_page_frag_refill

2014-01-03 Thread Eric Dumazet
On Fri, 2014-01-03 at 17:47 -0500, Debabrata Banerjee wrote: On Thu, 2014-01-02 at 16:56 -0800, Eric Dumazet wrote: Hmm... it looks like I missed __GFP_NORETRY diff --git a/net/core/sock.c b/net/core/sock.c index 5393b4b719d7..5f42a4d70cb2 100644 --- a/net/core/sock.c +++ b/net

Re: [PATCH net-next 1/3] net: allow 0 order atomic page alloc in skb_page_frag_refill

2014-01-02 Thread Eric Dumazet
On Thu, 2014-01-02 at 19:42 -0500, Debabrata Banerjee wrote: Currently because of how mm behaves (3.10.y) the code even before the patch is a problem. I believe what may fix it is if instead of just removing the conditional on __GFP_WAIT, the initial order 0 allocation should be made

Re: [PATCH net-next 1/3] net: allow 0 order atomic page alloc in skb_page_frag_refill

2014-01-02 Thread Eric Dumazet
On Thu, 2014-01-02 at 16:56 -0800, Eric Dumazet wrote: My suggestion is to use a recent kernel, and/or eventually backport the mm fixes if any. order-3 allocations should not reclaim 2GB out of 8GB. There is a reason PAGE_ALLOC_COSTLY_ORDER exists and is 3 Hmm... it looks like I missed

Re: [PATCH net-next 2/3] virtio-net: use per-receive queue page frag alloc for mergeable bufs

2013-12-26 Thread Eric Dumazet
On Thu, 2013-12-26 at 13:28 -0800, Michael Dalton wrote: On Mon, Dec 23, 2013 at 11:37 AM, Michael S. Tsirkin m...@redhat.com wrote: So there isn't a conflict with respect to locking. Is it problematic to use same page_frag with both GFP_ATOMIC and with GFP_KERNEL? If yes why? I

Re: [PATCH net-next 2/3] virtio-net: use per-receive queue page frag alloc for mergeable bufs

2013-12-26 Thread Eric Dumazet
On Thu, 2013-12-26 at 23:37 +0200, Michael S. Tsirkin wrote: Interesting. But if we can't allocate a buffer how can we do network processing? How typical NIC drivers handle this case ? Answer : nothing special should happen, we drop incoming traffic, and make sure the driver recovers

Re: [PATCH net-next 2/3] virtio-net: use per-receive queue page frag alloc for mergeable bufs

2013-12-26 Thread Eric Dumazet
On Fri, 2013-12-27 at 12:55 +0800, Jason Wang wrote: On 12/27/2013 05:56 AM, Eric Dumazet wrote: On Thu, 2013-12-26 at 13:28 -0800, Michael Dalton wrote: On Mon, Dec 23, 2013 at 11:37 AM, Michael S. Tsirkin m...@redhat.com wrote: So there isn't a conflict with respect to locking

Re: [PATCH net-next 1/3] net: allow 0 order atomic page alloc in skb_page_frag_refill

2013-12-23 Thread Eric Dumazet
allocation for atomic allocation. This patch changes this, and looks like it will introduce some extra cost when the memory is highly fragmented. No noticeable extra cost that I could measure. We use the same strategy in RX path nowadays. Acked-by: Eric Dumazet eduma...@google.com

Re: [PATCH net-next 2/3] virtio-net: use per-receive queue page frag alloc for mergeable bufs

2013-12-23 Thread Eric Dumazet
() is used? One is used under process context, where preemption and GFP_KERNEL are allowed. One is used from softirq context and GFP_ATOMIC. You cant share a common page_frag. Acked-by: Eric Dumazet eduma...@google.com

Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Eric Dumazet
On Wed, 2013-11-20 at 10:58 +0200, Michael S. Tsirkin wrote: On Tue, Nov 19, 2013 at 02:00:11PM -0800, Eric Dumazet wrote: On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote: Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a it didn't drop packets received

Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Eric Dumazet
On Wed, 2013-11-20 at 18:06 +0200, Michael S. Tsirkin wrote: Hmm some kind of disconnect here. I got your argument about bufferbloat. What I am saying is that maybe we should drop packets more aggressively: when we drop one packet of a flow, why not drop everything that's queued and is

Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-19 Thread Eric Dumazet
allocators). Cc: Michael Dalton mwdal...@google.com Cc: Eric Dumazet eduma...@google.com Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- The patch was needed for 3.12 stable. Good catch, but if we return from

Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-19 Thread Eric Dumazet
On Tue, 2013-11-19 at 22:49 +0200, Michael S. Tsirkin wrote: On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote: On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote: We need to drop the refcnt of page when we fail to allocate an skb for frag list, otherwise it will be leaked

Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-19 Thread Eric Dumazet
On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote: Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a it didn't drop packets received from host as far as I can tell. virtio is more like a pipe than a real NIC in this respect. Prior/after to this patch, you were not

Re: [PATCH] virtio-net: mergeable buffer size should include virtio-net header

2013-11-14 Thread Eric Dumazet
): 6445.44Gb/s Suggested-by: Eric Northup digitale...@google.com Signed-off-by: Michael Dalton mwdal...@google.com --- Acked-by: Eric Dumazet eduma...@google.com

Re: [PATCH net-next 4/4] virtio-net: auto-tune mergeable rx buffer size for improved performance

2013-11-13 Thread Eric Dumazet
On Wed, 2013-11-13 at 10:47 +0200, Ronen Hod wrote: I looked at how ewma works, and although it is computationally efficient, and it does what it is supposed to do, initially (at the first samples) it is strongly biased towards the value that was added at the first ewma_add. I suggest that

Re: [PATCH net-next 4/4] virtio-net: auto-tune mergeable rx buffer size for improved performance

2013-11-13 Thread Eric Dumazet
On Wed, 2013-11-13 at 18:43 +0200, Ronen Hod wrote: This initial value, that you do not really want to use, will slowly fade, but it will still pretty much dominate the returned value for the first RECEIVE_AVG_WEIGHT(==64) samples or so (most ewma implementations suffer from this bug).
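The bias discussed in this exchange can be reproduced with a few lines of user-space C. The sketch below is an assumption-laden illustration, not the kernel's `lib/average.c`: it uses floating point rather than fixed-point arithmetic, and `struct sketch_ewma`, `sketch_ewma_add()` and `sketch_ewma_after()` are hypothetical names. It keeps the two properties the thread relies on: the first sample seeds the average, and each later sample contributes 1/weight.

```c
/* Hypothetical floating-point EWMA; the first sample seeds the average. */
struct sketch_ewma {
    double avg;
    int initialized;
};

static void sketch_ewma_add(struct sketch_ewma *e, double val, double weight)
{
    if (!e->initialized) {
        e->avg = val;          /* first sample becomes the average */
        e->initialized = 1;
        return;
    }
    e->avg += (val - e->avg) / weight;  /* new sample weighs 1/weight */
}

/* Feed one `first` sample followed by `count` samples of `rest`. */
static double sketch_ewma_after(double first, double rest, int count,
                                double weight)
{
    struct sketch_ewma e = { 0.0, 0 };

    sketch_ewma_add(&e, first, weight);
    for (int i = 0; i < count; i++)
        sketch_ewma_add(&e, rest, weight);
    return e.avg;
}
```

Calling `sketch_ewma_after(1500.0, 100.0, 63, 64.0)` shows how far the average still sits from 100 after 63 further samples, because the seed decays by only a factor of 63/64 per sample.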

Re: [PATCH net-next 1/4] virtio-net: mergeable buffer size should include virtio-net header

2013-11-12 Thread Eric Dumazet
): 6445.44Gb/s Suggested-by: Eric Northup digitale...@google.com Signed-off-by: Michael Dalton mwdal...@google.com --- Acked-by: Eric Dumazet eduma...@google.com

Re: [PATCH net-next 2/4] net: allow 0 order atomic page alloc in skb_page_frag_refill

2013-11-12 Thread Eric Dumazet
to successively lower-order page allocations on failure. Part of migration of virtio-net to per-receive queue page frag allocators. Signed-off-by: Michael Dalton mwdal...@google.com --- net/core/sock.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) Acked-by: Eric Dumazet eduma

Re: [PATCH net-next 3/4] virtio-net: use per-receive queue page frag alloc for mergeable bufs

2013-11-12 Thread Eric Dumazet
allocated buffer so that the remaining space can be used to store packet data. Signed-off-by: Michael Dalton mwdal...@google.com --- Acked-by: Eric Dumazet eduma...@google.com

Re: [PATCH net-next 4/4] virtio-net: auto-tune mergeable rx buffer size for improved performance

2013-11-12 Thread Eric Dumazet
On Wed, 2013-11-13 at 15:10 +0800, Jason Wang wrote: There's one concern with EWMA. How well does it handle multiple streams each with different packet size? E.g there may be two flows, one with 256 bytes each packet another is 64K. Looks like it can result we allocate PAGE_SIZE buffer for

Re: [PATCH net-next V3 1/2] net: introduce skb_coalesce_rx_frag()

2013-11-01 Thread Eric Dumazet
Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Cc: Michael Dalton mwdal...@google.com Cc: Eric Dumazet eduma...@google.com Acked-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- Changes from V2: - remove the skb_frag_unref

Re: [PATCH net-next V3 2/2] virtio-net: coalesce rx frags when possible during rx

2013-11-01 Thread Eric Dumazet
bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 23841.26 Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Cc: Michael Dalton mwdal...@google.com Cc: Eric Dumazet eduma...@google.com Acked-by: Michael S. Tsirkin m...@redhat.com Acked

Re: [PATCH net-next V2 2/2] virtio-net: coalesce rx frags when possible during rx

2013-10-31 Thread Eric Dumazet
bytes secs. 10^6bits/sec 87380 16384 16384 10.00 23841.26 Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Cc: Michael Dalton mwdal...@google.com Cc: Eric Dumazet eduma...@google.com Acked-by: Michael S. Tsirkin m...@redhat.com Signed-off

Re: [PATCH net-next V2 1/2] net: introduce skb_coalesce_rx_frag()

2013-10-31 Thread Eric Dumazet
Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Cc: Michael Dalton mwdal...@google.com Cc: Eric Dumazet eduma...@google.com Acked-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- Changes from V1: - remove the useless off parameter

Re: [RFC PATCH] virtio-net: reset virtqueue affinity when doing cpu hotplug

2013-01-03 Thread Eric Dumazet
On Wed, 2012-12-26 at 15:06 +0800, Wanlong Gao wrote: Add a cpu notifier to virtio-net, so that we can reset the virtqueue affinity when cpu hotplug happens. It improves performance by enabling or disabling the virtqueue affinity after cpu hotplug. Cc: Rusty Russell

Re: [patch net-next 1/4] net: introduce new priv_flag indicating iface capable of change mac when running

2012-06-28 Thread Eric Dumazet
On Thu, 2012-06-28 at 16:10 +0200, Jiri Pirko wrote: Introduce IFF_LIFE_ADDR_CHANGE priv_flag and use it to disable netif_running() check in eth_mac_addr() Signed-off-by: Jiri Pirko jpi...@redhat.com --- include/linux/if.h |2 ++ net/ethernet/eth.c |2 +- 2 files changed, 3

Re: [PATCH] virtio-net: fix a race on 32bit arches

2012-06-10 Thread Eric Dumazet
On Sun, 2012-06-10 at 10:03 +0300, Michael S. Tsirkin wrote: One question though: do we want to lay the structure out so that the rx sync structure precedes the rx counters? I am not sure its worth having holes in the structure, since its percpu data. That would be 8 bytes lost per cpu and

Re: [V2 RFC net-next PATCH 1/2] virtio_net: convert the statistics into array

2012-06-06 Thread Eric Dumazet
On Wed, 2012-06-06 at 15:52 +0800, Jason Wang wrote: Currently, we store the statistics in independent fields of virtnet_stats, which is not scalable when we want to add more counters. As suggested by Michael, this patch converts it to an array and uses the enum as the index to access

[PATCH] virtio-net: fix a race on 32bit arches

2012-06-06 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race on 32bit arches. We must use separate syncp for rx and tx path as they can be run at the same time on different cpus. Thus one sequence increment can be lost and readers spin forever
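The lost-increment scenario described in this patch can be replayed deterministically in user-space C. The sketch below is illustrative only: the function names are hypothetical, and the non-atomic `seq++` of a sequence counter is split by hand into its load and store halves so that one unlucky interleaving of an rx writer and a tx writer can be written out step by step.

```c
/* Replays two writers (rx and tx) sharing ONE sequence counter.
 * Both enter their write section at the same time, so one of the
 * two "begin" increments is lost. */
static unsigned int shared_syncp_final_seq(void)
{
    unsigned int seq = 0;
    unsigned int rx_seen, tx_seen;

    rx_seen = seq;      /* rx begin: load 0 */
    tx_seen = seq;      /* tx begin: load 0 */
    seq = rx_seen + 1;  /* rx begin: store 1 */
    seq = tx_seen + 1;  /* tx begin: store 1, one increment lost */
    seq = seq + 1;      /* rx end: 2 */
    seq = seq + 1;      /* tx end: 3 */
    return seq;
}

/* With a private counter per path, a begin/end pair is never
 * interleaved with the other path and the counter ends up even. */
static unsigned int private_syncp_final_seq(void)
{
    unsigned int seq = 0;

    seq = seq + 1;      /* begin */
    seq = seq + 1;      /* end */
    return seq;
}

/* A reader may only use the counters while the sequence is even. */
static int reader_can_proceed(unsigned int seq)
{
    return (seq & 1u) == 0;
}
```

In the shared case the counter is left odd with no writer active, so a reader that waits for an even value never makes progress; with separate counters each one returns to an even value.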

Re: [PATCH] virtio-net: fix a race on 32bit arches

2012-06-06 Thread Eric Dumazet
On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote: From: Eric Dumazet eduma...@google.com commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race on 32bit arches. We must use separate syncp for rx and tx path as they can be run at the same time on different cpus. Thus

Re: [PATCH] virtio-net: fix a race on 32bit arches

2012-06-06 Thread Eric Dumazet
On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote: We currently do all stats either on napi callback or from start_xmit callback. This makes them safe, yes? Hmm, then _bh() variant is needed in virtnet_stats(), as explained in include/linux/u64_stats_sync.h section 6) * 6) If

Re: [PATCH] virtio-net: fix a race on 32bit arches

2012-06-06 Thread Eric Dumazet
On Wed, 2012-06-06 at 17:49 +0300, Michael S. Tsirkin wrote: On Wed, Jun 06, 2012 at 03:10:10PM +0200, Eric Dumazet wrote: On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote: We currently do all stats either on napi callback or from start_xmit callback. This makes them safe

Re: [PATCH] virtio-net: fix a race on 32bit arches

2012-06-06 Thread Eric Dumazet
On Wed, 2012-06-06 at 19:17 +0300, Michael S. Tsirkin wrote: But why do you say at most 1 packet? Consider get_stats doing: u64_stats_update_begin(&stats->syncp); stats->tx_bytes += skb->len; on 64 bit at this point tx_packets might get incremented any number

Re: [PATCH] virtio-net: fix a race on 32bit arches

2012-06-06 Thread Eric Dumazet
On Wed, 2012-06-06 at 21:51 +0300, Michael S. Tsirkin wrote: BTW for cards that do implement the counters in software, under xmit lock, is anything wrong with simply taking the xmit lock when we get the stats instead of the per-cpu trick + seqlock? I still dont understand why you would do

Re: [PATCH] virtio-net: fix a race on 32bit arches

2012-06-06 Thread Eric Dumazet
On Wed, 2012-06-06 at 19:57 +0300, Michael S. Tsirkin wrote: So for virtio since all counters get incremented from bh we can ensure they are read atomically, simply but reading them from the correct CPU with bh disabled. And then we don't need u64_stats_sync at all. Really ? How are you

Re: [PATCH] virtio-net: fix a race on 32bit arches

2012-06-06 Thread Eric Dumazet
On Wed, 2012-06-06 at 21:43 +0300, Michael S. Tsirkin wrote: 1. We are trying to look at counters for purposes of tuning the device. E.g. if ethtool reports packets and bytes, we'd like to calculate average packet size by bytes/packets. If both counters are read atomically the metric

Re: [PATCH] virtio-net: fix a race on 32bit arches

2012-06-06 Thread Eric Dumazet
On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote: Absolutely, I am talking about virtio here. I'm not kicking u64_stats_sync idea I am just saying that simple locking would work for virtio and might be better as it gives us a way to get counters atomically. Which lock do you own

Re: [PATCH] virtio-net: fix a race on 32bit arches

2012-06-06 Thread Eric Dumazet
On Wed, 2012-06-06 at 23:16 +0300, Michael S. Tsirkin wrote: On Wed, Jun 06, 2012 at 10:08:09PM +0200, Eric Dumazet wrote: On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote: Absolutely, I am talking about virtio here. I'm not kicking u64_stats_sync idea I am just saying

Re: [PATCH] virtio-net: fix a race on 32bit arches

2012-06-06 Thread Eric Dumazet
On Wed, 2012-06-06 at 21:19 +0100, Ben Hutchings wrote: On Wed, 2012-06-06 at 22:08 +0200, Eric Dumazet wrote: On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote: Absolutely, I am talking about virtio here. I'm not kicking u64_stats_sync idea I am just saying that simple

Re: [PATCH] virtio-net: fix a race on 32bit arches

2012-06-06 Thread Eric Dumazet
On Wed, 2012-06-06 at 22:24 +0200, Eric Dumazet wrote: (ndo_get_stats64() is not allowed to sleep, and I cant see how you are going to disable napi without sleeping) In case you wonder, take a look at bond_get_stats() in drivers/net/bonding/bond_main.c

Re: [net-next RFC PATCH 4/7] tuntap: multiqueue support

2011-08-12 Thread Eric Dumazet
On Friday 12 August 2011 at 09:55 +0800, Jason Wang wrote: + rxq = skb_get_rxhash(skb); + if (rxq) { + tfile = rcu_dereference(tun->tfiles[rxq % numqueues]); + if (tfile) + goto out; + } You can avoid an expensive divide with
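The snippet above is cut off before it names the alternative to `rxq % numqueues`, so the following is only an assumption about the kind of technique meant. One widely used way to map a 32-bit hash onto `numqueues` buckets without a division is a multiply followed by a 32-bit shift. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Baseline: modulo, which compiles to a divide when numqueues
 * is not a compile-time constant. */
static uint32_t pick_queue_modulo(uint32_t hash, uint32_t numqueues)
{
    return hash % numqueues;
}

/* Multiply-shift: scales a 32-bit hash into [0, numqueues) using one
 * 64-bit multiplication and a shift, with no division. */
static uint32_t pick_queue_scaled(uint32_t hash, uint32_t numqueues)
{
    return (uint32_t)(((uint64_t)hash * numqueues) >> 32);
}
```

The two functions do not return the same index for a given hash; both stay within range, and the scaled variant depends on the high bits of the hash being well distributed.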

Re: [PATCH] virtio-net: per cpu 64 bit stats

2011-06-15 Thread Eric Dumazet
On Wednesday 15 June 2011 at 11:43 -0400, Stephen Hemminger wrote: Use per-cpu variables to maintain 64 bit statistics. Compile tested only. Signed-off-by: Stephen Hemminger shemmin...@vyatta.com --- a/drivers/net/virtio_net.c 2011-06-14 15:18:46.448596355 -0400 +++

[PATCH] xen: netfront: fix declaration order

2011-04-03 Thread Eric Dumazet
] xen: netfront: fix declaration order Must declare xennet_fix_features() and xennet_set_features() before using them. Signed-off-by: Eric Dumazet eric.duma...@gmail.com Cc: Michał Mirosław mirq-li...@rere.qmqm.pl --- drivers/net/xen-netfront.c | 72 +-- 1 file

Re: [PATCH 3/3] vhost-net: use lock_sock_fast() in peek_head_len()

2011-03-13 Thread Eric Dumazet
On Sunday 13 March 2011 at 17:06 +0200, Michael S. Tsirkin wrote: On Mon, Jan 17, 2011 at 04:11:17PM +0800, Jason Wang wrote: We can use lock_sock_fast() instead of lock_sock() in order to get speedup in peek_head_len(). Signed-off-by: Jason Wang jasow...@redhat.com ---

Re: [PATCH 3/3] vhost-net: use lock_sock_fast() in peek_head_len()

2011-03-13 Thread Eric Dumazet
On Sunday 13 March 2011 at 18:19 +0200, Michael S. Tsirkin wrote: Other side is in drivers/net/tun.c and net/packet/af_packet.c At least wrt tun it seems clear socket is not locked. Yes (assuming you refer to tun_net_xmit()) Besides queue, dequeue seems to be done without socket locked.

Re: [PATCH 3/3] vhost-net: use lock_sock_fast() in peek_head_len()

2011-03-13 Thread Eric Dumazet
On Sunday 13 March 2011 at 18:43 +0200, Michael S. Tsirkin wrote: On Sun, Mar 13, 2011 at 05:32:07PM +0100, Eric Dumazet wrote: On Sunday 13 March 2011 at 18:19 +0200, Michael S. Tsirkin wrote: Other side is in drivers/net/tun.c and net/packet/af_packet.c At least wrt tun

Re: [PATCH 3/3] vhost-net: use lock_sock_fast() in peek_head_len()

2011-01-17 Thread Eric Dumazet
); - lock_sock(sk); head = skb_peek(&sk->sk_receive_queue); if (head) len = head->len; - release_sock(sk); + unlock_sock_fast(sk, slow); return len; } Acked-by: Eric Dumazet eric.duma...@gmail.com

Re: Flow Control and Port Mirroring Revisited

2011-01-06 Thread Eric Dumazet
On Thursday, 6 January 2011 at 18:33 +0900, Simon Horman wrote: Hi, Back in October I reported that I noticed a problem whereby flow control breaks down when openvswitch is configured to mirror a port[1]. I have (finally) looked into this further and the problem appears to relate to

Re: Flow Control and Port Mirroring Revisited

2011-01-06 Thread Eric Dumazet
On Thursday, 6 January 2011 at 21:44 +0900, Simon Horman wrote: Hi Eric ! Thanks for the advice. I had thought about the socket buffer but at some point it slipped my mind. In any case the following patch seems to implement the change that I had in mind. However my discussions Michael

Re: [PATCH 06/20] x86/ticketlock: make __ticket_spin_trylock common

2010-11-13 Thread Eric Dumazet
On Saturday, 13 November 2010 at 18:17 +0800, Américo Wang wrote: On Wed, Nov 03, 2010 at 10:59:47AM -0400, Jeremy Fitzhardinge wrote: +union { +struct __raw_tickets tickets; +__ticketpair_t slock; +} tmp, new; +int ret; + +tmp.tickets =

Re: [PATCH 02/20] x86/ticketlock: convert spin loop to C

2010-11-03 Thread Eric Dumazet
On Wednesday, 3 November 2010 at 10:59 -0400, Jeremy Fitzhardinge wrote: From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com The inner loop of __ticket_spin_lock isn't doing anything very special, so reimplement it in C. For the 8 bit ticket lock variant, we use a register union to
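
For readers unfamiliar with ticket locks, the following is a minimal userspace sketch of the general technique in portable C11 atomics. It is not the x86 kernel code discussed in this thread: the names (`ticket_lock`, `ticket_lock_acquire`, `ticket_lock_try`, `ticket_lock_release`) are hypothetical, and the two counters are kept as separate words rather than packed into one as the quoted union suggests.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct ticket_lock {
    atomic_uint next;  /* next ticket to hand out (tail) */
    atomic_uint owner; /* ticket currently being served (head) */
};

static void ticket_lock_init(struct ticket_lock *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

/* Take a ticket, then spin in plain C until that ticket is served. */
static void ticket_lock_acquire(struct ticket_lock *l)
{
    unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
                                                memory_order_relaxed);

    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ; /* busy-wait; real code would add a CPU relax hint here */
}

/* Succeed only when nobody holds or waits for the lock (next == owner). */
static bool ticket_lock_try(struct ticket_lock *l)
{
    unsigned int head = atomic_load_explicit(&l->owner,
                                             memory_order_relaxed);
    unsigned int expected = head;

    return atomic_compare_exchange_strong_explicit(&l->next, &expected,
                                                   head + 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

/* Serve the next ticket. Only the current holder may call this. */
static void ticket_lock_release(struct ticket_lock *l)
{
    unsigned int cur = atomic_load_explicit(&l->owner,
                                            memory_order_relaxed);

    /* Releasing a lock nobody holds is a caller bug. */
    assert(cur != atomic_load_explicit(&l->next, memory_order_relaxed));
    atomic_store_explicit(&l->owner, cur + 1, memory_order_release);
}
```

Waiters are served strictly in ticket order, which is the fairness property that distinguishes a ticket lock from a simple test-and-set spinlock.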

Re: [PATCH 2/4] macvlan: cleanup rx statistics

2009-11-24 Thread Eric Dumazet
Arnd Bergmann wrote: We have very similar code for rx statistics in two places in the macvlan driver, with a third one being added in the next patch. Consolidate them into one function to improve overall readability of the driver. Signed-off-by: Arnd Bergmann a...@arndb.de ---

Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-04 Thread Eric Dumazet
Paul E. McKenney wrote: (Sorry, but, as always, I could not resist!) Yes :) Thanks Paul for this masterpiece of diplomatic Acked-by ;)

Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Eric Dumazet
Michael S. Tsirkin wrote: +static void handle_tx(struct vhost_net *net) +{ + struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX]; + unsigned head, out, in, s; + struct msghdr msg = { + .msg_name = NULL, + .msg_namelen = 0, +

Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Eric Dumazet
Michael S. Tsirkin wrote: Paul, you acked this previously. Should I add you acked-by line so people calm down? If you would rather I replace rcu_dereference/rcu_assign_pointer with rmb/wmb, I can do this. Or maybe patch Documentation to explain this RCU usage? So you believe I am

Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit

2009-06-03 Thread Eric Dumazet
Rusty Russell wrote: On Sat, 30 May 2009 12:41:00 am Eric Dumazet wrote: Rusty Russell wrote: DaveM points out that there are advantages to doing it generally (it's more likely to be on same CPU than after xmit), and I couldn't find any new starvation issues in simple benchmarking here

Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit

2009-06-03 Thread Eric Dumazet
David Miller wrote: From: Rusty Russell ru...@rustcorp.com.au Date: Thu, 4 Jun 2009 13:24:57 +0930 On Thu, 4 Jun 2009 06:32:53 am Eric Dumazet wrote: Also, taking a reference on socket for each xmit packet in flight is very expensive, since it slows down receiver in __udp4_lib_lookup

Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit

2009-05-29 Thread Eric Dumazet
Rusty Russell wrote: Various drivers do skb_orphan() because they do not free transmitted skbs in a timely manner (causing apps which hit their socket limits to stall, socket close to hang, etc.). DaveM points out that there are advantages to doing it generally (it's more likely to be on

Re: [PATCH 0/4] i386 - pte update optimizations

2007-04-13 Thread Eric Dumazet
Zachary Amsden wrote: Yes. Even then, last time I clocked instructions, xchg was still slower than read / write, although I could be misremembering. And it's not totally clear that they will always be in cached state, however, and for SMP, we still want to drop the implicit lock in