Re: TCP connection issues against Amazon S3

2015-01-08 Thread Eric Dumazet
On Thu, 2015-01-08 at 17:47 +, Erik Grinaker wrote: > FWIW, I've done a bisection, and it’s triggered by this change: > > https://github.com/torvalds/linux/commit/4e4f1fc226816905c937f9b29dabe351075dfe0f This totally makes sense, thanks for doing the bisection ! > > > We are not going to

Re: [PATCH] drivers: net: xgene: fix: Out of order descriptor bytes read

2015-01-26 Thread Eric Dumazet
On Mon, 2015-01-26 at 13:12 -0800, Iyappan Subramanian wrote: > On Thu, Jan 22, 2015 at 2:50 PM, Eric Dumazet wrote: > > On Thu, 2015-01-22 at 12:03 -0800, Iyappan Subramanian wrote: > >> This patch fixes the following kernel crash, > >> > >> WARNING: C

Re: [PATCH] drivers: net: xgene: fix: Out of order descriptor bytes read

2015-01-26 Thread Eric Dumazet
On Mon, 2015-01-26 at 13:27 -0800, Eric Dumazet wrote: > What happens if you compile a kernel with CONFIG_SMP=n ? > > > Most drivers in drivers/net use rmb() in this case, not smp_rmb() or > barrier() Note that dma_rmb() was recently added as well.

Re: [PATCH] lib/checksum.c: fix carry in csum_tcpudp_nofold

2015-01-27 Thread Eric Dumazet
On Wed, 2015-01-28 at 00:13 +0100, Karl Beldan wrote: > Here however I don't assume that a is "small", however I assume it has > never overflowed, which is trivial to verify since we only add 3 32bits > values and 2 16 bits values to a 64bits. > Now we just want (a + b + carry(a + b)) % 2^32, and
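The carry rule Karl Beldan states, (a + b + carry(a + b)) % 2^32, is end-around-carry (ones' complement) addition. Below is a minimal userspace sketch of that arithmetic. It assumes plain host-order integers, and the helper names add32_with_carry() and fold_to_csum16() are hypothetical; this is not the lib/checksum.c code under review.

```c
#include <stdint.h>

/* Ones' complement add of two 32-bit words: a carry out of bit 31
 * is added back into bit 0 ("end-around carry"). */
static uint32_t add32_with_carry(uint32_t a, uint32_t b)
{
	uint64_t s = (uint64_t)a + b;		/* at most 33 significant bits */

	return (uint32_t)(s + (s >> 32));	/* (a + b + carry(a + b)) % 2^32 */
}

/* Fold a 32-bit partial sum to 16 bits and complement it, as an
 * Internet checksum field expects. */
static uint16_t fold_to_csum16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);	/* may leave a carry in bit 16 */
	sum = (sum & 0xffff) + (sum >> 16);	/* absorb that carry */
	return (uint16_t)~sum;
}
```

Adding the carry back cannot overflow a second time: the largest possible sum, 0xffffffff + 0xffffffff = 0x1fffffffe, gives 0xfffffffe + 1 = 0xffffffff.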

Re: 3.17 kernel crash while loading IPoIB

2014-09-23 Thread Eric Dumazet
On Tue, 2014-09-23 at 05:15 +, Sharma, Karun wrote: > Hello: > > I am facing an issue wherein kernel 3.17 crashes while loading IPoIB > module. I guess the issue discussed in this thread > (https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg20963.html) is > similar. > > We were able

Re: [PATCH] net: optimise csum_replace4()

2014-09-23 Thread Eric Dumazet
e in registers. This also has > impact, > especially on RISC processors. In the same spirit as the change done by > Eric Dumazet on csum_replace2(), this patch rewrites > inet_proto_csum_replace4() > taking into account RFC1624. > > I spotted during a NATted tcp transfert that cs
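RFC 1624 (eqn. 3) updates a checksum HC after a 16-bit field changes from m to m' as HC' = ~(~HC + ~m + m'), using ones' complement addition. The sketch below applies that equation to a 32-bit field, such as an IPv4 address rewritten by NAT, as two 16-bit halves. It assumes host-order 16-bit words, and csum16() and csum_replace4_sketch() are hypothetical helpers rather than the kernel's csum_replace4() or inet_proto_csum_replace4().

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement add of two 16-bit words with end-around carry. */
static uint16_t oc_add16(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;

	return (uint16_t)(s + (s >> 16));
}

/* Full Internet checksum over n host-order 16-bit words. */
static uint16_t csum16(const uint16_t *w, size_t n)
{
	uint16_t s = 0;

	for (size_t i = 0; i < n; i++)
		s = oc_add16(s, w[i]);
	return (uint16_t)~s;
}

/* RFC 1624 eqn. 3, HC' = ~(~HC + ~m + m'), applied to both halves of a
 * 32-bit field that changes from `from` to `to`. */
static uint16_t csum_replace4_sketch(uint16_t check, uint32_t from, uint32_t to)
{
	uint16_t s = (uint16_t)~check;

	s = oc_add16(s, (uint16_t)~(from >> 16));
	s = oc_add16(s, (uint16_t)~(from & 0xffff));
	s = oc_add16(s, (uint16_t)(to >> 16));
	s = oc_add16(s, (uint16_t)(to & 0xffff));
	return (uint16_t)~s;
}
```

The incremental form only touches the words that changed instead of re-summing the whole header, and eqn. 3 avoids the negative-zero problem of the older RFC 1141 formula.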

Re: [PATCH] net: optimise inet_proto_csum_replace4()

2014-09-23 Thread Eric Dumazet
e in registers. This also has > impact, > especially on RISC processors. In the same spirit as the change done by > Eric Dumazet on csum_replace2(), this patch rewrites > inet_proto_csum_replace4() > taking into account RFC1624. > > I spotted during a NATted tcp transfert that cs

Re: [PATCH net-next] net: keep original skb which only needs header checking during software GSO

2014-09-19 Thread Eric Dumazet
On Fri, 2014-09-19 at 14:38 +0800, Jason Wang wrote: > Commit ce93718fb7cdbc064c3000ff59e4d3200bdfa744 ("net: Don't keep > around original SKB when we software segment GSO frames") frees the > original skb after software GSO even for dodgy gso skbs. This breaks > the stream throughput from untruste

Re: [PATCH net-next V2] net: keep original skb which only needs header checking during software GSO

2014-09-19 Thread Eric Dumazet
44 ("net: Don't keep > around original SKB when we software segment GSO frames.") > > Cc: David S. Miller > Cc: Eric Dumazet > Signed-off-by: Jason Wang > --- Acked-by: Eric Dumazet -- To unsubscribe from this list: send the line "unsubscribe linux-kern

Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()

2014-10-15 Thread Eric Dumazet
On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote: > Accumulate the sent packets and sent bytes in local variables and perform a > single u64_stats_update_begin/end() after. > > Cc: Rusty Russell > Cc: Michael S. Tsirkin > Signed-off-by: Jason Wang > --- > drivers/net/virtio_net.c | 12 +++

Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt

2014-10-15 Thread Eric Dumazet
On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote: ... > +static int free_old_xmit_skbs(struct send_queue *sq, int budget) > +{ > + struct sk_buff *skb; > + unsigned int len; > + struct virtnet_info *vi = sq->vq->vdev->priv; > + struct virtnet_stats *stats = this_cpu_ptr(vi->sta

Re: Regarding tx-nocache-copy in the Sheevaplug

2014-10-15 Thread Eric Dumazet
On Wed, 2014-10-15 at 14:57 -0700, Benjamin Poirier wrote: > On 2014/10/13 12:52, Lluís Batlle i Rossell wrote: > > Hello, > > > > on the 7th of January 2014 ths patch was applied: > > https://lkml.org/lkml/2014/1/7/307 > > > > [PATCH v2] net: Do not enable tx-nocache-copy by default > >

Re: Regarding tx-nocache-copy in the Sheevaplug

2014-10-16 Thread Eric Dumazet
On Thu, 2014-10-16 at 10:34 -0700, Benjamin Poirier wrote: > On 2014/10/15 15:45, Eric Dumazet wrote: > > kmap_atomic()/kunmap_atomic() is missing, so we lack > > __cpuc_flush_dcache_area() operations. > > > > You lost me there. > 1) I don't see the link >

Re: getaddrinfo slowdown in 3.17.1, due to getifaddrs

2014-10-17 Thread Eric Dumazet
On Fri, 2014-10-17 at 07:25 +0100, Thomas Graf wrote: > On 10/17/14 at 02:34am, Steinar H. Gunderson wrote: > > On Fri, Oct 17, 2014 at 02:21:32AM +0200, Steinar H. Gunderson wrote: > > > Hi, > > > > > > We recently upgraded a machine from 3.14.5 to 3.17.1, and a Perl script > > > we're > > > run

Re: getaddrinfo slowdown in 3.17.1, due to getifaddrs

2014-10-17 Thread Eric Dumazet
On Fri, 2014-10-17 at 12:30 -0400, David Miller wrote: > Can I ask a serious question? What is the synchronize_net() in AF_NETLINK > exactly needed for? __netlink_lookup() calls rhashtable_lookup_compare() So you really want that any object found in this lookup respects rcu grace period before

Re: Upstream kernel build failures in include/net/tcp.h

2014-10-18 Thread Eric Dumazet
On Sat, 2014-10-18 at 21:37 -0700, Guenter Roeck wrote: > Hi, > > I am getting lots of build failures with the latest upstream kernel. > > In file included from net/core/sock.c:140:0: > include/net/tcp.h: In function 'tcp_v6_iif': > include/net/tcp.h:738:32: error: 'union ' has no member named 'h

Re: [PATCH] virtio_net: fix use after free

2014-10-30 Thread Eric Dumazet
On Wed, 2014-10-15 at 16:23 +0300, Michael S. Tsirkin wrote: > commit 0b725a2ca61bedc33a2a63d0451d528b268cf975 > net: Remove ndo_xmit_flush netdev operation, use signalling instead. > > added code that looks at skb->xmit_more after the skb has > been put in TX VQ. Since some paths process the

Re: [PATCH] virtio_net: fix use after free

2014-10-31 Thread Eric Dumazet
On Fri, 2014-10-31 at 14:07 +0800, Jason Wang wrote: > Since they are called before the possible free_old_xmit_skbs(), skb > won't get freed at this time. Oh right, I forgot there is no completion handler yet, timer based or whatever. Thanks.

Re: [PATCH] ipv4: avoid divide 0 error in tcp_incr_quickack

2014-10-31 Thread Eric Dumazet
On Fri, 2014-10-31 at 09:24 -0700, Alexei Starovoitov wrote: > cc-ing netdev > > On Fri, Oct 31, 2014 at 7:50 AM, Chen Weilong wrote: > > From: Weilong Chen > > > > We got a problem like this: > > [8801c1a05570] machine_kexec at 81025039 > > [8801c1a055d0] crash_kexec at ff

[PATHC] net: napi_reuse_skb() should check pfmemalloc

2014-10-23 Thread Eric Dumazet
From: Eric Dumazet Do not reuse skb if it was pfmemalloc tainted, otherwise future frame might be dropped anyway. Signed-off-by: Eric Dumazet --- net/core/dev.c |4 1 file changed, 4 insertions(+) diff --git a/net/core/dev.c b/net/core/dev.c index b793e3521a36..945bbd001359 100644

Re: vmalloced stacks on x86_64?

2014-10-26 Thread Eric Dumazet
On Fri, 2014-10-24 at 19:38 -0700, H. Peter Anvin wrote: > On 10/24/2014 05:22 PM, Andy Lutomirski wrote: > > Is there any good reason not to use vmalloc for x86_64 stacks? > > Additional TLB pressure if anything else. It seems TLB pressure gets less and less interest these days... Is it still w

Re: [PATCH net] tcp: fix connect() invalid -EADDRNOTAVAIL error

2014-11-19 Thread Eric Dumazet
On Wed, 2014-11-19 at 17:37 +1100, Jon Maxwell wrote: > Prerequisites for this to happen: > 1) The local tcp port range must be exhausted. > 2) A process must have called bind() followed by connect() for all > local ports. How the bind() is done exactly ? How SO_REUSEADDR is used ? > 3) A diffe

Re: [PATCH net] tcp: fix connect() invalid -EADDRNOTAVAIL error

2014-11-19 Thread Eric Dumazet
On Thu, 2014-11-20 at 14:44 +1100, Jonathan Maxwell wrote: > > > Prerequisites for this to happen: > > > 1) The local tcp port range must be exhausted. > > > 2) A process must have called bind() followed by connect() for all > > > local ports. > > > > How the bind() is done exactly ? How SO_REUSEA

Re: [RFC] situation with csum_and_copy_... API

2014-11-20 Thread Eric Dumazet
On Thu, 2014-11-20 at 21:47 +, Al Viro wrote: > As far as I can see, these retries on the send side are simply broken - > normally we are talking to TCP sockets there and tcp_sendmsg() does *not* > modify iovec in normal case. Arg... I sent this morning something doing this (against net-next

Re: [RFC] situation with csum_and_copy_... API

2014-11-20 Thread Eric Dumazet
On Thu, 2014-11-20 at 22:25 +, Al Viro wrote: > Yes, it is. You are breaking several _other_ kernel_sendmsg() users. > They are already slightly broken, but that'll make breakage much more > common. > > Please, don't - the right thing to do is to have iov_iter in msghdr > (we already have th

Re: [PATCH] tcp: Restore RFC5961-compliant behavior for SYN packets

2014-11-20 Thread Eric Dumazet
On Thu, 2014-11-20 at 15:09 -0800, Calvin Owens wrote: > Commit c3ae62af8e755 ("tcp: should drop incoming frames without ACK > flag set") was created to mitigate a security vulnerability in which a > local attacker is able to inject data into locally-opened sockets by > using TCP protocol statistic

Re: [PATCH] tcp: Restore RFC5961-compliant behavior for SYN packets

2014-11-20 Thread Eric Dumazet
counter > only if it is set when a valid RST packet is seen. Seems tricky, a Challenge ACK do not necessarily gives an RST. Anyway this certainly can wait, as we already have a sysctl to eventually work around the issue. Acked-by: Eric Dumazet Thanks !

Re: [RFC] situation with csum_and_copy_... API

2014-11-21 Thread Eric Dumazet
On Fri, 2014-11-21 at 08:49 +, Al Viro wrote: > Another thing is tcp_sendmsg_fastopen() and tcp_send_rcvq(). The latter > should just use copy_from_iter() instead of memcpy_from_iovec(), the former > is dealt with by making tcp_send_syn_data() use the same copy_from_iter() > instead of memcpy

Re: [PATCH v2 9/9] netfilter: Replace smp_read_barrier_depends() with lockless_dereference()

2014-11-21 Thread Eric Dumazet
On Fri, 2014-11-21 at 10:06 -0500, Pranith Kumar wrote: > Recently lockless_dereference() was added which can be used in place of > hard-coding smp_read_barrier_depends(). The following PATCH makes the change. > > Signed-off-by: Pranith Kumar > --- > net/ipv4/netfilter/arp_tables.c | 3 +-- > ne

Re: [PATCH V2 linux-next] net: fix rcu access on phonet_routes

2014-10-06 Thread Eric Dumazet
rse errors): > net/phonet/pn_dev.c:278:25: error: incompatible types in comparison > expression (different address spaces) > net/phonet/pn_dev.c:391:17: error: incompatible types in comparison > expression (different address spaces) > > Suggested-by: Eric Dumazet > Signed-of

Re: [PATCH net-next] net: core: Quiet W=1 warnings for unused vars and static functions

2014-10-06 Thread Eric Dumazet
On Mon, 2014-10-06 at 14:51 -0700, Joe Perches wrote: > Reduce noise when compiling W=1. > > All the variables are unused. > The functions are not called outside of the file so static > is preferred. > > Signed-off-by: Joe Perches > --- > > John, can you please verify that these gen_stats acces

Re: [PATCH] net: fec: fix regression on i.MX28 introduced by rx_copybreak support

2014-10-07 Thread Eric Dumazet
On Tue, 2014-10-07 at 15:19 +0200, Lothar Waßmann wrote: > commit 1b7bde6d659d ("net: fec: implement rx_copybreak to improve rx > performance") > introduced a regression for i.MX28. The swap_buffer() function doing > the endian conversion of the received data on i.MX28 may access memory > beyond t

RE: [PATCH] net: fec: fix regression on i.MX28 introduced by rx_copybreak support

2014-10-07 Thread Eric Dumazet
On Tue, 2014-10-07 at 14:23 +, David Laight wrote: > The point I was making is that if you have to do a read-write of the received > data (to byteswap it) then you might as well always copy it into a new skb > that > is just big enough for the actual receive frame. +1
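David Laight's suggestion can be sketched as: allocate a buffer sized to the received frame (rounded up to a 32-bit multiple) and do the byte swap while copying, reading only the len bytes that were actually received. This is a hypothetical userspace helper, copy_swab32_frame(), not fec driver code, and it assumes the required conversion is a byte swap of each 32-bit word.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: return a new buffer of round_up(len, 4) bytes holding
 * src with every 32-bit word byte-swapped. Only src[0..len-1] is read; bytes
 * that would come from past the end of the frame are zero-filled. */
static uint8_t *copy_swab32_frame(const uint8_t *src, size_t len, size_t *out_len)
{
	size_t padded = (len + 3) & ~(size_t)3;
	uint8_t *dst = malloc(padded ? padded : 1);

	if (!dst)
		return NULL;
	for (size_t i = 0; i < padded; i++) {
		/* output byte i comes from the mirrored byte of the same word */
		size_t j = (i & ~(size_t)3) + (3 - (i & 3));

		dst[i] = j < len ? src[j] : 0;
	}
	*out_len = padded;
	return dst;
}
```

In a driver the destination would be a freshly allocated skb of that size; since the swap has to touch every byte anyway, the copy costs little extra.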

Re: randconfig build error with next-20141008, in drivers/net/ethernet/mellanox/mlx4/en_tx.c

2014-10-08 Thread Eric Dumazet
On Wed, 2014-10-08 at 07:42 -0700, Jim Davis wrote: > Building with the attached random configuration file, > > drivers/net/ethernet/mellanox/mlx4/en_tx.c: In function > ‘mlx4_en_process_tx_cq’: > drivers/net/ethernet/mellanox/mlx4/en_tx.c:395:27: error: ‘struct > netdev_queue’ > has no member n

Re: [PATCH] net: description of dma_cookie cause make xmldocs warning

2014-10-08 Thread Eric Dumazet
On Wed, 2014-10-08 at 19:59 +0400, Sergei Shtylyov wrote: > Hello. > > On 10/08/2014 06:53 PM, Masanari Iida wrote: > > > In commit 7bced397510ab569d31de4c70b39e13355046387, > > Please also specify that commit's summary line in parens. Note that this information is useful for backports and

Re: [Xen-devel] BUG in xennet_make_frags with paged skb data

2014-11-07 Thread Eric Dumazet
On Fri, 2014-11-07 at 09:25 +, Zoltan Kiss wrote: Please do not top post. > Hi, > > AFAIK in this scenario your skb frag is wrong. The page pointer should > point to the original compound page (not a member of it), and offset > should be set accordingly. > For example, if your compound pag

[PATCH] mm/vmalloc.c: add a schedule point to vmalloc()

2014-06-24 Thread Eric Dumazet
From: Eric Dumazet It is not uncommon on busy servers to get stuck hundred of ms in vmalloc() calls (like file descriptor expansions). Add a cond_resched() to __vmalloc_area_node() to be gentle to other tasks. Signed-off-by: Eric Dumazet Cc: Hugh Dickins Cc: David Rientjes --- mm/vmalloc.c

Re: [PATCH] tcp: fix setting csum_start in tcp_gso_segment

2014-06-25 Thread Eric Dumazet
On Tue, 2014-06-24 at 21:03 -0700, Tom Herbert wrote: > > It looks like a likely culprit is that SKB_GSO_CB()->csum_start is > not set correctly when doing non-scatter gather. We are using > offset as opposed to doffset. > > Reported-by: Dave Jones > Signed-off-by: Tom Herbert > --- > net/cor

Re: [PATCH] appletalk: Set skb with destructor

2014-07-06 Thread Eric Dumazet
On Sun, 2014-07-06 at 14:56 +0300, Andrey Utkin wrote: > The sock ref counting is off so there is a kernel panic when you run > `atalkd`. See https://bugzilla.kernel.org/show_bug.cgi?id=79441 > This fix is similar to 0ae89beb283a ('can: add destructor for self > generated skbs') > should > Report

Re: [PATCH] appletalk: Set skb with destructor

2014-07-07 Thread Eric Dumazet
On Mon, 2014-07-07 at 11:03 +0300, Andrey Utkin wrote: > 2014-07-07 0:42 GMT+03:00 Eric Dumazet : > >> /* Queue packet (standard) */ > >> + sock_hold(sock); > >> + skb->destructor = atalk_skb_destructor; > >> skb->sk = sock; > >

Re: [PATCH] appletalk: Set skb with destructor

2014-07-07 Thread Eric Dumazet
On Mon, 2014-07-07 at 10:57 +0200, Eric Dumazet wrote: > On Mon, 2014-07-07 at 11:03 +0300, Andrey Utkin wrote: > > 2014-07-07 0:42 GMT+03:00 Eric Dumazet : > > >> /* Queue packet (standard) */ > > >> + sock_hold(sock); > > >>

Re: [PATCH] appletalk: Set skb with destructor

2014-07-07 Thread Eric Dumazet
On Mon, 2014-07-07 at 13:02 +0300, Andrey Utkin wrote: > 2014-07-07 12:26 GMT+03:00 Eric Dumazet : > > Reading again this code, I think all you need is to remove the 2 buggy > > lines. > > > > No need for setup destructors. > > Reviewing the code again, i find you

Re: [PATCH] appletalk: Fix socket referencing in skb

2014-07-07 Thread Eric Dumazet
;sk. > Thanks to Eric Dumazet for correct solution. > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79441 > Reported-by: Ed Martin > Signed-off-by: Andrey Utkin > --- Thanks ! Signed-off-by: Eric Dumazet

Re: [PATCH net-next] virtio-net: rx busy polling support

2014-07-15 Thread Eric Dumazet
On Tue, 2014-07-15 at 17:41 +0800, Jason Wang wrote: > Add basic support for rx busy polling. > > 1 byte netperf tcp_rr on mlx4 shows 116% improvement: the transaction > rate was increased from 9151.94 to 19787.37. This is a misleading changelog. You forgot to describe how you allowed busy polli

Re: [PATCH net v6 4/4] tg3: Fix tx_pending checks for tg3_tso_bug

2014-09-05 Thread Eric Dumazet
On Fri, 2014-09-05 at 16:35 -0700, Prashant Sreedharan wrote: > fyi.. Initially the driver was doing a skb_copy() > (tigon3_dma_hwbug_workaround()) for LSO skb that met HW bug conditions > but users started reporting page allocation failures due to copying of > large LSO skbs. To avoid this Commit

Re: [PATCH] Freeing dst when the reference count <0 causes general protection fault, it could be a major security flaw as rogue app can modify dst to crash kernel.

2014-09-13 Thread Eric Dumazet
On Sat, 2014-09-13 at 11:35 -0700, shakil A Khan wrote: > On Saturday, September 13, 2014 04:50:22 AM Eric Dumazet wrote: > > On Sat, 2014-09-13 at 01:27 -0700, Shakil A Khan wrote: > > > Signed-off-by: Shakil A Khan > > > --- > > > > > > net/

Re: [PATCH 1/5] rhashtable: Remove gfp_flags from insert and remove functions

2014-09-15 Thread Eric Dumazet
On Mon, 2014-09-15 at 14:18 +0200, Thomas Graf wrote: > As the expansion/shrinking is moved to a worker thread, no allocations > will be performed anymore. > You meant : no GFP_ATOMIC allocations ? I would rephrase using something like : Because hash resizes are potentially time consuming, they

Re: [PATCH 1/5] rhashtable: Remove gfp_flags from insert and remove functions

2014-09-15 Thread Eric Dumazet
On Mon, 2014-09-15 at 13:49 +0100, Thomas Graf wrote: > Agreed. Will introduce this through a table parameter option when > converting the inet hash table. I am not sure you covered the /proc/net/tcp problem yet ? (or inet_diag)

Re: [PATCH 5/5] rhashtable: Per bucket locks & expansion/shrinking in work queue

2014-09-15 Thread Eric Dumazet
On Mon, 2014-09-15 at 14:18 +0200, Thomas Graf wrote: > +static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table > *tbl) > +{ > + unsigned int i, size; > +#if defined(CONFIG_PROVE_LOCKING) > + unsigned int nr_pcpus = 2; > +#else > + unsigned int nr_pcpus = num_possibl

Re: [PATCH net-next v2 3/3 RFC] pktgen: Allow sending TCP packets

2014-07-02 Thread Eric Dumazet
On Wed, 2014-07-02 at 20:54 +0100, Zoltan Kiss wrote: > This is a prototype patch to enable sending TCP packets with pktgen. The > original motivation is to test TCP GSO with xen-netback/netfront, but I'm not > sure about how the checksum should be set up, and also someone should verify > the > GS

RE: [PATCH v6 2/4] net: moxa: replace build_skb() with netdev_alloc_skb_ip_align() / memcpy()

2014-08-26 Thread Eric Dumazet
On Tue, 2014-08-26 at 09:10 +, David Laight wrote: > From: Arnd Bergmann > > While this seems correct, I wonder why you don't do the normal approach of > > dequeuing the skb from the chain and adding a newly allocated skb to it to > > save the memcpy. > > Because the receive buffer area isn't

Re: [PATCH] bonding: bond_alb: Replace rcu_dereference() with rcu_access_pointer()

2014-08-27 Thread Eric Dumazet
On Wed, 2014-08-27 at 17:18 +0300, Andreea-Cristina Bernat wrote: > The "curr_active_slave" local variable obtained through the rcu_dereference() > call it is not dereferenced in the rest of the function. > Therefore, it is recommended to use rcu_access_pointer() instead of > rcu_dereference(). > T

Re: RTNL: assertion failed at net/ipv6/addrconf.c (1699)

2014-09-02 Thread Eric Dumazet
On Tue, 2014-09-02 at 11:04 -0700, Cong Wang wrote: > On Tue, Sep 2, 2014 at 10:58 AM, Hannes Frederic Sowa > > I definitely don't have a problem cleaning this up in net-next. I wanted > > a minimal patch for stable because I didn't check history where and when > > additional users of dev_get_by_f

Re: RTNL: assertion failed at net/ipv6/addrconf.c (1699)

2014-09-02 Thread Eric Dumazet
On Tue, 2014-09-02 at 11:15 -0700, Cong Wang wrote: > That is what we do when backporting patches, I can do that if David asks > me to backport it, but you know for netdev that is David's work. > > (I am not saying I don't want to help him, I just want to point out the fact. > I am very pleased t

Re: [PATCH net-next v2] net: bpf: make eBPF interpreter images read-only

2014-09-02 Thread Eric Dumazet
On Tue, 2014-09-02 at 14:31 -0700, Alexei Starovoitov wrote: > > +static inline void bpf_prog_unlock_ro(struct bpf_prog *fp) > > +{ > > + set_memory_rw((unsigned long)fp, fp->pages); > > why rw is needed? > since fp is allocated with vmalloc, vfree doesn't need > to touch the pages to free

Re: [PATCH] Freeing dst when the reference count <0 causes general protection fault, it could be a major security flaw as rogue app can modify dst to crash kernel.

2014-09-13 Thread Eric Dumazet
On Sat, 2014-09-13 at 01:27 -0700, Shakil A Khan wrote: > Signed-off-by: Shakil A Khan > --- > net/core/dst.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/net/core/dst.c b/net/core/dst.c > index a028409..6a848b0 100644 > --- a/net/core/dst.c > +++ b/net/core/dst.c

Re: [PATCH net-next 2/3] netlink: Convert netlink_lookup() to use RCU protected hash table

2014-08-04 Thread Eric Dumazet
etlink_seq_stop(). > -- Yes, two places use rht_dereference() instead of rht_dereference_rcu() [PATCH net-next] netlink: fix lockdep splats With netlink_lookup() conversion to RCU, we need to use appropriate rcu dereference in netlink_seq_socket_idx() & netlink_seq_next() Reported-by: Sasha L

Re: [RFC] net: Replace del_timer() with del_timer_sync()

2014-08-06 Thread Eric Dumazet
On Thu, 2014-08-07 at 11:48 +0530, Deepak wrote: > on SMP system, del_timer() might return even if the timer function > is running on other cpu so sk_stop_timer() will execute __sock_put() > while timer is accessing the socket on other cpu causing > "use-after-free". > > This commi

RE: [RFC] net: Replace del_timer() with del_timer_sync()

2014-08-07 Thread Eric Dumazet
On Thu, 2014-08-07 at 15:15 +, Das, Deepak wrote: Please do not top post on netdev, thanks. > I apologies for not explaining the scenario previously. > > sk_stop_timer() is used to stop the tcp timers with expiry callback > tcp_write_timer(), tcp_delack_timer(), tcp_keepalive_timer(), ... >

Re: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install

2015-04-21 Thread Eric Dumazet
On Tue, 2015-04-21 at 22:06 +0200, Mateusz Guzik wrote: > On Tue, Apr 21, 2015 at 11:05:43AM -0700, Eric Dumazet wrote: > > 3) I avoid multiple threads doing a resize and then only one wins the > > deal. > > > > One could argue this last bit could be committed separa

Re: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install

2015-04-21 Thread Eric Dumazet
On Tue, 2015-04-21 at 22:12 +0200, Mateusz Guzik wrote: > in dup_fd: >for (i = open_files; i != 0; i--) { > struct file *f = *old_fds++; > if (f) { > get_file(f); > I see no new requirement here. f is either NULL or not. multi threa

Re: [PATCH 1/2] timer: Avoid waking up an idle-core by migrate running timer

2015-04-21 Thread Eric Dumazet
On Tue, 2015-04-21 at 23:32 +0200, Thomas Gleixner wrote: > > Are you realizing that __mod_timer() is a massive hotpath for network > heavy workloads? BTW I was considering using mod_timer_pinned() from these networking timers (ie sk_reset_timer()) get_nohz_timer_target() sounds cool for laptop

[PATCH] fs/file.c: don't acquire files->file_lock in fd_install()

2015-04-21 Thread Eric Dumazet
From: Eric Dumazet Mateusz Guzik reported : Currently obtaining a new file descriptor results in locking fdtable twice - once in order to reserve a slot and second time to fill it. Holding the spinlock in __fd_install() is needed in case a resize is done, or to prevent a resize. Mateusz

Re: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install

2015-04-22 Thread Eric Dumazet
On Wed, 2015-04-22 at 15:31 +0200, Mateusz Guzik wrote: > On Tue, Apr 21, 2015 at 02:06:53PM -0700, Eric Dumazet wrote: > > On Tue, 2015-04-21 at 22:12 +0200, Mateusz Guzik wrote: > > > > > in dup_fd: > > >for (i = open_files; i != 0; i--) { > > >

Re: [PATCH 1/2] timer: Avoid waking up an idle-core by migrate running timer

2015-04-22 Thread Eric Dumazet
On Wed, 2015-04-22 at 17:29 +0200, Peter Zijlstra wrote: > Hmm, that sounds unfortunate, this would wreck life for the power aware > laptop/tablet etc.. people. > > There is already a sysctl to manage this, is that not enough to mitigate > this problem on the server side of things? The thing is

Re: [PATCH 1/2] timer: Avoid waking up an idle-core by migrate running timer

2015-04-22 Thread Eric Dumazet
On Wed, 2015-04-22 at 20:56 +0200, Thomas Gleixner wrote: > On Wed, 22 Apr 2015, Eric Dumazet wrote: > > Check commit 4a8e320c929991c9480 ("net: sched: use pinned timers") > > for a specific example of the problems that can be raised. > > If you have a problem w

Re: [PATCH 1/2] timer: Avoid waking up an idle-core by migrate running timer

2015-04-22 Thread Eric Dumazet
On Wed, 2015-04-22 at 23:56 +0200, Thomas Gleixner wrote: > -int get_nohz_timer_target(int pinned) > +int get_nohz_timer_target(void) > { > - int cpu = smp_processor_id(); > - int i; > + int i, cpu = smp_processor_id(); > struct sched_domain *sd; > > - if (pinned || !get_

Re: net: non contiguous allocations passed to build_skb

2015-04-24 Thread Eric Dumazet
On Fri, 2015-04-24 at 17:28 -0400, Sasha Levin wrote: > Hey Eric, > > Your commit 79930f5892e ("net: do not deplete pfmemalloc reserve") assumes > that > build_skb() will only handle contiguous allocations because of the > virt_to_head_page(). > > However, netlink_sendmsg() calls build_skb() wit

Re: net: non contiguous allocations passed to build_skb

2015-04-24 Thread Eric Dumazet
On Fri, 2015-04-24 at 15:02 -0700, Eric Dumazet wrote: > On Fri, 2015-04-24 at 17:28 -0400, Sasha Levin wrote: > > Hey Eric, > > > > Your commit 79930f5892e ("net: do not deplete pfmemalloc reserve") assumes > > that > > build_skb() will only h

Re: [PATCH] netns: deinline net_generic()

2015-04-14 Thread Eric Dumazet
On Tue, 2015-04-14 at 14:25 +0200, Denys Vlasenko wrote: > On x86 allyesconfig build: > The function compiles to 130 bytes of machine code. > It has 493 callsites. > Total reduction of vmlinux size: 27906 bytes. > >textdata bss dec hex filename > 82447071 22255384 20

Re: [tip:timers/core] timer: Allocate per-cpu tvec_base's statically

2015-04-14 Thread Eric Dumazet
On Thu, 2015-04-02 at 11:47 -0700, tip-bot for Peter Zijlstra wrote: > Commit-ID: b337a9380f7effd60d082569dd7e0b97a7549730 > Gitweb: http://git.kernel.org/tip/b337a9380f7effd60d082569dd7e0b97a7549730 > Author: Peter Zijlstra > AuthorDate: Tue, 31 Mar 2015 20:49:00 +0530 > Committer: Ingo

Re: [PATCH] netns: deinline net_generic()

2015-04-14 Thread Eric Dumazet
On Tue, 2015-04-14 at 15:57 +0200, Denys Vlasenko wrote: > My allyesconfig, with BUG_ON's commented out: > Right. But I can tell you nobody uses lockdep on a production kernel. Here, at Google, we get what I described.

Re: [PATCH] netns: deinline net_generic()

2015-04-14 Thread Eric Dumazet
On Tue, 2015-04-14 at 17:04 +0200, Denys Vlasenko wrote: > On 04/14/2015 04:21 PM, Eric Dumazet wrote: > > On Tue, 2015-04-14 at 15:57 +0200, Denys Vlasenko wrote: > > > >> My allyesconfig, with BUG_ON's commented out: > >> > > > > Right. But

Re: [PATCH linux-next 1/4] infiniband/ipoib: fix possible NULL pointer dereference in ipoib_get_iflink

2015-04-14 Thread Eric Dumazet
On Tue, 2015-04-14 at 23:20 +0800, Honggang Li wrote: > Starting monitoring for VG vg_rdma01: 3 logical volume(s) in volume > group "vg_rdma01" monitored > [ OK ] > CR2: 0120 > ---[ end trace a8610f6e9640eb85 ]--- > > Signed-off-by: Honggang Li When was this bug added ? Please

Re: [PATCH linux-next 1/4] infiniband/ipoib: fix possible NULL pointer dereference in ipoib_get_iflink

2015-04-14 Thread Eric Dumazet
On Tue, 2015-04-14 at 23:53 +0800, Honggang LI wrote: > On Tue, Apr 14, 2015 at 05:49:55PM +0200, Nicolas Dichtel wrote: > > Le 14/04/2015 17:44, Honggang LI a écrit : > > >On Tue, Apr 14, 2015 at 08:34:33AM -0700, Eric Dumazet wrote: > > >>On Tue, 2015-04-14 at

Re: [Xen-devel] "tcp: refine TSO autosizing" causes performance regression on Xen

2015-04-15 Thread Eric Dumazet
On Wed, 2015-04-15 at 14:43 +0100, George Dunlap wrote: > On Mon, Apr 13, 2015 at 2:49 PM, Eric Dumazet wrote: > > On Mon, 2015-04-13 at 11:56 +0100, George Dunlap wrote: > > > >> Is the problem perhaps that netback/netfront delays TX completion? > >> Would i

Re: [Xen-devel] "tcp: refine TSO autosizing" causes performance regression on Xen

2015-04-15 Thread Eric Dumazet
On Wed, 2015-04-15 at 18:23 +0100, George Dunlap wrote: > On 04/15/2015 05:38 PM, Eric Dumazet wrote: > > My thoughts that instead of these long talks you should guys read the > > code : > > > > /* TCP Small Queues : > > * C

Re: [Xen-devel] "tcp: refine TSO autosizing" causes performance regression on Xen

2015-04-15 Thread Eric Dumazet
On Wed, 2015-04-15 at 18:23 +0100, George Dunlap wrote: > Which means that max(2*skb->truesize, sk->sk_pacing_rate >>10) is > *already* larger for Xen; that calculation mentioned in the comment is > *already* doing the right thing. Sigh. 1ms of traffic at 40Gbit is 5 MBytes The reason for the c
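The arithmetic behind "1ms of traffic at 40Gbit is 5 MBytes": sk_pacing_rate is expressed in bytes per second, so shifting it right by 10 divides by 1024, roughly one millisecond worth of data. The sketch below evaluates the quoted expression max(2*skb->truesize, sk->sk_pacing_rate >> 10). The function name is hypothetical, and the static_cap parameter is an assumption, modeled on the sysctl-controlled "static TSQ limit" discussed later in this thread.

```c
#include <stdint.h>

/* Hypothetical sketch of the TCP Small Queues budget quoted above:
 * max(2 * truesize, pacing_rate >> 10), then clamped to a static cap.
 * pacing_rate is in bytes per second; >> 10 is ~1 ms of data. */
static uint64_t tsq_limit_sketch(uint64_t truesize, uint64_t pacing_rate,
				 uint64_t static_cap)
{
	uint64_t two_skbs = 2 * truesize;
	uint64_t one_ms = pacing_rate >> 10;	/* divide by 1024, not 1000 */
	uint64_t limit = two_skbs > one_ms ? two_skbs : one_ms;

	return limit < static_cap ? limit : static_cap;
}
```

At 40 Gbit/s (5,000,000,000 bytes/s) the 1 ms term is 4,882,812 bytes, about 4.9 MB. At 100 Mbit/s (12,500,000 bytes/s) it is only 12,207 bytes, so 2*truesize dominates.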

Re: [Xen-devel] "tcp: refine TSO autosizing" causes performance regression on Xen

2015-04-15 Thread Eric Dumazet
On Wed, 2015-04-15 at 18:41 +0100, George Dunlap wrote: > So you'd be OK with a patch like this? (With perhaps a better changelog?) > > -George > > --- > TSQ: Raise default static TSQ limit > > A new dynamic TSQ limit was introduced in c/s 605ad7f18 based on the > size of actual packets and t

Re: [Xen-devel] "tcp: refine TSO autosizing" causes performance regression on Xen

2015-04-15 Thread Eric Dumazet
On Wed, 2015-04-15 at 10:55 -0700, Rick Jones wrote: > > > > Have you tested this patch on a NIC without GSO/TSO ? > > > > This would allow more than 500 packets for a single flow. > > > > Hello bufferbloat. > > Woudln't the fq_codel qdisc on that interface address that problem? Last time I check

Re: [Xen-devel] "tcp: refine TSO autosizing" causes performance regression on Xen

2015-04-15 Thread Eric Dumazet
On Wed, 2015-04-15 at 18:58 +0100, Stefano Stabellini wrote: > On Wed, 15 Apr 2015, Eric Dumazet wrote: > > On Wed, 2015-04-15 at 18:23 +0100, George Dunlap wrote: > > > > > Which means that max(2*skb->truesize, sk->sk_pacing_rate >>10) is > > >

Re: [Xen-devel] "tcp: refine TSO autosizing" causes performance regression on Xen

2015-04-15 Thread Eric Dumazet
On Wed, 2015-04-15 at 19:04 +0100, George Dunlap wrote: > Maybe you should stop wasting all of our time and just tell us what > you're thinking. I think you make me wasting my time. I already gave all the hints in prior discussions. Rome was not built in one day.

Re: [Xen-devel] "tcp: refine TSO autosizing" causes performance regression on Xen

2015-04-15 Thread Eric Dumazet
On Wed, 2015-04-15 at 11:19 -0700, Rick Jones wrote: > Well, I'm not sure that it is George and Jonathan themselves who don't > want to change a sysctl, but the customers who would have to tweak that > in their VMs? Keep in mind some VM users install custom qdisc, or even custom TCP sysctls.

Re: [GIT PULL] kdbus for 4.1-rc1

2015-04-15 Thread Eric Dumazet
On Wed, 2015-04-15 at 23:48 +0100, Al Viro wrote: > On Wed, Apr 15, 2015 at 03:28:58PM -0700, Andy Lutomirski wrote: > > On Wed, Apr 15, 2015 at 3:18 PM, Al Viro wrote: > > > On Wed, Apr 15, 2015 at 03:11:17PM -0700, Andy Lutomirski wrote: > > > > > >> This is functionally identical to passing AF_

Re: [Xen-devel] "tcp: refine TSO autosizing" causes performance regression on Xen

2015-04-15 Thread Eric Dumazet
On Thu, 2015-04-16 at 12:20 +0800, Herbert Xu wrote: > Eric Dumazet wrote: > > > > We already have netdev->gso_max_size and netdev->gso_max_segs > > which are cached into sk->sk_gso_max_size & sk->sk_gso_max_segs > > It is quite dangerous to attempt tri
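
Eric's point is that a device can already bound TSO/GSO packets through netdev->gso_max_size and netdev->gso_max_segs, which TCP caches per socket. The sketch below shows how such caps could interact with pacing-based TSO autosizing. It is modeled loosely on my reading of tcp_tso_autosize() from the "tcp: refine TSO autosizing" commit and is not a copy of it; header_reserve is a hypothetical stand-in for MAX_TCP_HEADER, and min_tso_segs=2 assumes the tcp_min_tso_segs default.

```python
GSO_MAX_SIZE = 65536   # default netdev->gso_max_size
GSO_MAX_SEGS = 65535   # default netdev->gso_max_segs


def tso_autosize(pacing_rate: int, mss: int,
                 gso_max_size: int = GSO_MAX_SIZE,
                 gso_max_segs: int = GSO_MAX_SEGS,
                 header_reserve: int = 320,
                 min_tso_segs: int = 2) -> int:
    """Segments to pack into one TSO/GSO packet for a paced flow.

    header_reserve stands in for MAX_TCP_HEADER; 320 is a hypothetical value.
    """
    # Target about 1 ms of data per packet, but never exceed the size the
    # device advertised (cached in sk->sk_gso_max_size).
    nbytes = min(pacing_rate >> 10, gso_max_size - 1 - header_reserve)
    # Keep a floor so slow flows still get small TSO batches.
    segs = max(nbytes // mss, min_tso_segs)
    # Finally respect the device's segment-count limit (sk->sk_gso_max_segs).
    return min(segs, gso_max_segs)


if __name__ == "__main__":
    print(tso_autosize(1_250_000_000, 1448))                   # size-capped
    print(tso_autosize(1_250_000_000, 1448, gso_max_segs=16))  # segs-capped
```

A driver that lowers gso_max_segs therefore shrinks every TSO packet without any new sysctl, which is the kind of per-device knob Eric points to here.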

Re: [Xen-devel] "tcp: refine TSO autosizing" causes performance regression on Xen

2015-04-16 Thread Eric Dumazet
On Thu, 2015-04-16 at 12:39 +0100, George Dunlap wrote: > On 04/15/2015 07:17 PM, Eric Dumazet wrote: > > Do not expect me to fight bufferbloat alone. Be part of the challenge, > > instead of trying to get back to proven bad solutions. > > I tried that. I wrote a descript

Re: [PATCH] netns: deinline net_generic()

2015-04-16 Thread Eric Dumazet
On Thu, 2015-04-16 at 13:14 +0200, Denys Vlasenko wrote: > However, without BUG_ONs, function is still a bit big > on PREEMPT configs. Only on allyesconfig builds, that nobody use but to prove some points about code size. If you look at net_generic(), it is mostly used from code that is normally

Re: [Xen-devel] "tcp: refine TSO autosizing" causes performance regression on Xen

2015-04-16 Thread Eric Dumazet
On Thu, 2015-04-16 at 11:01 +0100, George Dunlap wrote: > He suggested that after he'd been prodded by 4 more e-mails in which two > of us guessed what he was trying to get at. That's what I was > complaining about. My big complain is that I suggested to test to double the sysctl, which gave goo

Re: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install

2015-04-16 Thread Eric Dumazet
On Thu, 2015-04-16 at 14:16 +0200, Mateusz Guzik wrote: > Hi, > > Currently obtaining a new file descriptor results in locking fdtable > twice - once in order to reserve a slot and second time to fill it. > > Hack below gets rid of the second lock usage. > > It gives me a ~30% speedup (~300k ops
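
The RFC's idea, per its subject line, is to replace the second fdtable lock acquisition in fd_install() with a sequence counter. A minimal Python model of that retry pattern follows, assuming (hypothetically) that a resize copies the slot array while the counter is odd. The class and method names are invented for illustration; this is not the code from the RFC or from the later v2 patch, and Python's GIL hides the memory-ordering and RCU concerns the kernel must handle.

```python
import threading


class FdTable:
    """Toy fd table whose slot array may be replaced by a resize."""

    def __init__(self, size: int = 4):
        self.slots = [None] * size
        self.seq = 0                  # even: stable, odd: resize in progress
        self.lock = threading.Lock()  # still used for resizes (and slot reservation)

    def expand(self, new_size: int) -> None:
        with self.lock:
            self.seq += 1  # odd: lockless installers must not trust the array
            self.slots = self.slots + [None] * (new_size - len(self.slots))
            self.seq += 1  # even again: the new array is published

    def install(self, fd: int, file) -> None:
        # Lockless install: store into the current array, then check that no
        # resize overlapped the store. If one did, the copy may have missed
        # the store, so redo it into the new array (storing twice is harmless).
        while True:
            start = self.seq
            if start % 2:
                continue  # resize in progress; spin until it completes
            slots = self.slots
            slots[fd] = file
            if self.seq == start:
                return


if __name__ == "__main__":
    t = FdTable(2)
    t.install(1, "a")
    t.expand(8)
    t.install(5, "b")
    print(t.slots)
```

The design trade-off is that the common path takes no lock at all, while the rare resize path pays for it by forcing concurrent installers to retry.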

Re: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install

2015-04-16 Thread Eric Dumazet
On Thu, 2015-04-16 at 19:09 +0100, Al Viro wrote: > On Thu, Apr 16, 2015 at 02:16:31PM +0200, Mateusz Guzik wrote: > > @@ -165,8 +165,10 @@ static int expand_fdtable(struct files_struct *files, > > int nr) > > cur_fdt = files_fdtable(files); > > if (nr >= cur_fdt->max_fds) { > >

Re: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install

2015-04-16 Thread Eric Dumazet
On Thu, 2015-04-16 at 13:42 -0700, Eric Dumazet wrote: > On Thu, 2015-04-16 at 19:09 +0100, Al Viro wrote: > > On Thu, Apr 16, 2015 at 02:16:31PM +0200, Mateusz Guzik wrote: > > > @@ -165,8 +165,10 @@ static int expand_fdtable(struct files_struct > > > *fil

Re: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install

2015-04-16 Thread Eric Dumazet
On Fri, 2015-04-17 at 00:00 +0200, Mateusz Guzik wrote: > On Thu, Apr 16, 2015 at 01:55:39PM -0700, Eric Dumazet wrote: > > On Thu, 2015-04-16 at 13:42 -0700, Eric Dumazet wrote: > > > On Thu, 2015-04-16 at 19:09 +0100, Al Viro wrote: > > > > On Thu, Apr 16, 2015 at 0

Re: [PATCH 1/2] timer: Avoid waking up an idle-core by migrate running timer

2015-04-25 Thread Eric Dumazet
On Thu, 2015-04-23 at 14:45 +0200, Thomas Gleixner wrote: > You definitely have a point from the high throughput networking > perspective. > > Though in a power optimizing scenario with minimal network traffic > this might be the wrong decision. We have to gather data from the > power maniacs whe

Re: [PATCH 3.10 06/31] tcp: tcp_make_synack() should clear skb->tstamp

2015-04-26 Thread Eric Dumazet
wrote: > Hi Greg, > > On Sun, Apr 26, 2015 at 03:46:26PM +0200, Greg Kroah-Hartman wrote: >> 3.10-stable review patch. If anyone has any objections, please let me know. >> >> ------ >> >> From: Eric Dumazet >> >> [ Upstream commit

Re: [PATCH] fs/file.c: don't acquire files->file_lock in fd_install()

2015-04-28 Thread Eric Dumazet
On Mon, 2015-04-27 at 21:05 +0200, Mateusz Guzik wrote: > On Tue, Apr 21, 2015 at 09:59:28PM -0700, Eric Dumazet wrote: > > From: Eric Dumazet > > > > Mateusz Guzik reported : > > > > Currently obtaining a new file descriptor results in locking fdtable >

[PATCH v2] fs/file.c: don't acquire files->file_lock in fd_install()

2015-04-28 Thread Eric Dumazet
From: Eric Dumazet Mateusz Guzik reported : Currently obtaining a new file descriptor results in locking fdtable twice - once in order to reserve a slot and second time to fill it. Holding the spinlock in __fd_install() is needed in case a resize is done, or to prevent a resize. Mateusz

Re: [BUG REPORT] kernel panic in tcp_sendpage() on null pointer dereference

2015-04-07 Thread Eric Dumazet
On Tue, 2015-04-07 at 15:57 -0700, Tuan Bui wrote: > Hi all, > > I am consistently seeing this kernel panic on a 16 sockets machine > running Spark PageRank workload using Docker. I am running RHEL 7.0 > stock kernel which is 3.10.0-123.el7.x86_64. Have you tried a recent upstream kernel ?

Re: [BUG REPORT] kernel panic in tcp_sendpage() on null pointer dereference

2015-04-09 Thread Eric Dumazet
On Tue, 2015-04-07 at 17:01 -0700, Tuan Bui wrote: > On Tue, 2015-04-07 at 16:33 -0700, Eric Dumazet wrote: > > On Tue, 2015-04-07 at 15:57 -0700, Tuan Bui wrote: > > > Hi all, > > > > > > I am consistently seeing this kernel panic on a 16 sockets machine

Re: "tcp: refine TSO autosizing" causes performance regression on Xen

2015-04-09 Thread Eric Dumazet
Through bisection I found that the perf regression is caused by the > presence of the following commit in the guest kernel: > > > commit 605ad7f184b60cfaacbc038aa6c55ee68dee3c89 > Author: Eric Dumazet > Date: Sun Dec 7 12:22:18 2014 -0800 > > tcp: refine TSO autosizin

Re: [PATCH 1/2] timer: Avoid waking up an idle-core by migrate running timer

2015-05-06 Thread Eric Dumazet
On Tue, 2015-05-05 at 15:00 +0200, Thomas Gleixner wrote: > On Sat, 25 Apr 2015, Eric Dumazet wrote: > > On Thu, 2015-04-23 at 14:45 +0200, Thomas Gleixner wrote: > > > > > You definitely have a point from the high throughput networking > > > perspective. > >
