Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-19 Thread Eric Dumazet
On Tue, 2013-11-19 at 22:49 +0200, Michael S. Tsirkin wrote: On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote: On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote: We need to drop the refcnt of page when we fail to allocate an skb for frag list, otherwise it will be leaked

Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-19 Thread Eric Dumazet
On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote: Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a it didn't drop packets received from host as far as I can tell. virtio is more like a pipe than a real NIC in this respect. Prior/after to this patch, you were not

Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Eric Dumazet
On Wed, 2013-11-20 at 10:58 +0200, Michael S. Tsirkin wrote: On Tue, Nov 19, 2013 at 02:00:11PM -0800, Eric Dumazet wrote: On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote: Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a it didn't drop packets received

Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Eric Dumazet
On Wed, 2013-11-20 at 18:06 +0200, Michael S. Tsirkin wrote: Hmm some kind of disconnect here. I got you rmanagement about bufferbloat. What I am saying is that maybe we should drop packets more aggressively: when we drop one packet of a flow, why not drop everything that's queued and is

Re: lots of brief rcu stalls.

2013-12-04 Thread Eric Dumazet
On Wed, 2013-12-04 at 16:16 -0800, Paul E. McKenney wrote: + if (rdp->rsp == rcu_state && + ULONG_CMP_GE(ACCESS_ONCE(jiffies), rdp->rsp->jiffies_resched)) { + rdp->rsp->jiffies_resched += 5; + resched_cpu(rdp->cpu); + } + return 0; } jiffies should
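The quoted hunk compares jiffies against a deadline with ULONG_CMP_GE() rather than a plain >=, because jiffies is a free-running counter that wraps. A minimal userspace sketch of that comparison follows. The macro body is reproduced from memory of include/linux/rcupdate.h and should be treated as an assumption; deadline_reached() and deadline_selftest() are hypothetical names that appear nowhere in the thread.

```c
#include <assert.h>
#include <limits.h>

/*
 * Wrap-safe "a >= b" for free-running unsigned long counters such as
 * jiffies: true when a is at most ULONG_MAX / 2 ticks ahead of b.
 * (Assumed to match the kernel macro of the same name.)
 */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

/* Hypothetical helper (not a kernel function): has `now` reached `deadline`? */
static int deadline_reached(unsigned long now, unsigned long deadline)
{
	return ULONG_CMP_GE(now, deadline);
}

/* Hypothetical self-check covering the wrap-around case. */
static void deadline_selftest(void)
{
	assert(deadline_reached(10UL, 5UL));
	assert(!deadline_reached(5UL, 10UL));
	/* The counter wrapped past zero: 2 is five ticks after ULONG_MAX - 2. */
	assert(deadline_reached(2UL, ULONG_MAX - 2UL));
}
```

A plain `now >= deadline` would give the wrong answer in the last case, which is why the unsigned difference is tested instead.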

Re: [PATCH] procfs: make smp_affinity values 0644

2014-03-13 Thread Eric Dumazet
On Thu, 2014-03-13 at 19:05 -0700, Chema Gonzalez wrote: Includes: - /proc/irq/default_smp_affinity - /proc/irq/*/smp_affinity - /proc/irq/*/smp_affinity_list Users can distill the same information by reading /proc/interrupts. Signed-off-by: Chema Gonzalez ch...@google.com --- Seems

RE: [PATCH net-next 08/12] r8152: support TSO

2014-03-04 Thread Eric Dumazet
On Tue, 2014-03-04 at 14:35 +0000, David Laight wrote: Does that mean you are splitting the 64k 'ethernet packet' from TCP in software? I've looked at the ax88179 where the hardware can do it. Is there really a gain doing segmentation here if you have to do the extra data copy? There is no

Re: [PATCH net-next 08/12] r8152: support TSO

2014-03-04 Thread Eric Dumazet
On Tue, 2014-03-04 at 20:01 +0800, Hayes Wang wrote: +static u32 r8152_xmit_frags(struct r8152 *tp, struct sk_buff *skb, u8 *data) +{ + struct skb_shared_info *info = skb_shinfo(skb); + unsigned int cur_frag; + u32 total = skb_headlen(skb); + + memcpy(data, skb->data,

Re: [PATCH net-next 09/12] r8152: support IPv6

2014-03-04 Thread Eric Dumazet
On Tue, 2014-03-04 at 20:01 +0800, Hayes Wang wrote: Support hw IPv6 checksum for TCP and UDP packets. +/* + * r8152_csum_workaround() + * The hw limits the value of the transport offset. When the offset is out of the + * range, calculate the checksum by sw. + */ +static void

Re: [PATCH net-next 08/12] r8152: support TSO

2014-03-04 Thread Eric Dumazet
On Tue, 2014-03-04 at 20:01 +0800, Hayes Wang wrote: Support scatter gather and TSO. - netdev->features |= NETIF_F_RXCSUM | NETIF_F_IP_CSUM; - netdev->hw_features = NETIF_F_RXCSUM | NETIF_F_IP_CSUM; + netdev->features |= NETIF_F_RXCSUM | NETIF_F_IP_CSUM | NETIF_F_SG | +

Re: RCU stalls when running out of memory on 3.14-rc4 w/ NFS and kernel threads priorities changed

2014-03-04 Thread Eric Dumazet
On Tue, 2014-03-04 at 15:55 -0800, Florian Fainelli wrote: Hi all, I am seeing the following RCU stalls messages appearing on an ARMv7 4xCPUs system running 3.14-rc4: [ 42.974327] INFO: rcu_sched detected stalls on CPUs/tasks: [ 42.979839] (detected by 0, t=2102 jiffies,

Re: [PATCH] net: use raw_cpu ops in snmp stats bh

2014-03-06 Thread Eric Dumazet
On Thu, 2014-03-06 at 15:55 +0300, Sergey Senozhatsky wrote: Commit a25982c15ae52 (percpu: add preemption checks to __this_cpu ops) added preemption checks to __this_cpu ops, which are used in SNMP_INC_STATS_BH() and SNMP_ADD_STATS_BH(), resulting in following warnings: BUG: using

Re: [PATCH] net: use raw_cpu ops in snmp stats bh

2014-03-06 Thread Eric Dumazet
On Thu, 2014-03-06 at 06:44 -0800, Eric Dumazet wrote: I think you missed a lot of mails sent by Chrisoph Lameter recently... s/Chrisoph/Christoph/ Appropriate fix would be the following one, please submit this formally. Fixes: f19c29e3e391 (tcp: snmp stats for Fast Open, SYN rtx, and data

Re: next-20131218 - WARNING in qdisc_list_add

2014-03-08 Thread Eric Dumazet
On Sat, 2014-03-08 at 11:41 +0100, Mirco Tischler wrote: Hi I can reproduce this warning reliably on 3.14-rc* (by executing tc qdisc add dev dev root fq_codel via udev rule). The above patch fixes it. Thanks for the heads up, I'll respin the patch formally. -- To unsubscribe from this

Re: [PATCH v7 net-next 1/3] filter: add Extended BPF interpreter and converter

2014-03-09 Thread Eric Dumazet
On Sat, 2014-03-08 at 15:15 -0800, Alexei Starovoitov wrote: +/** + * sk_run_filter_ext - run an extended filter + * @ctx: buffer to run the filter on + * @insn: filter to apply + * + * Decode and execute extended BPF instructions. + * @ctx is the data we are operating on. + *

Re: [PATCH v7 net-next 1/3] filter: add Extended BPF interpreter and converter

2014-03-09 Thread Eric Dumazet
On Sat, 2014-03-08 at 15:15 -0800, Alexei Starovoitov wrote: + if (BPF_SRC(fp->code) == BPF_K && + (int)fp->k < 0) { + /* extended BPF immediates are signed, + * zero extend immediate into tmp

Re: [PATCH v7 net-next 1/3] filter: add Extended BPF interpreter and converter

2014-03-09 Thread Eric Dumazet
On Sun, 2014-03-09 at 10:38 -0700, Alexei Starovoitov wrote: On Sun, Mar 9, 2014 at 7:45 AM, Eric Dumazet eric.duma...@gmail.com wrote: On Sat, 2014-03-08 at 15:15 -0800, Alexei Starovoitov wrote: +/** + * sk_run_filter_ext - run an extended filter + * @ctx: buffer to run the filter

Re: [PATCH v7 net-next 1/3] filter: add Extended BPF interpreter and converter

2014-03-09 Thread Eric Dumazet
On Sun, 2014-03-09 at 11:57 -0700, Alexei Starovoitov wrote: In sk_run_filter_ext() I used u64 stack[64];, but u64 stack[60]; is safe too, but I didn't want to go into extensive explanation of 'magic' 60 number in the first patch, so I just rounded it to 64. Since now you understand, I can

Re: [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get (v3)

2014-01-12 Thread Eric Dumazet
, but we can't find a way how to do that. Another way is to interpret the confirm bit as part of a search key and check it in nf_conntrack_find() too. Cc: Eric Dumazet eric.duma...@gmail.com Cc: Florian Westphal f...@strlen.de Cc: Pablo Neira Ayuso pa...@netfilter.org Cc: Patrick

Re: [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get (v3)

2014-01-14 Thread Eric Dumazet
On Tue, 2014-01-14 at 14:51 +0400, Andrew Vagin wrote: I think __nf_conntrack_alloc must use atomic_inc instead of atomic_set. And we must be sure, that the first object from a new page is zeroized. No you can not do that, and we do not need. If a new page is allocated, then you have the

Re: [PATCH RFC] reciprocal_divide: correction/update of the algorithm

2014-01-14 Thread Eric Dumazet
On Mon, 2014-01-13 at 22:42 +0100, Hannes Frederic Sowa wrote: This patch is a RFC and part of a series Daniel Borkmann and me want to do when introducing prandom_u32_range{,_ro} and prandom_u32_max{,_ro} helpers later this week. -static inline u32 reciprocal_divide(u32 A, u32 R) +struct
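For readers unfamiliar with the helper being replaced, here is a rough userspace sketch of the multiply-and-shift trick behind the old two-argument reciprocal_divide(u32 A, u32 R). It is written from memory of the pre-patch lib/reciprocal_div.c, so treat the exact formula as an assumption; the _sketch suffixes mark both functions as illustrative, not kernel API.

```c
#include <stdint.h>

/*
 * Precompute R ~= 2^32 / k, rounded up. k must be nonzero.
 * Note that k == 1 yields 2^32, which does not fit in 32 bits and
 * truncates to 0 here; that is one of the weaknesses of this form.
 */
static uint32_t reciprocal_value_sketch(uint32_t k)
{
	uint64_t val = (1ULL << 32) + (k - 1);

	return (uint32_t)(val / k);
}

/* Approximate a / k as the high 32 bits of the 64-bit product a * R. */
static uint32_t reciprocal_divide_sketch(uint32_t a, uint32_t r)
{
	return (uint32_t)(((uint64_t)a * r) >> 32);
}
```

The reciprocal is computed once per divisor and reused, so each later division costs one multiply and one shift. The sketch is exact only for a limited range of inputs, which is presumably the motivation for the correction the RFC proposes.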

Re: [PATCH RFC] reciprocal_divide: correction/update of the algorithm

2014-01-14 Thread Eric Dumazet
On Tue, 2014-01-14 at 14:22 -0500, Austin S Hemmelgarn wrote: I disagree with the statement that current CPU's have reasonably fast dividers. A lot of embedded processors and many low-end x86 CPU's do not in-fact have any hardware divider, and usually provide it using microcode based

Re: [PATCH RFC] reciprocal_divide: correction/update of the algorithm

2014-01-14 Thread Eric Dumazet
On Tue, 2014-01-14 at 15:53 -0500, Austin S Hemmelgarn wrote: On 2014-01-14 14:50, Eric Dumazet wrote: On Tue, 2014-01-14 at 14:22 -0500, Austin S Hemmelgarn wrote: I disagree with the statement that current CPU's have reasonably fast dividers. A lot of embedded processors and many low

Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-17 Thread Eric Dumazet
On Mon, 2014-02-17 at 15:08 +0100, Thomas Glanzmann wrote: Hello Eric, may submit your latest patch for upstream? Or do you plan on doing that yourself? Unfortunately you did not have good results with the MSG_MORE applied to the page fragments. I think I'll submit the part only dealing with

Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-17 Thread Eric Dumazet
On Mon, 2014-02-17 at 16:32 +0100, Thomas Glanzmann wrote: Hello Eric, Unfortunately you did not have good results with the MSG_MORE applied to the page fragments. I agree. We should submit only the patch from this message: Message-ID:

Re: [PATCH] This extends tx_data and and iscsit_do_tx_data with the additional parameter flags and avoids sending multiple TCP packets in iscsit_fe_sendpage_sg

2014-02-10 Thread Eric Dumazet
On Mon, 2014-02-10 at 21:56 +0100, Thomas Glanzmann wrote: Hello Nab, This looks correct to me. Thomas, once you're able to confirm please include your 'Tested-by' and I'll include for the next -rc3 PULL request. Eric is currently reviewing our latest iteration with MSG_MORE for

Re: [PATCH] This extends tx_data and and iscsit_do_tx_data with the additional parameter flags and avoids sending multiple TCP packets in iscsit_fe_sendpage_sg

2014-02-10 Thread Eric Dumazet
On Mon, 2014-02-10 at 13:01 -0800, Eric Dumazet wrote: On Mon, 2014-02-10 at 21:56 +0100, Thomas Glanzmann wrote: Hello Nab, This looks correct to me. Thomas, once you're able to confirm please include your 'Tested-by' and I'll include for the next -rc3 PULL request. Eric

Re: [PATCH] n_tty: Fix poll() when TIME_CHAR and MIN_CHAR == 0

2014-02-11 Thread Eric Dumazet
) Reported-by: Eric Dumazet eduma...@google.com Tested-by: Eric Dumazet eduma...@google.com Thanks guys

Re: 3.14-mw regression: rtl8169 WARNING: DMA-API: exceeded 7 overlapping mappings of pfn 55ebe

2014-02-11 Thread Eric Dumazet
On Tue, 2014-02-11 at 20:56 +0100, Sander Eikelenboom wrote: Hi Dan, FYI just tested and put Xen out of the equation (booting baremetal) and it still persists. I tried something else .. don't know if it gives you anymore insights, but it's worth the try: diff --git a/lib/dma-debug.c

Re: 3.14-mw regression: rtl8169 WARNING: DMA-API: exceeded 7 overlapping mappings of pfn 55ebe

2014-02-11 Thread Eric Dumazet
On Tue, 2014-02-11 at 18:07 -0800, Dan Williams wrote: The overlap granularity is too large. Multiple dma_map_single mappings are allowed to a given page as long as they don't collide on the same cache line. I am not sure why you try number of mappings of a page. Try launching 100

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Eric Dumazet
On Thu, 2014-03-20 at 18:56 -0700, Andi Kleen wrote: Eric Dumazet eric.duma...@gmail.com writes: I saw csum_partial() consuming 1% of cpu cycles in a GRO workload, that is insane... Couldn't it just be the cache miss? Or the fact that we mix 16 bit stores and 32bit loads ? iph

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Eric Dumazet
On Fri, 2014-03-21 at 05:50 -0700, Eric Dumazet wrote: Or the fact that we mix 16 bit stores and 32bit loads ? iph->tot_len = newlen; iph->check = 0; iph->check = ip_fast_csum(iph, 5); Yep definitely. Using 16 bit loads in ip_fast_csum() totally removes the stall. I no longer see
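To make the "16 bit loads" remark concrete, here is a portable sketch of an IPv4 header checksum that reads the header two bytes at a time, so each load is the same width as the 16-bit stores to tot_len and check that precede it. This is not the kernel's ip_fast_csum(), which is architecture-specific; ipv4_hdr_csum16() is a hypothetical name used only for illustration.

```c
#include <stdint.h>
#include <string.h>

/*
 * One's-complement checksum over an IPv4 header of `ihl` 32-bit words,
 * using only 16-bit loads. The checksum field must be zero on entry.
 * The result is in the same byte order as the header in memory, so it
 * can be stored straight back into the checksum field.
 */
static uint16_t ipv4_hdr_csum16(const void *hdr, unsigned int ihl)
{
	const unsigned char *p = hdr;
	uint32_t sum = 0;
	unsigned int i;

	for (i = 0; i < ihl * 2; i++) {
		uint16_t word;

		memcpy(&word, p + 2 * i, sizeof(word)); /* alignment-safe 16-bit load */
		sum += word;
	}

	/* Fold the carries back into the low 16 bits (twice is enough). */
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

Running the function over a header whose checksum field is already filled in returns 0, which is the usual way to verify a received header.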

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Eric Dumazet
On Fri, 2014-03-21 at 06:32 -0700, Eric Dumazet wrote: On Fri, 2014-03-21 at 05:50 -0700, Eric Dumazet wrote: Or the fact that we mix 16 bit stores and 32bit loads ? iph->tot_len = newlen; iph->check = 0; iph->check = ip_fast_csum(iph, 5); Yep definitely. Using 16 bit loads

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Eric Dumazet
On Fri, 2014-03-21 at 06:47 -0700, Eric Dumazet wrote: Another idea would be to move the ip_fast_csum() call at the end of inet_gro_complete() I'll try this : diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 8c54870db792..0ca8f350a532 100644 --- a/net/ipv4/af_inet.c +++ b/net

Re: 3.8.4 kernel

2013-05-07 Thread Eric Dumazet
On Tue, 2013-05-07 at 09:20 -0700, Bjorn Helgaas wrote: [+cc Eric because he made a change (69b08f62e17) that apparently exposes driver bugs] On Mon, May 6, 2013 at 7:51 PM, Huang, Xiong xi...@qca.qualcomm.com wrote: Did this ever get resolved? I opened

Re: [patch net] macvlan: fix passthru mode race between dev removal and rx path

2013-05-09 Thread Eric Dumazet
the list is empty. Introduced by: commit eb06acdc85585f2 macvlan: Introduce 'passthru' mode to takeover the underlying device Signed-off-by: Jiri Pirko j...@resnulli.us --- Acked-by: Eric Dumazet eduma...@google.com

Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

2013-06-07 Thread Eric Dumazet
On Fri, 2013-06-07 at 10:54 +0200, Steinar H. Gunderson wrote: On Thu, Jun 06, 2013 at 11:06:48PM -0400, Steven Rostedt wrote: Note the faulting address is 0xa0e52001, which is around the above address, be interesting to know what was at that location. Doh, I looked at the wrong

RE: Scaling problem with a lot of AF_PACKET sockets on different interfaces

2013-06-07 Thread Eric Dumazet
On Fri, 2013-06-07 at 14:30 +0100, David Laight wrote: Looks like the ptype_base[] should be per 'dev'? Or just put entries where ptype->dev != null_or_dev on a per-interface list and do two searches? Yes, but then we would have two searches instead of one in fast path. ptype_base[] is

RE: Scaling problem with a lot of AF_PACKET sockets on different interfaces

2013-06-07 Thread Eric Dumazet
On Fri, 2013-06-07 at 15:09 +0100, David Laight wrote: Looks like the ptype_base[] should be per 'dev'? Or just put entries where ptype->dev != null_or_dev on a per-interface list and do two searches? Yes, but then we would have two searches instead of one in fast path. Usually it

Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

2013-06-07 Thread Eric Dumazet
Rostedt rost...@goodmis.org Signed-off-by: Eric Dumazet eduma...@google.com --- include/net/ip_tunnels.h |6 +++--- net/ipv4/ip_tunnel.c |4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h index 40b4dfc..1be442f

Re: [PATCH v10 net-next 1/6] net: add napi_id and hash

2013-06-10 Thread Eric Dumazet
On Mon, 2013-06-10 at 11:39 +0300, Eliezer Tamir wrote: Adds a napi_id and a hashing mechanism to lookup a napi by id. This will be used by subsequent patches to implement low latency Ethernet device polling. Based on a code sample by Eric Dumazet. Signed-off-by: Eliezer Tamir eliezer.ta

Re: [PATCH v10 net-next 2/6] net: add low latency socket poll

2013-06-10 Thread Eric Dumazet
/ll_poll.h Acked-by: Eric Dumazet eduma...@google.com

Re: [PATCH v10 net-next 3/6] udp: add low latency socket poll support

2013-06-10 Thread Eric Dumazet
|4 net/ipv4/udp.c |6 +- net/ipv6/udp.c |6 +- 3 files changed, 14 insertions(+), 2 deletions(-) Acked-by: Eric Dumazet eduma...@google.com

Re: [PATCH v10 net-next 4/6] tcp: add low latency socket poll support.

2013-06-10 Thread Eric Dumazet
to add busy-poll support to more protocols. Signed-off-by: Alexander Duyck alexander.h.du...@intel.com Signed-off-by: Jesse Brandeburg jesse.brandeb...@intel.com Signed-off-by: Eliezer Tamir eliezer.ta...@linux.intel.com --- Acked-by: Eric Dumazet eduma...@google.com

Re: [PATCH v10 net-next 5/6] ixgbe: add support for ndo_ll_poll

2013-06-10 Thread Eric Dumazet
netif_napi_add(). Signed-off-by: Alexander Duyck alexander.h.du...@intel.com Signed-off-by: Jesse Brandeburg jesse.brandeb...@intel.com Signed-off-by: Eliezer Tamir eliezer.ta...@linux.intel.com --- Reviewed-by: Eric Dumazet eduma...@google.com

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-06-10 Thread Eric Dumazet
On Mon, 2013-06-10 at 22:29 +0400, Boris B. Zhmurov wrote: Hello David, Do you think, this patch is stable material and should be included in 3.0-stable and 3.4-stable trees? Thanks. Yes, it's in the stable queue : http://patchwork.ozlabs.org/bundle/davem/stable/?state=*

Re: [PATCH RFC ticketlock] Auto-queued ticketlock

2013-06-10 Thread Eric Dumazet
On Sun, 2013-06-09 at 12:36 -0700, Paul E. McKenney wrote: Breaking up locks is better than implementing high-contention locks, but if we must have high-contention locks, why not make them automatically switch between light-weight ticket locks at low contention and queued locks at high

Re: [PATCH v10 net-next 0/6] net: low latency Ethernet device polling

2013-06-11 Thread Eric Dumazet
On Tue, 2013-06-11 at 09:49 +0300, Eliezer Tamir wrote: I would like to hear opinions on what needs to be added to make this feature complete. The list I have so far is: 1. add a socket option Yes, please. I do not believe all sockets on the machine are candidate for low latency. In fact

Re: [PATCH net-next 2/2] net:add socket option for low latency polling

2013-06-11 Thread Eric Dumazet
On Tue, 2013-06-11 at 17:24 +0300, Eliezer Tamir wrote: adds a socket option for low latency polling. This allows overriding the global sysctl value with a per-socket one. Signed-off-by: Eliezer Tamir eliezer.ta...@linux.intel.com --- -static inline cycles_t ll_end_time(void) +static

Re: [PATCH net-next 2/2] net:add socket option for low latency polling

2013-06-11 Thread Eric Dumazet
On Tue, 2013-06-11 at 18:37 +0300, Eliezer Tamir wrote: On 11/06/2013 17:45, Eric Dumazet wrote: On Tue, 2013-06-11 at 17:24 +0300, Eliezer Tamir wrote: adds a socket option for low latency polling. This allows overriding the global sysctl value with a per-socket one. Signed-off

Re: [PATCH] slab: prevent warnings when allocating with __GFP_NOWARN

2013-06-11 Thread Eric Dumazet
On Tue, 2013-06-11 at 11:44 -0400, Sasha Levin wrote: On 06/11/2013 11:23 AM, Christoph Lameter wrote: On Tue, 11 Jun 2013, Pekka Enberg wrote: So you're OK with going forward with Sasha's patch? It's needed because __GFP_NOWARN was specifically added there to fix this issue earlier.

Re: [PATCH] slab: prevent warnings when allocating with __GFP_NOWARN

2013-06-11 Thread Eric Dumazet
On Tue, 2013-06-11 at 12:19 -0400, Sasha Levin wrote: It might be, but you need CAP_SYS_RESOURCE to go into the dangerous zone (pipe_max_size). So if root (or someone with that cap) wants to go there, as Rusty says: Root asked, we do. Yes and no : adding a test to select vmalloc()/vfree()

Re: [PATCH 1/3] skbuff: Update truesize in pskb_expand_head

2013-06-12 Thread Eric Dumazet
On Wed, 2013-06-12 at 19:05 +1000, Dave Wiltshire wrote: Some call sites to pskb_expand_head subsequently update the skb truesize and others don't (even with non-zero arguments). This is likely a memory audit leak. Fixed this up by moving the memory accounting to the skbuff.c file and removing

Re: [PATCH v2 net-next 3/3] net: add socket option for low latency polling

2013-06-12 Thread Eric Dumazet
On Wed, 2013-06-12 at 14:20 +0300, Eliezer Tamir wrote: adds a socket option for low latency polling. This allows overriding the global sysctl value with a per-socket one. Signed-off-by: Eliezer Tamir eliezer.ta...@linux.intel.com --- It seems EXPORT_SYMBOL_GPL(sysctl_net_ll_poll) can now

Re: [PATCH v2 net-next 3/3] net: add socket option for low latency polling

2013-06-12 Thread Eric Dumazet
On Wed, 2013-06-12 at 15:54 +0300, Eliezer Tamir wrote: On 12/06/2013 15:44, Eric Dumazet wrote: On Wed, 2013-06-12 at 14:20 +0300, Eliezer Tamir wrote: adds a socket option for low latency polling. This allows overriding the global sysctl value with a per-socket one. Signed-off

Re: [PATCH 1/3] skbuff: Update truesize in pskb_expand_head

2013-06-12 Thread Eric Dumazet
On Thu, 2013-06-13 at 09:35 +1000, Dave Wiltshire wrote: Firstly, from my cover letter: Perhaps I don't understand something, but I thought it best to generate the change and then ask. So is this correct?. Sure I have no problems with that. But secondly, I understand that the only reason

Re: [net-next PATCH 1/2] macvtap: slient sparse warnings

2013-06-12 Thread Eric Dumazet
On Thu, 2013-06-13 at 12:21 +0800, Jason Wang wrote: This patch silents the following sparse warnings: dr Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/net/macvtap.c |2 +- include/linux/if_macvlan.h |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff

Re: [PATCH v3 net-next 3/4] net: change sysctl_net_ll_poll into an unsigned int

2013-06-13 Thread Eric Dumazet
On Thu, 2013-06-13 at 17:46 +0300, Eliezer Tamir wrote: There is no reason for sysctl_net_ll_poll to be an unsigned long. Change it into an unsigned int. Fix the proc handler. Signed-off-by: Eliezer Tamir eliezer.ta...@linux.intel.com --- include/net/ll_poll.h |2 +-

Re: include/net/ipv6.h:408:38: warning: ‘*((void *)addr+8)’ may be used uninitialized in this function

2013-06-13 Thread Eric Dumazet
On Thu, 2013-06-13 at 18:16 +0300, Tommi Rantala wrote: Hello, I'm seeing the following compiler warnings. Do these make any sense? I'm doing a x86-64 build. $ gcc --version gcc (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3 Copyright (C) 2012 Free Software Foundation, Inc. This is free software; see

Re: [PATCH v4 net-next 1/4] net: change sysctl_net_ll_poll into an unsigned int

2013-06-13 Thread Eric Dumazet
On Fri, 2013-06-14 at 04:56 +0300, Eliezer Tamir wrote: There is no reason for sysctl_net_ll_poll to be an unsigned long. Change it into an unsigned int. Fix the proc handler. Signed-off-by: Eliezer Tamir eliezer.ta...@linux.intel.com --- Acked-by: Eric Dumazet eduma...@google.com

Re: [PATCH v4 net-next 2/4] net: convert low latency sockets to sched_clock()

2013-06-13 Thread Eric Dumazet
On Fri, 2013-06-14 at 04:57 +0300, Eliezer Tamir wrote: Use sched_clock() instead of get_cycles(). We can use sched_clock() because we don't care much about accuracy. Remove the dependency on X86_TSC Signed-off-by: Eliezer Tamir eliezer.ta...@linux.intel.com --- -static inline bool

Re: [PATCH 1/1] ipv4: Fixed MD5 key lookups when adding/ removing MD5 to/ from TCP sockets.

2013-06-14 Thread Eric Dumazet
On Fri, 2013-06-14 at 18:56 +1200, Aydin Arik wrote: MD5 key lookups on a given TCP socket were being performed incorrectly. This fix alters parameter inputs to the MD5 lookup function tcp_md5_do_lookup, which is called by functions tcp_md5_do_add and tcp_md5_do_del. Specifically, the change

Re: [PATCH 1/1] [PATCH v2] tcp: Fixed MD5 key lookups when adding/removing MD5.

2013-06-14 Thread Eric Dumazet
= tcp_md5_do_lookup(sk, addr, family); if (!key) return -ENOENT; hlist_del_rcu(&key->node); Thanks ! Acked-by: Eric Dumazet eduma...@google.com (No need for [PATCH 1/1] for a single patch)

Re: kernel BUG at net/core/skbuff.c:1065!

2013-06-16 Thread Eric Dumazet
On Sun, 2013-06-16 at 13:47 +0300, Tommi Rantala wrote: Hello, Hit this bug while fuzzing in a qemu virtual machine as the root user. Kernel is v3.10-rc5-0-g317ddd2. Tommi [575180.874750] type=1401 audit(1371378748.322:7750): SELinux: unrecognized netlink message type=0 for sclass=36

Re: [PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Eric Dumazet
On Mon, 2013-06-17 at 10:18 -0400, Jun Chen wrote: When searching for the first skb to collapse, the condition of overlap with the next one has been reached, but start is less than TCP_SKB_CB(skb)->seq at this time, so the following processing will trigger the BUG_ON of the offset(start -

Re: [PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Eric Dumazet
On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote: hi, When the condition of tcp_win_from_space(skb->truesize) > skb->len is true but the before(start, TCP_SKB_CB(skb)->seq) is also true, the final condition will be true. The following line: int offset = start - TCP_SKB_CB(skb)->seq;
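The concern in this message is easier to follow with the sequence-number arithmetic written out. The sketch below assumes before() has its usual kernel meaning, a signed test on the 32-bit difference (written from memory of include/net/tcp.h); seq_before() and collapse_offset() are hypothetical stand-ins, not functions from the patch.

```c
#include <stdint.h>

/*
 * Wrap-safe "seq1 is earlier than seq2" for 32-bit TCP sequence numbers.
 * Relies on the usual two's-complement conversion to int32_t.
 */
static int seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/*
 * Hypothetical stand-in for the quoted line
 *     int offset = start - TCP_SKB_CB(skb)->seq;
 * The result is negative exactly when seq_before(start, skb_seq) holds.
 */
static int32_t collapse_offset(uint32_t start, uint32_t skb_seq)
{
	return (int32_t)(start - skb_seq);
}
```

So if a code path reaches the offset computation while before(start, seq) is true, the offset is negative, which is the situation a BUG_ON(offset < 0) style check would catch.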

Re: [PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Eric Dumazet
On Mon, 2013-06-17 at 14:52 -0400, Jun Chen wrote: On Mon, 2013-06-17 at 03:29 -0700, Eric Dumazet wrote: On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote: hi, When the condition of tcp_win_from_space(skb->truesize) > skb->len is true but the before(start, TCP_SKB_CB(skb)->seq

Re: [PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Eric Dumazet
On Tue, 2013-06-18 at 05:52 -0400, Jun Chen wrote: There are many warning for tcp_recvmsg before this crash. I can't find other memory warning in the logs, but I'm not sure whether there are memory issues because of the length limitation of saved logs. I think this logs will give you more

Re: [PATCH v2 net-next] net: poll/select low latency socket support

2013-06-18 Thread Eric Dumazet
On Tue, 2013-06-18 at 11:58 +0300, Eliezer Tamir wrote: select/poll busy-poll support. */ -static inline u64 ll_end_time(struct sock *sk) +static inline u64 ll_sk_end_time(struct sock *sk) { - u64 end_time = ACCESS_ONCE(sk->sk_ll_usec); - - /* we don't mind a ~2.5%

Re: [PATCH v3 1/3] unix/dgram: peek beyond 0-sized skbs

2013-04-29 Thread Eric Dumazet
bpoir...@suse.de --- Acked-by: Eric Dumazet eduma...@google.com

Re: [PATCH v3 2/3] unix/dgram: fix peeking with an offset larger than data in queue

2013-04-29 Thread Eric Dumazet
this so that the behavior is the same as peeking with no offset on an empty queue: the caller blocks. Signed-off-by: Benjamin Poirier bpoir...@suse.de --- v2: address review feedback from Eric Dumazet v3: address review feedback from Cong Wang net/core/datagram.c | 21

Re: [PATCH v3 3/3] unix/stream: fix peeking with an offset larger than data in queue

2013-04-29 Thread Eric Dumazet
on an empty queue: the caller blocks. Signed-off-by: Benjamin Poirier bpoir...@suse.de --- net/unix/af_unix.c | 25 - 1 file changed, 12 insertions(+), 13 deletions(-) Acked-by: Eric Dumazet eduma...@google.com

Re: [PATCH 1/2] Make the batch size of the percpu_counter configurable

2013-04-30 Thread Eric Dumazet
On Tue, 2013-04-30 at 09:23 -0700, Tim Chen wrote: On Tue, 2013-04-30 at 13:32 +0000, Christoph Lameter wrote: On Mon, 29 Apr 2013, Tim Chen wrote: diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h index d5dd465..5ca7df5 100644 ---

Re: linux-next: Tree for Apr 30 (netdev: mellanox/mlx4)

2013-04-30 Thread Eric Dumazet
On Tue, 2013-04-30 at 11:58 -0700, Randy Dunlap wrote: On 04/29/13 23:57, Stephen Rothwell wrote: Hi all, Please do not add any v3.11 destined work to your linux-next included branches until after v3.10-rc1 is released. Changes since 20130429: on i386: ERROR: __udivdi3

Re: linux-next: Tree for Apr 30 (netdev: mellanox/mlx4)

2013-04-30 Thread Eric Dumazet
On Tue, 2013-04-30 at 13:39 -0700, Randy Dunlap wrote: Yes, that fixes it. Thanks. Acked-by: Randy Dunlap rdun...@infradead.org Thanks, I'll send the official patch in a couple of minutes.

[PATCH net-next] mlx4_en: fix a build error on 32bit arches

2013-04-30 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com commit b6c39bfcf1d7d63 (net/mlx4_en: Add a service task) added a build error on 32bit arches. ERROR: __udivdi3 [drivers/net/ethernet/mellanox/mlx4/mlx4_en.ko] undefined! Fix this problem by using do_div() Reported-by: Randy Dunlap rdun...@infradead.org
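Some background on the fix: on 32-bit architectures a plain `/` applied to a 64-bit operand makes gcc emit a call to the libgcc helper __udivdi3, which the kernel does not link, hence the "undefined" error. The kernel's do_div(n, base) divides a 64-bit value in place by a 32-bit divisor and returns the remainder. The sketch below only mimics that calling convention in userspace, where `/` is fine; do_div_sketch() is a hypothetical name and not the kernel macro.

```c
#include <stdint.h>

/*
 * Userspace imitation of the do_div() convention: *n is replaced by the
 * quotient *n / base, and the remainder is returned. base must be nonzero.
 */
static uint32_t do_div_sketch(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}
```

A caller keeps the dividend in a local variable, passes its address, and afterwards reads the quotient from that same variable, mirroring how do_div() modifies its first argument.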

Re: [PATCH v2 1/2] sched: Add cond_resched_rcu_lock() helper

2013-05-01 Thread Eric Dumazet
On Wed, 2013-05-01 at 17:17 +0200, Peter Zijlstra wrote: On Wed, May 01, 2013 at 05:46:37AM -0700, Paul E. McKenney wrote: If the only goal is to allow preemption, and if long grace periods are not a concern, then this alternate approach would work fine as well. Hmm.. if that were the

Re: [PATCH v9 net-next 1/7] net: add napi_id and hash

2013-06-05 Thread Eric Dumazet
On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote: Adds a napi_id and a hashing mechanism to lookup a napi by id. This will be used by subsequent patches to implement low latency Ethernet device polling. Based on a code sample by Eric Dumazet. Signed-off-by: Eliezer Tamir eliezer.ta

Re: [PATCH v9 net-next 2/7] net: add low latency socket poll

2013-06-05 Thread Eric Dumazet
eliezer.ta...@linux.intel.com --- Are you sure this version was tested by Willem ? Acked-by: Eric Dumazet eduma...@google.com

Re: [PATCH v9 net-next 3/7] udp: add low latency socket poll support

2013-06-05 Thread Eric Dumazet
...@linux.intel.com --- Acked-by: Eric Dumazet eduma...@google.com

Re: [PATCH v9 net-next 4/7] tcp: add low latency socket poll support.

2013-06-05 Thread Eric Dumazet
to add busy-poll support to more protocols. Signed-off-by: Alexander Duyck alexander.h.du...@intel.com Signed-off-by: Jesse Brandeburg jesse.brandeb...@intel.com Tested-by: Willem de Bruijn will...@google.com Signed-off-by: Eliezer Tamir eliezer.ta...@linux.intel.com --- Acked-by: Eric Dumazet

Re: [PATCH v9 net-next 5/7] net: simple poll/select low latency socket poll

2013-06-05 Thread Eric Dumazet
On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote: A very naive select/poll busy-poll support. Add busy-polling to sock_poll(). When poll/select have nothing to report, call the low-level sock_poll() again until we are out of time or we find something. Right now we poll every socket

Re: [PATCH v9 net-next 5/7] net: simple poll/select low latency socket poll

2013-06-05 Thread Eric Dumazet
On Wed, 2013-06-05 at 16:41 +0300, Eliezer Tamir wrote: On 05/06/2013 16:30, Eric Dumazet wrote: On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote: A very naive select/poll busy-poll support. Add busy-polling to sock_poll(). When poll/select have nothing to report, call the low-level

RE: [PATCH v9 net-next 5/7] net: simple poll/select low latency socket poll

2013-06-05 Thread Eric Dumazet
On Wed, 2013-06-05 at 14:49 +0100, David Laight wrote: I am a bit uneasy with this one, because an application polling() on one thousand file descriptors using select()/poll(), will call sk_poll_ll() one thousand times. Anything calling poll() on 1000 fds probably has performance issues

Re: [PATCH v9 net-next 5/7] net: simple poll/select low latency socket poll

2013-06-05 Thread Eric Dumazet
On Wed, 2013-06-05 at 06:56 -0700, Eric Dumazet wrote: This looks quite easy, by adding in include/uapi/asm-generic/poll.h #define POLL_LL 0x8000 And do the sk_poll_ll() call only if flag is set. I do not think we have to support select(), as it's a legacy interface, and people wanting ll
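The flag idea in the message above can be illustrated in user space. This is only a sketch under stated assumptions: POLL_LL and its value 0x8000 are taken from the mail as a proposal, while struct demo_pollfd, demo_wants_busy_poll() and demo_count_busy_poll() are hypothetical names made up for illustration and are not kernel code. It shows the gating only: a descriptor is busy-polled only when its events mask carries the opt-in bit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Value proposed in the mail; treated here as an assumption, not a uapi fact. */
#define POLL_LL 0x8000

/* Hypothetical stand-in for struct pollfd; events is unsigned so 0x8000 fits. */
struct demo_pollfd {
	int fd;
	unsigned short events;
};

/* True only when the caller opted in to busy polling for this descriptor. */
static bool demo_wants_busy_poll(const struct demo_pollfd *p)
{
	return (p->events & POLL_LL) != 0;
}

/* Count how many descriptors would take the (expensive) busy-poll path. */
static size_t demo_count_busy_poll(const struct demo_pollfd *fds, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (demo_wants_busy_poll(&fds[i]))
			count++;
	return count;
}
```

With such a gate, an application polling a thousand descriptors would pay the busy-poll cost only for the ones it flagged, which is the concern raised earlier in the thread.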

Re: [PATCH v9 net-next 5/7] net: simple poll/select low latency socket poll

2013-06-05 Thread Eric Dumazet
On Wed, 2013-06-05 at 16:41 +0300, Eliezer Tamir wrote: On 05/06/2013 16:30, Eric Dumazet wrote: I am a bit uneasy with this one, because an application polling() on one thousand file descriptors using select()/poll(), will call sk_poll_ll() one thousand times. But we call sk_poll_ll

Re: [PATCH v9 net-next 2/7] net: add low latency socket poll

2013-06-05 Thread Eric Dumazet
On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote: This is probably too big to be inlined, and nonblock should be a bool. It would also make sense to give end_time as a parameter, so that the polling() code could really give an end_time for the whole duration of poll(). (You then should

Re: [PATCH v9 net-next 2/7] net: add low latency socket poll

2013-06-05 Thread Eric Dumazet
On Wed, 2013-06-05 at 18:30 +0300, Eliezer Tamir wrote: On 05/06/2013 18:21, Eric Dumazet wrote: On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote: This is probably too big to be inlined, and nonblock should be a bool. It would also make sense to give end_time as a parameter, so

Re: [PATCH v9 net-next 2/7] net: add low latency socket poll

2013-06-05 Thread Eric Dumazet
On Wed, 2013-06-05 at 18:46 +0300, Eliezer Tamir wrote: On 05/06/2013 18:39, Eric Dumazet wrote: On Wed, 2013-06-05 at 18:30 +0300, Eliezer Tamir wrote: On 05/06/2013 18:21, Eric Dumazet wrote: It would also make sense to give end_time as a parameter, so that the polling() code could

Re: stop_machine lockup issue in 3.9.y.

2013-06-05 Thread Eric Dumazet
On Wed, 2013-06-05 at 14:11 -0700, Tejun Heo wrote: (cc'ing wireless crowd, tglx and Ingo. The original thread is at http://thread.gmane.org/gmane.linux.kernel/1500158/focus=55005 ) Hello, Ben. On Wed, Jun 05, 2013 at 01:58:31PM -0700, Ben Greear wrote: Hmm, wonder if I found it. I

Re: stop_machine lockup issue in 3.9.y.

2013-06-05 Thread Eric Dumazet
On Wed, 2013-06-05 at 20:14 -0700, Tejun Heo wrote: Ah, so, that's why it's showing up now. We probably have had the same issue all along but it used to be masked by the softirq limiting. Do you care to revive the 10 iterations limit so that it's limited by both the count and timing? We

Re: stop_machine lockup issue in 3.9.y.

2013-06-05 Thread Eric Dumazet
On Wed, 2013-06-05 at 20:41 -0700, Ben Greear wrote: On 06/05/2013 08:26 PM, Eric Dumazet wrote: On Wed, 2013-06-05 at 20:14 -0700, Tejun Heo wrote: Ah, so, that's why it's showing up now. We probably have had the same issue all along but it used to be masked by the softirq limiting

Re: stop_machine lockup issue in 3.9.y.

2013-06-05 Thread Eric Dumazet
On Wed, 2013-06-05 at 20:50 -0700, Ben Greear wrote: On 06/05/2013 08:46 PM, Eric Dumazet wrote: At Google we use a patch that triggers a warning if a thread holds the CPU without checking need_resched() for more than xx ms. Well, I'm sure that patch works nicely until the clock stops

Re: [PATCH] Fix lockup related to stop_machine being stuck in __do_softirq.

2013-06-05 Thread Eric Dumazet
On Wed, 2013-06-05 at 21:25 -0700, gree...@candelatech.com wrote: From: Ben Greear gree...@candelatech.com diff --git a/kernel/softirq.c b/kernel/softirq.c index 14d7758..f150ad6 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -204,6 +204,7 @@ EXPORT_SYMBOL(local_bh_enable_ip); *

Re: [PATCH 5/5] net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg

2013-06-05 Thread Eric Dumazet
On Thu, 2013-06-06 at 12:56 +1000, Michael Neuling wrote: On Thu, May 23, 2013 at 7:07 AM, Andy Lutomirski l...@amacapital.net wrote: MSG_CMSG_COMPAT is (AFAIK) not intended to be part of the API -- it's a hack that steals a bit to indicate to other networking code that a compat entry was

Re: [PATCH] net: Unbreak compat_sys_{send,recv}msg

2013-06-06 Thread Eric Dumazet
in sys_socketcall. Apparently I was suffering from underscore blindness the first time around. Signed-off-by: Andy Lutomirski l...@amacapital.net Eric, can you test this patch too? Yes, this fixes the problem as well on x86_64 Tested-by: Eric Dumazet eduma...@google.com Thanks ! PS: I

Re: [PATCH v2] Fix lockup related to stop_machine being stuck in __do_softirq.

2013-06-06 Thread Eric Dumazet
__do_softirq can hang is that it has a bail-out based on jiffies timeout, but in the lockup case, jiffies itself is not incremented. ... Signed-off-by: Ben Greear gree...@candelatech.com --- v2: Fix comments and reformat conditional per suggestions. Acked-by: Eric Dumazet eduma...@google.com
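The point of combining a time limit with an iteration limit can be shown with a small user-space sketch. It is not the patch itself: struct demo_budget, demo_may_restart(), demo_run() and DEMO_MAX_RESTART are hypothetical names, and the value 10 is taken from the "10 iterations limit" mentioned earlier in the thread. The sketch assumes a clock value that the caller passes in, so a stuck clock can be simulated.

```c
#include <stdbool.h>

/* Iteration cap, following the "10 iterations limit" discussed in the thread. */
#define DEMO_MAX_RESTART 10

/* Hypothetical budget: a deadline in clock ticks plus a restart counter. */
struct demo_budget {
	unsigned long end_time;
	int max_restart;
};

/*
 * Allow another pass only while the deadline has not passed AND the
 * restart counter is not exhausted. The signed difference keeps the
 * time comparison correct across counter wrap-around.
 */
static bool demo_may_restart(struct demo_budget *b, unsigned long now)
{
	bool in_time = (long)(now - b->end_time) < 0;

	return in_time && --b->max_restart > 0;
}

/* Run passes against a clock that never advances; return how many ran. */
static int demo_run(unsigned long frozen_now, unsigned long end_time)
{
	struct demo_budget b = { end_time, DEMO_MAX_RESTART };
	int passes = 0;

	do {
		passes++;	/* one pass of pending work would happen here */
	} while (demo_may_restart(&b, frozen_now));

	return passes;
}
```

If the clock is frozen before the deadline, the time check alone would never end the loop; the counter stops it after DEMO_MAX_RESTART passes. If the deadline has already passed, a single pass runs.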

Re: NULL pointer dereference when loading the gre module (3.10.0-rc4)

2013-06-06 Thread Eric Dumazet
On Thu, 2013-06-06 at 23:06 -0400, Steven Rostedt wrote: On Fri, Jun 07, 2013 at 12:16:56AM +0200, Steinar H. Gunderson wrote: Hi, In 3.10.0-rc4, I get this on boot: [ 16.871043] BUG: unable to handle kernel NULL pointer dereference at 0003 [ 16.879453] IP:

Re: 13GB dcache+inode cache hash tables

2013-06-25 Thread Eric Dumazet
On Tue, 2013-06-25 at 16:56 +0800, Daniel J Blueman wrote: As memory capacity increases, we see the dentry and inode cache hash tables grow to wild sizes [1], e.g. 13GB is consumed on a 4.5TB system. Perhaps a better approach adds a linear component to an exponent to give tuned scaling,

Re: [netlink] WARNING: at mm/vmalloc.c:1487 __vunmap()

2013-06-26 Thread Eric Dumazet
On Mon, 2013-06-17 at 22:09 +0200, Pablo Neira Ayuso wrote: I've been trying to trigger this bug here with no success using different communication configurations (userspace - userspace, userspace - kernelspace). I got more success with trinity ;) The address that vfree shows seems good
