[PATCH] get rid of ARCH_HAVE_XTIME_LOCK

2006-12-11 Thread Eric Dumazet
((weak)) in the declaration. linker should do the job. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux-2.6.19/kernel/timer.c 2006-12-11 11:25:50.0 +0100 +++ linux-2.6.19-ed/kernel/timer.c 2006-12-11 11:31:30.0 +0100 @@ -1020,11 +1020,9 @@ static inline void calc_load

[PATCH] Optimize calc_load()

2006-12-11 Thread Eric Dumazet
and nr_uninterruptible of all online CPUS, bringing foreign dirty cache lines. This patch is an optimization of calc_load() so that nr_active() is called only if we need it. The use of unlikely() is welcome since the condition is true only once every 5*HZ time. Signed-off-by: Eric Dumazet [EMAIL
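A minimal userspace sketch of the idea described in this snippet: the expensive count is only taken once the 5*HZ interval has elapsed, and the branch is marked unlikely. HZ, nr_active_stub() and calc_load_sketch() are illustrative assumptions, not the code from the truncated patch.

```c
#include <assert.h>

#define HZ 1000                      /* assumed tick rate */
#define LOAD_FREQ (5 * HZ)           /* recompute every 5 seconds */
#define unlikely(x) __builtin_expect(!!(x), 0)   /* GCC/Clang builtin */

static int expensive_calls;          /* counts how often the costly scan runs */
static long load_count = LOAD_FREQ;  /* ticks left until the next sample */

/* Hypothetical stand-in for nr_active(): pretend it walks every online CPU. */
static unsigned long nr_active_stub(void)
{
    expensive_calls++;
    return 3;
}

/* Returns the number of active tasks sampled, or 0 when no sample was due. */
static unsigned long calc_load_sketch(unsigned long ticks)
{
    unsigned long active = 0;

    load_count -= (long)ticks;
    if (unlikely(load_count < 0)) {
        active = nr_active_stub();   /* only pay the cost when a sample is due */
        do {
            load_count += LOAD_FREQ;
        } while (load_count < 0);
    }
    return active;
}
```

With one tick per call, the stub runs once per 5000 calls instead of on every call.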

[PATCH] constify pipe_buf_operations

2006-12-11 Thread Eric Dumazet
pipe/splice should use const pipe_buf_operations and file_operations struct pipe_inode_info has an unused field start : get rid of it. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux-2.6.19/include/linux/pipe_fs_i.h 2006-12-11 17:00:21.0 +0100 +++ linux-2.6.19-ed/include

Re: [PATCH] group xtime, xtime_lock, wall_to_monotonic, avenrun, calc_load_count fields together in ktimed

2006-12-11 Thread Eric Dumazet
Andrew Morton wrote: hm, the patch seems to transform a mess into a mess. I guess it's a messy problem. I agree that aggregating all the time-related things into a struct like this makes some sense. As does aggregating them all into a similar-looking namespace, but that'd probably be too

[PATCH] reorder struct pipe_buf_operations

2006-12-11 Thread Eric Dumazet
vmlinux.pre vmlinux
   text    data     bss     dec     hex filename
3268989  664356  492196 4425541  438745 vmlinux.pre
3268765  664356  492196 4425317  438665 vmlinux
So this patch reduces text size by 224 bytes on my x86_64 machine. Similar results on ia32. Signed-off-by: Eric Dumazet [EMAIL

[PATCH] Introduce jiffies_32 and related compare functions

2006-12-11 Thread Eric Dumazet
). Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux-2.6.19/include/linux/jiffies.h 2006-12-12 00:32:00.0 +0100 +++ linux-2.6.19-ed/include/linux/jiffies.h 2006-12-12 00:41:40.0 +0100 @@ -80,6 +80,11 @@ */ extern u64 __jiffy_data jiffies_64; extern unsigned

Re: [PATCH] Introduce jiffies_32 and related compare functions

2006-12-11 Thread Eric Dumazet
David Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Mon, 11 Dec 2006 23:58:06 +0100 Some subsystems don't need more than 32-bit timestamps. See for example net/ipv4/inetpeer.c and include/net/tcp.h : #define tcp_time_stamp ((__u32)(jiffies)) Because most timeouts
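A small sketch of how two 32-bit timestamps can be compared safely across counter wraparound, which is the kind of compare function this thread's subject refers to. The helper name time_after32_sketch() is hypothetical and not taken from the patch; the signed cast relies on two's-complement behaviour, which mainstream compilers provide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: true when 32-bit timestamp a is later than b,
 * even if the counter wrapped between the two samples. */
static int time_after32_sketch(uint32_t a, uint32_t b)
{
    /* The unsigned subtraction wraps modulo 2^32; reading the result as a
     * signed value gives the direction, valid while the two stamps are
     * less than 2^31 ticks apart. */
    return (int32_t)(b - a) < 0;
}
```

A plain `a > b` would give the wrong answer for the pair (5, 0xFFFFFFF0), where 5 was sampled after the counter wrapped.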

Re: [PATCH] Introduce jiffies_32 and related compare functions

2006-12-11 Thread Eric Dumazet
David Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Tue, 12 Dec 2006 04:47:14 +0100 I doubt being able to extend the expiration of a dst above 2^32 ticks (49 days if HZ=1000, 198 days if HZ=250) is worth the ram wastage. And this doesn't apply for all jiffies uses because
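The 49-day and 198-day figures quoted here follow from dividing 2^32 ticks by the tick rate and by the 86400 seconds in a day. A quick sketch of that arithmetic (wrap_days() is an illustrative name):

```c
#include <assert.h>
#include <stdint.h>

/* Whole days before a 32-bit tick counter wraps at a given tick rate (HZ). */
static uint64_t wrap_days(uint64_t hz)
{
    const uint64_t ticks = 1ULL << 32;      /* 2^32 ticks */
    const uint64_t secs_per_day = 86400;

    return ticks / (hz * secs_per_day);     /* integer division rounds down */
}
```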

Re: [PATCH] Introduce jiffies_32 and related compare functions

2006-12-11 Thread Eric Dumazet
David Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Tue, 12 Dec 2006 05:09:23 +0100 We definitely *like* being able to use bigger timeouts on 64-bit platforms. Not that they are mandatory since the same application should run fine on a 32-bit kernel. But as the standard type

[PATCH] Introduce time_data, a new structure to hold jiffies, xtime, xtime_lock, wall_to_monotonic, calc_load_count and avenrun

2006-12-13 Thread Eric Dumazet
/jiffies in time_data. This patch does the thing for i386 and x86_64. avenrun, xtime, xtime_lock, wall_to_monotonic, are now temporarily defined as macros to make this patch not too invasive, but we can in future patches gradually delete these macros. Signed-off-by: Eric Dumazet [EMAIL PROTECTED

Re: kref refcnt and false positives

2006-12-14 Thread Eric Dumazet
Andrew Morton wrote: On Wed, 13 Dec 2006 16:12:46 -0800 Greg KH [EMAIL PROTECTED] wrote: Original comment seemed to indicate that this conditional thing was performance related. Is it really? If not, we should consider the below patch. Yes, it's a performance gain and I don't see how this

Re: [PATCH] x86_64: Make the NUMA hash function nodemap allocation dynamic and remove NODEMAPSIZE

2006-11-27 Thread Eric Dumazet
applies to 2.6.19-rc4 and has been tested. This patch needs testing on a K8 NUMA platform. Thanks to Eric Dumazet and Andi Kleen for their improvement suggestions. I had the patch in, but had to drop it again because it makes one of my test systems triple fault. Haven't done much investigation

Re: [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations

2006-11-27 Thread Eric Dumazet
Andrew Morton wrote: On Thu, 23 Nov 2006 11:57:29 +0100 Eric Dumazet [EMAIL PROTECTED] wrote: On 32-bit SMP platforms, the 64-bit i_size is protected by a seqcount (i_size_seqcount). When i_size is read or written, i_size_seqcount is read/written as well, so it makes sense to group these two

Re: [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations

2006-11-28 Thread Eric Dumazet
Andrew Morton wrote: This all depends on the offset of the inode, and you don't know what that is. offsetof(ext3_inode_info, vfs_inode) != offsetof(nfs_inode, vfs_inode), etc. Doh... yes you are absolutely right :) I feel dumb now :( - To unsubscribe from this list: send the line

Re: [PATCH] i386-pda UP optimization

2006-11-29 Thread Eric Dumazet
On Wednesday 29 November 2006 00:12, Jeremy Fitzhardinge wrote: Hi Eric, Could you try this patch out and see if it makes much performance difference for you. You should apply this on top of the %fs patch I posted earlier (and use the %fs patch as the baseline for your comparisons). Hi

Re: [RCU] adds a prefetch() in rcu_do_batch()

2006-11-30 Thread Eric Dumazet
On Thursday 30 November 2006 02:25, Paul E. McKenney wrote: On Wed, Nov 22, 2006 at 04:02:29PM +0100, Eric Dumazet wrote: On some workloads, (for example when a lot of close() syscalls are done), RCU qlen can be quite large, and RCU heads are no longer in cpu cache when rcu_do_batch
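A userspace sketch of a callback-batch loop that prefetches the next list head before invoking the current callback, roughly the idea named in this thread's subject. struct cb_head, count_cb() and run_batch_sketch() are hypothetical stand-ins, not kernel code, and __builtin_prefetch is a GCC/Clang builtin used here in place of the kernel's prefetch().

```c
#include <assert.h>
#include <stddef.h>

struct cb_head {                        /* hypothetical stand-in for struct rcu_head */
    struct cb_head *next;
    void (*func)(struct cb_head *head);
};

static int callbacks_run;

/* Example callback: a real one would typically free the enclosing object,
 * so the loop must not touch head again after calling it. */
static void count_cb(struct cb_head *head)
{
    head->next = NULL;                  /* simulate the object going away */
    callbacks_run++;
}

static void run_batch_sketch(struct cb_head *list)
{
    while (list) {
        struct cb_head *next = list->next;  /* read before the callback runs */

        __builtin_prefetch(next);           /* hint only: start fetching the next head */
        list->func(list);
        list = next;
    }
}
```

The prefetch is purely a hint; the loop behaves the same without it, only potentially with more cache misses on long lists.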

Re: PATCH? rcu_do_batch: fix a pure theoretical memory ordering race

2006-12-03 Thread Eric Dumazet
Oleg Nesterov wrote: On top of rcu-add-a-prefetch-in-rcu_do_batch.patch rcu_do_batch: struct rcu_head *next, *list; while (list) { next = list->next; -- [1] list->func(list); list = next; } We can't trust *list

Re: PATCH? rcu_do_batch: fix a pure theoretical memory ordering race

2006-12-03 Thread Eric Dumazet
Oleg Nesterov wrote: On 12/03, Eric Dumazet wrote: Oleg Nesterov wrote: On top of rcu-add-a-prefetch-in-rcu_do_batch.patch rcu_do_batch: struct rcu_head *next, *list; while (list) { next = list->next; -- [1] list->func(list

Re: PATCH? rcu_do_batch: fix a pure theoretical memory ordering race

2006-12-03 Thread Eric Dumazet
Oleg Nesterov wrote: On 12/03, Eric Dumazet wrote: Oleg Nesterov wrote: Yes, but how is it related to RCU ? I mean, rcu_do_batch() is just a loop like others in kernel. The loop itself is not buggy, but can call a buggy function, you are right. int start_me_again

[PATCH] SLAB : use a multiply instead of a divide in obj_to_index()

2006-12-04 Thread Eric Dumazet
),%ebx Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux-2.6.19/mm/slab.c 2006-12-04 11:50:19.0 +0100 +++ linux-2.6.19-ed/mm/slab.c 2006-12-04 17:25:02.0 +0100 @@ -371,6 +371,19 @@ static void kmem_list3_init(struct kmem_ } while (0) /* + * Define the reciprocal

Re: [PATCH] SLAB : use a multiply instead of a divide in obj_to_index()

2006-12-04 Thread Eric Dumazet
(%esp) // useless mov    (%esp),%ebx Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux-2.6.19/include/linux/reciprocal_div.h 1970-01-01 01:00:00.0 +0100 +++ linux-2.6.19-ed/include/linux/reciprocal_div.h 2006-12-04 19:01:44.0 +0100 @@ -0,0 +1,30 @@ +#ifndef

Re: [PATCH] SLAB : use a multiply instead of a divide in obj_to_index()

2006-12-04 Thread Eric Dumazet
) // useless mov (%esp),%ebx Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux-2.6.19/include/linux/reciprocal_div.h 1970-01-01 01:00:00.0 +0100 +++ linux-2.6.19-ed/include/linux/reciprocal_div.h 2006-12-04 23:12:34.0 +0100 @@ -0,0 +1,30 @@ +#ifndef

Re: [PATCH] SLAB : use a multiply instead of a divide in obj_to_index()

2006-12-04 Thread Eric Dumazet
David Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Mon, 04 Dec 2006 22:34:29 +0100 On a 200 MHz sparcv9 machine, the division takes 64 cycles instead of 1 cycle for a multiply. For UltraSPARC I and II (which is what this 200mhz guy probably is), it's 4 cycle latency
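A userspace sketch of the general multiply-by-reciprocal technique this thread is about: the divisor's scaled reciprocal is computed once, and each later division becomes a 64-bit multiply and a shift. The function names below are illustrative and the bodies are an assumption about the technique, not a copy of the truncated patch.

```c
#include <assert.h>
#include <stdint.h>

/* Precompute ceil(2^32 / k). Only meaningful for k >= 2 in this sketch:
 * for k == 1 the value 2^32 does not fit in 32 bits. */
static uint32_t reciprocal_value_sketch(uint32_t k)
{
    uint64_t val = (1ULL << 32) + (k - 1);

    return (uint32_t)(val / k);
}

/* Approximate a / k as (a * r) >> 32, where r = reciprocal_value_sketch(k).
 * The result equals a / k as long as a * k < 2^32. */
static uint32_t reciprocal_divide_sketch(uint32_t a, uint32_t r)
{
    return (uint32_t)(((uint64_t)a * r) >> 32);
}
```

The one real division happens when the reciprocal is set up; the hot path then only multiplies. The range restriction matters: for large dividends the approximation can be off by one, so this only suits cases where the dividend is known to be small, such as an offset inside a slab.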

Re: [discuss] [PATCH] allow CONFIG_FRAME_POINTER for x86-64

2005-09-09 Thread Eric Dumazet
Philippe Elie wrote: On Fri, 09 Sep 2005 at 11:23 +, Andi Kleen wrote: Indeed. Someone must have fixed it. But why would anyone want frame pointers on x86-64? Oprofile can use it, I thought it was already used but apparently only to backtrace userspace actually. Hi Philippe Last

Re: [PATCH] lib: gcd: prevent possible div by 0

2012-09-10 Thread Eric Dumazet
On Sun, 2012-09-09 at 17:03 +0200, Davidlohr Bueso wrote: Account for properties when a and/or b are 0: gcd(0, 0) = 0 gcd(a, 0) = a gcd(0, b) = b Cc: sta...@vger.kernel.org Signed-off-by: Davidlohr Bueso d...@gnu.org --- lib/gcd.c |3 +++ 1 file changed, 3 insertions(+) diff
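A minimal sketch of Euclid's algorithm with the zero cases listed above handled before any modulo is taken. gcd_sketch() is an illustrative name, and the body is an assumption about how such a guard can look, not the contents of the truncated diff.

```c
#include <assert.h>

static unsigned long gcd_sketch(unsigned long a, unsigned long b)
{
    unsigned long r;

    if (a < b) {                    /* ensure a >= b */
        unsigned long t = a;
        a = b;
        b = t;
    }
    if (b == 0)
        return a;                   /* gcd(a, 0) = a, gcd(0, b) = b, gcd(0, 0) = 0 */

    while ((r = a % b) != 0) {      /* b is non-zero here, so no division by zero */
        a = b;
        b = r;
    }
    return b;
}
```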

Re: net/ipv4/route.c:645 suspicious rcu_dereference_check() usage!

2012-09-10 Thread Eric Dumazet
On Mon, 2012-09-10 at 09:44 +0100, ch...@chris-wilson.co.uk wrote: I've not seen this reported yet, so here's a warning that happens occasionally: [192979.475833] [192979.475840] === [192979.475841] [ INFO: suspicious RCU usage. ] [192979.475844] 3.6.0-rc2+ #33

Re: [PATCH v2 10/10] thp: implement refcounting for huge zero page

2012-09-10 Thread Eric Dumazet
On Mon, 2012-09-10 at 16:13 +0300, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com H. Peter Anvin doesn't like huge zero page which sticks in memory forever after the first allocation. Here's an implementation of lockless refcounting for huge zero page. ...

Re: [PATCH v2 10/10] thp: implement refcounting for huge zero page

2012-09-10 Thread Eric Dumazet
On Mon, 2012-09-10 at 17:44 +0300, Kirill A. Shutemov wrote: On Mon, Sep 10, 2012 at 04:02:39PM +0200, Eric Dumazet wrote: On Mon, 2012-09-10 at 16:13 +0300, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com H. Peter Anvin doesn't like huge zero page

Re: [PATCH v2 10/10] thp: implement refcounting for huge zero page

2012-09-10 Thread Eric Dumazet
On Mon, 2012-09-10 at 17:44 +0300, Kirill A. Shutemov wrote: Yes, disabling preemption before alloc_pages() and enabling after atomic_set() looks reasonable. Thanks. In fact, as alloc_pages(GFP_TRANSHUGE | __GFP_ZERO, HPAGE_PMD_ORDER); might sleep, it would be better to disable preemption

[PATCH net-next v1] net: use a per task frag allocator

2012-09-19 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com We currently use a per socket page reserve for tcp_sendmsg() operations. This page is used to build fragments for skbs. It's done to increase the probability of coalescing small write() calls into single segments in skbs still in write queue (not yet sent

Re: [PATCH net-next v1] net: use a per task frag allocator

2012-09-21 Thread Eric Dumazet
, bnx2x, tg3, mellanox mlx4) Signed-off-by: Eric Dumazet eduma...@google.com Cc: Ben Hutchings bhutchi...@solarflare.com Cc: Vijay Subramanian subramanian.vi...@gmail.com Cc: Alexander Duyck alexander.h.du...@intel.com --- v2: uses existing page_frag structure to hold page/offset/size convert

Re: [PATCH net-next v1] net: use a per task frag allocator

2012-09-22 Thread Eric Dumazet
On Fri, 2012-09-21 at 13:27 -0700, Vijay Subramanian wrote: I get the following compile error with the newer version of the patch net/sched/em_meta.c: In function ‘meta_int_sk_sendmsg_off’: net/sched/em_meta.c:464: error: ‘struct sock’ has no member named ‘sk_sndmsg_off’ make[1]: ***

Re: iwl3945: order 5 allocation during ifconfig up; vm problem?

2012-09-11 Thread Eric Dumazet
On Tue, 2012-09-11 at 16:25 -0700, Andrew Morton wrote: Asking for a 256k allocation is pretty crazy - this is an operating system kernel, not a userspace application. I'm wondering if this is due to a recent change, but I'm having trouble working out where the allocation call site is. --

Re: [PATCH] netfilter/iptables: Fix log-level processing

2012-09-12 Thread Eric Dumazet
On Wed, 2012-09-12 at 00:46 -0700, Joe Perches wrote: auto75914...@hushmail.com reports that iptables does not correctly output the KERN_level. $IPTABLES -A RULE_0_in -j LOG --log-level notice --log-prefix DENY in: result with linux 3.6-rc5 Sep 12 06:37:29 x kernel: 5DENY in:

Re: [PATCH v2 1/3] hrtimer: add hrtimer_init_cpu()

2012-09-12 Thread Eric Dumazet
On Wed, 2012-09-12 at 16:13 +0200, Stephane Eranian wrote: void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t clock_id, enum hrtimer_mode mode) { debug_object_init_on_stack(timer, hrtimer_debug_descr); - __hrtimer_init(timer, clock_id, mode); +

Re: xt_nat_init: BUG: unable to handle kernel NULL pointer dereference at 00000000000000e0

2012-09-13 Thread Eric Dumazet
On Thu, 2012-09-13 at 17:16 +0800, Fengguang Wu wrote: Hi Patrick, This happens in today's linux-next tree and is pretty reproducible. Bisection has been started. [1.834544] nf_conntrack version 0.5.0 (1786 buckets, 7144 max) [1.835406] ctnetlink v0.93: registering with nfnetlink.

Re: [PATCH] sch_red: fix weighted average calculation

2012-09-13 Thread Eric Dumazet
On Thu, 2012-09-13 at 09:43 -0400, Cyril Chemparathy wrote: This patch fixes an apparent bug in the running weighted average calculation used in the RED algorithm. Going by the described formula: qavg = qavg*(1-W) + backlog*W => qavg = qavg + (backlog - qavg) * W ... with W

RE: [PATCH] sch_red: fix weighted average calculation

2012-09-14 Thread Eric Dumazet
On Fri, 2012-09-14 at 13:01 +, Dowdal, John wrote: Eric, thank you for reviewing the code. I now see the problem with the patch since backlog is an integer and qavg is a fixed point number at logW. We are considering another patch to update the comments to this code (with the actual
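A sketch of why the integer/fixed-point distinction mentioned here matters. If the average is stored scaled by 2^Wlog and W = 2^-Wlog, then multiplying the update qavg = qavg + (backlog - qavg) * W by 2^Wlog leaves the backlog unscaled and shifts only the stored average. The function name and parameter names are illustrative assumptions.

```c
#include <assert.h>

/* qavg_fp holds the average scaled by 2^wlog (fixed point); backlog is a
 * plain integer. Returns the updated scaled average. */
static unsigned long red_avg_sketch(unsigned long qavg_fp,
                                    unsigned long backlog,
                                    unsigned int wlog)
{
    /* Unsigned wraparound keeps this correct even when the backlog is
     * smaller than the current average. */
    return qavg_fp + (backlog - (qavg_fp >> wlog));
}
```

With wlog = 3 and a constant backlog of 8, the scaled average goes 0, 8, 15, which corresponds to real averages 0, 1, 1.875, matching the formula with W = 1/8.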

Re: more interrupts (lower performance) in bare-metal compared with running VM

2012-07-27 Thread Eric Dumazet
On Fri, 2012-07-27 at 22:09 -0500, sheng qiu wrote: Hi all, i am comparing network throughput performance under bare-metal case with that running VM with assigned-device (assigned NIC). i have two physical machines (each has a 10Gbit NIC), one is used as remote server (run netserver) and

Re: [PATCH 2/3] Introduce percpu rw semaphores

2012-07-28 Thread Eric Dumazet
On Sat, 2012-07-28 at 12:41 -0400, Mikulas Patocka wrote: Introduce percpu rw semaphores When many CPUs are locking a rw semaphore for read concurrently, cache line bouncing occurs. When a CPU acquires rw semaphore for read, the CPU writes to the cache line holding the semaphore.

Re: [dm-devel] [PATCH 2/3] Introduce percpu rw semaphores

2012-07-29 Thread Eric Dumazet
On Sun, 2012-07-29 at 01:13 -0400, Mikulas Patocka wrote: Each cpu should have its own rw semaphore in its cache, so I don't see a problem there. When you change block size, all 4096 rw semaphores are locked for write, but changing block size is not a performance sensitive operation.

Re: [dm-devel] [PATCH 2/3] Introduce percpu rw semaphores

2012-07-29 Thread Eric Dumazet
On Sun, 2012-07-29 at 12:10 +0200, Eric Dumazet wrote: You can probably design something needing no more than 4 bytes per cpu, and this thing could use non locked operations as bonus. like the following ... Coming back from my bike ride, here is a more polished version with proper

Re: [PATCH] fs/proc: Move kfree outside pde_unload_lock

2012-08-28 Thread Eric Dumazet
On Tue, 2012-08-28 at 23:38 +0300, Alexey Dobriyan wrote: Nothing can stop RCU! After running modprobe;rmmod in a loop and cat in another loop for a while rmmod got stuck in D-state inside remove_proc_entry() with trace amounts of CPU time being consumed. It didn't oops, though.

Re: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-29 Thread Eric Dumazet
On Tue, 2012-08-28 at 21:34 -0700, H.K. Jerry Chu wrote: IMHO 31secs seem a little short. Why not change it to 6 as well because 63 secs still beats 93secs with 3sec initRTO and 5 retries. Jerry My rationale was that such an increase was going to amplify the impact of SYN attacks by 20% (if we

Re: [PATCH] fs/proc: Move kfree outside pde_unload_lock

2012-08-29 Thread Eric Dumazet
On Wed, 2012-08-29 at 16:50 +0300, Alexey Dobriyan wrote: On Wed, Aug 29, 2012 at 7:11 AM, Eric Dumazet eric.duma...@gmail.com wrote: I'll polish this patch once LKS/LPC is over... It should oops in the following way (excuse Gmail please): PDEO is removed from lists -pde_users is 0 PDE

Re: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-30 Thread Eric Dumazet
On Wed, 2012-08-29 at 10:25 -0700, H.K. Jerry Chu wrote: But it probably matters slightly more for TCP Fast Open (the server side patch has been completed and will be posted soon, after I finish breaking it up into smaller pieces for ease of review purpose), when a full socket will be created

Re: [PATCH v2 1/2] 6lowpan: Make a copy of skb's delivered to 6lowpan

2012-08-31 Thread Eric Dumazet
On Wed, 2012-08-29 at 22:39 -0400, Alan Ott wrote: Since lowpan_process_data() modifies the skb (by calling skb_pull()), we need our own copy so that it doesn't affect the data received by other protocols (in this case, af_ieee802154). Signed-off-by: Alan Ott a...@signal11.us ---

Re: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-31 Thread Eric Dumazet
updated to describe the current settings. The same goes for the documentation file Documentation/networking/ip-sysctl.txt. Signed-off-by: Alexander Bergmann a...@linlab.net --- Thanks for your patience and followup, this seems good to me ! Acked-by: Eric Dumazet eduma...@google.com

Re: [PATCH] bnx2: update bnx2-mips-09 firmware to bnx2-mips-09-6.2.1b

2012-08-08 Thread Eric Dumazet
On Wed, 2012-08-08 at 08:17 +0200, Willy Tarreau wrote: Well, if the drivers provided with the kernel don't work out of the box anymore, maybe we should also move them to a separate repository. All it is going to do otherwise is to cause invalid bug reports because users don't understand

Re: [PATCH] bnx2: update bnx2-mips-09 firmware to bnx2-mips-09-6.2.1b

2012-08-08 Thread Eric Dumazet
On Wed, 2012-08-08 at 08:49 +0200, Willy Tarreau wrote: Hi Eric, On Wed, Aug 08, 2012 at 08:27:52AM +0200, Eric Dumazet wrote: On Wed, 2012-08-08 at 08:17 +0200, Willy Tarreau wrote: Well, if the drivers provided with the kernel don't work out of the box anymore, maybe we should

Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Dumazet
this down and it seems to be the following commit: commit be9f4a44e7d41cee50ddb5f038fc2391cbbb4046 Author: Eric Dumazet eduma...@google.com Date: Thu Jul 19 07:34:03 2012 + ipv4: tcp: remove per net tcp_sock It doesn't revert totally cleanly, but after fixing up the rejections

Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Dumazet
On Wed, 2012-08-08 at 15:26 -0400, Paul Moore wrote: On Wednesday, August 08, 2012 12:14:42 PM John Stultz wrote: So I bisected this down and it seems to be the following commit: commit be9f4a44e7d41cee50ddb5f038fc2391cbbb4046 Author: Eric Dumazet eduma...@google.com Date: Thu Jul 19

Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Dumazet
On Wed, 2012-08-08 at 15:50 -0400, Paul Moore wrote: Yep. I was just trying to see if there was a way we could avoid having to make it conditional on CONFIG_SECURITY, but I think this is a better approach than the alternatives. I'm also looking into making sure we get a sane LSM label on

Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Dumazet
On Wed, 2012-08-08 at 12:49 -0700, John Stultz wrote: I can't comment on the patch itself, but I tested it against Linus' HEAD and it seems to resolve the oops on shutdown for me. OK, thanks !

Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Dumazet
On Wed, 2012-08-08 at 15:59 -0400, Eric Paris wrote: Seems wrong. We shouldn't ever need ifdef CONFIG_SECURITY in core code. Sure but it seems include file misses an accessor for this. We could add it on a future cleanup patch, as Paul mentioned. Ifndef CONF_SECURITY then

Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Dumazet
On Wed, 2012-08-08 at 22:09 +0200, Eric Dumazet wrote: On Wed, 2012-08-08 at 15:59 -0400, Eric Paris wrote: Seems wrong. We shouldn't ever need ifdef CONFIG_SECURITY in core code. Sure but it seems include file misses an accessor for this. We could add it on a future cleanup patch

Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Dumazet
On Wed, 2012-08-08 at 16:46 -0400, Paul Moore wrote: On Wednesday, August 08, 2012 10:32:52 PM Eric Dumazet wrote: On Wed, 2012-08-08 at 22:09 +0200, Eric Dumazet wrote: On Wed, 2012-08-08 at 15:59 -0400, Eric Paris wrote: Seems wrong. We shouldn't ever need ifdef CONFIG_SECURITY

[PATCH net-next] time: jiffies_delta_to_clock_t() helper to the rescue

2012-08-09 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com Various /proc/net files sometimes report crazy timer values, expressed in clock_t units. This happens when an expired timer delta (expires - jiffies) is passed to jiffies_to_clock_t(). This function has an overflow in : return div_u64((u64)x * TICK_NSEC
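A simplified userspace sketch of the problem and the kind of helper the subject line names: a timer that has already expired yields a negative delta, and converting that as if it were a large unsigned tick count produces a nonsense value, so the delta is clamped to zero first. The constants and function names are illustrative assumptions, and the conversion is reduced to the case where the tick rate is a multiple of the clock_t rate.

```c
#include <assert.h>

#define HZ_SKETCH      1000L   /* assumed tick rate */
#define USER_HZ_SKETCH 100L    /* assumed clock_t rate */

/* Simplified conversion, valid when HZ is a multiple of USER_HZ. */
static unsigned long ticks_to_clock_t_sketch(unsigned long ticks)
{
    return ticks / (HZ_SKETCH / USER_HZ_SKETCH);
}

/* Clamp an already-expired (negative) delta to 0 before converting. */
static unsigned long delta_to_clock_t_sketch(long delta)
{
    return ticks_to_clock_t_sketch(delta > 0 ? (unsigned long)delta : 0UL);
}
```

Passing the raw negative delta to the unclamped conversion gives a huge number, which is the kind of "crazy timer value" described above.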

Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-09 Thread Eric Dumazet
On Thu, 2012-08-09 at 09:30 -0400, Paul Moore wrote: In the case of a TCP syn-recv and timewait ACK things are a little less clear. Eric (Dumazet), it looks like we have a socket in tcp_v4_reqsk_send_ack() and tcp_v4_timewait_ack(), any reason why we can't propagate the socket down

[PATCH] ipv4: tcp: security_sk_alloc() needed for unicast_sock

2012-08-09 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com commit be9f4a44e7d41cee (ipv4: tcp: remove per net tcp_sock) added a selinux regression, reported and bisected by John Stultz selinux_ip_postroute_compat() expects to find a valid sk->sk_security pointer, but this field is NULL for unicast_sock Fix

Re: [PATCH] ipv4: tcp: security_sk_alloc() needed for unicast_sock

2012-08-09 Thread Eric Dumazet
On Thu, 2012-08-09 at 11:07 -0400, Paul Moore wrote: Is it possible to do the call to security_sk_alloc() in the ip_init() function or does the per-cpu nature of the socket make this a pain? It's a pain, if we want NUMA affinity. Here, each cpu should get memory from its closest node.

Re: [PATCH] ipv4: tcp: security_sk_alloc() needed for unicast_sock

2012-08-09 Thread Eric Dumazet
On Thu, 2012-08-09 at 12:05 -0400, Eric Paris wrote: On Thu, Aug 9, 2012 at 11:36 AM, Eric Dumazet eric.duma...@gmail.com wrote: On Thu, 2012-08-09 at 11:07 -0400, Paul Moore wrote: Is it possible to do the call to security_sk_alloc() in the ip_init() function or does the per-cpu

Re: [PATCH] ipv4: tcp: security_sk_alloc() needed for unicast_sock

2012-08-09 Thread Eric Dumazet
On Thu, 2012-08-09 at 16:06 -0400, Eric Paris wrote: NAK. I personally think commit be9f4a44e7d41cee should be reverted until it is fixed. Let me explain what all I believe it broke and how. Suggesting to revert this commit while we have known working fixes is a bit of a strange reaction.

Re: [PATCH] ipv4: tcp: security_sk_alloc() needed for unicast_sock

2012-08-09 Thread Eric Dumazet
On Thu, 2012-08-09 at 14:53 -0700, Casey Schaufler wrote: On 8/9/2012 2:29 PM, Eric Dumazet wrote: smack_sk_alloc_security() uses smk_of_current() : What can be the meaning of smk_of_current() in the context of 'kernel' sockets... Yes, and all of its callers - to date - have had

[PATCH] ipv4: tcp: unicast_sock should not land outside of TCP stack

2012-08-09 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com commit be9f4a44e7d41cee (ipv4: tcp: remove per net tcp_sock) added a selinux regression, reported and bisected by John Stultz selinux_ip_postroute_compat() expects to find a valid sk->sk_security pointer, but this field is NULL for unicast_sock It turns out

Re: [PATCH] task_work: add a scheduling point in task_work_run()

2012-08-21 Thread Eric Dumazet
On Tue, 2012-08-21 at 16:37 -0400, Mimi Zohar wrote: We're here, because fput() called schedule_work() to delay the last fput(). The execution needs to take place before the syscall returns to userspace. Need to read __schedule()... Do you know if cond_resched() can guarantee that it will

Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-21 Thread Eric Dumazet
On Tue, 2012-08-21 at 23:07 -0500, Larry Finger wrote: Hi, The commit entitled tcp: reduce out_of_order memory use turns out to cause problems with a number of USB drivers. The first one called to my attention was for staging/r8712u. For this driver, there are problems with SSL

Re: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-22 Thread Eric Dumazet
On Wed, 2012-08-22 at 01:29 +0200, Alex Bergmann wrote: Hi David, I'm not 100% sure, but it looks like I found an RFC mismatch with the current default values of the TCP implementation. Alex From 8b854a525eb45f64ad29dfab16f9d9f681e84495 Mon Sep 17 00:00:00 2001 From: Alexander

Re: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-22 Thread Eric Dumazet
On Wed, 2012-08-22 at 10:48 +0200, Alex Bergmann wrote: On 08/22/2012 10:06 AM, Eric Dumazet wrote: Prior to 9ad7c049 the timeout was defined with 189secs. Now we have only a timeout of 63secs. ((2 << 5) - 1) * 3 secs = 189 secs ((2 << 5) - 1) * 1 secs = 63 secs
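The 189-second and 63-second totals quoted here come from an exponential backoff in which each wait doubles. A sketch of that arithmetic, assuming the sender waits one further doubled interval after its last retransmission before giving up (function names are illustrative):

```c
#include <assert.h>

/* Seconds from the first SYN until the last retransmission is sent:
 * init_rto + 2*init_rto + ... (retries terms) = init_rto * (2^retries - 1). */
static unsigned int last_retransmit_at(unsigned int init_rto, unsigned int retries)
{
    return init_rto * ((1u << retries) - 1);
}

/* Seconds until the attempt is abandoned: one more doubled interval is
 * waited after the last retransmission, giving init_rto * (2^(retries+1) - 1). */
static unsigned int give_up_at(unsigned int init_rto, unsigned int retries)
{
    return init_rto * ((2u << retries) - 1);
}
```

With 5 retries this gives 63 seconds for a 1-second initial timeout and 189 seconds for a 3-second one; the last retransmission itself goes out at 31 and 93 seconds respectively.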

Re: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-22 Thread Eric Dumazet
On Wed, 2012-08-22 at 11:29 +0200, Alex Bergmann wrote: Actual 6 SYN frames are sent. The initial one and 5 retries. first one had a t0 + 0 delay. How can it count ??? The kernel is waiting another 32 seconds for a SYN+ACK and then gives the ETIMEDOUT back to userspace. Do you mean

Re: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-22 Thread Eric Dumazet
On Wed, 2012-08-22 at 12:00 +0200, Eric Dumazet wrote: On Wed, 2012-08-22 at 11:29 +0200, Alex Bergmann wrote: Actual 6 SYN frames are sent. The initial one and 5 retries. first one had a t0 + 0 delay. How can it count ??? The kernel is waiting another 32 seconds for a SYN+ACK

Re: [PATCH] fs/proc: Move kfree outside pde_unload_lock

2012-08-22 Thread Eric Dumazet
On Wed, 2012-08-22 at 11:38 -0500, Nathan Zimmer wrote: This moves a kfree outside a spinlock to help scaling on larger (512 core) systems. I ran a simple test which just reads from /proc/cpuinfo. Lower is better, as you can see the worst case scenario is improved. baseline

Re: [PATCH] fs/proc: Move kfree outside pde_unload_lock

2012-08-22 Thread Eric Dumazet
On Wed, 2012-08-22 at 20:28 +0200, Eric Dumazet wrote: That's interesting, but if you really want this to fly, one RCU conversion would be much better ;) pde_users would be an atomic_t and you would avoid the spinlock contention. Here is what I had in mind, I would be interested to know

Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-22 Thread Eric Dumazet
On Wed, 2012-08-22 at 16:33 -0500, Larry Finger wrote: On 08/22/2012 12:15 AM, Eric Dumazet wrote: This particular commit is the start of a patches batch that ended in the generic TCP coalescing mechanism. It is known to have problems on drivers doing skb_clone() in their rx path

Re: [PATCH] staging: rtl8192e: use is_zero_ether_addr() instead of memcmp()

2012-08-23 Thread Eric Dumazet
On Thu, 2012-08-23 at 15:19 +0800, Wei Yongjun wrote: From: Wei Yongjun yongjun_...@trendmicro.com.cn Use is_zero_ether_addr() instead of directly using memcmp() to determine if the ethernet address is all zeros. spatch with a semantic match was used to find this problem.

Re: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-23 Thread Eric Dumazet
On Thu, 2012-08-23 at 13:58 +0200, Alex Bergmann wrote: On 08/22/2012 06:41 PM, H.K. Jerry Chu wrote: This issue occurred to me right after I submitted the patch for RFC6298. I did not commit any more change because RFC compliance aside, 180secs just seem like eternity in the Internet age.

Re: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-23 Thread Eric Dumazet
*/ #define TCP_SYNACK_RETRIES 5 /* number of times to retry passive opening a Acked-by: Eric Dumazet eduma...@google.com A change of the comment might be good, to help future readers.

RE: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-23 Thread Eric Dumazet
On Thu, 2012-08-23 at 13:35 +0100, David Laight wrote: I would suggest to increase TCP_SYN_RETRIES from 5 to 6. 180 secs is eternity, but 31 secs is too small. Wasn't the intention of the long delay to allow a system acting as a router to reboot? I suspect that is why it (and some

Re: [REGRESSION] 3.6-rc2 and 3.6-rc3: TCP/IP network connection hang

2012-08-23 Thread Eric Dumazet
On Thu, 2012-08-23 at 22:35 +0200, Martin Steigerwald wrote: Hi! It's a bit difficult to describe. With 3.6-rc2 and 3.6-rc3 on a Lenovo ThinkPad T520 from Linus git, I get occasional network hangs: On for example sending a small mail via SMTP to my Debian Squeeze based server via an ASUS

Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-23 Thread Eric Dumazet
On Thu, 2012-08-23 at 15:57 -0500, Larry Finger wrote: On 08/22/2012 11:03 PM, Eric Dumazet wrote: Changing the allocation size removes the problem ? That's really strange. If you try different sizes in the 9100-30720 range, can you pinpoint the failure threshold ? The allocation size

Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 09:09 -0500, Larry Finger wrote: With kernel 3.6-rc2, the error changes to the following: --2012-08-24 08:26:42-- https://bugzilla.redhat.com/show_bug.cgi?id=847525 Resolving

Re: [PATCH] fs/proc: Move kfree outside pde_unload_lock

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 09:48 -0500, Nathan Zimmer wrote: On Wed, Aug 22, 2012 at 11:42:58PM +0200, Eric Dumazet wrote: On Wed, 2012-08-22 at 20:28 +0200, Eric Dumazet wrote: That's interesting, but if you really want this to fly, one RCU conversion would be much better

Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 10:49 -0500, Larry Finger wrote: There is nothing in kernel log when it happens. The file STRACE is attached. So there is indeed a corruption. What was the driver you used in this case ?

Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote: On 08/24/2012 10:19 AM, David Miller wrote: This looks like full-on data corruption to me. I agree. The question is why does it happen with r8712u, and only after the commit in the subject. Drivers for other devices that I have are

Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote: On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote: On 08/24/2012 10:19 AM, David Miller wrote: This looks like full-on data corruption to me. I agree. The question is why does it happen with r8712u, and only after

Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 11:58 -0500, Larry Finger wrote: On 08/24/2012 11:23 AM, Eric Dumazet wrote: On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote: On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote: On 08/24/2012 10:19 AM, David Miller wrote: This looks like full-on data

Re: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-25 Thread Eric Dumazet
of 63secs. The comments for SYN and SYNACK retries have also been updated to describe the current settings. Signed-off-by: Alexander Bergmann a...@linlab.net --- Acked-by: Eric Dumazet eduma...@google.com

Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-27 Thread Eric Dumazet
On Mon, 2012-08-27 at 12:55 -0500, Larry Finger wrote: I have prepared a patch to fix all the unchecked allocations. Over the weekend I made some progress. To test the latest vendor driver, I installed a 32-bit system. Their driver is not compatible with a 64-bit system. I found that

Re: Huge performance degradation for UDP between 2.4.17 and 2.6

2012-08-02 Thread Eric Dumazet
On Thu, 2012-08-02 at 14:27 +0200, leroy christophe wrote: Hi I'm having a big issue with UDP. Using a powerpc board (MPC860). With our board running kernel 2.4.17, I'm able to send 16 voice packets (UDP, 96 bytes per packet) in 11 seconds. With the same board running either Kernel

Re: [RFC 1/4] hashtable: introduce a small and naive hashtable

2012-08-02 Thread Eric Dumazet
On Thu, 2012-08-02 at 10:32 -0700, Linus Torvalds wrote: On Thu, Aug 2, 2012 at 9:40 AM, Eric W. Biederman ebied...@xmission.com wrote: For a trivial hash table I don't know if the abstraction is worth it. For a hash table that starts off small and grows as big as you need it the incent

Re: [PATCH 1/7] netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup()

2012-08-03 Thread Eric Dumazet
On Fri, 2012-07-27 at 23:37 +0800, Cong Wang wrote: slave_enable_netpoll() and __netpoll_setup() may be called with read_lock() held, so should use GFP_ATOMIC to allocate memory. Cc: David S. Miller da...@davemloft.net Reported-by: Dan Carpenter dan.carpen...@oracle.com Signed-off-by: Cong

Re: [PATCH 1/7] netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup()

2012-08-03 Thread Eric Dumazet
On Fri, 2012-08-03 at 17:34 +0800, Cong Wang wrote: On Fri, 2012-08-03 at 11:17 +0200, Eric Dumazet wrote: On Fri, 2012-07-27 at 23:37 +0800, Cong Wang wrote: slave_enable_netpoll() and __netpoll_setup() may be called with read_lock() held, so should use GFP_ATOMIC to allocate memory

Re: [RFC v2 1/7] hashtable: introduce a small and naive hashtable

2012-08-03 Thread Eric Dumazet
On Fri, 2012-08-03 at 16:23 +0200, Sasha Levin wrote: This hashtable implementation is using hlist buckets to provide a simple hashtable to prevent it from getting reimplemented all over the kernel. +static void hash_add(struct hash_table *ht, struct hlist_node *node, long key) +{ +

Re: [RFC v2 7/7] net,9p: use new hashtable implementation

2012-08-03 Thread Eric Dumazet
On Fri, 2012-08-03 at 16:23 +0200, Sasha Levin wrote: Switch 9p error table to use the new hashtable implementation. This reduces the amount of generic unrelated code in 9p. Signed-off-by: Sasha Levin levinsasha...@gmail.com --- net/9p/error.c | 17 - 1 files changed,

Re: TCP Delayed ACK in FIN/ACK

2012-08-04 Thread Eric Dumazet
On Sat, 2012-08-04 at 16:51 +0200, richard -rw- weinberger wrote: On Sat, Aug 4, 2012 at 4:45 PM, Sławek Janecki jane...@gmail.com wrote: I have a node.js client (10.177.62.7) requesting some data from http rest service from server (10.177.0.1). Client is simply using nodejs http.request()

Re: Huge performance degradation for UDP between 2.4.17 and 2.6

2012-08-05 Thread Eric Dumazet
On Sun, 2012-08-05 at 10:16 +0200, LEROY christophe wrote: On 02/08/2012 16:13, Eric Dumazet wrote: On Thu, 2012-08-02 at 14:27 +0200, leroy christophe wrote: Hi I'm having a big issue with UDP. Using a powerpc board (MPC860). With our board running kernel 2.4.17, I'm able to send

Re: [PATCH 1/7] netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup()

2012-08-06 Thread Eric Dumazet
On Mon, 2012-08-06 at 17:08 +0800, Cong Wang wrote: On Fri, 2012-08-03 at 12:10 +0200, Eric Dumazet wrote: I did this , just take it ;) Do we have to pass gfp to ->ndo_netpoll_setup() too? It seems no, so far I don't think we have to do that. Thanks. It is needed

Re: IPv4 BUG: held lock freed!

2012-08-19 Thread Eric Dumazet
On Sat, 2012-08-18 at 10:19 +0800, Fengguang Wu wrote: Hi David, The bug should be introduced somewhere between 3.5 and 3.6-rc1. [ 2866.131281] IPv4: Attempt to release TCP socket in state 1 880019ec [ 2866.131726] [ 2866.132188] = [ 2866.132281] [ BUG:

Re: IPv4 BUG: held lock freed!

2012-08-19 Thread Eric Dumazet
On Sun, 2012-08-19 at 22:15 +0800, Lin Ming wrote: Will it still has problem if code goes here without sock_hold(sk)? Not sure of what you mean. At the time tcp_write_timer() runs, we own one reference on the socket. (this reference was taken in sk_reset_timer()) On old kernels, if we found

Re: IPv4 BUG: held lock freed!

2012-08-19 Thread Eric Dumazet
On Sun, 2012-08-19 at 23:05 +0800, Lin Ming wrote: On Sun, Aug 19, 2012 at 10:45 PM, Eric Dumazet eric.duma...@gmail.com wrote: On Sun, 2012-08-19 at 22:15 +0800, Lin Ming wrote: Will it still has problem if code goes here without sock_hold(sk)? Not sure of what you mean. See my
