Re: regression with poll(2)

2012-08-20 Thread Eric Dumazet
...@suse.de Acked-by: David S. Miller da...@davemloft.net Cc: Neil Brown ne...@suse.de Cc: Peter Zijlstra a.p.zijls...@chello.nl Cc: Mike Christie micha...@cs.wisc.edu Cc: Eric B Munson emun...@mgebm.net Cc: Eric Dumazet eric.duma...@gmail.com Cc: Sebastian Andrzej

Re: regression with poll(2)

2012-08-20 Thread Eric Dumazet
On Mon, 2012-08-20 at 10:04 +0100, Mel Gorman wrote: Can the following patch be tested please? It is reported to fix an fio regression that may be similar to what you are experiencing but has not been picked up yet. - This seems to help here. Boot your machine with mem=768M or a bit less

Re: regression with poll(2)

2012-08-20 Thread Eric Dumazet
On Mon, 2012-08-20 at 16:20 -0700, Andrew Morton wrote: On Mon, 20 Aug 2012 11:30:59 +0200 Eric Dumazet eric.duma...@gmail.com wrote: On Mon, 2012-08-20 at 10:04 +0100, Mel Gorman wrote: Can the following patch be tested please? It is reported to fix an fio regression that may

[PATCH] task_work: add a scheduling point in task_work_run()

2012-08-21 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com It seems commit 4a9d4b02 (switch fput to task_work_add) reintroduced the problem addressed in commit 944be0b2 (close_files(): add scheduling point) If a server process with a lot of files (say 2 million tcp sockets) is killed, we can spend a lot of time

Re: SLUB: Support for statistics to help analyze allocator behavior

2008-02-04 Thread Eric Dumazet
Pekka J Enberg wrote: Hi Christoph, On Mon, 4 Feb 2008, Christoph Lameter wrote: The statistics provided here allow the monitoring of allocator behavior at the cost of some (minimal) loss of performance. Counters are placed in SLUB's per cpu data structure that is already written to by

Re: SLUB: Support for statistics to help analyze allocator behavior

2008-02-05 Thread Eric Dumazet
On Tue, 5 Feb 2008 10:08:00 -0800 (PST) Christoph Lameter [EMAIL PROTECTED] wrote: On Tue, 5 Feb 2008, Pekka J Enberg wrote: Heh, sure, but it's not exported to userspace which is required for slabinfo to display the statistics. Well we could do the same as for numa stats. Output the

Re: [PATCH 2/4] x86 mmiotrace: fix relay-buffer-full flag for SMP

2008-02-05 Thread Eric Dumazet
Pekka Paalanen wrote: Relay has per-cpu buffers, but mmiotrace was using only a single flag for detecting buffer full/not-full transitions. The new code makes this per-cpu and actually counts missed events. Signed-off-by: Pekka Paalanen [EMAIL PROTECTED] ---

Re: [patch] x86: add code to dump the (kernel) page tables for visual inspection

2008-02-05 Thread Eric Dumazet
Arjan van de Ven wrote: Subject: x86: add code to dump the (kernel) page tables for visual inspection by kernel developers From: Arjan van de Ven [EMAIL PROTECTED] This patch adds code to the kernel to have an (optional) /proc/kernel_page_tables debug file that basically dumps the kernel

Re: [PATCH 2/4] x86 mmiotrace: fix relay-buffer-full flag for SMP

2008-02-05 Thread Eric Dumazet
Pekka Paalanen wrote: On Tue, 05 Feb 2008 21:44:07 +0100 Eric Dumazet [EMAIL PROTECTED] wrote: Pekka Paalanen wrote: diff --git a/arch/x86/kernel/mmiotrace/mmio-mod.c b/arch/x86/kernel/mmiotrace/mmio-mod.c index 82ae920..f492b65 100644 --- a/arch/x86/kernel/mmiotrace/mmio-mod.c +++ b

Re: SLUB: statistics improvements

2008-02-06 Thread Eric Dumazet
Christoph Lameter wrote: SLUB: statistics improvements - Fix indentation in unfreeze_slab - FREE_SLAB/ALLOC_SLAB counters were slightly misplaced and counted even if the slab was kept because we were below the minimum of partial slabs. - Export per cpu statistics to user space (follow

[PATCH] Avoid divides in BITS_TO_LONGS

2008-02-06 Thread Eric Dumazet
of an expensive integer divide. Applying this patch saves 141 bytes on x86 when CONFIG_CC_OPTIMIZE_FOR_SIZE=y and speeds up bitmap operations. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/include/linux/bitops.h b/include/linux/bitops.h index 69c1edb..be5c27c 100644 --- a/include/linux
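The snippet above is cut off before the diff, so the exact change is not visible here. As a rough sketch of the general idea only: rounding a bit count up to whole longs needs a divide, and with signed operands the compiler has to preserve truncation toward zero for negative values, whereas an unsigned power-of-two divisor can be lowered to a plain shift. The names below (`SKETCH_BITS_PER_LONG`, `bits_to_longs_signed`, `bits_to_longs_unsigned`) are made up for illustration and are not the kernel's macros.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits in one unsigned long. sizeof yields size_t, so this value is unsigned. */
#define SKETCH_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/*
 * Signed variant: both operands are int, so the compiler must emit code
 * that is also correct for negative dividends (division truncates toward
 * zero), which generally costs more than a single shift.
 */
static inline int bits_to_longs_signed(int bits)
{
    assert(bits >= 0); /* a negative bit count makes no sense here */
    return (bits + (int)SKETCH_BITS_PER_LONG - 1) / (int)SKETCH_BITS_PER_LONG;
}

/*
 * Unsigned variant: the divisor is an unsigned power of two, so the
 * compiler is free to replace the divide with a right shift.
 */
static inline size_t bits_to_longs_unsigned(size_t bits)
{
    return (bits + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;
}
```

Both variants return the same result for non-negative inputs; only the generated code differs, which is presumably where the size saving quoted above comes from.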

Re: [PATCH] Add IPv6 support to TCP SYN cookies

2008-02-07 Thread Eric Dumazet
be a little bit too much. Using a per_cpu var is more friendly. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c index f470fe4..177da14 100644 --- a/net/ipv4/syncookies.c +++ b/net/ipv4/syncookies.c @@ -35,10 +35,12 @@ module_init

Re: Bug? Kernels 2.6.2x drops TCP packets over wireless (independent of card used)

2008-02-07 Thread Eric Dumazet
Marcin Koziej wrote: hmm, i think, the site is broken (193.219.28.140), and not the card or the driver is wrong. when it does, then other sites are auch reproductable .. /* is use auch madwifi-0.9.3.3, but it think, it is not driver problem */ Unfortunately, this is not the case :(

Re: Bug? Kernels 2.6.2x drops TCP packets over wireless (independent of card used)

2008-02-07 Thread Eric Dumazet
Eric Dumazet [EMAIL PROTECTED] - Very strange, as the tcpdump you gave shows that the remote peer only sent 220-\r\n This was ACKed, and then nothing but timeout. We can conclude remote peer is *very* slow or a firewall is blocking traffic after 6 bytes

Re: [git pull] more SLUB updates for 2.6.25

2008-02-07 Thread Eric Dumazet
Nick Piggin wrote: On Friday 08 February 2008 13:13, Christoph Lameter wrote: are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm.git slub-linus (includes the cmpxchg_local fastpath since the cmpxchg_local work by Mathieu is in now, and the

Re: How does ext2 implement sparse files?

2008-02-01 Thread Eric Dumazet
Shuduo Sang wrote: On Feb 1, 2008 2:14 AM, Andi Kleen [EMAIL PROTECTED] wrote: Lars Noschinski [EMAIL PROTECTED] writes: For a university project, we had to write a toy filesystem (ext2-like), for which I would like to implement sparse file support. For this, I dug through the

Re: questions on NAPI processing latency and dropped network packets

2008-01-15 Thread Eric Dumazet
On Tue, 15 Jan 2008 11:14:25 -0600 Chris Friesen [EMAIL PROTECTED] wrote: Radoslaw Szkodzinski (AstralStorm) wrote: On Tue, 15 Jan 2008 08:47:07 -0600 Chris Friesen [EMAIL PROTECTED] wrote: Some of our hardware is not supported on mainline, so we need per-kernel version patches to even

Re: [PATCH 02/10] x86: Change size of node ids from u8 to u16 V3

2008-01-16 Thread Eric Dumazet
On Wed, 16 Jan 2008 09:09:04 -0800 [EMAIL PROTECTED] wrote: Change the size of node ids from 8 bits to 16 bits to accommodate more than 256 nodes. Signed-off-by: Mike Travis [EMAIL PROTECTED] Reviewed-by: Christoph Lameter [EMAIL PROTECTED] --- V1-V2: - changed pxm_to_node_map to u16

Re: [PATCH 02/10] x86: Change size of node ids from u8 to u16 V3

2008-01-16 Thread Eric Dumazet
Mike Travis wrote: Another point: you want this change, sorry if my previous mail was not detailed enough: --- a/arch/x86/mm/numa_64.c +++ b/arch/x86/mm/numa_64.c @@ -78,7 +78,7 @@ static int __init allocate_cachealigned_memnodemap(void) unsigned long pad, pad_addr;

Re: questions on NAPI processing latency and dropped network packets

2008-01-21 Thread Eric Dumazet
Chris Friesen wrote: I've done some further digging, and it appears that one of the problems we may be facing is very high instantaneous traffic rates. Instrumentation showed up to 222K packets/sec for short periods (at least 1.1 ms, possibly longer), although the long-term average is down

Re: questions on NAPI processing latency and dropped network packets

2008-01-21 Thread Eric Dumazet
Chris Friesen wrote: Eric Dumazet wrote: Chris Friesen wrote: I've done some further digging, and it appears that one of the problems we may be facing is very high instantaneous traffic rates. Instrumentation showed up to 222K packets/sec for short periods (at least 1.1 ms, possibly

Re: include/linux/pcounter.h

2008-02-16 Thread Eric Dumazet
Andrew Morton wrote: - First up, why was this added at all? We have percpu_counter.h which has several years development invested in it. afaict it would suit the present applications of pcounters. If some deficiency in percpu_counters has been identified, is it possible to correct

Re: include/linux/pcounter.h

2008-02-16 Thread Eric Dumazet
Andrew Morton wrote: On Sat, 16 Feb 2008 11:07:29 +0100 Eric Dumazet [EMAIL PROTECTED] wrote: Andrew, pcounter is a temporary abstraction. It's buggy! Main problems are a) possible return of negative numbers b) some of the API can't be from preemptible code c) excessive interrupt-off

Re: tbench regression in 2.6.25-rc1

2008-02-18 Thread Eric Dumazet
On Mon, 18 Feb 2008 16:12:38 +0800 Zhang, Yanmin [EMAIL PROTECTED] wrote: On Fri, 2008-02-15 at 15:22 -0800, David Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Fri, 15 Feb 2008 15:21:48 +0100 On linux-2.6.25-rc1 x86_64 : offsetof(struct dst_entry, lastuse)=0xb0

Re: tbench regression in 2.6.25-rc1

2008-02-18 Thread Eric Dumazet
Zhang, Yanmin wrote: On Mon, 2008-02-18 at 12:33 -0500, [EMAIL PROTECTED] wrote: On Mon, 18 Feb 2008 16:12:38 +0800, Zhang, Yanmin said: I also think __refcnt is the key. I did a new test by adding 2 unsigned long padding before lastuse, so the 3 members are moved to the next cache line. The
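The padding experiment described above can be checked with a little arithmetic. The sketch below assumes 64-byte cache lines and 8-byte `unsigned long` (as on x86_64), and takes the `offsetof(struct dst_entry, lastuse)=0xb0` figure quoted earlier in this thread; the helper names are invented for illustration and the real `struct dst_entry` layout is not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size in bytes; typical for x86_64, not taken from the thread. */
#define SKETCH_CACHE_LINE 64u

/* The index/offset split below only makes sense for a power-of-two line size. */
static_assert((SKETCH_CACHE_LINE & (SKETCH_CACHE_LINE - 1)) == 0,
              "cache line size must be a power of two");

/* Which cache line (counting from the start of the struct) holds this offset? */
static inline size_t cache_line_index(size_t offset)
{
    return offset / SKETCH_CACHE_LINE;
}

/* How far into its cache line does this offset sit? */
static inline size_t cache_line_offset(size_t offset)
{
    return offset % SKETCH_CACHE_LINE;
}

/* Offset of a member after inserting n_pad 8-byte pads in front of it. */
static inline size_t padded_offset(size_t offset, size_t n_pad)
{
    return offset + n_pad * 8u;
}
```

With these assumptions, offset 0xb0 (176) sits 48 bytes into line 2, and two 8-byte pads move it to 0xc0 (192), the first byte of line 3, which is consistent with the statement that the members move to the next cache line.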

Re: tbench regression in 2.6.25-rc1

2008-02-18 Thread Eric Dumazet
Zhang, Yanmin wrote: On Mon, 2008-02-18 at 11:11 +0100, Eric Dumazet wrote: On Mon, 18 Feb 2008 16:12:38 +0800 Zhang, Yanmin [EMAIL PROTECTED] wrote: On Fri, 2008-02-15 at 15:22 -0800, David Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Fri, 15 Feb 2008 15:21:48 +0100

Re: Linux 2.6.25-rc2

2008-02-19 Thread Eric Dumazet
On Tue, 19 Feb 2008 09:02:30 -0500 Mathieu Desnoyers [EMAIL PROTECTED] wrote: * Pekka Enberg ([EMAIL PROTECTED]) wrote: On Feb 19, 2008 8:54 AM, Torsten Kaiser [EMAIL PROTECTED] wrote: [ 5282.056415] [ cut here ] [ 5282.059757] kernel BUG at

Re: [RFC: 2.6.25 patch] ipv4/fib_hash.c: fix NULL dereference

2008-02-19 Thread Eric Dumazet
Adrian Bunk wrote: Unless I miss a guaranteed relation between f and new_fa->fa_info this patch is required for fixing a NULL dereference introduced by commit a6501e080c318f8d4467679d17807f42b3a33cd5 and spotted by the Coverity checker. Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

Re: tbench regression in 2.6.25-rc1

2008-02-19 Thread Eric Dumazet
Zhang, Yanmin wrote: On Tue, 2008-02-19 at 08:40 +0100, Eric Dumazet wrote: Zhang, Yanmin wrote: On Mon, 2008-02-18 at 12:33 -0500, [EMAIL PROTECTED] wrote: On Mon, 18 Feb 2008 16:12:38 +0800, Zhang, Yanmin said: I also think __refcnt is the key. I did a new testing by adding 2

Re: [PATCH 1/2] x86_64: Fold pda into per cpu area v3

2008-02-20 Thread Eric Dumazet
of nr_cpu_ids pointers, so we should respect its bounds. Delay change of _cpu_pda after array initialization. Also take into account that alloc_bootmem_low(): - calls panic() if not enough memory - already clears allocated memory Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/arch/x86/kernel

Re: [RFC PATCH 1/8] [NET]: uninline skb_put, de-bloats a lot

2008-02-20 Thread Eric Dumazet
On Wed, 20 Feb 2008 15:47:11 +0200 Ilpo Järvinen [EMAIL PROTECTED] wrote: ~500 files changed ... kernel/uninlined.c: skb_put | +104 1 function changed, 104 bytes added, diff: +104 vmlinux.o: 869 functions changed, 198 bytes added, 111003 bytes removed, diff:

[PATCH] alloc_percpu() fails to allocate percpu data

2008-02-21 Thread Eric Dumazet
they switched to SLUB. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] mm/allocpercpu.c | 15 ++- 1 files changed, 14 insertions(+), 1 deletion(-) diff --git a/mm/allocpercpu.c b/mm/allocpercpu.c index 7e58322..b0012e2 100644 --- a/mm/allocpercpu.c +++ b/mm/allocpercpu.c @@ -6,6 +6,10

Re: Try to fill route cache

2007-02-02 Thread Eric Dumazet
On Friday 02 February 2007 16:19, Markus Wenke wrote: Andi Kleen wrote: Markus Wenke [EMAIL PROTECTED] writes: Where is the problem ? pktgen doesn't use the routing cache. Is there any other chance to test my Application with a stressed route cache? I am not sure I understood your

Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

2007-02-06 Thread Eric Dumazet
David Miller wrote: From: Linus Torvalds [EMAIL PROTECTED] Date: Tue, 6 Feb 2007 13:28:34 -0800 (PST) Yeah, in 1% of all cases it will block, and you'll want to wait for them. Maybe the kevent queue works then, but if it needs any more setup than the nonblocking case, that's a big no. So

[PATCH] FS : Speedup rw_verify_area()

2007-02-07 Thread Eric Dumazet
write_count=9549894 21.216694 samples per call (best value out of 10 runs) oprofile CPU_CLK_UNHALTED events gave a reduction from 5.3401 % to 2.5851 % for the rw_verify_area() function. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux-2.6.20/fs/read_write.c 2007-02-07 18:21

Re: 2.6.20 BUG: soft lockup detected on CPU#0!

2007-02-08 Thread Eric Dumazet
On Thursday 08 February 2007 09:06, Ingo Molnar wrote: * Andrew Morton [EMAIL PROTECTED] wrote: The softlock detector has a long history of false positives and precious few true positives, in my experience. hm, not so the latest lamest in my experience. The commit that made it quite

Re: [PATCH 02/22] r/o bind mounts: add vfsmount writer counts

2007-02-09 Thread Eric Dumazet
Dave Hansen wrote: @@ -56,6 +57,7 @@ struct vfsmount { struct vfsmount *mnt_master; /* slave is on master->mnt_slave_list */ struct mnt_namespace *mnt_ns; /* containing namespace */ struct user_namespace *mnt_user_ns; /* namespace for uid interpretation */ +

Re: [PATCH 0 of 4] Generic AIO by scheduling stacks

2007-02-09 Thread Eric Dumazet
Linus Torvalds wrote: Ok, here's another entry in this discussion. - IF the system call blocks, we call the architecture-specific schedule_async() function before we even get any scheduler locks, and it can just do a fork() at that time, and let the *child* return to the

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Eric Dumazet
Christoph Lameter wrote: On Thu, 1 Nov 2007, David Miller wrote: From: Christoph Lameter [EMAIL PROTECTED] Date: Thu, 1 Nov 2007 15:15:39 -0700 (PDT) After boot is complete we allow the reduction of the size of the per cpu areas. Let's say we only need 128k per cpu. Then the remaining

Re: TCP_DEFER_ACCEPT issues

2007-11-02 Thread Eric Dumazet
Felix von Leitner wrote: I am trying to use TCP_DEFER_ACCEPT in my web server. There are some operational problems. First of all: timeout handling. I would like to be able to set a timeout in seconds (or better: milliseconds) for how long the socket is allowed to sit there without data

Re:

2007-11-09 Thread Eric Dumazet
is missing. Could you please apply this patch ? Thank you [NET] adds a missing include linux/vmalloc.h Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 8461cda..469216d 100644 --- a/include/net/inet_hashtables.h +++ b/include

Re: [PATCH v2] cgroup: fix panic in netprio_cgroup

2012-07-08 Thread Eric Dumazet
) + atomic_set(max_prioidx, prioidx); spin_unlock_irqrestore(prioidx_map_lock, flags); - atomic_set(max_prioidx, prioidx); *prio = prioidx; return 0; } This patch seems fine to me. Acked-by: Eric Dumazet eduma...@google.com Neil, looking at this file, I believe

[PATCH] net: cgroup: fix out of bounds accesses

2012-07-09 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com dev->priomap is allocated by extend_netdev_table() called from update_netdev_tables(). And this is only called if write_priomap() is called. But if write_priomap() is not called, it seems we can have out of bounds accesses in cgrp_destroy(), read_priomap

Re: 82571EB: Detected Hardware Unit Hang

2012-07-09 Thread Eric Dumazet
On Mon, 2012-07-09 at 16:51 +0800, Joe Jin wrote: Hi list, I'm seeing a Unit Hang even with the latest e1000e driver 2.0.0 when doing an scp test. This issue is easy to reproduce on SUN FIRE X2270 M2: just copying a big file (500M) from another server will hit it at once. Would you please

Re: [PATCH] net: cgroup: fix out of bounds accesses

2012-07-09 Thread Eric Dumazet
On Mon, 2012-07-09 at 07:01 -0400, Neil Horman wrote: Thank you for doing this Eric, Gao. Just to be sure (I asked in the previous thread), would it be better to avoid the length check in skb_update_prio, and instead update the netdev tables to be long enough in cgrp_create and in

Re: [PATCH] net: cgroup: fix out of bounds accesses

2012-07-09 Thread Eric Dumazet
On Mon, 2012-07-09 at 08:13 -0400, Neil Horman wrote: On Mon, Jul 09, 2012 at 01:50:52PM +0200, Eric Dumazet wrote: On Mon, 2012-07-09 at 07:01 -0400, Neil Horman wrote: Thank you for doing this Eric, Gao. Just to be sure (I asked in the previous thread), would it be better

Re: [PATCH] net: cgroup: fix access the unallocated memory in netprio cgroup

2012-07-09 Thread Eric Dumazet
extend_netdev_table, so when allocating new_priomap fails, write_priomap will stop accessing the priomap and return -ENOMEM to userspace to tell the user what happened. Signed-off-by: Gao feng gaof...@cn.fujitsu.com Cc: Neil Horman nhor...@tuxdriver.com Cc: Eric Dumazet eduma...@google.com

Re: [PATCH] net: cgroup: fix access the unallocated memory in netprio cgroup

2012-07-10 Thread Eric Dumazet
On Tue, 2012-07-10 at 16:53 +0800, Gao feng wrote: Hi Gao Is it still needed to call update_netdev_tables() from write_priomap() ? Yes, I think it's needed, because read_priomap will show all of the net devices, but we may add the netdev after creating a netprio cgroup, so the new

Re: [PATCH v2] net: cgroup: fix access the unallocated memory in netprio cgroup

2012-07-10 Thread Eric Dumazet
On Tue, 2012-07-10 at 18:44 +0800, Gao feng wrote: there are some out of bound accesses in netprio cgroup. - update_netdev_tables(); + ret = extend_netdev_table(dev, max_len); + if (ret < 0) + goto out_free_devname; + ret = 0; rcu_read_lock(); map

Re: [PATCH v2] net: cgroup: fix access the unallocated memory in netprio cgroup

2012-07-10 Thread Eric Dumazet
On Tue, 2012-07-10 at 13:05 +0200, Eric Dumazet wrote: On Tue, 2012-07-10 at 18:44 +0800, Gao feng wrote: there are some out of bound accesses in netprio cgroup. - update_netdev_tables(); + ret = extend_netdev_table(dev, max_len); + if (ret < 0) + goto out_free_devname

Re: [PATCH v3] net: cgroup: fix access the unallocated memory in netprio cgroup

2012-07-11 Thread Eric Dumazet
dev_put to reduce device's refcount. Signed-off-by: Gao feng gaof...@cn.fujitsu.com Cc: Neil Horman nhor...@tuxdriver.com Cc: Eric Dumazet eduma...@google.com --- Acked-by: Eric Dumazet eduma...@google.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body

Re: Protocol handler using dev_add_pack

2012-07-11 Thread Eric Dumazet
On Wed, 2012-07-11 at 10:38 -0300, Jerry Yu wrote: I am working on a kernel module to monitor all TCP packets. I created a protocol handler with protocol code ETH_P_ALL to handle all incoming and outgoing TCP packets. The code worked fine on 2.6.14 kernel, but in current 3.2.0-26 kernel, I

Re: [RFC] net: further seperate dst_entry.__refcnt from cache contention

2012-07-20 Thread Eric Dumazet
On Fri, 2012-07-20 at 14:46 -0500, Nathan Zimmer wrote: After some investigation on large machines I found that dst_entry.__refcnt participates in false cache sharing issues that show when scaling past 12 threads who communicate via tcp with loopback addresses. I adjusted refcnt to be on its

Re: [PATCH RFC] [INET]: Get cirtical word in first 64bit of cache line

2012-11-25 Thread Eric Dumazet
On Mon, 2012-11-26 at 11:29 +0800, ling.ma.prog...@gmail.com wrote: From: Ma Ling ling.ma.prog...@gmail.com In order to reduce memory latency when last level cache miss occurs, modern CPUs i.e. x86 and arm introduced Critical Word First(CWF) or Early Restart(ER) to get data ASAP. For CWF if

Re: [PATCH RFC] [INET]: Get cirtical word in first 64bit of cache line

2012-11-27 Thread Eric Dumazet
On Tue, 2012-11-27 at 21:48 +0800, Ling Ma wrote: Ling: in the looking-up routine, hash value is the most important key, if it is matched, the other values have most possibility to be satisfied, and CWF is limited by memory bandwidth (64bit usually), so we only move hash value as critical

Re: [PATCH] net: ipv4: route: fixed a coding style issues net: ipv4: tcp: fixed a coding style issues

2012-12-20 Thread Eric Dumazet
On Thu, 2012-12-20 at 13:07 +0100, Nicolas Dichtel wrote: On 20/12/2012 09:08, Stefan Hasko wrote: + out_hlist_search\n); checkpatch will warn you about this one, something like: WARNING: quoted string split across lines. Not breaking such line ease to grep the

Re: [PATCH] net: ipv4: route: fix coding style issues net: ipv4: tcp: fix coding style issues

2012-12-20 Thread Eric Dumazet
On Thu, 2012-12-20 at 15:28 +0100, Stefan Hasko wrote: Fix a coding style issues. Signed-off-by: Stefan Hasko hasko.st...@gmail.com --- net/ipv4/route.c | 119 - net/ipv4/tcp.c | 218 +++--- 2 files changed,

Re: [PATCH] net: sched: integer overflow fix

2012-12-21 Thread Eric Dumazet
On Fri, 2012-12-21 at 21:39 +0100, Stefan Hasko wrote: Fixed integer overflow in function htb_dequeue Signed-off-by: Stefan Hasko hasko.st...@gmail.com --- net/sched/sch_htb.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c

Re: [PATCH] net: sched: integer overflow fix

2012-12-21 Thread Eric Dumazet
On Fri, 2012-12-21 at 14:51 -0800, Eric Dumazet wrote: On Fri, 2012-12-21 at 21:39 +0100, Stefan Hasko wrote: Fixed integer overflow in function htb_dequeue Signed-off-by: Stefan Hasko hasko.st...@gmail.com --- net/sched/sch_htb.c |2 +- 1 file changed, 1 insertion(+), 1

Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2012-12-21 Thread Eric Dumazet
On Fri, 2012-12-21 at 18:56 -0500, Rik van Riel wrote: Argh, the first one had a typo in it that did not influence performance with fewer threads running, but that made things worse with more than a dozen threads... + + /* + * The lock is still busy, the delay was

Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2012-12-21 Thread Eric Dumazet
On Fri, 2012-12-21 at 18:56 -0500, Rik van Riel wrote: Argh, the first one had a typo in it that did not influence performance with fewer threads running, but that made things worse with more than a dozen threads... Please let me know if you can break these patches. ---8--- Subject:

Re: [PATCH] net: sched: integer overflow fix

2012-12-21 Thread Eric Dumazet
; - next_event = q->now + 5 * NSEC_PER_SEC; + next_event = q->now + 5LLU * NSEC_PER_SEC; for (level = 0; level < TC_HTB_MAXDEPTH; level++) { /* common case optimization - skip event handler quickly */ I guess David will remove the first line of your changelog Acked-by: Eric Dumazet
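To see why the `5LLU` suffix in the quoted hunk matters, here is a small standalone sketch. It assumes the constant has a 32-bit type on the affected builds (in the kernel it is a signed `long`, where overflow would be undefined behaviour); the sketch deliberately uses unsigned 32-bit arithmetic so that the wraparound is well defined and can be shown. `SKETCH_NSEC_PER_SEC` and the function names are illustrative stand-ins, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel constant, given a 32-bit unsigned type on purpose. */
#define SKETCH_NSEC_PER_SEC ((uint32_t)1000000000u)

/* Product evaluated in 32 bits: 5e9 does not fit, so it wraps modulo 2^32. */
static inline uint64_t five_seconds_narrow(void)
{
    uint32_t ns = (uint32_t)5 * SKETCH_NSEC_PER_SEC;
    return ns;
}

/* Widening one operand first makes the whole product 64-bit, as 5LLU does. */
static inline uint64_t five_seconds_wide(void)
{
    return 5ULL * SKETCH_NSEC_PER_SEC;
}

/* Usage example: the narrow result falls short by exactly 2^32 nanoseconds. */
static inline void sketch_overflow_demo(void)
{
    assert(five_seconds_wide() - five_seconds_narrow() == (1ULL << 32));
}
```

In this sketch the narrow product comes out as 705032704 ns (about 0.7 s) instead of 5000000000 ns, so a timeout computed that way would fire far too early.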

Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2012-12-21 Thread Eric Dumazet
On Fri, 2012-12-21 at 18:56 -0500, Rik van Riel wrote: + int *delay_ptr = per_cpu(spinlock_delay, smp_processor_id()); + int delay = *delay_ptr; int delay = __this_cpu_read(spinlock_delay); } + *delay_ptr = delay; __this_cpu_write(spinlock_delay, delay); -- To

Re: Regression in 3.8-rc1: BUG: sleeping function called from invalid context

2012-12-22 Thread Eric Dumazet
On Sat, 2012-12-22 at 11:11 -0600, Larry Finger wrote: With kernel 3.8-rc1, I get 2 BUG: sleeping function called from invalid context reports. These have been present for some time in the 3.7-git versions and I have tried twice to bisect the problem. Both times, I ended up at a merge

Re: Regression in 3.8-rc1: BUG: sleeping function called from invalid context

2012-12-22 Thread Eric Dumazet
On Sat, 2012-12-22 at 19:02 +0100, Borislav Petkov wrote: Top-posting so that the rest can remain untouched. Right, so AFAICT, something is holding rtnl_mutex (probably some rtnetlink traffic) and device_rename() is doing kstrdup with GFP_KERNEL which, among others, has __GFP_WAIT and *that*

Re: Porting problem: ndo_set_multicast_list removed

2012-12-24 Thread Eric Dumazet
On Tue, 2012-12-25 at 00:30 +0800, Woody Wu wrote: Hi, list I am porting an ethernet driver from 2.6.x to 3.7.1. I found in the new kernel, the ndo_set_multicast_list method in the net_device_ops had been removed. What's the story behind? Can I simply ignore this method defined in an old

Re: [RFC PATCH] virtio-net: reset virtqueue affinity when doing cpu hotplug

2012-12-26 Thread Eric Dumazet
On Wed, 2012-12-26 at 15:06 +0800, Wanlong Gao wrote: Add a cpu notifier to virtio-net, so that we can reset the virtqueue affinity if the cpu hotplug happens. It improves the performance by enabling or disabling the virtqueue affinity after doing cpu hotplug. Cc: Rusty Russell

Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2012-12-26 Thread Eric Dumazet
On Fri, 2012-12-21 at 22:50 -0500, Rik van Riel wrote: I will try to run this test on a really large SMP system in the lab during the break. Ideally, the auto-tuning will keep the delay value large enough that performance will stay flat even when there are 100 CPUs contending over the same

Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2012-12-26 Thread Eric Dumazet
On Wed, 2012-12-26 at 11:10 -0800, Eric Dumazet wrote: +#define DELAY_HASH_SHIFT 4 +DEFINE_PER_CPU(int [1 << DELAY_HASH_SHIFT], spinlock_delay) = { + MIN_SPINLOCK_DELAY, MIN_SPINLOCK_DELAY, + MIN_SPINLOCK_DELAY, MIN_SPINLOCK_DELAY, + MIN_SPINLOCK_DELAY, MIN_SPINLOCK_DELAY

Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2012-12-27 Thread Eric Dumazet
On Wed, 2012-12-26 at 22:07 -0800, Michel Lespinasse wrote: If we go with per-spinlock tunings, I feel we'll most likely want to add an associative cache in order to avoid the 1/16 chance (~6%) of getting 595Mbit/s instead of 982Mbit/s when there is a hash collision. I would still prefer if
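The hashed delay table discussed in this thread can be pictured with a small userspace sketch. This is only an illustration under stated assumptions: a single table stands in for the per-CPU array, the multiplicative hash is a generic choice rather than whatever the patch uses, and every `sketch_` name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 1 << 4 = 16 slots, mirroring the DELAY_HASH_SHIFT value quoted in the thread. */
#define SKETCH_DELAY_HASH_SHIFT 4
#define SKETCH_DELAY_SLOTS (1u << SKETCH_DELAY_HASH_SHIFT)

/* One table for the sketch; the patch keeps such a table per CPU. */
static int sketch_delay[SKETCH_DELAY_SLOTS];

/* Map a lock address to a slot using the top bits of a multiplicative hash. */
static inline unsigned int sketch_slot(const void *lock)
{
    uint64_t h = (uint64_t)(uintptr_t)lock * 0x9E3779B97F4A7C15ULL;
    unsigned int slot = (unsigned int)(h >> (64 - SKETCH_DELAY_HASH_SHIFT));

    assert(slot < SKETCH_DELAY_SLOTS);
    return slot;
}

/* Read the tuned delay remembered for this lock's slot. */
static inline int sketch_get_delay(const void *lock)
{
    return sketch_delay[sketch_slot(lock)];
}

/* Store a newly tuned delay; any other lock hashing to the same slot sees it too. */
static inline void sketch_set_delay(const void *lock, int delay)
{
    sketch_delay[sketch_slot(lock)] = delay;
}
```

With 16 slots and a reasonably uniform hash, two distinct locks land in the same slot with probability 1/16 (about 6%), which is the collision chance mentioned above; an associative cache keyed on the full lock address would be one way to avoid sharing a tuned value between unrelated locks.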

Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2012-12-27 Thread Eric Dumazet
On Thu, 2012-12-27 at 09:35 -0500, Rik van Riel wrote: The lock acquisition time depends on the holder of the lock, and what the CPUs ahead of us in line will do with the lock, not on the caller IP of the spinner. That would be true only for general cases. In network land, we do have

Re: [PATCH] forcedeth: Fix WARNINGS that result when DMA mapping is not checked

2012-12-27 Thread Eric Dumazet
On Thu, 2012-12-27 at 13:42 -0600, Larry Finger wrote: With 3.8-rc1, the first call of pci_map_single() that is not checked with a corresponding pci_dma_mapping_error() call results in a warning with a splat as follows: WARNING: at lib/dma-debug.c:933 check_unmap+0x480/0x950() Hardware

Re: [PATCH] forcedeth: Fix WARNINGS that result when DMA mapping is not checked

2012-12-27 Thread Eric Dumazet
On Thu, 2012-12-27 at 14:38 -0600, Larry Finger wrote: On 12/27/2012 02:05 PM, Eric Dumazet wrote: On Thu, 2012-12-27 at 13:42 -0600, Larry Finger wrote: + if (pci_dma_mapping_error(np->pci_dev, + np->put_tx_ctx->dma)) + return

Re: Panic: dma_map_area overflow 66 bytes on 3.7+

2012-12-28 Thread Eric Dumazet
On Fri, 2012-12-28 at 20:05 +0100, Martin Nybo Andersen wrote: Hi list, Since the release of 3.7 my main computer has been panicking a couple of times on both 3.7.0 and 3.7.1 because of a 'dma_map_area overflow xx bytes'. Example screen shot: http://www.tweek.dk/panic.jpg I can

Re: [PATCH v2] lib: cpu_rmap: avoid flushing all workqueues

2012-12-28 Thread Eric Dumazet
On Fri, 2012-12-28 at 11:03 -0800, David Decotigny wrote: In some cases, free_irq_cpu_rmap() is called while holding a lock (eg. rtnl). This can lead to deadlocks, because it invokes flush_scheduled_work() which ends up waiting for whole system workqueue to flush, but some pending works might

Re: [PATCH v2] lib: cpu_rmap: avoid flushing all workqueues

2012-12-28 Thread Eric Dumazet
On Fri, 2012-12-28 at 13:44 -0800, David Decotigny wrote: Thanks, Ok for the cpu_rmap_put helper. Will do this in v3 of this patch. Your comments suggest more refactoring, which might be better in the form of 1 or 2 additional patches that: - rename alloc_cpu_rmap co according to new

Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2012-12-28 Thread Eric Dumazet
On Thu, 2012-12-27 at 14:31 -0500, Rik van Riel wrote: to use a bigger/smaller one. I guess we want a larger value. With your hashed lock approach, we can get away with larger values - they will not penalize other locks the same way a single value per cpu might have. Then, we absolutely

Re: [PATCH] poll: prevent missed events if _qproc is NULL

2013-01-01 Thread Eric Dumazet
On Mon, 2012-12-31 at 13:21 +, Eric Wong wrote: This patch seems to fix my issue with ppoll() being stuck on my SMP machine: http://article.gmane.org/gmane.linux.file-systems/70414 The change to sock_poll_wait() in commit 626cf236608505d376e4799adb4f7eb00a8594af (poll: add

Re: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD

2013-01-02 Thread Eric Dumazet
: Jonathan Corbet cor...@lwn.net Cc: Al Viro v...@zeniv.linux.org.uk Cc: Davide Libenzi davi...@xmailserver.org Cc: Hans de Goede hdego...@redhat.com Cc: Mauro Carvalho Chehab mche...@infradead.org Cc: David Miller da...@davemloft.net Cc: Eric Dumazet eric.duma...@gmail.com Cc: Andrew Morton

Re: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD

2013-01-02 Thread Eric Dumazet
On Wed, 2013-01-02 at 18:40 +, Eric Wong wrote: Eric Dumazet eric.duma...@gmail.com wrote: First, thanks for working on this issue. No problem! It seems the real problem is the epi->event.events = event->events; which is done without taking ep->lock Yes. I am hoping it is possible

Re: [PATCH v4] lib: cpu_rmap: avoid flushing all workqueues

2013-01-02 Thread Eric Dumazet
try to acquire the lock we are already holding. This commit uses reference-counting to replace irq_run_affinity_notifiers(). It also removes irq_run_affinity_notifiers() altogether. Signed-off-by: David Decotigny de...@googlers.com --- Acked-by: Eric Dumazet eduma...@google.com

Re: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD

2013-01-02 Thread Eric Dumazet
On Wed, 2013-01-02 at 19:32 +, Eric Wong wrote: That modification in ep_send_events_proc() is protected by ep->mtx (as is ep_modify()), though. Maybe there are other places, but I don't see it. Yes, and using a mutex for protecting this field while it's read from interrupt context (so

[PATCH net-next] softirq: reduce latencies

2013-01-03 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100. This is because we iterate 10 times in the softirq dispatcher, and some actions can consume a lot of cycles. This patch changes the fallback

Re: [RFC PATCH 4/5] x86,smp: keep spinlock delay values per hashed spinlock address

2013-01-03 Thread Eric Dumazet
On Thu, 2013-01-03 at 04:48 -0800, Michel Lespinasse wrote: On Wed, Jan 2, 2013 at 9:24 PM, Rik van Riel r...@redhat.com wrote: From: Eric Dumazet eric.duma...@gmail.com Eric Dumazet found a regression with the spinlock backoff code, in workloads where multiple spinlocks were contended

Re: [PATCH net-next] softirq: reduce latencies

2013-01-03 Thread Eric Dumazet
On Thu, 2013-01-03 at 14:12 +0100, Sedat Dilek wrote: Hi Eric, your patch from [2] applies cleanly on top of Linux v3.8-rc2. I would like to test it. In [1] you were talking about benchmarks you did. Can you describe them or provide a testcase (script etc.)? You made only network testing?

Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2013-01-03 Thread Eric Dumazet
On Thu, 2013-01-03 at 08:24 -0500, Steven Rostedt wrote: On Thu, 2013-01-03 at 09:05 +, Jan Beulich wrote: How much bus traffic do monitor/mwait cause behind the scenes? I would suppose that this just snoops the bus for writes, but the amount of bus traffic involved in this isn't

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-03 Thread Eric Dumazet
On Wed, 2013-01-02 at 20:47 +, Eric Wong wrote: Eric Wong normalper...@yhbt.net wrote: [1] my full setup is very strange. Other than the FUSE component I forgot to mention, little depends on the kernel. With all this, the standalone toosleepy can get stuck. I'll try to

Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2013-01-03 Thread Eric Dumazet
On Thu, 2013-01-03 at 10:32 -0500, Steven Rostedt wrote: On Thu, 2013-01-03 at 05:35 -0800, Eric Dumazet wrote: On Thu, 2013-01-03 at 08:24 -0500, Steven Rostedt wrote: On Thu, 2013-01-03 at 09:05 +, Jan Beulich wrote: How much bus traffic do monitor/mwait cause behind

Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2013-01-03 Thread Eric Dumazet
On Thu, 2013-01-03 at 11:45 -0500, Steven Rostedt wrote: On Thu, 2013-01-03 at 08:10 -0800, Eric Dumazet wrote: But then would the problem even exist? If the lock is on its own cache line, it shouldn't cause a performance issue if other CPUs are spinning on it. Would it? Not sure

Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2013-01-03 Thread Eric Dumazet
On Sat, 2012-12-29 at 02:27 -0800, Michel Lespinasse wrote: On Wed, Dec 26, 2012 at 11:10 AM, Eric Dumazet eric.duma...@gmail.com wrote: I did some tests with your patches with following configuration : tc qdisc add dev eth0 root htb r2q 1000 default 3 (to force a contention on qdisc lock

Re: [PATCH net-next] softirq: reduce latencies

2013-01-03 Thread Eric Dumazet
On Thu, 2013-01-03 at 22:08 +, Ben Hutchings wrote: On Thu, 2013-01-03 at 04:28 -0800, Eric Dumazet wrote: From: Eric Dumazet eduma...@google.com In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100. This is because we

Re: [PATCH net-next] softirq: reduce latencies

2013-01-03 Thread Eric Dumazet
On Thu, 2013-01-03 at 12:46 -0800, Andrew Morton wrote: On Thu, 03 Jan 2013 04:28:52 -0800 Eric Dumazet eric.duma...@gmail.com wrote: From: Eric Dumazet eduma...@google.com In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100

Re: [PATCH net-next] softirq: reduce latencies

2013-01-03 Thread Eric Dumazet
On Thu, 2013-01-03 at 11:41 -0800, Rick Jones wrote: In terms of netperf overhead, once you specify P99_LATENCY, you are already in for the pound of cost but only getting the penny of output (so to speak). While it would clutter the output, one could go ahead and ask for the other latency

Re: [PATCH net-next] softirq: reduce latencies

2013-01-03 Thread Eric Dumazet
On Fri, 2013-01-04 at 14:16 +0900, Namhyung Kim wrote: Probably a silly question: Why not using ktime rather than jiffies for this? ktime is too expensive on some hardware. Here we only want a safety belt, no need for high time resolution.

Re: [PATCH net-next] softirq: reduce latencies

2013-01-03 Thread Eric Dumazet
On Fri, 2013-01-04 at 06:31 +0100, Sedat Dilek wrote: Will you send a v2 with this change...? -#define MAX_SOFTIRQ_TIME min(1, (2*HZ/1000)) +#define MAX_SOFTIRQ_TIME max(1, (2*HZ/1000)) I will, I was planning to do this after waiting for other comments/reviews.
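
The min/max correction quoted above is easier to see with numbers. The following is a minimal user-space sketch, not the kernel code: the helper names are hypothetical, and HZ is passed as a parameter (in the kernel it is a compile-time constant) so that several tick rates can be compared. It mirrors the integer division in the macro.

```c
#include <stdio.h>

/* Hypothetical helpers for illustration only. They reproduce the
 * arithmetic of the two macro variants quoted in the thread, with
 * HZ as a parameter instead of a build-time constant. */
static unsigned long budget_with_min(unsigned long hz)
{
    unsigned long two_ms = 2 * hz / 1000;   /* integer division, as in the macro */
    return two_ms < 1 ? two_ms : 1;         /* min(1, 2*HZ/1000) */
}

static unsigned long budget_with_max(unsigned long hz)
{
    unsigned long two_ms = 2 * hz / 1000;   /* truncates to 0 when HZ < 500 */
    return two_ms > 1 ? two_ms : 1;         /* max(1, 2*HZ/1000) */
}

/* Print both variants for one tick rate. */
static void show_budget(unsigned long hz)
{
    printf("HZ=%4lu  min: %lu jiffies  max: %lu jiffies\n",
           hz, budget_with_min(hz), budget_with_max(hz));
}
```

Calling show_budget() for HZ values of 100, 250 and 1000 shows the difference: the min() form caps the budget at one jiffy and collapses to zero whenever 2*HZ/1000 truncates to zero, while the max() form keeps a floor of one jiffy and allows two jiffies at HZ=1000.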

Re: [PATCH net-next] softirq: reduce latencies

2013-01-03 Thread Eric Dumazet
On Fri, 2013-01-04 at 11:14 +0400, Oleg A.Arkhangelsky wrote: It leads to many context switches when softirqs processing deferred to ksoftirqd kthreads which can be very expensive. Here is some evidence of ksoftirqd activation effects: http://marc.info/?l=linux-netdev&m=124116262916969&w=2

[PATCH v2 net-next] softirq: reduce latencies

2013-01-03 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100. This is because we iterate 10 times in the softirq dispatcher, and some actions can consume a lot of cycles. This patch changes the fallback

Re: [PATCH v2 net-next] softirq: reduce latencies

2013-01-04 Thread Eric Dumazet
On Fri, 2013-01-04 at 00:15 -0800, Joe Perches wrote: On Thu, 2013-01-03 at 23:49 -0800, Eric Dumazet wrote: In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100. This patch changes the fallback to ksoftirqd condition to : - A time

Re: [PATCH v2 net-next] softirq: reduce latencies

2013-01-04 Thread Eric Dumazet
On Fri, 2013-01-04 at 01:12 -0800, Joe Perches wrote: On Fri, 2013-01-04 at 00:23 -0800, Eric Dumazet wrote: On Fri, 2013-01-04 at 00:15 -0800, Joe Perches wrote: Perhaps MAX_SOFTIRQ_TIME should be #define MAX_SOFTIRQ_TIME msecs_to_jiffies(2) though it would be nicer if it were
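
For the msecs_to_jiffies(2) suggestion quoted above, a rough model of the conversion may help. This is a simplified user-space sketch, not the kernel implementation: the function name is hypothetical, and it assumes a tick rate that divides 1000 evenly and a conversion that rounds up to whole jiffies.

```c
/* Hypothetical model of a millisecond-to-jiffies conversion that
 * rounds up. Assumes hz divides 1000 evenly (for example 100, 250
 * or 1000); this is an illustration, not the kernel's code. */
static unsigned long ms_to_jiffies_ceil(unsigned long ms, unsigned long hz)
{
    unsigned long ms_per_jiffy = 1000 / hz;          /* length of one tick in ms */
    return (ms + ms_per_jiffy - 1) / ms_per_jiffy;   /* ceiling division */
}
```

Under these assumptions a 2 ms budget becomes 1 jiffy at HZ=100 or HZ=250 and 2 jiffies at HZ=1000, the same values as max(1, 2*HZ/1000) for those tick rates, with the rounding made explicit in the conversion.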

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-04 Thread Eric Dumazet
On Fri, 2013-01-04 at 16:01 +, Mel Gorman wrote: Implying that it's stuck in compaction somewhere. It could be the case that compaction alters timing enough to trigger another bug. You say it tests differently depending on whether TCP or unix sockets are used which might indicate multiple
