Re: Signed divides vs shifts (Re: [Security] /dev/urandom uses uninit bytes, leaks user data)

2007-12-17 Thread Eric Dumazet
On Mon, 17 Dec 2007 10:05:35 -0800 Ray Lee [EMAIL PROTECTED] wrote: On Dec 17, 2007 9:55 AM, Eric Dumazet [EMAIL PROTECTED] wrote: - mid = (last - first) / 2 + first; + while (low <= high) { + mid = (low + high) / 2; I think you just introduced a bug
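The snippet is cut off before it names the bug, so the following is only a sketch of one well-known difference between the two midpoint forms quoted above: with signed indices, `low + high` can exceed `INT_MAX`, while `low + (high - low) / 2` cannot when `0 <= low <= high`. The function names are illustrative and are not taken from lib/extable.c.

```c
#include <limits.h>

/* Midpoint that cannot overflow as long as 0 <= low <= high. */
static int mid_safe(int low, int high)
{
    return low + (high - low) / 2;
}

/* Same midpoint through unsigned arithmetic. On the usual platforms
 * where UINT_MAX == 2 * INT_MAX + 1, the sum of two non-negative ints
 * always fits, and dividing an unsigned value by 2 is a plain shift. */
static int mid_unsigned(int low, int high)
{
    return (int)(((unsigned int)low + (unsigned int)high) / 2);
}
```

Both helpers agree for every valid pair of indices; the plain `(low + high) / 2` on `int` is the form that becomes undefined once the sum passes `INT_MAX`.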

Re: After many hours all outbound connections get stuck in SYN_SENT

2007-12-18 Thread Eric Dumazet
James Nichols a écrit : Here is a purely hypothetical (and in practice unlikely) idea: Java opens up too many sockets (more than you really request) and the kernel, for whatever reason, does not deliver packets to programs which have maxed out their fds. Well it would already help if the java

Re: After many hours all outbound connections get stuck in SYN_SENT

2007-12-18 Thread Eric Dumazet
James Nichols a écrit : Well... please don't start a flame war :( Back to your SYN_SENT problem, I suppose the remote IP is known, so you probably could post here the result of a tcpdump ? tcpdump -p -n -s 1600 host IP_of_problematic_peer -c 500 Most probably remote peer received too many

[PATCH] procfs : Move some extern declaration from fs/proc/proc_misc.c to include/linux/seq_file.h

2007-12-18 Thread Eric Dumazet
cpuinfo_op; will be taken into account in a separate patch, since its const status is arch dependant. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] fs/proc/proc_misc.c |9 - include/linux/seq_file.h | 11 +++ 2 files changed, 11 insertions(+), 9 deletions(-) diff --git a/fs

[PATCH] lib/extable.c : Removes an expensive integer divide in search_extable()

2007-12-18 Thread Eric Dumazet
,%eax 1c: 8d 04 c3 lea (%ebx,%eax,8),%eax 1f: 39 08 cmp %ecx,(%eax) ... Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/lib/extable.c b/lib/extable.c index 463f456..179c087 100644 --- a/lib/extable.c +++ b/lib/extable.c @@ -57,10 +57,10

[PATCH] kernel/sys.c : Get rid of expensive divides in groups_sort()

2007-12-18 Thread Eric Dumazet
was found in groups_search() (commit d74beb9f33a5f16d2965f11b275e401f225c949d ) and at that time I changed some variables to unsigned int. I believe that a more generic fix is to make sure NGROUPS_PER_BLOCK is unsigned. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/include/linux

Re: [PATCH] kernel/sys.c : Get rid of expensive divides in groups_sort()

2007-12-18 Thread Eric Dumazet
Andrew Morton a écrit : On Wed, 19 Dec 2007 01:14:33 +0100 Eric Dumazet [EMAIL PROTECTED] wrote: groups_sort() can be quite long if user loads a large gid table. This is because GROUP_AT(group_info, some_integer) uses an integer divide. So having to do XXX thousand divides during one syscall
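Why signedness matters for the divide mentioned above can be sketched in C. C division truncates toward zero, so for a signed dividend the compiler cannot replace a divide by a power of two with a bare shift. For an unsigned dividend it can. The block size below is a hypothetical stand-in, not the kernel's NGROUPS_PER_BLOCK definition, and the helper names are made up for illustration.

```c
/* Hypothetical block size, a power of two. */
#define PER_BLOCK_SIGNED   1024
#define PER_BLOCK_UNSIGNED 1024u

/* Signed dividend: the result must truncate toward zero, so the
 * compiler emits a divide or a shift plus a sign fix-up. */
static int block_index_signed(int i)
{
    return i / PER_BLOCK_SIGNED;
}

/* Unsigned dividend: a single logical shift is sufficient. */
static unsigned int block_index_unsigned(unsigned int i)
{
    return i / PER_BLOCK_UNSIGNED;
}

/* Unsigned remainder: reduces to a bit mask. */
static unsigned int block_offset_unsigned(unsigned int i)
{
    return i % PER_BLOCK_UNSIGNED;
}
```

This matches the direction of the fix proposed in the patch description, which makes the block-size constant unsigned so that every user of the macro benefits.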

Re: The code segment of the user level in PPC64 are in VMAs with write permissions

2007-12-19 Thread Eric Dumazet
Dotan Barak a écrit : Hi all. I noticed that the code segment of the user level in PPC64 machines is in a VMA with a write permission enabled. I'm using the following machine attributes: * Host Name : mtlsqt185 Host

Re: After many hours all outbound connections get stuck in SYN_SENT

2007-12-19 Thread Eric Dumazet
James Nichols a écrit : So you see outgoing SYN packets, but no SYN replies coming from the remote peer ? (you mention ACKS, but the first packet received from the remote peer should be a SYN+ACK), Right, I meant to say SYN+ACK. I don't see them coming back. So... Really unlikely a linux

Re: After many hours all outbound connections get stuck in SYN_SENT

2007-12-19 Thread Eric Dumazet
James Nichols a écrit : On 12/19/07, Eric Dumazet [EMAIL PROTECTED] wrote: James Nichols a écrit : So you see outgoing SYN packets, but no SYN replies coming from the remote peer ? (you mention ACKS, but the first packet received from the remote peer should be a SYN+ACK), Right, I meant

Re: [PATCH] Use ilog2() in fs/namespace.c

2007-12-21 Thread Eric Dumazet
On Fri, 21 Dec 2007 02:29:12 -0800 Andrew Morton [EMAIL PROTECTED] wrote: On Fri, 14 Dec 2007 18:51:06 +0100 Eric Dumazet [EMAIL PROTECTED] wrote: We can use ilog2() in fs/namespace.c to compute hash_bits and hash_mask at compile time, not runtime. Well noted. [namespace.patch text

Re: Reproducible data corruption with sendfile+vsftp - splice regression?

2007-11-30 Thread Eric Dumazet
Holger Hoffstaette a écrit : Hi - This regular Linux user and lkml lurker just noticed data corruption in ftp'ed files and narrowed it down to vsftpd using sendfile(). So far this has never caused problems in the past; I have not noticed this with 2.6.22.x but may have missed it. I do remember

Re: Why does reading from /dev/urandom deplete entropy so much?

2007-12-04 Thread Eric Dumazet
Marc Haber a écrit : While debugging Exim4's GnuTLS interface, I recently found out that reading from /dev/urandom depletes entropy as much as reading from /dev/random would. This has somehow surprised me since I have always believed that /dev/urandom has lower quality entropy than /dev/random,

Re: Why does reading from /dev/urandom deplete entropy so much?

2007-12-04 Thread Eric Dumazet
that the nbits estimation is less pessimistic, but also to avoid injecting false entropy. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/drivers/char/random.c b/drivers/char/random.c index 5fee056..6eccfc9 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -550,8 +550,8

Re: Possible bug from kernel 2.6.22 and above, 2.6.24-rc4

2007-12-05 Thread Eric Dumazet
Ingo Molnar a écrit : * Jie Chen [EMAIL PROTECTED] wrote: I just ran the same test on two 2.6.24-rc4 kernels: one with CONFIG_FAIR_GROUP_SCHED on and the other with CONFIG_FAIR_GROUP_SCHED off. The odd behavior I described in my previous e-mails were still there for both kernels. Let me know

Re: Possible bug from kernel 2.6.22 and above, 2.6.24-rc4

2007-12-05 Thread Eric Dumazet
Ingo Molnar a écrit : * Eric Dumazet [EMAIL PROTECTED] wrote: $ gcc -O2 -o burner burner.c $ ./burner Time to perform the unit of work on one thread is 0.040328 s Time to perform the unit of work on 2 threads is 0.040221 s ok, but this actually suggests that scheduling is fine

Re: Why does reading from /dev/urandom deplete entropy so much?

2007-12-05 Thread Eric Dumazet
Matt Mackall a écrit : On Tue, Dec 04, 2007 at 07:17:58PM +0100, Eric Dumazet wrote: Alan Cox a écrit : No matter what you consider as being better, changing a 12 years old and widely used userspace interface like /dev/urandom is simply not an option. Fixing it to be more efficient

Re: [PATCHv4 0/6] sys_indirect system call

2007-11-20 Thread Eric Dumazet
Ulrich Drepper a écrit : wing patches provide an alternative implementation of the sys_indirect system call which has been discussed a few times. This no system call allows us to extend existing system call interfaces with adding more system calls. I am wondering if some parts are missing from

Re: Possible bug from kernel 2.6.22 and above

2007-11-21 Thread Eric Dumazet
Jie Chen a écrit : Hi, there: We have a simple pthread program that measures the synchronization overheads for various synchronization mechanisms such as spin locks, barriers (the barrier is implemented using queue-based barrier algorithm) and so on. We have dual quad-core AMD opterons

Re: [PATCHv5 4/5] Allow setting O_NONBLOCK flag for new sockets

2007-11-23 Thread Eric Dumazet
Ulrich Drepper a écrit : This patch adds support for setting the O_NONBLOCK flag of the file descriptors returned by socket, socketpair, and accept. Thanks Ulrich for this v5 series. I have two more questions. 1) Can the fd passing with recvmsg() on AF_UNIX also gets O_CLOEXEC support ?

Re: [PATCHv5 4/5] Allow setting O_NONBLOCK flag for new sockets

2007-11-24 Thread Eric Dumazet
Ulrich Drepper a écrit : Eric Dumazet wrote: 1) Can the fd passing with recvmsg() on AF_UNIX also gets O_CLOEXEC support ? Already there, see MSG_CMSG_CLOEXEC. OK, but maybe for consistency, we might accept the two mechanisms. The one added

[PATCH] get rid of NR_OPEN and introduce a sysctl_nr_open

2007-11-26 Thread Eric Dumazet
sysctl (/proc/sys/fs/nr_open) which defaults to 1024*1024, so that admins can decide to change this limit if their workload needs it. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] Cc: Alan Cox [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] Documentation/filesystems/proc.txt

Re: [PATCH] get rid of NR_OPEN and introduce a sysctl_nr_open

2007-11-26 Thread Eric Dumazet
[EMAIL PROTECTED] a écrit : On Tue, 27 Nov 2007 08:09:19 +0100, Eric Dumazet said: Changing NR_OPEN is not considered safe because of vmalloc space potential exhaust. Verbiage about this point... +nr_open +--- + +Denotes the maximum number of file-handles a process can +allocate

[PATCH, v2] get rid of NR_OPEN and introduce a sysctl_nr_open

2007-11-27 Thread Eric Dumazet
of vmalloc space potential exhaust. This patch introduces a new sysctl (/proc/sys/fs/nr_open) which defaults to 1024*1024, so that admins can decide to change this limit if their workload needs it. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] Cc: Alan Cox [EMAIL PROTECTED] Signed-off

[PATCH] RCU : move three variables to __read_mostly to save space

2007-12-10 Thread Eric Dumazet
vmlinux.before_patch Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c index a66d4d1..11c815c 100644 --- a/kernel/rcupdate.c +++ b/kernel/rcupdate.c @@ -75,9 +75,9 @@ DEFINE_PER_CPU(struct rcu_data, rcu_bh_data) = { 0L }; /* Fake initialization required by compiler

[PATCH] parport : dev->timeslice is an unsigned long, not an int

2007-12-11 Thread Eric Dumazet
While auditing proc_doulongvec_ms_jiffies_minmax() usage in kernel, I found a bug in drivers/parport/procfs.c, incorrectly using sizeof(int) instead of sizeof(unsigned long) Only 64bit arches are affected by this old bug. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/drivers
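A rough sketch of why the wrong `sizeof` only bites on 64-bit arches, assuming the handler derives its element count by dividing the table's byte length by `sizeof(unsigned long)`. The helper name is hypothetical and this is not the kernel code itself.

```c
#include <stddef.h>

/* Number of unsigned long values that fit in a buffer of maxlen bytes. */
static size_t ulong_slots(size_t maxlen)
{
    return maxlen / sizeof(unsigned long);
}
```

Where `int` and `unsigned long` have the same size, `ulong_slots(sizeof(int))` is 1 and the mistake goes unnoticed. Where `unsigned long` is wider, as on LP64 systems, it is 0 and the handler has nothing to process.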

[PATCH] sysctl : proc_dointvec_minmax() expects int values for min/max guard values

2007-12-12 Thread Eric Dumazet
min_sched_granularity_ns, max_sched_granularity_ns, min_wakeup_granularity_ns and max_wakeup_granularity_ns are declared unsigned long. This is incorrect since proc_dointvec_minmax() expects plain int guard values. This bug only triggers on big endian 64 bit arches. Signed-off-by: Eric Dumazet
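The endianness dependence described above can be illustrated with a small C sketch. It reads the first `sizeof(int)` bytes of an `unsigned long`, which is what an int-sized access through the same address would see. The helper name is hypothetical and is used only for illustration.

```c
#include <string.h>

/* Read the leading sizeof(int) bytes of an unsigned long as an int.
 * memcpy is used to avoid aliasing problems. */
static int leading_int_of(const unsigned long *p)
{
    int v;

    memcpy(&v, p, sizeof v);
    return v;
}
```

On little-endian machines, or wherever `int` and `unsigned long` have the same size, a guard value of 100 still reads back as 100. On a big-endian 64-bit machine the leading bytes are the high-order half, so the same guard reads back as 0.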

Re: ip neigh show not showing arp cache entries?

2007-12-12 Thread Eric Dumazet
Chris Friesen a écrit : I retested it on an x86 machine and am seeing similar problems. First, arp gives the arp table as expected: [EMAIL PROTECTED]:/tftpboot/cnp/0-0-5-0/0-0-5-0 arp -n Address HWtype HWaddress Flags MaskIface 172.24.0.9 ether

Re: ip neigh show not showing arp cache entries?

2007-12-12 Thread Eric Dumazet
Chris Friesen a écrit : Eric Dumazet wrote: Chris Friesen a écrit : Is this expected behaviour? Probably not... Still a 2.6.14 kernel ? Yep. Embedded hardware, so I'm unable to test with a more recent kernel. And what is the version of ip command you have on this machine ? ip -V You

Re: RFC: remove __read_mostly

2007-12-13 Thread Eric Dumazet
Adrian Bunk a écrit : I tried the following patch with a full x86 .config [1]: --- a/include/asm-x86/cache.h +++ b/include/asm-x86/cache.h -#define __read_mostly __attribute__((__section__(".data.read_mostly"))) +/* #define __read_mostly __attribute__((__section__(".data.read_mostly"))) */ The

Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]

2007-12-13 Thread Eric Dumazet
Christoph Lameter a écrit : On Sat, 8 Dec 2007, Ingo Molnar wrote: Good. Although we should perhaps look at that reported performance problem with SLUB. It looks like SLUB will do a memclear() for the area twice (first for the whole page, then for the thing it allocated) for the slow case.

Re: RFC: remove __read_mostly

2007-12-14 Thread Eric Dumazet
Matt Mackall a écrit : On Thu, Dec 13, 2007 at 11:20:44PM +0100, Adrian Bunk wrote: I tried the following patch with a full x86 .config [1]: --- a/include/asm-x86/cache.h +++ b/include/asm-x86/cache.h -#define __read_mostly __attribute__((__section__(".data.read_mostly"))) +/* #define

Re: RFC: remove __read_mostly

2007-12-14 Thread Eric Dumazet
Arnd Bergmann a écrit : On Thursday 13 December 2007, Adrian Bunk wrote: On Thu, Dec 13, 2007 at 11:29:08PM +0100, Andi Kleen wrote: Adrian Bunk [EMAIL PROTECTED] writes: -rwxrwxr-x 1 bunk bunk 46607243 2007-12-13 19:50 vmlinux.old -rwxrwxr-x 1 bunk bunk 46598691 2007-12-13

[PATCH] Use ilog2() in fs/namespace.c

2007-12-14 Thread Eric Dumazet
We can use ilog2() in fs/namespace.c to compute hash_bits and hash_mask at compile time, not runtime. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/fs/namespace.c b/fs/namespace.c index 0608388..835f14a 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -25,6 +25,7 @@ #include
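The idea of computing the hash parameters at compile time can be sketched with a constant-expression log2 macro. This is only an illustration: `CONST_ILOG2`, the sizes, and the `HASH_*` names below are assumptions, and the kernel's real `ilog2()` is implemented differently and covers a much wider range.

```c
/* Illustrative floor(log2(n)) for constants in the range 1 .. 65535. */
#define CONST_ILOG2(n)           \
    ((n) >= (1UL << 15) ? 15 :   \
     (n) >= (1UL << 14) ? 14 :   \
     (n) >= (1UL << 13) ? 13 :   \
     (n) >= (1UL << 12) ? 12 :   \
     (n) >= (1UL << 11) ? 11 :   \
     (n) >= (1UL << 10) ? 10 :   \
     (n) >= (1UL << 9)  ? 9  :   \
     (n) >= (1UL << 8)  ? 8  :   \
     (n) >= (1UL << 7)  ? 7  :   \
     (n) >= (1UL << 6)  ? 6  :   \
     (n) >= (1UL << 5)  ? 5  :   \
     (n) >= (1UL << 4)  ? 4  :   \
     (n) >= (1UL << 3)  ? 3  :   \
     (n) >= (1UL << 2)  ? 2  :   \
     (n) >= (1UL << 1)  ? 1  : 0)

#define PAGE_BYTES  4096UL  /* hypothetical page size */
#define ENTRY_BYTES 8UL     /* hypothetical size of one hash bucket */

#define HASH_BITS CONST_ILOG2(PAGE_BYTES / ENTRY_BYTES)
#define HASH_MASK ((1UL << HASH_BITS) - 1)

/* An enumerator must be an integer constant expression, so this line
 * only compiles if HASH_BITS is known at compile time. */
enum { hash_bits_is_constant = HASH_BITS };
```

With these numbers the table has 512 buckets, so no runtime loop is needed to discover that 9 bits index it.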

Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode

2007-12-15 Thread Eric Dumazet
Herbert Xu a écrit : On Tue, Dec 04, 2007 at 12:17:23AM +1100, Herbert Xu wrote: Never mind, we already have that in local_t and as Alexey correctly points out, USER is still going to be the expensive variant with the preempt_disable (well until BH gets threaded). So how about this patch? I

[RANDOM] Move two variables to read_mostly section to save memory

2007-12-16 Thread Eric Dumazet
-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/drivers/char/random.c b/drivers/char/random.c index 5fee056..af48e86 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -256,14 +256,14 @@ * The minimum number of bits of entropy before we wake up a read on * /dev/random. Should

Re: [RANDOM] Move two variables to read_mostly section to save memory

2007-12-16 Thread Eric Dumazet
Adrian Bunk a écrit : On Sun, Dec 16, 2007 at 12:45:01PM +0100, Eric Dumazet wrote: While examining vmlinux namelist on i686, I noticed : c0581300 D random_table c0581480 d input_pool c0581580 d random_read_wakeup_thresh c0581584 d random_write_wakeup_thresh c0581600 d blocking_pool

Re: [RANDOM] Move two variables to read_mostly section to save memory

2007-12-16 Thread Eric Dumazet
Adrian Bunk a écrit : On Sun, Dec 16, 2007 at 03:44:37PM +0100, Eric Dumazet wrote: Adrian Bunk a écrit : On Sun, Dec 16, 2007 at 12:45:01PM +0100, Eric Dumazet wrote: While examining vmlinux namelist on i686, I noticed : c0581300 D random_table c0581480 d input_pool c0581580 d

Re: [RANDOM] Move two variables to read_mostly section to save memory

2007-12-16 Thread Eric Dumazet
Matt Mackall a écrit : On Sun, Dec 16, 2007 at 12:45:01PM +0100, Eric Dumazet wrote: While examining vmlinux namelist on i686, I noticed : c0581300 D random_table c0581480 d input_pool c0581580 d random_read_wakeup_thresh c0581584 d random_write_wakeup_thresh c0581600 d blocking_pool

Re: [RANDOM] Move two variables to read_mostly section to save memory

2007-12-16 Thread Eric Dumazet
Adrian Bunk a écrit : On Sun, Dec 16, 2007 at 06:42:57PM +0100, Eric Dumazet wrote: Adrian Bunk a écrit : ... And even more funny, with gcc 4.2 and CONFIG_CC_OPTIMIZE_FOR_SIZE=y your patch doesn't seem to make any space difference - are you using an older compiler or even worse

[PATCH] X86 : Introduce DEFINE_PER_CPU_PAGE_ALIGNED() macro for x86 arch to shrink percpu section

2008-01-01 Thread Eric Dumazet
.data.percpu22048 3227328512 Signed-off-by: Eric Dumazet [EMAIL PROTECTED] arch/x86/kernel/cpu/common.c |2 +- arch/x86/kernel/vmlinux_32.lds.S |1 + include/asm-generic/percpu.h |4 include/asm-generic/vmlinux.lds.h |1 + include/asm-x86

Re: [PATCH] [20/20] x86: Print which shared library/executable faulted in segfault etc. messages

2008-01-02 Thread Eric Dumazet
Andi Kleen a écrit : They now look like hal-resmgr[13791]: segfault at 3c rip 2b9c8caec182 rsp 7fff1e825d30 error 4 in libacl.so.1.1.0[2b9c8caea000+6000] This makes it easier to pinpoint bugs to specific libraries. And printing the offset into a mapping also always allows to find the

Re: [PATCH x86] [6/16] Add a new arch_early_alloc() interface for x86-64

2008-01-03 Thread Eric Dumazet
On Thu, 3 Jan 2008 16:42:20 +0100 (CET) Andi Kleen [EMAIL PROTECTED] wrote: This allows to allocate memory really early before bootmem is setup. And a symbol that can be tested by the preprocessor. pgtable.h is probably not the best include for it, but also not the worst. Cc: [EMAIL

[PATCH] i386 : Make arch/x86/kernel/acpi/wakeup_32.S use a separate text section

2008-01-03 Thread Eric Dumazet
422838 458752 5501532 53f25c vmlinux.before 4610534 422838 458752 5492124 53cd9c vmlinux.after This saves 9408 bytes Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S index 1e931aa..f53e327 100644 --- a/arch/x86/kernel

Re: epoll design problems with common fork/exec patterns

2007-10-27 Thread Eric Dumazet
Marc Lehmann a écrit : Hi! I ran into what I see as unsolvable problems that make epoll useless as a generic event mechanism. I recently switched to libevent as event loop, and found that my programs work fine when it is using select or poll, but work erratically or halt when using epoll. The

Re: epoll design problems with common fork/exec patterns

2007-10-27 Thread Eric Dumazet
Marc Lehmann a écrit : On Sat, Oct 27, 2007 at 10:23:17AM +0200, Eric Dumazet [EMAIL PROTECTED] wrote: In this case, the parent process works fine until the child closes fds, after which the fds become unarmed in the parent too. This works as I have no idea what exact problem you have

Re: epoll design problems with common fork/exec patterns

2007-10-27 Thread Eric Dumazet
Marc Lehmann a écrit : On Sat, Oct 27, 2007 at 11:22:25AM +0200, Eric Dumazet [EMAIL PROTECTED] wrote: If such a bug exists on your kernel, please fill a complete bug report, giving details. As this behaviour is clearly documented in the epoll manpage, why do you think it is a bug? I think

Re: epoll design problems with common fork/exec patterns

2007-10-28 Thread Eric Dumazet
David Schwartz a écrit : 6) Epoll removes the file from the set, when the *kernel* object gets closed (internal use-count goes to zero) With that in mind, how can the code snippet above trigger a removal from the epoll set? I don't see how that can be. Suppose I add fd 8 to an
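Point 6 above can be tried out with a small Linux-only C sketch: a descriptor is registered with epoll, duplicated, and then closed. Because the duplicate keeps the underlying kernel object open, the registration is expected to survive and events are still reported. The function name is made up for illustration, and error paths are kept minimal (they may leak descriptors).

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Returns the number of events epoll_wait() reports after the
 * registered descriptor was closed while a dup() of it stays open,
 * or -1 if the setup failed. */
static int events_after_close_with_dup(void)
{
    int p[2];
    struct epoll_event ev = { .events = EPOLLIN }, out;
    int ep, copy, n;

    if (pipe(p) < 0)
        return -1;
    ep = epoll_create(1);
    if (ep < 0)
        return -1;

    ev.data.fd = p[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) < 0)
        return -1;

    copy = dup(p[0]);   /* second reference to the same open file */
    if (copy < 0)
        return -1;
    close(p[0]);        /* closes the fd, not the kernel object */

    if (write(p[1], "x", 1) != 1)
        return -1;
    n = epoll_wait(ep, &out, 1, 0);

    close(copy);
    close(p[1]);
    close(ep);
    return n;
}
```

The same mechanism applies after fork(): a child that closes its copy of a descriptor does not, by that alone, drop the parent's reference to the open file.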

[PATCH] CFS : Use NSEC_PER_MSEC and NSEC_PER_SEC in kernel/sched.c and kernel/sysctl.c

2007-10-30 Thread Eric Dumazet
1) hardcoded 1000000000 value is used five times in places where NSEC_PER_SEC might be more readable. 2) A conversion from nsec to msec uses the hardcoded 1000000 value, which is a candidate for NSEC_PER_MSEC. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/kernel/sched.c b
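A minimal sketch of the readability gain from named constants in a nanosecond conversion. The macro values below are the number of nanoseconds in a millisecond and in a second. The helper functions are hypothetical and are not taken from kernel/sched.c.

```c
#define NSEC_PER_MSEC 1000000L
#define NSEC_PER_SEC  1000000000L

/* Convert a nanosecond count to whole milliseconds. */
static unsigned long long nsec_to_msec(unsigned long long ns)
{
    return ns / NSEC_PER_MSEC;
}

/* Convert a nanosecond count to whole seconds. */
static unsigned long long nsec_to_sec(unsigned long long ns)
{
    return ns / NSEC_PER_SEC;
}
```

A miscounted zero in a bare literal is easy to miss in review, which is the usual argument for the named form.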

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Eric Dumazet
Christoph Lameter a écrit : This patch increases the speed of the SLUB fastpath by improving the per cpu allocator and makes it usable for SLUB. Currently allocpercpu manages arrays of pointer to per cpu objects. This means that is has to allocate the arrays and then populate them as needed

Re: [patch 1/7] allocpercpu: Make it a true per cpu allocator by allocating from a per cpu array

2007-11-01 Thread Eric Dumazet
Christoph Lameter a écrit : + +enum unit_type { FREE, END, USED }; + +static u8 cpu_alloc_map[UNITS_PER_CPU] = { 1, }; You mean END here instead of 1 :) +/* + * Allocate an object of a certain size + * + * Returns a per cpu pointer that must not be directly used. + */ +static void

Re: [patch 3/7] Allocpercpu: Do __percpu_disguise() only if CONFIG_DEBUG_VM is set

2007-11-01 Thread Eric Dumazet
Christoph Lameter a écrit : Disguising costs a few cycles in the hot paths. So switch it off if we are not debugging. Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- include/linux/percpu.h |4 1 file changed, 4 insertions(+) Index: linux-2.6/include/linux/percpu.h

Re: [PATCH] [net/ipv4]: fib_seq_show function adjustment to get a more sensable output of /proc/net/route

2007-10-22 Thread Eric Dumazet
Denis Cheng a écrit : the temporary bf[127] char array is redundant, and the specified width 127 make the output of /proc/net/route include many trailing spaces; since most terminal's cols are less than 127, this made every fib entry occupy two lines, after applied this patch, the output of
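The trailing-space effect described above follows from how a left-justified field width works in printf-style formatting. A small sketch, assuming a `%-127s` format followed by a newline. The helper and the sample text are illustrative only.

```c
#include <stdio.h>

/* Length of one output line when the text is left-justified and
 * padded to 127 columns, newline included. */
static int padded_line_len(const char *text)
{
    char buf[256];

    return snprintf(buf, sizeof buf, "%-127s\n", text);
}

/* Length of the same line without any padding. */
static int plain_line_len(const char *text)
{
    char buf[256];

    return snprintf(buf, sizeof buf, "%s\n", text);
}
```

Every padded line is 128 bytes regardless of how short the entry is, which is why each one wraps on a terminal narrower than 127 columns.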

Re: [git pull] more SLUB updates for 2.6.25

2008-02-09 Thread Eric Dumazet
Christoph Lameter a écrit : On Fri, 8 Feb 2008, Eric Dumazet wrote: And SLAB/SLUB allocators, even if only used from process context, want to disable/re-enable interrupts... Not any more. The new fastpath does allow avoiding interrupt enable/disable and we will be hopefully able

Re: tbench regression in 2.6.25-rc1

2008-02-14 Thread Eric Dumazet
Zhang, Yanmin a écrit : Comparing with kernel 2.6.24, tbench result has regression with 2.6.25-rc1. 1) On 2 quad-core processor stoakley: 4%. 2) On 4 quad-core processor tigerton: more than 30%. bisect located below patch. b4ce92775c2e7ff9cf79cca4e0a19c8c5fd6287b is first bad commit commit

Re: tbench regression in 2.6.25-rc1

2008-02-15 Thread Eric Dumazet
Zhang, Yanmin a écrit : On Fri, 2008-02-15 at 07:05 +0100, Eric Dumazet wrote: Zhang, Yanmin a écrit : Comparing with kernel 2.6.24, tbench result has regression with 2.6.25-rc1. 1) On 2 quad-core processor stoakley: 4%. 2) On 4 quad-core processor tigerton: more than 30%. bisect

Re: Is netif_tx_lock() SMP PREEMPT safe?

2008-02-15 Thread Eric Dumazet
Marin Mitov a écrit : Hi all, As in: include/linux/netdevice.h (kernel-2.6.24.2) one finds: static inline void __netif_tx_lock(struct net_device *dev, int cpu) { spin_lock(&dev->_xmit_lock); dev->xmit_lock_owner = cpu; } static inline void netif_tx_lock(struct net_device *dev) {

Re: [Suggestion] net-ipv6: format %8s change to %16s in rt6_info_route function of route.c

2012-11-01 Thread Eric Dumazet
On Thu, 2012-11-01 at 14:45 +0800, Chen Gang wrote: Hello: 1) For Public Kernel: A) in rt6_info_route function of net/ipv6/route.c B) the length of rt->rt6i_dev->name is 16 (IFNAMSIZ) C) using %16s is better than %8s (it will be more beautiful) (also suggest to delete

Re: linux-next: manual merge of the net-next tree with the net tree

2012-09-24 Thread Eric Dumazet
On Tue, 2012-09-25 at 12:34 +1000, Stephen Rothwell wrote: Hi all, Today's linux-next merge of the net-next tree got a conflict in net/ipv4/raw.c between commit ab43ed8b7490 (ipv4: raw: fix icmp_filter ()) from the net tree and commit 5640f7685831 (net: use a per task frag allocator) from

Re: linux-next: manual merge of the net-next tree with the net tree

2012-09-24 Thread Eric Dumazet
On Tue, 2012-09-25 at 01:13 -0400, David Miller wrote: From: Eric Dumazet eric.duma...@gmail.com Date: Tue, 25 Sep 2012 07:10:42 +0200 Oops, my bad, net/ipv4/raw.c changes in 5640f7685831 (net: use a per task frag allocator) should not be there : I accidentally left a debugging

[PATCH net-next] net: raw: revert unrelated change

2012-09-25 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com Commit 5640f7685831 (net: use a per task frag allocator) accidentally contained an unrelated change to net/ipv4/raw.c, later committed (without the pr_err() debugging bits) in net tree as commit ab43ed8b749 (ipv4: raw: fix icmp_filter()) This patch reverts

Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-03 Thread Eric Dumazet
On Wed, 2012-10-03 at 10:47 +0100, Mel Gorman wrote: On Tue, Oct 02, 2012 at 03:48:57PM -0700, Rick Jones wrote: PS - I trust it is the receive-side throughput being reported/used with UDP_STREAM :) Good question. Now that I examine the scripts, it is in fact the sending side that is

Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

2012-10-04 Thread Eric Dumazet
corrupted. If you're lucky, it might be something that netdev will recognize as already fixed. I have the same problem on the exact same hardware and found the cause: Author: Eric Dumazet eric.duma...@gmail.com Date: Tue Apr 10 20:08:39 2012 + net: allow pskb_expand_head() to get

Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

2012-10-04 Thread Eric Dumazet
On Thu, 2012-10-04 at 18:02 +0200, Maxime Bizon wrote: O Since skb_recycle() resets skb->data using (skb->head + NET_SKB_PAD), a recycled skb going multiple times through a path that needs to expand skb head will get bigger and bigger each time, and you eventually end up with an allocation

Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

2012-10-04 Thread Eric Dumazet
On Thu, 2012-10-04 at 19:09 +0200, Maxime Bizon wrote: yes, on ipv6 forward path the default NET_SKB_PAD is too small, so each packet forwarded has its headroom expanded, it is then recycled and gets its original default headroom back, then it gets forwarded, expanded, ... Hmm, this sounds

Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

2012-10-04 Thread Eric Dumazet
On Thu, 2012-10-04 at 19:34 +0200, Maxime Bizon wrote: On Thu, 2012-10-04 at 19:17 +0200, Eric Dumazet wrote: yes, on ipv6 forward path the default NET_SKB_PAD is too small, so each packet forwarded has its headroom expanded, it is then recycled and gets its original default headroom

Re: [PATCH] net: Fix skb_under_panic oops in neigh_resolve_output

2012-10-05 Thread Eric Dumazet
On Thu, 2012-10-04 at 20:05 -0700, ramesh.naga...@gmail.com wrote: From: Ramesh Nagappa ramesh.naga...@ericsson.com The retry loop in neigh_resolve_output() and neigh_connected_output() call dev_hard_header() with out reseting the skb to network_header. This causes the retry to fail with

Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

2012-10-05 Thread Eric Dumazet
On Thu, 2012-10-04 at 18:29 +0200, Eric Dumazet wrote: On Thu, 2012-10-04 at 18:02 +0200, Maxime Bizon wrote: On Fri, 2012-08-31 at 19:21 -0700, Hugh Dickins wrote: Hi, Francois is right that a GFP_ATOMIC allocation from pskb_expand_head() is failing, which can easily happen

Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

2012-10-05 Thread Eric Dumazet
On Fri, 2012-10-05 at 12:49 +0200, Maxime Bizon wrote: On Fri, 2012-10-05 at 09:41 +0200, Eric Dumazet wrote: By the way, the commit you pointed has no effect on the reallocation performed by pskb_expand_head() : The commit has a side effect, because the problem appeared after

Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

2012-10-05 Thread Eric Dumazet
On Fri, 2012-10-05 at 14:22 +0200, Eric Dumazet wrote: On Fri, 2012-10-05 at 12:49 +0200, Maxime Bizon wrote: On Fri, 2012-10-05 at 09:41 +0200, Eric Dumazet wrote: By the way, the commit you pointed has no effect on the reallocation performed by pskb_expand_head() : The commit

Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

2012-10-05 Thread Eric Dumazet
On Fri, 2012-10-05 at 14:37 +0200, Eric Dumazet wrote: diff --git a/net/core/skbuff.c b/net/core/skbuff.c index cdc2859..f6c1f52 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1053,11 +1053,22 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, { int i

Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

2012-10-05 Thread Eric Dumazet
On Fri, 2012-10-05 at 14:51 +0200, Maxime Bizon wrote: New convention would be : pass number of needed bytes after current tail, not after current end. Fully agree on this Here is the proposal : Change all occurrences of : pskb_expand_head(skb, 0, 0, gfp) to pskb_realloc_head(skb,

Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

2012-10-05 Thread Eric Dumazet
On Fri, 2012-10-05 at 16:50 +0200, Maxime Bizon wrote: On Fri, 2012-10-05 at 15:02 +0200, Eric Dumazet wrote: On Fri, 2012-10-05 at 14:51 +0200, Maxime Bizon wrote: New convention would be : pass number of needed bytes after current tail, not after current end. Fully agree

Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

2012-10-05 Thread Eric Dumazet
On Fri, 2012-10-05 at 17:15 +0200, Maxime Bizon wrote: You think removing skb_recycle() is too big a change for stable ? Driver change is simple, as recycling is not guaranteed today you have this: if (!try_recycle(skb)) skb = alloc_skb() we just remove the try_recycle part, we

RE: [PATCH] net: Fix skb_under_panic oops in neigh_resolve_output

2012-10-05 Thread Eric Dumazet
On Fri, 2012-10-05 at 12:33 -0400, Ramesh Nagappa wrote: Hi Eric, Yes, that is a good optimization. neigh_resolve_output() also has the __skb_pull() outside the loop, is that required ? The changes would be like ... neigh_resolve_output() ... -__skb_pull(skb,

Re: [REGRESSION] Kernel 3.6 drops network packets instead of forwarding

2012-10-06 Thread Eric Dumazet
On Sat, 2012-10-06 at 11:52 +, Chini, Georg (HP App Services) wrote: Hello, I have a network issue with kernel 3.6. My machine is set up as router and with 3.6 it sometimes silently drops reply packets instead of forwarding. Example tcpdump on the router for ping www.heise.de from a

Re: AW: [REGRESSION] Kernel 3.6 drops network packets instead of forwarding

2012-10-07 Thread Eric Dumazet
On Sat, 2012-10-06 at 17:52 +, Chini, Georg (HP App Services) wrote: Hello, Looks like the bug fixed by : http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=f4ef85bbda96324785097356336bc79cdd37db0a (David Miller will send it to stable team shortly)

Re: [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crash

2012-11-06 Thread Eric Dumazet
On Tue, 2012-11-06 at 20:39 -0500, Dave Jones wrote: On Tue, Nov 06, 2012 at 04:15:35PM -0800, Julius Werner wrote: tcp_recvmsg contains a sanity check that WARNs when there is a gap between the socket's copied_seq and the first buffer in the sk_receive_queue. In theory, the TCP stack

Re: [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crashusers

2012-11-07 Thread Eric Dumazet
On Wed, 2012-11-07 at 10:54 -0500, Dave Jones wrote: It sounds more appropriate to me, instead of silently wedging the box. At least with that approach we have a chance of finding out what happened. Its quite the opposite. If bug is still there 6 months after the commits that broke the

Re: [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crashusers

2012-11-07 Thread Eric Dumazet
On Wed, 2012-11-07 at 11:43 -0500, Dave Jones wrote: dude, look at the bug reports I just pointed you at. People _are_ aware there are bugs there. If I remember well, I helped to fix some of them. If you turn that into a BUG() those reports would never have been filed. How is that

Re: [PATCH] tcp: Avoid infinite loop on recvmsg bug

2012-11-07 Thread Eric Dumazet
On Wed, 2012-11-07 at 11:33 -0800, Julius Werner wrote: tcp_recvmsg contains a sanity check that WARNs when there is a gap between the socket's copied_seq and the first buffer in the sk_receive_queue. In theory, the TCP stack makes sure that This Should Never Happen (TM)... however, practice

Re: [PATCH] tcp: Avoid infinite loop on recvmsg bug

2012-11-07 Thread Eric Dumazet
On Wed, 2012-11-07 at 13:14 -0800, Julius Werner wrote: What I find very sad in all this is that you didnt mention the driver that was triggering this bug. Sorry, I was just trying to keep this thread focussed on one patch. The bug report that led me to this is publicly accessible at

Re: [PATCH] tcp: Avoid infinite loop on recvmsg bug

2012-11-07 Thread Eric Dumazet
On Wed, 2012-11-07 at 15:33 -0800, Eric Dumazet wrote: So you probably are fighting a bug we already fixed in upstream kernel. (commit c8628155ece363 tcp: reduce out_of_order memory use did not played well with cloned skbs.) This issue was already discussed on netdev in the past. If you

Re: [Q] Default SLAB allocator

2012-10-19 Thread Eric Dumazet
On Fri, 2012-10-19 at 09:03 +0900, JoonSoo Kim wrote: Hello, Eric. Thank you very much for a kind comment about my question. I have one more question related to network subsystem. Please let me know what I misunderstand. 2012/10/14 Eric Dumazet eric.duma...@gmail.com: In latest kernels

Re: [PATCH v4] posix timers: allocate timer id per process

2012-10-19 Thread Eric Dumazet
On Fri, 2012-10-19 at 11:50 +0400, Stanislav Kinsbursky wrote: v4: 1) a couple of coding style fixes (lines over 80 characters) v3: 1) hash calculation simplified to improve performance. v2: 1) Hash table became RCU-friendly. Hash table search now done under RCU lock protection. This

Re: [PATCH v4] posix timers: allocate timer id per process

2012-10-19 Thread Eric Dumazet
On Fri, 2012-10-19 at 13:38 +0400, Stanislav Kinsbursky wrote: 19.10.2012 11:56, Eric Dumazet пишет: I wonder if some applications relied on our idr, assuming they would get low values for their timer id. (We could imagine some applications use a table indexed by the timer id) Hmm

Re: Process Hang in __read_seqcount_begin

2012-10-22 Thread Eric Dumazet
On Mon, 2012-10-22 at 09:46 -0700, Peter LaDow wrote: I posted this problem some time back on the linux-rt-users and netfilter lists. Since then, we thought we had a workaround to avoid this problem, so we dropped the issue. But now 5 months later, the problem has reappeared. And this time

Re: 3.7-rc2 regression : file copied to CIFS-mounted directory corrupted

2012-10-23 Thread Eric Dumazet
On Tue, 2012-10-23 at 05:38 +, Jongman Heo wrote: Hmm, I've just met the issue, with the commit 5640f768 reverted. It seems that the issue does not always happen. So, my bisection may not be correct. At this moment, I don't have enough time to do bisection again.. Regards. What

Re: [PATCH for-v3.7 2/2] slub: optimize kmalloc* inlining for GFP_DMA

2012-10-23 Thread Eric Dumazet
On Tue, 2012-10-23 at 11:29 +0900, JoonSoo Kim wrote: 2012/10/22 Christoph Lameter c...@linux.com: On Sun, 21 Oct 2012, Joonsoo Kim wrote: kmalloc() and kmalloc_node() of the SLUB isn't inlined when @flags = __GFP_DMA. This patch optimize this case, so when @flags = __GFP_DMA, it

Re: [PATCH v5] posix timers: allocate timer id per process

2012-10-23 Thread Eric Dumazet
skinsbur...@parallels.com --- SGTM Signed-off-by: Eric Dumazet eduma...@google.com Thanks ! -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read

Re: Re: 3.7-rc2 regression : file copied to CIFS-mounted directory corrupted

2012-10-23 Thread Eric Dumazet
On Tue, 2012-10-23 at 08:17 +, Jongman Heo wrote: FYI, vmxnet3 driver is used for ethernet. Yes, this driver needs some changes #define VMXNET3_MAX_TX_BUF_SIZE (1 << 14) That's 16KB As we can now provide up to 32KB fragments we broke something. vmxnet3_tq_xmit() needs to split large

Re: [Pv-drivers] 3.7-rc2 regression : file copied to CIFS-mounted directory corrupted

2012-10-23 Thread Eric Dumazet
On Tue, 2012-10-23 at 03:02 -0700, Shreyas Bhatewara wrote: Please don't top post on netdev or lkml Well, actually the driver does split large frags into frags of VMXNET3_MAX_TX_BUF_SIZE bytes each. vmxnet3_drv.c 711 while (len) { 712 u32 buf_size; 713 714

Re: [Pv-drivers] 3.7-rc2 regression : file copied to CIFS-mounted directory corrupted

2012-10-23 Thread Eric Dumazet
On Tue, 2012-10-23 at 15:50 +0200, Eric Dumazet wrote: Only the skb head is handled in the code you copy/pasted. You need to generalize that to code in lines ~754 Then, the number of estimated descriptors is bad : /* conservatively estimate # of descriptors to use */ count
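The descriptor estimate discussed in this thread can be illustrated with a small sketch. It assumes a per-descriptor limit of 1 << 14 bytes (the 16KB mentioned in the thread) and that every buffer, head or fragment, is split into chunks of at most that size; `descs_for_len` and `estimate_tx_descs` are hypothetical helpers, not functions from the vmxnet3 driver.

```c
#include <stddef.h>

/* Assumed per-descriptor limit: 16KB, as quoted in the thread. */
#define MAX_TX_BUF_SIZE ((size_t)1 << 14)

/* Descriptors needed for one buffer of len bytes (rounded-up division). */
static unsigned int descs_for_len(size_t len)
{
    return (unsigned int)((len + MAX_TX_BUF_SIZE - 1) / MAX_TX_BUF_SIZE);
}

/* Hypothetical estimate for a whole packet: the linear head plus each
 * fragment, where any of them may exceed the per-descriptor limit. */
static unsigned int estimate_tx_descs(size_t head_len,
                                      const size_t *frag_len,
                                      size_t nr_frags)
{
    unsigned int count = descs_for_len(head_len);

    for (size_t i = 0; i < nr_frags; i++)
        count += descs_for_len(frag_len[i]);
    return count;
}
```

Under these assumptions a 32KB fragment costs two descriptors, so an estimate that counts one descriptor per fragment would come up short.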

Re: [PATCH v5] posix timers: allocate timer id per process

2012-10-23 Thread Eric Dumazet
On Tue, 2012-10-23 at 23:47 +0200, Thomas Gleixner wrote: Not so good to me. Signed-off-by: Eric Dumazet eduma...@google.com And that should be either an Acked-by or a Reviewed-by. You can't sign off on patches which have not been submitted or transported by you. I actually gave some

Re: [PATCH v5] posix timers: allocate timer id per process

2012-10-23 Thread Eric Dumazet
On Wed, 2012-10-24 at 00:33 +0200, Thomas Gleixner wrote: On Tue, 23 Oct 2012, Eric Dumazet wrote: On Tue, 2012-10-23 at 23:47 +0200, Thomas Gleixner wrote: Not so good to me. Signed-off-by: Eric Dumazet eduma...@google.com And that should be either an Acked

Re: Process Hang in __read_seqcount_begin

2012-10-23 Thread Eric Dumazet
On Tue, 2012-10-23 at 17:15 -0700, Peter LaDow wrote: (Sorry for the subject change, but I wanted to try and pull in those who work on RT issues, and the subject didn't make that obvious. Please search for the same subject without the RT Linux trailing text.) Well, more information. Even

Re: Process Hang in __read_seqcount_begin

2012-10-24 Thread Eric Dumazet
On Wed, 2012-10-24 at 09:30 -0700, Peter LaDow wrote: On Tue, Oct 23, 2012 at 9:32 PM, Eric Dumazet eric.duma...@gmail.com wrote: Could you try following patch ? Thanks for the suggestion. But I have a question about the patch below. + /* Note : cmpxchg() is a memory barrier, we

Re: net: fix typo in freescale/ucc_geth.c

2012-10-09 Thread Eric Dumazet
On Tue, 2012-10-09 at 10:52 +1100, Michael Neuling wrote: The following patch: acb600d net: remove skb recycling added dev_free_skb() to drivers/net/ethernet/freescale/ucc_geth.c This is a typo and should be dev_kfree_skb(). This fixes this. Signed-off-by: Michael Neuling

Re: [PATCH v3 10/10] mm: kill vma flag VM_RESERVED and mm->reserved_vm counter

2012-10-09 Thread Eric Dumazet
On Tue, 2012-07-31 at 14:42 +0400, Konstantin Khlebnikov wrote: A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA, currently it lost original meaning but still has some effects: | effect | alternative flags

Re: Instead of IP addresses the kernel started to show zero's

2012-10-09 Thread Eric Dumazet
On Tue, 2012-10-09 at 15:36 +0300, Dan Carpenter wrote: Add netdev to the CC list. netdev already in the CC list by Borislav Petkov Reporter was (kindly) requested to try 3.6-rc7 +, and we got no answer.
