Re: [RFC PATCH 0/2] net: threadable napi poll loop

2016-05-11 Thread Eric Dumazet
On Wed, 2016-05-11 at 07:40 -0700, Eric Dumazet wrote: > On Wed, May 11, 2016 at 6:13 AM, Hannes Frederic Sowa > wrote: > > > This looks racy to me as the ksoftirqd could be in the progress to stop > > and we would miss another softirq invocation. > > Looking at sm

Re: [RFC PATCH 0/2] net: threadable napi poll loop

2016-05-11 Thread Eric Dumazet
On Wed, May 11, 2016 at 7:38 AM, Paolo Abeni wrote: > Uh, we have likely the same issue in the net_rx_action() function, which > also execute with bh disabled and check for jiffies changes even on > single core hosts ?!? That is why we have a loop break after netdev_budget=300 packets. And a sys

Re: [RFC PATCH 0/2] net: threadable napi poll loop

2016-05-11 Thread Eric Dumazet
On Wed, May 11, 2016 at 6:13 AM, Hannes Frederic Sowa wrote: > This looks racy to me as the ksoftirqd could be in the progress to stop > and we would miss another softirq invocation. Looking at smpboot_thread_fn(), it looks fine : if (!ht->thread_should_run(td->cpu)) {

Re: [RFC PATCH 0/2] net: threadable napi poll loop

2016-05-11 Thread Eric Dumazet
On Wed, 2016-05-11 at 11:48 +0200, Paolo Abeni wrote: > Hi Eric, > On Tue, 2016-05-10 at 15:51 -0700, Eric Dumazet wrote: > > On Wed, 2016-05-11 at 00:32 +0200, Hannes Frederic Sowa wrote: > > > > > Not only did we want to present this solely as a bugfix but a

Re: [RFC PATCH 0/2] net: threadable napi poll loop

2016-05-10 Thread Eric Dumazet
On Wed, 2016-05-11 at 00:32 +0200, Hannes Frederic Sowa wrote: > Not only did we want to present this solely as a bugfix but also as as > performance enhancements in case of virtio (as you can see in the cover > letter). Given that a long time ago there was a tendency to remove > softirqs complete

Re: [RFC PATCH 0/2] net: threadable napi poll loop

2016-05-10 Thread Eric Dumazet
On Tue, 2016-05-10 at 15:02 -0700, Eric Dumazet wrote: > On Tue, 2016-05-10 at 14:53 -0700, Eric Dumazet wrote: > > On Tue, 2016-05-10 at 17:35 -0400, Rik van Riel wrote: > > > > > You might need another one of these in invoke_softirq() > > > > > > &

Re: [RFC PATCH 0/2] net: threadable napi poll loop

2016-05-10 Thread Eric Dumazet
On Tue, 2016-05-10 at 14:53 -0700, Eric Dumazet wrote: > On Tue, 2016-05-10 at 17:35 -0400, Rik van Riel wrote: > > > You might need another one of these in invoke_softirq() > > > > Excellent. > > I gave it a quick try (without your suggestion), and host se

Re: [RFC PATCH 0/2] net: threadable napi poll loop

2016-05-10 Thread Eric Dumazet
On Tue, 2016-05-10 at 17:35 -0400, Rik van Riel wrote: > You might need another one of these in invoke_softirq() > Excellent. I gave it a quick try (without your suggestion), and host seems to survive a stress test. Of course we do have to fix these problems : [ 147.781629] NOHZ: local_softi

Re: [RFC PATCH 0/2] net: threadable napi poll loop

2016-05-10 Thread Eric Dumazet
On Tue, 2016-05-10 at 14:09 -0700, Eric Dumazet wrote: > On Tue, May 10, 2016 at 1:46 PM, Hannes Frederic Sowa > wrote: > > > I agree here, but I don't think this patch particularly is a lot of > > bloat and something very interesting people can play with and extend

Re: [RFC PATCH 0/2] net: threadable napi poll loop

2016-05-10 Thread Eric Dumazet
On Tue, May 10, 2016 at 1:46 PM, Hannes Frederic Sowa wrote: > I agree here, but I don't think this patch particularly is a lot of > bloat and something very interesting people can play with and extend upon. > Sure, very rarely patch authors think their stuff is bloat. I prefer to fix kernel so

Re: [RFC PATCH 0/2] net: threadable napi poll loop

2016-05-10 Thread Eric Dumazet
On Tue, 2016-05-10 at 18:03 +0200, Paolo Abeni wrote: > If a single core host is under network flood, i.e. ksoftirqd is > scheduled and it eventually (after processing ~640 packets) will let the > user space process run. The latter will execute a syscall to receive a > packet, which will have to d

Re: [RFC PATCH 0/2] net: threadable napi poll loop

2016-05-10 Thread Eric Dumazet
On Tue, 2016-05-10 at 16:11 +0200, Paolo Abeni wrote: > Currently, the softirq loop can be scheduled both inside the ksofirqd kernel > thread and inside any running process. This makes nearly impossible for the > process scheduler to balance in a fair way the amount of time that > a given core spen

Re: [lkp] [net] 9317bb6982: INFO: task cat-kmsg:893 blocked for more than 300 seconds.

2016-05-09 Thread Eric Dumazet
On Mon, May 9, 2016 at 6:26 PM, Huang, Ying wrote: > Hi, Eric, > > kernel test robot writes: >> FYI, we noticed the following commit: >> >> git://internal_merge_and_test_tree devel-catchup-201604281529 >> commit 9317bb69824ec8d078b0b786b6971aedb0af3d4f ("net: SOCKWQ_ASYNC_NOSPACE >> optimization

Re: [PATCH net-next v2] block/drbd: use nla_put_u64_64bit()

2016-05-04 Thread Eric Dumazet
On Wed, 2016-05-04 at 12:50 -0400, David Miller wrote: > From: Eric Dumazet > Date: Wed, 04 May 2016 07:27:06 -0700 > > > kernel was fine, and most user land apps were fine as well. > > Userland should really not have to deal with garbage like this. > > And because

Re: [PATCH net-next v2] block/drbd: use nla_put_u64_64bit()

2016-05-04 Thread Eric Dumazet
On Wed, 2016-05-04 at 14:49 +0200, Nicolas Dichtel wrote: > Le 04/05/2016 11:05, Lars Ellenberg a écrit : > [snip] > > We don't have an "alignment problem" there, btw. > > Last time I checked, we did work fine without this alignment magic, > > we already take care of that, yes, even on affected arc

Re: [net] 5413d1babe: INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]

2016-05-03 Thread Eric Dumazet
On Tue, May 3, 2016 at 7:47 PM, kernel test robot wrote: > > FYI, we noticed the following commit: > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master > commit 5413d1babe8f10de13d72496c12b862eef8ba613 ("net: do not block BH while > processing socket backlog") > > on tes

Re: __napi_alloc_skb failures locking up the box

2016-04-30 Thread Eric Dumazet
On Sat, 2016-04-30 at 22:24 +0300, Aaro Koskinen wrote: > Hi, > > I have old NAS box (Thecus N2100) with 512 MB RAM, where rsync from NFS -> > disk reliably results in temporary out-of-memory conditions. > > When this happens the dmesg gets flooded with below logs. If the serial > console logging

Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

2016-04-30 Thread Eric Dumazet
On Sat, 2016-04-30 at 10:12 -0700, Linus Torvalds wrote: > On Sat, Apr 30, 2016 at 9:45 AM, Eric Dumazet wrote: > > > > I use hash_32() in net/sched/sch_fq.c, for all packets sent by Google > > servers. (Note that I did _not_ use hash_ptr()) > > > > That's

Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

2016-04-30 Thread Eric Dumazet
On Sat, 2016-04-30 at 15:02 +0200, Thomas Gleixner wrote: > Yes. So I tested those two: > > u32 hash_64(u64 key) > { >key = ~key + (key << 18); >key ^= key >> 31; >key += (key << 2)) + (key << 4); >key ^= key >> 11; >key += key << 6; >key ^= key >>

[PATCH] mm: tighten fault_in_pages_writeable()

2016-04-26 Thread Eric Dumazet
From: Eric Dumazet copy_page_to_iter_iovec() is currently the only user of fault_in_pages_writeable(), and it definitely can use fragments from high order pages. Make sure fault_in_pages_writeable() is only touching two adjacent pages at most, as claimed. Signed-off-by: Eric Dumazet

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-24 Thread Eric Dumazet
On Sun, 2016-04-24 at 14:25 -0700, Eric Dumazet wrote: > On Sun, 2016-04-24 at 17:13 -0400, valdis.kletni...@vt.edu wrote: > > On Sun, 24 Apr 2016 14:00:17 -0700, Eric Dumazet said: > > > On Sun, 2016-04-24 at 15:56 -0400, valdis.kletni...@vt.edu wrote: > > > > On S

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-24 Thread Eric Dumazet
On Sun, 2016-04-24 at 17:13 -0400, valdis.kletni...@vt.edu wrote: > On Sun, 24 Apr 2016 14:00:17 -0700, Eric Dumazet said: > > On Sun, 2016-04-24 at 15:56 -0400, valdis.kletni...@vt.edu wrote: > > > On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said: > > > > >

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-24 Thread Eric Dumazet
On Sun, 2016-04-24 at 15:56 -0400, valdis.kletni...@vt.edu wrote: > On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said: > > > >>> + return !debug_locks || > > >>> + lockdep_is_held(&sk->sk_lock) || > > > Issue here is that

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-24 Thread Eric Dumazet
On Sun, 2016-04-24 at 14:54 -0400, David Miller wrote: > From: Hannes Frederic Sowa > Date: Sun, 24 Apr 2016 20:48:24 +0200 > > > Eric's patch is worth to apply anyway, but I am not sure if it solves > > the (fundamental) problem. I couldn't reproduce it with the exact next- > > tag provided in t

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-24 Thread Eric Dumazet
On Sun, 2016-04-24 at 20:48 +0200, Hannes Frederic Sowa wrote: > On 24.04.2016 20:38, David Miller wrote: > > From: Hannes Frederic Sowa > > Date: Thu, 21 Apr 2016 15:49:37 +0200 > > > >> On 21.04.2016 15:31, Eric Dumazet wrote: > >>> On Thu, 2016-04-

Re: Warning triggered by lockdep checks for sock_owned_by_user on linux-next-20160420

2016-04-22 Thread Eric Dumazet
On Fri, 2016-04-22 at 21:02 -0700, Shi, Yang wrote: > Hi David, > > When I ran some test on a nfs mounted rootfs, I got the below warning > with LOCKDEP enabled on linux-next-20160420: > > WARNING: CPU: 9 PID: 0 at include/net/sock.h:1408 > udp_queue_rcv_skb+0x3d0/0x660 > Modules linked in: > C

Re: [PATCH net-next 2/9] libnl: nla_put_le64(): align on a 64-bit area

2016-04-22 Thread Eric Dumazet
On Fri, 2016-04-22 at 17:31 +0200, Nicolas Dichtel wrote: > nla_data() is now aligned on a 64-bit area. > > Signed-off-by: Nicolas Dichtel > --- > include/net/netlink.h | 8 +--- > include/net/nl802154.h| 6 ++ > net/ieee802154/nl802154.c | 13 - > 3 files changed,

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-21 Thread Eric Dumazet
On Thu, 2016-04-21 at 05:05 -0400, valdis.kletni...@vt.edu wrote: > On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said: > > Hi, > > > > On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote: > > > linux-next 20160420 is whining at an incredible rate - in 20 minutes of > > > uptime, I pi

Re: [patch -next] udp: fix if statement in SIOCINQ ioctl

2016-04-18 Thread Eric Dumazet
l SIOCINQ') > Signed-off-by: Dan Carpenter Acked-by: Eric Dumazet

Re: [PATCH] cls_cgroup: get sk_classid only from full sockets

2016-04-18 Thread Eric Dumazet
On Mon, 2016-04-18 at 14:37 +0300, Konstantin Khlebnikov wrote: > skb->sk could point to timewait or request socket which has no sk_classid. > Detected as "BUG: KASAN: slab-out-of-bounds in cls_cgroup_classify". > > Signed-off-by: Konstantin Khlebnikov > --- Acked-by: Eric Dumazet Thanks !

Re: [PATCH] net/mlx4_en: allocate non 0-order pages for RX ring with __GFP_NOMEMALLOC

2016-04-18 Thread Eric Dumazet
ents the same logic. > > Signed-off-by: Konstantin Khlebnikov > --- > drivers/net/ethernet/mellanox/mlx4/en_rx.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Acked-by: Eric Dumazet Thanks !

Re: linux-next: manual merge of the net-next tree with the net tree

2016-04-17 Thread Eric Dumazet
On Mon, 2016-04-18 at 11:31 +1000, Stephen Rothwell wrote: > Hi all, > > Today's linux-next merge of the net-next tree got a conflict in: > > net/ipv4/udp.c > > between commit: > > d894ba18d4e4 ("soreuseport: fix ordering for mixed v4/v6 sockets") > > from the net tree and commit: > > c

Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-15 Thread Eric Dumazet
On Fri, 2016-04-15 at 08:42 -0400, Jamal Hadi Salim wrote: > On 16-04-14 01:49 PM, Eric Dumazet wrote: > > > And what would be the chosen behavior ? > > > > TBF is probably a bad example because it started life as > a classless qdisc. There was only one built-in f

Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-14 Thread Eric Dumazet
On Thu, 2016-04-14 at 18:08 +0200, Jiri Kosina wrote: > On Thu, 14 Apr 2016, Phil Sutter wrote: > > > > > I've came across the behavior where adding a child qdisc and then > > > > deleting > > > > it again makes the networking dysfunctional (I guess that's because all > > > > of > > > > a sudd

Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-14 Thread Eric Dumazet
On Thu, 2016-04-14 at 18:22 +0200, Phil Sutter wrote: > And those being invisible can be overridden using 'tc qd add', right? > AFAIR they're not listed because they don't properly register, so the > system doesn't care to override them. In this case we could change all > classful qdiscs to restor

Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-14 Thread Eric Dumazet
On Thu, 2016-04-14 at 17:34 +0200, Jiri Kosina wrote: > On Thu, 14 Apr 2016, Phil Sutter wrote: > > > OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default > > one upon deletion instead of noop_qdisc, hence I would describe > > the situation using the words 'inconsistent' and 'acc

Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-14 Thread Eric Dumazet
On Thu, 2016-04-14 at 16:44 +0200, Jiri Kosina wrote: > Hi, > > I've came across the behavior where adding a child qdisc and then deleting > it again makes the networking dysfunctional (I guess that's because all of > a sudden there is absolutely no working qdisc on the device, although > there

Re: [RFC] Is it a bug for nfs on udp6 mode or kernel?

2016-04-13 Thread Eric Dumazet
On Wed, 2016-04-13 at 19:28 +0800, Ding Tianhong wrote: > Hi everyone: > > I have met this problem when I try to test udp6 for nfs connection, my > environment is: > > Server: > kernel: 4.1.15 > IP:::36/64 > MTU:1500 > Setting: /etc/exports:/home/nfs *(rw,sync,no_subtree_check,no_root_squash

Re: TCP reaching to maximum throughput after a long time

2016-04-12 Thread Eric Dumazet
On Tue, 2016-04-12 at 20:08 -0700, Yuchung Cheng wrote:
> based on the prev thread I propose we disable hystart ack-train. It is
> brittle under various circumstances. We've disabled that at Google for
> years.
Right, but because we also use sch_fq packet scheduler and pacing ;)

Re: TCP reaching to maximum throughput after a long time

2016-04-12 Thread Eric Dumazet
On Tue, 2016-04-12 at 13:23 -0700, Ben Greear wrote: > It worked well enough for years that I didn't even know other algorithms were > available. It was broken around 4.0 time, and I reported it to the list, > and no one seemed to really care enough to do anything about it. I changed > to reno a

Re: TCP reaching to maximum throughput after a long time

2016-04-12 Thread Eric Dumazet
On Tue, 2016-04-12 at 13:11 -0700, Ben Greear wrote: > On 04/12/2016 12:31 PM, Machani, Yaniv wrote: > > On Tue, Apr 12, 2016 at 18:04:52, Ben Greear wrote: > >> On 04/12/2016 07:52 AM, Eric Dumazet wrote: > >>> On Tue, 2016-04-12 at 12:17 +, Machani, Yaniv wrot

Re: [PATCH net v3] net: sched: do not requeue a NULL skb

2016-04-12 Thread Eric Dumazet
55a93b3ea780 ("qdisc: validate skb without holding lock") > Signed-off-by: Lars Persson > --- Acked-by: Eric Dumazet Thanks !

Re: TCP reaching to maximum throughput after a long time

2016-04-12 Thread Eric Dumazet
On Tue, 2016-04-12 at 12:17 +, Machani, Yaniv wrote: > Hi, > After updating from Kernel 3.14 to Kernel 4.4 we have seen a TCP performance > degradation over Wi-Fi. > In 3.14 kernel, TCP got to its max throughout after less than a second, while > in the 4.4 it is taking ~20-30 seconds. > UDP

Re: [PATCH net v2] net: sched: do not requeue a NULL skb

2016-04-11 Thread Eric Dumazet
On Mon, 2016-04-11 at 16:19 -0700, Cong Wang wrote: > My point is, for example, in OOM case, we don't know processin > more SKB would make it better or worse. Maybe we really need to > check the error code to decide to continue to exit? Really, given this bug has been there for a long time (v3.18

Re: [PATCH net v2] net: sched: do not requeue a NULL skb

2016-04-11 Thread Eric Dumazet
On Mon, 2016-04-11 at 11:26 -0700, Eric Dumazet wrote: > On Mon, 2016-04-11 at 11:02 -0700, Cong Wang wrote: > > > I am fine with either way as long as the loop stops on failure. Note that skb that could not be validated is already freed. So I do not see any value from stopping the

Re: [PATCH net v2] net: sched: do not requeue a NULL skb

2016-04-11 Thread Eric Dumazet
On Mon, 2016-04-11 at 11:02 -0700, Cong Wang wrote: > I am fine with either way as long as the loop stops on failure. > Folding the test "if (skb)" into one also requires to retake the spinlock. Adding the likely() in this path would probably help as well. diff --git a/net/sched/sch_generic.c b/

Re: [PATCH net v2] net: sched: do not requeue a NULL skb

2016-04-11 Thread Eric Dumazet
On Mon, 2016-04-11 at 17:17 +0200, Lars Persson wrote: > > On 04/11/2016 04:22 PM, Eric Dumazet wrote: > > On Mon, 2016-04-11 at 15:38 +0200, Lars Persson wrote: > > > >> I though it would be prudent because the queue can be non-empty even for > >> the case o

Re: [PATCH net v2] net: sched: do not requeue a NULL skb

2016-04-11 Thread Eric Dumazet
On Mon, 2016-04-11 at 15:38 +0200, Lars Persson wrote: > I though it would be prudent because the queue can be non-empty even for > the case of skb=NULL. So should it be there in this patch, another patch > or not at all ? Then maybe change return code ? It seems strange that a validate_xmit_s

Re: [PATCH net v2] net: sched: do not requeue a NULL skb

2016-04-11 Thread Eric Dumazet
On Mon, 2016-04-11 at 08:24 +0200, Lars Persson wrote: > A failure in validate_xmit_skb_list() triggered an unconditional call > to dev_requeue_skb with skb=NULL. This slowly grows the queue > discipline's qlen count until all traffic through the queue stops. > > By introducing a NULL check in dev

Re: Kernel crash on startup - bisected to commit 3b24d854cb35

2016-04-08 Thread Eric Dumazet
r_write+0x27. As this crash is likely configuration dependent, > a copy of my .config is also attached. Note that IPv6 is turned off on my > machine. > > Please let me know if any other info is needed. Can you double check you have this fix ? commit 8501786929de4616b10b8059ad97abd304a7d

Re: How to get creatior PID information for the local tcp connection

2016-04-07 Thread Eric Dumazet
On Thu, 2016-04-07 at 11:26 -0700, Eric Dumazet wrote: > On Thu, 2016-04-07 at 23:01 +0530, Vishnu Pratap Singh wrote: > > Hi, > > > > > > Issue - How to get PID information for the local tcp connection > > > > > > > > i want to get the

Re: How to get creatior PID information for the local tcp connection

2016-04-07 Thread Eric Dumazet
On Thu, 2016-04-07 at 23:01 +0530, Vishnu Pratap Singh wrote: > Hi, > > > Issue - How to get PID information for the local tcp connection > > > > i want to get the creator PID for each socket in user space for local > tcp connection, i see in kernel there is support for returing PID with > "S

Re: [PATCH net-next 1/6] net: skbuff: don't use union for napi_id and sender_cpu

2016-04-01 Thread Eric Dumazet
On Fri, 2016-04-01 at 12:49 +0800, Jason Wang wrote: > > On 04/01/2016 10:55 AM, Eric Dumazet wrote: > > On Fri, 2016-04-01 at 10:13 +0800, Jason Wang wrote: > > > > > >> The problem is we want to support busy polling for tun. This needs > >> napi_id to

Re: [PATCH net-next 1/6] net: skbuff: don't use union for napi_id and sender_cpu

2016-03-31 Thread Eric Dumazet
On Fri, 2016-04-01 at 10:13 +0800, Jason Wang wrote: > > The problem is we want to support busy polling for tun. This needs > napi_id to be passed to tun socket by sk_mark_napi_id() during > tun_net_xmit(). But before reaching this, XPS will set sender_cpu will > make us can't see correct napi_i

Re: [PATCH net-next 1/6] net: skbuff: don't use union for napi_id and sender_cpu

2016-03-31 Thread Eric Dumazet
On Thu, 2016-03-31 at 13:50 +0800, Jason Wang wrote: > We use a union for napi_id and send_cpu, this is ok for most of the > cases except when we want to support busy polling for tun which needs > napi_id to be stored and passed to socket during tun_net_xmit(). In > this case, napi_id was overridde

Re: [PATCH] mwifiex: add __GFP_REPEAT to skb allocation call

2016-03-29 Thread Eric Dumazet
On Tue, 2016-03-29 at 17:27 +0800, Wei-Ning Huang wrote: > Adding some chromium devs to the thread. > > In, http://lxr.free-electrons.com/source/mm/page_alloc.c#L3152 > > The default mm retry allocation when 'order <= > PAGE_ALLOC_COSTLY_ORDER' of gfp_mask contains __GFP_REPEAT. > PAGE_ALLOC_COST

Re: [PATCH] vlan: propagate gso_min_segs

2016-03-23 Thread Eric Dumazet
On Wed, 2016-03-23 at 14:25 -0400, David Miller wrote: > From: Eric Dumazet > Date: Tue, 22 Mar 2016 19:33:52 -0700 > > > On Wed, 2016-03-23 at 09:35 +0800, Haishuang Yan wrote: > >> vlan drivers lack proper propagation of gso_min_segs from lower device. > >>

Re: net/sctp: stack-out-of-bounds in sctp_getsockopt

2016-03-23 Thread Eric Dumazet
On Thu, 2016-03-24 at 00:42 +0800, Baozeng wrote: > Thanks for your quick patch. I tested it but it still reproduce the > bug. We should limit the length of the name, > not the prefix. The following patch fixs it. > > diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c

Re: [PATCH] vlan: propagate gso_min_segs

2016-03-22 Thread Eric Dumazet
On Wed, 2016-03-23 at 09:35 +0800, Haishuang Yan wrote: > vlan drivers lack proper propagation of gso_min_segs from lower device. > > Signed-off-by: Haishuang Yan > --- The plan was to get rid of gso_min_segs, as nothing uses it. Otherwise I would have included this in my recent patches... For

Re: net/sctp: stack-out-of-bounds in sctp_getsockopt

2016-03-22 Thread Eric Dumazet
On Tue, 2016-03-22 at 08:21 -0700, Eric Dumazet wrote: > On Tue, 2016-03-22 at 23:08 +0800, Baozeng Ding wrote: > > Hi all, > > > > The following program triggers an out-of-bounds bug in > > sctp_getsockopt. The kernel version is 4.5 (on Mar 16 > > commit 09fd671

Re: net/sctp: stack-out-of-bounds in sctp_getsockopt

2016-03-22 Thread Eric Dumazet
On Tue, 2016-03-22 at 23:08 +0800, Baozeng Ding wrote: > Hi all, > > The following program triggers an out-of-bounds bug in > sctp_getsockopt. The kernel version is 4.5 (on Mar 16 > commit 09fd671ccb2475436bd5f597f751ca4a7d177aea). > >

Re: [PATCH] KVM: fix spin_lock_init order on x86

2016-03-21 Thread Eric Dumazet
On Mon, 2016-03-21 at 10:24 +0100, Paolo Bonzini wrote: > kvm_arch_init_vm is now using mmu_lock, causing lockdep to > complain: ... > > Reported-by: Borislav Petkov > Signed-off-by: Paolo Bonzini > --- > virt/kvm/kvm_main.c | 20 ++-- > 1 file changed, 10 insertions(+), 10 dele

Re: [PATCH] lan78xx: Protect runtime_auto check by #ifdef CONFIG_PM

2016-03-20 Thread Eric Dumazet
On Sun, 2016-03-20 at 11:43 +0100, Geert Uytterhoeven wrote: > If CONFIG_PM=n: > > drivers/net/usb/lan78xx.c: In function ‘lan78xx_get_stats64’: > drivers/net/usb/lan78xx.c:3274: error: ‘struct dev_pm_info’ has no member > named ‘runtime_auto’ > > If PM is disabled, the runtime_auto flag

Re: [PATCH] af_unix: closed SOCK_SEQPACKET socketpair must get SIGPIPE

2016-03-15 Thread Eric Dumazet
On Tue, 2016-03-15 at 10:03 +0100, Alexander Potapenko wrote: > According to IEEE Std 1003.1, 2013, sending data to a SOCK_SEQPACKET > socketpair with MSG_NOSIGNAL flag set must result in a SIGPIPE if the > socket is no longer connected. I find this sentence slightly confusing ? If MSG_NOSIGNAL i

Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Eric Dumazet
On lun., 2016-02-29 at 11:13 -0800, Peter Hurley wrote: > On 02/29/2016 07:27 AM, Eric Dumazet wrote: > > On lun., 2016-02-29 at 07:03 -0800, Peter Hurley wrote: > > > >> The reason why Eric's change is so effective for Eric's workload is > >> that it f

Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Eric Dumazet
On lun., 2016-02-29 at 10:05 -0800, Peter Hurley wrote:
> While I appreciate the attempt, that's not the problem.
>
> Just to be clear
>
>     if (time_before(jiffies, end) && !need_resched() &&
>         --max_restart)
>             goto restart;
>
> aborts softir
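The restart test quoted in this message combines three conditions, and a small user-space model may make the interaction easier to follow. This is only a sketch: softirq_should_restart(), model_time_before() and the parameter names are hypothetical stand-ins for the kernel's jiffies, need_resched() and max_restart, not kernel code.

```c
#include <stdbool.h>

/* Wrap-safe "a is earlier than b" comparison, in the spirit of the
 * kernel's time_before(); the name is hypothetical. */
#define model_time_before(a, b) ((long)((a) - (b)) < 0)

/*
 * Model of the quoted condition (assumption: simplified, user space only).
 * now/end stand in for jiffies and the deadline, resched_pending for
 * need_resched(), and *max_restart for the remaining restart budget.
 * The budget is only decremented when the first two tests pass.
 */
static bool softirq_should_restart(unsigned long now, unsigned long end,
                                   bool resched_pending, int *max_restart)
{
    return model_time_before(now, end) && !resched_pending &&
           --(*max_restart) != 0;
}
```

In this model any one of the three tests failing ends the loop: the deadline passing, a reschedule being requested, or the restart budget reaching zero.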

Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Eric Dumazet
On lun., 2016-02-29 at 07:58 -0800, Peter Hurley wrote: > All that's happened is the first loop of NET_RX softirq has woken a > process; that is sufficient to abort softirq and defer it for ksoftirqd. > > That's why I'm saying this is a priority inversion, and one that > will happen a lot. Sure.

Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Eric Dumazet
On lun., 2016-02-29 at 07:54 -0800, Peter Hurley wrote:
> The current kernel is HZ=250 but this would occur on HZ=1000 as well.
Right. But the problem with HZ=100 and HZ=250 is that the detection can happen because jiffy granularity is too coarse, since msecs_to_jiffies(2) -> 1 Following pat
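The arithmetic behind "msecs_to_jiffies(2) -> 1" can be checked with a small user-space model. This is a sketch under stated assumptions, not the kernel function: model_msecs_to_jiffies() is a hypothetical name, and it only covers the simple case where HZ divides 1000 evenly and the result is rounded up.

```c
/*
 * Simplified model (assumption): round a millisecond count up to whole
 * ticks for a tick rate hz that divides 1000 evenly (100, 250, 1000).
 */
static unsigned long model_msecs_to_jiffies(unsigned int ms, unsigned int hz)
{
    unsigned int ms_per_tick = 1000u / hz;  /* 10 ms at HZ=100, 4 ms at HZ=250 */

    return (ms + ms_per_tick - 1) / ms_per_tick;
}
```

Under this model a 2 ms budget becomes 1 tick at HZ=100 and HZ=250 but 2 ticks at HZ=1000. One reading of the remark above is that a deadline of "current tick + 1" can then expire almost immediately if the loop happens to start just before a tick boundary.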

Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Eric Dumazet
On lun., 2016-02-29 at 07:03 -0800, Peter Hurley wrote: > The reason why Eric's change is so effective for Eric's workload is > that it fixes the problem where NET_RX keeps getting new network packets > so it keeps looping, servicing more NET_RX softirq. You have very little idea of what is happe

Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Eric Dumazet
On lun., 2016-02-29 at 07:03 -0800, Peter Hurley wrote: > Not the case. The softirq is raised from interrupt. > > Before Eric's change, when an interrupt raises a new softirq > while processing another softirq, the new softirq is immediately > processed *after the existing softirq completes*. >

Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-27 Thread Eric Dumazet
On sam., 2016-02-27 at 18:10 -0800, Peter Hurley wrote: > On 02/27/2016 05:59 PM, Eric Dumazet wrote: > > On sam., 2016-02-27 at 15:33 -0800, Peter Hurley wrote: > >> On 02/27/2016 03:04 PM, David Miller wrote: > >>> From: Peter Hurley > >>> Date: Sat, 2

Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-27 Thread Eric Dumazet
On sam., 2016-02-27 at 15:33 -0800, Peter Hurley wrote: > On 02/27/2016 03:04 PM, David Miller wrote: > > From: Peter Hurley > > Date: Sat, 27 Feb 2016 12:29:39 -0800 > > > >> Not really. softirq raised from interrupt context will always execute > >> on this cpu and not in ksoftirqd, unless load

Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-27 Thread Eric Dumazet
at > the extreme, and what it caught was a priority inversion stemming from > your commit: > >commit c10d73671ad30f54692f7f69f0e09e75d3a8926a >Author: Eric Dumazet >Date: Thu Jan 10 15:26:34 2013 -0800 > >softirq: reduce latencies > >In va

Re: [PATCH] ipv6: Annotate change of locking mechanism for np->opt

2016-02-18 Thread Eric Dumazet
s has no > effect at runtime. > > Signed-off-by: Benjamin Poirier > --- LGTM Acked-by: Eric Dumazet

Re: [PATCH 1/1] af_packet: Raw socket destruction warning fix

2016-02-10 Thread Eric Dumazet
On Wed, 2016-02-10 at 17:35 +0530, Maninder Singh wrote: > This Patch fixes below warning:- > WARNING: at net/packet/af_packet.c:xxx packet_sock_destruct > > There is following race between packet_rcv and packet_close > which keeps unfree packet in receive queue. > > CPU 1

Re: [PATCH] af_packet: Raw socket destruction warning fix

2016-02-10 Thread Eric Dumazet
On Wed, 2016-02-10 at 12:43 +, Vaneet Narang wrote: > Hi, > > >What driver are you using (is that in-tree)? Can you reproduce the same issue > >with a latest -net kernel, for example (or, a 'reasonably' recent one like > >4.3 or > >4.4)? There has been quite a bit of changes in err queue hand

Re: [PATCH net-next iproute2] iplink: display rx nohandler stats

2016-02-09 Thread Eric Dumazet
On Tue, 2016-02-09 at 17:41 -0800, Stephen Hemminger wrote: > On Tue, 9 Feb 2016 18:51:35 -0500 > Jarod Wilson wrote: > > > On Tue, Feb 09, 2016 at 11:17:57AM -0800, Stephen Hemminger wrote: > > > Support for the new rx_nohandler statistic. > > > This code is designed to handle the case where the

Re: [PATCH net v3 2/4] net: add rx_nohandler stat counter

2016-02-08 Thread Eric Dumazet
On Mon, 2016-02-08 at 11:38 -0800, Stephen Hemminger wrote: > The iproute2 command can be fixed, but adding dependency on size of response > gets gross fast. Imagine when 4 more fields get added, this doesn't scale > well. Really ? I see no problem at all doing the proper tests. > > Also, the

Re: [V4.4-rc6 Regression] af_unix: Revert 'lock_interruptible' in stream receive code

2016-02-07 Thread Eric Dumazet
On Sun, 2016-02-07 at 22:24 +, Rainer Weikusat wrote: > Rainer Weikusat writes: > > [...] > > > The start uses that to record an error which might need to be > > reported, the return statement uses it to indicate that an error has > > occurred. Hence, some kind of in-between translation must

Re: [PATCH net v3 2/4] net: add rx_nohandler stat counter

2016-02-07 Thread Eric Dumazet
On Sun, 2016-02-07 at 14:46 -0500, David Miller wrote: > > Why was this userspace ABI change allowed? > > The stats structure is exposed to user space via netlink > > and changing the size of responses will break iproute2 commands. I do not think it breaks anything. iproute2 always assumed kerne

Re: [V4.4-rc6 Regression] af_unix: Revert 'lock_interruptible' in stream receive code

2016-02-05 Thread Eric Dumazet
On Fri, 2016-02-05 at 21:44 +, Rainer Weikusat wrote: > The present unix_stream_read_generic contains various code sequences of > the form > > err = -EDISASTER; > if () > goto out; > > This has the unfortunate side effect of possibly causing the error code > to bleed through to the fina

Re: [PATCH] af_packet: Raw socket destruction warning fix

2016-02-05 Thread Eric Dumazet
On Mon, 2016-01-18 at 10:44 +0100, Daniel Borkmann wrote: > On 01/18/2016 07:37 AM, Maninder Singh wrote: > > Receieve queue is not purged when socket dectruction is called > > results in kernel warning because of non zero sk_rmem_alloc. > > > > WARNING: at net/packet/af_packet.c:1142 packet_sock_d

Re: net: memory leak in ip_cmsg_send

2016-02-04 Thread Eric Dumazet
On Thu, 2016-02-04 at 10:47 +0100, Dmitry Vyukov wrote: > Hello, > > I've hit the following memory leak while running syzkaller fuzzer: > > unreferenced object 0x88002ea39708 (size 64): > comm "syz-executor", pid 19887, jiffies 4295848369 (age 8.676s) > hex dump (first 32 bytes): > 00

Re: [PATCH v3] net:Add sysctl_max_skb_frags

2016-02-03 Thread Eric Dumazet
On Wed, 2016-02-03 at 10:24 -0800, Alexander Duyck wrote: > If this is only meant to be a performance modification and is only > really targeted at TCP TSO/GRO then all I ask is that we use a name > like tcp_max_gso_frags and relocate the sysctl to the TCP section. > Otherwise if we are actually g

Re: [PATCH v3] net:Add sysctl_max_skb_frags

2016-02-03 Thread Eric Dumazet
On Wed, 2016-02-03 at 09:43 -0800, Alexander Duyck wrote: > Read the history. I still say it is best if we don't accept a partial > solution. If we are going to introduce the sysctl as a core item it > should function as a core item and not as something that belongs to > TCP only. But this pat

Re: [PATCH v3] net:Add sysctl_max_skb_frags

2016-02-03 Thread Eric Dumazet
On Wed, 2016-02-03 at 07:58 -0800, Alexander Duyck wrote: > > +++ b/net/core/sysctl_net_core.c > > I really don't think these changes belong in the core. Below you only > modify the TCP code path so this more likely belongs in the TCP path > unless you are going to guarantee that all other code pa

Re: [PATCH v3] net:Add sysctl_max_skb_frags

2016-02-03 Thread Eric Dumazet
mplement the linearisation anyway because > of virtualisation. Sure. We use a similar patch here at Google, since bnx2x has in some cases a limit of 13 frags per skb. This driver calls linearize which can fail under memory fragmentation. TCP usually retransmits, so only effect of failures is extra latencies. I am actually okay with this patch. Acked-by: Eric Dumazet

Re: [PATCH] Optimize int_sqrt for small values for faster idle

2016-02-02 Thread Eric Dumazet
On Tue, 2016-02-02 at 21:46 +0100, Rasmus Villemoes wrote: > On Tue, Feb 02 2016, Eric Dumazet wrote: > > > On Tue, 2016-02-02 at 00:08 +0100, Rasmus Villemoes wrote: > > > >> Thanks. (Is there a good way to tell gcc that avg*avg is actually a > >> 32x32->

Re: [PATCH] Optimize int_sqrt for small values for faster idle

2016-02-01 Thread Eric Dumazet
On Tue, 2016-02-02 at 00:08 +0100, Rasmus Villemoes wrote: > Thanks. (Is there a good way to tell gcc that avg*avg is actually a > 32x32->64 multiplication?) If avg is 32bit, compiler does that for you. u32 avg = ... u64 result = (u64)avg * avg;
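The point in the reply above can be tried in user space. The following is a minimal sketch, not code from the thread: it uses the standard `uint32_t`/`uint64_t` types in place of the kernel's `u32`/`u64`, and the helper names `square_wide` and `square_narrow` are hypothetical. Casting one operand to 64 bits before the multiply makes the whole product 64-bit, and compilers will typically emit a single widening multiply for it where the target has one.

```c
#include <stdint.h>

/* Hypothetical helper: widen one operand first, so the
 * multiplication itself is carried out in 64 bits. */
static uint64_t square_wide(uint32_t avg)
{
    return (uint64_t)avg * avg;
}

/* Hypothetical counter-example: the product is computed in
 * 32 bits and wraps modulo 2^32 before it is widened. */
static uint64_t square_narrow(uint32_t avg)
{
    uint32_t product = avg * avg;
    return product;
}
```

For `avg = 100000` the wide version returns 10000000000, while the narrow version returns 1410065408, which is 10000000000 modulo 2^32.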

Re: [PATCH 4/4] sched,time: only call account_{user,sys,guest,idle}_time once a jiffy

2016-02-01 Thread Eric Dumazet
On Mon, 2016-02-01 at 14:21 -0500, r...@redhat.com wrote: > From: Rik van Riel > > #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN > +static bool vtime_jiffies_changed(struct task_struct *tsk, unsigned long now) > +{ > + if (tsk->vtime_jiffies == jiffies) > + return false; > + > + tsk

Re: [PATCH V3] netfilter: h323: avoid potential attack

2016-02-01 Thread Eric Dumazet
t_h2x5_addr. > > > > Signed-off-by: Zhouyi Zhou > > Signed-off-by: Eric Dumazet > > Reviewed-by: Sergei Shtylyov > > > > --- > > net/netfilter/nf_conntrack_h323_main.c | 13 + > > 1 file changed, 13 insertions(+) > > > >

Re: [PATCH net v2 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64

2016-01-30 Thread Eric Dumazet
k_stats64 without also extending net_device_stats. Relax > the BUILD_BUG_ON to only require that rtnl_link_stats64 is larger, and > zero out all the stat counters that aren't present in net_device_stats. > > CC: Eric Dumazet > CC: net...@vger.kernel.org > Signed-off-by: Jarod
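The change described in this snippet (accept a larger 64-bit structure, copy the counters that exist in the smaller one, and zero the rest) can be sketched in user space. This is only an illustration under stated assumptions: `small_stats`, `big_stats64`, their fields, and `stats_to_stats64` are hypothetical stand-ins, not the kernel's `net_device_stats`, `rtnl_link_stats64`, or `netdev_stats_to_stats64()`, and a C11 `_Static_assert` plays the role of the relaxed `BUILD_BUG_ON`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the two stats structures. */
struct small_stats {
    unsigned long rx_packets;
    unsigned long tx_packets;
};

struct big_stats64 {
    uint64_t rx_packets;
    uint64_t tx_packets;
    uint64_t rx_extra;      /* counter with no counterpart in small_stats */
};

/* Relaxed check: the 64-bit struct may have more counters, never fewer. */
_Static_assert(sizeof(struct small_stats) / sizeof(unsigned long) <=
               sizeof(struct big_stats64) / sizeof(uint64_t),
               "64-bit stats must hold every legacy counter");

static void stats_to_stats64(struct big_stats64 *dst,
                             const struct small_stats *src)
{
    const size_t n = sizeof(*src) / sizeof(unsigned long);
    size_t i;

    /* Copy the shared counters one by one, widening each to 64 bits. */
    for (i = 0; i < n; i++) {
        unsigned long v;
        uint64_t wide;

        memcpy(&v, (const char *)src + i * sizeof(v), sizeof(v));
        wide = v;
        memcpy((char *)dst + i * sizeof(wide), &wide, sizeof(wide));
    }

    /* Zero every counter that exists only in the 64-bit struct. */
    memset((char *)dst + n * sizeof(uint64_t), 0,
           sizeof(*dst) - n * sizeof(uint64_t));
}
```

Because the tail is cleared explicitly, the caller does not need to zero the destination beforehand, which matches the later remark in the rx_unhandled thread that an extra `memset()` of the whole structure is unnecessary once every field is initialized.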

Re: [PATCH 3/4] netfilter: ipv4: use preferred kernel types

2016-01-30 Thread Eric Dumazet
On Sat, 2016-01-30 at 12:05 -0200, Lucas Tanure wrote: > On Sat, Jan 30, 2016 at 11:45 AM, Patrick McHardy wrote: > > On 30.01, Lucas Tanure wrote: > >> As suggested by checkpatch.pl: > >> CHECK: Prefer kernel type 'uX' over 'uintX_t' > > > > You might have noticed we have literally hundreds of th

Re: [PATCH] Optimize int_sqrt for small values for faster idle

2016-01-28 Thread Eric Dumazet
On Thu, 2016-01-28 at 13:42 -0800, Andi Kleen wrote: > From: Andi Kleen > > The menu cpuidle governor does at least two int_sqrt() each time > we go into idle in get_typical_interval to compute stddev > > int_sqrts take 100-120 cycles each. Short idle latency is important > for many workloads. >

Re: [PATCH net 0/4] net: add rx_unhandled stat counter

2016-01-28 Thread Eric Dumazet
On Thu, 2016-01-28 at 06:42 -0800, Eric Dumazet wrote: > > Sure, you also can set stats64->rx_unhandled to 0 here, just to be 100% > safe. And not add the memset(stats64, 0, sizeof(*stats64)), since we have the guarantee to properly init whole stats64 structure.

Re: [PATCH net 0/4] net: add rx_unhandled stat counter

2016-01-28 Thread Eric Dumazet
On Thu, 2016-01-28 at 06:44 -0800, Eric Dumazet wrote: > On Thu, 2016-01-28 at 06:42 -0800, Eric Dumazet wrote: > > > > > Sure, you also can set stats64->rx_unhandled to 0 here, just to be 100% > > safe. > > And not add the memset(stats64, 0, sizeof(*stats64)),

Re: [PATCH net 0/4] net: add rx_unhandled stat counter

2016-01-28 Thread Eric Dumazet
On Thu, 2016-01-28 at 09:38 -0500, Jarod Wilson wrote: > Something like this then: > > diff --git a/net/core/dev.c b/net/core/dev.c > index 82334c6..2ca3eab 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -7262,15 +7262,16 @@ void netdev_run_todo(void) > void netdev_stats_to_stats64(st

Re: Re: [PATCH V2] netfilter: h323: avoid potential attack

2016-01-28 Thread Eric Dumazet
On Thu, 2016-01-28 at 06:00 -0800, Eric Dumazet wrote: > On Thu, 2016-01-28 at 21:14 +0800, Zhouyi Zhou wrote: > > > My patch is intended to prevent kernel panic, to prevent reading garbage > > or read data from a prior frame and leak secrets, the prototypes of the > > get

Re: Re: [PATCH V2] netfilter: h323: avoid potential attack

2016-01-28 Thread Eric Dumazet
On Thu, 2016-01-28 at 21:14 +0800, Zhouyi Zhou wrote: > My patch is intended to prevent kernel panic, to prevent reading garbage > or read data from a prior frame and leak secrets, the prototypes of the > get_h2x5_addr functions and the functions that call get_h2x5_addr should > be changed, should

Re: [PATCH net 0/4] net: add rx_unhandled stat counter

2016-01-28 Thread Eric Dumazet
On Thu, 2016-01-28 at 01:02 -0500, Jarod Wilson wrote: > Outside of that, does this approach look sane? Should I > bother with touching /proc/net/dev output or not? Please do not touch /proc/net/dev. This is legacy stuff and really should not be touched anymore.
