Re: slab corruption with current -git

2016-10-13 Thread Al Viro
On Thu, Oct 13, 2016 at 12:49:33PM -0700, Linus Torvalds wrote: > That said, xt_hook_ops_alloc() itself is odd. Lookie here, this is the > loop that initializes things: > > for (i = 0, hooknum = 0; i < num_hooks && hook_mask != 0; > hook_mask >>= 1, ++hooknum) { > > and it
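
[Illustration] The quoted loop header advances two counters at once: hooknum tracks the bit position in hook_mask, while i tracks how many slots have been filled. Elsewhere in this thread the count handed to registration is hweight32(table->valid_hooks), i.e. the number of set bits. The snippet is cut off before the loop body, so the following is only a userspace sketch that assumes the body fills one slot per set bit; struct hook_slot, popcount32() and fill_slots() are hypothetical names, not kernel code.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for one hook ops slot. */
struct hook_slot {
    unsigned int hooknum;
};

/* Count set bits; a userspace stand-in for the kernel's hweight32(). */
static unsigned int popcount32(uint32_t v)
{
    unsigned int n = 0;

    for (; v != 0; v &= v - 1)  /* clear the lowest set bit each pass */
        n++;
    return n;
}

/*
 * Fill one slot per set bit in hook_mask (assumed loop body).
 * Stops when either the slots or the mask bits run out, and
 * returns the number of slots actually filled.
 */
static unsigned int fill_slots(struct hook_slot *slots,
                               unsigned int num_slots, uint32_t hook_mask)
{
    unsigned int i, hooknum;

    for (i = 0, hooknum = 0; i < num_slots && hook_mask != 0;
         hook_mask >>= 1, ++hooknum) {
        if (!(hook_mask & 1))
            continue;           /* bit clear: no slot for this hooknum */
        slots[i].hooknum = hooknum;
        ++i;
    }
    return i;
}
```

If the slot count equals popcount32(mask), every slot is filled exactly once; a smaller count leaves the higher set bits without a slot.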

Re: slab corruption with current -git

2016-10-13 Thread Florian Westphal
Linus Torvalds wrote: > On Wed, Oct 12, 2016 at 11:27 PM, Markus Trippelsdorf > wrote: > > > > Yeah. > > > > 105 entry->orig_ops = reg; > > 106 entry->ops = *reg; > > 107 entry->next = NULL; > > So ipt_register_table() does: > > ret =

Re: slab corruption with current -git

2016-10-13 Thread Linus Torvalds
On Wed, Oct 12, 2016 at 11:27 PM, Markus Trippelsdorf wrote: > > Yeah. > > 105 entry->orig_ops = reg; > 106 entry->ops = *reg; > 107 entry->next = NULL; So ipt_register_table() does: ret = nf_register_net_hooks(net, ops, hweight32(table->valid_hooks));

Re: slab corruption with current -git

2016-10-13 Thread Markus Trippelsdorf
On 2016.10.12 at 23:18 -0700, Linus Torvalds wrote: > On Oct 12, 2016 23:07, "Markus Trippelsdorf" wrote: > > > > This is nf_register_net_hook at net/netfilter/core.c:106 > > The "*regs" access? Yeah. 105 entry->orig_ops = reg; 106 entry->ops = *reg; 107 entry->next

Re: slab corruption with current -git

2016-10-13 Thread Markus Trippelsdorf
On 2016.10.13 at 08:02 +0200, Markus Trippelsdorf wrote: > On 2016.10.11 at 04:57 -0400, David Miller wrote: > > From: Linus Torvalds > > Date: Mon, 10 Oct 2016 22:47:50 -0700 > > > > > On Mon, Oct 10, 2016 at 10:39 PM, Linus Torvalds > > > wrote: > > >> > > >> I guess I will have to

Re: slab corruption with current -git

2016-10-13 Thread Markus Trippelsdorf
On 2016.10.11 at 04:57 -0400, David Miller wrote: > From: Linus Torvalds > Date: Mon, 10 Oct 2016 22:47:50 -0700 > > > On Mon, Oct 10, 2016 at 10:39 PM, Linus Torvalds > > wrote: > >> > >> I guess I will have to double-check that the slub corruption is gone > >> still with that fixed. > > > >

Re: slab corruption with current -git

2016-10-11 Thread Aaron Conole
Michal Kubecek writes: > On Mon, Oct 10, 2016 at 04:24:01AM -0400, David Miller wrote: >> From: David Miller >> Date: Sun, 09 Oct 2016 23:57:45 -0400 (EDT) >> >> This means that the netns is possibly getting freed up before we >> unregister the netfilter hooks. > > Sounds a bit like the issue

Re: slab corruption with current -git

2016-10-11 Thread Michal Kubecek
On Mon, Oct 10, 2016 at 04:24:01AM -0400, David Miller wrote: > From: David Miller > Date: Sun, 09 Oct 2016 23:57:45 -0400 (EDT) > > This means that the netns is possibly getting freed up before we > unregister the netfilter hooks. Sounds a bit like the issue discussed here:

Re: slab corruption with current -git

2016-10-11 Thread David Miller
From: Linus Torvalds Date: Mon, 10 Oct 2016 22:47:50 -0700 > On Mon, Oct 10, 2016 at 10:39 PM, Linus Torvalds > wrote: >> >> I guess I will have to double-check that the slub corruption is gone >> still with that fixed. > > So I'm not getting any warnings now from SLUB debugging. So the >

Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-10 Thread Linus Torvalds
On Mon, Oct 10, 2016 at 10:39 PM, Linus Torvalds wrote: > > I guess I will have to double-check that the slub corruption is gone > still with that fixed. So I'm not getting any warnings now from SLUB debugging. So the original bug seems to not have re-surfaced, and the registration bug is gone,

Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-10 Thread Linus Torvalds
On Sun, Oct 9, 2016 at 8:41 PM, Linus Torvalds wrote: > This COMPLETELY UNTESTED patch tries to fix the nf_hook_entry code to do this. > > I repeat: it's ENTIRELY UNTESTED. Gaah. That patch was subtle garbage. The "add to list" thing did this: rcu_assign_pointer(entry->next, p);

Re: slab corruption with current -git

2016-10-10 Thread Linus Torvalds
On Mon, Oct 10, 2016 at 5:30 PM, David Miller wrote: > > Linus can you add some extra info to that: Sure. I made it a WARN_ON_ONCE(), but then always just printed the pf/hooknum. It's all over the map: reg->pf=2 and reg->hooknum=4 reg->pf=2 and reg->hooknum=2 reg->pf=2 and reg->hooknum=3

Re: slab corruption with current -git

2016-10-10 Thread David Miller
From: Linus Torvalds Date: Mon, 10 Oct 2016 12:05:17 -0700 > David - I think that also explains what was wrong with the old code. > In the old code, this loop: > > while (hooks_entry && nf_entry_dereference(hooks_entry->next)) { > > would exit with "hooks_entry" pointing to the last

Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-10 Thread Aaron Conole
Linus Torvalds writes: > On Mon, Oct 10, 2016 at 9:28 AM, Linus Torvalds > wrote: >> >> So as I already answered to Dave, I'm not actually sure that this was >> the buggy code, or that my patch would make any difference at all. > > My patch does seem to fix things, and in fact the warning about

Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-10 Thread Linus Torvalds
On Mon, Oct 10, 2016 at 9:28 AM, Linus Torvalds wrote: > > So as I already answered to Dave, I'm not actually sure that this was > the buggy code, or that my patch would make any difference at all. My patch does seem to fix things, and in fact the warning about "hook not found" now triggers. So

Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-10 Thread Linus Torvalds
On Mon, Oct 10, 2016 at 6:49 AM, Aaron Conole wrote: > > Okay, I'm looking it over. Sorry for the mess. So as I already answered to Dave, I'm not actually sure that this was the buggy code, or that my patch would make any difference at all. I never got a good reproducer for the bug: I spent

Re: slab corruption with current -git

2016-10-10 Thread Linus Torvalds
On Mon, Oct 10, 2016 at 1:24 AM, David Miller wrote: > > So I've been reviewing this patch and it looks fine, but I also want > to figure out what is actually causing the OOPS and I can't spot it > yet. Yeah, I'm not actually sure the old linked list implementation is buggy - it might just be

Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-10 Thread Aaron Conole
Linus Torvalds writes: > On Sun, Oct 9, 2016 at 7:49 PM, Linus Torvalds > wrote: >> >> There is one *correct* way to remove an entry from a singly linked >> list, and it looks like this: >> >> struct entry **pp, *p; >> >> pp = >> while ((p = *pp) != NULL) { >> if

Re: slab corruption with current -git

2016-10-10 Thread David Miller
From: David Miller Date: Sun, 09 Oct 2016 23:57:45 -0400 (EDT) > From: Linus Torvalds > Date: Sun, 9 Oct 2016 20:41:17 -0700 > >> Note that the "correct way" of doing list operations also almost >> inevitably is the shortest way by far, since it gets rid of all the >> special cases. So the

Re: slab corruption with current -git

2016-10-09 Thread David Miller
From: Linus Torvalds Date: Sun, 9 Oct 2016 20:41:17 -0700 > Note that the "correct way" of doing list operations also almost > inevitably is the shortest way by far, since it gets rid of all the > special cases. So the patch looks nice. It gets rid of the magic > "nf_set_hooks_head()" thing too,

Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-09 Thread Linus Torvalds
On Sun, Oct 9, 2016 at 7:49 PM, Linus Torvalds wrote: > > There is one *correct* way to remove an entry from a singly linked > list, and it looks like this: > > struct entry **pp, *p; > > pp = > while ((p = *pp) != NULL) { > if (right_entry(p)) { > *pp = p->next;
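
[Illustration] The removal idiom quoted above walks the list with a pointer to the link rather than a pointer to the node, so unlinking the head and unlinking a middle entry are the same single store. A minimal userspace sketch of that idea follows. It assumes a plain list with no locking or RCU; struct entry's key field and the remove_entry() helper are hypothetical names chosen for illustration, not the netfilter code under discussion.

```c
#include <stddef.h>

/* Hypothetical node type, for illustration only. */
struct entry {
    int key;
    struct entry *next;
};

/*
 * Unlink and return the first entry whose key matches, or NULL if
 * there is none. 'pp' always points at the link leading to the
 * current entry: first the list head, afterwards some entry's
 * 'next' field. Removal is one store through 'pp', so the head
 * needs no special case.
 */
static struct entry *remove_entry(struct entry **head, int key)
{
    struct entry **pp = head;
    struct entry *p;

    while ((p = *pp) != NULL) {
        if (p->key == key) {
            *pp = p->next;      /* bypass the matching entry */
            return p;
        }
        pp = &p->next;          /* advance to the next link */
    }
    return NULL;
}
```

A caller would pass the address of its head pointer, e.g. remove_entry(&list_head, 2); a concurrent (RCU-protected) list would additionally need the publish and grace-period handling that this sketch leaves out.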

Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-09 Thread Linus Torvalds
On Sun, Oct 9, 2016 at 6:35 PM, Aaron Conole wrote: > > I was just about to build and test something similar: So I haven't actually tested that one, but looking at the code, it really looks very bogus. In fact, that code just looks like crap. It does *not* do a proper "remove singly linked list

Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-09 Thread Aaron Conole
Florian Westphal writes: > Linus Torvalds wrote: >> On Sun, Oct 9, 2016 at 12:11 PM, Linus Torvalds >> wrote: >> > >> > Anyway, I don't think I can bisect it, but I'll try to narrow it down >> > a *bit* at least. >> > >> > Not doing any more pulls on this unstable base, I've been puttering >>

Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-09 Thread Florian Westphal
Linus Torvalds wrote: > On Sun, Oct 9, 2016 at 12:11 PM, Linus Torvalds > wrote: > > > > Anyway, I don't think I can bisect it, but I'll try to narrow it down > > a *bit* at least. > > > > Not doing any more pulls on this unstable base, I've been puttering > > around in trying to clean up some

slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-09 Thread Linus Torvalds
On Sun, Oct 9, 2016 at 12:11 PM, Linus Torvalds wrote: > > Anyway, I don't think I can bisect it, but I'll try to narrow it down > a *bit* at least. > > Not doing any more pulls on this unstable base, I've been puttering > around in trying to clean up some stupid printk logging issues > instead.