Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-07 Thread Theodore Ts'o
On Wed, Oct 07, 2015 at 08:32:16AM +0100, Linus Torvalds wrote: > And none of *those* requirements change just because "copied" would be > zero. If you avoid zeroing the buffers and marking them dirty, nothing > will ever initialize them on disk, andn if the prefault then later > fails during retry

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-07 Thread Linus Torvalds
On Wed, Oct 7, 2015 at 4:34 AM, Theodore Ts'o wrote: > > So your patch looks good, but in addition to that, if copied is > 0 > and less than len, we shouldn't be calling page_zero_new_buffers(). > We're going to need our own version of it that doesn't call > mark_buffer_dirty(). Well, even with c

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-06 Thread Theodore Ts'o
On Mon, Oct 05, 2015 at 11:04:35AM -0700, Dave Hansen wrote: > > The warning comes out of ext4_walk_page_buffers() and the dirty state > comes from page_zero_new_buffers(). That seems a _bit_ goofy that the > filesystem is marking the page dirty and then so shortly warning about it. Yes, this is

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-06 Thread Ingo Molnar
* Linus Torvalds wrote: > On Tue, Oct 6, 2015 at 10:27 AM, Ingo Molnar wrote: > > > >> > >> We really should try get rid of _all_ uses of the "__" versions unless > >> they are > >> very locally and obviously checked with access_ok(). We've had way too many > >> cases where people thought they

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-06 Thread Linus Torvalds
On Tue, Oct 6, 2015 at 10:27 AM, Ingo Molnar wrote: > >> >> We really should try get rid of _all_ uses of the "__" versions unless they >> are >> very locally and obviously checked with access_ok(). We've had way too many >> cases where people thought they were clever, and weren't really. > > Tha

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-06 Thread Ingo Molnar
* Linus Torvalds wrote: > On Tue, Oct 6, 2015 at 8:56 AM, Ingo Molnar wrote: > > > > Yes, but note that those interfaces are x86 only at the moment, so they'd > > have > > to be factored out and generalized before we can use it in generic code. > > ARM64 these days (as a part of ARM8.1) has

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-06 Thread Linus Torvalds
On Tue, Oct 6, 2015 at 8:56 AM, Ingo Molnar wrote: > > Yes, but note that those interfaces are x86 only at the moment, so they'd > have to > be factored out and generalized before we can use it in generic code. ARM64 these days (as a part of ARM8.1) has "Privileged Access Never", which is their

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-06 Thread Linus Torvalds
On Tue, Oct 6, 2015 at 12:33 AM, Dave Hansen wrote: > > Did you mean that as a cleanup, or to fix the regression? Purely as a cleanup to try to avoid the (already existing) special case in at least ext4 - and possibly others. So that patch was meant just for discussion - it's not really fixing a

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-06 Thread Ingo Molnar
* H. Peter Anvin wrote: > On October 5, 2015 1:22:44 PM PDT, Linus Torvalds > wrote: > > > > We could probably do them outside the loop, rather than tightly around the > > actual move instructions. Peter (hpa), is there some sane interface to try > > to > > do that? > > There are... in par

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-05 Thread Dave Hansen
> On Mon, Oct 5, 2015 at 10:18 PM, Linus Torvalds > wrote: >> >> Your ext4 patch may well fix the issue, and be the right thing to do >> (_regardless_ of the revert, in fact - while it might make the revert >> unnecessary, it might also be a good idea even if we do revert). > > Thinking a bit mor

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-05 Thread Linus Torvalds
On Mon, Oct 5, 2015 at 10:18 PM, Linus Torvalds wrote: > > Your ext4 patch may well fix the issue, and be the right thing to do > (_regardless_ of the revert, in fact - while it might make the revert > unnecessary, it might also be a good idea even if we do revert). Thinking a bit more about your

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-05 Thread Linus Torvalds
On Mon, Oct 5, 2015 at 9:48 PM, Dave Hansen wrote: > > Although I was probably wrong about the source of the overhead, the > point still remains that the prefaulting is eating cycles for no > practical benefit. Yeah, no, I'm not disagreeing with that part, I'm just more of a "at this point in the

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-05 Thread H. Peter Anvin
On October 5, 2015 1:22:44 PM PDT, Linus Torvalds wrote: >We could probably do them outside the loop, rather than tightly around >the actual move instructions. Peter (hpa), is there some sane >interface to try to do that? There are... in particular the get_user_try/catch mechanism, used among ot

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-05 Thread Dave Hansen
On 10/05/2015 01:22 PM, Linus Torvalds wrote: > On Mon, Oct 5, 2015 at 5:23 PM, Dave Hansen > wrote: >> One thing I've been noticing on Skylake is that barriers (implicit and >> explicit) are showing up more in profiles. > > Ahh, you're on skylake? Yup. > It's entirely possible that the issue

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-05 Thread Linus Torvalds
On Mon, Oct 5, 2015 at 5:23 PM, Dave Hansen wrote: > > One thing I've been noticing on Skylake is that barriers (implicit and > explicit) are showing up more in profiles. Ahh, you're on skylake? It's entirely possible that the issue is that the whole "stac/mov/clac" is much more expensive becaus

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-05 Thread Dave Hansen
I managed to catch the condition in an ftrace. Full spew is below. We can see that the iov_iter_copy_from_user_atomic() "failed" and ended up with a copied=0 which we can see in the ext4_journalled_write_end() tracepoint as "copied 0". So we're in this code with copied=0 and len=4096: > static

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-05 Thread Dave Hansen
On 10/05/2015 08:58 AM, Linus Torvalds wrote: ... > Dave, mind sharing the micro-benchmark or perhaps even just a kernel > profile of it? How is that "iov_iter_fault_in_readable()" so > noticeable? It really shouldn't be a big deal. The micro was just plugging this test: https://www.sr71.

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-05 Thread Dave Hansen
On 10/05/2015 08:22 AM, Theodore Ts'o wrote: ... > I've bisected it down to commit 998ef75ddb: "fs: do not prefault > sys_write() user buffer pages". I've confirmed that 4.3-rc2 fails as > detailed below, but with 998ef75ddb reverted, the problem goes away. ... > Before commit 998ef75ddb, if we ne

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-05 Thread Linus Torvalds
On Mon, Oct 5, 2015 at 4:22 PM, Theodore Ts'o wrote: > > What I think is going on is that when we do attempt the copy, we end > up marking the page dirty before we notice that we need to page fault > in the page, which ends up triggering the warning that jbd2 > buffer_head that is supposed to be j

[REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-05 Thread Theodore Ts'o
I've been tracking down a test failure in xfstests generic/208 in the data_journal configuration, which I've been testing using: gce-xfstests -c data_journal -C 20 generic/208 I've bisected it down to commit 998ef75ddb: "fs: do not prefault sys_write() user buffer pages". I've confirmed