the block/fs drivers they already do
> > things normal daemons do not to meet that guarantee like mlock their
> > memory, disable oom killer, and preallocate resources they have control
> > over. They have no control over reclaim like the kernel drivers do so
> > it's easy for us to deadlock when memory gets low.
>
> OK, fair enough. How much control do they really need, though? Is a
> single PF_IO_FLUSHER as explained above (essentially implying GFP_NOIO
> context) sufficient?
I think some of these userspace processes work at the filesystem
level and so really only need GFP_NOFS allocation (fuse), while
others work at the block device level (iscsi, nbd) so need GFP_NOIO
allocation. So there's definitely an argument for providing both...
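For illustration: a block-level daemon could mark itself with the
PR_SET_IO_FLUSHER prctl that was eventually merged for exactly this
case (this thread predates it, so the sketch below assumes that later
interface):

	#include <sys/prctl.h>
	#include <stdio.h>

	#ifndef PR_SET_IO_FLUSHER
	#define PR_SET_IO_FLUSHER 57	/* from the later uapi */
	#endif

	int main(void)
	{
		/*
		 * Tell the kernel this process is part of the block/fs
		 * IO path so its allocations/reclaim behave like
		 * GFP_NOIO rather than normal userspace allocations.
		 */
		if (prctl(PR_SET_IO_FLUSHER, 1, 0, 0, 0) < 0) {
			perror("prctl(PR_SET_IO_FLUSHER)");
			return 1;
		}

		/* ... run the nbd/iscsi/fuse daemon loop here ... */
		return 0;
	}

Filesystem-level daemons like fuse servers would still want the weaker
GFP_NOFS-like behaviour, which is the "providing both" argument above.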
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
in include/linux/capability.h...
Otherwise looks fine to me.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
4946074ae1fb9d8f05d] writeback: Generalize and expose wb_completion
$
Not obvious to me what is wrong with that commit right now, but the
bisect is solid. Kinda surprised to see such significant
fs-writeback changes in 5.4, though, because there was nothing sent
to the -fsdevel list for review in the last dev cycle.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
hey act together years down
the track we can remove the workaround from XFS. Users don't care
how we fix the problem, they just want it fixed. If that means we
have to route around dysfunctional developer groups, then we'll just
have to do that
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, Aug 29, 2019 at 09:56:13AM +0200, Vlastimil Babka wrote:
> On 8/29/19 12:24 AM, Dave Chinner wrote:
> > On Wed, Aug 28, 2019 at 12:46:08PM -0700, Matthew Wilcox wrote:
> >> On Wed, Aug 28, 2019 at 06:45:07PM +, Christopher Lameter wrote:
> >>> I stil
ed (e.g. another set of
heap slabs like the -rcl slabs), I just don't want every high level
subsystem that allocates heap memory for IO buffers to have to
implement their own aligned slab caches.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, Aug 22, 2019 at 01:03:12AM -0700, Christoph Hellwig wrote:
> On Thu, Aug 22, 2019 at 10:37:45AM +1000, Dave Chinner wrote:
> > > I know Jens disagree, but with the amount of bugs we've been hitting
> > > thanks to slub (and I'm pretty sure we have a more
On Thu, Aug 22, 2019 at 10:50:02AM +0800, Ming Lei wrote:
> On Thu, Aug 22, 2019 at 8:06 AM Christoph Hellwig wrote:
> >
> > On Wed, Aug 21, 2019 at 06:38:20PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner
> > >
> > > Add memory buffer alignm
On Wed, Aug 21, 2019 at 04:29:45PM -0700, Christoph Hellwig wrote:
> On Wed, Aug 21, 2019 at 06:38:20PM +1000, Dave Chinner wrote:
> > From: Dave Chinner
> >
> > Add memory buffer alignment validation checks to bios built in XFS
> > to catch bugs that will result i
On Tue, Aug 20, 2019 at 10:08:38PM +, Verma, Vishal L wrote:
> On Wed, 2019-08-21 at 07:44 +1000, Dave Chinner wrote:
> >
> > However, the case here is that:
> >
> > > > > > i.e. page offset len sector
> > > >
On Tue, Aug 20, 2019 at 05:24:25PM +0800, Ming Lei wrote:
> On Tue, Aug 20, 2019 at 04:13:26PM +0800, Ming Lei wrote:
> > On Tue, Aug 20, 2019 at 07:53:20AM +0200, h...@lst.de wrote:
> > > On Tue, Aug 20, 2019 at 02:41:35PM +1000, Dave Chinner wrote:
> > > > >
On Tue, Aug 20, 2019 at 07:53:20AM +0200, h...@lst.de wrote:
> On Tue, Aug 20, 2019 at 02:41:35PM +1000, Dave Chinner wrote:
> > > With the following debug patch. Based on that I think I'll just
> > > formally submit the vmalloc switch as we're at -rc5, and then we
On Tue, Aug 13, 2019 at 01:46:25PM -0400, Johannes Weiner wrote:
> On Sat, Aug 10, 2019 at 08:12:48AM +1000, Dave Chinner wrote:
> > On Thu, Aug 08, 2019 at 03:03:00PM -0400, Johannes Weiner wrote:
> > > psi tracks the time tasks wait for refaulting pages to become
> > >
Any thoughts on how we might be able to integrate more of the system
caches into the PSI infrastructure, Johannes?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
goto error;
> > }
> > dio->size += bio_size;
> > + bio_put(bio);
> >
> > Thoughts ?
> >
>
> That does not work since the reference to dio->size in blkdev_bio_end_io()
> depends on atomic_dec_and_test(&dio->ref) which counts the BIO fragments
> for the dio (+1 for async multi-bio case). So completion of the last bio
> can still reference the old value of dio->size.
Didn't we fix this same use-after-free in iomap_dio_rw() in commit
4ea899ead278 ("iomap: fix a use after free in iomap_dio_rw")?
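As a generic illustration of the pattern such fixes use (not the actual
blkdev or iomap code): snapshot whatever the submitter still needs
before the submission that can drop the last reference, because the
completion side may free the dio the moment atomic_dec_and_test() fires.

	/* hypothetical submission path fragment */
	loff_t submitted = dio->size + bio_size;	/* local snapshot */

	dio->size += bio_size;
	submit_bio(bio);	/* dio may be freed once this completes */
	/* from here on, use 'submitted', never dio->size */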
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Jul 23, 2019 at 04:19:31PM -0600, Jens Axboe wrote:
> On 7/23/19 4:05 PM, Dave Chinner wrote:
> > On Tue, Jul 23, 2019 at 09:20:05AM -0600, Jens Axboe wrote:
> >> On 7/23/19 2:07 AM, Stefan Hajnoczi wrote:
> >>> Hi,
> >>> io_uring O_DIRECT writes
On Tue, Jul 23, 2019 at 01:34:50PM -0600, Jens Axboe wrote:
> On 7/23/19 1:04 PM, Johannes Weiner wrote:
> > CCing Jens for bio layer stuff
> >
> > On Tue, Jul 23, 2019 at 10:02:26AM +1000, Dave Chinner wrote:
> >> Even better: If this memstall and "refault"
full, then
we simply use the existing bdi_congested() interface to check.
That works for all types of block devices - not just random mq
devices - and matches code we have all over the kernel to avoid
blocking async IO submission on congested request queues...
So, yeah, I think REQ_NOWAIT needs to die and the direct IO callers
should do just congestion checks on IOCB_NOWAIT/IOMAP_NOWAIT rather
than try to add new error reporting mechanisms into bios that lots
of code will need to be changed to support
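As a rough sketch of what that would look like at the direct IO
submission point, using the congestion API of that era purely for
illustration (bdi_write_congested() has since been removed from the
kernel):

	if ((iocb->ki_flags & IOCB_NOWAIT) &&
	    bdi_write_congested(inode_to_bdi(inode)))
		return -EAGAIN;	/* punt back to the caller, no bio built */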
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
and so this does not require
magic pixie dust at the page cache iteration level
e.g. bio_add_page_memstall() can do the working set check and then
set a flag on the bio to say it contains a memstall page. Then on
submission of the bio the memstall condition can be cleared.
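A minimal sketch of that idea; bio_add_page_memstall() and the
BIO_MEMSTALL flag are hypothetical, the rest is existing kernel API:

	static int bio_add_page_memstall(struct bio *bio, struct page *page,
					 unsigned int len, unsigned int off)
	{
		/* mark the bio if this page is part of the workingset */
		if (PageWorkingset(page))
			bio_set_flag(bio, BIO_MEMSTALL);	/* hypothetical */
		return bio_add_page(bio, page, len, off);
	}

Submission would then wrap the IO in psi_memstall_enter()/
psi_memstall_leave() when the flag is set and clear it afterwards.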
Cheers,
-Dave.
--
Dave Chinner
da...@fromorbit.com
the fallocate flags to have /completely/ different
behaviour on block devices to filesystems.
We excel at screwing up APIs, don't we?
I give up, we've piled the shit too high on this one to dig it out
now
-Dave.
--
Dave Chinner
da...@fromorbit.com
Does that description make sense?
>
> The problem is that most vendors implement (3) using (1). But can't make
> it work well because (3) was -- and still is for ATA -- outside the
> scope of what the protocols can express.
>
> And I agree with you that if (3) was implemented correctly in all
> devices, we wouldn't need (1) at all. At least not for devices with an
> internal granularity << total capacity.
What I'm saying is that we should be pushing standards to ensure (3)
is correctly standardised, certified and implemented because that is
what the "Linux OS" requires from future hardware.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
does not return zeroes on
subsequent reads? i.e. it is effectively
fallocate(FALLOC_FL_NO_HIDE_STALE) preallocation semantics?
For many use cases we actually want zeroed space to be
guaranteed so we don't expose stale data from previous device use
into the new user's visibility - can that be done with WRITE_SAME
and the ANCHOR flag?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, May 07, 2019 at 08:07:53PM -0400, Ric Wheeler wrote:
> On 5/7/19 6:04 PM, Dave Chinner wrote:
> > On Mon, May 06, 2019 at 04:56:44PM -0400, Ric Wheeler wrote:
> > > (repost without the html spam, sorry!)
> > >
> > > Last week at LSF/MM, I suggested we
we just deprecate blkdev_issue_discard and all the
interfaces that lead to it as a first step?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Apr 26, 2019 at 11:20:45AM -0400, Jerome Glisse wrote:
> On Fri, Apr 26, 2019 at 04:28:16PM +1000, Dave Chinner wrote:
> > On Thu, Apr 25, 2019 at 09:38:14PM -0400, Jerome Glisse wrote:
> > > I see that there are still empty spots in the LSF/MM schedule so I would like to
>
that complains if the resulting configuration violates whichever
> alignment and other assumptions we end up baking into this.
Makes sense to me. If we can ensure that the alignment requirements
for the stack are communicated in the existing geometry info
correctly (i.e. io_min, io_opt) then we've pretty much already got
everything we need in place. We might need a few mkfs tweaks to
ensure the log is placed correctly, log write padding is set
appropriately, and check that inode clusters are appropriately
aligned (I think they are already) but otherwise I think we will be
good here...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
er layers of the storage being less than perfect.
IOWs, the filesystem doesn't expect hard "always correct" guarantees
from the storage layers - we always have to assume IO failures will
occur because they do, even with T10 PI. Hence it makes no sense
for an automatic retry-and-recovery infrastructure for filesystems
to require hard guarantees that the block device will always return
good data. Automatic repair doesn't guarantee the storage is free
from errors - it just provides a mechanism to detect errors and
perform optimistic, best effort recovery at the lowest possible
layer in the stack as early as possible.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
that
happens here is we give up the capability for automatic block device
recovery and repair of damaged copies, which we can't do right now,
so it's essentially status quo...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
reached the end of the file as
directed?
So perhaps the caller should be waiting on a specific range to bound
the wait (e.g. isize as the end of the wait) rather than using the
default "keep going until the end of file is reached" semantics?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, Feb 28, 2019 at 04:28:53PM -0700, Andreas Dilger wrote:
> On Feb 28, 2019, at 7:22 AM, Bob Liu wrote:
> >
> > On 2/19/19 5:31 AM, Dave Chinner wrote:
> >> On Wed, Feb 13, 2019 at 05:50:35PM +0800, Bob Liu wrote:
> >>> Motivation:
> >>> When
On Sun, Mar 03, 2019 at 10:37:59AM +0800, Bob Liu wrote:
> On 3/1/19 5:49 AM, Dave Chinner wrote:
> > On Thu, Feb 28, 2019 at 10:22:02PM +0800, Bob Liu wrote:
> >> On 2/19/19 5:31 AM, Dave Chinner wrote:
> >>> On Wed, Feb 13, 2019 at 05:50:35PM +0800, Bob Liu wrote:
On Thu, Feb 28, 2019 at 10:22:02PM +0800, Bob Liu wrote:
> On 2/19/19 5:31 AM, Dave Chinner wrote:
> > On Wed, Feb 13, 2019 at 05:50:35PM +0800, Bob Liu wrote:
> >> Motivation:
> >> When fs data/metadata checksum mismatch, lower block devices may have other
>
many, many years - the same issue that occurs with loop
devices when you try to mount a 512 byte sector image on a hard 4k
sector host filesystem/storage device using direct IO in the loop
device. This isn't a new thing at all - if you want to use direct IO
to manipulate filesystem images, you actually need to know what you
are doing
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Feb 27, 2019 at 09:50:55AM +0800, Ming Lei wrote:
> On Wed, Feb 27, 2019 at 07:45:50AM +1100, Dave Chinner wrote:
> > On Tue, Feb 26, 2019 at 05:33:04PM +0800, Ming Lei wrote:
> > > On Tue, Feb 26, 2019 at 03:58:26PM +1100, Dave Chinner wrote:
> > > > On M
On Tue, Feb 26, 2019 at 05:33:04PM +0800, Ming Lei wrote:
> On Tue, Feb 26, 2019 at 03:58:26PM +1100, Dave Chinner wrote:
> > On Mon, Feb 25, 2019 at 07:27:37PM -0800, Matthew Wilcox wrote:
> > > On Tue, Feb 26, 2019 at 02:02:14PM +1100, Dave Chinner wrote:
> > > >
On Mon, Feb 25, 2019 at 07:27:37PM -0800, Matthew Wilcox wrote:
> On Tue, Feb 26, 2019 at 02:02:14PM +1100, Dave Chinner wrote:
> > > Or what is the exact size of sub-page IO in xfs most of the time? For
> >
> > Determined by mkfs parameters. Any power of 2 between 512 bytes
On Tue, Feb 26, 2019 at 10:22:50AM +0800, Ming Lei wrote:
> On Tue, Feb 26, 2019 at 07:26:30AM +1100, Dave Chinner wrote:
> > On Mon, Feb 25, 2019 at 02:15:59PM +0100, Vlastimil Babka wrote:
> > > On 2/25/19 5:36 AM, Dave Chinner wrote:
> > > > On Mon, Feb 25, 2019
On Mon, Feb 25, 2019 at 02:15:59PM +0100, Vlastimil Babka wrote:
> On 2/25/19 5:36 AM, Dave Chinner wrote:
> > On Mon, Feb 25, 2019 at 12:09:04PM +0800, Ming Lei wrote:
> >> XFS uses kmalloc() to allocate sector sized IO buffer.
> >
> >> Use page_frag_all
On Mon, Feb 25, 2019 at 04:46:25PM +0800, Ming Lei wrote:
> On Mon, Feb 25, 2019 at 03:36:48PM +1100, Dave Chinner wrote:
> > On Mon, Feb 25, 2019 at 12:09:04PM +0800, Ming Lei wrote:
> > > XFS uses kmalloc() to allocate sector sized IO buffer.
> >
> > > Use
Every single metadata allocation is a sub-page
allocation and so will use this new page frag mechanism. IOWs, it
will result in fragmenting memory severely and typical memory
reclaim not being able to fix it because the metadata that pins each
page is largely unreclaimable...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
4k, and
> disk size is
> 1024,
XFS won't do that, either - it checks at mount time if it can read
the very last sector of the filesystem via uncached IO (see
xfs_check_sizes() and xfs_rtmount_init()). If any of the EOD reads
fail, it won't mount.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ent if you write
> or discard at the smaller granularity.
Filesystems discard extents these days, not individual blocks. If
you free a 1MB file, then you are likely to get a 1MB discard. Or if
you use fstrim, then it's free space extent sizes (on XFS can be
hundreds of GBs) and small fre
On Mon, Feb 18, 2019 at 06:15:34PM -0800, Jane Chu wrote:
> On 2/15/2019 9:39 PM, Dave Chinner wrote:
>
> >On Sat, Feb 16, 2019 at 04:31:33PM +1100, Dave Chinner wrote:
> >>On Fri, Feb 15, 2019 at 10:57:12AM +0100, Johannes Thumshirn wrote:
> >>>(This is a jo
On Mon, Feb 18, 2019 at 06:55:20PM -0800, Darrick J. Wong wrote:
> On Tue, Feb 19, 2019 at 08:31:50AM +1100, Dave Chinner wrote:
> > On Wed, Feb 13, 2019 at 05:50:35PM +0800, Bob Liu wrote:
> > > Motivation:
> > > When fs data/metadata checksum mismatch, lower block de
her layer
requirements. The only difference from a caller's point of view should
be submit_bio(bio); vs submit_bio_verify(bio, verifier_cb_func);
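To make the shape of that hypothetical interface explicit (none of
these names exist in the kernel; they just mirror the suggestion
above):

	typedef int (*bio_verifier_t)(struct bio *bio);
	void submit_bio_verify(struct bio *bio, bio_verifier_t verify);

	/* caller side: identical to today apart from the extra argument */
	submit_bio_verify(bio, xfs_buf_verify_cb);	/* hypothetical verifier */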
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Sun, Feb 17, 2019 at 06:42:59PM -0500, Ric Wheeler wrote:
> On 2/17/19 4:09 PM, Dave Chinner wrote:
> >On Sun, Feb 17, 2019 at 03:36:10PM -0500, Ric Wheeler wrote:
> >>One proposal for btrfs was that we should look at getting discard
> >>out of the synchronous pa
cost of the various discard
> commands - how painful is it for modern SSD's?
AIUI, it still depends on the SSD implementation, unfortunately.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Sat, Feb 16, 2019 at 09:05:31AM -0800, Dan Williams wrote:
> On Fri, Feb 15, 2019 at 9:40 PM Dave Chinner wrote:
> >
> > On Sat, Feb 16, 2019 at 04:31:33PM +1100, Dave Chinner wrote:
> > > On Fri, Feb 15, 2019 at 10:57:12AM +0100, Johannes Thumshirn wrote:
> >
On Sat, Feb 16, 2019 at 04:31:33PM +1100, Dave Chinner wrote:
> On Fri, Feb 15, 2019 at 10:57:12AM +0100, Johannes Thumshirn wrote:
> > (This is a joint proposal with Hannes Reinecke)
> >
> > Servers with NV-DIMM are slowly emerging in data centers but one key feature
> >
all the metadata goes to the software
raided pmem block devices that aren't DAX capable.
Problem already solved, yes?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Feb 05, 2019 at 12:50:48PM +0100, Jan Kara wrote:
> On Wed 30-01-19 19:24:39, Kanchan Joshi wrote:
> >
> > On Wednesday 30 January 2019 05:43 AM, Dave Chinner wrote:
> > > On Tue, Jan 29, 2019 at 11:07:02AM +0100, Jan Kara wrote:
> > > > On Mon
> > > > >
> > > > >
> > > > >
> > > > > Changes since v1:
> > > > >
> > > > > - introduce four more hints for in-kernel us
On Wed, Jan 16, 2019 at 03:21:21PM -0700, Jens Axboe wrote:
> On 1/16/19 3:09 PM, Dave Chinner wrote:
> > On Wed, Jan 16, 2019 at 02:20:53PM -0700, Jens Axboe wrote:
> >> On 1/16/19 1:53 PM, Dave Chinner wrote:
> >> I'd be fine with that restriction, especially si
On Wed, Jan 16, 2019 at 02:20:53PM -0700, Jens Axboe wrote:
> On 1/16/19 1:53 PM, Dave Chinner wrote:
> > On Wed, Jan 16, 2019 at 10:50:00AM -0700, Jens Axboe wrote:
> >> If we have fixed user buffers, we can map them into the kernel when we
> >> setup the io_context.
DAX because the above problems are
actually a use-after-free of storage space, not just a dangling
page reference that can be cleaned up after the gup pin is dropped.
Perhaps, at least until we solve the GUP problems w.r.t. file backed
pages and/or add and require file layout leases for these references,
we should error out if the user buffer pages are file-backed
mappings?
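A rough sketch of the restriction being suggested, assuming the
registration path has already pinned the buffer with get_user_pages();
the exact predicate would need more care (shmem, the zero page, etc.):

	for (i = 0; i < nr_pages; i++) {
		if (!PageAnon(pages[i])) {
			/* file-backed page: refuse to register it */
			ret = -EOPNOTSUPP;
			goto err_release_pages;
		}
	}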
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
e writes are on stable
storage. They *may* be on stable storage if the timing is right, but
it is not guaranteed by the OS code. Likewise, flush 2 only
guarantees writes 1, 3 and 5 are on stable storage because they are
the only writes that have been signalled as complete when flush 2
was submitted.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Nov 27, 2018 at 11:37:22PM -0800, Christoph Hellwig wrote:
> On Wed, Nov 28, 2018 at 04:33:03PM +1100, Dave Chinner wrote:
> > One thing that is going to make this more complex at the XFS layer
> > is discontiguous buffers. They require multiple IOs (and therefore
> >
On Tue, Nov 27, 2018 at 09:49:23PM -0800, Darrick J. Wong wrote:
> On Wed, Nov 28, 2018 at 04:33:03PM +1100, Dave Chinner wrote:
> > On Tue, Nov 27, 2018 at 08:49:44PM -0700, Allison Henderson wrote:
> > > Motivation:
> > > When fs data/metadata checksum mismatch, l
On Tue, Nov 27, 2018 at 09:26:04PM -0800, Darrick J. Wong wrote:
> On Wed, Nov 28, 2018 at 04:17:19PM +1100, Dave Chinner wrote:
> > On Tue, Nov 27, 2018 at 08:49:50PM -0700, Allison Henderson wrote:
> > > If we had to try more than one mirror to get a successful
> > > r
On Tue, Nov 27, 2018 at 09:22:45PM -0800, Darrick J. Wong wrote:
> On Wed, Nov 28, 2018 at 04:08:50PM +1100, Dave Chinner wrote:
> > On Tue, Nov 27, 2018 at 08:49:49PM -0700, Allison Henderson wrote:
> > > Check to see if the _xfs_buf_read fails. If so loop over the
> >
RAID5/6 to trigger verification/recovery from the parity
information in the stripe?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
		continue;
	case 0:
		/* good copy, rewrite it to repair bad copy */
		xfs_bwrite(bp);
		/* fallthrough */
	default:
		return bp;
	}
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
case -EIO:
> +	case -EFSCORRUPTED:
> +	case -EFSBADCRC:
> +		/* loop again */
> +		continue;
> +	default:
> +		goto retry_done;
Just return bp here, don't need a jump label for it.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
will miss setting bp->b_rw_hint for IO that completes before
submission returns to __xfs_buf_submit() (i.e. b_io_remaining is 2
at IO completion).
So I suspect it won't do the right thing on fast or synchronous
block devices like pmem. You should be able to test this with a RAID1
made from two ramdisks...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
other direct
IO paths, too?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
data IO submission, so this value is going to change as the
IO progresses. What does making these partial IOs visible provide,
especially as they then get overwritten by the next submissions?
Indeed, how does one wait on all IOs in the DIO to complete if we
are only tracking one of many?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, Nov 15, 2018 at 02:24:19PM -0800, Darrick J. Wong wrote:
> On Fri, Nov 16, 2018 at 09:13:37AM +1100, Dave Chinner wrote:
> > On Thu, Nov 15, 2018 at 11:10:36AM +0800, Ming Lei wrote:
> > > On Thu, Nov 15, 2018 at 12:22:01PM +1100, Dave Chinner wrote:
> > > >
On Thu, Nov 15, 2018 at 11:10:36AM +0800, Ming Lei wrote:
> On Thu, Nov 15, 2018 at 12:22:01PM +1100, Dave Chinner wrote:
> > On Thu, Nov 15, 2018 at 09:06:52AM +0800, Ming Lei wrote:
> > > On Wed, Nov 14, 2018 at 08:18:24AM -0700, Jens Axboe wrote:
> > > > On 11/13/
On Thu, Nov 15, 2018 at 09:06:52AM +0800, Ming Lei wrote:
> On Wed, Nov 14, 2018 at 08:18:24AM -0700, Jens Axboe wrote:
> > On 11/13/18 2:43 PM, Dave Chinner wrote:
> > > From: Dave Chinner
> > >
> > > A discard cleanup merged into 4.20-rc2 causes fstests x
On Wed, Nov 14, 2018 at 10:53:11AM +0800, Ming Lei wrote:
> On Wed, Nov 14, 2018 at 5:44 AM Dave Chinner wrote:
> >
> > From: Dave Chinner
> >
> > A discard cleanup merged into 4.20-rc2 causes fstests xfs/259 to
> > fall into an endless loop in the discard code.
't make head or
tail of them because I haven't received the rest of the patches.
i.e. if you are going to send the patch 0 to a mailing list, the
entire patchset should also be sent to that mailing list.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
From: Dave Chinner
A discard cleanup merged into 4.20-rc2 causes fstests xfs/259 to
fall into an endless loop in the discard code. The test is creating
a device that is exactly 2^32 sectors in size to test mkfs boundary
conditions around the 32 bit sector overflow region.
mkfs issues a discard
page_frag_alloc() directly, seems not necessary to
> introduce this change in block layer any more given 512-aligned buffer
> should be fine everywhere.
>
> The only benefit of making it a block helper is that the offset or size
> can be checked with q->dma_alignment.
>
> Dave/Jens, do you think which way is better? Put allocation as block
> helper or fs uses page_frag_alloc() directly for allocating 512*N-byte
> buffer (total size is less than PAGE_SIZE)?
Christoph has already said he's looking at using page_frag_alloc()
directly in XFS
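For reference, a sketch of what sub-page buffer allocation through that
interface looks like ('frag_cache' is an assumed, suitably initialised
struct page_frag_cache, e.g. per-cpu):

	void *buf = page_frag_alloc(&frag_cache, size, GFP_NOFS);
	if (!buf)
		return -ENOMEM;
	/* ... build and submit the IO using 'buf' ... */
	page_frag_free(buf);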
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, May 22, 2018 at 10:24:54AM +0200, Christoph Hellwig wrote:
> On Tue, May 22, 2018 at 10:07:45AM +1000, Dave Chinner wrote:
> > > Something doesn't smell right here. The only pages we need to read in
> > > are the first and last pages in the write_begin range, and
the write_begin range, and only if they
> aren't page aligned and the underlying extent is IOMAP_MAPPED, right?
And not beyond EOF, too.
The bufferhead code handles this via the buffer_new() flag - it
triggers the skipping of read IO and the states in which it is
set are clearly indicated in iom
ap?
>
> Oh, I assumed iomap would work for filesystems with block size greater
> than PAGE_SIZE.
It will eventually, but first we've got to remove the iomap
infrastructure and filesystem dependencies on bufferheads
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
unlock_page(ctx.cur_page);
> + put_page(ctx.cur_page);
> + }
> + WARN_ON_ONCE(ret && !list_empty(ctx.pages));
And this warning will never trigger. Was this intended behaviour?
If it is, it needs a comment, because it looks wrong
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
&wb->work_list))
> - mod_delayed_work(bdi_wq, &wb->dwork, 0);
> + wb_wakeup(wb);
> else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
> wb_wakeup_delayed(wb);
Yup, looks fine - I can't see any more of these op
From: Dave Chinner
To prepare for iomap infrastructure based DSYNC optimisations.
While moving the code around, move the XFS write bytes metric
update for direct IO into xfs_dio_write_end_io callback so that we
always capture the amount of data written via AIO+DIO. This fixes
the problem
From: Dave Chinner
Currently iomap_dio_rw() only handles (data)sync write completions
for AIO. This means we can't optimise non-AIO IO to minimise device
flushes as we can't tell the caller whether a flush is required or
not.
To solve this problem and enable further optimisat
From: Dave Chinner
If we are doing direct IO writes with datasync semantics, we often
have to flush metadata changes along with the data write. However,
if we are overwriting existing data, there are no metadata changes
that we need to flush. In this case, optimising the IO by using
FUA write
Hi folks,
Version 3 of the FUA for O_DSYNC patchset. This version fixes bugs
found in the previous version. Functionality is otherwise the same
as described in the first version:
https://marc.info/?l=linux-xfs&m=152213446528167&w=2
Version 3:
- fixed O_SYNC behaviour as noticed by Jan Kara
- fi
On Sat, Apr 21, 2018 at 03:03:09PM +0200, Jan Kara wrote:
> On Wed 18-04-18 14:08:26, Dave Chinner wrote:
> > From: Dave Chinner
> >
> > Currently iomap_dio_rw() only handles (data)sync write completions
> > for AIO. This means we can't optimised non-AIO IO to m
> Oops, good catch. I think the above if should just be
>
> if ((iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC)) == IOCB_DSYNC) {
>
> and we are fine.
Ah, not exactly. IOMAP_DIO_NEED_SYNC needs to be set for either
DSYNC or SYNC writes, while IOMAP_DIO_WRITE_FUA should only be set
for DSYNC.
I'll fix this up appropriately.
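For the record, a sketch of the flag setup as described above (flag
names as used in this patchset):

	/*
	 * NEED_SYNC covers both O_SYNC and O_DSYNC writes; FUA is only
	 * a candidate for the pure-data O_DSYNC case.
	 */
	if (iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC))
		dio->flags |= IOMAP_DIO_NEED_SYNC;
	if ((iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC)) == IOCB_DSYNC)
		dio->flags |= IOMAP_DIO_WRITE_FUA;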
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Apr 30, 2018 at 05:00:14PM -0600, Jens Axboe wrote:
> On 4/30/18 4:40 PM, Jens Axboe wrote:
> > On 4/30/18 4:28 PM, Dave Chinner wrote:
> >> Yes, it does, but so would having the block layer throttle device
> >> discard requests in flight to a queue de
On Mon, Apr 30, 2018 at 03:42:11PM -0600, Jens Axboe wrote:
> On 4/30/18 3:31 PM, Dave Chinner wrote:
> > On Mon, Apr 30, 2018 at 09:32:52AM -0600, Jens Axboe wrote:
> >> XFS recently added support for async discards. While this can be
> >> a win for some workloads
a hack to work around the symptoms being seen...
More details of the regression and the root cause analysis are
needed, please.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
share transport. Does anyone have a wiki we can use to coordinate this?
Arriving 4.15pm Sunday, so if you want to wait around for a bit I'm
happy to share...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
the schedule for all the
ext4/btrfs/XFS/NFS/CIFS devs to get together with each other and
talk about things of interest only to their own filesystems.
That means we all don't have to find time outside the schedule to do
this, and I think this would be time very well spent for most f
't have to keep moving rooms every half hour...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
Hi folks,
This is the latest version of the "use FUA for DIO writes" patchset
that was last posted here:
https://marc.info/?l=linux-xfs&m=152213446528167&w=2
Functionality and performance is the same as the previous version.
Changes in this version address Christoph's review comments.
Version 2
From: Dave Chinner
Currently iomap_dio_rw() only handles (data)sync write completions
for AIO. This means we can't optimise non-AIO IO to minimise device
flushes as we can't tell the caller whether a flush is required or
not.
To solve this problem and enable further optimisat
From: Dave Chinner
So we can check FUA support status from the iomap direct IO code.
Signed-off-by: Dave Chinner
---
include/linux/blkdev.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9af3e0f430bc..c362aadfe036 100644
--- a/include
From: Dave Chinner
To prepare for iomap infrastructure based DSYNC optimisations.
While moving the code around, move the XFS write bytes metric
update for direct IO into xfs_dio_write_end_io callback so that we
always capture the amount of data written via AIO+DIO. This fixes
the problem
From: Dave Chinner
If we are doing direct IO writes with datasync semantics, we often
have to flush metadata changes along with the data write. However,
if we are overwriting existing data, there are no metadata changes
that we need to flush. In this case, optimising the IO by using
FUA write
From: Dave Chinner
Currently iomap_dio_rw() only handles (data)sync write completions
for AIO. This means we can't optimised non-AIO IO to minimise device
flushes as we can't tell the caller whether a flush is required or
not.
To solve this problem and enable further optimisat
From: Dave Chinner
To prepare for iomap infrastructure based DSYNC optimisations.
While moving the code around, move the XFS write bytes metric
update for direct IO into xfs_dio_write_end_io callback so that we
always capture the amount of data written via AIO+DIO. This fixes
the problem
From: Dave Chinner
If we are doing direct IO writes with datasync semantics, we often
have to flush metadata changes along with the data write. However,
if we are overwriting existing data, there are no metadata changes
that we need to flush. In this case, optimising the IO by using
FUA write
Hi folks,
This is a followup on my original patch to enable use of FUA writes
for pure data O_DSYNC writes through the XFS and iomap based direct
IO paths. This version has all of the changes Christoph asked for,
and splits it up into simpler patches. The performance improvements
are detailed in t
On Fri, Jan 19, 2018 at 02:23:16AM -0200, Raphael Carvalho wrote:
> On Fri, Jan 19, 2018 at 1:57 AM, Dave Chinner wrote:
> >
> > On Thu, Jan 18, 2018 at 06:57:41PM -0600, Goldwyn Rodrigues wrote:
> > > From: Goldwyn Rodrigues
> > >
> > > Since we can