Re: [PATCH] Add prctl support for controlling PF_MEMALLOC V2

2019-10-22 Thread Dave Chinner
the block/fs drivers they already do > > things normal daemons do not to meet that guarantee, like mlock their > > memory, disable the oom killer, and preallocate resources they have control > > over. They have no control over reclaim like the kernel drivers do, so > > it's easy for us to deadlock when memory gets low. > > OK, fair enough. How much control do they really need, though? Is a > single PF_IO_FLUSHER as explained above (essentially implying GFP_NOIO > context) sufficient? I think some of these userspace processes work at the filesystem level and so really only need GFP_NOFS allocation (fuse), while others work at the block device level (iscsi, nbd) so need GFP_NOIO allocation. So there's definitely an argument for providing both... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
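For reference, this thread's proposal was eventually merged (as the PR_SET_IO_FLUSHER prctl, Linux 5.6); a minimal usage sketch for a userspace storage daemon, which requires CAP_SYS_RESOURCE:

#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_SET_IO_FLUSHER
#define PR_SET_IO_FLUSHER 57	/* added in Linux 5.6 */
#endif

int main(void)
{
	/*
	 * Mark this daemon as part of the block/fs IO path: memory
	 * allocations made on its behalf get GFP_NOIO semantics, so
	 * reclaim cannot recurse into the storage stack it services.
	 */
	if (prctl(PR_SET_IO_FLUSHER, 1, 0, 0, 0) < 0) {
		perror("prctl(PR_SET_IO_FLUSHER)");
		return 1;
	}

	/* ... run the nbd/iscsi/fuse service loop ... */
	return 0;
}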

Re: [PATCH] Add prctl support for controlling PF_MEMALLOC V2

2019-10-21 Thread Dave Chinner
in include/linux/capability.h... Otherwise looks fine to me. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

[5.4-rc1, regression] wb_workfn wakeup oops (was Re: frequent 5.4-rc1 crash?)

2019-10-02 Thread Dave Chinner
4946074ae1fb9d8f05d] writeback: Generalize and expose wb_completion Not obvious to me what is wrong with that commit right now, but the bisect is solid. Kinda surprised to see such significant fs-writeback changes in 5.4, though, because there was nothing sent to the -fsdevel list for review in the last dev cycle. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)

2019-09-24 Thread Dave Chinner
they act together years down the track, we can remove the workaround from XFS. Users don't care how we fix the problem, they just want it fixed. If that means we have to route around dysfunctional developer groups, then we'll just have to do that... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)

2019-08-29 Thread Dave Chinner
On Thu, Aug 29, 2019 at 09:56:13AM +0200, Vlastimil Babka wrote: > On 8/29/19 12:24 AM, Dave Chinner wrote: > > On Wed, Aug 28, 2019 at 12:46:08PM -0700, Matthew Wilcox wrote: > >> On Wed, Aug 28, 2019 at 06:45:07PM +, Christopher Lameter wrote: > >>> I stil

Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)

2019-08-28 Thread Dave Chinner
ed (e.g. another set of heap slabs like the -rcl slabs), I just don't want every high level subsystem that allocates heap memory for IO buffers to have to implement their own aligned slab caches. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 3/3] xfs: alignment check bio buffers

2019-08-22 Thread Dave Chinner
On Thu, Aug 22, 2019 at 01:03:12AM -0700, Christoph Hellwig wrote: > On Thu, Aug 22, 2019 at 10:37:45AM +1000, Dave Chinner wrote: > > > I know Jens disagree, but with the amount of bugs we've been hitting > > > thangs to slub (and I'm pretty sure we have a more

Re: [PATCH 3/3] xfs: alignment check bio buffers

2019-08-21 Thread Dave Chinner
On Thu, Aug 22, 2019 at 10:50:02AM +0800, Ming Lei wrote: > On Thu, Aug 22, 2019 at 8:06 AM Christoph Hellwig wrote: > > > > On Wed, Aug 21, 2019 at 06:38:20PM +1000, Dave Chinner wrote: > > > From: Dave Chinner > > > > > > Add memory buffer alignm

Re: [PATCH 3/3] xfs: alignment check bio buffers

2019-08-21 Thread Dave Chinner
On Wed, Aug 21, 2019 at 04:29:45PM -0700, Christoph Hellwig wrote: > On Wed, Aug 21, 2019 at 06:38:20PM +1000, Dave Chinner wrote: > > From: Dave Chinner > > > > Add memory buffer alignment validation checks to bios built in XFS > > to catch bugs that will result i

Re: 5.3-rc1 regression with XFS log recovery

2019-08-20 Thread Dave Chinner
On Tue, Aug 20, 2019 at 10:08:38PM +0000, Verma, Vishal L wrote: > On Wed, 2019-08-21 at 07:44 +1000, Dave Chinner wrote: > > > However, the case here is that: > > > > i.e. page offset len sector

Re: 5.3-rc1 regression with XFS log recovery

2019-08-20 Thread Dave Chinner
On Tue, Aug 20, 2019 at 05:24:25PM +0800, Ming Lei wrote: > On Tue, Aug 20, 2019 at 04:13:26PM +0800, Ming Lei wrote: > > On Tue, Aug 20, 2019 at 07:53:20AM +0200, h...@lst.de wrote: > > > On Tue, Aug 20, 2019 at 02:41:35PM +1000, Dave Chinner wrote:

Re: 5.3-rc1 regression with XFS log recovery

2019-08-20 Thread Dave Chinner
On Tue, Aug 20, 2019 at 07:53:20AM +0200, h...@lst.de wrote: > On Tue, Aug 20, 2019 at 02:41:35PM +1000, Dave Chinner wrote: > > > With the following debug patch. Based on that I think I'll just > > > formally submit the vmalloc switch as we're at -rc5, and then we

Re: [PATCH RESEND] block: annotate refault stalls from IO submission

2019-08-13 Thread Dave Chinner
On Tue, Aug 13, 2019 at 01:46:25PM -0400, Johannes Weiner wrote: > On Sat, Aug 10, 2019 at 08:12:48AM +1000, Dave Chinner wrote: > > On Thu, Aug 08, 2019 at 03:03:00PM -0400, Johannes Weiner wrote: > > > psi tracks the time tasks wait for refaulting pages to become > > >

Re: [PATCH RESEND] block: annotate refault stalls from IO submission

2019-08-09 Thread Dave Chinner
Any thoughts of how we might be able to integrate more of the system caches into the PSI infrastructure, Johannes? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: Block device direct read EIO handling broken?

2019-08-05 Thread Dave Chinner
> > 		goto error;
> > 	}
> > 	dio->size += bio_size;
> > +	bio_put(bio);
> >
> > Thoughts?
>
> That does not work since the reference to dio->size in blkdev_bio_end_io()
> depends on atomic_dec_and_test(&dio->ref), which counts the BIO fragments
> for the dio (+1 for the async multi-bio case). So completion of the last
> bio can still reference the old value of dio->size.

Didn't we fix this same use-after-free in iomap_dio_rw() in commit 4ea899ead278 ("iomap: fix a use after free in iomap_dio_rw")? Cheers, Dave. -- Dave Chinner da...@fromorbit.com
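The pattern used by commit 4ea899ead278 generalises: copy everything needed out of the shared dio structure before the submission that can drop the last reference. A sketch with invented names (struct my_dio stands in for the real structure):

struct my_dio {
	atomic_t	ref;
	bool		wait_for_completion;
	loff_t		size;
};

static ssize_t my_dio_submit_last(struct my_dio *dio, struct bio *bio)
{
	/*
	 * Sample the fields first: once this bio is submitted, a racing
	 * completion can drop the last reference and free dio.
	 */
	bool wait = dio->wait_for_completion;

	submit_bio(bio);

	if (!wait)
		return -EIOCBQUEUED;	/* must not touch dio from here */

	/* the synchronous path still holds its own reference... */
	return 0;
}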

Re: EIO with io_uring O_DIRECT writes on ext4

2019-07-23 Thread Dave Chinner
On Tue, Jul 23, 2019 at 04:19:31PM -0600, Jens Axboe wrote: > On 7/23/19 4:05 PM, Dave Chinner wrote: > > On Tue, Jul 23, 2019 at 09:20:05AM -0600, Jens Axboe wrote: > >> On 7/23/19 2:07 AM, Stefan Hajnoczi wrote: > >>> Hi, > >>> io_uring O_DIRECT writes

Re: [PATCH] psi: annotate refault stalls from IO submission

2019-07-23 Thread Dave Chinner
On Tue, Jul 23, 2019 at 01:34:50PM -0600, Jens Axboe wrote: > On 7/23/19 1:04 PM, Johannes Weiner wrote: > > CCing Jens for bio layer stuff > > > > On Tue, Jul 23, 2019 at 10:02:26AM +1000, Dave Chinner wrote: > >> Even better: If this memstall and "refault"...

Re: EIO with io_uring O_DIRECT writes on ext4

2019-07-23 Thread Dave Chinner
full, then we simply use the existing bdi_congested() interface to check. That works for all types of block devices - not just random mq devices - and matches code we have all over the kernel to avoid blocking async IO submission on congested request queues... So, yeah, I think REQ_NOWAIT needs to die and the direct IO callers should just do congestion checks on IOCB_NOWAIT/IOMAP_NOWAIT rather than try to add new error reporting mechanisms into bios that lots of code will need to be changed to support... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
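A sketch of the congestion check being argued for, using the bdi interface of this era (since removed upstream); the surrounding function is illustrative:

static int my_dio_nowait_check(struct kiocb *iocb, struct inode *inode)
{
	if (iocb->ki_flags & IOCB_NOWAIT) {
		/* Don't block async submission on a congested queue. */
		if (bdi_write_congested(inode_to_bdi(inode)))
			return -EAGAIN;
	}
	return 0;
}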

Re: [PATCH] psi: annotate refault stalls from IO submission

2019-07-22 Thread Dave Chinner
and so this does not require magic pixie dust at the page cache iteration level e.g. bio_add_page_memstall() can do the working set check and then set a flag on the bio to say it contains a memstall page. Then on submission of the bio the memstall condition can be cleared. Cheers, -Dave. -- Dave Chinner da...@fromorbit.com
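bio_add_page_memstall() does not exist; a sketch of what is being proposed here, with the BIO_MEMSTALL flag invented for illustration:

static int bio_add_page_memstall(struct bio *bio, struct page *page,
				 unsigned int len, unsigned int off)
{
	/* Refaulting workingset page: mark the bio exactly once. */
	if (PageWorkingset(page))
		bio_set_flag(bio, BIO_MEMSTALL);	/* hypothetical flag */

	return bio_add_page(bio, page, len, off);
}

/* ...then submission, not page cache iteration, accounts the stall: */
static void my_submit(struct bio *bio)
{
	unsigned long pflags;
	bool stall = bio_flagged(bio, BIO_MEMSTALL);

	if (stall)
		psi_memstall_enter(&pflags);
	submit_bio(bio);
	if (stall)
		psi_memstall_leave(&pflags);
}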

Re: Testing devices for discard support properly

2019-05-08 Thread Dave Chinner
the fallocate flags to have /completely/ different behaviour on block devices to filesystems. We excel at screwing up APIs, don't we? I give up, we've piled the shit too high on this one to dig it out now -Dave. -- Dave Chinner da...@fromorbit.com

Re: Testing devices for discard support properly

2019-05-08 Thread Dave Chinner
Does that description make sense? > > The problem is that most vendors implement (3) using (1). But can't make > it work well because (3) was -- and still is for ATA -- outside the > scope of what the protocols can express. > > And I agree with you that if (3) was implemented correctly in all > devices, we wouldn't need (1) at all. At least not for devices with an > internal granularity << total capacity. What I'm saying is that we should be pushing standards to ensure (3) is correctly standardised, certified and implemented, because that is what the "Linux OS" requires from future hardware. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: Testing devices for discard support properly

2019-05-08 Thread Dave Chinner
does not return zeroes on subsequent reads? i.e. it is effectively fallocate(FALLOC_FL_NO_HIDE_STALE) preallocation semantics? For many use cases we actually want zeroed space to be guaranteed so we don't expose stale data from previous device use into the new user's visibility - can that be done with WRITE_SAME and the ANCHOR flag? Cheers, Dave. -- Dave Chinner da...@fromorbit.com
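Where guaranteed zeroes are required, the existing zeroout interface already expresses that intent; a sketch (the flag choice is illustrative):

static int my_zero_range(struct block_device *bdev, sector_t sector,
			 sector_t nr_sects)
{
	/*
	 * Ask for offloaded zeroing (e.g. WRITE SAME); fail rather than
	 * fall back to writing zeroes by hand.
	 */
	return blkdev_issue_zeroout(bdev, sector, nr_sects, GFP_KERNEL,
				    BLKDEV_ZERO_NOFALLBACK);
}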

Re: Testing devices for discard support properly

2019-05-07 Thread Dave Chinner
On Tue, May 07, 2019 at 08:07:53PM -0400, Ric Wheeler wrote: > On 5/7/19 6:04 PM, Dave Chinner wrote: > > On Mon, May 06, 2019 at 04:56:44PM -0400, Ric Wheeler wrote: > > > (repost without the html spam, sorry!) > > > > > > Last week at LSF/MM, I suggested we

Re: Testing devices for discard support properly

2019-05-07 Thread Dave Chinner
we just deprecate blkdev_issue_discard and all the interfaces that lead to it as a first step? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [LSF/MM TOPIC] Direct block mapping through fs for device

2019-04-26 Thread Dave Chinner
On Fri, Apr 26, 2019 at 11:20:45AM -0400, Jerome Glisse wrote: > On Fri, Apr 26, 2019 at 04:28:16PM +1000, Dave Chinner wrote: > > On Thu, Apr 25, 2019 at 09:38:14PM -0400, Jerome Glisse wrote: > > > I see that there are still empty spots in the LSF/MM schedule so I would like to

Re: [PATCH v3 2/3] block: verify data when endio

2019-04-03 Thread Dave Chinner
that complains if the resulting configuration violates whichever > alignment and other assumptions we end up baking into this. Makes sense to me. If we can ensure that the alignment requirements for the stack are communicated in the existing geometry info correctly (i.e. io_min, io_opt) then we've pretty much already got everything we need in place. We might need a few mkfs tweaks to ensure the log is placed correctly, log write padding is set appropriately, and check that inode clusters are appropriately aligned (I think they are already) but otherwise I think we will be good here... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
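For reference, these are the existing queue limit setters a stacked driver uses to advertise its alignment, which mkfs then reads back; the values here are illustrative:

static void my_set_queue_limits(struct request_queue *q)
{
	blk_queue_io_min(q, 4096);		/* minimum preferred IO size */
	blk_queue_io_opt(q, 64 * 1024);		/* optimal IO, e.g. stripe width */
}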

Re: [PATCH v3 2/3] block: verify data when endio

2019-04-01 Thread Dave Chinner
lower layers of the storage being less than perfect. IOWs, the filesystem doesn't expect hard "always correct" guarantees from the storage layers - we always have to assume IO failures will occur because they do, even with T10 PI. Hence it makes no sense for an automatic retry-and-recovery infrastructure for filesystems to require hard guarantees that the block device will always return good data. Automatic repair doesn't guarantee the storage is free from errors - it just provides a mechanism to detect errors and perform optimistic, best effort recovery at the lowest possible layer in the stack as early as possible. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v3 2/3] block: verify data when endio

2019-03-31 Thread Dave Chinner
that happens here is we give up the capability for automatic block device recovery and repair of damaged copies, which we can't do right now, so it's essentially status quo... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: question about writeback

2019-03-14 Thread Dave Chinner
reached the end of the file as directed? So perhaps the caller should be waiting on a specific range to bound the wait (e.g. isize as the end of the wait) rather than using the default "keep going until the end of file is reached" semantics? Cheers, Dave. -- Dave Chinner da...@fromorbit.com
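A sketch of the bounded wait being suggested (locking and error handling elided; the helper name is illustrative):

static int my_wait_writeback_to_eof(struct inode *inode)
{
	/* Sample EOF once, then wait only up to that point. */
	loff_t isize = i_size_read(inode);

	return filemap_fdatawait_range(inode->i_mapping, 0, isize - 1);
}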

Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry

2019-03-03 Thread Dave Chinner
On Thu, Feb 28, 2019 at 04:28:53PM -0700, Andreas Dilger wrote: > On Feb 28, 2019, at 7:22 AM, Bob Liu wrote: > > > > On 2/19/19 5:31 AM, Dave Chinner wrote: > >> On Wed, Feb 13, 2019 at 05:50:35PM +0800, Bob Liu wrote: > >>> Motivation: > >>> When

Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry

2019-03-03 Thread Dave Chinner
On Sun, Mar 03, 2019 at 10:37:59AM +0800, Bob Liu wrote: > On 3/1/19 5:49 AM, Dave Chinner wrote: > > On Thu, Feb 28, 2019 at 10:22:02PM +0800, Bob Liu wrote: > >> On 2/19/19 5:31 AM, Dave Chinner wrote: > >>> On Wed, Feb 13, 2019 at 05:50:35PM +0800, Bob Liu wrote:

Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry

2019-02-28 Thread Dave Chinner
On Thu, Feb 28, 2019 at 10:22:02PM +0800, Bob Liu wrote: > On 2/19/19 5:31 AM, Dave Chinner wrote: > > On Wed, Feb 13, 2019 at 05:50:35PM +0800, Bob Liu wrote: > >> Motivation: > >> When fs data/metadata checksum mismatch, lower block devices may have other >

Re: [PATCH] xfs: allocate sector sized IO buffer via page_frag_alloc

2019-02-27 Thread Dave Chinner
many, many years - the same issue that occurs with loop devices when you try to mount a 512 byte sector image on a hard 4k sector host filesystem/storage device using direct IO in the loop device. This isn't a new thing at all - if you want to use direct IO to manipulate filesystem images, you actually need to know what you are doing... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH] xfs: allocate sector sized IO buffer via page_frag_alloc

2019-02-26 Thread Dave Chinner
On Wed, Feb 27, 2019 at 09:50:55AM +0800, Ming Lei wrote: > On Wed, Feb 27, 2019 at 07:45:50AM +1100, Dave Chinner wrote: > > On Tue, Feb 26, 2019 at 05:33:04PM +0800, Ming Lei wrote: > > > On Tue, Feb 26, 2019 at 03:58:26PM +1100, Dave Chinner wrote: > > > > On M

Re: [PATCH] xfs: allocate sector sized IO buffer via page_frag_alloc

2019-02-26 Thread Dave Chinner
On Tue, Feb 26, 2019 at 05:33:04PM +0800, Ming Lei wrote: > On Tue, Feb 26, 2019 at 03:58:26PM +1100, Dave Chinner wrote: > > On Mon, Feb 25, 2019 at 07:27:37PM -0800, Matthew Wilcox wrote: > > > On Tue, Feb 26, 2019 at 02:02:14PM +1100, Dave Chinner wrote: > > > >

Re: [PATCH] xfs: allocate sector sized IO buffer via page_frag_alloc

2019-02-25 Thread Dave Chinner
On Mon, Feb 25, 2019 at 07:27:37PM -0800, Matthew Wilcox wrote: > On Tue, Feb 26, 2019 at 02:02:14PM +1100, Dave Chinner wrote: > > > Or what is the exact size of sub-page IO in xfs most of time? For > > > > Determined by mkfs parameters. Any power of 2 between 512 bytes

Re: [PATCH] xfs: allocate sector sized IO buffer via page_frag_alloc

2019-02-25 Thread Dave Chinner
On Tue, Feb 26, 2019 at 10:22:50AM +0800, Ming Lei wrote: > On Tue, Feb 26, 2019 at 07:26:30AM +1100, Dave Chinner wrote: > > On Mon, Feb 25, 2019 at 02:15:59PM +0100, Vlastimil Babka wrote: > > > On 2/25/19 5:36 AM, Dave Chinner wrote: > > > > On Mon, Feb 25, 2019

Re: [PATCH] xfs: allocate sector sized IO buffer via page_frag_alloc

2019-02-25 Thread Dave Chinner
On Mon, Feb 25, 2019 at 02:15:59PM +0100, Vlastimil Babka wrote: > On 2/25/19 5:36 AM, Dave Chinner wrote: > > On Mon, Feb 25, 2019 at 12:09:04PM +0800, Ming Lei wrote: > >> XFS uses kmalloc() to allocate sector sized IO buffer. > > > >> Use page_frag_alloc...

Re: [PATCH] xfs: allocate sector sized IO buffer via page_frag_alloc

2019-02-25 Thread Dave Chinner
On Mon, Feb 25, 2019 at 04:46:25PM +0800, Ming Lei wrote: > On Mon, Feb 25, 2019 at 03:36:48PM +1100, Dave Chinner wrote: > > On Mon, Feb 25, 2019 at 12:09:04PM +0800, Ming Lei wrote: > > > XFS uses kmalloc() to allocate sector sized IO buffer. > > > > > Use

Re: [PATCH] xfs: allocate sector sized IO buffer via page_frag_alloc

2019-02-24 Thread Dave Chinner
Every single metadata allocation is a sub-page allocation and so will use this new page frag mechanism. IOWs, it will result in fragmenting memory severely and typical memory reclaim not being able to fix it because the metadata that pins each page is largely unreclaimable... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH] fs: fix guard_bio_eod to check for real EOD errors

2019-02-24 Thread Dave Chinner
4k, and > disk size is > 1024, XFS won't do that, either - it checks at mount time if it can read the very last sector of the filesystem via uncached IO (see xfs_check_sizes() and xfs_rtmount_init()). If any of the EOD reads fail, it won't mount. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [LSF/MM TOPIC] More async operations for file systems - async discard?

2019-02-21 Thread Dave Chinner
ent if you write > or discard at the smaller granularity. Filesystems discard extents these days, not individual blocks. If you free a 1MB file, then you are likely to get a 1MB discard. Or if you use fstrim, then it's free space extent sizes (on XFS can be hundreds of GBs) and small free...

Re: [LSF/MM TOPIC] Software RAID Support for NV-DIMM

2019-02-18 Thread Dave Chinner
On Mon, Feb 18, 2019 at 06:15:34PM -0800, Jane Chu wrote: > On 2/15/2019 9:39 PM, Dave Chinner wrote: > > >On Sat, Feb 16, 2019 at 04:31:33PM +1100, Dave Chinner wrote: > >>On Fri, Feb 15, 2019 at 10:57:12AM +0100, Johannes Thumshirn wrote: > >>>(This is a jo

Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry

2019-02-18 Thread Dave Chinner
On Mon, Feb 18, 2019 at 06:55:20PM -0800, Darrick J. Wong wrote: > On Tue, Feb 19, 2019 at 08:31:50AM +1100, Dave Chinner wrote: > > On Wed, Feb 13, 2019 at 05:50:35PM +0800, Bob Liu wrote: > > > Motivation: > > > When fs data/metadata checksum mismatch, lower block de

Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry

2019-02-18 Thread Dave Chinner
her layer requirements. The only difference from a caller point of view should be submit_bio(bio); vs submit_bio_verify(bio, verifier_cb_func); Cheers, Dave. -- Dave Chinner da...@fromorbit.com
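A sketch of that proposed interface; the verifier type and the field used to carry it are invented for illustration:

typedef int (*bio_verifier_t)(struct bio *bio);

/*
 * Hypothetical: completion runs the verifier and, on failure, retries
 * the bio against an alternate mirror before calling the caller's
 * bi_end_io.
 */
void submit_bio_verify(struct bio *bio, bio_verifier_t verify)
{
	bio->bi_verify = verify;	/* hypothetical bio field */
	submit_bio(bio);
}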

Re: [LSF/MM TOPIC] More async operations for file systems - async discard?

2019-02-17 Thread Dave Chinner
On Sun, Feb 17, 2019 at 06:42:59PM -0500, Ric Wheeler wrote: > On 2/17/19 4:09 PM, Dave Chinner wrote: > >On Sun, Feb 17, 2019 at 03:36:10PM -0500, Ric Wheeler wrote: > >>One proposal for btrfs was that we should look at getting discard > >>out of the synchronous pa

Re: [LSF/MM TOPIC] More async operations for file systems - async discard?

2019-02-17 Thread Dave Chinner
cost of the various discard > commands - how painful is it for modern SSDs? AIUI, it still depends on the SSD implementation, unfortunately. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [LSF/MM TOPIC] Software RAID Support for NV-DIMM

2019-02-16 Thread Dave Chinner
On Sat, Feb 16, 2019 at 09:05:31AM -0800, Dan Williams wrote: > On Fri, Feb 15, 2019 at 9:40 PM Dave Chinner wrote: > > > > On Sat, Feb 16, 2019 at 04:31:33PM +1100, Dave Chinner wrote: > > > On Fri, Feb 15, 2019 at 10:57:12AM +0100, Johannes Thumshirn wrote: > >

Re: [LSF/MM TOPIC] Software RAID Support for NV-DIMM

2019-02-15 Thread Dave Chinner
On Sat, Feb 16, 2019 at 04:31:33PM +1100, Dave Chinner wrote: > On Fri, Feb 15, 2019 at 10:57:12AM +0100, Johannes Thumshirn wrote: > > (This is a joint proposal with Hannes Reinecke) > > > > Servers with NV-DIMM are slowly emerging in data centers but one key feature...

Re: [LSF/MM TOPIC] Software RAID Support for NV-DIMM

2019-02-15 Thread Dave Chinner
all the metadata goes to the software raided pmem block devices that aren't DAX capable. Problem already solved, yes? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v2 0/4] Write-hint for FS journal

2019-02-05 Thread Dave Chinner
On Tue, Feb 05, 2019 at 12:50:48PM +0100, Jan Kara wrote: > On Wed 30-01-19 19:24:39, Kanchan Joshi wrote: > > > > On Wednesday 30 January 2019 05:43 AM, Dave Chinner wrote: > > > On Tue, Jan 29, 2019 at 11:07:02AM +0100, Jan Kara wrote: > > > > On Mon

Re: [PATCH v2 0/4] Write-hint for FS journal

2019-01-29 Thread Dave Chinner
> Changes since v1:
> - introduce four more hints for in-kernel use...

Re: [PATCH 12/15] io_uring: add support for pre-mapped user IO buffers

2019-01-16 Thread Dave Chinner
On Wed, Jan 16, 2019 at 03:21:21PM -0700, Jens Axboe wrote: > On 1/16/19 3:09 PM, Dave Chinner wrote: > > On Wed, Jan 16, 2019 at 02:20:53PM -0700, Jens Axboe wrote: > >> On 1/16/19 1:53 PM, Dave Chinner wrote: > >> I'd be fine with that restriction, especially si

Re: [PATCH 12/15] io_uring: add support for pre-mapped user IO buffers

2019-01-16 Thread Dave Chinner
On Wed, Jan 16, 2019 at 02:20:53PM -0700, Jens Axboe wrote: > On 1/16/19 1:53 PM, Dave Chinner wrote: > > On Wed, Jan 16, 2019 at 10:50:00AM -0700, Jens Axboe wrote: > >> If we have fixed user buffers, we can map them into the kernel when we > >> setup the io_context.

Re: [PATCH 12/15] io_uring: add support for pre-mapped user IO buffers

2019-01-16 Thread Dave Chinner
DAX because the above problems are actually a use-after-free of storage space, not just a dangling page reference that can be cleaned up after the gup pin is dropped. Perhaps, at least until we solve the GUP problems w.r.t. file backed pages and/or add and require file layout leases for these references, we should error out if the user buffer pages are file-backed mappings? Cheers, Dave. -- Dave Chinner da...@fromorbit.com
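A sketch of the registration-time rejection being suggested; the helper and its caller are hypothetical:

/*
 * Refuse to register file-backed pages until GUP vs truncate/hole-punch
 * is resolved, or until file layout leases are required and taken.
 */
static int my_check_registered_pages(struct page **pages, int nr_pages)
{
	int i;

	for (i = 0; i < nr_pages; i++)
		if (!PageAnon(pages[i]))
			return -EOPNOTSUPP;	/* file-backed mapping */
	return 0;
}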

Re: Block device flush ordering

2019-01-13 Thread Dave Chinner
e writes are on stable storage. They *may* be on stable storage if the timing is right, but it is not guaranteed by the OS code. Likewise, flush 2 only guarantees writes 1, 3 and 5 are on stable storage because they are the only writes that have been signalled as complete when flush 2 was submitted. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
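In code form, the rule is that a flush only covers writes whose completions have already been observed; a sketch using the flush interface of this era:

static int write_then_flush(struct block_device *bdev, struct bio *write_bio)
{
	int err;

	/* Wait for the write to complete first... */
	err = submit_bio_wait(write_bio);
	if (err)
		return err;

	/* ...only now does a flush guarantee it is on stable storage. */
	return blkdev_issue_flush(bdev, GFP_KERNEL, NULL);
}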

Re: [RFC PATCH v1 0/7] Block/XFS: Support alternative mirror device retry

2018-11-27 Thread Dave Chinner
On Tue, Nov 27, 2018 at 11:37:22PM -0800, Christoph Hellwig wrote: > On Wed, Nov 28, 2018 at 04:33:03PM +1100, Dave Chinner wrote: > > One thing that is going to make this more complex at the XFS layer > > is discontiguous buffers. They require multiple IOs (and therefore > >

Re: [RFC PATCH v1 0/7] Block/XFS: Support alternative mirror device retry

2018-11-27 Thread Dave Chinner
On Tue, Nov 27, 2018 at 09:49:23PM -0800, Darrick J. Wong wrote: > On Wed, Nov 28, 2018 at 04:33:03PM +1100, Dave Chinner wrote: > > On Tue, Nov 27, 2018 at 08:49:44PM -0700, Allison Henderson wrote: > > > Motivation: > > > When fs data/metadata checksum mismatch, l

Re: [PATCH v1 6/7] xfs: Rewrite retried read

2018-11-27 Thread Dave Chinner
On Tue, Nov 27, 2018 at 09:26:04PM -0800, Darrick J. Wong wrote: > On Wed, Nov 28, 2018 at 04:17:19PM +1100, Dave Chinner wrote: > > On Tue, Nov 27, 2018 at 08:49:50PM -0700, Allison Henderson wrote: > > > If we had to try more than one mirror to get a successful > > > r

Re: [PATCH v1 5/7] xfs: Add device retry

2018-11-27 Thread Dave Chinner
On Tue, Nov 27, 2018 at 09:22:45PM -0800, Darrick J. Wong wrote: > On Wed, Nov 28, 2018 at 04:08:50PM +1100, Dave Chinner wrote: > > On Tue, Nov 27, 2018 at 08:49:49PM -0700, Allison Henderson wrote: > > > Check to see if the _xfs_buf_read fails. If so loop over the > >

Re: [RFC PATCH v1 0/7] Block/XFS: Support alternative mirror device retry

2018-11-27 Thread Dave Chinner
RAID5/6 to trigger verification/recovery from the parity information in the stripe? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v1 6/7] xfs: Rewrite retried read

2018-11-27 Thread Dave Chinner
		continue;
	case 0:
		/* good copy, rewrite it to repair bad copy */
		xfs_bwrite(bp);
		/* fallthrough */
	default:
		return bp;
	}

Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v1 5/7] xfs: Add device retry

2018-11-27 Thread Dave Chinner
> +		case -EIO:
> +		case -EFSCORRUPTED:
> +		case -EFSBADCRC:
> +			/* loop again */
> +			continue;
> +		default:
> +			goto retry_done;

Just return bp here; there's no need for a jump label for it. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v1 4/7] xfs: Add b_rw_hint to xfs_buf

2018-11-27 Thread Dave Chinner
will miss setting bp->b_rw_hint for IO that completes before submission returns to __xfs_buf_submit() (i.e. b_io_remaining is 2 at IO completion). So I suspect it won't do the right thing on fast or synchronous block devices like pmem. You should be able to test this with a RAID1 made from two ramdisks... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 3/5] aio: add iocb->ki_blk_qc field

2018-11-18 Thread Dave Chinner
other direct IO paths, too? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 5/5] aio: add support for file based polled IO

2018-11-18 Thread Dave Chinner
data IO submission, so this value is going to change as the IO progresses. What does making these partial IOs visible provide, especially as they then get overwritten by the next submissions? Indeed, how does one wait on all IOs in the DIO to complete if we are only tracking one of many? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH] block: fix 32 bit overflow in __blkdev_issue_discard()

2018-11-15 Thread Dave Chinner
On Thu, Nov 15, 2018 at 02:24:19PM -0800, Darrick J. Wong wrote: > On Fri, Nov 16, 2018 at 09:13:37AM +1100, Dave Chinner wrote: > > On Thu, Nov 15, 2018 at 11:10:36AM +0800, Ming Lei wrote: > > > On Thu, Nov 15, 2018 at 12:22:01PM +1100, Dave Chinner wrote: > > > >

Re: [PATCH] block: fix 32 bit overflow in __blkdev_issue_discard()

2018-11-15 Thread Dave Chinner
On Thu, Nov 15, 2018 at 11:10:36AM +0800, Ming Lei wrote: > On Thu, Nov 15, 2018 at 12:22:01PM +1100, Dave Chinner wrote: > > On Thu, Nov 15, 2018 at 09:06:52AM +0800, Ming Lei wrote: > > > On Wed, Nov 14, 2018 at 08:18:24AM -0700, Jens Axboe wrote: > > > > On 11/13/

Re: [PATCH] block: fix 32 bit overflow in __blkdev_issue_discard()

2018-11-14 Thread Dave Chinner
On Thu, Nov 15, 2018 at 09:06:52AM +0800, Ming Lei wrote: > On Wed, Nov 14, 2018 at 08:18:24AM -0700, Jens Axboe wrote: > > On 11/13/18 2:43 PM, Dave Chinner wrote: > > > From: Dave Chinner > > > > > > A discard cleanup merged into 4.20-rc2 causes fstests x

Re: [PATCH] block: fix 32 bit overflow in __blkdev_issue_discard()

2018-11-14 Thread Dave Chinner
On Wed, Nov 14, 2018 at 10:53:11AM +0800, Ming Lei wrote: > On Wed, Nov 14, 2018 at 5:44 AM Dave Chinner wrote: > > > > From: Dave Chinner > > > > A discard cleanup merged into 4.20-rc2 causes fstests xfs/259 to > > fall into an endless loop in the discard code.

Re: [PATCH V9 00/19] block: support multi-page bvec

2018-11-13 Thread Dave Chinner
I can't make head or tail of them because I haven't received the rest of the patches. i.e. if you are going to send the patch 0 to a mailing list, the entire patchset should also be sent to that mailing list. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

[PATCH] block: fix 32 bit overflow in __blkdev_issue_discard()

2018-11-13 Thread Dave Chinner
From: Dave Chinner A discard cleanup merged into 4.20-rc2 causes fstests xfs/259 to fall into an endless loop in the discard code. The test is creating a device that is exactly 2^32 sectors in size to test mkfs boundary conditions around the 32 bit sector overflow region. mkfs issues a discard
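The overflow class, reduced to its essentials (simplified illustration, not the exact upstream code):

	sector_t nr_sects = 1ULL << 32;		/* device is exactly 2^32 sectors */
	unsigned int bytes = nr_sects << 9;	/* bi_size is 32 bits: truncates to 0 */

	/*
	 * A zero-length bio makes no forward progress, hence the endless
	 * loop. The fix pattern is to cap the per-bio sector count so the
	 * byte count always fits in 32 bits:
	 */
	sector_t req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);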

Re: [PATCH 4/5] block: introduce helpers for allocating IO buffers from slab

2018-10-18 Thread Dave Chinner
page_frag_alloc() directly, seems not necessary to > introduce this change in block layer any more given 512-aligned buffer > should be fine everywhere. > > The only benefit to make it as block helper is that the offset or size > can be checked with q->dma_alignment. > > Dave/Jens, do you think which way is better? Put allocation as block > helper or fs uses page_frag_alloc() directly for allocating 512*N-byte > buffer(total size is less than PAGE_SIZE)? Christoph has already said he's looking at using page_frag_alloc() directly in XFS... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
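A sketch of the direct page_frag_alloc() use being discussed; the cache placement (file scope here) and GFP flags are illustrative:

static struct page_frag_cache my_io_frag_cache;

/*
 * size must be 512*N and less than PAGE_SIZE; fragments carved from a
 * page stay 512 byte aligned as long as every allocation is 512*N.
 * Free with page_frag_free().
 */
static void *my_alloc_io_buf(unsigned int size)
{
	return page_frag_alloc(&my_io_frag_cache, size, GFP_NOFS);
}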

Re: [PATCH 16/34] iomap: add initial support for writes without buffer heads

2018-05-22 Thread Dave Chinner
On Tue, May 22, 2018 at 10:24:54AM +0200, Christoph Hellwig wrote: > On Tue, May 22, 2018 at 10:07:45AM +1000, Dave Chinner wrote: > > > Something doesn't smell right here. The only pages we need to read in > > > are the first and last pages in the write_begin range, and

Re: [PATCH 16/34] iomap: add initial support for writes without buffer heads

2018-05-21 Thread Dave Chinner
the write_begin range, and only if they > aren't page aligned and the underlying extent is IOMAP_MAPPED, right? And not beyond EOF, too. The bufferhead code handles this via the buffer_new() flag - it triggers the skipping of read IO and the states in which it is set are clearly indicated in iom...

Re: [PATCH 31/33] iomap: add support for sub-pagesize buffered I/O without buffer heads

2018-05-15 Thread Dave Chinner
ap? > > Oh, I assumed iomap would work for filesystems with block size greater > than PAGE_SIZE. It will eventually, but first we've got to remove the iomap infrastructure and filesystem dependencies on bufferheads Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 11/33] iomap: add an iomap-based readpage and readpages implementation

2018-05-09 Thread Dave Chinner
> +		unlock_page(ctx.cur_page);
> +		put_page(ctx.cur_page);
> +	}
> +	WARN_ON_ONCE(ret && !list_empty(ctx.pages));

And this warning will never trigger. Was this intended behaviour? If it is, it needs a comment, because it looks wrong... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH] bdi: Fix oops in wb_workfn()

2018-05-03 Thread Dave Chinner
> 	if (!list_empty(&wb->work_list))
> -		mod_delayed_work(bdi_wq, &wb->dwork, 0);
> +		wb_wakeup(wb);
> 	else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
> 		wb_wakeup_delayed(wb);

Yup, looks fine - I can't see any more of these op...

[PATCH 1/3] xfs: move generic_write_sync calls inwards

2018-05-01 Thread Dave Chinner
From: Dave Chinner To prepare for iomap infrastructure based DSYNC optimisations. While moving the code around, move the XFS write bytes metric update for direct IO into the xfs_dio_write_end_io callback so that we always capture the amount of data written via AIO+DIO. This fixes the problem...

[PATCH 2/3] iomap: iomap_dio_rw() handles all sync writes

2018-05-01 Thread Dave Chinner
From: Dave Chinner Currently iomap_dio_rw() only handles (data)sync write completions for AIO. This means we can't optimise non-AIO IO to minimise device flushes as we can't tell the caller whether a flush is required or not. To solve this problem and enable further optimisations...

[PATCH 3/3] iomap: Use FUA for pure data O_DSYNC DIO writes

2018-05-01 Thread Dave Chinner
From: Dave Chinner If we are doing direct IO writes with datasync semantics, we often have to flush metadata changes along with the data write. However, if we are overwriting existing data, there are no metadata changes that we need to flush. In this case, optimising the IO by using FUA writes...
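A sketch of the per-bio decision this describes, using the flag names from this patchset (the surrounding iomap context is elided, and the overwrite/zeroout test is simplified):

	/*
	 * Pure data overwrite with O_DSYNC semantics: nothing to flush,
	 * so a FUA write can replace write-then-cache-flush.
	 */
	if ((dio->flags & IOMAP_DIO_WRITE_FUA) && !need_zeroout &&
	    blk_queue_fua(bdev_get_queue(iomap->bdev)))
		use_fua = true;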

[PATCH 0/3 v3] iomap: Use FUA for O_DSYNC DIO writes

2018-05-01 Thread Dave Chinner
Hi folks, Version 3 of the FUA for O_DSYNC patchset. This version fixes bugs found in the previous version. Functionality is otherwise the same as described in the first version: https://marc.info/?l=linux-xfs&m=152213446528167&w=2 Version 3: - fixed O_SYNC behaviour as noticed by Jan Kara - fi...

Re: [PATCH 2/4] iomap: iomap_dio_rw() handles all sync writes

2018-05-01 Thread Dave Chinner
On Sat, Apr 21, 2018 at 03:03:09PM +0200, Jan Kara wrote: > On Wed 18-04-18 14:08:26, Dave Chinner wrote: > > From: Dave Chinner > > > > Currently iomap_dio_rw() only handles (data)sync write completions > > for AIO. This means we can't optimised non-AIO IO to m

Re: [PATCH 4/4] iomap: Use FUA for pure data O_DSYNC DIO writes

2018-05-01 Thread Dave Chinner
> Oops, good catch. I think the above if should just be > > if (iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC) == IOCB_DSYNC)) { > > and we are fine. Ah, not exactly. IOMAP_DIO_NEED_SYNC needs to be set for either DSYNC or SYNC writes, while IOMAP_DIO_WRITE_FUA should only be set for DSYNC. I'll fix this up appropriately. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
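Spelled out, the flag logic Dave describes is the following (note the extra parentheses: == binds tighter than & in C, so the quoted one-liner needs them):

	/* DSYNC or SYNC writes: completion must sync. */
	if (iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC))
		dio->flags |= IOMAP_DIO_NEED_SYNC;

	/* FUA only for pure O_DSYNC: DSYNC set, SYNC clear. */
	if ((iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC)) == IOCB_DSYNC)
		dio->flags |= IOMAP_DIO_WRITE_FUA;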

Re: [PATCH 2/2] xfs: add 'discard_sync' mount flag

2018-04-30 Thread Dave Chinner
On Mon, Apr 30, 2018 at 05:00:14PM -0600, Jens Axboe wrote: > On 4/30/18 4:40 PM, Jens Axboe wrote: > > On 4/30/18 4:28 PM, Dave Chinner wrote: > >> Yes, it does, but so would having the block layer to throttle device > >> discard requests in flight to a queue de

Re: [PATCH 2/2] xfs: add 'discard_sync' mount flag

2018-04-30 Thread Dave Chinner
On Mon, Apr 30, 2018 at 03:42:11PM -0600, Jens Axboe wrote: > On 4/30/18 3:31 PM, Dave Chinner wrote: > > On Mon, Apr 30, 2018 at 09:32:52AM -0600, Jens Axboe wrote: > >> XFS recently added support for async discards. While this can be > >> a win for some workloads

Re: [PATCH 2/2] xfs: add 'discard_sync' mount flag

2018-04-30 Thread Dave Chinner
a hack to work around the symptoms being seen... More details of the regression and the root cause analysis are needed, please. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [LSF/MM] Ride sharing

2018-04-19 Thread Dave Chinner
share transport. Does anyone have a wiki we can use to coordinate this? Arriving 4.15pm Sunday, so if you want to wait around for a bit I'm happy to share... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [LSF/MM] schedule suggestion

2018-04-18 Thread Dave Chinner
the schedule for all the ext4/btrfs/XFS/NFS/CIFS devs to get together and talk about things of interest only to their own filesystems. That means we all don't have to find time outside the schedule to do this, and I think this would be time very well spent for most f...

Re: [LSF/MM] schedule suggestion

2018-04-18 Thread Dave Chinner
so we don't have to keep moving rooms every half hour... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

[PATCH 0/4 V2] iomap: Use FUA for O_DSYNC DIO writes

2018-04-17 Thread Dave Chinner
Hi folks, This is the latest version of the "use FUA for DIO writes" patchset that was last posted here: https://marc.info/?l=linux-xfs&m=152213446528167&w=2 Functionality and performance is the same as the previous version. Changes in this version address Christoph's review comments. Version 2

[PATCH 2/4] iomap: iomap_dio_rw() handles all sync writes

2018-04-17 Thread Dave Chinner
From: Dave Chinner Currently iomap_dio_rw() only handles (data)sync write completions for AIO. This means we can't optimise non-AIO IO to minimise device flushes as we can't tell the caller whether a flush is required or not. To solve this problem and enable further optimisations...

[PATCH 3/4] blk: add blk_queue_fua() helper function

2018-04-17 Thread Dave Chinner
From: Dave Chinner

So we can check FUA support status from the iomap direct IO code.

Signed-off-by: Dave Chinner
---
 include/linux/blkdev.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9af3e0f430bc..c362aadfe036 100644
--- a/include/linux/blkdev.h
+++ b/include...
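The added line itself is cut off above; it is presumably the usual queue-flag one-liner, along the lines of:

#define blk_queue_fua(q)	test_bit(QUEUE_FLAG_FUA, &(q)->queue_flags)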

[PATCH 1/4] xfs: move generic_write_sync calls inwards

2018-04-17 Thread Dave Chinner
From: Dave Chinner To prepare for iomap infrastructure based DSYNC optimisations. While moving the code around, move the XFS write bytes metric update for direct IO into the xfs_dio_write_end_io callback so that we always capture the amount of data written via AIO+DIO. This fixes the problem...

[PATCH 4/4] iomap: Use FUA for pure data O_DSYNC DIO writes

2018-04-17 Thread Dave Chinner
From: Dave Chinner If we are doing direct IO writes with datasync semantics, we often have to flush metadata changes along with the data write. However, if we are overwriting existing data, there are no metadata changes that we need to flush. In this case, optimising the IO by using FUA writes...

[PATCH 2/3] iomap: iomap_dio_rw() handles all sync writes

2018-03-27 Thread Dave Chinner
From: Dave Chinner Currently iomap_dio_rw() only handles (data)sync write completions for AIO. This means we can't optimise non-AIO IO to minimise device flushes as we can't tell the caller whether a flush is required or not. To solve this problem and enable further optimisations...

[PATCH 1/3] xfs: move generic_write_sync calls inwards

2018-03-27 Thread Dave Chinner
From: Dave Chinner To prepare for iomap infrastructure based DSYNC optimisations. While moving the code around, move the XFS write bytes metric update for direct IO into the xfs_dio_write_end_io callback so that we always capture the amount of data written via AIO+DIO. This fixes the problem...

[PATCH 3/3] iomap: Use FUA for pure data O_DSYNC DIO writes

2018-03-27 Thread Dave Chinner
From: Dave Chinner If we are doing direct IO writes with datasync semantics, we often have to flush metadata changes along with the data write. However, if we are overwriting existing data, there are no metadata changes that we need to flush. In this case, optimising the IO by using FUA writes...

[PATCH 0/3 V2] iomap: Use FUA for O_DSYNC DIO writes

2018-03-27 Thread Dave Chinner
Hi folks, This is a followup on my original patch to enable use of FUA writes for pure data O_DSYNC writes through the XFS and iomap based direct IO paths. This version has all of the changes Christoph asked for, and splits it up into simpler patches. The performance improvements are detailed in the...

Re: [PATCH 2/2] xfs: remove assert to check bytes returned

2018-01-18 Thread Dave Chinner
On Fri, Jan 19, 2018 at 02:23:16AM -0200, Raphael Carvalho wrote: > On Fri, Jan 19, 2018 at 1:57 AM, Dave Chinner wrote: > > > > On Thu, Jan 18, 2018 at 06:57:41PM -0600, Goldwyn Rodrigues wrote: > > > From: Goldwyn Rodrigues > > > > > > Since we can
