Re: Mis-Design of Btrfs?

2011-07-13 Thread NeilBrown
On Wed, 29 Jun 2011 10:29:53 +0100 Ric Wheeler rwhee...@redhat.com wrote: On 06/27/2011 07:46 AM, NeilBrown wrote: On Thu, 23 Jun 2011 12:53:37 +0200 Nico Schottelius nico-lkml-20110...@schottelius.org wrote: Good morning devs, I'm wondering whether the raid- and volume-management

Re: Mis-Design of Btrfs?

2011-07-14 Thread NeilBrown
On Thu, 14 Jul 2011 07:02:22 +0100 Ric Wheeler rwhee...@redhat.com wrote: I'm certainly open to suggestions and collaboration. Do you have in mind any particular way to make the interface richer?? NeilBrown Hi Neil, I know that Chris has a very specific set of use cases

Re: Mis-Design of Btrfs?

2011-07-14 Thread NeilBrown
On Thu, 14 Jul 2011 11:37:41 +0200 Jan Schmidt list.bt...@jan-o-sch.net wrote: Hi Neil, On 14.07.2011 08:38, NeilBrown wrote: I imagine a new field in 'struct bio' which was normally zero but could be some small integer. It is only meaningful for read. When 0 it means get this data way

Re: Mis-Design of Btrfs?

2011-07-15 Thread NeilBrown
On Thu, 14 Jul 2011 21:58:46 -0700 (PDT) da...@lang.hm wrote: On Thu, 14 Jul 2011, Chris Mason wrote: Excerpts from Ric Wheeler's message of 2011-07-14 02:57:54 -0400: On 07/14/2011 07:38 AM, NeilBrown wrote: On Thu, 14 Jul 2011 07:02:22 +0100 Ric Wheeler rwhee...@redhat.com wrote

Re: raid10 make_request failure during iozone benchmark upon btrfs

2012-07-02 Thread NeilBrown
On Tue, 03 Jul 2012 03:13:33 +0100 Kerin Millar kerfra...@gmail.com wrote: Hi, On 03/07/2012 02:39, NeilBrown wrote: [snip] Could you please double check that you are running a kernel with commit aba336bd1d46d6b0404b06f6915ed76150739057 Author: NeilBrown ne...@suse.de Date

Re: ext4 vs btrfs performance on SSD array

2014-09-02 Thread NeilBrown
reports that a very large raid5 stripe cache size can cause a reduction in performance. I don't know why but I suspect it is a bug that should be found and fixed. Do we need max_sectors ?? NeilBrown

Re: Triple parity and beyond

2013-11-22 Thread NeilBrown
. NeilBrown In the 20TB drive case, this would shave 18 hours off the total rebuild operation elapsed time. With current 4TB drives it would still save 6.5 hours. Losing both drives in one mirror set of a striped array is rare, but given the rebuild time saved it may be worth investigating during

Re: Triple parity and beyond

2013-11-22 Thread NeilBrown
significant. NeilBrown

Re: Triple parity and beyond

2013-11-22 Thread NeilBrown
On Fri, 22 Nov 2013 21:46:50 -0600 Stan Hoeppner s...@hardwarefreak.com wrote: On 11/22/2013 5:07 PM, NeilBrown wrote: On Thu, 21 Nov 2013 16:57:48 -0600 Stan Hoeppner s...@hardwarefreak.com wrote: On 11/21/2013 1:05 AM, John Williams wrote: On Wed, Nov 20, 2013 at 10:52 PM, Stan

Re: Triple parity and beyond

2013-11-22 Thread NeilBrown
On Fri, 22 Nov 2013 21:34:41 -0800 John Williams jwilliams4...@gmail.com wrote: On Fri, Nov 22, 2013 at 9:04 PM, NeilBrown ne...@suse.de wrote: I guess with that many drives you could hit PCI bus throughput limits. A 16-lane PCIe 4.0 could just about give 100MB/s to each of 16 devices

Re: [RFC] lib: raid: New RAID library supporting up to six parities

2014-01-06 Thread NeilBrown
with the code, so maybe that would end up being more complex. Thanks, NeilBrown

Re: __might_sleep() warnings on v3.19-rc6

2015-02-01 Thread NeilBrown
can't use wait_event_lock_irq_cmd() as there are actually several spinlocks I need to manipulate. So I'm hoping that this part of the patch (at least) can be reverted. Otherwise I guess I'll need to use __wait_event_cmd(). Thanks, NeilBrown

Re: __might_sleep() warnings on v3.19-rc6

2015-02-01 Thread NeilBrown
On Sun, 1 Feb 2015 21:08:12 -0800 Linus Torvalds torva...@linux-foundation.org wrote: On Sun, Feb 1, 2015 at 3:03 PM, NeilBrown ne...@suse.de wrote: I guess I could __set_current_state(TASK_RUNNING); somewhere to defeat the warning, and add a comment explaining why. Would

Re: [PATCH 3/3] btrfs: set FS_SUPPORTS_SEEK_HOLE flag.

2015-04-20 Thread NeilBrown
On Mon, 20 Apr 2015 09:47:42 +0100 David Howells dhowe...@redhat.com wrote: NeilBrown ne...@suse.de wrote: + .fs_flags = FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA | + FS_SUPPORTS_SEEK_HOLE, I must be missing something: warthog>git merge linus/master

Re: [PATCH 2/3] fscache/cachefiles: optionally use SEEK_DATA instead of ->bmap.

2015-04-21 Thread NeilBrown
On Mon, 20 Apr 2015 02:45:39 -0700 Christoph Hellwig h...@infradead.org wrote: On Mon, Apr 20, 2015 at 04:27:00PM +1000, NeilBrown wrote: A worthwhile goal, but I certainly wouldn't consider pursuing it until what I have submitted so far has been accepted - let's not reject good while

Re: [PATCH/RFC] fscache/cachefiles versus btrfs

2015-04-19 Thread NeilBrown
On Fri, 10 Apr 2015 11:24:31 +1000 NeilBrown ne...@suse.de wrote: On Thu, 09 Apr 2015 10:23:08 +0100 David Howells dhowe...@redhat.com wrote: NeilBrown ne...@suse.de wrote: Is there a better way? Could a better way be created? Maybe SEEK_DATA_RELIABLE ?? fiemap() maybe

[PATCH 3/3] btrfs: set FS_SUPPORTS_SEEK_HOLE flag.

2015-04-19 Thread NeilBrown
This allows fscache to cachefiles in a btrfs filesystem. Signed-off-by: NeilBrown ne...@suse.de --- fs/btrfs/super.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 05fef198ff94..d3c5d2b40f8e 100644 --- a/fs/btrfs/super.c +++ b

[PATCH 0/3] Allow fscache to work on BTRFS

2015-04-19 Thread NeilBrown
. Thanks, NeilBrown --- NeilBrown (3): cachefiles: perform test on s_blocksize when opening cache file. fscache/cachefiles: optionally use SEEK_DATA instead of ->bmap. btrfs: set FS_SUPPORTS_SEEK_HOLE flag. fs/btrfs/super.c |3 + fs/cachefiles/namei.c | 13 - fs

[PATCH 2/3] fscache/cachefiles: optionally use SEEK_DATA instead of ->bmap.

2015-04-19 Thread NeilBrown
. Subsequent patch will set flag for btrfs. Other filesystems could usefully have FS_SUPPORTS_SEEK_HOLE set, but I'll leave that to the relevant maintainers to decide. Signed-off-by: NeilBrown ne...@suse.de --- fs/cachefiles/namei.c | 15 -- fs/cachefiles/rdwr.c | 119

[PATCH 1/3] cachefiles: perform test on s_blocksize when opening cache file.

2015-04-19 Thread NeilBrown
cachefiles requires that s_blocksize in the cache is not greater than PAGE_SIZE, and performs the check every time a block is accessed. Move the test to the place where the file is opened, where other file-validity tests are performed. Signed-off-by: NeilBrown ne...@suse.de --- fs/cachefiles
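The idea in the patch description above (test the block size once at open time instead of on every block access) can be sketched in user-space C. This is only an illustration: the `sketch_*` names, the 4096-byte page size, and the error codes are assumptions made for the example, not the cachefiles code.

```c
#include <errno.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for the sketch only */

/* Hypothetical stand-in for a cache backing file; not a kernel structure. */
struct sketch_cache_file {
    unsigned int blocksize;  /* block size of the backing filesystem */
    int validated;           /* set once the open-time checks have passed */
};

/* Run the validity test once, at the point where the file is opened. */
static int sketch_open(struct sketch_cache_file *f, unsigned int blocksize)
{
    if (blocksize == 0 || blocksize > SKETCH_PAGE_SIZE)
        return -EINVAL;  /* error code chosen for this sketch */
    f->blocksize = blocksize;
    f->validated = 1;
    return 0;
}

/* Per-block access no longer repeats the size test; it relies on open. */
static long sketch_block_offset(const struct sketch_cache_file *f,
                                unsigned long block)
{
    if (!f->validated)
        return -EBADF;  /* file was never successfully opened */
    return (long)(block * f->blocksize);
}
```

With this arrangement a backing file whose block size exceeds the page size is rejected once, up front, and the hot path only has to compute the offset.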

Re: [PATCH 2/3] fscache/cachefiles: optionally use SEEK_DATA instead of ->bmap.

2015-04-20 Thread NeilBrown
has been accepted - let's not reject good while waiting for perfect. NeilBrown

Re: [PATCH 3/3] btrfs: set FS_SUPPORTS_SEEK_HOLE flag.

2015-04-26 Thread NeilBrown
On Mon, 20 Apr 2015 02:48:55 -0700 Christoph Hellwig h...@infradead.org wrote: On Mon, Apr 20, 2015 at 10:46:49AM +0100, David Howells wrote: NeilBrown ne...@suse.de wrote: Missing patch 2 of the 3-patch series? Yes. :-) Do ext4 and xfs support this, do you know? Yes. As do

[PATCH v2] NILFS2: support NFSv2 export

2015-05-11 Thread NeilBrown
to it. Signed-off-by: NeilBrown ne...@suse.de diff --git a/fs/nilfs2/namei.c b/fs/nilfs2/namei.c index 22180836ec22..37dd6b05b1b5 100644 --- a/fs/nilfs2/namei.c +++ b/fs/nilfs2/namei.c @@ -496,8 +496,7 @@ static struct dentry *nilfs_fh_to_dentry(struct super_block *sb, struct fid *fh, { struct

[PATCH/RFC] fscache/cachefiles versus btrfs

2015-04-09 Thread NeilBrown
] fscache_object_work_func+0x151/0x210 [fscache] [ 859.703578] [81078b07] process_one_work+0x147/0x3c0 [ 859.703642] [8107929c] worker_thread+0x20c/0x470 I haven't figured out the cause of that yet. Thanks, NeilBrown diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index

Re: [PATCH] block: add a bi_error field to struct bio

2015-07-21 Thread NeilBrown
mechanisms to clean all this up. Signed-off-by: Christoph Hellwig h...@lst.de --- Reviewed-by: NeilBrown ne...@suse.com (umem and md/raid). i.e. these files. drivers/block/umem.c| 4 +-- drivers/md/faulty.c | 4 +-- drivers/md/linear.c | 2

Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header

2016-03-09 Thread NeilBrown
return -EINVAL; is correct, but it looks wrong. It looks like it is saying that it is invalid to use the LAST backend! Secondly, you use "dup" as an abbreviation of "duplicate". The ioctl FIDEDUPERANGE and the tool duperemove both use "dupe". It would be nice if we

Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header

2016-03-13 Thread NeilBrown
plicate. Normal pronunciation rules for English indicate that "dup" should be pronounced with a short vowel sound, like "cup". So "dup" sounds wrong. To make a vowel long you can add an 'e' at the end of a word. So: tub or cub have a short "u" tube or

Re: [PATCH] [RFC] fix potential access after free: return value of blk_check_plugged() must be used schedule() safe

2016-04-05 Thread NeilBrown
_plugged() below is actually just a cast. Fair point. I generally prefer container_of to casts because it is more obviously correct, and type checked. However as blk_check_plugged performs the allocation, the blk_plug_cb must be at the start of the containing structure, so the complex tests f

Re: [PATCH] [RFC] fix potential access after free: return value of blk_check_plugged() must be used schedule() safe

2016-04-05 Thread NeilBrown
lug(). So I don't think you are missing anything, we were. Lars: have your concerns been relieved or do you still have reason to think there is a problem? Thanks, NeilBrown

Re: [PATCH 0/2] scop GFP_NOFS api

2016-04-29 Thread NeilBrown
ep further and deprecate GFP_ATOMIC in favour of some in_atomic() test? Maybe that is going too far. Thanks, NeilBrown > > Any feedback is highly appreciated.

Re: [PATCH v2 3/3] block: avoid to call .bi_end_io() recursively

2016-04-26 Thread NeilBrown
a to depend on it. If you move the "struct bio_list bl_in_stack" to the top of the function I would be a lot happier. Or you could change the code to: if (bl) { bio_list_add(bl, bio); } else { struct bio_list bl_in_stack; ... use bl_in_stack, while loop
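The second form suggested in the message above (append to an already active list, otherwise declare the on-stack list inside the else branch and drain it in a loop) can be sketched in user-space C as follows. All `sketch_*` names and the depth counters are hypothetical and exist only to make the pattern checkable; this is not the block-layer code under review.

```c
#include <stddef.h>

/* Hypothetical stand-in for a bio: completing it may submit a child. */
struct sketch_bio {
    struct sketch_bio *next;
    struct sketch_bio *child;  /* submitted from the completion handler */
    int done;
};

struct sketch_bio_list { struct sketch_bio *head, *tail; };

static struct sketch_bio_list *sketch_active_list;  /* NULL: no drain loop running */
static int sketch_depth, sketch_max_depth;          /* instrumentation only */

static void sketch_list_add(struct sketch_bio_list *bl, struct sketch_bio *bio)
{
    bio->next = NULL;
    if (bl->tail)
        bl->tail->next = bio;
    else
        bl->head = bio;
    bl->tail = bio;
}

static struct sketch_bio *sketch_list_pop(struct sketch_bio_list *bl)
{
    struct sketch_bio *bio = bl->head;

    if (bio) {
        bl->head = bio->next;
        if (!bl->head)
            bl->tail = NULL;
        bio->next = NULL;
    }
    return bio;
}

static void sketch_submit(struct sketch_bio *bio);

/* Completion handler: may call back into sketch_submit(). */
static void sketch_end_io(struct sketch_bio *bio)
{
    bio->done = 1;
    if (bio->child)
        sketch_submit(bio->child);
}

static void sketch_submit(struct sketch_bio *bio)
{
    struct sketch_bio_list *bl = sketch_active_list;

    if (bl) {
        /* A drain loop is already running further up the stack: just queue. */
        sketch_list_add(bl, bio);
    } else {
        /* The list lives only in this branch, and is only used here. */
        struct sketch_bio_list bl_in_stack = { NULL, NULL };

        sketch_active_list = &bl_in_stack;
        sketch_list_add(&bl_in_stack, bio);
        while ((bio = sketch_list_pop(&bl_in_stack)) != NULL) {
            if (++sketch_depth > sketch_max_depth)
                sketch_max_depth = sketch_depth;
            sketch_end_io(bio);
            sketch_depth--;
        }
        sketch_active_list = NULL;
    }
}
```

Keeping the declaration and every use of `bl_in_stack` inside the same block means no pointer to it can outlive its scope, which appears to be the concern raised in the review.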

Re: [PATCH 0/2] scop GFP_NOFS api

2016-04-30 Thread NeilBrown
dcache seems rather late to be performing that unlink though, so I've probably missed some key detail. If we find a way for evict(), when called from the shrinker, to be non-blocking, and generally require all shrinkers to be non-blocking, then many of these allocation problems evaporate. Thanks, NeilBrown

Re: [Cluster-devel] [PATCH 0/2] scop GFP_NOFS api

2016-04-30 Thread NeilBrown
On Fri, Apr 29 2016, Steven Whitehouse wrote: > Hi, > > On 29/04/16 06:35, NeilBrown wrote: >> If we could similarly move evict() into kswapd (and I believe we can) >> then most file systems would do nothing in reclaim context that >> interferes with allocation con

Re: [PATCH 0/2] scop GFP_NOFS api

2016-04-30 Thread NeilBrown
On Sat, Apr 30 2016, Dave Chinner wrote: > On Fri, Apr 29, 2016 at 03:35:42PM +1000, NeilBrown wrote: >> On Tue, Apr 26 2016, Michal Hocko wrote: >> >> > Hi, >> > we have discussed this topic at LSF/MM this year. There was a general >> > interest in t

Re: [PATCH 0/2] scop GFP_NOFS api

2016-05-03 Thread NeilBrown
On Wed, May 04 2016, Michal Hocko wrote: > Hi, > > On Sun 01-05-16 07:55:31, NeilBrown wrote: > [...] >> One particular problem with your process-context idea is that it isn't >> inherited across threads. >> Steve Whitehouse's example in gfs shows how allocation

Re: [PATCH 0/2] scop GFP_NOFS api

2016-05-05 Thread NeilBrown
rn for all filesystems to follow. Rather the mm/vfs should get out of the filesystems' way as much as possible and let them innovate independently. Thanks for your time, NeilBrown

Re: [PATCH 1/2] Btrfs: be more precise on errors when getting an inode from disk

2016-07-21 Thread NeilBrown
On Fri, Jul 22 2016, J. Bruce Fields wrote: > On Fri, Jul 22, 2016 at 11:08:17AM +1000, NeilBrown wrote: >> On Fri, Jun 10 2016, fdman...@kernel.org wrote: >> >> > From: Filipe Manana <fdman...@suse.com> >> > >> > When we attempt to rea

Re: [PATCH 1/2] Btrfs: be more precise on errors when getting an inode from disk

2016-07-21 Thread NeilBrown
c invalid ones. Bruce: do you have an opinion where we should make sure that PUTFH (and various other requests) returns a valid error code? Thanks, NeilBrown

Re: [PATCH] exportfs: be careful to only return expected errors.

2016-08-04 Thread NeilBrown
On Thu, Aug 04 2016, Christoph Hellwig wrote: > On Thu, Aug 04, 2016 at 10:19:06AM +1000, NeilBrown wrote: >> >> >> When nfsd calls fh_to_dentry, it expects ESTALE or ENOMEM as errors. >> In particular it can be tempting to return ENOENT, but this is not >> ha

[PATCH] exportfs: be careful to only return expected errors.

2016-08-03 Thread NeilBrown
. This is safest. Signed-off-by: NeilBrown <ne...@suse.com> --- I didn't add a dprintk for unexpected error messages, partly because dprintk isn't usable in exportfs. I could have used pr_debug() but I really didn't see much value. This has been tested together with the btrfs change, and it restores c

Re: [PATCH] exportfs: be careful to only return expected errors.

2016-10-06 Thread NeilBrown
On Thu, Aug 04 2016, NeilBrown wrote: > > > When nfsd calls fh_to_dentry, it expects ESTALE or ENOMEM as errors. > In particular it can be tempting to return ENOENT, but this is not > handled well by nfsd. > > Rather than requiring strict adherence to error code code fil
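The quoted description says nfsd expects only ESTALE or ENOMEM from fh_to_dentry. One way to enforce that at a single choke point is sketched below, assuming that every other error is folded into ESTALE. The helper name is hypothetical and the snippet is user-space C, not the exportfs patch itself.

```c
#include <errno.h>

/*
 * Hypothetical helper: pass through success and the two errors the
 * caller is prepared for, and collapse anything else into -ESTALE.
 */
static int sketch_sanitize_fh_error(int err)
{
    if (err >= 0)
        return err;          /* not an error */
    if (err == -ENOMEM || err == -ESTALE)
        return err;          /* expected by the caller */
    return -ESTALE;          /* e.g. -ENOENT or -EIO from a filesystem */
}
```

Doing the mapping in one place means individual filesystems do not each have to be audited for stray error codes.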

Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization

2017-04-04 Thread NeilBrown
s is written to stable storage before the change. If a file has not been changed for a while, you can add crash-count*large-number and clear lsb. The lsb of the i_version would be for internal use only. It would not be visible outside the filesystem. It feels a bit clunky, but I think it would work and is the best combination of Jan's idea and your requirement. The biggest cost would be switching to 'odd' before any changes, and the unknown is when does it make sense to switch to 'even'. NeilBrown

Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization

2017-04-04 Thread NeilBrown
remount. So it just > reboots without a remount-ro. This uncovered a bug in grub in Filesystems could use register_reboot_notifier() to get a notification that even systemd cannot stuff-up. It could check for dirty data and, if there is none (which there shouldn't be if a sync happened), it does a single write to disk to update the superblock (or a single write to each disk... or something). md does this, because getting the root device to be marked read-only is even harder than getting the root filesystem to be remounted read-only. NeilBrown

Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization

2017-04-05 Thread NeilBrown
On Wed, Apr 05 2017, Jan Kara wrote: > On Wed 05-04-17 11:43:32, NeilBrown wrote: >> On Tue, Apr 04 2017, J. Bruce Fields wrote: >> >> > On Thu, Mar 30, 2017 at 02:35:32PM -0400, Jeff Layton wrote: >> >> On Thu, 2017-03-30 at 12:12 -0400, J. Bruce Fields wrote

Re: [RFC PATCH v1 29/30] fs: track whether the i_version has been queried with an i_state flag

2017-03-03 Thread NeilBrown
>i_lock); > + ret = inode->i_state & I_VERS_BUMP; > + spin_unlock(&inode->i_lock); > + return ret; > } > I know this code gets removed, so this isn't really important. But why do you take the spinlock here? What are you racing against? Thanks, NeilBrown

Re: [RFC PATCH v1 30/30] fs: convert i_version counter over to an atomic64_t

2017-03-03 Thread NeilBrown
er has changed" state. But mostly, this would mean we wouldn't need inode_cmp_iversion() at all. We could just use "<" or ">=" or whatever. The number returned by inode_get_iversion() would always be even (or maybe odd) and wrapping (after the end of time) would "just work". Thanks, NeilBrown
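One possible reading of the fragment above is that the low bit of the counter is reserved as an internal flag and masked out of the value callers see, so the reported number is always even and can be compared with plain operators. The user-space sketch below follows that reading. The `sketch_*` names, the "queried" meaning of the flag, and the rule of skipping a bump when nobody has looked are assumptions for illustration, not the proposed kernel API.

```c
#include <stdint.h>

#define SKETCH_QUERIED 1ULL  /* low bit: internal flag, never reported */

/* Hypothetical stand-in for the inode's version counter. */
struct sketch_inode { uint64_t raw; };

/* Report the version with the flag bit cleared, and note that it was read. */
static uint64_t sketch_get_iversion(struct sketch_inode *i)
{
    i->raw |= SKETCH_QUERIED;
    return i->raw & ~SKETCH_QUERIED;  /* always even */
}

/* Bump by 2 so the flag bit stays free; skip if nobody has looked since. */
static void sketch_bump_iversion(struct sketch_inode *i)
{
    if (i->raw & SKETCH_QUERIED)
        i->raw = (i->raw & ~SKETCH_QUERIED) + 2;
}
```

Under these assumptions two reported values can be compared directly with "<" or ">=", since the flag bit never reaches the caller.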

Re: [RFC PATCH v1 11/30] fs: new API for handling i_version

2017-03-03 Thread NeilBrown
rather than the _read version. Surely you need to know about any changes after this read. Thanks, NeilBrown

Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization

2017-05-11 Thread NeilBrown
u_to_be32(convert_to_wallclock(exp->cd->flush_time)); > *p++ = 0; > } else if (IS_I_VERSION(inode)) { > - p = xdr_encode_hyper(p, inode->i_version); > + p = xdr_encode_hyper(p, nfsd4_change_attribute(inode)); > } else { &

Re: [PATCH v4 13/27] lib: add errseq_t type and infrastructure for handling it

2017-05-09 Thread NeilBrown
ff Layton <jlay...@redhat.com> I like that this is a separate lib/*.c - nicely structured too. Reviewed-by: NeilBrown <ne...@suse.com> Thanks, NeilBrown > --- > include/linux/errseq.h | 19 + > lib/Makefile | 2 +- > lib/errseq.c | 199 > ++

Re: [PATCH 00/13] convert block layer to bioset_init()/mempool_init()

2018-05-20 Thread NeilBrown
to bioset_init()/mempool_init() > bcache: convert to bioset_init()/mempool_init() > md: convert to bioset_init()/mempool_init() Hi Kent, this conversion looks really good, thanks for Ccing me on it. However as Shaohua Li is now the maintainer of md, it probably should have gone to

Re: [PATCH v4 01/19] fs: new API for handling inode->i_version

2017-12-22 Thread NeilBrown
LINUX_IVERSION_H > +#define _LINUX_IVERSION_H > + > +#include > + > +/* > + * The change attribute (i_version) is mandated by NFSv4 and is mostly for > + * knfsd, but is also used for other purposes (e.g. IMA). The i_version must > + * appear different to observers if there