Re: [PATCH 00/14 RFC] Btrfs: Add journal for raid5/6 writes
On 2017-08-03 06:02, Duncan wrote:
> Liu Bo posted on Wed, 02 Aug 2017 14:27:21 -0600 as excerpted:
>
>> It is correct reading this as: all data is written two times ?
>
> If as is being discussed the log is mirrored by default that'd be three
> times...

And for raid6 you need to do it 4 times... (!)

> Parity-raid is slow and of course normally has the infamous write hole
> this patch set is trying to close. Yes, closing the write hole is
> possible, but for sure it's going to make the performance bite of
> parity-raid even worse. =:^(

This is the reason for looking for a possible optimization from the beginning: a full-stripe write (datacow only) doesn't require logging at all. This could be a big optimization (if you need to write a lot of data, only the head and tail are NOT full stripes). However, this requires knowing whether the data is [no]cow at the time it is logged, and I think that is not so simple: possible, but not simple.

>> Or are logged only the stripes involved by a RMW cycle (i.e. if a
>> stripe is fully written, the log is bypassed )?
>
> For data, only data in bios from high level will be logged, while for
> parity, the whole parity will be logged.
>
> Full stripe write still logs all data and parity, as full stripe write
> may not survive from unclean shutdown.

>>> Does this matter ? Due to the COW nature of BTRFS if a transaction is
>>> interrupted (by an unclean shutdown) the transaction data are all lost.
>>> Am I missing something ?
>>>
>>> What I want to understand, is if it is possible to log only the
>>> "partial stripe" RMW cycle.
>>
>> I think your point is valid if all data is written with datacow. In
>> case of nodatacow, btrfs does overwrite in place, so a full stripe
>> write may pollute on-disk data after unclean shutdown. Checksum can
>> detect errors but repair thru raid5 may not recover the correct data.
>
> But nodatacow doesn't have checksum...

True, but Liu is correct stating that a "nocow" write is not protected by a transaction.

The funny part is that in the raid5 case we need to duplicate the data written for the nocow case, while for the cow case it would be possible to avoid it (in the full-stripe case)!

--
gpg @keyserver.linux.it: Goffredo Baroncelli
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/14 RFC] Btrfs: Add journal for raid5/6 writes
Liu Bo posted on Wed, 02 Aug 2017 14:27:21 -0600 as excerpted:

>> It is correct reading this as: all data is written two times ?

If as is being discussed the log is mirrored by default that'd be three times...

Parity-raid is slow and of course normally has the infamous write hole this patch set is trying to close. Yes, closing the write hole is possible, but for sure it's going to make the performance bite of parity-raid even worse. =:^(

>> Or are logged only the stripes involved by a RMW cycle (i.e. if a
>> stripe is fully written, the log is bypassed )?
>
> For data, only data in bios from high level will be logged, while for
> parity, the whole parity will be logged.
>
> Full stripe write still logs all data and parity, as full stripe
> write may not survive from unclean shutdown.

>> Does this matter ? Due to the COW nature of BTRFS if a transaction is
>> interrupted (by an unclean shutdown) the transaction data are all lost.
>> Am I missing something ?
>>
>> What I want to understand, is if it is possible to log only the
>> "partial stripe" RMW cycle.
>
> I think your point is valid if all data is written with datacow. In
> case of nodatacow, btrfs does overwrite in place, so a full stripe
> write may pollute on-disk data after unclean shutdown. Checksum can
> detect errors but repair thru raid5 may not recover the correct data.

But nodatacow doesn't have checksum...

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: Massive loss of disk space
Goffredo Baroncelli posted on Wed, 02 Aug 2017 19:52:30 +0200 as excerpted:

> it seems that BTRFS always allocate the maximum space required, without
> consider the one already allocated. Is it too conservative ? I think no:
> consider the following scenario:
>
> a) create a 2GB file
> b) fallocate -o 1GB -l 2GB
> c) write from 1GB to 3GB
>
> after b), the expectation is that c) always succeed [1]: i.e. there is
> enough space on the filesystem. Due to the COW nature of BTRFS, you
> cannot rely on the already allocated space because there could be a
> small time window where both the old and the new data exists on the
> disk.

Not only a small time, perhaps (effectively) permanently, due to either of two factors:

1) If the existing extents are reflinked by snapshots or other files they obviously won't be released at all when the overwrite is completed. fallocate must account for this possibility, and behaving differently in the context of other reflinks would be confusing, so the best policy is to consistently behave as if the existing data will not be freed.

2) As the devs have commented a number of times, an extent isn't freed if there's still a reflink to part of it. If the original extent was a full 1 GiB data chunk (the chunk being the max size of a native btrfs extent, one of the reasons a balance and defrag after conversion from ext4 and deletion of the ext4-saved subvolume is recommended: to break up the longer ext4 extents so they won't cause btrfs problems later) and all but a single 4 KiB block has been rewritten, the full 1 GiB extent will remain referenced and continue to take that original full 1 GiB of space, *plus* the space of all the new-version extents of the overwritten data, of course.

So in our fallocate-and-overwrite scenario, we again must reserve space for two copies of the data: the original, which may well not be freed even without other reflinks if a single 4 KiB block of an extent remains unoverwritten, and the new version of the data.

At least that /was/ the behavior explained on-list previous to the hole-punching changes. I'm not a dev and haven't seen a dev comment on whether that remains the behavior after hole-punching, which may at least naively be expected to automatically handle and free overwritten data. I'd be interested in seeing someone who can read the code confirm one way or the other whether hole-punching changed that previous behavior.

> My opinion is that in general this behavior is correct due to the COW
> nature of BTRFS.
> The only exception that I can find, is about the "nocow" file. For
> these cases taking in accout the already allocated space would be
> better.

I'd say it's dangerously optimistic even then, considering that "nocow" is actually "cow1" in the presence of snapshots.

Meanwhile, it's worth keeping in mind that it's exactly these sorts of corner-cases that are why btrfs is taking so long to stabilize. Supposedly "simple" expectations aren't always so simple, and if a filesystem gets it wrong, it's somebody's data hanging in the balance!

(Tho if they've any wisdom at all, they'll ensure they're aware of the stability status of a filesystem before they put data on it, and will adjust their backup policies accordingly if they're using a still not fully stabilized filesystem such as btrfs, so the data won't actually be in any danger anyway unless it had literally throw-away value, only whatever specific instance of it was involved in that corner-case.)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
On Wed, Aug 2, 2017 at 2:38 AM, Brendan Hide wrote:
> The title seems alarmist to me - and I suspect it is going to be
> misconstrued. :-/

Josef pushed back on the HN thread with very sound reasoning about why this is totally unsurprising. RHEL runs old kernels, and they have no upstream Btrfs developers, so it's a huge PITA to backport the tons of changes Btrfs has been going through (thousands of lines changed per kernel cycle).

What's more interesting to me is whether this means

- CONFIG_BTRFS_FS=m
+ # CONFIG_BTRFS_FS is not set

in particular in elrepo.org kernels.

Also more interesting is this Stratis project that started up a few months ago:
https://github.com/stratis-storage/stratisd

Which also includes this design document:
https://stratis-storage.github.io/StratisSoftwareDesign.pdf

Basically they're creating a filesystem manager manifesting as a daemon, new CLI tools, and new metadata formats for the volume manager. So it's going to use the existing device mapper, md, some LVM stuff, and XFS in a layered approach abstracted from the user.

--
Chris Murphy
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
On Thu, Aug 3, 2017 at 1:44 AM, Chris Mason wrote:
>
> On 08/02/2017 04:38 AM, Brendan Hide wrote:
>>
>> The title seems alarmist to me - and I suspect it is going to be
>> misconstrued. :-/
>
> Supporting any filesystem is a huge amount of work. I don't have a
> problem with Redhat or any distro picking and choosing the projects
> they want to support.

It'd help a lot of people if things like https://btrfs.wiki.kernel.org/index.php/Status were kept up-to-date and 'promoted', so at least users are more informed about what they're getting into and can choose which features (stable / still in dev / likely to destroy your data) they want to use.

For example, https://btrfs.wiki.kernel.org/index.php/Status says compression is 'mostly OK' ('auto-repair and compression may crash' looks pretty scary, as from a newcomer's perspective it might be interpreted as 'potential data loss'), while https://en.opensuse.org/SDB:BTRFS#Compressed_btrfs_filesystems says they support compression on newer openSUSE versions.

> At least inside of FB, our own internal btrfs usage is continuing to
> grow. Btrfs is becoming a big part of how we ship containers and other
> workloads where snapshots improve performance.

Ubuntu also supports btrfs as part of their container implementation (lxd), and (reading the lxd mailing list) some people use lxd+btrfs in their production environments. IIRC the last problem posted on the lxd list about btrfs was about how 'btrfs send/receive (used by lxd copy) is slower than rsync for full/initial copy'.

--
Fajar
[PATCH] Btrfs: search parity device wisely
After mapping the block with BTRFS_MAP_WRITE, parities have been sorted to the end position, so this search can start from the first parity stripe.

Signed-off-by: Liu Bo
---
 fs/btrfs/raid56.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index d8ea0eb..0c5ed68 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2225,12 +2225,13 @@ raid56_parity_alloc_scrub_rbio(struct btrfs_fs_info *fs_info, struct bio *bio,
 	ASSERT(!bio->bi_iter.bi_size);
 	rbio->operation = BTRFS_RBIO_PARITY_SCRUB;
 
-	for (i = 0; i < rbio->real_stripes; i++) {
+	for (i = rbio->data_stripes; i < rbio->real_stripes; i++) {
 		if (bbio->stripes[i].dev == scrub_dev) {
 			rbio->scrubp = i;
 			break;
 		}
 	}
+	ASSERT(i < rbio->real_stripes);
 
 	/* Now we just support the sectorsize equals to page size */
 	ASSERT(fs_info->sectorsize == PAGE_SIZE);
-- 
2.9.4
Re: [PATCH 00/14 RFC] Btrfs: Add journal for raid5/6 writes
On Wed, Aug 02, 2017 at 10:41:30PM +0200, Goffredo Baroncelli wrote:
> Hi Liu,
>
> thanks for your reply, below my comments
>
> On 2017-08-02 19:57, Liu Bo wrote:
>> On Wed, Aug 02, 2017 at 12:14:27AM +0200, Goffredo Baroncelli wrote:
>>> On 2017-08-01 19:24, Liu Bo wrote:
>>>> On Tue, Aug 01, 2017 at 07:42:14PM +0200, Goffredo Baroncelli wrote:
>>>>> Hi Liu,
>>>>>
>>>>> On 2017-08-01 18:14, Liu Bo wrote:
>>>>>> This aims to fix write hole issue on btrfs raid5/6 setup by adding
>>>>>> a separate disk as a journal (aka raid5/6 log), so that after
>>>>>> unclean shutdown we can make sure data and parity are consistent
>>>>>> on the raid array by replaying the journal.
>>>>>
>>>>> it would be possible to have more information ?
>>>>> - what is logged ? data, parity or data + parity ?
>>>>
>>>> Patch 5 has more details (sorry for not making it clear in the
>>>> cover letter).
>>>>
>>>> So both data and parity are logged so that while replaying the
>>>> journal everything is written to whichever disk it should be
>>>> written to.
>>>
>>> It is correct reading this as: all data is written two times ? Or are
>>> logged only the stripes involved by a RMW cycle (i.e. if a stripe is
>>> fully written, the log is bypassed )?
>>
>> For data, only data in bios from high level will be logged, while for
>> parity, the whole parity will be logged.
>>
>> Full stripe write still logs all data and parity, as full stripe write
>> may not survive from unclean shutdown.
>
> Does this matter ? Due to the COW nature of BTRFS if a transaction is
> interrupted (by an unclean shutdown) the transaction data are all lost.
> Am I missing something ?
>
> What I want to understand, is if it is possible to log only the
> "partial stripe" RMW cycle.

I think your point is valid if all data is written with datacow. In case of nodatacow, btrfs does overwrite in place, so a full stripe write may pollute on-disk data after unclean shutdown. Checksum can detect errors but repair thru raid5 may not recover the correct data.

>> Taking a raid5 setup with 3 disks as an example, doing an overwrite
>> of 4k will log 4K(data) + 64K(parity).
>>
>>>>> - in the past I thought that it would be sufficient to log only the
>>>>> stripe position involved by a RMW cycle, and then start a scrub on
>>>>> these stripes in case of an unclean shutdown: do you think that it
>>>>> is feasible ?
>>>>
>>>> An unclean shutdown causes inconsistence between data and parity, so
>>>> scrub won't help as it's not able to tell which one (data or parity)
>>>> is valid.
>>>
>>> Scrub compares data against its checksum; so it knows if the data is
>>> correct. If no disk is lost, a scrub process is sufficient/needed to
>>> rebuild the parity/data.
>>
>> If no disk is lost, it depends on whether the number of errors caused
>> by an unclean shutdown can be tolerated by the raid setup.
>
> see below
>
>>> The problem born when after "an unclean shutdown" a disk failure
>>> happens. But these are *two* distinct failures. These together break
>>> the BTRFS raid5 redundancy. But if you run a scrub process between
>>> these two failures, the btrfs raid5 redundancy is still effective.
>>
>> I wouldn't say that the redundancy is still effective after a scrub
>> process, but rather those data which match their checksum can still be
>> read out while the mismatched data are lost forever after unclean
>> shutdown.
>
> I think that this is the point where we are in disagreement: until now
> I understood that in BTRFS
> a) a transaction is fully completed or fully not-completed.
> b) a transaction is completed after both the data *and* the parity are
> written.
>
> With these assumptions, due to the COW nature of BTRFS an unclean
> shutdown might invalidate only data of the current transaction. Of
> course the unclean shutdown prevents the transaction from being
> completed, and this means that all the data of this transaction is
> lost in any case.
>
> For the parity this is different, because it is possible a
> misalignment between the parity and the data (which might be of
> different transactions).
>
> Let me to explain with the help of your example:
>
>> Taking a raid5 setup with 3 disks as an example, doing an overwrite
>> of 4k will log 4K(data) + 64K(parity).
>
> If the transaction is aborted, 128k-4k = 124k are untouched, and these
> still be valid. The last 4k might be wrong, but in any case this data
> is not referenced because the transaction was never completed.
> The parity need to be rebuild because we are not able to know if the
> transaction was aborted before/after the data and/or parity writing

True, 4k data is not referenced, but again after rebuilding the parity, the rest 124K and the 4k which has random data are not consistent with the rebuilt parity. The point is to keep parity and data consistent at any point of time so that raid5 tolerance is
Re: Massive loss of disk space
On 2017-08-02 21:10, Austin S. Hemmelgarn wrote:
> On 2017-08-02 13:52, Goffredo Baroncelli wrote:
>> Hi,
>> [...]
>> consider the following scenario:
>>
>> a) create a 2GB file
>> b) fallocate -o 1GB -l 2GB
>> c) write from 1GB to 3GB
>>
>> after b), the expectation is that c) always succeed [1]: i.e. there is
>> enough space on the filesystem. Due to the COW nature of BTRFS, you
>> cannot rely on the already allocated space because there could be a
>> small time window where both the old and the new data exists on the
>> disk.
>
> There is also an expectation based on pretty much every other FS in
> existence that calling fallocate() on a range that is already in use is
> a (possibly expensive) no-op, and by extension using fallocate() with
> an offset of 0 like a ftruncate() call will succeed as long as the new
> size will fit.

The man page of fallocate doesn't guarantee that. Unfortunately in a COW filesystem the assumption that an allocated area may be simply overwritten is not true. Let me say it in other words: as a general rule, if you want to _write_ something in a COW filesystem, you need space. It doesn't matter if you are *over-writing* existing data or *appending* to a file.

> I've checked JFS, XFS, ext4, vfat, NTFS (via NTFS-3G, not the kernel
> driver), NILFS2, OCFS2 (local mode only), F2FS, UFS, and HFS+ on Linux,
> UFS and HFS+ on OS X, UFS and ZFS on FreeBSD, FFS (UFS with a different
> name) and LFS (log structured) on NetBSD, UFS and ZFS on Solaris, and
> VxFS on HP-UX, and _all_ of them behave correctly here and succeed with
> the test I listed, while BTRFS does not. This isn't codified in POSIX,
> but it's also not something that is listed as implementation defined,
> which in turn means that we should be trying to match the other
> implementations.

[...]

>> My opinion is that in general this behavior is correct due to the COW
>> nature of BTRFS.
>> The only exception that I can find, is about the "nocow" file. For
>> these cases taking in accout the already allocated space would be
>> better.
>
> There are other, saner ways to make that expectation hold though, and
> I'm not even certain that it does as things are implemented (I believe
> we still CoW unwritten extents when data is written to them, because I
> _have_ had writes to fallocate'ed files fail on BTRFS before with
> -ENOSPC).
>
> The ideal situation IMO is as follows:
>
> 1. This particular case (using fallocate() with an offset of 0 to
> extend a file that is already larger than half the remaining free space
> on the FS) _should_ succeed.

This description is not accurate. What happened is the following:
1) you have a file *with valid data*
2) you want to prepare an update of this file and want to be sure to have enough space

At this point fallocate has to guarantee:
a) your old data is still available
b) the space for the update is allocated

In terms of a COW filesystem, you need the space of a) + the space of b).

> Short of very convoluted configurations, extending a file with
> fallocate will not result in over-committing space on a CoW filesystem
> unless it would extend the file by more than the remaining free space,
> and therefore barring long external interactions, subsequent writes
> will also succeed. Proof of this for a general case is somewhat
> complicated, but in the very specific case of the script I posted as a
> reproducer in the other thread about this and the test case I gave in
> this thread, it's trivial to prove that the writes will succeed.
> Either way, the behavior of SnapRAID, while not optimal in this case,
> is still a legitimate usage (I've seen programs do things like that
> just to make sure the file isn't sparse).
>
> 2. Conversion of unwritten extents to written ones should not require
> new allocation. Ideally, we need to be allocating not just space for
> the data, but also reasonable space for the associated metadata when
> allocating an unwritten extent, and there should be no CoW involved
> when they are written to except for the small metadata updates required
> to account the new blocks. Unless we're doing this, then we have edge
> cases where the above listed expectation does not hold (also note that
> GlobalReserve does not count IMO, it's supposed to be for temporary
> usage only and doesn't ever appear to be particularly large).
>
> 3. There should be some small amount of space reserved globally for not
> just metadata, but data too, so that a 'full' filesystem can still
> update existing files reliably. I'm not sure that we're not doing this
> already, but AIUI, GlobalReserve is metadata only. If we do this, we
> don't have to worry _as much_ about avoiding CoW when converting
> unwritten extents to regular ones.

>> Comments are welcome.
>>
>> BR
>> G.Baroncelli
>>
>> [1] from man 2 fallocate
>> [...]
>> After a successful call, subsequent writes into the range
>> specified by
Re: [PATCH 00/14 RFC] Btrfs: Add journal for raid5/6 writes
Hi Liu,

thanks for your reply, below my comments

On 2017-08-02 19:57, Liu Bo wrote:
> On Wed, Aug 02, 2017 at 12:14:27AM +0200, Goffredo Baroncelli wrote:
>> On 2017-08-01 19:24, Liu Bo wrote:
>>> On Tue, Aug 01, 2017 at 07:42:14PM +0200, Goffredo Baroncelli wrote:
>>>> Hi Liu,
>>>>
>>>> On 2017-08-01 18:14, Liu Bo wrote:
>>>>> This aims to fix write hole issue on btrfs raid5/6 setup by adding
>>>>> a separate disk as a journal (aka raid5/6 log), so that after
>>>>> unclean shutdown we can make sure data and parity are consistent
>>>>> on the raid array by replaying the journal.
>>>>
>>>> it would be possible to have more information ?
>>>> - what is logged ? data, parity or data + parity ?
>>>
>>> Patch 5 has more details (sorry for not making it clear in the
>>> cover letter).
>>>
>>> So both data and parity are logged so that while replaying the
>>> journal everything is written to whichever disk it should be
>>> written to.
>>
>> It is correct reading this as: all data is written two times ? Or are
>> logged only the stripes involved by a RMW cycle (i.e. if a stripe is
>> fully written, the log is bypassed )?
>
> For data, only data in bios from high level will be logged, while for
> parity, the whole parity will be logged.
>
> Full stripe write still logs all data and parity, as full stripe write
> may not survive from unclean shutdown.

Does this matter ? Due to the COW nature of BTRFS if a transaction is interrupted (by an unclean shutdown) the transaction data are all lost. Am I missing something ?

What I want to understand, is if it is possible to log only the "partial stripe" RMW cycle.

> Taking a raid5 setup with 3 disks as an example, doing an overwrite
> of 4k will log 4K(data) + 64K(parity).
>
>>>> - in the past I thought that it would be sufficient to log only the
>>>> stripe position involved by a RMW cycle, and then start a scrub on
>>>> these stripes in case of an unclean shutdown: do you think that it
>>>> is feasible ?
>>>
>>> An unclean shutdown causes inconsistence between data and parity, so
>>> scrub won't help as it's not able to tell which one (data or parity)
>>> is valid.
>>
>> Scrub compares data against its checksum; so it knows if the data is
>> correct. If no disk is lost, a scrub process is sufficient/needed to
>> rebuild the parity/data.
>
> If no disk is lost, it depends on whether the number of errors caused
> by an unclean shutdown can be tolerated by the raid setup.

see below

>> The problem born when after "an unclean shutdown" a disk failure
>> happens. But these are *two* distinct failures. These together break
>> the BTRFS raid5 redundancy. But if you run a scrub process between
>> these two failures, the btrfs raid5 redundancy is still effective.
>
> I wouldn't say that the redundancy is still effective after a scrub
> process, but rather those data which match their checksum can still be
> read out while the mismatched data are lost forever after unclean
> shutdown.

I think that this is the point where we are in disagreement: until now I understood that in BTRFS
a) a transaction is fully completed or fully not-completed.
b) a transaction is completed after both the data *and* the parity are written.

With these assumptions, due to the COW nature of BTRFS an unclean shutdown might invalidate only data of the current transaction. Of course the unclean shutdown prevents the transaction from being completed, and this means that all the data of this transaction is lost in any case.

For the parity this is different, because a misalignment between the parity and the data (which might belong to different transactions) is possible.

Let me explain with the help of your example:

> Taking a raid5 setup with 3 disks as an example, doing an overwrite
> of 4k will log 4K(data) + 64K(parity).

If the transaction is aborted, 128k-4k = 124k are untouched, and these are still valid. The last 4k might be wrong, but in any case this data is not referenced because the transaction was never completed. The parity needs to be rebuilt because we are not able to know if the transaction was aborted before/after the data and/or parity writing.

> Thanks,
>
> -liubo
>
>> With nodatacow, we do overwrite, so RMW during unclean shutdown is
>> not safe.
>> With datacow, we don't do overwrite, but the following situation may
>> happen, say we have a raid5 setup with 3 disks, the stripe length is
>> 64k, so
>>
>> 1) write 64K --> now the raid layout is
>> [64K data + 64K random + 64K parity]
>> 2) write another 64K --> now the raid layout after RMW is
>> [64K 1)'s data + 64K 2)'s data + 64K new parity]
>>
>> If unclean shutdown occurs before 2) finishes, then parity may be
>> corrupted and then 1)'s data may be recovered wrongly if the disk
>> which holds 1)'s data is offline.
>>
>>>> - does this journal disk also host other btrfs log ?
>>>
>>> No, purely data/parity and some associated metadata.
>>>
>>> Thanks,
>>>
>>> -liubo
[PATCH] btrfs: pass fs_info to routines that always take tree_root
From: Jeff Mahoney

btrfs_find_root and btrfs_del_root always use the tree_root. Let's pass fs_info instead.

Signed-off-by: Jeff Mahoney
---
 fs/btrfs/ctree.h           |  7 ++++---
 fs/btrfs/disk-io.c         |  2 +-
 fs/btrfs/extent-tree.c     |  4 ++--
 fs/btrfs/free-space-tree.c |  2 +-
 fs/btrfs/qgroup.c          |  3 +--
 fs/btrfs/root-tree.c       | 15 +++++++++++----
 6 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3f3eb7b17cac..eed7cc991a80 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2973,8 +2973,8 @@ int btrfs_del_root_ref(struct btrfs_trans_handle *trans,
 		       struct btrfs_fs_info *fs_info,
 		       u64 root_id, u64 ref_id, u64 dirid, u64 *sequence,
 		       const char *name, int name_len);
-int btrfs_del_root(struct btrfs_trans_handle *trans, struct btrfs_root *root,
-		   const struct btrfs_key *key);
+int btrfs_del_root(struct btrfs_trans_handle *trans,
+		   struct btrfs_fs_info *fs_info, const struct btrfs_key *key);
 int btrfs_insert_root(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 		      const struct btrfs_key *key,
 		      struct btrfs_root_item *item);
@@ -2982,7 +2982,8 @@ int __must_check btrfs_update_root(struct btrfs_trans_handle *trans,
 				   struct btrfs_root *root,
 				   struct btrfs_key *key,
 				   struct btrfs_root_item *item);
-int btrfs_find_root(struct btrfs_root *root, const struct btrfs_key *search_key,
+int btrfs_find_root(struct btrfs_fs_info *fs_info,
+		    const struct btrfs_key *search_key,
 		    struct btrfs_path *path, struct btrfs_root_item *root_item,
 		    struct btrfs_key *root_key);
 int btrfs_find_orphan_roots(struct btrfs_fs_info *fs_info);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 080e2ebb8aa0..ea1959937875 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1581,7 +1581,7 @@ static struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
 	__setup_root(root, fs_info, key->objectid);
-	ret = btrfs_find_root(tree_root, key, path,
+	ret = btrfs_find_root(fs_info, key, path,
 			      &root->root_item, &root->root_key);
 	if (ret) {
 		if (ret > 0)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 82d53a7b6652..12fa33accdcc 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9192,14 +9192,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
 	if (err)
 		goto out_end_trans;
 
-	ret = btrfs_del_root(trans, tree_root, &root->root_key);
+	ret = btrfs_del_root(trans, fs_info, &root->root_key);
 	if (ret) {
 		btrfs_abort_transaction(trans, ret);
 		goto out_end_trans;
 	}
 
 	if (root->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID) {
-		ret = btrfs_find_root(tree_root, &root->root_key, path,
+		ret = btrfs_find_root(fs_info, &root->root_key, path,
 				      NULL, NULL);
 		if (ret < 0) {
 			btrfs_abort_transaction(trans, ret);
diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c
index a5e34de06c2f..684f12247db7 100644
--- a/fs/btrfs/free-space-tree.c
+++ b/fs/btrfs/free-space-tree.c
@@ -1257,7 +1257,7 @@ int btrfs_clear_free_space_tree(struct btrfs_fs_info *fs_info)
 	if (ret)
 		goto abort;
 
-	ret = btrfs_del_root(trans, tree_root, &free_space_root->root_key);
+	ret = btrfs_del_root(trans, fs_info, &free_space_root->root_key);
 	if (ret)
 		goto abort;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 4ce351efe281..ba60523a443c 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -946,7 +946,6 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
 int btrfs_quota_disable(struct btrfs_trans_handle *trans,
 			struct btrfs_fs_info *fs_info)
 {
-	struct btrfs_root *tree_root = fs_info->tree_root;
 	struct btrfs_root *quota_root;
 	int ret = 0;
@@ -968,7 +967,7 @@ int btrfs_quota_disable(struct btrfs_trans_handle *trans,
 	if (ret)
 		goto out;
 
-	ret = btrfs_del_root(trans, tree_root, &quota_root->root_key);
+	ret = btrfs_del_root(trans, fs_info, &quota_root->root_key);
 	if (ret)
 		goto out;
diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index 460db0cb2d07..31c0e7265f44 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -62,7 +62,7 @@ static void btrfs_read_root_item(struct extent_buffer *eb, int slot,
 
 /*
  * btrfs_find_root - lookup the root by the key.
- * root: the root of the root tree
+ * fs_info: the fs_info for the file system to search
 * search_key: the key to
Re: [PATCH 01/14] Btrfs: raid56: add raid56 log via add_dev v2 ioctl
On 1.08.2017 19:14, Liu Bo wrote:
> This introduces add_dev_v2 ioctl to add a device as raid56 journal
> device. With the help of a journal device, raid56 is able to get
> rid of potential write holes.
>
> Signed-off-by: Liu Bo
> ---
>  fs/btrfs/ctree.h                |  6 ++
>  fs/btrfs/ioctl.c                | 48 -
>  fs/btrfs/raid56.c               | 42
>  fs/btrfs/raid56.h               |  1 +
>  fs/btrfs/volumes.c              | 26 --
>  fs/btrfs/volumes.h              |  3 ++-
>  include/uapi/linux/btrfs.h      |  3 +++
>  include/uapi/linux/btrfs_tree.h |  4
>  8 files changed, 125 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 643c70d..d967627 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -697,6 +697,7 @@ struct btrfs_stripe_hash_table {
>  void btrfs_init_async_reclaim_work(struct work_struct *work);
>  
>  /* fs_info */
> +struct btrfs_r5l_log;
>  struct reloc_control;
>  struct btrfs_device;
>  struct btrfs_fs_devices;
> @@ -1114,6 +1115,9 @@ struct btrfs_fs_info {
>  	u32 nodesize;
>  	u32 sectorsize;
>  	u32 stripesize;
> +
> +	/* raid56 log */
> +	struct btrfs_r5l_log *r5log;
>  };
>  
>  static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb)
> @@ -2932,6 +2936,8 @@ static inline int btrfs_need_cleaner_sleep(struct btrfs_fs_info *fs_info)
>  
>  static inline void free_fs_info(struct btrfs_fs_info *fs_info)
>  {
> +	if (fs_info->r5log)
> +		kfree(fs_info->r5log);
>  	kfree(fs_info->balance_ctl);
>  	kfree(fs_info->delayed_root);
>  	kfree(fs_info->extent_root);
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index e176375..3d1ef4d 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2653,6 +2653,50 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp)
>  	return ret;
>  }
>  
> +/* identical to btrfs_ioctl_add_dev, but this is with flags */
> +static long btrfs_ioctl_add_dev_v2(struct btrfs_fs_info *fs_info, void __user *arg)
> +{
> +	struct btrfs_ioctl_vol_args_v2 *vol_args;
> +	int ret;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags))
> +		return BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
> +
> +	mutex_lock(&fs_info->volume_mutex);
> +	vol_args = memdup_user(arg, sizeof(*vol_args));
> +	if (IS_ERR(vol_args)) {
> +		ret = PTR_ERR(vol_args);
> +		goto out;
> +	}
> +
> +	if (vol_args->flags & BTRFS_DEVICE_RAID56_LOG &&
> +	    fs_info->r5log) {
> +		ret = -EEXIST;
> +		btrfs_info(fs_info, "r5log: attempting to add another log device!");
> +		goto out_free;
> +	}
> +
> +	vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
> +	ret = btrfs_init_new_device(fs_info, vol_args->name, vol_args->flags);
> +	if (!ret) {
> +		if (vol_args->flags & BTRFS_DEVICE_RAID56_LOG) {
> +			ASSERT(fs_info->r5log);
> +			btrfs_info(fs_info, "disk added %s as raid56 log", vol_args->name);
> +		} else {
> +			btrfs_info(fs_info, "disk added %s", vol_args->name);
> +		}
> +	}
> +out_free:
> +	kfree(vol_args);
> +out:
> +	mutex_unlock(&fs_info->volume_mutex);
> +	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
> +	return ret;
> +}
> +
>  static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
>  {
>  	struct btrfs_ioctl_vol_args *vol_args;
> @@ -2672,7 +2716,7 @@ static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
>  	}
>  
>  	vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
> -	ret = btrfs_init_new_device(fs_info, vol_args->name);
> +	ret = btrfs_init_new_device(fs_info, vol_args->name, 0);
>  
>  	if (!ret)
>  		btrfs_info(fs_info, "disk added %s", vol_args->name);
> @@ -5539,6 +5583,8 @@ long btrfs_ioctl(struct file *file, unsigned int
>  		return btrfs_ioctl_resize(file, argp);
>  	case BTRFS_IOC_ADD_DEV:
>  		return btrfs_ioctl_add_dev(fs_info, argp);
> +	case BTRFS_IOC_ADD_DEV_V2:
> +		return btrfs_ioctl_add_dev_v2(fs_info, argp);
>  	case BTRFS_IOC_RM_DEV:
>  		return btrfs_ioctl_rm_dev(file, argp);
>  	case BTRFS_IOC_RM_DEV_V2:
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index d8ea0eb..2b91b95 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -177,6 +177,25 @@
struct btrfs_raid_bio { > unsigned long *dbitmap; > }; > > +/* raid56 log */ > +struct btrfs_r5l_log { > + /* protect this struct and log io */ > + struct mutex io_mutex; > + > + /* r5log device */ > + struct btrfs_device *dev; > + > + /* allocation range for log
Re: Massive loss of disk space
On 2017-08-02 13:52, Goffredo Baroncelli wrote: Hi, On 2017-08-01 17:00, Austin S. Hemmelgarn wrote: OK, I just did a dead simple test by hand, and it looks like I was right. The method I used to check this is as follows: 1. Create and mount a reasonably small filesystem (I used an 8G temporary LV for this, a file would work too though). 2. Using dd or a similar tool, create a test file that takes up half of the size of the filesystem. It is important that this _not_ be fallocated, but just written out. 3. Use `fallocate -l` to try and extend the size of the file beyond half the size of the filesystem. For BTRFS, this will result in -ENOSPC, while for ext4 and XFS, it will succeed with no error. Based on this and some low-level inspection, it looks like BTRFS treats the full range of the fallocate call as unallocated, and thus is trying to allocate space for regions of that range that are already allocated. I can confirm this behavior; below some step to reproduce it [2]; however I don't think that it is a bug, but this is the correct behavior for a COW filesystem (see below). Looking at the function btrfs_fallocate() (file fs/btrfs/file.c) static long btrfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len) { [...] alloc_start = round_down(offset, blocksize); alloc_end = round_up(offset + len, blocksize); [...] /* * Only trigger disk allocation, don't trigger qgroup reserve * * For qgroup space, it will be checked later. */ ret = btrfs_alloc_data_chunk_ondemand(BTRFS_I(inode), alloc_end - alloc_start) it seems that BTRFS always allocate the maximum space required, without consider the one already allocated. Is it too conservative ? I think no: consider the following scenario: a) create a 2GB file b) fallocate -o 1GB -l 2GB c) write from 1GB to 3GB after b), the expectation is that c) always succeed [1]: i.e. there is enough space on the filesystem. 
Due to the COW nature of BTRFS, you cannot rely on the already allocated space because there could be a small time window where both the old and the new data exists on the disk. There is also an expectation based on pretty much every other FS in existence that calling fallocate() on a range that is already in use is a (possibly expensive) no-op, and by extension using fallocate() with an offset of 0 like a ftruncate() call will succeed as long as the new size will fit. I've checked JFS, XFS, ext4, vfat, NTFS (via NTFS-3G, not the kernel driver), NILFS2, OCFS2 (local mode only), F2FS, UFS, and HFS+ on Linux, UFS and HFS+ on OS X, UFS and ZFS on FreeBSD, FFS (UFS with a different name) and LFS (log structured) on NetBSD, and UFS and ZFS on Solaris, and VxFS on HP-UX, and _all_ of them behave correctly here and succeed with the test I listed, while BTRFS does not. This isn't codified in POSIX, but it's also not something that is listed as implementation defined, which in turn means that we should be trying to match the other implementations. My opinion is that in general this behavior is correct due to the COW nature of BTRFS. The only exception that I can find, is about the "nocow" file. For these cases taking in accout the already allocated space would be better. There are other, saner ways to make that expectation hold though, and I'm not even certain that it does as things are implemented (I believe we still CoW unwritten extents when data is written to them, because I _have_ had writes to fallocate'ed files fail on BTRFS before with -ENOSPC). The ideal situation IMO is as follows: 1. This particular case (using fallocate() with an offset of 0 to extend a file that is already larger than half the remaining free space on the FS) _should_ succeed. 
Short of very convoluted configurations, extending a file with fallocate will not result in over-committing space on a CoW filesystem unless it would extend the file by more than the remaining free space, and therefore barring long external interactions, subsequent writes will also succeed. Proof of this for a general case is somewhat complicated, but in the very specific case of the script I posted as a reproducer in the other thread about this and the test case I gave in this thread, it's trivial to prove that the writes will succeed. Either way, the behavior of SnapRAID, while not optimal in this case, is still a legitimate usage (I've seen programs do things like that just to make sure the file isn't sparse). 2. Conversion of unwritten extents to written ones should not require new allocation. Ideally, we need to be allocating not just space for the data, but also reasonable space for the associated metadata when allocating an unwritten extent, and there should be no CoW involved when they are written to except for the small metadata updates required to account the new blocks. Unless we're
Re: [PATCH 00/14 RFC] Btrfs: Add journal for raid5/6 writes
On Wed, Aug 02, 2017 at 12:14:27AM +0200, Goffredo Baroncelli wrote: > On 2017-08-01 19:24, Liu Bo wrote: > > On Tue, Aug 01, 2017 at 07:42:14PM +0200, Goffredo Baroncelli wrote: > >> Hi Liu, > >> > >> On 2017-08-01 18:14, Liu Bo wrote: > >>> This aims to fix write hole issue on btrfs raid5/6 setup by adding a > >>> separate disk as a journal (aka raid5/6 log), so that after unclean > >>> shutdown we can make sure data and parity are consistent on the raid > >>> array by replaying the journal. > >>> > >> > >> it would be possible to have more information ? > >> - what is logged ? data, parity or data + parity ? > > > > Patch 5 has more details(sorry for not making it clear that in the > > cover letter). > > > > So both data and parity are logged so that while replaying the journal > > everything is written to whichever disk it should be written to. > > It is correct reading this as: all data is written two times ? Or are logged > only the stripes involved by a RMW cycle (i.e. if a stripe is fully written, > the log is bypassed )? For data, only data in bios from high level will be logged, while for parity, the whole parity will be logged. Full stripe write still logs all data and parity, as full stripe write may not survive from unclean shutdown. Taking a raid5 setup with 3 disks as an example, doing an overwrite of 4k will log 4K(data) + 64K(parity). > > > >> - in the past I thought that it would be sufficient to log only the stripe > >> position involved by a RMW cycle, and then start a scrub on these stripes > >> in case of an unclean shutdown: do you think that it is feasible ? > > > > An unclean shutdown causes inconsistence between data and parity, so > > scrub won't help as it's not able to tell which one (data or parity) > > is valid > Scrub compares data against its checksum; so it knows if the data is correct. > If no disk is lost, a scrub process is sufficient/needed to rebuild the > parity/data. 
> If no disk is lost, it depends on whether the number of errors caused by an unclean shutdown can be tolerated by the raid setup. > The problem born when after "an unclean shutdown" a disk failure happens. But > these are *two* distinct failures. These together break the BTRFS raid5 > redundancy. But if you run a scrub process between these two failures, the > btrfs raid5 redundancy is still effective. > I wouldn't say that the redundancy is still effective after a scrub process, but rather those data which match their checksum can still be read out while the mismatched data are lost forever after unclean shutdown. Thanks, -liubo > > > > > With nodatacow, we do overwrite, so RMW during unclean shutdown is not safe. > > With datacow, we don't do overwrite, but the following situation may happen, > > say we have a raid5 setup with 3 disks, the stripe length is 64k, so > > > > 1) write 64K --> now the raid layout is > > [64K data + 64K random + 64K parity] > > 2) write another 64K --> now the raid layout after RMW is > > [64K 1)'s data + 64K 2)'s data + 64K new parity] > > > > If unclean shutdown occurs before 2) finishes, then parity may be > > corrupted and then 1)'s data may be recovered wrongly if the disk > > which holds 1)'s data is offline. > > > >> - does this journal disk also host other btrfs log ? > >> > > > > No, purely data/parity and some associated metadata. > > > > Thanks, > > > > -liubo > > > >>> The idea and the code are similar to the write-through mode of md > >>> raid5-cache, so ppl(partial parity log) is also feasible to implement. > >>> (If you've been familiar with md, you may find this patch set is > >>> boring to read...) > >>> > >>> Patch 1-3 are about adding a log disk, patch 5-8 are the main part of > >>> the implementation, the rest patches are improvements and bugfixes, > >>> eg. readahead for recovery, checksum. 
> >>> > >>> Two btrfs-progs patches are required to play with this patch set, one > >>> is to enhance 'btrfs device add' to add a disk as raid5/6 log with the > >>> option '-L', the other is to teach 'btrfs-show-super' to show > >>> %journal_tail. > >>> > >>> This is currently based on 4.12-rc3. > >>> > >>> The patch set is tagged with RFC, and comments are always welcome, > >>> thanks. > >>> > >>> Known limitations: > >>> - Deleting a log device is not implemented yet. > >>> > >>> > >>> Liu Bo (14): > >>> Btrfs: raid56: add raid56 log via add_dev v2 ioctl > >>> Btrfs: raid56: do not allocate chunk on raid56 log > >>> Btrfs: raid56: detect raid56 log on mount > >>> Btrfs: raid56: add verbose debug > >>> Btrfs: raid56: add stripe log for raid5/6 > >>> Btrfs: raid56: add reclaim support > >>> Btrfs: raid56: load r5log > >>> Btrfs: raid56: log recovery > >>> Btrfs: raid56: add readahead for recovery > >>> Btrfs: raid56: use the readahead helper to get page > >>> Btrfs: raid56: add csum support > >>> Btrfs: raid56: fix error handling while adding a log device > >>> Btrfs: raid56: initialize raid5/6 log after
[PATCH 2/3] fixed android.mk
From: Filip BystrickySigned-off-by: Filip Bystricky Reviewed-by: Mark Salyzyn --- Android.mk | 53 + 1 file changed, 21 insertions(+), 32 deletions(-) diff --git a/Android.mk b/Android.mk index 52fe9ab4..9516c2d1 100644 --- a/Android.mk +++ b/Android.mk @@ -1,18 +1,19 @@ LOCAL_PATH:= $(call my-dir) -#include $(call all-subdir-makefiles) +# temporary flags to reduce the number of emitted warnings until they can be +# fixed properly +TEMP_CFLAGS := -Wno-pointer-arith -Wno-tautological-constant-out-of-range-compare \ + -Wno-sign-compare -Wno-format -Wno-unused-parameter CFLAGS := -g -O1 -Wall -D_FORTIFY_SOURCE=2 -include config.h \ - -DBTRFS_FLAT_INCLUDES -D_XOPEN_SOURCE=700 -fno-strict-aliasing -fPIC + -DBTRFS_FLAT_INCLUDES -D_XOPEN_SOURCE=700 -fno-strict-aliasing -fPIC \ + -Wno-macro-redefined -Wno-typedef-redefinition -Wno-address-of-packed-member \ + -Wno-missing-field-initializers $(TEMP_CFLAGS) -LDFLAGS := -static -rdynamic - -LIBS := -luuid -lblkid -lz -llzo2 -L. -lpthread -LIBBTRFS_LIBS := $(LIBS) - -STATIC_CFLAGS := $(CFLAGS) -ffunction-sections -fdata-sections -STATIC_LDFLAGS := -static -Wl,--gc-sections -STATIC_LIBS := -luuid -lblkid -luuid -lz -llzo2 -L. 
-pthread +STATIC_CFLAGS := $(CFLAGS) -ffunction-sections -fdata-sections \ + -D_GNU_SOURCE=1 \ + -DPACKAGE_STRING=\"btrfs\" \ + -DPACKAGE_URL=\"http://btrfs.wiki.kernel.org\; btrfs_shared_libraries := libext2_uuid \ libext2_blkid @@ -23,7 +24,8 @@ objects := ctree.c disk-io.c kernel-lib/radix-tree.c extent-tree.c print-tree.c qgroup.c free-space-cache.c kernel-lib/list_sort.c props.c \ kernel-shared/ulist.c qgroup-verify.c backref.c string-table.c task-utils.c \ inode.c file.c find-root.c free-space-tree.c help.c send-dump.c \ - fsfeatures.c kernel-lib/tables.c kernel-lib/raid56.c + fsfeatures.c raid56.c + cmds_objects := cmds-subvolume.c cmds-filesystem.c cmds-device.c cmds-scrub.c \ cmds-inspect.c cmds-balance.c cmds-send.c cmds-receive.c \ cmds-quota.c cmds-qgroup.c cmds-replace.c cmds-check.c \ @@ -38,12 +40,11 @@ libbtrfs_headers := send-stream.h send-utils.h send.h kernel-lib/rbtree.h btrfs- kernel-lib/crc32c.h kernel-lib/list.h kerncompat.h \ kernel-lib/radix-tree.h kernel-lib/sizes.h kernel-lib/raid56.h \ extent-cache.h extent_io.h ioctl.h ctree.h btrfsck.h version.h -TESTS := fsck-tests.sh convert-tests.sh -blkid_objects := partition/ superblocks/ topology/ - # external/e2fsprogs/lib is needed for uuid/uuid.h -common_C_INCLUDES := $(LOCAL_PATH) external/e2fsprogs/lib/ external/lzo/include/ external/zlib/ +common_C_INCLUDES := $(LOCAL_PATH) external/e2fsprogs/lib/ external/lzo/include/ external/zlib/ \ + $(LOCAL_PATH)/kernel-lib + #-- include $(CLEAR_VARS) @@ -56,23 +57,18 @@ include $(BUILD_STATIC_LIBRARY) #-- include $(CLEAR_VARS) LOCAL_MODULE := btrfs -#LOCAL_FORCE_STATIC_EXECUTABLE := true LOCAL_SRC_FILES := \ $(objects) \ $(cmds_objects) \ - btrfs.c \ - help.c \ + btrfs.c LOCAL_C_INCLUDES := $(common_C_INCLUDES) LOCAL_CFLAGS := $(STATIC_CFLAGS) -#LOCAL_LDLIBS := $(LIBBTRFS_LIBS) -#LOCAL_LDFLAGS := $(STATIC_LDFLAGS) LOCAL_SHARED_LIBRARIES := $(btrfs_shared_libraries) LOCAL_STATIC_LIBRARIES := libbtrfs liblzo-static libz LOCAL_SYSTEM_SHARED_LIBRARIES := 
libc libcutils - LOCAL_EXPORT_C_INCLUDES := $(common_C_INCLUDES) -#LOCAL_MODULE_TAGS := optional + include $(BUILD_EXECUTABLE) #-- @@ -85,14 +81,11 @@ LOCAL_SRC_FILES := \ LOCAL_C_INCLUDES := $(common_C_INCLUDES) LOCAL_CFLAGS := $(STATIC_CFLAGS) -#LOCAL_LDLIBS := $(LIBBTRFS_LIBS) -#LOCAL_LDFLAGS := $(STATIC_LDFLAGS) LOCAL_SHARED_LIBRARIES := $(btrfs_shared_libraries) LOCAL_STATIC_LIBRARIES := libbtrfs liblzo-static LOCAL_SYSTEM_SHARED_LIBRARIES := libc libcutils - LOCAL_EXPORT_C_INCLUDES := $(common_C_INCLUDES) -#LOCAL_MODULE_TAGS := optional + include $(BUILD_EXECUTABLE) #--- @@ -105,13 +98,9 @@ LOCAL_SRC_FILES := \ LOCAL_C_INCLUDES := $(common_C_INCLUDES) LOCAL_CFLAGS := $(STATIC_CFLAGS) LOCAL_SHARED_LIBRARIES := $(btrfs_shared_libraries) -#LOCAL_LDLIBS := $(LIBBTRFS_LIBS) -#LOCAL_LDFLAGS := $(STATIC_LDFLAGS) -LOCAL_SHARED_LIBRARIES := $(btrfs_shared_libraries) LOCAL_STATIC_LIBRARIES := libbtrfs liblzo-static LOCAL_SYSTEM_SHARED_LIBRARIES := libc libcutils - LOCAL_EXPORT_C_INCLUDES := $(common_C_INCLUDES) -LOCAL_MODULE_TAGS := optional + include
[PATCH 3/3] compile error fixes
From: Filip BystrickyAndroid currently does not fully support libblkid, and android's bionic doesn't implement some pthread extras such as pthread_tryjoin_np and pthread_cancel. This patch fixes the resulting errors while trying to be as unobtrusive as possible, and is therefore just a temporary fix. For complete support of tools that use background tasks, the way those are managed (in particular, how they are cancelled) would need to be reworked. Signed-off-by: Filip Bystricky Reviewed-by: Mark Salyzyn --- androidcompat.h | 38 -- cmds-scrub.c| 5 + mkfs/common.c | 8 mkfs/main.c | 7 +++ task-utils.c| 1 + utils.c | 18 ++ utils.h | 1 + 7 files changed, 72 insertions(+), 6 deletions(-) diff --git a/androidcompat.h b/androidcompat.h index eec76dad..bd0be172 100644 --- a/androidcompat.h +++ b/androidcompat.h @@ -7,22 +7,48 @@ #ifndef __ANDROID_H__ #define __ANDROID_H__ -#ifdef ANDROID - -#define pthread_setcanceltype(type, oldtype) (0) -#define pthread_setcancelstate(state, oldstate)(0) +#ifdef __BIONIC__ +/* + * Bionic doesn't implement pthread_cancel or helpers. + * + * TODO: this is a temporary fix to just get the tools to compile. + * What we really want is to rework how background tasks are managed. + * All of the threads that are being cancelled are running in infinite loops. + * They should instead be checking a flag at each iteration to see if they + * should continue. Then cancelling would just be a matter of setting the flag. + * + * Most background tasks are managed using btrfs's task_utils library, in which + * case they are passed a task_ctx struct pointer. + * + * However, in two cases, they are created and cancelled directly with the pthread library: + * - chunk-recover.c:scan_devices creates a thread for each device to scan, giving + * each a struct device_scan*. + * - cmds-scrub.c:scrub_start creates a single thread and gives it a struct task_ctx*. 
+ * + * Breakdown by command: + * - btrfs check (cmds-check.c) uses a task (task_ctx) for indicating progress + * - mkfs.btrfs (mkfs/main.c) doesn't appear to use any background tasks. + */ #define pthread_cancel(ret)pthread_kill((ret), SIGUSR1) +/* + * If given pointers are non-null, just zero out the pointed-to value. + * This also eliminates some unused variable warnings. + */ +#define pthread_setcanceltype(type, oldtype) ((oldtype) ? (*(oldtype) = 0) : 0) +#define pthread_setcancelstate(state, oldstate)((oldstate) ? (*(oldstate) = 0) : 0) +#define pthread_tryjoin_np(thread, retval) ((retval) ? ((int)(*(retval) = NULL)) : 0) + typedef struct blkid_struct_probe *blkid_probe; #include #define direct dirent -#else /* !ANDROID */ +#else /* !__BIONIC__ */ #include -#endif /* !ANDROID */ +#endif /* !__BIONIC__ */ #endif /* __ANDROID_H__ */ diff --git a/cmds-scrub.c b/cmds-scrub.c index 5388fdcf..5d8f6c24 100644 --- a/cmds-scrub.c +++ b/cmds-scrub.c @@ -46,6 +46,11 @@ #include "commands.h" #include "help.h" +#if defined(__BIONIC__) && !defined(PTHREAD_CANCELED) +/* bionic's pthread does not define PTHREAD_CANCELED */ +#define PTHREAD_CANCELED ((void *)-1) +#endif + static const char * const scrub_cmd_group_usage[] = { "btrfs scrub [options] |", NULL diff --git a/mkfs/common.c b/mkfs/common.c index 1e8f26ea..0e4d5c39 100644 --- a/mkfs/common.c +++ b/mkfs/common.c @@ -549,6 +549,13 @@ out: * 0 for nothing found * -1 for internal error */ +#ifdef ANDROID /* none of these blkid functions exist in Android */ +static int check_overwrite(const char *device) +{ + /* We can't tell, so assume there is an existing fs or partition */ + return 1; +} +#else static int check_overwrite(const char *device) { const char *type; @@ -619,6 +626,7 @@ out: "existing filesystem.\n", device); return ret; } +#endif /* ANDROID */ /* * Check if a device is suitable for btrfs diff --git a/mkfs/main.c b/mkfs/main.c index 61f746b3..8ebb11a4 100644 --- a/mkfs/main.c +++ b/mkfs/main.c @@ -1149,6 
+1149,12 @@ static int zero_output_file(int out_fd, u64 size) return ret; } +#ifdef ANDROID /* all Androids use ssd (and android currently does not fully support libblkid) */ +static int is_ssd(const char *file) +{ + return 1; +} +#else static int is_ssd(const char *file) { blkid_probe probe; @@ -1196,6 +1202,7 @@ static int is_ssd(const char *file) return rotational == '0'; } +#endif /* ANDROID */ static int _cmp_device_by_id(void *priv, struct list_head *a, struct list_head *b) diff --git a/task-utils.c b/task-utils.c index 12b00027..1e89f13c 100644 --- a/task-utils.c +++ b/task-utils.c @@ -21,6 +21,7 @@ #include #include "task-utils.h" +#include
[PATCH 1/3] copied android.mk from devel branch
From: Filip BystrickyThis series of patches fixes some compile errors that trigger when compiling to android devices. This first patch just brings in devel's Android.mk, to which kdave@ added a few fixes recently. Signed-off-by: Filip Bystricky Reviewed-by: Mark Salyzyn --- Android.mk | 28 +--- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/Android.mk b/Android.mk index fe3209b6..52fe9ab4 100644 --- a/Android.mk +++ b/Android.mk @@ -17,22 +17,27 @@ STATIC_LIBS := -luuid -lblkid -luuid -lz -llzo2 -L. -pthread btrfs_shared_libraries := libext2_uuid \ libext2_blkid -objects := ctree.c disk-io.c radix-tree.c extent-tree.c print-tree.c \ +objects := ctree.c disk-io.c kernel-lib/radix-tree.c extent-tree.c print-tree.c \ root-tree.c dir-item.c file-item.c inode-item.c inode-map.c \ extent-cache.c extent_io.c volumes.c utils.c repair.c \ - qgroup.c raid6.c free-space-cache.c list_sort.c props.c \ - ulist.c qgroup-verify.c backref.c string-table.c task-utils.c \ - inode.c file.c find-root.c + qgroup.c free-space-cache.c kernel-lib/list_sort.c props.c \ + kernel-shared/ulist.c qgroup-verify.c backref.c string-table.c task-utils.c \ + inode.c file.c find-root.c free-space-tree.c help.c send-dump.c \ + fsfeatures.c kernel-lib/tables.c kernel-lib/raid56.c cmds_objects := cmds-subvolume.c cmds-filesystem.c cmds-device.c cmds-scrub.c \ cmds-inspect.c cmds-balance.c cmds-send.c cmds-receive.c \ cmds-quota.c cmds-qgroup.c cmds-replace.c cmds-check.c \ cmds-restore.c cmds-rescue.c chunk-recover.c super-recover.c \ - cmds-property.c cmds-fi-usage.c -libbtrfs_objects := send-stream.c send-utils.c rbtree.c btrfs-list.c crc32c.c \ + cmds-property.c cmds-fi-usage.c cmds-inspect-dump-tree.c \ + cmds-inspect-dump-super.c cmds-inspect-tree-stats.c cmds-fi-du.c \ + mkfs/common.c +libbtrfs_objects := send-stream.c send-utils.c kernel-lib/rbtree.c btrfs-list.c \ + kernel-lib/crc32c.c messages.c \ uuid-tree.c utils-lib.c rbtree-utils.c -libbtrfs_headers := send-stream.h 
send-utils.h send.h rbtree.h btrfs-list.h \ - crc32c.h list.h kerncompat.h radix-tree.h extent-cache.h \ - extent_io.h ioctl.h ctree.h btrfsck.h version.h +libbtrfs_headers := send-stream.h send-utils.h send.h kernel-lib/rbtree.h btrfs-list.h \ + kernel-lib/crc32c.h kernel-lib/list.h kerncompat.h \ + kernel-lib/radix-tree.h kernel-lib/sizes.h kernel-lib/raid56.h \ + extent-cache.h extent_io.h ioctl.h ctree.h btrfsck.h version.h TESTS := fsck-tests.sh convert-tests.sh blkid_objects := partition/ superblocks/ topology/ @@ -75,7 +80,8 @@ include $(CLEAR_VARS) LOCAL_MODULE := mkfs.btrfs LOCAL_SRC_FILES := \ $(objects) \ -mkfs.c +mkfs/common.c \ +mkfs/main.c LOCAL_C_INCLUDES := $(common_C_INCLUDES) LOCAL_CFLAGS := $(STATIC_CFLAGS) @@ -108,4 +114,4 @@ LOCAL_SYSTEM_SHARED_LIBRARIES := libc libcutils LOCAL_EXPORT_C_INCLUDES := $(common_C_INCLUDES) LOCAL_MODULE_TAGS := optional include $(BUILD_EXECUTABLE) -#-- +#-- \ No newline at end of file -- 2.14.0.rc1.383.gd1ce394fe2-goog -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/14 RFC] Btrfs: Add journal for raid5/6 writes
On 08/01/2017 01:39 PM, Austin S. Hemmelgarn wrote:

On 2017-08-01 13:25, Roman Mamedov wrote:

On Tue, 1 Aug 2017 10:14:23 -0600 Liu Bo wrote:

This aims to fix write hole issue on btrfs raid5/6 setup by adding a separate disk as a journal (aka raid5/6 log), so that after unclean shutdown we can make sure data and parity are consistent on the raid array by replaying the journal.

Could it be possible to designate areas on the in-array devices to be used as journal? While md doesn't have much spare room in its metadata for extraneous things like this, Btrfs could use almost as much as it wants to, adding to size of the FS metadata areas. Reliability-wise, the log could be stored as RAID1 chunks. It doesn't seem convenient to need having an additional storage device around just for the log, and also needing to maintain its fault tolerance yourself (so the log device would better be on a mirror, such as mdadm RAID1? more expense and maintenance complexity).

I agree, MD pretty much needs a separate device simply because they can't allocate arbitrary space on the other array members. BTRFS can do that though, and I would actually think that that would be _easier_ to implement than having a separate device. That said, I do think that it would need to be a separate chunk type, because things could get really complicated if the metadata is itself using a parity raid profile.

Thanks for running with this Liu, I'm reading through all the patches. I do agree that it's better to put the logging into a dedicated chunk type, that way we can have it default to either double or triple mirroring.

-chris
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
On 08/02/2017 04:38 AM, Brendan Hide wrote: The title seems alarmist to me - and I suspect it is going to be misconstrued. :-/ Supporting any filesystem is a huge amount of work. I don't have a problem with Redhat or any distro picking and choosing the projects they want to support. At least inside of FB, our own internal btrfs usage is continuing to grow. Btrfs is becoming a big part of how we ship containers and other workloads where snapshots improve performance. We also heavily use XFS, so I'm happy to see RH's long standing investment there continue. -chris
Re: Massive loss of disk space
Hi,

On 2017-08-01 17:00, Austin S. Hemmelgarn wrote:
> OK, I just did a dead simple test by hand, and it looks like I was right.
> The method I used to check this is as follows:
> 1. Create and mount a reasonably small filesystem (I used an 8G temporary LV
> for this, a file would work too though).
> 2. Using dd or a similar tool, create a test file that takes up half of the
> size of the filesystem. It is important that this _not_ be fallocated, but
> just written out.
> 3. Use `fallocate -l` to try and extend the size of the file beyond half the
> size of the filesystem.
>
> For BTRFS, this will result in -ENOSPC, while for ext4 and XFS, it will
> succeed with no error. Based on this and some low-level inspection, it looks
> like BTRFS treats the full range of the fallocate call as unallocated, and
> thus is trying to allocate space for regions of that range that are already
> allocated.

I can confirm this behavior; below are some steps to reproduce it [2]; however I don't think that it is a bug: this is the correct behavior for a COW filesystem (see below).

Looking at the function btrfs_fallocate() (file fs/btrfs/file.c):

static long btrfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
{
	[...]
	alloc_start = round_down(offset, blocksize);
	alloc_end = round_up(offset + len, blocksize);
	[...]
	/*
	 * Only trigger disk allocation, don't trigger qgroup reserve
	 *
	 * For qgroup space, it will be checked later.
	 */
	ret = btrfs_alloc_data_chunk_ondemand(BTRFS_I(inode), alloc_end - alloc_start)

it seems that BTRFS always allocates the maximum space required, without considering the space already allocated.

Is it too conservative? I think not: consider the following scenario:

a) create a 2GB file
b) fallocate -o 1GB -l 2GB
c) write from 1GB to 3GB

after b), the expectation is that c) always succeeds [1]: i.e. there is enough space on the filesystem.
Due to the COW nature of BTRFS, you cannot rely on the already allocated space because there could be a small time window where both the old and the new data exist on the disk.

My opinion is that in general this behavior is correct due to the COW nature of BTRFS. The only exception that I can find is the "nocow" file. For these cases taking into account the already allocated space would be better.

Comments are welcome.

BR
G.Baroncelli

[1] from man 2 fallocate
[...]
After a successful call, subsequent writes into the range specified by offset and len are guaranteed not to fail because of lack of disk space.
[...]

[2]
-- create a 5G btrfs filesystem
# mkdir t1
# truncate --size 5G disk
# losetup /dev/loop0 disk
# mkfs.btrfs /dev/loop0
# mount /dev/loop0 t1

-- test
-- create a 1500 MB file, then expand it to 4000 MB
-- expected result: the file is 4000 MB in size
-- result: fail: the expansion fails
# fallocate -l $((1024*1024*100*15)) file.bin
# fallocate -l $((1024*1024*100*40)) file.bin
fallocate: fallocate failed: No space left on device
# ls -lh file.bin
-rw-r--r-- 1 root root 1.5G Aug 2 19:09 file.bin

--
gpg @keyserver.linux.it: Goffredo Baroncelli
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Re: [PATCH] btrfs: copy fsid to super_block s_uuid
On Wed, Aug 02, 2017 at 02:02:11PM +0800, Anand Jain wrote:
>
> Hi Darrick,
>
> Thanks for commenting..
>
> >> +	memcpy(>s_uuid, fs_info->fsid, BTRFS_FSID_SIZE);
> >
> > uuid_copy()?
>
> It requires a larger migration to use uuid_t, IMO it can be done all
> together, in a separate patch ?
>
> Just for experiment, starting with struct btrfs_fs_info.fsid and
> to check its foot prints, I just renamed fsid to fs_id, and compiled.
> It reports 73 'has no member named ‘fsid'' errors.
> So looks like redefining u8 fsid[] to uuid_t fsid and further updating
> all its foot prints, has to be simplified. Any suggestions ?

Coccinelle script? It was a fairly simple transition for xfs and others, though from a simple grep it looks like btrfs uses open coded u8 arrays in a few more places.

--D

> Thanks, Anand
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
On 2017-08-02 08:55, Lutz Vieweg wrote: On 08/02/2017 01:25 PM, Austin S. Hemmelgarn wrote: And this is a worst-case result of the fact that most distros added BTRFS support long before it was ready. RedHat still advertises "Ceph", and given Ceph initially recommended btrfs as the filesystem to use for its nodes, it is interesting to read how clearly they recommend against btrfs now: http://docs.ceph.com/docs/master/rados/configuration/filesystem-recommendations/ We recommand against using btrfs due to the lack of a stable version to test against and frequent bugs in the ENOSPC handling. Yes, and the one thing they don't mention there is that Ceph is already doing most of the same things that BTRFS is, so you end up having performance issues due to duplicated work too. What they specifically call out though is first the reason that it should not be supported yet in RHEL, OEL, and many other distros (I'm explicitly leaving SLES/OpenSUSE off of that list, because while I disagree with their choices of default behavior WRT BTRFS, they are actively involved in it's development, unlike most of the other distros that 'support' it), and then second one of the biggest issues for regular usage. German IT magazine "Golem" speculates that RedHat's decision is influenced by its recent acquisition of Permabit. But I don't really see how XFS or Permabit tackle the problem that if you need to create consistent backups of file systems while they are in use, block-device level snapshots damage the write performance big time. When you're talking about data safety though, most people are willing to sacrifice write performance in favor of significantly lowering perceived risk. 
The misguided early support of BTRFS by many distros, without sufficient explanation of exactly how 'in-development' it was, means that there are a lot more stories of issues and failures with BTRFS than of success (partly also because a filesystem is one of those things that people tend to complain about if it breaks, and not praise all that much if it works), and as a result the general perception outside of people who use it actively is that it's pretty risky to use (which is absolutely accurate if you don't do routine maintenance on it). (That backup topic is the one reason we use btrfs for a lot of /home/ directories.) I understand that XFS is expected to get some COW-features in the future as well - but it remains to be seen what performance and robustness implications that will have on XFS. I believe basic reflink functionality is already upstream, and I wasn't aware of any other specific development for XFS.
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
On 08/02/2017 01:25 PM, Austin S. Hemmelgarn wrote: And this is a worst-case result of the fact that most distros added BTRFS support long before it was ready. RedHat still advertises "Ceph", and given Ceph initially recommended btrfs as the filesystem to use for its nodes, it is interesting to read how clearly they recommend against btrfs now: http://docs.ceph.com/docs/master/rados/configuration/filesystem-recommendations/ We recommend against using btrfs due to the lack of a stable version to test against and frequent bugs in the ENOSPC handling. German IT magazine "Golem" speculates that RedHat's decision is influenced by its recent acquisition of Permabit. But I don't really see how XFS or Permabit tackle the problem that if you need to create consistent backups of file systems while they are in use, block-device level snapshots damage the write performance big time. (That backup topic is the one reason we use btrfs for a lot of /home/ directories.) I understand that XFS is expected to get some COW-features in the future as well - but it remains to be seen what performance and robustness implications that will have on XFS. Regards, Lutz Vieweg
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
On 2017-08-02 04:38, Brendan Hide wrote: The title seems alarmist to me - and I suspect it is going to be misconstrued. :-/ From the release notes at https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/7.4_Release_Notes/chap-Red_Hat_Enterprise_Linux-7.4_Release_Notes-Deprecated_Functionality.html "Btrfs has been deprecated The Btrfs file system has been in Technology Preview state since the initial release of Red Hat Enterprise Linux 6. Red Hat will not be moving Btrfs to a fully supported feature and it will be removed in a future major release of Red Hat Enterprise Linux. The Btrfs file system did receive numerous updates from the upstream in Red Hat Enterprise Linux 7.4 and will remain available in the Red Hat Enterprise Linux 7 series. However, this is the last planned update to this feature. Red Hat will continue to invest in future technologies to address the use cases of our customers, specifically those related to snapshots, compression, NVRAM, and ease of use. We encourage feedback through your Red Hat representative on features and requirements you have for file systems and storage technology." And this is a worst-case result of the fact that most distros added BTRFS support long before it was ready. I'm betting some RH customer lost a lot of data because they didn't pay attention to the warnings and didn't do their research and were using raid5/6, and thus RH is considering it not worth investing in. That, or they got fed up with the grandiose plans with no realistic timeline. There have been a number of cases of mishandled patches (chunk-level degraded check anyone?), and a lot of important (from an enterprise usage sense) features that have been proposed but to a naive outsider have seen little to no progress (hot-spare support, device failure detection and handling, higher-order replication, working erasure coding (raid56), etc), and from both aspects, I can understand them not wanting to deal with it. 
Re: Massive loss of disk space
On 2017-08-02 00:14, Duncan wrote: Austin S. Hemmelgarn posted on Tue, 01 Aug 2017 10:47:30 -0400 as excerpted: I think I _might_ understand what's going on here. Is that test program calling fallocate using the desired total size of the file, or just trying to allocate the range beyond the end to extend the file? I've seen issues with the first case on BTRFS before, and I'm starting to think that it might actually be trying to allocate the exact amount of space requested by fallocate, even if part of the range is already allocated space. If I've interpreted previous discussions on this list correctly (not being a dev, only a btrfs user, sysadmin, and list regular)... That's exactly what it's doing, and it's _intended_ behavior. The reasoning is something like this: fallocate is supposed to pre-allocate some space with the intent being that writes into that space won't fail, because the space is already allocated. For an existing file with some data already in it, ext4 and xfs do that counting the existing space. But btrfs is copy-on-write, meaning it's going to have to write the new data to a different location than the existing data, and it may well not free up the existing allocation (if even a single 4k block of the existing allocation remains unwritten, it will remain to hold down the entire previous allocation, which isn't released until *none* of it is still in use -- of course in normal usage "in use" can be due to old snapshots or other reflinks to the same extent, as well, though in these test cases it's not). So in order to honor the guarantee that writes into preallocated space shouldn't ENOSPC, btrfs can't count currently used space as part of the fallocate. 
The different behavior is entirely due to btrfs being COW, and thus a choice having to be made: do we worst-case fallocate-reserve for writes over currently used data that will have to be COWed elsewhere, possibly without freeing the existing extents because there's still something referencing them, or do we risk ENOSPCing on write to a previously fallocated area? The choice was to worst-case-reserve and take the ENOSPC risk at fallocate time, so the write into that fallocated space could then proceed without the ENOSPC risk that COW would otherwise imply. Make sense, or is my understanding a horrible misunderstanding? =:^) Your reasoning is sound, except for the fact that at least on older kernels (not sure if this is still the case), BTRFS will still perform a COW operation when updating a fallocate'ed region. So if you're actually only appending, fallocate the /additional/ space, not the /entire/ space, and you'll get what you need. But if you're potentially overwriting what's there already, better fallocate the entire space, which triggers the btrfs worst-case allocation behavior you see, in order to guarantee it won't ENOSPC during the actual write. Of course the only time the behavior actually differs is with COW, but then there's a BIG difference, but that BIG difference has a GOOD BIG reason! =:^) Though that difference will certainly necessitate some relearning of the /correct/ way to do it, for devs who were doing it the COW-worst-case way all along, even if they didn't actually need to, because it didn't happen to make a difference on what they happened to be testing on, which happened not to be COW... Reminds me of the way newer versions of gcc, and/or trying to build with clang as well, tend to trigger relearning, because newer versions are stricter in order to allow better optimization, and other implementations are simply different in what they're strict on, /because/ they're a different implementation. Well, btrfs is stricter... 
because it's a different implementation that /has/ to be stricter... due to COW. Except that that strictness breaks userspace programs that are doing perfectly reasonable things.
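The append-only pattern Duncan describes can be sketched with the util-linux fallocate(1) tool (the same logic applies to fallocate(2) from C): preallocate only the range past the current EOF instead of re-fallocating the whole file, so btrfs does not worst-case-reserve for extents that are already written.

```shell
# Append-only pattern: reserve only the tail beyond the current EOF.
# Re-fallocating the whole file would, on btrfs, worst-case-reserve
# even the already-written extents (COW may need fresh space for them).
f=$(mktemp)
dd if=/dev/zero of="$f" bs=4096 count=1 status=none  # existing data: 4 KiB
size=$(stat -c%s "$f")                               # current EOF
fallocate -o "$size" -l 8192 "$f"                    # reserve only the new 8 KiB tail
stat -c%s "$f"                                       # file now extends to 12288
rm -f "$f"
```

If the write may instead overwrite existing data, fallocating the entire range is the correct worst-case choice, exactly as described above.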
Re: Crashed filesystem, nothing helps
With the help of btrfs-corrupt-block I was able to get a little bit further. I marked some of my problem blocks corrupt. Now I am at this stage: mainframe:~ # btrfs restore /dev/sdb1 /mnt parent transid verify failed on 29409280 wanted 1486829 found 1488801 parent transid verify failed on 29409280 wanted 1486829 found 1488801 parent transid verify failed on 29409280 wanted 1486829 found 1488801 parent transid verify failed on 29409280 wanted 1486829 found 1488801 Ignoring transid failure parent transid verify failed on 29376512 wanted 1327723 found 1489835 parent transid verify failed on 29376512 wanted 1327723 found 1489835 parent transid verify failed on 29376512 wanted 1327723 found 1489835 parent transid verify failed on 29376512 wanted 1327723 found 1489835 Ignoring transid failure parent transid verify failed on 29786112 wanted 1489835 found 1489871 parent transid verify failed on 29786112 wanted 1489835 found 1489871 parent transid verify failed on 29786112 wanted 1489835 found 1489871 parent transid verify failed on 29786112 wanted 1489835 found 1489871 Ignoring transid failure leaf parent key incorrect 29786112 Error searching -1 Regards, Thomas -- Thomas Wurfbaum Starkertshofen 15 85084 Reichertshofen Tel.: +49-160-3696336 Mail: tho...@wurfbaum.net Google+: http://google.com/+ThomasWurfbaum Facebook: https://www.facebook.com/profile.php?id=16061335414 Xing: https://www.xing.com/profile/Thomas_Wurfbaum
Re: Crashed filesystem, nothing helps
On Wednesday, 2 August 2017 at 11:31:41 CEST, Roman Mamedov wrote: > Did it just abruptly exit there? Or did you terminate it? It abruptly stopped there. Regards, Thomas
Re: Crashed filesystem, nothing helps
On Wed, 02 Aug 2017 11:17:04 +0200 Thomas Wurfbaum wrote: > A restore does also not help: > mainframe:~ # btrfs restore /dev/sdb1 /mnt > parent transid verify failed on 29392896 wanted 1486833 found 1486836 > parent transid verify failed on 29392896 wanted 1486833 found 1486836 > parent transid verify failed on 29392896 wanted 1486833 found 1486836 > parent transid verify failed on 29392896 wanted 1486833 found 1486836 > Ignoring transid failure > parent transid verify failed on 29409280 wanted 1486829 found 1486833 > parent transid verify failed on 29409280 wanted 1486829 found 1486833 > parent transid verify failed on 29409280 wanted 1486829 found 1486833 > parent transid verify failed on 29409280 wanted 1486829 found 1486833 > Ignoring transid failure > parent transid verify failed on 29376512 wanted 1327723 found 1486833 > parent transid verify failed on 29376512 wanted 1327723 found 1486833 > parent transid verify failed on 29376512 wanted 1327723 found 1486833 > parent transid verify failed on 29376512 wanted 1327723 found 1486833 > Ignoring transid failure Did it just abruptly exit there? Or did you terminate it? IIRC these messages (about ignoring) are not a problem for restore, it should be able to continue. Or if not, it would print a more definitive error message, e.g. "Couldn't read tree root" or such. -- With respect, Roman
Re: Crashed filesystem, nothing helps
Maybe you are right, but I just followed the SUSE guide "How to repair a broken/unmountable btrfs filesystem": https://en.opensuse.org/SDB:BTRFS I already tried mounting with the -o usebackuproot option (and -o usebackuproot,ro as well), but they just produce this in dmesg: [61054.470771] BTRFS info (device sdb1): trying to use backup root at mount time [61054.470778] BTRFS info (device sdb1): disk space caching is enabled [61054.470782] BTRFS info (device sdb1): has skinny extents [61054.560876] BTRFS error (device sdb1): parent transid verify failed on 29392896 wanted 1486833 found 1486836 [61054.563423] BTRFS error (device sdb1): parent transid verify failed on 29392896 wanted 1486833 found 1486836 [61054.604057] BTRFS error (device sdb1): open_ctree failed [61079.137435] BTRFS info (device sdb1): trying to use backup root at mount time [61079.137443] BTRFS info (device sdb1): disk space caching is enabled [61079.137445] BTRFS info (device sdb1): has skinny extents [61079.227242] BTRFS error (device sdb1): parent transid verify failed on 29392896 wanted 1486833 found 1486836 [61079.230087] BTRFS error (device sdb1): parent transid verify failed on 29392896 wanted 1486833 found 1486836 [61079.260062] BTRFS error (device sdb1): open_ctree failed And on the CLI I get the following: mainframe:~ # mount -o usebackuproot,ro /dev/sdb1 /data mount: wrong fs type, bad option, bad superblock on /dev/sdb1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so. 
A restore does also not help: mainframe:~ # btrfs restore /dev/sdb1 /mnt parent transid verify failed on 29392896 wanted 1486833 found 1486836 parent transid verify failed on 29392896 wanted 1486833 found 1486836 parent transid verify failed on 29392896 wanted 1486833 found 1486836 parent transid verify failed on 29392896 wanted 1486833 found 1486836 Ignoring transid failure parent transid verify failed on 29409280 wanted 1486829 found 1486833 parent transid verify failed on 29409280 wanted 1486829 found 1486833 parent transid verify failed on 29409280 wanted 1486829 found 1486833 parent transid verify failed on 29409280 wanted 1486829 found 1486833 Ignoring transid failure parent transid verify failed on 29376512 wanted 1327723 found 1486833 parent transid verify failed on 29376512 wanted 1327723 found 1486833 parent transid verify failed on 29376512 wanted 1327723 found 1486833 parent transid verify failed on 29376512 wanted 1327723 found 1486833 Ignoring transid failure
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
I haven't seen active btrfs developers from Red Hat for some time; Red Hat looks to have put most of its effort into XFS. It is time to switch to SLES/openSUSE! On Wed, Aug 2, 2017 at 4:38 PM, Brendan Hide wrote: > The title seems alarmist to me - and I suspect it is going to be > misconstrued. :-/ > > From the release notes at > https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/7.4_Release_Notes/chap-Red_Hat_Enterprise_Linux-7.4_Release_Notes-Deprecated_Functionality.html > > "Btrfs has been deprecated > > The Btrfs file system has been in Technology Preview state since the initial > release of Red Hat Enterprise Linux 6. Red Hat will not be moving Btrfs to a > fully supported feature and it will be removed in a future major release of > Red Hat Enterprise Linux. > > The Btrfs file system did receive numerous updates from the upstream in Red > Hat Enterprise Linux 7.4 and will remain available in the Red Hat Enterprise > Linux 7 series. However, this is the last planned update to this feature. > > Red Hat will continue to invest in future technologies to address the use > cases of our customers, specifically those related to snapshots, > compression, NVRAM, and ease of use. We encourage feedback through your Red > Hat representative on features and requirements you have for file systems > and storage technology."
Re: Crashed filesystem, nothing helps
On Wed, Aug 02, 2017 at 10:27:50AM +0200, Thomas Wurfbaum wrote: > Hello, > > Yesterday morning i recognized a hard reboot of my system, but the /data > filesystem was > not possible to mount. > > > mainframe:~ # uname -a > Linux mainframe 4.11.8-2-default #1 SMP PREEMPT Thu Jun 29 14:37:33 UTC 2017 > (42bd7a0) x86_64 x86_64 x86_64 GNU/Linux > mainframe:~ # btrfs --version > btrfs-progs v4.10.2+20170406 > mainframe:~ # btrfs fi show > Label: none uuid: 2276-0885-4683-ac04-477c27cfab80 > Total devices 1 FS bytes used 2.88TiB > devid1 size 4.53TiB used 2.92TiB path /dev/sdb1 > mainframe:~ # btrfs restore /dev/sdb1 /mnt > parent transid verify failed on 29392896 wanted 1486833 found 1486836 > parent transid verify failed on 29392896 wanted 1486833 found 1486836 > parent transid verify failed on 29392896 wanted 1486833 found 1486836 > parent transid verify failed on 29392896 wanted 1486833 found 1486836 > Ignoring transid failure > parent transid verify failed on 29409280 wanted 1486829 found 1486833 > parent transid verify failed on 29409280 wanted 1486829 found 1486833 > parent transid verify failed on 29409280 wanted 1486829 found 1486833 > parent transid verify failed on 29409280 wanted 1486829 found 1486833 > Ignoring transid failure > parent transid verify failed on 29376512 wanted 1327723 found 1486833 > parent transid verify failed on 29376512 wanted 1327723 found 1486833 > parent transid verify failed on 29376512 wanted 1327723 found 1486833 > parent transid verify failed on 29376512 wanted 1327723 found 1486833 > Ignoring transid failure > mainframe:~ # mount /dev/sdb1 /data > mount: wrong fs type, bad option, bad superblock on /dev/sdb1, >missing codepage or helper program, or other error > >In some cases useful info is found in syslog - try >dmesg | tail or so. 
> mainframe:~ # mount -o usebackuproot /dev/sdb1 /data > mount: wrong fs type, bad option, bad superblock on /dev/sdb1, >missing codepage or helper program, or other error > >In some cases useful info is found in syslog - try >dmesg | tail or so. > mainframe:~ # btrfs check /dev/sdb1 > parent transid verify failed on 29392896 wanted 1486833 found 1486836 > parent transid verify failed on 29392896 wanted 1486833 found 1486836 > parent transid verify failed on 29392896 wanted 1486833 found 1486836 > parent transid verify failed on 29392896 wanted 1486833 found 1486836 > Ignoring transid failure > parent transid verify failed on 29409280 wanted 1486829 found 1486833 > parent transid verify failed on 29409280 wanted 1486829 found 1486833 > parent transid verify failed on 29409280 wanted 1486829 found 1486833 > parent transid verify failed on 29409280 wanted 1486829 found 1486833 > Ignoring transid failure > parent transid verify failed on 29376512 wanted 1327723 found 1486833 > parent transid verify failed on 29376512 wanted 1327723 found 1486833 > parent transid verify failed on 29376512 wanted 1327723 found 1486833 > parent transid verify failed on 29376512 wanted 1327723 found 1486833 > Ignoring transid failure > Checking filesystem on /dev/sdb1 > UUID: 2276-0885-4683-ac04-477c27cfab80 > checking extents > parent transid verify failed on 290766848 wanted 1486826 found 1486085 > parent transid verify failed on 290766848 wanted 1486826 found 1486085 > parent transid verify failed on 290766848 wanted 1486826 found 1486085 > parent transid verify failed on 290766848 wanted 1486826 found 1486085 > Ignoring transid failure > parent transid verify failed on 292339712 wanted 1486826 found 1486086 > parent transid verify failed on 292339712 wanted 1486826 found 1486086 > parent transid verify failed on 291078144 wanted 1486826 found 1486085 > parent transid verify failed on 291078144 wanted 1486826 found 1486085 > parent transid verify failed on 291078144 wanted 1486826 
found 1486085 > parent transid verify failed on 291078144 wanted 1486826 found 1486085 > Ignoring transid failure > parent transid verify failed on 292978688 wanted 1486826 found 1486086 > parent transid verify failed on 292978688 wanted 1486826 found 1486086 > parent transid verify failed on 292978688 wanted 1486826 found 1486086 > parent transid verify failed on 292978688 wanted 1486826 found 1486086 > Ignoring transid failure > parent transid verify failed on 292519936 wanted 1486826 found 1486086 > parent transid verify failed on 292519936 wanted 1486826 found 1486086 > parent transid verify failed on 292536320 wanted 1486826 found 1486086 > parent transid verify failed on 292536320 wanted 1486826 found 1486086 > parent transid verify failed on 292552704 wanted 1486826 found 1486086 > parent transid verify failed on 292552704 wanted 1486826 found 1486086 > parent transid verify failed on 292585472 wanted 1486826 found 1486086 > parent transid verify failed on 292585472 wanted 1486826 found 1486086 > parent transid verify failed on 292585472 wanted 1486826 found 1486086 >
RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
The title seems alarmist to me - and I suspect it is going to be misconstrued. :-/ From the release notes at https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/7.4_Release_Notes/chap-Red_Hat_Enterprise_Linux-7.4_Release_Notes-Deprecated_Functionality.html "Btrfs has been deprecated The Btrfs file system has been in Technology Preview state since the initial release of Red Hat Enterprise Linux 6. Red Hat will not be moving Btrfs to a fully supported feature and it will be removed in a future major release of Red Hat Enterprise Linux. The Btrfs file system did receive numerous updates from the upstream in Red Hat Enterprise Linux 7.4 and will remain available in the Red Hat Enterprise Linux 7 series. However, this is the last planned update to this feature. Red Hat will continue to invest in future technologies to address the use cases of our customers, specifically those related to snapshots, compression, NVRAM, and ease of use. We encourage feedback through your Red Hat representative on features and requirements you have for file systems and storage technology."
Re: BTRFS error: bad tree block start 0 623771648
Thank you for the information. It looks indeed like some important parts are all zeroed... I will try to hijack the code to get easy access to some config directories. I already reinstalled the operating system, now with dup metadata, and will have a deeper look at the discard flag. The restore tools don't work out of the box because, as Liu Bo mentioned, they check metadata and exit on an error. Thanks for your support marcel On Tue, Aug 1, 2017 at 11:45 PM, Liu Bo wrote: > On Tue, Aug 01, 2017 at 11:04:10AM +0500, Roman Mamedov wrote: >> On Mon, 31 Jul 2017 11:12:01 -0700 >> Liu Bo wrote: >> >> > Superblock and chunk tree root is OK, looks like the header part of >> > the tree root is now all-zero, but I'm unable to think of a btrfs bug >> > which can lead to that (if there is, it is a serious enough one) >> >> I see that the FS is being mounted with "discard". So maybe it was a TRIM >> gone >> bad (wrong location or in a wrong sequence). >> > > By checking discard path in btrfs, looks OK to me, more likely it's > caused by problems from underlying stuff. > > Thanks, > > -liubo > >> Generally it appears to be not recommended to use "discard" by now (because >> of >> its performance impact, and maybe possible issues like this), instead >> schedule >> to call "fstrim " once a day or so, and/or on boot-up. >> >> > on ssd like disks, by default there is only one copy for metadata. >> >> Time and time again, the default of "single" metadata for SSD is a terrible >> idea. Most likely DUP metadata would save the FS in this case. >> >> -- >> With respect, >> Roman
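The periodic-trim approach Roman suggests is commonly wired up via the fstrim.timer unit shipped with util-linux, or a plain cron entry; a sketch only — unit availability, the cron.d path, and the fstrim binary location vary by distro:

```
# Weekly trim of all mounted filesystems that support discard,
# via the systemd timer shipped with util-linux (if present):
systemctl enable --now fstrim.timer

# Or an equivalent daily cron entry (paths are distro-dependent):
echo '0 3 * * * root /usr/sbin/fstrim -av' > /etc/cron.d/fstrim
```

Either way the TRIM work happens in one batch off the write path, avoiding both the per-write latency of the discard mount option and, per the discussion above, any exposure to a discard issued at the wrong place or time.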
Re: Crashed filesystem, nothing helps
Please find attached my dmesg.log. Regards, Thomas [Attachment: dmesg.log — boot log from Linux 4.11.8-2-default (gcc 7.1.1, SMP PREEMPT, built Thu Jun 29 14:37:33 UTC 2017) on an ASUS M2N32 WS Professional board (BIOS revision 2001, 05/05/2008), booted with root=UUID=6b92e93a-86f2-4007-b374-4c7ad6a57063. The portion preserved in the archive contains only generic e820/MTRR/ACPI boot messages, including several "ACPI BIOS Warning (bug): 32/64X length mismatch in FADT" warnings, and no btrfs-related lines; the memory addresses in the archived copy were mangled and are not reproduced here.]
Crashed filesystem, nothing helps
Hello, Yesterday morning I noticed that my system had hard-rebooted, and afterwards the /data filesystem could not be mounted. mainframe:~ # uname -a Linux mainframe 4.11.8-2-default #1 SMP PREEMPT Thu Jun 29 14:37:33 UTC 2017 (42bd7a0) x86_64 x86_64 x86_64 GNU/Linux mainframe:~ # btrfs --version btrfs-progs v4.10.2+20170406 mainframe:~ # btrfs fi show Label: none uuid: 2276-0885-4683-ac04-477c27cfab80 Total devices 1 FS bytes used 2.88TiB devid 1 size 4.53TiB used 2.92TiB path /dev/sdb1 mainframe:~ # btrfs restore /dev/sdb1 /mnt parent transid verify failed on 29392896 wanted 1486833 found 1486836 parent transid verify failed on 29392896 wanted 1486833 found 1486836 parent transid verify failed on 29392896 wanted 1486833 found 1486836 parent transid verify failed on 29392896 wanted 1486833 found 1486836 Ignoring transid failure parent transid verify failed on 29409280 wanted 1486829 found 1486833 parent transid verify failed on 29409280 wanted 1486829 found 1486833 parent transid verify failed on 29409280 wanted 1486829 found 1486833 parent transid verify failed on 29409280 wanted 1486829 found 1486833 Ignoring transid failure parent transid verify failed on 29376512 wanted 1327723 found 1486833 parent transid verify failed on 29376512 wanted 1327723 found 1486833 parent transid verify failed on 29376512 wanted 1327723 found 1486833 parent transid verify failed on 29376512 wanted 1327723 found 1486833 Ignoring transid failure mainframe:~ # mount /dev/sdb1 /data mount: wrong fs type, bad option, bad superblock on /dev/sdb1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so. mainframe:~ # mount -o usebackuproot /dev/sdb1 /data mount: wrong fs type, bad option, bad superblock on /dev/sdb1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so. 
mainframe:~ # btrfs check /dev/sdb1
parent transid verify failed on 29392896 wanted 1486833 found 1486836
parent transid verify failed on 29392896 wanted 1486833 found 1486836
parent transid verify failed on 29392896 wanted 1486833 found 1486836
parent transid verify failed on 29392896 wanted 1486833 found 1486836
Ignoring transid failure
parent transid verify failed on 29409280 wanted 1486829 found 1486833
parent transid verify failed on 29409280 wanted 1486829 found 1486833
parent transid verify failed on 29409280 wanted 1486829 found 1486833
parent transid verify failed on 29409280 wanted 1486829 found 1486833
Ignoring transid failure
parent transid verify failed on 29376512 wanted 1327723 found 1486833
parent transid verify failed on 29376512 wanted 1327723 found 1486833
parent transid verify failed on 29376512 wanted 1327723 found 1486833
parent transid verify failed on 29376512 wanted 1327723 found 1486833
Ignoring transid failure
Checking filesystem on /dev/sdb1
UUID: 2276-0885-4683-ac04-477c27cfab80
checking extents
parent transid verify failed on 290766848 wanted 1486826 found 1486085
parent transid verify failed on 290766848 wanted 1486826 found 1486085
parent transid verify failed on 290766848 wanted 1486826 found 1486085
parent transid verify failed on 290766848 wanted 1486826 found 1486085
Ignoring transid failure
parent transid verify failed on 292339712 wanted 1486826 found 1486086
parent transid verify failed on 292339712 wanted 1486826 found 1486086
parent transid verify failed on 291078144 wanted 1486826 found 1486085
parent transid verify failed on 291078144 wanted 1486826 found 1486085
parent transid verify failed on 291078144 wanted 1486826 found 1486085
parent transid verify failed on 291078144 wanted 1486826 found 1486085
Ignoring transid failure
parent transid verify failed on 292978688 wanted 1486826 found 1486086
parent transid verify failed on 292978688 wanted 1486826 found 1486086
parent transid verify failed on 292978688 wanted 1486826 found 1486086
parent transid verify failed on 292978688 wanted 1486826 found 1486086
Ignoring transid failure
parent transid verify failed on 292519936 wanted 1486826 found 1486086
parent transid verify failed on 292519936 wanted 1486826 found 1486086
parent transid verify failed on 292536320 wanted 1486826 found 1486086
parent transid verify failed on 292536320 wanted 1486826 found 1486086
parent transid verify failed on 292552704 wanted 1486826 found 1486086
parent transid verify failed on 292552704 wanted 1486826 found 1486086
parent transid verify failed on 292585472 wanted 1486826 found 1486086
parent transid verify failed on 292585472 wanted 1486826 found 1486086
parent transid verify failed on 292585472 wanted 1486826 found 1486086
parent transid verify failed on 292585472 wanted 1486826 found 1486086
Ignoring transid failure
parent transid verify failed on 290766848 wanted 1486826 found 1486085
Ignoring transid failure
leaf parent key incorrect 290766848
bad block 290766848
Re: [PATCH 0/2] More nritems range checking
Hello,

On 2017-06-02 12:08, Philipp Hahn wrote:
> thank you for applying my last patch, but regarding my corrupted file system I
> found two other cases where btrfs crashes:
> - btrfs_del_items() was overlooked by me
> - deleting from an empty node
>
> Find attached two patches to improve that.
> Please check the second hunk of the second patch, as I'm unsure whether
> "mid == nritems" is valid.
>
> (If someone can give me a hand on how to get my FS fixed again, I would
> appreciate it.)
>
> Philipp Hahn (2):
>   btrfs-progs: Check slot + nr >= nritems overflow
>   btrfs-progs: Check nritems under-/overflow
>
>  ctree.c | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)

Ping?

Philipp
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3] btrfs: preserve i_mode if __btrfs_set_acl() fails
When changing a file's acl mask, btrfs_set_acl() will first set the group
bits of i_mode to the value of the mask, and only then set the actual
extended attribute representing the new acl.

If the second part fails (due to lack of space, for example) and the file
had no acl attribute to begin with, the system will from now on assume that
the mask permission bits are actual group permission bits, potentially
granting access to the wrong users. Prevent this by restoring the original
mode bits if __btrfs_set_acl fails.

Signed-off-by: Ernesto A. Fernández
---
Please ignore the two previous versions; this is far simpler and has the
same effect.

To Josef Bacik: thank you for your review, I'm sorry I wasted your time.

 fs/btrfs/acl.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
index 8d8370d..1ba49eb 100644
--- a/fs/btrfs/acl.c
+++ b/fs/btrfs/acl.c
@@ -114,13 +114,17 @@ static int __btrfs_set_acl(struct btrfs_trans_handle *trans,
 int btrfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 {
 	int ret;
+	umode_t old_mode = inode->i_mode;

 	if (type == ACL_TYPE_ACCESS && acl) {
 		ret = posix_acl_update_mode(inode, &inode->i_mode, &acl);
 		if (ret)
 			return ret;
 	}
-	return __btrfs_set_acl(NULL, inode, acl, type);
+	ret = __btrfs_set_acl(NULL, inode, acl, type);
+	if (ret)
+		inode->i_mode = old_mode;
+	return ret;
 }

 /*
--
2.1.4
Re: [PATCH] btrfs: verify_dir_item fails in replay_xattr_deletes
On 2.08.2017 08:35, Lu Fengqi wrote:
> From: Su Yue
>
> In replay_xattr_deletes(), the argument @slot of verify_dir_item()
> should be variable @i instead of path->slots[0].

This was already fixed in a patch sent by Filipe. Title is:

[PATCH] Btrfs: fix dir item validation when replaying xattr deletes

> The bug causes failures of generic/066 and shared/002 in xfstests.
> dmesg:
> [12507.810781] BTRFS critical (device dm-0): invalid dir item name len: 10
> [12507.811185] BTRFS: error (device dm-0) in btrfs_replay_log:2475: errno=-5
> IO failure (Failed to recover log tree)
> [12507.811928] BTRFS error (device dm-0): cleaner transaction attach returned
> -30
> [12507.821020] BTRFS error (device dm-0): open_ctree failed
> [12508.131526] BTRFS info (device dm-0): disk space caching is enabled
> [12508.132145] BTRFS info (device dm-0): has skinny extents
> [12508.136265] BTRFS critical (device dm-0): invalid dir item name len: 10
> [12508.136678] BTRFS: error (device dm-0) in btrfs_replay_log:2475: errno=-5
> IO failure (Failed to recover log tree)
> [12508.137501] BTRFS error (device dm-0): cleaner transaction attach returned
> -30
> [12508.147982] BTRFS error (device dm-0): open_ctree failed
>
> Signed-off-by: Su Yue
> ---
>  fs/btrfs/tree-log.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index f20ef211a73d..3a11ae63676e 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -2153,8 +2153,7 @@ static int replay_xattr_deletes(struct btrfs_trans_handle *trans,
>  		u32 this_len = sizeof(*di) + name_len + data_len;
>  		char *name;
>
> -		ret = verify_dir_item(fs_info, path->nodes[0],
> -				      path->slots[0], di);
> +		ret = verify_dir_item(fs_info, path->nodes[0], i, di);
>  		if (ret) {
>  			ret = -EIO;
>  			goto out;
> --