Re: HELP unmountable partition after btrfs balance to RAID0
Thomas Mohr posted on Thu, 06 Dec 2018 12:31:15 +0100 as excerpted:

> We wanted to convert a file system to a RAID0 with two partitions.
> Unfortunately we had to reboot the server during the balance operation
> before it could complete.
>
> Now the following happens:
>
> A mount attempt of the array fails with the following error code:
>
> btrfs recover yields roughly 1.6 out of 4 TB.

[Just another btrfs user and list regular, not a dev. A dev may reply to your specific case, but meanwhile, for next time...]

That shouldn't be a problem. Because with raid0 a failure of any of the component devices takes down the entire array, making it less reliable than a single device, raid0 (in general, not just btrfs) is considered useful only for data of low enough value that its loss is no big deal, either because it's truly of little value (internet cache being a good example), or because backups are kept available and updated for whenever the raid0 array fails. Because with raid0, it's always a question of when it'll fail, not if.

So loss of a filesystem being converted to raid0 isn't a problem, because the data on it, by virtue of being in the process of conversion to raid0, is defined as of throw-away value in any case. If it's of higher value than that, it's not going to be raid0 (or in the process of conversion to it) in the first place.

Of course that's simply an extension of the more general first sysadmin's rule of backups: the true value of data is defined not by arbitrary claims, but by the number of backups of that data it's worth having. Because "things happen", whether it's fat-fingering, bad hardware, buggy software, or simply someone tripping over the power cable or running into the power pole outside at the wrong time.

So having no backup simply defines the data as worth less than the time/trouble/resources necessary to make that backup.
Note that you ALWAYS save what was of most value to you: either the time/trouble/resources to do the backup, if your actions defined that to be of more value than the data, or the data, if you had that backup, thereby defining the value of the data to be worth backing up.

Similarly, failure of the only backup isn't a problem, because by virtue of there being only that one backup, the data is defined as not worth having more than one. Likewise, having an outdated backup isn't a problem, because that's simply the special case of defining the data in the delta between the backup time and the present as not (yet) worth the time/hassle/resources to make/refresh that backup.

(And FWIW, the second sysadmin's rule of backups is that it's not a backup until you've successfully tested it recoverable under the same sort of conditions you're likely to need to recover it in. So many people have /thought/ they had backups that turned out not to be, because they never tested that they could actually recover the data from them. For instance, if the backup tools you'll need to recover the backup are on the backup itself, how do you get to them? Can you create a filesystem for the new copy of the data and recover it from the backup with just the tools and documentation available from your emergency boot media? Untested backup == no backup, or at best, backup still in process!)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
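[Editorial note: Duncan's second rule — that a backup only counts once a restore has been exercised — can be rehearsed even on a scratch directory. A minimal sketch, assuming hypothetical /tmp/demo paths and using tar as a stand-in for whatever backup tool is actually in use:]

```shell
# Hypothetical paths; tar stands in for the real backup tool.
rm -rf /tmp/demo && mkdir -p /tmp/demo/src /tmp/demo/restore
echo "important data" > /tmp/demo/src/file.txt

# Take the "backup"...
tar -C /tmp/demo/src -cf /tmp/demo/backup.tar .

# ...then exercise the *restore* path, not just the backup path,
# and verify the restored tree matches the source bit-for-bit.
tar -C /tmp/demo/restore -xf /tmp/demo/backup.tar
diff -r /tmp/demo/src /tmp/demo/restore && echo "restore verified"
```

Per the second rule, the real test is running the restore half of this from the emergency boot media, with only the tools available there.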
HELP unmountable partition after btrfs balance to RAID0
Dear developers of BTRFS,

we have a problem. We wanted to convert a file system to a RAID0 with two partitions. Unfortunately we had to reboot the server during the balance operation before it could complete.

Now the following happens: a mount attempt of the array fails with the following error code, and btrfs recover yields roughly 1.6 out of 4 TB. To recover the rest we have tried:

mount:

[18192.357444] BTRFS info (device sdb1): disk space caching is enabled
[18192.357447] BTRFS info (device sdb1): has skinny extents
[18192.370664] BTRFS error (device sdb1): parent transid verify failed on 30523392 wanted 7432 found 7445
[18192.370810] BTRFS error (device sdb1): parent transid verify failed on 30523392 wanted 7432 found 7445
[18192.394745] BTRFS error (device sdb1): open_ctree failed

Mounting with options ro, degraded, clear_cache etc. yields the same errors.

btrfs rescue zero-log: the operation works, however the error persists and the array remains unmountable:

parent transid verify failed on 59768832 wanted 7422 found 7187
parent transid verify failed on 59768832 wanted 7422 found 7187
parent transid verify failed on 59768832 wanted 7422 found 7187
parent transid verify failed on 59768832 wanted 7422 found 7187
Ignoring transid failure
parent transid verify failed on 30408704 wanted 7430 found 7443
parent transid verify failed on 30408704 wanted 7430 found 7443
parent transid verify failed on 30408704 wanted 7430 found 7443
parent transid verify failed on 30408704 wanted 7430 found 7443
Ignoring transid failure
Clearing log on /dev/sdb1, previous log_root 0, level 0

btrfs rescue chunk-recover fails with following error message:

btrfs check results in:

Opening filesystem to check...
parent transid verify failed on 59768832 wanted 7422 found 7187
parent transid verify failed on 59768832 wanted 7422 found 7187
parent transid verify failed on 59768832 wanted 7422 found 7187
parent transid verify failed on 59768832 wanted 7422 found 7187
Ignoring transid failure
parent transid verify failed on 30408704 wanted 7430 found 7443
parent transid verify failed on 30408704 wanted 7430 found 7443
parent transid verify failed on 30408704 wanted 7430 found 7443
parent transid verify failed on 30408704 wanted 7430 found 7443
Ignoring transid failure
Checking filesystem on /dev/sdb1
UUID: 6c9ed4e1-d63f-46f0-b1e9-608b8fa43bb8
[1/7] checking root items
parent transid verify failed on 30523392 wanted 7432 found 7443
parent transid verify failed on 30523392 wanted 7432 found 7443
parent transid verify failed on 30523392 wanted 7432 found 7443
parent transid verify failed on 30523392 wanted 7432 found 7443
Ignoring transid failure
leaf parent key incorrect 30523392
ERROR: failed to repair root items: Operation not permitted

Any ideas what is going on or how to recover the file system? I would greatly appreciate your help!

best,
Thomas

uname -a: Linux server2 4.19.5-1-default #1 SMP PREEMPT Tue Nov 27 19:56:09 UTC 2018 (6210279) x86_64 x86_64 x86_64 GNU/Linux
btrfs-progs version 4.19

-- 
ScienceConsult - DI Thomas Mohr KG
DI Thomas Mohr
Enzianweg 10a
2353 Guntramsdorf
Austria
+43 2236 56793
+43 660 461 1966
http://www.mohrkeg.co.at
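[Editorial note: when a filesystem stays unmountable like this, the least destructive next step is usually read-only file extraction with btrfs restore, which never writes to the source device. A hedged sketch — the device comes from the report above, the destination path is an assumption, and a dev may suggest better options for this specific transid damage:]

```shell
# Destination must be on a separate, healthy filesystem.
mkdir -p /mnt/recovery

# Dry run first: list what would be restored without writing anything.
btrfs restore --dry-run /dev/sdb1 /mnt/recovery

# Actual copy-out; -i ignores per-file errors so it salvages what it can.
btrfs restore -i /dev/sdb1 /mnt/recovery
```

If the default tree root is too damaged, btrfs-find-root can suggest alternate root bytenrs to pass to restore via -t.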
Re: Need help with potential ~45TB dataloss
On Tue, Dec 4, 2018 at 3:09 AM Patrick Dijkgraaf wrote:
>
> Hi Chris,
>
> See the output below. Any suggestions based on it?

If they're SATA drives, they may not support SCT ERC; and if they're SAS, depending on what controller they're behind, smartctl might need a hint to properly ask the drive for SCT ERC status. The simplest way to know is to do 'smartctl -x' on one drive, assuming they're all the same basic make/model other than size.

-- 
Chris Murphy
Re: Need help with potential ~45TB dataloss
Hi Chris,

See the output below. Any suggestions based on it?

Thanks!

-- 
Groet / Cheers,
Patrick Dijkgraaf

On Mon, 2018-12-03 at 20:16 -0700, Chris Murphy wrote:
> Also useful information for autopsy, perhaps not for fixing, is to
> know whether the SCT ERC value for every drive is less than the
> kernel's SCSI driver block device command timeout value. It's super
> important that the drive reports an explicit read failure before the
> read command is considered failed by the kernel. If the drive is
> still trying to do a read, and the kernel command timer times out,
> it'll just do a reset of the whole link and we lose the outcome for
> the hanging command. Upon explicit read error only, can Btrfs, or md
> RAID, know what device and physical sector has a problem, and
> therefore how to reconstruct the block, and fix the bad sector with
> a write of known good data.
>
> smartctl -l scterc /device/

Seems to not work:

[root@cornelis ~]# for disk in /dev/sd{e..x}; do echo ${disk}; smartctl -l scterc ${disk}; done
/dev/sde
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdf
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdg
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdh
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdi
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdj
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdk
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

Smartctl open device: /dev/sdk failed: No such device

/dev/sdl
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdm
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdn
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control command not supported

/dev/sdo
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdp
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdq
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdr
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sds
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW
Re: Need help with potential ~45TB dataloss
Hi,

thanks again. Please see answers inline.

-- 
Groet / Cheers,
Patrick Dijkgraaf

On Mon, 2018-12-03 at 08:35 +0800, Qu Wenruo wrote:
> 
> On 2018/12/2 5:03 PM, Patrick Dijkgraaf wrote:
> > Hi Qu,
> > 
> > Thanks for helping me!
> > 
> > Please see the responses in-line.
> > Any suggestions based on this?
> > 
> > Thanks!
> > 
> > 
> > On Sat, 2018-12-01 at 07:57 +0800, Qu Wenruo wrote:
> > > On 2018/11/30 9:53 PM, Patrick Dijkgraaf wrote:
> > > > Hi all,
> > > > 
> > > > I have been a happy BTRFS user for quite some time. But now I'm
> > > > facing a potential ~45TB dataloss... :-(
> > > > I hope someone can help!
> > > > 
> > > > I have Server A and Server B. Both having a 20-devices BTRFS
> > > > RAID6 filesystem. Because of known RAID5/6 risks, Server B was
> > > > a backup of Server A.
> > > > After applying updates to server B and reboot, the FS would not
> > > > mount anymore. Because it was "just" a backup, I decided to
> > > > recreate the FS and perform a new backup. Later, I discovered
> > > > that the FS was not broken, but I faced this issue:
> > > > https://patchwork.kernel.org/patch/10694997/
> > > > 
> > > 
> > > Sorry for the inconvenience.
> > > 
> > > I didn't realize the max_chunk_size limit isn't reliable at that
> > > timing.
> > 
> > No problem, I should not have jumped to the conclusion to recreate
> > the backup volume.
> > 
> > > > Anyway, the FS was already recreated, so I needed to do a new
> > > > backup. During the backup (using rsync -vah), Server A (the
> > > > source) encountered an I/O error and my rsync failed. In an
> > > > attempt to "quick fix" the issue, I rebooted Server A, after
> > > > which the FS would not mount anymore.
> > > 
> > > Did you have any dmesg about that IO error?
> > 
> > Yes there was. But I omitted capturing it... The system is now
> > rebooted and I can't retrieve it anymore. :-(
> > 
> > > And how is the reboot scheduled? Forced power off or normal
> > > reboot command?
> > 
> > The system was rebooted using a normal reboot command.
> 
> Then the problem is pretty serious.
> 
> Possibly already corrupted before.
> 
> > 
> > > > I documented what I have tried, below. I have not yet tried
> > > > anything except what is shown, because I am afraid of causing
> > > > more harm to the FS.
> > > 
> > > Pretty clever, no btrfs check --repair is a pretty good move.
> > > 
> > > > I hope somebody here can give me advice on how to (hopefully)
> > > > retrieve my data...
> > > > 
> > > > Thanks in advance!
> > > > 
> > > > ==
> > > > 
> > > > [root@cornelis ~]# btrfs fi show
> > > > Label: 'cornelis-btrfs'  uuid: ac643516-670e-40f3-aa4c-f329fc3795fd
> > > > Total devices 1 FS bytes used 463.92GiB
> > > > devid 1 size 800.00GiB used 493.02GiB path
> > > > /dev/mapper/cornelis-cornelis--btrfs
> > > > 
> > > > Label: 'data'  uuid: 4c66fa8b-8fc6-4bba-9d83-02a2a1d69ad5
> > > > Total devices 20 FS bytes used 44.85TiB
> > > > devid  1 size 3.64TiB used 3.64TiB path /dev/sdn2
> > > > devid  2 size 3.64TiB used 3.64TiB path /dev/sdp2
> > > > devid  3 size 3.64TiB used 3.64TiB path /dev/sdu2
> > > > devid  4 size 3.64TiB used 3.64TiB path /dev/sdx2
> > > > devid  5 size 3.64TiB used 3.64TiB path /dev/sdh2
> > > > devid  6 size 3.64TiB used 3.64TiB path /dev/sdg2
> > > > devid  7 size 3.64TiB used 3.64TiB path /dev/sdm2
> > > > devid  8 size 3.64TiB used 3.64TiB path /dev/sdw2
> > > > devid  9 size 3.64TiB used 3.64TiB path /dev/sdj2
> > > > devid 10 size 3.64TiB used 3.64TiB path /dev/sdt2
> > > > devid 11 size 3.64TiB used 3.64TiB path /dev/sdk2
> > > > devid 12 size 3.64TiB
Re: Need help with potential ~45TB dataloss
Also useful information for autopsy, perhaps not for fixing, is to know whether the SCT ERC value for every drive is less than the kernel's SCSI driver block device command timeout value. It's super important that the drive reports an explicit read failure before the read command is considered failed by the kernel. If the drive is still trying to do a read and the kernel command timer times out, it'll just do a reset of the whole link, and we lose the outcome for the hanging command. Only upon an explicit read error can Btrfs, or md RAID, know what device and physical sector has a problem, and therefore how to reconstruct the block and fix the bad sector with a write of known good data.

smartctl -l scterc /device/

and

cat /sys/block/sda/device/timeout

Only if SCT ERC is enabled with a value below 30, or if the kernel command timer is changed to be well above 30 (like 180, which is absolutely crazy but a separate conversation), can we be sure that there haven't just been resets going on for a while, preventing bad sectors from being fixed up all along, which can contribute to the problem.

This comes up on the linux-raid (mainly md driver) list all the time, and it contributes to lost RAID all the time. And arguably it leads to unnecessary data loss even in the single-device desktop/laptop use case as well.

Chris Murphy
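[Editorial note: Chris's comparison reduces to simple arithmetic, since smartctl reports SCT ERC in tenths of a second while /sys/block/<dev>/device/timeout is in seconds. A sketch with assumed sample values — on a real system the two numbers come from the commands named above:]

```shell
# Assumed sample values; on a real system read them with:
#   smartctl -l scterc /dev/sdX          (ERC, reported in 0.1 s units)
#   cat /sys/block/sdX/device/timeout    (kernel command timer, seconds)
erc_deciseconds=70    # drive gives up after 7.0 s
kernel_timeout=30     # kernel default

erc_seconds=$(( erc_deciseconds / 10 ))
if [ "$erc_seconds" -lt "$kernel_timeout" ]; then
    echo "OK: drive reports the read error (${erc_seconds}s) before the kernel resets the link (${kernel_timeout}s)"
else
    echo "RISK: kernel may reset the link before the drive ever reports the bad sector"
fi
```

With ERC disabled or unsupported (as on the drives above), the only safe direction is raising the kernel timer well above the drive's worst-case internal retry time.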
Re: Need help with potential ~45TB dataloss
On 2018/12/3 4:30 AM, Andrei Borzenkov wrote:
> 02.12.2018 23:14, Patrick Dijkgraaf wrote:
>> I have some additional info.
>>
>> I found the reason the FS got corrupted. It was a single failing drive,
>> which caused the entire cabinet (containing 7 drives) to reset. So the
>> FS suddenly lost 7 drives.
>>
>
> This remains a mystery for me. btrfs is marketed to be always consistent
> on disk - you either have the previous full transaction or the current
> full transaction. If the current transaction was interrupted, the
> promise is that you are left with the previous valid, consistent
> transaction.
>
> Obviously this is not what happens in practice. Which nullifies the
> main selling point of btrfs.
>
> Unless this is expected behavior, it sounds like some barriers are
> missing and summary data is updated before (and without waiting for)
> subordinate data. And if it is expected behavior ...

There is one (unfortunately) known problem for RAID5/6 and one special problem for RAID6.

The common problem is the write hole. For a RAID5 stripe like:

Disk 1 | Disk 2 | Disk 3
------------------------
DATA1  | DATA2  | PARITY

if we have written something into DATA1 but a power loss happens before we update PARITY on disk 3, we can no longer tolerate the loss of disk 2, since DATA1 doesn't match PARITY anymore.

Without the ability to know exactly which blocks have been written, the write hole problem exists for any parity-based solution, including BTRFS RAID5/6.

From the guys on the mailing list, other RAID5/6 implementations keep their own on-disk record of which blocks were updated, and after a power loss they rebuild the involved stripes. Since btrfs doesn't have such an ability, we need to scrub the whole fs to regain the disk-loss tolerance (and hope there will not be another power loss during it).

The RAID6-specific problem is the missing rebuild-retry logic (not any more after the 4.16 kernel, but btrfs-progs support is still missing). For a RAID6 stripe like:

Disk 1 | Disk 2 | Disk 3 | Disk 4
---------------------------------
DATA1  | DATA2  |   P    |   Q

if reading DATA1 fails, we have 3 ways to rebuild the data:

1) Using DATA2 and P (just as RAID5)
2) Using P and Q
3) Using DATA2 and Q

However, until 4.16 we wouldn't retry all possible ways to rebuild it. (Thanks to Liu for solving this problem.)

Thanks,
Qu

>
>> I have removed the failed drive, so the RAID is now degraded. I hope
>> the data is still recoverable... ☹
>>
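[Editorial note: the write hole Qu describes can be reproduced with plain XOR arithmetic. A minimal sketch with made-up byte values (0xA5, 0x3C) standing in for one stripe's data blocks — not btrfs's actual on-disk code:]

```shell
d1=$(( 0xA5 )); d2=$(( 0x3C ))
parity=$(( d1 ^ d2 ))            # RAID5 parity = DATA1 XOR DATA2

# Normal case: disk 2 dies; DATA2 is recovered from DATA1 and PARITY.
recovered=$(( d1 ^ parity ))
echo "recovered DATA2: $recovered (expected $d2)"

# Write hole: DATA1 is rewritten, but power is lost before PARITY is
# updated. Reconstructing DATA2 from the new data and the stale parity
# now yields garbage.
d1_new=$(( 0xFF ))
bad=$(( d1_new ^ parity ))
echo "after write hole, 'recovered' DATA2: $bad (expected $d2)"
```

The first echo reproduces DATA2 exactly; the second does not, which is why a post-power-loss scrub (or an update journal, as in other implementations) is needed before the stripe can tolerate a disk loss again.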
Re: Need help with potential ~45TB dataloss
On 2018/12/3 8:35 AM, Qu Wenruo wrote:
>
>
> On 2018/12/2 5:03 PM, Patrick Dijkgraaf wrote:
>> Hi Qu,
>>
>> Thanks for helping me!
>>
>> Please see the responses in-line.
>> Any suggestions based on this?
>>
>> Thanks!
>>
>>
>> On Sat, 2018-12-01 at 07:57 +0800, Qu Wenruo wrote:
>>> On 2018/11/30 9:53 PM, Patrick Dijkgraaf wrote:
>>>> Hi all,
>>>>
>>>> I have been a happy BTRFS user for quite some time. But now I'm
>>>> facing a potential ~45TB dataloss... :-(
>>>> I hope someone can help!
>>>>
>>>> I have Server A and Server B. Both having a 20-devices BTRFS RAID6
>>>> filesystem.

I forgot one important thing here, especially for RAID6.

If one data device is corrupted, RAID6 will normally try to rebuild it the RAID5 way, and if another disk is corrupted too, it may not recover correctly. The current way to recover is to try *all* combinations; IIRC Liu Bo tried such a patch but it was not merged.

This means current RAID6 can only handle two missing devices in its best condition. But for corruption, it can only be as good as RAID5.

Thanks,
Qu

>>>> Because of known RAID5/6 risks, Server B was a backup of
>>>> Server A.
>>>> After applying updates to server B and reboot, the FS would not
>>>> mount anymore. Because it was "just" a backup, I decided to
>>>> recreate the FS and perform a new backup. Later, I discovered that
>>>> the FS was not broken, but I faced this issue:
>>>> https://patchwork.kernel.org/patch/10694997/
>>>>
>>>
>>> Sorry for the inconvenience.
>>>
>>> I didn't realize the max_chunk_size limit isn't reliable at that
>>> timing.
>>
>> No problem, I should not have jumped to the conclusion to recreate
>> the backup volume.
>>
>>>> Anyway, the FS was already recreated, so I needed to do a new
>>>> backup. During the backup (using rsync -vah), Server A (the
>>>> source) encountered an I/O error and my rsync failed. In an
>>>> attempt to "quick fix" the issue, I rebooted Server A, after which
>>>> the FS would not mount anymore.
>>>
>>> Did you have any dmesg about that IO error?
>>
>> Yes there was. But I omitted capturing it... The system is now
>> rebooted and I can't retrieve it anymore. :-(
>>
>>> And how is the reboot scheduled? Forced power off or normal reboot
>>> command?
>>
>> The system was rebooted using a normal reboot command.
>
> Then the problem is pretty serious.
>
> Possibly already corrupted before.
>
>>
>>>> I documented what I have tried, below. I have not yet tried
>>>> anything except what is shown, because I am afraid of causing more
>>>> harm to the FS.
>>>
>>> Pretty clever, no btrfs check --repair is a pretty good move.
>>>
>>>> I hope somebody here can give me advice on how to (hopefully)
>>>> retrieve my data...
>>>>
>>>> Thanks in advance!
>>>>
>>>> ==
>>>>
>>>> [root@cornelis ~]# btrfs fi show
>>>> Label: 'cornelis-btrfs'  uuid: ac643516-670e-40f3-aa4c-f329fc3795fd
>>>>    Total devices 1 FS bytes used 463.92GiB
>>>>    devid 1 size 800.00GiB used 493.02GiB path
>>>> /dev/mapper/cornelis-cornelis--btrfs
>>>>
>>>> Label: 'data'  uuid: 4c66fa8b-8fc6-4bba-9d83-02a2a1d69ad5
>>>>    Total devices 20 FS bytes used 44.85TiB
>>>>    devid  1 size 3.64TiB used 3.64TiB path /dev/sdn2
>>>>    devid  2 size 3.64TiB used 3.64TiB path /dev/sdp2
>>>>    devid  3 size 3.64TiB used 3.64TiB path /dev/sdu2
>>>>    devid  4 size 3.64TiB used 3.64TiB path /dev/sdx2
>>>>    devid  5 size 3.64TiB used 3.64TiB path /dev/sdh2
>>>>    devid  6 size 3.64TiB used 3.64TiB path /dev/sdg2
>>>>    devid  7 size 3.64TiB used 3.64TiB path /dev/sdm2
>>>>    devid  8 size 3.64TiB used 3.64TiB path /dev/sdw2
>>>>    devid  9 size 3.64TiB used 3.64TiB path /dev/sdj2
>>>>    devid 10 size 3.64TiB used 3.64TiB path /dev/sdt2
>>>>    devid 11 size 3.64TiB used 3.64TiB path /dev/sdk2
>>>>    devid 12 size 3.64TiB used 3.64TiB path /dev/sdq2
>>>>    devid 13 size 3.64TiB used 3.64TiB path /dev/sds2
>>>>    devid 14 size 3
Re: Need help with potential ~45TB dataloss
On 2018/12/2 5:03 PM, Patrick Dijkgraaf wrote:
> Hi Qu,
>
> Thanks for helping me!
>
> Please see the responses in-line.
> Any suggestions based on this?
>
> Thanks!
>
>
> On Sat, 2018-12-01 at 07:57 +0800, Qu Wenruo wrote:
>> On 2018/11/30 9:53 PM, Patrick Dijkgraaf wrote:
>>> Hi all,
>>>
>>> I have been a happy BTRFS user for quite some time. But now I'm
>>> facing a potential ~45TB dataloss... :-(
>>> I hope someone can help!
>>>
>>> I have Server A and Server B. Both having a 20-devices BTRFS RAID6
>>> filesystem. Because of known RAID5/6 risks, Server B was a backup
>>> of Server A.
>>> After applying updates to server B and reboot, the FS would not
>>> mount anymore. Because it was "just" a backup, I decided to recreate
>>> the FS and perform a new backup. Later, I discovered that the FS was
>>> not broken, but I faced this issue:
>>> https://patchwork.kernel.org/patch/10694997/
>>>
>>
>> Sorry for the inconvenience.
>>
>> I didn't realize the max_chunk_size limit isn't reliable at that
>> timing.
>
> No problem, I should not have jumped to the conclusion to recreate the
> backup volume.
>
>>> Anyway, the FS was already recreated, so I needed to do a new
>>> backup. During the backup (using rsync -vah), Server A (the source)
>>> encountered an I/O error and my rsync failed. In an attempt to
>>> "quick fix" the issue, I rebooted Server A, after which the FS would
>>> not mount anymore.
>>
>> Did you have any dmesg about that IO error?
>
> Yes there was. But I omitted capturing it... The system is now rebooted
> and I can't retrieve it anymore. :-(
>
>> And how is the reboot scheduled? Forced power off or normal reboot
>> command?
>
> The system was rebooted using a normal reboot command.

Then the problem is pretty serious.

Possibly already corrupted before.

>
>>> I documented what I have tried, below. I have not yet tried anything
>>> except what is shown, because I am afraid of causing more harm to
>>> the FS.
>>
>> Pretty clever, no btrfs check --repair is a pretty good move.
>>
>>> I hope somebody here can give me advice on how to (hopefully)
>>> retrieve my data...
>>>
>>> Thanks in advance!
>>>
>>> ==
>>>
>>> [root@cornelis ~]# btrfs fi show
>>> Label: 'cornelis-btrfs'  uuid: ac643516-670e-40f3-aa4c-f329fc3795fd
>>> Total devices 1 FS bytes used 463.92GiB
>>> devid 1 size 800.00GiB used 493.02GiB path
>>> /dev/mapper/cornelis-cornelis--btrfs
>>>
>>> Label: 'data'  uuid: 4c66fa8b-8fc6-4bba-9d83-02a2a1d69ad5
>>> Total devices 20 FS bytes used 44.85TiB
>>> devid  1 size 3.64TiB used 3.64TiB path /dev/sdn2
>>> devid  2 size 3.64TiB used 3.64TiB path /dev/sdp2
>>> devid  3 size 3.64TiB used 3.64TiB path /dev/sdu2
>>> devid  4 size 3.64TiB used 3.64TiB path /dev/sdx2
>>> devid  5 size 3.64TiB used 3.64TiB path /dev/sdh2
>>> devid  6 size 3.64TiB used 3.64TiB path /dev/sdg2
>>> devid  7 size 3.64TiB used 3.64TiB path /dev/sdm2
>>> devid  8 size 3.64TiB used 3.64TiB path /dev/sdw2
>>> devid  9 size 3.64TiB used 3.64TiB path /dev/sdj2
>>> devid 10 size 3.64TiB used 3.64TiB path /dev/sdt2
>>> devid 11 size 3.64TiB used 3.64TiB path /dev/sdk2
>>> devid 12 size 3.64TiB used 3.64TiB path /dev/sdq2
>>> devid 13 size 3.64TiB used 3.64TiB path /dev/sds2
>>> devid 14 size 3.64TiB used 3.64TiB path /dev/sdf2
>>> devid 15 size 7.28TiB used 588.80GiB path /dev/sdr2
>>> devid 16 size 7.28TiB used 588.80GiB path /dev/sdo2
>>> devid 17 size 7.28TiB used 588.80GiB path /dev/sdv2
>>> devid 18 size 7.28TiB used 588.80GiB path /dev/sdi2
>>> devid 19 size 7.28TiB used 588.80GiB path /dev/sdl2
>>> devid 20 size 7.28TiB used 588.80GiB path /dev/sde2
>>>
>>> [root@cornelis ~]# mount /dev/sdn2 /mnt/data
>>> mount: /mnt/data: wrong fs type, bad option, bad superblock on
>>> /dev/sdn2, missing codepage or helper program, or other error.
>>
>> What is the dmesg of the mount failure?
>
> [Sun Dec 2 09:41:08 2018] BTRFS info (device sdn2): disk space caching
> is enabled
> [Sun Dec 2 09:41:08 2018] BTRFS inf
Re: Need help with potential ~45TB dataloss
02.12.2018 23:14, Patrick Dijkgraaf wrote:
> I have some additional info.
>
> I found the reason the FS got corrupted. It was a single failing drive,
> which caused the entire cabinet (containing 7 drives) to reset. So the
> FS suddenly lost 7 drives.
>

This remains a mystery for me. btrfs is marketed to be always consistent on disk - you either have the previous full transaction or the current full transaction. If the current transaction was interrupted, the promise is that you are left with the previous valid, consistent transaction.

Obviously this is not what happens in practice. Which nullifies the main selling point of btrfs.

Unless this is expected behavior, it sounds like some barriers are missing and summary data is updated before (and without waiting for) subordinate data. And if it is expected behavior ...

> I have removed the failed drive, so the RAID is now degraded. I hope
> the data is still recoverable... ☹
>
Re: Need help with potential ~45TB dataloss
I have some additional info.

I found the reason the FS got corrupted. It was a single failing drive, which caused the entire cabinet (containing 7 drives) to reset. So the FS suddenly lost 7 drives.

I have removed the failed drive, so the RAID is now degraded. I hope the data is still recoverable... ☹

-- 
Groet / Cheers,
Patrick Dijkgraaf

On Sun, 2018-12-02 at 10:03 +0100, Patrick Dijkgraaf wrote:
> Hi Qu,
> 
> Thanks for helping me!
> 
> Please see the responses in-line.
> Any suggestions based on this?
> 
> Thanks!
> 
> 
> On Sat, 2018-12-01 at 07:57 +0800, Qu Wenruo wrote:
> > On 2018/11/30 9:53 PM, Patrick Dijkgraaf wrote:
> > > Hi all,
> > > 
> > > I have been a happy BTRFS user for quite some time. But now I'm
> > > facing a potential ~45TB dataloss... :-(
> > > I hope someone can help!
> > > 
> > > I have Server A and Server B. Both having a 20-devices BTRFS
> > > RAID6 filesystem. Because of known RAID5/6 risks, Server B was a
> > > backup of Server A.
> > > After applying updates to server B and reboot, the FS would not
> > > mount anymore. Because it was "just" a backup, I decided to
> > > recreate the FS and perform a new backup. Later, I discovered
> > > that the FS was not broken, but I faced this issue:
> > > https://patchwork.kernel.org/patch/10694997/
> > > 
> > 
> > Sorry for the inconvenience.
> > 
> > I didn't realize the max_chunk_size limit isn't reliable at that
> > timing.
> 
> No problem, I should not have jumped to the conclusion to recreate
> the backup volume.
> 
> > > Anyway, the FS was already recreated, so I needed to do a new
> > > backup. During the backup (using rsync -vah), Server A (the
> > > source) encountered an I/O error and my rsync failed. In an
> > > attempt to "quick fix" the issue, I rebooted Server A, after
> > > which the FS would not mount anymore.
> > 
> > Did you have any dmesg about that IO error?
> 
> Yes there was. But I omitted capturing it... The system is now
> rebooted and I can't retrieve it anymore. :-(
> 
> > And how is the reboot scheduled? Forced power off or normal reboot
> > command?
> 
> The system was rebooted using a normal reboot command.
> 
> > > I documented what I have tried, below. I have not yet tried
> > > anything except what is shown, because I am afraid of causing
> > > more harm to the FS.
> > 
> > Pretty clever, no btrfs check --repair is a pretty good move.
> > 
> > > I hope somebody here can give me advice on how to (hopefully)
> > > retrieve my data...
> > > 
> > > Thanks in advance!
> > > 
> > > ==
> > > 
> > > [root@cornelis ~]# btrfs fi show
> > > Label: 'cornelis-btrfs'  uuid: ac643516-670e-40f3-aa4c-f329fc3795fd
> > > Total devices 1 FS bytes used 463.92GiB
> > > devid 1 size 800.00GiB used 493.02GiB path
> > > /dev/mapper/cornelis-cornelis--btrfs
> > > 
> > > Label: 'data'  uuid: 4c66fa8b-8fc6-4bba-9d83-02a2a1d69ad5
> > > Total devices 20 FS bytes used 44.85TiB
> > > devid  1 size 3.64TiB used 3.64TiB path /dev/sdn2
> > > devid  2 size 3.64TiB used 3.64TiB path /dev/sdp2
> > > devid  3 size 3.64TiB used 3.64TiB path /dev/sdu2
> > > devid  4 size 3.64TiB used 3.64TiB path /dev/sdx2
> > > devid  5 size 3.64TiB used 3.64TiB path /dev/sdh2
> > > devid  6 size 3.64TiB used 3.64TiB path /dev/sdg2
> > > devid  7 size 3.64TiB used 3.64TiB path /dev/sdm2
> > > devid  8 size 3.64TiB used 3.64TiB path /dev/sdw2
> > > devid  9 size 3.64TiB used 3.64TiB path /dev/sdj2
> > > devid 10 size 3.64TiB used 3.64TiB path /dev/sdt2
> > > devid 11 size 3.64TiB used 3.64TiB path /dev/sdk2
> > > devid 12 size 3.64TiB used 3.64TiB path /dev/sdq2
> > > devid 13 size 3.64TiB used 3.64TiB path /dev/sds2
> > > devid 14 size 3.64TiB used 3.64TiB path /dev/sdf2
> > > devid 15 size 7.28TiB used 588.80GiB path /dev/sdr2
> > > devid 16 size 7.28TiB used 588.80GiB path /dev/sdo2
> > > devid 17 size 7.28TiB used 588.80GiB path /dev/sdv2
> > > devid 18 size 7.28TiB used 588.80GiB path /dev/sdi2
> > > devid 19 size 7.28TiB used 588.80GiB path /dev/sdl2
> > > devid
Re: Need help with potential ~45TB dataloss
Hi Qu,

Thanks for helping me! Please see the responses in-line. Any suggestions based on this?

Thanks!

On Sat, 2018-12-01 at 07:57 +0800, Qu Wenruo wrote:
> On 2018/11/30 9:53 PM, Patrick Dijkgraaf wrote:
> > Hi all,
> >
> > I have been a happy BTRFS user for quite some time. But now I'm facing
> > a potential ~45TB data loss... :-(
> > I hope someone can help!
> >
> > I have Server A and Server B, both having a 20-device BTRFS RAID6
> > filesystem. Because of known RAID5/6 risks, Server B was a backup of
> > Server A.
> > After applying updates to Server B and rebooting, the FS would not
> > mount anymore. Because it was "just" a backup, I decided to recreate
> > the FS and perform a new backup. Later, I discovered that the FS was
> > not broken, but that I had hit this issue:
> > https://patchwork.kernel.org/patch/10694997/
>
> Sorry for the inconvenience.
>
> I didn't realize the max_chunk_size limit isn't reliable at that
> timing.

No problem, I should not have jumped to the conclusion to recreate the
backup volume.

> > Anyway, the FS was already recreated, so I needed to do a new backup.
> > During the backup (using rsync -vah), Server A (the source)
> > encountered an I/O error and my rsync failed. In an attempt to
> > "quick fix" the issue, I rebooted Server A, after which the FS would
> > not mount anymore.
>
> Did you have any dmesg about that IO error?

Yes there was, but I omitted capturing it... The system has since been
rebooted and I can't retrieve it anymore. :-(

> And how was the reboot scheduled? Forced power off or normal reboot
> command?

The system was rebooted using a normal reboot command.

> > I documented what I have tried, below. I have not yet tried anything
> > except what is shown, because I am afraid of causing more harm to
> > the FS.
>
> Pretty clever, no btrfs check --repair is a pretty good move.

> > I hope somebody here can give me advice on how to (hopefully)
> > retrieve my data...
> >
> > Thanks in advance!
> >
> > ==
> >
> > [root@cornelis ~]# btrfs fi show
> > Label: 'cornelis-btrfs'  uuid: ac643516-670e-40f3-aa4c-f329fc3795fd
> >     Total devices 1 FS bytes used 463.92GiB
> >     devid  1 size 800.00GiB used 493.02GiB path /dev/mapper/cornelis-cornelis--btrfs
> >
> > Label: 'data'  uuid: 4c66fa8b-8fc6-4bba-9d83-02a2a1d69ad5
> >     Total devices 20 FS bytes used 44.85TiB
> >     devid  1 size 3.64TiB used 3.64TiB path /dev/sdn2
> >     devid  2 size 3.64TiB used 3.64TiB path /dev/sdp2
> >     devid  3 size 3.64TiB used 3.64TiB path /dev/sdu2
> >     devid  4 size 3.64TiB used 3.64TiB path /dev/sdx2
> >     devid  5 size 3.64TiB used 3.64TiB path /dev/sdh2
> >     devid  6 size 3.64TiB used 3.64TiB path /dev/sdg2
> >     devid  7 size 3.64TiB used 3.64TiB path /dev/sdm2
> >     devid  8 size 3.64TiB used 3.64TiB path /dev/sdw2
> >     devid  9 size 3.64TiB used 3.64TiB path /dev/sdj2
> >     devid 10 size 3.64TiB used 3.64TiB path /dev/sdt2
> >     devid 11 size 3.64TiB used 3.64TiB path /dev/sdk2
> >     devid 12 size 3.64TiB used 3.64TiB path /dev/sdq2
> >     devid 13 size 3.64TiB used 3.64TiB path /dev/sds2
> >     devid 14 size 3.64TiB used 3.64TiB path /dev/sdf2
> >     devid 15 size 7.28TiB used 588.80GiB path /dev/sdr2
> >     devid 16 size 7.28TiB used 588.80GiB path /dev/sdo2
> >     devid 17 size 7.28TiB used 588.80GiB path /dev/sdv2
> >     devid 18 size 7.28TiB used 588.80GiB path /dev/sdi2
> >     devid 19 size 7.28TiB used 588.80GiB path /dev/sdl2
> >     devid 20 size 7.28TiB used 588.80GiB path /dev/sde2
> >
> > [root@cornelis ~]# mount /dev/sdn2 /mnt/data
> > mount: /mnt/data: wrong fs type, bad option, bad superblock on
> > /dev/sdn2, missing codepage or helper program, or other error.
>
> What is the dmesg of the mount failure?

[Sun Dec  2 09:41:08 2018] BTRFS info (device sdn2): disk space caching is enabled
[Sun Dec  2 09:41:08 2018] BTRFS info (device sdn2): has skinny extents
[Sun Dec  2 09:41:08 2018] BTRFS error (device sdn2): parent transid verify failed on 46451963543552 wanted 114401 found 114173
[Sun Dec  2 09:41:08 2018] BTRFS critical (device sdn2): corrupt leaf: root=2 block=46451963543552 slot=0, unexpected item end, have 1387359977 expect 16283
[Sun Dec  2 09:41:08 2018] BTRFS warning (device sdn2): failed to read tree root
[Sun Dec  2 09:41:08 2018] BTRFS error (device sdn2): open_ctree failed

> And have you tried -o ro,degraded?
Re: Need help with potential ~45TB dataloss
On 2018/11/30 9:53 PM, Patrick Dijkgraaf wrote:
> Hi all,
>
> I have been a happy BTRFS user for quite some time. But now I'm facing
> a potential ~45TB data loss... :-(
> I hope someone can help!
>
> I have Server A and Server B, both having a 20-device BTRFS RAID6
> filesystem. Because of known RAID5/6 risks, Server B was a backup of
> Server A.
> After applying updates to Server B and rebooting, the FS would not
> mount anymore. Because it was "just" a backup, I decided to recreate
> the FS and perform a new backup. Later, I discovered that the FS was
> not broken, but that I had hit this issue:
> https://patchwork.kernel.org/patch/10694997/

Sorry for the inconvenience.

I didn't realize the max_chunk_size limit isn't reliable at that timing.

> Anyway, the FS was already recreated, so I needed to do a new backup.
> During the backup (using rsync -vah), Server A (the source) encountered
> an I/O error and my rsync failed. In an attempt to "quick fix" the
> issue, I rebooted Server A, after which the FS would not mount anymore.

Did you have any dmesg about that IO error?

And how was the reboot scheduled? Forced power off or normal reboot
command?

> I documented what I have tried, below. I have not yet tried anything
> except what is shown, because I am afraid of causing more harm to
> the FS.

Pretty clever, no btrfs check --repair is a pretty good move.

> I hope somebody here can give me advice on how to (hopefully)
> retrieve my data...
>
> Thanks in advance!
>
> ==
>
> [root@cornelis ~]# btrfs fi show
> Label: 'cornelis-btrfs'  uuid: ac643516-670e-40f3-aa4c-f329fc3795fd
>     Total devices 1 FS bytes used 463.92GiB
>     devid  1 size 800.00GiB used 493.02GiB path /dev/mapper/cornelis-cornelis--btrfs
>
> Label: 'data'  uuid: 4c66fa8b-8fc6-4bba-9d83-02a2a1d69ad5
>     Total devices 20 FS bytes used 44.85TiB
>     devid  1 size 3.64TiB used 3.64TiB path /dev/sdn2
>     devid  2 size 3.64TiB used 3.64TiB path /dev/sdp2
>     devid  3 size 3.64TiB used 3.64TiB path /dev/sdu2
>     devid  4 size 3.64TiB used 3.64TiB path /dev/sdx2
>     devid  5 size 3.64TiB used 3.64TiB path /dev/sdh2
>     devid  6 size 3.64TiB used 3.64TiB path /dev/sdg2
>     devid  7 size 3.64TiB used 3.64TiB path /dev/sdm2
>     devid  8 size 3.64TiB used 3.64TiB path /dev/sdw2
>     devid  9 size 3.64TiB used 3.64TiB path /dev/sdj2
>     devid 10 size 3.64TiB used 3.64TiB path /dev/sdt2
>     devid 11 size 3.64TiB used 3.64TiB path /dev/sdk2
>     devid 12 size 3.64TiB used 3.64TiB path /dev/sdq2
>     devid 13 size 3.64TiB used 3.64TiB path /dev/sds2
>     devid 14 size 3.64TiB used 3.64TiB path /dev/sdf2
>     devid 15 size 7.28TiB used 588.80GiB path /dev/sdr2
>     devid 16 size 7.28TiB used 588.80GiB path /dev/sdo2
>     devid 17 size 7.28TiB used 588.80GiB path /dev/sdv2
>     devid 18 size 7.28TiB used 588.80GiB path /dev/sdi2
>     devid 19 size 7.28TiB used 588.80GiB path /dev/sdl2
>     devid 20 size 7.28TiB used 588.80GiB path /dev/sde2
>
> [root@cornelis ~]# mount /dev/sdn2 /mnt/data
> mount: /mnt/data: wrong fs type, bad option, bad superblock on
> /dev/sdn2, missing codepage or helper program, or other error.

What is the dmesg of the mount failure?

And have you tried -o ro,degraded ?

> [root@cornelis ~]# btrfs check /dev/sdn2
> Opening filesystem to check...
> parent transid verify failed on 46451963543552 wanted 114401 found 114173
> parent transid verify failed on 46451963543552 wanted 114401 found 114173
> checksum verify failed on 46451963543552 found A8F2A769 wanted 4C111ADF
> checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> bad tree block 46451963543552, bytenr mismatch, want=46451963543552,
> have=75208089814272
> Couldn't read tree root

Would you please also paste the output of "btrfs ins dump-super /dev/sdn2"?

It looks like your tree root (or at least some tree root nodes/leaves) got corrupted.

> ERROR: cannot open file system

And since it's your tree root that is corrupted, you could also try
"btrfs-find-root <device>" to try to get a good old copy of your tree root.
But I suspect the corruption happened before you noticed, thus the old tree
root may not help much.

Also, the output of "btrfs ins dump-tree -t root <device>" will help.

Thanks,
Qu

> [root@cornelis ~]# btrfs restore /dev/sdn2 /mnt/data/
> parent transid verify failed on 46451963543552 wanted 114401 found 114173
> parent transid verify failed on 46451963543552 wanted 114401 found 114173
> checksum verify failed on 464
Need help with potential ~45TB dataloss
Hi all,

I have been a happy BTRFS user for quite some time. But now I'm facing a
potential ~45TB data loss... :-( I hope someone can help!

I have Server A and Server B, both having a 20-device BTRFS RAID6
filesystem. Because of known RAID5/6 risks, Server B was a backup of
Server A.

After applying updates to Server B and rebooting, the FS would not mount
anymore. Because it was "just" a backup, I decided to recreate the FS and
perform a new backup. Later, I discovered that the FS was not broken, but
that I had hit this issue:
https://patchwork.kernel.org/patch/10694997/

Anyway, the FS was already recreated, so I needed to do a new backup.
During the backup (using rsync -vah), Server A (the source) encountered an
I/O error and my rsync failed. In an attempt to "quick fix" the issue, I
rebooted Server A, after which the FS would not mount anymore.

I documented what I have tried, below. I have not yet tried anything
except what is shown, because I am afraid of causing more harm to the FS.

I hope somebody here can give me advice on how to (hopefully) retrieve my
data...

Thanks in advance!
==

[root@cornelis ~]# btrfs fi show
Label: 'cornelis-btrfs'  uuid: ac643516-670e-40f3-aa4c-f329fc3795fd
    Total devices 1 FS bytes used 463.92GiB
    devid  1 size 800.00GiB used 493.02GiB path /dev/mapper/cornelis-cornelis--btrfs

Label: 'data'  uuid: 4c66fa8b-8fc6-4bba-9d83-02a2a1d69ad5
    Total devices 20 FS bytes used 44.85TiB
    devid  1 size 3.64TiB used 3.64TiB path /dev/sdn2
    devid  2 size 3.64TiB used 3.64TiB path /dev/sdp2
    devid  3 size 3.64TiB used 3.64TiB path /dev/sdu2
    devid  4 size 3.64TiB used 3.64TiB path /dev/sdx2
    devid  5 size 3.64TiB used 3.64TiB path /dev/sdh2
    devid  6 size 3.64TiB used 3.64TiB path /dev/sdg2
    devid  7 size 3.64TiB used 3.64TiB path /dev/sdm2
    devid  8 size 3.64TiB used 3.64TiB path /dev/sdw2
    devid  9 size 3.64TiB used 3.64TiB path /dev/sdj2
    devid 10 size 3.64TiB used 3.64TiB path /dev/sdt2
    devid 11 size 3.64TiB used 3.64TiB path /dev/sdk2
    devid 12 size 3.64TiB used 3.64TiB path /dev/sdq2
    devid 13 size 3.64TiB used 3.64TiB path /dev/sds2
    devid 14 size 3.64TiB used 3.64TiB path /dev/sdf2
    devid 15 size 7.28TiB used 588.80GiB path /dev/sdr2
    devid 16 size 7.28TiB used 588.80GiB path /dev/sdo2
    devid 17 size 7.28TiB used 588.80GiB path /dev/sdv2
    devid 18 size 7.28TiB used 588.80GiB path /dev/sdi2
    devid 19 size 7.28TiB used 588.80GiB path /dev/sdl2
    devid 20 size 7.28TiB used 588.80GiB path /dev/sde2

[root@cornelis ~]# mount /dev/sdn2 /mnt/data
mount: /mnt/data: wrong fs type, bad option, bad superblock on /dev/sdn2,
missing codepage or helper program, or other error.

[root@cornelis ~]# btrfs check /dev/sdn2
Opening filesystem to check...
parent transid verify failed on 46451963543552 wanted 114401 found 114173
parent transid verify failed on 46451963543552 wanted 114401 found 114173
checksum verify failed on 46451963543552 found A8F2A769 wanted 4C111ADF
checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
bad tree block 46451963543552, bytenr mismatch, want=46451963543552,
have=75208089814272
Couldn't read tree root
ERROR: cannot open file system

[root@cornelis ~]# btrfs restore /dev/sdn2 /mnt/data/
parent transid verify failed on 46451963543552 wanted 114401 found 114173
parent transid verify failed on 46451963543552 wanted 114401 found 114173
checksum verify failed on 46451963543552 found A8F2A769 wanted 4C111ADF
checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
bad tree block 46451963543552, bytenr mismatch, want=46451963543552,
have=75208089814272
Couldn't read tree root
Could not open root, trying backup super
warning, device 14 is missing
warning, device 13 is missing
warning, device 12 is missing
warning, device 11 is missing
warning, device 10 is missing
warning, device 9 is missing
warning, device 8 is missing
warning, device 7 is missing
warning, device 6 is missing
warning, device 5 is missing
warning, device 4 is missing
warning, device 3 is missing
warning, device 2 is missing
checksum verify failed on 22085632 found 5630EA32 wanted 1AA6FFF0
checksum verify failed on 22085632 found 5630EA32 wanted 1AA6FFF0
bad tree block 22085632, bytenr mismatch, want=22085632, have=1147797504
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 14 is missing
warning, device 13 is missing
warning, device 12 is missing
warning, device 11 is missing
warning, device 10 is missing
warning, device 9 is missing
warning, device 8 is missing
warning, device 7 is missing
warning, device 6 is missing
warning, device 5 is missing
[PATCH v2 15/20] btrfs-progs: sub list: Update help message of -d option
Explicitly state that -d requires root privileges. Also, update some
option handling with regard to the -d option.

Signed-off-by: Misono Tomohiro
---
 Documentation/btrfs-subvolume.asciidoc | 3 ++-
 cmds-subvolume.c                       | 8 ++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/Documentation/btrfs-subvolume.asciidoc b/Documentation/btrfs-subvolume.asciidoc
index 0381c92c..2db1d479 100644
--- a/Documentation/btrfs-subvolume.asciidoc
+++ b/Documentation/btrfs-subvolume.asciidoc
@@ -149,7 +149,8 @@
 only snapshot subvolumes in the filesystem will be listed.
 -r
 only readonly subvolumes in the filesystem will be listed.
 -d
-list deleted subvolumes that are not yet cleaned.
+list deleted subvolumes that are not yet cleaned
+(require root privileges).
 Other;;
 -t
diff --git a/cmds-subvolume.c b/cmds-subvolume.c
index 552c6dea..ef39789a 100644
--- a/cmds-subvolume.c
+++ b/cmds-subvolume.c
@@ -1569,6 +1569,7 @@ static const char * const cmd_subvol_list_usage[] = {
 	"-s           list only snapshots",
 	"-r           list readonly subvolumes (including snapshots)",
 	"-d           list deleted subvolumes that are not yet cleaned",
+	"             (require root privileges)",
 	"",
 	"Other:",
 	"-t           print the result as a table",
@@ -1744,6 +1745,13 @@ static int cmd_subvol_list(int argc, char **argv)
 		goto out;
 	}
 
+	if (filter_set->only_deleted &&
+	    (is_list_all || absolute_path || follow_mount)) {
+		ret = -1;
+		error("cannot use -d with -a/f/A option");
+		goto out;
+	}
+
 	subvol = argv[optind];
 	fd = btrfs_open_dir(subvol, &dirstream, 1);
 	if (fd < 0) {
-- 
2.14.4
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 14/20] btrfs-progs: sub list: Update help message of -o option
Currently "sub list -o" lists only child subvolumes of the specified
path. So, update the help message and variable name more appropriately.

Signed-off-by: Misono Tomohiro
---
 Documentation/btrfs-subvolume.asciidoc |  2 +-
 cmds-subvolume.c                       | 10 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/Documentation/btrfs-subvolume.asciidoc b/Documentation/btrfs-subvolume.asciidoc
index 20fae1e1..0381c92c 100644
--- a/Documentation/btrfs-subvolume.asciidoc
+++ b/Documentation/btrfs-subvolume.asciidoc
@@ -116,7 +116,7 @@
 or at mount time via the subvolid= option.
 +
 Path filtering;;
 -o
-print only subvolumes below specified <path>.
+print only subvolumes which the subvolume of <path> contains.
 -a
 print all the subvolumes in the filesystem, including
 subvolumes which cannot be accessed from current mount point.
diff --git a/cmds-subvolume.c b/cmds-subvolume.c
index dab266aa..552c6dea 100644
--- a/cmds-subvolume.c
+++ b/cmds-subvolume.c
@@ -1550,7 +1550,7 @@ static const char * const cmd_subvol_list_usage[] = {
 	"It is possible to specify non-subvolume directory as <path>.",
 	"",
 	"Path filtering:",
-	"-o           print only subvolumes below specified path",
+	"-o           print only subvolumes which the subvolume of <path> contains",
 	"-a           print all the subvolumes in the filesystem.",
 	"             path to be shown is relative to the top-level",
 	"             subvolume (require root privileges)",
@@ -1605,7 +1605,7 @@ static int cmd_subvol_list(int argc, char **argv)
 	int follow_mount = 0;
 	int sort = 0;
 	int no_sort = 0;
-	int is_only_in_path = 0;
+	int is_only_child = 0;
 	int absolute_path = 0;
 	DIR *dirstream = NULL;
 	enum btrfs_list_layout layout = BTRFS_LIST_LAYOUT_DEFAULT;
@@ -1651,7 +1651,7 @@ static int cmd_subvol_list(int argc, char **argv)
 		btrfs_list_setup_print_column_v2(BTRFS_LIST_GENERATION);
 		break;
 	case 'o':
-		is_only_in_path = 1;
+		is_only_child = 1;
 		break;
 	case 't':
 		layout = BTRFS_LIST_LAYOUT_TABLE;
@@ -1732,7 +1732,7 @@ static int cmd_subvol_list(int argc, char **argv)
 		goto out;
 	}
 
-	if (follow_mount && (is_list_all || is_only_in_path)) {
+	if (follow_mount && (is_list_all || is_only_child)) {
 		ret = -1;
 		error("cannot use -f with -a or -o option");
 		goto out;
@@ -1760,7 +1760,7 @@ static int cmd_subvol_list(int argc, char **argv)
 	if (ret)
 		goto out;
 
-	if (is_only_in_path)
+	if (is_only_child)
 		btrfs_list_setup_filter_v2(&filter_set,
 					   BTRFS_LIST_FILTER_TOPID_EQUAL,
 					   top_id);
-- 
2.14.4
[PATCH 10/18] btrfs-progs: reorder placement of help declarations for send/receive
From: Jeff Mahoney <je...@suse.com>

The usage definitions for send and receive follow the command
definitions, which use them. This works because we declare them in
commands.h. When we move to using cmd_struct as the entry point, these
declarations will be removed, breaking the commands. Since that would be
an otherwise unrelated change, this patch reorders them separately.

Signed-off-by: Jeff Mahoney <je...@suse.com>
---
 cmds-receive.c | 62 ++--
 cmds-send.c    | 69 +-
 2 files changed, 66 insertions(+), 65 deletions(-)

diff --git a/cmds-receive.c b/cmds-receive.c
index 68123a31..b3709f36 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -1248,6 +1248,37 @@ out:
 	return ret;
 }
 
+const char * const cmd_receive_usage[] = {
+	"btrfs receive [options] \n"
+	"btrfs receive --dump [options]",
+	"Receive subvolumes from a stream",
+	"Receives one or more subvolumes that were previously",
+	"sent with btrfs send. The received subvolumes are stored",
+	"into MOUNT.",
+	"The receive will fail in case the receiving subvolume",
+	"already exists. It will also fail in case a previously",
+	"received subvolume has been changed after it was received.",
+	"After receiving a subvolume, it is immediately set to",
+	"read-only.",
+	"",
+	"-v           increase verbosity about performed actions",
+	"-f FILE      read the stream from FILE instead of stdin",
+	"-e           terminate after receiving an marker in the stream.",
+	"             Without this option the receiver side terminates only in case",
+	"             of an error on end of file.",
+	"-C|--chroot  confine the process to using chroot",
+	"-E|--max-errors NERR",
+	"             terminate as soon as NERR errors occur while",
+	"             stream processing commands from the stream.",
+	"             Default value is 1. A value of 0 means no limit.",
+	"-m ROOTMOUNT the root mount point of the destination filesystem.",
+	"             If /proc is not accessible, use this to tell us where",
+	"             this file system is mounted.",
+	"--dump       dump stream metadata, one line per operation,",
+	"             does not require the MOUNT parameter",
+	NULL
+};
+
 int cmd_receive(int argc, char **argv)
 {
 	char *tomnt = NULL;
@@ -1357,34 +1388,3 @@ out:
 
 	return !!ret;
 }
-
-const char * const cmd_receive_usage[] = {
-	"btrfs receive [options] \n"
-	"btrfs receive --dump [options]",
-	"Receive subvolumes from a stream",
-	"Receives one or more subvolumes that were previously",
-	"sent with btrfs send. The received subvolumes are stored",
-	"into MOUNT.",
-	"The receive will fail in case the receiving subvolume",
-	"already exists. It will also fail in case a previously",
-	"received subvolume has been changed after it was received.",
-	"After receiving a subvolume, it is immediately set to",
-	"read-only.",
-	"",
-	"-v           increase verbosity about performed actions",
-	"-f FILE      read the stream from FILE instead of stdin",
-	"-e           terminate after receiving an marker in the stream.",
-	"             Without this option the receiver side terminates only in case",
-	"             of an error on end of file.",
-	"-C|--chroot  confine the process to using chroot",
-	"-E|--max-errors NERR",
-	"             terminate as soon as NERR errors occur while",
-	"             stream processing commands from the stream.",
-	"             Default value is 1. A value of 0 means no limit.",
-	"-m ROOTMOUNT the root mount point of the destination filesystem.",
-	"             If /proc is not accessible, use this to tell us where",
-	"             this file system is mounted.",
-	"--dump       dump stream metadata, one line per operation,",
-	"             does not require the MOUNT parameter",
-	NULL
-};
diff --git a/cmds-send.c b/cmds-send.c
index c5ecdaa1..8365e9c9 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -489,6 +489,41 @@ static void free_send_info(struct btrfs_send *sctx)
 	subvol_uuid_search_finit(&sctx->sus);
 }
 
+
+const char * const cmd_send_usage[] = {
+	"btrfs send [-ve] [-p ] [-c ] [-f ] [...]",
+	"Send the subvolume(s) to stdout.",
+	"Sends the subvolume(s) specified by to stdout.",
+	" should be read-only here.",
+	"By default, this will send the whole subvolume. To do an incremental",
+	"send, use '-p '. If you want to allow btrfs to clone from",
+	"any additional local snapshots, use '-c ' (multiple times",
[PATCH 09/18] btrfs-progs: help: convert ints used as bools to bool
From: Jeff Mahoney <je...@suse.com>

We use an int for 'full', 'all', and 'err' when we really mean a boolean.

Reviewed-by: Qu Wenruo <w...@suse.com>
Signed-off-by: Jeff Mahoney <je...@suse.com>
---
 btrfs.c | 14 +++---
 help.c  | 25 +
 help.h  |  4 ++--
 3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/btrfs.c b/btrfs.c
index 2d39f2ce..fec1a135 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -109,7 +109,7 @@ static void handle_help_options_next_level(const struct cmd_struct *cmd,
 		argv++;
 		help_command_group(cmd->next, argc, argv);
 	} else {
-		usage_command(cmd, 1, 0);
+		usage_command(cmd, true, false);
 	}
 
 	exit(0);
@@ -125,7 +125,7 @@ int handle_command_group(const struct cmd_group *grp, int argc,
 	argc--;
 	argv++;
 	if (argc < 1) {
-		usage_command_group(grp, 0, 0);
+		usage_command_group(grp, false, false);
 		exit(1);
 	}
 
@@ -212,20 +212,20 @@ static int handle_global_options(int argc, char **argv)
 
 void handle_special_globals(int shift, int argc, char **argv)
 {
-	int has_help = 0;
-	int has_full = 0;
+	bool has_help = false;
+	bool has_full = false;
 	int i;
 
 	for (i = 0; i < shift; i++) {
 		if (strcmp(argv[i], "--help") == 0)
-			has_help = 1;
+			has_help = true;
 		else if (strcmp(argv[i], "--full") == 0)
-			has_full = 1;
+			has_full = true;
 	}
 
 	if (has_help) {
 		if (has_full)
-			usage_command_group(&btrfs_cmd_group, 1, 0);
+			usage_command_group(&btrfs_cmd_group, true, false);
 		else
 			cmd_help(argc, argv);
 		exit(0);
diff --git a/help.c b/help.c
index f1dd3946..99fd325b 100644
--- a/help.c
+++ b/help.c
@@ -196,8 +196,8 @@ static int do_usage_one_command(const char * const *usagestr,
 }
 
 static int usage_command_internal(const char * const *usagestr,
-				  const char *token, int full, int lst,
-				  int alias, FILE *outf)
+				  const char *token, bool full, bool lst,
+				  bool alias, FILE *outf)
 {
 	unsigned int flags = 0;
 	int ret;
@@ -223,17 +223,17 @@ static int usage_command_internal(const char * const *usagestr,
 }
 
 static void usage_command_usagestr(const char * const *usagestr,
-				   const char *token, int full, int err)
+				   const char *token, bool full, bool err)
 {
 	FILE *outf = err ? stderr : stdout;
 	int ret;
 
-	ret = usage_command_internal(usagestr, token, full, 0, 0, outf);
+	ret = usage_command_internal(usagestr, token, full, false, false, outf);
 	if (!ret)
 		fputc('\n', outf);
 }
 
-void usage_command(const struct cmd_struct *cmd, int full, int err)
+void usage_command(const struct cmd_struct *cmd, bool full, bool err)
 {
 	usage_command_usagestr(cmd->usagestr, cmd->token, full, err);
 }
@@ -241,11 +241,11 @@ void usage_command(const struct cmd_struct *cmd, int full, int err)
 __attribute__((noreturn))
 void usage(const char * const *usagestr)
 {
-	usage_command_usagestr(usagestr, NULL, 1, 1);
+	usage_command_usagestr(usagestr, NULL, true, true);
 	exit(1);
 }
 
-static void usage_command_group_internal(const struct cmd_group *grp, int full,
+static void usage_command_group_internal(const struct cmd_group *grp, bool full,
 					 FILE *outf)
 {
 	const struct cmd_struct *cmd = grp->commands;
@@ -265,7 +265,8 @@ static void usage_command_group_internal(const struct cmd_group *grp, int full,
 	}
 
 	usage_command_internal(cmd->usagestr, cmd->token, full,
-			       1, cmd->flags & CMD_ALIAS, outf);
+			       true, cmd->flags & CMD_ALIAS,
+			       outf);
 	if (cmd->flags & CMD_ALIAS)
 		putchar('\n');
 	continue;
@@ -327,7 +328,7 @@ void usage_command_group_short(const struct cmd_group *grp)
 	fprintf(stderr, "All command groups have their manual page named 'btrfs-'.\n");
 }
 
-void usage_command_group(const struct cmd_group *grp, int full, int err)
+void usage_command_group(const struct cmd_group *grp, bool full, bool err)
 {
 	const char * const *usagestr = grp->usagestr;
 	FILE *outf = err ? stderr : stdout;
@@ -350,7 +351,7 @@ __attribute__((nore
Re: [PATCH 10/20] btrfs-progs: help: convert ints used as bools to bool
On 2018-03-08 10:40, je...@suse.com wrote:
> From: Jeff Mahoney <je...@suse.com>
>
> We use an int for 'full', 'all', and 'err' when we really mean a boolean.
>
> Signed-off-by: Jeff Mahoney <je...@suse.com>

Reviewed-by: Qu Wenruo <w...@suse.com>

Thanks,
Qu

> ---
>  btrfs.c | 14 +++---
>  help.c  | 25 +
>  help.h  |  4 ++--
>  3 files changed, 22 insertions(+), 21 deletions(-)
>
> diff --git a/btrfs.c b/btrfs.c
> index 2d39f2ce..fec1a135 100644
> --- a/btrfs.c
> +++ b/btrfs.c
> @@ -109,7 +109,7 @@ static void handle_help_options_next_level(const struct cmd_struct *cmd,
>  		argv++;
>  		help_command_group(cmd->next, argc, argv);
>  	} else {
> -		usage_command(cmd, 1, 0);
> +		usage_command(cmd, true, false);
>  	}
>
>  	exit(0);
> @@ -125,7 +125,7 @@ int handle_command_group(const struct cmd_group *grp, int argc,
>  	argc--;
>  	argv++;
>  	if (argc < 1) {
> -		usage_command_group(grp, 0, 0);
> +		usage_command_group(grp, false, false);
>  		exit(1);
>  	}
>
> @@ -212,20 +212,20 @@ static int handle_global_options(int argc, char **argv)
>
>  void handle_special_globals(int shift, int argc, char **argv)
>  {
> -	int has_help = 0;
> -	int has_full = 0;
> +	bool has_help = false;
> +	bool has_full = false;
>  	int i;
>
>  	for (i = 0; i < shift; i++) {
>  		if (strcmp(argv[i], "--help") == 0)
> -			has_help = 1;
> +			has_help = true;
>  		else if (strcmp(argv[i], "--full") == 0)
> -			has_full = 1;
> +			has_full = true;
>  	}
>
>  	if (has_help) {
>  		if (has_full)
> -			usage_command_group(&btrfs_cmd_group, 1, 0);
> +			usage_command_group(&btrfs_cmd_group, true, false);
>  		else
>  			cmd_help(argc, argv);
>  		exit(0);
> diff --git a/help.c b/help.c
> index 311a4320..ef7986b4 100644
> --- a/help.c
> +++ b/help.c
> @@ -196,8 +196,8 @@ static int do_usage_one_command(const char * const *usagestr,
>  }
>
>  static int usage_command_internal(const char * const *usagestr,
> -				  const char *token, int full, int lst,
> -				  int alias, FILE *outf)
> +				  const char *token, bool full, bool lst,
> +				  bool alias, FILE *outf)
>  {
>  	unsigned int flags = 0;
>  	int ret;
> @@ -223,17 +223,17 @@ static int usage_command_internal(const char * const *usagestr,
>  }
>
>  static void usage_command_usagestr(const char * const *usagestr,
> -				   const char *token, int full, int err)
> +				   const char *token, bool full, bool err)
>  {
>  	FILE *outf = err ? stderr : stdout;
>  	int ret;
>
> -	ret = usage_command_internal(usagestr, token, full, 0, 0, outf);
> +	ret = usage_command_internal(usagestr, token, full, false, false, outf);
>  	if (!ret)
>  		fputc('\n', outf);
>  }
>
> -void usage_command(const struct cmd_struct *cmd, int full, int err)
> +void usage_command(const struct cmd_struct *cmd, bool full, bool err)
>  {
>  	usage_command_usagestr(cmd->usagestr, cmd->token, full, err);
>  }
> @@ -241,11 +241,11 @@ void usage_command(const struct cmd_struct *cmd, int full, int err)
>  __attribute__((noreturn))
>  void usage(const char * const *usagestr)
>  {
> -	usage_command_usagestr(usagestr, NULL, 1, 1);
> +	usage_command_usagestr(usagestr, NULL, true, true);
>  	exit(1);
>  }
>
> -static void usage_command_group_internal(const struct cmd_group *grp, int full,
> +static void usage_command_group_internal(const struct cmd_group *grp, bool full,
>  					 FILE *outf)
>  {
>  	const struct cmd_struct *cmd = grp->commands;
> @@ -265,7 +265,8 @@ static void usage_command_group_internal(const struct cmd_group *grp, int full,
>  	}
>
>  	usage_command_internal(cmd->usagestr, cmd->token, full,
> -			       1, cmd->flags & CMD_ALIAS, outf);
> +			       true, cmd->flags & CMD_ALIAS,
> +			       outf);
>  	if (cmd->flags & CMD_
[PATCH 11/20] btrfs-progs: reorder placement of help declarations for send/receive
From: Jeff Mahoney <je...@suse.com>

The usage definitions for send and receive follow the command
definitions, which use them. This works because we declare them in
commands.h. When we move to using cmd_struct as the entry point, these
declarations will be removed, breaking the commands. Since that would be
an otherwise unrelated change, this patch reorders them separately.

Signed-off-by: Jeff Mahoney <je...@suse.com>
---
 cmds-receive.c | 62 ++--
 cmds-send.c    | 69 +-
 2 files changed, 66 insertions(+), 65 deletions(-)

diff --git a/cmds-receive.c b/cmds-receive.c
index 68123a31..b3709f36 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -1248,6 +1248,37 @@ out:
 	return ret;
 }
 
+const char * const cmd_receive_usage[] = {
+	"btrfs receive [options] \n"
+	"btrfs receive --dump [options]",
+	"Receive subvolumes from a stream",
+	"Receives one or more subvolumes that were previously",
+	"sent with btrfs send. The received subvolumes are stored",
+	"into MOUNT.",
+	"The receive will fail in case the receiving subvolume",
+	"already exists. It will also fail in case a previously",
+	"received subvolume has been changed after it was received.",
+	"After receiving a subvolume, it is immediately set to",
+	"read-only.",
+	"",
+	"-v           increase verbosity about performed actions",
+	"-f FILE      read the stream from FILE instead of stdin",
+	"-e           terminate after receiving an marker in the stream.",
+	"             Without this option the receiver side terminates only in case",
+	"             of an error on end of file.",
+	"-C|--chroot  confine the process to using chroot",
+	"-E|--max-errors NERR",
+	"             terminate as soon as NERR errors occur while",
+	"             stream processing commands from the stream.",
+	"             Default value is 1. A value of 0 means no limit.",
+	"-m ROOTMOUNT the root mount point of the destination filesystem.",
+	"             If /proc is not accessible, use this to tell us where",
+	"             this file system is mounted.",
+	"--dump       dump stream metadata, one line per operation,",
+	"             does not require the MOUNT parameter",
+	NULL
+};
+
 int cmd_receive(int argc, char **argv)
 {
 	char *tomnt = NULL;
@@ -1357,34 +1388,3 @@ out:
 
 	return !!ret;
 }
-
-const char * const cmd_receive_usage[] = {
-	"btrfs receive [options] \n"
-	"btrfs receive --dump [options]",
-	"Receive subvolumes from a stream",
-	"Receives one or more subvolumes that were previously",
-	"sent with btrfs send. The received subvolumes are stored",
-	"into MOUNT.",
-	"The receive will fail in case the receiving subvolume",
-	"already exists. It will also fail in case a previously",
-	"received subvolume has been changed after it was received.",
-	"After receiving a subvolume, it is immediately set to",
-	"read-only.",
-	"",
-	"-v           increase verbosity about performed actions",
-	"-f FILE      read the stream from FILE instead of stdin",
-	"-e           terminate after receiving an marker in the stream.",
-	"             Without this option the receiver side terminates only in case",
-	"             of an error on end of file.",
-	"-C|--chroot  confine the process to using chroot",
-	"-E|--max-errors NERR",
-	"             terminate as soon as NERR errors occur while",
-	"             stream processing commands from the stream.",
-	"             Default value is 1. A value of 0 means no limit.",
-	"-m ROOTMOUNT the root mount point of the destination filesystem.",
-	"             If /proc is not accessible, use this to tell us where",
-	"             this file system is mounted.",
-	"--dump       dump stream metadata, one line per operation,",
-	"             does not require the MOUNT parameter",
-	NULL
-};
diff --git a/cmds-send.c b/cmds-send.c
index c5ecdaa1..8365e9c9 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -489,6 +489,41 @@ static void free_send_info(struct btrfs_send *sctx)
 	subvol_uuid_search_finit(&sctx->sus);
 }
 
+
+const char * const cmd_send_usage[] = {
+	"btrfs send [-ve] [-p ] [-c ] [-f ] [...]",
+	"Send the subvolume(s) to stdout.",
+	"Sends the subvolume(s) specified by to stdout.",
+	" should be read-only here.",
+	"By default, this will send the whole subvolume. To do an incremental",
+	"send, use '-p '. If you want to allow btrfs to clone from",
+	"any additional local snapshots, use '-c ' (multiple times",
[PATCH 10/20] btrfs-progs: help: convert ints used as bools to bool
From: Jeff Mahoney <je...@suse.com> We use an int for 'full', 'all', and 'err' when we really mean a boolean. Signed-off-by: Jeff Mahoney <je...@suse.com> --- btrfs.c | 14 +++--- help.c | 25 + help.h | 4 ++-- 3 files changed, 22 insertions(+), 21 deletions(-) diff --git a/btrfs.c b/btrfs.c index 2d39f2ce..fec1a135 100644 --- a/btrfs.c +++ b/btrfs.c @@ -109,7 +109,7 @@ static void handle_help_options_next_level(const struct cmd_struct *cmd, argv++; help_command_group(cmd->next, argc, argv); } else { - usage_command(cmd, 1, 0); + usage_command(cmd, true, false); } exit(0); @@ -125,7 +125,7 @@ int handle_command_group(const struct cmd_group *grp, int argc, argc--; argv++; if (argc < 1) { - usage_command_group(grp, 0, 0); + usage_command_group(grp, false, false); exit(1); } @@ -212,20 +212,20 @@ static int handle_global_options(int argc, char **argv) void handle_special_globals(int shift, int argc, char **argv) { - int has_help = 0; - int has_full = 0; + bool has_help = false; + bool has_full = false; int i; for (i = 0; i < shift; i++) { if (strcmp(argv[i], "--help") == 0) - has_help = 1; + has_help = true; else if (strcmp(argv[i], "--full") == 0) - has_full = 1; + has_full = true; } if (has_help) { if (has_full) - usage_command_group(_cmd_group, 1, 0); + usage_command_group(_cmd_group, true, false); else cmd_help(argc, argv); exit(0); diff --git a/help.c b/help.c index 311a4320..ef7986b4 100644 --- a/help.c +++ b/help.c @@ -196,8 +196,8 @@ static int do_usage_one_command(const char * const *usagestr, } static int usage_command_internal(const char * const *usagestr, - const char *token, int full, int lst, - int alias, FILE *outf) + const char *token, bool full, bool lst, + bool alias, FILE *outf) { unsigned int flags = 0; int ret; @@ -223,17 +223,17 @@ static int usage_command_internal(const char * const *usagestr, } static void usage_command_usagestr(const char * const *usagestr, - const char *token, int full, int err) + const char *token, bool full, bool err) { FILE 
*outf = err ? stderr : stdout; int ret; - ret = usage_command_internal(usagestr, token, full, 0, 0, outf); + ret = usage_command_internal(usagestr, token, full, false, false, outf); if (!ret) fputc('\n', outf); } -void usage_command(const struct cmd_struct *cmd, int full, int err) +void usage_command(const struct cmd_struct *cmd, bool full, bool err) { usage_command_usagestr(cmd->usagestr, cmd->token, full, err); } @@ -241,11 +241,11 @@ void usage_command(const struct cmd_struct *cmd, int full, int err) __attribute__((noreturn)) void usage(const char * const *usagestr) { - usage_command_usagestr(usagestr, NULL, 1, 1); + usage_command_usagestr(usagestr, NULL, true, true); exit(1); } -static void usage_command_group_internal(const struct cmd_group *grp, int full, +static void usage_command_group_internal(const struct cmd_group *grp, bool full, FILE *outf) { const struct cmd_struct *cmd = grp->commands; @@ -265,7 +265,8 @@ static void usage_command_group_internal(const struct cmd_group *grp, int full, } usage_command_internal(cmd->usagestr, cmd->token, full, - 1, cmd->flags & CMD_ALIAS, outf); + true, cmd->flags & CMD_ALIAS, + outf); if (cmd->flags & CMD_ALIAS) putchar('\n'); continue; @@ -327,7 +328,7 @@ void usage_command_group_short(const struct cmd_group *grp) fprintf(stderr, "All command groups have their manual page named 'btrfs-'.\n"); } -void usage_command_group(const struct cmd_group *grp, int full, int err) +void usage_command_group(const struct cmd_group *grp, bool full, bool err) { const char * const *usagestr = grp->usagestr; FILE *outf = err ? stderr : stdout; @@ -350,7 +351,7 @@ __attribute__((noreturn)) void help_unknown_token(const ch
RE: Help with leaf parent key incorrect
> -Original Message- > From: Anand Jain [mailto:anand.j...@oracle.com] > Sent: Monday, 26 February 2018 7:27 PM > To: Paul Jones <p...@pauljones.id.au>; linux-btrfs@vger.kernel.org > Subject: Re: Help with leaf parent key incorrect > > > > > There is one io error in the log below, > > Apparently, that's not a real EIO. We need to fix it. > But can't be the root cause we are looking for here. > > > > Feb 24 22:41:59 home kernel: BTRFS: error (device dm-6) in > btrfs_run_delayed_refs:3076: errno=-5 IO failure > Feb 24 22:41:59 home > kernel: BTRFS info (device dm-6): forced readonly > > static int run_delayed_extent_op(struct btrfs_trans_handle *trans, > struct btrfs_fs_info *fs_info, > struct btrfs_delayed_ref_head *head, > struct btrfs_delayed_extent_op *extent_op) { > :: > > } else { > err = -EIO; > goto out; > } > > > > but other than that I have never had io errors before, or any other > troubles. > > Hm. btrfs dev stat shows real disk IO errors. > As this FS isn't mountable .. pls try >btrfs dev stat > file >search for 'device stats', there will be one for each disk. > Or it reports in the syslog when it happens not necessarily > during dedupe. 
vm-server ~ # btrfs dev stat /media/storage/ [/dev/mapper/b-storage--b].write_io_errs0 [/dev/mapper/b-storage--b].read_io_errs 0 [/dev/mapper/b-storage--b].flush_io_errs0 [/dev/mapper/b-storage--b].corruption_errs 0 [/dev/mapper/b-storage--b].generation_errs 0 [/dev/mapper/a-storage--a].write_io_errs0 [/dev/mapper/a-storage--a].read_io_errs 0 [/dev/mapper/a-storage--a].flush_io_errs0 [/dev/mapper/a-storage--a].corruption_errs 0 [/dev/mapper/a-storage--a].generation_errs 0 vm-server ~ # btrfs dev stat / [/dev/sdb1].write_io_errs0 [/dev/sdb1].read_io_errs 0 [/dev/sdb1].flush_io_errs0 [/dev/sdb1].corruption_errs 0 [/dev/sdb1].generation_errs 0 [/dev/sda1].write_io_errs0 [/dev/sda1].read_io_errs 0 [/dev/sda1].flush_io_errs0 [/dev/sda1].corruption_errs 0 [/dev/sda1].generation_errs 0 vm-server ~ # btrfs dev stat /dev/mapper/a-backup--a ERROR: '/dev/mapper/a-backup--a' is not a mounted btrfs device I check syslog regularly and I haven't seen any errors on any drives for over a year. > > > One of my other filesystems share the same two discs and it is still fine, > so I > think the hardware is probably ok. > Right. I guess that too. A confirmation will be better. > > I've copied the beginning of the errors below. > > > At my end finding the root cause of 'parent transid verify failed' > during/after dedupe is is kind of fading as disk seems to be had > no issues. which I had in mind. > > Also, there wasn't abrupt power-recycle here? I presume. No, although now that I think about it I just realised it happened right after I upgraded from 4.15.4 to 4.15.5 and I didn't quit bees before rebooting, I let the system do it. Not sure if it's relevant or not. I also just noticed that the kernel has spawned hundreds of kworkers - the highest number I can see is 516. > > It's better to save the output disk1-log and disk2-log as below > before further efforts to recovery. Just in case if something > pops out. 
> >btrfs in dump-super -fa disk1 > disk1-log >btrfs in dump-tree --degraded disk1 >> disk1-log [1] I applied the patch and started dumping the tree, but I stopped it after about 10 mins and 9GB. Because I use zstd and free space tree the recovery tools wouldn't do anything in RW mode, so I've decided to just blow it away and restore from a backup. I made a block level copy of both discs in case I need anything. Thanks for your help anyway. Regards, Paul.
Re: Help with leaf parent key incorrect
> There is one io error in the log below, Apparently, that's not a real EIO. We need to fix it. But can't be the root cause we are looking for here. > Feb 24 22:41:59 home kernel: BTRFS: error (device dm-6) in btrfs_run_delayed_refs:3076: errno=-5 IO failure > Feb 24 22:41:59 home kernel: BTRFS info (device dm-6): forced readonly static int run_delayed_extent_op(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, struct btrfs_delayed_ref_head *head, struct btrfs_delayed_extent_op *extent_op) { :: } else { err = -EIO; goto out; } > but other than that I have never had io errors before, or any other troubles. Hm. btrfs dev stat shows real disk IO errors. As this FS isn't mountable .. pls try btrfs dev stat > file search for 'device stats', there will be one for each disk. Or it reports in the syslog when it happens not necessarily during dedupe. > One of my other filesystems share the same two discs and it is still fine, so I think the hardware is probably ok. Right. I guess that too. A confirmation will be better. > I've copied the beginning of the errors below. At my end finding the root cause of 'parent transid verify failed' during/after dedupe is is kind of fading as disk seems to be had no issues. which I had in mind. Also, there wasn't abrupt power-recycle here? I presume. It's better to save the output disk1-log and disk2-log as below before further efforts to recovery. Just in case if something pops out. btrfs in dump-super -fa disk1 > disk1-log btrfs in dump-tree --degraded disk1 >> disk1-log [1] btrfs in dump-super -fa disk2 > disk2-log btrfs in dump-tree --degraded disk2 >> disk2-log [1] [1] --degraded option is in the ML. [PATCH] btrfs-progs: dump-tree: add degraded option Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
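The log-collection steps suggested above (a dump-super plus a --degraded dump-tree per device, each into its own log file) can be previewed as a small loop. This is only a sketch: `disk1`/`disk2` are placeholder device names, "btrfs in" in the mail is shorthand for "btrfs inspect-internal", the `--degraded` option needs the patch referenced above, and the script only prints the commands so nothing is touched by accident:

```shell
#!/bin/sh
# Preview of the pre-recovery log collection suggested above.
# disk1/disk2 are placeholder device names; the commands are only
# collected and printed here -- review them, then run them by hand
# (dump-tree --degraded requires the patch referenced in the mail).
set -e

plan=""
for dev in disk1 disk2; do
    plan="${plan}btrfs inspect-internal dump-super -fa $dev > $dev-log
btrfs inspect-internal dump-tree --degraded $dev >> $dev-log
"
done
printf '%s' "$plan"
```

Saving these dumps before any repair attempt costs little and preserves the filesystem state in case a developer asks for it later.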
Re: Help with leaf parent key incorrect
On 02/25/2018 06:16 PM, Paul Jones wrote: Hi all, I was running dedupe on my filesystem and something went wrong overnight, by the time I noticed the fs was readonly. Thanks for the report. I have few questions.. Kind of raid profile used here? Dedupe tool that was used? Was the fs full before dedupe? Were there any IO errors? Thanks, Anand When trying to check it this is what I get: vm-server ~ # btrfs check /dev/mapper/a-backup--a parent transid verify failed on 2371034071040 wanted 62977 found 62893 parent transid verify failed on 2371034071040 wanted 62977 found 62893 parent transid verify failed on 2371034071040 wanted 62977 found 62893 parent transid verify failed on 2371034071040 wanted 62977 found 62893 Ignoring transid failure leaf parent key incorrect 2371034071040 ERROR: cannot open file system Is there a way to fix this? I'm using kernel 4.15.5 This is the last part of dmesg [ +0.02] BTRFS error (device dm-6): parent transid verify failed on 2374016368640 wanted 63210 found 63208 [ +0.04] BTRFS error (device dm-6): parent transid verify failed on 2374016368640 wanted 63210 found 63208 [ +1.107963] BTRFS error (device dm-6): parent transid verify failed on 2374016368640 wanted 63210 found 63208 [ +0.05] BTRFS error (device dm-6): parent transid verify failed on 2374016368640 wanted 63210 found 63208 [ +1.473598] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208 [ +0.04] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208 [ +0.001927] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208 [ +0.03] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208 [ +0.60] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208 [ +0.01] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208 [ +2.676048] 
verify_parent_transid: 10362 callbacks suppressed [ +0.02] BTRFS error (device dm-6): parent transid verify failed on 2373991677952 wanted 63210 found 63208 [ +0.03] BTRFS error (device dm-6): parent transid verify failed on 2373991677952 wanted 63210 found 63208 [ +0.078432] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208 [ +0.04] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208 [ +0.43] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208 [ +0.01] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208 [ +0.058638] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208 [ +0.04] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208 [ +0.139174] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208 [ +0.04] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208 [Feb25 20:48] BTRFS info (device dm-6): using free space tree [ +0.02] BTRFS error (device dm-6): Remounting read-write after error is not allowed [Feb25 20:49] BTRFS error (device dm-6): cleaner transaction attach returned -30 [ +0.238718] BTRFS warning (device dm-6): page private not zero on page 1596642967552 [ +0.03] BTRFS warning (device dm-6): page private not zero on page 1596642971648 [ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596642975744 [ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596642979840 [ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643672064 [ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643676160 [ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643680256 [ +0.02] BTRFS warning (device dm-6): page private not zero 
on page 1596643684352 [ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643704832 [ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643708928 [ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643713024 [ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643717120 [ +0.28] BTRFS warning (device dm-6): page private not zero on page 2363051098112 [ +0.01] BTRFS warning (device dm-6): page private not zero on page 2363051102208 [ +0.01] BTRFS warning (device dm-6): page private not zero on page 2363051106304 [ +0.01] BTRFS warning (device dm-6): page private not zero on page 2363051110400 [ +0.01] BTRFS warning (device dm-6): page private not zero on page 2368056344576 [ +0.00] BTRFS
Help with leaf parent key incorrect
Hi all, I was running dedupe on my filesystem and something went wrong overnight, by the time I noticed the fs was readonly. When trying to check it this is what I get: vm-server ~ # btrfs check /dev/mapper/a-backup--a parent transid verify failed on 2371034071040 wanted 62977 found 62893 parent transid verify failed on 2371034071040 wanted 62977 found 62893 parent transid verify failed on 2371034071040 wanted 62977 found 62893 parent transid verify failed on 2371034071040 wanted 62977 found 62893 Ignoring transid failure leaf parent key incorrect 2371034071040 ERROR: cannot open file system Is there a way to fix this? I'm using kernel 4.15.5 This is the last part of dmesg [ +0.02] BTRFS error (device dm-6): parent transid verify failed on 2374016368640 wanted 63210 found 63208 [ +0.04] BTRFS error (device dm-6): parent transid verify failed on 2374016368640 wanted 63210 found 63208 [ +1.107963] BTRFS error (device dm-6): parent transid verify failed on 2374016368640 wanted 63210 found 63208 [ +0.05] BTRFS error (device dm-6): parent transid verify failed on 2374016368640 wanted 63210 found 63208 [ +1.473598] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208 [ +0.04] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208 [ +0.001927] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208 [ +0.03] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208 [ +0.60] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208 [ +0.01] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208 [ +2.676048] verify_parent_transid: 10362 callbacks suppressed [ +0.02] BTRFS error (device dm-6): parent transid verify failed on 2373991677952 wanted 63210 found 63208 [ +0.03] BTRFS error (device dm-6): parent transid verify failed 
on 2373991677952 wanted 63210 found 63208 [ +0.078432] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208 [ +0.04] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208 [ +0.43] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208 [ +0.01] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208 [ +0.058638] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208 [ +0.04] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208 [ +0.139174] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208 [ +0.04] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208 [Feb25 20:48] BTRFS info (device dm-6): using free space tree [ +0.02] BTRFS error (device dm-6): Remounting read-write after error is not allowed [Feb25 20:49] BTRFS error (device dm-6): cleaner transaction attach returned -30 [ +0.238718] BTRFS warning (device dm-6): page private not zero on page 1596642967552 [ +0.03] BTRFS warning (device dm-6): page private not zero on page 1596642971648 [ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596642975744 [ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596642979840 [ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643672064 [ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643676160 [ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643680256 [ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643684352 [ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643704832 [ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643708928 [ +0.02] BTRFS warning (device dm-6): 
page private not zero on page 1596643713024 [ +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643717120 [ +0.28] BTRFS warning (device dm-6): page private not zero on page 2363051098112 [ +0.01] BTRFS warning (device dm-6): page private not zero on page 2363051102208 [ +0.01] BTRFS warning (device dm-6): page private not zero on page 2363051106304 [ +0.01] BTRFS warning (device dm-6): page private not zero on page 2363051110400 [ +0.01] BTRFS warning (device dm-6): page private not zero on page 2368056344576 [ +0.00] BTRFS warning (device dm-6): page private not zero on page 2368056348672 [ +0.01] BTRFS warning (device dm-6): page private not zero on page 2368056352768 [ +0.01] BTRFS warning (device dm-6): page private not zero on page
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Fri, 17 Nov 2017 06:51:52 +0300, Andrei Borzenkov wrote: > On 16.11.2017 19:13, Kai Krakow wrote: > ... > > > BTW: From user API perspective, btrfs snapshots do not guarantee > > perfect granular consistent backups. > > Is it documented somewhere? I was relying on crash-consistent > write-order-preserving snapshots in NetApp for as long as I remember. > And I was sure btrfs offers it, as it is something obvious for > the redirect-on-write idea. I think it has ordering guarantees, but it is not as atomic in time as one might think. That's the point. But the devs may know better. > > A user-level file transaction may > > still end up only partially in the snapshot. If you are running > > transaction sensitive applications, those usually do provide some > > means of preparing a freeze and a thaw of transactions. > > > > Is snapshot creation synchronous to know when to thaw? I think you could do "btrfs snap create", then "btrfs fs sync", and everything should be fine. > > I think the user transactions API which could've been used for this > > will even be removed during the next kernel cycles. I remember > > reiserfs4 tried to deploy something similar. But there's no > > consistent layer in the VFS for subscribing applications to > > filesystem snapshots so they could prepare and notify the kernel > > when they are ready. > I do not see what VFS has to do with it. NetApp works by simply > preserving the previous consistency point instead of throwing it away. > I.e. the snapshot is always the last committed image on stable storage. Would > something like this be possible on the btrfs level by duplicating the current > on-disk root (sorry if I use the wrong term)? I think btrfs gives the same consistency. But issuing "btrfs snap create" may delay snapshot creation a little bit. So if your application relies on exact point-in-time snapshots, you need to synchronize your application with the filesystem. I think the same is true for NetApp.
I just wanted to point that out because it may not be obvious, given that btrfs snapshot creation is built right into the tool chain of the filesystem itself, unlike e.g. NetApp or LVM or other storage layers. Background: A good while back I was told that btrfs snapshots during ongoing IO may result in some of the later IO being carried over to before the snapshot. Transactional ordering of IO operations is still guaranteed, but it may overlap with snapshot creation. So you can still lose a transaction you didn't expect to lose at that point in time. So I understood this as: If you just want to ensure transactional integrity of your database, you are all fine with btrfs snapshots. But if you want to ensure that a just-finished transaction makes it into the snapshot completely, you have to sync the processes. However, things may have changed since then. -- Regards, Kai Replies to list-only preferred.
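The "sync the processes" advice above can be sketched as follows. This is a sketch under stated assumptions: `app-freeze`/`app-thaw` are hypothetical stand-ins for whatever quiesce hook the application actually provides (a database lock, fsfreeze, etc.), the paths are placeholders, and `RUN` defaults to `echo` so the commands are only printed:

```shell
#!/bin/sh
# Sketch: making sure a just-finished application transaction ends up in
# the snapshot, per the discussion above. app-freeze/app-thaw and the
# paths are hypothetical placeholders; RUN defaults to echo, so this
# only prints the commands -- set RUN= (empty) to execute them.
set -e
RUN="${RUN:-echo}"
SUBVOL=/mnt/data                          # subvolume to snapshot (placeholder)
SNAP="/mnt/data/.snaps/$(date +%FT%H%M)"  # timestamped snapshot name

$RUN app-freeze                                      # 1. stop new transactions
$RUN btrfs filesystem sync "$SUBVOL"                 # 2. commit all completed writes
$RUN btrfs subvolume snapshot -r "$SUBVOL" "$SNAP"   # 3. snapshot now contains them
$RUN app-thaw                                        # 4. resume normal operation
```

The point of step 2 is exactly the one made above: without an explicit sync between the application finishing its transaction and the snapshot being taken, the transaction may overlap snapshot creation and end up outside the snapshot.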
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On 16.11.2017 19:13, Kai Krakow wrote: ... > > BTW: From user API perspective, btrfs snapshots do not guarantee > perfect granular consistent backups. Is it documented somewhere? I was relying on crash-consistent write-order-preserving snapshots in NetApp for as long as I remember. And I was sure btrfs offers it, as it is something obvious for the redirect-on-write idea. > A user-level file transaction may > still end up only partially in the snapshot. If you are running > transaction sensitive applications, those usually do provide some means > of preparing a freeze and a thaw of transactions. > Is snapshot creation synchronous to know when to thaw? > I think the user transactions API which could've been used for this > will even be removed during the next kernel cycles. I remember > reiserfs4 tried to deploy something similar. But there's no consistent > layer in the VFS for subscribing applications to filesystem snapshots > so they could prepare and notify the kernel when they are ready. > I do not see what VFS has to do with it. NetApp works by simply preserving the previous consistency point instead of throwing it away. I.e. the snapshot is always the last committed image on stable storage. Would something like this be possible on the btrfs level by duplicating the current on-disk root (sorry if I use the wrong term)? ...
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
Link 2 slipped away, adding it below... Am Tue, 14 Nov 2017 15:51:57 -0500 schrieb Dave: > On Tue, Nov 14, 2017 at 3:50 AM, Roman Mamedov wrote: > > > > On Mon, 13 Nov 2017 22:39:44 -0500 > > Dave wrote: > > > > > I have my live system on one block device and a backup snapshot > > > of it on another block device. I am keeping them in sync with > > > hourly rsync transfers. > > > > > > Here's how this system works in a little more detail: > > > > > > 1. I establish the baseline by sending a full snapshot to the > > > backup block device using btrfs send-receive. > > > 2. Next, on the backup device I immediately create a rw copy of > > > that baseline snapshot. > > > 3. I delete the source snapshot to keep the live filesystem free > > > of all snapshots (so it can be optimally defragmented, etc.) > > > 4. hourly, I take a snapshot of the live system, rsync all > > > changes to the backup block device, and then delete the source > > > snapshot. This hourly process takes less than a minute currently. > > > (My test system has only moderate usage.) > > > 5. hourly, following the above step, I use snapper to take a > > > snapshot of the backup subvolume to create/preserve a history of > > > changes. For example, I can find the version of a file 30 hours > > > prior. > > > > Sounds a bit complex, I still don't get why you need all these > > snapshot creations and deletions, and even still using btrfs > > send-receive. > > > Hopefully, my comments below will explain my reasons. > > > > > Here is my scheme: > > > > /mnt/dst <- mounted backup storage volume > > /mnt/dst/backup <- a subvolume > > /mnt/dst/backup/host1/ <- rsync destination for host1, regular > > directory /mnt/dst/backup/host2/ <- rsync destination for host2, > > regular directory /mnt/dst/backup/host3/ <- rsync destination for > > host3, regular directory etc. > > > > /mnt/dst/backup/host1/bin/ > > /mnt/dst/backup/host1/etc/ > > /mnt/dst/backup/host1/home/ > > ... > > Self explanatory. 
All regular directories, not subvolumes. > > > > Snapshots: > > /mnt/dst/snaps/backup <- a regular directory > > /mnt/dst/snaps/backup/2017-11-14T12:00/ <- snapshot 1 > > of /mnt/dst/backup /mnt/dst/snaps/backup/2017-11-14T13:00/ <- > > snapshot 2 > > of /mnt/dst/backup /mnt/dst/snaps/backup/2017-11-14T14:00/ <- > > snapshot 3 of /mnt/dst/backup > > > > Accessing historic data: > > /mnt/dst/snaps/backup/2017-11-14T12:00/host1/bin/bash > > ... > > /bin/bash for host1 as of 2017-11-14 12:00 (time on the backup > > system). > > > > > > No need for btrfs send-receive, only plain rsync is used, directly > > from hostX:/ to /mnt/dst/backup/host1/; > > > I prefer to start with a BTRFS snapshot at the backup destination. I > think that's the most "accurate" starting point. No, you should finish with a snapshot. Use the rsync destination as a "dirty" scratch area, let rsync also delete files which are no longer in the source. After successfully running rsync, make a snapshot of that directory and make it RO, leave the scratch in place (even when rsync dies or becomes killed). I once made some scripts[2] following those rules, you may want to adapt them. > > No need to create or delete snapshots during the actual backup > > process; > > Then you can't guarantee consistency of the backed up information. Take a temporary snapshot of the source, rsync to to the scratch destination, take a RO snapshot of that destination, remove the temporary snapshot. BTW: From user API perspective, btrfs snapshots do not guarantee perfect granular consistent backups. A user-level file transaction may still end up only partially in the snapshot. If you are running transaction sensitive applications, those usually do provide some means of preparing a freeze and a thaw of transactions. I think the user transactions API which could've been used for this will even be removed during the next kernel cycles. I remember reiserfs4 tried to deploy something similar. 
But there's no consistent layer in the VFS for subscribing applications to filesystem snapshots so they could prepare and notify the kernel when they are ready. > > A single common timeline is kept for all hosts to be backed up, > > snapshot count not multiplied by the number of hosts (in my case > > the backup location is multi-purpose, so I somewhat care about > > total number of snapshots there as well); > > > > Also, all of this works even with source hosts which do not use > > Btrfs. > > That's not a concern for me because I prefer to use BTRFS everywhere. At least I suggest looking into bees[1] to deduplicate the backup destination. Rsync is not very efficient to work with btrfs snapshots. It will break reflinks often and write inefficiently sized blocks, even with inplace option. Also,
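The backup cycle described above (temporary RO snapshot of the source, rsync into the "dirty" scratch directory, then a read-only snapshot of the destination published only after rsync succeeded) can be sketched as below. All paths are placeholders, and `RUN` defaults to `echo` so the commands are previewed rather than executed:

```shell
#!/bin/sh
# Sketch of the cycle described above: snapshot the source, rsync into
# the scratch area, publish a RO snapshot of the destination, drop the
# temporary source snapshot. All paths are placeholders; RUN defaults
# to echo, so this only prints the commands -- set RUN= to execute.
set -e
RUN="${RUN:-echo}"

SRC=/mnt/live                  # live subvolume being backed up
TMP=/mnt/live/.backup-tmp      # temporary RO snapshot of the source
DSTVOL=/mnt/dst/backup         # subvolume holding the scratch dirs
SCRATCH="$DSTVOL/host1"        # "dirty" rsync destination (plain directory)
SNAPS=/mnt/dst/snaps/backup    # history of finished backups

$RUN btrfs subvolume snapshot -r "$SRC" "$TMP"            # consistent view of the source
$RUN rsync -aHAX --delete --inplace "$TMP/" "$SCRATCH/"   # mirror it, deleting vanished files
$RUN btrfs subvolume snapshot -r "$DSTVOL" "$SNAPS/$(date +%FT%H%M)"  # publish after success
$RUN btrfs subvolume delete "$TMP"                        # scratch stays in place
```

Because the destination snapshot is taken only after rsync returns successfully, a killed or failed rsync simply leaves the scratch area dirty for the next run, exactly as described above; `--inplace` is the option mentioned in the mail for reducing (though not eliminating) reflink breakage in the snapshotted destination.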
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
Am Tue, 14 Nov 2017 15:51:57 -0500 schrieb Dave: > On Tue, Nov 14, 2017 at 3:50 AM, Roman Mamedov wrote: > > > > On Mon, 13 Nov 2017 22:39:44 -0500 > > Dave wrote: > > > > > I have my live system on one block device and a backup snapshot > > > of it on another block device. I am keeping them in sync with > > > hourly rsync transfers. > > > > > > Here's how this system works in a little more detail: > > > > > > 1. I establish the baseline by sending a full snapshot to the > > > backup block device using btrfs send-receive. > > > 2. Next, on the backup device I immediately create a rw copy of > > > that baseline snapshot. > > > 3. I delete the source snapshot to keep the live filesystem free > > > of all snapshots (so it can be optimally defragmented, etc.) > > > 4. hourly, I take a snapshot of the live system, rsync all > > > changes to the backup block device, and then delete the source > > > snapshot. This hourly process takes less than a minute currently. > > > (My test system has only moderate usage.) > > > 5. hourly, following the above step, I use snapper to take a > > > snapshot of the backup subvolume to create/preserve a history of > > > changes. For example, I can find the version of a file 30 hours > > > prior. > > > > Sounds a bit complex, I still don't get why you need all these > > snapshot creations and deletions, and even still using btrfs > > send-receive. > > > Hopefully, my comments below will explain my reasons. > > > > > Here is my scheme: > > > > /mnt/dst <- mounted backup storage volume > > /mnt/dst/backup <- a subvolume > > /mnt/dst/backup/host1/ <- rsync destination for host1, regular > > directory /mnt/dst/backup/host2/ <- rsync destination for host2, > > regular directory /mnt/dst/backup/host3/ <- rsync destination for > > host3, regular directory etc. > > > > /mnt/dst/backup/host1/bin/ > > /mnt/dst/backup/host1/etc/ > > /mnt/dst/backup/host1/home/ > > ... > > Self explanatory. All regular directories, not subvolumes. 
> > > > Snapshots: > > /mnt/dst/snaps/backup <- a regular directory > > /mnt/dst/snaps/backup/2017-11-14T12:00/ <- snapshot 1 > > of /mnt/dst/backup /mnt/dst/snaps/backup/2017-11-14T13:00/ <- > > snapshot 2 > > of /mnt/dst/backup /mnt/dst/snaps/backup/2017-11-14T14:00/ <- > > snapshot 3 of /mnt/dst/backup > > > > Accessing historic data: > > /mnt/dst/snaps/backup/2017-11-14T12:00/host1/bin/bash > > ... > > /bin/bash for host1 as of 2017-11-14 12:00 (time on the backup > > system). > > > > > > No need for btrfs send-receive, only plain rsync is used, directly > > from hostX:/ to /mnt/dst/backup/host1/; > > > I prefer to start with a BTRFS snapshot at the backup destination. I > think that's the most "accurate" starting point. No, you should finish with a snapshot. Use the rsync destination as a "dirty" scratch area, let rsync also delete files which are no longer in the source. After successfully running rsync, make a snapshot of that directory and make it RO, leave the scratch in place (even when rsync dies or becomes killed). I once made some scripts[2] following those rules, you may want to adapt them. > > No need to create or delete snapshots during the actual backup > > process; > > Then you can't guarantee consistency of the backed up information. Take a temporary snapshot of the source, rsync to to the scratch destination, take a RO snapshot of that destination, remove the temporary snapshot. BTW: From user API perspective, btrfs snapshots do not guarantee perfect granular consistent backups. A user-level file transaction may still end up only partially in the snapshot. If you are running transaction sensitive applications, those usually do provide some means of preparing a freeze and a thaw of transactions. I think the user transactions API which could've been used for this will even be removed during the next kernel cycles. I remember reiserfs4 tried to deploy something similar. 
But there's no consistent layer in the VFS for subscribing applications to filesystem snapshots so they could prepare and notify the kernel when they are ready. > > A single common timeline is kept for all hosts to be backed up, > > snapshot count not multiplied by the number of hosts (in my case > > the backup location is multi-purpose, so I somewhat care about > > total number of snapshots there as well); > > > > Also, all of this works even with source hosts which do not use > > Btrfs. > > That's not a concern for me because I prefer to use BTRFS everywhere. At least I suggest looking into bees[1] to deduplicate the backup destination. Rsync is not very efficient to work with btrfs snapshots. It will break reflinks often and write inefficiently sized blocks, even with inplace option. Also, rsync won't efficiently catch files
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Tue, Nov 14, 2017 at 3:50 AM, Roman Mamedovwrote: > > On Mon, 13 Nov 2017 22:39:44 -0500 > Dave wrote: > > > I have my live system on one block device and a backup snapshot of it > > on another block device. I am keeping them in sync with hourly rsync > > transfers. > > > > Here's how this system works in a little more detail: > > > > 1. I establish the baseline by sending a full snapshot to the backup > > block device using btrfs send-receive. > > 2. Next, on the backup device I immediately create a rw copy of that > > baseline snapshot. > > 3. I delete the source snapshot to keep the live filesystem free of > > all snapshots (so it can be optimally defragmented, etc.) > > 4. hourly, I take a snapshot of the live system, rsync all changes to > > the backup block device, and then delete the source snapshot. This > > hourly process takes less than a minute currently. (My test system has > > only moderate usage.) > > 5. hourly, following the above step, I use snapper to take a snapshot > > of the backup subvolume to create/preserve a history of changes. For > > example, I can find the version of a file 30 hours prior. > > Sounds a bit complex, I still don't get why you need all these snapshot > creations and deletions, and even still using btrfs send-receive. Hopefully, my comments below will explain my reasons. > > Here is my scheme: > > /mnt/dst <- mounted backup storage volume > /mnt/dst/backup <- a subvolume > /mnt/dst/backup/host1/ <- rsync destination for host1, regular directory > /mnt/dst/backup/host2/ <- rsync destination for host2, regular directory > /mnt/dst/backup/host3/ <- rsync destination for host3, regular directory > etc. > > /mnt/dst/backup/host1/bin/ > /mnt/dst/backup/host1/etc/ > /mnt/dst/backup/host1/home/ > ... > Self explanatory. All regular directories, not subvolumes. 
> > Snapshots: > /mnt/dst/snaps/backup <- a regular directory > /mnt/dst/snaps/backup/2017-11-14T12:00/ <- snapshot 1 of /mnt/dst/backup > /mnt/dst/snaps/backup/2017-11-14T13:00/ <- snapshot 2 of /mnt/dst/backup > /mnt/dst/snaps/backup/2017-11-14T14:00/ <- snapshot 3 of /mnt/dst/backup > > Accessing historic data: > /mnt/dst/snaps/backup/2017-11-14T12:00/host1/bin/bash > ... > /bin/bash for host1 as of 2017-11-14 12:00 (time on the backup system). > > > No need for btrfs send-receive, only plain rsync is used, directly from > hostX:/ to /mnt/dst/backup/host1/; I prefer to start with a BTRFS snapshot at the backup destination. I think that's the most "accurate" starting point. > > No need to create or delete snapshots during the actual backup process; Then you can't guarantee consistency of the backed up information. > > A single common timeline is kept for all hosts to be backed up, snapshot count > not multiplied by the number of hosts (in my case the backup location is > multi-purpose, so I somewhat care about total number of snapshots there as > well); > > Also, all of this works even with source hosts which do not use Btrfs. That's not a concern for me because I prefer to use BTRFS everywhere. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Mon, 13 Nov 2017 22:39:44 -0500 Davewrote: > I have my live system on one block device and a backup snapshot of it > on another block device. I am keeping them in sync with hourly rsync > transfers. > > Here's how this system works in a little more detail: > > 1. I establish the baseline by sending a full snapshot to the backup > block device using btrfs send-receive. > 2. Next, on the backup device I immediately create a rw copy of that > baseline snapshot. > 3. I delete the source snapshot to keep the live filesystem free of > all snapshots (so it can be optimally defragmented, etc.) > 4. hourly, I take a snapshot of the live system, rsync all changes to > the backup block device, and then delete the source snapshot. This > hourly process takes less than a minute currently. (My test system has > only moderate usage.) > 5. hourly, following the above step, I use snapper to take a snapshot > of the backup subvolume to create/preserve a history of changes. For > example, I can find the version of a file 30 hours prior. Sounds a bit complex, I still don't get why you need all these snapshot creations and deletions, and even still using btrfs send-receive. Here is my scheme: /mnt/dst <- mounted backup storage volume /mnt/dst/backup <- a subvolume /mnt/dst/backup/host1/ <- rsync destination for host1, regular directory /mnt/dst/backup/host2/ <- rsync destination for host2, regular directory /mnt/dst/backup/host3/ <- rsync destination for host3, regular directory etc. /mnt/dst/backup/host1/bin/ /mnt/dst/backup/host1/etc/ /mnt/dst/backup/host1/home/ ... Self explanatory. All regular directories, not subvolumes. Snapshots: /mnt/dst/snaps/backup <- a regular directory /mnt/dst/snaps/backup/2017-11-14T12:00/ <- snapshot 1 of /mnt/dst/backup /mnt/dst/snaps/backup/2017-11-14T13:00/ <- snapshot 2 of /mnt/dst/backup /mnt/dst/snaps/backup/2017-11-14T14:00/ <- snapshot 3 of /mnt/dst/backup Accessing historic data: /mnt/dst/snaps/backup/2017-11-14T12:00/host1/bin/bash ... 
/bin/bash for host1 as of 2017-11-14 12:00 (time on the backup system). No need for btrfs send-receive, only plain rsync is used, directly from hostX:/ to /mnt/dst/backup/host1/; No need to create or delete snapshots during the actual backup process; A single common timeline is kept for all hosts to be backed up, snapshot count not multiplied by the number of hosts (in my case the backup location is multi-purpose, so I somewhat care about total number of snapshots there as well); Also, all of this works even with source hosts which do not use Btrfs. -- With respect, Roman
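Roman's scheme above can be sketched as a short dry-run script: one plain rsync per host into a regular directory under the backup subvolume, then a single timestamped RO snapshot of the whole subvolume. The host list and timestamp are placeholders, and run() only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of the multi-host rsync + single-timeline snapshot scheme.
CMDS=""
run() { CMDS="$CMDS+ $*
"; }

TS=2017-11-14T12:00                 # one timeline shared by all hosts
for h in host1 host2 host3; do      # placeholder host names
    run rsync -aAXH --inplace --delete "$h:/" "/mnt/dst/backup/$h/"
done
# One snapshot covers every host's tree at once:
run btrfs subvolume snapshot -r /mnt/dst/backup "/mnt/dst/snaps/backup/$TS"
printf '%s' "$CMDS"
```

Because /mnt/dst/backup/hostN/ are regular directories inside one subvolume, the snapshot count grows with time only, not with time multiplied by the number of hosts.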
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Tue, 14 Nov 2017 10:14:55 +0300 Marat Khalili wrote: > Don't keep snapshots under the rsync target, place them under ../snapshots > (if snapper supports this): > Or, specify them in --exclude and avoid using --delete-excluded. Both are good suggestions. In my case each system does have its own snapshots as well, but they are retained for much shorter periods. So I both use --exclude to avoid fetching the entire /snaps tree from the source system, and store snapshots of the destination system outside of the rsync target dirs. > Or keep using -x if it works, why not? -x will exclude content of all subvolumes down the tree on the source side -- not only the time-based ones. If you take care to never casually create any subvolumes whose content you'd still want backed up, then I guess it can work. -- With respect, Roman
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On 14/11/17 06:39, Dave wrote: My rsync command currently looks like this: rsync -axAHv --inplace --delete-delay --exclude-from="/some/file" "$source_snapshop/" "$backup_location" As I learned from Kai Krakow in this maillist, you should also add --no-whole-file if both sides are local. Otherwise target space usage can be much worse (but fragmentation much better). I wonder what is your justification for --delete-delay, I just use --delete. Here's what I use: --verbose --archive --hard-links --acls --xattrs --numeric-ids --inplace --delete --delete-excluded --stats. Since in my case source is always remote, there's no --no-whole-file, but there's --numeric-ids. In particular, I want to know if I should or should not be using these options: -H, --hard-linkspreserve hard links -A, --acls preserve ACLs (implies -p) -X, --xattrspreserve extended attributes -x, --one-file-system don't cross filesystem boundaries I don't know any semantic use of hard links in modern systems. There're ACLs on some files in /var/log/journal on systems with systemd. Synology actively uses ACL, but it's implementation is sadly incompatible with rsync. There can always be some ACLs or xattrs set by sysadmin manually. End result, I always specify first three options where possible just in case (even though man page says that --hard-links may affect performance). I had to use the "x" option to prevent rsync from deleting files in snapshots in the backup location (as the source location does not retain any snapshots). Is there a better way? Don't keep snapshots under rsync target, place them under ../snapshots (if snapper supports this): # find . -maxdepth 2 . 
./snapshots
./snapshots/2017-11-08T13:18:20+00:00
./snapshots/2017-11-08T15:10:03+00:00
./snapshots/2017-11-08T23:28:44+00:00
./snapshots/2017-11-09T23:41:30+00:00
./snapshots/2017-11-10T22:44:36+00:00
./snapshots/2017-11-11T21:48:19+00:00
./snapshots/2017-11-12T21:27:41+00:00
./snapshots/2017-11-13T23:29:49+00:00
./rsync
Or, specify them in --exclude and avoid using --delete-excluded. Or keep using -x if it works, why not? -- With Best Regards, Marat Khalili
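The point of the sibling layout above is that the snapshot directory lives next to the rsync target rather than inside it, so rsync's --delete can never reach the history. A small self-contained illustration (plain directories in a temp dir stand in for real read-only snapshots, which would be created with `btrfs subvolume snapshot -r`):

```shell
#!/bin/sh
# Recreate the suggested layout with plain directories and list it,
# mirroring the `find . -maxdepth 2` output shown above.
tmp=$(mktemp -d)
mkdir -p "$tmp/rsync" \
         "$tmp/snapshots/2017-11-08T13:18:20+00:00" \
         "$tmp/snapshots/2017-11-08T15:10:03+00:00"
LISTING=$(cd "$tmp" && find . -maxdepth 2 | sort)
printf '%s\n' "$LISTING"
rm -rf "$tmp"
```

An rsync pointed at $tmp/rsync/ with --delete never sees ./snapshots at all, so no --exclude bookkeeping is needed for the history.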
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Wed, Nov 1, 2017 at 1:15 AM, Roman Mamedovwrote: > On Wed, 1 Nov 2017 01:00:08 -0400 > Dave wrote: > >> To reconcile those conflicting goals, the only idea I have come up >> with so far is to use btrfs send-receive to perform incremental >> backups as described here: >> https://btrfs.wiki.kernel.org/index.php/Incremental_Backup . > > Another option is to just use the regular rsync to a designated destination > subvolume on the backup host, AND snapshot that subvolume on that host from > time to time (or on backup completions, if you can synchronize that). > > rsync --inplace will keep space usage low as it will not reupload entire files > in case of changes/additions to them. > > Yes rsync has to traverse both directory trees to find changes, but that's > pretty fast (couple of minutes at most, for a typical root filesystem), > especially if you use SSD or SSD caching. Hello. I am implementing this suggestion. So far, so good. However, I need some further recommendations on rsync options to use for this purpose. My rsync command currently looks like this: rsync -axAHv --inplace --delete-delay --exclude-from="/some/file" "$source_snapshop/" "$backup_location" In particular, I want to know if I should or should not be using these options: -H, --hard-linkspreserve hard links -A, --acls preserve ACLs (implies -p) -X, --xattrspreserve extended attributes -x, --one-file-system don't cross filesystem boundaries I had to use the "x" option to prevent rsync from deleting files in snapshots in the backup location (as the source location does not retain any snapshots). Is there a better way? I have my live system on one block device and a backup snapshot of it on another block device. I am keeping them in sync with hourly rsync transfers. Here's how this system works in a little more detail: 1. I establish the baseline by sending a full snapshot to the backup block device using btrfs send-receive. 2. 
Next, on the backup device I immediately create a rw copy of that baseline snapshot. 3. I delete the source snapshot to keep the live filesystem free of all snapshots (so it can be optimally defragmented, etc.) 4. hourly, I take a snapshot of the live system, rsync all changes to the backup block device, and then delete the source snapshot. This hourly process takes less than a minute currently. (My test system has only moderate usage.) 5. hourly, following the above step, I use snapper to take a snapshot of the backup subvolume to create/preserve a history of changes. For example, I can find the version of a file 30 hours prior. The backup volume contains up to 100 snapshots while the live volume has no snapshots. Best of both worlds? I guess I'll find out over time. 
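The option set debated above (-H, -A, -X, -x plus the --inplace advice given elsewhere in the thread) can be assembled flag by flag, with a note on what each one does. This sketch only prints the resulting command line; the source and destination paths are placeholders.

```shell
#!/bin/sh
# Build the rsync command line discussed above, one flag at a time.
OPTS="-a"                              # archive mode, shorthand for -rlptgoD
OPTS="$OPTS -H"                        # preserve hard links (may cost some performance)
OPTS="$OPTS -A"                        # preserve ACLs (implies -p)
OPTS="$OPTS -X"                        # preserve extended attributes
OPTS="$OPTS -x"                        # don't cross mount points; on a btrfs source this also skips subvolumes
OPTS="$OPTS --inplace --no-whole-file" # rewrite only changed blocks instead of replacing whole files
OPTS="$OPTS --delete-delay"            # delete extraneous destination files after the transfer
CMD="rsync $OPTS /mnt/live/ /mnt/backup/"
printf '%s\n' "$CMD"
```

As noted later in the thread, -x is a blunt instrument: it excludes every subvolume on the source side, not just snapshot subvolumes, so an explicit --exclude of the snapshot tree is the safer choice when other subvolumes must be backed up.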
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
Am Thu, 2 Nov 2017 23:24:29 -0400 schrieb Dave: > On Thu, Nov 2, 2017 at 4:46 PM, Kai Krakow > wrote: > > Am Wed, 1 Nov 2017 02:51:58 -0400 > > schrieb Dave : > > > [...] > [...] > [...] > >> > >> Thanks for confirming. I must have missed those reports. I had > >> never considered this idea until now -- but I like it. > >> > >> Are there any blogs or wikis where people have done something > >> similar to what we are discussing here? > > > > I used rsync before, backup source and destination both were btrfs. > > I was experiencing the same btrfs bug from time to time on both > > devices, luckily not at the same time. > > > > I instead switched to using borgbackup, and xfs as the destination > > (to not fall the same-bug-in-two-devices pitfall). > > I'm going to stick with btrfs everywhere. My reasoning is that my > biggest pitfalls will be related to lack of knowledge. So focusing on > learning one filesystem better (vs poorly learning two) is the better > strategy for me, given my limited time. (I'm not an IT professional of > any sort.) > > Is there any problem with the Borgbackup repository being on btrfs? No. I just wanted to point out that keeping backup and source on different media (which includes different technology, too) is common best practice and adheres to the 3-2-1 backup strategy. > > Borgbackup achieves a > > much higher deduplication density and compression, and as such also > > is able to store much more backup history in the same storage > > space. The first run is much slower than rsync (due to enabled > > compression) but successive runs are much faster (like 20 minutes > > per backup run instead of 4-5 hours). > > > > I'm currently storing 107 TB of backup history in just 2.2 TB backup > > space, which counts a little more than one year of history now, > > containing 56 snapshots. 
This is my retention policy: > > > > * 5 yearly snapshots > > * 12 monthly snapshots > > * 14 weekly snapshots (worth around 3 months) > > * 30 daily snapshots > > > > Restore is fast enough, and a snapshot can even be fuse-mounted > > (tho, in that case mounted access can be very slow navigating > > directories). > > > > With latest borgbackup version, the backup time increased to around > > 1 hour from 15-20 minutes in the previous version. That is due to > > switching the file cache strategy from mtime to ctime. This can be > > tuned to get back to old performance, but it may miss some files > > during backup if you're doing awkward things to file timestamps. > > > > I'm also backing up some servers with it now, then use rsync to sync > > the borg repository to an offsite location. > > > > Combined with same-fs local btrfs snapshots with short retention > > times, this could be a viable solution for you. > > Yes, I appreciate the idea. I'm going to evaluate both rsync and > Borgbackup. > > The advantage of rsync, I think, is that it will likely run in just a > couple minutes. That will allow me to run it hourly and to keep my > live volume almost entire free of snapshots and fully defragmented. > It's also very simple as I already have rsync. And since I'm going to > run btrfs on the backup volume, I can perform hourly snapshots there > and use Snapper to manage retention. It's all very simple and relies > on tools I already have and know. > > However, the advantages of Borgbackup you mentioned (much higher > deduplication density and compression) make it worth considering. > Maybe Borgbackup won't take long to complete successive (incremental) > backups on my system. Once a full backup was taken, incremental backups are extremely fast. At least for me, it works much faster than rsync. And as with btrfs snapshots, each incremental backup is also a full backup. 
It's not like traditional backup software that needs the backup parent and grand parent to make use of the differential and/or incremental backups. There's one caveat, tho: Only one process can access a repository at a time, that is you need to serialize different backup jobs if you want them to go into the same repository. Deduplication is done only within the same repository. Tho, you might be able to leverage btrfs deduplication (e.g. using bees) across multiple repositories if you're not using encrypted repositories. But since you're currently using send/receive and/or rsync, encrypted storage of the backup doesn't seem to be an important point to you. Burp with its client/server approach may have an advantage here, so its setup seems to be more complicated. Borg is really easy to use. I never tried burp, tho. > I'll have to try it to see. It's a very nice > looking project. I'm surprised I never heard of it before. It seems to follow similar principles as burp (which I never heard of previously). It seems like the really good backup software has some sort of PR problem... ;-) -- Regards, Kai Replies to list-only preferred. --
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Thu, Nov 2, 2017 at 4:46 PM, Kai Krakowwrote: > Am Wed, 1 Nov 2017 02:51:58 -0400 > schrieb Dave : > >> > >> >> To reconcile those conflicting goals, the only idea I have come up >> >> with so far is to use btrfs send-receive to perform incremental >> >> backups >> > >> > As already said by Romain Mamedov, rsync is viable alternative to >> > send-receive with much less hassle. According to some reports it >> > can even be faster. >> >> Thanks for confirming. I must have missed those reports. I had never >> considered this idea until now -- but I like it. >> >> Are there any blogs or wikis where people have done something similar >> to what we are discussing here? > > I used rsync before, backup source and destination both were btrfs. I > was experiencing the same btrfs bug from time to time on both devices, > luckily not at the same time. > > I instead switched to using borgbackup, and xfs as the destination (to > not fall the same-bug-in-two-devices pitfall). I'm going to stick with btrfs everywhere. My reasoning is that my biggest pitfalls will be related to lack of knowledge. So focusing on learning one filesystem better (vs poorly learning two) is the better strategy for me, given my limited time. (I'm not an IT professional of any sort.) Is there any problem with the Borgbackup repository being on btrfs? > Borgbackup achieves a > much higher deduplication density and compression, and as such also is > able to store much more backup history in the same storage space. The > first run is much slower than rsync (due to enabled compression) but > successive runs are much faster (like 20 minutes per backup run instead > of 4-5 hours). > > I'm currently storing 107 TB of backup history in just 2.2 TB backup > space, which counts a little more than one year of history now, > containing 56 snapshots. 
This is my retention policy: > > * 5 yearly snapshots > * 12 monthly snapshots > * 14 weekly snapshots (worth around 3 months) > * 30 daily snapshots > > Restore is fast enough, and a snapshot can even be fuse-mounted (tho, > in that case mounted access can be very slow navigating directories). > > With latest borgbackup version, the backup time increased to around 1 > hour from 15-20 minutes in the previous version. That is due to > switching the file cache strategy from mtime to ctime. This can be > tuned to get back to old performance, but it may miss some files during > backup if you're doing awkward things to file timestamps. > > I'm also backing up some servers with it now, then use rsync to sync > the borg repository to an offsite location. > > Combined with same-fs local btrfs snapshots with short retention times, > this could be a viable solution for you. Yes, I appreciate the idea. I'm going to evaluate both rsync and Borgbackup. The advantage of rsync, I think, is that it will likely run in just a couple minutes. That will allow me to run it hourly and to keep my live volume almost entirely free of snapshots and fully defragmented. It's also very simple as I already have rsync. And since I'm going to run btrfs on the backup volume, I can perform hourly snapshots there and use Snapper to manage retention. It's all very simple and relies on tools I already have and know. However, the advantages of Borgbackup you mentioned (much higher deduplication density and compression) make it worth considering. Maybe Borgbackup won't take long to complete successive (incremental) backups on my system. I'll have to try it to see. It's a very nice looking project. I'm surprised I never heard of it before.
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
Am Wed, 1 Nov 2017 02:51:58 -0400 schrieb Dave: > > > >> To reconcile those conflicting goals, the only idea I have come up > >> with so far is to use btrfs send-receive to perform incremental > >> backups > > > > As already said by Romain Mamedov, rsync is viable alternative to > > send-receive with much less hassle. According to some reports it > > can even be faster. > > Thanks for confirming. I must have missed those reports. I had never > considered this idea until now -- but I like it. > > Are there any blogs or wikis where people have done something similar > to what we are discussing here? I used rsync before, backup source and destination both were btrfs. I was experiencing the same btrfs bug from time to time on both devices, luckily not at the same time. I instead switched to using borgbackup, and xfs as the destination (to not fall the same-bug-in-two-devices pitfall). Borgbackup achieves a much higher deduplication density and compression, and as such also is able to store much more backup history in the same storage space. The first run is much slower than rsync (due to enabled compression) but successive runs are much faster (like 20 minutes per backup run instead of 4-5 hours). I'm currently storing 107 TB of backup history in just 2.2 TB backup space, which counts a little more than one year of history now, containing 56 snapshots. This is my retention policy: * 5 yearly snapshots * 12 monthly snapshots * 14 weekly snapshots (worth around 3 months) * 30 daily snapshots Restore is fast enough, and a snapshot can even be fuse-mounted (tho, in that case mounted access can be very slow navigating directories). With latest borgbackup version, the backup time increased to around 1 hour from 15-20 minutes in the previous version. That is due to switching the file cache strategy from mtime to ctime. This can be tuned to get back to old performance, but it may miss some files during backup if you're doing awkward things to file timestamps. 
I'm also backing up some servers with it now, then use rsync to sync the borg repository to an offsite location. Combined with same-fs local btrfs snapshots with short retention times, this could be a viable solution for you. -- Regards, Kai Replies to list-only preferred.
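Kai's retention policy above (5 yearly / 12 monthly / 14 weekly / 30 daily) maps directly onto borg's --keep-* prune flags. The sketch below is a dry run: the repository path and archive name are placeholders, and run() only prints the commands rather than executing them.

```shell
#!/bin/sh
# Dry-run sketch of a borgbackup create + prune cycle matching the
# retention policy quoted above.
CMDS=""
run() { CMDS="$CMDS+ $*
"; }

REPO=/mnt/backup/borg-repo          # placeholder repository path
run borg create --stats --compression lz4 "$REPO::host1-2017-11-14" /etc /home
run borg prune --keep-daily 30 --keep-weekly 14 --keep-monthly 12 --keep-yearly 5 "$REPO"
printf '%s' "$CMDS"
```

One caveat mentioned later in the thread applies here: only one process may access a borg repository at a time, so several hosts backing up into the same repository must serialize their runs.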
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
[ ... ] > The poor performance has existed from the beginning of using > BTRFS + KDE + Firefox (almost 2 years ago), at a point when > very few snapshots had yet been created. A comparison system > running similar hardware as well as KDE + Firefox (and LVM + > EXT4) did not have the performance problems. The difference > has been consistent and significant. That seems rather unlikely to depend on Btrfs, as I use Firefox 56 + KDE4 + Btrfs without issue, on somewhat old/small desktop and laptop, and is implausible on general grounds. You haven't provided so far any indication or quantification of your "speed" problem (which may or not be a "performance" issue". The things to look at usually at disk IO latency and rates, and system CPU time while the bad speed is observable (user CPU time is usually stuck at 100% on any JS based site as written earlier). To look at IO latency and rates the #1 choice is always: 'iostat -dk -zyx 1' and to look as system CPU (and user CPU) and other interesting details I suggest using 'htop' with the attached configuration file to write to "$HOME/.config/htop/htoprc". > Sometimes I have used Snapper settings like this: > TIMELINE_MIN_AGE="1800" > TIMELINE_LIMIT_HOURLY="36" > TIMELINE_LIMIT_DAILY="30" > TIMELINE_LIMIT_MONTHLY="12" > TIMELINE_LIMIT_YEARLY="10" > However, I also have some computers set like this: > TIMELINE_MIN_AGE="1800" > TIMELINE_LIMIT_HOURLY="10" > TIMELINE_LIMIT_DAILY="10" > TIMELINE_LIMIT_WEEKLY="0" > TIMELINE_LIMIT_MONTHLY="0" > TIMELINE_LIMIT_YEARLY="0" The first seems a bit "aspirational". IIRC "someone" confessed that the SUSE default of 'TIMELINE_LIMIT_YEARLY="10"' was imposed by external forces in the SUSE default configuration: https://github.com/openSUSE/snapper/blob/master/data/default-config https://wiki.archlinux.org/index.php/Snapper#Set_snapshot_limits https://lists.opensuse.org/yast-devel/2014-05/msg00036.html # Beware! This file is rewritten by htop when settings are changed in the interface. 
# The parser is also very primitive, and not human-friendly.
fields=0 48 38 39 40 44 62 63 2 46 13 14 1
sort_key=47
sort_direction=1
hide_threads=1
hide_kernel_threads=1
hide_userland_threads=1
shadow_other_users=0
show_thread_names=1
highlight_base_name=1
highlight_megabytes=1
highlight_threads=1
tree_view=0
header_margin=0
detailed_cpu_time=1
cpu_count_from_zero=1
update_process_names=0
color_scheme=0
delay=15
left_meters=AllCPUs Memory Swap
left_meter_modes=1 1 1
right_meters=Tasks LoadAverage Uptime
right_meter_modes=2 2 2
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Wed, Nov 1, 2017 at 4:34 AM, Marat Khaliliwrote: >> We do experience severe performance problems now, especially with >> Firefox. Part of my experiment is to reduce the number of snapshots on >> the live volumes, hence this question. > > Just for statistics, how many snapshots do you have and how often do you > take them? It's on SSD, right? I don't think the severe performance problems stem solely from the number of snapshots. I think it is also related to Firefox stuff (cache fragmentation, lack of multi-processor mode maybe, etc.) I still have to investigate the Firefox issues, but I'm starting at the foundation by trying to get a basic BTRFS setup that will support better desktop application performance first. The poor performance has existed from the beginning of using BTRFS + KDE + Firefox (almost 2 years ago), at a point when very few snapshots had yet been created. A comparison system running similar hardware as well as KDE + Firefox (and LVM + EXT4) did not have the performance problems. The difference has been consistent and significant. For a while I thought the difference was due to the hardware, as one system used the z170 chipset and the other used the X99 chipset (but were otherwise equivalent). So I repeated the testing on identical hardware and the stark performance difference remained. When I realized that, I began focusing on BTRFS, as it is the only consistent difference I can recognize. Sometimes I have used Snapper settings like this: TIMELINE_MIN_AGE="1800" TIMELINE_LIMIT_HOURLY="36" TIMELINE_LIMIT_DAILY="30" TIMELINE_LIMIT_MONTHLY="12" TIMELINE_LIMIT_YEARLY="10" However, I also have some computers set like this: TIMELINE_MIN_AGE="1800" TIMELINE_LIMIT_HOURLY="10" TIMELINE_LIMIT_DAILY="10" TIMELINE_LIMIT_WEEKLY="0" TIMELINE_LIMIT_MONTHLY="0" TIMELINE_LIMIT_YEARLY="0" > BTW beware of deleting too many snapshots at once with any tool. Delete few > and let filesystem stabilize before proceeding. OK, thanks for the tip. 
> For deduplication tool to be useful you ought to have some duplicate data on > your live volume. Do you have any (e.g. many LXC containers with the same > distribution)? No, no containers and no duplication to that large extent. > P.S. I still think you need some off-system backup solution too, either > rsync+snapshot-based over ssh or e.g. Burp (shameless advertising: > http://burp.grke.org/ ). I agree, but that's beyond the scope of the current problem I'm trying to solve. However, I'll check out Burp once I have a base configuration that is working satisfactorily.
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On 01/11/17 09:51, Dave wrote:
>> As already said by Romain Mamedov, rsync is viable alternative to
>> send-receive with much less hassle. According to some reports it can
>> even be faster.
> Thanks for confirming. I must have missed those reports. I had never
> considered this idea until now -- but I like it. Are there any blogs or
> wikis where people have done something similar to what we are
> discussing here?
I don't know any. Probably someone needs to write it.

>>> We will delete most snapshots on the live volume, but retain many (or
>>> all) snapshots on the backup block device. Is that a good strategy,
>>> given my goals?
>> Depending on the way you use it, retaining even a dozen snapshots on a
>> live volume might hurt performance (for high-performance databases) or
>> be completely transparent (for user folders). You may want to
>> experiment with this number.
> We do experience severe performance problems now, especially with
> Firefox. Part of my experiment is to reduce the number of snapshots on
> the live volumes, hence this question.
Just for statistics, how many snapshots do you have and how often do you
take them? It's on SSD, right?

> Thanks. I hope you do find time to publish it. (And what do you mean by
> portable?) For now, Snapper has a cleanup algorithm that we can use. At
> least one of the tools listed here has a thinout algorithm too:
> https://btrfs.wiki.kernel.org/index.php/Incremental_Backup
It is currently a small part of yet another home-grown backup tool which
is itself fairly big and tuned to particular environment. I thought many
times that it would be very nice to have thinning tool separately and
with no unnecessary dependencies, but...

BTW beware of deleting too many snapshots at once with any tool. Delete
few and let filesystem stabilize before proceeding.

>>> Should I consider a dedup tool like one of these?
>> Certainly NOT for snapshot-based backups: it is already deduplicated
>> almost as much as possible, dedup tools can only make it *less*
>> deduplicated.
> The question is whether to use a dedup tool on the live volume which
> has a few snapshots. Even with the new strategy (based on rsync), the
> live volume may sometimes have two snapshots (pre- and post- pacman
> upgrades).
For deduplication tool to be useful you ought to have some duplicate
data on your live volume. Do you have any (e.g. many LXC containers with
the same distribution)?

> Also still wondering about these options: no-holes, skinny metadata, or
> extended inode refs?
I don't know anything about any of these, sorry.

P.S. I still think you need some off-system backup solution too, either
rsync+snapshot-based over ssh or e.g. Burp (shameless advertising:
http://burp.grke.org/ ).

--
With Best Regards,
Marat Khalili
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Wed, Nov 1, 2017 at 2:19 AM, Marat Khalili wrote: > You seem to have two tasks: (1) same-volume snapshots (I would not call them > backups) and (2) updating some backup volume (preferably on a different > box). By solving them separately you can avoid some complexity... Yes, it appears that is a very good strategy -- solve the concerns separately. Make the live volume performant and the backup volume historical. > >> To reconcile those conflicting goals, the only idea I have come up >> with so far is to use btrfs send-receive to perform incremental >> backups > > As already said by Romain Mamedov, rsync is viable alternative to > send-receive with much less hassle. According to some reports it can even be > faster. Thanks for confirming. I must have missed those reports. I had never considered this idea until now -- but I like it. Are there any blogs or wikis where people have done something similar to what we are discussing here? > >> Given the hourly snapshots, incremental backups are the only practical >> option. They take mere moments. Full backups could take an hour or >> more, which won't work with hourly backups. > > I don't see much sense in re-doing full backups to the same physical device. > If you care about backup integrity, it is probably more important to invest > in backups verification. (OTOH, while you didn't reveal data size, if full > backup takes just an hour on your system then why not?) I was saying that a full backup could take an hour or more. That means full backups are not compatible with an hourly backup schedule. And it is certainly not a potential solution to making the system perform better because the system will be spending all its time running backups -- it would be never ending. With hourly backups, they should complete in just a few moments, which is the case with incremental backups. (It sounds like this will be the case with rsync as well.) 
> >> We will delete most snapshots on the live volume, but retain many (or >> all) snapshots on the backup block device. Is that a good strategy, >> given my goals? > > Depending on the way you use it, retaining even a dozen snapshots on a live > volume might hurt performance (for high-performance databases) or be > completely transparent (for user folders). You may want to experiment with > this number. We do experience severe performance problems now, especially with Firefox. Part of my experiment is to reduce the number of snapshots on the live volumes, hence this question. > > In any case I'd not recommend retaining ALL snapshots on backup device, even > if you have infinite space. Such filesystem would be as dangerous as the > demon core, only good for adding more snapshots (not even deleting them), > and any little mistake will blow everything up. Keep a few dozen, hundred at > most. The intention -- if we were to keep all snapshots on a backup device -- would be to never ever try to delete them. However, with the suggestion to separate the concerns and use rsync, we could also easily run the Snapper timeline cleanup on the backup volume, thereby limiting the retained snapshots to some reasonable number. > Unlike other backup systems, you can fairly easily remove snapshots in the > middle of sequence, use this opportunity. My thinout rule is: remove > snapshot if resulting gap will be less than some fraction (e.g. 1/4) of its > age. One day I'll publish portable solution on github. Thanks. I hope you do find time to publish it. (And what do you mean by portable?) For now, Snapper has a cleanup algorithm that we can use. At least one of the tools listed here has a thinout algorithm too: https://btrfs.wiki.kernel.org/index.php/Incremental_Backup >> Given this minimal retention of snapshots on the live volume, should I >> defrag it (assuming there is at least 50% free space available on the >> device)? (BTW, is defrag OK on an NVMe drive? or an SSD?) 
>> >> In the above procedure, would I perform that defrag before or after >> taking the snapshot? Or should I use autodefrag? > > I ended up using autodefrag, didn't try manual defragmentation. I don't use > SSDs as backup volumes. I don't use SSDs as backup volumes either. I was asking about the live volume. > >> Should I consider a dedup tool like one of these? > > Certainly NOT for snapshot-based backups: it is already deduplicated almost > as much as possible, dedup tools can only make it *less* deduplicated. The question is whether to use a dedup tool on the live volume which has a few snapshots. Even with the new strategy (based on rsync), the live volume may sometimes have two snapshots (pre- and post- pacman upgrades). I still wish to know, in that case, about using both a dedup tool and defragmenting the btrfs filesystem. Also still wondering about these options: no-holes, skinny metadata, or extended inode refs? This is a very helpful discussion. Thank you.
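The "Snapper timeline cleanup" idea mentioned above (keep hourly/daily/weekly/monthly/yearly snapshots, limited per period) can be sketched as a toy bucketing function. This is a hypothetical illustration, not Snapper's actual algorithm; the function name and default bucket counts are ours:

```python
import datetime

def retention_buckets(snaps, hourly=24, daily=7, weekly=4, monthly=12):
    """Decide which snapshot timestamps to keep under a timeline schedule.

    For each schedule, snapshots fall into period buckets (hour, day,
    ISO week, month, year); the newest snapshot per bucket wins, and
    only the most recent `count` buckets per schedule are retained.
    """
    schedules = [
        (hourly,  lambda t: (t.year, t.month, t.day, t.hour)),
        (daily,   lambda t: (t.year, t.month, t.day)),
        (weekly,  lambda t: tuple(t.isocalendar()[:2])),  # (year, ISO week)
        (monthly, lambda t: (t.year, t.month)),
        (None,    lambda t: t.year),                      # yearly: keep all
    ]
    keep = set()
    for count, bucket_of in schedules:
        buckets = {}
        for t in sorted(snaps):
            buckets[bucket_of(t)] = t                     # newest in bucket wins
        newest_first = sorted(buckets.values(), reverse=True)
        keep.update(newest_first if count is None else newest_first[:count])
    return sorted(keep)
```

For example, 72 hourly snapshots spanning three days collapse to the last 24 hours plus one snapshot per earlier day; everything else becomes a candidate for deletion (and, per the warning above, should be deleted a few at a time).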
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Wed, Nov 1, 2017 at 1:15 AM, Roman Mamedov wrote: > On Wed, 1 Nov 2017 01:00:08 -0400 > Dave wrote: > >> To reconcile those conflicting goals, the only idea I have come up >> with so far is to use btrfs send-receive to perform incremental >> backups as described here: >> https://btrfs.wiki.kernel.org/index.php/Incremental_Backup . > > Another option is to just use the regular rsync to a designated destination > subvolume on the backup host, AND snapshot that subvolume on that host from > time to time (or on backup completions, if you can synchronize that). > > rsync --inplace will keep space usage low as it will not reupload entire files > in case of changes/additions to them. > This seems like a brilliant idea, something that has a lot of potential... On a system where the root filesystem is on an SSD and the backup volume on an HDD, I could rsync hourly, and then run Snapper on the backup volume hourly, as well as using Snapper's timeline cleanup on the backup volume. The live filesystem would have zero snapshots and could be optimized for performance. The backup volume could retain a large number of snapshots (even more than several hundred) because performance would not be very important (as far as I can guess). This seems to resolve our conflict. How about on a system (such as a laptop) with only a single SSD? Would this same idea work where the backup volume is on the same block device? I know that is not technically a backup, but what it does accomplish is separation of the live filesystem from the snapshotted backup volume for performance reasons -- yet the hourly snapshot history is still available. That would seem to meet our use case too. (An external backup disk would be connected to the laptop periodically, of course, too.) Currently, for most btrfs volumes, I have three volumes: the main volume, a snapshot subvolume which contains all the individual snapshots, and a backup volume* (on a different block device but on the same machine). 
With this new idea, I would have a main volume without any snapshots and a backup volume which contains all the snapshots. It simplifies things on that level and it also simplifies performance tuning on the main volume. In fact it simplifies backup snapshot management too. My initial impression is that this simplifies everything as well as optimizing everything. So surely it must have some disadvantages compared to btrfs send-receive incremental backups (https://btrfs.wiki.kernel.org/index.php/Incremental_Backup). What would those disadvantages be? The first one that comes to mind is that I would lose the functionality of pre- and post- upgrade snapshots on the root filesystem. But I think that's minor. I could either keep those two snapshots for a few hours or days after major upgrades or maybe I could find a pacman hook that uses rsync to make pre- and post- upgrade copies... * Footnote: on some workstation computers, we have 2 or 3 separate backup block devices (e.g., external USB hard drives, etc.). Laptops, however, generally only have a single block device and are not always connected to an external USB hard drive for backup as often as would be ideal. But we also don't keep any critical data on laptops.
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
I'm active user of backup using btrfs snapshots. Generally it works with
some caveats.

You seem to have two tasks: (1) same-volume snapshots (I would not call
them backups) and (2) updating some backup volume (preferably on a
different box). By solving them separately you can avoid some complexity
like accidental remove of snapshot that's still needed for updating
backup volume.

> To reconcile those conflicting goals, the only idea I have come up
> with so far is to use btrfs send-receive to perform incremental
> backups as described here:
> https://btrfs.wiki.kernel.org/index.php/Incremental_Backup .
As already said by Romain Mamedov, rsync is viable alternative to
send-receive with much less hassle. According to some reports it can
even be faster.

> Given the hourly snapshots, incremental backups are the only practical
> option. They take mere moments. Full backups could take an hour or
> more, which won't work with hourly backups.
I don't see much sense in re-doing full backups to the same physical
device. If you care about backup integrity, it is probably more
important to invest in backups verification. (OTOH, while you didn't
reveal data size, if full backup takes just an hour on your system then
why not?)

> We will delete most snapshots on the live volume, but retain many (or
> all) snapshots on the backup block device. Is that a good strategy,
> given my goals?
Depending on the way you use it, retaining even a dozen snapshots on a
live volume might hurt performance (for high-performance databases) or
be completely transparent (for user folders). You may want to experiment
with this number.

In any case I'd not recommend retaining ALL snapshots on backup device,
even if you have infinite space. Such filesystem would be as dangerous
as the demon core, only good for adding more snapshots (not even
deleting them), and any little mistake will blow everything up. Keep a
few dozen, hundred at most.

Unlike other backup systems, you can fairly easily remove snapshots in
the middle of sequence, use this opportunity. My thinout rule is: remove
snapshot if resulting gap will be less than some fraction (e.g. 1/4) of
its age. One day I'll publish portable solution on github.

> Given this minimal retention of snapshots on the live volume, should I
> defrag it (assuming there is at least 50% free space available on the
> device)? (BTW, is defrag OK on an NVMe drive? or an SSD?)
>
> In the above procedure, would I perform that defrag before or after
> taking the snapshot? Or should I use autodefrag?
I ended up using autodefrag, didn't try manual defragmentation. I don't
use SSDs as backup volumes.

> Should I consider a dedup tool like one of these?
Certainly NOT for snapshot-based backups: it is already deduplicated
almost as much as possible, dedup tools can only make it *less*
deduplicated.

> * Footnote: On the backup device, maybe we will never delete
> snapshots. In any event, that's not a concern now. We'll retain many,
> many snapshots on the backup device.
Again, DO NOT do this, btrfs in its current state does not support it.
Good rule of thumb for time of some operations is data size multiplied
by number of snapshots (raised to some power >= 1) and divided by IO/CPU
speed. By creating snapshots it is very easy to create petabytes of data
for kernel to process, which it won't be able to do in many years.

--
With Best Regards,
Marat Khalili
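Marat's thin-out rule ("remove snapshot if resulting gap will be less than some fraction, e.g. 1/4, of its age") can be sketched as a small function. Since his tool isn't published, this is a hypothetical reconstruction; the function name and the restart-after-delete strategy are ours:

```python
def thin_snapshots(times, now, fraction=0.25):
    """Return the subset of snapshot timestamps to KEEP.

    times: ascending snapshot creation times (e.g. epoch seconds).
    A snapshot is dropped when the gap its removal would leave between
    its neighbours is still smaller than `fraction` times its age, so
    old snapshots thin out exponentially while recent ones survive.
    The oldest and newest snapshots are never touched.
    """
    keep = list(times)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(keep) - 1):
            gap = keep[i + 1] - keep[i - 1]   # gap left if keep[i] goes
            age = now - keep[i]
            if gap < fraction * age:
                del keep[i]
                changed = True
                break                          # rescan after each removal
    return keep
```

Run against 100 hours of hourly snapshots, this keeps every recent snapshot but progressively merges the old ones, which matches the "dense recent history, sparse old history" goal discussed in this thread.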
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Wed, 1 Nov 2017 01:00:08 -0400 Dave wrote: > To reconcile those conflicting goals, the only idea I have come up > with so far is to use btrfs send-receive to perform incremental > backups as described here: > https://btrfs.wiki.kernel.org/index.php/Incremental_Backup . Another option is to just use the regular rsync to a designated destination subvolume on the backup host, AND snapshot that subvolume on that host from time to time (or on backup completions, if you can synchronize that). rsync --inplace will keep space usage low as it will not reupload entire files in case of changes/additions to them. Yes rsync has to traverse both directory trees to find changes, but that's pretty fast (couple of minutes at most, for a typical root filesystem), especially if you use SSD or SSD caching. -- With respect, Roman
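Roman's scheme (rsync into a fixed destination subvolume, then snapshot that subvolume read-only) can be sketched as a command builder. The paths and helper name are hypothetical; `--delete` is our addition so removals on the source propagate to the mirror, and the actual invocation is left commented out:

```python
import datetime

def backup_commands(src, dest_subvol, snap_dir, when=None):
    """Build the two commands for one backup cycle: rsync the live data
    into a fixed destination subvolume, then take a read-only btrfs
    snapshot of that subvolume to freeze this cycle's state."""
    when = when or datetime.datetime.now()
    stamp = when.strftime("%Y-%m-%d_%H%M")
    rsync = ["rsync", "-a", "--inplace", "--delete", src, dest_subvol]
    snapshot = ["btrfs", "subvolume", "snapshot", "-r",
                dest_subvol, f"{snap_dir}/@{stamp}"]
    return [rsync, snapshot]

# To run for real (assumed paths, needs root for the snapshot):
#   import subprocess
#   for cmd in backup_commands("/home/", "/mnt/backup/current",
#                              "/mnt/backup/snapshots"):
#       subprocess.run(cmd, check=True)
```

Run hourly from cron or a systemd timer, this yields exactly the layout discussed below: a snapshot-free live volume, a `current` mirror, and a growing series of read-only snapshots on the backup volume for the timeline cleanup to thin out.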
Need help with incremental backup strategy (snapshots, defragmenting & performance)
Our use case requires snapshots. btrfs snapshots are the best solution we have found for our requirements, and over the last year snapshots have proven their value to us. (For this discussion I am considering both the "root" volume and the "home" volume on a typical desktop workstation. Also, all btrfs volumes are mounted with noatime and nodiratime flags.) For performance reasons, I now wish to minimize the number of snapshots retained on the live btrfs volume. However, for backup purposes, I wish to maximize the number of snapshots retained over time. We'll keep yearly, monthly, weekly, daily and hourly snapshots for as long as possible. To reconcile those conflicting goals, the only idea I have come up with so far is to use btrfs send-receive to perform incremental backups as described here: https://btrfs.wiki.kernel.org/index.php/Incremental_Backup . Given the hourly snapshots, incremental backups are the only practical option. They take mere moments. Full backups could take an hour or more, which won't work with hourly backups. We will delete most snapshots on the live volume, but retain many (or all) snapshots on the backup block device. Is that a good strategy, given my goals? The steps: I know step one is to do the "bootstrapping" where a full initial copy of the live volume is sent to the backup volume. I also know the steps for doing incremental backups. However, the first problem I see is that performing incremental backups requires both the live volume and the backup volume to have an identical "parent" snapshot before each new incremental can be sent. I have found it easy to accidentally delete that specific required parent snapshot when hourly snapshots are being taken and many snapshots exist. Given that I want to retain the minimum number of snapshots on the live volume, how do I ensure that a valid "parent" subvolume exists there in order to perform the incremental backup? 
(Again, I have often run into the error "no valid parent exists" when doing incremental backups.) I think the rule is like this: Do not delete a snapshot from the live volume until the next snapshot based on it has been sent to the backup volume. In other words, always retain the *exact* snapshot that was the last one sent to the backup volume. Deleting that one then taking another one does not seem sufficient. BTRFS does not seem to recognize parent-child-grandchild relationships of snapshots when doing send-receive incremental backups. However, maybe I'm wrong. Would it be sufficient to first take another snapshot, then delete the prior snapshot? Will the send-receive algorithm be able to infer a parent exists on the backup volume when it receives an incremental based on a child snapshot? (My experience says "no", but I'd like a more authoritative answer.) The next step in my proposed procedure is to take a new snapshot, send it to the backup volume, and only then delete the prior snapshot ( and only from the live volume* ). Using this strategy, the live volume will always have the current snapshot (which I guess should not be called a snapshot -- it's the live volume) plus at least one more snapshot. Briefly, during the incremental backup, it will have an additional snapshot until the older one gets deleted. Given this minimal retention of snapshots on the live volume, should I defrag it (assuming there is at least 50% free space available on the device)? (BTW, is defrag OK on an NVMe drive? or an SSD?) In the above procedure, would I perform that defrag before or after taking the snapshot? Or should I use autodefrag? Should I consider a dedup tool like one of these? 
g2p/bedup: Btrfs deduplication https://github.com/g2p/bedup markfasheh/duperemove: Tools for deduping file systems https://github.com/markfasheh/duperemove Zygo/bees: Best-Effort Extent-Same, a btrfs dedup agent https://github.com/Zygo/bees Does anyone care to elaborate on the relationship between a dedup tool like Bees and defragmenting a btrfs filesystem with snapshots? I understand they do opposing things, but I think it was suggested in another thread on defragmenting that they can be combined to good effect. Should I consider this as a possible solution for my situation? Should I consider any of these options: no-holes, skinny metadata, or extended inode refs? Finally, are there any good BTRFS performance wiki articles or blogs I should refer to for my situation? * Footnote: On the backup device, maybe we will never delete snapshots. In any event, that's not a concern now. We'll retain many, many snapshots on the backup device.
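The rule worked out above -- never delete the snapshot last sent to the backup volume until the next incremental based on it has been received -- can be sketched as one send/receive cycle. The helper is hypothetical; the `btrfs subvolume snapshot -r`, `btrfs send -p`, and `btrfs receive` invocations are the documented incremental-backup pattern from the wiki page cited in this thread:

```python
def incremental_cycle(live_vol, snap_dir, backup_dir, parent, stamp):
    """One incremental send-receive cycle as an ordered command list.

    `parent` must be the EXACT snapshot last sent, still present on
    both the live and the backup volume; it is only deleted after the
    new snapshot has been received, so a valid parent always exists
    for the next cycle.
    """
    new = f"{snap_dir}/@{stamp}"
    return [
        # 1. take the new read-only snapshot of the live volume
        ["btrfs", "subvolume", "snapshot", "-r", live_vol, new],
        # 2. send only the delta against the parent (pipe send into receive)
        ["btrfs", "send", "-p", parent, new],
        ["btrfs", "receive", backup_dir],
        # 3. only now is the old parent safe to drop from the live side
        ["btrfs", "subvolume", "delete", parent],
    ]
```

The ordering is the whole point: swapping steps 3 and 2 is exactly the mistake that produces the "no valid parent exists" error described above.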
[PATCH v4 1/2] btrfs-progs: device: add description of alias to help message
State that the 'delete' is the alias of 'remove' as the man page says.

Signed-off-by: Tomohiro Misono
Reviewed-by: Satoru Takeuchi
---
 cmds-device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-device.c b/cmds-device.c
index 4337eb2..3b6b985 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -237,7 +237,7 @@ static int cmd_device_remove(int argc, char **argv)
 
 static const char * const cmd_device_delete_usage[] = {
 	"btrfs device delete <device>|<devid> [<device>|<devid>...] <path>",
-	"Remove a device from a filesystem",
+	"Remove a device from a filesystem (alias of \"btrfs device remove\")",
 	NULL
 };
-- 
2.9.5
Re: [PATCH] btrfs-progs: doc: update help/document of btrfs device remove
On 2017/10/11 6:22, Satoru Takeuchi wrote: > At Tue, 3 Oct 2017 17:12:39 +0900, > Misono, Tomohiro wrote: >> >> This patch updates help/document of "btrfs device remove" in two points: >> >> 1. Add explanation of 'missing' for 'device remove'. This is only >> written in wikipage currently. >> (https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices) >> >> 2. Add example of device removal in the man document. This is because >> that explanation of "remove" says "See the example section below", but >> there is no example of removal currently. >> >> Signed-off-by: Tomohiro Misono <misono.tomoh...@jp.fujitsu.com> >> --- >> Documentation/btrfs-device.asciidoc | 19 +++ >> cmds-device.c | 10 +- >> 2 files changed, 28 insertions(+), 1 deletion(-) >> >> diff --git a/Documentation/btrfs-device.asciidoc >> b/Documentation/btrfs-device.asciidoc >> index 88822ec..dc523a9 100644 >> --- a/Documentation/btrfs-device.asciidoc >> +++ b/Documentation/btrfs-device.asciidoc >> @@ -75,6 +75,10 @@ The operation can take long as it needs to move all data >> from the device. >> It is possible to delete the device that was used to mount the filesystem. >> The >> device entry in mount table will be replaced by another device name with the >> lowest device id. >> ++ >> +If device is mounted as degraded mode (-o degraded), special term "missing" >> +can be used for . In that case, the first device that is described >> by >> +the filesystem metadata, but not presented at the mount time will be >> removed. >> >> *delete* | [|...] :: >> Alias of remove kept for backward compatibility >> @@ -206,6 +210,21 @@ data or the block groups occupy the whole first device. >> The device size of '/dev/sdb' as seen by the filesystem remains unchanged, >> but >> the logical space from 50-100GiB will be unused. >> >> + REMOVE DEVICE > > It's a part of "TYPICAL USECASES" section. So it's also necessary to modify > the following sentence > > === > See the example section below. 
> === > > to as follow. > > === > See the *TYPICAL USECASES* section below. > === > > Or just removing the above mentioned sentence is also OK since there is > "See the section *TYPICAL USECASES* for some examples." in "DEVICE MANAGEMENT" > section. > >> + >> +Device removal must satisfy the profile constraints, otherwise the command >> +fails. For example: >> + >> + $ btrfs device remove /dev/sda /mnt >> + $ ERROR: error removing device '/dev/sda': unable to go below two devices >> on raid1 > > s/^$ ERROR/ERROR/ > >> + >> + >> +In order to remove a device, you need to convert profile in this case: >> + >> + $ btrfs balance start -mconvert=dup /mnt >> + $ btrfs balance start -dconvert=single /mnt > > It's simpler to convert both the RAID configuration of data and metadata > by the following one command. > > $ btrfs balance -mconvert=dup -dconvert=single /mnt > >> + $ btrfs device remove /dev/sda /mnt >> + >> DEVICE STATS >> >> >> diff --git a/cmds-device.c b/cmds-device.c >> index 4337eb2..6cb53ff 100644 >> --- a/cmds-device.c >> +++ b/cmds-device.c >> @@ -224,9 +224,16 @@ static int _cmd_device_remove(int argc, char **argv, >> return !!ret; >> } >> >> +#define COMMON_USAGE_REMOVE_DELETE \ >> +"", \ >> +"If 'missing' is specified for , the first device that is", \ >> +"described by the filesystem metadata, but not presented at the", \ >> +"mount time will be removed." >> + >> static const char * const cmd_device_remove_usage[] = { >> "btrfs device remove | [|...] ", >> "Remove a device from a filesystem", >> +COMMON_USAGE_REMOVE_DELETE, >> NULL >> }; >> >> @@ -237,7 +244,8 @@ static int cmd_device_remove(int argc, char **argv) >> >> static const char * const cmd_device_delete_usage[] = { >> "btrfs device delete | [|...] ", >> -"Remove a device from a filesystem", >> +"Remove a device from a filesystem (alias of \"btrfs device remove\")", >> +COMMON_USAGE_REMOVE_DELETE, >> NULL >> }; > > This snippet is not related to the description of this patch. 
> Dividing this patch is better.
Re: [PATCH] btrfs-progs: doc: update help/document of btrfs device remove
At Tue, 3 Oct 2017 17:12:39 +0900, Misono, Tomohiro wrote: > > This patch updates help/document of "btrfs device remove" in two points: > > 1. Add explanation of 'missing' for 'device remove'. This is only > written in wikipage currently. > (https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices) > > 2. Add example of device removal in the man document. This is because > that explanation of "remove" says "See the example section below", but > there is no example of removal currently. > > Signed-off-by: Tomohiro Misono <misono.tomoh...@jp.fujitsu.com> > --- > Documentation/btrfs-device.asciidoc | 19 +++ > cmds-device.c | 10 +- > 2 files changed, 28 insertions(+), 1 deletion(-) > > diff --git a/Documentation/btrfs-device.asciidoc > b/Documentation/btrfs-device.asciidoc > index 88822ec..dc523a9 100644 > --- a/Documentation/btrfs-device.asciidoc > +++ b/Documentation/btrfs-device.asciidoc > @@ -75,6 +75,10 @@ The operation can take long as it needs to move all data > from the device. > It is possible to delete the device that was used to mount the filesystem. > The > device entry in mount table will be replaced by another device name with the > lowest device id. > ++ > +If device is mounted as degraded mode (-o degraded), special term "missing" > +can be used for . In that case, the first device that is described by > +the filesystem metadata, but not presented at the mount time will be removed. > > *delete* | [|...] :: > Alias of remove kept for backward compatibility > @@ -206,6 +210,21 @@ data or the block groups occupy the whole first device. > The device size of '/dev/sdb' as seen by the filesystem remains unchanged, > but > the logical space from 50-100GiB will be unused. > > + REMOVE DEVICE It's a part of "TYPICAL USECASES" section. So it's also necessary to modify the following sentence === See the example section below. === to as follow. === See the *TYPICAL USECASES* section below. 
=== Or just removing the above mentioned sentence is also OK since there is "See the section *TYPICAL USECASES* for some examples." in "DEVICE MANAGEMENT" section. > + > +Device removal must satisfy the profile constraints, otherwise the command > +fails. For example: > + > + $ btrfs device remove /dev/sda /mnt > + $ ERROR: error removing device '/dev/sda': unable to go below two devices > on raid1 s/^$ ERROR/ERROR/ > + > + > +In order to remove a device, you need to convert profile in this case: > + > + $ btrfs balance start -mconvert=dup /mnt > + $ btrfs balance start -dconvert=single /mnt It's simpler to convert both the RAID configuration of data and metadata by the following one command. $ btrfs balance -mconvert=dup -dconvert=single /mnt > + $ btrfs device remove /dev/sda /mnt > + > DEVICE STATS > > > diff --git a/cmds-device.c b/cmds-device.c > index 4337eb2..6cb53ff 100644 > --- a/cmds-device.c > +++ b/cmds-device.c > @@ -224,9 +224,16 @@ static int _cmd_device_remove(int argc, char **argv, > return !!ret; > } > > +#define COMMON_USAGE_REMOVE_DELETE \ > + "", \ > + "If 'missing' is specified for , the first device that is", \ > + "described by the filesystem metadata, but not presented at the", \ > + "mount time will be removed." > + > static const char * const cmd_device_remove_usage[] = { > "btrfs device remove | [|...] ", > "Remove a device from a filesystem", > + COMMON_USAGE_REMOVE_DELETE, > NULL > }; > > @@ -237,7 +244,8 @@ static int cmd_device_remove(int argc, char **argv) > > static const char * const cmd_device_delete_usage[] = { > "btrfs device delete | [|...] ", > - "Remove a device from a filesystem", > + "Remove a device from a filesystem (alias of \"btrfs device remove\")", > + COMMON_USAGE_REMOVE_DELETE, > NULL > }; This snippet is not related to the description of this patch. Dividing this patch is better. 
Thanks, Satoru > > -- > 2.9.5
Re: Seeking Help on Corruption Issues
On Tue, Oct 03, 2017 at 03:49:25PM -0700, Stephen Nesbitt wrote: > > On 10/3/2017 2:11 PM, Hugo Mills wrote: > >Hi, Stephen, > > > >On Tue, Oct 03, 2017 at 08:52:04PM +, Stephen Nesbitt wrote: > >>Here it is. There are a couple of out-of-order entries beginning at 117. And > >>yes I did uncover a bad stick of RAM: > >> > >>btrfs-progs v4.9.1 > >>leaf 2589782867968 items 134 free space 6753 generation 3351574 owner 2 > >>fs uuid 24b768c3-2141-44bf-ae93-1c3833c8c8e3 > >>chunk uuid 19ce12f0-d271-46b8-a691-e0d26c1790c6 > >[snip] > >>item 116 key (1623012749312 EXTENT_ITEM 45056) itemoff 10908 itemsize 53 > >>extent refs 1 gen 3346444 flags DATA > >>extent data backref root 271 objectid 2478 offset 0 count 1 > >>item 117 key (1621939052544 EXTENT_ITEM 8192) itemoff 10855 itemsize 53 > >>extent refs 1 gen 3346495 flags DATA > >>extent data backref root 271 objectid 21751764 offset 6733824 count 1 > >>item 118 key (1623012450304 EXTENT_ITEM 8192) itemoff 10802 itemsize 53 > >>extent refs 1 gen 3351513 flags DATA > >>extent data backref root 271 objectid 5724364 offset 680640512 count 1 > >>item 119 key (1623012802560 EXTENT_ITEM 12288) itemoff 10749 itemsize 53 > >>extent refs 1 gen 3346376 flags DATA > >>extent data backref root 271 objectid 21751764 offset 6701056 count 1 > hex(1623012749312) > >'0x179e3193000' > hex(1621939052544) > >'0x179a319e000' > hex(1623012450304) > >'0x179e314a000' > hex(1623012802560) > >'0x179e31a0000' > > > >That's "e" -> "a" in the fourth hex digit, which is a single-bit > >flip, and should be fixable by btrfs check (I think). However, even > >fixing that, it's not ordered, because 118 is then before 117, which > >could be another bitflip ("9" -> "4" in the 7th digit), but two bad > >bits that close to each other seems unlikely to me. > > > >Hugo. > > Hope this is not a duplicate reply - I might have fat fingered something. > > The underlying file is disposable/replaceable. Any way to zero > out/zap the bad BTRFS entry? Not really. 
Even trying to delete the related file(s), it's going to fall over when reading the metadata in the first place. (The key order check is a metadata invariant, like the csum checks and transid checks.) At best, you'd have to get btrfs check to fix it. It should be able to manage a single-bit error, but you've got two single-bit errors in close proximity, and I'm not sure it'll be able to deal with it. Might be worth trying it. The FS _might_ blow up as a result of an attempted fix, but you say it's replaceable, so that's kind of OK. The worst I'd _expect_ to happen with btrfs check --repair is that it just won't be able to deal with it and you're left where you started. Go for it. Hugo. -- Hugo Mills | You shouldn't anthropomorphise computers. They hugo@... carfax.org.uk | really don't like that. http://carfax.org.uk/ | PGP: E2AB1DE4
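The key-order invariant Hugo refers to can be sketched with the keys from the dump itself (a simplified model using only the objectid part of the keys of items 116-119; the real check also compares type and offset):

```python
# Objectids of items 116-119 from the leaf dump above. A btrfs leaf requires
# its items to be strictly ordered by key; item 117's key breaks that.
keys = [1623012749312, 1621939052544, 1623012450304, 1623012802560]

def keys_ordered(ks):
    """True if every key is strictly less than its successor."""
    return all(a < b for a, b in zip(ks, ks[1:]))

print(keys_ordered(keys))  # False -> the kernel logs "corrupt leaf, bad key order"
```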
Re: Seeking Help on Corruption Issues
On 10/3/2017 2:11 PM, Hugo Mills wrote: Hi, Stephen, On Tue, Oct 03, 2017 at 08:52:04PM +, Stephen Nesbitt wrote: Here it i. There are a couple of out-of-order entries beginning at 117. And yes I did uncover a bad stick of RAM: btrfs-progs v4.9.1 leaf 2589782867968 items 134 free space 6753 generation 3351574 owner 2 fs uuid 24b768c3-2141-44bf-ae93-1c3833c8c8e3 chunk uuid 19ce12f0-d271-46b8-a691-e0d26c1790c6 [snip] item 116 key (1623012749312 EXTENT_ITEM 45056) itemoff 10908 itemsize 53 extent refs 1 gen 3346444 flags DATA extent data backref root 271 objectid 2478 offset 0 count 1 item 117 key (1621939052544 EXTENT_ITEM 8192) itemoff 10855 itemsize 53 extent refs 1 gen 3346495 flags DATA extent data backref root 271 objectid 21751764 offset 6733824 count 1 item 118 key (1623012450304 EXTENT_ITEM 8192) itemoff 10802 itemsize 53 extent refs 1 gen 3351513 flags DATA extent data backref root 271 objectid 5724364 offset 680640512 count 1 item 119 key (1623012802560 EXTENT_ITEM 12288) itemoff 10749 itemsize 53 extent refs 1 gen 3346376 flags DATA extent data backref root 271 objectid 21751764 offset 6701056 count 1 hex(1623012749312) '0x179e3193000' hex(1621939052544) '0x179a319e000' hex(1623012450304) '0x179e314a000' hex(1623012802560) '0x179e31a' That's "e" -> "a" in the fourth hex digit, which is a single-bit flip, and should be fixable by btrfs check (I think). However, even fixing that, it's not ordered, because 118 is then before 117, which could be another bitflip ("9" -> "4" in the 7th digit), but two bad bits that close to each other seems unlikely to me. Hugo. Hope this is a duplicate reply - I might have fat fingered something. The underlying file is disposable/replaceable. Any way to zero out/zap the bad BTRFS entry? -steve -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Seeking Help on Corruption Issues
Hi, Stephen, On Tue, Oct 03, 2017 at 08:52:04PM +, Stephen Nesbitt wrote: > Here it i. There are a couple of out-of-order entries beginning at 117. And > yes I did uncover a bad stick of RAM: > > btrfs-progs v4.9.1 > leaf 2589782867968 items 134 free space 6753 generation 3351574 owner 2 > fs uuid 24b768c3-2141-44bf-ae93-1c3833c8c8e3 > chunk uuid 19ce12f0-d271-46b8-a691-e0d26c1790c6 [snip] > item 116 key (1623012749312 EXTENT_ITEM 45056) itemoff 10908 itemsize 53 > extent refs 1 gen 3346444 flags DATA > extent data backref root 271 objectid 2478 offset 0 count 1 > item 117 key (1621939052544 EXTENT_ITEM 8192) itemoff 10855 itemsize 53 > extent refs 1 gen 3346495 flags DATA > extent data backref root 271 objectid 21751764 offset 6733824 count 1 > item 118 key (1623012450304 EXTENT_ITEM 8192) itemoff 10802 itemsize 53 > extent refs 1 gen 3351513 flags DATA > extent data backref root 271 objectid 5724364 offset 680640512 count 1 > item 119 key (1623012802560 EXTENT_ITEM 12288) itemoff 10749 itemsize 53 > extent refs 1 gen 3346376 flags DATA > extent data backref root 271 objectid 21751764 offset 6701056 count 1 >>> hex(1623012749312) '0x179e3193000' >>> hex(1621939052544) '0x179a319e000' >>> hex(1623012450304) '0x179e314a000' >>> hex(1623012802560) '0x179e31a' That's "e" -> "a" in the fourth hex digit, which is a single-bit flip, and should be fixable by btrfs check (I think). However, even fixing that, it's not ordered, because 118 is then before 117, which could be another bitflip ("9" -> "4" in the 7th digit), but two bad bits that close to each other seems unlikely to me. Hugo. -- Hugo Mills | Great films about cricket: Silly Point Break hugo@... carfax.org.uk | http://carfax.org.uk/ | PGP: E2AB1DE4 | signature.asc Description: Digital signature
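Hugo's single-bit-flip diagnosis can be checked mechanically: XOR the on-disk key against the proposed correction and count the set bits. (The corrected value 0x179e319e000 is Hugo's hypothesis, not confirmed data.)

```python
# Item 117's key vs. its hypothesised correction ("a" -> "e" in the 4th hex digit).
bad   = 0x179a319e000   # 1621939052544, as stored on disk
fixed = 0x179e319e000   # Hugo's proposed correct value

diff = bad ^ fixed
print(hex(diff))             # 0x40000000 -> the two values differ in one place
print(bin(diff).count("1"))  # 1 set bit, i.e. a classic single-bit RAM flip
```

A popcount of 1 is exactly what bad RAM tends to produce, which fits the bad DIMM found later in the thread.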
Re: Seeking Help on Corruption Issues
On Tue, Oct 03, 2017 at 01:06:50PM -0700, Stephen Nesbitt wrote: > All: > > I came back to my computer yesterday to find my filesystem in read > only mode. Running a btrfs scrub start -dB aborts as follows: > > btrfs scrub start -dB /mnt > ERROR: scrubbing /mnt failed for device id 4: ret=-1, errno=5 > (Input/output error) > ERROR: scrubbing /mnt failed for device id 5: ret=-1, errno=5 > (Input/output error) > scrub device /dev/sdb (id 4) canceled > scrub started at Mon Oct 2 21:51:46 2017 and was aborted after > 00:09:02 > total bytes scrubbed: 75.58GiB with 1 errors > error details: csum=1 > corrected errors: 0, uncorrectable errors: 1, unverified errors: 0 > scrub device /dev/sdc (id 5) canceled > scrub started at Mon Oct 2 21:51:46 2017 and was aborted after > 00:11:11 > total bytes scrubbed: 50.75GiB with 0 errors > > The resulting dmesg is: > [ 699.534066] BTRFS error (device sdc): bdev /dev/sdb errs: wr 0, > rd 0, flush 0, corrupt 6, gen 0 > [ 699.703045] BTRFS error (device sdc): unable to fixup (regular) > error at logical 1609808347136 on dev /dev/sdb > [ 783.306525] BTRFS critical (device sdc): corrupt leaf, bad key > order: block=2589782867968, root=1, slot=116 This error usually means bad RAM. Can you show us the output of "btrfs-debug-tree -b 2589782867968 /dev/sdc"? Hugo. 
> [ 789.776132] BTRFS critical (device sdc): corrupt leaf, bad key > order: block=2589782867968, root=1, slot=116 > [ 911.529842] BTRFS critical (device sdc): corrupt leaf, bad key > order: block=2589782867968, root=1, slot=116 > [ 918.365225] BTRFS critical (device sdc): corrupt leaf, bad key > order: block=2589782867968, root=1, slot=116 > > Running btrfs check /dev/sdc results in: > btrfs check /dev/sdc > Checking filesystem on /dev/sdc > UUID: 24b768c3-2141-44bf-ae93-1c3833c8c8e3 > checking extents > bad key ordering 116 117 > bad block 2589782867968 > ERROR: errors found in extent allocation tree or chunk allocation > checking free space cache > There is no free space entry for 1623012450304-1623012663296 > There is no free space entry for 1623012450304-1623225008128 > cache appears valid but isn't 1622151266304 > found 288815742976 bytes used err is -22 > total csum bytes: 0 > total tree bytes: 350781440 > total fs tree bytes: 0 > total extent tree bytes: 350027776 > btree space waste bytes: 115829777 > file data blocks allocated: 156499968 > > uname -a: > Linux sysresccd 4.9.24-std500-amd64 #2 SMP Sat Apr 22 17:14:43 UTC > 2017 x86_64 Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz GenuineIntel > GNU/Linux > > btrfs --version: btrfs-progs v4.9.1 > > btrfs fi show: > Label: none uuid: 24b768c3-2141-44bf-ae93-1c3833c8c8e3 > Total devices 2 FS bytes used 475.08GiB > devid 4 size 931.51GiB used 612.06GiB path /dev/sdb > devid 5 size 931.51GiB used 613.09GiB path /dev/sdc > > btrfs fi df /mnt: > Data, RAID1: total=603.00GiB, used=468.03GiB > System, RAID1: total=64.00MiB, used=112.00KiB > System, single: total=32.00MiB, used=0.00B > Metadata, RAID1: total=9.00GiB, used=7.04GiB > Metadata, single: total=1.00GiB, used=0.00B > GlobalReserve, single: total=512.00MiB, used=0.00B > > What is the recommended procedure at this point? Run btrfs check > --repair? 
I have backups so losing a file or two isn't critical, but > I really don't want to go through the effort of a bare metal > reinstall. > > In the process of researching this I did uncover a bad DIMM. Am I > correct that the problems I'm seeing are likely linked to the > resulting memory errors. > > Thx in advance, > > -steve > -- Hugo Mills | Quidquid latine dictum sit, altum videtur hugo@... carfax.org.uk | http://carfax.org.uk/ | PGP: E2AB1DE4
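As an aside, the leaf address Hugo plugs into btrfs-debug-tree comes straight out of the dmesg lines; a quick way to pull it out (an illustrative snippet, not part of any btrfs tooling):

```python
import re

# One of the dmesg lines quoted above:
line = ("BTRFS critical (device sdc): corrupt leaf, bad key order: "
        "block=2589782867968, root=1, slot=116")

m = re.search(r"block=(\d+),.*?slot=(\d+)", line)
block, slot = int(m.group(1)), int(m.group(2))
print(block, slot)  # the block number is what goes after "-b" in btrfs-debug-tree
```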
Seeking Help on Corruption Issues
All: I came back to my computer yesterday to find my filesystem in read only mode. Running a btrfs scrub start -dB aborts as follows: btrfs scrub start -dB /mnt ERROR: scrubbing /mnt failed for device id 4: ret=-1, errno=5 (Input/output error) ERROR: scrubbing /mnt failed for device id 5: ret=-1, errno=5 (Input/output error) scrub device /dev/sdb (id 4) canceled scrub started at Mon Oct 2 21:51:46 2017 and was aborted after 00:09:02 total bytes scrubbed: 75.58GiB with 1 errors error details: csum=1 corrected errors: 0, uncorrectable errors: 1, unverified errors: 0 scrub device /dev/sdc (id 5) canceled scrub started at Mon Oct 2 21:51:46 2017 and was aborted after 00:11:11 total bytes scrubbed: 50.75GiB with 0 errors The resulting dmesg is: [ 699.534066] BTRFS error (device sdc): bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 6, gen 0 [ 699.703045] BTRFS error (device sdc): unable to fixup (regular) error at logical 1609808347136 on dev /dev/sdb [ 783.306525] BTRFS critical (device sdc): corrupt leaf, bad key order: block=2589782867968, root=1, slot=116 [ 789.776132] BTRFS critical (device sdc): corrupt leaf, bad key order: block=2589782867968, root=1, slot=116 [ 911.529842] BTRFS critical (device sdc): corrupt leaf, bad key order: block=2589782867968, root=1, slot=116 [ 918.365225] BTRFS critical (device sdc): corrupt leaf, bad key order: block=2589782867968, root=1, slot=116 Running btrfs check /dev/sdc results in: btrfs check /dev/sdc Checking filesystem on /dev/sdc UUID: 24b768c3-2141-44bf-ae93-1c3833c8c8e3 checking extents bad key ordering 116 117 bad block 2589782867968 ERROR: errors found in extent allocation tree or chunk allocation checking free space cache There is no free space entry for 1623012450304-1623012663296 There is no free space entry for 1623012450304-1623225008128 cache appears valid but isn't 1622151266304 found 288815742976 bytes used err is -22 total csum bytes: 0 total tree bytes: 350781440 total fs tree bytes: 0 total extent tree bytes: 
350027776 btree space waste bytes: 115829777 file data blocks allocated: 156499968 uname -a: Linux sysresccd 4.9.24-std500-amd64 #2 SMP Sat Apr 22 17:14:43 UTC 2017 x86_64 Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz GenuineIntel GNU/Linux btrfs --version: btrfs-progs v4.9.1 btrfs fi show: Label: none uuid: 24b768c3-2141-44bf-ae93-1c3833c8c8e3 Total devices 2 FS bytes used 475.08GiB devid 4 size 931.51GiB used 612.06GiB path /dev/sdb devid 5 size 931.51GiB used 613.09GiB path /dev/sdc btrfs fi df /mnt: Data, RAID1: total=603.00GiB, used=468.03GiB System, RAID1: total=64.00MiB, used=112.00KiB System, single: total=32.00MiB, used=0.00B Metadata, RAID1: total=9.00GiB, used=7.04GiB Metadata, single: total=1.00GiB, used=0.00B GlobalReserve, single: total=512.00MiB, used=0.00B What is the recommended procedure at this point? Run btrfs check --repair? I have backups so losing a file or two isn't critical, but I really don't want to go through the effort of a bare metal reinstall. In the process of researching this I did uncover a bad DIMM. Am I correct that the problems I'm seeing are likely linked to the resulting memory errors. Thx in advance, -steve -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs-progs: doc: update help/document of btrfs device remove
This patch updates help/document of "btrfs device remove" in two points: 1. Add explanation of 'missing' for 'device remove'. This is only written in wikipage currently. (https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices) 2. Add example of device removal in the man document. This is because that explanation of "remove" says "See the example section below", but there is no example of removal currently. Signed-off-by: Tomohiro Misono <misono.tomoh...@jp.fujitsu.com> --- Documentation/btrfs-device.asciidoc | 19 +++ cmds-device.c | 10 +- 2 files changed, 28 insertions(+), 1 deletion(-) diff --git a/Documentation/btrfs-device.asciidoc b/Documentation/btrfs-device.asciidoc index 88822ec..dc523a9 100644 --- a/Documentation/btrfs-device.asciidoc +++ b/Documentation/btrfs-device.asciidoc @@ -75,6 +75,10 @@ The operation can take long as it needs to move all data from the device. It is possible to delete the device that was used to mount the filesystem. The device entry in mount table will be replaced by another device name with the lowest device id. ++ +If device is mounted as degraded mode (-o degraded), special term "missing" +can be used for . In that case, the first device that is described by +the filesystem metadata, but not presented at the mount time will be removed. *delete* | [|...] :: Alias of remove kept for backward compatibility @@ -206,6 +210,21 @@ data or the block groups occupy the whole first device. The device size of '/dev/sdb' as seen by the filesystem remains unchanged, but the logical space from 50-100GiB will be unused. + REMOVE DEVICE + +Device removal must satisfy the profile constraints, otherwise the command +fails. 
For example: + + $ btrfs device remove /dev/sda /mnt + $ ERROR: error removing device '/dev/sda': unable to go below two devices on raid1 + + +In order to remove a device, you need to convert profile in this case: + + $ btrfs balance start -mconvert=dup /mnt + $ btrfs balance start -dconvert=single /mnt + $ btrfs device remove /dev/sda /mnt + DEVICE STATS diff --git a/cmds-device.c b/cmds-device.c index 4337eb2..6cb53ff 100644 --- a/cmds-device.c +++ b/cmds-device.c @@ -224,9 +224,16 @@ static int _cmd_device_remove(int argc, char **argv, return !!ret; } +#define COMMON_USAGE_REMOVE_DELETE \ + "", \ + "If 'missing' is specified for , the first device that is", \ + "described by the filesystem metadata, but not presented at the", \ + "mount time will be removed." + static const char * const cmd_device_remove_usage[] = { "btrfs device remove | [|...] ", "Remove a device from a filesystem", + COMMON_USAGE_REMOVE_DELETE, NULL }; @@ -237,7 +244,8 @@ static int cmd_device_remove(int argc, char **argv) static const char * const cmd_device_delete_usage[] = { "btrfs device delete | [|...] ", - "Remove a device from a filesystem", + "Remove a device from a filesystem (alias of \"btrfs device remove\")", + COMMON_USAGE_REMOVE_DELETE, NULL }; -- 2.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Help Recovering BTRFS array
Hi Duncan, I'm not sure if this will attach to my original message... Thank you for your reply. For some reason I'm not getting list messages even though I know I am subscribed. I know all too well about the golden rule of data. It has bitten me a few times. The data on this array is mostly data that I don't really care about. I was able to copy off what I wanted. The main reason I sent it to the list was just to see if I could somehow return the FS to a working state without having to recreate it. I'm just surprised that all 3 copies of the super block got corrupted. Probably my lack of understanding, but I always assumed that if one copy got corrupted it would be replaced by a good copy, therefore leaving all copies in a good state. Is that not the case? If it is, then what bad luck that all 3 got messed up at the same time. Some information I forgot to include in my original message: uname -a Linux thebeach 4.12.13-gentoo-GMAN #1 SMP Sat Sep 16 15:28:26 ADT 2017 x86_64 Intel(R) Core(TM) i5-2320 CPU @ 3.00GHz GenuineIntel GNU/Linux btrfs --version btrfs-progs v4.10.2 Anyway, thank you again for your reply. I will leave the FS intact for a few days in case any more details could help the development of BTRFS and maybe avoid this happening again or lead to a recovery option. Marc
Re: Help Recovering BTRFS array
grondinm posted on Mon, 18 Sep 2017 14:14:08 -0300 as excerpted: > superblock: bytenr=65536, device=/dev/md0 > - > ERROR: bad magic on superblock on /dev/md0 at 65536 > > superblock: bytenr=67108864, device=/dev/md0 > - > ERROR: bad magic on superblock on /dev/md0 at 67108864 > > superblock: bytenr=274877906944, device=/dev/md0 > - > ERROR: bad magic on superblock on /dev/md0 at 274877906944 > > Now i'm really panicked. Is the FS toast? Can any recovery be attempted? First I'm a user and list regular, not a dev. With luck they can help beyond the below suggestions... However, there's no need to panic in any case, due to the sysadmin's first rule of backups: The true value of any data is defined by the number of backups of that data you consider(ed) it worth having. As a result, there are precisely two possibilities, neither one of which calls for panic. 1) No need to panic because you have a backup, and recovery is as simple as restoring from that backup. 2) You don't have a backup, in which case the lack of that backup means you have defined the value of the data as only trivial, worth less than the time/trouble/resources you saved by not making that backup. Because the data is only of trivial value anyway, and you saved the more valuable assets of the time/trouble/resources you would have put into that backup were the data of more than trivial value, you've still saved the stuff you considered most valuable, so again, no need to panic. It's a binary state. There's no third possibility available, and no possibility you lost what your actions, or lack of them in the case of no backup, defined as of most value to you. (As for the freshness of that backup, the same rule applies, but to the data delta between the state as of the backup and the current state. If the value of the changed data is worth it to you to have it backed up, you'll have freshened your backup. 
If not, you defined it to be as of such trivial value as to not be worth the time/trouble/resources to do so.) That said, at the time you're calculating the value of the data against the value of the time/trouble/resources required to back it up, the loss potential remains theoretical. Once something actually happens to the data, it's no longer theoretical, and the data, while of trivial enough value to be worth the risk when it was theoretical, may still be valuable enough to you to spend at least some time/trouble on trying to recover it. In that case, since you can still mount, I'd suggest mounting read-only to prevent any further damage, and then do a copy off of the data you can, to a different, unaffected, filesystem. Then if there's still data you want that you couldn't simply copy off, you can try btrfs restore. While I do have backups here, a couple times when things went bad, btrfs restore was able to get back pretty much everything to current, while were I to have had to restore from backups, I'd have lost enough changed data to hurt, even if I had defined it as of trivial enough value when the risk remained theoretical that I hadn't yet freshened the backup. (Since then I upgraded the rest of my storage to ssd, thus lowering the time and hassle cost of backups, encouraging me to do them more frequently. Talking about which, I need to freshen them in the near future. It's now on my list for my next day off...) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Help Recovering BTRFS array
Hello, I will try to provide all information pertinent to the situation I find myself in. Yesterday, while trying to write some data to a BTRFS filesystem on top of a mdadm raid5 array encrypted with dmcrypt, comprising four 1TB HDDs, my system became unresponsive and I had no choice but to hard reset. The system came back up no problem and the array in question mounted without a complaint. Once I tried to write data to it again, however, the system became unresponsive again and required another hard reset. Again the system came back up and everything mounted with no complaints. This time I decided to run some checks. I ran a raid check by issuing 'echo check > /sys/block/md0/md/sync_action'. This completed without a single error. So I performed a proper restart just because, and once the system came back up I initiated a scrub on the btrfs filesystem. This greeted me with my first indication that something is wrong: btrfs sc stat /media/Storage2 scrub status for e5bd5cf3-c736-48ff-b1c6-c9f678567788 scrub started at Mon Sep 18 06:05:21 2017, running for 07:40:47 total bytes scrubbed: 1.03TiB with 1 errors error details: super=1 corrected errors: 0, uncorrectable errors: 0, unverified errors: 0 I was concerned, but since it was still scrubbing I left it. Now things look really bleak... Every few minutes the scrub process goes into a D status as shown by htop; it eventually keeps going and as far as I can see is still scrubbing (slowly). I decided to check something else (based on the error above). I ran btrfs inspect-internal dump-super -a -f /dev/md0 which gave me this: superblock: bytenr=65536, device=/dev/md0 - ERROR: bad magic on superblock on /dev/md0 at 65536 superblock: bytenr=67108864, device=/dev/md0 - ERROR: bad magic on superblock on /dev/md0 at 67108864 superblock: bytenr=274877906944, device=/dev/md0 - ERROR: bad magic on superblock on /dev/md0 at 274877906944 Now I'm really panicked. Is the FS toast? Can any recovery be attempted?
Here is the output of dump-super with the -F option: superblock: bytenr=65536, device=/dev/md0 - csum_type 43668 (INVALID) csum_size 32 csum 0x76c647b04abf1057f04e40d1dc52522397258064b98a1b8f6aa6934c74c0dd55 [DON'T MATCH] bytenr 6376050623103086821 flags 0x7edcc412b742c79f ( WRITTEN | RELOC | METADUMP | unknown flag: 0x7edcc410b742c79c ) magic ..l~...q [DON'T MATCH] fsid2cf827fa-7ab8-e290-b152-1735c2735a37 label .a.9.@.=4.#.|.D...]..dh=d,..k..n..~.5.i.8...(.._.tl.a.@..2..qidj.>Hy.U..{X5.kG0.)t..;/.2...@.T.|.u.<.`!J*9./8...&.g\.V...*.,/95.uEs..W.i..z..h...n(...VGn^F...H...5.DT..3.A..mK...~..}.1..n. generation 1769598730239175261 root14863846352370317867 sys_array_size 1744503544 chunk_root_generation 18100024505086712407 root_level 79 chunk_root 10848092274453435018 chunk_root_level156 log_root7514172289378668244 log_root_transid6227239369566282426 log_root_level 18 total_bytes 5481087866519986730 bytes_used 13216280034370888020 sectorsize 4102056786 nodesize1038279258 leafsize276348297 stripesize 2473897044 root_dir12090183195204234845 num_devices 12836127619712721941 compat_flags0xf98ff436fc954bd4 compat_ro_flags 0x3fe8246616164da7 ( FREE_SPACE_TREE | FREE_SPACE_TREE_VALID | unknown flag: 0x3fe8246616164da4 ) incompat_flags 0x3989a5037330bfd8 ( COMPRESS_LZO | COMPRESS_LZOv2 | EXTENDED_IREF | RAID56 | SKINNY_METADATA | NO_HOLES | unknown flag: 0x3989a5037330bc10 ) cache_generation10789185961859482334 uuid_tree_generation14921288820846890813 dev_item.uuid e6e382b3-de66-4c25-7cc9-3cc43cde9c24 dev_item.fsid f8430e37-12ca-adaf-b038-f0ee10ce6327 [DON'T MATCH] dev_item.type 7909001383421391155 dev_item.total_bytes4839925749276763097 dev_item.bytes_used 14330418354255459170 dev_item.io_align 4136652250 dev_item.io_width 1113335506 dev_item.sector_size1197062542 dev_item.devid 16559830033162408461 dev_item.dev_group 3271056113
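For context on those "bad magic" errors: btrfs keeps three superblock copies at fixed offsets (64KiB, 64MiB, 256GiB — the 65536, 67108864 and 274877906944 seen in the dump-super output), and each copy carries an 8-byte magic field 64 bytes in that should read _BHRfS_M. A rough sketch of what dump-super is verifying (illustrative only; the real tool is btrfs inspect-internal dump-super):

```python
# Offsets of the three btrfs superblock mirrors: 64KiB, 64MiB, 256GiB.
SUPER_OFFSETS = (64 * 1024, 64 * 1024**2, 256 * 1024**3)
MAGIC = b"_BHRfS_M"   # magic field, 64 bytes into each superblock copy

def bad_magic_offsets(dev_path):
    """Return the mirror offsets whose magic field doesn't match (sketch)."""
    bad = []
    with open(dev_path, "rb") as dev:
        for off in SUPER_OFFSETS:
            dev.seek(off + 64)        # skip csum/fsid/bytenr/flags (32+16+8+8)
            if dev.read(8) != MAGIC:
                bad.append(off)
    return bad
```

Run against the /dev/md0 above, all three offsets would come back bad, matching the three errors dump-super printed.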
Re: Please help with exact actions for raid1 hot-swap
On 2017-09-11 17:33, Duncan wrote: Austin S. Hemmelgarn posted on Mon, 11 Sep 2017 11:11:01 -0400 as excerpted: On 2017-09-11 09:16, Marat Khalili wrote: Patrik, Duncan, thank you for the help. The `btrfs replace start /dev/sdb7 /dev/sdd7 /mnt/data` worked without a hitch (though I didn't try to reboot yet, still have grub/efi/several mdadm partitions to copy). Does this mean: * I should not be afraid to reboot and find /dev/sdb7 mounted again? * I will not be able to easily mount /dev/sdb7 on a different computer to do some tests? This depends. I don't remember if the replace command wipes the super-block on the old device after the replace completes or not. AFAIK it does. Based on checking after I sent my reply, it does. If it does not, then you can't safely mount the filesystem while that device is still in the system, but can transfer it to another system and mount it degraded (probably, not a certainty). It's worth noting that while this shouldn't be a problem here (because the magic should be gone), the problem does appear in other contexts. In particular, any context that does device duplication is a problem. This means dd-ing the content of a device to another device is a problem, because once btrfs device scan is triggered (and udev can trigger it automatically/unexpectedly), btrfs will see the second device and consider it part of the same filesystem as the first, causing problems if either one is mounted. dd-ing to a file tends to be less of a problem, because it's just a file until activated as a loopback device, and that doesn't tend to happen automatically. Similarly, lvm's device mirroring modes can be problematic, with udev again sometimes unexpectedly triggering btrfs device scan on device appearance, unless measures are taken to hide the new device. 
I tried lvm some time ago and decided I didn't find it useful for my own use-cases, so I don't know the details here, in particular, I'm not sure of the device hiding options, but there have certainly been threads on the list discussing the problem and the option to hide the device to prevent it came up in one of them. Based on my own experience, LVM works fine as of right now provided you use the standard LVM udev rules (which disable almost all udev processing on LVM internal devices). In fact, the only issues I've had in the past with BTRFS on LVM were related to dm-cache not properly hiding the backing device originally, and some generic stability issues early on with BTRFS on top of dm-thinp. If it does, then you can safely keep the device in the system, but won't be able to move it to another computer and get data off of it. This should be the case. Tho it may be as simple as restoring the btrfs magic in the superblock to restore it to mountability, but I believe the replace process deletes chunks as they are transferred, so actually getting data off it may be more complicated than simply making it mountable again. Regardless of which is the case, you won't see /dev/sdb7 mounted as a separate filesystem when you reboot. Also, although /dev/sdd7 is much larger than /dev/sdb7 was, `btrfs fi show` still displays it as 2.71TiB, why? `btrfs replace` is functionally equivalent to using dd to copy the contents of the device being replaced to the new device, albeit a bit smarter (as mentioned above). This means in particular that it does not resize the filesystem (although I think I saw some discussion and possibly patches to handle that with a command-line option). This is documented. From the btrfs-replace manpage (from btrfs-progs 4.12, reformatted a bit here for posting): The <targetdev> needs to be same size or larger than the <srcdev>.
Note: The filesystem has to be resized to fully take advantage of a larger target device, this can be achieved with btrfs filesystem resize <devid>:max /path <<<<<<
Re: Please help with exact actions for raid1 hot-swap
Austin S. Hemmelgarn posted on Mon, 11 Sep 2017 11:11:01 -0400 as excerpted: > On 2017-09-11 09:16, Marat Khalili wrote: >> Patrik, Duncan, thank you for the help. The `btrfs replace start >> /dev/sdb7 /dev/sdd7 /mnt/data` worked without a hitch (though I didn't >> try to reboot yet, still have grub/efi/several mdadm partitions to >> copy). >> Does this mean: >> * I should not be afraid to reboot and find /dev/sdb7 mounted again? >> * I will not be able to easily mount /dev/sdb7 on a different computer >> to do some tests? > This depends. I don't remember if the replace command wipes the > super-block on the old device after the replace completes or not. AFAIK it does. > If it > does not, then you can't safely mount the filesystem while that device > is still in the system, but can transfer it to another system and mount > it degraded (probably, not a certainty). It's worth noting that while this shouldn't be a problem here (because the magic should be gone), the problem does appear in other contexts. In particular, any context that does device duplication is a problem. This means dd-ing the content of a device to another device is a problem, because once btrfs device scan is triggered (and udev can trigger it automatically/unexpectedly), btrfs will see the second device and consider it part of the same filesystem as the first, causing problems if either one is mounted. dd-ing to a file tends to be less of a problem, because it's just a file until activated as a loopback device, and that doesn't tend to happen automatically. Similarly, lvm's device mirroring modes can be problematic, with udev again sometimes unexpectedly triggering btrfs device scan on device appearance, unless measures are taken to hide the new device. 
I tried lvm some time ago and decided I didn't find it useful for my own use-cases, so I don't know the details here, in particular, I'm not sure of the device hiding options, but there have certainly been threads on the list discussing the problem and the option to hide the device to prevent it came up in one of them. > if it does, then you can > safely keep the device in the system, but won't be able to move it to > another computer and get data off of it. This should be the case. Tho it may be as simple as restoring the btrfs magic in the superblock to restore it to mountability, but I believe the replace process deletes chunks as they are transferred, so actually getting data off it may be more complicated than simply making it mountable again. > Regardless of which is the > case, you won't see /dev/sdb7 mounted as a separate filesystem when you > reboot. >> Also, although /dev/sdd7 is much larger than /dev/sdb7 was, `btrfs fi >> show` still displays it as 2.71TiB, why? > `btrfs replace` is functionally equivalent to using dd to copy the > contents of the device being replaced to the new device, albeit a bit > smarter (as mentioned above). This means in particular that it does not > resize the filesystem (although I think I saw some discussion and > possibly patches to handle that with a command-line option). This is documented. From the btrfs-replace manpage (from btrfs-progs 4.12, reformatted a bit here for posting): >>>>>> The <targetdev> needs to be same size or larger than the <srcdev>. Note: The filesystem has to be resized to fully take advantage of a larger target device, this can be achieved with btrfs filesystem resize <devid>:max /path <<<<<< -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master."
Richard Stallman
Re: Please help with exact actions for raid1 hot-swap
On 2017-09-11 09:16, Marat Khalili wrote:
> Patrik, Duncan, thank you for the help. The `btrfs replace start
> /dev/sdb7 /dev/sdd7 /mnt/data` worked without a hitch (though I didn't
> try to reboot yet, still have grub/efi/several mdadm partitions to
> copy). It also worked much faster than mdadm would take, apparently
> only moving 126GB used, not 2.71TB total.

This is why replace is preferred over add/remove. The replace operation only copies exactly the data that is needed off of the old device, instead of copying the whole device like LVM and MD need to, or rewriting the whole filesystem (like add/remove does). For what it's worth, if you can't use replace for some reason and have to use add and remove, it is more efficient to add the new device and then remove the old one, because it will require less data movement to get a properly balanced filesystem (removing a device is actually a balance operation that prevents writes to the device being removed).

> Interestingly, according to HDD lights it mostly read from the
> remaining /dev/sda, not from replaced /dev/sdb (which must be
> completely readable now according to smartctl -- problematic sector
> got finally remapped after ~1day).

This is odd. I was under the impression that replace preferentially reads from the device being replaced unless you tell it to avoid reading from said device.
> It now looks like follows:
>
> $ sudo blkid /dev/sda7 /dev/sdb7 /dev/sdd7
> /dev/sda7: LABEL="data" UUID="37d3313a-e2ad-4b7f-98fc-a01d815952e0" UUID_SUB="db644855-2334-4d61-a27b-9a591255aa39" TYPE="btrfs" PARTUUID="c5ceab7e-e5f8-47c8-b922-c5fa0678831f"
> /dev/sdb7: PARTUUID="493923cd-9ecb-4ee8-988b-5d0bfa8991b3"
> /dev/sdd7: LABEL="data" UUID="37d3313a-e2ad-4b7f-98fc-a01d815952e0" UUID_SUB="9c2f05e9-5996-479f-89ad-f94f7ce130e6" TYPE="btrfs" PARTUUID="178cd274-7251-4d25-9116-ce0732d2410b"
>
> $ sudo btrfs fi show /dev/sdb7
> ERROR: no btrfs on /dev/sdb7
>
> $ sudo btrfs fi show /dev/sdd7
> Label: 'data'  uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0
>         Total devices 2 FS bytes used 108.05GiB
>         devid 1 size 2.71TiB used 131.03GiB path /dev/sda7
>         devid 2 size 2.71TiB used 131.03GiB path /dev/sdd7
>
> Does this mean:
> * I should not be afraid to reboot and find /dev/sdb7 mounted again?
> * I will not be able to easily mount /dev/sdb7 on a different computer
>   to do some tests?

This depends. I don't remember if the replace command wipes the super-block on the old device after the replace completes or not. If it does not, then you can't safely mount the filesystem while that device is still in the system, but can transfer it to another system and mount it degraded (probably, not a certainty). If it does, then you can safely keep the device in the system, but won't be able to move it to another computer and get data off of it. Regardless of which is the case, you won't see /dev/sdb7 mounted as a separate filesystem when you reboot.

> Also, although /dev/sdd7 is much larger than /dev/sdb7 was, `btrfs fi
> show` still displays it as 2.71TiB, why?

`btrfs replace` is functionally equivalent to using dd to copy the contents of the device being replaced to the new device, albeit a bit smarter (as mentioned above). This means in particular that it does not resize the filesystem (although I think I saw some discussion and possibly patches to handle that with a command-line option).
Re: Please help with exact actions for raid1 hot-swap
Patrik, Duncan, thank you for the help. The `btrfs replace start /dev/sdb7 /dev/sdd7 /mnt/data` worked without a hitch (though I didn't try to reboot yet, still have grub/efi/several mdadm partitions to copy). It also worked much faster than mdadm would take, apparently only moving 126GB used, not 2.71TB total. Interestingly, according to HDD lights it mostly read from the remaining /dev/sda, not from replaced /dev/sdb (which must be completely readable now according to smartctl -- problematic sector got finally remapped after ~1day).

It now looks like follows:

$ sudo blkid /dev/sda7 /dev/sdb7 /dev/sdd7
/dev/sda7: LABEL="data" UUID="37d3313a-e2ad-4b7f-98fc-a01d815952e0" UUID_SUB="db644855-2334-4d61-a27b-9a591255aa39" TYPE="btrfs" PARTUUID="c5ceab7e-e5f8-47c8-b922-c5fa0678831f"
/dev/sdb7: PARTUUID="493923cd-9ecb-4ee8-988b-5d0bfa8991b3"
/dev/sdd7: LABEL="data" UUID="37d3313a-e2ad-4b7f-98fc-a01d815952e0" UUID_SUB="9c2f05e9-5996-479f-89ad-f94f7ce130e6" TYPE="btrfs" PARTUUID="178cd274-7251-4d25-9116-ce0732d2410b"

$ sudo btrfs fi show /dev/sdb7
ERROR: no btrfs on /dev/sdb7

$ sudo btrfs fi show /dev/sdd7
Label: 'data'  uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0
        Total devices 2 FS bytes used 108.05GiB
        devid 1 size 2.71TiB used 131.03GiB path /dev/sda7
        devid 2 size 2.71TiB used 131.03GiB path /dev/sdd7

Does this mean:
* I should not be afraid to reboot and find /dev/sdb7 mounted again?
* I will not be able to easily mount /dev/sdb7 on a different computer to do some tests?

Also, although /dev/sdd7 is much larger than /dev/sdb7 was, `btrfs fi show` still displays it as 2.71TiB, why?

--
With Best Regards,
Marat Khalili
Re: Please help with exact actions for raid1 hot-swap
On 2017-09-10 02:33, Marat Khalili wrote:
> It doesn't need replaced disk to be readable, right? Then what
> prevents same procedure to work without a spare bay?

In theory, nothing. In practice, there are reliability issues with mounting a filesystem degraded (and you should avoid running any array degraded, regardless of whether it's BTRFS or actual RAID, be that LVM, MD, or hardware). It's also significantly faster to do it with a spare drive bay, because that will just read from the device being replaced and copy the data directly, while pulling the device to be replaced requires rebuilding the data (there is more involved than just copying, even with a raid1 profile).
Re: Help me understand what is going on with my RAID1 FS
Thanks everyone for the helpful and detailed responses. Now that you confirmed that everything is fine with my FS, I'm all relaxed because I can for sure live with the output of df.

On Mon, Sep 11, 2017 at 5:29 AM, Andrei Borzenkov wrote:
> 10.09.2017 23:17, Dmitrii Tcvetkov wrote:
>>>> Drive1  Drive2  Drive3
>>>> X       X
>>>>         X       X
>>>> X               X
>>>>
>>>> Where X is a chunk of raid1 block group.
>>>
>>> But this table clearly shows that adding third drive increases free
>>> space by 50%. You need to reallocate data to actually make use of
>>> it, but it was done in this case.
>>
>> It increases it but I don't see how this space is in any way useful
>> unless data is in single profile. After full balance chunks will be
>> spread over 3 devices, how it helps in raid1 data profile case?
>
> A1 A2     A1 A2 -     A1 A2 B1     A1 A2 B1
> B1 B2  => B1 B2 -  => -- B2 --  => C1 B2 C2
>
> It is raid1 profile on three disks fully utilizing them (assuming
> equal sizes of course). Where "raid1" means - each data block has two
> copies on different devices.
Re: Help me understand what is going on with my RAID1 FS
10.09.2017 23:17, Dmitrii Tcvetkov wrote:
>>> Drive1  Drive2  Drive3
>>> X       X
>>>         X       X
>>> X               X
>>>
>>> Where X is a chunk of raid1 block group.
>>
>> But this table clearly shows that adding third drive increases free
>> space by 50%. You need to reallocate data to actually make use of it,
>> but it was done in this case.
>
> It increases it but I don't see how this space is in any way useful
> unless data is in single profile. After full balance chunks will be
> spread over 3 devices, how it helps in raid1 data profile case?

A1 A2     A1 A2 -     A1 A2 B1     A1 A2 B1
B1 B2  => B1 B2 -  => -- B2 --  => C1 B2 C2

It is raid1 profile on three disks fully utilizing them (assuming equal sizes of course). Where "raid1" means - each data block has two copies on different devices.
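The pairing constraint discussed above leads to a simple capacity bound for raid1: usable data space is capped both by half the raw total and by how much of the largest device can find a partner elsewhere. A minimal sketch (my own illustration, not btrfs code; it matches the raid1 result the carfax.org.uk calculator gives for the sizes in this thread):

```python
def raid1_usable(sizes):
    """Usable raid1 data capacity: every block needs two copies on two
    different devices, so capacity is limited by half the raw total and
    by the space the largest device can pair with on the others."""
    total = sum(sizes)
    return min(total / 2, total - max(sizes))

print(raid1_usable([3, 3]))     # two equal disks: 3
print(raid1_usable([3, 3, 3]))  # three equal disks: 4.5, all space usable
print(raid1_usable([3, 3, 8]))  # this thread's case: 6, with 2 stranded
```

For three equal 3 TB disks the bound is total/2 = 4.5 TB, i.e. the A/B/C chunk layout above really does use all three disks; for 3+3+8 the big disk can only pair against 6 TB of partners, so 2 TB is stranded.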
Re: Help me understand what is going on with my RAID1 FS
FLJ posted on Sun, 10 Sep 2017 15:45:42 +0200 as excerpted:

> I have a BTRFS RAID1 volume running for the past year. I avoided all
> pitfalls known to me that would mess up this volume. I never
> experimented with quotas, no-COW, snapshots, defrag, nothing really.
> The volume is a RAID1 from day 1 and is working reliably until now.
>
> Until yesterday it consisted of two 3 TB drives, something along the
> lines:
>
> Label: 'BigVault'  uuid: a37ad5f5-a21b-41c7-970b-13b6c4db33db
>         Total devices 2 FS bytes used 2.47TiB
>         devid 1 size 2.73TiB used 2.47TiB path /dev/sdb
>         devid 2 size 2.73TiB used 2.47TiB path /dev/sdc

I'm going to try a different approach than I see in the two existing subthreads, so I started from scratch with my own subthread... So the above looks reasonable so far...

> Yesterday I've added a new drive to the FS and did a full rebalance
> (without filters) over night, which went through without any issues.
>
> Now I have:
>
> Label: 'BigVault'  uuid: a37ad5f5-a21b-41c7-970b-13b6c4db33db
>         Total devices 3 FS bytes used 2.47TiB
>         devid 1 size 2.73TiB used 1.24TiB path /dev/sdb
>         devid 2 size 2.73TiB used 1.24TiB path /dev/sdc
>         devid 3 size 7.28TiB used 2.48TiB path /dev/sda

That's exactly as expected after a balance. Note the size, 2.73 TiB (twos-power) for the smaller two, not 3 (tho it's probably 3 TB, tens-power), and 7.28 TiB, not 8, for the larger one. The most-free-space chunk allocation, with raid1-paired chunks, means the first chunk of every pair will get allocated to the largest, 7.28 TiB device. The other two devices are equal in size, 2.73 TiB each, and the second chunk can't get allocated to the largest device, as only one chunk of the pair can go there, so the allocator will in general alternate allocations between the smaller two for the second chunk of each pair.
(I say in general, because metadata chunks are smaller than data chunks, so it's possible that two chunks in a row, a metadata chunk and a data chunk, will be allocated from the same device before it switches to the other.)

Because the larger device is larger than the other two combined, it'll always get one copy, while the others fill up evenly at half the usage of the larger device, until both smaller devices are full, at which point you won't be able to allocate further raid1 chunks and you'll ENOSPC.

> # btrfs fi df /mnt/BigVault/
> Data, RAID1: total=2.47TiB, used=2.47TiB
> System, RAID1: total=32.00MiB, used=384.00KiB
> Metadata, RAID1: total=4.00GiB, used=2.74GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B

Still looks reasonable. Note that assuming you're using a reasonably current btrfs-progs, there are also the btrfs fi usage and btrfs dev usage commands. Btrfs fi df is an older form that has much less information than the fi and dev usage commands, tho between btrfs fi show and btrfs fi df, /most/ of the filesystem-level information in btrfs fi usage can be deduced, tho not necessarily the device-level detail. Btrfs fi usage is thus preferred, assuming it's available to you. (In addition to btrfs fi usage being newer, both it and btrfs fi df require a mounted btrfs. If the filesystem refuses to mount, btrfs fi show may be all that's available.)

While I'm digressing, I'm guessing you know this already, but for others: global reserve is reserved from and comes out of metadata, so you can add global reserve total to metadata used. Normally, btrfs won't use anything from the global reserve, so usage there will be zero.
If it's not, that's a very strong indication that your filesystem believes it is very short on space (even if data and metadata say they both have lots of unused space left, for some reason, very likely a bug in that case, the filesystem believes otherwise), and you need to take corrective action immediately, or risk the filesystem effectively going read-only when nothing else can be written.

> But still df -h is giving me:
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sdb        6.4T  2.5T  1.5T  63% /mnt/BigVault
>
> Although I've heard and read about the difficulty in reporting free
> space due to the flexibility of BTRFS, snapshots and subvolumes, etc.,
> but I only have a single volume, no subvolumes, no snapshots, no
> quotas and both data and metadata are RAID1.

The most practical advice I've seen regarding "normal" df (that is, the one from coreutils, not btrfs fi df) in the case of uneven device sizes in particular, is simply to ignore its numbers -- they're not reliable. The only thing you need to be sure of is that it says you have enough space for whatever you're actually doing ATM, since various applications will trust its numbers and may refuse to do whatever filesystem operation at all if it says there's not enough space. The algorithm reasonably new coreutils df (and the kernel calls it depends on) uses is much better
Re: Help me understand what is going on with my RAID1 FS
On Sun, 10 Sep 2017 20:15:52 +0200, Ferenc-Levente Juhos wrote:

>> Problem is that each raid1 block group contains two chunks on two
>> separate devices, it can't utilize fully three devices no matter
>> what. If that doesn't suit you then you need to add 4th disk. After
>> that FS will be able to use all unallocated space on all disks in
>> raid1 profile. But even then you'll be able to safely lose only one
>> disk since BTRFS still will be storing only 2 copies of data.
>
> I hope I didn't say that I want to utilize all three devices fully. It
> was clear to me that there will be 2 TB of wasted space.
> Also I'm not questioning the chunk allocator for RAID1 at all. It's
> clear and it always has been clear that for RAID1 the chunks need to
> be allocated on different physical devices.
> If I understood Kai's point of view, he even suggested that I might
> need to do balancing to make sure that the free space on the three
> devices is being used smartly. Hence the questions about balancing.

It will allocate chunks from the device with the most space available. So while you fill your disks, space usage will evenly distribute. The problem comes when you start deleting stuff: some chunks may even be freed, and everything becomes messed up. In an aging file system you may notice that the chunks are no longer evenly distributed. A balance is a way to fix that, because it will reallocate chunks and coalesce data back into single chunks, making free space for new allocations. In this process it will actually evenly distribute your data again. You may want to use this rebalance script: https://www.spinics.net/lists/linux-btrfs/msg52076.html

> I mean in worst case it could happen like this:
>
> Again I have disks of sizes 3, 3, 8:
>
> Fig.1
> Drive1(8)  Drive2(3)  Drive3(3)
> -          X1         X1
> -          X2         X2
> -          X3         X3
>
> Here the new drive is completely unused. Even if one X1 chunk would be
> on Drive1 it would be still a sub-optimal allocation.

This won't happen while filling a fresh btrfs.
Chunks are always allocated from the device with the most free space (within the raid1 constraints). Thus it will allocate space alternating between disks 1+2 and 1+3.

> This is the optimal allocation. Will btrfs allocate like this?
> Considering that Drive1 has the most free space.
>
> Fig. 2
> Drive1(8)  Drive2(3)  Drive3(3)
> X1         X1         -
> X2         -          X2
> X3         X3         -
> X4         -          X4

Yes.

> From my point of view Fig.2 shows the optimal allocation, by the time
> the disks Drive2 and Drive3 are full (3TB) Drive1 must have 6TB
> (because it is exclusively holding the mirrors for both Drive2 and 3).
> For sure now btrfs can say, since two of the drives are completely
> full he can't allocate any more chunks and the remaining 2 TB of space
> from Drive1 is wasted. This is clear it's even pointed out by the
> btrfs size calculator.

Yes.

> But again if the above statements are true, then df might as well tell
> the "truth" and report that I have 3.5 TB space free and not 1.5TB (as
> it is reported now). Again here I fully understand Kai's explanation.

The size calculator has undergone some revisions. I think it currently estimates the free space from the net-data-to-raw-data ratio across all devices, taking the current raid constraints into account. Calculating free space in btrfs is difficult because in the future btrfs may even support different raid levels for different subvolumes. It's probably best to calculate for the worst-case scenario then. Even today it's already difficult if you use different raid levels for metadata and content data: the filesystem cannot predict the future of allocations. It can only give an educated guess. And the calculation was revised a few times to not "overshoot".
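To put a number on the "3.5 TB vs 1.5 TB" disagreement: applying the same raid1 pairing bound to the unallocated space left after the balance (a back-of-the-envelope calculation of mine, not what the kernel's statfs actually computes) suggests roughly 3 TiB of data could still be written, far more than the 1.5T df reports:

```python
# Unallocated space per device after the balance, in TiB
# (device sizes and usage taken from the btrfs fi show output
# earlier in this thread).
unalloc = [2.73 - 1.24,  # /dev/sdb
           2.73 - 1.24,  # /dev/sdc
           7.28 - 2.48]  # /dev/sda
total = sum(unalloc)
# raid1 pairing bound: capped by half the raw free space and by what
# the biggest device can find partners for on the other devices.
remaining = min(total / 2, total - max(unalloc))
print(round(remaining, 2))  # ~2.98 TiB of raid1 data capacity left
```

Here the binding constraint is the two small disks: their combined ~2.98 TiB of unallocated space is what the big disk can mirror against, which lines up with the "truth" estimate in the quoted text once TB/TiB rounding is accounted for.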
> So the question still remains, is it just that df is intentionally not
> smart enough to give a more accurate estimation,

The df utility doesn't know anything about btrfs allocations. The value is estimated by btrfs itself. To get more detailed info for capacity planning, you should use "btrfs fi df" and its various siblings.

> or is the assumption
> that the allocator picks the drive with most free space mistaken?
> If I continue along the lines of what Kai said, and I need to do
> re-balance, because the allocation is not like shown above (Fig.2),
> then my question is still legitimate. Are there any filters that one
> might use to speed up or to selectively balance in my case? or will I
> need to do full balance?

Your assumption is misguided. The total free space estimation is a totally different thing than what the allocator bases its decision on. See "btrfs dev usage". The allocator uses space from the biggest unallocated space
Re: Help me understand what is going on with my RAID1 FS
>> Drive1  Drive2  Drive3
>> X       X
>>         X       X
>> X               X
>>
>> Where X is a chunk of raid1 block group.
>
> But this table clearly shows that adding third drive increases free
> space by 50%. You need to reallocate data to actually make use of it,
> but it was done in this case.

It increases it but I don't see how this space is in any way useful unless data is in single profile. After full balance chunks will be spread over 3 devices, how it helps in raid1 data profile case?
Re: Help me understand what is going on with my RAID1 FS
10.09.2017 19:11, Dmitrii Tcvetkov wrote:
>> Actually based on http://carfax.org.uk/btrfs-usage/index.html I
>> would've expected 6 TB of usable space. Here I get 6.4 which is odd,
>> but that only 1.5 TB is available is even stranger.
>>
>> Could anyone explain what I did wrong or why my expectations are
>> wrong?
>>
>> Thank you in advance
>
> I'd say df and the website calculate different things. In btrfs raid1
> profile stores exactly 2 copies of data, each copy is on separate
> device. So by adding third drive, no matter how big, effective free
> space didn't expand because btrfs still needs space on any one of
> other two drives to store second half of each raid1 chunk stored on
> that third drive.
>
> Basically:
>
> Drive1  Drive2  Drive3
> X       X
>         X       X
> X               X
>
> Where X is a chunk of raid1 block group.

But this table clearly shows that adding third drive increases free space by 50%. You need to reallocate data to actually make use of it, but it was done in this case.
Re: Help me understand what is going on with my RAID1 FS
10.09.2017 18:47, Kai Krakow wrote:
> On Sun, 10 Sep 2017 15:45:42 +0200, FLJ wrote:
>
>> Hello all,
>>
>> I have a BTRFS RAID1 volume running for the past year. I avoided all
>> pitfalls known to me that would mess up this volume. I never
>> experimented with quotas, no-COW, snapshots, defrag, nothing really.
>> The volume is a RAID1 from day 1 and is working reliably until now.
>>
>> Until yesterday it consisted of two 3 TB drives, something along the
>> lines:
>>
>> Label: 'BigVault'  uuid: a37ad5f5-a21b-41c7-970b-13b6c4db33db
>>         Total devices 2 FS bytes used 2.47TiB
>>         devid 1 size 2.73TiB used 2.47TiB path /dev/sdb
>>         devid 2 size 2.73TiB used 2.47TiB path /dev/sdc
>>
>> Yesterday I've added a new drive to the FS and did a full rebalance
>> (without filters) over night, which went through without any issues.
>>
>> Now I have:
>>
>> Label: 'BigVault'  uuid: a37ad5f5-a21b-41c7-970b-13b6c4db33db
>>         Total devices 3 FS bytes used 2.47TiB
>>         devid 1 size 2.73TiB used 1.24TiB path /dev/sdb
>>         devid 2 size 2.73TiB used 1.24TiB path /dev/sdc
>>         devid 3 size 7.28TiB used 2.48TiB path /dev/sda
>>
>> # btrfs fi df /mnt/BigVault/
>> Data, RAID1: total=2.47TiB, used=2.47TiB
>> System, RAID1: total=32.00MiB, used=384.00KiB
>> Metadata, RAID1: total=4.00GiB, used=2.74GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>> But still df -h is giving me:
>> Filesystem      Size  Used Avail Use% Mounted on
>> /dev/sdb        6.4T  2.5T  1.5T  63% /mnt/BigVault
>>
>> Although I've heard and read about the difficulty in reporting free
>> space due to the flexibility of BTRFS, snapshots and subvolumes,
>> etc., but I only have a single volume, no subvolumes, no snapshots,
>> no quotas and both data and metadata are RAID1.
>>
>> My expectation would've been that in case of BigVault Size == Used +
>> Avail.
>>
>> Actually based on http://carfax.org.uk/btrfs-usage/index.html I
>> would've expected 6 TB of usable space.
>> Here I get 6.4 which is odd,

Total size is an estimation, which in this case is computed as (sum of device sizes)/2, which is approximately 6.4TiB.

>> but that only 1.5 TB is available is even stranger.
>>
>> Could anyone explain what I did wrong or why my expectations are
>> wrong?
>>
>> Thank you in advance
>
> Btrfs reports estimated free space from the free space of the smallest
> member as it can only guarantee that.

It's not exactly true. For three devices with free space of 1TiB, 2TiB and 3TiB it would return 2TiB as available space. But it is not sophisticated enough to notice that it actually has 3TiB available. I wonder if this is only free space calculation or the actual allocation algorithm behaves similarly (effectively ignoring part of available space).
Re: Help me understand what is going on with my RAID1 FS
> Problem is that each raid1 block group contains two chunks on two
> separate devices, it can't utilize fully three devices no matter what.
> If that doesn't suit you then you need to add 4th disk. After
> that FS will be able to use all unallocated space on all disks in
> raid1 profile. But even then you'll be able to safely lose only one
> disk since BTRFS still will be storing only 2 copies of data.

I hope I didn't say that I want to utilize all three devices fully. It was clear to me that there will be 2 TB of wasted space. Also I'm not questioning the chunk allocator for RAID1 at all. It's clear and it always has been clear that for RAID1 the chunks need to be allocated on different physical devices. If I understood Kai's point of view, he even suggested that I might need to do balancing to make sure that the free space on the three devices is being used smartly. Hence the questions about balancing.

I mean in worst case it could happen like this:

Again I have disks of sizes 3, 3, 8:

Fig.1
Drive1(8)  Drive2(3)  Drive3(3)
-          X1         X1
-          X2         X2
-          X3         X3

Here the new drive is completely unused. Even if one X1 chunk would be on Drive1 it would be still a sub-optimal allocation.

This is the optimal allocation. Will btrfs allocate like this? Considering that Drive1 has the most free space.

Fig. 2
Drive1(8)  Drive2(3)  Drive3(3)
X1         X1         -
X2         -          X2
X3         X3         -
X4         -          X4

From my point of view Fig.2 shows the optimal allocation, by the time the disks Drive2 and Drive3 are full (3TB) Drive1 must have 6TB (because it is exclusively holding the mirrors for both Drive2 and 3). For sure now btrfs can say, since two of the drives are completely full he can't allocate any more chunks and the remaining 2 TB of space from Drive1 is wasted. This is clear it's even pointed out by the btrfs size calculator.

But again if the above statements are true, then df might as well tell the "truth" and report that I have 3.5 TB space free and not 1.5TB (as it is reported now). Again here I fully understand Kai's explanation.
Because coming back to my first e-mail, my "problem" was that df is reporting 1.5 TB free, whereas the whole FS holds 2.5 TB of data.

So the question still remains, is it just that df is intentionally not smart enough to give a more accurate estimation, or is the assumption that the allocator picks the drive with most free space mistaken? If I continue along the lines of what Kai said, and I need to do re-balance, because the allocation is not like shown above (Fig.2), then my question is still legitimate. Are there any filters that one might use to speed up or to selectively balance in my case? or will I need to do full balance?

On Sun, Sep 10, 2017 at 7:19 PM, Dmitrii Tcvetkov wrote:
>> @Kai and Dmitrii
>> thank you for your explanations if I understand you correctly, you're
>> saying that btrfs makes no attempt to "optimally" use the physical
>> devices it has in the FS, once a new RAID1 block group needs to be
>> allocated it will semi-randomly pick two devices with enough space
>> and allocate two equal sized chunks, one on each. This new chunk may
>> or may not fall onto my newly added 8 TB drive. Am I understanding
>> this correctly?
>
> If I remember correctly chunk allocator allocates new chunks on device
> which has the most unallocated space.
>
>> Is there some sort of balance filter that would speed up this sort of
>> balancing? Will balance be smart enough to make the "right" decision?
>> As far as I read the chunk allocator used during balance is the same
>> that is used during normal operation. If the allocator is already
>> sub-optimal during normal operations, what's the guarantee that it
>> will make a "better" decision during balancing?
>
> I don't really see any way that being possible in raid1 profile. How
> can you fill all three devices if you can split data only twice? There
> will be moment when two of three disks are full and BTRFS can't
> allocate new raid1 block group because it has only one drive with
> unallocated space.
>> When I say "right" and "better" I mean this:
>>
>> Drive1(8)  Drive2(3)  Drive3(3)
>> X1         X1
>> X2                    X2
>> X3         X3
>> X4                    X4
>>
>> I was convinced until now that the chunk allocator at least tries a
>> best possible allocation. I'm sure it's complicated to develop a
>> generic algorithm to fit all setups, but it should be possible.
>
> Problem is that each raid1 block group contains two chunks on two
> separate devices, it can't utilize fully three devices no matter what.
> If that doesn't suit you then you need to add 4th disk. After
> that FS will be able to use all unallocated space on all disks in
> raid1 profile. But even then you'll be able to safely lose only one
> disk since BTRFS still will be storing only 2
Re: Help me understand what is going on with my RAID1 FS
> @Kai and Dmitrii
> thank you for your explanations if I understand you correctly, you're
> saying that btrfs makes no attempt to "optimally" use the physical
> devices it has in the FS, once a new RAID1 block group needs to be
> allocated it will semi-randomly pick two devices with enough space and
> allocate two equal sized chunks, one on each. This new chunk may or
> may not fall onto my newly added 8 TB drive. Am I understanding this
> correctly?

If I remember correctly chunk allocator allocates new chunks on device which has the most unallocated space.

> Is there some sort of balance filter that would speed up this sort of
> balancing? Will balance be smart enough to make the "right" decision?
> As far as I read the chunk allocator used during balance is the same
> that is used during normal operation. If the allocator is already
> sub-optimal during normal operations, what's the guarantee that it
> will make a "better" decision during balancing?

I don't really see any way that being possible in raid1 profile. How can you fill all three devices if you can split data only twice? There will be moment when two of three disks are full and BTRFS can't allocate new raid1 block group because it has only one drive with unallocated space.

> When I say "right" and "better" I mean this:
>
> Drive1(8)  Drive2(3)  Drive3(3)
> X1         X1
> X2                    X2
> X3         X3
> X4                    X4
>
> I was convinced until now that the chunk allocator at least tries a
> best possible allocation. I'm sure it's complicated to develop a
> generic algorithm to fit all setups, but it should be possible.

Problem is that each raid1 block group contains two chunks on two separate devices, it can't utilize fully three devices no matter what. If that doesn't suit you then you need to add 4th disk. After that FS will be able to use all unallocated space on all disks in raid1 profile. But even then you'll be able to safely lose only one disk since BTRFS still will be storing only 2 copies of data.
This behavior is not relevant for single or raid0 profiles of multidevice BTRFS filesystems.
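The "most unallocated space" rule described in this subthread can be checked mechanically. Below is a toy model of mine (an illustration of the rule as described here, not actual kernel code): each raid1 block group takes one chunk on each of the two devices with the most unallocated space, until fewer than two devices have room.

```python
def simulate_raid1(sizes_gib, chunk_gib=1):
    """Greedy toy allocator: every raid1 block group places one chunk
    on each of the two devices with the most unallocated space."""
    free = list(sizes_gib)
    groups = 0
    while True:
        # indices of the two devices with the most unallocated space
        a, b = sorted(range(len(free)), key=free.__getitem__,
                      reverse=True)[:2]
        if free[b] < chunk_gib:  # fewer than two devices have room left
            break
        free[a] -= chunk_gib
        free[b] -= chunk_gib
        groups += 1
    return groups, free

groups, leftover = simulate_raid1([8000, 3000, 3000])
print(groups, leftover)  # 6000 GiB of data capacity, [2000, 0, 0] left
```

With 8000+3000+3000 GiB the model confirms the point above: allocation alternates between disk1+2 and disk1+3, the two small disks fill completely, and 2000 GiB strands on the big one; with three equal disks the same rule fills all of them.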
Re: Help me understand what is going on with my RAID1 FS
@Kai and Dmitrii
thank you for your explanations if I understand you correctly, you're saying that btrfs makes no attempt to "optimally" use the physical devices it has in the FS, once a new RAID1 block group needs to be allocated it will semi-randomly pick two devices with enough space and allocate two equal sized chunks, one on each. This new chunk may or may not fall onto my newly added 8 TB drive. Am I understanding this correctly?

> You will probably need to run balance once in a while to evenly
> redistribute allocated chunks across all disks.

Is there some sort of balance filter that would speed up this sort of balancing? Will balance be smart enough to make the "right" decision? As far as I read the chunk allocator used during balance is the same that is used during normal operation. If the allocator is already sub-optimal during normal operations, what's the guarantee that it will make a "better" decision during balancing?

When I say "right" and "better" I mean this:

Drive1(8)  Drive2(3)  Drive3(3)
X1         X1
X2                    X2
X3         X3
X4                    X4

I was convinced until now that the chunk allocator at least tries a best possible allocation. I'm sure it's complicated to develop a generic algorithm to fit all setups, but it should be possible.

On Sun, Sep 10, 2017 at 5:47 PM, Kai Krakow wrote:
> On Sun, 10 Sep 2017 15:45:42 +0200, FLJ wrote:
>
>> Hello all,
>>
>> I have a BTRFS RAID1 volume running for the past year. I avoided all
>> pitfalls known to me that would mess up this volume. I never
>> experimented with quotas, no-COW, snapshots, defrag, nothing really.
>> The volume is a RAID1 from day 1 and is working reliably until now.
>> Until yesterday it consisted of two 3 TB drives, something along the
>> lines:
>>
>> Label: 'BigVault'  uuid: a37ad5f5-a21b-41c7-970b-13b6c4db33db
>>         Total devices 2 FS bytes used 2.47TiB
>>         devid 1 size 2.73TiB used 2.47TiB path /dev/sdb
>>         devid 2 size 2.73TiB used 2.47TiB path /dev/sdc
>>
>> Yesterday I've added a new drive to the FS and did a full rebalance
>> (without filters) over night, which went through without any issues.
>>
>> Now I have:
>>
>> Label: 'BigVault'  uuid: a37ad5f5-a21b-41c7-970b-13b6c4db33db
>>         Total devices 3 FS bytes used 2.47TiB
>>         devid 1 size 2.73TiB used 1.24TiB path /dev/sdb
>>         devid 2 size 2.73TiB used 1.24TiB path /dev/sdc
>>         devid 3 size 7.28TiB used 2.48TiB path /dev/sda
>>
>> # btrfs fi df /mnt/BigVault/
>> Data, RAID1: total=2.47TiB, used=2.47TiB
>> System, RAID1: total=32.00MiB, used=384.00KiB
>> Metadata, RAID1: total=4.00GiB, used=2.74GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>> But still df -h is giving me:
>> Filesystem      Size  Used Avail Use% Mounted on
>> /dev/sdb        6.4T  2.5T  1.5T  63% /mnt/BigVault
>>
>> Although I've heard and read about the difficulty in reporting free
>> space due to the flexibility of BTRFS, snapshots and subvolumes,
>> etc., but I only have a single volume, no subvolumes, no snapshots,
>> no quotas and both data and metadata are RAID1.
>>
>> My expectation would've been that in case of BigVault Size == Used +
>> Avail.
>>
>> Actually based on http://carfax.org.uk/btrfs-usage/index.html I
>> would've expected 6 TB of usable space. Here I get 6.4 which is odd,
>> but that only 1.5 TB is available is even stranger.
>>
>> Could anyone explain what I did wrong or why my expectations are
>> wrong?
>>
>> Thank you in advance
>
> Btrfs reports estimated free space from the free space of the smallest
> member as it can only guarantee that. In your case this is 2.73 minus
> 1.24 free which is roughly around 1.5T.
> But since this free space
> distributes across three disks with one having much more free space, it
> probably will use up that space at half the rate of actual allocation.
> But due to how btrfs allocates from free space in chunks, that may not
> be possible - thus the low unexpected value. You will probably need to
> run balance once in a while to evenly redistribute allocated chunks
> across all disks.
>
> It may give you better estimates if you combine sdb and sdc into one
> logical device, e.g. using raid0 or jbod via md or lvm.
>
> --
> Regards,
> Kai
>
> Replies to list-only preferred.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
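FLJ's question about whether the allocator can reach the "right" layout can be sanity-checked with a toy model: allocate 1 GiB RAID1 block groups, always placing one chunk on each of the two devices with the most unallocated space (which is roughly what btrfs's raid1 chunk allocator does; this is a simplification, not the real algorithm). Device sizes in GiB below are illustrative stand-ins for the 8 TB + 3 TB + 3 TB setup:

```shell
#!/bin/sh
# Toy RAID1 chunk allocator: each 1 GiB block group takes one chunk on
# each of the two devices with the most free space. Sizes are made up
# to approximate the poster's 8T/3T/3T layout.
a=7452 b=2794 c=2794    # free GiB on sda, sdb, sdc
chunks=0
while :; do
    # pick the two devices with the most free space
    # (x = name of largest, y = name of second largest)
    if [ "$a" -ge "$b" ] && [ "$a" -ge "$c" ]; then
        x=a; if [ "$b" -ge "$c" ]; then y=b; else y=c; fi
    elif [ "$b" -ge "$c" ]; then
        x=b; if [ "$a" -ge "$c" ]; then y=a; else y=c; fi
    else
        x=c; if [ "$a" -ge "$b" ]; then y=a; else y=b; fi
    fi
    eval "vy=\$$y"
    [ "$vy" -le 0 ] && break    # no second device with free space: full
    eval "$x=\$((\$$x - 1)) $y=\$((vy - 1))"
    chunks=$((chunks + 1))
done
echo "RAID1 data stored: ${chunks} GiB (stranded on big drive: ${a} GiB)"
```

With these numbers the greedy rule does reach the optimum (5588 GiB, roughly the 6 TB the carfax calculator predicts), which suggests a full balance after adding the big drive should converge on the "right" layout even if day-to-day allocations drift from it.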
Re: Help me understand what is going on with my RAID1 FS
>Actually based on http://carfax.org.uk/btrfs-usage/index.html I
>would've expected 6 TB of usable space. Here I get 6.4 which is odd,
>but that only 1.5 TB is available is even stranger.
>
>Could anyone explain what I did wrong or why my expectations are wrong?
>
>Thank you in advance

I'd say df and the website calculate different things.

In btrfs, the raid1 profile stores exactly 2 copies of data, each copy on a separate device. So adding a third drive, no matter how big, didn't expand the effective free space, because btrfs still needs space on one of the other two drives to store the second copy of each raid1 chunk placed on that third drive. Basically:

Drive1  Drive2  Drive3
X       X
X               X
        X       X

Where each row of X is the chunk pair of one raid1 block group.
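Dmitrii's mirroring argument can be put in numbers. A minimal sketch (GiB figures are illustrative approximations of an 8 TB + 3 TB + 3 TB setup, not taken from any btrfs output): usable RAID1 capacity is half the total, capped by the fact that the largest device can never hold more data than all the others combined, since every chunk on it needs a mirror elsewhere.

```shell
#!/bin/sh
# RAID1 usable capacity: min(total/2, total - largest).
d1=2794; d2=2794; d3=7452      # GiB per device (illustrative)
total=$((d1 + d2 + d3))
largest=$d3
others=$((total - largest))
if [ "$largest" -gt "$others" ]; then
    usable=$others             # mirrors for the big drive run out first
else
    usable=$((total / 2))
fi
echo "usable RAID1 capacity: ${usable} GiB"
```

Here that gives 5588 GiB, i.e. about the 6 TB (decimal) the carfax calculator reports, which is why the website and df disagree: df is estimating something else entirely.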
Re: Help me understand what is going on with my RAID1 FS
Am Sun, 10 Sep 2017 15:45:42 +0200 schrieb FLJ: > Hello all, > > I have a BTRFS RAID1 volume running for the past year. I avoided all > pitfalls known to me that would mess up this volume. I never > experimented with quotas, no-COW, snapshots, defrag, nothing really. > The volume is a RAID1 from day 1 and is working reliably until now. > > Until yesterday it consisted of two 3 TB drives, something along the > lines: > > Label: 'BigVault' uuid: a37ad5f5-a21b-41c7-970b-13b6c4db33db > Total devices 2 FS bytes used 2.47TiB > devid1 size 2.73TiB used 2.47TiB path /dev/sdb > devid2 size 2.73TiB used 2.47TiB path /dev/sdc > > Yesterday I've added a new drive to the FS and did a full rebalance > (without filters) over night, which went through without any issues. > > Now I have: > Label: 'BigVault' uuid: a37ad5f5-a21b-41c7-970b-13b6c4db33db > Total devices 3 FS bytes used 2.47TiB > devid1 size 2.73TiB used 1.24TiB path /dev/sdb > devid2 size 2.73TiB used 1.24TiB path /dev/sdc > devid3 size 7.28TiB used 2.48TiB path /dev/sda > > # btrfs fi df /mnt/BigVault/ > Data, RAID1: total=2.47TiB, used=2.47TiB > System, RAID1: total=32.00MiB, used=384.00KiB > Metadata, RAID1: total=4.00GiB, used=2.74GiB > GlobalReserve, single: total=512.00MiB, used=0.00B > > But still df -h is giving me: > Filesystem Size Used Avail Use% Mounted on > /dev/sdb 6.4T 2.5T 1.5T 63% /mnt/BigVault > > Although I've heard and read about the difficulty in reporting free > space due to the flexibility of BTRFS, snapshots and subvolumes, etc., > but I only have a single volume, no subvolumes, no snapshots, no > quotas and both data and metadata are RAID1. > > My expectation would've been that in case of BigVault Size == Used + > Avail. > > Actually based on http://carfax.org.uk/btrfs-usage/index.html I > would've expected 6 TB of usable space. Here I get 6.4 which is odd, > but that only 1.5 TB is available is even stranger. > > Could anyone explain what I did wrong or why my expectations are > wrong? 
> > Thank you in advance

Btrfs reports estimated free space from the free space of the smallest member, as that is all it can guarantee. In your case this is 2.73 TiB minus 1.24 TiB used, which is roughly 1.5T. But since this free space is distributed across three disks, with one having much more free space, it will probably use up that space at half the rate of actual allocation. But due to how btrfs allocates free space in chunks, that may not be possible - thus the unexpectedly low value. You will probably need to run balance once in a while to evenly redistribute allocated chunks across all disks.

It may give you better estimates if you combine sdb and sdc into one logical device, e.g. using raid0 or jbod via md or lvm.

--
Regards,
Kai

Replies to list-only preferred.
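Kai's estimate above, in numbers: df's Avail is pinned to the free space on the smallest member, since that is all btrfs can promise to mirror. The TiB figures come straight from the poster's `btrfs fi show` output:

```shell
#!/bin/sh
# df's conservative Avail estimate: free space on the smallest member.
# 2.73 TiB size and 1.24 TiB used are from the poster's fi show output.
avail=$(awk 'BEGIN { printf "%.2f", 2.73 - 1.24 }')
echo "estimated Avail: ${avail} TiB"
```

That 1.49 TiB is exactly the "1.5T" the poster sees in df, so nothing is broken, the estimate is just pessimistic for this lopsided device mix.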
Help me understand what is going on with my RAID1 FS
Hello all, I have a BTRFS RAID1 volume running for the past year. I avoided all pitfalls known to me that would mess up this volume. I never experimented with quotas, no-COW, snapshots, defrag, nothing really. The volume is a RAID1 from day 1 and is working reliably until now. Until yesterday it consisted of two 3 TB drives, something along the lines: Label: 'BigVault' uuid: a37ad5f5-a21b-41c7-970b-13b6c4db33db Total devices 2 FS bytes used 2.47TiB devid1 size 2.73TiB used 2.47TiB path /dev/sdb devid2 size 2.73TiB used 2.47TiB path /dev/sdc Yesterday I've added a new drive to the FS and did a full rebalance (without filters) over night, which went through without any issues. Now I have: Label: 'BigVault' uuid: a37ad5f5-a21b-41c7-970b-13b6c4db33db Total devices 3 FS bytes used 2.47TiB devid1 size 2.73TiB used 1.24TiB path /dev/sdb devid2 size 2.73TiB used 1.24TiB path /dev/sdc devid3 size 7.28TiB used 2.48TiB path /dev/sda # btrfs fi df /mnt/BigVault/ Data, RAID1: total=2.47TiB, used=2.47TiB System, RAID1: total=32.00MiB, used=384.00KiB Metadata, RAID1: total=4.00GiB, used=2.74GiB GlobalReserve, single: total=512.00MiB, used=0.00B But still df -h is giving me: Filesystem Size Used Avail Use% Mounted on /dev/sdb 6.4T 2.5T 1.5T 63% /mnt/BigVault Although I've heard and read about the difficulty in reporting free space due to the flexibility of BTRFS, snapshots and subvolumes, etc., but I only have a single volume, no subvolumes, no snapshots, no quotas and both data and metadata are RAID1. My expectation would've been that in case of BigVault Size == Used + Avail. Actually based on http://carfax.org.uk/btrfs-usage/index.html I would've expected 6 TB of usable space. Here I get 6.4 which is odd, but that only 1.5 TB is available is even stranger. Could anyone explain what I did wrong or why my expectations are wrong? 
Thank you in advance
Re: Please help with exact actions for raid1 hot-swap
On 10 September 2017 at 08:33, Marat Khalili wrote:
> It doesn't need replaced disk to be readable, right?

Only enough to be mountable, which it already is, so your read errors on /dev/sdb aren't a problem.

> Then what prevents same procedure to work without a spare bay?

It is basically the same procedure, but with a bunch of gotchas due to bugs and odd behaviour. Only having one shot at it, before it can only be mounted read-only, is especially problematic (will be fixed in Linux 4.14).

> --
>
> With Best Regards,
> Marat Khalili
>
> On September 9, 2017 1:29:08 PM GMT+03:00, Patrik Lundquist
> wrote:
>>On 9 September 2017 at 12:05, Marat Khalili wrote:
>>> Forgot to add, I've got a spare empty bay if it can be useful here.
>>
>>That makes it much easier since you don't have to mount it degraded,
>>with the risks involved.
>>
>>Add and partition the disk.
>>
>># btrfs replace start /dev/sdb7 /dev/sdc(?)7 /mnt/data
>>
>>Remove the old disk when it is done.
>>
>>> --
>>>
>>> With Best Regards,
>>> Marat Khalili
>>>
>>> On September 9, 2017 10:46:10 AM GMT+03:00, Marat Khalili
>>> wrote:
>>>Dear list,
>>>
>>>I'm going to replace one hard drive (partition actually) of a btrfs
>>>raid1. Can you please spell exactly what I need to do in order to get
>>>my filesystem working as RAID1 again after replacement, exactly as it
>>>was before? I saw some bad examples of drive replacement in this list
>>>so I afraid to just follow random instructions on wiki, and putting
>>>this system out of action even temporarily would be very inconvenient.
For this filesystem: > $ sudo btrfs fi show /dev/sdb7 > Label: 'data' uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0 > Total devices 2 FS bytes used 106.23GiB > devid1 size 2.71TiB used 126.01GiB path /dev/sda7 > devid2 size 2.71TiB used 126.01GiB path /dev/sdb7 > $ grep /mnt/data /proc/mounts > /dev/sda7 /mnt/data btrfs > rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/ 0 0 > $ sudo btrfs fi df /mnt/data > Data, RAID1: total=123.00GiB, used=104.57GiB > System, RAID1: total=8.00MiB, used=48.00KiB > Metadata, RAID1: total=3.00GiB, used=1.67GiB > GlobalReserve, single: total=512.00MiB, used=0.00B > $ uname -a > Linux host 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC > 2017 x86_64 x86_64 x86_64 GNU/Linux I've got this in dmesg: > [Sep 8 20:31] ata6.00: exception Emask 0x0 SAct 0x7ecaa5ef SErr 0x0 > action 0x0 > [ +0.51] ata6.00: irq_stat 0x4008 > [ +0.29] ata6.00: failed command: READ FPDMA QUEUED > [ +0.38] ata6.00: cmd 60/70:18:50:6c:f3/00:00:79:00:00/40 tag >>3 > ncq 57344 in >res 41/40:00:68:6c:f3/00:00:79:00:00/40 >>Emask > 0x409 (media error) > [ +0.94] ata6.00: status: { DRDY ERR } > [ +0.26] ata6.00: error: { UNC } > [ +0.001195] ata6.00: configured for UDMA/133 > [ +0.30] sd 6:0:0:0: [sdb] tag#3 FAILED Result: >>hostbyte=DID_OK > driverbyte=DRIVER_SENSE > [ +0.05] sd 6:0:0:0: [sdb] tag#3 Sense Key : Medium Error > [current] [descriptor] > [ +0.04] sd 6:0:0:0: [sdb] tag#3 Add. Sense: Unrecovered read > error - auto reallocate failed > [ +0.05] sd 6:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 >>00 > 79 f3 6c 50 00 00 00 70 00 00 > [ +0.03] blk_update_request: I/O error, dev sdb, sector 2045996136 > [ +0.47] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, rd > 1, flush 0, corrupt 0, gen 0 > [ +0.62] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, rd > 2, flush 0, corrupt 0, gen 0 > [ +0.77] ata6: EH complete There's still 1 in Current_Pending_Sector line of smartctl output as >>of now, so it probably won't heal by itself. 
--

With Best Regards,
Marat Khalili
Re: Please help with exact actions for raid1 hot-swap
It doesn't need replaced disk to be readable, right? Then what prevents same procedure to work without a spare bay? -- With Best Regards, Marat Khalili On September 9, 2017 1:29:08 PM GMT+03:00, Patrik Lundquistwrote: >On 9 September 2017 at 12:05, Marat Khalili wrote: >> Forgot to add, I've got a spare empty bay if it can be useful here. > >That makes it much easier since you don't have to mount it degraded, >with the risks involved. > >Add and partition the disk. > ># btrfs replace start /dev/sdb7 /dev/sdc(?)7 /mnt/data > >Remove the old disk when it is done. > >> -- >> >> With Best Regards, >> Marat Khalili >> >> On September 9, 2017 10:46:10 AM GMT+03:00, Marat Khalili > wrote: >>>Dear list, >>> >>>I'm going to replace one hard drive (partition actually) of a btrfs >>>raid1. Can you please spell exactly what I need to do in order to get >>>my >>>filesystem working as RAID1 again after replacement, exactly as it >was >>>before? I saw some bad examples of drive replacement in this list so >I >>>afraid to just follow random instructions on wiki, and putting this >>>system out of action even temporarily would be very inconvenient. 
>>> >>>For this filesystem: >>> $ sudo btrfs fi show /dev/sdb7 Label: 'data' uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0 Total devices 2 FS bytes used 106.23GiB devid1 size 2.71TiB used 126.01GiB path /dev/sda7 devid2 size 2.71TiB used 126.01GiB path /dev/sdb7 $ grep /mnt/data /proc/mounts /dev/sda7 /mnt/data btrfs rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/ 0 0 $ sudo btrfs fi df /mnt/data Data, RAID1: total=123.00GiB, used=104.57GiB System, RAID1: total=8.00MiB, used=48.00KiB Metadata, RAID1: total=3.00GiB, used=1.67GiB GlobalReserve, single: total=512.00MiB, used=0.00B $ uname -a Linux host 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux >>> >>>I've got this in dmesg: >>> [Sep 8 20:31] ata6.00: exception Emask 0x0 SAct 0x7ecaa5ef SErr 0x0 action 0x0 [ +0.51] ata6.00: irq_stat 0x4008 [ +0.29] ata6.00: failed command: READ FPDMA QUEUED [ +0.38] ata6.00: cmd 60/70:18:50:6c:f3/00:00:79:00:00/40 tag >3 ncq 57344 in res 41/40:00:68:6c:f3/00:00:79:00:00/40 >Emask 0x409 (media error) [ +0.94] ata6.00: status: { DRDY ERR } [ +0.26] ata6.00: error: { UNC } [ +0.001195] ata6.00: configured for UDMA/133 [ +0.30] sd 6:0:0:0: [sdb] tag#3 FAILED Result: >hostbyte=DID_OK driverbyte=DRIVER_SENSE [ +0.05] sd 6:0:0:0: [sdb] tag#3 Sense Key : Medium Error [current] [descriptor] [ +0.04] sd 6:0:0:0: [sdb] tag#3 Add. Sense: Unrecovered read error - auto reallocate failed [ +0.05] sd 6:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 >00 >>> 79 f3 6c 50 00 00 00 70 00 00 [ +0.03] blk_update_request: I/O error, dev sdb, sector >>>2045996136 [ +0.47] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, >>>rd 1, flush 0, corrupt 0, gen 0 [ +0.62] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, >>>rd 2, flush 0, corrupt 0, gen 0 [ +0.77] ata6: EH complete >>> >>>There's still 1 in Current_Pending_Sector line of smartctl output as >of >>> >>>now, so it probably won't heal by itself. 
>>>--
>>>
>>>With Best Regards,
>>>Marat Khalili
Re: Please help with exact actions for raid1 hot-swap
Patrik Lundquist posted on Sat, 09 Sep 2017 12:29:08 +0200 as excerpted:

> On 9 September 2017 at 12:05, Marat Khalili wrote:
>> Forgot to add, I've got a spare empty bay if it can be useful here.
>
> That makes it much easier since you don't have to mount it degraded,
> with the risks involved.
>
> Add and partition the disk.
>
> # btrfs replace start /dev/sdb7 /dev/sdc(?)7 /mnt/data
>
> Remove the old disk when it is done.

I did this with my dozen-plus (but small) btrfs raid1s on ssd partitions several kernel cycles ago. It went very smoothly. =:^) (TL;DR can stop there.)

I had actually been taking advantage of btrfs raid1's checksumming and scrub ability to continue running a failing ssd, with more and more sectors going bad and being replaced from spares, for quite some time after I'd have otherwise replaced it. Everything of value was backed up, and I was simply doing it for the experience with both btrfs raid1 scrubbing and continuing ssd sector failure.

But eventually the scrubs were finding and fixing errors every boot, especially when off for several hours, and further experience was of diminishing value while the hassle factor was building fast, so I attached the spare ssd, partitioned it up, did a final scrub on all the btrfs, and then one btrfs at a time btrfs replaced the devices from the old ssd's partitions to the new one's partitions.

Given that I was already used to running scrubs at every boot, the entirely uneventful replacements were actually somewhat anticlimactic, but that was a good thing! =:^)

Then more recently I bought a larger/newer pair of ssds (1 TB each, the old ones were quarter TB each) and converted my media partitions and secondary backups, which had still been on reiserfs on spinning rust, to btrfs raid1 on ssd as well, making me all-btrfs on all-ssd now, with everything but /boot and its backups on the other ssds being btrfs raid1, and /boot and its backups being btrfs dup. =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
Re: Please help with exact actions for raid1 hot-swap
On 9 September 2017 at 12:05, Marat Khaliliwrote: > Forgot to add, I've got a spare empty bay if it can be useful here. That makes it much easier since you don't have to mount it degraded, with the risks involved. Add and partition the disk. # btrfs replace start /dev/sdb7 /dev/sdc(?)7 /mnt/data Remove the old disk when it is done. > -- > > With Best Regards, > Marat Khalili > > On September 9, 2017 10:46:10 AM GMT+03:00, Marat Khalili wrote: >>Dear list, >> >>I'm going to replace one hard drive (partition actually) of a btrfs >>raid1. Can you please spell exactly what I need to do in order to get >>my >>filesystem working as RAID1 again after replacement, exactly as it was >>before? I saw some bad examples of drive replacement in this list so I >>afraid to just follow random instructions on wiki, and putting this >>system out of action even temporarily would be very inconvenient. >> >>For this filesystem: >> >>> $ sudo btrfs fi show /dev/sdb7 >>> Label: 'data' uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0 >>> Total devices 2 FS bytes used 106.23GiB >>> devid1 size 2.71TiB used 126.01GiB path /dev/sda7 >>> devid2 size 2.71TiB used 126.01GiB path /dev/sdb7 >>> $ grep /mnt/data /proc/mounts >>> /dev/sda7 /mnt/data btrfs >>> rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/ 0 0 >>> $ sudo btrfs fi df /mnt/data >>> Data, RAID1: total=123.00GiB, used=104.57GiB >>> System, RAID1: total=8.00MiB, used=48.00KiB >>> Metadata, RAID1: total=3.00GiB, used=1.67GiB >>> GlobalReserve, single: total=512.00MiB, used=0.00B >>> $ uname -a >>> Linux host 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC >>> 2017 x86_64 x86_64 x86_64 GNU/Linux >> >>I've got this in dmesg: >> >>> [Sep 8 20:31] ata6.00: exception Emask 0x0 SAct 0x7ecaa5ef SErr 0x0 >>> action 0x0 >>> [ +0.51] ata6.00: irq_stat 0x4008 >>> [ +0.29] ata6.00: failed command: READ FPDMA QUEUED >>> [ +0.38] ata6.00: cmd 60/70:18:50:6c:f3/00:00:79:00:00/40 tag 3 >>> ncq 57344 in >>>res 41/40:00:68:6c:f3/00:00:79:00:00/40 
Emask >>> 0x409 (media error) >>> [ +0.94] ata6.00: status: { DRDY ERR } >>> [ +0.26] ata6.00: error: { UNC } >>> [ +0.001195] ata6.00: configured for UDMA/133 >>> [ +0.30] sd 6:0:0:0: [sdb] tag#3 FAILED Result: hostbyte=DID_OK >>> driverbyte=DRIVER_SENSE >>> [ +0.05] sd 6:0:0:0: [sdb] tag#3 Sense Key : Medium Error >>> [current] [descriptor] >>> [ +0.04] sd 6:0:0:0: [sdb] tag#3 Add. Sense: Unrecovered read >>> error - auto reallocate failed >>> [ +0.05] sd 6:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 >> >>> 79 f3 6c 50 00 00 00 70 00 00 >>> [ +0.03] blk_update_request: I/O error, dev sdb, sector >>2045996136 >>> [ +0.47] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, >>rd >>> 1, flush 0, corrupt 0, gen 0 >>> [ +0.62] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, >>rd >>> 2, flush 0, corrupt 0, gen 0 >>> [ +0.77] ata6: EH complete >> >>There's still 1 in Current_Pending_Sector line of smartctl output as of >> >>now, so it probably won't heal by itself. >> >>-- >> >>With Best Regards, >>Marat Khalili >>-- >>To unsubscribe from this list: send the line "unsubscribe linux-btrfs" >>in >>the body of a message to majord...@vger.kernel.org >>More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
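The spare-bay variant quoted in this thread boils down to a single replace operation with no degraded mount. A dry-run wrapper might look like the following; the device names and mount point are the ones from the thread, and `run`/`DRY_RUN` are a safety guard I've added, not part of any btrfs tooling:

```shell
#!/bin/sh
# Dry-run sketch of the spare-bay replacement: add the new disk next to
# the failing one, then let btrfs replace migrate the data. Keep
# DRY_RUN=1 until you've reviewed every command for your own layout.
DRY_RUN=1
run() {
    if [ "$DRY_RUN" -eq 1 ]; then echo "+ $*"; else "$@"; fi
}

# partition the new disk in the spare bay first, then:
run btrfs replace start /dev/sdb7 /dev/sdc7 /mnt/data
run btrfs replace status /mnt/data    # poll until it reports finished
# once finished, physically remove the old disk
```

With a spare bay the filesystem stays fully redundant throughout, which is exactly why Patrik calls this "much easier" than the degraded-mount path.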
Re: Please help with exact actions for raid1 hot-swap
Forgot to add, I've got a spare empty bay if it can be useful here. -- With Best Regards, Marat Khalili On September 9, 2017 10:46:10 AM GMT+03:00, Marat Khaliliwrote: >Dear list, > >I'm going to replace one hard drive (partition actually) of a btrfs >raid1. Can you please spell exactly what I need to do in order to get >my >filesystem working as RAID1 again after replacement, exactly as it was >before? I saw some bad examples of drive replacement in this list so I >afraid to just follow random instructions on wiki, and putting this >system out of action even temporarily would be very inconvenient. > >For this filesystem: > >> $ sudo btrfs fi show /dev/sdb7 >> Label: 'data' uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0 >> Total devices 2 FS bytes used 106.23GiB >> devid1 size 2.71TiB used 126.01GiB path /dev/sda7 >> devid2 size 2.71TiB used 126.01GiB path /dev/sdb7 >> $ grep /mnt/data /proc/mounts >> /dev/sda7 /mnt/data btrfs >> rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/ 0 0 >> $ sudo btrfs fi df /mnt/data >> Data, RAID1: total=123.00GiB, used=104.57GiB >> System, RAID1: total=8.00MiB, used=48.00KiB >> Metadata, RAID1: total=3.00GiB, used=1.67GiB >> GlobalReserve, single: total=512.00MiB, used=0.00B >> $ uname -a >> Linux host 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC >> 2017 x86_64 x86_64 x86_64 GNU/Linux > >I've got this in dmesg: > >> [Sep 8 20:31] ata6.00: exception Emask 0x0 SAct 0x7ecaa5ef SErr 0x0 >> action 0x0 >> [ +0.51] ata6.00: irq_stat 0x4008 >> [ +0.29] ata6.00: failed command: READ FPDMA QUEUED >> [ +0.38] ata6.00: cmd 60/70:18:50:6c:f3/00:00:79:00:00/40 tag 3 >> ncq 57344 in >>res 41/40:00:68:6c:f3/00:00:79:00:00/40 Emask >> 0x409 (media error) >> [ +0.94] ata6.00: status: { DRDY ERR } >> [ +0.26] ata6.00: error: { UNC } >> [ +0.001195] ata6.00: configured for UDMA/133 >> [ +0.30] sd 6:0:0:0: [sdb] tag#3 FAILED Result: hostbyte=DID_OK >> driverbyte=DRIVER_SENSE >> [ +0.05] sd 6:0:0:0: [sdb] tag#3 Sense Key : Medium Error >> 
[current] [descriptor] >> [ +0.04] sd 6:0:0:0: [sdb] tag#3 Add. Sense: Unrecovered read >> error - auto reallocate failed >> [ +0.05] sd 6:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 > >> 79 f3 6c 50 00 00 00 70 00 00 >> [ +0.03] blk_update_request: I/O error, dev sdb, sector >2045996136 >> [ +0.47] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, >rd >> 1, flush 0, corrupt 0, gen 0 >> [ +0.62] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, >rd >> 2, flush 0, corrupt 0, gen 0 >> [ +0.77] ata6: EH complete > >There's still 1 in Current_Pending_Sector line of smartctl output as of > >now, so it probably won't heal by itself. > >-- > >With Best Regards, >Marat Khalili >-- >To unsubscribe from this list: send the line "unsubscribe linux-btrfs" >in >the body of a message to majord...@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Please help with exact actions for raid1 hot-swap
On 9 September 2017 at 09:46, Marat Khalili wrote:
>
> Dear list,
>
> I'm going to replace one hard drive (partition actually) of a btrfs raid1.
> Can you please spell exactly what I need to do in order to get my filesystem
> working as RAID1 again after replacement, exactly as it was before? I saw
> some bad examples of drive replacement in this list so I afraid to just
> follow random instructions on wiki, and putting this system out of action
> even temporarily would be very inconvenient.

I recently replaced both disks in a two disk Btrfs raid1 to increase capacity and took some notes.

Using systemd? systemd will automatically unmount a degraded disk and ruin your one chance to replace the disk, as long as Btrfs has the bug where it notes single chunks and one disk missing and refuses to mount degraded again. Comment out your mount in fstab and run "systemctl daemon-reload". The mount file in /var/run/systemd/generator/ will be removed. (Is there a better way?)

Unmount the volume.

# hdparm -Y /dev/sdb
# echo 1 > /sys/block/sdb/device/delete

Replace the disk. Create partitions etc. You might have to restart smartd, if using it.

Make Btrfs forget the old device. It will otherwise think the old disk is still there. (Is there a better way?)

# rmmod btrfs; modprobe btrfs
# btrfs device scan
# mount -o degraded /dev/sda7 /mnt/data
# btrfs device usage /mnt/data
# btrfs replace start /dev/sdbX /mnt/data
# btrfs replace status /mnt/data

Convert single or dup chunks to raid1:

# btrfs balance start -fv -dconvert=raid1,soft -mconvert=raid1,soft -sconvert=raid1,soft /mnt/data

Unmount, restore fstab, reload systemd again, mount.
> > For this filesystem: > >> $ sudo btrfs fi show /dev/sdb7 >> Label: 'data' uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0 >> Total devices 2 FS bytes used 106.23GiB >> devid1 size 2.71TiB used 126.01GiB path /dev/sda7 >> devid2 size 2.71TiB used 126.01GiB path /dev/sdb7 >> $ grep /mnt/data /proc/mounts >> /dev/sda7 /mnt/data btrfs >> rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/ 0 0 >> $ sudo btrfs fi df /mnt/data >> Data, RAID1: total=123.00GiB, used=104.57GiB >> System, RAID1: total=8.00MiB, used=48.00KiB >> Metadata, RAID1: total=3.00GiB, used=1.67GiB >> GlobalReserve, single: total=512.00MiB, used=0.00B >> $ uname -a >> Linux host 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 >> x86_64 x86_64 x86_64 GNU/Linux > > > I've got this in dmesg: > >> [Sep 8 20:31] ata6.00: exception Emask 0x0 SAct 0x7ecaa5ef SErr 0x0 action >> 0x0 >> [ +0.51] ata6.00: irq_stat 0x4008 >> [ +0.29] ata6.00: failed command: READ FPDMA QUEUED >> [ +0.38] ata6.00: cmd 60/70:18:50:6c:f3/00:00:79:00:00/40 tag 3 ncq >> 57344 in >>res 41/40:00:68:6c:f3/00:00:79:00:00/40 Emask 0x409 >> (media error) >> [ +0.94] ata6.00: status: { DRDY ERR } >> [ +0.26] ata6.00: error: { UNC } >> [ +0.001195] ata6.00: configured for UDMA/133 >> [ +0.30] sd 6:0:0:0: [sdb] tag#3 FAILED Result: hostbyte=DID_OK >> driverbyte=DRIVER_SENSE >> [ +0.05] sd 6:0:0:0: [sdb] tag#3 Sense Key : Medium Error [current] >> [descriptor] >> [ +0.04] sd 6:0:0:0: [sdb] tag#3 Add. 
Sense: Unrecovered read error - >> auto reallocate failed >> [ +0.05] sd 6:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 79 f3 >> 6c 50 00 00 00 70 00 00 >> [ +0.03] blk_update_request: I/O error, dev sdb, sector 2045996136 >> [ +0.47] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, rd 1, >> flush 0, corrupt 0, gen 0 >> [ +0.62] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, rd 2, >> flush 0, corrupt 0, gen 0 >> [ +0.77] ata6: EH complete > > > There's still 1 in Current_Pending_Sector line of smartctl output as of now, > so it probably won't heal by itself. > > -- > > With Best Regards, > Marat Khalili > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
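Patrik's notes above can be collected into one dry-run script. Everything is echoed rather than executed; the device names, mount point, and the devid argument to `btrfs replace start` are assumptions filled in for illustration (the thread leaves the exact replace invocation as "/dev/sdbX"), so review every line before setting DRY_RUN=0 on a real system:

```shell
#!/bin/sh
# Dry-run sketch of the no-spare-bay replacement described above.
DRY_RUN=1
run() {
    if [ "$DRY_RUN" -eq 1 ]; then echo "+ $*"; else "$@"; fi
}

# 1. Stop systemd from auto-unmounting the degraded mount: comment the
#    fstab entry out by hand first, then:
run systemctl daemon-reload

# 2. Detach the failing disk.
run umount /mnt/data
run hdparm -Y /dev/sdb
run sh -c 'echo 1 > /sys/block/sdb/device/delete'

# 3. Swap disks and partition the new one, then make btrfs forget the
#    old device (module reload is the workaround from the thread).
run rmmod btrfs
run modprobe btrfs
run btrfs device scan

# 4. Mount degraded and replace. On kernels before 4.14 you get one
#    shot at this before the fs will only mount read-only.
run mount -o degraded /dev/sda7 /mnt/data
run btrfs device usage /mnt/data
# "2" below is a guessed devid of the now-missing disk; check it
# against the device usage output above.
run btrfs replace start 2 /dev/sdb7 /mnt/data
run btrfs replace status /mnt/data

# 5. Convert any single/dup chunks created while degraded back to
#    raid1 (-f is needed to force the system-chunk conversion).
run btrfs balance start -fv -dconvert=raid1,soft -mconvert=raid1,soft -sconvert=raid1,soft /mnt/data

# 6. Unmount, restore fstab, daemon-reload, mount normally.
```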
Please help with exact actions for raid1 hot-swap
Dear list, I'm going to replace one hard drive (partition actually) of a btrfs raid1. Can you please spell exactly what I need to do in order to get my filesystem working as RAID1 again after replacement, exactly as it was before? I saw some bad examples of drive replacement in this list so I afraid to just follow random instructions on wiki, and putting this system out of action even temporarily would be very inconvenient. For this filesystem: $ sudo btrfs fi show /dev/sdb7 Label: 'data' uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0 Total devices 2 FS bytes used 106.23GiB devid1 size 2.71TiB used 126.01GiB path /dev/sda7 devid2 size 2.71TiB used 126.01GiB path /dev/sdb7 $ grep /mnt/data /proc/mounts /dev/sda7 /mnt/data btrfs rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/ 0 0 $ sudo btrfs fi df /mnt/data Data, RAID1: total=123.00GiB, used=104.57GiB System, RAID1: total=8.00MiB, used=48.00KiB Metadata, RAID1: total=3.00GiB, used=1.67GiB GlobalReserve, single: total=512.00MiB, used=0.00B $ uname -a Linux host 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux I've got this in dmesg: [Sep 8 20:31] ata6.00: exception Emask 0x0 SAct 0x7ecaa5ef SErr 0x0 action 0x0 [ +0.51] ata6.00: irq_stat 0x4008 [ +0.29] ata6.00: failed command: READ FPDMA QUEUED [ +0.38] ata6.00: cmd 60/70:18:50:6c:f3/00:00:79:00:00/40 tag 3 ncq 57344 in res 41/40:00:68:6c:f3/00:00:79:00:00/40 Emask 0x409 (media error) [ +0.94] ata6.00: status: { DRDY ERR } [ +0.26] ata6.00: error: { UNC } [ +0.001195] ata6.00: configured for UDMA/133 [ +0.30] sd 6:0:0:0: [sdb] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [ +0.05] sd 6:0:0:0: [sdb] tag#3 Sense Key : Medium Error [current] [descriptor] [ +0.04] sd 6:0:0:0: [sdb] tag#3 Add. 
Sense: Unrecovered read error - auto reallocate failed [ +0.05] sd 6:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 79 f3 6c 50 00 00 00 70 00 00 [ +0.03] blk_update_request: I/O error, dev sdb, sector 2045996136 [ +0.47] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0 [ +0.62] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0 [ +0.77] ata6: EH complete There's still 1 in Current_Pending_Sector line of smartctl output as of now, so it probably won't heal by itself. -- With Best Regards, Marat Khalili -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Please help. Repair probably bitflip damage and suspected bug
> [Sun Jun 18 04:02:43 2017] BTRFS critical (device sdb2): corrupt node, bad key order: block=5123372711936, root=1, slot=82

From the archives, most likely it's bad RAM. I see this system also uses an XFS v4 file system; if it were made as XFS v5 with metadata csums, you'd probably eventually run into a similar problem that would be caught by metadata checksum errors. It'll fail faster with Btrfs because it's checksumming everything.

Chris Murphy
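Why "checksumming everything" catches a RAM bitflip quickly: flip a single bit in a copy of a block and the CRC no longer matches. Here POSIX `cksum` (CRC-32) stands in for btrfs's crc32c, and the "metadata block" is made-up text, purely for illustration:

```shell
#!/bin/sh
# Flip the low bit of the first byte ('b' 0x62 -> 'c' 0x63) and show
# that the checksum of the block changes, which is how the corruption
# gets detected instead of silently propagating.
tmp=$(mktemp -d)
printf 'btrfs metadata block' > "$tmp/good"
{ printf 'c'; tail -c +2 "$tmp/good"; } > "$tmp/flipped"
crc_good=$(cksum < "$tmp/good" | cut -d' ' -f1)
crc_bad=$(cksum < "$tmp/flipped" | cut -d' ' -f1)
if [ "$crc_good" != "$crc_bad" ]; then
    echo "bitflip detected: $crc_good != $crc_bad"
fi
rm -rf "$tmp"
```

The checksum is computed at write time and verified at read time, so a block corrupted in RAM between the two fails verification on the very next read, whereas an unchecksummed filesystem would hand the corrupted data back.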
Re: Help on using linux-btrfs mailing list please
Thanks Мяу! I will ensure I reply all :) On 19 June 2017 at 23:38, Adam Borowski <kilob...@angband.pl> wrote: > On Mon, Jun 19, 2017 at 12:48:54PM +0300, Ivan Sizov wrote: >> 2017-06-19 12:32 GMT+03:00 Jesse <btrfs_mail_l...@mymail.isbest.biz>: >> > So I guess that means when I initiate a post, I also need to send it >> > to myself as well as the mail list. >> You need to do it in the reply only, not in the initial post. >> >> > Does it make any difference where I put respective addresses, eg: TO: CC: >> > BCC: >> You need to put a person to whom you reply in "TO" field and mailing >> list in "CC" field. > > Any mail client I know (but I haven't looked at many...) can do all of this > by "Reply All" (a button by that name in Thunderbird, 'g' in mutt, ...). > > It's worth noting that vger lists have rules different to those in most of > Free Software communities: on vger, you're supposed to send copies to > everyone -- pretty much everywhere else you are expected to send to the list > only. This is done by "Reply List" (in Thunderbird, 'L' in mutt, ...). > Such lists do add a set of "List-*:" headers that help the client. > > > Мяу! > -- > ⢀⣴⠾⠻⢶⣦⠀ > ⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can. > ⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener. > ⠈⠳⣄ A master species delegates.
Re: Help on using linux-btrfs mailing list please
On Mon, Jun 19, 2017 at 12:48:54PM +0300, Ivan Sizov wrote: > 2017-06-19 12:32 GMT+03:00 Jesse <btrfs_mail_l...@mymail.isbest.biz>: > > So I guess that means when I initiate a post, I also need to send it > > to myself as well as the mail list. > You need to do it in the reply only, not in the initial post. > > > Does it make any difference where I put respective addresses, eg: TO: CC: > > BCC: > You need to put a person to whom you reply in "TO" field and mailing > list in "CC" field. Any mail client I know (but I haven't looked at many...) can do all of this by "Reply All" (a button by that name in Thunderbird, 'g' in mutt, ...). It's worth noting that vger lists have rules different to those in most of Free Software communities: on vger, you're supposed to send copies to everyone -- pretty much everywhere else you are expected to send to the list only. This is done by "Reply List" (in Thunderbird, 'L' in mutt, ...). Such lists do add a set of "List-*:" headers that help the client. Мяу! -- ⢀⣴⠾⠻⢶⣦⠀ ⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can. ⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener. ⠈⠳⣄ A master species delegates.
Re: Please help. Repair probably bitflip damage and suspected bug
I just noticed a series of seemingly btrfs-related call traces that, for the first time, did not lock up the system. I have uploaded dmesg to https://paste.ee/p/An8Qy Anyone able to help advise on these? Thanks Jesse On 19 June 2017 at 17:19, Jesse <btrfs_mail_l...@mymail.isbest.biz> wrote: > Further to the above message reporting problems, I have been able to > capture a call trace under the main system rather than live media. > > Note this occurred in rsync from btrfs to a separate drive running xfs > on a local filesystem (both sata drives). So I presume that btrfs is > only reading the drive at the time of crash, unless rsync is also > doing some sort of disk caching of the files to btrfs, as it is the OS > filesystem. > > The destination drive directories being copied to in this case were > empty, so I was making a copy of the data off of the btrfs drive (due > to the btrfs tree errors and problems reported in the post I am here > replying to). > > I suspect that there is a direct correlation between using rsync > while (or subsequent to) touching areas of the btrfs tree that have > corruption and a complete system lockup/crash. > > I have also noted that when these crashes occur while running rsync, > the prior x files (eg: 10 files) show in the rsync log as being > synced, however, they appear on the destination drive with a file size of zero.
> > The trace (/var/log/messages | grep btrfs) I have uploaded to > https://paste.ee/p/nRcj0 > > The important part of which is: > > Jun 18 23:43:24 Orion vmunix: [38084.183174] BTRFS info (device sda2): > no csum found for inode 12497 start 0 > Jun 18 23:43:24 Orion vmunix: [38084.183195] BTRFS info (device sda2): > no csum found for inode 12497 start 0 > Jun 18 23:43:24 Orion vmunix: [38084.183209] BTRFS info (device sda2): > no csum found for inode 12497 start 0 > Jun 18 23:43:24 Orion vmunix: [38084.183222] BTRFS info (device sda2): > no csum found for inode 12497 start 0 > Jun 18 23:43:24 Orion vmunix: [38084.217552] BTRFS info (device sda2): > csum failed ino 12497 extent 1700305813504 csum 1405070872 wanted 0 > mirror 0 > Jun 18 23:43:24 Orion vmunix: [38084.217626] BTRFS info (device sda2): > no csum found for inode 12497 start 0 > Jun 18 23:43:24 Orion vmunix: [38084.217643] BTRFS info (device sda2): > no csum found for inode 12497 start 0 > Jun 18 23:43:24 Orion vmunix: [38084.217657] BTRFS info (device sda2): > no csum found for inode 12497 start 0 > Jun 18 23:43:24 Orion vmunix: [38084.217669] BTRFS info (device sda2): > no csum found for inode 12497 start 0 > Jun 18 23:43:24 Orion vmunix: auth_rpcgss nfs_acl nfs lockd grace > sunrpc fscache zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE) > spl(OE) zavl(POE) btrfs xor raid6_pq dm_mirror dm_region_hash dm_log > hid_generic usbhid hid uas usb_storage radeon i2c_algo_bit ttm > drm_kms_helper drm r8169 ahci mii libahci wmi > Jun 18 23:43:24 Orion vmunix: [38084.220604] Workqueue: btrfs-endio > btrfs_endio_helper [btrfs] > Jun 18 23:43:24 Orion vmunix: [38084.220812] RIP: > 0010:[] [] > __btrfs_map_block+0x32a/0x1180 [btrfs] > Jun 18 23:43:24 Orion vmunix: [38084.222459] [] ? 
> __btrfs_lookup_bio_sums.isra.8+0x3e0/0x540 [btrfs] > Jun 18 23:43:24 Orion vmunix: [38084.222632] [] > btrfs_map_bio+0x7d/0x2b0 [btrfs] > Jun 18 23:43:24 Orion vmunix: [38084.222781] [] > btrfs_submit_compressed_read+0x484/0x4e0 [btrfs] > Jun 18 23:43:24 Orion vmunix: [38084.222948] [] > btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs] > Jun 18 23:43:24 Orion vmunix: [38084.223198] [] ? > btrfs_create_repair_bio+0xf0/0x110 [btrfs] > Jun 18 23:43:24 Orion vmunix: [38084.223360] [] > bio_readpage_error+0x117/0x180 [btrfs] > Jun 18 23:43:24 Orion vmunix: [38084.223514] [] ? > clean_io_failure+0x1b0/0x1b0 [btrfs] > Jun 18 23:43:24 Orion vmunix: [38084.223667] [] > end_bio_extent_readpage+0x3be/0x3f0 [btrfs] > Jun 18 23:43:24 Orion vmunix: [38084.223996] [] > end_workqueue_fn+0x48/0x60 [btrfs] > Jun 18 23:43:24 Orion vmunix: [38084.224145] [] > normal_work_helper+0x82/0x210 [btrfs] > Jun 18 23:43:24 Orion vmunix: [38084.224297] [] > btrfs_endio_helper+0x12/0x20 [btrfs] > Jun 18 23:43:24 Orion vmunix: auth_rpcgss nfs_acl nfs lockd grace > sunrpc fscache zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE) > spl(OE) zavl(POE) btrfs xor raid6_pq dm_mirror dm_region_hash dm_log > hid_generic usbhid hid uas usb_storage radeon i2c_algo_bit ttm > drm_kms_helper drm r8169 ahci mii libahci wmi > Jun 18 23:43:24 Orion vmunix: [38084.330053] [] ? > __btrfs_map_block+0x32a/0x1180 [btrfs] > Jun 18 23:43:24 Orion vmunix: [38084.330106] [] ? > __btrfs_map_block+0x2cc/0x1180 [btrfs] > Jun 18 23:43:24 Orion vmunix: [38084.330154] [] ? > __btrfs_loo
Re: Help on using linux-btrfs mailing list please
2017-06-19 13:15 GMT+03:00 Jesse: > Thanks again. So am I to understand that you go into your 'sent' > folder, find a mail to the mail list (that is not CC to yourself), > then you reply to this and add the mail list when you need to update > your own post that no-one has yet replied to? Yes, exactly. -- Ivan Sizov
Re: Help on using linux-btrfs mailing list please
2017-06-19 13:03 GMT+03:00 Jesse: > Thanks Ivan. > What about when initiating a post, do I do the same eg: > TO: myself > CC: mailing list > > or do I > TO: mailing list > CC: myself If your mail client doesn't have a "sent" folder, you can, of course, follow one of these examples. But I have never run into such a situation. -- Ivan Sizov
Re: Help on using linux-btrfs mailing list please
2017-06-19 13:03 GMT+03:00 Jesse: > Thanks Ivan. > What about when initiating a post, do I do the same eg: > TO: myself > CC: mailing list > > or do I > TO: mailing list > CC: myself When initiating a post you should specify "TO: mailing list" only, without any other addresses. At least that is how I have always initiated posts. -- Ivan Sizov
Re: Help on using linux-btrfs mailing list please
Thanks Ivan. What about when initiating a post, do I do the same, eg: TO: myself CC: mailing list or do I TO: mailing list CC: myself TIA On 19 June 2017 at 17:48, Ivan Sizov wrote: > 2017-06-19 12:32 GMT+03:00 Jesse: >> So I guess that means when I initiate a post, I also need to send it >> to myself as well as the mail list. > You need to do it in the reply only, not in the initial post. > >> Does it make any difference where I put respective addresses, eg: TO: CC: >> BCC: > You need to put a person to whom you reply in "TO" field and mailing > list in "CC" field. > > -- > Ivan Sizov
Re: Help on using linux-btrfs mailing list please
2017-06-19 12:32 GMT+03:00 Jesse: > So I guess that means when I initiate a post, I also need to send it > to myself as well as the mail list. You need to do it in the reply only, not in the initial post. > Does it make any difference where I put respective addresses, eg: TO: CC: BCC: You need to put the person you are replying to in the "TO" field and the mailing list in the "CC" field. -- Ivan Sizov
Re: Help on using linux-btrfs mailing list please
Ok thanks Ivan. So I guess that means when I initiate a post, I also need to send it to myself as well as the mail list. Does it make any difference where I put respective addresses, eg: TO: CC: BCC: Regards Jesse On 19 June 2017 at 17:20, Ivan Sizov wrote: > You should reply both to linux-btrfs@vger.kernel.org and the person > whom you talk to. > > 2017-06-19 11:37 GMT+03:00 Jesse: >> I have subscribed successfully and am able to post successfully and >> eventually view the post on spinics.net when it becomes available: >> eg: http://www.spinics.net/lists/linux-btrfs/msg66605.html >> >> However I do not know how to reply to messages, especially my own to >> add more information, such as a call trace. >> 1. I do not receive an email of my post to which I could reply >> 2. The emails that I do receive from the list are from the respective >> sender, and not from vger.kernel.org, as such I do not even know how to >> reply to someone in a way that it ends up on the mailing list and not >> directly to that person. >> >> Could someone please be so kind as to direct me to a good guide for >> using this mailing list? >> >> Thanks >> >> Jesse > -- > Ivan Sizov
Re: Help on using linux-btrfs mailing list please
You should reply both to linux-btrfs@vger.kernel.org and the person whom you talk to. 2017-06-19 11:37 GMT+03:00 Jesse: > I have subscribed successfully and am able to post successfully and > eventually view the post on spinics.net when it becomes available: > eg: http://www.spinics.net/lists/linux-btrfs/msg66605.html > > However I do not know how to reply to messages, especially my own to > add more information, such as a call trace. > 1. I do not receive an email of my post for which I could reply > 2. The emails that I do receive from the list are from the respective > sender, and not the vger.kernel.org, as such I do not even know how to > reply to someone in a way that it ends up on the mailing list and not > directly to that person. > > Could someone please be so kind as to direct me to a good guide for > using this mailing list? > > Thanks > > Jesse -- Ivan Sizov
Re: Please help. Repair probably bitflip damage and suspected bug
18 23:43:24 Orion vmunix: [38084.330618] [] normal_work_helper+0x82/0x210 [btrfs] Jun 18 23:43:24 Orion vmunix: [38084.330668] [] btrfs_endio_helper+0x12/0x20 [btrfs] Jun 18 23:43:24 Orion vmunix: auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE) spl(OE) zavl(POE) btrfs xor raid6_pq dm_mirror dm_region_hash dm_log hid_generic usbhid hid uas usb_storage radeon i2c_algo_bit ttm drm_kms_helper drm r8169 ahci mii libahci wmi Jun 18 23:43:24 Orion vmunix: [38084.331102] [] ? __btrfs_map_block+0x32a/0x1180 [btrfs] Jun 18 23:43:24 Orion vmunix: [38084.331152] [] ? __btrfs_map_block+0x2cc/0x1180 [btrfs] Jun 18 23:43:24 Orion vmunix: [38084.331202] [] ? __btrfs_lookup_bio_sums.isra.8+0x3e0/0x540 [btrfs] Jun 18 23:43:24 Orion vmunix: [38084.331255] [] btrfs_map_bio+0x7d/0x2b0 [btrfs] Jun 18 23:43:24 Orion vmunix: [38084.331310] [] btrfs_submit_compressed_read+0x484/0x4e0 [btrfs] Jun 18 23:43:24 Orion vmunix: [38084.331360] [] btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs] Jun 18 23:43:24 Orion vmunix: [38084.331423] [] ? btrfs_create_repair_bio+0xf0/0x110 [btrfs] Jun 18 23:43:24 Orion vmunix: [38084.331477] [] bio_readpage_error+0x117/0x180 [btrfs] Jun 18 23:43:24 Orion vmunix: [38084.331530] [] ? 
clean_io_failure+0x1b0/0x1b0 [btrfs] Jun 18 23:43:24 Orion vmunix: [38084.331585] [] end_bio_extent_readpage+0x3be/0x3f0 [btrfs] Jun 18 23:43:24 Orion vmunix: [38084.331649] [] end_workqueue_fn+0x48/0x60 [btrfs] Jun 18 23:43:24 Orion vmunix: [38084.331703] [] normal_work_helper+0x82/0x210 [btrfs] Jun 18 23:43:24 Orion vmunix: [38084.331757] [] btrfs_endio_helper+0x12/0x20 [btrfs] Jun 19 07:29:22 Orion vmunix: [3.107113] Btrfs loaded Jun 19 07:29:22 Orion vmunix: [3.665536] BTRFS: device label btrfs1 devid 2 transid 1086759 /dev/sdb2 Jun 19 07:29:22 Orion vmunix: [3.665811] BTRFS: device label btrfs1 devid 1 transid 1086759 /dev/sda2 Jun 19 07:29:22 Orion vmunix: [8.673689] BTRFS info (device sda2): disk space caching is enabled Jun 19 07:29:22 Orion vmunix: [ 28.190962] BTRFS info (device sda2): enabling auto defrag Jun 19 07:29:22 Orion vmunix: [ 28.191039] BTRFS info (device sda2): disk space caching is enabled I notice that this page https://btrfs.wiki.kernel.org/index.php/Gotchas mentions "Files with a lot of random writes can become heavily fragmented (1+ extents) causing thrashing on HDDs and excessive multi-second spikes" and as such I am wondering if this is related to the crashing. AFAIK rsync should be creating the temp file on the destination drive (xfs), unless there is some part of rsync that I am not understanding that would be writing to the file system drive (btrfs), which in this case is also the source HDD (btrfs). Can someone please help with these btrfs problems. Thank you > My Linux Mint system is starting up and usable, however, I am unable > to complete any scrub as they abort before finished. There are various > inode errors in dmesg. Badblocks (readonly) finds no errors. checking > extents gives bad block 5123372711936 on both /dev/sda2 and /dev/sdb2. > A btrfs check (readonly) results in a 306MB text file of output of root > xxx inode errors.
> There are two drives 3TB each in RAID 1 for sda2/sdb2 for which > partition 2 is nearly the entire drive. > > Currently I am now using a Manjaro Live Boot with btrfs tools > btrfs-progs v4.10.1 in an attempt to recover/repair what seems to be > bitflip damage > (The original Linux Mint System has btrfs-progs v4.5.3) > > When doing a scrub on '/', the status of /dev/sdb2 always aborts at ~ > 383GiB with 0 errors. Whereas the /dev/sda2 and thus the '/' scrub > aborts at more diverse values starting at 537.90GiB with 0 errors. > > btrfs inspect-internal dump-tree -b 5123372711936 has one item > evidently out of order: > 2551224532992 -> 2551253647360 -> 2551251468288 > > I am currently attempting to copy files off the system while in > Manjaro using rsync prior to attempting whatever the knowledgeable > people here recommend. It has resulted in two files not being able to > be read so far, however, a lot of messages in dmesg for btrfs errors > https://ptpb.pw/L9Z9 > > Pastebins from original machine: > System specs as on original Linux Mint system: https://ptpb.pw/dFz3 > dmesg btrfs grep from prior to errors starting until scrub attempts: > https://ptpb.pw/rTzs > > Pastebins from subsequent live boot with newer btrfs tools 4.10: > LiveBoot Repair (Manjaro Arch) specs: https://ptpb.pw/ikMM > Scrub failing/aborting at same place on /dev/sdb: https://ptpb.pw/-vcP > badblock_extent_btrfscheck_5123372711936: https://ptpb.pw/T1rD > 'btrfs inspect-internal dump-tree -b 5123372711936 /dev/sda2': > https://ptpb.pw/zcyI > 'btrfs inspect-internal dump-tree -b 5123372711936 /dev/sdb2': > https://ptpb.pw/zcyI > dmesg on Manjaro attempting to rsync recove
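On the rsync question in the thread: by default rsync does build its temporary file on the *receiving* side (a dot-prefixed file in the destination directory) and renames it into place, so the btrfs source is only read. A sketch with illustrative paths; the commands are echoed as a dry run, so remove the "echo" to execute them:

```shell
# rsync writes ".<name>.XXXXXX" temp files in the destination
# directory by default, then renames them into place; the source tree
# is only read. SRC/DST are illustrative assumptions.
SRC=/mnt/btrfs-root/data/
DST=/mnt/xfs-backup/data/

echo rsync -aAXv "$SRC" "$DST"

# To direct temp files to a specific receiver-side directory:
echo rsync -aAXv --temp-dir=/mnt/xfs-backup/tmp "$SRC" "$DST"

# After an interrupted run, --checksum forces both sides to be re-read
# and re-verified rather than trusting size and mtime alone:
echo rsync -aAXv --checksum "$SRC" "$DST"
```

A --checksum re-run would also flag destination files whose content no longer matches the source, such as files left behind by a crashed transfer.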
Help on using linux-btrfs mailing list please
I have subscribed successfully and am able to post successfully and eventually view the post on spinics.net when it becomes available, eg: http://www.spinics.net/lists/linux-btrfs/msg66605.html However, I do not know how to reply to messages, especially my own, to add more information such as a call trace. 1. I do not receive an email of my own post to which I could reply. 2. The emails that I do receive from the list are from the respective sender, not from vger.kernel.org; as such I do not even know how to reply to someone in a way that ends up on the mailing list and not directly to that person. Could someone please be so kind as to direct me to a good guide for using this mailing list? Thanks Jesse
Please help repair probably bitflip damage
My Linux Mint system is starting up and usable; however, I am unable to complete any scrub, as they abort before finishing. There are various inode errors in dmesg. Badblocks (read-only) finds no errors. Checking extents gives bad block 5123372711936 on both /dev/sda2 and /dev/sdb2. A btrfs check (read-only) results in a 306MB text file of output of root xxx inode errors. There are two drives, 3TB each, in RAID 1 for sda2/sdb2, for which partition 2 is nearly the entire drive. Currently I am using a Manjaro Live Boot with btrfs-progs v4.10.1 in an attempt to recover/repair what seems to be bitflip damage (the original Linux Mint system has btrfs-progs v4.5.3). When doing a scrub on '/', the status of /dev/sdb2 always aborts at ~383GiB with 0 errors, whereas /dev/sda2, and thus the '/' scrub, aborts at more diverse values starting at 537.90GiB with 0 errors. btrfs inspect-internal dump-tree -b 5123372711936 has one item evidently out of order: 2551224532992 -> 2551253647360 -> 2551251468288 I am currently attempting to copy files off the system while in Manjaro using rsync, prior to attempting whatever the knowledgeable people here recommend.
It has resulted in two files not being readable so far, plus a lot of btrfs error messages in dmesg: https://ptpb.pw/L9Z9 Pastebins from the original machine: System specs as on the original Linux Mint system: https://ptpb.pw/dFz3 dmesg btrfs grep from before the errors started until the scrub attempts: https://ptpb.pw/rTzs Pastebins from the subsequent live boot with newer btrfs-progs 4.10: LiveBoot Repair (Manjaro Arch) specs: https://ptpb.pw/ikMM Scrub failing/aborting at the same place on /dev/sdb: https://ptpb.pw/-vcP badblock_extent_btrfscheck_5123372711936: https://ptpb.pw/T1rD 'btrfs inspect-internal dump-tree -b 5123372711936 /dev/sda2': https://ptpb.pw/zcyI 'btrfs inspect-internal dump-tree -b 5123372711936 /dev/sdb2': https://ptpb.pw/zcyI dmesg on Manjaro attempting to rsync recover files: https://ptpb.pw/L9Z9 Could someone please advise the steps to repair this. Thank you Jesse
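Before any repair attempt, the common sequence is to get the data off read-only first. A sketch, with the device taken from the report and the mount and output paths as illustrative assumptions; the commands are echoed as a dry run, so remove the "echo" to execute them:

```shell
# Read-only salvage sketch. /mnt/ro and /mnt/rescue are illustrative
# targets; remove "echo" to execute.
DEV=/dev/sda2
OUT=/mnt/rescue

# Mount read-only; usebackuproot (kernel >= 4.6) falls back to an
# older tree root if the current one is unreadable:
echo mount -o ro,usebackuproot "$DEV" /mnt/ro

# If no mount succeeds, btrfs restore copies files out offline:
echo btrfs restore -v "$DEV" "$OUT"

# Last resort, only once the data is safe -- this can make a damaged
# filesystem worse:
echo btrfs check --repair "$DEV"
```

With suspected bad RAM, it is also worth replacing or testing the RAM before running any writing repair, since a repair performed with flipping bits can introduce fresh corruption.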