Re: some free space cache corruptions

2016-12-25 Thread Janos Toth F.
I am not sure I can remember a time when btrfs check did not print this "cache and super generation don't match, space cache will be invalidated" message, so I started ignoring it a long time ago because I never seemed to have problems with missing free space and never got any similar warnings/error

[PATCH v2 12/19] btrfs-progs: scrub: Introduce function to scrub one extent

2016-12-25 Thread Qu Wenruo
Introduce a new function, scrub_one_extent(), as a wrapper to check one extent. It accepts a btrfs_path parameter, @path, which must point to a META/EXTENT_ITEM, and @start and @len, which must describe a subset of that META/EXTENT_ITEM. The @report parameter determines whether we output errors. Since the functi

[PATCH v2 13/19] btrfs-progs: scrub: Introduce function to scrub one data stripe

2016-12-25 Thread Qu Wenruo
Introduce a new function, scrub_one_data_stripe(), to check all data and tree blocks inside the data stripe. This function does not try to recover from any error; it only checks whether any data/tree block has a mismatched csum. If data is missing its csum, which is completely valid in cases like nodatasum, it will j
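A minimal sketch of the check-only loop described above, assuming crc32c checksums (the btrfs default) and a hypothetical per-sector `struct sector_csum` in which `has_csum == 0` stands in for a nodatasum sector. `scrub_data_stripe_sketch()` is an illustrative name, not the function from the patch, and tree blocks inside the stripe are ignored here.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
 * Slow but dependency-free. */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Hypothetical per-sector record: has_csum == 0 models a nodatasum
 * sector, which is valid but cannot be verified. */
struct sector_csum {
	int has_csum;
	uint32_t csum;
};

/* Returns how many sectors do not match their csum. Sectors without a
 * csum are counted in *unchecked and skipped. Nothing is repaired. */
static int scrub_data_stripe_sketch(const uint8_t *stripe, size_t nr_sectors,
				    size_t sectorsize,
				    const struct sector_csum *csums,
				    size_t *unchecked)
{
	int bad = 0;

	*unchecked = 0;
	for (size_t i = 0; i < nr_sectors; i++) {
		if (!csums[i].has_csum) {
			(*unchecked)++;
			continue;
		}
		if (crc32c(stripe + i * sectorsize, sectorsize) != csums[i].csum)
			bad++;
	}
	return bad;
}
```

Skipping, rather than failing, on a missing csum follows the description: a nodatasum sector is legitimate, it just cannot be verified.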

[PATCH v2 05/19] btrfs-progs: Introduce wrapper to recover raid56 data

2016-12-25 Thread Qu Wenruo
Introduce a wrapper to recover raid56 data. The logic is the same as the kernel's, but with a different interface, since the kernel cares about performance while in btrfs-progs we don't care that much. The interface is also more caller-friendly inside btrfs-progs. Signed-off-by: Qu Wenruo --- kernel-
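One way such a wrapper could decide which recovery routine to call is sketched below, assuming stripes are indexed data first, then P, then Q (RAID6 only). The enum names and `pick_recovery()` are hypothetical; the patch's actual interface is not shown in the excerpt.

```c
/* Hypothetical recovery strategies for one full stripe. */
enum recov_method {
	RECOV_NONE,          /* nothing missing */
	RECOV_REGEN_PARITY,  /* only P and/or Q lost: regenerate them */
	RECOV_XOR,           /* one data stripe lost: XOR P with survivors */
	RECOV_DATA_P,        /* RAID6: one data stripe and P lost, use Q */
	RECOV_DATA2,         /* RAID6: two data stripes lost, use P and Q */
	RECOV_IMPOSSIBLE,    /* more failures than parities */
};

/* Indices 0..nr_data-1 are data, nr_data is P, nr_data+1 is Q.
 * failed[] lists the indices of the missing stripes. */
static enum recov_method pick_recovery(int nr_data, int nr_parity,
				       const int *failed, int nr_failed)
{
	int data_fail = 0, p_fail = 0;

	if (nr_failed == 0)
		return RECOV_NONE;
	if (nr_failed > nr_parity)
		return RECOV_IMPOSSIBLE;
	for (int i = 0; i < nr_failed; i++) {
		if (failed[i] < nr_data)
			data_fail++;
		else if (failed[i] == nr_data)
			p_fail++;
	}
	if (data_fail == 0)
		return RECOV_REGEN_PARITY;
	if (data_fail == 2)
		return RECOV_DATA2;
	/* Exactly one data stripe is missing. If P survived, XOR is enough
	 * (a lost Q is simply regenerated afterwards). */
	return p_fail ? RECOV_DATA_P : RECOV_XOR;
}
```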

[PATCH v2 09/19] btrfs-progs: scrub: Introduce structures to support fsck scrub for RAID56

2016-12-25 Thread Qu Wenruo
Introduce new local structures, scrub_full_stripe and scrub_stripe, for the upcoming offline RAID56 scrub support. For pure stripe/mirror based profiles, like raid0/1/10/dup/single, we will follow the original bytenr and mirror number based iteration, so they don't need any extra structures for these

[PATCH v2 06/19] btrfs-progs: Introduce new btrfs_map_block function which returns more unified result.

2016-12-25 Thread Qu Wenruo
Introduce a new function, __btrfs_map_block_v2(). Unlike the old btrfs_map_block(), which needs different parameters to handle different RAID profiles, this new function uses a unified btrfs_map_block structure to handle all RAID profiles in a more meaningful way: return the physical address along with the log
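For RAID0, the core of a logical-to-physical lookup can be sketched as below. `struct phys_loc`, `raid0_map()` and the per-device `dev_offset[]` array are assumptions for illustration, not the `btrfs_map_block` structure the patch introduces, and only a single address is mapped.

```c
#include <stdint.h>

/* Hypothetical result record: where one chunk offset lives on disk. */
struct phys_loc {
	int dev_index;      /* which device/stripe of the chunk */
	uint64_t physical;  /* byte offset on that device */
};

/* Map an offset inside a RAID0 chunk. Assumes each device's share of the
 * chunk starts at dev_offset[i] and stripes rotate round-robin. */
static struct phys_loc raid0_map(uint64_t chunk_offset, uint64_t stripe_len,
				 int num_stripes, const uint64_t *dev_offset)
{
	uint64_t stripe_nr = chunk_offset / stripe_len;
	uint64_t in_stripe = chunk_offset % stripe_len;
	struct phys_loc loc;

	loc.dev_index = (int)(stripe_nr % (uint64_t)num_stripes);
	/* Every full round of num_stripes stripes advances each device by
	 * one stripe_len. */
	loc.physical = dev_offset[loc.dev_index] +
		       (stripe_nr / (uint64_t)num_stripes) * stripe_len +
		       in_stripe;
	return loc;
}
```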

[PATCH v2 18/19] btrfs-progs: scrub: Introduce function to check a whole block group

2016-12-25 Thread Qu Wenruo
Introduce a new function, scrub_one_block_group(), to scrub a block group. For Single/DUP/RAID0/RAID1/RAID10, we use the old mirror-number-based map_block and check extent by extent. For parity-based profiles (RAID5/6), we use the new map_block_v2() and check full stripe by full stripe. Signed-off-by: Qu
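The per-profile split can be sketched as a small dispatch. The `bg_profile` values are a hypothetical enum for this sketch, not the on-disk block group flags, and `mirror_count()` only reflects the usual number of copies for the mirror-based profiles.

```c
/* Hypothetical profile enum (not the on-disk flag values). */
enum bg_profile {
	PROFILE_SINGLE, PROFILE_DUP, PROFILE_RAID0, PROFILE_RAID1,
	PROFILE_RAID10, PROFILE_RAID5, PROFILE_RAID6,
};

enum scrub_strategy { SCRUB_BY_EXTENT_MIRROR, SCRUB_BY_FULL_STRIPE };

/* Parity profiles are checked full stripe by full stripe; everything
 * else extent by extent, once per mirror number. */
static enum scrub_strategy pick_strategy(enum bg_profile p)
{
	switch (p) {
	case PROFILE_RAID5:
	case PROFILE_RAID6:
		return SCRUB_BY_FULL_STRIPE;
	default:
		return SCRUB_BY_EXTENT_MIRROR;
	}
}

/* How many mirror numbers an extent-by-extent scrub would iterate. */
static int mirror_count(enum bg_profile p)
{
	switch (p) {
	case PROFILE_DUP:
	case PROFILE_RAID1:
	case PROFILE_RAID10:
		return 2;
	case PROFILE_SINGLE:
	case PROFILE_RAID0:
		return 1;
	default:
		return -1; /* RAID5/6: not mirror based */
	}
}
```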

[PATCH v2 01/19] btrfs-progs: raid56: Introduce raid56 header for later recovery usage

2016-12-25 Thread Qu Wenruo
Introduce a new header, kernel-lib/raid56.h, for later raid56 work. It contains two functions from the original btrfs-progs code: void raid6_gen_syndrome(int disks, size_t bytes, void **ptrs); int raid5_gen_result(int nr_devs, size_t stripe_len, int dest, void **data); Will be expanded later and some
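If `raid5_gen_result()` XORs every stripe except `dest` into `dest` (an assumption; the excerpt only gives its prototype), a stand-alone version looks like this. The `_sketch` suffix marks it as illustrative, not the btrfs-progs code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed semantics: data[] holds nr_devs equally sized stripes (data
 * stripes plus P). data[dest] is overwritten with the XOR of all the
 * others. Returns 0 on success, -1 on bad arguments. */
static int raid5_gen_result_sketch(int nr_devs, size_t stripe_len, int dest,
				   void **data)
{
	uint8_t *out;

	if (nr_devs < 2 || dest < 0 || dest >= nr_devs)
		return -1;
	out = data[dest];
	memset(out, 0, stripe_len);
	for (int i = 0; i < nr_devs; i++) {
		const uint8_t *src = data[i];

		if (i == dest)
			continue;
		for (size_t j = 0; j < stripe_len; j++)
			out[j] ^= src[j];
	}
	return 0;
}
```

Because XOR parity is symmetric, the same routine regenerates P (dest = parity index) and rebuilds a single lost data stripe (dest = that stripe).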

[PATCH v2 19/19] btrfs-progs: fsck: Introduce offline scrub function

2016-12-25 Thread Qu Wenruo
Now btrfs check has a kernel scrub equivalent. A new option, --scrub, is added to "btrfs check". If --scrub is given, btrfs check will act like kernel scrub: it checks every copy of each extent and reports corrupted data and whether it is recoverable. The advantages compared to kernel scrub are: 1)

[PATCH v2 10/19] btrfs-progs: scrub: Introduce function to scrub mirror based tree block

2016-12-25 Thread Qu Wenruo
Introduce a new function, scrub_tree_mirror(), to scrub mirror-based tree blocks (Single/DUP/RAID0/1/10). This function can also be used on in-memory tree blocks via the @data parameter. This is very handy for the RAID5/6 case, either checking the data stripe tree block by @bytenr with 0 as @mirror, or us
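The core of a tree block checksum check can be sketched as follows, assuming the default crc32c checksum type and the on-disk layout as we understand it: the checksum occupies the first 32 bytes of the tree block header (crc32c uses the first 4, little-endian) and covers the rest of the node. `tree_block_csum_mismatch()` is a hypothetical helper; because it works on a buffer, it applies equally to a block read from a mirror or one passed in memory, as @data allows.

```c
#include <stddef.h>
#include <stdint.h>

#define CSUM_AREA 32 /* bytes reserved for the checksum in the header */

/* Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Returns 0 if the stored csum matches, 1 otherwise.
 * nodesize must be larger than CSUM_AREA. */
static int tree_block_csum_mismatch(const uint8_t *block, size_t nodesize)
{
	uint32_t want = (uint32_t)block[0] | (uint32_t)block[1] << 8 |
			(uint32_t)block[2] << 16 | (uint32_t)block[3] << 24;

	return crc32c(block + CSUM_AREA, nodesize - CSUM_AREA) != want;
}
```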

[PATCH v2 11/19] btrfs-progs: scrub: Introduce function to scrub mirror based data blocks

2016-12-25 Thread Qu Wenruo
Introduce a new function, scrub_data_mirror(), to check mirror-based data blocks. It can also accept a @data parameter to use in-memory data instead of reading it from disk. This is a handy feature for RAID5/6 recovery verification code. Signed-off-by: Qu Wenruo --- scrub.c | 82 +

[PATCH v2 17/19] btrfs-progs: scrub: Introduce a function to scrub one full stripe

2016-12-25 Thread Qu Wenruo
Introduce a new function, scrub_one_full_stripe(), to check a full stripe. It handles the full stripe scrub in the following steps: 0) Check whether we need to check the full stripe: if it contains no extent, why waste CPU and IO? 1) Read out the full stripe: then we know how many devices are
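The excerpt stops after step 1, so everything past it below is an assumption: that data csums are checked first, parity is verified when all data is good, and otherwise a rebuild is attempted if there are no more bad data stripes than parities. Under those assumptions, the final verdict for one full stripe might be decided like this; the enum and `judge_full_stripe()` are hypothetical.

```c
/* Hypothetical verdicts for one full stripe. */
enum fs_verdict {
	FS_SKIPPED,       /* step 0: no extent, nothing to check */
	FS_CLEAN,         /* all data csums and parities match */
	FS_PARITY_BAD,    /* data fine, on-disk parity mismatches */
	FS_RECOVERABLE,   /* bad data, but rebuilt data matches its csums */
	FS_UNRECOVERABLE, /* too many bad stripes, or rebuild failed csum */
};

/* has_extent:  result of step 0
 * nr_bad_data: data stripes that failed csum or could not be read
 * nr_parity:   1 for RAID5, 2 for RAID6
 * parity_ok:   regenerated parity matched the on-disk parity
 * rebuilt_ok:  data rebuilt from parity matched its csums */
static enum fs_verdict judge_full_stripe(int has_extent, int nr_bad_data,
					 int nr_parity, int parity_ok,
					 int rebuilt_ok)
{
	if (!has_extent)
		return FS_SKIPPED;
	if (nr_bad_data == 0)
		return parity_ok ? FS_CLEAN : FS_PARITY_BAD;
	if (nr_bad_data <= nr_parity && rebuilt_ok)
		return FS_RECOVERABLE;
	return FS_UNRECOVERABLE;
}
```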

[PATCH v2 03/19] btrfs-progs: raid56: Allow raid6 to recover 2 data stripes

2016-12-25 Thread Qu Wenruo
Copied from the kernel's lib/raid6/recov.c raid6_2data_recov_intx1() function, with the following modifications: - Rename to raid6_recov_data2() for a shorter name - s/kfree/free/g Signed-off-by: Qu Wenruo --- Makefile.in | 4 +-- raid56.c => kernel-lib/raid56.c | 69 +++
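The arithmetic behind two-data-stripe recovery can be sketched in GF(2^8) with the RAID6 polynomial 0x11d and generator g = 2, where P = D0 ^ D1 ^ ... and Q = g^0*D0 ^ g^1*D1 ^ .... After removing the surviving stripes, Pxy = Dx ^ Dy and Qxy = g^x*Dx ^ g^y*Dy, so Dx = (g^y*Pxy ^ Qxy) / (g^x ^ g^y) and Dy = Pxy ^ Dx. This bitwise version is deliberately slow and is not the kernel's table-driven code; `raid6_recov_data2_sketch()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply in GF(2^8) modulo x^8+x^4+x^3+x^2+1 (0x11d). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
	uint8_t p = 0;

	while (b) {
		if (b & 1)
			p ^= a;
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
		b >>= 1;
	}
	return p;
}

static uint8_t gf_pow2(int e) /* g^e with g = 2 */
{
	uint8_t r = 1;

	for (int i = 0; i < e; i++)
		r = gf_mul(r, 2);
	return r;
}

/* a^254 == a^-1 for nonzero a: the multiplicative group has order 255. */
static uint8_t gf_inv(uint8_t a)
{
	uint8_t r = 1;

	for (int i = 0; i < 254; i++)
		r = gf_mul(r, a);
	return r;
}

/* Rebuild data stripes x and y (x != y) in place from P, Q and the
 * surviving stripes; data[] has nr_data stripes of len bytes. */
static void raid6_recov_data2_sketch(int nr_data, size_t len, int x, int y,
				     uint8_t **data, const uint8_t *p,
				     const uint8_t *q)
{
	uint8_t gy = gf_pow2(y);
	uint8_t denom_inv = gf_inv(gf_pow2(x) ^ gy);

	for (size_t j = 0; j < len; j++) {
		uint8_t pxy = p[j], qxy = q[j];

		/* Strip the survivors' contributions from P and Q. */
		for (int i = 0; i < nr_data; i++) {
			if (i == x || i == y)
				continue;
			pxy ^= data[i][j];
			qxy ^= gf_mul(gf_pow2(i), data[i][j]);
		}
		/* Now pxy = Dx ^ Dy and qxy = g^x*Dx ^ g^y*Dy. */
		uint8_t dx = gf_mul(gf_mul(gy, pxy) ^ qxy, denom_inv);
		data[x][j] = dx;
		data[y][j] = pxy ^ dx;
	}
}
```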

[PATCH v2 00/19]

2016-12-25 Thread Qu Wenruo
For anyone who wants to try it, it can be fetched from my repo: https://github.com/adam900710/btrfs-progs/tree/offline_scrub So far I have only tested it on SINGLE/DUP/RAID1/RAID5 filesystems, with mirror, parity, or data corrupted. The tool is able to detect all of them and give a recoverability report.

[PATCH v2 16/19] btrfs-progs: scrub: Introduce function to recover data parity

2016-12-25 Thread Qu Wenruo
Introduce a function, recover_from_parities(), to recover data stripes. It just wraps raid56_recov() with extra checks on the scrub_full_stripe structure. Signed-off-by: Qu Wenruo --- scrub.c | 51 +++ 1 file changed, 51 insertions(+) diff --g

[PATCH v2 14/19] btrfs-progs: scrub: Introduce function to verify parities

2016-12-25 Thread Qu Wenruo
Introduce a new function, verify_parities(), to check whether the parities match for a full stripe whose data stripes all match their csums. The caller should fill in the scrub_full_stripe structure properly before calling this function. Signed-off-by: Qu Wenruo --- scrub.c | 69 +++
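A sketch of the parity check, assuming the caller has already confirmed that every data stripe matches its csum: recompute P as the XOR of the data stripes and Q with Horner's rule (multiply the running value by g = 2 in GF(2^8), polynomial 0x11d, then XOR in the next lower stripe), and compare both with what is on disk. `verify_parities_sketch()` and its return codes are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply by g = 2 in GF(2^8) modulo 0x11d. */
static uint8_t gf_mul2(uint8_t a)
{
	return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
}

/* Returns 0 if P (and Q, when q != NULL) match the data, 1 on a P
 * mismatch, 2 on a Q mismatch. nr_data must be at least 1. */
static int verify_parities_sketch(int nr_data, size_t len, uint8_t **data,
				  const uint8_t *p, const uint8_t *q)
{
	for (size_t j = 0; j < len; j++) {
		uint8_t wp = data[nr_data - 1][j];
		uint8_t wq = wp;

		/* Horner: after the loop wq = sum of g^i * D_i. */
		for (int i = nr_data - 2; i >= 0; i--) {
			wp ^= data[i][j];
			wq = gf_mul2(wq) ^ data[i][j];
		}
		if (wp != p[j])
			return 1;
		if (q && wq != q[j])
			return 2;
	}
	return 0;
}
```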

[PATCH v2 02/19] btrfs-progs: raid56: Introduce tables for RAID6 recovery

2016-12-25 Thread Qu Wenruo
Use the kernel's RAID6 Galois tables for later RAID6 recovery. The Galois tables file, kernel-lib/tables.c, is generated by a user-space program, mktables. The Galois field table declarations in kernel-lib/raid56.h are copied verbatim from the kernel. mktables.c is copied from the kernel with minor header/macro mo
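A runtime equivalent of the exp/log part of such tables can be sketched as follows, assuming the RAID6 field GF(2^8) with polynomial 0x11d and generator 2. This is not the kernel's mktables program, which emits the tables as C source; the array and function names here are hypothetical.

```c
#include <stdint.h>

/* gf_exp is doubled so gf_exp[log a + log b] never needs a modulo. */
static uint8_t gf_exp[510];
static uint8_t gf_log[256];

static void build_gf_tables(void)
{
	unsigned int x = 1;

	for (int i = 0; i < 255; i++) {
		gf_exp[i] = (uint8_t)x;
		gf_exp[i + 255] = (uint8_t)x;
		gf_log[x] = (uint8_t)i;
		x <<= 1;          /* multiply by the generator 2 */
		if (x & 0x100)
			x ^= 0x11d; /* reduce modulo the field polynomial */
	}
	gf_log[0] = 0; /* log(0) is undefined; callers special-case 0 */
}

static uint8_t gf_mul_table(uint8_t a, uint8_t b)
{
	if (a == 0 || b == 0)
		return 0;
	return gf_exp[gf_log[a] + gf_log[b]];
}
```

Precomputing turns every multiplication into two lookups and an add, which is why table-driven RAID6 code prefers it over the bitwise shift-and-reduce loop.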

[PATCH v2 08/19] btrfs-progs: csum: Introduce function to read out one data csum

2016-12-25 Thread Qu Wenruo
Introduce a new function, btrfs_read_one_data_csum(), to read out the csum for one sector. This is useful for reading out data csums without open-coding it. Signed-off-by: Qu Wenruo --- Makefile.in | 2 +- csum.c | 96 ++
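As far as we understand the csum tree, each csum item packs one checksum per sector for a run of consecutive sectors, so reading one csum largely comes down to an index computation once the covering item is found. The sketch models a single item with a hypothetical `struct csum_item_model` and returns the byte offset of a sector's csum inside it, or -1 if the item does not cover that sector.

```c
#include <stdint.h>

/* Hypothetical model of one csum item: nr_csums consecutive sectors
 * starting at logical address start, csum_size bytes each. */
struct csum_item_model {
	uint64_t start;
	uint32_t nr_csums;
};

/* Byte offset of bytenr's csum inside the item payload, or -1 if bytenr
 * is not sector-aligned relative to the item or is outside it. */
static int64_t csum_offset_in_item(const struct csum_item_model *item,
				   uint64_t bytenr, uint32_t sectorsize,
				   uint32_t csum_size)
{
	uint64_t index;

	if (bytenr < item->start || (bytenr - item->start) % sectorsize)
		return -1;
	index = (bytenr - item->start) / sectorsize;
	if (index >= item->nr_csums)
		return -1;
	return (int64_t)(index * csum_size);
}
```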

[PATCH v2 07/19] btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes

2016-12-25 Thread Qu Wenruo
For READ, the caller normally wants only what it requested, rather than the full stripe map. In this case we should remove unrelated stripe maps, as in the following case:

               32K               96K
               |<-request range->|
       0                64K              128K
RAID0: |
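The trimming can be sketched for RAID0 as splitting the request into per-stripe pieces and keeping only what falls inside it. Assuming a 64K stripe length (as the 0/64K/128K marks suggest) and two devices, the 32K-96K request becomes 32K on device 0 and 32K on device 1. `struct seg` and `raid0_split()` are hypothetical, and all offsets are chunk-relative.

```c
#include <stddef.h>
#include <stdint.h>

struct seg {
	int dev_index;  /* device holding this piece */
	uint64_t start; /* chunk-relative start of the clipped piece */
	uint64_t len;
};

/* Split [start, start + len) into per-stripe pieces clipped to the
 * request; stripes the request does not touch never appear. Returns the
 * number of pieces written (at most max). */
static size_t raid0_split(uint64_t start, uint64_t len, uint64_t stripe_len,
			  int num_stripes, struct seg *out, size_t max)
{
	uint64_t end = start + len;
	uint64_t cur = start;
	size_t n = 0;

	while (cur < end && n < max) {
		uint64_t stripe_nr = cur / stripe_len;
		uint64_t stripe_end = (stripe_nr + 1) * stripe_len;
		uint64_t piece_end = stripe_end < end ? stripe_end : end;

		out[n].dev_index = (int)(stripe_nr % (uint64_t)num_stripes);
		out[n].start = cur;
		out[n].len = piece_end - cur;
		n++;
		cur = piece_end;
	}
	return n;
}
```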

[PATCH v2 15/19] btrfs-progs: extent-tree: Introduce function to check if there is any extent in given range.

2016-12-25 Thread Qu Wenruo
Introduce a new function, btrfs_check_extent_exists(), to check whether there is any extent in the range specified by the user. The range can be large; if any extent exists in it, the function returns >0 (in fact, 1), or 0 if no extent is found. Signed-off-by: Qu Wenr
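The described semantics (1 if any extent overlaps the range, else 0) can be sketched over a sorted, non-overlapping array of extents. The real function searches the extent tree; `struct extent` and `extent_exists_in_range()` are hypothetical stand-ins.

```c
#include <stddef.h>
#include <stdint.h>

struct extent {
	uint64_t start, len;
};

/* Return 1 if any extent overlaps [start, start + len), 0 otherwise.
 * ext[] is sorted by start and non-overlapping, so a binary search finds
 * the first extent ending after start; only that one can overlap first. */
static int extent_exists_in_range(const struct extent *ext, size_t nr,
				  uint64_t start, uint64_t len)
{
	size_t lo = 0, hi = nr;
	uint64_t end = start + len;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ext[mid].start + ext[mid].len <= start)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo < nr && ext[lo].start < end;
}
```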

[PATCH v2 04/19] btrfs-progs: raid56: Allow raid6 to recover data and p

2016-12-25 Thread Qu Wenruo
Copied from the kernel's lib/raid6/recov.c. Minor modifications include: - Rename from raid6_datap_recov_intx() to raid5_recov_datap() - Rename parameter from faila to dest1 Signed-off-by: Qu Wenruo --- kernel-lib/raid56.c | 41 + kernel-lib/raid56.h | 2 ++
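Recovering one data stripe plus P relies on Q alone: computing Qx with Dx treated as zero gives Q ^ Qx = g^x*Dx, so Dx = (Q ^ Qx) * g^(-x), after which P is again the XOR of all data stripes. A bitwise GF(2^8) sketch (polynomial 0x11d, g = 2) follows; `raid6_recov_datap_sketch()` is a hypothetical name, not the copied kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply in GF(2^8) modulo 0x11d. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
	uint8_t p = 0;

	while (b) {
		if (b & 1)
			p ^= a;
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
		b >>= 1;
	}
	return p;
}

/* Rebuild data[x] and P in place from Q and the other data stripes. */
static void raid6_recov_datap_sketch(int nr_data, size_t len, int x,
				     uint8_t **data, uint8_t *p,
				     const uint8_t *q)
{
	uint8_t gx = 1, gx_inv = 1;

	for (int i = 0; i < x; i++)
		gx = gf_mul(gx, 2);
	/* (g^x)^254 is the inverse of g^x, since the group order is 255. */
	for (int i = 0; i < 254; i++)
		gx_inv = gf_mul(gx_inv, gx);

	for (size_t j = 0; j < len; j++) {
		uint8_t qx = 0, gi = 1, px = 0;

		/* Q with the missing stripe's term left out. */
		for (int i = 0; i < nr_data; i++, gi = gf_mul(gi, 2))
			if (i != x)
				qx ^= gf_mul(gi, data[i][j]);
		data[x][j] = gf_mul(q[j] ^ qx, gx_inv);
		for (int i = 0; i < nr_data; i++)
			px ^= data[i][j];
		p[j] = px;
	}
}
```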

[PATCH] fstests: btrfs/132: Use better method to wait the writer to avoid EBUSY

2016-12-25 Thread Qu Wenruo
The kill-and-wait method only waits for the child processes to exit, while xfs_io can still be running in the background. This makes the test always fail on HDD-backed physical machines. Use the "while ps aux | grep" method from btrfs/069 to truly wait for xfs_io to finish. Signed-off-by: Qu Wenruo

Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive

2016-12-25 Thread Duncan
Xin Zhou posted on Mon, 26 Dec 2016 03:36:09 +0100 as excerpted: > One interesting thing to investigate might be the btrfs send / receive > result, under a disruptive network environment. If the connection breaks > in the middle of transfer (at different phase, maybe), see what could be > the file

Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive

2016-12-25 Thread Xin Zhou
Hi, For free software with open source code, that is quite good. Most commercial products have very robust error handling in transport, to guarantee no corruption due to transfer issues. One interesting thing to investigate might be the btrfs send / receive result, under a disruptive network envi

Re: [PATCH] fstests: btrfs: Test scrub and replace race for RAID56

2016-12-25 Thread Qu Wenruo
At 12/24/2016 05:45 PM, Eryu Guan wrote: On Thu, Dec 22, 2016 at 10:02:51AM +0800, Qu Wenruo wrote: By design, btrfs scrub and replace share the same code path, so they are exclusive to each other. But in fact, some critical regions are still not well protected, so we can hav

Re: some free space cache corruptions

2016-12-25 Thread Duncan
Christoph Anton Mitterer posted on Sun, 25 Dec 2016 23:00:34 +0100 as excerpted: > # btrfs check /dev/mapper/data-a2 ; echo $? > Checking filesystem on /dev/mapper/data-a2 [...] > checking free space cache > block group 5431552376832 has wrong amount of free space > failed to load free space cac

Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive

2016-12-25 Thread Duncan
Xin Zhou posted on Sat, 24 Dec 2016 21:15:40 +0100 as excerpted: > The code is relatively new to me, I did not see retry logic in stream > handling, please correct me if I am wrong about this. > So, I am not quite sure about the transfer behavior, if the system > subject to network issues in heavy

some free space cache corruptions

2016-12-25 Thread Christoph Anton Mitterer
Hey. Had the following on a Debian sid: Linux heisenberg 4.8.0-2-amd64 #1 SMP Debian 4.8.11-1 (2016-12-02) x86_64 GNU/Linux btrfs-progs v4.7.3 I was doing a btrfs check of a rather big btrfs (8TB device, nearly full), having many snapshots on it, all incrementally sent from another 8TB device,