Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-23 Thread Chris Murphy
On Mon, Jan 23, 2017 at 5:05 PM, Omar Sandoval  wrote:
> Thanks! Hmm, okay, so it's coming from btrfs_update_delayed_inode()...
> That's probably us failing btrfs_lookup_inode(), but just to make sure,
> could you apply the updated diff at the same link as before
> (https://gist.github.com/osandov/9f223bda27f3e1cd1ab9c1bd634c51a4)? If
> that's the case, I'm even more confused about what xattrs have to do
> with it.

[   35.015363] __btrfs_update_delayed_inode(): inode is missing
[   35.015372] btrfs_update_delayed_inode(ino=2) -> -2
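
For readers decoding the log: the -2 is a negated errno value, and errno 2 is ENOENT, which matches the "inode is missing" line. A minimal sketch of that decoding in userspace C follows. describe_kernel_err() is a hypothetical helper, not a btrfs function, and the description text comes from libc, so it will not match the kernel's own "No such entry" wording.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of btrfs): format a kernel-style negative
 * errno such as -2 as "errno=-2 (<libc description>)".
 * Returns the number of characters snprintf() would have written.
 */
static int describe_kernel_err(int err, char *buf, size_t len)
{
	/* Kernel functions return -errno; strerror() expects the positive code. */
	int code = err < 0 ? -err : err;

	return snprintf(buf, len, "errno=%d (%s)", err, strerror(code));
}
```

Calling describe_kernel_err(-2, buf, sizeof(buf)) fills buf with the numeric code followed by the libc description of ENOENT.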


osandov-9f223b_2-dmesg.log
https://drive.google.com/open?id=0B_2Asp8DGjJ9UnNSRXpualprWHM


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] btrfs-progs: Introduce macro to calculate backup superblock offset

2017-01-23 Thread Qu Wenruo
Introduce a new macro, BTRFS_SB_MIRROR_OFFSET(), to calculate the backup
superblock offset. This is handy if one wants to initialize a static array
at declaration time.

Suggested-by: David Sterba 
Signed-off-by: Qu Wenruo 
---
 disk-io.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/disk-io.h b/disk-io.h
index c4afea3f..08ee5cee 100644
--- a/disk-io.h
+++ b/disk-io.h
@@ -98,11 +98,17 @@ enum btrfs_read_sb_flags {
    SBREAD_PARTIAL  = (1 << 1),
 };
 
+/*
+ * Use macro to define mirror super block position,
+ * so we can use it in static array initialization
+ */
+#define BTRFS_SB_MIRROR_OFFSET(mirror) ((u64)(SZ_16K) << \
+   (BTRFS_SUPER_MIRROR_SHIFT * (mirror)))
+
 static inline u64 btrfs_sb_offset(int mirror)
 {
-   u64 start = SZ_16K;
    if (mirror)
-       return start << (BTRFS_SUPER_MIRROR_SHIFT * mirror);
+       return BTRFS_SB_MIRROR_OFFSET(mirror);
    return BTRFS_SUPER_INFO_OFFSET;
 }
 
-- 
2.11.0
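
As an illustration of the use case named in the commit message, here is a minimal sketch of a static array initialized with the new macro. The values of SZ_16K, SZ_64K, BTRFS_SUPER_INFO_OFFSET and BTRFS_SUPER_MIRROR_SHIFT are not shown in the patch; they are assumptions taken from memory of btrfs-progs. The sb_offsets array is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Assumed values; the patch itself does not show these definitions. */
#define SZ_16K				0x00004000
#define SZ_64K				0x00010000
#define BTRFS_SUPER_INFO_OFFSET		SZ_64K
#define BTRFS_SUPER_MIRROR_SHIFT	12

/* The macro as introduced by the patch. */
#define BTRFS_SB_MIRROR_OFFSET(mirror)	((u64)(SZ_16K) << \
		(BTRFS_SUPER_MIRROR_SHIFT * (mirror)))

/*
 * Hypothetical table of superblock offsets. The macro expands to a
 * constant expression, so it is valid in a static initializer, which
 * a call to the inline function btrfs_sb_offset() would not be.
 */
static const u64 sb_offsets[] = {
	BTRFS_SUPER_INFO_OFFSET,	/* primary superblock, 64KiB */
	BTRFS_SB_MIRROR_OFFSET(1),	/* first backup, 64MiB */
	BTRFS_SB_MIRROR_OFFSET(2),	/* second backup, 256GiB */
};

/* Compile-time checks of the arithmetic under the assumed values. */
static_assert(BTRFS_SB_MIRROR_OFFSET(1) == 64ULL << 20, "mirror 1 at 64MiB");
static_assert(BTRFS_SB_MIRROR_OFFSET(2) == 256ULL << 30, "mirror 2 at 256GiB");
```

Note that the macro is only meaningful for mirror >= 1: with mirror 0 it evaluates to SZ_16K rather than BTRFS_SUPER_INFO_OFFSET, which is why btrfs_sb_offset() in the patch keeps its special case for the primary superblock.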





[PATCH 1/2] btrfs-progs: Introduce kernel sizes to cleanup large intermediate number

2017-01-23 Thread Qu Wenruo
Large numbers like (1024 * 1024 * 1024) cost the reader or reviewer a
moment to convert to 1G.

Import the kernel's include/linux/sizes.h and replace any intermediate
number larger than 4096 (not including 4096) with the matching SZ_* macro.

Signed-off-by: Qu Wenruo 
---
 btrfs-map-logical.c |  2 +-
 cmds-fi-usage.c |  2 +-
 cmds-filesystem.c   |  2 +-
 cmds-inspect.c  |  6 +++---
 cmds-scrub.c|  2 +-
 cmds-send.c |  2 +-
 ctree.c |  7 ---
 ctree.h |  6 --
 disk-io.c   |  2 +-
 disk-io.h   |  5 +++--
 extent-tree.c   | 15 +++
 free-space-cache.c  |  2 +-
 kernel-lib/sizes.h  | 47 +++
 send.h  |  2 +-
 utils.c |  8 
 utils.h |  9 +
 volumes.c   | 20 ++--
 volumes.h   |  2 +-
 18 files changed, 96 insertions(+), 45 deletions(-)
 create mode 100644 kernel-lib/sizes.h

diff --git a/btrfs-map-logical.c b/btrfs-map-logical.c
index e49a735e..bcbf2d90 100644
--- a/btrfs-map-logical.c
+++ b/btrfs-map-logical.c
@@ -30,7 +30,7 @@
 #include "list.h"
 #include "utils.h"
 
-#define BUFFER_SIZE (64 * 1024)
+#define BUFFER_SIZE SZ_64K
 
 /* we write the mirror info to stdout unless they are dumping the data
  * to stdout
diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c
index 8764fef6..5d8496fe 100644
--- a/cmds-fi-usage.c
+++ b/cmds-fi-usage.c
@@ -301,7 +301,7 @@ static void get_raid56_used(int fd, struct chunk_info *chunks, int chunkcount,
    }
 }
 
-#define MIN_UNALOCATED_THRESH   (16 * 1024 * 1024)
+#define MIN_UNALOCATED_THRESH   SZ_16M
 static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo,
int chunkcount, struct device_info *devinfo, int devcount,
char *path, unsigned unit_mode)
diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index c66709b3..f3949b3b 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -1044,7 +1044,7 @@ static int cmd_filesystem_defrag(int argc, char **argv)
 * but it does not defragment very well. The 32M will likely lead to
 * better results and is independent of the kernel default.
 */
-   thresh = 32 * 1024 * 1024;
+   thresh = SZ_32M;
 
defrag_global_errors = 0;
defrag_global_verbose = 0;
diff --git a/cmds-inspect.c b/cmds-inspect.c
index 5e58a284..ac3da618 100644
--- a/cmds-inspect.c
+++ b/cmds-inspect.c
@@ -173,7 +173,7 @@ static int cmd_inspect_logical_resolve(int argc, char **argv)
if (check_argc_exact(argc - optind, 2))
usage(cmd_inspect_logical_resolve_usage);
 
-   size = min(size, (u64)64 * 1024);
+   size = min(size, (u64)SZ_64K);
inodes = malloc(size);
if (!inodes)
return 1;
@@ -486,7 +486,7 @@ static void adjust_dev_min_size(struct list_head *extents,
 * chunk tree, so often this can lead to the need of allocating
 * a new system chunk too, which has a maximum size of 32Mb.
 */
-   *min_size += 32 * 1024 * 1024;
+   *min_size += SZ_32M;
}
 }
 
@@ -500,7 +500,7 @@ static int print_min_dev_size(int fd, u64 devid)
 * possibility of deprecating/removing it has been discussed, so we
 * ignore it here.
 */
-   u64 min_size = 1 * 1024 * 1024ull;
+   u64 min_size = SZ_1M;
struct btrfs_ioctl_search_args args;
struct btrfs_ioctl_search_key *sk = &args.key;
u64 last_pos = (u64)-1;
diff --git a/cmds-scrub.c b/cmds-scrub.c
index 2cf7f308..292a5dfd 100644
--- a/cmds-scrub.c
+++ b/cmds-scrub.c
@@ -467,7 +467,7 @@ static struct scrub_file_record **scrub_read_file(int fd, int report_errors)
 {
int avail = 0;
int old_avail = 0;
-   char l[16 * 1024];
+   char l[SZ_16K];
int state = 0;
int curr = -1;
int i = 0;
diff --git a/cmds-send.c b/cmds-send.c
index cec11e6b..6c0a3dc3 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -44,7 +44,7 @@
 #include "send.h"
 #include "send-utils.h"
 
-#define SEND_BUFFER_SIZE   (64 * 1024)
+#define SEND_BUFFER_SIZE   SZ_64K
 
 /*
  * Default is 1 for historical reasons, changing may break scripts that expect
diff --git a/ctree.c b/ctree.c
index d07ec7d9..e3d687fb 100644
--- a/ctree.c
+++ b/ctree.c
@@ -21,6 +21,7 @@
 #include "print-tree.h"
 #include "repair.h"
 #include "internal.h"
+#include "sizes.h"
 
 static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
  *root, struct btrfs_path *path, int level);
@@ -368,7 +369,7 @@ int btrfs_cow_block(struct btrfs_trans_handle *trans,
return 0;
}
 
-   search_start = buf->start & ~((u64)(1024 * 1024 * 1024) - 1);
+   search_start = buf->start & ~((u64)SZ_1G - 1);
ret = __btrfs_cow_block(trans, root, buf, parent,
 parent_slot, 
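
The ctree.c hunk above is a good example of why the named constant helps: the expression masks an offset down to a 1GiB boundary. A minimal standalone sketch of that masking follows. The value of SZ_1G is assumed to match the kernel's include/linux/sizes.h, and round_down_1g() is a hypothetical helper name, not a btrfs-progs function.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Assumed to match the kernel's include/linux/sizes.h. */
#define SZ_1G	0x40000000

/*
 * Hypothetical helper: round an offset down to the start of its
 * 1GiB-aligned region, using the same expression as the patched
 * btrfs_cow_block(). The cast happens before the subtraction, so the
 * inverted mask keeps all upper bits of a 64-bit offset.
 */
static u64 round_down_1g(u64 start)
{
	return start & ~((u64)SZ_1G - 1);
}
```

For example, an offset of 1GiB + 4096 is rounded down to exactly 1GiB, and any offset below 1GiB is rounded down to 0.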

Re: [PATCH v3 5/6] btrfs-progs: convert: Switch to new rollback function

2017-01-23 Thread Qu Wenruo



At 01/24/2017 01:54 AM, David Sterba wrote:
> On Mon, Dec 19, 2016 at 02:56:41PM +0800, Qu Wenruo wrote:
>> Since we have the whole facilities needed to rollback, switch to the new
>> rollback.
>
> Sorry, the change from patch 4 to patch 5 seems too big to grasp for me,
> reviewing is really hard and I'm not sure I could even do that. My
> concern is namely about patch 5/6 that throws out a lot of code that
> does not obviously map to the new code.
>
> I can try again to see if there are points where the patch could be
> split, but at the moment the patchset is too scary to merge.



So this implies the current implementation is not good enough for review.

I'll try to extract more set operations and make the core part more
refined, with more ASCII art comments for it.


Thanks,
Qu




Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-23 Thread Omar Sandoval
On Mon, Jan 23, 2017 at 04:48:54PM -0700, Chris Murphy wrote:
> On Mon, Jan 23, 2017 at 3:04 PM, Omar Sandoval  wrote:
> > On Mon, Jan 23, 2017 at 02:55:21PM -0700, Chris Murphy wrote:
> >> On Mon, Jan 23, 2017 at 2:50 PM, Chris Murphy
> >> > I haven't found the commit for that patch, so maybe it's something
> >> > with the combination of that patch and the previous commit.
> >>
> >> I think that's provably not the case based on the bisect log, because
> >> I hit the problem with kernel that has only the commit, as well as the
> >> commit plus the updated patch. So the patch neither causes nor fixes
> >> the problem I'm experiencing.
> >>
> >> If it's useful the btrfs-image is here; mentioned in a previous
> >> thread, this image mounts fine, btrfs check --mode=original has no
> >> complaints, but btrfs check --mode=lowmem has complaints. There's no
> >> problem using the parent subvolume as rootfs. Only snapshots of that
> >> subvolume result in the problem.
> >> https://drive.google.com/open?id=0B_2Asp8DGjJ9ZmNxdEw1RDBPcTA
> >
> > What I meant to ask was if there were false positives/false negatives in
> > booting from the subvolume that would lead you to doubt the results of
> > the git bisect, but it sounds like it's 100% reproducible for you?
> 
> OH I see. Yes. It happens within 60 seconds, during startup or shortly
> after gnome-shell login. So it's really clear that it definitely
> happens or does not happen. But it requires two things to be true:
> kernel version 4.9 or higher *and* a normal rw snapshot is used as
> rootfs. If either of those things is not true, then the problem
> doesn't happen.
> 
> I've also tried building a 4.9 kernel with CONFIG_BTRFS_DEBUG and
> CONFIG_BTRFS_FS_CHECK_INTEGRITY with check_int, but the results are
> the same - no additional debug info shown by dmesg.
> 
> 
> > I'll take a look at the image. In the meantime, could you try booting
> > with https://gist.github.com/osandov/9f223bda27f3e1cd1ab9c1bd634c51a4
> > applied on top of 4.9 so we can hopefully narrow it down? It'd also be
> > great to know if it always fails the same way or if it varies.
> 
> Appears to always fail the same way.
> 
> [chris@f25h ~]$ dmesg | grep -i btrfs
> [2.705333] Btrfs loaded, crc32c=crc32c-intel
> [2.705905] BTRFS: device label fedora devid 1 transid 113458 
> /dev/nvme0n1p4
> [2.764563] BTRFS: device label fedora devid 2 transid 113458 
> /dev/nvme0n1p6
> [3.990957] BTRFS info (device nvme0n1p6): disk space caching is enabled
> [3.990988] BTRFS info (device nvme0n1p6): has skinny extents
> [4.010618] BTRFS info (device nvme0n1p6): detected SSD devices,
> enabling SSD mode
> [4.551046] BTRFS info (device nvme0n1p6): disk space caching is enabled
> [   13.906182] btrfs_update_delayed_inode() -> -2
> [   13.906261] WARNING: CPU: 0 PID: 488 at
> fs/btrfs/delayed-inode.c:1179 __btrfs_run_delayed_items+0x1b7/0x660
> [btrfs]
> [   13.906266] BTRFS: Transaction aborted (error -2)
> [   13.906460]  tpm_tis acpi_thermal_rel lis3lv02d tpm_tis_core
> input_polldev acpi_pad wmi nfs_acl hp_wireless tpm lockd grace sunrpc
> btrfs i915 xor raid6_pq i2c_algo_bit drm_kms_helper drm crc32c_intel
> nvme serio_raw nvme_core i2c_hid video fjes
> [   13.906635]  [] ?
> __btrfs_release_delayed_node+0x70/0x1c0 [btrfs]
> [   13.906690]  []
> __btrfs_run_delayed_items+0x1b7/0x660 [btrfs]
> [   13.906743]  [] btrfs_run_delayed_items+0x13/0x20 [btrfs]
> [   13.906793]  []
> btrfs_commit_transaction+0x23a/0xa20 [btrfs]
> [   13.906853]  [] ?
> btrfs_wait_ordered_range+0x7c/0x100 [btrfs]
> [   13.906910]  [] btrfs_sync_file+0x2fb/0x3e0 [btrfs]
> [   13.906970] BTRFS: error (device nvme0n1p6) in
> __btrfs_run_delayed_items:1179: errno=-2 No such entry
> [   13.906976] BTRFS info (device nvme0n1p6): forced readonly
> [   13.906982] BTRFS warning (device nvme0n1p6): Skipping commit of
> aborted transaction.
> [   13.906989] BTRFS: error (device nvme0n1p6) in
> cleanup_transaction:1850: errno=-2 No such entry
> [   13.907943] BTRFS info (device nvme0n1p6): delayed_refs has NO entry
> 
> Complete dmesg tags/v4.9 + osandov/9f223b
> https://drive.google.com/open?id=0B_2Asp8DGjJ9N1g5Wm9lVHpGWG8

Thanks! Hmm, okay, so it's coming from btrfs_update_delayed_inode()...
That's probably us failing btrfs_lookup_inode(), but just to make sure,
could you apply the updated diff at the same link as before
(https://gist.github.com/osandov/9f223bda27f3e1cd1ab9c1bd634c51a4)? If
that's the case, I'm even more confused about what xattrs have to do
with it.


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-23 Thread Chris Murphy
On Mon, Jan 23, 2017 at 3:04 PM, Omar Sandoval  wrote:
> On Mon, Jan 23, 2017 at 02:55:21PM -0700, Chris Murphy wrote:
>> On Mon, Jan 23, 2017 at 2:50 PM, Chris Murphy
>> > I haven't found the commit for that patch, so maybe it's something
>> > with the combination of that patch and the previous commit.
>>
>> I think that's provably not the case based on the bisect log, because
>> I hit the problem with kernel that has only the commit, as well as the
>> commit plus the updated patch. So the patch neither causes nor fixes
>> the problem I'm experiencing.
>>
>> If it's useful the btrfs-image is here; mentioned in a previous
>> thread, this image mounts fine, btrfs check --mode=original has no
>> complaints, but btrfs check --mode=lowmem has complaints. There's no
>> problem using the parent subvolume as rootfs. Only snapshots of that
>> subvolume result in the problem.
>> https://drive.google.com/open?id=0B_2Asp8DGjJ9ZmNxdEw1RDBPcTA
>
> What I meant to ask was if there were false positives/false negatives in
> booting from the subvolume that would lead you to doubt the results of
> the git bisect, but it sounds like it's 100% reproducible for you?

OH I see. Yes. It happens within 60 seconds, during startup or shortly
after gnome-shell login. So it's really clear that it definitely
happens or does not happen. But it requires two things to be true:
kernel version 4.9 or higher *and* a normal rw snapshot is used as
rootfs. If either of those things is not true, then the problem
doesn't happen.

I've also tried building a 4.9 kernel with CONFIG_BTRFS_DEBUG and
CONFIG_BTRFS_FS_CHECK_INTEGRITY with check_int, but the results are
the same - no additional debug info shown by dmesg.


> I'll take a look at the image. In the meantime, could you try booting
> with https://gist.github.com/osandov/9f223bda27f3e1cd1ab9c1bd634c51a4
> applied on top of 4.9 so we can hopefully narrow it down? It'd also be
> great to know if it always fails the same way or if it varies.

Appears to always fail the same way.

[chris@f25h ~]$ dmesg | grep -i btrfs
[2.705333] Btrfs loaded, crc32c=crc32c-intel
[2.705905] BTRFS: device label fedora devid 1 transid 113458 /dev/nvme0n1p4
[2.764563] BTRFS: device label fedora devid 2 transid 113458 /dev/nvme0n1p6
[3.990957] BTRFS info (device nvme0n1p6): disk space caching is enabled
[3.990988] BTRFS info (device nvme0n1p6): has skinny extents
[4.010618] BTRFS info (device nvme0n1p6): detected SSD devices,
enabling SSD mode
[4.551046] BTRFS info (device nvme0n1p6): disk space caching is enabled
[   13.906182] btrfs_update_delayed_inode() -> -2
[   13.906261] WARNING: CPU: 0 PID: 488 at
fs/btrfs/delayed-inode.c:1179 __btrfs_run_delayed_items+0x1b7/0x660
[btrfs]
[   13.906266] BTRFS: Transaction aborted (error -2)
[   13.906460]  tpm_tis acpi_thermal_rel lis3lv02d tpm_tis_core
input_polldev acpi_pad wmi nfs_acl hp_wireless tpm lockd grace sunrpc
btrfs i915 xor raid6_pq i2c_algo_bit drm_kms_helper drm crc32c_intel
nvme serio_raw nvme_core i2c_hid video fjes
[   13.906635]  [] ?
__btrfs_release_delayed_node+0x70/0x1c0 [btrfs]
[   13.906690]  []
__btrfs_run_delayed_items+0x1b7/0x660 [btrfs]
[   13.906743]  [] btrfs_run_delayed_items+0x13/0x20 [btrfs]
[   13.906793]  []
btrfs_commit_transaction+0x23a/0xa20 [btrfs]
[   13.906853]  [] ?
btrfs_wait_ordered_range+0x7c/0x100 [btrfs]
[   13.906910]  [] btrfs_sync_file+0x2fb/0x3e0 [btrfs]
[   13.906970] BTRFS: error (device nvme0n1p6) in
__btrfs_run_delayed_items:1179: errno=-2 No such entry
[   13.906976] BTRFS info (device nvme0n1p6): forced readonly
[   13.906982] BTRFS warning (device nvme0n1p6): Skipping commit of
aborted transaction.
[   13.906989] BTRFS: error (device nvme0n1p6) in
cleanup_transaction:1850: errno=-2 No such entry
[   13.907943] BTRFS info (device nvme0n1p6): delayed_refs has NO entry

Complete dmesg tags/v4.9 + osandov/9f223b
https://drive.google.com/open?id=0B_2Asp8DGjJ9N1g5Wm9lVHpGWG8



-- 
Chris Murphy


Re: RAID56 status?

2017-01-23 Thread Christoph Anton Mitterer
On Mon, 2017-01-23 at 18:18 -0500, Chris Mason wrote:
> We've been focusing on the single-drive use cases internally.  This
> year 
> that's changing as we ramp up more users in different places.  
> Performance/stability work and raid5/6 are the top of my list right
> now.
+1

Would be nice to get some feedback on what happens behind the scenes...
 actually I think a regular btrfs development blog could be generally a
nice thing :)

Cheers,
Chris.



Re: RAID56 status?

2017-01-23 Thread Chris Mason

On Mon, Jan 23, 2017 at 06:53:21PM +0100, Christoph Anton Mitterer wrote:
> Just wondered... is there any larger known RAID56 deployment? I mean
> something with real-world production systems and ideally many different
> IO scenarios, failures, pulling disks randomly and perhaps even so
> many disks that it's also likely to hit something like silent data
> corruption (on the disk level)?
>
> Has CM already migrated all of Facebook's storage to btrfs RAID56?! ;-)
> Well at least facebook.com seems still online ;-P *kidding*
>
> I mean the good thing in having such a massive production-like
> environment - especially when it's not just one homogeneous usage
> pattern - is that it would help to build up quite some trust into the
> code (once the already known bugs are fixed).


We've been focusing on the single-drive use cases internally.  This year 
that's changing as we ramp up more users in different places.  
Performance/stability work and raid5/6 are the top of my list right now.


-chris


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-23 Thread Omar Sandoval
On Mon, Jan 23, 2017 at 02:55:21PM -0700, Chris Murphy wrote:
> On Mon, Jan 23, 2017 at 2:50 PM, Chris Murphy
> > I haven't found the commit for that patch, so maybe it's something
> > with the combination of that patch and the previous commit.
> 
> I think that's provably not the case based on the bisect log, because
> I hit the problem with kernel that has only the commit, as well as the
> commit plus the updated patch. So the patch neither causes nor fixes
> the problem I'm experiencing.
> 
> If it's useful the btrfs-image is here; mentioned in a previous
> thread, this image mounts fine, btrfs check --mode=original has no
> complaints, but btrfs check --mode=lowmem has complaints. There's no
> problem using the parent subvolume as rootfs. Only snapshots of that
> subvolume result in the problem.
> https://drive.google.com/open?id=0B_2Asp8DGjJ9ZmNxdEw1RDBPcTA

What I meant to ask was if there were false positives/false negatives in
booting from the subvolume that would lead you to doubt the results of
the git bisect, but it sounds like it's 100% reproducible for you?

I'll take a look at the image. In the meantime, could you try booting
with https://gist.github.com/osandov/9f223bda27f3e1cd1ab9c1bd634c51a4
applied on top of 4.9 so we can hopefully narrow it down? It'd also be
great to know if it always fails the same way or if it varies.


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-23 Thread Chris Murphy
On Mon, Jan 23, 2017 at 2:50 PM, Chris Murphy
> I haven't found the commit for that patch, so maybe it's something
> with the combination of that patch and the previous commit.

I think that's provably not the case based on the bisect log, because
I hit the problem with kernel that has only the commit, as well as the
commit plus the updated patch. So the patch neither causes nor fixes
the problem I'm experiencing.

If it's useful the btrfs-image is here; mentioned in a previous
thread, this image mounts fine, btrfs check --mode=original has no
complaints, but btrfs check --mode=lowmem has complaints. There's no
problem using the parent subvolume as rootfs. Only snapshots of that
subvolume result in the problem.
https://drive.google.com/open?id=0B_2Asp8DGjJ9ZmNxdEw1RDBPcTA




-- 
Chris Murphy


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-23 Thread Chris Murphy
On Mon, Jan 23, 2017 at 2:31 PM, Omar Sandoval  wrote:
> On Wed, Jan 18, 2017 at 02:27:13PM -0700, Chris Murphy wrote:
>> On Wed, Jan 11, 2017 at 4:13 PM, Chris Murphy  
>> wrote:
>> > Looks like there's some sort of xattr and Btrfs interaction happening
>> > here; but as it only happens with some subvolumes/snapshots not all
>> > (but 100% consistent) maybe the kernel version at the time the
>> > snapshot was taken is a factor?
>>
>> The kernel version at the time the snapshot is taken is not a factor.
>> I've taken a snapshot of a working subvolume, and booting the snapshot
>> fails during startup with the fs forced readonly with kernel 4.9 and
>> higher; the problem doesn't happen with kernel 4.8.17 and lower.
>>
>> As a further test I tried:
>>
>>
>> git checkout tags/v4.9
>> git revert 6c6ef9f26e598fb977f60935e109cd5b266c941a
>>
>> But I get a failure during compile:
>>
>> scripts/Makefile.build:293: recipe for target 'fs/xattr.o' failed
>> make[1]: *** [fs/xattr.o] Error 1
>> Makefile:988: recipe for target 'fs' failed
>> make: *** [fs] Error 2
>>
>> Anyway, the inability to boot snapshots means bootable rollbacks are
>> broken. I think this is a serious regression, what's the next step in
>> figuring out what's going on?
>
> Hm, so you're 100% sure that this exact commit caused the regression?
> I've stared at it for a little while and am not seeing anything obvious.

This is the git bisect log:
https://bugzilla.kernel.org/attachment.cgi?id=251271

I don't know how to evaluate your question. If there's a test that
helps answer the question, let me know. Searching for this commit
brings up an lkml thread:

https://lkml.org/lkml/2016/11/3/268

I haven't found the commit for that patch, so maybe it's something
with the combination of that patch and the previous commit. But I see
it's applied in at least 4.10-rc4 and this forced readonly event
happens with 4.10-rc4.


-- 
Chris Murphy


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-23 Thread Omar Sandoval
On Wed, Jan 18, 2017 at 02:27:13PM -0700, Chris Murphy wrote:
> On Wed, Jan 11, 2017 at 4:13 PM, Chris Murphy  wrote:
> > Looks like there's some sort of xattr and Btrfs interaction happening
> > here; but as it only happens with some subvolumes/snapshots not all
> > (but 100% consistent) maybe the kernel version at the time the
> > snapshot was taken is a factor?
> 
> The kernel version at the time the snapshot is taken is not a factor.
> I've taken a snapshot of a working subvolume, and booting the snapshot
> fails during startup with the fs forced readonly with kernel 4.9 and
> higher; the problem doesn't happen with kernel 4.8.17 and lower.
> 
> As a further test I tried:
> 
> 
> git checkout tags/v4.9
> git revert 6c6ef9f26e598fb977f60935e109cd5b266c941a
> 
> But I get a failure during compile:
> 
> scripts/Makefile.build:293: recipe for target 'fs/xattr.o' failed
> make[1]: *** [fs/xattr.o] Error 1
> Makefile:988: recipe for target 'fs' failed
> make: *** [fs] Error 2
> 
> Anyway, the inability to boot snapshots means bootable rollbacks are
> broken. I think this is a serious regression, what's the next step in
> figuring out what's going on?

Hm, so you're 100% sure that this exact commit caused the regression?
I've stared at it for a little while and am not seeing anything obvious.


Re: btrfs check lowmem vs original

2017-01-23 Thread Chris Murphy
OK so all of these pass original check, but have problems reported by
lowmem. Separate notes about each inline.

~500MiB each, these three are data volumes, first two are raid1, third
one is single.
https://drive.google.com/open?id=0B_2Asp8DGjJ9Z3UzWnFKT3A0clU
https://drive.google.com/open?id=0B_2Asp8DGjJ9V0ROdHNoMW1BVE0
https://drive.google.com/open?id=0B_2Asp8DGjJ9Zmd1LXl6MU5WeXc

19MiB, about 15 minutes old, rootfs, OS installation only
https://drive.google.com/open?id=0B_2Asp8DGjJ9TF9LVkFlcDBzOG8

55MiB, about 1 month old, rootfs, not much activity
https://drive.google.com/open?id=0B_2Asp8DGjJ9bkJFc01qcVJxNnM

324MiB, about 5 months old, used as rootfs, all read-write snapshots
used as rootfs are forced readonly, a regression previously reported
without any dev response
https://drive.google.com/open?id=0B_2Asp8DGjJ9ZmNxdEw1RDBPcTA
http://www.spinics.net/lists/linux-btrfs/msg61817.html
https://bugzilla.kernel.org/show_bug.cgi?id=191761


Re: Hard crash on 4.9.5

2017-01-23 Thread Hans van Kranenburg
On 01/23/2017 09:27 PM, Hans van Kranenburg wrote:
> [... press send without rereading ...]
> 
> Anyway, it seems to point to something that's going wrong with changes
> that are *not* on disk *yet*, and the crash is preventing ...

... whatever incorrect data this situation might result in from reaching
disk, at least.

-- 
Hans van Kranenburg


Re: Hard crash on 4.9.5

2017-01-23 Thread Hans van Kranenburg
On 01/23/2017 09:03 PM, Matt McKinnon wrote:
> Wondering what to do about this error which says 'reboot needed'.  Has
> happened three times in the past week:
> 
> Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device
> sda1): err add delayed dir index item(index: 23810) into the deletion
> tree of the delayed node(root id: 257, inode id: 2661433, errno: -17)
> Jan 23 14:16:17 my_machine kernel: [ 2568.611010] [ cut here
> ]
> Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at
> fs/btrfs/delayed-inode.c:1557!
> Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode: 
> [#1] SMP
> [...]

The purpose of the code involved is that if you create a directory or
file and quickly remove it again, the filesystem doesn't need to do two
disk writes, it can just erase it again from its memory before writing
anything to disk.

 8< more 

This is when the functionality was added:

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=16cdcec736cd214350cdb591bf1091f8beedefa0

If you look for "err add delayed dir" in the source code of that commit,
you can see where the error message is constructed: it reports
errno: -17 just after calling __btrfs_add_delayed_insertion_item.

__btrfs_add_delayed_insertion_item calls __btrfs_add_delayed_item, and
the only non-0 return in that function is: return -EEXIST, which is -17.

I think this means you added a file or directory, and the kernel code
tried to add the file twice to the list of additions, which it
has no way to deal with except making the whole kernel crash.
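
To make the failure mode concrete, here is a minimal userspace sketch of an insert that refuses duplicate keys with -EEXIST, the behaviour described above. It is not the kernel code: struct delayed_set, delayed_set_add() and the fixed-size array are hypothetical stand-ins for the kernel's tree of delayed items, and the -ENOSPC case exists only because this toy container has a fixed capacity.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_ITEMS 16

/* Hypothetical stand-in for the per-inode collection of delayed items. */
struct delayed_set {
	unsigned long long keys[MAX_ITEMS];
	size_t count;
};

/*
 * Add a key to the set. Returns 0 on success, -EEXIST if the key is
 * already present, or -ENOSPC if this toy container is full.
 */
static int delayed_set_add(struct delayed_set *set, unsigned long long key)
{
	size_t i;

	for (i = 0; i < set->count; i++) {
		if (set->keys[i] == key)
			return -EEXIST;	/* duplicate: the caller must cope */
	}

	if (set->count == MAX_ITEMS)
		return -ENOSPC;

	set->keys[set->count++] = key;
	return 0;
}
```

Adding index 23810 once succeeds; adding it a second time returns -EEXIST, which is -17 on Linux. A caller that treats any non-zero return as impossible, as the BUG in the log above suggests, has no recovery path at that point.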

 >8 

A while ago someone reported this on IRC, running a 4.8.13 kernel.
(that's when I looked up the above info). I can also find it in Oct 2016
in my IRC logs, but without any info on kernel version.

Anyway, it seems to point to something that's going wrong with changes
that are *not* on disk *yet*, and the crash is preventing .

-- 
Hans van Kranenburg


Hard crash on 4.9.5

2017-01-23 Thread Matt McKinnon
Wondering what to do about this error which says 'reboot needed'.  Has 
happened three times in the past week:


Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device 
sda1): err add delayed dir index item(index: 23810) into the deletion 
tree of the delayed node(root id: 257, inode id: 2661433, errno: -17)
Jan 23 14:16:17 my_machine kernel: [ 2568.611010] [ cut here 
]
Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at 
fs/btrfs/delayed-inode.c:1557!
Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode:  
[#1] SMP
Jan 23 14:16:17 my_machine kernel: [ 2568.624960] Modules linked in: ufs 
qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_rej
ect_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack 
nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf nfsd au
th_rpcgss nfs_acl nfs lockd grace sunrpc fscache intel_rapl sb_edac 
edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_int
el kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel 
aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper crypt
d dm_multipath joydev mei_me mei lpc_ich ioatdma wmi ipmi_si 
ipmi_msghandler btrfs shpchp mac_hid lp parport ses enclosure scsi_tran
sport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c igb hid_generic i2c_algo_

bit raid1 dca usbhid ahci raid0 ptp megaraid_sas multipath
Jan 23 14:16:17 my_machine kernel: [ 2568.697150]  hid libahci pps_core 
linear dm_mirror dm_region_hash dm_log
Jan 23 14:16:17 my_machine kernel: [ 2568.702689] CPU: 0 PID: 2440 Comm: 
nfsd Tainted: GW   4.9.5-custom #1
Jan 23 14:16:17 my_machine kernel: [ 2568.710166] Hardware name: 
Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28

/2014
Jan 23 14:16:17 my_machine kernel: [ 2568.719207] task: 95a42addab80 
task.stack: b9da8533
Jan 23 14:16:17 my_machine kernel: [ 2568.725124] RIP: 
0010:[]  [] 
btrfs_delete_delayed_dir_inde

x+0x286/0x290 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.735604] RSP: 
0018:b9da85333be0  EFLAGS: 00010286
Jan 23 14:16:17 my_machine kernel: [ 2568.740917] RAX:  
RBX: 95a3b104b690 RCX: 
Jan 23 14:16:17 my_machine kernel: [ 2568.748048] RDX: 0001 
RSI: 95a42fc0dcc8 RDI: 95a42fc0dcc8
Jan 23 14:16:17 my_machine kernel: [ 2568.755171] RBP: b9da85333c48 
R08: 0491 R09: 
Jan 23 14:16:17 my_machine kernel: [ 2568.762297] R10: 0005 
R11: 0006 R12: 95a3b104b6d8
Jan 23 14:16:17 my_machine kernel: [ 2568.769429] R13: 5d02 
R14: 95a82953d800 R15: ffef
Jan 23 14:16:17 my_machine kernel: [ 2568.776555] FS: 
() GS:95a42fc0() knlGS:
Jan 23 14:16:17 my_machine kernel: [ 2568.784639] CS:  0010 DS:  ES: 
 CR0: 80050033
Jan 23 14:16:17 my_machine kernel: [ 2568.790377] CR2: 7f12ea376000 
CR3: 0003e1e07000 CR4: 001406f0

Jan 23 14:16:17 my_machine kernel: [ 2568.797503] Stack:
Jan 23 14:16:17 my_machine kernel: [ 2568.799524]  9b7fe5f2 
95a3b104b560 0004 95a3f96b3e80
Jan 23 14:16:17 my_machine kernel: [ 2568.806983]  95a3f96b3e80 
39ff95a814eeeb68 6000289c 5d02
Jan 23 14:16:17 my_machine kernel: [ 2568.814436]  95a3f7457c40 
95a3bcb74138 95a814eeeb68 00289c39

Jan 23 14:16:17 my_machine kernel: [ 2568.821891] Call Trace:
Jan 23 14:16:17 my_machine kernel: [ 2568.824343]  [] 
? mutex_lock+0x12/0x2f
Jan 23 14:16:17 my_machine kernel: [ 2568.829671]  [] 
__btrfs_unlink_inode+0x198/0x4c0 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.836555]  [] 
btrfs_unlink_inode+0x1c/0x40 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.843086]  [] 
btrfs_unlink+0x6b/0xb0 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.849091]  [] 
vfs_unlink+0xda/0x190
Jan 23 14:16:17 my_machine kernel: [ 2568.854315]  [] 
? lookup_one_len+0xd3/0x130
Jan 23 14:16:17 my_machine kernel: [ 2568.860075]  [] 
nfsd_unlink+0x16e/0x210 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.866084]  [] 
nfsd3_proc_remove+0x7c/0x110 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.872529]  [] 
nfsd_dispatch+0xb8/0x1f0 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.878641]  [] 
svc_process_common+0x43f/0x700 [sunrpc]
Jan 23 14:16:17 my_machine kernel: [ 2568.885432]  [] 
svc_process+0xfc/0x1c0 [sunrpc]
Jan 23 14:16:17 my_machine kernel: [ 2568.891528]  [] 
nfsd+0xf0/0x160 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.896838]  [] 
? nfsd_destroy+0x60/0x60 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.902931]  [] 
kthread+0xca/0xe0
Jan 23 14:16:17 my_machine kernel: [ 2568.907807]  [] 
? kthread_park+0x60/0x60
Jan 23 14:16:17 my_machine kernel: [ 2568.913296]  [] 
ret_from_fork+0x25/0x30
Jan 23 14:16:17 my_machine kernel: [ 2568.918693] Code: ff ff 48 8b 43 
10 49 8b 

Re: [PATCH v3 5/6] btrfs-progs: convert: Switch to new rollback function

2017-01-23 Thread David Sterba
On Mon, Dec 19, 2016 at 02:56:41PM +0800, Qu Wenruo wrote:
> Since we have the whole facilities needed to rollback, switch to the new
> rollback.

Sorry, the change from patch 4 to patch 5 seems too big for me to grasp;
reviewing is really hard and I'm not sure I could even do that. My
concern is mainly about patch 5/6, which throws out a lot of code that
does not obviously map to the new code.

I can try again to see if there are points where the patch could be
split, but at the moment the patchset is too scary to merge.


Re: RAID56 status?

2017-01-23 Thread Christoph Anton Mitterer
Just wondered... is there any larger known RAID56 deployment? I mean
something with real-world production systems and ideally many different
IO scenarios, failures, pulling disks randomly and perhaps even so
many disks that it's also likely to hit something like silent data
corruption (on the disk level)?

Has CM already migrated all of Facebook's storage to btrfs RAID56?! ;-)
Well, at least facebook.com still seems to be online ;-P *kidding*

I mean the good thing in having such a massive production-like
environment - especially when it's not just one homogeneous usage
pattern - is that it would help to build up quite some trust in the
code (once the already known bugs are fixed).



Cheers,
Chris.



Re: [PATCH v3 2/6] btrfs-progs: utils: Introduce basic set operations for range

2017-01-23 Thread David Sterba
On Mon, Dec 19, 2016 at 02:56:38PM +0800, Qu Wenruo wrote:
> +static u64 reserved_range_starts[3] = { 0, BTRFS_SB_MIRROR_OFFSET(1),
> + BTRFS_SB_MIRROR_OFFSET(2) };
> +static u64 reserved_range_lens[3] = { 1024 * 1024, 64 * 1024, 64 * 1024 };

Also, anywhere in the relevant code, the 3 should preferably be a named
constant rather than a literal 3, ARRAY_SIZE, or the 2 I've seen in some
backward-going for loop.
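
A minimal sketch of the named-constant suggestion, not taken from the patch: RESERVED_RANGE_COUNT, reserved_range_end() and find_reserved_range() are hypothetical names, and BTRFS_SUPER_MIRROR_SHIFT is repeated with its btrfs-progs value of 12 only so the snippet compiles on its own. With that shift, mirror 1 lands at 64MiB and mirror 2 at 256GiB.

```c
#include <assert.h>
#include <stdint.h>

/* Repeated from btrfs-progs so the sketch is standalone. */
#define BTRFS_SUPER_MIRROR_SHIFT 12
#define BTRFS_SB_MIRROR_OFFSET(mirror) \
	((uint64_t)(16 * 1024) << (BTRFS_SUPER_MIRROR_SHIFT * (mirror)))

/* Hypothetical name for the constant asked for in the review. */
#define RESERVED_RANGE_COUNT 3

static const uint64_t reserved_range_starts[RESERVED_RANGE_COUNT] = {
	0, BTRFS_SB_MIRROR_OFFSET(1), BTRFS_SB_MIRROR_OFFSET(2)
};
static const uint64_t reserved_range_lens[RESERVED_RANGE_COUNT] = {
	1024 * 1024, 64 * 1024, 64 * 1024
};

/* End (exclusive) of reserved range i. */
uint64_t reserved_range_end(int i)
{
	assert(i >= 0 && i < RESERVED_RANGE_COUNT);
	return reserved_range_starts[i] + reserved_range_lens[i];
}

/* Index of the reserved range containing bytenr, or -1 if there is none. */
int find_reserved_range(uint64_t bytenr)
{
	int i;

	/* Backward loop bounded by the named constant, not by a bare 2. */
	for (i = RESERVED_RANGE_COUNT - 1; i >= 0; i--) {
		if (bytenr >= reserved_range_starts[i] &&
		    bytenr < reserved_range_end(i))
			return i;
	}
	return -1;
}
```

Both the array sizes and the loop bound now come from one place, so adding a fourth reserved range would need a change to the constant and the two initializers only.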


Re: [PATCH v3 2/6] btrfs-progs: utils: Introduce basic set operations for range

2017-01-23 Thread David Sterba
On Mon, Dec 19, 2016 at 02:56:38PM +0800, Qu Wenruo wrote:
> Introduce basic set operations: is_subset() and is_intersection().
> 
> This is quite useful to check if a range [start, start + len) subset or
> intersection of another range.
> So we don't need to use open code to do it, which I sometimes do it
> wrong.
> 
> Also use these new facilities in btrfs-convert, to check if a range is a
> subset or intersects with btrfs convert reserved ranges.

I see the range helpers used only inside convert, so I don't think we
need to export them into utils. Then you could introduce a helper
structure with start and len members and use that instead of 2 arrays.
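
One possible shape for such a helper structure, as a sketch only: struct simple_range, range_is_subset() and range_intersects() are hypothetical names, not the ones from the patch. Ranges are half-open, [start, start + len), and are assumed to be non-empty and not to wrap around u64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a half-open byte range [start, start + len). */
struct simple_range {
	uint64_t start;
	uint64_t len;
};

/* Exclusive end of the range. */
static uint64_t range_end(const struct simple_range *r)
{
	/* Assumption: callers never pass a range that wraps around u64. */
	assert(r->start + r->len >= r->start);
	return r->start + r->len;
}

/* True if a lies entirely inside b. */
int range_is_subset(const struct simple_range *a, const struct simple_range *b)
{
	return a->start >= b->start && range_end(a) <= range_end(b);
}

/* True if a and b share at least one byte. */
int range_intersects(const struct simple_range *a, const struct simple_range *b)
{
	return a->start < range_end(b) && b->start < range_end(a);
}
```

Keeping start and len in one struct means a single array of reserved ranges can replace the two parallel arrays, and the two predicates no longer need four separate u64 arguments.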

> --- a/disk-io.h
> +++ b/disk-io.h
> @@ -97,11 +97,16 @@ enum btrfs_read_sb_flags {
>   SBREAD_PARTIAL  = (1 << 1),
>  };
>  
> +/*
> + * Use macro to define mirror super block position
> + * So we can use it in static array initializtion
> + */
> +#define BTRFS_SB_MIRROR_OFFSET(mirror)   ((u64)(16 * 1024) << \
> +  (BTRFS_SUPER_MIRROR_SHIFT * (mirror)))

This is an unrelated change and should go in separately.

>  static inline u64 btrfs_sb_offset(int mirror)
>  {
> - u64 start = 16 * 1024;
>   if (mirror)
> - return start << (BTRFS_SUPER_MIRROR_SHIFT * mirror);
> + return BTRFS_SB_MIRROR_OFFSET(mirror);
>   return BTRFS_SUPER_INFO_OFFSET;
>  }
>  


Re: [PATCH] btrfs-progs: sanitize - Use correct source for memcpy

2017-01-23 Thread David Sterba
On Fri, Jan 20, 2017 at 01:03:33PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues 
> 
> While performing a memcpy, we are copying from uninitialized dst
> as opposed to src->data. Though using eb->len is correct, I used
> src->len to make it more readable.
> 
> Signed-off-by: Goldwyn Rodrigues 

Applied, thanks.
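
For readers without the patch at hand, the bug class described in the quoted commit message can be sketched as follows. struct buf and buf_clone() are hypothetical stand-ins for illustration, not the btrfs-progs types or the actual sanitize code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an extent buffer: a length plus payload. */
struct buf {
	size_t len;
	unsigned char data[];
};

/* Allocate a copy of src; returns NULL on allocation failure. */
struct buf *buf_clone(const struct buf *src)
{
	struct buf *dst;

	assert(src != NULL);
	dst = malloc(sizeof(*dst) + src->len);
	if (!dst)
		return NULL;
	dst->len = src->len;
	/*
	 * The mistake described above would pass dst->data as the source
	 * here, reading freshly allocated, uninitialized memory. The data
	 * has to come from src->data.
	 */
	memcpy(dst->data, src->data, src->len);
	return dst;
}
```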


Re: RAID5: btrfs rescue chunk-recover segfaults.

2017-01-23 Thread Roman Mamedov
On Mon, 23 Jan 2017 14:15:55 +0100
Simon Waid  wrote:

> I have a btrfs raid5 array that has become unmountable.

That's the third time you've sent this today. Will you keep resending every few
hours until you get a reply? That's not how mailing lists work.

-- 
With respect,
Roman


RAID5: btrfs rescue chunk-recover segfaults.

2017-01-23 Thread Simon Waid
Dear all,

I have a btrfs raid5 array that has become unmountable. When I try to
mount it, dmesg contains the following:

[ 5686.334384] BTRFS info (device sdb): disk space caching is enabled
[ 5688.377244] BTRFS info (device sdb): bdev /dev/sdb errs: wr 2517, rd
77, flush 0, corrupt 0, gen 0
[ 5688.377254] BTRFS info (device sdb): bdev /dev/sdc errs: wr 0, rd 0,
flush 0, corrupt 10, gen 0
[ 5688.377261] BTRFS info (device sdb): bdev /dev/sdd1 errs: wr 0, rd 0,
flush 0, corrupt 5, gen 0
[ 5688.377268] BTRFS info (device sdb): bdev /dev/sde errs: wr 21, rd
8807, flush 0, corrupt 0, gen 0
[ 5688.744249] BTRFS error (device sdb): parent transid verify failed on
16227387371520 wanted 88711 found 88395
[ 5689.533817] BTRFS error (device sdb): parent transid verify failed on
16227388260352 wanted 88711 found 88395
[ 5689.609355] BTRFS error (device sdb): parent transid verify failed on
16227415158784 wanted 88711 found 88397
[ 5689.627715] BTRFS error (device sdb): parent transid verify failed on
16227415158784 wanted 88711 found 88397
[ 5689.627731] BTRFS error (device sdb): failed to read block groups: -5
[ 5689.675017] BTRFS error (device sdb): open_ctree failed

I tried to recover from the problem using:

btrfs rescue chunk-recover -v /dev/sdb

The command runs for a few minutes. Then it segfaults. I used gdb to
debug. This is the backtrace:

Starting program: btrfs-progs/btrfs rescue chunk-recover -v /dev/sdb
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
All Devices:
 Device: id = 4, name = /dev/sde
 Device: id = 1, name = /dev/sdd1
 Device: id = 2, name = /dev/sdc
 Device: id = 3, name = /dev/sdb

[New Thread 0x76f6e700 (LWP 8155)]
[New Thread 0x7676d700 (LWP 8156)]
[New Thread 0x75f6c700 (LWP 8157)]
[New Thread 0x7576b700 (LWP 8158)]
Scanning: 24603734016 in dev0, 32581337088 in dev1, 37911248896 in dev2,
32217350144 in dev3
Thread 2 "btrfs" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x76f6e700 (LWP 8155)]
btrfs_new_device_extent_record (leaf=leaf@entry=0x78c0,
key=key@entry=0x76f6dc90, slot=slot@entry=12)
 at cmds-check.c:6656
6656        rec->chunk_objecteid =
(gdb) backtrace
#0  btrfs_new_device_extent_record (leaf=leaf@entry=0x78c0,
key=key@entry=0x76f6dc90, slot=slot@entry=12)
 at cmds-check.c:6656
#1  0x004370d2 in process_device_extent_item (slot=12,
key=0x76f6dc90, leaf=0x78c0,
 devext_cache=0x7fffe410) at chunk-recover.c:332
#2  extract_metadata_record (rc=rc@entry=0x7fffe3c0,
leaf=leaf@entry=0x78c0) at chunk-recover.c:727
#3  0x0043759b in scan_one_device (dev_scan_struct=0x6ae420) at
chunk-recover.c:807
#4  0x7733f6ba in start_thread (arg=0x76f6e700) at
pthread_create.c:333
#5  0x7707582d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Information about the system:

uname -a: Linux 4.10.0-041000rc4-generic #201701152031 SMP Mon Jan 16
01:33:39 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
btrfs-progs --version: v4.9 (from
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git)
sudo btrfs fi show
Label: none  uuid: a27cc0cf-1665-43ba-8c63-bf236d31fcd2
 Total devices 4 FS bytes used 6.51TiB
 devid1 size 2.73TiB used 2.73TiB path /dev/sdd1
 devid2 size 7.28TiB used 2.73TiB path /dev/sdc
 devid3 size 3.64TiB used 3.56TiB path /dev/sdb
 devid4 size 1.82TiB used 1.46TiB path /dev/sde
btrfs fi df won't work because the filesystem is not mountable.

Any help would be appreciated!

Best regards,
Simon


PS: I'd also like to mention how the raid array became unmountable.

The system I was running at that time was:
Kernel: 4.8.0-34 generic #36~16.04.1 Ubuntu SMP
btrfs-progs --version: v4.4

- I issued a replace command on disk 2. During the replace, disk 4 was
disconnected. I noticed it and rebooted the system just a few seconds
after the event. After the reboot, the replace continued and eventually
finished. However, dmesg showed errors like: parent transid verify
failed on 16227387371520 wanted 88711 found 88395.

- I issued a resize command on the new drive to free additional space:
btrfs resize 2:max, which completed without errors.

- I issued a balance without any filters in the hope it would correct
the "parent transid verify failed" errors. The balance started normally.
However, after about one hour, I saw that no I/O was happening and lots
of errors appeared in dmesg. I tried to reboot but the command had no
effect, so I disconnected the PC from the power supply.

I have attached the dmesg for the resize and balance operations.





Re: gdb log of crashed "btrfs-image -s"

2017-01-23 Thread Goldwyn Rodrigues


On 01/18/2017 02:11 PM, Christoph Groth wrote:
> Goldwyn Rodrigues wrote:
>> Thanks Christoph for the backtrace. I am unable to reproduce it, but
>> looking at your backtrace, I found a bug. Would you be able to give it
>> a try and check if it fixes the problem?
> 
> I applied your patch to v4.9, and compiled the static binaries.
> Unfortunately, it still segfaults. (Perhaps your fix is correct, and
> there's a second problem?)  I attach a new backtrace.  Do let me know if
> I can help in another way.

I looked hard, and could not find the reason for the failure here. The
backtrace of the new one is a little different from the previous one, but I
am not sure why it crashes. Until I have a reproduction scenario, I may
not be able to fix this. How about a core? However, a core will contain
the values which you are trying to mask with sanitize.


-- 
Goldwyn





Re: btrfs recovery

2017-01-23 Thread Sebastian Gottschall

Hello again,

By the way, the init-extent-tree is still running (now almost 7 days).
Is there any way to find out how long it will take in the end?


Sebastian

On 20.01.2017 at 02:08, Qu Wenruo wrote:
> At 01/19/2017 06:06 PM, Sebastian Gottschall wrote:
>> Hello
>>
>> I have a question. After a power outage my system ended up in an
>> unrecoverable state using btrfs (kernel 4.9).
>> Since I've been running --init-extent-tree for 3 days now, I'm asking how
>> long this process normally takes and why it outputs millions of lines like
>
> --init-extent-tree will trash *ALL* of the current extent tree and *REBUILD*
> it from the fs tree.
>
> This can take a long time depending on the size of the fs and how
> many shared extents there are (snapshots and reflinks all count).
>
> Such a huge operation should only be used if you're sure only the extent
> tree is corrupted and all other trees are OK.
>
> Otherwise you'll just screw up your fs further, especially when
> interrupted.
>
>> Backref 1562890240 root 262 owner 483059214 offset 0 num_refs 0 not
>> found in extent tree
>> Incorrect local backref count on 1562890240 root 262 owner 483059214
>> offset 0 found 1 wanted 0 back 0x23b0211d0
>> backpointer mismatch on [1562890240 4096]
>
> This is common: since --init-extent-tree trashes the whole extent tree,
> every tree block/data extent will trigger such output.
>
>> adding new data backref on 1562890240 root 262 owner 483059214 offset 0
>> found 1
>> Repaired extent references for 1562890240
>
> But as you see, it repaired the extent tree by adding the
> EXTENT_ITEM/METADATA_ITEM back into the extent tree, so it works so far.
>
> If you see such output with the same bytenr every time, then things have
> gone really wrong and it may be stuck in an endless loop.
>
> Personally speaking, a normal problem like a failed mount should not
> need --init-extent-tree.
>
> In particular, extent tree corruption is normally not related to
> mount failure, but to a sudden remount to RO and a kernel warning.
>
> Thanks,
> Qu
>
>> Please avoid typical answers like "potentially dangerous operation" since
>> all repair options are declared as potentially dangerous.
>>
>> Sebastian

--
Regards

Sebastian Gottschall / CTO

NewMedia-NET GmbH - DD-WRT
Registered office: Berliner Ring 101, 64625 Bensheim
Court of registration: Amtsgericht Darmstadt, HRB 25473
Managing directors: Peter Steinhäuser, Christian Scheele
http://www.dd-wrt.com
email: s.gottsch...@dd-wrt.com
Tel.: +496251-582650 / Fax: +496251-5826565



[PATCH 4/9] btrfs-progs: lowmem check: Fix false alert in checking data extent pointing to prealloc extent

2017-01-23 Thread Qu Wenruo
Btrfs lowmem check can report false csum errors like:
ERROR: root 5 EXTENT_DATA[257 0] datasum missing
ERROR: root 5 EXTENT_DATA[257 4096] prealloc shouldn't have datasum

This is because the lowmem check code always compares the found csum size
with the whole extent which the data extent points to.

Normally that's OK, but when a prealloc extent is partly written, or a
reflink is done, a data extent can point to part of a larger extent, making
the csum check wrong.

The fix restricts the csum check to the range the data extent refers to,
rather than the whole larger extent described by disk_bytenr/disk_num_bytes.
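
The difference between the two checks can be illustrated with a toy model. This is a sketch under stated assumptions: the per-sector has_csum[] array, count_csum_bytes() and both csum_found_*() helpers are hypothetical simplifications, not the btrfs-progs csum tree or count_csum_range(); only the field names follow the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SECTORSIZE 4096ULL

/* Toy model: has_csum[i] says whether disk sector i has a checksum. */
uint64_t count_csum_bytes(const unsigned char *has_csum, uint64_t nr_sectors,
			  uint64_t start, uint64_t len)
{
	uint64_t found = 0;
	uint64_t i;

	assert(start % SECTORSIZE == 0 && len % SECTORSIZE == 0);
	for (i = start / SECTORSIZE;
	     i < (start + len) / SECTORSIZE && i < nr_sectors; i++)
		if (has_csum[i])
			found += SECTORSIZE;
	return found;
}

/* Simplified file extent item. */
struct file_extent {
	uint64_t disk_bytenr;		/* start of the on-disk extent */
	uint64_t disk_num_bytes;	/* size of the whole on-disk extent */
	uint64_t extent_offset;		/* where this file extent starts in it */
	uint64_t extent_num_bytes;	/* bytes this file extent refers to */
};

/* Old behaviour: count checksums over the whole on-disk extent. */
uint64_t csum_found_whole(const unsigned char *has_csum, uint64_t nr_sectors,
			  const struct file_extent *fe)
{
	return count_csum_bytes(has_csum, nr_sectors, fe->disk_bytenr,
				fe->disk_num_bytes);
}

/* New behaviour: count only the range this file extent refers to. */
uint64_t csum_found_referenced(const unsigned char *has_csum,
			       uint64_t nr_sectors,
			       const struct file_extent *fe)
{
	return count_csum_bytes(has_csum, nr_sectors,
				fe->disk_bytenr + fe->extent_offset,
				fe->extent_num_bytes);
}
```

With a two-sector preallocated extent whose second sector has been written, the whole-extent count is 4096 for both file extents: non-zero for the prealloc part and smaller than disk_num_bytes for the regular part, which corresponds to the two false errors quoted above. The referenced-range count gives 0 and 4096 respectively.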

Reported-by: Chris Murphy 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index f158daf9..fd176b76 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -4695,6 +4695,7 @@ static int check_file_extent(struct btrfs_root *root, 
struct btrfs_key *fkey,
u64 disk_bytenr;
u64 disk_num_bytes;
u64 extent_num_bytes;
+   u64 extent_offset;
u64 found;
unsigned int extent_type;
unsigned int is_hole;
@@ -4731,17 +4732,28 @@ static int check_file_extent(struct btrfs_root *root, 
struct btrfs_key *fkey,
disk_bytenr = btrfs_file_extent_disk_bytenr(node, fi);
disk_num_bytes = btrfs_file_extent_disk_num_bytes(node, fi);
extent_num_bytes = btrfs_file_extent_num_bytes(node, fi);
+   extent_offset = btrfs_file_extent_offset(node, fi);
is_hole = (disk_bytenr == 0) && (disk_num_bytes == 0);
 
-   /* Check EXTENT_DATA datasum */
-   ret = count_csum_range(root, disk_bytenr, disk_num_bytes, &found);
+   /*
+* Check EXTENT_DATA datasum
+*
+* We should only check the range we're referring to, as it's possible
+* that part of prealloc extent has been written, and has csum:
+*
+* |<--- Original large preallocate extent A >|
+* |<- Prealloc File Extent ->|<- Regular Extent ->|
+*  No csum Has csum
+*/
+   ret = count_csum_range(root, disk_bytenr + extent_offset,
+  extent_num_bytes, &found);
if (found > 0 && nodatasum) {
err |= ODD_CSUM_ITEM;
error("root %llu EXTENT_DATA[%llu %llu] nodatasum shouldn't 
have datasum",
  root->objectid, fkey->objectid, fkey->offset);
} else if (extent_type == BTRFS_FILE_EXTENT_REG && !nodatasum &&
   !is_hole &&
-  (ret < 0 || found == 0 || found < disk_num_bytes)) {
+  (ret < 0 || found == 0 || found < extent_num_bytes)) {
err |= CSUM_ITEM_MISSING;
error("root %llu EXTENT_DATA[%llu %llu] datasum missing",
  root->objectid, fkey->objectid, fkey->offset);
-- 
2.11.0





[PATCH 9/9] btrfs-progs: fsck: Fix lowmem mode override to allow it skip repair work

2017-01-23 Thread Qu Wenruo
From: Lu Fengqi 

Current common.local doesn't handle lowmem mode well.
It passes "--mode=lowmem" along with "--repair", making it unable to
check lowmem mode.

It's caused by the following bugs:

1) Wrong variable in tests/common.local
   We should check TEST_ARGS_CHECK, not TEST_CHECK, which is not defined
   so we never return 1.

2) Wrong parameter passed to _cmd_spec() in tests/common
   This prevents us from grepping the correct parameters.

Fix it.

Signed-off-by: Lu Fengqi 
---
 tests/common   | 8 
 tests/common.local | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tests/common b/tests/common
index 51c2e267..7ad436e3 100644
--- a/tests/common
+++ b/tests/common
@@ -106,7 +106,7 @@ run_check()
ins=$(_get_spec_ins "$@")
spec=$(($ins-1))
cmd=$(eval echo "\${$spec}")
-   spec=$(_cmd_spec "$cmd")
+   spec=$(_cmd_spec "${@:$spec}")
set -- "${@:1:$(($ins-1))}" $spec "${@: $ins}"
echo "### $@" >> "$RESULTS" 2>&1
if [[ $TEST_LOG =~ tty ]]; then echo "CMD: $@" > /dev/tty; fi
@@ -128,7 +128,7 @@ run_check_stdout()
ins=$(_get_spec_ins "$@")
spec=$(($ins-1))
cmd=$(eval echo "\${$spec}")
-   spec=$(_cmd_spec "$cmd")
+   spec=$(_cmd_spec "${@:$spec}")
set -- "${@:1:$(($ins-1))}" $spec "${@: $ins}"
echo "### $@" >> "$RESULTS" 2>&1
if [[ $TEST_LOG =~ tty ]]; then echo "CMD(stdout): $@" > /dev/tty; fi
@@ -152,7 +152,7 @@ run_mayfail()
ins=$(_get_spec_ins "$@")
spec=$(($ins-1))
cmd=$(eval echo "\${$spec}")
-   spec=$(_cmd_spec "$cmd")
+   spec=$(_cmd_spec "${@:$spec}")
set -- "${@:1:$(($ins-1))}" $spec "${@: $ins}"
echo "### $@" >> "$RESULTS" 2>&1
if [[ $TEST_LOG =~ tty ]]; then echo "CMD(mayfail): $@" > /dev/tty; fi
@@ -188,7 +188,7 @@ run_mustfail()
ins=$(_get_spec_ins "$@")
spec=$(($ins-1))
cmd=$(eval echo "\${$spec}")
-   spec=$(_cmd_spec "$cmd")
+   spec=$(_cmd_spec "${@:$spec}")
set -- "${@:1:$(($ins-1))}" $spec "${@: $ins}"
echo "### $@" >> "$RESULTS" 2>&1
if [[ $TEST_LOG =~ tty ]]; then echo "CMD(mustfail): $@" > /dev/tty; fi
diff --git a/tests/common.local b/tests/common.local
index 9f567c27..4f56bb08 100644
--- a/tests/common.local
+++ b/tests/common.local
@@ -17,7 +17,7 @@ TEST_ARGS_CHECK=--mode=lowmem
 # break tests
 _skip_spec()
 {
-   if echo "$TEST_CHECK" | grep -q 'mode=lowmem' &&
+   if echo "$TEST_ARGS_CHECK" | grep -q 'mode=lowmem' &&
   echo "$@" | grep -q -- '--repair'; then
return 0
fi
-- 
2.11.0





[PATCH 2/9] btrfs-progs: fsck-test: Add test image for lowmem mode block group false alert

2017-01-23 Thread Qu Wenruo
Add a minimal image which can reproduce the block group used space
false alert for lowmem mode fsck.

Reported-by: Christoph Anton Mitterer 
Signed-off-by: Qu Wenruo 
---
 .../block_group_item_false_alert.raw.xz | Bin 0 -> 47792 bytes
 tests/fsck-tests/020-extent-ref-cases/test.sh   |  15 +++
 2 files changed, 11 insertions(+), 4 deletions(-)
 create mode 100644 
tests/fsck-tests/020-extent-ref-cases/block_group_item_false_alert.raw.xz

diff --git 
a/tests/fsck-tests/020-extent-ref-cases/block_group_item_false_alert.raw.xz 
b/tests/fsck-tests/020-extent-ref-cases/block_group_item_false_alert.raw.xz
new file mode 100644
index 
..559c3fa9e8491f3ce1f424d1baef29853e8fb889
GIT binary patch
literal 47792
zcmeHwWmukDmL={If(LhZcL)S`cXxMpcPD`W!5xAV+%>qnyL)h$?&&{0(>=H9R^2;Q
z-Ti*Qd7ksJzrEL4dz~#W9G>c$ARrKXb9JI%AVi?JARr*`#??pO-s~W@bwEJuUf$mF
ze!PkCi=#Vo87MR+Qtsu2LO-1C;0I1K^l|foN>%#~MMlandk9peSuN#)cCI-~7;EQFhOF(7$wFo%;?(r9$c-B&8)gSv5^r*S24&1YlJ6HO@I5-X
zMm=+|4QW$f92}?pM0A{MA>fqJm@%6us;Ff@N+Zs65^bwee?I@KoP9G|&?dIU)1?`f
z4M(i5PN{Y3DY4oU7paaNp*_2dZ+Ar<+4GVa;iOf(R@$4vz3=WTw4%Jkc
z!0_XpyKSs!ByoH#K|!dqiZ7C;7K1l`aP_3bf$)5#L+LuAGa|IOgTifxCf#-T
z(LGUl?exf#!FLTA!OAMf+%4N%IaCqW+ls=Oz>+P64u$MzZO@a6fKyEAtKyH|ZzTJ1
z*|gv(!ggvf5O+#vp3dCF*euT_%Nqvn5fg>bjl!loHUf={<3BOM^fj`sHr(}n!w6Sx
z<4%=oG1p|C<=pua55w8;Mmk}y5>!*%yJm=d@BFM~bu>Z;Bl^OemlRq0vRWT*
z;f-!nqvd8#=lnh`q=6x=?6R^$o3BK>LJg(DBH#fXFhC{@k)T|K4-?>G0QV
zxf(2#%(dgu^iowEz
z)L6@2g|d8oc7Fl-k)%ZPE<3kpw;LQS*YWK$lN_o*K77Tb$paUxyWq22XaOtZ18eSQ
zMqILf0b$pH{Sp!{@No?e{5*~6gfJsqRO;@NV!iQb%Sfj<__U|Z=2
z4*U1Y;MNqn1oJ=IJV$_z3Lo1o9%;X6r?5EGr(Y>aT5{)bHdH?dp_c?*-yAO=J4$sO
z4u{Xr5V_oaV@Jg)vR3?ET?4?@fb!e2q_l*kFBY~
zwb_=-uWI5eely{$8k7NHJJPmz5Lxks?v4KVD_-+hH3gQoM(ClbqI`eGt`;QB
z!@yF-^OGWt^C-R+;m3znCBAzqV^|*lVQTC3dY+5kHhR5MWPT~rkMXETZ|{=s9o#kU
zyq#mj*g7HQPgqD}tJ@T>`_h6?C(FxC;xFQkx3qTiL-e=V9jyAeiXkK+*BPQ%%|m>5
zAz11uSfU_pm4sK*lMy3Ew
zIw5h-Xv@8+{3Jt^I{2ZLo(QynUBs`Dw@U?rp}-Jq4nqjUeMF9Su_z_K&3J;JIKR3h
zJnHIF?)oPi-@hgtNkxu`nmG|Rz64RdgxRQpIj9MESq-C|p2PY16{)6&!xYXr2DZw7
zfZ?sINZXU|!sX3GlmeYoA8^HwSh!NquhCp;j+(d^-Na5g!7|gGY9{S#coB|-V;9)d
zs&+3Hy=|YI`%Jzow$F^;F=I(EcMG$SaPF!q)$Mij6bC+oKHNRb+G5FE?du@w07C+5
zgQ~_Vho9B{-J9t{){ht_Tl;TCBzh;5!y%I?sgb_a30D*@ci5d!sWPcA3zf6+$h~Ar
z_qPe`OppatXLeooGGYjm2`Hr%`oH;0GN~(^X1%=wr2#o{QYw1+lTZ6yesd2$`j$c|
zid^#Sz4r8#kzTbkF625Wpwr}U`bYn$qZN<^YO>yfz=}ZlFJbpc6RGCYOt%lV%mo*s
zC4_j{paX6t?|9|*zL?oVd*UF_c3!Q9drg?vVEF2l@-$W_Y662zU=p$d>UDN~E6I
zYrw-e=Unf94wBWIE(;6O`H~559gnEDaexalAs>$sDKU<_laNa%f6Vu>?w2u*m?CG>`vojw-wps>>5Yiz$-4Tt~dqC+|C*u(W|;)pDn5VK@h26sZbjj~ufR-RwPba=ALKKpZxs8GDUAZUV~jLq>q
zL@Ml#ZV*$rg*Bf~)xY_r|tgr+QO)qHz=Fr4~zx0*(A3wm2%I^%Dp*iF(q
zlh7tgvq>7fh}FEL(D}As-EPcsI346#EGxseSh@dg9{OdhGKZezepni26VK0#$kZlSpmj2Hq!yrq798ZkQlFH*X_A*aB48vIwR6RhZvH88_D8wj?+Wqz
z^~{wDAiDqW3+Xq2<=>(kfW!au1_uoGJ1GxfFu-8Hn~MEWR|J4z|5+9SgybI`lE|P<
zARVs(pQwi++UeY2$9zDD5SH59g7}Mn3=kmm6EKY6JU+=

[PATCH 1/9] btrfs-progs: lowmem check: Fix wrong block group check when search slot points to slot beyong leaf

2017-01-23 Thread Qu Wenruo
Since btrfs_search_slot() can point to a slot which is beyond the
leaf's item count, in the following case, btrfs lowmem mode check will
skip the block group and report a false alert:

leaf 29405184 items 37 free space 1273 generation 11 owner 2
...
item 36 key (77594624 EXTENT_ITEM 2097152)
extent refs 1 gen 8 flags DATA
extent data backref root 5 objectid 265 offset 0 count 1
leaf 29409280 items 43 free space 670 generation 11 owner 2
item 0 key (96468992 EXTENT_ITEM 2097152)
extent refs 1 gen 8 flags DATA
extent data backref root 5 objectid 274 offset 0 count 1
item 1 key (96468992 BLOCK_GROUP_ITEM 33554432)
block group used 2265088 chunk_objectid 256 flags DATA

When checking a block group item, we search for key (96468992 0 0) to
start from the first item in the block group.

search_slot() will then point to leaf 29405184, slot 37, which is beyond
the leaf's last item.

When reading the key from slot 37, uninitialized data can be read out
and cause us to exit the block group item check, leading to a false alert.

Fix it by checking path.slots[0] before reading out the key.
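
The pattern can be sketched with a toy leaf. Everything here is hypothetical and much simpler than the real btrfs_path and extent_buffer handling: struct toy_leaf, toy_search_slot() and toy_first_key_from() exist only to show why the slot has to be compared against nritems before a key is read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEAF_CAPACITY 64

/* Toy leaf: only the first nritems entries of keys[] are valid. */
struct toy_leaf {
	uint32_t nritems;
	uint64_t keys[LEAF_CAPACITY];
	struct toy_leaf *next;	/* right sibling, or NULL */
};

/* First slot whose key is >= target; may equal nritems (one past the end). */
uint32_t toy_search_slot(const struct toy_leaf *leaf, uint64_t target)
{
	uint32_t slot = 0;

	assert(leaf->nritems <= LEAF_CAPACITY);
	while (slot < leaf->nritems && leaf->keys[slot] < target)
		slot++;
	return slot;
}

/*
 * Find the first key >= target, starting the search in leaf.
 * Returns 0 and stores the key in *key_ret, or -1 if there is no such key.
 */
int toy_first_key_from(const struct toy_leaf *leaf, uint64_t target,
		       uint64_t *key_ret)
{
	uint32_t slot = toy_search_slot(leaf, target);

	while (leaf) {
		/* Slot is past the last item: go to the next leaf, don't read it. */
		if (slot >= leaf->nritems) {
			leaf = leaf->next;
			slot = 0;
			continue;
		}
		*key_ret = leaf->keys[slot];
		return 0;
	}
	return -1;
}
```

Without the nritems check, the function would return whatever stale value sits in keys[slot] of the first leaf, which mirrors the uninitialized read described above.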

Reported-by: Christoph Anton Mitterer 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 1dba2985..c39392b7 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -10961,6 +10961,11 @@ static int check_block_group_item(struct btrfs_fs_info 
*fs_info,
/* Iterate extent tree to account used space */
while (1) {
leaf = path.nodes[0];
+
+   /* Search slot can point to the last item beyond leaf nritems */
+   if (path.slots[0] >= btrfs_header_nritems(leaf))
+   goto next;
+
btrfs_item_key_to_cpu(leaf, &extent_key, path.slots[0]);
if (extent_key.objectid >= bg_key.objectid + bg_key.offset)
break;
-- 
2.11.0





[PATCH 6/9] btrfs-progs: tests: Move fsck-tests/015 to fuzz tests

2017-01-23 Thread Qu Wenruo
The test case fsck-tests/015-check-bad-memory-access can't be repaired by
btrfs check; it is only thanks to a fortunate bug, which makes original mode
forget the error code from the extent tree, that original mode passes it.

So fuzz-tests is a more suitable place for it.

Signed-off-by: Qu Wenruo 
---
 .../images}/bko-97171-btrfs-image.raw.txt   |   0
 .../images}/bko-97171-btrfs-image.raw.xz| Bin
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename tests/{fsck-tests/015-check-bad-memory-access => 
fuzz-tests/images}/bko-97171-btrfs-image.raw.txt (100%)
 rename tests/{fsck-tests/015-check-bad-memory-access => 
fuzz-tests/images}/bko-97171-btrfs-image.raw.xz (100%)

diff --git 
a/tests/fsck-tests/015-check-bad-memory-access/bko-97171-btrfs-image.raw.txt 
b/tests/fuzz-tests/images/bko-97171-btrfs-image.raw.txt
similarity index 100%
rename from 
tests/fsck-tests/015-check-bad-memory-access/bko-97171-btrfs-image.raw.txt
rename to tests/fuzz-tests/images/bko-97171-btrfs-image.raw.txt
diff --git 
a/tests/fsck-tests/015-check-bad-memory-access/bko-97171-btrfs-image.raw.xz 
b/tests/fuzz-tests/images/bko-97171-btrfs-image.raw.xz
similarity index 100%
rename from 
tests/fsck-tests/015-check-bad-memory-access/bko-97171-btrfs-image.raw.xz
rename to tests/fuzz-tests/images/bko-97171-btrfs-image.raw.xz
-- 
2.11.0





[PATCH 7/9] btrfs-progs: fsck-tests: Make 013 compatible with lowmem mode

2017-01-23 Thread Qu Wenruo
fsck-tests/013-extent-tree-rebuild uses "--init-extent-tree", which
implies "--repair".

But the test script doesn't pass "--repair" explicitly, so the lowmem
mode override in the test framework can't detect that this is a repair
run.

Pass it explicitly so the test works with the lowmem mode override.

Signed-off-by: Qu Wenruo 
---
 tests/fsck-tests/013-extent-tree-rebuild/test.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/fsck-tests/013-extent-tree-rebuild/test.sh b/tests/fsck-tests/013-extent-tree-rebuild/test.sh
index 37bdcd9c..08c1e50e 100755
--- a/tests/fsck-tests/013-extent-tree-rebuild/test.sh
+++ b/tests/fsck-tests/013-extent-tree-rebuild/test.sh
@@ -36,7 +36,7 @@ test_extent_tree_rebuild()
 
$SUDO_HELPER $TOP/btrfs check $TEST_DEV >& /dev/null && \
_fail "btrfs check should detect failure"
-   run_check $SUDO_HELPER $TOP/btrfs check --init-extent-tree $TEST_DEV
+   run_check $SUDO_HELPER $TOP/btrfs check --repair --init-extent-tree $TEST_DEV
run_check $SUDO_HELPER $TOP/btrfs check $TEST_DEV
 }
 
-- 
2.11.0





[PATCH 8/9] btrfs-progs: fsck-tests: Add new test case for partly written prealloc extent

2017-01-23 Thread Qu Wenruo
This covers a bug found in lowmem mode, which reports a false alert for
a partly written prealloc extent.

Reported-by: Chris Murphy 
Signed-off-by: Qu Wenruo 
---
 tests/fsck-tests/020-extent-ref-cases/test.sh | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/tests/fsck-tests/020-extent-ref-cases/test.sh b/tests/fsck-tests/020-extent-ref-cases/test.sh
index 5dc5e55d..91340671 100755
--- a/tests/fsck-tests/020-extent-ref-cases/test.sh
+++ b/tests/fsck-tests/020-extent-ref-cases/test.sh
@@ -18,6 +18,7 @@
 source $TOP/tests/common
 
 check_prereq btrfs
+check_global_prereq xfs_io
 
 for img in *.img *.raw.xz
 do
@@ -28,3 +29,17 @@ do
run_check $TOP/btrfs check "$image"
rm -f "$image"
 done
+
+# Extra test case for partly written prealloc extents.
+test_prealloc_written()
+{
+   run_check $SUDO_HELPER $TOP/mkfs.btrfs -f $TEST_DEV
+
+   run_check_mount_test_dev
+   xfs_io -f -c "falloc 0 128k" -c "syncfs" $TEST_MNT/tmpfile
+   xfs_io -c "pwrite 0 64k" $TEST_MNT/tmpfile
+
+   run_check_umount_test_dev
+
+   run_check $TOP/btrfs check $TEST_DEV
+}
-- 
2.11.0





[PATCH 0/9] Lowmem mode fsck fixes with fsck-tests framework update

2017-01-23 Thread Qu Wenruo
Patches can be fetched from GitHub:
https://github.com/adam900710/btrfs-progs/tree/lowmem_fixes

Although there are nearly 10 patches, they are all small.
No need to be scared. :)

Thanks to reports from Chris Murphy and Christoph Anton Mitterer,
several new bugs were exposed in lowmem mode fsck.

There is also one original mode bug, which is not fixed in this patchset.
It is exposed by fsck-tests/006: repair doesn't fix the backrefs of a
data extent, which lowmem mode detects correctly.

1) Block group used space false alert
   If a BLOCK_GROUP_ITEM or its first EXTENT/METADATA_ITEM is located at
   the first slot of a leaf, search_slot() used by lowmem mode can
   point to previous leaf, with path->slots[0] beyond valid leaf slots.

   This makes us read uninitialized data, which can abort the block
   group used space check loop and cause a false alert.

   Fix it, with a test case image inside fsck-tests/020-extent-ref-cases.
   Reported by Christoph.

2) Partly written prealloc extent false alert
   If a prealloc extent gets partly written, lowmem mode will report
   that a prealloc extent shouldn't have csum.

   Lowmem mode passed the wrong variable to the csum checking code,
   causing it to check the whole range of the prealloc extent, which
   triggers the bug.

   Fix it, with a test case inside fsck-tests/020-extent-ref-cases.
   Reported by Chris Murphy and Christoph.

3) Extent item size false alert.
   In certain cases, btrfs lowmem mode check reports a lost data
   backref.
   That's because the newly introduced extent item size check aborts the
   normal check routine.

   It can happen if a data/metadata extent item has no inline ref.

   Fix it. The test case was already submitted and merged, but due to
   fsck-tests framework bugs, it never got called for lowmem mode.

4) fsck-tests Lowmem mode override fixes
   Allow lowmem mode override to get called for all tests, and allow
   them all to pass lowmem mode except fsck-tests/006, which is an
   original repair mode bug.
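
Item 1 above (a search landing one slot past the end of the previous leaf) can be illustrated with a toy model. This is only a sketch and not the code from the patch: `toy_leaf`, `toy_path` and `toy_normalize_slot()` are hypothetical stand-ins for the real extent buffer and path structures. The point is that a slot index must be compared against the leaf's item count, and the walker must step to the next leaf, before any item is read.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for a b-tree leaf and a search path; not btrfs-progs types. */
struct toy_leaf {
	int nritems;                 /* number of valid slots in this leaf */
	unsigned long long keys[4];  /* sorted key objectids */
	struct toy_leaf *next;       /* right sibling leaf, or NULL */
};

struct toy_path {
	struct toy_leaf *leaf;
	int slot;
};

/*
 * A search may return slot == nritems, i.e. one past the last valid item.
 * Step to slot 0 of the following leaf before reading anything.
 * Returns 0 if the path now points at a valid item, 1 if no items are left.
 */
static int toy_normalize_slot(struct toy_path *path)
{
	assert(path != NULL && path->slot >= 0);

	while (path->leaf && path->slot >= path->leaf->nritems) {
		path->leaf = path->leaf->next;
		path->slot = 0;
	}
	return path->leaf ? 0 : 1;
}
```

Calling `toy_normalize_slot()` right after a search, and only then reading `leaf->keys[slot]`, avoids interpreting whatever bytes happen to lie beyond the last valid slot.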


Lu Fengqi (1):
  btrfs-progs: fsck: Fix lowmem mode override to allow it skip repair
work

Qu Wenruo (8):
  btrfs-progs: lowmem check: Fix wrong block group check when search
slot points to slot beyong leaf
  btrfs-progs: fsck-test: Add test image for lowmem mode block group
false alert
  btrfs-progs: fsck: Output verbose error when fsck found a bug
  btrfs-progs: lowmem check: Fix false alert in checking data extent
pointing to prealloc extent
  btrfs-progs: lowmem check: Fix extent item size false alert
  btrfs-progs: tests: Move fsck-tests/015 to fuzz tests
  btrfs-progs: fsck-tests: Make 013 compatible with lowmem mode
  btrfs-progs: fsck-tests: Add new test case for partly written prealloc
extent

 cmds-check.c   |  80 -
 tests/common   |   8 +--
 tests/common.local |   2 +-
 tests/fsck-tests/013-extent-tree-rebuild/test.sh   |   2 +-
 .../block_group_item_false_alert.raw.xz| Bin 0 -> 47792 bytes
 tests/fsck-tests/020-extent-ref-cases/test.sh  |  30 ++--
 .../images}/bko-97171-btrfs-image.raw.txt  |   0
 .../images}/bko-97171-btrfs-image.raw.xz   | Bin
 8 files changed, 95 insertions(+), 27 deletions(-)
 create mode 100644 tests/fsck-tests/020-extent-ref-cases/block_group_item_false_alert.raw.xz
 rename tests/{fsck-tests/015-check-bad-memory-access => fuzz-tests/images}/bko-97171-btrfs-image.raw.txt (100%)
 rename tests/{fsck-tests/015-check-bad-memory-access => fuzz-tests/images}/bko-97171-btrfs-image.raw.xz (100%)

-- 
2.11.0





[PATCH 5/9] btrfs-progs: lowmem check: Fix extent item size false alert

2017-01-23 Thread Qu Wenruo
If an extent item has no inline ref, btrfs lowmem mode check can give a
false alert without outputting any error message.

The problem is that lowmem mode always assumes an extent item has inline
refs. When it encounters an item without them, it flags the extent item
as having the wrong size, but doesn't output an error message.

Although a test image for this case was already submitted, at commit time
another bug in the cmds-check return value kept the test from detecting
it until that bug was fixed.

Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index fd176b76..802d179f 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -10730,13 +10730,20 @@ static int check_extent_item(struct btrfs_fs_info *fs_info,
}
end = (unsigned long)ei + item_size;
 
-   if (ptr >= end) {
+next:
+   /* Reached extent item end normally */
+   if (ptr == end)
+   goto out;
+
+   /* Beyond extent item end, wrong item size */
+   if (ptr > end) {
err |= ITEM_SIZE_MISMATCH;
+   error("extent item at bytenr %llu slot %d has wrong size",
+   eb->start, slot);
goto out;
}
 
/* Now check every backref in this extent item */
-next:
iref = (struct btrfs_extent_inline_ref *)ptr;
type = btrfs_extent_inline_ref_type(eb, iref);
offset = btrfs_extent_inline_ref_offset(eb, iref);
@@ -10773,8 +10780,7 @@ next:
}
 
ptr += btrfs_extent_inline_ref_size(type);
-   if (ptr < end)
-   goto next;
+   goto next;
 
 out:
return err;
-- 
2.11.0
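
The control flow this patch moves to (stop cleanly when the cursor lands exactly on the item end, report a size mismatch only when it overshoots) can be sketched on a toy record format. Everything here is hypothetical: the record types, their sizes and the `toy_*` names are invented for illustration and are not btrfs structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record types; the first byte of each record is its type. */
enum { TOY_REF_SMALL = 1, TOY_REF_LARGE = 2 };

#define TOY_ERR_SIZE_MISMATCH 1
#define TOY_ERR_UNKNOWN_TYPE  2

/* Size in bytes of a record of the given type, or 0 if the type is unknown. */
static size_t toy_ref_size(unsigned char type)
{
	switch (type) {
	case TOY_REF_SMALL: return 4;
	case TOY_REF_LARGE: return 8;
	default:            return 0;
	}
}

/*
 * Walk variable-sized records packed into buf[0..len).
 * pos == len: every record fit exactly, including the "no records" case.
 * pos >  len: the last record ran past the end, so the size is wrong.
 * *nr_refs counts the records whose type byte was read.
 */
static int toy_walk_refs(const unsigned char *buf, size_t len, int *nr_refs)
{
	size_t pos = 0;

	assert(nr_refs != NULL && (buf != NULL || len == 0));
	*nr_refs = 0;
	for (;;) {
		size_t size;

		if (pos == len)
			return 0;
		if (pos > len)
			return TOY_ERR_SIZE_MISMATCH;
		size = toy_ref_size(buf[pos]);
		if (size == 0)
			return TOY_ERR_UNKNOWN_TYPE;
		(*nr_refs)++;
		pos += size;
	}
}
```

With the earlier `>=` style check, an empty buffer (an item with no inline ref) would have been reported as a size mismatch, which is the false alert described in the commit message.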





[PATCH 3/9] btrfs-progs: fsck: Output verbose error when fsck found a bug

2017-01-23 Thread Qu Wenruo
Although we output an error like "errors found in extent allocation tree or
chunk allocation", we lack such output for other trees, leaving only the
final "err is %d" to report the last return value (and sometimes it's
cleared).

This patch adds an extra error message to each top-level error path, and
changes the final "err is %d" to "error(s) found" or "no error found".

Cc: Christoph Anton Mitterer 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 43 +--
 1 file changed, 33 insertions(+), 10 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index c39392b7..f158daf9 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -12913,8 +12913,10 @@ int cmd_check(int argc, char **argv)
 
ret = repair_root_items(info);
err |= !!ret;
-   if (ret < 0)
+   if (ret < 0) {
+   error("failed to repair root items: %s", strerror(-ret));
goto close_out;
+   }
if (repair) {
fprintf(stderr, "Fixed %d roots.\n", ret);
ret = 0;
@@ -12937,8 +12939,13 @@ int cmd_check(int argc, char **argv)
}
ret = check_space_cache(root);
err |= !!ret;
-   if (ret)
+   if (ret) {
+   if (btrfs_fs_compat_ro(info, FREE_SPACE_TREE))
+   error("errors found in free space tree");
+   else
+   error("errors found in free space cache");
goto out;
+   }
 
/*
 * We used to have to have these hole extents in between our real
@@ -12954,22 +12961,28 @@ int cmd_check(int argc, char **argv)
else
ret = check_fs_roots(root, &root_cache);
err |= !!ret;
-   if (ret)
+   if (ret) {
+   error("errors found in fs roots");
goto out;
+   }
 
fprintf(stderr, "checking csums\n");
ret = check_csums(root);
err |= !!ret;
-   if (ret)
+   if (ret) {
+   error("errors found in csum tree");
goto out;
+   }
 
fprintf(stderr, "checking root refs\n");
/* For low memory mode, check_fs_roots_v2 handles root refs */
if (check_mode != CHECK_MODE_LOWMEM) {
ret = check_root_refs(root, &root_cache);
err |= !!ret;
-   if (ret)
+   if (ret) {
+   error("errors found in root refs");
goto out;
+   }
}
 
while (repair && !list_empty(&root->fs_info->recow_ebs)) {
@@ -12980,8 +12993,10 @@ int cmd_check(int argc, char **argv)
list_del_init(&eb->recow);
ret = recow_extent_buffer(root, eb);
err |= !!ret;
-   if (ret)
+   if (ret) {
+   error("fails to fix transid errors");
break;
+   }
}
 
while (!list_empty(&delete_items)) {
@@ -13000,13 +13015,17 @@ int cmd_check(int argc, char **argv)
fprintf(stderr, "checking quota groups\n");
ret = qgroup_verify_all(info);
err |= !!ret;
-   if (ret)
+   if (ret) {
+   error("failed to check quota groups");
goto out;
+   }
report_qgroups(0);
ret = repair_qgroups(info, &qgroups_repaired);
err |= !!ret;
-   if (err)
+   if (err) {
+   error("failed to repair quota groups");
goto out;
+   }
ret = 0;
}
 
@@ -13027,8 +13046,12 @@ out:
   "backup data and re-format the FS. *\n\n");
err |= 1;
}
-   printf("found %llu bytes used err is %d\n",
-  (unsigned long long)bytes_used, ret);
+   printf("found %llu bytes used, ",
+  (unsigned long long)bytes_used);
+   if (err)
+   printf("error(s) found\n");
+   else
+   printf("no error found\n");
printf("total csum bytes: %llu\n",(unsigned long long)total_csum_bytes);
printf("total tree bytes: %llu\n",
   (unsigned long long)total_btree_bytes);
-- 
2.11.0
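
Why the final summary is keyed on `err` rather than on the last `ret` can be shown with a small sketch. It is a simplified model under stated assumptions: `toy_run_steps()` is a hypothetical helper, and unlike cmd_check() it keeps running after a failed step, so it only demonstrates the `err |= !!ret` accumulation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Run a sequence of check steps, given here as their return values.
 * Returns 1 if any step failed and 0 otherwise. *last_ret receives the
 * return value of the final step only, which is what the old summary
 * line printed.
 */
static int toy_run_steps(const int *step_rets, size_t nr, int *last_ret)
{
	int err = 0;
	int ret = 0;
	size_t i;

	assert(last_ret != NULL && (step_rets != NULL || nr == 0));
	for (i = 0; i < nr; i++) {
		ret = step_rets[i];
		err |= !!ret;   /* remember that something failed */
	}
	*last_ret = ret;
	return err;
}
```

If an early step fails and a later one succeeds, `*last_ret` ends up 0 while the accumulated `err` is still 1, so a summary printed from the last return value alone would hide the failure.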





Re: RAID56 status?

2017-01-23 Thread Janos Toth F.
On Mon, Jan 23, 2017 at 7:57 AM, Brendan Hide  wrote:
>
> raid0 stripes data in 64k chunks (I think this size is tunable) across all
> devices, which is generally far faster in terms of throughput in both
> writing and reading data.

I remember seeing some proposals for configurable stripe size in the
form of patches (which changed a lot over time) but I don't think the
idea reached a consensus (let alone if a final patch materialized and
got merged). I think it would be a nice feature though.


Re: RAID56 status?

2017-01-23 Thread Brendan Hide


Hey, all

Long-time lurker/commenter here. Production-ready RAID5/6 and N-way 
mirroring are the two features I've been anticipating most, so I've 
commented regularly when this sort of thing pops up. :)


I'm only addressing some of the RAID-types queries as Qu already has a 
handle on the rest.


Small-yet-important hint: If you don't have a backup of it, it isn't 
important.


On 01/23/2017 02:25 AM, Jan Vales wrote:

[ snip ]
> Correct me if I'm wrong...
> * It seems raid1(btrfs) is actually raid10, as there are no more than 2
> copies of data, regardless of the count of devices.


The original "definition" of raid1 is two mirrored devices. The *nix 
industry standard implementation (mdadm) extends this to any number of 
mirrored devices. Thus confusion here is understandable.



> ** Is there a way to duplicate data n-times?


This is a planned feature, especially for the sake of feature parity with 
mdadm, though the priority isn't particularly high right now. This has 
been referred to as "N-way mirroring". The last time I recall discussion 
of this, the hope was to start work on it once raid5/6 was stable.



> ** If there are only 3 devices and the wrong device dies... is it dead?


Qu has the right answers. Generally if you're using anything other than 
dup, raid0, or single, one disk failure is "okay". More than one failure 
is closer to "undefined". The exception is RAID6, where you need more 
than two disk failures before you lose data.



> * What's the difference between raid1(btrfs) and raid10(btrfs)?


Some nice illustrations from Qu there. :)


> ** After reading like 5 different wiki pages, I understood that there
> are differences ... but not what they are and how they affect me :/
> * What's the difference between raid0(btrfs) and "normal" multi-device
> operation, which seems like a traditional raid0 to me?


raid0 stripes data in 64k chunks (I think this size is tunable) across 
all devices, which is generally far faster in terms of throughput in 
both writing and reading data.
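
As a rough illustration of the striping arithmetic, the sketch below maps a logical byte offset to a device index and an offset on that device for a plain round-robin raid0 layout. It assumes the 64 KiB stripe length mentioned above and ignores chunk allocation entirely; `toy_raid0_map()` and `TOY_STRIPE_LEN` are hypothetical names, not btrfs code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stripe length, taken from the 64k figure above. */
#define TOY_STRIPE_LEN (64 * 1024ULL)

struct toy_stripe_loc {
	unsigned dev;         /* index of the device holding the byte */
	uint64_t dev_offset;  /* byte offset within that device's stripes */
};

/* Round-robin raid0 mapping of a logical offset across num_devs devices. */
static struct toy_stripe_loc toy_raid0_map(uint64_t offset, unsigned num_devs)
{
	struct toy_stripe_loc loc;
	uint64_t stripe_nr;

	assert(num_devs > 0);
	stripe_nr = offset / TOY_STRIPE_LEN;
	loc.dev = (unsigned)(stripe_nr % num_devs);
	loc.dev_offset = (stripe_nr / num_devs) * TOY_STRIPE_LEN +
			 offset % TOY_STRIPE_LEN;
	return loc;
}
```

Consecutive 64 KiB pieces of a large read or write land on different devices, which is where the throughput gain comes from.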


By '"normal" multi-device' I will assume this means "single" with 
multiple devices. New writes with "single" will use a 1GB chunk on one 
device until the chunk is full, at which point it allocates a new chunk, 
which will usually be put on the disk with the most available free 
space. There is no particular optimisation in place comparable to raid0 
here.




> Maybe rename/alias raid-levels that do not match traditional
> raid-levels, so one cannot expect some behavior that is not there.
>
> The extreme example is imho raid1(btrfs) vs raid1.
> I would expect that if I have 5 btrfs-raid1-devices, 4 may die and btrfs
> should be able to fully recover, which, if I understand correctly, by
> far does not hold.
> If you named that raid-level say "george" ... I would need to consult
> the docs and I obviously would not expect any behavior. :)


We've discussed this a couple of times. Hugo came up with a notation 
since dubbed "csp" notation: c->Copies, s->Stripes, and p->Parities.


Examples of this would be:
raid1: 2c
3-way mirroring across 3 (or more*) devices: 3c
raid0 (2-or-more-devices): 2s
raid0 (3-or-more): 3s
raid5 (5-or-more): 4s1p
raid16 (12-or-more): 2c4s2p

* note the "or more": mdadm *cannot* use fewer mirrors or stripes than 
devices, whereas there is no particular reason why btrfs won't be able 
to do this.
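
The notation is regular enough to parse mechanically. The sketch below is a hypothetical helper, not anything in btrfs-progs. It assumes a missing "c" or "s" term means 1 and a missing "p" term means 0, and it derives the minimum device count as copies * (stripes + parities). That formula is inferred from the examples above (it reproduces all six of them); it is not an official definition.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

struct toy_csp {
	unsigned long copies;
	unsigned long stripes;
	unsigned long parities;
};

/*
 * Parse strings such as "2c", "4s1p" or "2c4s2p".
 * Each term is a positive number followed by 'c', 's' or 'p'.
 * Term order and duplicates are not checked in this sketch.
 * Returns 0 on success, -1 on malformed input.
 */
static int toy_parse_csp(const char *str, struct toy_csp *out)
{
	assert(str != NULL && out != NULL);

	out->copies = 1;
	out->stripes = 1;
	out->parities = 0;
	if (*str == '\0')
		return -1;
	while (*str) {
		char *end;
		unsigned long n;

		if (!isdigit((unsigned char)*str))
			return -1;
		n = strtoul(str, &end, 10);
		if (n == 0)
			return -1;
		switch (*end) {
		case 'c': out->copies = n; break;
		case 's': out->stripes = n; break;
		case 'p': out->parities = n; break;
		default:  return -1;
		}
		str = end + 1;
	}
	return 0;
}

/* Minimum number of devices, inferred from the examples in this mail. */
static unsigned long toy_csp_min_devices(const struct toy_csp *csp)
{
	return csp->copies * (csp->stripes + csp->parities);
}
```

For instance, parsing "2c4s2p" yields copies 2, stripes 4 and parities 2, and the helper returns 12, matching the raid16 line above.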


A minor problem with csp notation is that it implies a complete 
implementation of *any* combination of these, whereas the idea was 
simply to create a way to refer to the "raid" levels in a consistent way.


I hope this brings some clarity. :)



> regards,
> Jan Vales
> --
> I only read plaintext emails.



--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97