Re: Have 15GB missing in btrfs filesystem.

2018-10-27 Thread Marc MERLIN
On Sun, Oct 28, 2018 at 07:27:22AM +0800, Qu Wenruo wrote: > > I can't drop all the snapshots since at least two is used for btrfs > > send/receive backups. > > However, if I delete more snapshots, and do a full balance, you think > > it'll free up more space? > > No. > > You're already too

Re: Have 15GB missing in btrfs filesystem.

2018-10-27 Thread Marc MERLIN
On Sat, Oct 27, 2018 at 02:12:02PM -0400, Remi Gauvin wrote: > On 2018-10-27 01:42 PM, Marc MERLIN wrote: > > > > > I've been using btrfs for a long time now but I've never had a > > filesystem where I had 15GB apparently unusable (7%) after a balance. > > >

Re: Have 15GB missing in btrfs filesystem.

2018-10-27 Thread Marc MERLIN
On Wed, Oct 24, 2018 at 01:07:25PM +0800, Qu Wenruo wrote: > > saruman:/mnt/btrfs_pool1# btrfs balance start -musage=80 -v . > > Dumping filters: flags 0x6, state 0x0, force is off > > METADATA (flags 0x2): balancing, usage=80 > > SYSTEM (flags 0x2): balancing, usage=80 > > Done, had to

Have 15GB missing in btrfs filesystem.

2018-10-23 Thread Marc MERLIN
Normally btrfs fi show will show lost space because your trees aren't balanced. Balance usually reclaims that space, or most of it. In this case, not so much. kernel 4.17.6: saruman:/mnt/btrfs_pool1# btrfs fi show . Label: 'btrfs_pool1' uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23

Re: btrfs check (not lowmem) and OOM-like hangs (4.17.6)

2018-07-18 Thread Marc MERLIN
On Wed, Jul 18, 2018 at 10:42:21PM +0300, Andrei Borzenkov wrote: > > Any help from other experienced developers would definitely help to > > solve why memory of 'btrfs check' is not swapped out or why OOM killer > > is not triggered. > > Almost all used memory is marked as "active" and active

Re: btrfs check (not lowmem) and OOM-like hangs (4.17.6)

2018-07-17 Thread Marc MERLIN
On Wed, Jul 18, 2018 at 08:05:51AM +0800, Qu Wenruo wrote: > No OOM triggers? That's a little strange. > Maybe it's related to how kernel handles memory over-commit? Yes, I think you are correct. > And for the hang, I think it's related to some memory allocation failure > and error handler just

Re: btrfs check (not lowmem) and OOM-like hangs (4.17.6)

2018-07-17 Thread Marc MERLIN
Ok, I did more testing. Qu is right that btrfs check does not crash the kernel. It just takes all the memory until linux hangs everywhere, and somehow (no idea why) the OOM killer never triggers. Details below: On Tue, Jul 17, 2018 at 01:32:57PM -0700, Marc MERLIN wrote: > Here is what I

btrfs check (not lowmem) and OOM-like hangs (4.17.6)

2018-07-17 Thread Marc MERLIN
On Tue, Jul 17, 2018 at 10:50:32AM -0700, Marc MERLIN wrote: > I got the following on 4.17.6 while running btrfs check --repair on an > unmounted filesystem (not the lowmem version) > > I understand that btrfs check is userland only, although it seems that > it caused

task btrfs-transacti:921 blocked for more than 120 seconds during check repair

2018-07-17 Thread Marc MERLIN
I got the following on 4.17.6 while running btrfs check --repair on an unmounted filesystem (not the lowmem version) I understand that btrfs check is userland only, although it seems that it caused these FS hangs on a different filesystem (the trace of course does not provide info on which FS)

Re: Why original mode doesn't use swap? (Original: Re: btrfs check lowmem, take 2)

2018-07-12 Thread Marc MERLIN
On Thu, Jul 12, 2018 at 01:26:41PM +0800, Qu Wenruo wrote: > > > On 2018年07月12日 01:09, Chris Murphy wrote: > > On Tue, Jul 10, 2018 at 12:09 PM, Marc MERLIN wrote: > >> Thanks to Su and Qu, I was able to get my filesystem to a point that > >> it's mou

Re: btrfs check mode normal still hard crash-hanging systems

2018-07-11 Thread Marc MERLIN
On Wed, Jul 11, 2018 at 11:09:56AM -0600, Chris Murphy wrote: > On Tue, Jul 10, 2018 at 12:09 PM, Marc MERLIN wrote: > > Thanks to Su and Qu, I was able to get my filesystem to a point that > > it's mountable. > > I then deleted loads of snapshots and I'm down to 26. >

Re: btrfs check lowmem, take 2

2018-07-10 Thread Marc MERLIN
On Wed, Jul 11, 2018 at 12:07:05PM +0800, Su Yue wrote: > > So, I went back to https://github.com/Damenly/btrfs-progs.git/tmp1 and > > I'm running it without the extra options you added with hardcoded stuff: > > gargamel:/var/local/src/btrfs-progs.sy-test# ./btrfsck --mode=lowmem > > --repair

Re: btrfs check lowmem, take 2

2018-07-10 Thread Marc MERLIN
On Wed, Jul 11, 2018 at 09:58:36AM +0800, Su Yue wrote: > > > On 07/11/2018 09:44 AM, Marc MERLIN wrote: > > On Wed, Jul 11, 2018 at 09:08:40AM +0800, Su Yue wrote: > > > > > > > > > On 07/11/2018 08:58 AM, Marc MERLIN wrote: > > > >

Re: btrfs check lowmem, take 2

2018-07-10 Thread Marc MERLIN
On Wed, Jul 11, 2018 at 09:08:40AM +0800, Su Yue wrote: > > > On 07/11/2018 08:58 AM, Marc MERLIN wrote: > > On Wed, Jul 11, 2018 at 08:53:58AM +0800, Su Yue wrote: > > > > Problems > > > > 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes

Re: btrfs check lowmem, take 2

2018-07-10 Thread Marc MERLIN
On Wed, Jul 11, 2018 at 08:53:58AM +0800, Su Yue wrote: > > Problems > > 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the > > server, despite my deleting lots of snapshots. > > Is it because I have too many files then? > > > Yes. Original check first gathers all information

btrfs check lowmem, take 2

2018-07-10 Thread Marc MERLIN
Thanks to Su and Qu, I was able to get my filesystem to a point that it's mountable. I then deleted loads of snapshots and I'm down to 26. It now looks like this: gargamel:~# btrfs fi show /mnt/mnt Label: 'dshelf2' uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d Total devices 1 FS bytes used

Re: So, does btrfs check lowmem take days? weeks?

2018-07-09 Thread Marc MERLIN
with responses to Qu On Tue, Jul 10, 2018 at 09:09:33AM +0800, Qu Wenruo wrote: > > > On 2018年07月10日 01:48, Marc MERLIN wrote: > > Success! > > Well done Su, this is a huge improvement to the lowmem code. It went from > > days to less than 3 hours. > > Awesome work!

Re: So, does btrfs check lowmem take days? weeks?

2018-07-09 Thread Marc MERLIN
On Tue, Jul 10, 2018 at 09:34:36AM +0800, Qu Wenruo wrote: > Ok, this is where I am now: > WARNING: debug: end of checking extent item[18457780273152 169 1] > type: 176 offset: 2 > checking extent items [18457780273152/18457780273152] > ERROR: errors found in extent

Re: So, does btrfs check lowmem take days? weeks?

2018-07-03 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 03:46:59PM -0600, Chris Murphy wrote: > On Tue, Jul 3, 2018 at 2:50 AM, Qu Wenruo wrote: > > > > > > There must be something wrong, however due to the size of the fs, and > > the complexity of extent tree, I can't tell. > > Right, which is why I'm asking if any of the

Re: So, does btrfs check lowmem take days? weeks?

2018-07-03 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 03:34:45PM -0600, Chris Murphy wrote: > On Tue, Jul 3, 2018 at 2:34 AM, Su Yue wrote: > > > Yes, extent tree is the hardest part for lowmem mode. I'm quite > > confident the tool can deal well with file trees(which records metadata > > about file and directory name,

Re: So, does btrfs check lowmem take days? weeks?

2018-07-03 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 04:50:48PM +0800, Qu Wenruo wrote: > > It sounds like there may not be a fix to this problem with the filesystem's > > design, outside of "do not get there, or else". > > It would even be useful for btrfs tools to start computing heuristics and > > output warnings like "you

Re: how to best segment a big block device in resizeable btrfs filesystems?

2018-07-02 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 04:26:37AM +, Paul Jones wrote: > I don't have any experience with this, but since it's the internet let me > tell you how I'd do it anyway  That's the spirit :) > raid5 > dm-crypt > lvm (using thin provisioning + cache) > btrfs > > The cache mode on lvm requires

Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 06:31:43PM -0600, Chris Murphy wrote: > So the idea behind journaled file systems is that journal replay > enabled mount time "repair" that's faster than an fsck. Already Btrfs > use cases with big, but not huge, file systems makes btrfs check a > problem. Either running

Re: how to best segment a big block device in resizeable btrfs filesystems?

2018-07-02 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 09:37:47AM +0800, Qu Wenruo wrote: > > If I do this, I would have > > software raid 5 < dmcrypt < bcache < lvm < btrfs > > That's a lot of layers, and that's also starting to make me nervous :) > > If you could keep the number of snapshots to minimal (less than 10) for >

Re: how to best segment a big block device in resizeable btrfs filesystems?

2018-07-02 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 12:51:30AM +, Paul Jones wrote: > You could combine bcache and lvm if you are happy to use dm-cache instead > (which lvm uses). > I use it myself (but without thin provisioning) and it works well. Interesting point. So, I used to use lvm and then lvm2 many years ago

Re: how to best segment a big block device in resizeable btrfs filesystems?

2018-07-02 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 02:35:19PM -0400, Austin S. Hemmelgarn wrote: > >I kind of linked the thin provisioning idea because it's hands off, > >which is appealing. Any reason against it? > No, not currently, except that it adds a whole lot more stuff between > BTRFS and whatever layer is below

Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 10:33:09PM +0500, Roman Mamedov wrote: > On Mon, 2 Jul 2018 08:19:03 -0700 > Marc MERLIN wrote: > > > I actually have fewer snapshots than this per filesystem, but I backup > > more than 10 filesystems. > > If I used as many snapshots as you re

Re: how to best segment a big block device in resizeable btrfs filesystems?

2018-07-02 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 12:59:02PM -0400, Austin S. Hemmelgarn wrote: > > Am I supposed to put LVM thin volumes underneath so that I can share > > the same single 10TB raid5? > > Actually, because of the online resize ability in BTRFS, you don't > technically _need_ to use thin provisioning here.

Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Marc MERLIN
Hi Qu, thanks for the detailed and honest answer. A few comments inline. On Mon, Jul 02, 2018 at 10:42:40PM +0800, Qu Wenruo wrote: > For full, it depends. (but for most real world case, it's still flawed) > We have small and crafted images as test cases, which btrfs check can > repair without

Re: how to best segment a big block device in resizeable btrfs filesystems?

2018-07-02 Thread Marc MERLIN
Hi Qu, I'll split this part into a new thread: > 2) Don't keep unrelated snapshots in one btrfs. > I totally understand that maintaining different btrfs would hugely add > maintenance pressure, but as explained, all snapshots share one > fragile extent tree. Yes, I understand that this is

Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 02:22:20PM +0800, Su Yue wrote: > > Ok, that's 29MB, so it doesn't fit on pastebin: > > http://marc.merlins.org/tmp/dshelf2_inspect.txt > > > Sorry Marc. After offline communication with Qu, both > of us think the filesystem is hard to repair. > The filesystem is too large

Re: So, does btrfs check lowmem take days? weeks?

2018-07-01 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 10:02:33AM +0800, Su Yue wrote: > Could you try the following dumps? They shouldn't cost much time. > > #btrfs inspect dump-tree -t 21872 | grep -C 50 "374857 > EXTENT_DATA " > > #btrfs inspect dump-tree -t 22911 | grep -C 50 "374857 > EXTENT_DATA " Ok, that's 29MB, so it

Re: So, does btrfs check lowmem take days? weeks?

2018-07-01 Thread Marc MERLIN
On Thu, Jun 28, 2018 at 11:43:54PM -0700, Marc MERLIN wrote: > On Fri, Jun 29, 2018 at 02:32:44PM +0800, Su Yue wrote: > > > > https://github.com/Damenly/btrfs-progs/tree/tmp1 > > > > > > Not sure if I understand what you meant, here. > > > > > So

Re: btrfs check of a raid0?

2018-07-01 Thread Marc MERLIN
On Sun, Jul 01, 2018 at 01:15:09PM -0600, Chris Murphy wrote: > > How is it supposed to work when you have multiple devices for a btrfs > > filesystem? > > > > gargamel:~# btrfs check --repair -p /dev/bcache2 > > enabling repair mode > > ERROR: mount check: cannot open /dev/bcache2: No such device

btrfs check of a raid0?

2018-07-01 Thread Marc MERLIN
Howdy, I have a btrfs filesystem made out of 2 devices: [ 75.141414] BTRFS: device label btrfs_space devid 1 transid 429220 /dev/bcache3 [ 75.164745] BTRFS: device label btrfs_space devid 2 transid 429220 /dev/bcache2 One of the 2 devices had a hardware error (not btrfs' fault):

Re: Incremental send/receive broken after snapshot restore

2018-06-30 Thread Marc MERLIN
Sorry that I missed the beginning of this discussion, but I think this is what I documented here after hitting the same problem: http://marc.merlins.org/perso/btrfs/post_2018-03-09_Btrfs-Tips_-Rescuing-A-Btrfs-Send-Receive-Relationship.html Marc On Sun, Jul 01, 2018 at 01:03:37AM +0200, Hannes

Re: So, does btrfs check lowmem take days? weeks?

2018-06-30 Thread Marc MERLIN
On Sat, Jun 30, 2018 at 10:49:07PM +0800, Qu Wenruo wrote: > But the last abort looks pretty possible to be the culprit. > > Would you try to dump the extent tree? > # btrfs inspect dump-tree -t extent | grep -A50 156909494272 Sure, there you go: item 25 key (156909494272 EXTENT_ITEM

Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
Well, there goes that. After about 18H: ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, owner: 374857, offset: 235175936) wanted: 1, have: 1452 backref.c:466: __add_missing_keys: Assertion `ref->root_id` failed, value 0 btrfs(+0x3a232)[0x56091704f232]
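A minimal sketch of pulling the numbers out of the "referencer count mismatch" line quoted above, so they can be fed into the dump-tree commands suggested elsewhere in the thread. The field names (`bytenr`, `length`, and so on) are my own labels for the positions in the message, not names taken from btrfs-progs, and the pattern is only known to match the single line shown in this post.

```python
import re

# Pattern written against the one error line quoted in the post above.
# Group names are illustrative labels, not btrfs-progs identifiers.
MISMATCH_RE = re.compile(
    r"extent\[(?P<bytenr>\d+), (?P<length>\d+)\] referencer count mismatch "
    r"\(root: (?P<root>\d+), owner: (?P<owner>\d+), offset: (?P<offset>\d+)\) "
    r"wanted: (?P<wanted>\d+), have: (?P<have>\d+)"
)


def parse_mismatch(line):
    """Return the numeric fields of a mismatch line as a dict, or None."""
    match = MISMATCH_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


line = (
    "ERROR: extent[156909494272, 55320576] referencer count mismatch "
    "(root: 21872, owner: 374857, offset: 235175936) wanted: 1, have: 1452"
)
info = parse_mismatch(line)
print(info["root"], info["owner"], info["wanted"], info["have"])
```

The root (21872) and owner (374857) extracted here are the same values that appear in the `btrfs inspect dump-tree -t 21872 | grep -C 50 "374857 EXTENT_DATA "` request later in the thread.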

Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 12:28:31AM -0700, Marc MERLIN wrote: > So, I rebooted, and will now run Su's btrfs check without repair and > report back. As expected, it will likely still take days, here's the start: gargamel:~# btrfs check --mode=lowmem -p /dev/mapper/dshelf2 Checking file

Re: btrfs send/receive vs rsync

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 10:04:02AM +0200, Lionel Bouton wrote: > Hi, > > On 29/06/2018 09:22, Marc MERLIN wrote: > > On Fri, Jun 29, 2018 at 12:09:54PM +0500, Roman Mamedov wrote: > >> On Thu, 28 Jun 2018 23:59:03 -0700 > >> Marc MERLIN wrote: > >> >

Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 03:20:42PM +0800, Qu Wenruo wrote: > If certain btrfs specific operations are involved, it's definitely not OK: > 1) Balance > 2) Quota > 3) Btrfs check Ok, I understand. I'll try to balance almost never then. My problems did indeed start because I ran balance and it got

Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 12:09:54PM +0500, Roman Mamedov wrote: > On Thu, 28 Jun 2018 23:59:03 -0700 > Marc MERLIN wrote: > > > I don't waste a week recreating the many btrfs send/receive relationships. > > Consider not using send/receive, and switching to regular rsync in

Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 02:29:10PM +0800, Qu Wenruo wrote: > > If --repair doesn't work, check is useless to me sadly. > > Not exactly. > Although it's time consuming, I have manually patched several users fs, > which normally ends pretty well. Ok I understand now. > > Agreed, I doubt I have

Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 02:32:44PM +0800, Su Yue wrote: > > > https://github.com/Damenly/btrfs-progs/tree/tmp1 > > > > Not sure if I understand what you meant, here. > > > Sorry for my unclear words. > Simply speaking, I suggest you stop the currently running check. > Then, clone the above branch to

Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 02:02:19PM +0800, Su Yue wrote: > I have figured out the bug is lowmem check can't deal with shared tree block > in reloc tree. The fix is simple, you can try the following repo: > > https://github.com/Damenly/btrfs-progs/tree/tmp1 Not sure if I understand what you meant,

Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 01:48:17PM +0800, Qu Wenruo wrote: > Just normal btrfs check, and post the output. > If normal check eats up all your memory, btrfs check --mode=lowmem. Does check without --repair eat less RAM? > --repair should be considered as the last method. If --repair doesn't

Re: So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 01:35:06PM +0800, Su Yue wrote: > > It's hard to estimate, especially when every cross check involves a lot > > of disk IO. > > > > But at least, we could add such indicator to show we're doing something. > > Maybe we can account all roots in root tree first, before

Re: So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 01:07:20PM +0800, Qu Wenruo wrote: > > lowmem repair seems to be going still, but it's been days and -p seems > > to do absolutely nothing. > > I'm afraid you hit a bug in lowmem repair code. > By all means, --repair shouldn't really be used unless you're pretty > sure

So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Marc MERLIN
Regular btrfs check --repair has a nice progress option. It wasn't perfect, but it showed something. But then it also takes all your memory quicker than the linux kernel can defend itself and reliably completely kills my 32GB server quicker than it can OOM anything. lowmem repair seems to be

Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

2018-06-25 Thread Marc MERLIN
On Mon, Jun 25, 2018 at 01:07:10PM -0400, Austin S. Hemmelgarn wrote: > > - mount -o recovery still hung > > - mount -o ro did not hang though > One tip here specifically, if you had to reboot during a balance and the FS > hangs when it mounts, try mounting with `-o skip_balance`. That should >
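The `-o skip_balance` tip above can be sketched as a small helper that only builds the command lines and does not run them. The device and mount point are hypothetical placeholders, and the optional `btrfs balance cancel` follow-up is my assumption about a likely next step, since the quoted advice is cut off before it says what to do after mounting.

```python
import shlex


def skip_balance_commands(device, mountpoint, cancel=False):
    """Build argv lists for mounting without resuming an interrupted balance.

    `cancel=True` appends a `btrfs balance cancel`, which is an assumed
    follow-up and not something stated in the quoted message.
    """
    commands = [["mount", "-o", "skip_balance", device, mountpoint]]
    if cancel:
        commands.append(["btrfs", "balance", "cancel", mountpoint])
    return commands


# Hypothetical paths, for illustration only.
commands = skip_balance_commands("/dev/mapper/example", "/mnt/example", cancel=True)
for argv in commands:
    print(shlex.join(argv))
```

Keeping the helper side-effect free makes it easy to review the exact commands before running anything against a filesystem that is already in trouble.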

Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

2018-06-25 Thread Marc MERLIN
On Mon, Jun 25, 2018 at 06:24:37PM +0200, Hans van Kranenburg wrote: > >> output hasn't changed for over 36 hours, unless you've got an insanely slow > >> storage array, that's extremely unusual (it should only be moving at most > >> 3GB of data per chunk)). > > > > I didn't hear from any

Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

2018-06-25 Thread Marc MERLIN
On Tue, Jun 19, 2018 at 12:58:44PM -0400, Austin S. Hemmelgarn wrote: > > In your situation, I would run "btrfs pause ", wait to hear from > > a btrfs developer, and not use the volume whatsoever in the meantime. > I would say this is probably good advice. I don't really know what's going > on

Re: btrfs balance did not progress after 12H

2018-06-19 Thread Marc MERLIN
On Mon, Jun 18, 2018 at 06:00:55AM -0700, Marc MERLIN wrote: > So, I ran this: > gargamel:/mnt/btrfs_pool2# btrfs balance start -dusage=60 -v . & > [1] 24450 > Dumping filters: flags 0x1, state 0x0, force is off > DATA (flags 0x2): balancing, usage=60 > gargamel:/mnt/bt

btrfs balance did not progress after 12H

2018-06-18 Thread Marc MERLIN
So, I ran this: gargamel:/mnt/btrfs_pool2# btrfs balance start -dusage=60 -v . & [1] 24450 Dumping filters: flags 0x1, state 0x0, force is off DATA (flags 0x2): balancing, usage=60 gargamel:/mnt/btrfs_pool2# while :; do btrfs balance status .; sleep 60; done 0 out of about 0 chunks balanced (0
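The polling loop above prints one status line per minute and leaves it to the reader to notice that nothing changes. A minimal sketch of doing that check mechanically follows; it only parses the "N out of about M chunks balanced" prefix visible in the post, and the stall rule (the last few samples being identical) is my own illustrative heuristic, not a btrfs feature.

```python
import re

# Matches the prefix of the status line shown in the post above.
STATUS_RE = re.compile(r"(\d+) out of about (\d+) chunks balanced")


def parse_status(text):
    """Return (done, total) from a balance status line, or None if absent."""
    match = STATUS_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_stalled(samples, window):
    """True if the last `window` samples were parsed and are all identical."""
    if window < 2 or len(samples) < window:
        return False
    recent = samples[-window:]
    return recent[0] is not None and all(s == recent[0] for s in recent)


# Three consecutive samples that never move, as in the report above.
samples = [parse_status("0 out of about 0 chunks balanced (0")] * 3
print(is_stalled(samples, window=3))
```

In practice the samples would come from running `btrfs balance status` once a minute, as the shell loop in the post does; the window size is a judgment call that depends on how slow the storage is.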

Re: 4.15.6 crash: BUG at fs/btrfs/ctree.c:1862

2018-05-15 Thread Marc MERLIN
On Tue, May 15, 2018 at 09:36:11AM +0100, Filipe Manana wrote: > We got a fix for this recently: https://patchwork.kernel.org/patch/10396523/ Thanks very much for the notice, sorry that I missed it. Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft

4.15.6 crash: BUG at fs/btrfs/ctree.c:1862

2018-05-14 Thread Marc MERLIN
static noinline struct extent_buffer * read_node_slot(struct btrfs_fs_info *fs_info, struct extent_buffer *parent, int slot) { int level = btrfs_header_level(parent); struct extent_buffer *eb; if (slot < 0 || slot >= btrfs_header_nritems(parent))

Re: How to change/fix 'Received UUID'

2018-03-10 Thread Marc MERLIN
Thanks all for the help again. I just wrote a blog post to explain the process to others should anyone need this later. http://marc.merlins.org/perso/btrfs/post_2018-03-09_Btrfs-Tips_-Rescuing-A-Btrfs-Send-Receive-Relationship.html Marc -- "A mouse is a device used to point at the xterm you

Re: How to change/fix 'Received UUID'

2018-03-08 Thread Marc MERLIN
On Thu, Mar 08, 2018 at 09:36:49PM +0300, Andrei Borzenkov wrote: > Yes. Your source has Received UUID. In this case btrfs send will > transmit received UUID instead of subvolume UUID as reference to base > snapshot. You need to either clear received UUID on source or set > received UUID on
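Andrei's explanation can be modelled in a few lines. This is a simplified sketch of the matching rule as described in the message, not the actual btrfs-progs logic: the `Subvol` class, both function names, and the UUID strings are hypothetical, and the real receive side may apply additional lookups that are not modelled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subvol:
    uuid: str
    received_uuid: Optional[str] = None


def reference_uuid(parent: Subvol) -> str:
    """UUID that send transmits for the base snapshot, per the explanation:
    the Received UUID if the source has one, otherwise its own UUID."""
    return parent.received_uuid or parent.uuid


def receiver_can_find_parent(parent: Subvol, dest: List[Subvol]) -> bool:
    """Simplified rule: some destination snapshot must carry that UUID
    as its Received UUID."""
    ref = reference_uuid(parent)
    return any(s.received_uuid == ref for s in dest)


# The source snapshot carries a stray Received UUID, so send refers to
# "old-1", which the destination has never seen.
source = Subvol(uuid="src-1", received_uuid="old-1")
dest = [Subvol(uuid="dst-1", received_uuid="src-1")]
broken = receiver_can_find_parent(source, dest)

# Option 1: clear the Received UUID on the source.
fixed_by_clearing = receiver_can_find_parent(Subvol(uuid="src-1"), dest)

# Option 2: set the Received UUID on the destination to match.
fixed_by_setting = receiver_can_find_parent(
    source, [Subvol(uuid="dst-1", received_uuid="old-1")]
)
print(broken, fixed_by_clearing, fixed_by_setting)
```

Both fixes correspond to the two options named in the message: make the source stop advertising the stale value, or make the destination advertise the value the source sends.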

Re: How to change/fix 'Received UUID'

2018-03-08 Thread Marc MERLIN
On Thu, Mar 08, 2018 at 09:34:45AM +0300, Andrei Borzenkov wrote: > 08.03.2018 09:06, Marc MERLIN пишет: > > On Tue, Mar 06, 2018 at 12:02:47PM -0800, Marc MERLIN wrote: > >>> https://github.com/knorrie/python-btrfs/commit/1ace623f95300ecf581b1182780fd6432a46b24d > >>

Re: How to change/fix 'Received UUID'

2018-03-07 Thread Marc MERLIN
On Tue, Mar 06, 2018 at 12:02:47PM -0800, Marc MERLIN wrote: > > https://github.com/knorrie/python-btrfs/commit/1ace623f95300ecf581b1182780fd6432a46b24d > > Well, I had never heard about it until now, thank you. > > I'll see if I can make it work when I get a bit of time

Re: How to change/fix 'Received UUID'

2018-03-06 Thread Marc MERLIN
On Tue, Mar 06, 2018 at 08:12:15PM +0100, Hans van Kranenburg wrote: > On 05/03/2018 20:47, Marc MERLIN wrote: > > On Mon, Mar 05, 2018 at 10:38:16PM +0300, Andrei Borzenkov wrote: > >>> If I absolutely know that the data is the same on both sides, how do I > >

Re: How to change/fix 'Received UUID'

2018-03-05 Thread Marc MERLIN
On Mon, Mar 05, 2018 at 10:38:16PM +0300, Andrei Borzenkov wrote: > > If I absolutely know that the data is the same on both sides, how do I > > either > > 1) force back in a 'Received UUID' value on the destination > > I suppose the most simple is to write small program that does it using >

How to change/fix 'Received UUID'

2018-03-05 Thread Marc MERLIN
Howdy, I did a bunch of copies and moving around subvolumes between disks and at some point, I did a snapshot dir1/Win_ro.20180205_21:18:31 dir2/Win_ro.20180205_21:18:31 As a result, I lost the ro flag, and apparently 'Received UUID' which is now preventing me from restarting the btrfs

Re: btrfs check: add_missing_dir_index: BUG_ON `ret` triggered, value -17

2017-11-18 Thread Marc MERLIN
On Sat, Nov 18, 2017 at 08:16:32AM +0800, Qu Wenruo wrote: > > item 27 key (1919785864 DIR_ITEM 2591417872) itemoff 14637 itemsize > > 46 > > location key (1919805647 INODE_ITEM 0) type FILE > > transid 2231988 data_len 0 name_len 16 > >

Re: 4.13.12: kernel BUG at fs/btrfs/ctree.h:1802!

2017-11-17 Thread Marc MERLIN
On Thu, Nov 16, 2017 at 09:53:15PM -0800, Marc MERLIN wrote: > > I suggest that you try lvmcache instead. It's much more flexible than > > bcache, > > does pretty much the same job, and has much less of the "hacky" feel to it. > > I can read up on it, it's goin

Re: btrfs check: add_missing_dir_index: BUG_ON `ret` triggered, value -17

2017-11-17 Thread Marc MERLIN
On Fri, Nov 17, 2017 at 04:12:07PM +0800, Qu Wenruo wrote: > > > On 2017年11月17日 15:30, Marc MERLIN wrote: > > Here's the whole output: > > gargamel:~# btrfs-debug-tree -t 258 /dev/mapper/raid0d1 | grep 1919805647 > > Sorry, I missed "-C10" parameter f

Re: btrfs check: add_missing_dir_index: BUG_ON `ret` triggered, value -17

2017-11-16 Thread Marc MERLIN
transid verify failed on 1174605512704 wanted 2245171 found 2247435 parent transid verify failed on 1174605512704 wanted 2245171 found 2247435 Ignoring transid failure WARNING: eb corrupted: item 130 eb level 0 next level 2, skipping the rest On Thu, Nov 16, 2017 at 10:17:07PM -0800, Marc MERLIN

Re: btrfs check: add_missing_dir_index: BUG_ON `ret` triggered, value -17

2017-11-16 Thread Marc MERLIN
On Fri, Nov 17, 2017 at 01:17:19PM +0800, Qu Wenruo wrote: > > > On 2017年11月17日 10:26, Marc MERLIN wrote: > > Howdy, > > > > Up to date git pull from btrfs-progs: > > > > gargamel:~# btrfs check --repair /dev/mapper/raid0d1 > > enabling repai

Re: 4.13.12: kernel BUG at fs/btrfs/ctree.h:1802!

2017-11-16 Thread Marc MERLIN
On Fri, Nov 17, 2017 at 10:41:48AM +0500, Roman Mamedov wrote: > On Thu, 16 Nov 2017 16:12:56 -0800 > Marc MERLIN <m...@merlins.org> wrote: > > > On Thu, Nov 16, 2017 at 11:32:33PM +0100, Holger Hoffstätte wrote: > > > Don't pop the champagne just yet, I just

btrfs check: add_missing_dir_index: BUG_ON `ret` triggered, value -17

2017-11-16 Thread Marc MERLIN
Howdy, Up to date git pull from btrfs-progs: gargamel:~# btrfs check --repair /dev/mapper/raid0d1 enabling repair mode Checking filesystem on /dev/mapper/raid0d1 UUID: 01334b81-c0db-4e80-92e4-cac4da867651 checking extents corrupt extent record: key 203003699200 168 40960 corrupt extent record:

Re: 4.13.12: kernel BUG at fs/btrfs/ctree.h:1802!

2017-11-16 Thread Marc MERLIN
On Thu, Nov 16, 2017 at 11:32:33PM +0100, Holger Hoffstätte wrote: > Don't pop the champagne just yet, I just read that apparently 4.14 broke > bcache for some people [1]. Not sure how much that affects you, but it might > well make things worse. Yeah, I know, wonderful. Oh my, that's actually

Re: 4.13.12: kernel BUG at fs/btrfs/ctree.h:1802!

2017-11-16 Thread Marc MERLIN
On Thu, Nov 16, 2017 at 06:27:44PM +0100, Holger Hoffstätte wrote: > On 11/16/17 18:07, Marc MERLIN wrote: > > Sorry, was missing the kernel number in the subject, just fixed that. > > > > On Thu, Nov 16, 2017 at 09:04:45AM -0800, Marc MERLIN wrote: > >> My serve

4.13.12: kernel BUG at fs/btrfs/ctree.h:1802!

2017-11-16 Thread Marc MERLIN
Sorry, was missing the kernel number in the subject, just fixed that. On Thu, Nov 16, 2017 at 09:04:45AM -0800, Marc MERLIN wrote: > My server now reboots every 20mn or so, with this. > Sadly another BUG_ON() and it won't even tell me which filesystem > it's on > > st

kernel BUG at fs/btrfs/ctree.h:1802!

2017-11-16 Thread Marc MERLIN
My server now reboots every 20mn or so, with this. Sadly another BUG_ON() and it won't even tell me which filesystem it's on static inline u32 btrfs_extent_inline_ref_size(int type) { if (type == BTRFS_TREE_BLOCK_REF_KEY || type == BTRFS_SHARED_BLOCK_REF_KEY)

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-27 Thread Marc MERLIN
On Sun, Sep 10, 2017 at 05:22:14PM -0700, Marc MERLIN wrote: > On Sun, Sep 10, 2017 at 01:16:26PM +, Josef Bacik wrote: > > Great, if the free space cache is fucked again after the next go > > around then I need to expand the verifier to watch entries being added > >

Re: [PATCH] btrfs-progs: Output time elapsed for each major tree it checked

2017-09-11 Thread Marc MERLIN
I'll try check vs check --repair again and report the times if they are weird. In the meantime you can check in btrfs-progs WIP and maybe someone else will get useful time data before I can again. Thanks, Marc > Reported-by: Marc MERLIN <m...@merlins.org> > Signed-off-by: Qu Wenruo

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-10 Thread Marc MERLIN
On Sun, Sep 10, 2017 at 01:16:26PM +, Josef Bacik wrote: > Great, if the free space cache is fucked again after the next go > around then I need to expand the verifier to watch entries being added > to the cache as well. Thanks, Well, I copied about 1TB of data, and nothing happened. So it

Re: netapp-alike snapshots?

2017-09-10 Thread Marc MERLIN
On Sat, Sep 09, 2017 at 10:43:16PM +0300, Andrei Borzenkov wrote: > 09.09.2017 16:44, Ulli Horlacher пишет: > > > > Your tool does not create .snapshot subdirectories in EVERY directory like > > Neither does NetApp. Those "directories" are magic handles that do not > really exist. Correct,

Re: btrfs check --repair now runs in minutes instead of hours? aborting

2017-09-10 Thread Marc MERLIN
On Sun, Sep 10, 2017 at 02:01:58PM +0800, Qu Wenruo wrote: > > > On 2017年09月10日 01:44, Marc MERLIN wrote: > > So, should I assume that btrfs progs git has some issue since there is > > no plausible way that a check --repair should be faster than a regular > > chec

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-10 Thread Marc MERLIN
On Sun, Sep 10, 2017 at 03:12:16AM +, Josef Bacik wrote: > Ok mount -o clear_cache, umount and run fsck again just to make sure. Then > if it comes out clean mount with ref_verify again and wait for it to blow up > again. Thanks, Ok, just did the 2nd fsck, came back clean after mount -o

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-09 Thread Marc MERLIN
On Sat, Sep 09, 2017 at 10:56:14PM +, Josef Bacik wrote: > Well that's odd, a block allocated on disk is in the free space cache. Can I > see the full output of the fsck? I want to make sure it's actually getting > to the part where it checks the free space cache. If it does then I'll

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-09 Thread Marc MERLIN
On Tue, Sep 05, 2017 at 06:19:25PM +, Josef Bacik wrote: > Alright I just reworked the build tree ref stuff and tested it to make sure > it wasn’t going to give false positives again. Apparently I had only ever > used this with very basic existing fs’es and nothing super complicated, so it

Re: btrfs check --repair now runs in minutes instead of hours? aborting

2017-09-09 Thread Marc MERLIN
So, should I assume that btrfs progs git has some issue since there is no plausible way that a check --repair should be faster than a regular check? Thanks, Marc On Tue, Sep 05, 2017 at 07:45:25AM -0700, Marc MERLIN wrote: > On Tue, Sep 05, 2017 at 04:05:04PM +0800, Qu Wenruo wr

Re: netapp-alike snapshots?

2017-09-09 Thread Marc MERLIN
On Sat, Sep 09, 2017 at 03:26:14PM +0200, Ulli Horlacher wrote: > On Tue 2017-08-22 (15:22), Ulli Horlacher wrote: > > With Netapp/waffle you have automatic hourly/daily/weekly snapshots. > > You can find these snapshots in every local directory (readonly). > > > I would like to have something

Re: btrfs check --repair now runs in minutes instead of hours? aborting

2017-09-05 Thread Marc MERLIN
On Tue, Sep 05, 2017 at 04:05:04PM +0800, Qu Wenruo wrote: > > gargamel:~# btrfs fi df /mnt/btrfs_pool1 > > Data, single: total.60TiB, used.54TiB > > System, DUP: total2.00MiB, used=1.19MiB > > Metadata, DUP: totalX.00GiB, used.69GiB > > Wait for a minute. > > Is that .69GiB means 706 MiB? Or

Re: btrfs check --repair now runs in minutes instead of hours? aborting

2017-09-04 Thread Marc MERLIN
Ok, not quite hours, but check takes 88mn, check --repair takes 11mn gargamel:/var/local/src/btrfs-progs# time btrfs check /dev/mapper/dshelf1 Checking filesystem on /dev/mapper/dshelf1 UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d checking extents checking free space cache cache and super

Re: btrfs check --repair now runs in minutes instead of hours? aborting

2017-09-04 Thread Marc MERLIN
On Tue, Sep 05, 2017 at 09:21:55AM +0800, Qu Wenruo wrote: > > > On 2017年09月05日 09:05, Marc MERLIN wrote: > >Ok, I don't want to sound like I'm complaining :) but I updated > >btrfs-progs to top of tree in git, installed it, and ran it on an 8TiB > >filesystem

btrfs check --repair now runs in minutes instead of hours? aborting

2017-09-04 Thread Marc MERLIN
Ok, I don't want to sound like I'm complaining :) but I updated btrfs-progs to top of tree in git, installed it, and ran it on an 8TiB filesystem that used to take 12H or so to check. It finished in maybe 10mn, just 10mn! :) gargamel:/var/local/src/btrfs-progs# btrfs check --repair

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-03 Thread Marc MERLIN
On Sun, Sep 03, 2017 at 05:33:33PM +, Josef Bacik wrote: > Alright pushed, sorry about that. I'm reasonably sure I'm running the new code, but still got this: [ 2104.336513] Dropping a ref for a root that doesn't have a ref on the block [ 2104.358226] Dumping block entry [115253923840

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-03 Thread Marc MERLIN
On Sun, Sep 03, 2017 at 02:38:57PM +, Josef Bacik wrote: > Oh yeah you need CONFIG_STACKTRACE turned on, otherwise this is going to be > difficult ;). Thanks, Right, except that I thought I did: saruman:/usr/src/linux-btrfs/btrfs-next# grep STACKTRACE .config CONFIG_STACKTRACE_SUPPORT=y

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-03 Thread Marc MERLIN
On Sun, Sep 03, 2017 at 03:26:34AM +, Josef Bacik wrote: > I was looking through the code for other ways to cut down memory usage when I > noticed we only catch improper re-allocations, not adding another ref for > metadata which is what I suspect your problem is. I added another patch and

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-02 Thread Marc MERLIN
On Sun, Sep 03, 2017 at 12:30:07AM +, Josef Bacik wrote: > My bad, I forgot I don't dynamically allocate the stack trace space so my > patch did nothing, I blame the children for distracting me. I've dropped > allocating the action altogether for the on disk stuff, that should >

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-02 Thread Marc MERLIN
On Sat, Sep 02, 2017 at 04:52:20PM +, Josef Bacik wrote: > Oops, ok I've updated my tree so we don't save the stack trace of the initial > scan, which we don't need anyway. That should save a decent amount of memory > in your case. It was an in place update so you'll need to blow away your

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-02 Thread Marc MERLIN
On Fri, Sep 01, 2017 at 11:01:30PM +, Josef Bacik wrote: > You'll be fine, it's only happening on the one fs right? That's 13gib of > metadata with checksums and all that shit, it'll probably look like 8 or 9gib > of ram worst case. I'd mount with -o ref_verify and check the slab amount in

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-01 Thread Marc MERLIN
On Thu, Aug 31, 2017 at 05:48:23PM +, Josef Bacik wrote: > We are using 4.11 in production at fb with backports from recent (a month > ago?) stuff. I’m relatively certain nothing bad will happen, and this branch > has the most recent fsync() corruption fix (which exists in your kernel so >

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-08-31 Thread Marc MERLIN
On Thu, Aug 31, 2017 at 02:52:56PM +, Josef Bacik wrote: > Hello, > > Sorry I really thought I could accomplish this with BPF, but ref tracking is > just too complicated to work properly with BPF. I forward ported my ref > verification patch to the latest kernel, you can find it in the

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-08-29 Thread Marc MERLIN
On Tue, Aug 29, 2017 at 06:22:38PM +, Josef Bacik wrote: > How much metadata do you have on this fs? I was going to hold everything in > bpf hash trees, but I’m worried we’ll hit collisions and then the tracing > will be useless. If it’s too big I’ll have to dump everything to userspace >

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-08-29 Thread Marc MERLIN
On Tue, Aug 29, 2017 at 02:30:19PM +, Josef Bacik wrote: > Sorry Marc, I’ll wire up a bcc script to try and catch when this > happens. In order for it to work it’ll need to read the extent tree in > before you mount the fs, is that something you’ll be able to swing or is > this your root fs?

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-08-28 Thread Marc MERLIN
On Sat, Jul 15, 2017 at 04:12:45PM -0700, Marc MERLIN wrote: > On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote: > > Dear Chris and other developers, > > > > Can you look at this bug which has been happening since 2012 on apparently > > all kernels betwee

Re: 4.11.6 / more corruption / root 15455 has a root item with a more recent gen (33682) compared to the found root node (0)

2017-08-01 Thread Marc MERLIN
On Mon, Jul 31, 2017 at 03:00:53PM -0700, Justin Maggard wrote: > Marc, do you have quotas enabled? IIRC, you're a send/receive user. > The combination of quotas and btrfs receive can corrupt your > filesystem, as shown by the xfstest I sent to the list a little while > ago. Thanks for checking.
