Re: superblock checksum mismatch after crash, cannot mount

2014-08-23 Thread Zygo Blaxell
On Sat, Aug 23, 2014 at 09:34:10AM +, Duncan wrote: Florian Gamböck posted on Sat, 23 Aug 2014 10:38:47 +0200 as excerpted: On 23.08.2014 07:27, Duncan wrote: btrfs-show-super is your tool for inspecting the superblocks. [...] Then use btrfs rescue super-recover Yes, show-super

backtrace for segfault in btrfs-convert

2014-08-23 Thread Zygo Blaxell
This came from trying to convert a ~1.8T ext4 filesystem with btrfs-progs master (24cf4d8c3ee924b474f68514e0167cc2e602a48d) on Debian. e2fsck -f reports no errors on the source filesystem. I've done several ext4 conversions before this one, so I'm pretty sure the tool works most of the time. ;)

Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds

2014-09-02 Thread Zygo Blaxell
On Tue, Sep 02, 2014 at 05:20:29AM +, Duncan wrote: suspect your firmware is SERIOUSLY out of space and shuffling, as that'll slow the balance down too, and again after), try running fstrim on the device. It may or may not work on that device, but if it does and the firmware /was/ out

Re: Is it necessary to balance a btrfs raid1 array?

2014-09-10 Thread Zygo Blaxell
On Wed, Sep 10, 2014 at 09:25:17PM -0400, Sean Greenslade wrote: On Thu, Sep 11, 2014 at 12:28:56AM +0200, Goffredo Baroncelli wrote: The WD datasheet says something different. It reports Non-recoverable read errors per bits read less than 1/10^14. They express the number of error in
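
The datasheet figure quoted above (fewer than 1 non-recoverable read error per 10^14 bits read) can be turned into a rough upper bound with simple arithmetic. The sketch below is only an illustration: the 2 TB read size is an assumed example value, not a figure from this message, and the calculation treats the datasheet rate as a flat per-bit bound.

```python
def expected_read_errors(bytes_read: int, bits_per_error: float = 1e14) -> float:
    """Upper bound on expected non-recoverable read errors.

    bits_per_error is the datasheet bound (1 error per 10^14 bits read).
    """
    return bytes_read * 8 / bits_per_error


# Assumed example: reading a 2 TB (2 * 10^12 byte) disk once from end to end.
two_tb = 2 * 10**12
print(expected_read_errors(two_tb))  # 1.6e13 bits / 1e14 = 0.16
```

Put differently, 10^14 bits is 12.5 TB, so under this bound a full read of a 2 TB disk is expected to hit fewer than 0.16 unrecoverable errors.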

Re: Is it necessary to balance a btrfs raid1 array?

2014-09-10 Thread Zygo Blaxell
On Wed, Sep 10, 2014 at 01:27:36PM +0100, Bob Williams wrote: I have two 2TB disks formatted as a btrfs raid1 array, mirroring both data and metadata. Last night I started # btrfs filesystem balance path and it is still running 18 hours later. This suggests that most stuff only gets

3.16.3..3.17.1 hang in renameat2()

2014-10-19 Thread Zygo Blaxell
I've seen a hang in renameat2() from time to time on the last few stable kernels. I can reproduce it easily but only on one specific multi-terabyte filesystem with millions of files. I've tried to make a simpler repro setup but so far without success. Here is what I know so far. First, the

Re: strange 3.16.3 problem

2014-10-20 Thread Zygo Blaxell
your mail server do a lot of renames? Is one perhaps stuck? If so, that sounds like the same thing Zygo Blaxell is reporting in the 3.16.3..3.17.1 hang in renameat2() thread, OP on Sun, 19 Oct 2014 15:25:26 -400, Msg-ID: 20141019192525.ga29...@hungrycats.org, as linked here: http

Re: unexplainable corruptions 3.17.0

2014-10-20 Thread Zygo Blaxell
On Fri, Oct 17, 2014 at 08:17:37AM +, Hugo Mills wrote: On Fri, Oct 17, 2014 at 10:10:09AM +0200, Tomasz Torcz wrote: On Fri, Oct 17, 2014 at 04:02:03PM +0800, Liu Bo wrote: Recently I've observed some corruptions to systemd's journal files which are somewhat puzzling. This is

Re: 5 _thousand_ snapshots? even 160? (was: device balance times)

2014-10-21 Thread Zygo Blaxell
On Tue, Oct 21, 2014 at 06:10:27PM -0700, Robert White wrote: That's an unmanageably large and probably pointless number of snapshots guys. I mean 150 is a heck of a lot, and 5000 is almost unfathomable in terms of possible usefulness. Snapshots are cheap but they aren't free. This could

Re: 5 _thousand_ snapshots? even 160? (was: device balance times)

2014-10-22 Thread Zygo Blaxell
On Wed, Oct 22, 2014 at 07:41:32AM +, Duncan wrote: Tomasz Chmielewski posted on Wed, 22 Oct 2014 09:14:14 +0200 as excerpted: Tho that is of course per subvolume. If you have multiple subvolumes on the same filesystem, that can still end up being a thousand or two snapshots per

Re: 5 _thousand_ snapshots? even 160? (was: device balance times)

2014-10-22 Thread Zygo Blaxell
On Wed, Oct 22, 2014 at 01:37:15PM -0700, Robert White wrote: On 10/22/2014 01:08 PM, Zygo Blaxell wrote: I have datasets where I record 14000+ snapshots of filesystem directory trees scraped from test machines and aggregated onto a single server for deduplication...but I store each snapshot

Re: 5 _thousand_ snapshots? even 160? (was: device balance times)

2014-10-23 Thread Zygo Blaxell
On Wed, Oct 22, 2014 at 10:18:09PM -0700, Robert White wrote: On 10/22/2014 09:30 PM, Chris Murphy wrote: Sure. So if Btrfs is meant to address scalability, then perhaps at the moment it's falling short. As it's easy to add large drives and get very large multiple device volumes, the

Check tree block failed, want=17716610236416, have=0

2014-10-23 Thread Zygo Blaxell
I attempted to run btrfs check --repair, but it got stuck spinning in what appeared to be an infinite loop. strace and ltrace revealed nothing, and gdb wasn't particularly helpful, so I rebuilt btrfs with debug symbols and tried again. Now I get this from btrfs check: Couldn't map the

Re: Check tree block failed, want=17716610236416, have=0

2014-10-23 Thread Zygo Blaxell
(without an rpm etc). On 10/23/2014 04:16 PM, Zygo Blaxell wrote: I attempted to run btrfs check --repair, but it got stuck spinning in what appeared to be an infinite loop. strace and ltrace revealed nothing, and gdb wasn't particularly helpful, so I rebuilt btrfs with debug symbols and tried

Re: Check tree block failed, want=17716610236416, have=0

2014-10-23 Thread Zygo Blaxell
On Thu, Oct 23, 2014 at 09:24:48PM -0400, Zygo Blaxell wrote: On Thu, Oct 23, 2014 at 05:28:58PM -0700, Robert White wrote: You may be in deep error land from the long use of 3.10... that said, the --init-csum-tree or --init-extent-tree options may be your friend here. The backtrace shows

Re: device balance times

2014-10-23 Thread Zygo Blaxell
On Fri, Oct 24, 2014 at 01:05:39AM +, Duncan wrote: Austin S Hemmelgarn posted on Thu, 23 Oct 2014 07:39:28 -0400 as excerpted: On 2014-10-23 05:19, Miao Xie wrote: Now my colleague and I is implementing the scrub/replace for RAID5/6 and I have a plan to reimplement the balance and

Re: [bug] btrfs check --subvol-extents segfault

2014-10-23 Thread Zygo Blaxell
I just stumbled across this bug a few hours ago. It's still in btrfs-progs 3.17. On Mon, Sep 29, 2014 at 11:20:06AM +0800, Qu Wenruo wrote: Ping. No response? Thanks, Qu Original Message Subject: Re: [bug] btrfs check --subvol-extents segfault From: Eric Sandeen

Re: device balance times

2014-10-24 Thread Zygo Blaxell
On Fri, Oct 24, 2014 at 05:13:27AM +, Duncan wrote: Zygo Blaxell posted on Thu, 23 Oct 2014 22:35:29 -0400 as excerpted: My pet peeve: if balance is converting profiles from RAID1 to single, the conversion should be *instantaneous* (or at least small_constant * number_of_block_groups

Re: device balance times

2014-10-24 Thread Zygo Blaxell
On Fri, Oct 24, 2014 at 06:58:25AM -0400, Rich Freeman wrote: On Thu, Oct 23, 2014 at 10:35 PM, Zygo Blaxell ce3g8...@umail.furryterror.org wrote: - single profile: we can tolerate zero missing disks, so we don't allow rw mounts even if degraded

Re: Check tree block failed, want=17716610236416, have=0 [RESOLVED]

2014-10-24 Thread Zygo Blaxell
On Thu, Oct 23, 2014 at 07:16:22PM -0400, Zygo Blaxell wrote: I attempted to run btrfs check --repair, but it got stuck spinning in what appeared to be an infinite loop. strace and ltrace revealed nothing, and gdb wasn't particularly helpful, so I rebuilt btrfs with debug symbols and tried

Re: Poll: time to switch skinny-metadata on by default?

2014-10-26 Thread Zygo Blaxell
On Mon, Oct 20, 2014 at 06:34:03PM +0200, David Sterba wrote: On Thu, Oct 16, 2014 at 01:33:37PM +0200, David Sterba wrote: I'd like to make it default with the 3.17 release of btrfs-progs. Please let me know if you have objections. For the record, 3.17 will not change the defaults. The

Re: RAID1 fails to recover chunk tree

2014-10-30 Thread Zygo Blaxell
On Thu, Oct 30, 2014 at 09:30:46AM -0400, Zack Coffey wrote: Rob, That second drive was immediately put to use elsewhere. I figured having only the metadata on that drive, it wouldn't matter. The data stayed single and wasn't part of the second drive, only the metadata was. I must not be

Re: btrfs deduplication and linux cache management

2014-10-30 Thread Zygo Blaxell
On Thu, Oct 30, 2014 at 10:26:07AM +0100, lu...@plaintext.sk wrote: Hi, I want to ask, if deduplicated file content will be cached in linux kernel just once for two deduplicated files. To explain in deep: - I use btrfs for whole system with few subvolumes with some compression on some

Re: filesystem corruption

2014-11-02 Thread Zygo Blaxell
On Sun, Nov 02, 2014 at 02:57:22PM -0700, Chris Murphy wrote: On Nov 1, 2014, at 10:49 PM, Robert White rwh...@pobox.com wrote: On 10/31/2014 10:34 AM, Tobias Holst wrote: I am now using another system with kernel 3.17.2 and btrfs-tools 3.17 and inserted one of the two HDDs of my

Re: filesystem corruption

2014-11-03 Thread Zygo Blaxell
On Mon, Nov 03, 2014 at 10:11:18AM -0700, Chris Murphy wrote: On Nov 2, 2014, at 8:43 PM, Zygo Blaxell zblax...@furryterror.org wrote: btrfs seems to assume the data is correct on both disks (the generation numbers and checksums are OK) but gets confused by equally plausible but different

Re: btrfs deduplication and linux cache management

2014-11-04 Thread Zygo Blaxell
a 4K cacheable page that was compressed to 312 bytes somewhere in the middle of a 57K compressed data extent...what's that page's block number, again?). Thanks, have a nice day, -- LuVar - Zygo Blaxell zblax...@furryterror.org wrote: On Thu, Oct 30, 2014 at 10:26:07AM +0100, lu

Re: filesystem corruption

2014-11-04 Thread Zygo Blaxell
On Tue, Nov 04, 2014 at 11:28:39AM -0700, Chris Murphy wrote: On Nov 3, 2014, at 9:31 PM, Zygo Blaxell zblax...@furryterror.org wrote: It needs to be more than a sequential number. If one of the disks disappears we need to record this fact on the surviving disks, and also cope with _both_

Re: BTRFS messes up snapshot LV with origin

2014-11-20 Thread Zygo Blaxell
On Mon, Nov 17, 2014 at 08:04:05PM +0100, Goffredo Baroncelli wrote: On 2014-11-17 07:59, Brendan Hide wrote: That leaves two aspects of this issue which I view as two separate bugs: a) Btrfs cannot gracefully handle separate filesystems that have the same UUID. At all. b) Grub

Re: BTRFS messes up snapshot LV with origin

2014-11-20 Thread Zygo Blaxell
On Wed, Nov 19, 2014 at 10:20:17AM -0500, Phillip Susi wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 11/18/2014 9:54 PM, Chris Murphy wrote: Why is it silly? Btrfs on a thin volume has practical use case aside from just being thinly provisioned, its snapshots are block device

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-20 Thread Zygo Blaxell
On Tue, Nov 18, 2014 at 09:29:54AM +0200, Brendan Hide wrote: Hey, guys See further below extracted output from a daily scrub showing csum errors on sdb, part of a raid1 btrfs. Looking back, it has been getting errors like this for a few days now. The disk is patently unreliable but

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-21 Thread Zygo Blaxell
On Fri, Nov 21, 2014 at 09:05:32AM +0200, Brendan Hide wrote: On 2014/11/21 06:58, Zygo Blaxell wrote: You have one reallocated sector, so the drive has lost some data at some time in the last 49000(!) hours. Normally reallocations happen during writes so the data that was lost was data you

Re: BTRFS messes up snapshot LV with origin

2014-11-21 Thread Zygo Blaxell
On Fri, Nov 21, 2014 at 06:22:57AM +, Duncan wrote: After all, an LVM block-level snapshot takes the same space as a file containing the same raw data, and if there's room for the data in an LVM snapshot, given a different layout, there's room for exactly the same amount of data as a

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-21 Thread Zygo Blaxell
On Fri, Nov 21, 2014 at 11:06:19AM -0700, Chris Murphy wrote: On Fri, Nov 21, 2014 at 10:42 AM, Zygo Blaxell zblax...@furryterror.org wrote: I run 'smartctl -t long' from cron overnight (or whenever the drives are most idle). You can also set up smartd.conf to launch the self tests

Re: BTRFS messes up snapshot LV with origin

2014-11-22 Thread Zygo Blaxell
On Sat, Nov 22, 2014 at 06:34:38PM +0100, Goffredo Baroncelli wrote: On 11/21/2014 05:28 AM, Zygo Blaxell wrote: e.g. if an ext4 filesystem explodes, I can: 1. make a LVM snapshot of the broken filesystem 2. run e2fsck on the snapshot 3. mount and repair

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-24 Thread Zygo Blaxell
On Mon, Nov 24, 2014 at 08:58:25PM +, Hugo Mills wrote: On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote: On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo bo.li@oracle.com wrote: This brings a strong-but-slow checksum algorithm, sha256. Actually btrfs used sha256 at the early

Re: BTRFS messes up snapshot LV with origin

2014-11-25 Thread Zygo Blaxell
On Tue, Nov 25, 2014 at 05:34:15PM +0100, Goffredo Baroncelli wrote: On 11/23/2014 01:19 AM, Zygo Blaxell wrote: [...] md-raid works as long as you specify the devices, and because it's always the lowest layer it can ignore LVs (snapshot or otherwise). It's also not a particularly common

Re: BTRFS messes up snapshot LV with origin

2014-11-25 Thread Zygo Blaxell
On Tue, Nov 25, 2014 at 10:59:53PM +0100, Goffredo Baroncelli wrote: On 11/25/2014 09:29 PM, Zygo Blaxell wrote: On Tue, Nov 25, 2014 at 05:34:15PM +0100, Goffredo Baroncelli wrote: On 11/23/2014 01:19 AM, Zygo Blaxell wrote: [...] md-raid works as long as you specify the devices

Re: BTRFS messes up snapshot LV with origin

2014-11-26 Thread Zygo Blaxell
On Wed, Nov 26, 2014 at 06:19:05PM +0100, Goffredo Baroncelli wrote: On 11/25/2014 11:21 PM, Zygo Blaxell wrote: However I still doesn't understood why you want btrfs-w/multiple disk over LVM ? I want to split a few disks into partitions, but I want to create, move, and resize

Re: Balance and RAID-1

2014-11-27 Thread Zygo Blaxell
On Fri, Nov 28, 2014 at 01:37:50AM +1100, Russell Coker wrote: I had a RAID-1 filesystem with 2*3TB disks and 330G of disk space free according to df -h. I replaced a 3TB disk with a 4TB disk and df reported no change in the free space (as expected). Did you btrfs resize that 4TB disk? If

Re: BTRFS messes up snapshot LV with origin

2014-11-28 Thread Zygo Blaxell
On Fri, Nov 28, 2014 at 06:05:48PM +0100, Goffredo Baroncelli wrote: On 11/27/2014 05:15 AM, Zygo Blaxell wrote: This is a weakness of the current udev and asynchronous device hotplug concept: there is no notion of bus enumeration in progress, so we can be trying to assemble multi-device

Re: BTRFS messes up snapshot LV with origin

2014-12-01 Thread Zygo Blaxell
On Fri, Nov 28, 2014 at 11:55:07PM -0800, Robert White wrote: On 11/28/2014 08:59 PM, Zygo Blaxell wrote: On Fri, Nov 28, 2014 at 06:05:48PM +0100, Goffredo Baroncelli wrote: On 11/27/2014 05:15 AM, Zygo Blaxell wrote: This is a weakness of the current udev and asynchronous device hotplug

Re: Possible to undo subvol delete?

2014-12-01 Thread Zygo Blaxell
On Mon, Dec 01, 2014 at 10:09:44PM +0530, Shriramana Sharma wrote: On Mon, Dec 1, 2014 at 7:16 PM, Roman Mamedov r...@romanrm.net wrote: A more sensible idea could be adding a global-level '-i' switch, same as in 'rm', so that you or distros could then alias 'btrfs' to 'btrfs -i' (ask

Re: Possible to undo subvol delete?

2014-12-02 Thread Zygo Blaxell
On Tue, Dec 02, 2014 at 01:52:52PM +0100, David Sterba wrote: On a side note...only root can delete subvolumes, but non-root users can create them, which results in...this: $ /sbin/btrfs sub create foo Create subvolume './foo' $ date foo/bar $ /sbin/btrfs sub delete

Re: Thin metadata and nohole options recommended?

2014-12-02 Thread Zygo Blaxell
On Tue, Dec 02, 2014 at 08:38:15PM +0530, Shriramana Sharma wrote: From what I'm reading, thin metadata and nohole options were introduced to make the FS more efficient. Does this mean that for someone about to do mkfs.btrfs, it is actively recommended to use these options? If you're using

Re: Possible to undo subvol delete?

2014-12-03 Thread Zygo Blaxell
On Wed, Dec 03, 2014 at 07:48:43PM +0100, David Sterba wrote: On Tue, Dec 02, 2014 at 10:25:55AM -0500, Zygo Blaxell wrote: On Tue, Dec 02, 2014 at 01:52:52PM +0100, David Sterba wrote: On a side note...only root can delete subvolumes, but non-root users can create them, which results

Re: Possible to undo subvol delete?

2014-12-03 Thread Zygo Blaxell
On Wed, Dec 03, 2014 at 07:26:33PM +0100, David Sterba wrote: On Tue, Dec 02, 2014 at 02:09:45PM +, Hugo Mills wrote: On Tue, Dec 02, 2014 at 01:52:52PM +0100, David Sterba wrote: On Mon, Dec 01, 2014 at 10:14:03PM -0500, Zygo Blaxell wrote: export BTRFS_SUBVOLUME_DELETE_CONFIRM=1

Re: Why is the actual disk usage of btrfs considered unknowable?

2014-12-07 Thread Zygo Blaxell
On Sun, Dec 07, 2014 at 08:45:59PM +0530, Shriramana Sharma wrote: IIUC: 1) btrfs fi df already shows the alloc-ed space and the space used out of that. 2) Despite snapshots, CoW and compression, the tree knows how many extents of data and metadata there are, and how many bytes on disk

Re: Why is the actual disk usage of btrfs considered unknowable?

2014-12-08 Thread Zygo Blaxell
On Mon, Dec 08, 2014 at 03:47:23PM +0100, Martin Steigerwald wrote: On Sunday, 7 December 2014, 21:32:01, Robert White wrote: On 12/07/2014 07:40 AM, Martin Steigerwald wrote: Almost full filesystems are their own reward. So you basically say that BTRFS with compression does not meet

Re: Crazy idea of cleanup the inode_record btrfsck things with SQL?

2014-12-10 Thread Zygo Blaxell
On Thu, Dec 04, 2014 at 02:56:55PM +0800, Qu Wenruo wrote: The main memory usage in btrfsck is extent record, which we can't free them until we read them all in and checked, so even we mmap/unmap, it can only help with the extent_buffer(which is already freed if not used according to refs).

Re: Crazy idea of cleanup the inode_record btrfsck things with SQL?

2014-12-10 Thread Zygo Blaxell
On Thu, Dec 11, 2014 at 10:05:20AM +0800, Qu Wenruo wrote: Original Message Subject: Re: Crazy idea of cleanup the inode_record btrfsck things with SQL? From: Zygo Blaxell zblax...@furryterror.org To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2014-12-11 05:57 On Thu, Dec

Re: mkfs.btrfs limits odd [and maybe a failed phantom device?]

2014-12-11 Thread Zygo Blaxell
On Wed, Dec 10, 2014 at 02:18:55PM -0800, Robert White wrote: (3) why can I make a raid5 out of two devices? (I understand that we are currently just making mirrors, but the standard requires three devices in the geometry etc. So I would expect a two device RAID5 to be considered degraded with

Re: Balance scrub defrag

2014-12-11 Thread Zygo Blaxell
On Wed, Dec 10, 2014 at 04:15:17PM -0600, sys.syphus wrote: I am working on a script that i can run daily that will do maintenance on my btrfs mountpoints. is there any reason not to concurrently do all of the above? possibly including discards as well. also, is there anything existing

Re: mkfs.btrfs limits odd [and maybe a failed phantom device?]

2014-12-12 Thread Zygo Blaxell
On Thu, Dec 11, 2014 at 10:01:06PM -0800, Robert White wrote: So RAID5 with three media M is MMM MMM D1 D2 P(a) D3 P(b) D4 P(c) D5 D6 RAID5 with two media is well defined, and looks like this: MMM D1 P(a) P(b) D2 D3 P(c) With even parity and N disks P(a) ^
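
The claim that RAID5 with two media is well defined can be checked with a small sketch, assuming the usual XOR ("even") parity over the data blocks of a stripe. With two devices each stripe holds one data block, so the parity block is a byte-for-byte copy of it, which is why the layout behaves like a mirror. The function name and sample bytes are illustrative only, not btrfs code.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR (even) parity over a list of equal-length data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Two devices: one data block per stripe, so parity equals the data (a mirror).
d1 = b"\x12\x34\xab"
assert xor_parity([d1]) == d1

# Three devices: two data blocks per stripe; a lost block is rebuilt
# by XOR-ing the parity with the surviving data block.
d2 = b"\xff\x00\x0f"
parity = xor_parity([d1, d2])
assert xor_parity([parity, d2]) == d1
```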

Re: mkfs.btrfs limits odd [and maybe a failed phantom device?]

2014-12-12 Thread Zygo Blaxell
On Fri, Dec 12, 2014 at 02:28:06PM -0800, Robert White wrote: On 12/12/2014 08:45 AM, Zygo Blaxell wrote: On Thu, Dec 11, 2014 at 10:01:06PM -0800, Robert White wrote: So RAID5 with three media M is MMM MMM D1 D2 P(a) D3 P(b) D4 P(c) D5 D6 RAID5 with two media is well

Re: Balance scrub defrag

2014-12-12 Thread Zygo Blaxell
On Fri, Dec 12, 2014 at 11:17:58AM +0200, Erkki Seppala wrote: That may be sort of true, but I think even SMART is helped by the fact that the media is read through from the beginning to the end*, so it can detect even the errors that don't bubble through the IO layer. And BTRFS can indeed

Re: [PATCH v2 1/3] Btrfs: get more accurate output in df command.

2014-12-18 Thread Zygo Blaxell
On Sun, Dec 14, 2014 at 10:06:50PM -0800, Robert White wrote: ABSTRACT:: Stop being clever, just give the raw values. That's what you should be doing anyway. There are no other correct values to give that doesn't blow someone's paradigm somewhere. The trouble is a lot of existing software

Re: btrfs is using 25% more disk than it should

2014-12-19 Thread Zygo Blaxell
On Fri, Dec 19, 2014 at 04:17:08PM -0500, Josef Bacik wrote: And for your inode you now have this inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g), disklen 4k inode 256, file offset 4k, size 302g-4k, offset 4k, diskbytenr 123, disklen 302g and in your extent tree you

Re: Uncorrectable errors on RAID-1?

2014-12-23 Thread Zygo Blaxell
On Sun, Dec 21, 2014 at 05:25:47PM -0700, Chris Murphy wrote: For the kernel to automatically fix bad sectors by overwriting them, the drive needs to explicitly report read errors. If the SCSI command timer value is shorter than the drive's error recovery, the SATA link might get reset before

Re: btrfs is using 25% more disk than it should

2014-12-23 Thread Zygo Blaxell
On Sat, Dec 20, 2014 at 06:28:22AM -0500, Josef Bacik wrote: We now have two extents with the same bytenr but with different lengths. [...] Then there is the problem of actually returning the free space. Now if we drop all of the refs for an extent we know the space is free and we return

Re: btrfs doesn't format eMMC if previous filesystem is ext4

2014-12-26 Thread Zygo Blaxell
On Fri, Dec 26, 2014 at 03:24:59PM +, Ankur Tank wrote: I wanted to test btrfs on the eMMC of beaglebone black based custom board. Precondition: eMMC is formatted with ext4 filesystem Use case: Format eMMC with mkfs.btrfs -L label dev Result: Mkfs.btrfs denies

Re: BTRFS free space handling still needs more work: Hangs again

2014-12-27 Thread Zygo Blaxell
On Sat, Dec 27, 2014 at 09:30:43AM +, Hugo Mills wrote: On Sat, Dec 27, 2014 at 10:01:17AM +0100, Martin Steigerwald wrote: On Friday, 26 December 2014, 14:48:38, Robert White wrote: On 12/26/2014 05:37 AM, Martin Steigerwald wrote: Now, since you're seeing lockups when the space

Re: BTRFS free space handling still needs more work: Hangs again (no complete lockups, just tasks stuck for some time)

2014-12-28 Thread Zygo Blaxell
On Sat, Dec 27, 2014 at 08:23:59PM +0100, Martin Steigerwald wrote: My simple test case didn't trigger it, and I do not have another twice 160 GiB available on this SSDs available to try with a copy of my home filesystem. Then I could safely test without bringing the desktop session to an

Re: Standards Problems [Was: [PATCH v2 1/3] Btrfs: get more accurate output in df command.]

2014-12-30 Thread Zygo Blaxell
On Wed, Dec 17, 2014 at 08:07:27PM -0800, Robert White wrote: [...] There are a number of pathological examples in here, but I think there are justifiable correct answers for each of them that emerge from a single interpretation of the meanings of f_bavail, f_blocks, and f_bfree. One gotcha is
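
For readers unfamiliar with the three fields named above, here is a minimal sketch of how a df-style consumer might read them through `os.statvfs`. It shows only the consumer side; how btrfs should fill in these values is the open question in the thread. The helper name `df_numbers` and the way the numbers are combined are assumptions for illustration.

```python
import os


def df_numbers(path):
    """Read f_blocks, f_bfree and f_bavail and convert them to bytes."""
    st = os.statvfs(path)
    unit = st.f_frsize  # the block counts are expressed in units of f_frsize
    return {
        "total": st.f_blocks * unit,                # size of the filesystem
        "free": st.f_bfree * unit,                  # free blocks, including reserved ones
        "avail": st.f_bavail * unit,                # free blocks usable by unprivileged users
        "used": (st.f_blocks - st.f_bfree) * unit,  # what a df-style tool would show as used
    }


print(df_numbers("."))
```

Any software that derives "used" this way depends on the filesystem keeping the three fields mutually consistent.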

Re: should I use btrfs on Centos 7 for a new production server?

2015-01-01 Thread Zygo Blaxell
On Tue, Dec 30, 2014 at 07:29:10PM -0800, Dave Stevens wrote: I have a well tested and working fine Centos5-Xen system. Accumulated cruft from various development efforts make it desirable to redo the install. Currently a RAID-10 ext4 filesystem with LVM and 750G of storage. There's a hot

spurious I/O errors from btrfs...at the caching layer?

2015-01-24 Thread Zygo Blaxell
I am seeing a lot of spurious I/O errors that look like they come from the cache-facing side of btrfs. While running a heavy load with some extent-sharing (e.g. building 20 Linux kernels at once from source trees copied with 'cp -a --reflink=always'), some files will return spurious EIO on read.

Re: 3.19-rc5: Bug 91911: [REGRESSION] rm command hangs big time with deleting a lot of files at once

2015-01-23 Thread Zygo Blaxell
On Fri, Jan 23, 2015 at 03:01:28PM +0100, Martin Steigerwald wrote: Hi! Anyone seen this? Reported as: https://bugzilla.kernel.org/show_bug.cgi?id=91911 I have seen something like this since 3.15. I've also seen its cousin, which gets stuck in evict_inode, but the stacks of the hanging

Re: paused balance convert from raid1 can no longer be a writeable mount

2015-02-04 Thread Zygo Blaxell
On Wed, Feb 04, 2015 at 01:53:09PM -0700, Chris Murphy wrote: This is completely reproducible with a brand new file system created as raid1, using kernel 3.19 and btrfs-progs 3.18. I think you'll find it's reproducible with any kernel after 3.8-rc1 (circa October 2012). The conversion from

repeatable btrfs deadlock in unlink, kernel versions v3.15..v3.18.3

2015-01-17 Thread Zygo Blaxell
Processes keep getting stuck in btrfs_evict_inode during unlink. I've seen this dozens of times, usually when two subvols on a btrfs filesystem are active at the same time (i.e. it never happens on single-subvol filesystems). It happens on kernel versions v3.15..v3.18.3. This used to happen

Re: spurious I/O errors from btrfs...at the caching layer?

2015-01-25 Thread Zygo Blaxell
On Sat, Jan 24, 2015 at 01:06:01PM -0500, Zygo Blaxell wrote: I am seeing a lot of spurious I/O errors that look like they come from the cache-facing side of btrfs. While running a heavy load with some extent-sharing (e.g. building 20 Linux kernels at once from source trees copied with 'cp

Resolved...ish. was: Re: spurious I/O errors from btrfs...at the caching layer?

2015-01-25 Thread Zygo Blaxell
parameters, although I wouldn't expect any value of this sysctl to cause these symptoms... :-P On Sun, Jan 25, 2015 at 11:50:36AM -0500, Zygo Blaxell wrote: On Sat, Jan 24, 2015 at 01:06:01PM -0500, Zygo Blaxell wrote: I am seeing a lot of spurious I/O errors that look like they come from

Re: 3.19-rc5: Bug 91911: [REGRESSION] rm command hangs big time with deleting a lot of files at once

2015-01-25 Thread Zygo Blaxell
On Fri, Jan 23, 2015 at 02:38:09PM +, Holger Hoffstätte wrote: On Fri, 23 Jan 2015 15:01:28 +0100, Martin Steigerwald wrote: Hi! Anyone seen this? Reported as: https://bugzilla.kernel.org/show_bug.cgi?id=91911 You might be interested in:

Re: Get the diff from a file in two snapshots.

2015-01-27 Thread Zygo Blaxell
On Tue, Jan 27, 2015 at 12:43:52AM +0100, Stef Bon wrote: 2015-01-26 22:14 GMT+01:00 Chris Murphy li...@colorremedies.com: is there a way to get the difference between these two files by making use of btrfs? Snapper has this functionality built into it. I'm not sure if it uses diff or

Re: price to pay for nocow file bit?

2015-01-09 Thread Zygo Blaxell
On Fri, Jan 09, 2015 at 04:41:03PM +0100, David Sterba wrote: On Thu, Jan 08, 2015 at 01:36:21PM -0500, Zygo Blaxell wrote: Hmmm...it seems the handwaving about tail-packing that I was previously ignoring is important after all. A few quick tests with filefrag show that btrfs isn't doing

Re: price to pay for nocow file bit?

2015-01-08 Thread Zygo Blaxell
On Thu, Jan 08, 2015 at 05:53:21PM +0100, Lennart Poettering wrote: On Thu, 08.01.15 10:56, Zygo Blaxell (ce3g8...@umail.furryterror.org) wrote: On Wed, Jan 07, 2015 at 06:43:15PM +0100, Lennart Poettering wrote: Heya! Currently, systemd-journald's disk access patterns (appending

Re: price to pay for nocow file bit?

2015-01-08 Thread Zygo Blaxell
On Wed, Jan 07, 2015 at 06:43:15PM +0100, Lennart Poettering wrote: Heya! Currently, systemd-journald's disk access patterns (appending to the end of files, then updating a few pointers in the front) result in awfully fragmented journal files on btrfs, which has a pretty negative effect on

Re: BTRFS free space handling still needs more work: Hangs again (no complete lockups, just tasks stuck for some time)

2015-01-07 Thread Zygo Blaxell
On Wed, Jan 07, 2015 at 08:08:50PM +0100, Martin Steigerwald wrote: On Tuesday, 6 January 2015, 15:03:23, Zygo Blaxell wrote: ext3 has a related problem when it's nearly full: it will try to search gigabytes of block allocation bitmaps searching for a free block, which can result

Re: [PATCH 1/1] btrfs: Align EOF length to block in extent_same

2015-03-02 Thread Zygo Blaxell
I second this. I've seen the same behavior. Clone seems to have evolved a little further than extent-same knows about. e.g. there is code in the extent-same ioctl that tries to avoid doing a clone from within one inode to elsewhere in the same inode; however, the clone ioctl (which extent-same

extent-same ioctl hangs holding locks (v3.18.8)

2015-03-03 Thread Zygo Blaxell
The extent-same ioctl seems to have a locking bug. My test machines will run between 0 and 3 days before something gets locked and stays locked forever. In the dumps and logs below, 'btrsame' calls the btrfs extent-same ioctl on its arguments. As you can see from the trace, it is stuck in this

rsync vs. extent-same: this time with lock debugging (still v3.18.8)

2015-03-04 Thread Zygo Blaxell
rsync seems to get stuck just by reading the same file that extent-same is acting upon. Mar 4 21:35:08 sneezy kernel: [89798.758960] INFO: task rsync:7425 blocked for more than 1800 seconds. Mar 4 21:35:08 sneezy kernel: [89798.759007] Tainted: GW 3.18.8-zb64+ #1 Mar 4

Re: 3.19-rc5: Bug 91911: [REGRESSION] rm command hangs big time with deleting a lot of files at once

2015-01-25 Thread Zygo Blaxell
On Fri, Jan 23, 2015 at 06:29:40PM -0500, Zygo Blaxell wrote: On Fri, Jan 23, 2015 at 03:01:28PM +0100, Martin Steigerwald wrote: Hi! Anyone seen this? Reported as: https://bugzilla.kernel.org/show_bug.cgi?id=91911 I have seen something like this since 3.15. I've also seen

Re: Big disk space usage difference, even after defrag, on identical data

2015-04-13 Thread Zygo Blaxell
On Mon, Apr 13, 2015 at 04:06:39PM +0200, Gian-Carlo Pascutto wrote: On 13-04-15 07:06, Duncan wrote: So what can explain this? Where did the 66G go? Out of curiosity, does a balance on the actively used btrfs help? You mentioned defrag -v -r -clzo, but didn't use the -f (flush) or

Re: The FAQ on fsync/O_SYNC

2015-04-20 Thread Zygo Blaxell
On Mon, Apr 20, 2015 at 06:07:09AM +, Duncan wrote: 4.0 is out. There's reason people may want to stick one version back by default, to 3.19 currently, since it can take a few weeks for early reports to develop into a coherent problem, and sticking one stable series back allows for

Re: Carefully crafted BTRFS-image causes kernel to crash

2015-04-21 Thread Zygo Blaxell
On Tue, Apr 21, 2015 at 11:16:44AM +0800, Qu Wenruo wrote: Original Message Subject: Carefully crafted BTRFS-image causes kernel to crash From: Lukas Lueg lukas.l...@gmail.com To: linux-btrfs@vger.kernel.org Date: 2015-04-21 07:04 See also

[PATCH] btrfs-progs: report failure when resize ioctl fails

2015-04-21 Thread Zygo Blaxell
The BTRFS_IOC_RESIZE ioctl returns 0 on success, negative for POSIX errors, and positive for btrfs-specific errors. If resize fails with a btrfs-specific error, decode the error and report it. If we can't decode the error, report its numeric value so that the userspace tool is not instantly
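
The return-value convention described above (0 for success, negative for POSIX errors, positive for btrfs-specific errors) can be sketched as a small decoder. This is not the patch itself, which is C code in btrfs-progs; the function name and the `btrfs_errors` lookup table are hypothetical and exist only to illustrate the three cases, including the fallback of printing the raw number when a code is unknown.

```python
import os


def describe_resize_result(ret, btrfs_errors=None):
    """Classify a resize return value using the convention from the patch description.

    btrfs_errors is a hypothetical {code: message} table supplied by the caller.
    """
    if ret == 0:
        return "success"
    if ret < 0:
        # Negative values are POSIX errno codes.
        return f"POSIX error: {os.strerror(-ret)}"
    message = (btrfs_errors or {}).get(ret)
    if message is not None:
        return f"btrfs error: {message}"
    # Unknown positive code: report the number instead of failing silently.
    return f"btrfs error: unknown code {ret}"


print(describe_resize_result(0))
print(describe_resize_result(7))
print(describe_resize_result(1, {1: "example message (placeholder)"}))
```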

Re: The FAQ on fsync/O_SYNC

2015-04-20 Thread Zygo Blaxell
On Mon, Apr 20, 2015 at 10:13:47AM +0200, Gian-Carlo Pascutto wrote: On 20-04-15 06:27, Zygo Blaxell wrote: I'm curious as to whether +C has any effect on BTRFS's durability, too. I would expect it to be strictly equal to or worse than the CoW durability. In addition to the stuff

Re: The FAQ on fsync/O_SYNC

2015-04-19 Thread Zygo Blaxell
On Sun, Apr 19, 2015 at 10:31:02PM +0800, Craig Ringer wrote: On 19 April 2015 at 22:28, Martin Steigerwald mar...@lichtvoll.de wrote: On Sunday, 19 April 2015, 21:20:11, Craig Ringer wrote: Hi all Hi Craig, I'm looking into the advisability of running PostgreSQL on BTRFS, and

Re: Big disk space usage difference, even after defrag, on identical data

2015-04-12 Thread Zygo Blaxell
On Sat, Apr 11, 2015 at 09:59:50PM +0200, Gian-Carlo Pascutto wrote: Linux mozwell 3.19.0-trunk-amd64 #1 SMP Debian 3.19.1-1~exp1 (2015-03-08) x86_64 GNU/Linux btrfs-progs v3.19.1 I have a btrfs volume that's been in use for a week or 2. It has about ~560G of uncompressible data (video

Re: mlocate/updatedb and btrfs subvolume mounts

2015-04-12 Thread Zygo Blaxell
I generally set up that kind of service on bind mounts of its own, e.g.

    mount --bind / /.binds/root
    mount --bind /boot /.binds/root/boot
    mount --bind /home /.binds/root/home
    ...

repeat for everything else you care about, then I tell backup / indexing software to
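The bind-mount layout in this message could be generated rather than typed by hand. The sketch below only builds the `mount --bind` argument lists and does not run them; the helper name `bind_mount_commands` is hypothetical, and the default base directory is taken from the example paths in the message.

```python
import posixpath


def bind_mount_commands(mount_points, base="/.binds/root"):
    """Build `mount --bind` argument lists that mirror mount_points under base.

    Shallower paths come first, so "/" is bound before "/boot" or "/home"
    and the nested target directories are reachable when they are used.
    """
    def depth(path):
        return len([part for part in path.split("/") if part])

    commands = []
    for src in sorted(mount_points, key=depth):
        if src == "/":
            target = base
        else:
            target = posixpath.join(base, src.lstrip("/"))
        commands.append(["mount", "--bind", src, target])
    return commands


if __name__ == "__main__":
    for cmd in bind_mount_commands(["/", "/boot", "/home"]):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run` by a privileged caller; printing first makes it easy to review the layout before anything is mounted.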

Re: Btrfs and integration with GNU ++

2015-05-20 Thread Zygo Blaxell
On Tue, May 19, 2015 at 12:02:31PM -0600, Chris Murphy wrote: On Tue, May 19, 2015 at 1:24 AM, Russell Coker russ...@coker.com.au wrote: Do you have a reference for fsck on a ro mounted ext4 filesystem being dangerous? The standard behavior of Linux systems has been to fsck a ro mounted

Re: [PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same

2015-06-25 Thread Zygo Blaxell
On Thu, Jun 25, 2015 at 09:10:31AM -0400, Austin S Hemmelgarn wrote: On 2015-06-25 08:52, David Sterba wrote: On Wed, Jun 24, 2015 at 04:17:32PM -0400, Zygo Blaxell wrote: Is there any sane use case where we would _want_ EXTENT_SAME to change the mtime? We do a lot of work to make sure

Re: [PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same

2015-06-24 Thread Zygo Blaxell
On Tue, Jun 23, 2015 at 05:11:56PM +0200, David Sterba wrote: On Mon, Jun 22, 2015 at 03:47:42PM -0700, Mark Fasheh wrote: --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -87,7 +87,8 @@ struct btrfs_ioctl_received_subvol_args_32 { static int btrfs_clone(struct inode *src, struct

Re: [PATCH 5/5] btrfs: don't update mtime on deduped inodes

2015-06-27 Thread Zygo Blaxell
On Fri, Jun 26, 2015 at 02:01:01PM -0700, Mark Fasheh wrote: One issue users have reported is that dedupe changes mtime on files, resulting in tools like rsync thinking that their contents have changed when in fact the data is exactly the same. Clone still wants an mtime change, so we special
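Why an mtime change alone makes rsync treat a file as modified can be shown with a simplified model of its default quick check, which compares file size and modification time. This is a sketch of the idea, not rsync's actual code; `FileMeta` and `quick_check_differs` are hypothetical names.

```python
from collections import namedtuple

# Minimal stand-in for the metadata a size-and-mtime quick check looks at.
FileMeta = namedtuple("FileMeta", ["size", "mtime"])


def quick_check_differs(src: FileMeta, dst: FileMeta) -> bool:
    """Return True when a size-and-mtime comparison would flag the file.

    File contents are never read, so an mtime bump with identical data
    still counts as a change.
    """
    return src.size != dst.size or int(src.mtime) != int(dst.mtime)


if __name__ == "__main__":
    backup = FileMeta(size=4096, mtime=1435352461)
    # Same data and size, but dedupe bumped the mtime.
    deduped = FileMeta(size=4096, mtime=1435360000)
    print(quick_check_differs(deduped, backup))
```

Under this model a deduped file whose mtime moved is transferred again even though its data is byte-for-byte the same, which is the behaviour users reported.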

Re: [PATCH 5/5] btrfs: don't update mtime on deduped inodes

2015-06-29 Thread Zygo Blaxell
On Mon, Jun 29, 2015 at 10:52:41AM -0700, Mark Fasheh wrote: On Sat, Jun 27, 2015 at 05:44:28PM -0400, Zygo Blaxell wrote: On Fri, Jun 26, 2015 at 02:01:01PM -0700, Mark Fasheh wrote: One issue users have reported is that dedupe changes mtime on files, resulting in tools like rsync

Re: Discuss on inband dedup implement (Original strange data backref offset)

2015-07-21 Thread Zygo Blaxell
On Wed, Jul 22, 2015 at 09:49:52AM +0800, Qu Wenruo wrote: Change subject to reflect the core of the conversation. Zygo Blaxell wrote on 2015/07/21 18:14 -0400: On Tue, Jul 21, 2015 at 02:52:38PM +0800, Qu Wenruo wrote: Zygo Blaxell wrote on 2015/07/21 00:55 -0400: There's already a read

Re: Strange data backref offset?

2015-07-21 Thread Zygo Blaxell
On Tue, Jul 21, 2015 at 02:52:38PM +0800, Qu Wenruo wrote: Zygo Blaxell wrote on 2015/07/21 00:55 -0400: An in-band dedup can avoid some of these problems, especially if it intercepts writes before they make it to disk. There is no need for complicated on-disk-extent-splitting algorithms

Re: Strange data backref offset?

2015-07-19 Thread Zygo Blaxell
On Sat, Jul 18, 2015 at 07:35:31PM +0800, Liu Bo wrote: On Fri, Jul 17, 2015 at 10:38:32AM +0800, Qu Wenruo wrote: Hi all, While I'm developing a new btrfs inband dedup mechanism, I found btrfsck and the kernel behaving strangely for clone. [Reproducer] # mount /dev/sdc -t btrfs

Re: Strange data backref offset?

2015-07-20 Thread Zygo Blaxell
On Mon, Jul 20, 2015 at 10:24:38AM +0800, Qu Wenruo wrote: Zygo Blaxell wrote on 2015/07/19 03:23 -0400: But I'm a little concerned about the fact that extents get quite small (4K) and the increasing number of backref/file extents may affect performance. At the moment I just ignore any block

Re: [RFC PATCH] btrfs/ioctl.c: Prefer inode with lowest offset as source for clone

2015-10-22 Thread Zygo Blaxell
On Tue, Oct 20, 2015 at 04:29:46PM +0300, Timofey Titovets wrote: > For performance reasons, leaving data at the start of the disk is preferable > while deduping. > It might make sense for these reasons: > 1. Spinning rust - start of the disk is much faster > 2. Btrfs can deallocate empty data chunk from the

Possible FIEMAP deadlock?

2015-10-23 Thread Zygo Blaxell
This looks like a deadlock in FIEMAP: # cat /proc/version Linux version 4.1.8-zb64+ (root@buildhost) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP PREEMPT Tue Sep 22 00:54:04 EDT 2015 # cat /proc/6943/stack [] lock_extent_bits+0x1ad/0x200 []

Re: [PATCH 1/2] btrfs: extend balance filter limit to take minimum and maximum

2015-10-19 Thread Zygo Blaxell
On Mon, Oct 19, 2015 at 11:14:16AM +0200, David Sterba wrote: > On Fri, Oct 16, 2015 at 07:08:37PM -0400, Zygo Blaxell wrote: > > On Fri, Oct 16, 2015 at 06:50:08PM +0200, David Sterba wrote: > > > The 'limit' filter is underdesigned, it should have been a range for > >
