Unfortunately there seems to be an overproduction of rather
meaningless file system benchmarks... I've had the chance to use a
test system here and couldn't resist running a few benchmark
programs on them: bonnie++, tiobench, dbench and a few generic ones
(cp/rm/tar/etc...) on ext{234},
> I have tried BTRFS from Ubuntu 16.04 LTS for write intensive
> OLTP MySQL Workload.
This has a lot of interesting and mostly agreeable information:
https://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp
The main target of Btrfs is where one wants checksums and
occasional
>> My system is or seems to be running out of disk space but I
>> can't find out how or why. [ ... ]
>> Filesystem      Size  Used Avail Use% Mounted on
>> /dev/sda3 28G 26G 2.1G 93% /
[ ... ]
> So from chunk level, your fs is already full. And balance
> won't succeed since
[ ... ]
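(For anyone finding this in the archives: the chunk-level allocation
being referred to is not visible in plain 'df' output; it can be
inspected with, for example:

  # btrfs filesystem usage /
  # btrfs filesystem df /

where "Device allocated" versus "Device size", and the per-profile
Data/Metadata lines, show how much space is already tied up in
chunks.)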
> The issue isn't total size, it's the difference between total
> size and the amount of data you want to store on it. and how
> well you manage chunk usage. If you're balancing regularly to
> compact chunks that are less than 50% full, [ ... ] BTRFS on
> 16GB disk images before with
[ ... ]
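(The regular balancing to compact mostly-empty chunks mentioned above
is normally done with a 'usage' filter, e.g. with 50% as the
threshold:

  # btrfs balance start -dusage=50 -musage=50 /mountpoint

which rewrites only chunks that are no more than about half full,
returning the rest to unallocated space.)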
> a ten disk raid1 using 7.2k 3 TB SAS drives
Those are really low IOPS-per-TB devices, but going with SAS is a
good choice, as they will have SCT/ERC.
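(SCT ERC support and the current error-recovery timeout can be
checked, and for example a 7.0 second limit set, with smartctl:

  # smartctl -l scterc /dev/sdX
  # smartctl -l scterc,70,70 /dev/sdX

the two values are read and write timeouts in tenths of a second.)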
> and used aio to test IOPS rates. I was surprised to measure
> 215 read and 72 write IOPS on the clean new filesystem.
For that you really want
>>> On Mon, 27 Feb 2017 22:11:29 +, p...@btrfs.list.sabi.co.uk (Peter
>>> Grandi) said:
> [ ... ]
>> I have a 6-device test setup at home and I tried various setups
>> and I think I got rather better than that.
[ ... ]
> That's a range of 700-1300 4KiB
[ ... ]
> I have a 6-device test setup at home and I tried various setups
> and I think I got rather better than that.
* 'raid1' profile:
soft# btrfs fi df /mnt/sdb5
Data, RAID1:
> [ ... ] slaps together a large storage system in the cheapest
> and quickest way knowing that while it is mostly empty it will
> seem very fast regardless and therefore to have awesome
> performance, and then the "clever" sysadm disappears surrounded
> by a halo of glory before the storage
> [ ... ] reminded of all the cases where someone left me to
> decatastrophize a storage system built on "optimistic"
> assumptions.
In particular when some "clever" sysadm with a "clever" (or
dumb) manager slaps together a large storage system in the
cheapest and quickest way knowing that while
This is going to be long because I am writing something detailed
hoping pointlessly that someone in the future will find it by
searching the list archives while doing research before setting
up a new storage system, and they will be the kind of person
that tolerates reading messages longer than
> I glazed over at “This is going to be long” … :)
>> [ ... ]
Not only that, you also top-posted while quoting it pointlessly
in its entirety, to the whole mailing list. Well played :-).
> [ ... ] In each filesystem subdirectory are incremental
> snapshot subvolumes for that filesystem. [ ... ] The scheme
> is something like this:
> /backup///
BTW hopefully this does not amount to too many subvolumes in
the '.../backup/' volume, because that can create complications,
where
[ ... ]
> BUT if i take a snapshot from the system, and want to transfer
> it to the external HD, i can not set a parent subvolume,
> because there isn't any.
Questions like this are based on incomplete understanding of
'send' and 'receive', and on IRC, user "darkling" explained it
fairly well:
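(That explanation is not reproduced in this excerpt; the gist, as a
sketch with hypothetical snapshot names: the first transfer has to be
a full 'send', and later ones can be incremental with '-p' provided
the parent snapshot still exists, read-only, on *both* sides:

  # btrfs send /snaps/home.1 | btrfs receive /mnt/ext/snaps
  # btrfs send -p /snaps/home.1 /snaps/home.2 | btrfs receive /mnt/ext/snaps
)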
>
> I’ve glazed over on “Not only that …” … can you make youtube
> video of that :)) [ ... ] It’s because I’m special :*
Well played again, that's a fairly credible impersonation of a
node.js/mongodb developer :-).
> On a real note thank’s [ ... ] to much of open source stuff is
> based on short
>> As a general consideration, shrinking a large filetree online
>> in-place is an amazingly risky, difficult, slow operation and
>> should be a last desperate resort (as apparently in this case),
>> regardless of the filesystem type, and expecting otherwise is
>> "optimistic".
> The way btrfs is
>> My guess is that very complex risky slow operations like that are
>> provided by "clever" filesystem developers for "marketing" purposes,
>> to win box-ticking competitions. That applies to those system
>> developers who do know better; I suspect that even some filesystem
>> developers are
>>> My guess is that very complex risky slow operations like
>>> that are provided by "clever" filesystem developers for
>>> "marketing" purposes, to win box-ticking competitions.
>>> That applies to those system developers who do know better;
>>> I suspect that even some filesystem developers
> Can you try to first dedup the btrfs volume? This is probably
> out of date, but you could try one of these: [ ... ] Yep,
> that's probably a lot of work. [ ... ] My recollection is that
> btrfs handles deduplication differently than zfs, but both of
> them can be very, very slow
But the big
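(The tool list is elided above; purely as an example of the kind of
offline deduplicator meant, duperemove is commonly run along the
lines of:

  # duperemove -dhr /btrfs_vol/somedir

and yes, on a large volume that can take a very long time.)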
>>> The way btrfs is designed I'd actually expect shrinking to
>>> be fast in most cases. [ ... ]
>> The proposed "move whole chunks" implementation helps only if
>> there are enough unallocated chunks "below the line". If regular
>> 'balance' is done on the filesystem there will be some, but
>> [ ... ] CentOS, Redhat, and Oracle seem to take the position
>> that very large data subvolumes using btrfs should work
>> fine. But I would be curious what the rest of the list thinks
>> about 20 TiB in one volume/subvolume.
> To be sure I'm a biased voice here, as I have multiple
>
> [ ... ] what the significance of the xargs size limits of
> btrfs might be. [ ... ] So what does it mean that btrfs has a
> higher xargs size limit than other file systems? [ ... ] Or
> does the lower capacity for argument length for hfsplus
> demonstrate it is the superior file system for
> How can I attempt to rebuild the metadata, with a treescan or
> otherwise?
Unfortunately I don't know, as far as the backrefs are concerned.
>> In general metadata in Btrfs is fairly intricate and metadata
>> block loss is pretty fatal, that's why metadata should most
>> times be redundant as in 'dup' or 'raid1' or
> Read error at byte 0, while reading 3975 bytes: Input/output error
Bad news. That means that probably the disk is damaged and
further issues may happen.
> corrected errors: 0, uncorrectable errors: 2, unverified errors: 0
Even worse news.
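(Two commands that are useful at this point, as a sketch: per-device
error counters, and, on a single-device filesystem whose metadata is
still the 'single' profile, adding metadata redundancy after the
fact:

  # btrfs device stats /mnt
  # btrfs balance start -mconvert=dup /mnt

neither will bring back metadata that is already lost, of course.)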
> Incorrect local backref count on
> [ ... ] I tried to use eSATA and ext4 first, but observed
> silent data corruption and irrecoverable kernel hangs --
> apparently, SATA is not really designed for external use.
SATA works for external use, eSATA works well, but what really
matters is the chipset of the adapter card.
In my
>> Approximately 16 hours ago I've run a script that deleted
>> >~100 snapshots and started quota rescan on a large
>> USB-connected btrfs volume (5.4 of 22 TB occupied now).
That "USB-connected is a rather bad idea. On the IRC channel
#Btrfs whenever someone reports odd things happening I ask
[ ... ]
>>> $ D='btrfs f2fs gfs2 hfsplus jfs nilfs2 reiserfs udf xfs'
>>> $ find $D -name '*.ko' | xargs size | sed 's/^ *//;s/ .*\t//g'
>>> text    filename
>>> 832719 btrfs/btrfs.ko
>>> 237952 f2fs/f2fs.ko
>>> 251805 gfs2/gfs2.ko
>>> 72731 hfsplus/hfsplus.ko
>>> 171623
> [ ... ] This post is way too long [ ... ]
Many thanks for your report, it is really useful, especially the
details.
> [ ... ] using rsync with --link-dest to btrfs while still
> using rsync, but with btrfs subvolumes and snapshots [1]. [
> ... ] Currently there's ~35TiB of data present on the
>> Consider the common case of a 3-member volume with a 'raid1'
>> target profile: if the sysadm thinks that a drive should be
>> replaced, the goal is to take it out *without* converting every
>> chunk to 'single', because with 2-out-of-3 devices half of the
>> chunks will still be fully
[ ... on the difference between number of devices and length of
a chunk-stripe ... ]
> Note: possibilities get even more interesting with a 4-device
> volume with 'raid1' profile chunks, and similar case involving
> other profiles than 'raid1'.
Consider for example a 4-device volume with 2
>> What makes me think that "unmirrored" 'raid1' profile chunks
>> are "not a thing" is that it is impossible to remove
>> explicitly a member device from a 'raid1' profile volume:
>> first one has to 'convert' to 'single', and then the 'remove'
>> copies back to the remaining devices the 'single'
> [ ... ] Meanwhile, the problem as I understand it is that at
> the first raid1 degraded writable mount, no single-mode chunks
> exist, but without the second device, they are created. [
> ... ]
That does not make any sense, unless there is a fundamental
mistake in the design of the 'raid1'
[ ... ]
>>> I've got a mostly inactive btrfs filesystem inside a virtual
>>> machine somewhere that shows interesting behaviour: while no
>>> interesting disk activity is going on, btrfs keeps
>>> allocating new chunks, a GiB at a time.
[ ... ]
> Because the allocator keeps walking forward every
[ ... ]
> grep 'model name' /proc/cpuinfo | sort -u
> model name : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
Good, contemporary CPU with all accelerations.
> The sda device is a hardware RAID5 consisting of 4x8TB drives.
[ ... ]
> Strip Size : 256 KB
So the full RMW data
> [ ... ] It is hard for me to see a speed issue here with
> Btrfs: for comparison I have done a simple test with a both a
> 3+1 MD RAID5 set with a 256KiB chunk size and a single block
> device on "contemporary" 1T/2TB drives, capable of sequential
> transfer rates of 150-190MB/s: [ ... ]
The
[ ... ]
> Also added:
Feeling very generous :-) today, adding these too:
soft# mkfs.btrfs -mraid10 -draid10 -L test5 /dev/sd{b,c,d,e}3
[ ... ]
soft# mount -t btrfs -o commit=10,compress-force=zlib /dev/sdb3 /mnt/test5
soft# rm -f /mnt/test5/testfile
soft# /usr/bin/time dd
[ ... ]
> This is the "storage for beginners" version, what happens in
> practice however depends a lot on specific workload profile
> (typical read/write size and latencies and rates), caching and
> queueing algorithms in both Linux and the HA firmware.
To add a bit of slightly more advanced
>> [ ... ] a "RAID5 with 128KiB writes and a 768KiB stripe
>> size". [ ... ] several back-to-back 128KiB writes [ ... ] get
>> merged by the 3ware firmware only if it has a persistent
>> cache, and maybe your 3ware does not have one,
> KOS: No I don't have persistent cache. Only the 512 Mb cache
> [ ... ] announce loudly and clearly to any potential users, in
> multiple places (perhaps a key announcement in a few places
> and links to that announcement from many places,
https://btrfs.wiki.kernel.org/index.php/Gotchas#Having_many_subvolumes_can_be_very_slow
> ... DO expect to first have
> Old news is that systemd-journald journals end up pretty
> heavily fragmented on Btrfs due to COW.
This has been discussed before in detail indeed here, but also
here: http://www.sabi.co.uk/blog/15-one.html?150203#150203
> While journald uses chattr +C on journal files now, COW still
>
> [ ... ] And that makes me wonder whether metadata
> fragmentation is happening as a result. But in any case,
> there's a lot of metadata being written for each journal
> update compared to what's being added to the journal file. [
> ... ]
That's the "wandering trees" problem in COW filesystems,
>> The gotcha though is there's a pile of data in the journal
>> that would never make it to rsyslogd. If you use journalctl
>> -o verbose you can see some of this.
> You can send *all the info* to rsyslogd via imjournal
> http://www.rsyslog.com/doc/v8-stable/configuration/modules/imjournal.html
> [ ... ] these extents are all over the place, they're not
> contiguous at all. 4K here, 4K there, 4K over there, back to
> 4K here next to this one, 4K over there...12K over there, 500K
> unwritten, 4K over there. This seems not so consequential on
> SSD, [ ... ]
Indeed there were recent
> [ ... ] Instead, you can use raw files (preferably sparse unless
> there's both nocow and no snapshots). Btrfs does natively everything
> you'd gain from qcow2, and does it better: you can delete the master
> of a cloned image, deduplicate them, deduplicate two unrelated images;
> you can turn
>> [ ... ] these extents are all over the place, they're not
>> contiguous at all. 4K here, 4K there, 4K over there, back to
>> 4K here next to this one, 4K over there...12K over there, 500K
>> unwritten, 4K over there. This seems not so consequential on
>> SSD, [ ... ]
> Indeed there were recent
> I am stuck with a problem of btrfs slow performance when using
> compression. [ ... ]
That to me looks like an issue with speed, not performance, and
in particular a PEBCAK issue.
As to high CPU usage, when you find a way to do both compression
and checksumming without using much CPU time,
In addition to my previous "it does not happen here" comment, if
someone is reading this thread, there are some other interesting
details:
> When the compression is turned off, I am able to get the
> maximum 500-600 mb/s write speed on this disk (raid array)
> with minimal cpu usage.
No details
> Peter, I don't think the filefrag is showing the correct
> fragmentation status of the file when the compression is used.
As reported in a previous message, the output of 'filefrag -v'
can be used to see what is going on:
filefrag /mnt/sde3/testfile
/mnt/sde3/testfile: 49287
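(With compression, the plain 'filefrag' summary above is misleading,
because each compressed extent covers at most 128KiB and each is
counted separately; the verbose form shows the individual extents and
flags the compressed ones as 'encoded':

  filefrag -v /mnt/sde3/testfile
)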
> We use the crcs to catch storage gone wrong, [ ... ]
And that's an opportunistically feasible idea given that current
CPUs can do that in real-time.
> [ ... ] It's possible to protect against all three without COW,
> but all solutions have their own tradeoffs and this is the setup
> we chose.
[ ... ]
>>> Snapshots work fine with nodatacow, each block gets CoW'ed
>>> once when it's first written to, and then goes back to being
>>> NOCOW.
>>> The only caveat is that you probably want to defrag either
>>> once everything has been rewritten, or right after the
>>> snapshot.
>> I thought
[ ... ]
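(For context: nodatacow for things like VM images and databases is
usually arranged per directory, so that newly created files inherit
it; a sketch with a hypothetical path:

  # mkdir /var/lib/images
  # chattr +C /var/lib/images
  # lsattr -d /var/lib/images

the 'C' attribute only takes effect on files created empty inside
that directory, not on existing data.)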
> But I've talked to some friend at the local super computing
> centre and they have rather general issues with CoW at their
> virtualisation cluster.
Amazing news! :-)
> Like SUSE's snapper making many snapshots leading the storage
> images of VMs apparently to explode (in terms of
>>> I've one system where a single kworker process is using 100%
>>> CPU sometimes a second process comes up with 100% CPU
>>> [btrfs-transacti]. [ ... ]
>> [ ... ] 1413 Snapshots. I'm deleting 50 of them every night. But
>> btrfs-cleaner process isn't running / consuming CPU currently.
Reminder
[ ... ]
It is beneficial to not have snapshots in-place. With a local
directory of snapshots, [ ... ]
Indeed and there is a fair description of some options for
subvolume nesting policies here which may be interesting to the
original poster:
> How do I find the root filesystem of a subvolume?
> Example:
> root@fex:~# df -T
> Filesystem Type 1K-blocks Used Available Use% Mounted on
> -              -    1073740800 104244552 967773976  10% /local/.backup/home
[ ... ]
> I know, the root filesystem is /local,
That question is
[ ... ]
>> There is no fixed relationship between the root directory
>> inode of a subvolume and the root directory inode of any
>> other subvolume or the main volume.
> Actually, there is, because it's inherently rooted in the
> hierarchy of the volume itself. That root inode for the
>
> So, still: What is the problem with user_subvol_rm_allowed?
As usual, it is complicated: mostly that while subvol creation
is very cheap, subvol deletion can be very expensive. But then
so can be creating many snapshots, as in this:
https://www.spinics.net/lists/linux-btrfs/msg62760.html
> This is a vanilla SLES12 installation: [ ... ] Why does SUSE
> ignore this "not too many subvolumes" warning?
As in many cases with Btrfs "it's complicated" because of the
interaction of advanced features among themselves and the chosen
implementation and properties of storage; anisotropy
> I have a btrfs filesystem mounted at /btrfs_vol/ Every N
> minutes, I run bedup for deduplication of data in /btrfs_vol
> Inside /btrfs_vol, I have several subvolumes (consider this as
> home directories of several users) I have set individual
> qgroup limits for each of these subvolumes. [ ...
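(The per-subvolume limits described are the qgroup kind, set roughly
like this, with hypothetical paths and sizes:

  # btrfs quota enable /btrfs_vol
  # btrfs qgroup limit 10G /btrfs_vol/userA
  # btrfs qgroup show -re /btrfs_vol
)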
> [ ... ] This will make some filesystems mostly RAID1, negating
> all space savings of RAID5, won't it? [ ... ]
RAID5/RAID6/... don't merely save space: more precisely, they
trade lower resilience and a more anisotropic, smaller performance
envelope for lower redundancy (= space savings).
> I intend to provide different "views" of the data stored on
> btrfs subvolumes. e.g. mount a subvolume in location A rw;
> and ro in location B while also overwriting uids, gids, and
> permissions. [ ... ]
That's not how UNIX/Linux permissions and ACLs are supposed to
work, perhaps you should
> Trying to peg down why I have one server that has
> btrfs-transacti pegged at 100% CPU for most of the time.
Too little information. Is IO happening at the same time? Is
compression on? Deduplicated? Lots of subvolumes? SSD? What kind
of workload and file size/distribution profile?
Typical
> Case #1
> 2x 7200 rpm HDD -> md raid 1 -> host BTRFS rootfs -> qemu qcow2 storage
> -> guest BTRFS filesystem
> SQL table row insertions per second: 1-2
"Doctor, if I stab my hand with a fork it hurts a lot: can you
cure that?"
> Case #2
> 2x 7200 rpm HDD -> md raid 1 -> host BTRFS rootfs ->
[ ... ]
Case #1
2x 7200 rpm HDD -> md raid 1 -> host BTRFS rootfs
-> qemu qcow2 storage -> guest BTRFS filesystem
SQL table row insertions per second: 1-2
Case #2
2x 7200 rpm HDD -> md raid 1 -> host BTRFS rootfs
-> qemu raw storage -> guest EXT4 filesystem
[ ... ]
> I can delete normal subvolumes but not the readonly snapshots:
It is because of ordinary permissions for both subvolumes and
snapshots:
tree$ btrfs sub create /fs/sda7/sub
Create subvolume '/fs/sda7/sub'
tree$ chmod a-w /fs/sda7/sub
tree$ btrfs sub del /fs/sda7/sub
> [ ... ] mounted with option user_subvol_rm_allowed [ ... ]
> root can delete this snapshot, but not the user. Why? [ ... ]
Ordinary permissions still apply both to 'create' and 'delete':
tree$ sudo mkdir /fs/sda7/dir
tree$ btrfs sub create /fs/sda7/dir/sub
ERROR: cannot access
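(The flip side, as a sketch: with the volume mounted with
'user_subvol_rm_allowed' and the user having write permission both
on the subvolume and on the directory containing it, the same
operations succeed for the ordinary user:

  # mount -o remount,user_subvol_rm_allowed /fs/sda7
  $ btrfs sub create /fs/sda7/sub2
  $ btrfs sub del /fs/sda7/sub2
)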
> A few years ago I tried to use a RAID1 mdadm array of a SATA
> and a USB disk, which lead to strange error messages and data
> corruption.
That's common, quite a few reports of similar issues in previous
entries in this mailing list and for many other filesystems.
> I did some searching back
>> TL;DR: ran into some btrfs errors and weird behaviour, but
>> things generally seem to work. Just posting some details in
>> case it helps devs or other users. [ ... ] I've run into a
>> btrfs error trying to do a -j8 build of android on a btrfs
>> filesystem exported over NFSv3. [ ... ]
I
[ ... ]
> [233787.921018] Call Trace:
> [233787.921031] ? btrfs_merge_delayed_refs+0x62/0x550 [btrfs]
> [233787.921039] __btrfs_run_delayed_refs+0x6f0/0x1380 [btrfs]
> [233787.921047] btrfs_run_delayed_refs+0x6b/0x250 [btrfs]
> [233787.921054] btrfs_write_dirty_block_groups+0x158/0x390 [btrfs]
> How can I test if a subvolume is a snapshot? [ ... ]
This question is based on the assumption that "snapshot" is a
distinct type of subvolume and not just an operation that
creates a subvolume with reflinked contents.
Unfortunately Btrfs does indeed make snapshots a distinct type
of
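(One practical check, assuming a reasonably recent btrfs-progs:
'btrfs subvolume show' prints a "Parent UUID" field, which is "-" for
a subvolume that was not created as a snapshot of something else:

  # btrfs subvolume show /path/to/subvol | grep -i 'parent uuid'
)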
> As I am writing some documentation about creating snapshots:
> Is there a generic name for both volume and subvolume root?
Yes, it is from the UNIX side 'root directory' and from the
Btrfs side 'subvolume'. Like some other things in Btrfs, its
terminology is often inconsistent, but "volume"
> i run a few performance tests comparing mdadm, hardware raid
> and the btrfs raid.
Fantastic beginning already! :-)
> I noticed that the performance
I have seen over the years a lot of messages like this where
there is a wanton display of amusing misuses of terminology, of
which the misuse of
> I am trying to figure out what "top level" means in the
> output of "btrfs sub list"
The terminology (and sometimes the detailed behaviour) of Btrfs
is not extremely consistent, I guess because of permissive
editorship of the design, in a "let 1000 flowers bloom" sort
of fashion so that does
>> Using hundreds or thousands of snapshots is probably fine
>> mostly.
As I mentioned previously, with a link to the relevant email
describing the details, the real issue is reflinks/backrefs.
Usually subvolumes and snapshots involve them.
> We find that typically apt is very slow on a machine
>>> [ ... ] Currently without any ssds i get the best speed with:
>>> - 4x HW Raid 5 with 1GB controller memory of 4TB 3,5" devices
>>> and using btrfs as raid 0 for data and metadata on top of
>>> those 4 raid 5. [ ... ] the write speed is not as good as i
>>> would like - especially for random
>> [ ... ] Currently the write speed is not as good as i would
>> like - especially for random 8k-16k I/O. [ ... ]
> [ ... ] So this 60TB is then 20 4TB disks or so and the 4x 1GB
> cache is simply not very helpful I think. The working set
> doesn't fit in it I guess. If there is mostly single or
> [ ... ] I ran "btrfs balance" and then it started working
> correctly again. It seems that a btrfs filesystem if left
> alone will eventually get fragmented enough that it rejects
> writes [ ... ]
Free space will get fragmented, because Btrfs has a 2-level
allocator scheme (chunks within
> [ ... ] - needed volume size is 60TB
I wonder how long that takes to 'scrub', 'balance', 'check',
'subvolume delete', 'find', etc.
> [ ... ] 4x HW Raid 5 with 1GB controller memory of 4TB 3,5"
> devices and using btrfs as raid 0 for data and metadata on top
> of those 4 raid 5. [ ... ] the
>> I forget sometimes that people insist on storing large
>> volumes of data on unreliable storage...
Here obviously "unreliable" is used in the sense of storage that
can work incorrectly, not in the sense of storage that can fail.
> In my opinion the unreliability of the storage is the exact
>
> [ ... ] After all, btrfs would just have to discard one copy
> of each chunk. [ ... ] One more thing that is not clear to me
> is the replication profile of a volume. I see that balance can
> convert chunks between profiles, for example from single to
> raid1, but I don't see how the default
[ ... ]
>> Oh please, please a bit less silliness would be welcome here.
>> In a previous comment on this tedious thread I had written:
>> > If the block device abstraction layer and lower layers work
>> > correctly, Btrfs does not have problems of that sort when
>> > adding new devices;
[ ... ]
>> are USB drives really that unreliable [ ... ]
[ ... ]
> There are similar SATA chips too (occasionally JMicron and
> Marvell for example are somewhat less awesome than they could
> be), and practically all Firewire bridge chips of old "lied" a
> lot [ ... ]
> That plus Btrfs is
> [ ... ] However, the disappearance of the device doesn't get
> propagated up to the filesystem correctly,
Indeed, sometimes it does, sometimes it does not, in part
because of chipset bugs, in part because the USB protocol
signaling side does not handle errors well even if the chipset
were bug
[ ... ]
>>> Oh please, please a bit less silliness would be welcome here.
>>> In a previous comment on this tedious thread I had written:
> If the block device abstraction layer and lower layers work
> correctly, Btrfs does not have problems of that sort when
> adding new devices;
> [ ... ] when writes to a USB device fail due to a temporary
> disconnection, the kernel can actually recognize that a write
> error happened. [ ... ]
Usually, but who knows? Maybe half the transfer gets written; maybe
the data gets written to the wrong address; maybe stuff gets
written but failure
>>> If the underlying protocol doesn't support retry and there
>>> are some transient errors happening somewhere in our IO
>>> stack, we'd like to give an extra chance for IO.
>> A limited number of retries may make sense, though I saw some
>> long stalls after retries on bad disks.
Indeed! One
> [ ... ] btrfs incorporates disk management which is actually a
> version of md layer, [ ... ]
As far as I know Btrfs has no disk management, and was wisely
designed without any, just like MD: Btrfs volumes and MD sets
can be composed from "block devices", not disks, and block
devices are quite
"Duncan"'s reply is slightly optimistic in parts, so some
further information...
[ ... ]
> Basically, at this point btrfs doesn't have "dynamic" device
> handling. That is, if a device disappears, it doesn't know
> it.
That's just the consequence of what is a completely broken
conceptual
>> I haven't seen that, but I doubt that it is the radical
>> redesign of the multi-device layer of Btrfs that is needed to
>> give it operational semantics similar to those of MD RAID,
>> and that I have vaguely described previously.
> I agree that btrfs volume manager is incomplete in view of
>
>> The fact is, the only cases where this is really an issue is
>> if you've either got intermittently bad hardware, or are
>> dealing with external
> Well, the RAID1+ is all about the failing hardware.
>> storage devices. For the majority of people who are using
>> multi-device setups, the
[ ... ]
> The advantage of writing single chunks when degraded, is in
> the case where a missing device returns (is readded,
> intact). Catching up that device with the first drive, is a
> manual but simple invocation of 'btrfs balance start
> -dconvert=raid1,soft -mconvert=raid1,soft' The
[ ... ]
> The poor performance has existed from the beginning of using
> BTRFS + KDE + Firefox (almost 2 years ago), at a point when
> very few snapshots had yet been created. A comparison system
> running similar hardware as well as KDE + Firefox (and LVM +
> EXT4) did not have the performance
> Another one is to find the most fragmented files first or all
> files of at least 1M with at least say 100 fragments as in:
> find "$HOME" -xdev -type f -size +1M -print0 | xargs -0 filefrag \
> | perl -n -e 'print "$1\0" if (m/(.*): ([0-9]+) extents/ && $2 > 100)' \
> | xargs -0 btrfs fi
>> The issue is that updatedb by default will not index bind
>> mounts, but by default Fedora and probably other distros
>> put /home on a subvolume and then mount that subvolume, which
>> is in effect a bind mount.
>
> So the issue isn't /home being btrfs (as you said in the
> subject), but
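(For readers hitting this with mlocate: the relevant knob is
PRUNE_BIND_MOUNTS in /etc/updatedb.conf; setting it to "no" makes
updatedb descend into bind mounts again, and therefore into a /home
mounted from a subvolume:

  PRUNE_BIND_MOUNTS = "no"
)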
> When defragmenting individual files on a BTRFS filesystem with
> COW, I assume reflinks between that file and all snapshots are
> broken. So if there are 30 snapshots on that volume, that one
> file will suddenly take up 30 times more space... [ ... ]
Defragmentation works by effectively making
> I'm following up on all the suggestions regarding Firefox performance
> on BTRFS. [ ... ]
I haven't read that yet, so maybe I am missing something, but I
use Firefox with Btrfs all the time and I haven't got issues.
[ ... ]
> 1. BTRFS snapshots have proven to be too useful (and too important
> I formatted the / partition with Btrfs again and could restore
> the files from a backup. Everything seems to be there, I can
> mount the Btrfs manually. [ ... ] But SLES finds from where I
> don't know a UUID (see screenshot). This UUID is commented out
> in fstab and replaced by
>> But it could simply be that you have forgotten to refresh the
>> 'initramfs' with 'mkinitrd' after modifying the '/etc/fstab'.
> I finally managed it. I'm pretty sure having changed
> /boot/grub/menu.lst, but somehow changes got lost/weren't
> saved ?
So the next thing to check would indeed
> I am trying to better understand how the cleaner kthread
> (btrfs-cleaner) impacts foreground performance, specifically
> during snapshot deletion. My experience so far has been that
> it can be dramatically disruptive to foreground I/O.
That's such a warmly innocent and optimistic question!
> When testing Btrfs with fio 4k random write,
That's an exceptionally narrowly defined workload. Also it is
narrower than that, because it must be without 'fsync' after
each write, or else there would be no accumulation of dirty
blocks in memory at all.
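(A sketch of the kind of fio job meant, with hypothetical sizes and
paths; adding 'fsync=1' forces a flush after every write and changes
the picture completely, because dirty data can no longer accumulate:

  fio --name=randw --filename=/mnt/test/fio.dat --size=1G \
      --rw=randwrite --bs=4k --ioengine=libaio --iodepth=16
)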
> I found that volume with smaller free