r.
Disclaimer: all the above statements relate to the concept and
understanding of quotas, not to be confused with qgroups.
--
Tomasz Pala
other problem)
I am (more than before) aware of what btrfs quotas are not.
So, my only expectation (apart from worldwide peace and other
unrealistic ones) would be to stop using "quotas", "subvolume quotas"
and "qgroups" interchangeably in the btrfs context, as IMvHO these are
not the plain, well-known "quotas".
--
Tomasz Pala
ommand btrfs qgroup(8)"
- they are the same... just completely different from traditional "quotas".
My suggestion would be to completely remove the standalone word "quota"
from the btrfs documentation - there is no plain "quota", only "subvolume
quotas" or "qgroups" are supported.
--
Tomasz Pala
one day without any known reason), misnamed ...and
not reflecting anything valuable, unless the problems with extent
fragmentation are already resolved somehow?
So IMHO the current quotas are:
- not discoverable for the user (a shared->exclusive transition of my data
  caused by someone else's action - see the commands below),
- not reliable for the sysadmin (an offensive write pattern by any user can
  allocate virtually any amount of space despite the quotas).
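For what it's worth, the shared/exclusive numbers can at least be inspected
manually; a minimal sketch (the mount point is just an example):

  btrfs quota enable /mnt        # needed once, before qgroup numbers are tracked
  btrfs qgroup show -pcre /mnt   # referenced/exclusive usage, parent/child relations, limits

But nothing notifies the user when his "shared" data silently becomes
"exclusive" because someone else dropped the other reference - you only see
it the next time you look.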
--
Tomasz Pala
fs should account half of the
data, and twice the data in the opposite scenario (like the "dup" profile
on a single-drive filesystem).
In short: the values representing quotas are user-oriented ("the numbers
one bought"), not storage-oriented ("the space they actually occupy").
Even with the current approach it should be possible to interlace
defragmentation with some kind of naive deduplication; "naive" in the
sense of comparing blocks only within the same in-subvolume paths.
--
Tomasz Pala
On Sun, Feb 18, 2018 at 10:28:02 +0100, Tomasz Pala wrote:
> I've already noticed this problem on February 10th:
> [btrfs-progs] coreutils-like -i parameter, splitting permissions for various
> tasks
>
> In short: not possible. Regular user can only create subvolumes.
Not
ld fail miserably.
> After few years not using btrfs (because previously was quite
> unstable) It is really good to see that now I'm not able to crash it.
It's not crashing with the LTS 4.4 and 4.9 kernels; many reports of various
crashes in 4.12, 4.14 and 4.15 were posted here. It is real
sibly hostile write patterns (like /home) as nocow.
Actually, if you do not use compression and don't need checksums of data
blocks, you may want to mount all the btrfs filesystems with nocow by default.
This way the quotas would be more accurate (no fragmentation _between_
snapshots) and you
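A minimal sketch of the two ways to get nocow today (paths are examples,
the option and attribute are the standard ones):

  mount -o nodatacow /dev/sdX /mnt               # whole filesystem: implies no data checksums, no compression
  mkdir /mnt/hostile && chattr +C /mnt/hostile   # per-directory: files created inside inherit NOCOW

Note that chattr +C only takes effect on empty files, or on directories for
files created afterwards.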
backup-admin
with access to all the subvolumes, or a maintenance-admin that could
scrub or rebalance volumes. For backward compatibility, these tools
could be invoked via the 'btrfs' wrapper binary.
--
Tomasz Pala
is planned:
http://0pointer.net/blog/projects/stateless.html
--
Tomasz Pala
look like.
Hard to agree with someone who refuses to do _anything_.
You can choose to follow whatever MD, LVM or ZFS do, invent something
totally different, write a custom daemon or put the timeout logic inside
the kernel itself. It doesn't matter. You know the ecosystem - it is
udev that must be
; profiles degraded using OpenRC without needing anything more than adding
> rootflags=degraded to the kernel parameters must be a fluke then...
We are talking about automatic fallback after timeout, not manually
casting any magic spells! Since OpenRC doesn't read rootflags at all:
grep -iE 'rootflags|degraded|btrfs' openrc/**/*
it won't support this without some extra code.
> The thing is, it primarily breaks if there are hardware issues,
> regardless of the init system being used, but at least the other init
> systems _give you an error message_ (even if it's really the kernel
> spitting it out) instead of just hanging there forever with no
> indication of what's going on like systemd does.
If your systemd waits forever and you get no error messages, report a bug
to your distro maintainer, as he is probably the one to blame for fixing
what wasn't broken.
--
Tomasz Pala
On Tue, Jan 30, 2018 at 16:09:50 +0100, Tomasz Pala wrote:
>> BCP for over a
>> decade has been to put multipathing at the bottom, then crypto, then
>> software RAID, than LVM, and then whatever filesystem you're using.
>
> Really? Let's enumerate some ca
systemd stepped in for some of these is that nobody else could
introduce and force a Linux-wide consensus. And if anyone did succeed,
there would be some Austins blaming them for 'turning the good old
trashyard into a coherent de facto standard.'
> In this particular case, you don't need
won't
be accepted in systemd upstream, especially because it requires the
current udev rule to be slightly changed.
--
Tomasz Pala
things do try to mount
> filesystems without calling a mount helper, most notably the kernel when
> it mounts the root filesystem on boot if you're not using an initramfs).
> All in all, this type of thing gets out of hand _very_ fast.
You need to think about the two separately:
1.
Just change the BTRFS_IOC_DEVICES_READY handler to always return READY.
>>
> Or maybe we should just remove it completely, because checking it _IS
> WRONG_,
That's right. But before committing upstream, check for the consequences.
I've already described a few today, pointed at the source and gave some
possible alternative solutions.
> which is why no other init system does it, and in fact no
Other init systems either fail at mounting degraded btrfs just like
systemd does, or have buggy workarounds reimplemented in each of their
codebases just to handle a thing that should be centrally organized.
--
Tomasz Pala
operator action, so the
umount SHOULD happen, or we are facing some MALFUNCTION, which is fatal
in itself, not because it is a "race condition".
--
Tomasz Pala
esort.
> It's not rocket science to edit an init script if knobs it exposes are not
> configurable enough for your needs.
How many init scripts were you involved in?
> If systemd decides to hide this
> functionality, it needs to provide the admin with some way to override.
There is - the udev rules and systemd units I've mentioned. Just use them.
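To illustrate what such a local override could look like (purely a sketch,
not a recommendation - forcing SYSTEMD_READY makes systemd attempt the
mount even when member devices are missing):

  # /etc/udev/rules.d/70-btrfs-force-ready.rules (hypothetical local file)
  # runs after systemd's 64-btrfs.rules and overrides its verdict
  ENV{ID_FS_TYPE}=="btrfs", ENV{SYSTEMD_READY}="1"

plus a mount unit drop-in (or fstab entry) that adds the 'degraded' mount
option.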
> We're talking about issuing a mount call, it's not _that_ complicated.
So just do it! https://github.com/systemd/systemd
Please, go ahead with some PoC implementation, as it is REALLY hard to
discuss init system/script corner cases with someone who has
apparently never written a single line of such code.
--
Tomasz Pala
ator. But somewhere, sometime, someone would
have a NEED for a totally different set of rules for handling degraded
volumes, just like MD or LVM allow. It would be totally irresponsible
to hardcode any mount-degraded rule inside systemd itself.
That is exactly why this must go through udev - u
y to push.
If the IOCTL were extended to return TRYING_DEGRADED (when
instructed to do so after an expired timeout), systemd could handle
additional per-filesystem fstab options, like x-systemd.allow-degraded.
Then it would be possible to have a best-effort policy for the rootfs
(to make the machine boot)
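To make the idea concrete, a hypothetical /etc/fstab entry under that
scheme (x-systemd.allow-degraded does NOT exist today, it is the proposed
option; x-systemd.device-timeout= already does):

  UUID=...  /  btrfs  defaults,x-systemd.device-timeout=90s,x-systemd.allow-degraded  0  0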
othing more
than that), but overall _availability_.
I do not care if there are 2, 5 or 100 devices. I do care whether there are
ENOUGH devices to run in regular mode (including N-way mirroring and hot
spares) and, if not, whether there are ENOUGH devices to run degraded.
Having ALL the devices is just the edge case
On Sun, Jan 28, 2018 at 01:00:16 +0100, Tomasz Pala wrote:
> It can't mount degraded, because the "missing" device might go online a
> few seconds ago.
s/ago/after/
>> The central problem is the lack of a timer and time out.
>
> You got mdadm-last-resort@.timer
ked as 'not available',
don't expect it to be kept in use. Just fix the code to match reality.
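For reference, the mdadm last-resort mechanism mentioned above is just a
timer/service pair shipped by mdadm; roughly (simplified from memory,
check your distro's copy for the exact contents):

  # mdadm-last-resort@.timer
  [Unit]
  Description=Timer to wait for more drives before activating degraded array.
  DefaultDependencies=no
  Conflicts=sys-devices-virtual-block-%i.device
  [Timer]
  OnActiveSec=30

  # mdadm-last-resort@.service
  [Unit]
  Description=Activate md array even though degraded
  DefaultDependencies=no
  Conflicts=sys-devices-virtual-block-%i.device
  [Service]
  Type=oneshot
  ExecStart=/sbin/mdadm --run /dev/%i

i.e. if the array hasn't fully assembled after 30 seconds, the kernel is
told to run it degraded - exactly the kind of timeout fallback missing for
btrfs.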
--
Tomasz Pala
"the kernel has already mounted it" and ignore the kernel screaming
"the device is (not yet there/gone)"?
Just update the internal state after a successful mount and this
particular problem is gone. Unless there is some race condition and the
state should be changed before the mount i
/blah" ->
BTRFS_IOC_DEVICES_READY returns "READY" (or a new value "DEGRADED") -> udev
catches the event and changes SYSTEMD_READY -> systemd mounts the volume.
This is really simple. All you need to do is pass "degraded" to
btrfs.ko, so the BTRFS_IOC_DEVIC
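For context, the whole chain hangs off one small udev rule; roughly as it
looks in the systemd tree (quoted from memory, so treat it as a sketch):

  # 64-btrfs.rules
  SUBSYSTEM!="block", GOTO="btrfs_end"
  ACTION=="remove", GOTO="btrfs_end"
  ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"
  # ask the kernel (BTRFS_IOC_DEVICES_READY) whether the multi-device fs is complete
  IMPORT{builtin}="btrfs ready $devnode"
  # not complete -> keep the device invisible to systemd
  ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
  LABEL="btrfs_end"

A hypothetical DEGRADED answer would only need one more ENV match here to
expose the device with some "degraded mount possible" property.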
at is not true. It's not how mdadm works anyway.
Yes it does. You can't mount an md array until /dev/mdX appears, which happens
when the array gets fully assembled *OR* a timeout expires and the kernel gets
instructed to run the array as degraded, which results in /dev/mdX appearing.
There is NO a
sdc would answer the same, BTW). It can
> even ask for UUIDs -- all devices are present. So, mount will succeed,
> right?
Systemd doesn't count anything, it asks BTRFS_IOC_DEVICES_READY as
implemented in btrfs/super.c.
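The same check can be reproduced from a shell with btrfs-progs, which
issues the very same ioctl (device path is an example):

  btrfs device ready /dev/sdc2 && echo ready || echo "not (yet) complete"

It only returns success once the kernel has seen all the member devices of
that filesystem.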
> Ie, the thing systemd can safely do, is to stop trying to rule
eady to be mounted, but not fully populated" (i.e.
"degraded mount possible"). Then systemd could _fall back_ to a degraded
mount automatically after timing out, according to some systemd-level
option.
Unless there is *some* signalling from btrfs, there is really not much
systemd can *sa
r basis? By 'required'
I mean due to design/implementation issues/quirks, _not_ related to possible
hardware malfunctions.
--
Tomasz Pala
y kernel itself, while btrfs cannot
(so an initrd is required for the rootfs).
--
Tomasz Pala
when _you_ need to stop ignoring
the fact that you simply cannot just try mounting devices in a loop, as
this would render any NAS/FC/iSCSI-backed or more complicated setup
unusable, or hide problems in case of temporary connection issues.
systemd waits for the _underlying_ device - unless btr
Errata:
On Wed, Dec 20, 2017 at 09:34:48 +0100, Tomasz Pala wrote:
> /dev/sda -> 'not ready'
> /dev/sdb -> 'not ready'
> /dev/sdc -> 'ready', triggers /dev/sda -> 'not ready' and /dev/sdb - still
> 'not ready'
"no more devices, give
me all the remaining btrfs volumes in degraded mode if possible". By
"give me btrfs volumes" I mean "mark them as 'ready'" so that udev could
fire its rules. And if there were anything for udev to distinguish
'ready' fr
me knob, module
>> parameter or anything else to make the *R*aid work.
> There's a mount option for it per-filesystem. Just add that to all your
> mount calls, and you get exactly the same effect.
If only they were passed...
--
Tomasz Pala
that aren't Arch, Gentoo, or
> Slackware derived do so too to a lesser degree), and it would require
> constant curation to keep up to date. Only for long-term known issues
OK, you've convinced me that a kernel-vs-feature list is overhead.
So maybe another approach: just like sy
enable it.
I thought the work was already done if the current kernel handles degraded
RAID1 without switching to r/o - doesn't it? Or is something else missing?
--
Tomasz Pala
have to be the default, it might be a kernel compile-time knob, a module
parameter or anything else to make the *R*aid work.
--
Tomasz Pala
from raid. And
I wouldn't want to worry you, but properly managed RAIDs make I/J-of-K
trivial failures transparent. Just like ECC protects N/M bits transparently.
Investigating the reasons is the sysadmin's job, just like other
maintenance, including restoring the protection level.
--
Tomasz Pala
d be posted without creating the impression that it's all
about compiling a complaint list. Not to mention I'm absolutely not familiar
with the current patches, WIP and many, many other corner cases or usage
scenarios. In fact, not only the internals, but the motivation and design
principles must be wel
to fix the volume, the machine accidentally rebooted.
Which should have done no harm if I had a RAID1.
4. As already said before, using r/w degraded RAID1 is FULLY ACCEPTABLE,
as long as you accept "no more redundancy"...
4a. ...or you had an N-way mirror and there is still some redundancy.
I got one "RAID1" stuck in r/o after a degraded mount, not nice... Not
_expected_ to happen after a single disk failure (without any device reappearing).
--
Tomasz Pala
like /home
or /tmp (if held on btrfs).
I'd say that, from a security point of view, nocow should be the default,
unless overridden per mount or per specific file... Currently, if I mount
with nocow, there is no way to whitelist trusted users or secure
locations, and until btrfs-specific options cou
oesn't share any physical locations with the old one.
But it still grows, so what does this situation have to do with snapshots anyway?
Oh, and BTW - 900+ extents for ~5 GB taken means there is about 5.5 MB
occupied per extent. How is that possible?
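For anyone wanting to reproduce the numbers, the extent count per file can
be read with filefrag (from e2fsprogs, works on btrfs via FIEMAP; file name
taken from the report below):

  filefrag log.14       # prints a "log.14: 933 extents found" style summary
  filefrag -v log.14    # lists every extent with its length

~5 GB spread over 933 extents is indeed roughly 5.5 MB per extent on average.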
--
Tomasz Pala
File log.14 has 933
On Sun, Dec 10, 2017 at 12:27:38 +0100, Tomasz Pala wrote:
> I have found a directory - pam_abl databases, which occupy 10 MB (yes,
> TEN MEGAbytes) and released ...8.7 GB (almost NINE GIGAbytes) after
# df
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        64G
estore
complete files due to the nature of data loss (beginning of blocks).
--
Tomasz Pala
are worth defragging if the space released from
extents is greater than the space lost on inter-snapshot duplication.
I can't just defrag the entire filesystem since it breaks reflinks with
snapshots. This change was a real deal-breaker here...
Any way to feed the deduplication code with snapshots maybe? Th
cted to be fixed internally, as the needs are conflicting, but their
impact might be nullified by some housekeeping.
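One workaround available today is out-of-band deduplication, e.g. with
duperemove (paths are examples; this assumes the kernel allows deduping
into the snapshots, or that they are writable):

  # defragment the live subvolume first, then re-share identical blocks with a snapshot
  btrfs filesystem defragment -r /mnt/@current
  duperemove -rd /mnt/@current /mnt/@snapshots/2017-12-01

-r recurses, -d actually performs the dedupe calls instead of only
reporting duplicates.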
--
Tomasz Pala
On Sat, Dec 02, 2017 at 17:28:12 +0100, Tomasz Pala wrote:
>> Suppose you start with a 100 MiB file (I'm adjusting the sizes down from
> [...]
>> Now make various small changes to the file, say under 16 KiB each. These
>> will each be COWed elsewhere as one might ex
eral times the size of the original file!
>
> Luckily few people have this sort of usage pattern, but if you do...
>
> It would certainly explain the space eating...
Did anyone investigate how that is related to RRD rewrites? I don't use
rrdcached, and never thought that 1
ect here - reclaiming space
before it gets locked inside a snapshot.
The rationale behind this is obvious: since snapshot-aware defrag was
removed, allow defragmenting snapshot-exclusive data only.
This would of course result in partial file defragmentation, but that
should be enough for pathological cases like mine.
--
Tomasz Pala
w / 0.00s user 0.00s system 0% cpu 30.798 total
> And further more, please ensure that all deleted files are really deleted.
> Btrfs delay file and subvolume deletion, so you may need to sync several
> times or use "btrfs subv sync" to ensure deleted files are deleted.
Ye
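For completeness, the commands referred to above (the mount point is an
example):

  sync
  btrfs subvolume sync /mnt     # waits until deleted subvolumes are actually cleaned up
  btrfs filesystem sync /mnt    # forces a commit of the filesystem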
GB.
At least one recent snapshot, taken after some minor (<100 MB) changes
to the subvolume, which has undergone only minor changes since then,
occupied 8 GB during one night while the entire system was idling.
This was cross-checked against file metadata (mtimes compared) and 'du'
altering text config files mostly (plus
etckeeper's git metadata), so the volume of difference is extremely
low. Actually most of the diffs between subvolumes come from updating
distro packages. There were not many reflink copies made on this
partition, only one kernel source compiled (.ccache
64.00GiB
Device slack: 0.00B
Data,single: 1.07GiB
Data,RAID1: 55.97GiB
Metadata,RAID1: 2.00GiB
System,RAID1: 32.00MiB
Unallocated: 4.93GiB
/dev/sdb2, ID: 2
Device size:            64.00GiB
Device slack:
.
And the same happens with other snapshots: much more exclusive data is
shown in the qgroup than is actually found in the files. So if not in
files, where is that space wasted? Metadata?
btrfs-progs-4.12 running on Linux 4.9.46.
best regards,
--
Tomasz Pala