On Fri, Dec 12, 2014 at 11:17:58AM +0200, Erkki Seppala wrote:
> That may be sort of true, but I think even SMART is helped by the fact
> that the media is read through from the beginning to the end*, so it can
> detect even the errors that don't bubble through the IO layer. And BTRFS
> can indeed
On Fri, Dec 12, 2014 at 02:28:06PM -0800, Robert White wrote:
> On 12/12/2014 08:45 AM, Zygo Blaxell wrote:
> >On Thu, Dec 11, 2014 at 10:01:06PM -0800, Robert White wrote:
> >>So RAID5 with three media M is
> >>
> >>MMM MMM
> >>D1 D2 P(a)
> >>D3 P(b) D4
> >>P(c) D5 D6
> >
> >RAID5 wi
On 12/12/2014 05:12 PM, Duncan wrote:
Robert White posted on Fri, 12 Dec 2014 05:29:58 -0800 as excerpted:
This still doesn't say _anything_ is wrong with your filesystem except
that it doesn't have enough _raw_ space to create a 2-ish gig extent.
What's wrong with the filesystem is that there
Robert White posted on Fri, 12 Dec 2014 03:16:03 -0800 as excerpted:
> Perhaps it's just a tautological belief that someone here didn't buy into.
> Like how people keep partitioning drives into little slices for things
> because that's the preserved wisdom from the early eighties.
While I absolutely agr
Robert White posted on Fri, 12 Dec 2014 05:29:58 -0800 as excerpted:
> This still doesn't say _anything_ is wrong with your filesystem except
> that it doesn't have enough _raw_ space to create a 2-ish gig extent.
What's wrong with the filesystem is that there shouldn't /be/ a need to
create a 2-
Goffredo Baroncelli posted on Fri, 12 Dec 2014 19:00:20 +0100 as
excerpted:
> $ sudo ./btrfs fi df /mnt/btrfs1/
> Data, RAID1: total=1.00GiB, used=512.00KiB
> Data, single: total=8.00MiB, used=0.00B
> System, RAID1: total=8.00MiB, used=16.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadat
Based on how identically sized, empty qcow2 files grow when they're added
to a Btrfs volume, the 1GiB Btrfs chunk (allocation unit) doesn't have an
immediate physical allocation. It's more of a virtual thing, but it has a
physical manifestation.
Single profile, 5 disks: As data is co
On Fri, Dec 12, 2014 at 03:25:19PM -0800, Robert White wrote:
> On 12/12/2014 02:59 PM, Hugo Mills wrote:
> >On Fri, Dec 12, 2014 at 02:54:24PM -0800, Robert White wrote:
> >>I've seen it mentioned here that generally data extents are 1G and
> >>metadata extents are 256M.
> >>
> >>Is that per-drive
On 12/12/2014 02:59 PM, Hugo Mills wrote:
On Fri, Dec 12, 2014 at 02:54:24PM -0800, Robert White wrote:
I've seen it mentioned here that generally data extents are 1G and
metadata extents are 256M.
Is that per-drive or per-stripe in the case of RAID0?
That is, if I have data mode raid0 across
On Fri, Dec 12, 2014 at 02:54:24PM -0800, Robert White wrote:
> I've seen it mentioned here that generally data extents are 1G and
> metadata extents are 256M.
>
> Is that per-drive or per-stripe in the case of RAID0?
>
> That is, if I have data mode raid0 across N drives does the system
> alloca
On 12/12/2014 02:46 PM, Tomasz Chmielewski wrote:
On 2014-12-12 23:34, Robert White wrote:
On 12/12/2014 01:46 PM, Tomasz Chmielewski wrote:
On 2014-12-12 22:36, Robert White wrote:
In another thread [that was discussing SMART] you talked about
replacing a drive and then needing to do some pa
I've seen it mentioned here that generally data extents are 1G and
metadata extents are 256M.
Is that per-drive or per-stripe in the case of RAID0?
That is, if I have data mode raid0 across N drives does the system
allocate one 1G extent on each drive making the full stripe allocation
N-gigs;
On 2014-12-12 23:34, Robert White wrote:
On 12/12/2014 01:46 PM, Tomasz Chmielewski wrote:
On 2014-12-12 22:36, Robert White wrote:
In another thread [that was discussing SMART] you talked about
replacing a drive and then needing to do some patching-up of the
result because of drive failures.
On 12/12/2014 01:46 PM, Tomasz Chmielewski wrote:
On 2014-12-12 22:36, Robert White wrote:
In another thread [that was discussing SMART] you talked about
replacing a drive and then needing to do some patching-up of the
result because of drive failures. Is this the same filesystem where
that hap
On 12/12/2014 08:45 AM, Zygo Blaxell wrote:
On Thu, Dec 11, 2014 at 10:01:06PM -0800, Robert White wrote:
So RAID5 with three media M is
MMM MMM
D1 D2 P(a)
D3 P(b) D4
P(c) D5 D6
RAID5 with two media is well defined, and looks like this:
MMM
D1 P(a)
P(b) D2
D3 P(c)
Lik
On 2014-12-12 22:36, Robert White wrote:
In another thread [that was discussing SMART] you talked about
replacing a drive and then needing to do some patching-up of the
result because of drive failures. Is this the same filesystem where
that happened?
Nope, it was on a different server.
On 12/12/2014 06:37 AM, Tomasz Chmielewski wrote:
FYI, still seeing this with 3.18 (scrub passes fine on this filesystem).
# time btrfs balance start /mnt/lxc2
Segmentation fault
real    322m32.153s
user    0m0.000s
sys     16m0.930s
(...)
[20306.981773] BTRFS (device sdd1): parent transi
We shouldn't BUG_ON() if there is corruption. I hit this while testing my block
group patch and the abort worked properly. Thanks,
Signed-off-by: Josef Bacik
---
fs/btrfs/extent-tree.c | 12 ++--
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/
Currently any time we try to update the block groups on disk we will walk _all_
block groups and check for the ->dirty flag to see if it is set. This function
can get called several times during a commit. So if you have several terabytes
of data you will be a very sad panda as we will loop throug
Depending on where the extent root shows up in the dirty list we could end up
recowing the extent root a few times during commit. This is inefficient, so
instead only track the other COW only roots and update them all at once, and
then do the extent root/block group update loop by itself to try an
We unconditionally delete csums for data extents, but we don't have csums for
free space cache, so all this does is force us to recow the csum root, which
will cause us to re-write the block group cache. This patch fixes this by
noticing if we're a free space cache extent and simply skipping the d
On Fri, Dec 12, 2014 at 11:32:13AM +0100, David Sterba wrote:
> On Tue, Dec 09, 2014 at 05:45:41PM -0800, Omar Sandoval wrote:
> > After some discussion on the mailing list, I decided that for simplicity and
> > reliability, it's best to simply disallow COW files and files with shared
> > extents (
On Fri, Dec 12, 2014 at 11:51:22AM +0100, David Sterba wrote:
> On Tue, Dec 09, 2014 at 05:45:48PM -0800, Omar Sandoval wrote:
> > +static void __clear_swapfile_extents(struct inode *inode)
> > +{
> > + u64 isize = inode->i_size;
> > + struct extent_map *em;
> > + u64 start, len;
> > +
> > +
On Fri, Dec 12, 2014 at 12:59 PM, nick wrote:
Greetings Chris and Josef,
I am wondering if the bug at this URL,
https://urldefense.proofpoint.com/v1/url?u=https://bugzilla.kernel.org/show_bug.cgi?id%3D82251&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=LzvRKZOlBaBg
On Fri, Dec 12, 2014 at 2:24 PM, Linus Torvalds
wrote:
On Fri, Dec 12, 2014 at 11:07 AM, Chris Mason wrote:
From a feature point of view, most of the code here comes from Miao Xie
and others at Fujitsu to implement scrubbing and replacing devices on
raid56. This has been in developm
The only way that "ret" is set is when we call scrub_pages_for_parity()
so the skip to "if (ret) " test doesn't make sense and causes a static
checker warning.
Signed-off-by: Dan Carpenter
---
Static checker work. Not tested. There are some other valid looking
warnings from the same file:
fs/b
On 12/11/2014 09:31 AM, Dongsheng Yang wrote:
> When btrfs_statfs() calculates the total size of the fs, it calculates
> the total size of the disks and then divides it by a factor. But in some use cases,
> the result is not useful to the user.
Hi Yang; during my test I discovered an error:
$ sudo l
On Fri, Dec 12, 2014 at 11:07 AM, Chris Mason wrote:
>
> From a feature point of view, most of the code here comes from Miao Xie
> and others at Fujitsu to implement scrubbing and replacing devices on
> raid56. This has been in development for a while, and it's a big
> improvement.
So this has p
Hi Linus,
Please pull my for-linus branch:
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
From a feature point of view, most of the code here comes from Miao Xie
and others at Fujitsu to implement scrubbing and replacing devices on
raid56. This has been in develo
On 12/11/2014 09:31 AM, Dongsheng Yang wrote:
> When btrfs_statfs() calculates the total size of the fs, it calculates
> the total size of the disks and then divides it by a factor. But in some use cases,
> the result is not useful to the user.
I am checking it; to me it seems a good improvement. How
Make the extent buffer allocation interface consistent. Cloned eb will
set a valid fs_info. For dummy eb, we can drop the length parameter and
set it from fs_info.
The built-in sanity checks may pass a NULL fs_info that's queried for
nodesize, but we know it's 4096.
Signed-off-by: David Sterba
---
fs/btrfs/relocation.c | 9 +
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index cb5d4462ebb4..d83085381bcc 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2855,9 +2855,10 @@ s
Same mask from all callers.
Signed-off-by: David Sterba
---
fs/btrfs/extent_io.c | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4ebabd237153..619592d86c2a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.
Because we're using globally known nodesize. Do the same for the sanity
test function variant.
Signed-off-by: David Sterba
---
fs/btrfs/disk-io.c| 5 ++---
fs/btrfs/extent_io.c | 5 +++--
fs/btrfs/extent_io.h | 4 ++--
fs/btrfs/tests/qgroup-tests.c | 2 +-
4 files c
All callers pass nodesize.
Signed-off-by: David Sterba
---
fs/btrfs/ctree.c | 8 +++-
fs/btrfs/disk-io.c | 4 ++--
fs/btrfs/disk-io.h | 2 +-
fs/btrfs/extent-tree.c | 2 +-
fs/btrfs/relocation.c | 3 +--
5 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/fs/btrfs/
Here's the rest of the parameter removal; there are no warnings anymore and it
passed xfstests. There are a few more cleanups that were required to finish the goal.
You can pull from
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git
cleanup/blocksize-diet-part2
Based on current next branch 962
Signed-off-by: David Sterba
---
fs/btrfs/extent-tree.c | 9 -
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index c025751c20d7..50ebc74db508 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7215,11 +7215,1
Signed-off-by: David Sterba
---
fs/btrfs/disk-io.c | 4 ++--
fs/btrfs/disk-io.h | 2 +-
fs/btrfs/reada.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index be9d7c612489..8123b03b1f9d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs
Finally it's clear that the requested blocksize is always equal to
nodesize, with one exception, the superblock.
Superblock has fixed size regardless of the metadata block size, but
uses the same helpers to initialize sys array/chunk tree and to work
with the chunk items. So it pretends to be an e
Replace with global nodesize instead.
Signed-off-by: David Sterba
---
fs/btrfs/reada.c | 15 ++-
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index b63ae20618fb..5c3fde6571bb 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@
On Thu, Dec 11, 2014 at 10:01:06PM -0800, Robert White wrote:
> So RAID5 with three media M is
>
> MMM MMM
> D1 D2 P(a)
> D3 P(b) D4
> P(c) D5 D6
RAID5 with two media is well defined, and looks like this:
MMM
D1 P(a)
P(b) D2
D3 P(c)
With even parity and N disks
P(
They just opencode taking address of the timespec member.
Signed-off-by: David Sterba
---
fs/btrfs/ctree.h | 25 -
fs/btrfs/delayed-inode.c | 28
fs/btrfs/inode.c | 28
fs/btrfs/send.c | 9
On 12/11/14 6:37 AM, Holger Hoffstätte wrote:
>
> David,
>
> I was wondering if you could please send out announcements for btrfs-progs
> when you tag a release or -rc? There doesn't seem to be a good mechanism
> to track releases and IMHO the more people are notified, the more
> testing we can g
On Fri, Dec 12, 2014 at 09:38:10AM -0600, sys.syphus wrote:
> why would there be "unknown" data below? i have 2 btrfs arrays and
> both have this going on. neither are active. any idea why and how to
> make it go away?
>
>
> Btrfs v3.12
^^^
This is what's "wrong". Upgrade your userspa
why would there be "unknown" data below? i have 2 btrfs arrays and
both have this going on. neither are active. any idea why and how to
make it go away?
Btrfs v3.12
btrfs fi df /media/btrfs
Data, RAID1: total=314.00GiB, used=313.55GiB
System, RAID1: total=32.00MiB, used=64.00KiB
Metadata, RAID1:
On Fri, Dec 12, 2014 at 08:34:09AM +, Filipe David Manana wrote:
> Very simple solution.
>
> Do:
>
> 1) Create an empty file;
> 2) Use it as the backing file for a loop device;
> 3) Run mkfs.btrfs against the loop device;
> 4) Mount it;
> 5) Populate the fs;
> 6) Umount it;
> 7) Corrupt some
FYI, still seeing this with 3.18 (scrub passes fine on this filesystem).
# time btrfs balance start /mnt/lxc2
Segmentation fault
real    322m32.153s
user    0m0.000s
sys     16m0.930s
[20182.461873] BTRFS info (device sdd1): relocating block group
6915027369984 flags 17
[20194.050641] BTRFS
On 12 December 2014 at 14:29, Robert White wrote:
>
> You yourself even found the annotation in the wiki that said you should have
> e4defragged the system before conversion.
There's no mention of e4defrag on the Btrfs wiki, it says to btrfs
defrag before balance to avoid ENOSPC, as the last step
Hi,
3.18-rc1 has been tagged. There are changes that were pending for too long
(the 'fi usage' and 'dev usage' commands) and I've pulled a lot of changes to
fsck/recovery tools. There are missing pieces of documentation that will be
added in the next round, the point is to release the code fir
On 12/12/2014 01:17 AM, Erkki Seppala wrote:
Robert White writes:
You need to buy better disks. 8-)
Where can one buy these better disks with reasonable prices?-) Disks are
best thought of as consumables.
A good disk is only about 9% more expensive. So like the WD "green"
disks were all c
On 12/11/2014 10:42 PM, Patrik Lundquist wrote:
On 11 December 2014 at 23:00, Robert White wrote:
On 12/11/2014 12:18 AM, Patrik Lundquist wrote:
* Full balance, that ended with "98 enospc errors during balance."
Assuming that quote is an actual quote from the output of the balance...
It
On Fri, Dec 12, 2014 at 03:16:03AM -0800, Robert White wrote:
> On 12/12/2014 01:06 AM, David Taylor wrote:
> >The above quote is discussing two device RAID5, you are discussing
> >three device RAID5.
>
> Heresy! (yes, some humor is required here.)
>
> There is no such thing as a "two device RAID
Add ./autogen.sh script, you have to use it after "git clone/clean" to
generate ./configure from configure.ac.
Modify version.sh to be usable from the configure script.
The patch also renames Makefile to Makefile.in, but does NOT change
anything in the file.
Signed-off-by: Karel Zak
---
.gitig
- use standard PACKAGE_{NAME,VERSION,STRING,URL,...} autoconf macros
rather than homemade BTRFS_BUILD_VERSION
- don't #include version.h; the file is now needed only for the library API
Note that "btrfs version" returns "btrfs-progs " instead of
the original confusing "btrfs ".
Signed-off-by: K
- add a rule to generate version.h when any relevant file changes
- add rule to clean generated files on "make clean-all"
Signed-off-by: Karel Zak
---
Makefile.in | 14 --
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/Makefile.in b/Makefile.in
index 58200ca..df752d3
Signed-off-by: Karel Zak
---
Makefile.in | 1 +
configure.ac | 9 +
2 files changed, 10 insertions(+)
diff --git a/Makefile.in b/Makefile.in
index bdd7683..5889224 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -5,6 +5,7 @@ CC = @CC@
LN_S = @LN_S@
AR = @AR@
INSTALL = @INSTALL@
+DISABL
- define basic default CFLAGS in configure.ac, because:
* autoconf default is -g -O2, but btrfs uses -g -O1
* it's better to follow autoconf; the standard way to modify
CFLAGS is to call: CFLAGS="foo bar" ./configure
- move all flags to one place in Makefile.in
- don't use AM_CFLAGS, th
Signed-off-by: Karel Zak
---
Makefile.in | 12 ++--
configure.ac | 2 ++
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/Makefile.in b/Makefile.in
index dad1685..17eea58 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -1,9 +1,9 @@
# Export all variables to sub-makes by def
It's better to use ./configure than manually edit Makefile.
Signed-off-by: Karel Zak
---
Makefile.in | 4
configure.ac | 10 ++
2 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/Makefile.in b/Makefile.in
index df752d3..bdd7683 100644
--- a/Makefile.in
+++ b/Makefile.
The original homemade solution is unnecessary, autotools provides better
infrastructure to generate files.
Signed-off-by: Karel Zak
---
Makefile.in | 4
configure.ac | 9 +
version.h.in | 11 +++
version.sh | 30 +++---
4 files changed, 23 insert
- the header file is generated by ./configure, the standard autotools
way is to use -include config.h on compiler command line rather than
include the file directly from code
- remove _GNU_SOURCE from code, the macro is already defined in config.h
by AC_USE_SYSTEM_EXTENSIONS autoconf macro
Signed-off-by: Karel Zak
---
Makefile.in | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/Makefile.in b/Makefile.in
index 17eea58..df590ab 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -26,11 +26,11 @@ libbtrfs_headers = send-stream.h send-utils.h send.h
rbtree.h bt
This is the first step to make the btrfs-progs build system more conventional
for userspace users and developers. All is implemented by small incremental
patches to keep things review-able.
The Makefile targets and rules are not changed; things like V=1 (verbose), C=1
(sparse) static builds, etc. still wor
On 12/12/2014 01:06 AM, David Taylor wrote:
The above quote is discussing two device RAID5, you are discussing
three device RAID5.
Heresy! (yes, some humor is required here.)
There is no such thing as a "two device RAID5". That's what RAID1 is for.
Saying "The above quote is discussing a two
Signed-off-by: Karel Zak
---
kerncompat.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kerncompat.h b/kerncompat.h
index 8afadc8..5c1cca9 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -123,7 +123,7 @@ typedef unsigned long long u64;
typedef unsigned char u8;
typedef uns
On Tue, Dec 09, 2014 at 05:45:48PM -0800, Omar Sandoval wrote:
> +static void __clear_swapfile_extents(struct inode *inode)
> +{
> + u64 isize = inode->i_size;
> + struct extent_map *em;
> + u64 start, len;
> +
> + start = 0;
> + while (start < isize) {
> + len = isi
On Tue, Dec 09, 2014 at 05:45:47PM -0800, Omar Sandoval wrote:
> Extents mapping a swap file should remain pinned in memory in order to
> avoid doing allocations to look up an extent when we're already low on
> memory. Rather than overloading EXTENT_FLAG_PINNED, add a new flag
> specifically for th
On Tue, Dec 09, 2014 at 05:45:41PM -0800, Omar Sandoval wrote:
> After some discussion on the mailing list, I decided that for simplicity and
> reliability, it's best to simply disallow COW files and files with shared
> extents (like files with extents shared with a snapshot). From a user's
> persp
On Thu, Dec 11, 2014 at 12:37:56PM +, Holger Hoffstätte wrote:
> I was wondering if you could please send out announcements for btrfs-progs
> when you tag a release or -rc? There doesn't seem to be a good mechanism
> to track releases and IMHO the more people are notified, the more
> testing we
I use SMART (smartmontools etc) and its tests to keep track of and warn
me of such issues. It's way more likely to catch incipient media
failures long before scrub would. It's also more likely to correct
situations before they become visible to userspace. It's also a way
better full-platter scan th
Robert White writes:
> You need to buy better disks. 8-)
Where can one buy these better disks with reasonable prices?-) Disks are
best thought of as consumables.
> I use SMART (smartmontools etc) and its tests to keep track of and
> warn me of such issues. It's way more likely to catch incipien
On Thu, 11 Dec 2014, Robert White wrote:
On 12/11/2014 07:56 PM, Zygo Blaxell wrote:
RAID5 with even parity and two devices should be exactly the same as
RAID1 (i.e. disk1 ^ disk2 == 0, therefore disk1 == disk2, the striping
is irrelevant because there is no difference in disk contents so the
Original Message
Subject: Re: [PATCH v4 00/13] btrfs-progs:fsck: Add inode nlink mismatch and
From: Filipe David Manana
To: Qu Wenruo
Date: 2014-12-12 16:34
On Fri, Dec 12, 2014 at 12:32 AM, Qu Wenruo wrote:
Original Message
Subject: Re: [PATCH v4 00/13]
Currently, for pre_alloc or delay_alloc, the bytes are accounted
in space_info by three counters:
space_info->bytes_may_use --- space_info->reserved --- space_info->used.
But on the other hand, in qgroup, there are only two counters to account the
bytes, qgroup->reserved and qgroup->excl. And q
When we exceed the quota limit while writing, we free some reserved
extents that need to be dropped, but we do not free their accounting
in the qgroup. This means that each time we exceed the quota while
writing, some space remains in qg->reserved that we can no longer
use. If things go on like this, the
all space will be ate
On Fri, Dec 12, 2014 at 12:32 AM, Qu Wenruo wrote:
>
> Original Message
> Subject: Re: [PATCH v4 00/13] btrfs-progs:fsck: Add inode nlink mismatch and
> From: Filipe David Manana
> To: Qu Wenruo
> Date: 2014-12-11 19:07
>>
>> On Thu, Dec 11, 2014 at 12:50 AM, Qu Wenruo
>> wro