On Wed, Nov 26, 2014 at 10:48:51AM +1100, Dave Chinner wrote:
No abuse necessary at all. Just a different inode_dirtied_after()
check is required if the inode is on the time dirty list in
move_expired_inodes().
I'm still not sure what you have in mind here. When would this be
checked? It
Signed-off-by: Theodore Ts'o ty...@mit.edu
---
fs/fs-writeback.c | 1 +
fs/inode.c | 5 +++++
2 files changed, 6 insertions(+)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 529480a..3d87174 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -27,6 +27,7 @@
#include
Guarantee that the on-disk timestamps will be no more than 24 hours
stale.
Signed-off-by: Theodore Ts'o ty...@mit.edu
---
fs/fs-writeback.c | 1 +
fs/inode.c | 18 ++
include/linux/fs.h | 1 +
3 files changed, 20 insertions(+)
diff --git a/fs/fs-writeback.c
Add a new function find_active_inode_nowait() which will never block.
If there is an inode being freed or is still being initialized, this
function will return NULL instead of blocking waiting for an inode to
be freed or to finish initializing. Hence, a negative return from
this function does not
The only reason btrfs cloned code from the VFS layer was so it could
add a check to see if a subvolume is read-only. Instead of doing that,
let's add a new inode operation which allows a file system to return
an error if the inode is read-only, and use that in update_time().
There may be other
This is an updated version of what had originally been an
ext4-specific patch which significantly improves performance by lazily
writing timestamp updates (and in particular, mtime updates) to disk.
The in-memory timestamps are always correct, but they are only written
to disk when required for
In preparation for adding support for the lazytime mount option, we
need to be able to separate out the update_time() and write_time()
inode operations. Currently, only btrfs and xfs use update_time().
We needed to preserve update_time() because btrfs wants to have a
special
Add an optimization for the MS_LAZYTIME mount option so that we will
opportunistically write out any inodes with the I_DIRTY_TIME flag set
in a particular inode table block when we need to update some inode in
that inode table block anyway.
Also add some temporary code so that we can set the
Add a new mount option which enables a new lazytime mode. This mode
causes atime, mtime, and ctime updates to only be made to the
in-memory version of the inode. The on-disk times will only get
updated when (a) if the inode needs to be updated for some non-time
related change, (b) if userspace
When running Debian kernel version 3.16.0-4-amd64 and btrfs-tools version
3.17-1.1 I ran a btrfs replace operation to replace a 3TB disk that was giving
read errors with a new 4TB disk.
After the replace the btrfs device stats command reported that the 4TB disk
had 16 read errors. It appears
On Tue, 25 Nov 2014 15:17:58 -0800, John Williams wrote:
2) CityHash : for 256-bit hashes on all systems
https://code.google.com/p/cityhash/
Btw this is now superseded by Farmhash:
https://code.google.com/p/farmhash/
-h
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs
We will introduce a new operation type later. If we keep using an
integer variable as a bool to record the operation type, we would have
to add a new variable and increase the size of the raid bio structure.
That is not good. With this patch, we define a different number for
each operation, and we can just use a
This patchset implements the device scrub/replace function for RAID56.
Most of the implementation for the common data is similar to the other
RAID types; the difference, and the difficulty, is the parity processing.
The basic idea is to read and check the data that has a checksum outside
of the raid56 stripe lock; if
This function reuses the parity scrub code: we just write the
correct or corrected parity to the target device before the
parity scrub ends.
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v1 -> v3:
- None.
---
fs/btrfs/raid56.c | 23 +++
From: Zhao Lei zhao...@cn.fujitsu.com
bbio_ret in this condition is always !NULL because the previous code
already has a check-and-skip:
4908 if (!bbio_ret)
4909 goto out;
Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
Reviewed-by: David Sterba
From: Zhao Lei zhao...@cn.fujitsu.com
stripe_index's value is set again in a later line:
stripe_index = 0;
Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
Reviewed-by: David Sterba dste...@suse.cz
---
Changelog v1 -> v3:
- None.
---
fs/btrfs/volumes.c
Because we will reuse the bbio and raid_map during the scrub later, it
is better not to change any field of the bbio and not to free it at
the end of the I/O request. So we introduce similar fields in the raid
bio, and no longer access those fields of the bbio.
Signed-off-by: Miao Xie
The implementation is:
- Read and check all the data with checksums in the same stripe.
All the data that has a checksum is COW data, and we are sure
that it is not changed even though we don't lock the stripe, because
the space of that data can only be reclaimed after the current
transaction is
From: Zhao Lei zhao...@cn.fujitsu.com
Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v1 -> v3:
- None.
---
fs/btrfs/dev-replace.c | 5 -
1 file changed, 5 deletions(-)
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
The increase/decrease of bio counter is on the I/O path, so we should
use io_schedule() instead of schedule(), or a deadlock might be
triggered by the pending I/O in the plug list. io_schedule() can help
us because it will flush all the pending I/O before the task is going
to sleep.
The implementation is simple:
- In order to avoid changing the code logic of btrfs_map_bio and
RAID56, we add the stripes of the replace target devices at the
end of the stripe array in btrfs bio, and we sort those target
device stripes in the array. And we keep the number of the target
This patch implements the RAID5/6 common data repair function. The
implementation is similar to scrub on the other RAID types such as
RAID1; the difference is that we don't read the data from a
mirror, we use the data repair function of RAID5/6.
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
On 2014/11/25 13:30, Liu Bo wrote:
This is actually inspired by ZFS, which offers checksum functions ranging
from the simple-and-fast fletcher2 to the slower-but-secure sha256.
Back to btrfs, crc32c is the only choice.
As for the slowness of sha256, Intel has a set of instructions for
it,
On 2014/11/25 18:47, David Sterba wrote:
We could provide an interface for external applications that would make
use of the strong checksums. Eg. external dedup, integrity db. The
benefit here is that the checksum is always up to date, so there's no
need to compute the checksums again. At the
On 2014-11-26 08:38, Brendan Hide wrote:
On 2014/11/25 18:47, David Sterba wrote:
We could provide an interface for external applications that would make
use of the strong checksums. Eg. external dedup, integrity db. The
benefit here is that the checksum is always up to date, so there's no
need
On Wed, Nov 26, 2014 at 8:04 AM, Miao Xie mi...@cn.fujitsu.com wrote:
The increase/decrease of bio counter is on the I/O path, so we should
use io_schedule() instead of schedule(), or the deadlock might be
triggered by the pending I/O in the plug list. io_schedule() can help
us because it will
If the transaction handle doesn't have used blocks but has created new block
groups, make sure we turn the fs into readonly mode too. This is because the
new block groups didn't get all their metadata persisted into the chunk and
device trees, and therefore if a subsequent transaction starts,
Trimming is completely transactionless, and the way it operates consists
of hiding free space entries from a block group, performing the trim/discard
and then making the free space entries visible again.
Therefore, while a free space entry is being trimmed, we can have free space
cache writing running in
This patchset fixes several issues exposed by block group removal/allocation and
trim/discard running in parallel.
The first 3 patches and the last one (6) are independent and don't depend on
each other. Patches 3 and 6 are not really related to trim/discard at all.
I bundled all these patches
There's a race between adding a block group to the list of the unused
block groups and removing an unused block group (cleaner kthread) that
leads to freeing extents that are in use or a crash during transaction
commit. Basically the cleaner kthread, when executing
btrfs_delete_unused_bgs(),
If we grab a block group, for example in btrfs_trim_fs(), we will be holding
a reference on it but the block group can be removed after we got it (via
btrfs_remove_block_group), which means it will no longer be part of the
rbtree.
However, btrfs_remove_block_group() was only calling rb_erase()
Our fs trim operation, which is completely transactionless (doesn't start
or join an existing transaction), consists of visiting all block groups
and then, for each one, iterating its free space entries and performing a
discard operation against the space range represented by the free space
entries.
If we remove a block group (because it became empty), we might have left
a caching_ctl structure in fs_info->caching_block_groups that points to
the block group and is accessed at transaction commit time. This results
in accessing an invalid or incorrect block group. This issue became visible
after
Stress btrfs' block group allocation and deallocation while running
fstrim in parallel. Part of the goal is also to get data block groups
deallocated so that new metadata block groups, using the same physical
device space ranges, get allocated while fstrim is running. This caused
several issues
On 11/26/2014 10:28 AM, Filipe Manana wrote:
If we remove a block group (because it became empty), we might have left
a caching_ctl structure in fs_info->caching_block_groups that points to
the block group and is accessed at transaction commit time. This results
in accessing an invalid or
On 11/26/2014 10:28 AM, Filipe Manana wrote:
If we grab a block group, for example in btrfs_trim_fs(), we will be holding
a reference on it but the block group can be removed after we got it (via
btrfs_remove_block_group), which means it will no longer be part of the
rbtree.
However,
On 11/26/2014 10:28 AM, Filipe Manana wrote:
There's a race between adding a block group to the list of the unused
block groups and removing an unused block group (cleaner kthread) that
leads to freeing extents that are in use or a crash during transaction
commit. Basically the cleaner kthread,
On 11/26/2014 10:28 AM, Filipe Manana wrote:
If the transaction handle doesn't have used blocks but has created new block
groups make sure we turn the fs into readonly mode too. This is because the
new block groups didn't get all their metadata persisted into the chunk and
device trees, and
On Wed, Nov 26, 2014 at 3:57 PM, Josef Bacik jba...@fb.com wrote:
On 11/26/2014 10:28 AM, Filipe Manana wrote:
If we remove a block group (because it became empty), we might have left
a caching_ctl structure in fs_info->caching_block_groups that points to
the block group and is accessed at
On 11/26/2014 10:28 AM, Filipe Manana wrote:
Our fs trim operation, which is completely transactionless (doesn't start
or joins an existing transaction) consists of visiting all block groups
and then for each one to iterate its free space entries and perform a
discard operation against the space
On Wed, Nov 26, 2014 at 4:07 PM, Josef Bacik jba...@fb.com wrote:
On 11/26/2014 10:28 AM, Filipe Manana wrote:
If the transaction handle doesn't have used blocks but has created new
block
groups make sure we turn the fs into readonly mode too. This is because
the
new block groups didn't get
On 11/26/2014 11:15 AM, Filipe David Manana wrote:
On Wed, Nov 26, 2014 at 4:07 PM, Josef Bacik jba...@fb.com wrote:
On 11/26/2014 10:28 AM, Filipe Manana wrote:
If the transaction handle doesn't have used blocks but has created new
block
groups make sure we turn the fs into readonly mode
On Wed, Nov 26, 2014 at 4:15 PM, Josef Bacik jba...@fb.com wrote:
On 11/26/2014 10:28 AM, Filipe Manana wrote:
Our fs trim operation, which is completely transactionless (doesn't start
or joins an existing transaction) consists of visiting all block groups
and then for each one to iterate its
On 11/26/2014 11:09 AM, Filipe David Manana wrote:
On Wed, Nov 26, 2014 at 3:57 PM, Josef Bacik jba...@fb.com wrote:
On 11/26/2014 10:28 AM, Filipe Manana wrote:
If we remove a block group (because it became empty), we might have left
a caching_ctl structure in fs_info->caching_block_groups
On Wed, Nov 26, 2014 at 4:19 PM, Josef Bacik jba...@fb.com wrote:
On 11/26/2014 11:15 AM, Filipe David Manana wrote:
On Wed, Nov 26, 2014 at 4:07 PM, Josef Bacik jba...@fb.com wrote:
On 11/26/2014 10:28 AM, Filipe Manana wrote:
If the transaction handle doesn't have used blocks but has
On 11/26/2014 11:25 AM, Filipe David Manana wrote:
On Wed, Nov 26, 2014 at 4:15 PM, Josef Bacik jba...@fb.com wrote:
On 11/26/2014 10:28 AM, Filipe Manana wrote:
Our fs trim operation, which is completely transactionless (doesn't start
or joins an existing transaction) consists of visiting
On 11/26/2014 11:29 AM, Filipe David Manana wrote:
On Wed, Nov 26, 2014 at 4:19 PM, Josef Bacik jba...@fb.com wrote:
On 11/26/2014 11:15 AM, Filipe David Manana wrote:
On Wed, Nov 26, 2014 at 4:07 PM, Josef Bacik jba...@fb.com wrote:
On 11/26/2014 10:28 AM, Filipe Manana wrote:
If the
On Wed, Nov 26, 2014 at 4:24 PM, Josef Bacik jba...@fb.com wrote:
On 11/26/2014 11:09 AM, Filipe David Manana wrote:
On Wed, Nov 26, 2014 at 3:57 PM, Josef Bacik jba...@fb.com wrote:
On 11/26/2014 10:28 AM, Filipe Manana wrote:
If we remove a block group (because it became empty), we might
On 11/26/2014 11:34 AM, Filipe David Manana wrote:
On Wed, Nov 26, 2014 at 4:24 PM, Josef Bacik jba...@fb.com wrote:
On 11/26/2014 11:09 AM, Filipe David Manana wrote:
On Wed, Nov 26, 2014 at 3:57 PM, Josef Bacik jba...@fb.com wrote:
On 11/26/2014 10:28 AM, Filipe Manana wrote:
If we
This was written when we didn't do a caching control for the fast free space
cache loading. However we started doing that a long time ago, and there is
still a small window of time during which we could be caching the block group the fast
way, so if there is a caching_ctl at all on the block group just
On 11/25/2014 11:21 PM, Zygo Blaxell wrote:
However, I still don't understand why you want btrfs w/ multiple disks over
LVM?
I want to split a few disks into partitions, but I want to create,
move, and resize the partitions from time to time. Only LVM can do
that without taking the
On Wed, Nov 26, 2014 at 4:50 AM, Holger Hoffstätte
holger.hoffstae...@googlemail.com wrote:
On Tue, 25 Nov 2014 15:17:58 -0800, John Williams wrote:
2) CityHash : for 256-bit hashes on all systems
https://code.google.com/p/cityhash/
Btw this is now superseded by Farmhash:
Am Fri, 14 Nov 2014 17:00:26 -0500
schrieb Josef Bacik jba...@fb.com:
On 11/14/2014 04:51 PM, Hugo Mills wrote:
Chris, Josef, anyone else who's interested,
On IRC, I've been seeing reports of two persistent unsolved
problems. Neither is showing up very often, but both have turned
As mentioned last round please move the addition of the is_readonly
operation to the first thing in the series, so that the ordering makes
more sense.
Second I think this patch is incorrect for XFS - XFS uses -update_time
to set the timestamps in the dinode. These two need to be coherent
as we
The subject line seems incorrect; this seems to implement some form
of dirty inode writeback clustering.
Hello,
I used btrfs-convert to switch my FS from Ext4 to Btrfs. As it was a rather
large 10 TB filesystem, to save on the conversion time, I used the -d,
disable data checksum option of btrfs-convert.
Turns out now I can't cp --reflink any files that were already on the FS
prior to conversion.
On 11/25/2014 07:22 PM, Duncan wrote:
From my perspective, however, btrfs is simply incompatible with lvm
snapshots, because the basic assumptions are incompatible. Btrfs assumes
UUIDs will be exactly what they say on the label, /unique/, while lvm's
snapshot feature directly breaks that
On Wed, Nov 26, 2014 at 05:20:17AM -0500, Theodore Ts'o wrote:
On Wed, Nov 26, 2014 at 10:48:51AM +1100, Dave Chinner wrote:
No abuse necessary at all. Just a different inode_dirtied_after()
check is required if the inode is on the time dirty list in
move_expired_inodes().
I'm still not
On Wed, Nov 26, 2014 at 05:23:56AM -0500, Theodore Ts'o wrote:
Add an optimization for the MS_LAZYTIME mount option so that we will
opportunistically write out any inodes with the I_DIRTY_TIME flag set
in a particular inode table block when we need to update some inode in
that inode table
On Nov 26, 2014, at 3:48 PM, Dave Chinner da...@fromorbit.com wrote:
On Wed, Nov 26, 2014 at 05:23:56AM -0500, Theodore Ts'o wrote:
Add an optimization for the MS_LAZYTIME mount option so that we will
opportunistically write out any inodes with the I_DIRTY_TIME flag set
in a particular inode
On 11/26/2014 11:55 AM, Roman Mamedov wrote:
Is there really a good reason to stop these files without checksums from being
cloneable? It's not like they have the noCoW attribute, so I'd assume any new
write to these files would cause a CoW and proper checksums for all new blocks
anyways.
The
On Wed, 26 Nov 2014 15:18:26 -0800
Robert White rwh...@pobox.com wrote:
So you _could_ reflink the file but you'd have to do it to another file
with no data checksums -- which basically means a NOCOW file, or
mounting with nodatasum while you do the reflink, but now you have more
problem
On Wed, Nov 26, 2014 at 04:10:44PM -0700, Andreas Dilger wrote:
On Nov 26, 2014, at 3:48 PM, Dave Chinner da...@fromorbit.com wrote:
On Wed, Nov 26, 2014 at 05:23:56AM -0500, Theodore Ts'o wrote:
Add an optimization for the MS_LAZYTIME mount option so that we will
opportunistically write
On 11/26/2014 03:33 PM, Roman Mamedov wrote:
On Wed, 26 Nov 2014 15:18:26 -0800
Robert White rwh...@pobox.com wrote:
So you _could_ reflink the file but you'd have to do it to another file
with no data checksums -- which basically means a NOCOW file, or
mounting with nodatasum while you do the
On Wed, 26 Nov 2014 16:00:23 -0800
Robert White rwh...@pobox.com wrote:
Uh... you may _still_ have no checksums on any of those data extents.
They are not going to come back until you write them to a normal file
with a normal copy. So you may be lacking most of the data validation
features
On 11/26/2014 03:33 PM, Roman Mamedov wrote:
Finished with no rewriting necessary. After that I recursively-removed the +C
attribute from all newly reflinked files, and cp --reflink as well as
snapshotting of those works fine.
I did some double checking and I think you'll find that if you
On Wed, 26 Nov 2014 16:20:44 -0800
Robert White rwh...@pobox.com wrote:
I did some double checking and I think you'll find that if you lsattr
those files they still have the C (NoCOW) attribute, which also means
they are still unsummed.
Indeed, I looked at the top level only, which had just
On 11/26/2014 04:20 PM, Roman Mamedov wrote:
On Wed, 26 Nov 2014 16:00:23 -0800
Robert White rwh...@pobox.com wrote:
Uh... you may _still_ have no checksums on any of those data extents.
They are not going to come back until you write them to a normal file
with a normal copy. So you may be
On 11/26/2014 04:28 PM, Roman Mamedov wrote:
On Wed, 26 Nov 2014 16:20:44 -0800
Robert White rwh...@pobox.com wrote:
(Trying to clear the NOCOW attribute on a file in BTRFS is _silently_
ignored as invalid. That recursive removal only changed the directories.)
And the chattr command even
On 11/26/2014 04:31 PM, Robert White wrote:
On 11/26/2014 04:20 PM, Roman Mamedov wrote:
On Wed, 26 Nov 2014 16:00:23 -0800
Robert White rwh...@pobox.com wrote:
You might want to go experiment. Make another new subvol (or at least a
directory in a directory/root/subvol that never had the +C
If you exec:
# btrfs sub show dir == non-subvolume dir
The command prints error messages as expected, but returns 0.
By convention, it should return non-zero, and we should explicitly
set it before the goto out.
With other pieces adopted:
1) removed an unnecessary return value set to -EINVAL
2)
For now,
# btrfs fi show /mnt/btrfs
gives info correctly, while
# btrfs fi show /mnt/btrfs/
gives nothing.
This implies that the @realpath() function should be applied to
unify the behavior.
Also made a clearer comment right above the call.
Signed-off-by: Gui Hecheng
On Thu, 27 Nov 2014 09:39:56 +0800, Miao Xie wrote:
On Wed, 26 Nov 2014 10:02:23 -0500, Chris Mason wrote:
On Wed, Nov 26, 2014 at 8:04 AM, Miao Xie mi...@cn.fujitsu.com wrote:
The increase/decrease of bio counter is on the I/O path, so we should
use io_schedule() instead of schedule(), or the
On Tue, 2014-10-14 at 11:32 +0200, David Sterba wrote:
On Tue, Oct 14, 2014 at 10:06:16AM +0200, Marc Dietrich wrote:
This hasn't landed in an btrfs-progs branch I found. Any update?
I had it tagged for review and found something that needs fixing. The
PAGE_CACHE_SIZE is hardcoded to 4k,
On Thu, Nov 27, 2014 at 12:55:27AM +0500, Roman Mamedov wrote:
Hello,
I used btrfs-convert to switch my FS from Ext4 to Btrfs. As it was a rather
large 10 TB filesystem, to save on the conversion time, I used the -d,
disable data checksum option of btrfs-convert.
Turns out now I can't cp
On Tue, Nov 25, 2014 at 05:39:05PM +0100, David Sterba wrote:
On Mon, Nov 24, 2014 at 01:23:05PM +0800, Liu Bo wrote:
This brings a strong-but-slow checksum algorithm, sha256.
Actually btrfs used sha256 in its early days, but then moved to crc32c for
performance purposes.
As crc32c
On Wed, Nov 26, 2014 at 06:19:05PM +0100, Goffredo Baroncelli wrote:
On 11/25/2014 11:21 PM, Zygo Blaxell wrote:
However, I still don't understand why you want btrfs w/ multiple disks
over LVM?
I want to split a few disks into partitions, but I want to create,
move, and resize the
resolve_one_root() returns the objectid of a tree rather than the logical
address of the root node. Hence using root_bytenr is misleading. Fix this.
Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
qgroup-verify.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff